[kudu-CR] [tool] kudu tool, support add column

2021-09-29 Thread Bankim Bhavsar (Code Review)
Bankim Bhavsar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17851 )

Change subject: [tool] kudu tool, support add column
..


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/17851/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17851/2//COMMIT_MSG@7
PS2, Line 7: kudu tool, support add column
CLI to add a column


http://gerrit.cloudera.org:8080/#/c/17851/2/src/kudu/client/schema.h
File src/kudu/client/schema.h:

http://gerrit.cloudera.org:8080/#/c/17851/2/src/kudu/client/schema.h@310
PS2, Line 310: /// @todo maybe should united as one, such as 
kudu::EncodingType(pb) and
 :   /// kudu::client::KuduColumnStorageAttributes::EncodingType, 
kudu::CompressionType(pb)
 :   /// and 
kudu::client::KuduColumnStorageAttributes::CompressionType
 :   /// @return Storage attributes of the column
I don't understand this comment.


http://gerrit.cloudera.org:8080/#/c/17851/2/src/kudu/tools/tool_action_table.cc
File src/kudu/tools/tool_action_table.cc:

http://gerrit.cloudera.org:8080/#/c/17851/2/src/kudu/tools/tool_action_table.cc@128
PS2, Line 128:  "AUTO_ENCODING, PLAIN_ENCODING, PREFIX_ENCODING, 
RLE, DICT_ENCODING, "
 :   "BIT_SHUFFLE, GROUP_VARINT");
Type of encoding for the column including AUTO_ENCODING, PLAIN_ENCODING...


http://gerrit.cloudera.org:8080/#/c/17851/2/src/kudu/tools/tool_action_table.cc@131
PS2, Line 131: "DEFAULT_COMPRESSION, NO_COMPRESSION, SNAPPY, LZ4, ZLIB"
Type of compression for the column;


http://gerrit.cloudera.org:8080/#/c/17851/2/src/kudu/tools/tool_action_table.cc@985
PS2, Line 985: const string& data_type_name = 
FindOrDie(context.required_args, kDataTypeArg);
 : KuduColumnSchema::DataType data_type;
 : 
RETURN_NOT_OK(KuduColumnSchema::StringToDataType(data_type_name, _type));
This required param should go at the top after column_name.


http://gerrit.cloudera.org:8080/#/c/17851/2/src/kudu/tools/tool_action_table.cc@984
PS2, Line 984:   {
 : const string& data_type_name = 
FindOrDie(context.required_args, kDataTypeArg);
 : KuduColumnSchema::DataType data_type;
 : 
RETURN_NOT_OK(KuduColumnSchema::StringToDataType(data_type_name, _type));
 : column_spec->Type(data_type);
 :
 : if (!FLAGS_default_value.empty()) {
 :   KuduValue* value = nullptr;
 :   RETURN_NOT_OK(ParseValueOfType(FLAGS_default_value, 
data_type, ));
 :   column_spec->Default(value);
 : }
 :   }
Nit: Don't see a need for the block scope and below as well.



--
To view, visit http://gerrit.cloudera.org:8080/17851
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93e6c2af07e9fa5aa909b4a09b79b7c54401d4e3
Gerrit-Change-Number: 17851
Gerrit-PatchSet: 2
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 29 Sep 2021 23:16:14 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1260: Fix prefetching bug on Java scanner

2021-09-29 Thread Hongjiang Zhang (Code Review)
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, Grant Henke,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17773

to look at the new patch set (#13).

Change subject: KUDU-1260: Fix prefetching bug on Java scanner
..

KUDU-1260: Fix prefetching bug on Java scanner

Add a UT to test prefetching. The UT has two concurrent threads: writing
thread and scanner thread. The writing thread records the timestamp of
its write, and the scanner thread creates two scanners (w and w/o prefetching),
by comparing the scan result of the two scanners, we can verify the
prefetching result.

When prefetching is enabled, there is a RowResultIterator prefetched and
it will override the one which has not yet been consumed in current
implementation, as a result, some data will loss. The fix is simple:
use an atomic reference to cache the result and avoid value overriding.

Furthermore, there are at most two concurrent ScanRequests
sent to the tserver. But if the scan data reached the end, only one 
hasMore=false is returned.
As a result, one of the ScanRequests got "scanner not found (it may have 
expired)" exception.
The same issue occurs for KeepAliveRequest. This patch also addressed this 
issue.

Change-Id: I853a041d86c75ec196d7d4ff45af4673c5c5f5cd
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
A 
java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScannerPrefetching.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
M 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduRDDTest.scala
5 files changed, 461 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/73/17773/13
--
To view, visit http://gerrit.cloudera.org:8080/17773
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I853a041d86c75ec196d7d4ff45af4673c5c5f5cd
Gerrit-Change-Number: 17773
Gerrit-PatchSet: 13
Gerrit-Owner: Hongjiang Zhang 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hongjiang Zhang 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-1260: Fix prefetching bug on Java scanner

2021-09-29 Thread Hongjiang Zhang (Code Review)
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, Grant Henke,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17773

to look at the new patch set (#12).

Change subject: KUDU-1260: Fix prefetching bug on Java scanner
..

KUDU-1260: Fix prefetching bug on Java scanner

Add a UT to test prefetching. The UT has two concurrent threads: writing
thread and scanner thread. The writing thread records the timestamp of
its write, and the scanner thread creates two scanners (w and w/o prefetching),
by comparing the scan result of the two scanners, we can verify the
prefetching result.

When prefetching is enabled, there is a RowResultIterator prefetched and
it will override the one which has not yet been consumed in current
implementation, as a result, some data will loss. The fix is simple:
use an atomic reference to cache the result and avoid value overriding.

Furthermore, there are at most two concurrent ScanRequests
sent to the tserver. But if the scan data reached the end, only one 
hasMore=false is returned.
As a result, one of the ScanRequests got "scanner not found (it may have 
expired)" exception.
The same issue occurs for KeepAliveRequest. This patch also addressed this 
issue.

Change-Id: I853a041d86c75ec196d7d4ff45af4673c5c5f5cd
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
A 
java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScannerPrefetching.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
M 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduRDDTest.scala
5 files changed, 461 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/73/17773/12
--
To view, visit http://gerrit.cloudera.org:8080/17773
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I853a041d86c75ec196d7d4ff45af4673c5c5f5cd
Gerrit-Change-Number: 17773
Gerrit-PatchSet: 12
Gerrit-Owner: Hongjiang Zhang 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hongjiang Zhang 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-1260: Fix prefetching bug on Java scanner

2021-09-29 Thread Hongjiang Zhang (Code Review)
Hongjiang Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17773 )

Change subject: KUDU-1260: Fix prefetching bug on Java scanner
..


Patch Set 11:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/17773/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17773/11//COMMIT_MSG@24
PS11, Line 24: This error may cause the spark task to fail sometimes.
> nit: I haven't gone through the entire patch yet, but is this addressed in
This description is confusing. In fact, this issue is addressed by this patch.


http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
File 
java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java:

http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@292
PS11, Line 292:
> nit: we typically indent at four spaces for line wraps
Ack


http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@299
PS11, Line 299:* This error may cause the spark task to fail sometimes.
> nit: this detail doesn't seem important -- it seems like this would affect
Ack


http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@449
PS11, Line 449: return this.hasMore || cachedPrefetcherDeferred.get() != 
null;
> nit: now it's a bit confusing to have both the concept of "having more" and
Ack


http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@654
PS11, Line 654:
> nit: sam here with regards to spacing
Ack


http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@671
PS11, Line 671: Deferred prefetcherDeferred = 
client.scanNextRows(AsyncKuduScanner.this)
  : .addCallbacks(gotNextRow, nextRowErrback());
  : if (!cachedPrefetcherDeferred.compareAndSet(null, 
prefetcherDeferred)) {
  :   LOG.info("Skip one prefetching because two 
consecutive prefetching scan occurs");
  : }
> Rephrasing Alexey's question, once we call client.scanNextRows(), does it s
client.scanNextRows() calls immediately sends a scan request. If we drop the 
request's deferred, I don't think there is any rows missing.


http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@1212
PS11, Line 1212: if (!(prefetching && closed)) {
> nit: before entering the switch, would it make sense to check canBeIgnored(
Ack


http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@1252
PS11, Line 1252:   if (error != null) {
   : if (canBeIgnored(resp.getError().getCode())) {
   :   LOG.info("Ignore false alert of scanner not found 
for scan request");
   :   error = null;
   : }
   :   }
> Would it make sense to put this up by the switch statement so we don't have
Ack



-- 
To view, visit http://gerrit.cloudera.org:8080/17773
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I853a041d86c75ec196d7d4ff45af4673c5c5f5cd
Gerrit-Change-Number: 17773
Gerrit-PatchSet: 11
Gerrit-Owner: Hongjiang Zhang 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hongjiang Zhang 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 29 Sep 2021 09:18:14 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3318 [LBM] Runtime compact log container metadata

2021-09-29 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17871 )

Change subject: KUDU-3318 [LBM] Runtime compact log container metadata
..


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17871/3//COMMIT_MSG
Commit Message:

PS3:
Have you tried deploying this on a cluster? Curious if you were able to see it 
solve the high disk usage issue you were seeing earlier.


http://gerrit.cloudera.org:8080/#/c/17871/3/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/17871/3/src/kudu/fs/log_block_manager.cc@208
PS3, Line 208: enable
nit: "when enabling"


http://gerrit.cloudera.org:8080/#/c/17871/3/src/kudu/fs/log_block_manager.cc@576
PS3, Line 576: // Try lock before reading metadata offset, consider it not 
full if lock failed.
 : if (!metadata_lock_.try_lock()) {
 :   return false;
 : }
Does this mean that if a metadata file is getting written to a lot, we may 
repeatedly consider it not full, even though it is?

+1 to the idea of adding metadata size, if that helps us avoid this.


http://gerrit.cloudera.org:8080/#/c/17871/3/src/kudu/fs/log_block_manager.cc@621
PS3, Line 621:   bool check_compact_condition(bool lock) const {
nit: maybe call this ShouldCompact()?


http://gerrit.cloudera.org:8080/#/c/17871/3/src/kudu/fs/log_block_manager.cc@628
PS3, Line 628: if (lock) {
nit: typically when we need both a locked and an unlocked version, that takes 
the form of

T DoSomethingUnlocked() {
  // maybe DCHECK that the lock is held
  // ... does something
}

T DoSomething() {
  Lock l(lock_);
  DoSomethingUnlocked();
}


http://gerrit.cloudera.org:8080/#/c/17871/3/src/kudu/fs/log_block_manager.cc@864
PS3, Line 864:   // However, we're hosed if we can't open the new metadata file.
In the event of a disk failure, rather than crashing, perhaps we should set 
some flag in the container to indicate to appenders that the container is bad?


http://gerrit.cloudera.org:8080/#/c/17871/3/src/kudu/fs/log_block_manager.cc@2632
PS3, Line 2632: LogBlockContainerRefPtr c = FindOrDie(all_containers_by_name_, 
lb->container()->ToString());
Rather than looking up the container reference, could we use scoped_refptr& 
operator=(T* p) to generate a scoped refptr? e.g. 
scoped_refptr self(lb->container());



--
To view, visit http://gerrit.cloudera.org:8080/17871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I72bdf73fa73403918c01dfa9d45530fad47ecaf6
Gerrit-Change-Number: 17871
Gerrit-PatchSet: 3
Gerrit-Owner: Yingchun Lai 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Wed, 29 Sep 2021 07:54:21 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1260: Fix prefetching bug on Java scanner

2021-09-29 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17773 )

Change subject: KUDU-1260: Fix prefetching bug on Java scanner
..


Patch Set 11:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/17773/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17773/11//COMMIT_MSG@10
PS11, Line 10: The writing thread records the timestamp of
 : its write, and the scanner thread creates two scanners (w and 
w/o prefetching),
 : by comparing the scan result of the two scanners, we can verify 
the
 : prefetching result.
It's great that this test does a lot of things, but in order to drill into the 
specific behavior of scan prefetching, do we actually have to do all of these 
things? Wouldn't it be simpler and more easily debuggable to just have scanners 
with prefetching, while still maintaining test coverage for this feature?


http://gerrit.cloudera.org:8080/#/c/17773/11//COMMIT_MSG@24
PS11, Line 24: This error may cause the spark task to fail sometimes.
nit: I haven't gone through the entire patch yet, but is this addressed in this 
patch? Could you clarify whether this is existing behavior or even with this 
patch spark tasks fail?


http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
File 
java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java:

http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@292
PS11, Line 292:
nit: we typically indent at four spaces for line wraps


http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@299
PS11, Line 299:* This error may cause the spark task to fail sometimes.
nit: this detail doesn't seem important -- it seems like this would affect all 
Java applications, not just Spark.


http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@449
PS11, Line 449: return this.hasMore || cachedPrefetcherDeferred.get() != 
null;
nit: now it's a bit confusing to have both the concept of "having more" and 
"having more rows". Perhaps we can rename 'hasMore' to something more specific 
so it's clearer what it's referring to? Perhaps canRequestMore or something?


http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@654
PS11, Line 654:
nit: sam here with regards to spacing


http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@671
PS11, Line 671: Deferred prefetcherDeferred = 
client.scanNextRows(AsyncKuduScanner.this)
  : .addCallbacks(gotNextRow, nextRowErrback());
  : if (!cachedPrefetcherDeferred.compareAndSet(null, 
prefetcherDeferred)) {
  :   LOG.info("Skip one prefetching because two 
consecutive prefetching scan occurs");
  : }
> If cachedPrepetcherDeffered's value is null, which means no prefetching, th
Rephrasing Alexey's question, once we call client.scanNextRows(), does it start 
immediately sending a scan request? In which case, if we drop that request's 
deferred, won't we be missing some rows?


http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@1212
PS11, Line 1212: if (!(prefetching && closed)) {
nit: before entering the switch, would it make sense to check canBeIgnored() 
instead? That way we ensure the ignoring is consistent since it all uses the 
same canBeIgnored() method


http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@1252
PS11, Line 1252:   if (error != null) {
   : if (canBeIgnored(resp.getError().getCode())) {
   :   LOG.info("Ignore false alert of scanner not found 
for scan request");
   :   error = null;
   : }
   :   }
Would it make sense to put this up by the switch statement so we don't have to 
have additional handling at L1210?


http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala:

PS11:
nit: please move the Spark portion of this into a separate changelist rather 
than lumping everything into one big patch



--
To view, visit http://gerrit.cloudera.org:8080/17773
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I853a041d86c75ec196d7d4ff45af4673c5c5f5cd
Gerrit-Change-Number: 17773