[kudu-CR] [tool] kudu tool, support add column
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17851 ) Change subject: [tool] kudu tool, support add column .. Patch Set 2: (6 comments) http://gerrit.cloudera.org:8080/#/c/17851/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17851/2//COMMIT_MSG@7 PS2, Line 7: kudu tool, support add column CLI to add a column http://gerrit.cloudera.org:8080/#/c/17851/2/src/kudu/client/schema.h File src/kudu/client/schema.h: http://gerrit.cloudera.org:8080/#/c/17851/2/src/kudu/client/schema.h@310 PS2, Line 310: /// @todo maybe should united as one, such as kudu::EncodingType(pb) and : /// kudu::client::KuduColumnStorageAttributes::EncodingType, kudu::CompressionType(pb) : /// and kudu::client::KuduColumnStorageAttributes::CompressionType : /// @return Storage attributes of the column I don't understand this comment. http://gerrit.cloudera.org:8080/#/c/17851/2/src/kudu/tools/tool_action_table.cc File src/kudu/tools/tool_action_table.cc: http://gerrit.cloudera.org:8080/#/c/17851/2/src/kudu/tools/tool_action_table.cc@128 PS2, Line 128: "AUTO_ENCODING, PLAIN_ENCODING, PREFIX_ENCODING, RLE, DICT_ENCODING, " : "BIT_SHUFFLE, GROUP_VARINT"); Type of encoding for the column including AUTO_ENCODING, PLAIN_ENCODING... http://gerrit.cloudera.org:8080/#/c/17851/2/src/kudu/tools/tool_action_table.cc@131 PS2, Line 131: "DEFAULT_COMPRESSION, NO_COMPRESSION, SNAPPY, LZ4, ZLIB" Type of compression for the column; http://gerrit.cloudera.org:8080/#/c/17851/2/src/kudu/tools/tool_action_table.cc@985 PS2, Line 985: const string& data_type_name = FindOrDie(context.required_args, kDataTypeArg); : KuduColumnSchema::DataType data_type; : RETURN_NOT_OK(KuduColumnSchema::StringToDataType(data_type_name, _type)); This required param should go at the top after column_name. http://gerrit.cloudera.org:8080/#/c/17851/2/src/kudu/tools/tool_action_table.cc@984 PS2, Line 984: { : const string& data_type_name = FindOrDie(context.required_args, kDataTypeArg); : KuduColumnSchema::DataType data_type; : RETURN_NOT_OK(KuduColumnSchema::StringToDataType(data_type_name, _type)); : column_spec->Type(data_type); : : if (!FLAGS_default_value.empty()) { : KuduValue* value = nullptr; : RETURN_NOT_OK(ParseValueOfType(FLAGS_default_value, data_type, )); : column_spec->Default(value); : } : } Nit: Don't see a need for the block scope and below as well. -- To view, visit http://gerrit.cloudera.org:8080/17851 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93e6c2af07e9fa5aa909b4a09b79b7c54401d4e3 Gerrit-Change-Number: 17851 Gerrit-PatchSet: 2 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Wed, 29 Sep 2021 23:16:14 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1260: Fix prefetching bug on Java scanner
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, Grant Henke, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17773 to look at the new patch set (#13). Change subject: KUDU-1260: Fix prefetching bug on Java scanner .. KUDU-1260: Fix prefetching bug on Java scanner Add a UT to test prefetching. The UT has two concurrent threads: writing thread and scanner thread. The writing thread records the timestamp of its write, and the scanner thread creates two scanners (w and w/o prefetching), by comparing the scan result of the two scanners, we can verify the prefetching result. When prefetching is enabled, there is a RowResultIterator prefetched and it will override the one which has not yet been consumed in current implementation, as a result, some data will loss. The fix is simple: use an atomic reference to cache the result and avoid value overriding. Furthermore, there are at most two concurrent ScanRequests sent to the tserver. But if the scan data reached the end, only one hasMore=false is returned. As a result, one of the ScanRequests got "scanner not found (it may have expired)" exception. The same issue occurs for KeepAliveRequest. This patch also addressed this issue. Change-Id: I853a041d86c75ec196d7d4ff45af4673c5c5f5cd --- M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java A java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScannerPrefetching.java M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduRDDTest.scala 5 files changed, 461 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/73/17773/13 -- To view, visit http://gerrit.cloudera.org:8080/17773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I853a041d86c75ec196d7d4ff45af4673c5c5f5cd Gerrit-Change-Number: 17773 Gerrit-PatchSet: 13 Gerrit-Owner: Hongjiang Zhang Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hongjiang Zhang Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-1260: Fix prefetching bug on Java scanner
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, Grant Henke, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17773 to look at the new patch set (#12). Change subject: KUDU-1260: Fix prefetching bug on Java scanner .. KUDU-1260: Fix prefetching bug on Java scanner Add a UT to test prefetching. The UT has two concurrent threads: writing thread and scanner thread. The writing thread records the timestamp of its write, and the scanner thread creates two scanners (w and w/o prefetching), by comparing the scan result of the two scanners, we can verify the prefetching result. When prefetching is enabled, there is a RowResultIterator prefetched and it will override the one which has not yet been consumed in current implementation, as a result, some data will loss. The fix is simple: use an atomic reference to cache the result and avoid value overriding. Furthermore, there are at most two concurrent ScanRequests sent to the tserver. But if the scan data reached the end, only one hasMore=false is returned. As a result, one of the ScanRequests got "scanner not found (it may have expired)" exception. The same issue occurs for KeepAliveRequest. This patch also addressed this issue. Change-Id: I853a041d86c75ec196d7d4ff45af4673c5c5f5cd --- M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java A java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScannerPrefetching.java M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduRDDTest.scala 5 files changed, 461 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/73/17773/12 -- To view, visit http://gerrit.cloudera.org:8080/17773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I853a041d86c75ec196d7d4ff45af4673c5c5f5cd Gerrit-Change-Number: 17773 Gerrit-PatchSet: 12 Gerrit-Owner: Hongjiang Zhang Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hongjiang Zhang Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-1260: Fix prefetching bug on Java scanner
Hongjiang Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17773 ) Change subject: KUDU-1260: Fix prefetching bug on Java scanner .. Patch Set 11: (8 comments) http://gerrit.cloudera.org:8080/#/c/17773/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17773/11//COMMIT_MSG@24 PS11, Line 24: This error may cause the spark task to fail sometimes. > nit: I haven't gone through the entire patch yet, but is this addressed in This description is confusing. In fact, this issue is addressed by this patch. http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java: http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@292 PS11, Line 292: > nit: we typically indent at four spaces for line wraps Ack http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@299 PS11, Line 299:* This error may cause the spark task to fail sometimes. > nit: this detail doesn't seem important -- it seems like this would affect Ack http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@449 PS11, Line 449: return this.hasMore || cachedPrefetcherDeferred.get() != null; > nit: now it's a bit confusing to have both the concept of "having more" and Ack http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@654 PS11, Line 654: > nit: sam here with regards to spacing Ack http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@671 PS11, Line 671: Deferred prefetcherDeferred = client.scanNextRows(AsyncKuduScanner.this) : .addCallbacks(gotNextRow, nextRowErrback()); : if (!cachedPrefetcherDeferred.compareAndSet(null, prefetcherDeferred)) { : LOG.info("Skip one prefetching because two consecutive prefetching scan occurs"); : } > Rephrasing Alexey's question, once we call client.scanNextRows(), does it s client.scanNextRows() calls immediately sends a scan request. If we drop the request's deferred, I don't think there is any rows missing. http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@1212 PS11, Line 1212: if (!(prefetching && closed)) { > nit: before entering the switch, would it make sense to check canBeIgnored( Ack http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@1252 PS11, Line 1252: if (error != null) { : if (canBeIgnored(resp.getError().getCode())) { : LOG.info("Ignore false alert of scanner not found for scan request"); : error = null; : } : } > Would it make sense to put this up by the switch statement so we don't have Ack -- To view, visit http://gerrit.cloudera.org:8080/17773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I853a041d86c75ec196d7d4ff45af4673c5c5f5cd Gerrit-Change-Number: 17773 Gerrit-PatchSet: 11 Gerrit-Owner: Hongjiang Zhang Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hongjiang Zhang Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Wed, 29 Sep 2021 09:18:14 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3318 [LBM] Runtime compact log container metadata
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17871 ) Change subject: KUDU-3318 [LBM] Runtime compact log container metadata .. Patch Set 3: (7 comments) http://gerrit.cloudera.org:8080/#/c/17871/3//COMMIT_MSG Commit Message: PS3: Have you tried deploying this on a cluster? Curious if you were able to see it solve the high disk usage issue you were seeing earlier. http://gerrit.cloudera.org:8080/#/c/17871/3/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: http://gerrit.cloudera.org:8080/#/c/17871/3/src/kudu/fs/log_block_manager.cc@208 PS3, Line 208: enable nit: "when enabling" http://gerrit.cloudera.org:8080/#/c/17871/3/src/kudu/fs/log_block_manager.cc@576 PS3, Line 576: // Try lock before reading metadata offset, consider it not full if lock failed. : if (!metadata_lock_.try_lock()) { : return false; : } Does this mean that if a metadata file is getting written to a lot, we may repeatedly consider it not full, even though it is? +1 to the idea of adding metadata size, if that helps us avoid this. http://gerrit.cloudera.org:8080/#/c/17871/3/src/kudu/fs/log_block_manager.cc@621 PS3, Line 621: bool check_compact_condition(bool lock) const { nit: maybe call this ShouldCompact()? http://gerrit.cloudera.org:8080/#/c/17871/3/src/kudu/fs/log_block_manager.cc@628 PS3, Line 628: if (lock) { nit: typically when we need both a locked and an unlocked version, that takes the form of T DoSomethingUnlocked() { // maybe DCHECK that the lock is held // ... does something } T DoSomething() { Lock l(lock_); DoSomethingUnlocked(); } http://gerrit.cloudera.org:8080/#/c/17871/3/src/kudu/fs/log_block_manager.cc@864 PS3, Line 864: // However, we're hosed if we can't open the new metadata file. In the event of a disk failure, rather than crashing, perhaps we should set some flag in the container to indicate to appenders that the container is bad? http://gerrit.cloudera.org:8080/#/c/17871/3/src/kudu/fs/log_block_manager.cc@2632 PS3, Line 2632: LogBlockContainerRefPtr c = FindOrDie(all_containers_by_name_, lb->container()->ToString()); Rather than looking up the container reference, could we use scoped_refptr& operator=(T* p) to generate a scoped refptr? e.g. scoped_refptr self(lb->container()); -- To view, visit http://gerrit.cloudera.org:8080/17871 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I72bdf73fa73403918c01dfa9d45530fad47ecaf6 Gerrit-Change-Number: 17871 Gerrit-PatchSet: 3 Gerrit-Owner: Yingchun Lai Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Wed, 29 Sep 2021 07:54:21 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1260: Fix prefetching bug on Java scanner
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17773 ) Change subject: KUDU-1260: Fix prefetching bug on Java scanner .. Patch Set 11: (10 comments) http://gerrit.cloudera.org:8080/#/c/17773/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17773/11//COMMIT_MSG@10 PS11, Line 10: The writing thread records the timestamp of : its write, and the scanner thread creates two scanners (w and w/o prefetching), : by comparing the scan result of the two scanners, we can verify the : prefetching result. It's great that this test does a lot of things, but in order to drill into the specific behavior of scan prefetching, do we actually have to do all of these things? Wouldn't it be simpler and more easily debuggable to just have scanners with prefetching, while still maintaining test coverage for this feature? http://gerrit.cloudera.org:8080/#/c/17773/11//COMMIT_MSG@24 PS11, Line 24: This error may cause the spark task to fail sometimes. nit: I haven't gone through the entire patch yet, but is this addressed in this patch? Could you clarify whether this is existing behavior or even with this patch spark tasks fail? http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java: http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@292 PS11, Line 292: nit: we typically indent at four spaces for line wraps http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@299 PS11, Line 299:* This error may cause the spark task to fail sometimes. nit: this detail doesn't seem important -- it seems like this would affect all Java applications, not just Spark. http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@449 PS11, Line 449: return this.hasMore || cachedPrefetcherDeferred.get() != null; nit: now it's a bit confusing to have both the concept of "having more" and "having more rows". Perhaps we can rename 'hasMore' to something more specific so it's clearer what it's referring to? Perhaps canRequestMore or something? http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@654 PS11, Line 654: nit: sam here with regards to spacing http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@671 PS11, Line 671: Deferred prefetcherDeferred = client.scanNextRows(AsyncKuduScanner.this) : .addCallbacks(gotNextRow, nextRowErrback()); : if (!cachedPrefetcherDeferred.compareAndSet(null, prefetcherDeferred)) { : LOG.info("Skip one prefetching because two consecutive prefetching scan occurs"); : } > If cachedPrepetcherDeffered's value is null, which means no prefetching, th Rephrasing Alexey's question, once we call client.scanNextRows(), does it start immediately sending a scan request? In which case, if we drop that request's deferred, won't we be missing some rows? http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@1212 PS11, Line 1212: if (!(prefetching && closed)) { nit: before entering the switch, would it make sense to check canBeIgnored() instead? That way we ensure the ignoring is consistent since it all uses the same canBeIgnored() method http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@1252 PS11, Line 1252: if (error != null) { : if (canBeIgnored(resp.getError().getCode())) { : LOG.info("Ignore false alert of scanner not found for scan request"); : error = null; : } : } Would it make sense to put this up by the switch statement so we don't have to have additional handling at L1210? http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala: PS11: nit: please move the Spark portion of this into a separate changelist rather than lumping everything into one big patch -- To view, visit http://gerrit.cloudera.org:8080/17773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I853a041d86c75ec196d7d4ff45af4673c5c5f5cd Gerrit-Change-Number: 17773