ji chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/23613 )
Change subject: IMPALA-14092 Part2: Support querying of paimon data table via JNI ...................................................................... Patch Set 11: (44 comments) http://gerrit.cloudera.org:8080/#/c/23613/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/23613/9//COMMIT_MSG@39 PS9, Line 39: 2.x > Did the use of static_cast instead of dynamic_cast improve it further? a bit better, still within 2.x range. http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-row-reader.h File be/src/exec/paimon/paimon-jni-row-reader.h: http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-row-reader.h@49 PS9, Line 49: a row denoted > nit: "a row denoted by 'row_idx' from an arrow batch" Done http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-row-reader.h@54 PS9, Line 54: /// Template method that writes a value from 'arraw_array' to 'slot'. : /// 'T' is the type of the slot, > nit: Template method that writes a value from 'arraw_array' to 'slot'. 'T' Done http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-row-reader.cc File be/src/exec/paimon/paimon-jni-row-reader.cc: http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-row-reader.cc@50 PS9, Line 50: & > nit: redundant space Done http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-row-reader.cc@188 PS9, Line 188: (P > nit: no need for this parenthesis Done http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-row-reader.cc@233 PS9, Line 233: std::string& tz_name = type.timezone(); : const Timezone* > nit: we put */& next to the type (without a space), e.g.: Done http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-scan-node.cc File be/src/exec/paimon/paimon-jni-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-scan-node.cc@54 PS9, Line 54: nOpenTime"); > It measures the time spent in Open(). Shouldn't it be called ScanOpenTime? Done http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-scan-node.cc@86 PS9, Line 86: > nit: needs +4 indent Done http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-scan-node.cc@143 PS9, Line 143: /// update allocated offheap memory reported from Jni. : if (offheap_consumed_bytes > paimon_last_arrow_record_batch_consumed_bytes_) { : OffheapTrackFree(); : OffheapTrackAllocation(offheap_consumed_bytes); : } > Any reason why we only update the tracked offheap memory usage when the new calling memtracker consume/free method is not lightweight, only update if higher value is found, so calling count can be reduced to save cpu cycles, http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-scan-node.cc@151 PS9, Line 151: > From the arrow documentation: Done http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-scanner.h File be/src/exec/paimon/paimon-jni-scanner.h: http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-scanner.h@56 PS9, Line 56: inline > nit: inline keywords are redundant looks like it is not redundant, will cause compile error if it is removed. http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-scanner.cc File be/src/exec/paimon/paimon-jni-scanner.cc: http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-scanner.cc@76 PS9, Line 76: , long* rows, long* offheap_use > nit: long* rows, long* offheap_used Done http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-scanner.cc@77 PS9, Line 77: . > nit: ", and the offheap usage in bytes" Done http://gerrit.cloudera.org:8080/#/c/23613/9/bin/bootstrap_toolchain.py File bin/bootstrap_toolchain.py: http://gerrit.cloudera.org:8080/#/c/23613/9/bin/bootstrap_toolchain.py@480 PS9, Line 480: ]] > nit: the packages are in lexicographic order Done http://gerrit.cloudera.org:8080/#/c/23613/9/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/23613/9/bin/impala-config.sh@210 PS9, Line 210: 13 > Seems like the newest stable version is 22. Any reason we use version 13? E yes, but only arrow 13 is available in impala toolchain, it is working fine according to the test, so currently, 13 is used. http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/Table.java@673 PS9, Line 673: Pa > nit: indentation is off Done http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/Table.java@676 PS9, Line 676: > nit: missing space Done http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/local/LocalPaimonTable.java File fe/src/main/java/org/apache/impala/catalog/local/LocalPaimonTable.java: http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/local/LocalPaimonTable.java@49 PS9, Line 49: Ta > nit: too much indentation here and below Done http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/local/LocalPaimonTable.java@68 PS9, Line 68: addVirtualColumns(ref.getVirtualColumns()); > Why do we add virtual columns if they cannot be queried? Done http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java File fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java: http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@60 PS9, Line 60: import org.apache.impala.thrift.TImpalaTableType; > Unused import Done http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/paimon/FePaimonTable.java File fe/src/main/java/org/apache/impala/catalog/paimon/FePaimonTable.java: http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/paimon/FePaimonTable.java@45 PS9, Line 45: */ > The imports are unused. Done http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonImpalaTypeUtils.java File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonImpalaTypeUtils.java: http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonImpalaTypeUtils.java@212 PS9, Line 212: for (StructField field : fields) { : builder.field(field.getName(), visit(field.getType(), visitor)); : } > Any reason to change this? The original is more concise and readable. Done http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonStructField.java File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonStructField.java: http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonStructField.java@78 PS9, Line 78: o > nit: no need for the parentheses, same goes for next line Done http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java: http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java@31 PS9, Line 31: import org.apache.impala.catalog.StructType; > Unused import Done http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java: http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java@886 PS9, Line 886: > nit: needs +4 indent Done http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java File fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java: http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@50 PS9, Line 50: . > nit: we prefer non-wildcard imports. Same for L25 & L28. Done http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@130 PS9, Line 130: throw new AnalysisException("Paimon Scanner doesn't support virtual columns."); : } : if (sd.getPath().getRawPath() != null && sd.getPath().getRawPath().size() > 1) { : throw new AnalysisException("Paimon Scanner doesn't support nested columns."); : } : PaimonColumn paimonColumn = (PaimonColumn) desc_.getSlots().get(i).getColumn(); : projection_[i] = paimonColumn.getFieldId(); : } : Preconditions.checkArgument(projection_.length == desc_.getSlots().size()); : L > Could be just: Done http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@161 PS9, Line 161: > nit: magic constant. Please introduce a variable with a descriptive name fo Done http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@216 PS9, Line 216: * ultimately determined by the scheduling done by the backend's Scheduler). : * As of now, scan ranges are scheduled round-robin, since they > As of now, scan ranges are scheduled round-robin, since they have no locati yes, it can be simplified. http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/planner/PaimonScanPlanner.java File fe/src/main/java/org/apache/impala/planner/PaimonScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/planner/PaimonScanPlanner.java@33 PS9, Line 33: factory > Paimon. Done http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/planner/PaimonScanPlanner.java@35 PS9, Line 35: > Paimon Done http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/planner/PaimonScanPlanner.java@34 PS9, Line 34: de later. : */ : public class PaimonScanPlanner { > This class currently doesn't deal with any of this. They are taken care by Done http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1915 PS9, Line 1915: tb > nit: +4 spaces is enough, see code above and below Done http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/planner/paimon/PaimonSplit.java File fe/src/main/java/org/apache/impala/planner/paimon/PaimonSplit.java: http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/planner/paimon/PaimonSplit.java@39 PS9, Line 39: ( > nit: no need for he '_' for public methods. Same in L43. Done http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonArrowFormatNativeWriter.java File fe/src/main/java/org/apache/impala/util/paimon/PaimonArrowFormatNativeWriter.java: http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonArrowFormatNativeWriter.java@34 PS9, Line 34: relevant PR > Can you please link the PR? Done http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonArrowFormatWriter.java File fe/src/main/java/org/apache/impala/util/paimon/PaimonArrowFormatWriter.java: http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonArrowFormatWriter.java@38 PS9, Line 38: relevant PR > please link PR Done http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonArrowFormatWriter.java@92 PS9, Line 92: > nit: please add empty line after this Done http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonArrowFormatWriter.java@132 PS9, Line 132: > nit: no need for the '_' for public methods. Same at L136. Done http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonArrowUtils.java File fe/src/main/java/org/apache/impala/util/paimon/PaimonArrowUtils.java: http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonArrowUtils.java@37 PS9, Line 37: A > nit: we prefer non-wildcard imports Done http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonArrowUtils.java@179 PS9, Line 179: toArrowField(f > This method is unused Done http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonJniScanner.java File fe/src/main/java/org/apache/impala/util/paimon/PaimonJniScanner.java: http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonJniScanner.java@71 PS9, Line 71: > nit: writer Done http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonJniScanner.java@125 PS9, Line 125: } : } else { : break; : } > The for loop here checks of 'i < batchSize_'. But it is also checked by wri the code is used to get the row count, in writer, rowId_ is private. so can't directly get the row count. http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonJniScanner.java@151 PS9, Line 151: } : // Create Iterator for given splits. : L > Currently predicates are always null/empty, right? All conjuncts are evalua yes.exactly. http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonJniScanner.java@176 PS9, Line 176: s > nit: add space Done -- To view, visit http://gerrit.cloudera.org:8080/23613 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie679a89a8cc21d52b583422336b9f747bdf37384 Gerrit-Change-Number: 23613 Gerrit-PatchSet: 11 Gerrit-Owner: ji chen <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: ji chen <[email protected]> Gerrit-Comment-Date: Mon, 01 Dec 2025 10:34:02 +0000 Gerrit-HasComments: Yes
