ji chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23613 )

Change subject: IMPALA-14092 Part2: Support querying of paimon data table via 
JNI
......................................................................


Patch Set 11:

(44 comments)

http://gerrit.cloudera.org:8080/#/c/23613/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23613/9//COMMIT_MSG@39
PS9, Line 39: 2.x
> Did the use of static_cast instead of dynamic_cast improve it further?
a bit better, still within 2.x range.


http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-row-reader.h
File be/src/exec/paimon/paimon-jni-row-reader.h:

http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-row-reader.h@49
PS9, Line 49: a row denoted
> nit: "a row denoted by 'row_idx' from an arrow batch"
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-row-reader.h@54
PS9, Line 54:   /// Template method that writes a value from 'arraw_array' to 
'slot'.
            :   /// 'T' is the type of the slot,
> nit: Template method that writes a value from 'arraw_array' to 'slot'. 'T'
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-row-reader.cc
File be/src/exec/paimon/paimon-jni-row-reader.cc:

http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-row-reader.cc@50
PS9, Line 50: &
> nit: redundant space
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-row-reader.cc@188
PS9, Line 188: (P
> nit: no need for this parenthesis
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-row-reader.cc@233
PS9, Line 233: std::string& tz_name = type.timezone();
             :   const Timezone*
> nit: we put */& next to the type (without a space), e.g.:
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-scan-node.cc
File be/src/exec/paimon/paimon-jni-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-scan-node.cc@54
PS9, Line 54: nOpenTime");
> It measures the time spent in Open(). Shouldn't it be called ScanOpenTime?
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-scan-node.cc@86
PS9, Line 86:
> nit: needs +4 indent
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-scan-node.cc@143
PS9, Line 143:     /// update allocated offheap memory reported from Jni.
             :     if (offheap_consumed_bytes > 
paimon_last_arrow_record_batch_consumed_bytes_) {
             :       OffheapTrackFree();
             :       OffheapTrackAllocation(offheap_consumed_bytes);
             :     }
> Any reason why we only update the tracked offheap memory usage when the new
calling memtracker consume/free method  is not lightweight, only update if 
higher value is found, so calling count can be reduced to save cpu cycles,


http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-scan-node.cc@151
PS9, Line 151:
> From the arrow documentation:
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-scanner.h
File be/src/exec/paimon/paimon-jni-scanner.h:

http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-scanner.h@56
PS9, Line 56: inline
> nit: inline keywords are redundant
looks like it is not redundant, will cause compile error if it is removed.


http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-scanner.cc
File be/src/exec/paimon/paimon-jni-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-scanner.cc@76
PS9, Line 76: , long* rows, long* offheap_use
> nit: long* rows, long* offheap_used
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/be/src/exec/paimon/paimon-jni-scanner.cc@77
PS9, Line 77: .
> nit: ", and the offheap usage in bytes"
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/23613/9/bin/bootstrap_toolchain.py@480
PS9, Line 480: ]]
> nit: the packages are in lexicographic order
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/23613/9/bin/impala-config.sh@210
PS9, Line 210: 13
> Seems like the newest stable version is 22. Any reason we use version 13? E
yes, but only arrow 13 is available in impala toolchain, it is working fine 
according to the test, so currently, 13 is used.


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/Table.java@673
PS9, Line 673: Pa
> nit: indentation is off
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/Table.java@676
PS9, Line 676:
> nit: missing space
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/local/LocalPaimonTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalPaimonTable.java:

http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/local/LocalPaimonTable.java@49
PS9, Line 49: Ta
> nit: too much indentation here and below
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/local/LocalPaimonTable.java@68
PS9, Line 68: addVirtualColumns(ref.getVirtualColumns());
> Why do we add virtual columns if they cannot be queried?
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java:

http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@60
PS9, Line 60: import org.apache.impala.thrift.TImpalaTableType;
> Unused import
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/paimon/FePaimonTable.java
File fe/src/main/java/org/apache/impala/catalog/paimon/FePaimonTable.java:

http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/paimon/FePaimonTable.java@45
PS9, Line 45:  */
> The imports are unused.
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonImpalaTypeUtils.java
File 
fe/src/main/java/org/apache/impala/catalog/paimon/PaimonImpalaTypeUtils.java:

http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonImpalaTypeUtils.java@212
PS9, Line 212:         for (StructField field : fields) {
             :           builder.field(field.getName(), visit(field.getType(), 
visitor));
             :         }
> Any reason to change this? The original is more concise and readable.
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonStructField.java
File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonStructField.java:

http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonStructField.java@78
PS9, Line 78: o
> nit: no need for the parentheses, same goes for next line
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java
File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java:

http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java@31
PS9, Line 31: import org.apache.impala.catalog.StructType;
> Unused import
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java
File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java:

http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java@886
PS9, Line 886:
> nit: needs +4 indent
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java
File fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java:

http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@50
PS9, Line 50: .
> nit: we prefer non-wildcard imports. Same for L25 & L28.
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@130
PS9, Line 130:         throw new AnalysisException("Paimon Scanner doesn't 
support virtual columns.");
             :       }
             :       if (sd.getPath().getRawPath() != null && 
sd.getPath().getRawPath().size() > 1) {
             :         throw new AnalysisException("Paimon Scanner doesn't 
support nested columns.");
             :       }
             :       PaimonColumn paimonColumn = (PaimonColumn) 
desc_.getSlots().get(i).getColumn();
             :       projection_[i] = paimonColumn.getFieldId();
             :     }
             :     Preconditions.checkArgument(projection_.length == 
desc_.getSlots().size());
             :     L
> Could be just:
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@161
PS9, Line 161:
> nit: magic constant. Please introduce a variable with a descriptive name fo
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@216
PS9, Line 216:    * ultimately determined by the scheduling done by the 
backend's Scheduler).
             :    * As of now, scan ranges are scheduled round-robin, since they
> As of now, scan ranges are scheduled round-robin, since they have no locati
yes, it can be simplified.


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/planner/PaimonScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/PaimonScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/planner/PaimonScanPlanner.java@33
PS9, Line 33: factory
> Paimon.
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/planner/PaimonScanPlanner.java@35
PS9, Line 35:
> Paimon
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/planner/PaimonScanPlanner.java@34
PS9, Line 34: de later.
            :  */
            : public class PaimonScanPlanner {
> This class currently doesn't deal with any of this. They are taken care by
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1915
PS9, Line 1915:   tb
> nit: +4 spaces is enough, see code above and below
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/planner/paimon/PaimonSplit.java
File fe/src/main/java/org/apache/impala/planner/paimon/PaimonSplit.java:

http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/planner/paimon/PaimonSplit.java@39
PS9, Line 39: (
> nit: no need for he '_' for public methods. Same in L43.
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonArrowFormatNativeWriter.java
File 
fe/src/main/java/org/apache/impala/util/paimon/PaimonArrowFormatNativeWriter.java:

http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonArrowFormatNativeWriter.java@34
PS9, Line 34: relevant PR
> Can you please link the PR?
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonArrowFormatWriter.java
File 
fe/src/main/java/org/apache/impala/util/paimon/PaimonArrowFormatWriter.java:

http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonArrowFormatWriter.java@38
PS9, Line 38: relevant PR
> please link PR
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonArrowFormatWriter.java@92
PS9, Line 92:
> nit: please add empty line after this
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonArrowFormatWriter.java@132
PS9, Line 132:
> nit: no need for the '_' for public methods. Same at L136.
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonArrowUtils.java
File fe/src/main/java/org/apache/impala/util/paimon/PaimonArrowUtils.java:

http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonArrowUtils.java@37
PS9, Line 37: A
> nit: we prefer non-wildcard imports
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonArrowUtils.java@179
PS9, Line 179: toArrowField(f
> This method is unused
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonJniScanner.java
File fe/src/main/java/org/apache/impala/util/paimon/PaimonJniScanner.java:

http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonJniScanner.java@71
PS9, Line 71:
> nit: writer
Done


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonJniScanner.java@125
PS9, Line 125:         }
             :       } else {
             :         break;
             :       }
> The for loop here checks of 'i < batchSize_'. But it is also checked by wri
the code is used to get the row count, in writer, rowId_ is private. so can't 
directly get the row count.


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonJniScanner.java@151
PS9, Line 151:       }
             :       // Create Iterator for given splits.
             :       L
> Currently predicates are always null/empty, right? All conjuncts are evalua
yes.exactly.


http://gerrit.cloudera.org:8080/#/c/23613/9/fe/src/main/java/org/apache/impala/util/paimon/PaimonJniScanner.java@176
PS9, Line 176: s
> nit: add space
Done



--
To view, visit http://gerrit.cloudera.org:8080/23613
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie679a89a8cc21d52b583422336b9f747bdf37384
Gerrit-Change-Number: 23613
Gerrit-PatchSet: 11
Gerrit-Owner: ji chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: ji chen <[email protected]>
Gerrit-Comment-Date: Mon, 01 Dec 2025 10:34:02 +0000
Gerrit-HasComments: Yes

Reply via email to