Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23613 )

Change subject: IMPALA-14092 Part2: Support querying of paimon data table via 
JNI
......................................................................


Patch Set 8:

(16 comments)

Thanks for your great work, again! Started looking at the backend stuff, will 
continue next week.

http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-row-reader.h
File be/src/exec/paimon/paimon-jni-row-reader.h:

http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-row-reader.h@36
PS8, Line 36: an
nit: 'an' is redundant


http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-row-reader.h@43
PS8, Line 43:
nit: we usually don't put space here


http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-row-reader.h@48
PS8, Line 48:
            :
nit: redundant empty lines


http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-row-reader.h@53
PS8, Line 53: *
nit: we put the '*' next to the type


http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-row-reader.h@97
PS8, Line 97: CharVarChar
nit: For WriteStringOrBinarySlot String is mentioned first, then Binary, and 
the template bool param refers to the second.

In WriteCharVarCharSlot Char is mentioned first, then VarChar, and the template 
bool param refers to the first. It'd be better align the two methods.


http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-row-reader.cc
File be/src/exec/paimon/paimon-jni-row-reader.cc:

http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-row-reader.cc@64
PS8, Line 64: t
nit: needs +4 indent


http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-row-reader.cc@123
PS8, Line 123: c
nit: missing space


http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-row-reader.cc@148
PS8, Line 148: dynamic_cast
We could add DCHECK_NOTNULL around the dynamic_cast. Current code might 
dereference a null pointer.

Or, if we are sure about the actual arrow Array subtype, then we can just use 
static_cast which is much more efficient than dynamic_cast. Especially that we 
are doing a dynamic cast for each value in the array.

It would be interesting to measure performance with static_casts only.


http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-row-reader.cc@149
PS8, Line 149:   bool result = bool_array->Value(row_idx);
             :   *reinterpret_cast<bool*>(slot) = result;
             :   return Status::OK();
This pattern repeats multiple times, could be moved to a template function like 
this:

template<typename T, typename AT>
Status WriteSlot(const AT* arrow_array, int row_idx, void* slot) {
  T value = arrow_array->Value(row_idx);
  *reinterpret_cast<T*>(slot) = value;
  return Status::OK();
}

In most cases the following method could be used :

template<typename T, typename AT>
Status DowncastAndWriteSlot(const arrow::Array* arrow_array,
    const int row_idx, void* slot) {
  auto derived_array = row_idx == 0 ?
      DCHECK_NOTNULL(dynamic_cast<const AT*>(arrow_array)) :
      static_cast<const AT*>(arrow_array);
  return WriteSlot<T>(derived_array, row_idx, slot);
}


http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-row-reader.cc@171
PS8, Line 171: reinterpret_cast<int8_t>(
Is it needed? 'v' is already int16_t. Same goes for L179, L187, L195.


http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-row-reader.cc@270
PS8, Line 270: type.timezone();
Do we have tests with TIMESTAMP WITH LOCAL TIME ZONE and TIMESTAMP (WITHOUT 
TIME ZONE) as well?


http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-scan-node.h
File be/src/exec/paimon/paimon-jni-scan-node.h:

http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-scan-node.h@57
PS8, Line 57: Comsume
nit: Consume


http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-scan-node.cc
File be/src/exec/paimon/paimon-jni-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-scan-node.cc@148
PS8, Line 148:
             :
nit: too many empty lines


http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-scan-node.cc@162
PS8, Line 162:
nit: redundant empty line


http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-scan-node.cc@209
PS8, Line 209: row
nit: row batch


http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-scan-node.cc@218
PS8, Line 218:     RETURN_IF_ERROR(paimon_row_reader_->MaterializeTuple(
             :         *paimon_arrow_record_batch_holder_, 
arrow_record_batch_row_index_, tuple_desc_,
             :         tuple, row_batch->tuple_data_pool(), state));
             :     /// Evaluate conjuncts on this tuple row
             :     if (ExecNode::EvalConjuncts(conjunct_evals().data(),
             :         conjunct_evals().size(), tuple_row)) {
             :       row_batch->CommitLastRow();
             :       tuple = reinterpret_cast<Tuple*>(
             :           reinterpret_cast<uint8_t*>(tuple) + 
tuple_desc_->byte_size());
             :       IncrementNumRowsReturned(1);
Since we are getting column-oriented data, in the future (definitely not in 
this CR) we could make the flow similar to the columnar scanners, where we fill 
a "scratch batch" by going through the columns one by one, then applying the 
conjuncts to the scratch batch.



--
To view, visit http://gerrit.cloudera.org:8080/23613
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie679a89a8cc21d52b583422336b9f747bdf37384
Gerrit-Change-Number: 23613
Gerrit-PatchSet: 8
Gerrit-Owner: ji chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: ji chen <[email protected]>
Gerrit-Comment-Date: Fri, 21 Nov 2025 17:27:31 +0000
Gerrit-HasComments: Yes

Reply via email to