Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23483 )

Change subject: KUDU-1261 [Java] Add read support for Array Type
......................................................................


Patch Set 12:

(10 comments)

High-level feedback items:
  * I'd not expose ArrayCellView in the public API at all -- I don't see why 
it's necessary.  If we provide access to the array's data via Java-style arrays 
of boxed types, it should be already enough in the scope of this changelist, I 
guess.
  * Do you think should we also expose the data as unboxed Java arrays along 
with validity bitset/vector?  I'm not advocating for adding the corresponding 
code in this particular changelist, but I'm just curious what you think about 
this.  I guess one important consumer of the scanned data should be Spark 
bindings, and it will be clear what's needed there once trying to implement 
those.

http://gerrit.cloudera.org:8080/#/c/23483/12/java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java
File java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java:

http://gerrit.cloudera.org:8080/#/c/23483/12/java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java@626
PS12, Line 626:          assert arr != null;
Could you use proper Precondition instead of assert here?


http://gerrit.cloudera.org:8080/#/c/23483/12/java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java@667
PS12, Line 667: tsArr
What if tsAttr == null?


http://gerrit.cloudera.org:8080/#/c/23483/12/java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java@680
PS12, Line 680:
nit: add 'break' in 'default' as well?  or that's prohibited by the style 
checker?


http://gerrit.cloudera.org:8080/#/c/23483/12/java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java@775
PS12, Line 775: public final ArrayCellView getArray(int columnIndex)
Why do you want to expose ArrayCellView as a part of the public interface?  
Wouldn't it be enough to provide accessors that return Java arrays of 
corresponding types given column name or column index in a scanner's result row?

For the C++ code, we don't expose ArrayCellMetadataView as a part of the public 
client API.  Instead, there is KuduArrayCellView, and the only purpose of its 
existence is providing access to array cell's data via already existing  
KuduScanBatch::RowPtr::direct_data() API (e.g., it's used by Impala).  If not 
direct_data(), there wouldn't be KuduArrayCellView() at all.

With Java API, I don't see anything that resembles 
KuduScanBatch::RowPtr::direct_data().  Why to expose ArrayCellView in the Java 
API then?  Remember: every part of public API is a liability -- it's necessary 
to support it in future releases and we cannot update it without making sure it 
stays backward-compatible.


http://gerrit.cloudera.org:8080/#/c/23483/12/java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java@794
PS12, Line 794:   /**
              :    * Convenience overload for {@link #getArray(int)} by column 
name.
              :    */
              :   public final ArrayCellView getArray(String columnName) {
              :     return getArray(this.schema.getColumnIndex(columnName));
              :   }
I don't think this should be a public method.


http://gerrit.cloudera.org:8080/#/c/23483/12/java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java@845
PS12, Line 845:     int precision = 0;
              :     int scale = 0;
              :     if (col.getTypeAttributes() != null) {
              :       precision = col.getTypeAttributes().getPrecision();
              :       scale = col.getTypeAttributes().getScale();
              :     }
              :     Type logicalType = 
col.getNestedTypeDescriptor().getArrayDescriptor().getElemType();
Why not to do this in ArrayCellViewUtil method implementation instead, passing 
there just 'col' and 'view'?

Also, what if that's VARCHAR() type?  Where is the field 'size' attribute being 
passed then?


http://gerrit.cloudera.org:8080/#/c/23483/12/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java:

http://gerrit.cloudera.org:8080/#/c/23483/12/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java@2235
PS12, Line 2235:       if (i % 5 == 0) {
               :         session.flush();
               :       }
What is this for?  There flush() call at line 2239 already.


http://gerrit.cloudera.org:8080/#/c/23483/12/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java@2245
PS12, Line 2245:     // Ensure deterministic order by sorting on the key column
               :     rowStrings.sort(Comparator.comparingInt(rowStr -> {
               :       int eq = rowStr.indexOf('=');
               :       int comma = rowStr.indexOf(',', eq);
               :       return Integer.parseInt(rowStr.substring(eq + 1, 
comma).trim());
               :     }));
I'm not sure I understand why this is necessary: scanTableToStrings() already 
sorts the result strings. The key column is the first one in the output, and 
there is just 10 rows, indexed from 0 to 9, so why to have this at all?


http://gerrit.cloudera.org:8080/#/c/23483/12/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java@2311
PS12, Line 2311: if (intArr != null) {
Here and below: should there be an assert based on index of the row instead of 
'if()' clause?  What if every row results in null instead of the expected array?


http://gerrit.cloudera.org:8080/#/c/23483/12/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java@2314
PS12, Line 2314:         }
Here and below: consider adding verification of the actual values retrieved.  
It shouldn't be that hard, given it's already done for the stringified results, 
right?



--
To view, visit http://gerrit.cloudera.org:8080/23483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie26750cf540b0097c77e5454c1c1d20b3a194c52
Gerrit-Change-Number: 23483
Gerrit-PatchSet: 12
Gerrit-Owner: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Zoltan Chovan <[email protected]>
Gerrit-Comment-Date: Wed, 15 Oct 2025 04:37:20 +0000
Gerrit-HasComments: Yes

Reply via email to