Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/23483 )
Change subject: KUDU-1261 [Java] Add read support for Array Type ...................................................................... Patch Set 12: (10 comments) High-level feedback items: * I'd not expose ArrayCellView in the public API at all -- I don't see why it's necessary. If we provide access to the array's data via Java-style arrays of boxed types, it should be already enough in the scope of this changelist, I guess. * Do you think should we also expose the data as unboxed Java arrays along with validity bitset/vector? I'm not advocating for adding the corresponding code in this particular changelist, but I'm just curious what you think about this. I guess one important consumer of the scanned data should be Spark bindings, and it will be clear what's needed there once trying to implement those. http://gerrit.cloudera.org:8080/#/c/23483/12/java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java File java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java: http://gerrit.cloudera.org:8080/#/c/23483/12/java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java@626 PS12, Line 626: assert arr != null; Could you use proper Precondition instead of assert here? http://gerrit.cloudera.org:8080/#/c/23483/12/java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java@667 PS12, Line 667: tsArr What if tsAttr == null? http://gerrit.cloudera.org:8080/#/c/23483/12/java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java@680 PS12, Line 680: nit: add 'break' in 'default' as well? or that's prohibited by the style checker? http://gerrit.cloudera.org:8080/#/c/23483/12/java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java@775 PS12, Line 775: public final ArrayCellView getArray(int columnIndex) Why do you want to expose ArrayCellView as a part of the public interface? Wouldn't it be enough to provide accessors that return Java arrays of corresponding types given column name or column index in a scanner's result row? For the C++ code, we don't expose ArrayCellMetadataView as a part of the public client API. Instead, there is KuduArrayCellView, and the only purpose of its existence is providing access to array cell's data via already existing KuduScanBatch::RowPtr::direct_data() API (e.g., it's used by Impala). If not direct_data(), there wouldn't be KuduArrayCellView() at all. With Java API, I don't see anything that resembles KuduScanBatch::RowPtr::direct_data(). Why to expose ArrayCellView in the Java API then? Remember: every part of public API is a liability -- it's necessary to support it in future releases and we cannot update it without making sure it stays backward-compatible. http://gerrit.cloudera.org:8080/#/c/23483/12/java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java@794 PS12, Line 794: /** : * Convenience overload for {@link #getArray(int)} by column name. : */ : public final ArrayCellView getArray(String columnName) { : return getArray(this.schema.getColumnIndex(columnName)); : } I don't think this should be a public method. http://gerrit.cloudera.org:8080/#/c/23483/12/java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java@845 PS12, Line 845: int precision = 0; : int scale = 0; : if (col.getTypeAttributes() != null) { : precision = col.getTypeAttributes().getPrecision(); : scale = col.getTypeAttributes().getScale(); : } : Type logicalType = col.getNestedTypeDescriptor().getArrayDescriptor().getElemType(); Why not to do this in ArrayCellViewUtil method implementation instead, passing there just 'col' and 'view'? Also, what if that's VARCHAR() type? Where is the field 'size' attribute being passed then? http://gerrit.cloudera.org:8080/#/c/23483/12/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java: http://gerrit.cloudera.org:8080/#/c/23483/12/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java@2235 PS12, Line 2235: if (i % 5 == 0) { : session.flush(); : } What is this for? There flush() call at line 2239 already. http://gerrit.cloudera.org:8080/#/c/23483/12/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java@2245 PS12, Line 2245: // Ensure deterministic order by sorting on the key column : rowStrings.sort(Comparator.comparingInt(rowStr -> { : int eq = rowStr.indexOf('='); : int comma = rowStr.indexOf(',', eq); : return Integer.parseInt(rowStr.substring(eq + 1, comma).trim()); : })); I'm not sure I understand why this is necessary: scanTableToStrings() already sorts the result strings. The key column is the first one in the output, and there is just 10 rows, indexed from 0 to 9, so why to have this at all? http://gerrit.cloudera.org:8080/#/c/23483/12/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java@2311 PS12, Line 2311: if (intArr != null) { Here and below: should there be an assert based on index of the row instead of 'if()' clause? What if every row results in null instead of the expected array? http://gerrit.cloudera.org:8080/#/c/23483/12/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java@2314 PS12, Line 2314: } Here and below: consider adding verification of the actual values retrieved. It shouldn't be that hard, given it's already done for the stringified results, right? -- To view, visit http://gerrit.cloudera.org:8080/23483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie26750cf540b0097c77e5454c1c1d20b3a194c52 Gerrit-Change-Number: 23483 Gerrit-PatchSet: 12 Gerrit-Owner: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Zoltan Chovan <[email protected]> Gerrit-Comment-Date: Wed, 15 Oct 2025 04:37:20 +0000 Gerrit-HasComments: Yes
