Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/23565 )
Change subject: KUDU-1261 [Java] Add spark bindings for Array columns ...................................................................... Patch Set 3: (5 comments) http://gerrit.cloudera.org:8080/#/c/23565/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala: http://gerrit.cloudera.org:8080/#/c/23565/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@169 PS3, Line 169: if (rowResult.isNull(i)) { : columns(i) = null Is this really necessary given the code at line 174? http://gerrit.cloudera.org:8080/#/c/23565/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/SparkUtil.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/SparkUtil.scala: http://gerrit.cloudera.org:8080/#/c/23565/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/SparkUtil.scala@60 PS3, Line 60: case Type.NESTED => : throw new IllegalArgumentException( : "Type.NESTED should not be converted directly;" + : " handle via col.isArray/col.getElementType.") : case other => : throw new IllegalArgumentException(s"No support for Kudu type $other") Why not to update this utility function to include what's done code at lines 105 -- 114? Is there any concern with updating the signature of this function to switch using a single 'col' parameter? >From the git history, I could see this was a private utility function, >residing in KuduRelation before it was moved here. http://gerrit.cloudera.org:8080/#/c/23565/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/SparkUtil.scala@87 PS3, Line 87: case _ => : throw new IllegalArgumentException(s"No support for Spark SQL type $dt") I suspect this could be thrown from the control paths involving writeOptions.handleSchemaDrift case in writeRows, but I might misunderstand what's going on there. Does it make sense to add a test scenario with schema drift involving new array type column into the corresponding test suite, at least to confirm it's not throwing if you don't expect this to be thrown in such cases? http://gerrit.cloudera.org:8080/#/c/23565/3/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduTestSuite.scala File java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduTestSuite.scala: http://gerrit.cloudera.org:8080/#/c/23565/3/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduTestSuite.scala@138 PS3, Line 138: .nullable(true) Maybe for at least for column, make it non-nullable to whether it's OK and doesn't blow up somewhere? http://gerrit.cloudera.org:8080/#/c/23565/3/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/SparkSQLTest.scala File java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/SparkSQLTest.scala: http://gerrit.cloudera.org:8080/#/c/23565/3/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/SparkSQLTest.scala@553 PS3, Line 553: } Could you also add coverage for empty arrays and null array cells as well, please? -- To view, visit http://gerrit.cloudera.org:8080/23565 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I786fdd1cbfbb67b4895b2e95b89addbc04341746 Gerrit-Change-Number: 23565 Gerrit-PatchSet: 3 Gerrit-Owner: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 21 Oct 2025 01:59:27 +0000 Gerrit-HasComments: Yes
