Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23565 )

Change subject: KUDU-1261 [Java] Add spark bindings for Array columns
......................................................................


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/23565/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala:

http://gerrit.cloudera.org:8080/#/c/23565/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@169
PS3, Line 169:       if (rowResult.isNull(i)) {
             :         columns(i) = null
Is this really necessary given the code at line 174?


http://gerrit.cloudera.org:8080/#/c/23565/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/SparkUtil.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/SparkUtil.scala:

http://gerrit.cloudera.org:8080/#/c/23565/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/SparkUtil.scala@60
PS3, Line 60:      case Type.NESTED =>
            :         throw new IllegalArgumentException(
            :           "Type.NESTED should not be converted directly;" +
            :             " handle via col.isArray/col.getElementType.")
            :       case other =>
            :         throw new IllegalArgumentException(s"No support for Kudu 
type $other")
Why not to update this utility function to include what's done code at lines 
105 -- 114?  Is there any concern with updating the signature of this function 
to switch using a single 'col' parameter?

>From the git history, I could see this was a private utility function, 
>residing in KuduRelation before it was moved here.


http://gerrit.cloudera.org:8080/#/c/23565/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/SparkUtil.scala@87
PS3, Line 87:    case _ =>
            :       throw new IllegalArgumentException(s"No support for Spark 
SQL type $dt")
I suspect this could be thrown from the control paths involving 
writeOptions.handleSchemaDrift case in writeRows, but I might misunderstand 
what's going on there.

Does it make sense to add a test scenario with schema drift involving new array 
type column into the corresponding test suite, at least to confirm it's not 
throwing if you don't expect this to be thrown in such cases?


http://gerrit.cloudera.org:8080/#/c/23565/3/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduTestSuite.scala
File 
java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduTestSuite.scala:

http://gerrit.cloudera.org:8080/#/c/23565/3/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduTestSuite.scala@138
PS3, Line 138:         .nullable(true)
Maybe for at least for column, make it non-nullable to whether it's OK and 
doesn't blow up somewhere?


http://gerrit.cloudera.org:8080/#/c/23565/3/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/SparkSQLTest.scala
File 
java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/SparkSQLTest.scala:

http://gerrit.cloudera.org:8080/#/c/23565/3/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/SparkSQLTest.scala@553
PS3, Line 553: }
Could you also add coverage for empty arrays and null array cells as well, 
please?



--
To view, visit http://gerrit.cloudera.org:8080/23565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I786fdd1cbfbb67b4895b2e95b89addbc04341746
Gerrit-Change-Number: 23565
Gerrit-PatchSet: 3
Gerrit-Owner: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 21 Oct 2025 01:59:27 +0000
Gerrit-HasComments: Yes

Reply via email to