HyukjinKwon commented on a change in pull request #29421:
URL: https://github.com/apache/spark/pull/29421#discussion_r511684691



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala
##########
@@ -111,15 +111,14 @@ trait BaseScriptTransformationExec extends UnaryExecNode {
               .zip(outputFieldWriters)
               .map { case (data, writer) => writer(data) })
       } else {
-        // In schema less mode, hive default serde will choose first two output column as output
-        // if output column size less then 2, it will throw ArrayIndexOutOfBoundsException.
-        // Here we change spark's behavior same as hive's default serde.
-        // But in hive, TRANSFORM with schema less behavior like origin spark, we will fix this
-        // to keep spark and hive behavior same in SPARK-32388
+        // In schema-less mode, Hive chooses the first two output columns as output.
+        // If there are fewer than two output columns, it returns NULL for the missing values.
+        // Here we split the row string and take the first two values; if fewer than two
+        // values are present, we pad with NULL so the behavior matches Hive's.
         val kvWriter = CatalystTypeConverters.createToCatalystConverter(StringType)
         prevLine: String =>
           new GenericInternalRow(
-            prevLine.split(outputRowFormat).slice(0, 2)
+            prevLine.split(outputRowFormat).slice(0, 2).padTo(2, null)

Review comment:
       @AngersZhuuuu, can you add a configuration to control the legacy behaviour, and add a note to the migration guide, since it's arguable whether this is a bug? With that, I think I can leave my sign-off and merge.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
