[GitHub] [spark] dongjoon-hyun commented on pull request #31966: [SPARK-34638][SQL] Single field nested column prune on generator output

GitBox Sun, 18 Apr 2021 22:53:26 -0700


dongjoon-hyun commented on pull request #31966:
URL: https://github.com/apache/spark/pull/31966#issuecomment-822188605



   Hi, @viirya. While testing this PR, I found the following regression.
   
   **BEFORE (3.1.1)**
   ```scala
   scala> sql("select * from values array(array(named_struct('a', 1, 'b', 3), named_struct('a', 2, 'b', 4))) T(items)").write.parquet("/tmp/nested_array")
   
   scala> spark.read.parquet("/tmp/nested_array").createOrReplaceTempView("t")
   
   scala> sql("select d.a from (select explode(c) d from (select explode(items) c from t))").show()
   +---+
   |  a|
   +---+
   |  1|
   |  2|
   +---+
   ```
   
   **BEFORE (master)**
   ```scala
   scala> spark.read.parquet("/tmp/nested_array").createOrReplaceTempView("t")
   
   scala> sql("select d.a from (select explode(c) d from (select explode(items) c from t))").show()
   +---+
   |  a|
   +---+
   |  1|
   |  2|
   +---+
   ```
   
   **AFTER (This PR)**
   ```scala
   scala> spark.read.parquet("/tmp/nested_array").createOrReplaceTempView("t")
   
   scala> sql("select d.a from (select explode(c) d from (select explode(items) c from t))").show()
   java.lang.ClassCastException: org.apache.spark.sql.types.ArrayType cannot be cast to org.apache.spark.sql.types.StructType
     at org.apache.spark.sql.catalyst.expressions.SelectedField$.selectField(SelectedField.scala:81)
     at org.apache.spark.sql.catalyst.expressions.SelectedField$.unapply(SelectedField.scala:62)
     at org.apache.spark.sql.catalyst.expressions.SchemaPruning$.getRootFields(SchemaPruning.scala:124)
     at org.apache.spark.sql.catalyst.expressions.SchemaPruning$.$anonfun$identifyRootFields$1(SchemaPruning.scala:81)
   ```
   ```
   
   Could you double-check this and add some test coverage?


