guiyanakuang opened a new pull request #30467:
URL: https://github.com/apache/spark/pull/30467


   ### What changes were proposed in this pull request?
   
   Given the data file nest-data.json:
   
   ```json
   {"a": [{"b": [{"c": [1,2]}]}]}
   {"a": [{"b": [{"c": [1]}, {"c": [2]}]}]}
   ```
   
   running the following query:
   
   ```scala
   val df: DataFrame = spark.read.json(testFile("nest-data.json"))
   df.createTempView("nest_table")
   sql("select a.b.c from nest_table").show()
   ```
   
   fails with this error:
   
   ```log
   org.apache.spark.sql.AnalysisException: cannot resolve 'nest_table.`a`.`b`['c']' due to data type mismatch: argument 2 requires integral type, however, ''c'' is of string type.; line 1 pos 7;
   'Project [a#6.b[c] AS c#8]
   +- SubqueryAlias `nest_table`
      +- Relation[a#6] json
   ```
   
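   Root cause: the extractor for c is chosen by matching on the dataType of the a.b expression. Because a.b is itself extracted with GetArrayStructFields, its type is ArrayType(ArrayType()), which matches GetArrayItem, so the extraction ("c") is treated as an ordinal rather than a field name.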
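
   The dispatch described above can be illustrated with a small standalone model. This is only a sketch and not Spark's actual code: the `DataType` classes, `ExtractorSketch`, and `chooseExtractor` below are hypothetical stand-ins that mimic how an extractor might be picked from the child's data type alone.

   ```scala
   // Simplified, hypothetical model of type-based extractor selection.
   // These classes only mimic the shape of Spark's types; they are not Spark APIs.
   sealed trait DataType
   case object LongType extends DataType
   final case class StructType(fields: Map[String, DataType]) extends DataType
   final case class ArrayType(elementType: DataType) extends DataType

   object ExtractorSketch {
     // Pick an extractor name for `child.<field>` by looking only at the child's type.
     def chooseExtractor(childType: DataType, field: String): String = childType match {
       case StructType(fields) if fields.contains(field)            => "GetStructField"
       case ArrayType(StructType(fields)) if fields.contains(field) => "GetArrayStructFields"
       // Any other array falls through here, so the field name is treated as an ordinal.
       case ArrayType(_)                                            => "GetArrayItem"
       case _                                                       => "unresolved"
     }

     // Schema of column `a` in the sample data: array<struct<b: array<struct<c: array<long>>>>>
     val cStruct: DataType = StructType(Map("c" -> ArrayType(LongType)))
     val aType: DataType   = ArrayType(StructType(Map("b" -> ArrayType(cStruct))))

     // Extracting `b` from every struct in `a` yields an array of arrays of structs.
     val abType: DataType  = ArrayType(ArrayType(cStruct))
   }
   ```

   In this model, `chooseExtractor(aType, "b")` selects `GetArrayStructFields`, while `chooseExtractor(abType, "c")` falls through to `GetArrayItem`, which mirrors the "requires integral type" failure shown in the log.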
   
   ### Why are the changes needed?
   
   Spark SQL cannot currently resolve field access through nested arrays, yet querying this kind of data is very common, especially in the field of advertising.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. Users can query nested arrays directly in SQL, and schema pruning is supported for these queries.
   
   
   ### How was this patch tested?
   
   Added unit tests:
   
   ComplexTypesSuite
   The added tests show that nested arrays can be queried.
   
   NestArraySchemaPruningSuite
   Tests extraction from nested arrays while supporting schema pruning.

