Github user jainaks commented on the issue:
https://github.com/apache/spark/pull/21320
Hi @mallman, thanks for this PR. It has a huge impact on performance when
querying nested Parquet schemas. I had used the original PR #16578 and found
an issue: it does not work when the query refers to column names in a
different case.
For example, suppose the schema is:
root
 |-- name: struct
 |    |-- First: string
 |    |-- Last: string
 |-- address: string
If I run a join query that refers to the column as "NAME.first",
it throws an exception:
**ERROR: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: NAME#137322**
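To make the case mismatch concrete, here is a minimal Python sketch (not Spark code) of resolving a dotted path such as NAME.first against the schema above when names compare case-insensitively, which is Spark SQL's default (spark.sql.caseSensitive=false). The SCHEMA dict and the resolve_path helper are hypothetical illustrations, and the sketch does not claim to locate the bug in this PR.

```python
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for the schema above: field name -> type name,
# or a nested dict for struct fields.
SCHEMA: Dict[str, Any] = {
    "name": {"First": "string", "Last": "string"},
    "address": "string",
}


def resolve_path(schema: Dict[str, Any], path: str,
                 case_sensitive: bool = False) -> Optional[List[str]]:
    """Hypothetical helper: map a dotted column path to the field names
    as spelled in the schema. Returns None if any segment is missing,
    ambiguous, or tries to descend into a non-struct field."""
    resolved: List[str] = []
    current: Any = schema
    for part in path.split("."):
        if not isinstance(current, dict):
            return None  # cannot descend into a primitive field
        if case_sensitive:
            matches = [name for name in current if name == part]
        else:
            matches = [name for name in current if name.lower() == part.lower()]
        if len(matches) != 1:
            return None  # no match, or more than one case-insensitive match
        resolved.append(matches[0])  # keep the schema's own spelling
        current = current[matches[0]]
    return resolved


print(resolve_path(SCHEMA, "NAME.first"))
print(resolve_path(SCHEMA, "NAME.first", case_sensitive=True))
```

Resolved case-insensitively, NAME.first maps to the schema's own spelling, name.First; an exact-match lookup that reused the query's spelling would find nothing, as the case_sensitive=True call shows.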
If you want, I can share the exact schema and query for debugging.
I have already fixed this in my local repo, and it works fine there.
I have commented on the exact line of code that causes this issue.
Please let me know if you want me to share the fix.