[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

jainaks Fri, 08 Jun 2018 06:13:33 -0700

Github user jainaks commented on the issue:

    https://github.com/apache/spark/pull/21320
  
    Hi @mallman, thanks for this PR. It has a huge impact on performance when
    querying a nested Parquet schema. I used the original PR #16578 and found
    an issue: it does not work well when the query refers to column names in a
    different case from the schema.
    For example, the schema is:
    root
     |-- name: struct
     |    |-- First: string
     |    |-- Last: string
     |-- address: string
    and a join query that refers to the column as "NAME.first"
    throws an exception:
    **ERROR:  org.apache.spark.sql.catalyst.errors.package$TreeNodeException: 
Binding attribute, tree: NAME#137322**
    If you want, I can share the exact schema and query for debugging.
    I have fixed this in my local repo and have it working fine.
    I have commented on the exact line of code that causes this issue.
    Please let me know if you would like me to share the fix.
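
    To illustrate the kind of lookup involved: this is not the local fix
    mentioned above and not Spark's code, only a minimal sketch in plain Python
    with no Spark dependency. It assumes case-insensitive resolution (Spark's
    default, `spark.sql.caseSensitive=false`). The dict-based `SCHEMA` and the
    helper `resolve_path` are hypothetical names made up for this example.

```python
# Hypothetical stand-in for the schema above: nested dicts are structs,
# strings are leaf types. This is not how Spark represents a schema.
SCHEMA = {
    "name": {"First": "string", "Last": "string"},
    "address": "string",
}


def resolve_path(schema, path):
    """Map a user-written path such as 'NAME.first' to the schema's own spelling.

    Each path segment is matched case-insensitively against the field names
    at the current nesting level. Raises KeyError if a segment matches no
    field, matches more than one field, or descends into a non-struct field.
    """
    resolved = []
    current = schema
    for part in path.split("."):
        if not isinstance(current, dict):
            raise KeyError(f"{'.'.join(resolved)} is not a struct")
        matches = [field for field in current if field.lower() == part.lower()]
        if len(matches) != 1:
            raise KeyError(f"cannot resolve {part!r}: {len(matches)} matches")
        resolved.append(matches[0])
        current = current[matches[0]]
    return ".".join(resolved)


print(resolve_path(SCHEMA, "NAME.first"))  # prints: name.First
```

    The idea is that a pruned schema built from the query's spelling
    ("NAME.first") would first be normalized to the file schema's spelling
    ("name.First"), so that later attribute binding compares like with like.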



