Liang-Chi Hsieh created SPARK-25363:
---------------------------------------

             Summary: Schema pruning doesn't work if nested column is used in where clause
                 Key: SPARK-25363
                 URL: https://issues.apache.org/jira/browse/SPARK-25363
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Liang-Chi Hsieh


Schema pruning doesn't work if a nested column is used in the where clause.

For example,
{code}
sql("select name.first from contacts where name.first = 'David'")

== Physical Plan ==
*(1) Project [name#19.first AS first#40]
+- *(1) Filter (isnotnull(name#19) && (name#19.first = David))
   +- *(1) FileScan parquet [name#19] Batched: false, Format: Parquet, PartitionFilters: [], PushedFilters: [IsNotNull(name)], ReadSchema: struct<name:struct<first:string,middle:string,last:string>>
{code}

In the query plan above, the scan node reads the entire `name` struct (`first`, `middle` and `last`), even though the query only references `name.first`.
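
A minimal sketch of how the behavior might be reproduced locally. The case classes, sample rows, and temporary path are hypothetical and only mirror the ReadSchema shown above. The config key `spark.sql.optimizer.nestedSchemaPruning.enabled` is assumed to be the switch for this feature in 2.4.0. The sketch prints the plan for the same projection with and without the where clause so the two ReadSchema values can be compared:

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Hypothetical data model mirroring the ReadSchema in the plan above.
case class FullName(first: String, middle: String, last: String)
case class Contact(id: Int, name: FullName)

object NestedPruningRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-25363-repro")
      // Assumption: this is the flag that turns on nested schema pruning.
      .config("spark.sql.optimizer.nestedSchemaPruning.enabled", "true")
      .getOrCreate()
    import spark.implicits._

    // Write to a fresh subdirectory; the default save mode fails if the path exists.
    val path = Files.createTempDirectory("contacts").resolve("data").toString
    Seq(
      Contact(0, FullName("David", "X.", "Smith")),
      Contact(1, FullName("Jane", "Y.", "Doe"))
    ).toDF().write.parquet(path)

    spark.read.parquet(path).createOrReplaceTempView("contacts")

    // Projection only: the scan is expected to read just name.first.
    spark.sql("select name.first from contacts").explain()

    // Same projection plus a filter on the nested column: the case reported here.
    spark.sql("select name.first from contacts where name.first = 'David'").explain()

    spark.stop()
  }
}
```

If the issue is present, the second plan should show the full `name` struct in ReadSchema, as in the plan quoted above.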

This issue was reported in:
https://github.com/apache/spark/pull/21320#issuecomment-419290197



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
