[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

jainaks Tue, 12 Jun 2018 00:49:02 -0700

Github user jainaks commented on the issue:

    https://github.com/apache/spark/pull/21320
  
    Hi @mallman ,
    I found another major issue after applying this fix.
    Schema:
    a: struct (nullable = true)
     |-- b: struct (nullable = true)
     |    |-- c1: string (nullable = true)
     |    |-- c2: string (nullable = true)
     |    |-- c3: string (nullable = true)
     |    |-- c4: string (nullable = true)
     |    |-- c5: boolean (nullable = true)
    id: struct (nullable = true)
     |-- i1: struct (nullable = true)
     |    |-- i2: string (nullable = true)
    timestamp: bigint
    **Query:**
    select      a.b.c3 as c3,
                first(a.b.c3) over (partition by id.i1.i2 order by timestamp
                                    rows between current row and unbounded following) as first_c3
    from        temp;
    The column "first_c3" gets the value of column "c2" instead of "c3".
    It works correctly if I just turn the parquetSchemaPrunning flag to false.
    This may sound odd at first glance, and it does to me too, but this is what I am getting.
    PS: I am running all my tests using PR #16578.
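
For reference, here is a minimal plain-Python sketch of what the query above would be expected to return. It does not use Spark. The sample rows, the `make_row` helper, and the function name are hypothetical, and it assumes the default `first()` behaviour of not skipping nulls. Because the frame starts at the current row, `first_c3` should equal `c3` on every row, and should never pick up a value from `c2`.

```python
from collections import defaultdict


def first_c3_over_following(rows):
    """Emulate, in plain Python:
    first(a.b.c3) over (partition by id.i1.i2 order by timestamp
                        rows between current row and unbounded following)
    """
    # Group rows by the partition key id.i1.i2.
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["id"]["i1"]["i2"]].append(row)

    output = []
    for part in partitions.values():
        # Order each partition by timestamp, as in the window spec.
        ordered = sorted(part, key=lambda r: r["timestamp"])
        for pos, row in enumerate(ordered):
            # Frame: current row through the end of the partition.
            frame = ordered[pos:]
            output.append({
                "c3": row["a"]["b"]["c3"],
                "first_c3": frame[0]["a"]["b"]["c3"],
            })
    return output


def make_row(i2, timestamp, tag):
    """Hypothetical helper: build one row matching the schema in the report."""
    return {
        "a": {"b": {"c1": f"c1-{tag}", "c2": f"c2-{tag}", "c3": f"c3-{tag}",
                    "c4": f"c4-{tag}", "c5": True}},
        "id": {"i1": {"i2": i2}},
        "timestamp": timestamp,
    }


# Made-up sample data, only for illustration.
rows = [make_row("k1", 2, "x"), make_row("k1", 1, "y"), make_row("k2", 5, "z")]
result = first_c3_over_following(rows)
for r in result:
    print(r)
```

Comparing the two output columns row by row like this, with the pruning flag on and then off, is one way to narrow down whether the wrong value comes from the pruned Parquet read or from the window evaluation.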


