[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

jainaks Thu, 26 Jul 2018 00:26:47 -0700

Github user jainaks commented on the issue:

    https://github.com/apache/spark/pull/21320
  
    Thanks @mallman for making this huge contribution. Three years is a long time to stay patient while seeing something through.
    I am attaching a sample Parquet file with which you can reproduce the window-function issue where the wrong column is selected.
    
    
[sample.parquet.txt](https://github.com/apache/spark/files/2230873/sample.parquet.txt)
    Please remove .txt from the filename.
    
    The following steps reproduce the issue in the Spark shell.
    ```
    import org.apache.spark.sql.SparkSession
    val ss = SparkSession.builder().config("spark.sql.nestedSchemaPruning.enabled", "true").getOrCreate()
    val inputdf = ss.read.parquet("sample.parquet")
    inputdf.createOrReplaceTempView("temptable")
    ss.sql("""select page.url, first(page.url) over (partition by id order by
      timestamp rows between current row and unbounded following) from
      temptable""").collect.foreach(println)
    ```
    Result:
    
`[https://adobeid-na1.services.adobe.com/renga-idprovider/pages/login,Account:IMS:onLoad_SignInForm]`
    Please let me know if you need any help from my side.
    
    PS: Sorry for the late response. A few high-priority items kept me busy.



