[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

mallman Fri, 22 Sep 2017 15:58:15 -0700

Github user mallman commented on the issue:

    https://github.com/apache/spark/pull/16578
  
    > @mallman how about adding a comment explaining why this workaround was 
done, plus the bug number in parquet-mr? That way, once that bug is fixed, the 
code can be cleaned up.
    
    It will take me more time to clarify/discover the root cause. Meanwhile, I 
reverted `ParquetReadSupport.scala` back to its previous version. I've also 
verified that unit tests fail unless we set `parquetMrCompatibility = false` 
for the built-in reader and `parquetMrCompatibility = true` for the parquet-mr 
reader. That is, we can't hardcode a single boolean value for 
`parquetMrCompatibility` without failing unit tests.
    
    > Also, maybe it's time to remove "DO NOT MERGE" from the title? As I 
understand it, most of the comments were addressed :)
    
    The problem that label refers to is still unresolved. I'm still thinking about 
how to fix it, but I suspect someone who firmly understands code generation in 
Spark SQL will be able to identify and whip up a fix faster and better than I 
can. Maybe @viirya can help.
    
    > Thank you very much for your work on this feature. I must admit that we 
are looking forward to having this merged. For us this will be the most 
important improvement in Spark 2.3.0 (I hope it will be part of 2.3.0 :) )
    
    Thanks for the encouragement and feedback! I'm glad you're finding this 
useful.


