Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20511
  
    I agree with what @omalley said. The new reader based on ORC 1.4 is better 
than the old reader. That is why we chose the new reader as the default at the 
beginning. We also saw a performance improvement in the micro-benchmark. 
    
    However, at RC3 of Spark 2.3, we realized that the new reader introduces 
regressions. After RC3, we reverted many PRs that caused regressions and also 
rejected many bug fixes that could introduce new ones. We are very 
conservative about merging bug fixes at this stage. Thus, I also think the 
suggestion from @marmbrus is very reasonable. It is the strategy we followed 
in previous Spark releases. 
    
    Regarding this specific case, we did not revert the new reader; we only 
changed the default values. Users can still opt in to the new reader. We just 
want to avoid breaking existing workloads when users upgrade to the upcoming 
Spark 2.3 release. In the next release, Spark 2.4, I believe we can be more 
confident about making the new ORC reader the default. 
    
    @dongjoon-hyun Could you submit a PR against the master branch to turn 
them on by default, and also add a migration guide? 

