[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

dongjoon-hyun Tue, 08 Aug 2017 09:56:09 -0700

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/18640
  
    Thank you again for reviewing this PR, @rxin, @kiszk, @mridulm, and @omalley.
    So far, we have discussed the following.
    
    1. `Why are we adding this to core? Why not just the hive module?` (@rxin)
       - The `sql/core` module gives more benefit than `sql/hive`.
       - The Apache ORC library (`no-hive` version) is a general and reasonably small library designed for non-Hive applications.
    
    2. `Can we add smaller amount of new code to use this, too?` (@kiszk)
       - The previous PRs #17980, #17924, and #17943 are complete examples that include this PR.
       - This PR focuses on the dependency only.
    
    3. `Why don't we then create a separate orc module? Just copy a few of the files over?` (@rxin)
       - The Apache ORC data source is like most of the other data sources (CSV, JDBC, JSON, Parquet, text), which live inside `sql/core`.
       - It's better to use ORC as a library instead of copying its files, because the Apache ORC shaded jar contains many files. We should rely on the Apache ORC community's effort until an unavoidable reason for copying arises.
    
    4. `I do worry in the future whether ORC would bring in a lot more jars` (@rxin)
       - The ORC core library's dependency tree is aggressively kept as small as possible. I've gone through and excluded unnecessary jars from our dependencies. I also kick back pull requests that add unnecessary new dependencies. (@omalley)
    
    I tried to summarize all of the advice here, but please let me know if I missed any concerns.


