GitHub user dongjoon-hyun commented on the issue:
    Thank you again for coming and reviewing this PR, @rxin, @kiszk, @mridulm, 
and @omalley.
    So far, we have discussed the following.
    1. `Why are we adding this to core? Why not just the hive module?` (@rxin)
       - The `sql/core` module gives more benefit than `sql/hive`.
       - The Apache ORC library (`no-hive` version) is a general and reasonably 
small library designed for non-hive apps.
    2. `Can we add smaller amount of new code to use this, too?` (@kiszk)
       - The previous PRs #17980, #17924, and #17943 are complete examples 
that contain this PR.
       - This PR focuses on the dependency only.
    3. `Why don't we then create a separate orc module? Just copy a few of the 
files over?` (@rxin)
       - The Apache ORC library is like most other data sources (CSV, 
JDBC, JSON, PARQUET, TEXT), which live inside `sql/core`.
       - It's better to use ORC as a library than to copy ORC files, because 
the Apache ORC shaded jar has many files. We had better rely on the Apache ORC 
community's effort until an unavoidable reason for copying arises.
    4. `I do worry in the future whether ORC would bring in a lot more jars` 
       - The ORC core library's dependency tree is aggressively kept as small 
as possible. I've gone through and excluded unnecessary jars from our 
dependencies. I also kick back pull requests that add unnecessary new 
dependencies. (@omalley)
    I tried to include and summarize all the advice here, but please let me 
know if I missed any concerns.
