Github user scottcarey commented on the issue:
https://github.com/apache/spark/pull/21070
@rdblue
The problem with zstd is that Hadoop's `ZStandardCodec` only ships in Hadoop 3.0, and dropping _that_
jar into a 2.x deployment breaks things, since 3.0 is a major release. Extracting just the
`ZStandardCodec` class from it and recompiling against a 2.x release does not work either,
because it relies on Hadoop's low-level native-library management to load
the native zstd library (it does not appear to use
https://github.com/luben/zstd-jni).
The alternative is to write a custom `ZStandardCodec` implementation that
uses luben:zstd-jni.
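A minimal sketch of what such a codec's stream-wrapping would look like. zstd-jni is not in the standard library, so `java.util.zip`'s Deflater/Inflater streams stand in here for zstd-jni's `ZstdOutputStream`/`ZstdInputStream`; a real Hadoop codec would also implement `o.a.h.io.compress.CompressionCodec`, which is omitted to keep this self-contained. The class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical codec sketch: expose a compression library's streams behind
// create{Input,Output}Stream methods, the shape Hadoop's CompressionCodec expects.
// zstd-jni's ZstdOutputStream/ZstdInputStream would slot in where the
// java.util.zip streams are used below.
public class ZstdJniStyleCodec {
    public OutputStream createOutputStream(OutputStream out) {
        return new DeflaterOutputStream(out);   // stand-in for new ZstdOutputStream(out)
    }

    public InputStream createInputStream(InputStream in) {
        return new InflaterInputStream(in);     // stand-in for new ZstdInputStream(in)
    }

    // Compress then decompress, returning the recovered bytes.
    static byte[] roundTrip(byte[] data) throws IOException {
        ZstdJniStyleCodec codec = new ZstdJniStyleCodec();
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (OutputStream cos = codec.createOutputStream(compressed)) {
            cos.write(data);
        }
        ByteArrayOutputStream recovered = new ByteArrayOutputStream();
        try (InputStream cis = codec.createInputStream(
                new ByteArrayInputStream(compressed.toByteArray()))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = cis.read(chunk)) != -1) {
                recovered.write(chunk, 0, n);
            }
        }
        return recovered.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] input = "hello zstd hello zstd hello zstd".getBytes("UTF-8");
        System.out.println(Arrays.equals(input, roundTrip(input)));  // true
    }
}
```

The appeal of zstd-jni here is that it bundles and loads its own native library, sidestepping the Hadoop native-loading machinery mentioned above.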
Furthermore, if you add an `o.a.h.io.compress.ZStandardCodec` class to a jar
on the client side, it is still not found -- my guess is there is some
classloader isolation between client code and Spark itself, and Spark is
what needs to find the class. So one has to have it installed inside the
Spark distribution.
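A toy illustration of why that guess would explain the behavior (the loaders here are stand-ins, not Spark internals): class visibility only flows one way through parent delegation, so a loader that does not delegate to the classpath holding the codec jar will never see the class.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Demonstrates one-way classloader visibility with stdlib loaders.
// `internal` stands in for a framework-internal loader that does not delegate
// to the application classpath (empty URL list, bootstrap parent).
public class LoaderIsolation {
    // Returns true if `name` can be resolved through `loader`.
    static boolean visible(String name, ClassLoader loader) {
        try {
            Class.forName(name, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader app = LoaderIsolation.class.getClassLoader();
        ClassLoader internal = new URLClassLoader(new URL[0], null);

        // The application loader sees this class; the isolated loader does not.
        System.out.println(visible("LoaderIsolation", app));       // true
        System.out.println(visible("LoaderIsolation", internal));  // false
    }
}
```

If Spark resolves codec class names through a loader like `internal`, putting the jar on the client classpath alone cannot help -- it has to sit where Spark's own loader looks, i.e. inside the distribution.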
I may take you up on fixing the compression codec dependency mess in a
couple of months. The hardest part will be lining up the configuration options
with what users already expect -- the raw codecs aren't that hard to do.
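For context, a sketch of the configuration surface users already expect (the two keys below are real Spark settings; whether `zstd` is accepted by each depends on the Spark version and on a zstd codec being visible to Spark, which is the point of this thread):

```
# Spark-internal compression (shuffle, spills); zstd backed by zstd-jni
spark.io.compression.codec=zstd

# Parquet output compression; needs a Hadoop-style ZStandardCodec on Spark's classpath
spark.sql.parquet.compression.codec=zstd
```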