[GitHub] spark pull request #22358: [SPARK-25366][SQL]Zstd and brotil CompressionCode...

10110346 Fri, 07 Sep 2018 00:37:43 -0700

GitHub user 10110346 opened a pull request:

    https://github.com/apache/spark/pull/22358


    [SPARK-25366][SQL] Zstd and brotli CompressionCodec are not supported for parquet files

    ## What changes were proposed in this pull request?
    Hadoop 2.6 and Hadoop 2.7 do not include the zstd or brotli compression codecs, and Hadoop 3.1 includes only the zstd codec.
    So I think we should remove zstd and brotli for the time being.
    
    **set `spark.sql.parquet.compression.codec=brotli`:**
    Caused by: org.apache.parquet.hadoop.BadConfigurationException: Class org.apache.hadoop.io.compress.BrotliCodec was not found
            at org.apache.parquet.hadoop.CodecFactory.getCodec(CodecFactory.java:235)
            at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.<init>(CodecFactory.java:142)
            at org.apache.parquet.hadoop.CodecFactory.createCompressor(CodecFactory.java:206)
            at org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:189)
            at org.apache.parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:153)
            at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:411)
            at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349)
            at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
            at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:161)
        
            
    **set `spark.sql.parquet.compression.codec=zstd`:**
    Caused by: org.apache.parquet.hadoop.BadConfigurationException: Class org.apache.hadoop.io.compress.ZStandardCodec was not found
            at org.apache.parquet.hadoop.CodecFactory.getCodec(CodecFactory.java:235)
            at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.<init>(CodecFactory.java:142)
            at org.apache.parquet.hadoop.CodecFactory.createCompressor(CodecFactory.java:206)
            at org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:189)
            at org.apache.parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:153)
            at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:411)
            at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349)
            at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
            at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:161)
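
Both traces above fail because a codec class cannot be loaded, so one quick way to see whether a given environment is affected is to probe the classpath for those two class names. The following is only a minimal sketch, not part of this patch: the class `CodecClasspathCheck` and the helper `isOnClasspath` are hypothetical names, and the two codec class names are copied from the exception messages above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CodecClasspathCheck {
    // Short codec name -> class name, copied from the two stack traces above.
    static final Map<String, String> CODEC_CLASSES = new LinkedHashMap<>();
    static {
        CODEC_CLASSES.put("brotli", "org.apache.hadoop.io.compress.BrotliCodec");
        CODEC_CLASSES.put("zstd", "org.apache.hadoop.io.compress.ZStandardCodec");
    }

    // Hypothetical helper: true if the named class can be found on the
    // current classpath. The class is located but not initialized.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, CodecClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Report availability of each codec class in this JVM.
        for (Map.Entry<String, String> e : CODEC_CLASSES.entrySet()) {
            System.out.println(e.getKey() + " (" + e.getValue() + ") available: "
                    + isOnClasspath(e.getValue()));
        }
    }
}
```

Run with the same classpath as the Spark driver or executor, this would be expected to report a codec as unavailable in any setup that produces the corresponding exception above. It only checks that the class can be found, not that any native library the codec may need is usable.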
    
    ## How was this patch tested?
    Existing unit tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/10110346/spark notsupportzstdandbrotil

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22358.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22358
    
----
commit 1db036ad725bc7a3c60dbb9aede0f91cf0d798d0
Author: liuxian <liu.xian3@...>
Date:   2018-09-07T07:12:36Z

    fix

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
