GitHub user 10110346 opened a pull request: https://github.com/apache/spark/pull/22358
[SPARK-25366][SQL]Zstd and brotil CompressionCodec are not supported for parquet files ## What changes were proposed in this pull request? Hadoop2.6 and hadoop2.7 do not contain zstd and brotil compressioncodec ,hadoop 3.1 also contains only zstd compressioncodec . So I think we should remove zstd and brotil for the time being. **set `spark.sql.parquet.compression.codec=brotli`:** Caused by: org.apache.parquet.hadoop.BadConfigurationException: Class org.apache.hadoop.io.compress.BrotliCodec was not found at org.apache.parquet.hadoop.CodecFactory.getCodec(CodecFactory.java:235) at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.<init>(CodecFactory.java:142) at org.apache.parquet.hadoop.CodecFactory.createCompressor(CodecFactory.java:206) at org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:189) at org.apache.parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:153) at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:411) at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349) at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37) at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:161) **set `spark.sql.parquet.compression.codec=zstd`:** Caused by: org.apache.parquet.hadoop.BadConfigurationException: Class org.apache.hadoop.io.compress.ZStandardCodec was not found at org.apache.parquet.hadoop.CodecFactory.getCodec(CodecFactory.java:235) at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.<init>(CodecFactory.java:142) at org.apache.parquet.hadoop.CodecFactory.createCompressor(CodecFactory.java:206) at org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:189) at org.apache.parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:153) at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:411) at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349) at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37) at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:161) ## How was this patch tested? Exist unit test You can merge this pull request into a Git repository by running: $ git pull https://github.com/10110346/spark notsupportzstdandbrotil Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22358.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22358 ---- commit 1db036ad725bc7a3c60dbb9aede0f91cf0d798d0 Author: liuxian <liu.xian3@...> Date: 2018-09-07T07:12:36Z fix ---- --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org