Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21837#discussion_r204757826

--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala ---
```
@@ -68,4 +70,25 @@ class AvroOptions(
       .map(_.toBoolean)
       .getOrElse(!ignoreFilesWithoutExtension)
   }
+
+  /**
+   * The `compression` option allows to specify a compression codec used in write.
+   * Currently supported codecs are `uncompressed`, `snappy` and `deflate`.
+   * If the option is not set, the `snappy` compression is used by default.
+   */
+  val compression: String = parameters.get("compression").getOrElse(sqlConf.avroCompressionCodec)
+
+  /**
+   * Level of compression in the range of 1..9 inclusive. 1 - for fast, 9 - for best compression.
+   * If the compression level is not set for `deflate` compression, the current value of SQL
+   * config `spark.sql.avro.deflate.level` is used by default. For other compressions, the default
+   * value is `6`.
+   */
+  val compressionLevel: Int = {
```
--- End diff --

I added the option keeping in mind that other compression codecs, for example zstandard, could be added in the future. For those codecs, the level could be useful too. Another point is that specifying the compression level together with the compression codec in Avro options looks more natural compared to a global SQL setting:
```
df.write
  .options(Map("compression" -> "deflate", "compressionLevel" -> "9"))
  .format("avro")
  .save(deflateDir)
```
vs
```
spark.conf.set("spark.sql.avro.deflate.level", "9")
df.write
  .option("compression", "deflate")
  .format("avro")
  .save(deflateDir)
```
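The fallback order described in the doc comment (per-write option first, then the `spark.sql.avro.deflate.level` SQL conf for `deflate`, otherwise `6`) can be sketched as a standalone function. This is an illustrative sketch, not the actual Spark implementation; the function name and parameters are assumptions made for the example:

```scala
// Hypothetical sketch of the compression-level resolution described above.
// `options` stands in for the write options map, `deflateLevelConf` for the
// current value of spark.sql.avro.deflate.level.
def resolveCompressionLevel(
    options: Map[String, String],
    compression: String,
    deflateLevelConf: Int): Int = {
  val level = options.get("compressionLevel").map(_.toInt).getOrElse {
    // No explicit option: deflate falls back to the SQL conf, others to 6.
    if (compression == "deflate") deflateLevelConf else 6
  }
  require(level >= 1 && level <= 9,
    s"Compression level must be in the range 1..9, got: $level")
  level
}
```

With this ordering, a `compressionLevel` set via `df.write.options(...)` always wins over the global SQL conf, which is the behavior the comment argues for.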