Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21837#discussion_r204757826

--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala ---
```
@@ -68,4 +70,25 @@ class AvroOptions(
       .map(_.toBoolean)
       .getOrElse(!ignoreFilesWithoutExtension)
   }
+
+  /**
+   * The `compression` option allows to specify a compression codec used in write.
+   * Currently supported codecs are `uncompressed`, `snappy` and `deflate`.
+   * If the option is not set, the `snappy` compression is used by default.
+   */
+  val compression: String = parameters.get("compression").getOrElse(sqlConf.avroCompressionCodec)
+
+  /**
+   * Level of compression in the range of 1..9 inclusive. 1 - for fast, 9 - for best compression.
+   * If the compression level is not set for `deflate` compression, the current value of SQL
+   * config `spark.sql.avro.deflate.level` is used by default. For other compressions, the default
+   * value is `6`.
+   */
+  val compressionLevel: Int = {
```
--- End diff --

I added the option keeping in mind that other compression codecs, for example zstandard, could be added in the future. For those codecs, the level could be useful too. Another point is that specifying the compression level together with the compression codec in Avro options looks more natural compared to a global SQL setting:
```
df.write
  .options(Map("compression" -> "deflate", "compressionLevel" -> "9"))
  .format("avro")
  .save(deflateDir)
```
vs
```
spark.conf.set("spark.sql.avro.deflate.level", "9")
df.write
  .option("compression", "deflate")
  .format("avro")
  .save(deflateDir)
```
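The fallback order described in the doc comment (per-write option first, then the `spark.sql.avro.deflate.level` SQL conf for `deflate`, otherwise `6`) can be sketched as a standalone function. This is an illustrative sketch, not the actual Spark implementation; the function name and parameters are assumptions made for the example:

```scala
// Hypothetical sketch of the compression-level resolution described above.
// `options` stands in for the write options map, `deflateLevelConf` for the
// current value of spark.sql.avro.deflate.level.
def resolveCompressionLevel(
    options: Map[String, String],
    compression: String,
    deflateLevelConf: Int): Int = {
  val level = options.get("compressionLevel").map(_.toInt).getOrElse {
    // No explicit option: deflate falls back to the SQL conf, others to 6.
    if (compression == "deflate") deflateLevelConf else 6
  }
  require(level >= 1 && level <= 9,
    s"Compression level must be in the range 1..9, got: $level")
  level
}
```

With this ordering, a `compressionLevel` set via `df.write.options(...)` always wins over the global SQL conf, which is the behavior the comment argues for.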