Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/17024#discussion_r102418860
--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -95,6 +95,7 @@ private[spark] object CompressionCodec {
   val FALLBACK_COMPRESSION_CODEC = "snappy"
   val DEFAULT_COMPRESSION_CODEC = "lz4"
   val ALL_COMPRESSION_CODECS = shortCompressionCodecNames.values.toSeq
+  val ALL_COMPRESSION_CODECS_SHORT: Set[String] = shortCompressionCodecNames.keySet
--- End diff ---
Instead of exposing this and supporting only short codec names for
checkpointing, the pattern should be the same as in the rest of the Spark
code when dealing with codecs:
```
conf.getOption("spark.checkpoint.compress.codec").map { c =>
  logInfo(s"Compressing checkpoint using $c.")
  // Wrap the file stream with the codec so both branches yield an OutputStream.
  CompressionCodec.createCodec(conf, c).compressedOutputStream(fileStream)
}.getOrElse(fileStream)
```
This will ensure that support for checkpoint compression is in line with the
rest of Spark: both short codec names and fully qualified class names are
accepted, and there is no need to introduce a special 'none' value.
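For illustration, both name forms resolve through
`CompressionCodec.createCodec`. A minimal sketch (it has to live inside the
`org.apache.spark` package, since `CompressionCodec` is `private[spark]`):
```scala
import org.apache.spark.SparkConf
import org.apache.spark.io.CompressionCodec

val conf = new SparkConf()
// Short name, resolved via the shortCompressionCodecNames map:
val byShortName = CompressionCodec.createCodec(conf, "lz4")
// Fully qualified class name, instantiated reflectively:
val byClassName = CompressionCodec.createCodec(conf, "org.apache.spark.io.LZ4CompressionCodec")
```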
Note: you will need to change `fileStream` to a `lazy val` so that, if codec
creation throws an exception, we don't leave a dangling stream around (and the
visibility of `fileStream` stays limited to that block).
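To make the note concrete, here is a minimal sketch of how the pieces fit
together; the helper name `checkpointOutputStream` and the `open` parameter
are illustrative, not from this PR:
```scala
import java.io.OutputStream
import org.apache.spark.SparkConf
import org.apache.spark.io.CompressionCodec

// `open` stands in for however the checkpoint file stream is actually opened.
def checkpointOutputStream(conf: SparkConf, open: () => OutputStream): OutputStream = {
  lazy val fileStream = open() // not opened until first dereference
  conf.getOption("spark.checkpoint.compress.codec").map { c =>
    // createCodec runs before fileStream is forced, so an unknown codec fails
    // without ever opening (and leaking) the underlying stream.
    CompressionCodec.createCodec(conf, c).compressedOutputStream(fileStream)
  }.getOrElse(fileStream)
}
```
Since `fileStream` is local to the method body, nothing outside it can touch
the raw stream, which is the visibility scoping mentioned above.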