[GitHub] spark pull request #17303: [SPARK-19112][CORE] add codec for ZStandard

2017-05-18 Thread asfgit
GitHub user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17303


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17303: [SPARK-19112][CORE] add codec for ZStandard

2017-05-08 Thread Cyan4973
GitHub user Cyan4973 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17303#discussion_r115351567
  
--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -215,3 +217,22 @@ private final class SnappyOutputStreamWrapper(os: SnappyOutputStream) extends Ou
 }
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ * ZStandard implementation of [[org.apache.spark.io.CompressionCodec]].
+ *
+ * @note The wire protocol for this codec is not guaranteed to be compatible across versions
+ * of Spark. This is intended for use as an internal compression utility within a single Spark
+ * application.
+ */
+@DeveloperApi
+class ZStandardCompressionCodec(conf: SparkConf) extends CompressionCodec {
+
+  override def compressedOutputStream(s: OutputStream): OutputStream = {
+    val level = conf.getSizeAsBytes("spark.io.compression.zstandard.level", "3").toInt
--- End diff --

Use cases that favor speed over size should prefer level 1.
The difference in compression speed compared with level 3 is fairly large.
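
As a configuration fragment, the suggestion above might look like the
following in an application's setup. This is a minimal sketch, assuming the
patch from this pull request is applied: the key
`spark.io.compression.zstandard.level` is the one proposed in the diff and may
not match what a released Spark version uses, and reading the value back with
`getInt` is an assumption made for illustration (the diff itself uses
`getSizeAsBytes(...).toInt`).

```scala
import org.apache.spark.SparkConf

// Key name as proposed in this pull request (assumption: a released Spark
// version may expose the level under a different key).
val levelKey = "spark.io.compression.zstandard.level"

// Favor compression speed over compressed size by selecting level 1
// instead of the default of 3 used in the diff.
val conf = new SparkConf()
  .set(levelKey, "1")

// Read the level back as a plain integer, falling back to 3 when unset.
val level: Int = conf.getInt(levelKey, 3)
```

Leaving the key unset keeps the default of 3 from the diff, which trades some
compression speed for a smaller output.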





[GitHub] spark pull request #17303: [SPARK-19112][CORE] add codec for ZStandard

2017-03-15 Thread dongjinleekr
GitHub user dongjinleekr opened a pull request:

https://github.com/apache/spark/pull/17303

[SPARK-19112][CORE] add codec for ZStandard

## What changes were proposed in this pull request?

Hadoop[^1] and HBase[^2] added support for ZStandard compression in their
recent releases. This update enables saving a file to HDFS with the ZStandard
codec by implementing `ZStandardCompressionCodec`. It also requires a new
configuration for the default compression level, for example
`spark.io.compression.zstandard.level`.

[^1]: https://issues.apache.org/jira/browse/HADOOP-13578
[^2]: https://issues.apache.org/jira/browse/HBASE-16710

## How was this patch tested?

3 additional unit tests in `CompressionCodecSuite.scala`.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjinleekr/spark feature/SPARK-19112

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17303.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17303


commit 1927b91d0d8621e9e2dc2a88a93e07780cfc66bf
Author: Lee Dongjin 
Date:   2017-03-15T08:09:56Z

Implement ZStandardCompressionCodec



