Re: pyspark - gzip output compression

2015-02-05 Thread Kane Kim
I'm getting the error "SequenceFile doesn't work with GzipCodec without
native-hadoop code!" Where can I get those libraries, and where do I put
them in Spark?

Also, can I save a plain text file (as with saveAsTextFile) as gzip?

Thanks.

On Wed, Feb 4, 2015 at 11:10 PM, Kane Kim kane.ist...@gmail.com wrote:

 How do I save an RDD with gzip compression?

 Thanks.



Re: pyspark - gzip output compression

2015-02-05 Thread Sean Owen
No, you can compress a SequenceFile with gzip. If you are reading the output
outside Hadoop, though, SequenceFile may not be a great choice. You can use
the gzip codec with TextOutputFormat if you need to.
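
A minimal sketch of the text-output route, assuming PySpark's
RDD.saveAsTextFile and its compressionCodecClass argument. The helper name
save_text_gzip and the output path are hypothetical, chosen only for
illustration; this is one way to do it, not necessarily the only one.

```python
# Fully qualified name of Hadoop's built-in gzip codec class.
GZIP_CODEC = "org.apache.hadoop.io.compress.GzipCodec"


def save_text_gzip(rdd, path):
    """Write each RDD element as one line of text, gzip-compressing the part files.

    `rdd` is expected to be a pyspark RDD; `path` is an output directory
    that must not exist yet.
    """
    rdd.saveAsTextFile(path, compressionCodecClass=GZIP_CODEC)


# Usage from the pyspark shell, where `sc` is already defined
# (the path below is a placeholder):
#
#   rdd = sc.parallelize(["alpha", "beta", "gamma"])
#   save_text_gzip(rdd, "/tmp/gzip-output-sketch")
```

The output directory should then hold gzip-compressed part files, which
sc.textFile can read back directly. RDD.saveAsSequenceFile accepts the same
compressionCodecClass argument if SequenceFile output is wanted instead.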
On Feb 5, 2015 8:28 AM, Kane Kim kane.ist...@gmail.com wrote:

 I'm getting the error "SequenceFile doesn't work with GzipCodec without
 native-hadoop code!" Where can I get those libraries, and where do I put
 them in Spark?

 Also, can I save a plain text file (as with saveAsTextFile) as gzip?

 Thanks.

 On Wed, Feb 4, 2015 at 11:10 PM, Kane Kim kane.ist...@gmail.com wrote:

 How do I save an RDD with gzip compression?

 Thanks.