You could use the following as compressionCodecClass:

DEFLATE   org.apache.hadoop.io.compress.DefaultCodec
gzip      org.apache.hadoop.io.compress.GzipCodec
bzip2     org.apache.hadoop.io.compress.BZip2Codec
LZO       com.hadoop.compression.lzo.LzopCodec
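
A minimal sketch of how one of these class names might be passed from PySpark. The point it illustrates is that compressionCodecClass takes the fully qualified Java class name as a string, not a Python object such as gzip.GzipFile. The CODEC_CLASSES dict and the save_compressed helper are hypothetical names introduced for illustration; only RDD.saveAsSequenceFile(path, compressionCodecClass=...) is the PySpark call. It assumes the RDD holds key-value pairs and that the chosen codec class is available on the cluster's classpath.

```python
# Hypothetical lookup table: short name -> fully qualified Hadoop codec class.
CODEC_CLASSES = {
    "deflate": "org.apache.hadoop.io.compress.DefaultCodec",
    "gzip": "org.apache.hadoop.io.compress.GzipCodec",
    "bzip2": "org.apache.hadoop.io.compress.BZip2Codec",
    "lzo": "com.hadoop.compression.lzo.LzopCodec",
}


def save_compressed(rdd, path, codec="gzip"):
    """Save a key-value RDD as a compressed SequenceFile.

    Hypothetical helper: it looks up the codec class name and passes it
    to RDD.saveAsSequenceFile as a plain string.
    """
    codec_class = CODEC_CLASSES[codec.lower()]
    # The codec is given as a string; the JVM side resolves the class.
    rdd.saveAsSequenceFile(path, compressionCodecClass=codec_class)
    return codec_class
```

With a live SparkContext, the call would look something like save_compressed(rdd, '{0}{1}'.format(outputFolder, datePath)), where outputFolder and datePath are the variables from the original question.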
For gzip, compressionCodecClass should be org.apache.hadoop.io.compress.GzipCodec, passed as a string.

On Thu, Nov 13, 2014 at 8:28 PM, sahanbull <sa...@skimlinks.com> wrote:
> Hi,
>
> I am trying to save an RDD to an S3 bucket using the
> RDD.saveAsSequenceFile(self, path, CompressionCodec) function in Python. I
> need to save the RDD in gzip. Can anyone tell me how to pass the gzip codec
> class as a parameter to the function?
>
> I tried
> RDD.saveAsSequenceFile('{0}{1}'.format(outputFolder,datePath),compressionCodecClass=gzip.GzipFile)
>
> but it fails with: AttributeError: type object 'GzipFile' has no
> attribute '_get_object_id'
> which I think is because the JVM can't find the Scala mapping for gzip.
>
> If you can guide me to any method for writing the RDD as gzip (.gz) to
> disk, that would be very much appreciated.
>
> Many thanks
> SahanB
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Using-a-compression-codec-in-saveAsSequenceFile-in-Pyspark-Python-API-tp18899.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org