There is no compression type setting for Snappy.
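In Hadoop's SequenceFile format, the RECORD/BLOCK choice (`SequenceFile.CompressionType`) belongs to the file container, not to the codec. RECORD compresses each value on its own, while BLOCK buffers many records and compresses them together. Snappy only supplies the compression algorithm. The stdlib-only sketch below illustrates the difference. It assumes `java.util.zip.Deflater` as a stand-in codec (Snappy is not part of the JDK) and simply joins values with newlines instead of using SequenceFile's real block layout. `RecordVsBlock` and its methods are hypothetical names.

```scala
import java.util.zip.Deflater
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical illustration: Deflater stands in for a real codec such as Snappy.
object RecordVsBlock {
  // Compress one byte array with a fresh Deflater and return the compressed size in bytes.
  def compressedSize(input: Array[Byte]): Int = {
    val deflater = new Deflater()
    try {
      deflater.setInput(input)
      deflater.finish()
      val buf = new Array[Byte](64 * 1024)
      var total = 0
      // Drain the compressor until it reports that all output has been produced.
      while (!deflater.finished()) {
        total += deflater.deflate(buf)
      }
      total
    } finally deflater.end()
  }

  // RECORD-style: every value is handed to the codec separately.
  def recordLevel(values: Seq[String]): Int =
    values.map(v => compressedSize(v.getBytes(UTF_8))).sum

  // BLOCK-style (simplified): many values are buffered and compressed in one go.
  def blockLevel(values: Seq[String]): Int =
    compressedSize(values.mkString("\n").getBytes(UTF_8))

  def main(args: Array[String]): Unit = {
    val values = (1 to 1000).map(i => s"user=$i status=active region=eu-west")
    println(s"record-level total: ${recordLevel(values)} bytes")
    println(s"block-level total:  ${blockLevel(values)} bytes")
  }
}
```

For short, repetitive records the block-level total should come out much smaller, because the codec can exploit redundancy across records. That is the usual reason to prefer BLOCK, and it is independent of which codec is plugged in.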
> On Apr 4, 2014, at 23:06, Konstantin Kudryavtsev <kudryavtsev.konstan...@gmail.com> wrote:
>
> Can anybody suggest how to change the compression level (Record, Block) for Snappy?
> If it's possible, of course.
>
> Thank you in advance,
> Konstantin Kudryavtsev
>
>> On Thu, Apr 3, 2014 at 10:28 PM, Konstantin Kudryavtsev <kudryavtsev.konstan...@gmail.com> wrote:
>> Thanks all, it works fine now and I managed to compress the output. However, I
>> am still stuck... How can I set the compression type for Snappy?
>> I mean setting record- or block-level compression for the output.
>>
>>> On Apr 3, 2014 1:15 AM, "Nicholas Chammas" <nicholas.cham...@gmail.com> wrote:
>>> Thanks for pointing that out.
>>>
>>>> On Wed, Apr 2, 2014 at 6:11 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>>> First, you shouldn't be using spark.incubator.apache.org anymore, just
>>>> spark.apache.org. Second, saveAsSequenceFile doesn't appear to exist in
>>>> the Python API at this point.
>>>>
>>>>> On Wed, Apr 2, 2014 at 3:00 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>>>>> Is this a Scala-only feature?
>>>>>
>>>>>> On Wed, Apr 2, 2014 at 5:55 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>>>>>> For textFile I believe we overload it and let you set a codec directly:
>>>>>>
>>>>>> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/FileSuite.scala#L59
>>>>>>
>>>>>> For saveAsSequenceFile yep, I think Mark is right, you need an Option.
>>>>>>
>>>>>>> On Wed, Apr 2, 2014 at 12:36 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>>>>>> http://www.scala-lang.org/api/2.10.3/index.html#scala.Option
>>>>>>>
>>>>>>> The signature is 'def saveAsSequenceFile(path: String, codec:
>>>>>>> Option[Class[_ <: CompressionCodec]] = None)', but you are providing a
>>>>>>> Class, not an Option[Class].
>>>>>>>
>>>>>>> Try counts.saveAsSequenceFile(output,
>>>>>>> Some(classOf[org.apache.hadoop.io.compress.SnappyCodec]))
>>>>>>>
>>>>>>>> On Wed, Apr 2, 2014 at 12:18 PM, Kostiantyn Kudriavtsev <kudryavtsev.konstan...@gmail.com> wrote:
>>>>>>>> Hi there,
>>>>>>>>
>>>>>>>> I've started using Spark recently and am evaluating possible use cases in
>>>>>>>> our company.
>>>>>>>>
>>>>>>>> I'm trying to save an RDD as a compressed SequenceFile. I'm able to save
>>>>>>>> a non-compressed file by calling:
>>>>>>>>
>>>>>>>> counts.saveAsSequenceFile(output)
>>>>>>>>
>>>>>>>> where counts is my RDD of (IntWritable, Text) pairs. However, I didn't
>>>>>>>> manage to compress the output. I tried several configurations and always
>>>>>>>> got a type-mismatch error:
>>>>>>>>
>>>>>>>> counts.saveAsSequenceFile(output, classOf[org.apache.hadoop.io.compress.SnappyCodec])
>>>>>>>> <console>:21: error: type mismatch;
>>>>>>>>  found   : Class[org.apache.hadoop.io.compress.SnappyCodec](classOf[org.apache.hadoop.io.compress.SnappyCodec])
>>>>>>>>  required: Option[Class[_ <: org.apache.hadoop.io.compress.CompressionCodec]]
>>>>>>>>        counts.saveAsSequenceFile(output, classOf[org.apache.hadoop.io.compress.SnappyCodec])
>>>>>>>>
>>>>>>>> counts.saveAsSequenceFile(output, classOf[org.apache.spark.io.SnappyCompressionCodec])
>>>>>>>> <console>:21: error: type mismatch;
>>>>>>>>  found   : Class[org.apache.spark.io.SnappyCompressionCodec](classOf[org.apache.spark.io.SnappyCompressionCodec])
>>>>>>>>  required: Option[Class[_ <: org.apache.hadoop.io.compress.CompressionCodec]]
>>>>>>>>        counts.saveAsSequenceFile(output, classOf[org.apache.spark.io.SnappyCompressionCodec])
>>>>>>>>
>>>>>>>> and it doesn't work even for Gzip:
>>>>>>>>
>>>>>>>> counts.saveAsSequenceFile(output, classOf[org.apache.hadoop.io.compress.GzipCodec])
>>>>>>>> <console>:21: error: type mismatch;
>>>>>>>>  found   : Class[org.apache.hadoop.io.compress.GzipCodec](classOf[org.apache.hadoop.io.compress.GzipCodec])
>>>>>>>>  required: Option[Class[_ <: org.apache.hadoop.io.compress.CompressionCodec]]
>>>>>>>>        counts.saveAsSequenceFile(output, classOf[org.apache.hadoop.io.compress.GzipCodec])
>>>>>>>>
>>>>>>>> Could you please suggest a solution? Also, I couldn't find how to specify
>>>>>>>> compression parameters (i.e. the compression type for Snappy). Could you
>>>>>>>> share code snippets for writing/reading an RDD with compression?
>>>>>>>>
>>>>>>>> Thank you in advance,
>>>>>>>>
>>>>>>>> Konstantin Kudryavtsev
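The type errors in the thread come down to the parameter's type. Below is a minimal sketch, using only the Scala standard library, of why `Some(classOf[...])` compiles where a bare `classOf[...]` does not, next to the text-file style overload Patrick mentions, which takes a `Class` directly. `FakeCodec`, `FakeSnappyCodec`, `FakeGzipCodec`, `OptionalCodecDemo` and its methods are hypothetical stand-ins, not Spark or Hadoop names.

```scala
// Hypothetical stand-ins for Hadoop's CompressionCodec classes, so this compiles without Hadoop.
trait FakeCodec
class FakeSnappyCodec extends FakeCodec
class FakeGzipCodec extends FakeCodec

object OptionalCodecDemo {
  // Same parameter shape as the saveAsSequenceFile signature quoted in the thread:
  // the codec is optional, so it is wrapped in Option and defaults to None.
  def saveAsSequenceFileLike(path: String, codec: Option[Class[_ <: FakeCodec]] = None): String =
    codec match {
      case Some(c) => s"$path: compressed with ${c.getName}"
      case None    => s"$path: uncompressed"
    }

  // Text-file style overload: the codec Class is passed directly, with no Option.
  def saveAsTextFileLike(path: String, codec: Class[_ <: FakeCodec]): String =
    s"$path: compressed with ${codec.getName}"

  def main(args: Array[String]): Unit = {
    println(saveAsSequenceFileLike("seq-plain"))
    // saveAsSequenceFileLike("seq-out", classOf[FakeSnappyCodec])
    //   would not compile: a Class is not an Option[Class], the same mismatch as in the thread.
    println(saveAsSequenceFileLike("seq-snappy", Some(classOf[FakeSnappyCodec])))
    println(saveAsTextFileLike("text-gzip", classOf[FakeGzipCodec]))
  }
}
```

With Spark and Hadoop on the classpath, the real call is the one Mark suggested, `counts.saveAsSequenceFile(output, Some(classOf[org.apache.hadoop.io.compress.SnappyCodec]))`; the stand-ins only mirror its `Option[Class[_ <: CompressionCodec]]` parameter.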