I think a lot of the confusion is cleared up with a quick look at the code:

https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L901

saveAsObjectFile is just a thin wrapper around saveAsSequenceFile: it uses a
null key for every record and serializes the values with the Java serializer.

If you want to use Kryo, just do the same thing yourself, but use the Kryo
serializer in place of the Java one.
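To make the shape of that output concrete, here is a minimal plain-Scala sketch with no Spark or Hadoop dependency. It shows what one partition turns into under this scheme: records of (null key, serialized bytes). Several things here are assumptions of the sketch, not Spark's code. `NullKey` is a hypothetical stand-in for Hadoop's `NullWritable`. The batching of several elements per record (`batchSize`) is also an assumption. The serializer is passed in as a plain function, so following the advice above amounts to passing a Kryo-based function instead of `javaSerialize`.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for Hadoop's NullWritable: a key that carries no data.
case object NullKey

object ObjectFileSketch {
  type Record = (NullKey.type, Array[Byte])

  // Java serialization, the default serializer described above.
  def javaSerialize(obj: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    buffer.toByteArray
  }

  def javaDeserialize[T](bytes: Array[Byte]): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // Turn one partition's elements into (null key, serialized batch) records.
  // `serialize` is pluggable: a Kryo-based function could be passed instead.
  def toRecords[T](partition: Iterator[T], batchSize: Int,
                   serialize: AnyRef => Array[Byte]): Iterator[Record] =
    partition.grouped(batchSize).map(batch => (NullKey, serialize(batch.toVector)))

  // Reading back: ignore the keys and deserialize each batch.
  def fromRecords[T](records: Iterator[Record],
                     deserialize: Array[Byte] => Vector[T]): Iterator[T] =
    records.flatMap { case (_, bytes) => deserialize(bytes) }
}
```

The keys carry no information, so all of the data lives in the serialized values. Reading the records back and deserializing each value recovers the original elements in order. In real Spark code, records of this shape are what end up in saveAsSequenceFile.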

On Fri, Jan 3, 2014 at 1:33 PM, Aureliano Buendia <[email protected]> wrote:

>
>
>
> On Fri, Jan 3, 2014 at 7:26 PM, Guillaume Pitel <
> [email protected]> wrote:
>
>> Actually, the interesting part of Hadoop files is the SequenceFile
>> format, which allows the data to be split into multiple blocks. Other
>> files in HDFS are single blocks. They do not scale.
>>
>
> But the output of saveAsObjectFile looks like part-00000, part-00001,
> part-00002, ... It does output split data, which makes it scalable, no?
>
>
>>
>> An ObjectFile cannot be split naturally.
>>
>> Usually in Hadoop, when storing a sequence of elements instead of a
>> sequence of (key, value) pairs, the trick is to store (key, null).
>>
>> I don't know the most effective way to do that in Scala/Spark.
>> Actually, that would be a good thing to add to RDD[U].
>>
>> Guillaume
>>
>>
>>
>>
>> On Fri, Jan 3, 2014 at 7:10 PM, Andrew Ash <[email protected]> wrote:
>>
>>> saveAsHadoopFile and saveAsNewAPIHadoopFile are on PairRDDFunctions,
>>> which uses some Scala magic (an implicit conversion) to become available
>>> when you have an RDD[(Key, Value)].
>>>
>>>
>>> https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L648
>>>
>>
>> I see. So if my data is of type RDD[Value], I cannot use compression?
>> Why does it have to be an RDD[(Key, Value)] in order to save it in Hadoop?
>>
>> Also, doesn't saveAsObjectFile("hdfs://...") save data in Hadoop? This
>> is confusing.
>>
>> I'm only interested in saving data on S3 ("s3n://..."). Does it matter
>> whether I use saveAsHadoopFile or saveAsObjectFile?
>>
>>
>>>
>>>
>> --
>>  *Guillaume PITEL, Président*
>> +33(0)6 25 48 86 80 / +33(0)9 70 44 67 53
>>
>>  eXenSa S.A.S. <http://www.exensa.com/>
>>  41, rue Périer - 92120 Montrouge - FRANCE
>> Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05
>>
>
>
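The "Scala magic" mentioned above is an implicit conversion. The pair-only methods live in a separate wrapper class, and the compiler applies the conversion only when the element type is a tuple. Below is a self-contained sketch of that pattern on plain Scala collections. `MiniRDD`, `PairFunctions`, `toPairFunctions`, `NullValueOps` and `keysWithNullValues` are all hypothetical names, not Spark's API. The sketch also shows one way the (key, null) trick could be offered as a method on any single-value collection, in the spirit of adding it to RDD[U].

```scala
import scala.language.implicitConversions

// Hypothetical stand-in for an RDD: it just wraps a local Seq.
final case class MiniRDD[T](data: Seq[T])

// Methods that only make sense for (key, value) elements,
// in the spirit of PairRDDFunctions.
final class PairFunctions[K, V](self: MiniRDD[(K, V)]) {
  def keys: Seq[K] = self.data.map(_._1)
  def values: Seq[V] = self.data.map(_._2)
}

object MiniRDD {
  // The "magic": an implicit conversion that only type-checks when T is a pair.
  implicit def toPairFunctions[K, V](rdd: MiniRDD[(K, V)]): PairFunctions[K, V] =
    new PairFunctions(rdd)

  // The (key, null) trick as an extension on any MiniRDD[U]:
  // each element becomes a key paired with a null value.
  implicit class NullValueOps[U](rdd: MiniRDD[U]) {
    def keysWithNullValues: MiniRDD[(U, Null)] = MiniRDD(rdd.data.map(u => (u, null)))
  }
}
```

With this sketch, `MiniRDD(Seq(("a", 1))).keys` compiles, but `MiniRDD(Seq(1, 2)).keys` does not, because `Int` is not a pair. This is the same reason the pair-only save methods are missing on an RDD[Value]. Calling `keysWithNullValues` first turns the single values into pairs, after which the pair-only methods become available.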
