I thought it didn't split the files; it seems I was wrong. Maybe it's a matter of size, then.

In that case, yes, it's scalable. After all, it starts out as an RDD, and each partition is written to its own part file.
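A minimal sketch of the round trip, assuming Spark's core API (`SparkContext.parallelize`, `RDD.saveAsObjectFile`, `SparkContext.objectFile`) running in local mode. The object name `ObjectFileDemo` and the output path `/tmp/object-file-demo` are hypothetical. Under the hood `saveAsObjectFile` writes a SequenceFile of serialized object batches, one part file per partition.

```scala
import java.io.File
import org.apache.spark.{SparkConf, SparkContext}

object ObjectFileDemo {
  def main(args: Array[String]): Unit = {
    // Local mode for the sketch; on a cluster the path would typically be an hdfs:// URI.
    val sc = new SparkContext(new SparkConf().setAppName("object-file-demo").setMaster("local[4]"))
    try {
      // Hypothetical output directory; saveAsObjectFile fails if it already exists.
      val out = if (args.nonEmpty) args(0) else "/tmp/object-file-demo"

      // 4 partitions -> 4 output files, part-00000 .. part-00003.
      val rdd = sc.parallelize(1 to 1000, 4)
      rdd.saveAsObjectFile(out)

      // List the part files (java.io.File only works here because the path is local).
      new File(out).listFiles.map(_.getName).filter(_.startsWith("part-")).sorted.foreach(println)

      // Reading back: each part file yields at least one input partition, and because
      // the parts are SequenceFiles, a large one on HDFS can be split further by block.
      val back = sc.objectFile[Int](out)
      println(s"partitions: ${back.partitions.length}, sum: ${back.reduce(_ + _)}")
    } finally {
      sc.stop()
    }
  }
}
```

The number of part files follows the RDD's partitioning at save time, so repartitioning before saving controls how many files are produced.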


On Fri, Jan 3, 2014 at 7:26 PM, Guillaume Pitel <[email protected]> wrote:
Actually, the interesting part of Hadoop files is the SequenceFile format, which allows the data to be split into several blocks. Other files in HDFS are single-block; they do not scale.

But the output of saveAsObjectFile looks like part-00000, part-00001, part-00002, and so on. It does output split data, which makes it scalable, no?
 


--
eXenSa
Guillaume PITEL, Président
+33(0)6 25 48 86 80 / +33(0)9 70 44 67 53

eXenSa S.A.S.
41, rue Périer - 92120 Montrouge - FRANCE
Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05
