It won't last for long = for now my dataset are small enough, but I'll have to change it someday

How does it depend on the dataset size? Are you saying zipWithIndex is slow for bigger datasets?


No, but for a multi-tens of billion elements dataset, you cannot fit your elements in memory on a single host.

So at some point, the solution I currently use (not the one I sent) : dataset.collect().zipWithIndex just won't scale.

Did you try the code I sent ? I think the sortBy is probably in the wrong direction, so change it with -i instead of i

Guillaume

--
eXenSa
Guillaume PITEL, Président
+33(0)6 25 48 86 80 / +33(0)9 70 44 67 53

eXenSa S.A.S.
41, rue Périer - 92120 Montrouge - FRANCE
Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05

Reply via email to