RE: RDD order preservation through transformations

2017-09-15 Thread johan.grande.ext
Well, DataFrames make it easier to work on only some columns of the data and to store results in new columns, which removes the need to zip everything back together and therefore the need to preserve order.
On 2017-09-05 14:04 CEST, mehmet.su...@gmail.com wrote: Hi Johan, DataFrames are built on top of RDDs,
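
To illustrate working on some columns and storing the result in a new one, here is a minimal sketch assuming Spark 2.x or later. The column names timestamp, x, y and the derived norm column are hypothetical. The result is written into a new column of the same rows, so there is no second dataset whose order would have to line up with the first.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, sqrt}

// Hypothetical example: column names and data are illustrative only.
object NewColumnSketch {
  // Reads only "x" and "y" and stores the result in a new "norm" column;
  // "timestamp" (and any other column) travels along in the same row.
  def addNorm(df: DataFrame): DataFrame =
    df.withColumn("norm", sqrt(col("x") * col("x") + col("y") * col("y")))

  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq((1000L, 3.0, 4.0), (2000L, 6.0, 8.0)).toDF("timestamp", "x", "y")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // local mode, for illustration only
      .appName("new-column-sketch")
      .getOrCreate()
    addNorm(sample(spark)).show()
    spark.stop()
  }
}
```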

RE: RDD order preservation through transformations

2017-09-15 Thread johan.grande.ext
Thanks all for your answers. After reading the provided links, I am still unsure exactly what I'd need to do to get my calculations right with RDDs. However, I discovered DataFrames and Pipelines on the "ML" side of the libraries, and I think they'll be better suited to my needs. Best, Joh
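
For the DataFrame/Pipeline route, a minimal sketch of KMeans in spark.ml, assuming Spark 2.x or later with spark-mllib on the classpath. The column names, k = 2 and the seed are hypothetical choices. VectorAssembler packs only the feature columns, and transform() appends the prediction to each input row, so the timestamp never enters the model and nothing has to be zipped afterwards.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}

// A minimal sketch; column names, k = 2 and the seed are hypothetical.
object PipelineSketch {
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq((1000L, 0.0, 0.0), (2000L, 0.1, 0.1), (3000L, 9.0, 9.0), (4000L, 9.1, 9.1))
      .toDF("timestamp", "x", "y")
  }

  def clusterRows(df: DataFrame): DataFrame = {
    // Only the feature columns go into the vector; "timestamp" stays out of the model.
    val assembler = new VectorAssembler()
      .setInputCols(Array("x", "y"))
      .setOutputCol("features")
    val kmeans = new KMeans()
      .setK(2)
      .setSeed(1L)
      .setFeaturesCol("features")
      .setPredictionCol("cluster")
    val model = new Pipeline().setStages(Array(assembler, kmeans)).fit(df)
    // transform() appends "features" and "cluster" to every input row, so each
    // prediction stays in the same row as its timestamp.
    model.transform(df).select("timestamp", "x", "y", "cluster")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("pipeline-sketch").getOrCreate()
    clusterRows(sample(spark)).show()
    spark.stop()
  }
}
```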

RE: RDD order preservation through transformations

2017-09-14 Thread johan.grande.ext
In several situations I would like to zip RDDs knowing that their order matches. In particular, I’m using an MLlib KMeansModel on an RDD of Vectors, so I would like to do: myData.zip(myModel.predict(myData)). Also, the first column in my RDD is a timestamp which I don’t want to be a part of the mo
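
One way to avoid depending on zip at all, sketched under the assumption that each record is a (timestamp, features) pair: train the RDD-based MLlib KMeans on the features only, then call the single-vector KMeansModel.predict inside one map. Each output is built from its own input record, so the pairing cannot drift. The data, k = 2 and maxIterations = 20 are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

// A minimal sketch; records are assumed to be (timestamp, features) pairs.
object PredictInPlace {
  // Predict inside a single map: each result comes from its own record,
  // so no ordering assumption between two RDDs is needed (unlike zip).
  def withClusters(data: RDD[(Long, Vector)], model: KMeansModel): RDD[(Long, Int)] =
    data.map { case (ts, features) => (ts, model.predict(features)) }

  def demo(sc: SparkContext): Array[(Long, Int)] = {
    val data = sc.parallelize(Seq(
      (1000L, Vectors.dense(0.0, 0.0)), (2000L, Vectors.dense(0.1, 0.1)),
      (3000L, Vectors.dense(9.0, 9.0)), (4000L, Vectors.dense(9.1, 9.1))
    )).cache() // reused for training and for prediction
    // Train on the features only; the timestamp never reaches the model.
    val model = KMeans.train(data.map(_._2), 2, 20)
    withClusters(data, model).collect()
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("predict-in-place"))
    demo(sc).foreach(println)
    sc.stop()
  }
}
```

The KMeansModel is serialized into the map closure, which is the usual way per-record prediction is done with MLlib models.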

RE: RDD order preservation through transformations

2017-09-14 Thread johan.grande.ext
(Sorry Mehmet, I'm only now seeing your first reply with the link to SO; it had gone to my spam folder :-/ )
On 2017-09-14 10:02 CEST, GRANDE Johan Ext DTSI/DSI wrote: Well, if the order cannot be guaranteed in case of a failure (or at all, since failures can happen transparently), what doe

RE: RDD order preservation through transformations

2017-09-14 Thread johan.grande.ext
Well, if the order cannot be guaranteed in case of a failure (or at all, since failures can happen transparently), what does it mean to sort an RDD (method sortBy)?
On 2017-09-14 03:36 CEST, mehmet.su...@gmail.com wrote: I think it is one of the conceptual differences in Spark compared to other lan
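
A sorted RDD's order is defined by its sort key rather than by how its partitions happened to be computed. sortBy range-partitions the data by key and sorts each partition, so every key in partition i is no greater than the keys in partition i+1, and recomputing a partition produces the same key order. The Scaladoc for zipWithIndex makes the related point that its indices are only fixed if the RDD is sorted or saved to a file first. A minimal sketch, assuming timestamps are unique (ties would leave the relative order of equal keys unspecified); the event data is hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// A minimal sketch; assumes unique timestamps. Event data is hypothetical.
object SortedIndexSketch {
  // sortBy range-partitions by key and sorts within each partition, so the
  // resulting order is a function of the keys. zipWithIndex then numbers
  // records by partition index first, then by position within the partition.
  def indexByTime(events: RDD[(Long, String)]): RDD[(Long, (Long, String))] =
    events.sortBy(_._1).zipWithIndex().map { case (event, idx) => (idx, event) }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("sorted-index"))
    val events = sc.parallelize(Seq((3000L, "c"), (1000L, "a"), (2000L, "b")), 2)
    indexByTime(events).collect().foreach(println)
    sc.stop()
  }
}
```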

RDD order preservation through transformations

2017-09-13 Thread johan.grande.ext
Hi, I'm a beginner using Spark with Scala, and I'm having trouble understanding ordering in RDDs. I understand that RDDs are ordered (since they can be sorted) but that some transformations don't preserve order. How can I know which transformations preserve order and which don't? Regarding map, fo
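
As a rule of thumb (in practice, not an official list of guarantees), narrow element-wise transformations such as map, filter and mapPartitions process each partition sequentially and keep the relative order of elements within it, while transformations that shuffle (repartition, groupByKey, reduceByKey, join, distinct, ...) do not preserve the input order. A common workaround is to record each element's position with zipWithIndex before the shuffle and sort on it afterwards. This minimal sketch assumes the input's order is stable if it is re-evaluated, which holds for a parallelized collection; the partition counts are arbitrary.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// A minimal sketch; assumes the input's order is stable on re-evaluation.
object RestoreOrderSketch {
  def shuffleAndRestore(rdd: RDD[String]): RDD[String] = {
    // Narrow step: map works element by element within each partition.
    val upper = rdd.map(_.toUpperCase)
    // Remember each element's position before anything shuffles.
    val tagged = upper.zipWithIndex()
    // repartition() shuffles, so the order after it is not guaranteed.
    val shuffled = tagged.repartition(4)
    // Sorting on the saved position restores the original order.
    shuffled.sortBy(_._2).map(_._1)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("restore-order"))
    val input = sc.parallelize(Seq("a", "b", "c", "d", "e", "f"), 3)
    println(shuffleAndRestore(input).collect().mkString(","))
    sc.stop()
  }
}
```

The extra sort costs a second shuffle, so when the goal is only to keep each result next to its input, computing both in the same map avoids the problem more cheaply.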