Well, DataFrames make it easier to work on only some columns of the data
and to store results in new columns, removing the need to zip everything back
together and thus to preserve order.
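A minimal sketch of that idea with plain Scala collections (a local stand-in, no Spark required; `predict` is a hypothetical scoring function, not MLlib): derive the new "column" row by row so each result stays attached to its input row, instead of computing results separately and zipping them back.

```scala
// Row with a timestamp column plus features; Scored adds the result column.
case class Row(ts: Long, features: Vector[Double])
case class Scored(ts: Long, features: Vector[Double], cluster: Int)

// Hypothetical stand-in for a trained model's predict
def predict(v: Vector[Double]): Int = if (v.sum < 10) 0 else 1

val rows   = Seq(Row(1L, Vector(1.0, 2.0)), Row(2L, Vector(7.0, 8.0)))
val scored = rows.map(r => Scored(r.ts, r.features, predict(r.features)))
// Each Scored row keeps timestamp, features, and cluster together,
// so no separate zip step and no ordering assumption are needed.
```

This is the same shape as a DataFrame `withColumn`: the new column is computed per row, never detached from its row.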
On 2017-09-05 14:04 CEST, mehmet.su...@gmail.com wrote:
Hi Johan,
DataFrames are built on top of RDDs,
Thanks all for your answers. After reading the provided links, I am still
uncertain about the details of what I'd need to do to get my calculations right
with RDDs. However, I discovered DataFrames and Pipelines on the "ML" side of
the libraries, and I think they'll be better suited to my needs.
Best,
Joh
In several situations I would like to zip RDDs knowing that their order
matches. In particular, I’m using an MLlib KMeansModel on an RDD of Vectors, so I
would like to do:
myData.zip(myModel.predict(myData))
Also, the first column in my RDD is a timestamp which I don’t want to be a part
of the model.
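That pattern can be sketched with plain Scala collections (a local stand-in for RDDs; `predict` here is a hypothetical threshold, not MLlib's `KMeansModel.predict`): strip the leading timestamp before predicting, then zip the predictions back onto the full rows by position.

```scala
// (timestamp, features) rows
val myData = Seq((1L, Vector(0.5)), (2L, Vector(3.0)))

// Hypothetical stand-in for the model
def predict(v: Vector[Double]): Int = if (v(0) < 1.0) 0 else 1

// Drop the timestamp column for prediction only...
val preds = myData.map { case (_, features) => predict(features) }

// ...then pair the predictions back with the full rows by position.
val paired = myData.zip(preds)
// paired == Seq(((1L, Vector(0.5)), 0), ((2L, Vector(3.0)), 1))
```

With real RDDs, `RDD.zip` requires both sides to have the same number of partitions and the same number of elements in each partition; deriving one RDD from the other with `map`, as above, satisfies that.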
(Sorry Mehmet, I'm seeing just now your first reply with the link to SO; it had
first gone to my spam folder :-/ )
On 2017-09-14 10:02 CEST, GRANDE Johan Ext DTSI/DSI wrote:
Well if the order cannot be guaranteed in case of a failure (or at all since
failure can happen transparently), what does it mean to sort an RDD (method
sortBy)?
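One way to sidestep the ordering question entirely (sketched here with plain Scala collections; on RDDs the analogous tools are `zipWithIndex` and `join`) is to tag each element with an index and pair results by key instead of by position:

```scala
val data = Seq("b", "c", "a")

// Tag each element with its position: index -> value.
val byIndex = data.zipWithIndex.map(_.swap).toMap

// Compute on the values, carrying the index along as a key.
val results = byIndex.map { case (i, v) => (i, v.toUpperCase) }

// Pair inputs with results by key; nothing relies on traversal order.
val paired = byIndex.keys.toSeq.sorted.map(i => (byIndex(i), results(i)))
// paired == Seq(("b", "B"), ("c", "C"), ("a", "A"))
```

On a real RDD, `rdd.zipWithIndex` assigns indices from the partition order (partition index first, then position within the partition), and a `join` on those indices pairs elements without assuming any physical ordering survives later transformations.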
On 2017-09-14 03:36 CEST mehmet.su...@gmail.com wrote:
I think it is one of the conceptual differences in Spark compared to other
lan
Hi,
I'm a beginner using Spark with Scala and I'm having trouble understanding
ordering in RDDs. I understand that RDDs are ordered (as they can be sorted)
but that some transformations don't preserve order.
How can I know which transformations preserve order and which don't? Regarding
map, fo