Hi Shivaram/Alek,

I understand that the better way to import data is into a DataFrame rather than an RDD. If one wants to do a map-like transformation on each row in SparkR, one could use SparkR:::lapply() on an RDD, but is there a counterpart row-level operation on DataFrame? The use case I am working on requires complicated row-level pre-processing before the actual modeling.
Thanks.

Best,
Wei

2015-06-25 9:25 GMT-07:00 Shivaram Venkataraman <[email protected]>:

> In addition to Aleksander's point please let us know what use case would
> use RDD-like API in https://issues.apache.org/jira/browse/SPARK-7264 --
> We are hoping to have a version of this API in upcoming releases.
>
> Thanks
> Shivaram
>
> On Thu, Jun 25, 2015 at 6:02 AM, Eskilson,Aleksander <
> [email protected]> wrote:
>
>> The simple answer is that SparkR does support map/reduce operations
>> over RDD’s through the RDD API, but since Spark v 1.4.0, those functions
>> were made private in SparkR. They can still be accessed by prepending the
>> function with the namespace, like SparkR:::lapply(rdd, func). It was
>> thought though that many of the functions in the RDD API were too low level
>> to expose, with much more of the focus going into the DataFrame API. The
>> original rationale for this decision can be found in its JIRA [1]. The devs
>> are still deciding which functions of the RDD API, if any, should be made
>> public for future releases. If you feel some use cases are most easily
>> handled in SparkR through RDD functions, go ahead and let the dev email
>> list know.
>>
>> Alek
>> [1] -- https://issues.apache.org/jira/browse/SPARK-7230
>>
>> From: Wei Zhou <[email protected]>
>> Date: Wednesday, June 24, 2015 at 4:59 PM
>> To: "[email protected]" <[email protected]>
>> Subject: How to Map and Reduce in sparkR
>>
>> Anyone knows whether sparkR supports map and reduce operations as the
>> RDD transformations? Thanks in advance.
>>
>> Best,
>> Wei
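
For readers following the thread, here is a minimal sketch of the namespace-qualified RDD calls discussed above, applied to rows that start out in a DataFrame. It assumes SparkR 1.4.x running against a local Spark installation. The `faithful` data set, the `preprocess` function, and the derived column name are illustrative only and do not come from the thread. The `:::` functions are private to the package, so their names and behavior may change between releases.

```r
library(SparkR)

# Assumes a local Spark 1.4.x installation with SparkR available.
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)

# Build a DataFrame from R's built-in `faithful` data set (illustrative data only).
df <- createDataFrame(sqlContext, faithful)

# Drop down to the private RDD API; each element is one row as a named list.
rdd <- SparkR:::toRDD(df)

# Hypothetical row-level pre-processing: convert eruption length to seconds.
preprocess <- function(row) {
  list(eruptions_sec = row[["eruptions"]] * 60, waiting = row[["waiting"]])
}
processed <- SparkR:::lapply(rdd, preprocess)

# Map followed by reduce: total waiting time across all rows.
waits <- SparkR:::lapply(processed, function(row) row[["waiting"]])
total_wait <- SparkR:::reduce(waits, function(a, b) a + b)

# Bring the first two pre-processed rows back to the driver for inspection.
head(SparkR:::collect(processed), 2)

sparkR.stop()
```

The pre-processing function runs on the workers as ordinary R code, so it can be as complicated as the use case needs. The trade-off is that the result is an RDD of R lists rather than a DataFrame.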
