Hi Shivaram/Alek,

I understand that the preferred way to import data is into a DataFrame
rather than an RDD. If one wants to apply a map-like transformation to each
row in SparkR, one could use SparkR:::lapply() on an RDD, but is there a
counterpart row-level operation on a DataFrame? The use case I am working on
requires complicated row-level pre-processing before moving on to the actual
modeling.

Thanks.

Best,
Wei

2015-06-25 9:25 GMT-07:00 Shivaram Venkataraman <[email protected]>:

> In addition to Aleksander's point, please let us know on
> https://issues.apache.org/jira/browse/SPARK-7264 which use cases would need
> an RDD-like API. We hope to have a version of this API in an upcoming
> release.
>
> Thanks
> Shivaram
>
> On Thu, Jun 25, 2015 at 6:02 AM, Eskilson,Aleksander <
> [email protected]> wrote:
>
>> The simple answer is that SparkR does support map/reduce operations
>> over RDDs through the RDD API, but as of Spark 1.4.0 those functions
>> were made private in SparkR. They can still be accessed by prefixing the
>> function name with the namespace, as in SparkR:::lapply(rdd, func). Many of
>> the RDD API functions were considered too low-level to expose, so most of
>> the development focus went into the DataFrame API instead. The original
>> rationale for this decision can be found in its JIRA [1]. The developers
>> are still deciding which functions of the RDD API, if any, should be made
>> public in future releases. If you find that some use cases are most easily
>> handled in SparkR through RDD functions, please let the dev mailing list
>> know.
>>
>>  Alek
>> [1] -- https://issues.apache.org/jira/browse/SPARK-7230
>>
>>   From: Wei Zhou <[email protected]>
>> Date: Wednesday, June 24, 2015 at 4:59 PM
>> To: "[email protected]" <[email protected]>
>> Subject: How to Map and Reduce in sparkR
>>
>> Does anyone know whether SparkR supports map and reduce operations as
>> RDD transformations? Thanks in advance.
>>
>>  Best,
>> Wei
>>
>
>
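A minimal, untested sketch of the two patterns raised in this thread, assuming a Spark 1.4-era SparkR installation in which the RDD functions are private (hence the ":::" prefix) and may change or disappear in later releases. The RDD part uses parallelize, lapply and reduce; the DataFrame part assumes the private lapply method for DataFrame, which hands each row to the function as a named R list. The app name and the use of R's built-in faithful dataset are illustrative choices, not part of the original discussion.

```r
library(SparkR)

# Spark 1.4-era entry points; the app name is an arbitrary illustration.
sc <- sparkR.init(master = "local[2]", appName = "sparkr-map-reduce-sketch")
sqlContext <- sparkRSQL.init(sc)

# Map/reduce over an RDD via the private RDD API (note the ':::').
nums <- SparkR:::parallelize(sc, 1:10, 2L)           # 1..10 in two partitions
squares <- SparkR:::lapply(nums, function(x) x * x)  # "map": square each element
total <- SparkR:::reduce(squares, "+")               # "reduce": sum, gives 385

# Row-level map over a DataFrame. Assumption: the private lapply method for
# DataFrame converts each row to a named list before calling the function.
df <- createDataFrame(sqlContext, faithful)
waitHours <- SparkR:::lapply(df, function(row) row$waiting / 60)

# collect() returns an R list; unlist() flattens it to a numeric vector.
localWaitHours <- unlist(SparkR:::collect(waitHours))

print(total)
print(head(localWaitHours))

sparkR.stop()
```

Note that collect() brings every element back to the R driver, which is fine for a small example like this but not for large data.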
