Re: How to Map and Reduce in sparkR

2015-06-25 Thread Eskilson,Aleksander
The simple answer is that SparkR does support map/reduce operations over RDDs 
through the RDD API, but as of Spark v1.4.0 those functions were made private 
in SparkR. They can still be accessed by prefixing the function with the 
namespace, like SparkR:::lapply(rdd, func). The thinking was that many of the 
functions in the RDD API were too low-level to expose, with much more of the 
focus going into the DataFrame API. The original rationale for this decision 
can be found in its JIRA [1]. The devs are still deciding which functions of 
the RDD API, if any, should be made public in future releases. If you feel 
some use cases are most easily handled in SparkR through RDD functions, go 
ahead and let the dev mailing list know.

Alek
[1] -- https://issues.apache.org/jira/browse/SPARK-7230
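As a rough illustration, the private RDD functions can be reached with the ::: operator roughly like this (a minimal sketch only: it assumes a working Spark 1.4.x installation with SparkR on the path, and the function names come from the private RDD API, so they may change in later releases):

```r
# Sketch: map/reduce over an RDD via SparkR's private API (Spark 1.4.x assumed).
library(SparkR)
sc <- sparkR.init(master = "local")

# The RDD functions are private since 1.4.0, hence the ::: namespace operator:
rdd   <- SparkR:::parallelize(sc, 1:10)              # create an RDD from a local vector
sq    <- SparkR:::lapply(rdd, function(x) x * x)     # map step: square each element
total <- SparkR:::reduce(sq, function(a, b) a + b)   # reduce step: sum the squares
print(total)  # should print 385, the sum of squares of 1..10

sparkR.stop()
```

Since these functions are private, there is no stability guarantee; code written this way may break in a future release.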

From: Wei Zhou <zhweisop...@gmail.com>
Date: Wednesday, June 24, 2015 at 4:59 PM
To: user@spark.apache.org
Subject: How to Map and Reduce in sparkR

Does anyone know whether SparkR supports map and reduce operations as RDD 
transformations? Thanks in advance.

Best,
Wei

CONFIDENTIALITY NOTICE This message and any included attachments are from 
Cerner Corporation and are intended only for the addressee. The information 
contained in this message is confidential and may constitute inside or 
non-public information under international, federal, or state securities laws. 
Unauthorized forwarding, printing, copying, distribution, or use of such 
information is strictly prohibited and may be unlawful. If you are not the 
addressee, please promptly delete this message and notify the sender of the 
delivery error by e-mail or you may call Cerner's corporate offices in Kansas 
City, Missouri, U.S.A at (+1) (816)221-1024.


Re: How to Map and Reduce in sparkR

2015-06-25 Thread Shivaram Venkataraman
In addition to Aleksander's point, please let us know what use cases would
benefit from an RDD-like API in https://issues.apache.org/jira/browse/SPARK-7264 -- we
are hoping to have a version of this API in upcoming releases.

Thanks
Shivaram

On Thu, Jun 25, 2015 at 6:02 AM, Eskilson,Aleksander 
alek.eskil...@cerner.com wrote:



Re: How to Map and Reduce in sparkR

2015-06-25 Thread Wei Zhou
Hi Shivaram/Alek,

I understand that the preferred way to import data is as a DataFrame rather than
an RDD. If one wants to do a map-like transformation on each row in SparkR,
one could use SparkR:::lapply(), but is there a counterpart row operation
on DataFrames? The use case I am working on requires complicated row-level
pre-processing before the actual modeling.

Thanks.

Best,
Wei

2015-06-25 9:25 GMT-07:00 Shivaram Venkataraman shiva...@eecs.berkeley.edu
:




Re: How to Map and Reduce in sparkR

2015-06-25 Thread Wei Zhou
Thanks Shivaram. For those who, like me, prefer to watch the video version of
the talk, you can register for the Spark Summit 2015 live stream free of cost.
I personally found the talk extremely helpful.

2015-06-25 15:20 GMT-07:00 Shivaram Venkataraman shiva...@eecs.berkeley.edu
:

 We don't support UDFs on DataFrames in SparkR in the 1.4 release. The
 existing functionality can be seen as a pre-processing step which you can
 do and then collect data back to the driver to do more complex processing.
 Along with the RDD API ticket, we are also working on UDF support. You can
 see the Spark summit talk slides from last week for a bigger picture view
 http://www.slideshare.net/SparkSummit/07-venkataraman-sun

 Thanks
 Shivaram
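The workflow suggested above (pre-process with DataFrame operations, then collect to the driver for the complex row-level logic) might look roughly like the sketch below. This assumes SparkR 1.4 and a hypothetical input file "people.json"; since DataFrame UDFs are not yet supported, the arbitrary R logic runs only after collect():

```r
# Sketch: DataFrame pre-processing in SparkR 1.4, then row-level work on the driver.
library(SparkR)
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)

df    <- jsonFile(sqlContext, "people.json")   # hypothetical input; load as a DataFrame
young <- filter(df, df$age < 30)               # pre-processing via DataFrame operations
local_df <- collect(young)                     # now an ordinary R data.frame on the driver

# Arbitrary row-level R code is possible from here on:
local_df$age_squared <- sapply(local_df$age, function(a) a^2)

sparkR.stop()
```

This only scales as far as the collected data fits on the driver, which is why UDF support (so the row-level logic can run distributed) is the longer-term answer.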

 On Thu, Jun 25, 2015 at 3:08 PM, Wei Zhou zhweisop...@gmail.com wrote:
