Re: Support R in Spark

2014-09-19 Thread oppokui
Thanks, Shivaram. 

Kui




Re: Support R in Spark

2014-09-18 Thread Shivaram Venkataraman
As R is single-threaded, SparkR launches one R process per executor on
the worker side.
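
For illustration only, a minimal sketch of the mechanism described above, not SparkR's actual code: it assumes a hypothetical executor-side helper that keeps one long-lived R process and exchanges length-prefixed byte arrays with it over stdin/stdout, and an assumed R-side loop in "worker.R".

import struct
import subprocess

class RWorkerSketch:
    """One long-lived R process per executor-side helper; raw byte-array round trips."""

    def __init__(self, r_script="worker.R"):
        # "worker.R" is a hypothetical R-side read/eval/write loop.
        # R is single-threaded, so a single process is launched once and reused.
        self.proc = subprocess.Popen(
            ["Rscript", r_script],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
        )

    def call(self, payload: bytes) -> bytes:
        # Send a 4-byte big-endian length header followed by the payload ...
        self.proc.stdin.write(struct.pack(">I", len(payload)) + payload)
        self.proc.stdin.flush()
        # ... and read the reply back using the same framing.
        (n,) = struct.unpack(">I", self.proc.stdout.read(4))
        return self.proc.stdout.read(n)

    def close(self):
        self.proc.stdin.close()
        self.proc.wait()

SparkR's real protocol and process management differ; the point is only that one R process serves many calls per executor rather than one process per task.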

Thanks
Shivaram




Re: Support R in Spark

2014-09-18 Thread oppokui
Shivaram, 

As far as I know, SparkR uses the rJava package. On the worker node, Spark code executes R code by launching an R process and sending/receiving byte arrays.
I have a question about when the R process is launched: is it per worker process, per executor thread, or per RDD operation?

Thanks and Regards.

Kui  




Re: Support R in Spark

2014-09-06 Thread Christopher Nguyen
Hi Kui, sorry about that. That link you mentioned is probably the one for
the products. We don't have one pointing from adatao.com to ddf.io; maybe
we'll add it.

As for access to the code base itself, I think the team has already created
a GitHub repo for it, and should open it up within a few weeks. There's
some debate about whether to put out the implementation with Shark
dependencies now, or the SparkSQL version, which has somewhat limited
functionality and is not as well tested.

I'll check and ping when this is opened up.

The license is Apache.

Sent while mobile. Please excuse typos etc.


Re: Support R in Spark

2014-09-06 Thread oppokui
Thanks, Christopher. I saw it before; it is amazing. Last time I tried to download it from adatao, but there was no response after I filled out the form. How can I download it or its source code? What is the license?

Kui





Re: Support R in Spark

2014-09-06 Thread Christopher Nguyen
Hi Kui,

DDF (open sourced) also aims to do something similar, adding RDBMS idioms,
and is already implemented on top of Spark.

One philosophy is that the DDF API aggressively hides the notion of
parallel datasets, exposing only (mutable) tables to users, on which they
can apply R and other familiar data mining/machine learning idioms, without
having to know about the distributed representation underneath. Now, you
can get to the underlying RDDs if you want to, simply by asking for them.
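
As a loose, hypothetical illustration of that philosophy (the SimpleTable class and its methods are invented for this sketch and are not DDF's actual API): a table-like wrapper that hides the distributed representation for everyday use but hands back the underlying RDD on request. Assumes PySpark is installed.

from pyspark import SparkContext

class SimpleTable:
    """Hypothetical table-like wrapper; not DDF's real API."""

    def __init__(self, rdd, columns):
        self._rdd = rdd          # distributed representation stays hidden
        self.columns = columns   # users mostly think in terms of named columns

    def head(self, n=5):
        # Familiar, local-feeling idiom backed by a distributed take().
        return self._rdd.take(n)

    def count(self):
        return self._rdd.count()

    @property
    def rdd(self):
        # Escape hatch: the underlying RDD, only if you ask for it.
        return self._rdd

sc = SparkContext("local[2]", "ddf-style-sketch")
table = SimpleTable(sc.parallelize([(1, "a"), (2, "b"), (3, "c")]),
                    columns=["id", "label"])
print(table.head(2))   # [(1, 'a'), (2, 'b')]
print(table.count())   # 3
sc.stop()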

This was launched at the July Spark Summit. See
http://spark-summit.org/2014/talk/distributed-dataframe-ddf-on-apache-spark-simplifying-big-data-for-the-rest-of-us
.

Sent while mobile. Please excuse typos etc.


Re: Support R in Spark

2014-09-06 Thread oppokui
Cool! That is very good news. Can’t wait for it.

Kui 




Re: Support R in Spark

2014-09-04 Thread Shivaram Venkataraman
Thanks Kui. SparkR is a pretty young project, but there are a bunch of
things we are working on. One of the main features is to expose a data
frame API (https://sparkr.atlassian.net/browse/SPARKR-1) and we will
be integrating this with Spark's MLlib. At a high level this will
allow R users to use a familiar API but make use of MLlib's efficient
distributed implementation. This is the same strategy used in Python
as well.
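
For reference, a small self-contained example of the Python strategy mentioned above, assuming a Spark 1.x installation with PySpark on the path: the user writes a familiar, local-looking call while MLlib runs the distributed computation underneath.

from pyspark import SparkContext
from pyspark.mllib.clustering import KMeans

sc = SparkContext("local[2]", "mllib-from-python")

# A tiny distributed dataset of 2-D points.
points = sc.parallelize([[0.0, 0.0], [0.1, 0.2], [9.0, 9.0], [9.1, 8.8]])

# Familiar-looking call; the clustering itself is run by MLlib on the executors.
model = KMeans.train(points, k=2, maxIterations=10)
print(model.clusterCenters)

sc.stop()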

Also we do hope to merge SparkR with mainline Spark -- we have a few
features to complete before that and plan to shoot for integration by
Spark 1.3.

Thanks
Shivaram




Re: Support R in Spark

2014-09-03 Thread oppokui
Thanks, Shivaram. 

No specific use case yet. We want to use R in our project because our data scientists all know R, but we had a concern about how well R handles massive data. Spark does better work in the big data area, and Spark ML focuses on predictive analytics, so we are thinking about whether we can bring R and Spark together. We tried SparkR and it is pretty easy to use, but we haven’t seen any feedback on this package from industry. It would be better if the Spark team supported R just as it does Scala/Java/Python.

Another question: if MLlib re-implements all the well-known data mining algorithms in Spark, then what is the purpose of using R?

There is another option for us, H2O, which supports R natively. H2O is friendlier to data scientists, and I saw that H2O can also work on Spark (Sparkling Water). Is it better than using SparkR?

Thanks and Regards.

Kui





Support R in Spark

2014-09-03 Thread oppokui
Does the Spark ML team have a plan to support R scripts natively? There is a SparkR project, but it is not from the Spark team. Spark ML already uses netlib-java to talk to native Fortran routines, or NumPy, so why not try to use R in some way as well?

R has a lot of useful packages. If the Spark ML team can include R support, it will be very powerful.

Any comment?

