Re: Spark MLib v/s SparkR

2015-08-07 Thread gen tang
Hi,

It depends on the problem you are working on.
Much like Python versus R, MLlib focuses on machine learning, while SparkR
will focus on statistics if it follows the way of R.

For instance, if you want to use glm to analyse data:
1. If you are interested only in the parameters of the model and want to
use the model for prediction, then you should use MLlib.
2. If your focus is on the confidence of the model, for example the
confidence intervals of the results and the significance levels of the
parameters, you should choose SparkR. However, as there is no glm package
for this purpose yet, you need to code it yourself.
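
To give a rough idea of what "coding it yourself" might involve, here is a
minimal sketch in plain Python (it uses neither SparkR nor MLlib) for the
simplest case: a Gaussian glm with a single predictor, i.e. ordinary least
squares. The function name `ols_with_inference` and the toy data are made up
for illustration. The intervals and p-values use a normal approximation,
which is an assumption of this sketch; R's own summary output would use the
Student t distribution instead.

```python
import math
from statistics import NormalDist


def ols_with_inference(xs, ys, level=0.95):
    """Fit y = intercept + slope * x and report uncertainty for both terms.

    Hypothetical helper for illustration only. Uses a normal approximation
    for confidence intervals and p-values (an assumption of this sketch).
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired observations")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # Point estimates: this is the part a prediction-only workflow needs.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual variance with n - 2 degrees of freedom.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sigma2 = rss / (n - 2)

    # Standard errors of the two coefficients.
    se_slope = math.sqrt(sigma2 / sxx)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + mean_x ** 2 / sxx))

    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(0.5 + level / 2.0)

    def summarize(estimate, std_error):
        half_width = z_crit * std_error
        # A perfect fit has zero standard error; treat the statistic as infinite.
        z_stat = estimate / std_error if std_error > 0 else math.inf
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z_stat)))
        return {
            "estimate": estimate,
            "std_error": std_error,
            "ci": (estimate - half_width, estimate + half_width),
            "p_value": p_value,
        }

    return {
        "intercept": summarize(intercept, se_intercept),
        "slope": summarize(slope, se_slope),
    }


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]
    for name, stats in ols_with_inference(xs, ys).items():
        print(name, stats)
```

The point estimates are all that case 1 needs; the standard errors,
intervals and p-values are the extra pieces that case 2 is about.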

Hope this helps.

Cheers
Gen


On Thu, Aug 6, 2015 at 2:24 AM, praveen S mylogi...@gmail.com wrote:

 I was wondering when one should go for MLlib or SparkR. What are the
 criteria, or what should be considered, before choosing either of the
 solutions for data analysis?
 Or, what are the advantages of Spark MLlib over SparkR, or of SparkR
 over MLlib?



Re: Spark MLib v/s SparkR

2015-08-07 Thread Feynman Liang
SparkR and MLlib are becoming more integrated (we recently added R formula
support), but the integration is still limited. If you learn R and
SparkR, you will not be able to leverage most of the distributed algorithms
in MLlib (e.g. all the algorithms you cited). However, you could use the
equivalent R implementations (e.g. glm for logistic regression), but be
aware that these will not scale to the large datasets Spark is designed to
handle.

On Thu, Aug 6, 2015 at 8:06 PM, praveen S mylogi...@gmail.com wrote:

 I am starting off with classification models: logistic regression and
 random forest. Basically, I want to learn machine learning.
 Since I have a Java background I started off with MLlib, but later heard
 that R works as well (only with scaling issues).

 So I was wondering whether SparkR would resolve the scaling issue; hence
 my question: why not go with R and SparkR alone (keeping aside my
 inclination towards Java)?



Re: Spark MLib v/s SparkR

2015-08-06 Thread praveen S
I am starting off with classification models: logistic regression and
random forest. Basically, I want to learn machine learning.
Since I have a Java background I started off with MLlib, but later heard
that R works as well (only with scaling issues).

So I was wondering whether SparkR would resolve the scaling issue; hence
my question: why not go with R and SparkR alone (keeping aside my
inclination towards Java)?

On Thu, Aug 6, 2015 at 12:28 AM, Charles Earl charles.ce...@gmail.com
wrote:

 What machine learning algorithms are you interested in exploring or using?
 Start from there or, better yet, from the problem you are trying to solve,
 and then the selection may be evident.



Spark MLib v/s SparkR

2015-08-05 Thread praveen S
I was wondering when one should go for MLlib or SparkR. What are the
criteria, or what should be considered, before choosing either of the
solutions for data analysis?
Or, what are the advantages of Spark MLlib over SparkR, or of SparkR
over MLlib?


Re: Spark MLib v/s SparkR

2015-08-05 Thread Benamara, Ali
SparkR doesn't support all the ML algorithms yet. The upcoming 1.5 release
will add more support, but still not all the algorithms that are currently
supported in MLlib. At this point, SparkR is more of a convenience for R
users to get acquainted with Spark.
 -Ali

From: praveen S mylogi...@gmail.com
Date: Wednesday, August 5, 2015 at 2:24 PM
To: user@spark.apache.org
Subject: Spark MLib v/s SparkR

I was wondering when one should go for MLlib or SparkR. What are the
criteria, or what should be considered, before choosing either of the
solutions for data analysis?
Or, what are the advantages of Spark MLlib over SparkR, or of SparkR
over MLlib?


Re: Spark MLib v/s SparkR

2015-08-05 Thread Krishna Sankar
A few points to consider:
a) SparkR gives you the union of R on a single machine and the distributed
computing of Spark.
b) It also gives you the ability to wrangle, in R, data that lives in the
Spark ecosystem.
c) Coming to MLlib, the question is MLlib and R (not MLlib or R),
depending on the scale, data location, etc.
d) As Ali mentioned, some of MLlib might not be supported in R (I haven't
looked at it that carefully, but this can be resolved by the APIs).
OTOH, 1.5 is on its way.
e) So it all depends on the algorithms that one wants to use and whether
one needs R for pre- or post-processing.
HTH.
Cheers
k/

On Wed, Aug 5, 2015 at 11:24 AM, praveen S mylogi...@gmail.com wrote:

 I was wondering when one should go for MLlib or SparkR. What are the
 criteria, or what should be considered, before choosing either of the
 solutions for data analysis?
 Or, what are the advantages of Spark MLlib over SparkR, or of SparkR
 over MLlib?



Re: Spark MLib v/s SparkR

2015-08-05 Thread Charles Earl
What machine learning algorithms are you interested in exploring or using?
Start from there or, better yet, from the problem you are trying to solve,
and then the selection may be evident.


On Wednesday, August 5, 2015, praveen S mylogi...@gmail.com wrote:

 I was wondering when one should go for MLlib or SparkR. What are the
 criteria, or what should be considered, before choosing either of the
 solutions for data analysis?
 Or, what are the advantages of Spark MLlib over SparkR, or of SparkR
 over MLlib?



-- 
- Charles