Re: Spark MLib v/s SparkR
Hi,

It depends on the problem you are working on. Much like Python and R, MLlib focuses on machine learning, while SparkR will focus on statistics, if SparkR follows the way of R.

For instance, if you want to use glm to analyse data:

1. If you are interested only in the parameters of the model and want to use the model for prediction, you should use MLlib.
2. If your focus is on the confidence of the model, for example the confidence intervals of the results and the significance levels of the parameters, you should choose SparkR. However, as there is no glm package for this purpose yet, you will need to code it yourself.

Hope this helps.

Cheers
Gen

On Thu, Aug 6, 2015 at 2:24 AM, praveen S mylogi...@gmail.com wrote:

I was wondering when one should go for MLlib or SparkR. What are the criteria, or what should be considered, before choosing either of the solutions for data analysis? Or what are the advantages of Spark MLlib over SparkR, or the advantages of SparkR over MLlib?
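To make the distinction in the reply above concrete (parameters only versus confidence in those parameters), here is a minimal sketch in plain Python. It uses neither MLlib nor SparkR. It fits a straight line by least squares (the simplest Gaussian GLM) and then adds the standard error and an approximate 95% interval for the slope. The function names `fit_line` and `slope_inference` are hypothetical, and the interval uses the normal quantile 1.96 as a simplifying assumption; an exact small-sample interval would use the t distribution.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x.

    This is the "parameters only" view: enough to make predictions.
    Returns (intercept, slope, sxx), where sxx is the sum of squared
    deviations of x, reused later for the standard error.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope, sxx


def slope_inference(xs, ys, z=1.96):
    """The "statistics" view: how certain is the estimated slope?

    z=1.96 is a normal approximation to a 95% interval (an assumption
    made here for brevity, not an exact small-sample result).
    """
    intercept, slope, sxx = fit_line(xs, ys)
    n = len(xs)
    # Residual sum of squares around the fitted line.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sigma2 = rss / (n - 2)  # unbiased estimate of the noise variance
    std_error = math.sqrt(sigma2 / sxx)
    return {
        "slope": slope,
        "std_error": std_error,
        "ci": (slope - z * std_error, slope + z * std_error),
        "z_score": slope / std_error,
    }


if __name__ == "__main__":
    # Made-up illustrative data, not taken from the thread.
    xs = [1, 2, 3, 4, 5, 6]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
    print(fit_line(xs, ys)[:2])
    print(slope_inference(xs, ys))
```

The first function is all that prediction needs; the second is the kind of extra work the reply says you would have to code yourself when a library only returns the fitted parameters.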
Re: Spark MLib v/s SparkR
SparkR and MLlib are becoming more integrated (we recently added R formula support), but the integration is still quite limited. If you learn R and SparkR, you will not be able to leverage most of the distributed algorithms in MLlib (e.g. all the algorithms you cited). However, you could use the equivalent R implementations (e.g. glm for logistic regression), but be aware that these will not scale to the large datasets Spark is designed to handle.

On Thu, Aug 6, 2015 at 8:06 PM, praveen S mylogi...@gmail.com wrote:

I am starting off with classification models: logistic regression and random forest. Basically, I want to learn machine learning. Since I have a Java background I started off with MLlib, but later heard that R works as well (with scaling issues only). So I was wondering whether SparkR would resolve the scaling issue; hence my question: why not go with R and SparkR alone (setting aside my inclination towards Java)?

On Thu, Aug 6, 2015 at 12:28 AM, Charles Earl charles.ce...@gmail.com wrote:

What machine learning algorithms are you interested in exploring or using? Start from there, or better yet from the problem you are trying to solve, and then the selection may be evident.

On Wednesday, August 5, 2015, praveen S mylogi...@gmail.com wrote:

I was wondering when one should go for MLlib or SparkR. What are the criteria, or what should be considered, before choosing either of the solutions for data analysis? Or what are the advantages of Spark MLlib over SparkR, or the advantages of SparkR over MLlib?

--
- Charles
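For readers unfamiliar with the "R formula" idea mentioned in the reply above: a formula such as `label ~ x1 + x2` names the column to predict on the left and the feature columns on the right. The sketch below is plain Python and is not Spark's implementation. The helper names `parse_formula` and `design_rows` are hypothetical, and only the `+` operator is handled (real R formulas also support interactions, `.`, `-` and more).

```python
def parse_formula(formula):
    """Split a simple R-style formula into (label, [features]).

    Only 'label ~ a + b + c' is supported in this sketch.
    """
    if formula.count("~") != 1:
        raise ValueError("expected exactly one '~' in formula")
    lhs, rhs = formula.split("~")
    label = lhs.strip()
    features = [term.strip() for term in rhs.split("+") if term.strip()]
    if not label or not features:
        raise ValueError("formula needs a label and at least one feature")
    return label, features


def design_rows(rows, formula):
    """Turn dict-like rows into (label_value, feature_vector) pairs."""
    label, features = parse_formula(formula)
    return [(row[label], [row[name] for name in features]) for row in rows]


if __name__ == "__main__":
    # Made-up rows for illustration only.
    rows = [
        {"clicked": 1, "age": 34.0, "visits": 5.0},
        {"clicked": 0, "age": 51.0, "visits": 1.0},
    ]
    print(design_rows(rows, "clicked ~ age + visits"))
```

The point of formula support is that an R user can describe a model in this familiar notation while the (label, features) pairs are handed to a distributed algorithm rather than to single-machine R code.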
Re: Spark MLib v/s SparkR
I am starting off with classification models: logistic regression and random forest. Basically, I want to learn machine learning. Since I have a Java background I started off with MLlib, but later heard that R works as well (with scaling issues only). So I was wondering whether SparkR would resolve the scaling issue; hence my question: why not go with R and SparkR alone (setting aside my inclination towards Java)?

On Thu, Aug 6, 2015 at 12:28 AM, Charles Earl charles.ce...@gmail.com wrote:

What machine learning algorithms are you interested in exploring or using? Start from there, or better yet from the problem you are trying to solve, and then the selection may be evident.

On Wednesday, August 5, 2015, praveen S mylogi...@gmail.com wrote:

I was wondering when one should go for MLlib or SparkR. What are the criteria, or what should be considered, before choosing either of the solutions for data analysis? Or what are the advantages of Spark MLlib over SparkR, or the advantages of SparkR over MLlib?

--
- Charles
Spark MLib v/s SparkR
I was wondering when one should go for MLlib or SparkR. What are the criteria, or what should be considered, before choosing either of the solutions for data analysis? Or what are the advantages of Spark MLlib over SparkR, or the advantages of SparkR over MLlib?
Re: Spark MLib v/s SparkR
SparkR doesn't support all the ML algorithms yet. The upcoming 1.5 release will have more support, but still not all the algorithms that are currently supported in MLlib. At this point, SparkR is more of a convenience for R users to get acquainted with Spark.

-Ali

From: praveen S mylogi...@gmail.com
Date: Wednesday, August 5, 2015 at 2:24 PM
To: user@spark.apache.org
Subject: Spark MLib v/s SparkR

I was wondering when one should go for MLlib or SparkR. What are the criteria, or what should be considered, before choosing either of the solutions for data analysis? Or what are the advantages of Spark MLlib over SparkR, or the advantages of SparkR over MLlib?
Re: Spark MLib v/s SparkR
A few points to consider:

a) SparkR gives you the union of R on a single machine and the distributed computing of Spark.
b) It also gives you the ability to wrangle data in R within the Spark ecosystem.
c) Coming to MLlib, the question is MLlib and R (not MLlib or R), depending on the scale, data location, et al.
d) As Ali mentioned, some of MLlib might not be supported in R (I haven't looked at it that carefully, but this can be resolved through the APIs). OTOH, 1.5 is on its way.
e) So it all depends on the algorithms that one wants to use and whether one needs R for pre- or post-processing.

HTH.

Cheers
k/

On Wed, Aug 5, 2015 at 11:24 AM, praveen S mylogi...@gmail.com wrote:

I was wondering when one should go for MLlib or SparkR. What are the criteria, or what should be considered, before choosing either of the solutions for data analysis? Or what are the advantages of Spark MLlib over SparkR, or the advantages of SparkR over MLlib?
Re: Spark MLib v/s SparkR
What machine learning algorithms are you interested in exploring or using? Start from there, or better yet from the problem you are trying to solve, and then the selection may be evident.

On Wednesday, August 5, 2015, praveen S mylogi...@gmail.com wrote:

I was wondering when one should go for MLlib or SparkR. What are the criteria, or what should be considered, before choosing either of the solutions for data analysis? Or what are the advantages of Spark MLlib over SparkR, or the advantages of SparkR over MLlib?

--
- Charles