Hi Spark Experts,

We are trying to streamline how our data scientists take algorithms from
the lab into production.  Currently the tool of choice for our data
scientists is R.  Historically our engineers have had to manually convert
the R-based algorithms to Java or Scala to run in production on Hadoop or
Spark clusters.

We are curious whether we can do better by having the data scientists work
with SparkR or MLlib directly, minimizing the manual translation needed to
move algorithms into production.  Ideally we would use SparkR, since the
data scientists are much more familiar with R than with MLlib.  Is SparkR
suitable for production use, or are there downsides to this approach?
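For context, this is roughly the workflow we are hoping the data scientists
could use, based on my reading of the SparkR ML docs for the 1.5/1.6 line
(a Gaussian GLM appears to be the main MLlib algorithm exposed through
SparkR so far; the exact calls may differ depending on the Spark version):

library(SparkR)

# Initialize SparkR and a SQL context (1.5/1.6-style API)
sc <- sparkR.init(appName = "sparkr-glm-sketch")
sqlContext <- sparkRSQL.init(sc)

# Convert a local R data.frame into a distributed Spark DataFrame
# (column names containing "." are rewritten with "_")
df <- createDataFrame(sqlContext, iris)

# Fit a Gaussian GLM via MLlib using familiar R formula syntax
model <- glm(Sepal_Length ~ Sepal_Width + Species, data = df,
             family = "gaussian")
summary(model)

# Score the same DataFrame and inspect a few predictions
predictions <- predict(model, newData = df)
head(select(predictions, "Sepal_Length", "prediction"))

sparkR.stop()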

I noticed the following JIRAs related to MLlib / SparkR integration:

https://issues.apache.org/jira/browse/SPARK-6805
https://issues.apache.org/jira/browse/SPARK-9647

Beyond SparkR not yet supporting the full set of MLlib features, the main
question is whether it is as stable and fault-tolerant as using MLlib
directly.

Thanks in advance for any guidance you can provide.

Jonathan
