Hi Spark Experts,

We are trying to streamline the development lifecycle for our data scientists as they take algorithms from the lab into production. Currently the tool of choice for our data scientists is R. Historically, our engineers have had to manually convert the R-based algorithms to Java or Scala to run in production on Hadoop or Spark clusters.

We are curious whether we can do better by having our data scientists use SparkR or MLlib directly, minimizing the manual translation needed to move algorithms into production. Ideally we would use SparkR, since the data scientists are much more familiar with R than with MLlib.

Can SparkR run in production, or are there downsides to this approach? I noticed the following JIRAs for MLlib / SparkR integration:

https://issues.apache.org/jira/browse/SPARK-6805
https://issues.apache.org/jira/browse/SPARK-9647

Beyond the lack of full MLlib feature support in SparkR, the main question is whether it is as stable and fault tolerant as using MLlib directly.

Thanks in advance for any guidance you can provide.

Jonathan