I am trying to use an existing R package from SparkR, following the example at https://amplab-extras.github.io/SparkR-pkg/ in the section "Using existing R packages".
Here is the sample from the AMPLab extras page:

    generateSparse <- function(x) {
      # Use the sparseMatrix function from the Matrix package
      sparseMatrix(i=c(1, 2, 3), j=c(1, 2, 3), x=c(1, 2, 3))
    }
    includePackage(sc, Matrix)
    sparseMat <- lapplyPartition(rdd, generateSparse)

My package (named 'galileo') consists of a number of clustering methods that operate on input in a dense matrix. Here is my code prototype, based on the sample above:

    t1 <- jsonFile(sqlContext, "/root/test1.txt")
    runGalileo <- function(x) {
      galileo(x, model="kmeans", dist="maximum", K=5)
    }
    SparkR:::includePackage(sc, galileo)
    f <- SparkR:::lapplyPartition(t1, runGalileo)

I'm assuming t1 would be a data frame created from data coming from my existing application as JSON (in the prototype from a file, ultimately from MongoDB). So my first question is: what should that JSON look like to represent a dense matrix (a dgeMatrix in R, perhaps)?

Question two: I have noticed that some of the APIs in the example are no longer readily available (in Spark 1.4 I had to prefix lapplyPartition with "SparkR:::" to use it). Is there a different way I should be calling existing R packages?

Where I am coming from: I was developing a distributed worker akka/scala framework to scale out my use of R, running a large number of R methods on behalf of a large number of users through multiple RServe instances. The call galileo(x, model="kmeans", dist="maximum", K=5), where x is the dense matrix, is typical of the R calls I was sending to RServe. While developing this I kept running into posts to the Spark User group whenever I googled the troublesome stack traces I was encountering. As I became familiar with Spark and saw that it included SparkR, I came to see it as an alternative to developing my own system, with all the challenges I was anticipating.
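To make question one concrete, here is a sketch of what I imagine the JSON could look like and how a worker function could rebuild a dense matrix from it. This is purely my assumption, not something I found in the SparkR docs: one JSON object per line, each holding one matrix row in a hypothetical "row" field, parsed here with jsonlite for illustration:

```r
# Sketch (my assumption): one JSON object per line, one matrix row each.
# jsonFile would read such lines; inside the worker I would rebuild the
# matrix roughly like this.
library(jsonlite)  # assuming jsonlite is available on the workers
library(Matrix)

lines <- c('{"row": [1.0, 2.0, 3.0]}',
           '{"row": [4.0, 5.0, 6.0]}')
rows  <- lapply(lines, function(s) fromJSON(s)$row)
m     <- do.call(rbind, rows)   # plain dense numeric matrix
dense <- Matrix(m)              # coerces to a dgeMatrix
```

Is that anywhere near the layout jsonFile/lapplyPartition would actually hand my function, or is some other representation expected?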
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/using-existing-R-packages-from-SparkR-tp24693.html Sent from the Apache Spark User List mailing list archive at Nabble.com.