RE: Specifying Scala types when calling methods from SparkR
Hi Sun Rui,

I’ve had some luck simply using “objectFile” when saving from SparkR directly. The problem is that if you do it that way, the model object will only work if you continue to use the current SparkContext, and I think model persistence should really enable you to use the model at a later time. That’s where I found that I could drop down to the JVM level and interact with the Scala object directly, but that seems to only work if you specify the type.

On December 9, 2015 at 7:59:43 PM, Sun, Rui (rui@intel.com) wrote:

> Just use "objectFile" instead of "objectFile[PipelineModel]" for callJMethod. You can take the objectFile() in context.R as an example. Since the SparkContext created in SparkR is actually a JavaSparkContext, there is no need to pass the implicit ClassTag. [...]
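Chris’s “only work if you specify the type” observation has a compact JVM-level illustration. The sketch below is plain Java with hypothetical names (it is not Spark’s API): once a generic type parameter has been erased, “specifying the type” at runtime means passing the runtime class along as an ordinary argument, which is the Java analogue of the implicit ClassTag that Scala threads into methods like objectFile[T].

```java
import java.util.HashMap;
import java.util.Map;

public class ClassTokenDemo {
    // Stand-in store of previously saved objects, keyed by path.
    static final Map<String, Object> store = new HashMap<>();

    // Without a token, the caller can only get Object back.
    static Object loadRaw(String path) {
        return store.get(path);
    }

    // With an explicit Class<T> token, the desired type survives to runtime,
    // mirroring what Scala's implicit ClassTag argument does for objectFile[T].
    static <T> T load(String path, Class<T> tag) {
        return tag.cast(store.get(path));
    }

    public static void main(String[] args) {
        store.put("path/to/model", "a-model-stand-in");
        String model = load("path/to/model", String.class);
        System.out.println(model); // prints a-model-stand-in
    }
}
```

This is why Shivaram’s suggestion later in the thread is to construct a ClassTag and pass it as an extra argument rather than trying to name the type in the method name.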
RE: Specifying Scala types when calling methods from SparkR
Hi, Chris,

I see your point: the objectFile and saveAsObjectFile pair in SparkR can only be used within a SparkR context, as the content of the RDD is assumed to be serialized R objects. It’s fine to drop down to the JVM level in the case where the model is saved as an objectFile in Scala and then loaded in SparkR. But I don’t understand “but that seems to only work if you specify the type”; shouldn’t there be no need to specify the type, because of type erasure?

Did you try something like this: convert the RDD to a DataFrame, save it, load it as a DataFrame in SparkR, and then convert it back to an RDD?

From: Chris Freeman [mailto:cfree...@alteryx.com]
Sent: Friday, December 11, 2015 2:47 AM
To: Sun, Rui; shiva...@eecs.berkeley.edu
Cc: dev@spark.apache.org
Subject: RE: Specifying Scala types when calling methods from SparkR

> That’s where I found that I could drop down to the JVM level and interact with the Scala object directly, but that seems to only work if you specify the type. [...]
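Sun Rui’s type-erasure question can be checked directly with reflection. In this sketch (objectFile here is a stand-in generic method, not Spark’s), the generic return type List&lt;T&gt; erases to the raw List in the byte code, so there is no place in a byte-code signature for a caller to name the type parameter:

```java
import java.lang.reflect.Method;
import java.util.List;

public class ErasureDemo {
    // A stand-in analogous to Scala's objectFile[T](path): after compilation
    // the type parameter T is erased from the byte-code signature.
    static <T> List<T> objectFile(String path) {
        return null;
    }

    public static void main(String[] args) throws Exception {
        Method m = ErasureDemo.class.getDeclaredMethod("objectFile", String.class);
        // The erased signature mentions only the raw List type, never T:
        System.out.println(m.getReturnType().getName()); // prints java.util.List
    }
}
```

Erasure is also why Scala compensates with an extra ClassTag argument: since T is gone from the signature, the runtime class has to travel as a value.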
RE: Specifying Scala types when calling methods from SparkR
Hi,

Just use "objectFile" instead of "objectFile[PipelineModel]" for callJMethod. You can take the objectFile() in context.R as an example. Since the SparkContext created in SparkR is actually a JavaSparkContext, there is no need to pass the implicit ClassTag.

-----Original Message-----
From: Shivaram Venkataraman [mailto:shiva...@eecs.berkeley.edu]
Sent: Thursday, December 10, 2015 8:21 AM
To: Chris Freeman
Cc: dev@spark.apache.org
Subject: Re: Specifying Scala types when calling methods from SparkR

> The SparkR callJMethod can only invoke methods as they show up in the Java byte code. [...]
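The naming point above, use "objectFile" rather than "objectFile[PipelineModel]", follows from how reflective lookup works. A minimal sketch (the class below is a stand-in for a JavaSparkContext-like object, not Spark code): a method can only be resolved by its byte-code name, and a type parameter is never part of that name, so the bracketed form can never match anything.

```java
import java.lang.reflect.Method;

public class LookupDemo {
    // Stand-in for a method on a JavaSparkContext-like object.
    public Object objectFile(String path) {
        return "rdd-for-" + path;
    }

    public static void main(String[] args) throws Exception {
        LookupDemo sc = new LookupDemo();

        // Plain byte-code name: resolves and invokes fine.
        Method m = sc.getClass().getMethod("objectFile", String.class);
        System.out.println(m.invoke(sc, "path/to/model")); // prints rdd-for-path/to/model

        // A name with a type parameter baked in can never match, because
        // type parameters do not appear in byte-code method names.
        try {
            sc.getClass().getMethod("objectFile[PipelineModel]", String.class);
        } catch (NoSuchMethodException e) {
            System.out.println("no such method");
        }
    }
}
```

This is essentially what SparkR’s callJMethod does under the hood: a by-name lookup against the compiled class, which is why the method name must match the byte code exactly.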
Re: Specifying Scala types when calling methods from SparkR
The SparkR callJMethod can only invoke methods as they show up in the Java byte code. So in this case you'll need to check the SparkContext byte code (with javap or something like that) to see how that method looks. My guess is that the type is passed in as a ClassTag argument, so you'll need to do something like create a ClassTag for the LinearRegressionModel and pass that in as the first or last argument, etc.

Thanks
Shivaram

On Wed, Dec 9, 2015 at 10:11 AM, Chris Freeman <cfree...@alteryx.com> wrote:
> Hey everyone,
>
> I’m currently looking at ways to save out SparkML model objects from SparkR, and I’ve had some luck putting the model into an RDD and then saving the RDD as an Object File. Once it’s saved, I’m able to load it back in with something like:
>
>     sc.objectFile[LinearRegressionModel]("path/to/model")
>
> I’d like to try and replicate this same process from SparkR using the JVM backend APIs (e.g. “callJMethod”), but so far I haven’t been able to replicate my success, and I’m guessing that it’s (at least in part) due to the necessity of specifying the type when calling the objectFile method.
>
> Does anyone know if this is actually possible? For example, here’s what I’ve come up with so far:
>
>     loadModel <- function(sc, modelPath) {
>       modelRDD <- SparkR:::callJMethod(sc,
>                                        "objectFile[PipelineModel]",
>                                        modelPath,
>                                        SparkR:::getMinPartitions(sc, NULL))
>       return(modelRDD)
>     }
>
> Any help is appreciated!
>
> --
> Chris Freeman

To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org