Hi,
I ran into a similar issue when trying to run a map function with predict.
In my case the calling class had some non-serializable fields; after marking
those fields transient, the error went away.
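
For illustration, here is a minimal sketch of what I mean (the Scorer class
and the logger field are made up for the example; the point is just the
@transient annotation on the non-serializable member, plus copying the model
into a local val so the closure does not drag the whole enclosing class along):

  import org.apache.spark.mllib.classification.LogisticRegressionModel
  import org.apache.spark.mllib.regression.LabeledPoint
  import org.apache.spark.rdd.RDD

  class Scorer(model: LogisticRegressionModel) extends Serializable {
    // hypothetical non-serializable member; @transient keeps it out of
    // the serialized closure shipped to the executors
    @transient lazy val log = org.apache.log4j.Logger.getLogger(getClass)

    def accuracy(points: RDD[LabeledPoint]): Double = {
      val m = model  // capture only the model, not the whole class
      points.filter(p => m.predict(p.features) == p.label).count().toDouble /
        points.count()
    }
  }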


On Wed, Aug 13, 2014 at 6:39 PM, lancezhange <lancezha...@gmail.com> wrote:

> Let's say you have a model of class
> "org.apache.spark.mllib.classification.LogisticRegressionModel".
> You can save the model to disk as follows:
>
>   import java.io.FileOutputStream
>   import java.io.ObjectOutputStream
>   val fos = new FileOutputStream("e:/model.obj")
>   val oos = new ObjectOutputStream(fos)
>   oos.writeObject(model)
>   oos.close()
>
> and load it in:
>   import java.io.FileInputStream
>   import java.io.ObjectInputStream
>   val fis = new FileInputStream("e:/model.obj")
>   val ois = new ObjectInputStream(fis)
>   val newModel = ois.readObject()
>     .asInstanceOf[org.apache.spark.mllib.classification.LogisticRegressionModel]
>   ois.close()
>
> you can check that 'newModel.weights' gives you the weights, implying that
> newModel was loaded successfully.
>
> There remains, however, another problem, which confuses me badly: when I use
> the loaded newModel to predict on LabeledPoints, there is always a "Task not
> serializable" exception. Detailed log:
> INFO DAGScheduler: Failed to run count at <console>:49
> org.apache.spark.SparkException: Job aborted due to stage failure: Task not
> serializable: java.io.NotSeri...
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndInd...
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissing...
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(D...
>         at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:697)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGSch...
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
> Any help here?
> PS. Does anyone know the *constructor function* of the model, assuming you
> have the weights and intercept?
