Hi,

Given that training and prediction are two different applications, I typically save model objects to HDFS and load them back during the prediction map stages.
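A minimal sketch of one way to do the save-and-load-back approach, not a definitive implementation. It assumes each profile's model was already saved to HDFS by the training application, and that the input records carry a "profile" column and a "features" vector column. The profile names, HDFS paths, and column names are all hypothetical. The models are loaded on the driver rather than inside a map closure, because load relies on the SparkContext (the sc.textFile call mentioned in the quoted message), which exists only on the driver. Predictions then go through model.transform on a DataFrame, so the model object never has to be captured in a closure.

```scala
import org.apache.spark.ml.classification.RandomForestClassificationModel
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object PerProfilePrediction {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("per-profile-prediction")
      .getOrCreate()

    // Hypothetical profile names and model locations; substitute your own.
    val modelPaths: Map[String, String] = Map(
      "profileA" -> "hdfs:///models/profileA",
      "profileB" -> "hdfs:///models/profileB"
    )

    // Load every model once, on the driver.
    val models: Map[String, RandomForestClassificationModel] =
      modelPaths.map { case (profile, path) =>
        profile -> RandomForestClassificationModel.load(path)
      }

    // Hypothetical input: one row per record, with "profile" and "features" columns.
    val records: DataFrame = spark.read.parquet("hdfs:///data/records")

    // Score each profile's slice with its own model, then combine the results.
    // Assumes all models were trained with the same output column names,
    // so the transformed DataFrames share a schema.
    val predictions: DataFrame = models
      .map { case (profile, model) =>
        model.transform(records.filter(col("profile") === profile))
      }
      .reduce(_ union _)

    predictions.write.mode("overwrite").parquet("hdfs:///data/predictions")
    spark.stop()
  }
}
```

Selecting the model per profile on the driver and filtering the DataFrame keeps the "which model applies to which record" logic out of executor code entirely.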
Best,
Ayan

On Fri, 13 Jan 2017 at 5:39 am, Sumona Routh <sumos...@gmail.com> wrote:

> Hi all,
> I've been working with Spark mllib 2.0.2 RandomForestClassificationModel.
>
> I encountered two frustrating issues and would really appreciate some
> advice:
>
> 1) RandomForestClassificationModel is effectively not serializable (I
> assume it's referencing something that can't be serialized, since it itself
> extends Serializable), so I ended up with the well-known exception:
> org.apache.spark.SparkException: Task not serializable.
> Basically, my original intention was to pass the model as a parameter
> because which model we use is dynamic based on what record we are
> predicting on.
>
> Has anyone else encountered this? Is this currently being addressed? I
> would expect objects from Spark's own libraries to be usable
> seamlessly in their applications without these types of exceptions.
>
> 2) The RandomForestClassificationModel.load method appears to hang
> indefinitely when executed from inside a map function (which I assume is
> passed to the executor). So, I basically cannot load a model from a worker.
> We have multiple "profiles" that use differently trained models, which are
> accessed from within a map function to run predictions on different sets of
> data.
> The thread that is hanging has this as the latest (most pertinent) code:
>
> org.apache.spark.ml.util.DefaultParamsReader$.loadMetadata(ReadWrite.scala:391)
>
> Looking at the code on GitHub, it appears that it is calling sc.textFile.
> I could not find anything stating that this particular function would not
> work from within a map function.
>
> Are there any suggestions as to how I can get this model to work on a real
> production job (either by allowing it to be serializable and passed around
> or loaded from a worker)?
>
> I've extensively POCed this model (saving, loading, transforming,
> training, etc.), however this is the first time I'm attempting to use it
> from within a real application.
>
> Sumona