Re: Save RandomForest Model from ML package

2015-10-23 Thread amarouni

It's an open issue: https://issues.apache.org/jira/browse/SPARK-4587

That being said, you can work around the issue by serializing the model
(plain Java serialization) and then restoring it before running the
prediction job.
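Here is a minimal sketch of that workaround. It assumes the fitted model object implements java.io.Serializable. The object name ModelSerialization and the file paths are illustrative and are not part of any Spark API. It uses plain java.io object streams, so it only reads and writes the local filesystem of the driver.

```scala
import java.io.{FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream, ObjectStreamClass}

// Hypothetical helper for the "plain Java serialization" workaround.
// Assumes the fitted model implements java.io.Serializable.
object ModelSerialization {
  // Writes the whole object graph to a local file on the driver.
  def save[T <: java.io.Serializable](obj: T, path: String): Unit = {
    val out = new ObjectOutputStream(new FileOutputStream(path))
    try out.writeObject(obj) finally out.close()
  }

  // Reads the object back; the caller states the expected type.
  def load[T](path: String): T = {
    val in = new ObjectInputStream(new FileInputStream(path)) {
      // Prefer the thread's context class loader (often needed in notebooks
      // and REPLs), falling back to the default lookup if that fails.
      override def resolveClass(desc: ObjectStreamClass): Class[_] =
        try Class.forName(desc.getName, false, Thread.currentThread().getContextClassLoader)
        catch { case _: ClassNotFoundException => super.resolveClass(desc) }
    }
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

For a Spark model, you would call ModelSerialization.save(model, "/tmp/rf.ser") after fitting. In the prediction job you would then call ModelSerialization.load[RandomForestClassificationModel]("/tmp/rf.ser"). Java serialization ties the file to the exact class versions, so a model saved under one Spark release may fail to load under another.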

Best Regards,

On 22/10/2015 14:33, Sebastian Kuepers wrote:
> Hey,
>
> I'm trying to figure out the best practice for saving and loading models
> that have been fitted with the ML package, e.g. with the RandomForest
> classifier.
>
> There is PMML support in the MLlib package AFAIK, but not in ML. Is
> that correct?
>
> How do you approach this so that you do not have to fit your model
> before every prediction job?
>
> Thanks,
> Sebastian



Re: Save RandomForest Model from ML package

2015-10-22 Thread Sujit Pal
Hi Sebastian,

You can save models to disk and load them back up, at least with the
RDD-based MLlib API (org.apache.spark.mllib). In the snippet below (copied
out of a working Databricks notebook), I train a model, save it to disk,
and then load it back from disk into model2.

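import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.tree.model.RandomForestModel

// data is an RDD[LabeledPoint]; the remaining arguments are the usual
// trainClassifier hyperparameters, defined elsewhere in the notebook.
val model = RandomForest.trainClassifier(data, numClasses,
  categoricalFeaturesInfo, numTrees, featureSubsetStrategy, impurity,
  maxDepth, maxBins, seed)
model.save(sc, inputDir + "models/randomForestModel")

// Later (e.g. in a separate prediction job), load the model back from disk.
val model2 = RandomForestModel.load(sc, inputDir + "models/randomForestModel")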
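The question is how to avoid refitting before every prediction job, so the save/load calls above can be wrapped in a small load-or-fit helper. This is a hypothetical sketch. ModelCache and loadOrFit are illustrative names, not Spark APIs. It only checks the local filesystem through java.nio, so for HDFS or DBFS paths the existence check would need to go through the Hadoop FileSystem API instead.

```scala
import java.nio.file.{Files, Path}

// Hypothetical load-or-fit wrapper: reuse a persisted model when one exists,
// otherwise fit it once and persist it for later jobs.
object ModelCache {
  def loadOrFit[M](path: Path)(fit: => M)(save: (M, Path) => Unit)(load: Path => M): M =
    if (Files.exists(path)) load(path) // already trained: just load it
    else {
      val model = fit                  // train only when nothing is saved yet
      save(model, path)
      model
    }
}
```

Applied to the MLlib snippet above, each argument maps to one call:

- fit wraps RandomForest.trainClassifier(...).
- save calls model.save(sc, path.toString).
- load calls RandomForestModel.load(sc, path.toString).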


Not sure if there is PMML support. The model saves itself into a directory
structure that looks like this:

data/
  _SUCCESS
  _common_metadata
  _metadata
  part-r-*.gz.parquet (multiple files)
metadata/
  _SUCCESS
  part-0
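A prediction job could sanity-check this layout before loading. This sketch relies only on the listing above. It verifies that both subdirectories exist and each contains the _SUCCESS marker that Hadoop's output committer writes when a job completes. SavedModelLayout and looksComplete are illustrative names. Spark neither provides nor requires this check.

```scala
import java.nio.file.{Files, Path}

// Hypothetical heuristic based on the directory listing in this thread:
// a completed save has data/ and metadata/, each with a _SUCCESS marker.
object SavedModelLayout {
  def looksComplete(root: Path): Boolean =
    Seq("data", "metadata").forall { sub =>
      val dir = root.resolve(sub)
      Files.isDirectory(dir) && Files.exists(dir.resolve("_SUCCESS"))
    }
}
```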


HTH

-sujit




On Thu, Oct 22, 2015 at 5:33 AM, Sebastian Kuepers <
sebastian.kuep...@publicispixelpark.de> wrote:

> Hey,
>
> I'm trying to figure out the best practice for saving and loading models that
> have been fitted with the ML package, e.g. with the RandomForest
> classifier.
>
> There is PMML support in the MLlib package AFAIK, but not in ML. Is that
> correct?
>
> How do you approach this so that you do not have to fit your model before
> every prediction job?
>
> Thanks,
> Sebastian


Save RandomForest Model from ML package

2015-10-22 Thread Sebastian Kuepers
Hey,

I'm trying to figure out the best practice for saving and loading models that have
been fitted with the ML package, e.g. with the RandomForest classifier.

There is PMML support in the MLlib package AFAIK, but not in ML. Is that correct?

How do you approach this so that you do not have to fit your model before
every prediction job?

Thanks,
Sebastian


Sebastian Küpers
Account Director

Publicis Pixelpark
Leibnizstrasse 65, 10629 Berlin
T +49 30 5058 1838
M +49 172 389 28 52
sebastian.kuep...@publicispixelpark.de
Web: publicispixelpark.de, Twitter: @pubpxp
Facebook: publicispixelpark.de/facebook
Publicis Pixelpark - a brand of Pixelpark AG
Executive Board: Horst Wagner (Chairman), Dirk Kedrowitsch
Chairman of the Supervisory Board: Pedro Simko
Local Court (Amtsgericht) Charlottenburg: HRB 72163






Disclaimer The information in this email and any attachments may contain 
proprietary and confidential information that is intended for the addressee(s) 
only. If you are not the intended recipient, you are hereby notified that any 
disclosure, copying, distribution, retention or use of the contents of this 
information is prohibited. When addressed to our clients or vendors, any 
information contained in this e-mail or any attachments is subject to the terms 
and conditions in any governing contract. If you have received this e-mail in 
error, please immediately contact the sender and delete the e-mail.