Hi,

Thanks, Rezaul and Asher Krim.
The method suggested by Rezaul works fine for NaiveBayes, but it still fails for RandomForest and the multilayer perceptron classifier. Everything is saved properly up to this stage:

    CrossValidator cv = new CrossValidator()
        .setEstimator(pipeline)
        .setEvaluator(evaluator)
        .setEstimatorParamMaps(paramGrid)
        .setNumFolds(folds);

Any idea how to resolve this?

On Thu, Jan 12, 2017 at 9:13 PM, Asher Krim <ak...@hubspot.com> wrote:

> What version of Spark are you on? Although it's cut off, I think your
> error is with RandomForestClassifier, is that correct? If so, you should
> upgrade to Spark 2, since I think this class only became
> writable/readable in Spark 2
> (https://github.com/apache/spark/pull/12118).
>
> On Thu, Jan 12, 2017 at 8:43 AM, Md. Rezaul Karim <
> rezaul.ka...@insight-centre.org> wrote:
>
>> Hi Malshan,
>>
>> The error says that one (or more) of the estimators/stages is either
>> not writable or does not support the model write/overwrite operation.
>>
>> Suppose you want to configure an ML pipeline consisting of three
>> stages (i.e. estimators): tokenizer, hashingTF, and nb:
>>
>>     val nb = new NaiveBayes().setSmoothing(0.00001)
>>     val tokenizer = new Tokenizer().setInputCol("label").setOutputCol("label")
>>     val hashingTF = new HashingTF().setInputCol(tokenizer.getOutputCol).setOutputCol("features")
>>     val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, nb))
>>
>> Now check that all the stages are writable. To make that easier, try
>> saving the stages individually, e.g.:
>>     tokenizer.write.save("path")
>>     hashingTF.write.save("path")
>>
>> After that, suppose you want to perform a 10-fold cross-validation as
>> follows:
>>
>>     val cv = new CrossValidator()
>>       .setEstimator(pipeline)
>>       .setEvaluator(new BinaryClassificationEvaluator)
>>       .setEstimatorParamMaps(paramGrid)
>>       .setNumFolds(10)
>>
>> where:
>>
>>     val paramGrid = new ParamGridBuilder()
>>       .addGrid(hashingTF.numFeatures, Array(10, 100, 1000))
>>       .addGrid(nb.smoothing, Array(0.001, 0.0001))
>>       .build()
>>
>> Now the model that you trained on the training set should be writable,
>> provided all of the stages are okay:
>>
>>     val model = cv.fit(trainingData)
>>     model.write.overwrite().save("output/NBModel")
>>
>> Hope that helps.
>>
>> Regards,
>> _________________________________
>> *Md. Rezaul Karim*, BSc, MSc
>> PhD Researcher, INSIGHT Centre for Data Analytics
>> National University of Ireland, Galway
>> IDA Business Park, Dangan, Galway, Ireland
>> Web: http://www.reza-analytics.eu/index.html
>>
>> On 12 January 2017 at 09:09, Minudika Malshan <minudika...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> When I try to save a pipeline model using Spark ML (Java), the
>>> following exception is thrown:
>>>
>>>     java.lang.UnsupportedOperationException: Pipeline write will fail on
>>>     this Pipeline because it contains a stage which does not implement
>>>     Writable. Non-Writable stage: rfc_98f8c9e0bd04 of type class
>>>     org.apache.spark.ml.classification.Rand
>>>
>>> Here is my code segment:
>>>
>>>     model.write().overwrite().save("mypath");
>>>
>>> How to resolve this?
>>>
>>> Thanks and regards!
>>>
>>> Minudika
>>>
>>
>
> --
> Asher Krim
> Senior Software Engineer

--
*Minudika Malshan*
Undergraduate
Department of Computer Science and Engineering
University of Moratuwa
Sri Lanka.
<https://lk.linkedin.com/pub/minudika-malshan/100/656/a80>
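[Editor's note] For readers who hit the same exception: before saving, Spark's Pipeline writer walks every stage and throws if any stage does not implement MLWritable, which is why the asker's save works for NaiveBayes but not for RandomForestClassifier on Spark 1.6. The snippet below is a simplified, self-contained Java sketch of that check; the interfaces and class names here are stand-ins for illustration, not Spark's real API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class WritableCheck {

    // Stand-in for a pipeline stage (not org.apache.spark.ml.PipelineStage).
    interface PipelineStage { String name(); }

    // Stand-in for org.apache.spark.ml.util.MLWritable.
    interface MLWritable { }

    // Tokenizer-like stage that supports persistence.
    static class Tokenizer implements PipelineStage, MLWritable {
        public String name() { return "tokenizer"; }
    }

    // Deliberately does NOT implement MLWritable, mirroring
    // RandomForestClassifier before Spark 2.0.
    static class RandomForestClassifier implements PipelineStage {
        public String name() { return "rfc"; }
    }

    // Collect the names of stages that would make the pipeline write fail.
    static List<String> nonWritableStages(List<PipelineStage> stages) {
        return stages.stream()
                .filter(s -> !(s instanceof MLWritable))
                .map(PipelineStage::name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<PipelineStage> stages =
                Arrays.asList(new Tokenizer(), new RandomForestClassifier());
        List<String> bad = nonWritableStages(stages);
        if (!bad.isEmpty()) {
            // Analogous to the UnsupportedOperationException in the thread.
            System.out.println("Pipeline write will fail; non-Writable stages: " + bad);
        }
    }
}
```

Running this prints `Pipeline write will fail; non-Writable stages: [rfc]`. The actual fix is the one Asher suggests: upgrade to Spark 2.x, where these classifiers gained read/write support.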