[ https://issues.apache.org/jira/browse/SPARK-15497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298837#comment-15298837 ]
Bryan Cutler commented on SPARK-15497:
--------------------------------------

This was added in SPARK-11888 and will be in Spark 2.0.

> DecisionTreeClassificationModel can't be saved within a Pipeline because it
> does not implement Writable
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-15497
>                 URL: https://issues.apache.org/jira/browse/SPARK-15497
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.6.1
>            Reporter: lichenglin
>             Fix For: 2.0.0
>
> Here is my code:
> {code}
> SQLContext sqlContext = getSQLContext();
> DataFrame data = sqlContext.read().format("libsvm")
>     .load("file:///E:/workspace-mars/bigdata/sparkjob/data/mllib/sample_libsvm_data.txt");
>
> // Index labels, adding metadata to the label column.
> // Fit on the whole dataset to include all labels in the index.
> StringIndexerModel labelIndexer = new StringIndexer()
>     .setInputCol("label")
>     .setOutputCol("indexedLabel")
>     .fit(data);
>
> // Automatically identify categorical features, and index them.
> VectorIndexerModel featureIndexer = new VectorIndexer()
>     .setInputCol("features")
>     .setOutputCol("indexedFeatures")
>     .setMaxCategories(4) // features with > 4 distinct values are treated as continuous
>     .fit(data);
>
> // Split the data into training and test sets (30% held out for testing).
> DataFrame[] splits = data.randomSplit(new double[]{0.7, 0.3});
> DataFrame trainingData = splits[0];
> DataFrame testData = splits[1];
>
> // Train a DecisionTree model.
> DecisionTreeClassifier dt = new DecisionTreeClassifier()
>     .setLabelCol("indexedLabel")
>     .setFeaturesCol("indexedFeatures");
>
> // Convert indexed labels back to original labels.
> IndexToString labelConverter = new IndexToString()
>     .setInputCol("prediction")
>     .setOutputCol("predictedLabel")
>     .setLabels(labelIndexer.labels());
>
> // Chain indexers and tree in a Pipeline.
> Pipeline pipeline = new Pipeline()
>     .setStages(new PipelineStage[]{labelIndexer, featureIndexer, dt, labelConverter});
>
> // Train model. This also runs the indexers.
> PipelineModel model = pipeline.fit(trainingData);
> model.save("file:///e:/tmpmodel");
> {code}
>
> and here is the exception:
> {code}
> Exception in thread "main" java.lang.UnsupportedOperationException: Pipeline write will fail on this Pipeline because it contains a stage which does not implement Writable. Non-Writable stage: dtc_7bdeae1c4fb8 of type class org.apache.spark.ml.classification.DecisionTreeClassificationModel
> 	at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$validateStages$1.apply(Pipeline.scala:218)
> 	at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$validateStages$1.apply(Pipeline.scala:215)
> 	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> 	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
> 	at org.apache.spark.ml.Pipeline$SharedReadWrite$.validateStages(Pipeline.scala:215)
> 	at org.apache.spark.ml.PipelineModel$PipelineModelWriter.<init>(Pipeline.scala:325)
> 	at org.apache.spark.ml.PipelineModel.write(Pipeline.scala:309)
> 	at org.apache.spark.ml.util.MLWritable$class.save(ReadWrite.scala:131)
> 	at org.apache.spark.ml.PipelineModel.save(Pipeline.scala:280)
> 	at com.bjdv.spark.job.Testjob.main(Testjob.java:142)
> {code}
>
> sample_libsvm_data.txt is included in the 1.6.1 release tar.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
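Since the fix referenced in the comment (SPARK-11888) made the tree models implement MLWritable, the failing `model.save(...)` call in the report is expected to work on Spark 2.0.0 and later. A minimal sketch of the save/load round trip, assuming a Spark 2.x Java application with `pipeline`, `trainingData`, and `testData` defined as in the report; the output path here is illustrative, not from the original:

```java
import org.apache.spark.ml.PipelineModel;

// Fit and persist the whole pipeline. On Spark 2.0+ every stage in this
// pipeline, including DecisionTreeClassificationModel, implements MLWritable,
// so PipelineModel.save no longer throws UnsupportedOperationException.
PipelineModel model = pipeline.fit(trainingData);
model.write().overwrite().save("file:///tmp/dt-pipeline-model");

// Later, possibly in a different application: restore the fitted pipeline
// and apply it to new data without refitting.
PipelineModel loaded = PipelineModel.load("file:///tmp/dt-pipeline-model");
loaded.transform(testData).select("predictedLabel", "label").show();
```

On 1.6.x itself there is no direct fix; workarounds at the time included refitting the tree on load, or using the RDD-based `spark.mllib` `DecisionTreeModel`, which already supported `save`/`load`.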