[ https://issues.apache.org/jira/browse/SPARK-15497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298837#comment-15298837 ]
Bryan Cutler commented on SPARK-15497:
--------------------------------------

This was added in SPARK-11888 and will be in Spark 2.0.

> DecisionTreeClassificationModel can't be saved within a Pipeline because it
> does not implement Writable
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-15497
>                 URL: https://issues.apache.org/jira/browse/SPARK-15497
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.6.1
>            Reporter: lichenglin
>             Fix For: 2.0.0
>
> Here is my code:
> {code}
> SQLContext sqlContext = getSQLContext();
> DataFrame data = sqlContext.read().format("libsvm")
>     .load("file:///E:/workspace-mars/bigdata/sparkjob/data/mllib/sample_libsvm_data.txt");
>
> // Index labels, adding metadata to the label column.
> // Fit on the whole dataset to include all labels in the index.
> StringIndexerModel labelIndexer = new StringIndexer()
>     .setInputCol("label")
>     .setOutputCol("indexedLabel")
>     .fit(data);
>
> // Automatically identify categorical features, and index them.
> VectorIndexerModel featureIndexer = new VectorIndexer()
>     .setInputCol("features")
>     .setOutputCol("indexedFeatures")
>     .setMaxCategories(4) // features with > 4 distinct values are treated as continuous
>     .fit(data);
>
> // Split the data into training and test sets (30% held out for testing).
> DataFrame[] splits = data.randomSplit(new double[]{0.7, 0.3});
> DataFrame trainingData = splits[0];
> DataFrame testData = splits[1];
>
> // Train a DecisionTree model.
> DecisionTreeClassifier dt = new DecisionTreeClassifier()
>     .setLabelCol("indexedLabel")
>     .setFeaturesCol("indexedFeatures");
>
> // Convert indexed labels back to original labels.
> IndexToString labelConverter = new IndexToString()
>     .setInputCol("prediction")
>     .setOutputCol("predictedLabel")
>     .setLabels(labelIndexer.labels());
>
> // Chain indexers and tree in a Pipeline.
> Pipeline pipeline = new Pipeline()
>     .setStages(new PipelineStage[]{labelIndexer, featureIndexer, dt, labelConverter});
>
> // Train model. This also runs the indexers.
> PipelineModel model = pipeline.fit(trainingData);
> model.save("file:///e:/tmpmodel");
> {code}
>
> and here is the exception:
> {code}
> Exception in thread "main" java.lang.UnsupportedOperationException: Pipeline write will fail on this Pipeline because it contains a stage which does not implement Writable. Non-Writable stage: dtc_7bdeae1c4fb8 of type class org.apache.spark.ml.classification.DecisionTreeClassificationModel
> 	at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$validateStages$1.apply(Pipeline.scala:218)
> 	at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$validateStages$1.apply(Pipeline.scala:215)
> 	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> 	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
> 	at org.apache.spark.ml.Pipeline$SharedReadWrite$.validateStages(Pipeline.scala:215)
> 	at org.apache.spark.ml.PipelineModel$PipelineModelWriter.<init>(Pipeline.scala:325)
> 	at org.apache.spark.ml.PipelineModel.write(Pipeline.scala:309)
> 	at org.apache.spark.ml.util.MLWritable$class.save(ReadWrite.scala:131)
> 	at org.apache.spark.ml.PipelineModel.save(Pipeline.scala:280)
> 	at com.bjdv.spark.job.Testjob.main(Testjob.java:142)
> {code}
>
> sample_libsvm_data.txt is included in the 1.6.1 release tar.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
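Since the fix referenced in the comment (SPARK-11888) made the tree models implement MLWritable, the failing `model.save(...)` call in the report is expected to work on Spark 2.0.0 and later. A minimal sketch of the save/load round trip, assuming a Spark 2.x Java application with `pipeline`, `trainingData`, and `testData` defined as in the report; the output path here is illustrative, not from the original:

```java
import org.apache.spark.ml.PipelineModel;

// Fit and persist the whole pipeline. On Spark 2.0+ every stage in this
// pipeline, including DecisionTreeClassificationModel, implements MLWritable,
// so PipelineModel.save no longer throws UnsupportedOperationException.
PipelineModel model = pipeline.fit(trainingData);
model.write().overwrite().save("file:///tmp/dt-pipeline-model");

// Later, possibly in a different application: restore the fitted pipeline
// and apply it to new data without refitting.
PipelineModel loaded = PipelineModel.load("file:///tmp/dt-pipeline-model");
loaded.transform(testData).select("predictedLabel", "label").show();
```

On 1.6.x itself there is no direct fix; workarounds at the time included refitting the tree on load, or using the RDD-based `spark.mllib` `DecisionTreeModel`, which already supported `save`/`load`.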