Hi forum,

I am currently using Spark 1.4.0 and have started using the ML pipeline framework. I ran the example program "ml.JavaSimpleTextClassificationPipeline", which uses LogisticRegression. Since I wanted to do multiclass classification, I switched to the DecisionTreeClassifier in the org.apache.spark.ml.classification package. The model trains fine with the fit method, but when I test it using the print statement from the example above, I get the error below saying that the 'probability' column is not present.

Is this column present only for LogisticRegression? If so, how can I see which columns are available after DecisionTreeClassifier predicts the output? Also, one more thing: how can I convert the predicted output back to String format if I am using a StringIndexer?
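For context, here is a simplified sketch of roughly what my pipeline looks like (trimmed, with placeholder names, not my actual code; "training" and "test" are assumed to be DataFrames with id, data and labelStr columns, which matches the column list in the error below). The select of "probability" at the end is the line that throws:

    import org.apache.spark.ml.Pipeline;
    import org.apache.spark.ml.PipelineModel;
    import org.apache.spark.ml.PipelineStage;
    import org.apache.spark.ml.classification.DecisionTreeClassifier;
    import org.apache.spark.ml.feature.HashingTF;
    import org.apache.spark.ml.feature.StringIndexer;
    import org.apache.spark.ml.feature.Tokenizer;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.Row;

    // Index the string label column "labelStr" into a numeric "label" column.
    StringIndexer indexer = new StringIndexer()
        .setInputCol("labelStr")
        .setOutputCol("label");

    // Tokenize the raw text column "data" into "words", then hash into "features".
    Tokenizer tokenizer = new Tokenizer()
        .setInputCol("data")
        .setOutputCol("words");
    HashingTF hashingTF = new HashingTF()
        .setNumFeatures(1000)
        .setInputCol("words")
        .setOutputCol("features");

    // DecisionTreeClassifier instead of the LogisticRegression used in the example.
    DecisionTreeClassifier dt = new DecisionTreeClassifier()
        .setLabelCol("label")
        .setFeaturesCol("features");

    Pipeline pipeline = new Pipeline()
        .setStages(new PipelineStage[] {indexer, tokenizer, hashingTF, dt});

    // Training works fine.
    PipelineModel model = pipeline.fit(training);

    // The transformed DataFrame only has the columns listed in the error below
    // (id, prediction, labelStr, data, features, words, label) -- no probability.
    DataFrame predictions = model.transform(test);

    // This select fails with the AnalysisException below.
    for (Row r : predictions.select("id", "data", "probability", "prediction").collect()) {
        System.out.println("(" + r.get(0) + ", " + r.get(1) + ") --> prob=" + r.get(2)
            + ", prediction=" + r.get(3));
    }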
org.apache.spark.sql.AnalysisException: cannot resolve 'probability' given input columns id, prediction, labelStr, data, features, words, label;
        at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:63)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:52)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:286)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:286)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:285)
        at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionUp$1(QueryPlan.scala:108)
        at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2$$anonfun$apply$2.apply(QueryPlan.scala:123)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:122)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
        at scala.collection.AbstractIterator.to(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
        at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
        at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:127)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:52)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
        at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:98)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:42)
        at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:920)
        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:131)
        at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$logicalPlanToDataFrame(DataFrame.scala:154)
        at org.apache.spark.sql.DataFrame.select(DataFrame.scala:595)
        at org.apache.spark.sql.DataFrame.select(DataFrame.scala:611)
        at org.apache.spark.sql.DataFrame.select(DataFrame.scala:611)
        at com.xxx.ml.xxx.execute(xxx.java:129)


