Hi all,

I am trying to run a sample decision tree, following the MLlib examples here:
https://spark.apache.org/docs/latest/ml-classification-regression.html#decision-tree-classifier

The example uses a VectorIndexer, but I am missing something. How does the featureIndexer know which columns are features? Isn't there something missing? Or is the featureIndexer able to figure out by itself which columns of the DataFrame are features?

    import org.apache.spark.ml.feature.{StringIndexer, VectorIndexer}

    val labelIndexer = new StringIndexer()
      .setInputCol("label")
      .setOutputCol("indexedLabel")
      .fit(data)

    // Automatically identify categorical features, and index them.
    val featureIndexer = new VectorIndexer()
      .setInputCol("features")
      .setOutputCol("indexedFeatures")
      .setMaxCategories(4) // features with > 4 distinct values are treated as continuous.
      .fit(data)

Using this code I get the following exception:

    Exception in thread "main" java.lang.IllegalArgumentException: Field "features" does not exist.
        at org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:266)
        at org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:266)
        at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
        at scala.collection.AbstractMap.getOrElse(Map.scala:59)
        at org.apache.spark.sql.types.StructType.apply(StructType.scala:265)
        at org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:40)
        at org.apache.spark.ml.feature.VectorIndexer.transformSchema(VectorIndexer.scala:141)
        at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74)
        at org.apache.spark.ml.feature.VectorIndexer.fit(VectorIndexer.scala:118)

What am I missing?

With kindest regards,
Marco
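
A possible explanation, offered as a sketch rather than a definitive answer: VectorIndexer does not discover feature columns on its own. setInputCol("features") names a single existing column of vector type, and the documentation example has that column because it loads its data in libsvm format, which yields "label" and "features" columns. If the DataFrame instead has separate numeric columns, one option is to combine them with VectorAssembler before fitting the indexer. The column names f1, f2, f3, the sample rows, and the local SparkSession below are assumptions for illustration only (SparkSession assumes Spark 2.x or later).

```scala
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler, VectorIndexer}
import org.apache.spark.sql.SparkSession

object FeaturesColumnSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption, not from the original post).
    val spark = SparkSession.builder()
      .appName("FeaturesColumnSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical raw data: a string label plus three numeric columns.
    val raw = Seq(
      ("yes", 1.0, 0.0, 3.5),
      ("no",  0.0, 1.0, 7.2),
      ("yes", 1.0, 1.0, 1.1),
      ("no",  0.0, 0.0, 9.4),
      ("yes", 1.0, 0.0, 5.8),
      ("no",  0.0, 1.0, 2.6)
    ).toDF("label", "f1", "f2", "f3")

    // Combine the numeric columns into one vector column named "features".
    val assembler = new VectorAssembler()
      .setInputCols(Array("f1", "f2", "f3"))
      .setOutputCol("features")
    val data = assembler.transform(raw)

    // Same indexers as in the documentation example; "features" now exists.
    val labelIndexer = new StringIndexer()
      .setInputCol("label")
      .setOutputCol("indexedLabel")
      .fit(data)

    val featureIndexer = new VectorIndexer()
      .setInputCol("features")
      .setOutputCol("indexedFeatures")
      .setMaxCategories(4) // features with > 4 distinct values are treated as continuous
      .fit(data)

    // Show the label index and the indexed feature vector side by side.
    featureIndexer.transform(labelIndexer.transform(data)).show(false)

    spark.stop()
  }
}
```

Calling data.printSchema() before fitting is a quick way to check whether a "features" column of vector type is actually present.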