Re: Please Help with DecisionTree/FeatureIndexer

2017-12-19 Thread Weichen Xu
Hi Marco, do not call any individual fit/transform yourself. You only need to call `pipeline.fit`/`pipelineModel.transform`, like the following: val assembler = new VectorAssembler().setInputCols(inputData.columns.filter(_ != "Severity")).setOutputCol("features") val data =
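
For readers of the archive, the advice above can be sketched roughly as follows. This is a minimal illustration, not the code from the original message: the toy rows, the column names `col1`/`col2`/`col3`/`label` (borrowed from the layout Marco describes elsewhere in the thread), the `PipelineSketch` object, and the `setMaxCategories(3)` value are all assumptions. The label-conversion stage is left out to keep the sketch short.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler, VectorIndexer}
import org.apache.spark.sql.SparkSession

object PipelineSketch {
  // Builds an unfitted pipeline; no stage is fitted or applied by hand.
  def buildPipeline(featureCols: Array[String]): Pipeline = {
    // Combine the raw feature columns into one Vector column.
    val assembler = new VectorAssembler()
      .setInputCols(featureCols)
      .setOutputCol("features")

    // Map the string label to a numeric index.
    val labelIndexer = new StringIndexer()
      .setInputCol("label")
      .setOutputCol("indexedLabel")

    // Treat features with at most 3 distinct values as categorical (assumed value).
    val featureIndexer = new VectorIndexer()
      .setInputCol("features")
      .setOutputCol("indexedFeatures")
      .setMaxCategories(3)

    val dt = new DecisionTreeClassifier()
      .setLabelCol("indexedLabel")
      .setFeaturesCol("indexedFeatures")

    new Pipeline().setStages(Array(assembler, labelIndexer, featureIndexer, dt))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PipelineSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical toy data in the |col1|col2|col3|label| layout.
    val inputData = Seq(
      (0.0, 1.0, 10.5, "low"),
      (1.0, 0.0, 3.2, "high"),
      (0.0, 2.0, 7.7, "low"),
      (1.0, 1.0, 1.1, "high")
    ).toDF("col1", "col2", "col3", "label")

    val pipeline = buildPipeline(inputData.columns.filter(_ != "label"))

    // The only fit/transform calls are on the pipeline and its fitted model.
    val model = pipeline.fit(inputData)
    model.transform(inputData)
      .select("features", "indexedLabel", "prediction")
      .show()

    spark.stop()
  }
}
```

Because every stage sits inside the pipeline, `pipeline.fit` fits each estimator on the output of the stages before it, so the `features` column exists by the time the `VectorIndexer` needs it.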

Re: Please Help with DecisionTree/FeatureIndexer

2017-12-18 Thread Weichen Xu
Hi Marco, If you add the assembler at the start of the pipeline, like: ``` val pipeline = new Pipeline() .setStages(Array(assembler, labelIndexer, featureIndexer, dt, labelConverter)) ``` which error do you get? I think it should work fine if the `assembler` is added to the pipeline. Thanks. On

Re: Please Help with DecisionTree/FeatureIndexer

2017-12-16 Thread Weichen Xu
Hi Marco, Yes, you can apply `VectorAssembler` first in the pipeline to assemble multiple feature columns. Thanks. On Sun, Dec 17, 2017 at 6:33 AM, Marco Mistroni wrote: > Hello Wei > Thanks, I should have checked the data > My data has this format >

Re: Please Help with DecisionTree/FeatureIndexer

2017-12-16 Thread Marco Mistroni
Hello Wei, Thanks, I should have checked the data. My data has this format: |col1|col2|col3|label| so it looks like I cannot use VectorIndexer directly (it accepts a Vector column). I am guessing what I should do is something like this (given I have a few categorical features) val assembler = new

Re: Please Help with DecisionTree/FeatureIndexer

2017-12-16 Thread Weichen Xu
Hi Marco, val data = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt") The data now includes a feature column named "features". val featureIndexer = new VectorIndexer() .setInputCol("features") <-- Here you specify the "features" column to index.
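
As a rough illustration of the point above: `VectorIndexer` reads whichever Vector column `setInputCol` names and decides per feature whether it is categorical, based on the number of distinct values compared with `maxCategories`. The toy vectors, the `VectorIndexerSketch` object, and the `maxCategories` value of 2 below are assumptions chosen for the sketch, not taken from the thread.

```scala
import org.apache.spark.ml.feature.VectorIndexer
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object VectorIndexerSketch {
  // Returns the indices of the features that VectorIndexer treats as categorical.
  def categoricalIndices(spark: SparkSession): Set[Int] = {
    import spark.implicits._

    // Hypothetical data: feature 0 takes two distinct values, feature 1 takes three.
    val data = Seq(
      (0, Vectors.dense(0.0, 10.5)),
      (1, Vectors.dense(1.0, 3.2)),
      (2, Vectors.dense(0.0, 7.7))
    ).toDF("id", "features")

    val featureIndexer = new VectorIndexer()
      .setInputCol("features")        // the Vector column to inspect
      .setOutputCol("indexedFeatures")
      .setMaxCategories(2)            // assumed threshold for this sketch

    // Fitted directly here only to inspect the result; inside a Pipeline,
    // pipeline.fit would do this step.
    val indexerModel = featureIndexer.fit(data)

    // categoryMaps is keyed by the index of each categorical feature.
    indexerModel.categoryMaps.keys.toSet
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("VectorIndexerSketch")
      .master("local[*]")
      .getOrCreate()

    println(s"Categorical feature indices: ${categoricalIndices(spark)}")

    spark.stop()
  }
}
```

Features with more distinct values than `maxCategories` are left as continuous and pass through unchanged.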

Please Help with DecisionTree/FeatureIndexer

2017-12-15 Thread Marco Mistroni
Hi all, I am trying to run a sample decision tree, following the examples here (for MLlib): https://spark.apache.org/docs/latest/ml-classification-regression.html#decision-tree-classifier The example seems to use a VectorIndexer, however I am missing something. How does the featureIndexer know