Hi Marco,
Do not call any stage's fit/transform yourself. You only need to call
`pipeline.fit`/`pipelineModel.transform`, like the following:
```
val assembler = new VectorAssembler()
  .setInputCols(inputData.columns.filter(_ != "Severity"))
  .setOutputCol("features")
```
val data =
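Putting the whole flow together, here is a minimal sketch. The input path, the `Severity` label column, and the `maxCategories` value are illustrative assumptions (assuming a `spark` session as in spark-shell), not Marco's actual job:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler, VectorIndexer}

// Assumed input: a CSV with a string label column "Severity" plus numeric feature columns.
val inputData = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("accidents.csv")  // hypothetical path

val assembler = new VectorAssembler()
  .setInputCols(inputData.columns.filter(_ != "Severity"))
  .setOutputCol("features")

val labelIndexer = new StringIndexer()
  .setInputCol("Severity")
  .setOutputCol("indexedLabel")

val featureIndexer = new VectorIndexer()
  .setInputCol("features")
  .setOutputCol("indexedFeatures")
  .setMaxCategories(4)  // treat columns with <= 4 distinct values as categorical

val dt = new DecisionTreeClassifier()
  .setLabelCol("indexedLabel")
  .setFeaturesCol("indexedFeatures")

// One fit call drives fit/transform for every stage in order.
val pipeline = new Pipeline()
  .setStages(Array(assembler, labelIndexer, featureIndexer, dt))

val model = pipeline.fit(inputData)
val predictions = model.transform(inputData)
```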
Hi Marco,
If you add the assembler as the first stage of the pipeline, like:
```
val pipeline = new Pipeline()
.setStages(Array(assembler, labelIndexer, featureIndexer, dt,
labelConverter))
```
Which error did you get?
I think it will work fine if the `assembler` is added to the pipeline.
Thanks.
On
Hi Marco,
Yes, you can apply `VectorAssembler` first in the pipeline to assemble
multiple feature columns.
Thanks.
On Sun, Dec 17, 2017 at 6:33 AM, Marco Mistroni wrote:
> Hello Wei,
> Thanks, I should have checked the data.
> My data has this format:
>
Hello Wei,
Thanks, I should have checked the data.
My data has this format:
|col1|col2|col3|label|
so it looks like I cannot use VectorIndexer directly (it accepts a Vector
column).
I am guessing what I should do is something like this (given I have a few
categorical features):
val assembler = new
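For a frame shaped like the table above, the guessed assembler might look like this (column names are the placeholders from the table; note that any categorical *string* columns would each need a `StringIndexer` first, since `VectorAssembler` only accepts numeric, boolean, and vector inputs):

```scala
import org.apache.spark.ml.feature.VectorAssembler

// col1/col2/col3 are the placeholder names from the table above.
val assembler = new VectorAssembler()
  .setInputCols(Array("col1", "col2", "col3"))
  .setOutputCol("features")
```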
Hi Marco,
```
val data = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
```
The data now includes a feature column named "features",
```
val featureIndexer = new VectorIndexer()
  .setInputCol("features")  // <-- here, specify the "features" column to index
```
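Spelled out, the indexer from the docs example looks like this; `maxCategories` is what tells it which features to treat as categorical (features with more distinct values are left continuous), and `fit` scans the data to make that decision:

```scala
import org.apache.spark.ml.feature.VectorIndexer

val featureIndexer = new VectorIndexer()
  .setInputCol("features")         // the assembled vector column to index
  .setOutputCol("indexedFeatures")
  .setMaxCategories(4)             // <= 4 distinct values => treated as categorical
  .fit(data)                       // scans data to decide which features are categorical
```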
Hi all,
I am trying to run a sample decision tree, following the examples here (for
MLlib):
https://spark.apache.org/docs/latest/ml-classification-regression.html#decision-tree-classifier
The example seems to use a VectorIndexer; however, I am missing something.
How does the featureIndexer know