Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19527
The behavior I had in mind is:
* keep=true, dropLast=true ==> error (reject this option combination)
* keep=true, dropLast=false ==> vector size n (all-0 only for invalid value)
For the `dropLast = false` case, it behaves similarly to
`sklearn.preprocessing.OneHotEncoder`.
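For reference, a minimal scikit-learn snippet (assuming a reasonably recent sklearn; the column values are made up) showing that analogous behavior: with `handle_unknown='ignore'`, every category keeps its own slot and an unseen value is encoded as an all-zero row.

```python
from sklearn.preprocessing import OneHotEncoder
import numpy as np

# Fit on three known categories; unseen values are "ignored", i.e. kept
# in the output but encoded as an all-zero row (no slot is dropped).
enc = OneHotEncoder(handle_unknown='ignore')
enc.fit(np.array([['a'], ['b'], ['c']]))

print(enc.transform(np.array([['a'], ['z']])).toarray())
# [[1. 0. 0.]
#  [0. 0. 0.]]   <- unseen 'z' becomes the all-zero row
```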
If we make it behave as:
* keep=true, dropLast=true ==> vector size n (all-0 only if there was an
invalid value)
then, for example with 5 categories, we don't know whether
`[0.0, 0.0, 0.0, 0.0, 0.0]` means the last category or an invalid value.
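To make the ambiguity concrete, here is a hypothetical sketch (not the PR's implementation; the function name, parameters, and exact vector length are illustrative and depend on whether an extra bucket is reserved for invalid values) in which the dropped last category and a kept invalid value both encode to the all-zero vector:

```python
def encode(index, num_categories, drop_last=True):
    """Hypothetical 'keep' encoder: invalid values are kept as all zeros."""
    size = num_categories - 1 if drop_last else num_categories
    vec = [0.0] * size
    if index is None or not (0 <= index < num_categories):
        return vec          # invalid value: kept, encoded as all zeros
    if index < size:
        vec[index] = 1.0    # normal one-hot slot
    return vec              # dropped last category: also all zeros

# The last category and an invalid value produce identical vectors,
# so a consumer of the output cannot tell them apart.
assert encode(index=4, num_categories=5) == encode(index=None, num_categories=5)
```

With `dropLast = false`, no real category maps to all zeros, so the all-zero vector can only mean an invalid value; that is why that case stays unambiguous.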