Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/19527
  
    The behavior I had in mind is:
    
    * keep=true, dropLast=true ==> error option
    * keep=true, dropLast=false ==> vector size n (all-0 only for invalid values)
    
    For the `dropLast = false` case, this behaves similarly to `sklearn.preprocessing.OneHotEncoder`.
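    
    As a point of reference, here is a minimal sklearn sketch (my own example, not code from this PR; assuming a recent scikit-learn API): with `handle_unknown='ignore'`, every known category keeps its own slot and an unseen value comes out as an all-zero vector, which is the `dropLast = false` behavior described above.
    
    ```python
    # Minimal illustration (assumes a recent scikit-learn release): unknown values
    # are encoded as the all-zero vector, while each known category has its own slot.
    from sklearn.preprocessing import OneHotEncoder
    
    enc = OneHotEncoder(handle_unknown='ignore')
    enc.fit([[0], [1], [2], [3], [4]])          # 5 known categories
    
    print(enc.transform([[4]]).toarray())       # [[0. 0. 0. 0. 1.]]  last valid category
    print(enc.transform([[9]]).toarray())       # [[0. 0. 0. 0. 0.]]  unseen/invalid value
    ```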
    
    If we make it behave as:
    
    * keep=true, dropLast=true ==> vector size n (all-0 only if there was an 
invalid value)
    
    Then, for example, with 5 categories we cannot tell whether `[0.0, 0.0, 0.0, 0.0, 0.0]` means the last category or an invalid value.
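    
    To make that ambiguity concrete, here is a purely illustrative Python sketch (my own, not Spark code; `encode` is a hypothetical helper) of an encoding where the dropped last category and a kept invalid value both map to all zeros:
    
    ```python
    # Hypothetical illustration of the ambiguity: assume 5 known categories and a
    # 5-element output vector (matching the example above), with the last category
    # dropped (all zeros) and invalid values also kept as all zeros.
    NUM_CATEGORIES = 5
    
    def encode(index):
        vec = [0.0] * NUM_CATEGORIES
        if 0 <= index < NUM_CATEGORIES - 1:
            vec[index] = 1.0              # categories 0..3 get an explicit slot
        # the last category (4) and any invalid index both stay all zeros
        return vec
    
    print(encode(3))    # [0.0, 0.0, 0.0, 1.0, 0.0]
    print(encode(4))    # [0.0, 0.0, 0.0, 0.0, 0.0]  last category
    print(encode(42))   # [0.0, 0.0, 0.0, 0.0, 0.0]  invalid value -- indistinguishable
    ```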


