Raghuvarran V H created SPARK-32218:
---------------------------------------

             Summary: spark-ml must support one hot encoded output labels for classification
                 Key: SPARK-32218
                 URL: https://issues.apache.org/jira/browse/SPARK-32218
             Project: Spark
          Issue Type: Improvement
          Components: ML
    Affects Versions: 2.4.0
            Reporter: Raghuvarran V H


For classification targets whose labels have no ordinal relationship, it is 
commonly advised to one-hot encode the labels. See:

[https://stackoverflow.com/questions/51384911/one-hot-encoding-of-output-labels/53291690#53291690]

[https://www.linkedin.com/pulse/why-using-one-hot-encoding-classifier-training-adwin-jahn/]

spark-ml does not support one-hot encoded target labels. When I try, I get 
the following error:

IllegalArgumentException: u'requirement failed: Column label_ohe must be of 
type numeric but was actually of type 
struct<type:tinyint,size:int,indices:array<int>,values:array<double>>.'

It would be nice if one-hot encoding (OHE) were supported for target labels.
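
Until then, a possible workaround is to turn the one-hot vector back into a 
numeric class index before fitting, since both carry the same information. 
The sketch below is plain Python, not a Spark API: the helper name 
one_hot_to_index is hypothetical, and its arguments simply mirror the size, 
indices and values fields of the struct in the error above. It assumes the 
vector came from an encoder with dropLast enabled (the OneHotEncoder 
default), so an all-zero vector is read as the dropped last category.

```python
def one_hot_to_index(size, indices, values):
    """Return the class index encoded by a sparse one-hot vector.

    Hypothetical helper for illustration only. `size`, `indices` and
    `values` mirror the sparse-vector fields named in the error message.
    """
    # Positions that actually hold a non-zero entry.
    active = [i for i, v in zip(indices, values) if v != 0.0]

    if not active:
        # Assumption: dropLast was enabled, so all zeros means the
        # dropped last category, whose index equals the vector size.
        return float(size)

    if len(active) != 1 or not 0 <= active[0] < size:
        raise ValueError("not a one-hot vector")

    # Classifiers expect a numeric (double) label column.
    return float(active[0])


# Category 1 of a size-3 vector maps back to label 1.0.
label = one_hot_to_index(3, [1], [1.0])
```

In PySpark the same logic could presumably be wrapped in a UDF over the 
label_ohe column, or the label could be kept as the StringIndexer output 
and never encoded at all.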



--
This message was sent by Atlassian Jira
(v8.3.4#803005)