Michel Lemay created SPARK-23216:
------------------------------------

             Summary: Multiclass LogisticRegression could have methods like 
NCE, NEG, Hierarchical SoftMax, Blackout or IS
                 Key: SPARK-23216
                 URL: https://issues.apache.org/jira/browse/SPARK-23216
             Project: Spark
          Issue Type: Improvement
          Components: ML, MLlib
    Affects Versions: 2.2.1
            Reporter: Michel Lemay


When training a classifier with a large number of classes, performance sinks. 
This is expected when using regular (log-)softmax methods to compute the loss, 
since each class score must be normalized by the sum of the scores of all 
classes.

I think it would be helpful to have approximate methods such as Hierarchical 
SoftMax, NCE, NEG, or IS to speed up training.
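
To make the cost difference concrete, here is a minimal sketch in plain Python 
(not Spark code). It compares the exact log-softmax loss, which touches every 
class score, with a NEG-style surrogate that only touches the target and a few 
sampled classes. The function names, the uniform negative sampler, and the 
parameter num_negatives are assumptions made for illustration; they are not 
part of any existing Spark ML API.

```python
import math
import random


def full_softmax_loss(scores, target):
    """Exact negative log-likelihood under softmax; cost is O(K) in the class count."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[target]


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def negative_sampling_loss(scores, target, num_negatives, rng):
    """NEG-style surrogate loss; cost is O(num_negatives), independent of K.

    Assumption: negatives are drawn uniformly from the non-target classes.
    This is a different objective, not an unbiased estimate of the softmax loss.
    """
    loss = -log_sigmoid(scores[target])
    for _ in range(num_negatives):
        k = rng.randrange(len(scores) - 1)
        if k >= target:
            k += 1  # skip over the target so only other classes are sampled
        loss -= log_sigmoid(-scores[k])
    return loss


if __name__ == "__main__":
    rng = random.Random(0)
    scores = [rng.gauss(0.0, 1.0) for _ in range(100000)]
    print(full_softmax_loss(scores, target=42))               # reads 100000 scores
    print(negative_sampling_loss(scores, 42, 5, rng))         # reads 6 scores
```

The two losses are not numerically comparable; the point is only that the 
second one does a fixed amount of work per example regardless of the number 
of classes.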

A paper comparing different methods for approximate normalization over all 
classes:
[http://web4.cs.ucl.ac.uk/staff/D.Barber/publications/AISTATS2017.pdf]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
