Vincent created SPARK-18023:
-------------------------------

             Summary: Adam optimizer
                 Key: SPARK-18023
                 URL: https://issues.apache.org/jira/browse/SPARK-18023
             Project: Spark
          Issue Type: New Feature
          Components: ML, MLlib
            Reporter: Vincent
            Priority: Minor


SGD methods can converge very slowly, or even diverge, if their learning rate alpha is set 
inappropriately. Many alternative methods have been proposed to achieve good convergence 
with less dependence on hyperparameter settings and to help escape local optima, e.g. 
Momentum, NAG (Nesterov's Accelerated Gradient), Adagrad, RMSProp, etc.
Among these, Adam is one of the most popular algorithms for first-order gradient-based 
optimization of stochastic objective functions. It has proved to be well suited for problems 
with large data and/or many parameters, as well as for problems with noisy and/or sparse 
gradients, and it is computationally efficient. Refer to this paper for details: 
https://arxiv.org/pdf/1412.6980v8.pdf
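
For reference, a minimal, self-contained sketch of the per-parameter Adam update rule from 
that paper (step size alpha, decay rates beta1/beta2, bias-corrected moment estimates) could 
look like the Scala below. The names AdamState and adamStep are illustrative only, not an 
existing Spark API; a real patch would presumably plug into the mllib.optimization 
Gradient/Updater abstractions instead.

{code:scala}
// Illustrative sketch of the Adam update rule; not an existing Spark API.
object AdamSketch {

  // Per-parameter first (m) and second (v) moment estimates, plus step count t.
  case class AdamState(m: Array[Double], v: Array[Double], t: Int)

  def adamStep(
      weights: Array[Double],
      gradient: Array[Double],
      state: AdamState,
      alpha: Double = 0.001,   // step size
      beta1: Double = 0.9,     // exponential decay rate for the first moment
      beta2: Double = 0.999,   // exponential decay rate for the second moment
      eps: Double = 1e-8): (Array[Double], AdamState) = {
    val t = state.t + 1
    val m = new Array[Double](weights.length)
    val v = new Array[Double](weights.length)
    val updated = new Array[Double](weights.length)
    var i = 0
    while (i < weights.length) {
      val g = gradient(i)
      m(i) = beta1 * state.m(i) + (1 - beta1) * g       // biased first moment estimate
      v(i) = beta2 * state.v(i) + (1 - beta2) * g * g   // biased second moment estimate
      val mHat = m(i) / (1 - math.pow(beta1, t))        // bias-corrected first moment
      val vHat = v(i) / (1 - math.pow(beta2, t))        // bias-corrected second moment
      updated(i) = weights(i) - alpha * mHat / (math.sqrt(vHat) + eps)
      i += 1
    }
    (updated, AdamState(m, v, t))
  }
}
{code}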

In fact, TensorFlow has implemented most of the adaptive optimization methods mentioned 
above, and we have seen Adam outperform most SGD methods in certain cases, such as training 
an FM model on a very sparse dataset.

It would be nice for Spark to have these adaptive optimization methods.



