Yes, `GradientDescent` == (batch-)SGD.
That was also my first idea of how to implement it. However, what happens
if the regularization is specific to the algorithm actually used? For
example, L-BFGS with L1 regularization requires a different
`parameterUpdate` step (Orthant-Wise Limited-memory Quasi-Newton, OWL-QN).
+1
This separation was the idea from the start. There is a trade-off between
having highly configurable optimizers and ensuring that each type of
regularization can only be applied to optimization algorithms that support
it.
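One way to encode that constraint, as a hedged sketch with hypothetical names, is to let each solver declare via a type bound which penalties it accepts, so an unsupported combination fails at compile time:

```scala
// Hypothetical sketch: encode which penalties a solver supports in its
// type, so invalid pairings are rejected by the compiler.
sealed trait Regularization
trait DifferentiableReg extends Regularization  // e.g. L2
trait NonSmoothReg extends Regularization       // e.g. L1

case class L2Reg(lambda: Double) extends DifferentiableReg
case class L1Reg(lambda: Double) extends NonSmoothReg

// A solver declares, via its type parameter, the penalties it can handle.
abstract class Solver[R <: Regularization] {
  def optimize(reg: R): Unit
}

// Plain L-BFGS needs a smooth objective, so it only takes differentiable
// penalties; OWL-QN is the variant built for L1.
class LBFGS extends Solver[DifferentiableReg] {
  def optimize(reg: DifferentiableReg): Unit = { /* smooth update */ }
}
class OWLQN extends Solver[NonSmoothReg] {
  def optimize(reg: NonSmoothReg): Unit = { /* orthant-wise update */ }
}
```

With this, `new LBFGS().optimize(L1Reg(0.1))` fails to type-check while `new OWLQN().optimize(L1Reg(0.1))` is accepted; the cost is exactly the loss of free configurability described above.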
It comes down to viewing the optimization framework mostly as a