Paul K described in-memory algorithms in his dissertation. Mahout uses on-line algorithms, which are not limited by memory size.
The method used in Mahout is closer to what Bob Carpenter describes here:
http://lingpipe.files.wordpress.com/2008/04/lazysgdregression.pdf

The most important additions in Mahout are:

a) confidence-weighted learning rates per term
b) evolutionary tuning of hyper-parameters
c) mixed ranking and regression
d) grouped AUC

On Mon, Apr 25, 2011 at 6:12 AM, Stanley Xu <[email protected]> wrote:
> Dear All,
>
> I am trying to go through the Mahout SGD algorithm and have been reading
> "Logistic Regression for Data Mining and High-Dimensional Classification"
> a bit. I am wondering which algorithm is actually used in the SGD code?
> Quite a few algorithms are mentioned in the paper, and it is a little hard
> for me to figure out which one matches the code.
>
> Thanks in advance.
>
> Best wishes,
> Stanley Xu
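To make addition (a) concrete, here is a rough sketch in Python of SGD logistic regression with a per-term learning rate that decays with how often each term has been updated. This is illustrative only, not Mahout's actual code; the class, field names, and the simple 1/(1+count) decay schedule are all assumptions for the sketch.

```python
import math

class PerTermSgdLogistic:
    """Sketch of SGD logistic regression with per-term learning rates.
    Names and the decay schedule are illustrative, not Mahout's code."""

    def __init__(self, num_features, base_rate=0.1):
        self.w = [0.0] * num_features
        self.counts = [0] * num_features  # updates seen per term
        self.base_rate = base_rate

    def predict(self, x):
        # x is a sparse example: {feature_index: value}
        score = sum(self.w[j] * v for j, v in x.items())
        return 1.0 / (1.0 + math.exp(-score))

    def train(self, x, y):
        err = y - self.predict(x)
        for j, v in x.items():
            # The step size shrinks for frequently seen terms, so rare
            # terms keep larger updates (the confidence-weighting idea).
            rate = self.base_rate / (1.0 + self.counts[j])
            self.w[j] += rate * err * v
            self.counts[j] += 1

model = PerTermSgdLogistic(num_features=2)
for _ in range(200):
    model.train({0: 1.0}, 1)  # feature 0 marks the positive class
    model.train({1: 1.0}, 0)  # feature 1 marks the negative class
print(model.predict({0: 1.0}) > 0.5)  # True
```

The real implementation does this lazily for sparse data (updating a term's rate and regularization only when the term actually occurs), which is the point of the Carpenter write-up linked above.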
