How much time do you have available for training?

If you can do feature encoding in parallel, then you can probably do this
pretty fast with SGD.
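
A minimal sketch of that split, assuming hashed categorical features and a plain per-example SGD update for logistic regression. This is not Mahout's implementation: the names `encode`, `encode_all`, `sgd_train`, the field names, and `NUM_BUCKETS` are all hypothetical. Encoding is a pure function of one record, so it can be mapped over workers, while the learner consumes the encoded vectors sequentially.

```python
import math
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_BUCKETS = 1 << 10  # hypothetical size of the hashed feature space


def encode(record):
    """Hash a dict of categorical fields into a sparse {index: value} vector."""
    vec = {}
    for name, value in record.items():
        # crc32 is used because it is stable across runs, unlike built-in hash().
        idx = zlib.crc32(f"{name}={value}".encode("utf-8")) % NUM_BUCKETS
        vec[idx] = vec.get(idx, 0.0) + 1.0
    return vec


def encode_all(records, workers=8):
    """Encode records concurrently; map() returns results in input order."""
    # Threads keep the sketch self-contained. For CPU-bound encoding in
    # CPython, a process pool would be the usual choice (assumption).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode, records))


def predict(weights, vec):
    """Logistic prediction for one sparse vector."""
    z = sum(weights[i] * v for i, v in vec.items())
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp()
    return 1.0 / (1.0 + math.exp(-z))


def sgd_train(examples, learning_rate=0.1, passes=1):
    """Sequential SGD: one gradient step per (vector, label) example."""
    weights = [0.0] * NUM_BUCKETS
    for _ in range(passes):
        for vec, label in examples:
            error = label - predict(weights, vec)
            for i, v in vec.items():
                weights[i] += learning_rate * error * v
    return weights


# Toy usage with made-up click records: label 1 = click, 0 = no click.
raw = [({"ad": "a1", "site": "s1"}, 1), ({"ad": "a2", "site": "s1"}, 0)] * 50
vectors = encode_all([record for record, _ in raw])
weights = sgd_train(list(zip(vectors, [label for _, label in raw])), passes=5)
```

Only the encoding step is spread over workers here; the weight updates stay on one thread, which matches the idea of keeping SGD itself sequential.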

My guess is that you can push 2-20 MB/s of data through SGD with your kind
of data on a good 8-core processor.  If you pre-process your data into 8
bytes per dimension, this is 0.25 - 2.5 million data points per second.  This
could mean that your training takes less than an hour.  If your training
converges with less data, you may do even better.
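
The arithmetic behind that estimate can be sketched as follows. The 2-20 MB/s range and the 8 bytes per value come from the paragraph above, and the 900M record count comes from the quoted message below. The number of encoded values per record is not stated anywhere in the thread, so `values_per_record` is a hypothetical parameter, and the function names are made up for illustration.

```python
def records_per_second(bytes_per_second, values_per_record, bytes_per_value=8):
    """Records per second the learner can consume at a given byte rate."""
    return bytes_per_second / (values_per_record * bytes_per_value)


def training_hours(num_records, bytes_per_second, values_per_record,
                   bytes_per_value=8):
    """Hours for one pass over num_records at the given byte rate."""
    rate = records_per_second(bytes_per_second, values_per_record,
                              bytes_per_value)
    return num_records / rate / 3600.0


# One pass over 900M records, for a few assumed record sizes.
for values in (1, 10, 1000):  # assumed encoded values per record
    slow = training_hours(900e6, 2e6, values)   # at 2 MB/s
    fast = training_hours(900e6, 20e6, values)  # at 20 MB/s
    print(f"{values} values/record: {fast:.1f} to {slow:.1f} hours")
```

The result scales linearly with the number of values each record carries, so sparse encodings matter a great deal for whether a single pass fits in an hour.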

Is that not acceptable?

On Mon, Apr 25, 2011 at 10:11 PM, Stanley Xu <[email protected]> wrote:

> Thanks Ted. Read the paper and the code and got the rough idea of how the
> iteration goes. Thanks so much.
>
> With the data scale we have now, we are considering whether we could train
> on more data with Logistic Regression. For example, suppose we wanted to
> train a model for CTR prediction on the last 90 days of data. That would be
> 900M records after down-sampling, and assume there are 1000 feature
> dimensions. It would still be very slow on a single machine with the
> current SGD algorithm.
>
> I am wondering if there is a parallel algorithm with map-reduce that I
> could use for Logistic Regression? The original Newton-Raphson will take
> N*N*M/P according to the "Map-Reduce for Machine Learning on Multicore"
> paper, which is much slower than SGD on a single machine in a
> high-dimensional space.
>
> Could an algorithm like IRLS be parallelized, or is there an approximate
> algorithm that could be parallelized?
>
> Thanks,
> Stanley Xu
>
>
>
> On Mon, Apr 25, 2011 at 11:58 PM, Ted Dunning <[email protected]>
> wrote:
>
> > Paul K described in-memory algorithms in his dissertation.  Mahout uses
> > on-line algorithms which are not limited by memory size.
> >
> > The method used in Mahout is closer to what Bob Carpenter describes here:
> > http://lingpipe.files.wordpress.com/2008/04/lazysgdregression.pdf
> >
> > The most important additions in Mahout are:
> >
> > a) confidence weighted learning rates per term
> >
> > b) evolutionary tuning of hyper-parameters
> >
> > c) mixed ranking and regression
> >
> > d) grouped AUC
> >
> > On Mon, Apr 25, 2011 at 6:12 AM, Stanley Xu <[email protected]> wrote:
> >
> > > Dear All,
> > >
> > > I am going through the Mahout SGD algorithm and reading the "Logistic
> > > Regression for Data Mining and High-Dimensional Classification" paper
> > > a little bit. I am wondering which algorithm exactly is used in the
> > > SGD code? There are quite a few algorithms mentioned in the paper, and
> > > it is a little hard for me to work out which one matches the code.
> > >
> > > Thanks in advance.
> > >
> > > Best wishes,
> > > Stanley Xu
> > >
> >
>
