No.  They don't have particularly good locality.  They would have moderate
hotspots, but these would be scattered all over.  The hotspots might let the L2
cache help, but they would not make disk-based data workable.

The major opportunity for improvement here is to incorporate some of the
advances that VW has recently pioneered.

I owe the list a long email summarizing a call that John Langford and I had
last month, but I am running pretty hard and fast at work just now.
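
For anyone who wants to sanity-check the heap estimate quoted below, here is a
rough back-of-envelope sketch in plain Java (the class name is just for
illustration).  It counts only the dense coefficient matrices, assumes 8 bytes
per double, and takes the learner counts and matrix shape from Grant's
description, so treat it as a lower bound rather than an exact figure:

    public class AlrMemorySketch {
      public static void main(String[] args) {
        // Counts taken from the description quoted below: a pool of
        // CrossFoldLearners, each holding several OnlineLogisticRegression
        // instances, each with one dense coefficient matrix of doubles.
        long learners = 20;       // CrossFoldLearners kept by the ALR pool
        long folds = 5;           // OnlineLogisticRegression instances per CFL
        long rows = 104;          // rows in each dense coefficient matrix
        long cols = 100000;       // columns (the 100K cardinality)
        long bytesPerDouble = 8;  // a Java double is 8 bytes

        long matrixBytes = learners * folds * rows * cols * bytesPerDouble;
        System.out.printf("dense matrices alone: %,d bytes (~%.1f GiB)%n",
            matrixBytes, matrixBytes / (1024.0 * 1024.0 * 1024.0));
        // Auxiliary per-learner vectors and JVM object overhead come on top
        // of this, so the real heap requirement is larger.
      }
    }

Since the total scales linearly with each factor, shrinking the pool of
learners or the cardinality brings it down proportionally.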

On Tue, Jan 3, 2012 at 3:30 PM, Lance Norskog <[email protected]> wrote:

> Do these algorithms have good locality? For doing giant online
> computations it might be worth storing these in memory-mapped files.
> Or, give up and get the M/R SGD code in.
>
> On Tue, Jan 3, 2012 at 2:59 PM, Ted Dunning <[email protected]> wrote:
> > Your math is correct.
> >
> > When you say you have 105 features, what do you mean?  Are these textual
> > features?  Or what?
> >
> > On Tue, Jan 3, 2012 at 2:53 PM, Grant Ingersoll <[email protected]> wrote:
> >
> >> I'm trying to run the full ASF email SGD classifier problem and am
> >> facing heap size issues.  My current setup has 105 features and I am
> >> using a cardinality of 100K.  I'm using the AdaptiveLogisticRegression.
> >> I'm getting heap errors and they occur when trying to construct the ALR
> >> class (i.e. not later during training).
> >>
> >> Just trying to check my math on memory:
> >> ALR comes with 20 CrossFoldLearners (CFL) and each of those comes with 5
> >> OnlineLogisticRegression instances, which each have a DenseMatrix of
> >> (numFeatures - 1) X cardinality, plus some other vectors.
> >>
> >> This means, in my case, I have:
> >> 20 x 5 x (104 x 100,000 x sizeof(double)) = 332,800,000,000 bits = ~39 GB
> >>
> >> Am I understanding the major parts of memory for ALR correctly?  In other
> >> words, I need to tone down the number of CFLs in the TrainASFEmail.java
> >> file so as to not use 20 CFLs, right?
>
>
>
> --
> Lance Norskog
> [email protected]
>
