Do these algorithms have good locality? For giant online computations, it might be worth storing the matrices in memory-mapped files. Otherwise, give up and get the M/R SGD code in.
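To make the memory-mapped idea concrete, here is a minimal sketch in plain Java NIO. MappedMatrix is a hypothetical helper, not a Mahout class, and nothing in Mahout is assumed to accept it as a drop-in matrix. It only shows a row-major matrix of doubles that lives in a mapped file rather than on the heap. Whether this performs well depends on the locality question above.

```java
import java.io.IOException;
import java.nio.DoubleBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper (not part of Mahout): a row-major double matrix backed by a memory-mapped file. */
final class MappedMatrix implements AutoCloseable {
    private final FileChannel channel;
    private final DoubleBuffer data;
    private final int cols;

    MappedMatrix(Path file, int rows, int cols) throws IOException {
        long bytes = (long) rows * cols * Double.BYTES;
        // A single FileChannel mapping cannot exceed Integer.MAX_VALUE bytes.
        if (bytes > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("matrix too large for a single mapping");
        }
        this.cols = cols;
        this.channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE);
        // Mapping in READ_WRITE mode grows the file to the requested size if needed.
        this.data = channel.map(FileChannel.MapMode.READ_WRITE, 0, bytes).asDoubleBuffer();
    }

    double get(int row, int col) {
        return data.get(row * cols + col);
    }

    void set(int row, int col, double value) {
        data.put(row * cols + col, value);
    }

    @Override
    public void close() throws IOException {
        // The mapping itself stays valid until it is garbage collected.
        channel.close();
    }
}
```

One 104 x 100,000 matrix of doubles is about 83 MB, so each matrix fits comfortably in a single mapping. The OS then pages data in and out on demand, which only helps if the access pattern is reasonably local.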

On Tue, Jan 3, 2012 at 2:59 PM, Ted Dunning <[email protected]> wrote:
> Your math is correct.
>
> When you say you have 105 features, what do you mean? Are these textual
> features? Or what?
>
> On Tue, Jan 3, 2012 at 2:53 PM, Grant Ingersoll <[email protected]> wrote:
>
>> I'm trying to run the full ASF email SGD classifier problem and am facing
>> heap size issues. My current setup has 105 features and I am using a
>> cardinality of 100K. I'm using the AdaptiveLogisticRegression. I'm
>> getting heap errors and they occur when trying to construct the ALR class
>> (i.e. not later during training).
>>
>> Just trying to check my math on memory:
>> ALR comes with 20 CrossFoldLearners (CFL) and each of those comes with 5
>> OnlineLogisticRegression instances, which each have a DenseMatrix of
>> (numFeatures - 1) x cardinality, plus some other vectors.
>>
>> This means, in my case, I have:
>> 20 x 5 x (104 x 100,000 x sizeof(double)) = 332,800,000,000 bits = ~39 GB
>>
>> Am I understanding the major parts of memory for ALR correctly? In other
>> words, I need to tone down the number of CFLs in the TrainASFEmail.java
>> file so as to not use 20 CFLs, right?

--
Lance Norskog
[email protected]
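
As a sanity check on the quoted estimate, here is a minimal Java sketch that multiplies the terms exactly as written in the quoted message. AlrMemoryEstimate and matrixBytes are hypothetical names, not Mahout code. The sketch takes the counts from the thread at face value (20 CFLs, 5 OLR instances each, one (numFeatures - 1) x cardinality matrix of 8-byte doubles per OLR) and ignores the "other vectors".

```java
/** Hypothetical helper for checking the heap estimate in this thread; not part of Mahout. */
final class AlrMemoryEstimate {
    /**
     * Bytes needed for the dense matrices alone, using the layout described
     * in the quoted message: one (numFeatures - 1) x cardinality matrix of
     * doubles per OnlineLogisticRegression instance.
     */
    static long matrixBytes(int crossFoldLearners, int olrPerLearner,
                            int numFeatures, int cardinality) {
        long doublesPerMatrix = (long) (numFeatures - 1) * cardinality;
        return (long) crossFoldLearners * olrPerLearner * doublesPerMatrix * Double.BYTES;
    }

    /** Converts a byte count to GiB (2^30 bytes). */
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }
}
```

With the numbers from the thread, matrixBytes(20, 5, 105, 100_000) gives 8,320,000,000 bytes (66,560,000,000 bits), or roughly 7.7 GiB. The quoted figure of 332,800,000,000 bits is five times that product, so one factor may have been counted twice. Either way, the matrices are far larger than a typical default heap.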
