The model size is very simple to estimate. If you have k categories and m features, the model size will be roughly (k-1) * m * s1 + m * s2 + s3 bytes, where s1 is roughly 8 bytes, s2 is about 4 bytes, and s3 is probably around 100 bytes. These are approximate numbers and could be off by a factor of 2 if I forgot something. The first product is the beta matrix, the second is the learning rate vector, and the third is the object overhead. There might be an additional component to s2 to account for per-row overhead of beta.
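
As a rough sketch of that arithmetic in Python: the function name and the example of 20 categories and 100,000 features are made up for illustration, the byte sizes are the approximate ones given above, and the possible per-row overhead of beta is not included.

```python
def estimate_sgd_model_bytes(k, m, s1=8, s2=4, s3=100):
    """Approximate SGD model size in bytes for k categories and m features.

    s1, s2 and s3 are the rough per-entry sizes quoted in this message
    (assumptions, not measured values).
    """
    beta_matrix = (k - 1) * m * s1    # (k-1) x m coefficient matrix
    learning_rates = m * s2           # one learning-rate entry per feature
    object_overhead = s3              # fixed overhead of the model object
    return beta_matrix + learning_rates + object_overhead


if __name__ == "__main__":
    # Hypothetical example: 20 categories, 100,000 features.
    size = estimate_sgd_model_bytes(k=20, m=100_000)
    print(f"about {size / 1e6:.1f} MB")
```

For a model of any real size the beta matrix dominates, so the estimate is essentially 8 * (k-1) * m bytes.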

On Tue, Sep 4, 2012 at 2:05 PM, Grant Ingersoll <[email protected]> wrote:

> Hi,
>
> I'm wondering if any has any rules of thumb around model size and memory
> usage for SGD? I'm doing some testing of it myself, but thought I would
> ask to see how it compares.
>
> Thanks,
> Grant
