It scales better than producing the vectors does! Seriously, whatever is producing the vectors can easily produce counts, even if there are many counts. The SVD driver code can read and summarize many, many counts in essentially zero time.
On Mon, Jul 5, 2010 at 4:46 PM, Grant Ingersoll <[email protected]> wrote:

> > Yes and no. The number of rows should be the number of documents you
> > vectorized. The number of columns should be the number of distinct terms
> > that you observed in vectorizing. Both should be pretty easily
> > available.
>
> Yeah, I can count the rows w/ the VectorDumper, but that doesn't really
> scale.
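To make the point concrete: the row and column counts fall out of vectorization for free, so no second pass over the vectors is needed. The sketch below assumes plain whitespace tokenization and an in-memory term dictionary. `vectorize_with_counts` is a hypothetical helper for illustration, not Mahout's vectorizer or any Mahout API.

```python
from collections import Counter


def vectorize_with_counts(documents):
    """Hypothetical sketch (not Mahout code): build sparse term-count vectors
    and record the matrix dimensions an SVD driver would need, in one pass.

    Assumes lowercase whitespace tokenization; a real vectorizer would use
    a proper analyzer and write vectors out instead of keeping them in memory.
    """
    dictionary = {}  # term -> column index, assigned the first time a term is seen
    vectors = []     # one sparse row (column index -> term frequency) per document
    for doc in documents:
        counts = Counter(doc.lower().split())
        row = {}
        for term, tf in counts.items():
            col = dictionary.setdefault(term, len(dictionary))
            row[col] = tf
        vectors.append(row)
    num_rows = len(vectors)     # number of documents vectorized
    num_cols = len(dictionary)  # number of distinct terms observed
    return vectors, num_rows, num_cols


vectors, num_rows, num_cols = vectorize_with_counts(["the cat sat", "the dog sat down"])
print(num_rows, num_cols)
```

Both counts are just a list length and a dictionary size kept alongside the vectors, which is why the producer can report them at essentially no extra cost.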
