My unsubstantiated guess is that most of these singular vectors could be replaced with random vectors with no impact. Every study I have seen that measures how many singular vectors are necessary also changes the dimensionality as it tests different counts, so the two effects are confounded. I think it would be better to hold the dimensionality constant and vary only how many of the vectors are true singular vectors and how many are random.
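
A rough sketch of that experiment, assuming NumPy and a toy low-rank-plus-noise matrix standing in for a real term-document matrix. The helper names (`mixed_basis`, `cosine_matrix`), the sizes, and the error measure (mean absolute change in document-document cosine similarity) are illustrative assumptions, not taken from any existing study or library. The projected space always has `k` dimensions; only the number of true singular vectors in the basis changes.

```python
import numpy as np

def mixed_basis(U, k, n_singular, rng):
    """Build an (n_terms, k) orthonormal basis from the top n_singular
    left singular vectors plus k - n_singular random directions."""
    n_terms = U.shape[0]
    top = U[:, :n_singular]
    rand = rng.standard_normal((n_terms, k - n_singular))
    # Reduced QR keeps the span of the singular vectors intact and
    # orthonormalizes the random columns against them.
    Q, _ = np.linalg.qr(np.hstack([top, rand]))
    return Q

def cosine_matrix(X):
    """Pairwise cosine similarity between the columns of X."""
    norms = np.linalg.norm(X, axis=0, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)
    return Xn.T @ Xn

rng = np.random.default_rng(0)

# Hypothetical toy data: rank-10 signal plus small noise.
n_terms, n_docs, true_rank, k = 300, 200, 10, 50
A = rng.standard_normal((n_terms, true_rank)) @ rng.standard_normal((true_rank, n_docs))
A += 0.1 * rng.standard_normal((n_terms, n_docs))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Reference: all k basis vectors are true singular vectors.
reference = cosine_matrix(U[:, :k].T @ A)

results = {}
for n_singular in (0, 5, 10, 25, 50):
    B = mixed_basis(U, k, n_singular, rng)
    docs_k = B.T @ A  # documents in the fixed k-dimensional space
    results[n_singular] = np.abs(cosine_matrix(docs_k) - reference).mean()
    print(n_singular, results[n_singular])
```

If the guess is right, the error should flatten out well before `n_singular` reaches `k`. On a real corpus one would swap the toy matrix for the actual term-document matrix and the cosine comparison for whatever retrieval metric matters.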

On Tue, Jul 6, 2010 at 2:27 PM, Jake Mannix <[email protected]> wrote:

> My rule of thumb has been that for text type stuff (i.e. LSI/LSA), something
> around 200-400 is the most you'll ever need. For smaller corpora and/or
> vocabularies, even below the bottom end of this range is fine
