Also, cleansvd appears to be spewing a bunch of numbers (something about largestCleanEigens) to the log. It's almost completely unreadable. Any objection to me demoting that output to debug level at a minimum? Or can it be removed altogether?

On Aug 8, 2010, at 3:35 PM, Grant Ingersoll wrote:

> Just to make sure I'm understanding, the docs for "clean SVD" at
> https://cwiki.apache.org/confluence/display/MAHOUT/Dimensional+Reduction are
> not correct, right?
>
> In looking at the code, the SVD command requires --Dmapred.input.dir (soon to
> be --input like everything else, see MAHOUT-461) a --tempDir and
> --Dmapred.output.dir (soon to be --output). Then, in the cleansvd command,
> the --eigenInput should actually refer to the Output directory not the
> tempDir as the docs suggest, right?
>
> Also, any recommendations on setting maxError and minEigenValue? What are
> the tradeoffs I'm making there? I mean, I suppose maxError is some measure
> of convergence and minEigenValue is just as it sounds, but what are the
> practical implications of those settings? Are the values in the example good
> starting points?
>
> Thanks,
> Grant
