On 17 October 2011 14:47, Sebastian Schelter <[email protected]> wrote:
> RowSimilarityJob will have quadratic runtime for dense input and might
> generate large intermediate outputs. I'd argue against using it for such
> purposes.

Thanks, that's clear. I wonder if there's any way the Lanczos output can be sparsified... Or maybe I could ruthlessly prune it down to only a handful of columns. Would changing the similarity measure make any significant difference?

Dan
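
To make the quadratic cost concrete, here is a minimal Python sketch. It is not Mahout code: it assumes the cost of a co-occurrence style row-similarity computation is roughly the number of row pairs sharing a non-zero entry in each column, and the helper names (cooccurrence_pairs, sparsify, dense_rows) are hypothetical, as is the idea of sparsifying by dropping small-magnitude entries.

```python
import random


def cooccurrence_pairs(rows):
    """Sum, over all columns, the number of row pairs that are both non-zero there.

    Assumption: this pair count is a rough proxy for the intermediate output
    of a co-occurrence based row-similarity job.
    """
    col_counts = {}
    for row in rows:
        for col, value in row.items():
            if value != 0.0:
                col_counts[col] = col_counts.get(col, 0) + 1
    # A column with n non-zero entries contributes n * (n - 1) / 2 row pairs.
    return sum(n * (n - 1) // 2 for n in col_counts.values())


def sparsify(rows, threshold):
    """Hypothetical pruning step: keep only entries with |value| >= threshold."""
    return [
        {col: value for col, value in row.items() if abs(value) >= threshold}
        for row in rows
    ]


def dense_rows(num_rows, num_cols, seed=0):
    """Build a fully dense toy matrix as a list of {column: value} dicts."""
    rng = random.Random(seed)
    return [
        {col: rng.gauss(0.0, 1.0) for col in range(num_cols)}
        for _ in range(num_rows)
    ]


if __name__ == "__main__":
    for n in (100, 200, 400):
        rows = dense_rows(n, 10)
        dense = cooccurrence_pairs(rows)
        pruned = cooccurrence_pairs(sparsify(rows, 1.5))
        print(n, dense, pruned)
```

For a fully dense matrix with n rows and c columns the count is c * n * (n - 1) / 2, so doubling the rows roughly quadruples the pairs. Pruning to fewer columns cuts the count linearly, while dropping entries within a column cuts it quadratically in the fraction of entries kept.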
