(Yes, it would be pretty simple to hack it up to compute similarities only for a given list of known items, not all pairs. That ought to change the scaling dramatically, since the cost would be more like linear in the number of items. Brute force is probably not so bad here.)
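To make the brute-force idea concrete, here is a rough sketch in plain Python. It is not Mahout code and not how RowSimilarity works internally. The function names, the dict-based sparse vectors, and the choice of cosine similarity are all assumptions made for illustration. For each document in the short list it scores every document in the large set and keeps the k best, so the work is (size of short list) x (size of large set) comparisons.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    # Iterate over the smaller vector to keep the dot product cheap.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(queries, corpus, k):
    """For each query doc, return the k most similar corpus docs as (score, doc_id).

    queries and corpus are hypothetical {doc_id: sparse_vector} dicts.
    Cost is len(queries) * len(corpus) similarity computations.
    """
    result = {}
    for qid, qvec in queries.items():
        scored = (
            (cosine(qvec, dvec), did)
            for did, dvec in corpus.items()
            if did != qid  # skip the query itself if it is also in the corpus
        )
        # nlargest keeps only k candidates in memory at a time.
        result[qid] = heapq.nlargest(k, scored)
    return result
```

Calling `top_k_similar(short_list, big_set, k)` returns, per query id, a list of `(score, doc_id)` pairs sorted from most to least similar. With a fixed short list, the running time grows linearly with the size of the large set.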
On Fri, Jul 13, 2012 at 10:47 PM, Pat Ferrel <[email protected]> wrote:
> What I really need is to calculate the k most similar docs to a short list,
> known ahead of time. I don't know of an algorithm to do this (other than
> brute force). It would take a relatively small set of docs and find similar
> docs in a much much larger set. Rowsimilarity finds all pair-wise
> similarities. Strictly speaking I need only a tiny number of those.
