Cool. Thanks for the clarification.
On Tue, Feb 25, 2014 at 3:18 PM, Suneel Marthi <[email protected]> wrote:

> That's a mistake on the wiki that needs to be corrected. You are right, it
> should be the similarity.
>
> Each row would have the 10 most similar docs for every doc.
>
> Sent from my iPhone
>
> > On Feb 25, 2014, at 9:22 AM, Juan José Ramos <[email protected]> wrote:
> >
> > In the wiki page: 'Quick tour of text analysis using the Mahout command
> > line'.
> >
> > https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line
> >
> > At the very bottom it is said that
> >
> > 1. This will generate the 10 most similar docs to each doc in the
> > collection.
> >
> > 1. Examine the similarity list:
> > mahout seqdumper -i reuters-matrix/matrix | more
> >
> > Instead of reuters-matrix/matrix, shouldn't it be
> > reuters-similarity/part-r-00000, since that is the output file of
> > rowsimilarity? Or, on the contrary, does the rowsimilarity tool also
> > write to reuters-matrix/?
> >
> > I would expect it to contain the 10 most similar documents for every
> > document in the Reuters catalogue. Is that correct?
> >
> > Many thanks.
> > Juanjo.
