Hi all,

I am running RowSimilarity on a set of text documents. My documents are
organized as follows:
DocID    DocText
0        xxxxx
1        xxxxxxxx
2        xxxxxxxx
...      ...
The DocIDs are sequential, starting from 0. I added all the documents to a
sequence file (for tokenization) as:
 writer.append(new Text(Integer.toString(DocID)), new Text(DocText));
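
To make the key scheme explicit, here is a minimal sketch of how the keys are formed. It is only an illustration: the class name DocIdKeys, the method buildPairs, and the sample texts are hypothetical, and a plain Java map stands in for the SequenceFile writer so the snippet runs without Hadoop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DocIdKeys {
    // Hypothetical helper: builds (key, value) pairs in append order,
    // using the DocID rendered as a decimal string for the key,
    // mirroring Integer.toString(DocID) in the writer.append call.
    static Map<String, String> buildPairs(String[] docTexts) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (int docId = 0; docId < docTexts.length; docId++) {
            pairs.put(Integer.toString(docId), docTexts[docId]);
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Placeholder texts, not my real data.
        String[] docTexts = {"first text", "second text", "third text"};
        for (Map.Entry<String, String> e : buildPairs(docTexts).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

So each key written to the sequence file is simply the DocID as text ("0", "1", "2", and so on).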

Then I created TF-IDF vectors from the sequence file, and after that I ran
RowSimilarity on the TF-IDF vectors.
After dumping the result, the output looked like this:

key: xx    value: key1, key2, ...
Everything works fine.

My question is: how do I know whether the "key" is the same as the original
"DocID" number? I am not sure they are the same. In more detail, the final
output of RowSimilarity consists of keys and values; how can I map the
keys back to the original DocIDs?

Thank you,
Donni
