Hi,

I'm trying to use RowSimilarityJob (current trunk) to calculate pairwise similarities between feature vectors, but I'm struggling a bit with the correct input format.

I used SparseVectorsFromSequenceFiles to create a bunch of vectors from documents. But using the tfidf vectors directly as input doesn't work: they have Strings as keys, while RowSimilarityJob seems to expect IntWritable keys. I've also seen DistributedRowMatrix mentioned as input in some older docs.

Any hints? Is RowSimilarityJob a good choice for that task at all?

Thanks for your help,
Sören
