RowSimilarityJob input

Sören Brunk Tue, 08 Nov 2011 08:33:32 -0800

Hi,

I'm trying to use RowSimilarityJob (current trunk) to calculate pairwise similarities between feature vectors, but I'm struggling a bit with the correct input format.

I used SparseVectorsFromSequenceFiles to create a bunch of vectors from documents. But using the tfidf vectors directly as input doesn't work, as it produces vectors with Strings as keys, while RowSimilarityJob seems to expect IntWritable. I've also seen something about DistributedRowMatrix as input in some older docs.


Any hints? Is RowSimilarityJob a good choice for that task at all?

Thanks for your help,
Sören
