Thanks for the prompt reply.
RowSimilarityJob produces output of the form:
Key: 0: Value: {61112:0.21139380179557016,52144:0.23797846026935565,...}
whereas FileItemSimilarity expects comma- or tab-separated input.
I assume you meant that the output of RowSimilarityJob can be loaded by
FileItemSimilarity after the appropriate parsing. Is that correct, or is
there actually a way to load the raw output of RowSimilarityJob into
FileItemSimilarity?
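Just so I am sure about what the 'appropriate parsing' would involve, this
is roughly the conversion step I have in mind: read the
<IntWritable, VectorWritable> SequenceFile that RowSimilarityJob writes and
dump it as itemID1,itemID2,similarity lines for FileItemSimilarity. This is
only a sketch, assuming Mahout 0.9 and the old SequenceFile.Reader
constructor; the class name, paths and file names are just placeholders:

import java.io.PrintWriter;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

public class RowSimilarityToCsv {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // placeholder path to one part file of the RowSimilarityJob output
    Path input = new Path("rowsimilarity-output/part-r-00000");

    IntWritable row = new IntWritable();
    VectorWritable vectorWritable = new VectorWritable();
    PrintWriter out = new PrintWriter("item-similarities.csv", "UTF-8");
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, input, conf);
    try {
      while (reader.next(row, vectorWritable)) {
        Vector similarities = vectorWritable.get();
        for (Vector.Element e : similarities.nonZeroes()) {
          // key = doc/item id, element index = similar doc/item id,
          // element value = similarity, written as "id1,id2,similarity"
          out.println(row.get() + "," + e.index() + "," + e.get());
        }
      }
    } finally {
      reader.close();
      out.close();
    }
  }
}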
Thanks.
On Mon, Feb 24, 2014 at 7:41 PM, Sebastian Schelter <[email protected]> wrote:
> The output of RowSimilarityJob can be loaded by the FileItemSimilarity.
>
> --sebastian
>
>
> On 02/24/2014 08:31 PM, Juan José Ramos wrote:
>
>> Is there a way to reproduce this process:
>> https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line
>> inside Java code and not using the command line tool? I am not interested
>> in the clustering part but in 'Calculate several similar docs to each doc
>> in the data'. In particular, I am interested in loading the output of the
>> rowsimilarity tool into memory to be used as my custom ItemSimilarity
>> implementation for an ItemBasedRecommender.
>>
>> What I want exactly is to have a matrix in memory where, for every doc in
>> my catalogue, I have the similarity with the 100 most similar items (that
>> is the threshold I am using) and an undefined similarity for the rest.
>>
>> Is it possible to do this with the Java API? I know it can be done by
>> calling the commands from inside the Java code, and I guess also by using
>> the corresponding SparseVectorsFromSequenceFiles, DistributedRowMatrix and
>> RowSimilarityJob classes. But I still cannot see an easy way of parsing
>> the output of RowSimilarityJob into the memory representation I intend to
>> use.
>>
>> Thanks a lot.
>>
>>
>
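For the in-memory representation described in my earlier message quoted
above, I suppose the same SequenceFile could also be loaded straight into a
GenericItemSimilarity instead of going through a CSV file, along these
lines (again only a sketch under the same Mahout 0.9 assumption; the class
name and part file path are placeholders, and pairs that are never loaded
simply stay undefined, which is the behaviour I am after):

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.mahout.cf.taste.impl.similarity.GenericItemSimilarity;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

public class InMemorySimilarityLoader {

  public static ItemSimilarity load(String partFile) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    List<GenericItemSimilarity.ItemItemSimilarity> pairs =
        new ArrayList<GenericItemSimilarity.ItemItemSimilarity>();

    IntWritable row = new IntWritable();
    VectorWritable vectorWritable = new VectorWritable();
    SequenceFile.Reader reader =
        new SequenceFile.Reader(fs, new Path(partFile), conf);
    try {
      while (reader.next(row, vectorWritable)) {
        for (Vector.Element e : vectorWritable.get().nonZeroes()) {
          // key = doc/item id, element index = similar doc/item id
          pairs.add(new GenericItemSimilarity.ItemItemSimilarity(
              row.get(), e.index(), e.get()));
        }
      }
    } finally {
      reader.close();
    }
    // GenericItemSimilarity keeps every loaded pair in memory and reports
    // the similarity as undefined (NaN) for pairs it has never seen.
    return new GenericItemSimilarity(pairs);
  }
}

The returned ItemSimilarity could then be passed to a
GenericItemBasedRecommender together with a DataModel, which I think is all
the item-based recommender side needs.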