Re: Need to reduce execution time of RowSimilarityJob

Sean Owen Mon, 01 Oct 2012 01:40:26 -0700

This is probably because the Hadoop job does some sampling and pruning,
whereas the non-Hadoop implementation generally doesn't.
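
To illustrate how pruning alone can shift a cosine score, here is a minimal,
self-contained Java sketch. It does not use Mahout. The class name
PruningSketch, the keepTopK rule, and the toy term weights are all
assumptions made for illustration; keepTopK is only a stand-in for whatever
sampling and pruning the Hadoop job actually applies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Mahout code.
class PruningSketch {

    // Uncentered cosine similarity between two sparse term-weight vectors.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // treat an empty vector as having no similarity
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Assumed pruning rule: keep only the k heaviest terms of a vector.
    static Map<String, Double> keepTopK(Map<String, Double> v, int k) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(v.entrySet());
        entries.sort((x, y) -> Double.compare(y.getValue(), x.getValue()));
        Map<String, Double> kept = new HashMap<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            kept.put(entries.get(i).getKey(), entries.get(i).getValue());
        }
        return kept;
    }

    public static void main(String[] args) {
        // Toy documents: they share two terms and differ in one.
        Map<String, Double> docA = new HashMap<>();
        docA.put("mahout", 3.0);
        docA.put("hadoop", 1.0);
        docA.put("job", 1.0);

        Map<String, Double> docB = new HashMap<>();
        docB.put("mahout", 3.0);
        docB.put("hadoop", 1.0);
        docB.put("similarity", 1.0);

        double full = cosine(docA, docB);
        double pruned = cosine(keepTopK(docA, 1), keepTopK(docB, 1));

        System.out.println("cosine on full vectors:   " + full);
        System.out.println("cosine on pruned vectors: " + pruned);
    }
}
```

The same pair of documents gets a different score once low-weight terms are
dropped, so two implementations can disagree without either being wrong.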


On Mon, Oct 1, 2012 at 9:38 AM, yamo93 <[email protected]> wrote:
> Because the results are not the same as with RowSimilarityJob and because
> similar documents have few words in common.
>
>
> On 10/01/2012 10:24 AM, Sean Owen wrote:
>>
>> There's not really any information here. Why do you think the result is
>> wrong?
>>
>> On Mon, Oct 1, 2012 at 9:09 AM, yamo93 <[email protected]> wrote:
>>>
>>> I tried your suggestion.
>>>
>>> I generated a CSV file with (term, docId, distance) and I used the method
>>> mostSimilarItems with UncenteredCosineSimilarity.
>>>
>>> But this seems to produce wrong results, and I don't understand why.
>>>
>>> Any idea?
>
>
