This is probably because the Hadoop job does some sampling and pruning, whereas the non-Hadoop implementation generally doesn't.
On Mon, Oct 1, 2012 at 9:38 AM, yamo93 <[email protected]> wrote:
> Because the results are not the same as with RowSimilarityJob and because
> similar documents have few words in common.
>
> On 10/01/2012 10:24 AM, Sean Owen wrote:
>>
>> There's not really any information here. Why do you think the result is
>> wrong?
>>
>> On Mon, Oct 1, 2012 at 9:09 AM, yamo93 <[email protected]> wrote:
>>>
>>> I tried your suggestion.
>>>
>>> I generated a CSV file with (term, docId, distance) and I used the method
>>> mostSimilarItems with UncenteredCosineSimilarity.
>>>
>>> But this seems to produce wrong results, and I don't understand why.
>>>
>>> Any idea?
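The effect described above (an exact non-distributed computation versus a job that samples and prunes) can be illustrated with a small standalone sketch. It is not Mahout code: it builds one sparse term vector per document from (term, docId, weight) rows like the CSV in the thread, computes uncentered cosine similarity (dot product divided by the product of the norms, with no mean-centering), and then applies a hypothetical keepTopK step that keeps only each document's highest-weighted terms. The class name CosineSketch, the keepTopK helper and the toy data are assumptions for illustration only; the sampling and pruning RowSimilarityJob actually performs is not reproduced here and may work quite differently, and this sketch may not match how UncenteredCosineSimilarity treats terms present in only one document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only, not Mahout's implementation.
public class CosineSketch {

    // Groups "term,docId,weight" rows into one sparse term vector per document.
    public static Map<Long, Map<Long, Double>> parseTriples(List<String> lines) {
        Map<Long, Map<Long, Double>> docs = new HashMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            long term = Long.parseLong(f[0].trim());
            long doc = Long.parseLong(f[1].trim());
            double weight = Double.parseDouble(f[2].trim());
            docs.computeIfAbsent(doc, k -> new HashMap<>()).put(term, weight);
        }
        return docs;
    }

    // Euclidean norm over every term in the vector.
    public static double norm(Map<Long, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    // Uncentered cosine: dot(a, b) / (|a| * |b|); NaN if either vector is all zeros.
    public static double cosine(Map<Long, Double> a, Map<Long, Double> b) {
        double dot = 0.0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? Double.NaN : dot / (na * nb);
    }

    // Hypothetical pruning step: keep only the k highest-weighted terms.
    public static Map<Long, Double> keepTopK(Map<Long, Double> v, int k) {
        List<Map.Entry<Long, Double>> entries = new ArrayList<>(v.entrySet());
        entries.sort((x, y) -> Double.compare(y.getValue(), x.getValue()));
        Map<Long, Double> out = new HashMap<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            out.put(entries.get(i).getKey(), entries.get(i).getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up rows in the (term, docId, weight) layout described in the thread.
        List<String> csv = Arrays.asList(
                "1,100,3.0", "2,100,1.0", "3,100,1.0",
                "1,200,3.0", "4,200,2.0");
        Map<Long, Map<Long, Double>> docs = parseTriples(csv);
        Map<Long, Double> d100 = docs.get(100L);
        Map<Long, Double> d200 = docs.get(200L);
        System.out.printf("all terms:    %.4f%n", cosine(d100, d200));
        System.out.printf("top-1 pruned: %.4f%n", cosine(keepTopK(d100, 1), keepTopK(d200, 1)));
    }
}
```

With all terms kept, documents 100 and 200 score 9 / sqrt(143), about 0.75. After pruning each to its single top term, both reduce to the shared term 1 and score 1.0, so a ranking of "most similar" documents can shift once vectors are truncated, which is one way two otherwise reasonable computations can disagree.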
