It takes a truly gargantuan amount of data to justify map-reducing
LSH. You can get very far with a plain single-machine implementation.
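For scale, a single-machine MinHash LSH for Jaccard similarity fits in a few dozen lines. The sketch below is illustrative only (the names, parameters, and structure are mine, not Mahout's `org.apache.mahout.clustering.minhash` API): it builds MinHash signatures with random universal hashes and then bands the signatures so that sets with high Jaccard similarity collide in at least one bucket.

```python
import random
from collections import defaultdict

# Illustrative single-machine MinHash LSH sketch; constants are
# hypothetical tuning choices, not values from any real system.
NUM_HASHES = 100               # signature length
BANDS = 20                     # rows per band = NUM_HASHES // BANDS
PRIME = (1 << 31) - 1          # modulus for the universal hash family

random.seed(42)
HASH_PARAMS = [(random.randrange(1, PRIME), random.randrange(0, PRIME))
               for _ in range(NUM_HASHES)]

def minhash_signature(items):
    """MinHash signature: min of h(x) = (a*x + b) mod PRIME per hash."""
    return [min((a * hash(x) + b) % PRIME for x in items)
            for a, b in HASH_PARAMS]

def lsh_buckets(signatures):
    """Group ids whose signatures agree on all rows of some band."""
    rows = NUM_HASHES // BANDS
    buckets = defaultdict(set)
    for sid, sig in signatures.items():
        for band in range(BANDS):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].add(sid)
    return [ids for ids in buckets.values() if len(ids) > 1]
```

Because P(one MinHash value agrees) equals the Jaccard coefficient, two sets with similarity s collide in a band of r rows with probability s^r, and in at least one of b bands with probability 1 - (1 - s^r)^b, which is the usual knob for tuning the candidate threshold.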

On Wed, Apr 13, 2011 at 5:57 AM, Sebastian Schelter <[email protected]> wrote:
> They are using PLSI, which we already tried to implement in
> https://issues.apache.org/jira/browse/MAHOUT-106. We didn't get it to scale.
> As far as I remember the paper, they do a nasty trick when sending data to
> the reducers in a certain step so that they only have to load a certain
> portion of the data into memory. I'm not sure this can be replicated in
> Hadoop (I would love to be proven wrong, though).
>
> They are also using LSH to cluster users by Jaccard coefficient -- don't we
> already have code for this in org.apache.mahout.clustering.minhash?
>
> --sebastian
>
> On 13.04.2011 10:49, Sean Owen wrote:
>>
>> One of the three approaches that they combine is latent semantic indexing
>> --
>> that is what I was referring to.
>>
>> On Wed, Apr 13, 2011 at 8:33 AM, Ted Dunning<[email protected]>
>>  wrote:
>>
>>> Sean,
>>>
>>> Do you mean LSI (latent semantic indexing)?  Or LSH (locality sensitive
>>> hashing)?
>>>
>>> (are you a victim of aggressive error correction?)
>>>
>>> (or am I the victim of too little?)
>>>
>>>
>
>