Re: Difference in results : Clustering : sequential and MapReduce

Konstantin Shmakov Sun, 02 Oct 2011 13:18:15 -0700

I wouldn't expect the distributed and sequential algorithms to produce
identical results. To start with, they differ in their initial setup:
-- In the distributed algorithm, each mapper works on a subset of the data and
starts by picking a random point, so N mappers start from N random points.
-- In the sequential algorithm, a single process works on all of the data and
starts by picking one random point.
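To make the difference in setup concrete, here is a simplified Python sketch of a single-pass canopy run versus a two-phase run. It is not Mahout's actual CanopyClusterer/CanopyDriver code. It assumes the MapReduce version works roughly like this: each "mapper" clusters only its own split with T1/T2, and a single "reducer" then clusters the mapper centers with T3/T4. The function names and the round-robin split are hypothetical. Unlike the description above, the sketch seeds canopies in input order rather than at random, so runs are reproducible.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def canopy_centers(points: Sequence[Point], t1: float, t2: float) -> List[Point]:
    """Single-pass canopy clustering (simplified sketch, not Mahout's code).

    Each point joins every canopy whose seed lies within t1. If no seed lies
    within t2, the point also starts a new canopy. Returns each canopy's mean.
    """
    if t1 <= t2:
        raise ValueError("canopy clustering requires t1 > t2")
    canopies: List[Dict] = []
    for p in points:
        strongly_bound = False
        for c in canopies:
            d = math.dist(p, c["seed"])
            if d < t1:  # loosely bound: contributes to this canopy's mean
                c["sum"] = [s + x for s, x in zip(c["sum"], p)]
                c["n"] += 1
            if d < t2:  # strongly bound: may not seed a canopy of its own
                strongly_bound = True
        if not strongly_bound:
            canopies.append({"seed": p, "sum": list(p), "n": 1})
    return [tuple(s / c["n"] for s in c["sum"]) for c in canopies]


def distributed_canopy(points: Sequence[Point], n_splits: int,
                       t1: float, t2: float, t3: float, t4: float) -> List[Point]:
    """Two-phase sketch: each 'mapper' clusters its split with (t1, t2),
    then one 'reducer' clusters all mapper centers with (t3, t4)."""
    # Hypothetical round-robin split; real input splits depend on the cluster.
    splits = [points[i::n_splits] for i in range(n_splits)]
    mapper_centers = [c for split in splits for c in canopy_centers(split, t1, t2)]
    return canopy_centers(mapper_centers, t3, t4)


# Two well-separated groups of three points each.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5),
          (10.0, 10.0), (10.5, 10.0), (10.0, 10.5)]
sequential = canopy_centers(points, t1=3.0, t2=1.5)
mapreduce = distributed_canopy(points, 2, 3.0, 1.5, 3.0, 1.5)
print(sequential)
print(mapreduce)
```

On this toy data both runs find two canopies, but the centers differ slightly. The sequential run gives roughly (0.17, 0.17) and (10.17, 10.17). The two-phase run gives (0.25, 0.125) and (10.25, 10.125), because its reducer averages per-split centers instead of the original points.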
But for data with real clusters, both algorithms should produce similar
results. How different are the results in your case?
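One way to answer "how different" with a number is to compare the two sets of cluster centers directly. The Python sketch below computes the symmetric Hausdorff distance between the sequential and MapReduce center sets, that is, the largest distance from any center in one set to its nearest counterpart in the other. The function name `center_set_distance` is hypothetical, and the sketch assumes the centers from each run have already been extracted as plain coordinate tuples (that step is not shown).

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def center_set_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance between two sets of cluster centers.

    Returns 0.0 exactly when every center in each set coincides with some
    center in the other set.
    """
    if not a or not b:
        # An empty result (e.g. no canopies at all) cannot be compared.
        raise ValueError("both center sets must be non-empty")

    def directed(xs: Sequence[Point], ys: Sequence[Point]) -> float:
        # Worst case over xs of the distance to the nearest center in ys.
        return max(min(math.dist(x, y) for y in ys) for x in xs)

    return max(directed(a, b), directed(b, a))


# Hypothetical centers from two runs over the same data.
sequential = [(0.0, 0.0), (10.0, 10.0)]
mapreduce = [(0.0, 0.0), (10.0, 10.5), (20.0, 20.0)]
print(len(sequential), len(mapreduce), center_set_distance(sequential, mapreduce))
```

Here the shared clusters differ by at most 0.5, but the extra MapReduce center at (20, 20) has no sequential counterpart, so the distance jumps to about 14.14. Reporting the center counts alongside the distance helps separate "the same clusters, slightly shifted" from "different clusters".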


Thanks
--Konstantin


On Sun, Oct 2, 2011 at 1:36 AM, Paritosh Ranjan <[email protected]> wrote:

> Even the run() method of CanopyDriver, which takes only T1 and T2, gives
> different results for sequential and MapReduce execution.
> This is preventing me from scaling up, as I need to run MapReduce on Hadoop
> to scale.
>
> Does anyone have any idea what is causing this problem?
>
> On 02-10-2011 00:27, Paritosh Ranjan wrote:
>
>> Hi,
>>
>> I am able to cluster correctly in sequential mode using CanopyDriver.
>>
>> However, the same dataset does not work when processed as a MapReduce job
>> (with t1 = t3, t2 = t4 and t1 > t2). I am getting errors like "Canopies
>> are empty".
>>
>> I also tried reducing the values of t3 and t4, but that either has no
>> effect or gives meaningless results.
>>
>> Am I doing something wrong, or is there a bug somewhere?
>>
>> I would expect the sequential and MapReduce runs to give similar results,
>> but they do not.
>>
>> Thanks and Regards,
>> Paritosh
>>
>
>


-- 
ksh:
