I wouldn't expect the distributed and sequential algorithms to produce identical results. To start with, they differ in their initial setup:
-- In the distributed algorithm, each mapper deals with a subset of the data and starts by picking a random point, so N mappers start from N random points.
-- In the sequential algorithm, a single mapper deals with all of the data and starts by picking one random point.
But for data with real clusters, both algorithms should produce similar results. How different are the results in your case?
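To make the difference concrete, here is a rough Python sketch of the two setups. It is not Mahout's code: the function names, the random seed choice, and the reduce step that re-clusters the mapper centroids with t3/t4 are my assumptions for illustration only.

```python
import random


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def canopy_centers(points, t1, t2, rng):
    """Simplified canopy pass (t1 > t2); returns one centroid per canopy."""
    remaining = list(points)
    centers = []
    while remaining:
        # Start a new canopy from a randomly chosen remaining point.
        seed = remaining.pop(rng.randrange(len(remaining)))
        # Every remaining point within the loose threshold t1 joins the canopy.
        members = [seed] + [p for p in remaining if dist(seed, p) < t1]
        # Points within the tight threshold t2 can no longer seed a canopy.
        remaining = [p for p in remaining if dist(seed, p) >= t2]
        dim = len(seed)
        centers.append(
            tuple(sum(p[i] for p in members) / len(members) for i in range(dim))
        )
    return centers


def sequential(points, t1, t2, seed=0):
    """One worker sees all of the data."""
    return canopy_centers(points, t1, t2, random.Random(seed))


def distributed(points, n_mappers, t1, t2, t3, t4, seed=0):
    """Each hypothetical mapper sees only its split; a reducer then
    re-clusters the mapper centroids using t3/t4 (an assumption here)."""
    rng = random.Random(seed)
    splits = [points[i::n_mappers] for i in range(n_mappers)]
    mapper_centers = [
        c for split in splits for c in canopy_centers(split, t1, t2, rng)
    ]
    return canopy_centers(mapper_centers, t3, t4, rng)


def make_data(seed=42, n=20):
    """Two well-separated synthetic clusters, near (0, 0) and (10, 10)."""
    rng = random.Random(seed)
    a = [(rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)) for _ in range(n)]
    b = [(10 + rng.uniform(-0.5, 0.5), 10 + rng.uniform(-0.5, 0.5)) for _ in range(n)]
    return a + b


if __name__ == "__main__":
    pts = make_data()
    print("sequential: ", sorted(sequential(pts, 3.0, 2.0)))
    print("distributed:", sorted(distributed(pts, 3, 3.0, 2.0, 3.0, 2.0)))
```

On data like this, with two clearly separated clusters, both variants should find two canopies with centroids that are close but not necessarily equal, since the distributed version averages per-split centroids rather than the raw points.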
Thanks
--Konstantin

On Sun, Oct 2, 2011 at 1:36 AM, Paritosh Ranjan <[email protected]> wrote:

> Even run() of CanopyDriver, which takes only T1 and T2, is giving different
> results for sequential and mapreduce.
> This is preventing me from scaling up, as I need to run mapreduce on hadoop
> to scale.
>
> Does anyone have any idea about this problem?
>
> On 02-10-2011 00:27, Paritosh Ranjan wrote:
>
>> Hi,
>>
>> I am able to cluster correctly sequentially, using CanopyDriver.
>>
>> However, the same dataset, when processed as a MapReduce job, where (t1 =
>> t3 and t2 = t4 and t1 > t2), is not working. I am getting errors like
>> Canopies are empty.
>>
>> I also tried to reduce the values of t3 and t4. But reducing them either
>> has no effect or gives meaningless results.
>>
>> Am I doing something wrong? Or is there a bug somewhere?
>>
>> I feel that both sequential and MapReduce should give similar results.
>> But it is not happening.
>>
>> Thanks and Regards,
>> Paritosh
>>
>

--
ksh:
