I found the reason for the difference.
It is due to the check
if (canopy.getNumPoints() > clusterFilter)
in CanopyMapper.
Similar data is not distributed evenly across the mappers. So a mapper can
end up with canopies that hold no more than clusterFilter points, and those
canopies are not processed further.
That said, this check is a great performance enhancer. I have seen that
myself.
Distributing similar vectors to the same mappers might help to attain both
quality and performance.
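
To illustrate the effect, here is a minimal toy sketch. It is not Mahout's code: the class CanopyFilterSketch, its 1-D points, and the method names are all made up for illustration, and only the "numPoints > clusterFilter" check mirrors the line quoted above. It counts how many canopies survive the filter for a given set of points.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: 1-D points, absolute distance, hypothetical names.
final class CanopyFilterSketch {

    /** A toy canopy: a fixed center plus the number of points within t1 of it. */
    static final class Canopy {
        final double center;
        int numPoints = 1; // the point that opened the canopy

        Canopy(double center) {
            this.center = center;
        }
    }

    /** Builds canopies over one mapper's share of the data. */
    static List<Canopy> buildCanopies(double[] points, double t1, double t2) {
        List<Canopy> canopies = new ArrayList<>();
        for (double p : points) {
            boolean stronglyBound = false;
            for (Canopy c : canopies) {
                double d = Math.abs(p - c.center);
                if (d < t1) {
                    c.numPoints++;        // loosely covered by this canopy
                }
                if (d < t2) {
                    stronglyBound = true; // too close to open a new canopy
                }
            }
            if (!stronglyBound) {
                canopies.add(new Canopy(p));
            }
        }
        return canopies;
    }

    /** Counts canopies that pass the same kind of check as in the mapper. */
    static int countSurviving(double[] points, double t1, double t2, int clusterFilter) {
        int surviving = 0;
        for (Canopy c : buildCanopies(points, t1, t2)) {
            if (c.numPoints > clusterFilter) {
                surviving++;
            }
        }
        return surviving;
    }
}
```

With t1 = 3, t2 = 1 and clusterFilter = 2, the six points 0.0, 0.1, 0.2, 10.0, 10.1, 10.2 give two surviving canopies when one call sees all of them. If the same points are scattered over three mappers as {0.0, 10.0}, {0.1, 10.1}, {0.2, 10.2}, every canopy holds a single point and nothing survives. If similar points share a mapper, {0.0, 0.1, 0.2} and {10.0, 10.1, 10.2}, both canopies survive again.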
On 03-10-2011 09:29, Paritosh Ranjan wrote:
The sequential algorithm finds more/better clusters than the MapReduce one.
There's not a huge difference, but the standalone one is better for sure.
Thanks and Regards,
Paritosh
On 03-10-2011 01:47, Konstantin Shmakov wrote:
I'd assume that the distributed and sequential algorithms shouldn't produce
identical results. To start with, they differ in their initial setup:
-- In the distributed algorithm, each mapper deals with a subset of the data
and starts by picking a random point, so N random points are picked by N
mappers to start with.
-- In the sequential algorithm, 1 mapper deals with all the data and starts
by picking 1 random point.
But for data with real clusters, both algorithms should produce similar
results. How different are the results in your case?
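
A rough sketch of the point above, under loud assumptions: this is a toy, not Mahout's implementation. The class CanopySeedSketch and its methods are hypothetical, points are 1-D, only T2 is modelled, and each pass starts from the first point it sees rather than a random one. It compares one pass over all the data with a two-stage run in which every partition picks its own centers and a final pass clusters those centers.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, 1-D points, absolute distance.
final class CanopySeedSketch {

    /** A point opens a new canopy unless it lies within t2 of an existing center. */
    static List<Double> centers(List<Double> points, double t2) {
        List<Double> result = new ArrayList<>();
        for (double p : points) {
            boolean covered = false;
            for (double c : result) {
                if (Math.abs(p - c) < t2) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                result.add(p);
            }
        }
        return result;
    }

    /** Each partition builds its own centers; one final pass clusters those centers. */
    static List<Double> twoStage(List<List<Double>> partitions, double t2) {
        List<Double> mapperCenters = new ArrayList<>();
        for (List<Double> partition : partitions) {
            mapperCenters.addAll(centers(partition, t2));
        }
        return centers(mapperCenters, t2);
    }
}
```

For the well-separated points 0.0, 0.1, 0.2, 10.0, 10.1, 10.2 with t2 = 1, the single pass yields the centers 0.0 and 10.0. Splitting the data as {0.0, 10.1}, {0.1, 10.0, 0.2}, {10.2} yields 0.0 and 10.1 from the two-stage run: the same number of clusters, but not identical centers, because each partition started from a different point.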
Thanks
--Konstantin
On Sun, Oct 2, 2011 at 1:36 AM, Paritosh Ranjan <[email protected]> wrote:
Even run() of CanopyDriver, which takes only T1 and T2, gives different
results for sequential and MapReduce execution.
This is preventing me from scaling up, as I need to run MapReduce on Hadoop
to scale.
Does anyone have any idea about this problem?
On 02-10-2011 00:27, Paritosh Ranjan wrote:
Hi,
I am able to cluster correctly when running sequentially with CanopyDriver.
However, the same dataset does not work when processed as a MapReduce job
(with t1 = t3, t2 = t4 and t1 > t2). I am getting errors like: Canopies are
empty.
I also tried reducing the values of t3 and t4, but that either has no effect
or gives meaningless results.
Am I doing something wrong, or is there a bug somewhere?
I feel that sequential and MapReduce should give similar results, but that
is not happening.
Thanks and Regards,
Paritosh