PFPGrowth on cluster does not distribute work load equally on nodes

Björn Jacobs Wed, 16 Jun 2010 14:41:56 -0700

Hello everyone!

I am trying to get familiar with PFPGrowth in the Mahout packages. I plan to 
adapt this code to run a parallelized subgroup discovery, which is the goal of 
the bachelor thesis I am currently writing.


The problem is that the algorithm does not distribute the workload equally 
across the nodes in my cluster. I have 10 nodes, and I set 
mapred.map.tasks=15 and also set the mapred.reduce.tasks variable.

The job "PFP Growth Driver running over input/test002/sortedoutput" did the 
following:

Node 0 got nearly 100% of the work (finished in 20 minutes)
Nodes 1-3 each got a very small piece (finished in less than 10 seconds)
Nodes 4-14 got nothing and finished execution immediately

This way, one node had to do all the work while the others had nothing to do, 
and the job took a very long time to finish. That is not parallel.
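
To put a rough number on the imbalance, here is a minimal sketch in Python. It 
is not part of Mahout or Hadoop. It only redoes the arithmetic for the 15 
entries listed above, and the exact runtimes are assumptions: "less than 10 
seconds" is rounded up to 10 seconds and "immediately" is treated as 0.

```python
# Approximate per-entry runtimes in seconds, based on the figures above.
# Assumptions: "less than 10 seconds" -> 10, "finished immediately" -> 0.
task_seconds = [20 * 60] + [10] * 3 + [0] * 11

total_work = sum(task_seconds)      # combined work across all 15 entries
wall_clock = max(task_seconds)      # the job ends when the slowest one ends
speedup = total_work / wall_clock   # 1.0 means no gain from parallelism
busiest_share = wall_clock / total_work

print(f"entries: {len(task_seconds)}")
print(f"speedup over a single worker: {speedup:.3f}")
print(f"share of work on the busiest worker: {busiest_share:.1%}")
```

Under these assumptions the speedup stays very close to 1, whereas an even 
split over 15 tasks would ideally approach 15.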

Is this a bug, or do I have to configure something to get this working?
Thanks a lot!

Yours,
Björn Jacobs
