Hello all,
I was able to test PFPGrowth successfully with 50M transactions. Now I am 
testing with 150M transactions, and no matter what group size I use I run 
out of memory in the FPGrowth job. The parallel counting and transaction 
sorting jobs finish fine, but the FPGrowth job itself always fails with an 
OutOfMemoryError.

On the Hadoop side: the map/reduce child process heap size is 2G, and the 
number of reduce tasks is 24 across a 4-node Hadoop cluster.
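
(For reference, these two settings map onto the standard Hadoop 0.20 job 
properties roughly as in the sketch below; the exact way our cluster sets 
them, per-job or in mapred-site.xml, may differ.)

import org.apache.hadoop.conf.Configuration;

public class HeapAndReducerSettings {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // 2G heap for each map/reduce child JVM
    conf.set("mapred.child.java.opts", "-Xmx2048m");
    // 24 reduce tasks across the 4-node cluster
    conf.setInt("mapred.reduce.tasks", 24);
  }
}
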
On the Mahout side: I specified minSupport as 250 and tried group sizes 
from 500 to 3000. The 150M transactions produce only about 6500 features, 
so I thought a group size of 500 should be good enough to avoid running 
out of memory.
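
For context, the run is driven roughly as in the sketch below (the driver 
API and parameter key names are what I assume from the Mahout 0.4 sources; 
the input/output paths are just placeholders):

import org.apache.mahout.common.Parameters;
import org.apache.mahout.fpm.pfpgrowth.PFPGrowth;

public class RunPfpGrowth {
  public static void main(String[] args) throws Exception {
    Parameters params = new Parameters();
    params.set("input", "/path/to/transactions");   // placeholder path
    params.set("output", "/path/to/pfp-output");    // placeholder path
    params.set("minSupport", "250");   // minimum support, as described above
    params.set("numGroups", "500");    // group size; I tried 500 up to 3000
    // runs parallel counting, transaction sorting, and the FPGrowth job
    PFPGrowth.runPFPGrowth(params);
  }
}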

What parameters can I change to fix the out-of-memory issue?
Can someone shed some light on how to choose optimal parameter values to 
avoid such issues on a production system?

Any help is appreciated.

Praveen

10/11/23 10:16:52 INFO mapred.JobClient:  map 100% reduce 20%
10/11/23 10:17:01 INFO mapred.JobClient:  map 100% reduce 17%
10/11/23 10:17:03 INFO mapred.JobClient: Task Id : 
attempt_201011221932_0009_r_000013_2, Status : FAILED
Error: Java heap space
10/11/23 10:17:10 INFO mapred.JobClient:  map 100% reduce 14%
10/11/23 10:17:12 INFO mapred.JobClient: Task Id : 
attempt_201011221932_0009_r_000018_0, Status : FAILED
Error: Java heap space
10/11/23 10:17:14 INFO mapred.JobClient:  map 100% reduce 11%
10/11/23 10:17:16 INFO mapred.JobClient:  map 100% reduce 12%
10/11/23 10:17:16 INFO mapred.JobClient: Task Id : 
attempt_201011221932_0009_r_000016_1, Status : FAILED
Error: Java heap space
10/11/23 10:17:19 INFO mapred.JobClient:  map 100% reduce 8%
10/11/23 10:17:22 INFO mapred.JobClient:  map 100% reduce 9%
10/11/23 10:17:25 INFO mapred.JobClient: Task Id : 
attempt_201011221932_0009_r_000019_0, Status : FAILED
Error: Java heap space
