Hello all, I was able to test PFPGrowth successfully with 50M transactions. Now I am testing with 150M transactions, and no matter what group size I use I get an OutOfMemoryError while running the FPGrowth job. The parallel counting and transaction sorting jobs finish fine, but the FPGrowth job always runs out of memory.
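For context, the call behind this run is essentially the stock Mahout parallel FP-Growth driver. A minimal sketch of that invocation is below, written against the Mahout 0.4 PFPGrowth API as I understand it; the HDFS paths are placeholders and the parameter key names should be double-checked against the version you are running:

import org.apache.mahout.common.Parameters;
import org.apache.mahout.fpm.pfpgrowth.PFPGrowth;

public class RunPFPGrowth {
  public static void main(String[] args) throws Exception {
    Parameters params = new Parameters();
    params.set("input", "/user/praveen/transactions");       // placeholder HDFS path
    params.set("output", "/user/praveen/frequent-patterns"); // placeholder HDFS path
    params.set("minSupport", "250");     // minimum support used in the failing run
    params.set("numGroups", "500");      // the "group size" knob, tried from 500 to 3000
    params.set("maxHeapSize", "50");     // top-k patterns retained per frequent feature
    params.set("splitPattern", "[\\t]"); // how each transaction line is split into items
    // The reducer JVM heap itself is a Hadoop setting, not a Mahout one:
    // mapred.child.java.opts (e.g. -Xmx2048m) in mapred-site.xml or the job conf.
    PFPGrowth.runPFPGrowth(params);      // parallel counting, grouping/sorting, then FPGrowth
  }
}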
On the Hadoop side: the map/reduce child process heap size is 2G, and there are 24 reduce tasks on a 4-node Hadoop cluster. On the Mahout side: I specified minSupport as 250 and tried group sizes from 500 to 3000. The 150M transactions generate only about 6500 features, so I thought a group size of 500 should be good enough to avoid running out of memory.

What parameters can I change to fix the OutOfMemoryError? Can someone shed some light on how to come up with optimal parameter values to avoid such issues on a production system? Any help is appreciated.

Praveen

10/11/23 10:16:52 INFO mapred.JobClient: map 100% reduce 20%
10/11/23 10:17:01 INFO mapred.JobClient: map 100% reduce 17%
10/11/23 10:17:03 INFO mapred.JobClient: Task Id : attempt_201011221932_0009_r_000013_2, Status : FAILED
Error: Java heap space
10/11/23 10:17:10 INFO mapred.JobClient: map 100% reduce 14%
10/11/23 10:17:12 INFO mapred.JobClient: Task Id : attempt_201011221932_0009_r_000018_0, Status : FAILED
Error: Java heap space
10/11/23 10:17:14 INFO mapred.JobClient: map 100% reduce 11%
10/11/23 10:17:16 INFO mapred.JobClient: map 100% reduce 12%
10/11/23 10:17:16 INFO mapred.JobClient: Task Id : attempt_201011221932_0009_r_000016_1, Status : FAILED
Error: Java heap space
10/11/23 10:17:19 INFO mapred.JobClient: map 100% reduce 8%
10/11/23 10:17:22 INFO mapred.JobClient: map 100% reduce 9%
10/11/23 10:17:25 INFO mapred.JobClient: Task Id : attempt_201011221932_0009_r_000019_0, Status : FAILED
Error: Java heap space
