Thanks. I am still looking into how to increase the timeouts for FPGrowth.

________________________________________
From: 戴清灏 [[email protected]]
Sent: Monday, October 22, 2012 10:53 PM
To: [email protected]
Subject: Re: Increase timeout for running PFPGrowth

-g is the number of groups used when executing FP-Growth. It equals the number of reduce tasks, so I suggest using the same number as the reducers in your cluster.
-k controls the cache that will be kept, so it can be larger if you have a lot of memory on a single node.

On Tuesday, October 23, 2012, Matt Molek wrote:

> Did you have those spaces "-D mapred.task.timeout=18000000"? That
> won't be parsed correctly. It should be:
> "-Dmapred.task.timeout=18000000"
>
> On Mon, Oct 22, 2012 at 1:08 PM, Amit Krishna Joshi
> <[email protected]> wrote:
> > Hi,
> >
> > I am running PFP on several datasets and it works well for smaller ones
> > (< 5GB).
> > However, for the larger ones, I keep getting the following timeout message:
> >
> > Task attempt_201210140938_0105_r_000000_0 failed to report status for 600 seconds. Killing!
> >
> > Is there a way I can increase the timeout?
> >
> > I even tried passing these parameters, but in vain:
> > -D mapred.task.timeout=18000000 -D mapred.child.java.opts=-Xmx4000m
> >
> > My input params are: -s 10000 -g 1000 -tc 8 -k 50 -method mapreduce
> >
> > Also, please suggest what would be the optimum value of g and k.
> > Number of features > million
> >
> > Thanks,
> > Amit
>

--
Regards,
Q
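
A small sanity check on the numbers in this thread, as a sketch rather than anything taken from Mahout or Hadoop code: mapred.task.timeout is given in milliseconds, and Hadoop's default is 600000 ms, which matches the "600 seconds" in the error above. That suggests the requested value of 18000000 was never applied to the job. The helper names below (ms_to_seconds, hadoop_define) are hypothetical and only illustrate the unit conversion and the no-space form suggested in the reply.

```python
# Hadoop's mapred.task.timeout is expressed in milliseconds.
DEFAULT_TIMEOUT_MS = 600_000  # Hadoop default; equals the 600 s seen in the error


def ms_to_seconds(ms: int) -> float:
    """Convert a millisecond timeout to seconds."""
    return ms / 1000


def hadoop_define(key: str, value) -> str:
    """Hypothetical helper: format a property as one -Dkey=value token,
    with no space after -D, as suggested in the reply above."""
    return f"-D{key}={value}"


requested_ms = 18_000_000  # value from the original message

print(ms_to_seconds(DEFAULT_TIMEOUT_MS))    # timeout the task actually hit, in seconds
print(ms_to_seconds(requested_ms) / 3600)   # requested timeout, in hours
print(hadoop_define("mapred.task.timeout", requested_ms))
```

If the kill message still reports 600 seconds after the option is changed, the property is most likely still not reaching the job configuration.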
