Glad that it worked for you. I don't think HADOOP_HEAPSIZE matters much here, since it is not used for the actual job execution (correct me if I am wrong). I have it set to 2G, but only because I have plenty of memory on my system.
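For reference, a minimal sketch of how a 2G setting might look in conf/hadoop-env.sh, assuming Hadoop 0.20.x, where the value is read as megabytes. As far as I know, it sizes the Hadoop daemons and the client JVM started by bin/hadoop, while the map/reduce child tasks take their heap from mapred.child.java.opts. The value 2048 is only illustrative:

```shell
# conf/hadoop-env.sh (sketch; 2048 is an illustrative value, not a recommendation)
# Maximum heap, in MB, for the Hadoop daemons and the bin/hadoop client JVM.
# Child task JVMs do not read this; they use mapred.child.java.opts instead.
export HADOOP_HEAPSIZE=2048
```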
Praveen

-----Original Message-----
From: ext Mark [mailto:[email protected]]
Sent: Thursday, November 11, 2010 12:10 PM
To: [email protected]
Subject: Re: Java heap space error on PFPGrowth

That did it. Thanks.

What do you have set for your HADOOP_HEAPSIZE in hadoop-env.sh?

On 11/11/10 8:28 AM, [email protected] wrote:
> Hi Mark,
> I got into the same error and figured that I needed to add following hadoop
> param in mapred-site.xml in hadoop 0.20.2. You can try with lesser memory
> than 4GB.
>
> <property>
> <name>mapred.child.java.opts</name>
> <value>-Xmx4096m</value>
> <description>map heap size for child task</description>
> </property>
>
> Hope this solves your issue.
>
> Praveen
>
> -----Original Message-----
> From: ext Mark [mailto:[email protected]]
> Sent: Thursday, November 11, 2010 11:24 AM
> To: [email protected]; [email protected]
> Subject: Java heap space error on PFPGrowth
>
> I am trying to run PFPGrowth but I keep receiving this Java heap space error
> at the end of the first step/beginning of second step.
>
> I am using the following parameters: .... -method mapreduce -regex
> [\\t] -s 5 -g 55000
>
> Output:
>
> ......
> 10/11/11 08:12:56 INFO mapred.JobClient: map 100% reduce 85%
> 10/11/11 08:12:59 INFO mapred.JobClient: map 100% reduce 90%
> 10/11/11 08:13:02 INFO mapred.JobClient: map 100% reduce 94%
> 10/11/11 08:13:09 INFO mapred.JobClient: map 100% reduce 100%
> 10/11/11 08:13:11 INFO mapred.JobClient: Job complete:
> job_201011101701_0005
> 10/11/11 08:13:11 INFO mapred.JobClient: Counters: 17
> 10/11/11 08:13:11 INFO mapred.JobClient: Job Counters
> 10/11/11 08:13:11 INFO mapred.JobClient: Launched reduce tasks=1
> 10/11/11 08:13:11 INFO mapred.JobClient: Launched map tasks=8
> 10/11/11 08:13:11 INFO mapred.JobClient: Data-local map tasks=8
> 10/11/11 08:13:11 INFO mapred.JobClient: FileSystemCounters
> 10/11/11 08:13:11 INFO mapred.JobClient: FILE_BYTES_READ=146083205
> 10/11/11 08:13:11 INFO mapred.JobClient: HDFS_BYTES_READ=411751517
> 10/11/11 08:13:11 INFO mapred.JobClient: FILE_BYTES_WRITTEN=177276794
> 10/11/11 08:13:11 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=82352630
> 10/11/11 08:13:11 INFO mapred.JobClient: Map-Reduce Framework
> 10/11/11 08:13:11 INFO mapred.JobClient: Reduce input groups=3146378
> 10/11/11 08:13:11 INFO mapred.JobClient: Combine output records=30759042
> 10/11/11 08:13:11 INFO mapred.JobClient: Map input records=6049220
> 10/11/11 08:13:11 INFO mapred.JobClient: Reduce shuffle bytes=26239336
> 10/11/11 08:13:11 INFO mapred.JobClient: Reduce output records=3146378
> 10/11/11 08:13:11 INFO mapred.JobClient: Spilled Records=54248354
> 10/11/11 08:13:11 INFO mapred.JobClient: Map output bytes=743485927
> 10/11/11 08:13:11 INFO mapred.JobClient: Combine input records=63744687
> 10/11/11 08:13:11 INFO mapred.JobClient: Map output records=41469874
> 10/11/11 08:13:11 INFO mapred.JobClient: Reduce input records=8484229
> 10/11/11 08:13:26 INFO pfpgrowth.PFPGrowth: No of Features: 1087215
> 10/11/11 08:13:40 WARN mapred.JobClient: Use GenericOptionsParser for parsing
> the arguments. Applications should implement Tool for the same.
> 10/11/11 08:13:40 INFO input.FileInputFormat: Total input paths to
> process : 1
> 10/11/11 08:13:44 INFO mapred.JobClient: Running job:
> job_201011101701_0006
> 10/11/11 08:13:45 INFO mapred.JobClient: map 0% reduce 0%
> 10/11/11 08:14:16 INFO mapred.JobClient: Task Id :
> attempt_201011101701_0006_m_000000_0, Status : FAILED
> Error: Java heap space
> ....
>
> Is there anything I can do to alleviate this problem?
>
> FYI: I running a 4-node cluster with 12GB of ram in each machine.
>
> Thanks
