Re: Java Heap space error

2012-03-08 Thread hadoopman
I'm curious whether you've been able to track down the cause of the error.
We've seen similar problems when loading data, and I've found that if I
presort my data before the load, things go a LOT more smoothly.


When running queries against our data, we've sometimes seen the
jobtracker just freeze.  I've seen heap out-of-memory errors when I
cranked up jobtracker logging to debug.  Still working on figuring this
one out.  Should be an interesting ride :D




On 03/06/2012 11:10 AM, Mohit Anchlia wrote:

I am still trying to see how to narrow this down. Is it possible to set
heapdumponoutofmemoryerror option on these individual tasks?

On Mon, Mar 5, 2012 at 5:49 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

Re: Java Heap space error

2012-03-06 Thread Mohit Anchlia
I am still trying to see how to narrow this down. Is it possible to set
heapdumponoutofmemoryerror option on these individual tasks?
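For what it's worth, the heap-dump-on-OOM flag can be passed to every task's child JVM through the job configuration rather than set on individual tasks. A minimal sketch, assuming a 0.20-era cluster where mapred.child.java.opts controls the task JVMs (the dump path is just an example):

```shell
# Child-JVM options asking each map/reduce task to write a heap dump on OOM.
# The .hprof file lands on whichever node ran the failing task attempt.
CHILD_OPTS="-Xmx1024m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/task_dumps"

# Passed through to the job, e.g. (illustrative, not run here):
#   pig -Dmapred.child.java.opts="$CHILD_OPTS" myscript.pig
echo "$CHILD_OPTS"
```

The resulting dump can then be pulled off the node and inspected with jhat or a heap analyzer.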

On Mon, Mar 5, 2012 at 5:49 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

 Sorry for multiple emails. I did find:


 2012-03-05 17:26:35,636 INFO
 org.apache.pig.impl.util.SpillableMemoryManager: first memory handler call-
 Usage threshold init = 715849728(699072K) used = 575921696(562423K)
 committed = 715849728(699072K) max = 715849728(699072K)

 2012-03-05 17:26:35,719 INFO
 org.apache.pig.impl.util.SpillableMemoryManager: Spilled an estimate of
 7816154 bytes from 1 objects. init = 715849728(699072K) used =
 575921696(562423K) committed = 715849728(699072K) max = 715849728(699072K)

 2012-03-05 17:26:36,881 INFO
 org.apache.pig.impl.util.SpillableMemoryManager: first memory handler call
 - Collection threshold init = 715849728(699072K) used = 358720384(350312K)
 committed = 715849728(699072K) max = 715849728(699072K)

 2012-03-05 17:26:36,885 INFO org.apache.hadoop.mapred.TaskLogsTruncater:
 Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1

 2012-03-05 17:26:36,888 FATAL org.apache.hadoop.mapred.Child: Error
 running child : java.lang.OutOfMemoryError: Java heap space

 at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:39)

 at java.nio.CharBuffer.allocate(CharBuffer.java:312)

 at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:760)

 at org.apache.hadoop.io.Text.decode(Text.java:350)

 at org.apache.hadoop.io.Text.decode(Text.java:327)

 at org.apache.hadoop.io.Text.toString(Text.java:254)

 at
 org.apache.pig.piggybank.storage.SequenceFileLoader.translateWritableToPigDataType(SequenceFileLoader.java:105)

 at
 org.apache.pig.piggybank.storage.SequenceFileLoader.getNext(SequenceFileLoader.java:139)

 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)

 at
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:456)

 at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)

 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)

 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)

 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)

 at org.apache.hadoop.mapred.Child$4.run(Child.java:270)

 at java.security.AccessController.doPrivileged(Native Method)

 at javax.security.auth.Subject.doAs(Subject.java:396)

 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)

 at org.apache.hadoop.mapred.Child.main(Child.java:264)


   On Mon, Mar 5, 2012 at 5:46 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

 All I see in the logs is:


 2012-03-05 17:26:36,889 FATAL org.apache.hadoop.mapred.TaskTracker: Task:
 attempt_201203051722_0001_m_30_1 - Killed : Java heap space

 Looks like the task tracker is killing the tasks, not sure why. I increased
 the heap from 512 MB to 1 GB and it still fails.


 On Mon, Mar 5, 2012 at 5:03 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

 I currently have java.opts.mapred set to 512MB and I am getting heap
 space errors. How should I go about debugging heap space issues?
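One way to start narrowing this down is to find which attempts actually hit the OOM by scanning the TaskTracker userlogs on the worker nodes. A sketch, assuming the 0.20-era layout of ${HADOOP_LOG_DIR}/userlogs/<attempt_id>/syslog; the throwaway fixture below just stands in for a real log directory:

```shell
# Scan a userlogs tree for task attempts that logged an OutOfMemoryError.
# A temp-dir fixture stands in for ${HADOOP_LOG_DIR}/userlogs here.
LOG_ROOT=$(mktemp -d)
mkdir -p "$LOG_ROOT/attempt_0001_m_000000_0"
printf 'FATAL org.apache.hadoop.mapred.Child: java.lang.OutOfMemoryError: Java heap space\n' \
  > "$LOG_ROOT/attempt_0001_m_000000_0/syslog"

# Each match is a syslog file; strip back to the attempt directory name.
OOM_ATTEMPTS=$(grep -rl "java.lang.OutOfMemoryError" "$LOG_ROOT" \
  | xargs -n1 dirname | xargs -n1 basename)
echo "$OOM_ATTEMPTS"
```

Once the failing attempts are known, their full syslog/stdout/stderr on that node usually show the allocation site.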






Re: Java Heap space error

2012-03-05 Thread Mohit Anchlia
All I see in the logs is:


2012-03-05 17:26:36,889 FATAL org.apache.hadoop.mapred.TaskTracker: Task:
attempt_201203051722_0001_m_30_1 - Killed : Java heap space

Looks like the task tracker is killing the tasks, not sure why. I increased
the heap from 512 MB to 1 GB and it still fails.


On Mon, Mar 5, 2012 at 5:03 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

 I currently have java.opts.mapred set to 512MB and I am getting heap space
 errors. How should I go about debugging heap space issues?



Re: Java Heap space error

2012-03-05 Thread Mohit Anchlia
Sorry for multiple emails. I did find:


2012-03-05 17:26:35,636 INFO
org.apache.pig.impl.util.SpillableMemoryManager: first memory handler call-
Usage threshold init = 715849728(699072K) used = 575921696(562423K)
committed = 715849728(699072K) max = 715849728(699072K)

2012-03-05 17:26:35,719 INFO
org.apache.pig.impl.util.SpillableMemoryManager: Spilled an estimate of
7816154 bytes from 1 objects. init = 715849728(699072K) used =
575921696(562423K) committed = 715849728(699072K) max = 715849728(699072K)

2012-03-05 17:26:36,881 INFO
org.apache.pig.impl.util.SpillableMemoryManager: first memory handler call
- Collection threshold init = 715849728(699072K) used = 358720384(350312K)
committed = 715849728(699072K) max = 715849728(699072K)

2012-03-05 17:26:36,885 INFO org.apache.hadoop.mapred.TaskLogsTruncater:
Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1

2012-03-05 17:26:36,888 FATAL org.apache.hadoop.mapred.Child: Error running
child : java.lang.OutOfMemoryError: Java heap space

at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:39)

at java.nio.CharBuffer.allocate(CharBuffer.java:312)

at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:760)

at org.apache.hadoop.io.Text.decode(Text.java:350)

at org.apache.hadoop.io.Text.decode(Text.java:327)

at org.apache.hadoop.io.Text.toString(Text.java:254)

at
org.apache.pig.piggybank.storage.SequenceFileLoader.translateWritableToPigDataType(SequenceFileLoader.java:105)

at
org.apache.pig.piggybank.storage.SequenceFileLoader.getNext(SequenceFileLoader.java:139)

at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)

at
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:456)

at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)

at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)

at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)

at org.apache.hadoop.mapred.Child$4.run(Child.java:270)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:396)

at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)

at org.apache.hadoop.mapred.Child.main(Child.java:264)


On Mon, Mar 5, 2012 at 5:46 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

 All I see in the logs is:


 2012-03-05 17:26:36,889 FATAL org.apache.hadoop.mapred.TaskTracker: Task:
 attempt_201203051722_0001_m_30_1 - Killed : Java heap space

 Looks like the task tracker is killing the tasks, not sure why. I increased
 the heap from 512 MB to 1 GB and it still fails.


 On Mon, Mar 5, 2012 at 5:03 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

 I currently have java.opts.mapred set to 512MB and I am getting heap
 space errors. How should I go about debugging heap space issues?





Java heap space error on PFPGrowth

2010-11-11 Thread Mark
I am trying to run PFPGrowth, but I keep receiving a Java heap space
error at the end of the first step / beginning of the second step.


I am using the following parameters:  -method mapreduce -regex [\\t] 
-s 5 -g 55000
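If it is the second job's tasks that are dying, the usual first lever is more heap for both the driver and the child task JVMs. A sketch with illustrative values only: MAHOUT_HEAPSIZE sizes the local Mahout driver JVM, while mapred.child.java.opts sizes each map/reduce task.

```shell
export MAHOUT_HEAPSIZE=2048   # heap (MB) for the local Mahout driver JVM
CHILD_OPTS="-Xmx1024m"        # heap for each map/reduce task JVM

# Illustrative invocation (not run here), keeping the original parameters:
#   mahout fpg -method mapreduce -regex '[\t]' -s 5 -g 55000 \
#       -Dmapred.child.java.opts="$CHILD_OPTS"
echo "$MAHOUT_HEAPSIZE $CHILD_OPTS"
```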


Output:

..
10/11/11 08:12:56 INFO mapred.JobClient:  map 100% reduce 85%
10/11/11 08:12:59 INFO mapred.JobClient:  map 100% reduce 90%
10/11/11 08:13:02 INFO mapred.JobClient:  map 100% reduce 94%
10/11/11 08:13:09 INFO mapred.JobClient:  map 100% reduce 100%
10/11/11 08:13:11 INFO mapred.JobClient: Job complete: job_201011101701_0005
10/11/11 08:13:11 INFO mapred.JobClient: Counters: 17
10/11/11 08:13:11 INFO mapred.JobClient:   Job Counters
10/11/11 08:13:11 INFO mapred.JobClient: Launched reduce tasks=1
10/11/11 08:13:11 INFO mapred.JobClient: Launched map tasks=8
10/11/11 08:13:11 INFO mapred.JobClient: Data-local map tasks=8
10/11/11 08:13:11 INFO mapred.JobClient:   FileSystemCounters
10/11/11 08:13:11 INFO mapred.JobClient: FILE_BYTES_READ=146083205
10/11/11 08:13:11 INFO mapred.JobClient: HDFS_BYTES_READ=411751517
10/11/11 08:13:11 INFO mapred.JobClient: FILE_BYTES_WRITTEN=177276794
10/11/11 08:13:11 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=82352630
10/11/11 08:13:11 INFO mapred.JobClient:   Map-Reduce Framework
10/11/11 08:13:11 INFO mapred.JobClient: Reduce input groups=3146378
10/11/11 08:13:11 INFO mapred.JobClient: Combine output records=30759042
10/11/11 08:13:11 INFO mapred.JobClient: Map input records=6049220
10/11/11 08:13:11 INFO mapred.JobClient: Reduce shuffle bytes=26239336
10/11/11 08:13:11 INFO mapred.JobClient: Reduce output records=3146378
10/11/11 08:13:11 INFO mapred.JobClient: Spilled Records=54248354
10/11/11 08:13:11 INFO mapred.JobClient: Map output bytes=743485927
10/11/11 08:13:11 INFO mapred.JobClient: Combine input records=63744687
10/11/11 08:13:11 INFO mapred.JobClient: Map output records=41469874
10/11/11 08:13:11 INFO mapred.JobClient: Reduce input records=8484229
10/11/11 08:13:26 INFO pfpgrowth.PFPGrowth: No of Features: 1087215
10/11/11 08:13:40 WARN mapred.JobClient: Use GenericOptionsParser for 
parsing the arguments. Applications should implement Tool for the same.
10/11/11 08:13:40 INFO input.FileInputFormat: Total input paths to 
process : 1

10/11/11 08:13:44 INFO mapred.JobClient: Running job: job_201011101701_0006
10/11/11 08:13:45 INFO mapred.JobClient:  map 0% reduce 0%
10/11/11 08:14:16 INFO mapred.JobClient: Task Id : 
attempt_201011101701_0006_m_00_0, Status : FAILED

Error: Java heap space


Is there anything I can do to alleviate this problem?

FYI: I'm running a 4-node cluster with 12 GB of RAM in each machine.

Thanks


TaskTracker: Java heap space error

2010-03-11 Thread Boyu Zhang
Dear All,

I am running a hadoop job processing data. The map output is really
large, and it spills 15 times. So I tried setting io.sort.mb = 256
instead of 100, leaving everything else at the default. I am using
version 0.20.2. When I run the job, I get the following errors:

2010-03-11 11:09:37,581 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=MAP, sessionId=
2010-03-11 11:09:38,073 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
2010-03-11 11:09:38,086 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 256
2010-03-11 11:09:38,326 FATAL org.apache.hadoop.mapred.TaskTracker:
Error running child : java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:781)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)


I can't figure out why; could anyone please give me a hint? Any help
would be appreciated! Thanks a lot!

Sincerely,

Boyu


Re: TaskTracker: Java heap space error

2010-03-11 Thread Arun C Murthy

Moving to mapreduce-user@, bcc: common-user

Have you tried bumping up the heap for the map task?

Since you are setting io.sort.mb to 256M, please set the heap size to 512M at
least, if not more.


mapred.child.java.opts -> -Xmx512M or -Xmx1024m
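As a concrete sketch of that suggestion, in mapred-site.xml (or per-job via -D): the values are illustrative, and the point is simply that the child heap must comfortably exceed io.sort.mb, because the in-memory sort buffer is allocated inside the task JVM's heap.

```xml
<!-- Child task heap: keep it well above io.sort.mb, since the map-side
     sort buffer lives inside the task JVM's heap. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
<property>
  <name>io.sort.mb</name>
  <value>256</value>
</property>
```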

Arun

On Mar 11, 2010, at 8:24 AM, Boyu Zhang wrote:


Dear All,

I am running a hadoop job processing data. The map output is really
large, and it spills 15 times. So I tried setting io.sort.mb = 256
instead of 100, leaving everything else at the default. I am using
version 0.20.2. When I run the job, I get the following errors:

2010-03-11 11:09:37,581 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=MAP, sessionId=
2010-03-11 11:09:38,073 INFO org.apache.hadoop.mapred.MapTask:  
numReduceTasks: 1
2010-03-11 11:09:38,086 INFO org.apache.hadoop.mapred.MapTask:  
io.sort.mb = 256

2010-03-11 11:09:38,326 FATAL org.apache.hadoop.mapred.TaskTracker:
Error running child : java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:781)

at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)


I can't figure out why; could anyone please give me a hint? Any help
would be appreciated! Thanks a lot!

Sincerely,

Boyu




Re: TaskTracker: Java heap space error

2010-03-11 Thread Boyu Zhang
Hi Arun,

I did what you said, and it seems to work. Thanks a lot! I guess I do not
completely understand how the tuning parameters affect each other. Thanks!

Boyu
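For anyone hitting the same thing, the interaction between these two parameters is just arithmetic: the map-side sort buffer (io.sort.mb) is allocated inside the child JVM's heap, and the 0.20-era default mapred.child.java.opts is only -Xmx200m, so a 256 MB buffer cannot fit. A sketch of the check (the default heap value is an assumption based on 0.20 configs):

```shell
# The sort buffer must fit inside the task JVM heap with room to spare.
IO_SORT_MB=256
CHILD_HEAP_MB=200   # 0.20-era default from mapred.child.java.opts (-Xmx200m)

if [ "$IO_SORT_MB" -ge "$CHILD_HEAP_MB" ]; then
  echo "io.sort.mb ($IO_SORT_MB MB) >= child heap ($CHILD_HEAP_MB MB): expect OOM in MapOutputBuffer"
fi
```

That is exactly why the OOM appears in MapOutputBuffer's constructor before the mapper even runs.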

On Thu, Mar 11, 2010 at 12:27 PM, Arun C Murthy a...@yahoo-inc.com wrote:

 Moving to mapreduce-user@, bcc: common-user

 Have you tried bumping up the heap for the map task?

 Since you are setting io.sort.mb to 256M, pls set heap-size to 512M at
 least, if not more.

 mapred.child.java.opts -> -Xmx512M or -Xmx1024m

 Arun


 On Mar 11, 2010, at 8:24 AM, Boyu Zhang wrote:

  Dear All,

 I am running a hadoop job processing data. The map output is really
 large, and it spills 15 times. So I tried setting io.sort.mb = 256
 instead of 100, leaving everything else at the default. I am using
 version 0.20.2. When I run the job, I get the following errors:

 2010-03-11 11:09:37,581 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
 Initializing JVM Metrics with processName=MAP, sessionId=
 2010-03-11 11:09:38,073 INFO org.apache.hadoop.mapred.MapTask:
 numReduceTasks: 1
 2010-03-11 11:09:38,086 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb
 = 256
 2010-03-11 11:09:38,326 FATAL org.apache.hadoop.mapred.TaskTracker:
 Error running child : java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:781)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)


 I can't figure out why; could anyone please give me a hint? Any help
 would be appreciated! Thanks a lot!

 Sincerely,

 Boyu