Re: reducer outofmemoryerror

2008-04-24 Thread Apurva Jadhav


I made two changes:
1) Increased mapred.child.java.opts to 768m.
2) Coalesced the input into a smaller number of larger files.

This has resolved my problem and reduced the running time by a factor of 3.
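
For the first change, the setting can also go in the job itself rather than
the cluster config. A minimal sketch with the JobConf API (the LogProcessor
class name is just a placeholder, not our actual job):

  import org.apache.hadoop.mapred.JobConf;

  public class LogProcessor {                        // placeholder job class
    public static JobConf configure() {
      JobConf conf = new JobConf(LogProcessor.class);
      // mapred.child.java.opts is passed to the task JVM verbatim,
      // so standard -Xmx heap syntax applies.
      conf.set("mapred.child.java.opts", "-Xmx768m");
      return conf;
    }
  }

For the second change, a sketch of one way to coalesce with the plain
FileSystem API (illustrative only: some of these method names are from
Hadoop releases newer than 0.15, and byte-level concatenation is only safe
for line-oriented text logs, not for SequenceFiles):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IOUtils;

  // Concatenate every file under args[0] into the single file args[1],
  // turning many small map inputs into one large one.
  public class Coalesce {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      FSDataOutputStream out = fs.create(new Path(args[1]));
      for (FileStatus stat : fs.listStatus(new Path(args[0]))) {
        if (stat.isDir()) continue;                  // skip subdirectories
        FSDataInputStream in = fs.open(stat.getPath());
        IOUtils.copyBytes(in, out, conf, false);     // false: keep 'out' open
        in.close();
      }
      out.close();
    }
  }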

Thanks for all the suggestions.


Ted Dunning wrote:

If these files are small, you will have a significant (but not massive) hit
in performance due to having so many files.


On 4/24/08 12:07 AM, "Arun C Murthy" <[EMAIL PROTECTED]> wrote:

On Apr 23, 2008, at 7:51 AM, Apurva Jadhav wrote:



There are six reducers and 24000 mappers because there are 24000
files.
The number of tasks per node is 2.
mapred.child.java.opts is at the default value of 200m. What is a good
value for this? My mappers and reducers are fairly simple and do not
make large allocations.

Try upping that to 512M.

Arun



Regards,
aj

Amar Kamat wrote:

Apurva Jadhav wrote:


Hi,
 I have a 4-node Hadoop 0.15.3 cluster. I am using the default
config files. I am running a map reduce job to process 40 GB of
log data.

How many maps and reducers are there? Make sure that there is a
sufficient number of reducers. Look at conf/hadoop-default.xml
(see the mapred.child.java.opts parameter) to change the heap settings.
Amar


Some reduce tasks are failing with the following errors:
1)
stderr
Exception in thread "org.apache.hadoop.io.ObjectWritable Connection Culler" Exception in thread "[EMAIL PROTECTED]" java.lang.OutOfMemoryError: Java heap space
Exception in thread "IPC Client connection to /127.0.0.1:34691" java.lang.OutOfMemoryError: Java heap space
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

2)
stderr
Exception in thread "org.apache.hadoop.io.ObjectWritable Connection Culler" java.lang.OutOfMemoryError: Java heap space

syslog:
2008-04-22 19:32:50,784 INFO org.apache.hadoop.mapred.ReduceTask: task_200804212359_0007_r_04_0 Merge of the 19 files in InMemoryFileSystem complete. Local file is /data/hadoop-im2/mapred/local/task_200804212359_0007_r_04_0/map_22600.out
2008-04-22 20:34:16,012 INFO org.apache.hadoop.ipc.Client: java.net.SocketException: Socket closed
   at java.net.SocketInputStream.read(SocketInputStream.java:162)
   at java.io.FilterInputStream.read(FilterInputStream.java:111)
   at org.apache.hadoop.ipc.Client$Connection$1.read(Client.java:181)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:235)
   at java.io.DataInputStream.readInt(DataInputStream.java:353)
   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:258)

2008-04-22 20:34:16,032 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.OutOfMemoryError: Java heap space
2008-04-22 20:34:16,031 INFO org.apache.hadoop.mapred.TaskRunner: Communication exception: java.lang.OutOfMemoryError: Java heap space

Has anyone experienced a similar problem? Is there any configuration
change that can help resolve this issue?

Regards,
aj

Re: reducer outofmemoryerror

2008-04-23 Thread Apurva Jadhav

There are six reducers and 24000 mappers because there are 24000 files.
The number of tasks per node is 2.
mapred.child.java.opts is at the default value of 200m. What is a good
value for this? My mappers and reducers are fairly simple and do not
make large allocations.

Regards,
aj

Amar Kamat wrote:

Apurva Jadhav wrote:

Hi,
 I have a 4-node Hadoop 0.15.3 cluster. I am using the default config
files. I am running a map reduce job to process 40 GB of log data.

How many maps and reducers are there? Make sure that there is a
sufficient number of reducers. Look at conf/hadoop-default.xml (see the
mapred.child.java.opts parameter) to change the heap settings.
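
E.g. an override in conf/hadoop-site.xml, which takes precedence over
hadoop-default.xml, would look something like this (sketch only; pick a
value to suit your tasks):

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>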

Amar

Some reduce tasks are failing with the following errors:
1)
stderr
Exception in thread "org.apache.hadoop.io.ObjectWritable Connection Culler" Exception in thread "[EMAIL PROTECTED]" java.lang.OutOfMemoryError: Java heap space
Exception in thread "IPC Client connection to /127.0.0.1:34691" java.lang.OutOfMemoryError: Java heap space
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

2)
stderr
Exception in thread "org.apache.hadoop.io.ObjectWritable Connection Culler" java.lang.OutOfMemoryError: Java heap space

syslog:
2008-04-22 19:32:50,784 INFO org.apache.hadoop.mapred.ReduceTask: task_200804212359_0007_r_04_0 Merge of the 19 files in InMemoryFileSystem complete. Local file is /data/hadoop-im2/mapred/local/task_200804212359_0007_r_04_0/map_22600.out
2008-04-22 20:34:16,012 INFO org.apache.hadoop.ipc.Client: java.net.SocketException: Socket closed
   at java.net.SocketInputStream.read(SocketInputStream.java:162)
   at java.io.FilterInputStream.read(FilterInputStream.java:111)
   at org.apache.hadoop.ipc.Client$Connection$1.read(Client.java:181)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:235)
   at java.io.DataInputStream.readInt(DataInputStream.java:353)
   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:258)

2008-04-22 20:34:16,032 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.OutOfMemoryError: Java heap space
2008-04-22 20:34:16,031 INFO org.apache.hadoop.mapred.TaskRunner: Communication exception: java.lang.OutOfMemoryError: Java heap space


Has anyone experienced a similar problem? Is there any configuration
change that can help resolve this issue?


Regards,
aj

reducer outofmemoryerror

2008-04-22 Thread Apurva Jadhav

Hi,
 I have a 4-node Hadoop 0.15.3 cluster. I am using the default config
files. I am running a map reduce job to process 40 GB of log data.

Some reduce tasks are failing with the following errors:
1)
stderr
Exception in thread "org.apache.hadoop.io.ObjectWritable Connection Culler" Exception in thread "[EMAIL PROTECTED]" java.lang.OutOfMemoryError: Java heap space
Exception in thread "IPC Client connection to /127.0.0.1:34691" java.lang.OutOfMemoryError: Java heap space
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

2)
stderr
Exception in thread "org.apache.hadoop.io.ObjectWritable Connection Culler" java.lang.OutOfMemoryError: Java heap space

syslog:
2008-04-22 19:32:50,784 INFO org.apache.hadoop.mapred.ReduceTask: task_200804212359_0007_r_04_0 Merge of the 19 files in InMemoryFileSystem complete. Local file is /data/hadoop-im2/mapred/local/task_200804212359_0007_r_04_0/map_22600.out
2008-04-22 20:34:16,012 INFO org.apache.hadoop.ipc.Client: java.net.SocketException: Socket closed
   at java.net.SocketInputStream.read(SocketInputStream.java:162)
   at java.io.FilterInputStream.read(FilterInputStream.java:111)
   at org.apache.hadoop.ipc.Client$Connection$1.read(Client.java:181)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:235)
   at java.io.DataInputStream.readInt(DataInputStream.java:353)
   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:258)

2008-04-22 20:34:16,032 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.OutOfMemoryError: Java heap space
2008-04-22 20:34:16,031 INFO org.apache.hadoop.mapred.TaskRunner: Communication exception: java.lang.OutOfMemoryError: Java heap space


Has anyone experienced a similar problem? Is there any configuration
change that can help resolve this issue?


Regards,
aj