Re: reducer outofmemoryerror
I made two changes:

1) Increased mapred.child.java.opts to 768m.
2) Coalesced the files into a smaller number of larger files.

This has resolved my problem and reduced the running time by a factor of 3. Thanks for all the suggestions.

Ted Dunning wrote:
> If these files are small, you will have a significant (but not massive) hit in performance due to having so many files.
>
> On 4/24/08 12:07 AM, "Arun C Murthy" <[EMAIL PROTECTED]> wrote:
>> On Apr 23, 2008, at 7:51 AM, Apurva Jadhav wrote:
>>> There are six reducers and 24000 mappers because there are 24000 files. The number of tasks per node is 2. mapred.child.java.opts is at the default value, 200m. What is a good value for this? My mappers and reducers are fairly simple and do not make large allocations.
>> Try upping that to 512M.
>> Arun
>>> [rest of the earlier thread snipped; see the messages below]
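For reference, a minimal sketch of the first change: mapred.child.java.opts holds the JVM arguments passed to each map/reduce child, so a 768 MB heap is set by overriding the property in conf/hadoop-site.xml (the property name is from the thread; the rest of the file is omitted here):

    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx768m</value>
    </property>

The second change can be scripted many ways; the sketch below concatenates every file in one HDFS directory into a single larger file using the FileSystem API. The paths are hypothetical, and some calls (e.g. listStatus) come from releases newer than 0.15.3, so treat it as a starting point rather than a drop-in tool:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class CoalesceFiles {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path srcDir = new Path(args[0]); // e.g. /logs/raw      (hypothetical)
        Path dst    = new Path(args[1]); // e.g. /logs/merged/0 (hypothetical)
        FSDataOutputStream out = fs.create(dst);
        try {
          for (FileStatus stat : fs.listStatus(srcDir)) {
            if (stat.isDir()) continue;           // skip subdirectories
            FSDataInputStream in = fs.open(stat.getPath());
            try {
              // copy with a 64 KB buffer; 'false' keeps 'out' open
              IOUtils.copyBytes(in, out, 65536, false);
            } finally {
              in.close();
            }
          }
        } finally {
          out.close();
        }
      }
    }

Fewer, larger files also mean fewer map tasks (closer to one per HDFS block instead of one per tiny file), which is why the job above no longer needed 24000 mappers and sped up threefold.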
Re: reducer outofmemoryerror
There are six reducers and 24000 mappers because there are 24000 files. The number of tasks per node is 2. mapred.child.java.opts is at the default value, 200m. What is a good value for this? My mappers and reducers are fairly simple and do not make large allocations.
Regards,
aj

Amar Kamat wrote:
> Apurva Jadhav wrote:
>> Hi, I have a 4 node hadoop 0.15.3 cluster. I am using the default config files. I am running a map reduce job to process 40 GB of log data.
> How many maps and reducers are there? Make sure that there is a sufficient number of reducers. Look at conf/hadoop-default.xml (see the mapred.child.java.opts parameter) to change the heap settings.
> Amar
>> Some reduce tasks are failing with the following errors:
>> [stderr and syslog output snipped; see the original message below]
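For context, the default Amar points at ships in conf/hadoop-default.xml and looks roughly like the following (entry paraphrased from memory, description text trimmed); overrides belong in conf/hadoop-site.xml rather than edits to the defaults file:

    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx200m</value>
    </property>

So "the default value 200m" above means each child task JVM runs with a 200 MB maximum heap, which the reduce-side in-memory merge (the InMemoryFileSystem lines in the logs below) can exhaust when a reducer has to fetch and merge map outputs from 24000 maps.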
reducer outofmemoryerror
Hi,
I have a 4 node hadoop 0.15.3 cluster. I am using the default config files. I am running a map reduce job to process 40 GB of log data. Some reduce tasks are failing with the following errors:

1) stderr:
Exception in thread "org.apache.hadoop.io.ObjectWritable Connection Culler" Exception in thread "[EMAIL PROTECTED]" java.lang.OutOfMemoryError: Java heap space
Exception in thread "IPC Client connection to /127.0.0.1:34691" java.lang.OutOfMemoryError: Java heap space
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

2) stderr:
Exception in thread "org.apache.hadoop.io.ObjectWritable Connection Culler" java.lang.OutOfMemoryError: Java heap space

syslog:
2008-04-22 19:32:50,784 INFO org.apache.hadoop.mapred.ReduceTask: task_200804212359_0007_r_04_0 Merge of the 19 files in InMemoryFileSystem complete. Local file is /data/hadoop-im2/mapred/local/task_200804212359_0007_r_04_0/map_22600.out
2008-04-22 20:34:16,012 INFO org.apache.hadoop.ipc.Client: java.net.SocketException: Socket closed
        at java.net.SocketInputStream.read(SocketInputStream.java:162)
        at java.io.FilterInputStream.read(FilterInputStream.java:111)
        at org.apache.hadoop.ipc.Client$Connection$1.read(Client.java:181)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:235)
        at java.io.DataInputStream.readInt(DataInputStream.java:353)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:258)
2008-04-22 20:34:16,032 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.OutOfMemoryError: Java heap space
2008-04-22 20:34:16,031 INFO org.apache.hadoop.mapred.TaskRunner: Communication exception: java.lang.OutOfMemoryError: Java heap space

Has anyone experienced a similar problem? Is there any configuration change that can help resolve this issue?
Regards,
aj