Awesome, working well. Next I need to start analysing why only ~300 MB is free out of the configured 1.9 GB heap for the mappers and reducers.
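As a starting point, here is a small standalone sketch (hypothetical, not our mapper code) of the numbers involved. Runtime.freeMemory() only reports free space within the *currently committed* heap (totalMemory()), which the JVM grows lazily toward maxMemory(), so a low "free" figure does not mean the rest of the heap is gone; the real headroom is max - total + free:

```java
// Sketch: why Runtime.freeMemory() can look small even with a large -Xmx.
public class HeapHeadroom {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long free  = rt.freeMemory();   // free space inside the committed heap
        long total = rt.totalMemory();  // heap committed by the JVM right now
        long max   = rt.maxMemory();    // the -Xmx ceiling
        // Uncommitted-but-growable space plus current free space:
        long headroom = max - total + free;
        System.out.println("free      = " + free);
        System.out.println("committed = " + total);
        System.out.println("max       = " + max);
        System.out.println("headroom  = " + headroom);
    }
}
```

(On top of this, task-side buffers such as io.sort.mb = 256 MB also live inside the same heap.)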
On Wed, Mar 27, 2013 at 3:25 PM, Hemanth Yamijala <yhema...@thoughtworks.com> wrote:
> Hi,
>
>> "Dumping heap to ./heapdump.hprof"
>> File myheapdump.hprof does not exist.
>
> The file names don't match - can you check your script / command line args?
>
> Thanks
> hemanth
>
> On Wed, Mar 27, 2013 at 3:21 PM, nagarjuna kanamarlapudi <nagarjuna.kanamarlap...@gmail.com> wrote:
>
>> Hi Hemanth,
>>
>> Nice to see this. I did not know about this till now.
>>
>> But one more issue: the dump file did not get created. The following are the logs:
>>
>> attempt_201302211510_81218_m_000000_0: /data/1/mapred/local/taskTracker/distcache/8776089957260881514_-363500746_715125253/cmp111wcd/user/ims-b/nagarjuna/AddressId_Extractor/Numbers
>> attempt_201302211510_81218_m_000000_0: java.lang.OutOfMemoryError: Java heap space
>> attempt_201302211510_81218_m_000000_0: Dumping heap to ./heapdump.hprof ...
>> attempt_201302211510_81218_m_000000_0: Heap dump file created [210641441 bytes in 3.778 secs]
>> attempt_201302211510_81218_m_000000_0: #
>> attempt_201302211510_81218_m_000000_0: # java.lang.OutOfMemoryError: Java heap space
>> attempt_201302211510_81218_m_000000_0: # -XX:OnOutOfMemoryError="./dump.sh"
>> attempt_201302211510_81218_m_000000_0: # Executing /bin/sh -c "./dump.sh"...
>> attempt_201302211510_81218_m_000000_0: put: File myheapdump.hprof does not exist.
>> attempt_201302211510_81218_m_000000_0: log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
>>
>> On Wed, Mar 27, 2013 at 2:29 PM, Hemanth Yamijala <yhema...@thoughtworks.com> wrote:
>>
>>> Couple of things to check:
>>>
>>> Does your class com.hadoop.publicationMrPOC.Launcher implement the Tool interface? You can look at an example at http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#Source+Code-N110D0. That's what accepts the -D params on the command line. Alternatively, you can also set the same in the configuration object like this, in your launcher code:
>>>
>>> Configuration conf = new Configuration();
>>> conf.set("mapred.create.symlink", "yes");
>>> conf.set("mapred.cache.files", "hdfs:///user/hemanty/scripts/copy_dump.sh#copy_dump.sh");
>>> conf.set("mapred.child.java.opts",
>>>     "-Xmx200m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./heapdump.hprof -XX:OnOutOfMemoryError=./copy_dump.sh");
>>>
>>> Second, the position of the arguments matters. I think the command should be:
>>>
>>> hadoop jar -Dmapred.create.symlink=yes -Dmapred.cache.files=hdfs:///user/ims-b/dump.sh#dump.sh -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh' com.hadoop.publicationMrPOC.Launcher Fudan\ Univ
>>>
>>> Thanks
>>> Hemanth
>>>
>>> On Wed, Mar 27, 2013 at 1:58 PM, nagarjuna kanamarlapudi <nagarjuna.kanamarlap...@gmail.com> wrote:
>>>
>>>> Hi Hemanth/Koji,
>>>>
>>>> Seems the above script doesn't work for me. Can you look into the following and suggest what more I can do?
>>>>
>>>> hadoop fs -cat /user/ims-b/dump.sh
>>>> #!/bin/sh
>>>> hadoop dfs -put myheapdump.hprof /tmp/myheapdump_ims/${PWD//\//_}.hprof
>>>>
>>>> hadoop jar LL.jar com.hadoop.publicationMrPOC.Launcher Fudan\ Univ -Dmapred.create.symlink=yes -Dmapred.cache.files=hdfs:///user/ims-b/dump.sh#dump.sh -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>>>>
>>>> I am not able to see the heap dump at /tmp/myheapdump_ims.
>>>>
>>>> Error in the mapper:
>>>>
>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>>         at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>>>>         ... 17 more
>>>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>>>         at java.util.Arrays.copyOf(Arrays.java:2734)
>>>>         at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
>>>>         at java.util.ArrayList.add(ArrayList.java:351)
>>>>         at com.hadoop.publicationMrPOC.PublicationMapper.configure(PublicationMapper.java:59)
>>>>         ... 22 more
>>>>
>>>> On Wed, Mar 27, 2013 at 10:16 AM, Hemanth Yamijala <yhema...@thoughtworks.com> wrote:
>>>>
>>>>> Koji,
>>>>>
>>>>> Works beautifully. Thanks a lot. I learnt at least 3 different things with your script today!
>>>>>
>>>>> Hemanth
>>>>>
>>>>> On Tue, Mar 26, 2013 at 9:41 PM, Koji Noguchi <knogu...@yahoo-inc.com> wrote:
>>>>>
>>>>>> Create a dump.sh on hdfs.
>>>>>>
>>>>>> $ hadoop dfs -cat /user/knoguchi/dump.sh
>>>>>> #!/bin/sh
>>>>>> hadoop dfs -put myheapdump.hprof /tmp/myheapdump_knoguchi/${PWD//\//_}.hprof
>>>>>>
>>>>>> Run your job with
>>>>>>
>>>>>> -Dmapred.create.symlink=yes
>>>>>> -Dmapred.cache.files=hdfs:///user/knoguchi/dump.sh#dump.sh
>>>>>> -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>>>>>>
>>>>>> This should create the heap dump on hdfs at /tmp/myheapdump_knoguchi.
>>>>>>
>>>>>> Koji
>>>>>>
>>>>>> On Mar 26, 2013, at 11:53 AM, Hemanth Yamijala wrote:
>>>>>>
>>>>>> > Hi,
>>>>>> >
>>>>>> > I tried to use -XX:+HeapDumpOnOutOfMemoryError. Unfortunately, as I suspected, the dump goes to the current working directory of the task attempt as it executes on the cluster. This directory is cleaned up once the task is done. There are options to keep failed task files, or task files matching a pattern; however, these do NOT retain the current working directory. Hence, there is no option to get this from a cluster, AFAIK.
>>>>>> >
>>>>>> > You are effectively left with the jmap option on a pseudo-distributed cluster, I think.
>>>>>> >
>>>>>> > Thanks
>>>>>> > Hemanth
>>>>>> >
>>>>>> > On Tue, Mar 26, 2013 at 11:37 AM, Hemanth Yamijala <yhema...@thoughtworks.com> wrote:
>>>>>> > If your task is running out of memory, you could add the option -XX:+HeapDumpOnOutOfMemoryError to mapred.child.java.opts (along with the heap memory). However, I am not sure where it stores the dump; you might need to experiment a little with it. I will try and send out the info if I get time to try it out.
>>>>>> >
>>>>>> > Thanks
>>>>>> > Hemanth
>>>>>> >
>>>>>> > On Tue, Mar 26, 2013 at 10:23 AM, nagarjuna kanamarlapudi <nagarjuna.kanamarlap...@gmail.com> wrote:
>>>>>> > Hi hemanth,
>>>>>> >
>>>>>> > This sounds interesting; I will try that out on the pseudo cluster. But the real problem for me is that the cluster is maintained by a third party. I only have an edge node through which I can submit the jobs.
>>>>>> >
>>>>>> > Is there any other way of getting the dump, instead of physically going to that machine and checking?
>>>>>> >
>>>>>> > On Tue, Mar 26, 2013 at 10:12 AM, Hemanth Yamijala <yhema...@thoughtworks.com> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > One option to find what could be taking the memory is to use jmap on the running task. The steps I followed are:
>>>>>> >
>>>>>> > - I ran a sleep job (which comes in the examples jar of the distribution - effectively does nothing in the mapper / reducer).
>>>>>> > - From the JobTracker UI, looked at a map task attempt ID.
>>>>>> > - Then, on the machine where the map task is running, got the PID of the running task: ps -ef | grep <task attempt id>
>>>>>> > - On the same machine, executed: jmap -histo <pid>
>>>>>> >
>>>>>> > This will give you an idea of the count and size of the objects allocated. Jmap also has options to get a dump, which will contain more information, but this should help get you started with debugging.
>>>>>> >
>>>>>> > For my sleep job task, I saw allocations worth roughly 130 MB.
>>>>>> >
>>>>>> > Thanks
>>>>>> > hemanth
>>>>>> >
>>>>>> > On Mon, Mar 25, 2013 at 6:43 PM, Nagarjuna Kanamarlapudi <nagarjuna.kanamarlap...@gmail.com> wrote:
>>>>>> > I have a lookup file which I need in the mapper, so I am trying to read the whole file and load it into a list in the mapper.
>>>>>> >
>>>>>> > For each and every record, I look in this file, which I got from the distributed cache.
>>>>>> >
>>>>>> > —
>>>>>> > Sent from iPhone
>>>>>> >
>>>>>> > On Mon, Mar 25, 2013 at 6:39 PM, Hemanth Yamijala <yhema...@thoughtworks.com> wrote:
>>>>>> >
>>>>>> > Hmm. How are you loading the file into memory? Is it some sort of memory mapping, etc.? Are they being read as records? Some details of the app will help.
>>>>>> >
>>>>>> > On Mon, Mar 25, 2013 at 2:14 PM, nagarjuna kanamarlapudi <nagarjuna.kanamarlap...@gmail.com> wrote:
>>>>>> > Hi Hemanth,
>>>>>> >
>>>>>> > I tried out your suggestion, loading a 420 MB file into memory. It threw a Java heap space error.
>>>>>> >
>>>>>> > I am not sure where this 1.6 GB of configured heap went.
>>>>>> >
>>>>>> > On Mon, Mar 25, 2013 at 12:01 PM, Hemanth Yamijala <yhema...@thoughtworks.com> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > The free memory might be low just because GC hasn't reclaimed what it can. Can you just try reading in the data you want to read and see if that works?
>>>>>> >
>>>>>> > Thanks
>>>>>> > Hemanth
>>>>>> >
>>>>>> > On Mon, Mar 25, 2013 at 10:32 AM, nagarjuna kanamarlapudi <nagarjuna.kanamarlap...@gmail.com> wrote:
>>>>>> > io.sort.mb = 256 MB
>>>>>> >
>>>>>> > On Monday, March 25, 2013, Harsh J wrote:
>>>>>> > The MapTask may consume some memory of its own as well. What is your io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?
>>>>>> >
>>>>>> > On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi <nagarjuna.kanamarlap...@gmail.com> wrote:
>>>>>> > > Hi,
>>>>>> > >
>>>>>> > > I configured my child JVM heap to 2 GB. So, I thought I could really read 1.5 GB of data and store it in memory (mapper/reducer).
>>>>>> > >
>>>>>> > > I wanted to confirm the same, and wrote the following piece of code in the configure method of the mapper:
>>>>>> > >
>>>>>> > > @Override
>>>>>> > > public void configure(JobConf job) {
>>>>>> > >     System.out.println("FREE MEMORY -- " + Runtime.getRuntime().freeMemory());
>>>>>> > >     System.out.println("MAX MEMORY --- " + Runtime.getRuntime().maxMemory());
>>>>>> > > }
>>>>>> > >
>>>>>> > > Surprisingly, the output was:
>>>>>> > >
>>>>>> > > FREE MEMORY -- 341854864 = 320 MB
>>>>>> > > MAX MEMORY --- 1908932608 = 1.9 GB
>>>>>> > >
>>>>>> > > I am just wondering what processes are taking up that extra 1.6 GB of heap which I configured for the child JVM.
>>>>>> > >
>>>>>> > > Would appreciate help in understanding the scenario.
>>>>>> > >
>>>>>> > > Regards
>>>>>> > > Nagarjuna K
>>>>>> >
>>>>>> > --
>>>>>> > Harsh J
>>>>>> >
>>>>>> > --
>>>>>> > Sent from iPhone
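For what it's worth, the OOM stack trace above (Arrays.copyOf via ArrayList.ensureCapacity during PublicationMapper.configure) shows the list's backing array being doubled as it grows, and that transient copy can push a nearly full heap over the edge. A sketch of loading a lookup file with a presized list (class and method names are hypothetical, not the actual PublicationMapper code):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: load a lookup file into a list sized exactly once,
// avoiding the grow-and-copy (Arrays.copyOf) churn seen in the stack trace.
public class LookupLoader {
    public static List<String> load(Path file) throws IOException {
        // First pass: count lines so the backing array is allocated once.
        int lines = 0;
        try (BufferedReader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            while (r.readLine() != null) lines++;
        }
        List<String> lookup = new ArrayList<>(lines); // no doubling copies
        // Second pass: actually load the records.
        try (BufferedReader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = r.readLine()) != null) lookup.add(line);
        }
        return lookup;
    }
}
```

Presizing only removes the transient copy overhead; the resident cost of the Strings themselves still has to fit in the heap, so for a 420 MB file a more compact representation (or a map-side join) may be needed anyway.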