Hi Jim,

Which JVM are you using?
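On the TaskLog error below: the mismatch between owner 'jim' and expected
owner '10929' looks like a numeric UID being compared against a user name,
which would be consistent with the jetty problem you suspect. The failing
check is essentially a string comparison of the file's owner against
whatever "expected owner" the servlet was handed, so a UID can never match
a name. Here is a minimal sketch of that kind of check - my own
illustration with made-up names, not the actual SecureIOUtils code:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustration only, not Hadoop's SecureIOUtils: the owner check boils
// down to a plain string comparison, so an expected owner of "10929"
// (a UID) can never equal an actual owner of "jim" (a user name).
public class OwnerCheckSketch {

    static void checkOwner(Path path, String expectedOwner) throws IOException {
        String actualOwner = Files.getOwner(path).getName();
        if (!actualOwner.equals(expectedOwner)) {
            throw new IOException("Owner '" + actualOwner + "' for path " + path
                    + " did not match expected owner '" + expectedOwner + "'");
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical log file path - substitute any task log on your nodes.
        checkOwner(Paths.get("/var/tmp/jim/stdout"), "10929");
    }
}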
I don't think you have a memory issue; otherwise you would have gotten an
OOME somewhere (and, with -XX:+HeapDumpOnOutOfMemoryError set, a heap dump
under /var/tmp/soner).
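If you want to verify the values the task tracker actually resolves at
runtime, rather than what you believe is in the file, a quick check
against the Configuration API works. This is just a sketch (the class
name is mine) and assumes mapred-site.xml is on the classpath, as it is
for the daemons themselves:

import org.apache.hadoop.conf.Configuration;

// Sketch: print the resolved values of the settings discussed in this
// thread. Assumes mapred-site.xml is on the classpath.
public class PrintMapredSettings {

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.addResource("mapred-site.xml");
        System.out.println("mapred.tasktracker.map.tasks.maximum = "
                + conf.get("mapred.tasktracker.map.tasks.maximum"));
        System.out.println("mapred.tasktracker.reduce.tasks.maximum = "
                + conf.get("mapred.tasktracker.reduce.tasks.maximum"));
        System.out.println("mapred.child.java.opts = "
                + conf.get("mapred.child.java.opts"));
    }
}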
JM

2013/5/24 Jim Twensky <[email protected]>

> Hi again, in addition to my previous post, I was able to get some error
> logs from the task tracker/data node this morning, and it looks like it
> might be a jetty issue:
>
> 2013-05-23 19:59:20,595 WARN org.apache.hadoop.mapred.TaskLog: Failed to
> retrieve stdout log for task: attempt_201305231647_0007_m_001096_0
> java.io.IOException: Owner 'jim' for path
> /var/tmp/jim/hadoop-logs/userlogs/job_201305231647_0007/attempt_201305231647_0007_m_001096_0/stdout
> did not match expected owner '10929'
>         at org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:177)
>         at org.apache.hadoop.io.SecureIOUtils.openForRead(SecureIOUtils.java:117)
>         at org.apache.hadoop.mapred.TaskLog$Reader.<init>(TaskLog.java:455)
>         at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:81)
>         at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:296)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>         at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>         at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:848)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>
> I am wondering if I am hitting MAPREDUCE-2389
> <https://issues.apache.org/jira/browse/MAPREDUCE-2389>. If so, how do I
> downgrade my jetty version? Should I just replace the jetty jar file in
> the lib directory with an earlier version and restart my cluster?
>
> Thank you.
>
>
> On Thu, May 23, 2013 at 7:14 PM, Jim Twensky <[email protected]> wrote:
>
>> Hello, I have a 20-node Hadoop cluster where each node has 8GB of memory
>> and an 8-core processor. I sometimes get the following error on a
>> seemingly random basis:
>>
>> -----------------------------------------------------------------------------------------------------------
>>
>> Exception in thread "main" java.io.IOException: Exception reading
>> file:/var/tmp/jim/hadoop-jim/mapred/local/taskTracker/jim/jobcache/job_201305231647_0005/jobToken
>>         at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:135)
>>         at org.apache.hadoop.mapreduce.security.TokenCache.loadTokens(TokenCache.java:165)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:92)
>> Caused by: java.io.IOException: failure to login
>>         at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:501)
>>         at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:463)
>>         at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1519)
>>         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1420)
>>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
>>         at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
>>         at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:129)
>>         ... 2 more
>> Caused by: javax.security.auth.login.LoginException:
>> java.lang.NullPointerException: invalid null input: name
>>         at com.sun.security.auth.UnixPrincipal.<init>(UnixPrincipal.java:70)
>>         at com.sun.security.auth.module.UnixLoginModule.login(UnixLoginModule.java:132)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         ......
>>
>> -----------------------------------------------------------------------------------------------------------
>>
>> This does not always happen, but I see a pattern: when the intermediate
>> data is larger, it tends to occur more frequently. In the web log, I can
>> see the following:
>>
>> java.lang.Throwable: Child Error
>>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
>> Caused by: java.io.IOException: Task process exit with nonzero status of 1.
>>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
>>
>> From what I read online, a possible cause is that there is not enough
>> memory for all the JVMs. My mapred-site.xml is set up to allocate 1100MB
>> for each child, and the maximum numbers of map and reduce tasks are each
>> set to 3 - so the worst case is 6 x 1100MB = 6600MB for the child JVMs,
>> plus (500MB * 2) for the data node and task tracker (I set
>> HADOOP_HEAPSIZE to 500MB), about 7.6GB of the 8GB on each node. I feel
>> like memory is not the cause, but I haven't been able to avoid these
>> failures so far.
>> In case it helps, here are the relevant sections of my mapred-site.xml:
>>
>> -----------------------------------------------------------------------------------------------------------
>>
>> <property>
>>   <name>mapred.tasktracker.map.tasks.maximum</name>
>>   <value>3</value>
>> </property>
>>
>> <property>
>>   <name>mapred.tasktracker.reduce.tasks.maximum</name>
>>   <value>3</value>
>> </property>
>>
>> <property>
>>   <name>mapred.child.java.opts</name>
>>   <value>-Xmx1100M -ea -XX:+HeapDumpOnOutOfMemoryError
>> -XX:HeapDumpPath=/var/tmp/soner</value>
>> </property>
>>
>> <property>
>>   <name>mapred.reduce.parallel.copies</name>
>>   <value>5</value>
>> </property>
>>
>> <property>
>>   <name>tasktracker.http.threads</name>
>>   <value>80</value>
>> </property>
>>
>> -----------------------------------------------------------------------------------------------------------
>>
>> My jobs still complete most of the time, but they occasionally fail, and
>> I'm really puzzled at this point. I'd appreciate any help or ideas.
>>
>> Thanks
