Re: Load .so library error when Hadoop calls JNI interfaces
Thanks for answering. I run my Hadoop on a single node, not in cluster mode.

On Thu, Apr 30, 2009 at 11:21 AM, jason hadoop jason.had...@gmail.com wrote:

You need to make sure that the shared library is available on the tasktracker nodes, either by installing it there or by pushing it around via the distributed cache.

On Wed, Apr 29, 2009 at 8:19 PM, Ian jonhson jonhson@gmail.com wrote:

Dear all,

I wrote plugin code for Hadoop that calls the interfaces of a C++-built .so library. The plugin is written in Java, so I prepared a JNI class to wrap the C interfaces. The Java code runs successfully when I compile and run it standalone; however, it does not work when embedded in Hadoop. The exception (found in the Hadoop logs) is:

--- screen dump ---
# grep myClass logs/* -r
logs/hadoop-hadoop-tasktracker-testbed0.container.org.out:Exception in thread "JVM Runner jvm_200904261632_0001_m_-1217897050 spawned." java.lang.UnsatisfiedLinkError: org.apache.hadoop.mapred.myClass.myClassfsMount(Ljava/lang/String;)I
logs/hadoop-hadoop-tasktracker-testbed0.container.org.out:        at org.apache.hadoop.mapred.myClass.myClassfsMount(Native Method)
logs/hadoop-hadoop-tasktracker-testbed0.container.org.out:Exception in thread "JVM Runner jvm_200904261632_0001_m_-1887898624 spawned." java.lang.UnsatisfiedLinkError: org.apache.hadoop.mapred.myClass.myClassfsMount(Ljava/lang/String;)I
logs/hadoop-hadoop-tasktracker-testbed0.container.org.out:        at org.apache.hadoop.mapred.myClass.myClassfsMount(Native Method)
...

It seems the library cannot be loaded in Hadoop. My code (myClass.java) looks like:

--- myClass.java ---
package org.apache.hadoop.mapred;

import java.io.IOException;
import java.lang.reflect.Field;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class myClass {

    public static final Log LOG = LogFactory.getLog("org.apache.hadoop.mapred.myClass");

    public myClass() {
        try {
            //System.setProperty("java.library.path", "/usr/local/lib");
            /* The above line does not work, so I have to do something
             * like the following line. */
            addDir(new String("/usr/local/lib"));
            System.loadLibrary("myclass");
        } catch (UnsatisfiedLinkError e) {
            LOG.info("Cannot load library:\n" + e.toString());
        } catch (IOException ioe) {
            LOG.info("IO error:\n" + ioe.toString());
        }
    }

    /* Since System.setProperty() does not work, I have to add the following
     * method to force the path into java.library.path. */
    public static void addDir(String s) throws IOException {
        try {
            Field field = ClassLoader.class.getDeclaredField("usr_paths");
            field.setAccessible(true);
            String[] paths = (String[]) field.get(null);
            for (int i = 0; i < paths.length; i++) {
                if (s.equals(paths[i])) {
                    return;
                }
            }
            String[] tmp = new String[paths.length + 1];
            System.arraycopy(paths, 0, tmp, 0, paths.length);
            tmp[paths.length] = s;
            field.set(null, tmp);
        } catch (IllegalAccessException e) {
            throw new IOException("Failed to get permissions to set library path");
        } catch (NoSuchFieldException e) {
            throw new IOException("Failed to get field handle to set library path");
        }
    }

    public native int myClassfsMount(String subsys);
    public native int myClassfsUmount(String subsys);
}

I don't know what is missing in my code, and I wonder whether there are any rules in Hadoop I should follow to achieve my goal. FYI, myClassfsMount() and myClassfsUmount() open a socket to request services from a daemon. I hope this design is not what makes my code fail.

Any comments? Thanks in advance,

Ian

--
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
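Ian's observation that System.setProperty("java.library.path", ...) "does not work" is consistent with how the JVM handles that property: the class loader snapshots the search directories (at startup on modern JVMs, or the first time a library is loaded on older ones), so changing the property later is unreliable — which is why the reflection hack on ClassLoader's private usr_paths field was needed. A minimal, self-contained sketch of the fragile behavior (the library name here is made up and deliberately nonexistent):

```java
public class LibPathDemo {
    public static void main(String[] args) {
        // The system property itself can be overwritten at any time...
        System.setProperty("java.library.path", "/usr/local/lib");
        System.out.println(System.getProperty("java.library.path"));

        // ...but whether System.loadLibrary() sees the new value depends on
        // when the class loader cached the parsed path list, so code cannot
        // rely on it. Here the library does not exist anywhere, so the load
        // fails regardless; the point is that a setProperty() call alone is
        // not a dependable way to extend the native search path.
        try {
            System.loadLibrary("no_such_demo_library");
        } catch (UnsatisfiedLinkError expected) {
            System.out.println("loadLibrary failed as expected");
        }
    }
}
```

Running it prints the updated property value followed by the failure message; the reliable fixes are the ones discussed in this thread (install the .so where the task JVMs can find it, or ship it via the distributed cache).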
Re: Load .so library error when Hadoop calls JNI interfaces
Put your .so file in every tracker's Hadoop-install/lib/native/Linux-xxx-xx/.

Or, in your code, try to do:

String oldPath = System.getProperty("java.library.path");
System.setProperty("java.library.path",
    oldPath == null ? local_path_of_lib_file
                    : oldPath + pathSeparator + local_path_of_lib_file);
System.loadLibrary("XXX");

However, you also need to fetch the library to the local machine, either through the DistributedCache (as Jason said) or by putting it into and getting it from HDFS yourself.

On 09-4-30 5:14 PM, Ian jonhson jonhson@gmail.com wrote:

You mean that the current Hadoop does not support JNI calls, right? Is there any solution to achieve the calls from C interfaces?

2009/4/30 He Yongqiang heyongqi...@software.ict.ac.cn:

Does Hadoop now support JNI calls in mappers or reducers? If yes, how? If not, I think we should create a JIRA issue for supporting that.

On 09-4-30 4:02 PM, Ian jonhson jonhson@gmail.com wrote:

Thanks for answering. I run my Hadoop on a single node, not in cluster mode. [...]
Re: Load .so library error when Hadoop calls JNI interfaces
Does Hadoop now support JNI calls in mappers or reducers? If yes, how? If not, I think we should create a JIRA issue for supporting that.

On 09-4-30 4:02 PM, Ian jonhson jonhson@gmail.com wrote:

Thanks for answering. I run my Hadoop on a single node, not in cluster mode. [...]
Re: I need help
Razen Alharbi wrote:

Thanks everybody. The issue was that Hadoop writes all the outputs to stderr instead of stdout, and I don't know why. I would really love to know why the usual Hadoop job progress is written to stderr.

Because there is a line in log4j.properties telling it to do just that:

log4j.appender.console.target=System.err

--
Steve Loughran http://www.1060.org/blogxter/publish/5
Author: Ant in Action http://antbook.org/
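If you would rather have the console logging on stdout, the same knob can be flipped in the conf/log4j.properties shipped with the Hadoop distribution. A sketch of the change (assuming the stock appender name "console" that Steve's line refers to):

```properties
# conf/log4j.properties -- send console log output to stdout instead of stderr
log4j.appender.console.target=System.out
```

Note this changes where the daemons and CLI write their progress lines; per-task println output still lands in the userlogs directories on the tasktracker nodes.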
Re: Load .so library error when Hadoop calls JNI interfaces
Hi Jason, when will the full version of your book be available?

On Thu, Apr 30, 2009 at 8:51 AM, jason hadoop jason.had...@gmail.com wrote:

You need to make sure that the shared library is available on the tasktracker nodes, either by installing it there or by pushing it around via the distributed cache.

On Wed, Apr 29, 2009 at 8:19 PM, Ian jonhson jonhson@gmail.com wrote:

Dear all, I wrote plugin code for Hadoop that calls the interfaces of a C++-built .so library. [...]
Re: Load .so library error when Hadoop calls JNI interfaces
2009/4/30 He Yongqiang heyongqi...@software.ict.ac.cn:

> Put your .so file in every tracker's Hadoop-install/lib/native/Linux-xxx-xx/.
> Or, in your code, try to do:
> String oldPath = System.getProperty("java.library.path");
> System.setProperty("java.library.path",
>     oldPath == null ? local_path_of_lib_file
>                     : oldPath + pathSeparator + local_path_of_lib_file);
> System.loadLibrary("XXX");

I have copied the .so and .a files to Hadoop-install/lib/native/Linux-xxx-xx/ and called System.loadLibrary("XXX") in my code, but nothing happens. Then I tried the second solution mentioned above, and the same problem occurred (the .so files were already in the native directory).

> However, you also need to fetch the library to the local machine, either through the DistributedCache (as Jason said) or by putting it into and getting it from HDFS yourself.

Do I need to copy the libraries on the local machine, given that I run Hadoop on a single node? How can I do that, either by fetching from or putting into HDFS?

On 09-4-30 5:14 PM, Ian jonhson jonhson@gmail.com wrote:

You mean that the current Hadoop does not support JNI calls, right? Are there any solutions to achieve the calls from C interfaces?

2009/4/30 He Yongqiang heyongqi...@software.ict.ac.cn:

Does Hadoop now support JNI calls in mappers or reducers? If yes, how? If not, I think we should create a JIRA issue for supporting that.
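A common pattern for JNI wrappers like the one in this thread is to load the library from a static initializer, so it is loaded exactly once when the class is first used inside the task JVM rather than once per constructed object. A minimal sketch (the library name "myclass" is taken from the thread; the catch block only records the failure so the class still loads on machines without the library):

```java
public class NativeWrapper {
    private static boolean nativeLoaded = false;

    static {
        try {
            // Resolved against java.library.path of the *task* JVM, so the
            // .so must be visible on every node that runs map/reduce tasks.
            System.loadLibrary("myclass");
            nativeLoaded = true;
        } catch (UnsatisfiedLinkError e) {
            System.err.println("Cannot load native library: " + e.getMessage());
        }
    }

    public static boolean isNativeLoaded() {
        return nativeLoaded;
    }

    public static void main(String[] args) {
        System.out.println("native loaded: " + isNativeLoaded());
    }
}
```

Checking a flag like isNativeLoaded() before invoking any native method gives a clearer error than the bare UnsatisfiedLinkError the task JVMs were throwing in the logs above.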
Re: Load .so library error when Hadoop calls JNI interfaces
You mean that the current Hadoop does not support JNI calls, right? Are there any solutions to achieve the calls from C interfaces?

2009/4/30 He Yongqiang heyongqi...@software.ict.ac.cn:

Does Hadoop now support JNI calls in mappers or reducers? If yes, how? If not, I think we should create a JIRA issue for supporting that.

On 09-4-30 4:02 PM, Ian jonhson jonhson@gmail.com wrote:

Thanks for answering. I run my Hadoop on a single node, not in cluster mode. [...]
Re: unable to see anything in stdout
First thing I would do is to run the job in the local job runner (as a single process on your local machine, without involving the cluster):

JobConf conf = ...;                       // set other params, mapper, etc. here
conf.set("mapred.job.tracker", "local");  // use LocalJobRunner
conf.set("fs.default.name", "file:///"); // read from the local hard disk instead of HDFS
JobClient.runJob(conf);

This will actually print stdout, stderr, etc. to your local terminal. Try this on a single input file. This will let you confirm that it does, in fact, write to stdout.

- Aaron

On Thu, Apr 30, 2009 at 9:00 AM, Asim linka...@gmail.com wrote:

Hi, I am not able to see any job output in userlogs/task_id/stdout. It remains empty even though I have many println statements. Are there any steps to debug this problem?

Regards, Asim
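The same two settings can also live in configuration instead of code. A sketch of the equivalent hadoop-site.xml fragment, using the same property names as in Aaron's snippet:

```xml
<!-- hadoop-site.xml: run jobs in-process with LocalJobRunner -->
<property>
  <name>mapred.job.tracker</name>
  <value>local</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>file:///</value>
</property>
```

With these values the job runs as a single JVM against the local filesystem, so println output appears directly on the terminal rather than in per-task log directories.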
Re: Specifying System Properties in the had
So you want a different -Dfoo=test on each node? It's probably grabbing the setting from the node where the job was submitted, and this overrides the settings on each task node. Try adding <final>true</final> to the property block on the tasktrackers, then restart Hadoop and try again. This will prevent the job from overriding the setting.

- Aaron

On Thu, Apr 30, 2009 at 9:25 AM, Marc Limotte mlimo...@feeva.com wrote:

I'm trying to set a system property in the Hadoop config so my jobs will know which cluster they are running on. I think I should be able to do this with -Dname=value in mapred.child.java.opts (example below), but the setting is ignored. In hadoop-site.xml I have:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx200m -Dfoo=test</value>
</property>

But the job conf through the web server indicates:

mapred.child.java.opts    -Xmx1024M -Duser.timezone=UTC

I'm using Hadoop-0.17.2.1. Any tips on why my setting is not picked up?

Marc

PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
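The <final>true</final> marker Aaron mentions goes inside the property block in each tasktracker's hadoop-site.xml; a sketch of what that block would look like with Marc's example value:

```xml
<!-- hadoop-site.xml on each tasktracker: a final property
     cannot be overridden by job-submitted configuration -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx200m -Dfoo=test</value>
  <final>true</final>
</property>
```

Marking a property final pins the node-local value, which is exactly what is needed when each cluster (or node) should report a different identity to its jobs.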
Re: Specifying System Properties in the had
Another way to do this would be to set a property in the Hadoop config itself. In the job launcher you would have something like:

JobConf conf = ...;
conf.set("foo", "test");

Then you can read the property in your map or reduce task.

Tom

On Thu, Apr 30, 2009 at 3:25 PM, Aaron Kimball aa...@cloudera.com wrote:

So you want a different -Dfoo=test on each node? It's probably grabbing the setting from the node where the job was submitted, and this overrides the settings on each task node. Try adding <final>true</final> to the property block on the tasktrackers, then restart Hadoop and try again. [...]
Infinite Loop Resending status from task tracker
Has anyone seen this before? Our task tracker produced a 2.7 GB log file in a few hours. The entry is all the same (every 2 ms):

2009-04-30 02:34:40,207 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' with reponseId '5341
2009-04-30 02:34:40,398 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' with reponseId '5341
2009-04-30 02:34:40,403 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' with reponseId '5341
2009-04-30 02:34:40,411 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' with reponseId '5341
2009-04-30 02:34:40,414 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' with reponseId '5341
2009-04-30 02:34:40,417 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' with reponseId '5341
2009-04-30 02:34:40,420 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' with reponseId '5341
... (And on and on and on...)

These are the few lines before it started:

2009-04-30 02:34:29,780 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: xxx.xxx.xxx.xxx:50060, dest: 10.253.178.95:40268, bytes: 3341324, op: MAPRED_SHUFFLE, cliID: attempt_200904291917_0352_m_06_0
2009-04-30 02:34:31,522 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 418891 bytes for reduce: 12 from map: attempt_200904291917_0352_m_07_0 given 418891/418887 from 4301462 with (22, 171)
2009-04-30 02:34:31,522 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: xxx.xxx.xxx.xxx:50060, dest: xxx.xxx.xxx.xxx:40268, bytes: 418891, op: MAPRED_SHUFFLE, cliID: attempt_200904291917_0352_m_07_0
2009-04-30 02:34:35,382 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200904291917_0352_r_03_0 0.3030303% reduce copy (10 of 11 at 0.32 MB/s)
2009-04-30 02:34:38,385 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200904291917_0352_r_03_0 0.3030303% reduce copy (10 of 11 at 0.32 MB/s)
2009-04-30 02:34:40,207 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' with reponseId '5341
2009-04-30 02:34:40,398 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' with reponseId '5341
... and on for 2+ GB.
Re: Master crashed
Alex Loddengaard wrote:

I'm confused. Why are you trying to stop things when you're bringing the name node back up? Try running start-all.sh instead.

Alex

Won't that try to start the daemons on the slave nodes again? They're already running.

M

On Tue, Apr 28, 2009 at 4:00 PM, Mayuran Yogarajah mayuran.yogara...@casalemedia.com wrote:

The master in my cluster crashed; the dfs/mapred Java processes are still running on the slaves. What should I do next? I brought the master back up and ran stop-mapred.sh and stop-dfs.sh, and it said this:

slave1.test.com: no tasktracker to stop
slave1.test.com: no datanode to stop

Not sure what happened here; please advise.

thanks, M
Re: Infinite Loop Resending status from task tracker
Hi Lance,

Can I ask what version you were running when you saw this? Is it reproducible? I'm trying to look at the code path that might produce such a behavior and want to make sure I'm looking at the right version.

Thanks,
-Todd

On Thu, Apr 30, 2009 at 9:33 AM, Lance Riedel la...@dotspots.com wrote:

Has anyone seen this before? Our task tracker produced a 2.7 GB log file in a few hours. The entry is all the same (every 2 ms):

2009-04-30 02:34:40,207 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' with reponseId '5341
... (And on and on and on...) [...]
Re: Infinite Loop Resending status from task tracker
I have not been able to reproduce. We are using version 19.1 with the following patches: 4780-2v19.patch (Jira 4780) closeAll3.patch (Jira 3998) Thanks, Lance On Apr 30, 2009, at 10:40 AM, Todd Lipcon wrote: Hi Lance, Can I ask what version you were running when you saw this? Is it reproducible? I'm trying to look at the code path that might produce such a behavior and want to make sure I'm looking at the right version. Thanks -Todd On Thu, Apr 30, 2009 at 9:33 AM, Lance Riedel la...@dotspots.com wrote: Has anyone seen this before? Our task tracker produced a 2.7 gig log file in a few hours. The entry is all the same (every 2 ms): 2009-04-30 02:34:40,207 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' with reponseId '5341 2009-04-30 02:34:40,398 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' with reponseId '5341 2009-04-30 02:34:40,403 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' with reponseId '5341 2009-04-30 02:34:40,411 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' with reponseId '5341 2009-04-30 02:34:40,414 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' with reponseId '5341 2009-04-30 02:34:40,417 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' with reponseId '5341 2009-04-30 02:34:40,420 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' with reponseId '5341 ... (And on and on and on...) 
Re: Infinite Loop Resending status from task tracker
Hey Lance, Did you see any error messages in the JobTracker logs around the time this started? I think I understand how this might happen. Thanks, -Todd On Thu, Apr 30, 2009 at 10:45 AM, Lance Riedel la...@dotspots.com wrote: I have not been able to reproduce. We are using version 19.1 with the following patches: 4780-2v19.patch (Jira 4780) closeAll3.patch (Jira 3998) Thanks, Lance On Apr 30, 2009, at 10:40 AM, Todd Lipcon wrote: Hi Lance, Can I ask what version you were running when you saw this? Is it reproducible? I'm trying to look at the code path that might produce such a behavior and want to make sure I'm looking at the right version. Thanks -Todd On Thu, Apr 30, 2009 at 9:33 AM, Lance Riedel la...@dotspots.com wrote: Has anyone seen this before? Our task tracker produced a 2.7 gig log file in a few hours. The entry is all the same (every 2 ms): 2009-04-30 02:34:40,207 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' with reponseId '5341 2009-04-30 02:34:40,398 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' with reponseId '5341 2009-04-30 02:34:40,403 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' with reponseId '5341 2009-04-30 02:34:40,411 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' with reponseId '5341 2009-04-30 02:34:40,414 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' with reponseId '5341 2009-04-30 02:34:40,417 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' with reponseId '5341 2009-04-30 02:34:40,420 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' with reponseId '5341 ... (And on and on and on...) 
Re: Infinite Loop Resending status from task tracker
Here are the job tracker logs from the same time (and yes.. there is something there!!): 2009-04-30 02:34:28,484 INFO org.apache.hadoop.mapred.JobTracker: Serious problem. While updating status, cannot find taskid attempt_200904291917_0252_r_03_0 2009-04-30 02:34:40,215 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 54311, call heartbeat(org.apache.hadoop.mapred.tasktrackersta...@1a93388, false, true, 5341) from 10.253.134.191:42688: error: java.io.IOException: java.lang.NullPointerException java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.mapred.JobTracker.getTasksToSave(JobTracker.java:2130) at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1923) at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source) at sun .reflect .DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java: 25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894) 2009-04-30 02:34:40,215 INFO org.apache.hadoop.mapred.JobTracker: Serious problem. While updating status, cannot find taskid attempt_200904291917_0296_r_14_1 2009-04-30 02:34:40,217 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200904291917_0352_r_13_0' to tip task_200904291917_0352_r_13, for tracker 'tracker_domU-12-31-38-00- F0-41.compute-1.internal:localhost.localdomain/127.0.0.1:42479' 2009-04-30 02:34:40,217 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200904291917_0343_m_03_0' from 'tracker_domU-12-31-38-00- F0-41.compute-1.internal:localhost.localdomain/127.0.0.1:42479' 2009-04-30 02:34:40,217 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200904291917_0343_m_07_0' from 'tracker_domU-12-31-38-00- F0-41.compute-1.internal:localhost.localdomain/127.0.0.1:42479' And then.. a LOT more 2009-04-30 02:34:40,433 INFO org.apache.hadoop.mapred.JobTracker: Serious problem. 
While updating status, cannot find taskid attempt_200904291917_0252_r_03_0 2009-04-30 02:34:40,433 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'KILLED' for 'attempt_200904291917_0352_m_10_1' of TIP 'task_200904291917_0352_m_10' 2009-04-30 02:34:40,433 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 54311, call heartbeat(org.apache.hadoop.mapred.tasktrackersta...@1b7b4c1, false, true, 5341) from 10.253.134.191:42688: error: java.io.IOException: java.lang.NullPointerException java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.mapred.JobTracker.getTasksToSave(JobTracker.java:2130) at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1923) at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source) at sun .reflect .DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java: 25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894) 2009-04-30 02:34:40,441 INFO org.apache.hadoop.mapred.JobTracker: Serious problem. 
While updating status, cannot find taskid attempt_200904291917_0252_r_03_0 2009-04-30 02:34:40,441 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'KILLED' for 'attempt_200904291917_0352_m_10_1' of TIP 'task_200904291917_0352_m_10' 2009-04-30 02:34:40,442 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 54311, call heartbeat(org.apache.hadoop.mapred.tasktrackersta...@1598c57, false, true, 5341) from 10.253.134.191:42688: error: java.io.IOException: java.lang.NullPointerException java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.mapred.JobTracker.getTasksToSave(JobTracker.java:2130) at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1923) at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source) at sun .reflect .DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java: 25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894) 2009-04-30 02:34:40,444 INFO org.apache.hadoop.mapred.JobTracker: Serious problem. While updating status, cannot find taskid attempt_200904291917_0252_r_03_0 2009-04-30 02:34:40,444 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'KILLED' for 'attempt_200904291917_0352_m_10_1' of TIP 'task_200904291917_0352_m_10' 2009-04-30 02:34:40,444 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 54311, call
Re: Infinite Loop Resending status from task tracker
Hey Lance, Thanks for the logs. They definitely confirmed my suspicion. There are three problems here:

1) If the JobTracker throws an exception during processing of a heartbeat, the tasktracker retries with no delay, since lastHeartbeat isn't updated in TaskTracker.offerService. This is related to HADOOP-3987.

2) If the TaskTracker sends a task in COMMIT_PENDING state with an invalid task id, the jobtracker will trigger a NullPointerException in JobTracker.getTasksToSave. Instead it should probably create a KillTaskAction. I just filed a JIRA to track this issue: https://issues.apache.org/jira/browse/HADOOP-5761

3) The TaskTracker somehow had a task attempt in COMMIT_PENDING state that the JobTracker didn't know about. How it got there is a separate problem that's a bit harder to track down.

Thanks -Todd

On Thu, Apr 30, 2009 at 11:17 AM, Lance Riedel la...@dotspots.com wrote: Here are the job tracker logs from the same time (and yes.. there is something there!!): 2009-04-30 02:34:28,484 INFO org.apache.hadoop.mapred.JobTracker: Serious problem. While updating status, cannot find taskid attempt_200904291917_0252_r_03_0 2009-04-30 02:34:40,215 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 54311, call heartbeat(org.apache.hadoop.mapred.tasktrackersta...@1a93388, false, true, 5341) from 10.253.134.191:42688: error: java.io.IOException: java.lang.NullPointerException java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.mapred.JobTracker.getTasksToSave(JobTracker.java:2130) at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1923) at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894) 2009-04-30 02:34:40,215 INFO org.apache.hadoop.mapred.JobTracker: Serious problem. 
While updating status, cannot find taskid attempt_200904291917_0296_r_14_1 2009-04-30 02:34:40,217 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200904291917_0352_r_13_0' to tip task_200904291917_0352_r_13, for tracker 'tracker_domU-12-31-38-00-F0-41.compute-1.internal:localhost.localdomain/ 127.0.0.1:42479' 2009-04-30 02:34:40,217 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200904291917_0343_m_03_0' from 'tracker_domU-12-31-38-00-F0-41.compute-1.internal:localhost.localdomain/ 127.0.0.1:42479' 2009-04-30 02:34:40,217 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200904291917_0343_m_07_0' from 'tracker_domU-12-31-38-00-F0-41.compute-1.internal:localhost.localdomain/ 127.0.0.1:42479' And then.. a LOT more 2009-04-30 02:34:40,433 INFO org.apache.hadoop.mapred.JobTracker: Serious problem. While updating status, cannot find taskid attempt_200904291917_0252_r_03_0 2009-04-30 02:34:40,433 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'KILLED' for 'attempt_200904291917_0352_m_10_1' of TIP 'task_200904291917_0352_m_10' 2009-04-30 02:34:40,433 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 54311, call heartbeat(org.apache.hadoop.mapred.tasktrackersta...@1b7b4c1, false, true, 5341) from 10.253.134.191:42688: error: java.io.IOException: java.lang.NullPointerException java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.mapred.JobTracker.getTasksToSave(JobTracker.java:2130) at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1923) at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894) 2009-04-30 02:34:40,441 INFO org.apache.hadoop.mapred.JobTracker: Serious 
problem. While updating status, cannot find taskid attempt_200904291917_0252_r_03_0 2009-04-30 02:34:40,441 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'KILLED' for 'attempt_200904291917_0352_m_10_1' of TIP 'task_200904291917_0352_m_10' 2009-04-30 02:34:40,442 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 54311, call heartbeat(org.apache.hadoop.mapred.tasktrackersta...@1598c57, false, true, 5341) from 10.253.134.191:42688: error: java.io.IOException: java.lang.NullPointerException java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.mapred.JobTracker.getTasksToSave(JobTracker.java:2130) at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1923) at
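[Editor's note] The no-delay retry in problem (1) can be illustrated with a small standalone simulation. This is a sketch of the loop's shape under assumed names and a fake clock, not Hadoop's actual TaskTracker.offerService code: when the heartbeat RPC throws, lastHeartbeat is never advanced, so the computed wait collapses to zero and the tracker retries immediately, matching the every-2-ms "Resending 'status'" spam in the report.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Simulates the retry pattern described above (hypothetical names).
// The bug shape: lastHeartbeat is only updated on a successful RPC,
// and the catch block neither backs off nor updates it.
public class HeartbeatLoopSketch {
    interface Rpc { void heartbeat() throws IOException; }

    // Returns the wait (ms) computed before each of n attempts,
    // using a fake clock so the run is deterministic.
    static List<Long> run(Rpc rpc, int n, long intervalMs) {
        List<Long> waits = new ArrayList<>();
        long clock = 0;                  // fake time, ms
        long lastHeartbeat = 0;
        for (int i = 0; i < n; i++) {
            long wait = Math.max(0, intervalMs - (clock - lastHeartbeat));
            waits.add(wait);
            clock += wait + 2;           // "sleep" plus ~2 ms RPC round trip
            try {
                rpc.heartbeat();
                lastHeartbeat = clock;   // only reached on success
            } catch (IOException e) {
                // no backoff: the next wait is computed from a stale
                // lastHeartbeat and comes out as 0 -> immediate retry
            }
        }
        return waits;
    }

    public static void main(String[] args) throws Exception {
        // Healthy JobTracker: every attempt waits a full interval.
        System.out.println(run(() -> {}, 3, 10_000));   // [10000, 10000, 10000]
        // JobTracker failing each heartbeat (as with the NPE above):
        // after the first attempt the wait drops to 0.
        System.out.println(run(() -> { throw new IOException("NPE"); },
                               3, 10_000));             // [10000, 0, 0]
    }
}
```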
Re: Master crashed
On 4/30/09 10:18 AM, Mayuran Yogarajah mayuran.yogara...@casalemedia.com wrote: Alex Loddengaard wrote: I'm confused. Why are you trying to stop things when you're bringing the name node back up? Try running start-all.sh instead. Alex Won't that try to start the daemons on the slave nodes again? They're already running. That doesn't matter, start-all.sh detects already running processes and does not bring up duplicates. You can run it 100x in a row without a stop if you wanted: namenode running as process 12621. Stop it first. datanode running as process 28540. Stop it first. jobtracker running as process 12814. Stop it first. tasktracker running as process 28763. Stop it first. M On Tue, Apr 28, 2009 at 4:00 PM, Mayuran Yogarajah mayuran.yogara...@casalemedia.com wrote: The master in my cluster crashed, the dfs/mapred java processes are still running on the slaves. What should I do next? I brought the master back up and ran stop-mapred.sh and stop-dfs.sh and it said this: slave1.test.com: no tasktracker to stop slave1.test.com: no datanode to stop Not sure what happened here, please advise. thanks, M
Implementing compareTo in user-written keys where one extends the other is error prone
Hi. I had difficulties in getting Reduce sorting to work - it took me a good part of a day to figure out what was going wrong, so I'm sharing this in hopes of learning something from the community or getting Hadoop improved to avoid this kind of error for future users.

I have 2 key classes; one holds a String, the other one extends that, and adds a boolean. I implemented the first key class (let's call it Super):

public class Super implements WritableComparable<Super> {
  . . .
  public int compareTo(Super o) {
    // sort on string value
    . . .
  }
}

I implemented the 2nd key class (let's call it Sub):

public class Sub extends Super {
  . . .
  public int compareTo(Sub o) {
    // sort on boolean value
    . . .
    // if equal, use the super:
    ... else return super.compareTo(o);
  }
}

With this setup, I used the Sub class as a mapper output key, and expected the sort on the boolean value to happen first, then for equal values there, the sort on the string values. What actually happened was that the sort on the boolean value was skipped completely, and only the sort on the string was done.

The reason for this is that (in the 0.19.1 release) the WritableComparator instance that is created (using the defaults - no custom Comparator) knows the key class is Sub, creates key instances of it, and calls the compareTo method, passing it the other key. Both of these keys are of type Sub. However, they are passed via this code in WritableComparator:

public int compare(WritableComparable a, WritableComparable b) {
  return a.compareTo(b);
}

Java uses the interface spec for WritableComparable that was declared, in this case WritableComparable<Super>, and infers that the arg type for compareTo is Super. So it skips calling the compareTo in Sub, and just calls the one in Super.

The workaround is to change the signature of Sub's compareTo method to match the spec in the interface, namely it has to take a Super as an argument, and then cast it to Sub. This seems like a very error-prone design. 
Am I doing something wrong, or can this be improved so that this kind of error is avoided? -Marshall Schor
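[Editor's note] The trap Marshall describes can be reproduced outside Hadoop, with plain Comparable standing in for WritableComparable. This is a minimal sketch - Super/Sub mirror the post's classes, the fields are simplified - showing that compareTo(Sub) is an overload of, not an override of, compareTo(Super), so any call made through the interface type dispatches to Super's method and the boolean is silently ignored.

```java
// Super declares Comparable<Super>, so the interface method is
// compareTo(Super); Sub's compareTo(Sub) does not override it.
class Super implements Comparable<Super> {
    String s;
    Super(String s) { this.s = s; }
    public int compareTo(Super o) { return s.compareTo(o.s); }
}

class Sub extends Super {
    boolean flag;
    Sub(String s, boolean flag) { super(s); this.flag = flag; }
    // OVERLOAD, not an override: the parameter type is Sub, not Super.
    public int compareTo(Sub o) {
        if (flag != o.flag) return flag ? 1 : -1;
        return super.compareTo(o);
    }
}

public class OverloadTrap {
    // Mirrors what WritableComparator.compare(a, b) does: call through
    // the declared interface type, not the concrete key type.
    static int viaInterface(Comparable a, Comparable b) {
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        Sub x = new Sub("same", true);
        Sub y = new Sub("same", false);
        // Static type Sub: overload resolution picks compareTo(Sub),
        // flags differ, result is nonzero.
        System.out.println(x.compareTo(y));      // 1
        // Through the interface: Super.compareTo runs, strings are
        // equal, the boolean is never consulted.
        System.out.println(viaInterface(x, y));  // 0
    }
}
```

The fix matches the post's workaround: declare compareTo(Super) in Sub (a true override) and cast its argument to Sub inside.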
classpath for finding Key classes
Hi - I have a classpath question. In Hadoop, one can define the Java classes to be used for Keys and Values. I am doing this. When I make my giant Jar file holding everything needed for running my application, I include these classes. However, I've discovered that this is not enough, it seems (in the 0.19.1 version - in case that matters :-) ). The job startup process reads the configuration, finds the names of my Key classes, and tries to load them. But it is not using the giant Jar for my job (yet), so it doesn't find them. A workaround that I've found is to include my giant Jar as the argument to -libjars - that seems to get the class path set up so the startup / validation code can find my classes. This seems wasteful - having the giant jar in two places... Is there a best-practices way to do this that's better than this? Thanks. -Marshall Schor
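[Editor's note] For reference, the workaround described above looks roughly like this on the command line. The jar name and driver class are placeholder assumptions, and note that -libjars is only honored when the driver runs through ToolRunner/GenericOptionsParser:

```shell
# Sketch: the application jar does double duty - it is the job jar
# and is also put on the client-side classpath via -libjars, so the
# submission-time code can resolve the custom key/value classes.
hadoop jar myapp.jar com.example.MyDriver \
    -libjars myapp.jar \
    input/ output/
```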
Re: Implementing compareTo in user-written keys where one extends the other is error prone
If you use custom key types, you really should be defining a RawComparator. It will perform much much better. -- Owen
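[Editor's note] The point of Owen's suggestion is to compare the serialized bytes of keys directly instead of deserializing two objects per comparison. Hadoop's hook for this is org.apache.hadoop.io.RawComparator, usually by extending WritableComparator and registering it with WritableComparator.define. The sketch below shows only the byte-level comparison, in standalone Java: the method shape mirrors RawComparator's compare(byte[], int, int, byte[], int, int), but the class itself and the string encoding are illustrative assumptions, not Hadoop code.

```java
import java.nio.charset.StandardCharsets;

// Standalone sketch of a raw (byte-level) key comparison. For ASCII
// strings, unsigned byte order matches character order, so no
// deserialization is needed to sort the keys.
public class RawStringComparator {
    public static int compare(byte[] b1, int s1, int l1,
                              byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            // Compare as unsigned bytes, as a raw comparator would.
            int d = (b1[s1 + i] & 0xff) - (b2[s2 + i] & 0xff);
            if (d != 0) return d;
        }
        return l1 - l2;  // shorter key sorts first on a common prefix
    }

    public static void main(String[] args) {
        byte[] a = "apple".getBytes(StandardCharsets.UTF_8);
        byte[] b = "apricot".getBytes(StandardCharsets.UTF_8);
        // "apple" < "apricot" without ever building a String key object.
        System.out.println(compare(a, 0, a.length, b, 0, b.length) < 0);
    }
}
```

For the Super/Sub keys in this thread, a real RawComparator would read the serialized boolean byte first, then fall back to comparing the string bytes, keeping the intended sort order while skipping object creation in the shuffle sort.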