Re: JobTracker Failing to respond with OutOfMemory error
Any update on this? We saw a similar problem after running a Hadoop job with a large number of mappers. Restarting the JobTracker solved the problem a few times, but now we get the out-of-memory error right after restarting it. Thanks.

On Wed, Nov 19, 2008 at 8:40 PM, Palleti, Pallavi [EMAIL PROTECTED] wrote:

Hi all, we have been using hadoop-0.17.2 for some time now. Since yesterday, we have seen the JobTracker fail to respond with an OutOfMemoryError very frequently. Things go fine after restarting it, but the problem recurs after a while. Below is the exception we see in the JobTracker logs. Can someone please suggest what is going wrong?

2008-11-19 14:17:46,059 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9001, call heartbeat([EMAIL PROTECTED], false, true, 16068) from 205.188.170.107:51492: error: java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
    at java.util.regex.Pattern.compile(Pattern.java:1452)
    at java.util.regex.Pattern.<init>(Pattern.java:1133)
    at java.util.regex.Pattern.compile(Pattern.java:847)
    at java.lang.String.replace(String.java:2208)
    at org.apache.hadoop.fs.Path.normalizePath(Path.java:146)
    at org.apache.hadoop.fs.Path.initialize(Path.java:137)
    at org.apache.hadoop.fs.Path.<init>(Path.java:126)
    at org.apache.hadoop.fs.Path.<init>(Path.java:50)
    at org.apache.hadoop.mapred.Task.getTaskOutputPath(Task.java:214)
    at org.apache.hadoop.mapred.Task.setConf(Task.java:517)
    at org.apache.hadoop.mapred.TaskInProgress.getTaskToRun(TaskInProgress.java:745)
    at org.apache.hadoop.mapred.JobInProgress.obtainNewMapTask(JobInProgress.java:664)
    at org.apache.hadoop.mapred.JobTracker.getNewTaskForTaskTracker(JobTracker.java:1585)
    at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1309)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
2008-11-19 14:18:11,869 WARN org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9001, call heartbeat([EMAIL PROTECTED], false, true, 16077) from 205.188.170.84:54871: discarded for being too old (133957)
2008-11-19 14:18:11,869 WARN org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9001, call heartbeat([EMAIL PROTECTED], false, true, 16082) from 205.188.170.90:32934: discarded for being too old (133957)

Thanks
Pallavi

--
tp
Re: JobTracker Failing to respond with OutOfMemory error
I found the following error message in hadoop-middleware-jobtracker-dd-9c32d01.off.tn.ask.com.out:

Java HotSpot(TM) Server VM warning: Exception java.lang.OutOfMemoryError occurred dispatching signal SIGTERM to handler - the VM may need to be forcibly terminated

On Fri, Dec 5, 2008 at 10:58 AM, charles du [EMAIL PROTECTED] wrote:

Any update on this? We saw a similar problem after running a Hadoop job with a large number of mappers. Restarting the JobTracker solved the problem a few times, but now we get the out-of-memory error right after restarting it. Thanks.

On Wed, Nov 19, 2008 at 8:40 PM, Palleti, Pallavi [EMAIL PROTECTED] wrote:

Hi all, we have been using hadoop-0.17.2 for some time now. Since yesterday, we have seen the JobTracker fail to respond with an OutOfMemoryError very frequently. Things go fine after restarting it, but the problem recurs after a while. Below is the exception we see in the JobTracker logs. Can someone please suggest what is going wrong?
Thanks
Pallavi

--
tp
slow shuffle
We encountered a bottleneck during the shuffle phase, even though there is very little data to shuffle across the network - less than 10 MB in total (the combiner aggregated most of the data). Are there any parameters or anything else we can tune to improve shuffle performance? Thanks, -Songting
getting Configuration object in mapper
I have set some variables using the JobConf object, e.g. jobConf.set("Operator", operator). How can I get an instance of the Configuration/JobConf object inside a map method so that I can retrieve these variables? Thanks -Abhinit
Re: slow shuffle
These configuration options will be useful:

<property>
  <name>mapred.job.shuffle.merge.percent</name>
  <value>0.66</value>
  <description>The usage threshold at which an in-memory merge will be initiated, expressed as a percentage of the total memory allocated to storing in-memory map outputs, as defined by mapred.job.shuffle.input.buffer.percent.</description>
</property>

<property>
  <name>mapred.job.shuffle.input.buffer.percent</name>
  <value>0.70</value>
  <description>The percentage of memory to be allocated from the maximum heap size to storing map outputs during the shuffle.</description>
</property>

<property>
  <name>mapred.job.reduce.input.buffer.percent</name>
  <value>0.0</value>
  <description>The percentage of memory, relative to the maximum heap size, to retain map outputs during the reduce. When the shuffle is concluded, any remaining map outputs in memory must consume less than this threshold before the reduce can begin.</description>
</property>

How long did the shuffle take relative to the rest of the job?

Alex

On Fri, Dec 5, 2008 at 11:17 AM, Songting Chen [EMAIL PROTECTED] wrote:
We encountered a bottleneck during the shuffle phase, even though there is very little data to shuffle across the network - less than 10 MB in total (the combiner aggregated most of the data). Are there any parameters or anything else we can tune to improve shuffle performance? Thanks, -Songting
stack trace from hung task
Hi, when a task tracker kills a non-responsive task, it prints out the message "Task X not reported status for 600 seconds. Killing!". The stack trace it then dumps out is that of the task tracker itself. Is there a way to get the hung task to dump its own stack trace before exiting? It would be nice if there were an easy way to send a kill -3 to the hung process and then kill it. Sriram
Re: slow shuffle
It takes 50% of the total time.

--- On Fri, 12/5/08, Alex Loddengaard [EMAIL PROTECTED] wrote:
From: Alex Loddengaard [EMAIL PROTECTED]
Subject: Re: slow shuffle
To: core-user@hadoop.apache.org
Date: Friday, December 5, 2008, 11:43 AM

These configuration options will be useful: [...] How long did the shuffle take relative to the rest of the job? Alex
Re: stack trace from hung task
For what it's worth, I started seeing these when I upgraded to 0.19. I was using 10 reduces, but changed it to 30 reduces for my job and now I don't see these errors any more. Thanks, Ryan On Fri, Dec 5, 2008 at 2:44 PM, Sriram Rao [EMAIL PROTECTED] wrote: Hi, When a task tracker kills a non-responsive task, it prints out a message Task X not reported status for 600 seconds. Killing!. The stack trace it then dumps out is that of the task tracker itself. Is there a way to get the hung task to dump out its stack trace before exiting? Would be nice if there was an easy way to send a kill -3 to the hung process and then kill it. Sriram
Re: getting Configuration object in mapper
On Dec 4, 2008, at 9:19 PM, abhinit wrote:

I have set some variables using the JobConf object, e.g. jobConf.set("Operator", operator). How can I get an instance of the Configuration/JobConf object inside a map method so that I can retrieve these variables?

In your Mapper class, implement a method like:

public void configure(JobConf job) { ... }

This will be called when the object is created, with the job conf.

-- Owen
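For illustration, here is a minimal, self-contained sketch of the pattern Owen describes. `SimpleConf`, `OperatorMapper`, and `ConfigureDemo` are hypothetical stand-ins (Hadoop's real JobConf is, at its core, a string key/value store), so this compiles without Hadoop on the classpath; in a real job you would implement `configure(JobConf)` on your Mapper class and the framework would invoke it for you:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.mapred.JobConf:
// essentially a map of string keys to string values.
class SimpleConf {
    private final Map<String, String> props = new HashMap<>();
    public void set(String key, String value) { props.put(key, value); }
    public String get(String key) { return props.get(key); }
}

// A mapper that captures job parameters in configure().
// The framework calls configure(conf) once when the task starts,
// before any calls to map(); the field is then usable inside map().
class OperatorMapper {
    private String operator;

    public void configure(SimpleConf job) {
        operator = job.get("Operator");
    }

    public void map(String key, String value) {
        // 'operator', captured in configure(), is available here
        System.out.println(operator + "(" + key + ", " + value + ")");
    }

    public String getOperator() { return operator; }
}

public class ConfigureDemo {
    public static void main(String[] args) {
        SimpleConf conf = new SimpleConf();
        conf.set("Operator", "sum");   // analogous to jobConf.set("Operator", operator)

        OperatorMapper mapper = new OperatorMapper();
        mapper.configure(conf);        // the framework does this before map()
        mapper.map("k1", "v1");
    }
}
```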
Re: slow shuffle
A little more information: we optimized our Map process quite a bit, so that the shuffle is now the bottleneck.

1. There are 300 map tasks (128 MB block size), each taking about 13 sec.
2. The reducers start running at a very late stage (when 80% of the maps are done).
3. Copying the 300 map outputs (the shuffle) takes as long as the entire map phase, although each map output is only about 50 KB.

--- On Fri, 12/5/08, Alex Loddengaard [EMAIL PROTECTED] wrote:
These configuration options will be useful: [...] How long did the shuffle take relative to the rest of the job? Alex
Re: getting Configuration object in mapper
I have a related question - I have a class which is both the mapper and the reducer. How can I tell in configure() whether the current task is a map or a reduce task? Parse the task id? C

Owen O'Malley wrote:
In your Mapper class, implement a method like: public void configure(JobConf job) { ... } This will be called when the object is created, with the job conf. -- Owen
Re: getting Configuration object in mapper
Check mapred.task.is.map.

Craig Macdonald wrote:
I have a related question - I have a class which is both the mapper and the reducer. How can I tell in configure() whether the current task is a map or a reduce task? Parse the task id? C
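As a sketch of how a combined mapper/reducer class could branch on that property: `TaskConf`, `MapReduceTask`, and `IsMapDemo` below are hypothetical stand-ins for illustration only (the real framework populates mapred.task.is.map in the JobConf handed to configure(); here we set it by hand):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's JobConf string key/value store.
class TaskConf {
    private final Map<String, String> props = new HashMap<>();
    public void set(String key, String value) { props.put(key, value); }
    public String get(String key, String defaultValue) {
        String v = props.get(key);
        return v != null ? v : defaultValue;
    }
    public boolean getBoolean(String key, boolean defaultValue) {
        return Boolean.parseBoolean(get(key, Boolean.toString(defaultValue)));
    }
}

// One class used as both mapper and reducer: configure() reads
// mapred.task.is.map to learn which role this task instance plays.
class MapReduceTask {
    private boolean isMapTask;

    public void configure(TaskConf job) {
        isMapTask = job.getBoolean("mapred.task.is.map", true);
    }

    public String role() {
        return isMapTask ? "map" : "reduce";
    }
}

public class IsMapDemo {
    public static void main(String[] args) {
        TaskConf conf = new TaskConf();
        conf.set("mapred.task.is.map", "false");  // what the framework would set for a reduce task

        MapReduceTask task = new MapReduceTask();
        task.configure(conf);
        System.out.println("running as a " + task.role() + " task");
    }
}
```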
Re: slow shuffle
We have 4 testing data nodes with 3 reduce tasks. The parallel.copies parameter has been increased to 20, 30, even 50, but it doesn't really help...

--- On Fri, 12/5/08, Aaron Kimball [EMAIL PROTECTED] wrote:
From: Aaron Kimball [EMAIL PROTECTED]
Subject: Re: slow shuffle
To: core-user@hadoop.apache.org
Date: Friday, December 5, 2008, 12:28 PM

How many reduce tasks do you have? Look into increasing mapred.reduce.parallel.copies from the default of 5 to something more like 20 or 30. - Aaron

On Fri, Dec 5, 2008 at 10:00 PM, Songting Chen [EMAIL PROTECTED] wrote:
A little more information: we optimized our Map process quite a bit, so that the shuffle is now the bottleneck. [...]
Re: slow shuffle
I think one of the issues is that the reducers start very late in the process, slowing the entire job significantly. Is there a way to let the reducers start earlier?

--- On Fri, 12/5/08, Songting Chen [EMAIL PROTECTED] wrote:
We have 4 testing data nodes with 3 reduce tasks. The parallel.copies parameter has been increased to 20, 30, even 50, but it doesn't really help... [...]
Re: slow shuffle
To summarize the slow shuffle issue:

1. I think one problem is that the reducers start very late in the process, slowing the entire job significantly. Is there a way to let the reducers start earlier?

2. Copying 300 files of about 30 KB each took 3 minutes in total (after all maps finished). What happens behind the scenes here really puzzles me. (Note that the sort takes 1 sec.)

Thanks,
-Songting

--- On Fri, 12/5/08, Songting Chen [EMAIL PROTECTED] wrote:
We have 4 testing data nodes with 3 reduce tasks. The parallel.copies parameter has been increased to 20, 30, even 50, but it doesn't really help... [...]
File loss at Nebraska
We are continuing to see a small, consistent amount of block corruption leading to file loss. We have been upgrading our cluster lately, which means we've been doing a rolling decommissioning of our nodes (and then adding them back with more disks!). Previously, when I've had time to investigate this deeply, I've found issues like these:

https://issues.apache.org/jira/browse/HADOOP-4692
https://issues.apache.org/jira/browse/HADOOP-4543

I suspect these cause some or all of our problems. I also saw that one of our nodes was at 100.2% full; I think this is due to the same issue: Hadoop's actual usage of the file system is greater than the max capacity because some of the blocks were truncated. I'd have to check with our sysadmins, but I think we've lost about 200-300 files during the upgrade process. Right now, there are about 900 chronically under-replicated blocks; in the past, that has meant the only replica is actually corrupt, and Hadoop relentlessly tries to re-transfer it, failing, but never realizing the source is corrupt.

To some extent, this whole issue is caused by the fact that we only have enough space for 2 replicas; I'd imagine that at 3 replicas the issue would be much harder to trigger. Any suggestions? For us, file loss is something we can deal with (not necessarily fun to deal with, of course), but that might not be the case in the future.

Brian
Re: Issues with V0.19 upgrade
Not sure if anyone else answered...

1. You need to run "hadoop dfsadmin -finalizeUpgrade". Be careful, because you can't go back once you do this. http://wiki.apache.org/hadoop/Hadoop_Upgrade

I don't know about 2.

-Michael

On 12/3/08 5:49 PM, Songting Chen [EMAIL PROTECTED] wrote:

1. The namenode webpage shows: Upgrades: Upgrade for version -18 has been completed. Upgrade is not finalized.

2. SequenceFile.Writer failed when trying to create a new file, with the following error. (We have two Hadoop clusters; both have issue 1; one has issue 2, but the other is fine on issue 2.) Any idea what's going on? Thanks, -Songting

java.io.IOException: Filesystem closed
    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:198)
    at org.apache.hadoop.hdfs.DFSClient.access$600(DFSClient.java:65)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3084)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3053)
    at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:942)
    at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:210)
    at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:243)
    at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1413)
    at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:236)
    at org.apache.hadoop.fs.FileSystem$ClientFinalizer.run(FileSystem.java:221)
Block not found during commitBlockSynchronization
Hey, I'm seeing this message repeated over and over in my logs:

2008-12-05 19:20:00,534 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=blk_-4236881263392665762_88597, newgenerationstamp=0, newlength=0, newtargets=[])
2008-12-05 19:20:00,534 INFO org.apache.hadoop.ipc.Server: IPC Server handler 29 on 9000, call commitBlockSynchronization(blk_-4236881263392665762_88597, 0, 0, false, true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@67537412) from 172.16.1.184:57586: error: java.io.IOException: Block (=blk_-4236881263392665762_88597) not found
java.io.IOException: Block (=blk_-4236881263392665762_88597) not found
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:1898)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.commitBlockSynchronization(NameNode.java:410)
    at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)

What can I do to debug?

Brian
Re: Block not found during commitBlockSynchronization
Which version are you using? Calling commitBlockSynchronization(...) with newgenerationstamp=0, newlength=0, newtargets=[] does not look normal. You may check the namenode log and the client log for the block blk_-4236881263392665762.

Nicholas Sze

----- Original Message ----
From: Brian Bockelman [EMAIL PROTECTED]
To: core-user@hadoop.apache.org
Sent: Friday, December 5, 2008 5:22:03 PM
Subject: Block not found during commitBlockSynchronization

Hey, I'm seeing this message repeated over and over in my logs: [...] What can I do to debug? Brian
Re: Block not found during commitBlockSynchronization
This is 0.19.0. Grepping around, it appears that the message for this block has been printed at 1-5 Hz throughout all our logs (the oldest logs are from 12-3) - about 0.5 million times. If I grep for the nextGenerationStamp error message, it has happened about 0.4 million times. Anything else I can provide?

Brian

On Dec 5, 2008, at 8:31 PM, Tsz Wo (Nicholas), Sze wrote:
Which version are you using? Calling commitBlockSynchronization(...) with newgenerationstamp=0, newlength=0, newtargets=[] does not look normal. You may check the namenode log and the client log for the block blk_-4236881263392665762. Nicholas Sze [...]