Re: hadoop MapReduce and stop words
Just use a java.util.HashSet for this. There should only be a few dozen stopwords, so load them into a HashSet when the Mapper starts up, and then check your tokens against it while you're processing records. -- Stefan > From: tim robertson > Reply-To: > Date: Sat, 16 May 2009 15:48:23 +0200 > To: > Subject: Re: hadoop MapReduce and stop words > > Try and google "binary tree java" and you will get loads of hits... > > This is a simple implementation but I am sure there are better ones > that handle balancing better. > > Cheers > Tim > > > public class BinaryTree { > > public static void main(String[] args) { > BinaryTree bt = new BinaryTree(); > > for (int i = 0; i < 1000; i++) { > bt.insert("" + i); > } > > System.out.println(bt.lookup("999")); > System.out.println(bt.lookup("100")); > System.out.println(bt.lookup("a")); // should be null > } > > private Node root; > > private static class Node { > Node left; > Node right; > String value; > > public Node(String value) { > this.value = value; > } > } > > public String lookup(String key) { > return (lookup(root, key)); > } > > private String lookup(Node node, String value) { > if (node == null) { > return (null); > } > > if (value.equals(node.value)) { > return (node.value); > } else if (value.compareTo(node.value) < 0) { > return (lookup(node.left, value)); > } else { > return (lookup(node.right, value)); > } > } > > public void insert(String value) { > root = insert(root, value); > } > > private Node insert(Node node, String value) { > if (node == null) { > node = new Node(value); > } else { > if (value.compareTo(node.value) <= 0) { > node.left = insert(node.left, value); > } else { > node.right = insert(node.right, value); > } > } > > return (node); > } > }
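A minimal standalone sketch of the HashSet approach Stefan describes (plain Java rather than an actual Hadoop Mapper, so it runs on its own; in a real job the set would be populated once when the mapper is configured, e.g. from a file shipped with the job -- the class and method names here are invented for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopWordFilter {
    private final Set<String> stopWords = new HashSet<String>();

    // In a real Mapper this would run once in configure()/setup(),
    // reading the word list from a file distributed with the job.
    public StopWordFilter(String[] words) {
        stopWords.addAll(Arrays.asList(words));
    }

    // O(1) membership test per token, vs O(log n) for a balanced binary tree.
    public boolean isStopWord(String token) {
        return stopWords.contains(token.toLowerCase());
    }

    // Called per record: drop stopwords, keep everything else in order.
    public List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<String>();
        for (String t : tokens) {
            if (!isStopWord(t)) {
                kept.add(t);
            }
        }
        return kept;
    }
}
```

With only a few dozen stopwords the set easily fits in memory, which is why the constant-time HashSet beats building a tree here.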
Re: How to do load control of MapReduce
Yes, I think the JVM uses way more memory than just its heap. Now some of it might be just reserved memory, but not actually used (not sure how to tell the difference). There are also things like thread stacks, JIT compiler cache, direct NIO byte buffers etc. that take up process space outside of the Java heap. But none of that should imho add up to Gigabytes... -- Stefan > From: zsongbo > Reply-To: > Date: Tue, 12 May 2009 20:06:37 +0800 > To: > Subject: Re: How to do load control of MapReduce > > Yes, I also found that the TaskTracker should not use so much memory. > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > > > 32480 schubert 35 10 1411m 172m 9212 S 0 2.2 8:54.78 java > > The previous 1GB is the default value, I have just changed the heap of the TT to > 384MB an hour ago. > > I also found that the DataNode does not need much memory either. > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > > 32399 schubert 25 0 1638m 372m 9208 S 2 4.7 32:46.28 java > > > In fact, I define the -Xmx512m in child opt for MapReduce tasks. But I found the child tasks use more memory than the definition: > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > > > 10577 schubert 30 10 942m 572m 9092 S 46 7.2 51:02.21 java > > 10507 schubert 29 10 878m 570m 9092 S 48 7.1 50:49.52 java > > Schubert > > On Tue, May 12, 2009 at 6:53 PM, Steve Loughran wrote: > >> zsongbo wrote: >> >>> Hi Stefan, >>> Yes, the 'nice' cannot resolve this problem. >>> >>> Now, in my cluster, there are 8GB of RAM. My java heap configuration is: >>> >>> HDFS DataNode : 1GB >>> HBase-RegionServer: 1.5GB >>> MR-TaskTracker: 1GB >>> MR-child: 512MB (max child task is 6, 4 map task + 2 reduce task) >>> >>> But the memory usage is still tight. >>> >> >> does TT need to be so big if you are running all your work in external VMs? >>
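To make the budgeting concrete, here is the arithmetic for the configuration quoted above (8GB box). The per-JVM overhead figure is an assumed round number for illustration, not a measurement -- the point is that even a modest non-heap overhead pushes the worst case past physical RAM:

```java
public class HeapBudget {
    // Configured max heaps from the thread, in MB.
    static final int DATANODE = 1024;
    static final int REGIONSERVER = 1536;
    static final int TASKTRACKER = 1024;
    static final int CHILD = 512;
    static final int MAX_CHILDREN = 6; // 4 map + 2 reduce

    // Rough per-JVM non-heap overhead (thread stacks, JIT code cache,
    // direct NIO buffers). This is a guessed figure for illustration only.
    static final int OVERHEAD_PER_JVM = 256;

    public static int totalMb() {
        int jvms = 3 + MAX_CHILDREN; // three daemons plus the child tasks
        return DATANODE + REGIONSERVER + TASKTRACKER
                + CHILD * MAX_CHILDREN + jvms * OVERHEAD_PER_JVM;
    }

    public static void main(String[] args) {
        System.out.println("Worst-case footprint: " + totalMb() + " MB of 8192 MB");
    }
}
```

With these numbers the worst case already exceeds the 8GB of physical RAM before the OS, page cache, or any query traffic get a byte, which matches the "memory usage is still tight" observation.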
Re: Huge DataNode Virtual Memory Usage
Raghu, I don't actually have exact numbers from jmap, although I do remember that jmap -histo reported something less than 256MB for this process (before I restarted it). I just looked at another DFS process that is currently running and has a VM size of 1.5GB (~600 resident). Here jmap reports a total object heap usage of 120MB. The memory block list reported by jmap doesn't actually seem to contain the heap at all since the largest block in that list is 10MB in size (/usr/java/jdk1.6.0_10/jre/lib/amd64/server/libjvm.so). However, pmap reports a total usage of 1.56GB. -- Stefan > From: Raghu Angadi > Reply-To: > Date: Sun, 10 May 2009 20:06:13 -0700 > To: > Subject: Re: Huge DataNode Virtual Memory Usage > > what do 'jmap' and 'jmap -histo:live' show? > > Raghu. > > Stefan Will wrote: >> Chris, >> >> Thanks for the tip ... However I'm already running 1.6.0_10: >> >> java version "1.6.0_10" >> Java(TM) SE Runtime Environment (build 1.6.0_10-b33) >> Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode) >> >> Do you know of a specific bug # in the JDK bug database that addresses this >> ? >> >> Cheers, >> Stefan >> >> >>> From: Chris Collins >>> Reply-To: >>> Date: Fri, 8 May 2009 20:34:21 -0700 >>> To: "core-user@hadoop.apache.org" >>> Subject: Re: Huge DataNode Virtual Memory Usage >>> >>> Stefan, there was a nasty memory leak in 1.6.x before 1.6.0_10. It >>> manifested itself during major GC. We saw this on Linux and Solaris >>> and dramatically improved with an upgrade. >>> >>> C >>> On May 8, 2009, at 6:12 PM, Stefan Will wrote: >>> >>>> Hi, >>>> >>>> I just ran into something rather scary: One of my datanode processes >>>> that >>>> I'm running with Xmx256M, and a maximum number of Xceiver threads >>>> of 4095 >>>> had a virtual memory size of over 7GB (!). I know that the VM size >>>> on Linux >>>> isn't necessarily equal to the actual memory used, but I wouldn't >>>> expect it >>>> to be an order of magnitude higher either. 
I ran pmap on the >>>> process, and it >>>> showed around 1000 thread stack blocks with roughly 1MB each (which >>>> is the >>>> default size on the 64bit JDK). The largest block was 3GB in size >>>> which I >>>> can't figure out what it is for. >>>> >>>> Does anyone have any insights into this ? Anything that can be done to >>>> prevent this other than to restart the DFS regularly ? >>>> >>>> -- Stefan >> >>
Re: How to do load control of MapReduce
I'm having similar performance issues and have been running my Hadoop processes using a nice level of 10 for a while, and haven't noticed any improvement. In my case, I believe what's happening is that the peak combined RAM usage of all the Hadoop task processes and the service processes exceeds the amount of RAM on my machines. This in turn causes part of the server processes to get paged out to disk while the nightly Hadoop batch processes are running. Since the swap space is typically on the same physical disks as the DFS and MapReduce working directory, I'm heavily IO bound and real time queries pretty much slow down to a crawl. I think the key is to make absolutely sure that all of your processes fit in your available RAM at all times. I'm actually having a hard time achieving this since the virtual memory usage of the JVM is usually way higher than the maximum heap size (see my other thread). -- Stefan > From: zsongbo > Reply-To: > Date: Tue, 12 May 2009 10:58:49 +0800 > To: > Subject: Re: How to do load control of MapReduce > > Thanks Billy, I am trying 'nice', and will report the result later. > > On Tue, May 12, 2009 at 3:42 AM, Billy Pearson > wrote: > >> Might try setting the tasktrackers linux nice level to say 5 or 10, >> leaving dfs and hbase settings at 0 >> >> Billy >> "zsongbo" wrote in message >> news:fa03480d0905110549j7f09be13qd434ca41c9f84...@mail.gmail.com... >> >> Hi all, >>> Now, if we have a large dataset to process by MapReduce. The MapReduce >>> will >>> take as many machine resources as possible. >>> >>> So when such a big MapReduce job is running, the cluster would become >>> very busy and almost cannot do anything else. >>> >>> For example, we have a HDFS+MapReduce+HBase cluster. >>> There is a large dataset in HDFS to be processed by MapReduce >>> periodically, >>> the workload is CPU and I/O heavy. And the cluster also provides other >>> services for query (query HBase and read files in HDFS). 
So, when the job >>> is >>> running, the query latency will become very long. >>> >>> Since the MapReduce job is not time sensitive, I want to control the load >>> of >>> MapReduce. Do you have some advices ? >>> >>> Thanks in advance. >>> Schubert >>> >>> >> >>
Re: Huge DataNode Virtual Memory Usage
Chris, Thanks for the tip ... However I'm already running 1.6.0_10: java version "1.6.0_10" Java(TM) SE Runtime Environment (build 1.6.0_10-b33) Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode) Do you know of a specific bug # in the JDK bug database that addresses this ? Cheers, Stefan > From: Chris Collins > Reply-To: > Date: Fri, 8 May 2009 20:34:21 -0700 > To: "core-user@hadoop.apache.org" > Subject: Re: Huge DataNode Virtual Memory Usage > > Stefan, there was a nasty memory leak in 1.6.x before 1.6.0_10. It > manifested itself during major GC. We saw this on Linux and Solaris > and dramatically improved with an upgrade. > > C > On May 8, 2009, at 6:12 PM, Stefan Will wrote: > >> Hi, >> >> I just ran into something rather scary: One of my datanode processes >> that >> I'm running with Xmx256M, and a maximum number of Xceiver threads >> of 4095 >> had a virtual memory size of over 7GB (!). I know that the VM size >> on Linux >> isn't necessarily equal to the actual memory used, but I wouldn't >> expect it >> to be an order of magnitude higher either. I ran pmap on the >> process, and it >> showed around 1000 thread stack blocks with roughly 1MB each (which >> is the >> default size on the 64bit JDK). The largest block was 3GB in size >> which I >> can't figure out what it is for. >> >> Does anyone have any insights into this ? Anything that can be done to >> prevent this other than to restart the DFS regularly ? >> >> -- Stefan
Huge DataNode Virtual Memory Usage
Hi, I just ran into something rather scary: One of my datanode processes that I'm running with Xmx256M, and a maximum number of Xceiver threads of 4095, had a virtual memory size of over 7GB (!). I know that the VM size on Linux isn't necessarily equal to the actual memory used, but I wouldn't expect it to be an order of magnitude higher either. I ran pmap on the process, and it showed around 1000 thread stack blocks with roughly 1MB each (which is the default size on the 64bit JDK). The largest block was 3GB in size, and I can't figure out what it is for. Does anyone have any insights into this ? Anything that can be done to prevent this other than to restart the DFS regularly ? -- Stefan
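The arithmetic behind the pmap observation is simple: the virtual reservation grows linearly with thread count times stack size. A small sketch using the figures from this message (the 64-bit default of roughly 1MB per stack, the ~1000 threads pmap showed, and the 4095 Xceiver ceiling):

```java
public class StackFootprint {
    // Virtual address space reserved for thread stacks: threads * -Xss.
    // Reserved is not the same as resident; most of it is never touched.
    public static long stackReservationBytes(int threads, long xssBytes) {
        return threads * xssBytes;
    }

    public static void main(String[] args) {
        long oneMb = 1L << 20;
        // ~1000 observed threads at ~1MB per stack
        System.out.println(stackReservationBytes(1000, oneMb) / oneMb
                + " MB reserved for the observed thread count");
        // the configured ceiling of 4095 Xceiver threads
        System.out.println(stackReservationBytes(4095, oneMb) / oneMb
                + " MB reserved if the Xceiver limit were ever reached");
    }
}
```

So roughly 1GB of the 7GB VM size is accounted for by stacks alone, and the Xceiver ceiling could push that toward 4GB; lowering -Xss or the Xceiver limit shrinks the reservation proportionally.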
Re: Reduce task attempt retry strategy
Jothi, thanks for the explanation. One question though: why shouldn't timed out tasks be retried on a different machine ? As you pointed out, it could very well have been due to the machine having problems. To me a timeout is just like any other kind of failure. -- Stefan > From: Jothi Padmanabhan > Reply-To: > Date: Mon, 13 Apr 2009 19:00:38 +0530 > To: > Subject: Re: Reduce task attempt retry strategy > > Currently, only failed tasks are attempted on a node other than the one > where it failed. For killed tasks, there is no such policy for retries. > > "failed to report status" usually indicates that the task did not report > sufficient progress. However, it is possible that the task itself was not > progressing fast enough because the machine where it ran had problems. > > > On 4/8/09 12:33 AM, "Stefan Will" wrote: > >> My cluster has 27 nodes with a total reduce task capacity of 54. The job had >> 31 reducers. I actually had a task today that showed the behavior you're >> describing: 3 tries on one machine, and then the 4th on a different one. >> >> As for the particular job I was talking about before: >> >> Here are the stats for the job: >> >> Kind | Total Tasks (successful+failed+killed) | Successful tasks | Failed >> tasks | Killed tasks | Start Time | Finish Time >> Setup 1 1 0 0 4-Apr-2009 00:30:16 4-Apr-2009 >> 00:30:33 (17sec) >> Map 64 49 12 3 4-Apr-2009 00:30:33 4-Apr-2009 >> 01:11:15 (40mins, 41sec) >> Reduce 34 30 4 0 4-Apr-2009 00:30:44 4-Apr-2009 >> 04:31:36 (4hrs, 52sec) >> Cleanup 4 0 4 0 4-Apr-2009 04:31:36 4-Apr-2009 >> 06:32:00 (2hrs, 24sec) >> >> >> Not sure what to look for in the jobtracker log. All it shows for that >> particular failed task is that it assigned it to the same machine 4 times >> and then eventually failed. Perhaps something to note is that the 4 failures >> were all due to timeouts: >> >> "Task attempt_200904031942_0002_r_13_3 failed to report status for 1802 >> seconds. Killing!" 
>> >> Also, looking at the logs, there was a map task too that was retried on that >> particular box 4 times without going to a different one. Perhaps it had >> something to do with the way this machine failed: The jobtracker still >> considered it live, while all actual tasks assigned to it timed out. >> >> -- Stefan >> >> >>> From: Amar Kamat >>> Reply-To: >>> Date: Tue, 07 Apr 2009 10:05:16 +0530 >>> To: >>> Subject: Re: Reduce task attempt retry strategy >>> >>> Stefan Will wrote: >>>> Hi, >>>> >>>> I had a flaky machine the other day that was still accepting jobs and >>>> sending heartbeats, but caused all reduce task attempts to fail. This in >>>> turn caused the whole job to fail because the same reduce task was retried >>>> 3 >>>> times on that particular machine. >>>> >>> What is your cluster size? If a task fails on a machine then it's >>> re-tried on some other machine (based on number of good machines left in >>> the cluster). After a certain number of failures, the machine will be >>> blacklisted (again based on number of machines left in the cluster). 3 >>> different reducers might be scheduled on that machine but that should >>> not lead to job failure. Can you explain in detail what exactly >>> happened. Find out where the attempts got scheduled from the >>> jobtracker's log. >>> Amar >>>> Perhaps I'm confusing this with the block placement strategy in hdfs, but I >>>> always thought that the framework would retry jobs on a different machine >>>> if >>>> retries on the original machine keep failing. E.g. I would have expected to >>>> retry once or twice on the same machine, but then switch to a different one >>>> to minimize the likelihood of getting stuck on a bad machine. >>>> >>>> What is the expected behavior in 0.19.1 (which I'm running) ? Any plans for >>>> improving on this in the future ? >>>> >>>> Thanks, >>>> Stefan >>>> >>>> >> >>
Re: Reduce task attempt retry strategy
My cluster has 27 nodes with a total reduce task capacity of 54. The job had 31 reducers. I actually had a task today that showed the behavior you're describing: 3 tries on one machine, and then the 4th on a different one. As for the particular job I was talking about before: Here are the stats for the job: Kind | Total Tasks (successful+failed+killed) | Successful tasks | Failed tasks | Killed tasks | Start Time | Finish Time Setup 1 1 0 0 4-Apr-2009 00:30:16 4-Apr-2009 00:30:33 (17sec) Map 64 49 12 3 4-Apr-2009 00:30:33 4-Apr-2009 01:11:15 (40mins, 41sec) Reduce 34 30 4 0 4-Apr-2009 00:30:44 4-Apr-2009 04:31:36 (4hrs, 52sec) Cleanup 4 0 4 0 4-Apr-2009 04:31:36 4-Apr-2009 06:32:00 (2hrs, 24sec) Not sure what to look for in the jobtracker log. All it shows for that particular failed task is that it assigned it to the same machine 4 times and then eventually failed. Perhaps something to note is that the 4 failures were all due to timeouts: "Task attempt_200904031942_0002_r_13_3 failed to report status for 1802 seconds. Killing!" Also, looking at the logs, there was a map task too that was retried on that particular box 4 times without going to a different one. Perhaps it had something to do with the way this machine failed: The jobtracker still considered it live, while all actual tasks assigned to it timed out. -- Stefan > From: Amar Kamat > Reply-To: > Date: Tue, 07 Apr 2009 10:05:16 +0530 > To: > Subject: Re: Reduce task attempt retry strategy > > Stefan Will wrote: >> Hi, >> >> I had a flaky machine the other day that was still accepting jobs and >> sending heartbeats, but caused all reduce task attempts to fail. This in >> turn caused the whole job to fail because the same reduce task was retried 3 >> times on that particular machine. >> > What is your cluster size? If a task fails on a machine then it's > re-tried on some other machine (based on number of good machines left in > the cluster). 
After a certain number of failures, the machine will be > blacklisted (again based on number of machines left in the cluster). 3 > different reducers might be scheduled on that machine but that should > not lead to job failure. Can you explain in detail what exactly > happened. Find out where the attempts got scheduled from the > jobtracker's log. > Amar >> Perhaps I'm confusing this with the block placement strategy in hdfs, but I >> always thought that the framework would retry jobs on a different machine if >> retries on the original machine keep failing. E.g. I would have expected to >> retry once or twice on the same machine, but then switch to a different one >> to minimize the likelihood of getting stuck on a bad machine. >> >> What is the expected behavior in 0.19.1 (which I'm running) ? Any plans for >> improving on this in the future ? >> >> Thanks, >> Stefan >> >>
Reduce task attempt retry strategy
Hi, I had a flaky machine the other day that was still accepting jobs and sending heartbeats, but caused all reduce task attempts to fail. This in turn caused the whole job to fail because the same reduce task was retried 3 times on that particular machine. Perhaps I'm confusing this with the block placement strategy in hdfs, but I always thought that the framework would retry jobs on a different machine if retries on the original machine keep failing. E.g. I would have expected to retry once or twice on the same machine, but then switch to a different one to minimize the likelihood of getting stuck on a bad machine. What is the expected behavior in 0.19.1 (which I'm running) ? Any plans for improving on this in the future ? Thanks, Stefan
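For reference, these are the 0.19-era configuration knobs that govern attempts and timeouts, as I recall them from hadoop-default.xml (verify the names and defaults against your build before relying on them; the "1802 seconds" in the timeout message elsewhere in this thread suggests mapred.task.timeout had been raised to 1800000 ms on that cluster):

```xml
<!-- hadoop-site.xml (values shown are the usual defaults) -->
<property>
  <name>mapred.map.max.attempts</name>
  <value>4</value> <!-- retries per map task before the job fails -->
</property>
<property>
  <name>mapred.reduce.max.attempts</name>
  <value>4</value> <!-- retries per reduce task before the job fails -->
</property>
<property>
  <name>mapred.task.timeout</name>
  <value>600000</value> <!-- ms without a progress report before "Killing!" -->
</property>
<property>
  <name>mapred.max.tracker.failures</name>
  <value>4</value> <!-- task failures of one job on a tracker before that
                        tracker stops receiving the job's tasks -->
</property>
```

Lowering mapred.max.tracker.failures is the closest lever to the behavior asked about here: it stops a flaky but still-heartbeating node from eating all of a task's attempts.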
Re: intermediate results not getting compressed
I noticed this too. I think the compression only applies to the final mapper and reducer outputs, but not any intermediate files produced. The reducer will decompress the map output files after copying them, and then compress its own output only after it has finished. I wonder if this is by design, or just an oversight. -- Stefan > From: Billy Pearson > Reply-To: > Date: Wed, 18 Mar 2009 22:14:07 -0500 > To: > Subject: Re: intermediate results not getting compressed > > I can run head on the map.out files and I get compressed garbage, but if I run > head on an intermediate file I can read the data in the file clearly, so > compression is not getting passed on, even though I am setting CompressMapOutput to > true by default in my hadoop-site.conf file. > > Billy > > > "Billy Pearson" > wrote in message news:gpscu3$66...@ger.gmane.org... >> the intermediate.X files are not getting compressed for some reason, not >> sure why >> I downloaded and built the latest branch for 0.19 >> >> o.a.h.mapred.Merger.class line 432 >> new Writer(conf, fs, outputFile, keyClass, valueClass, codec); >> >> this seems to use the codec defined above but for some reason it's not >> working correctly; the compression is not passing from the map output files >> to the on disk merge of the intermediate.X files >> >> tail task report from one server: >> >> 2009-03-18 19:19:02,643 INFO org.apache.hadoop.mapred.ReduceTask: >> Interleaved on-disk merge complete: 1730 files left. >> 2009-03-18 19:19:02,645 INFO org.apache.hadoop.mapred.ReduceTask: >> In-memory merge complete: 3 files left. 
>> 2009-03-18 19:19:02,650 INFO org.apache.hadoop.mapred.ReduceTask: Keeping >> 3 segments, 39835369 bytes in memory for intermediate, on-disk merge >> 2009-03-18 19:19:03,878 INFO org.apache.hadoop.mapred.ReduceTask: Merging >> 1730 files, 70359998581 bytes from disk >> 2009-03-18 19:19:03,909 INFO org.apache.hadoop.mapred.ReduceTask: Merging >> 0 segments, 0 bytes from memory into reduce >> 2009-03-18 19:19:03,909 INFO org.apache.hadoop.mapred.Merger: Merging 1733 >> sorted segments >> 2009-03-18 19:19:04,161 INFO org.apache.hadoop.mapred.Merger: Merging 22 >> intermediate segments out of a total of 1733 >> 2009-03-18 19:21:43,693 INFO org.apache.hadoop.mapred.Merger: Merging 30 >> intermediate segments out of a total of 1712 >> 2009-03-18 19:27:07,033 INFO org.apache.hadoop.mapred.Merger: Merging 30 >> intermediate segments out of a total of 1683 >> 2009-03-18 19:33:27,669 INFO org.apache.hadoop.mapred.Merger: Merging 30 >> intermediate segments out of a total of 1654 >> 2009-03-18 19:40:38,243 INFO org.apache.hadoop.mapred.Merger: Merging 30 >> intermediate segments out of a total of 1625 >> 2009-03-18 19:48:08,151 INFO org.apache.hadoop.mapred.Merger: Merging 30 >> intermediate segments out of a total of 1596 >> 2009-03-18 19:57:16,300 INFO org.apache.hadoop.mapred.Merger: Merging 30 >> intermediate segments out of a total of 1567 >> 2009-03-18 20:07:34,224 INFO org.apache.hadoop.mapred.Merger: Merging 30 >> intermediate segments out of a total of 1538 >> 2009-03-18 20:17:54,715 INFO org.apache.hadoop.mapred.Merger: Merging 30 >> intermediate segments out of a total of 1509 >> 2009-03-18 20:28:49,273 INFO org.apache.hadoop.mapred.Merger: Merging 30 >> intermediate segments out of a total of 1480 >> 2009-03-18 20:39:28,830 INFO org.apache.hadoop.mapred.Merger: Merging 30 >> intermediate segments out of a total of 1451 >> 2009-03-18 20:50:23,706 INFO org.apache.hadoop.mapred.Merger: Merging 30 >> intermediate segments out of a total of 1422 >> 2009-03-18 
21:01:36,818 INFO org.apache.hadoop.mapred.Merger: Merging 30 >> intermediate segments out of a total of 1393 >> 2009-03-18 21:13:09,509 INFO org.apache.hadoop.mapred.Merger: Merging 30 >> intermediate segments out of a total of 1364 >> 2009-03-18 21:25:17,304 INFO org.apache.hadoop.mapred.Merger: Merging 30 >> intermediate segments out of a total of 1335 >> 2009-03-18 21:36:48,536 INFO org.apache.hadoop.mapred.Merger: Merging 30 >> intermediate segments out of a total of 1306 >> >> See the size of the files is about ~70GB (70359998581); these are >> compressed at this point. It went from 1733 files to 1306 left to merge, >> and the intermediate.X files are well over 200GB at this point and we are >> not even close to done. If compression were working we should not see tasks >> failing at this point in the job because of lack of hard drive space, since >> as we merge we delete the merged files from the output folder. >> >> I only see this happening when there are too many files left that did not >> get merged during the shuffle stage and it starts on-disk merging. >> The tasks that complete the merges and keep it below the io.sort size (in my >> case 30) skip the on-disk merge and complete using normal hard drive >> space. >> >> Anyone care to take a look? >> This job takes two or more days to get to this point, so it is getting kind of a >> pain in the butt to run and watch the reduces fail and the job keep
Re: Massive discrepancies in job's bytes written/read
Some of the discrepancy could be due to compression of the map input/output format. E.g., the mapper output bytes will show the compressed size, while the reduce input will show the uncompressed size. Or something along those lines. But I'm also questioning the accuracy of the reporting and suspect that some of it is due to all the disk activity that happens while processing spills in the mapper and the copy/shuffle/sort phase in the reducer. It would certainly be nice if all the byte counts were reported in a way that they're comparable. -- Stefan > From: Bryan Duxbury > Reply-To: > Date: Tue, 17 Mar 2009 17:26:52 -0700 > To: > Subject: Massive discrepancies in job's bytes written/read > > Hey all, > > In looking at the stats for a number of our jobs, the amount of data > that the UI claims we've read from or written to HDFS is vastly > larger than the amount of data that should be involved in the job. > For instance, we have a job that combines small files into big files > that we're operating on around 2TB worth of data. The outputs in HDFS > (via hadoop dfs -du) matches the expected size, but the jobtracker UI > claims that we've read and written around 22TB of data! > > By all accounts, Hadoop is actually *doing* the right thing - we're > not observing excess data reading or writing anywhere. However, this > massive discrepancy makes the job stats essentially worthless for > understanding IO in our jobs. > > Does anyone know why there's such an enormous difference? Have others > experienced this problem? > > -Bryan
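A quick way to see how large the compressed/uncompressed skew can get is to compress some repetitive records (the usual shape of intermediate data) and compare byte counts. GZIP stands in here for whatever codec the job actually uses, so the exact ratio is illustrative only:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class CompressionSkew {
    // Returns the gzip-compressed size of the input, in bytes.
    public static int compressedSize(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(bos);
        gz.write(data);
        gz.close(); // flushes the gzip trailer into bos
        return bos.size();
    }

    public static void main(String[] args) throws IOException {
        // Repetitive key/value records compress heavily, so "bytes written"
        // counted on the compressed side can be a small fraction of "bytes
        // read" counted on the uncompressed side for the very same data.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            sb.append("key\tsome repeated value\n");
        }
        byte[] raw = sb.toString().getBytes("UTF-8");
        System.out.println("uncompressed: " + raw.length + " bytes, gzip: "
                + compressedSize(raw) + " bytes");
    }
}
```

An order-of-magnitude gap between the two counters for one data set is entirely plausible, which is why counting some stages compressed and others uncompressed makes the job totals hard to compare.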
Re: AlredyBeingCreatedExceptions after upgrade to 0.19.0
Yes, I am using MultipleOutputFormat with the side-band output happening in the Mapper. But there is also Reducer that uses just the regular OutputCollector. -- Stefan > From: Rasit OZDAS > Reply-To: > Date: Tue, 17 Feb 2009 15:43:12 +0200 > To: > Subject: Re: AlredyBeingCreatedExceptions after upgrade to 0.19.0 > > Stefan and Thibaut, are you using MultipleOutputFormat, and how many > reducers do you have? > if you're using MultipleOutputFormat and have no reducer, there is a JIRA > ticket about this issue. > https://issues.apache.org/jira/browse/HADOOP-5268 > > Or there is a different JIRA issue (it's not resolved yet, but gives some > underlying info) > https://issues.apache.org/jira/browse/HADOOP-4264 > > Or this issue (not resolved): > https://issues.apache.org/jira/browse/HADOOP-1583 > > Rasit > > 2009/2/16 Thibaut_ > >> >> I have the same problem. >> >> is there any solution to this? >> >> Thibaut >> >> >> -- >> View this message in context: >> http://www.nabble.com/AlredyBeingCreatedExceptions-after-upgrade-to-0.19.0-tp >> 21631077p22043484.html >> Sent from the Hadoop core-user mailing list archive at Nabble.com. >> >> > > > -- > M. Raşit ÖZDAŞ
TaskTrackers being double counted after restart job recovery
Hi, I'm using the new persistent job state feature in 0.19.0, and it's worked really well so far. However, this morning my JobTracker died with an OOM error (even though the heap size is set to 768M). So I killed it and all the TaskTrackers. After starting everything up again, all my nodes were showing up twice in the JobTracker web interface, with different port numbers. Also, some of the jobs it restarted had already been completed when the job tracker died. Any idea what might be happening here ? How can I fix this ? Will temporarily setting mapred.jobtracker.restart.recover=false clear things up ? -- Stefan
AlredyBeingCreatedExceptions after upgrade to 0.19.0
Hi, Since I've upgraded to 0.19.0, I've been getting the following exceptions when restarting jobs, or even when a failed reducer is being restarted by the job tracker. It appears that stale file locks in the namenode don't get properly released sometimes: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file ... for DFSClient_attempt_200901211615_0077_r_10_3 on client 10.1.20.119 because current leaseholder is trying to recreate file. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1052) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:995) at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:301) at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892) Has anyone run into this ? Any workarounds/fixes ? Thanks, Stefan
Re: Merging reducer outputs into a single part-00000 file
Jim, As far as I know, there is no difference in terms of the number of output partitions relative to the OutputFormat used. If you want to sample your output file, I'd suggest you write a new MR job that uses a random number generator to sample your output files, and outputs text key/value pairs in the mapper, and uses exactly one reducer with the TextOutputFormat. You don't even need to supply a reducer class if your mapper outputs Text/Text key/value pairs. -- Stefan > From: Jim Twensky > Reply-To: > Date: Sun, 11 Jan 2009 01:55:35 -0600 > To: > Subject: Merging reducer outputs into a single part-00000 file > > Hello, The original map-reduce paper states: "After successful completion, > the output of the map-reduce execution is available in the R output files > (one per reduce task, with file names as specified by the user)." However, > when using Hadoop's TextOutputFormat, all the reducer outputs are combined in > a single file called part-00000. I was wondering how and when this > merging process is done. When the reducer calls output.collect(key,value), is > this record written to a local temporary output file in the reducer's disk > and then these local files (a total of R) are later merged into one single > file with a final thread or is it directly written to the final output > file (part-00000)? I am asking this because I'd like to get an ordered sample > of the final output data, i.e. one record per every 1000 records or > something similar and I don't want to run a serial process that iterates on > the final output file. Thanks, Jim
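The sampling step itself is trivial once everything flows through a single reducer. The core of what such a job's mapper (or a local post-process over the final file) would do for the "one record per every 1000" case Jim describes might look like this; the class and method names are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;

public class OrderedSampler {
    // Keeps every n-th record, preserving input order. A random-number
    // variant (as suggested above) would replace the modulo test with
    // random.nextDouble() < 1.0 / n, at the cost of a non-deterministic
    // sample size.
    public static List<String> sampleEveryNth(List<String> records, int n) {
        List<String> sample = new ArrayList<String>();
        for (int i = 0; i < records.size(); i++) {
            if (i % n == 0) {
                sample.add(records.get(i));
            }
        }
        return sample;
    }
}
```

Funneling the sampled pairs through exactly one reducer is what produces the single, globally ordered part-00000 file; the sample is small enough that one reducer is no longer a bottleneck.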
Re: NameNode fatal crash - 0.18.1
I did some more poking around the name node code to find out what might have caused it to deem the storage directories to be inaccessible. Apparently it does this when it catches an IOException while trying to write to or append to the edit log. I guess that makes sense, but unfortunately it doesn't actually log the actual exception: try { EditLogFileOutputStream eStream = new EditLogFileOutputStream(getEditNewFile(sd)); eStream.create(); editStreams.add(eStream); } catch (IOException e) { // remove stream and this storage directory from list processIOError(sd); it.remove(); } -- Stefan > From: Stefan Will > Reply-To: > Date: Thu, 08 Jan 2009 11:16:29 -0800 > To: > Subject: Re: NameNode fatal crash - 0.18.1 > > Jonathan, > > It looks like the same thing happened to us today (on Hadoop 0.19.0). We > were running a nightly backup, and at some point, the namenode said: > > > 2009-01-08 05:57:28,021 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(lastblock=blk_2140680350762285267_117754, > newgenerationstamp=117757, newlength=44866560, newtargets=[10.1.20. > 116:50010, 10.1.20.111:50010]) > 2009-01-08 05:57:30,270 ERROR > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unable to sync edit > log. Fatal Error. > 2009-01-08 05:57:30,882 FATAL > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Fatal Error : All > storage directories are inaccessible. > 2009-01-08 05:57:31,072 INFO > org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: > > Now when I try and start up the namenode again, I get an EOFException: > > > 2009-01-08 10:41:45,465 ERROR > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem > initialization failed. 
> java.io.EOFException > at java.io.DataInputStream.readFully(DataInputStream.java:180) > at java.io.DataInputStream.readLong(DataInputStream.java:399) > at org.apache.hadoop.hdfs.server.namenode.FSImage.readCheckpointTime(FSImage.java:549) > at org.apache.hadoop.hdfs.server.namenode.FSImage.getFields(FSImage.java:540) > at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.read(Storage.java:227) > at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.read(Storage.java:216) > at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:289) > at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87) > at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311) > at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:290) > at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:163) > at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:208) > at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:194) > at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:859) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:868) > > > Did you ever figure out why your backup caused this to happen ? Our backup > wasn't even touching the Hadoop partitions on the master. Were you able to > recover your DFS state ? > > -- Stefan > > >> From: Jonathan Gray >> Reply-To: >> Date: Mon, 15 Dec 2008 12:35:39 -0800 >> To: >> Subject: RE: NameNode fatal crash - 0.18.1 >> >> I have fixed the issue with the SecondaryNameNode not contacting primary >> with the 'dfs.http.address' config option. >> >> Other issues still unsolved. 
>> >>> -Original Message- >>> From: Jonathan Gray [mailto:jl...@streamy.com] >>> Sent: Monday, December 15, 2008 10:55 AM >>> To: core-user@hadoop.apache.org >>> Subject: NameNode fatal crash - 0.18.1 >>> >>> I have a 10+1 node cluster, each slave running >>> DataNode/TaskTracker/HBase >>> RegionServer. >>> >>> At the time of this crash, NameNode and SecondaryNameNode were both >>> hosted >>> on same master. >>> >>> We do a nightly backup and about 95% of the way through, HDFS crashed >>> with... >>> >>> NameNode shows: >>> >>> 2008-12-15 01:49:31,178 ERROR org.apache.hadoop.fs.FSNamesystem: Unable >>> to >>> sync edit log. Fatal Error. >>> 2008-12-15 01:49:31,178 FATAL org.apache.hadoop.fs.FSNamesystem: Fatal >>> Error >>> : All storage directo
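Stefan's complaint above — processIOError(sd) is invoked without the caught exception ever being logged — suggests an easy diagnostic improvement: log the cause before discarding the storage directory. A minimal sketch of the pattern (class and method names are hypothetical, not the actual namenode code):

```java
import java.io.IOException;
import java.util.logging.Logger;

public class EditLogErrorDemo {
    private static final Logger LOG = Logger.getLogger("FSNamesystem");

    // Stand-in for the namenode's stream-creation loop: instead of
    // silently dropping the storage directory, log the exception first
    // so the root cause (full disk, permissions, bad mount) shows up
    // in the namenode log.
    public static boolean createEditStream(boolean simulateFailure) {
        try {
            if (simulateFailure) {
                throw new IOException("simulated: cannot create edits.new");
            }
            return true;  // stream created and registered
        } catch (IOException e) {
            LOG.severe("Removing storage directory after edit log error: " + e);
            return false; // caller then drops this storage directory
        }
    }

    public static void main(String[] args) {
        System.out.println(createEditStream(false)); // prints true
        System.out.println(createEditStream(true));  // prints false, logs the cause
    }
}
```

With a line like this in place, the "All storage directories are inaccessible" shutdown would at least be preceded by the underlying IOException in the log.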
Re: NameNode fatal crash - 0.18.1
Jonathan,

It looks like the same thing happened to us today (on Hadoop 0.19.0). We were running a nightly backup, and at some point, the namenode said:

2009-01-08 05:57:28,021 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(lastblock=blk_2140680350762285267_117754,
newgenerationstamp=117757, newlength=44866560,
newtargets=[10.1.20.116:50010, 10.1.20.111:50010])
2009-01-08 05:57:30,270 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
Unable to sync edit log. Fatal Error.
2009-01-08 05:57:30,882 FATAL org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
Fatal Error : All storage directories are inaccessible.
2009-01-08 05:57:31,072 INFO org.apache.hadoop.hdfs.server.namenode.NameNode:
SHUTDOWN_MSG:

Now when I try and start up the namenode again, I get an EOFException:

2009-01-08 10:41:45,465 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
FSNamesystem initialization failed.
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at java.io.DataInputStream.readLong(DataInputStream.java:399)
at org.apache.hadoop.hdfs.server.namenode.FSImage.readCheckpointTime(FSImage.java:549)
at org.apache.hadoop.hdfs.server.namenode.FSImage.getFields(FSImage.java:540)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.read(Storage.java:227)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.read(Storage.java:216)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:289)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:290)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:163)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:208)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:194)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:859)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:868)

Did you ever figure out why your backup caused this to happen ? Our backup wasn't even touching the Hadoop partitions on the master. Were you able to recover your DFS state ?

-- Stefan

> From: Jonathan Gray
> Reply-To:
> Date: Mon, 15 Dec 2008 12:35:39 -0800
> To:
> Subject: RE: NameNode fatal crash - 0.18.1
>
> I have fixed the issue with the SecondaryNameNode not contacting primary
> with the 'dfs.http.address' config option.
>
> Other issues still unsolved.
>
>> -Original Message-
>> From: Jonathan Gray [mailto:jl...@streamy.com]
>> Sent: Monday, December 15, 2008 10:55 AM
>> To: core-user@hadoop.apache.org
>> Subject: NameNode fatal crash - 0.18.1
>>
>> I have a 10+1 node cluster, each slave running DataNode/TaskTracker/HBase
>> RegionServer.
>>
>> At the time of this crash, NameNode and SecondaryNameNode were both
>> hosted on same master.
>>
>> We do a nightly backup and about 95% of the way through, HDFS crashed
>> with...
>>
>> NameNode shows:
>>
>> 2008-12-15 01:49:31,178 ERROR org.apache.hadoop.fs.FSNamesystem: Unable
>> to sync edit log. Fatal Error.
>> 2008-12-15 01:49:31,178 FATAL org.apache.hadoop.fs.FSNamesystem: Fatal
>> Error : All storage directories are inaccessible.
>> 2008-12-15 01:49:31,179 INFO org.apache.hadoop.dfs.NameNode: >> SHUTDOWN_MSG: >> >> Every single DataNode shows: >> >> 2008-12-15 01:49:32,340 WARN org.apache.hadoop.dfs.DataNode: >> java.io.IOException: Call failed on local exception >> at org.apache.hadoop.ipc.Client.call(Client.java:718) >> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216) >> at org.apache.hadoop.dfs.$Proxy4.sendHeartbeat(Unknown Source) >> at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:655) >> at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2888) >> at java.lang.Thread.run(Thread.java:636) >> Caused by: java.io.EOFException >> at java.io.DataInputStream.readInt(DataInputStream.java:392) >> at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:499) >> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:441) >> >> This is virtually all of the information I have. At the same time as the >> backup, we have normal HBase traffic and our hourly batch MR jobs. So slave >> nodes were pretty heavily loaded, but don't see anything in DN logs besides >> this Call failed. There are no space issues or anything else, Ganglia shows >> high CPU load around this time which has been typical every night, but I >> don't see any issues in DN's or NN about expired leases/no heartbeats/etc. >> >> Is there a way to preve
Re: Datanode not detecting full disk
Hi Raghu,

Each DN machine has 3 partitions, e.g.:

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              20G  8.0G   11G  44% /
/dev/sda3             1.4T  756G  508G  60% /data
tmpfs                 3.9G     0  3.9G   0% /dev/shm

All of the paths in hadoop-site.xml point to /data, which is the partition that filled up to 100% (I deleted a bunch of files from HDFS since then). So I guess the question is whether the DN looks at just the partition its data directory is on, or at all partitions, when it determines disk usage.

-- Stefan

> From: Raghu Angadi <[EMAIL PROTECTED]>
> Reply-To:
> Date: Wed, 29 Oct 2008 11:57:07 -0700
> To:
> Subject: Re: Datanode not detecting full disk
>
> Stefan Will wrote:
>> Hi Jeff,
>>
>> Yeah, it looks like I'm running into the issues described in the bug. I'm
>> running 0.18.1 on CentOS 5 by the way. Measuring available disk space
>> appears to be harder than I thought ... and here I was under the impression
>> the percentage in df was a pretty clear indicator of how full the disk is
>> ;-)
>>
>> How did you guys solve/work around this ?
>
> How many partitions do you have? If it is just one and the NameNode thinks
> it has space though 'available' in df shows very little or no space, then
> you need to file a jira. There should be no case where DN reports more
> space than what the 'available' field in 'df' shows.
>
> But if you have more partitions and only some of them are full, then it
> is a different issue.. which should still be fixed.
>
> Raghu.
>
>> -- Stefan
>>
>>> From: Jeff Hammerbacher <[EMAIL PROTECTED]>
>>> Reply-To:
>>> Date: Mon, 27 Oct 2008 12:40:08 -0700
>>> To:
>>> Subject: Re: Datanode not detecting full disk
>>>
>>> Hey Stefan,
>>>
>>> We used to have trouble with this issue at Facebook. What version are
>>> you running? You might get more information on this ticket:
>>> https://issues.apache.org/jira/browse/HADOOP-2991.
>>> >>> Regards, >>> Jeff >>> >>> On Mon, Oct 27, 2008 at 10:00 AM, Stefan Will <[EMAIL PROTECTED]> wrote: >>>> Each of my datanodes has a system and a data partition, with dfs.data.dir >>>> pointed to the data partition. The data partition just filled up to 100% on >>>> all of my nodes (as evident via df), but the NameNode web ui still shows >>>> them only 88-94% full (interestingly, the numbers differ even though the >>>> machines are configured identically). I thought the datanodes used df to >>>> determine free space ? How is the storage utilization determined ? >>>> >>>> -- Stefan >>>> >> >>
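As a side note: if the immediate goal is simply to stop the DN from filling /data to 100%, Hadoop of this vintage had a reservation setting for hadoop-site.xml. The property name below is quoted from memory, so verify it against your hadoop-default.xml before relying on it:

```xml
<property>
  <name>dfs.datanode.du.reserved</name>
  <!-- bytes on each dfs.data.dir partition that the datanode must
       leave free; 10737418240 bytes = 10 GB -->
  <value>10737418240</value>
</property>
```

This doesn't fix the utilization-reporting discrepancy discussed in HADOOP-2991, but it keeps a miscounting DN from actually exhausting the partition.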
Re: Datanode not detecting full disk
Hi Jeff, Yeah, it looks like I'm running into the issues described in the bug. I'm running 0.18.1 on CentOS 5 by the way. Measuring available disk space appears to be harder than I thought ... and here I was under the impression the percentage in df was a pretty clear indicator of how full the disk is ;-) How did you guys solve/work around this ? -- Stefan > From: Jeff Hammerbacher <[EMAIL PROTECTED]> > Reply-To: > Date: Mon, 27 Oct 2008 12:40:08 -0700 > To: > Subject: Re: Datanode not detecting full disk > > Hey Stefan, > > We used to have trouble with this issue at Facebook. What version are > you running? You might get more information on this ticket: > https://issues.apache.org/jira/browse/HADOOP-2991. > > Regards, > Jeff > > On Mon, Oct 27, 2008 at 10:00 AM, Stefan Will <[EMAIL PROTECTED]> wrote: >> Each of my datanodes has a system and a data partition, with dfs.data.dir >> pointed to the data partition. The data partition just filled up to 100% on >> all of my nodes (as evident via df), but the NameNode web ui still shows >> them only 88-94% full (interestingly, the numbers differ even though the >> machines are configured identically). I thought the datanodes used df to >> determine free space ? How is the storage utilization determined ? >> >> -- Stefan >>
Datanode not detecting full disk
Each of my datanodes has a system and a data partition, with dfs.data.dir pointed to the data partition. The data partition just filled up to 100% on all of my nodes (as evident via df), but the NameNode web ui still shows them only 88-94% full (interestingly, the numbers differ even though the machines are configured identically). I thought the datanodes used df to determine free space ? How is the storage utilization determined ? -- Stefan
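For what it's worth, the per-partition numbers a datanode needs are exactly what the JVM can report for the partition backing a given path. A small sketch using the standard java.io.File space queries — illustrative only, not how 0.18's DataNode actually computes usage:

```java
import java.io.File;

public class PartitionSpace {
    // Report capacity and available space for the partition holding a
    // given path (e.g. a dfs.data.dir directory like /data).
    public static long[] spaceFor(String path) {
        File f = new File(path);
        long total = f.getTotalSpace();    // partition size, like df's Size
        long usable = f.getUsableSpace();  // roughly df's Avail for this user
        return new long[] { total, usable };
    }

    public static void main(String[] args) {
        long[] s = spaceFor(args.length > 0 ? args[0] : ".");
        System.out.println("total=" + s[0] + " bytes, usable=" + s[1] + " bytes");
    }
}
```

Note that getUsableSpace accounts for root-reserved blocks, which is one reason a naive free-space computation can disagree with the Use% column in df.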
Re: reduce task progress above 100%?
Sorry to hijack the discussion, but I've been seeing the same behavior after upgrading from 0.17.2 to 0.18.1-dev. I am using map output compression.

-- Stefan

> From: Arun C Murthy <[EMAIL PROTECTED]>
> Reply-To:
> Date: Tue, 16 Sep 2008 12:56:28 -0700
> To:
> Subject: Re: reduce task progress above 100%?
>
> On Sep 16, 2008, at 12:26 PM, pvvpr wrote:
>
>> Hello,
>> A strange thing happened in my job. In the reduce phase, one of the tasks'
>> status shows 101.44% complete, runs up to some 102%, and successfully
>> finishes back at 100%. Is this the right behavior?
>
> Which version of Hadoop are you running; and are you using compression
> for map-outputs?
>
> Arun
>
>> I put a quick copy-paste of the web GUI reporting completion of the
>> tasks. (sorry for the bad formatting below).
>>
>> thanks,
>> Prasad Pingali,
>> IIIT Hyderabad.
>>
>> Task                          Complete  Status           Start Time            Counters
>> task_200809151235_0026_r_00    90.91%   reduce > reduce  15-Sep-2008 18:40:06  0
>> task_200809151235_0026_r_04    86.13%   reduce > reduce  15-Sep-2008 18:40:11  0
>> task_200809151235_0026_r_08    85.08%   reduce > reduce  15-Sep-2008 18:40:16  0
>> task_200809151235_0026_r_12    82.57%   reduce > reduce  15-Sep-2008 18:40:21  0
>> task_200809151235_0026_r_14    97.90%   reduce > reduce  15-Sep-2008 18:40:22  0
>> task_200809151235_0026_r_17   101.44%   reduce > reduce  15-Sep-2008 23:34:04  0
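Arun's question about map-output compression points at the likely mechanism: if the reducer counts decompressed bytes against a total that was estimated from compressed map outputs, the ratio can exceed 1.0. A toy illustration of that accounting mismatch (my guess at the mechanism, not Hadoop's actual code):

```java
import java.util.Locale;

public class ShuffleProgress {
    // progress = bytes accounted so far / estimated total bytes
    public static double progress(long bytesCopied, long estimatedTotal) {
        return (double) bytesCopied / estimatedTotal;
    }

    public static void main(String[] args) {
        long compressedTotal = 100000000L;    // denominator: compressed map-output size
        long decompressedCopied = 101440000L; // numerator: decompressed bytes actually read
        System.out.printf(Locale.US, "%.2f%%%n",
                100.0 * progress(decompressedCopied, compressedTotal)); // prints 101.44%
    }
}
```

Under this explanation the task is fine; only the progress display overshoots, which matches it "successfully finishing back at 100%".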
Re: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data
I'll add a comment to the Jira. I haven't tried the latest version of the patch yet, but since it only changes the dfs client, not the datanode, I don't see how it would help with this. Two more things I noticed that happen when the datanodes become unresponsive (i.e. the "Last Contact" field on the namenode keeps increasing): 1. The datanode process seems to be completely hung for a while, including its Jetty web interface, sometimes for over 10 minutes. 2. The task tracker on the same machine keeps humming along, sending regular heartbeats. To me this looks like there is some sort of temporary deadlock in the datanode that keeps it from responding to requests. Perhaps it's the block report being generated ? -- Stefan > From: Raghu Angadi <[EMAIL PROTECTED]> > Reply-To: > Date: Tue, 09 Sep 2008 16:40:02 -0700 > To: > Subject: Re: Could not obtain block: blk_-2634319951074439134_1129 > file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data > > Espen Amble Kolstad wrote: >> There's a JIRA on this already: >> https://issues.apache.org/jira/browse/HADOOP-3831 >> Setting dfs.datanode.socket.write.timeout=0 in hadoop-site.xml seems >> to do the trick for now. > > Please comment on HADOOP-3831 that you are seeing this error.. so that > it gets committed. Did you try the patch for HADOOP-3831? > > thanks, > Raghu. > >> Espen >> >> On Mon, Sep 8, 2008 at 11:24 AM, Espen Amble Kolstad <[EMAIL PROTECTED]> >> wrote: >>> Hi, >>> >>> Thanks for the tip! >>> >>> I tried revision 692572 of the 0.18 branch, but I still get the same errors. >>> >>> On Sunday 07 September 2008 09:42:43 Dhruba Borthakur wrote: The DFS errors might have been caused by http://issues.apache.org/jira/browse/HADOOP-4040 thanks, dhruba On Sat, Sep 6, 2008 at 6:59 AM, Devaraj Das <[EMAIL PROTECTED]> wrote: > These exceptions are apparently coming from the dfs side of things. Could > someone from the dfs side please look at these?
> > On 9/5/08 3:04 PM, "Espen Amble Kolstad" <[EMAIL PROTECTED]> wrote: >> Hi, >> >> Thanks! >> The patch applies without change to hadoop-0.18.0, and should be >> included in a 0.18.1. >> >> However, I'm still seeing: >> in hadoop.log: >> 2008-09-05 11:13:54,805 WARN dfs.DFSClient - Exception while reading >> from blk_3428404120239503595_2664 of >> /user/trank/segments/20080905102650/crawl_generate/part-00010 from >> somehost:50010: java.io.IOException: Premeture EOF from inputStream >> >> in datanode.log: >> 2008-09-05 11:15:09,554 WARN dfs.DataNode - >> DatanodeRegistration(somehost:50010, >> storageID=DS-751763840-somehost-50010-1219931304453, infoPort=50075, >> ipcPort=50020):Got exception while serving >> blk_-4682098638573619471_2662 to >> /somehost: >> java.net.SocketTimeoutException: 480000 millis timeout while waiting >> for channel to be ready for write. ch : >> java.nio.channels.SocketChannel[connected local=/somehost:50010 >> remote=/somehost:45244] >> >> These entries in datanode.log happen a few minutes apart repeatedly. >> I've reduced # map-tasks so load on this node is below 1.0 with 5GB of >> free memory (so it's not resource starvation). >> >> Espen >> >> On Thu, Sep 4, 2008 at 3:33 PM, Devaraj Das <[EMAIL PROTECTED]> wrote: I started a profile of the reduce-task. I've attached the profiling output. It seems from the samples that ramManager.waitForDataToMerge() doesn't actually wait. Has anybody seen this behavior. >>> This has been fixed in HADOOP-3940 >>> >>> On 9/4/08 6:36 PM, "Espen Amble Kolstad" <[EMAIL PROTECTED]> wrote: >>> I have the same problem on our cluster. >>> >>> It seems the reducer-tasks are using all cpu, long before there's >>> anything to >>> shuffle. >>> >>> I started a profile of the reduce-task. I've attached the profiling >>> output. It seems from the samples that ramManager.waitForDataToMerge() >>> doesn't actually wait. >>> Has anybody seen this behavior.
Espen On Thursday 28 August 2008 06:11:42 wangxu wrote: > Hi, all > I am using hadoop-0.18.0-core.jar and nutch-2008-08-18_04-01-55.jar, > and running hadoop on one namenode and 4 slaves. > attached is my hadoop-site.xml, and I didn't change the file > hadoop-default.xml > > when data in segments are large, this kind of error occurs: > > java.io.IOException: Could not obtain block: > blk_-2634319951074439134_1129 > file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data at > org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1462) at > org.apache.hadoop.dfs.DFSClient$DFSInputStream
Re: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data
I'm not sure whether this is the same issue or not, but on my 4 slave cluster, setting the below parameter doesn't seem to fix the issue. What I'm seeing is that occasionally data nodes stop responding for up to 10 minutes at a time. In this case, the TaskTrackers will mark the nodes as dead, and occasionally the namenode will mark them as dead as well (you can see the "Last Contact" time steadily increase for a random node every half hour or so). This seems to happen during times of high disk utilization. -- Stefan > From: Espen Amble Kolstad <[EMAIL PROTECTED]> > Reply-To: > Date: Mon, 8 Sep 2008 12:40:01 +0200 > To: > Subject: Re: Could not obtain block: blk_-2634319951074439134_1129 > file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data > > There's a JIRA on this already: > https://issues.apache.org/jira/browse/HADOOP-3831 > Setting dfs.datanode.socket.write.timeout=0 in hadoop-site.xml seems > to do the trick for now. > > Espen > > On Mon, Sep 8, 2008 at 11:24 AM, Espen Amble Kolstad <[EMAIL PROTECTED]> > wrote: >> Hi, >> >> Thanks for the tip! >> >> I tried revision 692572 of the 0.18 branch, but I still get the same errors. >> >> On Sunday 07 September 2008 09:42:43 Dhruba Borthakur wrote: >>> The DFS errors might have been caused by >>> >>> http://issues.apache.org/jira/browse/HADOOP-4040 >>> >>> thanks, >>> dhruba >>> >>> On Sat, Sep 6, 2008 at 6:59 AM, Devaraj Das <[EMAIL PROTECTED]> wrote: These exceptions are apparently coming from the dfs side of things. Could someone from the dfs side please look at these? On 9/5/08 3:04 PM, "Espen Amble Kolstad" <[EMAIL PROTECTED]> wrote: > Hi, > > Thanks! > The patch applies without change to hadoop-0.18.0, and should be > included in a 0.18.1.
> > However, I'm still seeing: > in hadoop.log: > 2008-09-05 11:13:54,805 WARN dfs.DFSClient - Exception while reading > from blk_3428404120239503595_2664 of > /user/trank/segments/20080905102650/crawl_generate/part-00010 from > somehost:50010: java.io.IOException: Premeture EOF from inputStream > > in datanode.log: > 2008-09-05 11:15:09,554 WARN dfs.DataNode - > DatanodeRegistration(somehost:50010, > storageID=DS-751763840-somehost-50010-1219931304453, infoPort=50075, > ipcPort=50020):Got exception while serving > blk_-4682098638573619471_2662 to > /somehost: > java.net.SocketTimeoutException: 480000 millis timeout while waiting > for channel to be ready for write. ch : > java.nio.channels.SocketChannel[connected local=/somehost:50010 > remote=/somehost:45244] > > These entries in datanode.log happen a few minutes apart repeatedly. > I've reduced # map-tasks so load on this node is below 1.0 with 5GB of > free memory (so it's not resource starvation). > > Espen > > On Thu, Sep 4, 2008 at 3:33 PM, Devaraj Das <[EMAIL PROTECTED]> wrote: >>> I started a profile of the reduce-task. I've attached the profiling >>> output. It seems from the samples that ramManager.waitForDataToMerge() >>> doesn't actually wait. >>> Has anybody seen this behavior. >> >> This has been fixed in HADOOP-3940 >> >> On 9/4/08 6:36 PM, "Espen Amble Kolstad" <[EMAIL PROTECTED]> wrote: >>> I have the same problem on our cluster. >>> >>> It seems the reducer-tasks are using all cpu, long before there's >>> anything to >>> shuffle. >>> >>> I started a profile of the reduce-task. I've attached the profiling >>> output. It seems from the samples that ramManager.waitForDataToMerge() >>> doesn't actually wait. >>> Has anybody seen this behavior. >>> >>> Espen >>> >>> On Thursday 28 August 2008 06:11:42 wangxu wrote: Hi, all I am using hadoop-0.18.0-core.jar and nutch-2008-08-18_04-01-55.jar, and running hadoop on one namenode and 4 slaves.
attached is my hadoop-site.xml, and I didn't change the file hadoop-default.xml. When data in segments are large, this kind of error occurs:

java.io.IOException: Could not obtain block: blk_-2634319951074439134_1129
file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data
at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1462)
at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1312)
at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1417)
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
at org.apache.hadoop.io.SequenceFile$Reader.re
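For reference, the workaround Espen mentions earlier in this thread (dfs.datanode.socket.write.timeout=0) is a hadoop-site.xml entry. A sketch of the setting, with the default noted as I recall it for 0.18 (verify against your hadoop-default.xml):

```xml
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <!-- 0 disables the datanode's socket write timeout entirely
       (the HADOOP-3831 workaround); the 0.18-era default was
       480000 ms, i.e. the 8-minute timeout seen in the logs above -->
  <value>0</value>
</property>
```

Disabling the timeout masks the symptom rather than fixing the slow/hung datanode, so it's best treated as a stopgap while HADOOP-3831 is sorted out.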