Re: only one reducer running in a hadoop cluster
On Feb 7, 2009, at 11:52 PM, Nick Cen wrote: Hi, I have a hadoop cluster with 4 PCs. I want to integrate hadoop and lucene together, so I copied some of the source code from nutch's Indexer class, but when I run my job, I found that there is only 1 reducer running on 1 pc, so the performance is not as good as expected. Set mapred.reduce.tasks in your configuration to the number of reduces you want your jobs to have by default. Typically this should be 0.99 * mapred.tasktracker.reduce.tasks.maximum * number of computers.
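For example, on Nick's 4-node cluster that rule of thumb works out like this (a minimal sketch against the 0.19-era JobConf API; the class name and the assumption of 2 reduce slots per node are mine, not from the thread):

    import org.apache.hadoop.mapred.JobConf;

    public class ReduceCountExample {
      public static void main(String[] args) {
        JobConf conf = new JobConf(ReduceCountExample.class);
        int nodes = 4;               // machines in the cluster
        int reduceSlotsPerNode = 2;  // mapred.tasktracker.reduce.tasks.maximum
        // Aim just under the cluster's total reduce capacity so all
        // reduces run in a single wave: (int) (0.99 * 2 * 4) = 7 tasks.
        conf.setNumReduceTasks((int) (0.99 * reduceSlotsPerNode * nodes));
        System.out.println("reduce tasks: " + conf.getNumReduceTasks());
      }
    }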
RE: Reduce won't start until Map stage reaches 100%?
Hi, I think your job's number of reduce tasks is 1; when a job has only one reduce task, the reduce stage does not start until the map stage reaches 100% completion. zhuweimin -Original Message- From: Taeho Kang [mailto:tka...@gmail.com] Sent: Monday, February 09, 2009 4:26 PM To: hadoop-u...@lucene.apache.org Subject: Reduce won't start until Map stage reaches 100%? Dear All, With Hadoop 0.19.0, the Reduce stage does not start until the Map stage gets to 100% completion. Has anyone faced a similar situation? ... ... - map 90% reduce 0% - map 91% reduce 0% - map 92% reduce 0% - map 93% reduce 0% - map 94% reduce 0% - map 95% reduce 0% - map 96% reduce 0% - map 97% reduce 0% - map 98% reduce 0% - map 99% reduce 0% - map 100% reduce 0% - map 100% reduce 1% - map 100% reduce 2% - map 100% reduce 3% - map 100% reduce 4% - map 100% reduce 5% - map 100% reduce 6% - map 100% reduce 7% - map 100% reduce 8% - map 100% reduce 9% Thank you all in advance, /Taeho
Re: lost TaskTrackers
yes, I can access DFS from the cluster. The namenode status seems to be OK and I see no errors in the namenode log files. Initially all trackers were visible, and 9433 maps completed successfully. Then this was followed by 65975 that were killed. In the log they all show the same error: Error initializing attempt_200902081049_0001_m_004499_1: java.lang.NullPointerException at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:459) at org.apache.hadoop.ipc.Client.call(Client.java:686) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216) at $Proxy5.getFileInfo(Unknown Source) at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) at $Proxy5.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:578) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:390) at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:699) at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1636) at org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:102) at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1602) While this is happening, I can access the Job tracker web interface, but it shows that there are 0 nodes in the cluster. I have tried to run this task several times and the result is always the same. It works at first and then starts failing. Vadim On Sun, Feb 8, 2009 at 22:19, Amar Kamat ama...@yahoo-inc.com wrote: Vadim Zaliva wrote: Hi! I am observing a strange situation in my Hadoop cluster. While running a task, it eventually gets into this strange mode where: 1. JobTracker reports 0 task trackers. 2. Task tracker processes are alive but the log file is full of repeating messages like this: 2009-02-08 19:16:47,761 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200902081049_0001_m_017698_0 done; removing files. 2009-02-08 19:16:47,761 INFO org.apache.hadoop.mapred.IndexCache: Map ID attempt_200902081049_0001_m_017698_0 not found in cache 2009-02-08 19:16:47,761 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200902081049_0001_m_021212_0 done; removing files. 2009-02-08 19:16:47,762 INFO org.apache.hadoop.mapred.IndexCache: Map ID attempt_200902081049_0001_m_021212_0 not found in cache 2009-02-08 19:16:47,762 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200902081049_0001_m_022133_0 done; removing files. with a new one appearing every couple of seconds.
In the task tracker log, the last two exceptions before these repeating messages are: 2009-02-08 17:46:51,482 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200902081049_0001_m_075408_3 2009-02-08 17:46:51,482 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200902081049_0001_m_075408_3 2009-02-08 17:46:51,482 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 8 and trying to launch attempt_200902081049_0001_m_075408_3 2009-02-08 17:46:51,483 WARN org.apache.hadoop.mapred.TaskTracker: Error initializing attempt_200902081049_0001_m_075408_3: java.lang.NullPointerException at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:459) at org.apache.hadoop.ipc.Client.call(Client.java:686) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216) at $Proxy5.getFileInfo(Unknown Source) at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) at $Proxy5.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:578) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:390) at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:699) at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1636) at org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:102) at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1602) Looks like an RPC issue. Can you tell more about the cluster? Is there a task that finished successfully in this job? Can you access the dfs from the trackers?
Re: only one reducer running in a hadoop cluster
Thanks everyone. I found the solution for this one: in my main method, I call setNumReduceTasks() on JobConf with the value I want. 2009/2/9 Owen O'Malley omal...@apache.org On Feb 7, 2009, at 11:52 PM, Nick Cen wrote: Hi, I have a hadoop cluster with 4 PCs. I want to integrate hadoop and lucene together, so I copied some of the source code from nutch's Indexer class, but when I run my job, I found that there is only 1 reducer running on 1 pc, so the performance is not as good as expected. Set mapred.reduce.tasks in your configuration to the number of reduces you want your jobs to have by default. Typically this should be 0.99 * mapred.tasktracker.reduce.tasks.maximum * number of computers. -- http://daily.appspot.com/food/
Re: can't read the SequenceFile correctly
On Feb 6, 2009, at 8:52 AM, Bhupesh Bansal wrote: Hey Tom, I also got burned by this. Why does BytesWritable.getBytes() return non-valid bytes? Or we should add a BytesWritable.getValidBytes() kind of function. It does this because continually resizing the array to the valid length is very expensive. It would be a reasonable patch to add a getValidBytes, but most methods in Java's libraries are aware of this and let you pass in byte[], offset, and length. So once you realize what the problem is, you can work around it. -- Owen
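For example, the offset-and-length workaround looks like this (a minimal sketch; the sample value is made up):

    import org.apache.hadoop.io.BytesWritable;

    public class ValidBytesExample {
      public static void main(String[] args) throws Exception {
        BytesWritable value = new BytesWritable("hello".getBytes("UTF-8"));
        byte[] buf = value.getBytes();  // backing array; may be longer than the data
        int len = value.getLength();    // number of valid bytes
        // Pass (buf, 0, len) to APIs that accept an offset and length,
        // instead of assuming the whole array is valid:
        System.out.println(new String(buf, 0, len, "UTF-8"));
      }
    }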
Re: Reduce won't start until Map stage reaches 100%?
I believe that in Hadoop 0.19, scheduling was changed so that reduces don't start until 5% of maps have completed. The reasoning for this is that reduces can't do anything until there is some map output to copy over the network. So, if your job has very few map tasks, you won't see reduces start until near the end. On Mon, Feb 9, 2009 at 12:51 AM, zhuweimin xim-...@tsm.kddilabs.jp wrote: Hi, I think your job's number of reduce tasks is 1; when a job has only one reduce task, the reduce stage does not start until the map stage reaches 100% completion. zhuweimin -Original Message- From: Taeho Kang [mailto:tka...@gmail.com] Sent: Monday, February 09, 2009 4:26 PM To: hadoop-u...@lucene.apache.org Subject: Reduce won't start until Map stage reaches 100%? Dear All, With Hadoop 0.19.0, the Reduce stage does not start until the Map stage gets to 100% completion. Has anyone faced a similar situation? ... ... - map 90% reduce 0% - map 91% reduce 0% - map 92% reduce 0% - map 93% reduce 0% - map 94% reduce 0% - map 95% reduce 0% - map 96% reduce 0% - map 97% reduce 0% - map 98% reduce 0% - map 99% reduce 0% - map 100% reduce 0% - map 100% reduce 1% - map 100% reduce 2% - map 100% reduce 3% - map 100% reduce 4% - map 100% reduce 5% - map 100% reduce 6% - map 100% reduce 7% - map 100% reduce 8% - map 100% reduce 9% Thank you all in advance, /Taeho
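If you want to tune when reduces launch relative to map completion, I believe 0.19 also exposes that threshold as a job property, mapred.reduce.slowstart.completed.maps (default 0.05, i.e. the 5% mentioned above). A sketch, assuming that property name:

    import org.apache.hadoop.mapred.JobConf;

    public class SlowstartExample {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Launch reduces only after half the maps finish, instead of 5%.
        conf.set("mapred.reduce.slowstart.completed.maps", "0.5");
        System.out.println(conf.get("mapred.reduce.slowstart.completed.maps"));
      }
    }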
Re: Reduce won't start until Map stage reaches 100%?
On Feb 8, 2009, at 11:26 PM, Taeho Kang wrote: Dear All, With Hadoop 0.19.0, the Reduce stage does not start until the Map stage gets to 100% completion. Has anyone faced a similar situation? How many maps and reduces does your job have? Arun
Using the Open Source Hadoop to Generate Data-Intensive Insights
Wednesday Feb 11, Mountain View, CA info/registration: http://www.meetup.com/CIO-IT-Executives/calendar/9528874/ Speaker: Rob Weltman has been Director of Engineering in Enterprise Software at Netscape, Chief Architect at AOL, and Director of Engineering for Yahoo's data warehouse technology. He is currently Director of Grid Services at Yahoo. Gaining and keeping a competitive edge in Internet offerings has increasingly become a matter of continuously processing enormous volumes of data about users, user activities, Web sites, ads, and Web searches. There is gold in the mountain of data, but it is often impossible to extract in time to make use of it if you are constrained to a single (albeit powerful) computer or database. Hadoop (http://hadoop.apache.org) is open source software for creating a cluster of commodity computers from one node to several thousand nodes in size and internally managing petabytes of data. It provides a simple interface for attaching user-written code to be executed in parallel on some or all of the nodes in the cluster. As an alternative to creating your own Hadoop cluster, there are Hadoop AMIs (Amazon Machine Images - virtual machines) that allow you to create and run Hadoop programs on Amazon's EC2 infrastructure. Rob will talk about what Hadoop is, options for writing programs that run on a Hadoop cluster, and Yahoo use cases where Hadoop has proved beneficial in dealing with very large data volumes. Gourmet dinner and wine are included.
Re: using HDFS for a distributed storage system
Hey Amit, Your current thoughts on keeping the block size larger and removing the very small files are along the right line. Why not choose the default size of 64MB or larger? You don't seem too concerned about the number of replicas. However, you're still fighting against the tide. You've got enough files that you'll be pushing against block report and namenode limitations, especially with 20-50 million files. We find that about 500k blocks per node is a good stopping point right now. You really, really need to figure out how to organize your files in such a way that the average file size is above 64MB. Is there a primary key for each file? If so, maybe consider HBase? If you are just going to be sequentially scanning through all your files, consider archiving them all to a single sequence file. Your individual data nodes are quite large ... I hope you're not expecting to measure throughput in 10's of Gbps? It's hard to give advice without knowing more about your application. I can tell you that you're going to run into a significant wall if you can't figure out a means for making your average file size at least greater than 64MB. Brian On Feb 8, 2009, at 10:06 PM, Amit Chandel wrote: Hi Group, I am planning to use HDFS as a reliable and distributed file system for batch operations. No plans as of now to run any map reduce jobs on top of it, but in the future we will be having map reduce operations on files in HDFS. The current (test) system has 3 machines: NameNode: dual core CPU, 2GB RAM, 500GB HDD 2 DataNodes: both of them with a dual core CPU, 2GB of RAM and 1.5TB of space with an ext3 filesystem. I just need to put and retrieve files from this system. The files which I will put in HDFS vary from a few bytes to around 100MB, with the average file size being 5MB, and the number of files would grow to around 20-50 million. To avoid hitting the limit on the number of files under a directory, I store each file at the path derived from the SHA1 hash of its content (which is 20 bytes long, and I create a 10-level-deep path using 2 bytes for each level). When I started the cluster a month back, I had kept the default block size at 1MB. The hardware specs mentioned at http://wiki.apache.org/hadoop/MachineScaling consider running map reduce operations, so I am not sure if my setup is good enough. I would like to get input on this setup. The final cluster would have each datanode with 8GB RAM, a quad core CPU, and 25TB attached storage. I played with this setup a little and then planned to increase the disk space on both the DataNodes. I started by increasing the disk capacity of the first dataNode to 15TB and changing the underlying filesystem to XFS (which made it a clean datanode), and put it back in the system. Before performing this operation, I had inserted around 7 files in HDFS. NameNode:50070/dfshealth.jsp showed 677323 files and directories, 332419 blocks = 1009742 total. I guess the way I create a 10-level-deep path for each file results in ~10 times the number of actual files in HDFS. Please correct me if I am wrong. I then ran the rebalancer on the cleaned up DataNode, which was too slow (writing 2 blocks per second, i.e. 2MBps) to begin with and died after a few hours saying "too many open files". I checked all the machines and all the DataNode and NameNode processes were running fine on all the respective machines, but dfshealth.jsp showed both the datanodes to be dead. Re-starting the cluster brought both of them up. I guess this has to do with RAM requirements.
My question is how to figure out the RAM requirements of the DataNode and NameNode in this situation. The documentation states that both the DataNode and NameNode store the block index. It's not quite clear whether all of the index is in memory. Once I have figured that out, how can I instruct hadoop to rebalance with high priority? Another question is regarding the Non DFS used: statistic shown on dfshealth.jsp: is it the space used to store the file and directory metadata information (apart from the actual file content blocks)? Right now it is 1/4th of the total space used by HDFS. Some points which I have thought of over the last month to improve this model are: 1. I should keep very small files (let's say smaller than 1KB) out of HDFS. 2. Reduce the directory depth of the file path created by the SHA1 hash (instead of 10 levels, I can keep 3). 3. I should increase the block size to reduce the number of blocks in HDFS (http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200805.mbox/4aa34eb70805180030u5de8efaam6f1e9a8832636...@mail.gmail.com says it won't result in a waste of disk space). More improvement advice is appreciated. Thanks, Amit
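For reference, the path scheme described above (20-byte SHA1, 10 levels, 2 bytes per level) can be sketched like this; the helper and sample input are hypothetical:

    import java.security.MessageDigest;

    public class HashPathExample {
      // Derive a 10-level directory path from the SHA1 of the content:
      // 20 bytes total, consumed 2 bytes (4 hex characters) per level.
      static String pathFor(byte[] content) throws Exception {
        byte[] sha1 = MessageDigest.getInstance("SHA-1").digest(content);
        StringBuilder path = new StringBuilder();
        for (int i = 0; i < sha1.length; i += 2) {
          path.append(String.format("/%02x%02x", sha1[i], sha1[i + 1]));
        }
        return path.toString();
      }

      public static void main(String[] args) throws Exception {
        System.out.println(pathFor("some file content".getBytes("UTF-8")));
      }
    }

Dropping from 10 levels to 3 (point 2 above) would just mean stopping the loop after three iterations and keeping the remaining hex digits in the file name.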
TaskTrackers being double counted after restart job recovery
Hi, I'm using the new persistent job state feature in 0.19.0, and it's worked really well so far. However, this morning my JobTracker died with an OOM error (even though the heap size is set to 768M). So I killed it and all the TaskTrackers. After starting everything up again, all my nodes were showing up twice in the JobTracker web interface, with different port numbers. Also, some of the jobs it restarted had already been completed when the job tracker died. Any idea what might be happening here? How can I fix this? Will temporarily setting mapred.jobtracker.restart.recover=false clear things up? -- Stefan
Re: TaskTrackers being double counted after restart job recovery
There is a bug that when we restart the TaskTrackers they get counted twice. The problem is the name is generated from the hostname and port number. When TaskTrackers restart they get a new port number and get counted again. The problem goes away when the old TaskTrackers time out in 10 minutes or you restart the JobTracker. -- Owen
Re: can't read the SequenceFile correctly
+1 on something like getValidBytes(). Just the existence of this would warn many programmers about getBytes(). Raghu. Owen O'Malley wrote: On Feb 6, 2009, at 8:52 AM, Bhupesh Bansal wrote: Hey Tom, I also got burned by this. Why does BytesWritable.getBytes() return non-valid bytes? Or we should add a BytesWritable.getValidBytes() kind of function. It does this because continually resizing the array to the valid length is very expensive. It would be a reasonable patch to add a getValidBytes, but most methods in Java's libraries are aware of this and let you pass in byte[], offset, and length. So once you realize what the problem is, you can work around it. -- Owen
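For what it's worth, the proposed helper might look like this (hypothetical; it does not exist in BytesWritable as of 0.19, and it pays the allocate-and-copy cost Owen describes on every call):

    import org.apache.hadoop.io.BytesWritable;

    public class BytesWritableUtil {
      // Hypothetical helper: return a copy trimmed to the valid length.
      public static byte[] getValidBytes(BytesWritable w) {
        byte[] valid = new byte[w.getLength()];
        System.arraycopy(w.getBytes(), 0, valid, 0, w.getLength());
        return valid;
      }
    }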
java.io.IOException: Could not get block locations. Aborting...
Hi all - I've been running into this error the past few days: java.io.IOException: Could not get block locations. Aborting... at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889) It seems to be related to trying to write too many files to HDFS. I have a class extending org.apache.hadoop.mapred.lib.MultipleOutputFormat, and if I output to a few file names, everything works. However, if I output to thousands of small files, the above error occurs. I'm having trouble isolating the problem, as the problem doesn't occur in the debugger, unfortunately. Is this a memory issue, or is there an upper limit to the number of files HDFS can hold? Any settings to adjust? Thanks.
Re: using HDFS for a distributed storage system
Thanks Brian for your inputs. I am eventually targeting to store 200k directories, each containing 75 files on average, with the average size of a directory being 300MB (ranging from 50MB to 650MB), in this storage system. It will mostly be archival storage from which I should be able to access any of the old files easily, but the recent directories would be accessed frequently for a day or 2 as they are being added. They are added in batches of 500-1000 per week, and there can be rare bursts of adding 50k directories once in 3 months. One such burst is about to come in a month, and I want to test the current test setup against that burst. We have upgraded our test hardware a little bit from what I last mentioned. The test setup will have 3 DataNodes with 15TB space on each, 6G RAM, and a dual core processor, and a NameNode with 500G storage, 6G RAM, and a dual core processor. I am planning to add the individual files initially, and after a while (let's say 2 days after insertion) will make a SequenceFile out of each directory (I am currently looking into SequenceFile) and delete the previous files of that directory from HDFS. That way, in the future, I can access any file given its directory without much effort. Now that SequenceFile is in the picture, I can make the default block size 64MB or even 128MB. For replication, I am just replicating each file at 1 extra location (i.e. replication factor = 2, since a replication factor of 3 would leave me with only 33% of the usable storage). Regarding reading back from HDFS, if I can read at ~50MBps (for recent files), that would be enough. Let me know if you see any more pitfalls in this setup, or have more suggestions. I really appreciate it. Once I test this setup, I will put the results back to the list. Thanks, Amit On Mon, Feb 9, 2009 at 12:39 PM, Brian Bockelman bbock...@cse.unl.edu wrote: Hey Amit, Your current thoughts on keeping the block size larger and removing the very small files are along the right line. Why not choose the default size of 64MB or larger? You don't seem too concerned about the number of replicas. However, you're still fighting against the tide. You've got enough files that you'll be pushing against block report and namenode limitations, especially with 20-50 million files. We find that about 500k blocks per node is a good stopping point right now. You really, really need to figure out how to organize your files in such a way that the average file size is above 64MB. Is there a primary key for each file? If so, maybe consider HBase? If you are just going to be sequentially scanning through all your files, consider archiving them all to a single sequence file. Your individual data nodes are quite large ... I hope you're not expecting to measure throughput in 10's of Gbps? It's hard to give advice without knowing more about your application. I can tell you that you're going to run into a significant wall if you can't figure out a means for making your average file size at least greater than 64MB. Brian On Feb 8, 2009, at 10:06 PM, Amit Chandel wrote: Hi Group, I am planning to use HDFS as a reliable and distributed file system for batch operations. No plans as of now to run any map reduce jobs on top of it, but in the future we will be having map reduce operations on files in HDFS. The current (test) system has 3 machines: NameNode: dual core CPU, 2GB RAM, 500GB HDD 2 DataNodes: both of them with a dual core CPU, 2GB of RAM and 1.5TB of space with an ext3 filesystem. I just need to put and retrieve files from this system.
The files which I will put in HDFS vary from a few bytes to around 100MB, with the average file size being 5MB, and the number of files would grow to around 20-50 million. To avoid hitting the limit on the number of files under a directory, I store each file at the path derived from the SHA1 hash of its content (which is 20 bytes long, and I create a 10-level-deep path using 2 bytes for each level). When I started the cluster a month back, I had kept the default block size at 1MB. The hardware specs mentioned at http://wiki.apache.org/hadoop/MachineScaling consider running map reduce operations, so I am not sure if my setup is good enough. I would like to get input on this setup. The final cluster would have each datanode with 8GB RAM, a quad core CPU, and 25TB attached storage. I played with this setup a little and then planned to increase the disk space on both the DataNodes. I started by increasing the disk capacity of the first dataNode to 15TB and changing the underlying filesystem to XFS (which made it a clean datanode), and put it back in the system. Before performing this operation, I had inserted around 7 files in HDFS. NameNode:50070/dfshealth.jsp showed 677323 files and directories, 332419 blocks = 1009742 total. I guess the way I create a 10-level-deep path for each file results in ~10 times the number of actual files in HDFS. Please correct me if I am wrong. I then ran the
Re: java.io.IOException: Could not get block locations. Aborting...
You will have to increase the per-user file descriptor limit. For most linux machines the file /etc/security/limits.conf controls this on a per-user basis. You will need to log in to a fresh shell session after making the changes to see them. Any login shells started before the change, and processes started by those shells, will have the old limits. If you are opening vast numbers of files you may need to increase the per-system limits, via the /etc/sysctl.conf file and the fs.file-max parameter. This page seems to be a decent reference: http://bloggerdigest.blogspot.com/2006/10/file-descriptors-vs-linux-performance.html On Mon, Feb 9, 2009 at 1:01 PM, Scott Whitecross sc...@dataxu.com wrote: Hi all - I've been running into this error the past few days: java.io.IOException: Could not get block locations. Aborting... at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889) It seems to be related to trying to write too many files to HDFS. I have a class extending org.apache.hadoop.mapred.lib.MultipleOutputFormat, and if I output to a few file names, everything works. However, if I output to thousands of small files, the above error occurs. I'm having trouble isolating the problem, as the problem doesn't occur in the debugger, unfortunately. Is this a memory issue, or is there an upper limit to the number of files HDFS can hold? Any settings to adjust? Thanks.
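For example (the account name and values are illustrative, not recommendations), in /etc/security/limits.conf:

    # allow the account running the hadoop daemons and jobs more open files
    hadoop  soft  nofile  16384
    hadoop  hard  nofile  16384

and, if the system-wide ceiling also needs raising, in /etc/sysctl.conf (applied with sysctl -p):

    fs.file-max = 262144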
Re: java.io.IOException: Could not get block locations. Aborting...
Small files are bad for Hadoop. You should avoid keeping a lot of small files if possible. That said, that error is something I've seen a lot. It usually happens when the number of xcievers hasn't been adjusted upwards from the default of 256. We run with 8000 xcievers, and that seems to solve our problems. I think that if you have a lot of open files, this problem happens a lot faster. -Bryan On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote: Hi all - I've been running into this error the past few days: java.io.IOException: Could not get block locations. Aborting... at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889) It seems to be related to trying to write too many files to HDFS. I have a class extending org.apache.hadoop.mapred.lib.MultipleOutputFormat, and if I output to a few file names, everything works. However, if I output to thousands of small files, the above error occurs. I'm having trouble isolating the problem, as the problem doesn't occur in the debugger, unfortunately. Is this a memory issue, or is there an upper limit to the number of files HDFS can hold? Any settings to adjust? Thanks.
Re: java.io.IOException: Could not get block locations. Aborting...
This would be an addition to the hadoop-site.xml file, to up dfs.datanode.max.xcievers? Thanks. On Feb 9, 2009, at 5:54 PM, Bryan Duxbury wrote: Small files are bad for Hadoop. You should avoid keeping a lot of small files if possible. That said, that error is something I've seen a lot. It usually happens when the number of xcievers hasn't been adjusted upwards from the default of 256. We run with 8000 xcievers, and that seems to solve our problems. I think that if you have a lot of open files, this problem happens a lot faster. -Bryan On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote: Hi all - I've been running into this error the past few days: java.io.IOException: Could not get block locations. Aborting... at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889) It seems to be related to trying to write too many files to HDFS. I have a class extending org.apache.hadoop.mapred.lib.MultipleOutputFormat, and if I output to a few file names, everything works. However, if I output to thousands of small files, the above error occurs. I'm having trouble isolating the problem, as the problem doesn't occur in the debugger, unfortunately. Is this a memory issue, or is there an upper limit to the number of files HDFS can hold? Any settings to adjust? Thanks.
copyFromLocal *
I'm using the Hadoop FS shell to move files into my data store (either HDFS or S3Native). I'd like to use wildcards with copyFromLocal, but this doesn't seem to work. Is there any way I can get that kind of functionality? Thanks, John
Re: using HDFS for a distributed storage system
Hey Amit, That plan sounds much better. I think you will find the system much more scalable. From our experience, it takes a while to get the right amount of monitoring and infrastructure in place to have a very dependable system with 2 replicas. I would recommend using 3 replicas until you feel you've mastered the setup. Brian On Feb 9, 2009, at 4:27 PM, Amit Chandel wrote: Thanks Brian for your inputs. I am eventually targeting to store 200k directories, each containing 75 files on average, with the average size of a directory being 300MB (ranging from 50MB to 650MB), in this storage system. It will mostly be archival storage from which I should be able to access any of the old files easily, but the recent directories would be accessed frequently for a day or 2 as they are being added. They are added in batches of 500-1000 per week, and there can be rare bursts of adding 50k directories once in 3 months. One such burst is about to come in a month, and I want to test the current test setup against that burst. We have upgraded our test hardware a little bit from what I last mentioned. The test setup will have 3 DataNodes with 15TB space on each, 6G RAM, and a dual core processor, and a NameNode with 500G storage, 6G RAM, and a dual core processor. I am planning to add the individual files initially, and after a while (let's say 2 days after insertion) will make a SequenceFile out of each directory (I am currently looking into SequenceFile) and delete the previous files of that directory from HDFS. That way, in the future, I can access any file given its directory without much effort. Now that SequenceFile is in the picture, I can make the default block size 64MB or even 128MB. For replication, I am just replicating each file at 1 extra location (i.e. replication factor = 2, since a replication factor of 3 would leave me with only 33% of the usable storage). Regarding reading back from HDFS, if I can read at ~50MBps (for recent files), that would be enough. Let me know if you see any more pitfalls in this setup, or have more suggestions. I really appreciate it. Once I test this setup, I will put the results back to the list. Thanks, Amit On Mon, Feb 9, 2009 at 12:39 PM, Brian Bockelman bbock...@cse.unl.edu wrote: Hey Amit, Your current thoughts on keeping the block size larger and removing the very small files are along the right line. Why not choose the default size of 64MB or larger? You don't seem too concerned about the number of replicas. However, you're still fighting against the tide. You've got enough files that you'll be pushing against block report and namenode limitations, especially with 20-50 million files. We find that about 500k blocks per node is a good stopping point right now. You really, really need to figure out how to organize your files in such a way that the average file size is above 64MB. Is there a primary key for each file? If so, maybe consider HBase? If you are just going to be sequentially scanning through all your files, consider archiving them all to a single sequence file. Your individual data nodes are quite large ... I hope you're not expecting to measure throughput in 10's of Gbps? It's hard to give advice without knowing more about your application. I can tell you that you're going to run into a significant wall if you can't figure out a means for making your average file size at least greater than 64MB. Brian On Feb 8, 2009, at 10:06 PM, Amit Chandel wrote: Hi Group, I am planning to use HDFS as a reliable and distributed file system for batch operations.
No plans as of now to run any map reduce jobs on top of it, but in the future we will be having map reduce operations on files in HDFS. The current (test) system has 3 machines: NameNode: dual core CPU, 2GB RAM, 500GB HDD 2 DataNodes: both of them with a dual core CPU, 2GB of RAM and 1.5TB of space with an ext3 filesystem. I just need to put and retrieve files from this system. The files which I will put in HDFS vary from a few bytes to around 100MB, with the average file size being 5MB, and the number of files would grow to around 20-50 million. To avoid hitting the limit on the number of files under a directory, I store each file at the path derived from the SHA1 hash of its content (which is 20 bytes long, and I create a 10-level-deep path using 2 bytes for each level). When I started the cluster a month back, I had kept the default block size at 1MB. The hardware specs mentioned at http://wiki.apache.org/hadoop/MachineScaling consider running map reduce operations, so I am not sure if my setup is good enough. I would like to get input on this setup. The final cluster would have each datanode with 8GB RAM, a quad core CPU, and 25TB attached storage. I played with this setup a little and then planned to increase the disk space on both the DataNodes. I started by increasing the disk capacity of the first dataNode to 15TB and
Backing up HDFS?
How do people back up their data that they keep on HDFS? We have many TB of data which we need to get backed up but are unclear on how to do this efficiently/reliably.
Re: Backing up HDFS?
Why would you want to have another backup beyond HDFS? HDFS itself replicates your data, so the reliability of the system shouldn't be a concern (if at all it is)... Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Mon, Feb 9, 2009 at 4:17 PM, Nathan Marz nat...@rapleaf.com wrote: How do people back up their data that they keep on HDFS? We have many TB of data which we need to get backed up but are unclear on how to do this efficiently/reliably.
Re: Backing up HDFS?
On Feb 9, 2009, at 6:41 PM, Amandeep Khurana wrote: Why would you want to have another backup beyond HDFS? HDFS itself replicates your data, so the reliability of the system shouldn't be a concern (if at all it is)... It should be. HDFS is not an archival system. Multiple replicas do not equate to a backup, just like having RAID1 or RAID5 shouldn't make you feel safe. HDFS is actively developed with lots of new features. Bugs creep in. Things can become inconsistent and mis-replicated. Even though loss due to hardware failure is small, losses due to new bugs are still possible! Brian Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Mon, Feb 9, 2009 at 4:17 PM, Nathan Marz nat...@rapleaf.com wrote: How do people back up their data that they keep on HDFS? We have many TB of data which we need to get backed up but are unclear on how to do this efficiently/reliably.
Re: java.io.IOException: Could not get block locations. Aborting...
Correct. +1 to Jason's more unix file handles suggestion. That's a must-have. -Bryan On Feb 9, 2009, at 3:09 PM, Scott Whitecross wrote: This would be an addition to the hadoop-site.xml file, to up dfs.datanode.max.xcievers? Thanks. On Feb 9, 2009, at 5:54 PM, Bryan Duxbury wrote: Small files are bad for Hadoop. You should avoid keeping a lot of small files if possible. That said, that error is something I've seen a lot. It usually happens when the number of xcievers hasn't been adjusted upwards from the default of 256. We run with 8000 xcievers, and that seems to solve our problems. I think that if you have a lot of open files, this problem happens a lot faster. -Bryan On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote: Hi all - I've been running into this error the past few days: java.io.IOException: Could not get block locations. Aborting... at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889) It seems to be related to trying to write too many files to HDFS. I have a class extending org.apache.hadoop.mapred.lib.MultipleOutputFormat, and if I output to a few file names, everything works. However, if I output to thousands of small files, the above error occurs. I'm having trouble isolating the problem, as the problem doesn't occur in the debugger, unfortunately. Is this a memory issue, or is there an upper limit to the number of files HDFS can hold? Any settings to adjust? Thanks.
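For reference, the hadoop-site.xml entry would look something like this (8000 is Bryan's value; note the property name really is spelled with "xcievers"). It is read by the datanodes, so it needs to be set there and the datanodes restarted:

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>8000</value>
    </property>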
Re: Backing up HDFS?
On 2/9/09 4:41 PM, Amandeep Khurana ama...@gmail.com wrote: Why would you want to have another backup beyond HDFS? HDFS itself replicates your data, so the reliability of the system shouldn't be a concern (if at all it is)... I'm reminded of a previous job where a site administrator refused to make tape backups (despite our continual harassment and pointing out that he was in violation of the contract) because he said RAID was good enough. Then the RAID controller failed. When we couldn't recover data from the other mirror, he was fired. Not sure how they ever recovered, especially considering what the data was that they lost. Hopefully they had a paper trail. To answer Nathan's question: On Mon, Feb 9, 2009 at 4:17 PM, Nathan Marz nat...@rapleaf.com wrote: How do people back up their data that they keep on HDFS? We have many TB of data which we need to get backed up but are unclear on how to do this efficiently/reliably. The content of our HDFSes is loaded from elsewhere and is not considered 'the source of authority'. It is the responsibility of the original sources to maintain backups, and we then follow their policies for data retention. For user-generated content, we provide *limited* (read: quota'ed) NFS space that is backed up regularly. Another strategy we take is multiple grids in multiple locations that get the data loaded simultaneously. The key here is to prioritize your data. Impossible-to-replicate data gets backed up using whatever means necessary; hard-to-regenerate data is the next priority. Easy-to-regenerate and ok-to-nuke data doesn't get backed up.
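One tool worth naming for the copy step itself, though no one above mentions it: hadoop distcp runs a parallel, MapReduce-driven copy between filesystems, e.g. to a second grid. Hostnames, port, and paths below are made up:

    bin/hadoop distcp hdfs://nn1.example.com:8020/data hdfs://nn2.example.com:8020/backup/data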
Hadoop Workshop for College Teaching Faculty
Hey Hadoop Fans, I wanted to call your attention to an event we're putting on next month that would be great for your academic contacts. Please take a moment and forward this to any faculty you think might be interested. http://www.cloudera.com/sigcse-2009-disc-workshop One of the big challenges to Hadoop adoption is that it requires thinking about data and computation in new ways. One of the best things we can do as a community, long term, is help educators prepare their students to work with big data using Hadoop. This is a chance to help faculty impart skills that will continue to drive Hadoop adoption for years to come. Once a year, Computer Science educators from around the world gather at the ACM's Special Interest Group on Computer Science Education: SIGCSE. This year, Cloudera is hosting a day-long workshop at SIGCSE to introduce faculty to the MapReduce programming model, demonstrate how to integrate material into various types of courses, and go over some great sample projects for Hadoop. We'll also go over technical logistics around spinning up clusters on EC2 and getting free credits from Amazon for classroom use. A lot of this material is based on past work we have done with the National Science Foundation. That link again: http://www.cloudera.com/sigcse-2009-disc-workshop There is no charge for this event, and we'd love to see all your favorite computer science teachers there. Cheers, Christophe
Re: java.io.IOException: Could not get block locations. Aborting...
On Feb 9, 2009, at 7:50 PM, jason hadoop wrote: The other issue you may run into with many files in your HDFS is that you may end up with more than a few 100k worth of blocks on each of your datanodes. At present this can lead to instability due to the way the periodic block reports to the namenode are handled. The more blocks per datanode, the larger the risk of congestion collapse in your HDFS. Of course, if you stay below, say, 500k, you don't have much of a risk of congestion. In our experience, 500k blocks or less is going to be fine with decent hardware. Between 500k and 750k, you will hit a wall somewhere depending on your hardware. Good luck getting anything above 750k. The recommendation is that you keep this number as low as possible -- and explore the limits of your system and hardware in testing before you discover them in production :) Brian On Mon, Feb 9, 2009 at 5:11 PM, Bryan Duxbury br...@rapleaf.com wrote: Correct. +1 to Jason's more unix file handles suggestion. That's a must-have. -Bryan On Feb 9, 2009, at 3:09 PM, Scott Whitecross wrote: This would be an addition to the hadoop-site.xml file, to up dfs.datanode.max.xcievers? Thanks. On Feb 9, 2009, at 5:54 PM, Bryan Duxbury wrote: Small files are bad for Hadoop. You should avoid keeping a lot of small files if possible. That said, that error is something I've seen a lot. It usually happens when the number of xcievers hasn't been adjusted upwards from the default of 256. We run with 8000 xcievers, and that seems to solve our problems. I think that if you have a lot of open files, this problem happens a lot faster. -Bryan On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote: Hi all - I've been running into this error the past few days: java.io.IOException: Could not get block locations. Aborting... at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889) It seems to be related to trying to write too many files to HDFS. I have a class extending org.apache.hadoop.mapred.lib.MultipleOutputFormat, and if I output to a few file names, everything works. However, if I output to thousands of small files, the above error occurs. I'm having trouble isolating the problem, as the problem doesn't occur in the debugger, unfortunately. Is this a memory issue, or is there an upper limit to the number of files HDFS can hold? Any settings to adjust? Thanks.
Re: copyFromLocal *
Which version of hadoop are you using? I think from 0.18 or 0.19 copyFromLocal accepts multiple files as input, but the destination should be a directory. Lohit - Original Message From: S D sd.codewarr...@gmail.com To: Hadoop Mailing List core-user@hadoop.apache.org Sent: Monday, February 9, 2009 3:34:22 PM Subject: copyFromLocal * I'm using the Hadoop FS shell to move files into my data store (either HDFS or S3Native). I'd like to use wildcards with copyFromLocal, but this doesn't seem to work. Is there any way I can get that kind of functionality? Thanks, John
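For example, something like this should work (paths are illustrative): the local shell expands the wildcard into multiple source arguments before Hadoop sees them, and copyFromLocal then requires the destination to be a directory, per Lohit's note above:

    bin/hadoop fs -copyFromLocal /var/data/logs/*.log /user/john/logs/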
Re: using HDFS for a distributed storage system
I am planning to add the individual files initially, and after a while (let's say 2 days after insertion) will make a SequenceFile out of each directory (I am currently looking into SequenceFile) and delete the previous files of that directory from HDFS. That way, in the future, I can access any file given its directory without much effort. Have you considered Hadoop archives? http://hadoop.apache.org/core/docs/current/hadoop_archives.html Depending on your access pattern, you could store files in archives in the first place. - Original Message From: Brian Bockelman bbock...@cse.unl.edu To: core-user@hadoop.apache.org Sent: Monday, February 9, 2009 4:00:42 PM Subject: Re: using HDFS for a distributed storage system Hey Amit, That plan sounds much better. I think you will find the system much more scalable. From our experience, it takes a while to get the right amount of monitoring and infrastructure in place to have a very dependable system with 2 replicas. I would recommend using 3 replicas until you feel you've mastered the setup. Brian On Feb 9, 2009, at 4:27 PM, Amit Chandel wrote: Thanks Brian for your inputs. I am eventually targeting to store 200k directories, each containing 75 files on average, with the average size of a directory being 300MB (ranging from 50MB to 650MB), in this storage system. It will mostly be archival storage from which I should be able to access any of the old files easily, but the recent directories would be accessed frequently for a day or 2 as they are being added. They are added in batches of 500-1000 per week, and there can be rare bursts of adding 50k directories once in 3 months. One such burst is about to come in a month, and I want to test the current test setup against that burst. We have upgraded our test hardware a little bit from what I last mentioned. The test setup will have 3 DataNodes with 15TB space on each, 6G RAM, and a dual core processor, and a NameNode with 500G storage, 6G RAM, and a dual core processor. I am planning to add the individual files initially, and after a while (let's say 2 days after insertion) will make a SequenceFile out of each directory (I am currently looking into SequenceFile) and delete the previous files of that directory from HDFS. That way, in the future, I can access any file given its directory without much effort. Now that SequenceFile is in the picture, I can make the default block size 64MB or even 128MB. For replication, I am just replicating each file at 1 extra location (i.e. replication factor = 2, since a replication factor of 3 would leave me with only 33% of the usable storage). Regarding reading back from HDFS, if I can read at ~50MBps (for recent files), that would be enough. Let me know if you see any more pitfalls in this setup, or have more suggestions. I really appreciate it. Once I test this setup, I will put the results back to the list. Thanks, Amit On Mon, Feb 9, 2009 at 12:39 PM, Brian Bockelman bbock...@cse.unl.edu wrote: Hey Amit, Your current thoughts on keeping the block size larger and removing the very small files are along the right line. Why not choose the default size of 64MB or larger? You don't seem too concerned about the number of replicas. However, you're still fighting against the tide. You've got enough files that you'll be pushing against block report and namenode limitations, especially with 20-50 million files. We find that about 500k blocks per node is a good stopping point right now. You really, really need to figure out how to organize your files in such a way that the average file size is above 64MB.
Is there a primary key for each file? If so, maybe consider HBase? If you are just going to be sequentially scanning through all your files, consider archiving them all to a single sequence file. Your individual data nodes are quite large ... I hope you're not expecting to measure throughput in 10's of Gbps? It's hard to give advice without knowing more about your application. I can tell you that you're going to run into a significant wall if you can't figure out a means for making your average file size at least greater than 64MB. Brian On Feb 8, 2009, at 10:06 PM, Amit Chandel wrote: Hi Group, I am planning to use HDFS as a reliable and distributed file system for batch operations. No plans as of now to run any map reduce jobs on top of it, but in the future we will be having map reduce operations on files in HDFS. The current (test) system has 3 machines: NameNode: dual core CPU, 2GB RAM, 500GB HDD 2 DataNodes: both of them with a dual core CPU, 2GB of RAM and 1.5TB of space with an ext3 filesystem. I just need to put and retrieve files from this system. The files which I will put in HDFS vary from a few bytes to around 100MB, with the average file size being 5MB, and the number of files would grow to around 20-50 million. To avoid hitting the limit on the number of files under a directory, I store each file
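For reference, creating and reading a Hadoop archive (as lohit suggests above) looks roughly like this; the paths are made up, and the archive step itself runs as a MapReduce job:

    # pack one directory into a .har archive
    bin/hadoop archive -archiveName dir0001.har /user/amit/dir0001 /user/amit/archives

    # files inside remain addressable through the har:// scheme
    bin/hadoop fs -ls har:///user/amit/archives/dir0001.har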
Re: using HDFS for a distributed storage system
Yo, I don't want to sound all spammy, but Tom White wrote a pretty nice blog post about small files in HDFS recently that you might find helpful. The post covers some potential solutions, including Hadoop Archives: http://www.cloudera.com/blog/2009/02/02/the-small-files-problem. Later, Jeff On Mon, Feb 9, 2009 at 6:14 PM, lohit lohit...@yahoo.com wrote: I am planning to add the individual files initially, and after a while (let's say 2 days after insertion) will make a SequenceFile out of each directory (I am currently looking into SequenceFile) and delete the previous files of that directory from HDFS. That way, in the future, I can access any file given its directory without much effort. Have you considered Hadoop archives? http://hadoop.apache.org/core/docs/current/hadoop_archives.html Depending on your access pattern, you could store files in archives in the first place. - Original Message From: Brian Bockelman bbock...@cse.unl.edu To: core-user@hadoop.apache.org Sent: Monday, February 9, 2009 4:00:42 PM Subject: Re: using HDFS for a distributed storage system Hey Amit, That plan sounds much better. I think you will find the system much more scalable. From our experience, it takes a while to get the right amount of monitoring and infrastructure in place to have a very dependable system with 2 replicas. I would recommend using 3 replicas until you feel you've mastered the setup. Brian On Feb 9, 2009, at 4:27 PM, Amit Chandel wrote: Thanks Brian for your inputs. I am eventually targeting to store 200k directories, each containing 75 files on average, with the average size of a directory being 300MB (ranging from 50MB to 650MB), in this storage system. It will mostly be archival storage from which I should be able to access any of the old files easily, but the recent directories would be accessed frequently for a day or 2 as they are being added. They are added in batches of 500-1000 per week, and there can be rare bursts of adding 50k directories once in 3 months. One such burst is about to come in a month, and I want to test the current test setup against that burst. We have upgraded our test hardware a little bit from what I last mentioned. The test setup will have 3 DataNodes with 15TB space on each, 6G RAM, and a dual core processor, and a NameNode with 500G storage, 6G RAM, and a dual core processor. I am planning to add the individual files initially, and after a while (let's say 2 days after insertion) will make a SequenceFile out of each directory (I am currently looking into SequenceFile) and delete the previous files of that directory from HDFS. That way, in the future, I can access any file given its directory without much effort. Now that SequenceFile is in the picture, I can make the default block size 64MB or even 128MB. For replication, I am just replicating each file at 1 extra location (i.e. replication factor = 2, since a replication factor of 3 would leave me with only 33% of the usable storage). Regarding reading back from HDFS, if I can read at ~50MBps (for recent files), that would be enough. Let me know if you see any more pitfalls in this setup, or have more suggestions. I really appreciate it. Once I test this setup, I will put the results back to the list. Thanks, Amit On Mon, Feb 9, 2009 at 12:39 PM, Brian Bockelman bbock...@cse.unl.edu wrote: Hey Amit, Your current thoughts on keeping the block size larger and removing the very small files are along the right line. Why not choose the default size of 64MB or larger? You don't seem too concerned about the number of replicas.
However, you're still fighting against the tide. You've got enough files that you'll be pushing against block report and namenode limitations, especially with 20-50 million files. We find that about 500k blocks per node is a good stopping point right now. You really, really need to figure out how to organize your files in such a way that the average file size is above 64MB. Is there a primary key for each file? If so, maybe consider HBase? If you are just going to be sequentially scanning through all your files, consider archiving them all to a single sequence file. Your individual data nodes are quite large ... I hope you're not expecting to measure throughput in 10's of Gbps? It's hard to give advice without knowing more about your application. I can tell you that you're going to run into a significant wall if you can't figure out a means for making your average file size at least greater than 64MB. Brian On Feb 8, 2009, at 10:06 PM, Amit Chandel wrote: Hi Group, I am planning to use HDFS as a reliable and distributed file system for batch operations. No plans as of now to run any map reduce jobs on top of it, but in the future we will be having map reduce operations on files in HDFS. The current (test) system has 3 machines:
Re: using HDFS for a distributed storage system
It is a good and useful overview, thank you. It also mentions Stuart Sierra's post, where Stuart mentions that the process is slow. Does anybody know why? I have written code to write from the PC file system to HDFS, and I also noticed that it is very slow. Instead of 40MB/sec, as promised by Tom White's book, it seems to be 40 sec/MB. Stuart's tars would work about 5 times faster. But still, why is it so slow? Is there a way to speed this up? Thanks! Mark On Mon, Feb 9, 2009 at 8:35 PM, Jeff Hammerbacher ham...@cloudera.com wrote: Yo, I don't want to sound all spammy, but Tom White wrote a pretty nice blog post about small files in HDFS recently that you might find helpful. The post covers some potential solutions, including Hadoop Archives: http://www.cloudera.com/blog/2009/02/02/the-small-files-problem. Later, Jeff On Mon, Feb 9, 2009 at 6:14 PM, lohit lohit...@yahoo.com wrote: I am planning to add the individual files initially, and after a while (let's say 2 days after insertion) will make a SequenceFile out of each directory (I am currently looking into SequenceFile) and delete the previous files of that directory from HDFS. That way, in the future, I can access any file given its directory without much effort. Have you considered Hadoop archives? http://hadoop.apache.org/core/docs/current/hadoop_archives.html Depending on your access pattern, you could store files in archives in the first place. - Original Message From: Brian Bockelman bbock...@cse.unl.edu To: core-user@hadoop.apache.org Sent: Monday, February 9, 2009 4:00:42 PM Subject: Re: using HDFS for a distributed storage system Hey Amit, That plan sounds much better. I think you will find the system much more scalable. From our experience, it takes a while to get the right amount of monitoring and infrastructure in place to have a very dependable system with 2 replicas. I would recommend using 3 replicas until you feel you've mastered the setup. Brian On Feb 9, 2009, at 4:27 PM, Amit Chandel wrote: Thanks Brian for your inputs. I am eventually targeting to store 200k directories, each containing 75 files on average, with the average size of a directory being 300MB (ranging from 50MB to 650MB), in this storage system. It will mostly be archival storage from which I should be able to access any of the old files easily, but the recent directories would be accessed frequently for a day or 2 as they are being added. They are added in batches of 500-1000 per week, and there can be rare bursts of adding 50k directories once in 3 months. One such burst is about to come in a month, and I want to test the current test setup against that burst. We have upgraded our test hardware a little bit from what I last mentioned. The test setup will have 3 DataNodes with 15TB space on each, 6G RAM, and a dual core processor, and a NameNode with 500G storage, 6G RAM, and a dual core processor. I am planning to add the individual files initially, and after a while (let's say 2 days after insertion) will make a SequenceFile out of each directory (I am currently looking into SequenceFile) and delete the previous files of that directory from HDFS. That way, in the future, I can access any file given its directory without much effort. Now that SequenceFile is in the picture, I can make the default block size 64MB or even 128MB. For replication, I am just replicating each file at 1 extra location (i.e. replication factor = 2, since a replication factor of 3 would leave me with only 33% of the usable storage).
Regarding reading back from HDFS, if I can read at ~50MBps (for recent files), that would be enough. Let me know if you see any more pitfalls in this setup, or have more suggestions. I really appreciate it. Once I test this setup, I will put the results back to the list. Thanks, Amit On Mon, Feb 9, 2009 at 12:39 PM, Brian Bockelman bbock...@cse.unl.edu wrote: Hey Amit, Your current thoughts on keeping the block size larger and removing the very small files are along the right line. Why not choose the default size of 64MB or larger? You don't seem too concerned about the number of replicas. However, you're still fighting against the tide. You've got enough files that you'll be pushing against block report and namenode limitations, especially with 20-50 million files. We find that about 500k blocks per node is a good stopping point right now. You really, really need to figure out how to organize your files in such a way that the average file size is above 64MB. Is there a primary key for each file? If so, maybe consider HBase? If you are just going to be sequentially scanning through all your files, consider archiving them all to a single sequence file. Your individual data nodes are
Re: java.io.IOException: Could not get block locations. Aborting...
I tried modifying the settings, and I'm still running into the same issue. I increased the xceivers count (dfs.datanode.max.xcievers) in the hadoop-site.xml file. I also checked to make sure the file handle limits were increased, but they were fairly high to begin with. I don't think I'm dealing with anything out of the ordinary either. I'm processing three large 'log' files, totaling around 5 GB, and producing around 8000 output files after some data processing, which probably total 6 or 7 GB. In the past, I've produced far fewer files, and that has been fine. When I change the process to output to just a few files, again no problem. Anything else beyond the limits? Is HDFS creating a substantial number of temp files as well? On Feb 9, 2009, at 8:11 PM, Bryan Duxbury wrote: Correct. +1 to Jason's more unix file handles suggestion. That's a must-have. -Bryan On Feb 9, 2009, at 3:09 PM, Scott Whitecross wrote: This would be an addition to the hadoop-site.xml file, to up dfs.datanode.max.xcievers? Thanks. On Feb 9, 2009, at 5:54 PM, Bryan Duxbury wrote: Small files are bad for hadoop. You should avoid keeping a lot of small files if possible. That said, that error is something I've seen a lot. It usually happens when the number of xcievers hasn't been adjusted upwards from the default of 256. We run with 8000 xcievers, and that seems to solve our problems. I think that if you have a lot of open files, this problem happens a lot faster. -Bryan On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote: Hi all - I've been running into this error the past few days: java.io.IOException: Could not get block locations. Aborting... at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889) It seems to be related to trying to write too many files to HDFS. I have a class extending org.apache.hadoop.mapred.lib.MultipleOutputFormat, and if I output to a few file names, everything works. However, if I output to thousands of small files, the above error occurs. I'm having trouble isolating the problem, as it unfortunately doesn't occur in the debugger. Is this a memory issue, or is there an upper limit to the number of files HDFS can hold? Any settings to adjust? Thanks.
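For reference, the setting the thread converges on is an ordinary datanode property in hadoop-site.xml; a sketch using Bryan's value of 8000 (note that the misspelling "xcievers" is how the property is actually named, and the datanodes need a restart to pick it up):

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>8000</value>
</property>

As Bryan and Jason suggest, the OS file-handle limit (ulimit -n) on the datanode hosts should be raised alongside it, since each xceiver thread can hold descriptors open.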
Re: lost TaskTrackers
I am starting to wonder if Hadoop 0.19 is stable enough for production? Vadim On 2/9/09, Vadim Zaliva kroko...@gmail.com wrote: ...
In the task tracker log, the last 2 exceptions before these repeating messages are: 2009-02-08 17:46:51,482 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200902081049_0001_m_075408_3 2009-02-08 17:46:51,482 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200902081049_0001_m_075408_3 2009-02-08 17:46:51,482 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 8 and trying to launch attempt_200902081049_0001_m_075408_3 2009-02-08 17:46:51,483 WARN org.apache.hadoop.mapred.TaskTracker: Error initializing attempt_200902081049_0001_m_075408_3: java.lang.NullPointerException at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:459) at org.apache.hadoop.ipc.Client.call(Client.java:686) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216) at $Proxy5.getFileInfo(Unknown Source) at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) at $Proxy5.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:578) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:390) at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:699) at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1636) at org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:102) at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1602) Looks like an RPC issue. Can you tell more about the
Re: TaskTrackers being double counted after restart job recovery
Stefan Will wrote: Hi, I'm using the new persistent job state feature in 0.19.0, and it's worked really well so far. However, this morning my JobTracker died with an OOM error (even though the heap size is set to 768M). So I killed it and all the TaskTrackers. Any specific reason why you killed the task-trackers? Ideally just the JobTracker should be restarted, and the task-trackers will rejoin. After starting everything up again, all my nodes were showing up twice in the JobTracker web interface, with different port numbers. Owen is correct. Since the state is rebuilt from history, the old tracker information is obtained from the history, hence the double entries. Killing the tasktracker after killing the jobtracker is like losing the tasktracker while the jobtracker is down. Upon restart, the jobtracker assumes that the tasktracker mentioned in the history is still available and waits for it to re-connect. After the expiry interval (default 10 mins), the old tracker will be removed. Also, some of the jobs it restarted had already been completed when the job tracker died. Old-job detection happens via the system directory. When a job is submitted, its info (job.xml etc.) is copied to the mapred system dir, and upon completion it is removed from there. Upon restart, the mapred-system-dir is checked to see which jobs need to be resumed/re-run. So if the job folder/info has not been cleaned up from the system-dir, the job will be resumed. But if the job was complete, then the job logs should say that it is complete, and upon restart the jobtracker will simply finish/complete the job without running any tasks. Are you seeing something different here? Look at the jobtracker logs to see what is happening in the recovery. The line "Restoration complete" marks the end of recovery. Any idea what might be happening here? How can I fix this? Will temporarily setting mapred.jobtracker.restart.recover=false clear things up? You can manually delete job files from mapred.system.dir to avoid resuming that job. Amar -- Stefan
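For reference, the recovery switch discussed above is itself just a hadoop-site.xml property; a sketch of turning it off while cleaning up (in 0.19 it defaults to false, so this only matters if you had explicitly enabled it):

<property>
  <name>mapred.jobtracker.restart.recover</name>
  <value>false</value>
</property>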
what's going on :( ?
Hi, why is hadoop suddenly telling me Retrying connect to server: localhost/127.0.0.1:8020 with this configuration

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

and both the http://localhost:50070/dfshealth.jsp and http://localhost:50030/jobtracker.jsp links work fine? Thank you, Mark
Re: what's going on :( ?
Mark Kerzner wrote: Hi, why is hadoop suddenly telling me Retrying connect to server: localhost/127.0.0.1:8020 with this configuration

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>

Shouldn't this be <value>hdfs://localhost:9001</value>? Amar

  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

and both the http://localhost:50070/dfshealth.jsp and http://localhost:50030/jobtracker.jsp links work fine? Thank you, Mark
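A note on the symptom itself: 8020 is the port HDFS clients fall back to when fs.default.name carries no port, so "Retrying connect to server: localhost/127.0.0.1:8020" usually means the client is not seeing the hadoop-site.xml that names port 9000 at all (wrong classpath or HADOOP_CONF_DIR). A minimal, hypothetical sketch for checking what the client actually resolves (the class name is made up for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// Prints the fs.default.name the client resolves. If hadoop-site.xml sets
// hdfs://localhost:9000 but this prints a default value or an :8020 URI,
// the config file is not on the client's classpath.
public class WhichNameNode {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    System.out.println("fs.default.name = " + conf.get("fs.default.name"));
    System.out.println("filesystem URI  = " + FileSystem.get(conf).getUri());
  }
}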