RE: More Replication on dfs
Hi, I also tried the command $ bin/hadoop balancer, but I still have the same problem. Aseem -Original Message- From: Puri, Aseem [mailto:aseem.p...@honeywell.com] Sent: Friday, April 10, 2009 11:18 AM To: core-user@hadoop.apache.org Subject: RE: More Replication on dfs Hi Alex, Thanks for sharing your knowledge. I have three machines so far, and since I want to check the behavior of Hadoop, I want the replication factor to be 2. I started my Hadoop server with a replication factor of 3. After that I uploaded 3 files to run the word count program. But as all my files are stored on one machine and replicated to the other datanodes, my map reduce program takes input from one Datanode only. I want my files to be on different data nodes so I can check the functionality of map reduce properly. Also, before starting my Hadoop server again with replication factor 2, I formatted all Datanodes and deleted all the old data manually. Please suggest what I should do now. Regards, Aseem Puri -Original Message- From: Mithila Nagendra [mailto:mnage...@asu.edu] Sent: Friday, April 10, 2009 10:56 AM To: core-user@hadoop.apache.org Subject: Re: More Replication on dfs To add to the question, how does one decide what the optimal replication factor for a cluster is? For instance, what would be the appropriate replication factor for a cluster consisting of 5 nodes? Mithila On Fri, Apr 10, 2009 at 8:20 AM, Alex Loddengaard a...@cloudera.com wrote: Did you load any files when replication was set to 3? If so, you'll have to rebalance: http://hadoop.apache.org/core/docs/r0.19.1/commands_manual.html#balancer http://hadoop.apache.org/core/docs/r0.19.1/hdfs_user_guide.html#Rebalancer Note that most people run HDFS with a replication factor of 3. There have been cases when clusters running with a replication of 2 discovered new bugs, because replication is so often set to 3. That said, if you can do it, it's probably advisable to run with a replication factor of 3 instead of 2. Alex On Thu, Apr 9, 2009 at 9:56 PM, Puri, Aseem aseem.p...@honeywell.com wrote: Hi, I am a new Hadoop user. I have a small cluster with 3 Datanodes. In hadoop-site.xml the value of the dfs.replication property is 2, but it is still replicating data on 3 machines. Please tell me why this is happening? Regards, Aseem Puri
Re: Can we somehow read from the HDFS without converting it to local?
Not sure if this is what you're looking for... http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample On Thu, Apr 9, 2009 at 10:56 PM, Sid123 itis...@gmail.com wrote: I need to reuse the output of my DFS file without copying it to local. Is there a way?
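For reference, a minimal (untested) sketch of the kind of thing that wiki example covers: opening a file that lives in HDFS and reading it as a stream, without copying it to the local filesystem first. The NameNode URI and the path below are hypothetical placeholders.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadFromHdfs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical NameNode address -- normally picked up from hadoop-site.xml
            conf.set("fs.default.name", "hdfs://namenode:9000");

            FileSystem fs = FileSystem.get(conf);
            Path path = new Path("/user/sid/output/part-00000"); // hypothetical path

            // The blocks are streamed straight from the DataNodes; nothing is written to local disk
            FSDataInputStream in = fs.open(path);
            BufferedReader reader = new BufferedReader(new InputStreamReader(in));
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
            reader.close();
        }
    }

Since FSDataInputStream is a regular java.io.InputStream, it can be handed to any code that consumes a stream, which is usually what "reusing the output without copying to local" comes down to.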
Add second partition to HDFS
Hi, On a particular datanode I have /dev/sda1 (root partition), which hdfs shows properly. Now I have added another disk, which appears as /dev/sdb1. What changes do I need to make in hadoop-site.xml so that the second disk is also used? --Harshal
Re: HDFS read/write speeds, and read optimization
Hi. Hypertable (a BigTable implementation) has a good KFS vs. HDFS breakdown: http://code.google.com/p/hypertable/wiki/KFSvsHDFS From this comparison it seems KFS is quite a bit faster than HDFS for small data transfers (for SQL commands). Any idea if the same holds true for small-to-medium (20 MB - 150 MB) files? 2) If the chunk server is located on the same host as the client, is there any optimization in read operations? For example, Kosmos FS describes the following functionality: Localhost optimization: one copy of the data is placed on the chunkserver on the same host as the client doing the write, which helps reduce network traffic. In Hadoop-speak, we're interested in DataNodes (storage nodes) and TaskTrackers (compute nodes). In terms of MapReduce, Hadoop does try to schedule tasks such that the data being processed by a given task on a given machine is also on that machine. As for loading data onto a DataNode: loading data from a DataNode will put a replica on that node. However, if you're loading data from, say, your local machine, Hadoop will choose a DataNode at random. Ah, so if a DataNode stores a file to HDFS, it would try to place a replica on that same DataNode as well? And then if this DataNode tries to read the file, HDFS would try to read it from itself first? Regards.
Re: HDFS read/write speeds, and read optimization
Hi. Depends. What hardware? How much hardware? Is the cluster under load? What does your I/O load look like? As a rule of thumb, you'll probably expect very close to hardware speed. Standard dual-CPU, quad-core Xeon servers with 4 GB RAM. The DataNodes also do some processing, with usual loads of about 4 (out of the recommended 8). The I/O load is linear; there are almost no write or read peaks. By close to hardware speed, do you mean results very near the results I get via iozone? Thanks.
Re: Add second partition to HDFS
Add your second disk's path to dfs.data.dir. Refer to http://hadoop.apache.org/core/docs/r0.19.1/cluster_setup.html dfs.data.dir = Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. -- Ravi On 4/10/09 6:01 AM, Harshal p.hars...@gmail.com wrote: Hi, On a particular datanode I have /dev/sda1 (root partition), which hdfs shows properly. Now I have added another disk, which appears as /dev/sdb1. What changes do I need to make in hadoop-site.xml so that the second disk is also used? --Harshal
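For illustration only: assuming the existing data directory is /mnt/disk1/hdfs/data and the new disk is mounted at /mnt/disk2 (both paths are hypothetical), the hadoop-site.xml entry might look like the following. The directories must exist and be writable by the user running the DataNode, and the DataNode has to be restarted afterwards (as noted later in the thread).

    <property>
      <name>dfs.data.dir</name>
      <value>/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data</value>
    </property>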
Does the HDFS client read the data from NameNode, or from DataNode directly?
Hi. I wanted to verify a point about HDFS client operations: when asking for a file, is all the communication done through the NameNode? Or, after being pointed to the correct DataNode, does HDFS work directly against it? Also, the NameNode provides a URL named streamFile which allows any HTTP client to get the stored files. Any idea how its operations compare in terms of speed to client HDFS access? Regards.
Thin version of Hadoop jar for client
Hi. Is there any thin version of the Hadoop jar, specifically for a Java client? Regards.
Re: HDFS read/write speeds, and read optimization
On Thu, Apr 9, 2009 at 9:30 PM, Brian Bockelman bbock...@cse.unl.edu wrote: On Apr 9, 2009, at 5:45 PM, Stas Oskin wrote: Hi. I have 2 questions about HDFS performance: 1) How fast are the read and write operations over the network, in Mbps? Depends. What hardware? How much hardware? Is the cluster under load? What does your I/O load look like? As a rule of thumb, you'll probably expect very close to hardware speed. For comparison, on a 1400 node cluster, I can checksum 100 TB in around 10 minutes, which means I'm seeing read averages of roughly 166 GB/sec. For writes with replication of 3, I see roughly 40-50 minutes to write 100 TB, so roughly 33 GB/sec average. Of course the peaks are much higher. Each node has 4 SATA disks, dual quad core, and 8 GB of RAM. -- Owen
Re: HDFS read/write speeds, and read optimization
Thanks for sharing. For comparison, on a 1400 node cluster, I can checksum 100 TB in around 10 minutes, which means I'm seeing read averages of roughly 166 GB/sec. For writes with replication of 3, I see roughly 40-50 minutes to write 100 TB, so roughly 33 GB/sec average. Of course the peaks are much higher. Each node has 4 SATA disks, dual quad core, and 8 GB of RAM. From your experience, how RAM-hungry is HDFS? Meaning, would an additional 4 GB of RAM (to make it 8 GB, as in your case) really change anything? Regards.
Two degrees of replications reliability
Hi. I know that there were some hard-to-find bugs with replication set to 2, which caused data loss for HDFS users. Has there been any progress on these issues, and were any fixes introduced? Regards.
Re: HDFS read/write speeds, and read optimization
On Apr 10, 2009, at 9:07 AM, Stas Oskin wrote: From your experience, how RAM-hungry is HDFS? Meaning, would an additional 4 GB of RAM (to make it 8 GB, as in your case) really change anything? I don't think the 4 to 8 GB would matter much for HDFS. For Map/Reduce, it is very important. -- Owen
Re: Two degrees of replications reliability
Most of the issues were resolved in 0.19.1 -- I think 0.20.0 is going to be even better. We run about 300TB @ 2 replicas, and haven't had file loss that was Hadoop's fault since about January. Brian On Apr 10, 2009, at 11:11 AM, Stas Oskin wrote: Hi. I know that there were some hard-to-find bugs with replication set to 2, which caused data loss for HDFS users. Has there been any progress on these issues, and were any fixes introduced? Regards.
Re: [Interesting] One reducer randomly hangs on getting 0 mapper output
Does anybody have a clue? Thanks a lot. --- On Thu, 4/9/09, Steve Gao steve@yahoo.com wrote: From: Steve Gao steve@yahoo.com Subject: [Interesting] One reducer randomly hangs on getting 0 mapper output To: core-user@hadoop.apache.org Date: Thursday, April 9, 2009, 6:04 PM I have hadoop jobs where the last reducer randomly hangs on getting 0 mapper output. By randomly I mean the job sometimes works correctly; sometimes the last reducer keeps reading map output but always gets 0 data. It can hang for up to 100 hours getting 0 data until I kill it. After I kill and re-run it, it runs correctly. The hung reducer can happen on any machine of my cluster. I attach the tail of the problematic reducer's log here. Does anybody have a hint about what happened? syslog logs:
2009-04-09 21:57:46,445 INFO org.apache.hadoop.mapred.ReduceTask: task_200902022141_50382_r_08_0 Need 15 map output(s)
2009-04-09 21:57:46,446 INFO org.apache.hadoop.mapred.ReduceTask: task_200902022141_50382_r_08_0: Got 0 new map-outputs 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
2009-04-09 21:57:46,446 INFO org.apache.hadoop.mapred.ReduceTask: task_200902022141_50382_r_08_0 Got 0 known map output location(s); scheduling...
2009-04-09 21:57:46,446 INFO org.apache.hadoop.mapred.ReduceTask: task_200902022141_50382_r_08_0 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
2009-04-09 21:57:51,453 INFO org.apache.hadoop.mapred.ReduceTask: task_200902022141_50382_r_08_0 Need 15 map output(s)
2009-04-09 21:57:51,460 INFO org.apache.hadoop.mapred.ReduceTask: task_200902022141_50382_r_08_0: Got 0 new map-outputs 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
2009-04-09 21:57:51,460 INFO org.apache.hadoop.mapred.ReduceTask: task_200902022141_50382_r_08_0 Got 0 known map output location(s); scheduling...
2009-04-09 21:57:51,460 INFO org.apache.hadoop.mapred.ReduceTask: task_200902022141_50382_r_08_0 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
... (forever)
Re: More Replication on dfs
Mithila, Most people run with a replication of 3. 3 replicas gets you one local copy, one copy on a different rack, and an additional copy on that same rack. Quantcast gave a talk at a user group a while ago about physically moving a data center from one colo to another. Turning off machines and moving them increases the probability of those machines going down, so Quantcast upped their replication to 7, I believe. Once the move was done, they lowered their replication back to whatever it was set to previously, which I think was 3. So anyway, a replication factor of 3 is totally sufficient, unless you've come across a particular case where node failure is higher than normal, for example maybe if you're running super unreliable hardware. Alex On Thu, Apr 9, 2009 at 10:26 PM, Mithila Nagendra mnage...@asu.edu wrote: To add to the question, how does one decide what the optimal replication factor for a cluster is? For instance, what would be the appropriate replication factor for a cluster consisting of 5 nodes? Mithila On Fri, Apr 10, 2009 at 8:20 AM, Alex Loddengaard a...@cloudera.com wrote: Did you load any files when replication was set to 3? If so, you'll have to rebalance: http://hadoop.apache.org/core/docs/r0.19.1/commands_manual.html#balancer http://hadoop.apache.org/core/docs/r0.19.1/hdfs_user_guide.html#Rebalancer Note that most people run HDFS with a replication factor of 3. There have been cases when clusters running with a replication of 2 discovered new bugs, because replication is so often set to 3. That said, if you can do it, it's probably advisable to run with a replication factor of 3 instead of 2. Alex On Thu, Apr 9, 2009 at 9:56 PM, Puri, Aseem aseem.p...@honeywell.com wrote: Hi, I am a new Hadoop user. I have a small cluster with 3 Datanodes. In hadoop-site.xml the value of the dfs.replication property is 2, but it is still replicating data on 3 machines. Please tell me why this is happening? Regards, Aseem Puri
Re: More Replication on dfs
Aseem, How are you verifying that blocks are not being replicated? Have you run fsck? *bin/hadoop fsck /* I'd be surprised if replication really wasn't happening. Can you run fsck and pay attention to Under-replicated blocks and Mis-replicated blocks? In fact, can you just copy-paste the output of fsck? Alex On Thu, Apr 9, 2009 at 11:23 PM, Puri, Aseem aseem.p...@honeywell.com wrote: Hi, I also tried the command $ bin/hadoop balancer, but I still have the same problem. Aseem -Original Message- From: Puri, Aseem [mailto:aseem.p...@honeywell.com] Sent: Friday, April 10, 2009 11:18 AM To: core-user@hadoop.apache.org Subject: RE: More Replication on dfs Hi Alex, Thanks for sharing your knowledge. I have three machines so far, and since I want to check the behavior of Hadoop, I want the replication factor to be 2. I started my Hadoop server with a replication factor of 3. After that I uploaded 3 files to run the word count program. But as all my files are stored on one machine and replicated to the other datanodes, my map reduce program takes input from one Datanode only. I want my files to be on different data nodes so I can check the functionality of map reduce properly. Also, before starting my Hadoop server again with replication factor 2, I formatted all Datanodes and deleted all the old data manually. Please suggest what I should do now. Regards, Aseem Puri -Original Message- From: Mithila Nagendra [mailto:mnage...@asu.edu] Sent: Friday, April 10, 2009 10:56 AM To: core-user@hadoop.apache.org Subject: Re: More Replication on dfs To add to the question, how does one decide what the optimal replication factor for a cluster is? For instance, what would be the appropriate replication factor for a cluster consisting of 5 nodes? Mithila On Fri, Apr 10, 2009 at 8:20 AM, Alex Loddengaard a...@cloudera.com wrote: Did you load any files when replication was set to 3? If so, you'll have to rebalance: http://hadoop.apache.org/core/docs/r0.19.1/commands_manual.html#balancer http://hadoop.apache.org/core/docs/r0.19.1/hdfs_user_guide.html#Rebalancer Note that most people run HDFS with a replication factor of 3. There have been cases when clusters running with a replication of 2 discovered new bugs, because replication is so often set to 3. That said, if you can do it, it's probably advisable to run with a replication factor of 3 instead of 2. Alex On Thu, Apr 9, 2009 at 9:56 PM, Puri, Aseem aseem.p...@honeywell.com wrote: Hi, I am a new Hadoop user. I have a small cluster with 3 Datanodes. In hadoop-site.xml the value of the dfs.replication property is 2, but it is still replicating data on 3 machines. Please tell me why this is happening? Regards, Aseem Puri
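For reference, fsck can also print per-file and per-block detail, which makes it easy to see the actual replication of each block, for example:

    bin/hadoop fsck / -files -blocks -locations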
Re: Add second partition to HDFS
Make sure you bounce the datanode daemon once you change the configuration file as well. Alex On Fri, Apr 10, 2009 at 8:23 AM, Ravi Phulari rphul...@yahoo-inc.comwrote: Add your second disk name in dfs.data.dir . Refer - http://hadoop.apache.org/core/docs/r0.19.1/cluster_setup.html dfs.data.dir = Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. -- Ravi On 4/10/09 6:01 AM, Harshal p.hars...@gmail.com wrote: Hi, On a particular datanode I have /dev/sda1 (root partition) which hdfs shows properly. Now I have added another disk which is appearing as /dev/sdb1, what changes I need to make in hadoop-site.xml so that the second disk is also used ? --Harshal Ravi --
Multithreaded Reducer
Hi, I would like to implement a multi-threaded reducer. As per my understanding, the system does not have one because we expect the output to be sorted. However, in my case I don't need the output sorted. Can you please point me to any other issues, or would it be safe to do so? -Sagar
Re: Thin version of Hadoop jar for client
Not currently, sorry. - Aaron On Fri, Apr 10, 2009 at 8:35 AM, Stas Oskin stas.os...@gmail.com wrote: Hi. Is there any thin version of the Hadoop jar, specifically for a Java client? Regards.
Re: More Replication on dfs
Changing the default replication in hadoop-site.xml does not affect files already loaded into HDFS. The file replication factor is controlled on a per-file basis. You need to use the command `hadoop fs -setrep n path...` to set the replication factor to n for a particular path already present in HDFS. It can also take a -R for recursive. - Aaron On Fri, Apr 10, 2009 at 10:34 AM, Alex Loddengaard a...@cloudera.com wrote: Aseem, How are you verifying that blocks are not being replicated? Have you run fsck? *bin/hadoop fsck /* I'd be surprised if replication really wasn't happening. Can you run fsck and pay attention to Under-replicated blocks and Mis-replicated blocks? In fact, can you just copy-paste the output of fsck? Alex On Thu, Apr 9, 2009 at 11:23 PM, Puri, Aseem aseem.p...@honeywell.com wrote: Hi, I also tried the command $ bin/hadoop balancer, but I still have the same problem. Aseem -Original Message- From: Puri, Aseem [mailto:aseem.p...@honeywell.com] Sent: Friday, April 10, 2009 11:18 AM To: core-user@hadoop.apache.org Subject: RE: More Replication on dfs Hi Alex, Thanks for sharing your knowledge. I have three machines so far, and since I want to check the behavior of Hadoop, I want the replication factor to be 2. I started my Hadoop server with a replication factor of 3. After that I uploaded 3 files to run the word count program. But as all my files are stored on one machine and replicated to the other datanodes, my map reduce program takes input from one Datanode only. I want my files to be on different data nodes so I can check the functionality of map reduce properly. Also, before starting my Hadoop server again with replication factor 2, I formatted all Datanodes and deleted all the old data manually. Please suggest what I should do now. Regards, Aseem Puri -Original Message- From: Mithila Nagendra [mailto:mnage...@asu.edu] Sent: Friday, April 10, 2009 10:56 AM To: core-user@hadoop.apache.org Subject: Re: More Replication on dfs To add to the question, how does one decide what the optimal replication factor for a cluster is? For instance, what would be the appropriate replication factor for a cluster consisting of 5 nodes? Mithila On Fri, Apr 10, 2009 at 8:20 AM, Alex Loddengaard a...@cloudera.com wrote: Did you load any files when replication was set to 3? If so, you'll have to rebalance: http://hadoop.apache.org/core/docs/r0.19.1/commands_manual.html#balancer http://hadoop.apache.org/core/docs/r0.19.1/hdfs_user_guide.html#Rebalancer Note that most people run HDFS with a replication factor of 3. There have been cases when clusters running with a replication of 2 discovered new bugs, because replication is so often set to 3. That said, if you can do it, it's probably advisable to run with a replication factor of 3 instead of 2. Alex On Thu, Apr 9, 2009 at 9:56 PM, Puri, Aseem aseem.p...@honeywell.com wrote: Hi, I am a new Hadoop user. I have a small cluster with 3 Datanodes. In hadoop-site.xml the value of the dfs.replication property is 2, but it is still replicating data on 3 machines. Please tell me why this is happening? Regards, Aseem Puri
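For example (the path is hypothetical), to drop already-loaded files to 2 replicas recursively and then check the result with fsck:

    bin/hadoop fs -setrep -R 2 /user/aseem/wordcount-input
    bin/hadoop fsck /user/aseem/wordcount-input -files -blocks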
Re: Multithreaded Reducer
Rather than implementing a multi-threaded reducer, why not simply increase the number of reducer tasks per machine via mapred.tasktracker.reduce.tasks.maximum, and increase the total number of reduce tasks per job via mapred.reduce.tasks to ensure that they're all filled? This will effectively utilize a higher number of cores. - Aaron On Fri, Apr 10, 2009 at 11:12 AM, Sagar Naik sn...@attributor.com wrote: Hi, I would like to implement a multi-threaded reducer. As per my understanding, the system does not have one because we expect the output to be sorted. However, in my case I don't need the output sorted. Can you please point me to any other issues, or would it be safe to do so? -Sagar
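For illustration, a sketch of where those two knobs live; the values are made up. The per-node cap goes into hadoop-site.xml on each TaskTracker, while the per-job total is usually set in the job driver (JobConf.setNumReduceTasks) or as a job property:

    <!-- hadoop-site.xml on each TaskTracker: reduce tasks allowed to run concurrently on this node -->
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>4</value>
    </property>

    <!-- default number of reduce tasks per job; often overridden per job via JobConf.setNumReduceTasks() -->
    <property>
      <name>mapred.reduce.tasks</name>
      <value>16</value>
    </property>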
Re: Multithreaded Reducer
Two things. First, multi-threading is preferred over multiple processes: the process I'm planning is IO bound, so I can really take advantage of multiple threads (100 threads) - correct me if I'm wrong. Second, the next MR job in the pipeline will have an increased number of splits to process, as the number of reducer outputs (from the previous job) has increased; this leads to an increase in map-task completion time. -Sagar Aaron Kimball wrote: Rather than implementing a multi-threaded reducer, why not simply increase the number of reducer tasks per machine via mapred.tasktracker.reduce.tasks.maximum, and increase the total number of reduce tasks per job via mapred.reduce.tasks to ensure that they're all filled? This will effectively utilize a higher number of cores. - Aaron On Fri, Apr 10, 2009 at 11:12 AM, Sagar Naik sn...@attributor.com wrote: Hi, I would like to implement a multi-threaded reducer. As per my understanding, the system does not have one because we expect the output to be sorted. However, in my case I don't need the output sorted. Can you please point me to any other issues, or would it be safe to do so? -Sagar
Re: Does the HDFS client read the data from NameNode, or from DataNode directly?
Thanks, this is what I thought. Regards. 2009/4/10 Alex Loddengaard a...@cloudera.com Data is streamed directly from the data nodes themselves. The name node is only queried for block locations and other meta data. Alex On Fri, Apr 10, 2009 at 8:33 AM, Stas Oskin stas.os...@gmail.com wrote: Hi. I wanted to verify a point about HDFS client operations: When asking for file, is the all communication done through the NameNode? Or after being pointed to correct DataNode, does the HDFS works directly against it? Also, NameNode provides a URL named streamFile which allows any HTTP client to get the stored files. Any idea how it's operations compare in terms of speed to client HDFS access? Regards.
Re: HDFS read/write speeds, and read optimization
Hi. Depends on what kind of I/O you do - are you going to be using MapReduce and co-locating jobs and data? If so, it's possible to get close to those speeds if you are I/O bound in your job and read right through each chunk. If you have multiple disks mounted individually, you'll need the number of streams equal to the number of disks. If you're going to do I/O that's not through MapReduce, you'll probably be bound by the network interface. Btw, this is what I wanted to ask as well: is it more efficient to unify the disks into one volume (RAID or LVM) and then present them as a single space, or is it better to specify each disk separately? Reliability-wise, the latter sounds more correct, as a single disk (or several, up to 3) going down won't take the whole node with it. But perhaps there is a performance penalty?
Re: Two degrees of replications reliability
2009/4/10 Brian Bockelman bbock...@cse.unl.edu Most of the issues were resolved in 0.19.1 -- I think 0.20.0 is going to be even better. We run about 300TB @ 2 replicas, and haven't had file loss that was Hadoop's fault since about January. Brian And are you running 0.19.1? Regards.
Re: Multithreaded Reducer
At that level of parallelism, you're right that the process overhead would be too high. - Aaron On Fri, Apr 10, 2009 at 11:36 AM, Sagar Naik sn...@attributor.com wrote: Two things. First, multi-threading is preferred over multiple processes: the process I'm planning is IO bound, so I can really take advantage of multiple threads (100 threads) - correct me if I'm wrong. Second, the next MR job in the pipeline will have an increased number of splits to process, as the number of reducer outputs (from the previous job) has increased; this leads to an increase in map-task completion time. -Sagar Aaron Kimball wrote: Rather than implementing a multi-threaded reducer, why not simply increase the number of reducer tasks per machine via mapred.tasktracker.reduce.tasks.maximum, and increase the total number of reduce tasks per job via mapred.reduce.tasks to ensure that they're all filled? This will effectively utilize a higher number of cores. - Aaron On Fri, Apr 10, 2009 at 11:12 AM, Sagar Naik sn...@attributor.com wrote: Hi, I would like to implement a multi-threaded reducer. As per my understanding, the system does not have one because we expect the output to be sorted. However, in my case I don't need the output sorted. Can you please point me to any other issues, or would it be safe to do so? -Sagar
Re: Two degrees of replications reliability
Actually, now I remember that you posted some time ago about your University losing about 300 files. So since then the situation has improved, I presume? 2009/4/10 Stas Oskin stas.os...@gmail.com 2009/4/10 Brian Bockelman bbock...@cse.unl.edu Most of the issues were resolved in 0.19.1 -- I think 0.20.0 is going to be even better. We run about 300TB @ 2 replicas, and haven't had file loss that was Hadoop's fault since about January. Brian And are you running 0.19.1? Regards.
Re: Two degrees of replications reliability
On Apr 10, 2009, at 1:54 PM, Stas Oskin wrote: Actually, now I remember that you posted some time ago about your University loosing about 300 files. So since then the situation has improved I presume? Yup! The only files we lose now are due to multiple simultaneous hardware loss. Since January: 11 files to accidentally reformatting 2 nodes at once, 35 to a night with 2 dead nodes. Make no mistake - HDFS with 2 replicas is *not* an archive-quality file system. HDFS does not replace tape storage for long term storage. Brian 2009/4/10 Stas Oskin stas.os...@gmail.com 2009/4/10 Brian Bockelman bbock...@cse.unl.edu Most of the issues were resolved in 0.19.1 -- I think 0.20.0 is going to be even better. We run about 300TB @ 2 replicas, and haven't had file loss that was Hadoop's fault since about January. Brian And you running 0.19.1? Regards.
Re: Two degrees of replications reliability
On Apr 10, 2009, at 1:53 PM, Stas Oskin wrote: 2009/4/10 Brian Bockelman bbock...@cse.unl.edu Most of the issues were resolved in 0.19.1 -- I think 0.20.0 is going to be even better. We run about 300TB @ 2 replicas, and haven't had file loss that was Hadoop's fault since about January. Brian And you running 0.19.1? 0.19.1 with a few convenience patches (mostly, they improve logging so the local file system researchers can play around with our data patterns). Brian
Re: Two degrees of replications reliability
On Fri, Apr 10, 2009 at 12:03 PM, Brian Bockelman bbock...@cse.unl.eduwrote: 0.19.1 with a few convenience patches (mostly, they improve logging so the local file system researchers can play around with our data patterns). Hey Brian, I'm curious about this. Could you elaborate a bit on what kind of stuff you're logging? I'm interested in what FS metrics you're looking at and how you instrumented the code. -Todd
does hadoop have any way to append to an existing file?
Hi, does Hadoop have any way to append to an existing file? For example, I wrote some content to a file, and later on I want to append some more content to the file. Thanks,
Re: Does the HDFS client read the data from NameNode, or from DataNode directly?
Also, the NameNode provides a URL named streamFile which allows any HTTP client to get the stored files. Any idea how its operations compare in terms of speed to client HDFS access? What happens here is that the NameNode redirects you to a smartly chosen DataNode (a data node that has some of the file's first 5 blocks, I think), and that DataNode proxies the file for you. Specifically, the assembling of a full file from multiple nodes happens on that DataNode. If you were using a DFSClient, it would assemble the file from blocks at the client, and talk to many data nodes. -- Philip
Re: Two degrees of replications reliability
On Apr 10, 2009, at 2:06 PM, Todd Lipcon wrote: On Fri, Apr 10, 2009 at 12:03 PM, Brian Bockelman bbock...@cse.unl.edu wrote: 0.19.1 with a few convenience patches (mostly, they improve logging so the local file system researchers can play around with our data patterns). Hey Brian, I'm curious about this. Could you elaborate a bit on what kind of stuff you're logging? I'm interested in what FS metrics you're looking at and how you instrumented the code. -Todd No clue what they're doing *with* the data, but I know what we've applied to HDFS to get the data. We apply both of these patches: http://issues.apache.org/jira/browse/HADOOP-5222 https://issues.apache.org/jira/browse/HADOOP-5625 This adds the duration and offset to each read. Each read is then logged through the HDFS audit mechanisms. We've been pulling the logs through the web interface and putting them back into HDFS, then processing them (actually, today we've been playing with log collection via Chukwa). There is a student who is looking at our cluster's I/O access patterns, and there's a few folks who do work in designing metadata caching algorithms that love to see application traces. Personally, I'm interested in hooking the logfiles up to our I/O accounting system so I can keep historical records of transfers and compare it to our other file systems. Brian
a script to re-write JIRA subject lines for easier GMail threading
If you subscribe to core-dev and use gmail, you might be interested in a quick script I wrote to log into my e-mail and re-write JIRA subject lines, to fix the threading in gmail. This way Updated, Commented, and Created messages all appear in the same thread. The code is at http://github.com/philz/jira-rewrite/tree/master . I've pasted in a bit of the documentation and a sample run below. -- Philip
Re-writes JIRA subject lines to remove JIRA's Updated/Commented/Created annotations. JIRA's default subjects break Gmail's threading (which threads by subject line); making the subject line uniform unbreaks Gmail's threading. Specifically, this looks at every message in --source, does a search and replace with --regex and --replace (defaults are for JIRA subject lines), and puts the modified message in --dest. The original message is moved to --backup.
# EXAMPLE OUTPUT:
# $ python rewritejira.py --username @com --source hadoop-jira --dest hadoop-jira-rewritten --backup hadoop-jira-orig
# Password:
# INFO:__main__:Looking at 3 messages.
# INFO:__main__:[3] Subject: '[jira] Commented: (HADOOP-5649) Enable ServicePlugins for the\r\n JobTracker' - '(HADOOP-5649) Enable ServicePlugins for the\r\n JobTracker'
# INFO:__main__:[2] Subject: '[jira] Commented: (HADOOP-5581) libhdfs does not get\r\n FileNotFoundException' - '(HADOOP-5581) libhdfs does not get\r\n FileNotFoundException'
# INFO:__main__:[1] Subject: '[jira] Commented: (HADOOP-5638) More improvement on block placement\r\n performance' - '(HADOOP-5638) More improvement on block placement\r\n performance'
# INFO:__main__:Rewrote 3 messages.
Re: Does the HDFS client read the data from NameNode, or from DataNode directly?
Hi. What happens here is that the NameNode redirects you to a smartly (a data node that has some of the file's first 5 blocks, I think) chosen DataNode, and that DataNode proxies the file for you. Specifically, the assembling of a full file from multiple nodes is happening on that DataNode. If you were using a DFSClient, it would assemble the file from blocks at the client, and talk to many data nodes. I see, thanks for the explanation.
Re: Multithreaded Reducer
Hi Sagar! There is no reason for the body of your reduce method to do more than copy and queue the key/value set into an execution pool. The close method will need to wait until all of the items finish execution and potentially keep the heartbeat up with the task tracker by periodically reporting something. Sadly, right now the reporter has to be grabbed from the reduce method, as configure and close do not get an instance. I believe the key and value objects are reused by the framework on the next call to reduce, so making a copy before queuing them into your thread pool is important. On Fri, Apr 10, 2009 at 11:12 AM, Sagar Naik sn...@attributor.com wrote: Hi, I would like to implement a multi-threaded reducer. As per my understanding, the system does not have one because we expect the output to be sorted. However, in my case I don't need the output sorted. Can you please point me to any other issues, or would it be safe to do so? -Sagar -- Alpha Chapters of my book on Hadoop are available http://www.apress.com/book/view/9781430219422
Re: Multithreaded Reducer
On Fri, Apr 10, 2009 at 12:31 PM, jason hadoop jason.had...@gmail.com wrote: Hi Sagar! There is no reason for the body of your reduce method to do more than copy and queue the key/value set into an execution pool. Agreed. You probably want to use either a bounded queue on your execution pool, or even a SynchronousQueue to do handoff to executor threads. Otherwise your reducer will churn through all of its inputs at IO rate, potentially fill up RAM, and report 100% complete way before it's actually complete. Something like:
    BlockingQueue<Runnable> queue = new SynchronousQueue<Runnable>();
    threadPool = new ThreadPoolExecutor(WORKER_THREAD_COUNT, WORKER_THREAD_COUNT, 5, TimeUnit.SECONDS, queue);
The close method will need to wait until all of the items finish execution and potentially keep the heartbeat up with the task tracker by periodically reporting something. Sadly, right now the reporter has to be grabbed from the reduce method, as configure and close do not get an instance. +1. You probably want to call reporter.progress() after each item is processed by the worker threads. I believe the key and value objects are reused by the framework on the next call to reduce, so making a copy before queuing them into your thread pool is important. +1 here too. You will definitely run into issues if you don't make a deep copy. -Todd
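Putting Jason's and Todd's suggestions together, a rough, untested sketch of such a reducer against the old org.apache.hadoop.mapred API; the class name, thread count, Text types and the synchronization on the output collector are illustrative choices, not something prescribed in the thread:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import java.util.concurrent.SynchronousQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class ThreadedReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

      private static final int WORKER_THREAD_COUNT = 100; // illustrative
      private ThreadPoolExecutor pool;
      private volatile Reporter reporter; // grabbed in reduce(), used in close()

      @Override
      public void configure(JobConf job) {
        // SynchronousQueue + CallerRunsPolicy: when all workers are busy, reduce()
        // runs the task itself, so it cannot race ahead of the pool and fill up RAM.
        pool = new ThreadPoolExecutor(WORKER_THREAD_COUNT, WORKER_THREAD_COUNT,
            5, TimeUnit.SECONDS, new SynchronousQueue<Runnable>(),
            new ThreadPoolExecutor.CallerRunsPolicy());
      }

      public void reduce(Text key, Iterator<Text> values,
          final OutputCollector<Text, Text> output, Reporter rep) throws IOException {
        this.reporter = rep;

        // Deep-copy key and values: the framework reuses these objects on the next call.
        final Text keyCopy = new Text(key);
        final List<Text> valueCopies = new ArrayList<Text>();
        while (values.hasNext()) {
          valueCopies.add(new Text(values.next()));
        }

        pool.execute(new Runnable() {
          public void run() {
            try {
              // ... the I/O-bound work for this key would go here ...
              for (Text v : valueCopies) {
                synchronized (output) { // the collector is not guaranteed to be thread-safe
                  output.collect(keyCopy, v);
                }
              }
              reporter.progress(); // keep the TaskTracker heartbeat alive
            } catch (IOException e) {
              throw new RuntimeException(e);
            }
          }
        });
      }

      @Override
      public void close() throws IOException {
        pool.shutdown();
        try {
          // Wait for outstanding work, reporting progress so the task is not killed.
          while (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            if (reporter != null) {
              reporter.progress();
            }
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IOException("Interrupted while waiting for worker threads");
        }
      }
    }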
Re: does hadoop have any way to append to an existing file?
Hi, Hadoop's append functionality is somewhat in progress and is not stable in any released versions quite yet. The relevant ticket is http://issues.apache.org/jira/browse/HADOOP-1700. Note that, while this JIRA indicates that it is in 0.19.0, it was rolled back in 0.19.1 ( http://issues.apache.org/jira/browse/HADOOP-5224). It is coming back in 0.20, soon to be released, but you may want to investigate other solutions to your problem in the meantime. -Todd On Fri, Apr 10, 2009 at 12:09 PM, javateck javateck javat...@gmail.comwrote: Hi, does hadoop have any way to append to an existing file? for example, I wrote some contents to a file, and later on I want to append some more contents to the file. thanks,
Listing the files in HDFS directory
Hi. I must be completely missing it in the API, but what is the function to list the directory contents? Thanks!
Hadoop and Image analysis question
Hi everyone, I would like to use Hadoop for analyzing tens of thousands of images. Ideally each mapper gets a few hundred images to process, and I'll have a few hundred mappers. However, I want the mapper function to run on the machine where its images are stored. How can I achieve that? With text data, creating splits and exploiting locality seems easy. One option would be for the input to the map function to be a text file, where each line of the text file contains the name of the image to be processed. Now this text file is the input to the mapper function, so the mapper parses the file and reads the image file name to be processed. Unfortunately, one drawback of this scheme is that the image file itself might be stored on a machine different from the one running this mapper function. Copying the file over the network would be quite inefficient. Any help on this would be great.
Re: HDFS read/write speeds, and read optimization
I just wanted to add to this one other published benchmark: http://developer.yahoo.net/blogs/hadoop/2008/09/scaling_hadoop_to_4000_nodes_a.html In this example, on a very busy cluster of 4000 nodes, both read and write throughputs were close to the local disk bandwidth. This benchmark (called TestDFSIO) uses large sequential writes and reads. You can run it yourself on your hardware to compare. Is it more efficient to unify the disks into one volume (RAID or LVM) and then present them as a single space, or is it better to specify each disk separately? There was a discussion recently on this list about RAID0 vs separate disks. Please search the archives. Separate disks turn out to perform better. Reliability-wise, the latter sounds more correct, as a single disk (or several, up to 3) going down won't take the whole node with it. But perhaps there is a performance penalty? You always have block replicas on other nodes, so one node going down should not be a problem. Thanks, --Konstantin
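To run TestDFSIO yourself, the invocation is roughly as follows; the test jar name depends on the release you have installed, and the file count and size here are only example values:

    bin/hadoop jar hadoop-*-test.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
    bin/hadoop jar hadoop-*-test.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
    bin/hadoop jar hadoop-*-test.jar TestDFSIO -clean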
Abandoning block messages
Hi. I'm testing HDFS fault-tolerance, and randomly disconnecting and powering off the chunk nodes to see how HDFS withstands it. After I have disconnected a chunk node, the logs have started to fill with the following errors:
09/04/11 02:25:12 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.253.41:50010
09/04/11 02:25:12 INFO dfs.DFSClient: Abandoning block blk_-656235730036158252_1379
09/04/11 02:25:12 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.253.41:50010
09/04/11 02:25:12 INFO dfs.DFSClient: Abandoning block blk_4525038275659790960_1379
09/04/11 02:25:12 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.253.41:50010
09/04/11 02:25:12 INFO dfs.DFSClient: Abandoning block blk_4496668387968639765_1379
09/04/11 02:25:12 INFO dfs.DFSClient: Waiting to find target node: 192.168.253.20:50010
09/04/11 02:25:12 INFO dfs.DFSClient: Waiting to find target node: 192.168.253.20:50010
09/04/11 02:25:12 INFO dfs.DFSClient: Waiting to find target node: 192.168.253.20:50010
09/04/11 02:25:21 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.253.41:50010
09/04/11 02:25:21 INFO dfs.DFSClient: Abandoning block blk_7909869387543801936_1379
09/04/11 02:25:21 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.253.41:50010
09/04/11 02:25:21 INFO dfs.DFSClient: Abandoning block blk_3548805008548426073_1368
09/04/11 02:25:21 INFO dfs.DFSClient: Waiting to find target node: 192.168.253.20:50010
09/04/11 02:25:25 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.253.41:50010
09/04/11 02:25:25 INFO dfs.DFSClient: Abandoning block blk_6143739150500056332_1379
09/04/11 02:25:25 INFO dfs.DFSClient: Waiting to find target node: 192.168.253.20:50010
09/04/11 02:25:30 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.253.41:50010
09/04/11 02:25:30 INFO dfs.DFSClient: Abandoning block blk_5626187863612769755_1379
Any idea what this means, and should I be concerned about it? Regards.
Re: Listing the files in HDFS directory
Hi. Just wanted to say that FileStatus did the trick. Regards. 2009/4/11 Stas Oskin stas.os...@gmail.com Hi. I must be completely missing it in the API, but what is the function to list the directory contents? Thanks!
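For anyone searching the archives later, a minimal sketch of the listStatus() call that returns those FileStatus entries; the directory path is a hypothetical placeholder:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListHdfsDir {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());

            // List the contents of a directory in HDFS (hypothetical path)
            FileStatus[] entries = fs.listStatus(new Path("/user/stas"));
            for (FileStatus entry : entries) {
                System.out.println((entry.isDir() ? "d " : "- ")
                    + entry.getLen() + "\t" + entry.getPath());
            }
        }
    }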