Re: RAID vs. JBOD
Runping Qi wrote:

Hi,

We at Yahoo did some Hadoop benchmarking experiments on clusters with JBOD and RAID0. We found that under heavy loads (such as gridmix), the JBOD cluster performed better.

Gridmix tests:
Load: gridmix2
Cluster size: 190 nodes
Test results:
RAID0: 75 minutes
JBOD: 67 minutes
Difference: 10%

Tests on HDFS write performance:
We ran map-only jobs writing data to DFS concurrently on different clusters. The overall DFS write throughput on the JBOD cluster was 30% (with a 58-node cluster) and 50% (with an 18-node cluster) better than on the RAID0 cluster, respectively.

To understand why, we did some file-level benchmarking on both clusters. We found that the file write throughput on a JBOD machine is 30% higher than that on a comparable machine with RAID0. This performance difference may be explained by the fact that the throughputs of different disks can vary by 30% to 50%. With such variation, the overall throughput of a RAID0 system may be bottlenecked by the slowest disk.

-- Runping

This is really interesting. Thank you for sharing these results! Presumably the servers were all set up with nominally homogeneous hardware? And yet the variations still existed. That would be something to experiment with on new versus old clusters, to see if it gets worse over time.

Here we have a batch of desktop workstations all bought at the same time, to the same spec, but one of them, "lucky", is more prone to race conditions than any of the others. We don't know why, and assume it's to do with the (multiple) Xeon CPU chips being at different ends of the bell curve or something. All we know is: test on that box before shipping, to find race conditions early.

-steve
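Runping's explanation, that RAID0 runs at the pace of its slowest disk while JBOD disks stream independently, can be sketched with a back-of-the-envelope model. The per-disk numbers below are invented for illustration and are not from the Yahoo tests:

```java
public class ThroughputModel {
    // RAID0 stripes every write across all n disks in lockstep, so the
    // array's aggregate rate is roughly n times the slowest disk.
    static double raid0(double[] diskMBps) {
        double slowest = Double.MAX_VALUE;
        for (double d : diskMBps) slowest = Math.min(slowest, d);
        return diskMBps.length * slowest;
    }

    // JBOD lets each disk stream independently, so the aggregate is the sum.
    static double jbod(double[] diskMBps) {
        double sum = 0;
        for (double d : diskMBps) sum += d;
        return sum;
    }

    public static void main(String[] args) {
        // Four disks with a spread like the 30-50% variation described above.
        double[] disks = {60.0, 55.0, 50.0, 40.0};
        System.out.println("RAID0 aggregate: " + raid0(disks) + " MB/s"); // 160.0 MB/s
        System.out.println("JBOD aggregate:  " + jbod(disks) + " MB/s");  // 205.0 MB/s
    }
}
```

With these made-up numbers JBOD comes out roughly 28% ahead, in the same ballpark as the 30% file-level difference reported in the thread.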
Re: Indexed Hashtables
Sean Shanny wrote:

Delip,

So far we have had pretty good luck with memcached. We are building a Hadoop-based solution for data warehouse ETL on XML-based log files that represent click stream data on steroids. We process about 34 million records, or about 70 GB of data, a day. We have to process dimensional data in our warehouse and then load the surrogate key/value pairs into memcached so we can traverse the XML files once again to perform the substitutions. We are using the memcached solution because it scales out just like Hadoop. We will have code that allows us to fall back to the DB if the memcached lookup fails, but that should not happen too often.

LinkedIn have just opened up something they run internally, Project Voldemort:
http://highscalability.com/product-project-voldemort-distributed-database
http://project-voldemort.com/

It's a DHT, Java based. I haven't played with it yet, but it looks like a good part of the portfolio.
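The cache-then-DB fallback Sean describes can be sketched in plain Java, with a HashMap standing in for memcached and a Function standing in for the database query; all names here are invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class FallbackLookup {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> dbQuery;

    public FallbackLookup(Function<String, String> dbQuery) {
        this.dbQuery = dbQuery;
    }

    // Look the key up in the cache first; on a miss, fall back to the
    // DB and repopulate the cache so the next lookup hits.
    public String get(String key) {
        String value = cache.get(key);
        if (value == null) {
            value = dbQuery.apply(key);
            cache.put(key, value);
        }
        return value;
    }
}
```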
Is it possible to submit job to JobClient and exit immediately?
For now, I use such code blocks in all my MR jobs:

try {
    JobClient.runJob(job);
} catch (IOException exc) {
    LOG.info("Job failed", exc);
}
System.exit(0);

But this code waits for the MR job to complete, so I have to run it on a machine that is always connected to the jobtracker. My goal is to write code that submits a job to the jobtracker and exits without waiting for the job to complete. I've tried:

try {
    JobID jobid = new JobClient(job).submitJob(job).getID();
} catch (IOException exc) {
    LOG.info("Job failed", exc);
}
System.exit(0);

This code fails with java.lang.IllegalStateException: Shutdown in progress. It seems that JobClient creates non-daemon threads on submitJob() invocation. So, is there a way to submit a job and exit immediately?

-- Andrew Gudkov
PGP key id: CB9F07D8 (cryptonomicon.mit.edu)
Jabber: gu...@jabber.ru
Re: Is it possible to submit job to JobClient and exit immediately?
Andrew wrote:

For now, I use such code blocks in all my MR jobs:

try {
    JobClient.runJob(job);
} catch (IOException exc) {
    LOG.info("Job failed", exc);
}
System.exit(0);

You can instead submit a job and come out without waiting:

JobClient jc = new JobClient(job);
jc.submitJob(job); // submits a job and comes out

Andrew also wrote:

This code fails with java.lang.IllegalStateException: Shutdown in progress. It seems that JobClient creates non-daemon threads on submitJob() invocation. So, is there a way to submit a job and exit immediately?

Can you check what state the jobtracker is in via its web UI? Can you see what is happening to the jobtracker by checking its logs?

Amar
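Andrew's diagnosis, that JobClient starts non-daemon threads, matters because only daemon threads let the JVM exit when main() returns. A self-contained illustration in plain Java (no Hadoop involved; the class and method names are made up):

```java
public class DaemonDemo {
    // Start a worker that sleeps for a minute; whether it delays JVM
    // exit depends entirely on the daemon flag.
    static Thread startWorker(boolean daemon) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // interrupted: just let the thread finish
            }
        });
        t.setDaemon(daemon);
        t.start();
        return t;
    }

    public static void main(String[] args) {
        startWorker(true); // daemon: the JVM exits as soon as main() returns
        // startWorker(false) would keep the JVM alive for the full minute,
        // analogous to non-daemon threads left behind by a client library.
        System.out.println("main done");
    }
}
```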
Re: RAID vs. JBOD
Yes, all the machines in the tests were new, with the same spec. The 30% to 50% throughput variations were observed across the disks of the same machine.

Runping

On 1/15/09 2:41 AM, Steve Loughran ste...@apache.org wrote: [...]
Re: Indexed Hashtables
Delip, what about Hadoop MapFile? http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/io/MapFile.html Regards, Peter
Re: Indexed Hashtables
Delip, Why do you think Hbase will be an overkill? I do something similar to what you're trying to do with Hbase and I haven't encountered any significant problems so far. Can you give some more info on the size of the data you have? Jim On Wed, Jan 14, 2009 at 8:47 PM, Delip Rao delip...@gmail.com wrote: Hi, I need to lookup a large number of key/value pairs in my map(). Is there any indexed hashtable available as a part of Hadoop I/O API? I find Hbase an overkill for my application; something on the lines of HashStore (www.cellspark.com/hashstore.html) should be fine. Thanks, Delip
Cascading 1.0.0 Released
Hi all Just a quick note to let everyone know that Cascading 1.0.0 is out. http://www.cascading.org/ Cascading is an API for defining and executing data processing flows without needing to think in MapReduce. This release supports only Hadoop 0.19.x. Minor releases will be available to track Hadoop's progress. A list of some of the high level features can be read about here: http://www.cascading.org/documentation/features/ Or you can peruse the User Guide: http://www.cascading.org/documentation/userguide.html Developer Support and OEM/Commercial Licensing is available through Concurrent, Inc. http://www.concurrentinc.com/ And finally, Advanced Hadoop and Cascading training (and consulting) is available through Scale Unlimited: http://www.scaleunlimited.com/ cheers, chris -- Chris K Wensel ch...@wensel.net http://www.cascading.org/ http://www.scaleunlimited.com/
Re: Hadoop 0.17.1 = EOFException reading FSEdits file, what causes this? how to prevent?
Does the file exist, or maybe was it deleted? Also, are the permissions on that directory set correctly, or could they have been changed out from under you by accident?

- Aaron

On Tue, Jan 13, 2009 at 9:53 AM, Joe Montanez jmonta...@veoh.com wrote:

Hi:

I'm using Hadoop 0.17.1 and I'm encountering an EOFException reading the FSEdits file. I don't have a clear understanding of what is causing this or how to prevent it. Has anyone seen this and can advise?

Thanks in advance,
Joe

2009-01-12 22:51:45,573 ERROR org.apache.hadoop.dfs.NameNode: java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:180)
    at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
    at org.apache.hadoop.io.ArrayWritable.readFields(ArrayWritable.java:90)
    at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:599)
    at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:766)
    at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:640)
    at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:223)
    at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
    at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:274)
    at org.apache.hadoop.dfs.FSNamesystem.init(FSNamesystem.java:255)
    at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:133)
    at org.apache.hadoop.dfs.NameNode.init(NameNode.java:178)
    at org.apache.hadoop.dfs.NameNode.init(NameNode.java:164)
    at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
    at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
2009-01-12 22:51:45,574 INFO org.apache.hadoop.dfs.NameNode: SHUTDOWN_MSG:
Re: Hadoop 0.17.1 = EOFException reading FSEdits file, what causes this? how to prevent?
Joe,

It looks like your edits file is corrupted or truncated. Most probably the last modification was not written to it when the name-node was turned off. This may happen when the node crashes, depending on the underlying local file system, I guess. Here are some options for you to consider:

- Try an alternative replica of the image directory, if you had one.
- Try to edit the edits file, if you know the internal format.
- Try to modify a local copy of your name-node code so that it catches EOFException and ignores it.
- Use a checkpointed image, if you can afford to lose the latest modifications to the fs.
- Formatting, of course, is the last resort, since you lose everything.

Thanks,
--Konstantin

Joe Montanez wrote: [...]
Locks in hadoop
I would like to implement a locking mechanism across the HDFS cluster. I assume there is no inherent support for it, so I was going to do it with files. To my knowledge, file creation is an atomic operation, so a file-based lock should work. I still need to think through all the conditions, but if someone has a better idea/solution, please share.

Thanks,
-Sagar
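A minimal sketch of Sagar's idea on the local filesystem, relying on the same atomic create-if-absent semantics he mentions. The class and path names are illustrative; on HDFS the equivalent would use the FileSystem API's create-without-overwrite:

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileBasedLock {
    private final Path lockPath;

    public FileBasedLock(Path lockPath) {
        this.lockPath = lockPath;
    }

    // Atomic create-if-absent: exactly one concurrent caller wins.
    public boolean tryLock() {
        try {
            Files.createFile(lockPath);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false; // someone else holds the lock
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public void unlock() throws IOException {
        Files.deleteIfExists(lockPath);
    }
}
```

Note the caveat: this gives mutual exclusion but no liveness. A holder that dies without unlocking leaves a stale lock file that blocks everyone, which is one reason ZooKeeper, with its ephemeral nodes, is the usual recommendation for cluster-wide locks.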
Re: Locks in hadoop
Did you look at ZooKeeper?

Thanks,
--Konstantin

Sagar Naik wrote: [...]
Re: hadoop job -history
jobOutputDir is the location specified by the configuration property hadoop.job.history.user.location. If you don't specify anything for that property, the job history logs are created in the job's output directory. So, to view your history, pass your jobOutputDir if you haven't specified any location. Hope this helps.

Thanks,
Amareshwari

Bill Au wrote:

I am having trouble getting the hadoop command "job -history" to work. What am I supposed to use for jobOutputDir? I can see the job history from the JobTracker web UI. I tried specifying the history directory on the JobTracker but it didn't work:

$ hadoop job -history logs/history/
Exception in thread "main" java.io.IOException: Not able to initialize History viewer
    at org.apache.hadoop.mapred.HistoryViewer.init(HistoryViewer.java:88)
    at org.apache.hadoop.mapred.JobClient.viewHistory(JobClient.java:1596)
    at org.apache.hadoop.mapred.JobClient.run(JobClient.java:1560)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1727)
Caused by: java.io.IOException: History directory logs/history/_logs/history does not exist
    at org.apache.hadoop.mapred.HistoryViewer.init(HistoryViewer.java:70)
    ... 5 more
hadoop 0.19.0 and data node failure
To test Hadoop's fault tolerance I tried the following:

nodeA -- name node and secondary name node
nodeB -- datanode
nodeC -- datanode
replication set to 2

When A, B and C are running, I'm able to make a round trip for a wav file. Then, to test fault tolerance, I brought nodeB down and tried to write a file. Writing failed even though nodeC was up and running, with the following messages. More interestingly, the file was still listed in the name node. I would have expected Hadoop to write the file to nodeC.

##error msg###
[had...@cancunvm1 testfiles]$ hadoop fs -copyFromLocal 9979_D4FE01E0-DD119BDE-3000CB83-EB857348.wav jukebox/9979_D4FE01E0-DD119BDE-3000CB83-EB857348_21.wav
09/01/16 01:47:09 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException
09/01/16 01:47:09 INFO hdfs.DFSClient: Abandoning block blk_4025795281260753088_1216
09/01/16 01:47:09 INFO hdfs.DFSClient: Waiting to find target node: 10.0.3.136:50010
09/01/16 01:47:18 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.NoRouteToHostException: No route to host
09/01/16 01:47:18 INFO hdfs.DFSClient: Abandoning block blk_-2076345051085316536_1216
09/01/16 01:47:27 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.NoRouteToHostException: No route to host
09/01/16 01:47:27 INFO hdfs.DFSClient: Abandoning block blk_2666380449580768625_1216
09/01/16 01:47:36 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.NoRouteToHostException: No route to host
09/01/16 01:47:36 INFO hdfs.DFSClient: Abandoning block blk_742770163755453348_1216
09/01/16 01:47:42 WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2723)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
09/01/16 01:47:42 WARN hdfs.DFSClient: Error Recovery for block blk_742770163755453348_1216 bad datanode[0] nodes == null
09/01/16 01:47:42 WARN hdfs.DFSClient: Could not get block locations. Aborting...
copyFromLocal: No route to host
Exception closing file /user/hadoop/jukebox/9979_D4FE01E0-DD119BDE-3000CB83-EB857348_21.wav
java.io.IOException: Filesystem closed
    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:198)
    at org.apache.hadoop.hdfs.DFSClient.access$600(DFSClient.java:65)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3084)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3053)
    at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:942)
    at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:210)