Hadoop Exceptions
Here are a few Hadoop exceptions that I am getting while running a mapred job on 700MB of data on a 3-node cluster on the Windows platform (using Cygwin):

1.
2009-01-08 17:54:10,597 INFO org.apache.hadoop.dfs.DataNode: writeBlock blk_-4309088198093040326_1001 received exception java.io.IOException: Block blk_-4309088198093040326_1001 is valid, and cannot be written to.
2009-01-08 17:54:10,597 ERROR org.apache.hadoop.dfs.DataNode: DatanodeRegistration(10.120.12.91:50010, storageID=DS-70805886-10.120.12.91-50010-1231381442699, infoPort=50075, ipcPort=50020):DataXceiver: java.io.IOException: Block blk_-4309088198093040326_1001 is valid, and cannot be written to.
	at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:921)
	at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:2364)
	at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1218)
	at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:1076)
	at java.lang.Thread.run(Thread.java:619)

2. This particular job succeeded. Is it possible that this task was a speculative execution and was killed before it could be started?
Exception in thread "main" java.lang.NullPointerException
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2195)

3.
2009-01-15 21:27:13,547 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200901152118_0001_r_00_0 Merge of the inmemory files threw an exception: java.io.IOException: Expecting a line not the end of stream
	at org.apache.hadoop.fs.DF.parseExecResult(DF.java:109)
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:179)
	at org.apache.hadoop.util.Shell.run(Shell.java:134)
	at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
	at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:296)
	at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
	at org.apache.hadoop.mapred.MapOutputFile.getInputFileForWrite(MapOutputFile.java:160)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2105)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2078)

4.
2009-01-15 21:27:13,547 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 47 files left.
2009-01-15 21:27:13,579 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.IOException: attempt_200901152118_0001_r_00_0 The reduce copier failed
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)

5.
Caused by: java.io.IOException: An established connection was aborted by the software in your host machine
	... 12 more

Can anyone help by giving me some pointers to what the issue could be?

Thanks,
Sandeep
--
View this message in context: http://www.nabble.com/Hadoop-Exceptions-tp21548261p21548261.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
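[Editor's note] Exception 2 above mentions speculative execution. One way to test that theory is to switch speculative task launches off while diagnosing. A hedged sketch of a hadoop-site.xml fragment for the 0.18 line follows; verify the property names against the hadoop-default.xml shipped with your release:

```xml
<!-- hadoop-site.xml: disable speculative task launches while diagnosing.
     Property names as of the 0.18 line; values here are illustrative. -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
```

If the NullPointerException stops appearing with these set to false, the killed speculative attempt is the likely culprit.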
Re: Performance testing
Hi,

I am in the process of following your guidelines. I would like to know:

1. How can block size impact the performance of a mapred job?
2. Does the performance improve if I set up the NameNode and JobTracker on different machines? At present, I am running the NameNode and JobTracker on the same master machine, interconnected with 2 slave machines running the DataNode and TaskTracker.
3. What should the replication factor be for a 3-node cluster?
4. How does io.sort.mb impact the performance of the cluster?

Thanks,
Sandeep

Brian Bockelman wrote:
>
> Hey Sandeep,
>
> I'd do a couple of things:
> 1) Run your test. Do something which will be similar to your actual workflow.
> 2) Save the resulting Ganglia plots. This will give you a hint as to where things are bottlenecking (memory, CPU, wait I/O).
> 3) Watch iostat and find out the I/O rates during the test. Compare this to the I/O rates of a known I/O benchmark (i.e., Bonnie++).
> 4) Finally, watch the logfiles closely. If you start to overload things, you'll usually get a pretty good indication from Hadoop where things go wrong. Once something does go wrong, *then* look through the parameters to see what can be done.
>
> There's about a hundred things which can go wrong between the kernel, the OS, Java, and the application code. It's difficult to make an educated guess beforehand without some hint from the data.
>
> Brian
>
> On Dec 31, 2008, at 1:30 AM, Sandeep Dhawan wrote:
>
>> Hi Brian,
>>
>> That's what my issue is, i.e. "How do I ascertain the bottleneck?" Or, in other words, if the results obtained after doing the performance testing are not up to the mark, how do I find the bottleneck?
>>
>> How can we confidently say that the OS and hardware are the culprits? I understand that using the latest OS and hardware can improve performance irrespective of the application, but my real worry is "What next?" How can I further increase the performance?
>> What should I look for which can suggest or point to the areas that could be potential problems or "hotspots"?
>>
>> Thanks for your comments.
>>
>> ~Sandeep~
>>
>> Brian Bockelman wrote:
>>>
>>> Hey Sandeep,
>>>
>>> I would warn against premature optimization: first, run your test, then see how far from your target you are.
>>>
>>> Of course, I'd wager you'd find that the hardware you are using is woefully underpowered and that your OS is 5 years old.
>>>
>>> Brian
>>>
>>> On Dec 30, 2008, at 5:57 AM, Sandeep Dhawan wrote:
>>>
>>>> Hi,
>>>>
>>>> I am trying to create a Hadoop cluster which can handle 2000 write requests per second.
>>>> In each write request I would be writing a line of size 1KB to a file.
>>>>
>>>> I would be using machines with the following configuration:
>>>> Platform: Red Hat Linux 9.0
>>>> CPU: 2.07 GHz
>>>> RAM: 1GB
>>>>
>>>> Can anyone help by giving me some pointers/guidelines as to how to go about setting up such a cluster?
>>>> What are the configuration parameters in Hadoop we can tweak to enhance the performance of the cluster?
>>>>
>>>> Thanks,
>>>> Sandeep
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/Performance-testing-tp21216266p21216266.html
>>>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Performance-testing-tp21216266p21228264.html
>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
--
View this message in context: http://www.nabble.com/Performance-testing-tp21216266p21548160.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
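[Editor's note] The configuration parameters asked about in questions 1, 3 and 4 of this thread all live in hadoop-site.xml. A hedged sketch follows; the values are illustrative starting points for a 3-node 0.18-era cluster, not recommendations, and the thread itself does not prescribe them:

```xml
<!-- hadoop-site.xml sketch touching questions 1, 3 and 4 above.
     Values are illustrative, not tuned recommendations. -->
<property>
  <name>dfs.block.size</name>
  <value>134217728</value> <!-- 128MB: larger blocks mean fewer map tasks per big input -->
</property>
<property>
  <name>dfs.replication</name>
  <value>2</value> <!-- on 3 nodes, 2 trades durability for write throughput; 3 is the default -->
</property>
<property>
  <name>io.sort.mb</name>
  <value>200</value> <!-- map-side sort buffer: larger means fewer spills, but more heap -->
</property>
```

Larger blocks reduce NameNode metadata and per-task overhead; io.sort.mb governs how much map output is buffered in memory before spilling to disk during the sort.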
Re: File Modification timestamp
Hi Dhruba,

The file is being closed properly, but the timestamp does not get modified. The modification timestamp still shows the file creation time. I am creating a new file and writing data into this file.

Thanks,
Sandeep

Dhruba Borthakur-2 wrote:
>
> I believe that file modification times are updated only when the file is closed. Are you "appending" to a preexisting file?
>
> thanks,
> dhruba
>
> On Tue, Dec 30, 2008 at 3:14 AM, Sandeep Dhawan wrote:
>
>> Hi,
>>
>> I have an application which creates a simple text file on hdfs. There is a second application which processes this file. The second application picks up the file for processing only when the file has not been modified for 10 mins. In this way, the second application is sure that this file is ready for processing.
>>
>> But what is happening is that Hadoop is not updating the modification timestamp of the file even while the file is being written into. The modification timestamp of the file is the same as the timestamp when the file was created.
>>
>> I am using hadoop 0.18.2.
>>
>> 1. Is this a bug in hadoop, or is this the way hadoop works?
>> 2. Is there a way by which I can programmatically set the modification timestamp of the file?
>>
>> Thanks,
>> Sandeep
>>
>> --
>> View this message in context:
>> http://www.nabble.com/File-Modification-timestamp-tp21215824p21215824.html
>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
--
View this message in context: http://www.nabble.com/File-Modification-timestamp-tp21215824p21228299.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
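[Editor's note] A common workaround for this pattern, not proposed in the thread itself: since the modification time only settles when the file is closed, have the writer create the file under a temporary name and rename it to the final name once it is complete; the consumer then only looks for final names and never needs to reason about timestamps. A minimal sketch using the local filesystem for illustration (the class and method names are hypothetical; on HDFS the rename would be FileSystem.rename):

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class AtomicPublish {
    // Write to a temporary name first; rename to the final name only when
    // the file is complete. A consumer that only matches the final name
    // never sees a half-written file.
    public static File publish(File dir, String name, String contents) throws IOException {
        File tmp = new File(dir, name + ".tmp");
        FileWriter out = new FileWriter(tmp);
        try {
            out.write(contents);
        } finally {
            out.close();   // contents are durable before the rename
        }
        File fin = new File(dir, name);
        if (!tmp.renameTo(fin)) {
            throw new IOException("rename failed: " + tmp + " -> " + fin);
        }
        return fin;
    }
}
```

The same shape works on HDFS: write under `/incoming/foo.tmp`, close, then rename to `/ready/foo`; the rename is a cheap namespace operation.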
Re: Performance testing
Hi Brian,

That's what my issue is, i.e. "How do I ascertain the bottleneck?" Or, in other words, if the results obtained after doing the performance testing are not up to the mark, how do I find the bottleneck?

How can we confidently say that the OS and hardware are the culprits? I understand that using the latest OS and hardware can improve performance irrespective of the application, but my real worry is "What next?" How can I further increase the performance? What should I look for which can suggest or point to the areas that could be potential problems or "hotspots"?

Thanks for your comments.

~Sandeep~

Brian Bockelman wrote:
>
> Hey Sandeep,
>
> I would warn against premature optimization: first, run your test, then see how far from your target you are.
>
> Of course, I'd wager you'd find that the hardware you are using is woefully underpowered and that your OS is 5 years old.
>
> Brian
>
> On Dec 30, 2008, at 5:57 AM, Sandeep Dhawan wrote:
>
>> Hi,
>>
>> I am trying to create a Hadoop cluster which can handle 2000 write requests per second.
>> In each write request I would be writing a line of size 1KB to a file.
>>
>> I would be using machines with the following configuration:
>> Platform: Red Hat Linux 9.0
>> CPU: 2.07 GHz
>> RAM: 1GB
>>
>> Can anyone help by giving me some pointers/guidelines as to how to go about setting up such a cluster?
>> What are the configuration parameters in Hadoop we can tweak to enhance the performance of the cluster?
>>
>> Thanks,
>> Sandeep
>> --
>> View this message in context:
>> http://www.nabble.com/Performance-testing-tp21216266p21216266.html
>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
--
View this message in context: http://www.nabble.com/Performance-testing-tp21216266p21228264.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
Performance testing
Hi,

I am trying to create a Hadoop cluster which can handle 2000 write requests per second.
In each write request I would be writing a line of size 1KB to a file.

I would be using machines with the following configuration:
Platform: Red Hat Linux 9.0
CPU: 2.07 GHz
RAM: 1GB

Can anyone help by giving me some pointers/guidelines as to how to go about setting up such a cluster?
What are the configuration parameters in Hadoop we can tweak to enhance the performance of the cluster?

Thanks,
Sandeep
--
View this message in context: http://www.nabble.com/Performance-testing-tp21216266p21216266.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
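[Editor's note] For scale: 2000 writes/s of 1KB each is only about 2 MB/s of aggregate throughput, which even one data node can absorb; the expensive part in HDFS is per-file overhead, so the usual approach is to funnel the small records through one long-lived stream rather than creating a file per request. A hedged sketch in plain Java (no Hadoop APIs; all names are illustrative):

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Arrays;

public class BatchedWrites {
    // Append many small records to one open, buffered stream instead of
    // creating one file per request; HDFS handles a few large files far
    // better than thousands of tiny ones.
    public static long writeRecords(File target, int count) throws IOException {
        char[] record = new char[1023];       // 1023 data bytes + '\n' = 1KB per record
        Arrays.fill(record, 'x');
        BufferedWriter out = new BufferedWriter(new FileWriter(target));
        try {
            for (int i = 0; i < count; i++) {
                out.write(record);
                out.write('\n');
            }
        } finally {
            out.close();
        }
        return target.length();               // 2000 records -> 2,048,000 bytes, ~2 MB
    }
}
```

On HDFS the equivalent would be keeping one FSDataOutputStream open per batch window and rolling to a new file periodically.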
Re: Append to Files..
Yes. The component which will be interfacing with Hadoop will be deployed in an application server, and the application server runs on 1.5 only. Therefore, we cannot migrate to 1.6.

At present I am thinking of decoupling the component from Hadoop, so that this component can run from within the application server, and of developing another component which would interface with it. While the old component can run on 1.5, the second component will be executed as a separate process using 1.6 and will interface with Hadoop. In this way, we would be able to easily upgrade to future Hadoop versions without causing any impact on the application server.

Brian Bockelman wrote:
>
> Hey Sandeep,
>
> Is there a non-application reason for you to not upgrade? I.e., are you working on a platform which does not have 1.6 yet?
>
> Looking at the JIRA ticket, it seems that they held off this new requirement until Macs got 1.6.
>
> Brian
>
> On Dec 2, 2008, at 12:26 AM, Ariel Rabkin wrote:
>
>> File append is a major change, not a small bugfix. Probably, you need to bite the bullet and upgrade to a newer JDK. :(
>>
>> On Mon, Dec 1, 2008 at 4:29 AM, Sandeep Dhawan, Noida wrote:
>>> Hello,
>>>
>>> I am currently using hadoop-0.18.0. I am not able to append to files in DFS. I came across a fix which was done in version 0.19.0
>>> (http://issues.apache.org/jira/browse/HADOOP-1700). But I cannot migrate to the 0.19.0 version because it runs on JDK 1.6, and I have to stick to JDK 1.5. Therefore, I would like to know if there is a patch available for this bug for 0.18.0.
>>>
>>> Any assistance in this matter will be greatly appreciated.
>>>
>>> Eagerly waiting for your response.
>>>
>>> Thanks,
>>>
>>> Sandeep
>>>
>>
>> --
>> Ari Rabkin asrab...@gmail.com
>> UC Berkeley Computer Science Department
>
--
View this message in context: http://www.nabble.com/Append-to-Files..-tp20771815p21215934.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
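[Editor's note] The decoupling described above amounts to launching the Hadoop-facing component in its own JVM so the application server can stay on 1.5 while the bridge process runs on 1.6. A hedged sketch of that launch using java.lang.ProcessBuilder (the JDK path, classpath, and class name are placeholders, not real artifacts):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class BridgeLauncher {
    // Build the command line for the bridge JVM. The app server (on 1.5)
    // calls launch(); the child process runs under the 1.6 JDK and is the
    // only thing that links against the Hadoop jars.
    public static List<String> buildCommand(String java6Home, String classpath, String mainClass) {
        List<String> cmd = new ArrayList<String>();
        cmd.add(java6Home + "/bin/java");   // explicit 1.6 JVM, not the server's own
        cmd.add("-cp");
        cmd.add(classpath);                 // hadoop jars live only on this classpath
        cmd.add(mainClass);
        return cmd;
    }

    public static Process launch(String java6Home, String classpath, String mainClass) throws IOException {
        return new ProcessBuilder(buildCommand(java6Home, classpath, mainClass)).start();
    }
}
```

The two processes would then talk over a socket, pipe, or shared directory, so the Hadoop version can change without touching the application server.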
File Modification timestamp
Hi,

I have an application which creates a simple text file on hdfs. There is a second application which processes this file. The second application picks up the file for processing only when the file has not been modified for 10 mins. In this way, the second application is sure that this file is ready for processing.

But what is happening is that Hadoop is not updating the modification timestamp of the file even while the file is being written into. The modification timestamp of the file is the same as the timestamp when the file was created.

I am using hadoop 0.18.2.

1. Is this a bug in hadoop, or is this the way hadoop works?
2. Is there a way by which I can programmatically set the modification timestamp of the file?

Thanks,
Sandeep

--
View this message in context: http://www.nabble.com/File-Modification-timestamp-tp21215824p21215824.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
Detect Dead DataNode
Hi,

I have a 2-node Hadoop cluster running on Windows using Cygwin. When I open the web GUI to view the number of Live Nodes, it shows 2. But when I kill the slave node and refresh the GUI, it still shows the number of Live Nodes as 2. It's only after some 20-30 mins that the master node is able to detect the failure, which is then reflected in the GUI. It then shows:

Live Nodes : 1
Dead Nodes : 1

Also, after killing the slave datanode, if I try to copy a file from the local file system, it fails.

1. Is there a way by which we can configure the time interval after which the master node declares a datanode as dead?
2. Why does the file transfer fail when one of the slave nodes is dead and the master node is alive?

--
View this message in context: http://www.nabble.com/Detect-Dead-DataNode-tp21202029p21202029.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
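[Editor's note] On question 1: in the 0.18 era the namenode considered a datanode dead after roughly 2 * heartbeat.recheck.interval + 10 * dfs.heartbeat.interval, which is about 10.5 minutes with defaults. A hedged hadoop-site.xml sketch for shortening detection; the key names and defaults should be double-checked against the hadoop-default.xml of your release:

```xml
<!-- hadoop-site.xml sketch: shorten dead-datanode detection.
     Key names as of the 0.18 era; verify against hadoop-default.xml. -->
<property>
  <name>dfs.heartbeat.interval</name>
  <value>3</value> <!-- seconds between datanode heartbeats (default) -->
</property>
<property>
  <name>heartbeat.recheck.interval</name>
  <value>60000</value> <!-- ms; lowering this from the default 300000 shortens detection -->
</property>
```

On question 2: with the default replication factor of 3 on a 2-node cluster, a write cannot place all replicas once a node is down, which is consistent with the copy failing until the namenode notices the dead node and stops targeting it.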
How to Install Application on Hadoop
Most stable version of Hadoop
Append to Files..
Hello,

I am currently using hadoop-0.18.0. I am not able to append to files in DFS. I came across a fix which was done in version 0.19.0 (http://issues.apache.org/jira/browse/HADOOP-1700). But I cannot migrate to the 0.19.0 version because it runs on JDK 1.6, and I have to stick to JDK 1.5. Therefore, I would like to know if there is a patch available for this bug for 0.18.0.

Any assistance in this matter will be greatly appreciated.

Eagerly waiting for your response.

Thanks,
Sandeep