Hadoop Exceptions

2009-01-19 Thread Sandeep Dhawan

Here are a few Hadoop exceptions that I am getting while running a MapReduce job
on 700 MB of data on a 3-node cluster on the Windows platform (using cygwin):

1. 2009-01-08 17:54:10,597 INFO org.apache.hadoop.dfs.DataNode: writeBlock
blk_-4309088198093040326_1001 received exception java.io.IOException: Block
blk_-4309088198093040326_1001 is valid, and cannot be written to.
2009-01-08 17:54:10,597 ERROR org.apache.hadoop.dfs.DataNode:
DatanodeRegistration(10.120.12.91:50010,
storageID=DS-70805886-10.120.12.91-50010-1231381442699, infoPort=50075,
ipcPort=50020):DataXceiver: java.io.IOException: Block
blk_-4309088198093040326_1001 is valid, and cannot be written to.
at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:921)
at 
org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:2364)
at
org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1218)
at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:1076)
at java.lang.Thread.run(Thread.java:619)

2. This particular job succeeded. Is it possible that this task was a
speculative execution attempt that was killed before it could start?
Exception in thread "main" java.lang.NullPointerException
at 
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2195)

3. 2009-01-15 21:27:13,547 WARN org.apache.hadoop.mapred.ReduceTask:
attempt_200901152118_0001_r_00_0 Merge of the inmemory files threw an
exception: java.io.IOException: Expecting a line not the end of stream
at org.apache.hadoop.fs.DF.parseExecResult(DF.java:109)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:179)
at org.apache.hadoop.util.Shell.run(Shell.java:134)
at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
at
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:296)
at
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
at
org.apache.hadoop.mapred.MapOutputFile.getInputFileForWrite(MapOutputFile.java:160)
at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2105)
at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2078)

4. 2009-01-15 21:27:13,547 INFO org.apache.hadoop.mapred.ReduceTask:
In-memory merge complete: 47 files left.
2009-01-15 21:27:13,579 WARN org.apache.hadoop.mapred.TaskTracker: Error
running child
java.io.IOException: attempt_200901152118_0001_r_00_0The reduce copier
failed
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
at 
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)

5. Caused by: java.io.IOException: An established connection was aborted by
the software in your host machine
... 12 more
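For context on exception (3): org.apache.hadoop.fs.DF obtains free-space figures
by shelling out to "df -k" for a local directory and parsing the first data line
of its output, so an empty or truncated df output (which can happen under
cygwin) produces exactly this "Expecting a line not the end of stream" error,
and the reduce copier failure in (4) may simply be fallout from it. A rough
standalone sketch of that logic follows; it is a simplified illustration based
on my reading of the stack trace, not the actual Hadoop source.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.StringTokenizer;

// Simplified illustration of DF-style free-space probing: run
// "df -k <path>" and parse the single data line. If df prints nothing
// usable (as can happen under cygwin), the read returns null and an
// IOException like the one in (3) is thrown.
public class DfSketch {
  public static long availableKb(String path) throws IOException, InterruptedException {
    Process p = Runtime.getRuntime().exec(new String[] { "df", "-k", path });
    BufferedReader in = new BufferedReader(new InputStreamReader(p.getInputStream()));
    in.readLine();                      // skip the header line
    String line = in.readLine();        // the data line for <path>
    if (line == null) {
      throw new IOException("Expecting a line not the end of stream");
    }
    StringTokenizer tok = new StringTokenizer(line, " \t");
    tok.nextToken();                    // filesystem
    tok.nextToken();                    // capacity
    tok.nextToken();                    // used
    long available = Long.parseLong(tok.nextToken()); // available, in KB
    p.waitFor();
    return available;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(availableKb(args.length > 0 ? args[0] : "/tmp") + " KB available");
  }
}

Running "df -k" by hand inside cygwin against the mapred.local.dir path should
show quickly whether its output is something Hadoop can parse.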

Can anyone give me some pointers as to what the issue could be?

Thanks,
Sandeep





Re: Performance testing

2009-01-19 Thread Sandeep Dhawan

Hi,

I am in the process of following your guidelines. 

I would like to know:

1. How does block size impact the performance of a MapReduce job?
2. Does performance improve if I set up the NameNode and JobTracker on
different machines? At present, I am running the NameNode and JobTracker on the
same master machine, interconnected with 2 slave machines that run a DataNode
and TaskTracker.
3. What should the replication factor be for a 3-node cluster?
4. How does io.sort.mb impact the performance of the cluster? (The sketch below
shows where these settings live.)
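Not answers to 1-4, but for orientation on where these knobs live: in the 0.18
line they are plain configuration properties, settable cluster-wide in
hadoop-site.xml or per job. A minimal sketch, assuming the pre-0.20 property
names dfs.block.size, dfs.replication and io.sort.mb (check your release's
defaults):

import org.apache.hadoop.mapred.JobConf;

// Minimal sketch of where the settings discussed above are made.
// The property names are the pre-0.20 ones (assumed here for 0.18);
// they can also be set cluster-wide in hadoop-site.xml instead.
public class TuningSketch {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    conf.setLong("dfs.block.size", 128L * 1024 * 1024); // HDFS block size for files this job writes
    conf.setInt("dfs.replication", 2);                  // replicas per block; 2 or 3 is usual on 3 nodes
    conf.setInt("io.sort.mb", 200);                     // map-side sort buffer; needs matching task heap

    System.out.println("dfs.block.size  = " + conf.getLong("dfs.block.size", 0));
    System.out.println("dfs.replication = " + conf.getInt("dfs.replication", 0));
    System.out.println("io.sort.mb      = " + conf.getInt("io.sort.mb", 0));
  }
}

For question 3, the stock default replication factor is 3, which on a 3-node
cluster puts a copy of every block on every node; dropping it to 2 saves write
bandwidth and disk at the cost of less redundancy.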

Thanks,
Sandeep 


Brian Bockelman wrote:
> 
> Hey Sandeep,
> 
> I'd do a couple of things:
> 1) Run your test.  Do something which will be similar to your actual  
> workflow.
> 2) Save the resulting Ganglia plots.  This will give you a hint as to  
> where things are bottlenecking (memory, CPU, wait I/O).
> 3) Watch iostat and find out the I/O rates during the test.  Compare  
> this to the I/O rates of a known I/O benchmark (e.g., Bonnie++).
> 4) Finally, watch the logfiles closely.  If you start to overload  
> things, you'll usually get a pretty good indication from Hadoop where  
> things go wrong.  Once something does go wrong, *then* look through  
> the parameters to see what can be done.
> 
> There's about a hundred things which can go wrong between the kernel,  
> the OS, Java, and the application code.  It's difficult to make an  
> educated guess beforehand without some hint from the data.
> 
> Brian
> 
> On Dec 31, 2008, at 1:30 AM, Sandeep Dhawan wrote:
> 
>>
>> Hi Brian,
>>
>> That's what my issue is, i.e. "How do I ascertain the bottleneck?", or in
>> other words, if the results obtained from the performance testing are not
>> up to the mark, how do I find the bottleneck?
>>
>> How can we confidently say that the OS and hardware are the culprits? I
>> understand that using the latest OS and hardware can improve performance
>> irrespective of the application, but my real worry is "what next": how can I
>> further increase the performance, and what should I look at to identify the
>> areas that could be potential problems or "hotspots"?
>>
>> Thanks for your comments.
>>
>> ~Sandeep~
>>
>>
>> Brian Bockelman wrote:
>>>
>>> Hey Sandeep,
>>>
>>> I would warn against premature optimization: first, run your test,
>>> then see how far from your target you are.
>>>
>>> Of course, I'd wager you'd find that the hardware you are using is
>>> woefully underpowered and that your OS is 5 years old.
>>>
>>> Brian
>>>
>>> On Dec 30, 2008, at 5:57 AM, Sandeep Dhawan wrote:
>>>
>>>>
>>>> Hi,
>>>>
>>>> I am trying to create a Hadoop cluster that can handle 2000 write
>>>> requests per second.
>>>> In each write request I would be writing a 1 KB line to a file.
>>>>
>>>> I would be using machines with the following configuration:
>>>> Platform: Red Hat Linux 9.0
>>>> CPU : 2.07 GHz
>>>> RAM : 1 GB
>>>>
>>>> Can anyone give me some pointers/guidelines on how to go about
>>>> setting up such a cluster?
>>>> What configuration parameters in Hadoop can we tweak to
>>>> enhance the performance of the Hadoop cluster?
>>>>
>>>> Thanks,
>>>> Sandeep
>>>
>>>
>>>
>>
> 
> 
> 




Re: File Modification timestamp

2008-12-30 Thread Sandeep Dhawan

Hi Dhruba,

The file is being closed properly, but the timestamp does not get modified; the
modification timestamp still shows the file creation time.
I am creating a new file and writing data into it.
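For what it's worth, here is a minimal sketch of how one could print the
modification time the NameNode reports just before and just after the close;
the path is made up, and the "updated on close" behaviour is Dhruba's
description below, not something I have verified on 0.18.2.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal sketch: create a file, write to it, close it, and print the
// modification time HDFS reports before and after the close. The path
// is hypothetical.
public class MtimeCheck {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/tmp/mtime-check.txt");

    FSDataOutputStream out = fs.create(p, true);
    out.write("one line of data\n".getBytes());
    long before = fs.getFileStatus(p).getModificationTime();
    out.close();                        // the mtime update is expected to happen here
    long after = fs.getFileStatus(p).getModificationTime();

    System.out.println("mtime before close: " + before);
    System.out.println("mtime after close : " + after);
  }
}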

Thanks,
Sandeep


Dhruba Borthakur-2 wrote:
> 
> I believe that file modification times are updated only when the file is
> closed. Are you "appending" to a preexisting file?
> 
> thanks,
> dhruba
> 
> 
> On Tue, Dec 30, 2008 at 3:14 AM, Sandeep Dhawan  wrote:
> 
>>
>> Hi,
>>
>> I have a application which creates a simple text file on hdfs. There is a
>> second application which processes this file. The second application
>> picks
>> up the file for processing only when the file has not been modified for
>> 10
>> mins. In this way, the second application is sure that this file is ready
>> for processing.
>>
>> But, what is happening is that the Hadoop is not updating the
>> modification
>> timestamp of the file even when the file is being written into. The
>> modification timestamp of the file is same as the timestamp when the file
>> was created.
>>
>> I am using hadoop 0.18.2.
>>
>> 1. Is this a bug in Hadoop, or is this the way Hadoop works?
>> 2. Is there a way by which I can programmatically set the modification
>> timestamp of the file?
>>
>> Thanks,
>> Sandeep
>>
>>
>>
> 
> 




Re: Performance testing

2008-12-30 Thread Sandeep Dhawan

Hi Brian,

That's what my issue is, i.e. "How do I ascertain the bottleneck?", or in other
words, if the results obtained from the performance testing are not up to the
mark, how do I find the bottleneck?

How can we confidently say that the OS and hardware are the culprits? I
understand that using the latest OS and hardware can improve performance
irrespective of the application, but my real worry is "what next": how can I
further increase the performance, and what should I look at to identify the
areas that could be potential problems or "hotspots"?

Thanks for your comments.

~Sandeep~


Brian Bockelman wrote:
> 
> Hey Sandeep,
> 
> I would warn against premature optimization: first, run your test,  
> then see how far from your target you are.
> 
> Of course, I'd wager you'd find that the hardware you are using is  
> woefully underpowered and that your OS is 5 years old.
> 
> Brian
> 
> On Dec 30, 2008, at 5:57 AM, Sandeep Dhawan wrote:
> 
>>
>> Hi,
>>
>> I am trying to create a Hadoop cluster that can handle 2000 write
>> requests per second.
>> In each write request I would be writing a 1 KB line to a file.
>>
>> I would be using machines with the following configuration:
>> Platform: Red Hat Linux 9.0
>> CPU : 2.07 GHz
>> RAM : 1 GB
>>
>> Can anyone give me some pointers/guidelines on how to go about
>> setting up such a cluster?
>> What configuration parameters in Hadoop can we tweak to
>> enhance the performance of the Hadoop cluster?
>>
>> Thanks,
>> Sandeep
> 
> 
> 




Performance testing

2008-12-30 Thread Sandeep Dhawan

Hi,

I am trying to create a Hadoop cluster that can handle 2000 write requests
per second.
In each write request I would be writing a 1 KB line to a file.
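For scale, that is roughly 2 MB/s if the lines are funneled into one open HDFS
stream rather than one file per request, which is the write pattern HDFS
handles best. A minimal sketch of that pattern; the path and the 10-second
duration are placeholders, not part of the actual plan.

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch of the target workload: many 1 KB "lines" funneled into a
// single open HDFS stream, then a crude lines-per-second figure.
public class WriteLoadSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/tmp/write-load-test.log");

    byte[] line = new byte[1024];       // one 1 KB record
    Arrays.fill(line, (byte) 'x');
    line[1023] = (byte) '\n';

    FSDataOutputStream out = fs.create(p, true);
    long start = System.currentTimeMillis();
    long written = 0;
    while (System.currentTimeMillis() - start < 10000) {   // run for ~10 seconds
      out.write(line);
      written++;
    }
    out.close();
    long elapsedMs = Math.max(System.currentTimeMillis() - start, 1);
    System.out.println(written + " x 1 KB lines in " + elapsedMs + " ms ("
        + (written * 1000 / elapsedMs) + " lines/sec)");
  }
}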

I would be using machines with the following configuration:
Platform: Red Hat Linux 9.0
CPU : 2.07 GHz
RAM : 1 GB

Can anyone give me some pointers/guidelines on how to go about setting up such
a cluster?
What configuration parameters in Hadoop can we tweak to enhance the performance
of the Hadoop cluster?

Thanks,
Sandeep



Re: Append to Files..

2008-12-30 Thread Sandeep Dhawan

Yes. The component that will be interfacing with Hadoop will be deployed in an
application server, and the application server runs on 1.5 only. Therefore, we
cannot migrate to 1.6.

At present I am thinking of decoupling the component from Hadoop so that it can
keep running inside the application server, and of developing another component
that interfaces with it. While the old component runs on 1.5, the second
component will be executed as a separate process on 1.6 and will interface with
Hadoop. In this way, we would be able to upgrade to future Hadoop versions
without any impact on the application server.


Brian Bockelman wrote:
> 
> Hey Sandeep,
> 
> Is there a non-application reason for you to not upgrade?  I.e., are  
> you working on a platform which does not have 1.6 yet?
> 
> Looking at the JIRA ticket, it seems that they held off this new  
> requirement until Macs got 1.6.
> 
> Brian
> 
> On Dec 2, 2008, at 12:26 AM, Ariel Rabkin wrote:
> 
>> File append is a major change, not a small bugfix.  Probably, you need
>> to bite the bullet and upgrade to a newer JDK. :(
>>
>> On Mon, Dec 1, 2008 at 4:29 AM, Sandeep Dhawan, Noida  
>>  wrote:
>>> Hello,
>>>
>>>
>>>
>>> I am currently using hadoop-0.18.0. I am not able to append files in
>>> DFS. I came across a fix which was done on version 0.19.0
>>> (http://issues.apache.org/jira/browse/HADOOP-1700). But I cannot  
>>> migrate
>>> to 0.19.0 version because it runs on JDK 1.6 and I have
>>>
>>> to stick to JDK 1.5 Therefore, I would like to know, if there is  
>>> patch
>>> available for this bug for 0.18.0.
>>>
>>>
>>>
>>> Any assistance in this matter will be greatly appreciated.
>>>
>>>
>>>
>>> Eagerly waiting for your response.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Sandeep
>>>
>>>
>>>
>>>
>>
>>
>>
>> -- 
>> Ari Rabkin asrab...@gmail.com
>> UC Berkeley Computer Science Department
> 
> 
> 




File Modification timestamp

2008-12-30 Thread Sandeep Dhawan

Hi,

I have an application which creates a simple text file on HDFS. There is a
second application which processes this file. The second application picks up
the file for processing only when the file has not been modified for 10
minutes. In this way, the second application is sure that the file is ready
for processing.
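A minimal sketch of that readiness check on the consumer side, assuming it is
driven off FileStatus.getModificationTime(); the path and the hard-coded
10-minute threshold are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch of the second application's check: treat the file as ready once
// its reported modification time is at least 10 minutes in the past.
// This only works if HDFS advances the mtime while the file is being
// written, which is exactly the behaviour in question here.
public class ReadyCheck {
  private static final long QUIET_PERIOD_MS = 10 * 60 * 1000L;

  public static boolean isReady(FileSystem fs, Path p) throws Exception {
    FileStatus st = fs.getFileStatus(p);
    return System.currentTimeMillis() - st.getModificationTime() >= QUIET_PERIOD_MS;
  }

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path(args.length > 0 ? args[0] : "/data/incoming/report.txt");
    System.out.println(p + (isReady(fs, p) ? " is ready" : " is still settling"));
  }
}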

But what is happening is that Hadoop is not updating the modification
timestamp of the file even while the file is being written to. The
modification timestamp of the file is the same as the timestamp when the file
was created.

I am using Hadoop 0.18.2.

1. Is this a bug in Hadoop, or is this the way Hadoop works?
2. Is there a way by which I can programmatically set the modification
timestamp of the file?

Thanks,
Sandeep
 



Detect Dead DataNode

2008-12-29 Thread Sandeep Dhawan

Hi,

I have a 2-node Hadoop cluster running on Windows using cygwin.
When I open the web GUI to view the number of Live Nodes, it shows 2.
But when I kill the slave node and refresh the GUI, it still shows the number
of Live Nodes as 2.

It's only after some 20-30 minutes that the master node detects the failure,
which is then reflected in the GUI. It then shows:

Live Node : 1
Dead Node : 1

Also, after killing the slave DataNode, if I try to copy a file from the local
file system to HDFS, it fails.

1. Is there a way to configure the time interval after which the master node
declares a DataNode dead? (See the sketch below.)
2. Why does the file transfer fail when one of the slave nodes is dead and the
master node is alive?
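On question 1: as far as I can tell, in this era of Hadoop the NameNode
declares a DataNode dead after roughly
2 * heartbeat.recheck.interval + 10 * dfs.heartbeat.interval (about 10.5
minutes with the defaults), and both values can be lowered in hadoop-site.xml.
The property names here are my assumption for the 0.18 line, so verify them
against your release. A minimal sketch of the arithmetic:

import org.apache.hadoop.conf.Configuration;

// Sketch of how the dead-node interval is derived (assumed property
// names and formula for the 0.18 line; verify against your release):
//   dead interval = 2 * recheck interval (ms) + 10 * heartbeat interval (s)
public class DeadNodeIntervalSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    long recheckMs    = conf.getLong("heartbeat.recheck.interval", 5 * 60 * 1000L); // assumed name
    long heartbeatSec = conf.getLong("dfs.heartbeat.interval", 3L);                 // assumed name
    long deadAfterMs  = 2 * recheckMs + 10 * 1000L * heartbeatSec;
    System.out.println("DataNode declared dead after ~" + (deadAfterMs / 1000) + " s");
  }
}

That alone would not explain the 20-30 minutes observed here, so it is also
worth checking the NameNode log for what it reports around the time the slave
is killed.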




How to Install Application on Hadoop

2008-12-16 Thread Sandeep Dhawan, Noida

Most stable version of Hadoop

2008-12-16 Thread Sandeep Dhawan, Noida

Append to Files..

2008-12-01 Thread Sandeep Dhawan, Noida
Hello,

 

I am currently using hadoop-0.18.0. I am not able to append to files in DFS. I
came across a fix that was made in version 0.19.0
(http://issues.apache.org/jira/browse/HADOOP-1700). But I cannot migrate to the
0.19.0 version because it runs on JDK 1.6, and I have to stick to JDK 1.5.
Therefore, I would like to know if there is a patch available for this bug for
0.18.0.
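Not a patch, but in case it helps while staying on 0.18: a common workaround on
releases without a real append is to rewrite the file, i.e. copy the existing
contents into a temporary file, add the new bytes, and rename it over the
original. A rough sketch with hypothetical paths, using basic FileSystem calls
that I believe are all present in 0.18; it is expensive for large files and not
atomic with respect to concurrent readers.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Copy-and-rename "append": rewrite the whole file instead of appending.
// Workable on releases without HADOOP-1700, but costly for big files and
// not safe against concurrent readers or writers. Paths are hypothetical.
public class PseudoAppend {
  public static void append(FileSystem fs, Path target, byte[] extra, Configuration conf)
      throws Exception {
    Path tmp = new Path(target.getParent(), target.getName() + ".appending");
    FSDataOutputStream out = fs.create(tmp, true);
    if (fs.exists(target)) {
      FSDataInputStream in = fs.open(target);
      IOUtils.copyBytes(in, out, conf, false);    // copy the existing contents
      in.close();
    }
    out.write(extra);                             // then the bytes to "append"
    out.close();
    if (fs.exists(target)) {
      fs.delete(target, false);                   // replace the original
    }
    fs.rename(tmp, target);
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    append(fs, new Path("/tmp/app.log"), "one more line\n".getBytes(), conf);
  }
}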

 

Any assistance in this matter will be greatly appreciated. 

 

Eagerly waiting for your response.

 

Thanks,

Sandeep