Re: nutch-2.0-fetcher fails in reduce stage

alxsss Tue, 16 Oct 2012 23:41:39 -0700

Hello,

Today I closely followed all the HBase and Hadoop logs. When map reached 
100%, reduce was at 33%. Then, when reduce reached 66%, I saw the following 
error in Hadoop's datanode log:


2012-10-16 22:44:54,634 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-179532189-192.168.1.4-50010-1349640973409, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:268)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:312)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:532)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:398)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:107)
    at java.lang.Thread.run(Thread.java:662)

HBase's regionserver then stopped without any errors. I do not see any errors 
in the HBase master or Hadoop namenode logs either.
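
When following several logs at once, it can help to pull out only the ERROR entries together with their stack traces, so that they can be lined up by timestamp. The following is a minimal sketch, assuming the log4j-style layout seen in the datanode excerpt above (timestamp, level, then message). The function name `error_entries` and the dictionary layout are illustrative and not part of Hadoop or HBase.

```python
import re
from typing import Iterable, List

# Start of a log entry such as:
# "2012-10-16 22:44:54,634 ERROR org.apache...DataNode: message"
# Assumption: every entry begins with this timestamp/level prefix.
ENTRY_START = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+) (.*)$"
)


def error_entries(lines: Iterable[str]) -> List[dict]:
    """Collect ERROR entries and attach the continuation lines that follow."""
    entries = []
    current = None  # the ERROR entry currently receiving trace lines
    for raw in lines:
        line = raw.rstrip("\n")
        match = ENTRY_START.match(line)
        if match:
            timestamp, level, message = match.groups()
            current = None  # a new entry ends any previous stack trace
            if level == "ERROR":
                current = {
                    "timestamp": timestamp,
                    "message": message,
                    "trace": [],
                }
                entries.append(current)
        elif current is not None and line.strip():
            # Non-empty line without a timestamp: part of the stack trace.
            current["trace"].append(line.strip())
    return entries
```

For example, `error_entries(open(path))` could be run over each log in turn, where `path` is a placeholder for the datanode, regionserver, master or namenode log file, and the results compared by their `timestamp` values.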


@Lewis
I am not sure what you mean by configuring Nutch to run behind a proxy. I closely 
followed the HBase configuration guide at http://hbase.apache.org/book/configuration.html

box1 is a local Fedora Linux box with a dynamic IP.
box2 is a dedicated Fedora server with a static IP.

On box2 the fetcher runs without any errors, but the generated set is 100,000 
times smaller than the set on box1.
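
The two boxes differ in how they obtain their IP address, and the datanode log above shows the datanode registered as 127.0.0.1 while its storageID contains 192.168.1.4. One thing that may be worth checking on box1 is which address the machine's own hostname resolves to. The sketch below uses only the Python standard library; the helper names are hypothetical, and whether a loopback result is actually related to this failure is an assumption that the logs do not confirm.

```python
import ipaddress
import socket


def is_loopback(address: str) -> bool:
    """Return True if the address lies in a loopback range (e.g. 127.0.0.0/8)."""
    return ipaddress.ip_address(address).is_loopback


def resolved_addresses(hostname: str) -> list:
    """Return the distinct IPv4 addresses that the hostname resolves to."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each result is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is (address, port).
    return sorted({info[4][0] for info in infos})


def describe_local_host() -> list:
    """Pair each address of this machine's hostname with a loopback flag."""
    name = socket.gethostname()
    return [(name, addr, is_loopback(addr)) for addr in resolved_addresses(name)]
```

Calling `describe_local_host()` on each box and comparing the results shows whether the hostname maps to a loopback address on one machine and to a LAN address on the other.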

Thanks in advance.
Alex.



-----Original Message-----
From: Lewis John Mcgibbney <[email protected]>
To: user <[email protected]>
Sent: Tue, Oct 16, 2012 2:40 am
Subject: Re: nutch-2.0-fetcher fails in reduce stage

Hi Alex,

I've seen similar exceptions numerous times [0] when running the Gora
test suite against HBase. However, this _always_ occurred against an
HBase version other than the officially supported version (which is
0.90.4) when behind a local proxy, so I am immediately tempted to
speculate that this may be the source of the problem.

On Tue, Oct 16, 2012 at 3:50 AM,  <[email protected]> wrote:

>         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>
> org.apache.gora.util.GoraException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy interface org.apache.hadoop.hbase.ipc.HRegionInterface to master/192.168.1.4:60020 after attempts=1

The above two slices of the stack would also indicate that this is the case.

>
> bin/nutch inject works fine. Also, I have a different linux box. fetcher
> with the same config runs fine, but the generated set is much less than in
> the first linux box.

I don't really understand this; it is quite ambiguous. Can you clearly
distinguish between box1 and box2, and say which one works and which one
doesn't? Also, how is HBase configured on each of these boxes, and how
are you running Nutch?

>
> Any ideas how to fix this issue and what is the benefit of running fetcher
> in pseudo distributed mode against the local one?
>

Finally, is your Nutch deployment configured to run behind a proxy? I
know there is no mention of this, but maybe there is more to this than
simply disabling iptables! I am not, however, HBase literate enough to
comment further on what configuration causes this, so I've copied in
the user@ gora list as well.

@user@

The original thread for this topic can be found below [1]

[0] http://www.mail-archive.com/[email protected]/msg00485.html
[1] http://www.mail-archive.com/[email protected]/msg07823.html

hth

Lewis
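
The exception quoted above reports a failure to reach master/192.168.1.4:60020, and the datanode log in this thread lists ports 50010, 50075 and 50020. One quick way to rule out basic reachability problems (a firewall rule, or a service bound to a different address) is to attempt a plain TCP connection to each port from the machine that runs Nutch. This is a minimal sketch: `can_connect` and `report` are hypothetical helpers, the port labels simply repeat the field names from the log excerpts, and a successful connect only shows that something is listening, not that HBase RPC itself works.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


# Port numbers copied from the log excerpts in this thread.
PORTS = {
    "DatanodeRegistration": 50010,
    "datanode infoPort": 50075,
    "datanode ipcPort": 50020,
    "HRegionInterface": 60020,
}


def report(host: str) -> dict:
    """Map each label in PORTS to whether the port on host accepted a connection."""
    return {name: can_connect(host, port) for name, port in PORTS.items()}
```

For instance, comparing `report("127.0.0.1")` with `report("192.168.1.4")` on box1 would show whether the services are reachable on the loopback address, the LAN address, or both.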
