Hello,
Today I closely followed all the HBase and Hadoop logs. When map reached
100%, reduce was at 33%. Then, when reduce reached 66%, I saw the following
error in Hadoop's datanode log:
2012-10-16 22:44:54,634 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-179532189-192.168.1.4-50010-1349640973409, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:268)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:312)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:532)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:398)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:107)
    at java.lang.Thread.run(Thread.java:662)
HBase's regionserver then stopped without logging any errors. I do not see any
errors in the HBase master or Hadoop namenode logs either.
@Lewis
I am not sure what you mean by a configuration to run behind a proxy. I closely
followed the HBase configuration guide at http://hbase.apache.org/book/configuration.html
box1 is a local Fedora Linux box with a dynamic IP.
box2 is a dedicated Fedora server with a static IP.
On box2 the fetcher runs without any errors, but the generated set is 100,000
times smaller than the set on box1.
Thanks in advance.
Alex.
-----Original Message-----
From: Lewis John Mcgibbney <[email protected]>
To: user <[email protected]>
Sent: Tue, Oct 16, 2012 2:40 am
Subject: Re: nutch-2.0-fetcher fails in reduce stage
Hi Alex,
I've seen similar exceptions numerous times [0] when running the Gora
test suite against HBase. However, this _always_ occurred against an
HBase version other than the officially supported one (0.90.4) and
when behind a local proxy, so I am immediately tempted to speculate
that this may be the source of the problem.
On Tue, Oct 16, 2012 at 3:50 AM, <[email protected]> wrote:
> at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>
> org.apache.gora.util.GoraException:
> org.apache.hadoop.hbase.client.RetriesExhaustedException:
> Failed setting up proxy interface org.apache.hadoop.hbase.ipc.HRegionInterface
> to master/192.168.1.4:60020 after attempts=1
The two stack trace excerpts above also suggest that this is the case.
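One quick way to see whether the regionserver port is reachable at all from the
box running Nutch is a plain TCP connect. The following is only a minimal sketch
in Python, assuming the host and port from the exception above; can_connect is a
hypothetical helper name, not part of Nutch, Gora, or HBase. It checks TCP
reachability only, not the HBase RPC handshake.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it does not speak the HBase
    RPC protocol, it only checks that something accepts the connection.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

For example, can_connect("192.168.1.4", 60020) would probe the address named in
the RetriesExhaustedException. A False result would point at networking or a
stopped regionserver rather than at Nutch itself.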
>
> bin/nutch inject works fine. Also, I have a different linux, box. fetcher
> with the same config runs fine, but the generated set is much less than in
> the first linux box.
I don't really understand this; it is quite ambiguous. Can you clearly
distinguish between box1 and box2, and say which one works and which
one doesn't? Also, how are your HBase configurations set up across
these boxes, and how are you running Nutch?
>
> Any ideas how to fix this issue and what is the benefit running fetcher in
> pseudo distributed mode against the local one?
>
Finally, is your Nutch deployment configured to run behind a proxy? I
know there is no mention of this, but maybe there is more to this than
simply disabling iptables! I am not, however, HBase-literate enough to
comment further on what configuration causes this, so I've copied in
the user@ Gora list as well.
@user@
The original thread for this topic can be found below [1]
[0] http://www.mail-archive.com/[email protected]/msg00485.html
[1] http://www.mail-archive.com/[email protected]/msg07823.html
hth
Lewis