This is basically it (from the first time it died while copying). We log at
WARN level and above:
2011-05-27 16:44:27,565 WARN org.apache.hadoop.hbase.master.CatalogJanitor: Failed scan of catalog table
java.net.SocketTimeoutException: Call to /10.100.2.6:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.100.1.39:37717 remote=/10.100.2.6:60020]
        at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:784)
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
        at $Proxy6.delete(Unknown Source)
        at org.apache.hadoop.hbase.catalog.MetaEditor.deleteDaughterReferenceInParent(MetaEditor.java:201)
        at org.apache.hadoop.hbase.master.CatalogJanitor.removeDaughterFromParent(CatalogJanitor.java:233)
        at org.apache.hadoop.hbase.master.CatalogJanitor.hasReferences(CatalogJanitor.java:275)
        at org.apache.hadoop.hbase.master.CatalogJanitor.checkDaughter(CatalogJanitor.java:202)
        at org.apache.hadoop.hbase.master.CatalogJanitor.cleanParent(CatalogJanitor.java:166)
        at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:120)
        at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:85)
        at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.100.1.39:37717 remote=/10.100.2.6:60020]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        at java.io.FilterInputStream.read(FilterInputStream.java:116)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:281)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
        at java.io.DataInputStream.readInt(DataInputStream.java:370)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:521)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:459)
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Stack
Sent: Wednesday, June 01, 2011 12:29 PM
To: [email protected]
Subject: Re: wrong region exception
On Wed, Jun 1, 2011 at 7:32 AM, Robert Gonzalez
<[email protected]> wrote:
> We have a table copy program that copies the data from one table to another,
> and we can give it the start/end keys. In this case we created a new blank
> table with the essential column families and let it run with start/end to be
> the whole range, 0-maxkey. At about 30% of the way through, which is roughly
> 600 million rows, it died trying to write to the new table with the wrong
> region exception. When we tried to restart the copy from that key + some
> delta, it still crapped out. No explanation in the logs the first time, but
> a series of timeouts in the second run. Now we are trying the copy again
> with a new table.
>
Robert:
Do you still have the master logs for this copy run? If so, put them
somewhere I can pull them from (or send them to me) and I'll take a
look. I'd like to see the logs from the cluster to which you were
copying the data.
St.Ack
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of
> Stack
> Sent: Tuesday, May 31, 2011 6:42 PM
> To: [email protected]
> Subject: Re: wrong region exception
>
> So, what about this new WrongRegionException in the new cluster? Can you
> figure out how it came about? In the new cluster, is there also a hole? Did
> you start the new cluster fresh or copy from the old cluster?
>
> St.Ack
>
> On Tue, May 31, 2011 at 1:55 PM, Robert Gonzalez
> <[email protected]> wrote:
>> Yeah, we learned the hard way early last year to follow the guidelines
>> religiously. I've gone over the requirements and checked off everything.
>> We even re-did our tables to only have 4 column families, down from 4x that
>> amount. We are at a loss to find out why we seem to be cursed when it
>> comes to HBase. Hadoop is performing like a charm, pretty much every
>> machine is busy 24/7.
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of
>> Stack
>> Sent: Tuesday, May 31, 2011 3:03 PM
>> To: [email protected]
>> Subject: Re: wrong region exception
>>
>> On Tue, May 31, 2011 at 10:42 AM, Robert Gonzalez
>> <[email protected]> wrote:
>>> Now I'm getting the wrong region exception on the new table that I'm
>>> copying the old table to. Running hbck reveals an inconsistency in the new
>>> table. The frustration is unbelievable. Like I said before, it doesn't
>>> appear that HBase is ready for prime time. I don't know how companies are
>>> using this successfully, it doesn't appear plausible.
>>>
>>
>>
>> Sorry you are not having a good experience. I've not seen
>> WrongRegionException in ages (grep these lists yourself). Makes me suspect
>> your environment. For sure you've read the requirements section in the
>> manual and set ulimits, nprocs and xceivers up?
>>
>> St.Ack
>>
>