RE: wrong region exception

Robert Gonzalez Wed, 01 Jun 2011 14:45:46 -0700

This is basically all there is from the first time it died while copying; we 
log at WARN level and above:


2011-05-27 16:44:27,565 WARN org.apache.hadoop.hbase.master.CatalogJanitor: Failed scan of catalog table
java.net.SocketTimeoutException: Call to /10.100.2.6:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.100.1.39:37717 remote=/10.100.2.6:60020]
        at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:784)
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
        at $Proxy6.delete(Unknown Source)
        at org.apache.hadoop.hbase.catalog.MetaEditor.deleteDaughterReferenceInParent(MetaEditor.java:201)
        at org.apache.hadoop.hbase.master.CatalogJanitor.removeDaughterFromParent(CatalogJanitor.java:233)
        at org.apache.hadoop.hbase.master.CatalogJanitor.hasReferences(CatalogJanitor.java:275)
        at org.apache.hadoop.hbase.master.CatalogJanitor.checkDaughter(CatalogJanitor.java:202)
        at org.apache.hadoop.hbase.master.CatalogJanitor.cleanParent(CatalogJanitor.java:166)
        at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:120)
        at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:85)
        at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.100.1.39:37717 remote=/10.100.2.6:60020]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        at java.io.FilterInputStream.read(FilterInputStream.java:116)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:281)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
        at java.io.DataInputStream.readInt(DataInputStream.java:370)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:521)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:459)

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Stack
Sent: Wednesday, June 01, 2011 12:29 PM
To: [email protected]
Subject: Re: wrong region exception

On Wed, Jun 1, 2011 at 7:32 AM, Robert Gonzalez 
<[email protected]> wrote:
> We have a table copy program that copies the data from one table to another, 
> and we can give it the start/end keys.  In this case we created a new blank 
> table with the essential column families and let it run with start/end to be 
> the whole range, 0-maxkey.  At about 30% of the way through, which is roughly 
> 600 million rows, it died trying to write to the new table with the wrong 
> region exception.  When we tried to restart the copy from that key + some 
> delta, it still crapped out.  No explanation in the logs the first time, but 
> a series of timeouts in the second run.  Now we are trying the copy again 
> with a new table.
>

Robert:

Do you still have the master logs for this copy run?  If so, put them 
somewhere I can pull them (or send them to me) and I'll take a look.  I'd 
like to see the logs from the cluster to which you were copying the data.

St.Ack
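
For readers following the thread, the start/end-key copy with restart that Robert describes can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual copy program: the tables are plain in-memory stand-ins, and `FlakyTable`, `copy_range` and the `checkpoint` dict are hypothetical names that do not come from HBase. It does not explain the WrongRegionException itself; it only shows a restart that resumes strictly after the last key known to be written, rather than from "that key + some delta", so no rows are skipped.

```python
class FlakyTable:
    """In-memory stand-in for the destination table (hypothetical, not HBase).

    It raises once, on a chosen write, to simulate the copy dying part-way.
    """

    def __init__(self, fail_on_write=None):
        self.rows = {}
        self.writes = 0
        self.fail_on_write = fail_on_write

    def write(self, batch):
        self.writes += 1
        if self.writes == self.fail_on_write:
            raise IOError("simulated write failure")
        self.rows.update(batch)


def copy_range(source, dest, start_key, end_key, checkpoint, batch_size=2):
    """Copy rows with start_key <= key < end_key in sorted key order.

    checkpoint["last_key"] is updated only after a batch is written, so a
    later call with the same checkpoint resumes right after that key.
    """
    resume_after = checkpoint.get("last_key")
    keys = sorted(
        k for k in source
        if start_key <= k < end_key
        and (resume_after is None or k > resume_after)
    )
    for i in range(0, len(keys), batch_size):
        batch = keys[i:i + batch_size]
        dest.write({k: source[k] for k in batch})
        checkpoint["last_key"] = batch[-1]


# Ten rows, a destination that fails on its third write, then a restart.
source = {"row%03d" % i: "value%d" % i for i in range(10)}
dest = FlakyTable(fail_on_write=3)
checkpoint = {}

try:
    copy_range(source, dest, "row000", "row999", checkpoint)
except IOError:
    pass  # first run died; checkpoint still names the last key written

copy_range(source, dest, "row000", "row999", checkpoint)  # resume
```

Recording the checkpoint only after a successful write means a restart may redo at most one batch, which is harmless when rewriting a row is idempotent.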


> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of 
> Stack
> Sent: Tuesday, May 31, 2011 6:42 PM
> To: [email protected]
> Subject: Re: wrong region exception
>
> So, what about this new WrongRegionException in the new cluster?  Can you 
> figure out how it came about?  In the new cluster, is there also a hole?  Did 
> you start the new cluster fresh or copy from old cluster?
>
> St.Ack
>
> On Tue, May 31, 2011 at 1:55 PM, Robert Gonzalez 
> <[email protected]> wrote:
>> Yeah, we learned the hard way early last year to follow the guidelines 
>> religiously.  I've gone over the requirements and checked off everything.  
>> We even re-did our tables to only have 4 column families, down from 4x that 
>> amount.   We are at a loss to find out why we seem to be cursed when it 
>> comes to HBase.  Hadoop is performing like a charm, pretty much every 
>> machine is busy 24/7.
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of 
>> Stack
>> Sent: Tuesday, May 31, 2011 3:03 PM
>> To: [email protected]
>> Subject: Re: wrong region exception
>>
>> On Tue, May 31, 2011 at 10:42 AM, Robert Gonzalez 
>> <[email protected]> wrote:
>>> Now I'm getting the wrong region exception on the new table that I'm 
>>> copying the old table to.  Running hbck reveals an inconsistency in the new 
>>> table.  The frustration is unbelievable.  Like I said before, it doesn't 
>>> appear that HBase is ready for prime time.  I don't know how companies are 
>>> using this successfully, it doesn't appear plausible.
>>>
>>
>>
>> Sorry you are not having a good experience.  I've not seen 
>> WrongRegionException in ages (grep these lists yourself).  Makes me suspect 
>> your environment.  For sure you've read the requirements section in the 
>> manual and set up ulimits, nprocs and xceivers?
>>
>> St.Ack
>>
>
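
For context on the "hole" asked about above: a table's regions are meant to cover the key space as a contiguous chain of [start key, end key) ranges, with an empty key marking the open first start and open last end. A region server raises WrongRegionException when it is handed a row that falls outside the range of the region the request was addressed to. The sketch below is only an illustration of checking such a chain for holes and overlaps, assuming regions are given as simple string pairs; `find_chain_problems` is a hypothetical helper and is not how hbck is implemented.

```python
def find_chain_problems(regions):
    """Report holes and overlaps in a list of (start_key, end_key) regions.

    Each region covers start_key <= key < end_key. An empty string stands
    for "unbounded": the first region's start and the last region's end.
    """
    if not regions:
        return [("hole", "", "")]

    problems = []
    ordered = sorted(regions, key=lambda r: r[0])

    # The chain must begin at the open start of the key space.
    if ordered[0][0] != "":
        problems.append(("hole", "", ordered[0][0]))

    # Each region must end exactly where the next one starts.
    for (_, end), (next_start, _) in zip(ordered, ordered[1:]):
        if end == "" or end > next_start:
            problems.append(("overlap", next_start, end))
        elif end < next_start:
            problems.append(("hole", end, next_start))

    # The chain must run to the open end of the key space.
    if ordered[-1][1] != "":
        problems.append(("hole", ordered[-1][1], ""))

    return problems


# Keys from "m" up to "p" are covered by no region in this chain.
broken = [("", "b"), ("b", "m"), ("p", "")]
healthy = [("", "b"), ("b", "")]
broken_report = find_chain_problems(broken)
healthy_report = find_chain_problems(healthy)
```

A write for a key inside a reported hole has no region that legitimately owns it, which is the kind of inconsistency the hbck run mentioned in the thread is meant to surface.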
