RE: wrong region exception

Robert Gonzalez Thu, 02 Jun 2011 10:07:43 -0700

I'm getting a lot of this on the slave that is doing the latest adds:

2011-06-02 00:33:05,231 WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region urlhashv4,7F8537883DDF5230B10AA2CB13182505,1306992752074.71ab4c4527ce76d777f78943a86009d2. has too many store files; delaying flush up to 90000ms
2011-06-02 00:33:17,395 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: IPC Server handler 7 on 60020 took 1008 ms appending an edit to hlog; editcount=1, len~=34.1m
2011-06-02 00:33:53,626 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: IPC Server handler 2 on 60020 took 1093 ms appending an edit to hlog; editcount=1, len~=34.1m
2011-06-02 00:34:02,001 WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region urlhashv4,7F9567FC7E75F6F219D704791212B1F5,1306992806225.0d427324855e7d0dd043566d18e8d5c4. has too many store files; delaying flush up to 90000ms
2011-06-02 00:34:10,107 WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region urlhashv4,7F9567FC7E75F6F219D704791212B1F5,1306992806225.0d427324855e7d0dd043566d18e8d5c4. has too many store files; delaying flush up to 90000ms
2011-06-02 00:34:16,989 WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region urlhashv4,7F9567FC7E75F6F219D704791212B1F5,1306992806225.0d427324855e7d0dd043566d18e8d5c4. has too many store files; delaying flush up to 90000ms
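
To see which regions are hit most often by these delayed flushes, the warnings can be tallied per region. The following is only a rough sketch, not part of HBase: the function name count_delayed_flushes and the regular expression are assumptions made for illustration, and it assumes each log entry occupies a single line, as it does in the region server's own log file.

```python
import re
from collections import Counter

# Assumed pattern for the MemStoreFlusher warning quoted above.
# The region name is everything between "Region " and " has too many".
DELAY_RE = re.compile(
    r"MemStoreFlusher: Region (?P<region>\S+) has too many store files; "
    r"delaying flush up to (?P<ms>\d+)ms"
)


def count_delayed_flushes(lines):
    """Return a Counter mapping region name -> number of delayed-flush warnings."""
    counts = Counter()
    for line in lines:
        match = DELAY_RE.search(line)
        if match:
            counts[match.group("region")] += 1
    return counts
```

Passing an open log file (or any iterable of lines) to count_delayed_flushes and calling most_common() on the result lists the regions with the most delayed flushes first.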


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Stack
Sent: Wednesday, June 01, 2011 5:34 PM
To: [email protected]
Subject: Re: wrong region exception

We can't reach the server carrying .META. within 60 seconds.  What's going on on 
that server?  The next time the CatalogJanitor below runs, does it succeed, or 
does it always fail?

St.Ack

On Wed, Jun 1, 2011 at 2:27 PM, Robert Gonzalez 
<[email protected]> wrote:
> This is basically it (from the first time it died while copying); we log at 
> WARN level and above:
>
> 2011-05-27 16:44:27,565 WARN org.apache.hadoop.hbase.master.CatalogJanitor: Failed scan of catalog table
> java.net.SocketTimeoutException: Call to /10.100.2.6:60020 failed on socket time out exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.100.1.39:37717 remote=/10.100.2.6:60020]
>        at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:784)
>        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>        at $Proxy6.delete(Unknown Source)
>        at org.apache.hadoop.hbase.catalog.MetaEditor.deleteDaughterReferenceInParent(MetaEditor.java:201)
>        at org.apache.hadoop.hbase.master.CatalogJanitor.removeDaughterFromParent(CatalogJanitor.java:233)
>        at org.apache.hadoop.hbase.master.CatalogJanitor.hasReferences(CatalogJanitor.java:275)
>        at org.apache.hadoop.hbase.master.CatalogJanitor.checkDaughter(CatalogJanitor.java:202)
>        at org.apache.hadoop.hbase.master.CatalogJanitor.cleanParent(CatalogJanitor.java:166)
>        at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:120)
>        at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:85)
>        at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
> Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.100.1.39:37717 remote=/10.100.2.6:60020]
>        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>        at java.io.FilterInputStream.read(FilterInputStream.java:116)
>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:281)
>        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>        at java.io.DataInputStream.readInt(DataInputStream.java:370)
>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:521)
>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:459)
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of 
> Stack
> Sent: Wednesday, June 01, 2011 12:29 PM
> To: [email protected]
> Subject: Re: wrong region exception
>
> On Wed, Jun 1, 2011 at 7:32 AM, Robert Gonzalez 
> <[email protected]> wrote:
>> We have a table copy program that copies the data from one table to another, 
>> and we can give it the start/end keys.  In this case we created a new blank 
>> table with the essential column families and let it run with start/end to be 
>> the whole range, 0-maxkey.  At about 30% of the way through, which is 
>> roughly 600 million rows, it died trying to write to the new table with the 
>> wrong region exception.  When we tried to restart the copy from that key + 
>> some delta, it still crapped out.  No explanation in the logs the first 
>> time, but a series of timeouts in the second run.  Now we are trying the 
>> copy again with a new table.
>>
>
> Robert:
>
> Do you still have the master logs for this copy run?  If so, put them 
> somewhere I can pull them from (or send them to me) and I'll take a 
> look.  I'd like to see the logs from the cluster to which you were 
> copying the data.
>
> St.Ack
>
>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of 
>> Stack
>> Sent: Tuesday, May 31, 2011 6:42 PM
>> To: [email protected]
>> Subject: Re: wrong region exception
>>
>> So, what about this new WrongRegionException in the new cluster?  Can you 
>> figure out how it came about?  In the new cluster, is there also a hole?  Did 
>> you start the new cluster fresh or copy from the old cluster?
>>
>> St.Ack
>>
>> On Tue, May 31, 2011 at 1:55 PM, Robert Gonzalez 
>> <[email protected]> wrote:
>>> Yeah, we learned the hard way early last year to follow the guidelines 
>>> religiously.  I've gone over the requirements and checked off everything.  
>>> We even re-did our tables to only have 4 column families, down from 4x that 
>>> amount.  We are at a loss to find out why we seem to be cursed when it 
>>> comes to HBase.  Hadoop is performing like a charm, pretty much every 
>>> machine is busy 24/7.
>>>
>>> -----Original Message-----
>>> From: [email protected] [mailto:[email protected]] On Behalf Of 
>>> Stack
>>> Sent: Tuesday, May 31, 2011 3:03 PM
>>> To: [email protected]
>>> Subject: Re: wrong region exception
>>>
>>> On Tue, May 31, 2011 at 10:42 AM, Robert Gonzalez 
>>> <[email protected]> wrote:
>>>> Now I'm getting the wrong region exception on the new table that I'm 
>>>> copying the old table to.  Running hbck reveals an inconsistency in the 
>>>> new table.  The frustration is unbelievable.  Like I said before, it 
>>>> doesn't appear that HBase is ready for prime time.  I don't know how 
>>>> companies are using this successfully, it doesn't appear plausible.
>>>>
>>>
>>>
>>> Sorry you are not having a good experience.  I've not seen 
>>> WrongRegionException in ages (grep these lists yourself).  Makes me suspect 
>>> your environment.  For sure you've read the requirements section in the 
>>> manual and set up ulimits, nproc and xceivers?
>>>
>>> St.Ack
>>>
>>
>
