We can't reach the server carrying .META. within 60 seconds.  What's
going on on that server?  The next time the CatalogJanitor below
runs, does it succeed, or does it always fail?

St.Ack
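
One quick way to start answering "what's going on on that server" is to probe the region server port from the master host. The snippet below is only a minimal sketch using the Python standard library; the helper name can_connect is hypothetical, and the address in the comment is the one from the stack trace quoted below. A successful connect only shows that the port accepts connections. The trace shows a read timeout on a channel that was already connected, so the server may still be too busy to answer RPCs.

```python
import socket


def can_connect(host, port, timeout_s=5.0):
    """Return True if a TCP connection to host:port opens within timeout_s."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


# Example, run from the master host (address taken from the trace):
#   can_connect("10.100.2.6", 60020)
```

If this returns False, the problem is likely at the network or process level; if it returns True, the region server logs for the same time window are the next place to look.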

On Wed, Jun 1, 2011 at 2:27 PM, Robert Gonzalez
<[email protected]> wrote:
> This is basically it (from the first time it died while copying); we log
> at WARN level and above:
>
> 2011-05-27 16:44:27,565 WARN org.apache.hadoop.hbase.master.CatalogJanitor: Failed scan of catalog table
> java.net.SocketTimeoutException: Call to /10.100.2.6:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.100.1.39:37717 remote=/10.100.2.6:60020]
>        at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:784)
>        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>        at $Proxy6.delete(Unknown Source)
>        at org.apache.hadoop.hbase.catalog.MetaEditor.deleteDaughterReferenceInParent(MetaEditor.java:201)
>        at org.apache.hadoop.hbase.master.CatalogJanitor.removeDaughterFromParent(CatalogJanitor.java:233)
>        at org.apache.hadoop.hbase.master.CatalogJanitor.hasReferences(CatalogJanitor.java:275)
>        at org.apache.hadoop.hbase.master.CatalogJanitor.checkDaughter(CatalogJanitor.java:202)
>        at org.apache.hadoop.hbase.master.CatalogJanitor.cleanParent(CatalogJanitor.java:166)
>        at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:120)
>        at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:85)
>        at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
> Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.100.1.39:37717 remote=/10.100.2.6:60020]
>        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>        at java.io.FilterInputStream.read(FilterInputStream.java:116)
>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:281)
>        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>        at java.io.DataInputStream.readInt(DataInputStream.java:370)
>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:521)
>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:459)
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Stack
> Sent: Wednesday, June 01, 2011 12:29 PM
> To: [email protected]
> Subject: Re: wrong region exception
>
> On Wed, Jun 1, 2011 at 7:32 AM, Robert Gonzalez 
> <[email protected]> wrote:
>> We have a table copy program that copies the data from one table to another, 
>> and we can give it the start/end keys.  In this case we created a new blank 
>> table with the essential column families and let it run with start/end to be 
>> the whole range, 0-maxkey.  At about 30% of the way through, which is 
>> roughly 600 million rows, it died trying to write to the new table with the 
>> wrong region exception.  When we tried to restart the copy from that key + 
>> some delta, it still crapped out.  No explanation in the logs the first 
>> time, but a series of timeouts in the second run.  Now we are trying the 
>> copy again with a new table.
>>
>
> Robert:
>
> Do you have the master logs for this copy run still?  If so, put them
> somewhere I can pull them from (or send them to me) and I'll take a
> look.  I'd like to see the logs from the cluster to which you were
> copying the data.
>
> St.Ack
>
>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of
>> Stack
>> Sent: Tuesday, May 31, 2011 6:42 PM
>> To: [email protected]
>> Subject: Re: wrong region exception
>>
>> So, what about this new WrongRegionException in the new cluster?  Can you
>> figure out how it came about?  In the new cluster, is there also a hole?  Did
>> you start the new cluster fresh or copy from the old cluster?
>>
>> St.Ack
>>
>> On Tue, May 31, 2011 at 1:55 PM, Robert Gonzalez 
>> <[email protected]> wrote:
>>> Yeah, we learned the hard way early last year to follow the guidelines 
>>> religiously.  I've gone over the requirements and checked off everything.  
>>> We even re-did our tables to only have 4 column families, down from 4x that 
>>> amount.  We are at a loss to find out why we seem to be cursed when it
>>> comes to HBase.  Hadoop is performing like a charm, pretty much every 
>>> machine is busy 24/7.
>>>
>>> -----Original Message-----
>>> From: [email protected] [mailto:[email protected]] On Behalf Of
>>> Stack
>>> Sent: Tuesday, May 31, 2011 3:03 PM
>>> To: [email protected]
>>> Subject: Re: wrong region exception
>>>
>>> On Tue, May 31, 2011 at 10:42 AM, Robert Gonzalez 
>>> <[email protected]> wrote:
>>>> Now I'm getting the wrong region exception on the new table that I'm 
>>>> copying the old table to.  Running hbck reveals an inconsistency in the 
>>>> new table.  The frustration is unbelievable.  Like I said before, it 
>>>> doesn't appear that HBase is ready for prime time.  I don't know how 
>>>> companies are using this successfully, it doesn't appear plausible.
>>>>
>>>
>>>
>>> Sorry you are not having a good experience.  I've not seen 
>>> WrongRegionException in ages (Grep these lists yourself).  Makes me suspect 
>>> your environment.  For sure you've read the requirements section in the
>>> manual and set up ulimits, nprocs and xceivers?
>>>
>>> St.Ack
>>>
>>
>
