Re: HBaseClient recovery from .META. server power down

2012-07-10 Thread Suraj Varma
I will create a JIRA ticket ... The only side-effect I could think of is ... if a RS is having a GC pause of a few seconds, any _new_ client trying to connect would get connection failures. So ... the _initial_ connection to the RS is what would suffer from a super-low setting of ipc.socket.timeout.
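The setting being discussed is a client-side override. A minimal hbase-site.xml sketch (the 10000 ms value is illustrative, taken from the 10s figure mentioned later in the thread; the shipped default is 20000 ms):

```xml
<!-- Client-side hbase-site.xml: lower the IPC connect/socket timeout so a
     powered-down .META. host is detected faster. Trade-off discussed above:
     a long GC pause on a healthy RS can now fail *new* connections. -->
<property>
  <name>ipc.socket.timeout</name>
  <value>10000</value>
</property>
```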

Re: HBaseClient recovery from .META. server power down

2012-07-10 Thread Suraj Varma
Created https://issues.apache.org/jira/browse/HBASE-6364 for this issue. Thanks, --Suraj On Tue, Jul 10, 2012 at 9:46 AM, Suraj Varma svarma...@gmail.com wrote: I will create a JIRA ticket ... The only side-effect I could think of is ... if a RS is having a GC of a few seconds, any _new_

Re: HBaseClient recovery from .META. server power down

2012-07-10 Thread N Keywal
Thanks for the jira. The client can be connected to multiple RS, depending on the rows it is working on. So yes it's initial, but it's a dynamic initial :-). That said, there is a retry on error... On Tue, Jul 10, 2012 at 6:46 PM, Suraj Varma svarma...@gmail.com wrote: I will create a JIRA ticket

Re: HBaseClient recovery from .META. server power down

2012-07-10 Thread Suraj Varma
Yes. On the maxRetries, though ... I saw that the code (http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hbase/hbase/0.90.2/org/apache/hadoop/hbase/ipc/HBaseClient.java#677) shows this.maxRetries = conf.getInt("hbase.ipc.client.connect.max.retries", 0); So - looks like by default, the
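A minimal sketch of what that line implies (this is not the real Hadoop Configuration class, just a stand-in with the same getInt semantics): with nothing set in the config, maxRetries comes back as the hard-coded default of 0, i.e. a single connect attempt at the IPC layer.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for conf.getInt("hbase.ipc.client.connect.max.retries", 0):
// returns the parsed value if the key is set, else the supplied default.
public class MaxRetriesSketch {
    static int getInt(Map<String, String> conf, String key, int defaultValue) {
        String v = conf.get(key);
        return v == null ? defaultValue : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Nothing set: falls back to the hard-coded default of 0,
        // so the IPC layer itself does not retry a failed connect.
        System.out.println(getInt(conf, "hbase.ipc.client.connect.max.retries", 0));
        // Explicit override, e.g. from hbase-site.xml:
        conf.put("hbase.ipc.client.connect.max.retries", "3");
        System.out.println(getInt(conf, "hbase.ipc.client.connect.max.retries", 0));
    }
}
```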

Re: HBaseClient recovery from .META. server power down

2012-07-10 Thread N Keywal
I expect (without double checking the path in the code ;-) that the code in HConnectionManager will retry. On Tue, Jul 10, 2012 at 7:22 PM, Suraj Varma svarma...@gmail.com wrote: Yes. On the maxRetries, though ... I saw the code
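The point being made is that even if the IPC layer gives up after one connect attempt, a higher layer retries the whole operation. A generic sketch of that retry-on-error pattern (names, backoff, and exception handling are assumptions for illustration, not the actual HConnectionManager code):

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Illustrative client-layer retry loop: a single failed IPC connect
// surfaces as an IOException, and the operation is re-attempted with
// a simple linear backoff until it succeeds or attempts run out.
public class RetrySketch {
    static <T> T callWithRetries(Callable<T> op, int maxAttempts, long pauseMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e;                               // e.g. connect timed out
                Thread.sleep(pauseMs * (attempt + 1));  // linear backoff
            }
        }
        throw last;  // all attempts exhausted
    }

    public static void main(String[] args) throws Exception {
        int[] failuresLeft = {2};  // fail twice, then succeed
        String result = callWithRetries(() -> {
            if (failuresLeft[0]-- > 0) throw new IOException("connect timed out");
            return "row-data";
        }, 3, 1);
        System.out.println(result);  // prints "row-data"
    }
}
```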

Re: HBaseClient recovery from .META. server power down

2012-07-09 Thread Suraj Varma
Hello: I'd like to get advice on the below strategy of decreasing the ipc.socket.timeout configuration on the HBase Client side ... has anyone tried this? Has anyone had any issues with configuring this lower than the default 20s? Thanks, --Suraj On Mon, Jul 2, 2012 at 5:51 PM, Suraj Varma

Re: HBaseClient recovery from .META. server power down

2012-07-09 Thread N Keywal
Hi, What you're describing (the 35 minute recovery time) seems to match the code. And it's a bug (still there on trunk). Could you please create a jira for it? If you have the logs, even better. Lowering the ipc.socket.timeout seems to be an acceptable partial workaround. Setting it to 10s

HBaseClient recovery from .META. server power down

2012-07-02 Thread Suraj Varma
Hello: We've been doing some failure scenario tests by powering down a .META. holding region server host and while the HBase cluster itself recovers and reassigns the META region and other regions (after we tweaked down the default timeouts), our client apps using HBaseClient take a long time to

Re: HBaseClient recovery from .META. server power down

2012-07-02 Thread Suraj Varma
By power down below, I mean powering down the host with the RS that holds the .META. table. (So - essentially, the host IP is unreachable and the RS/DN is gone.) Just wanted to clarify my below steps ... --S On Mon, Jul 2, 2012 at 5:36 PM, Suraj Varma svarma...@gmail.com wrote: Hello: We've