Re: Region server crashes when using replication

Eran Kutner Tue, 22 Mar 2011 12:37:58 -0700

Actually, when there is no connection between the two clusters, it will
probably be a connection timeout, not a connection refused.


Is there a workaround I can implement now for HBASE-3664? Can I write
something in ZK so the server has an old entry to delete and is happy
with it?

-eran




On Tue, Mar 22, 2011 at 21:01, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> Inline.
>
> J-D
>
> On Tue, Mar 22, 2011 at 11:51 AM, Eran Kutner <e...@gigya.com> wrote:
>> Thanks, J-D.
>> As for the first issue, why does this behavior make sense? What happens when
>> the connection between the two cluster fails? Will the region servers of the
>> primary fail as well? or at least won't be able to start? Seems very
>> radical.
>
> The DNS entry should remain, so you won't get UnknownHostException but
> ConnectionRefused instead. But that's a different issue: HBASE-3130
>
>>
>> Regarding the second issue, I didn't see anything else in the logs; it just
>> seemed to decide to shut down, but maybe I missed something. I will try to
>> reproduce it and let you know if I succeed.
>
> That'd be nice :)
>
>>
>> Regarding the timeout to detect a failed server, 3 minutes sounds like a
>> very long time for a region server to be down. Obviously, during that time
>> the data owned by that server is inaccessible. Is there a reason for this
>> long timeout? Can it be configured?
>>
>
> We set it that high for people who try to push too much data to
> clusters that are too small or badly configured and then end up with
> crazy garbage collection pauses. Have fun reading this series of blog posts:
> http://www.cloudera.com/blog/2011/02/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-1/
>
> Please also see the section of the book on this configuration:
> http://hbase.apache.org/book.html#recommended_configurations
>
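For reference, the timeout discussed above is normally set on the servers in
hbase-site.xml through the ZooKeeper session timeout property. The fragment
below is a sketch, assuming the property is zookeeper.session.timeout
(in milliseconds, 180000 = 3 minutes) as described in the book section linked
above; the 60000 value is only an illustrative choice, not a recommendation
from this thread.

```xml
<!-- hbase-site.xml on every region server and master (sketch) -->
<configuration>
  <property>
    <name>zookeeper.session.timeout</name>
    <!-- Illustrative value: 60 s instead of the 3-minute default.
         A GC pause longer than this gets the region server declared dead. -->
    <value>60000</value>
  </property>
</configuration>
```

Lowering it trades faster failure detection for a higher risk that a long
garbage collection pause (the case J-D describes) expires the session. Note
also that the ZooKeeper ensemble negotiates the final value within its own
minSessionTimeout/maxSessionTimeout bounds (by default 2x and 20x tickTime),
so a requested value outside that range is clamped.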
