Actually, when there is no connectivity between the two clusters it will probably be a connection timeout, not connection refused.
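
For reference, here is how the three failure modes mentioned in this thread differ at the plain java.net level. This is only a sketch, not HBase code: ConnectFailure, classify and probe are names made up for illustration, and HBase's own RPC layer may wrap or retry these exceptions differently.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

// Hypothetical helper, not part of HBase.
final class ConnectFailure {

    // Maps a connect-time exception to one of the failure modes discussed here.
    static String classify(IOException e) {
        if (e instanceof UnknownHostException) {
            return "unknown host";   // the DNS entry is missing
        }
        if (e instanceof SocketTimeoutException) {
            return "timeout";        // no answer within the connect timeout
        }
        if (e instanceof ConnectException) {
            // Usually "Connection refused": the host answered but nothing is
            // listening. The OS may report other connect errors this way too.
            return "refused";
        }
        return "other";
    }

    // Tries a TCP connect with an explicit timeout and reports what happened.
    static String probe(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return "connected";
        } catch (IOException e) {
            return classify(e);
        }
    }
}
```

Calling something like ConnectFailure.probe(peerHost, peerPort, 5000) from a master-cluster node against the peer cluster (host and port are placeholders) would show which case applies when the link is down.
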
Is there a workaround I can implement now for HBASE-3664? Can I write something in ZK so the server has an old entry to delete and is happy with it?

-eran

On Tue, Mar 22, 2011 at 21:01, Jean-Daniel Cryans <jdcry...@apache.org> wrote:

> Inline.
>
> J-D
>
> On Tue, Mar 22, 2011 at 11:51 AM, Eran Kutner <e...@gigya.com> wrote:
>> Thanks, J-D.
>> As for the first issue, why does this behavior make sense? What happens when
>> the connection between the two clusters fails? Will the region servers of the
>> primary fail as well? Or at least be unable to start? Seems very
>> radical.
>
> The DNS entry should remain, so you won't get UnknownHostException but
> ConnectionRefused instead. But that's a different issue: HBASE-3130
>
>>
>> Regarding the second issue, I didn't see anything else in the logs, it just
>> seemed like it decided to shut down, but maybe I missed it. I will try to
>> reproduce that and let you know if I succeed.
>
> That'd be nice :)
>
>>
>> Regarding the timeout to detect a failed server, 3 minutes sounds like a
>> very long time for a region server to be down. Obviously, during that time
>> the data owned by that server is inaccessible. Is there a reason for this
>> long timeout? Can it be configured?
>>
>
> We set it that high for people that try to push too much data to
> clusters that are too small / badly configured and then end up with
> crazy garbage collections. Have fun reading this series of blog posts:
> http://www.cloudera.com/blog/2011/02/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-1/
>
> Please also see the book about this configuration:
> http://hbase.apache.org/book.html#recommended_configurations
>