No JVM limitations as such, but some code just isn't meant to be
restarted within the same JVM, and things didn't work out well.
Specifically the DFSClient code, and I think we had to hack a bunch to
make the ZK sessions reconnect, because you have to re-init the entire
stack.

When you have a bunch of code that assumes a static gets initialized
once and never again, that doesn't make for an easy reinitialize.
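
To illustrate the static-initialization problem, here is a minimal Java sketch. `CachedClient` is a hypothetical stand-in invented for this example; it is not the DFSClient or ZooKeeper code, only an assumption about the general pattern being described. It caches an instance in a static field, so an in-process "restart" gets the old, already-closed object back.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class StaticInitDemo {
    // Hypothetical client that caches itself in a static field (assumption for illustration).
    static final class CachedClient {
        private static final AtomicInteger CREATED = new AtomicInteger();
        private static CachedClient instance; // set once, never cleared

        private final int id;
        private boolean closed;

        private CachedClient(int id) {
            this.id = id;
        }

        // Lazily creates the instance on first use and returns the same one afterwards.
        static synchronized CachedClient get() {
            if (instance == null) {
                instance = new CachedClient(CREATED.incrementAndGet());
            }
            return instance;
        }

        void close() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }

        int id() {
            return id;
        }
    }

    // Simulates shutting down and then starting again inside the same JVM.
    static boolean restartSeesClosedClient() {
        CachedClient first = CachedClient.get();
        first.close();                            // shutdown path closes the client
        CachedClient second = CachedClient.get(); // "restart" asks for a client again
        return second == first && second.isClosed();
    }

    public static void main(String[] args) {
        System.out.println("restart got a closed client: " + restartSeesClosedClient());
    }
}
```

In a fresh JVM the static starts out null again, which is why a clean process restart sidesteps this whole class of problem.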

On Tue, Sep 21, 2010 at 5:36 PM, Matthew LeMieux <[email protected]> wrote:
> What are the JVM limitations that you were running into?
>
> -Matthew
>
> On Sep 21, 2010, at 5:31 PM, Ryan Rawson wrote:
>
>> We tried that before, but some things are difficult to reset in the same JVM.
>>
>> A clean restart just works better :-)
>>
>> On Tue, Sep 21, 2010 at 5:29 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>> Thanks a lot, Ryan.
>>>
>>> That's what I thought. I knew this explanation that the regions are split,
>>> although I guess one might argue there's no reason why we can't try to
>>> start a new life by rejoining the cluster as a new region server (but in the
>>> same process). Or at least have such an option. Just wanted to double-check
>>> before wrapping it into some sort of a kicker.
>>> -Dmitriy
>>>
>>>
>>> On Tue, Sep 21, 2010 at 5:24 PM, Ryan Rawson <[email protected]> wrote:
>>>
>>>> You could wrap the regionserver in a script that auto-restarts it?
>>>>
>>>> We can't really recover from this scenario, because the master notices
>>>> we are dead, then splits our logs and reassigns the regions to other
>>>> nodes.  This is the basis of how HBase works reliably in the face of
>>>> machine failure.
>>>>
>>>> -ryan
>>>>
>>>> On Tue, Sep 21, 2010 at 5:20 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>>>> Hi,
>>>>>
>>>>> In our production environment we see temporary networking failures (we are
>>>>> not quite 100% sure what they are), and now and then a region server's
>>>>> ZooKeeper session expires and, in addition, some IPC channels throw
>>>>> 'channel closed'.
>>>>>
>>>>> This causes the region server to exit, which is not a very big deal: our
>>>>> monitoring system sends a text message so somebody can restart the
>>>>> region server.
>>>>>
>>>>> However, this happens a little more often than we would like to handle
>>>>> manually.
>>>>>
>>>>> Why is the server not recovering/reconnecting automatically? Is there a
>>>>> facility to enable server restarts, so that region server nodes rejoin the
>>>>> cluster automatically?
>>>>>
>>>>> Thanks.
>>>>> -Dmitriy
>>>>>
>>>>
>>>
>
>
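
The auto-restart wrapper suggested earlier in the thread could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `RestartWrapper`, the retry limit, and the delay are all made up here, and the command to launch is taken from the program arguments rather than assuming any particular HBase start script. Each relaunch is a brand-new OS process, so the child gets a fresh JVM every time.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntSupplier;

public class RestartWrapper {
    // Runs the task, relaunching after each non-zero exit, up to maxRestarts relaunches.
    // Returns the total number of launches (the first run plus any restarts).
    static int supervise(IntSupplier runOnce, int maxRestarts, long delayMillis) {
        int launches = 0;
        while (true) {
            int exitCode = runOnce.getAsInt();
            launches++;
            if (exitCode == 0 || launches > maxRestarts) {
                return launches; // clean stop, or restart budget used up
            }
            try {
                Thread.sleep(delayMillis); // back off before the next launch
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return launches; // stop supervising when interrupted
            }
        }
    }

    // Starts the command as a separate OS process and waits for its exit code.
    static int runProcess(List<String> command) {
        try {
            Process process = new ProcessBuilder(command).inheritIO().start();
            return process.waitFor();
        } catch (IOException e) {
            return 127; // treat a failure to launch as a failed run
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return 130;
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: RestartWrapper <command> [args...]");
            return;
        }
        List<String> command = Arrays.asList(args);
        // Assumed policy for this sketch: at most 5 restarts, 10 seconds apart.
        int launches = supervise(() -> runProcess(command), 5, 10_000L);
        System.out.println("stopped after " + launches + " launch(es)");
    }
}
```

Capping the number of restarts is a deliberate choice in this sketch: a server that keeps dying immediately is better surfaced to monitoring than restarted forever.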
