Hi Satish,
 What GC are you using? Is it ConcurrentMarkSweep or Parallel/Serial?

  Also, how is your disk usage on this machine? Can you check your iostat
numbers? 
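
If it helps, something like "jstat -gcutil <zookeeper pid> 1000" will show
per-generation GC activity over time, and "iostat -x 5" will show per-device
utilization and wait times (just a suggestion, assuming the JDK and sysstat
tools are available on that box).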

Thanks
mahadev


On 9/1/09 5:15 PM, "Satish Bhatti" <cthd2...@gmail.com> wrote:

> GC Time: 11.628 seconds on PS MarkSweep (389 collections), 5 minutes on PS
> Scavenge (7,636 collections)
> 
> It's been running for about 48 hours.
> 
> 
> On Tue, Sep 1, 2009 at 5:12 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> 
>> Do you have long GC delays?
>> 
>> On Tue, Sep 1, 2009 at 4:51 PM, Satish Bhatti <cthd2...@gmail.com> wrote:
>> 
>>> Session timeout is 30 seconds.
>>> 
>>> On Tue, Sep 1, 2009 at 4:26 PM, Patrick Hunt <ph...@apache.org> wrote:
>>> 
>>>> What is your client timeout? It may be too low.
>>>> 
>>>> also see this section on handling recoverable errors:
>>>> http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling
>>>> 
>>>> connection loss in particular needs special care since:
>>>> "When a ZooKeeper client loses a connection to the ZooKeeper server
>> there
>>>> may be some requests in flight; we don't know where they were in their
>>>> flight at the time of the connection loss. "
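>>>> 
>>>> A minimal sketch of that kind of retry (just to illustrate -- the class and
>>>> method names, retry count, and backoff below are made up, and this only
>>>> makes sense for idempotent operations):
>>>> 
>>>>   import org.apache.zookeeper.KeeperException;
>>>>   import org.apache.zookeeper.ZooKeeper;
>>>> 
>>>>   public class ZkRetry {
>>>>     // Retry an idempotent read on ConnectionLoss; let SessionExpired
>>>>     // propagate, since the session is gone and the ZooKeeper handle
>>>>     // has to be recreated anyway.
>>>>     public static byte[] getDataWithRetry(ZooKeeper zk, String path)
>>>>         throws KeeperException, InterruptedException {
>>>>       final int maxRetries = 5;               // illustrative value
>>>>       for (int i = 0; ; i++) {
>>>>         try {
>>>>           return zk.getData(path, false, null);
>>>>         } catch (KeeperException.ConnectionLossException e) {
>>>>           if (i >= maxRetries) throw e;       // still failing, give up
>>>>           Thread.sleep(1000L * (i + 1));      // crude linear backoff
>>>>         }
>>>>       }
>>>>     }
>>>>   }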
>>>> 
>>>> Patrick
>>>> 
>>>> 
>>>> Satish Bhatti wrote:
>>>> 
>>>>> I have recently started running on EC2 and am seeing quite a few
>>>>> ConnectionLoss exceptions.  Should I just catch these and retry?  Since I
>>>>> assume that eventually, if the shit truly hits the fan, I will get a
>>>>> SessionExpired?
>>>>> Satish
>>>>> 
>>>>> On Mon, Jul 6, 2009 at 11:35 AM, Ted Dunning <ted.dunn...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> We have used EC2 quite a bit for ZK.
>>>>>> 
>>>>>> The basic lessons that I have learned include:
>>>>>> 
>>>>>> a) EC2's biggest advantage after scaling and elasticity was conformity of
>>>>>> configuration.  Since you are bringing machines up and down all the time,
>>>>>> they begin to act more like programs and you wind up with boot scripts
>>>>>> that give you a very predictable environment.  Nice.
>>>>>> 
>>>>>> b) EC2 interconnect has a lot more going on than in a dedicated VLAN.  That
>>>>>> can make the ZK servers appear a bit less connected.  You have to plan for
>>>>>> ConnectionLoss events.
>>>>>> 
>>>>>> c) for highest reliability, I switched to large instances.  On reflection,
>>>>>> I think that was helpful, but less important than I thought at the time.
>>>>>> 
>>>>>> d) increasing and decreasing cluster size is nearly painless and is easily
>>>>>> scriptable.  To decrease, do a rolling update on the survivors to update
>>>>>> their configuration.  Then take down the instance you want to lose.  To
>>>>>> increase, do a rolling update starting with the new instances to update the
>>>>>> configuration to include all of the machines.  The rolling update should
>>>>>> bounce each ZK with several seconds between each bounce.  Rescaling the
>>>>>> cluster takes less than a minute, which makes it comparable to EC2 instance
>>>>>> boot time (about 30 seconds for the Alestic Ubuntu instance that we used
>>>>>> plus about 20 seconds for additional configuration).
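>>>>>> 
>>>>>> To make the rolling-update step concrete, the part that changes is just
>>>>>> the server list in each node's zoo.cfg -- roughly like this (hostnames,
>>>>>> ports, and the dataDir are placeholders, not a recommendation):
>>>>>> 
>>>>>>   tickTime=2000
>>>>>>   initLimit=10
>>>>>>   syncLimit=5
>>>>>>   dataDir=/var/zookeeper
>>>>>>   clientPort=2181
>>>>>>   # one line per ensemble member; add/remove entries here, then bounce
>>>>>>   # each server in turn with a few seconds between bounces
>>>>>>   server.1=zk1.example.com:2888:3888
>>>>>>   server.2=zk2.example.com:2888:3888
>>>>>>   server.3=zk3.example.com:2888:3888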
>>>>>> 
>>>>>> On Mon, Jul 6, 2009 at 4:45 AM, David Graf <david.g...@28msec.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hello
>>>>>>> 
>>>>>>> I want to set up a ZooKeeper ensemble on Amazon's EC2 service. In my
>>>>>>> system, ZooKeeper is used to run a locking service and to generate unique
>>>>>>> IDs. Currently, for testing purposes, I am only running one instance. Now,
>>>>>>> I need to set up an ensemble to protect my system against crashes.
>>>>>>> The EC2 service has some differences from a normal server farm. E.g. the
>>>>>>> data saved on the file system of an EC2 instance is lost if the instance
>>>>>>> crashes. In the documentation of ZooKeeper, I have read that ZooKeeper
>>>>>>> saves snapshots of the in-memory data in the file system. Is that needed
>>>>>>> for recovery? Logically, it would be much easier for me if this is not the
>>>>>>> case.
>>>>>>> Additionally, EC2 brings the advantage that servers can be switched on and
>>>>>>> off dynamically depending on the load, traffic, etc. Can this advantage be
>>>>>>> utilized for a ZooKeeper ensemble? Is it possible to add a ZooKeeper server
>>>>>>> dynamically to an ensemble? E.g. depending on the in-memory load?
>>>>>>> 
>>>>>>> David
>>>>>>> 
>>>>> 
>>> 
>> 
>> 
>> 
>> --
>> Ted Dunning, CTO
>> DeepDyve
>> 
