GC time: 11.628 seconds on PS MarkSweep (389 collections), 5 minutes on PS Scavenge (7,636 collections).
It's been running for about 48 hours.

On Tue, Sep 1, 2009 at 5:12 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> Do you have long GC delays?
>
> On Tue, Sep 1, 2009 at 4:51 PM, Satish Bhatti <cthd2...@gmail.com> wrote:
>> Session timeout is 30 seconds.
>>
>> On Tue, Sep 1, 2009 at 4:26 PM, Patrick Hunt <ph...@apache.org> wrote:
>>> What is your client timeout? It may be too low.
>>>
>>> Also see this section on handling recoverable errors:
>>> http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling
>>>
>>> Connection loss in particular needs special care since:
>>> "When a ZooKeeper client loses a connection to the ZooKeeper server there may be some requests in flight; we don't know where they were in their flight at the time of the connection loss."
>>>
>>> Patrick
>>>
>>> Satish Bhatti wrote:
>>>> I have recently started running on EC2 and am seeing quite a few ConnectionLoss exceptions. Should I just catch these and retry, since I assume that eventually, if the shit truly hits the fan, I will get a SessionExpired?
>>>> Satish
>>>>
>>>> On Mon, Jul 6, 2009 at 11:35 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>> We have used EC2 quite a bit for ZK.
>>>>>
>>>>> The basic lessons that I have learned include:
>>>>>
>>>>> a) EC2's biggest advantage after scaling and elasticity was conformity of configuration. Since you are bringing machines up and down all the time, they begin to act more like programs and you wind up with boot scripts that give you a very predictable environment. Nice.
>>>>>
>>>>> b) The EC2 interconnect has a lot more going on than a dedicated VLAN. That can make the ZK servers appear a bit less connected. You have to plan for ConnectionLoss events.
>>>>>
>>>>> c) For highest reliability, I switched to large instances. On reflection, I think that was helpful, but less important than I thought at the time.
>>>>>
>>>>> d) Increasing and decreasing cluster size is nearly painless and is easily scriptable. To decrease, do a rolling update on the survivors to update their configuration, then take down the instance you want to lose. To increase, do a rolling update starting with the new instances to update the configuration to include all of the machines. The rolling update should bounce each ZK with several seconds between each bounce. Rescaling the cluster takes less than a minute, which makes it comparable to EC2 instance boot time (about 30 seconds for the Alestic Ubuntu instance that we used, plus about 20 seconds for additional configuration).
>>>>>
>>>>> On Mon, Jul 6, 2009 at 4:45 AM, David Graf <david.g...@28msec.com> wrote:
>>>>>> Hello
>>>>>>
>>>>>> I want to set up a ZooKeeper ensemble on Amazon's EC2 service. In my system, ZooKeeper is used to run a locking service and to generate unique IDs. Currently, for testing purposes, I am only running one instance. Now I need to set up an ensemble to protect my system against crashes.
>>>>>>
>>>>>> The EC2 service has some differences from a normal server farm. For example, the data saved on the file system of an EC2 instance is lost if the instance crashes. In the ZooKeeper documentation, I have read that ZooKeeper saves snapshots of the in-memory data in the file system. Is that needed for recovery? Logically, it would be much easier for me if this is not the case.
>>>>>>
>>>>>> Additionally, EC2 brings the advantage that servers can be switched on and off dynamically depending on the load, traffic, etc. Can this advantage be utilized for a ZooKeeper ensemble? Is it possible to add a ZooKeeper server dynamically to an ensemble, e.g. depending on the in-memory load?
>>>>>>
>>>>>> David
>
> --
> Ted Dunning, CTO
> DeepDyve
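Patrick's advice and the error-handling page cited in the thread come down to this: retry when the connection is lost, give up when the session has expired, and take care with operations that might already have been applied. The following is a minimal Java sketch of that pattern, assuming the operation being retried is idempotent (for example a read such as `getData` or `exists`). To keep it self-contained without the ZooKeeper jar, two local stand-in classes play the roles of the client library's `KeeperException.ConnectionLossException` and `KeeperException.SessionExpiredException`. The names `ZkRetry`, `withRetry`, `maxAttempts` and `backoffMillis` are hypothetical.

```java
import java.util.concurrent.Callable;

public class ZkRetry {

    // Stand-in for org.apache.zookeeper.KeeperException.ConnectionLossException
    static class ConnectionLossException extends Exception {}

    // Stand-in for org.apache.zookeeper.KeeperException.SessionExpiredException
    static class SessionExpiredException extends Exception {}

    /**
     * Runs op and retries it up to maxAttempts times in total while the
     * connection is lost. Any other exception, including session expiry,
     * propagates immediately. Only safe for idempotent operations: a request
     * in flight when the connection dropped may or may not have been applied.
     */
    static <T> T withRetry(Callable<T> op, int maxAttempts, long backoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (ConnectionLossException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: let the caller decide what to do
                }
                // Linear backoff gives the client time to reconnect to a server.
                Thread.sleep(backoffMillis * attempt);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated operation that loses the connection twice, then succeeds.
        String result = withRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new ConnectionLossException();
            }
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Session expiry is deliberately not retried here: once a session has expired, the ephemeral nodes and watches tied to it are gone, so the application has to open a new ZooKeeper handle and rebuild that state. For a non-idempotent call, such as creating a sequential node, a blind retry can create a duplicate, so the application would first need to check whether the earlier attempt took effect.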
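Satish's GC totals can be set against the 30-second session timeout with a little arithmetic. The figures below use only the totals quoted in the thread, taking "5 minutes" as 300 s and the roughly 48-hour runtime as 172,800 s:

```latex
\begin{aligned}
\bar{t}_{\text{MarkSweep}} &= \frac{11.628\ \text{s}}{389} \approx 0.030\ \text{s} \\
\bar{t}_{\text{Scavenge}} &\approx \frac{300\ \text{s}}{7636} \approx 0.039\ \text{s} \\
\frac{11.628\ \text{s} + 300\ \text{s}}{172800\ \text{s}} &\approx 0.0018 \approx 0.18\%
\end{aligned}
```

On average, each pause is hundreds of times shorter than the session timeout, and GC takes under 0.2% of wall-clock time. An average does not reveal the longest single pause, though, and it is the worst-case pause that matters for timeouts, so these totals alone cannot rule out an occasional long collection.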