Setting up a little process to run overnight that appends a timestamp to
a file once per second or so can be a very effective tool for ruling
out, for example, "extra-dimensional" VM effects.
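A minimal sketch of such a timestamp logger, plus a helper that scans the resulting file for gaps the next morning. The function names, the file path argument, and the 5-second threshold are illustrative assumptions, not anything from the thread. It relies on wall-clock time, so a clock adjustment inside the VM could also show up as a gap.

```python
import time


def write_heartbeats(path, interval=1.0, count=None):
    """Append one wall-clock timestamp per line to `path`, about every `interval` seconds.

    Runs until interrupted when `count` is None; otherwise writes `count` lines.
    """
    written = 0
    while count is None or written < count:
        # Reopen on every write so each line is flushed to the file promptly.
        with open(path, "a") as f:
            f.write("%.3f\n" % time.time())
        written += 1
        time.sleep(interval)


def find_gaps(path, threshold=5.0):
    """Return (previous, current, gap_seconds) for consecutive timestamps
    that are more than `threshold` seconds apart (threshold is an assumed value)."""
    gaps = []
    previous = None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            current = float(line)
            if previous is not None and current - previous > threshold:
                gaps.append((previous, current, current - previous))
            previous = current
    return gaps
```

Leaving write_heartbeats running overnight on the same instance as the client and then calling find_gaps on the file shows whether the whole machine stalled around the time a session expired.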
On 06/16/2010 12:15 AM, Patrick Hunt wrote:
I'm not very experienced personally with running zk on ec2 smalls, Ted
usually has the ec2 related insight. Given these boxes are not loaded or
lightly loaded, and you've ruled out gc/swap, the only thing I can think
of is that something is going on under the covers at the vm level that's
causi
They're small instances. The thing is that these machines are doing next to no
work. We're just running simple little tests. The session expiration has not
happened while I've been watching. It tends to happen overnight.
-JZ
On Jun 15, 2010, at 1:50 PM, Ted Dunning wrote:
As usual, the ZK team provides the best feedback.
I would be bold enough to ask what kind of ec2 instances you are running on.
Small instances are small chunks of larger machines and are sometimes
subject to competition for resources from the other tenants.
On Tue, Jun 15, 2010 at 12:30 PM, Patr
Yes, 965 seconds is huge.
The times I've seen such huge latencies are (in order of frequency seen):
1) when the java process gc's, swaps, or both
and/or
2) disk utilization on the ZK server is high
and/or
3) under-provisioned virtual machines (i.e. VMware)
Re 2) in some cases we've seen users
Yes - the session drop happened again. I did the stat. The max latency is huge
(I assume that's in ms).
Zookeeper version: 3.3.0-925362, built on 03/19/2010 18:38 GMT
Clients:
/10.243.14.179:57300[1](queued=0,recved=0,sent=0)
/207.111.236.2:51493[1](queued=0,recved=1,sent=0)
/10.243.13.191:444
Jordan,
Good step to get this info.
I have to ask, did you have your disconnect problem last night as well?
(just checking)
What does the stat command on ZK give you for each server?
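For collecting this from several servers, here is a rough sketch that sends the four-letter stat command over a plain socket and pulls out the latency figures. It assumes the default client port 2181 and assumes the reply contains a line of the form "Latency min/avg/max: a/b/c"; the function names are made up for illustration.

```python
import re
import socket


def fetch_stat(host, port=2181, timeout=5.0):
    """Send the four-letter 'stat' command and return the raw text reply.

    Port 2181 is an assumption (the usual client port); adjust to your config.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stat")
        chunks = []
        while True:
            # The server is expected to close the connection after replying.
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_latency(stat_text):
    """Return (min, avg, max) as floats from the latency line, or None if absent."""
    match = re.search(
        r"Latency min/avg/max:\s*(-?[\d.]+)/(-?[\d.]+)/(-?[\d.]+)", stat_text
    )
    if match is None:
        return None
    return tuple(float(g) for g in match.groups())
```

Calling parse_latency(fetch_stat(host)) for each server in the ensemble gives one tuple per server, which makes an outlier in max latency easy to spot.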
On Tue, Jun 15, 2010 at 10:33 AM, Jordan Zimmerman <
jzimmer...@proofpoint.com> wrote:
More on this...
I ran last night with verbose GC on our client. I analyzed the GC log in
gchisto and 99% of the GCs are 1 or 2 ms. The longest gc is 30 ms. On the
Zookeeper server side, the longest gc is 130 ms. So, I submit, GC is not the
problem. NOTE we're running on Amazon EC2.
-JZ
On Ju
Session expiration is due to the server not hearing heartbeats from the
client. So either the client is partitioned from the server, or the
client is not sending heartbeats for some reason. Typically this is due
to the client JVM gc'ing or swapping.
Patrick
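To make the mechanism concrete, here is a toy model of timeout-based session expiry. It is only a sketch of the general idea and not ZooKeeper's actual implementation; the class name, method names, and the use of caller-supplied timestamps are all assumptions for illustration.

```python
class SessionTracker:
    """Toy model: a session expires once no heartbeat has been heard
    for more than `timeout` seconds. Not ZooKeeper's real code."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_heard = {}  # session id -> time of the last heartbeat

    def heartbeat(self, session_id, now):
        """Record that `session_id` was heard from at time `now`."""
        self.last_heard[session_id] = now

    def expired(self, now):
        """Remove and return (sorted) the sessions silent for longer than the timeout."""
        dead = [sid for sid, t in self.last_heard.items() if now - t > self.timeout]
        for sid in dead:
            del self.last_heard[sid]
        return sorted(dead)
```

In this model the server cannot tell a network partition from a client that is paused in GC or swapped out: in both cases the heartbeats simply stop arriving and the session is dropped once the timeout passes.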
On 06/10/2010 04:14 PM, Ted Dunning
On Jun 9, 2010, at 4:21 PM, Patrick Hunt wrote:
> In particular you might look at GC/swapping on your clients, that's the most
> common case we see for session expiration (apart from the obvious -- network
> level connectivity failures). In one case I remember there was very heavy
> network lo
Uh the options I was recommending were for your CLIENT. You should have
similar settings on ZK, but it is your client that is likely to be pausing.
On Thu, Jun 10, 2010 at 4:08 PM, Jordan Zimmerman wrote:
The thing is, this is a test instance (on AWS/EC2) that isn't getting a lot of
traffic. i.e. 1 zookeeper instance that we're testing with.
On Jun 10, 2010, at 4:06 PM, Ted Dunning wrote:
Possibly.
I have seen GC times of > 4 minutes on some large processes. Better to set
the GC parameters so you don't get long pauses.
On http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting it mentions using
the "-XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC" options. I recommend
adding
On 06/09/2010 03:35 PM, Lei Zhang wrote:
We've consistently run into issues with VMware Workstation (CentOS as guest
OS) on a Windows host: just leaving the cluster idle overnight leads to zk
session expiration. My theory is: Windows may have gone into hibernation,
the zk heartbeat logic hibe
This can depend on which kind of instance you invoke as well. The smallest
instances disappear for short periods of time and that can lead to
surprises.
On Wed, Jun 9, 2010 at 3:35 PM, Lei Zhang wrote:
> On EC2 (still CentOS as guest OS), we consistently run into zk session
> expire issue when
We use ZooKeeper in a virtualized environment, both on Amazon EC2 and on
VMware Workstation on local machines.
We've consistently run into issues with VMware Workstation (CentOS as guest
OS) on a Windows host: just leaving the cluster idle overnight leads to zk
session expiration. My theory is:
On Wed, Jun 9, 2010 at 2:47 PM, Patrick Hunt wrote:
> My guess is that your client is gcing for long periods of time - you can
> rule this in/out by turning on gc logging in your clients and then viewing
> the results after another such incident happens (try gchisto for graphical
> view)
>From re
"100mb partition"? Sounds like virtualization. Resource starvation
(worse in a virtualized env) is a common cause of this. Are your clients
gcing/swapping at all? If a client gc's for long periods of time, the
heartbeat thread won't be able to run and the server will expire the
session. There is a