Vishal brought up an issue at the ZK post-summit meetup that might
also be (partially?) resolved by this patch.
Thanks again Chang Song!
Patrick
2011/7/1 Chang Song tru64...@me.com:
No problem.
Glad to contribute.
Thanks a lot.
On Jul 2, 2011, at 1:03 AM, Ted Dunning wrote:
Thanks for the feedback Jared!
Actually, it was netspider (Chisu Ryu) on my team who fixed it.
Thanks, Chisu.
Chang
On Jul 6, 2011, at 3:04 AM, Patrick Hunt wrote:
...
2011/7/1 Chang Song
As a note, I believe we just used this patch to solve a major issue we were
seeing. We were having problems when power to a node was pulled, leaving
hung TCP sessions on the servers. With many connections, each close
operation was taking 2 seconds and held up the server significantly enough
to ...
Thanks for the feedback Jared!
(and thanks to Chang as well!)
On Fri, Jul 1, 2011 at 8:06 AM, Jared Cantwell jared.cantw...@gmail.com wrote:
As a note, I believe we just used this patch to solve a major issue ...
Thanks Chang!
~Jared
On Tue, Apr 19, 2011 at 10:59 AM, Ted Dunning wrote:
...
Problem solved.
It was the socket linger option, set to a 2-second timeout.
We have verified that the original problem goes away when we turn off the
linger option.
No longer a mystery ;)
https://issues.apache.org/jira/browse/ZOOKEEPER-1049
Chang
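For readers unfamiliar with the option: SO_LINGER with a positive timeout makes close() block while the kernel tries to hand unsent data to the peer, which is how a 2-second linger can stall every close against a powered-off node. A minimal sketch using the standard java.net.Socket API (the class name is illustrative, not ZooKeeper code):

```java
import java.net.Socket;
import java.net.SocketException;

// Sketch: with SO_LINGER enabled and a positive timeout, close() can block
// for up to that many seconds per connection while the kernel tries to
// deliver unsent data -- painful when the peer is a powered-off node.
public class LingerDemo {
    static void configure(Socket s, boolean lingerOn) throws SocketException {
        if (lingerOn) {
            s.setSoLinger(true, 2);   // close() may block up to 2 seconds
        } else {
            s.setSoLinger(false, 0);  // close() returns immediately
        }
    }
}
```

Disabling linger (the second branch) corresponds to the fix described above; getSoLinger() reports -1 once the option is off.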
On Apr 19, 2011, at 3:16 AM, Mahadev Konar wrote:
Camille, Ted,
Where is this set?
Why does this cause this problem?
2011/4/19 Chang Song tru64...@me.com
...
Interesting. It does seem to suggest that session expiration is expensive.
There is a concurrent table in Guava that provides very good multi-threaded
performance. I think that is achieved by using a number of locks and then
distributing threads across the locks according to the hash slot ...
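The lock-striping idea Ted describes (the design behind Guava's Striped and the early ConcurrentHashMap) can be sketched in a few lines; class and method names here are illustrative, not Guava's actual API:

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of lock striping: instead of one global lock guarding the whole
// table, keep N locks and pick one by the key's hash slot, so threads
// touching different slots almost never contend with each other.
public class StripedLocks {
    private final ReentrantLock[] stripes;

    public StripedLocks(int n) {
        stripes = new ReentrantLock[n];
        for (int i = 0; i < n; i++) {
            stripes[i] = new ReentrantLock();
        }
    }

    // The same key always maps to the same stripe.
    public ReentrantLock lockFor(Object key) {
        int slot = (key.hashCode() & 0x7fffffff) % stripes.length;
        return stripes[slot];
    }
}
```

A caller locks lockFor(key) around its table operation; contention is spread across the stripes instead of serializing on one monitor.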
Camille, Ted,
Can we continue the discussion on
https://issues.apache.org/jira/browse/ZOOKEEPER-1049?
We should track all the suggestions/issues on the jira.
thanks
mahadev
On Mon, Apr 18, 2011 at 9:03 AM, Ted Dunning ted.dunn...@gmail.com wrote:
Interesting. It does seem to suggest the ...
Ted.
Please be patient.
I didn't say I won't post the data.
I am not doing the test myself; my team is.
I saw the iostat results when they ran the test.
I cannot cut-and-paste what I don't have.
I cannot force them to come in on weekends to do the testing.
And let me add: there is no magic in ...
That isn't the issue.
The issue is that there is something here that is a mystery. You aren't
seeing the answer. If you could, you would have seen it already and
wouldn't have a question to ask. If you want somebody else to see the
answer, you need to show them the raw data and not just tell ...
How many ephemeral files have to be deleted when a session closes or
expires?
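Ted's question matters because the cost of a close or expiry scales with the number of ephemerals the session owns. A toy model (not ZooKeeper's actual DataTree code) of what the server has to do:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model (not ZooKeeper's actual code): each session owns a set of
// ephemeral nodes, and closing or expiring the session deletes all of
// them, so the work per close grows with the ephemeral count.
public class EphemeralCleanup {
    final Map<Long, Set<String>> ephemeralsBySession = new HashMap<>();
    final Set<String> tree = new HashSet<>();

    public void createEphemeral(long sessionId, String path) {
        tree.add(path);
        ephemeralsBySession
            .computeIfAbsent(sessionId, k -> new HashSet<>())
            .add(path);
    }

    // Returns how many ephemeral nodes had to be deleted.
    public int closeSession(long sessionId) {
        Set<String> owned = ephemeralsBySession.remove(sessionId);
        if (owned == null) {
            return 0;
        }
        tree.removeAll(owned);  // one delete per ephemeral node
        return owned.size();
    }
}
```

A session with thousands of ephemerals makes that loop, and the transactions it generates, correspondingly expensive.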
2011/4/15 Chang Song tru64...@me.com
It is not login; it is the session expiring and closing process.
I have filed a JIRA bug:
https://issues.apache.org/jira/browse/ZOOKEEPER-1049
We have measured I/O wait again, but found no I/O activity due to ZK.
Just the regular page cache sync daemon at work: 0-3%.
I will have my team attach the ZK stat results.
Thanks a lot.
Let's move this discussion to ...
You know, I think it would help if you would answer some of the questions
that people have posed.
You say that it takes 1000 clients over 8 seconds to register. That is
roughly 125 transactions per second.
That is two orders of magnitude slower than others have observed ZK to be.
This is a ...
Patrick and Ted.
Unless the ZooKeeper client adds this feature, it is not easy for us to
implement.
We only provide the platform for many services within our org.
Their batch servers will fire off whatever clients they want.
We have no control over it.
But an 8-second latency during a stampede is ...
On Apr 14, 2011, at 10:30 AM, Patrick Hunt wrote:
2011/4/13 Chang Song tru64...@me.com:
Patrick.
Thank you for the reply.
We are very aware of all the things you mentioned below.
None of those.
Not GC (we monitor every possible resource in JVM and system)
No IO. No Swapping.
No VM guest OS. No logging.
On Apr 14, 2011, at 1:53 PM, Patrick Hunt wrote:
two additional thoughts come to mind:
1) try running the ensemble with a single zk server, does this help at
all? (it might provide a short term workaround, it also might provide
some insight into what's causing the issue)
We are going to try this
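For anyone wanting to try Patrick's single-server suggestion: a minimal standalone zoo.cfg looks something like the sketch below (paths and values are illustrative). Leaving out the server.N lines runs ZooKeeper in standalone mode with a single server.

```
# Minimal standalone configuration (illustrative values).
# With no server.N entries, ZooKeeper runs as a single standalone server.
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
```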
2011/4/14 Chang Song tru64...@me.com:
2) regarding IO, if you run 'iostat -x 2' on the zk servers while your
issue is happening, what's the %util of the disk? what's the iowait
look like?
Again, no I/O at all. 0%
This is simply not possible.
Sessions are persistent. Each time a session ...
Chang,
If the problem is on client startup, then it isn't the heartbeat
stampede; it is session establishment. The heartbeats are very light
weight, so I can't imagine them causing any issues.
The two key things we need to know are: 1) the version of the server
you are running, and 2) if you are ...
On Apr 15, 2011, at 1:04 AM, Patrick Hunt wrote:
2011/4/14 Chang Song tru64...@me.com:
...
When you file the JIRA, can you also note the logging level you are using?
thanx
ben
2011/4/14 Chang Song tru64...@me.com:
Yes, Ben.
If you read my emails carefully, I already said it is not the heartbeats;
it is session establishment/closing that gets stampeded.
Since all the requests' responses get ...
Sure, I will.
Thank you.
Chang
On Apr 15, 2011, at 7:16 AM, Benjamin Reed wrote:
...
2011/4/14 Chang Song tru64...@me.com
You need to understand that most apps can tolerate delays in connect/close,
but we cannot tolerate ping delays, since we are using the ZK heartbeat
timeout as our sole means of failure detection.
What about using multiple ZK clusters for this, then?
But it really sounds like ...
You said that, but there was some skepticism from others about this.
You need to try the monitoring that was suggested. Five-minute averages are
not useful.
What does the stat four-letter command return? (
http://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_zkCommands )
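The stat four-letter word returns a short plain-text report (version, latency min/avg/max, outstanding requests, mode, node count). A sketch of pulling the average latency out of such a response; since there is no live server here, the input is a canned string, and the class name and sample values are illustrative:

```java
// Sketch: parse the average latency out of a 'stat' four-letter-word
// response. Against a live server the text would come from writing the
// bytes "stat" to the client port (2181 by default); here we show only
// the parsing step.
public class StatParse {
    public static long avgLatency(String statOutput) {
        for (String line : statOutput.split("\n")) {
            if (line.startsWith("Latency min/avg/max:")) {
                // e.g. "Latency min/avg/max: 0/1/12" -> avg is the middle field
                String[] parts =
                    line.substring(line.indexOf(':') + 1).trim().split("/");
                return Long.parseLong(parts[1]);
            }
        }
        return -1;  // latency line not found
    }
}
```

Watching that average (and the Outstanding count) over time is the kind of monitoring Ted is asking for, rather than five-minute averages.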
2011/4/14 Chang
...
Oh, one thing I should mention is that it is not 1000 clients, 1000 ...
This is a more powerful idea than it looks at first glance.
The reason is that there is often a highly non-linear and adverse impact on
response time under higher load. I have never been able to properly
account for this using queuing models in a system that is not swapping, but
it is ...