On 04/12/2010 03:58 PM, Kevin Webb wrote:
On Mon, 12 Apr 2010 15:09:20 -0700
Patrick Hunt<ph...@apache.org>  wrote:

We did have a case where the user setup 3 servers, each was
standalone. :-) Doesn't look like that's the problem here though
given you only specify 1 server in the connect string (although as
mahadev mentioned you don't need to worry about that aspect).

They're definitely not standalone.  Here's the server config:

# The number of milliseconds of each tick
# The number of ticks that the initial
# synchronization phase can take
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
# the directory where the snapshot is stored.
# the port at which the clients will connect
server.1=<hostname 1>:2888:3888
server.2=<hostname 2>:2888:3888
server.3=<hostname 3>:2888:3888

What's the ping time between colos? The 2 sec tickTime, and especially the initLimit and syncLimit, are pretty low. You are allowing only 4 seconds to download the data repository to a remote server. Even in-colo we typically use a higher value... but you may not want to change this until we can reproduce the problem. You probably want a 4 sec tickTime and 60/40 sec (so settings of 15/10) for the init/sync limits (something like that, depending on the latencies/bandwidth you see).
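
As a rough sketch, those numbers would look something like this in the server config. The values are only the ballpark figures above, not tested settings, and assume tickTime is in milliseconds and the limits are counted in ticks, as the comments in your config say:

```
# 4 second tick
tickTime=4000
# 15 ticks * 4 sec = 60 sec for the initial synchronization phase
initLimit=15
# 10 ticks * 4 sec = 40 sec between sending a request and getting an acknowledgement
syncLimit=10
```

The server.N lines would stay as they are; adjust the limits up or down once you know the actual latency and bandwidth between colos.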

After it goes 7->11->9, does it ever go back to 11 or just 9?

It actually does this:
7->7->11->9->9->12->14 ... (proceeds normally from here)

Hrm, that's very weird.

It would be good to capture the server log files (all 3) when this
happens next time. Please provide those as well, would be critical
for discovering this. In particular not many users are running
cross-colo clusters.

I'll be sure to save these next time.  I thought I had them for this
run, sorry.

NP. As I mentioned creating a JIRA would be a good idea. Very DRY.

If you can provide the config files too that will be useful.

What version of java/OS is being used?

I'm running on PlanetLab, which is based on Fedora 8 (very old).
uname says: Linux #1 SMP Tue
Jun 30 09:32:05 UTC 2009 i686 i686 i386 GNU/Linux


java -version says:
java version "1.7.0"
IcedTea Runtime Environment (build 1.7.0-b21)
IcedTea Client VM (build 1.7.0-b21, mixed mode)

Well we don't support 1.7 vms yet, but that's not to say that would cause the issue. Really once we see the server logs we should get more insight.

The only thing I could see with the os/java would be significant differences in thread/networking timing that we don't typically see with new os's and 1.6 vms...

Might be a good time to create a JIRA, attach all this to the JIRA so
that you don't have to repeat. :-)

I'll do that (including server logs) next time I see it happen.


