Errors while running systest

2010-04-06 Thread Vishal K
Hi,

I am trying to run systest on a 3 node cluster (
http://svn.apache.org/repos/asf/hadoop/zookeeper/trunk/src/java/systest/README.txt
).

When I reach the 4th step, which is to actually run the test, I get the
exception shown below.

Exception in thread "main" java.lang.NoClassDefFoundError: junit/framework/TestCase
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632)
at java.lang.ClassLoader.defineClass(ClassLoader.java:616)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:169)
at org.apache.zookeeper.util.FatJarMain.main(FatJarMain.java:97)
Caused by: java.lang.ClassNotFoundException: junit.framework.TestCase
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
... 15 more

Looks like it is not able to find classes in junit. However, my classpath is
set right:

:/opt/zookeeper-3.3.0/zookeeper.jar:/opt/zookeeper-3.3.0/lib/junit-4.4.jar:/opt/zookeeper-3.3.0/lib/log4j-1.2.15.jar:/opt/zookeeper-3.3.0/build/test/lib/junit-4.8.1.jar

Any suggestions how I can get around this problem? Thanks.
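
One thing that may be worth checking (an assumption, since the exact command
line is not shown above): the trace ends in FatJarMain, which suggests the
test was started with "java -jar". When java is launched with -jar, the -cp
option and the CLASSPATH variable are ignored, so a correct-looking CLASSPATH
does not help. Below is a minimal sketch for checking what the running JVM
can actually see. ClasspathProbe is a hypothetical helper written for this
note; it is not part of ZooKeeper.

```java
import java.io.File;

final class ClasspathProbe {
    /** Returns true if the named class can be located by this class's loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate and load the class, do not run static initializers
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError, so both failure modes end up here
            return false;
        }
    }

    public static void main(String[] args) {
        // Print the effective class path of this JVM, one entry per line
        String cp = System.getProperty("java.class.path");
        for (String entry : cp.split(File.pathSeparator)) {
            if (entry.isEmpty()) {
                continue;
            }
            String state = new File(entry).exists() ? "ok      " : "MISSING ";
            System.out.println(state + entry);
        }
        System.out.println("junit.framework.TestCase loadable: "
                + isLoadable("junit.framework.TestCase"));
    }
}
```

Running it with the same launcher options as the failing command shows
whether the junit jar is really on the effective class path.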


Re: Embedding ZK in another application

2010-04-23 Thread Vishal K
Hi,

Good question. We are planning to do something similar as well, and it would
be great to know if there are any issues with embedding the ZK server into an
app. We simply use QuorumPeerMain and QuorumPeer from our app to start/stop
the ZK server. Is this not a good way to do it?

On Fri, Apr 23, 2010 at 1:28 PM, Asankha C. Perera asan...@apache.org wrote:

 Hi All

 I'm very new to ZK, and am looking at embedding ZK into an app that needs
 cluster management - and the objective is to use ZK to notify
 application cluster control operations (e.g. shutdown etc) across nodes.

 I came across this post [1] from the user list by Ted Dunning from some
 months back :
 My experience with Katta has led me to believe that embedding a ZK in a
 product is almost always a bad idea. - The problems are that you can't
 administer the Zookeeper cluster independently and that the cluster
 typically goes down when the associated service goes down.

 However, I believe that both the above are fine to live with for the
 application under consideration, as ZK will be used only to coordinate
 the larger application. Is there anything else that needs to be
 considered - and can I safely shutdown the clientPort since the
 application is always in the same JVM - but, if I do that how would I
 connect to ZK thereafter ?

 thanks and regards
 asankha

 [1] http://markmail.org/message/tjonwec7p7dhfpms



Re: Embedding ZK in another application

2010-04-25 Thread Vishal K
Hi Mahadev, Ted,

Thanks for the feedback.

On Fri, Apr 23, 2010 at 3:02 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 It is, of course, your decision, but a key coordination function is to
 determine whether your application is up or not.  That is very hard to do
 if Zookeeper is inside your application.

 On Fri, Apr 23, 2010 at 10:28 AM, Asankha C. Perera asan...@apache.org
 wrote:

  However, I believe that both the above are fine to live with for the
  application under consideration, as ZK will be used only to coordinate
  the larger application. Is there anything else that needs to be
  considered - and can I safely shutdown the clientPort since the
  application is always in the same JVM - but, if I do that how would I
  connect to ZK thereafter ?
 



Re: Embedding ZK in another application

2010-04-29 Thread Vishal K
Hi,

Well, it looks like FastLeaderElection.shutdown() is not invoked. This has been
in 3.3.0. Should have checked on that earlier :-)

On Thu, Apr 29, 2010 at 10:13 AM, Vishal K vishalm...@gmail.com wrote:

 Hi Ted,

 We want the application that embeds the ZK server to be running even after
 the ZK server is shutdown. So we don't want to restart the application.
 Also, we prefer not to use zkServer.sh/zkServer.cmd because these are OS
 dependent (our application will run on Win as well as Linux). Instead, we
 thought that calling QuorumPeerMain.initializeAndRun() and
 QuorumPeerMain.shutdown() will suffice to start and shutdown a ZK server and
 we won't have to worry about checking the OS.

 Is there a way to cleanly shut down the ZK server (by invoking the ZK server
 API) when it is embedded in the application, without actually restarting the
 application process?
 Thanks.
 On Thu, Apr 29, 2010 at 1:54 AM, Ted Dunning ted.dunn...@gmail.com wrote:

 Hmmm it isn't quite clear what you mean by restart without restarting.

 Why is killing the server and restarting it not an option?

 It is common to do a rolling restart on a ZK cluster.  Just restart one
 server at a time.  This is often used during system upgrades.

 On Wed, Apr 28, 2010 at 8:22 PM, Vishal K vishalm...@gmail.com wrote:

 
  What is a good way to restart a ZK server (standalone and quorum) without
  having to restart it?
 
  Currently, I have ZK server embedded in another java application.





Securing ZooKeeper connections

2010-05-25 Thread Vishal K
Hi All,

Since ZooKeeper does not support secure network connections yet, I thought I
would poll and see what people are doing to address this problem. Is anyone
running ZooKeeper over secure channels (client-server and server-server
authentication/encryption)? If yes, can you please elaborate on how you do it?

Thanks.

Regards,
-Vishal


cleanup ZK takes 40-60 seconds

2010-07-16 Thread Vishal K
Hi,

We have embedded the ZK server in our application. We start a thread in our
application and call QuorumPeerMain.InitializeArgs().

When cleaning up ZK, we call QuorumPeerMain.shutdown() and wait for the
thread that is calling InitializeArgs() to finish. These two steps take
around 60 seconds. I could probably skip waiting for InitializeArgs() to
finish, and that might speed things up.

However, I am not sure why the cleanup should take such a long time. Can
anyone comment on this?

Thanks.
-Vishal
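
For measuring where the time goes, and for putting an upper bound on the
wait, a bounded join is one option. This is only a sketch under stated
assumptions: BoundedShutdown and awaitExit are hypothetical names, and the
sleeping thread stands in for the thread that runs the embedded server. No
ZooKeeper API is used here.

```java
final class BoundedShutdown {
    /**
     * Waits up to timeoutMs for the worker thread to finish.
     * Returns true if the thread has ended, false if it is still alive.
     */
    static boolean awaitExit(Thread worker, long timeoutMs) {
        if (timeoutMs <= 0) {
            // Thread.join(0) waits forever, so treat non-positive values as "do not wait"
            return !worker.isAlive();
        }
        try {
            worker.join(timeoutMs); // returns when the thread ends or the timeout elapses
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        // Stand-in for the thread that runs the embedded server
        Thread server = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
                // interrupted: let the thread end
            }
        }, "embedded-zk");
        server.setDaemon(true);
        server.start();

        long start = System.nanoTime();
        boolean exited = awaitExit(server, 2_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("exited=" + exited + " after ~" + elapsedMs + " ms");
    }
}
```

Timing the shutdown call and the join separately would show which of the two
steps accounts for the 40-60 seconds.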


Too many KeeperErrorCode = Session moved messages

2010-08-05 Thread Vishal K
Hi All,

I am seeing a lot of these messages in our application. I would like to know
if I am doing something wrong or this is a ZK bug.

Setup:
- Server environment:zookeeper.version=3.3.0-925362
- 3 node cluster
- Each node has a few clients that connect to the local server using 127.0.0.1
as the host IP.
- The application first forms a ZK cluster. Once the ZK cluster is formed,
each node establishes sessions with its local ZK server. The clients do not
know about remote servers, so sessions are always with the local server.

As soon as the ZK clients connect to their respective followers, the ZK leader
starts spitting out the following messages:

2010-07-01 10:55:36,733 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x6 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,748 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x9 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,755 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0xb zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,795 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x10 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,850 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa90001
type:sync: cxid:0x1 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,910 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x1b zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,920 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x20 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:37,019 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x29 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:37,030 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x2c zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:37,035 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x2e zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:37,065 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x33 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:38,840 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa90001
type:sync: cxid:0x4 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved

These sessions were established on the follower:
2010-07-01 08:59:09,890 - INFO  [CommitProcessor:0:nioserverc...@1431] -
Established session 0x298d3b1fa9 with negotiated timeout 9000 for client
/127.0.0.1:50773
2010-07-01 08:59:09,890 - INFO
[SvaDefaultBLC-SendThread(localhost.localdom:2181):clientcnxn$sendthr...@701]
- Session establishment complete on server localhost.localdom/127.0.0.1:2181,
sessionid = 0x298d3b1fa9, negotiated timeout = 9000


The server is spitting out these messages for every session that it does not
own (sessions established by clients with followers). The messages are
always seen for a sync request.
No other issues are seen with the cluster. I am wondering what the cause of
this problem could be. Looking at PrepRequestProcessor, it seems that this
message is printed when the owner of the request is not the same as the
session owner. But in our application this should never happen, since clients
always connect to their local server.

Any ideas?

Thanks.
-Vishal


Re: How to handle Node does not exist error?

2010-08-12 Thread Vishal K
Hi,

I don't intend to hijack Dr. Hao's email thread here, but I would like to
point out two things:

1. I use an embedded server as well, but I don't use any setters. We extend
QuorumPeerMain and call the initializeAndRun() function, so we are doing
pretty much the same thing that QuorumPeerMain is doing. However, note that I
am seeing the same problem (in ZK 3.3.0) as Dr Hao is seeing. I haven't
debugged the cause yet. I assumed that this was my implementation error (and
it could still be). Nevertheless, this could turn out to be a bug as well.

2. With respect to Ted's point about backward compatibility, I would suggest
providing an API that supports embedded ZK instead of asking users not to
embed ZK.

-Vishal

On Thu, Aug 12, 2010 at 3:18 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 It doesn't.

 But running a ZK cluster that is incorrectly configured can cause this
 problem and configuring ZK using setters is likely to be subject to changes
 in what configuration is needed.  Thus, your style of code is more subject
 to decay over time than is nice.

 The rest of my comments detail *other* reasons why embedding a coordination
 layer in the code being coordinated is a bad idea.

 On Thu, Aug 12, 2010 at 6:33 AM, Vishal K vishalm...@gmail.com wrote:

  Hi Ted,
 
  Can you explain why running ZK in embedded mode can cause znode
  inconsistencies?
  Thanks.
 
  -Vishal
 
  On Thu, Aug 12, 2010 at 12:01 AM, Ted Dunning ted.dunn...@gmail.com
  wrote:
 
   Try running the server in non-embedded mode.

   Also, you are assuming that you know everything about how to configure
   the quorumPeer.  That is going to change and your code will break at that
   time.  If you use a non-embedded cluster, this won't be a problem and you
   will be able to upgrade ZK version without having to restart your service.

   My own opinion is that running an embedded ZK is a serious architectural
   error.  Since I don't know your particular situation, it might be
   different, but there is an inherent contradiction involved in running a
   coordination layer as part of the thing being coordinated.  Whatever your
   software does, it isn't what ZK does.  As such, it is better to factor out
   the ZK functionality and make it completely stable.  That gives you a much
   simpler world and will make it easier for you to trouble shoot your
   system.  The simple fact that you can't take down your service without
   affecting the reliability of your ZK layer makes this a very bad idea.

   The problems you are having now are only a preview of what this
   architectural error leads to.  There will be more problems and many of
   them are likely to be more subtle and lead to service interruptions and
   lots of wasted time.
  
   On Wed, Aug 11, 2010 at 8:49 PM, Dr Hao He h...@softtouchit.com wrote:
  
    hi, Ted and Mahadev,

    Here are some more details about my setup:

    I run zookeeper in the embedded mode with the following code:

        quorumPeer = new QuorumPeer();
        quorumPeer.setClientPort(getClientPort());
        quorumPeer.setTxnFactory(new FileTxnSnapLog(
                new File(getDataLogDir()), new File(getDataDir())));
        quorumPeer.setQuorumPeers(getServers());
        quorumPeer.setElectionType(getElectionAlg());
        quorumPeer.setMyid(getServerId());
        quorumPeer.setTickTime(getTickTime());
        quorumPeer.setInitLimit(getInitLimit());
        quorumPeer.setSyncLimit(getSyncLimit());
        quorumPeer.setQuorumVerifier(getQuorumVerifier());
        quorumPeer.setCnxnFactory(cnxnFactory);
        quorumPeer.start();
   
   
    The configuration values are read from the following XML document for
    server 1:

    <cluster tickTime="1000" initLimit="10" syncLimit="5" clientPort="2181"
             serverId="1">
      <member id="1" host="192.168.2.6:2888:3888"/>
      <member id="2" host="192.168.2.3:2888:3888"/>
      <member id="3" host="192.168.2.4:2888:3888"/>
    </cluster>
   
   
    The other servers have the same configuration except that their ids are
    changed to 2 and 3.

    The error occurred on server 3 when I batch loaded some messages to
    server 1.  However, this error does not always happen.  I am not sure
    exactly what triggered this error yet.

    I also performed the stat operation on one of the No exit node and got:

    stat /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg001583
    Exception in thread "main" java.lang.NullPointerException
        at org.apache.zookeeper.ZooKeeperMain.printStat(ZooKeeperMain.java:129)
        at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:715)
        at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:579

Re: How to handle Node does not exist error?

2010-08-16 Thread Vishal K
In my case, I am pretty sure that the configuration was right. I will
reproduce it and post more info later. Thanks.

On Mon, Aug 16, 2010 at 1:08 PM, Patrick Hunt ph...@apache.org wrote:

 Try using the logs, stat command or JMX to verify that each ZK server is
 indeed a leader/follower as expected. You should have one leader and n-1
 followers. Verify that you don't have any standalone servers (this is the
 most frequent error I see - misconfiguration of a server such that it thinks
 it's a standalone server; I often see where a user has 3 standalone servers
 which they think is a single quorum, all of the servers will therefore be
 inconsistent to each other).

 Patrick
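
 The check Patrick describes can be scripted. The sketch below sends the
 four-letter "stat" command to a server's client port and pulls out the
 "Mode:" line of the reply. ZkModeCheck is a hypothetical helper written for
 this note, and the host:port arguments are placeholders to be replaced with
 your own servers. It assumes the server answers "stat" and then closes the
 connection.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class ZkModeCheck {
    /** Extracts the value of the "Mode:" line from a stat reply, or null if absent. */
    static String parseMode(String statReply) {
        for (String line : statReply.split("\\r?\\n")) {
            if (line.startsWith("Mode:")) {
                return line.substring("Mode:".length()).trim();
            }
        }
        return null;
    }

    /** Sends "stat" to host:port and returns the raw text reply. */
    static String stat(String host, int port, int timeoutMs) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);
            socket.getOutputStream().write("stat".getBytes(StandardCharsets.US_ASCII));
            socket.getOutputStream().flush();

            // Read until the server closes the connection
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            InputStream in = socket.getInputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                reply.write(buf, 0, n);
            }
            return new String(reply.toByteArray(), StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) throws IOException {
        // Each argument is a host:port pair, for example the three quorum members
        for (String hostPort : args) {
            String[] parts = hostPort.split(":");
            String reply = stat(parts[0], Integer.parseInt(parts[1]), 3000);
            System.out.println(hostPort + " -> " + parseMode(reply));
        }
    }
}
```

 A healthy three-node quorum should report one leader and two followers; any
 server reporting standalone points to the misconfiguration described above.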


 On 08/12/2010 05:42 PM, Ted Dunning wrote:

 On Thu, Aug 12, 2010 at 4:57 PM, Dr Hao He h...@softtouchit.com wrote:

  hi, Ted,

  I am a little bit confused here.  So, is the node inconsistency problem
  that Vishal and I have seen here most likely caused by configurations or
  embedding?

  If it is the former, I'd appreciate if you can point out where those
  silly mistakes have been made and the correct way to embed ZK.


 I think it is likely due to misconfiguration, but I don't know what the
 issue is exactly.  I think that another poster suggested that you ape the
 normal ZK startup process more closely.  That sounds good but it may be
 incompatible with your goals of integrating all configuration into a
 single XML file and not using the normal ZK configuration process.

 Your thought about forking ZK is a good one since there are calls to
 System.exit() that could wreak havoc.


  Although I agree with your comments about the architectural issues that
  embedding may lead to and we are aware of those,  I do not agree that
  embedding will always lead to those issues.


 I agree that embedding won't always lead to those issues and your
 application is a reasonable counter-example.  As is common, I think that
 the exception proves the rule since your system is really just another way
 to launch an independent ZK cluster rather than an example of ZK being
 embedded into an application.




Re: Weird ephemeral node issue

2010-08-17 Thread Vishal K
Hi Qing,

Can you list the znodes from the monitor and from the node that the monitor
is restarting (run zkCli.sh on both machines)?
I am curious to see if the node that did not receive the SESSION_EXPIRED
event still has the znode in its database.
Also, can you describe your setup? Can you send out the logs and the zoo.cfg
file?
Thanks.

-Vishal
On Tue, Aug 17, 2010 at 3:31 AM, Qing Yan qing...@gmail.com wrote:

 Forgot to mention: the process looks fine, with normal memory footprint and
 CPU usage, and it generates the expected results; the only thing missing is
 the ephemeral node in ZK.



Re: Session expiration caused by time change

2010-08-19 Thread Vishal K
Hi,

I remember Ben had opened a jira for clock jumps earlier:
https://issues.apache.org/jira/browse/ZOOKEEPER-366. It is not uncommon to
have clocks jump forward in virtualized environments.

It is desirable to modify ZooKeeper to handle this situation (as much as
possible) internally. It would need to be done for both client-server
connections and server-server connections. One obvious solution is to
retry a few times (send a ping) after getting a timeout. Another way is to
count the number of pings that have been sent after receiving the timeout.
If the number of pings does not match the expected number (say, 5 ping
attempts should have finished for a 5 sec timeout), then wait until all the
pings are finished. In effect, do not rely completely on the clock. Any
comments?
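
A related but different technique (not the ping counting described above) is
to measure timeouts against a clock that does not follow wall-clock
adjustments. In Java, System.nanoTime() is documented as unrelated to
wall-clock time, unlike System.currentTimeMillis(). The sketch below is an
illustration only: MonotonicDeadline is a hypothetical class, and whether
nanoTime stays well behaved under a given hypervisor is an assumption that
would need testing.

```java
import java.util.concurrent.TimeUnit;

final class MonotonicDeadline {
    private final long startNanos;
    private final long timeoutNanos;

    /** Starts a deadline now, using the JVM's elapsed-time clock. */
    MonotonicDeadline(long timeoutMs) {
        this(timeoutMs, System.nanoTime());
    }

    /** Same as above with an explicit start reading, which makes testing easy. */
    MonotonicDeadline(long timeoutMs, long startNanos) {
        this.startNanos = startNanos;
        this.timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    }

    /** True once at least the timeout has elapsed between start and nowNanos. */
    boolean expiredAt(long nowNanos) {
        // Compare differences, not absolute values, so numeric wrap-around is harmless
        return nowNanos - startNanos >= timeoutNanos;
    }

    boolean expired() {
        return expiredAt(System.nanoTime());
    }

    public static void main(String[] args) throws InterruptedException {
        MonotonicDeadline deadline = new MonotonicDeadline(200);
        System.out.println("expired immediately: " + deadline.expired());
        Thread.sleep(300);
        System.out.println("expired after sleeping: " + deadline.expired());
    }
}
```

Changing the system date while such a deadline is pending should not make it
fire early, since the wall clock is never consulted.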

-Vishal

On Thu, Aug 19, 2010 at 3:52 AM, Qing Yan qing...@gmail.com wrote:

 Oh.. our servers are also running in a virtualized environment.

 On Thu, Aug 19, 2010 at 2:58 PM, Martin Waite waite@gmail.com wrote:

  Hi,
 
  I have tripped over similar problems testing Red Hat Cluster in
  virtualised environments.  I don't know whether recent linux kernels have
  improved their interaction with VMWare, but in our environments clock
  drift caused by lost ticks can be substantial, requiring NTP to sometimes
  jump the clock rather than control acceleration.  In one of our internal
  production rigs, the local NTP servers themselves were virtualised -
  causing absolute mayhem when heavy loads hit the other guests on the same
  physical hosts.

  The effect on RHCS (v2.0) is quite dramatic.  A forward jump in time by 10
  seconds always causes a member to prematurely time-out on a network read,
  causing the member to drop out and trigger a cluster reconfiguration.
  Apparently NTP is integrated with RHCS version 3, but I don't know what is
  meant by that.

  I guess this post is not entirely relevant to ZK, but I am just making the
  point that virtualisation (of NTP servers and/or clients) can cause
  repeated premature timeouts.  On Linux, I believe that there is a class of
  timers provided that is immune to this, but I doubt that there is a
  platform independent way of coping with this.
 
  My 2p.
 
  regards,
  Martin
 
  On 18 August 2010 16:53, Patrick Hunt ph...@apache.org wrote:
 
   Do you expect the time to be wrong frequently? If ntp is running it
   should never get out of sync more than a small amount. As long as this is
   less than ~your timeout you should be fine.
  
   Patrick
  
  
   On 08/18/2010 01:04 AM, Qing Yan wrote:
  
   Hi,
  
   The testcase is fairly simple. We have a client which connects to ZK,
   registers an ephemeral node and watches on it. Now change the client
   machine's time - session killed..

   Here is the log:

   2010-08-18 04:24:57,782 INFO com.taobao.timetunnel2.cluster.service.AgentService: Host name kgbtest1.corp.alimama.com
   2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.2.2-888565, built on 12/08/2009 21:51 GMT
   2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client environment:host.name=kgbtest1.corp.alimama.com
   2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.6.0_13
   2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
   2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_13/jre
   2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.class.path=/home/admin/TimeTunnel2/cluster/bin/../conf/agent/:/home/admin/TimeTunnel2/cluster/bin/../lib/slf4j-log4j12-1.5.2.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/slf4j-api-1.5.2.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/timetunnel2-cluster-0.0.1-SNAPSHOT.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/zookeeper-3.2.2.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/log4j-1.2.14.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/gson-1.4.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/zk-recipes.jar
   2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/jdk1.6.0_13/jre/lib/amd64/server:/usr/java/jdk1.6.0_13/jre/lib/amd64:/usr/java/jdk1.6.0_13/jre/../lib/amd64:/usr/java/packages/lib/amd64:/lib:/usr/lib
   2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
   2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=NA
   2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
   2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
   2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=2.6.18-164.el5
   2010-08-18 04:24:57,789 INFO

Re: Session expiration caused by time change

2010-08-19 Thread Vishal K
Hi Ted,

I haven't given it serious thought yet, but I don't think it is necessary
for the cluster to keep track of time.

A node can make its own decision. For the sake of argument, let's say that we
have a client and a server with the following policy:
1. The client is supposed to send a ping to the server every 1 sec.
2. If the server does not hear from the client for 5 seconds, then the server
declares that the client is dead.
3. Similarly, if the client cannot communicate with the server for 5 seconds,
the client declares that the server is dead.

If the client receives a timeout (say while doing some IO) because of a time
jump, it should check the number of pings to the server that have failed.
If the number is 5, then this is a true failure. If the number is less than
5, then this is because of time drift.

At the server side, the server can attempt to reconnect (or send a ping to
the client) after it receives a timeout. Thus, if the timeout occurred
because of time drift, the server will reconnect and continue. We should
of course have an upper bound on the number of retries, etc.
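
The client-side rule above can be written down as a small decision function.
This is a sketch of the idea only, not ZooKeeper code: TimeoutClassifier,
Verdict and the parameter names are hypothetical, and the 1 sec / 5 sec
values are the ones from the example policy.

```java
final class TimeoutClassifier {
    enum Verdict { PEER_DEAD, CLOCK_JUMP_SUSPECTED }

    /** Number of pings that should have been attempted within one timeout period. */
    static int expectedPings(long timeoutMs, long pingIntervalMs) {
        if (pingIntervalMs <= 0) {
            throw new IllegalArgumentException("pingIntervalMs must be positive");
        }
        return (int) (timeoutMs / pingIntervalMs);
    }

    /**
     * Called after a timeout fires. If all expected pings really failed, the
     * peer is treated as dead; otherwise the timeout is suspected to come from
     * the local clock moving, and the caller should keep pinging.
     */
    static Verdict classify(int failedPings, long timeoutMs, long pingIntervalMs) {
        return failedPings >= expectedPings(timeoutMs, pingIntervalMs)
                ? Verdict.PEER_DEAD
                : Verdict.CLOCK_JUMP_SUSPECTED;
    }

    public static void main(String[] args) {
        // Policy from the example: ping every 1 sec, 5 sec timeout
        System.out.println(classify(5, 5000, 1000));
        System.out.println(classify(2, 5000, 1000));
    }
}
```

With the example policy, five failed pings are classified as a real failure,
while a timeout that arrives after only two failed pings is treated as a
suspected clock problem.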

For ZK, it is important to handle time jumps on the ZK leader.


 I believe that the pattern of these problems is a slow slippage behind and
 a sudden jump forward.



You won't see the slippage. You will mainly see a jump forward. Note that with
a large enough number of nodes, multiple nodes could see their time jumping
forward. Therefore, comparing time between two servers may not help.




Re: Session expiration caused by time change

2010-08-19 Thread Vishal K
Hi Ben,

Comments inline..

On Thu, Aug 19, 2010 at 5:33 PM, Benjamin Reed br...@yahoo-inc.com wrote:

 if we can't rely on the clock, we cannot say things like "if ... for 5
 seconds".


"if ... for 5 seconds" refers to the timeout given by the socket library.
After the timeout we can verify that the timeout received was not a side
effect of a time jump by looking at the number of ping attempts.



 also, clients connect to servers, not vice versa, so we cannot say things
 like "server can attempt to reconnect".


In the scenario described below, wouldn't it be ok for the server to just
send a ping request to see if the client is really dead?


 ben


 On 08/19/2010 10:17 AM, Vishal K wrote:

 Hi Ted,

 I haven't give it a serious thought yet, but I don't think it is
 neccessary
 for the cluster to keep track of time.

 A node can make its own decision. For the sake of argument, lets say that
 we
 have a client and a server with following policy:
 1. Client is supposed to send a ping to server every 1 sec.
 2. If server does not hear from client for 5 seconds, then the server
 declares that the client is dead.
 3. Similary if the client cannot communicate with the server for 5 seconds
 client declares that the server is dead.

 If the client receives a timeout (say while doing some IO) because of a
 time
 jump, it should check the number of pings that has failed with the server.
 If the number is 5, then this is a true failure, If the number is less
 than
 5, then this is because of a time drift.

 At the server side, the server can attempt to reconnect (or send a ping to
 the client) after it receives a timeout. Thus, if the timeout occured
 because of time drift, the server will reconnect and continue. We should
 ofcourse have an upper bound in number of retries, etc.

 For ZK, it is important to handle time jumps on ZK leader.



 I believe that the pattern of these problems is a slow slippage behind
 and
 a
 sudden jump forward.




 You won't see the slippage. You will mainly see a jump forward. Note with
 large enough number of nodes, multiple nodes could see their time jumping
 forward. Therefore, checking comparing time between two servers may not
 help.




 On Thu, Aug 19, 2010 at 7:51 AM, Vishal Kvishalm...@gmail.com  wrote:



 Hi,

 I remember Ben had opened a jira for clock jumps earlier:
 https://issues.apache.org/jira/browse/ZOOKEEPER-366. It is not uncommon


 to


 have clocks jump forward in virtualized environments.

 It is desirable to modify ZooKeeper to handle this situation (as much as
 possible) internally. It would need to be done for both client-server
 connections and server-server connections. One obvious solution is to
 retry a few times (send a ping) after getting a timeout. Another way is,
 after receiving the timeout, to count the number of pings that have been sent.
 If the number of pings does not match the expected number (say, 5 ping attempts
 should be finished for a 5 sec timeout), then wait until all the pings are
 finished. In effect, do not rely completely on the clock. Any comments?

 -Vishal

 On Thu, Aug 19, 2010 at 3:52 AM, Qing Yan qing...@gmail.com wrote:



 Oh.. our servers are also running in a virtualized environment.

 On Thu, Aug 19, 2010 at 2:58 PM, Martin Waite waite@gmail.com wrote:




 Hi,

 I have tripped over similar problems testing Red Hat Cluster in virtualised
 environments.  I don't know whether recent Linux kernels have improved their
 interaction with VMware, but in our environments clock drift caused by lost
 ticks can be substantial, requiring NTP to sometimes jump the clock rather
 than control acceleration.  In one of our internal production rigs, the
 local NTP servers themselves were virtualised, causing absolute mayhem when
 heavy loads hit the other guests on the same physical hosts.

 The effect on RHCS (v2.0) is quite dramatic.  A forward jump in time by 10
 seconds always causes a member to prematurely time out on a network read,
 causing the member to drop out and trigger a cluster reconfiguration.
 Apparently NTP is integrated with RHCS version 3, but I don't know what is
 meant by that.

 I guess this post is not entirely relevant to ZK, but I am just making the
 point that virtualisation (of NTP servers and/or clients) can cause repeated
 premature timeouts.  On Linux, I believe that there is a class of timers
 provided that is immune to this, but I doubt that there is a platform
 independent way of coping with this.
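As an illustration of a timer that does not follow wall-clock adjustments, the sketch below measures a timeout with Python's `time.monotonic()`, which is documented as unaffected by system clock updates. Whether this corresponds to the class of Linux timers meant above is an assumption, and the `Deadline` class is a hypothetical name, not an existing API. The sketch says nothing about how a given hypervisor treats lost ticks.

```python
import time


class Deadline:
    """Timeout measured on a monotonic clock instead of wall-clock time."""

    def __init__(self, timeout_seconds: float, clock=time.monotonic):
        # The clock is injectable so the behaviour can be exercised in tests.
        self._clock = clock
        self._expires_at = clock() + timeout_seconds

    def expired(self) -> bool:
        # An NTP step of the wall clock does not change the monotonic reading,
        # so a forward jump in system time does not trigger an early expiry.
        return self._clock() >= self._expires_at


deadline = Deadline(5.0)
print(deadline.expired())
```

A network read guarded by such a deadline would only time out once five seconds have elapsed on the monotonic clock, regardless of any step applied to the system time in between.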

 My 2p.

 regards,
 Martin

 On 18 August 2010 16:53, Patrick Hunt ph...@apache.org wrote:



 Do you expect the time to be wrong frequently? If ntp is running it
 should never get out of sync more than a small amount. As long as this is
 less than ~your timeout you should be fine.

 Patrick


 On 08/18/2010 01:04 AM, Qing Yan wrote:



 Hi,

 The testcase is fairly simple. We have a client which connects to ZK

Understanding ZooKeeper data file management and LogFormatter

2010-09-08 Thread Vishal K
Hi All,

Can you please share your experience regarding ZK snapshot retention and
recovery policies?

We have an application where we never need to roll back (i.e., revert to
a previous state by using old snapshots). Given this, I am trying to
understand under what circumstances we would ever need to use old ZK
snapshots. I understand a lot of these decisions depend on the application
and the amount of redundancy used at every level (e.g., the RAID level where the
snapshots are stored) in the product. To simplify the discussion, I
would like to rule out any application characteristics and focus mainly on
data consistency.

- Assuming that we have a 3 node cluster, I am trying to figure out when
I would really need to use old snapshot files. With 3 nodes we already have
at least 2 servers with a consistent database. If I lose files on one of the
servers, I can use files from the other. In fact, ZK server join will take
care of this. I can remove files from a faulty node and reboot that node.
The faulty node will sync with the leader.

- The old files will be useful if the current snapshot and/or log files are
lost or corrupted on all 3 servers. If the loss is due to a disaster (a case
where we lose all 3 servers), one would have to keep the snapshots on some
external storage to recover. However, if the current snapshot file is
corrupted on all 3 servers, then the most likely cause would be a bug in ZK.
In that case, how can I trust the consistency of the old snapshots?

- Given a set of snapshots and log files, how can I verify the correctness
of these files? For example, what if one of the intermediate snapshot files is
corrupt?

- The Admin's guide says: "Using older log and snapshot files, you can look
at the previous state of ZooKeeper servers and even restore that state. The
LogFormatter class allows an administrator to look at the transactions in a
log." Is there a tool that does this for the admin? The LogFormatter
only displays the transactions in the log file.

- Has anyone ever had to play with the snapshot files in production?

Thanks in advance.

Regards,
-Vishal


Re: znode inconsistencies across ZooKeeper servers

2010-10-06 Thread Vishal K
Hi Patrick,

You are correct, the test restarts both the ZooKeeper server and the client. The
client opens a new connection after restarting. So we would expect the
ephemeral znode (/foo) to expire after the session timeout. However, the
client with the new session creates the ephemeral znode (/foo) again after
it reboots (it sets a watch for /foo and recreates /foo if it is deleted or
doesn't exist). The client is not reusing the session ID. What I expect to
see is that the older /foo should expire, after which a new /foo should get
created. Is my expectation correct?

What confuses me is the following output of 3 successive getstat /foo
requests on A (the zxid, time and owner fields).  Notice that the older
znode reappeared.
At the same time when I do getstat at B and C, I see the newer /foo.

log4j:WARN No appenders could be found for logger
(org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
cZxid = 0x105ef
ctime = Tue Oct 05 15:00:50 UTC 2010
mZxid = 0x105ef
mtime = Tue Oct 05 15:00:50 UTC 2010
pZxid = 0x105ef
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x2b7ce57ce4
dataLength = 54
numChildren = 0

log4j:WARN No appenders could be found for logger
(org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
cZxid = 0x10607
ctime = Tue Oct 05 15:01:07 UTC 2010
mZxid = 0x10607
mtime = Tue Oct 05 15:01:07 UTC 2010
pZxid = 0x10607
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x2b7ce5bda4
dataLength = 54
numChildren = 0

log4j:WARN No appenders could be found for logger
(org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
cZxid = 0x105ef
ctime = Tue Oct 05 15:00:50 UTC 2010
mZxid = 0x105ef
mtime = Tue Oct 05 15:00:50 UTC 2010
pZxid = 0x105ef
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x2b7ce57ce4
dataLength = 54
numChildren = 0
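One way to compare such outputs mechanically is sketched below. The parser and the check are hypothetical helpers written for illustration, not part of zkCli.sh; the sample values are the cZxid and ephemeralOwner fields from the three outputs above.

```python
def parse_stat(text: str) -> dict:
    """Parse 'key = value' lines from stat output, ignoring all other lines."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:  # lines without ' = ' (e.g. log4j warnings) are skipped
            fields[key.strip()] = value.strip()
    return fields


first = parse_stat("cZxid = 0x105ef\nephemeralOwner = 0x2b7ce57ce4\n")
second = parse_stat("cZxid = 0x10607\nephemeralOwner = 0x2b7ce5bda4\n")
third = parse_stat("cZxid = 0x105ef\nephemeralOwner = 0x2b7ce57ce4\n")

# Flag a read whose creation zxid is lower than one already observed.
regressed = int(third["cZxid"], 16) < int(second["cZxid"], 16)
same_as_first = third == first
print(regressed, same_as_first)
```

For these values both checks come out true: the third read reports a lower creation zxid than the second and is identical to the first, which is the reappearance described above.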

Thanks for your help.

-Vishal

On Wed, Oct 6, 2010 at 4:45 PM, Patrick Hunt ph...@apache.org wrote:

 Vishal the attachment seems to be getting removed by the list daemon (I
 don't have it), can you create a JIRA and attach? Also this is a good
 question for the ppl on zookeeper-user. (ccing)

 You are aware that ephemeral znodes are tied to the session? And that
 sessions only expire after the session timeout period? At which time any
 znodes created during that session are then deleted. The fact that you are
 killing your client process leads me to believe that you are not closing
 the session cleanly (meaning that it will eventually expire after the
 session timeout period), in which case the ephemeral znodes _should_
 reappear when A is restarted and successfully rejoins the cluster. (at least
 until the session timeout is exceeded)

 Patrick

 On Tue, Oct 5, 2010 at 11:04 AM, Vishal K vishalm...@gmail.com wrote:

  Hi,
 
  I have a 3 node ZK cluster (A, B, C). On one of the nodes (node A), I
  have a ZK client running that connects to the local server and creates an
  ephemeral znode to indicate to clients on other nodes that it is online.
 
  I have a test script that reboots the ZooKeeper server as well as the client on
  A. The test does a getstat on the ephemeral znode created by the client on
  A. I am seeing that the view of znodes on A is different from the other 2
  nodes. I can tell this from the session ID that the client gets after
  reconnecting to the local ZK server.
 
  So the test is simple:
  - kill zookeeper server and client process
  - wait for a few seconds
  - do zkCli.sh stat ...  test.out
 
  What I am seeing is that the ephemeral znode with old zxid, time, and
  session ID is reappearing on node A. I have attached the output of 3
  consecutive getstat requests of the test (see client_getstat.out). Notice
  that the third output is the same as the first one. That is, the old
  ephemeral znode reappeared at A. However, both B and C are showing the
  latest znode with correct time, zxid and session ID (output not attached).
 
  After this point, all following getstat requests on A are showing the old
  znode. B and C, however, show the correct znode every time the client on A
  comes online. This is something very perplexing. Earlier I thought this was
  a bug in my client implementation. But the test shows that the ZK server on
  A after reboot is out of sync with the rest of the servers.
 
  The stat command to each server shows that the servers are in sync as far
  as zxids are concerned (see stat.out). So there is something wrong with A's
  local database that is causing this problem.
 
  Has anyone seen this before? I will be doing more debugging in the next few
  days. Comments/suggestions for further debugging are welcome.
 
  -Vishal
 
 
 



Reading znodes directly from snapshot and log files

2010-10-21 Thread Vishal K
Hi,

Is it possible to read znodes directly from snapshot and log files instead
of using the ZooKeeper API? In case a ZK ensemble is not available, can I log in
to all available nodes and run a utility that will dump all znodes?

Thanks.
-Vishal