Re: Too many KeeperErrorCode = Session moved messages

2010-08-08 Thread Patrick Hunt
I suspect this is a bug with the sync call and session moved (the code 
path for sync is a bit special). Please enter a JIRA for this. Thanks.


Patrick

On 08/05/2010 01:20 PM, Vishal K wrote:

Hi All,

I am seeing a lot of these messages in our application. I would like to know
if I am doing something wrong or this is a ZK bug.

Setup:
- Server environment:zookeeper.version=3.3.0-925362
- 3 node cluster
- Each node has few clients that connect to the local server using 127.0.0.1
as the host IP.
- The application first forms a ZK cluster. Once the ZK cluster is formed,
each node establish sessions with local ZK servers. The clients do not know
about remote server so sessions are always with the local server.

As soon as ZK clients connected to their respective follower, the ZK leader
starts spitting the following messages:

2010-07-01 10:55:36,733 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x6 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,748 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x9 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,755 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0xb zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,795 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x10 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,850 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa90001
type:sync: cxid:0x1 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,910 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x1b zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,920 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x20 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:37,019 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x29 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:37,030 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x2c zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:37,035 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x2e zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:37,065 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x33 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:38,840 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa90001
type:sync: cxid:0x4 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
20

These sessions were established on the follower:
2010-07-01 08:59:09,890 - INFO  [CommitProcessor:0:nioserverc...@1431] -
Established session 0x298d3b1fa9 with negotiated timeout 9000 for client
/127.0.0.1:50773
2010-07-01 08:59:09,890 - INFO
[SvaDefaultBLC-SendThread(localhost.localdom:2181):clientcnxn$sendthr...@701]
- Session establishment complete on server localhost.localdom/127.0.0.1:2181,
sessionid = 0x298d3b1fa9, negotiated timeout = 9000


The server is spitting out these messages for every session that it does not
own  (session established by clients with followers). The messages are
always seen for a sync request.
No other issues are seen with the cluster. I am wondering what would be the
cause of this problem? Looking at PrepRequestProcessor, it seems like this
message is printed when the owner of the 

Too many KeeperErrorCode = Session moved messages

2010-08-05 Thread Vishal K
Hi All,

I am seeing a lot of these messages in our application. I would like to know
if I am doing something wrong or this is a ZK bug.

Setup:
- Server environment:zookeeper.version=3.3.0-925362
- 3 node cluster
- Each node has few clients that connect to the local server using 127.0.0.1
as the host IP.
- The application first forms a ZK cluster. Once the ZK cluster is formed,
each node establish sessions with local ZK servers. The clients do not know
about remote server so sessions are always with the local server.

As soon as ZK clients connected to their respective follower, the ZK leader
starts spitting the following messages:

2010-07-01 10:55:36,733 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x6 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,748 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x9 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,755 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0xb zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,795 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x10 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,850 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa90001
type:sync: cxid:0x1 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,910 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x1b zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,920 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x20 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:37,019 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x29 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:37,030 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x2c zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:37,035 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x2e zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:37,065 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x33 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:38,840 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa90001
type:sync: cxid:0x4 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
20

These sessions were established on the follower:
2010-07-01 08:59:09,890 - INFO  [CommitProcessor:0:nioserverc...@1431] -
Established session 0x298d3b1fa9 with negotiated timeout 9000 for client
/127.0.0.1:50773
2010-07-01 08:59:09,890 - INFO
[SvaDefaultBLC-SendThread(localhost.localdom:2181):clientcnxn$sendthr...@701]
- Session establishment complete on server localhost.localdom/127.0.0.1:2181,
sessionid = 0x298d3b1fa9, negotiated timeout = 9000


The server is spitting out these messages for every session that it does not
own  (session established by clients with followers). The messages are
always seen for a sync request.
No other issues are seen with the cluster. I am wondering what would be the
cause of this problem? Looking at PrepRequestProcessor, it seems like this
message is printed when the owner of the request is not same as session
owner. But in our application this should never happen since clients always
connect to its local server.

Any ideas?

Thanks.
-Vishal


Re: Too many KeeperErrorCode = Session moved messages

2010-08-05 Thread Ted Dunning
I can't comment much on this, except that this is a very odd usage pattern.

First, it isn't so unusual, but I find it a particularly bad practice to
embed ZK into your application.  The problem is that you lose a lot of the
virtues of ZK in terms of coordination if ZK goes down with your
application.  In a nutshell, what good is a coordination layer if it isn't
relatively permanent.  For instance, one important use of a coordination
layer is to avoid multiple invocations of an expensive component on a
machine.  You can't do that unless you share a ZK cluster between all
invocations of the component.  Similarly, restarting you application is much
more common than restarting ZK, but by connecting the two of these, you
again lose any ability to make configuration persistent and you lose the
ability to restart one piece of your application without restarting your ZK
at the same time.  This coupling between restarts of very different service
components is a very bad idea.  Better to have simple components that serve
simple ends.  ZK is relatively simple, very stable and does one job well.
 Why mess with that?

Secondly, why in the world are you connecting to the local ZK server?  Why
not to the cluster at large?  By connecting to only a single server you lose
all the benefits of high availability in the ZK layer because the client
can't fail-over to other servers.  Likewise, by using the local loopback
address, you make it much harder to understand your server logs. The amount
of data moved to and from a ZK cluster is typically relatively small so
there is no significant benefit to keeping the traffic local to a single
machine.

Thirdly, I suspect that associated with your somewhat idiosyncratic
architecture is some slightly odd ZK configuration.  Could you post your
configuration files?  Your log files make it sound like the cluster might be
confused about itself.

On Thu, Aug 5, 2010 at 1:20 PM, Vishal K vishalm...@gmail.com wrote:


 I am seeing a lot of these messages in our application. I would like to
 know
 if I am doing something wrong or this is a ZK bug.

 Setup:
 - Server environment:zookeeper.version=3.3.0-925362
 - 3 node cluster
 - Each node has few clients that connect to the local server using
 127.0.0.1
 as the host IP.
 - The application first forms a ZK cluster. Once the ZK cluster is formed,
 each node establish sessions with local ZK servers. The clients do not know
 about remote server so sessions are always with the local server.

 As soon as ZK clients connected to their respective follower, the ZK leader
 starts spitting the following messages: