Hi,

> - Do you think we could go without that stupid 10ms wait in the lock
> recipe, and just rely on ZooKeeper that exists() with a watch will
> return in a timely fashion? Is there little or no possibility that one
> session would get stuck/deadlocked?

Since you are using EPHEMERAL_SEQUENTIAL nodes, the lowest sequence number always gets the lock. So rather than polling with the 10ms wait, you can have a watcher implementation that watches only your immediate predecessor's lock znode for deletion. Also, on reconnection (DISCONNECTED followed by SYNCCONNECTED) you can call getChildren() again and check whether your predecessor still exists.
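Here is a rough, untested sketch of that approach with the C client (zookeeper_mt), assuming zh is already connected and /SESSIONID/_xlock already exists. acquire_lock and predecessor_watcher are just illustrative names, error handling is minimal, and the blocking/wakeup between the watcher and the caller (a condition variable plus your 1 second timeout) is only hinted at in the comments:

#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <zookeeper/zookeeper.h>

/* Fired when the predecessor lock znode goes away. */
static void predecessor_watcher(zhandle_t *zh, int type, int state,
                                const char *path, void *ctx)
{
    if (type == ZOO_DELETED_EVENT) {
        /* wake the thread blocked in acquire_lock(); real code would pass
           a condition variable (or similar) through the watcher context */
    }
}

static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char * const *)a, *(const char * const *)b);
}

/* Returns ZOK once we hold the lock (we own the lowest sequence number).
   On error paths a real implementation should also delete my_path. */
int acquire_lock(zhandle_t *zh, const char *lock_dir)
{
    char prefix[512], my_path[512];
    snprintf(prefix, sizeof(prefix), "%s/lock-", lock_dir);

    int rc = zoo_create(zh, prefix, "", 0, &ZOO_OPEN_ACL_UNSAFE,
                        ZOO_EPHEMERAL | ZOO_SEQUENCE,
                        my_path, sizeof(my_path));
    if (rc != ZOK)
        return rc;
    const char *my_name = strrchr(my_path, '/') + 1;

    for (;;) {
        struct String_vector children;
        rc = zoo_get_children(zh, lock_dir, 0, &children);
        if (rc != ZOK)
            return rc;
        qsort(children.data, children.count, sizeof(char *), cmp_str);

        int me = -1;
        for (int i = 0; i < children.count; i++)
            if (strcmp(children.data[i], my_name) == 0) { me = i; break; }
        if (me < 0) {                        /* should not happen */
            deallocate_String_vector(&children);
            return ZSYSTEMERROR;
        }
        if (me == 0) {                       /* lowest sequence: lock is ours */
            deallocate_String_vector(&children);
            return ZOK;
        }

        /* Watch only the immediate predecessor, not the whole directory. */
        char pred[512];
        snprintf(pred, sizeof(pred), "%s/%s", lock_dir, children.data[me - 1]);
        deallocate_String_vector(&children);

        struct Stat stat;
        rc = zoo_wexists(zh, pred, predecessor_watcher, NULL, &stat);
        if (rc == ZOK) {
            /* block here (condition variable + your 1s timeout) until
               predecessor_watcher fires, then loop and re-check */
        } else if (rc != ZNONODE) {
            return rc;                       /* real error */
        }
        /* ZNONODE: the predecessor vanished between getChildren() and
           exists(); just loop and re-evaluate */
    }
}

The point is that each waiter watches exactly one znode (its immediate predecessor), so releasing a lock wakes only the next client in line instead of everyone polling the directory.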
> - If clients requesting a lock in /SESSIONID/_xlock/lock- connect to
> different servers in the quorum, is there a need to perform a sync before
> checking if we're the node with the minimal sequence number?

Not required. Since you are using EPHEMERAL_SEQUENTIAL, all writes go through the leader, so if my client's lock is "lock-000010" the leader must have created "lock-000009" before mine, and getChildren() will return [lock-000009, lock-000010].

> - Would adding more servers to the quorum improve performance?
> Would performance be more constant that way?

Adding more voting servers increases write latency and hurts write performance, because every write has to be acknowledged by a larger quorum. If you want to improve read performance you can add Observers instead. Please refer to:
http://zookeeper.apache.org/doc/trunk/zookeeperObservers.html
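For example (the hostnames are made up; per the Observers guide, the observer needs peerType=observer in its own config, and every server's config lists it with the :observer suffix):

# zoo.cfg on the new observer only
peerType=observer

# server list in every zoo.cfg (voters and observer alike)
server.1=zk_server_1:2888:3888
server.2=zk_server_2:2888:3888
server.3=zk_server_3:2888:3888
server.4=zk_observer_1:2888:3888:observer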
> - Are we performing too many writes to ZooKeeper?
> - If we have 300k child nodes (e.g. 300k /SESSIONID nodes), would that be a
> performance issue?

If there are many concurrent writes per second, there can be bottlenecks at the disk or network level, which will affect performance. You should benchmark in your own environment.

From the above description, what I understood is that you are creating and deleting znodes very frequently. In that case there will be a lot of snapshots and transaction logs in dataDir and dataLogDir respectively. Please ensure the purge task is enabled and purge them periodically.

> - When we make the lock path, we use a recursive create, which basically
> does a create() for each node, and if ZNODEEXISTS is returned, we simply
> go on. I see a *lot* of these messages in the log:
>
> [ProcessThread:-1:PrepRequestProcessor@419] - Got user-level KeeperException when processing sessionid:0x34a0cbf41360018 type:create cxid:0x548a2589 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/SESSIONID/_xlock Error:KeeperErrorCode = NodeExists for /SESSIONID/_xlock
>
> Would it be better if exists() were called for each node before creating it
> if needed? And do we need to sync() before checking with exists()?

An exists() call first will reduce the number of these exceptions, but I feel they cannot be avoided entirely, because you have many concurrent writes per second: sometimes exists() will not yet see a /SESSIONID/_xlock znode that is being created concurrently, and the create() will still return NodeExists.
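As a small, untested illustration with the C client: check first, but still tolerate the race by treating NodeExists as success (ensure_node is just an illustrative helper name):

#include <zookeeper/zookeeper.h>

int ensure_node(zhandle_t *zh, const char *path)
{
    struct Stat stat;
    int rc = zoo_exists(zh, path, 0, &stat);
    if (rc == ZOK)
        return ZOK;                      /* already there, skip the create */
    if (rc != ZNONODE)
        return rc;                       /* connection loss etc. */

    rc = zoo_create(zh, path, "", 0, &ZOO_OPEN_ACL_UNSAFE,
                    0, NULL, 0);         /* plain persistent node */
    if (rc == ZNODEEXISTS)
        return ZOK;                      /* someone else created it first */
    return rc;
}

/* e.g. before locking:
       ensure_node(zh, "/SESSIONID");
       ensure_node(zh, "/SESSIONID/_xlock");  */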
> - Would upgrading to a newer version improve performance?

3.3.5 is quite old; please move to the stable 3.4.* release.

Cheers,
Rakesh

-----Original Message-----
From: Dejan Markic [mailto:[email protected]]
Sent: 03 December 2014 20:32
To: [email protected]
Subject: FW: A few ZooKeeper questions

I'm sending this again, as I don't know if I sent it to the correct e-mail.

Kind regards,
Dejan

-----Original Message-----
From: Dejan Markic
Sent: Tuesday, December 2, 2014 11:23 PM
To: [email protected]
Subject: A few ZooKeeper questions

Hello all!

Sorry for the long message, but here goes ...

We started using ZooKeeper a few weeks ago, mostly for distributed locking. We have some performance issues, so I'll try to explain as much as possible about our configuration and applications.

For each "service" for which we need ZooKeeper, we currently run a quorum of 3 servers. We are running version 3.3.5 on Debian 7.7, using the MT C API with the non-async functions. All servers have the following configuration:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk_server_1:2888:3888
server.2=zk_server_2:2888:3888
server.3=zk_server_3:2888:3888
maxClientCnxns=50
skipACL=yes

On all servers, the /var/lib/zookeeper directory is actually a 2GB tmpfs "partition". We run the zkPurgeTxnLog.sh script every 2 minutes. Each server has 6 CPUs and 6GB of RAM, but there is usually at least one more service running on it (e.g. Radius).

Let me try to explain how we use ZooKeeper. We have Radius server(s) receiving around 300 requests per second (authentication and accounting). We need to maintain the state of every session (start, stop, update), and sessions come and go very quickly. The NAS(es) can send many requests for the same session in unordered fashion (e.g. an auth request arrives before the accounting Start, a Stop arrives before an Update, etc.). So for each request we take a lock in ZooKeeper. We use a slightly modified version of the recipe found at https://zookeeper.apache.org/doc/r3.1.2/recipes.html. The recipe we're currently using is far from perfect, as we tried to implement a timeout for locking. Basically what we do is:

- create an ephemeral sequential node /SESSIONID/_xlock/lock-
- get the children of /SESSIONID/_xlock/
- if we have the lowest sequence number, we hold the lock
- if not, we wait 10ms and call get children again, until we get the lock or the timeout expires (we usually wait 1 second)

After we get the lock, we write/read some data relevant to the session in /SESSIONID/nodeName. The data is usually at most 20 bytes, and we write/read up to 3 nodes per session. We then insert/update the session in our MySQL servers and release the lock in ZooKeeper. It usually takes around 30-50ms to obtain the lock from ZooKeeper, but it does happen that the 1 second timeout expires. It seems that sometimes ZooKeeper is busy doing something and response times go way up.

Since a lot of /SESSIONID nodes are left behind after sessions finish, we created a script that removes all sessions last modified more than 900 seconds ago. The script goes through all /SESSIONID nodes and checks the last modification time of their children; if none of the children changed in the last 900 seconds, it removes the whole SESSIONID node and its children.

Questions:

- Do you think we could go without that stupid 10ms wait in the lock recipe, and just rely on ZooKeeper that exists() with a watch will return in a timely fashion? Is there little or no possibility that one session would get stuck/deadlocked?
- If clients requesting a lock in /SESSIONID/_xlock/lock- connect to different servers in the quorum, is there a need to perform a sync before checking if we're the node with the minimal sequence number?
- Would adding more servers to the quorum improve performance? Would performance be more constant that way?
- Are we performing too many writes to ZooKeeper?
- If we have 300k child nodes (e.g. 300k /SESSIONID nodes), would that be a performance issue?
- When we make the lock path, we use a recursive create, which basically does a create() for each node, and if ZNODEEXISTS is returned, we simply go on. I see a *lot* of these messages in the log:

[ProcessThread:-1:PrepRequestProcessor@419] - Got user-level KeeperException when processing sessionid:0x34a0cbf41360018 type:create cxid:0x548a2589 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/SESSIONID/_xlock Error:KeeperErrorCode = NodeExists for /SESSIONID/_xlock

Would it be better if exists() were called for each node before creating it if needed? And do we need to sync() before checking with exists()?

- Would upgrading to a newer version improve performance?

Any input would be greatly appreciated!

Kind regards,
Dejan Markic
