Branch for reconfiguration (ZOOKEEPER-107)?
This message is to throw out the idea of creating a branch for the reconfiguration work and to get a sense of what people think, especially the ones working closely on it, like Alex.

The rationale for proposing it is the following. In our experience with Zab last year, we implemented modifications that introduced a bunch of critical bugs. I'm not sure whether we could have been more careful, maybe so, but my perception is that it is not unusual to introduce such bugs once one touches a large chunk of core code. Perhaps it would have been better to develop it on the side and test more before merging. For reconfiguration, it sounds like we will be touching a big chunk of core code again, so I wonder if it makes sense to work on a separate branch, test, and merge once we are convinced.

I must also say that I understand that merging branches can be quite a pain, so I would completely understand if the general feeling is that the pain outweighs the benefits.

Thanks,
-Flavio
[jira] [Updated] (ZOOKEEPER-1367) Data inconsistencies and unexpired ephemeral nodes after cluster restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-1367:
    Priority: Blocker  (was: Major)
    Fix Version/s: 3.4.3

Data inconsistencies and unexpired ephemeral nodes after cluster restart
    Key: ZOOKEEPER-1367
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1367
    Project: ZooKeeper
    Issue Type: Bug
    Components: server
    Affects Versions: 3.4.2
    Environment: Debian Squeeze, 64-bit
    Reporter: Jeremy Stribling
    Priority: Blocker
    Fix For: 3.4.3
    Attachments: ZOOKEEPER-1367.tgz

In one of our tests, we have a cluster of three ZooKeeper servers. We kill all three, and then restart just two of them. Sometimes we notice that on one of the restarted servers, ephemeral nodes from previous sessions do not get deleted, while on the other server they do. We are effectively running 3.4.2, though technically we are running 3.4.1 with the patch manually applied for ZOOKEEPER-1333 and a C client for 3.4.1 with the patches for ZOOKEEPER-1163.

I noticed that when I connected using zkCli.sh to the first node (90.0.0.221, zkid 84), I saw only one znode in a particular path:

{quote}
[zk: 90.0.0.221:2888(CONNECTED) 0] ls /election/zkrsm
[nominee11]
[zk: 90.0.0.221:2888(CONNECTED) 1] get /election/zkrsm/nominee11
90.0.0.222:
cZxid = 0x40027
ctime = Thu Jan 19 08:18:24 UTC 2012
mZxid = 0x40027
mtime = Thu Jan 19 08:18:24 UTC 2012
pZxid = 0x40027
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0xa234f4f3bc220001
dataLength = 16
numChildren = 0
{quote}

However, when I connected zkCli.sh to the second server (90.0.0.222, zkid 251), I saw three znodes under that same path:

{quote}
[zk: 90.0.0.222:2888(CONNECTED) 2] ls /election/zkrsm
nominee06 nominee10 nominee11
[zk: 90.0.0.222:2888(CONNECTED) 2] get /election/zkrsm/nominee11
90.0.0.222:
cZxid = 0x40027
ctime = Thu Jan 19 08:18:24 UTC 2012
mZxid = 0x40027
mtime = Thu Jan 19 08:18:24 UTC 2012
pZxid = 0x40027
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0xa234f4f3bc220001
dataLength = 16
numChildren = 0
[zk: 90.0.0.222:2888(CONNECTED) 3] get /election/zkrsm/nominee10
90.0.0.221:
cZxid = 0x3014c
ctime = Thu Jan 19 07:53:42 UTC 2012
mZxid = 0x3014c
mtime = Thu Jan 19 07:53:42 UTC 2012
pZxid = 0x3014c
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0xa234f4f3bc22
dataLength = 16
numChildren = 0
[zk: 90.0.0.222:2888(CONNECTED) 4] get /election/zkrsm/nominee06
90.0.0.223:
cZxid = 0x20cab
ctime = Thu Jan 19 08:00:30 UTC 2012
mZxid = 0x20cab
mtime = Thu Jan 19 08:00:30 UTC 2012
pZxid = 0x20cab
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x5434f5074e040002
dataLength = 16
numChildren = 0
{quote}

These never went away for the lifetime of the server, for any clients connected directly to that server. Note that this cluster is configured to have all three servers still, the third one being down (90.0.0.223, zkid 162).

I captured the data/snapshot directories for the two live servers. When I start single-node servers using each directory, I can briefly see that the inconsistent data is present in those logs, though the ephemeral nodes seem to get (correctly) cleaned up pretty soon after I start the server.

I will upload a tar containing the debug logs and data directories from the failure. I think we can reproduce it regularly if you need more info.
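[Editorial note, not part of the report: for anyone who wants to script the comparison described above instead of eyeballing zkCli.sh output, a minimal sketch against the ZooKeeper Java client API might look like the following. The connect strings and the path /election/zkrsm are taken from the session above; the class name, session timeout, and output format are illustrative. On a consistent ensemble both servers should report the same children with the same ephemeralOwner; in the failure described above, 90.0.0.222 would additionally list nominee06 and nominee10.]

{code}
import java.util.List;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Connects to each server individually and prints what it returns for the
// same path, so children and ephemeralOwner values can be compared.
// A robust version would wait for the SyncConnected event before issuing
// requests; this sketch just issues them directly.
public class CompareServers {
    public static void main(String[] args) throws Exception {
        final String path = "/election/zkrsm";
        Watcher noop = new Watcher() {
            public void process(WatchedEvent event) { }
        };
        for (String server : new String[] { "90.0.0.221:2888", "90.0.0.222:2888" }) {
            ZooKeeper zk = new ZooKeeper(server, 30000, noop);
            try {
                List<String> children = zk.getChildren(path, false);
                System.out.println(server + " -> " + children);
                for (String child : children) {
                    Stat stat = zk.exists(path + "/" + child, false);
                    if (stat != null) {
                        System.out.printf("  %s ephemeralOwner=0x%x czxid=0x%x%n",
                                child, stat.getEphemeralOwner(), stat.getCzxid());
                    }
                }
            } finally {
                zk.close();
            }
        }
    }
}
{code}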
[jira] [Commented] (ZOOKEEPER-1367) Data inconsistencies and unexpired ephemeral nodes after cluster restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190066#comment-13190066 ]

Camille Fournier commented on ZOOKEEPER-1367:

I'll take a look this weekend unless someone's on it now.
[jira] [Commented] (ZOOKEEPER-1367) Data inconsistencies and unexpired ephemeral nodes after cluster restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190091#comment-13190091 ]

Ted Dunning commented on ZOOKEEPER-1367:

This sort of issue is usually a configuration bug. Can you post your configs as well?
[jira] [Commented] (ZOOKEEPER-1367) Data inconsistencies and unexpired ephemeral nodes after cluster restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190131#comment-13190131 ]

Jeremy Stribling commented on ZOOKEEPER-1367:

I'd like to avoid getting into the exact algorithmic details of our code, but the nodes share their local information with a central component via RPC, which then updates each server with the full list before any of them even start ZK in the first place. We print this info out in another log, and I've verified that it's the same on the two live nodes before ZK starts. That configuration info is:

{quote}
90.0.0.223  client_port: 2888  server_port: 2878  election_port: 3888  server_id: 162  uuid: 69dfc0f4-d2b7-4ee4-9dee-cf2bde2e386f
90.0.0.221  client_port: 2888  server_port: 2878  election_port: 3888  server_id: 84   uuid: 2df0e0f4-612b-43b9-8871-52f601468577
90.0.0.222  client_port: 2888  server_port: 2878  election_port: 3888  server_id: 251  uuid: 0e24e45e-7939-46b5-85b8-63919fab03b8
{quote}

Furthermore, we were running this exact same way with 3.3.3 for more than a year, and this particular test has always passed. So unless we were doing something bad before that 3.4 now enforces, or some new requirements crept into 3.4, I'm pretty confident in saying that our setup is consistent across both live nodes.
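[Editorial note, not part of Jeremy's message: the actual zoo.cfg files have not been posted, but the per-server values above would normally map onto zoo.cfg roughly as sketched below on each node, with server_id going into the myid file. This is an illustrative reconstruction from the listed values only; tickTime, initLimit, syncLimit, and dataDir are placeholders, not the real settings.]

{quote}
# illustrative zoo.cfg reconstructed from the values above
# timing values and dataDir are placeholders
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2888
# server.<server_id>=<host>:<server_port>:<election_port>
server.84=90.0.0.221:2878:3888
server.162=90.0.0.223:2878:3888
server.251=90.0.0.222:2878:3888
{quote}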