[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706823#comment-15706823 ] Andrew Grasso commented on ZOOKEEPER-2325: -- This looks good to me. Thanks for putting the pull request together. > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that ZooKeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy ZooKeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2282) chroot not stripped from path in asynchronous callbacks
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Grasso updated ZOOKEEPER-2282: - Attachment: ZOOKEEPER-2282.patch Combine fix and tests > chroot not stripped from path in asynchronous callbacks > --- > > Key: ZOOKEEPER-2282 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2282 > Project: ZooKeeper > Issue Type: Bug > Components: c client >Affects Versions: 3.4.6, 3.5.0 > Environment: CentOS 6.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2282.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > Callbacks passed to [zoo_acreate], [zoo_async], and [zoo_amulti] (for create > ops) are called on paths that include the chroot. This is analogous to issue > 1027, which fixed this bug for synchronous calls. > I've created a patch to fix this in trunk.
[jira] [Updated] (ZOOKEEPER-2282) chroot not stripped from path in asynchronous callbacks
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Grasso updated ZOOKEEPER-2282: - Attachment: (was: ZOOKEEPER-2282.patch) > chroot not stripped from path in asynchronous callbacks > --- > > Key: ZOOKEEPER-2282 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2282 > Project: ZooKeeper > Issue Type: Bug > Components: c client >Affects Versions: 3.4.6, 3.5.0 > Environment: CentOS 6.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Original Estimate: 1h > Remaining Estimate: 1h > > Callbacks passed to [zoo_acreate], [zoo_async], and [zoo_amulti] (for create > ops) are called on paths that include the chroot. This is analogous to issue > 1027, which fixed this bug for synchronous calls. > I've created a patch to fix this in trunk.
[jira] [Updated] (ZOOKEEPER-2282) chroot not stripped from path in asynchronous callbacks
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Grasso updated ZOOKEEPER-2282: - Attachment: (was: ZOOKEEPER-2282-TEST.patch) > chroot not stripped from path in asynchronous callbacks > --- > > Key: ZOOKEEPER-2282 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2282 > Project: ZooKeeper > Issue Type: Bug > Components: c client >Affects Versions: 3.4.6, 3.5.0 > Environment: CentOS 6.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Original Estimate: 1h > Remaining Estimate: 1h > > Callbacks passed to [zoo_acreate], [zoo_async], and [zoo_amulti] (for create > ops) are called on paths that include the chroot. This is analogous to issue > 1027, which fixed this bug for synchronous calls. > I've created a patch to fix this in trunk.
[jira] [Updated] (ZOOKEEPER-2282) chroot not stripped from path in asynchronous callbacks
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Grasso updated ZOOKEEPER-2282: - Attachment: (was: ZOOKEEPER-2282-TEST.patch) > chroot not stripped from path in asynchronous callbacks > --- > > Key: ZOOKEEPER-2282 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2282 > Project: ZooKeeper > Issue Type: Bug > Components: c client >Affects Versions: 3.4.6, 3.5.0 > Environment: CentOS 6.6 >Reporter: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2282-TEST.patch, ZOOKEEPER-2282.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > Callbacks passed to [zoo_acreate], [zoo_async], and [zoo_amulti] (for create > ops) are called on paths that include the chroot. This is analogous to issue > 1027, which fixed this bug for synchronous calls. > I've created a patch to fix this in trunk.
[jira] [Updated] (ZOOKEEPER-2282) chroot not stripped from path in asynchronous callbacks
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Grasso updated ZOOKEEPER-2282: - Attachment: ZOOKEEPER-2282.patch ZOOKEEPER-2282-TEST.patch Fixed patches to apply cleanly > chroot not stripped from path in asynchronous callbacks > --- > > Key: ZOOKEEPER-2282 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2282 > Project: ZooKeeper > Issue Type: Bug > Components: c client >Affects Versions: 3.4.6, 3.5.0 > Environment: CentOS 6.6 >Reporter: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2282-TEST.patch, ZOOKEEPER-2282.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > Callbacks passed to [zoo_acreate], [zoo_async], and [zoo_amulti] (for create > ops) are called on paths that include the chroot. This is analogous to issue > 1027, which fixed this bug for synchronous calls. > I've created a patch to fix this in trunk.
[jira] [Updated] (ZOOKEEPER-2519) zh->state should not be 0 while handle is active
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Grasso updated ZOOKEEPER-2519: - Attachment: ZOOKEEPER-2519.patch Thanks for the feedback, Michael. The new patch should apply cleanly to 3.4.6. I had chosen ZOO_CONNECTING_STATE because I misunderstood what ZOO_NOTCONNECTED_STATE means. I thought we had used it to replace 0, and that both meant we had not begun connecting. I now think that 0 was used to mean either that we had not begun connecting or that we had closed permanently. So the ZOOKEEPER-800 fix had initially returned an invalid state in either case, but I believe as of 3.4.6 it was behaving correctly. Given that, I now agree that ZOO_NOTCONNECTED_STATE makes more sense here, and I've dropped the change to zoo_add_auth. > zh->state should not be 0 while handle is active > > > Key: ZOOKEEPER-2519 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2519 > Project: ZooKeeper > Issue Type: Bug > Components: c client >Affects Versions: 3.4.6 >Reporter: Andrew Grasso > Attachments: ZOOKEEPER-2519.patch, ZOOKEEPER-2519.patch > > > 0 does not correspond to any of the defined states for the zookeeper handle, > so a client should not expect to see this value. But in the function > {{handle_error}}, we set {{zh->state = 0}}, which a client may then see. > Instead, we should set our state to be {{ZOO_CONNECTING_STATE}}. > At some point the code moved away from 0 as a valid state and introduced the > defined states. This broke the fix to ZOOKEEPER-800, which checks if state is > 0 to know if the handle has been created but has not yet connected. We now > use {{ZOO_NOTCONNECTED_STATE}} to mean this, so the check for this in > {{zoo_add_auth}} must be changed. > We saw this error in 3.4.6, but I believe it remains present in trunk.
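The invariant discussed in the comment above, that an active handle should only ever expose one of the defined states, can be sketched as follows. This is a minimal model, not the C client code: the {{demo_*}} names and the numeric values of the enum are assumptions made for illustration and do not reproduce the real {{ZOO_*_STATE}} constants or the real {{handle_error}}.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the client's defined states. The names and
 * values are hypothetical; the real constants live in the C client headers. */
enum demo_state {
    DEMO_NOTCONNECTED_STATE = 1,
    DEMO_CONNECTING_STATE,
    DEMO_CONNECTED_STATE
};

/* Hypothetical, stripped-down handle holding only the state field. */
struct demo_handle {
    int state;
};

/* Returns 1 if the value is one of the defined states, 0 otherwise. */
static int demo_state_is_defined(int state)
{
    switch (state) {
    case DEMO_NOTCONNECTED_STATE:
    case DEMO_CONNECTING_STATE:
    case DEMO_CONNECTED_STATE:
        return 1;
    default:
        return 0;
    }
}

/* Error path of the model: move to the "not connected" state instead of
 * writing a raw 0, so callers never observe an undefined value. */
static void demo_handle_error(struct demo_handle *zh)
{
    assert(zh != NULL);
    zh->state = DEMO_NOTCONNECTED_STATE;
}
```

In this model a handle that was connected and then hits the error path still reports a defined state, whereas a raw 0 would fail the {{demo_state_is_defined}} check.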
[jira] [Updated] (ZOOKEEPER-2519) zh->state should not be 0 while handle is active
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Grasso updated ZOOKEEPER-2519: - Attachment: ZOOKEEPER-2519.patch > zh->state should not be 0 while handle is active > > > Key: ZOOKEEPER-2519 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2519 > Project: ZooKeeper > Issue Type: Bug > Components: c client >Affects Versions: 3.4.6 >Reporter: Andrew Grasso > Attachments: ZOOKEEPER-2519.patch > > > 0 does not correspond to any of the defined states for the zookeeper handle, > so a client should not expect to see this value. But in the function > {{handle_error}}, we set {{zh->state = 0}}, which a client may then see. > Instead, we should set our state to be {{ZOO_CONNECTING_STATE}}. > At some point the code moved away from 0 as a valid state and introduced the > defined states. This broke the fix to ZOOKEEPER-800, which checks if state is > 0 to know if the handle has been created but has not yet connected. We now > use {{ZOO_NOTCONNECTED_STATE}} to mean this, so the check for this in > {{zoo_add_auth}} must be changed. > We saw this error in 3.4.6, but I believe it remains present in trunk.
[jira] [Updated] (ZOOKEEPER-2519) zh->state should not be 0 while handle is active
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Grasso updated ZOOKEEPER-2519: - Attachment: (was: ZOOKEEPER-2519.patch) > zh->state should not be 0 while handle is active > > > Key: ZOOKEEPER-2519 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2519 > Project: ZooKeeper > Issue Type: Bug > Components: c client >Affects Versions: 3.4.6 >Reporter: Andrew Grasso > > 0 does not correspond to any of the defined states for the zookeeper handle, > so a client should not expect to see this value. But in the function > {{handle_error}}, we set {{zh->state = 0}}, which a client may then see. > Instead, we should set our state to be {{ZOO_CONNECTING_STATE}}. > At some point the code moved away from 0 as a valid state and introduced the > defined states. This broke the fix to ZOOKEEPER-800, which checks if state is > 0 to know if the handle has been created but has not yet connected. We now > use {{ZOO_NOTCONNECTED_STATE}} to mean this, so the check for this in > {{zoo_add_auth}} must be changed. > We saw this error in 3.4.6, but I believe it remains present in trunk.
[jira] [Updated] (ZOOKEEPER-2519) zh->state should not be 0 while handle is active
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Grasso updated ZOOKEEPER-2519: - Attachment: ZOOKEEPER-2519.patch > zh->state should not be 0 while handle is active > > > Key: ZOOKEEPER-2519 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2519 > Project: ZooKeeper > Issue Type: Bug > Components: c client >Affects Versions: 3.4.6 >Reporter: Andrew Grasso > Attachments: ZOOKEEPER-2519.patch > > > 0 does not correspond to any of the defined states for the zookeeper handle, > so a client should not expect to see this value. But in the function > {{handle_error}}, we set {{zh->state = 0}}, which a client may then see. > Instead, we should set our state to be {{ZOO_CONNECTING_STATE}}. > At some point the code moved away from 0 as a valid state and introduced the > defined states. This broke the fix to ZOOKEEPER-800, which checks if state is > 0 to know if the handle has been created but has not yet connected. We now > use {{ZOO_NOTCONNECTED_STATE}} to mean this, so the check for this in > {{zoo_add_auth}} must be changed. > We saw this error in 3.4.6, but I believe it remains present in trunk.
[jira] [Created] (ZOOKEEPER-2519) zh->state should not be 0 while handle is active
Andrew Grasso created ZOOKEEPER-2519: Summary: zh->state should not be 0 while handle is active Key: ZOOKEEPER-2519 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2519 Project: ZooKeeper Issue Type: Bug Components: c client Affects Versions: 3.4.6 Reporter: Andrew Grasso 0 does not correspond to any of the defined states for the zookeeper handle, so a client should not expect to see this value. But in the function {{handle_error}}, we set {{zh->state = 0}}, which a client may then see. Instead, we should set our state to be {{ZOO_CONNECTING_STATE}}. At some point the code moved away from 0 as a valid state and introduced the defined states. This broke the fix to ZOOKEEPER-800, which checks if state is 0 to know if the handle has been created but has not yet connected. We now use {{ZOO_NOTCONNECTED_STATE}} to mean this, so the check for this in {{zoo_add_auth}} must be changed. We saw this error in 3.4.6, but I believe it remains present in trunk.
[jira] [Commented] (ZOOKEEPER-2516) C client calculates invalid time interval for pings et al
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15425355#comment-15425355 ] Andrew Grasso commented on ZOOKEEPER-2516: -- Can you provide an example input for which {{calculate_interval()}} returns the wrong value? > C client calculates invalid time interval for pings et al > - > > Key: ZOOKEEPER-2516 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2516 > Project: ZooKeeper > Issue Type: Bug > Components: c client >Affects Versions: 3.4.0, 3.4.8, 3.5.0, 3.5.1, 3.6.0 >Reporter: Hadriel Kaplan > > The C client has a function called {{calculate_interval()}} in > {{zookeeper.c}}, whose purpose is to determine the number of milliseconds > difference between a start and end time. > Unfortunately its logic is invalid if the number of microseconds of the end > time happens to be less than the number of microseconds of the start time - > which it will be about half the time, since the end time could be in the next > second interval. Such a case would yield a very big negative number, making > the function return an invalid value. > Instead of reinventing the wheel, {{calculate_interval()}} should use the > {{timersub()}} macro from {{sys/time.h}} if it's available - if it's not > #define'd, then #define it. (it's a macro, and the source code for it is > readily available)
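To make the case under discussion concrete, here is a minimal sketch of a millisecond difference that borrows one second when the end time's microsecond field is smaller than the start's, which is the normalization {{timersub()}} performs. {{interval_ms}} is a hypothetical helper written for this illustration; it is not the {{calculate_interval()}} from {{zookeeper.c}}, and it assumes {{struct timeval}} from {{sys/time.h}}.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: milliseconds elapsed from *start to *end.
 * Assumes both values are normalized (0 <= tv_usec < 1000000). */
static long interval_ms(const struct timeval *start, const struct timeval *end)
{
    assert(start != NULL && end != NULL);

    long sec = (long)(end->tv_sec - start->tv_sec);
    long usec = (long)(end->tv_usec - start->tv_usec);

    /* End microseconds smaller than start microseconds: borrow a second
     * so that usec is non-negative again. */
    if (usec < 0) {
        sec -= 1;
        usec += 1000000L;
    }
    return sec * 1000L + usec / 1000L;
}
```

For a start of 1 s + 900000 us and an end of 2 s + 100000 us, the borrow leaves 0 s and 200000 us, so the helper returns 200. An input like this one is the kind of example the comment above asks for.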
[jira] [Updated] (ZOOKEEPER-2282) chroot not stripped from path in asynchronous callbacks
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Grasso updated ZOOKEEPER-2282: - Attachment: ZOOKEEPER-2282-TEST.patch This appears not to have been fixed. The test code for chroot stripping contains the following comment: {quote} // the c client async callbacks do // not callback with the path, so // we dont need to test taht for now // we should fix that though soon! {quote} This is incorrect. Adding a test for the correctness of chroot stripping for async callbacks (e.g. create) causes the test to fail. This patch is against release 3.4.6. > chroot not stripped from path in asynchronous callbacks > --- > > Key: ZOOKEEPER-2282 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2282 > Project: ZooKeeper > Issue Type: Bug > Components: c client >Affects Versions: 3.4.6, 3.5.0 > Environment: CentOS 6.6 >Reporter: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2282-TEST.patch, strip_chroot.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > Callbacks passed to [zoo_acreate], [zoo_async], and [zoo_amulti] (for create > ops) are called on paths that include the chroot. This is analogous to issue > 1027, which fixed this bug for synchronous calls. > I've created a patch to fix this in trunk.
[jira] [Updated] (ZOOKEEPER-2282) chroot not stripped from path in asynchronous callbacks
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Grasso updated ZOOKEEPER-2282: - Priority: Critical (was: Major) > chroot not stripped from path in asynchronous callbacks > --- > > Key: ZOOKEEPER-2282 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2282 > Project: ZooKeeper > Issue Type: Bug > Components: c client >Affects Versions: 3.4.6, 3.5.0 > Environment: CentOS 6.6 >Reporter: Andrew Grasso >Priority: Critical > Attachments: strip_chroot.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > Callbacks passed to [zoo_acreate], [zoo_async], and [zoo_amulti] (for create > ops) are called on paths that include the chroot. This is analogous to issue > 1027, which fixed this bug for synchronous calls. > I've created a patch to fix this in trunk.
[jira] [Updated] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Grasso updated ZOOKEEPER-2325: - Attachment: ZOOKEEPER-2325-test.patch This patch adds a test that fails due to this bug, but succeeds with the submitted patch applied. I believe the behavior expected by this test is the "correct" behavior, but please let me know if my understanding is incorrect. > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that ZooKeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy ZooKeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space.
[jira] [Updated] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Grasso updated ZOOKEEPER-2325: - Attachment: ZOOKEEPER-2325.001.patch By returning -1L, transaction logs are skipped and recovery succeeds. > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso > Attachments: ZOOKEEPER-2325.001.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that ZooKeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy ZooKeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space.
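The control flow described in the comment above (propagate -1 when no valid snapshot was found, rather than replaying the transaction log on top of an empty tree) can be modelled roughly as below. This is a toy model in C, not the server's Java code: {{demo_restore}}, its parameters, and the plain array of zxids standing in for the transaction log are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of startup recovery.
 *   snapshot_zxid: what snapshot loading reported, -1 if no valid snapshot.
 *   txn_zxids/n:   zxids found in the transaction log (hypothetical layout).
 * Returns the last zxid applied, or -1 if recovery from disk was skipped. */
static long demo_restore(long snapshot_zxid, const long *txn_zxids, size_t n)
{
    assert(n == 0 || txn_zxids != NULL);

    /* No valid snapshot: report that to the caller instead of replaying
     * the log against an empty data tree. */
    if (snapshot_zxid == -1L)
        return -1L;

    /* Otherwise "replay" every logged transaction newer than the snapshot. */
    long last = snapshot_zxid;
    for (size_t i = 0; i < n; i++) {
        if (txn_zxids[i] > last)
            last = txn_zxids[i];
    }
    return last;
}
```

With a log holding zxids 5, 6 and 7, the model returns 7 when the snapshot is at zxid 5 and -1 when snapshot loading reported -1.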
[jira] [Created] (ZOOKEEPER-2282) chroot not stripped from path in asynchronous callbacks
Andrew Grasso created ZOOKEEPER-2282: Summary: chroot not stripped from path in asynchronous callbacks Key: ZOOKEEPER-2282 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2282 Project: ZooKeeper Issue Type: Bug Components: c client Affects Versions: 3.5.0, 3.4.6 Environment: CentOS 6.6 Reporter: Andrew Grasso Callbacks passed to [zoo_acreate], [zoo_async], and [zoo_amulti] (for create ops) are called on paths that include the chroot. This is analogous to issue 1027, which fixed this bug for synchronous calls. I've created a patch to fix this in trunk.
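As a rough illustration of what "stripping the chroot" means for a path handed to a callback, here is a minimal sketch. {{demo_strip_chroot}} is a hypothetical helper, not the function touched by the attached patch, and the handling of edge cases (path equal to the chroot, prefix that is not a whole path component) is an assumption of this sketch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns the client-visible part of a server path.
 * The result points into `path` (or at a string literal); nothing is
 * allocated. A NULL or empty chroot means no chroot is configured. */
static const char *demo_strip_chroot(const char *path, const char *chroot)
{
    assert(path != NULL);

    if (chroot == NULL || chroot[0] == '\0')
        return path;

    size_t n = strlen(chroot);
    if (strncmp(path, chroot, n) != 0)
        return path;            /* path is not under the chroot */
    if (path[n] == '\0')
        return "/";             /* path is the chroot itself */
    if (path[n] != '/')
        return path;            /* e.g. chroot "/app" vs. path "/apple" */
    return path + n;            /* skip the chroot prefix */
}
```

With a chroot of "/app", the server path "/app/node" becomes "/node", which is what a client that connected with that chroot would expect to see in its callback.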
[jira] [Updated] (ZOOKEEPER-2282) chroot not stripped from path in asynchronous callbacks
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Grasso updated ZOOKEEPER-2282: - Attachment: strip_chroot.patch > chroot not stripped from path in asynchronous callbacks > --- > > Key: ZOOKEEPER-2282 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2282 > Project: ZooKeeper > Issue Type: Bug > Components: c client >Affects Versions: 3.4.6, 3.5.0 > Environment: CentOS 6.6 >Reporter: Andrew Grasso > Attachments: strip_chroot.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > Callbacks passed to [zoo_acreate], [zoo_async], and [zoo_amulti] (for create > ops) are called on paths that include the chroot. This is analogous to issue > 1027, which fixed this bug for synchronous calls. > I've created a patch to fix this in trunk.