[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933075#action_12933075 ]

Mahadev konar commented on ZOOKEEPER-366:

I am all for making it for 3.3.3. I'd be willing to fix it, but probably not this week.

Session timeout detection can go wrong if the leader system time changes

Key: ZOOKEEPER-366
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-366
Project: Zookeeper
Issue Type: Bug
Components: quorum, server
Reporter: Benjamin Reed
Assignee: Benjamin Reed
Fix For: 3.3.3, 3.4.0
Attachments: ZOOKEEPER-366.patch

the leader tracks session expirations by calculating when a session will time out and then periodically checking to see what needs to be timed out based on the current time. this works great as long as the leader's clock progresses at a steady pace. the problem comes when there are big (session size) changes in the clock, by ntp for example. if time gets adjusted forward, all the sessions could time out immediately. if time goes backward, sessions that should time out may take a lot longer to actually expire.

this is really just a leader issue. the easiest way to deal with this is to have the leader relinquish leadership if it detects a big jump forward in time. when a new leader gets elected, it will recalculate timeouts of active sessions.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932708#action_12932708 ]

Patrick Hunt commented on ZOOKEEPER-366:

FYI, this came up again today on the hbase list:

14:59 _hp_ man this system time update on a bunch of machines causing zookeeper session timeouts causing hr's to die is really taking its toll, count on a table now hangs, i disabled and enabled the table, tried count again, and it hangs at the same place still. Arg.

Ben, any progress on this? Should we try to get it into 3.3.3?
[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932809#action_12932809 ]

Benjamin Reed commented on ZOOKEEPER-366:

i haven't had a chance to get back to this. we really need to convert all the currentTimeMillis() calls to nanoTime(). we need to do a similar change in the C client. i don't think we can do a test for this.
[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902937#action_12902937 ]

Patrick Hunt commented on ZOOKEEPER-366:

One thing we should do: add sufficient logging (warn level or higher, I would say) to ensure that if this does happen in production, we have a record of it in the log.
[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901949#action_12901949 ]

Benjamin Reed commented on ZOOKEEPER-366:

holger, you are correct. nanoTime is the way to go. i'll prepare a fix. one problem with it is that the fix will be impossible to test.
[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900997#action_12900997 ]

Holger Hoffstätte commented on ZOOKEEPER-366:

You can avoid the entire problem by only comparing deltas from nanoTime, which is independent of wall-clock time and simply increases monotonically.
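To illustrate the nanoTime suggestion, here is a minimal sketch of a timeout check that only compares deltas of System.nanoTime() readings. The class MonotonicDeadline and its methods are hypothetical names made up for this example; they are not taken from the attached patch or from the ZooKeeper code base.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: tracks a timeout using only System.nanoTime() deltas. */
final class MonotonicDeadline {
    private final long startNanos;
    private final long timeoutNanos;

    MonotonicDeadline(long timeoutMillis) {
        this(System.nanoTime(), timeoutMillis);
    }

    // Lets a caller (or a test) supply the starting nanoTime reading.
    MonotonicDeadline(long startNanos, long timeoutMillis) {
        this.startNanos = startNanos;
        this.timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    }

    boolean isExpiredAt(long nowNanos) {
        // Subtract first, then compare. Comparing raw nanoTime values
        // directly would break if the counter overflows; the difference
        // of two readings stays correct.
        return nowNanos - startNanos >= timeoutNanos;
    }

    boolean isExpired() {
        return isExpiredAt(System.nanoTime());
    }
}
```

Because nanoTime readings are only meaningful relative to other readings taken in the same JVM, a value like startNanos cannot be shared between servers or persisted; only the elapsed delta is usable.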
[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900824#action_12900824 ]

Benjamin Reed commented on ZOOKEEPER-366:

anyone have an idea of how to test this? i need to mock System.currentTimeMillis().
[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900894#action_12900894 ]

Patrick Hunt commented on ZOOKEEPER-366:

Perhaps we should have a Clock utility that normally wraps System.currentTimeMillis() but can be mocked for testing purposes. If you want to do mocks, we should do it via Mockito.
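A rough sketch of what such a Clock utility could look like follows. All names here (Clock, SystemClock, FakeClock, SessionTimer) are assumptions chosen for illustration, and the test double is hand-written rather than a Mockito mock so that the example needs nothing beyond the JDK.

```java
/** Hypothetical time source; server code would call this instead of System directly. */
interface Clock {
    long currentTimeMillis();
}

/** Production implementation: simply wraps System.currentTimeMillis(). */
final class SystemClock implements Clock {
    @Override
    public long currentTimeMillis() {
        return System.currentTimeMillis();
    }
}

/** Test double whose time only moves when the test says so. */
final class FakeClock implements Clock {
    private long nowMillis;

    FakeClock(long startMillis) {
        this.nowMillis = startMillis;
    }

    void advance(long deltaMillis) {
        nowMillis += deltaMillis;
    }

    @Override
    public long currentTimeMillis() {
        return nowMillis;
    }
}

/** Hypothetical consumer that records an absolute expiry time up front. */
final class SessionTimer {
    private final Clock clock;
    private final long expiresAtMillis;

    SessionTimer(Clock clock, long timeoutMillis) {
        this.clock = clock;
        this.expiresAtMillis = clock.currentTimeMillis() + timeoutMillis;
    }

    boolean isExpired() {
        return clock.currentTimeMillis() >= expiresAtMillis;
    }
}
```

With a FakeClock, a test can simulate a forward clock jump by calling advance() with a large value and then checking that the code under test reacts as intended, without touching the real system time.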
[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900511#action_12900511 ]

Benjamin Reed commented on ZOOKEEPER-366:

after discussing this on the list, we realized that we can detect a big jump in time in the session expiration thread. since we expire a bucket of sessions each tick, if we run into the situation where we are going to expire more than one bucket in a row, we know we have jumped forward in time. we can smooth the jump by requiring at least a 1/2 ticktime wait between each bucket.
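The bucket idea in this comment can be sketched roughly as follows. This is a simplified, single-threaded illustration and not the code from the attached patch: the class BucketExpirer, its methods, and the choice to pass the current time in as a parameter are all assumptions made so the logic is easy to follow. It only addresses forward jumps; a backward jump is not handled.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of tick-bucketed session expiry with jump smoothing. */
final class BucketExpirer {
    private final long tickMillis;
    private final Map<Long, Set<Long>> buckets = new HashMap<>(); // expiry tick -> session ids
    private final Map<Long, Long> bucketOf = new HashMap<>();     // session id -> expiry tick
    private long nextBucketMillis;
    private int jumpsDetected;

    BucketExpirer(long tickMillis, long nowMillis) {
        this.tickMillis = tickMillis;
        this.nextBucketMillis = roundUp(nowMillis);
    }

    // Round up to the next tick boundary so sessions group into buckets.
    private long roundUp(long t) {
        return (t / tickMillis + 1) * tickMillis;
    }

    /** Called on session creation and on each heartbeat: moves the session to a later bucket. */
    void touch(long sessionId, long nowMillis, long timeoutMillis) {
        long bucket = roundUp(nowMillis + timeoutMillis);
        Long old = bucketOf.put(sessionId, bucket);
        if (old != null && buckets.containsKey(old)) {
            buckets.get(old).remove(sessionId);
        }
        buckets.computeIfAbsent(bucket, k -> new HashSet<>()).add(sessionId);
    }

    /**
     * One iteration of the expiry loop. Expires at most one bucket, appends
     * the expired session ids to expiredOut, and returns how many
     * milliseconds the caller should wait before calling step() again.
     */
    long step(long nowMillis, List<Long> expiredOut) {
        if (nowMillis < nextBucketMillis) {
            return nextBucketMillis - nowMillis; // normal case: nothing due yet
        }
        Set<Long> due = buckets.remove(nextBucketMillis);
        if (due != null) {
            for (Long id : due) {
                bucketOf.remove(id);
                expiredOut.add(id);
            }
        }
        nextBucketMillis += tickMillis;
        if (nowMillis >= nextBucketMillis) {
            // The following bucket is already due, so we would expire more
            // than one bucket in a row: treat it as a forward jump and
            // require a half-tick wait before the next bucket.
            jumpsDetected++;
            return tickMillis / 2;
        }
        return nextBucketMillis - nowMillis;
    }

    int jumpsDetected() {
        return jumpsDetected;
    }
}
```

In this sketch the half-tick pause gives sessions whose clients are still alive a chance to send a heartbeat and be moved to a later bucket via touch() before their old bucket is reached. A real expiration thread would sleep for the returned duration; that loop and any synchronization are omitted here.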