[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes

2010-11-17 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933075#action_12933075
 ] 

Mahadev konar commented on ZOOKEEPER-366:
-

I am all for making it for 3.3.3. I'd be willing to fix it probably not this 
week. 

 Session timeout detection can go wrong if the leader system time changes
 

 Key: ZOOKEEPER-366
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-366
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum, server
Reporter: Benjamin Reed
Assignee: Benjamin Reed
 Fix For: 3.3.3, 3.4.0

 Attachments: ZOOKEEPER-366.patch


 the leader tracks session expirations by calculating when a session will 
 timeout and then periodically checking to see what needs to be timed out 
 based on the current time. this works great as long as the leaders clock 
 progresses at a steady pace. the problem comes when there are big (session 
 size) changes in clock, by ntp for example. if time gets adjusted forward, 
 all the sessions could timeout immediately. if time goes backward sessions 
 that should timeout may take a lot longer to actually expire.
 this is really just a leader issue. the easiest way to deal with this is to 
 have the leader relinquish leadership if it detects a big jump forward in 
 time. when a new leader gets elected, it will recalculate timeouts of active 
 sessions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes

2010-11-16 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932708#action_12932708
 ] 

Patrick Hunt commented on ZOOKEEPER-366:


FYI, this came up again today on hbase list:

14:59  _hp_ man this system time update on a bunch of machines causing 
zookeeper session timeouts causing hr's to die is really taking its toll, count 
on a table now hangs, i disabled and enabled the table, tried count again, and 
it hangs at the same place still.  Arg.


Ben any progress on this? Should we try to get it into 3.3.3?

 Session timeout detection can go wrong if the leader system time changes
 

 Key: ZOOKEEPER-366
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-366
 Project: Zookeeper
  Issue Type: Bug
Reporter: Benjamin Reed
Assignee: Benjamin Reed
 Attachments: ZOOKEEPER-366.patch


 the leader tracks session expirations by calculating when a session will 
 timeout and then periodically checking to see what needs to be timed out 
 based on the current time. this works great as long as the leaders clock 
 progresses at a steady pace. the problem comes when there are big (session 
 size) changes in clock, by ntp for example. if time gets adjusted forward, 
 all the sessions could timeout immediately. if time goes backward sessions 
 that should timeout may take a lot longer to actually expire.
 this is really just a leader issue. the easiest way to deal with this is to 
 have the leader relinquish leadership if it detects a big jump forward in 
 time. when a new leader gets elected, it will recalculate timeouts of active 
 sessions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes

2010-11-16 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932809#action_12932809
 ] 

Benjamin Reed commented on ZOOKEEPER-366:
-

i haven't had a chance to get back to this. we really need to convert all the 
currentTimeMillis() to nanoTime(). we need to do a similar change in the C 
client.

i don't think we can do a test for this.

 Session timeout detection can go wrong if the leader system time changes
 

 Key: ZOOKEEPER-366
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-366
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum, server
Reporter: Benjamin Reed
Assignee: Benjamin Reed
 Fix For: 3.3.3, 3.4.0

 Attachments: ZOOKEEPER-366.patch


 the leader tracks session expirations by calculating when a session will 
 timeout and then periodically checking to see what needs to be timed out 
 based on the current time. this works great as long as the leaders clock 
 progresses at a steady pace. the problem comes when there are big (session 
 size) changes in clock, by ntp for example. if time gets adjusted forward, 
 all the sessions could timeout immediately. if time goes backward sessions 
 that should timeout may take a lot longer to actually expire.
 this is really just a leader issue. the easiest way to deal with this is to 
 have the leader relinquish leadership if it detects a big jump forward in 
 time. when a new leader gets elected, it will recalculate timeouts of active 
 sessions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes

2010-08-26 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12902937#action_12902937
 ] 

Patrick Hunt commented on ZOOKEEPER-366:


One thing we should do - add sufficient logging (warn level or higher I would 
say) to ensure if this does happen in production we have a record of it in the 
log.

 Session timeout detection can go wrong if the leader system time changes
 

 Key: ZOOKEEPER-366
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-366
 Project: Zookeeper
  Issue Type: Bug
Reporter: Benjamin Reed
Assignee: Benjamin Reed
 Attachments: ZOOKEEPER-366.patch


 the leader tracks session expirations by calculating when a session will 
 timeout and then periodically checking to see what needs to be timed out 
 based on the current time. this works great as long as the leaders clock 
 progresses at a steady pace. the problem comes when there are big (session 
 size) changes in clock, by ntp for example. if time gets adjusted forward, 
 all the sessions could timeout immediately. if time goes backward sessions 
 that should timeout may take a lot longer to actually expire.
 this is really just a leader issue. the easiest way to deal with this is to 
 have the leader relinquish leadership if it detects a big jump forward in 
 time. when a new leader gets elected, it will recalculate timeouts of active 
 sessions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes

2010-08-24 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12901949#action_12901949
 ] 

Benjamin Reed commented on ZOOKEEPER-366:
-

holger you are correct. nanoTime is the way to go. i'll prepare a fix. one 
problem with it is that the fix will be impossible to test.

 Session timeout detection can go wrong if the leader system time changes
 

 Key: ZOOKEEPER-366
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-366
 Project: Zookeeper
  Issue Type: Bug
Reporter: Benjamin Reed
Assignee: Benjamin Reed
 Attachments: ZOOKEEPER-366.patch


 the leader tracks session expirations by calculating when a session will 
 timeout and then periodically checking to see what needs to be timed out 
 based on the current time. this works great as long as the leaders clock 
 progresses at a steady pace. the problem comes when there are big (session 
 size) changes in clock, by ntp for example. if time gets adjusted forward, 
 all the sessions could timeout immediately. if time goes backward sessions 
 that should timeout may take a lot longer to actually expire.
 this is really just a leader issue. the easiest way to deal with this is to 
 have the leader relinquish leadership if it detects a big jump forward in 
 time. when a new leader gets elected, it will recalculate timeouts of active 
 sessions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes

2010-08-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12900997#action_12900997
 ] 

Holger Hoffstätte commented on ZOOKEEPER-366:
-

You can avoid the entire problem by only comparing deltas from nanoTime, which 
is independent of time and simply increases monotonously.

 Session timeout detection can go wrong if the leader system time changes
 

 Key: ZOOKEEPER-366
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-366
 Project: Zookeeper
  Issue Type: Bug
Reporter: Benjamin Reed
Assignee: Benjamin Reed
 Attachments: ZOOKEEPER-366.patch


 the leader tracks session expirations by calculating when a session will 
 timeout and then periodically checking to see what needs to be timed out 
 based on the current time. this works great as long as the leaders clock 
 progresses at a steady pace. the problem comes when there are big (session 
 size) changes in clock, by ntp for example. if time gets adjusted forward, 
 all the sessions could timeout immediately. if time goes backward sessions 
 that should timeout may take a lot longer to actually expire.
 this is really just a leader issue. the easiest way to deal with this is to 
 have the leader relinquish leadership if it detects a big jump forward in 
 time. when a new leader gets elected, it will recalculate timeouts of active 
 sessions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes

2010-08-20 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12900824#action_12900824
 ] 

Benjamin Reed commented on ZOOKEEPER-366:
-

anyone have an idea of how to test this? i need to mock 
System.currentTimeMillis().

 Session timeout detection can go wrong if the leader system time changes
 

 Key: ZOOKEEPER-366
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-366
 Project: Zookeeper
  Issue Type: Bug
Reporter: Benjamin Reed
Assignee: Benjamin Reed
 Attachments: ZOOKEEPER-366.patch


 the leader tracks session expirations by calculating when a session will 
 timeout and then periodically checking to see what needs to be timed out 
 based on the current time. this works great as long as the leaders clock 
 progresses at a steady pace. the problem comes when there are big (session 
 size) changes in clock, by ntp for example. if time gets adjusted forward, 
 all the sessions could timeout immediately. if time goes backward sessions 
 that should timeout may take a lot longer to actually expire.
 this is really just a leader issue. the easiest way to deal with this is to 
 have the leader relinquish leadership if it detects a big jump forward in 
 time. when a new leader gets elected, it will recalculate timeouts of active 
 sessions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes

2010-08-20 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12900894#action_12900894
 ] 

Patrick Hunt commented on ZOOKEEPER-366:


perhaps we should have a Clock utility that normally wraps 
System.currentTimeMillis() but can be mocked for testing purposes. If you want 
to do mocks we should do it via mockito.

 Session timeout detection can go wrong if the leader system time changes
 

 Key: ZOOKEEPER-366
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-366
 Project: Zookeeper
  Issue Type: Bug
Reporter: Benjamin Reed
Assignee: Benjamin Reed
 Attachments: ZOOKEEPER-366.patch


 the leader tracks session expirations by calculating when a session will 
 timeout and then periodically checking to see what needs to be timed out 
 based on the current time. this works great as long as the leaders clock 
 progresses at a steady pace. the problem comes when there are big (session 
 size) changes in clock, by ntp for example. if time gets adjusted forward, 
 all the sessions could timeout immediately. if time goes backward sessions 
 that should timeout may take a lot longer to actually expire.
 this is really just a leader issue. the easiest way to deal with this is to 
 have the leader relinquish leadership if it detects a big jump forward in 
 time. when a new leader gets elected, it will recalculate timeouts of active 
 sessions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes

2010-08-19 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12900511#action_12900511
 ] 

Benjamin Reed commented on ZOOKEEPER-366:
-

after discussion this on the list, we realized that we can detect a big jump in 
time change in the session expiration thread. since we expire a bucket of 
sessions each tick, if we run into the situation where we are going to expire 
more than one bucket in a row, we know we have jumped forward in time. we can 
smooth the jump by requiring at least a 1/2 ticktime wait between each 
bucket. 

 Session timeout detection can go wrong if the leader system time changes
 

 Key: ZOOKEEPER-366
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-366
 Project: Zookeeper
  Issue Type: Bug
Reporter: Benjamin Reed

 the leader tracks session expirations by calculating when a session will 
 timeout and then periodically checking to see what needs to be timed out 
 based on the current time. this works great as long as the leaders clock 
 progresses at a steady pace. the problem comes when there are big (session 
 size) changes in clock, by ntp for example. if time gets adjusted forward, 
 all the sessions could timeout immediately. if time goes backward sessions 
 that should timeout may take a lot longer to actually expire.
 this is really just a leader issue. the easiest way to deal with this is to 
 have the leader relinquish leadership if it detects a big jump forward in 
 time. when a new leader gets elected, it will recalculate timeouts of active 
 sessions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.