[jira] [Commented] (IGNITE-9679) Document critical workers liveness checking implementation

2018-11-21 Thread Denis Magda (JIRA)


[ https://issues.apache.org/jira/browse/IGNITE-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695341#comment-16695341 ]

Denis Magda commented on IGNITE-9679:
-------------------------------------

Thanks folks, everything is clear. Closing the ticket.

> Document critical workers liveness checking implementation
> ----------------------------------------------------------
>
> Key: IGNITE-9679
> URL: https://issues.apache.org/jira/browse/IGNITE-9679
> Project: Ignite
> Issue Type: Task
> Components: documentation
> Reporter: Andrey Kuznetsov
> Assignee: Denis Magda
> Priority: Major
> Fix For: 2.7
>
>
> Newly implemented critical worker thread liveness checks should be mentioned
> in the Ignite documentation. A brief description of the functionality follows.
> An Ignite node has a number of critical worker threads that must be alive and
> responsive; otherwise the node's health is not guaranteed. These threads monitor
> each other periodically and track two aspects of the thread being checked:
> - whether it's alive;
> - whether it updates its internal heartbeat timestamp.
> Whenever at least one of the above conditions is violated, the checker thread
> logs the error and calls the currently configured {{FailureHandler}}.
> The {{IgniteConfiguration.SystemWorkerBlockedTimeout}} configuration property
> affects monitoring behavior. At runtime, monitoring settings can be changed
> via {{FailureHandlingMxBean}}.
> By default, liveness checks are enabled, but blocked system worker detection
> does not lead to failure handler invocation; see
> {{FailureProcessor#getDefaultFailureHandler}}.
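
The two checks described above (is the thread alive, and is its heartbeat timestamp fresh) can be sketched roughly as follows. This is a minimal standalone model, not Ignite's implementation: the names LivenessSketch, Worker, Problem and check are hypothetical, and the timeout parameter only stands in for the role that the blocked-worker timeout plays in the description.

```java
import java.util.Optional;
import java.util.function.BooleanSupplier;

/** Illustrative model of heartbeat-based liveness checks; not Ignite's actual code. */
final class LivenessSketch {
    /** What a checker may report about a monitored worker (hypothetical names). */
    enum Problem { TERMINATED, BLOCKED }

    /** A monitored worker: an aliveness probe plus its last heartbeat timestamp. */
    static final class Worker {
        final String name;
        final BooleanSupplier alive;    // e.g. thread::isAlive for a real thread
        volatile long heartbeatMillis;  // updated by the worker itself

        Worker(String name, BooleanSupplier alive, long nowMillis) {
            this.name = name;
            this.alive = alive;
            this.heartbeatMillis = nowMillis;
        }

        /** Called by the worker from its main loop to signal progress. */
        void updateHeartbeat(long nowMillis) {
            heartbeatMillis = nowMillis;
        }
    }

    /** Returns the detected problem, or empty if the worker looks healthy. */
    static Optional<Problem> check(Worker w, long nowMillis, long blockedTimeoutMillis) {
        // Check 1: the worker must still be alive.
        if (!w.alive.getAsBoolean())
            return Optional.of(Problem.TERMINATED);

        // Check 2: the heartbeat must have been updated within the timeout.
        if (nowMillis - w.heartbeatMillis > blockedTimeoutMillis)
            return Optional.of(Problem.BLOCKED);

        return Optional.empty();
    }

    public static void main(String[] args) {
        Worker w = new Worker("sys-worker", () -> true, 0L);

        System.out.println(check(w, 5_000L, 10_000L));   // heartbeat is fresh enough
        System.out.println(check(w, 15_000L, 10_000L));  // heartbeat is older than the timeout
    }
}
```

Passing the current time in explicitly keeps the sketch deterministic; a real checker would presumably read a clock and run the check periodically from another worker thread.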



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-9679) Document critical workers liveness checking implementation

2018-11-16 Thread Prachi Garg (JIRA)


[ https://issues.apache.org/jira/browse/IGNITE-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16690218#comment-16690218 ]

Prachi Garg commented on IGNITE-9679:
-------------------------------------

[~Artem Budnikov], looks good.

> Document critical workers liveness checking implementation
> ----------------------------------------------------------
>
> Key: IGNITE-9679
> URL: https://issues.apache.org/jira/browse/IGNITE-9679
> Project: Ignite
> Issue Type: Task
> Components: documentation
> Reporter: Andrey Kuznetsov
> Assignee: Prachi Garg
> Priority: Major
> Fix For: 2.7





[jira] [Commented] (IGNITE-9679) Document critical workers liveness checking implementation

2018-11-14 Thread Artem Budnikov (JIRA)


[ https://issues.apache.org/jira/browse/IGNITE-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16686258#comment-16686258 ]

Artem Budnikov commented on IGNITE-9679:
----------------------------------------

[~pgarg] 

I updated the description because the implementation has changed. By default,
the failure handler is not called when a critical worker stops responding. If
the user wants the failure handler to be called in such situations, additional
configuration is required, which is why I added another configuration example to
the "Critical Workers Health Check" section.
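
To make the default behavior concrete, here is a small standalone sketch of a handler that skips "blocked worker" failures unless it is configured otherwise. It is only a model under my own assumptions: FailureHandlingSketch, FailureKind, Handler and the idea of an "ignored" set are hypothetical names for illustration and are not taken from the Ignite API or from the documentation page.

```java
import java.util.EnumSet;
import java.util.Set;

/** Illustrative model of a handler that can ignore some failure kinds; not Ignite's API. */
final class FailureHandlingSketch {
    /** Hypothetical failure kinds a liveness checker could report. */
    enum FailureKind { WORKER_TERMINATED, WORKER_BLOCKED }

    static final class Handler {
        private final Set<FailureKind> ignored = EnumSet.noneOf(FailureKind.class);
        private int invocations;

        Handler(Set<FailureKind> ignoredKinds) {
            ignored.addAll(ignoredKinds);
        }

        /** Returns true if the handler acted on the failure, false if it was skipped. */
        boolean onFailure(FailureKind kind) {
            // An ignored failure would still be logged by the checker, but not handled.
            if (ignored.contains(kind))
                return false;

            invocations++;
            return true;
        }

        int invocations() {
            return invocations;
        }
    }

    /** Models the default: a blocked worker does not trigger the handler. */
    static Handler defaultHandler() {
        return new Handler(EnumSet.of(FailureKind.WORKER_BLOCKED));
    }

    /** Models the extra configuration: nothing is ignored. */
    static Handler strictHandler() {
        return new Handler(EnumSet.noneOf(FailureKind.class));
    }
}
```

With defaultHandler(), onFailure(FailureKind.WORKER_BLOCKED) returns false, while the same call on strictHandler() returns true; that difference is what the additional configuration example is meant to enable.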

Please review.

> Document critical workers liveness checking implementation
> ----------------------------------------------------------
>
> Key: IGNITE-9679
> URL: https://issues.apache.org/jira/browse/IGNITE-9679
> Project: Ignite
> Issue Type: Task
> Components: documentation
> Reporter: Andrey Kuznetsov
> Assignee: Artem Budnikov
> Priority: Major
> Fix For: 2.7





[jira] [Commented] (IGNITE-9679) Document critical workers liveness checking implementation

2018-11-01 Thread Prachi Garg (JIRA)


[ https://issues.apache.org/jira/browse/IGNITE-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672441#comment-16672441 ]

Prachi Garg commented on IGNITE-9679:
-------------------------------------

[~Artem Budnikov], is the failure handler called under two circumstances: either
there is a critical failure or there is a critical worker that is not
operational? If so, what is the difference between the configuration in the
Failure Handler section and the one in the Critical Workers Health Check section?

> Document critical workers liveness checking implementation
> ----------------------------------------------------------
>
> Key: IGNITE-9679
> URL: https://issues.apache.org/jira/browse/IGNITE-9679
> Project: Ignite
> Issue Type: Task
> Components: documentation
> Reporter: Andrey Kuznetsov
> Assignee: Prachi Garg
> Priority: Major
> Fix For: 2.7





[jira] [Commented] (IGNITE-9679) Document critical workers liveness checking implementation

2018-10-31 Thread Artem Budnikov (JIRA)


[ https://issues.apache.org/jira/browse/IGNITE-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670176#comment-16670176 ]

Artem Budnikov commented on IGNITE-9679:
----------------------------------------

[~pgarg] please review the page.

> Document critical workers liveness checking implementation
> ----------------------------------------------------------
>
> Key: IGNITE-9679
> URL: https://issues.apache.org/jira/browse/IGNITE-9679
> Project: Ignite
> Issue Type: Task
> Components: documentation
> Reporter: Andrey Kuznetsov
> Assignee: Prachi Garg
> Priority: Major
> Fix For: 2.7





[jira] [Commented] (IGNITE-9679) Document critical workers liveness checking implementation

2018-10-31 Thread Andrey Kuznetsov (JIRA)


[ https://issues.apache.org/jira/browse/IGNITE-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669859#comment-16669859 ]

Andrey Kuznetsov commented on IGNITE-9679:
------------------------------------------

[~Artem Budnikov], sorry, I must have overlooked something yesterday. Now I see
that the configuration description is up to date.

> Document critical workers liveness checking implementation
> ----------------------------------------------------------
>
> Key: IGNITE-9679
> URL: https://issues.apache.org/jira/browse/IGNITE-9679
> Project: Ignite
> Issue Type: Task
> Components: documentation
> Reporter: Andrey Kuznetsov
> Assignee: Andrey Kuznetsov
> Priority: Major
> Fix For: 2.7





[jira] [Commented] (IGNITE-9679) Document critical workers liveness checking implementation

2018-10-31 Thread Artem Budnikov (JIRA)


[ https://issues.apache.org/jira/browse/IGNITE-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669837#comment-16669837 ]

Artem Budnikov commented on IGNITE-9679:
----------------------------------------

[~andrey-kuznetsov]

I've added the first two points you mentioned to the page.

With regard to configuration, I think I've described that already, or is 
something missing?

> Document critical workers liveness checking implementation
> ----------------------------------------------------------
>
> Key: IGNITE-9679
> URL: https://issues.apache.org/jira/browse/IGNITE-9679
> Project: Ignite
> Issue Type: Task
> Components: documentation
> Reporter: Andrey Kuznetsov
> Assignee: Andrey Kuznetsov
> Priority: Major
> Fix For: 2.7





[jira] [Commented] (IGNITE-9679) Document critical workers liveness checking implementation

2018-10-30 Thread Andrey Kuznetsov (JIRA)


[ https://issues.apache.org/jira/browse/IGNITE-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16668842#comment-16668842 ]

Andrey Kuznetsov commented on IGNITE-9679:
------------------------------------------

[~Artem Budnikov], thanks, great job!

Please consider some minor remarks.
* A blocked (aka hanging) worker could be included in the Critical Failures list.
* Workers of the Data Streamer striped pool could be added to the list of
mission-critical workers.
* Due to [1], blocked worker timeout configuration has become a bit trickier.
Should this be mentioned in the docs?

[1]
https://issues.apache.org/jira/browse/IGNITE-9737?focusedCommentId=16632210&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16632210

> Document critical workers liveness checking implementation
> ----------------------------------------------------------
>
> Key: IGNITE-9679
> URL: https://issues.apache.org/jira/browse/IGNITE-9679
> Project: Ignite
> Issue Type: Task
> Components: documentation
> Reporter: Andrey Kuznetsov
> Assignee: Andrey Kuznetsov
> Priority: Major
> Fix For: 2.7





[jira] [Commented] (IGNITE-9679) Document critical workers liveness checking implementation

2018-10-30 Thread Artem Budnikov (JIRA)


[ https://issues.apache.org/jira/browse/IGNITE-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16668757#comment-16668757 ]

Artem Budnikov commented on IGNITE-9679:
----------------------------------------

I added a description of this feature to 
[https://apacheignite.readme.io/v2.6/docs/critical-failures-handling-27#section-critical-workers-health-check]

[~andrey-kuznetsov], [~agura] please review and provide feedback.

> Document critical workers liveness checking implementation
> ----------------------------------------------------------
>
> Key: IGNITE-9679
> URL: https://issues.apache.org/jira/browse/IGNITE-9679
> Project: Ignite
> Issue Type: Task
> Components: documentation
> Reporter: Andrey Kuznetsov
> Assignee: Artem Budnikov
> Priority: Major
> Fix For: 2.7





[jira] [Commented] (IGNITE-9679) Document critical workers liveness checking implementation

2018-10-18 Thread Andrey Kuznetsov (JIRA)


[ https://issues.apache.org/jira/browse/IGNITE-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655548#comment-16655548 ]

Andrey Kuznetsov commented on IGNITE-9679:
------------------------------------------

[~Artem Budnikov], this issue is no longer blocked; your help with it would be
appreciated.

> Document critical workers liveness checking implementation
> ----------------------------------------------------------
>
> Key: IGNITE-9679
> URL: https://issues.apache.org/jira/browse/IGNITE-9679
> Project: Ignite
> Issue Type: Task
> Components: documentation
> Reporter: Andrey Kuznetsov
> Priority: Major
> Fix For: 2.7





[jira] [Commented] (IGNITE-9679) Document critical workers liveness checking implementation

2018-09-24 Thread Denis Magda (JIRA)


[ https://issues.apache.org/jira/browse/IGNITE-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626422#comment-16626422 ]

Denis Magda commented on IGNITE-9679:
-------------------------------------

[~Artem Budnikov], please help to document this issue.

> Document critical workers liveness checking implementation
> ----------------------------------------------------------
>
> Key: IGNITE-9679
> URL: https://issues.apache.org/jira/browse/IGNITE-9679
> Project: Ignite
> Issue Type: Task
> Components: documentation
> Reporter: Andrey Kuznetsov
> Assignee: Artem Budnikov
> Priority: Major
> Fix For: 2.7
>
>
> Newly implemented critical worker thread liveness checks should be mentioned
> in the Ignite documentation. A brief description of the functionality follows.
> An Ignite node has a number of critical worker threads that must be alive and
> responsive; otherwise the node's health is not guaranteed. These threads monitor
> each other periodically and track two aspects of the thread being checked:
> - whether it's alive;
> - whether it updates its internal heartbeat timestamp.
> Both checks use the {{IgniteConfiguration.failureDetectionTimeout}} property as
> a threshold value.
> Whenever at least one of the above conditions is violated, the checker thread
> logs the error and calls the currently configured {{FailureHandler}}.
> Liveness checks are enabled by default, but can be disabled through the
> {{WorkersControlMXBean.healthMonitoringEnabled}} property.

