Re: Ignite Enhancement Proposal #7 (Internal problems detection)

2017-11-29 Thread Дмитрий Сорокин
Vladimir, At the moment the policy looks like this:

/**
 * Policy that defines how node will process the failures. Note that default
 * failure processing policy is defined by {@link IgniteConfiguration#DFLT_FLR_PLC} property.
 */
public enum FailureProcessingPolicy {
    /** Restart jvm. */
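[The archive cuts the snippet off after the first javadoc. A minimal sketch of how the complete enum might look, with the remaining constant names assumed from the reactions proposed elsewhere in this thread; the actual patch may differ:]

    public enum FailureProcessingPolicy {
        /** Restart jvm. */
        RESTART_JVM,

        /** Report the failure (exceptions, metrics), then terminate the Ignite process. */
        HALT,

        /** Report the failure but leave the affected process untouched. */
        NOOP
    }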

Re: Ignite Enhancement Proposal #7 (Internal problems detection)

2017-11-29 Thread Vladimir Ozerov
Dmitry, Thank you, but what does FailureProcessingPolicy look like? It is not clear how I can configure different reactions to different event types. On Wed, Nov 29, 2017 at 1:47 PM, Дмитрий Сорокин wrote: > Vladimir, > > These policies (policy, in fact) can be configured

Re: Ignite Enhancement Proposal #7 (Internal problems detection)

2017-11-29 Thread Дмитрий Сорокин
Vladimir, These policies (policy, in fact) can be configured in IgniteConfiguration by calling the setFailureProcessingPolicy(FailureProcessingPolicy flrPlc) method. 2017-11-29 10:35 GMT+03:00 Vladimir Ozerov : > Denis, > > Yes, but can we look at proposed API before we dig
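[A short usage sketch, assuming the setter named above and the NOOP constant proposed elsewhere in this thread; this is the proposed API, not a released one:]

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;

    IgniteConfiguration cfg = new IgniteConfiguration();

    // Proposed API from this thread (not yet in a released Ignite version).
    cfg.setFailureProcessingPolicy(FailureProcessingPolicy.NOOP);

    Ignite ignite = Ignition.start(cfg);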

Re: Ignite Enhancement Proposal #7 (Internal problems detection)

2017-11-28 Thread Vladimir Ozerov
Denis, Yes, but can we look at the proposed API before we dig into the implementation? On Tue, Nov 28, 2017 at 9:43 PM, Denis Magda wrote: > I think the failure processing policy should be configured via > IgniteConfiguration in a way similar to the segmentation policies. > > — >

Re: Ignite Enhancement Proposal #7 (Internal problems detection)

2017-11-28 Thread Denis Magda
I think the failure processing policy should be configured via IgniteConfiguration in a way similar to the segmentation policies. — Denis > On Nov 27, 2017, at 11:28 PM, Vladimir Ozerov wrote: > > Dmitry, > > How these policies will be configured? Do you have any API in
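[For reference, this is the existing segmentation-policy configuration (real, current API) that the proposal would mirror:]

    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.plugin.segmentation.SegmentationPolicy;

    IgniteConfiguration cfg = new IgniteConfiguration();

    // Existing analogue: reaction to network segmentation, set on the same configuration bean.
    cfg.setSegmentationPolicy(SegmentationPolicy.STOP);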

Re: [!!Mass Mail]Re: Ignite Enhancement Proposal #7 (Internal problems detection)

2017-11-28 Thread Сорокин Дмитрий Владимирович
Vladimir, These policies (policy, in fact) can be configured in IgniteConfiguration by calling the setFailureProcessingPolicy(FailureProcessingPolicy flrPlc) method. -- Дмитрий Сорокин Tel.: 8-789-13512 Mob.: +7 (916) 560-39-63 28.11.17, 10:28, user "Vladimir Ozerov"

Re: Ignite Enhancement Proposal #7 (Internal problems detection)

2017-11-27 Thread Vladimir Ozerov
Dmitry, How will these policies be configured? Do you have any API in mind? On Thu, Nov 23, 2017 at 6:26 PM, Denis Magda wrote: > No objections here. Additional policies like EXEC might be added later > depending on user needs. > > — > Denis > > > On Nov 23, 2017, at 2:26

Re: Ignite Enhancement Proposal #7 (Internal problems detection)

2017-11-23 Thread Denis Magda
No objections here. Additional policies like EXEC might be added later depending on user needs. — Denis > On Nov 23, 2017, at 2:26 AM, Дмитрий Сорокин > wrote: > > Denis, > I propose start with first three policies (it's already implemented, just > await some code

Re: Ignite Enhancement Proposal #7 (Internal problems detection)

2017-11-23 Thread Дмитрий Сорокин
Denis, I propose starting with the first three policies (they're already implemented and just await some code combing, commit & review). As for the fourth policy (EXEC), I think it's rather an additional property (some script path) than a policy; see the sketch below. 2017-11-23 0:43 GMT+03:00 Denis Magda : > Just
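[If EXEC were modeled as a property rather than a policy constant, the configuration might look like the following; both setters here are hypothetical illustrations, not an agreed API:]

    IgniteConfiguration cfg = new IgniteConfiguration();

    // Hypothetical: halt the node on failure and let a wrapper run the given script.
    cfg.setFailureProcessingPolicy(FailureProcessingPolicy.HALT);
    cfg.setFailureScriptPath("/opt/ignite/on-failure.sh");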

Re: Ignite Enhancement Proposal #7 (Internal problems detection)

2017-11-22 Thread Denis Magda
Just provide FailureProcessingPolicy with possible reactions:
- NOOP - exceptions will be reported, metrics will be triggered, but the affected Ignite process won't be touched.
- HALT (or STOP or KILL) - all the actions of NOOP + Ignite process termination.
- RESTART - NOOP actions +
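[A sketch of how a failure processor might dispatch on these reactions. The constants are taken from this message, reportFailure() is a hypothetical stand-in for the logging and metrics actions, and the exit codes are real Ignition constants:]

    import org.apache.ignite.Ignition;

    static void processFailure(FailureProcessingPolicy plc, Throwable err) {
        reportFailure(err); // Common to every policy: report the exception, trigger metrics.

        switch (plc) {
            case NOOP:
                break; // The affected Ignite process is left untouched.

            case HALT:
                System.exit(Ignition.KILL_EXIT_CODE); // NOOP actions + process termination.

            case RESTART_JVM:
                System.exit(Ignition.RESTART_EXIT_CODE); // Exit code that ignite.(sh|bat) treats as "restart".
        }
    }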

Re: Ignite Enhancement Proposal #7 (Internal problems detection)

2017-11-21 Thread Vladimir Ozerov
In the first iteration I would focus only on reporting facilities, to let an administrator spot a dangerous situation. And in the second phase, when all the reporting and metrics are ready, we can think about some automatic actions. On Wed, Nov 22, 2017 at 10:39 AM, Mikhail Cherkasov
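[A minimal sketch of a reporting facility of this kind, assuming a hypothetical MXBean; this is not an existing Ignite interface:]

    /** Hypothetical MXBean exposing failure information to monitoring tools via JMX. */
    public interface FailureMetricsMXBean {
        /** @return Number of critical failures detected since node start. */
        long getFailuresCount();

        /** @return Description of the last detected failure, or {@code null} if none. */
        String getLastFailure();
    }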

Re: Ignite Enhancement Proposal #7 (Internal problems detection)

2017-11-21 Thread Mikhail Cherkasov
Hi Anton, I don't think that we should shut down a node in case of IgniteOOMException: if one node has no space, then the others probably don't have it either, so re-balancing will cause IgniteOOM on all the other nodes and will kill the whole cluster. I think for some configurations the cluster should survive

Re: Ignite Enhancement Proposal #7 (Internal problems detection)

2017-11-21 Thread Denis Magda
A lack of suggestions and thoughts encourages me to create a ticket: https://issues.apache.org/jira/browse/IGNITE-6980 — Denis > On Nov 20, 2017, at 2:53 PM, Denis Magda wrote: > > If an Ignite operation hangs by some

Re: Ignite Enhancement Proposal #7 (Internal problems detection)

2017-11-20 Thread Denis Magda
If an Ignite operation hangs for some reason, due to an internal problem or buggy application code, it needs to eventually *time out*. Take the atomic operations case brought by Val to our attention recently: http://apache-ignite-developers.2346864.n4.nabble.com/Timeouts-in-atomic-cache-td19839.html An
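[One application-side guard that already exists is the async API with a bounded wait (real Ignite API; the ignite instance and cache name are assumed):]

    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.lang.IgniteFuture;

    IgniteCache<Integer, String> cache = ignite.cache("myCache");

    // Fail fast instead of hanging forever if the update never completes.
    IgniteFuture<Void> fut = cache.putAsync(1, "value");

    fut.get(10_000); // Milliseconds; throws IgniteFutureTimeoutException on expiry.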

Re: Ignite Enhancement Proposal #7 (Internal problems detection)

2017-11-20 Thread Anton Vinogradov
Dmitry, There are two cases:
1) STW duration is long -> notify monitoring via a JMX metric.
2) STW duration exceeds N seconds -> no need to wait for anything; we already know that the node will be segmented, or that a pause bigger than N seconds will affect cluster performance. The better option is to kill the node
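[A common way to detect long STW pauses from inside the JVM is a watchdog thread that measures sleep drift; a minimal sketch with illustrative thresholds:]

    // Watchdog: if a 10 ms sleep takes much longer, the JVM was paused (e.g. by a GC STW).
    Thread watchdog = new Thread(() -> {
        long last = System.nanoTime();

        while (!Thread.currentThread().isInterrupted()) {
            try {
                Thread.sleep(10);
            }
            catch (InterruptedException e) {
                return;
            }

            long now = System.nanoTime();
            long pauseMs = (now - last) / 1_000_000 - 10;

            if (pauseMs > 500) // Case 1: report; case 2 (kill) would trigger at a higher threshold.
                System.err.println("Long JVM pause detected: " + pauseMs + " ms");

            last = now;
        }
    }, "stw-watchdog");

    watchdog.setDaemon(true);
    watchdog.start();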

Re: Ignite Enhancement Proposal #7 (Internal problems detection)

2017-11-20 Thread Dmitry Pavlov
Hi Anton, > - GC STW duration exceed maximum possible length (node should be stopped > before STW finished) Are you sure we should kill a node in case of a long STW? Can we produce warnings to logs and monitoring tools and wait a little bit longer for the node to become alive if we detect an STW pause? In this case

Ignite Enhancement Proposal #7 (Internal problems detection)

2017-11-20 Thread Anton Vinogradov
Igniters, Internal problems may, and unfortunately sometimes do, cause unexpected cluster behavior. We should define the behavior for the case when any internal problem happens. Well-known internal problems can be split into:
1) OOM or any other reason causing a node crash
2) Situations requiring graceful node shutdown