Re: IEP-14: Ignite failures handling (Discussion)

2018-03-26, Alexey Goncharuk
Yakov, I agree with Andrey that a separate abstraction for failure handling makes sense. First, using event listeners for this kind of response allows users to install multiple listeners, which may be invoked in an unpredictable order; this looks error-prone to me. Second, we may add an

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-23, Yakov Zhdanov
Andrey, I understand your point, but you are trying to build one more mechanism and introduce abstractions that already exist. Again, please take a look at the segmentation policy and the event types we already have. Thanks! Yakov

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-22, Andrey Gura
Yakov, DiscoveryWorker is itself a critical worker and could be terminated or blocked by a user-provided listener. So a specific abstraction for failure handling is a more robust way to solve the problem, because it doesn't depend on other components. On Tue, Mar 20, 2018 at 1:33 PM, Yakov Zhdanov
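
Andrey's robustness argument can be illustrated with a minimal sketch. It assumes a hypothetical CriticalWorker wrapper (not Ignite's DiscoveryWorker or any real Ignite class) that reports its own death directly to a callback from the failing thread, so the notification does not depend on an event dispatcher or another worker that might itself be blocked.

```java
import java.util.function.Consumer;

// Hypothetical sketch, not Ignite code: a critical worker that reports its own
// failure straight to a callback, from the thread that is failing.
final class CriticalWorker implements Runnable {
    private final String name;
    private final Runnable body;
    private final Consumer<String> onFailure;

    CriticalWorker(String name, Runnable body, Consumer<String> onFailure) {
        this.name = name;
        this.body = body;
        this.onFailure = onFailure;
    }

    @Override
    public void run() {
        try {
            body.run();
            // A critical worker is expected to live as long as the node, so a
            // normal return is treated as a failure too. A real implementation
            // would also have to recognize a planned node stop; omitted here.
            onFailure.accept(name + " terminated unexpectedly");
        } catch (Throwable t) {
            // Also catches Errors such as OutOfMemoryError; the callback may fail
            // in that situation as well, which this sketch does not prevent.
            onFailure.accept(name + " failed: " + t);
        }
    }
}
```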

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-20, Yakov Zhdanov
If Java runs into an OOME, then you cannot guarantee anything, including calling Runtime.halt(). My point is about a consistent approach throughout the project. I think developing a new mechanism with a separate interface is incorrect. Yakov

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-19, Dmitriy Setrakyan
On Mon, Mar 19, 2018 at 2:24 PM, Yakov Zhdanov wrote: > Andrey Gura, Why should we have a FailureHandler abstraction at all? We already have one: EventListener. In my view it is a better (and cleaner) design to add events (similar to, for example,

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-19, Yakov Zhdanov
Andrey Gura, Why should we have a FailureHandler abstraction at all? We already have one: EventListener. In my view it is a better (and cleaner) design to add events (similar to, for example, org.apache.ignite.events.EventType#EVT_NODE_SEGMENTED) like EVT_IGNITE_OOME, EVT_SYS_WORKER_FAILED and
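
The event-based alternative Yakov describes might look roughly like this minimal sketch. EVT_IGNITE_OOME and EVT_SYS_WORKER_FAILED are the names proposed in the message, not existing Ignite constants; their numeric values and the FailureEvents dispatcher are assumptions for illustration only. Every registered listener reacts to the event and nothing designates which one owns the response, which is the ordering concern Alexey raises on 2018-03-26.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.IntConsumer;

// Hypothetical sketch of failures published as event types, in the spirit of
// EVT_NODE_SEGMENTED. Constant values and this class are illustrative only.
final class FailureEvents {
    static final int EVT_IGNITE_OOME = 1001;       // name from the thread, value assumed
    static final int EVT_SYS_WORKER_FAILED = 1002; // name from the thread, value assumed

    private final List<IntConsumer> listeners = new CopyOnWriteArrayList<>();

    void addListener(IntConsumer lsnr) {
        listeners.add(lsnr);
    }

    // All listeners see the event; none is designated as responsible for
    // stopping or killing the node, and any of them may block the caller.
    void fire(int evtType) {
        for (IntConsumer lsnr : listeners)
            lsnr.accept(evtType);
    }
}
```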

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-16, Dmitriy Setrakyan
Thanks Andrey! I have added a few comments to the IEP-14 page. D. On Fri, Mar 16, 2018 at 6:44 AM, Andrey Gura wrote: > Hi! Thank you all for your opinions and ideas! While reading the thread I came to two important conclusions: 1. The proposed API should be changed

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-16, Andrey Gura
Hi! Thank you all for your opinions and ideas! While reading the thread I came to two important conclusions: 1. The proposed API should be changed because an enumeration of possible actions is a bad idea. A cleaner and simpler design should allow the user to provide a failure handler implementation with custom logic
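
A handler-based API of the kind Andrey outlines could be sketched as follows. FailureType, FailureHandlerSketch and RecordingHandler are hypothetical names, and the boolean return value ("is the node unusable afterwards?") is an assumed contract, not something stated in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the user plugs in failure-handling logic instead of
// choosing from an enumeration of actions. Not a confirmed Ignite API.
enum FailureType { SYSTEM_WORKER_TERMINATION, CRITICAL_ERROR }

@FunctionalInterface
interface FailureHandlerSketch {
    /**
     * Called when a critical failure is detected.
     *
     * @return true if the handler considers the node unusable afterwards (assumed contract).
     */
    boolean onFailure(FailureType type, Throwable cause);
}

// Example of custom logic: record every failure, but treat only a dead system
// worker as fatal. A production handler might stop the node or halt the JVM.
final class RecordingHandler implements FailureHandlerSketch {
    final List<String> log = new ArrayList<>();

    @Override
    public boolean onFailure(FailureType type, Throwable cause) {
        log.add(type + ": " + cause.getMessage());
        return type == FailureType.SYSTEM_WORKER_TERMINATION;
    }
}
```

The point of this shape is that the policy lives in user code: an embedded deployment can log and stop the node, while a standalone one can halt the JVM, without the API having to enumerate every action.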

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-15, Dmitriy Setrakyan
On Thu, Mar 15, 2018 at 5:21 AM, Dmitry Pavlov wrote: > Hi Dmitriy, It seems everyone here agrees that killing the process will give a more guaranteed result. The question is that the majority in the community does not consider this acceptable in case Ignite

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-15, Dmitry Pavlov
Hi Dmitriy, It seems everyone here agrees that killing the process will give a more guaranteed result. The question is that the majority in the community does not consider this acceptable in case Ignite is started as an embedded library (e.g. from Java, using Ignition.start()). What can help to

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-14, Dmitriy Setrakyan
On Wed, Mar 14, 2018 at 7:12 PM, Andrey Kornev wrote: > I'm not disagreeing with you, Dmitriy. What I'm trying to say is that if we assume that a serious enough bug or some environmental issue prevents an Ignite node from functioning correctly, then it's only

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-14, Andrey Kornev
On Wed, Mar 14, 2018 at 3:36 PM, Andrey Kornev wrote: > If I were the one responsible for running Ignite-based applications (be it embedded or standalone Ignite

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-14, Dmitriy Setrakyan
On Wed, Mar 14, 2018 at 3:36 PM, Andrey Kornev wrote: > If I were the one responsible for running Ignite-based applications (be it embedded or standalone Ignite) in my company's datacenter, I'd prefer the application nodes simply make their current state readily

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-14, Dmitriy Setrakyan
On Tue, Mar 13, 2018 at 11:17 PM, Nick Pordash wrote: > I can tell you as a user that if any library I was using in my application called System.exit without my consent, it would result in a lot of frustration. If Ignite enters an unrecoverable state then I think that is

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-13, Vladimir Ozerov
As far as shutdown goes, what we need to implement is a “hard shutdown” mode. This is when we first close all network sockets, then cancel all registered futures. This would be enough to unblock the cluster and local user threads. On Wed, 14 Mar 2018 at 8:40, Vladimir Ozerov wrote:
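
The two steps of the proposed "hard shutdown" can be sketched as below, assuming the node tracks its network resources as Closeable objects and its pending operations as CompletableFuture instances. HardShutdown and its register methods are hypothetical, not Ignite internals.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch of a "hard shutdown": no graceful protocol, just
// (1) close network resources so remote nodes stop waiting on this node,
// (2) cancel pending futures so blocked local user threads are released.
final class HardShutdown {
    private final List<Closeable> sockets = new CopyOnWriteArrayList<>();
    private final List<CompletableFuture<?>> futures = new CopyOnWriteArrayList<>();

    void registerSocket(Closeable sock) { sockets.add(sock); }

    void registerFuture(CompletableFuture<?> fut) { futures.add(fut); }

    void run() {
        for (Closeable sock : sockets) {
            try {
                sock.close();
            } catch (IOException ignored) {
                // Best effort: keep closing the remaining sockets.
            }
        }
        // Threads blocked in get()/join() receive a CancellationException.
        for (CompletableFuture<?> fut : futures)
            fut.cancel(false);
    }
}
```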

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-13, Vladimir Ozerov
Valya, this is very easy to answer: if CommandLineStartup is used, then it is a standalone node. In all other cases it is embedded. If node shutdown hangs, just let it continue hanging, so that application admins are able to decide on their own what to do next. Someone would want to get the

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-13, Nikolay Izhikov
Dmitriy, I think you and the other participants of the discussion are talking about different cases. Maybe it would be useful to look at specific cases and discuss each of them separately? Looking at the IEP page, I see the following: ``` File IO errors. Usually IOExceptions thrown by read/write operations on

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-13, Nick Pordash
I can tell you as a user that if any library I was using in my application called System.exit without my consent, it would result in a lot of frustration. If Ignite enters an unrecoverable state then I think that is something that should be observable locally, similar to node segmentation, and then

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-13, Valentin Kulichenko
Ivan, if the grid hangs, graceful shutdown would most likely hang as well. You can almost never recover from a bad state using graceful procedures. I agree that we should not create two defaults, especially in this case. It's not even strictly defined what an embedded node is in Ignite. For example, if

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-13, Ivan Rakov
One more note: "kill if standalone, stop if embedded" differs from what you are suggesting, "try graceful, then kill process regardless", only in the case when graceful shutdown hangs. Do we have an understanding of how often graceful shutdown hangs? Obviously, a *grid hang* is a common case, but it
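
Ivan's observation, that the two policies only diverge when graceful shutdown hangs, can be made concrete with a timeout. This minimal sketch runs a graceful stop on a daemon thread and invokes a hard action if the stop does not finish in time. StopThenHalt is a hypothetical helper; in a real deployment the hard action would be something like Runtime.getRuntime().halt(1), which skips shutdown hooks, while an embedded "stop only" policy would simply not supply one.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of "try graceful, then kill regardless".
final class StopThenHalt {
    /** @return true if the graceful stop completed within the timeout. */
    static boolean stop(Runnable gracefulStop, Runnable halt, long timeoutMs) {
        ExecutorService exec = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "graceful-stop");
            t.setDaemon(true); // a hung stop must not keep the JVM alive
            return t;
        });
        try {
            Future<?> done = exec.submit(gracefulStop);
            done.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            halt.run(); // graceful stop hung or threw
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            halt.run();
            return false;
        } finally {
            exec.shutdownNow(); // interrupts a still-running graceful stop
        }
    }
}
```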

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-13, Dmitriy Setrakyan
On Tue, Mar 13, 2018 at 7:13 PM, Ivan Rakov wrote: > I would just like to add my +1 for the "kill if standalone, stop if embedded" default option. My arguments: 1) Regarding "If Ignite hangs - it will likely be impossible to stop": unfortunately, it's true that Ignite

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-13, Ivan Rakov
I would just like to add my +1 for the "kill if standalone, stop if embedded" default option. My arguments: 1) Regarding "If Ignite hangs - it will likely be impossible to stop": unfortunately, it's true that Ignite can hang during the stop procedure. However, most of the failures described under IEP-14

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-13, Dmitriy Setrakyan
On Tue, Mar 13, 2018 at 6:55 PM, Dmitry Pavlov wrote: > What do you think about making stop the default for all cases? Kill is configurable. We can consider enforcing socket closing for 'stop'. This will allow the rest of the cluster to ignore a hung node. Dmitriy, I see that

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-13, Dmitry Pavlov
What do you think about making stop the default for all cases? Kill is configurable. We can consider enforcing socket closing for 'stop'. This will allow the rest of the cluster to ignore a hung node. On Wed, 14 Mar 2018, 1:48, Dmitriy Setrakyan wrote: > Guys, I do not think there is an

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-13, Dmitriy Setrakyan
Guys, I do not think there is an understanding here. If Ignite hangs - it will likely be impossible to stop. So if you are suggesting "stop if embedded", you might as well suggest "do nothing if embedded". I have seen many Ignite deployments, embedded or not, large and small, and in all those

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-13, Denis Magda
+1 for "kill if standalone, stop if embedded" behavior. If practice shows that the node should be killed regardless of the mode, then it will be an easy change. Now we are just guessing, and common sense suggests going for "kill if standalone, stop if embedded" until we get feedback. - Denis

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-13, Dmitry Pavlov
You are suggesting killing a process that was not started by Ignite, aren't you? It is more consistent to stop only those processes that are under Ignite's control, e.g. started from ignite.sh; that is OK for me. If we release 'kill by default' as part of 2.5, we will end up with 2.6

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-13, Dmitriy Setrakyan
Dmitriy, I think everyone is suggesting that stopping the node will likely be impossible if Ignite is frozen. Moreover, it is very likely that all other apps are frozen too. My comments are below... On Tue, Mar 13, 2018 at 9:12 AM, Dmitry Pavlov wrote: > Please consider

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-13, Dmitry Pavlov
Please consider that a user application may use Ignite as an optional cache for some low-priority feature, while the main logic works well without Ignite. As a former Ignite user, I can say that this is quite a real case. A second real case is using several WAR files within one application server,

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-13, Dmitriy Setrakyan
On Tue, Mar 13, 2018 at 8:16 AM, Dmitry Pavlov wrote: > Dmitriy, an alternative is "kill if standalone, stop if embedded". The user will still be able to set something like -DNODE_CRASH_ACTION="kill" if ignite.sh is not used and the user accepts that the whole process

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-13, Andrey Kuznetsov
The most doubtful thing is 'stopping'. What if the node does not respond due to a critical failure? 2018-03-13 15:16 GMT+03:00 Dmitry Pavlov: > Dmitriy, an alternative is "kill if standalone, stop if embedded". The user will still be able to set something like

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-13, Dmitry Pavlov
Dmitriy, an alternative is "kill if standalone, stop if embedded". The user will still be able to set something like -DNODE_CRASH_ACTION="kill" if ignite.sh is not used and the user accepts that the whole process would be killed if the node crashes. The default would be 'node stop', but not hang up
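
Dmitry's proposal can be expressed as a small resolution rule. This is a sketch under stated assumptions: NODE_CRASH_ACTION is only the property name floated in the thread ("something like"), not an existing Ignite setting, and the standalone flag is assumed to be computed by the caller, for example using Vladimir's rule that a node started through CommandLineStartup is standalone.

```java
import java.util.Locale;

// Hypothetical sketch of "kill if standalone, stop if embedded" with an
// explicit user override. Not an existing Ignite API or property.
enum CrashAction { KILL, STOP }

final class CrashActionResolver {
    static CrashAction resolve(boolean standalone, String override) {
        // An explicit user choice always wins over the default.
        if (override != null && !override.trim().isEmpty())
            return CrashAction.valueOf(override.trim().toUpperCase(Locale.ROOT));

        // Standalone: the process was launched for Ignite alone (e.g. ignite.sh),
        // so killing it cannot take unrelated application code down with it.
        return standalone ? CrashAction.KILL : CrashAction.STOP;
    }

    // Reads the (assumed) -DNODE_CRASH_ACTION=... system property.
    static CrashAction resolve(boolean standalone) {
        return resolve(standalone, System.getProperty("NODE_CRASH_ACTION"));
    }
}
```

An unrecognized override value makes Enum.valueOf throw IllegalArgumentException, so a misconfiguration fails fast instead of silently falling back to the default.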

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-13, Dmitriy Setrakyan
Guys, I do not understand the alternative. If Ignite is frozen and causes the whole grid to freeze, how can we justify not killing it? Would users rather have their applications freeze? I would consider real-life use cases here. Can someone present a real-life example where keeping a frozen grid node

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-13, Alexey Goncharuk
I also like "kill if standalone, stop if embedded" by default. A user can change it to kill for embedded mode, but it will be a controlled, safe choice. 2018-03-13 11:26 GMT+03:00 Vladimir Ozerov: > +1 for "kill if standalone, stop if embedded". We should never kill a

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-13, Vladimir Ozerov
+1 for "kill if standalone, stop if embedded". We should never kill a process in embedded mode because it might be disastrous for the user application. On Tue, Mar 13, 2018 at 10:41 AM, Dmitry Pavlov wrote: > Denis, Dmitriy, I am not sure I agree here; please see a close

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-13, Dmitry Pavlov
Denis, Dmitriy, I am not sure I agree here; please see a close analogue, the JVM itself and its ExitOnOutOfMemoryError parameter, which is not enabled by default. If a server node is started from an sh script, kill is OK for me, as the process is controlled only by Ignite. It is sufficient to add an option to override the default

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-12, Dmitriy Setrakyan
On Tue, Mar 13, 2018 at 1:18 AM, Andrey Kornev wrote: > I believe the only reasonable way to handle a critical system failure (as it is defined in the IEP) is a JVM halt (not a graceful exit/shutdown!). The sooner the better, the lesser the impact. There’s simply no way

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-12, Dmitriy Setrakyan
On Mon, Mar 12, 2018 at 5:12 PM, Denis Magda wrote: > Dmitriy, an Ignite client node is usually used in embedded mode. By killing the whole process the node is running in, we're going to kill the entire application. That doesn't sound like a good plan. That's why my

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-12, Denis Magda
Dmitriy, an Ignite client node is usually used in embedded mode. By killing the whole process the node is running in, we're going to kill the entire application. That doesn't sound like a good plan. That's why my suggestion is to try to kill the node somehow rather than the whole

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-12, Dmitriy Setrakyan
Denis, what is the difference between killing the process and killing the node and the process? D. On Mon, Mar 12, 2018 at 12:03 PM, Denis Magda wrote: > Guys, I would make a decision depending on the type of the problematic node: - If it's a *server node*, then

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-12, Denis Magda
Guys, I would make a decision depending on the type of the problematic node: - If it's a *server node*, then let's kill the process simply because the node usually owns the whole process. I don't see a practical reason why a user would want to run 2 server nodes in a single process. - If it's

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-12, Dmitry Pavlov
Hi Andrey, Igniters, Thank you for starting this topic, because this is a really important decision. If Ignite is started within an application server alongside other applications, JVM termination will kill all the services running there. So I suggest this option should not be the default. We can add this option

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-12, Andrey Kuznetsov
To my mind, the default action should be as severe as possible, that is, entire JVM termination, since we are dealing with critical errors. In the case of some custom setup (e.g. different cluster nodes in one JVM), the failure response action should be configured explicitly. 2018-03-12 12:32 GMT+03:00

IEP-14: Ignite failures handling (Discussion)

2018-03-12, Andrey Gura
Igniters! We are working on the proposal described in IEP-14 Ignite failures handling [1] and it's time to discuss it with the community (although we should have done this earlier). The most important question: what should the default behaviour be in case of failure? There are 4 actions: 1. Restart JVM