Yakov,
I agree with Andrey that a separate abstraction for failure handling makes
sense.
First, using event listeners for this kind of response allows users to
install multiple listeners, which may be invoked in an unpredictable order.
This looks error-prone to me. Second, we may add an additiona
Andrey, I understand your point but you are trying to build one more
mechanism and introduce abstractions that are already here. Again, please
take a look at segmentation policy and event types we already have.
Thanks!
Yakov
Yakov,
DiscoveryWorker is itself a critical worker and could be terminated or
blocked by a user-provided listener. So a specific abstraction for failure
handling is a more robust way to solve the problem because it doesn't
depend on other components.
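
The contrast drawn above, one dedicated handler rather than several event listeners invoked in an unspecified order, can be sketched roughly as follows. This is only an illustration under stated assumptions: `FailureType`, `FailureHandler`, and `FailureProcessor` are hypothetical names invented for this example and are not taken from the IEP-14 page or from the Ignite code base.

```java
import java.util.ArrayList;
import java.util.List;

public class FailureHandlingSketch {
    /** Hypothetical failure categories; the names are illustrative only. */
    enum FailureType { SYSTEM_WORKER_TERMINATION, CRITICAL_ERROR }

    /** Hypothetical single-purpose abstraction for reacting to a critical failure. */
    interface FailureHandler {
        /** Returns true if the node should be treated as invalidated after this failure. */
        boolean onFailure(FailureType type, Throwable cause);
    }

    /**
     * Holds exactly one handler and calls it directly on the thread that detected
     * the failure, so it does not rely on an event-dispatching worker being alive.
     */
    static final class FailureProcessor {
        private final FailureHandler handler;

        FailureProcessor(FailureHandler handler) {
            this.handler = handler;
        }

        boolean process(FailureType type, Throwable cause) {
            return handler.onFailure(type, cause);
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();

        // A user-supplied handler with custom logic: record the failure and invalidate the node.
        FailureProcessor processor = new FailureProcessor((type, cause) -> {
            log.add(type + ": " + cause.getMessage());
            return true;
        });

        boolean invalidated = processor.process(
            FailureType.SYSTEM_WORKER_TERMINATION,
            new IllegalStateException("discovery worker terminated"));

        System.out.println(log + " invalidated=" + invalidated);
    }
}
```

Because there is a single handler, there is no ordering question between competing listeners, which is the concern raised earlier in the thread.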
On Tue, Mar 20, 2018 at 1:33 PM, Yakov Zhdanov wro
If Java runs into an OOME, then you cannot guarantee anything, including
calling runtime.halt().
My point is about a consistent approach throughout the project. I think
developing a new mechanism with a separate interface is incorrect.
Yakov
On Mon, Mar 19, 2018 at 2:24 PM, Yakov Zhdanov wrote:
> Andrey Gura,
>
> Why should we have any FailureHandler abstraction? We already have it -
> this is EventListener. In my view it is better (and cleaner design) to add
> events (similar to, for
> example, org.apache.ignite.events.EventType#EVT
Andrey Gura,
Why should we have any FailureHandler abstraction? We already have it -
this is EventListener. In my view it is better (and cleaner design) to add
events (similar to, for
example, org.apache.ignite.events.EventType#EVT_NODE_SEGMENTED) like
EVT_IGNITE_OOME, EVT_SYS_WORKER_FAILED and fi
Thanks Andrey! I have added a few comments to the IEP-14 page.
D.
On Fri, Mar 16, 2018 at 6:44 AM, Andrey Gura wrote:
> Hi!
>
> Thank you all for your opinions and ideas!
>
> While reading the thread I made two important conclusions:
>
> 1. Proposed API should be changed because possible action
Hi!
Thank you all for your opinions and ideas!
While reading the thread I made two important conclusions:
1. The proposed API should be changed because an enumeration of possible
actions is a bad idea. A cleaner and simpler design should allow the user to
provide a failure handler implementation with custom logic of
On Thu, Mar 15, 2018 at 5:21 AM, Dmitry Pavlov
wrote:
> Hi Dmitriy,
>
> It seems, here everyone agrees that killing the process will give a more
> guaranteed result. The question is that the majority in the community does
> not consider this to be acceptable in case Ignite is started as embedded
Hi Dmitriy,
It seems that everyone here agrees that killing the process will give a more
guaranteed result. The question is that the majority in the community does
not consider this to be acceptable in case Ignite is started as an embedded
lib (e.g. from Java, using Ignition.start())
What can help to ac
On Wed, Mar 14, 2018 at 7:12 PM, Andrey Kornev
wrote:
> I'm not disagreeing with you, Dmitriy.
>
> What I'm trying to say is that if we assume that a serious enough bug or
> some environmental issue prevents Ignite node from functioning correctly,
> then it's only logical to assume that Ignite pr
, 2018 6:22 PM
To: dev@ignite.apache.org
Subject: Re: IEP-14: Ignite failures handling (Discussion)
On Wed, Mar 14, 2018 at 3:36 PM, Andrey Kornev
wrote:
> If I were the one responsible for running Ignite-based applications (be it
> embedded or standalone Ignite) in my company's datacen
On Wed, Mar 14, 2018 at 3:36 PM, Andrey Kornev
wrote:
> If I were the one responsible for running Ignite-based applications (be it
> embedded or standalone Ignite) in my company's datacenter, I'd prefer the
> application nodes simply make their current state readily available to
> external tools
On Tue, Mar 13, 2018 at 11:17 PM, Nick Pordash
wrote:
> I can tell you as a user that if any library I was using in my application
> called System.exit without my consent, it would result in a lot of frustration.
>
> If Ignite enters an unrecoverable state then I think that is something that
> should
ly on their own tooling for handling
failures.
Regards
Andrey
From: Vladimir Ozerov
Sent: Tuesday, March 13, 2018 10:43 PM
To: dev@ignite.apache.org
Subject: Re: IEP-14: Ignite failures handling (Discussion)
As far as shutdown, what we need to implement is “hard
As for shutdown, what we need to implement is a “hard shutdown” mode. This
is when we first close all network sockets, then cancel all registered
futures. This would be enough to unblock the cluster and local user threads.
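
The two-step order described above (close network sockets first, then cancel registered futures) could look roughly like the sketch below. It is an assumption-laden illustration, not Ignite code: `hardShutdown` is a hypothetical helper, and the sockets and futures are passed in as plain lists rather than taken from any real node internals.

```java
import java.io.Closeable;
import java.io.IOException;
import java.net.Socket;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;

public class HardShutdownSketch {
    /**
     * Hypothetical "hard shutdown": closes every socket first so that remote
     * peers stop waiting on this node, then cancels every registered future so
     * that local threads blocked on them are released.
     *
     * @return the number of futures that were actually cancelled.
     */
    static int hardShutdown(List<? extends Closeable> sockets, List<? extends Future<?>> futures) {
        // Step 1: close network resources; a failure to close one must not stop the rest.
        for (Closeable socket : sockets) {
            try {
                socket.close();
            } catch (IOException ignored) {
                // Best effort: the node is going down anyway.
            }
        }

        // Step 2: cancel pending futures, interrupting their workers if they are running.
        int cancelled = 0;
        for (Future<?> future : futures) {
            if (future.cancel(true)) {
                cancelled++;
            }
        }
        return cancelled;
    }

    public static void main(String[] args) {
        Socket socket = new Socket(); // unconnected socket, used only to demonstrate close()
        CompletableFuture<String> pending = new CompletableFuture<>();

        int cancelled = hardShutdown(
            Collections.singletonList(socket),
            Collections.singletonList(pending));

        System.out.println("socketClosed=" + socket.isClosed()
            + " cancelled=" + cancelled
            + " futureCancelled=" + pending.isCancelled());
    }
}
```

Closing sockets before cancelling futures follows the order given in the message; whether that order matters in practice depends on details the thread does not spell out.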
On Wed, Mar 14, 2018 at 8:40, Vladimir Ozerov wrote:
> Valya,
>
> This is very eas
Valya,
This is very easy to answer - if CommandLineStartup is used, then it is a
standalone node. In all other cases it is embedded.
If node shutdown hangs - just let it continue hanging, so that application
admins are able to decide on their own what to do next. Someone would want
to get the stack
Dmitriy.
I think you and the other participants of the discussion are talking about
different cases.
Maybe it would be useful to look at specific cases and discuss each of them
separately?
I look at the IEP page and see the following:
```
File IO errors. Usually IOException's threw by read/write operations on fi
I can tell you as a user that if any library I was using in my application
called System.exit without my consent, it would result in a lot of frustration.
If Ignite enters an unrecoverable state then I think that is something that
should be observable locally, similar to node segmentation and then the
Ivan,
If the grid hangs, graceful shutdown would most likely hang as well. You can
almost never recover from a bad state using graceful procedures.
I agree that we should not create two defaults, especially in this case.
It's not even strictly defined what an embedded node is in Ignite. For
example, if
One more note: "kill if standalone, stop if embedded" differs from what
you are suggesting, "try graceful, then kill process regardless", only in
the case when graceful shutdown hangs.
Do we understand how often graceful shutdown hangs?
Obviously, a *grid hang* is a frequent case, but it shouldn
On Tue, Mar 13, 2018 at 7:13 PM, Ivan Rakov wrote:
> I just would like to add my +1 for "kill if standalone, stop if embedded"
> default option. My arguments:
>
> 1) Regarding "If Ignite hangs - it will likely be impossible to stop":
> Unfortunately, it's true that Ignite can hang during stop pro
I just would like to add my +1 for "kill if standalone, stop if
embedded" default option. My arguments:
1) Regarding "If Ignite hangs - it will likely be impossible to stop":
Unfortunately, it's true that Ignite can hang during stop procedure.
However, most of the failures described under IEP-14 (s
On Tue, Mar 13, 2018 at 6:55 PM, Dmitry Pavlov
wrote:
> What do you think about stop being the default for all cases?
>
> Kill is configurable.
>
> We can consider enforcing socket close for 'stop'. This will allow the rest
> of the cluster to ignore the hung node.
>
Dmitriy, I see that you cannot come to terms
What do you think about stop being the default for all cases?
Kill is configurable.
We can consider enforcing socket close for 'stop'. This will allow the rest
of the cluster to ignore the hung node.
On Wed, Mar 14, 2018 at 1:48, Dmitriy Setrakyan wrote:
> Guys, I do not think there is an understanding here. If Ignite
Guys, I do not think there is an understanding here. If Ignite hangs - it
will likely be impossible to stop. So if you are suggesting "stop if
embedded", you might as well suggest "do nothing if embedded".
I have seen many Ignite deployments, embedded or not, large and small, and
in all those depl
+1 for "kill if standalone, stop if embedded" behavior. If practice
shows that the node should be killed regardless of the mode, then it will
be an easy change. Now we are just guessing, and common sense suggests
going for "kill if standalone, stop if embedded" until we get feedback.
-
Denis
You are suggesting killing a process that was not started by Ignite,
aren't you?
It is more consistent to stop only those processes that are started under
Ignite's control, e.g. from ignite.sh - here it is OK for me.
If we release 'kill by default' as part of 2.5, we will end up with 2.6
e
Dmitriy,
I think everyone is suggesting that stopping the node will likely be
impossible if Ignite is frozen. Moreover, it is very likely that all other
apps are frozen too.
My comments are below...
On Tue, Mar 13, 2018 at 9:12 AM, Dmitry Pavlov
wrote:
> Please consider that user application m
Please consider that a user application may use Ignite as an optional cache
for some low-priority feature, while the main logic functions well without
Ignite. I can say, as an Ignite user in the past, that it is quite a real case.
A second real case is using several war files within one application server,
ru
On Tue, Mar 13, 2018 at 8:16 AM, Dmitry Pavlov
wrote:
> Dmitriy, the alternative is "kill if standalone, stop if embedded"
> The user will still be able to set something like
> -DNODE_CRASH_ACTION="kill"
> if ignite.sh is not used and the user accepts that the whole process
> would be killed if nod
The most doubtful thing is 'stopping'. What if the node does not respond due
to a critical failure?
2018-03-13 15:16 GMT+03:00 Dmitry Pavlov :
> Dmitriy, the alternative is "kill if standalone, stop if embedded"
>
> The user will still be able to set something like
> -DNODE_CRASH_ACTION="kill"
> if ignite.sh i
Dmitriy, the alternative is "kill if standalone, stop if embedded"
The user will still be able to set something like
-DNODE_CRASH_ACTION="kill"
if ignite.sh is not used and the user accepts that the whole process
would be killed if the node has crashed.
Default would be 'node stop', but not hang up infine
Guys, I do not understand the alternative. If Ignite is frozen and causes
the whole grid to freeze, how can we justify not killing it? Would users
rather have their applications freeze?
I would consider real-life use cases here. Can someone present a real-life
example where keeping a frozen grid node aro
I also like "kill if standalone, stop if embedded" by default. A user can
change it to kill for embedded mode, but it will be a controlled, safe
choice.
2018-03-13 11:26 GMT+03:00 Vladimir Ozerov :
> +1 for "kill if standalone, stop if embedded". We should never kill a
> process in embedded node be
+1 for "kill if standalone, stop if embedded". We should never kill the
process of an embedded node because it might be disastrous for the user
application.
On Tue, Mar 13, 2018 at 10:41 AM, Dmitry Pavlov
wrote:
> Denis, Dmitriy, I am not sure I agree here, please see close analogue - JVM
> itself, and i
Denis, Dmitriy, I am not sure I agree here. Please see a close analogue: the
JVM itself and its parameter ExitOnOutOfMemoryError, which is not the default.
If a server node is started from an sh script, kill is OK for me, as the
process is controlled only by Ignite. It is sufficient to add an option to
override default f
On Tue, Mar 13, 2018 at 1:18 AM, Andrey Kornev
wrote:
> I believe the only reasonable way to handle a critical system failure (as
> it is defined in the IEP) is a JVM halt (not a graceful exit/shutdown!).
> The sooner - the better, lesser impact. There’s simply no way to reason
> about the state
degrading the overall system
stability and availability all the while.
Andrey
_
From: Dmitriy Setrakyan
Sent: Monday, March 12, 2018 5:23 PM
Subject: Re: IEP-14: Ignite failures handling (Discussion)
To:
On Mon, Mar 12, 2018 at 5:12 PM, Denis Magda wrote:
> Dmit
On Mon, Mar 12, 2018 at 5:12 PM, Denis Magda wrote:
> Dmitriy,
>
> Ignite client node is usually used in the embedded mode. By killing the
> whole process, the node is running in, we're going to kill the entire
> application. That doesn't sound like a good plan. That's why my suggestion
> is to t
Dmitriy,
Ignite client node is usually used in the embedded mode. By killing the
whole process, the node is running in, we're going to kill the entire
application. That doesn't sound like a good plan. That's why my suggestion
is to try to kill the node somehow rather than the whole process
Denis, what is the difference between killing the process and killing the
node and the process?
D.
On Mon, Mar 12, 2018 at 12:03 PM, Denis Magda wrote:
> Guys,
>
> I would make a decision depending on a type of the problematic node:
>
>- If it's a *server node*, then let's kill the process
Guys,
I would make a decision depending on a type of the problematic node:
- If it's a *server node*, then let's kill the process simply because
the node usually owns the whole process. Don't see a practical reason why a
user wants to run 2 server nodes in a single process.
- If it's
Hi Andrey, Igniters,
Thank you for starting this topic, because this is a really important
decision.
JVM termination, in case Ignite is started within an application server
alongside other applications, will kill all services started.
So I suggest this option is not the default. We can add this option
(action="J
To my mind, the default action should be as severe as possible, since we
deal with critical errors, that is, entire JVM termination. In the case of
some custom setup (e.g. different cluster nodes in one JVM), the failure
response action should be configured explicitly.
2018-03-12 12:32 GMT+03:00 Andrey