Re: Triggering rebalancing on timeout or manually if the baseline topology is not reassembled

2018-04-18 Thread Pavel Kovalenko
Ivan,

I think your version is better because it handles cases where several nodes
leave sequentially, so there is no need to shrink the baseline for each node
that left. The new version also saves some resources by using the internal scheduler.

2018-04-18 20:41 GMT+03:00 Ivan Rakov :

> I can suggest an improvement to BaselineWatcher by Pavel. I've added a new
> version to https://issues.apache.org/jira/browse/IGNITE-8241 comments.
> Pavel, what do you think?
>
> Best Regards,
> Ivan Rakov
>

Re: Triggering rebalancing on timeout or manually if the baseline topology is not reassembled

2018-04-18 Thread Ivan Rakov
I can suggest an improvement to BaselineWatcher by Pavel. I've added a 
new version to https://issues.apache.org/jira/browse/IGNITE-8241 comments.

Pavel, what do you think?

Best Regards,
Ivan Rakov

On 17.04.2018 20:47, Denis Magda wrote:

Thanks, Pavel!

Alexey, Ivan, could you check that there are no pitfalls in the example
and that it can be used as a template for our users?
https://issues.apache.org/jira/secure/attachment/12919452/BaselineWatcher.java

--
Denis


Re: Triggering rebalancing on timeout or manually if the baseline topology is not reassembled

2018-04-18 Thread Dmitriy Setrakyan
On Tue, Apr 17, 2018 at 4:38 PM, Denis Magda  wrote:

> Dmitriy,
>
> We don't want to disable the baseline topology for the scenario discussed
> here. The goal is to make it more flexible by triggering the rebalancing in
> some circumstances.
>
> As for the SAFE or AGGRESSIVE policies, I haven't seen the discussion on the
> dev list, so I'm not sure what they're intended for (use case, scenario, behavior).
>

The idea was to have a mode for disabling BLT, so that users would not have to
write callbacks in code to force the rebalance. Essentially, if we had
this policy, you would not need to provide this example.


Re: Triggering rebalancing on timeout or manually if the baseline topology is not reassembled

2018-04-17 Thread Denis Magda
Dmitriy,

We don't want to disable the baseline topology for the scenario discussed
here. The goal is to make it more flexible by triggering the rebalancing in
some circumstances.

As for the SAFE or AGGRESSIVE policies, I haven't seen the discussion on the
dev list, so I'm not sure what they're intended for (use case, scenario, behavior).

--
Denis

On Tue, Apr 17, 2018 at 11:55 AM, Dmitriy Setrakyan 
wrote:

> On Tue, Apr 17, 2018 at 10:47 AM, Denis Magda  wrote:
>
> > Thanks, Pavel!
> >
> > Alexey, Ivan, could you check that there are no pitfalls in the example
> > and that it can be used as a template for our users?
> > https://issues.apache.org/jira/secure/attachment/12919452/BaselineWatcher.java
>
>
> Denis, I think the proper fix is to add the ability to disable
> BLT (baseline topology) in the product. I remember seeing a discussion about having SAFE and
> AGGRESSIVE (or SAFE and NONE) policies for BLT. Do we have a ticket for it?
>
> D.
>


Re: Triggering rebalancing on timeout or manually if the baseline topology is not reassembled

2018-04-17 Thread Dmitriy Setrakyan
On Tue, Apr 17, 2018 at 10:47 AM, Denis Magda  wrote:

> Thanks, Pavel!
>
> Alexey, Ivan, could you check that there are no pitfalls in the example
> and that it can be used as a template for our users?
> https://issues.apache.org/jira/secure/attachment/12919452/BaselineWatcher.java


Denis, I think the proper fix is to add the ability to disable
BLT (baseline topology) in the product. I remember seeing a discussion about having SAFE and
AGGRESSIVE (or SAFE and NONE) policies for BLT. Do we have a ticket for it?

D.


Re: Triggering rebalancing on timeout or manually if the baseline topology is not reassembled

2018-04-17 Thread Denis Magda
Thanks, Pavel!

Alexey, Ivan, could you check that there are no pitfalls in the example
and that it can be used as a template for our users?
https://issues.apache.org/jira/secure/attachment/12919452/BaselineWatcher.java

--
Denis

On Tue, Apr 17, 2018 at 10:40 AM, Pavel Kovalenko 
wrote:

> Denis,
>
> I've attached an example of how to manage the baseline automatically (it's
> named BaselineWatcher). It's just a concept and doesn't cover all possible
> cases, but it might be good for a start.
>

Re: Triggering rebalancing on timeout or manually if the baseline topology is not reassembled

2018-04-17 Thread Pavel Kovalenko
Denis,

I've attached an example of how to manage the baseline automatically (it's
named BaselineWatcher). It's just a concept and doesn't cover all possible
cases, but it might be good for a start.

2018-04-13 2:14 GMT+03:00 Denis Magda :

> Pavel, thanks for the suggestions. They would definitely work out. I would
> document the one with the event subscription:
> https://issues.apache.org/jira/browse/IGNITE-8241
>
> Could you help prepare a sample code snippet with such a listener that
> will be added to the doc? I know that there are some caveats related to the
> way such an event has to be processed.
>
> Ivan, I truly like your idea. Alex G., what's your thought on this?
>
> --
> Denis
>


Re: Triggering rebalancing on timeout or manually if the baseline topology is not reassembled

2018-04-12 Thread Denis Magda
Pavel, thanks for the suggestions. They would definitely work out. I would
document the one with the event subscription:
https://issues.apache.org/jira/browse/IGNITE-8241

Could you help prepare a sample code snippet with such a listener that
will be added to the doc? I know that there are some caveats related to the
way such an event has to be processed.

Ivan, I truly like your idea. Alex G., what's your thought on this?

--
Denis

On Thu, Apr 12, 2018 at 2:22 PM, Ivan Rakov  wrote:

> Guys,
>
> I also heard complaints about the absence of an option to automatically
> change the baseline topology. They absolutely make sense.
> What Pavel suggested will work as a workaround. I think, in future
> releases we should give the user an option to enable a similar behavior via
> the Ignite configuration.
> It may be called a "Baseline Topology change policy". I see it as a rule-based
> language that allows specifying conditions of BLT change using several
> parameters - a timeout and the minimum allowed number of partition copies left
> (maybe this option should also be provided at the per-cache-group level).
> The policy could also specify conditions for including new nodes in the BLT
> if they are present - node attribute filters and so on.
>
> What do you think?
>
> Best Regards,
> Ivan Rakov


Re: Triggering rebalancing on timeout or manually if the baseline topology is not reassembled

2018-04-12 Thread Ivan Rakov

Guys,

I also heard complaints about the absence of an option to automatically
change the baseline topology. They absolutely make sense.
What Pavel suggested will work as a workaround. I think, in future
releases we should give the user an option to enable a similar behavior via
the Ignite configuration.
It may be called a "Baseline Topology change policy". I see it as a
rule-based language that allows specifying conditions of BLT change
using several parameters - a timeout and the minimum allowed number of
partition copies left (maybe this option should also be provided at the
per-cache-group level). The policy could also specify conditions for including
new nodes in the BLT if they are present - node attribute filters
and so on.


What do you think?

Best Regards,
Ivan Rakov
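As a rough illustration of the proposed (not yet existing) policy, the two rules mentioned above, a downtime timeout and a minimum allowed number of remaining partition copies, could combine as follows. This is a hypothetical sketch in plain Java; `BltChangePolicy` and its members are illustrative names, not Ignite API:

```java
/** Hypothetical sketch of the proposed BLT change policy (not an existing Ignite API). */
class BltChangePolicy {
    final long timeoutMs;    // how long a node may stay down before removal is considered
    final int minCopiesLeft; // minimum partition copies that must remain after removal

    BltChangePolicy(long timeoutMs, int minCopiesLeft) {
        this.timeoutMs = timeoutMs;
        this.minCopiesLeft = minCopiesLeft;
    }

    /**
     * A node is dropped from the baseline only if it has exceeded the
     * downtime timeout AND enough partition copies would still remain.
     */
    boolean shouldRemove(long downtimeMs, int copiesLeftAfterRemoval) {
        return downtimeMs >= timeoutMs && copiesLeftAfterRemoval >= minCopiesLeft;
    }
}
```

A per-cache-group variant, as suggested above, would simply hold one such policy per cache group instead of a single cluster-wide instance.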

On 12.04.2018 19:41, Pavel Kovalenko wrote:

Denis,

It's just one of the ways to implement it. We can also subscribe to node
join/fail events to properly track the downtime of a node.



Re: Triggering rebalancing on timeout or manually if the baseline topology is not reassembled

2018-04-12 Thread Pavel Kovalenko
Denis,

It's just one of the ways to implement it. We can also subscribe to node
join/fail events to properly track the downtime of a node.
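The event-driven variant can be modeled with a small tracker that records when a node was seen leaving and resets on rejoin. In a real cluster this would be wired to discovery events (e.g. Ignite's EVT_NODE_FAILED / EVT_NODE_LEFT / EVT_NODE_JOINED via a local event listener); the sketch below is plain Java with illustrative names and no Ignite dependency:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

/** Plain-Java model of event-driven downtime tracking (illustrative, not Ignite API). */
class DowntimeTracker {
    // consistent ID -> timestamp (ms) when the node was first seen missing
    private final Map<String, Long> downSince = new ConcurrentHashMap<>();

    /** Called on a node-failed/left event: start the node's downtime timer. */
    void onNodeLeft(String consistentId, long nowMs) {
        downSince.putIfAbsent(consistentId, nowMs);
    }

    /** Called on a node-joined event: the node is back, so reset its timer. */
    void onNodeJoined(String consistentId) {
        downSince.remove(consistentId);
    }

    /** Nodes that have been down for at least {@code timeoutMs}. */
    Set<String> expired(long nowMs, long timeoutMs) {
        return downSince.entrySet().stream()
            .filter(e -> nowMs - e.getValue() >= timeoutMs)
            .map(Map.Entry::getKey)
            .collect(Collectors.toSet());
    }
}
```

The set returned by `expired` would then be removed from the baseline in one shot, rather than shrinking it once per departed node.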

2018-04-12 19:38 GMT+03:00 Pavel Kovalenko :

> Denis,
>
> Using our API we can implement this task as follows:
> Every minute, do the following:
> 1) Get the consistent IDs of all alive server nodes =>
> ignite().context().discovery().aliveServerNodes() => mapToConsistentIds().
> 2) Get the current baseline topology =>
> ignite().cluster().currentBaselineTopology()
> 3) For each node that is in the baseline but not among the alive server
> nodes, check the timeout for this node.
> 4) If the timeout is reached, remove the node from the baseline.
> 5) If the baseline has changed, set the new baseline =>
> ignite().cluster().setNewBaseline()
>
>
> 2018-04-12 2:18 GMT+03:00 Denis Magda :
>
>> Pavel, Val,
>>
>> So, it means that the rebalancing will be initiated only after an
>> administrator removes the failed node from the topology, right?
>>
>> Next, imagine that you are the IT administrator who has to automate
>> triggering the rebalancing if a node fails and is not recovered within 1
>> minute. What would you do, and what does Ignite provide to fulfill the task?
>>
>> --
>> Denis
>>
>> On Wed, Apr 11, 2018 at 1:01 PM, Pavel Kovalenko 
>> wrote:
>>
>> > Denis,
>> >
>> > In the case of an incomplete baseline topology, IgniteCache.rebalance()
>> > will do nothing, because this event doesn't trigger a partitions exchange
>> > or an affinity change, so the states of existing partitions are held.
>> >
>> > 2018-04-11 22:27 GMT+03:00 Valentin Kulichenko <
>> > valentin.kuliche...@gmail.com>:
>> >
>> > > Denis,
>> > >
>> > > In my understanding, in this case you should remove the node from the
>> > > BLT, and that will trigger the rebalancing, no?
>> > >
>> > > -Val
>> > >
>> > > On Wed, Apr 11, 2018 at 12:23 PM, Denis Magda 
>> > wrote:
>> > >
>> > > > Igniters,
>> > > >
>> > > > As we know, rebalancing doesn't happen if one of the nodes goes down,
>> > > > thus shrinking the baseline topology. It complies with our assumption
>> > > > that the node should be recovered soon and there is no need to waste
>> > > > CPU/memory/networking resources of the cluster shifting the data
>> > > > around.
>> > > >
>> > > > However, there are always edge cases. I was reasonably asked how to
>> > > > trigger the rebalancing within the baseline topology manually or on
>> > > > timeout if:
>> > > >
>> > > >    - It's not expected that the failed node will be resurrected in
>> > > >    the near future, and
>> > > >    - It's not likely that the node will be replaced by another one.
>> > > >
>> > > > The question: if I call IgniteCache.rebalance() or configure
>> > > > CacheConfiguration.rebalanceTimeout, will the rebalancing be fired
>> > > > within the baseline topology?
>> > > >
>> > > > --
>> > > > Denis
>> > > >
>> > >
>> >
>>
>
>


Re: Triggering rebalancing on timeout or manually if the baseline topology is not reassembled

2018-04-12 Thread Pavel Kovalenko
Denis,

Using our API, this task can be implemented as follows. Every minute:
1) Get the consistent IDs of all alive server nodes =>
ignite().context().discovery().aliveServerNodes() => mapToConsistentIds().
2) Get the current baseline topology =>
ignite().cluster().currentBaselineTopology()
3) For each node that is in the baseline but not among the alive server
nodes, check the timeout for this node.
4) If the timeout is reached, remove the node from the baseline.
5) If the baseline has changed, set the new baseline =>
ignite().cluster().setNewBaseline()
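
The five steps above can be sketched in code. The helper below is a minimal
sketch: `shrunkBaseline` and its signature are my own invention, and only the
timeout-tracking core (steps 3-4) is shown as a pure, testable function. In a
real watcher, the alive set would be built from the server nodes reported by
IgniteCluster, and a non-null result would be applied via
IgniteCluster#setBaselineTopology (the public counterpart of the
setNewBaseline() pseudocode above).

```java
import java.util.*;

public class BaselineWatcher {
    /**
     * Core of the per-minute check (steps 3-4): returns the baseline with
     * timed-out nodes removed, or null if nothing changed. 'offlineSince'
     * records when each missing node was first observed offline and is
     * updated in place, so pass the same map on every invocation.
     */
    public static List<String> shrunkBaseline(Set<String> aliveIds,
                                              List<String> baselineIds,
                                              Map<String, Long> offlineSince,
                                              long nowMs,
                                              long timeoutMs) {
        List<String> result = new ArrayList<>();
        boolean changed = false;

        for (String id : baselineIds) {
            if (aliveIds.contains(id)) {
                offlineSince.remove(id); // node is back: reset its timer
                result.add(id);
            }
            else {
                long since = offlineSince.computeIfAbsent(id, k -> nowMs);
                if (nowMs - since >= timeoutMs)
                    changed = true;      // timeout reached: drop the node
                else
                    result.add(id);      // still in the grace period: keep it
            }
        }
        return changed ? result : null;  // null => baseline stays as-is
    }

    public static void main(String[] args) {
        Map<String, Long> offlineSince = new HashMap<>();
        Set<String> alive = new HashSet<>(Arrays.asList("A", "B"));
        List<String> baseline = Arrays.asList("A", "B", "C");

        // t = 0: C has just gone offline -> its timer starts, baseline kept.
        System.out.println(shrunkBaseline(alive, baseline, offlineSince, 0, 60_000));
        // t = 60 s: C's timeout is reached -> baseline shrinks to [A, B].
        System.out.println(shrunkBaseline(alive, baseline, offlineSince, 60_000, 60_000));
    }
}
```

Step 1 would populate the alive set from each server node's consistent ID, and
step 5 would call setBaselineTopology only when the method returns a non-null
list, so an unchanged baseline never triggers an exchange.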


2018-04-12 2:18 GMT+03:00 Denis Magda :

> Pavel, Val,
>
> So, it means that the rebalancing will be initiated only after an
> administrator remove the failed node from the topology, right?
>
> Next, imagine that you are that IT administrator who has to automate the
> rebalancing activation if the node failed and not recovered within 1
> minute. What would you do and what Ignite provides to fulfill the task?
>
> --
> Denis
>
> On Wed, Apr 11, 2018 at 1:01 PM, Pavel Kovalenko 
> wrote:
>
> > Denis,
> >
> > In case of incomplete baseline topology IgniteCache.rebalance() will do
> > nothing, because this event doesn't trigger partitions exchange or
> affinity
> > change, so states of existing partitions are hold.
> >
> > 2018-04-11 22:27 GMT+03:00 Valentin Kulichenko <
> > valentin.kuliche...@gmail.com>:
> >
> > > Denis,
> > >
> > > In my understanding, in this case you should remove node from BLT and
> > that
> > > will trigger the rebalancing, no?
> > >
> > > -Val
> > >
> > > On Wed, Apr 11, 2018 at 12:23 PM, Denis Magda 
> > wrote:
> > >
> > > > Igniters,
> > > >
> > > > As we know the rebalancing doesn't happen if one of the nodes goes
> > down,
> > > > thus, shrinking the baseline topology. It complies with our
> assumption
> > > that
> > > > the node should be recovered soon and there is no need to waste
> > > > CPU/memory/networking resources of the cluster shifting the data
> > around.
> > > >
> > > > However, there are always edge cases. I was reasonably asked how to
> > > trigger
> > > > the rebalancing within the baseline topology manually or on timeout
> if:
> > > >
> > > >- It's not expected that the failed node would be resurrected in
> the
> > > >nearest time and
> > > >- It's not likely that that node will be replaced by the other
> one.
> > > >
> > > > The question. If I call IgniteCache.rebalance() or configure
> > > > CacheConfiguration.rebalanceTimeout will the rebalancing be fired
> > within
> > > > the baseline topology?
> > > >
> > > > --
> > > > Denis
> > > >
> > >
> >
>


Re: Triggering rebalancing on timeout or manually if the baseline topology is not reassembled

2018-04-11 Thread Denis Magda
Pavel, Val,

So, it means that the rebalancing will be initiated only after an
administrator removes the failed node from the topology, right?

Next, imagine that you are the IT administrator who has to automate
triggering the rebalancing if a node fails and is not recovered within 1
minute. What would you do, and what does Ignite provide to fulfill the task?

--
Denis

On Wed, Apr 11, 2018 at 1:01 PM, Pavel Kovalenko  wrote:

> Denis,
>
> In case of incomplete baseline topology IgniteCache.rebalance() will do
> nothing, because this event doesn't trigger partitions exchange or affinity
> change, so states of existing partitions are hold.
>
> 2018-04-11 22:27 GMT+03:00 Valentin Kulichenko <
> valentin.kuliche...@gmail.com>:
>
> > Denis,
> >
> > In my understanding, in this case you should remove node from BLT and
> that
> > will trigger the rebalancing, no?
> >
> > -Val
> >
> > On Wed, Apr 11, 2018 at 12:23 PM, Denis Magda 
> wrote:
> >
> > > Igniters,
> > >
> > > As we know the rebalancing doesn't happen if one of the nodes goes
> down,
> > > thus, shrinking the baseline topology. It complies with our assumption
> > that
> > > the node should be recovered soon and there is no need to waste
> > > CPU/memory/networking resources of the cluster shifting the data
> around.
> > >
> > > However, there are always edge cases. I was reasonably asked how to
> > trigger
> > > the rebalancing within the baseline topology manually or on timeout if:
> > >
> > >- It's not expected that the failed node would be resurrected in the
> > >nearest time and
> > >- It's not likely that that node will be replaced by the other one.
> > >
> > > The question. If I call IgniteCache.rebalance() or configure
> > > CacheConfiguration.rebalanceTimeout will the rebalancing be fired
> within
> > > the baseline topology?
> > >
> > > --
> > > Denis
> > >
> >
>


Re: Triggering rebalancing on timeout or manually if the baseline topology is not reassembled

2018-04-11 Thread Pavel Kovalenko
Denis,

In case of an incomplete baseline topology, IgniteCache.rebalance() will do
nothing, because this event doesn't trigger a partition map exchange or an
affinity change, so the states of the existing partitions are preserved.

2018-04-11 22:27 GMT+03:00 Valentin Kulichenko <
valentin.kuliche...@gmail.com>:

> Denis,
>
> In my understanding, in this case you should remove node from BLT and that
> will trigger the rebalancing, no?
>
> -Val
>
> On Wed, Apr 11, 2018 at 12:23 PM, Denis Magda  wrote:
>
> > Igniters,
> >
> > As we know the rebalancing doesn't happen if one of the nodes goes down,
> > thus, shrinking the baseline topology. It complies with our assumption
> that
> > the node should be recovered soon and there is no need to waste
> > CPU/memory/networking resources of the cluster shifting the data around.
> >
> > However, there are always edge cases. I was reasonably asked how to
> trigger
> > the rebalancing within the baseline topology manually or on timeout if:
> >
> >- It's not expected that the failed node would be resurrected in the
> >nearest time and
> >- It's not likely that that node will be replaced by the other one.
> >
> > The question. If I call IgniteCache.rebalance() or configure
> > CacheConfiguration.rebalanceTimeout will the rebalancing be fired within
> > the baseline topology?
> >
> > --
> > Denis
> >
>


Re: Triggering rebalancing on timeout or manually if the baseline topology is not reassembled

2018-04-11 Thread Valentin Kulichenko
Denis,

In my understanding, in this case you should remove the node from the BLT
(baseline topology), and that will trigger the rebalancing, no?

-Val
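
For the manual removal suggested above, the cluster control script can be
used. A sketch, assuming an Ignite 2.4+ installation where `control.sh` and
its `--baseline` commands are available; `node1` is a placeholder consistent
ID:

```shell
# List the current baseline topology and see which baseline nodes are offline.
./control.sh --baseline

# Remove the failed node from the baseline by its consistent ID;
# shrinking the baseline triggers rebalancing across the remaining nodes.
./control.sh --baseline remove node1 --yes
```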

On Wed, Apr 11, 2018 at 12:23 PM, Denis Magda  wrote:

> Igniters,
>
> As we know the rebalancing doesn't happen if one of the nodes goes down,
> thus, shrinking the baseline topology. It complies with our assumption that
> the node should be recovered soon and there is no need to waste
> CPU/memory/networking resources of the cluster shifting the data around.
>
> However, there are always edge cases. I was reasonably asked how to trigger
> the rebalancing within the baseline topology manually or on timeout if:
>
>- It's not expected that the failed node would be resurrected in the
>nearest time and
>- It's not likely that that node will be replaced by the other one.
>
> The question. If I call IgniteCache.rebalance() or configure
> CacheConfiguration.rebalanceTimeout will the rebalancing be fired within
> the baseline topology?
>
> --
> Denis
>


Triggering rebalancing on timeout or manually if the baseline topology is not reassembled

2018-04-11 Thread Denis Magda
Igniters,

As we know, rebalancing doesn't happen if one of the nodes goes down,
thus shrinking the baseline topology. This complies with our assumption
that the node should be recovered soon, so there is no need to waste the
cluster's CPU/memory/networking resources shifting the data around.

However, there are always edge cases. I was reasonably asked how to trigger
the rebalancing within the baseline topology, manually or on a timeout, if:

   - It's not expected that the failed node will be resurrected in the
   near future, and
   - It's not likely that the node will be replaced by another one.

The question: if I call IgniteCache.rebalance() or configure
CacheConfiguration.rebalanceTimeout, will the rebalancing be fired within
the baseline topology?

--
Denis