[jira] [Comment Edited] (IGNITE-6579) WAL history does not used when node returns to cluster again

2018-02-19 Thread Pavel Pereslegin (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16328816#comment-16328816
 ] 

Pavel Pereslegin edited comment on IGNITE-6579 at 2/19/18 12:59 PM:


Hello, [~v.pyatkov].
I agree, but since the coordinator doesn't have information about non-local 
partition sizes, we can't simply point to this restriction here, at least 
not without some unrelated changes to the existing partition exchange procedure.
To improve understanding, maybe we can add a debug message on nodes where WAL 
history is being reserved for exchange (in case the partition size is too small 
for partial rebalancing)?


was (Author: xtern):
Hello, [~v.pyatkov].
I agreed, but since coordinator doesn't have information about non local 
partition sizes we can't simply point to this restriction here, at least 
without some irrelevant changes to existing partition exchange procedure.
To improve understanding, may be, we can add debug message on nodes where WAL 
history is being reserved for exchange (in case partition size too small for 
partial rebalancing)?

> WAL history does not used when node returns to cluster again
> 
>
> Key: IGNITE-6579
> URL: https://issues.apache.org/jira/browse/IGNITE-6579
> Project: Ignite
> Issue Type: Bug
> Components: persistence
> Reporter: Vladislav Pyatkov
> Priority: Major
>
> When I set a big enough value for "WAL history size" and stopped a node for 20 
> minutes, I got these messages from the coordinator (order=1):
> {noformat}
> 2017-10-06 15:46:33.429 [WARN 
> ][sys-#10740%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.GridDhtPartitionTopologyImpl]
>  Partition has been scheduled for rebalancing due to outdated update counter 
> [nodeId=e51a1db2-f49b-44a9-b122-adde4016d9e7,
>  cacheOrGroupName=CACHEGROUP_PARTICLE_DServiceZone, partId=2424, 
> haveHistory=false]
> 2017-10-06 15:46:33.429 [WARN 
> ][sys-#10740%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.GridDhtPartitionTopologyImpl]
>  Partition has been scheduled for rebalancing due to outdated update counter 
> [nodeId=e51a1db2-f49b-44a9-b122-adde4016d9e7,
>  cacheOrGroupName=CACHEGROUP_PARTICLE_DServiceZone, partId=2427, 
> haveHistory=false]
> 2017-10-06 15:46:33.429 [WARN 
> ][sys-#10740%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.GridDhtPartitionTopologyImpl]
>  Partition has been scheduled for rebalancing due to outdated update counter 
> [nodeId=e51a1db2-f49b-44a9-b122-adde4016d9e7,
>  cacheOrGroupName=CACHEGROUP_PARTICLE_DServiceZone, partId=2426, 
> haveHistory=false]
> {noformat}
> after starting the node again.
> I think the history size should be enough, but the logs show it is not 
> (haveHistory=false).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (IGNITE-6579) WAL history does not used when node returns to cluster again

2017-12-25 Thread Pavel Pereslegin (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16303251#comment-16303251
 ] 

Pavel Pereslegin edited comment on IGNITE-6579 at 12/25/17 12:07 PM:

Hello [~v.pyatkov],

I retested with the following scenario:

# Set IGNITE_PDS_WAL_REBALANCE_THRESHOLD to 1.
# Start nodes A, B and C with one replicated cache (RendezvousAffinityFunction 
with 10 partitions).
# Put 10 values to cache (1 key per partition).
# Stop node C.
# Put 3000 values to cache (10300 keys per partition).
# Rejoin node C (nodeId = 606c6f4d-c314-4345-8b6d-cc3f3792).
# Observe messages from the coordinator (haveHistory=true).

{noformat}
Partition has been scheduled for rebalancing due to outdated update counter 
[nodeId=606c6f4d-c314-4345-8b6d-cc3f3792, cacheOrGroupName=default, 
partId=0, haveHistory=true]
Partition has been scheduled for rebalancing due to outdated update counter 
[nodeId=606c6f4d-c314-4345-8b6d-cc3f3792, cacheOrGroupName=default, 
partId=1, haveHistory=true]
Partition has been scheduled for rebalancing due to outdated update counter 
[nodeId=606c6f4d-c314-4345-8b6d-cc3f3792, cacheOrGroupName=default, 
partId=2, haveHistory=true]
Partition has been scheduled for rebalancing due to outdated update counter 
[nodeId=606c6f4d-c314-4345-8b6d-cc3f3792, cacheOrGroupName=default, 
partId=3, haveHistory=true]
Partition has been scheduled for rebalancing due to outdated update counter 
[nodeId=606c6f4d-c314-4345-8b6d-cc3f3792, cacheOrGroupName=default, 
partId=4, haveHistory=true]
Partition has been scheduled for rebalancing due to outdated update counter 
[nodeId=606c6f4d-c314-4345-8b6d-cc3f3792, cacheOrGroupName=default, 
partId=5, haveHistory=true]
Partition has been scheduled for rebalancing due to outdated update counter 
[nodeId=606c6f4d-c314-4345-8b6d-cc3f3792, cacheOrGroupName=default, 
partId=6, haveHistory=true]
Partition has been scheduled for rebalancing due to outdated update counter 
[nodeId=606c6f4d-c314-4345-8b6d-cc3f3792, cacheOrGroupName=default, 
partId=7, haveHistory=true]
Partition has been scheduled for rebalancing due to outdated update counter 
[nodeId=606c6f4d-c314-4345-8b6d-cc3f3792, cacheOrGroupName=default, 
partId=8, haveHistory=true]
Partition has been scheduled for rebalancing due to outdated update counter 
[nodeId=606c6f4d-c314-4345-8b6d-cc3f3792, cacheOrGroupName=default, 
partId=9, haveHistory=true]
{noformat}

If I set IGNITE_PDS_WAL_REBALANCE_THRESHOLD to a value larger than the partition 
size (10301, for example), WAL history is not used.
{noformat}
Partition has been scheduled for rebalancing due to outdated update counter 
[nodeId=606c6f4d-c314-4345-8b6d-cc3f3792, cacheOrGroupName=default, 
partId=0, haveHistory=false]
Partition has been scheduled for rebalancing due to outdated update counter 
[nodeId=606c6f4d-c314-4345-8b6d-cc3f3792, cacheOrGroupName=default, 
partId=1, haveHistory=false]
...{noformat}
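
The two results above can be summarized in a small standalone sketch. It is not Ignite code: the class name, the method name and the strict comparison are assumptions made for illustration, and the partition size is taken from step 5 of the scenario.

```java
public class WalRebalanceSketch {
    /**
     * Hypothetical model of the check: WAL history is considered only when
     * the partition holds more entries than the threshold. The strict
     * comparison is an assumption, not taken from the Ignite sources.
     */
    public static boolean haveHistory(long partitionSize, long threshold) {
        return partitionSize > threshold;
    }

    public static void main(String[] args) {
        long partitionSize = 10_300; // keys per partition, from step 5 above

        // Threshold below the partition size.
        System.out.println("threshold=1     haveHistory=" + haveHistory(partitionSize, 1));

        // Threshold above the partition size.
        System.out.println("threshold=10301 haveHistory=" + haveHistory(partitionSize, 10_301));
    }
}
```

Under this assumption, a threshold of 1 lets the partition qualify for historical rebalancing, while a threshold of 10301 does not, which is consistent with the haveHistory values in the coordinator messages above.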


was (Author: xtern):
Hello [~v.pyatkov],

I tried the following scenario:

# Set JVM option -DIGNITE_PDS_WAL_REBALANCE_THRESHOLD=1.
# Start nodes A, B and C with one replicated cache (no backups, 
RendezvousAffinityFunction with 10 partitions).
# Put 10 values to cache (1 keys per partition).
# Stop node C.
# Put 3000 values to cache (10300 keys per partition).
# Rejoin node C (nodeId = 606c6f4d-c314-4345-8b6d-cc3f3792).
# Observing messages from coordinator (haveHistory=true).

{noformat}
[GridDhtPartitionTopologyImpl] Partition has been scheduled for rebalancing due 
to outdated update counter [nodeId=606c6f4d-c314-4345-8b6d-cc3f3792, 
cacheOrGroupName=default, partId=0, haveHistory=true]
[GridDhtPartitionTopologyImpl] Partition has been scheduled for rebalancing due 
to outdated update counter [nodeId=606c6f4d-c314-4345-8b6d-cc3f3792, 
cacheOrGroupName=default, partId=1, haveHistory=true]
[GridDhtPartitionTopologyImpl] Partition has been scheduled for rebalancing due 
to outdated update counter [nodeId=606c6f4d-c314-4345-8b6d-cc3f3792, 
cacheOrGroupName=default, partId=2, haveHistory=true]
[GridDhtPartitionTopologyImpl] Partition has been scheduled for rebalancing due 
to outdated update counter [nodeId=606c6f4d-c314-4345-8b6d-cc3f3792, 
cacheOrGroupName=default, partId=3, haveHistory=true]
[GridDhtPartitionTopologyImpl] Partition has been scheduled for rebalancing due 
to outdated update counter [nodeId=606c6f4d-c314-4345-8b6d-cc3f3792, 
cacheOrGroupName=default, partId=4, haveHistory=true]
[GridDhtPartitionTopologyImpl] Partition has been scheduled for rebalancing due 
to outdated update counter [nodeId=606c6f4d-c314-4345-8b6d-cc3f3792, 
cacheOrGroupName=default, partId=5, haveHistory=true]
[GridDhtPartitionTopologyImpl] Partition has been scheduled for rebalancing due 
to outdated update counter [nodeId=606c6f4d-c314-4345-8b6d-cc3f3792, 

[jira] [Comment Edited] (IGNITE-6579) WAL history does not used when node returns to cluster again

2017-12-24 Thread Pavel Pereslegin (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16301632#comment-16301632
 ] 

Pavel Pereslegin edited comment on IGNITE-6579 at 12/25/17 7:50 AM:


Hello [~v.pyatkov].

Could you provide more details about your configuration and the expected behavior?

I tried the following scenario and I believe it works fine.
# Set IGNITE_PDS_WAL_REBALANCE_THRESHOLD to 0 (default threshold = 50).
# Start nodes A, B and C with PDS enabled (1 backup).
# Put data to cache.
# Stop node C.
# Put more data to cache.
# Rejoin node C to cluster (nodeId = 93206ec4-0b0a-4c91-bb55-79b05f12).
# Observe messages from the coordinator about outdated update counters 
(haveHistory=true):

{noformat}
Partition has been scheduled for rebalancing due to outdated update counter 
[nodeId=93206ec4-0b0a-4c91-bb55-79b05f12, cacheOrGroupName=default, 
partId=0, haveHistory=true]
Partition has been scheduled for rebalancing due to outdated update counter 
[nodeId=93206ec4-0b0a-4c91-bb55-79b05f12, cacheOrGroupName=default, 
partId=1, haveHistory=true]
Partition has been scheduled for rebalancing due to outdated update counter 
[nodeId=93206ec4-0b0a-4c91-bb55-79b05f12, cacheOrGroupName=default, 
partId=4, haveHistory=true]
...
{noformat}


was (Author: xtern):
Hello [~v.pyatkov].

Could you provide more details about your configuration and behavior that you 
expect to see (or how to reproduce this bug).

I tried following scenario and I believe it works well.
# Set IGNITE_PDS_WAL_REBALANCE_THRESHOLD to 0 (default threshold = 50).
# Start nodes A, B and C with PDS enabled (1 backup).
# Add data to cache.
# Stop node C.
# Add more data to cache.
# Rejoin node C to cluster (nodeId = 93206ec4-0b0a-4c91-bb55-79b05f12).
# Observing messages from coordinator about outdated update counters 
(haveHistory=true):

{noformat}
Partition has been scheduled for rebalancing due to outdated update counter 
[nodeId=93206ec4-0b0a-4c91-bb55-79b05f12, cacheOrGroupName=default, 
partId=0, haveHistory=true]
Partition has been scheduled for rebalancing due to outdated update counter 
[nodeId=93206ec4-0b0a-4c91-bb55-79b05f12, cacheOrGroupName=default, 
partId=1, haveHistory=true]
Partition has been scheduled for rebalancing due to outdated update counter 
[nodeId=93206ec4-0b0a-4c91-bb55-79b05f12, cacheOrGroupName=default, 
partId=4, haveHistory=true]
...
{noformat}


