[jira] [Commented] (IGNITE-13358) Improvements for partition clearing related parts
[ https://issues.apache.org/jira/browse/IGNITE-13358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192059#comment-17192059 ]

Alexey Scherbakov commented on IGNITE-13358:

[~agoncharuk] I've fixed your comments and have got a visa. I'm having some connectivity issues with gitbox right now and need help with merging.

> Improvements for partition clearing related parts
> -------------------------------------------------
>
> Key: IGNITE-13358
> URL: https://issues.apache.org/jira/browse/IGNITE-13358
> Project: Ignite
> Issue Type: Improvement
> Reporter: Alexey Scherbakov
> Assignee: Alexey Scherbakov
> Priority: Major
> Time Spent: 50m
> Remaining Estimate: 0h
>
> We have several issues related to partition clearing that are worth fixing.
> 1. PartitionsEvictManager doesn't provide obvious correctness guarantees when a node or a cache group is stopped while partitions are concurrently being cleared.
> 2. GridDhtLocalPartition#awaitDestroy is called while holding the topology write lock, which is deadlock-prone because we currently require the write lock to destroy a partition.
> 3. GridDhtLocalPartition contains a lot of messy code related to partition clearing, most notably ClearFuture, but the clearing is done by PartitionsEvictManager. We should get rid of the clearing code in GridDhtLocalPartition. This should also improve code readability and help to understand what happens during clearing.
> 4. Currently, moving partitions are cleared before rebalancing in an order different from rebalanceOrder, breaking the contract. It is better to submit such partitions for clearing to the rebalancing pool before each group starts to rebalance. This will allow faster rebalancing (according to the configured rebalance pool size) and will provide rebalanceOrder guarantees.
> 5. The clearing logic for moving partitions (before rebalancing) seems incorrect: it's possible to lose updates received during clearing.
> 6. To clear partitions before full rebalancing we use the same threads as for partition eviction. This can slow rebalancing even if we have spare resources. It is better to clear partitions in the rebalance pool (explicitly dedicated by the user).
> 7. It's possible to reserve a renting partition, which makes no sense. All operations on a renting partition (except clearing) are a waste of resources.
> 8. Partition eviction causes starvation of system pool tasks if the number of threads in the system pool is 1. This can break crucial functionality.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
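Items 4 and 6 of the issue description propose clearing moving partitions in the rebalance pool, group by group, before each group starts to rebalance. The sketch below only illustrates that ordering idea with plain `java.util.concurrent` classes. `Group`, `clearThenRebalance` and the event strings are hypothetical names made up for this example; they are not Ignite classes and do not reflect the actual patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ClearingOrderSketch {
    /** Hypothetical stand-in for a cache group; not an Ignite class. */
    static final class Group {
        final String name;
        final int rebalanceOrder;
        final List<Integer> movingParts;

        Group(String name, int rebalanceOrder, List<Integer> movingParts) {
            this.name = name;
            this.rebalanceOrder = rebalanceOrder;
            this.movingParts = movingParts;
        }
    }

    /**
     * Processes groups in ascending rebalanceOrder. For each group, the clearing
     * of its moving partitions runs in parallel on the given pool, and the group
     * is "rebalanced" only after all of its clearing tasks have finished.
     */
    static List<String> clearThenRebalance(List<Group> groups, ExecutorService rebalancePool) throws Exception {
        List<Group> sorted = new ArrayList<>(groups);
        sorted.sort(Comparator.comparingInt((Group g) -> g.rebalanceOrder));

        List<String> events = Collections.synchronizedList(new ArrayList<>());

        for (Group g : sorted) {
            List<Future<?>> clearing = new ArrayList<>();

            for (int part : g.movingParts) {
                // Placeholder for the real clearing work of one partition.
                clearing.add(rebalancePool.submit(() -> { events.add("clear " + g.name + ":" + part); }));
            }

            // Wait until every partition of this group is cleared.
            for (Future<?> f : clearing)
                f.get();

            // Placeholder for starting the rebalance of this group.
            events.add("rebalance " + g.name);
        }

        return events;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        try {
            List<Group> groups = List.of(
                new Group("late", 2, List.of(0, 1)),
                new Group("early", 1, List.of(5)));

            clearThenRebalance(groups, pool).forEach(System.out::println);
        }
        finally {
            pool.shutdown();
        }
    }
}
```

In this sketch the pool size bounds the clearing parallelism, which mirrors the remark that clearing speed would follow the configured rebalance pool size.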
[jira] [Commented] (IGNITE-13358) Improvements for partition clearing related parts
[ https://issues.apache.org/jira/browse/IGNITE-13358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191992#comment-17191992 ]

Ignite TC Bot commented on IGNITE-13358:

{panel:title=Branch: [pull/8186/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/8186/head] Base: [master] : New Tests (43)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#8b}MVCC Cache 7{color} [[tests 21|https://ci.ignite.apache.org/viewLog.html?buildId=5590168]]
* {color:#013220}IgniteCacheMvccTestSuite7: BlockedEvictionsTest.testStopNodeDuringEviction_2 - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: BlockedEvictionsTest.testStopCache_Volatile - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: BlockedEvictionsTest.testStopNodeDuringEviction - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: BlockedEvictionsTest.testStopCache_Persistence - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: BlockedEvictionsTest.testCacheGroupDestroy_Volatile - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: EvictionWhilePartitionGroupIsReservedTest.testGroupReservationStartClient - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: EvictionWhilePartitionGroupIsReservedTest.testGroupReservationStartClient_2 - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: EvictionWhilePartitionGroupIsReservedTest.testGroupReservation - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: EvictionWhilePartitionGroupIsReservedTest.testGroupReservation_2 - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: RentingPartitionIsOwnedDuringEvictionTest.testOwnedAfterEvictionGroupReservation - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: RentingPartitionIsOwnedDuringEvictionTest.testOwnedAfterEviction_GroupReserved_2 - PASSED{color}
... and 10 new tests

{color:#8b}Cache 7{color} [[tests 21|https://ci.ignite.apache.org/viewLog.html?buildId=5590133]]
* {color:#013220}IgniteCacheTestSuite7: BlockedEvictionsTest.testStopNodeDuringEviction_2 - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: EvictionWhilePartitionGroupIsReservedTest.testGroupReservationStartClient - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: EvictionWhilePartitionGroupIsReservedTest.testGroupReservationStartClient_2 - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: BlockedEvictionsTest.testCacheGroupDestroy_Volatile - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: EvictionWhilePartitionGroupIsReservedTest.testGroupReservation - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: EvictionWhilePartitionGroupIsReservedTest.testGroupReservation_2 - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: BlockedEvictionsTest.testStopCache_Volatile - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: BlockedEvictionsTest.testStopNodeDuringEviction - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: BlockedEvictionsTest.testStopCache_Persistence - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: RentingPartitionIsOwnedDuringEvictionTest.testOwnedAfterEviction_PartitionReserved - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: RentingPartitionIsOwnedDuringEvictionTest.testOwnedAfterEviction_GroupReserved - PASSED{color}
... and 10 new tests

{color:#8b}Cache 6{color} [[tests 1|https://ci.ignite.apache.org/viewLog.html?buildId=5590132]]
* {color:#013220}IgniteCacheTestSuite6: IgniteExchangeLatchManagerCoordinatorFailTest.testCoordinatorFailoverDuringPMEAfterServerLatchCompleted - PASSED{color}
{panel}
[TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=5590183&buildTypeId=IgniteTests24Java8_RunAll]
[jira] [Commented] (IGNITE-13358) Improvements for partition clearing related parts
[ https://issues.apache.org/jira/browse/IGNITE-13358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191616#comment-17191616 ]

Alexey Goncharuk commented on IGNITE-13358:

[~ascherbakov], I've left some comments in the PR, please take a look.
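Item 8 of the issue description (starvation of system pool tasks when the system pool has a single thread) can be reproduced in miniature with a plain `ExecutorService`. This is a minimal sketch under stated assumptions: the "eviction" task is simulated by a task that blocks on a latch, and `criticalTaskRuns` is a hypothetical helper written for this example, not an Ignite API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class PoolStarvationSketch {
    /**
     * Submits a long-running "eviction" task to evictPool and a short critical
     * task to systemPool. Returns true if the critical task completes while the
     * eviction is still in progress, false if it is starved.
     */
    static boolean criticalTaskRuns(ExecutorService systemPool, ExecutorService evictPool) throws Exception {
        CountDownLatch evictionMayFinish = new CountDownLatch(1);

        // Simulated eviction: occupies one thread until the latch is released.
        evictPool.submit(() -> { evictionMayFinish.await(); return null; });

        Future<String> critical = systemPool.submit(() -> "ok");

        try {
            critical.get(500, TimeUnit.MILLISECONDS);
            return true;
        }
        catch (TimeoutException e) {
            return false; // The only thread is busy with the eviction.
        }
        finally {
            evictionMayFinish.countDown();
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService single = Executors.newSingleThreadExecutor();
        ExecutorService dedicated = Executors.newSingleThreadExecutor();

        try {
            System.out.println("shared pool:    " + criticalTaskRuns(single, single));
            System.out.println("dedicated pool: " + criticalTaskRuns(single, dedicated));
        }
        finally {
            single.shutdown();
            dedicated.shutdown();
        }
    }
}
```

When both roles share the single-threaded pool, the critical task stays queued behind the eviction. Moving the eviction to its own pool lets the critical task finish at once.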
[jira] [Commented] (IGNITE-13358) Improvements for partition clearing related parts
[ https://issues.apache.org/jira/browse/IGNITE-13358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191320#comment-17191320 ]

Ignite TC Bot commented on IGNITE-13358:

{panel:title=Branch: [pull/8186/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/8186/head] Base: [master] : New Tests (43)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#8b}MVCC Cache 7{color} [[tests 21|https://ci.ignite.apache.org/viewLog.html?buildId=5587496]]
* {color:#013220}IgniteCacheMvccTestSuite7: BlockedEvictionsTest.testStopNodeDuringEviction_2 - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: BlockedEvictionsTest.testStopCache_Volatile - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: BlockedEvictionsTest.testStopNodeDuringEviction - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: BlockedEvictionsTest.testStopCache_Persistence - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: BlockedEvictionsTest.testCacheGroupDestroy_Volatile - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: EvictionWhilePartitionGroupIsReservedTest.testGroupReservationStartClient - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: EvictionWhilePartitionGroupIsReservedTest.testGroupReservationStartClient_2 - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: EvictionWhilePartitionGroupIsReservedTest.testGroupReservation - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: EvictionWhilePartitionGroupIsReservedTest.testGroupReservation_2 - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: RentingPartitionIsOwnedDuringEvictionTest.testOwnedAfterEvictionGroupReservation - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: RentingPartitionIsOwnedDuringEvictionTest.testOwnedAfterEviction_GroupReserved_2 - PASSED{color}
... and 10 new tests

{color:#8b}Cache 7{color} [[tests 21|https://ci.ignite.apache.org/viewLog.html?buildId=5587498]]
* {color:#013220}IgniteCacheTestSuite7: BlockedEvictionsTest.testStopNodeDuringEviction_2 - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: EvictionWhilePartitionGroupIsReservedTest.testGroupReservationStartClient - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: EvictionWhilePartitionGroupIsReservedTest.testGroupReservationStartClient_2 - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: BlockedEvictionsTest.testCacheGroupDestroy_Volatile - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: EvictionWhilePartitionGroupIsReservedTest.testGroupReservation - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: EvictionWhilePartitionGroupIsReservedTest.testGroupReservation_2 - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: BlockedEvictionsTest.testStopCache_Volatile - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: BlockedEvictionsTest.testStopNodeDuringEviction - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: BlockedEvictionsTest.testStopCache_Persistence - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: RentingPartitionIsOwnedDuringEvictionTest.testOwnedAfterEviction_PartitionReserved - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: RentingPartitionIsOwnedDuringEvictionTest.testOwnedAfterEviction_GroupReserved - PASSED{color}
... and 10 new tests

{color:#8b}Cache 6{color} [[tests 1|https://ci.ignite.apache.org/viewLog.html?buildId=5585689]]
* {color:#013220}IgniteCacheTestSuite6: IgniteExchangeLatchManagerCoordinatorFailTest.testCoordinatorFailoverDuringPMEAfterServerLatchCompleted - PASSED{color}
{panel}
[TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=5585740&buildTypeId=IgniteTests24Java8_RunAll]
[jira] [Commented] (IGNITE-13358) Improvements for partition clearing related parts
[ https://issues.apache.org/jira/browse/IGNITE-13358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17187494#comment-17187494 ]

Alexey Scherbakov commented on IGNITE-13358:

[~agoncharuk] Can you look at this?
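Item 2 of the issue description calls waiting for partition destruction under the topology write lock deadlock-prone, because the destruction itself needs that write lock. The pattern can be sketched with a plain `ReentrantReadWriteLock`. Everything here is an assumption made for illustration: `awaitDestroy` below is a hypothetical method that only borrows the name, and a timed wait is used so the example terminates instead of hanging.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class AwaitUnderLockSketch {
    /**
     * Waits for a "destroy" task that needs the write lock.
     * Returns true if the task finished during the wait, false if the wait
     * timed out (with an untimed wait this would be a deadlock).
     */
    static boolean awaitDestroy(boolean holdWriteLockWhileWaiting) throws Exception {
        ReentrantReadWriteLock topLock = new ReentrantReadWriteLock();
        ExecutorService destroyer = Executors.newSingleThreadExecutor();

        if (holdWriteLockWhileWaiting)
            topLock.writeLock().lock();

        try {
            // Simulated partition destruction: requires the write lock.
            Future<?> destroyed = destroyer.submit(() -> {
                topLock.writeLock().lock();
                topLock.writeLock().unlock();
            });

            try {
                destroyed.get(500, TimeUnit.MILLISECONDS);
                return true;
            }
            catch (TimeoutException e) {
                return false; // The waiter holds the lock the task needs.
            }
        }
        finally {
            if (holdWriteLockWhileWaiting)
                topLock.writeLock().unlock();

            destroyer.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("wait under write lock: " + awaitDestroy(true));
        System.out.println("wait without the lock: " + awaitDestroy(false));
    }
}
```

The waiter that holds the write lock never sees the task complete, while the same wait outside the lock returns promptly.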
[jira] [Commented] (IGNITE-13358) Improvements for partition clearing related parts
[ https://issues.apache.org/jira/browse/IGNITE-13358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17187491#comment-17187491 ]

Ignite TC Bot commented on IGNITE-13358:

{panel:title=Branch: [pull/8186/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/8186/head] Base: [master] : New Tests (33)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#8b}MVCC Cache 7{color} [[tests 16|https://ci.ignite.apache.org/viewLog.html?buildId=5573494]]
* {color:#013220}IgniteCacheMvccTestSuite7: BlockedEvictionsTest.testStopNodeDuringEviction_2 - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: BlockedEvictionsTest.testStopCache_Volatile - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: BlockedEvictionsTest.testStopNodeDuringEviction - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: BlockedEvictionsTest.testStopCache_Persistence - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: BlockedEvictionsTest.testCacheGroupDestroy_Volatile - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: EvictionWhilePartitionGroupIsReservedTest.testGroupReservationStartClient - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: EvictionWhilePartitionGroupIsReservedTest.testGroupReservationStartClient_2 - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: EvictionWhilePartitionGroupIsReservedTest.testGroupReservation - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: EvictionWhilePartitionGroupIsReservedTest.testGroupReservation_2 - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: BlockedEvictionsTest.testDeactivation_Persistence - PASSED{color}
* {color:#013220}IgniteCacheMvccTestSuite7: BlockedEvictionsTest.testFailureHandler - PASSED{color}
... and 5 new tests

{color:#8b}Cache 7{color} [[tests 16|https://ci.ignite.apache.org/viewLog.html?buildId=5573290]]
* {color:#013220}IgniteCacheTestSuite7: BlockedEvictionsTest.testStopNodeDuringEviction_2 - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: EvictionWhilePartitionGroupIsReservedTest.testGroupReservationStartClient - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: EvictionWhilePartitionGroupIsReservedTest.testGroupReservationStartClient_2 - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: BlockedEvictionsTest.testCacheGroupDestroy_Volatile - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: EvictionWhilePartitionGroupIsReservedTest.testGroupReservation - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: EvictionWhilePartitionGroupIsReservedTest.testGroupReservation_2 - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: BlockedEvictionsTest.testStopCache_Volatile - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: BlockedEvictionsTest.testStopNodeDuringEviction - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: BlockedEvictionsTest.testStopCache_Persistence - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: BlockedEvictionsTest.testDeactivation_Persistence - PASSED{color}
* {color:#013220}IgniteCacheTestSuite7: BlockedEvictionsTest.testFailureHandler - PASSED{color}
... and 5 new tests

{color:#8b}Cache 6{color} [[tests 1|https://ci.ignite.apache.org/viewLog.html?buildId=5573289]]
* {color:#013220}IgniteCacheTestSuite6: IgniteExchangeLatchManagerCoordinatorFailTest.testCoordinatorFailoverDuringPMEAfterServerLatchCompleted - PASSED{color}
{panel}
[TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=5573340&buildTypeId=IgniteTests24Java8_RunAll]