[jira] [Comment Edited] (IGNITE-11704) Write tombstones during rebalance to get rid of deferred delete buffer
[ https://issues.apache.org/jira/browse/IGNITE-11704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17379945#comment-17379945 ] Pavel Tupitsyn edited comment on IGNITE-11704 at 7/13/21, 2:59 PM: --- TC run above is for https://github.com/apache/ignite/pull/9249: "IGNITE-11704 Add PARTITION_META_PAGE_DELTA_RECORD_V4 WALRecord type placeholder" Merged to master: 97edc893d5c6bcd458364746c5624a0bf9d1ae81 was (Author: ptupitsyn): TC run above is for https://github.com/apache/ignite/pull/9249: "IGNITE-11704 Add PARTITION_META_PAGE_DELTA_RECORD_V4 WALRecord type placeholder" > Write tombstones during rebalance to get rid of deferred delete buffer > -- > > Key: IGNITE-11704 > URL: https://issues.apache.org/jira/browse/IGNITE-11704 > Project: Ignite > Issue Type: Improvement >Reporter: Alexey Goncharuk >Assignee: Alexey Scherbakov >Priority: Major > Labels: rebalance > Time Spent: 50m > Remaining Estimate: 0h > > Currently Ignite relies on deferred delete buffer in order to handle > write-remove conflicts during rebalance. Given the limit size of the buffer, > this approach is fundamentally flawed, especially in case when persistence is > enabled. > I suggest to extend the logic of data storage to be able to store key > tombstones - to keep version for deleted entries. The tombstones will be > stored when rebalance is in progress and should be cleaned up when rebalance > is completed. > Later this approach may be used to implement fast partition rebalance based > on merkle trees (in this case, tombstones should be written on an incomplete > baseline). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (IGNITE-11704) Write tombstones during rebalance to get rid of deferred delete buffer
[ https://issues.apache.org/jira/browse/IGNITE-11704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946165#comment-16946165 ] Alexei Scherbakov edited comment on IGNITE-11704 at 10/8/19 7:10 AM: - [~jokser] [~sboikov] I've reviewed changes. Overall looks good, but still I have some questions. 1. My main concern is regarding the necessity of tombstoneBytes 5-bytes object. Seems it's possible to implement tombstone by treating absence of value as a tombstone. For example, valLen=0 could be treated as tombstone presense. Doing so we can get rid of 5 bytes comparison, and instead do null check: {noformat} private Boolean isTombstone(ByteBuffer buf, int offset) { int valLen = buf.getInt(buf.position() + offset); if (valLen != tombstoneBytes.length) return Boolean.FALSE; ... } {noformat} Instead we can do something like {{if (valLen == 0) return Boolean.TRUE}} 2. With new changes in PartitionsEvictManager it's possible to have two tasks of different types for the same partition. Consider a scenario: * node finished rebalancing and starts to clear thombstones * another node joins topology and become an owner for clearing partition. * eviction is started for already clearing partition. Probably this should not be allowed. 3. I see changes having no obvious relation to contribution, for example: static String cacheGroupMetricsRegistryName(String cacheGrp) DropCacheContextDuringEvictionTest.java GridCommandHandlerIndexingTest.java What's the purpose of these ? 4. Could you clarify the change in org.apache.ignite.internal.processors.cache.GridCacheMapEntry#initialValue: update0 |= (!preload && val == null); ? was (Author: ascherbakov): [~jokser] [~sboikov] I've reviewed changes. Overall looks good, but still I have some questions. 1. My main concern is regarding the necessity of tombstoneBytes 5-bytes object. Seems it's possible to implement tombstone by treating absence of value as a tombstone. For example, valLen=0 could be treated as tombstone presense. Doing so we can get rid of 5 bytes comparison, and instead do null check: {noformat} private Boolean isTombstone(ByteBuffer buf, int offset) { int valLen = buf.getInt(buf.position() + offset); if (valLen != tombstoneBytes.length) return Boolean.FALSE; ... } {noformat} Instead we can do something like {{if (valLen == 0) return true}} 2. With new changes in PartitionsEvictManager it's possible to have two tasks of different types for the same partition. Consider a scenario: * node finished rebalancing and starts to clear thombstones * another node joins topology and become an owner for clearing partition. * eviction is started for already clearing partition. Probably this should not be allowed. 3. I see changes having no obvious relation to contribution, for example: static String cacheGroupMetricsRegistryName(String cacheGrp) DropCacheContextDuringEvictionTest.java GridCommandHandlerIndexingTest.java What's the purpose of these ? 4. Could you clarify the change in org.apache.ignite.internal.processors.cache.GridCacheMapEntry#initialValue: update0 |= (!preload && val == null); ? > Write tombstones during rebalance to get rid of deferred delete buffer > -- > > Key: IGNITE-11704 > URL: https://issues.apache.org/jira/browse/IGNITE-11704 > Project: Ignite > Issue Type: Improvement >Reporter: Alexey Goncharuk >Assignee: Pavel Kovalenko >Priority: Major > Labels: rebalance > Fix For: 2.8 > > Time Spent: 10m > Remaining Estimate: 0h > > Currently Ignite relies on deferred delete buffer in order to handle > write-remove conflicts during rebalance. Given the limit size of the buffer, > this approach is fundamentally flawed, especially in case when persistence is > enabled. > I suggest to extend the logic of data storage to be able to store key > tombstones - to keep version for deleted entries. The tombstones will be > stored when rebalance is in progress and should be cleaned up when rebalance > is completed. > Later this approach may be used to implement fast partition rebalance based > on merkle trees (in this case, tombstones should be written on an incomplete > baseline). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (IGNITE-11704) Write tombstones during rebalance to get rid of deferred delete buffer
[ https://issues.apache.org/jira/browse/IGNITE-11704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946165#comment-16946165 ] Alexei Scherbakov edited comment on IGNITE-11704 at 10/8/19 7:04 AM: - [~jokser] [~sboikov] I've reviewed changes. Overall looks good, but still I have some questions. 1. My main concern is regarding the necessity of tombstoneBytes 5-bytes object. Seems it's possible to implement tombstone by treating absence of value as a tombstone. For example, valLen=0 could be treated as tombstone presense. Doing so we can get rid of 5 bytes comparison, and instead do null check: {noformat} private Boolean isTombstone(ByteBuffer buf, int offset) { int valLen = buf.getInt(buf.position() + offset); if (valLen != tombstoneBytes.length) return Boolean.FALSE; ... } {noformat} Instead we can do something like {{if (valLen == 0) return true}} 2. With new changes in PartitionsEvictManager it's possible to have two tasks of different types for the same partition. Consider a scenario: * node finished rebalancing and starts to clear thombstones * another node joins topology and become an owner for clearing partition. * eviction is started for already clearing partition. Probably this should not be allowed. 3. I see changes having no obvious relation to contribution, for example: static String cacheGroupMetricsRegistryName(String cacheGrp) DropCacheContextDuringEvictionTest.java GridCommandHandlerIndexingTest.java What's the purpose of these ? 4. Could you clarify the change in org.apache.ignite.internal.processors.cache.GridCacheMapEntry#initialValue: update0 |= (!preload && val == null); ? was (Author: ascherbakov): [~jokser] [~sboikov] I've reviewed changes. Overall looks good, but still I have some questions. 1. My main concern is regarding the necessity of tombstoneBytes 5-bytes object. Seems it's possible to implement tombstone by treating absence of value as a tombstone. For example, valLen=0 could be treated as tombstone presense. Doing so we can get rid of 5 bytes comparison, and instead do null check: {noformat} private Boolean isTombstone(ByteBuffer buf, int offset) { int valLen = buf.getInt(buf.position() + offset); if (valLen != tombstoneBytes.length) return Boolean.FALSE; ... } {noformat} Instead we can do something like {{if (valLen == 0) return true}} 2. With new changes in PartitionsEvictManager it's possible to have two tasks of different types for the same partition. Consider a scenario: * node finished rebalancing and starts to clear thombstones * another node joins topology and become an owner for clearing partition. * eviction is started for already clearing partition. Probably this should not be allowed. 3. I see changes having no obvious relation to contribution, for example: static String cacheGroupMetricsRegistryName(String cacheGrp) DropCacheContextDuringEvictionTest.java GridCommandHandlerIndexingTest.java What's the purpose of these ? 4. Could you explain the modification in org.apache.ignite.internal.processors.cache.GridCacheMapEntry#initialValue: update0 |= (!preload && val == null); ? > Write tombstones during rebalance to get rid of deferred delete buffer > -- > > Key: IGNITE-11704 > URL: https://issues.apache.org/jira/browse/IGNITE-11704 > Project: Ignite > Issue Type: Improvement >Reporter: Alexey Goncharuk >Assignee: Pavel Kovalenko >Priority: Major > Labels: rebalance > Fix For: 2.8 > > Time Spent: 10m > Remaining Estimate: 0h > > Currently Ignite relies on deferred delete buffer in order to handle > write-remove conflicts during rebalance. Given the limit size of the buffer, > this approach is fundamentally flawed, especially in case when persistence is > enabled. > I suggest to extend the logic of data storage to be able to store key > tombstones - to keep version for deleted entries. The tombstones will be > stored when rebalance is in progress and should be cleaned up when rebalance > is completed. > Later this approach may be used to implement fast partition rebalance based > on merkle trees (in this case, tombstones should be written on an incomplete > baseline). -- This message was sent by Atlassian Jira (v8.3.4#803005)