[jira] [Commented] (IGNITE-8295) Possible deadlock on partition eviction.
[ https://issues.apache.org/jira/browse/IGNITE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16454335#comment-16454335 ]

ASF GitHub Bot commented on IGNITE-8295:

GitHub user AMashenkov closed the pull request at:

    https://github.com/apache/ignite/pull/3842

> Possible deadlock on partition eviction.
>
>
> Key: IGNITE-8295
> URL: https://issues.apache.org/jira/browse/IGNITE-8295
> Project: Ignite
> Issue Type: Bug
> Components: persistence
> Reporter: Andrew Mashenkov
> Assignee: Andrew Mashenkov
> Priority: Major
> Fix For: 2.6
>
> Attachments: deadlock.stack
>
>
> GridCacheOffheapManager.recreateCacheDataStore() calls
> updatePartitionCounter() under partStoreLock, and updatePartitionCounter()
> may try to acquire checkpointReadLock.
> recreateCacheDataStore() can be called either with checkpointReadLock held
> (from GridDhtPartitionsExchangeFuture.updatePartitionFullMap) or without it
> (when the GridDhtPartitionEvictor thread calls evictPartitionAsync).
> So a checkpoint that starts in between can cause a deadlock.
> It seems we should acquire checkpointReadLock before partStoreLock.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
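The lock cycle described in the report can be reproduced with plain JDK locks. This is a hypothetical model, not Ignite code: checkpointLock stands in for the checkpoint read/write lock, partStoreLock for the partition store lock, and the evictor, exchange and checkpointer threads are illustrative names. It assumes a fair ReentrantReadWriteLock, whose documented behavior is that a waiting writer blocks new non-reentrant readers. Timed tryLock calls replace blocking lock calls, so the cycle shows up as two failed acquisitions instead of a hang.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical model of the reported lock cycle; not Ignite code. */
public class WrongOrderReproducer {

    /** Returns true if neither worker could take its second lock, i.e. the cycle formed. */
    public static boolean cycleFormed() {
        // Fair mode (an assumption of this model): a queued writer blocks new, non-reentrant readers.
        ReentrantReadWriteLock checkpointLock = new ReentrantReadWriteLock(true);
        ReentrantLock partStoreLock = new ReentrantLock();

        CountDownLatch firstLocksHeld = new CountDownLatch(2);
        CountDownLatch checkpointerQueued = new CountDownLatch(1);
        CountDownLatch attemptsDone = new CountDownLatch(2);
        AtomicBoolean evictorBlocked = new AtomicBoolean();
        AtomicBoolean exchangeBlocked = new AtomicBoolean();

        // Eviction path: partStoreLock first, checkpoint read lock second (the wrong order).
        Thread evictor = new Thread(() -> {
            partStoreLock.lock();
            try {
                firstLocksHeld.countDown();
                checkpointerQueued.await();
                boolean got = checkpointLock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
                evictorBlocked.set(!got);
                if (got) checkpointLock.readLock().unlock();
                attemptsDone.countDown();
                attemptsDone.await(); // hold partStoreLock until both attempts are over
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                partStoreLock.unlock();
            }
        });

        // Exchange path: already holds the checkpoint read lock, then needs partStoreLock.
        Thread exchange = new Thread(() -> {
            checkpointLock.readLock().lock();
            try {
                firstLocksHeld.countDown();
                checkpointerQueued.await();
                boolean got = partStoreLock.tryLock(200, TimeUnit.MILLISECONDS);
                exchangeBlocked.set(!got);
                if (got) partStoreLock.unlock();
                attemptsDone.countDown();
                attemptsDone.await(); // hold the read lock until both attempts are over
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                checkpointLock.readLock().unlock();
            }
        });

        // Checkpointer: needs the write lock, so it queues behind the exchange reader.
        Thread checkpointer = new Thread(() -> {
            checkpointLock.writeLock().lock();
            checkpointLock.writeLock().unlock();
        });

        try {
            evictor.start();
            exchange.start();
            firstLocksHeld.await();
            checkpointer.start();
            while (!checkpointLock.hasQueuedThread(checkpointer)) {
                Thread.sleep(1); // wait until the checkpointer is actually queued
            }
            checkpointerQueued.countDown();
            evictor.join();
            exchange.join();
            checkpointer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return evictorBlocked.get() && exchangeBlocked.get();
    }

    public static void main(String[] args) {
        System.out.println("lock cycle formed: " + cycleFormed());
    }
}
```

With blocking lock() calls instead of timed tryLock(), the three threads would wait on each other forever: evictor on the queued checkpointer, the checkpointer on the exchange reader, and exchange on evictor's partStoreLock.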
[jira] [Commented] (IGNITE-8295) Possible deadlock on partition eviction.
[ https://issues.apache.org/jira/browse/IGNITE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448399#comment-16448399 ]

Ilya Lantukh commented on IGNITE-8295:

GridCacheOffheapManager.recreateCacheDataStore() was removed in IGNITE-5874, so this ticket isn't relevant anymore.
[jira] [Commented] (IGNITE-8295) Possible deadlock on partition eviction.
[ https://issues.apache.org/jira/browse/IGNITE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448338#comment-16448338 ]

Pavel Kovalenko commented on IGNITE-8295:

[~agoncharuk] I confirm this. Without partition recreation, the deadlock described here is not possible.
[jira] [Commented] (IGNITE-8295) Possible deadlock on partition eviction.
[ https://issues.apache.org/jira/browse/IGNITE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448332#comment-16448332 ]

Alexey Goncharuk commented on IGNITE-8295:

[~amashenkov], [~ilantukh], [~Jokser], I think this change is not necessary since we want to get rid of partition recreation anyway. Can you confirm?
[jira] [Commented] (IGNITE-8295) Possible deadlock on partition eviction.
[ https://issues.apache.org/jira/browse/IGNITE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448147#comment-16448147 ]

Andrew Mashenkov commented on IGNITE-8295:

[~agoncharuk], a link to the TC run has been added to the ticket.
[jira] [Commented] (IGNITE-8295) Possible deadlock on partition eviction.
[ https://issues.apache.org/jira/browse/IGNITE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16445614#comment-16445614 ]

Alexey Goncharuk commented on IGNITE-8295:

[~amashenkov], can you please attach a TC run link?
[jira] [Commented] (IGNITE-8295) Possible deadlock on partition eviction.
[ https://issues.apache.org/jira/browse/IGNITE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444073#comment-16444073 ]

Ilya Lantukh commented on IGNITE-8295:

Changes look good to me. Regarding your TODO: we don't need to log a WAL record for partition re-creation, because we already log partition state changes, and those are enough to determine that the store is inconsistent.
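The argument that logged partition state changes make a dedicated re-creation record redundant can be illustrated with a small replay model. This is a hypothetical sketch: the StateRecord type and the rule "a partition whose last logged state is not OWNING cannot be trusted as-is" are assumptions made for illustration; only the state names are borrowed from Ignite's partition states, and Ignite's actual recovery is not claimed to work this way.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical replay model; the record type and the trust rule are
 * illustrative assumptions, only the state names come from Ignite.
 */
public class StateRecordReplay {

    enum PartState { MOVING, OWNING, RENTING, EVICTED }

    /** Stand-in for a WAL record that logs a partition state change. */
    static final class StateRecord {
        final int partId;
        final PartState state;

        StateRecord(int partId, PartState state) {
            this.partId = partId;
            this.state = state;
        }
    }

    /** Partitions whose last logged state is not OWNING (assumed rule: their stores need rebuilding). */
    static Set<Integer> untrustedPartitions(List<StateRecord> wal) {
        Map<Integer, PartState> lastState = new HashMap<>();
        for (StateRecord rec : wal) {
            lastState.put(rec.partId, rec.state); // later records override earlier ones
        }
        Set<Integer> untrusted = new TreeSet<>();
        for (Map.Entry<Integer, PartState> e : lastState.entrySet()) {
            if (e.getValue() != PartState.OWNING) {
                untrusted.add(e.getKey());
            }
        }
        return untrusted;
    }

    public static void main(String[] args) {
        List<StateRecord> wal = Arrays.asList(
            new StateRecord(1, PartState.OWNING),
            new StateRecord(2, PartState.OWNING),
            // Partition 1 is evicted and then re-created; no "re-created" record is logged.
            new StateRecord(1, PartState.RENTING),
            new StateRecord(1, PartState.EVICTED),
            new StateRecord(1, PartState.MOVING));
        System.out.println(untrustedPartitions(wal)); // prints [1]
    }
}
```

In this model the RENTING, EVICTED, MOVING sequence already tells the replay that partition 1's old data is gone, which is the information a separate re-creation record would have carried.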
[jira] [Commented] (IGNITE-8295) Possible deadlock on partition eviction.
[ https://issues.apache.org/jira/browse/IGNITE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442297#comment-16442297 ]

Andrew Mashenkov commented on IGNITE-8295:

After wrapping partStoreLock in checkpointLock, I got the following stack trace. It seems we should truncate the partition file under checkpointLock.

java.lang.AssertionError: FullPageId [pageId=000100570003, effectivePageId=00570003, grpId=2141373874]
    at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:730)
    at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:624)
    at org.apache.ignite.internal.processors.cache.persistence.DataStructure.acquirePage(DataStructure.java:142)
    at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.saveMetadata(PagesList.java:301)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.saveStoreMetadata(GridCacheOffheapManager.java:186)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.onCheckpointBegin(GridCacheOffheapManager.java:164)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.markCheckpointBegin(GridCacheDatabaseSharedManager.java:3155)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.doCheckpoint(GridCacheDatabaseSharedManager.java:2909)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.body(GridCacheDatabaseSharedManager.java:2808)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:748)
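The suggestion to truncate the partition file under checkpointLock amounts to making "truncate, then reset metadata" atomic with respect to checkpoint begin. The following is a hypothetical model, not Ignite code: allocatedPages stands for the pages present in the partition file, maxReferencedPage for the free-list metadata that saveMetadata() walks, and the invalidPageAccess flag plays the role of the AssertionError raised in acquirePage(). It assumes a single truncating thread (in the real store partStoreLock would serialise truncations); the checkpoint read lock only keeps the checkpointer out of the intermediate state.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical model of truncating a partition file under the checkpoint lock; not Ignite code. */
public class TruncateUnderCheckpointLock {
    private final ReentrantReadWriteLock checkpointLock = new ReentrantReadWriteLock();
    private int allocatedPages = 100;   // pages present in the partition file
    private int maxReferencedPage = 99; // highest page index referenced by the metadata
    private final AtomicBoolean invalidPageAccess = new AtomicBoolean();

    /** Truncates and re-initialises the store; the intermediate state is never visible to a checkpoint. */
    public void truncateAndRecreate() {
        checkpointLock.readLock().lock();
        try {
            allocatedPages = 0;     // file truncated: every old page is gone...
            maxReferencedPage = -1; // ...and only afterwards is the metadata reset
            allocatedPages = 10;    // the recreated store allocates fresh pages
            maxReferencedPage = 9;
        } finally {
            checkpointLock.readLock().unlock();
        }
    }

    /** Models checkpoint begin: every page referenced by the metadata must still exist. */
    public void checkpointBegin() {
        checkpointLock.writeLock().lock();
        try {
            if (maxReferencedPage >= allocatedPages) {
                invalidPageAccess.set(true); // analogue of the AssertionError in acquirePage()
            }
        } finally {
            checkpointLock.writeLock().unlock();
        }
    }

    public boolean sawInvalidPageAccess() {
        return invalidPageAccess.get();
    }

    /** Runs one truncating thread against one checkpointing thread; true if no checkpoint saw a dangling page. */
    public static boolean checkpointsStayConsistent() {
        TruncateUnderCheckpointLock store = new TruncateUnderCheckpointLock();
        Thread truncator = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) store.truncateAndRecreate();
        });
        Thread checkpointer = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) store.checkpointBegin();
        });
        truncator.start();
        checkpointer.start();
        try {
            truncator.join();
            checkpointer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !store.sawInvalidPageAccess();
    }
}
```

If truncateAndRecreate() ran without the read lock, a checkpoint could start between "file truncated" and "metadata reset" and find metadata pointing at pages that no longer exist, which is the situation the stack trace shows.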
[jira] [Commented] (IGNITE-8295) Possible deadlock on partition eviction.
[ https://issues.apache.org/jira/browse/IGNITE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16440725#comment-16440725 ]

ASF GitHub Bot commented on IGNITE-8295:

GitHub user AMashenkov opened a pull request:

    https://github.com/apache/ignite/pull/3842

    IGNITE-8295: Fixed wrong checkpointLock vs partStoreLock order.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gridgain/apache-ignite ignite-8295

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/ignite/pull/3842.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3842

commit fb7956b0c4fc9d8b62aac1f831e0c0ef939275da
Author: Andrey V. Mashenkov
Date:   2018-04-17T10:32:12Z

    Fixed wrong checkpointLock vs partStoreLock order.
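The pull request describes the fix as correcting the checkpointLock vs partStoreLock order. A minimal sketch of that ordering rule follows, assuming a fair ReentrantReadWriteLock for the checkpoint lock and a ReentrantLock for partStoreLock. The method names echo the report, but the bodies and the onExchange(), checkpoint() and runsWithoutDeadlock() helpers are hypothetical and are not the code from the pull request. Every caller takes the checkpoint read lock before partStoreLock, so the nested acquisition inside updatePartitionCounter() is always reentrant and a waiting checkpointer cannot wedge itself between the two locks.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch of the fixed lock order; not the code from the pull request. */
public class FixedOrderSketch {
    // Fair mode so that a waiting checkpointer blocks new (non-reentrant) readers.
    private final ReentrantReadWriteLock checkpointLock = new ReentrantReadWriteLock(true);
    private final ReentrantLock partStoreLock = new ReentrantLock();
    private long partitionCounter; // guarded by partStoreLock

    /** Eviction path: checkpoint read lock first, then partStoreLock. */
    public void recreateStore(long counter) {
        checkpointLock.readLock().lock();
        try {
            partStoreLock.lock();
            try {
                updatePartitionCounter(counter);
            } finally {
                partStoreLock.unlock();
            }
        } finally {
            checkpointLock.readLock().unlock();
        }
    }

    /** Exchange path: already holds the read lock, so the acquisition in recreateStore() is reentrant. */
    public void onExchange(long counter) {
        checkpointLock.readLock().lock();
        try {
            recreateStore(counter);
        } finally {
            checkpointLock.readLock().unlock();
        }
    }

    /** Also takes the read lock, like the counter update described in the report; always reentrant here. */
    private void updatePartitionCounter(long counter) {
        checkpointLock.readLock().lock();
        try {
            partitionCounter = counter;
        } finally {
            checkpointLock.readLock().unlock();
        }
    }

    public void checkpoint() {
        checkpointLock.writeLock().lock();
        try {
            // A real checkpointer would collect dirty pages here.
        } finally {
            checkpointLock.writeLock().unlock();
        }
    }

    /** Runs the three paths concurrently; true if every thread finished within the timeout. */
    public static boolean runsWithoutDeadlock(long timeoutMillis) {
        FixedOrderSketch s = new FixedOrderSketch();
        Thread evictor = new Thread(() -> { for (int i = 0; i < 5_000; i++) s.recreateStore(i); });
        Thread exchange = new Thread(() -> { for (int i = 0; i < 5_000; i++) s.onExchange(i); });
        Thread checkpointer = new Thread(() -> { for (int i = 0; i < 1_000; i++) s.checkpoint(); });
        Thread[] all = {evictor, exchange, checkpointer};
        for (Thread t : all) t.setDaemon(true); // a deadlock must not keep the JVM alive
        for (Thread t : all) t.start();
        try {
            for (Thread t : all) t.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        for (Thread t : all) {
            if (t.isAlive()) return false;
        }
        return true;
    }
}
```

Because the checkpoint read lock is always the outer lock, the worst case is a thread waiting for a checkpoint to finish while it holds nothing, which cannot form a cycle.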