[jira] [Commented] (IGNITE-8295) Possible deadlock on partition eviction.

2018-04-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16454335#comment-16454335
 ] 

ASF GitHub Bot commented on IGNITE-8295:


GitHub user AMashenkov closed the pull request at:

https://github.com/apache/ignite/pull/3842


> Possible deadlock on partition eviction.
>
> Key: IGNITE-8295
> URL: https://issues.apache.org/jira/browse/IGNITE-8295
> Project: Ignite
> Issue Type: Bug
> Components: persistence
> Reporter: Andrew Mashenkov
> Assignee: Andrew Mashenkov
> Priority: Major
> Fix For: 2.6
>
> Attachments: deadlock.stack
>
> GridCacheOffheapManager.recreateCacheDataStore() calls
> updatePartitionCounter() under partStoreLock, and updatePartitionCounter()
> may try to acquire checkpointReadLock.
> recreateCacheDataStore() can be called with checkpointReadLock held (from
> GridDhtPartitionsExchangeFuture.updatePartitionFullMap) or without it (the
> GridDhtPartitionEvictor thread calls evictPartitionAsync), so a checkpoint
> that starts in between can cause a deadlock.
> It seems we should acquire checkpointReadLock before partStoreLock.
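The lock ordering proposed in the description can be sketched roughly as follows. This is a minimal illustration, not Ignite code: the class LockOrderSketch, its fields, and the run() helper are hypothetical stand-ins, and it assumes the checkpoint lock behaves like a reentrant read/write lock whose write side is taken by the checkpointer. Every caller takes the checkpoint read lock first and partStoreLock second, so the nested read-lock acquisition in updatePartitionCounter() is always reentrant and does not have to queue behind a waiting checkpoint.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for the locking described in the ticket; not Ignite code. */
public class LockOrderSketch {
    /** Stand-in for the checkpoint lock: workers take the read side, the checkpointer the write side. */
    private final ReentrantReadWriteLock checkpointLock = new ReentrantReadWriteLock();

    /** Stand-in for partStoreLock. */
    private final ReentrantLock partStoreLock = new ReentrantLock();

    /** Only modified while partStoreLock is held. */
    private long partCntr;

    /** Nested step that needs the checkpoint read lock; reentrant here because the caller already holds it. */
    private void updatePartitionCounter() {
        checkpointLock.readLock().lock();
        try {
            partCntr++;
        }
        finally {
            checkpointLock.readLock().unlock();
        }
    }

    /** Single entry point: checkpoint read lock first, partStoreLock second, for every caller. */
    public void recreateStore() {
        checkpointLock.readLock().lock();
        try {
            partStoreLock.lock();
            try {
                updatePartitionCounter();
            }
            finally {
                partStoreLock.unlock();
            }
        }
        finally {
            checkpointLock.readLock().unlock();
        }
    }

    /** Stand-in for a checkpoint: takes the write side exclusively. */
    public void checkpoint() {
        checkpointLock.writeLock().lock();
        try {
            // Checkpoint work would happen here.
        }
        finally {
            checkpointLock.writeLock().unlock();
        }
    }

    /** Runs two workers and a checkpointer concurrently and returns the final counter value. */
    public static long run(int iterations) {
        LockOrderSketch sketch = new LockOrderSketch();

        Runnable worker = () -> {
            for (int i = 0; i < iterations; i++)
                sketch.recreateStore();
        };

        Thread exchange = new Thread(worker, "exchange");
        Thread evictor = new Thread(worker, "evictor");
        Thread checkpointer = new Thread(() -> {
            for (int i = 0; i < iterations; i++)
                sketch.checkpoint();
        }, "checkpointer");

        exchange.start();
        evictor.start();
        checkpointer.start();

        try {
            exchange.join();
            evictor.join();
            checkpointer.join();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        return sketch.partCntr;
    }

    public static void main(String[] args) {
        System.out.println("Final counter: " + run(1000));
    }
}
```

If one of the workers instead took partStoreLock first and only then requested the read lock, a checkpointer waiting for the write lock could block that request while the other worker, holding the read lock, waits for partStoreLock. That is the cycle the description reports.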



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-8295) Possible deadlock on partition eviction.

2018-04-23 Thread Ilya Lantukh (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448399#comment-16448399
 ] 

Ilya Lantukh commented on IGNITE-8295:
--

GridCacheOffheapManager.recreateCacheDataStore() was removed in IGNITE-5874, so 
this ticket isn't relevant anymore.



[jira] [Commented] (IGNITE-8295) Possible deadlock on partition eviction.

2018-04-23 Thread Pavel Kovalenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448338#comment-16448338
 ] 

Pavel Kovalenko commented on IGNITE-8295:
-

[~agoncharuk] I confirm this. Without partition recreation, the presented 
deadlock is not possible.



[jira] [Commented] (IGNITE-8295) Possible deadlock on partition eviction.

2018-04-23 Thread Alexey Goncharuk (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448332#comment-16448332
 ] 

Alexey Goncharuk commented on IGNITE-8295:
--

[~amashenkov], [~ilantukh], [~Jokser], I think this change is not necessary 
since we want to get rid of partition recreation anyway. Can you confirm?



[jira] [Commented] (IGNITE-8295) Possible deadlock on partition eviction.

2018-04-23 Thread Andrew Mashenkov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448147#comment-16448147
 ] 

Andrew Mashenkov commented on IGNITE-8295:
--

[~agoncharuk], 
a link to the TC run has been added to the ticket.



[jira] [Commented] (IGNITE-8295) Possible deadlock on partition eviction.

2018-04-20 Thread Alexey Goncharuk (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16445614#comment-16445614
 ] 

Alexey Goncharuk commented on IGNITE-8295:
--

[~amashenkov], can you please attach a TC run link?



[jira] [Commented] (IGNITE-8295) Possible deadlock on partition eviction.

2018-04-19 Thread Ilya Lantukh (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444073#comment-16444073
 ] 

Ilya Lantukh commented on IGNITE-8295:
--

Changes look good to me. Regarding your TODO: we don't need to log a WAL record 
for partition re-creation, because we log partition state changes, which are 
enough to understand that the store is inconsistent.



[jira] [Commented] (IGNITE-8295) Possible deadlock on partition eviction.

2018-04-18 Thread Andrew Mashenkov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442297#comment-16442297
 ] 

Andrew Mashenkov commented on IGNITE-8295:
--

After wrapping partStoreLock in checkpointLock, I got the following stack trace.
It seems we should truncate the partition file under checkpointLock.

java.lang.AssertionError: FullPageId [pageId=000100570003, effectivePageId=00570003, grpId=2141373874]
 at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:730)
 at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:624)
 at org.apache.ignite.internal.processors.cache.persistence.DataStructure.acquirePage(DataStructure.java:142)
 at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.saveMetadata(PagesList.java:301)
 at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.saveStoreMetadata(GridCacheOffheapManager.java:186)
 at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.onCheckpointBegin(GridCacheOffheapManager.java:164)
 at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.markCheckpointBegin(GridCacheDatabaseSharedManager.java:3155)
 at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.doCheckpoint(GridCacheDatabaseSharedManager.java:2909)
 at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.body(GridCacheDatabaseSharedManager.java:2808)
 at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
 at java.lang.Thread.run(Thread.java:748)



[jira] [Commented] (IGNITE-8295) Possible deadlock on partition eviction.

2018-04-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16440725#comment-16440725
 ] 

ASF GitHub Bot commented on IGNITE-8295:


GitHub user AMashenkov opened a pull request:

https://github.com/apache/ignite/pull/3842

IGNITE-8295: Fixed wrong checkpointLock vs partStoreLock order.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gridgain/apache-ignite ignite-8295

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/3842.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3842


commit fb7956b0c4fc9d8b62aac1f831e0c0ef939275da
Author: Andrey V. Mashenkov 
Date:   2018-04-17T10:32:12Z

Fixed wrong checkpointLock vs partStoreLock order.



