Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager
I might have missed some context, but a side note: > If current thread is write lock owner then it can't acquire read lock. > RRWL doesn't allow upgrades or downgrades. >From ReentrantReadWriteLock javadoc: Reentrancy also allows downgrading from the write lock to a read lock, by acquiring the write lock, then the read lock and then releasing the write lock. However, upgrading from a read lock to the write lock is not possible. Best regards, Ivan Pavlukhin вт, 25 февр. 2020 г. в 22:05, Andrey Gura : > > Agree with Andrey and Slava. > > Your change could lead to exception in case when current thread does > not own read lock. > > > Checking for a writе lock holding through method isHeldByCurrentThread is > > not atomic, so there is no certainty that this thread still owns a write > > lock. > > What do you mean when talk "method isHeldByCurrentThread is not > atomic"? There is no need in any additional guarantees here because > current thread always sees own changes and any other thread could not > change thread reference if current thread is owner. Also current > thread can't check lock owning and release lock concurrently. > > > I assume that a read lock was obtained and in this code we did not release > > it. > > If current thread is write lock owner then it can't acquire read lock. > RRWL doesn't allow upgrades or downgrades. > > Stack traces from JIRA issue don't point to deadlock. "Deadlock: true" > means that JVM detected deadlock but it doesn't mean that exactly this > threads in deadlock. > > On Tue, Feb 25, 2020 at 9:25 PM Andrey Mashenkov > wrote: > > > > Write lock is an exclusive lock and no other threads can obtain read or > > write lock while current thread holding write lock. > > > > Why you think checking for current thread is lock-owner is not atomic? > > A thread that already holds a write lock has no needs to obtain read locks > > unless it releases the write-lock. > > > > Your change can have serious impact. > > Can you share a reproducer? > > > > > > вт, 25 февр. 2020 г., 20:42 Дмитрий Горчаков < > > dmitry.gorchakov.mu...@gmail.com>: > > > > > > Could you please describe in detail the issue you are trying to fix? > > > > > > I once got same deadlock on a productive in v2.7.5. New iteration of > > > checkpoint could not start. Dump attached. > > > > > > > By the way, please take a look at > > > GridCacheDatabaseSharedManager.checkpointReadLock: > > > > > > Thanks, I will look more closely at this check. > > > > > > > So, do we really need to unconditionally unlock the read lock at > > > checkpointReadUnlock as you propose? > > > > > > Checking for a writе lock holding through method isHeldByCurrentThread is > > > not atomic, so there is no certainty that this thread still owns a write > > > lock. > > > I assume that a read lock was obtained and in this code we did not release > > > it. > > > > > > Thanks! > > > > > > вт, 25 февр. 2020 г. в 14:35, Вячеслав Коптилин > > >: > > > > > >> Hello Dmitry, > > >> > > >> To be honest, I don't quite understand the proposed fix. Could you please > > >> describe in detail the issue you are trying to fix? > > >> By the way, please take a look at > > >> GridCacheDatabaseSharedManager.checkpointReadLock: > > >> > > >> @Override public void checkpointReadLock() { > > >> if (checkpointLock.writeLock().isHeldByCurrentThread()) > > >> return; > > >> ... > > >> } > > >> > > >> In other words, the read lock is not acquired in case the write lock is > > >> already held. > > >> So, do we really need to unconditionally unlock the read lock at > > >> checkpointReadUnlock as you propose? > > >> > > >> Thanks, > > >> S. > > >> > > >> вт, 25 февр. 2020 г. в 13:32, Dmitry Gorchakov < > > >> dmitry.gorchakov.mu...@gmail.com>: > > >> > > >> > This issue once reproduced in my productive with similar stack trace. > > >> > During > > >> > sources analysis I saw the use unsafe of method > > >> > writeLock().isHeldByCurrentThread() for unlocking checkpoint. This may > > >> > cause > > >> > thread to starvation. Problem no longer appeared after local fix. > > >> > I prepared pull request https://github.com/apache/ignite/pull/7445. > > >> > Accept please jira issue to me and review my PR. > > >> > > > >> > > > >> > > > >> > > > >> > -- > > >> > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/ > > >> > > > >> > > >
Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager
Agree with Andrey and Slava. Your change could lead to exception in case when current thread does not own read lock. > Checking for a writе lock holding through method isHeldByCurrentThread is not > atomic, so there is no certainty that this thread still owns a write lock. What do you mean when talk "method isHeldByCurrentThread is not atomic"? There is no need in any additional guarantees here because current thread always sees own changes and any other thread could not change thread reference if current thread is owner. Also current thread can't check lock owning and release lock concurrently. > I assume that a read lock was obtained and in this code we did not release it. If current thread is write lock owner then it can't acquire read lock. RRWL doesn't allow upgrades or downgrades. Stack traces from JIRA issue don't point to deadlock. "Deadlock: true" means that JVM detected deadlock but it doesn't mean that exactly this threads in deadlock. On Tue, Feb 25, 2020 at 9:25 PM Andrey Mashenkov wrote: > > Write lock is an exclusive lock and no other threads can obtain read or > write lock while current thread holding write lock. > > Why you think checking for current thread is lock-owner is not atomic? > A thread that already holds a write lock has no needs to obtain read locks > unless it releases the write-lock. > > Your change can have serious impact. > Can you share a reproducer? > > > вт, 25 февр. 2020 г., 20:42 Дмитрий Горчаков < > dmitry.gorchakov.mu...@gmail.com>: > > > > Could you please describe in detail the issue you are trying to fix? > > > > I once got same deadlock on a productive in v2.7.5. New iteration of > > checkpoint could not start. Dump attached. > > > > > By the way, please take a look at > > GridCacheDatabaseSharedManager.checkpointReadLock: > > > > Thanks, I will look more closely at this check. > > > > > So, do we really need to unconditionally unlock the read lock at > > checkpointReadUnlock as you propose? > > > > Checking for a writе lock holding through method isHeldByCurrentThread is > > not atomic, so there is no certainty that this thread still owns a write > > lock. > > I assume that a read lock was obtained and in this code we did not release > > it. > > > > Thanks! > > > > вт, 25 февр. 2020 г. в 14:35, Вячеслав Коптилин > >: > > > >> Hello Dmitry, > >> > >> To be honest, I don't quite understand the proposed fix. Could you please > >> describe in detail the issue you are trying to fix? > >> By the way, please take a look at > >> GridCacheDatabaseSharedManager.checkpointReadLock: > >> > >> @Override public void checkpointReadLock() { > >> if (checkpointLock.writeLock().isHeldByCurrentThread()) > >> return; > >> ... > >> } > >> > >> In other words, the read lock is not acquired in case the write lock is > >> already held. > >> So, do we really need to unconditionally unlock the read lock at > >> checkpointReadUnlock as you propose? > >> > >> Thanks, > >> S. > >> > >> вт, 25 февр. 2020 г. в 13:32, Dmitry Gorchakov < > >> dmitry.gorchakov.mu...@gmail.com>: > >> > >> > This issue once reproduced in my productive with similar stack trace. > >> > During > >> > sources analysis I saw the use unsafe of method > >> > writeLock().isHeldByCurrentThread() for unlocking checkpoint. This may > >> > cause > >> > thread to starvation. Problem no longer appeared after local fix. > >> > I prepared pull request https://github.com/apache/ignite/pull/7445. > >> > Accept please jira issue to me and review my PR. > >> > > >> > > >> > > >> > > >> > -- > >> > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/ > >> > > >> > >
Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager
Write lock is an exclusive lock and no other threads can obtain read or write lock while current thread holding write lock. Why you think checking for current thread is lock-owner is not atomic? A thread that already holds a write lock has no needs to obtain read locks unless it releases the write-lock. Your change can have serious impact. Can you share a reproducer? вт, 25 февр. 2020 г., 20:42 Дмитрий Горчаков < dmitry.gorchakov.mu...@gmail.com>: > > Could you please describe in detail the issue you are trying to fix? > > I once got same deadlock on a productive in v2.7.5. New iteration of > checkpoint could not start. Dump attached. > > > By the way, please take a look at > GridCacheDatabaseSharedManager.checkpointReadLock: > > Thanks, I will look more closely at this check. > > > So, do we really need to unconditionally unlock the read lock at > checkpointReadUnlock as you propose? > > Checking for a writе lock holding through method isHeldByCurrentThread is > not atomic, so there is no certainty that this thread still owns a write > lock. > I assume that a read lock was obtained and in this code we did not release > it. > > Thanks! > > вт, 25 февр. 2020 г. в 14:35, Вячеслав Коптилин >: > >> Hello Dmitry, >> >> To be honest, I don't quite understand the proposed fix. Could you please >> describe in detail the issue you are trying to fix? >> By the way, please take a look at >> GridCacheDatabaseSharedManager.checkpointReadLock: >> >> @Override public void checkpointReadLock() { >> if (checkpointLock.writeLock().isHeldByCurrentThread()) >> return; >> ... >> } >> >> In other words, the read lock is not acquired in case the write lock is >> already held. >> So, do we really need to unconditionally unlock the read lock at >> checkpointReadUnlock as you propose? >> >> Thanks, >> S. >> >> вт, 25 февр. 2020 г. в 13:32, Dmitry Gorchakov < >> dmitry.gorchakov.mu...@gmail.com>: >> >> > This issue once reproduced in my productive with similar stack trace. >> > During >> > sources analysis I saw the use unsafe of method >> > writeLock().isHeldByCurrentThread() for unlocking checkpoint. This may >> > cause >> > thread to starvation. Problem no longer appeared after local fix. >> > I prepared pull request https://github.com/apache/ignite/pull/7445. >> > Accept please jira issue to me and review my PR. >> > >> > >> > >> > >> > -- >> > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/ >> > >> >
Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager
Hello Dmitry, To be honest, I don't quite understand the proposed fix. Could you please describe in detail the issue you are trying to fix? By the way, please take a look at GridCacheDatabaseSharedManager.checkpointReadLock: @Override public void checkpointReadLock() { if (checkpointLock.writeLock().isHeldByCurrentThread()) return; ... } In other words, the read lock is not acquired in case the write lock is already held. So, do we really need to unconditionally unlock the read lock at checkpointReadUnlock as you propose? Thanks, S. вт, 25 февр. 2020 г. в 13:32, Dmitry Gorchakov < dmitry.gorchakov.mu...@gmail.com>: > This issue once reproduced in my productive with similar stack trace. > During > sources analysis I saw the use unsafe of method > writeLock().isHeldByCurrentThread() for unlocking checkpoint. This may > cause > thread to starvation. Problem no longer appeared after local fix. > I prepared pull request https://github.com/apache/ignite/pull/7445. > Accept please jira issue to me and review my PR. > > > > > -- > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/ >
Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager
Hello, Dmitriy. Do you have a reproducer? minimal application that can reproduce this specific bug. Can you, please, run a TC for your PR and get a TC bot visa? > 25 февр. 2020 г., в 11:08, Dmitry Gorchakov > написал(а): > > This issue once reproduced in my productive with similar stack trace. During > sources analysis I saw the use unsafe of method > writeLock().isHeldByCurrentThread() for unlocking checkpoint. This may cause > thread to starvation. Problem no longer appeared after local fix. > I prepared pull request https://github.com/apache/ignite/pull/7445. > Accept please jira issue to me and review my PR. > > > > > -- > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager
This issue once reproduced in my productive with similar stack trace. During sources analysis I saw the use unsafe of method writeLock().isHeldByCurrentThread() for unlocking checkpoint. This may cause thread to starvation. Problem no longer appeared after local fix. I prepared pull request https://github.com/apache/ignite/pull/7445. Accept please jira issue to me and review my PR. -- Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager
This issue once reproduced in my productive with similar stack trace. During sources analysis I saw the use unsafe of method writeLock().isHeldByCurrentThread() for unlocking checkpoint. This may cause thread to starvation. Problem no longer appeared after local fix. I prepared pull request https://github.com/apache/ignite/pull/7445. Accept please jira issue to me and review my PR. -- Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager
This issue once reproduced in my productive with similar stack trace. During sources analysis I saw the use unsafe of method writeLock().isHeldByCurrentThread() for unlocking checkpoint. This may cause thread to starvation. Problem no longer appeared after local fix. I prepared pull request https://github.com/apache/ignite/pull/7445. Accept please jira issue to me and review my PR. -- Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
[jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager
Joel Lang created IGNITE-6530: - Summary: Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager Key: IGNITE-6530 URL: https://issues.apache.org/jira/browse/IGNITE-6530 Project: Ignite Issue Type: Bug Components: persistence Affects Versions: 2.2 Reporter: Joel Lang Just started receiving a great deal of warnings about possible starvation in stripped pool. The stack trace shows it happens in checkpointReadLock() in GridCacheDatabaseSharedManager. Here are the log messages: {noformat} 2017-09-28 13:15:12 [grid-timeout-worker-#15%mbe%] WARN o.a.ignite.internal.util.typedef.G - >>> Possible starvation in striped pool. Thread name: sys-stripe-4-#5%mbe% Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearAtomicSingleUpdateRequest [key=BinaryObjectImpl [arr= true, ctx=false, start=0], parent=GridNearAtomicAbstractUpdateRequest [res=null, flags=] Deadlock: true Completed: 3212 Thread [name="sys-stripe-4-#5%mbe%", id=19, state=WAITING, blockCnt=12, waitCnt=5835] Lock [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@2e1993dd, ownerName=null, ownerId=-1] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(Unknown Source) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(Unknown Source) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(Unknown Source) at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(Unknown Source) at o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:847) at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1770) at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1686) at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3063) at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:129) at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:265) at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:260) at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1042) at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:561) at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378) at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304) at o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99) at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293) at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556) at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184) at o.a.i.i.managers.communication.GridIoManager.access$4200(GridIoManager.java:126) at o.a.i.i.managers.communication.GridIoManager$9.run(GridIoManager.java:1097) at o.a.i.i.util.StripedExecutor$Stripe.run(StripedExecutor.java:483) at java.lang.Thread.run(Unknown Source) 2017-09-28 13:15:12 [grid-timeout-worker-#15%mbe%] WARN o.a.ignite.internal.util.typedef.G - >>> Possible starvation in striped pool. Thread name: sys-stripe-5-#6%mbe% Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearAtomicSingleUpdateRequest [key=BinaryObjectImpl [arr= true, ctx=false, start=0], parent=GridNearAtomicAbstractUpdateRequest [res=null, flags=] Deadlock: true Completed: 3524 Thread [name="sys-stripe-5-#6%mbe%", id=20, state=WAITING, blockCnt=3, waitCnt=6730] Lock [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@2e1993dd, ownerName=null, ownerId=-1] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(Unknown Source) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(Unknown Source) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(Unknown Source)