Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager

2020-02-28 Thread Ivan Pavlukhin
I might have missed some context, but a side note:

> If current thread is write lock owner then it can't acquire read lock.
> RRWL doesn't allow upgrades or downgrades.

>From ReentrantReadWriteLock javadoc:
Reentrancy also allows downgrading from the write lock to a read lock,
by acquiring the write lock, then the read lock and then releasing the
write lock. However, upgrading from a read lock to the write lock is
not possible.


Best regards,
Ivan Pavlukhin

вт, 25 февр. 2020 г. в 22:05, Andrey Gura :
>
> Agree with Andrey and Slava.
>
> Your change could lead to exception in case when current thread does
> not own read lock.
>
> > Checking for a writе lock holding through method isHeldByCurrentThread is 
> > not atomic, so there is no certainty that this thread still owns a write 
> > lock.
>
> What do you mean when talk "method isHeldByCurrentThread is not
> atomic"? There is no need in any additional guarantees here because
> current thread always sees own changes and any other thread could not
> change thread reference if current thread is owner. Also current
> thread can't check lock owning and release lock concurrently.
>
> > I assume that a read lock was obtained and in this code we did not release 
> > it.
>
> If current thread is write lock owner then it can't acquire read lock.
> RRWL doesn't allow upgrades or downgrades.
>
> Stack traces from JIRA issue don't point to deadlock. "Deadlock: true"
> means that JVM detected deadlock but it doesn't mean that exactly this
> threads in deadlock.
>
> On Tue, Feb 25, 2020 at 9:25 PM Andrey Mashenkov
>  wrote:
> >
> > Write lock is an exclusive lock and no other threads can obtain read or
> > write lock while current thread holding write lock.
> >
> > Why you think checking for current thread is lock-owner is not atomic?
> > A thread that already holds a write lock has no needs to obtain read locks
> > unless it releases the write-lock.
> >
> > Your change can have serious impact.
> > Can you share a reproducer?
> >
> >
> > вт, 25 февр. 2020 г., 20:42 Дмитрий Горчаков <
> > dmitry.gorchakov.mu...@gmail.com>:
> >
> > > > Could you please describe in detail the issue you are trying to fix?
> > >
> > > I once got same deadlock on a productive in v2.7.5. New iteration of
> > > checkpoint could not start. Dump attached.
> > >
> > > > By the way, please take a look at
> > > GridCacheDatabaseSharedManager.checkpointReadLock:
> > >
> > > Thanks, I will look more closely at this check.
> > >
> > > > So, do we really need to unconditionally unlock the read lock at
> > > checkpointReadUnlock as you propose?
> > >
> > > Checking for a writе lock holding through method isHeldByCurrentThread is
> > > not atomic, so there is no certainty that this thread still owns a write
> > > lock.
> > > I assume that a read lock was obtained and in this code we did not release
> > > it.
> > >
> > > Thanks!
> > >
> > > вт, 25 февр. 2020 г. в 14:35, Вячеслав Коптилин  > > >:
> > >
> > >> Hello Dmitry,
> > >>
> > >> To be honest, I don't quite understand the proposed fix. Could you please
> > >> describe in detail the issue you are trying to fix?
> > >> By the way, please take a look at
> > >> GridCacheDatabaseSharedManager.checkpointReadLock:
> > >>
> > >> @Override public void checkpointReadLock() {
> > >> if (checkpointLock.writeLock().isHeldByCurrentThread())
> > >> return;
> > >> ...
> > >> }
> > >>
> > >> In other words, the read lock is not acquired in case the write lock is
> > >> already held.
> > >> So, do we really need to unconditionally unlock the read lock at
> > >> checkpointReadUnlock as you propose?
> > >>
> > >> Thanks,
> > >> S.
> > >>
> > >> вт, 25 февр. 2020 г. в 13:32, Dmitry Gorchakov <
> > >> dmitry.gorchakov.mu...@gmail.com>:
> > >>
> > >> > This issue once reproduced in my productive with similar stack trace.
> > >> > During
> > >> > sources analysis I saw the use unsafe of method
> > >> > writeLock().isHeldByCurrentThread() for unlocking checkpoint. This may
> > >> > cause
> > >> > thread to starvation. Problem no longer appeared after local fix.
> > >> > I prepared pull request https://github.com/apache/ignite/pull/7445.
> > >> > Accept please jira issue to me and review my PR.
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> > >> >
> > >>
> > >


Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager

2020-02-25 Thread Andrey Gura
Agree with Andrey and Slava.

Your change could lead to exception in case when current thread does
not own read lock.

> Checking for a writе lock holding through method isHeldByCurrentThread is not 
> atomic, so there is no certainty that this thread still owns a write lock.

What do you mean when talk "method isHeldByCurrentThread is not
atomic"? There is no need in any additional guarantees here because
current thread always sees own changes and any other thread could not
change thread reference if current thread is owner. Also current
thread can't check lock owning and release lock concurrently.

> I assume that a read lock was obtained and in this code we did not release it.

If current thread is write lock owner then it can't acquire read lock.
RRWL doesn't allow upgrades or downgrades.

Stack traces from JIRA issue don't point to deadlock. "Deadlock: true"
means that JVM detected deadlock but it doesn't mean that exactly this
threads in deadlock.

On Tue, Feb 25, 2020 at 9:25 PM Andrey Mashenkov
 wrote:
>
> Write lock is an exclusive lock and no other threads can obtain read or
> write lock while current thread holding write lock.
>
> Why you think checking for current thread is lock-owner is not atomic?
> A thread that already holds a write lock has no needs to obtain read locks
> unless it releases the write-lock.
>
> Your change can have serious impact.
> Can you share a reproducer?
>
>
> вт, 25 февр. 2020 г., 20:42 Дмитрий Горчаков <
> dmitry.gorchakov.mu...@gmail.com>:
>
> > > Could you please describe in detail the issue you are trying to fix?
> >
> > I once got same deadlock on a productive in v2.7.5. New iteration of
> > checkpoint could not start. Dump attached.
> >
> > > By the way, please take a look at
> > GridCacheDatabaseSharedManager.checkpointReadLock:
> >
> > Thanks, I will look more closely at this check.
> >
> > > So, do we really need to unconditionally unlock the read lock at
> > checkpointReadUnlock as you propose?
> >
> > Checking for a writе lock holding through method isHeldByCurrentThread is
> > not atomic, so there is no certainty that this thread still owns a write
> > lock.
> > I assume that a read lock was obtained and in this code we did not release
> > it.
> >
> > Thanks!
> >
> > вт, 25 февр. 2020 г. в 14:35, Вячеслав Коптилин  > >:
> >
> >> Hello Dmitry,
> >>
> >> To be honest, I don't quite understand the proposed fix. Could you please
> >> describe in detail the issue you are trying to fix?
> >> By the way, please take a look at
> >> GridCacheDatabaseSharedManager.checkpointReadLock:
> >>
> >> @Override public void checkpointReadLock() {
> >> if (checkpointLock.writeLock().isHeldByCurrentThread())
> >> return;
> >> ...
> >> }
> >>
> >> In other words, the read lock is not acquired in case the write lock is
> >> already held.
> >> So, do we really need to unconditionally unlock the read lock at
> >> checkpointReadUnlock as you propose?
> >>
> >> Thanks,
> >> S.
> >>
> >> вт, 25 февр. 2020 г. в 13:32, Dmitry Gorchakov <
> >> dmitry.gorchakov.mu...@gmail.com>:
> >>
> >> > This issue once reproduced in my productive with similar stack trace.
> >> > During
> >> > sources analysis I saw the use unsafe of method
> >> > writeLock().isHeldByCurrentThread() for unlocking checkpoint. This may
> >> > cause
> >> > thread to starvation. Problem no longer appeared after local fix.
> >> > I prepared pull request https://github.com/apache/ignite/pull/7445.
> >> > Accept please jira issue to me and review my PR.
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> >> >
> >>
> >


Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager

2020-02-25 Thread Andrey Mashenkov
Write lock is an exclusive lock and no other threads can obtain read or
write lock while current thread holding write lock.

Why you think checking for current thread is lock-owner is not atomic?
A thread that already holds a write lock has no needs to obtain read locks
unless it releases the write-lock.

Your change can have serious impact.
Can you share a reproducer?


вт, 25 февр. 2020 г., 20:42 Дмитрий Горчаков <
dmitry.gorchakov.mu...@gmail.com>:

> > Could you please describe in detail the issue you are trying to fix?
>
> I once got same deadlock on a productive in v2.7.5. New iteration of
> checkpoint could not start. Dump attached.
>
> > By the way, please take a look at
> GridCacheDatabaseSharedManager.checkpointReadLock:
>
> Thanks, I will look more closely at this check.
>
> > So, do we really need to unconditionally unlock the read lock at
> checkpointReadUnlock as you propose?
>
> Checking for a writе lock holding through method isHeldByCurrentThread is
> not atomic, so there is no certainty that this thread still owns a write
> lock.
> I assume that a read lock was obtained and in this code we did not release
> it.
>
> Thanks!
>
> вт, 25 февр. 2020 г. в 14:35, Вячеслав Коптилин  >:
>
>> Hello Dmitry,
>>
>> To be honest, I don't quite understand the proposed fix. Could you please
>> describe in detail the issue you are trying to fix?
>> By the way, please take a look at
>> GridCacheDatabaseSharedManager.checkpointReadLock:
>>
>> @Override public void checkpointReadLock() {
>> if (checkpointLock.writeLock().isHeldByCurrentThread())
>> return;
>> ...
>> }
>>
>> In other words, the read lock is not acquired in case the write lock is
>> already held.
>> So, do we really need to unconditionally unlock the read lock at
>> checkpointReadUnlock as you propose?
>>
>> Thanks,
>> S.
>>
>> вт, 25 февр. 2020 г. в 13:32, Dmitry Gorchakov <
>> dmitry.gorchakov.mu...@gmail.com>:
>>
>> > This issue once reproduced in my productive with similar stack trace.
>> > During
>> > sources analysis I saw the use unsafe of method
>> > writeLock().isHeldByCurrentThread() for unlocking checkpoint. This may
>> > cause
>> > thread to starvation. Problem no longer appeared after local fix.
>> > I prepared pull request https://github.com/apache/ignite/pull/7445.
>> > Accept please jira issue to me and review my PR.
>> >
>> >
>> >
>> >
>> > --
>> > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>> >
>>
>


Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager

2020-02-25 Thread Вячеслав Коптилин
Hello Dmitry,

To be honest, I don't quite understand the proposed fix. Could you please
describe in detail the issue you are trying to fix?
By the way, please take a look at
GridCacheDatabaseSharedManager.checkpointReadLock:

@Override public void checkpointReadLock() {
if (checkpointLock.writeLock().isHeldByCurrentThread())
return;
...
}

In other words, the read lock is not acquired in case the write lock is
already held.
So, do we really need to unconditionally unlock the read lock at
checkpointReadUnlock as you propose?

Thanks,
S.

вт, 25 февр. 2020 г. в 13:32, Dmitry Gorchakov <
dmitry.gorchakov.mu...@gmail.com>:

> This issue once reproduced in my productive with similar stack trace.
> During
> sources analysis I saw the use unsafe of method
> writeLock().isHeldByCurrentThread() for unlocking checkpoint. This may
> cause
> thread to starvation. Problem no longer appeared after local fix.
> I prepared pull request https://github.com/apache/ignite/pull/7445.
> Accept please jira issue to me and review my PR.
>
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>


Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager

2020-02-25 Thread Nikolay Izhikov
Hello, Dmitriy.

Do you have a reproducer? minimal application that can reproduce this specific 
bug.

Can you, please, run a TC for your PR and get a TC bot visa?

> 25 февр. 2020 г., в 11:08, Dmitry Gorchakov 
>  написал(а):
> 
> This issue once reproduced in my productive with similar stack trace. During
> sources analysis I saw the use unsafe of method
> writeLock().isHeldByCurrentThread() for unlocking checkpoint. This may cause
> thread to starvation. Problem no longer appeared after local fix.
> I prepared pull request https://github.com/apache/ignite/pull/7445. 
> Accept please jira issue to me and review my PR.
> 
> 
> 
> 
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/



Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager

2020-02-25 Thread Dmitry Gorchakov
This issue once reproduced in my productive with similar stack trace. During
sources analysis I saw the use unsafe of method
writeLock().isHeldByCurrentThread() for unlocking checkpoint. This may cause
thread to starvation. Problem no longer appeared after local fix.
I prepared pull request https://github.com/apache/ignite/pull/7445. 
Accept please jira issue to me and review my PR.



--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/


Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager

2020-02-25 Thread Dmitry Gorchakov
This issue once reproduced in my productive with similar stack trace. During
sources analysis I saw the use unsafe of method
writeLock().isHeldByCurrentThread() for unlocking checkpoint. This may cause
thread to starvation. Problem no longer appeared after local fix.
I prepared pull request https://github.com/apache/ignite/pull/7445. 
Accept please jira issue to me and review my PR.




--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/


Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager

2020-02-25 Thread Dmitry Gorchakov
This issue once reproduced in my productive with similar stack trace. During
sources analysis I saw the use unsafe of method
writeLock().isHeldByCurrentThread() for unlocking checkpoint. This may cause
thread to starvation. Problem no longer appeared after local fix.
I prepared pull request https://github.com/apache/ignite/pull/7445. 
Accept please jira issue to me and review my PR.




--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/


[jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager

2017-09-28 Thread Joel Lang (JIRA)
Joel Lang created IGNITE-6530:
-

 Summary: Deadlock in checkpointReadLock method in 
GridCacheDatabaseSharedManager
 Key: IGNITE-6530
 URL: https://issues.apache.org/jira/browse/IGNITE-6530
 Project: Ignite
  Issue Type: Bug
  Components: persistence
Affects Versions: 2.2
Reporter: Joel Lang


Just started receiving a great deal of warnings about possible starvation in 
stripped pool.

The stack trace shows it happens in checkpointReadLock() in 
GridCacheDatabaseSharedManager.

Here are the log messages:

{noformat}
2017-09-28 13:15:12 [grid-timeout-worker-#15%mbe%] WARN  
o.a.ignite.internal.util.typedef.G - >>> Possible starvation in striped pool.
Thread name: sys-stripe-4-#5%mbe%
Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, 
topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, 
msg=GridNearAtomicSingleUpdateRequest [key=BinaryObjectImpl [arr= true, 
ctx=false, start=0], parent=GridNearAtomicAbstractUpdateRequest [res=null, 
flags=]
Deadlock: true
Completed: 3212
Thread [name="sys-stripe-4-#5%mbe%", id=19, state=WAITING, blockCnt=12, 
waitCnt=5835]
Lock 
[object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@2e1993dd, 
ownerName=null, ownerId=-1]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(Unknown Source)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown
 Source)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(Unknown 
Source)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(Unknown 
Source)
at 
java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(Unknown Source)
at 
o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:847)
at 
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1770)
at 
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1686)
at 
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3063)
at 
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:129)
at 
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:265)
at 
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:260)
at 
o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1042)
at 
o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:561)
at 
o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
at 
o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
at 
o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
at 
o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
at 
o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
at 
o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
at 
o.a.i.i.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
at 
o.a.i.i.managers.communication.GridIoManager$9.run(GridIoManager.java:1097)
at o.a.i.i.util.StripedExecutor$Stripe.run(StripedExecutor.java:483)
at java.lang.Thread.run(Unknown Source)

2017-09-28 13:15:12 [grid-timeout-worker-#15%mbe%] WARN  
o.a.ignite.internal.util.typedef.G - >>> Possible starvation in striped pool.
Thread name: sys-stripe-5-#6%mbe%
Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, 
topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, 
msg=GridNearAtomicSingleUpdateRequest [key=BinaryObjectImpl [arr= true, 
ctx=false, start=0], parent=GridNearAtomicAbstractUpdateRequest [res=null, 
flags=]
Deadlock: true
Completed: 3524
Thread [name="sys-stripe-5-#6%mbe%", id=20, state=WAITING, blockCnt=3, 
waitCnt=6730]
Lock 
[object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@2e1993dd, 
ownerName=null, ownerId=-1]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(Unknown Source)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown
 Source)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(Unknown 
Source)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(Unknown 
Source)