Re: [lttng-dev] Question about lock in synchronize_rcu implementation of URCU

2016-04-28 Thread Mathieu Desnoyers
- On Apr 28, 2016, at 9:47 AM, Yuxin Ren r...@gwmail.gwu.edu wrote:

> Hi Boqun and Paul,
> 
> Thank you so much for your help.
> 
> I found one reason to use that lock.
> In the slow path, a thread will move all waiters to a local queue.
> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L406
> After this, a following thread can also perform a grace period, as the
> global waiter queue is empty.
> Thus the rcu_gp_lock is used to ensure mutual exclusion.
> 
> However, from a real-time perspective, such blocking will cause priority
> inversion: a higher-priority writer can be blocked by a lower-priority
> writer.
> Is there a way to modify the code to allow multiple threads to perform
> grace periods concurrently?

By the way, there are other reasons for this lock in the urcu implementation:
it protects the rcu_registry, and the rcu_gp.ctr count.
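To make that concrete, here is a rough sketch (NOT the actual urcu.c; the
helper functions are stand-ins for the real internals) of how the locks nest
in the synchronize_rcu() slow path:

/* Rough sketch of the synchronize_rcu() slow path, for discussion only.
 * rcu_gp_lock is held across the whole grace period, so it also serializes
 * the waiter hand-off and the accesses to the rcu_registry and rcu_gp.ctr;
 * rcu_registry_lock is additionally taken around the registry walks. */
#include <pthread.h>

static pthread_mutex_t rcu_gp_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t rcu_registry_lock = PTHREAD_MUTEX_INITIALIZER;

static void move_waiters_to_local_queue(void) { /* detach the gp_waiters batch */ }
static void wait_for_readers_and_flip_ctr(void) { /* walks registry, flips rcu_gp.ctr */ }
static void wake_up_local_waiters(void) { /* wake the detached batch */ }

static void synchronize_rcu_slowpath_sketch(void)
{
        pthread_mutex_lock(&rcu_gp_lock);        /* one grace period at a time */
        move_waiters_to_local_queue();

        pthread_mutex_lock(&rcu_registry_lock);
        wait_for_readers_and_flip_ctr();
        pthread_mutex_unlock(&rcu_registry_lock);

        pthread_mutex_unlock(&rcu_gp_lock);

        wake_up_local_waiters();                 /* everyone batched got this grace period */
}

Without rcu_gp_lock, two threads can run this slow path at the same time,
which is exactly the failure Boqun reproduces below.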

Thanks,

Mathieu

> 
> Thanks again!!
> Yuxin
> 
> On Thu, Apr 28, 2016 at 8:44 AM, Boqun Feng  wrote:
>> Hi Paul and Yuxin,
>>
>> On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote:
>>> Try building without it and see what happens when you run the tests.
>>>
>>
>> I've run a 'regtest' with the following modification out of curiosity:
>>
>> diff --git a/urcu.c b/urcu.c
>> index a5568bdbd075..9dc3c9feae56 100644
>> --- a/urcu.c
>> +++ b/urcu.c
>> @@ -398,8 +398,6 @@ void synchronize_rcu(void)
>> /* We won't need to wake ourself up */
>> urcu_wait_set_state(&wait, URCU_WAIT_RUNNING);
>>
>> -   mutex_lock(&rcu_gp_lock);
>> -
>> /*
>>  * Move all waiters into our local queue.
>>  */
>> @@ -480,7 +478,6 @@ void synchronize_rcu(void)
>> smp_mb_master();
>>  out:
>> mutex_unlock(&rcu_registry_lock);
>> -   mutex_unlock(&rcu_gp_lock);
>>
>> /*
>>  * Wakeup waiters only after we have completed the grace period
>>
>>
>> And guess what, the result of the test was:
>>
>> Test Summary Report
>> ---
>> ./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18)
>>   Failed tests:  30, 45, 60, 75, 90, 103, 105, 120, 135
>>   150, 165, 180, 195, 210, 225, 240, 253
>>   255
>> Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr  0.04 sys + 8981.02 cusr
>> 597.64 csys = 9579.25 CPU)
>> Result: FAIL
>>
>> And test case 30 for example is something like:
>>
>> tests/benchmark/test_urcu_mb   1 -d 0 -b 32768
>>
>> and it failed because:
>>
>> lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8'
>> failed.
>> zsh: abort (core dumped)  ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 1 
>> -d
>> 0 -b 32768
>>
>> So I think what was going on here was:
>>
>> CPU 0 (reader) / CPU 1 (writer) / CPU 2 (writer), interleaved in time:
>>
>> CPU 0: rcu_read_lock();
>> CPU 0: local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old
>> CPU 2: new = malloc(sizeof(int));
>> CPU 2: *new = 8;
>> CPU 2: old = rcu_xchg_pointer(&test_rcu_pointer, new);
>> CPU 1: synchronize_rcu():
>> CPU 1:   urcu_wait_add(); // return 0
>> CPU 1:   urcu_move_waiters(); // @gp_waiters is empty,
>> CPU 1:                        // the next urcu_wait_add() will return 0
>> CPU 2: synchronize_rcu():
>> CPU 2:   urcu_wait_add(); // return 0
>> CPU 1:   mutex_lock(&rcu_registry_lock);
>> CPU 1:   wait_for_readers(); // remove registered reader from @registry,
>> CPU 1:                       // release rcu_registry_lock and wait via poll()
>> CPU 2:   mutex_lock(&rcu_registry_lock);
>> CPU 2:   wait_for_readers(); // @registry is empty! we are so lucky
>> CPU 2:   return;
>> CPU 2: if (old)
>> CPU 2:   free(old);
>> CPU 0: rcu_read_unlock();
>> CPU 0: assert(*local_ptr == 8); // but local_ptr (i.e. old) is already freed.
>>
>>
>> So the point is there could be two writers calling synchronize_rcu() but
>> not returning early (both of them enter the slow path to perform a grace
>> period), so the rcu_gp_lock is necessary in this case.
>>
>> (Cc  Mathieu)
>>
>> But this is only my understanding and I'm learning the URCU code too ;-)
>>
>> Regards,
>> Boqun
>>
>>
>>> Might well be that it is unnecessary, but I will defer to Mathieu
>>> on that point.

Re: [lttng-dev] Question about lock in synchronize_rcu implementation of URCU

2016-04-28 Thread Paul E. McKenney
On Thu, Apr 28, 2016 at 02:38:40PM +, Mathieu Desnoyers wrote:
> - On Apr 28, 2016, at 9:47 AM, Yuxin Ren r...@gwmail.gwu.edu wrote:
> 
> > Hi Boqun and Paul,
> > 
> > Thank you so much for your help.
> > 
> > I found one reason to use that lock.
> > In the slow path, a thread will move all waiters to a local queue.
> > https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L406
> > After this, a following thread can also perform a grace period, as the
> > global waiter queue is empty.
> > Thus the rcu_gp_lock is used to ensure mutual exclusion.
> > 
> > However, from a real-time perspective, such blocking will cause priority
> > inversion: a higher-priority writer can be blocked by a lower-priority
> > writer.
> > Is there a way to modify the code to allow multiple threads to perform
> > grace periods concurrently?
> 
> Before we redesign urcu for RT, would it be possible to simply
> use pi-mutexes (priority inheritance) instead to protect grace periods
> from each other with the current urcu scheme?

Given that priority inversion can happen with low-priority readers
blocking a grace period that a high-priority updater is waiting on,
I stand by my earlier advice:  Don't let high-priority updaters block
waiting for grace periods.  ;-)
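(One way to follow that advice, sketched below in plain pthreads: the
real-time writer only publishes the new pointer and hands the old one to a
non-realtime reclaim thread, which is then the only thread that ever blocks
in synchronize_rcu().  The names defer_free()/reclaim_thread() are made up
for illustration, a urcu flavour is assumed to be selected at build time,
and details such as avoiding malloc() on the RT path are ignored.)

/* Illustrative only: hand reclamation off to a non-RT thread so that no
 * RT writer ever waits for a grace period itself. */
#include <pthread.h>
#include <stdlib.h>
#include <urcu.h>

struct deferred {
        void *ptr;
        struct deferred *next;
};

static pthread_mutex_t defer_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t defer_cond = PTHREAD_COND_INITIALIZER;
static struct deferred *defer_list;

/* Called by the RT writer after rcu_xchg_pointer(): never waits for a GP. */
static void defer_free(void *ptr)
{
        struct deferred *d = malloc(sizeof(*d));

        d->ptr = ptr;
        pthread_mutex_lock(&defer_lock);
        d->next = defer_list;
        defer_list = d;
        pthread_cond_signal(&defer_cond);
        pthread_mutex_unlock(&defer_lock);
}

/* Runs at non-RT priority; the only caller of synchronize_rcu(). */
static void *reclaim_thread(void *arg)
{
        (void)arg;
        for (;;) {
                struct deferred *batch;

                pthread_mutex_lock(&defer_lock);
                while (!defer_list)
                        pthread_cond_wait(&defer_cond, &defer_lock);
                batch = defer_list;
                defer_list = NULL;
                pthread_mutex_unlock(&defer_lock);

                synchronize_rcu();      /* all deferred pointers are now safe */
                while (batch) {
                        struct deferred *next = batch->next;

                        free(batch->ptr);
                        free(batch);
                        batch = next;
                }
        }
        return NULL;
}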

Thanx, Paul

> Thanks,
> 
> Mathieu
> 
> 
> > 
> > Thanks again!!
> > Yuxin
> > 
> > On Thu, Apr 28, 2016 at 8:44 AM, Boqun Feng  wrote:
> >> Hi Paul and Yuxin,
> >>
> >> On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote:
> >>> Try building without it and see what happens when you run the tests.
> >>>
> >>
> >> I've run a 'regtest' with the following modification out of curiosity:
> >>
> >> diff --git a/urcu.c b/urcu.c
> >> index a5568bdbd075..9dc3c9feae56 100644
> >> --- a/urcu.c
> >> +++ b/urcu.c
> >> @@ -398,8 +398,6 @@ void synchronize_rcu(void)
> >> /* We won't need to wake ourself up */
> >> urcu_wait_set_state(&wait, URCU_WAIT_RUNNING);
> >>
> >> -   mutex_lock(&rcu_gp_lock);
> >> -
> >> /*
> >>  * Move all waiters into our local queue.
> >>  */
> >> @@ -480,7 +478,6 @@ void synchronize_rcu(void)
> >> smp_mb_master();
> >>  out:
> >> mutex_unlock(&rcu_registry_lock);
> >> -   mutex_unlock(&rcu_gp_lock);
> >>
> >> /*
> >>  * Wakeup waiters only after we have completed the grace period
> >>
> >>
> >> And guess what, the result of the test was:
> >>
> >> Test Summary Report
> >> ---
> >> ./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18)
> >>   Failed tests:  30, 45, 60, 75, 90, 103, 105, 120, 135
> >>   150, 165, 180, 195, 210, 225, 240, 253
> >>   255
> >> Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr  0.04 sys + 8981.02 cusr
> >> 597.64 csys = 9579.25 CPU)
> >> Result: FAIL
> >>
> >> And test case 30 for example is something like:
> >>
> >> tests/benchmark/test_urcu_mb   1 -d 0 -b 32768
> >>
> >> and it failed because:
> >>
> >> lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8'
> >> failed.
> >> zsh: abort (core dumped)  ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 
> >> 1 -d
> >> 0 -b 32768
> >>
> >> So I think what was going on here was:
> >>
> >> CPU 0 (reader) / CPU 1 (writer) / CPU 2 (writer), interleaved in time:
> >>
> >> CPU 0: rcu_read_lock();
> >> CPU 0: local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old
> >> CPU 2: new = malloc(sizeof(int));
> >> CPU 2: *new = 8;
> >> CPU 2: old = rcu_xchg_pointer(&test_rcu_pointer, new);
> >> CPU 1: synchronize_rcu():
> >> CPU 1:   urcu_wait_add(); // return 0
> >> CPU 1:   urcu_move_waiters(); // @gp_waiters is empty,
> >> CPU 1:                        // the next urcu_wait_add() will return 0
> >> CPU 2: synchronize_rcu():
> >> CPU 2:   urcu_wait_add(); // return 0
> >> CPU 1:   mutex_lock(&rcu_registry_lock);
> >> CPU 1:   wait_for_readers(); // remove registered reader from @registry,
> >> CPU 1:                       // release rcu_registry_lock and wait via poll()
> >> CPU 2:   mutex_lock(&rcu_registry_lock);
> >> CPU 2:   wait_for_readers(); // @registry is empty! we are so lucky
> >> CPU 2:   return;

Re: [lttng-dev] Question about lock in synchronize_rcu implementation of URCU

2016-04-28 Thread Mathieu Desnoyers
- On Apr 28, 2016, at 9:47 AM, Yuxin Ren r...@gwmail.gwu.edu wrote:

> Hi Boqun and Paul,
> 
> Thank you so much for your help.
> 
> I found one reason to use that lock.
> In the slow path, a thread will move all waiters to a local queue.
> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L406
> After this, a following thread can also perform a grace period, as the
> global waiter queue is empty.
> Thus the rcu_gp_lock is used to ensure mutual exclusion.
> 
> However, from a real-time perspective, such blocking will cause priority
> inversion: a higher-priority writer can be blocked by a lower-priority
> writer.
> Is there a way to modify the code to allow multiple threads to perform
> grace periods concurrently?

Before we redesign urcu for RT, would it be possible to simply
use pi-mutexes (priority inheritance) instead to protect grace periods
from each other with the current urcu scheme?
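(For concreteness, a minimal sketch of what that would mean with pthreads;
an illustration of the idea, not a tested patch, and it assumes
PTHREAD_PRIO_INHERIT is supported on the target platform:)

/* Sketch only: make rcu_gp_lock a priority-inheritance mutex, so that a
 * low-priority writer holding it is boosted while a higher-priority
 * writer is blocked on it. */
#include <pthread.h>

static pthread_mutex_t rcu_gp_lock;

static void init_rcu_gp_lock_pi(void)
{
        pthread_mutexattr_t attr;

        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
        pthread_mutex_init(&rcu_gp_lock, &attr);
        pthread_mutexattr_destroy(&attr);
}

This would only address writer-vs-writer inversion on the lock itself; as
Paul points out elsewhere in the thread, it does not help when the grace
period is waiting on low-priority readers.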

Thanks,

Mathieu


> 
> Thanks again!!
> Yuxin
> 
> On Thu, Apr 28, 2016 at 8:44 AM, Boqun Feng  wrote:
>> Hi Paul and Yuxin,
>>
>> On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote:
>>> Try building without it and see what happens when you run the tests.
>>>
>>
>> I've run a 'regtest' with the following modification out of curiosity:
>>
>> diff --git a/urcu.c b/urcu.c
>> index a5568bdbd075..9dc3c9feae56 100644
>> --- a/urcu.c
>> +++ b/urcu.c
>> @@ -398,8 +398,6 @@ void synchronize_rcu(void)
>> /* We won't need to wake ourself up */
>> urcu_wait_set_state(&wait, URCU_WAIT_RUNNING);
>>
>> -   mutex_lock(&rcu_gp_lock);
>> -
>> /*
>>  * Move all waiters into our local queue.
>>  */
>> @@ -480,7 +478,6 @@ void synchronize_rcu(void)
>> smp_mb_master();
>>  out:
>> mutex_unlock(&rcu_registry_lock);
>> -   mutex_unlock(&rcu_gp_lock);
>>
>> /*
>>  * Wakeup waiters only after we have completed the grace period
>>
>>
>> And guess what, the result of the test was:
>>
>> Test Summary Report
>> ---
>> ./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18)
>>   Failed tests:  30, 45, 60, 75, 90, 103, 105, 120, 135
>>   150, 165, 180, 195, 210, 225, 240, 253
>>   255
>> Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr  0.04 sys + 8981.02 cusr
>> 597.64 csys = 9579.25 CPU)
>> Result: FAIL
>>
>> And test case 30 for example is something like:
>>
>> tests/benchmark/test_urcu_mb   1 -d 0 -b 32768
>>
>> and it failed because:
>>
>> lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8'
>> failed.
>> zsh: abort (core dumped)  ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 1 
>> -d
>> 0 -b 32768
>>
>> So I think what was going on here was:
>>
>> CPU 0 (reader) / CPU 1 (writer) / CPU 2 (writer), interleaved in time:
>>
>> CPU 0: rcu_read_lock();
>> CPU 0: local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old
>> CPU 2: new = malloc(sizeof(int));
>> CPU 2: *new = 8;
>> CPU 2: old = rcu_xchg_pointer(&test_rcu_pointer, new);
>> CPU 1: synchronize_rcu():
>> CPU 1:   urcu_wait_add(); // return 0
>> CPU 1:   urcu_move_waiters(); // @gp_waiters is empty,
>> CPU 1:                        // the next urcu_wait_add() will return 0
>> CPU 2: synchronize_rcu():
>> CPU 2:   urcu_wait_add(); // return 0
>> CPU 1:   mutex_lock(&rcu_registry_lock);
>> CPU 1:   wait_for_readers(); // remove registered reader from @registry,
>> CPU 1:                       // release rcu_registry_lock and wait via poll()
>> CPU 2:   mutex_lock(&rcu_registry_lock);
>> CPU 2:   wait_for_readers(); // @registry is empty! we are so lucky
>> CPU 2:   return;
>> CPU 2: if (old)
>> CPU 2:   free(old);
>> CPU 0: rcu_read_unlock();
>> CPU 0: assert(*local_ptr == 8); // but local_ptr (i.e. old) is already freed.
>>
>>
>> So the point is there could be two writers calling synchronize_rcu() but
>> not returning early (both of them enter the slow path to perform a grace
>> period), so the rcu_gp_lock is necessary in this case.
>>
>> (Cc  Mathieu)
>>
>> But this is only my understanding and I'm learning the URCU code too ;-)
>>
>> Regards,
>> Boqun
>>
>>
>>> Might well be that it is unnecessary, but I will defer to Mathieu on that point.

Re: [lttng-dev] Question about lock in synchronize_rcu implementation of URCU

2016-04-28 Thread Paul E. McKenney
On Thu, Apr 28, 2016 at 09:47:23AM -0400, Yuxin Ren wrote:
> Hi Boqun and Paul,
> 
> Thank you so much for your help.
> 
> I found one reason to use that lock.
> In the slow path, a thread will move all waiters to a local queue.
> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L406
> After this, a following thread can also perform a grace period, as the
> global waiter queue is empty.
> Thus the rcu_gp_lock is used to ensure mutual exclusion.
> 
> However, from a real-time perspective, such blocking will cause priority
> inversion: a higher-priority writer can be blocked by a lower-priority
> writer.
> Is there a way to modify the code to allow multiple threads to perform
> grace periods concurrently?

If a thread has real-time requirements, you shouldn't have it do
synchronous grace periods, just as you shouldn't have it do (say)
sleep(10).

You should instead either (1) have some other non-realtime thread do
the cleanup activities involving synchronize_rcu() or (2) have the
real-time thread use the asynchronous call_rcu().
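(For option (2), a minimal sketch of deferring the free with liburcu's
call_rcu(); struct foo and update_foo() are illustrative names, and flavour
selection plus the usual thread registration are assumed to be set up
elsewhere:)

/* Sketch: asynchronous reclamation with call_rcu() instead of blocking in
 * synchronize_rcu(). */
#include <stdlib.h>
#include <urcu.h>
#include <urcu/compiler.h>      /* caa_container_of() */

struct foo {
        int value;
        struct rcu_head rcu;    /* reclamation hook used by call_rcu() */
};

static struct foo *gp_foo;      /* RCU-protected pointer */

static void free_foo(struct rcu_head *head)
{
        /* Invoked from the call_rcu worker thread after a grace period. */
        free(caa_container_of(head, struct foo, rcu));
}

static void update_foo(int value)
{
        struct foo *new, *old;

        new = malloc(sizeof(*new));
        new->value = value;
        old = rcu_xchg_pointer(&gp_foo, new);
        if (old)
                call_rcu(&old->rcu, free_foo);  /* returns immediately */
}

The updater never blocks; free_foo() runs from the call_rcu worker thread
once a grace period has elapsed.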

Thanx, Paul

> Thanks again!!
> Yuxin
> 
> On Thu, Apr 28, 2016 at 8:44 AM, Boqun Feng  wrote:
> > Hi Paul and Yuxin,
> >
> > On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote:
> >> Try building without it and see what happens when you run the tests.
> >>
> >
> > I've run a 'regtest' with the following modification out of curiosity:
> >
> > diff --git a/urcu.c b/urcu.c
> > index a5568bdbd075..9dc3c9feae56 100644
> > --- a/urcu.c
> > +++ b/urcu.c
> > @@ -398,8 +398,6 @@ void synchronize_rcu(void)
> > /* We won't need to wake ourself up */
> > urcu_wait_set_state(&wait, URCU_WAIT_RUNNING);
> >
> > -   mutex_lock(&rcu_gp_lock);
> > -
> > /*
> >  * Move all waiters into our local queue.
> >  */
> > @@ -480,7 +478,6 @@ void synchronize_rcu(void)
> > smp_mb_master();
> >  out:
> > mutex_unlock(&rcu_registry_lock);
> > -   mutex_unlock(&rcu_gp_lock);
> >
> > /*
> >  * Wakeup waiters only after we have completed the grace period
> >
> >
> > And guess what, the result of the test was:
> >
> > Test Summary Report
> > ---
> > ./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18)
> >   Failed tests:  30, 45, 60, 75, 90, 103, 105, 120, 135
> >   150, 165, 180, 195, 210, 225, 240, 253
> >   255
> > Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr  0.04 sys + 8981.02 cusr 
> > 597.64 csys = 9579.25 CPU)
> > Result: FAIL
> >
> > And test case 30 for example is something like:
> >
> > tests/benchmark/test_urcu_mb   1 -d 0 -b 32768
> >
> > and it failed because:
> >
> > lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8' 
> > failed.
> > zsh: abort (core dumped)  ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 
> > 1 -d 0 -b 32768
> >
> > So I think what was going on here was:
> >
> > CPU 0 (reader) / CPU 1 (writer) / CPU 2 (writer), interleaved in time:
> >
> > CPU 0: rcu_read_lock();
> > CPU 0: local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old
> > CPU 2: new = malloc(sizeof(int));
> > CPU 2: *new = 8;
> > CPU 2: old = rcu_xchg_pointer(&test_rcu_pointer, new);
> > CPU 1: synchronize_rcu():
> > CPU 1:   urcu_wait_add(); // return 0
> > CPU 1:   urcu_move_waiters(); // @gp_waiters is empty,
> > CPU 1:                        // the next urcu_wait_add() will return 0
> > CPU 2: synchronize_rcu():
> > CPU 2:   urcu_wait_add(); // return 0
> > CPU 1:   mutex_lock(&rcu_registry_lock);
> > CPU 1:   wait_for_readers(); // remove registered reader from @registry,
> > CPU 1:                       // release rcu_registry_lock and wait via poll()
> > CPU 2:   mutex_lock(&rcu_registry_lock);
> > CPU 2:   wait_for_readers(); // @registry is empty! we are so lucky
> > CPU 2:   return;
> > CPU 2: if (old)
> > CPU 2:   free(old);
> > CPU 0: rcu_read_unlock();
> > CPU 0: assert(*local_ptr == 8); // but local_ptr (i.e. old) is already freed.

Re: [lttng-dev] Question about lock in synchronize_rcu implementation of URCU

2016-04-28 Thread Paul E. McKenney
On Thu, Apr 28, 2016 at 08:44:01PM +0800, Boqun Feng wrote:
> Hi Paul and Yuxin,
> 
> On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote:
> > Try building without it and see what happens when you run the tests.
> > 
> 
> I've run a 'regtest' with the following modification out of curiosity:
> 
> diff --git a/urcu.c b/urcu.c
> index a5568bdbd075..9dc3c9feae56 100644
> --- a/urcu.c
> +++ b/urcu.c
> @@ -398,8 +398,6 @@ void synchronize_rcu(void)
>   /* We won't need to wake ourself up */
>   urcu_wait_set_state(&wait, URCU_WAIT_RUNNING);
>  
> - mutex_lock(&rcu_gp_lock);
> -
>   /*
>    * Move all waiters into our local queue.
>    */
> @@ -480,7 +478,6 @@ void synchronize_rcu(void)
>   smp_mb_master();
>  out:
>   mutex_unlock(&rcu_registry_lock);
> - mutex_unlock(&rcu_gp_lock);
>  
>   /*
>* Wakeup waiters only after we have completed the grace period
> 
> 
> And guess what, the result of the test was:
> 
> Test Summary Report
> ---
> ./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18)
>   Failed tests:  30, 45, 60, 75, 90, 103, 105, 120, 135
>   150, 165, 180, 195, 210, 225, 240, 253
> 255
> Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr  0.04 sys + 8981.02 cusr 
> 597.64 csys = 9579.25 CPU)
> Result: FAIL
> 
> And test case 30 for example is something like:
> 
> tests/benchmark/test_urcu_mb   1 -d 0 -b 32768
> 
> and it failed because:
> 
> lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8' 
> failed.
> zsh: abort (core dumped)  ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 1 
> -d 0 -b 32768
> 
> So I think what was going on here was:
> 
> CPU 0 (reader) / CPU 1 (writer) / CPU 2 (writer), interleaved in time:
>
> CPU 0: rcu_read_lock();
> CPU 0: local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old
> CPU 2: new = malloc(sizeof(int));
> CPU 2: *new = 8;
> CPU 2: old = rcu_xchg_pointer(&test_rcu_pointer, new);
> CPU 1: synchronize_rcu():
> CPU 1:   urcu_wait_add(); // return 0
> CPU 1:   urcu_move_waiters(); // @gp_waiters is empty,
> CPU 1:                        // the next urcu_wait_add() will return 0
> CPU 2: synchronize_rcu():
> CPU 2:   urcu_wait_add(); // return 0
> CPU 1:   mutex_lock(&rcu_registry_lock);
> CPU 1:   wait_for_readers(); // remove registered reader from @registry,
> CPU 1:                       // release rcu_registry_lock and wait via poll()
> CPU 2:   mutex_lock(&rcu_registry_lock);
> CPU 2:   wait_for_readers(); // @registry is empty! we are so lucky
> CPU 2:   return;
> CPU 2: if (old)
> CPU 2:   free(old);
> CPU 0: rcu_read_unlock();
> CPU 0: assert(*local_ptr == 8); // but local_ptr (i.e. old) is already freed.
> 
> 
> So the point is there could be two writers calling synchronize_rcu() but
> not returning early (both of them enter the slow path to perform a grace
> period), so the rcu_gp_lock is necessary in this case.
> 
> (Cc  Mathieu)
> 
> But this is only my understanding and I'm learning the URCU code too ;-)

Nothing quite like actually trying it and seeing what happens!  One of
the best learning methods that I know of.

Assuming the act of actually trying it is non-fatal, of course.  ;-)

Thanx, Paul

> Regards,
> Boqun
> 
> 
> > Might well be that it is unnecessary, but I will defer to Mathieu
> > on that point.
> > 
> > Thanx, Paul
> > 
> > On Wed, Apr 27, 2016 at 10:18:04PM -0400, Yuxin Ren wrote:
> > > As they don't concurrently perform grace periods, why do we use the
> > > rcu_gp_lock?
> > > 
> > > Thank you.
> > > Yuxin
> > > 
> > > On Wed, Apr 27, 2016 at 10:08 PM, Paul E. McKenney
> > >  wrote:
> > > > On Wed, Apr 27, 2016 at 09:34:16PM -0400, Yuxin Ren wrote:
> > > >> Hi,
> > > >>
> > > >> I am learning the URCU code.
> > > >>
> > > >> Why do we need rcu_gp_lock in synchronize_rcu?
> > > >> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L401
> > > >>
> > >> In the comment, it says this lock ensures mutual exclusion between threads calling synchronize_rcu().

Re: [lttng-dev] Question about lock in synchronize_rcu implementation of URCU

2016-04-28 Thread Yuxin Ren
Hi Boqun and Paul,

Thank you so much for your help.

I found one reason to use that lock.
In the slow path, a thread will move all waiters to a local queue.
https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L406
After this, a following thread can also perform a grace period, as the
global waiter queue is empty.
Thus the rcu_gp_lock is used to ensure mutual exclusion.

However, from a real-time perspective, such blocking will cause priority
inversion: a higher-priority writer can be blocked by a lower-priority
writer.
Is there a way to modify the code to allow multiple threads to perform
grace periods concurrently?

Thanks again!!
Yuxin

On Thu, Apr 28, 2016 at 8:44 AM, Boqun Feng  wrote:
> Hi Paul and Yuxin,
>
> On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote:
>> Try building without it and see what happens when you run the tests.
>>
>
> I've run a 'regtest' with the following modification out of curiosity:
>
> diff --git a/urcu.c b/urcu.c
> index a5568bdbd075..9dc3c9feae56 100644
> --- a/urcu.c
> +++ b/urcu.c
> @@ -398,8 +398,6 @@ void synchronize_rcu(void)
> /* We won't need to wake ourself up */
> urcu_wait_set_state(&wait, URCU_WAIT_RUNNING);
>
> -   mutex_lock(&rcu_gp_lock);
> -
> /*
>  * Move all waiters into our local queue.
>  */
> @@ -480,7 +478,6 @@ void synchronize_rcu(void)
> smp_mb_master();
>  out:
> mutex_unlock(&rcu_registry_lock);
> -   mutex_unlock(&rcu_gp_lock);
>
> /*
>  * Wakeup waiters only after we have completed the grace period
>
>
> And guess what, the result of the test was:
>
> Test Summary Report
> ---
> ./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18)
>   Failed tests:  30, 45, 60, 75, 90, 103, 105, 120, 135
>   150, 165, 180, 195, 210, 225, 240, 253
>   255
> Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr  0.04 sys + 8981.02 cusr 
> 597.64 csys = 9579.25 CPU)
> Result: FAIL
>
> And test case 30 for example is something like:
>
> tests/benchmark/test_urcu_mb   1 -d 0 -b 32768
>
> and it failed because:
>
> lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8' 
> failed.
> zsh: abort (core dumped)  ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 1 
> -d 0 -b 32768
>
> So I think what was going on here was:
>
> CPU 0 (reader) / CPU 1 (writer) / CPU 2 (writer), interleaved in time:
>
> CPU 0: rcu_read_lock();
> CPU 0: local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old
> CPU 2: new = malloc(sizeof(int));
> CPU 2: *new = 8;
> CPU 2: old = rcu_xchg_pointer(&test_rcu_pointer, new);
> CPU 1: synchronize_rcu():
> CPU 1:   urcu_wait_add(); // return 0
> CPU 1:   urcu_move_waiters(); // @gp_waiters is empty,
> CPU 1:                        // the next urcu_wait_add() will return 0
> CPU 2: synchronize_rcu():
> CPU 2:   urcu_wait_add(); // return 0
> CPU 1:   mutex_lock(&rcu_registry_lock);
> CPU 1:   wait_for_readers(); // remove registered reader from @registry,
> CPU 1:                       // release rcu_registry_lock and wait via poll()
> CPU 2:   mutex_lock(&rcu_registry_lock);
> CPU 2:   wait_for_readers(); // @registry is empty! we are so lucky
> CPU 2:   return;
> CPU 2: if (old)
> CPU 2:   free(old);
> CPU 0: rcu_read_unlock();
> CPU 0: assert(*local_ptr == 8); // but local_ptr (i.e. old) is already freed.
>
>
> So the point is there could be two writers calling synchronize_rcu() but
> not returning early (both of them enter the slow path to perform a grace
> period), so the rcu_gp_lock is necessary in this case.
>
> (Cc  Mathieu)
>
> But this is only my understanding and I'm learning the URCU code too ;-)
>
> Regards,
> Boqun
>
>
>> Might well be that it is unnecessary, but I will defer to Mathieu
>> on that point.
>>
>>   Thanx, Paul
>>
>> On Wed, Apr 27, 2016 at 10:18:04PM -0400, Yuxin Ren wrote:
>> > As they don't concurrently perform grace periods, why do we use the
>> > rcu_gp_lock?
>> >
>> > 

Re: [lttng-dev] Question about lock in synchronize_rcu implementation of URCU

2016-04-28 Thread Boqun Feng
Hi Paul and Yuxin,

On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote:
> Try building without it and see what happens when you run the tests.
> 

I've run a 'regtest' with the following modification out of curiosity:

diff --git a/urcu.c b/urcu.c
index a5568bdbd075..9dc3c9feae56 100644
--- a/urcu.c
+++ b/urcu.c
@@ -398,8 +398,6 @@ void synchronize_rcu(void)
/* We won't need to wake ourself up */
urcu_wait_set_state(&wait, URCU_WAIT_RUNNING);
 
-   mutex_lock(&rcu_gp_lock);
-
/*
 * Move all waiters into our local queue.
 */
@@ -480,7 +478,6 @@ void synchronize_rcu(void)
smp_mb_master();
 out:
mutex_unlock(&rcu_registry_lock);
-   mutex_unlock(&rcu_gp_lock);
 
/*
 * Wakeup waiters only after we have completed the grace period


And guess what, the result of the test was:

Test Summary Report
---
./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18)
  Failed tests:  30, 45, 60, 75, 90, 103, 105, 120, 135
  150, 165, 180, 195, 210, 225, 240, 253
  255
Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr  0.04 sys + 8981.02 cusr 
597.64 csys = 9579.25 CPU)
Result: FAIL

And test case 30 for example is something like:

tests/benchmark/test_urcu_mb   1 -d 0 -b 32768

and it failed because:

lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8' 
failed.
zsh: abort (core dumped)  ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 1 -d 
0 -b 32768

So I think what was going on here was:

CPU 0 (reader) / CPU 1 (writer) / CPU 2 (writer), interleaved in time:

CPU 0: rcu_read_lock();
CPU 0: local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old
CPU 2: new = malloc(sizeof(int));
CPU 2: *new = 8;
CPU 2: old = rcu_xchg_pointer(&test_rcu_pointer, new);
CPU 1: synchronize_rcu():
CPU 1:   urcu_wait_add(); // return 0
CPU 1:   urcu_move_waiters(); // @gp_waiters is empty,
CPU 1:                        // the next urcu_wait_add() will return 0
CPU 2: synchronize_rcu():
CPU 2:   urcu_wait_add(); // return 0
CPU 1:   mutex_lock(&rcu_registry_lock);
CPU 1:   wait_for_readers(); // remove registered reader from @registry,
CPU 1:                       // release rcu_registry_lock and wait via poll()
CPU 2:   mutex_lock(&rcu_registry_lock);
CPU 2:   wait_for_readers(); // @registry is empty! we are so lucky
CPU 2:   return;
CPU 2: if (old)
CPU 2:   free(old);
CPU 0: rcu_read_unlock();
CPU 0: assert(*local_ptr == 8); // but local_ptr (i.e. old) is already freed.


So the point is there could be two writers calling synchronize_rcu() but
not returning early (both of them enter the slow path to perform a grace
period), so the rcu_gp_lock is necessary in this case.

(Cc  Mathieu)

But this is only my understanding and I'm learning the URCU code too ;-)

Regards,
Boqun


> Might well be that it is unnecessary, but I will defer to Mathieu
> on that point.
> 
>   Thanx, Paul
> 
> On Wed, Apr 27, 2016 at 10:18:04PM -0400, Yuxin Ren wrote:
> > As they don't concurrently perform grace periods, why do we use the rcu_gp_lock?
> > 
> > Thank you.
> > Yuxin
> > 
> > On Wed, Apr 27, 2016 at 10:08 PM, Paul E. McKenney
> >  wrote:
> > > On Wed, Apr 27, 2016 at 09:34:16PM -0400, Yuxin Ren wrote:
> > >> Hi,
> > >>
> > >> I am learning the URCU code.
> > >>
> > >> Why do we need rcu_gp_lock in synchronize_rcu?
> > >> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L401
> > >>
> > >> In the comment, it says this lock ensures mutual exclusion between
> > >> threads calling synchronize_rcu().
> > >> But only the first thread added to the waiter queue can proceed to detect
> > >> a grace period.
> > >> How can multiple threads concurrently perform the grace period?
> > >
> > > They don't concurrently perform grace periods, and it would be wasteful
> > > for them to do so.  Instead, the first one performs the grace period,
> > > and all that were waiting at the time it started get the benefit of that
> > > same grace period.

Re: [lttng-dev] Question about lock in synchronize_rcu implementation of URCU

2016-04-27 Thread Paul E. McKenney
Try building without it and see what happens when you run the tests.

Might well be that it is unnecessary, but I will defer to Mathieu
on that point.

Thanx, Paul

On Wed, Apr 27, 2016 at 10:18:04PM -0400, Yuxin Ren wrote:
> As they don't concurrently perform grace periods, why do we use the rcu_gp_lock?
> 
> Thank you.
> Yuxin
> 
> On Wed, Apr 27, 2016 at 10:08 PM, Paul E. McKenney
>  wrote:
> > On Wed, Apr 27, 2016 at 09:34:16PM -0400, Yuxin Ren wrote:
> >> Hi,
> >>
> >> I am learning the URCU code.
> >>
> >> Why do we need rcu_gp_lock in synchronize_rcu?
> >> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L401
> >>
> >> In the comment, it says this lock ensures mutual exclusion between
> >> threads calling synchronize_rcu().
> >> But only the first thread added to the waiter queue can proceed to detect
> >> a grace period.
> >> How can multiple threads concurrently perform the grace period?
> >
> > They don't concurrently perform grace periods, and it would be wasteful
> > for them to do so.  Instead, the first one performs the grace period,
> > and all that were waiting at the time it started get the benefit of that
> > same grace period.
> >
> > Any that arrived after the first thread started the first grace
> > period are served by whichever of them performs the second grace
> > period.
> >
> > Thanx, Paul
> >
> 



Re: [lttng-dev] Question about lock in synchronize_rcu implementation of URCU

2016-04-27 Thread Yuxin Ren
As they don't concurrently perform grace periods, why do we use the rcu_gp_lock?

Thank you.
Yuxin

On Wed, Apr 27, 2016 at 10:08 PM, Paul E. McKenney
 wrote:
> On Wed, Apr 27, 2016 at 09:34:16PM -0400, Yuxin Ren wrote:
>> Hi,
>>
>> I am learning the URCU code.
>>
>> Why do we need rcu_gp_lock in synchronize_rcu?
>> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L401
>>
>> In the comment, it says this lock ensures mutual exclusion between
>> threads calling synchronize_rcu().
>> But only the first thread added to the waiter queue can proceed to detect
>> a grace period.
>> How can multiple threads concurrently perform the grace period?
>
> They don't concurrently perform grace periods, and it would be wasteful
> for them to do so.  Instead, the first one performs the grace period,
> and all that were waiting at the time it started get the benefit of that
> same grace period.
>
> Any that arrived after the first thread started the first grace
> period are served by whichever of them performs the second grace
> period.
>
> Thanx, Paul
>


Re: [lttng-dev] Question about lock in synchronize_rcu implementation of URCU

2016-04-27 Thread Paul E. McKenney
On Wed, Apr 27, 2016 at 09:34:16PM -0400, Yuxin Ren wrote:
> Hi,
> 
> I am learning the URCU code.
> 
> Why do we need rcu_gp_lock in synchronize_rcu?
> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L401
> 
> In the comment, it says this lock ensures mutual exclusion between
> threads calling synchronize_rcu().
> But only the first thread added to the waiter queue can proceed to detect
> a grace period.
> How can multiple threads concurrently perform the grace period?

They don't concurrently perform grace periods, and it would be wasteful
for them to do so.  Instead, the first one performs the grace period,
and all that were waiting at the time it started get the benefit of that
same grace period.

Any that arrived after the first thread started the first grace
period are served by whichever of them performs the second grace
period.
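(To make the batching concrete, here is a small self-contained toy in plain
pthreads -- emphatically not the urcu implementation; fake_grace_period() is
a made-up stand-in for the real wait-for-readers logic:)

/* Toy illustration of grace-period batching: whoever finds no grace period
 * in progress becomes the leader and runs one on behalf of every thread
 * already waiting; threads that arrive while one is running are covered by
 * the next one. */
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t gp_done = PTHREAD_COND_INITIALIZER;
static unsigned long gp_completed;      /* number of finished grace periods */
static int gp_in_progress;

static void fake_grace_period(void)
{
        usleep(1000);                   /* placeholder for waiting on readers */
}

static void synchronize_sketch(void)
{
        unsigned long target;

        pthread_mutex_lock(&lock);
        /* A grace period already in progress cannot cover us: we need the
         * next one that starts after we got here. */
        target = gp_completed + 1 + gp_in_progress;
        while (gp_completed < target) {
                if (!gp_in_progress) {
                        gp_in_progress = 1;               /* become the leader */
                        pthread_mutex_unlock(&lock);
                        fake_grace_period();
                        pthread_mutex_lock(&lock);
                        gp_in_progress = 0;
                        gp_completed++;
                        pthread_cond_broadcast(&gp_done); /* whole batch done */
                } else {
                        pthread_cond_wait(&gp_done, &lock);
                }
        }
        pthread_mutex_unlock(&lock);
}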

Thanx, Paul

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev