Re: [lttng-dev] [PATCH 1/3] tests/lib/Makefile.am: Remove test_seek_empty_packet and test_seek_big_trace from SCRIPT_LIST

2016-04-28 Thread Simon Marchi

On 2016-04-28 14:28, Michael Jeanson wrote:
> - On Apr 28, 2016, at 12:44 PM, Simon Marchi simon.mar...@polymtl.ca wrote:
>> Ok thanks.
>>
>> Just an idea: I find it quite cumbersome to add a new entry in
>> configure.ac for each test script that we want autoconf to process (such
>> as in patch 3/3).  I think it could be easier to have a single file
>> (e.g. vars.sh) and have it contain definitions of various directories:
>>
>>   srcdir=@srcdir@
>>   top_srcdir=@top_srcdir@
>>   abs_top_srcdir=@abs_top_srcdir@
>>   builddir=@builddir@
>>   ...
>>
>> This way, you could have only a single generated file (vars.sh.in ->
>> vars.sh), and each test script would just have to source that file to
>> use the values.
>
> As discussed on #lttng, here is the PR resulting from my updates to your
> patches:
>
>   https://github.com/efficios/babeltrace/pull/43

Thanks for following up.

> Concerning the include file, I'm not convinced it would be simpler, we would
> end up with static scripts in the source dir including a generated file in
> the builddir.

In the current state, even the static scripts are copied to the build dir, so
it wouldn't be a problem.  Anyway, if it's not a problem, I think it looks
good in your PR.
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] [PATCH 1/3] tests/lib/Makefile.am: Remove test_seek_empty_packet and test_seek_big_trace from SCRIPT_LIST

2016-04-28 Thread Michael Jeanson
- On Apr 28, 2016, at 12:44 PM, Simon Marchi simon.mar...@polymtl.ca wrote:
> 
> Ok thanks.
> 
> Just an idea: I find it quite cumbersome to add a new entry in
> configure.ac for each test script that we want autoconf to process (such
> as in patch 3/3).  I think it could be easier to have a single file
> (e.g. vars.sh) and have it contain definitions of various directories:
> 
>   srcdir=@srcdir@
>   top_srcdir=@top_srcdir@
>   abs_top_srcdir=@abs_top_srcdir@
>   builddir=@builddir@
>   ...
> 
> This way, you could have only a single generated file (vars.sh.in ->
> vars.sh), and each test script would just have to source that file to
> use the values.

As discussed on #lttng, here is the PR resulting from my updates to your
patches:

  https://github.com/efficios/babeltrace/pull/43


Concerning the include file, I'm not convinced it would be simpler, we would
end up with static scripts in the source dir including a generated file in
the builddir.
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] [PATCH 1/3] tests/lib/Makefile.am: Remove test_seek_empty_packet and test_seek_big_trace from SCRIPT_LIST

2016-04-28 Thread Simon Marchi

On 2016-04-28 12:16, Michael Jeanson wrote:
> - On Apr 27, 2016, at 11:04 PM, Simon Marchi simon.mar...@polymtl.ca wrote:
>> On 2016-04-27 22:52, Simon Marchi wrote:
>>> Since these files are generated by autoconf, they shouldn't be included
>>> in SCRIPT_LIST, which is the list of scripts to copy from the source
>>> directory to the build directory.  This gets rid of these warnings when
>>> building:
>>>
>>>   cp: cannot stat
>>> '/home/simark/src/babeltrace/tests/lib/test_seek_big_trace': No such
>>> file or directory
>>>   cp: cannot stat
>>> '/home/simark/src/babeltrace/tests/lib/test_seek_empty_packet': No
>>> such file or directory
>>>
>>> Signed-off-by: Simon Marchi 
>>> ---
>>>  tests/lib/Makefile.am | 4 +---
>>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>>
>>> diff --git a/tests/lib/Makefile.am b/tests/lib/Makefile.am
>>> index e23bcc1..a08cbf6 100644
>>> --- a/tests/lib/Makefile.am
>>> +++ b/tests/lib/Makefile.am
>>> @@ -41,9 +41,7 @@ test_bt_values_SOURCES = test_bt_values.c
>>>  test_ctf_ir_ref_SOURCES = test_ctf_ir_ref.c
>>>  test_bt_ctf_field_type_validation_SOURCES =
>>> test_bt_ctf_field_type_validation.c
>>>
>>> -SCRIPT_LIST = test_seek_big_trace \
>>> -   test_seek_empty_packet \
>>> -   test_ctf_writer_complete
>>> +SCRIPT_LIST = test_ctf_writer_complete
>>>  EXTRA_DIST = test_seek_big_trace.in test_seek_empty_packet.in
>>>  CLEANFILES= test_seek_big_trace test_seek_empty_packet
>>
>> Sorry, forgot to mention that this series is meant for babeltrace.
>
> Good catch, I reviewed the patches and I'm working on an updated
> version to do further cleanup of the Makefile and test scripts
> generation.


Ok thanks.

Just an idea: I find it quite cumbersome to add a new entry in 
configure.ac for each test script that we want autoconf to process (such 
as in patch 3/3).  I think it could be easier to have a single file 
(e.g. vars.sh) and have it contain definitions of various directories:


  srcdir=@srcdir@
  top_srcdir=@top_srcdir@
  abs_top_srcdir=@abs_top_srcdir@
  builddir=@builddir@
  ...

This way, you could have only a single generated file (vars.sh.in -> 
vars.sh), and each test script would just have to source that file to 
use the values.

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] [PATCH 1/3] tests/lib/Makefile.am: Remove test_seek_empty_packet and test_seek_big_trace from SCRIPT_LIST

2016-04-28 Thread Michael Jeanson
- On Apr 27, 2016, at 11:04 PM, Simon Marchi simon.mar...@polymtl.ca wrote:

> On 2016-04-27 22:52, Simon Marchi wrote:
>> Since these files are generated by autoconf, they shouldn't be included
>> in SCRIPT_LIST, which is the list of scripts to copy from the source
>> directory to the build directory.  This gets rid of these warnings when
>> building:
>> 
>>   cp: cannot stat
>> '/home/simark/src/babeltrace/tests/lib/test_seek_big_trace': No such
>> file or directory
>>   cp: cannot stat
>> '/home/simark/src/babeltrace/tests/lib/test_seek_empty_packet': No
>> such file or directory
>> 
>> Signed-off-by: Simon Marchi 
>> ---
>>  tests/lib/Makefile.am | 4 +---
>>  1 file changed, 1 insertion(+), 3 deletions(-)
>> 
>> diff --git a/tests/lib/Makefile.am b/tests/lib/Makefile.am
>> index e23bcc1..a08cbf6 100644
>> --- a/tests/lib/Makefile.am
>> +++ b/tests/lib/Makefile.am
>> @@ -41,9 +41,7 @@ test_bt_values_SOURCES = test_bt_values.c
>>  test_ctf_ir_ref_SOURCES = test_ctf_ir_ref.c
>>  test_bt_ctf_field_type_validation_SOURCES =
>> test_bt_ctf_field_type_validation.c
>> 
>> -SCRIPT_LIST = test_seek_big_trace \
>> -test_seek_empty_packet \
>> -test_ctf_writer_complete
>> +SCRIPT_LIST = test_ctf_writer_complete
>>  EXTRA_DIST = test_seek_big_trace.in test_seek_empty_packet.in
>>  CLEANFILES= test_seek_big_trace test_seek_empty_packet
> 
> Sorry, forgot to mention that this series is meant for babeltrace.

Good catch, I reviewed the patches and I'm working on an updated
version to do further cleanup of the Makefile and test scripts
generation.
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] Question about lock in synchronize_rcu implementation of URCU

2016-04-28 Thread Mathieu Desnoyers
- On Apr 28, 2016, at 9:47 AM, Yuxin Ren r...@gwmail.gwu.edu wrote:

> Hi Boqun and Paul,
> 
> Thank you so much for your help.
> 
> I found one reason to use that lock.
> In the slow path, a thread will move all waiters to a local queue.
> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L406
> After this, following thread can also perform grace period, as the
> global waiter queue is empty.
> Thus the rcu_gp_lock is used to ensure mutual exclusion.
> 
> However, from real time aspect, such blocking will cause priority
> inversion: higher priority writer can be blocked by low priority
> writer.
> Is there a way to modify the code to allow multiple threads to perform
> grace period concurrently?

By the way, there are other reasons for this lock in the urcu implementation:
it protects the rcu_registry, and the rcu_gp.ctr count.

Thanks,

Mathieu

> 
> Thanks again!!
> Yuxin
> 
> On Thu, Apr 28, 2016 at 8:44 AM, Boqun Feng  wrote:
>> Hi Paul and Yuxin,
>>
>> On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote:
>>> Try building without it and see what happens when you run the tests.
>>>
>>
>> I've run a 'regtest' with the following modification out of curiosity:
>>
>> diff --git a/urcu.c b/urcu.c
>> index a5568bdbd075..9dc3c9feae56 100644
>> --- a/urcu.c
>> +++ b/urcu.c
>> @@ -398,8 +398,6 @@ void synchronize_rcu(void)
>>  	/* We won't need to wake ourself up */
>>  	urcu_wait_set_state(&wait, URCU_WAIT_RUNNING);
>>  
>> -	mutex_lock(&rcu_gp_lock);
>> -
>>  	/*
>>  	 * Move all waiters into our local queue.
>>  	 */
>> @@ -480,7 +478,6 @@ void synchronize_rcu(void)
>>  	smp_mb_master();
>>  out:
>>  	mutex_unlock(&rcu_registry_lock);
>> -	mutex_unlock(&rcu_gp_lock);
>>  
>>  	/*
>>  	 * Wakeup waiters only after we have completed the grace period
>>
>>
>> And guess what, the result of the test was:
>>
>> Test Summary Report
>> ---
>> ./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18)
>>   Failed tests:  30, 45, 60, 75, 90, 103, 105, 120, 135
>>   150, 165, 180, 195, 210, 225, 240, 253
>>   255
>> Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr  0.04 sys + 8981.02 cusr
>> 597.64 csys = 9579.25 CPU)
>> Result: FAIL
>>
>> And test case 30 for example is something like:
>>
>> tests/benchmark/test_urcu_mb   1 -d 0 -b 32768
>>
>> and it failed because:
>>
>> lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8'
>> failed.
>> zsh: abort (core dumped)  ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 1 
>> -d
>> 0 -b 32768
>>
>> So I think what was going on here was:
>>
>> CPU 0 (reader):  rcu_read_lock();
>> CPU 2 (writer):  new = malloc(sizeof(int));
>> CPU 0 (reader):  local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old
>> CPU 2 (writer):  *new = 8;
>> CPU 2 (writer):  old = rcu_xchg_pointer(&test_rcu_pointer, new);
>> CPU 1 (writer):  synchronize_rcu():
>> CPU 1 (writer):    urcu_wait_add();     // return 0
>> CPU 1 (writer):    urcu_move_waiters(); // @gp_waiters is empty, the next
>> CPU 1 (writer):                         // urcu_wait_add() will return 0
>> CPU 2 (writer):  synchronize_rcu():
>> CPU 2 (writer):    urcu_wait_add();     // return 0
>> CPU 1 (writer):    mutex_lock(&rcu_registry_lock);
>> CPU 1 (writer):    wait_for_readers();  // remove registered reader from @registry,
>> CPU 1 (writer):                         // release rcu_registry_lock and wait via poll()
>> CPU 2 (writer):    mutex_lock(&rcu_registry_lock);
>> CPU 2 (writer):    wait_for_readers();  // @registry is empty! we are so lucky
>> CPU 2 (writer):    return;
>> CPU 2 (writer):  if (old)
>> CPU 2 (writer):    free(old);
>> CPU 0 (reader):  rcu_read_unlock();
>> CPU 0 (reader):  assert(*local_ptr == 8); // but local_ptr (i.e. old) is already freed.
>>
>>
>> So the point is there could be two writers calling synchronize_rcu() but
>> not returning early(both of them enter the slow path to perform a grace
>> period), so the rcu_gp_lock is necessary in this case.
>>
>> (Cc  Mathieu)
>>
>> But this is only my understanding and I'm learning the URCU code too ;-)
>>
>> Regards,
>> Boqun
>>
>>
>>> Might well be that it is unnecessary, but I will defer to Mathieu
>>> on 

Re: [lttng-dev] Question about lock in synchronize_rcu implementation of URCU

2016-04-28 Thread Paul E. McKenney
On Thu, Apr 28, 2016 at 02:38:40PM +, Mathieu Desnoyers wrote:
> - On Apr 28, 2016, at 9:47 AM, Yuxin Ren r...@gwmail.gwu.edu wrote:
> 
> > Hi Boqun and Paul,
> > 
> > Thank you so much for your help.
> > 
> > I found one reason to use that lock.
> > In the slow path, a thread will move all waiters to a local queue.
> > https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L406
> > After this, following thread can also perform grace period, as the
> > global waiter queue is empty.
> > Thus the rcu_gp_lock is used to ensure mutual exclusion.
> > 
> > However, from real time aspect, such blocking will cause priority
> > inversion: higher priority writer can be blocked by low priority
> > writer.
> > Is there a way to modify the code to allow multiple threads to perform
> > grace period concurrently?
> 
> Before we redesign urcu for RT, would it be possible to simply
> use pi-mutexes (priority inheritance) instead to protect grace periods
> from each other with the current urcu scheme ?

Given that priority inversion can happen with low-priority readers
blocking a grace period that a high-priority updater is waiting on,
I stand by my earlier advice:  Don't let high-priority updaters block
waiting for grace periods.  ;-)

Thanx, Paul

> Thanks,
> 
> Mathieu
> 
> 
> > 
> > Thanks again!!
> > Yuxin
> > 
> > On Thu, Apr 28, 2016 at 8:44 AM, Boqun Feng  wrote:
> >> Hi Paul and Yuxin,
> >>
> >> On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote:
> >>> Try building without it and see what happens when you run the tests.
> >>>
> >>
> >> I've run a 'regtest' with the following modification out of curiosity:
> >>
> >> diff --git a/urcu.c b/urcu.c
> >> index a5568bdbd075..9dc3c9feae56 100644
> >> --- a/urcu.c
> >> +++ b/urcu.c
> >> @@ -398,8 +398,6 @@ void synchronize_rcu(void)
> >>  	/* We won't need to wake ourself up */
> >>  	urcu_wait_set_state(&wait, URCU_WAIT_RUNNING);
> >>  
> >> -	mutex_lock(&rcu_gp_lock);
> >> -
> >>  	/*
> >>  	 * Move all waiters into our local queue.
> >>  	 */
> >> @@ -480,7 +478,6 @@ void synchronize_rcu(void)
> >>  	smp_mb_master();
> >>  out:
> >>  	mutex_unlock(&rcu_registry_lock);
> >> -	mutex_unlock(&rcu_gp_lock);
> >>  
> >>  	/*
> >>  	 * Wakeup waiters only after we have completed the grace period
> >>
> >>
> >> And guess what, the result of the test was:
> >>
> >> Test Summary Report
> >> ---
> >> ./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18)
> >>   Failed tests:  30, 45, 60, 75, 90, 103, 105, 120, 135
> >>   150, 165, 180, 195, 210, 225, 240, 253
> >>   255
> >> Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr  0.04 sys + 8981.02 cusr
> >> 597.64 csys = 9579.25 CPU)
> >> Result: FAIL
> >>
> >> And test case 30 for example is something like:
> >>
> >> tests/benchmark/test_urcu_mb   1 -d 0 -b 32768
> >>
> >> and it failed because:
> >>
> >> lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8'
> >> failed.
> >> zsh: abort (core dumped)  ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 
> >> 1 -d
> >> 0 -b 32768
> >>
> >> So I think what was going on here was:
> >>
> >> CPU 0 (reader):  rcu_read_lock();
> >> CPU 2 (writer):  new = malloc(sizeof(int));
> >> CPU 0 (reader):  local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old
> >> CPU 2 (writer):  *new = 8;
> >> CPU 2 (writer):  old = rcu_xchg_pointer(&test_rcu_pointer, new);
> >> CPU 1 (writer):  synchronize_rcu():
> >> CPU 1 (writer):    urcu_wait_add();     // return 0
> >> CPU 1 (writer):    urcu_move_waiters(); // @gp_waiters is empty, the next
> >> CPU 1 (writer):                         // urcu_wait_add() will return 0
> >> CPU 2 (writer):  synchronize_rcu():
> >> CPU 2 (writer):    urcu_wait_add();     // return 0
> >> CPU 1 (writer):    mutex_lock(&rcu_registry_lock);
> >> CPU 1 (writer):    wait_for_readers();  // remove registered reader from @registry,
> >> CPU 1 (writer):                         // release rcu_registry_lock and wait via poll()
> >> CPU 2 (writer):    mutex_lock(&rcu_registry_lock);
> >> CPU 2 (writer):    wait_for_readers();  // @registry is empty! we are so lucky
> >> CPU 2 (writer):    return;

Re: [lttng-dev] Question about lock in synchronize_rcu implementation of URCU

2016-04-28 Thread Mathieu Desnoyers
- On Apr 28, 2016, at 9:47 AM, Yuxin Ren r...@gwmail.gwu.edu wrote:

> Hi Boqun and Paul,
> 
> Thank you so much for your help.
> 
> I found one reason to use that lock.
> In the slow path, a thread will move all waiters to a local queue.
> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L406
> After this, following thread can also perform grace period, as the
> global waiter queue is empty.
> Thus the rcu_gp_lock is used to ensure mutual exclusion.
> 
> However, from real time aspect, such blocking will cause priority
> inversion: higher priority writer can be blocked by low priority
> writer.
> Is there a way to modify the code to allow multiple threads to perform
> grace period concurrently?

Before we redesign urcu for RT, would it be possible to simply
use pi-mutexes (priority inheritance) instead to protect grace periods
from each other with the current urcu scheme ?
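
For reference, a rough sketch of what a PI mutex setup looks like with plain
pthreads (illustration only, not an actual urcu.c patch; urcu currently
initializes rcu_gp_lock statically):

  #include <pthread.h>

  static pthread_mutex_t rcu_gp_lock;

  static void init_rcu_gp_lock(void)
  {
          pthread_mutexattr_t attr;

          pthread_mutexattr_init(&attr);
          /* Owner inherits the priority of the highest-priority waiter. */
          pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
          pthread_mutex_init(&rcu_gp_lock, &attr);
          pthread_mutexattr_destroy(&attr);
  }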

Thanks,

Mathieu


> 
> Thanks again!!
> Yuxin
> 
> On Thu, Apr 28, 2016 at 8:44 AM, Boqun Feng  wrote:
>> Hi Paul and Yuxin,
>>
>> On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote:
>>> Try building without it and see what happens when you run the tests.
>>>
>>
>> I've run a 'regtest' with the following modification out of curiosity:
>>
>> diff --git a/urcu.c b/urcu.c
>> index a5568bdbd075..9dc3c9feae56 100644
>> --- a/urcu.c
>> +++ b/urcu.c
>> @@ -398,8 +398,6 @@ void synchronize_rcu(void)
>>  	/* We won't need to wake ourself up */
>>  	urcu_wait_set_state(&wait, URCU_WAIT_RUNNING);
>>  
>> -	mutex_lock(&rcu_gp_lock);
>> -
>>  	/*
>>  	 * Move all waiters into our local queue.
>>  	 */
>> @@ -480,7 +478,6 @@ void synchronize_rcu(void)
>>  	smp_mb_master();
>>  out:
>>  	mutex_unlock(&rcu_registry_lock);
>> -	mutex_unlock(&rcu_gp_lock);
>>  
>>  	/*
>>  	 * Wakeup waiters only after we have completed the grace period
>>
>>
>> And guess what, the result of the test was:
>>
>> Test Summary Report
>> ---
>> ./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18)
>>   Failed tests:  30, 45, 60, 75, 90, 103, 105, 120, 135
>>   150, 165, 180, 195, 210, 225, 240, 253
>>   255
>> Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr  0.04 sys + 8981.02 cusr
>> 597.64 csys = 9579.25 CPU)
>> Result: FAIL
>>
>> And test case 30 for example is something like:
>>
>> tests/benchmark/test_urcu_mb   1 -d 0 -b 32768
>>
>> and it failed because:
>>
>> lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8'
>> failed.
>> zsh: abort (core dumped)  ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 1 
>> -d
>> 0 -b 32768
>>
>> So I think what was going on here was:
>>
>> CPU 0 (reader):  rcu_read_lock();
>> CPU 2 (writer):  new = malloc(sizeof(int));
>> CPU 0 (reader):  local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old
>> CPU 2 (writer):  *new = 8;
>> CPU 2 (writer):  old = rcu_xchg_pointer(&test_rcu_pointer, new);
>> CPU 1 (writer):  synchronize_rcu():
>> CPU 1 (writer):    urcu_wait_add();     // return 0
>> CPU 1 (writer):    urcu_move_waiters(); // @gp_waiters is empty, the next
>> CPU 1 (writer):                         // urcu_wait_add() will return 0
>> CPU 2 (writer):  synchronize_rcu():
>> CPU 2 (writer):    urcu_wait_add();     // return 0
>> CPU 1 (writer):    mutex_lock(&rcu_registry_lock);
>> CPU 1 (writer):    wait_for_readers();  // remove registered reader from @registry,
>> CPU 1 (writer):                         // release rcu_registry_lock and wait via poll()
>> CPU 2 (writer):    mutex_lock(&rcu_registry_lock);
>> CPU 2 (writer):    wait_for_readers();  // @registry is empty! we are so lucky
>> CPU 2 (writer):    return;
>> CPU 2 (writer):  if (old)
>> CPU 2 (writer):    free(old);
>> CPU 0 (reader):  rcu_read_unlock();
>> CPU 0 (reader):  assert(*local_ptr == 8); // but local_ptr (i.e. old) is already freed.
>>
>>
>> So the point is there could be two writers calling synchronize_rcu() but
>> not returning early(both of them enter the slow path to perform a grace
>> period), so the rcu_gp_lock is necessary in this case.
>>
>> (Cc  Mathieu)
>>
>> But this is only my understanding and I'm learning the URCU code too ;-)
>>
>> Regards,
>> Boqun
>>
>>
>>> Might well be that it is 

Re: [lttng-dev] Question about lock in synchronize_rcu implementation of URCU

2016-04-28 Thread Paul E. McKenney
On Thu, Apr 28, 2016 at 09:47:23AM -0400, Yuxin Ren wrote:
> Hi Boqun and Paul,
> 
> Thank you so much for your help.
> 
> I found one reason to use that lock.
> In the slow path, a thread will move all waiters to a local queue.
> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L406
> After this, following thread can also perform grace period, as the
> global waiter queue is empty.
> Thus the rcu_gp_lock is used to ensure mutual exclusion.
> 
> However, from real time aspect, such blocking will cause priority
> inversion: higher priority writer can be blocked by low priority
> writer.
> Is there a way to modify the code to allow multiple threads to perform
> grace period concurrently?

If a thread has real-time requirements, you shouldn't have it do
synchronous grace periods, just as you shouldn't have it do (say)
sleep(10).

You should instead either (1) have some other non-realtime thread do
the cleanup activities involving synchronize_rcu() or (2) have the
real-time thread use the asynchronous call_rcu().
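
A rough sketch of option (2) with liburcu's asynchronous API (the struct and
function names here are made up for illustration):

  #include <urcu.h>              /* or the specific flavor header you build against */
  #include <urcu/compiler.h>     /* caa_container_of() */
  #include <stdlib.h>

  struct foo {
          int value;
          struct rcu_head rcu;   /* per-object state for the deferred free */
  };

  static void free_foo(struct rcu_head *head)
  {
          free(caa_container_of(head, struct foo, rcu));
  }

  /* Called from the real-time thread: publish, then defer reclamation. */
  static void update_foo(struct foo **slot, struct foo *newp)
  {
          struct foo *old = rcu_xchg_pointer(slot, newp);

          if (old)
                  call_rcu(&old->rcu, free_foo);  /* never waits for a grace period */
  }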

Thanx, Paul

> Thanks again!!
> Yuxin
> 
> On Thu, Apr 28, 2016 at 8:44 AM, Boqun Feng  wrote:
> > Hi Paul and Yuxin,
> >
> > On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote:
> >> Try building without it and see what happens when you run the tests.
> >>
> >
> > I've run a 'regtest' with the following modification out of curiosity:
> >
> > diff --git a/urcu.c b/urcu.c
> > index a5568bdbd075..9dc3c9feae56 100644
> > --- a/urcu.c
> > +++ b/urcu.c
> > @@ -398,8 +398,6 @@ void synchronize_rcu(void)
> >  	/* We won't need to wake ourself up */
> >  	urcu_wait_set_state(&wait, URCU_WAIT_RUNNING);
> >  
> > -	mutex_lock(&rcu_gp_lock);
> > -
> >  	/*
> >  	 * Move all waiters into our local queue.
> >  	 */
> > @@ -480,7 +478,6 @@ void synchronize_rcu(void)
> >  	smp_mb_master();
> >  out:
> >  	mutex_unlock(&rcu_registry_lock);
> > -	mutex_unlock(&rcu_gp_lock);
> >  
> >  	/*
> >  	 * Wakeup waiters only after we have completed the grace period
> >
> >
> > And guess what, the result of the test was:
> >
> > Test Summary Report
> > ---
> > ./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18)
> >   Failed tests:  30, 45, 60, 75, 90, 103, 105, 120, 135
> >   150, 165, 180, 195, 210, 225, 240, 253
> >   255
> > Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr  0.04 sys + 8981.02 cusr 
> > 597.64 csys = 9579.25 CPU)
> > Result: FAIL
> >
> > And test case 30 for example is something like:
> >
> > tests/benchmark/test_urcu_mb   1 -d 0 -b 32768
> >
> > and it failed because:
> >
> > lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8' 
> > failed.
> > zsh: abort (core dumped)  ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 
> > 1 -d 0 -b 32768
> >
> > So I think what was going on here was:
> >
> > CPU 0 (reader):  rcu_read_lock();
> > CPU 2 (writer):  new = malloc(sizeof(int));
> > CPU 0 (reader):  local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old
> > CPU 2 (writer):  *new = 8;
> > CPU 2 (writer):  old = rcu_xchg_pointer(&test_rcu_pointer, new);
> > CPU 1 (writer):  synchronize_rcu():
> > CPU 1 (writer):    urcu_wait_add();     // return 0
> > CPU 1 (writer):    urcu_move_waiters(); // @gp_waiters is empty, the next
> > CPU 1 (writer):                         // urcu_wait_add() will return 0
> > CPU 2 (writer):  synchronize_rcu():
> > CPU 2 (writer):    urcu_wait_add();     // return 0
> > CPU 1 (writer):    mutex_lock(&rcu_registry_lock);
> > CPU 1 (writer):    wait_for_readers();  // remove registered reader from @registry,
> > CPU 1 (writer):                         // release rcu_registry_lock and wait via poll()
> > CPU 2 (writer):    mutex_lock(&rcu_registry_lock);
> > CPU 2 (writer):    wait_for_readers();  // @registry is empty! we are so lucky
> > CPU 2 (writer):    return;
> > CPU 2 (writer):  if (old)
> > CPU 2 (writer):    free(old);
> > CPU 0 (reader):  rcu_read_unlock();
> > CPU 0 (reader):  assert(*local_ptr == 8); // but local_ptr (i.e. old) is already freed.

Re: [lttng-dev] Question about lock in synchronize_rcu implementation of URCU

2016-04-28 Thread Paul E. McKenney
On Thu, Apr 28, 2016 at 08:44:01PM +0800, Boqun Feng wrote:
> Hi Paul and Yuxin,
> 
> On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote:
> > Try building without it and see what happens when you run the tests.
> > 
> 
> I've run a 'regtest' with the following modification out of curiosity:
> 
> diff --git a/urcu.c b/urcu.c
> index a5568bdbd075..9dc3c9feae56 100644
> --- a/urcu.c
> +++ b/urcu.c
> @@ -398,8 +398,6 @@ void synchronize_rcu(void)
>  	/* We won't need to wake ourself up */
>  	urcu_wait_set_state(&wait, URCU_WAIT_RUNNING);
>  
> -	mutex_lock(&rcu_gp_lock);
> -
>  	/*
>  	 * Move all waiters into our local queue.
>  	 */
> @@ -480,7 +478,6 @@ void synchronize_rcu(void)
>  	smp_mb_master();
>  out:
>  	mutex_unlock(&rcu_registry_lock);
> -	mutex_unlock(&rcu_gp_lock);
>  
>  	/*
>  	 * Wakeup waiters only after we have completed the grace period
> 
> 
> And guess what, the result of the test was:
> 
> Test Summary Report
> ---
> ./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18)
>   Failed tests:  30, 45, 60, 75, 90, 103, 105, 120, 135
>   150, 165, 180, 195, 210, 225, 240, 253
> 255
> Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr  0.04 sys + 8981.02 cusr 
> 597.64 csys = 9579.25 CPU)
> Result: FAIL
> 
> And test case 30 for example is something like:
> 
> tests/benchmark/test_urcu_mb   1 -d 0 -b 32768
> 
> and it failed because:
> 
> lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8' 
> failed.
> zsh: abort (core dumped)  ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 1 
> -d 0 -b 32768
> 
> So I think what was going on here was:
> 
> CPU 0 (reader):  rcu_read_lock();
> CPU 2 (writer):  new = malloc(sizeof(int));
> CPU 0 (reader):  local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old
> CPU 2 (writer):  *new = 8;
> CPU 2 (writer):  old = rcu_xchg_pointer(&test_rcu_pointer, new);
> CPU 1 (writer):  synchronize_rcu():
> CPU 1 (writer):    urcu_wait_add();     // return 0
> CPU 1 (writer):    urcu_move_waiters(); // @gp_waiters is empty, the next
> CPU 1 (writer):                         // urcu_wait_add() will return 0
> CPU 2 (writer):  synchronize_rcu():
> CPU 2 (writer):    urcu_wait_add();     // return 0
> CPU 1 (writer):    mutex_lock(&rcu_registry_lock);
> CPU 1 (writer):    wait_for_readers();  // remove registered reader from @registry,
> CPU 1 (writer):                         // release rcu_registry_lock and wait via poll()
> CPU 2 (writer):    mutex_lock(&rcu_registry_lock);
> CPU 2 (writer):    wait_for_readers();  // @registry is empty! we are so lucky
> CPU 2 (writer):    return;
> CPU 2 (writer):  if (old)
> CPU 2 (writer):    free(old);
> CPU 0 (reader):  rcu_read_unlock();
> CPU 0 (reader):  assert(*local_ptr == 8); // but local_ptr (i.e. old) is already freed.
> 
> 
> So the point is there could be two writers calling synchronize_rcu() but
> not returning early(both of them enter the slow path to perform a grace
> period), so the rcu_gp_lock is necessary in this case.
> 
> (Cc  Mathieu)
> 
> But this is only my understanding and I'm learning the URCU code too ;-)

Nothing quite like actually trying it and seeing what happens!  One of
the best learning methods that I know of.

Assuming the act of actually trying it is non-fatal, of course.  ;-)

Thanx, Paul

> Regards,
> Boqun
> 
> 
> > Might well be that it is unnecessary, but I will defer to Mathieu
> > on that point.
> > 
> > Thanx, Paul
> > 
> > On Wed, Apr 27, 2016 at 10:18:04PM -0400, Yuxin Ren wrote:
> > > As they don't currently perform grace period, why do we use the 
> > > rcu_gp_lock?
> > > 
> > > Thank you.
> > > Yuxin
> > > 
> > > On Wed, Apr 27, 2016 at 10:08 PM, Paul E. McKenney
> > >  wrote:
> > > > On Wed, Apr 27, 2016 at 09:34:16PM -0400, Yuxin Ren wrote:
> > > >> Hi,
> > > >>
> > > >> I am learning the URCU code.
> > > >>
> > > >> Why do we need rcu_gp_lock in synchronize_rcu?
> > > >> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L401
> > > >>
> > > >> In the 

Re: [lttng-dev] Question about lock in synchronize_rcu implementation of URCU

2016-04-28 Thread Yuxin Ren
Hi Boqun and Paul,

Thank you so much for your help.

I found one reason to use that lock.
In the slow path, a thread will move all waiters to a local queue.
https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L406
After this, following thread can also perform grace period, as the
global waiter queue is empty.
Thus the rcu_gp_lock is used to ensure mutual exclusion.

However, from real time aspect, such blocking will cause priority
inversion: higher priority writer can be blocked by low priority
writer.
Is there a way to modify the code to allow multiple threads to perform
grace period concurrently?

Thanks again!!
Yuxin

On Thu, Apr 28, 2016 at 8:44 AM, Boqun Feng  wrote:
> Hi Paul and Yuxin,
>
> On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote:
>> Try building without it and see what happens when you run the tests.
>>
>
> I've run a 'regtest' with the following modification out of curiosity:
>
> diff --git a/urcu.c b/urcu.c
> index a5568bdbd075..9dc3c9feae56 100644
> --- a/urcu.c
> +++ b/urcu.c
> @@ -398,8 +398,6 @@ void synchronize_rcu(void)
>  	/* We won't need to wake ourself up */
>  	urcu_wait_set_state(&wait, URCU_WAIT_RUNNING);
>  
> -	mutex_lock(&rcu_gp_lock);
> -
>  	/*
>  	 * Move all waiters into our local queue.
>  	 */
> @@ -480,7 +478,6 @@ void synchronize_rcu(void)
>  	smp_mb_master();
>  out:
>  	mutex_unlock(&rcu_registry_lock);
> -	mutex_unlock(&rcu_gp_lock);
>  
>  	/*
>  	 * Wakeup waiters only after we have completed the grace period
>
>
> And guess what, the result of the test was:
>
> Test Summary Report
> ---
> ./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18)
>   Failed tests:  30, 45, 60, 75, 90, 103, 105, 120, 135
>   150, 165, 180, 195, 210, 225, 240, 253
>   255
> Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr  0.04 sys + 8981.02 cusr 
> 597.64 csys = 9579.25 CPU)
> Result: FAIL
>
> And test case 30 for example is something like:
>
> tests/benchmark/test_urcu_mb   1 -d 0 -b 32768
>
> and it failed because:
>
> lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8' 
> failed.
> zsh: abort (core dumped)  ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 1 
> -d 0 -b 32768
>
> So I think what was going on here was:
>
> CPU 0 (reader):  rcu_read_lock();
> CPU 2 (writer):  new = malloc(sizeof(int));
> CPU 0 (reader):  local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old
> CPU 2 (writer):  *new = 8;
> CPU 2 (writer):  old = rcu_xchg_pointer(&test_rcu_pointer, new);
> CPU 1 (writer):  synchronize_rcu():
> CPU 1 (writer):    urcu_wait_add();     // return 0
> CPU 1 (writer):    urcu_move_waiters(); // @gp_waiters is empty, the next
> CPU 1 (writer):                         // urcu_wait_add() will return 0
> CPU 2 (writer):  synchronize_rcu():
> CPU 2 (writer):    urcu_wait_add();     // return 0
> CPU 1 (writer):    mutex_lock(&rcu_registry_lock);
> CPU 1 (writer):    wait_for_readers();  // remove registered reader from @registry,
> CPU 1 (writer):                         // release rcu_registry_lock and wait via poll()
> CPU 2 (writer):    mutex_lock(&rcu_registry_lock);
> CPU 2 (writer):    wait_for_readers();  // @registry is empty! we are so lucky
> CPU 2 (writer):    return;
> CPU 2 (writer):  if (old)
> CPU 2 (writer):    free(old);
> CPU 0 (reader):  rcu_read_unlock();
> CPU 0 (reader):  assert(*local_ptr == 8); // but local_ptr (i.e. old) is already freed.
>
>
> So the point is there could be two writers calling synchronize_rcu() but
> not returning early(both of them enter the slow path to perform a grace
> period), so the rcu_gp_lock is necessary in this case.
>
> (Cc  Mathieu)
>
> But this is only my understanding and I'm learning the URCU code too ;-)
>
> Regards,
> Boqun
>
>
>> Might well be that it is unnecessary, but I will defer to Mathieu
>> on that point.
>>
>>   Thanx, Paul
>>
>> On Wed, Apr 27, 2016 at 10:18:04PM -0400, Yuxin Ren wrote:
>> > As they don't currently perform grace period, why do we use the 
>> > rcu_gp_lock?
>> >
>> > 

Re: [lttng-dev] Question about lock in synchronize_rcu implementation of URCU

2016-04-28 Thread Boqun Feng
Hi Paul and Yuxin,

On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote:
> Try building without it and see what happens when you run the tests.
> 

I've run a 'regtest' with the following modification out of curiosity:

diff --git a/urcu.c b/urcu.c
index a5568bdbd075..9dc3c9feae56 100644
--- a/urcu.c
+++ b/urcu.c
@@ -398,8 +398,6 @@ void synchronize_rcu(void)
 	/* We won't need to wake ourself up */
 	urcu_wait_set_state(&wait, URCU_WAIT_RUNNING);
 
-	mutex_lock(&rcu_gp_lock);
-
 	/*
 	 * Move all waiters into our local queue.
 	 */
@@ -480,7 +478,6 @@ void synchronize_rcu(void)
 	smp_mb_master();
 out:
 	mutex_unlock(&rcu_registry_lock);
-	mutex_unlock(&rcu_gp_lock);
 
 	/*
 	 * Wakeup waiters only after we have completed the grace period


And guess what, the result of the test was:

Test Summary Report
---
./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18)
  Failed tests:  30, 45, 60, 75, 90, 103, 105, 120, 135
  150, 165, 180, 195, 210, 225, 240, 253
  255
Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr  0.04 sys + 8981.02 cusr 
597.64 csys = 9579.25 CPU)
Result: FAIL

And test case 30 for example is something like:

tests/benchmark/test_urcu_mb   1 -d 0 -b 32768

and it failed because:

lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8' 
failed.
zsh: abort (core dumped)  ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 1 -d 
0 -b 32768

So I think what was going on here was:

CPU 0 (reader):  rcu_read_lock();
CPU 2 (writer):  new = malloc(sizeof(int));
CPU 0 (reader):  local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old
CPU 2 (writer):  *new = 8;
CPU 2 (writer):  old = rcu_xchg_pointer(&test_rcu_pointer, new);
CPU 1 (writer):  synchronize_rcu():
CPU 1 (writer):    urcu_wait_add();     // return 0
CPU 1 (writer):    urcu_move_waiters(); // @gp_waiters is empty, the next
CPU 1 (writer):                         // urcu_wait_add() will return 0
CPU 2 (writer):  synchronize_rcu():
CPU 2 (writer):    urcu_wait_add();     // return 0
CPU 1 (writer):    mutex_lock(&rcu_registry_lock);
CPU 1 (writer):    wait_for_readers();  // remove registered reader from @registry,
CPU 1 (writer):                         // release rcu_registry_lock and wait via poll()
CPU 2 (writer):    mutex_lock(&rcu_registry_lock);
CPU 2 (writer):    wait_for_readers();  // @registry is empty! we are so lucky
CPU 2 (writer):    return;
CPU 2 (writer):  if (old)
CPU 2 (writer):    free(old);
CPU 0 (reader):  rcu_read_unlock();
CPU 0 (reader):  assert(*local_ptr == 8); // but local_ptr (i.e. old) is already freed.


So the point is there could be two writers calling synchronize_rcu() but
not returning early(both of them enter the slow path to perform a grace
period), so the rcu_gp_lock is necessary in this case.
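
(For context, the reader/writer pattern involved is essentially the classic
publish, wait, free sequence sketched below; this is a simplification based
on the diagram above, not the exact test_urcu.c code:)

  static int *test_rcu_pointer;   /* assumed global, as in the benchmark */

  static void writer_update(void)
  {
          int *new, *old;

          new = malloc(sizeof(*new));
          *new = 8;
          old = rcu_xchg_pointer(&test_rcu_pointer, new); /* publish */
          synchronize_rcu();  /* must not return while readers still hold old */
          free(old);
  }

  static void reader_check(void)
  {
          int *local_ptr;

          rcu_read_lock();
          local_ptr = rcu_dereference(test_rcu_pointer);
          if (local_ptr)
                  assert(*local_ptr == 8); /* trips if old was freed too early */
          rcu_read_unlock();
  }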

(Cc  Mathieu)

But this is only my understanding and I'm learning the URCU code too ;-)

Regards,
Boqun


> Might well be that it is unnecessary, but I will defer to Mathieu
> on that point.
> 
>   Thanx, Paul
> 
> On Wed, Apr 27, 2016 at 10:18:04PM -0400, Yuxin Ren wrote:
> > As they don't currently perform grace period, why do we use the rcu_gp_lock?
> > 
> > Thank you.
> > Yuxin
> > 
> > On Wed, Apr 27, 2016 at 10:08 PM, Paul E. McKenney
> >  wrote:
> > > On Wed, Apr 27, 2016 at 09:34:16PM -0400, Yuxin Ren wrote:
> > >> Hi,
> > >>
> > >> I am learning the URCU code.
> > >>
> > >> Why do we need rcu_gp_lock in synchronize_rcu?
> > >> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L401
> > >>
> > >> In the comment, it says this lock ensures mutual exclusion between
> > >> threads calling synchronize_rcu().
> > >> But only the first thread added to waiter queue can proceed to detect
> > >> grace period.
> > >> How can multiple threads concurrently perform the grace period?
> > >
> > > They don't concurrently perform grace periods, and it would be wasteful
> > > for them to do so.  Instead, the first one performs the grace period,
> > > and all that were waiting at the time it started get the benefit of that
> > > same grace