Re: [lttng-dev] Question about lock in synchronize_rcu implementation of URCU
- On Apr 28, 2016, at 9:47 AM, Yuxin Ren r...@gwmail.gwu.edu wrote: > Hi Boqun and Paul, > > Thank you so much for your help. > > I found one reason to use that lock. > In the slow path, a thread will move all waiters to a local queue. > https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L406 > After this, following thread can also perform grace period, as the > global waiter queue is empty. > Thus the rcu_gp_lock is used to ensure mutual exclusion. > > However, from real time aspect, such blocking will cause priority > inversion: higher priority writer can be blocked by low priority > writer. > Is there a way to modify the code to allow multiple threads to perform > grace period concurrently? By the way, there are other reasons for this lock in the urcu implementation: it protects the rcu_registry, and the rcu_gp.ctr count. Thanks, Mathieu > > Thanks again!! > Yuxin > > On Thu, Apr 28, 2016 at 8:44 AM, Boqun Fengwrote: >> Hi Paul and Yuxin, >> >> On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote: >>> Try building without it and see what happens when you run the tests. >>> >> >> I've run a 'regtest' with the following modification out of curiousity: >> >> diff --git a/urcu.c b/urcu.c >> index a5568bdbd075..9dc3c9feae56 100644 >> --- a/urcu.c >> +++ b/urcu.c >> @@ -398,8 +398,6 @@ void synchronize_rcu(void) >> /* We won't need to wake ourself up */ >> urcu_wait_set_state(, URCU_WAIT_RUNNING); >> >> - mutex_lock(_gp_lock); >> - >> /* >> * Move all waiters into our local queue. >> */ >> @@ -480,7 +478,6 @@ void synchronize_rcu(void) >> smp_mb_master(); >> out: >> mutex_unlock(_registry_lock); >> - mutex_unlock(_gp_lock); >> >> /* >> * Wakeup waiters only after we have completed the grace period >> >> >> And guess what, the result of the test was: >> >> Test Summary Report >> --- >> ./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18) >> Failed tests: 30, 45, 60, 75, 90, 103, 105, 120, 135 >> 150, 165, 180, 195, 210, 225, 240, 253 >> 255 >> Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr 0.04 sys + 8981.02 cusr >> 597.64 csys = 9579.25 CPU) >> Result: FAIL >> >> And test case 30 for example is something like: >> >> tests/benchmark/test_urcu_mb 1 -d 0 -b 32768 >> >> and it failed because: >> >> lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8' >> failed. >> zsh: abort (core dumped) ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 1 >> -d >> 0 -b 32768 >> >> So I think what was going on here was: >> >> CPU 0 (reader) CPU 1 (writer) >> CPU 2 (writer) >> === >> == >> rcu_read_lock(); >> new = >> malloc(sizeof(int)); >> local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old >> *new = 8; >> >>old = rcu_xchg_pointer(_rcu_pointer, new); >> synchronize_rcu(): >> urcu_wait_add(); // return 0 >> urcu_move_waiters(); // @gp_waiters is >> empty, >>// the next >> urcu_wait_add() will return 0 >> >> >>synchronize_rcu(): >> >> urcu_wait_add(); // return 0 >> >> mutex_lock(_register_lock); >> wait_for_readers(); // remove registered >> reader from @registery, >> // release >> rcu_register_lock and wait via poll() >> >> >> mutex_lock(_registry_lock); >> >> wait_for_readers(); // @regsitery is empty! we are so lucky >> >> return; >> >> >>if (old) >> >>free(old) >> rcu_read_unlock(); >> assert(*local_ptr==8); // but local_ptr(i.e. old) is already freed. >> >> >> So the point is there could be two writers calling synchronize_rcu() but >> not returning early(both of them enter the slow path to perform a grace >> period), so the rcu_gp_lock is necessary in this case. >> >> (Cc Mathieu) >> >> But this is only my understanding and I'm learning the URCU code too ;-) >> >> Regards, >> Boqun >> >> >>> Might well be that it is unnecessary, but I will defer to Mathieu >>> on
Re: [lttng-dev] Question about lock in synchronize_rcu implementation of URCU
On Thu, Apr 28, 2016 at 02:38:40PM +, Mathieu Desnoyers wrote: > - On Apr 28, 2016, at 9:47 AM, Yuxin Ren r...@gwmail.gwu.edu wrote: > > > Hi Boqun and Paul, > > > > Thank you so much for your help. > > > > I found one reason to use that lock. > > In the slow path, a thread will move all waiters to a local queue. > > https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L406 > > After this, following thread can also perform grace period, as the > > global waiter queue is empty. > > Thus the rcu_gp_lock is used to ensure mutual exclusion. > > > > However, from real time aspect, such blocking will cause priority > > inversion: higher priority writer can be blocked by low priority > > writer. > > Is there a way to modify the code to allow multiple threads to perform > > grace period concurrently? > > Before we redesign urcu for RT, would it be possible to simply > use pi-mutexes (priority inheritance) instead to protect grace periods > from each other with the current urcu scheme ? Given that priority inversion can happen with low-priority readers blocking a grace period that a high-priority updater is waiting on, I stand by my earlier advice: Don't let high-priority updaters block waiting for grace periods. ;-) Thanx, Paul > Thanks, > > Mathieu > > > > > > Thanks again!! > > Yuxin > > > > On Thu, Apr 28, 2016 at 8:44 AM, Boqun Fengwrote: > >> Hi Paul and Yuxin, > >> > >> On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote: > >>> Try building without it and see what happens when you run the tests. > >>> > >> > >> I've run a 'regtest' with the following modification out of curiousity: > >> > >> diff --git a/urcu.c b/urcu.c > >> index a5568bdbd075..9dc3c9feae56 100644 > >> --- a/urcu.c > >> +++ b/urcu.c > >> @@ -398,8 +398,6 @@ void synchronize_rcu(void) > >> /* We won't need to wake ourself up */ > >> urcu_wait_set_state(, URCU_WAIT_RUNNING); > >> > >> - mutex_lock(_gp_lock); > >> - > >> /* > >> * Move all waiters into our local queue. > >> */ > >> @@ -480,7 +478,6 @@ void synchronize_rcu(void) > >> smp_mb_master(); > >> out: > >> mutex_unlock(_registry_lock); > >> - mutex_unlock(_gp_lock); > >> > >> /* > >> * Wakeup waiters only after we have completed the grace period > >> > >> > >> And guess what, the result of the test was: > >> > >> Test Summary Report > >> --- > >> ./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18) > >> Failed tests: 30, 45, 60, 75, 90, 103, 105, 120, 135 > >> 150, 165, 180, 195, 210, 225, 240, 253 > >> 255 > >> Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr 0.04 sys + 8981.02 cusr > >> 597.64 csys = 9579.25 CPU) > >> Result: FAIL > >> > >> And test case 30 for example is something like: > >> > >> tests/benchmark/test_urcu_mb 1 -d 0 -b 32768 > >> > >> and it failed because: > >> > >> lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8' > >> failed. > >> zsh: abort (core dumped) ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 > >> 1 -d > >> 0 -b 32768 > >> > >> So I think what was going on here was: > >> > >> CPU 0 (reader) CPU 1 (writer) > >> CPU 2 (writer) > >> === > >> == > >> rcu_read_lock(); > >> new = > >> malloc(sizeof(int)); > >> local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old > >> *new = 8; > >> > >> old = rcu_xchg_pointer(_rcu_pointer, new); > >> synchronize_rcu(): > >> urcu_wait_add(); // return 0 > >> urcu_move_waiters(); // @gp_waiters is > >> empty, > >>// the next > >> urcu_wait_add() will return 0 > >> > >> > >> synchronize_rcu(): > >> > >>urcu_wait_add(); // return 0 > >> > >> mutex_lock(_register_lock); > >> wait_for_readers(); // remove registered > >> reader from @registery, > >> // release > >> rcu_register_lock and wait via poll() > >> > >> > >>mutex_lock(_registry_lock); > >> > >>wait_for_readers(); // @regsitery is empty! we are so lucky > >> > >>return; > >> > >>
Re: [lttng-dev] Question about lock in synchronize_rcu implementation of URCU
- On Apr 28, 2016, at 9:47 AM, Yuxin Ren r...@gwmail.gwu.edu wrote: > Hi Boqun and Paul, > > Thank you so much for your help. > > I found one reason to use that lock. > In the slow path, a thread will move all waiters to a local queue. > https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L406 > After this, following thread can also perform grace period, as the > global waiter queue is empty. > Thus the rcu_gp_lock is used to ensure mutual exclusion. > > However, from real time aspect, such blocking will cause priority > inversion: higher priority writer can be blocked by low priority > writer. > Is there a way to modify the code to allow multiple threads to perform > grace period concurrently? Before we redesign urcu for RT, would it be possible to simply use pi-mutexes (priority inheritance) instead to protect grace periods from each other with the current urcu scheme ? Thanks, Mathieu > > Thanks again!! > Yuxin > > On Thu, Apr 28, 2016 at 8:44 AM, Boqun Fengwrote: >> Hi Paul and Yuxin, >> >> On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote: >>> Try building without it and see what happens when you run the tests. >>> >> >> I've run a 'regtest' with the following modification out of curiousity: >> >> diff --git a/urcu.c b/urcu.c >> index a5568bdbd075..9dc3c9feae56 100644 >> --- a/urcu.c >> +++ b/urcu.c >> @@ -398,8 +398,6 @@ void synchronize_rcu(void) >> /* We won't need to wake ourself up */ >> urcu_wait_set_state(, URCU_WAIT_RUNNING); >> >> - mutex_lock(_gp_lock); >> - >> /* >> * Move all waiters into our local queue. >> */ >> @@ -480,7 +478,6 @@ void synchronize_rcu(void) >> smp_mb_master(); >> out: >> mutex_unlock(_registry_lock); >> - mutex_unlock(_gp_lock); >> >> /* >> * Wakeup waiters only after we have completed the grace period >> >> >> And guess what, the result of the test was: >> >> Test Summary Report >> --- >> ./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18) >> Failed tests: 30, 45, 60, 75, 90, 103, 105, 120, 135 >> 150, 165, 180, 195, 210, 225, 240, 253 >> 255 >> Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr 0.04 sys + 8981.02 cusr >> 597.64 csys = 9579.25 CPU) >> Result: FAIL >> >> And test case 30 for example is something like: >> >> tests/benchmark/test_urcu_mb 1 -d 0 -b 32768 >> >> and it failed because: >> >> lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8' >> failed. >> zsh: abort (core dumped) ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 1 >> -d >> 0 -b 32768 >> >> So I think what was going on here was: >> >> CPU 0 (reader) CPU 1 (writer) >> CPU 2 (writer) >> === >> == >> rcu_read_lock(); >> new = >> malloc(sizeof(int)); >> local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old >> *new = 8; >> >>old = rcu_xchg_pointer(_rcu_pointer, new); >> synchronize_rcu(): >> urcu_wait_add(); // return 0 >> urcu_move_waiters(); // @gp_waiters is >> empty, >>// the next >> urcu_wait_add() will return 0 >> >> >>synchronize_rcu(): >> >> urcu_wait_add(); // return 0 >> >> mutex_lock(_register_lock); >> wait_for_readers(); // remove registered >> reader from @registery, >> // release >> rcu_register_lock and wait via poll() >> >> >> mutex_lock(_registry_lock); >> >> wait_for_readers(); // @regsitery is empty! we are so lucky >> >> return; >> >> >>if (old) >> >>free(old) >> rcu_read_unlock(); >> assert(*local_ptr==8); // but local_ptr(i.e. old) is already freed. >> >> >> So the point is there could be two writers calling synchronize_rcu() but >> not returning early(both of them enter the slow path to perform a grace >> period), so the rcu_gp_lock is necessary in this case. >> >> (Cc Mathieu) >> >> But this is only my understanding and I'm learning the URCU code too ;-) >> >> Regards, >> Boqun >> >> >>> Might well be that it is
Re: [lttng-dev] Question about lock in synchronize_rcu implementation of URCU
On Thu, Apr 28, 2016 at 09:47:23AM -0400, Yuxin Ren wrote: > Hi Boqun and Paul, > > Thank you so much for your help. > > I found one reason to use that lock. > In the slow path, a thread will move all waiters to a local queue. > https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L406 > After this, following thread can also perform grace period, as the > global waiter queue is empty. > Thus the rcu_gp_lock is used to ensure mutual exclusion. > > However, from real time aspect, such blocking will cause priority > inversion: higher priority writer can be blocked by low priority > writer. > Is there a way to modify the code to allow multiple threads to perform > grace period concurrently? If a thread has real-time requirements, you shouldn't have it do synchronous grace periods, just as you shouldn't have it do (say) sleep(10). You should instead either (1) have some other non-realtime thread do the cleanup activities involving synchronize_rcu() or (2) have the real-time thread use the asynchronous call_rcu(). Thanx, Paul > Thanks again!! > Yuxin > > On Thu, Apr 28, 2016 at 8:44 AM, Boqun Fengwrote: > > Hi Paul and Yuxin, > > > > On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote: > >> Try building without it and see what happens when you run the tests. > >> > > > > I've run a 'regtest' with the following modification out of curiousity: > > > > diff --git a/urcu.c b/urcu.c > > index a5568bdbd075..9dc3c9feae56 100644 > > --- a/urcu.c > > +++ b/urcu.c > > @@ -398,8 +398,6 @@ void synchronize_rcu(void) > > /* We won't need to wake ourself up */ > > urcu_wait_set_state(, URCU_WAIT_RUNNING); > > > > - mutex_lock(_gp_lock); > > - > > /* > > * Move all waiters into our local queue. > > */ > > @@ -480,7 +478,6 @@ void synchronize_rcu(void) > > smp_mb_master(); > > out: > > mutex_unlock(_registry_lock); > > - mutex_unlock(_gp_lock); > > > > /* > > * Wakeup waiters only after we have completed the grace period > > > > > > And guess what, the result of the test was: > > > > Test Summary Report > > --- > > ./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18) > > Failed tests: 30, 45, 60, 75, 90, 103, 105, 120, 135 > > 150, 165, 180, 195, 210, 225, 240, 253 > > 255 > > Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr 0.04 sys + 8981.02 cusr > > 597.64 csys = 9579.25 CPU) > > Result: FAIL > > > > And test case 30 for example is something like: > > > > tests/benchmark/test_urcu_mb 1 -d 0 -b 32768 > > > > and it failed because: > > > > lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8' > > failed. > > zsh: abort (core dumped) ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 > > 1 -d 0 -b 32768 > > > > So I think what was going on here was: > > > > CPU 0 (reader) CPU 1 (writer) > > CPU 2 (writer) > > === > > == > > rcu_read_lock(); > > new = malloc(sizeof(int)); > > local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old > > *new = 8; > > > > old = rcu_xchg_pointer(_rcu_pointer, new); > > synchronize_rcu(): > > urcu_wait_add(); // return 0 > > urcu_move_waiters(); // @gp_waiters is > > empty, > >// the next > > urcu_wait_add() will return 0 > > > > > > synchronize_rcu(): > > > > urcu_wait_add(); // return 0 > > > > mutex_lock(_register_lock); > > wait_for_readers(); // remove registered > > reader from @registery, > > // release > > rcu_register_lock and wait via poll() > > > > > > mutex_lock(_registry_lock); > > > > wait_for_readers(); // @regsitery is empty! we are so lucky > > > > return; > > > > > > if (old) > > > > free(old) > > rcu_read_unlock(); > > assert(*local_ptr==8); // but
Re: [lttng-dev] Question about lock in synchronize_rcu implementation of URCU
On Thu, Apr 28, 2016 at 08:44:01PM +0800, Boqun Feng wrote: > Hi Paul and Yuxin, > > On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote: > > Try building without it and see what happens when you run the tests. > > > > I've run a 'regtest' with the following modification out of curiousity: > > diff --git a/urcu.c b/urcu.c > index a5568bdbd075..9dc3c9feae56 100644 > --- a/urcu.c > +++ b/urcu.c > @@ -398,8 +398,6 @@ void synchronize_rcu(void) > /* We won't need to wake ourself up */ > urcu_wait_set_state(, URCU_WAIT_RUNNING); > > - mutex_lock(_gp_lock); > - > /* >* Move all waiters into our local queue. >*/ > @@ -480,7 +478,6 @@ void synchronize_rcu(void) > smp_mb_master(); > out: > mutex_unlock(_registry_lock); > - mutex_unlock(_gp_lock); > > /* >* Wakeup waiters only after we have completed the grace period > > > And guess what, the result of the test was: > > Test Summary Report > --- > ./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18) > Failed tests: 30, 45, 60, 75, 90, 103, 105, 120, 135 > 150, 165, 180, 195, 210, 225, 240, 253 > 255 > Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr 0.04 sys + 8981.02 cusr > 597.64 csys = 9579.25 CPU) > Result: FAIL > > And test case 30 for example is something like: > > tests/benchmark/test_urcu_mb 1 -d 0 -b 32768 > > and it failed because: > > lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8' > failed. > zsh: abort (core dumped) ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 1 > -d 0 -b 32768 > > So I think what was going on here was: > > CPU 0 (reader)CPU 1 (writer) > CPU 2 (writer) > === > == > rcu_read_lock(); > new = malloc(sizeof(int)); > local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old > *new = 8; > > old = rcu_xchg_pointer(_rcu_pointer, new); > synchronize_rcu(): > urcu_wait_add(); // return 0 > urcu_move_waiters(); // @gp_waiters is empty, > // the next > urcu_wait_add() will return 0 > > > synchronize_rcu(): > > urcu_wait_add(); // return 0 > > mutex_lock(_register_lock); > wait_for_readers(); // remove registered > reader from @registery, > // release > rcu_register_lock and wait via poll() > > > mutex_lock(_registry_lock); > > wait_for_readers(); // @regsitery is empty! we are so lucky > > return; > > > if (old) > > free(old) > rcu_read_unlock(); > assert(*local_ptr==8); // but local_ptr(i.e. old) is already freed. > > > So the point is there could be two writers calling synchronize_rcu() but > not returning early(both of them enter the slow path to perform a grace > period), so the rcu_gp_lock is necessary in this case. > > (Cc Mathieu) > > But this is only my understanding and I'm learning the URCU code too ;-) Nothing quite like actually trying it and seeing what happens! One of the best learning methods that I know of. Assuming the act of actually trying it is non-fatal, of course. ;-) Thanx, Paul > Regards, > Boqun > > > > Might well be that it is unnecessary, but I will defer to Mathieu > > on that point. > > > > Thanx, Paul > > > > On Wed, Apr 27, 2016 at 10:18:04PM -0400, Yuxin Ren wrote: > > > As they don't currently perform grace period, why do we use the > > > rcu_gp_lock? > > > > > > Thank you. > > > Yuxin > > > > > > On Wed, Apr 27, 2016 at 10:08 PM, Paul E. McKenney > > >wrote: > > > > On Wed, Apr 27, 2016 at 09:34:16PM -0400, Yuxin Ren wrote: > > > >> Hi, > > > >> > > > >> I am learning the URCU code. > > > >> > > > >> Why do we need rcu_gp_lock in synchronize_rcu? > > > >> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L401 > > > >> > > > >> In the
Re: [lttng-dev] Question about lock in synchronize_rcu implementation of URCU
Hi Boqun and Paul, Thank you so much for your help. I found one reason to use that lock. In the slow path, a thread will move all waiters to a local queue. https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L406 After this, following thread can also perform grace period, as the global waiter queue is empty. Thus the rcu_gp_lock is used to ensure mutual exclusion. However, from real time aspect, such blocking will cause priority inversion: higher priority writer can be blocked by low priority writer. Is there a way to modify the code to allow multiple threads to perform grace period concurrently? Thanks again!! Yuxin On Thu, Apr 28, 2016 at 8:44 AM, Boqun Fengwrote: > Hi Paul and Yuxin, > > On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote: >> Try building without it and see what happens when you run the tests. >> > > I've run a 'regtest' with the following modification out of curiousity: > > diff --git a/urcu.c b/urcu.c > index a5568bdbd075..9dc3c9feae56 100644 > --- a/urcu.c > +++ b/urcu.c > @@ -398,8 +398,6 @@ void synchronize_rcu(void) > /* We won't need to wake ourself up */ > urcu_wait_set_state(, URCU_WAIT_RUNNING); > > - mutex_lock(_gp_lock); > - > /* > * Move all waiters into our local queue. > */ > @@ -480,7 +478,6 @@ void synchronize_rcu(void) > smp_mb_master(); > out: > mutex_unlock(_registry_lock); > - mutex_unlock(_gp_lock); > > /* > * Wakeup waiters only after we have completed the grace period > > > And guess what, the result of the test was: > > Test Summary Report > --- > ./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18) > Failed tests: 30, 45, 60, 75, 90, 103, 105, 120, 135 > 150, 165, 180, 195, 210, 225, 240, 253 > 255 > Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr 0.04 sys + 8981.02 cusr > 597.64 csys = 9579.25 CPU) > Result: FAIL > > And test case 30 for example is something like: > > tests/benchmark/test_urcu_mb 1 -d 0 -b 32768 > > and it failed because: > > lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8' > failed. > zsh: abort (core dumped) ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 1 > -d 0 -b 32768 > > So I think what was going on here was: > > CPU 0 (reader) CPU 1 (writer) > CPU 2 (writer) > === > == > rcu_read_lock(); > new = malloc(sizeof(int)); > local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old > *new = 8; > > old = rcu_xchg_pointer(_rcu_pointer, new); > synchronize_rcu(): > urcu_wait_add(); // return 0 > urcu_move_waiters(); // @gp_waiters is > empty, >// the next > urcu_wait_add() will return 0 > > > synchronize_rcu(): > > urcu_wait_add(); // return 0 > > mutex_lock(_register_lock); > wait_for_readers(); // remove registered > reader from @registery, > // release > rcu_register_lock and wait via poll() > > > mutex_lock(_registry_lock); > > wait_for_readers(); // @regsitery is empty! we are so lucky > > return; > > > if (old) > > free(old) > rcu_read_unlock(); > assert(*local_ptr==8); // but local_ptr(i.e. old) is already freed. > > > So the point is there could be two writers calling synchronize_rcu() but > not returning early(both of them enter the slow path to perform a grace > period), so the rcu_gp_lock is necessary in this case. > > (Cc Mathieu) > > But this is only my understanding and I'm learning the URCU code too ;-) > > Regards, > Boqun > > >> Might well be that it is unnecessary, but I will defer to Mathieu >> on that point. >> >> Thanx, Paul >> >> On Wed, Apr 27, 2016 at 10:18:04PM -0400, Yuxin Ren wrote: >> > As they don't currently perform grace period, why do we use the >> > rcu_gp_lock? >> > >> >
Re: [lttng-dev] Question about lock in synchronize_rcu implementation of URCU
Hi Paul and Yuxin, On Wed, Apr 27, 2016 at 09:23:27PM -0700, Paul E. McKenney wrote: > Try building without it and see what happens when you run the tests. > I've run a 'regtest' with the following modification out of curiousity: diff --git a/urcu.c b/urcu.c index a5568bdbd075..9dc3c9feae56 100644 --- a/urcu.c +++ b/urcu.c @@ -398,8 +398,6 @@ void synchronize_rcu(void) /* We won't need to wake ourself up */ urcu_wait_set_state(, URCU_WAIT_RUNNING); - mutex_lock(_gp_lock); - /* * Move all waiters into our local queue. */ @@ -480,7 +478,6 @@ void synchronize_rcu(void) smp_mb_master(); out: mutex_unlock(_registry_lock); - mutex_unlock(_gp_lock); /* * Wakeup waiters only after we have completed the grace period And guess what, the result of the test was: Test Summary Report --- ./run-urcu-tests.sh 1 (Wstat: 0 Tests: 979 Failed: 18) Failed tests: 30, 45, 60, 75, 90, 103, 105, 120, 135 150, 165, 180, 195, 210, 225, 240, 253 255 Files=2, Tests=996, 1039 wallclock secs ( 0.55 usr 0.04 sys + 8981.02 cusr 597.64 csys = 9579.25 CPU) Result: FAIL And test case 30 for example is something like: tests/benchmark/test_urcu_mb 1 -d 0 -b 32768 and it failed because: lt-test_urcu_mb: test_urcu.c:183: thr_reader: Assertion `*local_ptr == 8' failed. zsh: abort (core dumped) ~/userspace-rcu/tests/benchmark/test_urcu_mb 4 4 1 -d 0 -b 32768 So I think what was going on here was: CPU 0 (reader) CPU 1 (writer) CPU 2 (writer) === == rcu_read_lock(); new = malloc(sizeof(int)); local_ptr = rcu_dereference(test_rcu_pointer); // local_ptr == old *new = 8; old = rcu_xchg_pointer(_rcu_pointer, new); synchronize_rcu(): urcu_wait_add(); // return 0 urcu_move_waiters(); // @gp_waiters is empty, // the next urcu_wait_add() will return 0 synchronize_rcu(): urcu_wait_add(); // return 0 mutex_lock(_register_lock); wait_for_readers(); // remove registered reader from @registery, // release rcu_register_lock and wait via poll() mutex_lock(_registry_lock); wait_for_readers(); // @regsitery is empty! we are so lucky return; if (old) free(old) rcu_read_unlock(); assert(*local_ptr==8); // but local_ptr(i.e. old) is already freed. So the point is there could be two writers calling synchronize_rcu() but not returning early(both of them enter the slow path to perform a grace period), so the rcu_gp_lock is necessary in this case. (Cc Mathieu) But this is only my understanding and I'm learning the URCU code too ;-) Regards, Boqun > Might well be that it is unnecessary, but I will defer to Mathieu > on that point. > > Thanx, Paul > > On Wed, Apr 27, 2016 at 10:18:04PM -0400, Yuxin Ren wrote: > > As they don't currently perform grace period, why do we use the rcu_gp_lock? > > > > Thank you. > > Yuxin > > > > On Wed, Apr 27, 2016 at 10:08 PM, Paul E. McKenney > >wrote: > > > On Wed, Apr 27, 2016 at 09:34:16PM -0400, Yuxin Ren wrote: > > >> Hi, > > >> > > >> I am learning the URCU code. > > >> > > >> Why do we need rcu_gp_lock in synchronize_rcu? > > >> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L401 > > >> > > >> In the comment, it says this lock ensures mutual exclusion between > > >> threads calling synchronize_rcu(). > > >> But only the first thread added to waiter queue can proceed to detect > > >> grace period. > > >> How can multiple threads currently perform the grace thread? > > > > > > They don't concurrently perform grace periods, and it would be wasteful > > > for them to do so. Instead, the first one performs the grace period, > > > and all that were waiting at the time it started get the benefit of that > > > same grace
Re: [lttng-dev] Question about lock in synchronize_rcu implementation of URCU
Try building without it and see what happens when you run the tests. Might well be that it is unnecessary, but I will defer to Mathieu on that point. Thanx, Paul On Wed, Apr 27, 2016 at 10:18:04PM -0400, Yuxin Ren wrote: > As they don't currently perform grace period, why do we use the rcu_gp_lock? > > Thank you. > Yuxin > > On Wed, Apr 27, 2016 at 10:08 PM, Paul E. McKenney >wrote: > > On Wed, Apr 27, 2016 at 09:34:16PM -0400, Yuxin Ren wrote: > >> Hi, > >> > >> I am learning the URCU code. > >> > >> Why do we need rcu_gp_lock in synchronize_rcu? > >> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L401 > >> > >> In the comment, it says this lock ensures mutual exclusion between > >> threads calling synchronize_rcu(). > >> But only the first thread added to waiter queue can proceed to detect > >> grace period. > >> How can multiple threads currently perform the grace thread? > > > > They don't concurrently perform grace periods, and it would be wasteful > > for them to do so. Instead, the first one performs the grace period, > > and all that were waiting at the time it started get the benefit of that > > same grace period. > > > > Any that arrived after the first grace period performs the first > > grace period are served by whichever of them performs the second > > grace period. > > > > Thanx, Paul > > > ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Re: [lttng-dev] Question about lock in synchronize_rcu implementation of URCU
As they don't currently perform grace period, why do we use the rcu_gp_lock? Thank you. Yuxin On Wed, Apr 27, 2016 at 10:08 PM, Paul E. McKenneywrote: > On Wed, Apr 27, 2016 at 09:34:16PM -0400, Yuxin Ren wrote: >> Hi, >> >> I am learning the URCU code. >> >> Why do we need rcu_gp_lock in synchronize_rcu? >> https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L401 >> >> In the comment, it says this lock ensures mutual exclusion between >> threads calling synchronize_rcu(). >> But only the first thread added to waiter queue can proceed to detect >> grace period. >> How can multiple threads currently perform the grace thread? > > They don't concurrently perform grace periods, and it would be wasteful > for them to do so. Instead, the first one performs the grace period, > and all that were waiting at the time it started get the benefit of that > same grace period. > > Any that arrived after the first grace period performs the first > grace period are served by whichever of them performs the second > grace period. > > Thanx, Paul > ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Re: [lttng-dev] Question about lock in synchronize_rcu implementation of URCU
On Wed, Apr 27, 2016 at 09:34:16PM -0400, Yuxin Ren wrote: > Hi, > > I am learning the URCU code. > > Why do we need rcu_gp_lock in synchronize_rcu? > https://github.com/urcu/userspace-rcu/blob/master/urcu.c#L401 > > In the comment, it says this lock ensures mutual exclusion between > threads calling synchronize_rcu(). > But only the first thread added to waiter queue can proceed to detect > grace period. > How can multiple threads currently perform the grace thread? They don't concurrently perform grace periods, and it would be wasteful for them to do so. Instead, the first one performs the grace period, and all that were waiting at the time it started get the benefit of that same grace period. Any that arrived after the first grace period performs the first grace period are served by whichever of them performs the second grace period. Thanx, Paul ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev