Re: Timeout and SMP race

2002-07-11 Thread Justin T. Gibbs

I believe I had this conversation with Justin Gibbs earlier; he told 
me that the callout consumers (network, cam) had to be aware of the 
race and handle this if it matters.  I don't particularly like complicating
the callout handlers as illustrated above, though, so if a better scheme
is possible, that would be nice.

If callout_stop() is always called from a context that can block and
we desire the semantics that upon return from callout_stop(), you are
not in the callout handler, it seems that this should be quite easy to
achieve.  Just track the state of the callout from commit and return,
via some flags locked by a global mutex.  If a callout_stop is called
between commit and return (just look at a global indicating the
currently running callout protected by this mutex), put the thread on a
list to be woken by the thread issuing the callout.  To handle the case of
a reschedule performed by the callout itself, the woken up caller to
callout_stop() will have to loop through the test again.  In all cases,
the global mutex is only held for very short operations as a leaf mutex.

tsleep() is only called from process context

With interrupt threads, isn't process context a bit of a misnomer?
Would it be acceptable for the interrupt threads that perform callout_stops
to block waiting for them to take effect?

--
Justin

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Timeout and SMP race

2002-07-11 Thread David Xu


--- Justin T. Gibbs [EMAIL PROTECTED] wrote:
 I believe I had this conversation with Justin Gibbs earlier; he told 
 me that the callout consumers (network, cam) had to be aware of the 
 race and handle this if it matters.  I don't particularly like complicating
 the callout handlers as illustrated above, though, so if a better scheme
 is possible, that would be nice.
 
 If callout_stop() is always called from a context that can block and
 we desire the semantics that upon return from callout_stop(), you are
 not in the callout handler, it seems that this should be quite easy to
 achieve.  Just track the state of the callout from commit and return,
 via some flags locked by a global mutex.  If a callout_stop is called
 between commit and return (just look at a global indicating the
 currently running callout protected by this mutex), put the thread on a
 list to be woken by the thread issuing the callout.  To handle the case of
 a reschedule performed by the callout itself, the woken up caller to
 callout_stop() will have to loop through the test again.  In all cases,
 the global mutex is only held for very short operations as a leaf mutex.
 
 tsleep() is only called from process context
 
 With interrupt threads, isn't process context a bit of a misnomer?
 Would it be acceptable for the interrupt threads that perform callout_stops
 to block waiting for them to take effect?
 
 --
 Justin

I have a patch which is very like what you said here. I set a flag between
commit and return, and blocking callout_stop() caller if 
it is not in softclock() context, the patch still has some problems, for
example lock oder reversal which is pointed out by jhb. but Archie Cobbs 
and others are talking about another resolution, I will set back.

David Xu


__
Do You Yahoo!?
Sign up for SBC Yahoo! Dial - First Month Free
http://sbc.yahoo.com

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Timeout and SMP race

2002-07-10 Thread John Baldwin


On 10-Jul-2002 Archie Cobbs wrote:
 John Baldwin writes:
  code would be modified to fit this new behaviour,  besides this, everywhere 
  callout_stop() is used need to hold sched_lock and do a mi_switch() and
  modify td_flags is also unacceptable, this SMP race should be resolved in 
  kern_timeout.c.
 
 How would you resolve it while still preserving the existing semantics?
 Saying this race should be resolved doesn't explain how you would go about
 resolving it.  It's a lot harder than it looks.
 
 I don't know if this is the same problem or a different problem, but FWIW..

It is the same problem.  What we do is change callout_stop() to let you know if
it actually stopped the timeout or not.  You then have to use your own locking
and synchronization in the timeout function and yourself to close the rest of
the race.

-- 

John Baldwin [EMAIL PROTECTED]http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Timeout and SMP race

2002-07-10 Thread Archie Cobbs

John Baldwin writes:
 It is the same problem. What we do is change callout_stop() to let you know if
 it actually stopped the timeout or not.  You then have to use your own locking
 and synchronization in the timeout function and yourself to close the rest of
 the race.

OK, thanks.

What do you think of the idea of letting the timer code (optionally)
handle all the locking and race conditions?

Even with callout_stop() returning 1 or 0, there still is *lots*
of complexity required on the client side, especially when the
object associated with the timer can be freed after stopping the
timer, and you can have the same timer stopped and then restarted
(which means that if you can't reliably stop the timer before
starting another one, you can get an early timeout due to a previously
not-stopped timer (which you have to check for (which is not trivial))).

I beg you to look at netgraph/ng_pptpgre.c for an example of the
gymnastics required. For example, you can't just use the object (in
this case a netgraph node) as the void * argument to the timeout
routine: you have to malloc() a separate structure containing just
a pointer to the object, and then in the object itself have a pointer
back to the malloc()'d structure.  This is necessary so you can
differentiate a real timer timeout from the previously stopped (but
not really stopped) timer timeout if the timer was then restarted.

In my experience, if the timer routines handle the locking life gets
MUCH simpler for everyone else.

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Timeout and SMP race

2002-07-10 Thread John Baldwin


On 10-Jul-2002 Archie Cobbs wrote:
 John Baldwin writes:
 It is the same problem. What we do is change callout_stop() to let you know if
 it actually stopped the timeout or not.  You then have to use your own locking
 and synchronization in the timeout function and yourself to close the rest of
 the race.
 
 OK, thanks.
 
 What do you think of the idea of letting the timer code (optionally)
 handle all the locking and race conditions?

I'm not sure it can in a clean fashion since of the few cases I've known
so far each client needs a customized solution.  I am open to ideas though.
I'm also open to some redesign of how callouts work to begin with (maybe
using other threads than the one running softclock() to actually execute
callout handlers, etc.).

-- 

John Baldwin [EMAIL PROTECTED]http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Timeout and SMP race

2002-07-10 Thread Archie Cobbs


[ NOTE: I'm moving this discussion to [EMAIL PROTECTED] ]

John Baldwin writes:
  What do you think of the idea of letting the timer code (optionally)
  handle all the locking and race conditions?
 
 I'm not sure it can in a clean fashion since of the few cases I've known
 so far each client needs a customized solution.  I am open to ideas though.
 I'm also open to some redesign of how callouts work to begin with (maybe
 using other threads than the one running softclock() to actually execute
 callout handlers, etc.).

FWIW, here is an API I've used before. This handles all race
conditions and the 'other thread' question.

struct timer_event; /* opaque structure */

typedef struct timer_event *timer_handle_t; /* caller's timer handle */

typedef void timer_func_t(void *arg);   /* timeout function type */

/* flags for timer_start() */
#define TIMER_RECURRING 0x0001  /* timer is recurring */
#define TIMER_OWN_THREAD0x0002  /* handle in separate thread */

extern int  timer_start(timer_handle_t *handlep, mutex_t *mutexp,
timer_func_t tfunc, void *arg, u_int delay,
int flags);
extern void timer_cancel(timer_handle_t *handlep);
extern int  timer_remaining(timer_handle_t handle, u_int *delayp);

static inline int
timer_isrunning(timer_handle_t handle)
{
return (handle != NULL);
}

Semantics:

  1. The caller supplies a pointer to the 'handle', which must initially
 be NULL. The handle != NULL if and only if the timer is running.
  2. timer_cancel() guarantees that tfunc() will not be called subsequently
  3. *handlep is set to NULL by timer_cancel() and by the timer expiring.
 So when *handlep is NULL when tfunc() is invoked (unless TIMER_RECURRING).
  4. Calling timer_start() or timer_stop() from within tfunc() is OK.
  5. If TIMER_RECURRING, timer started again before calling tfunc()
  6. If TIMER_OWN_THREAD, timer runs in a newly created thread (rather
 than the timer service thread), which means that tfunc() may sleep
 or be canceled. If tfunc() sleeps or the thread is canceled but
 TIMER_OWN_THREAD was not set - panic.
  7. If mutexp != NULL, *mutexp is acquired before calling tfunc() and
 released after it returns.

Items 1, and 2 are guaranteed only if mutexp != NULL and the caller
acquires *mutexp before any calls to timer_start() or timer_cancel()
(you would normally be doing this anyway).

Errors:

  - timer_start() returns EBUSY if *handlep != NULL
  - timer_remaining() returns ESRCH if handle != NULL

The model is: you have some object that has an associated lock and
one or more associated timers. The object is to be locked whenever
you muck with it (including when you start, stop, or handle a timer):

struct foobar {
struct lock mutex;
timer_handle_t  timer1;
timer_handle_t  timer2;
...
};

Then all calls to the timer_* routines are well behaved and the
timeout thread caling tfunc() never races with any other thread
that may be stopping or starting the timer, or destroying the object.
E.g., to destroy the object, the following suffices:

void
foobar_destroy(struct foobar *f)
{
mutex_lock(f-mutex);
timer_cancel(f-timer1);
timer_cancel(f-timer2);
mutex_unlock(f-mutex);
free(f);
}

The only remaining complexity for the caller is that if you have
any TIMER_OWN_THREAD handlers which unlock the object (e.g., in order
to go to sleep), then you need to reference count the object and
have a FOOBAR_INVALID flag.

If you are working under a different model then this API may not
be appropriate, but at least in my multi-threading experience this
model is very typical.

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Timeout and SMP race

2002-07-10 Thread Julian Elischer



On Wed, 10 Jul 2002, Archie Cobbs wrote:

 John Baldwin writes:
  It is the same problem. What we do is change callout_stop() to let you know if
  it actually stopped the timeout or not.  You then have to use your own locking
  and synchronization in the timeout function and yourself to close the rest of
  the race.
 
 OK, thanks.
 
 What do you think of the idea of letting the timer code (optionally)
 handle all the locking and race conditions?
 
 Even with callout_stop() returning 1 or 0, there still is *lots*
 of complexity required on the client side, especially when the
 object associated with the timer can be freed after stopping the
 timer, and you can have the same timer stopped and then restarted
 (which means that if you can't reliably stop the timer before
 starting another one, you can get an early timeout due to a previously
 not-stopped timer (which you have to check for (which is not trivial))).
 
 I beg you to look at netgraph/ng_pptpgre.c for an example of the
 gymnastics required. For example, you can't just use the object (in
 this case a netgraph node) as the void * argument to the timeout
 routine: you have to malloc() a separate structure containing just
 a pointer to the object, and then in the object itself have a pointer
 back to the malloc()'d structure.  This is necessary so you can
 differentiate a real timer timeout from the previously stopped (but
 not really stopped) timer timeout if the timer was then restarted.
 
 In my experience, if the timer routines handle the locking life gets
 MUCH simpler for everyone else.

I certainly agree that mayb ewe should look at supplying support in teh
timeout code for handling this race.. I've had my mind bent by it a few
too many times, and I'm starting to think There's got to be a better way

Archie.. can you pu tup a strawman proposal for the API chenge needed?


 
 -Archie
 
 __
 Archie Cobbs * Packet Design * http://www.packetdesign.com
 


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Timeout and SMP race

2002-07-09 Thread Archie Cobbs

John Baldwin writes:
  code would be modified to fit this new behaviour,  besides this, everywhere 
  callout_stop() is used need to hold sched_lock and do a mi_switch() and
  modify td_flags is also unacceptable, this SMP race should be resolved in 
  kern_timeout.c.
 
 How would you resolve it while still preserving the existing semantics?
 Saying this race should be resolved doesn't explain how you would go about
 resolving it.  It's a lot harder than it looks.

I don't know if this is the same problem or a different problem, but FWIW..

In other multi-threaded applications where a timer library is provided,
there's always the race condition between the timer firing in one thread
and the timer being stopped in the other thread.

The best solution I've come up with is to let the timer registration
routine take an optional 'lock *' argument that points to a lock
that should be acquired on behalf of the client code before invoking
the timeout handler, and released when that handler returns.

In addition, the client code acquires this same lock before any calls
to timer_start(), timer_stop(), timer_is_running(), etc. Presumably this
lock protects whatever object the timer is associated with.

This way, the timer is always definitely either running or not.

Some minor trickery is necessary in the timer code to avoid lock
reversal between the client lock and the timer code's own private
lock... but some complexity is unavoidable, and it's simpler and
less error-prone to consolidate it all in the timer library.

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Timeout and SMP race

2002-07-08 Thread David Xu

I want to set an flag bit CALLOUT_PROCESSING in callout.c_flags,
before softclock() releases callout_lock and start requesting
callout.c_func(), so callout_stop can find that callout is processing
by softclock and wait, after softclock processed the callout, it
resets the flag and wakeup callout_stop thread, of course, if
callout_stop is being called in softclock() thread, it should avoid
waiting, it is easy to detect.

David Xu

- Original Message - 
From: John Baldwin [EMAIL PROTECTED]
To: David Xu [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; Julian Elischer [EMAIL PROTECTED]
Sent: Tuesday, July 09, 2002 3:27 AM
Subject: RE: Timeout and SMP race


 
 On 04-Jul-2002 David Xu wrote:
  while we are getting rid of Giant,  current race condition between
softclock()
  and callout_stop() is unacceptable. the race causes two many places in
source
  code would be modified to fit this new behaviour,  besides this, everywhere

  callout_stop() is used need to hold sched_lock and do a mi_switch() and
  modify td_flags is also unacceptable, this SMP race should be resolved in 
  kern_timeout.c.
 
 How would you resolve it while still preserving the existing semantics?
 Saying this race should be resolved doesn't explain how you would go about
 resolving it.  It's a lot harder than it looks.
 
 -- 
 
 John Baldwin [EMAIL PROTECTED]http://www.FreeBSD.org/~jhb/
 Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/

__
Do You Yahoo!?
Sign up for SBC Yahoo! Dial - First Month Free
http://sbc.yahoo.com

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Timeout and SMP race

2002-07-08 Thread John Baldwin


On 08-Jul-2002 David Xu wrote:
 I want to set an flag bit CALLOUT_PROCESSING in callout.c_flags,
 before softclock() releases callout_lock and start requesting
 callout.c_func(), so callout_stop can find that callout is processing
 by softclock and wait, after softclock processed the callout, it
 resets the flag and wakeup callout_stop thread, of course, if
 callout_stop is being called in softclock() thread, it should avoid
 waiting, it is easy to detect.

This doesn't work.  The callout function can do a callout_reset() on
itself to reschedule itself.  Well, if all the various callout functions
check for the processing flag it might work.  Then you have to figure out
how to properly wait.  You can't use a cv as they don't work with spin
mutexes.  Hmmm, this is one of the times I wish we could wait on a cv with
a spin mutex.  You could do something evil where you had a cv and a mutex,
unlocked callout lock, locked mutex, locked callout lock, rechecked
condition, then blocked if needed.  Similar evilness required when doing
a wakeup as well.  We can cheat in the endtsleep() case because we know
what thread to wakeup as well.  We don't easily have that in the general
case unless we bloat struct callout a bit.

-- 

John Baldwin [EMAIL PROTECTED]http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Timeout and SMP race

2002-07-08 Thread David Xu

I'll work out a patch to see if my idea works.

David Xu

--- John Baldwin [EMAIL PROTECTED] wrote:
 
 On 08-Jul-2002 David Xu wrote:
  I want to set an flag bit CALLOUT_PROCESSING in callout.c_flags,
  before softclock() releases callout_lock and start requesting
  callout.c_func(), so callout_stop can find that callout is processing
  by softclock and wait, after softclock processed the callout, it
  resets the flag and wakeup callout_stop thread, of course, if
  callout_stop is being called in softclock() thread, it should avoid
  waiting, it is easy to detect.
 
 This doesn't work.  The callout function can do a callout_reset() on
 itself to reschedule itself.  Well, if all the various callout functions
 check for the processing flag it might work.  Then you have to figure out
 how to properly wait.  You can't use a cv as they don't work with spin
 mutexes.  Hmmm, this is one of the times I wish we could wait on a cv with
 a spin mutex.  You could do something evil where you had a cv and a mutex,
 unlocked callout lock, locked mutex, locked callout lock, rechecked
 condition, then blocked if needed.  Similar evilness required when doing
 a wakeup as well.  We can cheat in the endtsleep() case because we know
 what thread to wakeup as well.  We don't easily have that in the general
 case unless we bloat struct callout a bit.
 
 -- 
 
 John Baldwin [EMAIL PROTECTED]http://www.FreeBSD.org/~jhb/
 Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/


__
Do You Yahoo!?
Sign up for SBC Yahoo! Dial - First Month Free
http://sbc.yahoo.com

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Timeout and SMP race

2002-07-04 Thread Julian Elischer



On Thu, 4 Jul 2002, David Xu wrote:

 while we are getting rid of Giant,  current race condition between softclock()
 and callout_stop() is unacceptable. the race causes two many places in source
 code would be modified to fit this new behaviour,  besides this, everywhere 
 callout_stop() is used need to hold sched_lock and do a mi_switch() and
 modify td_flags is also unacceptable, this SMP race should be resolved in 
 kern_timeout.c.  
 
 David Xu

This is probably true..
the current hacks for this are rather horrible. I think there msut be
better ways. Your suggestion sounds plausible.



To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Timeout and SMP race

2002-07-04 Thread David Xu


- Original Message - 
From: Julian Elischer [EMAIL PROTECTED]
To: David Xu [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Thursday, July 04, 2002 4:36 PM
Subject: Re: Timeout and SMP race


 
 
 On Thu, 4 Jul 2002, David Xu wrote:
 
  while we are getting rid of Giant,  current race condition between softclock()
  and callout_stop() is unacceptable. the race causes two many places in source
  code would be modified to fit this new behaviour,  besides this, everywhere 
  callout_stop() is used need to hold sched_lock and do a mi_switch() and
  modify td_flags is also unacceptable, this SMP race should be resolved in 
  kern_timeout.c.  
  
  David Xu
 
 This is probably true..
 the current hacks for this are rather horrible. I think there msut be
 better ways. Your suggestion sounds plausible.
 
 

if another thread other than softclock itself is calling callout_stop(),
and callout_stop() detected that softclock is currently running the 
callout,  it should wait until softclock finishes the work, then return.

-David Xu



To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Timeout and SMP race

2002-07-04 Thread Bruce Evans

On Thu, 4 Jul 2002, David Xu wrote:

 - Original Message -
 From: Julian Elischer [EMAIL PROTECTED]
 To: David Xu [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Sent: Thursday, July 04, 2002 4:36 PM
 Subject: Re: Timeout and SMP race
 
  On Thu, 4 Jul 2002, David Xu wrote:
 
   while we are getting rid of Giant,  current race condition between softclock()
   and callout_stop() is unacceptable. the race causes two many places in source
   code would be modified to fit this new behaviour,  besides this, everywhere
   callout_stop() is used need to hold sched_lock and do a mi_switch() and
   modify td_flags is also unacceptable, this SMP race should be resolved in
   kern_timeout.c.
  
   David Xu
 
  This is probably true..
  the current hacks for this are rather horrible. I think there msut be
  better ways. Your suggestion sounds plausible.
 
 

 if another thread other than softclock itself is calling callout_stop(),
 and callout_stop() detected that softclock is currently running the
 callout,  it should wait until softclock finishes the work, then return.

softclock() intentionally releases callout_lock() to allow other processes
to manipulate callouts.  What is the race exactly?  Concurrent calls to
softclock() seem to be possible but seem to be handled correctly (internal
locking prevents problems).  Well, I can see one race in softclock():

%   c_func = c-c_func;
%   c_arg = c-c_arg;
%   c_flags = c-c_flags;

This caches some values, as is needed since the 'c' pointer may become
invalid after we release the lock ... but the things pointed to may become
invalid too.

%   c-c_func = NULL;
%   if (c-c_flags  CALLOUT_LOCAL_ALLOC) {
%   c-c_flags = CALLOUT_LOCAL_ALLOC;
%   SLIST_INSERT_HEAD(callfree, c,
%   c_links.sle);
%   } else
%   c-c_flags = ~CALLOUT_PENDING;
%   mtx_unlock_spin(callout_lock);

callout_stop() may stop 'c' here.  It won't do much, since we have already
set c-c_func to NULL, but its caller may want the callout actually stopped
so that it can do things like unloading the old c-c_func.

%   if ((c_flags  CALLOUT_MPSAFE) == 0)
%   mtx_lock(Giant);
%   c_func(c_arg);

This calls through a possibly-invalid function pointer.

%   if ((c_flags  CALLOUT_MPSAFE) == 0)
%   mtx_unlock(Giant);
%   mtx_lock_spin(callout_lock);

This seems to be an old bug.  In RELENG_4, splsoftclock() gives a more
global lock, but there is nothing to prevent callout_stop() being run
at splsoftclock().  In fact, it must be able to run when called nested
from inside softclock(), since it might be called from the handler.
Waiting in callout_stop() for softclock() to finish would deadlock in
this case.  It's interesting that this case is (always?) avoided in
untimeout() by not calling callout_stop() when c-c_func == NULL.

softclock() can't do anything about c-c_func going away after it is
called.  Clients must somehow avoid killing it.

I think c-c_func rarely goes away, and the race that you noticed is
lost more often.

Bruce


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Timeout and SMP race

2002-07-04 Thread David Xu

in RELENG_4, when one calls callout_stop() (not nested in softclock execute
path
, I am not talking about this case), after it returns, he can believe that the
callout is truely stopped, however in CURRENT, this assumption is false, now we

must care if callout_stop() truely stopped the callout when it returned, this 
is all difference I see, we bring in this race which not exists in RELENG_4, 
see what hacking code put in kern_condvar.c and kern_synch.c in CURRENT source,

this kind of problem is arising and knocking door.

sorry, our company's smtp server refuse to relay my mail from home, I must send
it from yahoo. :(

-David Xu

- Original Message - 
From: Bruce Evans [EMAIL PROTECTED]
To: David Xu [EMAIL PROTECTED]
Cc: Julian Elischer [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Thursday, July 04, 2002 7:02 PM
Subject: Re: Timeout and SMP race


 On Thu, 4 Jul 2002, David Xu wrote:
 
  - Original Message -
  From: Julian Elischer [EMAIL PROTECTED]
  To: David Xu [EMAIL PROTECTED]
  Cc: [EMAIL PROTECTED]
  Sent: Thursday, July 04, 2002 4:36 PM
  Subject: Re: Timeout and SMP race
  

 
  if another thread other than softclock itself is calling callout_stop(),
  and callout_stop() detected that softclock is currently running the
  callout,  it should wait until softclock finishes the work, then return.
 
 softclock() intentionally releases callout_lock() to allow other processes
 to manipulate callouts.  What is the race exactly?  Concurrent calls to
 softclock() seem to be possible but seem to be handled correctly (internal
 locking prevents problems).  Well, I can see one race in softclock():
 
 % c_func = c-c_func;
 % c_arg = c-c_arg;
 % c_flags = c-c_flags;
 
 This caches some values, as is needed since the 'c' pointer may become
 invalid after we release the lock ... but the things pointed to may become
 invalid too.
 
 % c-c_func = NULL;
 % if (c-c_flags  CALLOUT_LOCAL_ALLOC) {
 % c-c_flags = CALLOUT_LOCAL_ALLOC;
 % SLIST_INSERT_HEAD(callfree, c,
 % c_links.sle);
 % } else
 % c-c_flags = ~CALLOUT_PENDING;
 % mtx_unlock_spin(callout_lock);
 
 callout_stop() may stop 'c' here.  It won't do much, since we have already
 set c-c_func to NULL, but its caller may want the callout actually stopped
 so that it can do things like unloading the old c-c_func.
 
 % if ((c_flags  CALLOUT_MPSAFE) == 0)
 % mtx_lock(Giant);
 % c_func(c_arg);
 
 This calls through a possibly-invalid function pointer.
 
 % if ((c_flags  CALLOUT_MPSAFE) == 0)
 % mtx_unlock(Giant);
 % mtx_lock_spin(callout_lock);
 
 This seems to be an old bug.  In RELENG_4, splsoftclock() gives a more
 global lock, but there is nothing to prevent callout_stop() being run
 at splsoftclock().  In fact, it must be able to run when called nested
 from inside softclock(), since it might be called from the handler.
 Waiting in callout_stop() for softclock() to finish would deadlock in
 this case.  It's interesting that this case is (always?) avoided in
 untimeout() by not calling callout_stop() when c-c_func == NULL.
 
 softclock() can't do anything about c-c_func going away after it is
 called.  Clients must somehow avoid killing it.
 
 I think c-c_func rarely goes away, and the race that you noticed is
 lost more often.
 
 Bruce
 
 
 To Unsubscribe: send mail to [EMAIL PROTECTED]
 with unsubscribe freebsd-current in the body of the message

__
Do You Yahoo!?
Sign up for SBC Yahoo! Dial - First Month Free
http://sbc.yahoo.com

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Timeout and SMP race

2002-07-04 Thread Jonathan Lemon

In article 
local.mail.freebsd-current/[EMAIL PROTECTED] you 
write:
in RELENG_4, when one calls callout_stop() (not nested in softclock execute
path
, I am not talking about this case), after it returns, he can believe that the
callout is truely stopped, however in CURRENT, this assumption is false, now we

must care if callout_stop() truely stopped the callout when it returned, this 
is all difference I see, we bring in this race which not exists in RELENG_4, 
see what hacking code put in kern_condvar.c and kern_synch.c in CURRENT source,

I don't believe this is true.  There is a corresponding race in -stable,
where the spl() level is dropped before performing the callout, thus 
allowing either a call to callout_stop() or callout_reset() to come in
immediately before the callout is actually made.

The callout function is responsible for checking to see if it has lost
the race, and perform the appropriate action.  This is done with the 
CALLOUT_PENDING and CALLOUT_ACTIVE flags:

s = splnet();
if (callout_pending(tp-tt_delack) || !callout_active(tp-tt_delack)) {
splx(s);
return;
}
callout_deactivate(tp-tt_delack);

If 'CALLOUT_PENDING' is set, then we lost a race with callout_reset(),
and should not perform the callout.  If 'CALLOUT_ACTIVE' is clear, then
we lost a race with callout_stop().

Either way, on both -current and -stable, you cannot assume that the
timer callback is completely gone immediately after calling callout_stop().
--
Jonathan


- Original Message - 
From: Bruce Evans [EMAIL PROTECTED]
To: David Xu [EMAIL PROTECTED]
Cc: Julian Elischer [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Thursday, July 04, 2002 7:02 PM
Subject: Re: Timeout and SMP race


 On Thu, 4 Jul 2002, David Xu wrote:
 
  - Original Message -
  From: Julian Elischer [EMAIL PROTECTED]
  To: David Xu [EMAIL PROTECTED]
  Cc: [EMAIL PROTECTED]
  Sent: Thursday, July 04, 2002 4:36 PM
  Subject: Re: Timeout and SMP race
  

 
  if another thread other than softclock itself is calling callout_stop(),
  and callout_stop() detected that softclock is currently running the
  callout,  it should wait until softclock finishes the work, then return.
 
 softclock() intentionally releases callout_lock() to allow other processes
 to manipulate callouts.  What is the race exactly?  Concurrent calls to
 softclock() seem to be possible but seem to be handled correctly (internal
 locking prevents problems).  Well, I can see one race in softclock():
 
 % c_func = c-c_func;
 % c_arg = c-c_arg;
 % c_flags = c-c_flags;
 
 This caches some values, as is needed since the 'c' pointer may become
 invalid after we release the lock ... but the things pointed to may become
 invalid too.
 
 % c-c_func = NULL;
 % if (c-c_flags  CALLOUT_LOCAL_ALLOC) {
 % c-c_flags = CALLOUT_LOCAL_ALLOC;
 % SLIST_INSERT_HEAD(callfree, c,
 % c_links.sle);
 % } else
 % c-c_flags = ~CALLOUT_PENDING;
 % mtx_unlock_spin(callout_lock);
 
 callout_stop() may stop 'c' here.  It won't do much, since we have already
 set c-c_func to NULL, but its caller may want the callout actually stopped
 so that it can do things like unloading the old c-c_func.
 
 % if ((c_flags  CALLOUT_MPSAFE) == 0)
 % mtx_lock(Giant);
 % c_func(c_arg);
 
 This calls through a possibly-invalid function pointer.
 
 % if ((c_flags  CALLOUT_MPSAFE) == 0)
 % mtx_unlock(Giant);
 % mtx_lock_spin(callout_lock);
 
 This seems to be an old bug.  In RELENG_4, splsoftclock() gives a more
 global lock, but there is nothing to prevent callout_stop() being run
 at splsoftclock().  In fact, it must be able to run when called nested
 from inside softclock(), since it might be called from the handler.
 Waiting in callout_stop() for softclock() to finish would deadlock in
 this case.  It's interesting that this case is (always?) avoided in
 untimeout() by not calling callout_stop() when c-c_func == NULL.
 
 softclock() can't do anything about c-c_func going away after it is
 called.  Clients must somehow avoid killing it.
 
 I think c-c_func rarely goes away, and the race that you noticed is
 lost more often.
 
 Bruce
 
 
 To Unsubscribe: send mail to [EMAIL PROTECTED]
 with unsubscribe freebsd-current in the body of the message

__
Do You Yahoo!?
Sign up for SBC Yahoo! Dial - First Month Free
http://sbc.yahoo.com

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message




To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Timeout and SMP race

2002-07-04 Thread Bruce Evans

On Thu, 4 Jul 2002, Jonathan Lemon wrote:

 In article 
local.mail.freebsd-current/[EMAIL PROTECTED] you 
write:
 in RELENG_4, when one calls callout_stop() (not nested in softclock execute
 path
 , I am not talking about this case), after it returns, he can believe that the
 callout is truely stopped, however in CURRENT, this assumption is false, now we
 
 must care if callout_stop() truely stopped the callout when it returned, this
 is all difference I see, we bring in this race which not exists in RELENG_4,
 see what hacking code put in kern_condvar.c and kern_synch.c in CURRENT source,

 I don't believe this is true.  There is a corresponding race in -stable,
 where the spl() level is dropped before performing the callout, thus
 allowing either a call to callout_stop() or callout_reset() to come in
 immediately before the callout is actually made.

I think Giant locking everything prevents problems in RELENG_4, at least
when callout_stop() is called in process context (if it is called in
interrupt context then it could easily be interrupting a callout even in
the UP case).

The race window extends from when the ipl or lock is dropped across the
whole callout until the ipl or lock is regained. (The ipl is only dropped
to splstatclock(); this prevents interruption by timeouts but not by
other interrupts.  In -current there is nothing much to prevent softclock()
itself being called concurrently, but in theory softclock()'s internal
locking should prevent problems.)

 The callout function is responsible for checking to see if it has lost
 the race, and perform the appropriate action.  This is done with the
 CALLOUT_PENDING and CALLOUT_ACTIVE flags:


 s = splnet();
 if (callout_pending(tp-tt_delack) || !callout_active(tp-tt_delack)) {
 splx(s);
 return;
 }
 callout_deactivate(tp-tt_delack);

I think David is objecting to this complicating all callers that do the
check and breaking all that don't.  The callers in kern_synch.c and
kern_condvar.c have an mi_switch() and other complications to handle
this sinc they can't just return.

 If 'CALLOUT_PENDING' is set, then we lost a race with callout_reset(),
 and should not perform the callout.  If 'CALLOUT_ACTIVE' is clear, then
 we lost a race with callout_stop().

 Either way, on both -current and -stable, you cannot assume that the
 timer callback is completely gone immediately after calling callout_stop().

tsleep() seems to assume this in RELENG_4.

[Some context lost to top posting.]

Bruce


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Timeout and SMP race

2002-07-04 Thread Jonathan Lemon

On Fri, Jul 05, 2002 at 02:38:08AM +1000, Bruce Evans wrote:
 On Thu, 4 Jul 2002, Jonathan Lemon wrote:
 
  In article 
local.mail.freebsd-current/[EMAIL PROTECTED] you 
write:
  in RELENG_4, when one calls callout_stop() (not nested in softclock execute
  path
  , I am not talking about this case), after it returns, he can believe that the
  callout is truely stopped, however in CURRENT, this assumption is false, now we
  
  must care if callout_stop() truely stopped the callout when it returned, this
  is all difference I see, we bring in this race which not exists in RELENG_4,
  see what hacking code put in kern_condvar.c and kern_synch.c in CURRENT source,
 
  I don't believe this is true.  There is a corresponding race in -stable,
  where the spl() level is dropped before performing the callout, thus
  allowing either a call to callout_stop() or callout_reset() to come in
  immediately before the callout is actually made.
 
 I think Giant locking everything prevents problems in RELENG_4, at least
 when callout_stop() is called in process context (if it is called in
 interrupt context then it could easily be interrupting a callout even in
 the UP case).

In the network stack at least, callout_stop() is called in interrupt
context, so this case actually happens, and has to be handled.

 The race window extends from when the ipl or lock is dropped across the
 whole callout until the ipl or lock is regained. (The ipl is only dropped
 to splstatclock(); this prevents interruption by timeouts but not by
 other interrupts.  In -current there is nothing much to prevent softclock()
 itself being called concurrently, but in theory softclock()'s internal
 locking should prevent problems.)
 
  The callout function is responsible for checking to see if it has lost
  the race, and perform the appropriate action.  This is done with the
  CALLOUT_PENDING and CALLOUT_ACTIVE flags:
 
 
  s = splnet();
  if (callout_pending(tp-tt_delack) || !callout_active(tp-tt_delack)) {
  splx(s);
  return;
  }
  callout_deactivate(tp-tt_delack);
 
 I think David is objecting to this complicating all callers that do the
 check and breaking all that don't.  The callers in kern_synch.c and
 kern_condvar.c have an mi_switch() and other complications to handle
 this sinc they can't just return.

I believe I had this conversation with Justin Gibbs earlier; he told 
me that the callout consumers (network, cam) had to be aware of the 
race and handle this if it matters.  I don't particularly like complicating
the callout handlers as illustrated above, though, so if a better scheme
is possible, that would be nice.

I originally wanted something equivalent to an atomic spl downgrade
(splhigh - splnet), so the timeout code could obtain the target
ipl/lock that the callout handler wanted before dropping splhigh(), but
was told this would unnecessarily complicate things.


  If 'CALLOUT_PENDING' is set, then we lost a race with callout_reset(),
  and should not perform the callout.  If 'CALLOUT_ACTIVE' is clear, then
  we lost a race with callout_stop().
 
  Either way, on both -current and -stable, you cannot assume that the
  timer callback is completely gone immediately after calling callout_stop().
 
 tsleep() seems to assume this in RELENG_4.

tsleep() is only called from process context, and appears to be safe
because p-wchan has already been cleared by a previous call to unsleep().
The races only matter if you actually care about them.
-- 
Jonathan

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message