RE: select(2) converted to use a condition variable, and optimis
On Wed, 09 May 2001 19:20:07 +0900, Seigo Tanimura said: Seigo> On Tue, 08 May 2001 08:21:55 -0700 (PDT), Seigo> John Baldwin <[EMAIL PROTECTED]> said: John> On 08-May-01 Seigo Tanimura wrote: >>> Here is another issue. PROC_LOCK may block to acquire a process lock, >>> during which an event of interest may occur or the remaining time of >>> select(2)/poll(2) may run out. Thus if the remaining time runs out >>> during locking a process, we should first rescan file descriptors to >>> avoid missing an event, followed by returning the result. John> Hmmm. Actually, can the nb_poll, ncp_poll, pollscan, or selscan functions call John> tsleep/msleep? If they don't, then we are just better off holding proc lock John> while we call them rather than releasing it just to call that function and John> acquiring it later. Then we don't have to work around the race you describe. Seigo> Poling functions called via fo_poll in a file descriptor should not Seigo> call msleep(9) in theory. Seigo> That does not, however, necessarily imply that we can scan file Seigo> descriptors with holding a process lock. Another process can release a Seigo> reference to a file descriptor via closef() during polling the Seigo> descriptor by calling its fo_poll. In this case, fdrop() subsequent to Seigo> the call of fo_poll may result the reference count of the descriptor Seigo> to be zero. Another issue that turned out just now is lock order reversal in selrecord(), where a process attempts to lock allproc sx while locking the process itself. You might see something like this in your dmesg: > lock order reversal > 1st 0xc7b9c31c process lock @ ../../kern/sys_generic.c:776 > 2nd 0xc0438800 allproc @ ../../kern/kern_proc.c:146 As it is essential in this case to lock allproc and check whether another process is waiting for an event, the only one option is to back out the change of scanning file descriptors with holding a process lock. -- Seigo Tanimura <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
atomic operation of flags (was: RE: select(2) converted to use a condition variable, and optimis)
On Mon, 07 May 2001 12:37:22 -0700 (PDT), John Baldwin <[EMAIL PROTECTED]> said: John> You need the lock when clearing the bit in p_flag. That is why the proc locks John> are there, so all those proc locks need to stay. When you clear a bit, you are John> writing all the bits, so you need to ensure that you can atomically John> read/modify/write all the bits in p_flag, hence the need for the proc lock. As we now have a set of atomic operation functions in machine/atomic.h, why do we not use them to read, modify and write p_flag atomically? Is that more expensive than protecting by PROC_LOCK and PROC_UNLOCK? -- Seigo Tanimura <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: select(2) converted to use a condition variable, and optimis
On Thu, 10 May 2001 09:06:15 +0900, Seigo Tanimura <[EMAIL PROTECTED]> said: Seigo> A quick and hopefully efficient solution to those problems is to Seigo> fhold() struct file's first, then enter polling loop. That seems much Seigo> cheaper than to work on free()ing a vnode or a socket with holding a Seigo> process lock, provided that struct filedesc and file are protected Seigo> properly (and we have to do it anyway). That work is now in the patch at: http://people.FreeBSD.org/~tanimura/patches/selectopt.diff Another work in that patch is elimination of P_SELECT and nselcoll check just prior to waiting for selwait. As we now scan file descriptors with holding a process lock, race with selwakeup() should never occur during polling. Please note that netncp and netsmb are not in the scope the solution discussed above because we can neither lock nor hold a reference to a socket for now. Since several issues regarding to the file descriptor layer have risen up, I am planning to commit the patch soon and work on locking file descriptors. -- Seigo Tanimura <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: select(2) converted to use a condition variable, and optimis
On 10-May-01 Terry Lambert wrote: > Seigo Tanimura wrote: >> A quick and hopefully efficient solution to those problems is to >> fhold() struct file's first, then enter polling loop. That seems much >> cheaper than to work on free()ing a vnode or a socket with holding a >> process lock, provided that struct filedesc and file are protected >> properly (and we have to do it anyway). > > Let me once again point out that fhold(), like crhold(), > should act as an l-valued function that takes an r-value > as an "argument", as in: > > a = crhold(b); > q = fhold(r); > etc. (for all such functions). > > [ snip ] I agree with all this, I just haven't had the time to sit down and do it. > -- Terry -- John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/ PGP Key: http://www.baldwin.cx/~john/pgpkey.asc "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: select(2) converted to use a condition variable, and optimis
Seigo Tanimura wrote: > A quick and hopefully efficient solution to those problems is to > fhold() struct file's first, then enter polling loop. That seems much > cheaper than to work on free()ing a vnode or a socket with holding a > process lock, provided that struct filedesc and file are protected > properly (and we have to do it anyway). Let me once again point out that fhold(), like crhold(), should act as an l-valued function that takes an r-value as an "argument", as in: a = crhold(b); q = fhold(r); etc. (for all such functions). The reason is that you can not replace it with an instrumented function, otherwise, which would permit you to catch subtle errors that: a = b; crhold(a); q = r; fhold(r); etc. may cause (yes, I know I have held the r-calue in one case and the l-value in the other in the above example; the kernel itself does this all over the place, and I think it's an error). I also think that they should use common base macros: it is exceedingly dangerous to have multiple hold semantics in the kernel. -- Terry To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: select(2) converted to use a condition variable, and optimis
On Wed, 09 May 2001 09:21:09 -0700 (PDT), John Baldwin <[EMAIL PROTECTED]> said: >> Now the problem is whether it is easy or difficult to free a file >> descriptor with holding a process lock. At the level of the file >> descriptor layer, we can convert the memory allocator of a file >> descriptor from malloc(9) to the zone allocator, which locks only the >> zone state for file descriptors by a mutex. John> Free'ing is not all that problematic since it doesn't sleep. It is more John> problematic atm because it can result in lock order reversals due to lockmgr John> headaches. Eventually this won't become an issue I hope. >> It is a crucial issue to release an object underlying a file >> descriptor. Releasing a vnode can result in calling msleep(9) for >> locking the vnode in vrele(). Releasing a socket may end up with >> calling tsleep(9) for draining data if SO_LINGER is set. Hence we >> cannot scan file descriptors with holding a process lock for now. John> Argh, ok. Eventually this issue may go away as well. A quick and hopefully efficient solution to those problems is to fhold() struct file's first, then enter polling loop. That seems much cheaper than to work on free()ing a vnode or a socket with holding a process lock, provided that struct filedesc and file are protected properly (and we have to do it anyway). -- Seigo Tanimura <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: select(2) converted to use a condition variable, and optimis
On Wed, 9 May 2001 13:33:54 -0700 (PDT), Matt Dillon <[EMAIL PROTECTED]> said: Matt> * The process's descriptor table Matt> * The struct file's referenced by that descriptor table Those are in my TODO list, and I have already started working on them. -- Seigo Tanimura <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: RE: select(2) converted to use a condition variable, and optimis
There are several issues here: * The process's descriptor table * The struct file's referenced by that descriptor table * The object underlying a struct file. A process's descriptor table is not protected by the proc lock, because the descriptor table can be shared across processes that rfork(). The struct file underlying a descriptor has a reference count. The descriptor table reference to the underlying struct file is part of this ref count. Any process that fhold()'s a struct file pointer also bumps this ref count. However, holding a ref on the struct file does not prevent another process from revoking the descriptor, so you aren't as safe as you might think. You are safe from it being closed out from under you, but not safe otherwise. Additionally, the act of *getting* a new ref count on a struct file is not currently protected by anything but Giant. For example, look at kern/kern_descrip.c line 226 in -current: if ((unsigned)uap->fd >= fdp->fd_nfiles || (fp = fdp->fd_ofiles[uap->fd]) == NULL) return (EBADF); pop = &fdp->fd_ofileflags[uap->fd]; ... fhold(fp); A race can develope between the point where 'fp' is loaded, and the point where it is held by fhold(). Linux solves this problem by making their fhold() equivalent take a descriptor number rather then a struct file and doing all necessary locking internally. I did a bunch of work on the descriptor code to handle -stable SMP races that were occuring due to a lack of fhold()s around blocking conditions, but I never took the final step of encapsulating the descriptor number lookup in the fhold() (I ran out of time). For -current, you MUST do this if you want to move any descriptor-referencing code out from under Giant. The process lock is not sufficient... the descriptor table needs its own mutex. It is possible to optimize this out by checking to see if the descriptor table is shared, but in general you need it. -Matt To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
RE: select(2) converted to use a condition variable, and optimis
On 09-May-01 Seigo Tanimura wrote: > On Tue, 08 May 2001 08:21:55 -0700 (PDT), > John Baldwin <[EMAIL PROTECTED]> said: > > John> On 08-May-01 Seigo Tanimura wrote: > >>> Here is another issue. PROC_LOCK may block to acquire a process lock, >>> during which an event of interest may occur or the remaining time of >>> select(2)/poll(2) may run out. Thus if the remaining time runs out >>> during locking a process, we should first rescan file descriptors to >>> avoid missing an event, followed by returning the result. > > John> Hmmm. Actually, can the nb_poll, ncp_poll, pollscan, or selscan > functions call > John> tsleep/msleep? If they don't, then we are just better off holding proc > lock > John> while we call them rather than releasing it just to call that function > and > John> acquiring it later. Then we don't have to work around the race you > describe. > > Poling functions called via fo_poll in a file descriptor should not > call msleep(9) in theory. > > Now the problem is whether it is easy or difficult to free a file > descriptor with holding a process lock. At the level of the file > descriptor layer, we can convert the memory allocator of a file > descriptor from malloc(9) to the zone allocator, which locks only the > zone state for file descriptors by a mutex. Free'ing is not all that problematic since it doesn't sleep. It is more problematic atm because it can result in lock order reversals due to lockmgr headaches. Eventually this won't become an issue I hope. > It is a crucial issue to release an object underlying a file > descriptor. Releasing a vnode can result in calling msleep(9) for > locking the vnode in vrele(). Releasing a socket may end up with > calling tsleep(9) for draining data if SO_LINGER is set. Hence we > cannot scan file descriptors with holding a process lock for now. Argh, ok. Eventually this issue may go away as well. > John> Also, in the done_noproclock: case (might want to use 'unlock:' and > 'done:' > John> instead of 'done:' and 'done_noproclock:' for label names instead) are > you > John> always sure that when you go to that label you don't need to clear > P_SELECT? > > select(2) and poll(2) set P_SELECT below retry: and clear P_SELECT > upon occurrence of an event or timeout. There are no other places than > them for a process to set P_SELECT in its p_flag. Thus we can assume > at the entry of select(2) and poll(2) that P_SELECT is not set. > > We go to done_noproclock if and only if either copyin() or sanity > check of arguments fails. As we do not poll any descriptors in this > case, it is not necessary to clear P_SELECT. Ok, sounds good. Just wanted to make sure. -- John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/ PGP Key: http://www.baldwin.cx/~john/pgpkey.asc "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
RE: select(2) converted to use a condition variable, and optimis
On Tue, 08 May 2001 08:21:55 -0700 (PDT), John Baldwin <[EMAIL PROTECTED]> said: John> On 08-May-01 Seigo Tanimura wrote: >> Here is another issue. PROC_LOCK may block to acquire a process lock, >> during which an event of interest may occur or the remaining time of >> select(2)/poll(2) may run out. Thus if the remaining time runs out >> during locking a process, we should first rescan file descriptors to >> avoid missing an event, followed by returning the result. John> Hmmm. Actually, can the nb_poll, ncp_poll, pollscan, or selscan functions call John> tsleep/msleep? If they don't, then we are just better off holding proc lock John> while we call them rather than releasing it just to call that function and John> acquiring it later. Then we don't have to work around the race you describe. Poling functions called via fo_poll in a file descriptor should not call msleep(9) in theory. That does not, however, necessarily imply that we can scan file descriptors with holding a process lock. Another process can release a reference to a file descriptor via closef() during polling the descriptor by calling its fo_poll. In this case, fdrop() subsequent to the call of fo_poll may result the reference count of the descriptor to be zero. Now the problem is whether it is easy or difficult to free a file descriptor with holding a process lock. At the level of the file descriptor layer, we can convert the memory allocator of a file descriptor from malloc(9) to the zone allocator, which locks only the zone state for file descriptors by a mutex. It is a crucial issue to release an object underlying a file descriptor. Releasing a vnode can result in calling msleep(9) for locking the vnode in vrele(). Releasing a socket may end up with calling tsleep(9) for draining data if SO_LINGER is set. Hence we cannot scan file descriptors with holding a process lock for now. John> Also, in the done_noproclock: case (might want to use 'unlock:' and 'done:' John> instead of 'done:' and 'done_noproclock:' for label names instead) are you John> always sure that when you go to that label you don't need to clear P_SELECT? select(2) and poll(2) set P_SELECT below retry: and clear P_SELECT upon occurrence of an event or timeout. There are no other places than them for a process to set P_SELECT in its p_flag. Thus we can assume at the entry of select(2) and poll(2) that P_SELECT is not set. We go to done_noproclock if and only if either copyin() or sanity check of arguments fails. As we do not poll any descriptors in this case, it is not necessary to clear P_SELECT. -- Seigo Tanimura <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
RE: select(2) converted to use a condition variable, and optimis
On Wed, 09 May 2001 19:20:07 +0900, Seigo Tanimura said: Seigo> That does not, however, necessarily imply that we can scan file Seigo> descriptors with holding a process lock. Another process can release a Seigo> reference to a file descriptor via closef() during polling the Seigo> descriptor by calling its fo_poll. In this case, fdrop() subsequent to Seigo> the call of fo_poll may result the reference count of the descriptor Seigo> to be zero. Hang on, the process that owns a file descriptor should be fhold()ing the descriptor. Since we never call closef() during waiting for an event, that problem described above is no more than absurd fear. -- Seigo Tanimura <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
RE: select(2) converted to use a condition variable, and optimis
On 08-May-01 Seigo Tanimura wrote: > On Mon, 07 May 2001 12:37:22 -0700 (PDT), > John Baldwin <[EMAIL PROTECTED]> said: > > John> You need the lock when clearing the bit in p_flag. That is why the > proc locks > John> are there, so all those proc locks need to stay. When you clear a bit, > you are > John> writing all the bits, so you need to ensure that you can atomically > John> read/modify/write all the bits in p_flag, hence the need for the proc > lock. > > John> Please back out the changes to not lock the process around the p_flag > change. > John> The rest of the patch looks ok, though I'd like to review the updated > version > John> before it is committed. Thanks. > > Process locks are now back. > > Here is another issue. PROC_LOCK may block to acquire a process lock, > during which an event of interest may occur or the remaining time of > select(2)/poll(2) may run out. Thus if the remaining time runs out > during locking a process, we should first rescan file descriptors to > avoid missing an event, followed by returning the result. > > Those changes are now in the updated patch at: > >>> http://people.FreeBSD.org/~tanimura/patches/selectopt.diff Hmmm. Actually, can the nb_poll, ncp_poll, pollscan, or selscan functions call tsleep/msleep? If they don't, then we are just better off holding proc lock while we call them rather than releasing it just to call that function and acquiring it later. Then we don't have to work around the race you describe. Also, in the done_noproclock: case (might want to use 'unlock:' and 'done:' instead of 'done:' and 'done_noproclock:' for label names instead) are you always sure that when you go to that label you don't need to clear P_SELECT? -- John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/ PGP Key: http://www.baldwin.cx/~john/pgpkey.asc "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
RE: select(2) converted to use a condition variable, and optimis
On Mon, 07 May 2001 12:37:22 -0700 (PDT), John Baldwin <[EMAIL PROTECTED]> said: John> You need the lock when clearing the bit in p_flag. That is why the proc locks John> are there, so all those proc locks need to stay. When you clear a bit, you are John> writing all the bits, so you need to ensure that you can atomically John> read/modify/write all the bits in p_flag, hence the need for the proc lock. John> Please back out the changes to not lock the process around the p_flag change. John> The rest of the patch looks ok, though I'd like to review the updated version John> before it is committed. Thanks. Process locks are now back. Here is another issue. PROC_LOCK may block to acquire a process lock, during which an event of interest may occur or the remaining time of select(2)/poll(2) may run out. Thus if the remaining time runs out during locking a process, we should first rescan file descriptors to avoid missing an event, followed by returning the result. Those changes are now in the updated patch at: >> http://people.FreeBSD.org/~tanimura/patches/selectopt.diff -- Seigo Tanimura <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
RE: select(2) converted to use a condition variable, and optimis
On 06-May-01 Seigo Tanimura wrote: > As conversion of select(2) from msleep(9) to a condition variable is > in the SMPng TODO list, I have done that task. > > Also, we do not have to lock a process in order to evaluate the result > of {sel,poll}scan() and the remaining time of {select,poll}(2). It > should be enough to do those pieces of work first, followed by locking > a process and wait for a condition variable or selwakeup(). You need the lock when clearing the bit in p_flag. That is why the proc locks are there, so all those proc locks need to stay. When you clear a bit, you are writing all the bits, so you need to ensure that you can atomically read/modify/write all the bits in p_flag, hence the need for the proc lock. Please back out the changes to not lock the process around the p_flag change. The rest of the patch looks ok, though I'd like to review the updated version before it is committed. Thanks. > Those changes are in the patch at: > > http://people.FreeBSD.org/~tanimura/patches/selectopt.diff -- John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/ PGP Key: http://www.baldwin.cx/~john/pgpkey.asc "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: select(2) converted to use a condition variable, and optimis
On 06-May-01 Alfred Perlstein wrote: > * Seigo Tanimura <[EMAIL PROTECTED]> [010506 04:40] wrote: >> >> http://people.FreeBSD.org/~tanimura/patches/selectopt.diff > > Please do not remove the spl calls, they serve as a useful guide > for making finer grained locks as well as error checking the new > locks. Actually, in this case, the proc lock provides all the locking needed, so the spl's can go I think. -- John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/ PGP Key: http://www.baldwin.cx/~john/pgpkey.asc "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message