RE: select(2) converted to use a condition variable, and optimis

2001-05-15 Thread Seigo Tanimura

On Wed, 09 May 2001 19:20:07 +0900,
  Seigo Tanimura tanimura said:

Seigo On Tue, 08 May 2001 08:21:55 -0700 (PDT),
Seigo   John Baldwin [EMAIL PROTECTED] said:

John On 08-May-01 Seigo Tanimura wrote:

 Here is another issue. PROC_LOCK may block to acquire a process lock,
 during which an event of interest may occur or the remaining time of
 select(2)/poll(2) may run out. Thus if the remaining time runs out
 during locking a process, we should first rescan file descriptors to
 avoid missing an event, followed by returning the result.

John Hmmm.  Actually, can the nb_poll, ncp_poll, pollscan, or selscan functions call
John tsleep/msleep?  If they don't, then we are just better off holding proc lock
John while we call them rather than releasing it just to call that function and
John acquiring it later.  Then we don't have to work around the race you describe.

Seigo Poling functions called via fo_poll in a file descriptor should not
Seigo call msleep(9) in theory.

Seigo That does not, however, necessarily imply that we can scan file
Seigo descriptors with holding a process lock. Another process can release a
Seigo reference to a file descriptor via closef() during polling the
Seigo descriptor by calling its fo_poll. In this case, fdrop() subsequent to
Seigo the call of fo_poll may result the reference count of the descriptor
Seigo to be zero.

Another issue that turned out just now is lock order reversal in
selrecord(), where a process attempts to lock allproc sx while locking
the process itself. You might see something like this in your dmesg:

 lock order reversal
  1st 0xc7b9c31c process lock @ ../../kern/sys_generic.c:776
  2nd 0xc0438800 allproc @ ../../kern/kern_proc.c:146

As it is essential in this case to lock allproc and check whether
another process is waiting for an event, the only one option is to
back out the change of scanning file descriptors with holding a
process lock.

-- 
Seigo Tanimura [EMAIL PROTECTED] [EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



atomic operation of flags (was: RE: select(2) converted to use a condition variable, and optimis)

2001-05-14 Thread Seigo Tanimura

On Mon, 07 May 2001 12:37:22 -0700 (PDT),
  John Baldwin [EMAIL PROTECTED] said:

John You need the lock when clearing the bit in p_flag.  That is why the proc locks
John are there, so all those proc locks need to stay.  When you clear a bit, you are
John writing all the bits, so you need to ensure that you can atomically
John read/modify/write all the bits in p_flag, hence the need for the proc lock.

As we now have a set of atomic operation functions in
machine/atomic.h, why do we not use them to read, modify and write
p_flag atomically? Is that more expensive than protecting by PROC_LOCK
and PROC_UNLOCK?

-- 
Seigo Tanimura [EMAIL PROTECTED] [EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: select(2) converted to use a condition variable, and optimis

2001-05-11 Thread Seigo Tanimura

On Thu, 10 May 2001 09:06:15 +0900,
  Seigo Tanimura [EMAIL PROTECTED] said:

Seigo A quick and hopefully efficient solution to those problems is to
Seigo fhold() struct file's first, then enter polling loop. That seems much
Seigo cheaper than to work on free()ing a vnode or a socket with holding a
Seigo process lock, provided that struct filedesc and file are protected
Seigo properly (and we have to do it anyway).

That work is now in the patch at:

http://people.FreeBSD.org/~tanimura/patches/selectopt.diff

Another work in that patch is elimination of P_SELECT and nselcoll
check just prior to waiting for selwait. As we now scan file
descriptors with holding a process lock, race with selwakeup() should
never occur during polling.

Please note that netncp and netsmb are not in the scope the solution
discussed above because we can neither lock nor hold a reference to a
socket for now.

Since several issues regarding to the file descriptor layer have risen
up, I am planning to commit the patch soon and work on locking file
descriptors.

-- 
Seigo Tanimura [EMAIL PROTECTED] [EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: select(2) converted to use a condition variable, and optimis

2001-05-10 Thread Terry Lambert

Seigo Tanimura wrote:
 A quick and hopefully efficient solution to those problems is to
 fhold() struct file's first, then enter polling loop. That seems much
 cheaper than to work on free()ing a vnode or a socket with holding a
 process lock, provided that struct filedesc and file are protected
 properly (and we have to do it anyway).

Let me once again point out that fhold(), like crhold(),
should act as an l-valued function that takes an r-value
as an argument, as in:

a = crhold(b);
q = fhold(r);
etc. (for all such functions).

The reason is that you can not replace it with an instrumented
function, otherwise, which would permit you to catch subtle
errors that:

a = b;
crhold(a);
q = r;
fhold(r);

etc. may cause (yes, I know I have held the r-calue in one
case and the l-value in the other in the above example; the
kernel itself does this all over the place, and I think it's
an error).

I also think that they should use common base macros: it is
exceedingly dangerous to have multiple hold semantics in the
kernel.

-- Terry

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: select(2) converted to use a condition variable, and optimis

2001-05-10 Thread John Baldwin


On 10-May-01 Terry Lambert wrote:
 Seigo Tanimura wrote:
 A quick and hopefully efficient solution to those problems is to
 fhold() struct file's first, then enter polling loop. That seems much
 cheaper than to work on free()ing a vnode or a socket with holding a
 process lock, provided that struct filedesc and file are protected
 properly (and we have to do it anyway).
 
 Let me once again point out that fhold(), like crhold(),
 should act as an l-valued function that takes an r-value
 as an argument, as in:
 
   a = crhold(b);
   q = fhold(r);
 etc. (for all such functions).

 [ snip ]

I agree with all this, I just haven't had the time to sit down and do it.

 -- Terry

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



RE: select(2) converted to use a condition variable, and optimis

2001-05-09 Thread Seigo Tanimura

On Wed, 09 May 2001 19:20:07 +0900,
  Seigo Tanimura tanimura said:

Seigo That does not, however, necessarily imply that we can scan file
Seigo descriptors with holding a process lock. Another process can release a
Seigo reference to a file descriptor via closef() during polling the
Seigo descriptor by calling its fo_poll. In this case, fdrop() subsequent to
Seigo the call of fo_poll may result the reference count of the descriptor
Seigo to be zero.

Hang on, the process that owns a file descriptor should be fhold()ing
the descriptor. Since we never call closef() during waiting for an
event, that problem described above is no more than absurd fear.

-- 
Seigo Tanimura [EMAIL PROTECTED] [EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



RE: select(2) converted to use a condition variable, and optimis

2001-05-09 Thread Seigo Tanimura

On Tue, 08 May 2001 08:21:55 -0700 (PDT),
  John Baldwin [EMAIL PROTECTED] said:

John On 08-May-01 Seigo Tanimura wrote:

 Here is another issue. PROC_LOCK may block to acquire a process lock,
 during which an event of interest may occur or the remaining time of
 select(2)/poll(2) may run out. Thus if the remaining time runs out
 during locking a process, we should first rescan file descriptors to
 avoid missing an event, followed by returning the result.

John Hmmm.  Actually, can the nb_poll, ncp_poll, pollscan, or selscan functions call
John tsleep/msleep?  If they don't, then we are just better off holding proc lock
John while we call them rather than releasing it just to call that function and
John acquiring it later.  Then we don't have to work around the race you describe.

Poling functions called via fo_poll in a file descriptor should not
call msleep(9) in theory.

That does not, however, necessarily imply that we can scan file
descriptors with holding a process lock. Another process can release a
reference to a file descriptor via closef() during polling the
descriptor by calling its fo_poll. In this case, fdrop() subsequent to
the call of fo_poll may result the reference count of the descriptor
to be zero.

Now the problem is whether it is easy or difficult to free a file
descriptor with holding a process lock. At the level of the file
descriptor layer, we can convert the memory allocator of a file
descriptor from malloc(9) to the zone allocator, which locks only the
zone state for file descriptors by a mutex.

It is a crucial issue to release an object underlying a file
descriptor. Releasing a vnode can result in calling msleep(9) for
locking the vnode in vrele(). Releasing a socket may end up with
calling tsleep(9) for draining data if SO_LINGER is set. Hence we
cannot scan file descriptors with holding a process lock for now.


John Also, in the done_noproclock: case (might want to use 'unlock:' and 'done:'
John instead of 'done:' and 'done_noproclock:' for label names instead) are you
John always sure that when you go to that label you don't need to clear P_SELECT?

select(2) and poll(2) set P_SELECT below retry: and clear P_SELECT
upon occurrence of an event or timeout. There are no other places than
them for a process to set P_SELECT in its p_flag. Thus we can assume
at the entry of select(2) and poll(2) that P_SELECT is not set.

We go to done_noproclock if and only if either copyin() or sanity
check of arguments fails. As we do not poll any descriptors in this
case, it is not necessary to clear P_SELECT.

-- 
Seigo Tanimura [EMAIL PROTECTED] [EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



RE: select(2) converted to use a condition variable, and optimis

2001-05-09 Thread John Baldwin


On 09-May-01 Seigo Tanimura wrote:
 On Tue, 08 May 2001 08:21:55 -0700 (PDT),
   John Baldwin [EMAIL PROTECTED] said:
 
 John On 08-May-01 Seigo Tanimura wrote:
 
 Here is another issue. PROC_LOCK may block to acquire a process lock,
 during which an event of interest may occur or the remaining time of
 select(2)/poll(2) may run out. Thus if the remaining time runs out
 during locking a process, we should first rescan file descriptors to
 avoid missing an event, followed by returning the result.
 
 John Hmmm.  Actually, can the nb_poll, ncp_poll, pollscan, or selscan
 functions call
 John tsleep/msleep?  If they don't, then we are just better off holding proc
 lock
 John while we call them rather than releasing it just to call that function
 and
 John acquiring it later.  Then we don't have to work around the race you
 describe.
 
 Poling functions called via fo_poll in a file descriptor should not
 call msleep(9) in theory.

 Now the problem is whether it is easy or difficult to free a file
 descriptor with holding a process lock. At the level of the file
 descriptor layer, we can convert the memory allocator of a file
 descriptor from malloc(9) to the zone allocator, which locks only the
 zone state for file descriptors by a mutex.

Free'ing is not all that problematic since it doesn't sleep.  It is more
problematic atm because it can result in lock order reversals due to lockmgr
headaches.  Eventually this won't become an issue I hope.

 It is a crucial issue to release an object underlying a file
 descriptor. Releasing a vnode can result in calling msleep(9) for
 locking the vnode in vrele(). Releasing a socket may end up with
 calling tsleep(9) for draining data if SO_LINGER is set. Hence we
 cannot scan file descriptors with holding a process lock for now.

Argh, ok.  Eventually this issue may go away as well.

 John Also, in the done_noproclock: case (might want to use 'unlock:' and
 'done:'
 John instead of 'done:' and 'done_noproclock:' for label names instead) are
 you
 John always sure that when you go to that label you don't need to clear
 P_SELECT?
 
 select(2) and poll(2) set P_SELECT below retry: and clear P_SELECT
 upon occurrence of an event or timeout. There are no other places than
 them for a process to set P_SELECT in its p_flag. Thus we can assume
 at the entry of select(2) and poll(2) that P_SELECT is not set.
 
 We go to done_noproclock if and only if either copyin() or sanity
 check of arguments fails. As we do not poll any descriptors in this
 case, it is not necessary to clear P_SELECT.

Ok, sounds good.  Just wanted to make sure.

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: RE: select(2) converted to use a condition variable, and optimis

2001-05-09 Thread Matt Dillon


There are several issues here:

* The process's descriptor table
* The struct file's referenced by that descriptor table
* The object underlying a struct file.

A process's descriptor table is not protected by the proc lock, because
the descriptor table can be shared across processes that rfork().

The struct file underlying a descriptor has a reference count.  The
descriptor table reference to the underlying struct file is part of
this ref count.  Any process that fhold()'s a struct file pointer also
bumps this ref count.  However, holding a ref on the struct file does
not prevent another process from revoking the descriptor, so you aren't
as safe as you might think.  You are safe from it being closed out
from under you, but not safe otherwise.

Additionally, the act of *getting* a new ref count on a struct file is
not currently protected by anything but Giant.  For example, look
at kern/kern_descrip.c line 226 in -current:

if ((unsigned)uap-fd = fdp-fd_nfiles ||
(fp = fdp-fd_ofiles[uap-fd]) == NULL)
return (EBADF);
pop = fdp-fd_ofileflags[uap-fd];
...

fhold(fp);

A race can develope between the point where 'fp' is loaded, and the 
point where it is held by fhold().  Linux solves this problem by making
their fhold() equivalent take a descriptor number rather then a struct
file and doing all necessary locking internally.  

I did a bunch of work on the descriptor code to handle -stable SMP
races that were occuring due to a lack of fhold()s around blocking
conditions, but I never took the final step of encapsulating the
descriptor number lookup in the fhold() (I ran out of time).

For -current, you MUST do this if you want to move any 
descriptor-referencing code out from under Giant.  The process lock
is not sufficient... the descriptor table needs its own mutex.  It
is possible to optimize this out by checking to see if the descriptor
table is shared, but in general you need it.

-Matt


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: select(2) converted to use a condition variable, and optimis

2001-05-09 Thread Seigo Tanimura

On Wed, 9 May 2001 13:33:54 -0700 (PDT),
  Matt Dillon [EMAIL PROTECTED] said:

Matt * The process's descriptor table
Matt * The struct file's referenced by that descriptor table

Those are in my TODO list, and I have already started working on them.

-- 
Seigo Tanimura [EMAIL PROTECTED] [EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: select(2) converted to use a condition variable, and optimis

2001-05-09 Thread Seigo Tanimura

On Wed, 09 May 2001 09:21:09 -0700 (PDT),
  John Baldwin [EMAIL PROTECTED] said:

 Now the problem is whether it is easy or difficult to free a file
 descriptor with holding a process lock. At the level of the file
 descriptor layer, we can convert the memory allocator of a file
 descriptor from malloc(9) to the zone allocator, which locks only the
 zone state for file descriptors by a mutex.

John Free'ing is not all that problematic since it doesn't sleep.  It is more
John problematic atm because it can result in lock order reversals due to lockmgr
John headaches.  Eventually this won't become an issue I hope.

 It is a crucial issue to release an object underlying a file
 descriptor. Releasing a vnode can result in calling msleep(9) for
 locking the vnode in vrele(). Releasing a socket may end up with
 calling tsleep(9) for draining data if SO_LINGER is set. Hence we
 cannot scan file descriptors with holding a process lock for now.

John Argh, ok.  Eventually this issue may go away as well.

A quick and hopefully efficient solution to those problems is to
fhold() struct file's first, then enter polling loop. That seems much
cheaper than to work on free()ing a vnode or a socket with holding a
process lock, provided that struct filedesc and file are protected
properly (and we have to do it anyway).

-- 
Seigo Tanimura [EMAIL PROTECTED] [EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



RE: select(2) converted to use a condition variable, and optimis

2001-05-08 Thread Seigo Tanimura

On Mon, 07 May 2001 12:37:22 -0700 (PDT),
  John Baldwin [EMAIL PROTECTED] said:

John You need the lock when clearing the bit in p_flag.  That is why the proc locks
John are there, so all those proc locks need to stay.  When you clear a bit, you are
John writing all the bits, so you need to ensure that you can atomically
John read/modify/write all the bits in p_flag, hence the need for the proc lock.

John Please back out the changes to not lock the process around the p_flag change. 
John The rest of the patch looks ok, though I'd like to review the updated version
John before it is committed.  Thanks.

Process locks are now back.

Here is another issue. PROC_LOCK may block to acquire a process lock,
during which an event of interest may occur or the remaining time of
select(2)/poll(2) may run out. Thus if the remaining time runs out
during locking a process, we should first rescan file descriptors to
avoid missing an event, followed by returning the result.

Those changes are now in the updated patch at:

 http://people.FreeBSD.org/~tanimura/patches/selectopt.diff

-- 
Seigo Tanimura [EMAIL PROTECTED] [EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



RE: select(2) converted to use a condition variable, and optimis

2001-05-08 Thread John Baldwin


On 08-May-01 Seigo Tanimura wrote:
 On Mon, 07 May 2001 12:37:22 -0700 (PDT),
   John Baldwin [EMAIL PROTECTED] said:
 
 John You need the lock when clearing the bit in p_flag.  That is why the
 proc locks
 John are there, so all those proc locks need to stay.  When you clear a bit,
 you are
 John writing all the bits, so you need to ensure that you can atomically
 John read/modify/write all the bits in p_flag, hence the need for the proc
 lock.
 
 John Please back out the changes to not lock the process around the p_flag
 change. 
 John The rest of the patch looks ok, though I'd like to review the updated
 version
 John before it is committed.  Thanks.
 
 Process locks are now back.
 
 Here is another issue. PROC_LOCK may block to acquire a process lock,
 during which an event of interest may occur or the remaining time of
 select(2)/poll(2) may run out. Thus if the remaining time runs out
 during locking a process, we should first rescan file descriptors to
 avoid missing an event, followed by returning the result.
 
 Those changes are now in the updated patch at:
 
 http://people.FreeBSD.org/~tanimura/patches/selectopt.diff

Hmmm.  Actually, can the nb_poll, ncp_poll, pollscan, or selscan functions call
tsleep/msleep?  If they don't, then we are just better off holding proc lock
while we call them rather than releasing it just to call that function and
acquiring it later.  Then we don't have to work around the race you describe.

Also, in the done_noproclock: case (might want to use 'unlock:' and 'done:'
instead of 'done:' and 'done_noproclock:' for label names instead) are you
always sure that when you go to that label you don't need to clear P_SELECT?

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: select(2) converted to use a condition variable, and optimis

2001-05-07 Thread John Baldwin


On 06-May-01 Alfred Perlstein wrote:
 * Seigo Tanimura [EMAIL PROTECTED] [010506 04:40] wrote:
 
 http://people.FreeBSD.org/~tanimura/patches/selectopt.diff
 
 Please do not remove the spl calls, they serve as a useful guide
 for making finer grained locks as well as error checking the new
 locks.

Actually, in this case, the proc lock provides all the locking needed, so the
spl's can go I think.

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



RE: select(2) converted to use a condition variable, and optimis

2001-05-07 Thread John Baldwin


On 06-May-01 Seigo Tanimura wrote:
 As conversion of select(2) from msleep(9) to a condition variable is
 in the SMPng TODO list, I have done that task.
 
 Also, we do not have to lock a process in order to evaluate the result
 of {sel,poll}scan() and the remaining time of {select,poll}(2). It
 should be enough to do those pieces of work first, followed by locking
 a process and wait for a condition variable or selwakeup().

You need the lock when clearing the bit in p_flag.  That is why the proc locks
are there, so all those proc locks need to stay.  When you clear a bit, you are
writing all the bits, so you need to ensure that you can atomically
read/modify/write all the bits in p_flag, hence the need for the proc lock.

Please back out the changes to not lock the process around the p_flag change. 
The rest of the patch looks ok, though I'd like to review the updated version
before it is committed.  Thanks.

 Those changes are in the patch at:
 
 http://people.FreeBSD.org/~tanimura/patches/selectopt.diff

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message