Re: vm_zeropage priority problems.

2001-12-26 Thread Terry Lambert

Luigi Rizzo wrote:
 
 Don't know how interesting this can be, but i am writing
 (no plans to commit it, unless people find it interesting)
 some code to implement a weight-based instead of priority-based
 scheduler. The code is basically the WF2Q+ scheme which is
 already part of dummynet, adapted to processes.
 It is quite compact, and i think i can make it reasonably
 compatible with the old scheme, i.e. a sysctl var can be
 used to switch between one and the other with reasonably
 little overhead.
 
 This would help remove the ugly property that priority-based schedulers
 have, which is that one process can starve the rest of the system.

Look for QLINUX.  There's a nice paper on Weighted Fair Share
scheduling referenced on the page, as well as a couple of LRP
references.

Note that someone recently did a port of the LRP + Resource Containers
code to FreeBSD 4.4 (Terrible Rice University License, though).

-- Terry




Re: vm_zeropage priority problems.

2001-12-22 Thread Luigi Rizzo

On Sat, Dec 22, 2001 at 06:48:26PM +1100, Bruce Evans wrote:
 On Fri, 21 Dec 2001, Luigi Rizzo wrote:
...
  This would help remove the ugly property that priority-based schedulers
  have, which is that one process can starve the rest of the system.
 
 Only broken priority-based schedulers have that property.  One of
 my incomplete fixes uses weights:

which makes it weight-based, doesn't it :)

 Most of the changes here are to fix style bugs.  In the NEW_SCHED case,
 the relative weights for each priority are determined by the niceweights[]
 table.  kg->kg_estcpu is limited only by INT_MAX and priorities are
 assigned according to relative values of kg->kg_estcpu (code for this is
 not shown).

i guess the latter is the hard part... what kind of complexity does
it have?

The nice feature of the scheduling code used in dummynet/WF2Q+ is
that it has O(log N) complexity, where N is the number of active
flows (processes in this case), and the basic scheduling operation
is just one or two heap inserts/deletes, so it's really fast.
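For concreteness, a minimal user-space sketch of the idea (this is not
the dummynet code; all names are hypothetical, and it implements plain
virtual-finish-time WFQ, without the extra eligibility test that WF2Q+
adds):

%%%
#include <stdio.h>

#define MAXENT  64
#define QUANTUM 100     /* virtual-time cost of one slice at weight 1 */

struct sched_ent {
        const char *name;
        int     weight;         /* larger weight => larger CPU share */
        long    finish;         /* virtual finish time of current slice */
};

static struct sched_ent *heap[MAXENT];  /* min-heap keyed on finish */
static int nent;

static void
heap_push(struct sched_ent *e)
{
        int i = nent++;

        /* Sift up: parent must have the smaller finish time. */
        while (i > 0 && heap[(i - 1) / 2]->finish > e->finish) {
                heap[i] = heap[(i - 1) / 2];
                i = (i - 1) / 2;
        }
        heap[i] = e;
}

static struct sched_ent *
heap_pop(void)
{
        struct sched_ent *top = heap[0], *last = heap[--nent];
        int i = 0, c;

        /* Sift the former last element down from the root. */
        while ((c = 2 * i + 1) < nent) {
                if (c + 1 < nent && heap[c + 1]->finish < heap[c]->finish)
                        c++;
                if (last->finish <= heap[c]->finish)
                        break;
                heap[i] = heap[c];
                i = c;
        }
        if (nent > 0)
                heap[i] = last;
        return (top);
}

int
main(void)
{
        struct sched_ent a = { "hog", 1, 0 }, b = { "nice", 4, 0 };
        int i;

        heap_push(&a);
        heap_push(&b);
        for (i = 0; i < 10; i++) {
                struct sched_ent *e = heap_pop();

                printf("run %s\n", e->name);
                e->finish += QUANTUM / e->weight;  /* charge virtual time */
                heap_push(e);
        }
        return (0);
}
%%%

Each pick is one heap pop plus one push, so the cost per context switch
stays O(log N) however skewed the weights are, and the weight-1 entity
still gets its 1/5 share instead of starving.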

cheers
luigi




Re: vm_zeropage priority problems.

2001-12-22 Thread Jake Burkholder

Apparently, On Sat, Dec 22, 2001 at 06:48:26PM +1100,
Bruce Evans said words to the effect of;

 On Fri, 21 Dec 2001, Luigi Rizzo wrote:
 
  Don't know how interesting this can be, but i am writing
  (no plans to commit it, unless people find it interesting)
  some code to implement a weight-based instead of priority-based
  scheduler. The code is basically the WF2Q+ scheme which is
  already part of dummynet, adapted to processes.
  It is quite compact, and i think i can make it reasonably
  compatible with the old scheme, i.e. a sysctl var can be
  used to switch between one and the other with reasonably
  little overhead.
 
  This would help remove the ugly property that priority-based schedulers
  have, which is that one process can starve the rest of the system.
 
 Only broken priority-based schedulers have that property.  One of
 my incomplete fixes uses weights:
 
 Index: kern_synch.c
 ===
 RCS file: /home/ncvs/src/sys/kern/kern_synch.c,v
 retrieving revision 1.167
 diff -u -2 -r1.167 kern_synch.c
 --- kern_synch.c    18 Dec 2001 00:27:17 -0000    1.167
 +++ kern_synch.c    19 Dec 2001 16:01:26 -0000
 @@ -936,18 +1058,18 @@
         struct thread *td;
  {
 -        struct kse *ke = td->td_kse;
 -        struct ksegrp *kg = td->td_ksegrp;
 +        struct ksegrp *kg;
 
 -        if (td) {
 -                ke->ke_cpticks++;
 -                kg->kg_estcpu = ESTCPULIM(kg->kg_estcpu + 1);
 -                if ((kg->kg_estcpu % INVERSE_ESTCPU_WEIGHT) == 0) {
 -                        resetpriority(td->td_ksegrp);
 -                        if (kg->kg_pri.pri_level >= PUSER)
 -                                kg->kg_pri.pri_level = kg->kg_pri.pri_user;
 -                }
 -        } else {
 +        if (td == NULL)
                 panic("schedclock");
 -        }
 +        td->td_kse->ke_cpticks++;
 +        kg = td->td_ksegrp;
 +#ifdef NEW_SCHED
 +        kg->kg_estcpu += niceweights[kg->kg_nice - PRIO_MIN];
 +#else
 +        kg->kg_estcpu++;
 +#endif
 +        resetpriority(kg);
 +        if (kg->kg_pri.pri_level >= PUSER)
 +                kg->kg_pri.pri_level = kg->kg_pri.pri_user;
  }

I'm curious why you removed the ESTCPULIM and INVERSE_ESTCPU_WEIGHT
calculations even in the OLD_SCHED case.  Do these turn out to have
no effect in general?

 
 Most of the changes here are to fix style bugs.  In the NEW_SCHED case,
 the relative weights for each priority are determined by the niceweights[]
 table.  kg->kg_estcpu is limited only by INT_MAX and priorities are
 assigned according to relative values of kg->kg_estcpu (code for this is
 not shown).  The NEW_SCHED case has not been tried since before SMPng
 broke scheduling some more by compressing the priority ranges.

It is relatively easy to uncompress the priority ranges if that is
desirable.  What range is best?

Jake




Re: vm_zeropage priority problems.

2001-12-22 Thread Bruce Evans

On Fri, 21 Dec 2001, John Baldwin wrote:

 On 21-Dec-01 Bruce Evans wrote:
  On Fri, 21 Dec 2001, Luigi Rizzo wrote:
  the original priority should be somewhere and accessible,
  either directly or through some function. Otherwise how
  do we know what to pass to tsleep() ?
 
  It's whatever the thread set itself.  There is no good way of setting
  this either (vm_pagezero() and poll_idle() hack it into
  td->td_ksegrp->kg_pri).  Userland would use rtprio(2) instead.
  Unfortunately, this gives priorities in different units than the ones
  for tsleep().

 pri_level is the current priority of the thread.  The actual priority level is
 going to move back into the thread and out of the KSE group so that tsleep and
 priority propagation work properly, but pri_native, pri_user, and nice will
 stay in the KSE group.  The normal priorities for tsleep() are just a subset

This will make encapsulating priority stuff in a struct more obviously wrong.

 of the priorities available to a thread.  Thus, they are using the same unit,
 but perhaps a wider range.

They are offset by PRI_MIN_IDLE too, due to vestiges of the rtprio()
misimplementation.

Bruce





Re: vm_zeropage priority problems.

2001-12-22 Thread Bruce Evans

On Sat, 22 Dec 2001, Luigi Rizzo wrote:

 On Sat, Dec 22, 2001 at 06:48:26PM +1100, Bruce Evans wrote:
  Most of the changes here are to fix style bugs.  In the NEW_SCHED case,
  the relative weights for each priority are determined by the niceweights[]
  table.  kg->kg_estcpu is limited only by INT_MAX and priorities are
  assigned according to relative values of kg->kg_estcpu (code for this is
  not shown).

 i guess the latter is the hard part... what kind of complexity does
 it have?

Not too bad.  I use an extra loop in schedcpu() to find the current maximum
of all kg->kg_estcpu, and convert the division by this maximum (for scaling
individual kg->kg_estcpu's) to a multiplication and a shift.  This can
probably be done better in loadav().
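For illustration, the conversion looks something like this (a user-space
sketch; estcpumax, estcpumul and ESTCPU_SHIFT appear in the resetpriority()
patch elsewhere in this thread, but the priority bounds and the rescale
helper here are illustrative assumptions):

%%%
#include <stdint.h>
#include <stdio.h>

#define ESTCPU_SHIFT            20      /* fixed-point fraction bits */
#define PUSER                   160     /* illustrative priority bounds */
#define PRI_MAX_TIMESHARE       223

static unsigned int estcpumax, estcpumul;

/*
 * Once per schedcpu() pass: fixed-point reciprocal of the current
 * maximum of all kg_estcpu values (one division per pass).
 */
static void
estcpu_rescale(unsigned int maxseen)
{
        estcpumax = maxseen != 0 ? maxseen : 1;
        estcpumul = ((uint64_t)(PRI_MAX_TIMESHARE - PUSER) << ESTCPU_SHIFT) /
            estcpumax;
}

/*
 * Per process: scale estcpu into the priority range with one multiply
 * and one shift, no division.
 */
static unsigned int
estcpu_to_pri(unsigned int estcpu)
{
        if (estcpu > estcpumax)
                estcpu = estcpumax;
        return (PUSER + (unsigned int)(((uint64_t)estcpu * estcpumul) >>
            ESTCPU_SHIFT));
}

int
main(void)
{
        estcpu_rescale(10000);
        printf("%u %u %u\n", estcpu_to_pri(0), estcpu_to_pri(5000),
            estcpu_to_pri(10000));      /* prints: 160 191 222 */
        return (0);
}
%%%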

Bruce





Re: vm_zeropage priority problems.

2001-12-22 Thread Bruce Evans

On Sat, 22 Dec 2001, Jake Burkholder wrote:

 Apparently, On Sat, Dec 22, 2001 at 06:48:26PM +1100,
   Bruce Evans said words to the effect of;
  Index: kern_synch.c
  ===
  RCS file: /home/ncvs/src/sys/kern/kern_synch.c,v
  retrieving revision 1.167
  diff -u -2 -r1.167 kern_synch.c
  --- kern_synch.c    18 Dec 2001 00:27:17 -0000    1.167
  +++ kern_synch.c    19 Dec 2001 16:01:26 -0000
  @@ -936,18 +1058,18 @@
          struct thread *td;
   {
  -        struct kse *ke = td->td_kse;
  -        struct ksegrp *kg = td->td_ksegrp;
  +        struct ksegrp *kg;
 
  -        if (td) {
  -                ke->ke_cpticks++;
  -                kg->kg_estcpu = ESTCPULIM(kg->kg_estcpu + 1);
  -                if ((kg->kg_estcpu % INVERSE_ESTCPU_WEIGHT) == 0) {
  -                        resetpriority(td->td_ksegrp);
  -                        if (kg->kg_pri.pri_level >= PUSER)
  -                                kg->kg_pri.pri_level = kg->kg_pri.pri_user;
  -                }
  -        } else {
  +        if (td == NULL)
                  panic("schedclock");
  -        }
  +        td->td_kse->ke_cpticks++;
  +        kg = td->td_ksegrp;
  +#ifdef NEW_SCHED
  +        kg->kg_estcpu += niceweights[kg->kg_nice - PRIO_MIN];
  +#else
  +        kg->kg_estcpu++;
  +#endif
  +        resetpriority(kg);
  +        if (kg->kg_pri.pri_level >= PUSER)
  +                kg->kg_pri.pri_level = kg->kg_pri.pri_user;
   }

 I'm curious why you removed the ESTCPULIM and INVERSE_ESTCPU_WEIGHT
 calculations even in the OLD_SCHED case.  Do these turn out to have
 no effect in general?

ESTCPULIM basically breaks scheduling if it is hit (clipping to it
prevents accumulation of hog points that would cause cpu hogs to be
run less).  This is a problem in practice.  I use dynamic limits even
in the !NEW_SCHED case.  I forgot that I did this or I would have included
more context to show it (see below).  kg->kg_estcpu is allowed to grow
without explicit limit and scaled to fit in the priority range.  This
requires fixing sorcerer's-apprentice growth of kg_estcpu in fork()
and exit().  kg_estcpu has natural limits but they are quite large
(a constant multiple of the load average).
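To see why, assume the classic 4.4BSD decay filter in schedcpu() and let
T be the ticks a cpu hog gains per decay period:

\[
e \;\leftarrow\; \frac{2\,\mathrm{load}}{2\,\mathrm{load}+1}\,e + T
\qquad\Longrightarrow\qquad
e^{*} = \frac{T}{1 - \frac{2\,\mathrm{load}}{2\,\mathrm{load}+1}}
      = T\,(2\,\mathrm{load}+1),
\]

so the steady-state kg_estcpu of a hog is a constant multiple of the load
average, as claimed.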

INVERSE_ESTCPU_WEIGHT is not used because it goes with static scaling,
and the % INVERSE_ESTCPU_WEIGHT optimization (which depends on the internals
of resetpriority()) is not so easy to do.

Here are the corresponding changes for resetpriority():

%%%
Index: kern_synch.c
===
RCS file: /home/ncvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.167
diff -u -2 -r1.167 kern_synch.c
--- kern_synch.c    18 Dec 2001 00:27:17 -0000    1.167
+++ kern_synch.c    22 Dec 2001 07:34:15 -0000
@@ -844,15 +949,32 @@
         register struct ksegrp *kg;
 {
+        u_int estcpu;
         register unsigned int newpriority;
 
         mtx_lock_spin(&sched_lock);
         if (kg->kg_pri.pri_class == PRI_TIMESHARE) {
-                newpriority = PUSER + kg->kg_estcpu / INVERSE_ESTCPU_WEIGHT +
+                estcpu = kg->kg_estcpu;
+                if (estcpu > estcpumax)
+                        estcpu = estcpumax;
+#ifdef NEW_SCHED
+                newpriority = PUSER +
+                    (((u_int64_t)estcpu * estcpumul) >> ESTCPU_SHIFT);
+#else
+                newpriority = PUSER +
+                    (((u_int64_t)estcpu * estcpumul) >> ESTCPU_SHIFT) +
                     NICE_WEIGHT * (kg->kg_nice - PRIO_MIN);
-                newpriority = min(max(newpriority, PRI_MIN_TIMESHARE),
-                    PRI_MAX_TIMESHARE);
+#endif
+                if (newpriority < PUSER)
+                        newpriority = PUSER;
+                if (newpriority > PRI_MAX_TIMESHARE) {
+                        Debugger("newpriority botch");
+                        newpriority = PRI_MAX_TIMESHARE;
+                }
                 kg->kg_pri.pri_user = newpriority;
-        }
-        maybe_resched(kg);
+                maybe_resched(kg, newpriority);
+        } else
+                /* XXX doing anything here is dubious. */
+                /* XXX was: need_resched(). */
+                maybe_resched(kg, kg->kg_pri.pri_user);
         mtx_unlock_spin(&sched_lock);
 }
%%%

  Most of the changes here are to fix style bugs.  In the NEW_SCHED case,
  the relative weights for each priority are determined by the niceweights[]
  table.  kg->kg_estcpu is limited only by INT_MAX and priorities are
  assigned according to relative values of kg->kg_estcpu (code for this is
  not shown).  The NEW_SCHED case has not been tried since before SMPng
  broke scheduling some more by compressing the priority ranges.

 It is relatively easy to uncompress the priority ranges if that is
 desirable.  What range is best?

The original algorithm works best with something close to the old range
of 50-127 (PUSER = 50, MAXPRI = 127) for positively niced processes alone.
This gives unniced processes a priority range of 50-127 and permits
nice -20'ed processes to have a much larger (numerically) base priority
than unniced ones while still allowing 

Re: vm_zeropage priority problems.

2001-12-21 Thread Bruce Evans

On Thu, 20 Dec 2001, John Baldwin wrote:

 On 20-Dec-01 Luigi Rizzo wrote:
  On Thu, Dec 20, 2001 at 12:16:03PM -0800, John Baldwin wrote:
  However, kthreads should tsleep() with their current priority, not PPAUSE.
 
  current meaning pri_level or pri_native ? What if one tries to
  tsleep() while holding a lock and so its pri_level is raised ?

 pri_level.  Calling tsleep() while holding a lock is a bug though. :)  Unless
 you are calling msleep() with a lock that will be released.

  In the device polling code i did a tsleep on the original pri_level,
  but maybe pri_native is good enough.

 pri_level is more correct.

I think pri_native is just an implementation detail which shouldn't
be used or visible to threads.  It is used by the priority propagation
mechanism to hold the original pri_level.  Threads should just use their
original priority (or a different one if they want to temporarily change
their priority).  Even pri_level probably shouldn't be used or visible
to threads.

Bruce





Re: vm_zeropage priority problems.

2001-12-21 Thread Luigi Rizzo

On Sat, Dec 22, 2001 at 12:46:40AM +1100, Bruce Evans wrote:
 I think pri_native is just an implementation detail which shouldn't
 be used or visible to threads.  It is used by the priority propagation
 mechanism to hold the original pri_level.  Threads should just use their
 original priority (or a different one if they want to temporarily change
 their priority).  Even pri_level probably shouldn't be used or visible
 to threads.

the original priority should be somewhere and accessible,
either directly or through some function. Otherwise how
do we know what to pass to tsleep() ?

In any case I wonder if this is a bug new in -current; -stable
uses three separate data structures for realtime, user and idle tasks
so even specifying the wrong priority in tsleep should not cause
crossing classes there. -current has only one array, hence the
chance of doing the wrong thing.

cheers
luigi




Re: vm_zeropage priority problems.

2001-12-21 Thread Bruce Evans

On Fri, 21 Dec 2001, Luigi Rizzo wrote:

 On Sat, Dec 22, 2001 at 12:46:40AM +1100, Bruce Evans wrote:
  I think pri_native is just an implementation detail which shouldn't
  be used or visible to threads.  It is used by the priority propagation
  mechanism to hold the original pri_level.  Threads should just use their
  original priority (or a different one if they want to temporarily change
  their priority).  Even pri_level probably shouldn't be used or visible
  to threads.

 the original priority should be somewhere and accessible,
 either directly or through some function. Otherwise how
 do we know what to pass to tsleep() ?

It's whatever the thread set itself.  There is no good way of setting
this either (vm_pagezero() and poll_idle() hack it into
td->td_ksegrp->kg_pri).  Userland would use rtprio(2) instead.
Unfortunately, this gives priorities in different units than the ones
for tsleep().

 In any case I wonder if this is a bug new in -current; -stable
 uses three separate data structures for realtime, user and idle tasks
 so even specifying the wrong priority in tsleep should not cause
 crossing classes there. -current has only one array, hence the
 chance of doing the wrong thing.

The 3 classes are a design bug in -stable.  Crossing classes is sometimes
right and 3 classes mainly make it harder and force triplication of code.

Bruce





Re: vm_zeropage priority problems.

2001-12-21 Thread John Baldwin


On 21-Dec-01 Bruce Evans wrote:
 On Fri, 21 Dec 2001, Luigi Rizzo wrote:
 
 On Sat, Dec 22, 2001 at 12:46:40AM +1100, Bruce Evans wrote:
  I think pri_native is just an implementation detail which shouldn't
  be used or visible to threads.  It is used by the priority propagation
  mechanism to hold the original pri_level.  Threads should just use their
  original priority (or a different one if they want to temporarily change
  their priority).  Even pri_level probably shouldn't be used or visible
  to threads.

 the original priority should be somewhere and accessible,
 either directly or through some function. Otherwise how
 do we know what to pass to tsleep() ?
 
 It's whatever the thread set itself.  There is no good way of setting
 this either (vm_pagezero() and poll_idle() hack it into
 td->td_ksegrp->kg_pri).  Userland would use rtprio(2) instead.
 Unfortunately, this gives priorities in different units than the ones
 for tsleep().

pri_level is the current priority of the thread.  The actual priority level is
going to move back into the thread and out of the KSE group so that tsleep and
priority propagation work properly, but pri_native, pri_user, and nice will
stay in the KSE group.  The normal priorities for tsleep() are just a subset
of the priorities available to a thread.  Thus, they are using the same unit,
but perhaps a wider range.

 In any case I wonder if this is a bug new in -current; -stable
 uses three separate data structures for realtime, user and idle tasks
 so even specifying the wrong priority in tsleep should not cause
 crossing classes there. -current has only one array, hence the
 chance of doing the wrong thing.
 
 The 3 classes are a design bug in -stable.  Crossing classes is sometimes
 right and 3 classes mainly make it harder and force triplication of code.

Agreed.  In current we basically have one large priority array.  At the top
(low priorities) are realtime kernel priorities including interrupt thread
priorities.  Below those are other top-half kernel priorities including the
normal sleep priorities.  Below that are real-time user priorities, followed by
time sharing user priorities, and finally idle user priorities.

Possibly real-time user priorities should move up into the real-time kernel
range, and it should be noted that the idle priorities are for idle kernel
threads, not just user threads (so sys/priority.h may need a bit of updating).

This is a simpler model, especially when you consider a thread of one priority
class blocking on a lock held by a thread of a lower priority class: you only
have to update pri_level for priority propagation rather than comparing classes
and having to manage that extra gunk.
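For illustration, with one linear priority space the propagation step is
just a numeric comparison (a minimal sketch with hypothetical names;
numerically lower = higher priority):

%%%
/* Hypothetical, minimal model of the single-array scheme. */
struct thread {
        unsigned char td_pri_level;     /* one linear 0..255 scale */
};

/*
 * Lend a blocked waiter's priority to the lock owner: one numeric
 * comparison, no class checks or cross-class queue juggling.
 */
static void
propagate_priority(struct thread *owner, unsigned char waiter_pri)
{
        if (waiter_pri < owner->td_pri_level)
                owner->td_pri_level = waiter_pri;
}
%%%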

-- 

John Baldwin [EMAIL PROTECTED]http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/




Re: vm_zeropage priority problems.

2001-12-21 Thread Bruce Evans

On Fri, 21 Dec 2001, Luigi Rizzo wrote:

 Don't know how interesting this can be, but i am writing
 (no plans to commit it, unless people find it interesting)
 some code to implement a weight-based instead of priority-based
 scheduler. The code is basically the WF2Q+ scheme which is
 already part of dummynet, adapted to processes.
 It is quite compact, and i think i can make it reasonably
 compatible with the old scheme, i.e. a sysctl var can be
 used to switch between one and the other with reasonably
 little overhead.

 This would help remove the ugly property that priority-based schedulers
 have, which is that one process can starve the rest of the system.

Only broken priority-based schedulers have that property.  One of
my incomplete fixes uses weights:

Index: kern_synch.c
===
RCS file: /home/ncvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.167
diff -u -2 -r1.167 kern_synch.c
--- kern_synch.c    18 Dec 2001 00:27:17 -0000    1.167
+++ kern_synch.c    19 Dec 2001 16:01:26 -0000
@@ -936,18 +1058,18 @@
         struct thread *td;
 {
-        struct kse *ke = td->td_kse;
-        struct ksegrp *kg = td->td_ksegrp;
+        struct ksegrp *kg;

-        if (td) {
-                ke->ke_cpticks++;
-                kg->kg_estcpu = ESTCPULIM(kg->kg_estcpu + 1);
-                if ((kg->kg_estcpu % INVERSE_ESTCPU_WEIGHT) == 0) {
-                        resetpriority(td->td_ksegrp);
-                        if (kg->kg_pri.pri_level >= PUSER)
-                                kg->kg_pri.pri_level = kg->kg_pri.pri_user;
-                }
-        } else {
+        if (td == NULL)
                panic("schedclock");
-        }
+        td->td_kse->ke_cpticks++;
+        kg = td->td_ksegrp;
+#ifdef NEW_SCHED
+        kg->kg_estcpu += niceweights[kg->kg_nice - PRIO_MIN];
+#else
+        kg->kg_estcpu++;
+#endif
+        resetpriority(kg);
+        if (kg->kg_pri.pri_level >= PUSER)
+                kg->kg_pri.pri_level = kg->kg_pri.pri_user;
 }

Most of the changes here are to fix style bugs.  In the NEW_SCHED case,
the relative weights for each priority are determined by the niceweights[]
table.  kg->kg_estcpu is limited only by INT_MAX and priorities are
assigned according to relative values of kg->kg_estcpu (code for this is
not shown).  The NEW_SCHED case has not been tried since before SMPng
broke scheduling some more by compressing the priority ranges.
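The niceweights[] values themselves are not shown here, so for a feel of
the accounting this is a purely hypothetical curve (any resemblance to
the real table is accidental):

%%%
#include <stdio.h>
#include <sys/resource.h>       /* PRIO_MIN (-20), PRIO_MAX (20) */

static int niceweights[PRIO_MAX - PRIO_MIN + 1];

static void
init_niceweights(void)
{
        int i;

        /* Hypothetical curve: weight doubles every 5 nice levels. */
        for (i = 0; i <= PRIO_MAX - PRIO_MIN; i++)
                niceweights[i] = 1 << (i / 5);
}

int
main(void)
{
        init_niceweights();
        printf("nice -20: %d, nice 0: %d, nice +20: %d\n",
            niceweights[0], niceweights[20], niceweights[40]);
        return (0);
}
%%%

With kg_estcpu += niceweights[kg_nice - PRIO_MIN] per tick, a nice +20
process on this curve accumulates estcpu 256 times faster than a nice
-20 one, so it sorts to correspondingly weaker priorities without ever
being locked out entirely.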

Bruce





Re: vm_zeropage priority problems.

2001-12-20 Thread Peter Wemm

Luigi Rizzo wrote:
 [Cc peter because he introduced this code]
 
 Hi,
 i was trying the following code in -current (basically copied from
 vm_zeropage.c), to implement device polling in the idle loop, and
 noticed that the process would take all of the CPU time. Being
 suspicious that something was wrong with priorities, I added a
 couple of printfs, and to my surprise, the priority of the process
 (pri_level) after the first tsleep() becomes 104 instead of staying
 at 255 as it was originally initialized.
 
 A quick scan of the files in sys/sys shows that 104 is PPAUSE,
 and this is the source of the problem.

Excellent catch!  This particular problem was one of the main reasons
why this is still defaulting to 'off'.  I have a couple of other changes
to it pending commit to fix some of Bruce's complaints, but I hadn't
noticed the cause of this.

Part of the problem is that tsleep temporarily elevates the priority for
wakeup, and it is normally restored to its normal level when the process
returns to userland (see the *_userpri stuff).

Changing the tsleep PPAUSE value isn't necessarily the right thing either
because the priority propagation mechanism will elevate us if we hold a lock
that something else *needs*.  (This is so that when it yields, we will run
and do what we need to do and presumably release the lock.)

 I guess this is a bug in vm_zeropage.c, the second parameter
 in tsleep should be td->td_ksegrp->kg_pri.pri_level instead
 of PPAUSE, or at least the priority should be reset to the
 original value after returning from the tsleep ?

Possibly both of these.  Let me do some checking.

   cheers
   luigi
 
 static void
 poll_idle(void)
 {
         struct thread *td = curthread;
         struct rtprio rtp;
 
         rtp.prio = RTP_PRIO_MAX;        /* lowest priority */
         rtp.type = RTP_PRIO_IDLE;
         mtx_lock_spin(&sched_lock);
         rtp_to_pri(&rtp, &td->td_ksegrp->kg_pri);
         mtx_unlock_spin(&sched_lock);
 
         printf("idlepoll start and sleep, pri is %d native %d user %d\n",
             td->td_ksegrp->kg_pri.pri_level,
             td->td_ksegrp->kg_pri.pri_native,
             td->td_ksegrp->kg_pri.pri_user);
         for (;;) {
                 if (poll_in_idle && poll_handlers > 0) {
                         idlepoll_sleeping = 0;
                         mtx_lock(&Giant);
                         ether_poll(poll_each_burst);
                         mtx_unlock(&Giant);
                         mtx_assert(&Giant, MA_NOTOWNED);
                         mtx_lock_spin(&sched_lock);
                         setrunqueue(td);
                         td->td_proc->p_stats->p_ru.ru_nvcsw++;
                         mi_switch();
                         mtx_unlock_spin(&sched_lock);
                 } else {
                         printf("idlepoll goes to sleep, "
                             "pri is %d native %d user %d\n",
                             td->td_ksegrp->kg_pri.pri_level,
                             td->td_ksegrp->kg_pri.pri_native,
                             td->td_ksegrp->kg_pri.pri_user);
                         idlepoll_sleeping = 1;
                         tsleep(&idlepoll_sleeping, PPAUSE, "pollid", hz * 3);
                 }
         }
 }
 
 static struct proc *idlepoll;
 static struct kproc_desc idlepoll_kp = {
         "idlepoll",
         poll_idle,
         &idlepoll
 };
 SYSINIT(idlepoll, SI_SUB_KTHREAD_VM, SI_ORDER_ANY, kproc_start, &idlepoll_kp)
 
 
 

Cheers,
-Peter
--
Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
All of this is for nothing if we don't go to the stars - JMS/B5





Re: vm_zeropage priority problems.

2001-12-20 Thread Luigi Rizzo

On Thu, Dec 20, 2001 at 11:13:27AM -0800, Peter Wemm wrote:
...
 Excellent catch!  This particular problem was one of the main reasons
 why this is still defaulting to 'off'.  I have a couple of other changes
 to it pending commit to fix some of Bruce's complaints, but I hadn't
 noticed the cause of this.
 
 Part of the problem is that tsleep temporarily elevates the priority for
 wakeup, and it is normally restored to its normal level when the process
 returns to userland (see the *_userpri stuff).

ah, ok, kernel threads never return to userland...

so, i presume one should also make sure that after the process is
scheduled, the priority is restored to the 'native' level; this
should also solve the problem with the priority propagation mechanism
(though... I have the feeling that if you have nested locks, this
cannot scale...)

cheers
luigi




Re: vm_zeropage priority problems.

2001-12-20 Thread John Baldwin


On 20-Dec-01 Luigi Rizzo wrote:
 On Thu, Dec 20, 2001 at 11:13:27AM -0800, Peter Wemm wrote:
 ...
 Excellent catch!  This particular problem was one of the main reasons
 why this is still defaulting to 'off'.  I have a couple of other changes
 to it pending commit to fix some of Bruce's complaints, but I hadn't
 noticed the cause of this.
 
 Part of the problem is that tsleep temporarily elevates the priority for
 wakeup, and it is normally restored to its normal level when the process
 returns to userland (see the *_userpri stuff).
 
 ah, ok, kernel threads never return to userland...
 
 so, i presume one should also make sure that after the process is
 scheduled, the priority is restored to the 'native' level; this
 should also solve the problem with the priority propagation mechanism
 (though... I have the feeling that if you have nested locks, this
 cannot scale...)

Priority propagation will already handle things ok.  We drop to pri_native
after we drop a lock (although if we still hold a contested lock we bump our
priority to the min(nativepri, highest priority of threads on contested locks
we hold) and drop to nativepri after dropping the last contested lock).
However, kthreads should tsleep() with their current priority, not PPAUSE.
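For illustration, the unlock-time rule above amounts to something like
this (a minimal sketch with hypothetical names, not the actual mutex
code; numerically lower = higher priority):

%%%
#include <stddef.h>

struct lock {
        int             contested;      /* does anyone wait on it? */
        unsigned char   best_waiter_pri;
        struct lock     *next;          /* locks this thread still holds */
};

struct thread {
        unsigned char   pri_level, pri_native;
        struct lock     *locks_held;
};

/*
 * After dropping a lock: fall back toward pri_native, but stay
 * elevated to the best priority still waiting on any contested lock
 * we hold; after the last contested lock, this is just pri_native.
 */
static void
unlend_priority(struct thread *td)
{
        struct lock *l;
        unsigned char pri = td->pri_native;

        for (l = td->locks_held; l != NULL; l = l->next)
                if (l->contested && l->best_waiter_pri < pri)
                        pri = l->best_waiter_pri;
        td->pri_level = pri;
}
%%%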

-- 

John Baldwin [EMAIL PROTECTED]http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/




Re: vm_zeropage priority problems.

2001-12-20 Thread Luigi Rizzo

On Thu, Dec 20, 2001 at 12:16:03PM -0800, John Baldwin wrote:
...
 Priority propagation will already handle things ok.  We drop to pri_native
 after we drop a lock (although if we still hold a contested lock we bump our
 priority to the min(nativepri, highest priority of threads on contested locks
 we hold) and drop to nativepri after dropping the last contested lock).

ok, thanks for the clarification

 However, kthreads should tsleep() with their current priority, not PPAUSE.

current meaning pri_level or pri_native ? What if one tries to
tsleep() while holding a lock and so its pri_level is raised ?

In the device polling code i did a tsleep on the original pri_level,
but maybe pri_native is good enough.

cheers
luigi




Re: vm_zeropage priority problems.

2001-12-20 Thread John Baldwin


On 20-Dec-01 Luigi Rizzo wrote:
 On Thu, Dec 20, 2001 at 12:16:03PM -0800, John Baldwin wrote:
 ...
 Priority propagation will already handle things ok.  We drop to pri_native
 after we drop a lock (although if we still hold a contested lock we bump our
 priority to the min(nativepri, highest priority of threads on contested
 locks we hold) and drop to nativepri after dropping the last contested
 lock).
 
 ok, thanks for the clarification
 
 However, kthreads should tsleep() with their current priority, not PPAUSE.
 
 current meaning pri_level or pri_native ? What if one tries to
 tsleep() while holding a lock and so its pri_level is raised ?

pri_level.  Calling tsleep() while holding a lock is a bug though. :)  Unless
you are calling msleep() with a lock that will be released.

 In the device polling code i did a tsleep on the original pri_level,
 but maybe pri_native is good enough.

pri_level is more correct.

-- 

John Baldwin [EMAIL PROTECTED]http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/




vm_zeropage priority problems.

2001-12-18 Thread Luigi Rizzo

[Cc peter because he introduced this code]

Hi,
i was trying the following code in -current (basically copied from
vm_zeropage.c), to implement device polling in the idle loop, and
noticed that the process would take all of the CPU time. Being
suspicious that something was wrong with priorities, I added a
couple of printfs, and to my surprise, the priority of the process
(pri_level) after the first tsleep() becomes 104 instead of staying
at 255 as it was originally initialized.

A quick scan of the files in sys/sys shows that 104 is PPAUSE,
and this is the source of the problem.

I guess this is a bug in vm_zeropage.c, the second parameter
in tsleep should be td->td_ksegrp->kg_pri.pri_level instead
of PPAUSE, or at least the priority should be reset to the
original value after returning from the tsleep ?

cheers
luigi

static void
poll_idle(void)
{
        struct thread *td = curthread;
        struct rtprio rtp;

        rtp.prio = RTP_PRIO_MAX;        /* lowest priority */
        rtp.type = RTP_PRIO_IDLE;
        mtx_lock_spin(&sched_lock);
        rtp_to_pri(&rtp, &td->td_ksegrp->kg_pri);
        mtx_unlock_spin(&sched_lock);

        printf("idlepoll start and sleep, pri is %d native %d user %d\n",
            td->td_ksegrp->kg_pri.pri_level,
            td->td_ksegrp->kg_pri.pri_native,
            td->td_ksegrp->kg_pri.pri_user);
        for (;;) {
                if (poll_in_idle && poll_handlers > 0) {
                        idlepoll_sleeping = 0;
                        mtx_lock(&Giant);
                        ether_poll(poll_each_burst);
                        mtx_unlock(&Giant);
                        mtx_assert(&Giant, MA_NOTOWNED);
                        mtx_lock_spin(&sched_lock);
                        setrunqueue(td);
                        td->td_proc->p_stats->p_ru.ru_nvcsw++;
                        mi_switch();
                        mtx_unlock_spin(&sched_lock);
                } else {
                        printf("idlepoll goes to sleep, "
                            "pri is %d native %d user %d\n",
                            td->td_ksegrp->kg_pri.pri_level,
                            td->td_ksegrp->kg_pri.pri_native,
                            td->td_ksegrp->kg_pri.pri_user);
                        idlepoll_sleeping = 1;
                        tsleep(&idlepoll_sleeping, PPAUSE, "pollid", hz * 3);
                }
        }
}

static struct proc *idlepoll;
static struct kproc_desc idlepoll_kp = {
        "idlepoll",
        poll_idle,
        &idlepoll
};
SYSINIT(idlepoll, SI_SUB_KTHREAD_VM, SI_ORDER_ANY, kproc_start, &idlepoll_kp)
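As a sketch of the suggested fix (reusing the thread's own level so the
idle-class priority survives the sleep), the tsleep() call in the else
branch would become something like:

%%%
        /* Sleep at the thread's current priority instead of PPAUSE. */
        tsleep(&idlepoll_sleeping, td->td_ksegrp->kg_pri.pri_level,
            "pollid", hz * 3);
%%%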

