Re: sched_pin() bug in SCHED_ULE

2010-09-01 Thread John Baldwin
On Tuesday, August 31, 2010 2:53:12 pm m...@freebsd.org wrote:
 On Tue, Aug 31, 2010 at 10:16 AM,  m...@freebsd.org wrote:
  I recorded the stack any time ts->ts_cpu was set and when a thread was
  migrated by sched_switch() I printed out the recorded info.  Here's
  what I found:
 
 
  XXX bug 67957: moving 0xff003ff9b800 from 3 to 1
  [1]: pin 0 state 4 move 3 -> 1 done by 0xff000cc44000:
  #0 0x802b36b4 at bug67957+0x84
  #1 0x802b5dd4 at sched_affinity+0xd4
  #2 0x8024a707 at cpuset_setthread+0x137
  #3 0x8024aeae at cpuset_setaffinity+0x21e
  #4 0x804a82df at freebsd32_cpuset_setaffinity+0x4f
  #5 0x80295f49 at isi_syscall+0x99
  #6 0x804a630e at ia32_syscall+0x1ce
  #7 0x8046dc60 at Xint0x80_syscall+0x60
  [0]: pin 0 state 2 move 0 -> 3 done by 0xff000cc44000:
  #0 0x802b36b4 at bug67957+0x84
  #1 0x802b4ad8 at sched_add+0xe8
  #2 0x8029b96a at create_thread+0x34a
  #3 0x8029badc at kern_thr_new+0x8c
  #4 0x804a8912 at freebsd32_thr_new+0x122
  #5 0x80295f49 at isi_syscall+0x99
  #6 0x804a630e at ia32_syscall+0x1ce
  #7 0x8046dc60 at Xint0x80_syscall+0x60
 
  So one thread in the process called cpuset_setaffinity(2), and another
  thread in the process was forcibly migrated by the IPI while returning
  from a syscall, even though it had td_pinned set.
 
  Given this path, it seems reasonable to me to skip the migrate if we
  notice THREAD_CAN_MIGRATE is false.
 
  Opinions?  My debug code is below.  I'll try to write a short testcase
  that exhibits this bug.
 
 Just a few more thoughts on this.  The check in sched_affinity for
 THREAD_CAN_MIGRATE is racy.  Since witness uses sched_pin, it's not
 simple to take the THREAD lock around an increment of td_pinned.  So
 I'm looking for suggestions on the best way to fix this issue.  My
 thoughts:
 
 1) add a check in sched_switch() for THREAD_CAN_MIGRATE
 2) have WITNESS not use sched_pin, and take the THREAD lock when
 modifying td_pinned
 3) have the IPI_PREEMPT handler notice the thread is pinned (and not
 trying to bind) and postpone the mi_switch, just like it postpones
 when a thread is in a critical section.
 
 Except for the potential complexity of implementation, I think (2) is
 the best solution.
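For context, option (3) would hook in roughly where ULE already defers
preemption for threads in a critical section.  A minimal sketch of the idea
follows; the names are approximate (on this branch the IPI handler ends up
in mi_switch() directly), and TSF_BINDING is the flag proposed later in this
thread:

    static void
    ipi_preempt_handler(struct thread *td)
    {
        /* Defer instead of switching, the way critical sections do. */
        if (td->td_critnest > 1 || (!THREAD_CAN_MIGRATE(td) &&
            (td->td_sched->ts_flags & TSF_BINDING) == 0))
            td->td_owepreempt = 1;      /* switch later, at critical_exit() */
        else
            mi_switch(SW_INVOL | SW_PREEMPT, NULL);
    }

sched_unpin() would then need a matching check to run the deferred switch
once td_pinned drops back to zero.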

I don't like 2) only because you'd really need to fix all the other places 
that use sched_pin() to grab the thread lock.  That would be a bit expensive.  
I think the problem is code that checks THREAD_CAN_MIGRATE() for running 
threads that aren't curthread.
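For reference, the two checks in question are one-liner macros in
sched_ule.c; paraphrased here, so the exact form may differ between
branches:

    /* Racy when evaluated for a thread other than curthread. */
    #define THREAD_CAN_MIGRATE(td)    ((td)->td_pinned == 0)
    /* May td run on cpu, per its cpuset mask? */
    #define THREAD_CAN_SCHED(td, cpu) \
        CPU_ISSET((cpu), &(td)->td_cpuset->cs_mask)

td_pinned is only ever written by curthread, without the thread lock held,
which is why a remote read of it can be stale.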

The sched_affinity() bug is probably my fault FWIW.  I think something along 
the lines of 1) is the best approach.  Maybe something like the patch below.  
It moves the CPU assignment out of sched_affinity() and moves it into 
sched_switch() instead so that it takes effect on the next context switch for 
this thread.  That is the point at which the CPU binding actually takes effect 
anyway since sched_affinity() doesn't force an immediate context switch.  This 
patch is relative to 7.  The patch for 9 is nearly identical (just a fixup for 
ipi_cpu() vs ipi_selected()).
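The 7.x/9.x fixup mentioned above is just the spelling of the IPI call,
roughly:

    ipi_selected(1 << cpu, IPI_PREEMPT);    /* 7.x: takes a cpu mask */
    ipi_cpu(cpu, IPI_PREEMPT);              /* 9.x: takes one cpu id */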

Index: sched_ule.c
===================================================================
--- sched_ule.c	(revision 212065)
+++ sched_ule.c	(working copy)
@@ -1885,6 +1885,8 @@
 		srqflag = (flags & SW_PREEMPT) ?
 		    SRQ_OURSELF|SRQ_YIELDING|SRQ_PREEMPTED :
 		    SRQ_OURSELF|SRQ_YIELDING;
+		if (THREAD_CAN_MIGRATE(td) && !THREAD_CAN_SCHED(td, ts->ts_cpu))
+			ts->ts_cpu = sched_pickcpu(td, 0);
 		if (ts->ts_cpu == cpuid)
 			tdq_add(tdq, td, srqflag);
 		else
@@ -2536,7 +2538,6 @@
 {
 #ifdef SMP
 	struct td_sched *ts;
-	int cpu;
 
 	THREAD_LOCK_ASSERT(td, MA_OWNED);
 	ts = td->td_sched;
@@ -2550,17 +2551,13 @@
 	if (!TD_IS_RUNNING(td))
 		return;
 	td->td_flags |= TDF_NEEDRESCHED;
-	if (!THREAD_CAN_MIGRATE(td))
-		return;
 	/*
-	 * Assign the new cpu and force a switch before returning to
-	 * userspace.  If the target thread is not running locally send
-	 * an ipi to force the issue.
+	 * Force a switch before returning to userspace.  If the
+	 * target thread is not running locally send an ipi to force
+	 * the issue.
 	 */
-	cpu = ts->ts_cpu;
-	ts->ts_cpu = sched_pickcpu(td, 0);
-	if (cpu != PCPU_GET(cpuid))
-		ipi_selected(1 << cpu, IPI_PREEMPT);
+	if (td != curthread)
+		ipi_selected(1 << ts->ts_cpu, IPI_PREEMPT);
 #endif
 }
 

-- 
John Baldwin


Re: sched_pin() bug in SCHED_ULE

2010-09-01 Thread mdf
On Wed, Sep 1, 2010 at 6:49 AM, John Baldwin j...@freebsd.org wrote:
[snip]

I will test this patch out; thanks for the help!

Two questions:

1) How does a thread get moved between CPUs when it's not running?  I
see that we change the runqueue for non-running threads that are on a
runqueue.  Does the code always check for THREAD_CAN_SCHED when adding
a sleeping thread to a runqueue on wakeup?

2) I assume sched_switch() runs a lot more often than sched_pin() or
sched_affinity(), so it would make sense to put any performance
penalty of increased work in either of those places rather than in the
scheduler.  I suppose the two memory references for THREAD_CAN_MIGRATE
and THREAD_CAN_SCHED won't be that expensive.

Re: sched_pin() bug in SCHED_ULE

2010-09-01 Thread Alan Cox

m...@freebsd.org wrote:
[snip]

I will test this patch out; thanks for the help!

Two questions:

1) How does a thread get moved between CPUs when it's not running?  I
see that we change the runqueue for non-running threads that are on a
runqueue.  Does the code always check for THREAD_CAN_SCHED when adding
a sleeping thread to a runqueue on wakeup?

2) I assume sched_switch() runs a lot more often than sched_pin() or
sched_affinity(), so it would make sense to put any performance
penalty of increased work in either of those places rather than in the
scheduler.  I suppose the two memory references for THREAD_CAN_MIGRATE
and THREAD_CAN_SCHED won't be that expensive.


sched_pin() gets used a fair amount on i386 for creating temporary 
mappings in order to avoid the need for system-wide TLB shootdowns.  The 
use cases range from the fast path for pipes to page zeroing.
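The pattern Alan describes looks roughly like the sketch below;
pcpu_map_page()/pcpu_unmap_page() are hypothetical stand-ins for the real
pmap helpers:

    static void
    zero_page_pinned(vm_page_t m)
    {
        vm_offset_t va;

        sched_pin();                   /* must not migrate while mapping is live */
        va = pcpu_map_page(m);         /* per-CPU KVA; TLB entry on this CPU only */
        bzero((void *)va, PAGE_SIZE);  /* e.g. page zeroing */
        pcpu_unmap_page(va);
        sched_unpin();
    }

If the thread migrated between pin and unpin, it would dereference a
per-CPU mapping whose TLB entry only exists on the original CPU.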


Alan



Re: sched_pin() bug in SCHED_ULE

2010-09-01 Thread John Baldwin
On Wednesday, September 01, 2010 12:54:13 pm m...@freebsd.org wrote:
 On Wed, Sep 1, 2010 at 6:49 AM, John Baldwin j...@freebsd.org wrote:
  On Tuesday, August 31, 2010 2:53:12 pm m...@freebsd.org wrote:
  On Tue, Aug 31, 2010 at 10:16 AM,  m...@freebsd.org wrote:
   I recorded the stack any time ts-ts_cpu was set and when a thread was
   migrated by sched_switch() I printed out the recorded info.  Here's
   what I found:
  
  
   XXX bug 67957: moving 0xff003ff9b800 from 3 to 1
   [1]: pin 0 state 4 move 3 - 1 done by 0xff000cc44000:
   #0 0x802b36b4 at bug67957+0x84
   #1 0x802b5dd4 at sched_affinity+0xd4
   #2 0x8024a707 at cpuset_setthread+0x137
   #3 0x8024aeae at cpuset_setaffinity+0x21e
   #4 0x804a82df at freebsd32_cpuset_setaffinity+0x4f
   #5 0x80295f49 at isi_syscall+0x99
   #6 0x804a630e at ia32_syscall+0x1ce
   #7 0x8046dc60 at Xint0x80_syscall+0x60
   [0]: pin 0 state 2 move 0 - 3 done by 0xff000cc44000:
   #0 0x802b36b4 at bug67957+0x84
   #1 0x802b4ad8 at sched_add+0xe8
   #2 0x8029b96a at create_thread+0x34a
   #3 0x8029badc at kern_thr_new+0x8c
   #4 0x804a8912 at freebsd32_thr_new+0x122
   #5 0x80295f49 at isi_syscall+0x99
   #6 0x804a630e at ia32_syscall+0x1ce
   #7 0x8046dc60 at Xint0x80_syscall+0x60
  
   So one thread in the process called cpuset_setaffinity(2), and another
   thread in the process was forcibly migrated by the IPI while returning
   from a syscall, while it had td_pinned set.
  
   Given this path, it seems reasonable to me to skip the migrate if we
   notice THREAD_CAN_MIGRATE is false.
  
   Opinions?  My debug code is below.  I'll try to write a short testcase
   that exhibits this bug.
 
  Just a few more thoughts on this.  The check in sched_affinity for
  THREAD_CAN_MIGRATE is racy.  Since witness uses sched_pin, it's not
  simple to take the THREAD lock around an increment of td_pinned.  So
  I'm looking for suggestions on the best way to fix this issue.  My
  thoughts:
 
  1) add a check in sched_switch() for THREAD_CAN_MIGRATE
  2) have WITNESS not use sched_pin, and take the THREAD lock when
  modifying td_pinned
  3) have the IPI_PREEMPT handler notice the thread is pinned (and not
  trying to bind) and postpone the mi_switch, just like it postpones
  when a thread is in a critical section.
 
  Except for the potential complexity of implementation, I think (2) is
  the best solution.
 
  I don't like 2) only because you'd really need to fix all the other places
  that use sched_pin() to grab the thread lock.  That would be a bit 
  expensive.
  I think the problem is code that checks THREAD_CAN_MIGRATE() for running
  threads that aren't curthread.
 
  The sched_affinity() bug is probably my fault FWIW.  I think something along
  the lines of 1) is the best approach.  Maybe something like the patch below.
  It moves the CPU assignment out of sched_affinity() and moves it into
  sched_switch() instead so that it takes effect on the next context switch 
  for
  this thread.  That is the point at which the CPU binding actually takes 
  effect
  anyway since sched_affinity() doesn't force an immediate context switch.  
  This
  patch is relative to 7.  The patch for 9 is nearly identical (just a fixup 
  for
  ipi_cpu() vs ipi_selected()).
 
  Index: sched_ule.c
  ===
  --- sched_ule.c (revision 212065)
  +++ sched_ule.c (working copy)
  @@ -1885,6 +1885,8 @@
 srqflag = (flags  SW_PREEMPT) ?
 SRQ_OURSELF|SRQ_YIELDING|SRQ_PREEMPTED :
 SRQ_OURSELF|SRQ_YIELDING;
  +   if (THREAD_CAN_MIGRATE(td)  !THREAD_CAN_SCHED(td, 
  ts-tscpu))
  +   ts-ts_cpu = sched_pickcpu(td, 0);
 if (ts-ts_cpu == cpuid)
 tdq_add(tdq, td, srqflag);
 else
  @@ -2536,7 +2538,6 @@
   {
   #ifdef SMP
 struct td_sched *ts;
  -   int cpu;
 
 THREAD_LOCK_ASSERT(td, MA_OWNED);
 ts = td-td_sched;
  @@ -2550,17 +2551,13 @@
 if (!TD_IS_RUNNING(td))
 return;
 td-td_flags |= TDF_NEEDRESCHED;
  -   if (!THREAD_CAN_MIGRATE(td))
  -   return;
 /*
  -* Assign the new cpu and force a switch before returning to
  -* userspace.  If the target thread is not running locally send
  -* an ipi to force the issue.
  +* Force a switch before returning to userspace.  If the
  +* target thread is not running locally send an ipi to force
  +* the issue.
  */
  -   cpu = ts-ts_cpu;
  -   ts-ts_cpu = sched_pickcpu(td, 0);
  -   if (cpu != PCPU_GET(cpuid))
  -   ipi_selected(1  cpu, IPI_PREEMPT);
  +   if (td != curthread)
  +   ipi_selected(1  ts-ts_cpu, IPI_PREEMPT);
   #endif
   }
 
 I will test this 

Re: sched_pin() bug in SCHED_ULE

2010-08-31 Thread mdf
I recorded the stack any time ts->ts_cpu was set and when a thread was
migrated by sched_switch() I printed out the recorded info.  Here's
what I found:


XXX bug 67957: moving 0xff003ff9b800 from 3 to 1
[1]: pin 0 state 4 move 3 -> 1 done by 0xff000cc44000:
#0 0x802b36b4 at bug67957+0x84
#1 0x802b5dd4 at sched_affinity+0xd4
#2 0x8024a707 at cpuset_setthread+0x137
#3 0x8024aeae at cpuset_setaffinity+0x21e
#4 0x804a82df at freebsd32_cpuset_setaffinity+0x4f
#5 0x80295f49 at isi_syscall+0x99
#6 0x804a630e at ia32_syscall+0x1ce
#7 0x8046dc60 at Xint0x80_syscall+0x60
[0]: pin 0 state 2 move 0 -> 3 done by 0xff000cc44000:
#0 0x802b36b4 at bug67957+0x84
#1 0x802b4ad8 at sched_add+0xe8
#2 0x8029b96a at create_thread+0x34a
#3 0x8029badc at kern_thr_new+0x8c
#4 0x804a8912 at freebsd32_thr_new+0x122
#5 0x80295f49 at isi_syscall+0x99
#6 0x804a630e at ia32_syscall+0x1ce
#7 0x8046dc60 at Xint0x80_syscall+0x60

So one thread in the process called cpuset_setaffinity(2), and another
thread in the process was forcibly migrated by the IPI while returning
from a syscall, even though it had td_pinned set.

Given this path, it seems reasonable to me to skip the migrate if we
notice THREAD_CAN_MIGRATE is false.

Opinions?  My debug code is below.  I'll try to write a short testcase
that exhibits this bug.

Thanks,
matthew


Index: kern/sched_ule.c
===================================================================
--- kern/sched_ule.c	(revision 158580)
+++ kern/sched_ule.c	(working copy)
@@ -697,6 +697,41 @@
 	return;
 }
 
+static void
+bug67957(struct thread *td)
+{
+	int idx;
+
+	THREAD_LOCK_ASSERT(td, MA_OWNED);
+	idx = (td->xxx_idx++ % 5);
+	stack_save(&td->xxx[idx].td_preempt);
+	td->xxx[idx].td_moveto = td->td_sched->ts_cpu;
+	td->xxx[idx].td_movefrom = (td->td_oncpu == NOCPU) ? td->td_lastcpu
+	    : td->td_oncpu;
+	td->xxx[idx].td_statewas = td->td_state;
+	td->xxx[idx].td_pinned = td->td_pinned;
+	td->xxx[idx].td_by = curthread;
+}
+
+static void
+pr_bug67957(struct thread *td)
+{
+	int idx, i;
+
+	printf("XXX bug 67957: moving %p from %d to %d\n",
+	    td, td->td_lastcpu, td->td_sched->ts_cpu);
+	for (i = 0, idx = td->xxx_idx - 1;
+	    i < 5 && idx >= 0;
+	    i++, idx--) {
+		printf("[%d]: pin %d state %d move %d -> %d done by %p:\n",
+		    idx, td->xxx[idx % 5].td_pinned,
+		    td->xxx[idx % 5].td_statewas,
+		    td->xxx[idx % 5].td_movefrom,
+		    td->xxx[idx % 5].td_moveto,
+		    td->xxx[idx % 5].td_by);
+		stack_print_ddb(&td->xxx[idx % 5].td_preempt);
+	}
+}
+
 /*
  * Move a thread from one thread queue to another.
  */
@@ -739,6 +774,7 @@
 	TDQ_UNLOCK(from);
 	sched_rem(td);
 	ts->ts_cpu = cpu;
+	bug67957(td);
 	td->td_lock = TDQ_LOCKPTR(to);
 	tdq_add(to, td, SRQ_YIELDING);
 }
@@ -971,6 +1007,7 @@
 	tdq = TDQ_CPU(cpu);
 	td = ts->ts_thread;
 	ts->ts_cpu = cpu;
+	bug67957(td);
 
 	/* If the lock matches just return the queue. */
 	if (td->td_lock == TDQ_LOCKPTR(tdq))
@@ -1890,8 +1964,15 @@
 		    SRQ_OURSELF|SRQ_YIELDING;
 		if (ts->ts_cpu == cpuid)
 			tdq_add(tdq, td, srqflag);
-		else
+		else {
+			if (!THREAD_CAN_MIGRATE(td) &&
+			    (ts->ts_flags & TSF_BOUND) == 0) {
+				pr_bug67957(td);
+				panic("XXX");
+			}
 			mtx = sched_switch_migrate(tdq, td, srqflag);
+		}
+		td->xxx_idx = 0;
 	} else {
 		/* This thread must be going to sleep. */
 		TDQ_LOCK(tdq);
@@ -2479,8 +2560,10 @@
 	 * target cpu.
 	 */
 	if (td->td_priority <= PRI_MAX_ITHD && THREAD_CAN_MIGRATE(td) &&
-	    curthread->td_intr_nesting_level)
+	    curthread->td_intr_nesting_level) {
 		ts->ts_cpu = cpuid;
+		bug67957(td);
+	}
 	if (!THREAD_CAN_MIGRATE(td))
 		cpu = ts->ts_cpu;
 	else
@@ -2590,6 +2673,7 @@
 	 */
 	cpu = ts->ts_cpu;
 	ts->ts_cpu = sched_pickcpu(td, 0);
+	bug67957(td);
 	if (cpu != PCPU_GET(cpuid))
 		ipi_selected(1 << cpu, IPI_PREEMPT);
 #endif
@@ -2613,6 +2697,7 @@
 	if (PCPU_GET(cpuid) == cpu)
 		return;
 	ts->ts_cpu = cpu;
+	bug67957(td);
 	/* When we return from mi_switch we'll be on the correct cpu. */
 	mi_switch(SW_VOL, NULL);
 #endif
Index: sys/proc.h
===================================================================
--- sys/proc.h	(revision 158580)
+++ sys/proc.h	(working copy)
@@ -68,6 +68,8 @@
 #include <sys/isi_mountroot.h>
 #include 

Re: sched_pin() bug in SCHED_ULE

2010-08-31 Thread mdf
On Tue, Aug 31, 2010 at 10:16 AM,  m...@freebsd.org wrote:
[snip]

Just a few more thoughts on this.  The check in sched_affinity for
THREAD_CAN_MIGRATE is racy.  Since witness uses sched_pin, it's not
simple to take the THREAD lock around an increment of td_pinned.  So
I'm looking for suggestions on the best way to fix this issue.  My
thoughts:

1) add a check in sched_switch() for THREAD_CAN_MIGRATE
2) have WITNESS not use sched_pin, and take the THREAD lock when
modifying td_pinned
3) have the IPI_PREEMPT handler notice the thread is pinned (and not
trying to bind) and postpone the mi_switch, just like it postpones
when a thread is in a critical section.

Except for the potential complexity of implementation, I think (2) is
the best solution.

For those who want to play at home, I have a small test program that
exhibits this behavior at
http://people.freebsd.org/~mdf/cpu_affinity_test.c.  It seems to
require 4 or more CPUs to hit the assert.  You'll also need to patch
the kernel to assert when migrating a pinned thread:

Index: kern/sched_ule.c
===================================================================
--- kern/sched_ule.c	(revision 158580)
+++ kern/sched_ule.c	(working copy)
@@ -1888,11 +1889,26 @@ sched_switch(struct thread *td, struct t
 		srqflag = (flags & SW_PREEMPT) ?
 		    SRQ_OURSELF|SRQ_YIELDING|SRQ_PREEMPTED :
 		    SRQ_OURSELF|SRQ_YIELDING;
 		if (ts->ts_cpu == cpuid)
 			tdq_add(tdq, td, srqflag);
-		else
+		else {
+			KASSERT(THREAD_CAN_MIGRATE(td) ||
+			    (ts->ts_flags & TSF_BOUND) != 0,
+			    ("Thread %p shouldn't migrate!", td));
 			mtx = sched_switch_migrate(tdq, td, srqflag);
+		}
 	} else {
 		/* This thread must be going to sleep. */
 		TDQ_LOCK(tdq);
 		mtx = thread_lock_block(td);
Thanks,
matthew


Re: sched_pin() bug in SCHED_ULE

2010-08-26 Thread John Baldwin
On Thursday, August 26, 2010 4:03:38 pm m...@freebsd.org wrote:
 Back at the beginning of August I posted about issues with sched_pin()
 and witness_warn():
 
 http://lists.freebsd.org/pipermail/freebsd-hackers/2010-August/032553.html
 
 After a lot of debugging I think I've basically found the issue.  I
 found this bug on stable/7, but looking at the commit log for CURRENT
 I don't see any reason the bug doesn't exist there.  This analysis is
 specific to amd64/i386 but the problem is likely to exist on most
 arches.
 
 The basic problem is that both tdq_move() and sched_setcpu() can
 manipulate the ts->ts_cpu variable of another process, which may be
 running.  This means that a process running on CPU N may be set to run
 on CPU M at the next context switch.
 
 Then, that process may call sched_pin(), then grab a PCPU variable.
 An IPI_PREEMPT can come in, causing the thread to call mi_switch() in
 ipi_bitmap_handler().  sched_switch() will then notice that ts->ts_cpu
 is not PCPU_GET(cpuid), and call sched_switch_migrate(), migrating the
 thread to the intended CPU.  Thus after sched_pin() and potentially
 before using any PCPU variable the process can get migrated to another
 CPU, and now it is using a PCPU variable for the wrong processor.
 
 Given that ts_cpu can be set by other threads, it doesn't seem worth
 checking at sched_pin time whether ts_cpu is not the same as td_oncpu,
 because to make the check would require taking the thread_lock.
 Instead, I propose adding a check for !THREAD_CAN_MIGRATE() before
 calling sched_switch_migrate().  However, sched_pin() is also used by
 sched_bind(9) to force the thread to stay on the intended cpu.  I
 would handle this by adding a TSF_BINDING state that is different from
 TSF_BOUND.
 
 The thing that has me wondering whether this is all correct is that I
 don't see any checks in tdq_move() or sched_setcpu() for either
 TSF_BOUND or THREAD_CAN_MIGRATE().  Perhaps that logic is managed in
 the various calling functions; in that case I would propose adding
 asserts that the thread is THREAD_CAN_MIGRATE() unless it's marked
 TSF_BINDING.
 
 Does this analysis seem correct?
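The losing interleaving described above, as a sketch -- foo is a
hypothetical per-CPU field, and some other thread has already set ts_cpu
to M:

    /* Thread T is running on CPU N with ts_cpu already set to M. */
    sched_pin();            /* td_pinned++; T expects to stay on N */
    a = PCPU_GET(foo);      /* reads CPU N's copy */
    /*
     * IPI_PREEMPT -> ipi_bitmap_handler() -> mi_switch() ->
     * sched_switch(): ts_cpu (M) != cpuid (N), so
     * sched_switch_migrate() moves T to M despite td_pinned != 0.
     */
    b = PCPU_GET(foo);      /* reads CPU M's copy: the wrong CPU's data */
    sched_unpin();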

Calling code does check THREAD_CAN_MIGRATE(), but the problem is that it is 
not safe to call that on anything but curthread as td_pinned is not locked.  
It is assumed that only curthread would ever check td_pinned, not other 
threads.  I suspect what is happening is that another CPU is seeing a stale 
value of td_pinned.  You could fix this by grabbing the thread lock in 
sched_pin()/unpin() for ULE, but that is a bit expensive perhaps.
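That locked variant would look something like the sketch below -- not the
actual implementation, which today is an inline that only bumps td_pinned;
the lock/unlock pair on every pin is the cost in question:

    void
    sched_pin(void)
    {
        thread_lock(curthread);     /* makes remote reads of td_pinned safe */
        curthread->td_pinned++;
        thread_unlock(curthread);
    }

    void
    sched_unpin(void)
    {
        thread_lock(curthread);
        curthread->td_pinned--;
        thread_unlock(curthread);
    }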

You could test your theory by disabling thread stealing btw.  There are a few 
sysctl knobs to turn it off.

-- 
John Baldwin


Re: sched_pin() bug in SCHED_ULE

2010-08-26 Thread mdf
On Thu, Aug 26, 2010 at 1:49 PM, John Baldwin j...@freebsd.org wrote:
[snip]

I tried making sched_pin() a real function which used
intr_disable/intr_restore around saving off td->td_oncpu,
td->td_lastcpu and ts->ts_cpu, and the stack at the time of call.  In
sched_switch when I saw an unexpected migration I printed all that
out.  I found that on my boxes, at sched_pin() time ts_cpu was already
different from td->td_oncpu, so the specific problem I was having was
that while another thread can change ts_cpu it has no way to force
that thread to immediately migrate.

I believe the below patch fixes the issue, though I'm open to other methods:


Index: kern/sched_ule.c
===================================================================
--- kern/sched_ule.c	(.../head/src/sys)	(revision 158279)
+++ kern/sched_ule.c	(.../branches/BR_BUG_67957/src/sys)	(revision 158279)
@@ -112,6 +112,7 @@
 /* flags kept in ts_flags */
 #define	TSF_BOUND	0x0001		/* Thread can not migrate. */
 #define	TSF_XFERABLE	0x0002		/* Thread was added as transferable. */
+#define	TSF_BINDING	0x0004		/* Thread is being bound. */
 
 static struct td_sched td_sched0;
 
@@ -1859,6 +1858,7 @@
 	struct mtx *mtx;
 	int srqflag;
 	int cpuid;
+	int do_switch;
 
 	THREAD_LOCK_ASSERT(td, MA_OWNED);
 
@@ -1888,10 +1888,21 @@
 		srqflag = (flags & SW_PREEMPT) ?
 		    SRQ_OURSELF|SRQ_YIELDING|SRQ_PREEMPTED :
 		    SRQ_OURSELF|SRQ_YIELDING;
-		if (ts->ts_cpu == cpuid)
+		/*
+		 * Allow the switch to another processor as requested
+		 * only if the thread can migrate or we are in the
+		 * middle of binding for sched_bind(9).  This keeps
+		 * sched_pin() quick, since other threads can
+		 * manipulate ts_cpu.
+		 */
+		do_switch = (ts->ts_cpu != cpuid);
+		if (!THREAD_CAN_MIGRATE(td) &&
+		    (ts->ts_flags & TSF_BINDING) == 0)
+			do_switch = 0;
+		if (do_switch)
+			mtx = sched_switch_migrate(tdq, td, srqflag);
+		else
 			tdq_add(tdq, td, srqflag);
-		else
-			mtx = sched_switch_migrate(tdq, td, srqflag);
 	} else {

Re: sched_pin() bug in SCHED_ULE

2010-08-26 Thread Andriy Gapon
on 27/08/2010 00:20 m...@freebsd.org said the following:
 
 I tried making sched_pin() a real function which used
 intr_disable/intr_restore around saving off td->td_oncpu,
 td->td_lastcpu and ts->ts_cpu, and the stack at the time of call.  In
 sched_switch when I saw an unexpected migration I printed all that
 out.  I found that on my boxes, at sched_pin() time ts_cpu was already
 different from td->td_oncpu, so the specific problem I was having was
 that while another thread can change ts_cpu it has no way to force
 that thread to immediately migrate.

Like e.g. in sched_affinity where ts_cpu is first changed and then the old cpu
is ipi-ed?

 I believe the below patch fixes the issue, though I'm open to other methods:
 
 
 Index: kern/sched_ule.c
 ===================================================================
 --- kern/sched_ule.c	(.../head/src/sys)	(revision 158279)
 +++ kern/sched_ule.c	(.../branches/BR_BUG_67957/src/sys)	(revision 158279)
 @@ -112,6 +112,7 @@
  /* flags kept in ts_flags */
  #define	TSF_BOUND	0x0001		/* Thread can not migrate. */
  #define	TSF_XFERABLE	0x0002		/* Thread was added as transferable. */
 +#define	TSF_BINDING	0x0004		/* Thread is being bound. */

I don't really follow why TSF_BINDING is needed, i.e. why TSF_BOUND is not
sufficient in this case?

-- 
Andriy Gapon


Re: sched_pin() bug in SCHED_ULE

2010-08-26 Thread mdf
On Thu, Aug 26, 2010 at 3:10 PM, Andriy Gapon a...@icyb.net.ua wrote:
 on 27/08/2010 00:20 m...@freebsd.org said the following:
[snip]

 Like e.g. in sched_affinity where ts_cpu is first changed and then the old cpu
 is ipi-ed?

That could do it.  I didn't follow the stack of the places that were
touching ts_cpu for non-curthread.

[snip]

 I don't really follow why TSF_BINDING is needed, i.e. why TSF_BOUND is not
 sufficient in this case?

I wanted to tighten up the asserts, so that the only time it was okay
for a td_pinned thread to migrate was the one time it was moving to
the target cpu.  Having a separate flag allows that.
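A sketch of how sched_bind() could use the proposed flag -- paraphrasing
the idea in this thread, not committed code: TSF_BINDING marks the one
sanctioned move to the target cpu, after which the thread is plain
TSF_BOUND:

    void
    sched_bind(struct thread *td, int cpu)
    {
        struct td_sched *ts = td->td_sched;

        THREAD_LOCK_ASSERT(td, MA_OWNED);
        sched_pin();                    /* td_pinned blocks other migrations */
        ts->ts_flags |= TSF_BINDING;    /* the one allowed move: to 'cpu' */
        ts->ts_cpu = cpu;
        if (PCPU_GET(cpuid) != cpu)
            mi_switch(SW_VOL, NULL);    /* come back on the target cpu */
        ts->ts_flags &= ~TSF_BINDING;
        ts->ts_flags |= TSF_BOUND;
    }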

Thanks,
matthew