Re: sched_pin() bug in SCHED_ULE
On Tuesday, August 31, 2010 2:53:12 pm m...@freebsd.org wrote:

On Tue, Aug 31, 2010 at 10:16 AM, m...@freebsd.org wrote:

I recorded the stack any time ts->ts_cpu was set, and when a thread was migrated by sched_switch() I printed out the recorded info. Here's what I found:

XXX bug 67957: moving 0xff003ff9b800 from 3 to 1
[1]: pin 0 state 4 move 3 -> 1 done by 0xff000cc44000:
#0 0x802b36b4 at bug67957+0x84
#1 0x802b5dd4 at sched_affinity+0xd4
#2 0x8024a707 at cpuset_setthread+0x137
#3 0x8024aeae at cpuset_setaffinity+0x21e
#4 0x804a82df at freebsd32_cpuset_setaffinity+0x4f
#5 0x80295f49 at isi_syscall+0x99
#6 0x804a630e at ia32_syscall+0x1ce
#7 0x8046dc60 at Xint0x80_syscall+0x60
[0]: pin 0 state 2 move 0 -> 3 done by 0xff000cc44000:
#0 0x802b36b4 at bug67957+0x84
#1 0x802b4ad8 at sched_add+0xe8
#2 0x8029b96a at create_thread+0x34a
#3 0x8029badc at kern_thr_new+0x8c
#4 0x804a8912 at freebsd32_thr_new+0x122
#5 0x80295f49 at isi_syscall+0x99
#6 0x804a630e at ia32_syscall+0x1ce
#7 0x8046dc60 at Xint0x80_syscall+0x60

So one thread in the process called cpuset_setaffinity(2), and another thread in the process was forcibly migrated by the IPI while returning from a syscall, while it had td_pinned set. Given this path, it seems reasonable to me to skip the migration if we notice THREAD_CAN_MIGRATE is false. Opinions? My debug code is below. I'll try to write a short testcase that exhibits this bug.

Just a few more thoughts on this. The check in sched_affinity for THREAD_CAN_MIGRATE is racy. Since witness uses sched_pin, it's not simple to take the THREAD lock around an increment of td_pinned. So I'm looking for suggestions on the best way to fix this issue.
My thoughts:

1) add a check in sched_switch() for THREAD_CAN_MIGRATE
2) have WITNESS not use sched_pin, and take the THREAD lock when modifying td_pinned
3) have the IPI_PREEMPT handler notice the thread is pinned (and not trying to bind) and postpone the mi_switch, just like it postpones when a thread is in a critical section.

Except for the potential complexity of implementation, I think (2) is the best solution.

I don't like 2) only because you'd really need to fix all the other places that use sched_pin() to grab the thread lock. That would be a bit expensive. I think the problem is code that checks THREAD_CAN_MIGRATE() for running threads that aren't curthread. The sched_affinity() bug is probably my fault, FWIW.

I think something along the lines of 1) is the best approach, maybe something like the patch below. It moves the CPU assignment out of sched_affinity() and into sched_switch() instead, so that it takes effect on the next context switch for this thread. That is the point at which the CPU binding actually takes effect anyway, since sched_affinity() doesn't force an immediate context switch. This patch is relative to 7. The patch for 9 is nearly identical (just a fixup for ipi_cpu() vs ipi_selected()).

Index: sched_ule.c
===================================================================
--- sched_ule.c	(revision 212065)
+++ sched_ule.c	(working copy)
@@ -1885,6 +1885,8 @@
 		srqflag = (flags & SW_PREEMPT) ?
 		    SRQ_OURSELF|SRQ_YIELDING|SRQ_PREEMPTED :
 		    SRQ_OURSELF|SRQ_YIELDING;
+		if (THREAD_CAN_MIGRATE(td) && !THREAD_CAN_SCHED(td, ts->ts_cpu))
+			ts->ts_cpu = sched_pickcpu(td, 0);
 		if (ts->ts_cpu == cpuid)
 			tdq_add(tdq, td, srqflag);
 		else
@@ -2536,7 +2538,6 @@
 {
 #ifdef SMP
 	struct td_sched *ts;
-	int cpu;
 
 	THREAD_LOCK_ASSERT(td, MA_OWNED);
 	ts = td->td_sched;
@@ -2550,17 +2551,13 @@
 	if (!TD_IS_RUNNING(td))
 		return;
 	td->td_flags |= TDF_NEEDRESCHED;
-	if (!THREAD_CAN_MIGRATE(td))
-		return;
 	/*
-	 * Assign the new cpu and force a switch before returning to
-	 * userspace.  If the target thread is not running locally send
-	 * an ipi to force the issue.
+	 * Force a switch before returning to userspace.  If the
+	 * target thread is not running locally send an ipi to force
+	 * the issue.
 	 */
-	cpu = ts->ts_cpu;
-	ts->ts_cpu = sched_pickcpu(td, 0);
-	if (cpu != PCPU_GET(cpuid))
-		ipi_selected(1 << cpu, IPI_PREEMPT);
+	if (td != curthread)
+		ipi_selected(1 << ts->ts_cpu, IPI_PREEMPT);
 #endif
 }

-- 
John Baldwin

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: sched_pin() bug in SCHED_ULE
On Wed, Sep 1, 2010 at 6:49 AM, John Baldwin j...@freebsd.org wrote:

[snip]

I will test this patch out; thanks for the help!

Two questions:

1) How does a thread get moved between CPUs when it's not running? I see that we change the runqueue for non-running threads that are on a runqueue. Does the code always check for THREAD_CAN_SCHED when adding a sleeping thread to a runqueue on wakeup?

2) I assume sched_switch() runs a lot more often than sched_pin() or sched_affinity(), so it would make sense to take any performance penalty of the increased work in either of those places rather than in the scheduler. I suppose the two memory references for THREAD_CAN_MIGRATE and THREAD_CAN_SCHED won't be that expensive.
Re: sched_pin() bug in SCHED_ULE
m...@freebsd.org wrote:

[snip]

I will test this patch out; thanks for the help! Two questions:

1) How does a thread get moved between CPUs when it's not running? I see that we change the runqueue for non-running threads that are on a runqueue. Does the code always check for THREAD_CAN_SCHED when adding a sleeping thread to a runqueue on wakeup?

2) I assume sched_switch() runs a lot more often than sched_pin() or sched_affinity(), so it would make sense to take any performance penalty of the increased work in either of those places rather than in the scheduler. I suppose the two memory references for THREAD_CAN_MIGRATE and THREAD_CAN_SCHED won't be that expensive.

sched_pin() gets used a fair amount on i386 for creating temporary mappings in order to avoid the need for system-wide TLB shootdowns. The use cases range from the fast path for pipes to page zeroing.

Alan
Re: sched_pin() bug in SCHED_ULE
On Wednesday, September 01, 2010 12:54:13 pm m...@freebsd.org wrote:

[snip]
Re: sched_pin() bug in SCHED_ULE
I recorded the stack any time ts->ts_cpu was set, and when a thread was migrated by sched_switch() I printed out the recorded info. Here's what I found:

XXX bug 67957: moving 0xff003ff9b800 from 3 to 1
[1]: pin 0 state 4 move 3 -> 1 done by 0xff000cc44000:
#0 0x802b36b4 at bug67957+0x84
#1 0x802b5dd4 at sched_affinity+0xd4
#2 0x8024a707 at cpuset_setthread+0x137
#3 0x8024aeae at cpuset_setaffinity+0x21e
#4 0x804a82df at freebsd32_cpuset_setaffinity+0x4f
#5 0x80295f49 at isi_syscall+0x99
#6 0x804a630e at ia32_syscall+0x1ce
#7 0x8046dc60 at Xint0x80_syscall+0x60
[0]: pin 0 state 2 move 0 -> 3 done by 0xff000cc44000:
#0 0x802b36b4 at bug67957+0x84
#1 0x802b4ad8 at sched_add+0xe8
#2 0x8029b96a at create_thread+0x34a
#3 0x8029badc at kern_thr_new+0x8c
#4 0x804a8912 at freebsd32_thr_new+0x122
#5 0x80295f49 at isi_syscall+0x99
#6 0x804a630e at ia32_syscall+0x1ce
#7 0x8046dc60 at Xint0x80_syscall+0x60

So one thread in the process called cpuset_setaffinity(2), and another thread in the process was forcibly migrated by the IPI while returning from a syscall, while it had td_pinned set. Given this path, it seems reasonable to me to skip the migration if we notice THREAD_CAN_MIGRATE is false. Opinions? My debug code is below. I'll try to write a short testcase that exhibits this bug.

Thanks,
matthew

Index: kern/sched_ule.c
===================================================================
--- kern/sched_ule.c	(revision 158580)
+++ kern/sched_ule.c	(working copy)
@@ -697,6 +697,41 @@
 	return;
 }
 
+static void
+bug67957(struct thread *td)
+{
+	int idx;
+
+	THREAD_LOCK_ASSERT(td, MA_OWNED);
+	idx = (td->xxx_idx++ % 5);
+	stack_save(&td->xxx[idx].td_preempt);
+	td->xxx[idx].td_moveto = td->td_sched->ts_cpu;
+	td->xxx[idx].td_movefrom = (td->td_oncpu == NOCPU) ?
+	    td->td_lastcpu : td->td_oncpu;
+	td->xxx[idx].td_statewas = td->td_state;
+	td->xxx[idx].td_pinned = td->td_pinned;
+	td->xxx[idx].td_by = curthread;
+}
+
+static void
+pr_bug67957(struct thread *td)
+{
+	int idx, i;
+
+	printf("XXX bug 67957: moving %p from %d to %d\n",
+	    td, td->td_lastcpu, td->td_sched->ts_cpu);
+	for (i = 0, idx = td->xxx_idx - 1;
+	    i < 5 && idx >= 0;
+	    i++, idx--) {
+		printf("[%d]: pin %d state %d move %d -> %d done by %p:\n",
+		    idx, td->xxx[idx % 5].td_pinned,
+		    td->xxx[idx % 5].td_statewas,
+		    td->xxx[idx % 5].td_movefrom,
+		    td->xxx[idx % 5].td_moveto,
+		    td->xxx[idx % 5].td_by);
+		stack_print_ddb(&td->xxx[idx % 5].td_preempt);
+	}
+}
+
 /*
  * Move a thread from one thread queue to another.
  */
@@ -739,6 +774,7 @@
 	TDQ_UNLOCK(from);
 	sched_rem(td);
 	ts->ts_cpu = cpu;
+	bug67957(td);
 	td->td_lock = TDQ_LOCKPTR(to);
 	tdq_add(to, td, SRQ_YIELDING);
 }
@@ -971,6 +1007,7 @@
 	tdq = TDQ_CPU(cpu);
 	td = ts->ts_thread;
 	ts->ts_cpu = cpu;
+	bug67957(td);
 
 	/* If the lock matches just return the queue. */
 	if (td->td_lock == TDQ_LOCKPTR(tdq))
@@ -1890,8 +1964,15 @@
 		    SRQ_OURSELF|SRQ_YIELDING;
 		if (ts->ts_cpu == cpuid)
 			tdq_add(tdq, td, srqflag);
-		else
+		else {
+			if (!THREAD_CAN_MIGRATE(td) &&
+			    (ts->ts_flags & TSF_BOUND) == 0) {
+				pr_bug67957(td);
+				panic("XXX");
+			}
 			mtx = sched_switch_migrate(tdq, td, srqflag);
+		}
+		td->xxx_idx = 0;
 	} else {
 		/* This thread must be going to sleep. */
 		TDQ_LOCK(tdq);
@@ -2479,8 +2560,10 @@
 	 * target cpu.
 	 */
 	if (td->td_priority <= PRI_MAX_ITHD && THREAD_CAN_MIGRATE(td) &&
-	    curthread->td_intr_nesting_level)
+	    curthread->td_intr_nesting_level) {
 		ts->ts_cpu = cpuid;
+		bug67957(td);
+	}
 	if (!THREAD_CAN_MIGRATE(td))
 		cpu = ts->ts_cpu;
 	else
@@ -2590,6 +2673,7 @@
 	 */
 	cpu = ts->ts_cpu;
 	ts->ts_cpu = sched_pickcpu(td, 0);
+	bug67957(td);
 	if (cpu != PCPU_GET(cpuid))
 		ipi_selected(1 << cpu, IPI_PREEMPT);
 #endif
@@ -2613,6 +2697,7 @@
 	if (PCPU_GET(cpuid) == cpu)
 		return;
 	ts->ts_cpu = cpu;
+	bug67957(td);
 	/* When we return from mi_switch we'll be on the correct cpu. */
 	mi_switch(SW_VOL, NULL);
 #endif

Index: sys/proc.h
===================================================================
--- sys/proc.h	(revision 158580)
+++ sys/proc.h	(working copy)
@@ -68,6 +68,8 @@
 #include <sys/isi_mountroot.h>
 #include
Re: sched_pin() bug in SCHED_ULE
On Tue, Aug 31, 2010 at 10:16 AM, m...@freebsd.org wrote:

[snip]

Just a few more thoughts on this. The check in sched_affinity for THREAD_CAN_MIGRATE is racy. Since witness uses sched_pin, it's not simple to take the THREAD lock around an increment of td_pinned. So I'm looking for suggestions on the best way to fix this issue.

My thoughts:

1) add a check in sched_switch() for THREAD_CAN_MIGRATE
2) have WITNESS not use sched_pin, and take the THREAD lock when modifying td_pinned
3) have the IPI_PREEMPT handler notice the thread is pinned (and not trying to bind) and postpone the mi_switch, just like it postpones when a thread is in a critical section.

Except for the potential complexity of implementation, I think (2) is the best solution.

For those who want to play at home, I have a small test program that exhibits this behavior at http://people.freebsd.org/~mdf/cpu_affinity_test.c. It seems to require 4 or more CPUs to hit the assert. You'll also need to patch the kernel to assert when migrating a pinned thread:

Index: kern/sched_ule.c
===================================================================
--- kern/sched_ule.c	(revision 158580)
+++ kern/sched_ule.c	(working copy)
@@ -1888,11 +1889,26 @@ sched_switch(struct thread *td, struct t
 		srqflag = (flags & SW_PREEMPT) ?
 		    SRQ_OURSELF|SRQ_YIELDING|SRQ_PREEMPTED :
 		    SRQ_OURSELF|SRQ_YIELDING;
 		if (ts->ts_cpu == cpuid)
 			tdq_add(tdq, td, srqflag);
-		else
+		else {
+			KASSERT(THREAD_CAN_MIGRATE(td) ||
+			    (ts->ts_flags & TSF_BOUND) != 0,
+			    ("Thread %p shouldn't migrate!", td));
 			mtx = sched_switch_migrate(tdq, td, srqflag);
+		}
 	} else {
 		/* This thread must be going to sleep. */
 		TDQ_LOCK(tdq);
 		mtx = thread_lock_block(td);

Thanks,
matthew
Re: sched_pin() bug in SCHED_ULE
On Thursday, August 26, 2010 4:03:38 pm m...@freebsd.org wrote:

Back at the beginning of August I posted about issues with sched_pin() and witness_warn():

http://lists.freebsd.org/pipermail/freebsd-hackers/2010-August/032553.html

After a lot of debugging I think I've basically found the issue. I found this bug on stable/7, but looking at the commit log for CURRENT I don't see any reason the bug doesn't exist there. This analysis is specific to amd64/i386, but the problem is likely to exist on most arches.

The basic problem is that both tdq_move() and sched_setcpu() can manipulate the ts->ts_cpu variable of another process, which may be running. This means that a process running on CPU N may be set to run on CPU M at the next context switch. Then, that process may call sched_pin(), then grab a PCPU variable. An IPI_PREEMPT can come in, causing the thread to call mi_switch() in ipi_bitmap_handler(). sched_switch() will then notice that ts->ts_cpu is not PCPU_GET(cpuid), and call sched_switch_migrate(), migrating the thread to the intended CPU. Thus, after sched_pin() and potentially before using any PCPU variable, the process can get migrated to another CPU, and now it is using a PCPU variable for the wrong processor.

Given that ts_cpu can be set by other threads, it doesn't seem worth checking at sched_pin() time whether ts_cpu differs from td_oncpu, because making the check would require taking the thread_lock. Instead, I propose adding a check for !THREAD_CAN_MIGRATE() before calling sched_switch_migrate(). However, sched_pin() is also used by sched_bind(9) to force the thread to stay on the intended cpu. I would handle this by adding a TSF_BINDING state that is different from TSF_BOUND.

The thing that has me wondering whether this is all correct is that I don't see any checks in tdq_move() or sched_setcpu() for either TSF_BOUND or THREAD_CAN_MIGRATE().
Perhaps that logic is managed in the various calling functions; in that case I would propose adding asserts that the thread is THREAD_CAN_MIGRATE() unless it's marked TSF_BINDING. Does this analysis seem correct?

Calling code does check THREAD_CAN_MIGRATE(), but the problem is that it is not safe to call that on anything but curthread, as td_pinned is not locked. It is assumed that only curthread would ever check td_pinned, not other threads. I suspect what is happening is that another CPU is seeing a stale value of td_pinned. You could fix this by grabbing the thread lock in sched_pin()/unpin() for ULE, but that is a bit expensive perhaps.

You could test your theory by disabling thread stealing, btw. There are a few sysctl knobs to turn it off.

-- 
John Baldwin
Re: sched_pin() bug in SCHED_ULE
On Thu, Aug 26, 2010 at 1:49 PM, John Baldwin j...@freebsd.org wrote:

[snip]

I tried making sched_pin() a real function which used intr_disable/intr_restore around saving off td->td_oncpu, td->td_lastcpu and ts->ts_cpu, and the stack at the time of call. In sched_switch(), when I saw an unexpected migration I printed all that out. I found that on my boxes, at sched_pin() time ts_cpu was already different from td->td_oncpu, so the specific problem I was having was that while another thread can change ts_cpu, it has no way to force that thread to immediately migrate.

I believe the patch below fixes the issue, though I'm open to other methods:

Index: kern/sched_ule.c
===================================================================
--- kern/sched_ule.c	(.../head/src/sys)	(revision 158279)
+++ kern/sched_ule.c	(.../branches/BR_BUG_67957/src/sys)	(revision 158279)
@@ -112,6 +112,7 @@
 /* flags kept in ts_flags */
 #define	TSF_BOUND	0x0001		/* Thread can not migrate. */
 #define	TSF_XFERABLE	0x0002		/* Thread was added as transferable. */
+#define	TSF_BINDING	0x0004		/* Thread is being bound. */
 
 static struct td_sched td_sched0;
 
@@ -1859,6 +1858,7 @@
 	struct mtx *mtx;
 	int srqflag;
 	int cpuid;
+	int do_switch;
 
 	THREAD_LOCK_ASSERT(td, MA_OWNED);
@@ -1888,10 +1888,21 @@
 		srqflag = (flags & SW_PREEMPT) ?
 		    SRQ_OURSELF|SRQ_YIELDING|SRQ_PREEMPTED :
 		    SRQ_OURSELF|SRQ_YIELDING;
-		if (ts->ts_cpu == cpuid)
+		/*
+		 * Allow the switch to another processor as requested
+		 * only if the thread can migrate or we are in the
+		 * middle of binding for sched_bind(9).  This keeps
+		 * sched_pin() quick, since other threads can
+		 * manipulate ts_cpu.
+		 */
+		do_switch = (ts->ts_cpu != cpuid);
+		if (!THREAD_CAN_MIGRATE(td) &&
+		    (ts->ts_flags & TSF_BINDING) == 0)
+			do_switch = 0;
+		if (do_switch)
+			mtx = sched_switch_migrate(tdq, td, srqflag);
+		else
 			tdq_add(tdq, td, srqflag);
-		else
-			mtx = sched_switch_migrate(tdq, td, srqflag);
 	} else {
Re: sched_pin() bug in SCHED_ULE
on 27/08/2010 00:20 m...@freebsd.org said the following:

I tried making sched_pin() a real function which used intr_disable/intr_restore around saving off td->td_oncpu, td->td_lastcpu and ts->ts_cpu, and the stack at the time of call. In sched_switch(), when I saw an unexpected migration I printed all that out. I found that on my boxes, at sched_pin() time ts_cpu was already different from td->td_oncpu, so the specific problem I was having was that while another thread can change ts_cpu, it has no way to force that thread to immediately migrate.

Like e.g. in sched_affinity(), where ts_cpu is first changed and then the old cpu is IPI-ed?

I believe the patch below fixes the issue, though I'm open to other methods:

Index: kern/sched_ule.c
===================================================================
--- kern/sched_ule.c	(.../head/src/sys)	(revision 158279)
+++ kern/sched_ule.c	(.../branches/BR_BUG_67957/src/sys)	(revision 158279)
@@ -112,6 +112,7 @@
 /* flags kept in ts_flags */
 #define	TSF_BOUND	0x0001		/* Thread can not migrate. */
 #define	TSF_XFERABLE	0x0002		/* Thread was added as transferable. */
+#define	TSF_BINDING	0x0004		/* Thread is being bound. */

I don't really follow why TSF_BINDING is needed, i.e. why TSF_BOUND is not sufficient in this case?

-- 
Andriy Gapon
Re: sched_pin() bug in SCHED_ULE
On Thu, Aug 26, 2010 at 3:10 PM, Andriy Gapon a...@icyb.net.ua wrote:

[snip]

Like e.g. in sched_affinity(), where ts_cpu is first changed and then the old cpu is IPI-ed?

That could do it. I didn't follow the stack of the places that were touching ts_cpu for non-curthread.

[snip]

I don't really follow why TSF_BINDING is needed, i.e. why TSF_BOUND is not sufficient in this case?

I wanted to tighten up the asserts, so that the only time it was okay for a td_pinned thread to migrate was the one time it was moving to the target cpu. Having a separate flag allows that.

Thanks,
matthew