Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)
On Thursday, 7 of February 2008, Ingo Molnar wrote:
>
> * Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
>
> > > http://programming.kicks-ass.net/kernel-patches/sched-rt-group/
> > >
> > > on top of sched-devel.
> >
> > Indeed, with these patches applied the issue is not reproducible any
> > more.
>
> great! I've queued Peter's fixes and enhancements up in sched-devel.
> (not pushed out yet)

Could you please tell me what's happening to these patches?  The regression is
still present in the current mainline and it would be a pity if -rc1 went out
with it.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)
On Thu, 2008-02-07 at 20:53 +0100, Rafael J. Wysocki wrote:
> On Thursday, 7 of February 2008, Ingo Molnar wrote:
> >
> > * Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> >
> > > > http://programming.kicks-ass.net/kernel-patches/sched-rt-group/
> > > >
> > > > on top of sched-devel.
> > >
> > > Indeed, with these patches applied the issue is not reproducible any
> > > more.
> >
> > great! I've queued Peter's fixes and enhancements up in sched-devel.
> > (not pushed out yet)
>
> However, these patches break my second testbed (dual-core AMD desktop),
> with the attached config.
>
> It crashes on boot with the following trace:

Ah, that is the last patch, which I didn't intend to push out. Let me
remove that. It's me fighting the reported group scheduling latencies;
it's not lined up for merging.
Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)
On Thursday, 7 of February 2008, Ingo Molnar wrote:
>
> * Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
>
> > > http://programming.kicks-ass.net/kernel-patches/sched-rt-group/
> > >
> > > on top of sched-devel.
> >
> > Indeed, with these patches applied the issue is not reproducible any
> > more.
>
> great! I've queued Peter's fixes and enhancements up in sched-devel.
> (not pushed out yet)

However, these patches break my second testbed (dual-core AMD desktop),
with the attached config.

It crashes on boot with the following trace:

Using local APIC timer interrupts.
Detected 12.500 MHz APIC timer.
lockdep: fixing up alternatives.
Booting processor 1/2 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 4400.18 BogoMIPS (lpj=8800361)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ stepping 01
Brought up 2 CPUs
divide error: [1] SMP
CPU 0
Modules linked in:
Pid: 2, comm: kthreadd Not tainted 2.6.24 #41
RIP: 0010:[8022a024]  [8022a024] place_entity+0xa4/0xc0
RSP: :81007f885d20  EFLAGS: 00010046
RAX: 000321067800 RBX: 81007f8d22b8 RCX:
RDX: RSI: 00c8419e RDI: 0c31
RBP: 81007f885d30 R08: R09: 0001
R10: cf3cf3cf3cf3cf3d R11: R12: fff0
R13: R14: R15: 810002c32300
FS:  () GS:8067f000() knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: CR3: 00201000 CR4: 06e0
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0400
Process kthreadd (pid: 2, threadinfo 81007f884000, task 81007f882080)
Stack:  81007f8d2280 810002c2e880 81007f885d70 802340b3
 81007f885d70 81007f8d2280 810002c32300 0009
 81007f885da0 80234afc
Call Trace:
 [802340b3] task_new_fair+0x83/0xf0
 [80234afc] wake_up_new_task+0xac/0xc0
 [802372ee] do_fork+0x1fe/0x320
 [8025d641] ? __lock_acquire+0x251/0x1100
 [8020c531] kernel_thread+0x81/0xde
 [8024eb00] ? kthread+0x0/0x80
 [8020c58e] ? child_rip+0x0/0x12
 [8024ecd4] ? kthreadd+0x154/0x190
 [8020c598] child_rip+0xa/0x12
 [8020bcaf] ? restore_args+0x0/0x31
 [8024eb80] ? kthreadd+0x0/0x190
 [8020c58e] ? child_rip+0x0/0x12

Code: 0f 1f 80 00 00 00 00 48 8b 81 30 01 00 00 48 39 cb 48 8b 10 74 2c 48 89 f0 89 d2 48 8b 89 28 01 00 00 48 c1 e0 0a 49 89 d0 31 d2 <49> f7 f0 48 8
RIP  [8022a024] place_entity+0xa4/0xc0
 RSP 81007f885d20
---[ end trace ca143223eefdc828 ]---
kthreadd used greatest stack depth: 5424 bytes left

where

(gdb) l *place_entity+0xa4
0x8022a024 is in place_entity (/home/rafael/src/linux-2.6/kernel/sched_fair.c:390).
385                     weight = cfs_rq->load.weight;
386                     if (se == new)
387                             weight += new->load.weight;
388
389                     vslice *= NICE_0_LOAD;
390                     do_div(vslice, weight);
391             }
392
393             return vslice;
394     }
(gdb)

so it looks like "weight" is 0.

Peter?

Thanks,
Rafael

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.24
# Thu Feb  7 19:08:31 2008
#
CONFIG_64BIT=y
# CONFIG_X86_32 is not set
CONFIG_X86_64=y
CONFIG_X86=y
# CONFIG_GENERIC_LOCKBREAK is not set
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
# CONFIG_QUICKLIST is not set
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
# CONFIG_GENERIC_GPIO is not set
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
# CONFIG_RWSEM_XCHGADD_ALGORITHM is not set
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ZONE_DMA32=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_X86_SMP=y
CONFIG_X86_64_SMP=y
CONFIG_X86_TRAMPOLINE=y
# CONFIG_KTIME_SCALAR is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
# CONFIG_TASKSTATS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_TREE=y
CONFIG_IKCONFIG=y
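For readers following the oops: the gdb listing points at a 64-bit division whose divisor ("weight") is zero. The snippet below is only a userspace sketch of that arithmetic with a guard added. It is not kernel code and not the fix that was applied; the thread only says the offending patch was withdrawn. The names scale_vslice() and NICE_0_LOAD_SKETCH are hypothetical, and the value 1024 is an assumption (it matches the shift by 10 visible in the Code: bytes).

```c
#include <stdint.h>

/* Assumed stand-in for the kernel's NICE_0_LOAD (taken to be 1 << 10). */
#define NICE_0_LOAD_SKETCH 1024ULL

/*
 * Scale a slice by NICE_0_LOAD / weight, mirroring lines 389-390 of the
 * listing, but skip the division when the queue weight is zero.
 * Hypothetical helper for illustration only.
 */
uint64_t scale_vslice(uint64_t vslice, uint64_t weight)
{
    if (weight == 0)
        return vslice;  /* empty queue: dividing here would trap */
    return vslice * NICE_0_LOAD_SKETCH / weight;
}
```

With a weight of 1024 the slice comes back unchanged, and with a weight of 0 the function returns instead of raising the divide error seen in the trace. Whether returning the unscaled slice is the right policy for the scheduler is a separate question that this sketch does not try to answer.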
Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)
* Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:

> > http://programming.kicks-ass.net/kernel-patches/sched-rt-group/
> >
> > on top of sched-devel.
>
> Indeed, with these patches applied the issue is not reproducible any
> more.

great! I've queued Peter's fixes and enhancements up in sched-devel.
(not pushed out yet)

	Ingo
Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)
On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
>
> On Wed, 2008-02-06 at 23:18 +0100, Rafael J. Wysocki wrote:
> > On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
> > >
> > > On Wed, 2008-02-06 at 22:50 +0100, Rafael J. Wysocki wrote:
> > > > On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
> > >
> > > > > Well, that whole queue.
> > > >
> > > > It doesn't compile for me.
> > >
> > > I did solve some compile issues since posting, Ingo should have the
> > > compiling version in sched-devel soonish (don't know if he pushed it
> > > already).
> >
> > Can you point me to the cleaned up version, please?
>
> http://programming.kicks-ass.net/kernel-patches/sched-rt-group/
>
> on top of sched-devel.

Indeed, with these patches applied the issue is not reproducible any more.

Thanks,
Rafael
Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)
On Wed, 2008-02-06 at 23:18 +0100, Rafael J. Wysocki wrote:
> On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
> >
> > On Wed, 2008-02-06 at 22:50 +0100, Rafael J. Wysocki wrote:
> > > On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
> >
> > > > Well, that whole queue.
> > >
> > > It doesn't compile for me.
> >
> > I did solve some compile issues since posting, Ingo should have the
> > compiling version in sched-devel soonish (don't know if he pushed it
> > already).
>
> Can you point me to the cleaned up version, please?

http://programming.kicks-ass.net/kernel-patches/sched-rt-group/

on top of sched-devel.

> > > > Your test program just failed to obtain realtime scheduling
> > >
> > > Well, it shouldn't.  The expected result is to obtain realtime scheduling
> > > or we will break existing setups.
> >
> > That's a case of wrong expectations in my book. You enabled group
> > scheduling and hence behaviour changes.
>
> So, I'd have to unset FAIR_GROUP_SCHED to obtain the previous behavior?

With the mainline stuff, yes; with the new stuff just ensure
CONFIG_RT_GROUP_SCHED=n.

> > There is just nothing much one can do about it, if you don't assign
> > bandwidth to a group, it won't be able to run anything. Better to refuse
> > to run, than to sit idle, right?
>
> As a general rule, probably yes.
>
> > But I appreciate the situation, therefore I made the whole rt-group
> > scheduling a separate .config option (which defaults to n)
>
> Which is introduced by the new patches, isn't it?

Yes.
Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)
On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
>
> On Wed, 2008-02-06 at 22:50 +0100, Rafael J. Wysocki wrote:
> > On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
>
> > > Well, that whole queue.
> >
> > It doesn't compile for me.
>
> I did solve some compile issues since posting, Ingo should have the
> compiling version in sched-devel soonish (don't know if he pushed it
> already).

Can you point me to the cleaned up version, please?

> > > Your test program just failed to obtain realtime scheduling
> >
> > Well, it shouldn't.  The expected result is to obtain realtime scheduling
> > or we will break existing setups.
>
> That's a case of wrong expectations in my book. You enabled group
> scheduling and hence behaviour changes.

So, I'd have to unset FAIR_GROUP_SCHED to obtain the previous behavior?

> There is just nothing much one can do about it, if you don't assign bandwidth
> to a group, it won't be able to run anything. Better to refuse to run, than
> to sit idle, right?

As a general rule, probably yes.

> But I appreciate the situation, therefore I made the whole rt-group
> scheduling a separate .config option (which defaults to n)

Which is introduced by the new patches, isn't it?
Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)
On Wed, 2008-02-06 at 22:50 +0100, Rafael J. Wysocki wrote:
> On Wednesday, 6 of February 2008, Peter Zijlstra wrote:

> > Well, that whole queue.
>
> It doesn't compile for me.

I did solve some compile issues since posting, Ingo should have the
compiling version in sched-devel soonish (don't know if he pushed it
already).

> > Your test program just failed to obtain realtime scheduling
>
> Well, it shouldn't.  The expected result is to obtain realtime scheduling
> or we will break existing setups.

That's a case of wrong expectations in my book. You enabled group
scheduling and hence behaviour changes.

There is just nothing much one can do about it: if you don't assign
bandwidth to a group, it won't be able to run anything. Better to refuse
to run than to sit idle, right?

But I appreciate the situation, therefore I made the whole rt-group
scheduling a separate .config option (which defaults to n).
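From the application side, the "refuse to run" behaviour described above shows up as sched_setscheduler() returning -1. A minimal sketch of how a program might tell the cases apart follows. The helper names request_fifo() and explain_sched_error() are hypothetical, and the mapping of "no RT bandwidth in this group" to EPERM is an assumption based on later mainline kernels; the thread itself does not say which errno the patch queue returns.

```c
#include <errno.h>
#include <sched.h>
#include <string.h>

/*
 * Try to switch the calling process to SCHED_FIFO at the given priority.
 * Returns 0 on success, or the errno value reported by the kernel.
 */
int request_fifo(int priority)
{
    struct sched_param sp;

    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = priority;
    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
        return errno;
    return 0;
}

/* Turn the result of request_fifo() into a short diagnostic string. */
const char *explain_sched_error(int err)
{
    switch (err) {
    case 0:
        return "realtime scheduling granted";
    case EPERM:
        /* Assumption: also used when the task's group has no RT bandwidth. */
        return "refused: no privilege or no RT bandwidth for this group";
    case EINVAL:
        return "refused: invalid policy or priority";
    default:
        return "refused: unexpected error";
    }
}
```

A caller that can live without realtime priority, as artswrapper apparently can, would log the message and carry on with its normal policy rather than treat the refusal as fatal.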
Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)
On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
>
> On Wed, 2008-02-06 at 19:25 +0100, Rafael J. Wysocki wrote:
> > On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
> > >
> > > On Wed, 2008-02-06 at 09:40 +0100, Dmitry Adamushko wrote:
> > > > On 06/02/2008, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> > > > > On Tuesday, 5 of February 2008, Dmitry Adamushko wrote:
> > > > > > Rafael, any progress with this issue? (a few questions below).
> > > > > >
> > > > > > > >
> > > > > > > > Does this artsmessage thing also run with RT priority?
> > > > > > >
> > > > > > > Well, it's in a strange state (after it's broken).  From top:
> > > > > > >
> > > > > > > PR = -51
> > > > > > > NI = 0
> > > > > > > S = R
> > > > > > > %CPU = 0.0
> > > > > > > %MEM = 0.0
> > > > > >
> > > > > > cat /proc/$PID/stat ; sleep 3; cat /proc/$PID/stat ?
> > > > > > cat /proc/sched_debug; sleep 3 ; cat /proc/sched_debug
> > > > >
> > > > > Well, instead please find appended a test program that allows me to
> > > > > trigger the issue.
> > > >
> > > > Great, I'll look at this problem in the evening (sure, if nobody else
> > > > is faster :-).
> > >
> > > Yeah, it seems fixed here (after I made my current queue compile for
> > > this silly CONFIG_USER_SCHED thing again).
> > >
> > > I'm now refusing realtime tasks in groups that do not have real-time
> > > bandwidth assigned.
> >
> > If you're referring to this patch: http://lkml.org/lkml/2008/2/4/332 , then
> > sorry, but it doesn't fix the issue for me, with the attached config.
>
> Well, that whole queue.

It doesn't compile for me.

> Your test program just failed to obtain realtime scheduling

Well, it shouldn't.  The expected result is to obtain realtime scheduling
or we will break existing setups.

> but didn't hang.

That's a much better outcome. :-)

> I'll test your full config.

OK, thanks.
Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)
On Wed, 2008-02-06 at 19:25 +0100, Rafael J. Wysocki wrote:
> On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
> >
> > On Wed, 2008-02-06 at 09:40 +0100, Dmitry Adamushko wrote:
> > > On 06/02/2008, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> > > > On Tuesday, 5 of February 2008, Dmitry Adamushko wrote:
> > > > > Rafael, any progress with this issue? (a few questions below).
> > > > >
> > > > > > >
> > > > > > > Does this artsmessage thing also run with RT priority?
> > > > > >
> > > > > > Well, it's in a strange state (after it's broken).  From top:
> > > > > >
> > > > > > PR = -51
> > > > > > NI = 0
> > > > > > S = R
> > > > > > %CPU = 0.0
> > > > > > %MEM = 0.0
> > > > >
> > > > > cat /proc/$PID/stat ; sleep 3; cat /proc/$PID/stat ?
> > > > > cat /proc/sched_debug; sleep 3 ; cat /proc/sched_debug
> > > >
> > > > Well, instead please find appended a test program that allows me to
> > > > trigger the issue.
> > >
> > > Great, I'll look at this problem in the evening (sure, if nobody else
> > > is faster :-).
> >
> > Yeah, it seems fixed here (after I made my current queue compile for
> > this silly CONFIG_USER_SCHED thing again).
> >
> > I'm now refusing realtime tasks in groups that do not have real-time
> > bandwidth assigned.
>
> If you're referring to this patch: http://lkml.org/lkml/2008/2/4/332 , then
> sorry, but it doesn't fix the issue for me, with the attached config.

Well, that whole queue.

Your test program just failed to obtain realtime scheduling but didn't
hang.

I'll test your full config.
Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)
On Tuesday, 5 of February 2008, Dmitry Adamushko wrote:
> Rafael, any progress with this issue? (a few questions below).
>
> > >
> > > Does this artsmessage thing also run with RT priority?
> >
> > Well, it's in a strange state (after it's broken).  From top:
> >
> > PR = -51
> > NI = 0
> > S = R
> > %CPU = 0.0
> > %MEM = 0.0
>
> cat /proc/$PID/stat ; sleep 3; cat /proc/$PID/stat ?
> cat /proc/sched_debug; sleep 3 ; cat /proc/sched_debug

Well, instead please find appended a test program that allows me to trigger
the issue.  To reproduce it do:

$ gcc -o break_scheduler break_scheduler.c
$ su -
[...]
# chown root.root $PATH_TO_BINARY/break_scheduler
# chmod u+s $PATH_TO_BINARY/break_scheduler
^D
$ ./break_scheduler

It behaves normally if run directly by root and it also behaves normally if
the execv() at the end is removed.

Hope that helps to understand what the problem is.

Thanks,
Rafael

---
#include <stdio.h>
#include <sys/stat.h>
#include <sys/resource.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <stdlib.h>
#include <sched.h>

#define EXECUTE "/bin/ls"

void adjust_priority()
{
	int sched = sched_getscheduler(0);
	if(sched == SCHED_FIFO || sched == SCHED_RR) {
		puts(">> non-standard scheduling policy");
	} else {
		struct sched_param sp;
		long int priority = (sched_get_priority_max(SCHED_FIFO) +
			sched_get_priority_min(SCHED_FIFO))/2;

		sp.sched_priority = priority;

		if (sched_setscheduler(0, SCHED_FIFO, &sp) != -1) {
			printf(">> running as realtime process now "
				"(priority %ld)\n", priority);
		} else {
			/* can't set realtime priority */
			puts(">> could not set realtime priority");
		}
	}
}

int main(int argc, char **argv)
{
	adjust_priority();

	/* drop root privileges if running setuid root
	   (due to realtime priority stuff) */
	if (geteuid() != getuid()) {
		seteuid(getuid());
		if (geteuid() != getuid()) {
			perror("setuid()");
			return 2;
		}
	}
	puts("OK");

	if(argc == 0)
		return 1;

	argv[0] = EXECUTE;
	execv(EXECUTE, argv);
	perror(EXECUTE);
	return 0;
}
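The priority arithmetic in the test program can be checked in isolation. The sketch below is not part of the posted reproducer; the two helper names are made up for illustration. It recomputes the SCHED_FIFO midpoint the same way adjust_priority() does, and applies the rule documented in proc(5) that a realtime task's "priority" field is the negated realtime priority minus one. On Linux the SCHED_FIFO range is 1..99, so the midpoint is 50 and the reported value is -51. That would be consistent with the PR = -51 quoted from top, assuming artswrapper picks its priority the same way (an assumption, not something stated in the thread).

```c
#include <sched.h>

/* Midpoint of the SCHED_FIFO priority range, computed the same way as
 * adjust_priority() in the posted test program. */
int fifo_midpoint_priority(void)
{
	return (sched_get_priority_max(SCHED_FIFO) +
		sched_get_priority_min(SCHED_FIFO)) / 2;
}

/* "priority" field of /proc/PID/stat for a realtime task, per proc(5):
 * the negated realtime priority, minus one.
 * Hypothetical helper name, for illustration only. */
int proc_stat_priority(int rt_priority)
{
	return -rt_priority - 1;
}
```

Calling proc_stat_priority(fifo_midpoint_priority()) on a Linux system gives the value one would expect to see for a task that successfully switched to SCHED_FIFO at the midpoint priority.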
Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)
Rafael, any progress with this issue? (a few questions below).

> >
> > Does this artsmessage thing also run with RT priority?
>
> Well, it's in a strange state (after it's broken).  From top:
>
> PR = -51
> NI = 0
> S = R
> %CPU = 0.0
> %MEM = 0.0

cat /proc/$PID/stat ; sleep 3; cat /proc/$PID/stat ?

cat /proc/sched_debug; sleep 3 ; cat /proc/sched_debug

and also cat /proc/$PID/stat taken when this task still looks 'sane' :-/

Do you mean that only SCHED_NORMAL tasks can't run on this cpu or RT
tasks of high prio as well (so that maybe the scheduler is in
inconsistent state) ? If the latter, the watchdog should have
triggered after a while (if enabled), I guess.

>
> Here's the corresponding trace from sysrq+t:
>
>  [8022fdb7] ? try_to_wake_up+0x77/0x200
>  [8023573d] __cond_resched+0x2d/0x60
>  [804ddce1] _cond_resched+0x31/0x40
>  [804ddd24] wait_for_common+0x34/0x170
>  [8022fdb7] ? try_to_wake_up+0x77/0x200
>  [804ddec8] wait_for_completion+0x18/0x20
> [ ... ]

does a stack trace always look like this?

--
Best regards,
Dmitry Adamushko
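Two snapshots of /proc/$PID/stat taken a few seconds apart show whether the task is actually accumulating CPU time, which is presumably the point of the request above. The sketch below is only an illustration; the function names are made up. It pulls utime and stime (fields 14 and 15 according to proc(5), in clock ticks) out of two such lines and returns the difference. The comm field can contain spaces and parentheses, so parsing starts after the last ')'.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract utime and stime (fields 14 and 15 of /proc/PID/stat) from one
 * line of that file.  Returns 0 on success, -1 if the line cannot be
 * parsed.  Hypothetical helper, not taken from the thread.
 */
int parse_stat_cpu_times(const char *line, unsigned long *utime,
			 unsigned long *stime)
{
	const char *p = strrchr(line, ')');	/* end of the comm field */
	char state;

	if (!p)
		return -1;
	/* Field 3 is the state; fields 4..13 are skipped; 14, 15 are kept. */
	if (sscanf(p + 1,
		   " %c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %lu %lu",
		   &state, utime, stime) != 3)
		return -1;
	return 0;
}

/* CPU ticks consumed between two snapshots of the same task,
 * or -1 if either snapshot cannot be parsed. */
long cpu_ticks_between(const char *before, const char *after)
{
	unsigned long u0, s0, u1, s1;

	if (parse_stat_cpu_times(before, &u0, &s0) ||
	    parse_stat_cpu_times(after, &u1, &s1))
		return -1;
	return (long)((u1 + s1) - (u0 + s0));
}
```

A result of 0 for a task that top shows in state R would match the "%CPU = 0.0" observation: the task is marked runnable but is not getting any CPU time.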
Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)
On Friday, 1 of February 2008, Peter Zijlstra wrote:
>
> On Fri, 2008-02-01 at 12:50 +0100, Rafael J. Wysocki wrote:
> > On Friday, 1 of February 2008, Peter Zijlstra wrote:
>
> > > Is arts run as root, or does it use RLIMIT_RTPRIO to allow users to
> > > execute realtime tasks?
> >
> > artswrapper is setuid root and RLIMIT_RTPRIO is apparently not used.
> > Still, artswrapper is running as a regular user, so it most probably drops
> > privileges early.
> >
> > BTW, it fails while running the artsmessage utility used for displaying arts
> > error messages, so I guess there's an error in arts that this thing tries to
> > display and deadlocks (or something like that).
> >
> > Should I test the patch nevertheless?
>
> Don't think that would help any in this situation. The thing to look out
> for are RT tasks running with a different uid than 0.
>
> This patch would only stop a task from obtaining RT class scheduling
> when already in a (misconfigured) group. If the task is RT and then
> switches group another - similar - thing is needed.
>
> Does this artsmessage thing also run with RT priority?

Well, it's in a strange state (after it's broken).  From top:

PR = -51
NI = 0
S = R
%CPU = 0.0
%MEM = 0.0

Here's the corresponding trace from sysrq+t:

artswrapper   R  running task     5128  5776      1
 81007a8dbd88 0046 00015c4321b0 81006aa6e5c8
 806daa00 806daa00 806daa00 806daa00
 806daa00 806daa00 806d7a60 806daa00
Call Trace:
 [8022fdb7] ? try_to_wake_up+0x77/0x200
 [8023573d] __cond_resched+0x2d/0x60
 [804ddce1] _cond_resched+0x31/0x40
 [804ddd24] wait_for_common+0x34/0x170
 [8022fdb7] ? try_to_wake_up+0x77/0x200
 [804ddec8] wait_for_completion+0x18/0x20
 [80235aba] sched_exec+0xba/0xf0
 [802b5a64] do_execve+0x64/0x220
 [802097c6] sys_execve+0x46/0x70
 [8020bab7] stub_execve+0x67/0xb0
Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)
On Fri, 2008-02-01 at 12:50 +0100, Rafael J. Wysocki wrote:
> On Friday, 1 of February 2008, Peter Zijlstra wrote:

> > Is arts run as root, or does it use RLIMIT_RTPRIO to allow users to
> > execute realtime tasks?
>
> artswrapper is setuid root and RLIMIT_RTPRIO is apparently not used.
> Still, artswrapper is running as a regular user, so it most probably drops
> privileges early.
>
> BTW, it fails while running the artsmessage utility used for displaying arts
> error messages, so I guess there's an error in arts that this thing tries to
> display and deadlocks (or something like that).
>
> Should I test the patch nevertheless?

Don't think that would help any in this situation. The thing to look out
for are RT tasks running with a different uid than 0.

This patch would only stop a task from obtaining RT class scheduling
when already in a (misconfigured) group. If the task is RT and then
switches group another - similar - thing is needed.

Does this artsmessage thing also run with RT priority?
Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)
On Friday, 1 of February 2008, Peter Zijlstra wrote:
>
> On Fri, 2008-02-01 at 08:44 +0100, Peter Zijlstra wrote:
> > On Fri, 2008-02-01 at 03:04 +0100, Rafael J. Wysocki wrote:
> > > On Friday, 1 of February 2008, Rafael J. Wysocki wrote:
> > > > Hi,
> > > >
> > > > This is related to the problem I reported earlier this week:
> > > > http://lkml.org/lkml/2008/1/30/554
> > > >
> > > > Apparently artswrapper, run by KDE in openSUSE 10.3 with a real time
> > > > priority, is mishandled by the scheduler.  The problem is that after
> > > > the user logs out, artswrapper stays in TASK_RUNNING forever and
> > > > prevents other tasks from being scheduled on the CPU occupied by it.
> > > > In this state it also breaks suspend and hibernation (it cannot be
> > > > frozen).
> > > >
> > > > Since the problem is 100% reproducible on my test boxes, I carried out a
> > > > bisection which turned up the following commit:
> > > >
> > > > commit 6f505b16425a51270058e4a93441fe64de3dd435
> > > > Author: Peter Zijlstra <[EMAIL PROTECTED]>
> > > > Date:   Fri Jan 25 21:08:30 2008 +0100
> > > >
> > > >     sched: rt group scheduling
> > > >
> > > > I'm now checking if the problem disappears after reverting this patch
> > > > (along with a couple of dependent ones).
> > >
> > > Yes, it does.
> > >
> > > Please let me know what I can do to debug it further.
> >
> > Is arts run as root, or does it use RLIMIT_RTPRIO to allow users to
> > execute realtime tasks?

artswrapper is setuid root and RLIMIT_RTPRIO is apparently not used.
Still, artswrapper is running as a regular user, so it most probably drops
privileges early.

BTW, it fails while running the artsmessage utility used for displaying arts
error messages, so I guess there's an error in arts that this thing tries to
display and deadlocks (or something like that).

Should I test the patch nevertheless?

> If the latter, does this help:
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index ba4c880..bb76cbc 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -4563,6 +4563,15 @@ recheck:
>  			return -EPERM;
>  	}
>
> +#ifdef CONFIG_FAIR_GROUP_SCHED
> +	/*
> +	 * Do not allow realtime tasks into groups that have no runtime
> +	 * assigned.
> +	 */
> +	if (rt_policy(policy) && task_group(p)->rt_ratio == 0)
> +		return -EPERM;
> +#endif
> +
>  	retval = security_task_setscheduler(p, policy, param);
>  	if (retval)
>  		return retval;
Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)
On Fri, 2008-02-01 at 08:44 +0100, Peter Zijlstra wrote:
> On Fri, 2008-02-01 at 03:04 +0100, Rafael J. Wysocki wrote:
> > On Friday, 1 of February 2008, Rafael J. Wysocki wrote:
> > > Hi,
> > >
> > > This is related to the problem I reported earlier this week:
> > > http://lkml.org/lkml/2008/1/30/554
> > >
> > > Apparently artswrapper, run by KDE in openSUSE 10.3 with a real time
> > > priority, is mishandled by the scheduler.  The problem is that after
> > > the user logs out, artswrapper stays in TASK_RUNNING forever and
> > > prevents other tasks from being scheduled on the CPU occupied by it.
> > > In this state it also breaks suspend and hibernation (it cannot be
> > > frozen).
> > >
> > > Since the problem is 100% reproducible on my test boxes, I carried out a
> > > bisection which turned up the following commit:
> > >
> > > commit 6f505b16425a51270058e4a93441fe64de3dd435
> > > Author: Peter Zijlstra <[EMAIL PROTECTED]>
> > > Date:   Fri Jan 25 21:08:30 2008 +0100
> > >
> > >     sched: rt group scheduling
> > >
> > > I'm now checking if the problem disappears after reverting this patch
> > > (along with a couple of dependent ones).
> >
> > Yes, it does.
> >
> > Please let me know what I can do to debug it further.
>
> Is arts run as root, or does it use RLIMIT_RTPRIO to allow users to
> execute realtime tasks?

If the latter, does this help:

diff --git a/kernel/sched.c b/kernel/sched.c
index ba4c880..bb76cbc 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4563,6 +4563,15 @@ recheck:
 			return -EPERM;
 	}
 
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	/*
+	 * Do not allow realtime tasks into groups that have no runtime
+	 * assigned.
+	 */
+	if (rt_policy(policy) && task_group(p)->rt_ratio == 0)
+		return -EPERM;
+#endif
+
 	retval = security_task_setscheduler(p, policy, param);
 	if (retval)
 		return retval;
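The check that this patch adds can be modelled in user space to see which requests it would refuse. The sketch below is an illustration only, not kernel code: struct task_group_model and check_rt_group() are made-up names, and rt_policy() here is a plain re-implementation of the test for SCHED_FIFO or SCHED_RR. Only the rt_ratio field that the patch inspects is modelled.

```c
#include <errno.h>
#include <sched.h>

/* Hypothetical stand-in for the kernel's task group; only the field the
 * patch looks at is modelled. */
struct task_group_model {
	unsigned long rt_ratio;	/* realtime share; 0 means none assigned */
};

/* Is this one of the realtime scheduling policies? */
static int rt_policy(int policy)
{
	return policy == SCHED_FIFO || policy == SCHED_RR;
}

/* Model of the added check: refuse a realtime policy for a task whose
 * group has no realtime runtime assigned. */
int check_rt_group(int policy, const struct task_group_model *tg)
{
	if (rt_policy(policy) && tg->rt_ratio == 0)
		return -EPERM;
	return 0;
}
```

In this model a SCHED_FIFO or SCHED_RR request against a group with rt_ratio == 0 yields -EPERM, while SCHED_OTHER requests and groups with a non-zero ratio pass through to whatever checks follow.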
Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)
On Fri, 2008-02-01 at 03:04 +0100, Rafael J. Wysocki wrote:
> On Friday, 1 of February 2008, Rafael J. Wysocki wrote:
> > Hi,
> >
> > This is related to the problem I reported earlier this week:
> > http://lkml.org/lkml/2008/1/30/554
> >
> > Apparently artswrapper, run by KDE in openSUSE 10.3 with a real time
> > priority, is mishandled by the scheduler.  The problem is that after the
> > user logs out, artswrapper stays in TASK_RUNNING forever and prevents
> > other tasks from being scheduled on the CPU occupied by it.  In this
> > state it also breaks suspend and hibernation (it cannot be frozen).
> >
> > Since the problem is 100% reproducible on my test boxes, I carried out a
> > bisection which turned up the following commit:
> >
> > commit 6f505b16425a51270058e4a93441fe64de3dd435
> > Author: Peter Zijlstra <[EMAIL PROTECTED]>
> > Date:   Fri Jan 25 21:08:30 2008 +0100
> >
> >     sched: rt group scheduling
> >
> > I'm now checking if the problem disappears after reverting this patch
> > (along with a couple of dependent ones).
>
> Yes, it does.
>
> Please let me know what I can do to debug it further.

Is arts run as root, or does it use RLIMIT_RTPRIO to allow users to
execute realtime tasks?
Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)
On Friday, 1 of February 2008, Rafael J. Wysocki wrote:
> Hi,
>
> This is related to the problem I reported earlier this week:
> http://lkml.org/lkml/2008/1/30/554
>
> Apparently artswrapper, run by KDE in openSUSE 10.3 with a real time priority,
> is mishandled by the scheduler.  The problem is that after the user logs out,
> artswrapper stays in TASK_RUNNING forever and prevents other tasks from being
> scheduled on the CPU occupied by it.  In this state it also breaks suspend and
> hibernation (it cannot be frozen).
>
> Since the problem is 100% reproducible on my test boxes, I carried out a
> bisection which turned up the following commit:
>
> commit 6f505b16425a51270058e4a93441fe64de3dd435
> Author: Peter Zijlstra <[EMAIL PROTECTED]>
> Date:   Fri Jan 25 21:08:30 2008 +0100
>
>     sched: rt group scheduling
>
> I'm now checking if the problem disappears after reverting this patch (along
> with a couple of dependent ones).

Yes, it does.

Please let me know what I can do to debug it further.

Thanks,
Rafael
[Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)
Hi,

This is related to the problem I reported earlier this week:
http://lkml.org/lkml/2008/1/30/554

Apparently artswrapper, run by KDE in openSUSE 10.3 with a real time priority,
is mishandled by the scheduler.  The problem is that after the user logs out,
artswrapper stays in TASK_RUNNING forever and prevents other tasks from being
scheduled on the CPU occupied by it.  In this state it also breaks suspend and
hibernation (it cannot be frozen).

Since the problem is 100% reproducible on my test boxes, I carried out a
bisection which turned up the following commit:

commit 6f505b16425a51270058e4a93441fe64de3dd435
Author: Peter Zijlstra <[EMAIL PROTECTED]>
Date:   Fri Jan 25 21:08:30 2008 +0100

    sched: rt group scheduling

I'm now checking if the problem disappears after reverting this patch (along
with a couple of dependent ones).

Thanks,
Rafael