Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)

2008-02-10 Thread Rafael J. Wysocki
On Thursday, 7 of February 2008, Ingo Molnar wrote:
> 
> * Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> 
> > > http://programming.kicks-ass.net/kernel-patches/sched-rt-group/
> > > 
> > > on top of sched-devel.
> > 
> > Indeed, with these patches applied the issue is not reproducible any 
> > more.
> 
> great! I've queued Peter's fixes and enhancements up in sched-devel. 
> (not pushed out yet)

Could you please tell me what's happening to these patches?

The regression is still present in the current mainline and it would be a pity
if -rc1 went out with it.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)

2008-02-07 Thread Peter Zijlstra

On Thu, 2008-02-07 at 20:53 +0100, Rafael J. Wysocki wrote:
> On Thursday, 7 of February 2008, Ingo Molnar wrote:
> > 
> > * Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> > 
> > > > http://programming.kicks-ass.net/kernel-patches/sched-rt-group/
> > > > 
> > > > on top of sched-devel.
> > > 
> > > Indeed, with these patches applied the issue is not reproducible any 
> > > more.
> > 
> > great! I've queued Peter's fixes and enhancements up in sched-devel. 
> > (not pushed out yet)
> 
> However, these patches break my second testbed (dual-core AMD desktop),
> with the attached config.
> 
> It crashes on boot with the following trace:

Ah, that is the last patch - which I didn't intend to push out. Let me
remove that - it's me fighting the group scheduling latencies reported.

It's not lined up for merging.



Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)

2008-02-07 Thread Rafael J. Wysocki
On Thursday, 7 of February 2008, Ingo Molnar wrote:
> 
> * Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> 
> > > http://programming.kicks-ass.net/kernel-patches/sched-rt-group/
> > > 
> > > on top of sched-devel.
> > 
> > Indeed, with these patches applied the issue is not reproducible any 
> > more.
> 
> great! I've queued Peter's fixes and enhancements up in sched-devel. 
> (not pushed out yet)

However, these patches break my second testbed (dual-core AMD desktop),
with the attached config.

It crashes on boot with the following trace:

Using local APIC timer interrupts.
Detected 12.500 MHz APIC timer.
lockdep: fixing up alternatives.
Booting processor 1/2 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 4400.18 BogoMIPS (lpj=8800361)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ stepping 01
Brought up 2 CPUs
divide error:  [1] SMP
CPU 0
Modules linked in:
Pid: 2, comm: kthreadd Not tainted 2.6.24 #41
RIP: 0010:[8022a024]  [8022a024] place_entity+0xa4/0xc0
RSP: :81007f885d20  EFLAGS: 00010046
RAX: 000321067800 RBX: 81007f8d22b8 RCX: 
RDX:  RSI: 00c8419e RDI: 0c31
RBP: 81007f885d30 R08:  R09: 0001
R10: cf3cf3cf3cf3cf3d R11:  R12: fff0
R13:  R14:  R15: 810002c32300
FS:  () GS:8067f000() knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2:  CR3: 00201000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process kthreadd (pid: 2, threadinfo 81007f884000, task 81007f882080)
Stack:  81007f8d2280 810002c2e880 81007f885d70 802340b3
 81007f885d70 81007f8d2280 810002c32300 0009
   81007f885da0 80234afc
Call Trace:
 [802340b3] task_new_fair+0x83/0xf0
 [80234afc] wake_up_new_task+0xac/0xc0
 [802372ee] do_fork+0x1fe/0x320
 [8025d641] ? __lock_acquire+0x251/0x1100
 [8020c531] kernel_thread+0x81/0xde
 [8024eb00] ? kthread+0x0/0x80
 [8020c58e] ? child_rip+0x0/0x12
 [8024ecd4] ? kthreadd+0x154/0x190
 [8020c598] child_rip+0xa/0x12
 [8020bcaf] ? restore_args+0x0/0x31
 [8024eb80] ? kthreadd+0x0/0x190
 [8020c58e] ? child_rip+0x0/0x12


Code: 0f 1f 80 00 00 00 00 48 8b 81 30 01 00 00 48 39 cb 48 8b 10 74 2c 48 89 
f0 89 d2 48 8b 89 28 01 00 00 48 c1 e0 0a 49 89 d0 31 d2 <49> f7 f0 48 8
RIP  [8022a024] place_entity+0xa4/0xc0
 RSP 81007f885d20
---[ end trace ca143223eefdc828 ]---
kthreadd used greatest stack depth: 5424 bytes left

where

(gdb) l *place_entity+0xa4
0x8022a024 is in place_entity 
(/home/rafael/src/linux-2.6/kernel/sched_fair.c:390).
385 weight = cfs_rq->load.weight;
386 if (se == new)
387 weight += new->load.weight;
388
389 vslice *= NICE_0_LOAD;
390 do_div(vslice, weight);
391 }
392
393 return vslice;
394 }
(gdb)

so it looks like "weight" is 0.  Peter?
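
To illustrate the failure mode: on x86 an integer division by zero raises the "divide error" exception seen in the trace. The following is only a minimal user-space sketch of the scaling step from the listing above, with a guard added. It is not the kernel code and not the fix that was applied; scale_vslice() and its fall-back behaviour are hypothetical, and NICE_0_LOAD is assumed to be 1024 (1 << 10).

```c
#include <stdint.h>

/* Assumption: value of the kernel's NICE_0_LOAD at the time (1 << 10). */
#define NICE_0_LOAD 1024ULL

/*
 * Hypothetical model of the step
 *     vslice = vslice * NICE_0_LOAD / weight
 * Returns vslice unscaled when weight is 0 instead of dividing.
 */
static uint64_t scale_vslice(uint64_t vslice, unsigned long weight)
{
    if (weight == 0)
        return vslice;  /* fall-back chosen for illustration only */

    return vslice * NICE_0_LOAD / weight;
}
```

With a weight equal to NICE_0_LOAD the slice is unchanged, with twice that weight it is halved, and a zero weight no longer traps.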

Thanks,
Rafael
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.24
# Thu Feb  7 19:08:31 2008
#
CONFIG_64BIT=y
# CONFIG_X86_32 is not set
CONFIG_X86_64=y
CONFIG_X86=y
# CONFIG_GENERIC_LOCKBREAK is not set
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
# CONFIG_QUICKLIST is not set
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
# CONFIG_GENERIC_GPIO is not set
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
# CONFIG_RWSEM_XCHGADD_ALGORITHM is not set
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ZONE_DMA32=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_X86_SMP=y
CONFIG_X86_64_SMP=y
CONFIG_X86_TRAMPOLINE=y
# CONFIG_KTIME_SCALAR is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
# CONFIG_TASKSTATS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_TREE=y
CONFIG_IKCONFIG=y

Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)

2008-02-06 Thread Ingo Molnar

* Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:

> > http://programming.kicks-ass.net/kernel-patches/sched-rt-group/
> > 
> > on top of sched-devel.
> 
> Indeed, with these patches applied the issue is not reproducible any 
> more.

great! I've queued Peter's fixes and enhancements up in sched-devel. 
(not pushed out yet)

Ingo


Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)

2008-02-06 Thread Rafael J. Wysocki
On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
> 
> On Wed, 2008-02-06 at 23:18 +0100, Rafael J. Wysocki wrote:
> > On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
> > > 
> > > On Wed, 2008-02-06 at 22:50 +0100, Rafael J. Wysocki wrote:
> > > > On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
> > > 
> > > > > Well, that whole queue.
> > > > 
> > > > It doesn't compile for me.
> > > 
> > > I did solve some compile issues since posting, Ingo should have the
> > > compiling version in sched-devel soonish (don't know if he pushed it
> > > already).
> > 
> > Can you point me to the cleaned up version, please?
> 
> http://programming.kicks-ass.net/kernel-patches/sched-rt-group/
> 
> on top of sched-devel.

Indeed, with these patches applied the issue is not reproducible any more.

Thanks,
Rafael


Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)

2008-02-06 Thread Peter Zijlstra

On Wed, 2008-02-06 at 23:18 +0100, Rafael J. Wysocki wrote:
> On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
> > 
> > On Wed, 2008-02-06 at 22:50 +0100, Rafael J. Wysocki wrote:
> > > On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
> > 
> > > > Well, that whole queue.
> > > 
> > > It doesn't compile for me.
> > 
> > I did solve some compile issues since posting, Ingo should have the
> > compiling version in sched-devel soonish (don't know if he pushed it
> > already).
> 
> Can you point me to the cleaned up version, please?

http://programming.kicks-ass.net/kernel-patches/sched-rt-group/

on top of sched-devel.

> > > > Your test program just failed to obtain realtime scheduling
> > > 
> > > Well, it shouldn't.  The expected result is to obtain realtime scheduling
> > > or we will break existing setups.
> > 
> > That's a case of wrong expectations in my book. You enabled group
> > scheduling and hence behaviour changes.
> 
> So, I'd have to unset FAIR_GROUP_SCHED to obtain the previous behavior?

With the mainline stuff, yes; with the new stuff, just ensure
CONFIG_RT_GROUP_SCHED=n.

> > There is just nothing much one can do about it, if you don't assign
> > bandwidth to a group, it won't be able to run anything. Better to refuse
> > to run, than to sit idle, right?
> 
> As a general rule, probably yes.
> 
> > But I appreciate the situation, therefore I made the whole rt-group
> > scheduling a separate .config option (which defaults to n)
> 
> Which is introduced by the new patches, isn't it?

Yes.



Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)

2008-02-06 Thread Rafael J. Wysocki
On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
> 
> On Wed, 2008-02-06 at 22:50 +0100, Rafael J. Wysocki wrote:
> > On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
> 
> > > Well, that whole queue.
> > 
> > It doesn't compile for me.
> 
> I did solve some compile issues since posting, Ingo should have the
> compiling version in sched-devel soonish (don't know if he pushed it
> already).

Can you point me to the cleaned up version, please?

> > > Your test program just failed to obtain realtime scheduling
> > 
> > Well, it shouldn't.  The expected result is to obtain realtime scheduling
> > or we will break existing setups.
> 
> That's a case of wrong expectations in my book. You enabled group
> scheduling and hence behaviour changes.

So, I'd have to unset FAIR_GROUP_SCHED to obtain the previous behavior?

> There is just nothing much one can do about it, if you don't assign bandwidth
> to a group, it won't be able to run anything. Better to refuse to run, than
> to sit idle, right?

As a general rule, probably yes.

> But I appreciate the situation, therefore I made the whole rt-group
> scheduling a separate .config option (which defaults to n)

Which is introduced by the new patches, isn't it?


Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)

2008-02-06 Thread Peter Zijlstra

On Wed, 2008-02-06 at 22:50 +0100, Rafael J. Wysocki wrote:
> On Wednesday, 6 of February 2008, Peter Zijlstra wrote:

> > Well, that whole queue.
> 
> It doesn't compile for me.

I did solve some compile issues since posting, Ingo should have the
compiling version in sched-devel soonish (don't know if he pushed it
already).

> > Your test program just failed to obtain realtime scheduling
> 
> Well, it shouldn't.  The expected result is to obtain realtime scheduling
> or we will break existing setups.

That's a case of wrong expectations in my book. You enabled group
scheduling and hence behaviour changes. There is just nothing much one
can do about it, if you don't assign bandwidth to a group, it won't be
able to run anything. Better to refuse to run, than to sit idle, right?

But I appreciate the situation, therefore I made the whole rt-group
scheduling a separate .config option (which defaults to n)
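
The policy described above (refuse a realtime task when its group has no realtime bandwidth, rather than accept it and never run it) can be sketched as a toy model. This is an illustration only, not code from the patch queue: struct rt_group, rt_group_admit() and the choice of -EPERM as the error are all assumptions.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a task group's realtime budget. */
struct rt_group {
    uint64_t rt_runtime_us;  /* realtime bandwidth assigned; 0 means none */
};

/*
 * Returns 0 if a realtime task may join the group, -EPERM otherwise.
 * The error code is an assumption made for this sketch.
 */
static int rt_group_admit(const struct rt_group *grp)
{
    if (grp->rt_runtime_us == 0)
        return -EPERM;  /* refuse up front rather than sit idle forever */

    return 0;
}
```

The caller sees the refusal immediately and can fall back to a non-realtime policy, which matches the "failed to obtain realtime scheduling but didn't hang" outcome reported earlier in the thread.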



Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)

2008-02-06 Thread Rafael J. Wysocki
On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
> 
> On Wed, 2008-02-06 at 19:25 +0100, Rafael J. Wysocki wrote:
> > On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
> > > 
> > > On Wed, 2008-02-06 at 09:40 +0100, Dmitry Adamushko wrote:
> > > > On 06/02/2008, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> > > > > On Tuesday, 5 of February 2008, Dmitry Adamushko wrote:
> > > > > > Rafael, any progress with this issue? (a few questions below).
> > > > > >
> > > > > > > >
> > > > > > > > Does this artsmessage thing also run with RT priority?
> > > > > > >
> > > > > > > Well, it's in a strange state (after it's broken).  From top:
> > > > > > >
> > > > > > > PR = -51
> > > > > > > NI = 0
> > > > > > > S = R
> > > > > > > %CPU = 0.0
> > > > > > > %MEM = 0.0
> > > > > >
> > > > > > cat /proc/$PID/stat ; sleep 3; cat /proc/$PID/stat ?
> > > > > > cat /proc/sched_debug; sleep 3 ; cat /proc/sched_debug
> > > > >
> > > > > Well, instead please find appended a test program that allows me to
> > > > > trigger the issue.
> > > > 
> > > > Great, I'll look at this problem in the evening (sure, if nobody else
> > > > is faster :-).
> > > 
> > > Yeah, it seems fixed here (after I made my current queue compile for
> > > this silly CONFIG_USER_SCHED thing again).
> > > 
> > > I'm now refusing realtime tasks in groups that do not have real-time
> > > bandwidth assigned.
> > 
> > If you're referring to this patch: http://lkml.org/lkml/2008/2/4/332 , then
> > sorry, but it doesn't fix the issue for me, with the attached config.
> 
> Well, that whole queue.

It doesn't compile for me.

> Your test program just failed to obtain realtime scheduling

Well, it shouldn't.  The expected result is to obtain realtime scheduling
or we will break existing setups.

> but didn't hang. 

That's a much better outcome. :-)

> I'll test your full config. 

OK, thanks.
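
For reference, whether a kernel grants realtime scheduling can be probed from user space with sched_setscheduler(). The sketch below is not the test program mentioned in this thread (which is not reproduced here); probe_fifo() is a hypothetical helper, and it assumes the caller was running under SCHED_OTHER before the probe.

```c
#include <errno.h>
#include <sched.h>
#include <string.h>

/*
 * Ask for SCHED_FIFO at the given priority for the calling process.
 * Returns 0 if the kernel granted it, or the errno value if the
 * request was refused.
 */
static int probe_fifo(int priority)
{
    struct sched_param param;

    memset(&param, 0, sizeof(param));
    param.sched_priority = priority;

    if (sched_setscheduler(0, SCHED_FIFO, &param) != 0)
        return errno;

    /* Granted: switch back to SCHED_OTHER (assumed to be the prior policy). */
    param.sched_priority = 0;
    (void)sched_setscheduler(0, SCHED_OTHER, &param);
    return 0;
}
```

A non-zero return means the request was refused (for example for lack of privileges); the important property is that the call returns at all instead of leaving the task runnable but never scheduled.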


Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)

2008-02-06 Thread Peter Zijlstra

On Wed, 2008-02-06 at 19:25 +0100, Rafael J. Wysocki wrote:
> On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
> > 
> > On Wed, 2008-02-06 at 09:40 +0100, Dmitry Adamushko wrote:
> > > On 06/02/2008, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> > > > On Tuesday, 5 of February 2008, Dmitry Adamushko wrote:
> > > > > Rafael, any progress with this issue? (a few questions below).
> > > > >
> > > > > > >
> > > > > > > Does this artsmessage thing also run with RT priority?
> > > > > >
> > > > > > Well, it's in a strange state (after it's broken).  From top:
> > > > > >
> > > > > > PR = -51
> > > > > > NI = 0
> > > > > > S = R
> > > > > > %CPU = 0.0
> > > > > > %MEM = 0.0
> > > > >
> > > > > cat /proc/$PID/stat ; sleep 3; cat /proc/$PID/stat ?
> > > > > cat /proc/sched_debug; sleep 3 ; cat /proc/sched_debug
> > > >
> > > > Well, instead please find appended a test program that allows me to
> > > > trigger the issue.
> > > 
> > > Great, I'll look at this problem in the evening (sure, if nobody else
> > > is faster :-).
> > 
> > Yeah, it seems fixed here (after I made my current queue compile for
> > this silly CONFIG_USER_SCHED thing again).
> > 
> > I'm now refusing realtime tasks in groups that do not have real-time
> > bandwidth assigned.
> 
> If you're referring to this patch: http://lkml.org/lkml/2008/2/4/332 , then
> sorry, but it doesn't fix the issue for me, with the attached config.

Well, that whole queue. Your test program just failed to obtain realtime
scheduling but didn't hang. I'll test your full config.




Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)

2008-02-06 Thread Peter Zijlstra

On Wed, 2008-02-06 at 19:25 +0100, Rafael J. Wysocki wrote:
 On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
  
  On Wed, 2008-02-06 at 09:40 +0100, Dmitry Adamushko wrote:
   On 06/02/2008, Rafael J. Wysocki [EMAIL PROTECTED] wrote:
On Tuesday, 5 of February 2008, Dmitry Adamushko wrote:
 Rafael, any progress with this issue? (a few questions below).

  
   Does this artsmessage thing also run with RT priority?
 
  Well, it's in a strange state (after it's broken).  From top:
 
  PR = -51
  NI = 0
  S = R
  %CPU = 0.0
  %MEM = 0.0

 cat /proc/$PID/stat ; sleep 3; cat /proc/$PID/stat ?
 cat /proc/sched_debug; sleep 3 ; cat /proc/sched_debug
   
Well, instead please find appended a test program that allows me to 
trigger
the issue.
   
   Great, I'll look at this problem in the everning (sure, if nobody else
   is faster :-).
  
  Yeah, it seems fixed here (after I made my current queue compile for
  this silly CONFIG_USER_SCHED thing again).
  
  I'm now refusing realtime tasks in groups that do not have real-time
  bandwidth assigned.
 
 If you're referring to this patch: http://lkml.org/lkml/2008/2/4/332 , then
 sorry, but it doesn't fix the issue for me, with the attached config.

Well, that whole queue. Your test program just failed to obtain realtime
scheduling but didn't hang. I'll test your full config.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)

2008-02-06 Thread Rafael J. Wysocki
On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
 
 On Wed, 2008-02-06 at 19:25 +0100, Rafael J. Wysocki wrote:
  On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
   
   On Wed, 2008-02-06 at 09:40 +0100, Dmitry Adamushko wrote:
On 06/02/2008, Rafael J. Wysocki [EMAIL PROTECTED] wrote:
 On Tuesday, 5 of February 2008, Dmitry Adamushko wrote:
  Rafael, any progress with this issue? (a few questions below).
 
   
Does this artsmessage thing also run with RT priority?
  
   Well, it's in a strange state (after it's broken).  From top:
  
   PR = -51
   NI = 0
   S = R
   %CPU = 0.0
   %MEM = 0.0
 
  cat /proc/$PID/stat ; sleep 3; cat /proc/$PID/stat ?
  cat /proc/sched_debug; sleep 3 ; cat /proc/sched_debug

 Well, instead please find appended a test program that allows me to 
 trigger
 the issue.

Great, I'll look at this problem in the everning (sure, if nobody else
is faster :-).
   
   Yeah, it seems fixed here (after I made my current queue compile for
   this silly CONFIG_USER_SCHED thing again).
   
   I'm now refusing realtime tasks in groups that do not have real-time
   bandwidth assigned.
  
  If you're referring to this patch: http://lkml.org/lkml/2008/2/4/332 , then
  sorry, but it doesn't fix the issue for me, with the attached config.
 
 Well, that whole queue.

It doesn't compile for me.

 Your test program just failed to obtain realtime scheduling

Well, it shouldn't.  The expected result is to obtain realtime scheduling
or we will break existing setups.

 but didn't hang. 

That's much better outcome. :-)

 I'll test your full config. 

OK, thanks.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)

2008-02-06 Thread Peter Zijlstra

On Wed, 2008-02-06 at 22:50 +0100, Rafael J. Wysocki wrote:
 On Wednesday, 6 of February 2008, Peter Zijlstra wrote:

  Well, that whole queue.
 
 It doesn't compile for me.

I did solve some compile issues since posting, Ingo should have the
compiling version in sched-devel soonish (don't know if he pushed it
already).

  Your test program just failed to obtain realtime scheduling
 
 Well, it shouldn't.  The expected result is to obtain realtime scheduling
 or we will break existing setups.

Thats a case of wrong expectations in my book. You enabled group
scheduling and hence behaviour changes. There is just nothing much one
can do about it, if you don't assign bandwidth to a group, it won't be
able to run anything. Better to refuse to run, than to sit idle, right?

But I appreciate the situation, therefore I made the whole rt-group
scheduling a separate .config option (which defaults to n)

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)

2008-02-06 Thread Rafael J. Wysocki
On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
 
 On Wed, 2008-02-06 at 22:50 +0100, Rafael J. Wysocki wrote:
  On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
 
   Well, that whole queue.
  
  It doesn't compile for me.
 
 I did solve some compile issues since posting, Ingo should have the
 compiling version in sched-devel soonish (don't know if he pushed it
 already).

Can you point me to the cleaned up version, please?

   Your test program just failed to obtain realtime scheduling
  
  Well, it shouldn't.  The expected result is to obtain realtime scheduling
  or we will break existing setups.
 
 Thats a case of wrong expectations in my book. You enabled group
 scheduling and hence behaviour changes.

So, I'd have to unset FAIR_GROUP_SCHED to obtain the previous behavior?

 There is just nothing much one can do about it, if you don't assign bandwidth
 to a group, it won't be able to run anything. Better to refuse to run, than 
 to sit
 idle, right? 

As a general rule, probably yes.

 But I appreciate the situation, therefore I made the whole rt-group
 scheduling a separate .config option (which defaults to n)

Which is introduced by the new patches, isn't it?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)

2008-02-06 Thread Peter Zijlstra

On Wed, 2008-02-06 at 23:18 +0100, Rafael J. Wysocki wrote:
 On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
  
  On Wed, 2008-02-06 at 22:50 +0100, Rafael J. Wysocki wrote:
   On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
  
Well, that whole queue.
   
   It doesn't compile for me.
  
  I did solve some compile issues since posting, Ingo should have the
  compiling version in sched-devel soonish (don't know if he pushed it
  already).
 
 Can you point me to the cleaned up version, please?

http://programming.kicks-ass.net/kernel-patches/sched-rt-group/

on top of sched-devel.

Your test program just failed to obtain realtime scheduling
   
   Well, it shouldn't.  The expected result is to obtain realtime scheduling
   or we will break existing setups.
  
  Thats a case of wrong expectations in my book. You enabled group
  scheduling and hence behaviour changes.
 
 So, I'd have to unset FAIR_GROUP_SCHED to obtain the previous behavior?

With the mainline stuff, with the new stuff just ensure
CONFIG_RT_GROUP_SCHED=n.

  There is just nothing much one can do about it, if you don't assign 
  bandwidth
  to a group, it won't be able to run anything. Better to refuse to run, than 
  to sit
  idle, right? 
 
 As a general rule, probably yes.
 
  But I appreciate the situation, therefore I made the whole rt-group
  scheduling a separate .config option (which defaults to n)
 
 Which is introduced by the new patches, isn't it?

Yes.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)

2008-02-06 Thread Rafael J. Wysocki
On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
> 
> On Wed, 2008-02-06 at 23:18 +0100, Rafael J. Wysocki wrote:
> > On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
> > > 
> > > On Wed, 2008-02-06 at 22:50 +0100, Rafael J. Wysocki wrote:
> > > > On Wednesday, 6 of February 2008, Peter Zijlstra wrote:
> > > 
> > > > > Well, that whole queue.
> > > > 
> > > > It doesn't compile for me.
> > > 
> > > I did solve some compile issues since posting, Ingo should have the
> > > compiling version in sched-devel soonish (don't know if he pushed it
> > > already).
> > 
> > Can you point me to the cleaned up version, please?
> 
> http://programming.kicks-ass.net/kernel-patches/sched-rt-group/
> 
> on top of sched-devel.

Indeed, with these patches applied the issue is not reproducible any more.

Thanks,
Rafael


Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)

2008-02-06 Thread Ingo Molnar

* Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:

> > http://programming.kicks-ass.net/kernel-patches/sched-rt-group/
> > 
> > on top of sched-devel.
> 
> Indeed, with these patches applied the issue is not reproducible any 
> more.

great! I've queued Peter's fixes and enhancements up in sched-devel. 
(not pushed out yet)

Ingo


Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)

2008-02-05 Thread Rafael J. Wysocki
On Tuesday, 5 of February 2008, Dmitry Adamushko wrote:
> Rafael, any progress with this issue? (a few questions below).
> 
> > >
> > > Does this artsmessage thing also run with RT priority?
> >
> > Well, it's in a strange state (after it's broken).  From top:
> >
> > PR = -51
> > NI = 0
> > S = R
> > %CPU = 0.0
> > %MEM = 0.0
> 
> cat /proc/$PID/stat ; sleep 3; cat /proc/$PID/stat ?
> cat /proc/sched_debug; sleep 3 ; cat /proc/sched_debug

Well, instead please find appended a test program that allows me to trigger
the issue.

To reproduce it do:

$ gcc -o break_scheduler break_scheduler.c
$ su -
[...]
# chown root.root $PATH_TO_BINARY/break_scheduler
# chmod u+s $PATH_TO_BINARY/break_scheduler
^D
$ ./break_scheduler

It behaves normally if run directly by root and it also behaves normally if
the execv() at the end is removed.

Hope that helps to understand what the problem is.

Thanks,
Rafael

---
#include <stdio.h>
#include <sys/stat.h>
#include <sys/resource.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <stdlib.h>
#include <sched.h>

#define EXECUTE "/bin/ls"

void adjust_priority()
{
	int sched = sched_getscheduler(0);

	if(sched == SCHED_FIFO || sched == SCHED_RR) {
		puts(">> non-standard scheduling policy");
	} else {
		struct sched_param sp;
		long int priority = (sched_get_priority_max(SCHED_FIFO) +
				     sched_get_priority_min(SCHED_FIFO))/2;

		sp.sched_priority = priority;

		if (sched_setscheduler(0, SCHED_FIFO, &sp) != -1) {
			printf(">> running as realtime process now "
				"(priority %ld)\n", priority);
		} else {
			/* can't set realtime priority */
			puts(">> could not set realtime priority");
		}
	}
}

int main(int argc, char **argv)
{
	adjust_priority();

	/* drop root privileges if running setuid root
	   (due to realtime priority stuff) */
	if (geteuid() != getuid()) {
		seteuid(getuid());
		if (geteuid() != getuid()) {
			perror("setuid()");
			return 2;
		}
	}

	puts("OK");

	if(argc == 0)
		return 1;

	argv[0] = EXECUTE;
	execv(EXECUTE, argv);
	perror(EXECUTE);

	return 0;
}


Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)

2008-02-05 Thread Dmitry Adamushko
Rafael, any progress with this issue? (a few questions below).

> >
> > Does this artsmessage thing also run with RT priority?
>
> Well, it's in a strange state (after it's broken).  From top:
>
> PR = -51
> NI = 0
> S = R
> %CPU = 0.0
> %MEM = 0.0

cat /proc/$PID/stat ; sleep 3; cat /proc/$PID/stat ?
cat /proc/sched_debug; sleep 3 ; cat /proc/sched_debug

and also cat /proc/$PID/stat taken when this task still looks 'sane' :-/
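
The two-samples-three-seconds-apart check above can also be done programmatically. What follows is only a sketch: the helper names parse_stat_line() and read_stat_line() are made up for illustration, and it assumes the usual /proc/<pid>/stat layout in which state is field 3, utime field 14 and stime field 15.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/<pid>/stat line. The command name may contain spaces
 * or parentheses, so fields are counted from the last ')'.
 * Fills state (field 3), utime (field 14) and stime (field 15).
 * Returns 0 on success, -1 if the line cannot be parsed.
 */
int parse_stat_line(const char *line, char *state,
		    unsigned long *utime, unsigned long *stime)
{
	const char *p = strrchr(line, ')');

	if (!p)
		return -1;

	/* After ')': state, then 10 fields to skip, then utime and stime. */
	if (sscanf(p + 1, " %c %*s %*s %*s %*s %*s %*s %*s %*s %*s %*s %lu %lu",
		   state, utime, stime) != 3)
		return -1;
	return 0;
}

/* Read /proc/<pid>/stat into buf; returns 0 on success, -1 on error. */
int read_stat_line(int pid, char *buf, size_t len)
{
	char path[64];
	FILE *f;
	int ok;

	snprintf(path, sizeof(path), "/proc/%d/stat", pid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fgets(buf, (int)len, f) != NULL;
	fclose(f);
	return ok ? 0 : -1;
}
```

Calling read_stat_line() twice with a sleep(3) in between and comparing utime + stime would show whether a task that reports state R is actually accumulating CPU time.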

Do you mean that only SCHED_NORMAL tasks can't run on this cpu or RT
tasks of high prio as well (so that maybe the scheduler is in
inconsistent state) ?
If the latter, the watchdog should have triggered after a while (if
enabled), I guess.

>
> Here's the corresponding trace from sysrq+t:
>
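>  [<ffffffff8022fdb7>] ? try_to_wake_up+0x77/0x200
>  [<ffffffff8023573d>] __cond_resched+0x2d/0x60
>  [<ffffffff804ddce1>] _cond_resched+0x31/0x40
>  [<ffffffff804ddd24>] wait_for_common+0x34/0x170
>  [<ffffffff8022fdb7>] ? try_to_wake_up+0x77/0x200
>  [<ffffffff804ddec8>] wait_for_completion+0x18/0x20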
> [ ... ]

does a stack trace always look like this?


-- 
Best regards,
Dmitry Adamushko


Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)

2008-02-01 Thread Rafael J. Wysocki
On Friday, 1 of February 2008, Peter Zijlstra wrote:
> 
> On Fri, 2008-02-01 at 12:50 +0100, Rafael J. Wysocki wrote:
> > On Friday, 1 of February 2008, Peter Zijlstra wrote:
> 
> > > > Is arts run as root, or does it use RLIMIT_RTPRIO to allow users to
> > > > execute realtime tasks?
> > 
> > artswrapper is setuid root and RLIMIT_RTPRIO is apparently not used.
> > Still, artswrapper is running as a regular user, so it most probably drops
> > privileges early.
> > 
> > BTW, it fails while running the artsmessage utility used for displaying arts
> > error messages, so I guess there's an error in arts that this thing tries to
> > display and deadlocks (or something like that).
> > 
> > Should I test the patch nevertheless?
> 
> Don't think that would help any in this situation. The thing to look out
> for are RT tasks running with a different uid than 0.
> 
> This patch would only stop a task from obtaining RT class scheduling
> when already in a (misconfigured) group. If the task is RT and then
> switches group another - similar - thing is needed.
> 
> Does this artsmessage thing also run with RT priority?

Well, it's in a strange state (after it's broken).  From top:

PR = -51
NI = 0
S = R
%CPU = 0.0
%MEM = 0.0
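
For reference, PR = -51 is what one would expect for a SCHED_FIFO task at priority 50, the midpoint of the 1..99 range Linux uses for realtime policies and the value requested by the test program posted elsewhere in this thread. The sketch below spells out that arithmetic. The function names are illustrative, and the PR mapping (-1 - rt_priority) is an assumption about how top derives the column; it is not stated anywhere in the thread.

```c
#include <sched.h>

/* Midpoint priority, computed the same way as in the test program. */
int mid_priority(int min, int max)
{
	return (max + min) / 2;
}

/*
 * PR column as top is assumed to print it for a realtime task:
 * -1 - rt_priority.
 */
int top_pr_for_rt_priority(int rt_priority)
{
	return -1 - rt_priority;
}

/* Midpoint of the SCHED_FIFO range on the running system. */
int fifo_mid_priority(void)
{
	return mid_priority(sched_get_priority_min(SCHED_FIFO),
			    sched_get_priority_max(SCHED_FIFO));
}
```

With min = 1 and max = 99 the midpoint is 50, which maps to PR = -51.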

Here's the corresponding trace from sysrq+t:

artswrapper   R  running task 5128  5776  1
 ffff81007a8dbd88 0000000000000046 000000015c4321b0 ffff81006aa6e5c8
 ffffffff806daa00 ffffffff806daa00 ffffffff806daa00 ffffffff806daa00
 ffffffff806daa00 ffffffff806daa00 ffffffff806d7a60 ffffffff806daa00
Call Trace:
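 [<ffffffff8022fdb7>] ? try_to_wake_up+0x77/0x200
 [<ffffffff8023573d>] __cond_resched+0x2d/0x60
 [<ffffffff804ddce1>] _cond_resched+0x31/0x40
 [<ffffffff804ddd24>] wait_for_common+0x34/0x170
 [<ffffffff8022fdb7>] ? try_to_wake_up+0x77/0x200
 [<ffffffff804ddec8>] wait_for_completion+0x18/0x20
 [<ffffffff80235aba>] sched_exec+0xba/0xf0
 [<ffffffff802b5a64>] do_execve+0x64/0x220
 [<ffffffff802097c6>] sys_execve+0x46/0x70
 [<ffffffff8020bab7>] stub_execve+0x67/0xb0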


Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)

2008-02-01 Thread Peter Zijlstra

On Fri, 2008-02-01 at 12:50 +0100, Rafael J. Wysocki wrote:
> On Friday, 1 of February 2008, Peter Zijlstra wrote:

> > > Is arts run as root, or does it use RLIMIT_RTPRIO to allow users to
> > > execute realtime tasks?
> 
> artswrapper is setuid root and RLIMIT_RTPRIO is apparently not used.
> Still, artswrapper is running as a regular user, so it most probably drops
> privileges early.
> 
> BTW, it fails while running the artsmessage utility used for displaying arts
> error messages, so I guess there's an error in arts that this thing tries to
> display and deadlocks (or something like that).
> 
> Should I test the patch nevertheless?

Don't think that would help any in this situation. The thing to look out
for are RT tasks running with a different uid than 0.

This patch would only stop a task from obtaining RT class scheduling
when already in a (misconfigured) group. If the task is RT and then
switches group another - similar - thing is needed.

Does this artsmessage thing also run with RT priority?





Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)

2008-02-01 Thread Rafael J. Wysocki
On Friday, 1 of February 2008, Peter Zijlstra wrote:
> 
> On Fri, 2008-02-01 at 08:44 +0100, Peter Zijlstra wrote:
> > On Fri, 2008-02-01 at 03:04 +0100, Rafael J. Wysocki wrote:
> > > On Friday, 1 of February 2008, Rafael J. Wysocki wrote:
> > > > Hi,
> > > > 
> > > > This is related to the problem I reported earlier this week:
> > > > http://lkml.org/lkml/2008/1/30/554
> > > > 
> > > > Apparently artswrapper, run by KDE in openSUSE 10.3 with a real time priority,
> > > > is mishandled by the scheduler.  The problem is that after the user logs out,
> > > > artswrapper stays in TASK_RUNNING forever and prevents other tasks from being
> > > > scheduled on the CPU occupied by it.  In this state it also breaks suspend and
> > > > hibernation (it cannot be frozen).
> > > > 
> > > > Since the problem is 100% reproducible on my test boxes, I carried out a
> > > > bisection which turned up the following commit:
> > > > 
> > > > commit 6f505b16425a51270058e4a93441fe64de3dd435
> > > > Author: Peter Zijlstra <[EMAIL PROTECTED]>
> > > > Date:   Fri Jan 25 21:08:30 2008 +0100
> > > > 
> > > >     sched: rt group scheduling
> > > > 
> > > > I'm now checking if the problem disappears after reverting this patch (along
> > > > with a couple of dependent ones).
> > > 
> > > Yes, it does.
> > > 
> > > Please let me know what I can do to debug it further.
> > 
> > Is arts run as root, or does it use RLIMIT_RTPRIO to allow users to
> > execute realtime tasks?

artswrapper is setuid root and RLIMIT_RTPRIO is apparently not used.
Still, artswrapper is running as a regular user, so it most probably drops
privileges early.

BTW, it fails while running the artsmessage utility used for displaying arts
error messages, so I guess there's an error in arts that this thing tries to
display and deadlocks (or something like that).

Should I test the patch nevertheless?

> If the latter, does this help:
> 
> diff --git a/kernel/sched.c b/kernel/sched.c
> index ba4c880..bb76cbc 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -4563,6 +4563,15 @@ recheck:
>   return -EPERM;
>   }
>  
> +#ifdef CONFIG_FAIR_GROUP_SCHED
> + /*
> +  * Do not allow realtime tasks into groups that have no runtime
> +  * assigned.
> +  */
> + if (rt_policy(policy) && task_group(p)->rt_ratio == 0)
> + return -EPERM;
> +#endif
> +
>   retval = security_task_setscheduler(p, policy, param);
>   if (retval)
>   return retval;
> 


Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)

2008-02-01 Thread Peter Zijlstra

On Fri, 2008-02-01 at 08:44 +0100, Peter Zijlstra wrote:
> On Fri, 2008-02-01 at 03:04 +0100, Rafael J. Wysocki wrote:
> > On Friday, 1 of February 2008, Rafael J. Wysocki wrote:
> > > Hi,
> > > 
> > > This is related to the problem I reported earlier this week:
> > > http://lkml.org/lkml/2008/1/30/554
> > > 
> > > Apparently artswrapper, run by KDE in openSUSE 10.3 with a real time priority,
> > > is mishandled by the scheduler.  The problem is that after the user logs out,
> > > artswrapper stays in TASK_RUNNING forever and prevents other tasks from being
> > > scheduled on the CPU occupied by it.  In this state it also breaks suspend and
> > > hibernation (it cannot be frozen).
> > > 
> > > Since the problem is 100% reproducible on my test boxes, I carried out a
> > > bisection which turned up the following commit:
> > > 
> > > commit 6f505b16425a51270058e4a93441fe64de3dd435
> > > Author: Peter Zijlstra <[EMAIL PROTECTED]>
> > > Date:   Fri Jan 25 21:08:30 2008 +0100
> > > 
> > >     sched: rt group scheduling
> > > 
> > > I'm now checking if the problem disappears after reverting this patch (along
> > > with a couple of dependent ones).
> > 
> > Yes, it does.
> > 
> > Please let me know what I can do to debug it further.
> 
> Is arts run as root, or does it use RLIMIT_RTPRIO to allow users to
> execute realtime tasks?
> 

If the latter, does this help:

diff --git a/kernel/sched.c b/kernel/sched.c
index ba4c880..bb76cbc 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4563,6 +4563,15 @@ recheck:
 			return -EPERM;
 	}
 
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	/*
+	 * Do not allow realtime tasks into groups that have no runtime
+	 * assigned.
+	 */
+	if (rt_policy(policy) && task_group(p)->rt_ratio == 0)
+		return -EPERM;
+#endif
+
 	retval = security_task_setscheduler(p, policy, param);
 	if (retval)
 		return retval;
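
To make the effect of the added check easier to see outside the kernel, here is a small userspace model of it. This is a sketch only: struct toy_task_group, toy_rt_policy() and toy_setscheduler_check() are hypothetical stand-ins invented for illustration, and only the rt_ratio field read by the patch is modelled.

```c
#include <errno.h>
#include <sched.h>

/*
 * Hypothetical stand-in for the kernel's task group; only the field the
 * check reads is modelled here.
 */
struct toy_task_group {
	unsigned long rt_ratio;	/* realtime share assigned to the group */
};

/* Stand-in for rt_policy(): is this one of the realtime policies? */
static int toy_rt_policy(int policy)
{
	return policy == SCHED_FIFO || policy == SCHED_RR;
}

/* Returns 0 if the policy change may proceed, -EPERM otherwise. */
int toy_setscheduler_check(const struct toy_task_group *tg, int policy)
{
	if (toy_rt_policy(policy) && tg->rt_ratio == 0)
		return -EPERM;
	return 0;
}
```

In this model a group with rt_ratio == 0 rejects SCHED_FIFO and SCHED_RR but still accepts SCHED_OTHER, which matches the intent stated in the patch comment.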




Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)

2008-01-31 Thread Peter Zijlstra

On Fri, 2008-02-01 at 03:04 +0100, Rafael J. Wysocki wrote:
> On Friday, 1 of February 2008, Rafael J. Wysocki wrote:
> > Hi,
> > 
> > This is related to the problem I reported earlier this week:
> > http://lkml.org/lkml/2008/1/30/554
> > 
> > Apparently artswrapper, run by KDE in openSUSE 10.3 with a real time priority,
> > is mishandled by the scheduler.  The problem is that after the user logs out,
> > artswrapper stays in TASK_RUNNING forever and prevents other tasks from being
> > scheduled on the CPU occupied by it.  In this state it also breaks suspend and
> > hibernation (it cannot be frozen).
> > 
> > Since the problem is 100% reproducible on my test boxes, I carried out a
> > bisection which turned up the following commit:
> > 
> > commit 6f505b16425a51270058e4a93441fe64de3dd435
> > Author: Peter Zijlstra <[EMAIL PROTECTED]>
> > Date:   Fri Jan 25 21:08:30 2008 +0100
> > 
> >     sched: rt group scheduling
> > 
> > I'm now checking if the problem disappears after reverting this patch (along
> > with a couple of dependent ones).
> 
> Yes, it does.
> 
> Please let me know what I can do to debug it further.

Is arts run as root, or does it use RLIMIT_RTPRIO to allow users to
execute realtime tasks?





Re: [Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)

2008-01-31 Thread Rafael J. Wysocki
On Friday, 1 of February 2008, Rafael J. Wysocki wrote:
> Hi,
> 
> This is related to the problem I reported earlier this week:
> http://lkml.org/lkml/2008/1/30/554
> 
> Apparently artswrapper, run by KDE in openSUSE 10.3 with a real time priority,
> is mishandled by the scheduler.  The problem is that after the user logs out,
> artswrapper stays in TASK_RUNNING forever and prevents other tasks from being
> scheduled on the CPU occupied by it.  In this state it also breaks suspend and
> hibernation (it cannot be frozen).
> 
> Since the problem is 100% reproducible on my test boxes, I carried out a
> bisection which turned up the following commit:
> 
> commit 6f505b16425a51270058e4a93441fe64de3dd435
> Author: Peter Zijlstra <[EMAIL PROTECTED]>
> Date:   Fri Jan 25 21:08:30 2008 +0100
> 
>     sched: rt group scheduling
> 
> I'm now checking if the problem disappears after reverting this patch (along
> with a couple of dependent ones).

Yes, it does.

Please let me know what I can do to debug it further.

Thanks,
Rafael


[Regression] 2.6.24-git9: RT sched mishandles artswrapper (bisected)

2008-01-31 Thread Rafael J. Wysocki
Hi,

This is related to the problem I reported earlier this week:
http://lkml.org/lkml/2008/1/30/554

Apparently artswrapper, run by KDE in openSUSE 10.3 with a real time priority,
is mishandled by the scheduler.  The problem is that after the user logs out,
artswrapper stays in TASK_RUNNING forever and prevents other tasks from being
scheduled on the CPU occupied by it.  In this state it also breaks suspend and
hibernation (it cannot be frozen).

Since the problem is 100% reproducible on my test boxes, I carried out a
bisection which turned up the following commit:

commit 6f505b16425a51270058e4a93441fe64de3dd435
Author: Peter Zijlstra <[EMAIL PROTECTED]>
Date:   Fri Jan 25 21:08:30 2008 +0100

    sched: rt group scheduling

I'm now checking if the problem disappears after reverting this patch (along
with a couple of dependent ones).

Thanks,
Rafael

