Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-24 Thread Rafael J. Wysocki
On Sunday, 24 June 2007 02:45, Eric W. Biederman wrote: > Andrew Morton <[EMAIL PROTECTED]> writes: > > > On Sun, 24 Jun 2007 01:54:52 +0200 "Rafael J. Wysocki" <[EMAIL PROTECTED]> > > wrote: > > > >> On Wednesday, 20 June 2007 00:08, Siddha, Suresh B wrote: > >> > On Tue, Jun 19, 2007 at

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-24 Thread Rafael J. Wysocki
On Sunday, 24 June 2007 02:28, Siddha, Suresh B wrote: > On Sun, Jun 24, 2007 at 01:54:52AM +0200, Rafael J. Wysocki wrote: > > This patch breaks hibernation on my Turion 64 X2 - based testbox (HPC > > nx6325). > > > > _cpu_down() just hangs as though there were a deadlock in there, 100% of the

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-23 Thread Siddha, Suresh B
On Sat, Jun 23, 2007 at 06:45:05PM -0600, Eric W. Biederman wrote: > > Hmm. It looks like Siddha sent the wrong version of the patch. > The working tested version had an additional test to ensure > the mask and unmask methods were implemented. > > i.e. > + if (irq_desc[irq].chip->mask) > +

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-23 Thread Eric W. Biederman
Andrew Morton <[EMAIL PROTECTED]> writes: > On Sun, 24 Jun 2007 01:54:52 +0200 "Rafael J. Wysocki" <[EMAIL PROTECTED]> > wrote: > >> On Wednesday, 20 June 2007 00:08, Siddha, Suresh B wrote: >> > On Tue, Jun 19, 2007 at 01:49:30PM -0700, Darrick J. Wong wrote: >> > > >> > > This fixes the

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-23 Thread Siddha, Suresh B
On Sun, Jun 24, 2007 at 01:54:52AM +0200, Rafael J. Wysocki wrote: > This patch breaks hibernation on my Turion 64 X2 - based testbox (HPC nx6325). > > _cpu_down() just hangs as though there were a deadlock in there, 100% of the > time. Does the patch at this URL work for you?

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-23 Thread Andrew Morton
On Sun, 24 Jun 2007 01:54:52 +0200 "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote: > On Wednesday, 20 June 2007 00:08, Siddha, Suresh B wrote: > > On Tue, Jun 19, 2007 at 01:49:30PM -0700, Darrick J. Wong wrote: > > > > > > This fixes the problem! Hurrah! > > > > Great! Andrew, please include

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-23 Thread Rafael J. Wysocki
On Wednesday, 20 June 2007 00:08, Siddha, Suresh B wrote: > On Tue, Jun 19, 2007 at 01:49:30PM -0700, Darrick J. Wong wrote: > > > > This fixes the problem! Hurrah! > > Great! Andrew, please include the appended patch in -mm. > > > Subject: [patch] x86_64, irq: use mask/unmask and proper

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-19 Thread Siddha, Suresh B
On Tue, Jun 19, 2007 at 01:49:30PM -0700, Darrick J. Wong wrote: > > This fixes the problem! Hurrah! Great! Andrew, please include the appended patch in -mm. Subject: [patch] x86_64, irq: use mask/unmask and proper locking in fixup_irqs From: Suresh Siddha <[EMAIL PROTECTED]> Force irq

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-19 Thread Darrick J. Wong
On Tue, Jun 19, 2007 at 12:59:27PM -0700, Siddha, Suresh B wrote: > hmm.. Please try this instead. This is intended only for debug. Based on your > test results, we can comeup with a more decent fix. This fixes the problem! Hurrah! --D

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-19 Thread Siddha, Suresh B
On Tue, Jun 19, 2007 at 12:06:37PM -0700, Darrick J. Wong wrote: > On Tue, Jun 19, 2007 at 11:00:03AM -0700, Siddha, Suresh B wrote: > > Anyhow, Darrick there is a general bug in this area, can you try this and > > see if it helps? > > Er... that instantly locked up the system. hmm.. Please try

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-19 Thread Darrick J. Wong
On Tue, Jun 19, 2007 at 11:00:03AM -0700, Siddha, Suresh B wrote: > Anyhow, Darrick there is a general bug in this area, can you try this and > see if it helps? Er... that instantly locked up the system. --D

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-19 Thread Eric W. Biederman
"Siddha, Suresh B" <[EMAIL PROTECTED]> writes: > On Tue, Jun 19, 2007 at 11:54:45AM -0600, Eric W. Biederman wrote: >> "Darrick J. Wong" <[EMAIL PROTECTED]> writes: >> >> > On Mon, Jun 18, 2007 at 04:54:34PM -0700, Siddha, Suresh B wrote: >> > >> >> > >> >> > [ 256.298787] irq=4341 affinity=d

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-19 Thread Siddha, Suresh B
On Tue, Jun 19, 2007 at 11:54:45AM -0600, Eric W. Biederman wrote: > "Darrick J. Wong" <[EMAIL PROTECTED]> writes: > > > On Mon, Jun 18, 2007 at 04:54:34PM -0700, Siddha, Suresh B wrote: > > > >> > <call to set_affinity> > >> > [ 256.298787] irq=4341 affinity=d > >> > <ethernet on irq 4341 stops working> > >> > >> And just to make sure, at this point,

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-19 Thread Eric W. Biederman
"Darrick J. Wong" <[EMAIL PROTECTED]> writes: > On Mon, Jun 18, 2007 at 04:54:34PM -0700, Siddha, Suresh B wrote: > >> > >> > [ 256.298787] irq=4341 affinity=d >> > >> >> And just to make sure, at this point, your MSI irq 4341 affinity >> (/proc/irq/4341/smp_affinity) still points to '2'? > >

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-18 Thread Darrick J. Wong
On Mon, Jun 18, 2007 at 04:54:34PM -0700, Siddha, Suresh B wrote: > > <call to set_affinity> > > [ 256.298787] irq=4341 affinity=d > > <ethernet on irq 4341 stops working> > > And just to make sure, at this point, your MSI irq 4341 affinity > (/proc/irq/4341/smp_affinity) still points to '2'? Actually, it's 0xD. From the kernel's perspective the
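
A note for reading the masks in this exchange: smp_affinity is a hexadecimal CPU bitmask, so '2' selects only cpu1, while 0xD selects CPUs 0, 2 and 3. The following is a minimal sketch of that decoding; `cpus_in_mask` is an illustrative helper name, not something taken from the thread or the kernel.

```python
from typing import List


def cpus_in_mask(mask: str) -> List[int]:
    """Return the CPU numbers whose bits are set in a hex affinity mask.

    Accepts the forms seen in this thread ("2", "d", "0xD") and, as an
    assumption about larger systems, comma-separated groups such as
    "00000000,0000000d".
    """
    value = int(mask.replace(",", ""), 16)
    # Bit n of the mask corresponds to CPU n.
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


if __name__ == "__main__":
    for text in ("2", "d"):
        print(text, "->", cpus_in_mask(text))
```

Read this way, the reported value 0xD is a mask that no longer includes cpu1.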

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-18 Thread Siddha, Suresh B
On Mon, Jun 18, 2007 at 03:38:20PM -0700, Darrick J. Wong wrote: > On Thu, Jun 07, 2007 at 05:57:26PM -0700, Siddha, Suresh B wrote: > > > As you have the failing system, you need to do more detective work and > > help me out. Can you try this debug patch and send across the dmesg after > > the

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-18 Thread Darrick J. Wong
On Thu, Jun 07, 2007 at 05:57:26PM -0700, Siddha, Suresh B wrote: > As you have the failing system, you need to do more detective work and > help me out. Can you try this debug patch and send across the dmesg after the > bug happens and also can you try different compiler to see if something >

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-07 Thread Siddha, Suresh B
On Wed, Jun 06, 2007 at 04:16:42PM -0700, Darrick J. Wong wrote: > On Wed, Jun 06, 2007 at 12:35:14PM -0700, Siddha, Suresh B wrote: > > > Weird. Then the bug can only happen if for some reason, "mask = map" > > didn't happen in fixup_irqs(). Can you send us the disassembly of the > >

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-06 Thread Darrick J. Wong
On Wed, Jun 06, 2007 at 12:35:14PM -0700, Siddha, Suresh B wrote: > Weird. Then the bug can only happen if for some reason, "mask = map" > didn't happen in fixup_irqs(). Can you send us the disassembly of the > fixup_irqs()? Attached. --D (gdb) disassemble fixup_irqs Dump of assembler code for

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-06 Thread Siddha, Suresh B
On Wed, Jun 06, 2007 at 11:58:29AM -0700, Darrick J. Wong wrote: > On Tue, Jun 05, 2007 at 06:37:59PM -0700, Siddha, Suresh B wrote: > > On Tue, Jun 05, 2007 at 04:57:07PM -0700, Darrick J. Wong wrote: > > > On Tue, Jun 05, 2007 at 02:14:51PM -0700, Siddha, Suresh B wrote: > > > > > > > Can you

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-06 Thread Darrick J. Wong
On Tue, Jun 05, 2007 at 06:37:59PM -0700, Siddha, Suresh B wrote: > On Tue, Jun 05, 2007 at 04:57:07PM -0700, Darrick J. Wong wrote: > > On Tue, Jun 05, 2007 at 02:14:51PM -0700, Siddha, Suresh B wrote: > > > > > Can you send us your system's dmesg aswell as output of /proc/interrupts? > > > >

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-05 Thread Siddha, Suresh B
On Tue, Jun 05, 2007 at 04:57:07PM -0700, Darrick J. Wong wrote: > On Tue, Jun 05, 2007 at 02:14:51PM -0700, Siddha, Suresh B wrote: > > > Can you send us your system's dmesg aswell as output of /proc/interrupts? > > http://sweaglesw.net/~djwong/docs/dmesg >

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-05 Thread Darrick J. Wong
On Tue, Jun 05, 2007 at 02:14:51PM -0700, Siddha, Suresh B wrote: > Can you send us your system's dmesg aswell as output of /proc/interrupts? http://sweaglesw.net/~djwong/docs/dmesg http://sweaglesw.net/~djwong/docs/interrupts --D

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-05 Thread Siddha, Suresh B
On Tue, Jun 05, 2007 at 01:09:54PM -0700, Darrick J. Wong wrote: > On Tue, Jun 05, 2007 at 11:40:15AM -0700, Siddha, Suresh B wrote: > > > Does this problem happen only under certain stress or something simple, like > > > > boot the kernel > > echo 2 > /proc/irq/114/smp_affinity > > wait for irq

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-05 Thread Darrick J. Wong
On Tue, Jun 05, 2007 at 11:40:15AM -0700, Siddha, Suresh B wrote: > Does this problem happen only under certain stress or something simple, like > > boot the kernel > echo 2 > /proc/irq/114/smp_affinity > wait for irq to hit the cpu1. > echo 0 > /sys/devices/system/cpu/cpu1/online > > will
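
The quoted reproduction boils down to two writes: pin the IRQ to one CPU, then take that CPU offline once interrupts are arriving there. As a rough dry-run sketch (the helper names `cpu_mask` and `repro_commands` are hypothetical, and nothing here touches /proc or /sys), the two shell commands can be generated for any IRQ and CPU like this:

```python
from typing import List


def cpu_mask(cpu: int) -> str:
    """Hex smp_affinity mask with only the bit for `cpu` set."""
    if cpu < 0:
        raise ValueError("cpu must be non-negative")
    return format(1 << cpu, "x")


def repro_commands(irq: int, cpu: int) -> List[str]:
    """Build the two commands from the quoted steps as plain strings.

    Between the two, the thread says to wait until /proc/interrupts
    shows the IRQ actually being serviced on the chosen CPU.
    """
    return [
        f"echo {cpu_mask(cpu)} > /proc/irq/{irq}/smp_affinity",
        f"echo 0 > /sys/devices/system/cpu/cpu{cpu}/online",
    ]


if __name__ == "__main__":
    for line in repro_commands(irq=114, cpu=1):
        print(line)
```

Returning strings instead of writing the files keeps the sketch safe to run on any machine; running the commands themselves requires root and will offline a CPU.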

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-05 Thread Siddha, Suresh B
On Tue, Jun 05, 2007 at 11:33:01AM -0700, Darrick J. Wong wrote: > On Tue, Jun 05, 2007 at 11:13:42AM -0700, Siddha, Suresh B wrote: > > I see. Your system should have 4 or 8 logical cpu's right. So you must be > > using logical flat mode, right? > > I believe so. The system has two Xeon 5150s

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-05 Thread Darrick J. Wong
On Tue, Jun 05, 2007 at 11:13:42AM -0700, Siddha, Suresh B wrote: > I see. Your system should have 4 or 8 logical cpu's right. So you must be > using logical flat mode, right? I believe so. The system has two Xeon 5150s with an Intel 5000 chipset of some sort. > When this bug happens, what does

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-05 Thread Siddha, Suresh B
On Tue, Jun 05, 2007 at 10:36:47AM -0700, Darrick J. Wong wrote: > On Tue, Jun 05, 2007 at 10:23:10AM -0700, Siddha, Suresh B wrote: > > > Darrick, I see a kernel bug in this area(which is already filled with bugs, > > and I am looking into ways to fix them). Are you making sure that > > between

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-05 Thread Darrick J. Wong
On Tue, Jun 05, 2007 at 10:23:10AM -0700, Siddha, Suresh B wrote: > Darrick, I see a kernel bug in this area(which is already filled with bugs, > and I am looking into ways to fix them). Are you making sure that > between step-1 and step-2, that interrupts actually started arriving at cpu1? > >

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-05 Thread Siddha, Suresh B
On Thu, May 31, 2007 at 05:44:27PM -0700, Darrick J. Wong wrote: > Hi there, > > I'm seeing a driver hang with 2.6.22-rc3 while being slightly stupid > about offlining CPUs. I suspect that this problem extends beyond a > particular machine, as I've been able to replicate it with an IBM x3650 >

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-03 Thread Emmanuel Fusté
> > This is just getting confusing. > > Emmanuel Fusté. Please play with /proc/irq/*/smp_affinity by hand and > confirm that you can move your irqs. This will confirm it is the decision > part. > Ok, as planned, you're right ;-) , playing with /proc/irq/*/smp_affinity let me move irqs.

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-01 Thread Eric W. Biederman
"Darrick J. Wong" <[EMAIL PROTECTED]> writes: > On Fri, Jun 01, 2007 at 06:18:32PM -0600, Eric W. Biederman wrote: > >> I doubt it. The practical problem is that cpu_down does not >> and by design can not call the irq balancing part properly >> and I haven't yet seen anything to suggest that we

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-01 Thread Darrick J. Wong
On Fri, Jun 01, 2007 at 06:18:32PM -0600, Eric W. Biederman wrote: > I doubt it. The practical problem is that cpu_down does not > and by design can not call the irq balancing part properly > and I haven't yet seen anything to suggest that we don't migrate > irq properly. > > So I'm guessing it

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-01 Thread Eric W. Biederman
> As a side note, on my very old SMP machine, 2.6.20 correctly > load-balance IRQs across CPU but 2.6.21 not. I know that > in-kernel IRQ load balancer is marked as deprecated and > somewhat broken, but with your report it make me think it > could be a bug in the IRQ rerouting part in my case too

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-01 Thread Emmanuel Fusté
> There exists a similar scenario. Set the IRQ affinity to a bunch of > CPUs, watch /proc/interrupts to see which CPU is actually servicing the > interrupts, then offline that CPU. The kernel does not reroute the IRQ > to any of the other CPUs and the device also hangs. > > The furthest that

Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-01 Thread Eric W. Biederman
"Darrick J. Wong" <[EMAIL PROTECTED]> writes: > Hi there, > > I'm seeing a driver hang with 2.6.22-rc3 while being slightly stupid > about offlining CPUs. I suspect that this problem extends beyond a > particular machine, as I've been able to replicate it with an IBM x3650 > and an IBM x3755.

Device hang when offlining a CPU due to IRQ misrouting

2007-05-31 Thread Darrick J. Wong
Hi there, I'm seeing a driver hang with 2.6.22-rc3 while being slightly stupid about offlining CPUs. I suspect that this problem extends beyond a particular machine, as I've been able to replicate it with an IBM x3650 and an IBM x3755. This is what I'm doing: 1) I tie an IRQ to a particular
