RE: 2.6.12-rc2-mm3 pciehp regression
On Wednesday, April 20, 2005 12:50 PM, Tom Duffy wrote:
> > The errors I encountered were:
> > Reading all physical volumes. This may take a while...
> > Umount /sys failed: 16
> > mount: error 6 mounting ext3
> > mount: error 2 mounting none
> > Switching to new root
> > Switchroot: mount failed 22
> > umount /initrd/dev failed: 2
> >
> > I also encountered issue you & others discussed in the thread on
> > "Re: Heads up on 2.6.12-rc1 and later" if I used SCSI drive.
> >
> > Can you send me the config file you used successfully on your
> > system?
>
> You will need to boot the system UP (not SMP). There is a problem
> with modules loading too fast that causes the initrd to fail.

This doesn't help on my system. I tried both ways: using boot option
with nosmp, and rebuilding kernel with SMP off in config file.

Using nosmp, I got:
IOAPIC [0]: Invalid reference to IRQ 0
.
.
audit() initialized
ide 1 : id1 1 : ports already in use, skipping
and system halted

Rebuilding kernel with SMP off in config file, I got:
Kernel panic - not syncing: VFS: Unable to mount root fs on
unknown-block(0,0)

Thanks,
Dely
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
RE: 2.6.12-rc2-mm3 pciehp regression
On Wed, 2005-04-20 at 11:56 -0700, Sy, Dely L wrote:
> On Friday, April 15, 2005 12:48 PM, Tom Duffy wrote:
> > From: "Sy, Dely L" <[EMAIL PROTECTED]>
> > > Thanks for reporting this. I'll look into it. Which was the last
> > > kernel you tested on your hw and worked for you?
> >
> > That is a good question. I think it was a 2.6.11 kernel. It was
> > definitely before express was moved to a different directory,
> > whenever that occurred.
>
> Tom,
>
> I was not able to duplicate this problem on my system yet for I have
> trouble in getting my system booted up on 2.6.12-rc2-mm3. I did some
> back-tracking and found that the boot problem occurred also with
> 2.6.12-rc2-mm2 & 2.6.12-rc2-mm3, and on two systems using IDE as boot
> drive. The config file I used worked fine on 2.6.11.7. I tried
> different config file without success.
>
> The errors I encountered were:
> Reading all physical volumes. This may take a while...
> Umount /sys failed: 16
> mount: error 6 mounting ext3
> mount: error 2 mounting none
> Switching to new root
> Switchroot: mount failed 22
> umount /initrd/dev failed: 2
>
> I also encountered issue you & others discussed in the thread on
> "Re: Heads up on 2.6.12-rc1 and later" if I used SCSI drive.
>
> Can you send me the config file you used successfully on your
> system?

You will need to boot the system UP (not SMP). There is a problem with
modules loading too fast that causes the initrd to fail.

-tduffy
RE: 2.6.12-rc2-mm3 pciehp regression
On Friday, April 15, 2005 12:48 PM, Tom Duffy wrote:
> From: "Sy, Dely L" <[EMAIL PROTECTED]>
> > Thanks for reporting this. I'll look into it. Which was the last
> > kernel you tested on your hw and worked for you?
>
> That is a good question. I think it was a 2.6.11 kernel. It was
> definitely before express was moved to a different directory,
> whenever that occurred.

Tom,

I was not able to duplicate this problem on my system yet for I have
trouble in getting my system booted up on 2.6.12-rc2-mm3. I did some
back-tracking and found that the boot problem occurred also with
2.6.12-rc2-mm2 & 2.6.12-rc2-mm3, and on two systems using IDE as boot
drive. The config file I used worked fine on 2.6.11.7. I tried
different config file without success.

The errors I encountered were:
Reading all physical volumes. This may take a while...
Umount /sys failed: 16
mount: error 6 mounting ext3
mount: error 2 mounting none
Switching to new root
Switchroot: mount failed 22
umount /initrd/dev failed: 2

I also encountered issue you & others discussed in the thread on
"Re: Heads up on 2.6.12-rc1 and later" if I used SCSI drive.

Can you send me the config file you used successfully on your
system?

Thanks,
Dely
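The numbers in the boot messages above look like raw errno values. Assuming the initrd tools print errno unchanged (the thread does not confirm this), the Linux numbering maps them as in the sketch below. The table and the helper lookup_boot_errno() are hypothetical, written only for illustration, and the values are hard-coded rather than taken from the host's errno.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Linux errno numbers for the codes that appear in the boot log above.
 * Hard-coded so the sketch gives the same answer on any host. */
struct errno_entry {
	int code;
	const char *name;
	const char *meaning;
};

static const struct errno_entry boot_errnos[] = {
	{  2, "ENOENT", "No such file or directory" },
	{  6, "ENXIO",  "No such device or address" },
	{ 16, "EBUSY",  "Device or resource busy" },
	{ 22, "EINVAL", "Invalid argument" },
};

/* Hypothetical helper: return the table entry for a code, or NULL. */
const struct errno_entry *lookup_boot_errno(int code)
{
	size_t i;

	for (i = 0; i < sizeof(boot_errnos) / sizeof(boot_errnos[0]); i++)
		if (boot_errnos[i].code == code)
			return &boot_errnos[i];
	return NULL;
}

/* Sanity checks for the four codes seen in the thread. */
void check_boot_errnos(void)
{
	assert(strcmp(lookup_boot_errno(16)->name, "EBUSY") == 0);
	assert(strcmp(lookup_boot_errno(6)->name, "ENXIO") == 0);
	assert(strcmp(lookup_boot_errno(2)->name, "ENOENT") == 0);
	assert(strcmp(lookup_boot_errno(22)->name, "EINVAL") == 0);
}
```

Read this way, "mount: error 6 mounting ext3" would be ENXIO, which would fit a root device that is not yet present when the mount is attempted. That reading is an interpretation, not something stated in the thread.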
Re: 2.6.12-rc2-mm3 regression - certain applications get SIGSEGV but are fine with 2.6.12-rc2-mm2
On Tue, 19 Apr 2005, Alexander Nyberg wrote:
> On Tue, 2005-04-19 at 11:33 +0200, Jesper Juhl wrote:
> > Everything is fine with 2.6.12-rc2, 2.6.12-rc2-mm1, 2.6.12-rc2-mm2 &
> > earlier kernels as well, but 2.6.12-rc2-mm3 seems to have a problem.
> > I don't know what's causing this, all I can do at the moment is describe
> > the symptoms.
> >
> > Certain applications (krootimage and ksplash from KDE 3.4 are 100%
> > reproducible test cases) that used to run fine have started crashing with
> > SIGSEGV on 2.6.12-rc2-mm3. I see nothing suspicious in dmesg.
> > I'm including dmesg output as well as strace output from krootimage and
> > ksplash below.
> > If someone could give me a hint as to what the cause of this could be or
> > what to try in order to track it down I'd appreciate it.
> > This is 100% reproducible.
>
> Try backing out
> http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/broken-out/sched-unlocked-context-switches.patch
>

That did the trick. All the apps that segfaulted previously now seem to be
running just fine again.
Are Ingo, Nick & Andrew aware that this patch has issues?

--
Jesper
Re: 2.6.12-rc2-mm3 regression - certain applications get SIGSEGV but are fine with 2.6.12-rc2-mm2
On Tue, 2005-04-19 at 11:33 +0200, Jesper Juhl wrote:
> Everything is fine with 2.6.12-rc2, 2.6.12-rc2-mm1, 2.6.12-rc2-mm2 &
> earlier kernels as well, but 2.6.12-rc2-mm3 seems to have a problem.
> I don't know what's causing this, all I can do at the moment is describe
> the symptoms.
>
> Certain applications (krootimage and ksplash from KDE 3.4 are 100%
> reproducible test cases) that used to run fine have started crashing with
> SIGSEGV on 2.6.12-rc2-mm3. I see nothing suspicious in dmesg.
> I'm including dmesg output as well as strace output from krootimage and
> ksplash below.
> If someone could give me a hint as to what the cause of this could be or
> what to try in order to track it down I'd appreciate it.
> This is 100% reproducible.

Try backing out
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/broken-out/sched-unlocked-context-switches.patch
Re: 2.6.12-rc2-mm3: hostap: do not #include .c files
On Tue, Apr 19, 2005 at 04:03:12AM +0200, Adrian Bunk wrote:
> drivers/net/wireless/hostap/hostap.c:#include "hostap_crypt.c"

> Please do not #include .c files.

A tested patch would be appreciated.. ;-)

> A proper separation in a .c file and a header file is the better
> solution.

Agreed and this is on my to-do list, but not very high on it. Some of
these would be relatively easy to fix, but the hardware specific ones
(different register offsets for PC Card/PLX/PCI) would require quite a
bit of changes to get rid of this.

--
Jouni Malinen                                            PGP id EFC895FA
Re: 2.6.12-rc2-mm3
On Mon, 18 Apr 2005 11:56:15 +0200, Alexander Nyberg wrote:
>> >This patch fixes the NMI checking problems in -mm x64 for me. It
>>
>> What problems?
>>
>
>Sorry, in -mm on x64 check_nmi_watchdog() has started to be run as a
>late_initcall(). Currently it reports the NMIs as stuck on a few systems
>although they are not, both of mine are reported as stuck. This appears
>to be because the current event mask uses don't appear to tick much
>running mdelay() on opteron (in my case).

Please provide a complete dmesg log up to and including the failure
point where the kernel complains about stuck NMIs.

I tried 2.6.12-rc2-mm3 SMP on UP amd64 and I immediately found
a bug triggering bogus stuck NMI failures, and I want to check
if what you're seeing is caused by the same bug.

> Also in -mm because nmi_hz is
>set to 1 in setup_k7_watchdog() the NMI watchdog checking takes 10
>seconds, a bit much.

Orthogonal issue. Let's ignore this one for now.

>Patch below uses RETIRED_UOPS for a more constant rate of NMI sending,

This may or may not work as you intend. There is _no_ documented
reason to assume that RETIRED_UOPS would provide a more steady stream
of events than CYCLES_PROCESSOR_IS_RUNNING. Both events are likely to
be idle in HLT states, for instance.

The local APIC + performance counter driven NMI watchdog simply
cannot provide wall-clock like behaviour. You need the I/O-APIC
driven watchdog for that, or to prevent the kernel from using HLT
when idle.

>@@ -68,7 +69,7 @@
> #define K7_EVNTSEL_INT		(1 << 20)
> #define K7_EVNTSEL_OS		(1 << 17)
> #define K7_EVNTSEL_USR		(1 << 16)
>-#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING	0x76
>+#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING	0xC1 /* Retired uops */
> #define K7_NMI_EVENT		K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING

This is as bogus as "#define ONE 2". CYCLES_PROCESSOR_IS_RUNNING
_is_ event 0x76 (AMD renamed it recently, but that's irrelevant).
Using RETIRED_UOPS requires a new define, and a modification to
the K7_NMI_EVENT #define.

/Mikael
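A minimal sketch of what the last point could look like: event 0x76 keeps its name, a separate define is added for event 0xC1, and only K7_NMI_EVENT changes. The name K7_EVENT_RETIRED_UOPS and the helper k7_nmi_evntsel() are assumptions made for illustration; neither appears in the thread, and this is not the patch that was eventually posted.

```c
#include <assert.h>

#define K7_EVNTSEL_ENABLE	(1 << 22)
#define K7_EVNTSEL_INT		(1 << 20)
#define K7_EVNTSEL_OS		(1 << 17)
#define K7_EVNTSEL_USR		(1 << 16)

/* Event 0x76 keeps its existing name and value. */
#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING	0x76
/* New define for event 0xC1; the name is an assumption, not from the thread. */
#define K7_EVENT_RETIRED_UOPS			0xC1
/* Only this line selects which event drives the watchdog. */
#define K7_NMI_EVENT		K7_EVENT_RETIRED_UOPS

/* Hypothetical helper: event-select value before the enable bit is set. */
unsigned int k7_nmi_evntsel(void)
{
	return K7_EVNTSEL_INT | K7_EVNTSEL_OS | K7_EVNTSEL_USR | K7_NMI_EVENT;
}

/* The old name still means the old event; the watchdog uses the new one. */
void check_k7_defines(void)
{
	assert(K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING == 0x76);
	assert(K7_NMI_EVENT == 0xC1);
}
```

Keeping the two event codes under distinct names means switching back to cycle counting is a one-line change to K7_NMI_EVENT.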
Re: 2.6.12-rc2-mm3
On Mon, 2005-04-18 at 13:14 +0200, Arjan van de Ven wrote:
> On Mon, 2005-04-18 at 13:05 +0200, Alexander Nyberg wrote:
> > [Proper patch now that goes all the way, sorry for spamming]
> >
> > Patch below uses RETIRED_UOPS for a more constant rate of NMI sending.
> > This makes x64 deliver NMI interrupts every fourth second at a constant
> > rate when going through the local apic. Makes both cpus on my box to get
> > NMIs at constant rate that it previously did not, there could be long
> > delays when a CPU was idle.
>
>
> isn't this dangerous in the light of the mobile cpus that either scale
> back or stop entirely in idle or lower load situations ?
>

I don't see any real problem. At each nmi_watchdog_tick() the next NMI is
calculated from cpu_khz, so the NMIs might not come at a constant rate
while the frequency is scaling, but over time there will still be one
every fourth second.

And if the CPU stops entirely, as you say, are there even any uops run?
Even if so, the current watchdog would also have a few events accounted
to it and could fire an NMI as well.
Re: 2.6.12-rc2-mm3
On Mon, 2005-04-18 at 13:05 +0200, Alexander Nyberg wrote:
> [Proper patch now that goes all the way, sorry for spamming]
>
> Patch below uses RETIRED_UOPS for a more constant rate of NMI sending.
> This makes x64 deliver NMI interrupts every fourth second at a constant
> rate when going through the local apic. Makes both cpus on my box to get
> NMIs at constant rate that it previously did not, there could be long
> delays when a CPU was idle.

isn't this dangerous in the light of the mobile cpus that either scale
back or stop entirely in idle or lower load situations ?
Re: 2.6.12-rc2-mm3
[Proper patch now that goes all the way, sorry for spamming]

Patch below uses RETIRED_UOPS for a more constant rate of NMI sending.
This makes x64 deliver NMI interrupts every fourth second at a constant
rate when going through the local apic. Makes both cpus on my box to get
NMIs at constant rate that it previously did not, there could be long
delays when a CPU was idle.

This fixes misdetection in check_nmi_watchdog() that thought the NMI
sending was stuck although it was not because the perfctr did not
generate enough events with the previous mask. The 10-second
check_nmi_watchdog() delay is down to 10 msec now.

Tested on opteron SMP.

Index: x64_mm/arch/x86_64/kernel/nmi.c
===
--- x64_mm.orig/arch/x86_64/kernel/nmi.c	2005-04-18 12:56:05.0 +0200
+++ x64_mm/arch/x86_64/kernel/nmi.c	2005-04-18 14:47:14.0 +0200
@@ -59,16 +59,14 @@
 unsigned int nmi_watchdog = NMI_DEFAULT;
 static unsigned int nmi_hz = HZ;
+static int nmi_mult = 1;	/* nmi multiplier for longer intervals */
 unsigned int nmi_perfctr_msr;	/* the MSR to reset in NMI handler */
 
-/* Note that these events don't tick when the CPU idles. This means
-   the frequency varies with CPU load. */
-
 #define K7_EVNTSEL_ENABLE	(1 << 22)
 #define K7_EVNTSEL_INT		(1 << 20)
 #define K7_EVNTSEL_OS		(1 << 17)
 #define K7_EVNTSEL_USR		(1 << 16)
-#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING	0x76
+#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING	0xC1 /* Retired uops */
 #define K7_NMI_EVENT		K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING
 
 #define P6_EVNTSEL0_ENABLE	(1 << 22)
@@ -78,6 +76,11 @@
 #define P6_EVENT_CPU_CLOCKS_NOT_HALTED	0x79
 #define P6_NMI_EVENT		P6_EVENT_CPU_CLOCKS_NOT_HALTED
 
+static inline unsigned long nmi_interval(void)
+{
+	return ((unsigned long)cpu_khz * 1000 * nmi_mult) / nmi_hz;
+}
+
 /* Run after command line and cpu_init init, but before all other checks */
 void __init nmi_watchdog_default(void)
 {
@@ -146,8 +149,10 @@
 
 	/* now that we know it works we can reduce NMI frequency to
 	   something more reasonable; makes a difference in some configs */
-	if (nmi_watchdog == NMI_LOCAL_APIC)
+	if (nmi_watchdog == NMI_LOCAL_APIC) {
 		nmi_hz = 1;
+		nmi_mult = 8;
+	}
 
 	return 0;
 }
@@ -305,9 +310,6 @@
 	int i;
 	unsigned int evntsel;
 
-	/* No check, so can start with slow frequency */
-	nmi_hz = 1;
-
 	/* XXX should check these in EFER */
 
 	nmi_perfctr_msr = MSR_K7_PERFCTR0;
@@ -325,7 +327,7 @@
 		| K7_NMI_EVENT;
 
 	wrmsr(MSR_K7_EVNTSEL0, evntsel, 0);
-	wrmsrl(MSR_K7_PERFCTR0, -((u64)cpu_khz*1000) / nmi_hz);
+	wrmsrl(MSR_K7_PERFCTR0, -nmi_interval());
 	apic_write(APIC_LVTPC, APIC_DM_NMI);
 	evntsel |= K7_EVNTSEL_ENABLE;
 	wrmsr(MSR_K7_EVNTSEL0, evntsel, 0);
@@ -393,10 +395,10 @@
 	if (last_irq_sums[cpu] == sum) {
 		/*
 		 * Ayiee, looks like this CPU is stuck ...
-		 * wait a few IRQs (5 seconds) before doing the oops ...
+		 * wait a few NMIs before doing the oops ...
 		 */
 		alert_counter[cpu]++;
-		if (alert_counter[cpu] == 5*nmi_hz) {
+		if (alert_counter[cpu] == 3*nmi_hz) {
 			if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)
 							== NOTIFY_STOP) {
 				alert_counter[cpu] = 0;
@@ -409,7 +411,7 @@
 		alert_counter[cpu] = 0;
 	}
 	if (nmi_perfctr_msr)
-		wrmsr(nmi_perfctr_msr, -(cpu_khz/nmi_hz*1000), -1);
+		wrmsr(nmi_perfctr_msr, -nmi_interval(), -1);
 }
 
 static int dummy_nmi_callback(struct pt_regs * regs, int cpu)
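To make the reload arithmetic in the patch concrete, here is a userspace sketch of the nmi_interval() formula next to the expression it replaces in nmi_watchdog_tick(). The kernel globals are passed in as parameters, and the 2 GHz clock (cpu_khz = 2000000) and HZ = 1000 used in the checks are assumptions chosen for illustration. fits_in_32_bits() and check_nmi_interval() are hypothetical helpers, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the patch's nmi_interval(); the kernel globals
 * cpu_khz, nmi_mult and nmi_hz are passed as parameters here. */
uint64_t nmi_interval(uint64_t cpu_khz, unsigned int nmi_mult,
		      unsigned int nmi_hz)
{
	return (cpu_khz * 1000 * nmi_mult) / nmi_hz;
}

/* The expression the patch removes from nmi_watchdog_tick(): dividing
 * before multiplying drops the remainder of cpu_khz / nmi_hz. */
uint64_t old_tick_reload(unsigned int cpu_khz, unsigned int nmi_hz)
{
	return (uint64_t)(cpu_khz / nmi_hz) * 1000;
}

/* Nonzero if the event count fits in a 32-bit word. */
int fits_in_32_bits(uint64_t events)
{
	return events <= UINT32_MAX;
}

void check_nmi_interval(void)
{
	/* During the boot-time check: nmi_hz = HZ (1000 assumed), nmi_mult = 1. */
	assert(nmi_interval(2000000, 1, 1000) == 2000000);
	/* After the check: nmi_hz = 1, nmi_mult = 8. */
	assert(nmi_interval(2000000, 8, 1) == 16000000000ULL);
	assert(!fits_in_32_bits(nmi_interval(2000000, 8, 1)));
	/* Order of operations: the old form loses the low digits of cpu_khz. */
	assert(old_tick_reload(1800123, 1000) == 1800000);
	assert(nmi_interval(1800123, 1, 1000) == 1800123);
}
```

Two things follow from the numbers, both offered as observations rather than conclusions from the thread. The value is a count of events, so with retired uops the wall-clock spacing also depends on how many uops retire per cycle. And with nmi_hz = 1 and a multiplier above 2 at this assumed clock rate, the count no longer fits in 32 bits, which would matter if the three-argument wrmsr() takes its low word as a 32-bit quantity.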
Re: 2.6.12-rc2-mm3
> >This patch fixes the NMI checking problems in -mm x64 for me. It
>
> What problems?
>

Sorry, in -mm on x64 check_nmi_watchdog() has started to be run as a
late_initcall(). Currently it reports the NMIs as stuck on a few systems
although they are not, both of mine are reported as stuck. This appears
to be because the current event mask uses don't appear to tick much
running mdelay() on opteron (in my case). Also in -mm because nmi_hz is
set to 1 in setup_k7_watchdog() the NMI watchdog checking takes 10
seconds, a bit much.

Patch below uses RETIRED_UOPS for a more constant rate of NMI sending,
this works well for me. However I'd like NMIs to maybe fire every fourth
second or so. Using nmi_mult to multiply nmi_interval() by 4 doesn't
seem to make it go every fourth second however, maybe every 1.5 second,
I'm puzzled about this...

Index: x64_mm/arch/x86_64/kernel/nmi.c
===
--- x64_mm.orig/arch/x86_64/kernel/nmi.c	2005-04-18 12:56:05.0 +0200
+++ x64_mm/arch/x86_64/kernel/nmi.c	2005-04-18 13:34:37.0 +0200
@@ -59,6 +59,7 @@
 unsigned int nmi_watchdog = NMI_DEFAULT;
 static unsigned int nmi_hz = HZ;
+static int nmi_mult = 1;	/* nmi multiplier, how many seconds inbetween */
 unsigned int nmi_perfctr_msr;	/* the MSR to reset in NMI handler */
 
 /* Note that these events don't tick when the CPU idles. This means
@@ -68,7 +69,7 @@
 #define K7_EVNTSEL_INT		(1 << 20)
 #define K7_EVNTSEL_OS		(1 << 17)
 #define K7_EVNTSEL_USR		(1 << 16)
-#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING	0x76
+#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING	0xC1 /* Retired uops */
 #define K7_NMI_EVENT		K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING
 
 #define P6_EVNTSEL0_ENABLE	(1 << 22)
@@ -78,6 +79,11 @@
 #define P6_EVENT_CPU_CLOCKS_NOT_HALTED	0x79
 #define P6_NMI_EVENT		P6_EVENT_CPU_CLOCKS_NOT_HALTED
 
+static inline unsigned long nmi_interval(void)
+{
+	return ((unsigned long)cpu_khz * 1000 * nmi_mult) / nmi_hz;
+}
+
 /* Run after command line and cpu_init init, but before all other checks */
 void __init nmi_watchdog_default(void)
 {
@@ -146,8 +152,10 @@
 
 	/* now that we know it works we can reduce NMI frequency to
 	   something more reasonable; makes a difference in some configs */
-	if (nmi_watchdog == NMI_LOCAL_APIC)
+	if (nmi_watchdog == NMI_LOCAL_APIC) {
 		nmi_hz = 1;
+		nmi_mult = 4;
+	}
 
 	return 0;
 }
@@ -305,9 +313,6 @@
 	int i;
 	unsigned int evntsel;
 
-	/* No check, so can start with slow frequency */
-	nmi_hz = 1;
-
 	/* XXX should check these in EFER */
 
 	nmi_perfctr_msr = MSR_K7_PERFCTR0;
@@ -325,7 +330,7 @@
 		| K7_NMI_EVENT;
 
 	wrmsr(MSR_K7_EVNTSEL0, evntsel, 0);
-	wrmsrl(MSR_K7_PERFCTR0, -((u64)cpu_khz*1000) / nmi_hz);
+	wrmsrl(MSR_K7_PERFCTR0, -nmi_interval());
 	apic_write(APIC_LVTPC, APIC_DM_NMI);
 	evntsel |= K7_EVNTSEL_ENABLE;
 	wrmsr(MSR_K7_EVNTSEL0, evntsel, 0);
@@ -409,7 +414,7 @@
 		alert_counter[cpu] = 0;
 	}
 	if (nmi_perfctr_msr)
-		wrmsr(nmi_perfctr_msr, -(cpu_khz/nmi_hz*1000), -1);
+		wrmsr(nmi_perfctr_msr, -nmi_interval(), -1);
 }
 
 static int dummy_nmi_callback(struct pt_regs * regs, int cpu)
Re: 2.6.12-rc2-mm3
This patch fixes the NMI checking problems in -mm x64 for me. It What problems? Sorry, in -mm on x64 check_nmi_watchdog() has started to be run as a late_initcall(). Currently it reports the NMIs as stuck on a few systems although they are not, both of mine are reported as stuck. This appears to be because the current event mask uses don't appear to tick much running mdelay() on opteron (in my case). Also in -mm because nmi_hz is set to 1 in setup_k7_watchdog() the NMI watchdog checking takes 10 seconds, a bit much. Patch below uses RETIRED_UOPS for a more constant rate of NMI sending, this works well for me. However I'd like NMIs to maybe fire every fourth second or so. Using nmi_mult to multiply nmi_interval() by 4 doesn't seem to make it go every fourth second however, maybe every 1.5 second, I'm puzzled about this... Index: x64_mm/arch/x86_64/kernel/nmi.c === --- x64_mm.orig/arch/x86_64/kernel/nmi.c2005-04-18 12:56:05.0 +0200 +++ x64_mm/arch/x86_64/kernel/nmi.c 2005-04-18 13:34:37.0 +0200 @@ -59,6 +59,7 @@ unsigned int nmi_watchdog = NMI_DEFAULT; static unsigned int nmi_hz = HZ; +static int nmi_mult = 1; /* nmi multiplier, how many seconds inbetween */ unsigned int nmi_perfctr_msr; /* the MSR to reset in NMI handler */ /* Note that these events don't tick when the CPU idles. 
This means @@ -68,7 +69,7 @@ #define K7_EVNTSEL_INT (1 20) #define K7_EVNTSEL_OS (1 17) #define K7_EVNTSEL_USR (1 16) -#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING 0x76 +#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING 0xC1 /* Retired uops */ #define K7_NMI_EVENT K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING #define P6_EVNTSEL0_ENABLE (1 22) @@ -78,6 +79,11 @@ #define P6_EVENT_CPU_CLOCKS_NOT_HALTED 0x79 #define P6_NMI_EVENT P6_EVENT_CPU_CLOCKS_NOT_HALTED +static inline unsigned long nmi_interval(void) +{ + return ((unsigned long)cpu_khz * 1000 * nmi_mult) / nmi_hz; +} + /* Run after command line and cpu_init init, but before all other checks */ void __init nmi_watchdog_default(void) { @@ -146,8 +152,10 @@ /* now that we know it works we can reduce NMI frequency to something more reasonable; makes a difference in some configs */ - if (nmi_watchdog == NMI_LOCAL_APIC) + if (nmi_watchdog == NMI_LOCAL_APIC) { nmi_hz = 1; + nmi_mult = 4; + } return 0; } @@ -305,9 +313,6 @@ int i; unsigned int evntsel; - /* No check, so can start with slow frequency */ - nmi_hz = 1; - /* XXX should check these in EFER */ nmi_perfctr_msr = MSR_K7_PERFCTR0; @@ -325,7 +330,7 @@ | K7_NMI_EVENT; wrmsr(MSR_K7_EVNTSEL0, evntsel, 0); - wrmsrl(MSR_K7_PERFCTR0, -((u64)cpu_khz*1000) / nmi_hz); + wrmsrl(MSR_K7_PERFCTR0, -nmi_interval()); apic_write(APIC_LVTPC, APIC_DM_NMI); evntsel |= K7_EVNTSEL_ENABLE; wrmsr(MSR_K7_EVNTSEL0, evntsel, 0); @@ -409,7 +414,7 @@ alert_counter[cpu] = 0; } if (nmi_perfctr_msr) - wrmsr(nmi_perfctr_msr, -(cpu_khz/nmi_hz*1000), -1); + wrmsr(nmi_perfctr_msr, -nmi_interval(), -1); } static int dummy_nmi_callback(struct pt_regs * regs, int cpu) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.12-rc2-mm3
[Proper patch now that goes all the way, sorry for spamming] Patch below uses RETIRED_UOPS for a more constant rate of NMI sending. This makes x64 deliver NMI interrupts every fourth second at a constant rate when going through the local apic. Makes both cpus on my box to get NMIs at constant rate that it previously did not, there could be long delays when a CPU was idle. This fixes misdetection in check_nmi_watchdog() that thought the NMI sending was stuck although it was not because the perfctr did not generate enough events with the previous mask. The 10-second check_nmi_watchdog() delay is down to 10 msec now. Tested on opteron SMP. Index: x64_mm/arch/x86_64/kernel/nmi.c === --- x64_mm.orig/arch/x86_64/kernel/nmi.c2005-04-18 12:56:05.0 +0200 +++ x64_mm/arch/x86_64/kernel/nmi.c 2005-04-18 14:47:14.0 +0200 @@ -59,16 +59,14 @@ unsigned int nmi_watchdog = NMI_DEFAULT; static unsigned int nmi_hz = HZ; +static int nmi_mult = 1; /* nmi multiplier for longer intervals */ unsigned int nmi_perfctr_msr; /* the MSR to reset in NMI handler */ -/* Note that these events don't tick when the CPU idles. This means - the frequency varies with CPU load. 
*/ - #define K7_EVNTSEL_ENABLE (1 22) #define K7_EVNTSEL_INT (1 20) #define K7_EVNTSEL_OS (1 17) #define K7_EVNTSEL_USR (1 16) -#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING 0x76 +#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING 0xC1 /* Retired uops */ #define K7_NMI_EVENT K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING #define P6_EVNTSEL0_ENABLE (1 22) @@ -78,6 +76,11 @@ #define P6_EVENT_CPU_CLOCKS_NOT_HALTED 0x79 #define P6_NMI_EVENT P6_EVENT_CPU_CLOCKS_NOT_HALTED +static inline unsigned long nmi_interval(void) +{ + return ((unsigned long)cpu_khz * 1000 * nmi_mult) / nmi_hz; +} + /* Run after command line and cpu_init init, but before all other checks */ void __init nmi_watchdog_default(void) { @@ -146,8 +149,10 @@ /* now that we know it works we can reduce NMI frequency to something more reasonable; makes a difference in some configs */ - if (nmi_watchdog == NMI_LOCAL_APIC) + if (nmi_watchdog == NMI_LOCAL_APIC) { nmi_hz = 1; + nmi_mult = 8; + } return 0; } @@ -305,9 +310,6 @@ int i; unsigned int evntsel; - /* No check, so can start with slow frequency */ - nmi_hz = 1; - /* XXX should check these in EFER */ nmi_perfctr_msr = MSR_K7_PERFCTR0; @@ -325,7 +327,7 @@ | K7_NMI_EVENT; wrmsr(MSR_K7_EVNTSEL0, evntsel, 0); - wrmsrl(MSR_K7_PERFCTR0, -((u64)cpu_khz*1000) / nmi_hz); + wrmsrl(MSR_K7_PERFCTR0, -nmi_interval()); apic_write(APIC_LVTPC, APIC_DM_NMI); evntsel |= K7_EVNTSEL_ENABLE; wrmsr(MSR_K7_EVNTSEL0, evntsel, 0); @@ -393,10 +395,10 @@ if (last_irq_sums[cpu] == sum) { /* * Ayiee, looks like this CPU is stuck ... -* wait a few IRQs (5 seconds) before doing the oops ... +* wait a few NMIs before doing the oops ... 
*/ alert_counter[cpu]++; - if (alert_counter[cpu] == 5*nmi_hz) { + if (alert_counter[cpu] == 3*nmi_hz) { if (notify_die(DIE_NMI, nmi, regs, reason, 2, SIGINT) == NOTIFY_STOP) { alert_counter[cpu] = 0; @@ -409,7 +411,7 @@ alert_counter[cpu] = 0; } if (nmi_perfctr_msr) - wrmsr(nmi_perfctr_msr, -(cpu_khz/nmi_hz*1000), -1); + wrmsr(nmi_perfctr_msr, -nmi_interval(), -1); } static int dummy_nmi_callback(struct pt_regs * regs, int cpu) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.12-rc2-mm3
On Mon, 2005-04-18 at 13:05 +0200, Alexander Nyberg wrote:
> [Proper patch now that goes all the way, sorry for spamming]
>
> Patch below uses RETIRED_UOPS for a more constant rate of NMI sending.
> This makes x64 deliver NMI interrupts every fourth second at a constant
> rate when going through the local apic.
>
> Makes both cpus on my box to get NMIs at constant rate that it previously
> did not, there could be long delays when a CPU was idle.

isn't this dangerous in the light of the mobile cpus that either scale
back or stop entirely in idle or lower load situations ?
Re: 2.6.12-rc2-mm3
On Mon, 2005-04-18 at 13:14 +0200, Arjan van de Ven wrote:
> On Mon, 2005-04-18 at 13:05 +0200, Alexander Nyberg wrote:
> > [Proper patch now that goes all the way, sorry for spamming]
> >
> > Patch below uses RETIRED_UOPS for a more constant rate of NMI sending.
> > This makes x64 deliver NMI interrupts every fourth second at a constant
> > rate when going through the local apic.
> >
> > Makes both cpus on my box to get NMIs at constant rate that it previously
> > did not, there could be long delays when a CPU was idle.
>
> isn't this dangerous in the light of the mobile cpus that either scale
> back or stop entirely in idle or lower load situations ?

I don't see any real problem: at each nmi_watchdog_tick() the next NMI is
calculated from cpu_khz, so the NMIs might not come at a constant rate
while the frequency is scaling, but over time there will still be one
every fourth second.

And if the CPU stops entirely, as you say, are any uops run at all? Even
if so, the watchdog as it currently stands would also have a few events
accounted to it and could fire an NMI as well.
Re: 2.6.12-rc2-mm3
On Mon, 18 Apr 2005 11:56:15 +0200, Alexander Nyberg wrote:
> > > This patch fixes the NMI checking problems in -mm x64 for me. It
> >
> > What problems?
>
> Sorry, in -mm on x64 check_nmi_watchdog() has started to be run as a
> late_initcall(). Currently it reports the NMIs as stuck on a few systems
> although they are not, both of mine are reported as stuck.
> This appears to be because the current event mask uses don't appear to
> tick much running mdelay() on opteron (in my case).

Please provide a complete dmesg log up to and including the failure point
where the kernel complains about stuck NMIs. I tried 2.6.12-rc2-mm3 SMP
on UP amd64 and I immediately found a bug triggering bogus stuck NMI
failures, and I want to check if what you're seeing is caused by the
same bug.

> Also in -mm because nmi_hz is set to 1 in setup_k7_watchdog() the NMI
> watchdog checking takes 10 seconds, a bit much.

Orthogonal issue. Let's ignore this one for now.

> Patch below uses RETIRED_UOPS for a more constant rate of NMI sending,

This may or may not work as you intend. There is _no_ documented reason
to assume that RETIRED_UOPS would provide a more steady stream of events
than CYCLES_PROCESSOR_IS_RUNNING. Both events are likely to be idle in
HLT states, for instance.

The local APIC + performance counter driven NMI watchdog simply cannot
provide wall-clock like behaviour. You need the I/O-APIC driven watchdog
for that, or to prevent the kernel from using HLT when idle.

> @@ -68,7 +69,7 @@
>  #define K7_EVNTSEL_INT		(1 << 20)
>  #define K7_EVNTSEL_OS		(1 << 17)
>  #define K7_EVNTSEL_USR		(1 << 16)
> -#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING	0x76
> +#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING	0xC1 /* Retired uops */
>  #define K7_NMI_EVENT		K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING

This is as bogus as #define ONE 2. CYCLES_PROCESSOR_IS_RUNNING
_is_ event 0x76 (AMD renamed it recently, but that's irrelevant).
Using RETIRED_UOPS requires a new define, and a modification to
the K7_NMI_EVENT #define.

/Mikael
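The review comment above asks for a separate define for the new event, with K7_NMI_EVENT pointing at it, so that the existing name keeps its real value. A small user-space sketch of what that might look like is below. The name K7_EVENT_RETIRED_UOPS and the helper k7_nmi_evntsel() are assumptions made for illustration; the bit positions and event numbers are the ones quoted in the thread.

```c
#include <assert.h>
#include <stdint.h>

#define K7_EVNTSEL_ENABLE	(1u << 22)
#define K7_EVNTSEL_INT		(1u << 20)
#define K7_EVNTSEL_OS		(1u << 17)
#define K7_EVNTSEL_USR		(1u << 16)

/* Each event keeps its own name and its own number. */
#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING	0x76
#define K7_EVENT_RETIRED_UOPS			0xC1	/* hypothetical name */

/* Only this line changes when the watchdog switches events. */
#define K7_NMI_EVENT	K7_EVENT_RETIRED_UOPS

/* Event-select value before the enable bit is set (hypothetical helper). */
uint32_t k7_nmi_evntsel(void)
{
	/* The event number occupies the low byte of the select value. */
	assert((K7_NMI_EVENT & ~0xFFu) == 0);

	return K7_EVNTSEL_INT | K7_EVNTSEL_OS | K7_EVNTSEL_USR | K7_NMI_EVENT;
}
```

Code that builds evntsel can then keep using K7_NMI_EVENT unchanged, which is what the later "Redefine K7_NMI_EVENT instead" remark in the thread also asks for.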
Re: 2.6.12-rc2-mm3: hostap: do not #include .c files
On Tue, Apr 19, 2005 at 04:03:12AM +0200, Adrian Bunk wrote:
> > > drivers/net/wireless/hostap/hostap.c:#include "hostap_crypt.c"
> > >
> > > Please do not #include .c files.
> >
> > A tested patch would be appreciated.. ;-)
>
> A proper separation in a .c file and a header file is the better solution.

Agreed and this is on my to-do list, but not very high on it. Some of
these would be relatively easy to fix, but the hardware specific ones
(different register offsets for PC Card/PLX/PCI) would require quite a
bit of changes to get rid of this.

-- 
Jouni Malinen                                            PGP id EFC895FA
Re: 2.6.12-rc2-mm3
On Mon, 18 Apr 2005 00:27:02 +0200, Alexander Nyberg wrote:
>This patch fixes the NMI checking problems in -mm x64 for me. It

What problems?

>changes the perfctr selection to use RETIRED_UOPS instead
>(makes both processors tick even on my box).

This patch mixes what appears to be cleanups with what appears to
be bug fixes. Please separate them, and describe the problem
in enough detail that we can judge the validity of the bug-fix
parts.

>Index: x64_mm/arch/x86_64/kernel/nmi.c
>===
>--- x64_mm.orig/arch/x86_64/kernel/nmi.c	2005-04-17 14:34:09.0 +0200
>+++ x64_mm/arch/x86_64/kernel/nmi.c	2005-04-18 02:11:37.0 +0200
>@@ -58,7 +58,7 @@
> int panic_on_timeout;
> 
> unsigned int nmi_watchdog = NMI_DEFAULT;
>-static unsigned int nmi_hz = HZ;
>+static unsigned long nmi_hz = HZ;

Why? Surely the value won't exceed 2^32-1?

> unsigned int nmi_perfctr_msr;	/* the MSR to reset in NMI handler */
> 
> /* Note that these events don't tick when the CPU idles. This means
>@@ -70,6 +70,7 @@
> #define K7_EVNTSEL_USR		(1 << 16)
> #define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING	0x76
> #define K7_NMI_EVENT		K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING
>+#define K7_RETIRED_UOPS	0xC1 /* always running */
> 
> #define P6_EVNTSEL0_ENABLE	(1 << 22)
> #define P6_EVNTSEL_INT		(1 << 20)
>@@ -78,6 +79,11 @@
> #define P6_EVENT_CPU_CLOCKS_NOT_HALTED	0x79
> #define P6_NMI_EVENT		P6_EVENT_CPU_CLOCKS_NOT_HALTED
> 
>+static inline unsigned long nmi_interval(void)
>+{
>+	return (((unsigned long)cpu_khz * 1000UL) / nmi_hz);

Extraneous parentheses.
Also I'd prefer to divide before the multiply.

>+}
>+
> /* Run after command line and cpu_init init, but before all other checks */
> void __init nmi_watchdog_default(void)
> {
>@@ -129,8 +135,8 @@
> 
> 	for (cpu = 0; cpu < NR_CPUS; cpu++)
> 		counts[cpu] = cpu_pda[cpu].__nmi_count;
>-	local_irq_enable();

Why?

>-	mdelay((10*1000)/nmi_hz); // wait 10 ticks
>+
>+	mdelay((10*1000) / nmi_hz); /* wait 10 NMI ticks */

Not a bug fix.

> 
> 	for (cpu = 0; cpu < NR_CPUS; cpu++) {
> 		if (cpu_pda[cpu].__nmi_count - counts[cpu] <= 5) {
>@@ -305,9 +311,6 @@
> 	int i;
> 	unsigned int evntsel;
> 
>-	/* No check, so can start with slow frequency */
>-	nmi_hz = 1;
>-

What's this for?

> 	/* XXX should check these in EFER */
> 
> 	nmi_perfctr_msr = MSR_K7_PERFCTR0;
>@@ -322,10 +325,10 @@
> 	evntsel = K7_EVNTSEL_INT
> 		| K7_EVNTSEL_OS
> 		| K7_EVNTSEL_USR
>-		| K7_NMI_EVENT;
>+		| K7_RETIRED_UOPS;

Bogus. Redefine K7_NMI_EVENT instead.

/Mikael
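The review does not say why dividing before multiplying is preferred. One plausible reading, offered here only as an assumption, is the usual trade-off between overflow and precision: multiplying first is exact but can wrap in a narrow type, while dividing first keeps the intermediate small but truncates cpu_khz / nmi_hz. The sketch below makes that trade-off concrete with fixed-width types; the function names are hypothetical and none of this is kernel code. On x86_64, where unsigned long is 64 bits wide, the multiply-first form in the patch does not wrap for realistic clock rates.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply first in 32 bits: exact until cpu_khz * 1000 wraps past 2^32 - 1. */
uint32_t interval_mul_first_32(uint32_t cpu_khz, uint32_t nmi_hz)
{
	assert(nmi_hz != 0);
	return cpu_khz * 1000u / nmi_hz;
}

/* Divide first: the intermediate stays small, but cpu_khz / nmi_hz is truncated. */
uint32_t interval_div_first_32(uint32_t cpu_khz, uint32_t nmi_hz)
{
	assert(nmi_hz != 0);
	return cpu_khz / nmi_hz * 1000u;
}

/* Multiply first in 64 bits: exact, and no wrap for any 32-bit cpu_khz. */
uint64_t interval_mul_first_64(uint32_t cpu_khz, uint32_t nmi_hz)
{
	assert(nmi_hz != 0);
	return (uint64_t)cpu_khz * 1000u / nmi_hz;
}
```

For a hypothetical 5 GHz part (cpu_khz = 5000000) with nmi_hz = 1000, the 32-bit multiply-first version wraps and returns a value far too small, while for cpu_khz = 1999999 the divide-first version drops the last 999 kHz.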
Re: 2.6.12-rc2-mm3
On Mon, 2005-04-11 at 01:25 -0700, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/
>

I tried to kexec on my x64 and it hangs in calibrate_delay() because
the PIT never fires any interrupts, so jiffies is never updated.

Has kexec been tested on x64, and should it be working? I want to know
whether I should start looking for weirdness in my hardware or whether
it is like this on all x64 boxes.

Also, the patch at the bottom is needed to compile kexec on x64 without
ia32 emulation support (the includes are not used at the moment).

  CC      arch/x86_64/kernel/crash.o
In file included from arch/x86_64/kernel/crash.c:18:
include/linux/elfcore.h: In function `elf_core_copy_regs':
include/linux/elfcore.h:92: error: dereferencing pointer to incomplete type
include/linux/elfcore.h:92: error: dereferencing pointer to incomplete type

Index: x64_mm/arch/x86_64/kernel/crash.c
===
--- x64_mm.orig/arch/x86_64/kernel/crash.c	2005-04-16 19:23:58.0 +0200
+++ x64_mm/arch/x86_64/kernel/crash.c	2005-04-16 19:47:56.0 +0200
@@ -14,8 +14,6 @@
 #include <linux/irq.h>
 #include <linux/reboot.h>
 #include <linux/kexec.h>
-#include <linux/elf.h>
-#include <linux/elfcore.h>
 
 #include <asm/processor.h>
 #include <asm/hardirq.h>
Re: 2.6.12-rc2-mm3
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/
>

[Mikael Pettersson on CC, would like your advice]

This patch fixes the NMI checking problems in -mm x64 for me. It
changes the perfctr selection to use RETIRED_UOPS instead
(makes both processors tick even on my box).

This makes the NMI tick once per second while running which is quite
much, I'd like to get it down to every fourth second and herein lies
the problem. Multiplying nmi_interval() in patch below with 4 does
not help, still ticks at about the same pace.

I'm puzzled...

Index: x64_mm/arch/x86_64/kernel/nmi.c
===
--- x64_mm.orig/arch/x86_64/kernel/nmi.c	2005-04-17 14:34:09.0 +0200
+++ x64_mm/arch/x86_64/kernel/nmi.c	2005-04-18 02:11:37.0 +0200
@@ -58,7 +58,7 @@
 int panic_on_timeout;
 
 unsigned int nmi_watchdog = NMI_DEFAULT;
-static unsigned int nmi_hz = HZ;
+static unsigned long nmi_hz = HZ;
 unsigned int nmi_perfctr_msr;	/* the MSR to reset in NMI handler */
 
 /* Note that these events don't tick when the CPU idles. This means
@@ -70,6 +70,7 @@
 #define K7_EVNTSEL_USR		(1 << 16)
 #define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING	0x76
 #define K7_NMI_EVENT		K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING
+#define K7_RETIRED_UOPS	0xC1 /* always running */
 
 #define P6_EVNTSEL0_ENABLE	(1 << 22)
 #define P6_EVNTSEL_INT		(1 << 20)
@@ -78,6 +79,11 @@
 #define P6_EVENT_CPU_CLOCKS_NOT_HALTED	0x79
 #define P6_NMI_EVENT		P6_EVENT_CPU_CLOCKS_NOT_HALTED
 
+static inline unsigned long nmi_interval(void)
+{
+	return (((unsigned long)cpu_khz * 1000UL) / nmi_hz);
+}
+
 /* Run after command line and cpu_init init, but before all other checks */
 void __init nmi_watchdog_default(void)
 {
@@ -129,8 +135,8 @@
 
 	for (cpu = 0; cpu < NR_CPUS; cpu++)
 		counts[cpu] = cpu_pda[cpu].__nmi_count;
-	local_irq_enable();
-	mdelay((10*1000)/nmi_hz); // wait 10 ticks
+
+	mdelay((10*1000) / nmi_hz); /* wait 10 NMI ticks */
 
 	for (cpu = 0; cpu < NR_CPUS; cpu++) {
 		if (cpu_pda[cpu].__nmi_count - counts[cpu] <= 5) {
@@ -305,9 +311,6 @@
 	int i;
 	unsigned int evntsel;
 
-	/* No check, so can start with slow frequency */
-	nmi_hz = 1;
-
 	/* XXX should check these in EFER */
 
 	nmi_perfctr_msr = MSR_K7_PERFCTR0;
@@ -322,10 +325,10 @@
 	evntsel = K7_EVNTSEL_INT
 		| K7_EVNTSEL_OS
 		| K7_EVNTSEL_USR
-		| K7_NMI_EVENT;
+		| K7_RETIRED_UOPS;
 
 	wrmsr(MSR_K7_EVNTSEL0, evntsel, 0);
-	wrmsrl(MSR_K7_PERFCTR0, -((u64)cpu_khz*1000) / nmi_hz);
+	wrmsrl(MSR_K7_PERFCTR0, -nmi_interval());
 	apic_write(APIC_LVTPC, APIC_DM_NMI);
 	evntsel |= K7_EVNTSEL_ENABLE;
 	wrmsr(MSR_K7_EVNTSEL0, evntsel, 0);
@@ -409,7 +412,7 @@
 		alert_counter[cpu] = 0;
 	}
 	if (nmi_perfctr_msr)
-		wrmsr(nmi_perfctr_msr, -nmi_interval(), -1);
+		wrmsr(nmi_perfctr_msr, -nmi_interval(), -1);
 }
 
 static int dummy_nmi_callback(struct pt_regs * regs, int cpu)
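The thread does not settle why multiplying the interval by 4 left the tick rate roughly unchanged. As a way to reason about the arithmetic, and not as a diagnosis, the sketch below computes the reload interval and shows what survives when the negated value is squeezed into a 32-bit low half, which is how the two-argument wrmsr(msr, low, high) form takes its data. All names are hypothetical and the 2 GHz clock is an assumed example value.

```c
#include <assert.h>
#include <stdint.h>

/* Events to count between NMIs; the same formula as nmi_interval(), in 64 bits. */
uint64_t nmi_interval(uint32_t cpu_khz, uint32_t nmi_hz, uint32_t mult)
{
	assert(nmi_hz != 0);
	return (uint64_t)cpu_khz * 1000u * mult / nmi_hz;
}

/* Low 32 bits of the negated interval: all that a 32-bit "low" argument carries. */
uint32_t reload_low32(uint64_t interval)
{
	return (uint32_t)(0 - interval);
}

/* Events until a counter overflows when its upper bits are all ones. */
uint64_t events_until_overflow(uint32_t low)
{
	return (UINT64_C(1) << 32) - low;
}
```

For an assumed cpu_khz of 2000000 and nmi_hz of 1, an interval of 2000000000 events fits in 32 bits and round-trips unchanged. Four times that, 8000000000, does not fit: after truncation the counter would overflow after 3705032704 events, less than twice the original interval instead of four times. Whether this is what happened on the reporter's machine is not established in the thread.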
Re: 2.6.12-rc2-mm3
Benjamin Herrenschmidt <[EMAIL PROTECTED]> writes:

> On Fri, 2005-04-15 at 20:23 +0200, Juergen Kreileder wrote:
>> Juergen Kreileder <[EMAIL PROTECTED]> writes:
>>
>>> Andrew Morton <[EMAIL PROTECTED]> writes:
>>>
>>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/
>>>
>>> I'm getting frequent lockups on my PowerMac G5 with rc2-mm3.
>>
>> I think I finally found the culprit.  Both rc2-mm3 and rc1-mm1 work
>> fine when I reverse the timer-* patches.
>>
>> Any idea?  Bug in my ppc64 gcc?
>
> Or a bug in those patches,

Probably.  I've tried a different toolchain now (3.4.3), didn't help.


        Juergen

-- 
Juergen Kreileder, Blackdown Java-Linux Team
http://blog.blackdown.de/
Re: 2.6.12-rc2-mm3
On Fri, 2005-04-15 at 20:23 +0200, Juergen Kreileder wrote:
> Juergen Kreileder <[EMAIL PROTECTED]> writes:
>
> > Andrew Morton <[EMAIL PROTECTED]> writes:
> >
> >> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/
> >
> > I'm getting frequent lockups on my PowerMac G5 with rc2-mm3.
>
> I think I finally found the culprit.  Both rc2-mm3 and rc1-mm1 work
> fine when I reverse the timer-* patches.
>
> Any idea?  Bug in my ppc64 gcc?

Or a bug in those patches, I'll have a look as soon as I find 5 minutes.

Ben.
Re: 2.6.12-rc2-mm3
Juergen Kreileder <[EMAIL PROTECTED]> writes:

> Andrew Morton <[EMAIL PROTECTED]> writes:
>
>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/
>
> I'm getting frequent lockups on my PowerMac G5 with rc2-mm3.

I think I finally found the culprit.  Both rc2-mm3 and rc1-mm1 work
fine when I reverse the timer-* patches.

Any idea?  Bug in my ppc64 gcc?


        Juergen

-- 
Juergen Kreileder, Blackdown Java-Linux Team
http://blog.blackdown.de/
Re: 2.6.12-rc2-mm3
Hello.

Ingo Molnar wrote:
> does the patch below fix the problem for you?

Works perfectly, thank you!
Re: 2.6.12-rc2-mm3
Ed Tomlinson <[EMAIL PROTECTED]> wrote:
>
> On Wednesday 13 April 2005 20:20, Andrew Morton wrote:
> > Ed Tomlinson <[EMAIL PROTECTED]> wrote:
> > >
> > > > Don't think so - it works OK here.  Checked the .config?  Does the serial
> > > > port work if you do `echo foo > /dev/ttyS0'?  ACPI?
> > >
> > > Turned out it was some old ups software that got reactivated on the box displaying the
> > > console - was a pain to disable it
> >
> > OK.
> >
> > > In any case, when the box reboots there are not any messages.  Any ideas on what debug
> > > options to enable or suggestions on how we can figure out the cause of the reboots.
> >
> > There were a few problems in the task switching area - maybe that.
>
> These hit arch/i386.  Are they going to help on an x86_64 box?

nope.
Re: 2.6.12-rc2-mm3
On Wednesday 13 April 2005 20:20, Andrew Morton wrote:
> Ed Tomlinson <[EMAIL PROTECTED]> wrote:
> >
> > > Don't think so - it works OK here.  Checked the .config?  Does the serial
> > > port work if you do `echo foo > /dev/ttyS0'?  ACPI?
> >
> > Turned out it was some old ups software that got reactivated on the box displaying the
> > console - was a pain to disable it
>
> OK.
>
> > In any case, when the box reboots there are not any messages.  Any ideas on what debug
> > options to enable or suggestions on how we can figure out the cause of the reboots.
>
> There were a few problems in the task switching area - maybe that.

These hit arch/i386.  Are they going to help on an x86_64 box?

Ed

> From: Ingo Molnar <[EMAIL PROTECTED]>
>
> delay the reloading of segment registers into switch_mm(), so that if
> the LDT size changes we dont get a (silent) fault and a zeroed selector
> register upon reloading.
>
> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> ---
>
>  25-akpm/arch/i386/kernel/process.c     |   10 +-
>  25-akpm/include/asm-i386/mmu_context.h |    7 +++
>  2 files changed, 12 insertions(+), 5 deletions(-)
>
> diff -puN arch/i386/kernel/process.c~sched-unlocked-context-switches-fix arch/i386/kernel/process.c
> --- 25/arch/i386/kernel/process.c~sched-unlocked-context-switches-fix	2005-04-12 03:43:07.254363568 -0700
> +++ 25-akpm/arch/i386/kernel/process.c	2005-04-12 03:43:07.259362808 -0700
> @@ -653,12 +653,12 @@ struct task_struct fastcall * __switch_t
>  	asm volatile("mov %%gs,%0":"=m" (prev->gs));
>  
>  	/*
> -	 * Restore %fs and %gs if needed.
> +	 * Clear selectors if needed:
>  	 */
> -	if (unlikely(prev->fs | prev->gs | next->fs | next->gs)) {
> -		loadsegment(fs, next->fs);
> -		loadsegment(gs, next->gs);
> -	}
> +        if (unlikely((prev->fs | prev->gs) && !(next->fs | next->gs))) {
> +                loadsegment(fs, next->fs);
> +                loadsegment(gs, next->gs);
> +        }
>  
>  	/*
>  	 * Now maybe reload the debug registers
> diff -puN include/asm-i386/mmu_context.h~sched-unlocked-context-switches-fix include/asm-i386/mmu_context.h
> --- 25/include/asm-i386/mmu_context.h~sched-unlocked-context-switches-fix	2005-04-12 03:43:07.256363264 -0700
> +++ 25-akpm/include/asm-i386/mmu_context.h	2005-04-12 03:43:07.260362656 -0700
> @@ -61,6 +61,13 @@ static inline void switch_mm(struct mm_s
>  		}
>  	}
>  #endif
> +	/*
> +	 * Now that we've switched the LDT, load segments:
> +	 */
> +	if (unlikely(current->thread.fs | current->thread.gs)) {
> +		loadsegment(fs, current->thread.fs);
> +		loadsegment(gs, current->thread.gs);
> +	}
>  }
>  
>  #define deactivate_mm(tsk, mm) \
> _
Re: 2.6.12-rc2-mm3
Ed Tomlinson <[EMAIL PROTECTED]> wrote:
>
> > Don't think so - it works OK here.  Checked the .config?  Does the serial
> > port work if you do `echo foo > /dev/ttyS0'?  ACPI?
>
> Turned out it was some old ups software that got reactivated on the box displaying the
> console - was a pain to disable it

OK.

> In any case, when the box reboots there are not any messages.  Any ideas on what debug
> options to enable or suggestions on how we can figure out the cause of the reboots.

There were a few problems in the task switching area - maybe that.


From: Ingo Molnar <[EMAIL PROTECTED]>

delay the reloading of segment registers into switch_mm(), so that if
the LDT size changes we dont get a (silent) fault and a zeroed selector
register upon reloading.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 25-akpm/arch/i386/kernel/process.c     |   10 +-
 25-akpm/include/asm-i386/mmu_context.h |    7 +++
 2 files changed, 12 insertions(+), 5 deletions(-)

diff -puN arch/i386/kernel/process.c~sched-unlocked-context-switches-fix arch/i386/kernel/process.c
--- 25/arch/i386/kernel/process.c~sched-unlocked-context-switches-fix	2005-04-12 03:43:07.254363568 -0700
+++ 25-akpm/arch/i386/kernel/process.c	2005-04-12 03:43:07.259362808 -0700
@@ -653,12 +653,12 @@ struct task_struct fastcall * __switch_t
 	asm volatile("mov %%gs,%0":"=m" (prev->gs));
 
 	/*
-	 * Restore %fs and %gs if needed.
+	 * Clear selectors if needed:
 	 */
-	if (unlikely(prev->fs | prev->gs | next->fs | next->gs)) {
-		loadsegment(fs, next->fs);
-		loadsegment(gs, next->gs);
-	}
+        if (unlikely((prev->fs | prev->gs) && !(next->fs | next->gs))) {
+                loadsegment(fs, next->fs);
+                loadsegment(gs, next->gs);
+        }
 
 	/*
 	 * Now maybe reload the debug registers
diff -puN include/asm-i386/mmu_context.h~sched-unlocked-context-switches-fix include/asm-i386/mmu_context.h
--- 25/include/asm-i386/mmu_context.h~sched-unlocked-context-switches-fix	2005-04-12 03:43:07.256363264 -0700
+++ 25-akpm/include/asm-i386/mmu_context.h	2005-04-12 03:43:07.260362656 -0700
@@ -61,6 +61,13 @@ static inline void switch_mm(struct mm_s
 		}
 	}
 #endif
+	/*
+	 * Now that we've switched the LDT, load segments:
+	 */
+	if (unlikely(current->thread.fs | current->thread.gs)) {
+		loadsegment(fs, current->thread.fs);
+		loadsegment(gs, current->thread.gs);
+	}
 }
 
 #define deactivate_mm(tsk, mm) \
_
Re: 2.6.12-rc2-mm3
On Tuesday 12 April 2005 07:39, Andrew Morton wrote:
> Ed Tomlinson <[EMAIL PROTECTED]> wrote:
> >
> > On Monday 11 April 2005 04:25, Andrew Morton wrote:
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/
> > >
> > >
> > > - The anticipatory I/O scheduler has always been fairly useless with SCSI
> > >   disks which perform tagged command queueing.  There's a patch here from Jens
> > >   which is designed to fix that up by constraining the number of requests
> > >   which we'll leave pending in the device.
> > >
> > >   The depth currently defaults to 1.  Tunable in
> > >   /sys/block/hdX/queue/iosched/queue_depth
> > >
> > >   This patch hasn't been performance tested at all yet.  If you think it is
> > >   misbehaving (the usual symptom is processes stuck in D state) then please
> > >   report it, then boot with `elevator=cfq' or `elevator=deadline' to work
> > >   around it.
> > >
> > > - More CPU scheduler work.  I hope someone is testing this stuff.
> >
> > Something is not quite right here.  I built rc2-mm3 and booted (uni processor, amd64, preempt on).
> > mm3 lasted about 30 mins before locking up with a dead keyboard.  I had mm2 reboot a few times
> > over the last couple of days too.
> >
> > 11-mm3 uptime of 2 weeks+
> > 12-rc2-mm2 reboots once every couple of days
> > 12-rc2-mm3 locked up within 30 mins using X using kmail/bogofilter
>
> Unpleasant.  Serial console would be nice ;)
>
> > My serial console does not seem to want to work.  Has anything changed with this support?
> >
>
> Don't think so - it works OK here.  Checked the .config?  Does the serial
> port work if you do `echo foo > /dev/ttyS0'?  ACPI?

Turned out it was some old ups software that got reactivated on the box displaying the
console - was a pain to disable it

In any case, when the box reboots there are not any messages.  Any ideas on what debug
options to enable or suggestions on how we can figure out the cause of the reboots.

TIA,
Ed Tomlinson
Re: 2.6.12-rc2-mm3
* Stas Sergeev <[EMAIL PROTECTED]> wrote:

> Hi Ingo.
>
> I have some programs that crash
> in 2.6.12-rc2-mm3. After seeing this:
> http://www.uwsg.iu.edu/hypermail/linux/kernel/0504.1/1091.html

does the patch below fix the problem for you? (already in Andrew's tree,
should be in the next -mm patch)

	Ingo

--
delay the reloading of segment registers into switch_mm(), so that if
the LDT size changes we don't get a (silent) fault and a zeroed selector
register upon reloading.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

--- linux/arch/i386/kernel/process.c.orig
+++ linux/arch/i386/kernel/process.c
@@ -612,12 +612,12 @@ struct task_struct fastcall * __switch_t
 	asm volatile("movl %%gs,%0":"=m" (*(int *)&prev->gs));
 
 	/*
-	 * Restore %fs and %gs if needed.
+	 * Clear selectors if needed:
 	 */
-	if (unlikely(prev->fs | prev->gs | next->fs | next->gs)) {
-		loadsegment(fs, next->fs);
-		loadsegment(gs, next->gs);
-	}
+	if (unlikely((prev->fs | prev->gs) && !(next->fs | next->gs))) {
+		loadsegment(fs, next->fs);
+		loadsegment(gs, next->gs);
+	}
 
 	/*
 	 * Now maybe reload the debug registers
--- linux/include/asm-i386/mmu_context.h.orig
+++ linux/include/asm-i386/mmu_context.h
@@ -61,6 +61,13 @@ static inline void switch_mm(struct mm_s
 		}
 	}
 #endif
+	/*
+	 * Now that we've switched the LDT, load segments:
+	 */
+	if (unlikely(current->thread.fs | current->thread.gs)) {
+		loadsegment(fs, current->thread.fs);
+		loadsegment(gs, current->thread.gs);
+	}
 }
 
 #define deactivate_mm(tsk, mm) \
Re: 2.6.12-rc2-mm3
Hi Ingo.

I have some programs that crash in 2.6.12-rc2-mm3.  After seeing this:
http://www.uwsg.iu.edu/hypermail/linux/kernel/0504.1/1091.html
I tried to revert the sched-unlocked-context-switches.patch and indeed
the problem goes away.

Attached is the (crappy) test-case.  If you can make it say "All OK",
then the problem is solved.  Apparently %fs gets trashed somewhere;
any ideas?

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <signal.h>
#include <linux/unistd.h>
#include <asm/ldt.h>
#include <asm/ucontext.h>

_syscall3(int, modify_ldt, int, func, void *, ptr, unsigned long, bytecount)

static int set_ldt_entry(int entry, unsigned long base, unsigned int limit,
			 int seg_32bit_flag, int contents, int read_only_flag,
			 int limit_in_pages_flag, int seg_not_present,
			 int useable)
{
	struct modify_ldt_ldt_s ldt_info;

	ldt_info.entry_number = entry;
	ldt_info.base_addr = base;
	ldt_info.limit = limit;
	ldt_info.seg_32bit = seg_32bit_flag;
	ldt_info.contents = contents;
	ldt_info.read_exec_only = read_only_flag;
	ldt_info.limit_in_pages = limit_in_pages_flag;
	ldt_info.seg_not_present = seg_not_present;
	ldt_info.useable = useable;
	/* func 1 = write an LDT entry */
	return modify_ldt(1, &ldt_info, sizeof(ldt_info));
}

int main(int argc, char *argv[])
{
	unsigned short _ss, new_ss, fs;

	/* Get SS */
	asm volatile(
		"movw %%ss, %0\n"
		:"=m"(_ss)
	);
	/* Force to LDT */
	new_ss = (_ss & 0xffff) | 4;
	/* Create the LDT entry */
	set_ldt_entry(new_ss >> 3, 0, 0xfffff, 0,
		      MODIFY_LDT_CONTENTS_DATA, 0, 1, 0, 0);

	asm ("movw %%fs, %0":"=m"(fs));
	printf("fs1=0x%hx\n", fs);
	asm ("movw %0, %%fs"::"a"(new_ss));
	asm ("movw %%fs, %0":"=m"(fs));
	printf("fs2=0x%hx\n", fs);
	/* Give the kernel a chance to context-switch */
	usleep(0);
	asm ("movw %%fs, %0":"=m"(fs));
	printf("fs3=0x%hx\n", fs);
	if (fs != new_ss)
		printf("BUG!\n");
	else
		printf("All OK\n");
	return 0;
}
Re: 2.6.12-rc2-mm3
Ed Tomlinson <[EMAIL PROTECTED]> wrote:
>
> > Don't think so - it works OK here.  Checked the .config?  Does the serial
> > port work if you do `echo foo > /dev/ttyS0'?  ACPI?
>
> Turned out it was some old ups software that got reactivated on the box
> displaying the console - was a pain to disable it

OK.

> In any case, when the box reboots there are not any messages.  Any ideas on
> what debug options to enable or suggestions on how we can figure out the
> cause of the reboots.

There were a few problems in the task switching area - maybe that.


From: Ingo Molnar <[EMAIL PROTECTED]>

delay the reloading of segment registers into switch_mm(), so that if
the LDT size changes we dont get a (silent) fault and a zeroed selector
register upon reloading.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 25-akpm/arch/i386/kernel/process.c     |   10 +++++-----
 25-akpm/include/asm-i386/mmu_context.h |    7 +++++++
 2 files changed, 12 insertions(+), 5 deletions(-)

diff -puN arch/i386/kernel/process.c~sched-unlocked-context-switches-fix arch/i386/kernel/process.c
--- 25/arch/i386/kernel/process.c~sched-unlocked-context-switches-fix	2005-04-12 03:43:07.254363568 -0700
+++ 25-akpm/arch/i386/kernel/process.c	2005-04-12 03:43:07.259362808 -0700
@@ -653,12 +653,12 @@ struct task_struct fastcall * __switch_t
 	asm volatile("mov %%gs,%0":"=m" (prev->gs));
 
 	/*
-	 * Restore %fs and %gs if needed.
+	 * Clear selectors if needed:
 	 */
-	if (unlikely(prev->fs | prev->gs | next->fs | next->gs)) {
-		loadsegment(fs, next->fs);
-		loadsegment(gs, next->gs);
-	}
+	if (unlikely((prev->fs | prev->gs) && !(next->fs | next->gs))) {
+		loadsegment(fs, next->fs);
+		loadsegment(gs, next->gs);
+	}
 
 	/*
 	 * Now maybe reload the debug registers
diff -puN include/asm-i386/mmu_context.h~sched-unlocked-context-switches-fix include/asm-i386/mmu_context.h
--- 25/include/asm-i386/mmu_context.h~sched-unlocked-context-switches-fix	2005-04-12 03:43:07.256363264 -0700
+++ 25-akpm/include/asm-i386/mmu_context.h	2005-04-12 03:43:07.260362656 -0700
@@ -61,6 +61,13 @@ static inline void switch_mm(struct mm_s
 		}
 	}
 #endif
+	/*
+	 * Now that we've switched the LDT, load segments:
+	 */
+	if (unlikely(current->thread.fs | current->thread.gs)) {
+		loadsegment(fs, current->thread.fs);
+		loadsegment(gs, current->thread.gs);
+	}
 }
 
 #define deactivate_mm(tsk, mm) \
_
Re: 2.6.12-rc2-mm3
On Wednesday 13 April 2005 20:20, Andrew Morton wrote:
> Ed Tomlinson <[EMAIL PROTECTED]> wrote:
> >
> > > Don't think so - it works OK here.  Checked the .config?  Does the serial
> > > port work if you do `echo foo > /dev/ttyS0'?  ACPI?
> >
> > Turned out it was some old ups software that got reactivated on the box
> > displaying the console - was a pain to disable it
>
> OK.
>
> > In any case, when the box reboots there are not any messages.  Any ideas on
> > what debug options to enable or suggestions on how we can figure out the
> > cause of the reboots.
>
> There were a few problems in the task switching area - maybe that.

These hit arch/i386.  Are they going to help on an x86_64 box?

Ed

> From: Ingo Molnar <[EMAIL PROTECTED]>
>
> delay the reloading of segment registers into switch_mm(), so that if
> the LDT size changes we dont get a (silent) fault and a zeroed selector
> register upon reloading.
>
> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> ---
>
>  25-akpm/arch/i386/kernel/process.c     |   10 +++++-----
>  25-akpm/include/asm-i386/mmu_context.h |    7 +++++++
>  2 files changed, 12 insertions(+), 5 deletions(-)
>
> diff -puN arch/i386/kernel/process.c~sched-unlocked-context-switches-fix arch/i386/kernel/process.c
> --- 25/arch/i386/kernel/process.c~sched-unlocked-context-switches-fix	2005-04-12 03:43:07.254363568 -0700
> +++ 25-akpm/arch/i386/kernel/process.c	2005-04-12 03:43:07.259362808 -0700
> @@ -653,12 +653,12 @@ struct task_struct fastcall * __switch_t
>  	asm volatile("mov %%gs,%0":"=m" (prev->gs));
>
>  	/*
> -	 * Restore %fs and %gs if needed.
> +	 * Clear selectors if needed:
>  	 */
> -	if (unlikely(prev->fs | prev->gs | next->fs | next->gs)) {
> -		loadsegment(fs, next->fs);
> -		loadsegment(gs, next->gs);
> -	}
> +	if (unlikely((prev->fs | prev->gs) && !(next->fs | next->gs))) {
> +		loadsegment(fs, next->fs);
> +		loadsegment(gs, next->gs);
> +	}
>
>  	/*
>  	 * Now maybe reload the debug registers
> diff -puN include/asm-i386/mmu_context.h~sched-unlocked-context-switches-fix include/asm-i386/mmu_context.h
> --- 25/include/asm-i386/mmu_context.h~sched-unlocked-context-switches-fix	2005-04-12 03:43:07.256363264 -0700
> +++ 25-akpm/include/asm-i386/mmu_context.h	2005-04-12 03:43:07.260362656 -0700
> @@ -61,6 +61,13 @@ static inline void switch_mm(struct mm_s
>  		}
>  	}
>  #endif
> +	/*
> +	 * Now that we've switched the LDT, load segments:
> +	 */
> +	if (unlikely(current->thread.fs | current->thread.gs)) {
> +		loadsegment(fs, current->thread.fs);
> +		loadsegment(gs, current->thread.gs);
> +	}
>  }
>
>  #define deactivate_mm(tsk, mm) \
> _
Re: 2.6.12-rc2-mm3
Ed Tomlinson <[EMAIL PROTECTED]> wrote:
>
> On Wednesday 13 April 2005 20:20, Andrew Morton wrote:
> > Ed Tomlinson <[EMAIL PROTECTED]> wrote:
> > >
> > > > Don't think so - it works OK here.  Checked the .config?  Does the
> > > > serial port work if you do `echo foo > /dev/ttyS0'?  ACPI?
> > >
> > > Turned out it was some old ups software that got reactivated on the box
> > > displaying the console - was a pain to disable it
> >
> > OK.
> >
> > > In any case, when the box reboots there are not any messages.  Any
> > > ideas on what debug options to enable or suggestions on how we can
> > > figure out the cause of the reboots.
> >
> > There were a few problems in the task switching area - maybe that.
>
> These hit arch/i386.  Are they going to help on an x86_64 box?

nope.
Re: 2.6.12-rc2-mm3
Andrew Morton wrote:
Nick Piggin <[EMAIL PROTECTED]> wrote:

> > AS basically does its own TCQ strangulation, which IIRC involves things
> > like completing all reads before issuing new writes, and completing all
> > reads from one process before reads from another. As well as the
> > fundamental way that waiting for a 'dependant read' throttles TCQ.
>
> My (mpt-fusion-based) workstation is still really slow when there's a lot
> of writeout happening. Just from a quick test:
>
> > 2.6.12-rc2,       as,   tcq depth=2:   7.241 seconds
> > 2.6.12-rc2,       as,   tcq depth=64:  12.172 seconds
> > 2.6.12-rc2+patch, as,   tcq depth=64:  7.199 seconds
> > 2.6.12-rc2,       cfq2, tcq depth=64:  much more than 5 minutes
> > 2.6.12-rc2,       cfq3, tcq depth=64:  much more than 5 minutes
>
> 2.6.11-rc4-mm1,   as,   mpt-f          39.349 seconds
>
> That was really really slow but had a sudden burst of read I/O at the end
> which made the thing look better than it really is. I wouldn't have a clue
> what tag depth it's using, and it's the only mpt-fusion based machine I
> have handy...
>

Heh. Well with my current lineup on the mpt-fusion driver and no
as-limit-queue-depth.patch that test takes 17 seconds. With
as-limit-queue-depth.patch it's down to 10 seconds. Which is pretty darn
good btw.

I assume from this:

scsi0 : ioc0: LSI53C1030, FwRev=01030600h, Ports=1, MaxQ=222, IRQ=25
scsi1 : ioc1: LSI53C1030, FwRev=01030600h, Ports=1, MaxQ=222, IRQ=26

that it's using a tag depth of 222.

	int req_depth;	/* Number of request frames */

I wonder if that's true...

One thing which changed is that this kernel now has the fixed-up mpt-fusion
chipset tuning. That doubles the IO bandwidth, which would pretty well
account for that difference.

I'll wait and see how irritating things get under writeout load.

Yes, we'll need to decide if we want to retain as-limit-queue-depth.patch
and toss out some of the older AS logic which was designed to address the
TCQ problem.

Steve, could you help to identify a not-too-hard-to-set-up workload at
which AS was particularly poor? Thanks.

AS with XFS was pretty bad on a couple of workloads: random 4k reads, and
"metadata", which was a multithreaded workload of 40% create, 40% append
and 20% delete. I'll try to run a few tests with and without this patch
on my hardware setup over the next day or so and see how it does. I have
not really looked at AS performance since about 2.6.6/7.

Our database team recently re-checked IO scheduler performance, and on the
Ad Hoc Decision Support Workload we still saw 15-20% lower throughput on
RHEL4 with AS compared to the other schedulers, which were all within a
couple of percent of each other.

Steve
Re: 2.6.12-rc2-mm3
Benjamin Herrenschmidt <[EMAIL PROTECTED]> writes:

> On Tue, 2005-04-12 at 06:42 +0200, Juergen Kreileder wrote:
>> Benjamin Herrenschmidt <[EMAIL PROTECTED]> writes:
>>
>>> On Tue, 2005-04-12 at 03:18 +0200, Juergen Kreileder wrote:
>>>> Andrew Morton <[EMAIL PROTECTED]> writes:
>>>>
>>>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/
>>>>
>>>> I'm getting frequent lockups on my PowerMac G5 with rc2-mm3.
>>>>
>>>> 2.6.11-mm4 works fine but all 2.6.12 versions I've tried (all
>>>> since -rc1-mm3) lock up randomly.  The easiest way to reproduce
>>>> the problem seems to be running Azureus.  So it might be network
>>>> related, but I'm not 100% sure about that, there was at least one
>>>> deadlock with virtually no network usage.
>>>
>>> Hrm... I just noticed you have CONFIG_PREEMPT enabled... Can you
>>> test without it and let me know if it makes a difference ?
>>
>> IIRC I had disabled that for rc2-mm2 and it didn't make a
>> difference.  I'll disable it again when I try older versions.
>>
>> I just got another crash with rc2-mm3.  The crash was a bit
>> different this time, I still could move the mouse pointer and the
>> logs contained some info:
>
> Ok, what about non-mm ? (just plain rc2)

I've tried older kernels now.  rc1-mm1 locks up (no logs); plain rc1
seems to be OK (running fine for several hours now).


        Juergen

--
Juergen Kreileder, Blackdown Java-Linux Team
http://blog.blackdown.de/
Re: 2.6.12-rc2-mm3
Benjamin Herrenschmidt <[EMAIL PROTECTED]> writes:

>>>>> Hrm... I just noticed you have CONFIG_PREEMPT enabled... Can you
>>>>> test without it and let me know if it makes a difference ?
>>>>
>>>> IIRC I had disabled that for rc2-mm2 and it didn't make a
>>>> difference.  I'll disable it again when I try older versions.
>>>>
>>>> I just got another crash with rc2-mm3.  The crash was a bit
>>>> different this time, I still could move the mouse pointer and the
>>>> logs contained some info:
>>>
>>> Ok, what about non-mm ? (just plain rc2)
>>
>> I've tried older kernels now.  rc1-mm1 locks up (no logs); plain
>> rc1 seems to be OK (running fine for several hours now).
>
> Interesting. Please try -rc2 too...

Works fine so far.


        Juergen

--
Juergen Kreileder, Blackdown Java-Linux Team
http://blog.blackdown.de/
Re: 2.6.12-rc2-mm3 (ACPI build problem)
On Tue, Apr 12, 2005 at 09:53:02PM +0400, Stas Sergeev wrote:
> Hello.
>
> Andrew Morton wrote:
> >ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/
> Fails to compile with
> !CONFIG_ACPI && CONFIG_SMP.
> CONFIG_SMP sets CONFIG_X86_HT,
> which sets CONFIG_ACPI_BOOT,
> but that fails without CONFIG_ACPI:
>
>   CC      arch/i386/kernel/setup.o
> arch/i386/kernel/setup.c:96: error: syntax error before ‘acpi_sci_flags’
> arch/i386/kernel/setup.c:96: warning: type defaults to ‘int’ in
> declaration of ‘acpi_sci_flags’
> arch/i386/kernel/setup.c:96: warning: data definition has no type or
> storage class
> arch/i386/kernel/setup.c: In function ‘parse_cmdline_early’:
> arch/i386/kernel/setup.c:811: error: request for member ‘trigger’ in
> something not a structure or union
> arch/i386/kernel/setup.c:814: error: request for member ‘trigger’ in
> something not a structure or union
> arch/i386/kernel/setup.c:817: error: request for member ‘polarity’ in
> something not a structure or union
> arch/i386/kernel/setup.c:820: error: request for member ‘polarity’ in
> something not a structure or union

Known bug.

Workaround:
Enable CONFIG_ACPI.

cu
Adrian

--

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed
Re: 2.6.12-rc2-mm3
> See this patch from Steve French:
> http://cifs.bkbits.net:8080/linux-2.5cifs/[EMAIL PROTECTED]

Thanks, that fixed it.

M.
Re: 2.6.12-rc2-mm3
> >>> Hrm... I just noticed you have CONFIG_PREEMPT enabled... Can you
> >>> test without it and let me know if it makes a difference ?
> >>
> >> IIRC I had disabled that for rc2-mm2 and it didn't make a
> >> difference.  I'll disable it again when I try older versions.
> >>
> >> I just got another crash with rc2-mm3.  The crash was a bit
> >> different this time, I still could move the mouse pointer and the
> >> logs contained some info:
> >
> > Ok, what about non-mm ? (just plain rc2)
>
> I've tried older kernels now.  rc1-mm1 locks up (no logs); plain rc1
> seems to be OK (running fine for several hours now).

Interesting. Please try -rc2 too...

Ben.
[patch 0/3] Re: 2.6.12-rc2-mm3
Hello.

Andrew Morton wrote:
> OK, the `int $3' is part of the CONFIG_TRAP_BAD_SYSCALL_EXITS thing which
> I never use.
> I'm not sure what problem is actually being reported here, now you mention
> it.

The problem being reported here is that CONFIG_TRAP_BAD_SYSCALL_EXITS
does nothing but lock up your machine.  Actually the bug was so obvious
that I had real trouble finding it (the obvious things are difficult to
spot), so I found some more bugs in the meantime.

What was the bug?  GET_THREAD_INFO(%ebp) was missing before
TI_preempt_count(%ebp), hence the accesses beyond the stack.  I have
trouble believing this code ever worked for anyone without lock-ups.
I fixed it differently, though.  The subsequent patches address the
issue.

> Yup.  But are you sure that the "+ 8" is correct, given these offsets are
> larger than that?

I don't think they are indeed larger.  %esp points to "struct pt_regs",
which is 60 bytes in size, or 52 without xss/esp.  Adding 8 to this
gives 60, so 56(+3) looks safe to me.

> Probably it decided that some syscall got a "bad exit".  Disable
> CONFIG_TRAP_BAD_SYSCALL_EXITS.

Yes, that's the fix too.

>>> - p->thread.esp0 = (unsigned long) (childregs+1) - 8;
>>> + p->thread.esp0 = (unsigned long) (childregs+1) - 15;
>> 15 is somewhat nasty - it will make the stack unaligned, should better
>> be 16 I think.
> ?  It's still 4-byte-aligned.

I don't see your point.  Why do you think that I subtract 32 bytes from
the stack pointer, for example?  I literally subtract 8 bytes from it;
you propose to subtract 15 *bytes* (not dwords), so why would it still
be aligned?  But anyway, fortunately this bug is not about the esp0
stuff at all.

> I'm suspecting this is all due to CONFIG_TRAP_BAD_SYSCALL_EXITS taking the
> debug trap..

Sure.  And that looks silly.  I removed "int $3".

Patches follow.  It seems to work reliably now, but I don't know how to
test it, since there seem to be no such offending syscalls here to
test with.
Re: 2.6.12-rc2-mm3 (ACPI build problem)
Hello.

Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/

Fails to compile with
!CONFIG_ACPI && CONFIG_SMP.
CONFIG_SMP sets CONFIG_X86_HT,
which sets CONFIG_ACPI_BOOT,
but that fails without CONFIG_ACPI:

  CC      arch/i386/kernel/setup.o
arch/i386/kernel/setup.c:96: error: syntax error before ‘acpi_sci_flags’
arch/i386/kernel/setup.c:96: warning: type defaults to ‘int’ in declaration of ‘acpi_sci_flags’
arch/i386/kernel/setup.c:96: warning: data definition has no type or storage class
arch/i386/kernel/setup.c: In function ‘parse_cmdline_early’:
arch/i386/kernel/setup.c:811: error: request for member ‘trigger’ in something not a structure or union
arch/i386/kernel/setup.c:814: error: request for member ‘trigger’ in something not a structure or union
arch/i386/kernel/setup.c:817: error: request for member ‘polarity’ in something not a structure or union
arch/i386/kernel/setup.c:820: error: request for member ‘polarity’ in something not a structure or union
Re: 2.6.12-rc2-mm3
Andrew Morton wrote:
Nick Piggin <[EMAIL PROTECTED]> wrote:

- The effects of tcq on AS are much less disastrous than I thought they
> were. Do I have the wrong workload? Memory fails me. Or did we fix the
> anticipatory scheduler?
>

Yes, we did fix it ;) Quite a long time ago, so maybe you are thinking of
something else (I haven't been able to work it out).

Steve Pratt's ols2004 presentation made AS look pretty bad. However the
numbers in the proceedings
(http://www.finux.org/proceedings/LinuxSymposium2004_V2.pdf) are much less
stark.

Steve, what's up with that? The slides which you talked to had some awful
numbers. Was it the same set of tests?

I highlighted a few cases where AS went really wrong during the
presentation, like on really large RAID 0 arrays, but in general
(referring back to the slides) AS trailed the other schedulers by 5-10% on
ext3. It had real trouble with XFS, though, losing by as much as 145% on a
5-disk RAID 5 system for a mix of workloads. Perhaps this is the piece you
remember.

Steve
Re: 2.6.12-rc2-mm3: 10 seconds of nothingness
> [   19.617890] Testing NMI watchdog ... <6>ACPI: No ACPI bus support for 2-2
> [   19.705673] ACPI: No ACPI bus support for 2-2:1.0
> [   20.002417] usb 3-2: new full speed USB device using uhci_hcd and address 2
> [   20.121763] ACPI: No ACPI bus support for 3-2
> [   20.156293] ACPI: No ACPI bus support for 3-2:1.0
> [   29.539613] OK.
>
> I also had this "problem" with mm1. mm2 wouldn't compile, so I didn't
> test that. IIRC it also happened with the rc1-mm's. Is this supposed to
> happen?

It's a fairly new thing on x64, should be fixed soon. If it disturbs
you too much back out

http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/broken-out/rfc-check-nmi-watchdog-is-broken.patch
Re: 2.6.12-rc2-mm3
On Tuesday 12 April 2005 06:20, Stas Sergeev wrote:

Here we go again:

You might be right about the int3 instruction:

(gdb) disas 0xc0102ee0
Dump of assembler code for function restore_all:
0xc0102ed1:	mov    0x30(%esp),%eax
0xc0102ed5:	mov    0x2c(%esp),%al
0xc0102ed9:	test   $0x20003,%eax
0xc0102ede:	je     0xc0102ee7
0xc0102ee0:	cmpl   $0x0,0x14(%ebp)
0xc0102ee4:	je     0xc0102ee7
0xc0102ee6:	int3
End of assembler dump.

> Could you please also do
> "p $esp" or "info reg", so that we can
> see the rest of the registers?

Program received signal SIGTRAP, Trace/breakpoint trap.
0xc0102ee7 in resume_kernelX () at atomic.h:175
175	{
(gdb) p $esp
$1 = (void *) 0xdfcb4fc4
(gdb) info reg
eax            0x273627
ecx            0x0	0
edx            0x10000	65536
ebx            0xb7fd9c00	-1208116224
esp            0xdfcb4fc4	0xdfcb4fc4
ebp            0xbfbd5948	0xbfbd5948
esi            0x77	119
edi            0x1cb459
eip            0xc0102ee7	0xc0102ee7
eflags         0x82	130
cs             0x60	96
ss             0x68	104
ds             0xc010007b	-1072693125
es             0xdfcb007b	-540344197
fs             0xffff	65535
gs             0xffff	65535
(gdb)

> >> And as we see, we're at the "mov 0x30(%esp),%eax" which accesses
> >> above the bottom of the stack.
>
> But that's strange. Another instance of
> the 0x30(%esp) is there a few instructions
> above this one, see it with "disas restore_all".
> It is much more likely that the real offender
> is the previous instruction. $eip points on
> the instruction *after* the trap, which might
> be innocent.
>
> >> After applying nmi_stack_correct-fix.patch, rc2-mm3
>
> I can't find this one in an -mm broken-outs.
> Where is this patch?
> Could you please also test this one:
> http://www.uwsg.iu.edu/hypermail/linux/kernel/0504.0/1287.html
>
> > Interesting. It could be an interaction between the kgdb patch and the
> > new vm86 checking code.
>
> I think so too, will have a look if I can
> reproduce it.
>
> > The above code is accessing esp+56,
>
> Yes, but this particular instruction was
> not reached. "int $3" killed the system
> for some reasons.
>
> > - p->thread.esp0 = (unsigned long) (childregs+1) - 8;
> > + p->thread.esp0 = (unsigned long) (childregs+1) - 15;

So, next I'm going to try disabling CONFIG_TRAP_BAD_SYSCALL_EXITS and see
what happens there, and then the stack-aligned process.c one-liner above.

/me open to testing suggestions.

Regards,
Boris.
Re: 2.6.12-rc2-mm3
Ed Tomlinson <[EMAIL PROTECTED]> wrote:
>
> On Monday 11 April 2005 04:25, Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/
> >
> >
> > - The anticipatory I/O scheduler has always been fairly useless with SCSI
> >   disks which perform tagged command queueing.  There's a patch here from
> >   Jens which is designed to fix that up by constraining the number of
> >   requests which we'll leave pending in the device.
> >
> >   The depth currently defaults to 1.  Tunable in
> >   /sys/block/hdX/queue/iosched/queue_depth
> >
> >   This patch hasn't been performance tested at all yet.  If you think it is
> >   misbehaving (the usual symptom is processes stuck in D state) then please
> >   report it, then boot with `elevator=cfq' or `elevator=deadline' to work
> >   around it.
> >
> > - More CPU scheduler work.  I hope someone is testing this stuff.
>
> Something is not quite right here.  I built rc2-mm3 and booted (uni
> processor, amd64, preempt on).  mm3 lasted about 30 mins before locking
> up with a dead keyboard.  I had mm2 reboot a few times over the last
> couple of days too.
>
> 11-mm3      uptime of 2 weeks+
> 12-rc2-mm2  reboots once every couple of days
> 12-rc2-mm3  locked up within 30 mins using X using kmail/bogofilter

Unpleasant.  Serial console would be nice ;)

> My serial console does not seem to want to work.  Has anything changed with
> this support?
>

Don't think so - it works OK here.  Checked the .config?  Does the serial
port work if you do `echo foo > /dev/ttyS0'?  ACPI?
Re: 2.6.12-rc2-mm3
On Monday 11 April 2005 04:25, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/
>
>
> - The anticipatory I/O scheduler has always been fairly useless with SCSI
>   disks which perform tagged command queueing.  There's a patch here from Jens
>   which is designed to fix that up by constraining the number of requests
>   which we'll leave pending in the device.
>
>   The depth currently defaults to 1.  Tunable in
>   /sys/block/hdX/queue/iosched/queue_depth
>
>   This patch hasn't been performance tested at all yet.  If you think it is
>   misbehaving (the usual symptom is processes stuck in D state) then please
>   report it, then boot with `elevator=cfq' or `elevator=deadline' to work
>   around it.
>
> - More CPU scheduler work.  I hope someone is testing this stuff.

Something is not quite right here.  I built rc2-mm3 and booted (uni
processor, amd64, preempt on).  mm3 lasted about 30 mins before locking
up with a dead keyboard.  I had mm2 reboot a few times over the last
couple of days too.

11-mm3      uptime of 2 weeks+
12-rc2-mm2  reboots once every couple of days
12-rc2-mm3  locked up within 30 mins using X using kmail/bogofilter

My serial console does not seem to want to work.  Has anything changed with
this support?

TIA,
Ed Tomlinson
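For anyone wanting to try other depths, the tunable quoted above is an ordinary sysfs file. Below is a small Python sketch of reading and setting it. The helper names and the sysfs_root parameter are assumptions made for illustration; the file should only exist on kernels carrying this -mm patch with the anticipatory scheduler selected for the device, and writing to it needs root.

```python
from pathlib import Path


def queue_depth_path(dev, sysfs_root="/sys"):
    """Path of the queue_depth tunable for a block device such as 'hda'."""
    return Path(sysfs_root) / "block" / dev / "queue" / "iosched" / "queue_depth"


def read_queue_depth(dev, sysfs_root="/sys"):
    """Return the current depth as an integer."""
    return int(queue_depth_path(dev, sysfs_root).read_text().strip())


def write_queue_depth(dev, depth, sysfs_root="/sys"):
    """Set a new depth; sysfs attributes take the value as plain text."""
    queue_depth_path(dev, sysfs_root).write_text(f"{depth}\n")


if __name__ == "__main__":
    # 'hda' is a placeholder device name.
    path = queue_depth_path("hda")
    if path.exists():
        print(path, "=", read_queue_depth("hda"))
    else:
        print(path, "not present on this kernel")
```

The sysfs_root parameter is only there so the helpers can be pointed at a scratch directory when trying them out.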
Re: 2.6.12-rc2-mm3
Andrew Morton wrote:
> Jindrich Makovicka <[EMAIL PROTECTED]> wrote:
>
>>Andrew Morton wrote:
>>
>>>ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/
>>
>>MPlayer randomly crashes in various pthread_* calls when using binary
>>codecs. 2.6.12-rc2-mm2 was ok. I tried to reverse
>>fix-crash-in-entrys-restore_all.patch, but it didn't help.
>>
>
>
> hm, could be anything.
>
> Does 2.6.12-rc2 also fail?

looks like it's sched-unlocked-context-switches.patch. after reversing
it works fine.

-- 
Jindrich Makovicka
Re: 2.6.12-rc2-mm3
Nick Piggin <[EMAIL PROTECTED]> wrote:
>
> > > AS basically does its own TCQ strangulation, which IIRC involves things
> > > like completing all reads before issuing new writes, and completing all
> > > reads from one process before reads from another.  As well as the
> > > fundamental way that waiting for a 'dependant read' throttles TCQ.
> >
> > My (mpt-fusion-based) workstation is still really slow when there's a lot
> > of writeout happening.  Just from a quick test:
> >
> > > 2.6.12-rc2,       as,   tcq depth=2:   7.241 seconds
> > > 2.6.12-rc2,       as,   tcq depth=64:  12.172 seconds
> > > 2.6.12-rc2+patch, as,   tcq depth=64:  7.199 seconds
> > > 2.6.12-rc2,       cfq2, tcq depth=64:  much more than 5 minutes
> > > 2.6.12-rc2,       cfq3, tcq depth=64:  much more than 5 minutes
> >
> > 2.6.11-rc4-mm1,   as,   mpt-f          39.349 seconds
> >
> > That was really really slow but had a sudden burst of read I/O at the end
> > which made the thing look better than it really is.  I wouldn't have a clue
> > what tag depth it's using, and it's the only mpt-fusion based machine I
> > have handy...
> >
>
> Heh.
>

Well with my current lineup on the mpt-fusion driver and no
as-limit-queue-depth.patch that test takes 17 seconds.  With
as-limit-queue-depth.patch it's down to 10 seconds.  Which is pretty darn
good btw.  I assume from this:

scsi0 : ioc0: LSI53C1030, FwRev=01030600h, Ports=1, MaxQ=222, IRQ=25
scsi1 : ioc1: LSI53C1030, FwRev=01030600h, Ports=1, MaxQ=222, IRQ=26

that it's using a tag depth of 222.

        int req_depth; /* Number of request frames */

I wonder if that's true...

One thing which changed is that this kernel now has the fixed-up mpt-fusion
chipset tuning.  That doubles the IO bandwidth, which would pretty well
account for that difference.

I'll wait and see how irritating things get under writeout load.

Yes, we'll need to decide if we want to retain as-limit-queue-depth.patch
and toss out some of the older AS logic which was designed to address the
TCQ problem.

Steve, could you help to identify a not-too-hard-to-set-up workload at
which AS was particularly poor?

Thanks.
Re: 2.6.12-rc2-mm3
On Mon, Apr 11 2005, Andrew Morton wrote:
> - CFQ is seriously, seriously read-starved on this workload.
>
> CFQ3:
>
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff   cache   si   so    bi    bo   in    cs us sy id wa
>  1  5   1008  25888   4204 3845820    0    0    12 50544 1119   116  0  3 49 48
>  0  5   1008  24096   4204 3847520    0    0     8 51200 1112   110  0  3 49 48
>  0  5   1008  25824   4204 3845820    0    0     8 54816 1117   120  0  4 49 48
>  0  5   1008  25440   4204 3846160    0    0     8 52880 1113   115  0  3 49 48
>  0  5   1008  25888   4208 3845748    0    0    16 51024 1121   116  0  3 49 48

Looks very bad, I'll have a look at this.

-- 
Jens Axboe
Re: 2.6.12-rc2-mm3
On Mon, 2005-04-11 at 23:19 -0700, Andrew Morton wrote:
> Nick Piggin <[EMAIL PROTECTED]> wrote:
> >
> > >- The effects of tcq on AS are much less disastrous than I thought they
> > >  were.  Do I have the wrong workload?  Memory fails me.  Or did we fix
> > >  the anticipatory scheduler?
> > >
> > >
> >
> > Yes, we did fix it ;)
> > Quite a long time ago, so maybe you are thinking of something else
> > (I haven't been able to work it out).
>
> Steve Pratt's ols2004 presentation made AS look pretty bad.  However the
> numbers in the proceedings
> (http://www.finux.org/proceedings/LinuxSymposium2004_V2.pdf) are much less
> stark.
>
> Steve, what's up with that?  The slides which you talked to had some awful
> numbers.  Was it the same set of tests?
>

Yes, they still do... :P

> Seems that software RAID might have muddied the waters as well.
>

This may be the big issue, and yes software (and hardware) RAID isn't
very good for AS - mainly because it can't make a good guess as to
where "the head" is.

Probably software RAID should default to using deadline if possible.
I think we can do that easily with Jens' recent ioscheduler work.

> That was 2.6.5.  Do you recall if we did significant AS work after that?
>

I don't think there was.

> > AS basically does its own TCQ strangulation, which IIRC involves things
> > like completing all reads before issuing new writes, and completing all
> > reads from one process before reads from another.  As well as the
> > fundamental way that waiting for a 'dependant read' throttles TCQ.
>
> My (mpt-fusion-based) workstation is still really slow when there's a lot
> of writeout happening.  Just from a quick test:
>
> > 2.6.12-rc2,       as,   tcq depth=2:   7.241 seconds
> > 2.6.12-rc2,       as,   tcq depth=64:  12.172 seconds
> > 2.6.12-rc2+patch, as,   tcq depth=64:  7.199 seconds
> > 2.6.12-rc2,       cfq2, tcq depth=64:  much more than 5 minutes
> > 2.6.12-rc2,       cfq3, tcq depth=64:  much more than 5 minutes
>
> 2.6.11-rc4-mm1,   as,   mpt-f          39.349 seconds
>
> That was really really slow but had a sudden burst of read I/O at the end
> which made the thing look better than it really is.  I wouldn't have a clue
> what tag depth it's using, and it's the only mpt-fusion based machine I
> have handy...
>

Heh.

> > >- as-limit-queue-depth.patch fixes things right up anyway.  Seems to be
> > >  doing the right thing.
> > >
> > >
> >
> > Well it depends on what we want to do. If we hard limit the AS queue
> > like this, I can remove some of that TCQ throttling logic from AS.
> >
> > OTOH, the throttling was intended to allow us to sanely use a large
> > TCQ depth without getting really bad behaviour. Theoretically a process
> > can make use of TCQ if it is doing a lot of writing, or if it is not
> > determined to be doing dependant reads.
>
> OK, I'll have a bit more of a poke at the LSI53C1030 driver, see if I can
> characterise what's going on.

OK. I'd like to start doing a bit of work on AS again too. Hopefully
after the current CPU scheduler work gets resolved.
Re: 2.6.12-rc2-mm3
On Tue, 2005-04-12 at 06:42 +0200, Juergen Kreileder wrote:
> Benjamin Herrenschmidt <[EMAIL PROTECTED]> writes:
>
> > On Tue, 2005-04-12 at 03:18 +0200, Juergen Kreileder wrote:
> >> Andrew Morton <[EMAIL PROTECTED]> writes:
> >>
> >>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/
> >>
> >> I'm getting frequent lockups on my PowerMac G5 with rc2-mm3.
> >>
> >> 2.6.11-mm4 works fine but all 2.6.12 versions I've tried (all since
> >> -rc1-mm3) lock up randomly.  The easiest way to reproduce the
> >> problem seems to be running Azureus.  So it might be network
> >> related, but I'm not 100% sure about that, there was at least one
> >> deadlock with virtually no network usage.
> >
> > Hrm... I just noticed you have CONFIG_PREEMPT enabled... Can you
> > test without it and let me know if it makes a difference ?
>
> IIRC I had disabled that for rc2-mm2 and it didn't make a difference.
> I'll disable it again when I try older versions.
>
> I just got another crash with rc2-mm3.  The crash was a bit different
> this time, I still could move the mouse pointer and the logs contained
> some info:

Ok, what about non-mm ? (just plain rc2)

Ben.
Re: 2.6.12-rc2-mm3
Nick Piggin <[EMAIL PROTECTED]> wrote:
>
> >- The effects of tcq on AS are much less disastrous than I thought they
> >  were.  Do I have the wrong workload?  Memory fails me.  Or did we fix the
> >  anticipatory scheduler?
> >
> >
>
> Yes, we did fix it ;)
> Quite a long time ago, so maybe you are thinking of something else
> (I haven't been able to work it out).

Steve Pratt's ols2004 presentation made AS look pretty bad.  However the
numbers in the proceedings
(http://www.finux.org/proceedings/LinuxSymposium2004_V2.pdf) are much less
stark.

Steve, what's up with that?  The slides which you talked to had some awful
numbers.  Was it the same set of tests?

Seems that software RAID might have muddied the waters as well.

That was 2.6.5.  Do you recall if we did significant AS work after that?

> AS basically does its own TCQ strangulation, which IIRC involves things
> like completing all reads before issuing new writes, and completing all
> reads from one process before reads from another.  As well as the
> fundamental way that waiting for a 'dependant read' throttles TCQ.

My (mpt-fusion-based) workstation is still really slow when there's a lot
of writeout happening.  Just from a quick test:

> 2.6.12-rc2,       as,   tcq depth=2:   7.241 seconds
> 2.6.12-rc2,       as,   tcq depth=64:  12.172 seconds
> 2.6.12-rc2+patch, as,   tcq depth=64:  7.199 seconds
> 2.6.12-rc2,       cfq2, tcq depth=64:  much more than 5 minutes
> 2.6.12-rc2,       cfq3, tcq depth=64:  much more than 5 minutes

2.6.11-rc4-mm1,   as,   mpt-f          39.349 seconds

That was really really slow but had a sudden burst of read I/O at the end
which made the thing look better than it really is.  I wouldn't have a clue
what tag depth it's using, and it's the only mpt-fusion based machine I
have handy...

> >- as-limit-queue-depth.patch fixes things right up anyway.  Seems to be
> >  doing the right thing.
> >
> >
>
> Well it depends on what we want to do. If we hard limit the AS queue
> like this, I can remove some of that TCQ throttling logic from AS.
>
> OTOH, the throttling was intended to allow us to sanely use a large
> TCQ depth without getting really bad behaviour. Theoretically a process
> can make use of TCQ if it is doing a lot of writing, or if it is not
> determined to be doing dependant reads.

OK, I'll have a bit more of a poke at the LSI53C1030 driver, see if I can
characterise what's going on.
Re: 2.6.12-rc2-mm3
Stas Sergeev <[EMAIL PROTECTED]> wrote:
>
> Hello.
>
> Andrew Morton wrote:
> >> Program received signal SIGTRAP, Trace/breakpoint trap.
> SIGTRAP - it looks like the "int $3"
> triggered, not "mov 0x30(%esp),%eax",
> which is just the next insn and so the
> %eip points to it, but it might be
> innocent. And besides, 0x30(%esp) is
> EFLAGS, not OLDSS. So I think maybe my
> patch is not guilty this time, it is
> just the non-zero preempt count on the
> return path caused by something else.

OK, the `int $3' is part of the CONFIG_TRAP_BAD_SYSCALL_EXITS thing which I
never use.

I'm not sure what problem is actually being reported here, now you mention
it.

> >> (gdb) p $eip
> >> $1 = (void *) 0xc0102ee7
> Could you please also do
> "p $esp" or "info reg", so that we can
> see the rest of the registers?
>
> >> And as we see, we're at the "mov 0x30(%esp),%eax" which accesses above
> >> the bottom of the stack.
> But that's strange. Another instance of
> the 0x30(%esp) is there a few instructions
> above this one, see it with "disas restore_all".
> It is much more likely that the real offender
> is the previous instruction. $eip points on
> the instruction *after* the trap, which might
> be innocent.

Yup.  But are you sure that the "+ 8" is correct, given these offsets are
larger than that?

> >> After applying nmi_stack_correct-fix.patch, rc2-mm3
> I can't find this one in an -mm broken-outs.

It was in rc2-mm2.

> Where is this patch?
> Could you please also test this one:
> http://www.uwsg.iu.edu/hypermail/linux/kernel/0504.0/1287.html
>
> > Interesting. It could be an interaction between the kgdb patch and the new
> > vm86 checking code.
> I think so too, will have a look if I can
> reproduce it.
>
> > The above code is accessing esp+56,
> Yes, but this particular instruction was
> not reached. "int $3" killed the system
> for some reasons.

Probably it decided that some syscall got a "bad exit".  Disable
CONFIG_TRAP_BAD_SYSCALL_EXITS.

> > - p->thread.esp0 = (unsigned long) (childregs+1) - 8;
> > + p->thread.esp0 = (unsigned long) (childregs+1) - 15;
> 15 is somewhat nasty - it will make the
> stack unaligned, should better be 16 I
> think.

?  It's still 4-byte-aligned.

> But I don't see why, the only
> scenario we've seen were the not stored
> SS/ESP, which is 8 bytes only.
> But the
> If we definitely think my patch is guilty
> again, then probably something like this
> is necessary:
>
> --- linux/include/asm-i386/processor.h.old  2005-03-20 14:13:02.0 +0300
> +++ linux/include/asm-i386/processor.h      2005-04-12 07:50:11.0 +0400
> @@ -458,7 +458,7 @@
>   * be within the limit.
>   */
>  #define INIT_TSS  {                                             \
> -        .esp0   = sizeof(init_stack) + (long)&init_stack,       \
> +        .esp0   = sizeof(init_stack) - 8 + (long)&init_stack,   \
>          .ss0    = __KERNEL_DS,                                  \
>          .ss1    = __KERNEL_CS,                                  \
>          .ldt    = GDT_ENTRY_LDT,                                \
>
> But I don't think the init_stack can be
> abused on the sysenter path, so this is
> just a wild guess.

I'm suspecting this is all due to CONFIG_TRAP_BAD_SYSCALL_EXITS taking the
debug trap..
Re: 2.6.12-rc2-mm3
Hello.

Andrew Morton wrote:
>> Program received signal SIGTRAP, Trace/breakpoint trap.
SIGTRAP - it looks like the "int $3"
triggered, not "mov 0x30(%esp),%eax",
which is just the next insn and so the
%eip points to it, but it might be
innocent. And besides, 0x30(%esp) is
EFLAGS, not OLDSS. So I think maybe my
patch is not guilty this time, it is
just the non-zero preempt count on the
return path caused by something else.

>> (gdb) p $eip
>> $1 = (void *) 0xc0102ee7
Could you please also do
"p $esp" or "info reg", so that we can
see the rest of the registers?

>> And as we see, we're at the "mov 0x30(%esp),%eax" which accesses above
>> the bottom of the stack.
But that's strange. Another instance of
the 0x30(%esp) is there a few instructions
above this one, see it with "disas restore_all".
It is much more likely that the real offender
is the previous instruction. $eip points on
the instruction *after* the trap, which might
be innocent.

>> After applying nmi_stack_correct-fix.patch, rc2-mm3
I can't find this one in an -mm broken-outs.
Where is this patch?
Could you please also test this one:
http://www.uwsg.iu.edu/hypermail/linux/kernel/0504.0/1287.html

> Interesting. It could be an interaction between the kgdb patch and the new
> vm86 checking code.
I think so too, will have a look if I can
reproduce it.

> The above code is accessing esp+56,
Yes, but this particular instruction was
not reached. "int $3" killed the system
for some reasons.

> - p->thread.esp0 = (unsigned long) (childregs+1) - 8;
> + p->thread.esp0 = (unsigned long) (childregs+1) - 15;
15 is somewhat nasty - it will make the
stack unaligned, should better be 16 I
think. But I don't see why, the only
scenario we've seen were the not stored
SS/ESP, which is 8 bytes only.
If we definitely think my patch is guilty
again, then probably something like this
is necessary:

--- linux/include/asm-i386/processor.h.old  2005-03-20 14:13:02.0 +0300
+++ linux/include/asm-i386/processor.h      2005-04-12 07:50:11.0 +0400
@@ -458,7 +458,7 @@
  * be within the limit.
  */
 #define INIT_TSS  {                                             \
-        .esp0   = sizeof(init_stack) + (long)&init_stack,       \
+        .esp0   = sizeof(init_stack) - 8 + (long)&init_stack,   \
         .ss0    = __KERNEL_DS,                                  \
         .ss1    = __KERNEL_CS,                                  \
         .ldt    = GDT_ENTRY_LDT,                                \

But I don't think the init_stack can be
abused on the sysenter path, so this is
just a wild guess.
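The alignment remark about "- 15" versus "- 16" comes down to plain byte arithmetic. Here is a short Python sketch of it, assuming the subtraction is applied to the address after the cast to unsigned long (as the quoted line is written) and that the top of the kernel stack is itself aligned. The STACK_TOP value and the helper names are hypothetical, chosen only to make the arithmetic visible.

```python
def esp0(stack_top, reserve):
    """Address left after reserving `reserve` bytes below stack_top."""
    return stack_top - reserve


def is_aligned(addr, boundary):
    """True if addr is a multiple of boundary."""
    return addr % boundary == 0


# Hypothetical, page-aligned top-of-stack address (an assumption, not a
# value taken from the thread).
STACK_TOP = 0xDFCB5000

if __name__ == "__main__":
    for reserve in (8, 15, 16):
        addr = esp0(STACK_TOP, reserve)
        print(f"- {reserve:2d}: {addr:#x}  "
              f"4-byte aligned: {is_aligned(addr, 4)}  "
              f"16-byte aligned: {is_aligned(addr, 16)}")
```

Under those assumptions a reserve of 8 or 16 bytes keeps the result 4-byte aligned and a reserve of 15 does not. If the subtraction were instead done in units of a larger type, the picture would be different, so this only illustrates the byte-offset reading.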
Re: 2.6.12-rc2-mm3
Andrew Morton wrote:
> So it turns out that patch was broken.  I've fixed it locally and the
> results are good, but odd.
>
> The machine is a 4GB x86_64 with aic79xx controllers and MAXTOR
> ATLAS10K4_73WLS disks.  ext2 filesystem.
>
> The workload is continuous pagecache writeback versus
> read-lots-of-little-files:
>
> while true
> do
>         dd if=/dev/zero of=/mnt/sdb2/x bs=40M count=100 conv=notrunc
> done
>
> versus
>
> find /mnt/sdb2/linux-2.4.25 -type f | xargs cat > /dev/null
>
> we measure how long the find+cat takes.
>
> 2.6.12-rc2,       as,   tcq depth=2:   7.241 seconds
> 2.6.12-rc2,       as,   tcq depth=64:  12.172 seconds
> 2.6.12-rc2+patch, as,   tcq depth=64:  7.199 seconds
> 2.6.12-rc2,       cfq2, tcq depth=64:  much more than 5 minutes
> 2.6.12-rc2,       cfq3, tcq depth=64:  much more than 5 minutes
>
> So
>
> - The effects of tcq on AS are much less disastrous than I thought they
>   were.  Do I have the wrong workload?  Memory fails me.  Or did we fix the
>   anticipatory scheduler?
>

Yes, we did fix it ;)
Quite a long time ago, so maybe you are thinking of something else
(I haven't been able to work it out).

AS basically does its own TCQ strangulation, which IIRC involves things
like completing all reads before issuing new writes, and completing all
reads from one process before reads from another. As well as the
fundamental way that waiting for a 'dependant read' throttles TCQ.

> - as-limit-queue-depth.patch fixes things right up anyway.  Seems to be
>   doing the right thing.
>

Well it depends on what we want to do. If we hard limit the AS queue
like this, I can remove some of that TCQ throttling logic from AS.

OTOH, the throttling was intended to allow us to sanely use a large
TCQ depth without getting really bad behaviour. Theoretically a process
can make use of TCQ if it is doing a lot of writing, or if it is not
determined to be doing dependant reads.
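To put the three AS timings quoted above side by side, here is a small Python sketch that expresses each one relative to the depth=2 run. The numbers are copied from the message; the dictionary layout and the function name are just for illustration, and these are single runs, so the ratios are rough indications only.

```python
# find+cat wall-clock times in seconds, as quoted in the message above.
# Key: (kernel, I/O scheduler, TCQ depth)
TIMINGS = {
    ("2.6.12-rc2", "as", 2): 7.241,
    ("2.6.12-rc2", "as", 64): 12.172,
    ("2.6.12-rc2+patch", "as", 64): 7.199,
}

BASELINE = ("2.6.12-rc2", "as", 2)


def relative_time(timings, key, baseline=BASELINE):
    """Run time of `key` as a multiple of the baseline run time."""
    return timings[key] / timings[baseline]


if __name__ == "__main__":
    for key in TIMINGS:
        kernel, sched, depth = key
        print(f"{kernel:18s} {sched:4s} depth={depth:<3d} "
              f"{relative_time(TIMINGS, key):.2f}x")
```

On these figures the depth=64 run takes roughly two thirds longer than depth=2, and the patched kernel at depth=64 lands back at about the depth=2 time.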
Re: 2.6.12-rc2-mm3
Andrew Morton wrote: So it turns out that patch was broken. I've fixed it locally and the results are good, but odd. The machine is a 4GB x86_64 with aic79xx controllers and MAXTOR ATLAS10K4_73WLS disks. ext2 filesystem. The workload is continuous pagecache writeback versus read-lots-of-little-files: while true do dd if=/dev/zero of=/mnt/sdb2/x bs=40M count=100 conv=notrunc done versus find /mnt/sdb2/linux-2.4.25 -type f | xargs cat /dev/null we measure how long the find+cat takes. 2.6.12-rc2, as, tcq depth=2:7.241 seconds 2.6.12-rc2, as, tcq depth=64: 12.172 seconds 2.6.12-rc2+patch,as,tcq depth=64: 7.199 seconds 2.6.12-rc2, cfq2, tcq depth=64: much more than 5 minutes 2.6.12-rc2, cfq3, tcq depth=64: much more than 5 minutes So - The effects of tcq on AS are much less disastrous than I thought they were. Do I have the wrong workload? Memory fails me. Or did we fix the anticipatory scheduler? Yes, we did fix it ;) Quite a long time ago, so maybe you are thinking of something else (I haven't been able to work it out). AS basically does its own TCQ strangulation, which IIRC involves things like completing all reads before issuing new writes, and completing all reads from one process before reads from another. As well as the fundamental way that waiting for a 'dependant read' throttles TCQ. - as-limit-queue-depth.patch fixes things right up anyway. Seems to be doing the right thing. Well it depends on what we want to do. If we hard limit the AS queue like this, I can remove some of that TCQ throttling logic from AS. OTOH, the throttling was intended to allow us to sanely use a large TCQ depth without getting really bad behaviour. Theoretically a process can make use of TCQ if it is doing a lot of writing, or if it is not determined to be doing dependant reads. 
- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.12-rc2-mm3
Hello. Andrew Morton wrote: Program received signal SIGTRAP, Trace/breakpoint trap. SIGTRAP - it looks like the int $3 triggered, not mov0x30(%esp),%eax, which is just the next insn and so the %eip points to it, but it might be innocent. And besides, 0x30(%esp) is EFLAGS, not OLDSS. So I think maybe my patch is not guilty this time, it is just the non-zero preempt count on the return path caused by something else. (gdb) p $eip $1 = (void *) 0xc0102ee7 Could you please also do p $esp or info reg, so that we can see the rest of the registers? And as we see, we're at the mov0x30(%esp),%eax which accesses above the bottom of the stack. But that's strange. Another instance of the 0x30(%esp) is there a few instructions above this one, see it with disas restore_all. It is much more likely that the real offender is the previous instruction. $eip points on the instruction *after* the trap, which might be innocent. After applying nmi_stack_correct-fix.patch, rc2-mm3 I can't find this one in an -mm broken-outs. Where is this patch? Could you please also test this one: http://www.uwsg.iu.edu/hypermail/linux/kernel/0504.0/1287.html Interesting. It could be an interaction between the kgdb patch and the new vm86 checking code. I think so too, will have a look if I can reproduce it. The above code is accessing esp+56, Yes, but this particular instruction was not reached. int $3 killed the system for some reasons. - p-thread.esp0 = (unsigned long) (childregs+1) - 8; + p-thread.esp0 = (unsigned long) (childregs+1) - 15; 15 is somewhat nasty - it will make the stack unaligned, should better be 16 I think. But I don't see why, the only scenario we've seen were the not stored SS/ESP, which is 8 bytes only. If we definitely think my patch is guilty again, then probably something like this is necessary: --- linux/include/asm-i386/processor.h.old 2005-03-20 14:13:02.0 +0300 +++ linux/include/asm-i386/processor.h 2005-04-12 07:50:11.0 +0400 @@ -458,7 +458,7 @@ * be within the limit. 
*/ #define INIT_TSS {\ - .esp0 = sizeof(init_stack) + (long)init_stack, \ + .esp0 = sizeof(init_stack) - 8 + (long)init_stack, \ .ss0= __KERNEL_DS, \ .ss1= __KERNEL_CS, \ .ldt= GDT_ENTRY_LDT,\ But I don't think the init_stack can be abused on the sysenter path, so this is just a wild guess. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.12-rc2-mm3
Stas Sergeev [EMAIL PROTECTED] wrote: Hello. Andrew Morton wrote: Program received signal SIGTRAP, Trace/breakpoint trap. SIGTRAP - it looks like the int $3 triggered, not mov0x30(%esp),%eax, which is just the next insn and so the %eip points to it, but it might be innocent. And besides, 0x30(%esp) is EFLAGS, not OLDSS. So I think maybe my patch is not guilty this time, it is just the non-zero preempt count on the return path caused by something else. OK, the `int $3' is part of the CONFIG_TRAP_BAD_SYSCALL_EXITS thing which I never use. I'm not sure what problem is actually being reported here, now you mention it. (gdb) p $eip $1 = (void *) 0xc0102ee7 Could you please also do p $esp or info reg, so that we can see the rest of the registers? And as we see, we're at the mov0x30(%esp),%eax which accesses above the bottom of the stack. But that's strange. Another instance of the 0x30(%esp) is there a few instructions above this one, see it with disas restore_all. It is much more likely that the real offender is the previous instruction. $eip points on the instruction *after* the trap, which might be innocent. Yup. But are you sure that the + 8 is correct, given these offsets are larger than that? After applying nmi_stack_correct-fix.patch, rc2-mm3 I can't find this one in an -mm broken-outs. It was in rc2-mm2. Where is this patch? Could you please also test this one: http://www.uwsg.iu.edu/hypermail/linux/kernel/0504.0/1287.html Interesting. It could be an interaction between the kgdb patch and the new vm86 checking code. I think so too, will have a look if I can reproduce it. The above code is accessing esp+56, Yes, but this particular instruction was not reached. int $3 killed the system for some reasons. Probably it decided that some syscall got a bad exit. Disable CONFIG_TRAP_BAD_SYSCALL_EXITS. 
- p-thread.esp0 = (unsigned long) (childregs+1) - 8; + p-thread.esp0 = (unsigned long) (childregs+1) - 15; 15 is somewhat nasty - it will make the stack unaligned, should better be 16 I think. ? It's still 4-byte-aligned. But I don't see why, the only scenario we've seen were the not stored SS/ESP, which is 8 bytes only. But the If we definitely think my patch is guilty again, then probably something like this is necessary: --- linux/include/asm-i386/processor.h.old 2005-03-20 14:13:02.0 +0300 +++ linux/include/asm-i386/processor.h 2005-04-12 07:50:11.0 +0400 @@ -458,7 +458,7 @@ * be within the limit. */ #define INIT_TSS {\ - .esp0 = sizeof(init_stack) + (long)init_stack, \ + .esp0 = sizeof(init_stack) - 8 + (long)init_stack, \ .ss0= __KERNEL_DS, \ .ss1= __KERNEL_CS, \ .ldt= GDT_ENTRY_LDT,\ But I don't think the init_stack can be abused on the sysenter path, so this is just a wild guess. I'm suspecting this is all due to CONFIG_TRAP_BAD_SYSCALL_EXITS taking the debug trap.. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.12-rc2-mm3
On Mon, 2005-04-11 at 23:19 -0700, Andrew Morton wrote: Nick Piggin [EMAIL PROTECTED] wrote: - The effects of tcq on AS are much less disastrous than I thought they were. Do I have the wrong workload? Memory fails me. Or did we fix the anticipatory scheduler? Yes, we did fix it ;) Quite a long time ago, so maybe you are thinking of something else (I haven't been able to work it out). Steve Pratt's ols2004 presentation made AS look pretty bad. However the numbers in the proceedings (http://www.finux.org/proceedings/LinuxSymposium2004_V2.pdf) are much less stark. Steve, what's up with that? The slides which you talked to had some awful numbers. Was it the same set of tests? Yes, they still do... :P Seems that software RAID might have muddied the waters as well. This may be the big issue, and yes software (and hardware) RAID isn't very good for AS - mainly because it can't make a good guess as to where the head is. Probably software RAID should default to using deadline if possible. I think we can do that easily with Jens' recent ioscheduler work. That was 2.6.5. Do you recall if we did significant AS work after that? I don't think there was. AS basically does its own TCQ strangulation, which IIRC involves things like completing all reads before issuing new writes, and completing all reads from one process before reads from another. As well as the fundamental way that waiting for a 'dependant read' throttles TCQ. My (mpt-fusion-based) workstation is still really slow when there's a lot of writeout happening. Just from a quick test: 2.6.12-rc2, as, tcq depth=2:7.241 seconds 2.6.12-rc2, as, tcq depth=64: 12.172 seconds 2.6.12-rc2+patch,as,tcq depth=64: 7.199 seconds 2.6.12-rc2, cfq2, tcq depth=64: much more than 5 minutes 2.6.12-rc2, cfq3, tcq depth=64: much more than 5 minutes 2.6.11-rc4-mm1, as, mpt-f 39.349 seconds That was really really slow but had a sudden burst of read I/O at the end which made the thing look better than it really is. 
I wouldn't have a clue what tag depth it's using, and it's the only mpt-fusion based machine I have handy... Heh. - as-limit-queue-depth.patch fixes things right up anyway. Seems to be doing the right thing. Well it depends on what we want to do. If we hard limit the AS queue like this, I can remove some of that TCQ throttling logic from AS. OTOH, the throttling was intended to allow us to sanely use a large TCQ depth without getting really bad behaviour. Theoretically a process can make use of TCQ if it is doing a lot of writing, or if it is not determined to be doing dependant reads. OK, I'll have a bit more of a poke at the LSI53C1030 driver, see if I can characterise what's going on. OK. I'd like to start doing a bit of work on AS again too. Hopefully after the current CPU scheduler work gets resolved. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.12-rc2-mm3
On Mon, Apr 11 2005, Andrew Morton wrote:
> - CFQ is seriously, seriously read-starved on this workload.
>
> CFQ3:
>
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff   cache   si   so    bi    bo   in    cs us sy id wa
>  1  5   1008  25888   4204 3845820    0    0    12 50544 1119   116  0  3 49 48
>  0  5   1008  24096   4204 3847520    0    0     8 51200 1112   110  0  3 49 48
>  0  5   1008  25824   4204 3845820    0    0     8 54816 1117   120  0  4 49 48
>  0  5   1008  25440   4204 3846160    0    0     8 52880 1113   115  0  3 49 48
>  0  5   1008  25888   4208 3845748    0    0    16 51024 1121   116  0  3 49 48

Looks very bad, I'll have a look at this.

--
Jens Axboe
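To put a number on "read-starved", here is a small sketch that splits vmstat rows into named columns and compares blocks written (bo) with blocks read (bi). The two sample rows are re-spaced by hand from the flattened CFQ3 output quoted above, so the column boundaries are an assumption; the helper names are made up for this illustration.

```python
# Column names as printed in the second header line of vmstat output.
VMSTAT_COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()

def parse_vmstat_row(line):
    """Turn one whitespace-separated vmstat data row into a dict of ints."""
    values = [int(tok) for tok in line.split()]
    if len(values) != len(VMSTAT_COLUMNS):
        raise ValueError("expected %d columns, got %d"
                         % (len(VMSTAT_COLUMNS), len(values)))
    return dict(zip(VMSTAT_COLUMNS, values))

def write_to_read_ratio(rows):
    """Blocks written per block read over all rows (inf if nothing was read)."""
    total_in = sum(row["bi"] for row in rows)
    total_out = sum(row["bo"] for row in rows)
    return total_out / total_in if total_in else float("inf")

# Hand re-spaced sample rows (assumed column boundaries).
CFQ3_SAMPLE = [
    "1 5 1008 25888 4204 3845820 0 0 12 50544 1119 116 0 3 49 48",
    "0 5 1008 24096 4204 3847520 0 0 8 51200 1112 110 0 3 49 48",
]

rows = [parse_vmstat_row(line) for line in CFQ3_SAMPLE]
print("bo/bi ratio: %.1f" % write_to_read_ratio(rows))
```

With reads down in the single digits per interval and writes around 50000, the ratio comes out in the thousands, which is what the eyeball reading of the table says too.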
Re: 2.6.12-rc2-mm3
Nick Piggin [EMAIL PROTECTED] wrote: AS basically does its own TCQ strangulation, which IIRC involves things like completing all reads before issuing new writes, and completing all reads from one process before reads from another. As well as the fundamental way that waiting for a 'dependant read' throttles TCQ. My (mpt-fusion-based) workstation is still really slow when there's a lot of writeout happening. Just from a quick test: 2.6.12-rc2, as, tcq depth=2:7.241 seconds 2.6.12-rc2, as, tcq depth=64: 12.172 seconds 2.6.12-rc2+patch,as, tcq depth=64: 7.199 seconds 2.6.12-rc2, cfq2, tcq depth=64: much more than 5 minutes 2.6.12-rc2, cfq3, tcq depth=64: much more than 5 minutes 2.6.11-rc4-mm1, as, mpt-f 39.349 seconds That was really really slow but had a sudden burst of read I/O at the end which made the thing look better than it really is. I wouldn't have a clue what tag depth it's using, and it's the only mpt-fusion based machine I have handy... Heh. Well with my current lineup on the mpt-fusion driver and no as-limit-queue-depth.patch that test takes 17 seconds. With as-limit-queue-depth.patch it's down to 10 seconds. Which is pretty darn good btw. I assume from this: scsi0 : ioc0: LSI53C1030, FwRev=01030600h, Ports=1, MaxQ=222, IRQ=25 scsi1 : ioc1: LSI53C1030, FwRev=01030600h, Ports=1, MaxQ=222, IRQ=26 that it's using a tag depth of 222. int req_depth; /* Number of request frames */ I wonder if that's true... One thing which changed is that this kernel now has the fixed-up mpt-fusion chipset tuning. That doubles the IO bandwidth, which would pretty well account for that difference. I'll wait and see how irritating things get under writeout load. Yes, we'll need to decide if we want to retain as-limit-queue-depth.patch and toss out some of the older AS logic which was designed to address the TCQ problem. Steve, could you help to identify a not-too-hard-to-set-up workload at which AS was particularly poor? Thanks. 
Re: 2.6.12-rc2-mm3
Andrew Morton wrote: Jindrich Makovicka [EMAIL PROTECTED] wrote: Andrew Morton wrote: ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/ MPlayer randomly crashes in various pthread_* calls when using binary codecs. 2.6.12-rc2-mm2 was ok. I tried to reverse fix-crash-in-entrys-restore_all.patch, but it didn't help. hm, could be anything. Does 2.6.12-rc2 also fail? looks like it's sched-unlocked-context-switches.patch. after reversing it works fine. -- Jindrich Makovicka - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.12-rc2-mm3
On Monday 11 April 2005 04:25, Andrew Morton wrote: ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/ - The anticipatory I/O scheduler has always been fairly useless with SCSI disks which perform tagged command queueing. There's a patch here from Jens which is designed to fix that up by constraining the number of requests which we'll leave pending in the device. The depth currently defaults to 1. Tunable in /sys/block/hdX/queue/iosched/queue_depth This patch hasn't been performance tested at all yet. If you think it is misbehaving (the usual symptom is processes stuck in D state) then please report it, then boot with `elevator=cfq' or `elevator=deadline' to work around it. - More CPU scheduler work. I hope someone is testing this stuff. Something is not quite right here. I built rc2-mm3 and booted (uni processor, amd64, preempt on). mm3 lasted about 30 mins before locking up with a dead keyboard. I had mm2 reboot a few times over the last couple of days too. 11-mm3 uptime of 2 weeks+ 12-rc2-mm2 reboots once every couple of days 12-rc2-mm3 locked up within 30 mins using X using kmail/bogofilter My serial console does not seem to want to work. Has anything changed with this support? TIA, Ed Tomlinson - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
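For anyone wanting to check which elevator a disk is using and what the new tunable is set to, a rough sketch follows. The queue_depth path is the one given in the announcement and only exists with that -mm patch applied. The bracketed format of the per-device `queue/scheduler` file is how I remember 2.6 kernels printing it, so treat it as an assumption; the function names and the `sysfs_root` parameter are invented for the example.

```python
import os

def active_scheduler(scheduler_line):
    """Pick the bracketed entry out of a line like 'noop [anticipatory] deadline cfq'."""
    for token in scheduler_line.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None

def queue_depth_path(dev, sysfs_root="/sys"):
    """Path of the tunable named in the announcement, for block device `dev`."""
    return os.path.join(sysfs_root, "block", dev, "queue", "iosched", "queue_depth")

def read_queue_depth(dev, sysfs_root="/sys"):
    """Read the current depth; raises OSError if the tunable is not present."""
    with open(queue_depth_path(dev, sysfs_root)) as fh:
        return int(fh.read().strip())
```

Something like `read_queue_depth("hda")` would then be expected to return 1 on a freshly booted rc2-mm3 with the anticipatory scheduler, going by the default stated above.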
Re: 2.6.12-rc2-mm3
Ed Tomlinson [EMAIL PROTECTED] wrote: On Monday 11 April 2005 04:25, Andrew Morton wrote: ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/ - The anticipatory I/O scheduler has always been fairly useless with SCSI disks which perform tagged command queueing. There's a patch here from Jens which is designed to fix that up by constraining the number of requests which we'll leave pending in the device. The depth currently defaults to 1. Tunable in /sys/block/hdX/queue/iosched/queue_depth This patch hasn't been performance tested at all yet. If you think it is misbehaving (the usual symptom is processes stuck in D state) then please report it, then boot with `elevator=cfq' or `elevator=deadline' to work around it. - More CPU scheduler work. I hope someone is testing this stuff. Something is not quite right here. I built rc2-mm3 and booted (uni processor, amd64, preempt on). mm3 lasted about 30 mins before locking up with a dead keyboard. I had mm2 reboot a few times over the last couple of days too. 11-mm3 uptime of 2 weeks+ 12-rc2-mm2 reboots once every couple of days 12-rc2-mm3 locked up within 30 mins using X using kmail/bogofilter Unpleasant. Serial console would be nice ;) My serial console does not seem to want to work. Has anything changed with this support? Don't think so - it works OK here. Checked the .config? Does the serial port work if you do `echo foo /dev/ttyS0'? ACPI? - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.12-rc2-mm3
On Tuesday 12 April 2005 06:20, Stas Sergeev wrote: Here we go again: You might be right about the int3 instruction: (gdb) disas 0xc0102ee0 Dump of assembler code for function restore_all: 0xc0102ed1 restore_all+0: mov0x30(%esp),%eax 0xc0102ed5 restore_all+4: mov0x2c(%esp),%al 0xc0102ed9 restore_all+8: test $0x20003,%eax 0xc0102ede restore_all+13:je 0xc0102ee7 resume_kernelX 0xc0102ee0 restore_all+15:cmpl $0x0,0x14(%ebp) 0xc0102ee4 restore_all+19:je 0xc0102ee7 resume_kernelX 0xc0102ee6 restore_all+21:int3 End of assembler dump. Could you please also do p $esp or info reg, so that we can see the rest of the registers? Program received signal SIGTRAP, Trace/breakpoint trap. 0xc0102ee7 in resume_kernelX () at atomic.h:175 175 { (gdb) p $esp $1 = (void *) 0xdfcb4fc4 (gdb) info reg eax0x273627 ecx0x0 0 edx0x1 65536 ebx0xb7fd9c00 -1208116224 esp0xdfcb4fc4 0xdfcb4fc4 ebp0xbfbd5948 0xbfbd5948 esi0x77 119 edi0x1cb459 eip0xc0102ee7 0xc0102ee7 eflags 0x82 130 cs 0x60 96 ss 0x68 104 ds 0xc010007b -1072693125 es 0xdfcb007b -540344197 fs 0x 65535 gs 0x 65535 (gdb) And as we see, we're at the mov0x30(%esp),%eax which accesses above the bottom of the stack. But that's strange. Another instance of the 0x30(%esp) is there a few instructions above this one, see it with disas restore_all. It is much more likely that the real offender is the previous instruction. $eip points on the instruction *after* the trap, which might be innocent. After applying nmi_stack_correct-fix.patch, rc2-mm3 I can't find this one in an -mm broken-outs. Where is this patch? Could you please also test this one: http://www.uwsg.iu.edu/hypermail/linux/kernel/0504.0/1287.html Interesting. It could be an interaction between the kgdb patch and the new vm86 checking code. I think so too, will have a look if I can reproduce it. The above code is accessing esp+56, Yes, but this particular instruction was not reached. int $3 killed the system for some reasons. 
- p->thread.esp0 = (unsigned long) (childregs+1) - 8;
+ p->thread.esp0 = (unsigned long) (childregs+1) - 15;

snip

So, as next I'm gonna try disabling CONFIG_TRAP_BAD_SYSCALL_EXITS and see what happens there and then the stack-aligned process.c one liner above.

/me open to testing suggestions.

Regards,
Boris.
Re: 2.6.12-rc2-mm3: 10 seconds of nothingness
[ 19.617890] Testing NMI watchdog ... <6>ACPI: No ACPI bus support for 2-2
[ 19.705673] ACPI: No ACPI bus support for 2-2:1.0
[ 20.002417] usb 3-2: new full speed USB device using uhci_hcd and address 2
[ 20.121763] ACPI: No ACPI bus support for 3-2
[ 20.156293] ACPI: No ACPI bus support for 3-2:1.0
[ 29.539613] OK.

I also had this problem with mm1. mm2 wouldn't compile, so I didn't test that. IIRC it also happened with the rc1-mm's. Is this supposed to happen?

It's a fairly new thing on x64, should be fixed soon. If it disturbs you too much back out

http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/broken-out/rfc-check-nmi-watchdog-is-broken.patch
Re: 2.6.12-rc2-mm3
Andrew Morton wrote: Nick Piggin [EMAIL PROTECTED] wrote: - The effects of tcq on AS are much less disastrous than I thought they were. Do I have the wrong workload? Memory fails me. Or did we fix the anticipatory scheduler? Yes, we did fix it ;) Quite a long time ago, so maybe you are thinking of something else (I haven't been able to work it out). Steve Pratt's ols2004 presentation made AS look pretty bad. However the numbers in the proceedings (http://www.finux.org/proceedings/LinuxSymposium2004_V2.pdf) are much less stark. Steve, what's up with that? The slides which you talked to had some awful numbers. Was it the same set of tests? I highlighted a few cases where AS went really wrong during the presentation, like on really large RAID 0 arrays, but in general (referring back to slides) AS trailed other schedulers by 5-10% on ext3, but had real trouble with XFS, losing by as much as %145 on 5disk raid5 system for a mix of workloads. Perhaps this is the piece you remember. Steve - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.12-rc2-mm3 (ACPI build problem)
Hello.

Andrew Morton wrote:
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/

Fails to compile with !CONFIG_ACPI && CONFIG_SMP. CONFIG_SMP sets CONFIG_X86_HT, which sets CONFIG_ACPI_BOOT, but that fails without CONFIG_ACPI:

  CC arch/i386/kernel/setup.o
arch/i386/kernel/setup.c:96: error: syntax error before ‘acpi_sci_flags’
arch/i386/kernel/setup.c:96: warning: type defaults to ‘int’ in declaration of ‘acpi_sci_flags’
arch/i386/kernel/setup.c:96: warning: data definition has no type or storage class
arch/i386/kernel/setup.c: In function ‘parse_cmdline_early’:
arch/i386/kernel/setup.c:811: error: request for member ‘trigger’ in something not a structure or union
arch/i386/kernel/setup.c:814: error: request for member ‘trigger’ in something not a structure or union
arch/i386/kernel/setup.c:817: error: request for member ‘polarity’ in something not a structure or union
arch/i386/kernel/setup.c:820: error: request for member ‘polarity’ in something not a structure or union
[patch 0/3] Re: 2.6.12-rc2-mm3
Hello.

Andrew Morton wrote:
OK, the `int $3' is part of the CONFIG_TRAP_BAD_SYSCALL_EXITS thing which I never use. I'm not sure what problem is actually being reported here, now you mention it.

The problem being reported here is that CONFIG_TRAP_BAD_SYSCALL_EXITS does nothing but lock up your machine. Actually the bug was so obvious that I had real trouble finding it (the obvious things are difficult to spot), so I found some more bugs in the meantime.

What was the bug? GET_THREAD_INFO(%ebp) was missing before TI_preempt_count(%ebp), hence the accesses beyond the stack. I'll have trouble believing this code worked without lock-ups for anyone. I fixed it differently though. The subsequent patches are addressing the issue.

Yup. But are you sure that the + 8 is correct, given these offsets are larger than that?

I don't think they are indeed larger. The %esp points to struct pt_regs, which is 60 bytes in size, and without the xss/esp = 52. Adding 8 to this gives 60, so 56(+3) looks safe to me.

Probably it decided that some syscall got a bad exit. Disable CONFIG_TRAP_BAD_SYSCALL_EXITS.

Yes, that's the fix too.

- p->thread.esp0 = (unsigned long) (childregs+1) - 8;
+ p->thread.esp0 = (unsigned long) (childregs+1) - 15;

15 is somewhat nasty - it will make the stack unaligned, should better be 16 I think.

? It's still 4-byte-aligned.

I don't see your point. Why do you think that I subtract the stack pointer by 32 bytes, for example? I literally subtract it by 8 bytes, you propose to subtract it by 15 *bytes* (not dwords), so why would it still be aligned? But anyway, fortunately this bug is not about the esp0 stuff at all.

I'm suspecting this is all due to CONFIG_TRAP_BAD_SYSCALL_EXITS taking the debug trap..

Sure. And that looks silly. I removed int $3. Patches follow. Seems to work reliably now, but I don't know how to test it, since there seem to be no such offending syscalls here to test.
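The 60/52/56(+3) arithmetic and the alignment argument above can be checked with a short sketch. It mirrors the i386 struct pt_regs as fifteen 32-bit slots; the field order is the one I remember from the comment at the top of arch/i386/kernel/entry.S, so take it as an assumption rather than a copy of the header. STACK_TOP is a made-up address standing in for (childregs+1) and is used only for the alignment checks.

```python
import ctypes

# Assumed layout: fifteen 32-bit slots in the order the i386 entry code
# saves them. Names are illustrative, not copied from the kernel header.
class PtRegs(ctypes.Structure):
    _fields_ = [(name, ctypes.c_int32) for name in (
        "ebx", "ecx", "edx", "esi", "edi", "ebp", "eax",
        "xds", "xes", "orig_eax", "eip", "xcs", "eflags",
        "esp", "xss",
    )]

FULL_FRAME = ctypes.sizeof(PtRegs)   # whole frame, esp/xss included
SHORT_FRAME = PtRegs.esp.offset      # frame when esp/xss were not saved

def highest_byte_touched(offset, width=4):
    """Offset of the last byte read by a `width`-byte load at offset(%esp)."""
    return offset + width - 1

def is_aligned(addr, boundary=4):
    """True if addr is a multiple of boundary."""
    return addr % boundary == 0

STACK_TOP = 0xDFCB5000  # hypothetical (childregs + 1), itself 4-byte aligned

print(FULL_FRAME, SHORT_FRAME, SHORT_FRAME + 8)
print(highest_byte_touched(PtRegs.xss.offset))
print(is_aligned(STACK_TOP - 8), is_aligned(STACK_TOP - 15), is_aligned(STACK_TOP - 16))
```

Under that layout a short frame plus the 8 bytes of slack is exactly as large as a full frame, so a 4-byte load at offset 56 stays inside it, and subtracting 15 bytes from an aligned address gives an odd one.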
Re: 2.6.12-rc2-mm3
Hrm... I just noticed you have CONFIG_PREEMPT enabled... Can you test without it and let me know if it makes a difference ? IIRC I had disabled that for rc2-mm2 and it didn't make a difference. I'll disable it again when I try older versions. I just got another crash with rc2-mm3. The crash was a bit different this time, I still could move the mouse pointer and the logs contained some info: Ok, what about non-mm ? (just plain rc2) I've tried older kernels now. rc1-mm1 locks up (no logs); plain rc1 seems to be OK (running fine for several hours now). Interesting. Please try -rc2 too... Ben. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.12-rc2-mm3
See this patch from Steve French: http://cifs.bkbits.net:8080/linux-2.5cifs/[EMAIL PROTECTED] Thanks, that fixed it. M. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.12-rc2-mm3 (ACPI build problem)
On Tue, Apr 12, 2005 at 09:53:02PM +0400, Stas Sergeev wrote:
Hello.

Andrew Morton wrote:
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/

Fails to compile with !CONFIG_ACPI && CONFIG_SMP. CONFIG_SMP sets CONFIG_X86_HT, which sets CONFIG_ACPI_BOOT, but that fails without CONFIG_ACPI:

  CC arch/i386/kernel/setup.o
arch/i386/kernel/setup.c:96: error: syntax error before ‘acpi_sci_flags’
arch/i386/kernel/setup.c:96: warning: type defaults to ‘int’ in declaration of ‘acpi_sci_flags’
arch/i386/kernel/setup.c:96: warning: data definition has no type or storage class
arch/i386/kernel/setup.c: In function ‘parse_cmdline_early’:
arch/i386/kernel/setup.c:811: error: request for member ‘trigger’ in something not a structure or union
arch/i386/kernel/setup.c:814: error: request for member ‘trigger’ in something not a structure or union
arch/i386/kernel/setup.c:817: error: request for member ‘polarity’ in something not a structure or union
arch/i386/kernel/setup.c:820: error: request for member ‘polarity’ in something not a structure or union

Known bug.

Workaround: Enable CONFIG_ACPI.

cu
Adrian

--
Is there not promise of rain? Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. Only a promise, Lao Er said.
Pearl S. Buck - Dragon Seed
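A quick way to see whether a given i386 .config falls into the combination reported above (SMP on, ACPI off) is sketched below. The function names are invented for the example, and the check encodes only what this report says about rc2-mm3; it is not derived from the Kconfig files.

```python
def parse_config(text):
    """Collect CONFIG_FOO=value lines; '# CONFIG_FOO is not set' lines are skipped."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            options[key] = value
    return options

def hits_reported_acpi_bug(options):
    """True for the reported failing combination: CONFIG_SMP=y without CONFIG_ACPI=y."""
    return options.get("CONFIG_SMP") == "y" and options.get("CONFIG_ACPI") != "y"

sample = """\
CONFIG_SMP=y
# CONFIG_ACPI is not set
CONFIG_X86_HT=y
"""
print(hits_reported_acpi_bug(parse_config(sample)))
```

Applying the workaround (adding CONFIG_ACPI=y) makes the check come back false.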
Re: 2.6.12-rc2-mm3
Benjamin Herrenschmidt [EMAIL PROTECTED] writes: Hrm... I just noticed you have CONFIG_PREEMPT enabled... Can you test without it and let me know if it makes a difference ? IIRC I had disabled that for rc2-mm2 and it didn't make a difference. I'll disable it again when I try older versions. I just got another crash with rc2-mm3. The crash was a bit different this time, I still could move the mouse pointer and the logs contained some info: Ok, what about non-mm ? (just plain rc2) I've tried older kernels now. rc1-mm1 locks up (no logs); plain rc1 seems to be OK (running fine for several hours now). Interesting. Please try -rc2 too... Works fine so far. Juergen -- Juergen Kreileder, Blackdown Java-Linux Team http://blog.blackdown.de/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.12-rc2-mm3
Benjamin Herrenschmidt [EMAIL PROTECTED] writes: On Tue, 2005-04-12 at 06:42 +0200, Juergen Kreileder wrote: Benjamin Herrenschmidt [EMAIL PROTECTED] writes: On Tue, 2005-04-12 at 03:18 +0200, Juergen Kreileder wrote: Andrew Morton [EMAIL PROTECTED] writes: ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/ I'm getting frequent lockups on my PowerMac G5 with rc2-mm3. 2.6.11-mm4 works fine but all 2.6.12 versions I've tried (all since -rc1-mm3) lock up randomly. The easiest way to reproduce the problem seems to be running Azareus. So it might be network related, but I'm not 100% sure about that, there was a least one deadlock with virtually no network usage. Hrm... I just noticed you have CONFIG_PREEMPT enabled... Can you test without it and let me know if it makes a difference ? IIRC I had disabled that for rc2-mm2 and it didn't make a difference. I'll disable it again when I try older versions. I just got another crash with rc2-mm3. The crash was a bit different this time, I still could move the mouse pointer and the logs contained some info: Ok, what about non-mm ? (just plain rc2) I've tried older kernels now. rc1-mm1 locks up (no logs); plain rc1 seems to be OK (running fine for several hours now). Juergen -- Juergen Kreileder, Blackdown Java-Linux Team http://blog.blackdown.de/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.12-rc2-mm3
Andrew Morton wrote: Nick Piggin [EMAIL PROTECTED] wrote: AS basically does its own TCQ strangulation, which IIRC involves things like completing all reads before issuing new writes, and completing all reads from one process before reads from another. As well as the fundamental way that waiting for a 'dependant read' throttles TCQ. My (mpt-fusion-based) workstation is still really slow when there's a lot of writeout happening. Just from a quick test: 2.6.12-rc2, as, tcq depth=2: 7.241 seconds 2.6.12-rc2, as, tcq depth=64: 12.172 seconds 2.6.12-rc2+patch,as, tcq depth=64: 7.199 seconds 2.6.12-rc2, cfq2, tcq depth=64: much more than 5 minutes 2.6.12-rc2, cfq3, tcq depth=64: much more than 5 minutes 2.6.11-rc4-mm1, as, mpt-f 39.349 seconds That was really really slow but had a sudden burst of read I/O at the end which made the thing look better than it really is. I wouldn't have a clue what tag depth it's using, and it's the only mpt-fusion based machine I have handy... Heh. Well with my current lineup on the mpt-fusion driver and no as-limit-queue-depth.patch that test takes 17 seconds. With as-limit-queue-depth.patch it's down to 10 seconds. Which is pretty darn good btw. I assume from this: scsi0 : ioc0: LSI53C1030, FwRev=01030600h, Ports=1, MaxQ=222, IRQ=25 scsi1 : ioc1: LSI53C1030, FwRev=01030600h, Ports=1, MaxQ=222, IRQ=26 that it's using a tag depth of 222. int req_depth; /* Number of request frames */ I wonder if that's true... One thing which changed is that this kernel now has the fixed-up mpt-fusion chipset tuning. That doubles the IO bandwidth, which would pretty well account for that difference. I'll wait and see how irritating things get under writeout load. Yes, we'll need to decide if we want to retain as-limit-queue-depth.patch and toss out some of the older AS logic which was designed to address the TCQ problem. Steve, could you help to identify a not-too-hard-to-set-up workload at which AS was particularly poor? Thanks. 
AS with XFS was pretty bad on a couple of workloads. random 4k reads and metadata which was 40%create, 40%append, 20%delete multithreaded workloads. I'll try to run a few tests with and without this patch on my hardware setup over the next day or so and see how it does. I have not really looked at AS performance since about 2.6.6/7. Our database team recently re-checked IO Scheduler performance, and on the Ad Hoc Decision Support Workload we still saw a 15-20% lower throughput on RHEL4 with AS compared to other schedulers which were all within a couple of %. Steve - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.12-rc2-mm3
Benjamin Herrenschmidt <[EMAIL PROTECTED]> writes:

> On Tue, 2005-04-12 at 03:18 +0200, Juergen Kreileder wrote:
>> Andrew Morton <[EMAIL PROTECTED]> writes:
>>
>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/
>>
>> I'm getting frequent lockups on my PowerMac G5 with rc2-mm3.
>>
>> 2.6.11-mm4 works fine but all 2.6.12 versions I've tried (all since
>> -rc1-mm3) lock up randomly. The easiest way to reproduce the
>> problem seems to be running Azureus. So it might be network
>> related, but I'm not 100% sure about that, there was at least one
>> deadlock with virtually no network usage.
>
> Hrm... I just noticed you have CONFIG_PREEMPT enabled... Can you
> test without it and let me know if it makes a difference ?

IIRC I had disabled that for rc2-mm2 and it didn't make a difference. I'll disable it again when I try older versions.

I just got another crash with rc2-mm3. The crash was a bit different this time, I still could move the mouse pointer and the logs contained some info:

Badness in slb_flush_and_rebolt at arch/ppc64/mm/slb.c:52
[c0017690b860] [00069a73] 0x69a73 (unreliable)
[c0017690b900] [c003b300] .__schedule_tail+0x9c/0x1b4
[c0017690b9a0] [c03162b0] .schedule+0x324/0x610
[c0017690ba80] [c03177e8] .schedule_timeout+0xfc/0x104
[c0017690bb60] [c00b6118] .do_select+0x278/0x4c4
[c0017690bcb0] [c00d6f4c] .compat_sys_select+0x390/0x690
[c0017690bdc0] [c0019eb8] .ppc32_select+0x14/0x28
[c0017690be30] [c000da00] syscall_exit+0x0/0x18
Badness in slb_flush_and_rebolt at arch/ppc64/mm/slb.c:52
Call Trace:
[c0016fe23860] [0413] 0x413 (unreliable)
[c0016fe23900] [c003b300] .__schedule_tail+0x9c/0x1b4
[c0016fe239a0] [c03162b0] .schedule+0x324/0x610
[c0016fe23a80] [c0317774] .schedule_timeout+0x88/0x104
[c0016fe23b60] [c00b6118] .do_select+0x278/0x4c4
[c0016fe23cb0] [c00d6f4c] .compat_sys_select+0x390/0x690
[c0016fe23dc0] [c0019eb8] .ppc32_select+0x14/0x28
[c0016fe23e30] [c000da00] syscall_exit+0x0/0x18
Badness in slb_flush_and_rebolt at arch/ppc64/mm/slb.c:52
Call Trace:
[c00175d2b860] [0163] 0x163 (unreliable)
[c00175d2b900] [c003b300] .__schedule_tail+0x9c/0x1b4
[c00175d2b9a0] [c03162b0] .schedule+0x324/0x610
[c00175d2ba80] [c0317774] .schedule_timeout+0x88/0x104
[c00175d2bb60] [c00b6118] .do_select+0x278/0x4c4
[c00175d2bcb0] [c00d6f4c] .compat_sys_select+0x390/0x690
[c00175d2bdc0] [c0019eb8] .ppc32_select+0x14/0x28
[c00175d2be30] [c000da00] syscall_exit+0x0/0x18
Badness in slb_flush_and_rebolt at arch/ppc64/mm/slb.c:52
Call Trace:
[c00178a17860] [0eb1] 0xeb1 (unreliable)
[c00178a17900] [c003b300] .__schedule_tail+0x9c/0x1b4
[c00178a179a0] [c03162b0] .schedule+0x324/0x610
[c00178a17a80] [c0317774] .schedule_timeout+0x88/0x104
[c00178a17b60] [c00b6118] .do_select+0x278/0x4c4
[c00178a17cb0] [c00d6f4c] .compat_sys_select+0x390/0x690
[c00178a17dc0] [c0019eb8] .ppc32_select+0x14/0x28
[c00178a17e30] [c000da00] syscall_exit+0x0/0x18
Badness in slb_flush_and_rebolt at arch/ppc64/mm/slb.c:52
Call Trace:
[c001767fba10] [1bca] 0x1bca (unreliable)

and so on until the machine switched into jet-fighter mode after:

[c0016f887a10] [0001fc8c] 0x1fc8c (unreliable)
[c0016f887ab0] [c003b300] .__schedule_tail+0x9c/0x1b4
[c0016f887b50] [c03162b0] .schedule+0x324/0x610
[c0016f887c30] [c0317774] .schedule_timeout+0x88/0x104
[c0016f887d10] [c00b6bb4] .sys_poll+0x3b8/0x4dc
[c0016f887e30] [c000da00] syscall_exit+0x0/0x18
Oops: Machine check, sig: 0 [#1]

Machine info:
* PowerMac7,2 with 2x 2GHz
* 4GB RAM
* 2 disks with ext3 partitions on top of LVM2
* Radeon 9800Pro with radeonfb and X (from Debian sid) at 1600x1200
* USB Mouse via evdev
* Bluetooth enabled but unused
* Firewire disabled
* No PCI cards
* Kernel compiled with gcc-3.4.2

Juergen

--
Juergen Kreileder, Blackdown Java-Linux Team
http://blog.blackdown.de/
Re: 2.6.12-rc2-mm3
Andrew Morton <[EMAIL PROTECTED]> wrote:
>
> - The anticipatory I/O scheduler has always been fairly useless with SCSI
>   disks which perform tagged command queueing. There's a patch here from Jens
>   which is designed to fix that up by constraining the number of requests
>   which we'll leave pending in the device.
>
>   The depth currently defaults to 1. Tunable in
>   /sys/block/hdX/queue/iosched/queue_depth
>
>   This patch hasn't been performance tested at all yet. If you think it is
>   misbehaving (the usual symptom is processes stuck in D state) then please
>   report it, then boot with `elevator=cfq' or `elevator=deadline' to work
>   around it.

So it turns out that patch was broken. I've fixed it locally and the results are good, but odd.

The machine is a 4GB x86_64 with aic79xx controllers and MAXTOR ATLAS10K4_73WLS disks. ext2 filesystem.

The workload is continuous pagecache writeback versus read-lots-of-little-files:

        while true
        do
                dd if=/dev/zero of=/mnt/sdb2/x bs=40M count=100 conv=notrunc
        done

versus

        find /mnt/sdb2/linux-2.4.25 -type f | xargs cat > /dev/null

we measure how long the find+cat takes.

2.6.12-rc2,       as,   tcq depth=2:    7.241 seconds
2.6.12-rc2,       as,   tcq depth=64:   12.172 seconds
2.6.12-rc2+patch, as,   tcq depth=64:   7.199 seconds
2.6.12-rc2,       cfq2, tcq depth=64:   much more than 5 minutes
2.6.12-rc2,       cfq3, tcq depth=64:   much more than 5 minutes

So

- The effects of tcq on AS are much less disastrous than I thought they were. Do I have the wrong workload? Memory fails me. Or did we fix the anticipatory scheduler?

- as-limit-queue-depth.patch fixes things right up anyway. Seems to be doing the right thing.

- CFQ is seriously, seriously read-starved on this workload.
CFQ2:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in    cs us sy id wa
 2  3   1116  25504   4868 3854008    4    0     8 61976 1112   291  0  4 39 58
 0  4   1112  24992   4868 3855120    0  568     4 53804 1124   452  0  4 54 43
 0  4   1112  24032   4868 3856004    0    0     8 44652 1110   303  0  3 45 53
 0  2   1112  25912   4872 3854164    0    0     4 51108 1122   321  0  3 52 45
 2  3   1112  24312   4872 3855728    0    0    32 52240 1113   300  0  4 44 52
 1  3   1112  25728   4876 3854432    0    0    20 48128 1118   296  0  3 58 39
 0  2   1112  23872   4876 3856336    0    0     4 48136 1116   288  0  4 47 49
 0  4   1112  25856   4876 3854300    0    4    16 50260 1117   294  0  3 55 42

CFQ3:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in    cs us sy id wa
 1  5   1008  25888   4204 3845820    0    0    12 50544 1119   116  0  3 49 48
 0  5   1008  24096   4204 3847520    0    0     8 51200 1112   110  0  3 49 48
 0  5   1008  25824   4204 3845820    0    0     8 54816 1117   120  0  4 49 48
 0  5   1008  25440   4204 3846160    0    0     8 52880 1113   115  0  3 49 48
 0  5   1008  25888   4208 3845748    0    0    16 51024 1121   116  0  3 49 48
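For anyone wanting to repeat the read side of this measurement without the shell pipeline, here is a rough equivalent of timing `find ... -type f | xargs cat > /dev/null`. It is only a sketch: the function name is invented, it reads through the page cache like cat does, and the writer loop (the dd above) still has to be run separately alongside it.

```python
import os
import time

def read_tree(root, chunk_size=1 << 20):
    """Read every regular file under root and discard the data.

    Returns (files_read, bytes_read, elapsed_seconds).
    """
    start = time.monotonic()
    files_read = 0
    bytes_read = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # `find -type f` skips symlinks, so do the same here.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as fh:
                while True:
                    chunk = fh.read(chunk_size)
                    if not chunk:
                        break
                    bytes_read += len(chunk)
            files_read += 1
    return files_read, bytes_read, time.monotonic() - start
```

Calling `read_tree("/mnt/sdb2/linux-2.4.25")` while the writeback loop runs would give the figure that the timing table reports, provided the tree is not already sitting in the page cache.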
Re: 2.6.12-rc2-mm3
On Tue, 2005-04-12 at 03:18 +0200, Juergen Kreileder wrote: > Andrew Morton <[EMAIL PROTECTED]> writes: > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/ > > I'm getting frequent lockups on my PowerMac G5 with rc2-mm3. > > 2.6.11-mm4 works fine but all 2.6.12 versions I've tried (all since > -rc1-mm3) lock up randomly. The easiest way to reproduce the problem > seems to be running Azareus. So it might be network related, but I'm > not 100% sure about that, there was a least one deadlock with > virtually no network usage. > > BTW, what's the SysRq key on recent Apple USB keyboards? Alt/Cmd-F13 > doesn't work for me. > Hrm... I just noticed you have CONFIG_PREEMPT enabled... Can you test without it and let me know if it makes a difference ? Ben. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.12-rc2-mm3
On Tue, 2005-04-12 at 03:18 +0200, Juergen Kreileder wrote:
> Andrew Morton <[EMAIL PROTECTED]> writes:
>
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/
>
> I'm getting frequent lockups on my PowerMac G5 with rc2-mm3.
>
> 2.6.11-mm4 works fine but all 2.6.12 versions I've tried (all since
> -rc1-mm3) lock up randomly. The easiest way to reproduce the problem
> seems to be running Azureus. So it might be network related, but I'm
> not 100% sure about that; there was at least one deadlock with
> virtually no network usage.
>
> BTW, what's the SysRq key on recent Apple USB keyboards? Alt/Cmd-F13
> doesn't work for me.

No idea about SysRq, I don't use it.

However, I haven't experienced any such problem with the various G5s we
have here (and no other G5 user has reported such a problem), so it would
be useful if you could provide a bit more information. For example,
what exact G5 model is this, do you have any 3rd party PCI card, what
video card are you using, can you reproduce the crash in console mode,
that sort of thing...

Also, did you run a memtest equivalent on the machine?

Finally, it would be useful if you could point out which specific patch
or bk snapshot, or at least -mm rev., introduced the bug. As I said
previously, you are the only one to report this, and none of the G5s here
is showing such a problem.

Ben.
Re: 2.6.12-rc2-mm3
Andrew Morton <[EMAIL PROTECTED]> writes:

> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/

I'm getting frequent lockups on my PowerMac G5 with rc2-mm3.

2.6.11-mm4 works fine but all 2.6.12 versions I've tried (all since
-rc1-mm3) lock up randomly. The easiest way to reproduce the problem
seems to be running Azureus. So it might be network related, but I'm
not 100% sure about that; there was at least one deadlock with
virtually no network usage.

BTW, what's the SysRq key on recent Apple USB keyboards? Alt/Cmd-F13
doesn't work for me.

Juergen

CONFIG_64BIT=y
CONFIG_MMU=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_HAVE_DEC_LOCK=y
CONFIG_EARLY_PRINTK=y
CONFIG_COMPAT=y
CONFIG_FORCE_MAX_ZONEORDER=13
CONFIG_EXPERIMENTAL=y
CONFIG_CLEAN_COMPILE=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_SYSCTL=y
CONFIG_HOTPLUG=y
CONFIG_KOBJECT_UEVENT=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_KALLSYMS=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_CC_ALIGN_FUNCTIONS=0
CONFIG_CC_ALIGN_LABELS=0
CONFIG_CC_ALIGN_LOOPS=0
CONFIG_CC_ALIGN_JUMPS=0
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_OBSOLETE_MODPARM=y
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_PPC_MULTIPLATFORM=y
CONFIG_PPC_PMAC=y
CONFIG_PPC=y
CONFIG_PPC64=y
CONFIG_PPC_OF=y
CONFIG_ALTIVEC=y
CONFIG_U3_DART=y
CONFIG_PPC_PMAC64=y
CONFIG_BOOTX_TEXT=y
CONFIG_POWER4_ONLY=y
CONFIG_IOMMU_VMERGE=y
CONFIG_SMP=y
CONFIG_NR_CPUS=2
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_FLATMEM=y
CONFIG_PREEMPT=y
CONFIG_PREEMPT_BKL=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_SECCOMP=y
CONFIG_PCI=y
CONFIG_PCI_DOMAINS=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=m
CONFIG_PCI_NAMES=y
CONFIG_PROC_DEVICETREE=y
CONFIG_NET=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_SYN_COOKIES=y
CONFIG_IP_TCPDIAG=y
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=y
CONFIG_BT=m
CONFIG_BT_L2CAP=m
CONFIG_BT_RFCOMM=m
CONFIG_BT_RFCOMM_TTY=y
CONFIG_BT_HCIUSB=m
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CDROM_PKTCDVD=y
CONFIG_CDROM_PKTCDVD_BUFFERS=8
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y
CONFIG_BLK_DEV_IDECD=y
CONFIG_IDE_TASK_IOCTL=y
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_SHARE_IRQ=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_IDEDMA_PCI_AUTO=y
CONFIG_BLK_DEV_IDE_PMAC=y
CONFIG_BLK_DEV_IDE_PMAC_ATA100FIRST=y
CONFIG_BLK_DEV_IDEDMA_PMAC=y
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_IDEDMA_AUTO=y
CONFIG_SCSI=y
CONFIG_SCSI_PROC_FS=y
CONFIG_BLK_DEV_SD=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_LOGGING=y
CONFIG_SCSI_SATA=y
CONFIG_SCSI_SATA_SVW=y
CONFIG_SCSI_QLA2XXX=y
CONFIG_MD=y
CONFIG_BLK_DEV_DM=y
CONFIG_DM_CRYPT=y
CONFIG_DM_SNAPSHOT=y
CONFIG_DM_MIRROR=y
CONFIG_DM_ZERO=y
CONFIG_ADB_PMU=y
CONFIG_THERM_PM72=y
CONFIG_NETDEVICES=y
CONFIG_NET_ETHERNET=y
CONFIG_MII=y
CONFIG_SUNGEM=y
CONFIG_INPUT=y
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1600
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=1200
CONFIG_INPUT_EVDEV=m
CONFIG_INPUT_KEYBOARD=y
CONFIG_INPUT_MOUSE=y
CONFIG_INPUT_MISC=y
CONFIG_INPUT_UINPUT=m
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_UNIX98_PTYS=y
CONFIG_RTC=y
CONFIG_AGP=y
CONFIG_AGP_UNINORTH=y
CONFIG_I2C=y
CONFIG_I2C_CHARDEV=y
CONFIG_I2C_ALGOBIT=y
CONFIG_I2C_KEYWEST=y
CONFIG_FB=y
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
CONFIG_FB_SOFT_CURSOR=y
CONFIG_FB_MACMODES=y
CONFIG_FB_MODE_HELPERS=y
CONFIG_FB_OF=y
CONFIG_FB_RADEON=y
CONFIG_FB_RADEON_I2C=y
CONFIG_DUMMY_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
CONFIG_LOGO=y
CONFIG_LOGO_LINUX_MONO=y
CONFIG_LOGO_LINUX_VGA16=y
CONFIG_LOGO_LINUX_CLUT224=y
CONFIG_SOUND=m
CONFIG_SND=m
CONFIG_SND_TIMER=m
CONFIG_SND_PCM=m
CONFIG_SND_RAWMIDI=m
CONFIG_SND_SEQUENCER=m
CONFIG_SND_OSSEMUL=y
CONFIG_SND_MIXER_OSS=m
CONFIG_SND_PCM_OSS=m
CONFIG_SND_SEQUENCER_OSS=y
CONFIG_SND_RTCTIMER=m
CONFIG_SND_POWERMAC=m
CONFIG_SND_USB_AUDIO=m
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB=y
CONFIG_USB_DEVICEFS=y
CONFIG_USB_BANDWIDTH=y
CONFIG_USB_EHCI_HCD=y
CONFIG_USB_EHCI_SPLIT_ISO=y
CONFIG_USB_EHCI_ROOT_HUB_TT=y
CONFIG_USB_OHCI_HCD=y
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_STORAGE=m
CONFIG_USB_HID=y
CONFIG_USB_HIDINPUT=y
CONFIG_USB_HIDDEV=y
CONFIG_EXT2_FS=y
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
CONFIG_EXT3_FS=y
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_JBD=y
CONFIG_FS_MBCACHE=y
CONFIG_FS_POSIX_ACL=y
CONFIG_INOTIFY=y
CONFIG_DNOTIFY=y
CONFIG_FUSE_FS=m
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_ZISOFS_FS=y
CONFIG_UDF_FS=y
CONFIG_UDF_NLS=y
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=850
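Several replies in this thread hinge on whether a particular option (CONFIG_PREEMPT, CONFIG_SMP) is set in a listing like the one above. As a minimal sketch, a config dump can be turned into a lookup table and queried. The function names `parse_kconfig` and `is_enabled` are made up for this example, and the parser assumes option values contain no spaces, which holds for the listing above but not for every .config.

```python
NOT_SET_SUFFIX = " is not set"


def parse_kconfig(text):
    """Map option name -> value for a .config-style listing.

    Accepts one option per line as well as several options separated by
    whitespace on one line. Assumes values contain no spaces.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        # Disabled options appear as "# CONFIG_FOO is not set".
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET_SUFFIX):
            options[line[2:-len(NOT_SET_SUFFIX)]] = "n"
            continue
        if line.startswith("#"):
            continue  # ordinary comment or section header
        for token in line.split():
            if token.startswith("CONFIG_") and "=" in token:
                name, value = token.split("=", 1)
                options[name] = value.strip('"')
    return options


def is_enabled(options, name):
    """True if the option is built in (y) or built as a module (m)."""
    return options.get(name) in ("y", "m")
```

With CONFIG_IKCONFIG_PROC=y, as in the listing above, the running kernel's configuration is normally readable from /proc/config.gz, so something like `parse_kconfig(gzip.open("/proc/config.gz", "rt").read())` followed by `is_enabled(opts, "CONFIG_PREEMPT")` would answer the question for the booted kernel.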
Re: 2.6.12-rc2-mm3
Jindrich Makovicka <[EMAIL PROTECTED]> wrote:
>
> Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/
>
> MPlayer randomly crashes in various pthread_* calls when using binary
> codecs. 2.6.12-rc2-mm2 was ok. I tried to reverse
> fix-crash-in-entrys-restore_all.patch, but it didn't help.
>

Hm, could be anything. Does 2.6.12-rc2 also fail?
Re: 2.6.12-rc2-mm3
On Apr 11, 2005 10:46 PM, Martin J. Bligh <[EMAIL PROTECTED]> wrote:
>
>
> --On Monday, April 11, 2005 01:25:32 -0700 Andrew Morton <[EMAIL PROTECTED]> wrote:
>
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/
> >
> >
> > - The anticipatory I/O scheduler has always been fairly useless with SCSI
> >   disks which perform tagged command queueing. There's a patch here from Jens
> >   which is designed to fix that up by constraining the number of requests
> >   which we'll leave pending in the device.
> >
> >   The depth currently defaults to 1. Tunable in
> >   /sys/block/hdX/queue/iosched/queue_depth
> >
> >   This patch hasn't been performance tested at all yet. If you think it is
> >   misbehaving (the usual symptom is processes stuck in D state) then please
> >   report it, then boot with `elevator=cfq' or `elevator=deadline' to work
> >   around it.
> >
> > - More CPU scheduler work. I hope someone is testing this stuff.
>
> Trying ... having some build problems that seem to be part test-harness,
> part bugs.
>
> Meanwhile on PPC64:
>
> fs/cifs/misc.c: In function `cifs_convertUCSpath':
> fs/cifs/misc.c:546: error: case label does not reduce to an integer constant
> fs/cifs/misc.c:549: error: case label does not reduce to an integer constant
> fs/cifs/misc.c:552: error: case label does not reduce to an integer constant
> fs/cifs/misc.c:561: error: case label does not reduce to an integer constant
> fs/cifs/misc.c:564: error: case label does not reduce to an integer constant
> fs/cifs/misc.c:567: error: case label does not reduce to an integer constant
> make[2]: *** [fs/cifs/misc.o] Error 1
> make[1]: *** [fs/cifs] Error 2
> make[1]: *** Waiting for unfinished jobs
>

See this patch from Steve French: http://cifs.bkbits.net:8080/linux-2.5cifs/[EMAIL PROTECTED]

> M.
Re: 2.6.12-rc2-mm3
Borislav Petkov <[EMAIL PROTECTED]> wrote:
>
> On Monday 11 April 2005 11:43, Andrew Morton wrote:
> > (Please do reply-to-all)
> >
> > "J.A. Magallon" <[EMAIL PROTECTED]> wrote:
> > > On 04.11, Andrew Morton wrote:
> > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/
> > >
> > > Is this not needed anymore ?
> > >
> > > --- 25/arch/i386/kernel/entry.S~nmi_stack_correct-fix  2005-04-05 00:02:48.0 -0700
> > > +++ 25-akpm/arch/i386/kernel/entry.S  2005-04-05 00:02:48.0 -0700
> >
> > Hopefully not. fix-crash-in-entrys-restore_all.patch works around the
> > problem.
>
> Hello Andrew,
>
> I don't know whether you remember the mysterious crashes I was telling you
> about last week and me rookiesh-ly trying to debug them with kgdb over the
> serial console. Well, today I tried for the n-th time, and after rc2-mm3
> blocked again while loading, here's what I did:
>
> [ 12.335438] NET: Registered protocol family 17
> [ 12.362483] Testing NMI watchdog ... OK.
> [ 12.416195] Starting balanced_irq
> [ 12.443099] VFS: Mounted root (ext2 filesystem) readonly.
> [ 12.472490] Freeing unused kernel memory: 196k freed
> [ 12.521004] logips2pp: Detected unknown logitech mouse model 1
> [ 12.572581] Warning: unable to open an initial console.
> [ 12.972518] input: PS/2 Logitech Mouse on isa0060/serio1
>
> Program received signal SIGTRAP, Trace/breakpoint trap.
> 0xc0102ee7 in resume_kernelX () at atomic.h:175   <--- this one is wrong for a mysterious reason
> 175     {
> (gdb) p $eip
> $1 = (void *) 0xc0102ee7
>
> (gdb) disas 0xc0102ee7
> Dump of assembler code for function resume_kernelX:
> 0xc0102ee7:  mov    0x30(%esp),%eax
> 0xc0102eeb:  mov    0x38(%esp),%ah
> 0xc0102eef:  mov    0x2c(%esp),%al
> 0xc0102ef3:  and    $0x20403,%eax
> 0xc0102ef8:  cmp    $0x403,%eax
> 0xc0102efd:  je     0xc0102f0c
> End of assembler dump.
> (gdb)
>
> As we can see, we're at the "mov 0x30(%esp),%eax", which accesses above the
> bottom of the stack. After applying nmi_stack_correct-fix.patch, rc2-mm3
> booted just fine, so IMHO we might still need this after all.

Interesting. It could be an interaction between the kgdb patch and the new
vm86 checking code. (looks. I don't think that's the case).

Stas, could you please take a look at 2.6.12-rc2-mm3's entry.S sometime and
see if you think my theory is correct?

It seems that you have CONFIG_TRAP_BAD_SYSCALL_EXITS enabled - I can't say
that I've ever used that, and I really should remove it. But I doubt
that it is the cause of this bug.

The above code is accessing esp+56, but Stas's patch only offsets the stack
pointer by 32 bytes, so I assume this, in copy_thread():

-	p->thread.esp0 = (unsigned long) (childregs+1) - 8;
+	p->thread.esp0 = (unsigned long) (childregs+1) - 15;

fixes it?
Re: 2.6.12-rc2-mm3
On Monday 11 April 2005 11:43, Andrew Morton wrote:
> (Please do reply-to-all)
>
> "J.A. Magallon" <[EMAIL PROTECTED]> wrote:
> > On 04.11, Andrew Morton wrote:
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/
> >
> > Is this not needed anymore ?
> >
> > --- 25/arch/i386/kernel/entry.S~nmi_stack_correct-fix  2005-04-05 00:02:48.0 -0700
> > +++ 25-akpm/arch/i386/kernel/entry.S  2005-04-05 00:02:48.0 -0700
>
> Hopefully not. fix-crash-in-entrys-restore_all.patch works around the
> problem.

Hello Andrew,

I don't know whether you remember the mysterious crashes I was telling you
about last week and me rookiesh-ly trying to debug them with kgdb over the
serial console. Well, today I tried for the n-th time, and after rc2-mm3
blocked again while loading, here's what I did:

[ 12.335438] NET: Registered protocol family 17
[ 12.362483] Testing NMI watchdog ... OK.
[ 12.416195] Starting balanced_irq
[ 12.443099] VFS: Mounted root (ext2 filesystem) readonly.
[ 12.472490] Freeing unused kernel memory: 196k freed
[ 12.521004] logips2pp: Detected unknown logitech mouse model 1
[ 12.572581] Warning: unable to open an initial console.
[ 12.972518] input: PS/2 Logitech Mouse on isa0060/serio1

Program received signal SIGTRAP, Trace/breakpoint trap.
0xc0102ee7 in resume_kernelX () at atomic.h:175   <--- this one is wrong for a mysterious reason
175     {
(gdb) p $eip
$1 = (void *) 0xc0102ee7

(gdb) disas 0xc0102ee7
Dump of assembler code for function resume_kernelX:
0xc0102ee7:  mov    0x30(%esp),%eax
0xc0102eeb:  mov    0x38(%esp),%ah
0xc0102eef:  mov    0x2c(%esp),%al
0xc0102ef3:  and    $0x20403,%eax
0xc0102ef8:  cmp    $0x403,%eax
0xc0102efd:  je     0xc0102f0c
End of assembler dump.
(gdb)

As we can see, we're at the "mov 0x30(%esp),%eax", which accesses above the
bottom of the stack. After applying nmi_stack_correct-fix.patch, rc2-mm3
booted just fine, so IMHO we might still need this after all.

Regards,
Boris.
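For readers following the disassembly above: the six instructions load one saved word into EAX, overwrite AH and AL with the low bytes of two other saved words, mask the result with 0x20403, and branch if it equals 0x403. The sketch below models that arithmetic. It assumes the usual i386 register-frame layout of that era (CS at 0x2c, EFLAGS at 0x30, SS at 0x38); the thread itself does not name these fields, so that mapping, the constant names, and the sample selector values in the note are assumptions.

```python
# Assumed meaning of the mask bits (not stated in the thread):
VM_MASK = 0x00020000    # EFLAGS.VM, virtual-8086 mode
SS_TI_IN_AH = 0x4 << 8  # table-indicator bit of SS once its low byte sits in AH
CS_RPL_MASK = 0x3       # requested privilege level in the low bits of CS

CHECK_MASK = VM_MASK | SS_TI_IN_AH | CS_RPL_MASK   # 0x20403
CHECK_VALUE = SS_TI_IN_AH | CS_RPL_MASK            # 0x403


def combine(eflags, ss, cs):
    """Model the three mov instructions that build EAX."""
    eax = eflags & 0xFFFFFFFF                     # mov 0x30(%esp),%eax
    eax = (eax & ~0xFF00) | ((ss & 0xFF) << 8)    # mov 0x38(%esp),%ah
    eax = (eax & ~0x00FF) | (cs & 0xFF)           # mov 0x2c(%esp),%al
    return eax


def branch_taken(eflags, ss, cs):
    """Model 'and $0x20403,%eax; cmp $0x403,%eax; je ...'."""
    return (combine(eflags, ss, cs) & CHECK_MASK) == CHECK_VALUE
```

Under this reading the branch is taken only when the frame describes a return to ring 3 with an LDT-based stack selector and the VM flag clear, for example `branch_taken(0x202, 0x0f, 0x73)`. The SS slot at esp+0x38 (56 decimal) is the highest address the sequence touches, which matches the "esp+56" mentioned in the reply to this message.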
Re: 2.6.12-rc2-mm3
--On Monday, April 11, 2005 01:25:32 -0700 Andrew Morton <[EMAIL PROTECTED]> wrote:

>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm3/
>
>
> - The anticipatory I/O scheduler has always been fairly useless with SCSI
>   disks which perform tagged command queueing. There's a patch here from Jens
>   which is designed to fix that up by constraining the number of requests
>   which we'll leave pending in the device.
>
>   The depth currently defaults to 1. Tunable in
>   /sys/block/hdX/queue/iosched/queue_depth
>
>   This patch hasn't been performance tested at all yet. If you think it is
>   misbehaving (the usual symptom is processes stuck in D state) then please
>   report it, then boot with `elevator=cfq' or `elevator=deadline' to work
>   around it.
>
> - More CPU scheduler work. I hope someone is testing this stuff.

Trying ... having some build problems that seem to be part test-harness,
part bugs.

Meanwhile on PPC64:

fs/cifs/misc.c: In function `cifs_convertUCSpath':
fs/cifs/misc.c:546: error: case label does not reduce to an integer constant
fs/cifs/misc.c:549: error: case label does not reduce to an integer constant
fs/cifs/misc.c:552: error: case label does not reduce to an integer constant
fs/cifs/misc.c:561: error: case label does not reduce to an integer constant
fs/cifs/misc.c:564: error: case label does not reduce to an integer constant
fs/cifs/misc.c:567: error: case label does not reduce to an integer constant
make[2]: *** [fs/cifs/misc.o] Error 1
make[1]: *** [fs/cifs] Error 2
make[1]: *** Waiting for unfinished jobs

M.
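When testing the scheduler change quoted above, it helps to confirm which elevator a disk is actually using before and after booting with `elevator=`. The sketch below is a hedged illustration: it assumes the kernel exposes the per-device `queue/scheduler` file in sysfs, where the active scheduler is shown in square brackets, and the helper names are made up for this example. The `queue_depth` path is the one given in the announcement; whether it exists depends on the patch being applied and the anticipatory scheduler being active.

```python
def parse_scheduler(line):
    """Split e.g. 'noop anticipatory deadline [cfq]' into (active, available)."""
    active = None
    available = []
    for token in line.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # the bracketed entry is the active scheduler
            active = token
        available.append(token)
    return active, available


def scheduler_path(dev):
    """sysfs file listing the schedulers for a block device such as 'hda'."""
    return f"/sys/block/{dev}/queue/scheduler"


def queue_depth_path(dev):
    """Tunable named in the announcement (present only with that patch)."""
    return f"/sys/block/{dev}/queue/iosched/queue_depth"


def read_active_scheduler(dev):
    """Return the active scheduler name for dev, or None if none is marked."""
    with open(scheduler_path(dev)) as f:
        return parse_scheduler(f.read())[0]
```

A report of processes stuck in D state could then include the output of `read_active_scheduler("hda")` and the contents of `queue_depth_path("hda")` so that others can tell which configuration was tested.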