Re: udiskd high CPU usage with 4.0 git
On Mon, Mar 16, 2015 at 12:44 AM, NeilBrown wrote:
> On Sat, 14 Mar 2015 21:16:51 +0100 Torsten Kaiser wrote:
>> udisksd now behaves normally again, but I'm not sending this change as a
>> patch, because I do not know enough about the locking and lifetime of these
>> objects to evaluate whether that is really the correct fix.
>
> Thanks for the bisection and analysis! Always easier when someone else does
> the hard work :-)
>
> There is a much simpler patch (as you probably suspected). I'll post it in a
> moment.

Linux-4.0-rc4 is still broken as expected, but after applying your patch from "[PATCH] kernfs: handle poll correctly on 'direct_read' files" my udisksd process behaves normally again.

Thanks for the quick answer + fix!

Torsten
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: udiskd high CPU usage with 4.0 git
On Mon, Mar 9, 2015 at 12:30 AM, NeilBrown wrote:
> On Sun, 08 Mar 2015 18:14:39 +0100 Prakash Punnoor wrote:
>
>> Hi,
>>
>> I noticed the udisks daemon (version 2.1.4) suddenly started using high
>> cpu (one core at 100%) with linux 4.0 git kernel. I bisected it to:
>>
>> 750f199ee8b578062341e6ddfe36c59ac8ff2dcb

I had the same problem upgrading from 4.0-rc1 to 4.0-rc3. I have just finished bisecting and "fixing" it. My bisect points to the same commit.

Running strace on udisksd shows a loop of polling and then accessing several md-related sysfs files. The only file that udisksd monitors and that was changed by that commit is "sync_action".

If I revert this part of the commit, my system works normally again:

 static struct md_sysfs_entry md_scan_mode =
-__ATTR_PREALLOC(sync_action, S_IRUGO|S_IWUSR, action_show, action_store);
+__ATTR(sync_action, S_IRUGO|S_IWUSR, action_show, action_store);

It seems that polling is broken for prealloc files. The cause seems to be that kernfs_seq_show() updates of->event, while the new sysfs_kf_read() does not. So the poll will always trigger, and udisksd goes into an infinite loop looking for changes that are not there.

I fixed my local system by copying the line
    of->event = atomic_read(&of->kn->attr.open->event);
from kernfs_seq_show() into sysfs_kf_read(). (I also needed to move the definition of struct kernfs_open_node from kernfs/file.c to kernfs-internal.h.)

udisksd now behaves normally again, but I'm not sending this change as a patch, because I do not know enough about the locking and lifetime of these objects to evaluate whether that is really the correct fix.

Thanks,
Torsten
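The event-counter mechanism described in this message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernfs code: the struct and function names (open_node, open_file, notify, poll_ready, read_seq, read_prealloc_broken) are hypothetical stand-ins, and locking and atomics are left out entirely.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-node state: a counter bumped on
 * every change notification. */
struct open_node {
    int event;
};

/* Hypothetical stand-in for the per-open-file state: the counter value
 * this file descriptor saw the last time it read the attribute. */
struct open_file {
    struct open_node *on;
    int event;
};

/* A change to the attribute bumps the node's counter. */
void notify(struct open_node *on)
{
    on->event++;
}

/* poll() reports the file as ready while its snapshot differs from
 * the node's current counter. */
bool poll_ready(const struct open_file *of)
{
    return of->event != of->on->event;
}

/* Read path that refreshes the snapshot, as the message says
 * kernfs_seq_show() does. */
void read_seq(struct open_file *of)
{
    of->event = of->on->event;
}

/* Read path that forgets to refresh the snapshot, modelling the
 * behaviour the message attributes to sysfs_kf_read(). */
void read_prealloc_broken(struct open_file *of)
{
    (void)of; /* data would be returned, but of->event stays stale */
}
```

In this model a reader using read_prealloc_broken() sees poll_ready() stay true after every read, which matches the busy loop seen in strace; copying the snapshot update into that path is the local fix described above.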
Re: [PATCH] firmware: Create directories for external firmware
On Tue, Jul 8, 2014 at 2:47 PM, Michal Marek wrote:
> Commit 5180d5f4 ("firmware: Simplify directory creation") broke
> including firmware specified in CONFIG_EXTRA_FIRMWARE:
>
>   MK_FW   firmware/amd-ucode/microcode_amd.bin.gen.S
> /bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory
> ...
> firmware/Makefile:185: recipe for target
> 'firmware/amd-ucode/microcode_amd.bin.gen.S' failed
>
> It works with O= builds, because the directory is created by
> Makefile.build. Create the directory in firmware/Makefile in non-O
> builds.
>
> Reported-by: Ronald
> Reported-by: Torsten Kaiser
> Signed-off-by: Michal Marek
> ---
>
> Can you try this patch?

Works fine for me.
Thanks for the quick patch!

Torsten

> Ronald, can you tell me your full name for the Reported-by: line?
>
> Thanks.
> ---
>
>  firmware/Makefile | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/firmware/Makefile b/firmware/Makefile
> index 5747417..0862d34 100644
> --- a/firmware/Makefile
> +++ b/firmware/Makefile
> @@ -219,6 +219,12 @@ $(obj)/%.fw: $(obj)/%.H16 $(ihex2fw_dep)
>  obj-y += $(patsubst %,%.gen.o, $(fw-external-y))
>  obj-$(CONFIG_FIRMWARE_IN_KERNEL) += $(patsubst %,%.gen.o, $(fw-shipped-y))
>
> +ifeq ($(KBUILD_SRC),)
> +# Makefile.build only creates subdirectories for O= builds, but external
> +# firmware might live outside the kernel source tree
> +_dummy := $(foreach d,$(addprefix $(obj)/,$(dir $(fw-external-y))), $(shell [ -d $(d) ] || mkdir -p $(d)))
> +endif
> +
>  # Remove .S files and binaries created from ihex
>  # (during 'make clean' .config isn't included so they're all in $(fw-shipped-))
>  targets := $(fw-shipped-) $(patsubst $(obj)/%,%, \
> --
> 1.8.4.5
Re: Regression: firmware: Simplify directory creation + b43 = fails to build
On Wed, Jun 18, 2014 at 6:25 PM, Ronald wrote: > From my .config > > ==> cat /usr/src/config | grep -i b43 > CONFIG_EXTRA_FIRMWARE="b43/ucode5.fw b43/b0g0initvals5.fw > b43/b0g0bsinitvals5.fw b43/pcm5.fw" > ... snip ... That might be rather later, but I seem to have the same problem: CHK kernel/config_data.h MK_FW firmware/amd-ucode/microcode_amd.bin.gen.S /bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory /bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory /bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory /bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory /bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory /bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory /bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory /bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory /bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory /bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory /bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory /bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory /bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory /bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory /bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory firmware/Makefile:185: recipe for target 'firmware/amd-ucode/microcode_amd.bin.gen.S' failed make[1]: *** [firmware/amd-ucode/microcode_amd.bin.gen.S] Error 1 Makefile:896: recipe for target 'firmware' failed make: *** [firmware] Error 2 The directory firmware/amd-ucode does not exist in the kernel source tree, but my .config seems to need it: CONFIG_EXTRA_FIRMWARE="radeon/R700_rlc.bin radeon/RV710_uvd.bin radeon/RV730_smc.bin 
amd-ucode/microcode_amd.bin"
CONFIG_EXTRA_FIRMWARE_DIR="/lib/firmware"

Just doing a "mkdir firmware/amd-ucode" lets the build continue and I get a working 3.16 kernel. With 3.15 and earlier I never had a problem with this, but 3.16-rc4 just failed with the above message.

Do you need my full .config or any other information about my system? I would be happy to provide that and/or test a patch.

Thanks for looking into this!

Torsten
[tip:x86/urgent] x86, amd, microcode: Fix error path in apply_microcode_amd()
Commit-ID:  d982057f631df04f8d78321084a1a71ca51f3364
Gitweb:     http://git.kernel.org/tip/d982057f631df04f8d78321084a1a71ca51f3364
Author:     Torsten Kaiser
AuthorDate: Tue, 23 Jul 2013 22:58:23 +0200
Committer:  H. Peter Anvin
CommitDate: Wed, 31 Jul 2013 08:37:14 -0700

x86, amd, microcode: Fix error path in apply_microcode_amd()

Return -1 (like Intels apply_microcode) when the loading fails, also
do not set the active microcode level on failure.

Signed-off-by: Torsten Kaiser
Link: http://lkml.kernel.org/r/20130723225823.2e4e7...@googlemail.com
Acked-by: Borislav Petkov
Signed-off-by: H. Peter Anvin
---
 arch/x86/kernel/microcode_amd.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/microcode_amd.c b/arch/x86/kernel/microcode_amd.c
index 47ebb1d..7a0adb7 100644
--- a/arch/x86/kernel/microcode_amd.c
+++ b/arch/x86/kernel/microcode_amd.c
@@ -220,12 +220,13 @@ int apply_microcode_amd(int cpu)
 		return 0;
 	}

-	if (__apply_microcode_amd(mc_amd))
+	if (__apply_microcode_amd(mc_amd)) {
 		pr_err("CPU%d: update failed for patch_level=0x%08x\n",
 			cpu, mc_amd->hdr.patch_id);
-	else
-		pr_info("CPU%d: new patch_level=0x%08x\n", cpu,
-			mc_amd->hdr.patch_id);
+		return -1;
+	}
+	pr_info("CPU%d: new patch_level=0x%08x\n", cpu,
+		mc_amd->hdr.patch_id);

 	uci->cpu_sig.rev = mc_amd->hdr.patch_id;
 	c->microcode = mc_amd->hdr.patch_id;
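The control-flow change in this commit can be shown with a small userspace sketch. It is not the kernel function: struct cpu_state, apply_patch() and the two callback helpers are hypothetical names introduced only for illustration, and the hardware write is replaced by a function pointer so that both outcomes can be exercised.

```c
#include <stdio.h>

/* Hypothetical stand-in for the recorded per-CPU microcode level. */
struct cpu_state {
    unsigned int rev;
};

/* Hypothetical stand-in for the hardware write; returns 0 on success. */
typedef int (*apply_fn)(unsigned int patch_id);

/* Mirrors the fixed error path: on failure, log, return -1 and leave
 * the recorded revision untouched; only on success record the new one. */
int apply_patch(struct cpu_state *c, int cpu, unsigned int patch_id,
                apply_fn do_apply)
{
    if (do_apply(patch_id)) {
        fprintf(stderr, "CPU%d: update failed for patch_level=0x%08x\n",
                cpu, patch_id);
        return -1;
    }
    printf("CPU%d: new patch_level=0x%08x\n", cpu, patch_id);

    c->rev = patch_id;
    return 0;
}

/* Illustrative callbacks that simulate a successful and a failed write. */
int always_ok(unsigned int patch_id)   { (void)patch_id; return 0; }
int always_fail(unsigned int patch_id) { (void)patch_id; return 1; }
```

Before the fix, the equivalent of `c->rev = patch_id` ran in both branches, so a failed update would still have been recorded as the active level.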
Re: [PATCH]Fix early microcode loading on AMD
On Wed, Jul 24, 2013 at 4:19 PM, Borislav Petkov wrote:
> On Tue, Jul 23, 2013 at 06:57:12PM +0200, Torsten Kaiser wrote:
>> > The other problem I see is not updating c->microcode since it is going
>> > to be overwritten by smp_store_cpu_info, which is unfortunate.
>> >
>> > And I don't see where Intel are updating that cpuinfo_x86.microcode
>> > field on early load too.
>> >
>> > So, AFAICT, c->microcode would remain unset when we only do early
>> > microcode load. But that is something we should fix as a later patch.
>>
>> I don't see a problem with that staying unset.
>> apply_microcode_amd() directly reads the rev from
>> MSR_AMD64_PATCH_LEVEL so it does not depend on that being correct.
>> And smp_store_(boot)_cpu_info will also read the current rev directly
>> from the CPU to fill ->microcode.
>
> We need to store the actual microcode revision to c->microcode for
> /proc/cpuinfo and MCE.

init_amd() will fill that field. (You could always compile with CONFIG_MICROCODE_AMD=n and that field would still need filling.)
And as that will get called before smp_store_(boot)_cpu_info(), everything should be fine.

>> > So I think you should switch load_ucode_amd_ap to __apply_microcode_amd:
>> >
>> > p = find_patch()
>> >
>> > __apply_microcode_amd(p->mc_data);
>> >
>> > which should take care of the issue you're seeing, IMHO.
>>
>> The issue I'm seeing is that collect_cpu_info_amd_early() fills c->x86
>> but not c->x86_vendor.
>> Which breaks cpu_has_amd_erratum() and then Erratum 400 breaks the boot.
>>
>> I did not really want to switch from apply_microcode_amd() to
>> __apply_microcode_amd() because then I would lose the check if the new
>> microcode is really an upgrade.
>
> Well, if the BSP has already loaded the pcache, there's no need for
> the AP to parse and load the same microcode blobs file for the initrd,
> right?

loading != applying.
load_ucode_amd_ap() should probably be called apply_ucode_amd_ap(), because it is primarily for applying the microcode. That it also loads it (but really only once, thanks to ucode_loaded) is only because nobody else has run yet.

That whole place is hairy: because on 32-bit that seems to run much earlier, the 64-bit and 32-bit cases are very different. 64-bit can and will use pcache/apply_microcode_amd() for the non-BSP CPUs, but on 32-bit everything applies the patches directly from initrd memory to the CPUs by calling __apply_microcode_amd(), and so bypasses pcache.
See the comment above the 32-bit version of load_ucode_amd_ap():
/*
 * On 32-bit, since AP's early load occurs before paging is turned on, we
 * cannot traverse cpu_equiv_table and pcache in kernel heap memory. So during
 * cold boot, AP will apply_ucode_in_initrd() just like the BSP. During
 * save_microcode_in_initrd_amd() BSP's patch is copied to amd_bsp_mpb, which
 * is used upon resume from suspend.
 */

As written in the other email: I'm currently trying to see if I can kill amd_bsp_mpb...

>> >> * load_ucode_ap(): Quick exit for !cpu, because without
>> >> load_microcode_amd()
>> >> getting called apply_microcode_amd() can't do anything. Exit, if no
>> >> microcode
>> >> could be loaded.
>> >
>> > This could probably be a WARN_ON(!cpu) to catch errors...
>>
>> No, load_ucode_ap() will be called for cpu == 0.
>
> This needs fixing IMO...

Can't answer that. I have only seen that it is called for cpu == 0 and that there is no special case for CPU#0 in all the places that call load_ucode_ap()...

> Btw, thanks for looking at this and asking critical questions!
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
Re: [PATCH]Fix early microcode loading on AMD
On Wed, Jul 24, 2013 at 3:56 PM, Borislav Petkov wrote:
> On Tue, Jul 23, 2013 at 06:57:12PM +0200, Torsten Kaiser wrote:
>> >> * Save the amd_bsp_mpb on apply, not on load. Otherwise someone could
>> >> later load an older microcode file that would overwrite amd_bsp_mpb,
>> >> but would not be applied to the CPUs
>> >
>> > See the patch id check in apply_ucode_in_initrd()?
>> >
>> > if (eq_id == mc->hdr.processor_rev_id && rev < mc->hdr.patch_id)
>>
>> I meant with "load" load_microcode_amd() not the loading of the
>> microcode into the CPU.
>>
>> 1.: load microcode rev X into CPU (early or normal is not important)
>> 2.: get older microcode file that only contains rev Y with Y<X
>> 3.: trigger load_microcode_amd() with a corrupt file: This will call
>> cleanup() and empty pcache.
>
> Ok, that's actually a good catch. So I wonder: why in hell would we
> flush the pcache if some of the blobs we're loading are corrupted. So
> what?! Jacob, what were you thinking - I'd be very interested to know
> what the idea behind this was.
>
> So, just to refresh everybody: the idea of the pcache is to keep all
> patches for the current family in memory so that we can support all
> sorts of hotplug and cpu mixed stepping diddling.

Then it would probably be best to kill free_cache() completely. Which would mean cleanup() should also go. Which will make unloading microcode_amd.ko impossible. But that is probably a good idea anyway: if you unload the module there is no way to keep pcache.

But I still have another way to kill you: free_equiv_cpu_table()
Without that table find_patch() can't work and will not return the correct information. And that can be triggered by:
* start of load_microcode_amd(): If you reach that function (only UCODE_MAGIC needs to be in the file) that table is dead.
* __load_microcode_amd(): If the file only contains the table but no patches ("invalid type field in container file section header\n")

>> 4.: trigger load_microcode_amd() with the older file:
>> * this will now load rev Y into pcache
>> * rev Y will be returned by find_patch and copied into amd_bsp_mpb
>> * any try to apply rev Y will be skipped in apply_microcode_amd()
>>
>> So now the CPU still correctly has rev X, but amd_bsp_mpb will contain
>> the wrong rev Y.
>
> Right, so this shouldn't happen - what should happen is, pcache would
> hold both X and Y and find_patch would automatically give you the right
> one.
>
> And this is guaranteed since we keep the patches in a sorted linked list
> by ->patch_id which is guaranteed to be increasing.
>
> So actually load_microcode_amd() shouldn't be doing cleanup() but simply
> return ret upwards.

But it already called free_equiv_cpu_table() and so pcache is inaccessible.
And I don't think just preserving equiv_cpu_table for restoring in the error case will be the right solution: if the new firmware file contains a new table with fewer entries (or different entries!), some of the patches in pcache might become inaccessible.

>> That copying already in load_microcode also is suspicious if someone
>> would only load the microcode but not apply it. But I did not find
>> a codepath in arch/x86/kernel/microcode_core.c to load it without a
>> followup apply.
>
> Yeah, we always load and apply.
>
> So now back to the original problem - load_microcode_amd() shouldn't
> clear the pcache and, in that case, a subsequent find_patch() would
> always give the right patch.

Not if equiv_cpu_table got mangled.

So should install_equiv_cpu_table() be turned into add_to_equiv_cpu_table(), or should pcache save all cpu_sig with each patch, so that find_patch() no longer needs equiv_cpu_table?
I suspect saving that in struct ucode_patch might be better, to prevent changes in the equiv_id <-> cpu_sig mapping from making a patch inaccessible.

Torsten
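The second option raised at the end of this message, keeping the CPU signatures with each cached patch so that lookup no longer depends on a separate equivalence table, might look roughly like the sketch below. Everything here is an assumption made for illustration: struct patch, MAX_SIGS and find_patch_by_sig() are hypothetical names, the cache is a plain array rather than the kernel's linked list, and no kernel structure is claimed to look like this.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SIGS 4 /* hypothetical limit on signatures stored per patch */

/* Hypothetical cached patch that carries its own list of CPU signatures
 * instead of relying on an external equiv_id <-> cpu_sig table. */
struct patch {
    uint32_t patch_id;
    uint32_t sigs[MAX_SIGS];
    size_t nsigs;
};

/* Return the cached patch with the highest patch_id that lists cpu_sig,
 * or NULL if no cached patch applies to that signature. */
const struct patch *find_patch_by_sig(const struct patch *cache, size_t n,
                                      uint32_t cpu_sig)
{
    const struct patch *best = NULL;

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < cache[i].nsigs; j++) {
            if (cache[i].sigs[j] != cpu_sig)
                continue;
            if (!best || cache[i].patch_id > best->patch_id)
                best = &cache[i];
        }
    }
    return best;
}
```

With this layout, loading a later container file with a smaller or different equivalence table cannot make an already cached patch unreachable, which is the concern raised above.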
Re: [PATCH]Fix early microcode loading on AMD
On Wed, Jul 24, 2013 at 3:41 PM, Borislav Petkov wrote:
> Let me try to answer this as well as I can, piecemeal-wise.
>
> On Tue, Jul 23, 2013 at 06:57:12PM +0200, Torsten Kaiser wrote:
>> On Tue, Jul 23, 2013 at 5:15 PM, Borislav Petkov wrote:
>> > On Tue, Jul 23, 2013 at 01:58:53PM +0200, Torsten Kaiser wrote:
>> >> Fixup the early AMD microcode loading.
>> >>
>> >> * load_microcode_amd() (and the helper it's using) should not have a
>> >> cpu parameter.
>> >
>> > Hmm, I don't think so - we get the cpu handed down from microcode_core
>> > and besides the early load on 32bit needs to do find_patch(cpu).
>>
>> That's why I moved that part into apply_microcode_amd(). See later on
>> more, why I think that move is the right thing.
>> And without that the current cpu parameter will only be used to get
>> the (in the early load case not even correctly set up!) per-cpu data.
>> But the only member of cpuinfo_x86 that gets used is ->x86, the family.
>> Line 159: switch(c->x86) and Line 301: if (proc_fam != c->x86)
>>
>> I really wanted to make that switch from cpu to x86family a separate
>> patch, so that it would be more obviously correct, but because of that
>> amd_bsp_mpb hunk I can't find a good cut and that's why this patch is
>> larger than I would have preferred.
>
> Ok.

First moving that hunk, then switching from cpu to x86family did work. See patch 4/5 and 5/5. :-)

>> >> The microcode loading is not depending on the CPU it is
>> >
>> > Mostly. There are mixed-stepping boxes which need to differentiate
>> > between which cpu we're applying the patch for.
>>
>> Nothing looks at ->x86_model or ->x86_mask during load.
>> It will always load all patches from the current family.
>
> Yes, that's the idea. We want to have all patches for the current family
> loaded.

And that's why switching from cpu to x86family is OK: during *load* we only care about the family.

>> If loading would really depend on the current cpu in a mixed
>> system that would be horrible: Depending on which CPU gets to execute
>> load_microcode_amd() there would be different patches loaded into
>> RAM?
>
> No, we load the microcode based on CPUID(1).EAX which is in the
> equivalence table. Look at find_equiv_id().
>
> But for that we need all patches belonging to the current family to be
> in the cache.

I think you confused *load* and *apply*.

load_microcode_amd() *loads* the microcode from a firmware file into the pcache list. This wants all patches for the family and that's why my switch here is OK.

apply_microcode_amd() *applies* the microcode to the CPU / "loads" it into the CPU. That function (or rather its helper find_patch()) needs the full stepping/masking. I did not change that function, because in that case 'cpu' makes sense as a parameter: the microcode needs to be applied for each CPU. (You could argue that that parameter is also stupid: if you ever pass something other than raw_smp_processor_id() then it will BUG(). But removing that parameter would mean changing the whole of microcode_core.c and also microcode_intel.c. And there that parameter might make sense, so it's better to keep 'cpu' for apply_microcode_amd().)

But wrt. your concern about mixed-stepping systems: there, early microcode loading is definitely broken for 32-bit. The current mainline code will save the patch for the BSP in amd_bsp_mpb and then apply that to all CPUs regardless of their stepping. With my change in 4/5 to move the amd_bsp_mpb setup to apply time it will now wrongly patch all CPUs with the microcode that was loaded last. But u8 amd_bsp_mpb[NR_CPUS][MPB_MAX_SIZE] doesn't look like a good idea. Maybe the best way here is to fail apply_microcode_amd() if amd_bsp_mpb already contains an incompatible patch, and in load_ucode_amd_ap() only apply it when the cpu_sig matches. Or u8 amd_bsp_mpb[4][MPB_MAX_SIZE], which would support up to 4 different steppings per system.

No patch yet, because I do not understand why that is not a problem on 64-bit. load_ucode_amd_bsp() is shared between 32 and 64, so if that code works then I can't really find a need for amd_bsp_mpb at all.
So my current plan is to look into who calls load_ucode_amd_bsp() and load_ucode_amd_ap() and in what sequence (...hopefully in the same sequence on 32- and 64-bit...), and if I can find a rationale why amd_bsp_mpb can be killed, I will send you a patch. Otherwise I will try to create something that will fail apply_microcode_amd() in a safe way, if CONFIG_MICROCODE_AMD_EARLY gets used on a mixed system.

>> > Btw, your config boots on my F14h box with "nomodeset" on the command
>> > line because it is missing radeon firmware for my gpu.
>>
>> I suspect a
Re: [PATCH]Fix early microcode loading on AMD
On Wed, Jul 24, 2013 at 3:41 PM, Borislav Petkov b...@alien8.de wrote:
> Let me try to answer this as well as I can, piecemeal-wise.
>
> On Tue, Jul 23, 2013 at 06:57:12PM +0200, Torsten Kaiser wrote:
>> On Tue, Jul 23, 2013 at 5:15 PM, Borislav Petkov b...@alien8.de wrote:
>>> On Tue, Jul 23, 2013 at 01:58:53PM +0200, Torsten Kaiser wrote:
>>>> Fixup the early AMD microcode loading.
>>>>
>>>> * load_microcode_amd() (and the helper it is using) should not have a
>>>> cpu parameter.
>>>
>>> Hmm, I don't think so - we get the cpu handed down from microcode_core
>>> and besides the early load on 32bit needs to do find_patch(cpu).
>>
>> That's why I moved that part into apply_microcode_amd(). See later on
>> more, why I think that move is the right thing.
>> And without that the current cpu parameter will only be used to get
>> the (in the early load case not even correctly set up!) per-cpu data.
>> But the only member of cpuinfo_x86 that gets used is ->x86, the family.
>> Line 159: switch(c->x86) and Line 301: if (proc_fam != c->x86)
>>
>> I really wanted to make that switch from cpu to x86family a separate
>> patch, so that it would be more obviously correct, but because of that
>> amd_bsp_mpb hunk I can't find a good cut and that's why this patch is
>> larger than I would have preferred.
>
> Ok.

First moving that hunk, then switching from cpu to x86family did work.
See patch 4/5 and 5/5. :-)

>>>> The microcode loading is not depending on the CPU it is
>>>
>>> Mostly. There are mixed-stepping boxes which need to differentiate
>>> between which cpu we're applying the patch for.
>>
>> Nothing looks at ->x86_model or ->x86_mask during load. It will always
>> load all patches from the current family.
>
> Yes, that's the idea. We want to have all patches for the current
> family loaded.

And that's why switching from cpu to x86family is OK: during *load* we
only care about the family.

>> If loading would really depend on the current cpu in a mixed system
>> that would be horrible: Depending on which CPU gets to execute
>> load_microcode_amd(), there would be different patches loaded into RAM?
>
> No, we load the microcode based on CPUID(1).EAX which is in the
> equivalence table. Look at find_equiv_id(). But for that we need all
> patches belonging to the current family to be in the cache.

I think you confused *load* and *apply*.

load_microcode_amd() *loads* the microcode from a firmware file into the
pcache list. This wants all patches for the family and that's why my
switch here is OK.

apply_microcode_amd() *applies* the microcode to the CPU / loads it into
the CPU. That function (or better its helper find_patch()) needs the
full stepping/masking. I did not change that function, because in that
case 'cpu' makes sense as a parameter, because the microcode needs to be
applied for each CPU.
(You could argue that that parameter is also stupid: If you ever pass
something other than raw_smp_processor_id() then it will BUG(). But
removing that parameter would need to change the whole microcode_core.c
and also microcode_intel.c. And there that parameter might make sense,
so it's better to keep 'cpu' for apply_microcode_amd())

But wrt. your concern about mixed stepping systems: There, early
microcode loading is definitely broken for 32bit.
The current mainline code will save the patch for the BSP in amd_bsp_mpb
and then apply that to all CPUs regardless of their stepping.
With my change in 4/5 to move the amd_bsp_mpb setup to apply time it
will now wrongly patch all CPUs with the microcode that was loaded last.

But u8 amd_bsp_mpb[NR_CPUS][MPB_MAX_SIZE] doesn't look like a good idea.
Maybe the best way here is to fail apply_microcode_amd() if amd_bsp_mpb
already contains an incompatible patch and in load_ucode_amd_ap() only
apply it when the cpu_sig matches.
Or u8 amd_bsp_mpb[4][MPB_MAX_SIZE] which would support up to 4 different
steppings per system.

No patch yet, because I do not understand why that is not a problem on
64bit. load_ucode_amd_bsp() is shared between 32 and 64 so if that code
works then I can't really find a need for amd_bsp_mpb at all.

So my current plan is to look into who calls load_ucode_amd_bsp() and
load_ucode_amd_ap() and in what sequence (..hopefully in the same
sequence on 32 and 64bit...) and if I can find a rationale why
amd_bsp_mpb can be killed, I will send you a patch.
Otherwise I will try to create something that will fail
apply_microcode_amd() in a safe way, if CONFIG_MICROCODE_AMD_EARLY gets
used on a mixed system.

>>> Btw, your config boots on my F14h box with "nomodeset" on the command
>>> line because it is missing radeon firmware for my gpu.
>>
>> I suspect a F14h box will never see that hang. It trips over the C1E
>> erratum and amd_erratum_400[] looks like it only affects 0xf and 0x10
>> (like my Phenom II X6).
>
> I could fire up my F10h if needed :)

>>>> executed and all the loaded patches will end up in a global list for
>>>> all CPUs anyway.
>>>> * Return -1 (like Intels apply_microcode) when the loading fails,
>>>> also do not set the active microcode level on failure.
>>>
>>> Yep, this part I
Re: [PATCH]Fix early microcode loading on AMD
On Wed, Jul 24, 2013 at 3:56 PM, Borislav Petkov b...@alien8.de wrote:
> On Tue, Jul 23, 2013 at 06:57:12PM +0200, Torsten Kaiser wrote:
>>>> * Save the amd_bsp_mpb on apply, not on load. Otherwise someone could
>>>> later load an older microcode file that would overwrite amd_bsp_mpb,
>>>> but would not be applied to the CPUs
>>>
>>> See the patch id check in apply_ucode_in_initrd()?
>>>
>>> if (eq_id == mc->hdr.processor_rev_id && rev < mc->hdr.patch_id)
>>
>> I meant with "load" load_microcode_amd() not the loading of the
>> microcode into the CPU.
>>
>> 1.: load microcode rev X into CPU (early or normal is not important)
>> 2.: get older microcode file that only contains rev Y with Y<X
>> 3.: trigger load_microcode_amd() with a corrupt file: This will call
>> cleanup() and empty pcache.
>
> Ok, that's actually a good catch. So I wonder: why in hell would we
> flush the pcache if some of the blobs we're loading are corrupted. So
> what?! Jacob, what were you thinking - I'd be very interested to know
> what the idea behind this was.
>
> So, just to refresh everybody: the idea of the pcache is to keep all
> patches for the current family in memory so that we can support all
> sorts of hotplug and cpu mixed stepping diddling.

Then it would probably be the best to kill free_cache() completely.
Which would mean cleanup() should also go. Which will make unloading
microcode_amd.ko impossible. But that is probably a good idea anyway: If
you unload the module there is no way to keep pcache.

But I still have another way to kill you: free_equiv_cpu_table()
Without that table find_patch() can't work and will not return the
correct information.
And that can be triggered by:
* start of load_microcode_amd(): If you reach that function (only
UCODE_MAGIC needs to be in the file) that table is dead.
* __load_microcode_amd(): If the file only contains the table but no
patches ("invalid type field in container file section header\n")

>> 4.: trigger load_microcode_amd() with the older file:
>> * this will now load rev Y into pcache
>> * rev Y will be returned by find_patch and copied into amd_bsp_mpb
>> * any try to apply rev Y will be skipped in apply_microcode_amd()
>> So now the CPU still correctly has rev X, but amd_bsp_mpb will contain
>> the wrong rev Y.
>
> Right, so this shouldn't happen - what should happen is, pcache would
> hold both X and Y and find_patch would automatically give you the right
> one. And this is guaranteed since we keep the patches in a sorted
> linked list by ->patch_id which is guaranteed to be increasing.
>
> So actually load_microcode_amd() shouldn't be doing cleanup() but
> simply return ret upwards.

But it already called free_equiv_cpu_table() and so pcache is
inaccessible.
And I don't think just preserving equiv_cpu_table for restoring in the
error case will be the right solution: If the new firmware file contains
a new table with fewer entries (or different entries!) some of the
patches in pcache might become inaccessible.

>> That copying already in load_microcode also is suspicious if someone
>> would only load the microcode but not apply it. But I did not find a
>> codepath in arch/x86/kernel/microcode_core.c to load it without a
>> followup apply.
>
> Yeah, we always load and apply. So now back to the original problem -
> load_microcode_amd() shouldn't clear the pcache and, in that case, a
> subsequent find_patch() would always give the right patch.

Not if equiv_cpu_table got mangled.

So should install_equiv_cpu_table() be turned into
add_to_equiv_cpu_table() or should pcache save all cpu_sig with each
patch, so that find_patch() no longer needs equiv_cpu_table?
I suspect saving that in struct ucode_patch might be better, to prevent
changes in the equiv_id - cpu_sig mapping from making a patch
inaccessible.

Torsten
Re: [PATCH]Fix early microcode loading on AMD
On Wed, Jul 24, 2013 at 4:19 PM, Borislav Petkov b...@alien8.de wrote:
> On Tue, Jul 23, 2013 at 06:57:12PM +0200, Torsten Kaiser wrote:
>>> The other problem I see is not updating c->microcode since it is
>>> going to be overwritten by smp_store_cpu_info, which is unfortunate.
>>> And I don't see where Intel are updating that cpuinfo_x86.microcode
>>> field on early load too. So, AFAICT, c->microcode would remain unset
>>> when we only do early microcode load. But that is something we should
>>> fix as a later patch.
>>
>> I don't see a problem with that staying unset. apply_microcode_amd()
>> directly reads the rev from MSR_AMD64_PATCH_LEVEL so it does not
>> depend on that being correct.
>> And smp_store_(boot)_cpu_info will also read the current rev directly
>> from the CPU to fill ->microcode.
>
> We need to store the actual microcode revision to c->microcode for
> /proc/cpuinfo and MCE.

init_amd() will fill that field. (You could always compile with
CONFIG_MICROCODE_AMD=n and that field would still need filling)
And as that will get called before smp_store_(boot)_cpu_info()
everything should be fine.

>>> So I think you should switch load_ucode_amd_ap to
>>> __apply_microcode_amd:
>>>
>>> p = find_patch()
>>> __apply_microcode_amd(p->mc_data);
>>>
>>> which should take care of the issue you're seeing, IMHO.
>>
>> The issue I'm seeing is that collect_cpu_info_amd_early() fills c->x86
>> but not c->x86_vendor. Which breaks cpu_has_amd_erratum() and then
>> Erratum 400 breaks the boot.
>>
>> I did not really want to switch from apply_microcode_amd() to
>> __apply_microcode_amd() because then I would lose the check if the new
>> microcode is really an upgrade.
>
> Well, if the BSP has already loaded the pcache, there's no need for the
> AP to parse and load the same microcode blobs file for the initrd,
> right?

loading != applying.
load_ucode_amd_ap() should probably be called apply_ucode_amd_ap()
because that is primarily for applying the microcode. That it also loads
it (but really only once thanks to ucode_loaded) is only because nobody
else has run yet.

That whole place is hairy: Because on 32bit that seems to run much
earlier the 64 and 32 cases are very different.
64bit can and will use pcache/apply_microcode_amd() for the non BSP
CPUs, but on 32bit everything directly applies the patches from initrd
memory into the CPUs by directly calling __apply_microcode_amd(). And so
bypassing pcache.

See comment above the 32bit version of load_ucode_amd_ap():
/*
 * On 32-bit, since AP's early load occurs before paging is turned on, we
 * cannot traverse cpu_equiv_table and pcache in kernel heap memory. So during
 * cold boot, AP will apply_ucode_in_initrd() just like the BSP. During
 * save_microcode_in_initrd_amd() BSP's patch is copied to amd_bsp_mpb, which
 * is used upon resume from suspend.
 */

As written in the other email: I'm currently trying to see if I can kill
amd_bsp_mpb...

>>>> * load_ucode_ap(): Quick exit for !cpu, because without
>>>> load_microcode_amd() getting called apply_microcode_amd() can't do
>>>> anything. Exit, if no microcode could be loaded.
>>>
>>> This could probably be a WARN_ON(!cpu) to catch errors...
>>
>> No, load_ucode_ap() will be called for cpu == 0.
>
> This needs fixing IMO...

Can't answer that. I have only seen that it is called for cpu == 0 and
that there is no special case for CPU#0 in all the places that call
load_ucode_ap()...

> Btw, thanks for looking at this and asking critical questions!
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
Re: [PATCH]Fix early microcode loading on AMD
On Tue, Jul 23, 2013 at 5:15 PM, Borislav Petkov wrote:
> On Tue, Jul 23, 2013 at 01:58:53PM +0200, Torsten Kaiser wrote:
>> Fixup the early AMD microcode loading.
>>
>> * load_microcode_amd() (and the helper its using) should not have an
>> cpu parameter.
>
> Hmm, I don't think so - we get the cpu handed down from microcode_core
> and besides the early load on 32bit needs to do find_patch(cpu).
>
>> The microcode loading is not depending on the CPU it is
>
> Mostly. There are mixed-stepping boxes which need to differentiate
> between which cpu we're applying the patch for.

I redid the patch in 5 parts, hopefully now better to understand.

Without the other changes the microcode_amd.c-part of patch 5/5 should
make it much more obvious that my change did not result in a different
behavior about which patches get loaded into the microcode cache
'pcache'.

> Btw, your config boots on my F14h box with "nomodeset" on the command
> line because it is missing radeon firmware for my gpu.
>
>> executed and all the loaded patches will end up in a global list for all
>> CPUs anyway.
>> * Return -1 (like Intels apply_microcode) when the loading fails,
>> also do not set the active microcode level on failure.
>
> Yep, this part I want. Please send it as a separate patch.

That is now patch 1/5.
Patch 2/5 is new, I skipped that part originally because I did not want
to make it even bigger...

> So I see a couple of issues in this patch and they should be separated
> into single patches - one patch taking care of one issue and explaining
> what the problem is in the commit message (I know you can do that good
> :)).

I'm still seeing some things in the microcode code that look suspicious:

Why is the X86_64 code updating uci->cpu_sig.rev, but the 32bit version
does not? And I can't see anything that reads that value.

Should apply_microcode_amd() really update uci->mc even before checking
if the microcode is newer?

The X86_32 hunk in save_microcode_in_initrd_amd() now seems obsolete.
load_microcode_amd() is no longer using find_patch() so it doesn't use
ucode_cpu_info anymore. But why is that code using
boot_cpu_data.cpu_index to find the BSP but always then passing 0 as cpu
parameter to load_microcode_amd()? If boot_cpu_data.cpu_index is ever
!=0 that code would fail.

... and collect_cpu_info_amd() also looks very weird. If csig would not
point to uci->cpu_sig then find_patch() will not be happy.
Wouldn't directly passing cpuid_eax(0x0001) to find_patch() be a better
interface? Then the early microcode loading code would not need to
access ucode_cpu_info at all.

Torsten
[PATCH 5/5] x86, AMD: simplify load_microcode_amd() to fix early microcode loading to no longer access uninitialized per-cpu data
load_microcode_amd() (and the helper it is using) should not have a cpu
parameter. The microcode loading does not depend on the CPU it is
executed on, and all the loaded patches will end up in a global list for
all CPUs anyway.

The change from cpu to x86family in load_microcode_amd() now allows
dropping the code messing with cpu_data(cpu) from
collect_cpu_info_amd_early(), which is wrong anyway because at that
point the per-cpu cpu_info is not yet set up. And these values would
later be overwritten by smp_store_boot_cpu_info() /
smp_store_cpu_info().

Fold the rest of collect_cpu_info_amd_early() into load_ucode_amd_ap(),
because it is only used in one place and without the cpuinfo_x86
accesses not much was left of it.

Signed-off-by: Torsten Kaiser
---
One effect of this early, partial initialisation of cpu_info was that
the fallback logic in cpu_has_amd_erratum() did not use boot_cpu_data,
and because x86_vendor was not initialised in the per-cpu struct the
E400 erratum was not activated on my system, resulting in a failed boot.
--- a/arch/x86/include/asm/microcode_amd.h	2013-07-23 20:15:10.549501081 +0200
+++ b/arch/x86/include/asm/microcode_amd.h	2013-07-23 20:16:05.329500620 +0200
@@ -59,7 +59,7 @@ static inline u16 find_equiv_id(struct e
 
 extern int __apply_microcode_amd(struct microcode_amd *mc_amd);
 extern int apply_microcode_amd(int cpu);
-extern enum ucode_state load_microcode_amd(int cpu, const u8 *data, size_t size);
+extern enum ucode_state load_microcode_amd(u8 x86family, const u8 *data, size_t size);
 
 #ifdef CONFIG_MICROCODE_AMD_EARLY
 #ifdef CONFIG_X86_32
--- a/arch/x86/kernel/microcode_amd.c	2013-07-23 20:05:04.469506188 +0200
+++ b/arch/x86/kernel/microcode_amd.c	2013-07-23 20:23:22.739496934 +0200
@@ -145,10 +145,9 @@ static int collect_cpu_info_amd(int cpu,
 	return 0;
 }
 
-static unsigned int verify_patch_size(int cpu, u32 patch_size,
+static unsigned int verify_patch_size(u8 x86family, u32 patch_size,
 				      unsigned int size)
 {
-	struct cpuinfo_x86 *c = &cpu_data(cpu);
 	u32 max_size;
 
 #define F1XH_MPB_MAX_SIZE 2048
@@ -156,7 +155,7 @@ static unsigned int verify_patch_size(in
 #define F15H_MPB_MAX_SIZE 4096
 #define F16H_MPB_MAX_SIZE 3458
 
-	switch (c->x86) {
+	switch (x86family) {
 	case 0x14:
 		max_size = F14H_MPB_MAX_SIZE;
 		break;
@@ -283,9 +282,8 @@ static void cleanup(void)
  * driver cannot continue functioning normally. In such cases, we tear
  * down everything we've used up so far and exit.
  */
-static int verify_and_add_patch(unsigned int cpu, u8 *fw, unsigned int leftover)
+static int verify_and_add_patch(u8 x86family, u8 *fw, unsigned int leftover)
 {
-	struct cpuinfo_x86 *c = &cpu_data(cpu);
 	struct microcode_header_amd *mc_hdr;
 	struct ucode_patch *patch;
 	unsigned int patch_size, crnt_size, ret;
@@ -305,7 +303,7 @@ static int verify_and_add_patch(unsigned
 
 	/* check if patch is for the current family */
 	proc_fam = ((proc_fam >> 8) & 0xf) + ((proc_fam >> 20) & 0xff);
-	if (proc_fam != c->x86)
+	if (proc_fam != x86family)
 		return crnt_size;
 
 	if (mc_hdr->nb_dev_id || mc_hdr->sb_dev_id) {
@@ -314,7 +312,7 @@ static int verify_and_add_patch(unsigned
 		return crnt_size;
 	}
 
-	ret = verify_patch_size(cpu, patch_size, leftover);
+	ret = verify_patch_size(x86family, patch_size, leftover);
 	if (!ret) {
 		pr_err("Patch-ID 0x%08x: size mismatch.\n", mc_hdr->patch_id);
 		return crnt_size;
@@ -345,7 +343,7 @@ static int verify_and_add_patch(unsigned
 	return crnt_size;
 }
 
-static enum ucode_state __load_microcode_amd(int cpu, const u8 *data, size_t size)
+static enum ucode_state __load_microcode_amd(u8 x86family, const u8 *data, size_t size)
 {
 	enum ucode_state ret = UCODE_ERROR;
 	unsigned int leftover;
@@ -368,7 +366,7 @@ static enum ucode_state __load_microcode
 	}
 
 	while (leftover) {
-		crnt_size = verify_and_add_patch(cpu, fw, leftover);
+		crnt_size = verify_and_add_patch(x86family, fw, leftover);
 		if (crnt_size < 0)
 			return ret;
 
@@ -379,14 +377,14 @@ static enum ucode_state __load_microcode
 	return UCODE_OK;
 }
 
-enum ucode_state load_microcode_amd(int cpu, const u8 *data, size_t size)
+enum ucode_state load_microcode_amd(u8 x86family, const u8 *data, size_t size)
 {
 	enum ucode_state ret;
 
 	/* free old equiv table */
 	free_equiv_cpu_table();
 
-	ret = __load_microcode_amd(cpu, data, size);
+	ret = __load_microcode_amd(x86family, data, size);
 
 	if (ret != UCODE_OK)
 		cleanup();
@@ -436,7 +434,7 @@ static enum ucode_state request_microcod
 		goto fw_release;
 	}
 
-
[PATCH 4/5] x86, AMD: saved applied, not loaded microcode for reloading on resume
* Save the amd_bsp_mpb on apply, not on load. Otherwise someone could
  later load an older microcode file via load_microcode_amd() that would
  overwrite amd_bsp_mpb, but would not be applied to the CPUs.
  (apply_microcode_amd() checks the current patchlevel, but the copy
  code in load_microcode_amd() did not. If somehow cleanup() gets called
  and clears pcache, find_patch() could return older patches than the
  currently installed microcode.)
* Save the amd_bsp_mpb on every update. Otherwise, if someone would
  update the microcode after offlining the BSP, these updates would not
  get saved and would be lost on resume.
* apply_ucode_in_initrd() now also needs to save amd_bsp_mpb, because
  load_microcode_amd() is no longer doing this and it's not using
  apply_microcode_amd().

Signed-off-by: Torsten Kaiser
---
Removing this hunk from load_microcode_amd() also allows me to kill the
cpu parameter for that function in the next patch...

--- a/arch/x86/kernel/microcode_amd.c	2013-07-23 19:43:30.359517091 +0200
+++ b/arch/x86/kernel/microcode_amd.c	2013-07-23 20:05:04.469506188 +0200
@@ -228,6 +228,12 @@ int apply_microcode_amd(int cpu)
 	pr_info("CPU%d: new patch_level=0x%08x\n", cpu,
 		mc_amd->hdr.patch_id);
 
+#if defined(CONFIG_MICROCODE_AMD_EARLY) && defined(CONFIG_X86_32)
+	/* save applied patch for early load */
+	memset(amd_bsp_mpb, 0, MPB_MAX_SIZE);
+	memcpy(amd_bsp_mpb, p->data, min_t(u32, ksize(p->data), MPB_MAX_SIZE));
+#endif
+
 	uci->cpu_sig.rev = mc_amd->hdr.patch_id;
 	c->microcode = mc_amd->hdr.patch_id;
 
@@ -385,17 +391,6 @@ enum ucode_state load_microcode_amd(int
 	if (ret != UCODE_OK)
 		cleanup();
 
-#if defined(CONFIG_MICROCODE_AMD_EARLY) && defined(CONFIG_X86_32)
-	/* save BSP's matching patch for early load */
-	if (cpu_data(cpu).cpu_index == boot_cpu_data.cpu_index) {
-		struct ucode_patch *p = find_patch(cpu);
-		if (p) {
-			memset(amd_bsp_mpb, 0, MPB_MAX_SIZE);
-			memcpy(amd_bsp_mpb, p->data, min_t(u32, ksize(p->data),
-							   MPB_MAX_SIZE));
-		}
-	}
-#endif
 	return ret;
 }
 
--- a/arch/x86/kernel/microcode_amd_early.c	2013-07-23 20:00:04.889508712 +0200
+++ b/arch/x86/kernel/microcode_amd_early.c	2013-07-23 20:05:14.969506099 +0200
@@ -170,6 +170,13 @@ static void apply_ucode_in_initrd(void *
 		mc = (struct microcode_amd *)(data + SECTION_HDR_SIZE);
 		if (eq_id == mc->hdr.processor_rev_id && rev < mc->hdr.patch_id)
 			if (__apply_microcode_amd(mc) == 0) {
+#ifdef CONFIG_X86_32
+				/* save applied patch for early load */
+				memset((void *)__pa(amd_bsp_mpb), 0,
+				       MPB_MAX_SIZE);
+				memcpy((void *)__pa(amd_bsp_mpb), mc,
+				       min_t(u32, header[1], MPB_MAX_SIZE));
+#endif
 				rev = mc->hdr.patch_id;
 				*new_rev = rev;
 			}
[PATCH 3/5] x86, AMD: cleanup: merge common code in early microcode loading
Extract common checks and initialisations from load_ucode_ap() and
save_microcode_in_initrd_amd() to load_microcode_amd_early().

load_ucode_ap() gets a quick exit for !cpu, because for the BSP there is
already a different function dealing with its update. The original code
already didn't do anything in that case, because without
load_microcode_amd() getting called apply_microcode_amd() could not do
anything.

Signed-off-by: Torsten Kaiser

--- a/arch/x86/kernel/microcode_amd_early.c	2013-07-22 06:22:32.0 +0200
+++ b/arch/x86/kernel/microcode_amd_early.c	2013-07-23 20:00:04.889508712 +0200
@@ -196,6 +196,23 @@ void __init load_ucode_amd_bsp(void)
 	apply_ucode_in_initrd(cd.data, cd.size);
 }
 
+static int load_microcode_amd_early(void)
+{
+	enum ucode_state ret;
+	void *ucode;
+
+	if (ucode_loaded || !ucode_size || !initrd_start)
+		return 0;
+
+	ucode = (void *)(initrd_start + ucode_offset);
+	ret = load_microcode_amd(0, ucode, ucode_size);
+	if (ret != UCODE_OK)
+		return -EINVAL;
+
+	ucode_loaded = true;
+	return 0;
+}
+
 #ifdef CONFIG_X86_32
 u8 amd_bsp_mpb[MPB_MAX_SIZE];
 
@@ -258,17 +275,13 @@ void load_ucode_amd_ap(void)
 
 	collect_cpu_info_amd_early(&cpu_data(cpu), ucode_cpu_info + cpu);
 
-	if (cpu && !ucode_loaded) {
-		void *ucode;
-
-		if (!ucode_size || !initrd_start)
-			return;
+	/* BSP via load_ucode_amd_bsp() */
+	if (!cpu)
+		return;
 
-		ucode = (void *)(initrd_start + ucode_offset);
-		if (load_microcode_amd(0, ucode, ucode_size) != UCODE_OK)
-			return;
-		ucode_loaded = true;
-	}
+	load_microcode_amd_early();
+	if (!ucode_loaded)
+		return;
 
 	apply_microcode_amd(cpu);
 }
@@ -276,8 +289,6 @@ void load_ucode_amd_ap(void)
 
 int __init save_microcode_in_initrd_amd(void)
 {
-	enum ucode_state ret;
-	void *ucode;
 #ifdef CONFIG_X86_32
 	unsigned int bsp = boot_cpu_data.cpu_index;
 	struct ucode_cpu_info *uci = ucode_cpu_info + bsp;
@@ -289,14 +300,5 @@ int __init save_microcode_in_initrd_amd(
 		pr_info("microcode: updated early to new patch_level=0x%08x\n",
 			ucode_new_rev);
 
-	if (ucode_loaded || !ucode_size || !initrd_start)
-		return 0;
-
-	ucode = (void *)(initrd_start + ucode_offset);
-	ret = load_microcode_amd(0, ucode, ucode_size);
-	if (ret != UCODE_OK)
-		return -EINVAL;
-
-	ucode_loaded = true;
-	return 0;
+	return load_microcode_amd_early();
 }
[PATCH 2/5] x86, microcode: Don't lose error returns in save_microcode_in_initrd()
Don't lose the error return.
This was lost when early amd microcode loading was added in
757885e94a22bcc82beb9b1445c95218cb20ceab

Signed-off-by: Torsten Kaiser

--- a/arch/x86/kernel/microcode_core_early.c	2013-07-23 19:44:05.509516795 +0200
+++ b/arch/x86/kernel/microcode_core_early.c	2013-07-23 19:58:34.459509474 +0200
@@ -127,11 +127,11 @@ int __init save_microcode_in_initrd(void
 	switch (c->x86_vendor) {
 	case X86_VENDOR_INTEL:
 		if (c->x86 >= 6)
-			save_microcode_in_initrd_intel();
+			return save_microcode_in_initrd_intel();
 		break;
 	case X86_VENDOR_AMD:
 		if (c->x86 >= 0x10)
-			save_microcode_in_initrd_amd();
+			return save_microcode_in_initrd_amd();
 		break;
 	default:
 		break;
[PATCH 1/5] x86, AMD: fix error path in apply_microcode_amd()
Return -1 (like Intels apply_microcode) when the loading fails, also do
not set the active microcode level on failure.

Signed-off-by: Torsten Kaiser

--- a/arch/x86/kernel/microcode_amd.c	2013-07-23 19:42:16.089517717 +0200
+++ b/arch/x86/kernel/microcode_amd.c	2013-07-23 19:43:30.359517091 +0200
@@ -220,12 +220,13 @@ int apply_microcode_amd(int cpu)
 		return 0;
 	}
 
-	if (__apply_microcode_amd(mc_amd))
+	if (__apply_microcode_amd(mc_amd)) {
 		pr_err("CPU%d: update failed for patch_level=0x%08x\n",
 			cpu, mc_amd->hdr.patch_id);
-	else
-		pr_info("CPU%d: new patch_level=0x%08x\n", cpu,
-			mc_amd->hdr.patch_id);
+		return -1;
+	}
+	pr_info("CPU%d: new patch_level=0x%08x\n", cpu,
+		mc_amd->hdr.patch_id);
 
 	uci->cpu_sig.rev = mc_amd->hdr.patch_id;
 	c->microcode = mc_amd->hdr.patch_id;
[PATCH v2] x86, AMD: Make cpu_has_amd_erratum() use the correct struct cpuinfo_x86
cpu_has_amd_erratum() is buggy, because it uses the per-cpu cpu_info
before it is filled by smp_store_boot_cpu_info() / smp_store_cpu_info().

If early microcode loading is enabled, its collect_cpu_info_amd_early()
will fill ->x86 and so the fallback to boot_cpu_data is not used. But
->x86_vendor was not filled and is still 0 == X86_VENDOR_INTEL,
resulting in no errata fixes getting applied and my system hanging on
boot.

Using cpu_info in cpu_has_amd_erratum() is wrong anyway: Its only caller
init_amd() will have a struct cpuinfo_x86 as parameter and the
set_cpu_bug() that is controlled by cpu_has_amd_erratum() also only uses
that struct.

So pass the struct cpuinfo_x86 from init_amd() to cpu_has_amd_erratum()
and the broken fallback can be dropped.

I also added a WARN_ON() into the vendor check because init_amd() can
only be used by AMD CPUs and if the current failure hadn't been silent
this bug would have been much more obvious.

V2: At request of Borislav Petkov: BUG_ON -> WARN_ON and subject change

Signed-off-by: Torsten Kaiser

--- a/arch/x86/kernel/cpu/amd.c	2013-07-22 06:33:10.027931005 +0200
+++ b/arch/x86/kernel/cpu/amd.c	2013-07-22 06:35:15.757931265 +0200
@@ -512,7 +512,7 @@ static void early_init_amd(struct cpuinf
 
 static const int amd_erratum_383[];
 static const int amd_erratum_400[];
-static bool cpu_has_amd_erratum(const int *erratum);
+static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum);
 
 static void init_amd(struct cpuinfo_x86 *c)
 {
@@ -729,11 +729,11 @@ static void init_amd(struct cpuinfo_x86
 		value &= ~(1ULL << 24);
 		wrmsrl_safe(MSR_AMD64_BU_CFG2, value);
 
-		if (cpu_has_amd_erratum(amd_erratum_383))
+		if (cpu_has_amd_erratum(c, amd_erratum_383))
 			set_cpu_bug(c, X86_BUG_AMD_TLB_MMATCH);
 	}
 
-	if (cpu_has_amd_erratum(amd_erratum_400))
+	if (cpu_has_amd_erratum(c, amd_erratum_400))
 		set_cpu_bug(c, X86_BUG_AMD_APIC_C1E);
 
 	rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy);
@@ -878,22 +878,16 @@ static const int amd_erratum_400[] =
 static const int amd_erratum_383[] =
 	AMD_OSVW_ERRATUM(3, AMD_MODEL_RANGE(0x10, 0, 0, 0xff, 0xf));
 
-static bool cpu_has_amd_erratum(const int *erratum)
+
+static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum)
 {
-	struct cpuinfo_x86 *cpu = __this_cpu_ptr(&cpu_info);
 	int osvw_id = *erratum++;
 	u32 range;
 	u32 ms;
 
-	/*
-	 * If called early enough that current_cpu_data hasn't been initialized
-	 * yet, fall back to boot_cpu_data.
-	 */
-	if (cpu->x86 == 0)
-		cpu = &boot_cpu_data;
-
-	if (cpu->x86_vendor != X86_VENDOR_AMD)
-		return false;
+	/* Should never be called on non-AMD-CPUs */
+	if (WARN_ON(cpu->x86_vendor != X86_VENDOR_AMD))
+		return false;
 
 	if (osvw_id >= 0 && osvw_id < 65536 &&
 	    cpu_has(cpu, X86_FEATURE_OSVW)) {
Re: [PATCH]Fix early microcode loading on AMD
On Tue, Jul 23, 2013 at 5:15 PM, Borislav Petkov wrote:
> On Tue, Jul 23, 2013 at 01:58:53PM +0200, Torsten Kaiser wrote:
>> Fixup the early AMD microcode loading.
>>
>> * load_microcode_amd() (and the helper it is using) should not have a
>> cpu parameter.
>
> Hmm, I don't think so - we get the cpu handed down from microcode_core
> and besides the early load on 32bit needs to do find_patch(cpu).

That's why I moved that part into apply_microcode_amd(). See later on
more, why I think that move is the right thing.
And without that the current cpu parameter will only be used to get the
(in the early load case not even correctly set up!) per-cpu data.
But the only member of cpuinfo_x86 that gets used is ->x86, the family.
Line 159: switch(c->x86) and Line 301: if (proc_fam != c->x86)

I really wanted to make that switch from cpu to x86family a separate
patch, so that it would be more obviously correct, but because of that
amd_bsp_mpb hunk I can't find a good cut and that's why this patch is
larger than I would have preferred.

>> The microcode loading is not depending on the CPU it is
>
> Mostly. There are mixed-stepping boxes which need to differentiate
> between which cpu we're applying the patch for.

Nothing looks at ->x86_model or ->x86_mask during load. It will always
load all patches from the current family.
If loading would really depend on the current cpu in a mixed system that
would be horrible: Depending on which CPU gets to execute
load_microcode_amd(), there would be different patches loaded into RAM?

> Btw, your config boots on my F14h box with "nomodeset" on the command
> line because it is missing radeon firmware for my gpu.

I suspect a F14h box will never see that hang. It trips over the C1E
erratum and amd_erratum_400[] looks like it only affects 0xf and 0x10
(like my Phenom II X6).

>> executed and all the loaded patches will end up in a global list for all
>> CPUs anyway.
>> * Return -1 (like Intels apply_microcode) when the loading fails,
>> also do not set the active microcode level on failure.
>
> Yep, this part I want. Please send it as a separate patch.

OK, will send that together with the resend for cpu_has_amd_erratum().

>> * Save the amd_bsp_mpb on apply, not on load. Otherwise someone could
>> later load an older microcode file that would overwrite amd_bsp_mpb,
>> but would not be applied to the CPUs
>
> See the patch id check in apply_ucode_in_initrd()?
>
> if (eq_id == mc->hdr.processor_rev_id && rev < mc->hdr.patch_id)

I meant with "load" load_microcode_amd() not the loading of the
microcode into the CPU.

1.: load microcode rev X into CPU (early or normal is not important)
2.: get older microcode file that only contains rev Y with Y<X
3.: trigger load_microcode_amd() with a corrupt file: This will call
cleanup() and empty pcache.
4.: trigger load_microcode_amd() with the older file:
* this will now load rev Y into pcache
* rev Y will be returned by find_patch and copied into amd_bsp_mpb
* any try to apply rev Y will be skipped in apply_microcode_amd()
So now the CPU still correctly has rev X, but amd_bsp_mpb will contain
the wrong rev Y.

That copying already in load_microcode also is suspicious if someone
would only load the microcode but not apply it. But I did not find a
codepath in arch/x86/kernel/microcode_core.c to load it without a
followup apply.

>> * Save the amd_bsp_mpb on every update. Otherwise someone could offline
>> the BSP, update the microcode and this would be lost on resume
>
> Huh, is amd_bsp_mpb going to disappear all of a sudden?
>
> And that doesn't matter because when we online the BSP later, it goes
> through the CPU hotplug notifier mc_cpu_callback. Or am I missing
> something?

Yeah, me correctly describing what I meant. ;-)

1.: boot system, BIOS gives microcode rev. X
2.: offline the BSP
3.: update microcode to rev. Y with Y > X
Because the BSP is not online rev. Y will not be copied into amd_bsp_mpb
4.: suspend
5.: resume, BIOS gives rev. X again
6.: amd_bsp_mpb is empty -> rev. Y will not be reapplied.

>> * apply_ucode_in_initrd() now also needs to save amd_bsp_mpb, because
>> load_microcode_amd() is no longer doing this and it's not using
>> apply_microcode_amd().
>> * extract common checks and initialisations from load_ucode_ap() and
>> load_microcode_amd() to load_microcode_amd_early(). The change from
>> cpu to x86family in load_microcode_amd() allows to drop the code messing
>> with cpu_data(cpu), which is wrong anyway because at that point the
>> per-cpu cpu_info is not yet setup. And these values would later be
>> overwritten by smp_store_boot_cpu_info() / smp_store_cpu_info().
>
> Right, so I was thinking about this. And the code is pretty nasty: we do a
> load_ucode_amd_ap() but we do add the ucode for the BSP:
>
> if (load_microcode_amd(0, ucode, ucode_size) != UCODE_OK)

No, that code will not be reached for the BSP, because it is behind:
if (cpu && !ucode_loaded) {
The BSP has cpu == 0.

That's why I added the following in my patch:
+ /* BSP via load_ucode_amd_bsp() */
+ if (!cpu)
+ return;

I don't understand if that is really correct, but that was the original
behavior, and I didn't feel competent enough to decree that calling
load_microcode_amd() for the BSP would be safe.
(The code there is strange: There is a load_ucode_amd_
[PATCH] Fix boot hang in 3.11-rc1/2 because of bug in AMD errata check

cpu_has_amd_erratum() is buggy, because it uses the per-cpu cpu_info
before it is filled by smp_store_boot_cpu_info() / smp_store_cpu_info().

If early microcode loading is enabled, its collect_cpu_info_amd_early()
will fill ->x86 and so the fallback to boot_cpu_data is not used. But
->x86_vendor was not filled and is still 0 == X86_VENDOR_INTEL, resulting
in no errata fixes getting applied, and my system hangs on boot.

Using cpu_info in cpu_has_amd_erratum() is wrong anyway: its only caller
init_amd() will have a struct cpuinfo_x86 as parameter and the
set_cpu_bug() that is controlled by cpu_has_amd_erratum() also only uses
that struct.

So pass the struct cpuinfo_x86 from init_amd() to cpu_has_amd_erratum()
and the broken fallback can be dropped.

I also turned the vendor check into a BUG_ON() because init_amd() can
only be used by AMD CPUs, and if the current failure hadn't been silent
this bug would have been much more obvious.

Signed-off-by: Torsten Kaiser <just.for.l...@googlemail.com>

--- a/arch/x86/kernel/cpu/amd.c	2013-07-22 06:33:10.027931005 +0200
+++ b/arch/x86/kernel/cpu/amd.c	2013-07-22 06:35:15.757931265 +0200
@@ -512,7 +512,7 @@ static void early_init_amd(struct cpuinf
 
 static const int amd_erratum_383[];
 static const int amd_erratum_400[];
-static bool cpu_has_amd_erratum(const int *erratum);
+static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum);
 
 static void init_amd(struct cpuinfo_x86 *c)
 {
@@ -729,11 +729,11 @@ static void init_amd(struct cpuinfo_x86
 		value &= ~(1ULL << 24);
 		wrmsrl_safe(MSR_AMD64_BU_CFG2, value);
 
-		if (cpu_has_amd_erratum(amd_erratum_383))
+		if (cpu_has_amd_erratum(c, amd_erratum_383))
 			set_cpu_bug(c, X86_BUG_AMD_TLB_MMATCH);
 	}
 
-	if (cpu_has_amd_erratum(amd_erratum_400))
+	if (cpu_has_amd_erratum(c, amd_erratum_400))
 		set_cpu_bug(c, X86_BUG_AMD_APIC_C1E);
 
 	rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy);
@@ -878,22 +878,15 @@ static const int amd_erratum_400[] =
 static const int amd_erratum_383[] =
 	AMD_OSVW_ERRATUM(3, AMD_MODEL_RANGE(0x10, 0, 0, 0xff, 0xf));
 
-static bool cpu_has_amd_erratum(const int *erratum)
+
+static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum)
 {
-	struct cpuinfo_x86 *cpu = __this_cpu_ptr(&cpu_info);
 	int osvw_id = *erratum++;
 	u32 range;
 	u32 ms;
 
-	/*
-	 * If called early enough that current_cpu_data hasn't been initialized
-	 * yet, fall back to boot_cpu_data.
-	 */
-	if (cpu->x86 == 0)
-		cpu = &boot_cpu_data;
-
-	if (cpu->x86_vendor != X86_VENDOR_AMD)
-		return false;
+	/* Should never be called on non-AMD-CPUs */
+	BUG_ON(cpu->x86_vendor != X86_VENDOR_AMD);
 
 	if (osvw_id >= 0 && osvw_id < 65536 &&
 	    cpu_has(cpu, X86_FEATURE_OSVW)) {
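The failure mode described in this patch can be sketched in plain userspace C. This is only a simplified model and not the kernel code: struct cpuinfo_model, erratum_check_old() and erratum_check_new() are hypothetical names, and the real function goes on to match OSVW bits and model ranges. It shows why a per-cpu struct with the family filled in but the vendor still zero defeats the fallback.

```c
#include <stdbool.h>

/* Values mirror X86_VENDOR_INTEL (0) and X86_VENDOR_AMD (2); the names are
 * illustrative stand-ins, not the kernel's. */
enum { VENDOR_INTEL = 0, VENDOR_AMD = 2 };

/* Hypothetical, reduced model of struct cpuinfo_x86. */
struct cpuinfo_model {
	unsigned char family;	/* models ->x86 */
	unsigned char vendor;	/* models ->x86_vendor */
};

/* Old logic: use the per-cpu copy and fall back to the boot data only when
 * the family is still zero. */
static bool erratum_check_old(const struct cpuinfo_model *percpu,
			      const struct cpuinfo_model *boot)
{
	const struct cpuinfo_model *cpu = percpu;

	if (cpu->family == 0)
		cpu = boot;

	if (cpu->vendor != VENDOR_AMD)
		return false;	/* silent: the workaround is just skipped */

	return true;		/* the real code would now match model ranges */
}

/* New logic: the caller passes the struct it is currently initialising. */
static bool erratum_check_new(const struct cpuinfo_model *c)
{
	return c->vendor == VENDOR_AMD;
}
```

With an untouched (all-zero) per-cpu struct the old check falls back to the boot data and works. Once the early microcode code has stored only the family, the vendor field reads as Intel and the old check returns false.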
[PATCH] Fix early microcode loading on AMD

Fixup the early AMD microcode loading.

* load_microcode_amd() (and the helper it's using) should not have a
  cpu parameter. The microcode loading does not depend on the CPU it is
  executed on, and all the loaded patches will end up in a global list
  for all CPUs anyway.
* Return -1 (like Intel's apply_microcode) when the loading fails,
  also do not set the active microcode level on failure.
* Save the amd_bsp_mpb on apply, not on load. Otherwise someone could
  later load an older microcode file that would overwrite amd_bsp_mpb,
  but would not be applied to the CPUs.
* Save the amd_bsp_mpb on every update. Otherwise someone could offline
  the BSP, update the microcode and this would be lost on resume.
* apply_ucode_in_initrd() now also needs to save amd_bsp_mpb, because
  load_microcode_amd() is no longer doing this and it's not using
  apply_microcode_amd().
* Extract common checks and initialisations from load_ucode_ap() and
  load_microcode_amd() to load_microcode_amd_early().
  The change from cpu to x86family in load_microcode_amd() allows
  dropping the code messing with cpu_data(cpu), which is wrong anyway
  because at that point the per-cpu cpu_info is not yet set up. And these
  values would later be overwritten by smp_store_boot_cpu_info() /
  smp_store_cpu_info().
* Fold collect_cpu_info_amd_early() into load_ucode_amd_ap(), because
  it's only used in one place.
* load_ucode_ap(): Quick exit for !cpu, because without
  load_microcode_amd() getting called apply_microcode_amd() can't do
  anything. Exit if no microcode could be loaded.
* Reduce save_microcode_in_initrd_amd() by reusing
  load_microcode_amd_early().

The main benefit is that the early microcode loading no longer plays
games with the not-yet-initialised per-cpu cpu_info.
apply_microcode_amd() will still write into cpu_data(cpu)->microcode, but
I see no good way to remove that there, because for not-early microcode
updates that is exactly the right place for that update.

Signed-off-by: Torsten Kaiser <just.for.l...@googlemail.com>
---
This alone also fixes the hang-on-boot I experienced with 3.11-rc1 even
if the fix for cpu_has_amd_erratum() is not applied, because now the
trigger (filling ->x86 but not ->x86_vendor) is no longer there. But I
think both patches should be applied.

Boot tested on 64 and 32bit, but as my BIOS already provides up-to-date
microcode I could not test whether that gets applied correctly.

--- a/arch/x86/include/asm/microcode_amd.h	2013-07-22 17:54:25.166193431 +0200
+++ b/arch/x86/include/asm/microcode_amd.h	2013-07-22 17:56:31.066192463 +0200
@@ -59,7 +59,7 @@ static inline u16 find_equiv_id(struct e
 
 extern int __apply_microcode_amd(struct microcode_amd *mc_amd);
 extern int apply_microcode_amd(int cpu);
-extern enum ucode_state load_microcode_amd(int cpu, const u8 *data, size_t size);
+extern enum ucode_state load_microcode_amd(u8 x86family, const u8 *data, size_t size);
 
 #ifdef CONFIG_MICROCODE_AMD_EARLY
 #ifdef CONFIG_X86_32
--- a/arch/x86/kernel/microcode_amd.c	2013-07-22 17:33:55.856202878 +0200
+++ b/arch/x86/kernel/microcode_amd.c	2013-07-22 21:45:28.186086900 +0200
@@ -145,10 +145,9 @@ static int collect_cpu_info_amd(int cpu,
 	return 0;
 }
 
-static unsigned int verify_patch_size(int cpu, u32 patch_size,
+static unsigned int verify_patch_size(u8 x86family, u32 patch_size,
 				      unsigned int size)
 {
-	struct cpuinfo_x86 *c = &cpu_data(cpu);
 	u32 max_size;
 
 #define F1XH_MPB_MAX_SIZE 2048
@@ -156,7 +155,7 @@ static unsigned int verify_patch_size(in
 #define F15H_MPB_MAX_SIZE 4096
 #define F16H_MPB_MAX_SIZE 3458
 
-	switch (c->x86) {
+	switch (x86family) {
 	case 0x14:
 		max_size = F14H_MPB_MAX_SIZE;
 		break;
@@ -220,12 +219,20 @@ int apply_microcode_amd(int cpu)
 		return 0;
 	}
 
-	if (__apply_microcode_amd(mc_amd))
+	if (__apply_microcode_amd(mc_amd)) {
 		pr_err("CPU%d: update failed for patch_level=0x%08x\n",
 			cpu, mc_amd->hdr.patch_id);
-	else
-		pr_info("CPU%d: new patch_level=0x%08x\n", cpu,
-			mc_amd->hdr.patch_id);
+		return -1;
+	}
+
+#if defined(CONFIG_MICROCODE_AMD_EARLY) && defined(CONFIG_X86_32)
+	/* save applied patch for early load */
+	memset(amd_bsp_mpb, 0, MPB_MAX_SIZE);
+	memcpy(amd_bsp_mpb, p->data, min_t(u32, ksize(p->data), MPB_MAX_SIZE));
+#endif
+
+	pr_info("CPU%d: new patch_level=0x%08x\n", cpu,
+		mc_amd->hdr.patch_id);
 
 	uci->cpu_sig.rev = mc_amd->hdr.patch_id;
 	c->microcode = mc_amd->hdr.patch_id;
@@ -276,9 +283,8 @@ static void cleanup(void)
  * driver cannot continue functioning normally. In such cases, we tear
  * down everything we've used up so far and exit.
  */
-static int verify_and_add_patch(unsigned int cpu, u8 *fw, unsigned int leftover)
+static int verify_and_add_patch(u8 x86family, u8 *fw, unsigned int leftover
Re: [PATCH] Fix early microcode loading on AMD

On Tue, Jul 23, 2013 at 5:15 PM, Borislav Petkov <b...@alien8.de> wrote:
> On Tue, Jul 23, 2013 at 01:58:53PM +0200, Torsten Kaiser wrote:
>> Fixup the early AMD microcode loading.
>>
>> * load_microcode_amd() (and the helper its using) should not have an
>> cpu parameter.
>
> Hmm, I don't think so - we get the cpu handed down from microcode_core
> and besides the early load on 32bit needs to do find_patch(cpu).

That's why I moved that part into apply_microcode_amd(). See further
down for why I think that move is the right thing.

And without that, the current cpu parameter will only be used to get the
per-cpu data (which in the early load case is not even correctly set
up!). But the only member of cpuinfo_x86 that gets used is ->x86, the
family.
Line 159: switch (c->x86) and Line 301: if (proc_fam != c->x86)

I really wanted to make that switch from cpu to x86family a separate
patch, so that it would be more obviously correct, but because of that
amd_bsp_mpb hunk I can't find a good cut, and that's why this patch is
larger than I would have preferred.

>> The microcode loading is not depending on the CPU it is
>
> Mostly. There are mixed-stepping boxes which need to differentiate
> between which cpu we're applying the patch for.

Nothing looks at ->x86_model or ->x86_mask during load. It will always
load all patches from the current family.
If loading really depended on the current cpu in a mixed system, that
would be horrible: depending on which CPU gets to execute
load_microcode_amd(), there would be different patches loaded into RAM?

> Btw, your config boots on my F14h box with "nomodeset" on the command
> line because it is missing radeon firmware for my gpu.

I suspect a F14h box will never see that hang. It trips over the C1E
erratum and amd_erratum_400[] looks like it only affects 0xfh and 0x10h
(like my Phenom II X6).

>> executed and all the loaded patches will end up in a global list for all
>> CPUs anyway.
>> * Return -1 (like Intels apply_microcode) when the loading fails,
>> also do not set the active microcode level on failure.
>
> Yep, this part I want. Please send it as a separate patch.

OK, will send that together with the resend for cpu_has_amd_erratum().

>> * Save the amd_bsp_mpb on apply, not on load. Otherwise someone could
>> later load an older microcode file that would overwrite amd_bsp_mpb,
>> but would not be applied to the CPUs
>
> See the patch id check in apply_ucode_in_initrd()?
>
> if (eq_id == mc->hdr.processor_rev_id && rev < mc->hdr.patch_id)

By "load" I meant load_microcode_amd(), not the loading of the microcode
into the CPU.
1.: load microcode rev X into the CPU (early or normal is not important)
2.: get an older microcode file that only contains rev Y with Y < X
3.: trigger load_microcode_amd() with a corrupt file:
    This will call cleanup() and empty pcache.
4.: trigger load_microcode_amd() with the older file:
    * this will now load rev Y into pcache
    * rev Y will be returned by find_patch and copied into amd_bsp_mpb
    * any attempt to apply rev Y will be skipped in apply_microcode_amd()
So now the CPU still correctly has rev X, but amd_bsp_mpb will contain
the wrong rev Y.

That copying already in load_microcode is also suspicious if someone
would only load the microcode but not apply it. But I did not find a
codepath in arch/x86/kernel/microcode_core.c to load it without a
followup apply.

>> * Save the amd_bsp_mpb on every update. Otherwise someone could offline
>> the BSP, update the microcode and this would be lost on resume
>
> Huh, is amd_bsp_mpb going to disappear all of a sudden?
>
> And that doesn't matter because when we online the BSP later, it goes
> through the CPU hotplug notifier mc_cpu_callback. Or am I missing
> something?

Yeah, me correctly describing what I meant. ;-)
1.: boot system, BIOS gives microcode rev. X
2.: offline the BSP
3.: update microcode to rev. Y with Y > X
    Because the BSP is not online, rev. Y will not be copied into
    amd_bsp_mpb
4.: suspend
5.: resume, BIOS gives rev. X again
6.: amd_bsp_mpb is empty -> rev. Y will not be reapplied.

>> * apply_ucode_in_initrd() now also needs to save amd_bsp_mbp, because
>> load_microcode_amd() its no longer doing this and its not using
>> apply_microcode_amd().
>> * extract common checks and initialisations from load_ucode_ap() and
>> load_microcode_amd() to load_microcode_amd_early(). The change from
>> cpu to x86family in load_microcode_amd() allows to drop the code messing
>> with cpu_data(cpu), with is wrong anyway because at that point the
>> per-cpu cpu_info is not yet setup. And these values would later be
>> overwritten by smp_store_boot_cpu_info() / smp_store_cpu_info().
>
> Right, so I was thinking about this. And the code is pretty nasty: we do a
> load_ucode_amd_ap() but we do add the ucode for the BSP:
>
> if (load_microcode_amd(0, ucode, ucode_size) != UCODE_OK)

No, that code will not be reached for the BSP, because it is behind:
if (cpu && !ucode_loaded) {
The BSP has cpu == 0.
That's why I am adding the following in my patch:
+	/* BSP via load_ucode_amd_bsp() */
+	if (!cpu)
+		return;
I don't understand if that is really correct, but that was the original
behavior, and I didn't feel competent enough to decree that calling
load_microcode_amd() for the BSP would be safe.
(The code there is strange: There is a load_ucode_amd_
[PATCH v2] x86, AMD: Make cpu_has_amd_erratum() use the correct struct cpuinfo_x86

cpu_has_amd_erratum() is buggy, because it uses the per-cpu cpu_info
before it is filled by smp_store_boot_cpu_info() / smp_store_cpu_info().

If early microcode loading is enabled, its collect_cpu_info_amd_early()
will fill ->x86 and so the fallback to boot_cpu_data is not used. But
->x86_vendor was not filled and is still 0 == X86_VENDOR_INTEL, resulting
in no errata fixes getting applied, and my system hangs on boot.

Using cpu_info in cpu_has_amd_erratum() is wrong anyway: its only caller
init_amd() will have a struct cpuinfo_x86 as parameter and the
set_cpu_bug() that is controlled by cpu_has_amd_erratum() also only uses
that struct.

So pass the struct cpuinfo_x86 from init_amd() to cpu_has_amd_erratum()
and the broken fallback can be dropped.

I also added a WARN_ON() to the vendor check because init_amd() can
only be used by AMD CPUs, and if the current failure hadn't been silent
this bug would have been much more obvious.

V2: At the request of Borislav Petkov: BUG_ON -> WARN_ON and subject change

Signed-off-by: Torsten Kaiser <just.for.l...@googlemail.com>

--- a/arch/x86/kernel/cpu/amd.c	2013-07-22 06:33:10.027931005 +0200
+++ b/arch/x86/kernel/cpu/amd.c	2013-07-22 06:35:15.757931265 +0200
@@ -512,7 +512,7 @@ static void early_init_amd(struct cpuinf
 
 static const int amd_erratum_383[];
 static const int amd_erratum_400[];
-static bool cpu_has_amd_erratum(const int *erratum);
+static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum);
 
 static void init_amd(struct cpuinfo_x86 *c)
 {
@@ -729,11 +729,11 @@ static void init_amd(struct cpuinfo_x86
 		value &= ~(1ULL << 24);
 		wrmsrl_safe(MSR_AMD64_BU_CFG2, value);
 
-		if (cpu_has_amd_erratum(amd_erratum_383))
+		if (cpu_has_amd_erratum(c, amd_erratum_383))
 			set_cpu_bug(c, X86_BUG_AMD_TLB_MMATCH);
 	}
 
-	if (cpu_has_amd_erratum(amd_erratum_400))
+	if (cpu_has_amd_erratum(c, amd_erratum_400))
 		set_cpu_bug(c, X86_BUG_AMD_APIC_C1E);
 
 	rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy);
@@ -878,22 +878,16 @@ static const int amd_erratum_400[] =
 static const int amd_erratum_383[] =
 	AMD_OSVW_ERRATUM(3, AMD_MODEL_RANGE(0x10, 0, 0, 0xff, 0xf));
 
-static bool cpu_has_amd_erratum(const int *erratum)
+
+static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum)
 {
-	struct cpuinfo_x86 *cpu = __this_cpu_ptr(&cpu_info);
 	int osvw_id = *erratum++;
 	u32 range;
 	u32 ms;
 
-	/*
-	 * If called early enough that current_cpu_data hasn't been initialized
-	 * yet, fall back to boot_cpu_data.
-	 */
-	if (cpu->x86 == 0)
-		cpu = &boot_cpu_data;
-
-	if (cpu->x86_vendor != X86_VENDOR_AMD)
-		return false;
+	/* Should never be called on non-AMD-CPUs */
+	if (WARN_ON(cpu->x86_vendor != X86_VENDOR_AMD))
+		return false;
 
 	if (osvw_id >= 0 && osvw_id < 65536 &&
 	    cpu_has(cpu, X86_FEATURE_OSVW)) {
[PATCH 1/5] x86, AMD: fix error path in apply_microcode_amd()

Return -1 (like Intel's apply_microcode) when the loading fails,
also do not set the active microcode level on failure.

Signed-off-by: Torsten Kaiser <just.for.l...@googlemail.com>

--- a/arch/x86/kernel/microcode_amd.c	2013-07-23 19:42:16.089517717 +0200
+++ b/arch/x86/kernel/microcode_amd.c	2013-07-23 19:43:30.359517091 +0200
@@ -220,12 +220,13 @@ int apply_microcode_amd(int cpu)
 		return 0;
 	}
 
-	if (__apply_microcode_amd(mc_amd))
+	if (__apply_microcode_amd(mc_amd)) {
 		pr_err("CPU%d: update failed for patch_level=0x%08x\n",
 			cpu, mc_amd->hdr.patch_id);
-	else
-		pr_info("CPU%d: new patch_level=0x%08x\n", cpu,
-			mc_amd->hdr.patch_id);
+		return -1;
+	}
+	pr_info("CPU%d: new patch_level=0x%08x\n", cpu,
+		mc_amd->hdr.patch_id);
 
 	uci->cpu_sig.rev = mc_amd->hdr.patch_id;
 	c->microcode = mc_amd->hdr.patch_id;
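To illustrate what the changed error path buys, here is a minimal userspace model. It is only a sketch: struct cpu_state_model, hw_apply_model() and the two apply_* functions are made-up stand-ins, with hw_apply_model() playing the role of __apply_microcode_amd() (nonzero means the update failed).

```c
#include <stdint.h>

/* Hypothetical model of the bookkeeping done after an update. */
struct cpu_state_model {
	uint32_t recorded_rev;	/* models uci->cpu_sig.rev / c->microcode */
};

/* Stand-in for the hardware update; nonzero means failure. */
static int hw_apply_model(int should_fail)
{
	return should_fail ? -1 : 0;
}

/* Old flow: only the log message differs, the revision is recorded and
 * success is reported even when the update failed. */
static int apply_old(struct cpu_state_model *s, uint32_t patch_id,
		     int should_fail)
{
	if (hw_apply_model(should_fail)) {
		/* pr_err("update failed") in the original */
	} else {
		/* pr_info("new patch_level") in the original */
	}
	s->recorded_rev = patch_id;
	return 0;
}

/* New flow: bail out before touching the recorded revision. */
static int apply_new(struct cpu_state_model *s, uint32_t patch_id,
		     int should_fail)
{
	if (hw_apply_model(should_fail))
		return -1;
	s->recorded_rev = patch_id;
	return 0;
}
```

In the old flow a failed update still leaves the new patch ID in the recorded state, so later revision checks would compare against a revision the CPU never received.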
[PATCH 2/5] x86, microcode: Don't lose error returns in save_microcode_in_initrd()

Don't lose the error return.
This was lost when early AMD microcode loading was added in
757885e94a22bcc82beb9b1445c95218cb20ceab.

Signed-off-by: Torsten Kaiser <just.for.l...@googlemail.com>

--- a/arch/x86/kernel/microcode_core_early.c	2013-07-23 19:44:05.509516795 +0200
+++ b/arch/x86/kernel/microcode_core_early.c	2013-07-23 19:58:34.459509474 +0200
@@ -127,11 +127,11 @@ int __init save_microcode_in_initrd(void
 	switch (c->x86_vendor) {
 	case X86_VENDOR_INTEL:
 		if (c->x86 >= 6)
-			save_microcode_in_initrd_intel();
+			return save_microcode_in_initrd_intel();
 		break;
 	case X86_VENDOR_AMD:
 		if (c->x86 >= 0x10)
-			save_microcode_in_initrd_amd();
+			return save_microcode_in_initrd_amd();
 		break;
 	default:
 		break;
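The effect of this change can be shown with a tiny userspace sketch. It assumes that the dispatcher ends in a plain "return 0" after the switch, which the hunk does not show; vendor_save_model(), save_old() and save_new() are hypothetical names.

```c
/* Hypothetical stand-in for save_microcode_in_initrd_amd(): returns 0 or a
 * negative error code (-22 models -EINVAL). */
static int vendor_save_model(int fail)
{
	return fail ? -22 : 0;
}

/* Before: the helper's result is dropped, the caller always sees success. */
static int save_old(int fail)
{
	vendor_save_model(fail);
	return 0;
}

/* After: the helper's result is handed back to the caller. */
static int save_new(int fail)
{
	return vendor_save_model(fail);
}
```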
[PATCH 3/5] x86, AMD: cleanup: merge common code in early microcode loading

Extract common checks and initialisations from load_ucode_ap() and
save_microcode_in_initrd_amd() to load_microcode_amd_early().
load_ucode_ap() gets a quick exit for !cpu, because for the BSP there is
already a different function dealing with its update.
The original code already didn't do anything there, because without
load_microcode_amd() getting called apply_microcode_amd() could not do
anything.

Signed-off-by: Torsten Kaiser <just.for.l...@googlemail.com>

--- a/arch/x86/kernel/microcode_amd_early.c	2013-07-22 06:22:32.0 +0200
+++ b/arch/x86/kernel/microcode_amd_early.c	2013-07-23 20:00:04.889508712 +0200
@@ -196,6 +196,23 @@ void __init load_ucode_amd_bsp(void)
 	apply_ucode_in_initrd(cd.data, cd.size);
 }
 
+static int load_microcode_amd_early(void)
+{
+	enum ucode_state ret;
+	void *ucode;
+
+	if (ucode_loaded || !ucode_size || !initrd_start)
+		return 0;
+
+	ucode = (void *)(initrd_start + ucode_offset);
+	ret = load_microcode_amd(0, ucode, ucode_size);
+	if (ret != UCODE_OK)
+		return -EINVAL;
+
+	ucode_loaded = true;
+	return 0;
+}
+
 #ifdef CONFIG_X86_32
 u8 amd_bsp_mpb[MPB_MAX_SIZE];
 
@@ -258,17 +275,13 @@ void load_ucode_amd_ap(void)
 
 	collect_cpu_info_amd_early(&cpu_data(cpu), ucode_cpu_info + cpu);
 
-	if (cpu && !ucode_loaded) {
-		void *ucode;
-
-		if (!ucode_size || !initrd_start)
-			return;
+	/* BSP via load_ucode_amd_bsp() */
+	if (!cpu)
+		return;
 
-		ucode = (void *)(initrd_start + ucode_offset);
-		if (load_microcode_amd(0, ucode, ucode_size) != UCODE_OK)
-			return;
-		ucode_loaded = true;
-	}
+	load_microcode_amd_early();
+	if (!ucode_loaded)
+		return;
 
 	apply_microcode_amd(cpu);
 }
@@ -276,8 +289,6 @@ void load_ucode_amd_ap(void)
 
 int __init save_microcode_in_initrd_amd(void)
 {
-	enum ucode_state ret;
-	void *ucode;
 #ifdef CONFIG_X86_32
 	unsigned int bsp = boot_cpu_data.cpu_index;
 	struct ucode_cpu_info *uci = ucode_cpu_info + bsp;
@@ -289,14 +300,5 @@ int __init save_microcode_in_initrd_amd(
 		pr_info("microcode: updated early to new patch_level=0x%08x\n",
 			ucode_new_rev);
 
-	if (ucode_loaded || !ucode_size || !initrd_start)
-		return 0;
-
-	ucode = (void *)(initrd_start + ucode_offset);
-	ret = load_microcode_amd(0, ucode, ucode_size);
-	if (ret != UCODE_OK)
-		return -EINVAL;
-
-	ucode_loaded = true;
-	return 0;
+	return load_microcode_amd_early();
 }
[PATCH 4/5] x86, AMD: save applied, not loaded microcode for reloading on resume

* Save the amd_bsp_mpb on apply, not on load. Otherwise someone could
  later load an older microcode file via load_microcode_amd() that would
  overwrite amd_bsp_mpb, but would not be applied to the CPUs.
  (apply_microcode_amd() checks the current patch level, but the copy code
  in load_microcode_amd() did not. If somehow cleanup() gets called and
  clears pcache, find_patch() could return older patches than the
  currently installed microcode.)
* Save the amd_bsp_mpb on every update. Otherwise, if someone would
  update the microcode after offlining the BSP, these updates would not
  get saved and would be lost on resume.
* apply_ucode_in_initrd() now also needs to save amd_bsp_mpb, because
  load_microcode_amd() is no longer doing this and it's not using
  apply_microcode_amd().

Signed-off-by: Torsten Kaiser <just.for.l...@googlemail.com>
---
Removing this hunk from load_microcode_amd() also allows me to kill the
cpu parameter for that function in the next patch...

--- a/arch/x86/kernel/microcode_amd.c	2013-07-23 19:43:30.359517091 +0200
+++ b/arch/x86/kernel/microcode_amd.c	2013-07-23 20:05:04.469506188 +0200
@@ -228,6 +228,12 @@ int apply_microcode_amd(int cpu)
 	pr_info("CPU%d: new patch_level=0x%08x\n", cpu,
 		mc_amd->hdr.patch_id);
 
+#if defined(CONFIG_MICROCODE_AMD_EARLY) && defined(CONFIG_X86_32)
+	/* save applied patch for early load */
+	memset(amd_bsp_mpb, 0, MPB_MAX_SIZE);
+	memcpy(amd_bsp_mpb, p->data, min_t(u32, ksize(p->data), MPB_MAX_SIZE));
+#endif
+
 	uci->cpu_sig.rev = mc_amd->hdr.patch_id;
 	c->microcode = mc_amd->hdr.patch_id;
 
@@ -385,17 +391,6 @@ enum ucode_state load_microcode_amd(int
 	if (ret != UCODE_OK)
 		cleanup();
 
-#if defined(CONFIG_MICROCODE_AMD_EARLY) && defined(CONFIG_X86_32)
-	/* save BSP's matching patch for early load */
-	if (cpu_data(cpu).cpu_index == boot_cpu_data.cpu_index) {
-		struct ucode_patch *p = find_patch(cpu);
-		if (p) {
-			memset(amd_bsp_mpb, 0, MPB_MAX_SIZE);
-			memcpy(amd_bsp_mpb, p->data, min_t(u32, ksize(p->data),
-							    MPB_MAX_SIZE));
-		}
-	}
-#endif
 	return ret;
 }
 
--- a/arch/x86/kernel/microcode_amd_early.c	2013-07-23 20:00:04.889508712 +0200
+++ b/arch/x86/kernel/microcode_amd_early.c	2013-07-23 20:05:14.969506099 +0200
@@ -170,6 +170,13 @@ static void apply_ucode_in_initrd(void *
 		mc = (struct microcode_amd *)(data + SECTION_HDR_SIZE);
 		if (eq_id == mc->hdr.processor_rev_id && rev < mc->hdr.patch_id)
 			if (__apply_microcode_amd(mc) == 0) {
+#ifdef CONFIG_X86_32
+				/* save applied patch for early load */
+				memset((void *)__pa(amd_bsp_mpb), 0,
+					MPB_MAX_SIZE);
+				memcpy((void *)__pa(amd_bsp_mpb), mc,
+					min_t(u32, header[1], MPB_MAX_SIZE));
+#endif
 				rev = mc->hdr.patch_id;
 				*new_rev = rev;
 			}
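The difference between saving on load and saving on apply can be modelled in a few lines of userspace C. This is a rough sketch under simplifying assumptions: revisions are plain integers, the patch cache holds a single entry, and struct ucode_model, load_file_model() and apply_model() are hypothetical names. saved_rev stands in for the contents of amd_bsp_mpb.

```c
#include <stdint.h>

/* Hypothetical, reduced model of the microcode state. */
struct ucode_model {
	uint32_t cpu_rev;	/* revision currently running on the CPU */
	uint32_t cached_rev;	/* revision in the patch cache, 0 = empty */
	uint32_t saved_rev;	/* revision kept for resume (amd_bsp_mpb) */
};

/* Loading a file into an emptied cache; optionally copy to the resume
 * buffer right away, as the code did before this patch. */
static void load_file_model(struct ucode_model *m, uint32_t file_rev,
			    int save_on_load)
{
	m->cached_rev = file_rev;
	if (save_on_load)
		m->saved_rev = file_rev;
}

/* Applying skips anything that is not newer than the running revision;
 * optionally copy to the resume buffer only after a real update. */
static int apply_model(struct ucode_model *m, int save_on_apply)
{
	if (m->cached_rev <= m->cpu_rev)
		return 0;	/* nothing newer, CPU keeps its revision */

	m->cpu_rev = m->cached_rev;
	if (save_on_apply)
		m->saved_rev = m->cpu_rev;
	return 1;
}
```

Starting from a CPU at revision 5 with revision 5 saved, loading an older file with revision 3 into an emptied cache leaves the CPU at 5 in both variants. With save-on-load the resume buffer now holds 3, while with save-on-apply it still holds 5.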
[PATCH 5/5] x86, AMD: simplify load_microcode_amd() to fix early microcode loading to no longer access uninitialized per-cpu data
load_microcode_amd() (and the helper it is using) should not have an cpu parameter. The microcode loading is not depending on the CPU it is executed and all the loaded patches will end up in a global list for all CPUs anyway. The change from cpu to x86family in load_microcode_amd() now allows to drop the code messing with cpu_data(cpu) from collect_cpu_info_amd_early(), which is wrong anyway because at that point the per-cpu cpu_info is not yet setup. And these values would later be overwritten by smp_store_boot_cpu_info() / smp_store_cpu_info(). Fold the rest of collect_cpu_info_amd_early() into load_ucode_amd_ap(), because its only used at one place and without the cpuinfo_x86 accesses it was not much left. Signed-off-by: Torsten Kaiser just.for.l...@googlemail.com --- One effect of this early, partly initialisation of cpu_info was, that the fallback logic in cpu_has_amd_erratum() did not use boot_cpu_data and because x86_vendor was not initialised in the per-cpu struct the E400 erratum was not activated on my system resulting in a failed boot. 
--- a/arch/x86/include/asm/microcode_amd.h	2013-07-23 20:15:10.549501081 +0200
+++ b/arch/x86/include/asm/microcode_amd.h	2013-07-23 20:16:05.329500620 +0200
@@ -59,7 +59,7 @@ static inline u16 find_equiv_id(struct e
 
 extern int __apply_microcode_amd(struct microcode_amd *mc_amd);
 extern int apply_microcode_amd(int cpu);
-extern enum ucode_state load_microcode_amd(int cpu, const u8 *data, size_t size);
+extern enum ucode_state load_microcode_amd(u8 x86family, const u8 *data, size_t size);
 
 #ifdef CONFIG_MICROCODE_AMD_EARLY
 #ifdef CONFIG_X86_32
--- a/arch/x86/kernel/microcode_amd.c	2013-07-23 20:05:04.469506188 +0200
+++ b/arch/x86/kernel/microcode_amd.c	2013-07-23 20:23:22.739496934 +0200
@@ -145,10 +145,9 @@ static int collect_cpu_info_amd(int cpu,
 	return 0;
 }
 
-static unsigned int verify_patch_size(int cpu, u32 patch_size,
+static unsigned int verify_patch_size(u8 x86family, u32 patch_size,
 				      unsigned int size)
 {
-	struct cpuinfo_x86 *c = cpu_data(cpu);
 	u32 max_size;
 
 #define F1XH_MPB_MAX_SIZE 2048
@@ -156,7 +155,7 @@ static unsigned int verify_patch_size(in
 #define F15H_MPB_MAX_SIZE 4096
 #define F16H_MPB_MAX_SIZE 3458
 
-	switch (c->x86) {
+	switch (x86family) {
 	case 0x14:
 		max_size = F14H_MPB_MAX_SIZE;
 		break;
@@ -283,9 +282,8 @@ static void cleanup(void)
  * driver cannot continue functioning normally. In such cases, we tear
  * down everything we've used up so far and exit.
  */
-static int verify_and_add_patch(unsigned int cpu, u8 *fw, unsigned int leftover)
+static int verify_and_add_patch(u8 x86family, u8 *fw, unsigned int leftover)
 {
-	struct cpuinfo_x86 *c = cpu_data(cpu);
 	struct microcode_header_amd *mc_hdr;
 	struct ucode_patch *patch;
 	unsigned int patch_size, crnt_size, ret;
@@ -305,7 +303,7 @@ static int verify_and_add_patch(unsigned
 
 	/* check if patch is for the current family */
 	proc_fam = ((proc_fam >> 8) & 0xf) + ((proc_fam >> 20) & 0xff);
-	if (proc_fam != c->x86)
+	if (proc_fam != x86family)
 		return crnt_size;
 
 	if (mc_hdr->nb_dev_id || mc_hdr->sb_dev_id) {
@@ -314,7 +312,7 @@ static int verify_and_add_patch(unsigned
 		return crnt_size;
 	}
 
-	ret = verify_patch_size(cpu, patch_size, leftover);
+	ret = verify_patch_size(x86family, patch_size, leftover);
 	if (!ret) {
 		pr_err("Patch-ID 0x%08x: size mismatch.\n", mc_hdr->patch_id);
 		return crnt_size;
@@ -345,7 +343,7 @@ static int verify_and_add_patch(unsigned
 	return crnt_size;
 }
 
-static enum ucode_state __load_microcode_amd(int cpu, const u8 *data, size_t size)
+static enum ucode_state __load_microcode_amd(u8 x86family, const u8 *data, size_t size)
 {
 	enum ucode_state ret = UCODE_ERROR;
 	unsigned int leftover;
@@ -368,7 +366,7 @@ static enum ucode_state __load_microcode
 	}
 
 	while (leftover) {
-		crnt_size = verify_and_add_patch(cpu, fw, leftover);
+		crnt_size = verify_and_add_patch(x86family, fw, leftover);
 		if (crnt_size < 0)
 			return ret;
 
@@ -379,14 +377,14 @@ static enum ucode_state __load_microcode
 	return UCODE_OK;
 }
 
-enum ucode_state load_microcode_amd(int cpu, const u8 *data, size_t size)
+enum ucode_state load_microcode_amd(u8 x86family, const u8 *data, size_t size)
 {
 	enum ucode_state ret;
 
 	/* free old equiv table */
 	free_equiv_cpu_table();
 
-	ret = __load_microcode_amd(cpu, data, size);
+	ret = __load_microcode_amd(x86family, data, size);
 
 	if (ret != UCODE_OK)
 		cleanup();
@@ -436,7 +434,7 @@ static enum ucode_state request_microcod
 		goto fw_release;
 	}
 
-	ret = load_microcode_amd(cpu, fw->data
Re: [PATCH]Fix early microcode loading on AMD
On Tue, Jul 23, 2013 at 5:15 PM, Borislav Petkov <b...@alien8.de> wrote:
> On Tue, Jul 23, 2013 at 01:58:53PM +0200, Torsten Kaiser wrote:
>> Fixup the early AMD microcode loading.
>>
>> * load_microcode_amd() (and the helper its using) should not have an
>> cpu parameter.
>
> Hmm, I don't think so - we get the cpu handed down from microcode_core
> and besides the early load on 32bit needs to do find_patch(cpu).
>
>> The microcode loading is not depending on the CPU it is
>
> Mostly. There are mixed-stepping boxes which need to differentiate
> between which cpu we're applying the patch for.

I redid the patch in 5 parts, hopefully easier to understand now.
Without the other changes, the microcode_amd.c part of patch 5/5 should make it much more obvious that my change does not alter which patches get loaded into the microcode cache 'pcache'.

> Btw, your config boots on my F14h box with nomodeset on the command
> line because it is missing radeon firmware for my gpu.
>
>> executed and all the loaded patches will end up in a global list for
>> all CPUs anyway.
>> * Return -1 (like Intels apply_microcode) when the loading fails, also
>> do not set the active microcode level on failure.
>
> Yep, this part I want. Please send it as a separate patch.

That is now patch 1/5.
Patch 2/5 is new; I skipped that part originally because I did not want to make the patch even bigger...

> So I see a couple of issues in this patch and they should be separated
> into single patches - one patch taking care of one issue and explaining
> what the problem is in the commit message (I know you can do that good
> :)).

I'm still seeing some things in the microcode code that look suspicious:

Why is the X86_64 code updating uci->cpu_sig.rev, but the 32bit version is not? And I can't see anything that reads that value.

Should apply_microcode_amd() really update uci->mc even before checking whether the microcode is newer?

The X86_32 hunk in save_microcode_in_initrd_amd() now seems obsolete. load_microcode_amd() no longer uses find_patch(), so it doesn't use ucode_cpu_info anymore. But why does that code use boot_cpu_data.cpu_index to find the BSP, but then always pass 0 as the cpu parameter to load_microcode_amd()? If boot_cpu_data.cpu_index is ever != 0, that code would fail.

... and collect_cpu_info_amd() also looks very weird. If csig did not point to uci->cpu_sig, find_patch() would not be happy.

Wouldn't directly passing cpuid_eax(0x0001) to find_patch() be a better interface? Then the early microcode loading code would not need to access ucode_cpu_info at all.

Torsten
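The family arithmetic that the patches in this thread apply to the CPUID signature can be tried in isolation. The following is only a userspace sketch: the helper name x86_family_from_cpuid() is made up for illustration and is not a kernel function. It takes the EAX value of CPUID leaf 1 as a plain integer and adds the base family (bits 11:8) to the extended family (bits 27:20), exactly as in the quoted expression; that sum is only meaningful for signatures whose base family is 0xf, which is assumed here.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the kernel: decode the x86 family from
 * the EAX value returned by CPUID leaf 1, using the same arithmetic as the
 * expression quoted in this thread:
 *   ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff)
 * i.e. base family (bits 11:8) plus extended family (bits 27:20).
 */
static inline uint8_t x86_family_from_cpuid(uint32_t eax)
{
	uint32_t base_family = (eax >> 8) & 0xf;
	uint32_t ext_family  = (eax >> 20) & 0xff;

	return (uint8_t)(base_family + ext_family);
}
```

With a signature such as 0x00100f43 this yields 0x10 (decimal 16), the value that the x86family parameter would carry on a family 10h system.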
Re: early microcode on amd is broken when no initramfs provided
On Tue, Jul 16, 2013 at 7:00 PM, Borislav Petkov wrote:
> On Thu, Jul 11, 2013 at 11:05:25PM +0200, Johannes Hirte wrote:
>> config is attached
>
> Ok, I can reproduce the hang with your config but even with:
>
> $ grep MICROCODE .config
> # CONFIG_MICROCODE is not set
> # CONFIG_MICROCODE_INTEL_EARLY is not set
> # CONFIG_MICROCODE_AMD_EARLY is not set
>
> which means, it cannot be microcode-related.
>
> And I'd bet if you wait a minute (yep, it should be exactly 60 seconds)
> the boot would probably continue. And if so, this is that 60 sec delay
> where the kernel tries to find firmware.
>
> Hmm...

I have the same problem: Booting 3.11-rc1 hangs after the line:
ACPI: Executed 3 blocks of module-level executable AML code

I bisected it down to the early microcode changes:
757885e94a22bcc82beb9b1445c95218cb20ceab (the new early loading implementation) and 6b3389ac21b5e557b957f1497d0ff22bf733e8c3 (small fixup) completely fail to boot (no output beyond "Booting kernel"); from 275bbe2e299f1820ec8faa443d689469a9e6ecc5 ("Make find_ucode_in_initrd() __init") onwards I'm seeing this hang.

Just turning CONFIG_MICROCODE_EARLY off solves the problem: The system now successfully boots 3.11-rc1.

Trying to debug this, I found the following hack that also solves the boot problem: removing the following two lines from collect_cpu_info_amd_early() in arch/x86/kernel/microcode_amd_early.c:
c->microcode = rev;
c->x86 = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);

But I can't make sense of that.

And if I try to trace who updates ->x86, it gets even more confusing. Normally only cpu_detect() seems to update cpuinfo_x86.x86, but now it seems to fight with collect_cpu_info_amd_early().

On my system this happens (the output is always the address of the struct cpuinfo_x86 -> the value that gets written into it):

Very early boot:
cpu_detect 81c8ba40 -> 16

BSP == CPU0 calls load_ucode_ap() via cpu_init():
collect_cpu_info_amd_early 880337c10fc0 -> 16
(That is the place I patched out to get the system to boot)

BSP == CPU0 via identify_boot_cpu():
cpu_detect 81c8ba40 -> 16

BSP == CPU0 stores boot_cpu_data in its per-cpu structure via smp_store_boot_cpu_info():
smpboot: BSP: store 81c8ba40 in 880337c10fc0

smpboot starts activating the secondary CPUs. Each one, in start_secondary(), first calls load_ucode_ap() via cpu_init() and then identify_secondary_cpu() via smp_callin():
collect_cpu_info_amd_early 880337c50fc0
smpboot: identify_sec_cpu:1/880337c50fc0
cpu_detect 880337c50fc0 -> 16
collect_cpu_info_amd_early 880337c90fc0
smpboot: identify_sec_cpu:2/880337c90fc0
cpu_detect 880337c90fc0 -> 16
collect_cpu_info_amd_early 880337cd0fc0
smpboot: identify_sec_cpu:3/880337cd0fc0
cpu_detect 880337cd0fc0 -> 16
collect_cpu_info_amd_early 880337d10fc0
smpboot: identify_sec_cpu:4/880337d10fc0
cpu_detect 880337d10fc0 -> 16
collect_cpu_info_amd_early 880337d50fc0
smpboot: identify_sec_cpu:5/880337d50fc0
cpu_detect 880337d50fc0 -> 16

It seems the code for updating 'struct cpuinfo_x86 *c' in collect_cpu_info_amd_early() is useless, because the values will be overwritten first by smp_store_cpu_info() and then again by identify_secondary_cpu(c). It is also wrong, because at that point the per-cpu structure should not be used yet, as smp_store_cpu_info() did not run yet.

But something else seems to be using the per-cpu structure of the BSP between its cpu_init() and smp_store_boot_cpu_info(). And that is cpu_has_amd_erratum(): It uses cpuinfo_x86.x86 to decide whether it needs to fall back to boot_cpu_data. Because collect_cpu_info_amd_early() has filled that field, but not .x86_vendor (which is still 0 == X86_VENDOR_INTEL), the errata are not applied to the BSP, and then something in ACPI gets stuck.

Does this diagnosis make sense / should I send a patch?

Torsten
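The fallback problem described in this message can be reproduced in a few lines of userspace C. This is a toy model under stated assumptions, not the kernel code: struct cpu_model, has_erratum_guess() and has_erratum_explicit() are invented names, the vendor constants only mirror the "0 == X86_VENDOR_INTEL" fact mentioned above (the AMD value is illustrative), and the erratum check is reduced to "AMD, family 0x10" instead of the real model/stepping ranges.

```c
#include <stdbool.h>

/* Illustrative vendor ids; only "Intel is 0" is taken from the thread. */
enum { VENDOR_INTEL = 0, VENDOR_AMD = 2 };

/* Hypothetical, heavily reduced stand-in for struct cpuinfo_x86. */
struct cpu_model {
	unsigned char x86;        /* CPU family, 0 while uninitialised */
	unsigned char x86_vendor; /* vendor id, 0 while uninitialised  */
};

/* Toy erratum test: applies to AMD family 0x10 only. */
static bool model_has_erratum(const struct cpu_model *c)
{
	return c->x86_vendor == VENDOR_AMD && c->x86 == 0x10;
}

/*
 * Guessing variant: use the per-cpu data unless its family field is still
 * zero, in which case fall back to the boot CPU data. A half-filled per-cpu
 * struct (family set, vendor still 0) defeats this guess.
 */
static bool has_erratum_guess(const struct cpu_model *percpu,
			      const struct cpu_model *boot)
{
	const struct cpu_model *c = (percpu->x86 == 0) ? boot : percpu;

	return model_has_erratum(c);
}

/* Explicit variant: the caller passes the struct it already trusts. */
static bool has_erratum_explicit(const struct cpu_model *c)
{
	return model_has_erratum(c);
}
```

With boot data {0x10, VENDOR_AMD}, the guessing variant reports the erratum for an all-zero per-cpu struct but misses it once only the family field has been filled in; the explicit variant is unaffected because it never looks at the per-cpu copy.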
Re: early microcode on amd is broken when no initramfs provided
On Sun, Jul 21, 2013 at 12:59 AM, Borislav Petkov <b...@alien8.de> wrote:
> On Sat, Jul 20, 2013 at 09:01:33PM +0200, Torsten Kaiser wrote:
>> On Tue, Jul 16, 2013 at 7:00 PM, Borislav Petkov <b...@alien8.de> wrote:
>>> On Thu, Jul 11, 2013 at 11:05:25PM +0200, Johannes Hirte wrote:
>>>> config is attached
>>>
>>> Ok, I can reproduce the hang with your config but even with:
>>>
>>> $ grep MICROCODE .config
>>> # CONFIG_MICROCODE is not set
>>> # CONFIG_MICROCODE_INTEL_EARLY is not set
>>> # CONFIG_MICROCODE_AMD_EARLY is not set
>>>
>>> which means, it cannot be microcode-related.
>>>
>>> And I'd bet if you wait a minute (yep, it should be exactly 60 seconds)
>>> the boot would probably continue. And if so, this is that 60 sec delay
>>> where the kernel tries to find firmware.
>>>
>>> Hmm...
>>
>> I have the same problem: Booting 3.11-rc1 hangs after the line:
>> ACPI: Executed 3 blocks of module-level executable AML code
>>
>> I bisected it down to the early microcode changes:
>> 757885e94a22bcc82beb9b1445c95218cb20ceab (the new early loading
>> implementation) and 6b3389ac21b5e557b957f1497d0ff22bf733e8c3 (small
>> fixup) completely fail to boot (No output beyond "Booting kernel") ,
>> from 275bbe2e299f1820ec8faa443d689469a9e6ecc5 ("Make
>> find_ucode_in_initrd() __init") I'm seeing this hang.
>>
>> Just turning CONFIG_MICROCODE_EARLY off solves the problem: The system
>> now sucessfully boots 3.11-rc1.
>
> Ok, I need to be able to reproduce that first - I wasn't that successful
> with Johannes' setup.
>
> So, can you please send .config and how you're loading your microcode?
> Is it in the initrd or are you doing that later, how? Grub entry please.
>
> Also, is it just plain v3.11-rc1 or with patches ontop?
>
> Also, /proc/cpuinfo please.

.config and cpuinfo are attached.

Microcode seems not to be loaded at all: for MICROCODE_EARLY I did not attach the needed file / cpio, and the normal update mechanism seems not to have a newer microcode than what the BIOS is providing.
I'm using a custom initrd, but that can't be used for MICROCODE_EARLY because it is compressed and does not contain an AuthenticAMD.bin. It also does not contain microcode_amd.bin, because I'm supplying that via CONFIG_EXTRA_FIRMWARE.

Grub entry:
title 3.11.0-rc1-crypt
root (hd0,0)
kernel (hd0,0)/boot/kernel-3.11.0-rc1 fastboot crypt_root=/dev/md6 video=1280x1024 radeon.dpm=1
initrd (hd0,0)/boot/ramfs-2011.gz
savedefault

I was using plain 3.11-rc1 except for the changes I made to debug this.

What I think you need: a system that is fatally affected by AMD Erratum 400 and a 64bit kernel.

From my debugging I found that the following sequence of events occurs on my system:

The BSP calls load_ucode_ap(). That calls collect_cpu_info_amd_early(), which fills the cpuinfo_x86.x86 and cpuinfo_x86.microcode fields of the per-cpu cpu_info structure that has not yet been set up.
Because this code is only used with MICROCODE_EARLY, disabling this option makes my system boot. OTOH this function is called regardless of whether AuthenticAMD.bin is available or not, that's why I'm hitting it even without the special cpio.

Then the BSP calls init_amd() to apply the errata fixes. That uses cpu_has_amd_erratum(), but that function does not use the cpuinfo_x86 that was supplied to init_amd() (and that is used for the following set_cpu_bug() if the erratum was found!). Instead it guesses on its own whether it should use the per-cpu data or boot_cpu_data, and it uses the not yet initialized per-cpu data for that guess.
That normally works fine, because the data is all zeroed out, but collect_cpu_info_amd_early() has filled ->x86, so cpu_has_amd_erratum() will use the partly filled per-cpu data instead of the correct boot_cpu_data.
But because collect_cpu_info_amd_early() did not fill ->x86_vendor, that field is still 0 == X86_VENDOR_INTEL and cpu_has_amd_erratum() will lie that no erratum is present.
So the C1E workaround is not applied, and as soon as ACPI enables this the boot hangs.

Something like the following (whitespace mangled by Gmail; if it looks OK to you, I will send it as a clean patch) fixes cpu_has_amd_erratum() for me, but I did not look at how the early microcode loading should work if AuthenticAMD.bin is available, so I can't offer a fix for the premature accesses to the per-cpu cpu_info.

--- 3.11-rc1/arch/x86/kernel/cpu/amd.c.orig	2013-07-21 05:42:42.130346496 +0200
+++ 3.11-rc1/arch/x86/kernel/cpu/amd.c	2013-07-21 05:45:09.420345843 +0200
@@ -512,7 +512,7 @@
 static const int amd_erratum_383[];
 static const int amd_erratum_400[];
-static bool cpu_has_amd_erratum(const int *erratum);
+static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum);

 static void __cpuinit init_amd(struct cpuinfo_x86 *c)
 {
@@ -729,11 +729,11 @@
 value &= ~(1ULL << 24);
 wrmsrl_safe(MSR_AMD64_BU_CFG2, value);
- if (cpu_has_amd_erratum(amd_erratum_383))
+ if (cpu_has_amd_erratum(c, amd_erratum_383))
 set_cpu_bug(c, X86_BUG_AMD_TLB_MMATCH
[PATCH]xfs: Fix xfs_swap_extents() after removal of xfs_flushinval_pages()
From: Torsten Kaiser <just.for.l...@googlemail.com>

Commit fb59581404ab7ec5075299065c22cb211a9262a9 removed xfs_flushinval_pages() and changed its callers to use filemap_write_and_wait() and truncate_pagecache_range() directly.
But in xfs_swap_extents() this change accidentally switched the argument from 'tip' to 'ip'. This patch switches it back to 'tip'.

Signed-off-by: Torsten Kaiser <just.for.l...@googlemail.com>

--- a/fs/xfs/xfs_dfrag.c
+++ b/fs/xfs/xfs_dfrag.c
@@ -246,10 +246,10 @@ xfs_swap_extents(
 		goto out_unlock;
 	}
 
-	error = -filemap_write_and_wait(VFS_I(ip)->i_mapping);
+	error = -filemap_write_and_wait(VFS_I(tip)->i_mapping);
 	if (error)
 		goto out_unlock;
-	truncate_pagecache_range(VFS_I(ip), 0, -1);
+	truncate_pagecache_range(VFS_I(tip), 0, -1);
 
 	/* Verify O_DIRECT for ftmp */
 	if (VN_CACHED(VFS_I(tip)) != 0) {
Re: Hang in md-raid1 with 3.7-rcX
On Tue, Nov 27, 2012 at 8:08 AM, Torsten Kaiser wrote:
> On Tue, Nov 27, 2012 at 2:05 AM, NeilBrown wrote:
>> Can you test to see if this fixes it?
>
> Patch applied, I will try to get it stuck again.
> I don't have a reliable reproducers, but if the problem persists I
> will definitly report back here.

With this patch I was not able to recreate the hang. Lacking a 100% reliable way of recreating it, I can't be completely sure of the fix, but as you understood from the code how this hang could happen, I'm quite confident that the fix is working.

(As I do not use the raid10 personality, patching only raid1.c was sufficient for me. I didn't test the version that also patched raid10.c, as it's not even compiled on my kernel.)

Thanks for the fix!

Torsten

>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index 636bae0..a0f7309 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -963,7 +963,7 @@ static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
>>  	struct r1conf *conf = mddev->private;
>>  	struct bio *bio;
>>
>> -	if (from_schedule) {
>> +	if (from_schedule || current->bio_list) {
>>  		spin_lock_irq(&conf->device_lock);
>>  		bio_list_merge(&conf->pending_bio_list, &plug->pending);
>>  		conf->pending_count += plug->pending_cnt;
>>
Re: Hang in md-raid1 with 3.7-rcX
On Tue, Nov 27, 2012 at 2:05 AM, NeilBrown wrote:
> On Sat, 24 Nov 2012 10:18:44 +0100 Torsten Kaiser wrote:
>
>> After my system got stuck with 3.7.0-rc2 as reported in
>> http://marc.info/?l=linux-kernel&m=135142236520624 LOCKDEP seemed to
>> blame XFS, because it found 2 possible deadlocks. But after these
>> locking issues were fixed, my system got stuck again with 3.7.0-rc6
>> as reported in http://marc.info/?l=linux-kernel&m=135344072325490
>> Dave Chinner thinks it's an issue within md, that it gets stuck and
>> that will then prevent any further xfs activity, and that I should
>> report it to the raid mailing list.
>>
>> The issue seems to be that multiple processes (kswapd0, xfsaild/md4
>> and flush-9:4) get stuck in md_super_wait() like this:
>> [816b1224] schedule+0x24/0x60
>> [814f9dad] md_super_wait+0x4d/0x80
>> [8105ca30] ? __init_waitqueue_head+0x60/0x60
>> [81500753] bitmap_unplug+0x173/0x180
>> [810b6acf] ? write_cache_pages+0x12f/0x420
>> [810b6700] ? set_page_dirty_lock+0x60/0x60
>> [814e8eb8] raid1_unplug+0x98/0x110
>> [81278a6d] blk_flush_plug_list+0xad/0x240
>> [81278c13] blk_finish_plug+0x13/0x50
>>
>> The full hung-tasks stack traces and the output from SysRq+W can be
>> found at http://marc.info/?l=linux-kernel&m=135344072325490 or in the
>> LKML thread 'Hang in XFS reclaim on 3.7.0-rc3'.
>
> Yes, it does look like an md bug
> Can you test to see if this fixes it?

Patch applied, I will try to get it stuck again.
I don't have a reliable reproducer, but if the problem persists I will
definitely report back here.

> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 636bae0..a0f7309 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -963,7 +963,7 @@ static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
>          struct r1conf *conf = mddev->private;
>          struct bio *bio;
>
> -        if (from_schedule) {
> +        if (from_schedule || current->bio_list) {
>                  spin_lock_irq(&conf->device_lock);
>                  bio_list_merge(&conf->pending_bio_list, &plug->pending);
>                  conf->pending_count += plug->pending_cnt;
>
>>
>> I tried to understand how this could happen, but I don't see anything
>> wrong. Only that md_super_wait() looks like an open coded version of
>> __wait_event() and could be replaced by using it.
>
> yeah. md_super_wait was much more complex back when we had to support
> barrier operations. When they were removed it was simplified a lot and as
> you say it could be simplified further. Patches welcome.

I guessed it predated that particular helper.
If you ask for a patch, I have one question: md_super_wait() looks like
__wait_event(), but there is also a wait_event() helper.
Would it be better to switch to wait_event()? It would add an
additional check for atomic_read(&mddev->pending_writes)==0 before
"allocating" and initialising the wait_queue_t, which I think would be
a correct optimization.

>> http://marc.info/?l=linux-raid&m=135283030027665 looks like the same
>> issue, but using ext4 instead of xfs.
>
> yes, sure does.
>
>>
>> My setup wrt. md is two normal sata disks on a normal ahci controller
>> (AMD SB850 southbridge).
>> Both disks are divided into 4 partitions and each one assembled into a
>> separate raid1.
>> One (md5) is used for swap, the others hold xfs filesystems for /boot/
>> (md4), / (md6) and /home/ (md7).
>>
>> I will try to provide any information you ask, but I can't reproduce
>> the hang on demand so gathering more information about that state is
>> not so easy, but I will try.
>
> I'm fairly confident the above patch will fix it, and in any case it fixes
> a real bug. So if you could just run with it and confirm in a week or so
> that the problem hasn't recurred, that might have to do.

I only had 2 or 3 hangs since 3.7-rc1, but I suspect that forcing the
system to swap (which lies on a raid1) plays a part in it. As the system
has 12GB of RAM it normally doesn't need to swap and I see no problem.
I will try these workloads again and hope that, if the problem persists,
I can trigger it again in the next few days...

Thanks for the patch,

Torsten

> Thanks,
> NeilBrown
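A note on the wait_event() question raised in this message: the difference can be illustrated with a small userspace model. This is only a sketch under stated assumptions. The names `toy_mddev`, `toy_schedule()`, `toy_wait_slow()` and `toy_wait_fast()` are hypothetical, nothing here is kernel code, and the single-threaded `toy_schedule()` merely pretends that one pending write completes each time the waiter would sleep. `toy_wait_slow()` follows the shape of the open-coded loop (set up the wait entry, then test the condition), while `toy_wait_fast()` adds the up-front condition check that wait_event() performs before entering that loop.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the two pieces of state that matter here. */
struct toy_mddev {
    int pending_writes;  /* models atomic_t mddev->pending_writes      */
    int waitq_setups;    /* counts how often a wait entry was prepared */
};

/* Models schedule(): pretend one outstanding write completes meanwhile. */
static void toy_schedule(struct toy_mddev *m)
{
    if (m->pending_writes > 0)
        m->pending_writes--;
}

/* Shape of the open-coded loop: prepare first, test afterwards. */
static void toy_wait_slow(struct toy_mddev *m)
{
    for (;;) {
        m->waitq_setups++;           /* models prepare_to_wait() */
        if (m->pending_writes == 0)
            break;
        toy_schedule(m);
    }
    /* finish_wait() would go here; it has no effect in this model. */
}

/* Shape of wait_event(): cheap test first, slow path only if needed. */
static void toy_wait_fast(struct toy_mddev *m)
{
    if (m->pending_writes == 0)
        return;
    toy_wait_slow(m);
}
```

With no writes pending, the slow variant still performs one wait-entry setup while the fast variant performs none, which is the saving the question is about. Whether that saving matters in md_super_wait() is not something this model can show.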
Hang in md-raid1 with 3.7-rcX
After my system got stuck with 3.7.0-rc2 as reported in
http://marc.info/?l=linux-kernel&m=135142236520624 LOCKDEP seemed to
blame XFS, because it found 2 possible deadlocks. But after these
locking issues were fixed, my system got stuck again with 3.7.0-rc6
as reported in http://marc.info/?l=linux-kernel&m=135344072325490
Dave Chinner thinks it's an issue within md, that it gets stuck and
that will then prevent any further xfs activity, and that I should
report it to the raid mailing list.

The issue seems to be that multiple processes (kswapd0, xfsaild/md4
and flush-9:4) get stuck in md_super_wait() like this:
[816b1224] schedule+0x24/0x60
[814f9dad] md_super_wait+0x4d/0x80
[8105ca30] ? __init_waitqueue_head+0x60/0x60
[81500753] bitmap_unplug+0x173/0x180
[810b6acf] ? write_cache_pages+0x12f/0x420
[810b6700] ? set_page_dirty_lock+0x60/0x60
[814e8eb8] raid1_unplug+0x98/0x110
[81278a6d] blk_flush_plug_list+0xad/0x240
[81278c13] blk_finish_plug+0x13/0x50

The full hung-tasks stack traces and the output from SysRq+W can be
found at http://marc.info/?l=linux-kernel&m=135344072325490 or in the
LKML thread 'Hang in XFS reclaim on 3.7.0-rc3'.

I tried to understand how this could happen, but I don't see anything
wrong. Only that md_super_wait() looks like an open coded version of
__wait_event() and could be replaced by using it.

http://marc.info/?l=linux-raid&m=135283030027665 looks like the same
issue, but using ext4 instead of xfs.

My setup wrt. md is two normal sata disks on a normal ahci controller
(AMD SB850 southbridge).
Both disks are divided into 4 partitions and each one assembled into a
separate raid1.
One (md5) is used for swap, the others hold xfs filesystems for /boot/
(md4), / (md6) and /home/ (md7).

I will try to provide any information you ask, but I can't reproduce
the hang on demand so gathering more information about that state is
not so easy, but I will try.

Thanks for looking into this,

Torsten
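The stack trace in this report shows raid1_unplug() calling bitmap_unplug(), which then blocks in md_super_wait(). One plausible reading of the `current->bio_list` test in the patch NeilBrown posted in reply is sketched below as a userspace toy model. It rests on an assumption that the thread itself does not spell out: a bio submitted while the task is already inside a make_request function (that is, while current->bio_list is set) is only parked on that per-task list and is not issued until the outer call returns, so waiting for it at that point cannot succeed. All names (`toy_task`, `toy_submit_write()`, `toy_unplug()`) are hypothetical and the model leaves out everything except that one ordering constraint.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-task and per-array state involved. */
struct toy_task {
    bool in_make_request;  /* models current->bio_list != NULL        */
    int  deferred_bios;    /* bios parked on the per-task list        */
    int  pending_writes;   /* models mddev->pending_writes            */
};

enum toy_result { TOY_DONE, TOY_HANDED_OFF, TOY_WOULD_DEADLOCK };

/* Models submitting one bitmap write and, if possible, its completion. */
static void toy_submit_write(struct toy_task *t)
{
    t->pending_writes++;
    if (t->in_make_request)
        t->deferred_bios++;    /* parked: cannot complete yet      */
    else
        t->pending_writes--;   /* issued and completed immediately */
}

/*
 * Models the decision at the top of the unplug callback. With
 * patched == false only from_schedule triggers the hand-off; with
 * patched == true the in_make_request case does too.
 */
static enum toy_result toy_unplug(struct toy_task *t, bool from_schedule,
                                  bool patched)
{
    if (from_schedule || (patched && t->in_make_request))
        return TOY_HANDED_OFF; /* leave the work to another thread */

    toy_submit_write(t);

    /* Models md_super_wait(): it only returns once nothing is pending. */
    if (t->pending_writes != 0)
        return TOY_WOULD_DEADLOCK;
    return TOY_DONE;
}
```

In this model the unpatched path reports a would-be deadlock when entered from inside a make_request function, and the patched path hands the work off instead. That matches the symptom in the trace, but it is an illustration of the idea, not a verification of the fix.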
Re: Hang in XFS reclaim on 3.7.0-rc3
On Tue, Nov 20, 2012 at 12:53 AM, Dave Chinner wrote:
> On Mon, Nov 19, 2012 at 07:50:06AM +0100, Torsten Kaiser wrote:
> So, both lockdep thingy's are the same:

I suspected this, but as the reports were slightly different I
attached both of them, as I couldn't decide which one was the
better/simpler report to debug this.

>> [110926.972482] =========================================================
>> [110926.972484] [ INFO: possible irq lock inversion dependency detected ]
>> [110926.972486] 3.7.0-rc4 #1 Not tainted
>> [110926.972487] ---------------------------------------------------------
>> [110926.972489] kswapd0/725 just changed the state of lock:
>> [110926.972490]  (sb_internal){.+.+.?}, at: [8122b268]
>> xfs_trans_alloc+0x28/0x50
>> [110926.972499] but this lock took another, RECLAIM_FS-unsafe lock in the
>> past:
>> [110926.972500]  (&(&ip->i_lock)->mr_lock/1){+.+.+.}
>
> Ah, what? Since when has the ilock been reclaim unsafe?
>
>> [110926.972500] and interrupts could create inverse lock ordering between
>> them.
>> [110926.972500]
>> [110926.972503]
>> [110926.972503] other info that might help us debug this:
>> [110926.972504]  Possible interrupt unsafe locking scenario:
>> [110926.972504]
>> [110926.972505]        CPU0                    CPU1
>> [110926.972506]        ----                    ----
>> [110926.972507]   lock(&(&ip->i_lock)->mr_lock/1);
>> [110926.972509]                                local_irq_disable();
>> [110926.972509]                                lock(sb_internal);
>> [110926.972511]                                lock(&(&ip->i_lock)->mr_lock/1);
>> [110926.972512]   <Interrupt>
>> [110926.972513]     lock(sb_internal);
>
> Um, that's just bizarre. No XFS code runs with interrupts disabled,
> so I cannot see how this is possible.
>
> .
>
>
>        [8108137e] mark_held_locks+0x7e/0x130
>        [81081a63] lockdep_trace_alloc+0x63/0xc0
>        [810e9dd5] kmem_cache_alloc+0x35/0xe0
>        [810dba31] vm_map_ram+0x271/0x770
>        [811e1316] _xfs_buf_map_pages+0x46/0xe0
>        [811e222a] xfs_buf_get_map+0x8a/0x130
>        [81233ab9] xfs_trans_get_buf_map+0xa9/0xd0
>        [8121bced] xfs_ialloc_inode_init+0xcd/0x1d0
>
> We shouldn't be mapping buffers there, there's a patch below to fix
> this. It's probably the source of this report, even though I cannot
> lockdep seems to be off with the fairies...

I also tried to understand what lockdep was saying, but
Documentation/lockdep-design.txt is not too helpful.
I think 'CLASS'-ON-R / -ON-W means that this lock was 'ON' / held
while 'CLASS' (HARDIRQ, SOFTIRQ, RECLAIM_FS) happened and that makes
this lock unsafe for these contexts. IN-'CLASS'-R / -W seems to be
'lock taken in context 'CLASS''.
A note in there that 'CLASS'-ON-? means 'CLASS'-unsafe would be
helpful to me...

Wrt. the above interrupt output:
I think lockdep doesn't really know about RECLAIM_FS and treats it as
another interrupt. I think that output should have been something like
this:

       CPU0                    CPU1
       ----                    ----
  lock(&(&ip->i_lock)->mr_lock/1);
                               <Allocation enters reclaim>
                               lock(sb_internal);
                               lock(&(&ip->i_lock)->mr_lock/1);
  <Allocation enters reclaim>
    lock(sb_internal);

Entering reclaim on CPU1 would mean that CPU1 would not enter reclaim
again, so the reclaim-'interrupt' would be disabled.
And instead of an interrupt disrupting the normal code flow on CPU0,
the allocation itself would be 'interrupted': instead of completing as
a normal allocation, it would be diverted into reclaiming memory.
print_irq_lock_scenario() would need to be taught to print a slightly
different message for reclaim-'interrupts'.

I will try your patch, but as I do not have a reliable reproducer to
create this lockdep report, I can't really verify if this fixes it.
But I will definitely mail you if it happens again with this patch.

Thanks, Torsten

> Cheers,
>
> Dave.
> --
> Dave Chinner
> da...@fromorbit.com
>
> xfs: inode allocation should use unmapped buffers.
>
> From: Dave Chinner <dchin...@redhat.com>
>
> Inode buffers do not need to be mapped as inodes are read or written
> directly from/to the pages underlying the buffer. This fixes a
> regression introduced by commit 611c994 ("xfs: make XBF_MAPPED the
> default behaviour").
>
> Signed-off-by: Dave Chinner <dchin...@redhat.com>
> ---
>  fs/xfs/xfs_ialloc.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_ialloc.c b/fs/xfs/xfs_ialloc.c
> index 2d6495e..a815412 100644
> --- a/fs/xfs/xfs_ialloc.c
> +++ b/fs/xfs/xfs_ialloc.c
> @@ -200,7 +200,8 @@ xfs_ialloc_inode_init(
>                   */
>                  d = XFS_AGB_TO_DADDR(mp, agno, agbno + (j * blks_per_cluster));
>                  fbuf = xfs_trans_get_buf(tp, mp->m_ddev_targp, d,
> -                                         mp->m_bsize * blks_per_cluster, 0);
> +                                         mp->m_bsize * blks_per_cluster
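The usage strings in the quoted report, such as `{.+.+.?}` and `{+.+.+.}`, can be decoded mechanically, which may help with the 'CLASS'-ON versus IN-'CLASS' question raised in this message. The sketch below assumes the layout that the lockdep documentation appears to describe for this kernel generation: six characters covering HARDIRQ, SOFTIRQ and RECLAIM_FS in that order, each with a write position followed by a read position, where '-' means the lock was acquired in that context (IN-STATE), '+' means it was held with that context enabled (STATE-ON), '?' means both, and '.' means neither. `toy_decode()` and `toy_state` are hypothetical names, not lockdep functions, and the braces around the usage string are expected to be stripped by the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed order of the three contexts in a six-character usage string. */
static const char *const toy_state[3] = { "HARDIRQ", "SOFTIRQ", "RECLAIM_FS" };

/*
 * Decode the character at position pos (0..5) of a usage string such as
 * ".+.+.?" into lockdep-style names, e.g. "RECLAIM_FS-ON-W".
 * Names are written to buf separated by a space.
 * Returns the number of names written (0, 1 or 2).
 */
static int toy_decode(const char *usage, size_t pos, char *buf, size_t len)
{
    const char *state;
    char rw, c;
    size_t used;
    int n = 0;

    if (len == 0)
        return 0;
    buf[0] = '\0';
    if (pos >= 6 || strlen(usage) != 6)
        return 0;

    state = toy_state[pos / 2];      /* two positions per context   */
    rw = (pos % 2) ? 'R' : 'W';      /* write first, then read      */
    c = usage[pos];

    if (c == '-' || c == '?') {      /* acquired in this context    */
        snprintf(buf, len, "IN-%s-%c", state, rw);
        n++;
    }
    if (c == '+' || c == '?') {      /* held with context enabled   */
        used = strlen(buf);
        snprintf(buf + used, len - used, "%s%s-ON-%c",
                 n ? " " : "", state, rw);
        n++;
    }
    return n;
}
```

Applied to `+.+.+.`, positions 0, 2 and 4 decode to HARDIRQ-ON-W, SOFTIRQ-ON-W and RECLAIM_FS-ON-W, which agrees with the "HARDIRQ-ON-W at:" and "SOFTIRQ-ON-W at:" headings elsewhere in this thread. The decoder only names the recorded states; it does not attempt to judge which combinations lockdep treats as conflicting.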
Re: Hang in XFS reclaim on 3.7.0-rc3
On Mon, Nov 19, 2012 at 12:51 AM, Dave Chinner wrote:
> On Sun, Nov 18, 2012 at 04:29:22PM +0100, Torsten Kaiser wrote:
>> On Sun, Nov 18, 2012 at 11:24 AM, Torsten Kaiser
>> wrote:
>> > On Tue, Oct 30, 2012 at 9:37 PM, Torsten Kaiser
>> > wrote:
>> >> I will keep LOCKDEP enabled on that system, and if there really is
>> >> another splat, I will report back here. But I rather doubt that this
>> >> will be needed.
>> >
>> > After the patch, I did not see this problem again, but today I found
>> > another LOCKDEP report that also looks XFS related.
>> > I found it twice in the logs, and as both were slightly different, I
>> > will attach both versions.
>>
>> > Nov 6 21:57:09 thoregon kernel: [ 9941.104353] 3.7.0-rc4 #1 Not tainted
>> > Nov 6 21:57:09 thoregon kernel: [ 9941.104355] inconsistent
>> > {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
>> > Nov 6 21:57:09 thoregon kernel: [ 9941.104430]        CPU0
>> > Nov 6 21:57:09 thoregon kernel: [ 9941.104431]        ----
>> > Nov 6 21:57:09 thoregon kernel: [ 9941.104432]
>> > lock(&(&ip->i_lock)->mr_lock);
>> > Nov 6 21:57:09 thoregon kernel: [ 9941.104433]   <Interrupt>
>> > Nov 6 21:57:09 thoregon kernel: [ 9941.104434]
>> > lock(&(&ip->i_lock)->mr_lock);
>> > Nov 6 21:57:09 thoregon kernel: [ 9941.104435]
>> > Nov 6 21:57:09 thoregon kernel: [ 9941.104435] *** DEADLOCK ***
>>
>> Sorry! Copied the wrong report. Your fix only landed in -rc5, so my
>> vanilla -rc4 did (also) report the old problem again.
>> And I copy that report instead of the second appearance of the
>> new problem.
>
> Can you repost it with line wrapping turned off? The output simply
> becomes unreadable when it wraps
>
> Yeah, I know I can put it back together, but I've got better things
> to do with my time than stitch a couple of hundred lines of debug
> back into a readable format

Sorry about that, but I can't find any option to turn that off in Gmail.

I have added the reports as attachments, I hope that's OK for you.

Thanks for looking into this.

Torsten

[110926.972477]
[110926.972482] =========================================================
[110926.972484] [ INFO: possible irq lock inversion dependency detected ]
[110926.972486] 3.7.0-rc4 #1 Not tainted
[110926.972487] ---------------------------------------------------------
[110926.972489] kswapd0/725 just changed the state of lock:
[110926.972490]  (sb_internal){.+.+.?}, at: [] xfs_trans_alloc+0x28/0x50
[110926.972499] but this lock took another, RECLAIM_FS-unsafe lock in the past:
[110926.972500]  (&(&ip->i_lock)->mr_lock/1){+.+.+.}
[110926.972500]
[110926.972500] and interrupts could create inverse lock ordering between them.
[110926.972500]
[110926.972503]
[110926.972503] other info that might help us debug this:
[110926.972504]  Possible interrupt unsafe locking scenario:
[110926.972504]
[110926.972505]        CPU0                    CPU1
[110926.972506]        ----                    ----
[110926.972507]   lock(&(&ip->i_lock)->mr_lock/1);
[110926.972509]                                local_irq_disable();
[110926.972509]                                lock(sb_internal);
[110926.972511]                                lock(&(&ip->i_lock)->mr_lock/1);
[110926.972512]   <Interrupt>
[110926.972513]     lock(sb_internal);
[110926.972514]
[110926.972514]  *** DEADLOCK ***
[110926.972514]
[110926.972516] 2 locks held by kswapd0/725:
[110926.972517]  #0:  (shrinker_rwsem){..}, at: [] shrink_slab+0x32/0x1f0
[110926.972522]  #1:  (&type->s_umount_key#20){.+}, at: [] grab_super_passive+0x3e/0x90
[110926.972527]
[110926.972527] the shortest dependencies between 2nd lock and 1st lock:
[110926.972533]  -> (&(&ip->i_lock)->mr_lock/1){+.+.+.} ops: 58117 {
[110926.972536]     HARDIRQ-ON-W at:
[110926.972537]       [] __lock_acquire+0x631/0x1c00
[110926.972540]       [] lock_acquire+0x55/0x70
[110926.972542]       [] down_write_nested+0x4a/0x70
[110926.972545]       [] xfs_ilock+0x84/0xb0
[110926.972548]       [] xfs_create+0x1d4/0x5a0
[110926.972550]       [] xfs_vn_mknod+0x8a/0x1b0
[110926.972552]       [] xfs_vn_create+0xe/0x10
[110926.972554]       [] vfs_create+0x72/0xc0
[110926.972556]       [] do_last.isra.69+0x80e/0xc80
[110926.972558]       [] path_openat.isra.70+0xab/0x490
[110926.972560]       [] do_filp_open+0x3d/0xa0
[110926.972562]       [] do_sys_open+0xf9/0x1e0
[110926.972565]       [] sys_open+0x1c/0x20
[110926.972567]       []
Re: Hang in XFS reclaim on 3.7.0-rc3
On Sun, Nov 18, 2012 at 11:24 AM, Torsten Kaiser wrote:
> On Tue, Oct 30, 2012 at 9:37 PM, Torsten Kaiser
> wrote:
>> I will keep LOCKDEP enabled on that system, and if there really is
>> another splat, I will report back here. But I rather doubt that this
>> will be needed.
>
> After the patch, I did not see this problem again, but today I found
> another LOCKDEP report that also looks XFS related.
> I found it twice in the logs, and as both were slightly different, I
> will attach both versions.

> Nov 6 21:57:09 thoregon kernel: [ 9941.104353] 3.7.0-rc4 #1 Not tainted
> Nov 6 21:57:09 thoregon kernel: [ 9941.104355] inconsistent
> {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
> Nov 6 21:57:09 thoregon kernel: [ 9941.104430]        CPU0
> Nov 6 21:57:09 thoregon kernel: [ 9941.104431]        ----
> Nov 6 21:57:09 thoregon kernel: [ 9941.104432]
> lock(&(&ip->i_lock)->mr_lock);
> Nov 6 21:57:09 thoregon kernel: [ 9941.104433]   <Interrupt>
> Nov 6 21:57:09 thoregon kernel: [ 9941.104434]
> lock(&(&ip->i_lock)->mr_lock);
> Nov 6 21:57:09 thoregon kernel: [ 9941.104435]
> Nov 6 21:57:09 thoregon kernel: [ 9941.104435] *** DEADLOCK ***

Sorry! Copied the wrong report. Your fix only landed in -rc5, so my
vanilla -rc4 did (also) report the old problem again.
And I copied that report instead of the second appearance of the
new problem.

Here is the correct second report of the sb_internal vs
ip->i_lock->mr_lock problem:

[110926.972477]
[110926.972482] =========================================================
[110926.972484] [ INFO: possible irq lock inversion dependency detected ]
[110926.972486] 3.7.0-rc4 #1 Not tainted
[110926.972487] ---------------------------------------------------------
[110926.972489] kswapd0/725 just changed the state of lock:
[110926.972490]  (sb_internal){.+.+.?}, at: [] xfs_trans_alloc+0x28/0x50
[110926.972499] but this lock took another, RECLAIM_FS-unsafe lock in the past:
[110926.972500]  (&(&ip->i_lock)->mr_lock/1){+.+.+.}
[110926.972500]
[110926.972500] and interrupts could create inverse lock ordering between them.
[110926.972500]
[110926.972503]
[110926.972503] other info that might help us debug this:
[110926.972504]  Possible interrupt unsafe locking scenario:
[110926.972504]
[110926.972505]        CPU0                    CPU1
[110926.972506]        ----                    ----
[110926.972507]   lock(&(&ip->i_lock)->mr_lock/1);
[110926.972509]                                local_irq_disable();
[110926.972509]                                lock(sb_internal);
[110926.972511]                                lock(&(&ip->i_lock)->mr_lock/1);
[110926.972512]   <Interrupt>
[110926.972513]     lock(sb_internal);
[110926.972514]
[110926.972514]  *** DEADLOCK ***
[110926.972514]
[110926.972516] 2 locks held by kswapd0/725:
[110926.972517]  #0:  (shrinker_rwsem){..}, at: [] shrink_slab+0x32/0x1f0
[110926.972522]  #1:  (&type->s_umount_key#20){.+}, at: [] grab_super_passive+0x3e/0x90
[110926.972527]
[110926.972527] the shortest dependencies between 2nd lock and 1st lock:
[110926.972533]  -> (&(&ip->i_lock)->mr_lock/1){+.+.+.} ops: 58117 {
[110926.972536]     HARDIRQ-ON-W at:
[110926.972537]       [] __lock_acquire+0x631/0x1c00
[110926.972540]       [] lock_acquire+0x55/0x70
[110926.972542]       [] down_write_nested+0x4a/0x70
[110926.972545]       [] xfs_ilock+0x84/0xb0
[110926.972548]       [] xfs_create+0x1d4/0x5a0
[110926.972550]       [] xfs_vn_mknod+0x8a/0x1b0
[110926.972552]       [] xfs_vn_create+0xe/0x10
[110926.972554]       [] vfs_create+0x72/0xc0
[110926.972556]       [] do_last.isra.69+0x80e/0xc80
[110926.972558]       [] path_openat.isra.70+0xab/0x490
[110926.972560]       [] do_filp_open+0x3d/0xa0
[110926.972562]       [] do_sys_open+0xf9/0x1e0
[110926.972565]       [] sys_open+0x1c/0x20
[110926.972567]       [] system_call_fastpath+0x16/0x1b
[110926.972570]     SOFTIRQ-ON-W at:
[110926.972571]       [] __lock_acquire+0x667/0x1c00
[110926.972573]       [] lock_acquire+0x55/0x70
[110926.972574]       [] down_write_nested+0x4a/0x70
[110926.972576]       [] xfs_ilock+0x84/0xb0
[110926.972578]       [] xfs_create+0x1d4/0x5a0
[110926.972580]       [] xfs_vn_mknod+0x8a/0x1b0
[110926.972581]       [] xfs_vn_create+0xe/0x10
[110926.972583]       [] vfs_create+0x72/0xc0
[110926.972585]       [] do_last.isra.69+0x80e/0xc80
[110926.972587]       [] path_openat.isra.70+0xab/0x490
[110926.972589]       [] do_filp_open+0x3d/0xa0
[110926.972591]       [] do_s
Re: Hang in XFS reclaim on 3.7.0-rc3
On Tue, Oct 30, 2012 at 9:37 PM, Torsten Kaiser wrote:
> I will keep LOCKDEP enabled on that system, and if there really is
> another splat, I will report back here. But I rather doubt that this
> will be needed.

After the patch, I did not see this problem again, but today I found
another LOCKDEP report that also looks XFS related.
I found it twice in the logs, and as both were slightly different, I
will attach both versions.

Nov 6 21:57:09 thoregon kernel: [ 9941.104345]
Nov 6 21:57:09 thoregon kernel: [ 9941.104350] =================================
Nov 6 21:57:09 thoregon kernel: [ 9941.104351] [ INFO: inconsistent lock state ]
Nov 6 21:57:09 thoregon kernel: [ 9941.104353] 3.7.0-rc4 #1 Not tainted
Nov 6 21:57:09 thoregon kernel: [ 9941.104354] ---------------------------------
Nov 6 21:57:09 thoregon kernel: [ 9941.104355] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
Nov 6 21:57:09 thoregon kernel: [ 9941.104357] kswapd0/725 [HC0[0]:SC0[0]:HE1:SE1] takes:
Nov 6 21:57:09 thoregon kernel: [ 9941.104359]  (&(&ip->i_lock)->mr_lock){?.}, at: [811e8164] xfs_ilock+0x84/0xb0
Nov 6 21:57:09 thoregon kernel: [ 9941.104366] {RECLAIM_FS-ON-W} state was registered at:
Nov 6 21:57:09 thoregon kernel: [ 9941.104367]   [8108137e] mark_held_locks+0x7e/0x130
Nov 6 21:57:09 thoregon kernel: [ 9941.104371]   [81081a63] lockdep_trace_alloc+0x63/0xc0
Nov 6 21:57:09 thoregon kernel: [ 9941.104373]   [810b5a55] __alloc_pages_nodemask+0x75/0x800
Nov 6 21:57:09 thoregon kernel: [ 9941.104375]   [810b6262] __get_free_pages+0x12/0x40
Nov 6 21:57:09 thoregon kernel: [ 9941.104377]   [8102d7f0] pte_alloc_one_kernel+0x10/0x20
Nov 6 21:57:09 thoregon kernel: [ 9941.104380]   [810cc3e6] __pte_alloc_kernel+0x16/0x90
Nov 6 21:57:09 thoregon kernel: [ 9941.104382]   [810d9f37] vmap_page_range_noflush+0x287/0x320
Nov 6 21:57:09 thoregon kernel: [ 9941.104385]   [810dbe54] vm_map_ram+0x694/0x770
Nov 6 21:57:09 thoregon kernel: [ 9941.104386]   [811e1316] _xfs_buf_map_pages+0x46/0xe0
Nov 6 21:57:09 thoregon kernel: [ 9941.104389]   [811e222a] xfs_buf_get_map+0x8a/0x130
Nov 6 21:57:09 thoregon kernel: [ 9941.104391]   [81233ab9] xfs_trans_get_buf_map+0xa9/0xd0
Nov 6 21:57:09 thoregon kernel: [ 9941.104393]   [8121e5a9] xfs_ifree_cluster+0x129/0x670
Nov 6 21:57:09 thoregon kernel: [ 9941.104396]   [8121fbc9] xfs_ifree+0xe9/0xf0
Nov 6 21:57:09 thoregon kernel: [ 9941.104398]   [811f4d2f] xfs_inactive+0x2af/0x480
Nov 6 21:57:09 thoregon kernel: [ 9941.104400]   [811efe00] xfs_fs_evict_inode+0x70/0x80
Nov 6 21:57:09 thoregon kernel: [ 9941.104402]   [8110cb8f] evict+0xaf/0x1b0
Nov 6 21:57:09 thoregon kernel: [ 9941.104405]   [8110cd95] iput+0x105/0x210
Nov 6 21:57:09 thoregon kernel: [ 9941.104406]   [81107ba0] d_delete+0x150/0x190
Nov 6 21:57:09 thoregon kernel: [ 9941.104408]   [810ff8a7] vfs_rmdir+0x107/0x120
Nov 6 21:57:09 thoregon kernel: [ 9941.104411]   [810ff9a4] do_rmdir+0xe4/0x130
Nov 6 21:57:09 thoregon kernel: [ 9941.104413]   [81101c01] sys_rmdir+0x11/0x20
Nov 6 21:57:09 thoregon kernel: [ 9941.104415]   [816b2d12] system_call_fastpath+0x16/0x1b
Nov 6 21:57:09 thoregon kernel: [ 9941.104417] irq event stamp: 18505
Nov 6 21:57:09 thoregon kernel: [ 9941.104418] hardirqs last enabled at (18505): [816aec5d] mutex_trylock+0xfd/0x170
Nov 6 21:57:09 thoregon kernel: [ 9941.104421] hardirqs last disabled at (18504): [816aeb9e] mutex_trylock+0x3e/0x170
Nov 6 21:57:09 thoregon kernel: [ 9941.104423] softirqs last enabled at (18492): [81042fb1] __do_softirq+0x111/0x170
Nov 6 21:57:09 thoregon kernel: [ 9941.104426] softirqs last disabled at (18477): [816b3e3c] call_softirq+0x1c/0x30
Nov 6 21:57:09 thoregon kernel: [ 9941.104428]
Nov 6 21:57:09 thoregon kernel: [ 9941.104428] other info that might help us debug this:
Nov 6 21:57:09 thoregon kernel: [ 9941.104429]  Possible unsafe locking scenario:
Nov 6 21:57:09 thoregon kernel: [ 9941.104429]
Nov 6 21:57:09 thoregon kernel: [ 9941.104430]        CPU0
Nov 6 21:57:09 thoregon kernel: [ 9941.104431]        ----
Nov 6 21:57:09 thoregon kernel: [ 9941.104432]   lock(&(&ip->i_lock)->mr_lock);
Nov 6 21:57:09 thoregon kernel: [ 9941.104433]   <Interrupt>
Nov 6 21:57:09 thoregon kernel: [ 9941.104434]     lock(&(&ip->i_lock)->mr_lock);
Nov 6 21:57:09 thoregon kernel: [ 9941.104435]
Nov 6 21:57:09 thoregon kernel: [ 9941.104435]  *** DEADLOCK ***
Nov 6 21:57:09 thoregon kernel: [ 9941.104435]
Nov 6 21:57:09 thoregon kernel: [ 9941.104437] 3 locks held by kswapd0/725:
Nov 6 21:57:09 thoregon kernel: [ 9941.104438]  #0:  (shrinker_rwsem){..}, at: [810bbd22] shrink_slab+0x32/0x1f0
Nov 6 21:57:09 thoregon kernel: [ 9941.104442]  #1:  (&type->s_umount_key#20){.+}, at: [810f5a8e] grab_super_passive+0x3e/0x90
Nov 6 21:57:09 thoregon kernel: [ 9941.104446]  #2:  (&pag->pag_ici_reclaim_lock){+.+...}, at: [] xfs_reclaim_inodes_ag+0xbc/0x4f0
Nov 6 21:57:09 thoregon kernel: [ 9941.104449]
Nov 6 21:57:09 thoregon kernel: [ 9941.104449] stack backtrace:
Nov 6 21:57:09 thoregon kernel: [ 9941.104451] Pid: 725, comm: kswapd0 Not tainted 3.7.0-rc4 #1
Nov 6 21:57:09 thoregon kernel: [ 9941.104452] Call Trace:
Nov 6 21:57:09 thoregon kernel: [ 9941.1044
Re: Hang in XFS reclaim on 3.7.0-rc3
On Tue, Oct 30, 2012 at 9:37 PM, Torsten Kaiser <just.for.l...@googlemail.com> wrote:
> I will keep LOCKDEP enabled on that system, and if there really is
> another splat, I will report back here. But I rather doubt that this
> will be needed.

After the patch, I did not see this problem again, but today I found
another LOCKDEP report that also looks XFS related.
I found it twice in the logs, and as both were slightly different, I
will attach both versions.

Nov 6 21:57:09 thoregon kernel: [ 9941.104345]
Nov 6 21:57:09 thoregon kernel: [ 9941.104350] =
Nov 6 21:57:09 thoregon kernel: [ 9941.104351] [ INFO: inconsistent lock state ]
Nov 6 21:57:09 thoregon kernel: [ 9941.104353] 3.7.0-rc4 #1 Not tainted
Nov 6 21:57:09 thoregon kernel: [ 9941.104354] -
Nov 6 21:57:09 thoregon kernel: [ 9941.104355] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
Nov 6 21:57:09 thoregon kernel: [ 9941.104357] kswapd0/725 [HC0[0]:SC0[0]:HE1:SE1] takes:
Nov 6 21:57:09 thoregon kernel: [ 9941.104359]  (&(&ip->i_lock)->mr_lock){?.}, at: [811e8164] xfs_ilock+0x84/0xb0
Nov 6 21:57:09 thoregon kernel: [ 9941.104366] {RECLAIM_FS-ON-W} state was registered at:
Nov 6 21:57:09 thoregon kernel: [ 9941.104367]   [8108137e] mark_held_locks+0x7e/0x130
Nov 6 21:57:09 thoregon kernel: [ 9941.104371]   [81081a63] lockdep_trace_alloc+0x63/0xc0
Nov 6 21:57:09 thoregon kernel: [ 9941.104373]   [810b5a55] __alloc_pages_nodemask+0x75/0x800
Nov 6 21:57:09 thoregon kernel: [ 9941.104375]   [810b6262] __get_free_pages+0x12/0x40
Nov 6 21:57:09 thoregon kernel: [ 9941.104377]   [8102d7f0] pte_alloc_one_kernel+0x10/0x20
Nov 6 21:57:09 thoregon kernel: [ 9941.104380]   [810cc3e6] __pte_alloc_kernel+0x16/0x90
Nov 6 21:57:09 thoregon kernel: [ 9941.104382]   [810d9f37] vmap_page_range_noflush+0x287/0x320
Nov 6 21:57:09 thoregon kernel: [ 9941.104385]   [810dbe54] vm_map_ram+0x694/0x770
Nov 6 21:57:09 thoregon kernel: [ 9941.104386]   [811e1316] _xfs_buf_map_pages+0x46/0xe0
Nov 6 21:57:09 thoregon kernel: [ 9941.104389]   [811e222a] xfs_buf_get_map+0x8a/0x130
Nov 6 21:57:09 thoregon kernel: [ 9941.104391]   [81233ab9] xfs_trans_get_buf_map+0xa9/0xd0
Nov 6 21:57:09 thoregon kernel: [ 9941.104393]   [8121e5a9] xfs_ifree_cluster+0x129/0x670
Nov 6 21:57:09 thoregon kernel: [ 9941.104396]   [8121fbc9] xfs_ifree+0xe9/0xf0
Nov 6 21:57:09 thoregon kernel: [ 9941.104398]   [811f4d2f] xfs_inactive+0x2af/0x480
Nov 6 21:57:09 thoregon kernel: [ 9941.104400]   [811efe00] xfs_fs_evict_inode+0x70/0x80
Nov 6 21:57:09 thoregon kernel: [ 9941.104402]   [8110cb8f] evict+0xaf/0x1b0
Nov 6 21:57:09 thoregon kernel: [ 9941.104405]   [8110cd95] iput+0x105/0x210
Nov 6 21:57:09 thoregon kernel: [ 9941.104406]   [81107ba0] d_delete+0x150/0x190
Nov 6 21:57:09 thoregon kernel: [ 9941.104408]   [810ff8a7] vfs_rmdir+0x107/0x120
Nov 6 21:57:09 thoregon kernel: [ 9941.104411]   [810ff9a4] do_rmdir+0xe4/0x130
Nov 6 21:57:09 thoregon kernel: [ 9941.104413]   [81101c01] sys_rmdir+0x11/0x20
Nov 6 21:57:09 thoregon kernel: [ 9941.104415]   [816b2d12] system_call_fastpath+0x16/0x1b
Nov 6 21:57:09 thoregon kernel: [ 9941.104417] irq event stamp: 18505
Nov 6 21:57:09 thoregon kernel: [ 9941.104418] hardirqs last enabled at (18505): [816aec5d] mutex_trylock+0xfd/0x170
Nov 6 21:57:09 thoregon kernel: [ 9941.104421] hardirqs last disabled at (18504): [816aeb9e] mutex_trylock+0x3e/0x170
Nov 6 21:57:09 thoregon kernel: [ 9941.104423] softirqs last enabled at (18492): [81042fb1] __do_softirq+0x111/0x170
Nov 6 21:57:09 thoregon kernel: [ 9941.104426] softirqs last disabled at (18477): [816b3e3c] call_softirq+0x1c/0x30
Nov 6 21:57:09 thoregon kernel: [ 9941.104428]
Nov 6 21:57:09 thoregon kernel: [ 9941.104428] other info that might help us debug this:
Nov 6 21:57:09 thoregon kernel: [ 9941.104429]  Possible unsafe locking scenario:
Nov 6 21:57:09 thoregon kernel: [ 9941.104429]
Nov 6 21:57:09 thoregon kernel: [ 9941.104430]        CPU0
Nov 6 21:57:09 thoregon kernel: [ 9941.104431]
Nov 6 21:57:09 thoregon kernel: [ 9941.104432]   lock(&(&ip->i_lock)->mr_lock);
Nov 6 21:57:09 thoregon kernel: [ 9941.104433]   <Interrupt>
Nov 6 21:57:09 thoregon kernel: [ 9941.104434]     lock(&(&ip->i_lock)->mr_lock);
Nov 6 21:57:09 thoregon kernel: [ 9941.104435]
Nov 6 21:57:09 thoregon kernel: [ 9941.104435]  *** DEADLOCK ***
Nov 6 21:57:09 thoregon kernel: [ 9941.104435]
Nov 6 21:57:09 thoregon kernel: [ 9941.104437] 3 locks held by kswapd0/725:
Nov 6 21:57:09 thoregon kernel: [ 9941.104438]  #0:  (shrinker_rwsem){..}, at: [810bbd22] shrink_slab+0x32/0x1f0
Nov 6 21:57:09 thoregon kernel: [ 9941.104442]  #1:  (&type->s_umount_key#20){.+}, at: [810f5a8e
Re: Hang in XFS reclaim on 3.7.0-rc3
On Sun, Nov 18, 2012 at 11:24 AM, Torsten Kaiser <just.for.l...@googlemail.com> wrote:
> On Tue, Oct 30, 2012 at 9:37 PM, Torsten Kaiser <just.for.l...@googlemail.com> wrote:
>> I will keep LOCKDEP enabled on that system, and if there really is
>> another splat, I will report back here. But I rather doubt that this
>> will be needed.
>
> After the patch, I did not see this problem again, but today I found
> another LOCKDEP report that also looks XFS related.
> I found it twice in the logs, and as both were slightly different, I
> will attach both versions.
>
> Nov 6 21:57:09 thoregon kernel: [ 9941.104353] 3.7.0-rc4 #1 Not tainted
> Nov 6 21:57:09 thoregon kernel: [ 9941.104355] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
> Nov 6 21:57:09 thoregon kernel: [ 9941.104430]        CPU0
> Nov 6 21:57:09 thoregon kernel: [ 9941.104431]
> Nov 6 21:57:09 thoregon kernel: [ 9941.104432]   lock(&(&ip->i_lock)->mr_lock);
> Nov 6 21:57:09 thoregon kernel: [ 9941.104433]   <Interrupt>
> Nov 6 21:57:09 thoregon kernel: [ 9941.104434]     lock(&(&ip->i_lock)->mr_lock);
> Nov 6 21:57:09 thoregon kernel: [ 9941.104435]
> Nov 6 21:57:09 thoregon kernel: [ 9941.104435]  *** DEADLOCK ***

Sorry! Copied the wrong report. Your fix only landed in -rc5, so my
vanilla -rc4 did (also) report the old problem again.
And I copypasted that report instead of the second appearance of the
new problem.

Here is the correct second report of the sb_internal vs
ip->i_lock->mr_lock problem:

[110926.972477]
[110926.972482] =
[110926.972484] [ INFO: possible irq lock inversion dependency detected ]
[110926.972486] 3.7.0-rc4 #1 Not tainted
[110926.972487] -
[110926.972489] kswapd0/725 just changed the state of lock:
[110926.972490]  (sb_internal){.+.+.?}, at: [8122b268] xfs_trans_alloc+0x28/0x50
[110926.972499] but this lock took another, RECLAIM_FS-unsafe lock in the past:
[110926.972500]  (&(&ip->i_lock)->mr_lock/1){+.+.+.}
[110926.972500]
[110926.972500] and interrupts could create inverse lock ordering between them.
[110926.972500]
[110926.972503]
[110926.972503] other info that might help us debug this:
[110926.972504]  Possible interrupt unsafe locking scenario:
[110926.972504]
[110926.972505]        CPU0                    CPU1
[110926.972506]
[110926.972507]   lock(&(&ip->i_lock)->mr_lock/1);
[110926.972509]                                local_irq_disable();
[110926.972509]                                lock(sb_internal);
[110926.972511]                                lock(&(&ip->i_lock)->mr_lock/1);
[110926.972512]   <Interrupt>
[110926.972513]     lock(sb_internal);
[110926.972514]
[110926.972514]  *** DEADLOCK ***
[110926.972514]
[110926.972516] 2 locks held by kswapd0/725:
[110926.972517]  #0:  (shrinker_rwsem){..}, at: [810bbd22] shrink_slab+0x32/0x1f0
[110926.972522]  #1:  (&type->s_umount_key#20){.+}, at: [810f5a8e] grab_super_passive+0x3e/0x90
[110926.972527]
[110926.972527] the shortest dependencies between 2nd lock and 1st lock:
[110926.972533]  -> (&(&ip->i_lock)->mr_lock/1){+.+.+.} ops: 58117 {
[110926.972536]     HARDIRQ-ON-W at:
[110926.972537]       [8107f091] __lock_acquire+0x631/0x1c00
[110926.972540]       [81080b55] lock_acquire+0x55/0x70
[110926.972542]       [8106126a] down_write_nested+0x4a/0x70
[110926.972545]       [811e8164] xfs_ilock+0x84/0xb0
[110926.972548]       [811f5194] xfs_create+0x1d4/0x5a0
[110926.972550]       [811eca1a] xfs_vn_mknod+0x8a/0x1b0
[110926.972552]       [811ecb6e] xfs_vn_create+0xe/0x10
[110926.972554]       [81100332] vfs_create+0x72/0xc0
[110926.972556]       [81100b8e] do_last.isra.69+0x80e/0xc80
[110926.972558]       [811010ab] path_openat.isra.70+0xab/0x490
[110926.972560]       [8110184d] do_filp_open+0x3d/0xa0
[110926.972562]       [810f2139] do_sys_open+0xf9/0x1e0
[110926.972565]       [810f223c] sys_open+0x1c/0x20
[110926.972567]       [816b2d12] system_call_fastpath+0x16/0x1b
[110926.972570]     SOFTIRQ-ON-W at:
[110926.972571]       [8107f0c7] __lock_acquire+0x667/0x1c00
[110926.972573]       [81080b55] lock_acquire+0x55/0x70
[110926.972574]       [8106126a] down_write_nested+0x4a/0x70
[110926.972576]       [811e8164] xfs_ilock+0x84/0xb0
[110926.972578]       [811f5194] xfs_create+0x1d4/0x5a0
[110926.972580]       [811eca1a] xfs_vn_mknod+0x8a/0x1b0
[110926.972581]       [811ecb6e] xfs_vn_create+0xe/0x10
[110926.972583
Re: Hang in XFS reclaim on 3.7.0-rc3
On Mon, Nov 19, 2012 at 12:51 AM, Dave Chinner <da...@fromorbit.com> wrote:
> On Sun, Nov 18, 2012 at 04:29:22PM +0100, Torsten Kaiser wrote:
>> On Sun, Nov 18, 2012 at 11:24 AM, Torsten Kaiser <just.for.l...@googlemail.com> wrote:
>>> On Tue, Oct 30, 2012 at 9:37 PM, Torsten Kaiser <just.for.l...@googlemail.com> wrote:
>>>> I will keep LOCKDEP enabled on that system, and if there really is
>>>> another splat, I will report back here. But I rather doubt that this
>>>> will be needed.
>>>
>>> After the patch, I did not see this problem again, but today I found
>>> another LOCKDEP report that also looks XFS related.
>>> I found it twice in the logs, and as both were slightly different, I
>>> will attach both versions.
>>>
>>> Nov 6 21:57:09 thoregon kernel: [ 9941.104353] 3.7.0-rc4 #1 Not tainted
>>> Nov 6 21:57:09 thoregon kernel: [ 9941.104355] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
>>> Nov 6 21:57:09 thoregon kernel: [ 9941.104430]        CPU0
>>> Nov 6 21:57:09 thoregon kernel: [ 9941.104431]
>>> Nov 6 21:57:09 thoregon kernel: [ 9941.104432]   lock(&(&ip->i_lock)->mr_lock);
>>> Nov 6 21:57:09 thoregon kernel: [ 9941.104433]   <Interrupt>
>>> Nov 6 21:57:09 thoregon kernel: [ 9941.104434]     lock(&(&ip->i_lock)->mr_lock);
>>> Nov 6 21:57:09 thoregon kernel: [ 9941.104435]
>>> Nov 6 21:57:09 thoregon kernel: [ 9941.104435]  *** DEADLOCK ***
>>
>> Sorry! Copied the wrong report. Your fix only landed in -rc5, so my
>> vanilla -rc4 did (also) report the old problem again.
>> And I copypasted that report instead of the second appearance of the
>> new problem.
>
> Can you repost it with line wrapping turned off? The output simply
> becomes unreadable when it wraps
>
> Yeah, I know I can put it back together, but I've got better things
> to do with my time than stitch a couple of hundred lines of debug
> back into a readable format

Sorry about that, but I can't find any option to turn that off in Gmail.

I have added the reports as attachment, I hope that's OK for you.

Thanks for looking into this.

Torsten

[110926.972477]
[110926.972482] =
[110926.972484] [ INFO: possible irq lock inversion dependency detected ]
[110926.972486] 3.7.0-rc4 #1 Not tainted
[110926.972487] -
[110926.972489] kswapd0/725 just changed the state of lock:
[110926.972490]  (sb_internal){.+.+.?}, at: [8122b268] xfs_trans_alloc+0x28/0x50
[110926.972499] but this lock took another, RECLAIM_FS-unsafe lock in the past:
[110926.972500]  (&(&ip->i_lock)->mr_lock/1){+.+.+.}
[110926.972500]
[110926.972500] and interrupts could create inverse lock ordering between them.
[110926.972500]
[110926.972503]
[110926.972503] other info that might help us debug this:
[110926.972504]  Possible interrupt unsafe locking scenario:
[110926.972504]
[110926.972505]        CPU0                    CPU1
[110926.972506]
[110926.972507]   lock(&(&ip->i_lock)->mr_lock/1);
[110926.972509]                                local_irq_disable();
[110926.972509]                                lock(sb_internal);
[110926.972511]                                lock(&(&ip->i_lock)->mr_lock/1);
[110926.972512]   <Interrupt>
[110926.972513]     lock(sb_internal);
[110926.972514]
[110926.972514]  *** DEADLOCK ***
[110926.972514]
[110926.972516] 2 locks held by kswapd0/725:
[110926.972517]  #0:  (shrinker_rwsem){..}, at: [810bbd22] shrink_slab+0x32/0x1f0
[110926.972522]  #1:  (&type->s_umount_key#20){.+}, at: [810f5a8e] grab_super_passive+0x3e/0x90
[110926.972527]
[110926.972527] the shortest dependencies between 2nd lock and 1st lock:
[110926.972533]  -> (&(&ip->i_lock)->mr_lock/1){+.+.+.} ops: 58117 {
[110926.972536]     HARDIRQ-ON-W at:
[110926.972537]       [8107f091] __lock_acquire+0x631/0x1c00
[110926.972540]       [81080b55] lock_acquire+0x55/0x70
[110926.972542]       [8106126a] down_write_nested+0x4a/0x70
[110926.972545]       [811e8164] xfs_ilock+0x84/0xb0
[110926.972548]       [811f5194] xfs_create+0x1d4/0x5a0
[110926.972550]       [811eca1a] xfs_vn_mknod+0x8a/0x1b0
[110926.972552]       [811ecb6e] xfs_vn_create+0xe/0x10
[110926.972554]       [81100332] vfs_create+0x72/0xc0
[110926.972556]       [81100b8e] do_last.isra.69+0x80e/0xc80
[110926.972558]       [811010ab] path_openat.isra.70+0xab/0x490
[110926.972560]       [8110184d] do_filp_open+0x3d/0xa0
[110926.972562]       [810f2139] do_sys_open+0xf9/0x1e0
[110926.972565]       [810f223c] sys_open+0x1c/0x20
[110926.972567]       [816b2d12] system_call_fastpath+0x16/0x1b
[110926.972570]     SOFTIRQ-ON-W at:
[110926.972571
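The request above to repost the report without line wrapping can also be handled on the receiving side. The following is only a rough sketch, not anything posted in the thread: it assumes the wrapped text contains bare dmesg timestamps of the form `[seconds.microseconds]` (as in the second report) and starts a new line at each one. The function names `timestamp_len`, `unwrap_dmesg` and `unwrap_demo` are made up for this illustration, and syslog prefixes such as `Nov 6 21:57:09 thoregon kernel:` would need extra handling.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the length of a dmesg timestamp "[ssss.uuuuuu]" starting at p
 * (optional space padding after '[', exactly six fractional digits),
 * or 0 if p does not start with one. Things like "[811e8164]" or
 * "[HC0[0]:SC0[0]:HE1:SE1]" are deliberately not matched.
 */
static size_t timestamp_len(const char *p)
{
    size_t i = 0, start;

    if (p[i] != '[')
        return 0;
    i++;
    while (p[i] == ' ')
        i++;
    start = i;
    while (isdigit((unsigned char)p[i]))
        i++;
    if (i == start || p[i] != '.')
        return 0;
    i++;
    start = i;
    while (isdigit((unsigned char)p[i]))
        i++;
    if (i - start != 6 || p[i] != ']')
        return 0;
    return i + 1;
}

/*
 * Copy `in` to `out`, beginning a new line in front of every timestamp
 * and dropping the spaces that preceded it. Output is truncated to fit
 * out_size. Returns the number of timestamps (log entries) seen.
 */
static size_t unwrap_dmesg(const char *in, char *out, size_t out_size)
{
    size_t n = 0, entries = 0;

    assert(out_size > 0);
    for (const char *p = in; *p != '\0'; p++) {
        if (timestamp_len(p) > 0) {
            while (n > 0 && out[n - 1] == ' ')
                n--;
            if (n > 0 && n + 1 < out_size)
                out[n++] = '\n';
            entries++;
        }
        if (n + 1 < out_size)
            out[n++] = *p;
    }
    while (n > 0 && out[n - 1] == ' ')
        n--;
    out[n] = '\0';
    return entries;
}

/* Usage example on a fragment of the report quoted in this thread. */
static void unwrap_demo(void)
{
    char out[128];
    size_t n = unwrap_dmesg(
        "[110926.972514] *** DEADLOCK *** [110926.972514]",
        out, sizeof(out));

    assert(n == 2);
    assert(strcmp(out,
        "[110926.972514] *** DEADLOCK ***\n[110926.972514]") == 0);
}
```

Splitting only on the timestamp pattern keeps bracketed addresses and lock-state annotations inside their original entry; the original indentation of each entry is lost by the wrapping and is not recovered here.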
Re: [PATCH 2/4] AMD64 EDAC: Add support for >255 memory controllers
On Wed, Oct 31, 2012 at 6:55 AM, Daniel J Blueman <dan...@numascale-asia.com> wrote:
> As the AMD64 last-level-cache ID is 16-bits and federated systems
> eg using Numascale's NumaConnect/NumaChip can have more than 255 memory
> controllers, use 16-bits to store the ID.
>
> Signed-off-by: Daniel J Blueman <dan...@numascale-asia.com>
> ---
>  drivers/edac/amd64_edac.c |   18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index 18d404a..9920dfd 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -942,7 +942,7 @@ static u64 get_error_address(struct mce *m)
>                 struct amd64_pvt *pvt;
>                 u64 cc6_base, tmp_addr;
>                 u32 tmp;
> -               u8 mce_nid, intlv_en;
> +               u16 mce_nid, intlv_en;

Is the change of intlv_en to u16 intentional? I assume it's not, because...

>                 if ((addr & GENMASK(24, 47)) >> 24 != 0x00fdf7)
>                         return addr;
> @@ -1499,7 +1499,7 @@ static int f1x_match_to_this_node(struct amd64_pvt *pvt, unsigned range,
>         u8 channel;
>         bool high_range = false;
>
> -       u8 node_id    = dram_dst_node(pvt, range);
> +       u16 node_id   = dram_dst_node(pvt, range);
>         u8 intlv_en   = dram_intlv_en(pvt, range);

... here you keep it at u8.

>         u32 intlv_sel = dram_intlv_sel(pvt, range);
>
> @@ -2306,7 +2306,7 @@ out:
>         return ret;
>  }
>
> -static int toggle_ecc_err_reporting(struct ecc_settings *s, u8 nid, bool on)
> +static int toggle_ecc_err_reporting(struct ecc_settings *s, u16 nid, bool on)
>  {
>         cpumask_var_t cmask;
>         int cpu;
> @@ -2344,7 +2344,7 @@ static int toggle_ecc_err_reporting(struct ecc_settings *s, u8 nid, bool on)
>         return 0;
>  }
>
> -static bool enable_ecc_error_reporting(struct ecc_settings *s, u8 nid,
> +static bool enable_ecc_error_reporting(struct ecc_settings *s, u16 nid,
>                                        struct pci_dev *F3)
>  {
>         bool ret = true;
> @@ -2396,7 +2396,7 @@ static bool enable_ecc_error_reporting(struct ecc_settings *s, u8 nid,
>         return ret;
>  }
>
> -static void restore_ecc_error_reporting(struct ecc_settings *s, u8 nid,
> +static void restore_ecc_error_reporting(struct ecc_settings *s, u16 nid,
>                                         struct pci_dev *F3)
>  {
>         u32 value, mask = 0x3;          /* UECC/CECC enable */
> @@ -2435,7 +2435,7 @@ static const char *ecc_msg =
>         "'ecc_enable_override'.\n"
>         " (Note that use of the override may cause unknown side effects.)\n";
>
> -static bool ecc_enabled(struct pci_dev *F3, u8 nid)
> +static bool ecc_enabled(struct pci_dev *F3, u16 nid)
>  {
>         u32 value;
>         u8 ecc_en = 0;
> @@ -2556,7 +2556,7 @@ static int amd64_init_one_instance(struct pci_dev *F2)
>         struct mem_ctl_info *mci = NULL;
>         struct edac_mc_layer layers[2];
>         int err = 0, ret;
> -       u8 nid = get_node_id(F2);
> +       u16 nid = get_node_id(F2);
>
>         ret = -ENOMEM;
>         pvt = kzalloc(sizeof(struct amd64_pvt), GFP_KERNEL);
> @@ -2647,7 +2647,7 @@ err_ret:
>  static int __devinit amd64_probe_one_instance(struct pci_dev *pdev,
>                                              const struct pci_device_id *mc_type)
>  {
> -       u8 nid = get_node_id(pdev);
> +       u16 nid = get_node_id(pdev);
>         struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
>         struct ecc_settings *s;
>         int ret = 0;
> @@ -2697,7 +2697,7 @@ static void __devexit amd64_remove_one_instance(struct pci_dev *pdev)
>  {
>         struct mem_ctl_info *mci;
>         struct amd64_pvt *pvt;
> -       u8 nid = get_node_id(pdev);
> +       u16 nid = get_node_id(pdev);
>         struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
>         struct ecc_settings *s = ecc_stngs[nid];
>
> --
> 1.7.9.5
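To see why the patch above widens the node ID at all, here is a small userspace sketch, not taken from the driver: it shows what happens when an ID above 255 is stored in an 8-bit variable. The typedefs mirror the kernel's `u8`/`u16` names, while `store_in_u8`, `store_in_u16`, `ids_collide_as_u8` and `narrowing_demo` are hypothetical helpers invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's fixed-width type names. */
typedef uint8_t u8;
typedef uint16_t u16;

/* Storing a 16-bit ID in a u8 keeps only its low 8 bits. */
static u8 store_in_u8(u16 id)
{
    return (u8)id;
}

/* Storing it in a u16 preserves the full value. */
static u16 store_in_u16(u16 id)
{
    return id;
}

/* Nonzero if two different IDs become indistinguishable as u8. */
static int ids_collide_as_u8(u16 a, u16 b)
{
    return a != b && store_in_u8(a) == store_in_u8(b);
}

/* Usage example: node 256 aliases node 0 once narrowed to 8 bits. */
static void narrowing_demo(void)
{
    assert(store_in_u8(256) == 0);
    assert(store_in_u16(256) == 256);
    assert(ids_collide_as_u8(0, 256));
}
```

With more than 255 memory controllers, an 8-bit ID would make two controllers look identical, which would be a problem for per-node lookups such as the `ecc_stngs[nid]` indexing seen in the quoted diff. The review comment is about a different variable: `intlv_en` does not hold a node ID, so widening it in one function while leaving it as `u8` in another looks accidental.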
Re: Hang in XFS reclaim on 3.7.0-rc3
On Mon, Oct 29, 2012 at 11:26 PM, Dave Chinner <da...@fromorbit.com> wrote:
> On Mon, Oct 29, 2012 at 09:03:15PM +0100, Torsten Kaiser wrote:
>> After experiencing a hang of all IO yesterday (
>> http://marc.info/?l=linux-kernel&m=135142236520624&w=2 ), I turned on
>> LOCKDEP after upgrading to -rc3.
>>
>> I then tried to replicate the load that hung yesterday and got the
>> following lockdep report, implicating XFS instead of by stacking swap
>> onto dm-crypt and md.
>>
>> [ 2844.971913]
>> [ 2844.971920] =
>> [ 2844.971921] [ INFO: inconsistent lock state ]
>> [ 2844.971924] 3.7.0-rc3 #1 Not tainted
>> [ 2844.971925] -
>> [ 2844.971927] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
>> [ 2844.971929] kswapd0/725 [HC0[0]:SC0[0]:HE1:SE1] takes:
>> [ 2844.971931]  (&(&ip->i_lock)->mr_lock){?.}, at: [811e7ef4] xfs_ilock+0x84/0xb0
>> [ 2844.971941] {RECLAIM_FS-ON-W} state was registered at:
>> [ 2844.971942]   [8108137e] mark_held_locks+0x7e/0x130
>> [ 2844.971947]   [81081a63] lockdep_trace_alloc+0x63/0xc0
>> [ 2844.971949]   [810e9dd5] kmem_cache_alloc+0x35/0xe0
>> [ 2844.971952]   [810dba31] vm_map_ram+0x271/0x770
>> [ 2844.971955]   [811e10a6] _xfs_buf_map_pages+0x46/0xe0
>> [ 2844.971959]   [811e1fba] xfs_buf_get_map+0x8a/0x130
>> [ 2844.971961]   [81233849] xfs_trans_get_buf_map+0xa9/0xd0
>> [ 2844.971964]   [8121e339] xfs_ifree_cluster+0x129/0x670
>> [ 2844.971967]   [8121f959] xfs_ifree+0xe9/0xf0
>> [ 2844.971969]   [811f4abf] xfs_inactive+0x2af/0x480
>> [ 2844.971972]   [811efb90] xfs_fs_evict_inode+0x70/0x80
>> [ 2844.971974]   [8110cb8f] evict+0xaf/0x1b0
>> [ 2844.971977]   [8110cd95] iput+0x105/0x210
>> [ 2844.971979]   [811070d0] dentry_iput+0xa0/0xe0
>> [ 2844.971981]   [81108310] dput+0x150/0x280
>> [ 2844.971983]   [811020fb] sys_renameat+0x21b/0x290
>> [ 2844.971986]   [81102186] sys_rename+0x16/0x20
>> [ 2844.971988]   [816b2292] system_call_fastpath+0x16/0x1b
>
> We shouldn't be mapping pages there. See if the patch below fixes
> it.

Applying your fix and rerunning my test workload did not trigger this
or any other LOCKDEP reports.
While I'm not 100% sure about my test case always hitting this, your
description makes me quite confident, that it really fixed this issue.

I will keep LOCKDEP enabled on that system, and if there really is
another splat, I will report back here. But I rather doubt that this
will be needed.

Thanks for the very quick fix!

Torsten

> Fundamentally, though, the lockdep warning has come about because
> vm_map_ram is doing a GFP_KERNEL allocation when we need it to be
> doing GFP_NOFS - we are within a transaction here, so memory reclaim
> is not allowed to recurse back into the filesystem.
>
> mm-folk: can we please get this vmalloc/gfp_flags passing API
> fixed once and for all? This is the fourth time in the last month or
> so that I've seen XFS bug reports with silent hangs and associated
> lockdep output that implicate GFP_KERNEL allocations from vm_map_ram
> in GFP_NOFS conditions as the potential cause
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> da...@fromorbit.com
>
> xfs: don't vmap inode cluster buffers during free
>
> From: Dave Chinner <dchin...@redhat.com>
>
> Signed-off-by: Dave Chinner <dchin...@redhat.com>
> ---
>  fs/xfs/xfs_inode.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index c4add46..82f6e5d 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -1781,7 +1781,8 @@ xfs_ifree_cluster(
>                  * to mark all the active inodes on the buffer stale.
>                  */
>                 bp = xfs_trans_get_buf(tp, mp->m_ddev_targp, blkno,
> -                                       mp->m_bsize * blks_per_cluster, 0);
> +                                       mp->m_bsize * blks_per_cluster,
> +                                       XBF_UNMAPPED);
>
>                 if (!bp)
>                         return ENOMEM;
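The recursion described above (an allocation made inside a transaction lets reclaim re-enter the filesystem, which then wants a lock the transaction already holds) can be illustrated with a toy userspace model. This is only a sketch under loud assumptions: the `TOY_*` flags, `struct toy_inode_lock` and all `toy_*` functions are invented for this example and are not kernel APIs, and the flag values are not the kernel's. Note also that the patch quoted above does not change any GFP flags; it avoids the mapping altogether by passing `XBF_UNMAPPED`.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy allocation flags; hypothetical values, not the kernel's. */
#define TOY_GFP_IO     0x1u
#define TOY_GFP_FS     0x2u
#define TOY_GFP_KERNEL (TOY_GFP_IO | TOY_GFP_FS)  /* reclaim may enter the fs */
#define TOY_GFP_NOFS   (TOY_GFP_IO)               /* reclaim must stay out */

struct toy_inode_lock {
    bool held;
};

enum toy_result {
    TOY_OK,
    TOY_WOULD_DEADLOCK
};

/* Toy filesystem reclaim: it needs the inode lock to make progress. */
static enum toy_result toy_fs_reclaim(const struct toy_inode_lock *lock)
{
    return lock->held ? TOY_WOULD_DEADLOCK : TOY_OK;
}

/*
 * Toy allocation under memory pressure: reclaim always runs, and it is
 * only allowed into the filesystem when the caller passed TOY_GFP_FS.
 */
static enum toy_result toy_alloc_under_pressure(unsigned int gfp,
                                                const struct toy_inode_lock *lock)
{
    if (gfp & TOY_GFP_FS)
        return toy_fs_reclaim(lock);
    return TOY_OK;
}

/* Toy transaction: takes the inode lock, then allocates with `gfp`. */
static enum toy_result toy_transaction(unsigned int gfp)
{
    struct toy_inode_lock lock = { .held = true };
    enum toy_result r = toy_alloc_under_pressure(gfp, &lock);

    lock.held = false;
    return r;
}

/* Usage example: only the FS-permitting allocation hits the held lock. */
static void toy_demo(void)
{
    assert(toy_transaction(TOY_GFP_KERNEL) == TOY_WOULD_DEADLOCK);
    assert(toy_transaction(TOY_GFP_NOFS) == TOY_OK);
}
```

The model reports the problem only when memory pressure is assumed on every allocation. In the real report, lockdep flags the possibility from the recorded lock states without the deadlock having to occur.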
Hang in XFS reclaim on 3.7.0-rc3
After experiencing a hang of all IO yesterday (
http://marc.info/?l=linux-kernel&m=135142236520624&w=2 ), I turned on
LOCKDEP after upgrading to -rc3.

I then tried to replicate the load that hung yesterday and got the
following lockdep report, implicating XFS instead of by stacking swap
onto dm-crypt and md.

Oct 29 20:27:11 thoregon kernel: [ 2675.571958] usb 7-2: USB disconnect, device number 2
Oct 29 20:30:01 thoregon kernel: [ 2844.971913]
Oct 29 20:30:01 thoregon kernel: [ 2844.971920] =
Oct 29 20:30:01 thoregon kernel: [ 2844.971921] [ INFO: inconsistent lock state ]
Oct 29 20:30:01 thoregon kernel: [ 2844.971924] 3.7.0-rc3 #1 Not tainted
Oct 29 20:30:01 thoregon kernel: [ 2844.971925] -
Oct 29 20:30:01 thoregon kernel: [ 2844.971927] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
Oct 29 20:30:01 thoregon kernel: [ 2844.971929] kswapd0/725 [HC0[0]:SC0[0]:HE1:SE1] takes:
Oct 29 20:30:01 thoregon kernel: [ 2844.971931]  (&(&ip->i_lock)->mr_lock){?.}, at: [811e7ef4] xfs_ilock+0x84/0xb0
Oct 29 20:30:01 thoregon kernel: [ 2844.971941] {RECLAIM_FS-ON-W} state was registered at:
Oct 29 20:30:01 thoregon kernel: [ 2844.971942]   [8108137e] mark_held_locks+0x7e/0x130
Oct 29 20:30:01 thoregon kernel: [ 2844.971947]   [81081a63] lockdep_trace_alloc+0x63/0xc0
Oct 29 20:30:01 thoregon kernel: [ 2844.971949]   [810e9dd5] kmem_cache_alloc+0x35/0xe0
Oct 29 20:30:01 thoregon kernel: [ 2844.971952]   [810dba31] vm_map_ram+0x271/0x770
Oct 29 20:30:01 thoregon kernel: [ 2844.971955]   [811e10a6] _xfs_buf_map_pages+0x46/0xe0
Oct 29 20:30:01 thoregon kernel: [ 2844.971959]   [811e1fba] xfs_buf_get_map+0x8a/0x130
Oct 29 20:30:01 thoregon kernel: [ 2844.971961]   [81233849] xfs_trans_get_buf_map+0xa9/0xd0
Oct 29 20:30:01 thoregon kernel: [ 2844.971964]   [8121e339] xfs_ifree_cluster+0x129/0x670
Oct 29 20:30:01 thoregon kernel: [ 2844.971967]   [8121f959] xfs_ifree+0xe9/0xf0
Oct 29 20:30:01 thoregon kernel: [ 2844.971969]   [811f4abf] xfs_inactive+0x2af/0x480
Oct 29 20:30:01 thoregon kernel: [ 2844.971972]   [811efb90] xfs_fs_evict_inode+0x70/0x80
Oct 29 20:30:01 thoregon kernel: [ 2844.971974]   [8110cb8f] evict+0xaf/0x1b0
Oct 29 20:30:01 thoregon kernel: [ 2844.971977]   [8110cd95] iput+0x105/0x210
Oct 29 20:30:01 thoregon kernel: [ 2844.971979]   [811070d0] dentry_iput+0xa0/0xe0
Oct 29 20:30:01 thoregon kernel: [ 2844.971981]   [81108310] dput+0x150/0x280
Oct 29 20:30:01 thoregon kernel: [ 2844.971983]   [811020fb] sys_renameat+0x21b/0x290
Oct 29 20:30:01 thoregon kernel: [ 2844.971986]   [81102186] sys_rename+0x16/0x20
Oct 29 20:30:01 thoregon kernel: [ 2844.971988]   [816b2292] system_call_fastpath+0x16/0x1b
Oct 29 20:30:01 thoregon kernel: [ 2844.971992] irq event stamp: 155377
Oct 29 20:30:01 thoregon kernel: [ 2844.971993] hardirqs last enabled at (155377): [816ae1ed] mutex_trylock+0xfd/0x170
Oct 29 20:30:01 thoregon kernel: [ 2844.971997] hardirqs last disabled at (155376): [816ae12e] mutex_trylock+0x3e/0x170
Oct 29 20:30:01 thoregon kernel: [ 2844.971999] softirqs last enabled at (155368): [81042fb1] __do_softirq+0x111/0x170
Oct 29 20:30:01 thoregon kernel: [ 2844.972002] softirqs last disabled at (155353): [816b33bc] call_softirq+0x1c/0x30
Oct 29 20:30:01 thoregon kernel: [ 2844.972004]
Oct 29 20:30:01 thoregon kernel: [ 2844.972004] other info that might help us debug this:
Oct 29 20:30:01 thoregon kernel: [ 2844.972006]  Possible unsafe locking scenario:
Oct 29 20:30:01 thoregon kernel: [ 2844.972006]
Oct 29 20:30:01 thoregon kernel: [ 2844.972007]        CPU0
Oct 29 20:30:01 thoregon kernel: [ 2844.972007]
Oct 29 20:30:01 thoregon kernel: [ 2844.972008]   lock(&(&ip->i_lock)->mr_lock);
Oct 29 20:30:01 thoregon kernel: [ 2844.972009]   <Interrupt>
Oct 29 20:30:01 thoregon kernel: [ 2844.972010]     lock(&(&ip->i_lock)->mr_lock);
Oct 29 20:30:01 thoregon kernel: [ 2844.972012]
Oct 29 20:30:01 thoregon kernel: [ 2844.972012]  *** DEADLOCK ***
Oct 29 20:30:01 thoregon kernel: [ 2844.972012]
Oct 29 20:30:01 thoregon kernel: [ 2844.972013] 3 locks held by kswapd0/725:
Oct 29 20:30:01 thoregon kernel: [ 2844.972014]  #0:  (shrinker_rwsem){..}, at: [810bbd22] shrink_slab+0x32/0x1f0
Oct 29 20:30:01 thoregon kernel: [ 2844.972020]  #1:  (&type->s_umount_key#20){.+}, at: [810f5a8e] grab_super_passive+0x3e/0x90
Oct 29 20:30:01 thoregon kernel: [ 2844.972024]  #2:  (&pag->pag_ici_reclaim_lock){+.+...}, at: [811f263c] xfs_reclaim_inodes_ag+0xbc/0x4f0
Oct 29 20:30:01 thoregon kernel: [ 2844.972027]
Oct 29 20:30:01 thoregon kernel: [ 2844.972027] stack backtrace:
Oct 29 20:30:01 thoregon kernel: [ 2844.972029] Pid: 725, comm: kswapd0 Not tainted 3.7.0-rc3 #1
Oct 29 20:30:01 thoregon kernel: [ 2844.972031] Call Trace:
Oct 29 20:30:01 thoregon kernel: [ 2844.972035]  [] print_usage_bug+0x1f5/0x206
Oct 29 20:30:01 thoregon kernel: [ 2844.972039]  [] ? save_stack_trace+0x2a/0x50
Oct 29 20:30:01 thoregon kernel: [ 2844.972042]  [] mark_lock+0x28d/0x2f0
Oct 29 20:30:01 thoregon kernel: [ 2844.972044]  [] ? print_irq_inversion_bug.part.37+0x1f0/0x1f0
Oct 29 20:30:01 thoregon kernel: [ 2844.972047]  [] __lock_acquire+0x57f/0x1c00
Oct 29 20:30:01 thoregon kernel: [ 2844.972049]  []
Hang in XFS reclaim on 3.7.0-rc3
After experiencing a hang of all IO yesterday ( http://marc.info/?l=linux-kernel&m=135142236520624&w=2 ), I turned on LOCKDEP after upgrading to -rc3.

I then tried to replicate the load that hung yesterday and got the following lockdep report, implicating XFS rather than my stacking of swap onto dm-crypt and md.

Oct 29 20:27:11 thoregon kernel: [ 2675.571958] usb 7-2: USB disconnect, device number 2
Oct 29 20:30:01 thoregon kernel: [ 2844.971913]
Oct 29 20:30:01 thoregon kernel: [ 2844.971920] =
Oct 29 20:30:01 thoregon kernel: [ 2844.971921] [ INFO: inconsistent lock state ]
Oct 29 20:30:01 thoregon kernel: [ 2844.971924] 3.7.0-rc3 #1 Not tainted
Oct 29 20:30:01 thoregon kernel: [ 2844.971925] -
Oct 29 20:30:01 thoregon kernel: [ 2844.971927] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
Oct 29 20:30:01 thoregon kernel: [ 2844.971929] kswapd0/725 [HC0[0]:SC0[0]:HE1:SE1] takes:
Oct 29 20:30:01 thoregon kernel: [ 2844.971931] (&(&ip->i_lock)->mr_lock){?.}, at: [811e7ef4] xfs_ilock+0x84/0xb0
Oct 29 20:30:01 thoregon kernel: [ 2844.971941] {RECLAIM_FS-ON-W} state was registered at:
Oct 29 20:30:01 thoregon kernel: [ 2844.971942] [8108137e] mark_held_locks+0x7e/0x130
Oct 29 20:30:01 thoregon kernel: [ 2844.971947] [81081a63] lockdep_trace_alloc+0x63/0xc0
Oct 29 20:30:01 thoregon kernel: [ 2844.971949] [810e9dd5] kmem_cache_alloc+0x35/0xe0
Oct 29 20:30:01 thoregon kernel: [ 2844.971952] [810dba31] vm_map_ram+0x271/0x770
Oct 29 20:30:01 thoregon kernel: [ 2844.971955] [811e10a6] _xfs_buf_map_pages+0x46/0xe0
Oct 29 20:30:01 thoregon kernel: [ 2844.971959] [811e1fba] xfs_buf_get_map+0x8a/0x130
Oct 29 20:30:01 thoregon kernel: [ 2844.971961] [81233849] xfs_trans_get_buf_map+0xa9/0xd0
Oct 29 20:30:01 thoregon kernel: [ 2844.971964] [8121e339] xfs_ifree_cluster+0x129/0x670
Oct 29 20:30:01 thoregon kernel: [ 2844.971967] [8121f959] xfs_ifree+0xe9/0xf0
Oct 29 20:30:01 thoregon kernel: [ 2844.971969] [811f4abf] xfs_inactive+0x2af/0x480
Oct 29 20:30:01 thoregon kernel: [ 2844.971972] [811efb90] xfs_fs_evict_inode+0x70/0x80
Oct 29 20:30:01 thoregon kernel: [ 2844.971974] [8110cb8f] evict+0xaf/0x1b0
Oct 29 20:30:01 thoregon kernel: [ 2844.971977] [8110cd95] iput+0x105/0x210
Oct 29 20:30:01 thoregon kernel: [ 2844.971979] [811070d0] dentry_iput+0xa0/0xe0
Oct 29 20:30:01 thoregon kernel: [ 2844.971981] [81108310] dput+0x150/0x280
Oct 29 20:30:01 thoregon kernel: [ 2844.971983] [811020fb] sys_renameat+0x21b/0x290
Oct 29 20:30:01 thoregon kernel: [ 2844.971986] [81102186] sys_rename+0x16/0x20
Oct 29 20:30:01 thoregon kernel: [ 2844.971988] [816b2292] system_call_fastpath+0x16/0x1b
Oct 29 20:30:01 thoregon kernel: [ 2844.971992] irq event stamp: 155377
Oct 29 20:30:01 thoregon kernel: [ 2844.971993] hardirqs last enabled at (155377): [816ae1ed] mutex_trylock+0xfd/0x170
Oct 29 20:30:01 thoregon kernel: [ 2844.971997] hardirqs last disabled at (155376): [816ae12e] mutex_trylock+0x3e/0x170
Oct 29 20:30:01 thoregon kernel: [ 2844.971999] softirqs last enabled at (155368): [81042fb1] __do_softirq+0x111/0x170
Oct 29 20:30:01 thoregon kernel: [ 2844.972002] softirqs last disabled at (155353): [816b33bc] call_softirq+0x1c/0x30
Oct 29 20:30:01 thoregon kernel: [ 2844.972004]
Oct 29 20:30:01 thoregon kernel: [ 2844.972004] other info that might help us debug this:
Oct 29 20:30:01 thoregon kernel: [ 2844.972006] Possible unsafe locking scenario:
Oct 29 20:30:01 thoregon kernel: [ 2844.972006]
Oct 29 20:30:01 thoregon kernel: [ 2844.972007] CPU0
Oct 29 20:30:01 thoregon kernel: [ 2844.972007]
Oct 29 20:30:01 thoregon kernel: [ 2844.972008] lock(&(&ip->i_lock)->mr_lock);
Oct 29 20:30:01 thoregon kernel: [ 2844.972009] <Interrupt>
Oct 29 20:30:01 thoregon kernel: [ 2844.972010] lock(&(&ip->i_lock)->mr_lock);
Oct 29 20:30:01 thoregon kernel: [ 2844.972012]
Oct 29 20:30:01 thoregon kernel: [ 2844.972012] *** DEADLOCK ***
Oct 29 20:30:01 thoregon kernel: [ 2844.972012]
Oct 29 20:30:01 thoregon kernel: [ 2844.972013] 3 locks held by kswapd0/725:
Oct 29 20:30:01 thoregon kernel: [ 2844.972014] #0: (shrinker_rwsem){..}, at: [810bbd22] shrink_slab+0x32/0x1f0
Oct 29 20:30:01 thoregon kernel: [ 2844.972020] #1: (&type->s_umount_key#20){.+}, at: [810f5a8e] grab_super_passive+0x3e/0x90
Oct 29 20:30:01 thoregon kernel: [ 2844.972024] #2: (&pag->pag_ici_reclaim_lock){+.+...}, at: [811f263c] xfs_reclaim_inodes_ag+0xbc/0x4f0
Oct 29 20:30:01 thoregon kernel: [ 2844.972027]
Oct 29 20:30:01 thoregon kernel: [ 2844.972027] stack backtrace:
Oct 29 20:30:01 thoregon kernel: [ 2844.972029] Pid: 725, comm: kswapd0 Not tainted 3.7.0-rc3 #1
Oct 29 20:30:01 thoregon kernel: [ 2844.972031] Call Trace:
Oct 29 20:30:01 thoregon kernel:
Hang with swap / mempool / md on 3.7.0-rc2
While 3.7.0-rc1 and -rc2 otherwise worked fine for me, today my system experienced a hang while trying to write to its disks.

The source of the problem seems to be a hang in kswapd0; after that, many more processes got stuck trying to do IO. Even an emergency sync via SysRq+S no longer completed.

The hang (that was still correctly logged to disk):
Oct 28 09:40:16 thoregon kernel: [141366.412179] INFO: task kswapd0:724 blocked for more than 120 seconds.
Oct 28 09:40:16 thoregon kernel: [141366.412186] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 28 09:40:16 thoregon kernel: [141366.412191] kswapd0 D 880337d112c0 0 724 2 0x
Oct 28 09:40:16 thoregon kernel: [141366.412200] 880329b8efa0 0046 0800 88032986d240
Oct 28 09:40:16 thoregon kernel: [141366.412210] 880329183fd8 880329183fd8 880329183fd8 880329b8efa0
Oct 28 09:40:16 thoregon kernel: [141366.412217] 0246 880329947680 880329947400
Oct 28 09:40:16 thoregon kernel: [141366.412224] Call Trace:
Oct 28 09:40:16 thoregon kernel: [141366.412239] [814a6fbd] ? md_super_wait+0x4d/0x80
Oct 28 09:40:16 thoregon kernel: [141366.412249] [81054340] ? add_wait_queue+0x60/0x60
Oct 28 09:40:16 thoregon kernel: [141366.412257] [814ad283] ? bitmap_unplug+0x153/0x160
Oct 28 09:40:16 thoregon kernel: [141366.412265] [810cb3dc] ? new_slab+0x1ec/0x220
Oct 28 09:40:16 thoregon kernel: [141366.412273] [81497fc8] ? raid1_unplug+0xb8/0x110
Oct 28 09:40:16 thoregon kernel: [141366.412281] [81238180] ? blk_flush_plug_list+0xb0/0x210
Oct 28 09:40:16 thoregon kernel: [141366.412288] [816289e2] ? io_schedule_timeout+0x82/0xf0
Oct 28 09:40:16 thoregon kernel: [141366.412296] [81097882] ? mempool_alloc+0x122/0x150
Oct 28 09:40:16 thoregon kernel: [141366.412302] [81054340] ? add_wait_queue+0x60/0x60
Oct 28 09:40:16 thoregon kernel: [141366.412309] [811025fe] ? bio_alloc_bioset+0x4e/0x120
Oct 28 09:40:16 thoregon kernel: [141366.412315] [81102802] ? bio_clone_bioset+0x12/0x40
Oct 28 09:40:16 thoregon kernel: [141366.412322] [8149be16] ? make_request+0x416/0xb70
Oct 28 09:40:16 thoregon kernel: [141366.412328] [810cb3dc] ? new_slab+0x1ec/0x220
Oct 28 09:40:16 thoregon kernel: [141366.412336] [8123b9e1] ? blk_recount_segments+0x21/0x40
Oct 28 09:40:16 thoregon kernel: [141366.412343] [814a0a0f] ? md_make_request+0xbf/0x1e0
Oct 28 09:40:16 thoregon kernel: [141366.412349] [81236d9a] ? generic_make_request+0xba/0xf0
Oct 28 09:40:16 thoregon kernel: [141366.412355] [81236e31] ? submit_bio+0x61/0x110
Oct 28 09:40:16 thoregon kernel: [141366.412363] [811a8415] ? _xfs_buf_ioapply+0x1e5/0x270
Oct 28 09:40:16 thoregon kernel: [141366.412370] [8105f3c0] ? try_to_wake_up+0x280/0x280
Oct 28 09:40:16 thoregon kernel: [141366.412377] [811a9535] ? xfs_buf_iorequest+0x25/0x40
Oct 28 09:40:16 thoregon kernel: [141366.412383] [811f2666] ? xlog_bdstrat+0x16/0x40
Oct 28 09:40:16 thoregon kernel: [141366.412389] [811f398d] ? xlog_sync+0x1bd/0x390
Oct 28 09:40:16 thoregon kernel: [141366.412394] [811f40e9] ? xlog_assign_tail_lsn_locked+0x19/0x50
Oct 28 09:40:16 thoregon kernel: [141366.412400] [811f4b64] ? xlog_write+0x554/0x6f0
Oct 28 09:40:16 thoregon kernel: [141366.412408] [811bd1f2] ? kmem_zone_zalloc+0x32/0x50
Oct 28 09:40:16 thoregon kernel: [141366.412415] [811f5fed] ? xlog_cil_push+0x26d/0x350
Oct 28 09:40:16 thoregon kernel: [141366.412421] [811f67b0] ? xlog_cil_force_lsn+0x130/0x140
Oct 28 09:40:16 thoregon kernel: [141366.412427] [81061b02] ? dequeue_task_fair+0x52/0x180
Oct 28 09:40:16 thoregon kernel: [141366.412433] [811f5157] ? _xfs_log_force_lsn+0x47/0x2d0
Oct 28 09:40:16 thoregon kernel: [141366.412439] [8162265a] ? __slab_free+0x17d/0x293
Oct 28 09:40:16 thoregon kernel: [141366.412446] [81087481] ? delayacct_end+0x81/0xa0
Oct 28 09:40:16 thoregon kernel: [141366.412452] [811f53eb] ? xfs_log_force_lsn+0xb/0x40
Oct 28 09:40:16 thoregon kernel: [141366.412458] [811e6733] ? xfs_iunpin_wait+0x93/0xf0
Oct 28 09:40:16 thoregon kernel: [141366.412465] [81054370] ? autoremove_wake_function+0x30/0x30
Oct 28 09:40:16 thoregon kernel: [141366.412471] [811b85db] ? xfs_reclaim_inode+0x11b/0x300
Oct 28 09:40:16 thoregon kernel: [141366.412478] [811b8d9b] ? xfs_reclaim_inodes_ag+0x1bb/0x2c0
Oct 28 09:40:16 thoregon kernel: [141366.412486] [811b8fbc] ? xfs_reclaim_inodes_nr+0x2c/0x40
Oct 28 09:40:16 thoregon kernel: [141366.412493] [810d80e3] ? prune_super+0x113/0x1b0
Oct 28 09:40:16 thoregon kernel: [141366.412499] [810a1829] ? shrink_slab+0x119/0x1c0
Oct 28 09:40:16 thoregon kernel: [141366.412506] [810a3d82] ? kswapd+0x682/0x9a0
Oct 28 09:40:16 thoregon kernel: [141366.412513] [81054340] ? add_wait_queue+0x60/0x60
Oct 28 09:40:16 thoregon kernel: [141366.412519] [] ? shrink_lruvec+0x540/0x540
Oct 28 09:40:16 thoregon kernel: [141366.412525] [] ? kthread+0xb3/0xc0
Oct 28 09:40:16 thoregon kernel: [141366.412531] [] ? flush_kthread_worker+0xa0/0xa0
Oct 28 09:40:16 thoregon kernel: [141366.412538] [] ? ret_from_fork+0x7c/0xb0
Oct 28 09:40:16 thoregon kernel: [141366.412544] [] ? flush_kthread_worker+0xa0/0xa0

After that, xfsaild/md4, flush-9:4 and several user processes also got such a hang message; these look like they just got stuck on
Re: Linux 2.6.25-rc2
On Feb 19, 2008 5:20 PM, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> So:
>  - it might be something else entirely
>  - it might still be the local cmpxchg, just Torsten didn't happen to
>    notice it until later.

My new hackbench-testcase also killed 2.6.24-rc2-mm1, so I really noticed too late.

>  - it might still be the local cmpxchg, but something else changed its
>    patterns to actually make it start triggering.
>
> and in general I don't think we should revert it unless we have stronger
> indications that it really is the problem (eg somebody finds the actual
> bug, or a reporter can confirm that it goes away when the local cmpxchg
> optimization is disabled).

I tried the following three patches:

switching the barrier() for a smp_mb() in 2.6.25-rc2-mm1:
-> crashed
reverting the FASTPATH-patch in 2.6.25-rc2:
-> worked
only removed FAST_CMPXCHG_LOCAL from arch/x86/Kconfig
-> worked

So all of these tests seem to confirm that the bug is in the new SLUB fastpath.

Torsten
Re: Linux 2.6.25-rc2
On Feb 19, 2008 7:11 AM, Ingo Molnar <[EMAIL PROTECTED]> wrote:
> * Torsten Kaiser <[EMAIL PROTECTED]> wrote:
> > On Feb 15, 2008 10:23 PM, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> > >
> > > Ok,
> > > this kernel is a winner.
> >
> > Sadly not for me:
> > [ 5282.056415] [ cut here ]
> > [ 5282.059757] kernel BUG at lib/list_debug.c:33!
> > [ 5282.062055] invalid opcode: [1] SMP
> > [ 5282.062055] CPU 3
>
> hm. Your crashes do seem to span multiple subsystems, but it always
> seems to be around the SLUB code. Could you try the patch below? The
> SLUB code has a new optimization and i'm not 100% sure about it. [the
> hack below switches the SLUB optimization off by disabling the CPU
> feature it relies on.]
>
>     Ingo
>
> ->
>  arch/x86/Kconfig |    4 ----
>  1 file changed, 4 deletions(-)
>
> Index: linux/arch/x86/Kconfig
> ===================================================================
> --- linux.orig/arch/x86/Kconfig
> +++ linux/arch/x86/Kconfig
> @@ -59,10 +59,6 @@ config HAVE_LATENCYTOP_SUPPORT
>  config SEMAPHORE_SLEEPERS
>      def_bool y
>
> -config FAST_CMPXCHG_LOCAL
> -    bool
> -    default y
> -
>  config MMU
>      def_bool y
>

$ grep FAST_CMPXCHG_LOCAL */.config
linux-2.6.24-rc2-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.24-rc3-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.24-rc3-mm2/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.24-rc6-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.24-rc8-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.25-rc1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.25-rc2-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.25-rc2/.config:CONFIG_FAST_CMPXCHG_LOCAL=y

-rc2-mm1 still worked for me.

Did you mean the new SLUB_FASTPATH?
$ grep "define SLUB_FASTPATH" */mm/slub.c
linux-2.6.25-rc1/mm/slub.c:#define SLUB_FASTPATH
linux-2.6.25-rc2-mm1/mm/slub.c:#define SLUB_FASTPATH
linux-2.6.25-rc2/mm/slub.c:#define SLUB_FASTPATH

The 2.6.24-rc3+ mm-kernels did crash for me, but don't seem to contain this...

On the other hand:
From the crash in 2.6.25-rc2-mm1:
[59987.116182] RIP [8029f83d] kmem_cache_alloc_node+0x6d/0xa0

(gdb) list *0x8029f83d
0x8029f83d is in kmem_cache_alloc_node (mm/slub.c:1646).
1641                    if (unlikely(is_end(object) || !node_match(c, node))) {
1642                            object = __slab_alloc(s, gfpflags, node, addr, c);
1643                            break;
1644                    }
1645                    stat(c, ALLOC_FASTPATH);
1646            } while (cmpxchg_local(&c->freelist, object, object[c->offset])
1647                                                            != object);
1648    #else
1649            unsigned long flags;
1650

That code is part of SLUB_FASTPATH.

I'm willing to test the patch, but don't know how fast I can find the time to do it, so my answer on whether your patch helps might be delayed until the weekend.

Torsten
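For readers unfamiliar with the pattern in the listing at mm/slub.c:1646, the following is a minimal userspace sketch of a compare-and-exchange retry loop that pops from a freelist. It is not the SLUB code: `struct free_obj`, `struct cpu_cache`, `fastpath_alloc` and `fastpath_free` are hypothetical names, and C11 atomics stand in for the kernel's `cmpxchg_local()`, which is only atomic with respect to the current CPU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical free object: its first word links to the next free object. */
struct free_obj {
    struct free_obj *next;
};

/* Hypothetical per-CPU cache; only the freelist head matters here. */
struct cpu_cache {
    _Atomic(struct free_obj *) freelist;
};

/*
 * Pop one object. The loop reads the head, computes the new head from it,
 * and installs the new head only if the head is still the value it read.
 * If anything changed the head in between, the exchange fails and the
 * loop retries with the freshly observed head.
 */
static struct free_obj *fastpath_alloc(struct cpu_cache *c)
{
    struct free_obj *object = atomic_load(&c->freelist);

    while (object != NULL) {
        struct free_obj *next = object->next;

        if (atomic_compare_exchange_weak(&c->freelist, &object, next))
            return object;
        /* On failure, object now holds the current head; try again. */
    }
    return NULL; /* empty list: a real allocator would take a slow path */
}

/* Push an object back onto the freelist with the same retry pattern. */
static void fastpath_free(struct cpu_cache *c, struct free_obj *object)
{
    struct free_obj *head = atomic_load(&c->freelist);

    do {
        object->next = head;
    } while (!atomic_compare_exchange_weak(&c->freelist, &head, object));
}
```

The sketch also shows why this kind of loop is delicate: between reading `object->next` and the exchange, nothing may free and reuse `object`, otherwise a stale `next` pointer ends up as the list head. Whether that is what went wrong in the reported crashes is not established in this thread.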
Re: Linux 2.6.25-rc2
On Feb 19, 2008 12:54 AM, Linus Torvalds <[EMAIL PROTECTED]> wrote:
>
>
> On Sat, 16 Feb 2008, Torsten Kaiser wrote:
> >
> > [ 5282.056415] [ cut here ]
> > [ 5282.059757] kernel BUG at lib/list_debug.c:33!
>
> Is there any chance that you could try to bisect this, if it's repeatable
> enough for you? Even if you can't bisect it *all* the way, it would be
> really good to do a handful of bisection runs which should already
> hopefully narrow it down a bit more.
>
>     Linus
>

It's repeatable, but not in a really reliable way. So to mark a kernel as good I need to compile around 100 KDE packages, and even then I'm not 100% sure whether it's good or I was just lucky.

But I did a partial bisect against 2.6.24-rc6-mm1:
2.6.24-rc6 + mm-patches up to (including) git.nfsd -> worked
2.6.24-rc6 + mm-patches up to (including) git.xfs -> crashed

I think the only patches added between rc2-mm1 and rc3-mm2 in that range were the iommu changes that I later ruled out.

That leaves some git trees as suspects:
git-ocfs2.patch
git-selinux.patch
git-s390.patch
git-sched.patch
git-sh.patch
git-scsi-misc.patch
git-unionfs.patch
git-v9fs.patch
git-watchdog.patch
git-wireless.patch
git-ipwireless_cs.patch
git-x86.patch
git-xfs.patch

(see http://marc.info/?l=linux-kernel&m=120276641105256 )

Torsten
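The partial bisect described above narrows an ordered patch series down between a known-good and a known-bad prefix. A minimal sketch of that search follows, assuming the series is applied strictly in order and that the test verdict is trustworthy; `first_bad_patch`, `crashes_fn` and `crashes_from` are hypothetical names, and the callback stands in for a real build-and-stress run.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical test callback: returns nonzero if a kernel built with the
 * first `applied` patches of the series crashes under the test load.
 */
typedef int (*crashes_fn)(size_t applied, void *ctx);

/*
 * Binary search over prefixes of an n-patch series.
 * Preconditions: applying 0 patches is good, applying all n is bad.
 * Returns the 1-based position of the first patch whose inclusion
 * makes the test crash.
 */
static size_t first_bad_patch(size_t n, crashes_fn crashes, void *ctx)
{
    size_t good = 0; /* largest prefix known to work */
    size_t bad = n;  /* smallest prefix known to crash */

    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        if (crashes(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* Stand-in for a real test run: crashes once patch number *ctx is applied. */
static int crashes_from(size_t applied, void *ctx)
{
    return applied >= *(size_t *)ctx;
}
```

With 13 candidate patches this needs at most four test runs. The caveat from the message applies in full: when a "worked" verdict can be mere luck, one false "good" sends the search into the wrong half, so each "good" step would need to be repeated before it is trusted.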
Re: Linux 2.6.25-rc2
On Feb 17, 2008 9:25 PM, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> There's the Bugzilla entry for it at
> http://bugzilla.kernel.org/show_bug.cgi?id=9973

Thank you.

> Please update it with the current information.

Crash for 2.6.25-rc2-mm1 added. That one had a complete stacktrace, but the trace looks like others I already reported, so no real new information... :-(

Torsten
Re: Linux 2.6.25-rc2
On Feb 15, 2008 10:23 PM, Linus Torvalds <[EMAIL PROTECTED]> wrote:
>
> Ok,
> this kernel is a winner.

Sadly not for me:
[ 5282.056415] [ cut here ]
[ 5282.059757] kernel BUG at lib/list_debug.c:33!
[ 5282.062055] invalid opcode: [1] SMP
[ 5282.062055] CPU 3
[ 5282.062055] Modules linked in: radeon drm w83792d ipv6 tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx tea5761 tvaudio msp3400 bttv videodev v4l1_compat ir_common compat_ioctl32 v4l2_common videobuf_dma_sg videobuf_core btcx_risc tveeprom usbhid pata_amd i2c_nforce2 hid sg
[ 5282.062055] Pid: 12937, comm: sed Not tainted 2.6.25-rc2 #1
[ 5282.062055] RIP: 0010:[803bffe4]
-> then the output from the serial console stopped. I was in X, so I could not see if there was anything more on the real console.

(gdb) list *0x803bffe4
0x803bffe4 is in __list_add (lib/list_debug.c:33).
28              }
29              if (unlikely(prev->next != next)) {
30                      printk(KERN_ERR "list_add corruption. prev->next should be "
31                              "next (%p), but was %p. (prev=%p).\n",
32                              next, prev->next, prev);
33                      BUG();
34              }
35              next->prev = new;
36              new->next = next;
37              new->prev = prev;

For more on this problem see http://marc.info/?l=linux-kernel&m=120293042005445

I will now try 2.6.25-rc2-mm1.

Torsten
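The check that fired at lib/list_debug.c:33 can be reproduced in userspace. The sketch below is a minimal illustration, not the kernel source: `checked_list_add` and `list_init` are hypothetical names, the function returns `false` instead of calling `BUG()`, and the first check (on `next->prev`) is assumed to mirror the one visible in the gdb listing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Circular doubly linked list node, in the style of the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/*
 * Insert `entry` between `prev` and `next`, but first verify that the two
 * neighbours still point at each other. If they do not, something has
 * overwritten a list pointer, so report it and leave the list untouched.
 */
static bool checked_list_add(struct list_head *entry,
                             struct list_head *prev,
                             struct list_head *next)
{
    if (next->prev != prev) {
        fprintf(stderr, "list_add corruption. next->prev should be "
                "prev (%p), but was %p. (next=%p).\n",
                (void *)prev, (void *)next->prev, (void *)next);
        return false;
    }
    if (prev->next != next) {
        fprintf(stderr, "list_add corruption. prev->next should be "
                "next (%p), but was %p. (prev=%p).\n",
                (void *)next, (void *)prev->next, (void *)prev);
        return false;
    }
    next->prev = entry;
    entry->next = next;
    entry->prev = prev;
    prev->next = entry;
    return true;
}
```

A check like this only detects corruption at the moment the list is next modified. The code that damaged the pointer may have run much earlier and in an unrelated subsystem, which fits crashes that show up in many different places.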
Re: Linux 2.6.25-rc1
On Feb 11, 2008 11:15 PM, Andrew Morton <[EMAIL PROTECTED]> wrote:
> On Mon, 11 Feb 2008 22:46:18 +0100
> "Torsten Kaiser" <[EMAIL PROTECTED]> wrote:
>
> > On Feb 11, 2008 1:44 AM, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> > > So give it all a good testing.
> >
> > My mm-mystery-crash has now sneaked into mainline:
>
> hm, I don't remember that.

Last report: http://marc.info/?l=linux-kernel&m=120129854023202

> > [ 1463.829078] BUG: unable to handle kernel NULL pointer dereference at 0378
> > [ 1463.832141] IP: [8047af18] ether1394_dg_complete+0x28/0xa0
> > [ 1463.834616] PGD 7955e067 PUD 7955d067 PMD 0
> > [ 1463.836148] Oops: [1] SMP
> > [ 1463.836148] CPU 0
> > [ 1463.836148] Modules linked in: radeon drm w83792d ipv6 tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx tea5761 tvaudio msp3400 bttv videodev v4l1_compat ir_common compat_ioctl32 v4l2_common videobuf_dma_sg videobuf_core btcx_risc usbhid tveeprom sg i2c_nforce2 hid pata_amd
> > [ 1463.836148] Pid: 519, comm: khpsbpkt Not tainted 2.6.25-rc1 #1
> > [ 1463.836148] RIP: 0010:[8047af18] [8047af18] ether1394_dg_complete+0x28/0xa0
> > [ 1463.836148] RSP: :81007eeb1e80 EFLAGS: 00010282
> > [ 1463.836148] RAX: RBX: RCX: 0001
> > [ 1463.836148] RDX: 81004bc62d80 RSI: RDI: 810051873e40
> > [ 1463.836148] RBP: 81007eeb1eb0 R08: R09: 0001
> > [ 1463.836148] R10: 0001 R11: 0001 R12: 810051873e40
> > [ 1463.836148] R13: 81007e1f7200 R14: 0001 R15: 810051873e40
> > [ 1463.836148] FS: 7f727d6d4700() GS:807e8000() knlGS:
> > [ 1463.836148] CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
> > [ 1463.836148] CR2: 0378 CR3: 79559000 CR4: 06e0
> > [ 1463.836148] DR0: DR1: DR2:
> > [ 1463.836148] DR3: DR6: 0ff0 DR7: 0400
> > [ 1463.836148] Process khpsbpkt (pid: 519, threadinfo 81007eeb, task 81007ee9e000)
> > [ 1463.836148] Stack: 81007eeb1e90 81004bc62b40 810051873e40
> > [ 1463.836148] 0001 81007eeb1ee0 8047b233
> > [ 1463.836148] 81007eeb1ec8 81007eeb1ef0 8046c280 81007ff6df10
> > [ 1463.836148] Call Trace:
> > [ 1463.836148] [8047b233] ether1394_complete_cb+0xb3/0xd0
> > [ 1463.836148] [8046c280] ? hpsbpkt_thread+0x0/0x140
> > [ 1463.836148] [8046c33b] hpsbpkt_thread+0xbb/0x140
> > [ 1463.836148] [8024aead] kthread+0x4d/0x80
> > [ 1463.836148] [8020c578] child_rip+0xa/0x12
> > [ 1463.836148] [8020bc8f] ? restore_args+0x0/0x31
> > [ 1463.836148] [8024ae60] ? kthread+0x0/0x80
> > [ 1463.836148] [8020c56e] ? child_rip+0x0/0x12
> > [ 1463.836148]
> > [ 1463.836148]
> > [ 1463.836148] Code: 00 00 00 55 48 89 e5 48 83 ec 30 48 89 5d d8 4c 89 75 f0 89 f3 4c 89 7d f8 4c 89 65 e0 49 89 ff 4c 89 6d e8 4c 8b 2f 49 8b 45 20 <4c> 8b a0 78 03 00 00 4d 8d b4 24 d0 00 00 00 4c 89 f7 e8 41 f0
> > [ 1463.836148] RIP [8047af18] ether1394_dg_complete+0x28/0xa0
> > [ 1463.836148] RSP 81007eeb1e80
> > [ 1463.836148] CR2: 0378
> > [ 1463.836208] ohci1394: fw-host0: Waking dma ctx=0 ... processing is probably too slow
> > [ 1463.839250] BUG: unable to handle kernel NULL pointer dereference at
> > [ 1463.841549] IP: [80296d1d] kmem_cache_alloc_node+0x6d/0xa0
> > [ 1463.842925] PGD 7955e067 PUD 7955d067 PMD 0
> > [ 1463.846148] Oops: [2] SMP
> > [ 1463.846148] CPU 0
> > [ 1463.846148] Modules linked in: radeon drm w83792d ipv6 tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx tea5761 tvaudio msp3400 bttv videodev v4l1_compat ir_common compat_ioctl32 v4l2_common videobuf_dma_sg videobuf_core btcx_risc usbhid tveeprom sg i2c_nforce2 hid pata_amd
> > [ 1463.846148] Pid: 519, comm: khpsbpkt Tainted: G D 2.6.25-rc1 #1
> > [ 1463.846148] RIP: 0010:[80296d1d] [80296d1d] kmem_cache_alloc_node+0x6d/0xa0
> > [ 1463.846148] RSP: :80871ae0 EFLAGS: 00010046
> > [ 1463.846148] RAX: RBX: 810001006820 RCX: 8052c549
> > [ 1463.846148] RDX: RSI: RDI: 807fbec0
> > [ 1463.846148] RBP: 80871b00 R08: 05e0 R09: ffc1
> > [ 1463.846148] R10: 0001 R11: R12:
> > [ 1463.846148] R13: 0020 R14: 0020 R15: 807fbec0
> > -> here the output from the serial console stopped.

[snip]

> But this is a crash inside the 1394 code. So if you're getting a crash
> with plain-old-ethernet then it is a different crash. It'd be good if we
> could see the oops trace for that one too please.

2.6.24-rc3-mm2: http://marc.info/?l=linux-kernel&m=119636996902805
Re: Linux 2.6.25-rc1
On Feb 11, 2008 1:44 AM, Linus Torvalds <[EMAIL PROTECTED]> wrote: > So give it all a good testing. My mm-mystery-crash has now sneaked into mainline: [ 1463.829078] BUG: unable to handle kernel NULL pointer dereference at 0378 [ 1463.832141] IP: [] ether1394_dg_complete+0x28/0xa0 [ 1463.834616] PGD 7955e067 PUD 7955d067 PMD 0 [ 1463.836148] Oops: [1] SMP [ 1463.836148] CPU 0 [ 1463.836148] Modules linked in: radeon drm w83792d ipv6 tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx tea5761 tvaudio msp3400 bttv videodev v4l1_compat ir_common compat_ioctl32 v4l2_common videobuf_dma_sg videobuf_core btcx_risc usbhid tveeprom sg i2c_nforce2 hid pata_amd [ 1463.836148] Pid: 519, comm: khpsbpkt Not tainted 2.6.25-rc1 #1 [ 1463.836148] RIP: 0010:[] [] ether1394_dg_complete+0x28/0xa0 [ 1463.836148] RSP: :81007eeb1e80 EFLAGS: 00010282 [ 1463.836148] RAX: RBX: RCX: 0001 [ 1463.836148] RDX: 81004bc62d80 RSI: RDI: 810051873e40 [ 1463.836148] RBP: 81007eeb1eb0 R08: R09: 0001 [ 1463.836148] R10: 0001 R11: 0001 R12: 810051873e40 [ 1463.836148] R13: 81007e1f7200 R14: 0001 R15: 810051873e40 [ 1463.836148] FS: 7f727d6d4700() GS:807e8000() knlGS: [ 1463.836148] CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b [ 1463.836148] CR2: 0378 CR3: 79559000 CR4: 06e0 [ 1463.836148] DR0: DR1: DR2: [ 1463.836148] DR3: DR6: 0ff0 DR7: 0400 [ 1463.836148] Process khpsbpkt (pid: 519, threadinfo 81007eeb, task 81007ee9e000) [ 1463.836148] Stack: 81007eeb1e90 81004bc62b40 810051873e40 [ 1463.836148] 0001 81007eeb1ee0 8047b233 [ 1463.836148] 81007eeb1ec8 81007eeb1ef0 8046c280 81007ff6df10 [ 1463.836148] Call Trace: [ 1463.836148] [] ether1394_complete_cb+0xb3/0xd0 [ 1463.836148] [] ? hpsbpkt_thread+0x0/0x140 [ 1463.836148] [] hpsbpkt_thread+0xbb/0x140 [ 1463.836148] [] kthread+0x4d/0x80 [ 1463.836148] [] child_rip+0xa/0x12 [ 1463.836148] [] ? restore_args+0x0/0x31 [ 1463.836148] [] ? kthread+0x0/0x80 [ 1463.836148] [] ? 
child_rip+0x0/0x12 [ 1463.836148] [ 1463.836148] [ 1463.836148] Code: 00 00 00 55 48 89 e5 48 83 ec 30 48 89 5d d8 4c 89 75 f0 89 f3 4c 89 7d f8 4c 89 65 e0 49 89 ff 4c 89 6d e8 4c 8b 2f 49 8b 45 20 <4c> 8b a0 78 03 00 00 4d 8d b4 24 d0 00 00 00 4c 89 f7 e8 41 f0 [ 1463.836148] RIP [] ether1394_dg_complete+0x28/0xa0 [ 1463.836148] RSP [ 1463.836148] CR2: 0378 [ 1463.836208] ohci1394: fw-host0: Waking dma ctx=0 ... processing is probably too slow [ 1463.839250] BUG: unable to handle kernel NULL pointer dereference at [ 1463.841549] IP: [] kmem_cache_alloc_node+0x6d/0xa0 [ 1463.842925] PGD 7955e067 PUD 7955d067 PMD 0 [ 1463.846148] Oops: [2] SMP [ 1463.846148] CPU 0 [ 1463.846148] Modules linked in: radeon drm w83792d ipv6 tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx tea5761 tvaudio msp3400 bttv videodev v4l1_compat ir_common compat_ioctl32 v4l2_common videobuf_dma_sg videobuf_core btcx_risc usbhid tveeprom sg i2c_nforce2 hid pata_amd [ 1463.846148] Pid: 519, comm: khpsbpkt Tainted: G D 2.6.25-rc1 #1 [ 1463.846148] RIP: 0010:[] [] kmem_cache_alloc_node+0x6d/0xa0 [ 1463.846148] RSP: :80871ae0 EFLAGS: 00010046 [ 1463.846148] RAX: RBX: 810001006820 RCX: 8052c549 [ 1463.846148] RDX: RSI: RDI: 807fbec0 [ 1463.846148] RBP: 80871b00 R08: 05e0 R09: ffc1 [ 1463.846148] R10: 0001 R11: R12: [ 1463.846148] R13: 0020 R14: 0020 R15: 807fbec0 -> here the output from the serial console stopped. Caps lock and Scroll lock where flashing again and as it hit a 'good' spot during the installing of the package this crash resulted in a corrupted ld.so.cache and damage several housekeeping files of the package manager. :-( Last good mm was 2.6.24-rc2-mm1, the next booting mm was 2.6.24-rc3-mm2 and that version had these "random" crashes. Last good mainline was 2.6.24-rc7 that I was testing with the new iommu patches that where added to 2.6.24-rc3-mm2. 
I did a partial bisect of 2.6.24-rc6-mm1 that narrowed it to this range:
2.6.24-rc6 + mm-patches up to (including) git.nfsd -> worked
2.6.24-rc6 + mm-patches up to (including) git.xfs -> crashed

I think the only patches added between rc2-mm1 and rc3-mm2 in that range were the iommu changes that I later ruled out. That leaves some git trees as suspects:
git-ocfs2.patch
git-selinux.patch
git-s390.patch
git-sched.patch
git-sh.patch
git-scsi-misc.patch
git-unionfs.patch
git-v9fs.patch
Re: 2.6.24-rc8-mm1
On Jan 17, 2008 11:35 AM, Andrew Morton <[EMAIL PROTECTED]> wrote:
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc8/2.6.24-rc8-mm1/

I'm still seeing my mystery-crash that I had since 2.6.24-rc3-mm2.

The crashed kernel was 2.6.24-rc8-mm1 with the following patches:
* personal fix for the "do_md_run returned -22"-problem
  I'm just moving the analyze_sbs(mddev); above the test.
* git-sched-fix-bug_on.patch
* hotfix-libata-scsi-corruption.patch

The crash (captured via serial console):
Jan 25 21:40:01 treogen cron[6553]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Jan 25 20:40:44 treogen syslog-ng[4839]: I/O error occurred while writing; fd='5', error='Input/output error (5)'
[ 1242.319555] [ cut here ]
[ 1242.319557] kernel BUG at lib/list_debug.c:33!
[ 1242.319558] invalid opcode: [1] SMP
[ 1242.319560] last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[ 1242.319562] CPU 3
[ 1242.319563] Modules linked in:

The cursor on the receiving machine stayed after the : in the last line, the crashed machine blinked caps lock and scroll lock.

I don't have a clue what the syslog-ng error is about or why this line is one hour too early. At 20:40 this kernel wasn't even built yet and syslog-ng started with the correct timezone:
Jan 25 21:26:26 treogen syslog-ng[4839]: syslog-ng starting up; version='2.0.6'

As I'm seeing this bug during times of both network and hard disk activity, could this be related to the problem discussed in the thread "[PATCH rc8-mm1] hotfix libata-scsi corruption"?
The line fixed in the mm-hotfix seems to be too new to cause this in -rc3-mm2, but these alignment problems seem to touch more than this and I'm not clear on how old this might be.
(If this matters: The crashing system is running the smartd daemon from smartmontools version 5.37)

I hope I will have time to try git-misc-tree on Sunday...
Torsten
Re: 2.6.24-rc6-mm1
Sorry for the *really* late answer, but I did not have any time to do linux things the last weeks. :-( On Jan 7, 2008 7:16 AM, FUJITA Tomonori <[EMAIL PROTECTED]> wrote: > On Sun, 6 Jan 2008 21:03:42 +0100 > "Torsten Kaiser" <[EMAIL PROTECTED]> wrote: > > On Jan 6, 2008 2:33 PM, FUJITA Tomonori <[EMAIL PROTECTED]> wrote: > > > On Sun, 6 Jan 2008 12:35:35 +0100 > > > "Torsten Kaiser" <[EMAIL PROTECTED]> wrote: > > > > On Jan 6, 2008 12:23 PM, FUJITA Tomonori <[EMAIL PROTECTED]> wrote: > > > > And double using something does fit with the errors I'm seeing... > > > > > > > > > Can you try the patch to revert my IOMMU changes? > > > > > > > > > > http://www.mail-archive.com/[EMAIL PROTECTED]/msg12694.html -> This is the revert-patch I'm talking about later > > > > Testing for this bug is a little bit slow, as I'm compiling ~100 > > > > packages trying to trigger it. > > > > If my current testrun with the patch from > > > > http://www.mail-archive.com/[EMAIL PROTECTED]/msg12702.html > > > > crashes, I will revert the hole IOMMU changes with above patch and try > > > > again. > > > > > > Thanks for testing, > > > > OK, I'm still testing this, but after 95 completed packages I'm rather > > certain that reverting the IOMMU changes with this patch fixes my > > problem. > > I didn't have time to look more into this, so I can't offer any > > concrete ideas where the bug is. Until my last mail from 7. Jan this was true, that I was not able to crash 2.6.24-rc6-mm1 with above patch. But after testing 2.6.24-rc7 with only the IOMMU changes applied it did crash once again. After looking at the patch that seems rather expected as it only touches powerpc code. (I only looked at its diffstat after testing it, so I was not aware of that fact during testing) > > If you send more patches, I'm willing to test them, but it might take > > some more time during the next week. > > Can you try 2.6.24-rc7 + the IOMMU changes? 
> > The patches are available at: > > http://www.kernel.org/pub/linux/kernel/people/tomo/iommu/ > > Or if you prefer the git tree: > > git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-misc.git > iommu-sg-fixes > > > > I've looked at the changes to GART but they are straightforward and > don't look wrong... The resulting 2.6.24-rc7 kernel worked for me. I compiled 146 packages without a crash. Today I finally had some time for debugging again and tried the new 2.6.24-rc8-mm1. The crash is still there, I will report that crash in current thread. Torsten
Re: 2.6.24-rc6-mm1
On Jan 6, 2008 2:33 PM, FUJITA Tomonori <[EMAIL PROTECTED]> wrote: > On Sun, 6 Jan 2008 12:35:35 +0100 > "Torsten Kaiser" <[EMAIL PROTECTED]> wrote: > > On Jan 6, 2008 12:23 PM, FUJITA Tomonori <[EMAIL PROTECTED]> wrote: > > > On Sun, 6 Jan 2008 11:41:10 +0100 > > > "Torsten Kaiser" <[EMAIL PROTECTED]> wrote: > > > > I will applie your patch and see if this hunk from > > > > find_next_zero_area() makes a difference: > > > > > > > >end = index + nr; > > > > - if (end > size) > > > > + if (end >= size) > > > > return -1; -> that might still have made a difference, but ... > > > > - for (i = index + 1; i < end; i++) { > > > > + for (i = index; i < end; i++) { ... as you say below, the test for the index position is only needed if index is modified after find_next_zero_bit(). > > > > if (test_bit(i, map)) { > > > > > > The patch should not make a difference for X86_64. > > > > Hmm... > > arch/x86/kernel/pci-gart_64.c: > > alloc_iommu() calls iommu_area_alloc() > > lib/iommu-helper.c: > > iommu_area_alloc() calls find_next_zero_area() > > -> so the above code should be called even on X86_64 > > Oops, I meant that the patch fixes the align allocation (non zero > align_mask case). X86_64 doesn't use the align allocation. > > > > And the change in the for loop means that 'index' will now be tested, > > but with the old code it was not. > > With the old code, 'index' is tested by find_next_zero_bit. > > With the new code and non zero align_mask case, 'index' is not tested > by find_next_zero_bit. So test_bit needs to start with 'index'. > > So If I understand the correctly, this patch should not make a > difference for x86_64 though I might miss something. You did not miss anything. After 18 packages my system crashed again. > > And double using something does fit with the errors I'm seeing... > > > > > Can you try the patch to revert my IOMMU changes? 
> > > > > > http://www.mail-archive.com/[EMAIL PROTECTED]/msg12694.html > > > > Testing for this bug is a little bit slow, as I'm compiling ~100 > > packages trying to trigger it. > > If my current testrun with the patch from > > http://www.mail-archive.com/[EMAIL PROTECTED]/msg12702.html > > crashes, I will revert the hole IOMMU changes with above patch and try > > again. > > Thanks for testing, OK, I'm still testing this, but after 95 completed packages I'm rather certain that reverting the IOMMU changes with this patch fixes my problem. I didn't have time to look more into this, so I can't offer any concrete ideas where the bug is. If you send more patches, I'm willing to test them, but it might take some more time during the next week. Thanks for looking into this. Torsten
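The point about the aligned allocation can be illustrated with a small userspace model. This is only a sketch under assumptions: the model_* names are hypothetical, the bit helpers are simplified stand-ins for the kernel's test_bit() and find_next_zero_bit(), and the rounding line is the one quoted from iommu_area_alloc() elsewhere in this thread. The test_index flag switches between a loop starting at index + 1 (old hunk) and one starting at index (patched hunk); the bounds check follows the patched hunk in both cases.

```c
#include <assert.h>
#include <limits.h>

#define MODEL_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Simplified stand-in for the kernel's test_bit(). */
static int model_test_bit(unsigned long nr, const unsigned long *map)
{
    return (int)((map[nr / MODEL_BITS_PER_WORD] >> (nr % MODEL_BITS_PER_WORD)) & 1UL);
}

/* Simplified stand-in for find_next_zero_bit(): returns size when nothing is free. */
static unsigned long model_find_next_zero_bit(const unsigned long *map,
                                              unsigned long size,
                                              unsigned long start)
{
    unsigned long i;

    for (i = start; i < size; i++)
        if (!model_test_bit(i, map))
            return i;
    return size;
}

/*
 * Hypothetical model of the area search with an alignment step.
 * test_index == 0: loop starts at index + 1 (as in the old hunk).
 * test_index != 0: loop starts at index (as in the patched hunk).
 */
static unsigned long model_area_aligned(const unsigned long *map,
                                        unsigned long size,
                                        unsigned long start,
                                        unsigned int nr,
                                        unsigned long align_mask,
                                        int test_index)
{
    unsigned long index, end, i;

again:
    index = model_find_next_zero_bit(map, size, start);
    /* Round up to a multiple of align_mask + 1; the result may land on a set bit. */
    index = (index + align_mask) & ~align_mask;
    end = index + nr;
    if (end >= size)
        return (unsigned long)-1;
    for (i = test_index ? index : index + 1; i < end; i++) {
        if (model_test_bit(i, map)) {
            start = i + 1;
            goto again;
        }
    }
    return index;
}

/* Usage example: bits 1 and 4 are in use, two free entries are requested. */
void model_aligned_self_check(void)
{
    unsigned long map[1] = { 0x12UL };

    /* align_mask == 0 (the x86_64 GART case): both variants agree. */
    assert(model_area_aligned(map, 16, 0, 2, 0, 0) == 2);
    assert(model_area_aligned(map, 16, 0, 2, 0, 1) == 2);
    /* align_mask == 3: the old loop hands out index 4 although bit 4 is set. */
    assert(model_area_aligned(map, 16, 0, 2, 3, 0) == 4);
    assert(model_area_aligned(map, 16, 0, 2, 3, 1) == 8);
}
```

In this model the two loop variants only differ when align_mask is non-zero, which matches the statement that the patch should not change behaviour for a caller that always passes 0.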
Re: 2.6.24-rc6-mm1
On Jan 6, 2008 12:23 PM, FUJITA Tomonori <[EMAIL PROTECTED]> wrote:
> On Sun, 6 Jan 2008 11:41:10 +0100
> "Torsten Kaiser" <[EMAIL PROTECTED]> wrote:
> > I will applie your patch and see if this hunk from
> > find_next_zero_area() makes a difference:
> >
> >    end = index + nr;
> > -  if (end > size)
> > +  if (end >= size)
> >            return -1;
> > -  for (i = index + 1; i < end; i++) {
> > +  for (i = index; i < end; i++) {
> >            if (test_bit(i, map)) {
>
> The patch should not make a difference for X86_64.

Hmm...
arch/x86/kernel/pci-gart_64.c:
alloc_iommu() calls iommu_area_alloc()
lib/iommu-helper.c:
iommu_area_alloc() calls find_next_zero_area()
-> so the above code should be called even on X86_64

And the change in the for loop means that 'index' will now be tested, but with the old code it was not.
And double using something does fit with the errors I'm seeing...

> Can you try the patch to revert my IOMMU changes?
>
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg12694.html

Testing for this bug is a little bit slow, as I'm compiling ~100 packages trying to trigger it.
If my current test run with the patch from
http://www.mail-archive.com/[EMAIL PROTECTED]/msg12702.html
crashes, I will revert the whole IOMMU changes with the above patch and try again.

Torsten
Re: 2.6.24-rc6-mm1
On Jan 6, 2008 4:28 AM, FUJITA Tomonori <[EMAIL PROTECTED]> wrote: > On Sat, 5 Jan 2008 17:25:24 -0800 > Andrew Morton <[EMAIL PROTECTED]> wrote: > > On Sat, 5 Jan 2008 23:10:17 +0100 "Torsten Kaiser" <[EMAIL PROTECTED]> > > wrote: > > > But the cause of my mail is the following question: > > > Regarding my "iommu-sg-merging-patches are new in -rc3-mm and could be > > > the cause"-suspicion I looked at these patches and came across these > > > hunks: > > > > > > This is removed from arch/x86/lib/bitstr_64.c: > > > -/* Find string of zero bits in a bitmap */ > > > -unsigned long > > > -find_next_zero_string(unsigned long *bitmap, long start, long nbits, int > > > len) > > > -{ > > > - unsigned long n, end, i; > > > - > > > - again: > > > - n = find_next_zero_bit(bitmap, nbits, start); > > > - if (n == -1) > > > - return -1; > > > - > > > - /* could test bitsliced, but it's hardly worth it */ > > > - end = n+len; > > > - if (end > nbits) > > > - return -1; > > > - for (i = n+1; i < end; i++) { > > > - if (test_bit(i, bitmap)) { > > > - start = i+1; > > > - goto again; > > > - } > > > - } > > > - return n; > > > -} > > > > > > This is added to lib/iommu-helper.c: > > > +static unsigned long find_next_zero_area(unsigned long *map, > > > +unsigned long size, > > > +unsigned long start, > > > +unsigned int nr) > > > +{ > > > + unsigned long index, end, i; > > > +again: > > > + index = find_next_zero_bit(map, size, start); > > > + end = index + nr; > > > + if (end > size) > > > + return -1; > > > + for (i = index + 1; i < end; i++) { > > > + if (test_bit(i, map)) { > > > + start = i+1; > > > + goto again; > > > + } > > > + } > > > + return index; > > > +} > > > > > > The old version checks, if find_next_zero_bit returns -1, the new > > > version doesn't do this. > > > Is this intended and can find_next_zero_bit never fail? > > > Hmm... 
but in the worst case it should only loop forever if I'm
> > > reading this right (index = -1 => for-loop counts from 0 to nr, if any
> > > bit is set it will jump to "again:" and if the next call to
> > > find_next_zero_bit also fails, its an endless loop)
> > find_next_zero_bit returns -1?
>
> It seems that x86_64 doesn't.

I'm sorry. I didn't look into find_next_zero_bit, I only noted that the old version did check for -1 and the new one didn't.
Obviously the old check was superfluous.

> POWER and SPARC64 IOMMUs use
> find_next_zero_bit too but both doesn't check if find_next_zero_bit
> returns -1. If find_next_zero_bit fails, it returns size. So it
> doesn't leads to an endless loop.

Yes, this can't happen. It was a wrong assumption on my part.

> But this patch has other bugs that break POWER IOMMUs.
>
> If you use the IOMMUs on POWER, please try the following patch:

I'm using CONFIG_GART_IOMMU=y on x86_64.

> http://www.mail-archive.com/[EMAIL PROTECTED]/msg12702.html

I also noted the line "index = (index + align_mask) & ~align_mask;" in iommu_area_alloc() and didn't understand what this was trying to do and how this should work, but as arch/x86/kernel/pci-gart_64.c always uses 0 as align_mask I just ignored it.

I will apply your patch and see if this hunk from find_next_zero_area() makes a difference:

   end = index + nr;
-  if (end > size)
+  if (end >= size)
           return -1;
-  for (i = index + 1; i < end; i++) {
+  for (i = index; i < end; i++) {
           if (test_bit(i, map)) {

Torsten
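The conclusion that a failed search cannot loop forever can be checked with a userspace model of the quoted find_next_zero_area() hunk. This is a sketch, not the kernel code: the model_* helpers are hypothetical stand-ins, and the only property taken from the thread is that find_next_zero_bit returns size, not -1, when no zero bit is left. With nr >= 1 that makes end larger than size, so the function returns -1 instead of jumping back to "again:".

```c
#include <assert.h>
#include <limits.h>

#define MODEL_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Simplified stand-in for the kernel's test_bit(). */
static int model_bit_is_set(unsigned long nr, const unsigned long *map)
{
    return (int)((map[nr / MODEL_WORD_BITS] >> (nr % MODEL_WORD_BITS)) & 1UL);
}

/* Stand-in for find_next_zero_bit(): failure is reported as size, not -1. */
static unsigned long model_next_zero_bit(const unsigned long *map,
                                         unsigned long size,
                                         unsigned long start)
{
    unsigned long i;

    for (i = start; i < size; i++)
        if (!model_bit_is_set(i, map))
            return i;
    return size;
}

/* Same control flow as the find_next_zero_area() hunk quoted in this thread. */
static unsigned long model_find_next_zero_area(const unsigned long *map,
                                               unsigned long size,
                                               unsigned long start,
                                               unsigned int nr)
{
    unsigned long index, end, i;

again:
    index = model_next_zero_bit(map, size, start);
    end = index + nr;
    if (end > size)              /* also taken when index == size and nr >= 1 */
        return (unsigned long)-1;
    for (i = index + 1; i < end; i++) {
        if (model_bit_is_set(i, map)) {
            start = i + 1;
            goto again;
        }
    }
    return index;
}

/* Usage example on a 16-entry map. */
void model_area_self_check(void)
{
    unsigned long full[1] = { 0xFFFFUL };   /* every entry in use */
    unsigned long holes[1] = { 0xF0F3UL };  /* free entries: 2-3 and 8-11 */

    assert(model_find_next_zero_area(full, 16, 0, 4) == (unsigned long)-1);
    assert(model_find_next_zero_area(holes, 16, 0, 2) == 2);
    assert(model_find_next_zero_area(holes, 16, 0, 4) == 8);
    assert(model_find_next_zero_area(holes, 16, 0, 5) == (unsigned long)-1);
}
```

Each retry moves start strictly forward, so the search either finds a free run or reaches size and stops; the explicit -1 check in the removed find_next_zero_string() was not needed for termination in this model.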
Re: 2.6.24-rc6-mm1
On Jan 6, 2008 9:27 AM, Jarek Poplawski <[EMAIL PROTECTED]> wrote: > On Sat, Jan 05, 2008 at 03:52:32PM +0100, Torsten Kaiser wrote: > ... > > So my personal conclusion would be, that someone is writing to memory > > that he no longer owns. Most probably 0-bytes. (the complete_routine > > got NULLed and the warning about dst->__refcnt being 0). > > > > Use-after-free or something else? > > I agree: your conclusion seems to be the most probable explanation for > this. Then it could be really hard to solve this without bisection or > something similar. But there is some probabability this something could > try kfree later too, but simply this list debugging triggers earlier. As for example in the case when it dies in ieee1394-thread the list is so corrupted that it will die anyway. But I might try this anyway, as I don't really have a better idee. > > > > If you think some other slub_debug might catch it, I would try this... > > You can try to add "U" to these other slub_debug options. As a matter > of fact, if your above diagnose is right, it seems you risk to damage > your system or even the box with these tests, so if you want to > continue, you should probably turn any possible debugging on (not in > mm only). I did not add U, because I thought that would only needed to trace memory leaks. And I hoped that using P (poison) would catch any later use (after free). > BTW, you've written that some debugging options seem to delay the bug. > Since they often change sizes of some structures than such wrong > writes could have some 'safer' offsets. So, this could really delay > e.g. these list's bugs, but maybe this could also let to stay 'alive' > to such wrong kfree? I think this bug is highly timing dependent. Its not always the same package that dies and as this is a SMP system I would guess two CPUs using the same data will trigger this. And using the poison-option will definitily slow the system down and mess up the timings. 
What also speaks against the 'safer' offsets is that after adding my notfreed-byte to skbuff the bug still triggered in the same way. I'm currently looking at http://www.mail-archive.com/[EMAIL PROTECTED]/msg12702.html, trying to understand if this is relevant for me on x86_64. Torsten
Re: 2.6.24-rc6-mm1
On Jan 6, 2008 4:28 AM, FUJITA Tomonori [EMAIL PROTECTED] wrote:
> On Sat, 5 Jan 2008 17:25:24 -0800 Andrew Morton [EMAIL PROTECTED] wrote:
> > On Sat, 5 Jan 2008 23:10:17 +0100 Torsten Kaiser [EMAIL PROTECTED] wrote:
> > > But the cause of my mail is the following question:
> > > Regarding my "iommu-sg-merging-patches are new in -rc3-mm and could be the cause"-suspicion I looked at these patches and came across these hunks:
> > >
> > > This is removed from arch/x86/lib/bitstr_64.c:
> > > -/* Find string of zero bits in a bitmap */
> > > -unsigned long
> > > -find_next_zero_string(unsigned long *bitmap, long start, long nbits, int len)
> > > -{
> > > -	unsigned long n, end, i;
> > > -
> > > - again:
> > > -	n = find_next_zero_bit(bitmap, nbits, start);
> > > -	if (n == -1)
> > > -		return -1;
> > > -
> > > -	/* could test bitsliced, but it's hardly worth it */
> > > -	end = n+len;
> > > -	if (end > nbits)
> > > -		return -1;
> > > -	for (i = n+1; i < end; i++) {
> > > -		if (test_bit(i, bitmap)) {
> > > -			start = i+1;
> > > -			goto again;
> > > -		}
> > > -	}
> > > -	return n;
> > > -}
> > >
> > > This is added to lib/iommu-helper.c:
> > > +static unsigned long find_next_zero_area(unsigned long *map,
> > > +					 unsigned long size,
> > > +					 unsigned long start,
> > > +					 unsigned int nr)
> > > +{
> > > +	unsigned long index, end, i;
> > > +again:
> > > +	index = find_next_zero_bit(map, size, start);
> > > +	end = index + nr;
> > > +	if (end > size)
> > > +		return -1;
> > > +	for (i = index + 1; i < end; i++) {
> > > +		if (test_bit(i, map)) {
> > > +			start = i+1;
> > > +			goto again;
> > > +		}
> > > +	}
> > > +	return index;
> > > +}
> > >
> > > The old version checks whether find_next_zero_bit returns -1; the new version doesn't do this.
> > > Is this intended and can find_next_zero_bit never fail?
> > > Hmm... but in the worst case it should only loop forever if I'm reading this right (index = -1 => for-loop counts from 0 to nr, if any bit is set it will jump to "again:" and if the next call to find_next_zero_bit also fails, it's an endless loop)
>
> find_next_zero_bit returns -1? It seems that x86_64 doesn't.

I'm sorry. I didn't look into find_next_zero_bit, I only noted that the old version did check for -1 and the new one didn't.
Obviously the old check was superfluous.

> POWER and SPARC64 IOMMUs use find_next_zero_bit too, but neither checks whether find_next_zero_bit returns -1. If find_next_zero_bit fails, it returns size. So it doesn't lead to an endless loop.

Yes, this can't happen. It was a wrong assumption on my part.

> But this patch has other bugs that break POWER IOMMUs. If you use the IOMMUs on POWER, please try the following patch:

I'm using CONFIG_GART_IOMMU=y on x86_64.

> http://www.mail-archive.com/[EMAIL PROTECTED]/msg12702.html

I also noted the line
	index = (index + align_mask) & ~align_mask;
in iommu_area_alloc() and didn't understand what this was trying to do and how this should work, but as arch/x86/kernel/pci-gart_64.c always uses 0 as align_mask I just ignored it.

I will apply your patch and see if this hunk from find_next_zero_area() makes a difference:

 	end = index + nr;
-	if (end > size)
+	if (end >= size)
 		return -1;
-	for (i = index + 1; i < end; i++) {
+	for (i = index; i < end; i++) {
 		if (test_bit(i, map)) {

Torsten
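The statement above, that find_next_zero_bit() returns size rather than -1 when no zero bit is left, can be tried out in userspace. The following is only a sketch: test_bit(), set_bit() and find_next_zero_bit() here are simplified stand-ins written for this illustration (they are not the kernel implementations), and only find_next_zero_area() follows the hunk quoted from lib/iommu-helper.c.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Simplified userspace stand-in for the kernel's test_bit(). */
static bool test_bit(unsigned long nr, const unsigned long *map)
{
	return (map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}

/* Simplified, non-atomic stand-in for the kernel's set_bit(). */
static void set_bit(unsigned long nr, unsigned long *map)
{
	map[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

/*
 * Naive stand-in for find_next_zero_bit(): returns the first clear bit
 * at or after 'start', or 'size' (not -1) when there is none.
 */
static unsigned long find_next_zero_bit(const unsigned long *map,
					unsigned long size,
					unsigned long start)
{
	unsigned long i;

	for (i = start; i < size; i++)
		if (!test_bit(i, map))
			return i;
	return size;
}

/* Same logic as the hunk added to lib/iommu-helper.c quoted above. */
static unsigned long find_next_zero_area(unsigned long *map,
					 unsigned long size,
					 unsigned long start,
					 unsigned int nr)
{
	unsigned long index, end, i;
again:
	index = find_next_zero_bit(map, size, start);
	end = index + nr;
	/* On failure index == size, so for nr > 0 this test ends the search. */
	if (end > size)
		return -1;
	for (i = index + 1; i < end; i++) {
		if (test_bit(i, map)) {
			start = i + 1;
			goto again;
		}
	}
	return index;
}
```

With a completely full bitmap the search returns -1 instead of looping, because index == size makes end exceed size for any nr > 0. That matches the explanation that the old "n == -1" check was superfluous.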
Re: 2.6.24-rc6-mm1
On Jan 6, 2008 12:23 PM, FUJITA Tomonori [EMAIL PROTECTED] wrote:
> On Sun, 6 Jan 2008 11:41:10 +0100 Torsten Kaiser [EMAIL PROTECTED] wrote:
> > I will apply your patch and see if this hunk from find_next_zero_area() makes a difference:
> >
> >  	end = index + nr;
> > -	if (end > size)
> > +	if (end >= size)
> >  		return -1;
> > -	for (i = index + 1; i < end; i++) {
> > +	for (i = index; i < end; i++) {
> >  		if (test_bit(i, map)) {
>
> The patch should not make a difference for X86_64.

Hmm...
arch/x86/kernel/pci-gart_64.c: alloc_iommu() calls iommu_area_alloc()
lib/iommu-helper.c: iommu_area_alloc() calls find_next_zero_area()
-> so the above code should be called even on X86_64

And the change in the for loop means that 'index' will now be tested, but with the old code it was not.
And double using something does fit with the errors I'm seeing...

> Can you try the patch to revert my IOMMU changes?
>
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg12694.html

Testing for this bug is a little bit slow, as I'm compiling ~100 packages trying to trigger it.
If my current test run with the patch from http://www.mail-archive.com/[EMAIL PROTECTED]/msg12702.html crashes, I will revert the whole IOMMU changes with the above patch and try again.

Torsten
Re: 2.6.24-rc6-mm1
On Jan 6, 2008 2:33 PM, FUJITA Tomonori [EMAIL PROTECTED] wrote:
> On Sun, 6 Jan 2008 12:35:35 +0100 Torsten Kaiser [EMAIL PROTECTED] wrote:
> > On Jan 6, 2008 12:23 PM, FUJITA Tomonori [EMAIL PROTECTED] wrote:
> > > On Sun, 6 Jan 2008 11:41:10 +0100 Torsten Kaiser [EMAIL PROTECTED] wrote:
> > > > I will apply your patch and see if this hunk from find_next_zero_area() makes a difference:
> > > >
> > > >  	end = index + nr;
> > > > -	if (end > size)
> > > > +	if (end >= size)
> > > >  		return -1;

-> that might still have made a difference, but ...

> > > > -	for (i = index + 1; i < end; i++) {
> > > > +	for (i = index; i < end; i++) {

... as you say below, the test for the index position is only needed if index is modified after find_next_zero_bit().

> > > >  		if (test_bit(i, map)) {
> > >
> > > The patch should not make a difference for X86_64.
> >
> > Hmm...
> > arch/x86/kernel/pci-gart_64.c: alloc_iommu() calls iommu_area_alloc()
> > lib/iommu-helper.c: iommu_area_alloc() calls find_next_zero_area()
> > -> so the above code should be called even on X86_64
>
> Oops, I meant that the patch fixes the align allocation (non zero align_mask case). X86_64 doesn't use the align allocation.
>
> > And the change in the for loop means that 'index' will now be tested, but with the old code it was not.
>
> With the old code, 'index' is tested by find_next_zero_bit. With the new code and non zero align_mask case, 'index' is not tested by find_next_zero_bit. So test_bit needs to start with 'index'.
>
> So if I understand correctly, this patch should not make a difference for x86_64, though I might miss something.

You did not miss anything. After 18 packages my system crashed again.

> > And double using something does fit with the errors I'm seeing...
> >
> > > Can you try the patch to revert my IOMMU changes?
> > >
> > > http://www.mail-archive.com/[EMAIL PROTECTED]/msg12694.html
> >
> > Testing for this bug is a little bit slow, as I'm compiling ~100 packages trying to trigger it.
> > If my current test run with the patch from http://www.mail-archive.com/[EMAIL PROTECTED]/msg12702.html crashes, I will revert the whole IOMMU changes with the above patch and try again.
>
> Thanks for testing,

OK, I'm still testing this, but after 95 completed packages I'm rather certain that reverting the IOMMU changes with this patch fixes my problem.

I didn't have time to look more into this, so I can't offer any concrete ideas where the bug is.
If you send more patches, I'm willing to test them, but it might take some more time during the next week.

Thanks for looking into this.

Torsten
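The argument about why test_bit() must start at 'index' once alignment is involved can be illustrated with a small userspace sketch. Everything below is an assumption made for illustration: the bit helpers are simplified stand-ins, and find_zero_area_aligned() with its check_index switch is a hypothetical function that selects between the old loop start (index + 1) and the patched one (index). It is not the code from the patch itself.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Simplified userspace stand-in for the kernel's test_bit(). */
static bool test_bit(unsigned long nr, const unsigned long *map)
{
	return (map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}

/* Naive stand-in: first clear bit at or after 'start', else 'size'. */
static unsigned long find_next_zero_bit(const unsigned long *map,
					unsigned long size,
					unsigned long start)
{
	unsigned long i;

	for (i = start; i < size; i++)
		if (!test_bit(i, map))
			return i;
	return size;
}

/*
 * Hypothetical sketch of an aligned search. check_index == false mimics
 * the old loop (starts at index + 1), check_index == true the patched
 * loop (starts at index). The patched bound 'end >= size' is used in
 * both modes so that only the loop start differs.
 */
static unsigned long find_zero_area_aligned(const unsigned long *map,
					    unsigned long size,
					    unsigned long start,
					    unsigned int nr,
					    unsigned long align_mask,
					    bool check_index)
{
	unsigned long index, end, i;
again:
	index = find_next_zero_bit(map, size, start);
	/* Rounding up moves index to a bit that was never tested. */
	index = (index + align_mask) & ~align_mask;
	end = index + nr;
	if (end >= size)
		return -1;
	for (i = check_index ? index : index + 1; i < end; i++) {
		if (test_bit(i, map)) {
			start = i + 1;
			goto again;
		}
	}
	return index;
}
```

With bits 0 and 4 already in use and a 4-aligned request for two entries, the old loop hands out index 4 even though it is taken, which would be exactly the kind of double use discussed above. With align_mask == 0 (the GART case on x86_64) both variants return the same index, which fits the observation that the patch made no difference there.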
Re: 2.6.24-rc6-mm1
On Jan 5, 2008 11:10 PM, Torsten Kaiser <[EMAIL PROTECTED]> wrote:
> 2.6.24-rc6 + mm-patches up to git.battery (includes git-net and git-netdev-all) worked for 110 packages, then I proclaimed it good.
> 2.6.24-rc6 + mm-patches up to (including) git.nfsd is currently getting tested (9 packages done...)

That kernel did also work for all 110 packages.

2.6.24-rc6 + mm-patches up to (including) git.xfs -> crash

[ 576.899332] ------------[ cut here ]------------
[ 576.903661] kernel BUG at lib/list_debug.c:33!
[ 576.903661] invalid opcode: [1] SMP
[ 576.903661] last sysfs file: /devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[ 576.903661] CPU 3
[ 576.903661] Modules linked in: radeon drm w83792d ipv6 tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx tea5761 tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg videobuf_core btcx_risc tveeprom videodev v4l2_common usbhid v4l1_compat sg hid i2c_nforce2 pata_amd
[ 576.903661] Pid: 5559, comm: nfsv4-svc Not tainted 2.6.24-rc6-mm-git.xfs #2
[ 576.903661] RIP: 0010:[803c16e4] [803c16e4] __list_add+0x54/0x60
[ 576.903661] RSP: 0018:81007d4e1dc0 EFLAGS: 00010282
[ 576.903661] RAX: 0088 RBX: 81007e955800 RCX: fc6c7900
[ 576.903661] RDX: 81007d53eef0 RSI: 0001 RDI: 80760140
[ 576.903661] RBP: 81007d4e1dc0 R08: 0001 R09:
[ 576.903661] R10: 810080062008 R11: 0001 R12: 81007ed00900
[ 576.903661] R13: 81007ed00938 R14: 81007ed00938 R15: 81007dd6f100
[ 576.903661] FS: 7f1b7e6a36f0() GS:81011ff1b780() knlGS:
[ 576.903661] CS: 0010 DS: ES: CR0: 8005003b
[ 576.903661] CR2: 7ffb28c2c000 CR3: 741ab000 CR4: 06e0
[ 576.903661] DR0: DR1: DR2:
[ 576.903661] DR3: DR6: 0ff0 DR7: 0400
[ 576.903661] Process nfsv4-svc (pid: 5559, threadinfo 81007d4e, task 81007d53eef0)
[ 576.903661] Stack: 81007d4e1e00 805c4dbb 81007ed00908 81007dd6f100
[ 576.903661] 81011ad7bc00 81007d458000 81007e955800 81007dd6f110
[ 576.903661] 81007d4e1e10 805c4ea7 81007d4e1ee0 805c5fd4
[ 576.903661] Call Trace:
[ 576.903661] [805c4dbb] svc_xprt_enqueue+0x1ab/0x240
[ 576.903661] [805c4ea7] svc_xprt_received+0x17/0x20
[ 576.903661] [805c5fd4] svc_recv+0x394/0x7c0
[ 576.903661] [805c53de] svc_send+0xae/0xd0
[ 576.903661] [80230ab0] default_wake_function+0x0/0x10
[ 576.903661] [80316499] nfs_callback_svc+0x79/0x130
[ 576.903662] [80232f8c] finish_task_switch+0xcc/0xe0
[ 576.903662] [8020c818] child_rip+0xa/0x12
[ 576.903662] [8020bf2f] restore_args+0x0/0x30
[ 576.903662] [805b9ecd] __svc_create_thread+0xdd/0x200
[ 576.903662] [80316420] nfs_callback_svc+0x0/0x130
[ 576.903662] [8020c80e] child_rip+0x0/0x12
[ 576.903662]
[ 576.903662]
[ 576.903662] Code: 0f 0b eb fe 0f 1f 84 00 00 00 00 00 55 48 8b 16 48 89 e5 e8
[ 576.903662] RIP [803c16e4] __list_add+0x54/0x60
[ 576.903662] RSP 81007d4e1dc0
[ 576.903673] ---[ end trace d46de6b99ae8cd5a ]---
[ 576.913664] Kernel panic - not syncing: Aiee, killing interrupt handler!

Torsten
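For readers unfamiliar with the check that fires in __list_add() here, the idea behind list debugging can be sketched in userspace as follows. This is an illustration under assumptions, not the code from lib/list_debug.c: list_add_checked() and init_list_head() are hypothetical names, the function returns false where the kernel would call BUG(), and only the "prev->next should be next" wording is taken from the crash output elsewhere in this thread (the other message is written by analogy).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Make 'h' an empty circular list (hypothetical helper name). */
static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/*
 * Insert 'new' between 'prev' and 'next', but first verify that the two
 * neighbours still point at each other. Returns false on corruption
 * instead of crashing, so the effect can be observed in a test.
 */
static bool list_add_checked(struct list_head *new,
			     struct list_head *prev,
			     struct list_head *next)
{
	if (next->prev != prev) {
		fprintf(stderr,
			"list_add corruption. next->prev should be prev (%p), but was %p. (next=%p).\n",
			(void *)prev, (void *)next->prev, (void *)next);
		return false;
	}
	if (prev->next != next) {
		fprintf(stderr,
			"list_add corruption. prev->next should be next (%p), but was %p. (prev=%p).\n",
			(void *)next, (void *)prev->next, (void *)prev);
		return false;
	}
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
	return true;
}
```

If some other code overwrites a list entry that is still linked (for example by reinitialising it after a premature free), the next insertion next to that entry trips the second check. A stray write of this kind would be consistent with the use-after-free suspicion raised in this thread, although the sketch itself proves nothing about the actual bug.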
Re: 2.6.24-rc6-mm1
On Jan 5, 2008 3:52 PM, Torsten Kaiser <[EMAIL PROTECTED]> wrote:
> On Jan 5, 2008 11:13 AM, Jarek Poplawski <[EMAIL PROTECTED]> wrote:
> > On Sat, Jan 05, 2008 at 09:01:02AM +0100, Torsten Kaiser wrote:
> > > On Jan 5, 2008 1:07 AM, Jarek Poplawski <[EMAIL PROTECTED]> wrote:
> > > > I think it would be easier just to start with this working -rc6 and simply check if we have 'right' suspects, so: git-net.patch and git-nfsd.patch from -mm1-broken-out, as suggested by Herbert (I hope, can compile - otherwise you could try the other way: add the whole -mm and revert these two). Using current gits could complicate this "investigation".
> > >
> > > OK, I will try this...
>
> still on the todo-list, I had no time to try this yet...

working on it...
2.6.24-rc6 + mm-patches up to git.battery (includes git-net and git-netdev-all) worked for 110 packages, then I proclaimed it good.
2.6.24-rc6 + mm-patches up to (including) git.nfsd is currently getting tested (9 packages done...)

But the cause of my mail is the following question:
Regarding my "iommu-sg-merging-patches are new in -rc3-mm and could be the cause"-suspicion I looked at these patches and came across these hunks:

This is removed from arch/x86/lib/bitstr_64.c:
-/* Find string of zero bits in a bitmap */
-unsigned long
-find_next_zero_string(unsigned long *bitmap, long start, long nbits, int len)
-{
-	unsigned long n, end, i;
-
- again:
-	n = find_next_zero_bit(bitmap, nbits, start);
-	if (n == -1)
-		return -1;
-
-	/* could test bitsliced, but it's hardly worth it */
-	end = n+len;
-	if (end > nbits)
-		return -1;
-	for (i = n+1; i < end; i++) {
-		if (test_bit(i, bitmap)) {
-			start = i+1;
-			goto again;
-		}
-	}
-	return n;
-}

This is added to lib/iommu-helper.c:
+static unsigned long find_next_zero_area(unsigned long *map,
+					 unsigned long size,
+					 unsigned long start,
+					 unsigned int nr)
+{
+	unsigned long index, end, i;
+again:
+	index = find_next_zero_bit(map, size, start);
+	end = index + nr;
+	if (end > size)
+		return -1;
+	for (i = index + 1; i < end; i++) {
+		if (test_bit(i, map)) {
+			start = i+1;
+			goto again;
+		}
+	}
+	return index;
+}

The old version checks whether find_next_zero_bit returns -1; the new version doesn't do this.
Is this intended and can find_next_zero_bit never fail?
Hmm... but in the worst case it should only loop forever if I'm reading this right (index = -1 => for-loop counts from 0 to nr, if any bit is set it will jump to "again:" and if the next call to find_next_zero_bit also fails, it's an endless loop)

So even if this cannot explain my bug, could somebody check if this is a real bug or not?

Torsten
Re: 2.6.24-rc6-mm1
On Jan 5, 2008 1:07 AM, Jarek Poplawski <[EMAIL PROTECTED]> wrote:
> On Fri, Jan 04, 2008 at 04:21:26PM +0100, Torsten Kaiser wrote:
> > On Jan 4, 2008 2:30 PM, Jarek Poplawski <[EMAIL PROTECTED]> wrote:
> > The only thing that is sadly not practical is bisecting the broken-out mm-patches, as triggering this error is too unreliable / time-consuming.
>
> Right, but it seems there are these 2 main suspects here...
>
> > > - is it still vanilla -rc6-mm1; I've seen on kernel list you tried some fixes around raid?
> >
> > Yes, without these fixes I can't boot.
> > But they should only be run during starting the arrays, so I doubt that this is the cause.
> > (Also -rc3-mm2 did not need this fix)
>
> You've written vanilla -rc6 is OK. Does it mean -rc6 with these fixes?

vanilla -rc6 is fine without these fixes.
The raid-bugs from -rc6-mm1 are probably introduced by md-allow-devices-to-be-shared-between-md-arrays.patch and that patch is new in this mm-release.

> I think it would be easier just to start with this working -rc6 and simply check if we have 'right' suspects, so: git-net.patch and git-nfsd.patch from -mm1-broken-out, as suggested by Herbert (I hope, can compile - otherwise you could try the other way: add the whole -mm and revert these two). Using current gits could complicate this "investigation".

OK, I will try this...

> > My skbuff-double-free-detector is still in there, but was never triggered.
> >
> > > - could you remind this lockdep warning; is it always and the same, always before crash, or no rules?
> >
> > ???
> > I see no lockdep warning before the crashes.
> > I have seen a warning about the dst->__refcnt in dst_release and different warnings about list operations.
> >
> > I think I have always posted everything I have seen before the crashes. (captured via serial console)
>
> So, you mean there are no more of these?:
>
> "looked into the log in question and the only other warning was a circular locking dependency that lockdep detected around 1.5 hour before this warning."
> ...
> "[ 7620.845168] INFO: lockdep is turned off."

Aha, I had forgotten about that one.
Looking at all the crashlogs, I do not find another one of this lockdep warning.
The only other lockdep related output was the bootup problem in vanilla -rc6.

> > (If you mean the lockdep-problem in -rc6: That is more or less a missing annotation during early bootup. The only problem with that is that it causes lockdep to be turned off, and so it cannot be used to find any real problem. A fix for that is in -mm so I do have lockdep on the mm-kernels)
> >
> > > - I've seen you looked after double freeing, but this last debug list warning could suggest locking problems during list modification too.
> >
> > Yes, but Herbert mentioned double freeing a skb explicitly and so I tried to catch this.
> > I do not know enough about the network core to verify the locking of the involved lists.
>
> Right, the list corruption could be because of use after freeing too.

I had hoped that I could catch use-after-freeing by using slub_debug=FZP, but that did not help. (first oops in http://lkml.org/lkml/2007/12/28/159 )
I think that the main skb structs come from slub and should be poisoned by this, so it might be some other data structure that is allocated differently...

> > > - above git-nfsd and git-net tests should be probably repeated with -rc6-mm1 git versions: so vanilla rc6 plus both these -mm patches only, and if bug triggers, with one reversed; btw., since in previous message you mentioned that 50 packages could be not enough to trigger this, these 54 above could make too little margin yet.
> >
> > Yes, I think I really need to redo the git-nfsd-test.
> > With IOMMU_DEBUG enabled rc6-mm1 worked for 52 packages, only a second run of kde-packages triggered it after only 5 packages.
> > I don't know what this bug hates about kdeartwork-wallpaper (triggered it this time) or kdeartwork-styles.
>
> I didn't read all this thread, so probably I miss many points, but are you sure there are no problems with filesystem corruption around these packets or where you compile(?) them (e.g. after these raid problems)?

For my setup:
It's a gentoo system, so compiling packages is the normal way of installing something.
The compile itself is done on a tmpfs, so a filesystem corruption there should be rather impossible. ;)
(The system has 4 GB RAM, so it doesn't even need to swap)
The sources are taken from an nfsv4 share that is served from a different system.
Also gentoo checksums all sources it will use.

After the crashes I also did a checksum of the last installed packages. Only in one instance there was corruption: all new files were completely empty. Obviously XFS did not have the time to write them back to disk before the system crashed.

Also, as all crashes show network related traces and the system is working fine otherwise, I doubt any permanent filesystem problems.

For the raid problems: I was just unable to even start the raid that has / on it, because of a wrong
Re: 2.6.24-rc6-mm1
On Jan 4, 2008 4:21 PM, Torsten Kaiser <[EMAIL PROTECTED]> wrote:
> On Jan 4, 2008 2:30 PM, Jarek Poplawski <[EMAIL PROTECTED]> wrote:
> > - above git-nfsd and git-net tests should be probably repeated with -rc6-mm1 git versions: so vanilla rc6 plus both these -mm patches only, and if bug triggers, with one reversed; btw., since in previous message you mentioned that 50 packages could be not enough to trigger this, these 54 above could make too little margin yet.
>
> Yes, I think I really need to redo the git-nfsd-test.
> With IOMMU_DEBUG enabled rc6-mm1 worked for 52 packages, only a second run of kde-packages triggered it after only 5 packages.
> I don't know what this bug hates about kdeartwork-wallpaper (triggered it this time) or kdeartwork-styles.

49 more (kde-)packages did work too. Still looks like it is only in -mm.

Torsten
Re: 2.6.24-rc6-mm1
On Jan 4, 2008 2:30 PM, Jarek Poplawski <[EMAIL PROTECTED]> wrote:
> On 04-01-2008 11:23, Torsten Kaiser wrote:
> > On Jan 2, 2008 10:51 PM, Herbert Xu <[EMAIL PROTECTED]> wrote:
> >> On Wed, Jan 02, 2008 at 07:29:59PM +0100, Torsten Kaiser wrote:
> >>> Vanilla 2.6.24-rc6 seems stable. I did not see any crash or warnings.
> >> OK that's great. The next step would be to try excluding specific git trees from mm to see if they make a difference.
> >>
> >> The two specific trees of interest would be git-nfsd and git-net.
> >
> > git-nfsd from git://git.linux-nfs.org/projects/bfields/linux.git#for-mm
> > -> compiling and installing 54 packages worked without crashes.
> >
> > git-net from git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25.git
> > -> compiling and installing 95 packages worked without crashes.
> ...
> > I will enable CONFIG_IOMMU_DEBUG in -rc6-mm1 and see, as otherwise I have no clue where to look...
>
> Hi,
>
> A few questions/suggestions:

I'm open for any suggestions and will try to answer any questions.
The only thing that is sadly not practical is bisecting the broken-out mm-patches, as triggering this error is too unreliable / time-consuming.

> - is it still vanilla -rc6-mm1; I've seen on kernel list you tried some fixes around raid?

Yes, without these fixes I can't boot.
But they should only be run during starting the arrays, so I doubt that this is the cause.
(Also -rc3-mm2 did not need this fix)

My skbuff-double-free-detector is still in there, but was never triggered.

> - could you remind this lockdep warning; is it always and the same, always before crash, or no rules?

???
I see no lockdep warning before the crashes.
I have seen a warning about the dst->__refcnt in dst_release and different warnings about list operations.

I think I have always posted everything I have seen before the crashes. (captured via serial console)

(If you mean the lockdep-problem in -rc6: That is more or less a missing annotation during early bootup. The only problem with that is that it causes lockdep to be turned off, and so it cannot be used to find any real problem. A fix for that is in -mm so I do have lockdep on the mm-kernels)

> - I've seen you looked after double freeing, but this last debug list warning could suggest locking problems during list modification too.

Yes, but Herbert mentioned double freeing a skb explicitly and so I tried to catch this.
I do not know enough about the network core to verify the locking of the involved lists.

> - above git-nfsd and git-net tests should be probably repeated with -rc6-mm1 git versions: so vanilla rc6 plus both these -mm patches only, and if bug triggers, with one reversed; btw., since in previous message you mentioned that 50 packages could be not enough to trigger this, these 54 above could make too little margin yet.

Yes, I think I really need to redo the git-nfsd-test.
With IOMMU_DEBUG enabled rc6-mm1 worked for 52 packages, only a second run of kde-packages triggered it after only 5 packages.
I don't know what this bug hates about kdeartwork-wallpaper (triggered it this time) or kdeartwork-styles.

Output from the crash with IOMMU_DEBUG (lockdep was enabled, but did not trigger):
[15593.236374] Unable to handle kernel NULL pointer dereference<3>list_add corruption. prev->next should be next (8078a410), but was 81011ec01e68. (prev=81011ec01e68).
[15593.236374] at RIP:
[15593.236374] [<>]
[15593.236374] PGD 79d22067 PUD 7acd7067 PMD 0
[15593.236374] Oops: 0010 [1] SMP
[15593.236374] last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[15593.236374] CPU 2
[15593.236374] Modules linked in: radeon drm w83792d ipv6 tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx tea5761 tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg videobuf_core btcx_risc tveeprom videodev usbhid v4l2_common v4l1_compat sg hid pata_amd i2c_nforce2
[15593.236374] Pid: 510, comm: khpsbpkt Not tainted 2.6.24-rc6-mm1 #15
[15593.236374] RIP: 0010:[<>] [<>]
[15593.236374] RSP: 0018:81007eed3ee8 EFLAGS: 00010206
[15593.236374] RAX: 81007eed3ef0 RBX: 81011ec01e40 RCX: 81011ec01e40
[15593.236374] RDX: 81011ec01e68 RSI: 81011ec01e68 RDI:
[15593.236374] RBP: 81007eed3f10 R08: R09: 0001
[15593.236374] R10: 0001 R11: 0058 R12: 81007eed3ef0
[15593.236374] R13: 80470e50 R14: R15:
[15593.236374] FS: 7f76e6c98700() GS:81011ff1f000() knlGS:556f46c0
[15593.236374] CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
[15593.236374] CR2: 00
Re: 2.6.24-rc6-mm1
On Jan 2, 2008 10:51 PM, Herbert Xu <[EMAIL PROTECTED]> wrote:
> On Wed, Jan 02, 2008 at 07:29:59PM +0100, Torsten Kaiser wrote:
> >
> > Vanilla 2.6.24-rc6 seems stable. I did not see any crash or warnings.
>
> OK that's great. The next step would be to try excluding specific git trees from mm to see if they make a difference.
>
> The two specific trees of interest would be git-nfsd and git-net.

git-nfsd from git://git.linux-nfs.org/projects/bfields/linux.git#for-mm
-> compiling and installing 54 packages worked without crashes.

git-net from git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25.git
-> compiling and installing 95 packages worked without crashes.

The only thing in the announcements of 2.6.24-rc3-mm1/2 that stands out for me is:
+iommu-sg-merging-add-device_dma_parameters-structure.patch
+iommu-sg-merging-pci-add-device_dma_parameters-support.patch
+iommu-sg-merging-x86-make-pci-gart-iommu-respect-the-segment-size-limits.patch
+iommu-sg-merging-ppc-make-iommu-respect-the-segment-size-limits.patch
+iommu-sg-merging-ia64-make-sba_iommu-respect-the-segment-size-limits.patch
+iommu-sg-merging-alpha-make-pci_iommu-respect-the-segment-size-limits.patch
+iommu-sg-merging-sparc64-make-iommu-respect-the-segment-size-limits.patch
+iommu-sg-merging-parisc-make-iommu-respect-the-segment-size-limits.patch
+iommu-sg-merging-call-blk_queue_segment_boundary-in-__scsi_alloc_queue.patch
+iommu-sg-merging-sata_inic162x-use-pci_set_dma_max_seg_size.patch
+iommu-sg-merging-aacraid-use-pci_set_dma_max_seg_size.patch

 iommu work

I will enable CONFIG_IOMMU_DEBUG in -rc6-mm1 and see, as otherwise I have no clue where to look...

Torsten
Re: 2.6.24-rc6-mm1
On Jan 4, 2008 2:30 PM, Jarek Poplawski <[EMAIL PROTECTED]> wrote:
> On 04-01-2008 11:23, Torsten Kaiser wrote:
> > On Jan 2, 2008 10:51 PM, Herbert Xu <[EMAIL PROTECTED]> wrote:
> > > On Wed, Jan 02, 2008 at 07:29:59PM +0100, Torsten Kaiser wrote:
> > > > Vanilla 2.6.24-rc6 seems stable. I did not see any crash or warnings.
> > >
> > > OK that's great. The next step would be to try excluding specific git
> > > trees from mm to see if they make a difference.
> > >
> > > The two specific trees of interest would be git-nfsd and git-net.
> >
> > git-nfsd from git://git.linux-nfs.org/projects/bfields/linux.git#for-mm
> > -> compiling and installing 54 packages worked without crashes.
> >
> > git-net from git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25.git
> > -> compiling and installing 95 packages worked without crashes.
> ...
> > I will enable CONFIG_IOMMU_DEBUG in -rc6-mm1 and see, as otherwise I
> > have no clue where to look...
>
> Hi,
>
> A few questions/suggestions:

I'm open for any suggestions and will try to answer any questions.
The only thing that is sadly not practical is bisecting the broken-out
mm-patches, as triggering this error is too unreliable / time-consuming.

> - is it still vanilla -rc6-mm1; I've seen on kernel list you tried some
> fixes around raid?

Yes, without these fixes I can't boot. But they should only be run
during starting the arrays, so I doubt that this is the cause.
(Also -rc3-mm2 did not need this fix)
My skbuff-double-free-detector is still in there, but was never triggered.

> - could you remind this lockdep warning; is it always and the same,
> always before crash, or no rules?

??? I see no lockdep warning before the crashes.
I have seen a warning about the dst->__refcnt in dst_release and
different warnings about list operations.
I think I have always posted everything I have seen before the
crashes. (captured via serial console)

(If you mean the lockdep-problem in -rc6: That is more or less a
missing annotation during early bootup. The only problem with that is
that it causes lockdep to be turned off, so it can not be used to
find any real problem. A fix for that is in -mm, so I do have lockdep on
the mm-kernels)

> - I've seen you looked after double freeing, but this last debug list
> warning could suggest locking problems during list modification too.

Yes, but Herbert mentioned double freeing a skb explicitly and so I
tried to catch this.
I do not know enough about the network core to verify the locking of
the involved lists.

> - above git-nfsd and git-net tests should be probably repeated with
> -rc6-mm1 git versions: so vanilla rc6 plus both these -mm patches
> only, and if bug triggers, with one reversed; btw., since in previous
> message you mentioned that 50 packages could be not enough to trigger
> this, these 54 above could make too little margin yet.

Yes, I think I really need to redo the git-nfsd-test.

With IOMMU_DEBUG enabled rc6-mm1 worked for 52 packages, only a second
run of kde-packages triggered it after only 5 packages.
I don't know what this bug hates about kdeartwork-wallpaper (triggered
it this time) or kdeartwork-styles.

Output from the crash with IOMMU_DEBUG (lockdep was enabled, but did
not trigger):

[15593.236374] Unable to handle kernel NULL pointer dereference<3>list_add corruption. prev->next should be next (8078a410), but was 81011ec01e68. (prev=81011ec01e68).
[15593.236374]  at RIP:
[15593.236374]  [<>]
[15593.236374] PGD 79d22067 PUD 7acd7067 PMD 0
[15593.236374] Oops: 0010 [1] SMP
[15593.236374] last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[15593.236374] CPU 2
[15593.236374] Modules linked in: radeon drm w83792d ipv6 tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx tea5761 tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg videobuf_core btcx_risc tveeprom videodev usbhid v4l2_common v4l1_compat sg hid pata_amd i2c_nforce2
[15593.236374] Pid: 510, comm: khpsbpkt Not tainted 2.6.24-rc6-mm1 #15
[15593.236374] RIP: 0010:[<>]  [<>]
[15593.236374] RSP: 0018:81007eed3ee8  EFLAGS: 00010206
[15593.236374] RAX: 81007eed3ef0 RBX: 81011ec01e40 RCX: 81011ec01e40
[15593.236374] RDX: 81011ec01e68 RSI: 81011ec01e68 RDI:
[15593.236374] RBP: 81007eed3f10 R08:  R09: 0001
[15593.236374] R10: 0001 R11: 0058 R12: 81007eed3ef0
[15593.236374] R13: 80470e50 R14:  R15:
[15593.236374] FS:  7f76e6c98700() GS:81011ff1f000() knlGS:556f46c0
[15593.236374] CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
[15593.236374] CR2:  CR3: 79d29000 CR4: 06e0
[15593.236374] DR0:  DR1:  DR2:
[15593.236374] DR3:  DR6: 0ff0 DR7: 0400
[15593.236374] Process khpsbpkt (pid: 510, threadinfo 81007eed2000, task