Re: udiskd high CPU usage with 4.0 git

2015-03-16 Thread Torsten Kaiser
On Mon, Mar 16, 2015 at 12:44 AM, NeilBrown  wrote:
> On Sat, 14 Mar 2015 21:16:51 +0100 Torsten Kaiser
>  wrote:
>> udisksd now again behaves normal, but I'm not sending this change as a
>> patch, because I do not know about the locking and livetime of these
>> objects to evaluate, if that is really the correct fix.
>
> Thanks for the bisection and analysis!  Always easier when someone else does
> the hard work :-)
>
> There is a much simpler patch (as you probably suspected).  I'll post it in a
> moment.

Linux-4.0-rc4 is still broken as expected, but after applying your
patch from "[PATCH] kernfs: handle poll correctly on 'direct_read'
files" my udisksd process behaves normally again.

Thanks for the quick answer + fix!

Torsten
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: udiskd high CPU usage with 4.0 git

2015-03-14 Thread Torsten Kaiser
On Mon, Mar 9, 2015 at 12:30 AM, NeilBrown  wrote:
> On Sun, 08 Mar 2015 18:14:39 +0100 Prakash Punnoor  wrote:
>
>> Hi,
>>
>> I noticed the udisks daemon (version 2.1.4) suddenly started using high
>> cpu (one core at 100%) with linux 4.0 git kernel. I bisected it to:
>>
>> 750f199ee8b578062341e6ddfe36c59ac8ff2dcb

I had the same problem upgrading from 4.0-rc1 to 4.0-rc3.
I have just finished bisecting and "fixing" it.

My bisect points to the same commit.

Looking at udisksd with strace shows a loop of polling and then
accessing several md-related sysfs files.
The only file that udisksd monitors and that was changed by that commit is
"sync_action".

If I revert this part of the commit, my system works normally again:

 static struct md_sysfs_entry md_scan_mode =
- __ATTR_PREALLOC(sync_action, S_IRUGO|S_IWUSR, action_show, action_store);
+ __ATTR(sync_action, S_IRUGO|S_IWUSR, action_show, action_store);

It seems that polling is broken for prealloc files.

The cause seems to be that kernfs_seq_show() updates ->event, while
the new sysfs_kf_read() does not.
So the polling will always trigger and udisksd goes into an infinite
loop looking for changes that are not there.
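
To illustrate the mechanism described above, here is a minimal userspace
sketch in C. It is a toy model, not kernel code: the struct and function
names (open_node, open_file, model_poll and so on) are made up for
illustration, and the only behaviour it assumes is the one stated in this
mail, namely that poll keeps reporting a change while the opener's saved
->event differs from the shared counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel structures; not the real kernfs types. */
struct open_node {
    int event;              /* bumped whenever the attribute is "notified" */
};

struct open_file {
    struct open_node *on;   /* shared per-attribute state */
    int event;              /* last event value this opener has seen */
};

/* Models the poll check: report "changed" while the counters differ. */
static bool model_poll(const struct open_file *of)
{
    return of->event != of->on->event;
}

/* Models a seq_file-style read, which records the current event count. */
static void model_read_seq(struct open_file *of)
{
    of->event = of->on->event;
}

/* Models the prealloc read path as described above: ->event is not updated. */
static void model_read_prealloc(struct open_file *of)
{
    (void)of;   /* data would be copied out here, but ->event stays stale */
}

/* Count how many poll/read rounds fire before poll goes quiet (capped). */
static int rounds_until_quiet(struct open_file *of,
                              void (*do_read)(struct open_file *), int cap)
{
    int rounds = 0;

    assert(cap >= 0);
    while (rounds < cap && model_poll(of)) {
        do_read(of);
        rounds++;
    }
    return rounds;
}
```

With a read that refreshes ->event the loop stops after one round; with a
read that does not, it runs until the cap, which corresponds to the busy
loop seen in udisksd.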

I fixed my local system by copying the line "of->event =
atomic_read(&of->kn->attr.open->event);" from kernfs_seq_show() into
sysfs_kf_read(). (I also needed to move the definition of struct
kernfs_open_node from kernfs/file.c to kernfs-internal.h)

udisksd now again behaves normally, but I'm not sending this change as a
patch, because I do not know enough about the locking and lifetime of these
objects to evaluate whether that is really the correct fix.


Thanks,

Torsten


Re: [PATCH] firmware: Create directories for external firmware

2014-07-08 Thread Torsten Kaiser
On Tue, Jul 8, 2014 at 2:47 PM, Michal Marek  wrote:
> Commit 5180d5f4 ("firmware: Simplify directory creation") broke
> including firmware specified in CONFIG_EXTRA_FIRMWARE:
>
>   MK_FW   firmware/amd-ucode/microcode_amd.bin.gen.S
> /bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory
> ...
> firmware/Makefile:185: recipe for target
> 'firmware/amd-ucode/microcode_amd.bin.gen.S' failed
>
> It works with O= builds, because the directory is created by
> Makefile.build. Create the directory in firmware/Makefile in non-O
> builds.
>
> Reported-by: Ronald 
> Reported-by: Torsten Kaiser 
> Signed-off-by: Michal Marek 
> ---
>
> Can you try this patch?

Works fine for me.

Thanks for the quick patch!

Torsten

> Ronald, can you tell me your full name for the Reported-by: line?
>
> Thanks.
> ---
>
>  firmware/Makefile | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/firmware/Makefile b/firmware/Makefile
> index 5747417..0862d34 100644
> --- a/firmware/Makefile
> +++ b/firmware/Makefile
> @@ -219,6 +219,12 @@ $(obj)/%.fw: $(obj)/%.H16 $(ihex2fw_dep)
>  obj-y   += $(patsubst %,%.gen.o, $(fw-external-y))
>  obj-$(CONFIG_FIRMWARE_IN_KERNEL) += $(patsubst %,%.gen.o, $(fw-shipped-y))
>
> +ifeq ($(KBUILD_SRC),)
> +# Makefile.build only creates subdirectories for O= builds, but external
> +# firmware might live outside the kernel source tree
> +_dummy := $(foreach d,$(addprefix $(obj)/,$(dir $(fw-external-y))), $(shell [ -d $(d) ] || mkdir -p $(d)))
> +endif
> +
>  # Remove .S files and binaries created from ihex
>  # (during 'make clean' .config isn't included so they're all in $(fw-shipped-))
>  targets := $(fw-shipped-) $(patsubst $(obj)/%,%, \
> --
> 1.8.4.5
>


Re: Regression: firmware: Simplify directory creation + b43 = fails to build

2014-07-07 Thread Torsten Kaiser
On Wed, Jun 18, 2014 at 6:25 PM, Ronald  wrote:
> From my .config
>
>  ==> cat /usr/src/config | grep -i b43
> CONFIG_EXTRA_FIRMWARE="b43/ucode5.fw b43/b0g0initvals5.fw
> b43/b0g0bsinitvals5.fw b43/pcm5.fw"
> ... snip ...

This might be rather late, but I seem to have the same problem:
  CHK kernel/config_data.h
  MK_FW   firmware/amd-ucode/microcode_amd.bin.gen.S
/bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory
/bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory
/bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory
/bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory
/bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory
/bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory
/bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory
/bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory
/bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory
/bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory
/bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory
/bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory
/bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory
/bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory
/bin/sh: firmware/amd-ucode/microcode_amd.bin.gen.S: No such file or directory
firmware/Makefile:185: recipe for target
'firmware/amd-ucode/microcode_amd.bin.gen.S' failed
make[1]: *** [firmware/amd-ucode/microcode_amd.bin.gen.S] Error 1
Makefile:896: recipe for target 'firmware' failed
make: *** [firmware] Error 2

The directory firmware/amd-ucode does not exist in the kernel source
tree, but my .config seems to need it:
CONFIG_EXTRA_FIRMWARE="radeon/R700_rlc.bin radeon/RV710_uvd.bin
radeon/RV730_smc.bin amd-ucode/microcode_amd.bin"
CONFIG_EXTRA_FIRMWARE_DIR="/lib/firmware"

Just doing a "mkdir firmware/amd-ucode" lets the build continue and I
get a working 3.16 kernel.

With 3.15 and earlier I never had a problem with this, but 3.16-rc4
just failed with the above message.

Do you need my full .config or any other information about my system?
I would be happy to provide that and/or test a patch.

Thanks for looking into this!

Torsten


[tip:x86/urgent] x86, amd, microcode: Fix error path in apply_microcode_amd()

2013-07-31 Thread tip-bot for Torsten Kaiser
Commit-ID:  d982057f631df04f8d78321084a1a71ca51f3364
Gitweb: http://git.kernel.org/tip/d982057f631df04f8d78321084a1a71ca51f3364
Author: Torsten Kaiser 
AuthorDate: Tue, 23 Jul 2013 22:58:23 +0200
Committer:  H. Peter Anvin 
CommitDate: Wed, 31 Jul 2013 08:37:14 -0700

x86, amd, microcode: Fix error path in apply_microcode_amd()

Return -1 (like Intels apply_microcode) when the loading fails, also
do not set the active microcode level on failure.

Signed-off-by: Torsten Kaiser 
Link: http://lkml.kernel.org/r/20130723225823.2e4e7...@googlemail.com
Acked-by: Borislav Petkov 
Signed-off-by: H. Peter Anvin 
---
 arch/x86/kernel/microcode_amd.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/microcode_amd.c b/arch/x86/kernel/microcode_amd.c
index 47ebb1d..7a0adb7 100644
--- a/arch/x86/kernel/microcode_amd.c
+++ b/arch/x86/kernel/microcode_amd.c
@@ -220,12 +220,13 @@ int apply_microcode_amd(int cpu)
 		return 0;
 	}
 
-	if (__apply_microcode_amd(mc_amd))
+	if (__apply_microcode_amd(mc_amd)) {
 		pr_err("CPU%d: update failed for patch_level=0x%08x\n",
 			cpu, mc_amd->hdr.patch_id);
-	else
-		pr_info("CPU%d: new patch_level=0x%08x\n", cpu,
-			mc_amd->hdr.patch_id);
+		return -1;
+	}
+	pr_info("CPU%d: new patch_level=0x%08x\n", cpu,
+		mc_amd->hdr.patch_id);
 
 	uci->cpu_sig.rev = mc_amd->hdr.patch_id;
 	c->microcode = mc_amd->hdr.patch_id;
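
The effect of the change can be shown with a small standalone C sketch.
This is only an illustration under assumptions: struct cpu_state,
write_patch() and the hw_accepts flag are hypothetical stand-ins, not the
kernel's types or functions. The point it models is the one from the commit
message: on failure return -1 and leave the recorded revision alone.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU bookkeeping; not the kernel struct. */
struct cpu_state {
    uint32_t rev;       /* microcode revision believed to be active */
};

/* Stub hardware write: returns nonzero on failure (simulated via a flag). */
static int write_patch(uint32_t patch_id, int hw_accepts)
{
    (void)patch_id;
    return hw_accepts ? 0 : 1;
}

/* Apply a patch; only record the new revision if the write succeeded. */
static int apply_patch(struct cpu_state *cs, uint32_t patch_id, int hw_accepts)
{
    if (write_patch(patch_id, hw_accepts))
        return -1;          /* failure: leave cs->rev untouched */

    cs->rev = patch_id;     /* success: the new level is now active */
    return 0;
}
```

Before the fix, the equivalent of the cs->rev assignment ran on both paths,
so a failed update was still recorded as the active level.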


Re: [PATCH]Fix early microcode loading on AMD

2013-07-24 Thread Torsten Kaiser
On Wed, Jul 24, 2013 at 4:19 PM, Borislav Petkov  wrote:
> On Tue, Jul 23, 2013 at 06:57:12PM +0200, Torsten Kaiser wrote:
>> > The other problem I see is not updating c->microcode since it is going
>> > to be overwritten by smp_store_cpu_info, which is unfortunate.
>> >
>> > And I don't see where Intel are updating that cpuinfo_x86.microcode
>> > field on early load too.
>> >
>> > So, AFAICT, c->microcode would remain unset when we only do early
>> > microcode load. But that is something we should fix as a later patch.
>>
>> I don't see a problem with that staying unset.
>> apply_microcode_amd() directly reads the rev from
>> MSR_AMD64_PATCH_LEVEL so it does not depend on that being correct.
>> And smp_store_(boot)_cpu_info will also read the current rev directly
>> from the CPU to fill ->microcode.
>
> We need to store the actual microcode revision to c->microcode for
> /proc/cpuinfo and MCE.

init_amd() will fill that field. (You could always compile with
CONFIG_MICROCODE_AMD=n and that field would still need filling.)
And as that will get called before smp_store_(boot)_cpu_info()
everything should be fine.

>> > So I think you should switch load_ucode_amd_ap to __apply_microcode_amd:
>> >
>> > p = find_patch()
>> >
>> > __apply_microcode_amd(p->mc_data);
>> >
>> > which should take care of the issue you're seeing, IMHO.
>>
>> The issue I'm seeing is that collect_cpu_info_amd_early() fills c->x86
>> but not c->x86_vendor.
>> Which breaks cpu_has_amd_erratum() and then Erratum 400 breaks the boot.
>>
>> I did not really want to switch from apply_microcode_amd() to
>> __apply_microcode_amd() because then I would lose the check if the new
>> microcode is really an upgrade.
>
> Well, if the BSP has already loaded the pcache, there's no need for
> the AP to parse and load the same microcode blobs file for the initrd,
> right?

loading != applying.

load_ucode_amd_ap() should probably be called apply_ucode_amd_ap()
because it is primarily for applying the microcode.
That it also loads it (but really only once thanks to ucode_loaded) is
only because nobody else has run yet.

That whole place is hairy: Because on 32bit that seems to run much
earlier, the 64 and 32 cases are very different.
64bit can and will use pcache/apply_microcode_amd() for the non-BSP
CPUs, but on 32bit everything directly applies the patches from initrd
memory into the CPUs by directly calling __apply_microcode_amd(),
thereby bypassing pcache.

See comment above the 32bit version of load_ucode_amd_ap():
/*
 * On 32-bit, since AP's early load occurs before paging is turned on, we
 * cannot traverse cpu_equiv_table and pcache in kernel heap memory. So during
 * cold boot, AP will apply_ucode_in_initrd() just like the BSP. During
 * save_microcode_in_initrd_amd() BSP's patch is copied to amd_bsp_mpb, which
 * is used upon resume from suspend.
 */

As written in the other email: I'm currently trying to see if I can
kill amd_bsp_mpb...

>> >> * load_ucode_ap(): Quick exit for !cpu, because without
>> >> load_microcode_amd() getting called apply_microcode_amd() can't do
>> >> anything. Exit, if no microcode could be loaded.
>> >
>> > This could probably be a WARN_ON(!cpu) to catch errors...
>>
>> No, load_ucode_ap() will be called for cpu == 0.
>
> This needs fixing IMO...

Can't answer that. I have only seen that it is called for cpu == 0 and
that there is no special case for CPU#0 in all the places that call
load_ucode_ap()...

> Btw, thanks for looking at this and asking critical questions!
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --


Re: [PATCH]Fix early microcode loading on AMD

2013-07-24 Thread Torsten Kaiser
On Wed, Jul 24, 2013 at 3:56 PM, Borislav Petkov  wrote:
> On Tue, Jul 23, 2013 at 06:57:12PM +0200, Torsten Kaiser wrote:
>> >> * Save the amd_bsp_mpb on apply, not on load. Otherwise someone could
>> >> later load an older microcode file that would overwrite amd_bsp_mpb,
>> >> but would not be applied to the CPUs
>> >
>> > See the patch id check in apply_ucode_in_initrd()?
>> >
>> > if (eq_id == mc->hdr.processor_rev_id && rev < mc->hdr.patch_id)
>>
>> I meant with "load" load_microcode_amd() not the loading of the
>> microcode into the CPU.
>>
>> 1.: load microcode rev X into CPU (early or normal is not important)
>> 2.: get older microcode file that only contains rev Y with Y<X
>> 3.: trigger load_microcode_amd() with a corrupt file: This will call
>> cleanup() and empty pcache.
>
> Ok, that's actually a good catch. So I wonder: why in hell would we
> flush the pcache if some of the blobs we're loading are corrupted. So
> what?! Jacob, what were you thinking - I'd be very interested to know
> what the idea behind this was.
>
> So, just to refresh everybody: the idea of the pcache is to keep all
> patches for the current family in memory so that we can support all
> sorts of hotplug and cpu mixed stepping diddling.

Then it would probably be the best to kill free_cache() completely.
Which would mean cleanup() should also go.
Which will make unloading microcode_amd.ko impossible.
But that is probably a good idea anyway: If you unload the module
there is no way to keep pcache.

But I still have another way to kill you: free_equiv_cpu_table()
Without that table find_patch() can't work and will not return the
correct information.

And that can be triggered by:
* start of load_microcode_amd(): If you reach that function (Only
UCODE_MAGIC needs to be in the file) that table is dead.
* __load_microcode_amd(): If the file only contains the table but no
patches ("invalid type field in container file section header\n")

>> 4.: trigger load_microcode_amd() with the older file:
>>  * this will now load rev Y into pcache
>>  * rev Y will be returned by find_patch and copied into amd_bsp_mpb
>>  * any try to apply rev Y will be skipped in apply_microcode_amd()
>>
>> So now the CPU still correctly has rev X, but amd_bsp_mpb will contain
>> the wrong rev Y.
>
> Right, so this shouldn't happen - what should happen is, pcache would
> hold both X and Y and find_patch would automatically give you the right
> one.
>
> And this is guaranteed since we keep the patches in a sorted linked list
> by ->patch_id which is guaranteed to be increasing.
>
> So actually load_microcode_amd() shouldn't be doing cleanup() but simply
> return ret upwards.

But it already called free_equiv_cpu_table() and so pcache is inaccessible.

And I don't think just preserving equiv_cpu_table for restoring in the
error case will be the right solution: If the new firmware file
contains a new table with fewer entries (or different entries!) some
of the patches in pcache might become inaccessible.

>> That copying already in load_microcode also is suspicious if someone
>> would only load the microcode but not apply it. But I did not find
>> a codepath in arch/x86/kernel/microcode_core.c to load it without a
>> followup apply.
>
> Yeah, we always load and apply.
>
> So now back to the original problem - load_microcode_amd() shouldn't
> clear the pcache and, in that case, a subsequent find_patch() would
> always give the right patch.

Not if equiv_cpu_table got mangled.
So should install_equiv_cpu_table() be turned into
add_to_equiv_cpu_table() or should pcache save all cpu_sig with each
patch, so that find_patch() no longer needs equiv_cpu_table?
I suspect saving that in struct ucode_patch might be better, to
prevent changes in the equiv_id <-> cpu_sig mapping from making a patch
inaccessible.
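
A rough sketch of what the second option could look like, written as
standalone C. This is not the existing microcode_amd.c code and not a
proposed patch: struct cached_patch, MAX_SIGS and find_patch_by_sig() are
hypothetical names, and the fixed-size signature array is only an
assumption to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SIGS 4

/* Hypothetical variant of a cached patch that remembers which CPU
 * signatures it applies to, so lookup does not need a separate table. */
struct cached_patch {
    uint32_t patch_id;
    uint32_t cpu_sigs[MAX_SIGS];
    size_t   nsigs;
};

/* Return the newest patch listing cpu_sig, or NULL if none matches. */
static const struct cached_patch *
find_patch_by_sig(const struct cached_patch *cache, size_t n, uint32_t cpu_sig)
{
    const struct cached_patch *best = NULL;

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < cache[i].nsigs; j++) {
            if (cache[i].cpu_sigs[j] != cpu_sig)
                continue;
            if (!best || cache[i].patch_id > best->patch_id)
                best = &cache[i];
            break;      /* this patch matched; move on to the next one */
        }
    }
    return best;
}
```

Because each entry carries its own signatures, replacing or freeing a
separate equivalence table would not make already cached patches
unreachable in this model.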

Torsten


Re: [PATCH]Fix early microcode loading on AMD

2013-07-24 Thread Torsten Kaiser
On Wed, Jul 24, 2013 at 3:41 PM, Borislav Petkov  wrote:
> Let me try to answer this as well as I can, peacemeal-wise.
>
> On Tue, Jul 23, 2013 at 06:57:12PM +0200, Torsten Kaiser wrote:
>> On Tue, Jul 23, 2013 at 5:15 PM, Borislav Petkov  wrote:
>> > On Tue, Jul 23, 2013 at 01:58:53PM +0200, Torsten Kaiser wrote:
>> >> Fixup the early AMD microcode loading.
>> >>
>> >> * load_microcode_amd() (and the helper its using) should not have an
>> >> cpu parameter.
>> >
>> > Hmm, I don't think so - we get the cpu handed down from microcode_core
>> > and besides the early load on 32bit needs to do find_patch(cpu).
>>
>> Thats why I moved that part into apply_microcode_amd(). See later on
>> more, why I think that move is the right thing.
>> And without that the current cpu parameter will only be used to get
>> the (in the early load case not even correctly set up!) per-cpu data.
>> But the only member of cpuinfo_x86 that gets used is ->x86, the family.
>> Line 159: switch(c->x86) and Line 301: if (proc_fam != c->x86)
>>
>> I really wanted to make that switch from cpu to x86family a separate
>> patch, that it would be more obvious correct, but because of that
>> amd_bsp_mpb hunk I can't find a good cut and thats why this patch is
>> larger that I would have preferred.
>
> Ok.

First moving that hunk, then switching from cpu to x86family did work.
See patch 4/5 and 5/5. :-)

>> >> The microcode loading is not depending on the CPU it is
>> >
>> > Mostly. There are mixed-stepping boxes which need to differentiate
>> > between which cpu we're applying the patch for.
>>
>> Nothing looks at ->x86_model or ->x86_mask during load.
>> It will always load all patches from the current family.
>
> Yes, that's the idea. We want to have all patches for the current family
> loaded.

And that's why switching from cpu to x86family is OK: during *load* we
only care about the family.

>> If loading would really depend on the current cpu in a mixed
>> system that would be horrible: Depending on which CPU gets execute
>> load_microcode_amd() it there would be different patches loaded into
>> RAM?
>
> No, we load the microcode based on CPUID(1).EAX which is in the
> equivalence table. Look at find_equiv_id().
>
> But for that we need all patches belonging to the current family to be
> in the cache.

I think you confused *load* and *apply*.
load_microcode_amd() *loads* the microcode from a firmware file into
the pcache list.
This wants all patches for the family and that's why my switch here is OK.
apply_microcode_amd() *applies* the microcode to the CPU / "loads" it
into the CPU. That function (or rather its helper find_patch()) needs
the full stepping/masking. I did not change that function, because in
that case 'cpu' makes sense as a parameter, because the microcode
needs to be applied for each CPU. (You could argue that that parameter
is also stupid: If you ever pass anything other than
raw_smp_processor_id() then it will BUG(). But removing that parameter
would require changing the whole of microcode_core.c and also
microcode_intel.c. And there that parameter might make sense, so it's
better to keep 'cpu' for apply_microcode_amd().)
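
The load/apply split argued for above can be sketched in standalone C.
This is a simplified model under assumptions, not the driver code: struct
patch, struct patch_cache, CACHE_MAX and both function names are
hypothetical, and a flat array stands in for the pcache list. It only
shows that loading needs nothing but the family, while applying needs the
per-CPU identity and the running revision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_MAX 8

/* Hypothetical, simplified patch record; field names are illustrative. */
struct patch {
    uint8_t  family;    /* CPU family the patch belongs to */
    uint16_t equiv_id;  /* identifies the exact model/stepping group */
    uint32_t patch_id;  /* revision; higher means newer */
};

struct patch_cache {
    struct patch entries[CACHE_MAX];
    size_t count;
};

/* "Load": copy every patch for the given family into the cache.
 * Only the family matters here; no per-CPU information is needed. */
static size_t load_patches(struct patch_cache *pc, const struct patch *file,
                           size_t n, uint8_t family)
{
    pc->count = 0;
    for (size_t i = 0; i < n && pc->count < CACHE_MAX; i++)
        if (file[i].family == family)
            pc->entries[pc->count++] = file[i];
    return pc->count;
}

/* "Apply": pick the newest cached patch for this CPU's equiv_id and use it
 * only if it is newer than the running revision. Returns the revision the
 * CPU ends up with. */
static uint32_t apply_patch_for_cpu(const struct patch_cache *pc,
                                    uint16_t equiv_id, uint32_t current_rev)
{
    uint32_t best = current_rev;

    for (size_t i = 0; i < pc->count; i++)
        if (pc->entries[i].equiv_id == equiv_id &&
            pc->entries[i].patch_id > best)
            best = pc->entries[i].patch_id;
    return best;
}
```

In this model load_patches() would be called once per firmware file, and
apply_patch_for_cpu() once per CPU, which is why only the latter takes
anything CPU-specific.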

But wrt. your concern about mixed-stepping systems: There early
microcode loading is definitely broken for 32bit.
The current mainline code will save the patch for the BSP in
amd_bsp_mpb and then apply that to all CPUs regardless of their
stepping. With my change in 4/5 to move the amd_bsp_mpb setup to apply
time it will now wrongly patch all CPUs with the microcode that was
loaded last.
But u8 amd_bsp_mpb[NR_CPUS][MPB_MAX_SIZE] doesn't look like a good idea.
Maybe the best way here is to fail apply_microcode_amd() if
amd_bsp_mpb already contains an incompatible patch and in
load_ucode_amd_ap() only apply it when the cpu_sig matches.
Or u8 amd_bsp_mpb[4][MPB_MAX_SIZE] which would support up to 4
different steppings per system.

No patch yet, because I do not understand why that is not a problem on
64bit. load_ucode_amd_bsp() is shared between 32 and 64 so if that
code works then I can't really find a need for amd_bsp_mpb at all.

So my current plan is to look into who calls load_ucode_amd_bsp() and
load_ucode_amd_ap() and in what sequence (..hopefully in the same
sequence on 32 and 64bit...) and if I can find a rationale why
amd_bsp_mpb can be killed, I will send you a patch.
Otherwise I will try to create something that will fail
apply_microcode_amd() in a safe way, if CONFIG_MICROCODE_AMD_EARLY
gets used on a mixed system.

>> > Btw, your config boots on my F14h box with "nomodeset" on the command
>> > line because it is missing radeon firmware for my gpu.
>>
>> I suspect a 

Re: [PATCH]Fix early microcode loading on AMD

2013-07-24 Thread Torsten Kaiser
On Wed, Jul 24, 2013 at 3:41 PM, Borislav Petkov b...@alien8.de wrote:
> Let me try to answer this as well as I can, piecemeal-wise.
>
> On Tue, Jul 23, 2013 at 06:57:12PM +0200, Torsten Kaiser wrote:
>> On Tue, Jul 23, 2013 at 5:15 PM, Borislav Petkov b...@alien8.de wrote:
>> > On Tue, Jul 23, 2013 at 01:58:53PM +0200, Torsten Kaiser wrote:
>> >> Fixup the early AMD microcode loading.
>> >>
>> >> * load_microcode_amd() (and the helper it is using) should not have a
>> >> cpu parameter.
>> >
>> > Hmm, I don't think so - we get the cpu handed down from microcode_core
>> > and besides the early load on 32bit needs to do find_patch(cpu).
>>
>> That's why I moved that part into apply_microcode_amd(). See further below
>> for why I think that move is the right thing.
>> And without that the current cpu parameter will only be used to get
>> the (in the early load case not even correctly set up!) per-cpu data.
>> But the only member of cpuinfo_x86 that gets used is ->x86, the family.
>> Line 159: switch(c->x86) and Line 301: if (proc_fam != c->x86)
>>
>> I really wanted to make that switch from cpu to x86family a separate
>> patch, so that it would be more obviously correct, but because of that
>> amd_bsp_mpb hunk I can't find a good cut and that's why this patch is
>> larger than I would have preferred.
>
> Ok.

First moving that hunk, then switching from cpu to x86family did work.
See patch 4/5 and 5/5. :-)

>> >> The microcode loading is not depending on the CPU it is
>> >
>> > Mostly. There are mixed-stepping boxes which need to differentiate
>> > between which cpu we're applying the patch for.
>>
>> Nothing looks at ->x86_model or ->x86_mask during load.
>> It will always load all patches from the current family.
>
> Yes, that's the idea. We want to have all patches for the current family
> loaded.

And that's why switching from cpu to x86family is OK: during *load* we
only care for the family.

>> If loading really depended on the current cpu in a mixed
>> system that would be horrible: Depending on which CPU gets to execute
>> load_microcode_amd(), there would be different patches loaded into
>> RAM?
>
> No, we load the microcode based on CPUID(1).EAX which is in the
> equivalence table. Look at find_equiv_id().
>
> But for that we need all patches belonging to the current family to be
> in the cache.

I think you confused *load* and *apply*.
load_microcode_amd() *loads* the microcode from a firmware file into
the pcache list.
This wants all patches for the family and that's why my switch here is OK.
apply_microcode_amd() *applies* the microcode to the CPU / "loads" it
into the CPU. That function (or rather its helper find_patch()) needs
the full stepping/masking. I did not change that function, because in
that case 'cpu' makes sense as a parameter, because the microcode
needs to be applied for each CPU. (You could argue that that parameter
is also stupid: If you ever pass something other than
raw_smp_processor_id() then it will BUG(). But removing that parameter
would need to change the whole microcode_core.c and also
microcode_intel.c. And there that parameter might make sense, so it's
better to keep 'cpu' for apply_microcode_amd())
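
(Editorial sketch, not part of the patch series.) The load/apply split
described above can be illustrated in user-space C. The family
expression is the same one used for proc_fam in verify_and_add_patch()
in patch 5/5; the helper names cpu_family_from_sig() and
patch_matches_family() are hypothetical and exist only for this
illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: derive the family from a CPUID(1).EAX style
 * signature with the same expression as the proc_fam check in
 * verify_and_add_patch(): base family plus extended family.
 */
static uint8_t cpu_family_from_sig(uint32_t sig)
{
	return (uint8_t)(((sig >> 8) & 0xf) + ((sig >> 20) & 0xff));
}

/*
 * Load-time filter: only the family is compared. Model and stepping
 * are deliberately ignored here; they only matter at apply time.
 */
static bool patch_matches_family(uint32_t patch_sig, uint8_t x86family)
{
	return cpu_family_from_sig(patch_sig) == x86family;
}
```

With this filter, two CPUs of the same family but different steppings
would see the same set of cached patches, which is the behaviour the
x86family parameter is meant to make obvious.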

But wrt. your concern about mixed stepping systems: There, early
microcode loading is definitely broken for 32bit.
The current mainline code will save the patch for the BSP in
amd_bsp_mpb and then apply that to all CPUs regardless of their
stepping. With my change in 4/5 to move the amd_bsp_mpb setup to apply
time it will now wrongly patch all CPUs with the microcode that was
loaded last.
But u8 amd_bsp_mpb[NR_CPUS][MPB_MAX_SIZE] doesn't look like a good idea.
Maybe the best way here is to fail apply_microcode_amd() if
amd_bsp_mpb already contains an incompatible patch and in
load_ucode_amd_ap() only apply it when the cpu_sig matches.
Or u8 amd_bsp_mpb[4][MPB_MAX_SIZE] which would support up to 4
different steppings per system.
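
(Editorial sketch, not part of the patch series.) One possible shape
for the "up to 4 different steppings" idea is a small table of saved
patches keyed by CPU signature. Everything below is an assumption made
for illustration: the struct, the function names, the value of
MPB_MAX_SIZE and the use of 0 as the "free slot" marker are not taken
from the kernel:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MPB_MAX_SIZE   4096	/* assumed buffer size for this sketch */
#define MAX_SAVED_SIGS 4	/* "up to 4 different steppings per system" */

struct saved_patch {
	uint32_t cpu_sig;		/* 0 marks a free slot */
	uint8_t  mpb[MPB_MAX_SIZE];
};

static struct saved_patch saved_patches[MAX_SAVED_SIGS];

/*
 * Remember the patch that was applied on a CPU with signature cpu_sig.
 * Returns 0 on success and -1 if the arguments are unusable or all
 * slots are already taken by other signatures (fail safely).
 */
static int save_applied_patch(uint32_t cpu_sig, const void *data, size_t len)
{
	size_t i;

	if (cpu_sig == 0 || data == NULL || len > MPB_MAX_SIZE)
		return -1;

	/* Slots fill from the front and are never freed, so a matching
	 * slot is always found before the first free one. */
	for (i = 0; i < MAX_SAVED_SIGS; i++) {
		struct saved_patch *s = &saved_patches[i];

		if (s->cpu_sig == cpu_sig || s->cpu_sig == 0) {
			s->cpu_sig = cpu_sig;
			memset(s->mpb, 0, MPB_MAX_SIZE);
			memcpy(s->mpb, data, len);
			return 0;
		}
	}
	return -1;
}

/* Return the saved patch for this signature, or NULL if none matches. */
static const uint8_t *find_saved_patch(uint32_t cpu_sig)
{
	size_t i;

	if (cpu_sig == 0)
		return NULL;

	for (i = 0; i < MAX_SAVED_SIGS; i++)
		if (saved_patches[i].cpu_sig == cpu_sig)
			return saved_patches[i].mpb;
	return NULL;
}
```

A CPU whose signature has no slot gets NULL instead of somebody else's
patch, which corresponds to "only apply it when the cpu_sig matches".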

No patch yet, because I do not understand why that is not a problem on
64bit. load_ucode_amd_bsp() is shared between 32 and 64 so if that
code works then I can't really find a need for amd_bsp_mpb at all.

So my current plan is to look into who calls load_ucode_amd_bsp() and
load_ucode_amd_ap() and in what sequence (..hopefully in the same
sequence on 32 and 64bit...) and if I can find a rationale why
amd_bsp_mpb can be killed, I will send you a patch.
Otherwise I will try to create something that will fail
apply_microcode_amd() in a safe way, if CONFIG_MICROCODE_AMD_EARLY
gets used on a mixed system.

>> > Btw, your config boots on my F14h box with "nomodeset" on the command
>> > line because it is missing radeon firmware for my gpu.
>>
>> I suspect a F14h box will never see that hang. It trips over the
>> C1E erratum and amd_erratum_400[] looks like it only affects 0xfh and
>> 0x10h (like my Phenom II X6).
>
> I could fire up my F10h if needed :)
>
>> >> executed and all the loaded patches will end up in a global list for all
>> >> CPUs anyway.
>> >> * Return -1 (like Intels apply_microcode) when the loading fails,
>> >> also do not set the active microcode level on failure.
>> >
>> > Yep, this part I
Re: [PATCH]Fix early microcode loading on AMD

2013-07-24 Thread Torsten Kaiser
On Wed, Jul 24, 2013 at 3:56 PM, Borislav Petkov b...@alien8.de wrote:
> On Tue, Jul 23, 2013 at 06:57:12PM +0200, Torsten Kaiser wrote:
>> >> * Save the amd_bsp_mpb on apply, not on load. Otherwise someone could
>> >> later load an older microcode file that would overwrite amd_bsp_mpb,
>> >> but would not be applied to the CPUs
>> >
>> > See the patch id check in apply_ucode_in_initrd()?
>> >
>> > if (eq_id == mc->hdr.processor_rev_id && rev < mc->hdr.patch_id)
>>
>> With "load" I meant load_microcode_amd(), not the loading of the
>> microcode into the CPU.
>>
>> 1.: load microcode rev X into CPU (early or normal is not important)
>> 2.: get older microcode file that only contains rev Y with Y < X
>> 3.: trigger load_microcode_amd() with a corrupt file: This will call
>> cleanup() and empty pcache.
>
> Ok, that's actually a good catch. So I wonder: why in hell would we
> flush the pcache if some of the blobs we're loading are corrupted. So
> what?! Jacob, what were you thinking - I'd be very interested to know
> what the idea behind this was.
>
> So, just to refresh everybody: the idea of the pcache is to keep all
> patches for the current family in memory so that we can support all
> sorts of hotplug and cpu mixed stepping diddling.

Then it would probably be the best to kill free_cache() completely.
Which would mean cleanup() should also go.
Which will make unloading microcode_amd.ko impossible.
But that is probably a good idea anyway: If you unload the module
there is no way to keep pcache.

But I still have another way to kill you: free_equiv_cpu_table()
Without that table find_patch() can't work and will not return the
correct information.

And that can be triggered by:
* start of load_microcode_amd(): If you reach that function (Only
UCODE_MAGIC needs to be in the file) that table is dead.
* __load_microcode_amd(): If the file only contains the table but no
patches ("invalid type field in container file section header\n")

>> 4.: trigger load_microcode_amd() with the older file:
>>  * this will now load rev Y into pcache
>>  * rev Y will be returned by find_patch and copied into amd_bsp_mpb
>>  * any attempt to apply rev Y will be skipped in apply_microcode_amd()
>>
>> So now the CPU still correctly has rev X, but amd_bsp_mpb will contain
>> the wrong rev Y.
>
> Right, so this shouldn't happen - what should happen is, pcache would
> hold both X and Y and find_patch would automatically give you the right
> one.
>
> And this is guaranteed since we keep the patches in a sorted linked list
> by ->patch_id which is guaranteed to be increasing.
>
> So actually load_microcode_amd() shouldn't be doing cleanup() but simply
> return ret upwards.

But it already called free_equiv_cpu_table() and so pcache is inaccessible.

And I don't think just preserving equiv_cpu_table for restoring in the
error case will be the right solution: If the new firmware file
contains a new table with fewer entries (or different entries!) some
of the patches in pcache might become inaccessible.

>> That copying already in load_microcode also is suspicious if someone
>> would only load the microcode but not apply it. But I did not find
>> a codepath in arch/x86/kernel/microcode_core.c to load it without a
>> followup apply.
>
> Yeah, we always load and apply.
>
> So now back to the original problem - load_microcode_amd() shouldn't
> clear the pcache and, in that case, a subsequent find_patch() would
> always give the right patch.

Not if equiv_cpu_table got mangled.
So should install_equiv_cpu_table() be turned into
add_to_equiv_cpu_table() or should pcache save all cpu_sig with each
patch, so that find_patch() no longer needs equiv_cpu_table?
I suspect saving that in struct ucode_patch might be better, to
prevent changes in the equiv_id -> cpu_sig mapping from making a patch
inaccessible.
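
(Editorial sketch, not part of the patch series.) The second option,
storing the CPU signatures with each cached patch so that lookup no
longer depends on the current equivalence table, might look roughly
like this in user-space C. The struct layout, the limit on signatures
per patch and the name find_patch_by_sig() are all hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SIGS_PER_PATCH 8	/* arbitrary limit for this sketch */

/* Hypothetical variant of a cached patch that carries its own list of
 * CPU signatures instead of relying on an external equivalence table. */
struct ucode_patch_sketch {
	uint32_t patch_id;
	uint32_t cpu_sigs[MAX_SIGS_PER_PATCH];
	unsigned int nr_sigs;
};

/*
 * Return the patch with the highest patch_id that lists cpu_sig,
 * or NULL if no cached patch is valid for that signature.
 */
static const struct ucode_patch_sketch *
find_patch_by_sig(const struct ucode_patch_sketch *cache, size_t n,
		  uint32_t cpu_sig)
{
	const struct ucode_patch_sketch *best = NULL;
	size_t i;
	unsigned int j;

	for (i = 0; i < n; i++) {
		for (j = 0; j < cache[i].nr_sigs; j++) {
			if (cache[i].cpu_sigs[j] != cpu_sig)
				continue;
			if (!best || cache[i].patch_id > best->patch_id)
				best = &cache[i];
			break;
		}
	}
	return best;
}
```

Loading a later firmware file with a smaller or different equivalence
table would then only add entries; patches that are already cached
would stay reachable.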

Torsten


Re: [PATCH]Fix early microcode loading on AMD

2013-07-24 Thread Torsten Kaiser
On Wed, Jul 24, 2013 at 4:19 PM, Borislav Petkov b...@alien8.de wrote:
> On Tue, Jul 23, 2013 at 06:57:12PM +0200, Torsten Kaiser wrote:
>> > The other problem I see is not updating c->microcode since it is going
>> > to be overwritten by smp_store_cpu_info, which is unfortunate.
>> >
>> > And I don't see where Intel are updating that cpuinfo_x86.microcode
>> > field on early load too.
>> >
>> > So, AFAICT, c->microcode would remain unset when we only do early
>> > microcode load. But that is something we should fix as a later patch.
>>
>> I don't see a problem with that staying unset.
>> apply_microcode_amd() directly reads the rev from
>> MSR_AMD64_PATCH_LEVEL so it does not depend on that being correct.
>> And smp_store_(boot)_cpu_info will also read the current rev directly
>> from the CPU to fill ->microcode.
>
> We need to store the actual microcode revision to c->microcode for
> /proc/cpuinfo and MCE.

init_amd() will fill that field. (You could always compile with
CONFIG_MICROCODE_AMD=n and that field would still need filling)
And as that will get called before smp_store_(boot)_cpu_info()
everything should be fine.

>> > So I think you should switch load_ucode_amd_ap to __apply_microcode_amd:
>> >
>> > p = find_patch()
>> >
>> > __apply_microcode_amd(p->mc_data);
>> >
>> > which should take care of the issue you're seeing, IMHO.
>>
>> The issue I'm seeing is that collect_cpu_info_amd_early() fills c->x86
>> but not c->x86_vendor.
>> Which breaks cpu_has_amd_erratum() and then Erratum 400 breaks the boot.
>>
>> I did not really want to switch from apply_microcode_amd() to
>> __apply_microcode_amd() because then I would lose the check if the new
>> microcode is really an upgrade.
>
> Well, if the BSP has already loaded the pcache, there's no need for
> the AP to parse and load the same microcode blobs file for the initrd,
> right?

loading != applying.

load_ucode_amd_ap() should probably be called apply_ucode_amd_ap()
because that is primarily for applying the microcode.
That it also loads it (but really only once thanks to ucode_loaded) is
only because nobody else has run yet.

That whole place is hairy: Because on 32bit that seems to run much
earlier, the 64 and 32 cases are very different.
64bit can and will use pcache/apply_microcode_amd() for the non BSP
CPUs, but on 32 bit everything directly applies the patches from initrd
memory into the CPUs by directly calling __apply_microcode_amd(), and
so bypasses pcache.

See comment above the 32bit version of load_ucode_amd_ap():
/*
 * On 32-bit, since AP's early load occurs before paging is turned on, we
 * cannot traverse cpu_equiv_table and pcache in kernel heap memory. So during
 * cold boot, AP will apply_ucode_in_initrd() just like the BSP. During
 * save_microcode_in_initrd_amd() BSP's patch is copied to amd_bsp_mpb, which
 * is used upon resume from suspend.
 */

As written in the other email: I'm currently trying to see if I can
kill amd_bsp_mpb...

>> >> * load_ucode_ap(): Quick exit for !cpu, because without
>> >> load_microcode_amd() getting called apply_microcode_amd() can't do
>> >> anything. Exit, if no microcode could be loaded.
>> >
>> > This could probably be a WARN_ON(!cpu) to catch errors...
>>
>> No, load_ucode_ap() will be called for cpu == 0.
>
> This needs fixing IMO...

Can't answer that. I have only seen that it is called for cpu == 0 and
that there is no special case for CPU#0 in all the places that call
load_ucode_ap()...

> Btw, thanks for looking at this and asking critical questions!
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --


Re: [PATCH]Fix early microcode loading on AMD

2013-07-23 Thread Torsten Kaiser
On Tue, Jul 23, 2013 at 5:15 PM, Borislav Petkov  wrote:
> On Tue, Jul 23, 2013 at 01:58:53PM +0200, Torsten Kaiser wrote:
>> Fixup the early AMD microcode loading.
>>
>> * load_microcode_amd() (and the helper its using) should not have an
>> cpu parameter.
>
> Hmm, I don't think so - we get the cpu handed down from microcode_core
> and besides the early load on 32bit needs to do find_patch(cpu).
>
>> The microcode loading is not depending on the CPU it is
>
> Mostly. There are mixed-stepping boxes which need to differentiate
> between which cpu we're applying the patch for.

I redid the patch in 5 parts, hopefully easier to understand now.
Without the other changes the microcode_amd.c-part of patch 5/5 should
make it much more obvious that my change did not result in a different
behavior about which patches get loaded into the microcode cache
'pcache'.

> Btw, your config boots on my F14h box with "nomodeset" on the command
> line because it is missing radeon firmware for my gpu.
>
>> executed and all the loaded patches will end up in a global list for all
>> CPUs anyway.
>> * Return -1 (like Intels apply_microcode) when the loading fails,
>> also do not set the active microcode level on failure.
>
> Yep, this part I want. Please send it as a separate patch.

That is now patch 1/5.
Patch 2/5 is new, I skipped that part originally because I did not
want to make it even bigger...

> So I see a couple of issues in this patch and they should be separated
> into single patches - one patch taking care of one issue and explaining
> what the problem is in the commit message (I know you can do that good
> :)).


I'm still seeing some things in the microcode code that look suspicious:

Why is the X86_64 code updating uci->cpu_sig.rev, but the 32bit
version does not? And I can't see anything that reads that value.

Should apply_microcode_amd() really update uci->mc even before
checking if the microcode is newer?

The X86_32 hunk in save_microcode_in_initrd_amd() now seems obsolete.
load_microcode_amd() is no longer using find_patch() so it doesn't use
ucode_cpu_info anymore. But why is that code using
boot_cpu_data.cpu_index to find the BSP but always then passing 0 as
cpu parameter to load_microcode_amd()? If boot_cpu_data.cpu_index is
ever !=0 that code would fail.

... and collect_cpu_info_amd() also looks very weird. If csig would
not point to uci->cpu_sig then find_patch() will not be happy.
Wouldn't directly passing cpuid_eax(0x0001) to find_patch() be a
better interface? Then the early microcode loading code would not need
to access ucode_cpu_info at all.

Torsten


[PATCH 5/5] x86, AMD: simplify load_microcode_amd() to fix early microcode loading to no longer access uninitialized per-cpu data

2013-07-23 Thread Torsten Kaiser
load_microcode_amd() (and the helper it is using) should not have a
cpu parameter. The microcode loading does not depend on the CPU it is
executed on and all the loaded patches will end up in a global list for
all CPUs anyway.
The change from cpu to x86family in load_microcode_amd() now allows
dropping the code messing with cpu_data(cpu) from
collect_cpu_info_amd_early(), which is wrong anyway because at that point
the per-cpu cpu_info is not yet set up. And these values would later be
overwritten by smp_store_boot_cpu_info() / smp_store_cpu_info().

Fold the rest of collect_cpu_info_amd_early() into load_ucode_amd_ap(),
because it is only used in one place and without the cpuinfo_x86 accesses
not much of it was left.

Signed-off-by: Torsten Kaiser 

---

One effect of this early, partial initialisation of cpu_info was that the
fallback logic in cpu_has_amd_erratum() did not use boot_cpu_data, and
because x86_vendor was not initialised in the per-cpu struct the E400
erratum was not activated on my system, resulting in a failed boot.

--- a/arch/x86/include/asm/microcode_amd.h  2013-07-23 20:15:10.549501081 +0200
+++ b/arch/x86/include/asm/microcode_amd.h  2013-07-23 20:16:05.329500620 +0200
@@ -59,7 +59,7 @@ static inline u16 find_equiv_id(struct e
 
 extern int __apply_microcode_amd(struct microcode_amd *mc_amd);
 extern int apply_microcode_amd(int cpu);
-extern enum ucode_state load_microcode_amd(int cpu, const u8 *data, size_t size);
+extern enum ucode_state load_microcode_amd(u8 x86family, const u8 *data, size_t size);
 
 #ifdef CONFIG_MICROCODE_AMD_EARLY
 #ifdef CONFIG_X86_32
--- a/arch/x86/kernel/microcode_amd.c   2013-07-23 20:05:04.469506188 +0200
+++ b/arch/x86/kernel/microcode_amd.c   2013-07-23 20:23:22.739496934 +0200
@@ -145,10 +145,9 @@ static int collect_cpu_info_amd(int cpu,
return 0;
 }
 
-static unsigned int verify_patch_size(int cpu, u32 patch_size,
+static unsigned int verify_patch_size(u8 x86family, u32 patch_size,
  unsigned int size)
 {
-   struct cpuinfo_x86 *c = &cpu_data(cpu);
u32 max_size;
 
 #define F1XH_MPB_MAX_SIZE 2048
@@ -156,7 +155,7 @@ static unsigned int verify_patch_size(in
 #define F15H_MPB_MAX_SIZE 4096
 #define F16H_MPB_MAX_SIZE 3458
 
-   switch (c->x86) {
+   switch (x86family) {
case 0x14:
max_size = F14H_MPB_MAX_SIZE;
break;
@@ -283,9 +282,8 @@ static void cleanup(void)
  * driver cannot continue functioning normally. In such cases, we tear
  * down everything we've used up so far and exit.
  */
-static int verify_and_add_patch(unsigned int cpu, u8 *fw, unsigned int leftover)
+static int verify_and_add_patch(u8 x86family, u8 *fw, unsigned int leftover)
 {
-   struct cpuinfo_x86 *c = &cpu_data(cpu);
struct microcode_header_amd *mc_hdr;
struct ucode_patch *patch;
unsigned int patch_size, crnt_size, ret;
@@ -305,7 +303,7 @@ static int verify_and_add_patch(unsigned
 
/* check if patch is for the current family */
proc_fam = ((proc_fam >> 8) & 0xf) + ((proc_fam >> 20) & 0xff);
-   if (proc_fam != c->x86)
+   if (proc_fam != x86family)
return crnt_size;
 
if (mc_hdr->nb_dev_id || mc_hdr->sb_dev_id) {
@@ -314,7 +312,7 @@ static int verify_and_add_patch(unsigned
return crnt_size;
}
 
-   ret = verify_patch_size(cpu, patch_size, leftover);
+   ret = verify_patch_size(x86family, patch_size, leftover);
if (!ret) {
pr_err("Patch-ID 0x%08x: size mismatch.\n", mc_hdr->patch_id);
return crnt_size;
@@ -345,7 +343,7 @@ static int verify_and_add_patch(unsigned
return crnt_size;
 }
 
-static enum ucode_state __load_microcode_amd(int cpu, const u8 *data, size_t size)
+static enum ucode_state __load_microcode_amd(u8 x86family, const u8 *data, size_t size)
 {
enum ucode_state ret = UCODE_ERROR;
unsigned int leftover;
@@ -368,7 +366,7 @@ static enum ucode_state __load_microcode
}
 
while (leftover) {
-   crnt_size = verify_and_add_patch(cpu, fw, leftover);
+   crnt_size = verify_and_add_patch(x86family, fw, leftover);
if (crnt_size < 0)
return ret;
 
@@ -379,14 +377,14 @@ static enum ucode_state __load_microcode
return UCODE_OK;
 }
 
-enum ucode_state load_microcode_amd(int cpu, const u8 *data, size_t size)
+enum ucode_state load_microcode_amd(u8 x86family, const u8 *data, size_t size)
 {
enum ucode_state ret;
 
/* free old equiv table */
free_equiv_cpu_table();
 
-   ret = __load_microcode_amd(cpu, data, size);
+   ret = __load_microcode_amd(x86family, data, size);
 
if (ret != UCODE_OK)
cleanup();
@@ -436,7 +434,7 @@ static enum ucode_state request_microcod
goto fw_release;
}
 
-   

[PATCH 4/5] x86, AMD: save applied, not loaded microcode for reloading on resume

2013-07-23 Thread Torsten Kaiser
* Save the amd_bsp_mpb on apply, not on load. Otherwise someone could
  later load an older microcode file via load_microcode_amd() that
  would overwrite amd_bsp_mpb, but would not be applied to the CPUs
  (apply_microcode_amd() checks the current patchlevel, but the copy code
  in load_microcode_amd() did not. If somehow cleanup() gets called and
  clears pcache, find_patch() could return older patches than the
  currently installed microcode).
* Save the amd_bsp_mpb on every update. Otherwise, if someone would
  update the microcode after offlining the BSP, these updates would not
  get saved and would be lost on resume.
* apply_ucode_in_initrd() now also needs to save amd_bsp_mpb, because
  load_microcode_amd() is no longer doing this and
  apply_ucode_in_initrd() is not using apply_microcode_amd().

Signed-off-by: Torsten Kaiser 

---

Removing this hunk from load_microcode_amd() also allows me to kill the
cpu parameter for that function in the next patch...
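
(Editorial sketch, not part of the patch.) The reasoning behind "save
on apply, not on load" can be modelled in a few lines of user-space C.
The struct and function names are hypothetical, and the model only
keeps the revision check that the commit message refers to: a patch
is saved if and only if it was actually applied.

```c
#include <stdint.h>
#include <string.h>

#define MPB_MAX_SIZE 64	/* shrunk for the sketch, not the kernel value */

struct patch_sketch {
	uint32_t patch_id;
	uint8_t  data[MPB_MAX_SIZE];
};

struct cpu_sketch {
	uint32_t rev;			/* revision currently running */
	uint32_t saved_rev;		/* revision of the saved copy */
	uint8_t  saved[MPB_MAX_SIZE];	/* stand-in for amd_bsp_mpb */
};

/*
 * Apply a patch only if it is newer than the running revision, and
 * save it only in that case. Returns 1 if applied, 0 if skipped.
 */
static int apply_and_save(struct cpu_sketch *cpu, const struct patch_sketch *p)
{
	if (p->patch_id <= cpu->rev)
		return 0;	/* older or same: neither applied nor saved */

	cpu->rev = p->patch_id;
	memset(cpu->saved, 0, MPB_MAX_SIZE);
	memcpy(cpu->saved, p->data, MPB_MAX_SIZE);
	cpu->saved_rev = p->patch_id;
	return 1;
}
```

Offering an older rev Y after rev X has been applied leaves both the
running revision and the saved copy at X, so a later resume cannot
pick up a copy that does not match what the CPU is running.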

--- a/arch/x86/kernel/microcode_amd.c   2013-07-23 19:43:30.359517091 +0200
+++ b/arch/x86/kernel/microcode_amd.c   2013-07-23 20:05:04.469506188 +0200
@@ -228,6 +228,12 @@ int apply_microcode_amd(int cpu)
pr_info("CPU%d: new patch_level=0x%08x\n", cpu,
mc_amd->hdr.patch_id);
 
+#if defined(CONFIG_MICROCODE_AMD_EARLY) && defined(CONFIG_X86_32)
+   /* save applied patch for early load */
+   memset(amd_bsp_mpb, 0, MPB_MAX_SIZE);
+   memcpy(amd_bsp_mpb, p->data, min_t(u32, ksize(p->data), MPB_MAX_SIZE));
+#endif
+
uci->cpu_sig.rev = mc_amd->hdr.patch_id;
c->microcode = mc_amd->hdr.patch_id;
 
@@ -385,17 +391,6 @@ enum ucode_state load_microcode_amd(int
if (ret != UCODE_OK)
cleanup();
 
-#if defined(CONFIG_MICROCODE_AMD_EARLY) && defined(CONFIG_X86_32)
-   /* save BSP's matching patch for early load */
-   if (cpu_data(cpu).cpu_index == boot_cpu_data.cpu_index) {
-   struct ucode_patch *p = find_patch(cpu);
-   if (p) {
-   memset(amd_bsp_mpb, 0, MPB_MAX_SIZE);
-   memcpy(amd_bsp_mpb, p->data, min_t(u32, ksize(p->data),
-  MPB_MAX_SIZE));
-   }
-   }
-#endif
return ret;
 }
 
--- a/arch/x86/kernel/microcode_amd_early.c 2013-07-23 20:00:04.889508712 +0200
+++ b/arch/x86/kernel/microcode_amd_early.c 2013-07-23 20:05:14.969506099 +0200
@@ -170,6 +170,13 @@ static void apply_ucode_in_initrd(void *
mc = (struct microcode_amd *)(data + SECTION_HDR_SIZE);
if (eq_id == mc->hdr.processor_rev_id && rev < mc->hdr.patch_id)
if (__apply_microcode_amd(mc) == 0) {
+#ifdef CONFIG_X86_32
+   /* save applied patch for early load */
+   memset((void *)__pa(amd_bsp_mpb), 0,
+   MPB_MAX_SIZE);
+   memcpy((void *)__pa(amd_bsp_mpb), mc,
+   min_t(u32, header[1], MPB_MAX_SIZE));
+#endif
rev = mc->hdr.patch_id;
*new_rev = rev;
}


[PATCH 3/5] x86, AMD: cleanup: merge common code in early microcode loading

2013-07-23 Thread Torsten Kaiser
Extract common checks and initialisations from load_ucode_ap() and
save_microcode_in_initrd_amd() to load_microcode_amd_early().
load_ucode_ap() gets a quick exit for !cpu, because for the BSP there is
already a different function dealing with its update.

The original code already didn't do anything in that case, because without
load_microcode_amd() getting called apply_microcode_amd() could not do
anything.

Signed-off-by: Torsten Kaiser 

--- a/arch/x86/kernel/microcode_amd_early.c 2013-07-22 06:22:32.0 +0200
+++ b/arch/x86/kernel/microcode_amd_early.c 2013-07-23 20:00:04.889508712 +0200
@@ -196,6 +196,23 @@ void __init load_ucode_amd_bsp(void)
apply_ucode_in_initrd(cd.data, cd.size);
 }
 
+static int load_microcode_amd_early(void)
+{
+   enum ucode_state ret;
+   void *ucode;
+
+   if (ucode_loaded || !ucode_size || !initrd_start)
+   return 0;
+
+   ucode = (void *)(initrd_start + ucode_offset);
+   ret = load_microcode_amd(0, ucode, ucode_size);
+   if (ret != UCODE_OK)
+   return -EINVAL;
+
+   ucode_loaded = true;
+   return 0;
+}
+
 #ifdef CONFIG_X86_32
 u8 amd_bsp_mpb[MPB_MAX_SIZE];
 
@@ -258,17 +275,13 @@ void load_ucode_amd_ap(void)
 
collect_cpu_info_amd_early(&cpu_data(cpu), ucode_cpu_info + cpu);
 
-   if (cpu && !ucode_loaded) {
-   void *ucode;
-
-   if (!ucode_size || !initrd_start)
-   return;
+   /* BSP via load_ucode_amd_bsp() */
+   if (!cpu)
+   return;
 
-   ucode = (void *)(initrd_start + ucode_offset);
-   if (load_microcode_amd(0, ucode, ucode_size) != UCODE_OK)
-   return;
-   ucode_loaded = true;
-   }
+   load_microcode_amd_early();
+   if (!ucode_loaded)
+   return;
 
apply_microcode_amd(cpu);
 }
@@ -276,8 +289,6 @@ void load_ucode_amd_ap(void)
 
 int __init save_microcode_in_initrd_amd(void)
 {
-   enum ucode_state ret;
-   void *ucode;
 #ifdef CONFIG_X86_32
unsigned int bsp = boot_cpu_data.cpu_index;
struct ucode_cpu_info *uci = ucode_cpu_info + bsp;
@@ -289,14 +300,5 @@ int __init save_microcode_in_initrd_amd(
pr_info("microcode: updated early to new patch_level=0x%08x\n",
ucode_new_rev);
 
-   if (ucode_loaded || !ucode_size || !initrd_start)
-   return 0;
-
-   ucode = (void *)(initrd_start + ucode_offset);
-   ret = load_microcode_amd(0, ucode, ucode_size);
-   if (ret != UCODE_OK)
-   return -EINVAL;
-
-   ucode_loaded = true;
-   return 0;
+   return load_microcode_amd_early();
 }


[PATCH 2/5] x86, microcode: Don't lose error returns in save_microcode_in_initrd()

2013-07-23 Thread Torsten Kaiser
Don't lose the error return.
This was lost when early amd microcode loading was added in
757885e94a22bcc82beb9b1445c95218cb20ceab

Signed-off-by: Torsten Kaiser 

--- a/arch/x86/kernel/microcode_core_early.c 2013-07-23 19:44:05.509516795 +0200
+++ b/arch/x86/kernel/microcode_core_early.c 2013-07-23 19:58:34.459509474 +0200
@@ -127,11 +127,11 @@ int __init save_microcode_in_initrd(void
switch (c->x86_vendor) {
case X86_VENDOR_INTEL:
if (c->x86 >= 6)
-   save_microcode_in_initrd_intel();
+   return save_microcode_in_initrd_intel();
break;
case X86_VENDOR_AMD:
if (c->x86 >= 0x10)
-   save_microcode_in_initrd_amd();
+   return save_microcode_in_initrd_amd();
break;
default:
break;


[PATCH 1/5] x86, AMD: fix error path in apply_microcode_amd()

2013-07-23 Thread Torsten Kaiser
Return -1 (like Intel's apply_microcode) when the loading fails, and
do not set the active microcode level on failure.

Signed-off-by: Torsten Kaiser 

--- a/arch/x86/kernel/microcode_amd.c   2013-07-23 19:42:16.089517717 +0200
+++ b/arch/x86/kernel/microcode_amd.c   2013-07-23 19:43:30.359517091 +0200
@@ -220,12 +220,13 @@ int apply_microcode_amd(int cpu)
return 0;
}
 
-   if (__apply_microcode_amd(mc_amd))
+   if (__apply_microcode_amd(mc_amd)) {
pr_err("CPU%d: update failed for patch_level=0x%08x\n",
cpu, mc_amd->hdr.patch_id);
-   else
-   pr_info("CPU%d: new patch_level=0x%08x\n", cpu,
-   mc_amd->hdr.patch_id);
+   return -1;
+   }
+   pr_info("CPU%d: new patch_level=0x%08x\n", cpu,
+   mc_amd->hdr.patch_id);
 
uci->cpu_sig.rev = mc_amd->hdr.patch_id;
c->microcode = mc_amd->hdr.patch_id;


[PATCH v2] x86, AMD: Make cpu_has_amd_erratum() use the correct struct cpuinfo_x86

2013-07-23 Thread Torsten Kaiser
cpu_has_amd_erratum() is buggy, because it uses the per-cpu cpu_info
before it is filled by smp_store_boot_cpu_info() / smp_store_cpu_info().

If early microcode loading is enabled, its collect_cpu_info_amd_early() will
fill ->x86 and so the fallback to boot_cpu_data is not used.
But ->x86_vendor was not filled and is still 0 == X86_VENDOR_INTEL, resulting
in no errata fixes getting applied, and my system hangs on boot.

Using cpu_info in cpu_has_amd_erratum() is wrong anyway: Its only caller
init_amd() will have a struct cpuinfo_x86 as parameter and the set_cpu_bug()
that is controlled by cpu_has_amd_erratum() also only uses that struct.

So pass the struct cpuinfo_x86 from init_amd() to cpu_has_amd_erratum() and
the broken fallback can be dropped.
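
(Editorial sketch, not part of the patch.) The failure mode can be
reproduced in user-space C with a stripped-down model. The struct and
function names are hypothetical. VENDOR_INTEL being 0 is stated above;
the value 2 for VENDOR_AMD is an assumption of this sketch, and any
non-zero value shows the same effect.

```c
#include <stdbool.h>

enum { VENDOR_INTEL = 0, VENDOR_AMD = 2 };	/* AMD value assumed */

struct cpuinfo_sketch {
	int x86;		/* family, 0 while uninitialised */
	int x86_vendor;		/* 0 while uninitialised == VENDOR_INTEL */
};

/* Old logic: use the per-cpu data unless its family is still 0. */
static bool old_vendor_check(const struct cpuinfo_sketch *percpu,
			     const struct cpuinfo_sketch *boot)
{
	const struct cpuinfo_sketch *cpu = percpu;

	if (cpu->x86 == 0)
		cpu = boot;	/* fallback, defeated once ->x86 is set */

	return cpu->x86_vendor == VENDOR_AMD;
}

/* New logic: the caller hands in the struct it is working on. */
static bool new_vendor_check(const struct cpuinfo_sketch *c)
{
	return c->x86_vendor == VENDOR_AMD;
}
```

With a per-cpu struct where only ->x86 was filled in early, the old
check silently reports "not AMD", while passing the struct that
init_amd() already holds gives the expected answer.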

I also added a WARN_ON() to the vendor check because init_amd() can only
be used by AMD CPUs and if the current failure hadn't been silent this bug
would have been much more obvious.

V2: At request of Borislav Petkov: BUG_ON -> WARN_ON and subject change

Signed-off-by: Torsten Kaiser 

--- a/arch/x86/kernel/cpu/amd.c 2013-07-22 06:33:10.027931005 +0200
+++ b/arch/x86/kernel/cpu/amd.c 2013-07-22 06:35:15.757931265 +0200
@@ -512,7 +512,7 @@ static void early_init_amd(struct cpuinf
 
 static const int amd_erratum_383[];
 static const int amd_erratum_400[];
-static bool cpu_has_amd_erratum(const int *erratum);
+static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum);
 
 static void init_amd(struct cpuinfo_x86 *c)
 {
@@ -729,11 +729,11 @@ static void init_amd(struct cpuinfo_x86
value &= ~(1ULL << 24);
wrmsrl_safe(MSR_AMD64_BU_CFG2, value);
 
-   if (cpu_has_amd_erratum(amd_erratum_383))
+   if (cpu_has_amd_erratum(c, amd_erratum_383))
set_cpu_bug(c, X86_BUG_AMD_TLB_MMATCH);
}
 
-   if (cpu_has_amd_erratum(amd_erratum_400))
+   if (cpu_has_amd_erratum(c, amd_erratum_400))
set_cpu_bug(c, X86_BUG_AMD_APIC_C1E);
 
rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy);
@@ -878,22 +878,16 @@ static const int amd_erratum_400[] =
 static const int amd_erratum_383[] =
AMD_OSVW_ERRATUM(3, AMD_MODEL_RANGE(0x10, 0, 0, 0xff, 0xf));
 
-static bool cpu_has_amd_erratum(const int *erratum)
+
+static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum)
 {
-   struct cpuinfo_x86 *cpu = __this_cpu_ptr(&cpu_info);
int osvw_id = *erratum++;
u32 range;
u32 ms;
 
-   /*
-* If called early enough that current_cpu_data hasn't been initialized
-* yet, fall back to boot_cpu_data.
-*/
-   if (cpu->x86 == 0)
-   cpu = &boot_cpu_data;
-
-   if (cpu->x86_vendor != X86_VENDOR_AMD)
-   return false;
+   /* Should never be called on non-AMD-CPUs */
+   if (WARN_ON(cpu->x86_vendor != X86_VENDOR_AMD))
+   return false;
 
if (osvw_id >= 0 && osvw_id < 65536 &&
cpu_has(cpu, X86_FEATURE_OSVW)) {


Re: [PATCH]Fix early microcode loading on AMD

2013-07-23 Thread Torsten Kaiser
On Tue, Jul 23, 2013 at 5:15 PM, Borislav Petkov  wrote:
> On Tue, Jul 23, 2013 at 01:58:53PM +0200, Torsten Kaiser wrote:
>> Fixup the early AMD microcode loading.
>>
>> * load_microcode_amd() (and the helper its using) should not have an
>> cpu parameter.
>
> Hmm, I don't think so - we get the cpu handed down from microcode_core
> and besides the early load on 32bit needs to do find_patch(cpu).

That's why I moved that part into apply_microcode_amd(). See further
below for why I think that move is the right thing.
Without that part, the current cpu parameter is only used to get the
per-cpu data (which in the early load case is not even set up correctly!).
And the only member of cpuinfo_x86 that gets used is ->x86, the family:
Line 159: switch (c->x86) and Line 301: if (proc_fam != c->x86)

I really wanted to make that switch from cpu to x86family a separate
patch, so that it would be more obviously correct, but because of that
amd_bsp_mpb hunk I can't find a good cut, and that's why this patch is
larger than I would have preferred.

>> The microcode loading is not depending on the CPU it is
>
> Mostly. There are mixed-stepping boxes which need to differentiate
> between which cpu we're applying the patch for.

Nothing looks at ->x86_model or ->x86_mask during load.
It will always load all patches for the current family.
If loading really depended on the current CPU in a mixed system,
that would be horrible: depending on which CPU happens to execute
load_microcode_amd(), different patches would be loaded into
RAM?
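
A minimal sketch of the point above, assuming the size check is the only place where the load path looks at CPU data: the family alone selects the limit, so every CPU of that family would accept the same set of patches. The limits 2048, 4096 and 3458 appear in the patch context in this thread; the family 0x14 value is not visible there and is an assumption, as are the standalone types and the demo function.

```c
#include <assert.h>
#include <stdint.h>

#define F1XH_MPB_MAX_SIZE 2048
#define F14H_MPB_MAX_SIZE 1824	/* assumption: not visible in the hunk */
#define F15H_MPB_MAX_SIZE 4096
#define F16H_MPB_MAX_SIZE 3458

/*
 * Return patch_size if it fits both the remaining buffer (size) and the
 * per-family limit, otherwise 0. Only the family is consulted; model and
 * stepping play no role.
 */
static uint32_t verify_patch_size(uint8_t x86family, uint32_t patch_size,
				  uint32_t size)
{
	uint32_t max_size;

	switch (x86family) {
	case 0x14:
		max_size = F14H_MPB_MAX_SIZE;
		break;
	case 0x15:
		max_size = F15H_MPB_MAX_SIZE;
		break;
	case 0x16:
		max_size = F16H_MPB_MAX_SIZE;
		break;
	default:
		max_size = F1XH_MPB_MAX_SIZE;
		break;
	}

	if (size < max_size)
		max_size = size;

	return patch_size > max_size ? 0 : patch_size;
}

/* Small usage demo: same family, same answer, whichever CPU runs it. */
static void verify_patch_size_demo(void)
{
	assert(verify_patch_size(0x10, 2048, 4096) == 2048);
	assert(verify_patch_size(0x10, 2049, 4096) == 0);
	assert(verify_patch_size(0x16, 3458, 3000) == 0);
}
```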

> Btw, your config boots on my F14h box with "nomodeset" on the command
> line because it is missing radeon firmware for my gpu.

I suspect an F14h box will never see that hang. The hang is caused by the
C1E erratum, and amd_erratum_400[] looks like it only affects families
0xf and 0x10 (like my Phenom II X6).

>> executed and all the loaded patches will end up in a global list for all
>> CPUs anyway.
>> * Return -1 (like Intels apply_microcode) when the loading fails,
>> also do not set the active microcode level on failure.
>
> Yep, this part I want. Please send it as a separate patch.

OK, will send that together with the resend for cpu_has_amd_erratum().

>> * Save the amd_bsp_mpb on apply, not on load. Otherwise someone could
>> later load an older microcode file that would overwrite amd_bsp_mpb,
>> but would not be applied to the CPUs
>
> See the patch id check in apply_ucode_in_initrd()?
>
> if (eq_id == mc->hdr.processor_rev_id && rev < mc->hdr.patch_id)

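By "load" I meant load_microcode_amd(), not the loading of the
microcode into the CPU.

1.: load microcode rev X into CPU (early or normal is not important)
2.: get older microcode file that only contains rev Y with Y < X
3.: trigger load_microcode_amd() with a corrupt file: This will call
cleanup() and empty pcache.
4.: trigger load_microcode_amd() with the older file:
 * this will now load rev Y into pcache
 * rev Y will be returned by find_patch and copied into amd_bsp_mpb
 * any try to apply rev Y will be skipped in apply_microcode_amd()

So now the CPU still correctly has rev X, but amd_bsp_mpb will contain
the wrong rev Y.

That copying already in load_microcode also is suspicious if someone
would only load the microcode but not apply it. But I did not find a
codepath in arch/x86/kernel/microcode_core.c to load it without a
followup apply.

>> * Save the amd_bsp_mpb on every update. Otherwise someone could offline
>> the BSP, update the microcode and this would be lost on resume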
>
> Huh, is amd_bsp_mpb going to disappear all of a sudden?
>
> And that doesn't matter because when we online the BSP later, it goes
> through the CPU hotplug notifier mc_cpu_callback. Or am I missing
> something?

Yeah, me correctly describing what I was meaning. ;-)

1.: boot system, BIOS gives microcode rev. X
2.: offline the BSP
3.: update microcode to rev. Y with Y > X
Because the BSP is not online rev. Y will not be copied into amd_bsp_mpb
4.: suspend
5.: resume, BIOS gives rev. X again
6.: amd_bsp_mpb is empty -> rev. Y will not be reapplied.
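
The six steps above can be replayed in a toy model. This is only a sketch under stated assumptions: revisions are plain integers, there are two CPUs with CPU 0 as the BSP, `saved_rev` stands in for amd_bsp_mpb, and all names are hypothetical. It is not the kernel code, it only contrasts "save when the BSP took the patch" with "save on every apply".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ucode_model {
	unsigned int cpu_rev[2];	/* revision running on CPU 0 (BSP) and CPU 1 */
	bool online[2];
	unsigned int saved_rev;		/* stand-in for amd_bsp_mpb, 0 = empty */
};

/* Apply new_rev on every online CPU that runs something older. */
static void update(struct ucode_model *m, unsigned int new_rev,
		   bool save_on_every_apply)
{
	assert(m != NULL);
	for (int cpu = 0; cpu < 2; cpu++) {
		if (!m->online[cpu] || m->cpu_rev[cpu] >= new_rev)
			continue;
		m->cpu_rev[cpu] = new_rev;
		/* old behaviour: remember the patch only if the BSP took it */
		if (save_on_every_apply || cpu == 0)
			m->saved_rev = new_rev;
	}
}

/* Resume: the BIOS revision is back, then the saved patch is re-applied. */
static void suspend_resume(struct ucode_model *m, unsigned int bios_rev)
{
	assert(m != NULL);
	m->online[0] = m->online[1] = true;
	m->cpu_rev[0] = m->cpu_rev[1] = bios_rev;
	if (m->saved_rev > bios_rev)
		m->cpu_rev[0] = m->saved_rev;
}

/* Steps 1-6 with X = 1 and Y = 2; returns the BSP revision after resume. */
static unsigned int bsp_rev_after_scenario(bool save_on_every_apply)
{
	struct ucode_model m = { { 1, 1 }, { true, true }, 0 };

	m.online[0] = false;			/* step 2: offline the BSP */
	update(&m, 2, save_on_every_apply);	/* step 3: update to rev Y */
	suspend_resume(&m, 1);			/* steps 4-6 */
	return m.cpu_rev[0];
}
```

In this model the BSP comes back with rev X when the copy is only saved for the BSP, and with rev Y when it is saved on every apply.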

>> * apply_ucode_in_initrd() now also needs to save amd_bsp_mbp, because
>> load_microcode_amd() its no longer doing this and its not using
>> apply_microcode_amd().
>> * extract common checks and initialisations from load_ucode_ap() and
>> load_microcode_amd() to load_microcode_amd_early(). The change from
>> cpu to x86family in load_microcode_amd() allows to drop the code messing
>> with cpu_data(cpu), with is wrong anyway because at that point the
>> per-cpu cpu_info is not yet setup. And these values would later be
>> overwritten by smp_store_boot_cpu_info() / smp_store_cpu_info().
>
> Right, so I was thinking about this. And the code is pretty nasty: we do a
> load_ucode_amd_ap() but we do add the ucode for the BSP:
>
> if (load_microcode_amd(0, ucode, ucode_size) != UCODE_OK)

No, that code will not be reached for the BSP, because it is behind:
if (cpu && !ucode_loaded) {
The BSP has cpu == 0. That's why I am adding the following in my patch:
+   /* BSP via load_ucode_amd_bsp() */
+   if (!cpu)
+   return;

I don't know if that is really correct, but that was the
original behavior, and I didn't feel competent enough to decree that
calling load_microcode_amd() for the BSP would be safe.
(The code there is strange: There is a load_ucode_amd_

[PATCH]Fix early microcode loading on AMD

2013-07-23 Thread Torsten Kaiser
Fixup the early AMD microcode loading.

* load_microcode_amd() (and the helper it's using) should not have a
cpu parameter. The microcode loading does not depend on the CPU it is
executed on, and all the loaded patches will end up in a global list for
all CPUs anyway.
* Return -1 (like Intel's apply_microcode) when the loading fails, also
do not set the active microcode level on failure.
* Save the amd_bsp_mpb on apply, not on load. Otherwise someone could later
load an older microcode file that would overwrite amd_bsp_mpb, but would
not be applied to the CPUs.
* Save the amd_bsp_mpb on every update. Otherwise someone could offline
the BSP, update the microcode and this would be lost on resume.
* apply_ucode_in_initrd() now also needs to save amd_bsp_mpb, because
load_microcode_amd() is no longer doing this and it is not using
apply_microcode_amd().
* Extract common checks and initialisations from load_ucode_ap() and
load_microcode_amd() to load_microcode_amd_early(). The change from
cpu to x86family in load_microcode_amd() allows dropping the code messing
with cpu_data(cpu), which is wrong anyway because at that point the
per-cpu cpu_info is not yet set up. And these values would later be
overwritten by smp_store_boot_cpu_info() / smp_store_cpu_info().
* Fold collect_cpu_info_amd_early() into load_ucode_amd_ap(), because it's
only used in one place.
* load_ucode_ap(): Quick exit for !cpu, because without load_microcode_amd()
getting called apply_microcode_amd() can't do anything. Exit if no microcode
could be loaded.
* Reduce save_microcode_in_initrd_amd() by reusing load_microcode_amd_early().

The main benefit is that the early microcode loading no longer plays games
with the not-yet-initialised per-cpu cpu_info. apply_microcode_amd() will
still write into cpu_data(cpu)->microcode, but I see no good way to remove
that there, because for non-early microcode updates that is exactly the
right place for that update.

Signed-off-by: Torsten Kaiser <just.for.l...@googlemail.com>

---

This alone also fixes the hang-on-boot I experienced with 3.11-rc1 even
if the fix for cpu_has_amd_erratum() is not applied, because now the
trigger (filling ->x86 but not ->x86_vendor) is no longer there. But I
think both patches should be applied.

Boot tested on 64 and 32bit, but as my BIOS already provides up-to-date
microcode I could not test, if that gets applied correctly.

--- a/arch/x86/include/asm/microcode_amd.h  2013-07-22 17:54:25.166193431 +0200
+++ b/arch/x86/include/asm/microcode_amd.h  2013-07-22 17:56:31.066192463 +0200
@@ -59,7 +59,7 @@ static inline u16 find_equiv_id(struct e
 
 extern int __apply_microcode_amd(struct microcode_amd *mc_amd);
 extern int apply_microcode_amd(int cpu);
-extern enum ucode_state load_microcode_amd(int cpu, const u8 *data, size_t size);
+extern enum ucode_state load_microcode_amd(u8 x86family, const u8 *data, size_t size);
 
 #ifdef CONFIG_MICROCODE_AMD_EARLY
 #ifdef CONFIG_X86_32
--- a/arch/x86/kernel/microcode_amd.c   2013-07-22 17:33:55.856202878 +0200
+++ b/arch/x86/kernel/microcode_amd.c   2013-07-22 21:45:28.186086900 +0200
@@ -145,10 +145,9 @@ static int collect_cpu_info_amd(int cpu,
return 0;
 }
 
-static unsigned int verify_patch_size(int cpu, u32 patch_size,
+static unsigned int verify_patch_size(u8 x86family, u32 patch_size,
  unsigned int size)
 {
-   struct cpuinfo_x86 *c = &cpu_data(cpu);
u32 max_size;
 
 #define F1XH_MPB_MAX_SIZE 2048
@@ -156,7 +155,7 @@ static unsigned int verify_patch_size(in
 #define F15H_MPB_MAX_SIZE 4096
 #define F16H_MPB_MAX_SIZE 3458
 
-   switch (c->x86) {
+   switch (x86family) {
case 0x14:
max_size = F14H_MPB_MAX_SIZE;
break;
@@ -220,12 +219,20 @@ int apply_microcode_amd(int cpu)
return 0;
}
 
-   if (__apply_microcode_amd(mc_amd))
+   if (__apply_microcode_amd(mc_amd)) {
pr_err("CPU%d: update failed for patch_level=0x%08x\n",
cpu, mc_amd->hdr.patch_id);
-   else
-   pr_info("CPU%d: new patch_level=0x%08x\n", cpu,
-   mc_amd->hdr.patch_id);
+   return -1;
+   }
+
+#if defined(CONFIG_MICROCODE_AMD_EARLY) && defined(CONFIG_X86_32)
+   /* save applied patch for early load */
+   memset(amd_bsp_mpb, 0, MPB_MAX_SIZE);
+   memcpy(amd_bsp_mpb, p->data, min_t(u32, ksize(p->data), MPB_MAX_SIZE));
+#endif
+
+   pr_info("CPU%d: new patch_level=0x%08x\n", cpu,
+   mc_amd->hdr.patch_id);
 
uci->cpu_sig.rev = mc_amd->hdr.patch_id;
c->microcode = mc_amd->hdr.patch_id;
@@ -276,9 +283,8 @@ static void cleanup(void)
  * driver cannot continue functioning normally. In such cases, we tear
  * down everything we've used up so far and exit.
  */
-static int verify_and_add_patch(unsigned int cpu, u8 *fw, unsigned int leftover)
+static int verify_and_add_patch(u8 x86family, u8 *fw, unsigned int leftover

[PATCH]Fix boot hang in 3.11-rc1/2 because of bug in AMD errata check

2013-07-23 Thread Torsten Kaiser
cpu_has_amd_erratum() is buggy, because it uses the per-cpu cpu_info
before it is filled by smp_store_boot_cpu_info() / smp_store_cpu_info().

If early microcode loading is enabled, its collect_cpu_info_amd_early() will
fill ->x86, and so the fallback to boot_cpu_data is not used.
But ->x86_vendor was not filled and is still 0 == X86_VENDOR_INTEL, resulting
in no errata fixes getting applied, and my system hangs on boot.

Using cpu_info in cpu_has_amd_erratum() is wrong anyway: its only caller,
init_amd(), has a struct cpuinfo_x86 as parameter, and the set_cpu_bug()
that is controlled by cpu_has_amd_erratum() also only uses that struct.

So pass the struct cpuinfo_x86 from init_amd() to cpu_has_amd_erratum() and
the broken fallback can be dropped.
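
The failure mode described above can be reproduced outside the kernel. The following is a reduced sketch, not the kernel function: `struct cpuinfo` is a hypothetical two-field stand-in for struct cpuinfo_x86, the numeric value used for the AMD vendor id is an assumption, and the range/OSVW matching is elided so that only the vendor check remains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { VENDOR_INTEL = 0, VENDOR_AMD = 2 };	/* AMD value is an assumption */

/* Reduced stand-in for struct cpuinfo_x86. */
struct cpuinfo {
	int family;	/* ->x86 */
	int vendor;	/* ->x86_vendor */
};

/* Old scheme: use the per-cpu data, fall back to boot data if family == 0. */
static bool has_erratum_old(const struct cpuinfo *percpu,
			    const struct cpuinfo *boot)
{
	const struct cpuinfo *cpu = percpu;

	assert(percpu != NULL && boot != NULL);
	if (cpu->family == 0)
		cpu = boot;
	if (cpu->vendor != VENDOR_AMD)
		return false;	/* silent: the workaround is skipped */
	return true;		/* erratum range matching elided */
}

/* New scheme: the caller hands in the struct it is initialising. */
static bool has_erratum_new(const struct cpuinfo *cpu)
{
	assert(cpu != NULL);
	if (cpu->vendor != VENDOR_AMD)
		return false;
	return true;		/* erratum range matching elided */
}

static void erratum_demo(void)
{
	struct cpuinfo boot = { 0x10, VENDOR_AMD };
	struct cpuinfo empty = { 0, 0 };		/* untouched per-cpu data */
	struct cpuinfo half = { 0x10, VENDOR_INTEL };	/* family set, vendor not */

	assert(has_erratum_old(&empty, &boot));		/* fallback works */
	assert(!has_erratum_old(&half, &boot));		/* the reported bug */
	assert(has_erratum_new(&boot));			/* fixed variant */
}
```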

I also turned the vendor check into a BUG_ON(), because init_amd() can only
be used by AMD CPUs, and if the current failure hadn't been silent this bug
would have been much more obvious.

Signed-off-by: Torsten Kaiser <just.for.l...@googlemail.com>

--- a/arch/x86/kernel/cpu/amd.c 2013-07-22 06:33:10.027931005 +0200
+++ b/arch/x86/kernel/cpu/amd.c 2013-07-22 06:35:15.757931265 +0200
@@ -512,7 +512,7 @@ static void early_init_amd(struct cpuinf
 
 static const int amd_erratum_383[];
 static const int amd_erratum_400[];
-static bool cpu_has_amd_erratum(const int *erratum);
+static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum);
 
 static void init_amd(struct cpuinfo_x86 *c)
 {
@@ -729,11 +729,11 @@ static void init_amd(struct cpuinfo_x86
value &= ~(1ULL << 24);
wrmsrl_safe(MSR_AMD64_BU_CFG2, value);
 
-   if (cpu_has_amd_erratum(amd_erratum_383))
+   if (cpu_has_amd_erratum(c, amd_erratum_383))
set_cpu_bug(c, X86_BUG_AMD_TLB_MMATCH);
}
 
-   if (cpu_has_amd_erratum(amd_erratum_400))
+   if (cpu_has_amd_erratum(c, amd_erratum_400))
set_cpu_bug(c, X86_BUG_AMD_APIC_C1E);
 
rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy);
@@ -878,22 +878,15 @@ static const int amd_erratum_400[] =
 static const int amd_erratum_383[] =
AMD_OSVW_ERRATUM(3, AMD_MODEL_RANGE(0x10, 0, 0, 0xff, 0xf));
 
-static bool cpu_has_amd_erratum(const int *erratum)
+
+static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum)
 {
-   struct cpuinfo_x86 *cpu = __this_cpu_ptr(&cpu_info);
int osvw_id = *erratum++;
u32 range;
u32 ms;
 
-   /*
-* If called early enough that current_cpu_data hasn't been initialized
-* yet, fall back to boot_cpu_data.
-*/
-   if (cpu->x86 == 0)
-   cpu = &boot_cpu_data;
-
-   if (cpu->x86_vendor != X86_VENDOR_AMD)
-   return false;
+   /* Should never be called on non-AMD-CPUs */
+   BUG_ON(cpu->x86_vendor != X86_VENDOR_AMD);
 
if (osvw_id >= 0 && osvw_id < 65536 &&
cpu_has(cpu, X86_FEATURE_OSVW)) {


Re: [PATCH]Fix early microcode loading on AMD

2013-07-23 Thread Torsten Kaiser
On Tue, Jul 23, 2013 at 5:15 PM, Borislav Petkov <b...@alien8.de> wrote:
> On Tue, Jul 23, 2013 at 01:58:53PM +0200, Torsten Kaiser wrote:
>> Fixup the early AMD microcode loading.
>>
>> * load_microcode_amd() (and the helper its using) should not have an
>> cpu parameter.
>
> Hmm, I don't think so - we get the cpu handed down from microcode_core
> and besides the early load on 32bit needs to do find_patch(cpu).

That's why I moved that part into apply_microcode_amd(). See further
below for why I think that move is the right thing.
Without that part, the current cpu parameter is only used to get the
per-cpu data (which in the early load case is not even set up correctly!).
And the only member of cpuinfo_x86 that gets used is ->x86, the family:
Line 159: switch (c->x86) and Line 301: if (proc_fam != c->x86)

I really wanted to make that switch from cpu to x86family a separate
patch, so that it would be more obviously correct, but because of that
amd_bsp_mpb hunk I can't find a good cut, and that's why this patch is
larger than I would have preferred.

>> The microcode loading is not depending on the CPU it is
>
> Mostly. There are mixed-stepping boxes which need to differentiate
> between which cpu we're applying the patch for.

Nothing looks at ->x86_model or ->x86_mask during load.
It will always load all patches for the current family.
If loading really depended on the current CPU in a mixed system,
that would be horrible: depending on which CPU happens to execute
load_microcode_amd(), different patches would be loaded into
RAM?

> Btw, your config boots on my F14h box with "nomodeset" on the command
> line because it is missing radeon firmware for my gpu.

I suspect an F14h box will never see that hang. The hang is caused by the
C1E erratum, and amd_erratum_400[] looks like it only affects families
0xf and 0x10 (like my Phenom II X6).

>> executed and all the loaded patches will end up in a global list for all
>> CPUs anyway.
>> * Return -1 (like Intels apply_microcode) when the loading fails,
>> also do not set the active microcode level on failure.
>
> Yep, this part I want. Please send it as a separate patch.

OK, will send that together with the resend for cpu_has_amd_erratum().

>> * Save the amd_bsp_mpb on apply, not on load. Otherwise someone could
>> later load an older microcode file that would overwrite amd_bsp_mpb,
>> but would not be applied to the CPUs
>
> See the patch id check in apply_ucode_in_initrd()?
>
> if (eq_id == mc->hdr.processor_rev_id && rev < mc->hdr.patch_id)

By "load" I meant load_microcode_amd(), not the loading of the
microcode into the CPU.

1.: load microcode rev X into CPU (early or normal is not important)
2.: get older microcode file that only contains rev Y with Y < X
3.: trigger load_microcode_amd() with a corrupt file: This will call
cleanup() and empty pcache.
4.: trigger load_microcode_amd() with the older file:
 * this will now load rev Y into pcache
 * rev Y will be returned by find_patch and copied into amd_bsp_mpb
 * any try to apply rev Y will be skipped in apply_microcode_amd()

So now the CPU still correctly has rev X, but amd_bsp_mpb will contain
the wrong rev Y.

That copying already in load_microcode also is suspicious if someone
would only load the microcode but not apply it. But I did not find a
codepath in arch/x86/kernel/microcode_core.c to load it without a
followup apply.

>> * Save the amd_bsp_mpb on every update. Otherwise someone could offline
>> the BSP, update the microcode and this would be lost on resume
>
> Huh, is amd_bsp_mpb going to disappear all of a sudden?
>
> And that doesn't matter because when we online the BSP later, it goes
> through the CPU hotplug notifier mc_cpu_callback. Or am I missing
> something?

Yeah, me correctly describing what I was meaning. ;-)

1.: boot system, BIOS gives microcode rev. X
2.: offline the BSP
3.: update microcode to rev. Y with Y > X
Because the BSP is not online rev. Y will not be copied into amd_bsp_mpb
4.: suspend
5.: resume, BIOS gives rev. X again
6.: amd_bsp_mpb is empty -> rev. Y will not be reapplied.

>> * apply_ucode_in_initrd() now also needs to save amd_bsp_mbp, because
>> load_microcode_amd() its no longer doing this and its not using
>> apply_microcode_amd().
>> * extract common checks and initialisations from load_ucode_ap() and
>> load_microcode_amd() to load_microcode_amd_early(). The change from
>> cpu to x86family in load_microcode_amd() allows to drop the code messing
>> with cpu_data(cpu), with is wrong anyway because at that point the
>> per-cpu cpu_info is not yet setup. And these values would later be
>> overwritten by smp_store_boot_cpu_info() / smp_store_cpu_info().
>
> Right, so I was thinking about this. And the code is pretty nasty: we do a
> load_ucode_amd_ap() but we do add the ucode for the BSP:
>
> if (load_microcode_amd(0, ucode, ucode_size) != UCODE_OK)

No, that code will not be reached for the BSP, because it is behind:
if (cpu && !ucode_loaded) {
The BSP has cpu == 0. That's why I am adding the following in my patch:
+   /* BSP via load_ucode_amd_bsp

[PATCH v2] x86, AMD: Make cpu_has_amd_erratum() use the correct struct cpuinfo_x86

2013-07-23 Thread Torsten Kaiser
cpu_has_amd_erratum() is buggy, because it uses the per-cpu cpu_info
before it is filled by smp_store_boot_cpu_info() / smp_store_cpu_info().

If early microcode loading is enabled, its collect_cpu_info_amd_early() will
fill ->x86, and so the fallback to boot_cpu_data is not used.
But ->x86_vendor was not filled and is still 0 == X86_VENDOR_INTEL, resulting
in no errata fixes getting applied, and my system hangs on boot.

Using cpu_info in cpu_has_amd_erratum() is wrong anyway: its only caller,
init_amd(), has a struct cpuinfo_x86 as parameter, and the set_cpu_bug()
that is controlled by cpu_has_amd_erratum() also only uses that struct.

So pass the struct cpuinfo_x86 from init_amd() to cpu_has_amd_erratum(),
and the broken fallback can be dropped.

I also added a WARN_ON() to the vendor check, because init_amd() can only
be used by AMD CPUs, and if the current failure hadn't been silent this bug
would have been much more obvious.

V2: At request of Borislav Petkov: BUG_ON -> WARN_ON and subject change

Signed-off-by: Torsten Kaiser <just.for.l...@googlemail.com>

--- a/arch/x86/kernel/cpu/amd.c 2013-07-22 06:33:10.027931005 +0200
+++ b/arch/x86/kernel/cpu/amd.c 2013-07-22 06:35:15.757931265 +0200
@@ -512,7 +512,7 @@ static void early_init_amd(struct cpuinf
 
 static const int amd_erratum_383[];
 static const int amd_erratum_400[];
-static bool cpu_has_amd_erratum(const int *erratum);
+static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum);
 
 static void init_amd(struct cpuinfo_x86 *c)
 {
@@ -729,11 +729,11 @@ static void init_amd(struct cpuinfo_x86
value &= ~(1ULL << 24);
wrmsrl_safe(MSR_AMD64_BU_CFG2, value);
 
-   if (cpu_has_amd_erratum(amd_erratum_383))
+   if (cpu_has_amd_erratum(c, amd_erratum_383))
set_cpu_bug(c, X86_BUG_AMD_TLB_MMATCH);
}
 
-   if (cpu_has_amd_erratum(amd_erratum_400))
+   if (cpu_has_amd_erratum(c, amd_erratum_400))
set_cpu_bug(c, X86_BUG_AMD_APIC_C1E);
 
rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy);
@@ -878,22 +878,16 @@ static const int amd_erratum_400[] =
 static const int amd_erratum_383[] =
AMD_OSVW_ERRATUM(3, AMD_MODEL_RANGE(0x10, 0, 0, 0xff, 0xf));
 
-static bool cpu_has_amd_erratum(const int *erratum)
+
+static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum)
 {
-   struct cpuinfo_x86 *cpu = __this_cpu_ptr(&cpu_info);
int osvw_id = *erratum++;
u32 range;
u32 ms;
 
-   /*
-* If called early enough that current_cpu_data hasn't been initialized
-* yet, fall back to boot_cpu_data.
-*/
-   if (cpu->x86 == 0)
-   cpu = &boot_cpu_data;
-
-   if (cpu->x86_vendor != X86_VENDOR_AMD)
-   return false;
+   /* Should never be called on non-AMD-CPUs */
+   if (WARN_ON(cpu->x86_vendor != X86_VENDOR_AMD))
+   return false;
 
if (osvw_id >= 0 && osvw_id < 65536 &&
cpu_has(cpu, X86_FEATURE_OSVW)) {


[PATCH 1/5] x86, AMD: fix error path in apply_microcode_amd()

2013-07-23 Thread Torsten Kaiser
Return -1 (like Intel's apply_microcode) when the loading fails, also
do not set the active microcode level on failure.
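
A minimal sketch of the error path described above, with assumptions stated up front: `hw_apply()` is a hypothetical stand-in for __apply_microcode_amd() whose failure can be forced, and `struct cpu_sig` is reduced to the one field that matters here. The point is only that the recorded revision stays untouched when the update fails.

```c
#include <assert.h>
#include <stdio.h>

struct cpu_sig {
	unsigned int rev;	/* microcode level recorded as active */
};

/* Hypothetical stand-in for the hardware update; non-zero means failure. */
static int hw_apply(unsigned int patch_id, int simulate_failure)
{
	(void)patch_id;
	return simulate_failure ? -1 : 0;
}

/* Returns 0 on success, -1 on failure; sig->rev only changes on success. */
static int apply_patch(struct cpu_sig *sig, unsigned int patch_id,
		       int simulate_failure)
{
	assert(sig != NULL);

	if (hw_apply(patch_id, simulate_failure)) {
		fprintf(stderr, "update failed for patch_level=0x%08x\n",
			patch_id);
		return -1;
	}

	printf("new patch_level=0x%08x\n", patch_id);
	sig->rev = patch_id;
	return 0;
}
```

With the old if/else structure the assignment to the recorded revision would run in both branches, which is what the patch removes.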

Signed-off-by: Torsten Kaiser <just.for.l...@googlemail.com>

--- a/arch/x86/kernel/microcode_amd.c   2013-07-23 19:42:16.089517717 +0200
+++ b/arch/x86/kernel/microcode_amd.c   2013-07-23 19:43:30.359517091 +0200
@@ -220,12 +220,13 @@ int apply_microcode_amd(int cpu)
return 0;
}
 
-   if (__apply_microcode_amd(mc_amd))
+   if (__apply_microcode_amd(mc_amd)) {
pr_err("CPU%d: update failed for patch_level=0x%08x\n",
cpu, mc_amd->hdr.patch_id);
-   else
-   pr_info("CPU%d: new patch_level=0x%08x\n", cpu,
-   mc_amd->hdr.patch_id);
+   return -1;
+   }
+   pr_info("CPU%d: new patch_level=0x%08x\n", cpu,
+   mc_amd->hdr.patch_id);
 
uci->cpu_sig.rev = mc_amd->hdr.patch_id;
c->microcode = mc_amd->hdr.patch_id;


[PATCH 2/5] x86, microcode: Don't lose error returns in save_microcode_in_initrd()

2013-07-23 Thread Torsten Kaiser
Don't lose the error return.
This was lost when early amd microcode loading was added in
757885e94a22bcc82beb9b1445c95218cb20ceab
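
The lost return value can be illustrated with a small standalone sketch. Everything here is hypothetical (the enum, the helper names, and the AMD helper that always fails); it only contrasts calling a helper and discarding its result with returning that result to the caller.

```c
#include <assert.h>
#include <errno.h>

enum vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

static int save_intel(void) { return 0; }
static int save_amd(void)   { return -EINVAL; }	/* pretend this path fails */

/* Before: helper results are dropped, the caller always sees success. */
static int save_lossy(enum vendor v)
{
	switch (v) {
	case VENDOR_INTEL:
		save_intel();
		break;
	case VENDOR_AMD:
		save_amd();
		break;
	default:
		break;
	}
	return 0;
}

/* After: the helper's result is handed back to the caller. */
static int save_propagating(enum vendor v)
{
	switch (v) {
	case VENDOR_INTEL:
		return save_intel();
	case VENDOR_AMD:
		return save_amd();
	default:
		break;
	}
	return 0;
}

static void save_demo(void)
{
	assert(save_lossy(VENDOR_AMD) == 0);		/* error swallowed */
	assert(save_propagating(VENDOR_AMD) == -EINVAL);
}
```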

Signed-off-by: Torsten Kaiser <just.for.l...@googlemail.com>

--- a/arch/x86/kernel/microcode_core_early.c 2013-07-23 19:44:05.509516795 +0200
+++ b/arch/x86/kernel/microcode_core_early.c 2013-07-23 19:58:34.459509474 +0200
@@ -127,11 +127,11 @@ int __init save_microcode_in_initrd(void
switch (c->x86_vendor) {
case X86_VENDOR_INTEL:
if (c->x86 >= 6)
-   save_microcode_in_initrd_intel();
+   return save_microcode_in_initrd_intel();
break;
case X86_VENDOR_AMD:
if (c->x86 >= 0x10)
-   save_microcode_in_initrd_amd();
+   return save_microcode_in_initrd_amd();
break;
default:
break;


[PATCH 3/5] x86, AMD: cleanup: merge common code in early microcode loading

2013-07-23 Thread Torsten Kaiser
Extract common checks and initialisations from load_ucode_amd_ap() and
save_microcode_in_initrd_amd() to load_microcode_amd_early().
load_ucode_amd_ap() gets a quick exit for !cpu, because for the BSP there is
already a different function dealing with its update.

The original code already didn't do anything in that case, because without
load_microcode_amd() getting called apply_microcode_amd() could not do anything.

Signed-off-by: Torsten Kaiser <just.for.l...@googlemail.com>

--- a/arch/x86/kernel/microcode_amd_early.c 2013-07-22 06:22:32.0 +0200
+++ b/arch/x86/kernel/microcode_amd_early.c 2013-07-23 20:00:04.889508712 +0200
@@ -196,6 +196,23 @@ void __init load_ucode_amd_bsp(void)
apply_ucode_in_initrd(cd.data, cd.size);
 }
 
+static int load_microcode_amd_early(void)
+{
+   enum ucode_state ret;
+   void *ucode;
+
+   if (ucode_loaded || !ucode_size || !initrd_start)
+   return 0;
+
+   ucode = (void *)(initrd_start + ucode_offset);
+   ret = load_microcode_amd(0, ucode, ucode_size);
+   if (ret != UCODE_OK)
+   return -EINVAL;
+
+   ucode_loaded = true;
+   return 0;
+}
+
 #ifdef CONFIG_X86_32
 u8 amd_bsp_mpb[MPB_MAX_SIZE];
 
@@ -258,17 +275,13 @@ void load_ucode_amd_ap(void)
 
collect_cpu_info_amd_early(&cpu_data(cpu), ucode_cpu_info + cpu);
 
-   if (cpu && !ucode_loaded) {
-   void *ucode;
-
-   if (!ucode_size || !initrd_start)
-   return;
+   /* BSP via load_ucode_amd_bsp() */
+   if (!cpu)
+   return;
 
-   ucode = (void *)(initrd_start + ucode_offset);
-   if (load_microcode_amd(0, ucode, ucode_size) != UCODE_OK)
-   return;
-   ucode_loaded = true;
-   }
+   load_microcode_amd_early();
+   if (!ucode_loaded)
+   return;
 
apply_microcode_amd(cpu);
 }
@@ -276,8 +289,6 @@ void load_ucode_amd_ap(void)
 
 int __init save_microcode_in_initrd_amd(void)
 {
-   enum ucode_state ret;
-   void *ucode;
 #ifdef CONFIG_X86_32
unsigned int bsp = boot_cpu_data.cpu_index;
struct ucode_cpu_info *uci = ucode_cpu_info + bsp;
@@ -289,14 +300,5 @@ int __init save_microcode_in_initrd_amd(
pr_info("microcode: updated early to new patch_level=0x%08x\n",
ucode_new_rev);
 
-   if (ucode_loaded || !ucode_size || !initrd_start)
-   return 0;
-
-   ucode = (void *)(initrd_start + ucode_offset);
-   ret = load_microcode_amd(0, ucode, ucode_size);
-   if (ret != UCODE_OK)
-   return -EINVAL;
-
-   ucode_loaded = true;
-   return 0;
+   return load_microcode_amd_early();
 }

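Editorial sketch for patch 3/5: the idea of folding the duplicated checks into one helper that is safe to call repeatedly can be modelled outside the kernel. This is a minimal user-space illustration under stated assumptions, not the kernel code. The names blob, blob_size, blob_loaded, parse_blob() and ap_can_apply() are hypothetical stand-ins for the initrd microcode location, ucode_loaded, load_microcode_amd() and the AP path.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel state; names are illustrative only. */
static bool blob_loaded;
static const void *blob;
static size_t blob_size;
static int parse_calls;

/* Stand-in for the real parser: counts calls, rejects a leading zero byte. */
static int parse_blob(const void *data, size_t size)
{
	parse_calls++;
	return (size > 0 && ((const unsigned char *)data)[0] != 0) ? 0 : -1;
}

/*
 * Common early-load helper: returns 0 when there is nothing to do or the
 * load succeeded, -EINVAL when parsing failed. Safe to call repeatedly,
 * because a successful load is remembered in blob_loaded.
 */
static int load_blob_early(void)
{
	if (blob_loaded || !blob_size || !blob)
		return 0;

	if (parse_blob(blob, blob_size) != 0)
		return -EINVAL;

	blob_loaded = true;
	return 0;
}

/* AP-style caller: ignores the return code, only checks the loaded flag. */
static bool ap_can_apply(void)
{
	load_blob_early();
	return blob_loaded;
}
```

Both callers share one guard, so whichever runs first does the parsing and the later one becomes a no-op. That mirrors the intent described in the commit message, as far as it can be read from the diff.
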

[PATCH 4/5] x86, AMD: saved applied, not loaded microcode for reloading on resume

2013-07-23 Thread Torsten Kaiser
* Save amd_bsp_mpb on apply, not on load. Otherwise someone could
  later load an older microcode file via load_microcode_amd() that
  would overwrite amd_bsp_mpb, but would not be applied to the CPUs.
  (apply_microcode_amd() checks the current patch level, but the copy code
  in load_microcode_amd() did not. If cleanup() somehow gets called and
  clears pcache, find_patch() could return patches older than the
  currently installed microcode.)
* Save amd_bsp_mpb on every update. Otherwise, if someone updated
  the microcode after offlining the BSP, these updates would not
  get saved and would be lost on resume.
* apply_ucode_in_initrd() now also needs to save amd_bsp_mpb, because
  load_microcode_amd() is no longer doing this and it is not using
  apply_microcode_amd().

Signed-off-by: Torsten Kaiser <just.for.l...@googlemail.com>

---

Removing this hunk from load_microcode_amd() also allows me to kill the
cpu parameter for that function in the next patch...

--- a/arch/x86/kernel/microcode_amd.c   2013-07-23 19:43:30.359517091 +0200
+++ b/arch/x86/kernel/microcode_amd.c   2013-07-23 20:05:04.469506188 +0200
@@ -228,6 +228,12 @@ int apply_microcode_amd(int cpu)
pr_info("CPU%d: new patch_level=0x%08x\n", cpu,
mc_amd->hdr.patch_id);
 
+#if defined(CONFIG_MICROCODE_AMD_EARLY) && defined(CONFIG_X86_32)
+   /* save applied patch for early load */
+   memset(amd_bsp_mpb, 0, MPB_MAX_SIZE);
+   memcpy(amd_bsp_mpb, p->data, min_t(u32, ksize(p->data), MPB_MAX_SIZE));
+#endif
+
uci->cpu_sig.rev = mc_amd->hdr.patch_id;
c->microcode = mc_amd->hdr.patch_id;
 
@@ -385,17 +391,6 @@ enum ucode_state load_microcode_amd(int
if (ret != UCODE_OK)
cleanup();
 
-#if defined(CONFIG_MICROCODE_AMD_EARLY) && defined(CONFIG_X86_32)
-   /* save BSP's matching patch for early load */
-   if (cpu_data(cpu).cpu_index == boot_cpu_data.cpu_index) {
-   struct ucode_patch *p = find_patch(cpu);
-   if (p) {
-   memset(amd_bsp_mpb, 0, MPB_MAX_SIZE);
-   memcpy(amd_bsp_mpb, p->data, min_t(u32, ksize(p->data),
-  MPB_MAX_SIZE));
-   }
-   }
-#endif
return ret;
 }
 
--- a/arch/x86/kernel/microcode_amd_early.c 2013-07-23 20:00:04.889508712 +0200
+++ b/arch/x86/kernel/microcode_amd_early.c 2013-07-23 20:05:14.969506099 +0200
@@ -170,6 +170,13 @@ static void apply_ucode_in_initrd(void *
mc = (struct microcode_amd *)(data + SECTION_HDR_SIZE);
if (eq_id == mc->hdr.processor_rev_id && rev < mc->hdr.patch_id)
if (__apply_microcode_amd(mc) == 0) {
+#ifdef CONFIG_X86_32
+   /* save applied patch for early load */
+   memset((void *)__pa(amd_bsp_mpb), 0,
+   MPB_MAX_SIZE);
+   memcpy((void *)__pa(amd_bsp_mpb), mc,
+   min_t(u32, header[1], MPB_MAX_SIZE));
+#endif
rev = mc->hdr.patch_id;
*new_rev = rev;
}

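Editorial sketch for patch 4/5: the "save on apply, not on load" rule can be shown with a small user-space model. This is an illustration under stated assumptions, not the kernel implementation. SAVED_PATCH_MAX, saved_patch, current_rev, save_applied_patch() and apply_patch() are hypothetical names standing in for MPB_MAX_SIZE, amd_bsp_mpb, the installed patch level and apply_microcode_amd().

```c
#include <stddef.h>
#include <string.h>

#define SAVED_PATCH_MAX 16	/* illustrative size; the kernel uses MPB_MAX_SIZE */

static unsigned char saved_patch[SAVED_PATCH_MAX];
static unsigned int current_rev;	/* hypothetical installed patch level */

/* Zero the buffer, then copy at most SAVED_PATCH_MAX bytes of the patch. */
static size_t save_applied_patch(const void *patch, size_t size)
{
	size_t n = size < SAVED_PATCH_MAX ? size : SAVED_PATCH_MAX;

	memset(saved_patch, 0, sizeof(saved_patch));
	memcpy(saved_patch, patch, n);
	return n;
}

/*
 * Hypothetical apply step: only a newer revision is applied, and the
 * saved copy is refreshed only when something was really applied.
 * Returns 1 if the patch was applied, 0 if it was skipped.
 */
static int apply_patch(unsigned int rev, const void *patch, size_t size)
{
	if (rev <= current_rev)
		return 0;	/* older or equal: nothing applied, nothing saved */

	current_rev = rev;
	save_applied_patch(patch, size);
	return 1;
}
```

Because the copy happens inside the apply step, loading an older file later cannot replace the saved image with something the CPUs are not actually running, which is the failure mode the first bullet of the commit message describes.
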

[PATCH 5/5] x86, AMD: simplify load_microcode_amd() to fix early microcode loading to no longer access uninitialized per-cpu data

2013-07-23 Thread Torsten Kaiser
load_microcode_amd() (and the helpers it is using) should not have a
cpu parameter. The microcode loading does not depend on the CPU it is
executed on, and all the loaded patches will end up in a global list for all
CPUs anyway.
The change from cpu to x86family in load_microcode_amd() now allows dropping
the code messing with cpu_data(cpu) from collect_cpu_info_amd_early(), which
is wrong anyway, because at that point the per-cpu cpu_info is not yet set up.
And these values would later be overwritten by smp_store_boot_cpu_info() /
smp_store_cpu_info().

Fold the rest of collect_cpu_info_amd_early() into load_ucode_amd_ap(), because
it is only used in one place and without the cpuinfo_x86 accesses not much
was left of it.

Signed-off-by: Torsten Kaiser <just.for.l...@googlemail.com>

---

One effect of this early, partial initialisation of cpu_info was that the
fallback logic in cpu_has_amd_erratum() did not use boot_cpu_data, and because
x86_vendor was not initialised in the per-cpu struct, the E400 erratum was not
activated on my system, resulting in a failed boot.

--- a/arch/x86/include/asm/microcode_amd.h  2013-07-23 20:15:10.549501081 +0200
+++ b/arch/x86/include/asm/microcode_amd.h  2013-07-23 20:16:05.329500620 +0200
@@ -59,7 +59,7 @@ static inline u16 find_equiv_id(struct e
 
 extern int __apply_microcode_amd(struct microcode_amd *mc_amd);
 extern int apply_microcode_amd(int cpu);
-extern enum ucode_state load_microcode_amd(int cpu, const u8 *data, size_t size);
+extern enum ucode_state load_microcode_amd(u8 x86family, const u8 *data, size_t size);
 
 #ifdef CONFIG_MICROCODE_AMD_EARLY
 #ifdef CONFIG_X86_32
--- a/arch/x86/kernel/microcode_amd.c   2013-07-23 20:05:04.469506188 +0200
+++ b/arch/x86/kernel/microcode_amd.c   2013-07-23 20:23:22.739496934 +0200
@@ -145,10 +145,9 @@ static int collect_cpu_info_amd(int cpu,
return 0;
 }
 
-static unsigned int verify_patch_size(int cpu, u32 patch_size,
+static unsigned int verify_patch_size(u8 x86family, u32 patch_size,
  unsigned int size)
 {
-   struct cpuinfo_x86 *c = cpu_data(cpu);
u32 max_size;
 
 #define F1XH_MPB_MAX_SIZE 2048
@@ -156,7 +155,7 @@ static unsigned int verify_patch_size(in
 #define F15H_MPB_MAX_SIZE 4096
 #define F16H_MPB_MAX_SIZE 3458
 
-   switch (c->x86) {
+   switch (x86family) {
case 0x14:
max_size = F14H_MPB_MAX_SIZE;
break;
@@ -283,9 +282,8 @@ static void cleanup(void)
  * driver cannot continue functioning normally. In such cases, we tear
  * down everything we've used up so far and exit.
  */
-static int verify_and_add_patch(unsigned int cpu, u8 *fw, unsigned int leftover)
+static int verify_and_add_patch(u8 x86family, u8 *fw, unsigned int leftover)
 {
-   struct cpuinfo_x86 *c = cpu_data(cpu);
struct microcode_header_amd *mc_hdr;
struct ucode_patch *patch;
unsigned int patch_size, crnt_size, ret;
@@ -305,7 +303,7 @@ static int verify_and_add_patch(unsigned
 
/* check if patch is for the current family */
proc_fam = ((proc_fam >> 8) & 0xf) + ((proc_fam >> 20) & 0xff);
-   if (proc_fam != c->x86)
+   if (proc_fam != x86family)
return crnt_size;
 
if (mc_hdr->nb_dev_id || mc_hdr->sb_dev_id) {
@@ -314,7 +312,7 @@ static int verify_and_add_patch(unsigned
return crnt_size;
}
 
-   ret = verify_patch_size(cpu, patch_size, leftover);
+   ret = verify_patch_size(x86family, patch_size, leftover);
if (!ret) {
pr_err("Patch-ID 0x%08x: size mismatch.\n", mc_hdr->patch_id);
return crnt_size;
@@ -345,7 +343,7 @@ static int verify_and_add_patch(unsigned
return crnt_size;
 }
 
-static enum ucode_state __load_microcode_amd(int cpu, const u8 *data, size_t size)
+static enum ucode_state __load_microcode_amd(u8 x86family, const u8 *data, size_t size)
 {
enum ucode_state ret = UCODE_ERROR;
unsigned int leftover;
@@ -368,7 +366,7 @@ static enum ucode_state __load_microcode
}
 
while (leftover) {
-   crnt_size = verify_and_add_patch(cpu, fw, leftover);
+   crnt_size = verify_and_add_patch(x86family, fw, leftover);
if (crnt_size < 0)
return ret;
 
@@ -379,14 +377,14 @@ static enum ucode_state __load_microcode
return UCODE_OK;
 }
 
-enum ucode_state load_microcode_amd(int cpu, const u8 *data, size_t size)
+enum ucode_state load_microcode_amd(u8 x86family, const u8 *data, size_t size)
 {
enum ucode_state ret;
 
/* free old equiv table */
free_equiv_cpu_table();
 
-   ret = __load_microcode_amd(cpu, data, size);
+   ret = __load_microcode_amd(x86family, data, size);
 
if (ret != UCODE_OK)
cleanup();
@@ -436,7 +434,7 @@ static enum ucode_state request_microcod
goto fw_release;
}
 
-   ret = load_microcode_amd(cpu, fw->data

Re: [PATCH]Fix early microcode loading on AMD

2013-07-23 Thread Torsten Kaiser
On Tue, Jul 23, 2013 at 5:15 PM, Borislav Petkov <b...@alien8.de> wrote:
> On Tue, Jul 23, 2013 at 01:58:53PM +0200, Torsten Kaiser wrote:
>> Fixup the early AMD microcode loading.
>>
>> * load_microcode_amd() (and the helper its using) should not have an
>> cpu parameter.
>
> Hmm, I don't think so - we get the cpu handed down from microcode_core
> and besides the early load on 32bit needs to do find_patch(cpu).
>
>> The microcode loading is not depending on the CPU it is
>
> Mostly. There are mixed-stepping boxes which need to differentiate
> between which cpu we're applying the patch for.

I redid the patch in 5 parts, hopefully now easier to understand.
Without the other changes, the microcode_amd.c part of patch 5/5 should
make it much more obvious that my change did not result in different
behavior regarding which patches get loaded into the microcode cache
'pcache'.

> Btw, your config boots on my F14h box with nomodeset on the command
> line because it is missing radeon firmware for my gpu.
>
>> executed and all the loaded patches will end up in a global list for all
>> CPUs anyway.
>> * Return -1 (like Intels apply_microcode) when the loading fails,
>> also do not set the active microcode level on failure.
>
> Yep, this part I want. Please send it as a separate patch.

That is now patch 1/5.
Patch 2/5 is new, I skipped that part originally because I did not
want to make it even bigger...

> So I see a couple of issues in this patch and they should be separated
> into single patches - one patch taking care of one issue and explaining
> what the problem is in the commit message (I know you can do that good
> :)).


I'm still seeing some things in the microcode code that look suspicious:

Why is the X86_64 code updating uci->cpu_sig.rev, but the 32bit
version does not? And I can't see anything that reads that value.

Should apply_microcode_amd() really update uci->mc even before
checking if the microcode is newer?

The X86_32 hunk in save_microcode_in_initrd_amd() now seems obsolete.
load_microcode_amd() is no longer using find_patch(), so it doesn't use
ucode_cpu_info anymore. But why is that code using
boot_cpu_data.cpu_index to find the BSP, but then always passing 0 as
the cpu parameter to load_microcode_amd()? If boot_cpu_data.cpu_index is
ever != 0 that code would fail.

... and collect_cpu_info_amd() also looks very weird. If csig does
not point to uci->cpu_sig, then find_patch() will not be happy.
Wouldn't directly passing cpuid_eax(0x0001) to find_patch() be a
better interface? Then the early microcode loading code would not need
to access ucode_cpu_info at all.

Torsten


Re: early microcode on amd is broken when no initramfs provided

2013-07-20 Thread Torsten Kaiser
On Tue, Jul 16, 2013 at 7:00 PM, Borislav Petkov  wrote:
> On Thu, Jul 11, 2013 at 11:05:25PM +0200, Johannes Hirte wrote:
>> config is attached
>
> Ok, I can reproduce the hang with your config but even with:
>
> $ grep MICROCODE .config
> # CONFIG_MICROCODE is not set
> # CONFIG_MICROCODE_INTEL_EARLY is not set
> # CONFIG_MICROCODE_AMD_EARLY is not set
>
> which means, it cannot be microcode-related.
>
> And I'd bet if you wait a minute (yep, it should be exactly 60 seconds)
> the boot would probably continue. And if so, this is that 60 sec delay
> where the kernel tries to find firmware.
>
> Hmm...

I have the same problem: Booting 3.11-rc1 hangs after the line:
ACPI: Executed 3 blocks of module-level executable AML code

I bisected it down to the early microcode changes:
757885e94a22bcc82beb9b1445c95218cb20ceab (the new early loading
implementation) and 6b3389ac21b5e557b957f1497d0ff22bf733e8c3 (small
fixup) completely fail to boot (No output beyond "Booting kernel") ,
from 275bbe2e299f1820ec8faa443d689469a9e6ecc5 ("Make
find_ucode_in_initrd() __init") I'm seeing this hang.

Just turning CONFIG_MICROCODE_EARLY off solves the problem: The system
now successfully boots 3.11-rc1.

Trying to debug this I found the following hack to also solve the boot problem:
Removing the following two lines from collect_cpu_info_amd_early()
from arch/x86/kernel/microcode_amd_early.c:
   c->microcode = rev;
c->x86 = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);

But I can't make sense out of that. And if I try to trace who updates
->x86, it gets even more confusing.
Normally only cpu_detect() seems to update cpuinfo_x86.x86, but now it
seems to fight with collect_cpu_info_amd_early().
On my system this happens:
(Output is always address of the struct cpuinfo_x86 -> value that gets
written into it)

Very early boot:
cpu_detect 81c8ba40 -> 16

BSP == CPU0 calls load_ucode_ap() via cpu_init():
collect_cpu_info_amd_early 880337c10fc0 -> 16
(That is the place I patched out to get the system to boot)

BSP == CPU0 via identify_boot_cpu():
cpu_detect 81c8ba40 -> 16

BSP == CPU0 stores boot_cpu_data in its per-cpu structure via
smp_store_boot_cpu_info():
smpboot: BSP: store 81c8ba40 in 880337c10fc0

smpboot starts activating the secondary CPUs: In start_secondary() each
first calls load_ucode_ap() via cpu_init() and then
identify_secondary_cpu() via smp_callin():
collect_cpu_info_amd_early 880337c50fc0
smpboot: identify_sec_cpu:1/880337c50fc0
cpu_detect 880337c50fc0 -> 16

collect_cpu_info_amd_early 880337c90fc0
smpboot: identify_sec_cpu:2/880337c90fc0
cpu_detect 880337c90fc0 -> 16

collect_cpu_info_amd_early 880337cd0fc0
smpboot: identify_sec_cpu:3/880337cd0fc0
cpu_detect 880337cd0fc0 -> 16

collect_cpu_info_amd_early 880337d10fc0
smpboot: identify_sec_cpu:4/880337d10fc0
cpu_detect 880337d10fc0 -> 16

collect_cpu_info_amd_early 880337d50fc0
smpboot: identify_sec_cpu:5/880337d50fc0
cpu_detect 880337d50fc0 -> 16


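Editorial note on the value 16 in the trace above: it is family 0x10, which is what the expression quoted from collect_cpu_info_amd_early() produces by adding the base family (EAX bits 11:8) and the extended family (EAX bits 27:20) of CPUID function 1. A minimal sketch follows. The helper name amd_family_from_eax() is made up for this illustration, and the sample signatures in the usage note are illustrative values, not taken from the thread.

```c
#include <stdint.h>

/*
 * Family as computed in the quoted code:
 * base family (EAX[11:8]) plus extended family (EAX[27:20]).
 */
static unsigned int amd_family_from_eax(uint32_t eax)
{
	return ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);
}
```

For an illustrative signature such as 0x00100F43 the base family is 0xF and the extended family is 0x1, so the helper returns 0x10, i.e. the decimal 16 seen in the log.
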
It seems the code updating 'struct cpuinfo_x86 *c' in
collect_cpu_info_amd_early() is useless, because it will be
overwritten first by smp_store_cpu_info() and then again by
identify_secondary_cpu(c), and wrong, because at that point the per-cpu
structure should not be used yet, as smp_store_cpu_info() did not run
yet.
But something else seems to be using the per-cpu structure of the BSP
between its cpu_init() and smp_store_boot_cpu_info().

And it's cpu_has_amd_erratum(): It uses cpuinfo_x86.x86 to decide if it
needs to fall back to boot_cpu_data, but because
collect_cpu_info_amd_early() has filled that field, but not
.x86_vendor (that is still 0 == X86_VENDOR_INTEL), the errata are not
applied to the BSP and then something in ACPI gets stuck.

Does this diagnosis make sense / should I send a patch?

Torsten


Re: early microcode on amd is broken when no initramfs provided

2013-07-20 Thread Torsten Kaiser
On Sun, Jul 21, 2013 at 12:59 AM, Borislav Petkov <b...@alien8.de> wrote:
> On Sat, Jul 20, 2013 at 09:01:33PM +0200, Torsten Kaiser wrote:
>> On Tue, Jul 16, 2013 at 7:00 PM, Borislav Petkov <b...@alien8.de> wrote:
>> > On Thu, Jul 11, 2013 at 11:05:25PM +0200, Johannes Hirte wrote:
>> >> config is attached
>> >
>> > Ok, I can reproduce the hang with your config but even with:
>> >
>> > $ grep MICROCODE .config
>> > # CONFIG_MICROCODE is not set
>> > # CONFIG_MICROCODE_INTEL_EARLY is not set
>> > # CONFIG_MICROCODE_AMD_EARLY is not set
>> >
>> > which means, it cannot be microcode-related.
>> >
>> > And I'd bet if you wait a minute (yep, it should be exactly 60 seconds)
>> > the boot would probably continue. And if so, this is that 60 sec delay
>> > where the kernel tries to find firmware.
>> >
>> > Hmm...
>>
>> I have the same problem: Booting 3.11-rc1 hangs after the line:
>> ACPI: Executed 3 blocks of module-level executable AML code
>>
>> I bisected it down to the early microcode changes:
>> 757885e94a22bcc82beb9b1445c95218cb20ceab (the new early loading
>> implementation) and 6b3389ac21b5e557b957f1497d0ff22bf733e8c3 (small
>> fixup) completely fail to boot (No output beyond "Booting kernel") ,
>> from 275bbe2e299f1820ec8faa443d689469a9e6ecc5 ("Make
>> find_ucode_in_initrd() __init") I'm seeing this hang.
>>
>> Just turning CONFIG_MICROCODE_EARLY off solves the problem: The system
>> now sucessfully boots 3.11-rc1.
>
> Ok, I need to be able to reproduce that first - I wasn't that successful
> with Johannes' setup.
>
> So, can you please send .config and how you're loading your microcode?
> Is it in the initrd or are you doing that later, how? Grub entry please.
>
> Also, is it just plain v3.11-rc1 or with patches ontop?
>
> Also, /proc/cpuinfo please.

.config and cpuinfo attached.
Microcode seems not to be loaded at all: for MICROCODE_EARLY I did not
attach the needed file / cpio, and the normal update mechanism seems
not to have a newer microcode than what the BIOS is providing.
I'm using a custom initrd, but that can't be used for MICROCODE_EARLY
because it's compressed and does not contain an AuthenticAMD.bin. It
also does not contain microcode_amd.bin, because I'm supplying that via
CONFIG_EXTRA_FIRMWARE.
Grub entry:
title 3.11.0-rc1-crypt
root (hd0,0)
kernel (hd0,0)/boot/kernel-3.11.0-rc1 fastboot crypt_root=/dev/md6
video=1280x1024 radeon.dpm=1
initrd (hd0,0)/boot/ramfs-2011.gz
savedefault

I was using plain 3.11-rc1 except the changes I made to debug this.

What I think you need: A system that is fatally affected by AMD
Erratum 400 and an 64bit kernel.

From my debugging, the following sequence of events occurs on my system:
The BSP calls load_ucode_ap().
That calls collect_cpu_info_amd_early(), which fills the
cpuinfo_x86.x86 and cpuinfo_x86.microcode fields of the per-cpu
cpu_info structure, which has not been set up yet. Because this
code is only built with MICROCODE_EARLY, disabling this option
makes my system boot. OTOH this function is called regardless of whether
AuthenticAMD.bin is available, that's why I'm hitting it even
without the special cpio.
Then the BSP calls init_amd() to apply the errata fixes. That uses
cpu_has_amd_erratum(), but that function does not use the cpuinfo_x86
that was passed to init_amd() (and that is used for the following
set_cpu_bug() if the erratum was found!). Instead it guesses on its own
whether to use the per-cpu data or boot_cpu_data, and it bases that
guess on the not yet initialized per-cpu data. Normally that
works fine, because that data is still zeroed out, but
collect_cpu_info_amd_early() has filled ->x86, so
cpu_has_amd_erratum() uses the partly filled per-cpu data instead
of the correct boot_cpu_data. And because collect_cpu_info_amd_early()
did not fill ->x86_vendor, that field is still 0 == X86_VENDOR_INTEL
and cpu_has_amd_erratum() wrongly reports that no erratum is present.
So the C1E workaround is not applied, and as soon as ACPI enables C1E
the boot hangs.

Something like the following (whitespace mangled by Gmail; if it looks
OK to you, I will send it as a clean patch) fixes
cpu_has_amd_erratum() for me, but I did not look at how the early
microcode loading should work if AuthenticAMD.bin is available, so I
cannot offer a fix for the premature accesses to the per-cpu cpu_info.

--- 3.11-rc1/arch/x86/kernel/cpu/amd.c.orig 2013-07-21 05:42:42.130346496 +0200
+++ 3.11-rc1/arch/x86/kernel/cpu/amd.c  2013-07-21 05:45:09.420345843 +0200
@@ -512,7 +512,7 @@

 static const int amd_erratum_383[];
 static const int amd_erratum_400[];
-static bool cpu_has_amd_erratum(const int *erratum);
+static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum);

 static void __cpuinit init_amd(struct cpuinfo_x86 *c)
 {
@@ -729,11 +729,11 @@
value &= ~(1ULL << 24);
wrmsrl_safe(MSR_AMD64_BU_CFG2, value);

-   if (cpu_has_amd_erratum(amd_erratum_383))
+   if (cpu_has_amd_erratum(c, amd_erratum_383))
set_cpu_bug(c, X86_BUG_AMD_TLB_MMATCH

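Editorial sketch of the diagnosis above: the failure comes from a helper that guesses which CPU description to trust, and the guess breaks once only one field of the per-cpu copy has been filled. The model below is a simplified user-space illustration, not the kernel code. struct cpu_model, the vendor constants, erratum_applies() and both has_erratum_*() helpers are hypothetical, and the "AMD family 0x10" rule is a deliberate simplification of the real erratum tables.

```c
#include <stdbool.h>

/* 0 mirrors "0 == X86_VENDOR_INTEL" from the thread; the AMD value is illustrative. */
enum { VENDOR_INTEL = 0, VENDOR_AMD = 2 };

struct cpu_model {
	unsigned int family;	/* stands in for cpuinfo_x86.x86 */
	unsigned int vendor;	/* stands in for cpuinfo_x86.x86_vendor */
};

/* Simplified rule: the erratum affects AMD family 0x10 only. */
static bool erratum_applies(const struct cpu_model *c)
{
	return c->vendor == VENDOR_AMD && c->family == 0x10;
}

/*
 * Guessing variant: trust the per-cpu copy as soon as its family field is
 * non-zero, otherwise fall back to the boot CPU description.
 */
static bool has_erratum_guess(const struct cpu_model *percpu,
			      const struct cpu_model *boot)
{
	const struct cpu_model *c = percpu->family ? percpu : boot;

	return erratum_applies(c);
}

/* Explicit variant: the caller says which description to check. */
static bool has_erratum_explicit(const struct cpu_model *c)
{
	return erratum_applies(c);
}
```

With a zeroed per-cpu copy the guessing variant falls back to the boot data and reports the erratum. Once only family is filled in, it picks the per-cpu copy, sees vendor 0 and reports nothing. Passing the structure explicitly, as the proposed patch does with its extra parameter, removes the guess.
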
[PATCH]xfs: Fix xfs_swap_extents() after removal of xfs_flushinval_pages()

2013-01-20 Thread Torsten Kaiser
From: Torsten Kaiser <just.for.l...@googlemail.com>

Commit fb59581404ab7ec5075299065c22cb211a9262a9 removed
xfs_flushinval_pages() and changed its callers to use
filemap_write_and_wait() and truncate_pagecache_range() directly.

But in xfs_swap_extents() this change accidentally switched the argument
from 'tip' to 'ip'. This patch switches it back to 'tip'.

Signed-off-by: Torsten Kaiser <just.for.l...@googlemail.com>

--- a/fs/xfs/xfs_dfrag.c
+++ b/fs/xfs/xfs_dfrag.c
@@ -246,10 +246,10 @@ xfs_swap_extents(
goto out_unlock;
}
 
-   error = -filemap_write_and_wait(VFS_I(ip)->i_mapping);
+   error = -filemap_write_and_wait(VFS_I(tip)->i_mapping);
if (error)
goto out_unlock;
-   truncate_pagecache_range(VFS_I(ip), 0, -1);
+   truncate_pagecache_range(VFS_I(tip), 0, -1);
 
/* Verify O_DIRECT for ftmp */
if (VN_CACHED(VFS_I(tip)) != 0) {


Re: Hang in md-raid1 with 3.7-rcX

2012-12-02 Thread Torsten Kaiser
On Tue, Nov 27, 2012 at 8:08 AM, Torsten Kaiser
<just.for.l...@googlemail.com> wrote:
> On Tue, Nov 27, 2012 at 2:05 AM, NeilBrown <ne...@suse.de> wrote:
>> Can you test to see if this fixes it?
>
> Patch applied, I will try to get it stuck again.
> I don't have a reliable reproducer, but if the problem persists I
> will definitely report back here.

With this patch I was not able to recreate the hang. Lacking a 100%
reliable way of recreating it, I can't be completely sure of the fix, but as
you understood from the code how this hang could happen, I'm quite
confident that the fix is working.

(As I do not use the raid10 personality, patching only raid1.c was
sufficient for me. I didn't test the version that also patched
raid10.c, as it's not even compiled on my kernel.)

Thanks for the fix!

Torsten

>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index 636bae0..a0f7309 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -963,7 +963,7 @@ static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
>>  	struct r1conf *conf = mddev->private;
>>  	struct bio *bio;
>>
>> -	if (from_schedule) {
>> +	if (from_schedule || current->bio_list) {
>>  		spin_lock_irq(&conf->device_lock);
>>  		bio_list_merge(&conf->pending_bio_list, &plug->pending);
>>  		conf->pending_count += plug->pending_cnt;
>>




Re: Hang in md-raid1 with 3.7-rcX

2012-11-26 Thread Torsten Kaiser
On Tue, Nov 27, 2012 at 2:05 AM, NeilBrown <ne...@suse.de> wrote:
> On Sat, 24 Nov 2012 10:18:44 +0100 Torsten Kaiser
> <just.for.l...@googlemail.com> wrote:
>
>> After my system got stuck with 3.7.0-rc2 as reported in
>> http://marc.info/?l=linux-kernel&m=135142236520624 LOCKDEP seemed to
>> blame XFS, because it found 2 possible deadlocks. But after these
>> locking issues were fixed, my system got stuck again with 3.7.0-rc6
>> as reported in http://marc.info/?l=linux-kernel&m=135344072325490
>> Dave Chinner thinks it's an issue within md, that it gets stuck and
>> that will then prevent any further xfs activity, and that I should
>> report it to the raid mailing list.
>>
>> The issue seems to be that multiple processes (kswapd0, xfsaild/md4
>> and flush-9:4) get stuck in md_super_wait() like this:
>> [<ffffffff816b1224>] schedule+0x24/0x60
>> [<ffffffff814f9dad>] md_super_wait+0x4d/0x80
>> [<ffffffff8105ca30>] ? __init_waitqueue_head+0x60/0x60
>> [<ffffffff81500753>] bitmap_unplug+0x173/0x180
>> [<ffffffff810b6acf>] ? write_cache_pages+0x12f/0x420
>> [<ffffffff810b6700>] ? set_page_dirty_lock+0x60/0x60
>> [<ffffffff814e8eb8>] raid1_unplug+0x98/0x110
>> [<ffffffff81278a6d>] blk_flush_plug_list+0xad/0x240
>> [<ffffffff81278c13>] blk_finish_plug+0x13/0x50
>>
>> The full hung-tasks stack traces and the output from SysRq+W can be
>> found at http://marc.info/?l=linux-kernel&m=135344072325490 or in the
>> LKML thread 'Hang in XFS reclaim on 3.7.0-rc3'.
>
> Yes, it does look like an md bug
> Can you test to see if this fixes it?

Patch applied, I will try to get it stuck again.
I don't have a reliable reproducer, but if the problem persists I
will definitely report back here.

> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 636bae0..a0f7309 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -963,7 +963,7 @@ static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
>  	struct r1conf *conf = mddev->private;
>  	struct bio *bio;
>
> -	if (from_schedule) {
> +	if (from_schedule || current->bio_list) {
>  		spin_lock_irq(&conf->device_lock);
>  		bio_list_merge(&conf->pending_bio_list, &plug->pending);
>  		conf->pending_count += plug->pending_cnt;
>
>>
>> I tried to understand how this could happen, but I don't see anything
>> wrong. Only that md_super_wait() looks like an open coded version of
>> __wait_event() and could be replaced by using it.
>
> yeah.  md_super_wait was much more complex back when we had to support
> barrier operations.  When they were removed it was simplified a lot and as
> you say it could be simplified further.  Patches welcome.

I guessed it predated that particular helper.

Since you ask for a patch, I have one question:
md_super_wait() looks like __wait_event(), but there is also a
wait_event() helper.
Would it be better to switch to wait_event()? It would add an
additional check for atomic_read(&mddev->pending_writes)==0 before
"allocating" and initialising the wait_queue_t, which I think would be
a correct optimization.
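
To show the difference I mean, here is a small user-space model in plain
C. It is only a sketch and not kernel code: struct model and all model_*()
names are invented for illustration, prepare_calls merely counts how often
the wait queue entry would be set up, and model_schedule() pretends that
one pending write completes each time the task sleeps.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct model {
	int pending_writes;	/* stands in for atomic_read(&mddev->pending_writes) */
	int prepare_calls;	/* counts prepare_to_wait()-style queue setups */
};

/* The wait condition: all superblock writes have completed. */
static bool writes_done(const struct model *m)
{
	return m->pending_writes == 0;
}

/* Stand-in for schedule(): pretend one write completes while we sleep. */
static void model_schedule(struct model *m)
{
	m->pending_writes--;
}

/* Shape of __wait_event() and of the open-coded loop in md_super_wait(). */
static void model_wait_slow(struct model *m)
{
	for (;;) {
		m->prepare_calls++;	/* prepare_to_wait() */
		if (writes_done(m))
			break;
		model_schedule(m);	/* schedule() */
	}
	/* finish_wait() would go here */
}

/* Shape of wait_event(): test the condition before touching the queue. */
static void model_wait(struct model *m)
{
	if (writes_done(m))
		return;
	model_wait_slow(m);
}
```

With no writes pending, model_wait() returns without any queue setup,
while model_wait_slow() still sets up the entry once before it notices
that there is nothing to wait for. That saved setup is the whole
optimization; with writes pending both variants behave the same.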

>> http://marc.info/?l=linux-raid&m=135283030027665 looks like the same
>> issue, but using ext4 instead of xfs.
>
> yes, sure does.
>
>>
>> My setup wrt. md is two normal sata disks on a normal ahci controller
>> (AMD SB850 southbridge).
>> Both disks are divided into 4 partitions and each one assembled into a
>> separate raid1.
>> One (md5) is used for swap, the others hold xfs filesystems for /boot/
>> (md4), / (md6) and /home/ (md7).
>>
>> I will try to provide any information you ask, but I can't reproduce
>> the hang on demand so gathering more information about that state is
>> not so easy, but I will try.
>
> I'm fairly confident the above patch will fix it, and in any case it fixes
> a real bug.  So if you could just run with it and confirm in a week or so
> that the problem hasn't recurred, that might have to do.

I only had 2 or 3 hangs since 3.7-rc1, but I suspect that forcing the system
to swap (the swap lies on a raid1) plays a part in it.
As the system has 12GB of RAM it normally doesn't need to swap and I
see no problem. I will try these workloads again and hope that, if the
problem persists, I can trigger it again in the next few days...

Thanks for the patch,

Torsten

> Thanks,
> NeilBrown
>




Hang in md-raid1 with 3.7-rcX

2012-11-24 Thread Torsten Kaiser
After my system got stuck with 3.7.0-rc2 as reported in
http://marc.info/?l=linux-kernel&m=135142236520624 LOCKDEP seemed to
blame XFS, because it found 2 possible deadlocks. But after these
locking issues were fixed, my system got stuck again with 3.7.0-rc6
as reported in http://marc.info/?l=linux-kernel&m=135344072325490
Dave Chinner thinks it's an issue within md, that it gets stuck and
that will then prevent any further xfs activity, and that I should
report it to the raid mailing list.

The issue seems to be that multiple processes (kswapd0, xfsaild/md4
and flush-9:4) get stuck in md_super_wait() like this:
[<ffffffff816b1224>] schedule+0x24/0x60
[<ffffffff814f9dad>] md_super_wait+0x4d/0x80
[<ffffffff8105ca30>] ? __init_waitqueue_head+0x60/0x60
[<ffffffff81500753>] bitmap_unplug+0x173/0x180
[<ffffffff810b6acf>] ? write_cache_pages+0x12f/0x420
[<ffffffff810b6700>] ? set_page_dirty_lock+0x60/0x60
[<ffffffff814e8eb8>] raid1_unplug+0x98/0x110
[<ffffffff81278a6d>] blk_flush_plug_list+0xad/0x240
[<ffffffff81278c13>] blk_finish_plug+0x13/0x50

The full hung-tasks stack traces and the output from SysRq+W can be
found at http://marc.info/?l=linux-kernel&m=135344072325490 or in the
LKML thread 'Hang in XFS reclaim on 3.7.0-rc3'.

I tried to understand how this could happen, but I don't see anything
wrong. Only that md_super_wait() looks like an open coded version of
__wait_event() and could be replaced by using it.

http://marc.info/?l=linux-raid&m=135283030027665 looks like the same
issue, but using ext4 instead of xfs.

My setup wrt. md is two normal sata disks on a normal ahci controller
(AMD SB850 southbridge).
Both disks are divided into 4 partitions and each one assembled into a
separate raid1.
One (md5) is used for swap, the others hold xfs filesystems for /boot/
(md4), / (md6) and /home/ (md7).

I will try to provide any information you ask, but I can't reproduce
the hang on demand so gathering more information about that state is
not so easy, but I will try.

Thanks for looking into this,

Torsten




Re: Hang in XFS reclaim on 3.7.0-rc3

2012-11-19 Thread Torsten Kaiser
On Tue, Nov 20, 2012 at 12:53 AM, Dave Chinner <da...@fromorbit.com> wrote:
> On Mon, Nov 19, 2012 at 07:50:06AM +0100, Torsten Kaiser wrote:
> So, both lockdep thingy's are the same:

I suspected this, but as the reports were slightly different I
attached both of them, as I couldn't decide which one was the
better/simpler report to debug this.

>> [110926.972482] =========================================================
>> [110926.972484] [ INFO: possible irq lock inversion dependency detected ]
>> [110926.972486] 3.7.0-rc4 #1 Not tainted
>> [110926.972487] ---------------------------------------------------------
>> [110926.972489] kswapd0/725 just changed the state of lock:
>> [110926.972490]  (sb_internal){.+.+.?}, at: [<ffffffff8122b268>] xfs_trans_alloc+0x28/0x50
>> [110926.972499] but this lock took another, RECLAIM_FS-unsafe lock in the past:
>> [110926.972500]  (&(&ip->i_lock)->mr_lock/1){+.+.+.}
>
> Ah, what? Since when has the ilock been reclaim unsafe?
>
>> [110926.972500] and interrupts could create inverse lock ordering between them.
>> [110926.972500]
>> [110926.972503]
>> [110926.972503] other info that might help us debug this:
>> [110926.972504]  Possible interrupt unsafe locking scenario:
>> [110926.972504]
>> [110926.972505]        CPU0                    CPU1
>> [110926.972506]        ----                    ----
>> [110926.972507]   lock(&(&ip->i_lock)->mr_lock/1);
>> [110926.972509]                                local_irq_disable();
>> [110926.972509]                                lock(sb_internal);
>> [110926.972511]                                lock(&(&ip->i_lock)->mr_lock/1);
>> [110926.972512]   <Interrupt>
>> [110926.972513]     lock(sb_internal);
> Um, that's just bizarre. No XFS code runs with interrupts disabled,
> so I cannot see how this is possible.
>
> .
>
>
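>    [<ffffffff8108137e>] mark_held_locks+0x7e/0x130
>    [<ffffffff81081a63>] lockdep_trace_alloc+0x63/0xc0
>    [<ffffffff810e9dd5>] kmem_cache_alloc+0x35/0xe0
>    [<ffffffff810dba31>] vm_map_ram+0x271/0x770
>    [<ffffffff811e1316>] _xfs_buf_map_pages+0x46/0xe0
>    [<ffffffff811e222a>] xfs_buf_get_map+0x8a/0x130
>    [<ffffffff81233ab9>] xfs_trans_get_buf_map+0xa9/0xd0
>    [<ffffffff8121bced>] xfs_ialloc_inode_init+0xcd/0x1d0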
>
> We shouldn't be mapping buffers there, there's a patch below to fix
> this. It's probably the source of this report, even though I cannot
> lockdep seems to be off with the fairies...

I also tried to understand what lockdep was saying, but
Documentation/lockdep-design.txt is not too helpful.
I think 'CLASS'-ON-R / -ON-W means that this lock was 'ON' / held
while 'CLASS' (HARDIRQ, SOFTIRQ, RECLAIM_FS) happened and that makes
this lock unsafe for these contexts. IN-'CLASS'-R / -W seems to mean
'lock taken in context CLASS'.
A note in there that 'CLASS'-ON-? means 'CLASS'-unsafe would be helpful to me...
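
To make that reading concrete, here is a small user-space sketch (not
kernel code) that expands a usage string such as the {.+.+.?} from the
report into the state names lockdep prints. It rests on two assumptions
of mine: that the string has six slots in the order HARDIRQ, SOFTIRQ,
RECLAIM_FS, each with a write slot followed by a read slot, and that the
characters follow the table in Documentation/lockdep-design.txt ('.' for
neither, '-' for taken in that context, '+' for taken with that context
enabled, '?' for both). decode_usage() and append_state() are made-up
names.

```c
#include <stdio.h>
#include <string.h>

/*
 * Append one state name to out, separated by a space.
 * in != 0 yields "IN-<ctx>-<rw>", in == 0 yields "<ctx>-ON-<rw>".
 * Returns 0 on success, -1 if the name does not fit.
 */
static int append_state(char *out, size_t outlen, int in,
			const char *ctx, char rw)
{
	size_t used = strlen(out);
	int n;

	if (used > 0) {
		if (used + 1 >= outlen)
			return -1;
		out[used++] = ' ';
		out[used] = '\0';
	}
	if (in)
		n = snprintf(out + used, outlen - used, "IN-%s-%c", ctx, rw);
	else
		n = snprintf(out + used, outlen - used, "%s-ON-%c", ctx, rw);
	return (n < 0 || (size_t)n >= outlen - used) ? -1 : 0;
}

/*
 * Expand a six-character usage string (without the braces).
 * Returns 0 on success, -1 on malformed input or a too-small buffer.
 */
static int decode_usage(const char *usage, char *out, size_t outlen)
{
	/* Assumed slot order; two characters per context. */
	static const char *const ctx[] = { "HARDIRQ", "SOFTIRQ", "RECLAIM_FS" };
	size_t i;

	if (outlen == 0 || strlen(usage) != 6)
		return -1;
	out[0] = '\0';

	for (i = 0; i < 6; i++) {
		char c = usage[i];
		char rw = (i % 2) ? 'R' : 'W';	/* even slot: write, odd: read */

		if (c != '.' && c != '-' && c != '+' && c != '?')
			return -1;
		if ((c == '-' || c == '?') &&
		    append_state(out, outlen, 1, ctx[i / 2], rw))
			return -1;
		if ((c == '+' || c == '?') &&
		    append_state(out, outlen, 0, ctx[i / 2], rw))
			return -1;
	}
	return 0;
}
```

Fed with the two strings from the report, ".+.+.?" (sb_internal) expands
to "HARDIRQ-ON-R SOFTIRQ-ON-R IN-RECLAIM_FS-R RECLAIM_FS-ON-R" and
"+.+.+." (the mr_lock/1 class) to "HARDIRQ-ON-W SOFTIRQ-ON-W
RECLAIM_FS-ON-W". The last entry would be the "RECLAIM_FS-unsafe" state
the report complains about, if my reading of the notation is right.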

Wrt. the above interrupt output: I think lockdep doesn't really know about
RECLAIM_FS and treats it as another interrupt. I think that output
should have been something like this:
       CPU0                    CPU1
       ----                    ----
  lock(&(&ip->i_lock)->mr_lock/1);
                               <Allocation enters reclaim>
                               lock(sb_internal);
                               lock(&(&ip->i_lock)->mr_lock/1);
  <Allocation enters reclaim>
    lock(sb_internal);

Entering reclaim on CPU1 would mean that CPU1 would not enter reclaim
again, so the reclaim-'interrupt' would be disabled.
And instead of an interrupt disrupting the normal code flow on CPU0, the
'interruption' would be that a normal allocation gets 'interrupted' in
order to reclaim memory.
print_irq_lock_scenario() would need to be taught to print a slightly
different message for reclaim-'interrupts'.

I will try your patch, but as I do not have a reliable reproducer to
create this lockdep report, I can't really verify if this fixes it.
But I will definitely mail you, if it happens again with this patch.

Thanks, Torsten

> Cheers,
>
> Dave.
> --
> Dave Chinner
> da...@fromorbit.com
>
> xfs: inode allocation should use unmapped buffers.
>
> From: Dave Chinner <dchin...@redhat.com>
>
> Inode buffers do not need to be mapped as inodes are read or written
> directly from/to the pages underlying the buffer. This fixes a
> regression introduced by commit 611c994 ("xfs: make XBF_MAPPED the
> default behaviour").
>
> Signed-off-by: Dave Chinner <dchin...@redhat.com>
> ---
>  fs/xfs/xfs_ialloc.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_ialloc.c b/fs/xfs/xfs_ialloc.c
> index 2d6495e..a815412 100644
> --- a/fs/xfs/xfs_ialloc.c
> +++ b/fs/xfs/xfs_ialloc.c
> @@ -200,7 +200,8 @@ xfs_ialloc_inode_init(
>  		 */
>  		d = XFS_AGB_TO_DADDR(mp, agno, agbno + (j * blks_per_cluster));
>  		fbuf = xfs_trans_get_buf(tp, mp->m_ddev_targp, d,
> -					 mp->m_bsize * blks_per_cluster, 0);
> +					 mp->m_bsize * blks_per_cluster


Re: Hang in XFS reclaim on 3.7.0-rc3

2012-11-18 Thread Torsten Kaiser
On Mon, Nov 19, 2012 at 12:51 AM, Dave Chinner <da...@fromorbit.com> wrote:
> On Sun, Nov 18, 2012 at 04:29:22PM +0100, Torsten Kaiser wrote:
>> On Sun, Nov 18, 2012 at 11:24 AM, Torsten Kaiser
>> <just.for.l...@googlemail.com> wrote:
>> > On Tue, Oct 30, 2012 at 9:37 PM, Torsten Kaiser
>> > <just.for.l...@googlemail.com> wrote:
>> >> I will keep LOCKDEP enabled on that system, and if there really is
>> >> another splat, I will report back here. But I rather doubt that this
>> >> will be needed.
>> >
>> > After the patch, I did not see this problem again, but today I found
>> > another LOCKDEP report that also looks XFS related.
>> > I found it twice in the logs, and as both were slightly different, I
>> > will attach both versions.
>>
>> > Nov  6 21:57:09 thoregon kernel: [ 9941.104353] 3.7.0-rc4 #1 Not tainted
>> > Nov  6 21:57:09 thoregon kernel: [ 9941.104355] inconsistent
>> > {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
>> > Nov  6 21:57:09 thoregon kernel: [ 9941.104430]        CPU0
>> > Nov  6 21:57:09 thoregon kernel: [ 9941.104431]        ----
>> > Nov  6 21:57:09 thoregon kernel: [ 9941.104432]
>> > lock(&(&ip->i_lock)->mr_lock);
>> > Nov  6 21:57:09 thoregon kernel: [ 9941.104433]   <Interrupt>
>> > Nov  6 21:57:09 thoregon kernel: [ 9941.104434]
>> > lock(&(&ip->i_lock)->mr_lock);
>> > Nov  6 21:57:09 thoregon kernel: [ 9941.104435]
>> > Nov  6 21:57:09 thoregon kernel: [ 9941.104435]  *** DEADLOCK ***
>>
>> Sorry! Copied the wrong report. Your fix only landed in -rc5, so my
>> vanilla -rc4 did (also) report the old problem again.
>> And I copied that report instead of the second appearance of the
>> new problem.
>
> Can you repost it with line wrapping turned off? The output simply
> becomes unreadable when it wraps
>
> Yeah, I know I can put it back together, but I've got better things
> to do with my time than stitch a couple of hundred lines of debug
> back into a readable format

Sorry about that, but I can't find any option to turn that off in Gmail.

I have added the reports as attachments, I hope that's OK for you.

Thanks for looking into this.

Torsten
[110926.972477] 
[110926.972482] =========================================================
[110926.972484] [ INFO: possible irq lock inversion dependency detected ]
[110926.972486] 3.7.0-rc4 #1 Not tainted
[110926.972487] ---------------------------------------------------------
[110926.972489] kswapd0/725 just changed the state of lock:
[110926.972490]  (sb_internal){.+.+.?}, at: [<ffffffff8122b268>] xfs_trans_alloc+0x28/0x50
[110926.972499] but this lock took another, RECLAIM_FS-unsafe lock in the past:
[110926.972500]  (&(&ip->i_lock)->mr_lock/1){+.+.+.}
[110926.972500] 
[110926.972500] and interrupts could create inverse lock ordering between them.
[110926.972500] 
[110926.972503] 
[110926.972503] other info that might help us debug this:
[110926.972504]  Possible interrupt unsafe locking scenario:
[110926.972504] 
[110926.972505]        CPU0                    CPU1
[110926.972506]        ----                    ----
[110926.972507]   lock(&(&ip->i_lock)->mr_lock/1);
[110926.972509]                                local_irq_disable();
[110926.972509]                                lock(sb_internal);
[110926.972511]                                lock(&(&ip->i_lock)->mr_lock/1);
[110926.972512]   <Interrupt>
[110926.972513]     lock(sb_internal);
[110926.972514] 
[110926.972514]  *** DEADLOCK ***
[110926.972514] 
[110926.972516] 2 locks held by kswapd0/725:
[110926.972517]  #0:  (shrinker_rwsem){..}, at: [] shrink_slab+0x32/0x1f0
[110926.972522]  #1:  (&type->s_umount_key#20){.+}, at: [] grab_super_passive+0x3e/0x90
[110926.972527] 
[110926.972527] the shortest dependencies between 2nd lock and 1st lock:
[110926.972533]  -> (&(&ip->i_lock)->mr_lock/1){+.+.+.} ops: 58117 {
[110926.972536] HARDIRQ-ON-W at:
[110926.972537]   [] __lock_acquire+0x631/0x1c00
[110926.972540]   [] lock_acquire+0x55/0x70
[110926.972542]   [] down_write_nested+0x4a/0x70
[110926.972545]   [] xfs_ilock+0x84/0xb0
[110926.972548]   [] xfs_create+0x1d4/0x5a0
[110926.972550]   [] xfs_vn_mknod+0x8a/0x1b0
[110926.972552]   [] xfs_vn_create+0xe/0x10
[110926.972554]   [] vfs_create+0x72/0xc0
[110926.972556]   [] do_last.isra.69+0x80e/0xc80
[110926.972558]   [] path_openat.isra.70+0xab/0x490
[110926.972560]   [] do_filp_open+0x3d/0xa0
[110926.972562]   [] do_sys_open+0xf9/0x1e0
[110926.972565]   [] sys_open+0x1c/0x20
[110926.972567]   [] 

Re: Hang in XFS reclaim on 3.7.0-rc3

2012-11-18 Thread Torsten Kaiser
On Sun, Nov 18, 2012 at 11:24 AM, Torsten Kaiser
<just.for.l...@googlemail.com> wrote:
> On Tue, Oct 30, 2012 at 9:37 PM, Torsten Kaiser
> <just.for.l...@googlemail.com> wrote:
>> I will keep LOCKDEP enabled on that system, and if there really is
>> another splat, I will report back here. But I rather doubt that this
>> will be needed.
>
> After the patch, I did not see this problem again, but today I found
> another LOCKDEP report that also looks XFS related.
> I found it twice in the logs, and as both were slightly different, I
> will attach both versions.

> Nov  6 21:57:09 thoregon kernel: [ 9941.104353] 3.7.0-rc4 #1 Not tainted
> Nov  6 21:57:09 thoregon kernel: [ 9941.104355] inconsistent
> {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
> Nov  6 21:57:09 thoregon kernel: [ 9941.104430]        CPU0
> Nov  6 21:57:09 thoregon kernel: [ 9941.104431]        ----
> Nov  6 21:57:09 thoregon kernel: [ 9941.104432]
> lock(&(&ip->i_lock)->mr_lock);
> Nov  6 21:57:09 thoregon kernel: [ 9941.104433]   <Interrupt>
> Nov  6 21:57:09 thoregon kernel: [ 9941.104434]
> lock(&(&ip->i_lock)->mr_lock);
> Nov  6 21:57:09 thoregon kernel: [ 9941.104435]
> Nov  6 21:57:09 thoregon kernel: [ 9941.104435]  *** DEADLOCK ***

Sorry! Copied the wrong report. Your fix only landed in -rc5, so my
vanilla -rc4 did (also) report the old problem again.
And I copied that report instead of the second appearance of the
new problem.

Here is the correct second report of the sb_internal vs
ip->i_lock->mr_lock problem:
[110926.972477]
[110926.972482] =========================================================
[110926.972484] [ INFO: possible irq lock inversion dependency detected ]
[110926.972486] 3.7.0-rc4 #1 Not tainted
[110926.972487] ---------------------------------------------------------
[110926.972489] kswapd0/725 just changed the state of lock:
[110926.972490]  (sb_internal){.+.+.?}, at: [<ffffffff8122b268>]
xfs_trans_alloc+0x28/0x50
[110926.972499] but this lock took another, RECLAIM_FS-unsafe lock in the past:
[110926.972500]  (&(&ip->i_lock)->mr_lock/1){+.+.+.}
[110926.972500]
[110926.972500] and interrupts could create inverse lock ordering between them.
[110926.972500]
[110926.972503]
[110926.972503] other info that might help us debug this:
[110926.972504]  Possible interrupt unsafe locking scenario:
[110926.972504]
[110926.972505]        CPU0                    CPU1
[110926.972506]        ----                    ----
[110926.972507]   lock(&(&ip->i_lock)->mr_lock/1);
[110926.972509]                                local_irq_disable();
[110926.972509]                                lock(sb_internal);
[110926.972511]                                lock(&(&ip->i_lock)->mr_lock/1);
[110926.972512]   <Interrupt>
[110926.972513]     lock(sb_internal);
[110926.972514]
[110926.972514]  *** DEADLOCK ***
[110926.972514]
[110926.972516] 2 locks held by kswapd0/725:
[110926.972517]  #0:  (shrinker_rwsem){..}, at:
[] shrink_slab+0x32/0x1f0
[110926.972522]  #1:  (&type->s_umount_key#20){.+}, at:
[] grab_super_passive+0x3e/0x90
[110926.972527]
[110926.972527] the shortest dependencies between 2nd lock and 1st lock:
[110926.972533]  -> (&(&ip->i_lock)->mr_lock/1){+.+.+.} ops: 58117 {
[110926.972536] HARDIRQ-ON-W at:
[110926.972537]   []
__lock_acquire+0x631/0x1c00
[110926.972540]   []
lock_acquire+0x55/0x70
[110926.972542]   []
down_write_nested+0x4a/0x70
[110926.972545]   [] xfs_ilock+0x84/0xb0
[110926.972548]   []
xfs_create+0x1d4/0x5a0
[110926.972550]   []
xfs_vn_mknod+0x8a/0x1b0
[110926.972552]   []
xfs_vn_create+0xe/0x10
[110926.972554]   [] vfs_create+0x72/0xc0
[110926.972556]   []
do_last.isra.69+0x80e/0xc80
[110926.972558]   []
path_openat.isra.70+0xab/0x490
[110926.972560]   []
do_filp_open+0x3d/0xa0
[110926.972562]   []
do_sys_open+0xf9/0x1e0
[110926.972565]   [] sys_open+0x1c/0x20
[110926.972567]   []
system_call_fastpath+0x16/0x1b
[110926.972570] SOFTIRQ-ON-W at:
[110926.972571]   []
__lock_acquire+0x667/0x1c00
[110926.972573]   []
lock_acquire+0x55/0x70
[110926.972574]   []
down_write_nested+0x4a/0x70
[110926.972576]   [] xfs_ilock+0x84/0xb0
[110926.972578]   []
xfs_create+0x1d4/0x5a0
[110926.972580]   []
xfs_vn_mknod+0x8a/0x1b0
[110926.972581]   []
xfs_vn_create+0xe/0x10
[110926.972583]   [] vfs_create+0x72/0xc0
[110926.972585]   []
do_last.isra.69+0x80e/0xc80
[110926.972587]   []
path_openat.isra.70+0xab/0x490
[110926.972589]   []
do_filp_open+0x3d/0xa0
[110926.972591]   []
do_s

Re: Hang in XFS reclaim on 3.7.0-rc3

2012-11-18 Thread Torsten Kaiser
On Tue, Oct 30, 2012 at 9:37 PM, Torsten Kaiser
 wrote:
> I will keep LOCKDEP enabled on that system, and if there really is
> another splat, I will report back here. But I rather doubt that this
> will be needed.

After the patch, I did not see this problem again, but today I found
another LOCKDEP report that also looks XFS related.
I found it twice in the logs, and as both were slightly different, I
will attach both versions.


Nov  6 21:57:09 thoregon kernel: [ 9941.104345]
Nov  6 21:57:09 thoregon kernel: [ 9941.104350]
=
Nov  6 21:57:09 thoregon kernel: [ 9941.104351] [ INFO: inconsistent
lock state ]
Nov  6 21:57:09 thoregon kernel: [ 9941.104353] 3.7.0-rc4 #1 Not tainted
Nov  6 21:57:09 thoregon kernel: [ 9941.104354]
-
Nov  6 21:57:09 thoregon kernel: [ 9941.104355] inconsistent
{RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
Nov  6 21:57:09 thoregon kernel: [ 9941.104357] kswapd0/725
[HC0[0]:SC0[0]:HE1:SE1] takes:
Nov  6 21:57:09 thoregon kernel: [ 9941.104359]
(&(&ip->i_lock)->mr_lock){?.}, at: []
xfs_ilock+0x84/0xb0
Nov  6 21:57:09 thoregon kernel: [ 9941.104366] {RECLAIM_FS-ON-W}
state was registered at:
Nov  6 21:57:09 thoregon kernel: [ 9941.104367]   []
mark_held_locks+0x7e/0x130
Nov  6 21:57:09 thoregon kernel: [ 9941.104371]   []
lockdep_trace_alloc+0x63/0xc0
Nov  6 21:57:09 thoregon kernel: [ 9941.104373]   []
__alloc_pages_nodemask+0x75/0x800
Nov  6 21:57:09 thoregon kernel: [ 9941.104375]   []
__get_free_pages+0x12/0x40
Nov  6 21:57:09 thoregon kernel: [ 9941.104377]   []
pte_alloc_one_kernel+0x10/0x20
Nov  6 21:57:09 thoregon kernel: [ 9941.104380]   []
__pte_alloc_kernel+0x16/0x90
Nov  6 21:57:09 thoregon kernel: [ 9941.104382]   []
vmap_page_range_noflush+0x287/0x320
Nov  6 21:57:09 thoregon kernel: [ 9941.104385]   []
vm_map_ram+0x694/0x770
Nov  6 21:57:09 thoregon kernel: [ 9941.104386]   []
_xfs_buf_map_pages+0x46/0xe0
Nov  6 21:57:09 thoregon kernel: [ 9941.104389]   []
xfs_buf_get_map+0x8a/0x130
Nov  6 21:57:09 thoregon kernel: [ 9941.104391]   []
xfs_trans_get_buf_map+0xa9/0xd0
Nov  6 21:57:09 thoregon kernel: [ 9941.104393]   []
xfs_ifree_cluster+0x129/0x670
Nov  6 21:57:09 thoregon kernel: [ 9941.104396]   []
xfs_ifree+0xe9/0xf0
Nov  6 21:57:09 thoregon kernel: [ 9941.104398]   []
xfs_inactive+0x2af/0x480
Nov  6 21:57:09 thoregon kernel: [ 9941.104400]   []
xfs_fs_evict_inode+0x70/0x80
Nov  6 21:57:09 thoregon kernel: [ 9941.104402]   []
evict+0xaf/0x1b0
Nov  6 21:57:09 thoregon kernel: [ 9941.104405]   []
iput+0x105/0x210
Nov  6 21:57:09 thoregon kernel: [ 9941.104406]   []
d_delete+0x150/0x190
Nov  6 21:57:09 thoregon kernel: [ 9941.104408]   []
vfs_rmdir+0x107/0x120
Nov  6 21:57:09 thoregon kernel: [ 9941.104411]   []
do_rmdir+0xe4/0x130
Nov  6 21:57:09 thoregon kernel: [ 9941.104413]   []
sys_rmdir+0x11/0x20
Nov  6 21:57:09 thoregon kernel: [ 9941.104415]   []
system_call_fastpath+0x16/0x1b
Nov  6 21:57:09 thoregon kernel: [ 9941.104417] irq event stamp: 18505
Nov  6 21:57:09 thoregon kernel: [ 9941.104418] hardirqs last  enabled
at (18505): [] mutex_trylock+0xfd/0x170
Nov  6 21:57:09 thoregon kernel: [ 9941.104421] hardirqs last disabled
at (18504): [] mutex_trylock+0x3e/0x170
Nov  6 21:57:09 thoregon kernel: [ 9941.104423] softirqs last  enabled
at (18492): [] __do_softirq+0x111/0x170
Nov  6 21:57:09 thoregon kernel: [ 9941.104426] softirqs last disabled
at (18477): [] call_softirq+0x1c/0x30
Nov  6 21:57:09 thoregon kernel: [ 9941.104428]
Nov  6 21:57:09 thoregon kernel: [ 9941.104428] other info that might
help us debug this:
Nov  6 21:57:09 thoregon kernel: [ 9941.104429]  Possible unsafe
locking scenario:
Nov  6 21:57:09 thoregon kernel: [ 9941.104429]
Nov  6 21:57:09 thoregon kernel: [ 9941.104430]        CPU0
Nov  6 21:57:09 thoregon kernel: [ 9941.104431]        ----
Nov  6 21:57:09 thoregon kernel: [ 9941.104432]   lock(&(&ip->i_lock)->mr_lock);
Nov  6 21:57:09 thoregon kernel: [ 9941.104433]   <Interrupt>
Nov  6 21:57:09 thoregon kernel: [ 9941.104434]
lock(&(&ip->i_lock)->mr_lock);
Nov  6 21:57:09 thoregon kernel: [ 9941.104435]
Nov  6 21:57:09 thoregon kernel: [ 9941.104435]  *** DEADLOCK ***
Nov  6 21:57:09 thoregon kernel: [ 9941.104435]
Nov  6 21:57:09 thoregon kernel: [ 9941.104437] 3 locks held by kswapd0/725:
Nov  6 21:57:09 thoregon kernel: [ 9941.104438]  #0:
(shrinker_rwsem){..}, at: []
shrink_slab+0x32/0x1f0
Nov  6 21:57:09 thoregon kernel: [ 9941.104442]  #1:
(&type->s_umount_key#20){.+}, at: []
grab_super_passive+0x3e/0x90
Nov  6 21:57:09 thoregon kernel: [ 9941.104446]  #2:
(&pag->pag_ici_reclaim_lock){+.+...}, at: []
xfs_reclaim_inodes_ag+0xbc/0x4f0
Nov  6 21:57:09 thoregon kernel: [ 9941.104449]
Nov  6 21:57:09 thoregon kernel: [ 9941.104449] stack backtrace:
Nov  6 21:57:09 thoregon kernel: [ 9941.104451] Pid: 725, comm:
kswapd0 Not tainted 3.7.0-rc4 #1
Nov  6 21:57:09 thoregon kernel: [ 9941.104452] Call Trace:
Nov  6 21:57:09 thoregon kernel: [ 9941.1044

Re: Hang in XFS reclaim on 3.7.0-rc3

2012-11-18 Thread Torsten Kaiser
On Mon, Nov 19, 2012 at 12:51 AM, Dave Chinner <da...@fromorbit.com> wrote:
> On Sun, Nov 18, 2012 at 04:29:22PM +0100, Torsten Kaiser wrote:
>> On Sun, Nov 18, 2012 at 11:24 AM, Torsten Kaiser
>> <just.for.l...@googlemail.com> wrote:
>> > On Tue, Oct 30, 2012 at 9:37 PM, Torsten Kaiser
>> > <just.for.l...@googlemail.com> wrote:
>> >> I will keep LOCKDEP enabled on that system, and if there really is
>> >> another splat, I will report back here. But I rather doubt that this
>> >> will be needed.
>> >
>> > After the patch, I did not see this problem again, but today I found
>> > another LOCKDEP report that also looks XFS related.
>> > I found it twice in the logs, and as both were slightly different, I
>> > will attach both versions.
>>
>> > Nov  6 21:57:09 thoregon kernel: [ 9941.104353] 3.7.0-rc4 #1 Not tainted
>> > Nov  6 21:57:09 thoregon kernel: [ 9941.104355] inconsistent
>> > {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
>> > Nov  6 21:57:09 thoregon kernel: [ 9941.104430]        CPU0
>> > Nov  6 21:57:09 thoregon kernel: [ 9941.104431]        ----
>> > Nov  6 21:57:09 thoregon kernel: [ 9941.104432]
>> > lock(&(&ip->i_lock)->mr_lock);
>> > Nov  6 21:57:09 thoregon kernel: [ 9941.104433]   <Interrupt>
>> > Nov  6 21:57:09 thoregon kernel: [ 9941.104434]
>> > lock(&(&ip->i_lock)->mr_lock);
>> > Nov  6 21:57:09 thoregon kernel: [ 9941.104435]
>> > Nov  6 21:57:09 thoregon kernel: [ 9941.104435]  *** DEADLOCK ***
>>
>> Sorry! Copied the wrong report. Your fix only landed in -rc5, so my
>> vanilla -rc4 did (also) report the old problem again.
>> And I copy&pasted that report instead of the second appearance of the
>> new problem.
>
> Can you repost it with line wrapping turned off? The output simply
> becomes unreadable when it wraps
>
> Yeah, I know I can put it back together, but I've got better things
> to do with my time than stitch a couple of hundred lines of debug
> back into a readable format

Sorry about that, but I can't find any option to turn that off in Gmail.

I have added the reports as attachments; I hope that's OK for you.

Thanks for looking into this.

Torsten
[110926.972477] 
[110926.972482] =
[110926.972484] [ INFO: possible irq lock inversion dependency detected ]
[110926.972486] 3.7.0-rc4 #1 Not tainted
[110926.972487] -
[110926.972489] kswapd0/725 just changed the state of lock:
[110926.972490]  (sb_internal){.+.+.?}, at: [8122b268] 
xfs_trans_alloc+0x28/0x50
[110926.972499] but this lock took another, RECLAIM_FS-unsafe lock in the past:
[110926.972500]  (&(&ip->i_lock)->mr_lock/1){+.+.+.}
[110926.972500] 
[110926.972500] and interrupts could create inverse lock ordering between them.
[110926.972500] 
[110926.972503] 
[110926.972503] other info that might help us debug this:
[110926.972504]  Possible interrupt unsafe locking scenario:
[110926.972504] 
[110926.972505]        CPU0                    CPU1
[110926.972506]        ----                    ----
[110926.972507]   lock(&(&ip->i_lock)->mr_lock/1);
[110926.972509]                                local_irq_disable();
[110926.972509]                                lock(sb_internal);
[110926.972511]                                lock(&(&ip->i_lock)->mr_lock/1);
[110926.972512]   <Interrupt>
[110926.972513]     lock(sb_internal);
[110926.972514] 
[110926.972514]  *** DEADLOCK ***
[110926.972514] 
[110926.972516] 2 locks held by kswapd0/725:
[110926.972517]  #0:  (shrinker_rwsem){..}, at: [810bbd22] 
shrink_slab+0x32/0x1f0
[110926.972522]  #1:  (&type->s_umount_key#20){.+}, at:
[810f5a8e] grab_super_passive+0x3e/0x90
[110926.972527] 
[110926.972527] the shortest dependencies between 2nd lock and 1st lock:
[110926.972533]  -> (&(&ip->i_lock)->mr_lock/1){+.+.+.} ops: 58117 {
[110926.972536] HARDIRQ-ON-W at:
[110926.972537]   [8107f091] 
__lock_acquire+0x631/0x1c00
[110926.972540]   [81080b55] 
lock_acquire+0x55/0x70
[110926.972542]   [8106126a] 
down_write_nested+0x4a/0x70
[110926.972545]   [811e8164] xfs_ilock+0x84/0xb0
[110926.972548]   [811f5194] 
xfs_create+0x1d4/0x5a0
[110926.972550]   [811eca1a] 
xfs_vn_mknod+0x8a/0x1b0
[110926.972552]   [811ecb6e] 
xfs_vn_create+0xe/0x10
[110926.972554]   [81100332] vfs_create+0x72/0xc0
[110926.972556]   [81100b8e] 
do_last.isra.69+0x80e/0xc80
[110926.972558]   [811010ab] 
path_openat.isra.70+0xab/0x490
[110926.972560]   [8110184d] 
do_filp_open+0x3d/0xa0
[110926.972562]   [810f2139] 
do_sys_open+0xf9/0x1e0
[110926.972565]   [810f223c] sys_open+0x1c/0x20
[110926.972567]   [816b2d12] 
system_call_fastpath+0x16/0x1b
[110926.972570] SOFTIRQ-ON-W at:
[110926.972571

Re: [PATCH 2/4] AMD64 EDAC: Add support for >255 memory controllers

2012-10-31 Thread Torsten Kaiser
On Wed, Oct 31, 2012 at 6:55 AM, Daniel J Blueman
 wrote:
> As the AMD64 last-level-cache ID is 16-bits and federated systems
> eg using Numascale's NumaConnect/NumaChip can have more than 255 memory
> controllers, use 16-bits to store the ID.
>
> Signed-off-by: Daniel J Blueman 
> ---
>  drivers/edac/amd64_edac.c |   18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index 18d404a..9920dfd 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -942,7 +942,7 @@ static u64 get_error_address(struct mce *m)
> struct amd64_pvt *pvt;
> u64 cc6_base, tmp_addr;
> u32 tmp;
> -   u8 mce_nid, intlv_en;
> +   u16 mce_nid, intlv_en;

Is the change of intlv_en to u16 intentional?
I assume it's not, because...

> if ((addr & GENMASK(24, 47)) >> 24 != 0x00fdf7)
> return addr;
> @@ -1499,7 +1499,7 @@ static int f1x_match_to_this_node(struct amd64_pvt 
> *pvt, unsigned range,
> u8 channel;
> bool high_range = false;
>
> -   u8 node_id= dram_dst_node(pvt, range);
> +   u16 node_id   = dram_dst_node(pvt, range);
> u8 intlv_en   = dram_intlv_en(pvt, range);

... here you keep it at u8.

> u32 intlv_sel = dram_intlv_sel(pvt, range);
>
> @@ -2306,7 +2306,7 @@ out:
> return ret;
>  }
>
> -static int toggle_ecc_err_reporting(struct ecc_settings *s, u8 nid, bool on)
> +static int toggle_ecc_err_reporting(struct ecc_settings *s, u16 nid, bool on)
>  {
> cpumask_var_t cmask;
> int cpu;
> @@ -2344,7 +2344,7 @@ static int toggle_ecc_err_reporting(struct ecc_settings 
> *s, u8 nid, bool on)
> return 0;
>  }
>
> -static bool enable_ecc_error_reporting(struct ecc_settings *s, u8 nid,
> +static bool enable_ecc_error_reporting(struct ecc_settings *s, u16 nid,
>struct pci_dev *F3)
>  {
> bool ret = true;
> @@ -2396,7 +2396,7 @@ static bool enable_ecc_error_reporting(struct 
> ecc_settings *s, u8 nid,
> return ret;
>  }
>
> -static void restore_ecc_error_reporting(struct ecc_settings *s, u8 nid,
> +static void restore_ecc_error_reporting(struct ecc_settings *s, u16 nid,
> struct pci_dev *F3)
>  {
> u32 value, mask = 0x3;  /* UECC/CECC enable */
> @@ -2435,7 +2435,7 @@ static const char *ecc_msg =
> "'ecc_enable_override'.\n"
> " (Note that use of the override may cause unknown side effects.)\n";
>
> -static bool ecc_enabled(struct pci_dev *F3, u8 nid)
> +static bool ecc_enabled(struct pci_dev *F3, u16 nid)
>  {
> u32 value;
> u8 ecc_en = 0;
> @@ -2556,7 +2556,7 @@ static int amd64_init_one_instance(struct pci_dev *F2)
> struct mem_ctl_info *mci = NULL;
> struct edac_mc_layer layers[2];
> int err = 0, ret;
> -   u8 nid = get_node_id(F2);
> +   u16 nid = get_node_id(F2);
>
> ret = -ENOMEM;
> pvt = kzalloc(sizeof(struct amd64_pvt), GFP_KERNEL);
> @@ -2647,7 +2647,7 @@ err_ret:
>  static int __devinit amd64_probe_one_instance(struct pci_dev *pdev,
>  const struct pci_device_id 
> *mc_type)
>  {
> -   u8 nid = get_node_id(pdev);
> +   u16 nid = get_node_id(pdev);
> struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
> struct ecc_settings *s;
> int ret = 0;
> @@ -2697,7 +2697,7 @@ static void __devexit amd64_remove_one_instance(struct 
> pci_dev *pdev)
>  {
> struct mem_ctl_info *mci;
> struct amd64_pvt *pvt;
> -   u8 nid = get_node_id(pdev);
> +   u16 nid = get_node_id(pdev);
> struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
> struct ecc_settings *s = ecc_stngs[nid];
>
> --
> 1.7.9.5
>
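As a side note on why the ID width matters (an illustrative sketch only,
not code from amd64_edac.c; the helper names store_nid_u8 and
store_nid_u16 are made up for this example): a node ID above 255 silently
wraps when it is stored in a u8, which appears to be the truncation the
patch avoids by switching the node ID variables to u16.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not from the driver: models keeping a node ID in
 * the old 8-bit variable. Values above 255 wrap modulo 256.
 */
static uint8_t store_nid_u8(unsigned int nid)
{
	return (uint8_t)nid;
}

/*
 * Hypothetical helper: the same ID kept in 16 bits, as the patch does
 * for the node ID variables.
 */
static uint16_t store_nid_u16(unsigned int nid)
{
	assert(nid <= UINT16_MAX); /* the sketch only covers 16-bit IDs */
	return (uint16_t)nid;
}
```

With these helpers, a node ID of 300 comes back as 44 from the 8-bit
version and as 300 from the 16-bit version.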


Re: Hang in XFS reclaim on 3.7.0-rc3

2012-10-30 Thread Torsten Kaiser
On Mon, Oct 29, 2012 at 11:26 PM, Dave Chinner  wrote:
> On Mon, Oct 29, 2012 at 09:03:15PM +0100, Torsten Kaiser wrote:
>> After experiencing a hang of all IO yesterday (
>> http://marc.info/?l=linux-kernel&m=135142236520624&w=2 ), I turned on
>> LOCKDEP after upgrading to -rc3.
>>
>> I then tried to replicate the load that hung yesterday and got the
>> following lockdep report, implicating XFS rather than the stacking of
>> swap onto dm-crypt and md.
>>
>> [ 2844.971913]
>> [ 2844.971920] =
>> [ 2844.971921] [ INFO: inconsistent lock state ]
>> [ 2844.971924] 3.7.0-rc3 #1 Not tainted
>> [ 2844.971925] -
>> [ 2844.971927] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
>> [ 2844.971929] kswapd0/725 [HC0[0]:SC0[0]:HE1:SE1] takes:
>> [ 2844.971931] (&(&ip->i_lock)->mr_lock){?.}, at: []
>> xfs_ilock+0x84/0xb0
>> [ 2844.971941] {RECLAIM_FS-ON-W} state was registered at:
>> [ 2844.971942]   [] mark_held_locks+0x7e/0x130
>> [ 2844.971947]   [] lockdep_trace_alloc+0x63/0xc0
>> [ 2844.971949]   [] kmem_cache_alloc+0x35/0xe0
>> [ 2844.971952]   [] vm_map_ram+0x271/0x770
>> [ 2844.971955]   [] _xfs_buf_map_pages+0x46/0xe0
>> [ 2844.971959]   [] xfs_buf_get_map+0x8a/0x130
>> [ 2844.971961]   [] xfs_trans_get_buf_map+0xa9/0xd0
>> [ 2844.971964]   [] xfs_ifree_cluster+0x129/0x670
>> [ 2844.971967]   [] xfs_ifree+0xe9/0xf0
>> [ 2844.971969]   [] xfs_inactive+0x2af/0x480
>> [ 2844.971972]   [] xfs_fs_evict_inode+0x70/0x80
>> [ 2844.971974]   [] evict+0xaf/0x1b0
>> [ 2844.971977]   [] iput+0x105/0x210
>> [ 2844.971979]   [] dentry_iput+0xa0/0xe0
>> [ 2844.971981]   [] dput+0x150/0x280
>> [ 2844.971983]   [] sys_renameat+0x21b/0x290
>> [ 2844.971986]   [] sys_rename+0x16/0x20
>> [ 2844.971988]   [] system_call_fastpath+0x16/0x1b
>
> We shouldn't be mapping pages there. See if the patch below fixes
> it.

Applying your fix and rerunning my test workload did not trigger this
or any other LOCKDEP reports.
While I'm not 100% sure that my test case always hits this, your
description makes me quite confident that it really fixed this issue.

I will keep LOCKDEP enabled on that system, and if there really is
another splat, I will report back here. But I rather doubt that this
will be needed.

Thanks for the very quick fix!

Torsten

> Fundamentally, though, the lockdep warning has come about because
> vm_map_ram is doing a GFP_KERNEL allocation when we need it to be
> doing GFP_NOFS - we are within a transaction here, so memory reclaim
> is not allowed to recurse back into the filesystem.
>
> mm-folk: can we please get this vmalloc/gfp_flags passing API
> fixed once and for all? This is the fourth time in the last month or
> so that I've seen XFS bug reports with silent hangs and associated
> lockdep output that implicate GFP_KERNEL allocations from vm_map_ram
> in GFP_NOFS conditions as the potential cause
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> da...@fromorbit.com
>
> xfs: don't vmap inode cluster buffers during free
>
> From: Dave Chinner 
>
> Signed-off-by: Dave Chinner 
> ---
>  fs/xfs/xfs_inode.c |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index c4add46..82f6e5d 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -1781,7 +1781,8 @@ xfs_ifree_cluster(
>  * to mark all the active inodes on the buffer stale.
>  */
> bp = xfs_trans_get_buf(tp, mp->m_ddev_targp, blkno,
> -   mp->m_bsize * blks_per_cluster, 0);
> +   mp->m_bsize * blks_per_cluster,
> +   XBF_UNMAPPED);
>
> if (!bp)
> return ENOMEM;
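
To illustrate the recursion Dave describes (a user-space toy model under
stated assumptions, not kernel code; every toy_* name is invented for this
sketch): an allocation made while the inode lock is held must not let
reclaim re-enter the filesystem. That is the constraint GFP_NOFS expresses
and GFP_KERNEL does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the kernel's allocation flags; the names are invented. */
enum toy_gfp { TOY_GFP_KERNEL, TOY_GFP_NOFS };

struct toy_fs {
	bool ilock_held;      /* models the inode lock held inside a transaction */
	int  fs_reclaim_runs; /* how often reclaim entered the filesystem */
	bool deadlocked;      /* set when reclaim needs a lock we already hold */
};

/* Models filesystem reclaim, which needs the inode lock to make progress. */
static void toy_fs_reclaim(struct toy_fs *fs)
{
	assert(fs);
	fs->fs_reclaim_runs++;
	if (fs->ilock_held)
		fs->deadlocked = true; /* would wait forever on our own lock */
}

/* Models an allocation under memory pressure that triggers direct reclaim. */
static void toy_alloc(struct toy_fs *fs, enum toy_gfp flags)
{
	if (flags == TOY_GFP_KERNEL)
		toy_fs_reclaim(fs); /* reclaim may recurse into the filesystem */
	/* TOY_GFP_NOFS: filesystem reclaim is skipped entirely */
}

/* Allocates while "holding" the inode lock; returns true on self-deadlock. */
static bool toy_alloc_in_transaction(enum toy_gfp flags)
{
	struct toy_fs fs = { .ilock_held = true };

	toy_alloc(&fs, flags);
	return fs.deadlocked;
}
```

In this model toy_alloc_in_transaction(TOY_GFP_KERNEL) reports the
self-deadlock and toy_alloc_in_transaction(TOY_GFP_NOFS) does not. Note
that the patch quoted above takes a different route: it passes
XBF_UNMAPPED so that the buffer is not vmapped in the first place.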




Hang in XFS reclaim on 3.7.0-rc3

2012-10-29 Thread Torsten Kaiser
After experiencing a hang of all IO yesterday (
http://marc.info/?l=linux-kernel&m=135142236520624&w=2 ), I turned on
LOCKDEP after upgrading to -rc3.

I then tried to replicate the load that hung yesterday and got the
following lockdep report, which implicates XFS rather than the stacking
of swap onto dm-crypt and md.

Oct 29 20:27:11 thoregon kernel: [ 2675.571958] usb 7-2: USB
disconnect, device number 2
Oct 29 20:30:01 thoregon kernel: [ 2844.971913]
Oct 29 20:30:01 thoregon kernel: [ 2844.971920]
=
Oct 29 20:30:01 thoregon kernel: [ 2844.971921] [ INFO: inconsistent
lock state ]
Oct 29 20:30:01 thoregon kernel: [ 2844.971924] 3.7.0-rc3 #1 Not tainted
Oct 29 20:30:01 thoregon kernel: [ 2844.971925]
-
Oct 29 20:30:01 thoregon kernel: [ 2844.971927] inconsistent
{RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
Oct 29 20:30:01 thoregon kernel: [ 2844.971929] kswapd0/725
[HC0[0]:SC0[0]:HE1:SE1] takes:
Oct 29 20:30:01 thoregon kernel: [ 2844.971931]
(&(&ip->i_lock)->mr_lock){?.}, at: [811e7ef4]
xfs_ilock+0x84/0xb0
Oct 29 20:30:01 thoregon kernel: [ 2844.971941] {RECLAIM_FS-ON-W}
state was registered at:
Oct 29 20:30:01 thoregon kernel: [ 2844.971942]   [8108137e]
mark_held_locks+0x7e/0x130
Oct 29 20:30:01 thoregon kernel: [ 2844.971947]   [81081a63]
lockdep_trace_alloc+0x63/0xc0
Oct 29 20:30:01 thoregon kernel: [ 2844.971949]   [810e9dd5]
kmem_cache_alloc+0x35/0xe0
Oct 29 20:30:01 thoregon kernel: [ 2844.971952]   [810dba31]
vm_map_ram+0x271/0x770
Oct 29 20:30:01 thoregon kernel: [ 2844.971955]   [811e10a6]
_xfs_buf_map_pages+0x46/0xe0
Oct 29 20:30:01 thoregon kernel: [ 2844.971959]   [811e1fba]
xfs_buf_get_map+0x8a/0x130
Oct 29 20:30:01 thoregon kernel: [ 2844.971961]   [81233849]
xfs_trans_get_buf_map+0xa9/0xd0
Oct 29 20:30:01 thoregon kernel: [ 2844.971964]   [8121e339]
xfs_ifree_cluster+0x129/0x670
Oct 29 20:30:01 thoregon kernel: [ 2844.971967]   [8121f959]
xfs_ifree+0xe9/0xf0
Oct 29 20:30:01 thoregon kernel: [ 2844.971969]   [811f4abf]
xfs_inactive+0x2af/0x480
Oct 29 20:30:01 thoregon kernel: [ 2844.971972]   [811efb90]
xfs_fs_evict_inode+0x70/0x80
Oct 29 20:30:01 thoregon kernel: [ 2844.971974]   [8110cb8f]
evict+0xaf/0x1b0
Oct 29 20:30:01 thoregon kernel: [ 2844.971977]   [8110cd95]
iput+0x105/0x210
Oct 29 20:30:01 thoregon kernel: [ 2844.971979]   [811070d0]
dentry_iput+0xa0/0xe0
Oct 29 20:30:01 thoregon kernel: [ 2844.971981]   [81108310]
dput+0x150/0x280
Oct 29 20:30:01 thoregon kernel: [ 2844.971983]   [811020fb]
sys_renameat+0x21b/0x290
Oct 29 20:30:01 thoregon kernel: [ 2844.971986]   [81102186]
sys_rename+0x16/0x20
Oct 29 20:30:01 thoregon kernel: [ 2844.971988]   [816b2292]
system_call_fastpath+0x16/0x1b
Oct 29 20:30:01 thoregon kernel: [ 2844.971992] irq event stamp: 155377
Oct 29 20:30:01 thoregon kernel: [ 2844.971993] hardirqs last  enabled
at (155377): [816ae1ed] mutex_trylock+0xfd/0x170
Oct 29 20:30:01 thoregon kernel: [ 2844.971997] hardirqs last disabled
at (155376): [816ae12e] mutex_trylock+0x3e/0x170
Oct 29 20:30:01 thoregon kernel: [ 2844.971999] softirqs last  enabled
at (155368): [81042fb1] __do_softirq+0x111/0x170
Oct 29 20:30:01 thoregon kernel: [ 2844.972002] softirqs last disabled
at (155353): [816b33bc] call_softirq+0x1c/0x30
Oct 29 20:30:01 thoregon kernel: [ 2844.972004]
Oct 29 20:30:01 thoregon kernel: [ 2844.972004] other info that might
help us debug this:
Oct 29 20:30:01 thoregon kernel: [ 2844.972006]  Possible unsafe
locking scenario:
Oct 29 20:30:01 thoregon kernel: [ 2844.972006]
Oct 29 20:30:01 thoregon kernel: [ 2844.972007]CPU0
Oct 29 20:30:01 thoregon kernel: [ 2844.972007]
Oct 29 20:30:01 thoregon kernel: [ 2844.972008]   lock(&(&ip->i_lock)->mr_lock);
Oct 29 20:30:01 thoregon kernel: [ 2844.972009]   <Interrupt>
Oct 29 20:30:01 thoregon kernel: [ 2844.972010]
lock(&(&ip->i_lock)->mr_lock);
Oct 29 20:30:01 thoregon kernel: [ 2844.972012]
Oct 29 20:30:01 thoregon kernel: [ 2844.972012]  *** DEADLOCK ***
Oct 29 20:30:01 thoregon kernel: [ 2844.972012]
Oct 29 20:30:01 thoregon kernel: [ 2844.972013] 3 locks held by kswapd0/725:
Oct 29 20:30:01 thoregon kernel: [ 2844.972014]  #0:
(shrinker_rwsem){..}, at: [810bbd22]
shrink_slab+0x32/0x1f0
Oct 29 20:30:01 thoregon kernel: [ 2844.972020]  #1:
(&type->s_umount_key#20){.+}, at: [810f5a8e]
grab_super_passive+0x3e/0x90
Oct 29 20:30:01 thoregon kernel: [ 2844.972024]  #2:
(&pag->pag_ici_reclaim_lock){+.+...}, at: [811f263c]
xfs_reclaim_inodes_ag+0xbc/0x4f0
Oct 29 20:30:01 thoregon kernel: [ 2844.972027]
Oct 29 20:30:01 thoregon kernel: [ 2844.972027] stack backtrace:
Oct 29 20:30:01 thoregon kernel: [ 2844.972029] Pid: 725, comm:
kswapd0 Not tainted 3.7.0-rc3 #1
Oct 29 20:30:01 thoregon kernel: [ 2844.972031] Call Trace:
Oct 29 20:30:01 thoregon kernel: [ 2844.972035]  []
print_usage_bug+0x1f5/0x206
Oct 29 20:30:01 thoregon kernel: [ 2844.972039]  []
? save_stack_trace+0x2a/0x50
Oct 29 20:30:01 thoregon kernel: [ 2844.972042]  []
mark_lock+0x28d/0x2f0
Oct 29 20:30:01 thoregon kernel: [ 2844.972044]  []
? print_irq_inversion_bug.part.37+0x1f0/0x1f0
Oct 29 20:30:01 thoregon kernel: [ 2844.972047]  []
__lock_acquire+0x57f/0x1c00
Oct 29 20:30:01 thoregon kernel: [ 2844.972049]  []


Hang with swap / mempool / md on 3.7.0-rc2

2012-10-28 Thread Torsten Kaiser
While 3.7.0-rc1 and -rc2 otherwise worked fine for me, today my system
hung while trying to write to its disks.

The source of the problem seems to be a hang in kswapd0; after that, many
more processes got stuck trying to do IO. Even an emergency sync via
SysRq+S no longer completed.

The hang (that was still correctly logged to disk):
Oct 28 09:40:16 thoregon kernel: [141366.412179] INFO: task
kswapd0:724 blocked for more than 120 seconds.
Oct 28 09:40:16 thoregon kernel: [141366.412186] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 28 09:40:16 thoregon kernel: [141366.412191] kswapd0 D
880337d112c0 0   724  2 0x
Oct 28 09:40:16 thoregon kernel: [141366.412200]  880329b8efa0
0046 0800 88032986d240
Oct 28 09:40:16 thoregon kernel: [141366.412210]  880329183fd8
880329183fd8 880329183fd8 880329b8efa0
Oct 28 09:40:16 thoregon kernel: [141366.412217]  0246
880329947680 880329947400 
Oct 28 09:40:16 thoregon kernel: [141366.412224] Call Trace:
Oct 28 09:40:16 thoregon kernel: [141366.412239]  [814a6fbd]
? md_super_wait+0x4d/0x80
Oct 28 09:40:16 thoregon kernel: [141366.412249]  [81054340]
? add_wait_queue+0x60/0x60
Oct 28 09:40:16 thoregon kernel: [141366.412257]  [814ad283]
? bitmap_unplug+0x153/0x160
Oct 28 09:40:16 thoregon kernel: [141366.412265]  [810cb3dc]
? new_slab+0x1ec/0x220
Oct 28 09:40:16 thoregon kernel: [141366.412273]  [81497fc8]
? raid1_unplug+0xb8/0x110
Oct 28 09:40:16 thoregon kernel: [141366.412281]  [81238180]
? blk_flush_plug_list+0xb0/0x210
Oct 28 09:40:16 thoregon kernel: [141366.412288]  [816289e2]
? io_schedule_timeout+0x82/0xf0
Oct 28 09:40:16 thoregon kernel: [141366.412296]  [81097882]
? mempool_alloc+0x122/0x150
Oct 28 09:40:16 thoregon kernel: [141366.412302]  [81054340]
? add_wait_queue+0x60/0x60
Oct 28 09:40:16 thoregon kernel: [141366.412309]  [811025fe]
? bio_alloc_bioset+0x4e/0x120
Oct 28 09:40:16 thoregon kernel: [141366.412315]  [81102802]
? bio_clone_bioset+0x12/0x40
Oct 28 09:40:16 thoregon kernel: [141366.412322]  [8149be16]
? make_request+0x416/0xb70
Oct 28 09:40:16 thoregon kernel: [141366.412328]  [810cb3dc]
? new_slab+0x1ec/0x220
Oct 28 09:40:16 thoregon kernel: [141366.412336]  [8123b9e1]
? blk_recount_segments+0x21/0x40
Oct 28 09:40:16 thoregon kernel: [141366.412343]  [814a0a0f]
? md_make_request+0xbf/0x1e0
Oct 28 09:40:16 thoregon kernel: [141366.412349]  [81236d9a]
? generic_make_request+0xba/0xf0
Oct 28 09:40:16 thoregon kernel: [141366.412355]  [81236e31]
? submit_bio+0x61/0x110
Oct 28 09:40:16 thoregon kernel: [141366.412363]  [811a8415]
? _xfs_buf_ioapply+0x1e5/0x270
Oct 28 09:40:16 thoregon kernel: [141366.412370]  [8105f3c0]
? try_to_wake_up+0x280/0x280
Oct 28 09:40:16 thoregon kernel: [141366.412377]  [811a9535]
? xfs_buf_iorequest+0x25/0x40
Oct 28 09:40:16 thoregon kernel: [141366.412383]  [811f2666]
? xlog_bdstrat+0x16/0x40
Oct 28 09:40:16 thoregon kernel: [141366.412389]  [811f398d]
? xlog_sync+0x1bd/0x390
Oct 28 09:40:16 thoregon kernel: [141366.412394]  [811f40e9]
? xlog_assign_tail_lsn_locked+0x19/0x50
Oct 28 09:40:16 thoregon kernel: [141366.412400]  [811f4b64]
? xlog_write+0x554/0x6f0
Oct 28 09:40:16 thoregon kernel: [141366.412408]  [811bd1f2]
? kmem_zone_zalloc+0x32/0x50
Oct 28 09:40:16 thoregon kernel: [141366.412415]  [811f5fed]
? xlog_cil_push+0x26d/0x350
Oct 28 09:40:16 thoregon kernel: [141366.412421]  [811f67b0]
? xlog_cil_force_lsn+0x130/0x140
Oct 28 09:40:16 thoregon kernel: [141366.412427]  [81061b02]
? dequeue_task_fair+0x52/0x180
Oct 28 09:40:16 thoregon kernel: [141366.412433]  [811f5157]
? _xfs_log_force_lsn+0x47/0x2d0
Oct 28 09:40:16 thoregon kernel: [141366.412439]  [8162265a]
? __slab_free+0x17d/0x293
Oct 28 09:40:16 thoregon kernel: [141366.412446]  [81087481]
? delayacct_end+0x81/0xa0
Oct 28 09:40:16 thoregon kernel: [141366.412452]  [811f53eb]
? xfs_log_force_lsn+0xb/0x40
Oct 28 09:40:16 thoregon kernel: [141366.412458]  [811e6733]
? xfs_iunpin_wait+0x93/0xf0
Oct 28 09:40:16 thoregon kernel: [141366.412465]  [81054370]
? autoremove_wake_function+0x30/0x30
Oct 28 09:40:16 thoregon kernel: [141366.412471]  [811b85db]
? xfs_reclaim_inode+0x11b/0x300
Oct 28 09:40:16 thoregon kernel: [141366.412478]  [811b8d9b]
? xfs_reclaim_inodes_ag+0x1bb/0x2c0
Oct 28 09:40:16 thoregon kernel: [141366.412486]  [811b8fbc]
? xfs_reclaim_inodes_nr+0x2c/0x40
Oct 28 09:40:16 thoregon kernel: [141366.412493]  [810d80e3]
? prune_super+0x113/0x1b0
Oct 28 09:40:16 thoregon kernel: [141366.412499]  [810a1829]
? shrink_slab+0x119/0x1c0
Oct 28 09:40:16 thoregon kernel: [141366.412506]  [810a3d82]
? kswapd+0x682/0x9a0
Oct 28 09:40:16 thoregon kernel: [141366.412513]  [81054340]
? add_wait_queue+0x60/0x60
Oct 28 09:40:16 thoregon kernel: [141366.412519]  []
? shrink_lruvec+0x540/0x540
Oct 28 09:40:16 thoregon kernel: [141366.412525]  []
? kthread+0xb3/0xc0
Oct 28 09:40:16 thoregon kernel: [141366.412531]  []
? flush_kthread_worker+0xa0/0xa0
Oct 28 09:40:16 thoregon kernel: [141366.412538]  []
? ret_from_fork+0x7c/0xb0
Oct 28 09:40:16 thoregon kernel: [141366.412544]  []
? flush_kthread_worker+0xa0/0xa0

After that, xfsaild/md4, flush-9:4 and several user processes also got
such a hang message; these look like they just got stuck on 


Re: Linux 2.6.25-rc2

2008-02-19 Thread Torsten Kaiser
On Feb 19, 2008 5:20 PM, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> So:
>  - it might be something else entirely
>  - it might still be the local cmpxchg, just Torsten didn't happen to
>notice it until later.

My new hackbench testcase also killed 2.6.24-rc2-mm1, so I really
noticed it too late.

>  - it might still be the local cmpxchg, but something else changed its
>patterns to actually make it start triggering.
>
> and in general I don't think we should revert it unless we have stronger
> indications that it really is the problem (eg somebody finds the actual
> bug, or a reporter can confirm that it goes away when the local cmpxchg
> optimization is disabled).

I tried the following three patches:

switching the barrier() for a smp_mb() in 2.6.25-rc2-mm1:
-> crashed

reverting the FASTPATH-patch in 2.6.25-rc2:
-> worked

only removed FAST_CMPXCHG_LOCAL from arch/x86/Kconfig
-> worked

So all of these tests seem to confirm that the bug is in the new SLUB fastpath.

Torsten




Re: Linux 2.6.25-rc2

2008-02-18 Thread Torsten Kaiser
On Feb 19, 2008 7:11 AM, Ingo Molnar <[EMAIL PROTECTED]> wrote:
> * Torsten Kaiser <[EMAIL PROTECTED]> wrote:
> > On Feb 15, 2008 10:23 PM, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> > >
> > > Ok,
> > >  this kernel is a winner.
> >
> > Sadly not for me:
> > [ 5282.056415] [ cut here ]
> > [ 5282.059757] kernel BUG at lib/list_debug.c:33!
> > [ 5282.062055] invalid opcode:  [1] SMP
> > [ 5282.062055] CPU 3
>
> hm. Your crashes do seem to span multiple subsystems, but it always
> seems to be around the SLUB code. Could you try the patch below? The
> SLUB code has a new optimization and i'm not 100% sure about it. [the
> hack below switches the SLUB optimization off by disabling the CPU
> feature it relies on.]
>
> Ingo
>
> ->
>  arch/x86/Kconfig |4 
>  1 file changed, 4 deletions(-)
>
> Index: linux/arch/x86/Kconfig
> ===
> --- linux.orig/arch/x86/Kconfig
> +++ linux/arch/x86/Kconfig
> @@ -59,10 +59,6 @@ config HAVE_LATENCYTOP_SUPPORT
>  config SEMAPHORE_SLEEPERS
> def_bool y
>
> -config FAST_CMPXCHG_LOCAL
> -   bool
> -   default y
> -
>  config MMU
> def_bool y
>

$ grep FAST_CMPXCHG_LOCAL */.config
linux-2.6.24-rc2-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.24-rc3-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.24-rc3-mm2/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.24-rc6-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.24-rc8-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.25-rc1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.25-rc2-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.25-rc2/.config:CONFIG_FAST_CMPXCHG_LOCAL=y

-rc2-mm1 still worked for me.

Did you mean the new SLUB_FASTPATH?
$ grep "define SLUB_FASTPATH" */mm/slub.c
linux-2.6.25-rc1/mm/slub.c:#define SLUB_FASTPATH
linux-2.6.25-rc2-mm1/mm/slub.c:#define SLUB_FASTPATH
linux-2.6.25-rc2/mm/slub.c:#define SLUB_FASTPATH

The 2.6.24-rc3+ mm-kernels did crash for me, but don't seem to contain this...

On the other hand:
From the crash in 2.6.25-rc2-mm1:
[59987.116182] RIP  [8029f83d] kmem_cache_alloc_node+0x6d/0xa0

(gdb) list *0x8029f83d
0x8029f83d is in kmem_cache_alloc_node (mm/slub.c:1646).
1641            if (unlikely(is_end(object) || !node_match(c, node))) {
1642                    object = __slab_alloc(s, gfpflags, node, addr, c);
1643                    break;
1644            }
1645            stat(c, ALLOC_FASTPATH);
1646    } while (cmpxchg_local(&c->freelist, object, object[c->offset])
1647                                                            != object);
1648    #else
1649            unsigned long flags;
1650

That code is part of SLUB_FASTPATH.
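
For readers who do not have mm/slub.c at hand, the retry loop in that
listing follows the usual compare-and-exchange pattern for popping the head
of a singly linked free list. The following is only a minimal userspace
sketch of that pattern using C11 atomics. It is not the kernel's code:
struct toy_cache, toy_alloc and toy_free are hypothetical names, the "next"
pointer is assumed to live in the first word of each free object, and the
sketch ignores the per-CPU and interrupt context that cmpxchg_local() is
designed for.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical model: every free object stores the pointer to the next
 * free object in its first word. */
struct toy_cache {
	_Atomic(void *) freelist;	/* head of the chain of free objects */
};

/* Put an object back at the head of the free list. */
static void toy_free(struct toy_cache *c, void *object)
{
	void *head = atomic_load(&c->freelist);

	assert(object != NULL);
	do {
		/* Link the object in front of the head we last observed. */
		*(void **)object = head;
		/* On failure, head is refreshed and the loop relinks. */
	} while (!atomic_compare_exchange_weak(&c->freelist, &head, object));
}

/* Take the head object, or return NULL if the list is empty. */
static void *toy_alloc(struct toy_cache *c)
{
	void *object = atomic_load(&c->freelist);

	while (object != NULL) {
		void *next = *(void **)object;

		/* Succeeds only if the head is still the object we read. */
		if (atomic_compare_exchange_weak(&c->freelist, &object, next))
			break;
		/* Otherwise object now holds the current head; retry. */
	}
	return object;
}
```

The exchange only succeeds when the head is unchanged since it was read,
which is the property the loop at mm/slub.c:1646 depends on. A sketch like
this is not safe as a general multi-threaded allocator (it has the classic
ABA problem), so it should be read as an illustration of the loop shape
only.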

I'm willing to test the patch, but don't know how fast I can find the
time to do it, so my answer if your patch helps might be delayed until
the weekend.

Torsten


Re: Linux 2.6.25-rc2

2008-02-18 Thread Torsten Kaiser
On Feb 19, 2008 12:54 AM, Linus Torvalds <[EMAIL PROTECTED]> wrote:
>
>
> On Sat, 16 Feb 2008, Torsten Kaiser wrote:
> >
> > [ 5282.056415] [ cut here ]
> > [ 5282.059757] kernel BUG at lib/list_debug.c:33!
>
> Is there any chance that you could try to bisect this, if it's repeatable
> enough for you? Even if you can't bisect it *all* the way, it would be
> really good to do a handful of bisection runs which should already
> hopefully narrow it down a bit more.
>
> Linus
>

It's repeatable, but not in a really reliable way.
So to mark a kernel good I need to compile around 100 KDE packages,
and even then I'm not 100% sure, if it's good or if I was just lucky.

But I did a partial bisect against 2.6.24-rc6-mm1:
2.6.24-rc6 + mm-patches up to (including) git.nfsd -> worked
2.6.24-rc6 + mm-patches up to (including) git.xfs -> crashed

I think the only patches added between rc2-mm1 and rc3-mm2 in that range
were the iommu changes that I later ruled out.
That leaves some git trees as suspects:
git-ocfs2.patch
git-selinux.patch
git-s390.patch
git-sched.patch
git-sh.patch
git-scsi-misc.patch
git-unionfs.patch
git-v9fs.patch
git-watchdog.patch
git-wireless.patch
git-ipwireless_cs.patch
git-x86.patch
git-xfs.patch

(see http://marc.info/?l=linux-kernel&m=120276641105256 )
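
Narrowing such an ordered patch series down further is a plain binary
search. The following is a minimal sketch under stated assumptions: the
series is applied strictly in order, a kernel with "good" patches applied is
known to work, and one with "bad" patches applied is known to crash. The
names first_bad_patch and works_fn are hypothetical, and breaks_at is only a
stand-in for the real test (building the kernel and running the workload).

```c
#include <stddef.h>

/* Hypothetical predicate: nonzero if a kernel with the first n_applied
 * patches of the series survives the test workload. */
typedef int (*works_fn)(size_t n_applied, void *ctx);

/* Stand-in for a real test run: the patch number stored in ctx is the
 * culprit, so any kernel that includes it counts as crashing. */
static int breaks_at(size_t n_applied, void *ctx)
{
	return n_applied < *(size_t *)ctx;
}

/*
 * Return the 1-based position of the first patch whose inclusion makes
 * the kernel fail. Requires works(good) to be true and works(bad) false.
 */
static size_t first_bad_patch(size_t good, size_t bad,
			      works_fn works, void *ctx)
{
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (works(mid, ctx))
			good = mid;	/* culprit is after mid */
		else
			bad = mid;	/* culprit is mid or earlier */
	}
	return bad;
}
```

With 13 suspects this needs at most four test runs. The search is only as
trustworthy as the predicate, though: with a crash that does not trigger
reliably, a "worked" result can be a false negative and send the search to
the wrong half.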

Torsten




Re: Linux 2.6.25-rc2

2008-02-18 Thread Torsten Kaiser
On Feb 19, 2008 7:11 AM, Ingo Molnar [EMAIL PROTECTED] wrote:
 * Torsten Kaiser [EMAIL PROTECTED] wrote:
  On Feb 15, 2008 10:23 PM, Linus Torvalds [EMAIL PROTECTED] wrote:
  
   Ok,
this kernel is a winner.
 
  Sadly not for me:
  [ 5282.056415] [ cut here ]
  [ 5282.059757] kernel BUG at lib/list_debug.c:33!
  [ 5282.062055] invalid opcode:  [1] SMP
  [ 5282.062055] CPU 3

 hm. Your crashes do seem to span multiple subsystems, but it always
 seems to be around the SLUB code. Could you try the patch below? The
 SLUB code has a new optimization and i'm not 100% sure about it. [the
 hack below switches the SLUB optimization off by disabling the CPU
 feature it relies on.]

 Ingo

 -
  arch/x86/Kconfig |4 
  1 file changed, 4 deletions(-)

 Index: linux/arch/x86/Kconfig
 ===
 --- linux.orig/arch/x86/Kconfig
 +++ linux/arch/x86/Kconfig
 @@ -59,10 +59,6 @@ config HAVE_LATENCYTOP_SUPPORT
  config SEMAPHORE_SLEEPERS
 def_bool y

 -config FAST_CMPXCHG_LOCAL
 -   bool
 -   default y
 -
  config MMU
 def_bool y


$ grep FAST_CMPXCHG_LOCAL */.config
linux-2.6.24-rc2-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.24-rc3-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.24-rc3-mm2/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.24-rc6-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.24-rc8-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.25-rc1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.25-rc2-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.25-rc2/.config:CONFIG_FAST_CMPXCHG_LOCAL=y

-rc2-mm1 still worked for me.

Did you mean the new SLUB_FASTPATH?
$ grep "define SLUB_FASTPATH" */mm/slub.c
linux-2.6.25-rc1/mm/slub.c:#define SLUB_FASTPATH
linux-2.6.25-rc2-mm1/mm/slub.c:#define SLUB_FASTPATH
linux-2.6.25-rc2/mm/slub.c:#define SLUB_FASTPATH

The 2.6.24-rc3+ mm-kernels did crash for me, but don't seem to contain this...

On the other hand:
From the crash in 2.6.25-rc2-mm1:
[59987.116182] RIP  [8029f83d] kmem_cache_alloc_node+0x6d/0xa0

(gdb) list *0x8029f83d
0x8029f83d is in kmem_cache_alloc_node (mm/slub.c:1646).
1641                    if (unlikely(is_end(object) || !node_match(c, node))) {
1642                            object = __slab_alloc(s, gfpflags, node, addr, c);
1643                            break;
1644                    }
1645                    stat(c, ALLOC_FASTPATH);
1646            } while (cmpxchg_local(&c->freelist, object, object[c->offset])
1647                                                            != object);
1648    #else
1649            unsigned long flags;
1650

That code is part of SLUB_FASTPATH.
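
For readers following the trace, here is a minimal userspace sketch of the
kind of retry loop shown at mm/slub.c:1646: read the head of a free list,
look up its successor inside the free object itself, and swap the head with
a compare-and-exchange. This is not the kernel code. It assumes C11 atomics
in place of cmpxchg_local(), a next-pointer offset of 0 in place of
c->offset, and the names struct freelist, freelist_pop() and freelist_push()
are made up for illustration. It also ignores the ABA problem, which the
per-CPU kernel variant does not face in the same way.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-CPU slab state: only the list head.
 * Each free object stores the pointer to the next free object in its
 * first word (the kernel reads object[c->offset]; offset 0 is assumed). */
struct freelist {
    _Atomic(void **) head;
};

/* Put an object back on the free list. */
void freelist_push(struct freelist *fl, void *obj)
{
    void **object = obj;
    void **old = atomic_load(&fl->head);

    assert(obj != NULL);
    do {
        object[0] = old;    /* link to the current head */
    } while (!atomic_compare_exchange_weak(&fl->head, &old, object));
}

/* Take one object off the free list, or return NULL if it is empty
 * (the kernel would fall back to its slow path at that point). */
void *freelist_pop(struct freelist *fl)
{
    void **object = atomic_load(&fl->head);

    while (object != NULL) {
        /* The successor is read out of the free object itself. */
        void **next = (void **)object[0];

        /* On failure, object is refreshed with the current head. */
        if (atomic_compare_exchange_weak(&fl->head, &object, next))
            return object;
    }
    return NULL;
}
```

Note that the loop dereferences the free object to find its successor, so
a free object whose first word was overwritten after it was freed would
only show up here, at the next allocation.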

I'm willing to test the patch, but I don't know how soon I can find the
time to do it, so my answer on whether your patch helps might be delayed
until the weekend.

Torsten


Re: Linux 2.6.25-rc2

2008-02-17 Thread Torsten Kaiser
On Feb 17, 2008 9:25 PM, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> There's the Bugzilla entry for it at
> http://bugzilla.kernel.org/show_bug.cgi?id=9973

Thank you.

> Please update it with the current information.

Crash for 2.6.25-rc2-mm1 added. That one had a complete stacktrace,
but the trace looks like others I already reported, so no real new
information... :-(

Torsten


Re: Linux 2.6.25-rc2

2008-02-16 Thread Torsten Kaiser
On Feb 15, 2008 10:23 PM, Linus Torvalds <[EMAIL PROTECTED]> wrote:
>
> Ok,
>  this kernel is a winner.

Sadly not for me:
[ 5282.056415] ------------[ cut here ]------------
[ 5282.059757] kernel BUG at lib/list_debug.c:33!
[ 5282.062055] invalid opcode:  [1] SMP
[ 5282.062055] CPU 3
[ 5282.062055] Modules linked in: radeon drm w83792d ipv6 tuner
tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx tea5761
tvaudio msp3400 bttv videodev v4l1_compat ir_common compat_ioctl32
v4l2_common videobuf_dma_sg videobuf_core btcx_risc tveeprom usbhid
pata_amd i2c_nforce2 hid sg
[ 5282.062055] Pid: 12937, comm: sed Not tainted 2.6.25-rc2 #1
[ 5282.062055] RIP: 0010:[<803bffe4>]
-> then the output from the serial console stopped. I was in X, so I
could not see whether there was anything more on the real console.

(gdb) list *0x803bffe4
0x803bffe4 is in __list_add (lib/list_debug.c:33).
28              }
29              if (unlikely(prev->next != next)) {
30                      printk(KERN_ERR "list_add corruption. prev->next should be "
31                              "next (%p), but was %p. (prev=%p).\n",
32                              next, prev->next, prev);
33                      BUG();
34              }
35              next->prev = new;
36              new->next = next;
37              new->prev = prev;
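
As an illustration of what the check at lib/list_debug.c:33 guards
against, here is a minimal userspace sketch of a doubly linked list insert
that validates both neighbours before linking. It is not the kernel
function: checked_list_add() is a hypothetical name, it returns false where
the kernel calls BUG(), and the first check (next->prev) is an assumption
based on the closing brace at line 28 of the listing, which is all the
listing shows of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Insert new between prev and next, but only if prev and next still
 * point at each other. Returns false (instead of BUG()) on corruption. */
bool checked_list_add(struct list_head *new,
                      struct list_head *prev,
                      struct list_head *next)
{
    assert(new != NULL && prev != NULL && next != NULL);

    if (next->prev != prev) {
        fprintf(stderr, "list_add corruption. next->prev should be "
                "prev (%p), but was %p. (next=%p).\n",
                (void *)prev, (void *)next->prev, (void *)next);
        return false;
    }
    if (prev->next != next) {
        fprintf(stderr, "list_add corruption. prev->next should be "
                "next (%p), but was %p. (prev=%p).\n",
                (void *)next, (void *)prev->next, (void *)prev);
        return false;
    }
    next->prev = new;
    new->next = next;
    new->prev = prev;
    prev->next = new;
    return true;
}
```

The check only detects that a neighbour's pointer was overwritten. It says
nothing about who overwrote it, which fits crashes that turn up in
unrelated subsystems.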

For more on this problem see http://marc.info/?l=linux-kernel&m=120293042005445

I will now try 2.6.25-rc2-mm1.

Torsten


Re: Linux 2.6.25-rc1

2008-02-13 Thread Torsten Kaiser
On Feb 11, 2008 11:15 PM, Andrew Morton <[EMAIL PROTECTED]> wrote:
> On Mon, 11 Feb 2008 22:46:18 +0100
> "Torsten Kaiser" <[EMAIL PROTECTED]> wrote:
>
> > On Feb 11, 2008 1:44 AM, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> > > So give it all a good testing.
> >
> > My mm-mystery-crash has now sneaked into mainline:
>
> hm, I don't remember that.

Last report: http://marc.info/?l=linux-kernel&m=120129854023202

> > [ 1463.829078] BUG: unable to handle kernel NULL pointer dereference
> > at 0378
> > [ 1463.832141] IP: [<8047af18>] ether1394_dg_complete+0x28/0xa0
> > [ 1463.834616] PGD 7955e067 PUD 7955d067 PMD 0
> > [ 1463.836148] Oops:  [1] SMP
> > [ 1463.836148] CPU 0
> > [ 1463.836148] Modules linked in: radeon drm w83792d ipv6 tuner
> > tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx tea5761
> > tvaudio msp3400 bttv videodev v4l1_compat ir_common compat_ioctl32
> > v4l2_common videobuf_dma_sg videobuf_core btcx_risc usbhid tveeprom sg
> > i2c_nforce2 hid pata_amd
> > [ 1463.836148] Pid: 519, comm: khpsbpkt Not tainted 2.6.25-rc1 #1
> > [ 1463.836148] RIP: 0010:[<8047af18>]  [<8047af18>]
> > ether1394_dg_complete+0x28/0xa0
> > [ 1463.836148] RSP: :81007eeb1e80  EFLAGS: 00010282
> > [ 1463.836148] RAX:  RBX:  RCX: 
> > 0001
> > [ 1463.836148] RDX: 81004bc62d80 RSI:  RDI: 
> > 810051873e40
> > [ 1463.836148] RBP: 81007eeb1eb0 R08:  R09: 
> > 0001
> > [ 1463.836148] R10: 0001 R11: 0001 R12: 
> > 810051873e40
> > [ 1463.836148] R13: 81007e1f7200 R14: 0001 R15: 
> > 810051873e40
> > [ 1463.836148] FS:  7f727d6d4700() GS:807e8000()
> > knlGS:
> > [ 1463.836148] CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
> > [ 1463.836148] CR2: 0378 CR3: 79559000 CR4: 
> > 06e0
> > [ 1463.836148] DR0:  DR1:  DR2: 
> > 
> > [ 1463.836148] DR3:  DR6: 0ff0 DR7: 
> > 0400
> > [ 1463.836148] Process khpsbpkt (pid: 519, threadinfo
> > 81007eeb, task 81007ee9e000)
> > [ 1463.836148] Stack:  81007eeb1e90 81004bc62b40
> > 810051873e40 
> > [ 1463.836148]  0001  81007eeb1ee0
> > 8047b233
> > [ 1463.836148]  81007eeb1ec8 81007eeb1ef0 8046c280
> > 81007ff6df10
> > [ 1463.836148] Call Trace:
> > [ 1463.836148]  [<8047b233>] ether1394_complete_cb+0xb3/0xd0
> > [ 1463.836148]  [<8046c280>] ? hpsbpkt_thread+0x0/0x140
> > [ 1463.836148]  [<8046c33b>] hpsbpkt_thread+0xbb/0x140
> > [ 1463.836148]  [<8024aead>] kthread+0x4d/0x80
> > [ 1463.836148]  [<8020c578>] child_rip+0xa/0x12
> > [ 1463.836148]  [<8020bc8f>] ? restore_args+0x0/0x31
> > [ 1463.836148]  [<8024ae60>] ? kthread+0x0/0x80
> > [ 1463.836148]  [<8020c56e>] ? child_rip+0x0/0x12
> > [ 1463.836148]
> > [ 1463.836148]
> > [ 1463.836148] Code: 00 00 00 55 48 89 e5 48 83 ec 30 48 89 5d d8 4c
> > 89 75 f0 89 f3 4c 89 7d f8 4c 89 65 e0 49 89 ff 4c 89 6d e8 4c 8b 2f
> > 49 8b 45 20 <4c> 8b a0 78 03 00 00 4d 8d b4 24 d0 00 00 00 4c 89 f7 e8
> > 41 f0
> > [ 1463.836148] RIP  [<8047af18>] ether1394_dg_complete+0x28/0xa0
> > [ 1463.836148]  RSP <81007eeb1e80>
> > [ 1463.836148] CR2: 0378
> > [ 1463.836208] ohci1394: fw-host0: Waking dma ctx=0 ... processing is
> > probably too slow
> > [ 1463.839250] BUG: unable to handle kernel NULL pointer dereference
> > at 
> > [ 1463.841549] IP: [<80296d1d>] kmem_cache_alloc_node+0x6d/0xa0
> > [ 1463.842925] PGD 7955e067 PUD 7955d067 PMD 0
> > [ 1463.846148] Oops:  [2] SMP
> > [ 1463.846148] CPU 0
> > [ 1463.846148] Modules linked in: radeon drm w83792d ipv6 tuner
> > tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx tea5761
> > tvaudio msp3400 bttv videodev v4l1_compat ir_common compat_ioctl32
> > v4l2_common videobuf_dma_sg videobuf_core btcx_risc usbhid tveeprom sg
> > i2c_nforce2 hid pata_amd
> > [ 1463.846148] Pid: 519, comm: khpsbpkt Tainted: G  D  2.6.25-rc1 #1
> > [ 1463.846148] RIP: 0010:[<80296d1d>]  [<80296d1d>]
> > kmem_cache_alloc_node+0x6d/0xa0
> > [ 1463.846148] RSP: :80871ae0  EFLAGS: 00010046
> > [ 1463.846148] RAX:  RBX: 810001006820 RCX: 
> > 8052c549
> > [ 1463.846148] RDX:  RSI:  RDI: 
> > 807fbec0
> > [ 1463.846148] RBP: 80871b00 R08: 05e0 R09: 
> > ffc1
> > [ 1463.846148] R10: 0001 R11:  R12:
> >
> > [ 1463.846148] R13: 0020 R14: 0020 R15:
> > 807fbec0
> > -> here the output from the serial console stopped.
[snip]
> But this is a crash inside the 1394 code.  So if you're getting a crash
> with plain-old-ethernet then it is a different crash.  It'd be good if we
> could see the oops trace for that one too please.

2.6.24-rc3-mm2:
http://marc.info/?l=linux-kernel&m=119636996902805

Re: Linux 2.6.25-rc1

2008-02-11 Thread Torsten Kaiser
On Feb 11, 2008 1:44 AM, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> So give it all a good testing.

My mm-mystery-crash has now sneaked into mainline:
[ 1463.829078] BUG: unable to handle kernel NULL pointer dereference
at 0378
[ 1463.832141] IP: [<8047af18>] ether1394_dg_complete+0x28/0xa0
[ 1463.834616] PGD 7955e067 PUD 7955d067 PMD 0
[ 1463.836148] Oops:  [1] SMP
[ 1463.836148] CPU 0
[ 1463.836148] Modules linked in: radeon drm w83792d ipv6 tuner
tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx tea5761
tvaudio msp3400 bttv videodev v4l1_compat ir_common compat_ioctl32
v4l2_common videobuf_dma_sg videobuf_core btcx_risc usbhid tveeprom sg
i2c_nforce2 hid pata_amd
[ 1463.836148] Pid: 519, comm: khpsbpkt Not tainted 2.6.25-rc1 #1
[ 1463.836148] RIP: 0010:[<8047af18>]  [<8047af18>]
ether1394_dg_complete+0x28/0xa0
[ 1463.836148] RSP: :81007eeb1e80  EFLAGS: 00010282
[ 1463.836148] RAX:  RBX:  RCX: 0001
[ 1463.836148] RDX: 81004bc62d80 RSI:  RDI: 810051873e40
[ 1463.836148] RBP: 81007eeb1eb0 R08:  R09: 0001
[ 1463.836148] R10: 0001 R11: 0001 R12: 810051873e40
[ 1463.836148] R13: 81007e1f7200 R14: 0001 R15: 810051873e40
[ 1463.836148] FS:  7f727d6d4700() GS:807e8000()
knlGS:
[ 1463.836148] CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
[ 1463.836148] CR2: 0378 CR3: 79559000 CR4: 06e0
[ 1463.836148] DR0:  DR1:  DR2: 
[ 1463.836148] DR3:  DR6: 0ff0 DR7: 0400
[ 1463.836148] Process khpsbpkt (pid: 519, threadinfo
81007eeb, task 81007ee9e000)
[ 1463.836148] Stack:  81007eeb1e90 81004bc62b40
810051873e40 
[ 1463.836148]  0001  81007eeb1ee0
8047b233
[ 1463.836148]  81007eeb1ec8 81007eeb1ef0 8046c280
81007ff6df10
[ 1463.836148] Call Trace:
[ 1463.836148]  [<8047b233>] ether1394_complete_cb+0xb3/0xd0
[ 1463.836148]  [<8046c280>] ? hpsbpkt_thread+0x0/0x140
[ 1463.836148]  [<8046c33b>] hpsbpkt_thread+0xbb/0x140
[ 1463.836148]  [<8024aead>] kthread+0x4d/0x80
[ 1463.836148]  [<8020c578>] child_rip+0xa/0x12
[ 1463.836148]  [<8020bc8f>] ? restore_args+0x0/0x31
[ 1463.836148]  [<8024ae60>] ? kthread+0x0/0x80
[ 1463.836148]  [<8020c56e>] ? child_rip+0x0/0x12
[ 1463.836148]
[ 1463.836148]
[ 1463.836148] Code: 00 00 00 55 48 89 e5 48 83 ec 30 48 89 5d d8 4c
89 75 f0 89 f3 4c 89 7d f8 4c 89 65 e0 49 89 ff 4c 89 6d e8 4c 8b 2f
49 8b 45 20 <4c> 8b a0 78 03 00 00 4d 8d b4 24 d0 00 00 00 4c 89 f7 e8
41 f0
[ 1463.836148] RIP  [<8047af18>] ether1394_dg_complete+0x28/0xa0
[ 1463.836148]  RSP <81007eeb1e80>
[ 1463.836148] CR2: 0378
[ 1463.836208] ohci1394: fw-host0: Waking dma ctx=0 ... processing is
probably too slow
[ 1463.839250] BUG: unable to handle kernel NULL pointer dereference
at 
[ 1463.841549] IP: [<80296d1d>] kmem_cache_alloc_node+0x6d/0xa0
[ 1463.842925] PGD 7955e067 PUD 7955d067 PMD 0
[ 1463.846148] Oops:  [2] SMP
[ 1463.846148] CPU 0
[ 1463.846148] Modules linked in: radeon drm w83792d ipv6 tuner
tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx tea5761
tvaudio msp3400 bttv videodev v4l1_compat ir_common compat_ioctl32
v4l2_common videobuf_dma_sg videobuf_core btcx_risc usbhid tveeprom sg
i2c_nforce2 hid pata_amd
[ 1463.846148] Pid: 519, comm: khpsbpkt Tainted: G  D  2.6.25-rc1 #1
[ 1463.846148] RIP: 0010:[<80296d1d>]  [<80296d1d>]
kmem_cache_alloc_node+0x6d/0xa0
[ 1463.846148] RSP: :80871ae0  EFLAGS: 00010046
[ 1463.846148] RAX:  RBX: 810001006820 RCX: 8052c549
[ 1463.846148] RDX:  RSI:  RDI: 807fbec0
[ 1463.846148] RBP: 80871b00 R08: 05e0 R09: ffc1
[ 1463.846148] R10: 0001 R11:  R12: 
[ 1463.846148] R13: 0020 R14: 0020 R15: 807fbec0
-> here the output from the serial console stopped.
Caps lock and Scroll lock were flashing again, and as it hit a 'good'
spot during the installation of a package, this crash resulted in a
corrupted ld.so.cache and damaged several housekeeping files of the
package manager. :-(

The last good mm was 2.6.24-rc2-mm1; the next mm that booted was
2.6.24-rc3-mm2, and that version had these "random" crashes.
The last good mainline was 2.6.24-rc7, which I tested with the new
iommu patches that were added to 2.6.24-rc3-mm2.

I did a partial bisect of 2.6.24-rc6-mm1 that narrowed it down to this range:
2.6.24-rc6 + mm-patches up to (including) git.nfsd -> worked
2.6.24-rc6 + mm-patches up to (including) git.xfs -> crashed
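
The narrowing described above is a bisection over an ordered patch series:
apply only the first N patches, test, and halve the range that still
contains the culprit. A minimal sketch of that search follows. It assumes
the crash, once introduced, stays present for every longer prefix of the
series. The names first_bad_patch() and prefix_contains_culprit() are
hypothetical, and the callback stands in for the real test, which in this
thread means building and booting a kernel and compiling on the order of a
hundred packages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test callback: returns nonzero if a kernel built with the
 * first applied patches of the series crashes. */
typedef int (*crashes_fn)(size_t applied, void *ctx);

/* Return the 0-based index of the first patch that introduces the crash.
 * Precondition: applying 0 patches works and applying all n crashes. */
size_t first_bad_patch(size_t n, crashes_fn crashes, void *ctx)
{
    size_t good = 0;    /* longest prefix known to work */
    size_t bad = n;     /* shortest prefix known to crash */

    assert(n > 0);
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        if (crashes(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad - 1;
}

/* Example callback for trying the search out: ctx points at the index of
 * the culprit, and a prefix crashes once it includes that patch. */
int prefix_contains_culprit(size_t applied, void *ctx)
{
    return applied > *(const size_t *)ctx;
}
```

With an intermittent crash a "worked" result is only probabilistic, so a
real run would repeat the test before moving the good boundary.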

I think the only patches added between rc2-mm1 and rc3-mm2 in that range
were the iommu changes that I later ruled out.
That leaves some git trees as suspects:
git-ocfs2.patch
git-selinux.patch
git-s390.patch
git-sched.patch
git-sh.patch
git-scsi-misc.patch
git-unionfs.patch
git-v9fs.patch

Re: 2.6.24-rc8-mm1

2008-01-25 Thread Torsten Kaiser
On Jan 17, 2008 11:35 AM, Andrew Morton <[EMAIL PROTECTED]> wrote:
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc8/2.6.24-rc8-mm1/

I'm still seeing my mystery-crash that I had since 2.6.24-rc3-mm2.

The crashed kernel was 2.6.24-rc8-mm1 with the following patches:

* personal fix for the "do_md_run returned -22"-problem
I'm just moving the analyze_sbs(mddev); above the test.

* git-sched-fix-bug_on.patch
* hotfix-libata-scsi-corruption.patch

The crash (captured via serial console):
Jan 25 21:40:01 treogen cron[6553]: (root) CMD (test -x
/usr/sbin/run-crons && /usr/sbin/run-crons )
Jan 25 20:40:44 treogen syslog-ng[4839]: I/O error occurred while
writing; fd='5', error='Input/output error (5)'
[ 1242.319555] ------------[ cut here ]------------
[ 1242.319557] kernel BUG at lib/list_debug.c:33!
[ 1242.319558] invalid opcode:  [1] SMP
[ 1242.319560] last sysfs file:
/sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[ 1242.319562] CPU 3
[ 1242.319563] Modules linked in:

The cursor on the receiving machine stayed after the : in the last
line, the crashed machine blinked caps lock and scroll lock.

I don't have a clue what the syslog-ng error is about or why this line
is one hour too early.
At 20:40 this kernel wasn't even built yet, and syslog-ng started with
the correct timezone:
Jan 25 21:26:26 treogen syslog-ng[4839]: syslog-ng starting up; version='2.0.6'


As I'm seeing this bug during times of both network and hard disk
activity, could this be related to the problem discussed in the thread
"[PATCH rc8-mm1] hotfix libata-scsi corruption"? The line fixed in the
mm-hotfix seems to be too new to cause this in -rc3-mm2, but these
alignment problems seem to touch more than this, and I'm not clear on
how old this might be.

(If this matters: The crashing system is running the smartd daemon
from smartmontools version 5.37)


I hope I will have time to try git-misc-tree on Sunday...

Torsten


Re: 2.6.24-rc6-mm1

2008-01-25 Thread Torsten Kaiser
Sorry for the *really* late answer, but I did not have any time to do
linux things the last weeks. :-(

On Jan 7, 2008 7:16 AM, FUJITA Tomonori <[EMAIL PROTECTED]> wrote:
> On Sun, 6 Jan 2008 21:03:42 +0100
> "Torsten Kaiser" <[EMAIL PROTECTED]> wrote:
> > On Jan 6, 2008 2:33 PM, FUJITA Tomonori <[EMAIL PROTECTED]> wrote:
> > > On Sun, 6 Jan 2008 12:35:35 +0100
> > > "Torsten Kaiser" <[EMAIL PROTECTED]> wrote:
> > > > On Jan 6, 2008 12:23 PM, FUJITA Tomonori <[EMAIL PROTECTED]> wrote:
> > > > And double using something does fit with the errors I'm seeing...
> > > >
> > > > > Can you try the patch to revert my IOMMU changes?
> > > > >
> > > > > http://www.mail-archive.com/[EMAIL PROTECTED]/msg12694.html

-> This is the revert-patch I'm talking about later

> > > > Testing for this bug is a little bit slow, as I'm compiling ~100
> > > > packages trying to trigger it.
> > > > If my current testrun with the patch from
> > > > http://www.mail-archive.com/[EMAIL PROTECTED]/msg12702.html
> > > > crashes, I will revert all the IOMMU changes with the above patch and
> > > > try again.
> > >
> > > Thanks for testing,
> >
> > OK, I'm still testing this, but after 95 completed packages I'm rather
> > certain that reverting the IOMMU changes with this patch fixes my
> > problem.
> > I didn't have time to look more into this, so I can't offer any
> > concrete ideas where the bug is.

Up to my last mail on 7 Jan it was true that I was not able to crash
2.6.24-rc6-mm1 with the above patch.
But after I had tested 2.6.24-rc7 with only the IOMMU changes applied,
2.6.24-rc6-mm1 with the revert patch did crash once again.

After looking at the patch that seems rather expected as it only
touches powerpc code.
(I only looked at its diffstat after testing it, so I was not aware of
that fact during testing)

> > If you send more patches, I'm willing to test them, but it might take
> > some more time during the next week.
>
> Can you try 2.6.24-rc7 + the IOMMU changes?
>
> The patches are available at:
>
> http://www.kernel.org/pub/linux/kernel/people/tomo/iommu/
>
> Or if you prefer the git tree:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-misc.git 
> iommu-sg-fixes
>
>
>
> I've looked at the changes to GART but they are straightforward and
> don't look wrong...

The resulting 2.6.24-rc7 kernel worked for me. I compiled 146 packages
without a crash.

Today I finally had some time for debugging again and tried the new
2.6.24-rc8-mm1.
The crash is still there; I will report that crash in the current thread.

Torsten


Re: 2.6.24-rc8-mm1

2008-01-25 Thread Torsten Kaiser
On Jan 17, 2008 11:35 AM, Andrew Morton [EMAIL PROTECTED] wrote:

 ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc8/2.6.24-rc8-mm1/

I'm still seeing my mystery-crash that I had since 2.6.24-rc3-mm2.

The crashed kernel was 2.6.24-rc8-mm1 with the following patches:

* personal fix for the do_md_run returned -22-problem
I'm just moving the analyze_sbs(mddev); above the test.

* git-sched-fix-bug_on.patch
* hotfix-libata-scsi-corruption.patch

The crash (captured via serial console):
Jan 25 21:40:01 treogen cron[6553]: (root) CMD (test -x
/usr/sbin/run-crons  /usr/sbin/run-crons )
Jan 25 20:40:44 treogen syslog-ng[4839]: I/O error occurred while
writing; fd='5', error='Input/output error (5)'
[ 1242.319555] [ cut here ]
[ 1242.319557] kernel BUG at lib/list_debug.c:33!
[ 1242.319558] invalid opcode:  [1] SMP
[ 1242.319560] last sysfs file:
/sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[ 1242.319562] CPU 3
[ 1242.319563] Modules linked in:

The cursor on the receiving machine stayed after the : in the last
line, the crashed machine blinked caps lock and scroll lock.

I don't have a clue what the syslog-ng error is about or why this line
is one hour to early.
At 20:40 this kernel wasn't even build yet and syslog-ng started with
the correct timezone:
Jan 25 21:26:26 treogen syslog-ng[4839]: syslog-ng starting up; version='2.0.6'


As I'm seeing this bug during times of both network and hard disk
activity, could this be related to the problem discussed in the thread
[PATCH rc8-mm1] hotfix libata-scsi corruption? The line fixed in the
mm-hotfix seems to be to new to cause this in -rc3-mm2, but these
alignment problems seem to touch more than this and I'm not clear one
how old this might be.

(If this matters: The crashing system is running the smartd daemon
from smartmontools version 5.37)


I hope I will have time to try git-misc-tree on sunday...

Torsten
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.24-rc6-mm1

2008-01-25 Thread Torsten Kaiser
Sorry for the *really* late answer, but I did not have any time to do
linux things the last weeks. :-(

On Jan 7, 2008 7:16 AM, FUJITA Tomonori [EMAIL PROTECTED] wrote:
 On Sun, 6 Jan 2008 21:03:42 +0100
 Torsten Kaiser [EMAIL PROTECTED] wrote:
  On Jan 6, 2008 2:33 PM, FUJITA Tomonori [EMAIL PROTECTED] wrote:
   On Sun, 6 Jan 2008 12:35:35 +0100
   Torsten Kaiser [EMAIL PROTECTED] wrote:
On Jan 6, 2008 12:23 PM, FUJITA Tomonori [EMAIL PROTECTED] wrote:
And double using something does fit with the errors I'm seeing...
   
 Can you try the patch to revert my IOMMU changes?

 http://www.mail-archive.com/[EMAIL PROTECTED]/msg12694.html

- This is the revert-patch I'm talking about later

Testing for this bug is a little bit slow, as I'm compiling ~100
packages trying to trigger it.
If my current testrun with the patch from
http://www.mail-archive.com/[EMAIL PROTECTED]/msg12702.html
crashes, I will revert the hole IOMMU changes with above patch and try 
again.
  
   Thanks for testing,
 
  OK, I'm still testing this, but after 95 completed packages I'm rather
  certain that reverting the IOMMU changes with this patch fixes my
  problem.
  I didn't have time to look more into this, so I can't offer any
  concrete ideas where the bug is.

Until my last mail from 7. Jan this was true, that I was not able to
crash 2.6.24-rc6-mm1 with above patch.
But after testing 2.6.24-rc7 with only the IOMMU changes applied it
did crash once again.

After looking at the patch that seems rather expected as it only
touches powerpc code.
(I only looked at its diffstat after testing it, so I was not aware of
that fact during testing)

> > If you send more patches, I'm willing to test them, but it might take
> > some more time during the next week.
>
> Can you try 2.6.24-rc7 + the IOMMU changes?
>
> The patches are available at:
>
> http://www.kernel.org/pub/linux/kernel/people/tomo/iommu/
>
> Or if you prefer the git tree:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-misc.git
> iommu-sg-fixes
>
> I've looked at the changes to GART but they are straightforward and
> don't look wrong...

The resulting 2.6.24-rc7 kernel worked for me. I compiled 146 packages
without a crash.

Today I finally had some time for debugging again and tried the new
2.6.24-rc8-mm1.
The crash is still there, I will report that crash in current thread.

Torsten


Re: 2.6.24-rc6-mm1

2008-01-06 Thread Torsten Kaiser
On Jan 6, 2008 2:33 PM, FUJITA Tomonori <[EMAIL PROTECTED]> wrote:
> On Sun, 6 Jan 2008 12:35:35 +0100
> "Torsten Kaiser" <[EMAIL PROTECTED]> wrote:
> > On Jan 6, 2008 12:23 PM, FUJITA Tomonori <[EMAIL PROTECTED]> wrote:
> > > On Sun, 6 Jan 2008 11:41:10 +0100
> > > "Torsten Kaiser" <[EMAIL PROTECTED]> wrote:
> > > > I will apply your patch and see if this hunk from
> > > > find_next_zero_area() makes a difference:
> > > >
> > > >         end = index + nr;
> > > > -       if (end > size)
> > > > +       if (end >= size)
> > > >                 return -1;

-> that might still have made a difference, but ...

> > > > -       for (i = index + 1; i < end; i++) {
> > > > +       for (i = index; i < end; i++) {

... as you say below, the test for the index position is only needed
if index is modified after find_next_zero_bit().

> > > >                 if (test_bit(i, map)) {
> > >
> > > The patch should not make a difference for X86_64.
> >
> > Hmm...
> > arch/x86/kernel/pci-gart_64.c:
> > alloc_iommu() calls iommu_area_alloc()
> > lib/iommu-helper.c:
> > iommu_area_alloc() calls find_next_zero_area()
> > -> so the above code should be called even on X86_64
>
> Oops, I meant that the patch fixes the align allocation (non zero
> align_mask case). X86_64 doesn't use the align allocation.
>
>
> > And the change in the for loop means that 'index' will now be tested,
> > but with the old code it was not.
>
> With the old code, 'index' is tested by find_next_zero_bit.
>
> With the new code and non zero align_mask case, 'index' is not tested
> by find_next_zero_bit. So test_bit needs to start with 'index'.
>
> So if I understand correctly, this patch should not make a
> difference for x86_64, though I might be missing something.

You did not miss anything.
After 18 packages my system crashed again.
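
The point about where the loop has to start can be illustrated with a small
userspace sketch. This is not the kernel code: test_bit() is a naive stand-in
for the kernel helper, range_is_clear() and check_aligned_scan() are
hypothetical names, and the bitmap contents are made up. It only shows that
once index is rounded up after find_next_zero_bit(), the bit at index is no
longer known to be free, so a scan starting at index + 1 can miss it.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Naive userspace stand-in for the kernel's test_bit(). */
static int test_bit(unsigned long nr, const unsigned long *map)
{
        return (map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}

/* Hypothetical helper: returns 1 if no bit in [first, end) is set. */
static int range_is_clear(const unsigned long *map,
                          unsigned long first, unsigned long end)
{
        unsigned long i;

        for (i = first; i < end; i++)
                if (test_bit(i, map))
                        return 0;
        return 1;
}

static void check_aligned_scan(void)
{
        /* made-up bitmap: bits 0-4 and bit 8 in use, bit 5 is the first free one */
        unsigned long map[1] = { 0x11fUL };
        unsigned long index = 5;        /* what find_next_zero_bit() would report */
        unsigned long align_mask = 3;   /* non-zero mask, i.e. not the x86_64 GART case */
        unsigned int nr = 2;

        index = (index + align_mask) & ~align_mask;     /* 5 is rounded up to 8 */
        assert(index == 8);

        /* old loop bounds (index + 1 .. end): the busy bit 8 goes unnoticed */
        assert(range_is_clear(map, index + 1, index + nr) == 1);
        /* new loop bounds (index .. end): the busy bit 8 is detected */
        assert(range_is_clear(map, index, index + nr) == 0);
}
```

With align_mask == 0 the rounding step leaves index unchanged, which is why
the loop change is not expected to matter when no alignment is requested.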

> > And double using something does fit with the errors I'm seeing...
> >
> > > Can you try the patch to revert my IOMMU changes?
> > >
> > > http://www.mail-archive.com/[EMAIL PROTECTED]/msg12694.html
> >
> > Testing for this bug is a little bit slow, as I'm compiling ~100
> > packages trying to trigger it.
> > If my current testrun with the patch from
> > http://www.mail-archive.com/[EMAIL PROTECTED]/msg12702.html
> > > crashes, I will revert the whole IOMMU changes with above patch and try
> > > again.
>
> Thanks for testing,

OK, I'm still testing this, but after 95 completed packages I'm rather
certain that reverting the IOMMU changes with this patch fixes my
problem.
I didn't have time to look more into this, so I can't offer any
concrete ideas where the bug is.

If you send more patches, I'm willing to test them, but it might take
some more time during the next week.

Thanks for looking into this.

Torsten


Re: 2.6.24-rc6-mm1

2008-01-06 Thread Torsten Kaiser
On Jan 6, 2008 12:23 PM, FUJITA Tomonori <[EMAIL PROTECTED]> wrote:
> On Sun, 6 Jan 2008 11:41:10 +0100
> "Torsten Kaiser" <[EMAIL PROTECTED]> wrote:
> > I will apply your patch and see if this hunk from
> > find_next_zero_area() makes a difference:
> >
> >         end = index + nr;
> > -       if (end > size)
> > +       if (end >= size)
> >                 return -1;
> > -       for (i = index + 1; i < end; i++) {
> > +       for (i = index; i < end; i++) {
> >                 if (test_bit(i, map)) {
>
> The patch should not make a difference for X86_64.

Hmm...
arch/x86/kernel/pci-gart_64.c:
alloc_iommu() calls iommu_area_alloc()
lib/iommu-helper.c:
iommu_area_alloc() calls find_next_zero_area()
-> so the above code should be called even on X86_64

And the change in the for loop means that 'index' will now be tested,
but with the old code it was not.

And double using something does fit with the errors I'm seeing...

> Can you try the patch to revert my IOMMU changes?
>
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg12694.html

Testing for this bug is a little bit slow, as I'm compiling ~100
packages trying to trigger it.
If my current testrun with the patch from
http://www.mail-archive.com/[EMAIL PROTECTED]/msg12702.html
crashes, I will revert the whole IOMMU changes with above patch and try again.

Torsten


Re: 2.6.24-rc6-mm1

2008-01-06 Thread Torsten Kaiser
On Jan 6, 2008 4:28 AM, FUJITA Tomonori <[EMAIL PROTECTED]> wrote:
> On Sat, 5 Jan 2008 17:25:24 -0800
> Andrew Morton <[EMAIL PROTECTED]> wrote:
> > On Sat, 5 Jan 2008 23:10:17 +0100 "Torsten Kaiser" <[EMAIL PROTECTED]> 
> > wrote:
> > > But the cause of my mail is the following question:
> > > Regarding my "iommu-sg-merging-patches are new in -rc3-mm and could be
> > > the cause"-suspicion I looked at these patches and came across these
> > > hunks:
> > >
> > > This is removed from arch/x86/lib/bitstr_64.c:
> > > -/* Find string of zero bits in a bitmap */
> > > -unsigned long
> > > -find_next_zero_string(unsigned long *bitmap, long start, long nbits, int len)
> > > -{
> > > -       unsigned long n, end, i;
> > > -
> > > - again:
> > > -       n = find_next_zero_bit(bitmap, nbits, start);
> > > -       if (n == -1)
> > > -               return -1;
> > > -
> > > -       /* could test bitsliced, but it's hardly worth it */
> > > -       end = n+len;
> > > -       if (end > nbits)
> > > -               return -1;
> > > -       for (i = n+1; i < end; i++) {
> > > -               if (test_bit(i, bitmap)) {
> > > -                       start = i+1;
> > > -                       goto again;
> > > -               }
> > > -       }
> > > -       return n;
> > > -}
> > >
> > > This is added to lib/iommu-helper.c:
> > > +static unsigned long find_next_zero_area(unsigned long *map,
> > > +                                         unsigned long size,
> > > +                                         unsigned long start,
> > > +                                         unsigned int nr)
> > > +{
> > > +       unsigned long index, end, i;
> > > +again:
> > > +       index = find_next_zero_bit(map, size, start);
> > > +       end = index + nr;
> > > +       if (end > size)
> > > +               return -1;
> > > +       for (i = index + 1; i < end; i++) {
> > > +               if (test_bit(i, map)) {
> > > +                       start = i+1;
> > > +                       goto again;
> > > +               }
> > > +       }
> > > +       return index;
> > > +}
> > >
> > > The old version checks, if find_next_zero_bit returns -1, the new
> > > version doesn't do this.
> > > Is this intended and can find_next_zero_bit never fail?
> > > Hmm... but in the worst case it should only loop forever if I'm
> > > reading this right (index = -1 => for-loop counts from 0 to nr, if any
> > > bit is set it will jump to "again:" and if the next call to
> > > find_next_zero_bit also fails, it's an endless loop)
>
> find_next_zero_bit returns -1?
>
> It seems that x86_64 doesn't.

I'm sorry. I didn't look into find_next_zero_bit, I only noted that
the old version did check for -1 and the new one didn't.
Obviously the old check was superfluous.

> POWER and SPARC64 IOMMUs use
> find_next_zero_bit too, but neither checks whether find_next_zero_bit
> returns -1. If find_next_zero_bit fails, it returns size. So it
> doesn't lead to an endless loop.

Yes, this can't happen. It was a wrong assumption on my part.
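
To make the argument concrete, here is a minimal userspace sketch. The
find_next_zero_bit() and test_bit() below are naive stand-ins written for
this example (the real kernel versions are optimized per architecture); the
only property taken from the discussion above is that a failed search returns
size. find_next_zero_area() follows the hunk quoted above, and
check_find_next_zero_area() is a hypothetical self-check with made-up bitmaps.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Naive userspace stand-in for the kernel's test_bit(). */
static int test_bit(unsigned long nr, const unsigned long *map)
{
        return (map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}

/* Naive stand-in: returns 'size' when no zero bit is found at or after 'start'. */
static unsigned long find_next_zero_bit(const unsigned long *map,
                                        unsigned long size,
                                        unsigned long start)
{
        unsigned long i;

        for (i = start; i < size; i++)
                if (!test_bit(i, map))
                        return i;
        return size;
}

/* Same logic as the find_next_zero_area() hunk quoted above. */
static unsigned long find_next_zero_area(const unsigned long *map,
                                         unsigned long size,
                                         unsigned long start,
                                         unsigned int nr)
{
        unsigned long index, end, i;
again:
        index = find_next_zero_bit(map, size, start);
        end = index + nr;
        if (end > size)         /* also true when index == size and nr >= 1 */
                return -1;
        for (i = index + 1; i < end; i++) {
                if (test_bit(i, map)) {
                        start = i + 1;
                        goto again;
                }
        }
        return index;
}

static void check_find_next_zero_area(void)
{
        unsigned long full[1] = { ~0UL };       /* every bit in use */
        unsigned long part[1] = { 0x0bUL };     /* bits 0, 1 and 3 in use */

        /* no free bit: the search reports 'size', so the area lookup gives up */
        assert(find_next_zero_bit(full, 16, 0) == 16);
        assert(find_next_zero_area(full, 16, 0, 1) == (unsigned long)-1);
        /* bit 2 is free but bit 3 is not, so the first 2-bit area starts at 4 */
        assert(find_next_zero_area(part, 16, 0, 2) == 4);
}
```

In this model a failed search always ends in the "end > size" test for any
request of at least one bit, so the "goto again" path cannot repeat forever.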

> But this patch has other bugs that break POWER IOMMUs.
>
> If you use the IOMMUs on POWER, please try the following patch:

I'm using CONFIG_GART_IOMMU=y on x86_64.

> http://www.mail-archive.com/[EMAIL PROTECTED]/msg12702.html

I also noted the line "index = (index + align_mask) & ~align_mask;" in
iommu_area_alloc() and didn't understand what this was trying to do
and how this should work, but as arch/x86/kernel/pci-gart_64.c always
uses 0 as align_mask I just ignored it.
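
For reference, that line is the usual round-up-to-a-power-of-two idiom. The
sketch below assumes that align_mask is "alignment minus one" with the
alignment being a power of two; align_up() and check_align_up() are
hypothetical names used only for this illustration and do not exist in
lib/iommu-helper.c.

```c
#include <assert.h>

/*
 * Round index up to the next multiple of (align_mask + 1).
 * Assumption: align_mask + 1 is a power of two.
 */
static unsigned long align_up(unsigned long index, unsigned long align_mask)
{
        return (index + align_mask) & ~align_mask;
}

static void check_align_up(void)
{
        assert(align_up(5, 0) == 5);    /* mask 0: index is left unchanged */
        assert(align_up(5, 3) == 8);    /* next multiple of 4 */
        assert(align_up(8, 3) == 8);    /* already aligned */
        assert(align_up(9, 7) == 16);   /* next multiple of 8 */
}
```

With a mask of 0 the expression is a no-op, which matches the observation
that the GART code path is not affected by it.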

I will apply your patch and see if this hunk from
find_next_zero_area() makes a difference:

        end = index + nr;
-       if (end > size)
+       if (end >= size)
                return -1;
-       for (i = index + 1; i < end; i++) {
+       for (i = index; i < end; i++) {
                if (test_bit(i, map)) {

Torsten


Re: 2.6.24-rc6-mm1

2008-01-06 Thread Torsten Kaiser
On Jan 6, 2008 9:27 AM, Jarek Poplawski <[EMAIL PROTECTED]> wrote:
> On Sat, Jan 05, 2008 at 03:52:32PM +0100, Torsten Kaiser wrote:
> ...
> > So my personal conclusion would be, that someone is writing to memory
> > that he no longer owns. Most probably 0-bytes. (the complete_routine
> > got NULLed and the warning about dst->__refcnt being 0).
> >
> > Use-after-free or something else?
>
> I agree: your conclusion seems to be the most probable explanation for
> this. Then it could be really hard to solve this without bisection or
> something similar. But there is some probability this something could
> try kfree later too, but simply this list debugging triggers earlier.

For example, in the case where it dies in the ieee1394 thread, the list is
so corrupted that it will die anyway.

But I might try this anyway, as I don't really have a better idea.

> > > > If you think some other slub_debug might catch it, I would try this...
>
> You can try to add "U" to these other slub_debug options. As a matter
> of fact, if your above diagnose is right, it seems you risk to damage
> your system or even the box with these tests, so if you want to
> continue, you should probably turn any possible debugging on (not in
> mm only).

I did not add U, because I thought that would only be needed to trace
memory leaks.
And I hoped that using P (poison) would catch any later use (after free).

> BTW, you've written that some debugging options seem to delay the bug.
> Since they often change sizes of some structures than such wrong
> writes could have some 'safer' offsets. So, this could really delay
> e.g. these list's bugs, but maybe this could also let to stay 'alive'
> to such wrong kfree?

I think this bug is highly timing dependent. It's not always the same
package that dies, and as this is an SMP system I would guess two CPUs
using the same data will trigger this.
And using the poison option will definitely slow the system down and
mess up the timings.

What also speaks against the 'safer' offsets is that after adding my
notfreed-byte to skbuff the bug still triggered in the same way.

I'm currently looking at
http://www.mail-archive.com/[EMAIL PROTECTED]/msg12702.html
and trying to understand whether this is relevant for me on x86_64.

Torsten


Re: 2.6.24-rc6-mm1

2008-01-05 Thread Torsten Kaiser
On Jan 5, 2008 11:10 PM, Torsten Kaiser <[EMAIL PROTECTED]> wrote:
> 2.6.24-rc6 + mm-patches up to git.battery (includes git-net and
> git-netdev-all) worked for 110 packages, then I proclaimed it good.
> 2.6.24-rc6 + mm-patches up to (including) git.nfsd is currently
> getting tested (9 packages done...)

That kernel also worked for all 110 packages.

2.6.24-rc6 + mm-patches up to (including) git.xfs -> crash

[  576.899332] [ cut here ]
[  576.903661] kernel BUG at lib/list_debug.c:33!
[  576.903661] invalid opcode:  [1] SMP
[  576.903661] last sysfs file:
/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[  576.903661] CPU 3
[  576.903661] Modules linked in: radeon drm w83792d ipv6 tuner
tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx tea5761
tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
videobuf_core btcx_risc tveeprom videodev v4l2_common usbhid
v4l1_compat sg hid i2c_nforce2 pata_amd
[  576.903661] Pid: 5559, comm: nfsv4-svc Not tainted 2.6.24-rc6-mm-git.xfs #2
[  576.903661] RIP: 0010:[]  []
__list_add+0x54/0x60
[  576.903661] RSP: 0018:81007d4e1dc0  EFLAGS: 00010282
[  576.903661] RAX: 0088 RBX: 81007e955800 RCX: fc6c7900
[  576.903661] RDX: 81007d53eef0 RSI: 0001 RDI: 80760140
[  576.903661] RBP: 81007d4e1dc0 R08: 0001 R09: 
[  576.903661] R10: 810080062008 R11: 0001 R12: 81007ed00900
[  576.903661] R13: 81007ed00938 R14: 81007ed00938 R15: 81007dd6f100
[  576.903661] FS:  7f1b7e6a36f0() GS:81011ff1b780()
knlGS:
[  576.903661] CS:  0010 DS:  ES:  CR0: 8005003b
[  576.903661] CR2: 7ffb28c2c000 CR3: 741ab000 CR4: 06e0
[  576.903661] DR0:  DR1:  DR2: 
[  576.903661] DR3:  DR6: 0ff0 DR7: 0400
[  576.903661] Process nfsv4-svc (pid: 5559, threadinfo
81007d4e, task 81007d53eef0)
[  576.903661] Stack:  81007d4e1e00 805c4dbb
81007ed00908 81007dd6f100
[  576.903661]  81011ad7bc00 81007d458000 81007e955800
81007dd6f110
[  576.903661]  81007d4e1e10 805c4ea7 81007d4e1ee0
805c5fd4
[  576.903661] Call Trace:
[  576.903661]  [] svc_xprt_enqueue+0x1ab/0x240
[  576.903661]  [] svc_xprt_received+0x17/0x20
[  576.903661]  [] svc_recv+0x394/0x7c0
[  576.903661]  [] svc_send+0xae/0xd0
[  576.903661]  [] default_wake_function+0x0/0x10
[  576.903661]  [] nfs_callback_svc+0x79/0x130
[  576.903662]  [] finish_task_switch+0xcc/0xe0
[  576.903662]  [] child_rip+0xa/0x12
[  576.903662]  [] restore_args+0x0/0x30
[  576.903662]  [] __svc_create_thread+0xdd/0x200
[  576.903662]  [] nfs_callback_svc+0x0/0x130
[  576.903662]  [] child_rip+0x0/0x12
[  576.903662]
[  576.903662]
[  576.903662] Code: 0f 0b eb fe 0f 1f 84 00 00 00 00 00 55 48 8b 16
48 89 e5 e8
[  576.903662] RIP  [] __list_add+0x54/0x60
[  576.903662]  RSP 
[  576.903673] ---[ end trace d46de6b99ae8cd5a ]---
[  576.913664] Kernel panic - not syncing: Aiee, killing interrupt handler!

Torsten


Re: 2.6.24-rc6-mm1

2008-01-05 Thread Torsten Kaiser
On Jan 5, 2008 3:52 PM, Torsten Kaiser <[EMAIL PROTECTED]> wrote:
> On Jan 5, 2008 11:13 AM, Jarek Poplawski <[EMAIL PROTECTED]> wrote:
> > On Sat, Jan 05, 2008 at 09:01:02AM +0100, Torsten Kaiser wrote:
> > > On Jan 5, 2008 1:07 AM, Jarek Poplawski <[EMAIL PROTECTED]> wrote:
> > > > I think it would be easier just to start with this working -rc6 and
> > > > simply check if we have 'right' suspects, so: git-net.patch and
> > > > git-nfsd.patch from -mm1-broken-out, as suggested by Herbert (I hope,
> > > > can compile - otherwise you could try the other way: add the whole -mm
> > > > and revert these two). Using current gits could complicate this
> > > > "investigation".
> > >
> > > OK, I will try this...
>
> still on the todo-list, I had no time to try this yet...

Working on it...
2.6.24-rc6 + mm-patches up to git.battery (includes git-net and
git-netdev-all) worked for 110 packages, then I proclaimed it good.
2.6.24-rc6 + mm-patches up to (including) git.nfsd is currently
getting tested (9 packages done...)

But the reason for this mail is the following question:
Following up on my suspicion that the iommu-sg-merging patches, which are
new in -rc3-mm, could be the cause, I looked at these patches and came
across these hunks:

This is removed from arch/x86/lib/bitstr_64.c:
-/* Find string of zero bits in a bitmap */
-unsigned long
-find_next_zero_string(unsigned long *bitmap, long start, long nbits, int len)
-{
-       unsigned long n, end, i;
-
- again:
-       n = find_next_zero_bit(bitmap, nbits, start);
-       if (n == -1)
-               return -1;
-
-       /* could test bitsliced, but it's hardly worth it */
-       end = n+len;
-       if (end > nbits)
-               return -1;
-       for (i = n+1; i < end; i++) {
-               if (test_bit(i, bitmap)) {
-                       start = i+1;
-                       goto again;
-               }
-       }
-       return n;
-}

This is added to lib/iommu-helper.c:
+static unsigned long find_next_zero_area(unsigned long *map,
+                                         unsigned long size,
+                                         unsigned long start,
+                                         unsigned int nr)
+{
+       unsigned long index, end, i;
+again:
+       index = find_next_zero_bit(map, size, start);
+       end = index + nr;
+       if (end > size)
+               return -1;
+       for (i = index + 1; i < end; i++) {
+               if (test_bit(i, map)) {
+                       start = i+1;
+                       goto again;
+               }
+       }
+       return index;
+}

The old version checks whether find_next_zero_bit returns -1; the new
version doesn't.
Is this intended, and can find_next_zero_bit never fail?
Hmm... in the worst case it should only loop forever, if I'm
reading this right (index = -1 => the for-loop counts from 0 to nr; if any
bit is set it will jump to "again:", and if the next call to
find_next_zero_bit also fails, it's an endless loop)

So even if this cannot explain my bug, could somebody check whether this
is a real bug or not?

Torsten


Re: 2.6.24-rc6-mm1

2008-01-05 Thread Torsten Kaiser
On Jan 5, 2008 1:07 AM, Jarek Poplawski <[EMAIL PROTECTED]> wrote:
> On Fri, Jan 04, 2008 at 04:21:26PM +0100, Torsten Kaiser wrote:
> > On Jan 4, 2008 2:30 PM, Jarek Poplawski <[EMAIL PROTECTED]> wrote:
> > The only thing that is sadly not practical is bisecting the broken-out
> > mm-patches, as triggering this error is too unreliable /
> > time-consuming.
>
> Right, but it seems there are these 2 main suspects here...
>
> > > - is it still vanilla -rc6-mm1; I've seen on kernel list you tried
> > > some fixes around raid?
> >
> > Yes, without these fixes I can't boot.
> > But they should only be run during starting the arrays, so I doubt
> > that this is that cause.
> > (Also -rc3-mm2 did not need this fix)
>
> You've written vanilla -rc6 is OK. Does it mean -rc6 with these fixes?

vanilla -rc6 is fine without these fixes.
The raid-bugs from -rc6-mm1 are probably introduced by
md-allow-devices-to-be-shared-between-md-arrays.patch and that patch
is new in this mm-release.

> I think it would be easier just to start with this working -rc6 and
> simply check if we have 'right' suspects, so: git-net.patch and
> git-nfsd.patch from -mm1-broken-out, as suggested by Herbert (I hope,
> can compile - otherwise you could try the other way: add the whole -mm
> and revert these two). Using current gits could complicate this
> "investigation".

OK, I will try this...

> > My skbuff-double-free-detector is still in there, but was never triggered.
> >
> > > - could you remind this lockdep warning; is it always and the same,
> > > always before crash, or no rules?
> >
> > ???
> > I see no lockdep warning before the crashes.
> > I have seen a warning about the dst->__refcnt in dst_release and
> > different warnings about list operations.
> >
> > I think I have always posted everything I have seen before the
> > crashes. (captured via serial console)
>
> So, you mean there are no more of these?:
>
> "looked into the log in question and the only other warning was a
>  circular locking dependency that lockdep detected around 1.5 hour
>  before this warning."
> ...
> "[ 7620.845168] INFO: lockdep is turned off."

Aha, I had forgotten about that one.
Looking at all the crash logs, I do not find another instance of this
lockdep warning.
The only other lockdep-related output was the bootup problem in vanilla -rc6.

> > (If you mean the lockdep-problem in -rc6: That is more or less a
> > missing annotation during early bootup. The only problem with that is,
> > that it will causes lockdep to be turned off and so it can not be used
> > to find any real problem. A fix for that is in -mm so I do have
> > lockdep on the mm-kernels)
> >
> > > - I've seen you looked after double freeing, but this last debug list
> > > warning could suggest locking problems during list modification too.
> >
> > Yes, but Herbert mentioned double freeing a skb explicitly and so I
> > tried to catch this.
> > I do not know enough about the network core to verify the locking of
> > the involved lists.
>
> Right, the list corruption could be because of use after freeing too.

I had hoped that I could catch use-after-freeing by using
slub_debug=FZP, but that did not help.
(first oops in http://lkml.org/lkml/2007/12/28/159 )

I think that the main skb structs come from slub and should be
poisoned by this, so it might be some other data structure that is
allocated differently...

> > > - above git-nfsd and git-net tests should be probably repeated with
> > > -rc6-mm1 git versions: so vanilla rc6 plus both these -mm patches
> > > only, and if bug triggers, with one reversed; btw., since in previous
> > > message you mentioned that 50 packages could be not enough to trigger
> > > this, these 54 above could make too little margin yet.
> >
> > Yes, I think I really need to redo the git-nfsd-test.
> > With IOMMU_DEBUG enabled rc6-mm1 worked for 52 packages, only a second
> > run of kde-packages triggered it after only 5 packages.
> > I don't know what this bug hates about kdeartwork-wallpaper (triggered
> > it this time) or kdeartwork-styles.
>
> I didn't read all this thread, so probably I miss many points, but are
> you sure there are no problems with filesystem corruption around these
> packets or where you compile(?) them (e.g. after these raid problems)?

For my setup: It's a gentoo system, so compiling packages is the
normal way of installing something.
The compile itself is done on a tmpfs so a filesystem corruption there
should be rather impossible. ;)
(The system has 4 GB RAM, so it doesn't even need to swap)
The sources are taken from an nfsv4 share that is served from a
different system. Also gentoo checksums all sources it will use.

After the crashes I also did a checksum of the last installed
packages. Only in one instance was there corruption: all new files
were completely empty. Obviously XFS did not have the time to write
them back to disk before the system crashed.
Also, as all crashes show network-related traces and the system is
working fine otherwise, I doubt there are any permanent filesystem problems.

For the raid problems: I was just unable to even start the raid that
has / on it, because of a wrong

Re: 2.6.24-rc6-mm1

2008-01-05 Thread Torsten Kaiser
On Jan 5, 2008 11:10 PM, Torsten Kaiser <[EMAIL PROTECTED]> wrote:
> 2.6.24-rc6 + mm-patches up to git.battery (includes git-net and
> git-netdev-all) worked for 110 packages, then I proclaimed it good.
> 2.6.24-rc6 + mm-patches up to (including) git.nfsd is currently
> getting tested (9 packages done...)

That kernel also worked for all 110 packages.

2.6.24-rc6 + mm-patches up to (including) git.xfs - crash

[  576.899332] ------------[ cut here ]------------
[  576.903661] kernel BUG at lib/list_debug.c:33!
[  576.903661] invalid opcode:  [1] SMP
[  576.903661] last sysfs file:
/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[  576.903661] CPU 3
[  576.903661] Modules linked in: radeon drm w83792d ipv6 tuner
tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx tea5761
tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
videobuf_core btcx_risc tveeprom videodev v4l2_common usbhid
v4l1_compat sg hid i2c_nforce2 pata_amd
[  576.903661] Pid: 5559, comm: nfsv4-svc Not tainted 2.6.24-rc6-mm-git.xfs #2
[  576.903661] RIP: 0010:[803c16e4]  [803c16e4]
__list_add+0x54/0x60
[  576.903661] RSP: 0018:81007d4e1dc0  EFLAGS: 00010282
[  576.903661] RAX: 0088 RBX: 81007e955800 RCX: fc6c7900
[  576.903661] RDX: 81007d53eef0 RSI: 0001 RDI: 80760140
[  576.903661] RBP: 81007d4e1dc0 R08: 0001 R09: 
[  576.903661] R10: 810080062008 R11: 0001 R12: 81007ed00900
[  576.903661] R13: 81007ed00938 R14: 81007ed00938 R15: 81007dd6f100
[  576.903661] FS:  7f1b7e6a36f0() GS:81011ff1b780()
knlGS:
[  576.903661] CS:  0010 DS:  ES:  CR0: 8005003b
[  576.903661] CR2: 7ffb28c2c000 CR3: 741ab000 CR4: 06e0
[  576.903661] DR0:  DR1:  DR2: 
[  576.903661] DR3:  DR6: 0ff0 DR7: 0400
[  576.903661] Process nfsv4-svc (pid: 5559, threadinfo
81007d4e, task 81007d53eef0)
[  576.903661] Stack:  81007d4e1e00 805c4dbb
81007ed00908 81007dd6f100
[  576.903661]  81011ad7bc00 81007d458000 81007e955800
81007dd6f110
[  576.903661]  81007d4e1e10 805c4ea7 81007d4e1ee0
805c5fd4
[  576.903661] Call Trace:
[  576.903661]  [805c4dbb] svc_xprt_enqueue+0x1ab/0x240
[  576.903661]  [805c4ea7] svc_xprt_received+0x17/0x20
[  576.903661]  [805c5fd4] svc_recv+0x394/0x7c0
[  576.903661]  [805c53de] svc_send+0xae/0xd0
[  576.903661]  [80230ab0] default_wake_function+0x0/0x10
[  576.903661]  [80316499] nfs_callback_svc+0x79/0x130
[  576.903662]  [80232f8c] finish_task_switch+0xcc/0xe0
[  576.903662]  [8020c818] child_rip+0xa/0x12
[  576.903662]  [8020bf2f] restore_args+0x0/0x30
[  576.903662]  [805b9ecd] __svc_create_thread+0xdd/0x200
[  576.903662]  [80316420] nfs_callback_svc+0x0/0x130
[  576.903662]  [8020c80e] child_rip+0x0/0x12
[  576.903662]
[  576.903662]
[  576.903662] Code: 0f 0b eb fe 0f 1f 84 00 00 00 00 00 55 48 8b 16
48 89 e5 e8
[  576.903662] RIP  [803c16e4] __list_add+0x54/0x60
[  576.903662]  RSP 81007d4e1dc0
[  576.903673] ---[ end trace d46de6b99ae8cd5a ]---
[  576.913664] Kernel panic - not syncing: Aiee, killing interrupt handler!
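
The BUG at lib/list_debug.c comes from the sanity checks that CONFIG_DEBUG_LIST adds to list insertion. The Python model below is only a sketch of that idea, assuming the check compares the neighbours' links before inserting. The Node class and checked_list_add function are illustrative names, not kernel code, and the corruption is simulated by hand.

```python
class Node:
    """Element of a circular doubly linked list (like struct list_head)."""

    def __init__(self, name):
        self.name = name
        self.prev = self
        self.next = self


def checked_list_add(new, prev, nxt):
    """Insert `new` between `prev` and `nxt`, verifying the links first."""
    if nxt.prev is not prev:
        raise RuntimeError("list_add corruption: next.prev should be prev")
    if prev.next is not nxt:
        raise RuntimeError("list_add corruption: prev.next should be next")
    nxt.prev = new
    new.next = nxt
    new.prev = prev
    prev.next = new


head = Node("head")
a = Node("a")
checked_list_add(a, head, head.next)      # fine: the list is consistent

# Simulate a racing or stale writer that re-initialises `a` in place.
a.prev = a.next = a
b = Node("b")
try:
    checked_list_add(b, head, head.next)  # head.next is `a`, a.prev is not head
    result = "ok"
except RuntimeError as err:
    result = str(err)
print(result)
```

The check fires at the next insertion that touches the damaged neighbours, not at the moment the damage is done, so the task named in the oops is not necessarily the one that corrupted the list.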

Torsten


Re: 2.6.24-rc6-mm1

2008-01-04 Thread Torsten Kaiser
On Jan 4, 2008 4:21 PM, Torsten Kaiser <[EMAIL PROTECTED]> wrote:
> On Jan 4, 2008 2:30 PM, Jarek Poplawski <[EMAIL PROTECTED]> wrote:
> > - above git-nfsd and git-net tests should be probably repeated with
> > -rc6-mm1 git versions: so vanilla rc6 plus both these -mm patches
> > only, and if bug triggers, with one reversed; btw., since in previous
> > message you mentioned that 50 packages could be not enough to trigger
> > this, these 54 above could make too little margin yet.
>
> Yes, I think I really need to redo the git-nfsd-test.
> With IOMMU_DEBUG enabled rc6-mm1 worked for 52 packages, only a second
> run of kde-packages triggered it after only 5 packages.
> I don't know what this bug hates about kdeartwork-wallpaper (triggered
> it this time) or kdeartwork-styles.

49 more (kde-)packages did work too. Still looks like it is only in -mm.
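
To put a number on the "too little margin" concern: if one assumes, purely for illustration, that each package build triggers the bug independently with a fixed probability p, then a buggy kernel still survives n builds with probability (1 - p)^n. The sketch below uses a made-up value of p, since the real per-package probability is unknown. Only the package counts 54, 95 and 110 come from this thread.

```python
import math


def chance_of_false_good(p_crash, clean_builds):
    """Probability that a buggy kernel survives `clean_builds` builds."""
    return (1.0 - p_crash) ** clean_builds


def builds_needed(p_crash, confidence):
    """Smallest n for which a buggy kernel would have crashed within
    n builds with probability >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_crash))


# Hypothetical rate: one crash per 50 packages on average.
p = 1 / 50
for n in (54, 95, 110):
    print(n, round(chance_of_false_good(p, n), 3))
print(builds_needed(p, 0.95))
```

Under that assumed rate, a run of 54 clean packages would still leave roughly a one-in-three chance that the kernel is bad, which is why a single short clean run says little.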

Torsten


Re: 2.6.24-rc6-mm1

2008-01-04 Thread Torsten Kaiser
On Jan 4, 2008 2:30 PM, Jarek Poplawski <[EMAIL PROTECTED]> wrote:
> On 04-01-2008 11:23, Torsten Kaiser wrote:
> > On Jan 2, 2008 10:51 PM, Herbert Xu <[EMAIL PROTECTED]> wrote:
> >> On Wed, Jan 02, 2008 at 07:29:59PM +0100, Torsten Kaiser wrote:
> >>> Vanilla 2.6.24-rc6 seems stable. I did not see any crash or warnings.
> >> OK that's great.  The next step would be to try excluding specific git
> >> trees from mm to see if they make a difference.
> >>
> >> The two specific trees of interest would be git-nfsd and git-net.
> >
> > git-nfsd from git://git.linux-nfs.org/projects/bfields/linux.git#for-mm
> > -> compiling and installing 54 packages worked without crashes.
> >
> > git-net from 
> > git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25.git
> > -> compiling and installing 95 packages worked without crashes.
> ...
> > I will enable CONFIG_IOMMU_DEBUG in -rc6-mm1 and see, as otherwise I
> > have no clue where to look...
>
> Hi,
>
> A few questions/suggestions:

I'm open to any suggestions and will try to answer any questions.
The only thing that is sadly not practical is bisecting the broken-out
mm-patches, as triggering this error is too unreliable /
time-consuming.
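
The cost being described can be sketched as follows. This is a hypothetical illustration, not a tool used in this thread: survives_builds stands in for booting a kernel with the first k broken-out patches applied and compiling packages until it either crashes or is declared good, and the patch names are placeholders.

```python
def first_bad_prefix(patches, survives_builds):
    """Binary search for the shortest patch prefix that crashes.

    Assumes the empty prefix is good and the full series is bad.
    Returns (index of the culprit patch, number of kernels tested).
    """
    good, bad = 0, len(patches)   # prefix lengths known good / known bad
    tested = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tested += 1
        if survives_builds(patches[:mid]):
            good = mid
        else:
            bad = mid
    return bad - 1, tested


# Placeholder series; pretend the fourth patch introduces the bug.
series = ["patch-%02d" % i for i in range(8)]
culprit = "patch-03"
index, kernels = first_bad_prefix(series, lambda applied: culprit not in applied)
print(series[index], kernels)
```

Each step is only trustworthy in the crash direction. A kernel that survives may simply not have hit the bug yet, so every "good" verdict needs a long run of package builds, and a wrong verdict sends the search into the wrong half of the series.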

> - is it still vanilla -rc6-mm1; I've seen on kernel list you tried
> some fixes around raid?

Yes, without these fixes I can't boot.
But they should only run while the arrays are being started, so I doubt
that this is the cause.
(Also -rc3-mm2 did not need this fix)

My skbuff-double-free-detector is still in there, but was never triggered.

> - could you remind this lockdep warning; is it always and the same,
> always before crash, or no rules?

???
I see no lockdep warning before the crashes.
I have seen a warning about the dst->__refcnt in dst_release and
different warnings about list operations.

I think I have always posted everything I have seen before the
crashes. (captured via serial console)

(If you mean the lockdep-problem in -rc6: That is more or less a
missing annotation during early bootup. The only problem with that is,
that it will cause lockdep to be turned off and so it cannot be used
to find any real problem. A fix for that is in -mm so I do have
lockdep on the mm-kernels)

> - I've seen you looked after double freeing, but this last debug list
> warning could suggest locking problems during list modification too.

Yes, but Herbert mentioned double freeing a skb explicitly and so I
tried to catch this.
I do not know enough about the network core to verify the locking of
the involved lists.

> - above git-nfsd and git-net tests should be probably repeated with
> -rc6-mm1 git versions: so vanilla rc6 plus both these -mm patches
> only, and if bug triggers, with one reversed; btw., since in previous
> message you mentioned that 50 packages could be not enough to trigger
> this, these 54 above could make too little margin yet.

Yes, I think I really need to redo the git-nfsd-test.
With IOMMU_DEBUG enabled rc6-mm1 worked for 52 packages, only a second
run of kde-packages triggered it after only 5 packages.
I don't know what this bug hates about kdeartwork-wallpaper (triggered
it this time) or kdeartwork-styles.

Output from the crash with IOMMU_DEBUG (lockdep was enabled, but did
not trigger):
[15593.236374] Unable to handle kernel NULL pointer
dereference<3>list_add corruption. prev->next should be next
(8078a410), but was 81011ec01e68. (prev=81011ec01e68).
[15593.236374]  at  RIP:
[15593.236374]  [<>]
[15593.236374] PGD 79d22067 PUD 7acd7067 PMD 0
[15593.236374] Oops: 0010 [1] SMP
[15593.236374] last sysfs file:
/sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[15593.236374] CPU 2
[15593.236374] Modules linked in: radeon drm w83792d ipv6 tuner
tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx tea5761
tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
videobuf_core btcx_risc tveeprom videodev usbhid v4l2_common
v4l1_compat sg hid pata_amd i2c_nforce2
[15593.236374] Pid: 510, comm: khpsbpkt Not tainted 2.6.24-rc6-mm1 #15
[15593.236374] RIP: 0010:[<>]  [<>]
[15593.236374] RSP: 0018:81007eed3ee8  EFLAGS: 00010206
[15593.236374] RAX: 81007eed3ef0 RBX: 81011ec01e40 RCX: 81011ec01e40
[15593.236374] RDX: 81011ec01e68 RSI: 81011ec01e68 RDI: 
[15593.236374] RBP: 81007eed3f10 R08:  R09: 0001
[15593.236374] R10: 0001 R11: 0058 R12: 81007eed3ef0
[15593.236374] R13: 80470e50 R14:  R15: 
[15593.236374] FS:  7f76e6c98700() GS:81011ff1f000()
knlGS:556f46c0
[15593.236374] CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
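[15593.236374] CR2:  CR3: 79d29000 CR4: 06e0
[15593.236374] DR0:  DR1:  DR2: 
[15593.236374] DR3:  DR6: 0ff0 DR7: 0400
[15593.236374] Process khpsbpkt (pid: 510, threadinfo
81007eed2000, task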

Re: 2.6.24-rc6-mm1

2008-01-04 Thread Torsten Kaiser
On Jan 2, 2008 10:51 PM, Herbert Xu <[EMAIL PROTECTED]> wrote:
> On Wed, Jan 02, 2008 at 07:29:59PM +0100, Torsten Kaiser wrote:
> >
> > Vanilla 2.6.24-rc6 seems stable. I did not see any crash or warnings.
>
> OK that's great.  The next step would be to try excluding specific git
> trees from mm to see if they make a difference.
>
> The two specific trees of interest would be git-nfsd and git-net.

git-nfsd from git://git.linux-nfs.org/projects/bfields/linux.git#for-mm
-> compiling and installing 54 packages worked without crashes.

git-net from git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25.git
-> compiling and installing 95 packages worked without crashes.

The only thing in the announcements of 2.6.24-rc3-mm1/2 that stands out for me is:
+iommu-sg-merging-add-device_dma_parameters-structure.patch
+iommu-sg-merging-pci-add-device_dma_parameters-support.patch
+iommu-sg-merging-x86-make-pci-gart-iommu-respect-the-segment-size-limits.patch
+iommu-sg-merging-ppc-make-iommu-respect-the-segment-size-limits.patch
+iommu-sg-merging-ia64-make-sba_iommu-respect-the-segment-size-limits.patch
+iommu-sg-merging-alpha-make-pci_iommu-respect-the-segment-size-limits.patch
+iommu-sg-merging-sparc64-make-iommu-respect-the-segment-size-limits.patch
+iommu-sg-merging-parisc-make-iommu-respect-the-segment-size-limits.patch
+iommu-sg-merging-call-blk_queue_segment_boundary-in-__scsi_alloc_queue.patch
+iommu-sg-merging-sata_inic162x-use-pci_set_dma_max_seg_size.patch
+iommu-sg-merging-aacraid-use-pci_set_dma_max_seg_size.patch

 iommu work

I will enable CONFIG_IOMMU_DEBUG in -rc6-mm1 and see, as otherwise I
have no clue where to look...

Torsten

