Re: [v4.6-rc7-183-g1410b74e4061]

2016-05-22 Thread Sedat Dilek
On 5/16/16, Sedat Dilek  wrote:
> On 5/16/16, Peter Zijlstra  wrote:
>> On Mon, May 16, 2016 at 07:42:35PM +0200, Sedat Dilek wrote:
>>
>>> Unfortunately, I could not reproduce this again with any of my
>>> 183-kernels.
>>> When I first hit a "chain_key collision" issue, it was hard to
>>> reproduce.
>>> Any idea how I can "force" this?
>>
>> Nope; I wish I knew, that'd be so much easier to work with :/
>>
>> I'm hoping someone will report a reproducer, even something that
>> triggers once every 5-10 runs would be awesome.
>>
>> In any case, like I've explained before, nothing regressed as such, we
>> only added this new warning under DEBUG_LOCKDEP because we want to
>> better understand the condition that triggers it.
>>
>> If it bothers you, just turn off DEBUG_LOCKDEP and know that your kernel
>> is as reliable as it was before. OTOH, if you do keep it on, please
>> let me know if you can (semi) reliably trigger this, as I'd really like
>> to have a better understanding.
>>
>
> OK, I will keep checking my logs.
>
> I refreshed the patch of yours that Ingo pointed me to.
>
> But the build fails like this (on top of Linux v4.6 final)...
> [...]
>   if [ "" = "-pg" ]; then if [ kernel/locking/mutex-debug.o !=
> "scripts/mod/empty.o" ]; then ./scripts/recordmcount
> "kernel/locking/mutex-debug.o"; fi; fi;
>   mycompiler -Wp,-MD,kernel/locking/.lockdep.o.d  -nostdinc -isystem
> /usr/lib/gcc/x86_64-linux-gnu/4.9/include -nostdinc -isystem
> /usr/lib/gcc/x86_64-linux-gnu/4.9/include -I./arch/x86/include
> -Iarch/x86/include/generated/uapi -Iarch/x86/include/generated
> -Iinclude -I./arch/x86/include/uapi -Iarch/x86/include/generated/uapi
> -I./include/uapi -Iinclude/generated/uapi -include
> ./include/linux/kconfig.h -D__KERNEL__ -Wall -Wundef
> -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common
> -Werror-implicit-function-declaration -Wno-format-security -std=gnu89
> -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -falign-jumps=1
> -falign-loops=1 -mno-80387 -mno-fp-ret-in-387
> -mpreferred-stack-boundary=3 -mtune=generic -mno-red-zone
> -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args
> -DCONFIG_X86_X32_ABI -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1
> -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_SSSE3=1
> -DCONFIG_AS_CRC32=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -pipe
> -Wno-sign-compare -fno-asynchronous-unwind-tables
> -fno-delete-null-pointer-checks -O2 --param=allow-store-data-races=0
> -Wframe-larger-than=1024 -fno-stack-protector
> -Wno-unused-but-set-variable -fno-omit-frame-pointer
> -fno-optimize-sibling-calls -fno-var-tracking-assignments -mfentry
> -DCC_USING_FENTRY -Wdeclaration-after-statement -Wno-pointer-sign
> -fno-strict-overflow -fconserve-stack -Werror=implicit-int
> -Werror=strict-prototypes -Werror=date-time -DCC_HAVE_ASM_GOTO
> -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(lockdep)"
> -D"KBUILD_MODNAME=KBUILD_STR(lockdep)" -c -o
> kernel/locking/.tmp_lockdep.o kernel/locking/lockdep.c
> kernel/locking/lockdep.c: In function 'print_chain_keys_held_locks':
> kernel/locking/lockdep.c:2034:2: error: too few arguments to function
> 'print_chain_key_iteration'
>   print_chain_key_iteration(hlock_next->class_idx, chain_key);
>   ^
> kernel/locking/lockdep.c:2006:12: note: declared here
>  static u64 print_chain_key_iteration(int class_idx, u64 chain_key,
> u64 prev_key)
> ^
> make[4]: *** [kernel/locking/lockdep.o] Error 1
> make[3]: *** [kernel/locking] Error 2
> make[2]: *** [kernel] Error 2
> [...]
>

Is the attached fix correct?

- Sedat -

P.S.: Attached is a refreshed version of your originally proposed patch,
which does not compile.
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 5dc21eb101b0..b771a691b5e8 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2031,7 +2031,8 @@ print_chain_keys_held_locks(struct task_struct *curr, struct held_lock *hlock_ne
 		print_lock(hlock);
 	}
 
-	print_chain_key_iteration(hlock_next->class_idx, chain_key);
+	print_chain_key_iteration(hlock_next->class_idx, chain_key,
+				  hlock->prev_chain_key);
 	print_lock(hlock_next);
 }
 


0001-locking-lockdep-Some-more-additional-chain_key-colli.patch
Description: Binary data


Re: [v4.6-rc7-183-g1410b74e4061]

2016-05-16 Thread Sedat Dilek
On 5/16/16, Peter Zijlstra <pet...@infradead.org> wrote:
> On Mon, May 16, 2016 at 07:42:35PM +0200, Sedat Dilek wrote:
>
>> Unfortunately, I could not reproduce this again with any of my
>> 183-kernels.
>> When I first hit a "chain_key collision" issue, it was hard to reproduce.
>> Any idea how I can "force" this?
>
> Nope; I wish I knew, that'd be so much easier to work with :/
>
> I'm hoping someone will report a reproducer, even something that
> triggers once every 5-10 runs would be awesome.
>
> In any case, like I've explained before, nothing regressed as such, we
> only added this new warning under DEBUG_LOCKDEP because we want to
> better understand the condition that triggers it.
>
> If it bothers you, just turn off DEBUG_LOCKDEP and know that your kernel
> is as reliable as it was before. OTOH, if you do keep it on, please
> let me know if you can (semi) reliably trigger this, as I'd really like
> to have a better understanding.
>

OK, I will keep checking my logs.

I refreshed the patch of yours that Ingo pointed me to.

But the build fails like this (on top of Linux v4.6 final)...
[...]
  if [ "" = "-pg" ]; then if [ kernel/locking/mutex-debug.o !=
"scripts/mod/empty.o" ]; then ./scripts/recordmcount
"kernel/locking/mutex-debug.o"; fi; fi;
  mycompiler -Wp,-MD,kernel/locking/.lockdep.o.d  -nostdinc -isystem
/usr/lib/gcc/x86_64-linux-gnu/4.9/include -nostdinc -isystem
/usr/lib/gcc/x86_64-linux-gnu/4.9/include -I./arch/x86/include
-Iarch/x86/include/generated/uapi -Iarch/x86/include/generated
-Iinclude -I./arch/x86/include/uapi -Iarch/x86/include/generated/uapi
-I./include/uapi -Iinclude/generated/uapi -include
./include/linux/kconfig.h -D__KERNEL__ -Wall -Wundef
-Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common
-Werror-implicit-function-declaration -Wno-format-security -std=gnu89
-mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -falign-jumps=1
-falign-loops=1 -mno-80387 -mno-fp-ret-in-387
-mpreferred-stack-boundary=3 -mtune=generic -mno-red-zone
-mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args
-DCONFIG_X86_X32_ABI -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1
-DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_SSSE3=1
-DCONFIG_AS_CRC32=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -pipe
-Wno-sign-compare -fno-asynchronous-unwind-tables
-fno-delete-null-pointer-checks -O2 --param=allow-store-data-races=0
-Wframe-larger-than=1024 -fno-stack-protector
-Wno-unused-but-set-variable -fno-omit-frame-pointer
-fno-optimize-sibling-calls -fno-var-tracking-assignments -mfentry
-DCC_USING_FENTRY -Wdeclaration-after-statement -Wno-pointer-sign
-fno-strict-overflow -fconserve-stack -Werror=implicit-int
-Werror=strict-prototypes -Werror=date-time -DCC_HAVE_ASM_GOTO
-D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(lockdep)"
-D"KBUILD_MODNAME=KBUILD_STR(lockdep)" -c -o
kernel/locking/.tmp_lockdep.o kernel/locking/lockdep.c
kernel/locking/lockdep.c: In function 'print_chain_keys_held_locks':
kernel/locking/lockdep.c:2034:2: error: too few arguments to function
'print_chain_key_iteration'
  print_chain_key_iteration(hlock_next->class_idx, chain_key);
  ^
kernel/locking/lockdep.c:2006:12: note: declared here
 static u64 print_chain_key_iteration(int class_idx, u64 chain_key,
u64 prev_key)
^
make[4]: *** [kernel/locking/lockdep.o] Error 1
make[3]: *** [kernel/locking] Error 2
make[2]: *** [kernel] Error 2
[...]

- Sedat -
From b953be255bfb46970c75950e297be836577bc525 Mon Sep 17 00:00:00 2001
From: Sedat Dilek <sedat.di...@gmail.com>
Date: Mon, 16 May 2016 15:51:04 +0200
Subject: [PATCH] locking/lockdep: Some more additional chain_key collision
 information

From: Peter Zijlstra <pet...@infradead.org>

For more details see thread "[v4.6-rc7-183-g1410b74e4061]" at LKML [1].

Patch for testing from Peter Zijlstra see [2] and [3].

[1] http://marc.info/?t=14632178432=1=2
[2] http://marc.info/?l=linux-kernel=146339587506110=2
[3] https://lkml.org/lkml/2016/5/10/214

Cc: Wanpeng Li <wanpeng...@linux.intel.com>
Cc: Alfredo Alvarez Fernandez <alfredoalvarezfernan...@gmail.com>
Cc: Peter Zijlstra (Intel) <pet...@infradead.org>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Sedat Dilek <sedat.di...@gmail.com>
Cc: Ted Tso <ty...@mit.edu>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: the arch/x86 maintainers <x...@kernel.org>
Cc: linux-fsde...@vger.kernel.org
---
 kernel/locking/lockdep.c | 31 +++
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 78c1c0ee6dc1..5dc21eb101b0 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c

@@ -2003,13 +2003,14 @@ static inline int get_first_held_lock(struct task_struct *curr,
 /*
  * Returns the next chain_key iteration
  */
-static u64 print_chain_key_iteration(int class_idx, u64 chain_key)
+static u64 print_chain_key_iteration(int class_idx, u64 chain_key, u64 prev_key)
 {
 	u64 new_chain_key = iterate_chain_key(chain_key, class_idx);
 
-	printk(" class_idx:%d -&g

Re: [v4.6-rc7-183-g1410b74e4061]

2016-05-16 Thread Peter Zijlstra
On Mon, May 16, 2016 at 07:42:35PM +0200, Sedat Dilek wrote:

> Unfortunately, I could not reproduce this again with any of my 183-kernels.
> When I first hit a "chain_key collision" issue, it was hard to reproduce.
> Any idea how I can "force" this?

Nope; I wish I knew, that'd be so much easier to work with :/

I'm hoping someone will report a reproducer, even something that
triggers once every 5-10 runs would be awesome.

In any case, like I've explained before, nothing regressed as such, we
only added this new warning under DEBUG_LOCKDEP because we want to
better understand the condition that triggers it.

If it bothers you, just turn off DEBUG_LOCKDEP and know that your kernel
is as reliable as it was before. OTOH, if you do keep it on, please
let me know if you can (semi) reliably trigger this, as I'd really like
to have a better understanding.


Re: [v4.6-rc7-183-g1410b74e4061]

2016-05-16 Thread Sedat Dilek
On 5/16/16, Ingo Molnar <mi...@kernel.org> wrote:
>
> * Sedat Dilek <sedat.di...@gmail.com> wrote:
>
>> Hi,
>>
>> as Linux v4.6 is very near, I decided to write this bug report (having
>> only drunk one coffee).
>>
>> First, I am not absolutely sure whether this is a real issue, as...
>> #1: This is only a (lockdep) warning.
>> #2: I do not have a "vanilla" Linux v4.6-rc7+ here (see P.S. and attached
>> patch)
>
> Having such a kernel base is certainly not helpful to the quality of your
> report; please try to reproduce under the vanilla kernel.
>
>> For more helpful feedback I should test a...
>> #1: vanilla v4.6-rc7-183-g1410b74e4061
>
> I'd suggest v4.6 plus this debug patch from Peter:
>
>   https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1142352.html
>
> ... to see whether something fishy is going on.
>
> Thanks,
>

Unfortunately, I could not reproduce this again with any of my 183-kernels.
When I first hit a "chain_key collision" issue, it was hard to reproduce.
Any idea how I can "force" this?

- Sedat -


Re: [v4.6-rc7-183-g1410b74e4061]

2016-05-16 Thread Ingo Molnar

* Sedat Dilek <sedat.di...@gmail.com> wrote:

> Hi,
> 
> as Linux v4.6 is very near, I decided to write this bug report (having
> only drunk one coffee).
> 
> First, I am not absolutely sure whether this is a real issue, as...
> #1: This is only a (lockdep) warning.
> #2: I do not have a "vanilla" Linux v4.6-rc7+ here (see P.S. and attached patch)

Having such a kernel base is certainly not helpful to the quality of your
report; please try to reproduce under the vanilla kernel.

> For more helpful feedback I should test a...
> #1: vanilla v4.6-rc7-183-g1410b74e4061

I'd suggest v4.6 plus this debug patch from Peter:

  https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1142352.html

... to see whether something fishy is going on.

Thanks,

Ingo


Re: [v4.6-rc7-183-g1410b74e4061]

2016-05-14 Thread Eric Dumazet
On Sat, May 14, 2016 at 2:22 AM, Sedat Dilek <sedat.di...@gmail.com> wrote:
> Hi,
>
> as Linux v4.6 is very near, I decided to write this bug report (having
> only drunk one coffee).
>
> First, I am not absolutely sure whether this is a real issue, as...
> #1: This is only a (lockdep) warning.
> #2: I do not have a "vanilla" Linux v4.6-rc7+ here (see P.S. and attached patch)
>
> For more helpful feedback I should test a...
> #1: vanilla v4.6-rc7-183-g1410b74e4061
> #2: net.git#master on top of #1
>
> What I am seeing is this while surfing with a UMTS/HSPA internet-stick
> (using PPP) and running Firefox on Ubuntu/precise AMD64...
>
> [  423.484105] ------------[ cut here ]------------
> [  423.484119] WARNING: CPU: 2 PID: 2392 at
> kernel/locking/lockdep.c:2098 __lock_acquire
> [  423.484123] DEBUG_LOCKS_WARN_ON(chain_hlocks[chain->base
> [  423.484125] Modules linked in: btrfs xor raid6_pq ntfs xfs
> libcrc32c ppp_deflate bsd_comp ppp_async crc_ccitt option usb_wwan
> cdc_ether usbserial usbnet snd_hda_codec_hdmi i915
> snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel
> snd_hda_codec arc4 uvcvideo iwldvm snd_hwdep joydev mac80211
> videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core
> videodev rfcomm bnep kvm_intel kvm usb_storage btusb i2c_algo_bit
> iwlwifi btrtl snd_hda_core drm_kms_helper btbcm snd_pcm parport_pc
> btintel syscopyarea irqbypass ppdev snd_seq_midi bluetooth sysfillrect
> psmouse snd_seq_midi_event sysimgblt snd_rawmidi fb_sys_fops cfg80211
> snd_seq drm serio_raw snd_timer samsung_laptop snd_seq_device snd
> soundcore wmi mac_hid video intel_rst lpc_ich lp parport binfmt_misc
> hid_generic usbhid hid r8169 mii
> [  423.484241] CPU: 2 PID: 2392 Comm: firefox Not tainted
> 4.6.0-rc7-183.1-iniza-small #1
> [  423.484244] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> [  423.484247]   88011fa83910 81413825
> 88011fa83960
> [  423.484253]   88011fa83950 81083ea1
> 083282f34ec0
> [  423.484259]  0005 880082f34540 
> 0027
> [  423.484265] Call Trace:
> [  423.484268][] dump_stack
> [  423.484280]  [] __warn
> [  423.484285]  [] warn_slowpath_fmt
> [  423.484289]  [] __lock_acquire
> [  423.484294]  [] ? __lock_acquire
> [  423.484298]  [] ? __lock_acquire
> [  423.484302]  [] lock_acquire
> [  423.484307]  [] ? __dev_queue_xmit
> [  423.484313]  [] _raw_spin_lock
> [  423.484317]  [] ? __dev_queue_xmit
> [  423.484321]  [] __dev_queue_xmit
> [  423.484326]  [] ? __dev_queue_xmit
> [  423.484330]  [] dev_queue_xmit
> [  423.484334]  [] neigh_direct_output
> [  423.484339]  [] ip_finish_output2
> [  423.484344]  [] ? ip_finish_output2
> [  423.484349]  [] ip_finish_output
> [  423.484353]  [] ip_output
> [  423.484357]  [] ? __lock_is_held
> [  423.484362]  [] ip_local_out
> [  423.484366]  [] ip_queue_xmit
> [  423.484371]  [] ? ip_queue_xmit
> [  423.484376]  [] tcp_transmit_skb
> [  423.484380]  [] __tcp_retransmit_skb
> [  423.484385]  [] tcp_retransmit_skb
> [  423.484389]  [] tcp_retransmit_timer
> [  423.484394]  [] ? tcp_write_timer_handler
> [  423.484398]  [] tcp_write_timer_handler
> [  423.484402]  [] tcp_write_timer
> [  423.484407]  [] call_timer_fn
> [  423.484411]  [] ? call_timer_fn
> [  423.484416]  [] ? tcp_write_timer_handler
> [  423.484419]  [] run_timer_softirq
> [  423.484424]  [] __do_softirq
> [  423.484428]  [] irq_exit
> [  423.484432]  [] smp_apic_timer_interrupt
> [  423.484437]  [] apic_timer_interrupt
> [  423.484439]  
> [  423.484443] ---[ end trace a29d8ee0ef420d5c ]---
> [  423.484446]
> [  423.484447] ==
> [  423.484449] [chain_key collision ]
> [  423.484452] 4.6.0-rc7-183.1-iniza-small #1 Tainted: GW
> [  423.484454] --
> [  423.484457] firefox/2392: Hash chain already cached but the
> contents don't match!
> [  423.484460] Held locks:depth: 6
> [  423.484463]  class_idx:1993 -> chain_key:07c9
> (((>icsk_retransmit_timer))){
> [  423.484473]  class_idx:1334 -> chain_key:00f92536 (slock-AF_INET){
> [  423.484482]  class_idx:33 -> chain_key:001f24a6c021
> (rcu_read_lock){..}, at: [] ip_queue_xmit
> [  423.484492]  class_idx:1005 -> chain_key:0003e494d80423ed
> (rcu_read_lock_bh){..}, at: [] ip_finish_output2
> [  423.484500]  class_idx:1005 -> chain_key:7c929b00847da3ed
> (rcu_read_lock_bh){..}, at: [] __dev_queue_xmit
> [  423.484509]  class_idx:1996 -> chain_key:5360108fb47da85e
> (dev->qdisc_tx_busylock ?: _tx_busylock){
> [  423.484517] Locks in cached