Re: [cfs_trace_lock_tcd] BUG: unable to handle kernel NULL pointer dereference at 00000050

2018-04-18 Thread Fengguang Wu

Hi James,

On Wed, Apr 18, 2018 at 02:59:15PM +0100, James Simmons wrote:
> > Hello,
> >
> > FYI this happens in mainline kernel 4.17.0-rc1.
> > It looks like a new regression.
> >
> > [7.587002]  lnet_selftest_init+0x2c4/0x5d9:
> >   lnet_selftest_init at drivers/staging/lustre/lnet/selftest/module.c:134
> > [7.587002]  ? lnet_selftest_exit+0x8d/0x8d:
> >   lnet_selftest_init at drivers/staging/lustre/lnet/selftest/module.c:90
>
> Are you running lnet selftest ?

Perhaps yes -- it's a randconfig boot test and the .config does include
CONFIG_LNET_SELFTEST:

   CONFIG_LNET=y
   CONFIG_LNET_MAX_PAYLOAD=1048576
==> CONFIG_LNET_SELFTEST=y
   CONFIG_LNET_XPRT_IB=y

> Is this a UMP setup?

Yes, .config has:

   # CONFIG_SMP is not set

> The reason I ask is that there is an SMP handling bug in lnet
> selftest. If you look at the mailing list, I pushed an SMP patch
> series. Can you try that series and tell me if it works for you?

So it looks like your fixup patch is not for this case? Anyway, the
reproduce-* script attached to the previous email should make it
fairly straightforward to reproduce the bug.

Thanks,
Fengguang
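
The two .config facts mentioned above (CONFIG_LNET_SELFTEST built in, CONFIG_SMP not set) can be checked mechanically. The following is only a minimal Python sketch, not part of any test-robot tooling; the helper name `parse_kconfig` and the `SAMPLE` text are illustrative assumptions, with the sample lines copied from this message.

```python
import re

# Matches "CONFIG_FOO=value" and "# CONFIG_FOO is not set" lines.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")

def parse_kconfig(text):
    """Return {option: value}; explicitly unset options map to None."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            opts[m.group(1)] = None
    return opts

# Hypothetical excerpt assembled from the lines quoted in this thread.
SAMPLE = """\
CONFIG_LNET=y
CONFIG_LNET_MAX_PAYLOAD=1048576
CONFIG_LNET_SELFTEST=y
CONFIG_LNET_XPRT_IB=y
# CONFIG_SMP is not set
"""

opts = parse_kconfig(SAMPLE)
# .get() returns None for both "is not set" and a missing option,
# so either way CONFIG_SMP counts as disabled here.
is_up_with_selftest = (
    opts.get("CONFIG_LNET_SELFTEST") == "y" and opts.get("CONFIG_SMP") is None
)
print(is_up_with_selftest)
```

On a real tree, the same function could be fed the contents of the build's .config file instead of `SAMPLE`.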


Re: [cfs_trace_lock_tcd] BUG: unable to handle kernel NULL pointer dereference at 00000050

2018-04-18 Thread James Simmons

> Hello,
> 
> FYI this happens in mainline kernel 4.17.0-rc1.
> It looks like a new regression.
> 
> [7.587002]  lnet_selftest_init+0x2c4/0x5d9:
>   lnet_selftest_init at drivers/staging/lustre/lnet/selftest/module.c:134
> [7.587002]  ? lnet_selftest_exit+0x8d/0x8d:
>   lnet_selftest_init at drivers/staging/lustre/lnet/selftest/module.c:90

Are you running lnet selftest ? Is this a UMP setup? The reason I ask is
that there is an SMP handling bug in lnet selftest. If you look at the
mailing list, I pushed an SMP patch series. Can you try that series and
tell me if it works for you? Thanks


[cfs_trace_lock_tcd] BUG: unable to handle kernel NULL pointer dereference at 00000050

2018-04-18 Thread Fengguang Wu
Hello,

FYI this happens in mainline kernel 4.17.0-rc1.
It looks like a new regression.

It occurs in 5 out of 5 boots.

[6.524361] ledtrig-cpu: registered to indicate activity on CPUs
[6.527658] NET: Registered protocol family 4
[6.528191] comedi: version 0.7.76 - http://www.comedi.org
[6.528851] LNetError: 1:0:(module.c:546:libcfs_init()) misc_register: error -16
[7.220272] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input3
[7.586283] BUG: unable to handle kernel NULL pointer dereference at 00000050
[7.586962] *pdpt =  *pde = f000ff53f000ff53
[7.587002] Oops:  [#1] PREEMPT
[7.587002] CPU: 0 PID: 1 Comm: swapper Not tainted 4.17.0-rc1 #1
[7.587002] EIP: cfs_trace_lock_tcd+0xb/0xa0:
cfs_trace_lock_tcd at drivers/staging/lustre/lnet/libcfs/linux/linux-tracefile.c:149
[7.587002] EFLAGS: 00210246 CPU: 0
[7.587002] EAX:  EBX:  ECX: 81fcb588 EDX: 
[7.587002] ESI: 1800 EDI: 8f5d1e08 EBP: 8f5d1d7c ESP: 8f5d1d70
[7.587002]  DS: 007b ES: 007b FS:  GS: 00e0 SS: 0068
[7.587002] CR0: 80050033 CR2: 0050 CR3: 022f CR4: 06b0
[7.587002] Call Trace:
[7.587002]  libcfs_debug_vmsg2+0x8f/0x82f:
libcfs_debug_vmsg2 at drivers/staging/lustre/lnet/libcfs/tracefile.c:317
[7.587002]  ? trace_irq_enable_rcuidle+0x25/0x62:
static_key_false at include/linux/jump_label.h:206
 (inlined by) trace_irq_enable_rcuidle at include/trace/events/preemptirq.h:40
[7.587002]  ? slob_free+0x249/0x251:
slob_free at mm/slob.c:421
[7.587002]  libcfs_debug_msg+0x19/0x1b:
libcfs_debug_msg at drivers/staging/lustre/lnet/libcfs/tracefile.c:287
[7.587002]  ksocknal_startup+0xe77/0x12b2:
ksocknal_startup at drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c:2845 (discriminator 3)
[7.587002]  ? lock_release+0x135/0x1ec:
lock_release at kernel/locking/lockdep.c:3942
[7.587002]  ? _raw_spin_unlock+0x3c/0x4b:
__raw_spin_unlock at include/linux/spinlock_api_smp.h:152
 (inlined by) _raw_spin_unlock at kernel/locking/spinlock.c:176
[7.587002]  lnet_startup_lndni+0x4cd/0x9ec:
lnet_startup_lndni at drivers/staging/lustre/lnet/lnet/api-ni.c:1304
[7.587002]  LNetNIInit+0x880/0xa00:
lnet_startup_lndnis at drivers/staging/lustre/lnet/lnet/api-ni.c:1385
 (inlined by) LNetNIInit at drivers/staging/lustre/lnet/lnet/api-ni.c:1543
[7.587002]  ? read_seqcount_retry+0x1b/0x22:
read_seqcount_retry at include/linux/seqlock.h:222
[7.587002]  srpc_startup+0x84/0x381:
srpc_startup at drivers/staging/lustre/lnet/selftest/rpc.c:1613
[7.587002]  lnet_selftest_init+0x2c4/0x5d9:
lnet_selftest_init at drivers/staging/lustre/lnet/selftest/module.c:134
[7.587002]  ? lnet_selftest_exit+0x8d/0x8d:
lnet_selftest_init at drivers/staging/lustre/lnet/selftest/module.c:90
[7.587002]  do_one_initcall+0x76/0x1d7:
__read_once_size at include/linux/compiler.h:188
 (inlined by) arch_atomic_read at arch/x86/include/asm/atomic.h:31
 (inlined by) atomic_read at include/asm-generic/atomic-instrumented.h:22
 (inlined by) static_key_count at include/linux/jump_label.h:194
 (inlined by) static_key_false at include/linux/jump_label.h:206
 (inlined by) trace_initcall_finish at include/trace/events/initcall.h:44
 (inlined by) do_one_initcall at init/main.c:884
[7.587002]  ? 
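
Each call-trace entry above has the form symbol+offset/size: the offset of the address within the function, followed by the function's total size, both in hex. A leading "?" marks an address that was found on the stack but that the unwinder could not confirm as part of the call chain. As a reading aid only, here is a minimal Python sketch that splits such lines into fields; the name `parse_frame` and the regular expression are illustrative assumptions, not part of any kernel tool.

```python
import re

# "[ts]", optional "EIP: " prefix, optional "? " marker, then symbol+0xoff/0xsize.
_FRAME = re.compile(
    r"^\[\s*[\d.]+\]\s+(?:EIP: )?(?P<q>\? )?"
    r"(?P<sym>[\w.]+)\+(?P<off>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
)

def parse_frame(line):
    """Return a dict for a trace entry, or None for any other log line."""
    m = _FRAME.match(line)
    if m is None:
        return None
    return {
        "symbol": m.group("sym"),
        "offset": int(m.group("off"), 16),
        "size": int(m.group("size"), 16),
        "reliable": m.group("q") is None,  # False when the entry starts with "?"
    }

for line in (
    "[7.587002] EIP: cfs_trace_lock_tcd+0xb/0xa0:",
    "[7.587002]  lnet_selftest_init+0x2c4/0x5d9:",
    "[7.587002]  ? lnet_selftest_exit+0x8d/0x8d:",
    "[7.587002] Call Trace:",
):
    print(parse_frame(line))
```

Note that in the lnet_selftest_exit entry the offset equals the size (0x8d/0x8d), so the address sits exactly at the end of that function, which is presumably why the decoder attributes it to lnet_selftest_init at module.c:90.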
