Re: [llc_ui_release] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004

2018-04-29 Thread Fengguang Wu

On Sun, Apr 29, 2018 at 03:30:48AM +0000, Linus Torvalds wrote:
> On Sat, Apr 28, 2018 at 7:12 PM Fengguang Wu <fengguang...@intel.com> wrote:
>
> > FYI this happens in mainline kernel 4.17.0-rc2.
> > It looks like a new regression.
> >
> > It occurs in 5 out of 5 boots.
> >
> > [main] 375 sockets created based on info from socket cachefile.
> > [main] Generating file descriptors
> > [main] Added 83 filenames from /dev
> > udevd[507]: failed to execute '/sbin/modprobe' '/sbin/modprobe -bv platform:regulatory': No such file or directory
> > [  372.057947] caif:caif_disconnect_client(): nothing to disconnect
> > [  372.082415] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
>
> I think this is fixed by commit 3a04ce7130a7 ("llc: fix NULL pointer deref
> for SOCK_ZAPPED")

Confirmed. Sorry for the late report!

Regards,
Fengguang


Re: [wireless-testsing2:master 1/4] drivers/net/netdevsim/bpf.c:130:14: sparse: incompatible types for 'case' statement

2018-01-05 Thread Fengguang Wu

On Wed, Jan 03, 2018 at 05:02:37PM -0800, Jakub Kicinski wrote:
> On Thu, 4 Jan 2018 03:53:20 +0800, kbuild test robot wrote:
> > tree:   https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-testing.git master
> > head:   6b3b30d0c31ddb2f4d8208c90bc2b4adef47204d
> > commit: af2cae39f6ab9dc596616d6a28c7772e1dd55e91 [1/4] Merge remote-tracking branch 'wireless-drivers-next/master'
> > reproduce:
> >         # apt-get install sparse
> >         git checkout af2cae39f6ab9dc596616d6a28c7772e1dd55e91
> >         make ARCH=x86_64 allmodconfig
> >         make C=1 CF=-D__CHECK_ENDIAN__
> >
> >    drivers/net/netdevsim/bpf.c: In function 'nsim_bpf_setup_tc_block_cb':
> > >> drivers/net/netdevsim/bpf.c:130:7: error: 'TC_CLSBPF_REPLACE' undeclared (first use in this function); did you mean 'TC_RED_REPLACE'?
> >      case TC_CLSBPF_REPLACE:
> >           ^
> >           TC_RED_REPLACE
>
> FWIW looks like the tree contains old net-next code and latest net
> (linux/master) code.  Pulling from net-next will solve this.
>
> > :: TO: Jakub Kicinski
> > :: CC: Daniel Borkmann
>
> Interestingly Daniel and I were not CCed on the report, is this
> intentional?


Yeah the above ":: TO/CC" lines are for manual addition when
appropriate. They are the author/committer of the commit that last
modified the code line in question.
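
As an illustration of the rule described above, the sketch below derives
such ":: TO/CC" lines from `git blame --porcelain` output for a single
source line. It is only an assumption about how this could be done, not
the robot's actual code; blame_contacts is a hypothetical helper name.

```shell
# Hypothetical helper: read `git blame --porcelain` output for one line on
# stdin and print the author as ":: TO:" and the committer as ":: CC:".
blame_contacts() {
    awk '
        /^author /    { author = substr($0, 8) }      # text after "author "
        /^committer / { committer = substr($0, 11) }  # text after "committer "
        END {
            if (author != "")
                print ":: TO: " author
            # Skip the CC line when author and committer are the same person.
            if (committer != "" && committer != author)
                print ":: CC: " committer
        }
    '
}

# Example use inside a kernel tree (file and line taken from the report):
#   git blame --porcelain -L 130,130 -- drivers/net/netdevsim/bpf.c | blame_contacts
```

The helper only parses text, so it can be tried on saved blame output
without a git checkout.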

Thanks,
Fengguang


Re: [e1000_shutdown] e1000 0000:00:03.0: disabling already-disabled device

2017-12-04 Thread Fengguang Wu

Hi Tushar,

On Tue, Nov 28, 2017 at 01:01:23AM +0530, Tushar Dave wrote:



On 11/23/2017 04:43 AM, Fengguang Wu wrote:

On Wed, Nov 22, 2017 at 03:40:52AM +0530, Tushar Dave wrote:



On 11/21/2017 06:11 PM, Fengguang Wu wrote:

Hello,

FYI this happens in mainline kernel 4.14.0-01330-g3c07399.
It has been happening since 4.13.

It occurs in 3 out of 162 boots.


[   44.637743] advantechwdt: Unexpected close, not stopping watchdog!
[   44.997548] input: ImExPS/2 Generic Explorer Mouse as
/devices/platform/i8042/serio1/input/input6
[   45.013419] e1000 0000:00:03.0: disabling already-disabled device
[   45.013447] ------------[ cut here ]------------
[   45.014868] WARNING: CPU: 1 PID: 71 at drivers/pci/pci.c:1641
pci_disable_device+0xa1/0x105:
    pci_disable_device at drivers/pci/pci.c:1640
[   45.016171] CPU: 1 PID: 71 Comm: rcu_perf_shutdo Not tainted
4.14.0-01330-g3c07399 #1
[   45.017197] task: 88011bee9e40 task.stack: c986
[   45.017987] RIP: 0010:pci_disable_device+0xa1/0x105:
    pci_disable_device at drivers/pci/pci.c:1640
[   45.018603] RSP: :c9863e30 EFLAGS: 00010286
[   45.019282] RAX: 0035 RBX: 88013a230008 RCX:

[   45.020182] RDX:  RSI:  RDI:
0203
[   45.021084] RBP: 88013a3f31e8 R08: 0001 R09:

[   45.021986] R10: 827ec29c R11: 0002 R12:
0001
[   45.022946] R13: 88013a230008 R14: 880117802b20 R15:
c9863e8f
[   45.023842] FS:  () GS:88013fd0()
knlGS:
[   45.024863] CS:  0010 DS:  ES:  CR0: 80050033
[   45.025583] CR2: c96d4000 CR3: 0220f000 CR4:
06a0
[   45.026478] Call Trace:
[   45.026811]  __e1000_shutdown+0x1d4/0x1e2:
    __e1000_shutdown at
drivers/net/ethernet/intel/e1000/e1000_main.c:5162
[   45.027344]  ? rcu_perf_cleanup+0x2a1/0x2a1:
    rcu_perf_shutdown at kernel/rcu/rcuperf.c:627
[   45.027883]  e1000_shutdown+0x14/0x3a:
    e1000_shutdown at
drivers/net/ethernet/intel/e1000/e1000_main.c:5235
[   45.028351]  device_shutdown+0x110/0x1aa:
    device_shutdown at drivers/base/core.c:2807
[   45.028858]  kernel_power_off+0x31/0x64:
    kernel_power_off at kernel/reboot.c:260
[   45.029343]  rcu_perf_shutdown+0x9b/0xa7:
    rcu_perf_shutdown at kernel/rcu/rcuperf.c:637
[   45.029852]  ? __wake_up_common_lock+0xa2/0xa2:
    autoremove_wake_function at
kernel/sched/wait.c:376
[   45.030414]  kthread+0x126/0x12e:
    kthread at kernel/kthread.c:233
[   45.030834]  ? __kthread_bind_mask+0x8e/0x8e:
    kthread at kernel/kthread.c:190
[   45.031399]  ? ret_from_fork+0x1f/0x30:
    ret_from_fork at arch/x86/entry/entry_64.S:443
[   45.031883]  ? kernel_init+0xa/0xf5:
    kernel_init at init/main.c:997
[   45.032325]  ret_from_fork+0x1f/0x30:
    ret_from_fork at arch/x86/entry/entry_64.S:443
[   45.032777] Code: 00 48 85 ed 75 07 48 8b ab a8 00 00 00 48 8d bb
98 00 00 00 e8 aa d1 11 00 48 89 ea 48 89 c6 48 c7 c7 d8 e4 0b 82 e8
55 7d da ff <0f> ff b9 01 00 00 00 31 d2 be 01 00 00 00 48 c7 c7 f0
b1 61 82
[   45.035222] ---[ end trace c257137b1b1976ef ]---
[   45.037838] ACPI: Preparing to enter system sleep state S5

Attached the full dmesg, kconfig and reproduce scripts.

It looks like the e1000 PCI/PCI-X device is already suspended, so the
earlier call chain e1000_suspend() -> __e1000_shutdown() -> pci_disable_device()
has already disabled the device.
Disabling the device again from the e1000_shutdown handler during system
shutdown triggers the warning at drivers/pci/pci.c:1641.

I think function __e1000_shutdown should just return if device is
already suspended!

I don't have e1000 hardware to test right now. So if this seems logical
to others I will send a patch.


Tushar, this happens in QEMU boot testing, so you don't need e1000 hardware
to test it, unless you'd also like to rule out regressions on real hardware.

The original report attached a reproduce script to run the QEMU test.
Or you may send me the patch for testing.

Fengguang,

Would you please try this patch? It is compile-tested only, and is
similar to how ixgbe handled the issue.
Thanks.

e1000: fix disabling already-disabled warning

This patch adds check so that driver does not disable already
disabled device.


It works! I tried 100 boots and the "e1000 0000:00:03.0: disabling
already-disabled device" error no longer shows up.

Tested-by: Fengguang Wu <fengguang...@intel.com>

Thanks,
Fengguang


Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
---
 drivers/net/ethernet/intel/e1000/e1000.h  |  3 ++-
 drivers/net/ethernet/intel/e1000/e1000_main.c | 23 ++-
 2 

Re: [e1000_shutdown] e1000 0000:00:03.0: disabling already-disabled device

2017-11-22 Thread Fengguang Wu

On Wed, Nov 22, 2017 at 03:40:52AM +0530, Tushar Dave wrote:



On 11/21/2017 06:11 PM, Fengguang Wu wrote:

Hello,

FYI this happens in mainline kernel 4.14.0-01330-g3c07399.
It has been happening since 4.13.

It occurs in 3 out of 162 boots.


[   44.637743] advantechwdt: Unexpected close, not stopping watchdog!
[   44.997548] input: ImExPS/2 Generic Explorer Mouse as 
/devices/platform/i8042/serio1/input/input6
[   45.013419] e1000 0000:00:03.0: disabling already-disabled device
[   45.013447] ------------[ cut here ]------------
[   45.014868] WARNING: CPU: 1 PID: 71 at drivers/pci/pci.c:1641 pci_disable_device+0xa1/0x105:
    pci_disable_device at drivers/pci/pci.c:1640
[   45.016171] CPU: 1 PID: 71 Comm: rcu_perf_shutdo Not tainted 4.14.0-01330-g3c07399 #1
[   45.017197] task: 88011bee9e40 task.stack: c986
[   45.017987] RIP: 0010:pci_disable_device+0xa1/0x105:
    pci_disable_device at drivers/pci/pci.c:1640
[   45.018603] RSP: :c9863e30 EFLAGS: 00010286
[   45.019282] RAX: 0035 RBX: 88013a230008 RCX: 
[   45.020182] RDX:  RSI:  RDI: 0203
[   45.021084] RBP: 88013a3f31e8 R08: 0001 R09: 
[   45.021986] R10: 827ec29c R11: 0002 R12: 0001
[   45.022946] R13: 88013a230008 R14: 880117802b20 R15: c9863e8f
[   45.023842] FS:  () GS:88013fd0() knlGS:
[   45.024863] CS:  0010 DS:  ES:  CR0: 80050033
[   45.025583] CR2: c96d4000 CR3: 0220f000 CR4: 06a0
[   45.026478] Call Trace:
[   45.026811]  __e1000_shutdown+0x1d4/0x1e2:
    __e1000_shutdown at drivers/net/ethernet/intel/e1000/e1000_main.c:5162
[   45.027344]  ? rcu_perf_cleanup+0x2a1/0x2a1:
    rcu_perf_shutdown at kernel/rcu/rcuperf.c:627
[   45.027883]  e1000_shutdown+0x14/0x3a:
    e1000_shutdown at drivers/net/ethernet/intel/e1000/e1000_main.c:5235
[   45.028351]  device_shutdown+0x110/0x1aa:
    device_shutdown at drivers/base/core.c:2807
[   45.028858]  kernel_power_off+0x31/0x64:
    kernel_power_off at kernel/reboot.c:260
[   45.029343]  rcu_perf_shutdown+0x9b/0xa7:
    rcu_perf_shutdown at kernel/rcu/rcuperf.c:637
[   45.029852]  ? __wake_up_common_lock+0xa2/0xa2:
    autoremove_wake_function at kernel/sched/wait.c:376
[   45.030414]  kthread+0x126/0x12e:
    kthread at kernel/kthread.c:233
[   45.030834]  ? __kthread_bind_mask+0x8e/0x8e:
    kthread at kernel/kthread.c:190
[   45.031399]  ? ret_from_fork+0x1f/0x30:
    ret_from_fork at arch/x86/entry/entry_64.S:443
[   45.031883]  ? kernel_init+0xa/0xf5:
    kernel_init at init/main.c:997
[   45.032325]  ret_from_fork+0x1f/0x30:
    ret_from_fork at arch/x86/entry/entry_64.S:443
[   45.032777] Code: 00 48 85 ed 75 07 48 8b ab a8 00 00 00 48 8d bb 98 00 00 00 e8 
aa d1 11 00 48 89 ea 48 89 c6 48 c7 c7 d8 e4 0b 82 e8 55 7d da ff <0f> ff b9 01 
00 00 00 31 d2 be 01 00 00 00 48 c7 c7 f0 b1 61 82
[   45.035222] ---[ end trace c257137b1b1976ef ]---
[   45.037838] ACPI: Preparing to enter system sleep state S5

Attached the full dmesg, kconfig and reproduce scripts.

It looks like the e1000 PCI/PCI-X device is already suspended, so the
earlier call chain e1000_suspend() -> __e1000_shutdown() -> pci_disable_device()
has already disabled the device.
Disabling the device again from the e1000_shutdown handler during system
shutdown triggers the warning at drivers/pci/pci.c:1641.

I think function __e1000_shutdown should just return if device is
already suspended!

I don't have e1000 hardware to test right now. So if this seems logical
to others I will send a patch.


Tushar, this happens in QEMU boot testing, so you don't need e1000 hardware
to test it, unless you'd also like to rule out regressions on real hardware.

The original report attached a reproduce script to run the QEMU test.
Or you may send me the patch for testing.

Thanks,
Fengguang


[rht_deferred_worker] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 62s!

2017-11-21 Thread Fengguang Wu
Hello,

FYI this happens in mainline kernel 4.14.0-10859-gcf9b077.
It dates back to at least v4.5.

It occurs in 2 out of 2 boots.
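
A figure like "2 out of 2 boots" can be tallied from saved per-boot dmesg
files. The sketch below shows one simple way to do that; it is an
illustration only, not the robot's own tooling, and both the helper name
count_affected_boots and the dmesg-* file layout are assumptions.

```shell
# Hypothetical helper: count how many of the given dmesg files contain
# a fixed string, and print the result as "<hits> out of <total>".
count_affected_boots() {
    pattern=$1
    shift
    total=$#
    hits=0
    for log in "$@"; do
        # -F: treat the pattern as a literal string, -q: only set the status.
        if grep -qF -- "$pattern" "$log"; then
            hits=$((hits + 1))
        fi
    done
    echo "$hits out of $total"
}

# Example (file names are assumptions):
#   count_affected_boots 'BUG: workqueue lockup' boot-logs/dmesg-*
```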

[  204.036012] Average test time: 28300548370
[  204.045272] Testing concurrent rhashtable access from 10 threads
[  206.134970] Writes:  Total: 2  Max/Min: 0/0   Fail: 0
[  267.575158] Writes:  Total: 2  Max/Min: 0/0   Fail: 0
[  329.015853] Writes:  Total: 2  Max/Min: 0/0   Fail: 0
[  368.268565] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 62s!
[  368.294584] Showing busy workqueues and worker pools:
[  368.303142] workqueue events: flags=0x0
[  368.310492]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=4/256
[  368.312885] in-flight: 3:rht_deferred_worker
[  368.312885] pending: rht_deferred_worker, tsc_refine_calibration_work, vmstat_shepherd
[  368.342255] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=62s workers=2 idle: 16
[  368.356480] INFO: task swapper/0:1 blocked for more than 120 seconds.
[  368.451098]   Not tainted 4.14.0-10859-gcf9b077 #13
[  368.564524] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  368.726003] swapper/0   D0 1  0 0x8000
[  368.845359] Call Trace:
[  368.908714]  ? __schedule+0x81a/0x84b:
context_switch at kernel/sched/core.c:2801
  (inlined by) __schedule at kernel/sched/core.c:3374
[  368.984507]  ? schedule+0x59/0x84:
__preempt_count_sub at arch/x86/include/asm/preempt.h:107 (discriminator 1)
  (inlined by) schedule at kernel/sched/core.c:3434 (discriminator 1)
[  369.067920]  ? schedule_timeout+0x1b/0xb9:
schedule_timeout at kernel/time/timer.c:1766
[  369.155414]  ? _raw_spin_unlock_irq+0x28/0x6a:
__raw_spin_unlock_irq at include/linux/spinlock_api_smp.h:168
  (inlined by) _raw_spin_unlock_irq at kernel/locking/spinlock.c:199
[  369.241353]  ? _raw_spin_unlock_irq+0x4a/0x6a:
__raw_spin_unlock_irq at include/linux/spinlock_api_smp.h:182
  (inlined by) _raw_spin_unlock_irq at kernel/locking/spinlock.c:199
[  369.334592]  ? __wait_for_common+0x186/0x24a:
do_wait_for_common at kernel/sched/completion.c:91
  (inlined by) __wait_for_common at kernel/sched/completion.c:112
[  369.431098]  ? console_conditional_schedule+0x3c/0x3c:
schedule_timeout at kernel/time/timer.c:1751
[  369.557839]  ? do_task_dead+0x46/0x46:
default_wake_function at kernel/sched/core.c:3626
[  369.651937]  ? wait_for_completion+0x19/0x1c:
wait_for_completion at kernel/sched/completion.c:145
[  369.752007]  ? kthread_stop+0x196/0x326:
kthread_stop at kernel/kthread.c:531
[  369.847831]  ? test_rht_init+0x963/0x9da:
test_rht_init at lib/test_rhashtable.c:663
[  369.942078]  ? ___siphash_aligned+0x10/0x10
[  370.077915]  ? test_rhltable+0xd1e/0xd1e:
test_rht_init at lib/test_rhashtable.c:565
[  370.171118]  ? do_one_initcall+0x96/0x19b:
do_one_initcall at init/main.c:827
[  370.272905]  ? parse_args+0x19d/0x250:
parse_one at kernel/params.c:156
  (inlined by) parse_args at kernel/params.c:191
[  370.358882]  ? kernel_init_freeable+0xd2/0x1b8:
do_initcall_level at init/main.c:886
  (inlined by) do_initcalls at init/main.c:901
  (inlined by) do_basic_setup at init/main.c:919
  (inlined by) kernel_init_freeable at init/main.c:1067
[  370.465397]  ? kernel_init_freeable+0xee/0x1b8:
do_initcall_level at init/main.c:892
  (inlined by) do_initcalls at init/main.c:901
  (inlined by) do_basic_setup at init/main.c:919
  (inlined by) kernel_init_freeable at init/main.c:1067
[  370.571169]  ? rest_init+0xb3/0xb3:
kernel_init at init/main.c:991
[  370.658826]  ? kernel_init+0xd/0x15f:
kernel_init at init/main.c:996
[  370.746031]  ? ret_from_fork+0x19/0x24:
ret_from_fork at arch/x86/entry/entry_32.S:299
[  370.839497]
[  370.839497] Showing all locks held in the system:
[  370.978033] 2 locks held by khungtaskd/17:
[  370.985872]  #0:  (rcu_read_lock){}, at: [<7911eae2>] rcu_read_lock+0x0/0x6c:
rcu_read_lock at include/linux/rcupdate.h:628
[  370.999617]  #1:  (tasklist_lock){.+.+}, at: [<790bb514>] debug_show_all_locks+0x3a/0x13d:
debug_show_all_locks at kernel/locking/lockdep.c:4554
[  371.014651]
[  371.018049] =
[  371.018049]
[  371.651885] NMI backtrace for cpu 0
[  371.668798] CPU: 0 PID: 17 Comm: khungtaskd Not tainted 4.14.0-10859-gcf9b077 #13
[  371.670706] Call Trace:
[  371.670706]  ? show_stack+0x5e/0x66:
show_stack at arch/x86/kernel/dumpstack.c:179
[  371.670706]  ? dump_stack+0xc1/0x121:
__dump_stack at lib/dump_stack.c:17
  (inlined by) dump_stack at lib/dump_stack.c:53
[  371.670706]  ? 

Re: CONFIG_DEBUG_INFO_SPLIT impacts on faddr2line

2017-11-14 Thread Fengguang Wu

Hi Andi,

On Mon, Nov 13, 2017 at 10:52:27AM -0800, Andi Kleen wrote:

> It's the "CONFIG_DEBUG_INFO_SPLIT" thing that makes faddr2line unable
> to see the inlining information,
>
> Using OPTIMIZE_INLINING is fine.

Good to know that!


It works for me. Perhaps your binutils is too old? It was
added at some point. Can you try upgrading?

% ./linux/scripts/faddr2line obj/vmlinux schedule+10
schedule+10/0x80:
schedule at arch/x86/include/asm/current.h:15

% addr2line --version
GNU addr2line version 2.27-24.fc26


I use debian and tried addr2line in 2 systems:

GNU addr2line (GNU Binutils for Debian) 2.28
GNU addr2line (GNU Binutils for Debian) 2.29.1

Regards,
Fengguang


Re: CONFIG_DEBUG_INFO_SPLIT impacts on faddr2line

2017-11-12 Thread Fengguang Wu

[...]

> Oh - and talking about "big step forward" - does the 0day robot do any
> suspend/resume testing at all?
Yes, we do. CC Rui and Aaron on power testing.


Yes, we have added suspend/resume tests to 0day, covering both
functionality and suspend/resume performance. They are not widely run
because most of the 0day testboxes are servers/desktops; we've just
added some client laptops as testboxes and will add more in the near
future. :)

>
> Even on non-laptop hardware, it should be possible to do something like
>
>    echo platform > /sys/power/pm_test
>    echo freeze > /sys/power/state
>
> or similar (assuming CONFIG_PM_DEBUG is enabled).
>


yes.

I will run native suspend/resume test on laptops and other test boxes
that really support it, and run suspend/resume test in pm_test modes on
the others to help us find more issues.
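
The plan above (native suspend where the box supports it, pm_test modes
elsewhere) could be sketched as follows. This is a minimal illustration
under stated assumptions, not the actual 0day job: suspend_cycle is a
hypothetical helper, "mem" is assumed as the native sleep state, and the
power directory is passed in so the function can be tried against a
scratch directory instead of the real /sys/power.

```shell
# Hypothetical helper: run one suspend cycle.
#   $1  sysfs power directory (normally /sys/power)
#   $2  "native" for a real suspend, or a pm_test level such as "platform"
suspend_cycle() {
    power_dir=$1
    mode=$2
    [ -w "$power_dir/state" ] || return 1
    if [ "$mode" = native ]; then
        # Make sure no test level is left active, then really suspend.
        if [ -w "$power_dir/pm_test" ]; then
            echo none > "$power_dir/pm_test"
        fi
        echo mem > "$power_dir/state"
    else
        # pm_test only exists when CONFIG_PM_DEBUG is enabled.
        [ -w "$power_dir/pm_test" ] || return 1
        echo "$mode" > "$power_dir/pm_test"
        echo freeze > "$power_dir/state"
    fi
}

# Example, as root on a real machine:
#   suspend_cycle /sys/power platform
```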


It's a good plan, thanks! Client devices can be much cheaper than servers.
They offer more hardware diversity while being more generally available.

On the other hand, if there are PM functionalities that can be tested
inside QEMU, that would be good to have, since no real hardware can be
tested as cheaply and extensively as a large number of VMs.

Thanks,
Fengguang


CONFIG_DEBUG_INFO_SPLIT impacts on faddr2line

2017-11-12 Thread Fengguang Wu

CC Andi and more DEBUG_INFO_SPLIT people.

On Sun, Nov 12, 2017 at 11:31:56AM -0800, Linus Torvalds wrote:

On Wed, Nov 8, 2017 at 9:12 AM, Fengguang Wu <fengguang...@intel.com> wrote:


OK. Here is the original faddr2line output:

$ ~/linux/scripts/faddr2line vmlinux vlan_device_event+0x7f5/0xa40
vlan_device_event+0x7f5/0xa40:
vlan_device_event at net/8021q/vlan.h:60

And below is call trace embedded with full faddr2line output.
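
A sketch of how the func+offset/size tokens of a call trace could be
pulled out of a saved dmesg and handed to faddr2line in one go. The
helper name trace_symbols and the file names in the example are
assumptions; this is not the robot's actual annotation code.

```shell
# Hypothetical helper: print every func+0xOFF/0xSIZE token found in a
# dmesg call trace read from stdin, one per line.
trace_symbols() {
    grep -oE '[A-Za-z_][A-Za-z0-9_.]*\+0x[0-9a-f]+/0x[0-9a-f]+'
}

# Example (paths are assumptions):
#   trace_symbols < dmesg.txt | sort -u | xargs ./scripts/faddr2line vmlinux
```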

I notice that this trace shows no additional inline files at all.
Is it because I set some kconfig option wrong, so that the inline info
is lost? E.g.

CONFIG_OPTIMIZE_INLINING=y (it looks better set to N)
CONFIG_DEBUG_INFO_REDUCED=y
CONFIG_DEBUG_INFO_SPLIT=y


Ok, this annoyed me, so I went back and looked.

It's the "CONFIG_DEBUG_INFO_SPLIT" thing that makes faddr2line unable
to see the inlining information,

Using OPTIMIZE_INLINING is fine.


Good to know that!


I'm not sure that addr2line could be made to understand the .dwo files
that DEBUG_INFO_SPLIT causes (particularly since we munge the vmlinux
file itself, who knows how that could confuse things).

So can I ask that you make the 0day build scripts always use

 CONFIG_DEBUG_INFO=y
 CONFIG_DEBUG_INFO_REDUCED=y
 # CONFIG_DEBUG_INFO_SPLIT is not set

because with that "DEBUG_INFO_REDUCED=y", the use of DEBUG_INFO_SPLIT
shouldn't be _that_ big of a deal.
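
A build script could guard against the split-debug-info setting with a
small check like the one below. This is only a sketch:
debuginfo_ok_for_faddr2line is a hypothetical helper name, and the check
simply looks for the two option lines in a .config file.

```shell
# Hypothetical helper: succeed only if the given .config enables debug
# info and does not enable CONFIG_DEBUG_INFO_SPLIT.
debuginfo_ok_for_faddr2line() {
    config=$1
    # -x matches whole lines, so "# CONFIG_... is not set" never matches.
    grep -qx 'CONFIG_DEBUG_INFO=y' "$config" || return 1
    if grep -qx 'CONFIG_DEBUG_INFO_SPLIT=y' "$config"; then
        return 1
    fi
    return 0
}

# Example:
#   debuginfo_ok_for_faddr2line .config || echo "inline info will be missing"
```

In a kernel tree, the scripts/config tool (for example
"scripts/config --disable DEBUG_INFO_SPLIT") is one way to flip the option
before rebuilding.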

Yes, splitting the debug info does help reduce disk usage for the
build, and presumably speed it up a bit too due to less IO and reduced
copying of the debug info data, but right now it really makes the
debug info much less useful.


Yes, DEBUG_INFO_SPLIT helps reduce build cost. Equally importantly,
it cuts down the *.ko sizes, which saves boot test cost too: in our
test scheme, the modules.cgz below is loaded as part of the initrd
during boot testing, which costs memory and, to a lesser degree, IO
and decompression time.

Here is the diff of the modules.cgz size:

Big files under /pkg/linux/x86_64-rhel-7.2+CONFIG_DEBUG_INFO_REDUCED/gcc-6/v4.14-rc7/,
compared to +CONFIG_DEBUG_INFO_SPLIT:

=>  54M  135M  modules.cgz
   7.3M  7.3M  vmlinuz-4.14.0-rc7
   1.2M  1.2M  linux-headers.cgz
   7.6M  7.7M  linux-selftests.cgz
    31M   31M  linux-perf.cgz

Nevertheless, that's machine cost. If DEBUG_INFO_SPLIT hurts our
ability to analyze bugs, I think the forthright way would be to
disable it in our tests.


Just to see the difference:

- with DEBUG_INFO_SPLIT=y

   [torvalds@i7 linux]$ ./scripts/faddr2line vmlinux __schedule+0x314
   __schedule+0x314/0x840:
   __schedule at kernel/sched/stats.h:12

- with DEBUG_INFO_SPLIT is not set

   [torvalds@i7 linux]$ ./scripts/faddr2line vmlinux __schedule+0x314
   __schedule+0x314/0x840:
   rq_sched_info_arrive at kernel/sched/stats.h:12
   (inlined by) sched_info_arrive at kernel/sched/stats.h:99
   (inlined by) __sched_info_switch at kernel/sched/stats.h:151
   (inlined by) sched_info_switch at kernel/sched/stats.h:158
   (inlined by) prepare_task_switch at kernel/sched/core.c:2582
   (inlined by) context_switch at kernel/sched/core.c:2755
   (inlined by) __schedule at kernel/sched/core.c:3366

and while (once again) this is a pretty extreme case, we do use a lot
of inlines, and gcc will add its own inlining. Getting this whole
information - particularly for the faulting IP - would really help in
some situations.

I love what the 0day robot is doing, this would be another big step forward.


Thank you for the helpful information and the kind words!
I'll make the change to disable DEBUG_INFO_SPLIT.


Oh - and talking about "big step forward" - does the 0day robot do any
suspend/resume testing at all?


Yes, we do. CC Rui and Aaron on power testing.


Even on non-laptop hardware, it should be possible to do something like

   echo platform > /sys/power/pm_test
   echo freeze > /sys/power/state

or similar (assuming CONFIG_PM_DEBUG is enabled).

Maybe you already do something like this?


Rui/Aaron have better knowledge of the current status. It does look like
an error-prone area that deserves more testing effort.


Anyway, regardless this was a good release for the 0day robot. Thanks.


My (and our) pleasure. I'd like to thank you and all the people who
take time to analyze/fix the bugs. It's great to see the long-standing
bugs being fixed in mainline -- they have been a big source of noise
that hurts our auto bisect capabilities.

Regards,
Fengguang


Re: [net-next:master 488/665] verifier.c:undefined reference to `__multi3'

2017-11-11 Thread Fengguang Wu

On Sun, Nov 12, 2017 at 09:23:52AM +0800, Alexei Starovoitov wrote:
> On 11/12/17 9:18 AM, Fengguang Wu wrote:
> > On Sun, Nov 12, 2017 at 09:14:14AM +0800, Alexei Starovoitov wrote:
> > > On 11/12/17 8:23 AM, kbuild test robot wrote:
> > > > tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
> > > > head:   7c5556decd0a629e9ee02e93653f75ba7b7da03c
> > > > commit: 638f5b90d46016372a8e3e0a434f199cc5e12b8c [488/665] bpf: reduce verifier memory consumption
> > > > config: mips-64r6el_defconfig (attached as .config)
> > > > compiler: mips64el-linux-gnuabi64-gcc (Debian 7.2.0-11) 7.2.0
> > > > reproduce:
> > > >         git checkout 638f5b90d46016372a8e3e0a434f199cc5e12b8c
> > > >         # save the attached .config to linux build tree
> > > >         make.cross ARCH=mips
> > > >
> > > > All errors (new ones prefixed by >>):
> > > >
> > > >    kernel/bpf/verifier.o: In function `realloc_verifier_state.isra.19':
> > > > >> verifier.c:(.text+0x36fc): undefined reference to `__multi3'
> > >
> > > that's a known issue with gcc 7 on mips that is "optimizing"
> > > normal 64-bit multiply into 128-bit variant.
> > > Nothing to fix on the kernel side.
> >
> > Good to know that! Do you think it a good idea to blacklist __multi3
> > errors in mips builds?
>
> I would do so. yes.

OK.

> Though digging further this function was added to
> arch/sparc/lib/multi3.S
> since gcc doing the same "optimization" there.
> Adding asm code doesn't look right to me. I'd rather push
> gcc folks to avoid such codegen.

Sure, I just forwarded the original report to GCC list.

Thanks,
Fengguang


[net-next:master 488/665] verifier.c:undefined reference to `__multi3'

2017-11-11 Thread Fengguang Wu
CC gcc list. According to Alexei:

 This is a known issue with gcc 7 on mips that is "optimizing"
 normal 64-bit multiply into 128-bit variant.
 Nothing to fix on the kernel side.

 Digging further, this function was added to
 arch/sparc/lib/multi3.S
 since gcc doing the same "optimization" there.
 Adding asm code doesn't look right to me. I'd rather push
 gcc folks to avoid such codegen.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   7c5556decd0a629e9ee02e93653f75ba7b7da03c
commit: 638f5b90d46016372a8e3e0a434f199cc5e12b8c [488/665] bpf: reduce verifier memory consumption
config: mips-64r6el_defconfig (attached as .config)
compiler: mips64el-linux-gnuabi64-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        git checkout 638f5b90d46016372a8e3e0a434f199cc5e12b8c
        # save the attached .config to linux build tree
        make.cross ARCH=mips

All errors (new ones prefixed by >>):

   kernel/bpf/verifier.o: In function `realloc_verifier_state.isra.19':
>> verifier.c:(.text+0x36fc): undefined reference to `__multi3'
   crypto/scompress.o: In function `.L82':
   scompress.c:(.text+0x55c): undefined reference to `__multi3'
   lib/mpi/generic_mpih-mul1.o: In function `.L2':
   generic_mpih-mul1.c:(.text+0x60): undefined reference to `__multi3'
   lib/mpi/generic_mpih-mul2.o: In function `.L2':
   generic_mpih-mul2.c:(.text+0x5c): undefined reference to `__multi3'
   lib/mpi/generic_mpih-mul3.o: In function `.L2':
   generic_mpih-mul3.c:(.text+0x5c): undefined reference to `__multi3'
   lib/mpi/mpih-div.o:mpih-div.c:(.text+0x1b8): more undefined references to `__multi3' follow

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation




Re: [net-next:master 488/665] verifier.c:undefined reference to `__multi3'

2017-11-11 Thread Fengguang Wu

On Sun, Nov 12, 2017 at 09:14:14AM +0800, Alexei Starovoitov wrote:
> On 11/12/17 8:23 AM, kbuild test robot wrote:
> > tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
> > head:   7c5556decd0a629e9ee02e93653f75ba7b7da03c
> > commit: 638f5b90d46016372a8e3e0a434f199cc5e12b8c [488/665] bpf: reduce verifier memory consumption
> > config: mips-64r6el_defconfig (attached as .config)
> > compiler: mips64el-linux-gnuabi64-gcc (Debian 7.2.0-11) 7.2.0
> > reproduce:
> >         git checkout 638f5b90d46016372a8e3e0a434f199cc5e12b8c
> >         # save the attached .config to linux build tree
> >         make.cross ARCH=mips
> >
> > All errors (new ones prefixed by >>):
> >
> >    kernel/bpf/verifier.o: In function `realloc_verifier_state.isra.19':
> > >> verifier.c:(.text+0x36fc): undefined reference to `__multi3'
>
> that's a known issue with gcc 7 on mips that is "optimizing"
> normal 64-bit multiply into 128-bit variant.
> Nothing to fix on the kernel side.


Good to know that! Do you think it a good idea to blacklist __multi3
errors in mips builds?
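
If such a blacklist were adopted, a filter over the build log might look
like the sketch below. It is an assumption about one possible
implementation, not what the robot does; filter_known_errors is a
hypothetical helper name, and the "In function" context lines that
precede a dropped error are left in place.

```shell
# Hypothetical helper: drop known-noise linker errors from a build log
# read on stdin. $1 is the build architecture; only mips logs are filtered.
filter_known_errors() {
    arch=$1
    if [ "$arch" = mips ]; then
        # Matches both "undefined reference to" and
        # "more undefined references to" lines that name __multi3.
        # grep -v exits non-zero when nothing is left, hence "|| true".
        grep -vE 'undefined references? to `__multi3' || true
    else
        cat
    fi
}

# Example (log name is an assumption):
#   filter_known_errors mips < build.log
```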

Thanks,
Fengguang


> >    crypto/scompress.o: In function `.L82':
> >    scompress.c:(.text+0x55c): undefined reference to `__multi3'
> >    lib/mpi/generic_mpih-mul1.o: In function `.L2':
> >    generic_mpih-mul1.c:(.text+0x60): undefined reference to `__multi3'
> >    lib/mpi/generic_mpih-mul2.o: In function `.L2':
> >    generic_mpih-mul2.c:(.text+0x5c): undefined reference to `__multi3'
> >    lib/mpi/generic_mpih-mul3.o: In function `.L2':
> >    generic_mpih-mul3.c:(.text+0x5c): undefined reference to `__multi3'
> >    lib/mpi/mpih-div.o:mpih-div.c:(.text+0x1b8): more undefined references to `__multi3' follow





Re: [run_timer_softirq] BUG: unable to handle kernel paging request at 0000000000010007

2017-11-11 Thread Fengguang Wu

On Fri, Nov 10, 2017 at 10:29:59PM +0100, Thomas Gleixner wrote:
> On Fri, 10 Nov 2017, Linus Torvalds wrote:
> > On Wed, Nov 8, 2017 at 9:19 PM, Fengguang Wu <fengguang...@intel.com> wrote:
> > >
> > > Yes it's accessing the list. Here is the faddr2line output.
> >
> > Ok, so it's a corrupted timer list. Which is not a big surprise.
> >
> > It's
> >
> >         next->pprev = pprev;
> >
> > in __hlist_del(), and the trapping instruction decodes as
> >
> >         mov    %rdx,0x8(%rax)
> >
> > with %rax having the value dead000000000200,
> >
> > Which is just LIST_POISON2.
> >
> > So we've deleted that entry twice - LIST_POISON2 is what hlist_del()
> > sets pprev to after already deleting it once.
> >
> > Although in this case it might not be hlist_del(), because
> > detach_timer() also sets entry->next to LIST_POISON2.
> >
> > Which is pretty bogus, we are supposed to use LIST_POISON1 for the
> > "next" pointer. Oh well. Nobody cares, except for the list entry
> > debugging code, which isn't run on the hlist cases.
> >
> > Adding Thomas Gleixner to the cc. It should not be possible to delete
> > the same timer twice.
>
> Right, it shouldn't.
>
> Fengguang, can you please enable:
>
>         CONFIG_DEBUG_OBJECTS
>         CONFIG_DEBUG_OBJECTS_TIMERS
>
> and try to reproduce? Debugobject should catch that hopefully.


Sure. However I've not got any results until now -- it's rather hard
to reproduce. I'll check possible results tomorrow.

Regards,
Fengguang


Re: [Patch net] vlan: fix a use-after-free in vlan_device_event()

2017-11-10 Thread Fengguang Wu

It works, thank you for fixing this ancient bug!

Tested-by: Fengguang Wu <fengguang...@intel.com>


Re: [vlan_device_event] BUG: unable to handle kernel paging request at 6b6b6ccf

2017-11-08 Thread Fengguang Wu

On Thu, Nov 09, 2017 at 02:55:10PM +0800, Fengguang Wu wrote:

On Wed, Nov 08, 2017 at 10:34:10PM -0800, Cong Wang wrote:

On Wed, Nov 8, 2017 at 7:12 PM, Fengguang Wu <fengguang...@intel.com> wrote:

Hi Alex,


So looking over the trace the panic seems to be happening after a
decnet interface is getting deleted. Is there any chance we could try
compiling the kernel without decnet support to see if that is the
source of these issues? I don't know if anyone on the Intel Wired Lan
team is testing with that enabled so if we can eliminate that as a
possible cause that would be useful.



Sure and thank you for the suggestion!

It looks like disabling DECNET still triggers the vlan_device_event BUG.
However, when looking at the dmesgs, I found another warning just before
the vlan_device_event BUG. Not sure whether it's related or an
independent, now-fixed issue.


Those decnet symbols are probably noise.


Yes it's not related to CONFIG_DECNET.


How do you reproduce it? And what is your setup? Vlan device on
top of your eth0 (e1000)?


It can basically be reproduced in one of our test machines --
lkp-wsx03, which is a Westmere EX server.


Anyway if you'd like to try, here are the steps. It'll auto download
the images and run QEMU.

   apt-get install lib32gcc-7-dev # or lib32gcc-6-dev
   git clone https://github.com/intel/lkp-tests.git
   cd lkp-tests
   bin/lkp qemu -k  job-script  # job-script is attached in this email

Note that even in our lkp-wsx03 machine, the chance of reproducing it
is only 3% (3 out of 100 boots).
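
At a 3% reproduction rate a handful of clean boots proves little.
Assuming boots are independent, the number of boots n needed to hit the
bug at least once with confidence c satisfies 1 - (1 - p)^n >= c, where
p is the per-boot hit rate. The sketch below computes the smallest such
n; boots_needed is a hypothetical helper, not part of lkp-tests. For
p = 3/100 and c = 0.95 it gives 99 boots.

```shell
# Hypothetical helper: print the smallest n with 1 - (1 - p)^n >= conf,
# where p = hits / boots is the observed per-boot reproduction rate.
#   $1 hits, $2 boots, $3 confidence (e.g. 0.95)
boots_needed() {
    hits=$1
    boots=$2
    conf=$3
    awk -v h="$hits" -v b="$boots" -v c="$conf" 'BEGIN {
        p = h / b
        x = log(1 - c) / log(1 - p)   # real-valued solution for n
        n = int(x)
        if (n < x) n++                # round up to a whole boot
        print n
    }'
}

# Example:
#   boots_needed 3 100 0.95
```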

Thanks,
Fengguang
#!/bin/sh

export_top_env()
{
export suite='trinity'
export testcase='trinity'
export runtime=300
export job_origin='/lkp/lkp/src/allot/rand/vm-lkp-wsx03-openwrt-i386/trinity.yaml'
export testbox='vm-lkp-wsx03-openwrt-i386-5'
export tbox_group='vm-lkp-wsx03-openwrt-i386'
export kconfig='i386-randconfig-b0-11061302-CONFIG_DRM_BOCHS'
export compiler='gcc-6'
export queue='wfg'
export branch='linus/master'
export commit='c470abd4fde40ea6a0846a2beab642a578c0b8cd'
export submit_id='5a03a4550b9a93f7c99708b0'
export job_file='/lkp/scheduled/vm-lkp-wsx03-openwrt-i386-5/trinity-300s-openwrt-i386-2016-03-16.cgz-c470abd4fde40ea6a0846a2beab642a578c0b8cd-20171109-63433-kf9gj3-wait_kernel-0.yaml'
export id='181954ca4367d475b88dc8de99b2d52ab533a5e1'
export model='qemu-system-i386 -enable-kvm'
export nr_vm=32
export nr_cpu=1
export memory='320M'
export rootfs='openwrt-i386-2016-03-16.cgz'
export hdd_partitions='/dev/vda'
export swap_partitions='/dev/vdb'
export need_kconfig='CONFIG_KVM_GUEST=y'
export enqueue_time='2017-11-09 08:41:58 +0800'
export _id='5a03a4560b9a93f7c99708bb'
export _rt='/result/trinity/300s/vm-lkp-wsx03-openwrt-i386/openwrt-i386-2016-03-16.cgz/i386-randconfig-b0-11061302-CONFIG_DRM_BOCHS/gcc-6/c470abd4fde40ea6a0846a2beab642a578c0b8cd'
export user='lkp'
export result_root='/result/trinity/300s/vm-lkp-wsx03-openwrt-i386/openwrt-i386-2016-03-16.cgz/i386-randconfig-b0-11061302-CONFIG_DRM_BOCHS/gcc-6/c470abd4fde40ea6a0846a2beab642a578c0b8cd/0'
export LKP_SERVER='inn'
export max_uptime=1500
export initrd='/osimage/openwrt/openwrt-i386-2016-03-16.cgz'
export bootloader_append='root=/dev/ram0
user=lkp
job=/lkp/scheduled/vm-lkp-wsx03-openwrt-i386-5/trinity-300s-openwrt-i386-2016-03-16.cgz-c470abd4fde40ea6a0846a2beab642a578c0b8cd-20171109-63433-kf9gj3-wait_kernel-0.yaml
ARCH=i386
kconfig=i386-randconfig-b0-11061302-CONFIG_DRM_BOCHS
branch=linus/master
commit=c470abd4fde40ea6a0846a2beab642a578c0b8cd
BOOT_IMAGE=/pkg/linux/i386-randconfig-b0-11061302-CONFIG_DRM_BOCHS/gcc-6/c470abd4fde40ea6a0846a2beab642a578c0b8cd/vmlinuz-4.10.0
max_uptime=1500
RESULT_ROOT=/result/trinity/300s/vm-lkp-wsx03-openwrt-i386/openwrt-i386-2016-03-16.cgz/i386-randconfig-b0-11061302-CONFIG_DRM_BOCHS/gcc-6/c470abd4fde40ea6a0846a2beab642a578c0b8cd/0
LKP_SERVER=inn
debug
apic=debug
sysrq_always_enabled
rcupdate.rcu_cpu_stall_timeout=100
net.ifnames=0
printk.devkmsg=on
panic=-1
softlockup_panic=1
nmi_watchdog=panic
oops=panic
load_ramdisk=2
prompt_ramdisk=0
drbd.minor_count=8
systemd.log_level=err
ignore_loglevel
console=tty0
earlyprintk=ttyS0,115200
console=ttyS0,115200
vga=normal
rw'
export lkp_initrd='/lkp/lkp/lkp-i386.cgz'
export bm_initrd='/osimage/pkg/static/trinity-i386.cgz'
export site='inn'
export LKP_CGI_PORT=80
export LKP_CIFS_PORT=139
export vmlinux_file='/pkg/linux/i386-randconfig-b0-11061302-CONFIG_DRM_BOCHS/gcc-6/c470abd4fde40ea6a0846a2beab642a578c0b8cd/vmlinux'
export kernel='/pkg/linux/i386-randconfig-b0-11061302-CONFIG_DRM_BOCHS/gcc-6/c470abd4fde40ea6a0846a2beab642a578c0b8cd/vmlinuz-4.10.0'
export dequeue_time='2017-11-09 09:06:15 +0800'
export 
job_initrd

Re: [vlan_device_event] BUG: unable to handle kernel paging request at 6b6b6ccf

2017-11-08 Thread Fengguang Wu

On Thu, Nov 09, 2017 at 12:09:45PM +0800, Fengguang Wu wrote:

On Thu, Nov 09, 2017 at 11:12:06AM +0800, Fengguang Wu wrote:

Hi Alex,


So looking over the trace the panic seems to be happening after a
decnet interface is getting deleted. Is there any chance we could try
compiling the kernel without decnet support to see if that is the
source of these issues? I don't know if anyone on the Intel Wired Lan
team is testing with that enabled so if we can eliminate that as a
possible cause that would be useful.


Sure and thank you for the suggestion!

It looks like disabling DECNET still triggers the vlan_device_event BUG.
However, when looking at the dmesgs, I found another warning just before
the vlan_device_event BUG. Not sure whether it's a related issue or an
independent, now-fixed one.

Please press Enter to activate this console.
[ 1291.938326] Writes:  Total: 2  Max/Min: 0/0   Fail: 0
[ 1297.731690] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 1297.828227] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1300.506245] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1302.467460] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
LKP: HOSTNAME vm-lkp-wsx03-openwrt-i386-10, MAC , kernel 4.13.0 1, serial console /dev/ttyS0
[ 1304.161688] Kernel tests: Boot OK!
[ 1306.558532] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1308.507499] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 1310.526380] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1311.246017] LKP: waiting for network...
[ 1312.543432] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 1313.985807]
[ 1313.991541] =
[ 1314.002398] WARNING: bad unlock balance detected!
[ 1314.013154] 4.13.0 #1 Not tainted
[ 1314.021549] -
[ 1314.032505] procd/1244 is trying to release lock (rcu_preempt_state) at:
[ 1314.047216] [] rcu_read_unlock_special+0x580/0x5b0
[ 1314.059825] but there are no more locks to release!
[ 1314.070546]
[ 1314.070546] other info that might help us debug this:
[ 1314.085941] 2 locks held by procd/1244:
[ 1314.095139]  #0:  (>cred_guard_mutex){..}, at: [] prepare_bprm_creds+0x28/0xc0
[ 1314.114616]  #1:  (rcu_read_lock){..}, at: [] path_init+0x490/0x6f0
[ 1314.132155]
[ 1314.132155] stack backtrace:
[ 1314.144402] CPU: 0 PID: 1244 Comm: procd Not tainted 4.13.0 #1
[ 1314.160197] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 1314.179404] Call Trace:
[ 1314.186768]  dump_stack+0x16/0x1c
[ 1314.195387]  print_unlock_imbalance_bug+0xb9/0xd0
[ 1314.205753]  ? rcu_read_unlock_special+0x580/0x5b0
[ 1314.216381]  ? rcu_read_unlock_special+0x580/0x5b0
[ 1314.226982]  lock_release+0x1cc/0x490
[ 1314.235602]  ? rcu_gp_kthread_wake+0x34/0x50
[ 1314.245262]  ? rcu_read_unlock_special+0x580/0x5b0
[ 1314.255724]  rt_mutex_unlock+0x1e/0xb0
[ 1314.264610]  rcu_read_unlock_special+0x580/0x5b0
[ 1314.274814]  __rcu_read_unlock+0xa7/0xb0
[ 1314.283954]  unlazy_walk+0xcf/0x1f0
[ 1314.292409]  trailing_symlink+0x349/0x4e0
[ 1314.301583]  path_openat+0x333/0x1280
[ 1314.310197]  do_filp_open+0x67/0x140
[ 1314.318696]  ? getname_kernel+0x23/0x1e0
[ 1314.327766]  ? cache_alloc_debugcheck_after+0x13a/0x2a0
[ 1314.340076]  ? getname_kernel+0x23/0x1e0
[ 1314.349179]  do_open_execat+0xab/0x2a0
[ 1314.358063]  open_exec+0x57/0x80
[ 1314.366128]  load_script+0x33c/0x3d0
[ 1314.374556]  ? kvm_sched_clock_read+0x9/0x20
[ 1314.384219]  ? sched_clock+0x9/0x10
[ 1314.392611]  ? sched_clock_cpu+0x1a/0x1e0
[ 1314.401875]  ? _raw_read_unlock+0x55/0x90
[ 1314.411080]  search_binary_handler+0xd9/0x160
[ 1314.420799]  do_execveat_common+0x8f6/0xb10
[ 1314.430334]  SyS_execve+0x1f/0x30
[ 1314.438458]  do_int80_syscall_32+0x95/0x1b0
[ 1314.447956]  entry_INT80_32+0x2f/0x2f
[ 1314.456606] EIP: 0xb7e9ab07
[ 1314.464062] EFLAGS: 0296 CPU: 0
[ 1314.472421] EAX: ffda EBX: 0807b584 ECX: bfb0fd70 EDX: 08061250
[ 1314.485257] ESI: 0807b584 EDI:  EBP: bfb0fd58 ESP: bfb0fd28
[ 1314.498024]  DS: 007b ES: 007b FS:  GS: 0033 SS: 007b
[ 1314.613681] hotplug-call (1244) used greatest stack depth: 6384 bytes left
[ 1314.957636] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1316.955154] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 1318.197800] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1320.222754] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 1321.409456] BUG: unable to handle kernel paging request at 6b6b6f4f
[ 1321.421942] IP: vlan_device_event+0x7f5/0xa40
[ 1321.431239] *pde = 


Note that this call trace is different from the ones posted in earlier
emails.

[ 1321.431267]
[ 1321.443356] Oops:  [#1] PREEMPT
[ 1321.451390] CPU: 0 PID: 798 Comm: netifd Not tainted 4.13.0 #1
[ 1321.462802] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 1321.479743] task: cf8ae

Re: [vlan_device_event] BUG: unable to handle kernel paging request at 6b6b6ccf

2017-11-08 Thread Fengguang Wu

On Wed, Nov 08, 2017 at 10:34:10PM -0800, Cong Wang wrote:

On Wed, Nov 8, 2017 at 7:12 PM, Fengguang Wu <fengguang...@intel.com> wrote:

Hi Alex,


So looking over the trace the panic seems to be happening after a
decnet interface is getting deleted. Is there any chance we could try
compiling the kernel without decnet support to see if that is the
source of these issues? I don't know if anyone on the Intel Wired Lan
team is testing with that enabled so if we can eliminate that as a
possible cause that would be useful.



Sure and thank you for the suggestion!

It looks like disabling DECNET still triggers the vlan_device_event BUG.
However, when looking at the dmesgs, I found another warning just before
the vlan_device_event BUG. Not sure whether it's a related issue or an
independent, now-fixed one.


Those decnet symbols are probably noise.


Yes it's not related to CONFIG_DECNET.


How do you reproduce it? And what is your setup? Vlan device on
top of your eth0 (e1000)?


It can basically be reproduced in one of our test machines --
lkp-wsx03, which is a Westmere EX server.

The test boots an openwrt image in QEMU and runs trinity for a few minutes.

Here is the openwrt image:

https://github.com/0day-ci/lkp-qemu/blob/master/osimage/openwrt/openwrt-i386-2016-03-16.cgz

I don't know how openwrt deals with VLAN. :)

Thanks,
Fengguang


Re: [vlan_device_event] BUG: unable to handle kernel paging request at 6b6b6ccf

2017-11-08 Thread Fengguang Wu

On Thu, Nov 09, 2017 at 10:43:08AM +0800, Fengguang Wu wrote:

Of course, if it's bisectable, that would be great too.


Yes, bisect is on the way. So far it's bisecting in the 4.12 commits.


The bisect was unsuccessful due to an unrelated DRM_BOCHS oops in 4.11.
After disabling the buggy driver, I managed to reproduce the
vlan_device_event bug in 4.11, only to find that the older kernels
suffer from different kinds of oops, which makes the bisect troublesome.

% grep -E 'dmesg.(BUG|EIP)' v4.*/matrix.json
v4.10/matrix.json:  "dmesg.BUG:unable_to_handle_kernel": [
v4.10/matrix.json:  "dmesg.EIP:kobject_get": [
v4.10/matrix.json:  "dmesg.BUG:kernel_reboot-without-warning_in_test_stage": [
v4.11/matrix.json:  "dmesg.BUG:unable_to_handle_kernel": [
v4.11/matrix.json:  "dmesg.EIP:vlan_device_event": [
v4.8/matrix.json:  "dmesg.BUG:kernel_reboot-without-warning_in_early-boot_stage,last_printk:Decompressing_Linux": [
v4.9/matrix.json:  "dmesg.BUG:key_not_in.data": [
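
The same survey can be done in a script, which is handy when the result feeds bisect automation. This is only a sketch: it assumes each matrix.json is a JSON object whose top-level keys are stat names like the ones grep printed above, and the function names are made up for illustration.

```python
import json
import re
from pathlib import Path

# Same pattern as the grep above; the unescaped dot is kept on purpose.
PATTERN = re.compile(r"dmesg.(BUG|EIP)")

def oops_keys(matrix: dict) -> list[str]:
    """Return the sorted matrix keys that record a BUG or a faulting EIP."""
    return sorted(key for key in matrix if PATTERN.search(key))

def summarize(root: Path) -> dict[str, list[str]]:
    """Map each v4.* directory under root to its oops keys."""
    result = {}
    for path in sorted(root.glob("v4.*/matrix.json")):
        with path.open() as fh:
            result[path.parent.name] = oops_keys(json.load(fh))
    return result

if __name__ == "__main__":
    for version, keys in summarize(Path(".")).items():
        print(version, keys)
```

A version whose key list does not contain the EIP under test but does contain some other BUG is a likely bisect stopper.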

I'll try tuning kconfig to get a good bisect base.

Hopefully we'll make the coming RC releases free from such bisect
stoppers with better regression tracking and bisect automation.


However, if the situation turns out to be less rosy, we might
track "bisect stopper" kconfig options across kernel releases, so that
bisect scripts can pick them up automatically.

An example kconfig bisect-blacklist file would be:

   $ cat tools/testing/configs/bisect-4.12.config
   # CONFIG_DRM_BOCHS is not set

That would mean:

   CONFIG_DRM_BOCHS will reliably stop the kernel from booting before 4.12.
   When bisecting into pre-4.12 kernels, it is better to disable it.
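
A bisect script could consume such files roughly as follows. This is a sketch of the proposal only: the bisect-X.Y.config naming comes from the example above, while the helper names and the merge behavior are assumptions.

```python
import re

FRAGMENT_RE = re.compile(r"bisect-(\d+)\.(\d+)\.config$")
OPTION_RE = re.compile(r"^(?:# )?(CONFIG_\w+)[ =]")

def applies(fragment_name: str, kernel: tuple[int, int]) -> bool:
    """True if the kernel under test is older than the fragment's version."""
    m = FRAGMENT_RE.search(fragment_name)
    return bool(m) and kernel < (int(m.group(1)), int(m.group(2)))

def merge(config: str, fragment: str) -> str:
    """Override options in config with the lines from fragment."""
    overrides = {}
    for line in fragment.splitlines():
        m = OPTION_RE.match(line.strip())
        if m:
            overrides[m.group(1)] = line.strip()
    out = []
    for line in config.splitlines():
        m = OPTION_RE.match(line)
        name = m.group(1) if m else None
        # Replace the first mention of an overridden option in place.
        out.append(overrides.pop(name) if name in overrides else line)
    out.extend(overrides.values())  # options the config did not mention
    return "\n".join(out) + "\n"
```

The merged file would still need a pass through `make olddefconfig` so that dependent options are resolved by kconfig itself.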

The best fit would be the uncommon drivers and optional features.

Warnings are typically not bisect stoppers, so they can be ignored. Kernel
panics that happen too rarely to impact a bisect may also be ignored.

Thanks,
Fengguang


Re: [vlan_device_event] BUG: unable to handle kernel paging request at 6b6b6ccf

2017-11-08 Thread Fengguang Wu

On Thu, Nov 09, 2017 at 11:12:06AM +0800, Fengguang Wu wrote:

Hi Alex,


So looking over the trace the panic seems to be happening after a
decnet interface is getting deleted. Is there any chance we could try
compiling the kernel without decnet support to see if that is the
source of these issues? I don't know if anyone on the Intel Wired Lan
team is testing with that enabled so if we can eliminate that as a
possible cause that would be useful.


Sure and thank you for the suggestion!

It looks like disabling DECNET still triggers the vlan_device_event BUG.
However, when looking at the dmesgs, I found another warning just before
the vlan_device_event BUG. Not sure whether it's a related issue or an
independent, now-fixed one.

Please press Enter to activate this console.
[ 1291.938326] Writes:  Total: 2  Max/Min: 0/0   Fail: 0
[ 1297.731690] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 1297.828227] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1300.506245] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1302.467460] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
LKP: HOSTNAME vm-lkp-wsx03-openwrt-i386-10, MAC , kernel 4.13.0 1, serial console /dev/ttyS0
[ 1304.161688] Kernel tests: Boot OK!
[ 1306.558532] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1308.507499] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 1310.526380] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1311.246017] LKP: waiting for network...
[ 1312.543432] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 1313.985807]
[ 1313.991541] =
[ 1314.002398] WARNING: bad unlock balance detected!
[ 1314.013154] 4.13.0 #1 Not tainted
[ 1314.021549] -
[ 1314.032505] procd/1244 is trying to release lock (rcu_preempt_state) at:
[ 1314.047216] [] rcu_read_unlock_special+0x580/0x5b0
[ 1314.059825] but there are no more locks to release!
[ 1314.070546]
[ 1314.070546] other info that might help us debug this:
[ 1314.085941] 2 locks held by procd/1244:
[ 1314.095139]  #0:  (>cred_guard_mutex){..}, at: [] prepare_bprm_creds+0x28/0xc0
[ 1314.114616]  #1:  (rcu_read_lock){..}, at: [] path_init+0x490/0x6f0
[ 1314.132155]
[ 1314.132155] stack backtrace:
[ 1314.144402] CPU: 0 PID: 1244 Comm: procd Not tainted 4.13.0 #1
[ 1314.160197] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 1314.179404] Call Trace:
[ 1314.186768]  dump_stack+0x16/0x1c
[ 1314.195387]  print_unlock_imbalance_bug+0xb9/0xd0
[ 1314.205753]  ? rcu_read_unlock_special+0x580/0x5b0
[ 1314.216381]  ? rcu_read_unlock_special+0x580/0x5b0
[ 1314.226982]  lock_release+0x1cc/0x490
[ 1314.235602]  ? rcu_gp_kthread_wake+0x34/0x50
[ 1314.245262]  ? rcu_read_unlock_special+0x580/0x5b0
[ 1314.255724]  rt_mutex_unlock+0x1e/0xb0
[ 1314.264610]  rcu_read_unlock_special+0x580/0x5b0
[ 1314.274814]  __rcu_read_unlock+0xa7/0xb0
[ 1314.283954]  unlazy_walk+0xcf/0x1f0
[ 1314.292409]  trailing_symlink+0x349/0x4e0
[ 1314.301583]  path_openat+0x333/0x1280
[ 1314.310197]  do_filp_open+0x67/0x140
[ 1314.318696]  ? getname_kernel+0x23/0x1e0
[ 1314.327766]  ? cache_alloc_debugcheck_after+0x13a/0x2a0
[ 1314.340076]  ? getname_kernel+0x23/0x1e0
[ 1314.349179]  do_open_execat+0xab/0x2a0
[ 1314.358063]  open_exec+0x57/0x80
[ 1314.366128]  load_script+0x33c/0x3d0
[ 1314.374556]  ? kvm_sched_clock_read+0x9/0x20
[ 1314.384219]  ? sched_clock+0x9/0x10
[ 1314.392611]  ? sched_clock_cpu+0x1a/0x1e0
[ 1314.401875]  ? _raw_read_unlock+0x55/0x90
[ 1314.411080]  search_binary_handler+0xd9/0x160
[ 1314.420799]  do_execveat_common+0x8f6/0xb10
[ 1314.430334]  SyS_execve+0x1f/0x30
[ 1314.438458]  do_int80_syscall_32+0x95/0x1b0
[ 1314.447956]  entry_INT80_32+0x2f/0x2f
[ 1314.456606] EIP: 0xb7e9ab07
[ 1314.464062] EFLAGS: 0296 CPU: 0
[ 1314.472421] EAX: ffda EBX: 0807b584 ECX: bfb0fd70 EDX: 08061250
[ 1314.485257] ESI: 0807b584 EDI:  EBP: bfb0fd58 ESP: bfb0fd28
[ 1314.498024]  DS: 007b ES: 007b FS:  GS: 0033 SS: 007b
[ 1314.613681] hotplug-call (1244) used greatest stack depth: 6384 bytes left
[ 1314.957636] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1316.955154] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 1318.197800] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1320.222754] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 1321.409456] BUG: unable to handle kernel paging request at 6b6b6f4f
[ 1321.421942] IP: vlan_device_event+0x7f5/0xa40
[ 1321.431239] *pde = 


Note that this call trace is different from the ones posted in earlier
emails.

[ 1321.431267]
[ 1321.443356] Oops:  [#1] PREEMPT
[ 1321.451390] CPU: 0 PID: 798 Comm: netifd Not tainted 4.13.0 #1
[ 1321.462802] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 1321.479743] task: cf8ae5c0 task.stack: cf114000
[ 1321.489345] EIP: vlan_device_event+0x7f5/0x

Re: [vlan_device_event] BUG: unable to handle kernel paging request at 6b6b6ccf

2017-11-08 Thread Fengguang Wu

Of course, if it's bisectable, that would be great too.


Yes, bisect is on the way. So far it's bisecting in the 4.12 commits.


The bisect was unsuccessful due to an unrelated DRM_BOCHS oops in 4.11.
After disabling the buggy driver, I managed to reproduce the
vlan_device_event bug in 4.11, only to find that the older kernels
suffer from different kinds of oops, which makes the bisect troublesome.

% grep -E 'dmesg.(BUG|EIP)' v4.*/matrix.json
v4.10/matrix.json:  "dmesg.BUG:unable_to_handle_kernel": [
v4.10/matrix.json:  "dmesg.EIP:kobject_get": [
v4.10/matrix.json:  "dmesg.BUG:kernel_reboot-without-warning_in_test_stage": [
v4.11/matrix.json:  "dmesg.BUG:unable_to_handle_kernel": [
v4.11/matrix.json:  "dmesg.EIP:vlan_device_event": [
v4.8/matrix.json:  "dmesg.BUG:kernel_reboot-without-warning_in_early-boot_stage,last_printk:Decompressing_Linux": [
v4.9/matrix.json:  "dmesg.BUG:key_not_in.data": [

I'll try tuning kconfig to get a good bisect base.

Hopefully we'll make the coming RC releases free from such bisect
stoppers with better regression tracking and bisect automation.

Regards,
Fengguang


Re: [vlan_device_event] BUG: unable to handle kernel paging request at 6b6b6ccf

2017-11-08 Thread Fengguang Wu

[...]

I notice that this trace shows no additional inline information at all.
Is it because I set some kconfig option wrong, so that the inline info is
lost? E.g.

CONFIG_OPTIMIZE_INLINING=y (reading lib/Kconfig.debug, it looks better set to N)
CONFIG_DEBUG_INFO_REDUCED=y
CONFIG_DEBUG_INFO_SPLIT=y
(full .config attached)


FYI it's randconfig testing, so may well contain absurd settings --
unless we add rules to force enable/disable some specific options to
make life easier.

Regards,
Fengguang


[  745.719623] BUG: unable to handle kernel paging request at 6b6b6f4f
[  745.732871] IP: vlan_device_event+0x7f5/0xa40:
vlan_device_event at net/8021q/vlan.h:60
[  745.742106] *pde = 
[  745.748587] Oops:  [#1] PREEMPT
[  745.756104] CPU: 0 PID: 786 Comm: netifd Not tainted 4.14.0-rc8 #1
[  745.769171] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[  745.786791] task: cf768780 task.stack: d187a000
[  745.796485] EIP: vlan_device_event+0x7f5/0xa40:
vlan_device_event at net/8021q/vlan.h:60
[  745.805877] EFLAGS: 00010206 CPU: 0
[  745.813237] EAX: 00f9 EBX: 0002 ECX:  EDX: 6b6b6b6b
[  745.825774] ESI: 02f9 EDI: d1de3700 EBP: d187bdd8 ESP: d187bda4
[  745.838871]  DS: 007b ES: 007b FS:  GS: 0033 SS: 0068
[  745.850218] CR0: 80050033 CR2: 6b6b6f4f CR3: 0f4c8000 CR4: 0690
[  745.862750] Call Trace:
[  745.868650]  ? dn_dev_delete+0x138/0x1d0:
dn_dev_delete at net/decnet/dn_dev.c:1224
[  745.876751]  ? dn_dev_down+0x69/0x80:
dn_dev_down at net/decnet/dn_dev.c:1240
[  745.885084]  notifier_call_chain+0x4e/0xa0:
notifier_call_chain at kernel/notifier.c:95 (discriminator 1)
[  745.894254]  raw_notifier_call_chain+0xc/0x10:
raw_notifier_call_chain at kernel/notifier.c:402
[  745.903979]  call_netdevice_notifiers_info+0x59/0x90:
call_netdevice_notifiers_info at net/core/dev.c:1672
[  745.914670]  __dev_notify_flags+0xea/0x130:
__dev_notify_flags at net/core/dev.c:1687
[  745.923446]  dev_change_flags+0x60/0x70:
dev_change_flags at net/core/dev.c:6813
[  745.931679]  dev_ifsioc+0x49b/0x610:
dev_ifsioc at net/core/dev_ioctl.c:257
[  745.939102]  ? mutex_lock_nested+0x14/0x20:
mutex_lock_nested at kernel/locking/mutex.c:909
[  745.948173]  dev_ioctl+0x36f/0xb20:
dev_ioctl at net/core/dev_ioctl.c:566
[  745.956154]  sock_ioctl+0x1cd/0x350:
sock_ioctl at net/socket.c:968
[  745.964313]  ? sock_fasync+0xb0/0xb0:
sock_ioctl at net/socket.c:984
[  745.972512]  vfs_ioctl+0x33/0x70:
vfs_ioctl at fs/ioctl.c:47
[  745.979867]  do_vfs_ioctl+0x8d/0xc60:
do_vfs_ioctl at fs/ioctl.c:690
[  745.987782]  ? kmem_cache_free+0x186/0x290:
kmem_cache_free at include/linux/rcupdate.h:777
[  745.996138]  ? putname+0x9f/0xe0:
putname at fs/namei.c:259
[  746.003434]  ? putname+0x9f/0xe0:
putname at fs/namei.c:259
[  746.011240]  ? do_sys_open+0x28d/0x420:
do_sys_open at fs/open.c:1069
[  746.019728]  ? __fget_light+0xb7/0xf0:
__fget_light at fs/file.c:739 (discriminator 2)
bad symbol size: base: 0xc1277040 end: 0xc1277040
[  746.029292]  SyS_ioctl+0x98/0xb0
[  746.036680]  do_int80_syscall_32+0x95/0x290:
do_int80_syscall_32 at arch/x86/entry/common.c:329
[  746.045685]  entry_INT80_32+0x2f/0x2f:
restore_all at arch/x86/entry/entry_32.S:536
[  746.053427] EIP: 0xb7e97648
[  746.059907] EFLAGS: 0246 CPU: 0
[  746.068336] EAX: ffda EBX: 0005 ECX: 8914 EDX: bfcaa350
[  746.081238] ESI: bfcaa350 EDI: bfcaa370 EBP: bfcaa388 ESP: bfcaa31c
[  746.093449]  DS: 007b ES: 007b FS:  GS: 0033 SS: 007b
[  746.103600] Code: 8d e0 db b4 c4 8d 56 01 89 14 8d e0 db b4 c4 0f 85 03 02 00 00 89 7d d4 31 f6 8b 7d d8 e9 84 00 00 00 8d 74 26 00 25 ff 01 00 00 <8b> 1c 82 31 d2 85 db 0f 95 c2 8b 04 95 cc db b4 c4 83 c0 01 85
[  746.140089] EIP: vlan_device_event+0x7f5/0xa40:
vlan_device_event at net/8021q/vlan.h:60 SS:ESP: 0068:d187bda4
[  746.153505] CR2: 6b6b6f4f
[  746.413297] Kernel tests: Boot OK!
[  748.237463] ---[ end trace 40505af7d815b57d ]---
[  748.281157] Kernel panic - not syncing: Fatal exception


Anyway, it does all mean that the use-after-free almost certainly is
that "array[]" access, which was the main suspect to begin with.
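
The register dump supports this reading, and the arithmetic can be checked with a short sketch. It relies on two assumptions: that the dump's "EAX: 00f9" stands for 0x000000f9 (the archive seems to have dropped leading zeros), and that the marked bytes `<8b> 1c 82` decode to `mov (%edx,%eax,4),%ebx`. The helper names are made up for illustration.

```python
POISON_FREE = 0x6B  # byte the slab allocator writes over freed objects

def is_poison_pointer(value: int, width: int = 4) -> bool:
    """True if every byte of a width-byte value equals POISON_FREE."""
    return value == int.from_bytes(bytes([POISON_FREE]) * width, "little")

def fault_address(base: int, index: int, scale: int = 4) -> int:
    """Effective address of mov (%base,%index,scale), wrapped to 32 bits."""
    return (base + index * scale) & 0xFFFFFFFF

edx = 0x6B6B6B6B   # EDX from the register dump: the array base pointer
eax = 0xF9         # EAX, assuming "00f9" means 0x000000f9: the array index
cr2 = 0x6B6B6F4F   # CR2: the address that faulted

if __name__ == "__main__":
    print("base is slab poison:", is_poison_pointer(edx))
    print("base + index*4 == CR2:", fault_address(edx, eax) == cr2)
```

A base pointer made entirely of 0x6b bytes means the pointer was loaded from memory that had already been freed, and CR2 is that poisoned base plus an ordinary array offset.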

I'm adding Jeff Kirsher and the e1000 list to the cc, since apparently
the e1000 driver is involved. Of course, all the stack traces are in
generic networking code, so it is possibly a generic networking issue,
but it would probably be good to have a few people look at it.

But all of this code is ancient. So it's almost certainly not a new
bug, whatever it is.

But as a test-case for your faddr2line improvements, this is excellent!

  Linus



Regards,
Fengguang



#
# Automatically generated file; DO NOT EDIT.
# Linux/i386 4.14.0-rc8 Kernel Configuration
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf32-i386"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=8

Re: [vlan_device_event] BUG: unable to handle kernel paging request at 6b6b6ccf

2017-11-08 Thread Fengguang Wu

Hi Linus,

On Wed, Nov 08, 2017 at 08:20:38AM -0800, Linus Torvalds wrote:

On Wed, Nov 8, 2017 at 1:48 AM, Fengguang Wu <fengguang...@intel.com> wrote:


Now I got the faddr2line output. :)


Thank you, but this also shows that you then compress the output too
much for convenience.


[  745.719623] BUG: unable to handle kernel paging request at 6b6b6f4f
[  745.732871] IP: vlan_device_event at net/8021q/vlan.h:60


So this looks more legible than "vlan_device_event+0x7f5/0xa40" (or
whatever), but in fact it's not.

The reason faddr2line is great is because it gives inlining
information, so that you can see exactly which access it is, even if
it's inside some helper inline function (macros still obviously are
going to be problematic).

And you've cut the whole information down, to the point where there is
_less_ information than there used to be.

So the full faddr2line output is actually important, and should look
something like this:

   __schedule+0x315/0x840:
   rq_sched_info_arrive at kernel/sched/stats.h:13
(inlined by) sched_info_arrive at kernel/sched/stats.h:99
(inlined by) __sched_info_switch at kernel/sched/stats.h:151
(inlined by) sched_info_switch at kernel/sched/stats.h:158
(inlined by) prepare_task_switch at kernel/sched/core.c:2582
(inlined by) context_switch at kernel/sched/core.c:2755
(inlined by) __schedule at kernel/sched/core.c:3366

which is admittedly a fairly extreme case, but it shows how involved
things can be.
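
For tooling that embeds faddr2line output into a dmesg, the chain can be kept whole by parsing it into frames rather than cutting it to one line. A minimal sketch, assuming the output layout shown above (a "symbol+offset/size:" header, then one "function at file:line" line per frame, with "(inlined by)" on the outer frames); the type and function names are hypothetical.

```python
import re
from typing import NamedTuple

class Frame(NamedTuple):
    function: str
    location: str   # "file:line" as printed
    inlined: bool   # True for "(inlined by)" lines

LINE_RE = re.compile(r"^(\(inlined by\) )?(\S+) at (\S+)")

def parse_faddr2line(text: str) -> dict[str, list[Frame]]:
    """Group frames under the 'symbol+off/size:' header that precedes them."""
    result: dict[str, list[Frame]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith(":") and " at " not in line:
            current = result.setdefault(line[:-1], [])
            continue
        m = LINE_RE.match(line)
        if m and current is not None:
            current.append(Frame(m.group(2), m.group(3), bool(m.group(1))))
    return result
```

The first frame is the innermost inline function where the access happens, and the last one is the out-of-line function named in the oops, so printing all frames preserves exactly the path that `tail -1` throws away.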


Indeed! I wasn't aware that faddr2line could produce such informative
output! After checking several of its outputs, I decided to pipe its
output to |tail -1 :/  That's clearly due to my lack of first-hand
experience in analyzing oopses!

IMHO the kind of informative demo in your email would be a very good
candidate for Documentation/. If you search for faddr2line under
Documentation, there is not a single mention of it (addr2line does show up
7 times).


Note how in my example above the access itself is from a header file
(that inlined rq_sched_info_arrive() function), and I happened to pick
the

   rq->rq_sched_info.pcount++;

line. But many inline functions are inlined from different points,
often many times in the same top-level function (think of the atomic
helper inlines, for example).

In your case, that net/8021q/vlan.h:60 information is literally not
helping, because it only shows what I could already figure out from
looking at the constants in the disassembly: the code comes from the

   vlan_group_for_each_dev(...) {
   ..

loop setup (it's the inline __vlan_group_get_device() function, and
the constants I could recognize are the VLAN_GROUP_ARRAY_PART_LEN
things).

But exactly like the constants didn't tell me *which* invocation of
that loop it was, your "net/8021q/vlan.h:60" doesn't tell me which one
it is.

There's at least _eight_ different "vlan_group_for_each_dev() {" loops
in vlan_device_event(), and while maybe it's not all that meaningful which
of the eight it is in this case, in other cases it's critical.

And since you've actually replaced the "vlan_device_event+0x7f5/0xa40"
with that "net/8021q/vlan.h:60" entirely, all the offset information
that *could* maybe have been used to pick one case over another (but
likely not) is also gone.

That's why that inlining chain is important - so that we can see how
it got to that access in __vlan_group_get_device().

So please do keep _all_ of the faddr2line output, at least for the
"IP:" line (the stack traces might not be worth it).


OK. Here is the original faddr2line output:

$ ~/linux/scripts/faddr2line vmlinux vlan_device_event+0x7f5/0xa40
vlan_device_event+0x7f5/0xa40:
vlan_device_event at net/8021q/vlan.h:60

And below is call trace embedded with full faddr2line output.

I notice that this trace shows no additional inline information at all.
Is it because I set some kconfig option wrong, so that the inline info is
lost? E.g.

CONFIG_OPTIMIZE_INLINING=y (reading lib/Kconfig.debug, it looks better set to N)
CONFIG_DEBUG_INFO_REDUCED=y
CONFIG_DEBUG_INFO_SPLIT=y
(full .config attached)
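
Since these are randconfig builds, a quick way to see how the debug-info options ended up in a given .config is to parse it. The sketch below only reports values; it makes no claim about which option is responsible for the missing inline info, and the function names are made up for illustration.

```python
import re

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")

def parse_config(text: str) -> dict[str, str]:
    """Map option name to its value; unset options map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = SET_RE.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m:
            options[m.group(1)] = "n"
    return options

def report(text: str, names: list[str]) -> dict[str, str]:
    """Value of each named option, or 'absent' if the file never mentions it."""
    options = parse_config(text)
    return {name: options.get(name, "absent") for name in names}

if __name__ == "__main__":
    wanted = ["CONFIG_OPTIMIZE_INLINING", "CONFIG_DEBUG_INFO_REDUCED",
              "CONFIG_DEBUG_INFO_SPLIT"]
    with open(".config") as fh:
        for name, value in report(fh.read(), wanted).items():
            print(f"{name}={value}")
```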

[  745.719623] BUG: unable to handle kernel paging request at 6b6b6f4f
[  745.732871] IP: vlan_device_event+0x7f5/0xa40:
vlan_device_event at net/8021q/vlan.h:60
[  745.742106] *pde = 
[  745.748587] Oops:  [#1] PREEMPT
[  745.756104] CPU: 0 PID: 786 Comm: netifd Not tainted 4.14.0-rc8 #1
[  745.769171] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[  745.786791] task: cf768780 task.stack: d187a000
[  745.796485] EIP: vlan_device_event+0x7f5/0xa40:
vlan_device_event at net/8021q/vlan.h:60
[  745.805877] EFLAGS: 00010206 CPU: 0
[  745.813237] EAX: 00f9 EBX: 0002 ECX:  EDX: 6b6b6b6b
[  745.825774] ESI: 02f9 EDI: d1de3700 EBP: d187bdd8 ESP: d187bda4
[  745.838871]  DS: 007b ES: 007b FS:  GS: 0033 SS: 0068
[  745.850218] CR0: 80050033 CR2: 6b6b6f4f

Re: [mdiobus_free] BUG: KASAN: slab-out-of-bounds in _copy_from_user+0x5d/0x8f

2017-11-07 Thread Fengguang Wu

On Tue, Nov 07, 2017 at 08:04:25AM -0800, Linus Torvalds wrote:

On Tue, Nov 7, 2017 at 4:15 AM, Fengguang Wu <fengguang...@intel.com> wrote:

Sorry please ignore this report -- according to Andrey, old gcc may
well generate false KASAN reports.


Oh wow, this gcc is even older than the other one that caused slob problems..


Linux version 4.14.0-rc8 (kbuild@intel11) (gcc version 4.6.4 (Debian 4.6.4-7))


Yeah, please do upgrade your compilers to something from this decade.


Sure. I'll drop gcc 4.4, 4.6, 4.8 testing.

Fengguang


Re: [vlan_device_event] BUG: unable to handle kernel paging request at 6b6b6ccf

2017-11-07 Thread Fengguang Wu

On Tue, Nov 07, 2017 at 08:25:03AM -0800, Linus Torvalds wrote:

On Tue, Nov 7, 2017 at 2:21 AM, Fengguang Wu <fengguang...@intel.com> wrote:


FYI this happens in v4.14-rc8 -- it's not necessarily a new bug.


Probably not.

Looks like a use-after-free bug in vlan_device_event() judging by the
base pointer:

   ECX: 6b6b6b6b

this is one of those circumstances where having the faddr2line output
for that EIP would make it much easier to see exactly which access it
is that causes problems. There's lots of inlining going on, so without
that it's a pain to figure out.

The code is

   0: 31 c0                 xor    %eax,%eax
   2: 8d 76 00              lea    0x0(%esi),%esi
   5: 89 c2                 mov    %eax,%edx
   7: 89 c3                 mov    %eax,%ebx
   9: 81 e2 ff 0f 00 00     and    $0xfff,%edx
   f: 89 d1                 mov    %edx,%ecx
  11: c1 fb 0c              sar    $0xc,%ebx
  14: c1 e9 09              shr    $0x9,%ecx
  17: 8d 0c d9              lea    (%ecx,%ebx,8),%ecx
  1a: 8b 4c 8e 10           mov    0x10(%esi,%ecx,4),%ecx
  1e: 85 c9                 test   %ecx,%ecx
  20: 74 34                 je     0x56
  22: 81 e2 ff 01 00 00     and    $0x1ff,%edx
  28:* 8b 14 91              mov    (%ecx,%edx,4),%edx <-- trapping instruction
  2b: 85 d2                 test   %edx,%edx
  2d: 74 27                 je     0x56
  2f: f6 82 30 01 00 00 01  testb  $0x1,0x130(%edx)
  36: 74 1e                 je     0x56

and just by going by the constants in question (0xfff and 0x1ff), I
can see that it's one of

   vlan_group_for_each_dev(..) {
   ...
   }

things, but that's pretty much all I can tell.
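
For readers following the disassembly, the masks and shifts correspond to splitting the loop counter into a two-level array index. The sketch below mirrors the instructions one by one; the constant names and values (4096 VLAN IDs split into 8 parts of 512) follow my reading of the VLAN headers and should be treated as assumptions, and the function name is made up.

```python
VLAN_N_VID = 4096                    # assumed: number of VLAN IDs
VLAN_GROUP_ARRAY_SPLIT_PARTS = 8     # assumed: sub-arrays per protocol
VLAN_GROUP_ARRAY_PART_LEN = VLAN_N_VID // VLAN_GROUP_ARRAY_SPLIT_PARTS

def split_vlan_index(i: int) -> tuple[int, int, int]:
    """Split the loop counter the way the disassembly does."""
    proto = i >> 12          # sar $0xc,%ebx
    vid = i & 0xFFF          # and $0xfff,%edx
    part = vid >> 9          # shr $0x9,%ecx
    slot = vid & 0x1FF       # and $0x1ff,%edx
    return proto, part, slot

def outer_index(proto: int, part: int) -> int:
    """Flat index of the sub-array pointer: lea (%ecx,%ebx,8),%ecx."""
    return part + proto * VLAN_GROUP_ARRAY_SPLIT_PARTS

if __name__ == "__main__":
    proto, part, slot = split_vlan_index(0x2F9)
    print(proto, part, slot, outer_index(proto, part))
```

The first load fetches the sub-array pointer at the outer index, and the trapping instruction then reads entry `slot` through that pointer, so a stale sub-array pointer is what faults.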

Apparently we'll get that faddr2line output soon. In the meantime, I
think this is a real bug report but I don't see enough information to
really go on.


Got it. I should be able to get faddr2line output tomorrow.


Of course, if it's bisectable, that would be great too.


It looks reproducible enough to be bisectable. I'll try.

Regards,
Fengguang


Re: [sock_def_readable] WARNING: bad unlock balance detected!

2017-11-07 Thread Fengguang Wu

Sorry please ignore this report -- according to Peter:

This is fixed by commit:

 02a7c234e540 ("rcu: Suppress lockdep false-positive ->boost_mtx complaints")

The problem is that RCU boosting was mixing futex and !futex rt_mutex
ops.

On Tue, Nov 07, 2017 at 12:34:15PM +0800, Fengguang Wu wrote:

FYI, this warning shows up in both v4.14-rc8 and v4.13:

[  144.578809] br-lan: port 1(eth0) entered disabled state
[  144.581360] device eth0 left promiscuous mode
[  144.582699] br-lan: port 1(eth0) entered disabled state
[  144.685012]
[  144.685370] =
[  144.686093] WARNING: bad unlock balance detected!
[  144.686816] 4.14.0-rc8 #158 Not tainted
[  144.687420] -
[  144.688132] ubus/19703 is trying to release lock (rcu_preempt_state) at:
[  144.688911] [] rcu_read_unlock_special+0x5f8/0x620
[  144.689621] but there are no more locks to release!
[  144.690374]
[  144.690374] other info that might help us debug this:
[  144.691407] 1 lock held by ubus/19703:
[  144.692009]  #0:  (rcu_read_lock){}, at: [] sock_def_readable+0x0/0x100
[  144.693212]
[  144.693212] stack backtrace:
[  144.693980] CPU: 0 PID: 19703 Comm: ubus Not tainted 4.14.0-rc8 #158
[  144.694923] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[  144.696168] Call Trace:
[  144.696632]  dump_stack+0x16/0x1c
[  144.697195]  print_unlock_imbalance_bug+0xb9/0xd0
[  144.697940]  ? rcu_read_unlock_special+0x5f8/0x620
[  144.698673]  ? rcu_read_unlock_special+0x5f8/0x620
[  144.699390]  lock_release+0x1cc/0x490
[  144.60]  ? rcu_gp_kthread_wake+0x34/0x50
[  144.700678]  ? rcu_read_unlock_special+0x5f8/0x620
[  144.701431]  rt_mutex_unlock+0x1e/0xb0
[  144.702048]  rcu_read_unlock_special+0x5f8/0x620
[  144.702774]  __rcu_read_unlock+0xa7/0xb0
[  144.703422]  sock_def_readable+0xd1/0x100
[  144.704073]  unix_stream_sendmsg+0x2d1/0x4d0
[  144.704755]  sock_sendmsg+0x2d/0x70
[  144.705334]  ___sys_sendmsg+0x390/0x3a0
[  144.705963]  ? kvm_clock_read+0x31/0x80
[  144.706569]  ? kvm_sched_clock_read+0x9/0x20
[  144.707250]  ? sched_clock+0x9/0x10
[  144.707834]  ? sched_clock_cpu+0x1a/0x1e0
[  144.708473]  ? pvclock_clocksource_read+0xd5/0x230
[  144.709216]  ? kvm_clock_read+0x31/0x80
[  144.709850]  ? kvm_sched_clock_read+0x9/0x20
[  144.710526]  ? sched_clock+0x9/0x10
[  144.711082]  ? sched_clock_cpu+0x1a/0x1e0
[  144.711714]  ? __fget_light+0xb7/0xf0
[  144.712301]  ? sockfd_lookup_light+0xd3/0x120
[  144.713002]  __sys_sendmsg+0x59/0x90
[  144.713585]  SyS_socketcall+0x9e4/0xb20
[  144.714204]  do_int80_syscall_32+0x95/0x290
[  144.714883]  entry_INT80_32+0x2f/0x2f
[  144.715491] EIP: 0xb7f3e595
[  144.715985] EFLAGS: 0292 CPU: 0
[  144.716566] EAX: ffda EBX: 0010 ECX: b678 EDX: b7f83ff4
[  144.717474] ESI: b678 EDI: b6c4 EBP: b708 ESP: b668
[  144.718374]  DS: 007b ES: 007b FS:  GS: 0033 SS: 007b
procd: - reboot -
[  145.174700] Unregister pv shared memory for cpu 0

Here is another one, called from unix_stream_connect():

Please press Enter to activate this console.
[   46.373661] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   46.376639] 8021q: adding VLAN 0 to HW filter on device eth0
[   46.507035]
[   46.507329] =
[   46.507940] WARNING: bad unlock balance detected!
[   46.508543] 4.14.0-rc8 #158 Not tainted
[   46.509048] -
[   46.509648] fw3/1185 is trying to release lock (rcu_preempt_state) at:
[   46.510452] [] rcu_read_unlock_special+0x5f8/0x620
[   46.511147] but there are no more locks to release!
[   46.511766]
[   46.511766] other info that might help us debug this:
[   46.512591] 1 lock held by fw3/1185:
[   46.513070]  #0:  (rcu_read_lock){}, at: [] 
sock_def_readable+0x0/0x100
[   46.514074]
[   46.514074] stack backtrace:
[   46.514684] CPU: 0 PID: 1185 Comm: fw3 Not tainted 4.14.0-rc8 #158
[   46.515440] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1 04/01/2014
[   46.516463] Call Trace:
[   46.516831]  dump_stack+0x16/0x1c
[   46.517288]  print_unlock_imbalance_bug+0xb9/0xd0
[   46.517896]  ? rcu_read_unlock_special+0x5f8/0x620
[   46.518512]  ? rcu_read_unlock_special+0x5f8/0x620
[   46.519113]  lock_release+0x1cc/0x490
[   46.519604]  ? rcu_read_unlock_special+0x5f8/0x620
[   46.520208]  ? _raw_spin_unlock_irqrestore+0x86/0xd0
[   46.520840]  rt_mutex_unlock+0x1e/0xb0
[   46.521341]  rcu_read_unlock_special+0x5f8/0x620
[   46.521936]  __rcu_read_unlock+0xa7/0xb0
[   46.522460]  sock_def_readable+0xd1/0x100
[   46.522993]  unix_stream_connect+0x633/0x680
[   46.523552]  SYSC_connect+0x107/0x120
[   46.524046]  SyS_socketcall+0x384/0xb20
[   46.524562]  ? SyS_fcntl64+0x103/0x2e0
[   46.525060]  do_int80_syscall_32+0x95/0x290
[   46.525610]  entry_INT80_32+0x2f/0x2f
[   46.526099] EIP: 0xb7e97384
[   46.526505] EFLAGS: 0296 CPU: 0

[rht_deferred_worker] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 33s!

2017-11-07 Thread Fengguang Wu
FYI another workqueue lockup on rhashtable testing.

[  330.583179]   Deleting 5 keys
[  339.302251]   Duration of test: 26623682510 ns
[  339.310019] Average test time: 21144321486
[  339.311680] Testing concurrent rhashtable access from 10 threads
[  367.586868] Writes:  Total: 2  Max/Min: 0/0   Fail: 0 
[  406.501055] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 
stuck for 33s!
[  406.503708] Showing busy workqueues and worker pools:
[  406.510695] workqueue events: flags=0x0
[  406.517288]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=3/256
[  406.519526] in-flight: 3:rht_deferred_worker
[  406.520454] pending: vmstat_shepherd, rht_deferred_worker

Attached the full dmesg and kconfig.

Thanks,
Fengguang
early console in setup code
[0.00] Linux version 4.14.0-rc8 (kbuild@lkp-wsm-ep2) (gcc version 6.2.0 
20160901 (Debian 6.2.0-3)) #88 SMP Tue Nov 7 06:44:05 CST 2017
[0.00] Command line: ip=vm-lkp-os-openwrt-ia32-11::dhcp 
root=/dev/ram0 user=lkp 
job=/lkp/scheduled/vm-lkp-os-openwrt-ia32-11/boot-1-openwrt-i386-2016-03-16.cgz-39dae59d66acd86d1de24294bd2f343fd5e7a625-20171107-3579-gel5jo-0.yaml
 ARCH=x86_64 kconfig=x86_64-randconfig-ws0-11070630 branch=linus/master 
commit=39dae59d66acd86d1de24294bd2f343fd5e7a625 
BOOT_IMAGE=/pkg/linux/x86_64-randconfig-ws0-11070630/gcc-6/39dae59d66acd86d1de24294bd2f343fd5e7a625/vmlinuz-4.14.0-rc8
 max_uptime=600 
RESULT_ROOT=/result/boot/1/vm-lkp-os-openwrt-ia32/openwrt-i386-2016-03-16.cgz/x86_64-randconfig-ws0-11070630/gcc-6/39dae59d66acd86d1de24294bd2f343fd5e7a625/0
 LKP_SERVER=inn debug apic=debug sysrq_always_enabled 
rcupdate.rcu_cpu_stall_timeout=100 net.ifnames=0 printk.devkmsg=on panic=-1 
softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 
prompt_ramdisk=0 drbd.minor_count=8 systemd.log_level=err ignore_loglevel 
console=tty0 earlyprintk=ttyS0,115200 console=ttyS0,115200 vga=normal rw 
drbd.minor_c
[0.00] x86/fpu: x87 FPU will use FXSAVE
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009fbff] usable
[0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x1a3d] usable
[0.00] BIOS-e820: [mem 0x1a3e-0x1a3f] reserved
[0.00] BIOS-e820: [mem 0xfffc-0x] reserved
[0.00] debug: ignoring loglevel setting.
[0.00] NX (Execute Disable) protection: active
[0.00] random: fast init done
[0.00] SMBIOS 2.8 present.
[0.00] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 
04/01/2014
[0.00] tsc: Unable to calibrate against PIT
[0.00] tsc: No reference (HPET/PMTIMER) available
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] e820: last_pfn = 0x1a3e0 max_arch_pfn = 0x4
[0.00] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WB  WT  UC- UC  
[0.00] Scan for SMP in [mem 0x-0x03ff]
[0.00] Scan for SMP in [mem 0x0009fc00-0x0009]
[0.00] Scan for SMP in [mem 0x000f-0x000f]
[0.00] found SMP MP-table at [mem 0x000f6aa0-0x000f6aaf] mapped at 
[ff200aa0]
[0.00]   mpc: f6ab0-f6b80
[0.00] Base memory trampoline at [88099000] 99000 size 24576
[0.00] BRK [0x15a08000, 0x15a08fff] PGTABLE
[0.00] BRK [0x15a09000, 0x15a09fff] PGTABLE
[0.00] BRK [0x15a0a000, 0x15a0afff] PGTABLE
[0.00] BRK [0x15a0b000, 0x15a0bfff] PGTABLE
[0.00] RAMDISK: [mem 0x1a11b000-0x1a3d]
[0.00] ACPI: Early table checksum verification disabled
[0.00] ACPI: RSDP 0x000F68D0 14 (v00 BOCHS )
[0.00] ACPI: RSDT 0x1A3E15CE 30 (v01 BOCHS  BXPCRSDT 
0001 BXPC 0001)
[0.00] ACPI: FACP 0x1A3E142A 74 (v01 BOCHS  BXPCFACP 
0001 BXPC 0001)
[0.00] ACPI: DSDT 0x1A3E0040 0013EA (v01 BOCHS  BXPCDSDT 
0001 BXPC 0001)
[0.00] ACPI: FACS 0x1A3E 40
[0.00] ACPI: APIC 0x1A3E151E 78 (v01 BOCHS  BXPCAPIC 
0001 BXPC 0001)
[0.00] ACPI: HPET 0x1A3E1596 38 (v01 BOCHS  BXPCHPET 
0001 BXPC 0001)
[0.00] ACPI: Local APIC address 0xfee0
[0.00] mapped APIC to ff5fd000 (fee0)
[0.00] No NUMA configuration found
[0.00] Faking a node at [mem 0x-0x1a3d]
[0.00] NODE_DATA(0) allocated [mem 0x1a117000-0x1a11afff]
[0.00] Zone ranges:
[0.00]   DMA32[mem 0x1000-0x1a3d]
[0.00]   Normal   empty
[0.00] Movable zone start for each node
[0.00] Early memory node ranges

[rht_deferred_worker] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:15]

2017-11-07 Thread Fengguang Wu
Hello,

FYI this happens in v4.14-rc8 -- it's not necessarily a new bug.

[  485.097496] rcu-torture: Reader Pipe:  2 0 0 0 0 0 0 0 0 0 0
[  485.317082] rcu-torture: Reader Batch:  0 2 0 0 0 0 0 0 0 0 0
[  485.809530] rcu-torture: Free-Block Circulation:  0 0 0 0 0 0 0 0 0 0 0
[  486.097071] ??? Writer stall state RTWS_STUTTER(8) g0 c0 f0x0 ->state 0x1 
cpu 0
[  523.211468] hrtimer: interrupt took 7942680 ns
[  551.200303] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! 
[kworker/0:1:15]
[  551.200319] irq event stamp: 3267541
[  551.200319] hardirqs last  enabled at (3267539): [] 
__local_bh_enable_ip+0x14e/0x15c
[  551.200319] hardirqs last disabled at (3267541): [] 
apic_timer_interrupt+0xab/0xc0
[  551.200319] softirqs last  enabled at (3267538): [] 
rht_deferred_worker+0xb5a/0xd40
[  551.200319] softirqs last disabled at (3267540): [] 
rht_deferred_worker+0x657/0xd40
[  551.200319] CPU: 0 PID: 15 Comm: kworker/0:1 Not tainted 4.14.0-rc8 #22
[  551.200319] Workqueue: events rht_deferred_worker
[  551.200319] task: 8800186023c0 task.stack: 880018608000
[  551.200319] RIP: 0010:arch_local_irq_restore+0x6/0xd
[  551.200319] RSP: :88001860fc80 EFLAGS: 0246 ORIG_RAX: 
ff10
[  551.200319] RAX: 1100030c0500 RBX: 8800186023c0 RCX: 8116353d
[  551.200319] RDX: 880018602a20 RSI: 0007 RDI: 0246
[  551.200319] RBP: 88001860fc80 R08: dc00 R09: fbfff0674c41
[  551.200319] R10:  R11: 833a620f R12: 880018602a24
[  551.200319] R13: 0246 R14: 84413180 R15: 
[  551.200319] FS:  () GS:8284e000() 
knlGS:
[  551.200319] CS:  0010 DS:  ES:  CR0: 80050033
[  551.200319] CR2:  CR3: 02820001 CR4: 000206b0
[  551.200319] Call Trace:
[  551.200319]  lock_is_held_type+0x83/0x94
[  551.200319]  lockdep_rht_mutex_is_held+0x31/0x36
[  551.200319]  rht_deferred_worker+0x708/0xd40
[  551.200319]  process_one_work+0x68d/0xb7d
[  551.200319]  ? pwq_dec_nr_in_flight+0x24d/0x24d
[  551.200319]  ? ftrace_likely_update+0x249/0x26b
[  551.200319]  worker_thread+0x5ba/0x724
[  551.200319]  kthread+0x228/0x238
[  551.200319]  ? process_scheduled_works+0x43/0x43
[  551.200319]  ? __list_del_entry+0x77/0x77
[  551.200319]  ret_from_fork+0x2a/0x40
[  551.200319] Code: 49 8d 7d 60 e8 dd f4 1c 00 49 c7 45 60 00 00 00 00 5b 41 
5c 41 5d 5d c3 55 48 89 e5 9c 58 66 66 90 66 90 5d c3 55 48 89 e5 57 9d <66> 66 
90 66 90 5d c3 55 48 89 e5 e8 dd ff ff ff 48 89 c2 fa 66 
[  551.200319] Kernel panic - not syncing: softlockup: hung tasks
[  551.200319] CPU: 0 PID: 15 Comm: kworker/0:1 Tainted: G L  
4.14.0-rc8 #22
[  551.200319] Workqueue: events rht_deferred_worker
[  551.200319] Call Trace:
[  551.200319]  
[  551.200319]  show_stack+0x6d/0x70
[  551.200319]  dump_stack+0x19/0x1b
[  551.200319]  panic+0x1cc/0x434
[  551.200319]  ? __warn+0x1a6/0x1a6
[  551.200319]  ? set_fs+0x1e/0x2e
[  551.200319]  ? watchdog_timer_fn+0x34b/0x389
[  551.200319]  watchdog_timer_fn+0x369/0x389
[  551.200319]  __hrtimer_run_queues+0x469/0x808
[  551.200319]  ? watchdog+0x1e/0x1e
[  551.200319]  ? retrigger_next_event+0x6f/0x6f
[  551.200319]  ? ktime_get_update_offsets_now+0x180/0x19a
[  551.200319]  hrtimer_interrupt+0xab/0x278
[  551.200319]  ? hrtimer_cancel+0x51/0x51
[  551.200319]  smp_apic_timer_interrupt+0x26d/0x466
[  551.200319]  apic_timer_interrupt+0xb0/0xc0
[  551.200319]  
[  551.200319] RIP: 0010:arch_local_irq_restore+0x6/0xd
[  551.200319] RSP: :88001860fc80 EFLAGS: 0246 ORIG_RAX: 
ff10
[  551.200319] RAX: 1100030c0500 RBX: 8800186023c0 RCX: 8116353d
[  551.200319] RDX: 880018602a20 RSI: 0007 RDI: 0246
[  551.200319] RBP: 88001860fc80 R08: dc00 R09: fbfff0674c41
[  551.200319] R10:  R11: 833a620f R12: 880018602a24
[  551.200319] R13: 0246 R14: 84413180 R15: 
[  551.200319]  ? lock_is_held_type+0x71/0x94
[  551.200319]  lock_is_held_type+0x83/0x94
[  551.200319]  lockdep_rht_mutex_is_held+0x31/0x36
[  551.200319]  rht_deferred_worker+0x708/0xd40
[  551.200319]  process_one_work+0x68d/0xb7d
[  551.200319]  ? pwq_dec_nr_in_flight+0x24d/0x24d
[  551.200319]  ? ftrace_likely_update+0x249/0x26b
[  551.200319]  worker_thread+0x5ba/0x724
[  551.200319]  kthread+0x228/0x238
[  551.200319]  ? process_scheduled_works+0x43/0x43
[  551.200319]  ? __list_del_entry+0x77/0x77
[  551.200319]  ret_from_fork+0x2a/0x40
[  551.200319] Kernel Offset: disabled

Attached the full dmesg and kconfig.

Thanks,
Fengguang
early console in setup code
Probing EDD (edd=off to disable)... ok
early console in extract_kernel
input_data: 0x04e6e255
input_len: 0x00d2f35c
output: 0x0100
output_len: 0x0298b3f8
kernel_total_size: 0x04bbd000


[sock_def_readable] WARNING: bad unlock balance detected!

2017-11-06 Thread Fengguang Wu

FYI, this warning shows up in both v4.14-rc8 and v4.13:

[  144.578809] br-lan: port 1(eth0) entered disabled state
[  144.581360] device eth0 left promiscuous mode
[  144.582699] br-lan: port 1(eth0) entered disabled state
[  144.685012] 
[  144.685370] =

[  144.686093] WARNING: bad unlock balance detected!
[  144.686816] 4.14.0-rc8 #158 Not tainted
[  144.687420] -
[  144.688132] ubus/19703 is trying to release lock (rcu_preempt_state) at:
[  144.688911] [] rcu_read_unlock_special+0x5f8/0x620
[  144.689621] but there are no more locks to release!
[  144.690374] 
[  144.690374] other info that might help us debug this:

[  144.691407] 1 lock held by ubus/19703:
[  144.692009]  #0:  (rcu_read_lock){}, at: [] 
sock_def_readable+0x0/0x100
[  144.693212] 
[  144.693212] stack backtrace:

[  144.693980] CPU: 0 PID: 19703 Comm: ubus Not tainted 4.14.0-rc8 #158
[  144.694923] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1 04/01/2014
[  144.696168] Call Trace:
[  144.696632]  dump_stack+0x16/0x1c
[  144.697195]  print_unlock_imbalance_bug+0xb9/0xd0
[  144.697940]  ? rcu_read_unlock_special+0x5f8/0x620
[  144.698673]  ? rcu_read_unlock_special+0x5f8/0x620
[  144.699390]  lock_release+0x1cc/0x490
[  144.60]  ? rcu_gp_kthread_wake+0x34/0x50
[  144.700678]  ? rcu_read_unlock_special+0x5f8/0x620
[  144.701431]  rt_mutex_unlock+0x1e/0xb0
[  144.702048]  rcu_read_unlock_special+0x5f8/0x620
[  144.702774]  __rcu_read_unlock+0xa7/0xb0
[  144.703422]  sock_def_readable+0xd1/0x100
[  144.704073]  unix_stream_sendmsg+0x2d1/0x4d0
[  144.704755]  sock_sendmsg+0x2d/0x70
[  144.705334]  ___sys_sendmsg+0x390/0x3a0
[  144.705963]  ? kvm_clock_read+0x31/0x80
[  144.706569]  ? kvm_sched_clock_read+0x9/0x20
[  144.707250]  ? sched_clock+0x9/0x10
[  144.707834]  ? sched_clock_cpu+0x1a/0x1e0
[  144.708473]  ? pvclock_clocksource_read+0xd5/0x230
[  144.709216]  ? kvm_clock_read+0x31/0x80
[  144.709850]  ? kvm_sched_clock_read+0x9/0x20
[  144.710526]  ? sched_clock+0x9/0x10
[  144.711082]  ? sched_clock_cpu+0x1a/0x1e0
[  144.711714]  ? __fget_light+0xb7/0xf0
[  144.712301]  ? sockfd_lookup_light+0xd3/0x120
[  144.713002]  __sys_sendmsg+0x59/0x90
[  144.713585]  SyS_socketcall+0x9e4/0xb20
[  144.714204]  do_int80_syscall_32+0x95/0x290
[  144.714883]  entry_INT80_32+0x2f/0x2f
[  144.715491] EIP: 0xb7f3e595
[  144.715985] EFLAGS: 0292 CPU: 0
[  144.716566] EAX: ffda EBX: 0010 ECX: b678 EDX: b7f83ff4
[  144.717474] ESI: b678 EDI: b6c4 EBP: b708 ESP: b668
[  144.718374]  DS: 007b ES: 007b FS:  GS: 0033 SS: 007b
procd: - reboot -
[  145.174700] Unregister pv shared memory for cpu 0

Here is another one, called from unix_stream_connect():

Please press Enter to activate this console.
[   46.373661] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: 
RX
[   46.376639] 8021q: adding VLAN 0 to HW filter on device eth0
[   46.507035] 
[   46.507329] =

[   46.507940] WARNING: bad unlock balance detected!
[   46.508543] 4.14.0-rc8 #158 Not tainted
[   46.509048] -
[   46.509648] fw3/1185 is trying to release lock (rcu_preempt_state) at:
[   46.510452] [] rcu_read_unlock_special+0x5f8/0x620
[   46.511147] but there are no more locks to release!
[   46.511766] 
[   46.511766] other info that might help us debug this:

[   46.512591] 1 lock held by fw3/1185:
[   46.513070]  #0:  (rcu_read_lock){}, at: [] 
sock_def_readable+0x0/0x100
[   46.514074] 
[   46.514074] stack backtrace:

[   46.514684] CPU: 0 PID: 1185 Comm: fw3 Not tainted 4.14.0-rc8 #158
[   46.515440] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1 04/01/2014
[   46.516463] Call Trace:
[   46.516831]  dump_stack+0x16/0x1c
[   46.517288]  print_unlock_imbalance_bug+0xb9/0xd0
[   46.517896]  ? rcu_read_unlock_special+0x5f8/0x620
[   46.518512]  ? rcu_read_unlock_special+0x5f8/0x620
[   46.519113]  lock_release+0x1cc/0x490
[   46.519604]  ? rcu_read_unlock_special+0x5f8/0x620
[   46.520208]  ? _raw_spin_unlock_irqrestore+0x86/0xd0
[   46.520840]  rt_mutex_unlock+0x1e/0xb0
[   46.521341]  rcu_read_unlock_special+0x5f8/0x620
[   46.521936]  __rcu_read_unlock+0xa7/0xb0
[   46.522460]  sock_def_readable+0xd1/0x100
[   46.522993]  unix_stream_connect+0x633/0x680
[   46.523552]  SYSC_connect+0x107/0x120
[   46.524046]  SyS_socketcall+0x384/0xb20
[   46.524562]  ? SyS_fcntl64+0x103/0x2e0
[   46.525060]  do_int80_syscall_32+0x95/0x290
[   46.525610]  entry_INT80_32+0x2f/0x2f
[   46.526099] EIP: 0xb7e97384
[   46.526505] EFLAGS: 0296 CPU: 0
[   46.526977] EAX: ffda EBX: 0003 ECX: bf988038 EDX: b7edd000
[   46.527741] ESI: bf988038 EDI: bf9880e0 EBP: bf988098 ESP: bf988028
[   46.528505]  DS: 007b ES: 007b FS:  GS: 0033 SS: 007b
[   46.607747] 8021q: adding VLAN 0 to HW filter on device eth0
LKP: HOSTNAME vm-lkp-nhm-dp1-openwrt-ia32-6, 

Re: [run_timer_softirq] BUG: unable to handle kernel paging request at 0000000000010007

2017-10-30 Thread Fengguang Wu

On Mon, Oct 30, 2017 at 12:29:47PM -0700, Linus Torvalds wrote:

On Sun, Oct 29, 2017 at 4:48 PM, Fengguang Wu <fengguang...@intel.com> wrote:


Here are 3 dmesgs related to wireless and 1 from ethernet.


Fengguang, these would be lovelier still _if_ you have DEBUG_INFO
enabled on the kernel, and your script were to find things like
"symbol+0xhex/0xhex", and run "./scripts/faddr2line" on them.

So


[  235.425464] BUG: unable to handle kernel paging request at 00010007
[  235.425470] IP: run_timer_softirq+0x13a/0x470


would also then have

  run_timer_softirq at timer.c:XYZ

which would make it easier to see exactly _what_ it is that faults. As
it is, I think there's a fair amount of inlining that makes it hard to
see the cause, but that faddr2line would make very obvious.


Good idea and tips! It'll definitely help debug the issues where
bisect cannot help.


Finding that "symbol+xyz/abc" pattern should be fairly easy to
automate, and should fit the 0day model fairly well. No?


Sure. We'll add DEBUG_INFO and automate faddr2line.

Regards,
Fengguang
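
A minimal sketch of the automation discussed above, assuming a small Python
helper in the test harness. The function names are hypothetical and not part
of the 0day robot. It scans dmesg text for "symbol+0xhex/0xhex" entries and
builds a ./scripts/faddr2line command line without running it. The "vmlinux"
path is an assumption, and the kernel needs DEBUG_INFO for faddr2line to
resolve anything.

```python
import re

# Matches call-trace entries such as "run_timer_softirq+0x13a/0x470".
FUNC_OFFSET_RE = re.compile(
    r"\b([A-Za-z_][A-Za-z0-9_.]*\+0x[0-9a-f]+/0x[0-9a-f]+)"
)


def extract_func_offsets(dmesg_text):
    """Return unique symbol+offset/size entries in order of first appearance."""
    seen = []
    for match in FUNC_OFFSET_RE.finditer(dmesg_text):
        entry = match.group(1)
        if entry not in seen:
            seen.append(entry)
    return seen


def faddr2line_command(vmlinux, entries):
    """Build (but do not run) the faddr2line command for the given entries."""
    return ["./scripts/faddr2line", vmlinux] + list(entries)


# Sample lines taken from the reports in this archive.
sample = """\
[  235.425470] IP: run_timer_softirq+0x13a/0x470
[   46.517896]  ? rcu_read_unlock_special+0x5f8/0x620
[   46.518512]  ? rcu_read_unlock_special+0x5f8/0x620
[   46.519113]  lock_release+0x1cc/0x490
"""
entries = extract_func_offsets(sample)
print(" ".join(faddr2line_command("vmlinux", entries)))
```

Duplicate entries are dropped so that each symbol is resolved only once per
report.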


Re: [kbuild-all] [PATCH net-next] vhost_net: do not stall on zerocopy depletion

2017-09-30 Thread Fengguang Wu

On Sun, Oct 01, 2017 at 06:20:49AM +0300, Michael S. Tsirkin wrote:

On Sun, Oct 01, 2017 at 08:09:30AM +0800, kbuild test robot wrote:

Hi Willem,

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Willem-de-Bruijn/vhost_net-do-not-stall-on-zerocopy-depletion/20171001-054709
reproduce:
# apt-get install sparse
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__


BTW __CHECK_ENDIAN__ is the default now, I think you can drop it from
your scripts.


Good tip, thank you! However since we're testing old kernels, too,
we'll need to keep it for probably 1-3 more years.

Thanks,
Fengguang



sparse warnings: (new ones prefixed by >>)


vim +440 drivers/vhost/net.c

   433  
   434  static bool vhost_exceeds_maxpend(struct vhost_net *net)
   435  {
   436          struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
   437          struct vhost_virtqueue *vq = &nvq->vq;
   438  
   439          return (nvq->upend_idx + UIO_MAXIOV - nvq->done_idx) % UIO_MAXIOV >
 > 440                 min(VHOST_MAX_PEND, vq->num >> 2);
   441  }
   442  
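
The check in the function quoted above can be sketched in Python to show how
the modular arithmetic counts outstanding zerocopy buffers in a ring. The
constant values below (UIO_MAXIOV = 1024, VHOST_MAX_PEND = 128) are
assumptions for illustration and are not taken from this thread.

```python
UIO_MAXIOV = 1024      # assumed ring size, for illustration only
VHOST_MAX_PEND = 128   # assumed pending limit, for illustration only


def pending_zerocopy(upend_idx, done_idx):
    """Number of ring slots between done_idx and upend_idx, with wraparound."""
    return (upend_idx + UIO_MAXIOV - done_idx) % UIO_MAXIOV


def exceeds_maxpend(upend_idx, done_idx, vq_num):
    """Mirror of the comparison: pending > min(VHOST_MAX_PEND, vq_num >> 2)."""
    return pending_zerocopy(upend_idx, done_idx) > min(VHOST_MAX_PEND, vq_num >> 2)


# upend_idx has wrapped past the end of the ring while done_idx has not.
print(pending_zerocopy(5, 1020))
print(exceeds_maxpend(100, 0, 256))
```

Adding UIO_MAXIOV before subtracting keeps the intermediate value
non-negative, which matters in C where the operands are unsigned or where
`%` on a negative value would not give the ring distance.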

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

___
kbuild-all mailing list
kbuild-...@lists.01.org
https://lists.01.org/mailman/listinfo/kbuild-all


Re: [RFC PATCH linus] tcp: md5: tcp_md5_do_lookup_exact() can be static

2017-07-06 Thread Fengguang Wu

Hi Stephen,

On Thu, Jul 06, 2017 at 10:27:12AM +1000, Stephen Rothwell wrote:

Hi,

Not sure why you sent this to me ... it fixes a commit in the net-next
tree (now in Linus' tree) ...


Yeah sorry -- it's a bug in the robot.


On Thu, 6 Jul 2017 07:58:53 +0800 kbuild test robot <fengguang...@intel.com> 
wrote:


Fixes: 0c5f0311f690 ("Add linux-next specific files for 20170705")


Actually:  Fixes: 6797318e623d ("tcp: md5: add an address prefix for key 
lookup")


Yeah, fixed the robot. Thank you for the reminder!

Regards,
Fengguang


Signed-off-by: Fengguang Wu <fengguang...@intel.com>
---
 tcp_ipv4.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 6ec6900..a20e7f0 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -943,9 +943,9 @@ struct tcp_md5sig_key *tcp_md5_do_lookup(const struct sock *sk,
 }
 EXPORT_SYMBOL(tcp_md5_do_lookup);
 
-struct tcp_md5sig_key *tcp_md5_do_lookup_exact(const struct sock *sk,
-                                               const union tcp_md5_addr *addr,
-                                               int family, u8 prefixlen)
+static struct tcp_md5sig_key *tcp_md5_do_lookup_exact(const struct sock *sk,
+                                                      const union tcp_md5_addr *addr,
+                                                      int family, u8 prefixlen)
 {
        const struct tcp_sock *tp = tcp_sk(sk);
        struct tcp_md5sig_key *key;


--
Cheers,
Stephen Rothwell


Re: [PATCH net-next] selftests/bpf: make correct use of exit codes in bpf selftests

2017-06-13 Thread Fengguang Wu

On Tue, Jun 13, 2017 at 03:17:19PM +0200, Jesper Dangaard Brouer wrote:

The selftests depend on using the shell exit code as a means of
detecting the success or failure of the test binary executed.  The
appropriate output "[PASS]" or "[FAIL]" is generated by
tools/testing/selftests/lib.mk.

Notice that the exit code is masked with 255. Thus, be careful if
using the number of errors as the exits code, as 256 errors would be


nit pick:

s/exits/exit/


printf("Summary: %d PASSED, %d FAILED\n", passes, errors);
-   return errors ? -errors : 0;
+   return errors ? EXIT_FAILURE : EXIT_SUCCESS;


Reviewed-by: Fengguang Wu <fengguang...@intel.com>

Thanks,
Fengguang
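
A small sketch of the masking problem described in the patch above, written
in Python for illustration. The helper names are hypothetical, and
EXIT_FAILURE is assumed to be 1, as it is on Linux. The shell only sees the
low 8 bits of the value a program returns, so an error count that is a
multiple of 256 looks like success.

```python
def visible_exit_status(code):
    """Exit status as seen by the shell: only the low 8 bits survive."""
    return code & 0xFF


def old_return(errors):
    """Old behaviour from the diff: return errors ? -errors : 0."""
    return -errors if errors else 0


def new_return(errors):
    """New behaviour: EXIT_FAILURE (assumed to be 1) or EXIT_SUCCESS (0)."""
    return 1 if errors else 0


for errors in (0, 1, 256):
    print(errors,
          visible_exit_status(old_return(errors)),
          visible_exit_status(new_return(errors)))
```

With 256 errors the old return value is reported as status 0, while the new
one is reported as a failure for any non-zero error count.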


Re: [kbuild-all] [PATCH] net: hns: fix boolreturn.cocci warnings

2017-04-02 Thread Fengguang Wu

Hi David,

On Sun, Apr 02, 2017 at 07:44:02PM -0700, David Miller wrote:

From: kbuild test robot <l...@intel.com>
Date: Sat, 1 Apr 2017 07:50:55 +0800


drivers/net/ethernet/hisilicon/hns/hns_enet.c:1548:8-9: WARNING: return of 0/1 
in function 'hns_enable_serdes_lb' with return type bool

 Return statements in functions returning bool should use
 true/false instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci

CC: lipeng <lipeng...@huawei.com>
Signed-off-by: Fengguang Wu <fengguang...@intel.com>


This doesn't apply to any of my trees.


It's a reply to Salil's patch

   [PATCH net 08/19] net: hns: Fix to adjust buf_size of ring according to 
mtu

and so is based on that patch.

What can the robot improve to keep you from mistaking it for a general
patch for the net master trees? One possible way is to change the title to

   [PATCH for Salil] net: hns: fix boolreturn.cocci warnings
   ^

Regards,
Fengguang


Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-08 Thread Fengguang Wu

On Wed, Mar 08, 2017 at 02:43:44PM -0800, Linus Torvalds wrote:

On Wed, Mar 8, 2017 at 2:27 PM, Daniel Borkmann  wrote:


The issue seems to be accessing buff first (can be read or write access)
and then doing set_memory_ro() doesn't make it read-only immediately,
meaning the subsequent call into probe_kernel_write() will succeed without
error.

Then, if I don't touch buff first and only do the set_memory_ro() seems
to work and probe_kernel_write() will then fail as expected due to pages
being read-only now.


Ok, that definitely sounds like a TLB invalidate didn't happen.


Now, if I access buff, do the set_memory_ro() and then a msleep(0), for
example, it "kind of" works most of the time (see last log extract below),
and probe_kernel_write() will fail.


Yeah, very much consistent with a missing TLB invalidate. Scheduling
will end up invalidating it, although if it's a global page even that
might not do it (but eventually the entry will just get flushed due to
other activity).


None of this seems an issue with x86_64 and the test_setmem runs fine all
the time, same for the actual BPF stuff.


The code does look somewhat confused about when to actually flush
things - see my earlier note about NX - but it would seem to always do
__flush_tlb_all() unless I missed something. At least as long as
CPA_FLUSHTLB is set. Maybe some case forgets to set that..


Not sure if it's relevant, but out of 189 boots there are 2 boots
showing the below "CPA: called for zero pte." warning.

[7.116932] random: trinity: uninitialized urandom read (4 bytes read)
[   16.366468] sock: process `trinity-main' is using obsolete setsockopt 
SO_BSDCOMPAT
[   17.202396] BUG: unable to handle kernel paging request at 655d9eb2
[   17.204081] IP: __release_sock+0x6e/0x100
[   17.205207] *pde = 
[   17.205208]
[   17.206755] Oops:  [#1]
[   17.207686] CPU: 0 PID: 382 Comm: trinity-main Not tainted 
4.10.0-rc8-02017-g9d876e7 #1
[   17.209819] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.9.3-20161025_171302-gandalf 04/01/2014
[   17.212431] task: d625d200 task.stack: d6222000
[   17.213655] EIP: __release_sock+0x6e/0x100
[   17.214833] EFLAGS: 00010246 CPU: 0
[   17.215951] EAX:  EBX: 655d9eb2 ECX:  EDX: 0201
[   17.217587] ESI: 0605 EDI: d6064800 EBP: d6223ef4 ESP: d6223ee8
[   17.219185]  DS: 007b ES: 007b FS:  GS: 0033 SS: 0068
[   17.220602] CR0: 80050033 CR2: 655d9eb2 CR3: 1610f000 CR4: 0610
[   17.221966] DR0: 080cb000 DR1:  DR2:  DR3: 
[   17.223444] DR6: 0ff0 DR7: 0600
[   17.224343] Call Trace:
[   17.225007]  release_sock+0x2e/0x80
[   17.225900]  sock_setsockopt+0x8c/0x880
[   17.226857]  SyS_socketcall+0x658/0x6a0
[   17.227804]  do_fast_syscall_32+0x9a/0x160
[   17.228765]  entry_SYSENTER_32+0x4c/0x7b
[   17.229694] EIP: 0xbcc5
[   17.230428] EFLAGS: 0282 CPU: 0
[   17.231263] EAX: ffda EBX: 000e ECX: bfedce00 EDX: bfedce80
[   17.232582] ESI: 001a EDI: 00ae EBP: b754f93c ESP: bfedcdec
[   17.233882]  DS: 007b ES: 007b FS:  GS: 0033 SS: 007b
[   17.235044] Code: eb 29 8d 76 00 89 da 89 f8 ff 97 98 01 00 00 31 c9 ba 06 08 00 
00 b8 d8 19 b1 c1 e8 ed 3d 85 ff e8 e8 62 04 00 85 f6 89 f3 74 42 <8b>
33 0f 18 06 8b 43 48 a8 01 74 0e 83 e0 fe 74 09 80 3d 3d 9c
[   17.240429] EIP: __release_sock+0x6e/0x100 SS:ESP: 0068:d6223ee8
[   17.241689] CR2: 655d9eb2
[   17.242509] ---[ end trace dc10480164c75444 ]---
[   17.243569] [ cut here ]
[   17.243574] WARNING: CPU: 0 PID: 15 at arch/x86/mm/pageattr.c:1150 
__cpa_process_fault+0x388/0x390
[   17.243575] CPA: called for zero pte. vaddr = d7ab4000 cpa->vaddr = d7ab4000
[   17.243577] CPU: 0 PID: 15 Comm: kworker/0:1 Tainted: G  D 
4.10.0-rc8-02017-g9d876e7 #1
[   17.243578] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.9.3-20161025_171302-gandalf 04/01/2014
[   17.243582] Workqueue: events bpf_prog_free_deferred
[   17.243583] Call Trace:
[   17.243588]  dump_stack+0x16/0x25
[   17.243588]  dump_stack+0x16/0x25
[   17.243590]  __warn+0xd1/0xf0
[   17.243592]  ? __cpa_process_fault+0x388/0x390
[   17.243593]  warn_slowpath_fmt+0x3b/0x40
[   17.243594]  __cpa_process_fault+0x388/0x390
[   17.243596]  ? lookup_address_in_pgd+0xa/0x90
[   17.243598]  __change_page_attr+0x520/0x6c0
[   17.243600]  ? pfn_range_is_mapped+0xe/0x80
[   17.243601]  __change_page_attr_set_clr+0x38/0x180
[   17.243603]  change_page_attr_set_clr+0x107/0x3f0
[   17.243605]  ? dequeue_entity+0x86/0x230
[   17.243607]  set_memory_rw+0x3a/0x40
[   17.243608]  bpf_prog_free_deferred+0x16/0x30
[   17.243612]  process_one_work+0xfc/0x440
[   17.243614]  ? pick_next_task_fair+0x149/0x1d0
[   17.243615]  worker_thread+0x37/0x4e0
[   17.243617]  kthread+0xdd/0x110
[   17.243618]  ? process_one_work+0x440/0x440
[   17.243620]  ? __kthread_create_on_node+0x100/0x100
[   17.243622]  ret_from_fork+0x21/0x2c
[   17.243623] 

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-02 Thread Fengguang Wu

On Wed, Mar 01, 2017 at 08:54:26PM +0800, Fengguang Wu wrote:

Hi all,

Is it BPF that is triggering BUGs all over the place?


It looks so, and here is a fix.


1e74a2eb1f  Merge tag 'gcc-plugins-v4.11-rc1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
005c3490e9  Revert "ath10k: Search SMBIOS for OEM board file extension"
3051bf36c2  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
+--------------------------------------------------------+------------+------------+------------+
|                                                        | 1e74a2eb1f | 005c3490e9 | 3051bf36c2 |
+--------------------------------------------------------+------------+------------+------------+
| boot_successes                                         | 1223       | 1098       | 242        |
| boot_failures                                          | 1          | 126        | 72         |
| BUG:unable_to_handle_kernel                            | 1          | 117        | 69         |
| Oops                                                   | 1          | 126        | 72         |
| EIP:perf_callchain_user                                | 1          |            |            |
| Kernel_panic-not_syncing:Fatal_exception               | 1          | 121        | 67         |
| EIP:netlink_release                                    | 0          | 20         | 3          |
| EIP:bpf_prog_free                                      | 0          | 22         | 3          |
| EIP:filp_close                                         | 0          | 64         | 23         |
| EIP:netlink_update_listeners                           | 0          | 10         | 9          |
| EIP:security_inode_getattr                             | 0          | 2          |            |
| EIP:__lock_acquire                                     | 0          | 1          | 11         |
| Kernel_panic-not_syncing:Fatal_exception_in_interrupt  | 0          | 5          | 4          |
| EIP:__rcu_process_callbacks                            | 0          | 2          |            |
| EIP:__fget_light                                       | 0          | 1          |            |
| EIP:__unix_remove_socket                               | 0          | 0          | 13         |
| INFO:trying_to_register_non-static_key                 | 0          | 0          | 2          |
| EIP:mnt_want_write_file                                | 0          | 0          | 1          |
| EIP:skb_dequeue                                        | 0          | 0          | 1          |
| EIP:strlen                                             | 0          | 0          | 1          |
| EIP:__netlink_lookup                                   | 0          | 0          | 2          |
| EIP:vfs_fsync_range                                    | 0          | 0          | 1          |
| EIP:__unix_find_socket_byname                          | 0          | 0          | 1          |
| EIP:release_sock                                       | 0          | 0          | 1          |
+--------------------------------------------------------+------------+------------+------------+
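
As a rough worked example, the boot counts in the table above can be turned
into failure rates per commit. The dictionary below only restates the
boot_successes and boot_failures rows; the helper name is hypothetical.

```python
# (boot_successes, boot_failures) per commit, copied from the table.
boots = {
    "1e74a2eb1f": (1223, 1),
    "005c3490e9": (1098, 126),
    "3051bf36c2": (242, 72),
}


def failure_rate(successes, failures):
    """Fraction of boots that failed."""
    return failures / (successes + failures)


for commit, (ok, bad) in boots.items():
    rate = failure_rate(ok, bad)
    print(f"{commit}: {bad}/{ok + bad} boots failed ({rate:.1%})")
```

The first commit fails in 1 of 1224 boots, while the two later commits fail
in roughly a tenth and a fifth of their boots respectively.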


I confirm that the below patch provided by Daniel fixes the above
issues on the mainline kernel, too. Where should this patch be sent?
It'd be very noisy if all these Oopses hit the upcoming RC1 kernel.

Daniel thinks there may be a deeper problem in i386 set_memory_rw().
However, that could take much longer to debug.

Thanks,
Fengguang
---

Re: [bpf] 9d876e79df:  BUG: unable to handle kernel paging request at 653a8346


On Tue, Feb 28, 2017 at 04:39:36PM +0100, Daniel Borkmann wrote:


I have a rough feeling what it is, but I didn't have cycles to work on
it yet (due to travel, sorry about that). The issue is likely shut down
by just doing:

---
arch/x86/Kconfig |2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- linux.orig/arch/x86/Kconfig 2017-03-03 03:44:35.962022996 +0800
+++ linux/arch/x86/Kconfig  2017-03-03 03:44:35.962022996 +0800
@@ -54,7 +54,7 @@ config X86
    select ARCH_HAS_KCOV            if X86_64
    select ARCH_HAS_MMIO_FLUSH
    select ARCH_HAS_PMEM_API        if X86_64
-   select ARCH_HAS_SET_MEMORY
+   select ARCH_HAS_SET_MEMORY      if X86_64
    select ARCH_HAS_SG_CHAIN
    select ARCH_HAS_STRICT_KERNEL_RWX
    select ARCH_HAS_STRICT_MODULE_RWX


Re: [bpf] 9d876e79df: BUG: unable to handle kernel paging request at 653a8346

2017-02-28 Thread Fengguang Wu
[remove unrelated mailing lists]

On Mon, Feb 27, 2017 at 04:25:57PM +0100, Daniel Borkmann wrote:
>On 02/27/2017 03:14 AM, kernel test robot wrote:
>> Greetings,
>>
>> 0day kernel testing robot got the below dmesg and the first bad commit is
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>
>I'll take a look, thanks for the report!

You are welcome! btw here is another bisect result showing a different
call trace. The attached reproduce-* script may help reproduce the bug.

d2852a2240  arch: add ARCH_HAS_SET_MEMORY config
9d876e79df  bpf: fix unlocking of jited image when module ronx not set
+--------------------------------------------------------+------------+------------+
|                                                        | d2852a2240 | 9d876e79df |
+--------------------------------------------------------+------------+------------+
| boot_successes                                         | 2756       | 207        |
| boot_failures                                          | 0          | 238        |
| BUG:unable_to_handle_kernel                            | 0          | 236        |
| Oops:#[##]                                             | 0          | 236        |
| EIP:__release_sock                                     | 0          | 13         |
| Kernel_panic-not_syncing:Fatal_exception               | 0          | 218        |
| EIP:bpf_prog_free                                      | 0          | 23         |
| EIP:filp_close                                         | 0          | 44         |
| EIP:__wake_up_common                                   | 0          | 16         |
| EIP:unix_release_sock                                  | 0          | 76         |
| EIP:__netlink_lookup                                   | 0          | 3          |
| EIP:release_sock                                       | 0          | 9          |
| Kernel_panic-not_syncing:Fatal_exception_in_interrupt  | 0          | 18         |
| EIP:netlink_update_listeners                           | 0          | 7          |
| EIP:__rcu_process_callbacks                            | 0          | 4          |
| EIP:__unix_find_socket_byname                          | 0          | 14         |
| BUG:kernel_hang_in_test_stage                          | 0          | 2          |
| EIP:__fget_light                                       | 0          | 15         |
| EIP:rht_bucket_nested                                  | 0          | 4          |
| EIP:___cache_free                                      | 0          | 2          |
| WARNING:at_arch/x86/mm/pageattr.c:#__cpa_process_fault | 0          | 1          |
| EIP:mnt_want_write_file                                | 0          | 1          |
| EIP:netlink_release                                    | 0          | 2          |
+--------------------------------------------------------+------------+------------+

[5.875292] init: Failed to create pty - disabling logging for job
[5.876281] init: Temporary process spawn error: No such file or directory
[5.894107] genirq: Flags mismatch irq 4.  (serial) vs. 0080 
(goldfish_pdev_bus)
[5.904376] random: trinity: uninitialized urandom read (4 bytes read)
[   15.457341] sock: process `trinity-main' is using obsolete setsockopt 
SO_BSDCOMPAT
[   15.493431] BUG: unable to handle kernel paging request at 83aa
[   15.494924] IP: netlink_update_listeners+0x65/0xb0
[   15.496147] *pde =  
[   15.496148] 
[   15.497698] Oops:  [#1]
[   15.498505] CPU: 0 PID: 377 Comm: trinity-main Not tainted 
4.10.0-rc8-02017-g9d876e7 #59
[   15.500409] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.9.3-20161025_171302-gandalf 04/01/2014
[   15.502721] task: d652f840 task.stack: d4b74000
[   15.503830] EIP: netlink_update_listeners+0x65/0xb0
[   15.505017] EFLAGS: 00010002 CPU: 0
[   15.505931] EAX: 81e6 EBX: 0023 ECX:  EDX: 
[   15.507334] ESI:  EDI: d5226118 EBP: d4b75e68 ESP: d4b75e58
[   15.508746]  DS: 007b ES: 007b FS:  GS: 0033 SS: 0068
[   15.509996] CR0: 80050033 CR2: 83aa CR3: 1696 CR4: 0610
[   15.511399] Call Trace:
[   15.512126]  netlink_bind+0x136/0x240
[   15.513068]  SYSC_bind+0x98/0xb0
[   15.513924]  ? __might_sleep+0x32/0xa0
[   15.514886]  ? __might_sleep+0x32/0xa0
[   15.515843]  ? __copy_from_user_ll+0xb/0xe0
[   15.516878]  ? _copy_from_user+0x62/0xa0
[   15.517866]  SyS_socketcall+0x49c/0x6a0
[   15.518842]  ? __sb_end_write+0x8/0x40
[   15.519795]  ? __might_sleep+0x32/0xa0
[   15.520739]  ? mutex_unlock+0x9/0x30
[   15.521664]  do_fast_syscall_32+0x9a/0x160
[   15.522674]  entry_SYSENTER_32+0x4c/0x7b
[   15.523645] EIP: 0xb77bdcc5
[   15.524415] EFLAGS: 0286 CPU: 0
[   15.525334] EAX: ffda EBX: 0002 ECX: bfc54530 EDX: b7591710
[   15.526737] ESI: 0177 EDI: 0010 EBP: 0003 ESP: bfc5451c
[   15.528139]  DS: 007b ES: 007b FS:  GS: 0033 SS: 

Re: [PATCH] net/utils: fix semicolon.cocci warnings

2017-02-17 Thread Fengguang Wu

Hi Dave,

On Fri, Feb 17, 2017 at 01:52:54PM -0500, David Miller wrote:

From: kbuild test robot <l...@intel.com>
Date: Fri, 17 Feb 2017 05:34:03 +0800


net/core/utils.c:388:2-3: Unneeded semicolon


 Remove unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

CC: Sagi Grimberg <s...@grimberg.me>
Signed-off-by: Fengguang Wu <fengguang...@intel.com>
---

 utils.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/core/utils.c
+++ b/net/core/utils.c
@@ -385,7 +385,7 @@ int inet_pton_with_scope(struct net *net


I have no idea what tree this could be against, as net/core/utils.c doesn't
have more than 351 lines in any tree I maintain.


You can ignore it -- the cocci fix is for Sagi Grimberg's RFC patch
which can be judged by looking at the thread:

 Feb 17 To Sagi Grimber (  22:0) -->Re: [PATCH rfc 1/4] net/utils: generic 
inet_pton_with_scope helper
 Feb 17 To Sagi Grimber (  29:0) `->[PATCH] net/utils: fix semicolon.cocci 
warnings

Thanks,
Fengguang


Re: [LKP] [net] 34fad54c25: kernel BUG at include/linux/skbuff.h:1935!

2016-11-23 Thread Fengguang Wu

On Tue, Nov 22, 2016 at 11:07:16PM -0800, Linus Torvalds wrote:

On Tue, Nov 22, 2016 at 10:44 PM, Fengguang Wu <fengguang...@intel.com> wrote:


On Tue, Nov 22, 2016 at 02:04:42PM -0800, Linus Torvalds wrote:


I also noticed that the kernel test robot had screwed up the
participants list for some reason, and had

 "Acked-by: Alexander Duyck <alexander.h.du...@intel.com>, David S.
Miller" <da...@davemloft.net>

as one of the participants. So there's some odd commit parsing issue
there somewhere. But Alexander seems to have seen this report despite
that, it just never went anywhere that I can tell.



Yeah the robot will CC all "Acked-by" people in the bug reports.

Shall we limit it to the below TO/CC list?


No. We do want to keep the Acked-by's on the cc.

But you missed the real problem.

It *didn't* cc the acked-by. Look closer. What happened was that it cc'd this:

"Acked-by: Alexander Duyck <alexander.h.du...@intel.com>, David S. Miller"

<da...@davemloft.net>

ie there is only _one_ email address (that of da...@davemloft.net),
and the whole "Acked-by: Alexander Duyck <...>" part is quoted as the
_name_ of that email address.

At least that's what the headers look like for me in the original report:

  From: kernel test robot <xiaolong...@intel.com>
  To: Eric Dumazet <eduma...@google.com>
  Cc: l...@01.org, Linus Torvalds <torva...@linux-foundation.org>,
LKML <linux-ker...@vger.kernel.org>, Alexei Starovoitov
<a...@kernel.org>, Willem de Bruijn <will...@google.com>, "Acked-by:
Alexander Duyck <alexander.h.du...@intel.com>, David S. Miller"
<da...@davemloft.net>

Notice the quoting of that last "name".


Ah thanks! Xiaolong just root caused the parse error and will fix it.

Interestingly we didn't see that problem -- the CC list looks correct
in our emails -- perhaps Intel's email system auto fixed up the header.

Thanks,
Fengguang
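
A rough sketch of why the quoting matters, assuming a simplified address-list rule: commas separate mailboxes unless they sit inside a double-quoted display name. The function name count_mailboxes() and the example.com addresses are made up for illustration; this is not the robot's parser and it ignores comments and groups.

```c
/*
 * Count the mailboxes in a comma-separated address list.
 * Assumption: a comma inside a double-quoted display name belongs to the
 * name and does not start a new mailbox. Backslash escapes inside quotes
 * are skipped. Comments "(...)" and groups are not handled.
 */
int count_mailboxes(const char *list)
{
	int count = 0;
	int in_quotes = 0;
	int seen_content = 0;

	for (const char *p = list; *p; p++) {
		if (*p == '\\' && in_quotes && p[1]) {
			p++;	/* skip the escaped character */
			continue;
		}
		if (*p == '"') {
			in_quotes = !in_quotes;
			seen_content = 1;
		} else if (*p == ',' && !in_quotes) {
			if (seen_content)
				count++;
			seen_content = 0;
		} else if (*p != ' ' && *p != '\t' && *p != '\n') {
			seen_content = 1;
		}
	}
	if (seen_content)
		count++;
	return count;
}
```

With this rule, a Cc entry of the form "Acked-by: A <a@example.com>, D. M." <d@example.net> counts as a single mailbox whose display name swallows the first person, which matches what the quoted header shows. Without the surrounding quotes the same text counts as two mailboxes.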


Re: [LKP] [net] 34fad54c25: kernel BUG at include/linux/skbuff.h:1935!

2016-11-22 Thread Fengguang Wu

Hi Linus,

On Tue, Nov 22, 2016 at 02:04:42PM -0800, Linus Torvalds wrote:
[snip]


I also noticed that the kernel test robot had screwed up the
participants list for some reason, and had

 "Acked-by: Alexander Duyck <alexander.h.du...@intel.com>, David S.
Miller" <da...@davemloft.net>

as one of the participants. So there's some odd commit parsing issue
there somewhere. But Alexander seems to have seen this report despite
that, it just never went anywhere that I can tell.


Yeah the robot will CC all "Acked-by" people in the bug reports.

Shall we limit it to the below TO/CC list?

   TO: author
   CC: committer (maintainer)
   CC: all Signed-off-by
   CC: all Reviewed-by
   CC: mailing lists, if the bug is found in a maintainer/well known tree

Regards,
Fengguang


On Tue, Nov 15, 2016 at 1:20 PM, kernel test robot
 wrote:


FYI, we noticed the following commit:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
commit 34fad54c2537f7c99d07375e50cb30aa3c23bd83 ("net: __skb_flow_dissect() must cap its return value")

in testcase: pbzip2
with following parameters:

nr_threads: 25%
blocksize: 900K
cpufreq_governor: performance



on test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 64G memory

caused below changes:


+------------------------------------------------------------------+------------+------------+
|                                                                  | 79774d6bfa | 34fad54c25 |
+------------------------------------------------------------------+------------+------------+
| boot_successes                                                   | 0          | 2          |
| boot_failures                                                    | 2          | 20         |
| invoked_oom-killer:gfp_mask=0x                                   | 2          | 2          |
| Mem-Info                                                         | 2          | 2          |
| Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 2          | 2          |
| kernel_BUG_at_include/linux/skbuff.h                             | 0          | 16         |
| invalid_opcode:#[##]SMP                                          | 0          | 16         |
| RIP:eth_type_trans                                               | 0          | 16         |
| Kernel_panic-not_syncing:Fatal_exception_in_interrupt            | 0          | 15         |
| calltrace:hub_event                                              | 0          | 1          |
| WARNING:at_fs/sysfs/dir.c:#sysfs_warn_dup                        | 0          | 2          |
| calltrace:parport_pc_init                                        | 0          | 2          |
| calltrace:SyS_finit_module                                       | 0          | 2          |
| WARNING:at_lib/kobject.c:#kobject_add_internal                   | 0          | 2          |
+------------------------------------------------------------------+------------+------------+



[   19.375251] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
[   19.388892] Sending DHCP requests .
[   19.388892] [ cut here ]
[   19.388894] kernel BUG at include/linux/skbuff.h:1935!
[   19.388895] invalid opcode:  [#1] SMP
[   19.388896] Modules linked in:
[   19.388897] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc3-00320-g34fad54 #1
[   19.388898] Hardware name: Intel Corporation S2600WP/S2600WP, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[   19.388899] task: 81e0e4c0 task.stack: 81e0
[   19.388904] RIP: 0010:[]  [] 
eth_type_trans+0xe8/0x140
[   19.388904] RSP: :88081e803db8  EFLAGS: 00010297
[   19.388905] RAX: 0152 RBX: 88080221f200 RCX: 1073
[   19.388905] RDX: 8808013afdc0 RSI: 880801114000 RDI: 880819407c00
[   19.388906] RBP: 88081e803e20 R08: 880801114000 R09: 0800
[   19.388907] R10: 8808013afec0 R11: ea003fd5a880 R12: 880819407c00
[   19.388907] R13: 881033408000 R14: c9000843e000 R15: 0158
[   19.388908] FS:  () GS:88081e80() 
knlGS:
[   19.388909] CS:  0010 DS:  ES:  CR0: 80050033
[   19.388910] CR2: 88103000 CR3: 01e07000 CR4: 001406f0
[   19.388910] Stack:
[   19.388912]  816905a7 ea003fd5a880 ea08 
88080221f050
[   19.388913]  88080221f000 00400160 ea003fd5a880 

[   19.388915]  0040  88080221f050 
88100d216000
[   19.388915] Call Trace:
[   19.388919]  
[   19.388919]  [] ? igb_clean_rx_irq+0x6a7/0x7d0
[   19.388921]  [] igb_poll+0x382/0x700
[   19.388922]  [] ? igb_poll+0x397/0x700
[   19.388925]  [] net_rx_action+0x217/0x360
[   19.388928]  [] __do_softirq+0x104/0x2ab
[   19.388931]  [] irq_exit+0xf1/0x100

Re: [PATCH net-next v3 2/3] net: fsl: Allow most drivers to be built with COMPILE_TEST

2016-11-16 Thread Fengguang Wu

On Wed, Nov 16, 2016 at 11:52:45AM -0800, Florian Fainelli wrote:

On 11/15/2016 07:23 PM, kbuild test robot wrote:

Hi Florian,

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Florian-Fainelli/net-gianfar_ptp-Rename-FS-bit-to-FIPERST/20161116-095805
config: sh-allmodconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sh

All warnings (new ones prefixed by >>):

   drivers/net/ethernet/freescale/fsl_pq_mdio.c: In function 'fsl_pq_mdio_remove':

drivers/net/ethernet/freescale/fsl_pq_mdio.c:498:27: warning: unused variable 'priv' [-Wunused-variable]

 struct fsl_pq_mdio_priv *priv = bus->priv;


Humm, this looks bogus, the variable is used see below:


  ^~~~

vim +/priv +498 drivers/net/ethernet/freescale/fsl_pq_mdio.c

1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  482          return 0;
1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  483
dd3b8a32 drivers/net/ethernet/freescale/fsl_pq_mdio.c Timur Tabi      2012-08-29  484  error:
dd3b8a32 drivers/net/ethernet/freescale/fsl_pq_mdio.c Timur Tabi      2012-08-29  485          if (priv->map)
b3319b10 drivers/net/fsl_pq_mdio.c                    Anton Vorontsov 2009-12-30  486                  iounmap(priv->map);
dd3b8a32 drivers/net/ethernet/freescale/fsl_pq_mdio.c Timur Tabi      2012-08-29  487
1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  488          kfree(new_bus);
dd3b8a32 drivers/net/ethernet/freescale/fsl_pq_mdio.c Timur Tabi      2012-08-29  489
1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  490          return err;
1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  491  }
1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  492
1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  493
5078ac79 drivers/net/ethernet/freescale/fsl_pq_mdio.c Timur Tabi      2012-08-29  494  static int fsl_pq_mdio_remove(struct platform_device *pdev)
1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  495  {
5078ac79 drivers/net/ethernet/freescale/fsl_pq_mdio.c Timur Tabi      2012-08-29  496          struct device *device = &pdev->dev;
1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  497          struct mii_bus *bus = dev_get_drvdata(device);
b3319b10 drivers/net/fsl_pq_mdio.c                    Anton Vorontsov 2009-12-30 @498          struct fsl_pq_mdio_priv *priv = bus->priv;
1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  499
1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  500          mdiobus_unregister(bus);
1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  501
b3319b10 drivers/net/fsl_pq_mdio.c                    Anton Vorontsov 2009-12-30  502          iounmap(priv->map);


Right here.

What compiler version is this?


Compiler is sh4-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705.

include/asm-generic/io.h conditionally defines iounmap() to be an
empty inline function, which may explain the warning on sh4.

Generally speaking, it's a false warning. The solution could be to teach
the robot to ignore such 'unused variable' warnings on non-x86 archs.
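
One way such a warning can arise, sketched in plain userspace C. This assumes the no-op iounmap() on the affected configuration is a macro that drops its argument; the thread itself only says "empty inline function", so treat that as a guess. The names fake_priv, iounmap_discard() and iounmap_stub() are made up for illustration.

```c
/* Cut-down stand-in for struct fsl_pq_mdio_priv. */
struct fake_priv {
	void *map;
};

/*
 * Variant 1: a macro that throws its argument away. After preprocessing
 * the expression "priv->map" is gone, so a variable that is only used in
 * that call looks unused to the compiler.
 */
#define iounmap_discard(addr) do { } while (0)

/*
 * Variant 2: an empty static inline stub. The argument is still
 * evaluated and passed, so the variable counts as used.
 */
static inline void iounmap_stub(void *addr)
{
	(void)addr;
}

int remove_with_stub(struct fake_priv *p)
{
	struct fake_priv *priv = p;

	iounmap_stub(priv->map);	/* no warning */
	return 0;
}

#ifdef SHOW_WARNING
int remove_with_macro(struct fake_priv *p)
{
	struct fake_priv *priv = p;	/* gcc -Wall: unused variable 'priv' */

	iounmap_discard(priv->map);
	return 0;
}
#endif
```

Building with -Wall -DSHOW_WARNING should reproduce the 'unused variable' diagnostic for the macro variant only, which is why a stub function (or a macro that casts its argument to void) is the usual way to keep such callers warning-free.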

Thanks,
Fengguang


Re: [LKP] [net] 2ab9fb18c4: kernel BUG at include/linux/skbuff.h:1935!

2016-11-13 Thread Fengguang Wu

Hi guys.

I took a look at the commit again and I do not see how this can happen.

Are you sure patch was properly applied ?

In particular, the following extract is obscure for me :



https://github.com/0day-ci/linux 
Eric-Dumazet/net-__skb_flow_dissect-must-cap-its-return-value/20161110-080839
commit 2ab9fb18c46b91b16a0f0f329336d3be9fc32deb ("net: __skb_flow_dissect() must cap its return value")



Hi,

The above two lines mean that the 0day repo set up a new branch
"Eric-Dumazet/net-__skb_flow_dissect-must-cap-its-return-value/20161110-080839"
based on net/master and then applied your patch on top of it; the
resulting commit id is 2ab9fb18c46b91b16a0f0f329336d3be9fc32deb.


Xiaolong, it may be more helpful to show the base tree where we apply
the patch to. And the final url:

https://github.com/0day-ci/linux/tree/Eric-Dumazet/net-__skb_flow_dissect-must-cap-its-return-value/20161110-080839

Thanks,
Fengguang


Re: [PATCH v4] net: sched: convert qdisc linked list to hashtable

2016-07-31 Thread Fengguang Wu

On Thu, Jul 28, 2016 at 08:53:12PM +0800, Fengguang Wu wrote:

On Thu, Jul 28, 2016 at 01:18:27PM +0200, Jiri Kosina wrote:

On Thu, 28 Jul 2016, kbuild test robot wrote:


[auto build test ERROR on v4.7-rc7]
[also build test ERROR on next-20160728]
[cannot apply to net/master net-next/master ipsec-next/master]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Jiri-Kosina/net-sched-convert-qdisc-linked-list-to-hashtable/20160728-182303
config: i386-randconfig-s0-201630 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

All errors (new ones prefixed by >>):

   net/built-in.o: In function `dev_activate':
>> (.text+0x37ccb): undefined reference to `qdisc_hash_add'


Dear 0-day team,

could you please check my question regarding this very build failure here?

lkml.kernel.org/r/alpine.lnx.2.00.1607141612560.24...@cbobk.fhfr.pm


Sorry I missed that. For your convenience, here is the answer to the
original email:


This issue is there even without my patch (but with qdisc_list_add
instead), isn't it?


Yes, it looks so; this error happens in a number of places:

dns_query.c:(.text+0x39b84): undefined reference to `qdisc_hash_add'
include/linux/netdevice.h:1935: undefined reference to `qdisc_hash_add'
net/core/netevent.c:31: undefined reference to `qdisc_hash_add'
net/sched/sch_generic.c:789: undefined reference to `qdisc_hash_add'
sch_generic.c:(.text+0x33487): undefined reference to `qdisc_hash_add'
switchdev.c:(.text+0x3bf58): undefined reference to `qdisc_hash_add'
sysctl_net.c:(.text+0x31f70): undefined reference to `qdisc_hash_add'
(.text.dev_activate+0x228): undefined reference to `qdisc_hash_add'
(.text+0x37d0b): undefined reference to `qdisc_hash_add'
wext-proc.c:(.text+0x390a8): undefined reference to `qdisc_hash_add'


Jiri, I just double checked and found no similar errors related to
qdisc_list_add(). The parent commit 95556a8838 ("dccp: avoid deadlock
in dccp_v4_ctl_send_reset") builds fine without error.

Thanks,
Fengguang


The problem is that sch_generic.c (where dev_activate() is) is being
compiled every time CONFIG_NET is set, but sch_api.c (where
qdisc_list_add() is defined) only when CONFIG_NET_SCHED is set (and there
is no stub for !CONFIG_NET_SCHED case).


So it looks like a more general problem than specific to this patch.
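
The missing-stub situation described above is commonly handled with a header pattern along these lines. This is a userspace sketch only: CONFIG_NET_SCHED is an ordinary preprocessor symbol here, struct Qdisc is cut down to two fields, and dev_activate_sketch() is a hypothetical stand-in for dev_activate(). It is not the change that was eventually applied.

```c
/* Cut-down stand-in for the kernel's struct Qdisc. */
struct Qdisc {
	unsigned int handle;
	int hashed;
};

#ifdef CONFIG_NET_SCHED
/* Real implementation; in the kernel this would live in sch_api.c. */
void qdisc_hash_add(struct Qdisc *q)
{
	q->hashed = 1;
}
#else
/* Stub for the !CONFIG_NET_SCHED case, so callers still compile and link. */
static inline void qdisc_hash_add(struct Qdisc *q)
{
	(void)q;
}
#endif

/* Hypothetical stand-in for dev_activate() in sch_generic.c. */
int dev_activate_sketch(struct Qdisc *q)
{
	qdisc_hash_add(q);
	return q->hashed;
}
```

Without the #else branch, a build with CONFIG_NET set but CONFIG_NET_SCHED unset has a caller and no definition, which is the "undefined reference" seen in the report.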

Thanks,
Fengguang


Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-28 Thread Fengguang Wu

Hi Sabrina,


The idea when this first came up was to skip the sleeping part of
disable_irq():

http://marc.info/?l=linux-netdev=142314159626052

This fell off my todolist and I didn't send the conversion patches,
which would basically look like this:


Yes, it works on the several machines that had the BUG!

[   23.806847] netpoll: netconsole: local port 6665
[   23.807145] netpoll: netconsole: local IPv4 address 0.0.0.0
[   23.807494] netpoll: netconsole: interface 'eth0'
[   23.807799] netpoll: netconsole: remote port 6646
[   23.808096] netpoll: netconsole: remote IPv4 address 192.168.1.1
[   23.808474] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
[   23.808910] netpoll: netconsole: local IP 192.168.1.161
[   23.811680] 28 Jul 19:42:10 ntpdate[376]: step time server 192.168.1.1 
offset 1696.257557 sec
[   23.811886] console [netcon0] enabled
[   23.812131] netconsole: network logging started

Thanks,
Fengguang



diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 41f32c0b341e..b022691e680b 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6713,20 +6713,20 @@ static irqreturn_t e1000_intr_msix(int __always_unused irq, void *data)

vector = 0;
msix_irq = adapter->msix_entries[vector].vector;
-   disable_irq(msix_irq);
-   e1000_intr_msix_rx(msix_irq, netdev);
+   if (disable_hardirq(msix_irq))
+   e1000_intr_msix_rx(msix_irq, netdev);
enable_irq(msix_irq);

vector++;
msix_irq = adapter->msix_entries[vector].vector;
-   disable_irq(msix_irq);
-   e1000_intr_msix_tx(msix_irq, netdev);
+   if (disable_hardirq(msix_irq))
+   e1000_intr_msix_tx(msix_irq, netdev);
enable_irq(msix_irq);

vector++;
msix_irq = adapter->msix_entries[vector].vector;
-   disable_irq(msix_irq);
-   e1000_msix_other(msix_irq, netdev);
+   if (disable_hardirq(msix_irq))
+   e1000_msix_other(msix_irq, netdev);
enable_irq(msix_irq);
}

@@ -6750,13 +6750,13 @@ static void e1000_netpoll(struct net_device *netdev)
e1000_intr_msix(adapter->pdev->irq, netdev);
break;
case E1000E_INT_MODE_MSI:
-   disable_irq(adapter->pdev->irq);
-   e1000_intr_msi(adapter->pdev->irq, netdev);
+   if (disable_hardirq(adapter->pdev->irq))
+   e1000_intr_msi(adapter->pdev->irq, netdev);
enable_irq(adapter->pdev->irq);
break;
default:/* E1000E_INT_MODE_LEGACY */
-   disable_irq(adapter->pdev->irq);
-   e1000_intr(adapter->pdev->irq, netdev);
+   if (disable_hardirq(adapter->pdev->irq))
+   e1000_intr(adapter->pdev->irq, netdev);
enable_irq(adapter->pdev->irq);
break;
}
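
The control flow of the conversion above can be modelled in userspace as follows. Everything prefixed mock_ is invented for illustration and only mimics the documented contract of disable_hardirq(): it does not sleep, and it returns false when a threaded handler is still active. It is not kernel code.

```c
#include <stdbool.h>

/* Mock IRQ line state, for illustration only. */
struct mock_irq {
	int depth;		/* disable nesting depth */
	bool thread_active;	/* a threaded handler is still running */
	int handled;		/* times the poll path ran the handler */
};

/* Mimics disable_hardirq(): bump the depth, report whether it is safe. */
bool mock_disable_hardirq(struct mock_irq *irq)
{
	irq->depth++;
	return !irq->thread_active;
}

/* Mimics enable_irq(): undo one level of disabling. */
void mock_enable_irq(struct mock_irq *irq)
{
	irq->depth--;
}

/* Same shape as the patched e1000_netpoll() branches. */
void mock_netpoll(struct mock_irq *irq)
{
	if (mock_disable_hardirq(irq))
		irq->handled++;	/* stands in for e1000_intr(irq, netdev) */
	mock_enable_irq(irq);	/* always balance the disable */
}
```

The point of the shape is that enable_irq() stays unconditional, so the disable depth is balanced even when the handler call is skipped.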


Re: [PATCH v4] net: sched: convert qdisc linked list to hashtable

2016-07-28 Thread Fengguang Wu

On Thu, Jul 28, 2016 at 01:18:27PM +0200, Jiri Kosina wrote:

On Thu, 28 Jul 2016, kbuild test robot wrote:


[auto build test ERROR on v4.7-rc7]
[also build test ERROR on next-20160728]
[cannot apply to net/master net-next/master ipsec-next/master]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Jiri-Kosina/net-sched-convert-qdisc-linked-list-to-hashtable/20160728-182303
config: i386-randconfig-s0-201630 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

All errors (new ones prefixed by >>):

   net/built-in.o: In function `dev_activate':
>> (.text+0x37ccb): undefined reference to `qdisc_hash_add'


Dear 0-day team,

could you please check my question regarding this very build failure here?

lkml.kernel.org/r/alpine.lnx.2.00.1607141612560.24...@cbobk.fhfr.pm


Sorry I missed that. For your convenience, here is the answer to the
original email:


This issue is there even without my patch (but with qdisc_list_add
instead), isn't it?


Yes, it looks so; this error happens in a number of places:

dns_query.c:(.text+0x39b84): undefined reference to `qdisc_hash_add'
include/linux/netdevice.h:1935: undefined reference to `qdisc_hash_add'
net/core/netevent.c:31: undefined reference to `qdisc_hash_add'
net/sched/sch_generic.c:789: undefined reference to `qdisc_hash_add'
sch_generic.c:(.text+0x33487): undefined reference to `qdisc_hash_add'
switchdev.c:(.text+0x3bf58): undefined reference to `qdisc_hash_add'
sysctl_net.c:(.text+0x31f70): undefined reference to `qdisc_hash_add'
(.text.dev_activate+0x228): undefined reference to `qdisc_hash_add'
(.text+0x37d0b): undefined reference to `qdisc_hash_add'
wext-proc.c:(.text+0x390a8): undefined reference to `qdisc_hash_add'


The problem is that sch_generic.c (where dev_activate() is) is being
compiled every time CONFIG_NET is set, but sch_api.c (where
qdisc_list_add() is defined) only when CONFIG_NET_SCHED is set (and there
is no stub for !CONFIG_NET_SCHED case).


So it looks like a more general problem than specific to this patch.

Thanks,
Fengguang


Re: [RFC PATCH v3] net: sched: convert qdisc linked list to hashtable

2016-07-28 Thread Fengguang Wu

Hi Jiri,

On Thu, Jul 14, 2016 at 04:14:58PM +0200, Jiri Kosina wrote:


[ added CCs ]

On Tue, 12 Jul 2016, kbuild test robot wrote:


Hi,

[auto build test ERROR on net/master]
[also build test ERROR on v4.7-rc7 next-20160711]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Jiri-Kosina/net-sched-convert-qdisc-linked-list-to-hashtable/20160711-220527
config: arm-tct_hammer_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 5.3.1-8) 5.3.1 20160205
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm

All errors (new ones prefixed by >>):

   net/built-in.o: In function `dev_activate':
>> wext-proc.c:(.text+0x38544): undefined reference to `qdisc_hash_add'


This issue is there even without my patch (but with qdisc_list_add
instead), isn't it?


Yes, it looks so; this error happens in a number of places:

dns_query.c:(.text+0x39b84): undefined reference to `qdisc_hash_add'
include/linux/netdevice.h:1935: undefined reference to `qdisc_hash_add'
net/core/netevent.c:31: undefined reference to `qdisc_hash_add'
net/sched/sch_generic.c:789: undefined reference to `qdisc_hash_add'
sch_generic.c:(.text+0x33487): undefined reference to `qdisc_hash_add'
switchdev.c:(.text+0x3bf58): undefined reference to `qdisc_hash_add'
sysctl_net.c:(.text+0x31f70): undefined reference to `qdisc_hash_add'
(.text.dev_activate+0x228): undefined reference to `qdisc_hash_add'
(.text+0x37d0b): undefined reference to `qdisc_hash_add'
wext-proc.c:(.text+0x390a8): undefined reference to `qdisc_hash_add'


The problem is that sch_generic.c (where dev_activate() is) is being
compiled every time CONFIG_NET is set, but sch_api.c (where
qdisc_list_add() is defined) only when CONFIG_NET_SCHED is set (and there
is no stub for !CONFIG_NET_SCHED case).


So it looks like a more general problem than specific to this patch.

Thanks,
Fengguang


Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-27 Thread Fengguang Wu

On Tue, Jul 26, 2016 at 06:28:33PM +0200, Eric Dumazet wrote:

On Tue, 2016-07-26 at 23:32 +0800, Fengguang Wu wrote:

Hi Eric,

It works!

On Tue, Jul 26, 2016 at 11:14:52AM +0200, Eric Dumazet wrote:
>On Tue, 2016-07-26 at 11:50 +0800, Fengguang Wu wrote:
>> Greetings,
>>
>> This BUG message can be found in recent kernels as well as v4.4 and
>> linux-stable. It happens when running
>>
>> modprobe netconsole netconsole=@/,$port@$server/
>>
>> [   39.937534] 22 Jul 13:30:40 ntpdate[440]: step time server 192.168.1.1 
offset -673.833841 sec
>> [   39.943285] netpoll: netconsole: local port 6665
>> [   39.943436] netpoll: netconsole: local IPv4 address 0.0.0.0
>> [   39.943609] netpoll: netconsole: interface 'eth0'
>> [   39.943756] netpoll: netconsole: remote port 6672
>> [   39.943913] netpoll: netconsole: remote IPv4 address 192.168.1.1
>> [   39.944099] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
>> [   39.944311] netpoll: netconsole: local IP 192.168.1.193
>> [   39.944514] BUG: sleeping function called from invalid context at 
kernel/irq/manage.c:110
>> [   39.944515] in_atomic(): 1, irqs_disabled(): 1, pid: 448, name: modprobe
>> [   39.944517] CPU: 6 PID: 448 Comm: modprobe Not tainted 
4.7.0-rc7-wt-ath-10122-gf9b5ec2 #102
>> [   39.944518] Hardware name:  /DZ77BH-55K, BIOS 
BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
>> [   39.944522]   c90001f2f9e8 813417d9 
88007faba5c0
>> [   39.944524]  006e c90001f2fa00 810aec03 
81a25948
>> [   39.944525]  c90001f2fa28 810aec9a 8803e5bd9400 
8803e50fbd68
>> [   39.944526] Call Trace:
>> [   39.944533]  [] dump_stack+0x63/0x8a
>> [   39.944536]  [] ___might_sleep+0xd3/0x120
>> [   39.944537]  [] __might_sleep+0x4a/0x80
>> [   39.944541]  [] synchronize_irq+0x38/0xa0
>> [   39.944543]  [] ? __irq_put_desc_unlock+0x1e/0x40
>> [   39.944545]  [] ? __disable_irq_nosync+0x43/0x60
>> [   39.944547]  [] disable_irq+0x1c/0x20
>> [   39.944559]  [] e1000_netpoll+0xf2/0x120 [e1000e]
>> [   39.944563]  [] netpoll_poll_dev+0x5c/0x1a0
>> [   39.944567]  [] ? __kmalloc_reserve+0x31/0x90
>> [   39.944569]  [] netpoll_send_skb_on_dev+0x16b/0x250
>> [   39.944572]  [] netpoll_send_udp+0x2ec/0x450
>> [   39.944576]  [] write_msg+0xb2/0xf0 [netconsole]
>> [   39.944578]  [] call_console_drivers+0x115/0x120
>> [   39.944580]  [] console_unlock+0x333/0x5c0
>> [   39.944583]  [] register_console+0x1c4/0x380
>> [   39.944586]  [] init_netconsole+0x1c5/0x1000 
[netconsole]
>> [   39.944588]  [] ? 0xa004f000
>> [   39.944591]  [] do_one_initcall+0x3d/0x150
>> [   39.944592]  [] ? __might_sleep+0x4a/0x80
>> [   39.944596]  [] ? kmem_cache_alloc_trace+0x188/0x1e0
>> [   39.944598]  [] do_init_module+0x5f/0x1d8
>> [   39.944602]  [] load_module+0x1429/0x1b40
>> [   39.944604]  [] ? __symbol_put+0x40/0x40
>> [   39.944607]  [] ? kernel_read_file+0x178/0x1a0
>> [   39.944608]  [] ? kernel_read_file_from_fd+0x49/0x80
>> [   39.944611]  [] SYSC_finit_module+0xc3/0xf0
>> [   39.944614]  [] SyS_finit_module+0xe/0x10
>> [   39.944617]  [] entry_SYSCALL_64_fastpath+0x1a/0xa9
>> [   39.946384] console [netcon0] enabled
>> [   39.946514] netconsole: network logging started
>>
>> Can this be possibly fixed?
>
>Could you try this ?
>
>diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c 
b/drivers/net/ethernet/intel/e1000/e1000_main.c
>index 
f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a938b3820b 
100644
>--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
>+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
>@@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device *netdev)
> {
>struct e1000_adapter *adapter = netdev_priv(netdev);
>
>-   disable_irq(adapter->pdev->irq);
>-   e1000_intr(adapter->pdev->irq, netdev);
>-   enable_irq(adapter->pdev->irq);
>+   if (napi_schedule_prep(>napi)) {
>+   adapter->total_tx_bytes = 0;
>+   adapter->total_tx_packets = 0;
>+   adapter->total_rx_bytes = 0;
>+   adapter->total_rx_packets = 0;
>+   __napi_schedule(>napi);
>+   }

The machines are actually running e1000e driver, so I copied your
approach to e1000e and it works:

kern  :info  : [   16.109647] netpoll: netconsole: local port 6665
kern  :info  : [   16.109961] netpoll: netconsole: local IPv4 address 0.0.0.0
kern  :info  : [   16.110346] netpoll: netconsole: interface 'eth0'
kern  :info  : [   16.110672] netpoll: netconsole: remote port 6676
kern  :i

Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-26 Thread Fengguang Wu

Hi Eric,

It works!

On Tue, Jul 26, 2016 at 11:14:52AM +0200, Eric Dumazet wrote:

On Tue, 2016-07-26 at 11:50 +0800, Fengguang Wu wrote:

Greetings,

This BUG message can be found in recent kernels as well as v4.4 and
linux-stable. It happens when running

modprobe netconsole netconsole=@/,$port@$server/

[   39.937534] 22 Jul 13:30:40 ntpdate[440]: step time server 192.168.1.1 
offset -673.833841 sec
[   39.943285] netpoll: netconsole: local port 6665
[   39.943436] netpoll: netconsole: local IPv4 address 0.0.0.0
[   39.943609] netpoll: netconsole: interface 'eth0'
[   39.943756] netpoll: netconsole: remote port 6672
[   39.943913] netpoll: netconsole: remote IPv4 address 192.168.1.1
[   39.944099] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
[   39.944311] netpoll: netconsole: local IP 192.168.1.193
[   39.944514] BUG: sleeping function called from invalid context at 
kernel/irq/manage.c:110
[   39.944515] in_atomic(): 1, irqs_disabled(): 1, pid: 448, name: modprobe
[   39.944517] CPU: 6 PID: 448 Comm: modprobe Not tainted 
4.7.0-rc7-wt-ath-10122-gf9b5ec2 #102
[   39.944518] Hardware name:  /DZ77BH-55K, BIOS 
BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
[   39.944522]   c90001f2f9e8 813417d9 
88007faba5c0
[   39.944524]  006e c90001f2fa00 810aec03 
81a25948
[   39.944525]  c90001f2fa28 810aec9a 8803e5bd9400 
8803e50fbd68
[   39.944526] Call Trace:
[   39.944533]  [] dump_stack+0x63/0x8a
[   39.944536]  [] ___might_sleep+0xd3/0x120
[   39.944537]  [] __might_sleep+0x4a/0x80
[   39.944541]  [] synchronize_irq+0x38/0xa0
[   39.944543]  [] ? __irq_put_desc_unlock+0x1e/0x40
[   39.944545]  [] ? __disable_irq_nosync+0x43/0x60
[   39.944547]  [] disable_irq+0x1c/0x20
[   39.944559]  [] e1000_netpoll+0xf2/0x120 [e1000e]
[   39.944563]  [] netpoll_poll_dev+0x5c/0x1a0
[   39.944567]  [] ? __kmalloc_reserve+0x31/0x90
[   39.944569]  [] netpoll_send_skb_on_dev+0x16b/0x250
[   39.944572]  [] netpoll_send_udp+0x2ec/0x450
[   39.944576]  [] write_msg+0xb2/0xf0 [netconsole]
[   39.944578]  [] call_console_drivers+0x115/0x120
[   39.944580]  [] console_unlock+0x333/0x5c0
[   39.944583]  [] register_console+0x1c4/0x380
[   39.944586]  [] init_netconsole+0x1c5/0x1000 [netconsole]
[   39.944588]  [] ? 0xa004f000
[   39.944591]  [] do_one_initcall+0x3d/0x150
[   39.944592]  [] ? __might_sleep+0x4a/0x80
[   39.944596]  [] ? kmem_cache_alloc_trace+0x188/0x1e0
[   39.944598]  [] do_init_module+0x5f/0x1d8
[   39.944602]  [] load_module+0x1429/0x1b40
[   39.944604]  [] ? __symbol_put+0x40/0x40
[   39.944607]  [] ? kernel_read_file+0x178/0x1a0
[   39.944608]  [] ? kernel_read_file_from_fd+0x49/0x80
[   39.944611]  [] SYSC_finit_module+0xc3/0xf0
[   39.944614]  [] SyS_finit_module+0xe/0x10
[   39.944617]  [] entry_SYSCALL_64_fastpath+0x1a/0xa9
[   39.946384] console [netcon0] enabled
[   39.946514] netconsole: network logging started

Can this be possibly fixed?


Could you try this ?

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a938b3820b 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device *netdev)
{
struct e1000_adapter *adapter = netdev_priv(netdev);

-   disable_irq(adapter->pdev->irq);
-   e1000_intr(adapter->pdev->irq, netdev);
-   enable_irq(adapter->pdev->irq);
+   if (napi_schedule_prep(&adapter->napi)) {
+   adapter->total_tx_bytes = 0;
+   adapter->total_tx_packets = 0;
+   adapter->total_rx_bytes = 0;
+   adapter->total_rx_packets = 0;
+   __napi_schedule(&adapter->napi);
+   }


The machines are actually running e1000e driver, so I copied your
approach to e1000e and it works:

kern  :info  : [   16.109647] netpoll: netconsole: local port 6665
kern  :info  : [   16.109961] netpoll: netconsole: local IPv4 address 0.0.0.0
kern  :info  : [   16.110346] netpoll: netconsole: interface 'eth0'
kern  :info  : [   16.110672] netpoll: netconsole: remote port 6676
kern  :info  : [   16.110991] netpoll: netconsole: remote IPv4 address 
192.168.2.1
kern  :info  : [   16.111398] netpoll: netconsole: remote ethernet address 
ff:ff:ff:ff:ff:ff
kern  :info  : [   16.111845] netpoll: netconsole: local IP 192.168.2.3
kern  :info  : [   16.114284] console [netcon0] enabled
kern  :info  : [   16.114550] netconsole: network logging started

However I'm not sure if it'll have side effects, because this
effectively disables the various checks in e1000_intr() and
e1000_intr_msi().

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 9b4ec13..4f89873 100644
--- a/drivers/net/ethernet/int

[e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-25 Thread Fengguang Wu
Greetings,

This BUG message can be found in recent kernels as well as v4.4 and
linux-stable. It happens when running

modprobe netconsole netconsole=@/,$port@$server/ 

[   39.937534] 22 Jul 13:30:40 ntpdate[440]: step time server 192.168.1.1 
offset -673.833841 sec
[   39.943285] netpoll: netconsole: local port 6665
[   39.943436] netpoll: netconsole: local IPv4 address 0.0.0.0
[   39.943609] netpoll: netconsole: interface 'eth0'
[   39.943756] netpoll: netconsole: remote port 6672
[   39.943913] netpoll: netconsole: remote IPv4 address 192.168.1.1
[   39.944099] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
[   39.944311] netpoll: netconsole: local IP 192.168.1.193
[   39.944514] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
[   39.944515] in_atomic(): 1, irqs_disabled(): 1, pid: 448, name: modprobe
[   39.944517] CPU: 6 PID: 448 Comm: modprobe Not tainted 4.7.0-rc7-wt-ath-10122-gf9b5ec2 #102
[   39.944518] Hardware name:  /DZ77BH-55K, BIOS BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
[   39.944522]   c90001f2f9e8 813417d9 88007faba5c0
[   39.944524]  006e c90001f2fa00 810aec03 81a25948
[   39.944525]  c90001f2fa28 810aec9a 8803e5bd9400 8803e50fbd68
[   39.944526] Call Trace:
[   39.944533]  [] dump_stack+0x63/0x8a
[   39.944536]  [] ___might_sleep+0xd3/0x120
[   39.944537]  [] __might_sleep+0x4a/0x80
[   39.944541]  [] synchronize_irq+0x38/0xa0
[   39.944543]  [] ? __irq_put_desc_unlock+0x1e/0x40
[   39.944545]  [] ? __disable_irq_nosync+0x43/0x60
[   39.944547]  [] disable_irq+0x1c/0x20
[   39.944559]  [] e1000_netpoll+0xf2/0x120 [e1000e]
[   39.944563]  [] netpoll_poll_dev+0x5c/0x1a0
[   39.944567]  [] ? __kmalloc_reserve+0x31/0x90
[   39.944569]  [] netpoll_send_skb_on_dev+0x16b/0x250
[   39.944572]  [] netpoll_send_udp+0x2ec/0x450
[   39.944576]  [] write_msg+0xb2/0xf0 [netconsole]
[   39.944578]  [] call_console_drivers+0x115/0x120
[   39.944580]  [] console_unlock+0x333/0x5c0
[   39.944583]  [] register_console+0x1c4/0x380
[   39.944586]  [] init_netconsole+0x1c5/0x1000 [netconsole]
[   39.944588]  [] ? 0xa004f000
[   39.944591]  [] do_one_initcall+0x3d/0x150
[   39.944592]  [] ? __might_sleep+0x4a/0x80
[   39.944596]  [] ? kmem_cache_alloc_trace+0x188/0x1e0
[   39.944598]  [] do_init_module+0x5f/0x1d8
[   39.944602]  [] load_module+0x1429/0x1b40
[   39.944604]  [] ? __symbol_put+0x40/0x40
[   39.944607]  [] ? kernel_read_file+0x178/0x1a0
[   39.944608]  [] ? kernel_read_file_from_fd+0x49/0x80
[   39.944611]  [] SYSC_finit_module+0xc3/0xf0
[   39.944614]  [] SyS_finit_module+0xe/0x10
[   39.944617]  [] entry_SYSCALL_64_fastpath+0x1a/0xa9
[   39.946384] console [netcon0] enabled
[   39.946514] netconsole: network logging started

Can this be possibly fixed?

Thanks,
Fengguang


Re: [kbuild-all] [PATCH v17 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-19 Thread Fengguang Wu

On Tue, Jul 19, 2016 at 07:52:53PM -0700, David Miller wrote:

From: Dexuan Cui 
Date: Wed, 20 Jul 2016 01:48:18 +


From: kbuild test robot [mailto:l...@intel.com]
Sent: Wednesday, July 20, 2016 1:10

Hi,

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Dexuan-Cui/introduce-Hyper-V-VM-Sockets-hv_sock/20160715-223433
config: x86_64-randconfig-a0-07191719 (attached as .config)
compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64

All warnings (new ones prefixed by >>):

   net/hv_sock/af_hvsock.c: In function 'hvsock_open_connection':
   net/hv_sock/af_hvsock.c:693: warning: 'hvsk' may be used uninitialized in this function
   net/hv_sock/af_hvsock.c:693: warning: 'new_hvsk' may be used uninitialized in this function
   net/hv_sock/af_hvsock.c:697: warning: 'new_sk' may be used uninitialized in this function
   net/hv_sock/af_hvsock.c: In function 'hvsock_sendmsg_wait':
   net/hv_sock/af_hvsock.c:1053: warning: 'ret' may be used uninitialized in this function
>> net/hv_sock/af_hvsock.o: warning: objtool: hvsock_on_channel_cb()+0x1d: function has unreachable instruction


These warnings are all false alarms.


But you still have to quiet them.


Agreed.

However the last warning

   objtool: hvsock_on_channel_cb()+0x1d: function has unreachable instruction

is my fault -- it turns out to be a problem with the old gcc-4.4 and
gcc-4.6, which we enabled 2 days ago. I just disabled reporting this
odd warning because it complains about ~1 functions all over the
place, which can only be sanely fixed in the toolchain.

Thanks,
Fengguang


Re: [kbuild-all] [PATCH v17 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-19 Thread Fengguang Wu

Yes, sorry for the noise!


>> net/hv_sock/af_hvsock.o: warning: objtool: hvsock_on_channel_cb()+0x1d:
function has unreachable instruction


These warnings are all false alarms.

Thanks,
-- Dexuan
___
kbuild-all mailing list
kbuild-...@lists.01.org
https://lists.01.org/mailman/listinfo/kbuild-all


Re: [patch 1/1] kernel/trace/bpf_trace.c: work around gcc-4.4.4 anon union initialization bug

2016-07-18 Thread Fengguang Wu

On Mon, Jul 18, 2016 at 07:38:27PM -0700, Alexei Starovoitov wrote:

On Tue, Jul 19, 2016 at 08:38:02AM +0800, Fengguang Wu wrote:

Hi Alexei,

On Mon, Jul 18, 2016 at 05:33:07PM -0700, Alexei Starovoitov wrote:
>On Mon, Jul 18, 2016 at 03:50:58PM -0700, a...@linux-foundation.org wrote:
>>From: Andrew Morton <a...@linux-foundation.org>
>>Subject: kernel/trace/bpf_trace.c: work around gcc-4.4.4 anon union 
initialization bug
>>
>>kernel/trace/bpf_trace.c: In function 'bpf_event_output':
>>kernel/trace/bpf_trace.c:312: error: unknown field 'next' specified in 
initializer
>>kernel/trace/bpf_trace.c:312: warning: missing braces around initializer
>>kernel/trace/bpf_trace.c:312: warning: (near initialization for 
'raw.frag.')
>>
>>Fixes: 555c8a8623a3a87 ("bpf: avoid stack copy and use skb ctx for event 
output")
>>Acked-by: Daniel Borkmann <dan...@iogearbox.net>
>>Cc: Alexei Starovoitov <a...@kernel.org>
>>Cc: David S. Miller <da...@davemloft.net>
>>Signed-off-by: Andrew Morton <a...@linux-foundation.org>
>
>Acked-by: Alexei Starovoitov <a...@kernel.org>
>
>Fengguang can you add gcc-4.4 to buildbot. Thanks!

Sure. Currently we only test gcc-6. It'd be easy to test more versions
concurrently, like

gcc-4.4
gcc-4.6
gcc-4.8
gcc-4.9
gcc-5
gcc-6


thanks! If you need to reduce the test matrix I don't see a concern
of dropping 4.6 and 4.8.
4.4 is good for old stuff, 4.9 is the most stable and 5/6 are good
for new warnings.


Not a burden at all. I've enabled them all. :)

Thanks,
Fengguang


Re: [patch 1/1] kernel/trace/bpf_trace.c: work around gcc-4.4.4 anon union initialization bug

2016-07-18 Thread Fengguang Wu

Hi Alexei,

On Mon, Jul 18, 2016 at 05:33:07PM -0700, Alexei Starovoitov wrote:

On Mon, Jul 18, 2016 at 03:50:58PM -0700, a...@linux-foundation.org wrote:

From: Andrew Morton 
Subject: kernel/trace/bpf_trace.c: work around gcc-4.4.4 anon union 
initialization bug

kernel/trace/bpf_trace.c: In function 'bpf_event_output':
kernel/trace/bpf_trace.c:312: error: unknown field 'next' specified in 
initializer
kernel/trace/bpf_trace.c:312: warning: missing braces around initializer
kernel/trace/bpf_trace.c:312: warning: (near initialization for 
'raw.frag.')

Fixes: 555c8a8623a3a87 ("bpf: avoid stack copy and use skb ctx for event 
output")
Acked-by: Daniel Borkmann 
Cc: Alexei Starovoitov 
Cc: David S. Miller 
Signed-off-by: Andrew Morton 


Acked-by: Alexei Starovoitov 

Fengguang can you add gcc-4.4 to buildbot. Thanks!


Sure. Currently we only test gcc-6. It'd be easy to test more versions
concurrently, like

gcc-4.4
gcc-4.6
gcc-4.8
gcc-4.9
gcc-5
gcc-6

Thanks,
Fengguang


Re: [PATCH net-next 3/3] bpf: avoid stack copy and use skb ctx for event output

2016-07-12 Thread Fengguang Wu

Hi Daniel,

On Wed, Jul 13, 2016 at 01:45:47AM +0200, Daniel Borkmann wrote:

On 07/13/2016 01:25 AM, kbuild test robot wrote:

Hi,

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Daniel-Borkmann/BPF-event-output-helper-improvements/20160713-065944
config: s390-allyesconfig (attached as .config)
compiler: s390x-linux-gnu-gcc (Debian 5.3.1-8) 5.3.1 20160205
reproduce:
 wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
 chmod +x ~/bin/make.cross
 # save the attached .config to linux build tree
 make.cross ARCH=s390

All warnings (new ones prefixed by >>):

kernel/trace/bpf_trace.c: In function 'bpf_perf_event_output':
kernel/trace/bpf_trace.c:284:1: warning: 'bpf_perf_event_output' uses dynamic stack allocation
 }
 ^
kernel/trace/bpf_trace.c: In function 'bpf_event_output':
kernel/trace/bpf_trace.c:319:1: warning: 'bpf_event_output' uses dynamic stack allocation
 }
 ^


Hmm, searching a bit on lkml, it seems these warnings on s390 are actually
mostly harmless I believe [1][2] ... looks like they are there to find structs
sitting on stack, for example, at least that's also what the currently existing
one in the above line (bpf_trace.c +284) appears to be about.


Yes it does look so. All such warnings happen only in s390:

% g -h -o '[^ ]*config' *dynamic-stack* | sort | uniq -c | sort -nr
   118 s390-allyesconfig
    80 s390-allmodconfig

Let's ignore all of them on s390.
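
For readers without the `g` alias (assumed here to be a grep wrapper), the tally above can be sketched in Python. The function name `count_configs` and the sample lines are hypothetical; only the pattern `[^ ]*config` comes from the command above.

```python
import re
from collections import Counter

# Same pattern as the command above: a run of non-space characters ending in "config".
CONFIG_RE = re.compile(r"[^ ]*config")

def count_configs(lines):
    """Count every config name found in the lines, most frequent first.

    Mirrors `grep -o ... | sort | uniq -c | sort -nr`, except that the
    order of entries with equal counts may differ.
    """
    counts = Counter()
    for line in lines:
        counts.update(CONFIG_RE.findall(line))
    return counts.most_common()

if __name__ == "__main__":
    # Hypothetical report lines, for illustration only.
    sample = [
        "s390-allyesconfig kernel/trace/bpf_trace.c dynamic stack allocation",
        "s390-allyesconfig net/core/filter.c dynamic stack allocation",
        "s390-allmodconfig kernel/trace/bpf_trace.c dynamic stack allocation",
    ]
    for name, n in count_configs(sample):
        print(f"{n:6d} {name}")
```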

Thanks,
Fengguang


  [1] http://lkml.iu.edu/hypermail/linux/kernel/1601.2/04074.html
  [2] https://lkml.org/lkml/2013/6/25/42


Re: [PATCH] bpf: avoid warning for wrong pointer cast

2016-04-18 Thread Fengguang Wu
Hi Alexei,

On Sat, Apr 16, 2016 at 05:47:42PM -0700, Alexei Starovoitov wrote:
> On Sat, Apr 16, 2016 at 10:29:33PM +0200, Arnd Bergmann wrote:
> > Two new functions in bpf contain a cast from a 'u64' to a
> > pointer. This works on 64-bit architectures but causes a warning
> > on all 32-bit architectures:
> > 
> > kernel/trace/bpf_trace.c: In function 'bpf_perf_event_output_tp':
> > kernel/trace/bpf_trace.c:350:13: error: cast to pointer from integer of 
> > different size [-Werror=int-to-pointer-cast]
> >   u64 ctx = *(long *)r1;
> > 
> > This changes the cast to first convert the u64 argument into a uintptr_t,
> > which is guaranteed to be the same size as a pointer.
> > 
> > Signed-off-by: Arnd Bergmann 
> > Fixes: 9940d67c93b5 ("bpf: support bpf_get_stackid() and 
> > bpf_perf_event_output() in tracepoint programs")
> 
> Thanks.
> Acked-by: Alexei Starovoitov 
> 
> I guess I started to rely on 0-day build-bot too much.
> This patch has been in my tree for 2+ weeks and then in net-next and
> I didn't receive a single email from build-bot about this warning,
> though I do receive them for my other work-in-progress stuff. Odd.
> Fengguang, any idea why build-bot sometimes silent?

Sorry, I was away for some time. Philip, would you help take a look?

Thanks,
Fengguang


Re: [LKP] [PATCH v2] rhashtable: Kill harmless RCU warning in rhashtable_walk_init

2015-12-18 Thread Fengguang Wu
On Fri, Dec 18, 2015 at 11:42:59PM -0500, David Miller wrote:
> From: Herbert Xu 
> Date: Sat, 19 Dec 2015 10:45:28 +0800
> 
> > On Fri, Dec 18, 2015 at 04:27:31PM -0500, David Miller wrote:
> >> From: Herbert Xu 
> >> Date: Fri, 18 Dec 2015 21:14:08 +0800
> >> 
> >> > On Fri, Dec 18, 2015 at 04:54:14AM -0800, Eric Dumazet wrote:
> >> >>
> >> >> You can avoid the comment by using the self documented and lockdep
> >> >> enabled primitive
> >> >> 
> >> >> iter->walker->tbl = rcu_dereference_protected(ht->tbl,
> >> >>                     lockdep_is_held(&ht->lock));
> >> > 
> >> > That is just gross.  I think a comment is much better in this case.
> >> 
> >> Herbert, this macro was created exactly to handle this situation,
> >> and this is what we do everywhere else in the tree.
> > 
> > OK.
> > 
> > ---8<---
> > The commit f9f51b8070be3e829100614a7372b219723b864f ("rhashtable:
> > Fix walker list corruption") causes a suspicious RCU usage warning
> > because we no longer hold ht->mutex when we dereference ht->tbl.
> > 
> > However, this is a false positive because we now hold ht->lock
> > which also guarantees that ht->tbl won't disappear from under us.
> > 
> > This patch kills the warning by using rcu_dereference_protected.
> > 
> > Reported-by: kernel test robot 
> > Signed-off-by: Herbert Xu 
> 
> The correct commit SHA1 is c6ff5268293ef98e48a99597e765ffc417e39fa5.
> 
> Or at least, when I run:
> 
>   git show f9f51b8070be3e829100614a7372b219723b864f
> 
> I get:
> 
>   fatal: bad object f9f51b8070be3e829100614a7372b219723b864f
> 
> :-)

Oops, that commit comes from 0day robot :-)

> https://github.com/0day-ci/linux 
> Herbert-Xu/rhashtable-Fix-walker-list-corruption/20151216-164833
> commit f9f51b8070be3e829100614a7372b219723b864f ("rhashtable: Fix walker list 
> corruption")

commit f9f51b8070be3e829100614a7372b219723b864f
Author: Herbert Xu 
AuthorDate: Wed Dec 16 16:45:54 2015 +0800
Commit: 0day robot 
CommitDate: Wed Dec 16 16:48:36 2015 +0800

rhashtable: Fix walker list corruption

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kbuild-all] [PATCH] mpls: fix semicolon.cocci warnings

2015-11-03 Thread Fengguang Wu
On Tue, Nov 03, 2015 at 11:17:36AM -0500, David Miller wrote:
> From: kbuild test robot <l...@intel.com>
> Date: Tue, 3 Nov 2015 23:25:39 +0800
> 
> > net/mpls/af_mpls.c:722:22-23: Unneeded semicolon
> > 
> > 
> >  Remove unneeded semicolon.
> > 
> > Generated by: scripts/coccinelle/misc/semicolon.cocci
> > 
> > CC: Roopa Prabhu <ro...@cumulusnetworks.com>
> > Signed-off-by: Fengguang Wu <fengguang...@intel.com>
> > ---
> > 
> >  af_mpls.c |4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > --- a/net/mpls/af_mpls.c
> > +++ b/net/mpls/af_mpls.c
> > @@ -719,9 +719,9 @@ static int mpls_nh_build_multi(struct mp
> >  
> > rtnh = rtnh_next(rtnh, );
> > nhs++;
> > -   } endfor_nexthops(rt);
> > +   } endfor_nexthops(rt)
> >  
> > -   rt->rt_nhn = nhs;
> > +   rt->rt_nhn = nhs;
> 
> This new indentation of "rt->rt_nhn = nhs;" is not correct.

Yes. CC Julia for coccinelle.

Thanks,
Fengguang


Re: [RFC PATCH net-next] tipc: tipc_link_is_active() can be static

2015-10-25 Thread Fengguang Wu
On Sun, Oct 25, 2015 at 06:33:18AM -0700, David Miller wrote:
> From: kbuild test robot <fengguang...@intel.com>
> Date: Sat, 24 Oct 2015 23:11:00 +0800
> 
> > TO: "David S. Miller" <da...@davemloft.net>
> > CC: netdev@vger.kernel.org
> > CC: Jon Maloy <jon.ma...@ericsson.com>
> > CC: Ying Xue <ying@windriver.com>
> > CC: tipc-discuss...@lists.sourceforge.net
> > CC: linux-ker...@vger.kernel.org
> > 
> > 
> > Signed-off-by: Fengguang Wu <fengguang...@intel.com>
> 
> Why doesn't the kbuild robot run on its own changes? :-/

It does; however, it checks only for build failures (which indicate a
false sparse warning), to avoid sending a bad make-it-static patch
based on a false warning.

The build warning seems easier to discover and fix in the larger
loop of

apply patch => git push => 0day build test

>   CC [M]  net/tipc/link.o
> net/tipc/link.c:176:12: warning: ‘tipc_link_is_active’ defined but not used 
> [-Wunused-function]

If the robot detected the above warning, it'll still need to send the
report out. Otherwise we lose a chance to notice tipc_link_is_active()
is not used.

However it may be valuable to include possible new warnings inside
the patch changelog, so that maintainers can immediately see the
consequences of applying the patch.

Thanks,
Fengguang


Re: [kbuild-all] [PATCH net-next] net: Lookup actual route when oif is VRF device

2015-10-11 Thread Fengguang Wu
Hi David,

On Mon, Oct 05, 2015 at 12:10:12PM -0600, David Ahern wrote:
> On 10/5/15 12:01 PM, kbuild test robot wrote:
> >Hi David,
> >
> >[auto build test ERROR on v4.3-rc4 -- if it's inappropriate base, please 
> >ignore]
> >
> 
> net-next patches can *not* be applied to Linus' tree. If you are going to
> run the build bot for patches submitted to netdev, please be sure to apply
> those patches to net-next tree.

OK. Hope these rules will work:

- "[PATCH net-next]" will be tested on net-next/master
- "[PATCH net]"  will be tested on net/master
- "[PATCH nf]"   will be tested on nf/master
- "[PATCH nf-next]"  will be tested on nf-next/master
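
The four rules above amount to a lookup from the subject tag to a base branch. A minimal Python sketch follows, assuming the tag is a whitespace-separated word inside the bracketed "PATCH" prefix; the names `TAG_TO_BRANCH` and `base_branch_for` are hypothetical and this is not the robot's actual code.

```python
import re

# Hypothetical table mirroring the four rules listed above.
TAG_TO_BRANCH = {
    "net-next": "net-next/master",
    "net": "net/master",
    "nf": "nf/master",
    "nf-next": "nf-next/master",
}

# Captures whatever follows "PATCH" inside one bracketed prefix,
# e.g. " net-next" or " v17 net-next 1/1".
_PREFIX_RE = re.compile(r"\[[^\]]*\bPATCH\b([^\]]*)\]")

def base_branch_for(subject):
    """Return the branch to test against, or None if no known tag is present."""
    match = _PREFIX_RE.search(subject)
    if match is None:
        return None
    for word in match.group(1).split():
        if word in TAG_TO_BRANCH:
            return TAG_TO_BRANCH[word]
    return None

if __name__ == "__main__":
    print(base_branch_for("[PATCH net-next] net: Lookup actual route when oif is VRF device"))
```

A subject with no recognised tag returns None, so the caller has to decide what base to fall back on; the mail above does not say what the robot does in that case.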

Thanks,
Fengguang


Re: [net-next:master 33/33] net/sched/sch_dsmark.c:316:1: error: unrecognizable insn:

2015-09-18 Thread Fengguang Wu
> > All error/warnings (new ones prefixed by >>):
> > 
> >net/sched/sch_dsmark.c: In function 'dsmark_dequeue':
> > >> net/sched/sch_dsmark.c:316:1: error: unrecognizable insn:
> >(insn 245 244 119 15 (set (reg:QI 11 r11 [179])
> >(and:QI (mem/s/j:QI (reg/f:SI 2 r2 [orig:48 D.44551 ] [48]) [0 
> > D.44551_34->mask+0 S1 A8])
> >(reg:QI 11 r11 [179]))) include/net/dsfield.h:33 -1
> > (nil))
> > >> net/sched/sch_dsmark.c:316:1: internal compiler error: in extract_insn, 
> > >> at recog.c:2109
> >Please submit a full bug report,
> >with preprocessed source if appropriate.
> >See  for instructions.

> I assume this is an informational message, and that you'll submit a gcc
> bug report ?

I'm not sure if gcc people would be interested because that's a pretty
old cross compiler for ARCH=cris.

Someday 0day will be able to track regressions in gcc, which may be
useful for them. :)

Thanks,
Fengguang


Re: [net-next:master 6/12] include/linux/usb/cdc.h:23: error: redefinition of 'struct usb_cdc_parsed_header'

2015-09-16 Thread Fengguang Wu
On Tue, Sep 15, 2015 at 01:27:42PM -0700, David Miller wrote:
> From: kbuild test robot 
> Date: Wed, 16 Sep 2015 03:57:11 +0800
> 
> > All error/warnings (new ones prefixed by >>):
> > 
> >In file included from drivers/usb/gadget/function/u_ether.h:20,
> > from drivers/usb/gadget/legacy/cdc2.c:16:
> >include/linux/usb/cdc.h:47: warning: 'struct usb_interface' declared 
> > inside parameter list
> >include/linux/usb/cdc.h:47: warning: its scope is only this definition 
> > or declaration, which is probably not what you want
> >In file included from drivers/usb/gadget/function/u_serial.h:16,
> > from drivers/usb/gadget/legacy/cdc2.c:17:
> >>> include/linux/usb/cdc.h:23: error: redefinition of 'struct 
> >>> usb_cdc_parsed_header'
> >include/linux/usb/cdc.h:47: warning: 'struct usb_interface' declared 
> > inside parameter list
> >>> include/linux/usb/cdc.h:47: error: conflicting types for 
> >>> 'cdc_parse_cdc_header'
> >include/linux/usb/cdc.h:47: error: previous declaration of 
> > 'cdc_parse_cdc_header' was here
> 
> This may be a side effect of the initial warning, does this reproduce with
> that fixed?  Please show me what the warning looks like in that case.

Dave, net-next/master commit ad1e7b97b3 ("cdc: Fix build warning.")
still has errors.

The problem is, the header file <linux/usb/cdc.h> is included twice.

recent_errors
├── arm-arm5
│   ├── include-linux-usb-cdc.h:error:conflicting-types-for-cdc_parse_cdc_header
│   └── include-linux-usb-cdc.h:error:redefinition-of-struct-usb_cdc_parsed_header
├── arm-arm67
│   ├── include-linux-usb-cdc.h:error:conflicting-types-for-cdc_parse_cdc_header
│   └── include-linux-usb-cdc.h:error:redefinition-of-struct-usb_cdc_parsed_header
├── arm-mmp
│   ├── include-linux-usb-cdc.h:error:conflicting-types-for-cdc_parse_cdc_header
│   └── include-linux-usb-cdc.h:error:redefinition-of-struct-usb_cdc_parsed_header
├── arm-omap2plus_defconfig
│   ├── include-linux-usb-cdc.h:error:conflicting-types-for-cdc_parse_cdc_header
│   └── include-linux-usb-cdc.h:error:redefinition-of-struct-usb_cdc_parsed_header
├── avr32-atngw100_defconfig
│   └── include-linux-usb-cdc.h:error:redefinition-of-struct-usb_cdc_parsed_header
├── avr32-atstk1006_defconfig
│   └── include-linux-usb-cdc.h:error:redefinition-of-struct-usb_cdc_parsed_header
└── i386-allmodconfig
    ├── include-linux-usb-cdc.h:error:conflicting-types-for-cdc_parse_cdc_header
    └── include-linux-usb-cdc.h:error:redefinition-of-struct-usb_cdc_parsed_header

The error messages are now:

In file included from drivers/usb/gadget/function/u_ether.h:20:0,
 from drivers/usb/gadget/function/f_ncm.c:26:
include/linux/usb/cdc.h:23:8: error: redefinition of 'struct usb_cdc_parsed_header'
 struct usb_cdc_parsed_header {
^
In file included from drivers/usb/gadget/function/f_ncm.c:24:0:
include/linux/usb/cdc.h:23:8: note: originally defined here
 struct usb_cdc_parsed_header {
^
In file included from drivers/usb/gadget/function/u_ether.h:20:0,
 from drivers/usb/gadget/function/f_ncm.c:26:
include/linux/usb/cdc.h:44:5: error: conflicting types for 'cdc_parse_cdc_header'
 int cdc_parse_cdc_header(struct usb_cdc_parsed_header *hdr,
 ^
In file included from drivers/usb/gadget/function/f_ncm.c:24:0:
include/linux/usb/cdc.h:44:5: note: previous declaration of 'cdc_parse_cdc_header' was here
 int cdc_parse_cdc_header(struct usb_cdc_parsed_header *hdr,
 ^

Thanks,
Fengguang


Re: [net-next:master 6/12] include/linux/usb/cdc.h:23: error: redefinition of 'struct usb_cdc_parsed_header'

2015-09-16 Thread Fengguang Wu
On Wed, Sep 16, 2015 at 10:54:54AM -0700, David Miller wrote:
> From: Fengguang Wu <fengguang...@intel.com>
> Date: Wed, 16 Sep 2015 21:06:58 +0800
> 
> > On Tue, Sep 15, 2015 at 01:27:42PM -0700, David Miller wrote:
> >> From: kbuild test robot <fengguang...@intel.com>
> >> Date: Wed, 16 Sep 2015 03:57:11 +0800
> >> 
> >> > All error/warnings (new ones prefixed by >>):
> >> > 
> >> >In file included from drivers/usb/gadget/function/u_ether.h:20,
> >> > from drivers/usb/gadget/legacy/cdc2.c:16:
> >> >include/linux/usb/cdc.h:47: warning: 'struct usb_interface' declared 
> >> > inside parameter list
> >> >include/linux/usb/cdc.h:47: warning: its scope is only this 
> >> > definition or declaration, which is probably not what you want
> >> >In file included from drivers/usb/gadget/function/u_serial.h:16,
> >> > from drivers/usb/gadget/legacy/cdc2.c:17:
> >> >>> include/linux/usb/cdc.h:23: error: redefinition of 'struct 
> >> >>> usb_cdc_parsed_header'
> >> >include/linux/usb/cdc.h:47: warning: 'struct usb_interface' declared 
> >> > inside parameter list
> >> >>> include/linux/usb/cdc.h:47: error: conflicting types for 
> >> >>> 'cdc_parse_cdc_header'
> >> >include/linux/usb/cdc.h:47: error: previous declaration of 
> >> > 'cdc_parse_cdc_header' was here
> >> 
> >> This may be a side effect of the initial warning, does this reproduce with
> >> that fixed?  Please show me what the warning looks like in that case.
> > 
> > Dave, net-next/master commit ad1e7b97b3 ("cdc: Fix build warning.")
> > still has errors.
> > 
> > The problem is, the header file <linux/usb/cdc.h> is included twice.
> 
> That's not possible after the patch I committed from Stephen Rothwell
> which adds proper include guards:

Yes, this patch fixed the errors nicely!

Thanks,
Fengguang

> 
> commit b84ee0d7f375ed7840c7c110d46eac24cf94b2a2
> Author: Stephen Rothwell <s...@canb.auug.org.au>
> Date:   Wed Sep 16 11:10:16 2015 +1000
> 
> cdc: add header guards
> 
> Signed-off-by: Stephen Rothwell <s...@canb.auug.org.au>
> Signed-off-by: David S. Miller <da...@davemloft.net>
> 
> diff --git a/include/linux/usb/cdc.h b/include/linux/usb/cdc.h
> index 959d0c8..b5706f9 100644
> --- a/include/linux/usb/cdc.h
> +++ b/include/linux/usb/cdc.h
> @@ -7,6 +7,8 @@
>   * modify it under the terms of the GNU General Public License
>   * version 2 as published by the Free Software Foundation.
>   */
> +#ifndef __LINUX_USB_CDC_H
> +#define __LINUX_USB_CDC_H
>  
>  #include 
>  
> @@ -45,3 +47,5 @@ int cdc_parse_cdc_header(struct usb_cdc_parsed_header *hdr,
>   struct usb_interface *intf,
>   u8 *buffer,
>   int buflen);
> +
> +#endif /* __LINUX_USB_CDC_H */
> diff --git a/include/uapi/linux/usb/cdc.h b/include/uapi/linux/usb/cdc.h
> index b6a9cdd..e2bc417 100644
> --- a/include/uapi/linux/usb/cdc.h
> +++ b/include/uapi/linux/usb/cdc.h
> @@ -6,8 +6,8 @@
>   * firmware based USB peripherals.
>   */
>  
> -#ifndef __LINUX_USB_CDC_H
> -#define __LINUX_USB_CDC_H
> +#ifndef __UAPI_LINUX_USB_CDC_H
> +#define __UAPI_LINUX_USB_CDC_H
>  
>  #include 
>  
> @@ -444,4 +444,4 @@ struct usb_cdc_ncm_ndp_input_size {
>  #define USB_CDC_NCM_CRC_NOT_APPENDED 0x00
>  #define USB_CDC_NCM_CRC_APPENDED 0x01
>  
> -#endif /* __LINUX_USB_CDC_H */
> +#endif /* __UAPI_LINUX_USB_CDC_H */


[netfilter] INFO: task kworker/u2:0:6 blocked for more than 120 seconds.

2015-07-19 Thread Fengguang Wu
Greetings,

0day kernel testing robot got the below dmesg and the first bad commit is

git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git master

commit 085db2c04557d31db61541f361bd8b4de92c9939
Author: Eric W. Biederman ebied...@xmission.com
AuthorDate: Fri Jul 10 18:15:06 2015 -0500
Commit: Pablo Neira Ayuso pa...@netfilter.org
CommitDate: Wed Jul 15 18:17:26 2015 +0200

netfilter: Per network namespace netfilter hooks.

- Add a new set of functions for registering and unregistering per
  network namespace hooks.

- Modify the old global namespace hook functions to use the per
  network namespace hooks in their implementation, so there remains a
  single list that needs to be walked for any hook (this is important
  for keeping the hook priority working and for keeping the code
  walking the hooks simple).

- Only allow registering the per netdevice hooks in the network
  namespace where the network device lives.

- Dynamically allocate the structures in the per network namespace
  hook list in nf_register_net_hook, and unregister them in
  nf_unregister_net_hook.

  Dynamic allocate is required somewhere as the number of network
  namespaces are not fixed so we might as well allocate them in the
  registration function.

  The chain of registered hooks on any list is expected to be small so
  the cost of walking that list to find the entry we are unregistering
  should also be small.

  Performing the management of the dynamically allocated list entries
  in the registration and unregistration functions keeps the complexity
  from spreading.

Signed-off-by: Eric W. Biederman ebied...@xmission.com

+--------------------------------------------------+------------+------------+------------+
|                                                  | 0edcf282b0 | 085db2c045 | 91bd7e9bd9 |
+--------------------------------------------------+------------+------------+------------+
| boot_successes                                   | 909        | 215        | 30         |
| boot_failures                                    | 1          | 15         | 4          |
| BUG:kernel_test_hang                             | 1          | 0          | 1          |
| INFO:task_blocked_for_more_than#seconds          | 0          | 15         | 4          |
| EIP_is_at_default_send_IPI_mask_logical          | 0          | 15         | 3          |
| Kernel_panic-not_syncing:hung_task:blocked_tasks | 0          | 15         | 3          |
| backtrace:reg_timeout_work                       | 0          | 11         | 2          |
| backtrace:watchdog                               | 0          | 15         | 3          |
| backtrace:cleanup_net                            | 0          | 4          | 2          |
| backtrace:reg_check_chans_work                   | 0          | 0          | 1          |
| backtrace:do_group_exit                          | 0          | 0          | 1          |
| backtrace:SyS_exit_group                         | 0          | 0          | 1          |
| backtrace:do_vfs_ioctl                           | 0          | 0          | 1          |
| backtrace:SyS_ioctl                              | 0          | 0          | 1          |
| backtrace:addrconf_verify_work                   | 0          | 0          | 1          |
+--------------------------------------------------+------------+------------+------------+
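
To compare the three columns on an equal footing, the boot counts in the table above can be turned into failure rates. A minimal Python sketch, with the counts copied from the boot_successes and boot_failures rows; the helper name `failure_rate` is hypothetical and not part of the robot's report.

```python
def failure_rate(successes, failures):
    """Fraction of boots that failed; 0.0 when nothing was booted."""
    total = successes + failures
    return failures / total if total else 0.0

# (boot_successes, boot_failures) per abbreviated commit id, from the table above.
boots = {
    "0edcf282b0": (909, 1),
    "085db2c045": (215, 15),
    "91bd7e9bd9": (30, 4),
}

if __name__ == "__main__":
    for commit, (ok, bad) in boots.items():
        print(f"{commit}: {failure_rate(ok, bad):.2%} of {ok + bad} boots failed")
```

Note that the single failure in the first column is a test hang rather than the blocked-task report, so only the second and third columns show the reported symptom at all.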

[   24.456849] cfg80211: Timeout while waiting for CRDA to reply, restoring regulatory settings
Deconfiguring network interfaces... 
[   81.336920] cfg80211: Verifying active interfaces after reg change
[  241.816853] INFO: task kworker/u2:0:6 blocked for more than 120 seconds.
[  241.818319]   Not tainted 4.2.0-rc2-00147-g085db2c #3
[  241.819076] echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message.
[  241.820103] kworker/u2:0D be827000  7124 6  2 0x
[  241.821015] Workqueue: events_power_efficient reg_timeout_work
[  241.821829]  b02a5e5c 0096 bdf5d300 be827000 b0298000  b02a6000 b0298000
[  241.823067]  b02a5e6c b1718a1c  b1b72080 b02a5e74 b1718ce0 b02a5ea0 b171ad73
[  241.824353]  b1b720a4 0246 bd82de64 bdcede98 b0298000 b02a5e84 b1b7f800 b02535c0
[  241.825525] Call Trace:
[  241.825866]  [b1718a1c] schedule+0x53/0x7b
[  241.826489]  [b1718ce0] schedule_preempt_disabled+0x13/0x1b
[  241.827267]  [b171ad73] mutex_lock_nested+0x22b/0x3dd
[  241.827978]  [b147d2da] rtnl_lock+0x14/0x16
[  241.828571]  [b1679607] reg_timeout_work+0x17/0x2f
[  241.829310]  [b106eba6] process_one_work+0x3fc/0x730
[  241.830058]  [b106f51d] ? worker_thread+0x35/0x77d
[  241.830791]  [b106fa99] worker_thread+0x5b1/0x77d
[  241.831452]  [b106f4e8] ? max_active_store+0x5a/0x5a
[  241.832156]  [b1074c11] kthread+0xe3/0xe8
[  241.832706]  [b171e460] ret_from_kernel_thread+0x20/0x30
[  

[rhashtable] WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:301 __debug_object_init()

2015-07-13 Thread Fengguang Wu
Greetings,

0day kernel testing robot got the below dmesg and the first bad commit is

git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

commit 97defe1ecf868b8127f8e62395499d6a06e4c4b1
Author: Thomas Graf tg...@suug.ch
AuthorDate: Fri Jan 2 23:00:20 2015 +0100
Commit: David S. Miller da...@davemloft.net
CommitDate: Sat Jan 3 14:32:57 2015 -0500

rhashtable: Per bucket locks & deferred expansion/shrinking

Introduces an array of spinlocks to protect bucket mutations. The number
of spinlocks per CPU is configurable and selected based on the hash of
the bucket. This allows for parallel insertions and removals of entries
which do not share a lock.

The patch also defers expansion and shrinking to a worker queue which
allows insertion and removal from atomic context. Insertions and
deletions may occur in parallel to it and are only held up briefly
while the particular bucket is linked or unzipped.

Mutations of the bucket table pointer is protected by a new mutex, read
access is RCU protected.

In the event of an expansion or shrinking, the new bucket table allocated
is exposed as a so called future table as soon as the resize process
starts.  Lookups, deletions, and insertions will briefly use both tables.
The future table becomes the main table after an RCU grace period and
initial linking of the old to the new table was performed. Optimization
of the chains to make use of the new number of buckets follows only the
new table is in use.

The side effect of this is that during that RCU grace period, a bucket
traversal using any rht_for_each() variant on the main table will not see
any insertions performed during the RCU grace period which would at that
point land in the future table. The lookup will see them as it searches
both tables if needed.

Having multiple insertions and removals occur in parallel requires nelems
to become an atomic counter.

Signed-off-by: Thomas Graf tg...@suug.ch
Signed-off-by: David S. Miller da...@davemloft.net

+------------------------------------------------------+------------+------------+------------+
|                                                      | 113948d841 | 97defe1ecf | 05016b0f0a |
+------------------------------------------------------+------------+------------+------------+
| boot_successes                                       | 61         | 0          | 0          |
| boot_failures                                        | 27         | 22         | 1931       |
| WARNING:at_net/netlink/genetlink.c:#genl_unbind()    | 27         | 1          |            |
| backtrace:do_group_exit                              | 27         | 1          |            |
| backtrace:SyS_exit_group                             | 27         | 1          |            |
| WARNING:at_lib/debugobjects.c:#__debug_object_init() | 0          | 22         | 1931       |
| backtrace:__debug_object_init                        | 0          | 22         | 1931       |
| backtrace:warn_slowpath_null                         | 0          | 22         | 1931       |
| backtrace:debug_object_init                          | 0          | 22         | 1931       |
| backtrace:__init_work                                | 0          | 22         | 1931       |
| backtrace:rhashtable_init                            | 0          | 22         | 1931       |
| backtrace:test_rht_init                              | 0          | 22         | 1931       |
| backtrace:kernel_init_freeable                       | 0          | 22         | 1931       |
| backtrace:init_timer_key                             | 0          | 22         |            |
| RIP:__asan_load8                                     | 0          | 0          | 6          |
| Kernel_panic-not_syncing:softlockup:hung_tasks       | 0          | 0          | 18         |
| backtrace:erase_augmented                            | 0          | 0          | 4          |
| backtrace:rbtree_test_init                           | 0          | 0          | 18         |
| RIP:__asan_loadN                                     | 0          | 0          | 4          |
| backtrace:insert_augmented                           | 0          | 0          | 5          |
| RIP:__asan_load4                                     | 0          | 0          | 2          |
| backtrace:apic_timer_interrupt                       | 0          | 0          | 6          |
| RIP:insert_augmented                                 | 0          | 0          | 2          |
| backtrace:rb_erase                                   | 0          | 0          | 1          |
| invoked_oom-killer:gfp_mask=0x                       | 0          | 0          | 29         |
| Mem-Info                                             | 0          | 0          | 29         |
| Out_of_memory:Kill_process 

Re: [rhashtable] WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:301 __debug_object_init()

2015-07-13 Thread Fengguang Wu
Sorry, please ignore -- this no longer happens in linux-next, so it should be fine.

On Tue, Jul 14, 2015 at 01:19:57PM +0800, Fengguang Wu wrote:
 Greetings,
 
 0day kernel testing robot got the below dmesg and the first bad commit is
 
 git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
 
 commit 97defe1ecf868b8127f8e62395499d6a06e4c4b1
 Author: Thomas Graf tg...@suug.ch
 AuthorDate: Fri Jan 2 23:00:20 2015 +0100
 Commit: David S. Miller da...@davemloft.net
 CommitDate: Sat Jan 3 14:32:57 2015 -0500
 
 rhashtable: Per bucket locks & deferred expansion/shrinking
 
 Introduces an array of spinlocks to protect bucket mutations. The number
 of spinlocks per CPU is configurable and selected based on the hash of
 the bucket. This allows for parallel insertions and removals of entries
 which do not share a lock.
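
The lock selection described above can be illustrated with a minimal userspace sketch. This is not the kernel code: the names (`striped_locks`, `lock_index`, `bucket_lock`), the fixed `NLOCKS` value, and the use of C11 `atomic_flag` spinlocks in place of the kernel's `spinlock_t` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NLOCKS 8u /* hypothetical lock count; must be a power of two */
static_assert((NLOCKS & (NLOCKS - 1u)) == 0, "NLOCKS must be a power of two");

/* An array of spinlocks shared by all buckets (lock striping). */
struct striped_locks {
    atomic_flag locks[NLOCKS];
};

static void striped_init(struct striped_locks *s)
{
    for (size_t i = 0; i < NLOCKS; i++)
        atomic_flag_clear(&s->locks[i]);
}

/* Map a bucket hash to the slot of the lock that protects it. */
static size_t lock_index(uint32_t hash)
{
    return hash & (NLOCKS - 1u);
}

/* Spin until the lock covering this bucket is acquired. */
static void bucket_lock(struct striped_locks *s, uint32_t hash)
{
    atomic_flag *f = &s->locks[lock_index(hash)];

    while (atomic_flag_test_and_set_explicit(f, memory_order_acquire))
        ; /* busy-wait */
}

static void bucket_unlock(struct striped_locks *s, uint32_t hash)
{
    atomic_flag_clear_explicit(&s->locks[lock_index(hash)],
                               memory_order_release);
}
```

Two mutations whose hashes map to different slots can proceed in parallel; two that map to the same slot (for example hashes 3 and 11 with eight locks) serialize on the same lock.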
 
 The patch also defers expansion and shrinking to a worker queue which
 allows insertion and removal from atomic context. Insertions and
 deletions may occur in parallel to it and are only held up briefly
 while the particular bucket is linked or unzipped.
 
 Mutations of the bucket table pointer are protected by a new mutex; read
 access is RCU protected.
 
 In the event of an expansion or shrinking, the new bucket table allocated
 is exposed as a so-called future table as soon as the resize process
 starts.  Lookups, deletions, and insertions will briefly use both tables.
 The future table becomes the main table after an RCU grace period and
 after the initial linking of the old to the new table has been performed.
 Optimization of the chains to make use of the new number of buckets follows
 only once the new table is in use.
 
 The side effect of this is that during that RCU grace period, a bucket
 traversal using any rht_for_each() variant on the main table will not see
 any insertions performed during the RCU grace period which would at that
 point land in the future table. The lookup will see them as it searches
 both tables if needed.
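
A rough sketch of a lookup that falls back to the future table, assuming a simplified chained hash table. The types and function names here (`struct table`, `table_find`, `lookup`) are hypothetical, the key is used directly as its own hash, and RCU protection is omitted entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct node {
    uint32_t key;
    struct node *next;
};

struct table {
    size_t size;           /* number of buckets, a power of two */
    struct node **buckets; /* heads of the per-bucket chains */
};

/* Walk a single bucket chain of one table. */
static struct node *table_find(const struct table *t, uint32_t key)
{
    assert(t->size != 0 && (t->size & (t->size - 1)) == 0);

    for (struct node *n = t->buckets[key & (t->size - 1)]; n; n = n->next)
        if (n->key == key)
            return n;
    return NULL;
}

/*
 * Search the main table first; if the key is not found and a resize is
 * in progress (future_tbl != NULL), search the future table as well.
 */
static struct node *lookup(const struct table *main_tbl,
                           const struct table *future_tbl, uint32_t key)
{
    struct node *n = table_find(main_tbl, key);

    if (!n && future_tbl)
        n = table_find(future_tbl, key);
    return n;
}
```

A traversal that walks only `main_tbl` would miss an entry inserted into `future_tbl` during the grace period, whereas `lookup()` still finds it.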
 
 Having multiple insertions and removals occur in parallel requires nelems
 to become an atomic counter.
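
To illustrate why the element count needs to be atomic once mutations run in parallel, here is a small sketch using C11 atomics. The kernel uses its own `atomic_t` API; the struct and helper names below are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

struct counted_table {
    atomic_uint nelems; /* updated concurrently by inserters and removers */
};

static void count_init(struct counted_table *t)
{
    atomic_init(&t->nelems, 0);
}

/* Called after an entry has been linked into a bucket. */
static void note_insert(struct counted_table *t)
{
    atomic_fetch_add_explicit(&t->nelems, 1, memory_order_relaxed);
}

/* Called after an entry has been unlinked from a bucket. */
static void note_remove(struct counted_table *t)
{
    unsigned int old =
        atomic_fetch_sub_explicit(&t->nelems, 1, memory_order_relaxed);

    assert(old > 0); /* removing from an empty table is a caller bug */
}

static unsigned int element_count(struct counted_table *t)
{
    return atomic_load_explicit(&t->nelems, memory_order_relaxed);
}
```

With a plain integer, two threads holding different bucket locks could both read, increment, and write back the same value and lose an update; the atomic read-modify-write avoids that without a table-wide lock.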
 
 Signed-off-by: Thomas Graf tg...@suug.ch
 Signed-off-by: David S. Miller da...@davemloft.net
 
 +------------------------------------------------------+------------+------------+------------+
 |                                                      | 113948d841 | 97defe1ecf | 05016b0f0a |
 +------------------------------------------------------+------------+------------+------------+
 | boot_successes                                       | 61         | 0          | 0          |
 | boot_failures                                        | 27         | 22         | 1931       |
 | WARNING:at_net/netlink/genetlink.c:#genl_unbind()    | 27         | 1          |            |
 | backtrace:do_group_exit                              | 27         | 1          |            |
 | backtrace:SyS_exit_group                             | 27         | 1          |            |
 | WARNING:at_lib/debugobjects.c:#__debug_object_init() | 0          | 22         | 1931       |
 | backtrace:__debug_object_init                        | 0          | 22         | 1931       |
 | backtrace:warn_slowpath_null                         | 0          | 22         | 1931       |
 | backtrace:debug_object_init                          | 0          | 22         | 1931       |
 | backtrace:__init_work                                | 0          | 22         | 1931       |
 | backtrace:rhashtable_init                            | 0          | 22         | 1931       |
 | backtrace:test_rht_init                              | 0          | 22         | 1931       |
 | backtrace:kernel_init_freeable                       | 0          | 22         | 1931       |
 | backtrace:init_timer_key                             | 0          | 22         |            |
 | RIP:__asan_load8                                     | 0          | 0          | 6          |
 | Kernel_panic-not_syncing:softlockup:hung_tasks       | 0          | 0          | 18         |
 | backtrace:erase_augmented                            | 0          | 0          | 4          |
 | backtrace:rbtree_test_init                           | 0          | 0          | 18         |
 | RIP:__asan_loadN                                     | 0          | 0          | 4          |
 | backtrace:insert_augmented                           | 0          | 0          | 5          |
 | RIP:__asan_load4                                     | 0          | 0          | 2          |
 | backtrace:apic_timer_interrupt                       | 0          | 0          | 6          |
 | RIP:insert_augmented                                 | 0          | 0          | 2          |
 | backtrace:rb_erase   | 0  | 0   
| 1