Bug#838491: linux-image-4.7.0-0.bpo.1-amd64-unsigned: extreme load averages and over 2000 kworker threads

2016-10-14 Thread Markus Koeberl
On Monday 10 October 2016 16:16:30 Ben Hutchings wrote:
>
> I think this might be fixed by "mm: memcontrol: use special workqueue
> for creating per-memcg caches" included in version 4.7.6-1.  Let us
> know whether that does it.

It did not happen again with linux-image-4.7.0-1-amd64-unsigned (4.7.6-1) 
within the last day.
I guess you can close the bug.
Thanks!


regards
Markus Köberl
-- 
Markus Koeberl
Graz University of Technology
Signal Processing and Speech Communication Laboratory
E-mail: markus.koeb...@tugraz.at




2016-10-10 Thread Ben Hutchings
On Mon, 2016-10-10 at 15:41 +0200, Markus Koeberl wrote:
> Package: src:linux
> Followup-For: Bug #838491
> 
> Dear Maintainer,
> 
>    * What led up to the situation?
> 
> upgrade kernel and systemd to the version provided in
> jessie-backports
> 
>    * What exactly did you do (or not do) that was effective (or
>  ineffective)?
> 
> during normal usage (slurm cluster node):
> 
> load average: 1290.54, 513.19, 466.29
> 
> the 5-minute load peaks reach 2000
>
> ps aux | grep kworker | wc -l
> 4188
[...]

I think this might be fixed by "mm: memcontrol: use special workqueue
for creating per-memcg caches" included in version 4.7.6-1.  Let us
know whether that does it.
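
To check whether a given machine already runs a kernel package at or above the version named above, something like the following might help. It is only a sketch: it assumes GNU sort with -V and a "uname -v" string of the form shown in the report ("#1 SMP Debian 4.7.5-1~bpo8+2 (2016-10-01)"). Both helper names are made up for illustration. On Debian itself, "dpkg --compare-versions" is the authoritative comparison.

```shell
# Extract the Debian package version from a "uname -v" style string on stdin,
# e.g. "#1 SMP Debian 4.7.5-1~bpo8+2 (2016-10-01)" -> "4.7.5-1~bpo8+2".
# (Hypothetical helper; assumes the word "Debian" precedes the version.)
debian_kernel_version() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "Debian") print $(i + 1) }'
}

# Succeed if version $1 is at least version $2.
# (Hypothetical helper; relies on GNU "sort -V", which only approximates
# Debian version ordering.)
version_at_least() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Typical use on a live system:
#   running=$(uname -v | debian_kernel_version)
#   version_at_least "$running" 4.7.6-1 && echo "fix should be included"
```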

Ben.

-- 
Ben Hutchings
Unix is many things to many people,
but it's never been everything to anybody.





2016-10-10 Thread Markus Koeberl
Package: src:linux
Followup-For: Bug #838491

Dear Maintainer,

   * What led up to the situation?

upgrade kernel and systemd to the version provided in jessie-backports

   * What exactly did you do (or not do) that was effective (or
 ineffective)?

during normal usage (slurm cluster node):

load average: 1290.54, 513.19, 466.29

the 5-minute load peaks reach 2000

ps aux | grep kworker | wc -l
4188
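
A note on the count above: a pipeline of the form "ps aux | grep kworker | wc -l" can also count the grep process itself, so the figure may be slightly high. A minimal sketch of a stricter count, assuming procps ps and kworker thread names beginning with "kworker/"; the helper name count_kworkers is made up for illustration:

```shell
# Count kernel worker threads from a list of command names on stdin.
# kworker threads have names such as "kworker/0:1" or "kworker/u16:3".
# "|| true" keeps the exit status zero when grep finds no match
# (grep -c still prints 0 in that case).
count_kworkers() {
    grep -c '^kworker/' || true
}

# Typical use on a live system:
#   ps -e -o comm= | count_kworkers
```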

I followed the debugging instructions in
https://raw.githubusercontent.com/torvalds/linux/master/Documentation/workqueue.txt

echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
cat /sys/kernel/debug/tracing/trace_pipe > out.txt
after a few seconds:
cat out.txt | awk '{print $8}' | sort | uniq -c | sort -n
      1 function=do_cache_clean
      1 function=pcpu_balance_workfn
      1 function=xfs_eofblocks_worker
      2 function=neigh_periodic_work
      6 function=xfs_reclaim_worker
      6 function=xlog_cil_push_work
      8 function=disk_events_workfn
      8 function=igb_watchdog_task
     12 function=push_to_pool
     13 function=blk_timeout_work
     15 function=vmstat_shepherd
     22 function=xfs_end_io
     27 function=key_garbage_collector
     27 function=lru_add_drain_per_cpu
     34 function=delayed_fput
     38 function=scsi_requeue_run_queue
     39 function=blk_delay_work
     40 function=cgroup_pidlist_destroy_work_fn
     56 function=flush_to_ldisc
     64 function=cache_reap
     77 function=wb_workfn
    101 function=os_execute_work_item
    131 function=css_killed_work_fn
    142 function=xfs_buf_ioend_work
    156 function=vmstat_update
    162 function=call_usermodehelper_exec_work
    162 function=cgroup_release_agent
    409 function=vmpressure_work_fn
    497 function=css_release_work_fn
    500 function=css_free_work_fn
  47931 function=memcg_kmem_cache_create_func
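
The tally above depends on the function name being the eighth whitespace-separated field of each trace line, which may not hold if the trace line layout differs. A sketch of a variant that matches the "function=" field directly, assuming only that each workqueue_queue_work line contains such a field; the helper name tally_work_functions is made up for illustration:

```shell
# Tally queued work items per function from trace lines on stdin.
# Extracts every "function=<name>" token regardless of its position,
# then counts occurrences and sorts by ascending count.
tally_work_functions() {
    grep -o 'function=[^ ]*' | sort | uniq -c | sort -n
}

# Typical use with the capture file from above:
#   tally_work_functions < out.txt
```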


I found https://bugzilla.kernel.org/show_bug.cgi?id=172981, which seems to be
the same problem.



-- Package-specific info:
** Version:
Linux version 4.7.0-0.bpo.1-amd64 (debian-kernel@lists.debian.org) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP Debian 4.7.5-1~bpo8+2 (2016-10-01)

** Command line:
BOOT_IMAGE=/vmlinuz-4.7.0-0.bpo.1-amd64 root=UUID=d3b74f44-0f5e-4ba1-9606-ad42b76e5918 ro cgroup_enable=memory swapaccount=1 elevator=deadline quiet nomodeset nouveau.modeset=0

** Tainted: POE (12289)
 * Proprietary module has been loaded.
 * Out-of-tree module has been loaded.
 * Unsigned module has been loaded.


** Model information
sys_vendor: Supermicro
product_name: X10SRA
product_version: 0123456789
chassis_vendor: Supermicro
chassis_version: 0123456789
bios_vendor: American Megatrends Inc.
bios_version: 2.0
board_vendor: Supermicro
board_name: X10SRA
board_version: 1.01

** Loaded modules:
8021q(E)
garp(E)
mrp(E)
stp(E)
llc(E)
nvidia_drm(POE)
nvidia_modeset(POE)
nvidia(POE)
drm_kms_helper(E)
drm(E)
openafs(POE)
nfsd(E)
auth_rpcgss(E)
nfs_acl(E)
nfs(E)
lockd(E)
grace(E)
fscache(E)
sunrpc(E)
intel_rapl(E)
sb_edac(E)
edac_core(E)
x86_pkg_temp_thermal(E)
intel_powerclamp(E)
coretemp(E)
xfs(E)
libcrc32c(E)
snd_hda_codec_hdmi(E)
iTCO_wdt(E)
iTCO_vendor_support(E)
mxm_wmi(E)
evdev(E)
kvm_intel(E)
kvm(E)
irqbypass(E)
crct10dif_pclmul(E)
crc32_pclmul(E)
ghash_clmulni_intel(E)
hmac(E)
drbg(E)
ansi_cprng(E)
aesni_intel(E)
aes_x86_64(E)
lrw(E)
gf128mul(E)
glue_helper(E)
ablk_helper(E)
cryptd(E)
pcspkr(E)
serio_raw(E)
snd_hda_codec_realtek(E)
snd_hda_codec_generic(E)
snd_hda_intel(E)
snd_hda_codec(E)
snd_hda_core(E)
snd_hwdep(E)
snd_pcm(E)
snd_timer(E)
snd(E)
soundcore(E)
lpc_ich(E)
mfd_core(E)
sg(E)
i2c_i801(E)
shpchp(E)
ipmi_msghandler(E)
wmi(E)
acpi_power_meter(E)
tpm_tis(E)
tpm(E)
button(E)
usbhid(E)
hid(E)
fuse(E)
autofs4(E)
ext4(E)
crc16(E)
jbd2(E)
crc32c_generic(E)
mbcache(E)
dm_mod(E)
sr_mod(E)
cdrom(E)
sd_mod(E)
crc32c_intel(E)
psmouse(E)
ahci(E)
igb(E)
libahci(E)
ehci_pci(E)
i2c_algo_bit(E)
ehci_hcd(E)
dca(E)
ptp(E)
pps_core(E)
xhci_pci(E)
libata(E)
xhci_hcd(E)
usbcore(E)
scsi_mod(E)
usb_common(E)
fjes(E)

** PCI devices:
00:00.0 Host bridge [0600]: Intel Corporation Haswell-E DMI2 [8086:2f00] (rev 
02)
Subsystem: Super Micro Computer Inc Device [15d9:0857]
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- 

00:01.0 PCI bridge [0604]: Intel Corporation Haswell-E PCI Express Root Port 1 
[8086:2f02] (rev 02) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: 
Kernel driver in use: pcieport

00:03.0 PCI bridge [0604]: Intel Corporation Haswell-E PCI Express Root Port 3 
[8086:2f08] (rev 02) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle-