Bug#893393: linux-image-amd64: Kernel panic on active outgoing traffic through Huawei E173 modem in NDIS (CDC) mode

2018-08-31 Thread Bjørn Mork
Phil writes:

> Hi, I hope you're all doing well.
>
> Shall we/I maybe reopen a new issue?

I believe so.  I am almost sure we fixed the original memset BUG in
cdc_ncm_fill_tx_frame. Or at least one of them...

So you are probably seeing another issue if you still have problems with
that fix in place.  The issues may or may not be related, but another
bug report would still make it easier to track, given that we already
have one fix for 893393.

> I'm still affected by this and I could use some advice on how to debug
> the issue a little bit better, especially since the kexec kernel
> crashdumps appear not to be helpful. Can I maybe compile the module with
> special debug flags and load it via dkms or something?

Crash reports of some sort are best.  But any info is useful.  Like what
device is this really, and what mode is it currently in?  What driver
does it use?  Most Huawei firmwares will support many different modes
using different USB drivers. But laptop internal modems are most likely
not tested with anything but the Windows MBIM class driver, since that
is the certification requirement and only target platform.
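
One way to answer the driver question is to read it from sysfs. The following is a minimal sketch, assuming the usual Linux layout under /sys/bus/usb/devices (device entries carrying an idVendor file, interface entries named <device>:<config>.<interface> with a driver symlink). The helper name bound_drivers is made up for illustration and is not part of any library.

```python
from pathlib import Path

SYSFS_USB = Path("/sys/bus/usb/devices")  # standard sysfs location on Linux


def bound_drivers(vendor_id, root=SYSFS_USB):
    """Map each USB interface of one vendor to the name of its bound driver.

    bound_drivers is a hypothetical helper. Interfaces that have no driver
    bound are reported as None. An absent sysfs tree yields an empty dict.
    """
    root = Path(root)
    result = {}
    if not root.is_dir():
        return result
    for dev in sorted(root.iterdir()):
        vendor_file = dev / "idVendor"
        # Interface entries contain ":" and have no idVendor file; skip them here.
        if ":" in dev.name or not vendor_file.is_file():
            continue
        if vendor_file.read_text().strip().lower() != vendor_id.lower():
            continue
        # Interfaces of this device are named "<device>:<config>.<interface>".
        for iface in sorted(root.glob(dev.name + ":*")):
            driver = iface / "driver"
            result[iface.name] = driver.resolve().name if driver.exists() else None
    return result
```

Calling bound_drivers("12d1") (12d1 being the Huawei vendor ID quoted in this thread) would be expected to show names such as cdc_mbim, cdc_ncm, huawei_cdc_ncm or option, depending on the mode the firmware is in.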

You can enable the little debugging that's already in the drivers by
doing something like

 echo 'module cdc_ncm +fp' >/sys/kernel/debug/dynamic_debug/control
 echo 'module cdc_mbim +fp' >/sys/kernel/debug/dynamic_debug/control
 echo 'module huawei_cdc_ncm +fp' >/sys/kernel/debug/dynamic_debug/control

See https://www.kernel.org/doc/html/v4.11/admin-guide/dynamic-debug-howto.html

Not sure it will be useful to debug a freeze though.
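
For repeated test runs it can help to script the toggling, including switching the messages off again. This is only a sketch of the same three writes shown above, assuming debugfs is mounted at /sys/kernel/debug and the script runs as root; dyndbg_queries and apply_queries are hypothetical helper names.

```python
from pathlib import Path

CONTROL = Path("/sys/kernel/debug/dynamic_debug/control")
MODULES = ("cdc_ncm", "cdc_mbim", "huawei_cdc_ncm")


def dyndbg_queries(modules=MODULES, enable=True):
    """Build one dynamic-debug query per module.

    "+fp" enables printing with the function name prefixed;
    "-fp" clears those two flags again.
    """
    flags = "+fp" if enable else "-fp"
    return [f"module {name} {flags}" for name in modules]


def apply_queries(queries, control=CONTROL):
    """Write each query to the control file, one write per query (like echo)."""
    for query in queries:
        with open(control, "w") as handle:
            handle.write(query + "\n")
```

A run would then be apply_queries(dyndbg_queries()) before reproducing the problem and apply_queries(dyndbg_queries(enable=False)) afterwards.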

> I don't see any actual changes in [cdc_ncm.c][cdc_ncm], besides the one
> change in `cdc_ncm_unbind`.

Not sure I understood this...  Are you referring to the fix for bug
893393?  That's part of the v4.9.111 stable release:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/net/usb/cdc_ncm.c?h=linux-4.9.y&id=35fd10aeb2248cc7f8d3d48ccc2eff1cf19918f4

> Also I'm confused why this is happening again now. I managed to do an
> rsync upload of ~10GB overnight back then, and my system didn't
> crash, but right now my laptop freezes even if I'm just trying to
> upload a picture to twitter via Firefox.

Freezing without any Oops or similar? 



Bjørn



Bug#893393: linux-image-amd64: Kernel panic on active outgoing traffic through Huawei E173 modem in NDIS (CDC) mode

2018-08-30 Thread Phil
Hi, I hope you're all doing well.

Shall we/I maybe reopen a new issue?

I'm still affected by this and I could use some advice on how to debug
the issue a little bit better, especially since the kexec kernel
crashdumps appear not to be helpful. Can I maybe compile the module with
special debug flags and load it via dkms or something?

I don't see any actual changes in [cdc_ncm.c][cdc_ncm], besides the one
change in `cdc_ncm_unbind`.

Also I'm confused why this is happening again now. I managed to do an
rsync upload of ~10GB overnight back then, and my system didn't
crash, but right now my laptop freezes even if I'm just trying to
upload a picture to twitter via Firefox.

[cdc_ncm]:
https://github.com/torvalds/linux/commits/master/drivers/net/usb/cdc_ncm.c



Bug#893393: linux-image-amd64: Kernel panic on active outgoing traffic through Huawei E173 modem in NDIS (CDC) mode

2018-08-06 Thread debian

On 08/04/2018 02:04 PM, Горбешко Богдан wrote:
> Unfortunately, I messed with it for several hours but couldn't reproduce
> the bug intentionally.

I've just launched a long-running rsync job and streamed a video via mpv,
and managed to crash my system this way.
Before the crash I managed to upload for a couple of minutes while
watching a video via Firefox.

(mpv downloads the entire video at full speed until it's done, whereas
Firefox only buffers it in segments...)

So from my perspective crashes still appear to be showing up, but way
less frequently.

I'm on 4.17.11-arch1 x86_64 right now.



Bug#893393: linux-image-amd64: Kernel panic on active outgoing traffic through Huawei E173 modem in NDIS (CDC) mode

2018-08-04 Thread Горбешко Богдан
Unfortunately, I messed with it for several hours but couldn't reproduce
the bug intentionally. Does anyone have any hints on how to do this more
reliably? I tried uploading several files simultaneously and filling
memory with tmpfs partitions to emulate high memory pressure, but
nothing triggered the crash.

On 8/3/18 12:19 AM, Горбешко Богдан wrote:
> I upgraded the kernel to 4.17.8 and experienced the issue again. Not
> sure if the bug is technically the same, but the symptoms are: I
> tried to upload a 30 MB file, and in the midst of it got a noisy
> screen. I will try to catch it with kdump to get the backtrace again
> later.
>
> On 6/29/18 11:17 AM, Bjørn Mork wrote:
>> This issue should be fixed by commit
>>
>>   49c2c3f246e2 ("cdc_ncm: avoid padding beyond end of skb")
>>
>> which has been backported to v4.17.3, v4.16.18 and v4.14.52. Please
>> check again with one of those kernel versions (or newer).
>>
>> I see now that the fix doesn't apply cleanly to v4.9 stable due to
>> unrelated context changes.  I'll go fix that and resubmit a backport for
>> v4.9, so we get the fix into "stretch" too.  Thanks for reminding me.
>>
>> Bjørn







Bug#893393: linux-image-amd64: Kernel panic on active outgoing traffic through Huawei E173 modem in NDIS (CDC) mode

2018-08-02 Thread Горбешко Богдан
I upgraded the kernel to 4.17.8 and experienced the issue again. Not
sure if the bug is technically the same, but the symptoms are: I tried
to upload a 30 MB file, and in the midst of it got a noisy screen. I
will try to catch it with kdump to get the backtrace again later.

On 6/29/18 11:17 AM, Bjørn Mork wrote:
> This issue should be fixed by commit
>
>   49c2c3f246e2 ("cdc_ncm: avoid padding beyond end of skb")
>
> which has been backported to v4.17.3, v4.16.18 and v4.14.52.  Please
> check again with one of those kernel versions (or newer).
>
> I see now that the fix doesn't apply cleanly to v4.9 stable due to
> unrelated context changes.  I'll go fix that and resubmit a backport for
> v4.9, so we get the fix into "stretch" too.  Thanks for reminding me.
>
> Bjørn





Bug#893393: linux-image-amd64: Kernel panic on active outgoing traffic through Huawei E173 modem in NDIS (CDC) mode

2018-07-06 Thread Phil
On Fri, 29 Jun 2018 10:17:20 +0200 Bjørn Mork wrote:

> This issue should be fixed by commit
>  49c2c3f246e2 ("cdc_ncm: avoid padding beyond end of skb")
>  https://patchwork.kernel.org/patch/10453923/
> Please check again with one of those kernel versions (or newer).

Hi, thank you for your quick response.

I had to wait a bit for 4.17-3 to be released in the Arch Linux repos.

I've tested uploading again and couldn't reproduce the crash anymore.
I experienced a single crash, but I'm not sure if it was related to
uploading. I'll reach out to you if I experience any further crashes
related to the modem.


> [...] we get the fix into "stretch" too. Thanks for reminding me.

Thank you for your work. This issue has been super exhausting and I'm
really thankful that it finally appears to be fixed.


Best wishes,
Phil.



Bug#893393: linux-image-amd64: Kernel panic on active outgoing traffic through Huawei E173 modem in NDIS (CDC) mode

2018-06-29 Thread Bjørn Mork
This issue should be fixed by commit

 49c2c3f246e2 ("cdc_ncm: avoid padding beyond end of skb")

which has been backported to v4.17.3, v4.16.18 and v4.14.52.  Please
check again with one of those kernel versions (or newer).

I see now that the fix doesn't apply cleanly to v4.9 stable due to
unrelated context changes.  I'll go fix that and resubmit a backport for
v4.9, so we get the fix into "stretch" too.  Thanks for reminding me.
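
When triaging reports it can be handy to check mechanically whether a given kernel already carries this fix. The snippet below is a minimal sketch built only from the release numbers named in this thread (including the v4.9.111 backport mentioned elsewhere in it). The function name has_padding_fix is made up, and treating every series from 4.18 onwards as fixed is an assumption that the thread does not state.

```python
import re

# First release in each stable series that carries the fix, as named in
# this thread.
FIRST_FIXED = {(4, 9): 111, (4, 14): 52, (4, 16): 18, (4, 17): 3}

# Assumption (not stated in the thread): series from 4.18 onwards contain
# the fix because it was merged upstream before that release.
FIXED_SINCE = (4, 18)


def has_padding_fix(version):
    """Return True if an upstream kernel version string should carry the fix.

    Only the leading "major.minor[.patch]" part is parsed, so suffixes such
    as "-arch1" are ignored. Series that are neither listed nor covered by
    the 4.18 assumption are reported as not fixed.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"cannot parse kernel version: {version!r}")
    series = (int(match.group(1)), int(match.group(2)))
    patch = int(match.group(3) or 0)
    if series in FIRST_FIXED:
        return patch >= FIRST_FIXED[series]
    return series >= FIXED_SINCE
```

Note that the input has to be the upstream version. On Debian, `uname -r` prints the ABI name (for example 4.14.0-3-amd64), while the upstream version (4.14.17 in that case) appears in the `uname -v` output. A True result only says that this particular fix is present; reports in this thread show crashes on 4.17.8 and 4.17.11 as well, which points to a separate issue.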



Bjørn



Bug#893393: linux-image-amd64: Kernel panic on active outgoing traffic through Huawei E173 modem in NDIS (CDC) mode

2018-06-27 Thread Phil

Hi everybody,

I'm really grateful to have stumbled upon this issue, because it
describes the exact same issue I've been experiencing for a while now.

Basically, whenever I upload files via rsync/Firefox/Chromium, my entire
Linux system crashes within several seconds. I've experienced this
issue on Debian 10, but it also shows up on Arch Linux. In my case the
modem in question is an M.2 module, a Huawei ME906s (USB ID 12d1:15c1).

I've also tried debugging via kdump and got different kernel errors
across multiple crashes. I've been logging my debugging attempts in
this gist [0].

It doesn't matter if I'm uploading files from a ramfs (/tmp/) or my SATA
SSD.

I'm also using modemmanager and network-manager.

I switched ISP and thought the issue was resolved, but I've just tried
uploading a file again and it still crashes my Linux 4.17.2-1-ARCH
kernel (so I guess this is a Linux issue and not a Debian-only one).


[0]: https://gist.github.com/norpol/d5b043d6082ace9fc232527d4835f045 (or see the attachment)
# Debugging Linux Kernel Crash

## Error description:
Almost every time I upload a bigger file (65MB in this case) via my browser (Firefox, build provided by mozilla.org as `.tar.gz`), my system crashes.
The issue happens especially when I'm doing different things at the same time (watching a video, reading email, and uploading a file). The system uses an SSD, but the bug also appears if the file is served from `/tmp`.

Component | Details
--- | ---
OS | Debian Testing (Release Buster / 10)
Kernel | `Linux 4.14.0-3-amd64 #1 SMP Debian 4.14.17-1 (2018-02-14) x86_64 GNU/Linux`
CPU | `Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz`
Machine | Thinkpad T560
EFI | `EFI v2.40 by Lenovo, efi:  SMBIOS=0xb705e000  ACPI=0xb7ffe000  ACPI 2.0=0xb7ffe014  MPS=0xb7f48000  ESRT=0xb6aa8000`
Boot method | efistub
Storage | `Samsung SSD 840 EVO (256GB)`, LUKS (with LVM), rootfs=btrfs, homefs=ext4, cryptswap in LVM

The issue has persisted across multiple kernel upgrades, though. It also showed up back when Debian testing was called Stretch.
The issue mostly appears on file uploads via the LTE modem.

## Actions

- [ ] Intel uCode upgrade didn't help.
- [ ] Vendor BIOS/uEFI upgrade didn't help.
- [ ] Disabling apparmor didn't help.
- [ ] Disabling/changing the IO scheduler didn't help.
- [ ] Reinstalling the operating system (Debian => Arch Linux) didn't help.
- [ ] Disabling anything power-saving related in the BIOS didn't help.
  - See [Skylake crash bug arstechnica (2017)](https://arstechnica.com/information-technology/2017/06/skylake-kaby-lake-chips-have-a-crash-bug-with-hyperthreading-enabled/)
  - Basically setting c-state to 1 [might also work](https://askubuntu.com/questions/749349/how-to-set-intel-idle-max-cstate-1)

Installed and set up `kdump-tools` (had to add `nmi_watchdog=1` to the kernel command line, see `/proc/cmdline`; otherwise kdump failed to load the kdump kernel on crash).

## Other

Early bootup BIOS warning: 
```
[  +0.00] Kernel command line: initrd=\initrd.img root=/dev/mapper/system-root resume=UUID=d9506118-b9e2-49db-9385-f731ef1c8615 ro quiet splash crashkernel=384M nmi_watchdog=1
[  +0.00] PID hash table entries: 4096 (order: 3, 32768 bytes)
[  +0.00] Calgary: detecting Calgary via BIOS EBDA area
[  +0.00] Calgary: Unable to locate Rio Grande table in EBDA - bailing!
```

### kdump-tools dmesg error trace

Note: I have multiple crashes, this is the only one containing `[ cut here ]` section.

```
[ cut here ]
WARNING: CPU: 2 PID: 2206 at /build/linux-K4nuoe/linux-4.14.17/mm/vmacache.c:102 vmacache_find+0x96/0xa0
Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables devlink iptable_filter cdc_mbim cdc_wdm cdc_ncm snd_hrtimer snd_seq snd_seq_device cpufreq_userspace cpufreq_powersave cpufreq_conservative wireguard(O) ip6_udp_tunnel udp_tunnel binfmt_misc nls_ascii nls_cp437 vfat fat ext4 mbcache jbd2 fscrypto ecb arc4 iwlmvm snd_soc_skl snd_hda_codec_hdmi snd_soc_skl_ipc intel_rapl snd_soc_sst_ipc btusb x86_pkg_temp_thermal snd_soc_sst_dsp intel_powerclamp btrtl mac80211 btbcm snd_hda_ext_core snd_hda_codec_realtek coretemp btintel snd_soc_sst_match efi_pstore snd_hda_codec_generic kvm_intel bluetooth snd_soc_core snd_compress kvm snd_hda_intel irqbypass uvcvideo videobuf2_vmalloc intel_cstate videobuf2_memops intel_uncore videobuf2_v4l2 iwlwifi intel_rapl_perf snd_hda_codec serio_raw wmi_bmof videobuf2_core snd_hda_core efivars rtsx_pci_ms drbg cfg80211 memstick ansi_cprng snd_hwdep cdc_ether option videodev snd_pcm usb_wwan thinkpad_acpi usbnet iTCO_wdt usbserial mei_me snd_timer ecdh_generic nvram mii iTCO_vendor_support media sg crc16 joydev shpchp mei snd soundcore intel_pch_thermal rfkill battery ac evdev nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nft_counter nft_ct nf_conntrack nft_meta 
```

Bug#893393: linux-image-amd64: Kernel panic on active outgoing traffic through Huawei E173 modem in NDIS (CDC) mode

2018-03-19 Thread Горбешко Богдан

On 3/19/18 2:18 AM, Bjørn Mork wrote:
> Горбешко Богдан writes:
>
>> vboxdrv(O)
>> binder_linux(O)
>> ashmem_linux(O)
>
> Can you reproduce the problem without these modules loaded?

ashmem/binder were installed only 3 weeks ago. And the VirtualBox VMs
were last run in July 2017; nothing else is expected to use its kernel
module. However, I'll try to blacklist it for now.

> AFAICS there is no way the only memset in cdc_ncm can be called with
> crashing input parameters. Unless something is scribbling over the
> driver's data.

Maybe inspecting the crashdump would shed some light on the possible
module conflict? If so, I'll try to upload it.

> Bjørn






Bug#893393: linux-image-amd64: Kernel panic on active outgoing traffic through Huawei E173 modem in NDIS (CDC) mode

2018-03-18 Thread Bjørn Mork
Горбешко Богдан writes:

> vboxdrv(O)
> binder_linux(O)
> ashmem_linux(O)

Can you reproduce the problem without these modules loaded?

AFAICS there is no way the only memset in cdc_ncm can be called with
crashing input parameters. Unless something is scribbling over the
driver's data.


Bjørn



Bug#893393: linux-image-amd64: Kernel panic on active outgoing traffic through Huawei E173 modem in NDIS (CDC) mode

2018-03-18 Thread Горбешко Богдан

Package: linux-image-amd64
Version: 4.14+89
Severity: critical
Justification: breaks the whole system

Dear Maintainer,

This has been bothering me since November 2017, when wvdial broke and I
moved to NetworkManager. While wvdial uses only the serial interface
(ttyUSB), NetworkManager sometimes recognizes the modem as ttyUSB and
sometimes as cdc-wdm. So maybe the bug is much older, as I was not
actively using the huawei_cdc_ncm module before.

Since then, I have been experiencing strange system crashes. The only
thing they have in common is that HDD activity stops while the fan keeps
running; the system doesn't respond to anything, including REISUB. For
the first weeks the screen image simply froze; later it started getting
garbled when a crash happened.

I was not sure if this is a software problem or a hardware one. I
couldn't even strictly determine what conditions lead to this. The only
mostly common factor was that it happens on active outgoing traffic
(file uploads, torrent seeding and so on), but I'm not sure if that is
the case every time. Sometimes the issue subsided and I could calmly
upload large files for several days or even several weeks, but then the
crashes started happening again.

People on a forum suggested that I install crash/kdump. Sometimes kdump
triggers on kernel panic; sometimes it doesn't and I still get an
unresponsive system with a garbled screen. When it triggers, systemd
tries to start the whole bunch of services in a small amount of RAM, so
it proceeds very slowly and finally hangs or drops to maintenance mode
because of expired timeouts. Today I found out that in maintenance mode
I can still run the kdump service and successfully collect the kernel
dump and dmesg.

[60103.825970] BUG: unable to handle kernel paging request at 9641f2004000
[60103.825998] IP: __memset+0x24/0x30
[60103.826001] PGD a6a06067 P4D a6a06067 PUD 4f65a063 PMD 72003063 PTE 0
[60103.826013] Oops: 0002 [#1] SMP NOPTI
[60103.826018] Modules linked in: iptable_filter option huawei_cdc_ncm cdc_wdm cdc_ncm usbnet usb_wwan usbserial mii lz4 lz4_compress zram zsmalloc cpufreq_userspace cpufreq_powersave cpufreq_conservative rtsx_usb_ms memstick uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media arc4 brcmsmac cordic brcmutil b43 mac80211 binfmt_misc cfg80211 fuse xfs ssb libcrc32c rng_core pcmcia pcmcia_core snd_hda_codec_conexant snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core kvm_amd snd_hwdep kvm snd_pcm_oss snd_mixer_oss joydev irqbypass pcspkr snd_pcm bcma serio_raw ideapad_laptop sparse_keymap rfkill k10temp sg snd_timer wmi snd shpchp sp5100_tco battery ac soundcore evdev acpi_cpufreq vboxdrv(O) squashfs loop parport_pc ppdev lp parport sunrpc binder_linux(O)
[60103.826105]  ashmem_linux(O) ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb crypto_simd cryptd glue_helper aes_x86_64 uas usb_storage sr_mod sd_mod cdrom rtsx_usb_sdmmc mmc_core rtsx_usb mfd_core amdkfd radeon psmouse ohci_pci ahci libahci i2c_algo_bit ttm atl1c libata drm_kms_helper ohci_hcd ehci_pci ehci_hcd i2c_piix4 scsi_mod drm usbcore usb_common video button thermal
[60103.826158] CPU: 0 PID: 5990 Comm: Chrome_DevTools Tainted: G   O 4.14.0-3-amd64 #1 Debian 4.14.17-1
[60103.826162] Hardware name: LENOVO 20081   /Inagua, BIOS 41CN28WW(V2.04) 05/03/2012
[60103.826166] task: 964193484fc0 task.stack: b2890137c000
[60103.826171] RIP: 0010:__memset+0x24/0x30
[60103.826174] RSP: :964316c03b68 EFLAGS: 00010216
[60103.826178] RAX:  RBX: fffd RCX: 1ffa5000
[60103.826181] RDX: 0005 RSI:  RDI: 9641f2003ffc
[60103.826184] RBP: 964192f6c800 R08: 304d434e R09: 9641f1d2c004
[60103.826187] R10: 0002 R11: 05ae R12: 9642e6957a80
[60103.826190] R13: 964282ff2ee8 R14: 000d R15: 9642e4843900
[60103.826194] FS:  7f395aaf6700() GS:964316c0() knlGS:
[60103.826197] CS:  0010 DS:  ES:  CR0: 80050033
[60103.826200] CR2: 9641f2004000 CR3: 13b0c000 CR4: 06f0
[60103.826204] Call Trace:
[60103.826212]  
[60103.826225]  cdc_ncm_fill_tx_frame+0x5e3/0x740 [cdc_ncm]
[60103.826236]  cdc_ncm_tx_fixup+0x57/0x70 [cdc_ncm]
[60103.826246]  usbnet_start_xmit+0x5d/0x710 [usbnet]
[60103.826254]  ? netif_skb_features+0x119/0x250
[60103.826259]  dev_hard_start_xmit+0xa1/0x200
[60103.826267]  sch_direct_xmit+0xf2/0x1b0
[60103.826273]  __dev_queue_xmit+0x5e3/0x7c0
[60103.826280]  ? ip_finish_output2+0x263/0x3c0
[60103.826284]  ip_finish_output2+0x263/0x3c0
[60103.826289]  ? ip_output+0x6c/0xe0
[60103.826293]  ip_output+0x6c/0xe0
[60103.826298]  ? ip_forward_options+0x1a0/0x1a0
[60103.826303]  tcp_transmit_skb+0x516/0x9b0
[60103.826309]  tcp_write_xmit+0x1aa/0xee0
[60103.826313]  ? sch_direct_xmit+0x71/0x1b0
[60103.826318]  tcp_tasklet_func+0x177/0x180
[60103.826325]