Bug#858125: e1000: ethernet interface hangs occasionally, kernel reports hang
I have determined that Debian was complaining about my ethernet port
because I had flow control enabled on the switch, and the switch was
getting easily overwhelmed and hanging, so the Debian resets were valid.
Thank you for the research on this. I think you can close this case.

---

On Wed, Mar 22, 2017 at 02:42:30AM +, Ben Hutchings wrote:
> Control: retitle -1 TX watchdog fires on e1000e interface with flow control
> enabled
>
> On Tue, 2017-03-21 at 18:36 -0400, Bruce Momjian,,, wrote:
> > On Tue, Mar 21, 2017 at 04:04:11PM -0400, Bruce Momjian,,, wrote:
> > > I think this proves my problems are related to flow control. How would
> > > you like to proceed? Is there a patch or change you would like me to
> > > test? Just close the ticket?
> > >
> > > I have a fix, but it is likely others would not know they had this
> > > problem unless they were monitoring their kernel logs or their network
> > > traffic for lag.
> >
> > Oh, I should also mention the port that is having problems is connected
> > to a NetGear GS108Ev3 switch, with current firmware, version 2.00.09.
> > The port connected to my Actiontec FIOS router is not having problems.
>
> I don't know about any specific bug, but if the switch sends flow
> control XOFF frames continually for long enough (usually 5 seconds)
> this will trigger the TX watchdog.
>
> It sounds like your switch implements flow control properly (some
> broken switches auto-negotiate it but actually flood flow control
> frames). However, if a device on some other port (that also has flow
> control enabled) sends XOFF frames continually *and* your server sends
> frames that should go to that other port, the switch will do the same
> to the server once the switch's internal queue has filled up.
>
> If the switch has port statistics including numbers of pause frames
> then you can see where they are coming from, but I think it doesn't.
> Without that information it's going to be hard to tell exactly where
> the fault lies.
>
> The e1000e driver *does* have statistics for pause frames transmitted
> and received (run: "ethtool -S eth0| grep flow_control"). If you log
> these every second then it should be possible to see what happens
> around the time the TX watchdog fires. That could provide some clues
> as to whether the NIC is behaving correctly.
>
> Ben.
>
> --
> Ben Hutchings
> Power corrupts. Absolute power is kind of neat.
> - John Lehman, Secretary of the US Navy
> 1981-1987

--
Bruce Momjian http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
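
Ben's suggestion to log the pause-frame counters every second could be sketched roughly as follows. This is only an illustration, not something posted in the thread: the function names and the default log path are made up, and `eth0` is simply the interface name used in the quoted command. Each sample is timestamped so it can be lined up with the TX watchdog message in the kernel log.

```shell
#!/bin/sh
# Keep only the flow-control counters from `ethtool -S` output on stdin
# and prefix each one with a timestamp. (Hypothetical helper name.)
stamp_flow_control() {
    ts=$(date '+%Y-%m-%d %H:%M:%S')
    grep flow_control | while read -r name value; do
        printf '%s %s %s\n' "$ts" "$name" "$value"
    done
}

# Sample the counters once per second until interrupted.
# $1 = interface (default eth0, as in the thread)
# $2 = log file (default path is an assumption for illustration)
log_flow_control() {
    iface=${1:-eth0}
    logfile=${2:-/tmp/flow_control.log}
    while :; do
        ethtool -S "$iface" | stamp_flow_control >> "$logfile"
        sleep 1
    done
}
```

Running `log_flow_control eth0 /tmp/flow_control.log` in the background and then comparing the timestamps around a watchdog reset should show whether the received XOFF counter was climbing at that moment.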
Bug#869670: Depends: linux-headers-4.11.0-2-common ... but it is not going to be installed
Package: linux-headers-4.11.0-2-all
Followup-For: Bug #869670

Dear Maintainer,

I have tried to install the nvidia drivers and run virtualbox, but building the DKMS modules requires linux-headers, and I couldn't install them. When I try to boot the system with my graphics card (Gigabyte GTX 960 4GB), it stops at boot. I couldn't get it to work. When I somehow got the system to install the nvidia driver DKMS, I then had to remove the packages from recovery, because I couldn't boot even without my GPU.

-- System Information:
Debian Release: buster/sid
APT prefers unstable
APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.11.0-2-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages linux-headers-4.11.0-2-all depends on:
pn linux-headers-4.11.0-2-all-amd64

linux-headers-4.11.0-2-all recommends no packages.

linux-headers-4.11.0-2-all suggests no packages.
Bug#869670: Depends: linux-headers-4.11.0-2-common ... but it is not going to be installed
Package: linux-headers-4.11.0-2-common
Version: 4.11.11-1
Followup-For: Bug #869670

I had the same issue here, with the same result: I lost my kernel headers and DKMS stopped working, so I can't build the VirtualBox modules or use VirtualBox at all.

I tried to look for a ``4.11.11-1+b1`` version of the ``-common`` and ``-common-rt`` packages (both lacking the latest consistent version), but found nothing.

It looks like the only chance left for me is rebooting into a previous kernel. `:(`

Cheers!

-- System Information:
Debian Release: buster/sid
APT prefers unstable
APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.11.0-2-amd64 (SMP w/4 CPU cores)
Locale: LANG=ca_ES.UTF-8, LC_CTYPE=ca_ES.UTF-8 (charmap=UTF-8), LANGUAGE=ca:es (charmap=UTF-8)
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
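
One way to see the kind of version mismatch described above is to print the candidate version APT offers for each headers package and compare them. The following is a minimal sketch, not taken from the bug report: the helper names are hypothetical, and it assumes the usual `apt-cache policy` output with a `Candidate:` line.

```shell
#!/bin/sh
# Print the "Candidate:" version from `apt-cache policy <pkg>` output on stdin.
# (Hypothetical helper name.)
candidate_version() {
    awk '$1 == "Candidate:" { print $2; exit }'
}

# Print "<package> <candidate version>" for every package given as argument.
compare_header_versions() {
    for pkg in "$@"; do
        printf '%s %s\n' "$pkg" "$(apt-cache policy "$pkg" | candidate_version)"
    done
}
```

For example, `compare_header_versions linux-headers-4.11.0-2-all-amd64 linux-headers-4.11.0-2-common linux-headers-4.11.0-2-common-rt` lists the three packages named in this thread. If the first shows a `+b1` version and the other two do not, that would match what the reporter found.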
Bug#871786: linux-image-4.9.0-3-amd64: list_del corruption in ext4_evict_inode from SyS_rename
Package: src:linux
Version: 4.9.30-2+deb9u2
Severity: important
Tags: upstream

Dear Maintainer,

When I got up this morning, the system was hung, and MagicSysrq did not restore it, so I had to hard power-cycle. The automatically included syslog contains only the latest, clean boot, so I'm going to edit it below to contain the backtrace. Looks like the bug is in EXT4. There was a long fsck at boot, as I think the FS was corrupted. Based on the time of the hang, my backup system may have been running and heavily using BIO.

-- Alison Chaiken, ali...@she-devel.com

-- Package-specific info:
** Version:
Linux version 4.9.0-3-amd64 (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26)

** Command line:
BOOT_IMAGE=/boot/vmlinuz-4.9.0-3-amd64 root=UUID=d75b29e8-fd83-4065-941a-7263be529f74 ro

** Not tainted

** Kernel log:
Aug 11 00:24:28 bonnet kernel: [269680.711160] [ cut here ]
Aug 11 00:24:28 bonnet kernel: [269680.711173] WARNING: CPU: 1 PID: 3173 at /build/linux-9uDFZV/linux-4.9.30/lib/list_debug.c:62 list_del+0x9/0x30
Aug 11 00:24:28 bonnet kernel: [269680.711176] list_del corruption. next->prev should be fc2ac1119da0, but was fc2ac0112aa0
Aug 11 00:24:28 bonnet kernel: [269680.711178] Modules linked in: uas usb_storage usblp ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter binfmt_misc sr_mod cdrom joydev ata_generic amdkfd edac_mce_amd snd_hda_codec_realtek snd_hda_codec_generic edac_core kvm_amd kvm snd_hda_codec_hdmi irqbypass radeon psmouse serio_raw pcspkr firewire_ohci k10temp firewire_core crc_itu_t snd_hda_intel ttm snd_hda_codec pata_atiixp snd_hda_core snd_hwdep snd_pcm sp5100_tco snd_timer i2c_piix4 drm_kms_helper drm snd i2c_algo_bit sky2 soundcore sg floppy asus_atk0110 shpchp wmi button acpi_cpufreq parport_pc ppdev lp parport ip_tables x_tables autofs4 ext4 crc16 jbd2 crc32c_generic fscrypto ecb glue_helper
Aug 11 00:24:28 bonnet kernel: [269680.711224] lrw gf128mul ablk_helper cryptd aes_x86_64 mbcache evdev hid_generic usbhid hid sd_mod ohci_pci ahci libahci ohci_hcd ehci_pci ehci_hcd libata usbcore scsi_mod usb_common
Aug 11 00:24:28 bonnet kernel: [269680.711240] CPU: 1 PID: 3173 Comm: DOM Worker Not tainted 4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u2
Aug 11 00:24:28 bonnet kernel: [269680.711243] Hardware name: System manufacturer System Product Name/M3A78-T, BIOS 080201/08/2009
Aug 11 00:24:28 bonnet kernel: [269680.711245] b4d28414 b1c645657bb0
Aug 11 00:24:28 bonnet kernel: [269680.711249] b4a76ebe fc2ac1119da0 b1c645657c08 fc2ac1119d80
Aug 11 00:24:28 bonnet kernel: [269680.711252] 000a b1c645657dd8 b5714c28 b4a76f3f
Aug 11 00:24:28 bonnet kernel: [269680.711255] Call Trace:
Aug 11 00:24:28 bonnet kernel: [269680.711261] [] ? dump_stack+0x5c/0x78
Aug 11 00:24:28 bonnet kernel: [269680.711266] [] ? __warn+0xbe/0xe0
Aug 11 00:24:28 bonnet kernel: [269680.711269] [] ? warn_slowpath_fmt+0x5f/0x80
Aug 11 00:24:28 bonnet kernel: [269680.711272] [] ? ___cache_free+0x1c2/0x2e0
Aug 11 00:24:28 bonnet kernel: [269680.711275] [] ? list_del+0x9/0x30
Aug 11 00:24:28 bonnet kernel: [269680.711279] [] ? release_pages+0x14d/0x370
Aug 11 00:24:28 bonnet kernel: [269680.711282] [] ? __pagevec_release+0x2a/0x40
Aug 11 00:24:28 bonnet kernel: [269680.711285] [] ? truncate_inode_pages_range+0x2c9/0x7e0
Aug 11 00:24:28 bonnet kernel: [269680.711322] [] ? ext4_evict_inode+0xfe/0x460 [ext4]
Aug 11 00:24:28 bonnet kernel: [269680.711326] [] ? evict+0xb6/0x180
Aug 11 00:24:28 bonnet kernel: [269680.711328] [] ? __dentry_kill+0xa7/0x150
Aug 11 00:24:28 bonnet kernel: [269680.711331] [] ? dput+0x140/0x250
Aug 11 00:24:28 bonnet kernel: [269680.711334] [] ? SyS_rename+0x2a2/0x3f0
Aug 11 00:24:28 bonnet kernel: [269680.711339] [] ? system_call_fast_compare_end+0xc/0x9b
Aug 11 00:24:28 bonnet kernel: [269680.711342] ---[ end trace 912a900e4c50a743 ]---
Aug 11 00:24:28 bonnet kernel: [269680.711343] [ cut here ]
Aug 11 00:24:28 bonnet kernel: [269680.711347] WARNING: CPU: 1 PID: 3173 at /build/linux-9uDFZV/linux-4.9.30/lib/list_debug.c:62 list_del+0x9/0x30
Aug 11 00:24:28 bonnet kernel: [269680.711348] list_del corruption. next->prev should be fc2ac28da520, but was b1c645657c68
Aug 11 00:24:28 bonnet kernel: [269680.711350] Modules linked in: uas usb_storage usblp ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter binfmt_misc sr_mod cdrom joydev ata_generic amdkfd edac_mce_amd snd_hda_codec_realtek snd_hda_codec_generic
Bug#864642: vmxnet3: Reports suspect GRO implementation on vSphere hosts / one VM crashes
On 10.08.2017 15:09, Andrew Moore wrote:
> Both of those reports were me. I suspect the issue may be isolated to
> the HPE custom implementation of the ESXi 6.5u1 build. I haven't seen
> any similar reports of people using the vanilla 6.5u1 build.

Not surprising. It wouldn't be the first time HPE horribly botched their ESX custom ISOs. (Which is the prime reason I don't *ever* use custom vendor ISOs from any vendor in the first place.)

> Interestingly none of the fixes that have been discussed work with this
> build either. This includes disabling the rx-mini buffer (# ethtool -G
> rx-mini 0) and adding vmxnet3.rev.30 = FALSE to the VMs vmx
> file.

Very strange, indeed.

> The only way I've managed to restore stability is by removing vmxnet3
> out of the equation completely and changing to the e1000 NIC type.

Using a HW version lower than 13 should also help.

Unfortunately, the sample size of people reporting failure or success is very small at this time, so a conclusive result can't be drawn, I am afraid.

Grüße,
Sven.