Bug#858125: e1000: ethernet interface hangs occasionally, kernel reports hang

2017-08-11 Thread Bruce Momjian,,,

I have determined that Debian was complaining about my ethernet port
because I had flow control enabled on the switch, and the switch was
getting easily overwhelmed and hanging, so the Debian resets were valid.

Thank you for the research on this.  I think you can close this case.

---

On Wed, Mar 22, 2017 at 02:42:30AM +, Ben Hutchings wrote:
> Control: retitle -1 TX watchdog fires on e1000e interface with flow control enabled
> 
> On Tue, 2017-03-21 at 18:36 -0400, Bruce Momjian,,, wrote:
> > On Tue, Mar 21, 2017 at 04:04:11PM -0400, Bruce Momjian,,, wrote:
> > > I think this proves my problems are related to flow control.  How would
> > > you like to proceed?  Is there a patch or change you would like me to
> > > test?  Just close the ticket?
> > > 
> > > I have a fix, but it is likely others would not know they had this
> > > problem unless they were monitoring their kernel logs or their network
> > > traffic for lag.
> > 
> > Oh, I should also mention the port that is having problems is connected
> > to a NetGear GS108Ev3 switch, with current firmware, version 2.00.09. 
> > The port connected to my Actiontec FIOS router is not having problems.
> 
> I don't know about any specific bug, but if the switch sends flow
> control XOFF frames continually for long enough (usually 5 seconds)
> this will trigger the TX watchdog.
> 
> It sounds like your switch implements flow control properly (some
> broken switches auto-negotiate it but actually flood flow control
> frames).  However, if a device on some other port (that also has flow
> control enabled) sends XOFF frames continually *and* your server sends
> frames that should go to that other port, the switch will do the same
> to the server once the switch's internal queue has filled up.
> 
> If the switch has port statistics including numbers of pause frames
> then you can see where they are coming from, but I think it doesn't.
> Without that information it's going to be hard to tell exactly where
> the fault lies.
> 
> The e1000e driver *does* have statistics for pause frames transmitted
> and received (run: "ethtool -S eth0| grep flow_control").  If you log
> these every second then it should be possible to see what happens
> around the time the TX watchdog fires.  That could provide some clues
> as to whether the NIC is behaving correctly.
> 
> Ben.
> 
> -- 
> Ben Hutchings
> Power corrupts.  Absolute power is kind of neat.
>    - John Lehman, Secretary of the US Navy 1981-1987
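
The suggestion quoted above, to log the driver's pause-frame counters every second and look at what changes around the time the TX watchdog fires, could be sketched roughly as follows. This is only an illustration: the function names (parse_flow_control, counter_deltas, watch) are made up here, and the sketch assumes the "ethtool -S" output consists of "name: value" lines with "flow_control" in the counter name, as the grep in the quoted command implies.

```python
import subprocess
import time


def parse_flow_control(ethtool_output):
    """Return {counter_name: value} for every line mentioning flow_control."""
    counters = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if sep and "flow_control" in name and value.isdigit():
            counters[name] = int(value)
    return counters


def counter_deltas(previous, current):
    """Return only the counters that changed, mapped to their increase."""
    return {
        name: value - previous.get(name, 0)
        for name, value in current.items()
        if value != previous.get(name, 0)
    }


def watch(interface="eth0", interval=1.0):
    """Poll the interface and print a timestamped line when a counter moves."""
    previous = None
    while True:
        output = subprocess.run(
            ["ethtool", "-S", interface],
            capture_output=True, text=True, check=True,
        ).stdout
        current = parse_flow_control(output)
        if previous is not None:
            changed = counter_deltas(previous, current)
            if changed:
                print(time.strftime("%Y-%m-%d %H:%M:%S"), changed)
        previous = current
        time.sleep(interval)
```

Calling watch("eth0") on the affected machine and comparing the timestamps with the kernel log would show whether received XOFF counters climb just before the watchdog fires.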



-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Bug#869670: Depends: linux-headers-4.11.0-2-common ... but it is not going to be installed

2017-08-11 Thread Kurthy Gyula
Package: linux-headers-4.11.0-2-all
Followup-For: Bug #869670

Dear Maintainer,

I have tried to install nvidia drivers and run virtualbox, but for
building DKMS I needed linux-headers and I couldn't install them.

And when I try to boot the system with my Gigabyte GTX 960 4GB graphics
card, it stops at boot.

I couldn't get it to work.
When I somehow got the system to install the nvidia driver DKMS, I had to
remove the packages from recovery, because I couldn't boot even without my GPU.

-- System Information:
Debian Release: buster/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.11.0-2-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages linux-headers-4.11.0-2-all depends on:
pn  linux-headers-4.11.0-2-all-amd64  

linux-headers-4.11.0-2-all recommends no packages.

linux-headers-4.11.0-2-all suggests no packages.



Bug#869670: Depends: linux-headers-4.11.0-2-common ... but it is not going to be installed

2017-08-11 Thread Ivan Vilata i Balaguer
Package: linux-headers-4.11.0-2-common
Version: 4.11.11-1
Followup-For: Bug #869670

I had the same issue here with the same result of losing my kernel headers,
DKMS not working, thus being unable to build VirtualBox modules and use
VirtualBox at all.

I tried to look for a ``4.11.11-1+b1`` version for the ``-common`` and
``-common-rt`` packages (both lacking the latest consistent version), but
found nothing.

It looks like the only option left for me is rebooting into a previous
kernel. `:(`

Cheers!

-- System Information:
Debian Release: buster/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.11.0-2-amd64 (SMP w/4 CPU cores)
Locale: LANG=ca_ES.UTF-8, LC_CTYPE=ca_ES.UTF-8 (charmap=UTF-8), LANGUAGE=ca:es (charmap=UTF-8)
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)



Bug#871786: linux-image-4.9.0-3-amd64: list_del corruption in ext4_evict_inode from SyS_rename

2017-08-11 Thread Alison Chaiken
Package: src:linux
Version: 4.9.30-2+deb9u2
Severity: important
Tags: upstream

Dear Maintainer,

When I got up this morning, the system was hung, and Magic SysRq did
not restore it, so I had to hard power-cycle.  The automatically
included syslog contains only the latest, clean boot, so I'm going to
edit it below to contain the backtrace.  Looks like the bug is in
EXT4.  There was a long fsck at boot, as I think the FS was corrupted.
Based on the time of the hang, my backup system may have been running
and heavily using BIO.

-- Alison Chaiken, ali...@she-devel.com

-- Package-specific info:
** Version:
Linux version 4.9.0-3-amd64 (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26)

** Command line:
BOOT_IMAGE=/boot/vmlinuz-4.9.0-3-amd64 root=UUID=d75b29e8-fd83-4065-941a-7263be529f74 ro

** Not tainted

** Kernel log:

Aug 11 00:24:28 bonnet kernel: [269680.711160] [ cut here ]
Aug 11 00:24:28 bonnet kernel: [269680.711173] WARNING: CPU: 1 PID: 3173 at /build/linux-9uDFZV/linux-4.9.30/lib/list_debug.c:62 list_del+0x9/0x30
Aug 11 00:24:28 bonnet kernel: [269680.711176] list_del corruption. next->prev should be fc2ac1119da0, but was fc2ac0112aa0
Aug 11 00:24:28 bonnet kernel: [269680.711178] Modules linked in: uas usb_storage usblp ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter binfmt_misc sr_mod cdrom joydev ata_generic amdkfd edac_mce_amd snd_hda_codec_realtek snd_hda_codec_generic edac_core kvm_amd kvm snd_hda_codec_hdmi irqbypass radeon psmouse serio_raw pcspkr firewire_ohci k10temp firewire_core crc_itu_t snd_hda_intel ttm snd_hda_codec pata_atiixp snd_hda_core snd_hwdep snd_pcm sp5100_tco snd_timer i2c_piix4 drm_kms_helper drm snd i2c_algo_bit sky2 soundcore sg floppy asus_atk0110 shpchp wmi button acpi_cpufreq parport_pc ppdev lp parport ip_tables x_tables autofs4 ext4 crc16 jbd2 crc32c_generic fscrypto ecb glue_helper
Aug 11 00:24:28 bonnet kernel: [269680.711224]  lrw gf128mul ablk_helper cryptd aes_x86_64 mbcache evdev hid_generic usbhid hid sd_mod ohci_pci ahci libahci ohci_hcd ehci_pci ehci_hcd libata usbcore scsi_mod usb_common
Aug 11 00:24:28 bonnet kernel: [269680.711240] CPU: 1 PID: 3173 Comm: DOM Worker Not tainted 4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u2
Aug 11 00:24:28 bonnet kernel: [269680.711243] Hardware name: System manufacturer System Product Name/M3A78-T, BIOS 080201/08/2009
Aug 11 00:24:28 bonnet kernel: [269680.711245]  b4d28414 b1c645657bb0
Aug 11 00:24:28 bonnet kernel: [269680.711249]  b4a76ebe fc2ac1119da0 b1c645657c08 fc2ac1119d80
Aug 11 00:24:28 bonnet kernel: [269680.711252]  000a b1c645657dd8 b5714c28 b4a76f3f
Aug 11 00:24:28 bonnet kernel: [269680.711255] Call Trace:
Aug 11 00:24:28 bonnet kernel: [269680.711261]  [] ? dump_stack+0x5c/0x78
Aug 11 00:24:28 bonnet kernel: [269680.711266]  [] ? __warn+0xbe/0xe0
Aug 11 00:24:28 bonnet kernel: [269680.711269]  [] ? warn_slowpath_fmt+0x5f/0x80
Aug 11 00:24:28 bonnet kernel: [269680.711272]  [] ? ___cache_free+0x1c2/0x2e0
Aug 11 00:24:28 bonnet kernel: [269680.711275]  [] ? list_del+0x9/0x30
Aug 11 00:24:28 bonnet kernel: [269680.711279]  [] ? release_pages+0x14d/0x370
Aug 11 00:24:28 bonnet kernel: [269680.711282]  [] ? __pagevec_release+0x2a/0x40
Aug 11 00:24:28 bonnet kernel: [269680.711285]  [] ? truncate_inode_pages_range+0x2c9/0x7e0
Aug 11 00:24:28 bonnet kernel: [269680.711322]  [] ? ext4_evict_inode+0xfe/0x460 [ext4]
Aug 11 00:24:28 bonnet kernel: [269680.711326]  [] ? evict+0xb6/0x180
Aug 11 00:24:28 bonnet kernel: [269680.711328]  [] ? __dentry_kill+0xa7/0x150
Aug 11 00:24:28 bonnet kernel: [269680.711331]  [] ? dput+0x140/0x250
Aug 11 00:24:28 bonnet kernel: [269680.711334]  [] ? SyS_rename+0x2a2/0x3f0
Aug 11 00:24:28 bonnet kernel: [269680.711339]  [] ? system_call_fast_compare_end+0xc/0x9b
Aug 11 00:24:28 bonnet kernel: [269680.711342] ---[ end trace 912a900e4c50a743 ]---
Aug 11 00:24:28 bonnet kernel: [269680.711343] [ cut here ]
Aug 11 00:24:28 bonnet kernel: [269680.711347] WARNING: CPU: 1 PID: 3173 at /build/linux-9uDFZV/linux-4.9.30/lib/list_debug.c:62 list_del+0x9/0x30
Aug 11 00:24:28 bonnet kernel: [269680.711348] list_del corruption. next->prev should be fc2ac28da520, but was b1c645657c68
Aug 11 00:24:28 bonnet kernel: [269680.711350] Modules linked in: uas usb_storage usblp ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter binfmt_misc sr_mod cdrom joydev ata_generic amdkfd edac_mce_amd snd_hda_codec_realtek snd_hda_codec_generic
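
For readers unfamiliar with the "list_del corruption. next->prev should be X, but was Y" message in the log above: it comes from a consistency check on a doubly linked list, made before an entry is unlinked. The following is a minimal Python sketch of that kind of check, not the kernel's C code. The names Node, list_add and list_del_checked are illustrative only, and the sketch returns the message as a string instead of emitting a warning.

```python
class Node:
    """A list entry in a circular doubly linked list."""

    def __init__(self, name):
        self.name = name
        self.prev = self  # an isolated node points at itself
        self.next = self


def list_add(new, head):
    """Insert `new` directly after `head`."""
    following = head.next
    following.prev = new
    new.next = following
    new.prev = head
    head.next = new


def list_del_checked(entry):
    """Unlink `entry` only if both neighbours still point back at it.

    Returns None on success, or a description of the broken link.
    """
    if entry.prev.next is not entry:
        return ("list_del corruption. prev->next should be %s, but was %s"
                % (entry.name, entry.prev.next.name))
    if entry.next.prev is not entry:
        return ("list_del corruption. next->prev should be %s, but was %s"
                % (entry.name, entry.next.prev.name))
    entry.next.prev = entry.prev
    entry.prev.next = entry.next
    entry.next = entry.prev = None
    return None
```

In this sketch, a failed check means some other code has overwritten a neighbour's back-pointer, so the entry being removed is no longer where its neighbours think it is. That is consistent with memory or list corruption somewhere before the point where the warning is printed.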

Bug#864642: vmxnet3: Reports suspect GRO implementation on vSphere hosts / one VM crashes

2017-08-11 Thread Sven Hartge
On 10.08.2017 15:09, Andrew Moore wrote:

> Both of those reports were me. I suspect the issue may be isolated to
> the HPE custom implementation of the ESXi 6.5u1 build. I haven't seen
> any similar reports of people using the vanilla 6.5u1 build.

Not surprising. It wouldn't be the first time HPE horribly botched their
ESX custom ISOs. (Which is the prime reason I don't *ever* use custom
vendor ISOs from any vendor in the first place.)

> Interestingly none of the fixes that have been discussed work with this
> build either. This includes disabling the rx-mini buffer (# ethtool -G
>  rx-mini 0) and adding vmxnet3.rev.30 = FALSE to the VMs vmx
> file.

Very strange, indeed.

> The only way I've managed to restore stability is by removing vmxnet3
> out of the equation completely and changing to the e1000 NIC type.

Using a HW version lower than 13 should also help.


Unfortunately, the sample of people reporting failure or success is
very small at this time, so no conclusive result can be drawn, I am afraid.

Regards,
Sven.


