[Kernel-packages] [Bug 2044199] Re: package linux-image-6.2.0-37-generic 6.2.0-37.38 failed to install/upgrade: run-parts: /etc/kernel/postinst.d/dkms exited with return code 11

2023-11-21 Thread Paulo Abadie Guedes
Right after submitting this bug (and before doing anything else) I ran a
manual "apt update/apt upgrade", just in case.

To my surprise, the package installed cleanly (see attachment).
I really can't explain why it failed before. I didn't even reboot the machine,
so I have no clue why it worked now after having failed previously.

Any ideas?
Anyway, I hope this is somehow useful to the developers/maintainers of
this package.

Should the bug be closed right away?
Thanks!
Paulo
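
For anyone hitting the same error, the manual retry described above could be sketched roughly as follows. This is only an illustration under assumptions: the function name retry_kernel_postinst and the RUN variable are made up for this example, and the kernel version is the one from this report. By default the steps are only printed. If a DKMS build really fails, the build log usually lives under /var/lib/dkms/<module>/<version>/build/make.log.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of dkms or apt): list the manual steps for
# retrying a kernel whose DKMS postinst hook failed.
# Dry run by default: the commands are only echoed. Set RUN=1 to execute
# them (this needs root and a machine with dkms installed).
retry_kernel_postinst() {
    local kver=$1
    local -a steps=(
        "dkms status"                 # which modules are built for which kernel
        "dkms autoinstall -k $kver"   # rebuild the modules for the new kernel
        "dpkg --configure -a"         # finish any half-configured packages
    )
    local step
    for step in "${steps[@]}"; do
        if [[ ${RUN:-0} == 1 ]]; then
            # Word splitting of $step is intentional here.
            sudo $step
        else
            echo "would run: sudo $step"
        fi
    done
}

retry_kernel_postinst 6.2.0-37-generic
```

Running it without RUN=1 changes nothing on the system, so the sequence can be reviewed before anything is executed.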

** Attachment added: "2023_11_21_Log_successfull_apt_upgrade.txt"
   https://bugs.launchpad.net/ubuntu/+source/dkms/+bug/2044199/+attachment/5722008/+files/2023_11_21_Log_successfull_apt_upgrade.txt

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to dkms in Ubuntu.
https://bugs.launchpad.net/bugs/2044199

Title:
  package linux-image-6.2.0-37-generic 6.2.0-37.38 failed to
  install/upgrade: run-parts: /etc/kernel/postinst.d/dkms exited with
  return code 11

Status in dkms package in Ubuntu:
  New

Bug description:
  Maybe the issue is related to power saving. The system tried to
  download and update a package (new kernel image), AFAIK, without my
  intervention. Then one of the DKMS steps failed, and the install
  procedure failed. The dmesg command shows nothing particularly
  interesting.

  ProblemType: Package
  DistroRelease: Ubuntu 23.04
  Package: linux-image-6.2.0-37-generic 6.2.0-37.38
  ProcVersionSignature: Ubuntu 6.2.0-36.37-generic 6.2.16
  Uname: Linux 6.2.0-36-generic x86_64
  NonfreeKernelModules: nvidia_modeset nvidia
  ApportVersion: 2.26.1-0ubuntu2.1
  Architecture: amd64
  AudioDevicesInUse:
   USER PID ACCESS COMMAND
   /dev/snd/controlC1:  pag 2341 F wireplumber
   /dev/snd/controlC0:  pag 2341 F wireplumber
   /dev/snd/seq:  pag 2330 F pipewire
  CRDA: N/A
  CasperMD5CheckResult: pass
  Date: Tue Nov 21 19:12:16 2023
  ErrorMessage: run-parts: /etc/kernel/postinst.d/dkms exited with return code 11
  InstallationDate: Installed on 2022-09-13 (434 days ago)
  InstallationMedia: Ubuntu 22.04.1 LTS "Jammy Jellyfish" - Release amd64 (20220809.1)
  MachineType: SAMSUNG ELECTRONICS CO., LTD. 550XCJ/550XCR
  ProcFB: 0 i915drmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.2.0-36-generic root=/dev/mapper/vg1_ssd-lv1_root ro quiet splash vt.handoff=7
  PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
  Python3Details: /usr/bin/python3.11, Python 3.11.4, python3-minimal, 3.11.2-1
  PythonDetails: N/A
  RelatedPackageVersions: grub-pc 2.06-2ubuntu16
  SourcePackage: dkms
  Title: package linux-image-6.2.0-37-generic 6.2.0-37.38 failed to install/upgrade: run-parts: /etc/kernel/postinst.d/dkms exited with return code 11
  UpgradeStatus: Upgraded to lunar on 2023-06-05 (169 days ago)
  dmi.bios.date: 12/29/2020
  dmi.bios.release: 5.16
  dmi.bios.vendor: American Megatrends Inc.
  dmi.bios.version: P11RFH.051.201229.HC
  dmi.board.asset.tag: No Asset Tag
  dmi.board.name: NP550XCJ-XS2BR
  dmi.board.vendor: SAMSUNG ELECTRONICS CO., LTD.
  dmi.board.version: SGLA664A0C-C01-G003-S0001+10.0.19041
  dmi.chassis.asset.tag: No Asset Tag
  dmi.chassis.type: 10
  dmi.chassis.vendor: SAMSUNG ELECTRONICS CO., LTD.
  dmi.chassis.version: N/A
  dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrP11RFH.051.201229.HC:bd12/29/2020:br5.16:svnSAMSUNGELECTRONICSCO.,LTD.:pn550XCJ/550XCR:pvrP11RFH:rvnSAMSUNGELECTRONICSCO.,LTD.:rnNP550XCJ-XS2BR:rvrSGLA664A0C-C01-G003-S0001+10.0.19041:cvnSAMSUNGELECTRONICSCO.,LTD.:ct10:cvrN/A:skuSCAI-A5A5-A5A5-A5A5-PRFH:
  dmi.product.family: Notebook 5 Series
  dmi.product.name: 550XCJ/550XCR
  dmi.product.sku: SCAI-A5A5-A5A5-A5A5-PRFH
  dmi.product.version: P11RFH
  dmi.sys.vendor: SAMSUNG ELECTRONICS CO., LTD.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/dkms/+bug/2044199/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2044199] [NEW] package linux-image-6.2.0-37-generic 6.2.0-37.38 failed to install/upgrade: run-parts: /etc/kernel/postinst.d/dkms exited with return code 11

2023-11-21 Thread Paulo Abadie Guedes
Public bug reported:

Maybe the issue is related to power saving. The system tried to download
and update a package (new kernel image), AFAIK, without my intervention.
Then one of the DKMS steps failed, and the install procedure failed. The
dmesg command shows nothing particularly interesting.

ProblemType: Package
DistroRelease: Ubuntu 23.04
Package: linux-image-6.2.0-37-generic 6.2.0-37.38
ProcVersionSignature: Ubuntu 6.2.0-36.37-generic 6.2.16
Uname: Linux 6.2.0-36-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
ApportVersion: 2.26.1-0ubuntu2.1
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1:  pag 2341 F wireplumber
 /dev/snd/controlC0:  pag 2341 F wireplumber
 /dev/snd/seq:  pag 2330 F pipewire
CRDA: N/A
CasperMD5CheckResult: pass
Date: Tue Nov 21 19:12:16 2023
ErrorMessage: run-parts: /etc/kernel/postinst.d/dkms exited with return code 11
InstallationDate: Installed on 2022-09-13 (434 days ago)
InstallationMedia: Ubuntu 22.04.1 LTS "Jammy Jellyfish" - Release amd64 (20220809.1)
MachineType: SAMSUNG ELECTRONICS CO., LTD. 550XCJ/550XCR
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.2.0-36-generic root=/dev/mapper/vg1_ssd-lv1_root ro quiet splash vt.handoff=7
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
Python3Details: /usr/bin/python3.11, Python 3.11.4, python3-minimal, 3.11.2-1
PythonDetails: N/A
RelatedPackageVersions: grub-pc 2.06-2ubuntu16
SourcePackage: dkms
Title: package linux-image-6.2.0-37-generic 6.2.0-37.38 failed to install/upgrade: run-parts: /etc/kernel/postinst.d/dkms exited with return code 11
UpgradeStatus: Upgraded to lunar on 2023-06-05 (169 days ago)
dmi.bios.date: 12/29/2020
dmi.bios.release: 5.16
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: P11RFH.051.201229.HC
dmi.board.asset.tag: No Asset Tag
dmi.board.name: NP550XCJ-XS2BR
dmi.board.vendor: SAMSUNG ELECTRONICS CO., LTD.
dmi.board.version: SGLA664A0C-C01-G003-S0001+10.0.19041
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: SAMSUNG ELECTRONICS CO., LTD.
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrP11RFH.051.201229.HC:bd12/29/2020:br5.16:svnSAMSUNGELECTRONICSCO.,LTD.:pn550XCJ/550XCR:pvrP11RFH:rvnSAMSUNGELECTRONICSCO.,LTD.:rnNP550XCJ-XS2BR:rvrSGLA664A0C-C01-G003-S0001+10.0.19041:cvnSAMSUNGELECTRONICSCO.,LTD.:ct10:cvrN/A:skuSCAI-A5A5-A5A5-A5A5-PRFH:
dmi.product.family: Notebook 5 Series
dmi.product.name: 550XCJ/550XCR
dmi.product.sku: SCAI-A5A5-A5A5-A5A5-PRFH
dmi.product.version: P11RFH
dmi.sys.vendor: SAMSUNG ELECTRONICS CO., LTD.

** Affects: dkms (Ubuntu)
 Importance: Undecided
 Status: New


** Tags: amd64 apport-package lunar


Re: [Kernel-packages] [Bug 1447664] Re: 14e4:1687 broadcom tg3 network driver disconnects under high load

2019-07-02 Thread Paulo Abadie Guedes
Thank you, Kai-Heng Feng. Really appreciate it.

I'm currently under a lot of pressure at work, but I will try this in the
next few days to see if it fixes the problem for us. My network still has the
same conditions and my previous kernel versions are still breaking, so it
should be easy to reproduce.
I will report back as soon as I can.

Thank you again,
Paulo


On Tue, Jul 2, 2019, 03:15 Kai-Heng Feng wrote:

> Latest kernels in Xenial, Bionic, Cosmic and Disco have the following
> commit:
> commit 3a498606bb04af603a46ebde8296040b2de350d1
> Author: Sanjeev Bansal 
> Date:   Mon Jul 16 11:13:32 2018 +0530
>
> tg3: Add higher cpu clock for 5762.
>
> This patch has fix for TX timeout while running bi-directional
> traffic with 100 Mbps using 5762.
>
> Signed-off-by: Sanjeev Bansal 
> Signed-off-by: Siva Reddy Kallam 
> Reviewed-by: Michael Chan 
> Signed-off-by: David S. Miller 
>
> ** Changed in: linux (Ubuntu)
>        Status: Triaged => Fix Released
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1447664
>
> Title:
>   14e4:1687 broadcom tg3 network driver disconnects under high load
>
> Status in linux package in Ubuntu:
>   Fix Released
> Status in linux package in Debian:
>   New
>
> Bug description:
>   The tg3 Broadcom network driver that binds to the 5762 chipset goes
> offline and is unable to recover (even with the tg3 watchdog timeout) when
> network transmit is under high load.  Call trace:
>   https://launchpadlibrarian.net/204185480/dmesg
>
>   When this happens, only a reboot is able to fix it.  Sometimes,
>   however, bringing the interface offline and online (via ifconfig)
>   recovers networking.  I've also tested with the latest tg3 driver
>   (Dec 2014 version) and networking is still problematic.  I have also
>   disabled TSO, GSO, etc. with ethtool and the bug still surfaces.
>   This bug may be related to the integrated firmware.
>
>   Here is the procedure to replicate the issue, because it is hard to
>   replicate under moderate network load.
>
>   1. Boot a machine with a Broadcom 5762 NIC (e.g. HP EliteDesk 705)
> using an Ubuntu/Kubuntu Live CD 14.04-15.04.
>   2. From another machine: start 5 sessions and repeatedly copy (scp with
> public key authentication) a 70 MB file back and forth to the tg3 machine
> in each session (not sure if this is necessary).
>   3. Create a 1 GB file on the tg3 machine, with something like dd
> if=/dev/urandom of=/my/test/file bs=1024 count=$((1024*1000))
>   4. From another machine: repeatedly scp that 1 GB file from the
> tg3 machine. This can be done with something like:
>
>   while [ 0 ]; do
>      scp -i /my/scp/private.key u...@ip.of.tg3:/my/test/file /tmp
>   done;
>
>   Networking will mostly go offline in about 10-30 minutes.
>
>   WORKAROUND: Add a udev rule to make the changes permanent in
> /etc/udev/rules.d/80-tg3-fix.rules :
>   ACTION=="add", SUBSYSTEM=="net", ATTRS{vendor}=="0x14e4",
> ATTRS{device}=="0x1687", RUN+="/sbin/ethtool -K %k highdma off"
>
>   ProblemType: Bug
>   DistroRelease: Ubuntu 15.04
>   Package: linux-image-3.19.0-15-generic 3.19.0-15.15
>   ProcVersionSignature: Ubuntu 3.19.0-15.15-generic 3.19.3
>   Uname: Linux 3.19.0-15-generic x86_64
>   ApportVersion: 2.17.2-0ubuntu1
>   Architecture: amd64
>   AudioDevicesInUse:
>    USER PID ACCESS COMMAND
>    /dev/snd/controlC1:  kubuntu 3748 F pulseaudio
>    /dev/snd/controlC0:  kubuntu 3748 F pulseaudio
>   CasperVersion: 1.360
>   Date: Thu Apr 23 11:16:24 2015
>   IwConfig:
>    eth0  no wireless extensions.
>
>    lo  no wireless extensions.
>   LiveMediaBuild: Kubuntu 15.04 "Vivid Vervet" - Release amd64 (20150422)
>   MachineType: Hewlett-Packard HP EliteDesk 705 G1 MT
>   ProcEnviron:
>    LANGUAGE=
>    TERM=xterm
>    PATH=(custom, no user)
>    LANG=en_US.UTF-8
>    SHELL=/bin/bash
>   ProcFB: 0 radeondrmfb
>   ProcKernelCmdLine: BOOT_IMAGE=/casper/vmlinuz.efi
> file=/cdrom/preseed/hostname.seed boot=casper maybe-ubiquity quiet splash
> ---
>   PulseList:
>    Error: command ['pacmd', 'list'] failed with exit code 1: Home
> directory not accessible: Permission denied
>    No PulseAudio daemon running, or not running as session daemon.
>   RelatedPackageVersions:
>    linux-restricted-modules-3.19.0-15-generic N/A
>    linux-backports-modules-3.19.0-15-generic  N/A
>    linux-firmware 1.143
>   RfKill:
>
>   SourcePackage: linux
>   UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
>   UpgradeStatus: No upgrade log present (probably fresh install)
>   dmi.bios.date: 10/22/2014
>   dmi.bios.vendor: Hewlett-Packard
>   dmi.bios.version: L06 v02.15
>   dmi.board.asset.tag: 2UA5041TG4
>   dmi.board.name: 2215
>   dmi.board.vendor: Hewlett-Packard
>   dmi.chassis.asset.tag: 2UA5041TG4
>   dmi.chassis.type: 6
>   dmi.chassis.vendor: Hewlett-Packard
>   dmi.modalias:

Re: [Kernel-packages] [Bug 1447664] Re: 14e4:1687 broadcom tg3 network driver disconnects under high load

2019-01-26 Thread Paulo Abadie Guedes
Thank you.
I am still having the problem during our cloning process, although it's less
frequent now. Before I applied the patch, each and every transfer would
ALWAYS trigger the tg3 bug.

Here it seems related to problems with NAPI. AFAIK, this is an approach to
handling interrupt bursts. NICs typically work in bursts: a long time
without packets, then a very large stream of packets, then silence. This is
the common scenario.

Having interrupts serve sporadic data is OK. But a burst of packets
triggers a burst of interrupts, which is not as efficient as just polling
the NIC (during the burst).

What NAPI does is (in a very, very simplified way): it waits for the first
interrupt from the network, then switches off interrupts and polls the NIC
until there are no more network packets or the "work quota" is exhausted,
whichever happens first. Then it turns interrupts back on and the cycle
repeats. This quota (sorry, I don't remember the correct term) is very
important to prevent the kernel from getting stuck just serving packets.
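
The cycle described above can be sketched as a toy shell model. This is only an illustration, not kernel code: the function name napi_cycle and the packet counts are made up, and nothing here touches a real NIC. In the kernel the quota is called the "budget" (it is the second argument of a driver's poll callback). Unlike the simplified description, the real kernel keeps the device in polling mode and reschedules the poll when the budget is used up, which is what the loop below imitates.

```shell
#!/usr/bin/env bash
# Toy model of a NAPI-style receive cycle. "pending" is a made-up number of
# queued packets and "budget" is the per-poll work limit.
napi_cycle() {
    local pending=$1 budget=$2
    local polls=0 handled

    echo "irq: first packet arrived, NIC interrupts off"
    while (( pending > 0 )); do
        # One poll pass handles at most "budget" packets, so other kernel
        # work gets a chance to run between passes.
        handled=$(( pending < budget ? pending : budget ))
        pending=$(( pending - handled ))
        polls=$(( polls + 1 ))
        echo "poll $polls: handled $handled, $pending left"
    done
    echo "queue empty after $polls polls, NIC interrupts on"
}

napi_cycle 10 4
```

With 10 queued packets and a budget of 4, the model needs three poll passes (4, 4 and 2 packets) before interrupts are switched back on.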

What's happening (in my understanding) is that something goes wrong during
this process and the tg3 driver gets stuck.

A colleague told me that it's related to the broadcom driver.

Please try this workaround. Remove the two drivers, then reload "broadcom"
and "tg3", in this order. Maybe then your network will restart.

sudo modprobe -r broadcom tg3
sudo modprobe broadcom
sudo modprobe tg3

Please tell us what happens when you try this. It won't solve the problem,
but perhaps it helps.

Regards,
Paulo

On Sat, Jan 26, 2019, 10:39 Bob Lawrence <1447...@bugs.launchpad.net
wrote:

> Confirmed that this is still an issue on 18.04.1. I have an HP 705 G1
> with the Broadcom 5762. In my case it's a Plex server. Whenever I try to
> stream something the interface goes "NO-CARRIER" and the only way to
> recover is to reboot. I've tried disabling highdma, tso and gso using
> ethtool, iommu=soft kernel parameter, and forcing every combo of
> 1gbps/100mbps & half/full duplex. Nothing seems to workaround the issue.
>
> System:    Host: Bobs-HTPC Kernel: 4.15.0-43-generic x86_64 bits: 64 Console: tty 1 Distro: Ubuntu 18.04.1 LTS
> Machine:   Device: desktop System: Hewlett-Packard product: HP EliteDesk 705 G1 DM serial: N/A
>            Mobo: Hewlett-Packard model: 225E serial: N/A BIOS: Hewlett-Packard v: L06 v02.31 date: 08/31/2018
> Battery    hidpp__0: charge: N/A condition: NA/NA Wh
> CPU:       Quad core AMD A8-7600 Radeon R7 10 Compute Cores 4C+6G (-MCP-) cache: 8192 KB
>            clock speeds: max: 3100 MHz 1: 3094 MHz 2: 3094 MHz 3: 3094 MHz 4: 3094 MHz
> Graphics:  Card: Advanced Micro Devices [AMD/ATI] Kaveri [Radeon R7 Graphics]
>            Display Server: N/A drivers: ati,radeon (unloaded: modesetting,fbdev,vesa)
>            tty size: 120x53 Advanced Data: N/A out of X
> Audio:     Card-1 Advanced Micro Devices [AMD] FCH Azalia Controller driver: snd_hda_intel
>            Card-2 Advanced Micro Devices [AMD/ATI] Kaveri HDMI/DP Audio Controller driver: snd_hda_intel
>            Sound: Advanced Linux Sound Architecture v: k4.15.0-43-generic
> Network:   Card-1: Intel Wireless 7260 driver: iwlwifi
>            IF: wlp2s0 state: up mac: cc:3d:82:a7:bf:ed
>            Card-2: Broadcom Limited NetXtreme BCM5762 Gigabit Ethernet PCIe driver: tg3
>            IF: eno1 state: up speed: 100 Mbps duplex: half mac: ec:b1:d7:4c:2d:8e
> Drives:    HDD Total Size: 9501.7GB (42.8% used)
>            ID-1: /dev/sda model: ST500LM000 size: 500.1GB
>            ID-2: USB /dev/sdb model: 5 size: 9001.6GB
> Partition: ID-1: / size: 458G used: 23G (6%) fs: ext4 dev: /dev/sda1
> RAID:      No RAID devices: /proc/mdstat, md_mod kernel module present
> Sensors:   System Temperatures: cpu: 40.8C mobo: N/A gpu: 42.0
>            Fan Speeds (in rpm): cpu: N/A
> Info:      Processes: 227 Uptime: 12:49 Memory: 1608.0/5943.7MB Init: systemd runlevel: 5
>            Client: Shell (bash) inxi: 2.3.56

Re: [Kernel-packages] [Bug 1447664] Re: 14e4:1687 broadcom tg3 network driver disconnects under high load

2018-04-13 Thread Paulo Abadie Guedes
Hi Kai-heng,

Here are the test results we got.
Kernel 4.15.0-14-generic failed: the transmit queue timed out. The dmesg
output is attached. The tg3 module crashes within a few seconds of opening
the user session (in less than about 10 seconds).

However, kernel 4.15.0-9-generic worked like a charm. It boots and brings
up tg3, the Ethernet link works and the module seems stable. We tested it
by downloading a few GB and an Ubuntu image, playing videos for a few hours
and the like. Not a single crash was observed. The dmesg output for this
working kernel is also attached, because it might help you sort out what
differs between the two kernels.

Would you like us to test another image? Or to gather more information?

Regards,
Paulo



On Fri, Apr 13, 2018, 14:03 Paulo Guedes - IFPE - Campus Recife <
paulo.gue...@recife.ifpe.edu.br> wrote:

> We tried this same version yesterday and the bug was still present.
> Actually it looked worse, because the machine crashed faster (maybe was
> just an impression). Will collect logs to report this properly soon, in a
> few hours.
> Paulo
>
> On Fri, Apr 13, 2018, 13:55 luc <1447...@bugs.launchpad.net> wrote:
>
>> Hi Kai-heng,
>>
>> I tried 4.15.0-14-generic #15~lp1447664 SMP Tue Mar 20 14:31:37 CST 2018
>> x86_64 x86_64 x86_64 GNU/Linux, on Lubuntu 17.10.
>> I have a  Hewlett-Packard HP EliteDesk 705 G1 SFF/2215, BIOS L06 v02.28
>> 02/07/2017 and Lubuntu is in UEFI mode (my only OS) on this device.
>> Unfortunelly, i have the same problem= (TG3 still crash, a reboot is
>> mandatory)
>>
>> [  105.620301] tg3 :03:00.0 eno1: 0: Host status block
>> [0001:00cc:(:002e:):(:0006)]
>> [  105.620309] tg3 :03:00.0 eno1: 0: NAPI info
>> [00cc:00cc:(0024:0006:01ff)::(00f7:::)]
>> [  105.620317] tg3 :03:00.0 eno1: 1: Host status block
>> [0001:0042:(::):(0830:)]
>> [  105.620324] tg3 :03:00.0 eno1: 1: NAPI info
>> [0042:0042:(::01ff):0830:(0030:0030::)]
>> [  105.620331] tg3 :03:00.0 eno1: 2: Host status block
>> [0001:00d2:(0fff::):(:)]
>> [  105.620370] tg3 :03:00.0 eno1: 2: NAPI info
>> [00d2:00d2:(::01ff):0fff:(07ff:07ff::)]
>> [  105.755739] tg3 :03:00.0: tg3_stop_block timed out, ofs=4c00
>> enable_bit=2
>> [  105.797123] tg3 :03:00.0 eno1: Link is down
>> [  105.889440] tg3 :03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
>> domain=0x000d address=0xffe3d640 flags=0x0020]
>> [  105.889478] tg3 :03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
>> domain=0x000d address=0xffe3d680 flags=0x0020]
>> [  109.932707] tg3 :03:00.0 eno1: Link is up at 1000 Mbps, full duplex
>> [  109.932710] tg3 :03:00.0 eno1: Flow control is off for TX and off
>> for RX
>> [  109.932711] tg3 :03:00.0 eno1: EEE is enabled
>>
>> ** Attachment added: "Bug tg3"
>>
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1447664/+attachment/5114233/+files/Bug%20tg3

Re: [Kernel-packages] [Bug 1447664] Re: 14e4:1687 broadcom tg3 network driver disconnects under high load

2018-04-13 Thread Paulo Abadie Guedes
We tried this same version yesterday and the bug was still present.
Actually it looked worse, because the machine crashed faster (maybe that was
just an impression). I will collect logs to report this properly soon, in a
few hours.
Paulo

On Fri, Apr 13, 2018, 13:55 luc <1447...@bugs.launchpad.net> wrote:

> Hi Kai-heng,
>
> I tried 4.15.0-14-generic #15~lp1447664 SMP Tue Mar 20 14:31:37 CST 2018
> x86_64 x86_64 x86_64 GNU/Linux, on Lubuntu 17.10.
> I have a  Hewlett-Packard HP EliteDesk 705 G1 SFF/2215, BIOS L06 v02.28
> 02/07/2017 and Lubuntu is in UEFI mode (my only OS) on this device.
> Unfortunelly, i have the same problem= (TG3 still crash, a reboot is
> mandatory)
>
> [  105.620301] tg3 :03:00.0 eno1: 0: Host status block
> [0001:00cc:(:002e:):(:0006)]
> [  105.620309] tg3 :03:00.0 eno1: 0: NAPI info
> [00cc:00cc:(0024:0006:01ff)::(00f7:::)]
> [  105.620317] tg3 :03:00.0 eno1: 1: Host status block
> [0001:0042:(::):(0830:)]
> [  105.620324] tg3 :03:00.0 eno1: 1: NAPI info
> [0042:0042:(::01ff):0830:(0030:0030::)]
> [  105.620331] tg3 :03:00.0 eno1: 2: Host status block
> [0001:00d2:(0fff::):(:)]
> [  105.620370] tg3 :03:00.0 eno1: 2: NAPI info
> [00d2:00d2:(::01ff):0fff:(07ff:07ff::)]
> [  105.755739] tg3 :03:00.0: tg3_stop_block timed out, ofs=4c00
> enable_bit=2
> [  105.797123] tg3 :03:00.0 eno1: Link is down
> [  105.889440] tg3 :03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
> domain=0x000d address=0xffe3d640 flags=0x0020]
> [  105.889478] tg3 :03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
> domain=0x000d address=0xffe3d680 flags=0x0020]
> [  109.932707] tg3 :03:00.0 eno1: Link is up at 1000 Mbps, full duplex
> [  109.932710] tg3 :03:00.0 eno1: Flow control is off for TX and off
> for RX
> [  109.932711] tg3 :03:00.0 eno1: EEE is enabled
>
> ** Attachment added: "Bug tg3"
>
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1447664/+attachment/5114233/+files/Bug%20tg3

Re: [Kernel-packages] [Bug 1447664] Re: 14e4:1687 broadcom tg3 network driver disconnects under high load

2018-03-20 Thread Paulo Abadie Guedes
Ok, I'll check it out. Thank you very much!

By the way, we downloaded and tested one of the deb packages you created,
and it worked quite well. I will check exactly which one it was before
reporting (almost sure it was the one for Xenial).

We managed to reproduce the issue easily by booting into PXE and, after the
NIC was started (trying to get an IP), resetting the machine and booting into
Ubuntu. There is a huge difference between doing this and doing a cold boot
directly into Ubuntu.

My hypothesis is that PXE sets up the NIC in a non-default way, by changing
one or more of the config bits in some register. The unpatched tg3 driver
does not touch those same bits. This way, a boot may work sometimes, maybe
because the tg3 kernel module never sets these values itself and relies on
the defaults, while the PXE code changes them if it runs before Linux is
loaded.

Anyway, the unpatched kernel breaks very quickly, while the patched kernel
you provided worked very well. This happens after running PXE.

I will check your links soon and return with our results in the next few
days, hopefully this weekend or next week.

Thank you,
Paulo

On Mar 20, 2018 14:16, "Kai-Heng Feng" wrote:

> Guys, Broadcom has a new patch [1] that needs testing.
> Here's the kernel [2] to try.
>
> [1] https://lkml.org/lkml/2018/3/20/35
> [2] https://people.canonical.com/~khfeng/lp1447664-20180320/


--
You received this bug notification because you are subscribed to the bug
report.
https://bugs.launchpad.net/bugs/1447664

Title:
  14e4:1687 broadcom tg3 network driver disconnects under high load

Status in linux package in Ubuntu:
  Triaged
Status in linux package in Debian:
  New

Bug description:
  The tg3 Broadcom network driver that binds with chipset 5762 goes offline
  and is unable to recover (even with the tg3 watchdog timeout) when network
  transmit is under high load.  Call trace:
  https://launchpadlibrarian.net/204185480/dmesg

  When this happens, only a reboot would be able to fix it.  Sometimes,
  however, bringing the interface offline and online (via ifconfig)
  would recover networking.  I've also tested with the latest tg3 driver
  (Dec 2014 version) and networking is still problematic.  I have also
  disabled TSO, GSO, etc. with ethtool and the bug still surfaces.
  This bug may be related to the integrated firmware.

  Here is the procedure to replicate the issue, because it is hard to
  replicate it under moderate network load.

  1. Boot up a machine with a Broadcom 5762 NIC (e.g. HP EliteDesk 705)
     using an Ubuntu/Kubuntu Live CD 14.04-15.04.
  2. From another machine: start 5 sessions, and in each session
     repeatedly copy (scp with public key authentication) a 70 MB file
     back and forth to the tg3 machine. (Not sure if this is necessary.)
  3. Create a 1 GB file on the tg3 machine, with something like:

     dd if=/dev/urandom of=/my/test/file bs=1024 count=$((1024*1000))

  4. From another machine: repeatedly scp that 1 GB file from the tg3
     machine. This can be done with something like:

     while true; do
         scp -i /my/scp/private.key u...@ip.of.tg3:/my/test/file /tmp
     done

  Networking will usually go offline in about 10-30 minutes.
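
A minimal sketch of a variant of the loop in step 4 that stops at the first
failed copy and reports how many copies succeeded. It assumes that a failing
scp is a good enough sign that networking went offline; the function name
run_until_failure is illustrative and not part of the original report.

```shell
# Run the given command repeatedly until it fails, then print the number
# of successful runs. Hypothetical helper, for illustration only.
run_until_failure() {
    runs=0
    # The command's own output is sent to stderr so that stdout carries
    # only the final count.
    while "$@" >&2; do
        runs=$((runs + 1))
    done
    echo "$runs"
}
```

It could be called as follows, with USER@HOST standing in for the tg3 machine:

    run_until_failure scp -i /my/scp/private.key USER@HOST:/my/test/file /tmp

scp may stall rather than exit when the link drops in the middle of a
transfer, so an ssh keepalive option such as "-o ServerAliveInterval=10" may
be needed for the loop to end.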

  WORKAROUND: Add a udev rule to make the changes permanent in
  /etc/udev/rules.d/80-tg3-fix.rules :

  ACTION=="add", SUBSYSTEM=="net", ATTRS{vendor}=="0x14e4", ATTRS{device}=="0x1687", RUN+="/sbin/ethtool -K %k highdma off"
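
Whether the workaround took effect can be checked in the feature list printed
by "ethtool -k". A minimal sketch, assuming the feature shows up as a line of
the form "highdma: off" (possibly followed by "[fixed]"); the function name
highdma_state is illustrative.

```shell
# Read `ethtool -k <iface>` output on stdin and print the state of the
# highdma feature ("on" or "off"). Prints nothing if the line is absent.
highdma_state() {
    sed -n 's/^[[:space:]]*highdma:[[:space:]]*\([a-z]*\).*$/\1/p'
}
```

For example, on the tg3 machine (the interface name eth0 is an assumption):

    ethtool -k eth0 | highdma_state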

  ProblemType: Bug
  DistroRelease: Ubuntu 15.04
  Package: linux-image-3.19.0-15-generic 3.19.0-15.15
  ProcVersionSignature: Ubuntu 3.19.0-15.15-generic 3.19.3
  Uname: Linux 3.19.0-15-generic x86_64
  ApportVersion: 2.17.2-0ubuntu1
  Architecture: amd64
  AudioDevicesInUse:
   USER        PID ACCESS COMMAND
   /dev/snd/controlC1:  kubuntu    3748 F pulseaudio
   /dev/snd/controlC0:  kubuntu    3748 F pulseaudio
  CasperVersion: 1.360
  Date: Thu Apr 23 11:16:24 2015
  IwConfig:
   eth0      no wireless extensions.

   lo        no wireless extensions.
  LiveMediaBuild: Kubuntu 15.04 "Vivid Vervet" - Release amd64 (20150422)
  MachineType: Hewlett-Packard HP EliteDesk 705 G1 MT
  ProcEnviron:
   LANGUAGE=
   TERM=xterm
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 radeondrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/casper/vmlinuz.efi file=/cdrom/preseed/hostname.seed boot=casper maybe-ubiquity quiet splash ---
  PulseList:
   Error: command ['pacmd', 'list'] failed with exit code 1: Home directory not accessible: Permission denied
   No PulseAudio daemon running, or not running as session daemon.
  RelatedPackageVersions:
   linux-restricted-modules-3.19.0-15-generic N/A
   linux-backports-modules-3.19.0-15-generic  N/A
   linux-firmware 1.143
  RfKill:

  SourcePackage: linux
  UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 10/22/2014
  dmi.bios.vendor: Hewlett-Packard
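  dmi.bios.version: L06 v02.15
  dmi.board.asset.tag: 2UA5041TG4
  dmi.board.name: 2215
  dmi.board.vendor: Hewlett-Packard
  dmi.chassis.asset.tag: 2UA5041TG4
  dmi.chassis.type: 6
  dmi.chassis.vendor: Hewlett-Packard
  dmi.modalias: dmi:bvnHewlett-Packard:bvrL06v02.15:bd10/22/2014:svnHewlett-Packard:pnHPEliteDesk705G1MT:pvr:rvnHewlett-Packard:rn2215:rvr:cvnHewlett-Packard:ct6:cvr:
  dmi.product.name: HP EliteDesk 705 G1 MT
  dmi.sys.vendor: Hewlett-Packard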

Re: [Kernel-packages] [Bug 1447664] Re: 14e4:1687 broadcom tg3 network driver disconnects under high load

2018-02-17 Thread Paulo Abadie Guedes
Thank you, we will try it as soon as possible.

Currently I'm on vacation, and will not be able to test it until about
March 5 (2 weeks from now). But as soon as I test it, I'll let you know
about the results.

It would be great if someone else could try it too.

Thanks,
Paulo


On Feb 12, 2018 3:25 AM, "Kai-Heng Feng" 
wrote:

Kernel with patch in comment #40. Please try it out.

http://people.canonical.com/~khfeng/lp1447664-clk/


[Kernel-packages] [Bug 1447664] Re: 14e4:1687 broadcom tg3 network driver disconnects under high load

2018-02-10 Thread Paulo Abadie Guedes
Hello, this thread has a patch that solved the bug (for me).
https://www.mail-archive.com/netdev@vger.kernel.org/msg189347.html

The patch is here:
https://www.mail-archive.com/netdev@vger.kernel.org/msg189923/0001-tg3-Add-clock-override-support-for-5762.patch

I tested this patch on the following kernels and situations.

1) Stable kernels 4.13.3 and 4.15 crash without the patch (as do all
other versions tested). The patch is not yet merged in the main Linux
branch, up to and including 4.15 (stable).

2) Stable kernels 4.13.3 and 4.15 work great with the patch: no timeouts
on tg3. Fast transfers on gigabit links and 10/100 links.

3) On Jan 31 (10 days ago) I wrote to the patch author, mentioned my
results and asked when it will be merged. Still waiting; the author is
probably quite busy at the moment.

4) A lot of tests were performed over several weeks. The last session took
about one or two weeks, working full time, on an isolated network, using
the FOG open source cloning solution. Several hundred GB were transferred
during the tests, cloning 100+ machines inside a few labs. Both single
and multicast cloning sessions were used, with a gigabit switch and also
with 10/100 switches. Tests were run sequentially and in parallel,
with/without power failures, with/without several patches, in many
configurations, with lots of kernel parameters, you name it.

5) The test scenario shows this bug is completely reproducible, 100% of
the time. Without the patch, my kernels always fail: I tested about 20
different versions and none worked. With the patch above, the two
patched versions always work correctly.

6) A minor detail: the patch applies with a slight offset on 4.15 (2 lines,
probably new comments or code) but works anyway.

This work would have been impossible without all the cooperation from the
FOG team. Sebastian suggested the patch, and others helped a lot. A big
"thank you" to them!

I wonder when this will be merged into the main kernel. Can anyone please
help with this?

Regards,
Paulo


Re: [Kernel-packages] [Bug 1447664] Re: 14e4:1687 broadcom tg3 network driver disconnects under high load

2018-01-31 Thread Paulo Abadie Guedes
Hello, I would like to confirm whether it is useful to file a new bug for
this issue. The problem I'm having is the same one we are discussing in
this thread. Wouldn't a new report just be a duplicate?

Maybe I'm missing something, because I don't know the details of the bug
hunting process for Ubuntu.

Can you please confirm that I should open it?

If so, I can add a detailed description and dmesg logs, with debug
on and the timeout error message inside.

Anyway, I want to report progress on this problem. I have tested a few
kernels and patches in the last weeks, and have found one combination that
does solve the issue.

I also checked that this patch is not yet merged into the latest vanilla
stable kernel, version 4.15, released three days ago. But it applies to and
works with 4.15 as well, which is just great (at least for me).

I will send the details later (or tomorrow), as soon as I get back to my
computer.

Paulo

On Jan 29, 2018 12:54 AM, "Kai-Heng Feng" 
wrote:

> First please file an upstream bug at https://bugzilla.kernel.org/
> Product: Drivers
> Component: Network
>
> Also, looks like it's a Ubuntu certified hardware, let me ask around.

[Kernel-packages] [Bug 1447664] Re: 14e4:1687 broadcom tg3 network driver disconnects under high load

2018-01-28 Thread Paulo Abadie Guedes
Hello, I am still having this bug. I'm working with several HP machines of
the same model as Yngvi's. Here it is (from the dmesg messages):
Hardware name: HP HP EliteDesk 705 G3 Brazil Desktop Mini/8266, BIOS P26 Ver. 02.03 12/22/2016

It is interesting to note that it always happens with a 10/100 switch, but
never occurs with a gigabit one.

I've compiled and tested the 4.15.0-rc8 release candidate, which has the
commit 4419bb1cedcda0272e1dc410345c5a1d1da0e367, but it does not solve
the issue. I added a few printk calls and can see that the module is
correctly compiled and loaded, but my machine is not a Dell. Hence, the
"if" condition fails and the body is not executed.

I also tried to force the patch by keeping the "if" body and removing the
condition, just to see what happens (with another printk to prove that it
runs). The code runs (limiting MRRS to 2048, I think), but it does not
solve the bug.
The kernel complains that the TSC is unstable, right after tg3 breaks. Here
is a dmesg snippet, maybe it helps.


<...>
[  155.816404] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
[  155.816447] clocksource:   'refined-jiffies' wd_now: fffdcbf3 wd_last: fffdc110 mask:
[  155.816490] clocksource:   'tsc' cs_now: 7d3f16e620 cs_last: 7b2987b172 mask:
[  155.816533] tsc: Marking TSC unstable due to clocksource watchdog
[  155.939181] tg3 :01:00.0: tg3_stop_block timed out, ofs=4c00 enable_bit=2
[  156.103998] tg3 :01:00.0 eth0: Link is down
[  156.322988] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[  156.323040] sched_clock: Marking unstable (156322980975, 5436)<-(156582881282, -259894745)
[  156.323144] clocksource: Switched to clocksource refined-jiffies
<...>

If you want to take a deeper look, there are a few logs here. I also tried
"tsc=unstable" and other boot parameters, mostly to see if any would help
(feeling lucky, perhaps?). Nothing changed; the bug is still here. To me,
the logs show mostly the same messages.

log_01_acpi_off.txt
https://pastebin.com/FGQNiLqk

log_02_maxcpus_1.txt
https://pastebin.com/2eEJnA3Z

log_03_nmi_watchdog_off.txt
https://pastebin.com/Su44AqiX

log_04_nmi_watchdog_off.txt
https://pastebin.com/4ja0UZ0c

log_05_noapic_nolapic.txt
https://pastebin.com/fZNJbME5
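
To compare saved logs such as the ones listed above, the tg3 timeout line
from the dmesg snippet can be filtered out of each file. A minimal sketch,
assuming the message text "tg3_stop_block timed out" is the signature of
interest; the function name tg3_timeouts is illustrative.

```shell
# Print every tg3 timeout line found in a kernel log read on stdin.
# The exit status is that of grep: 0 if at least one line matched,
# 1 if none did.
tg3_timeouts() {
    grep -F 'tg3_stop_block timed out'
}
```

It can be fed from a live system or from a saved file:

    dmesg | tg3_timeouts
    tg3_timeouts < log_01_acpi_off.txt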

Well, any ideas? I can reproduce the problem 100% of the time. Would you
like me to test any other patch?

Kai-Heng Feng, you mention "it's better to ask HP and Broadcom to fix
the issue". I agree, but how can we do that?

Thank you,
Paulo


[Kernel-packages] [Bug 1447664] Re: 14e4:1687 broadcom tg3 network driver disconnects under high load

2017-12-14 Thread Paulo Abadie Guedes
Hello, I have seen the exact same issue, with the exact same
hardware you have: the HP EliteDesk 705 G3 Desktop Mini.

I've already tested a ton of options, including recompiling the latest
kernel, booting with several parameters, and so on and so forth. I got
nothing more than a big headache. I have 100+ machines to install in a
month and my team is having a really hard time dealing with this issue.

I have posted my findings on the FOG forums. FOG is an open-source
cloning tool. Please check it out:

https://forums.fogproject.org/topic/10731/crash-due-to-timeout-in-tg3-kernel-module-tg3_stop_block-timed-out-ofs-4c00-enable_bit-2

Any ideas on this bug? It seems to be related to 10/100 switches. If
both ends are gigabit, it works much more reliably. Problems still
arise, but much less frequently. With my old "fast ethernet" switch, the
problem always happens.
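
Since the failure seems tied to the negotiated link speed, it may help to
record that speed alongside each test run. A minimal sketch that reads it
from sysfs, where the kernel reports it in Mb/s; the function names and the
optional sysfs-root argument are illustrative additions.

```shell
# Print the negotiated link speed in Mb/s for an interface, as reported
# by the kernel in sysfs. The optional second argument (sysfs root)
# defaults to /sys and exists only so the function can be tried against
# a fake directory tree.
link_speed() {
    iface=$1
    sysfs=${2:-/sys}
    cat "$sysfs/class/net/$iface/speed"
}

# Succeed if the interface negotiated a valid speed below gigabit.
# Returns 2 if the speed could not be read at all.
is_below_gigabit() {
    speed=$(link_speed "$@") || return 2
    [ "$speed" -gt 0 ] && [ "$speed" -lt 1000 ]
}
```

For example (the interface name eth0 is an assumption):

    is_below_gigabit eth0 && echo "eth0 is on a 10/100 link"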

It's lurking somewhere between the binary blob (the firmware), the kernel
driver, the hardware, or any tricky combination of these. Perhaps it is
related to the AMD platform.

I can run tests or gather more data, if it helps. The issue always happens here.
Any ideas on how to solve or work around this issue? Patches or parameters are
welcome...

Regards,
Paulo
