[Kernel-packages] [Bug 1331513] Re: tg3 eth1: transmit timed out, resetting on BCM5720

2014-06-18 Thread Kent Baxley
** Also affects: linux (Ubuntu)
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1331513

Title:
  tg3 eth1: transmit timed out, resetting on BCM5720

Status in The Dell PowerEdge project:
  Incomplete
Status in “linux” package in Ubuntu:
  Confirmed
Status in “linux-lts-saucy” package in Ubuntu:
  Confirmed

Bug description:
  We have a problem with Dell PowerEdge machines that have the Broadcom
  5720 chip. We see this problem on generation 12 systems, across
  different models (R420, R620), with several combinations of BIOS
  firmware, lifecycle firmware, etc. We see this on several versions
  of the Linux kernel, ranging from 3.2.x up to 3.11, with several
  versions of the tg3 driver, including a manually compiled latest
  version (3.133d) loaded in a 3.11 kernel. The latest machine on which
  we can reproduce the problem has Ubuntu Precise installed, but we also
  see this behaviour on Debian machines. We run Xen on it, with HVM
  guests on top. Storage is handled over iSCSI. The iSCSI interface is
  the one on which we can trigger this bug reproducibly; we have the
  impression it also happens on other interfaces, but there we don't
  have a reproducible setup.

  All this info points in the direction of the tg3 driver and/or the
  hardware below it not handling certain data streams or data patterns
  correctly, and finally crashing the system. It seems unrelated to the
  kernel version, the Xen version, the number of VMs running, firmware
  revisions, etc.

  We have been trying to pinpoint this for over a year now, being unable
  to actually create a scenario where we could reproduce this. As of
  this week, we finally found a specific setup where we could trigger
  the error within a reasonable time.

  The error is triggered by running a certain VM on the Xen stack and,
  inside that VM, importing a mysqldump into a running MySQL server.
  The VM has its traffic on an iSCSI volume, so this effectively
  generates a data stream over the eth1 interface of the machine. Within
  a short amount of time, the system crashes in two steps. We first see
  a timeout in the tg3 driver on the eth1 interface (dmesg output
  section attached). This sometimes repeats two or three times, and
  finally, in step 2, the machine freezes and reboots.
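
To make the first step easier to spot, here is a minimal sketch that counts tg3 transmit timeouts per interface in kernel log lines. The pattern is based on the message in the bug title; the optional text between "tg3" and the interface name (for example a PCI address) and the function name are assumptions, not taken from the attached dmesg output.

```python
import re

# Based on the message in the bug title. Allowing extra text (such as a PCI
# address) between "tg3" and the interface name is an assumption.
TIMEOUT_RE = re.compile(r"tg3\b.*?\b(eth\d+): transmit timed out, resetting")


def count_tx_timeouts(log_lines):
    """Count tg3 transmit timeouts per interface in an iterable of log lines."""
    counts = {}
    for line in log_lines:
        match = TIMEOUT_RE.search(line)
        if match:
            iface = match.group(1)
            counts[iface] = counts.get(iface, 0) + 1
    return counts
```

The function accepts any iterable of lines, so it could be fed saved dmesg output or an open kernel log file.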

  While debugging, we noticed that the bug goes away when we disable sg
  offloading with ethtool.
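
A minimal sketch of that workaround, assuming the affected interface is eth1 as in this report. It reads the offload features with `ethtool -k` and, if scatter-gather is on, runs `ethtool -K eth1 sg off`. The helper names are illustrative, and the exact layout of the `ethtool -k` output may differ between ethtool versions.

```python
import subprocess


def scatter_gather_enabled(ethtool_output):
    """Return True if `ethtool -k` output reports scatter-gather as on."""
    for line in ethtool_output.splitlines():
        key, _, value = line.partition(":")
        # Only the top-level "scatter-gather" line, not "tx-scatter-gather".
        if key.strip() == "scatter-gather":
            # The value may carry a suffix such as "[fixed]".
            return value.split()[:1] == ["on"]
    raise ValueError("no scatter-gather line found")


def disable_sg_command(iface):
    """Build the ethtool command that turns scatter-gather off."""
    return ["ethtool", "-K", iface, "sg", "off"]


def disable_sg(iface="eth1"):
    """Turn scatter-gather off on iface if it is currently on (needs root)."""
    result = subprocess.run(
        ["ethtool", "-k", iface], capture_output=True, text=True, check=True
    )
    if scatter_gather_enabled(result.stdout):
        subprocess.run(disable_sg_command(iface), check=True)
```

The setting does not survive a reboot or driver reload, so it would have to be reapplied, for example from the interface's network configuration.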

  If you need any additional info, feel free to ask.

  ProblemType: Bug
  DistroRelease: Ubuntu 12.04
  Package: linux-image-3.11.0-19-generic 3.11.0-19.33~precise1
  ProcVersionSignature: Ubuntu 3.11.0-19.33~precise1-generic 3.11.10.5
  Uname: Linux 3.11.0-19-generic x86_64
  AlsaDevices:
   total 0
   crw-rw---T 1 root audio 116,  1 Jun 18 16:36 seq
   crw-rw---T 1 root audio 116, 33 Jun 18 16:36 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.0.1-0ubuntu17.6
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
  Date: Wed Jun 18 16:47:27 2014
  HibernationDevice: RESUME=UUID=f3577e02-64e3-4cab-b6e7-f30efa111565
  InstallationMedia: Ubuntu-Server 12.04.4 LTS Precise Pangolin - Release amd64 (20140204)
  MachineType: Dell Inc. PowerEdge R420
  MarkForUpload: True
  PciMultimedia:
   
  ProcFB:
   
  ProcKernelCmdLine: placeholder root=UUID=bbc71780-90bf-4647-b579-e48d5d8c2bce ro vga=0x317
  RelatedPackageVersions:
   linux-restricted-modules-3.11.0-19-generic N/A
   linux-backports-modules-3.11.0-19-generic  N/A
   linux-firmware 1.79.12
  RfKill: Error: [Errno 2] No such file or directory
  SourcePackage: linux-lts-saucy
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 01/20/2014
  dmi.bios.vendor: Dell Inc.
  dmi.bios.version: 2.1.2
  dmi.board.name: 0JD6X3
  dmi.board.vendor: Dell Inc.
  dmi.board.version: A00
  dmi.chassis.type: 23
  dmi.chassis.vendor: Dell Inc.
  dmi.modalias: dmi:bvnDellInc.:bvr2.1.2:bd01/20/2014:svnDellInc.:pnPowerEdgeR420:pvr:rvnDellInc.:rn0JD6X3:rvrA00:cvnDellInc.:ct23:cvr:
  dmi.product.name: PowerEdge R420
  dmi.sys.vendor: Dell Inc.

To manage notifications about this bug go to:
https://bugs.launchpad.net/dell-poweredge/+bug/1331513/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1331513] Re: tg3 eth1: transmit timed out, resetting on BCM5720

2014-06-18 Thread Christopher M. Penalver
** Tags added: saucy



[Kernel-packages] [Bug 1331513] Re: tg3 eth1: transmit timed out, resetting on BCM5720

2014-06-18 Thread Christopher M. Penalver
wonko, could you please test the latest upstream kernel (the topmost entry on 
the page, not the daily folder) following 
https://wiki.ubuntu.com/KernelMainlineBuilds ? This will allow additional 
upstream developers to examine the issue. Once you've tested the upstream 
kernel, please comment on which kernel version specifically you tested. If this 
bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For 
example:
kernel-fixed-upstream-3.16-rc1

This can be done by clicking on the yellow circle with a black pencil icon next 
to the word Tags located at the bottom of the bug description. As well, please 
remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's
Status as Confirmed. Please let us know your results. Thank you for your
understanding.
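
The tagging rules above can be summarised in a small sketch. The function name and the returned dictionary layout are illustrative only; the tag names themselves are the ones listed in this message.

```python
def tag_changes(fixed_upstream, version):
    """Return the tags to add and remove after testing a mainline kernel.

    fixed_upstream: True if the tested mainline kernel fixes the bug.
    version: the tested kernel version, e.g. "3.16-rc1".
    """
    if fixed_upstream:
        base = "kernel-fixed-upstream"
    else:
        base = "kernel-bug-exists-upstream"
    return {
        "add": [base, "{}-{}".format(base, version)],
        # This tag is removed in both cases.
        "remove": ["needs-upstream-testing"],
    }
```

For the example version given above, `tag_changes(True, "3.16-rc1")` yields the tags `kernel-fixed-upstream` and `kernel-fixed-upstream-3.16-rc1` to add.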

** Tags added: bios-outdated-2.1.3

** Tags added: trusty

** Changed in: linux (Ubuntu)
   Importance: Undecided => High

** Changed in: linux (Ubuntu)
   Status: Confirmed => Incomplete

** Summary changed:

- tg3 eth1: transmit timed out, resetting on BCM5720
+ 14e4:165f tg3 eth1: transmit timed out, resetting on BCM5720

Title:
  14e4:165f tg3 eth1: transmit timed out, resetting on BCM5720

Status in The Dell PowerEdge project:
  Incomplete
Status in “linux” package in Ubuntu:
  Incomplete
Status in “linux-lts-saucy” package in Ubuntu:
  Confirmed
