Hi Daniel,

I haven't been able to reproduce this panic yet, but from inspecting the code, I'm fairly certain I understand the root cause.

The TX code prior to OS-5225 was fairly simplistic in that it didn't support DMA binding. So for every mblk, it would bcopy the fragments into a single pre-allocated DMA buffer attached to a single TCB. Thus, there was always a 1-to-1 relationship between descriptors and TCBs.

The assertion that blew up in i40e_tx_cleanup_ring() relies on exactly that 1-to-1 relationship between descriptors and TCBs.

In addition to adding support for LSO (aka TSO), OS-5225 also added support for doing DMA binding. By default DMA binding is used for the LSO case and/or for cases where the mblk size is larger than the threshold defined by the "tx_dma_threshold" .conf property.

In the DMA bind case, there will be either 1 or 2 TCBs, but N descriptors (where N is 1 context descriptor, iff LSO, plus 1 data descriptor per DMA cookie used to bind each of the mblk fragments).
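
To make the descriptor arithmetic above concrete, here is a minimal sketch in C. The function name tx_bind_desc_count() and its arguments are made up for illustration and are not part of the i40e driver; it only models the count described above (one context descriptor iff LSO, plus one data descriptor per DMA cookie).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not driver code: count the TX descriptors a
 * single packet consumes when it is sent via DMA binding.
 * cookies[i] is the number of DMA cookies used to bind fragment i.
 */
static size_t
tx_bind_desc_count(bool lso, const size_t *cookies, size_t nfrags)
{
	size_t ndesc = lso ? 1 : 0;	/* one context descriptor iff LSO */

	assert(cookies != NULL || nfrags == 0);
	for (size_t i = 0; i < nfrags; i++)
		ndesc += cookies[i];	/* one data descriptor per cookie */

	return (ndesc);
}
```

For example, an LSO packet with three fragments bound by 1, 3 and 2 cookies would use 7 descriptors under this model, while still being tracked by only 1 or 2 TCBs.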

So I think the fix here is twofold:

1) The assertion that blew up should be removed, as it's no longer valid.

2) The cleanup logic in i40e_tx_cleanup_ring() needs to be updated to account for how TCBs and descriptors are related in the DMA bind case, probably along the lines of what was done in i40e_tx_recycle_ring().
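
As a rough illustration of point 2, the sketch below models a ring in which only some descriptor slots own a TCB. It is a userland toy, not the actual fix: the types tcb_t and tx_ring_t, the field names, and tx_ring_cleanup() are all hypothetical, and it assumes a slot that owns no TCB is marked with NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define	RING_SIZE	8

/* Hypothetical stand-ins for the driver's structures. */
typedef struct tcb {
	int	tcb_id;
} tcb_t;

typedef struct tx_ring {
	tcb_t	*work_list[RING_SIZE];	/* NULL where a slot owns no TCB */
	size_t	head;			/* oldest in-use descriptor */
	size_t	desc_free;		/* number of free descriptors */
} tx_ring_t;

/*
 * Walk every in-use descriptor. A slot without a TCB is expected in
 * the DMA bind case, so it is skipped rather than asserted on.
 * Returns the number of TCBs released.
 */
static size_t
tx_ring_cleanup(tx_ring_t *ring)
{
	size_t idx, released = 0;

	assert(ring != NULL);
	idx = ring->head;

	while (ring->desc_free < RING_SIZE) {
		tcb_t *tcb = ring->work_list[idx];

		if (tcb != NULL) {
			free(tcb);
			ring->work_list[idx] = NULL;
			released++;
		}
		ring->desc_free++;
		idx = (idx + 1) % RING_SIZE;
	}
	ring->head = idx;

	return (released);
}
```

The point of the sketch is only that the loop advances per descriptor while TCB teardown happens per non-NULL slot, so the two counts no longer have to match.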

I've updated OS-7082 with the above and will get a fix into illumos-joyent soon.

Sorry again for the inconvenience.

rob


On 07/17/18 01:53 PM, Robert Johnston wrote:
Yeah, seems likely related to my changes for OS-5225 :(.  I’ve filed the 
following ticket to track this issue:

https://jira.joyent.us/browse/OS-7082

One possible workaround would be to disable TSO support in i40e by setting 
tx_lso_enable to 0 in /etc/system.

rob

On Jul 17, 2018, at 12:05 PM, Robert Mustacchi <r...@joyent.com> wrote:

Hi Daniel,

It looks like your dump device may not be large enough for us to have
the entire dump. Is it possible to increase the dump device? I guess
there's something that's gone wrong as a result of the integration of
the TSO support for i40e.

Robert

On 7/17/18 12:00 , Daniel Plominski wrote:
Hi,

After a SmartOS reinstall and an upgrade to a newer platform image, the
network card stops working after a while.
The network card carries the VLAN traffic. VLAN works in an LX zone, but
no longer for a KVM instance.

1.  SmartOS Clean Re-install
Restore Datasets Settings
Restore usbkey/config etc.
Restore ZFS Datasets
Import Zones to /etc/zones/index
Run / Start LX & KVM Zones

PI from SunOS assg10 5.11 joyent_20180509T053210Z i86pc i386 i86pc to
SunOS assg10 5.11 joyent_20180717T123432Z i86pc i386 i86pc
https://github.com/ass-a2s/illumos-joyent/tree/ass-release-20180717
https://datasets.ass.de/public/SmartOS/20180717T123432Z/smartos-20180717T123432Z-USB.img.bz2

Rolling back to version joyent_20180509T053210Z now shows the same error
after a while.

I did not see any abnormalities under
https://github.com/joyent/illumos-joyent/tree/master/usr/src/uts/common/io/i40e

After shutting down all VMs and rebooting SmartOS:

2018-07-16T05:18:37.496685+00:00 assg10 genunix: [ID 936769 kern.info]
mpt_sas2 is /pci@7a,0/pci8086,2f01@0/pci1000,30e0@0/iport@v0
2018-07-16T05:18:37.496688+00:00 assg10 genunix: [ID 408114 kern.info]
/pci@7a,0/pci8086,2f01@0/pci1000,30e0@0/iport@v0 (mpt_sas2) online
2018-07-16T05:18:37.496703+00:00 assg10 genunix: [ID 454863 kern.info]
dump on /dev/zvol/dsk/zones/dump size 10465 MB
2018-07-16T05:18:37.496706+00:00 assg10 genunix: [ID 127566 kern.info]
device pciclass,030000@0(display#0) keeps up device sd@0,0(disk#0), but
the former is not power managed
2018-07-16T05:18:37.496709+00:00 assg10 mac: [ID 469746 kern.info]
NOTICE: aggr1000 registered
2018-07-16T05:18:37.496712+00:00 assg10 mac: [ID 435574 kern.info]
NOTICE: igb1 link up, 1000 Mbps, full duplex
2018-07-16T05:18:37.496715+00:00 assg10 mac: [ID 435574 kern.info]
NOTICE: aggr1000 link up, 1000 Mbps, full duplex
2018-07-16T05:18:37.496718+00:00 assg10 mac: [ID 435574 kern.info]
NOTICE: igb2 link up, 1000 Mbps, full duplex
2018-07-16T05:18:37.496721+00:00 assg10 mac: [ID 435574 kern.info]
NOTICE: igb3 link up, 1000 Mbps, full duplex
2018-07-16T05:18:37.496724+00:00 assg10 mac: [ID 435574 kern.info]
NOTICE: igb0 link up, 1000 Mbps, full duplex
2018-07-16T05:18:37.496727+00:00 assg10 genunix: [ID 390243 kern.info]
Creating /etc/devices/devid_cache
2018-07-16T05:18:37.496730+00:00 assg10 genunix: [ID 390243 kern.info]
Creating /etc/devices/pci_unitaddr_persistent
2018-07-16T05:18:37.497026+00:00 assg10 savecore: [ID 570001 auth.error]
reboot after panic: assertion failed: tcb != NULL, file:
../../common/io/i40e/i40e_transceiver.c, line: 2074
2018-07-16T05:18:33+00:00 assg10 savecore: [ID 676874 auth.error] Saving
compressed system crash dump in /var/crash/volatile/vmdump.0
2018-07-16T05:18:40.860505+00:00 assg10 unix: [ID 504448 kern.info]
NOTICE: Fastboot: Couldn't open /platform/i86pc/amd64/boot_archive
2018-07-16T05:18:45.870340+00:00 assg10 pseudo: [ID 129642 kern.info]
pseudo-device: devinfo0
2018-07-16T05:18:45.870392+00:00 assg10 genunix: [ID 936769 kern.info]
devinfo0 is /pseudo/devinfo@0
2018-07-16T05:18:50.549203+00:00 assg10 genunix: [ID 390243 kern.info]
Creating /etc/devices/devname_cache
2018-07-16T05:20:04+00:00 assg10 savecore: [ID 320429 auth.error]
Decompress the crash dump with #012'savecore -vf
/var/crash/volatile/vmdump.0'
2018-07-16T05:20:04.857876+00:00 assg10 rootnex: [ID 349649 kern.info]
xsvc0 at root: space 0 offset 0
2018-07-16T05:20:04.857902+00:00 assg10 genunix: [ID 936769 kern.info]
xsvc0 is /xsvc@0,0
2018-07-16T05:20:06.914513+00:00 assg10 fmd: [ID 377184 daemon.error]
SUNW-MSG-ID: FMD-8000-2K, TYPE: Defect, VER: 1, SEVERITY:
Minor#012EVENT-TIME: Mon Jul 16 05:20:06 UTC 2018#012PLATFORM:
Super-Server, CSN: 9000135765, HOSTNAME:
assg10.assdomain.intern#012SOURCE: fmd-self-diagnosis, REV:
   #012EVENT-ID: 901bac51-20d6-c3ba-d2c3-df70f5af044f#012DESC: An
illumos Fault Manager component has experienced an error that required
the module to be disabled.  Refer to http://illumos.org/msg/FMD-8000-2K
for more information.#012AUTO-RESPONSE: The module has been disabled.
Events destined for the module will be saved for manual
diagnosis.#012IMPACT: Automated diagnosis and response for subsequent
events associated with this module will not occur.#012REC-ACTION: Use
fmdump -v -u <EVENT-ID> to locate the module.  Use fmadm reset <module>
to reset the module.
[root@assg10 ~]#
[root@assg10 /var/crash/volatile]# ls -all
total 20467016
drwx------   2 root     root           5 Juli 16 05:20 .
drwxr-xr-x   3 root     root           3 Juli 15 23:03 ..
-rw-r--r--   1 root     root           2 Juli 16 05:20 bounds
-rw-r--r--   1 root     root        1067 Juli 16 05:20 METRICS.csv
-rw-r--r--   1 root     root     10957094912 Juli 16 05:20 vmdump.0
[root@assg10 /var/crash/volatile]#
[root@assg10 /var/crash/volatile]# du -m vmdump.0
9993    vmdump.0
[root@assg10 /var/crash/volatile]#
[root@assg10 /var/crash/volatile]# savecore -f vmdump.0
savecore: incomplete dump on dump device
savecore: System dump time: Mon Jul 16 05:10:15 2018

savecore: saving system crash dump in /var/crash/volatile/{unix,vmcore}.0
Constructing namelist /var/crash/volatile/unix.0
Constructing corefile /var/crash/volatile/vmcore.0
pfn 65995776 not found for as=fffffffffbc4b0a0, va=fffffcc160600000
pfn 65995777 not found for as=fffffffffbc4b0a0, va=fffffcc160601000
pfn 65995778 not found for as=fffffffffbc4b0a0, va=fffffcc160602000
pfn 65995779 not found for as=fffffffffbc4b0a0, va=fffffcc160603000
pfn 65995780 not found for as=fffffffffbc4b0a0, va=fffffcc160604000
pfn 65995781 not found for as=fffffffffbc4b0a0, va=fffffcc160605000
pfn 65995782 not found for as=fffffffffbc4b0a0, va=fffffcc160606000
pfn 65995783 not found for as=fffffffffbc4b0a0, va=fffffcc160607000
pfn 65995784 not found for as=fffffffffbc4b0a0, va=fffffcc160608000
pfn 65995785 not found for as=fffffffffbc4b0a0, va=fffffcc160609000
  1:42  99% donesavecore: stream tag 1670 not in range 1..1
[root@assg10 /var/crash/volatile]#
[root@assg10 /var/crash/volatile]# mdb 0
mdb: vmcore.0 is not a kernel core file (bad magic number 0)
     failed to initialize target: No such file or directory
[root@assg10 /var/crash/volatile]#
penultimate Sysinfo:
{
   "Live Image": "20180711T060947Z",
   "System Type": "SunOS",
   "Boot Time": "1531720038",
   "SDC Version": "7.0",
   "Manufacturer": "Thomas-Krenn.AG",
   "Product": "Super Server",
   "Serial Number": "0000000000",
   "SKU Number": "Default string",
   "HW Version": "0123456789",
   "HW Family": "Default string",
   "Setup": "false",
   "VM Capable": true,
   "Bhyve Capable": true,
   "Bhyve Max Vcpus": 32,
   "CPU Type": "Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz",
   "CPU Virtualization": "vmx",
   "CPU Physical Cores": 2,
   "UUID": "00000000-0000-0000-0000-000000000000",
   "Hostname": "assg10",
   "CPU Total Cores": 24,
   "MiB of Memory": "262030",
   "Zpool": "zones",
   "Zpool Disks":
"c0t000000000000000000,c0t000000000000000000,c0t000000000000000000,c0t000000000000000000,c0t000000000000000000,c0t000000000000000000,c0t000000000000000000,c0t000000000000000000,c0t000000000000000000,c0t000000000000000000,c0t000000000000000000,c0t000000000000000000",

   "Zpool Profile": "mirror",
   "Zpool Creation": 1531695575,
   "Zpool Size in GiB": 2150,
   "Disks": {
     "c0t000000000000000000": {"Size in GB": 480},
     "c0t000000000000000000": {"Size in GB": 480},
     "c0t000000000000000000": {"Size in GB": 480},
     "c0t000000000000000000": {"Size in GB": 480},
     "c0t000000000000000000": {"Size in GB": 480},
     "c0t000000000000000000": {"Size in GB": 480},
     "c0t000000000000000000": {"Size in GB": 480},
     "c0t000000000000000000": {"Size in GB": 480},
     "c0t000000000000000000": {"Size in GB": 480},
     "c0t000000000000000000": {"Size in GB": 480},
     "c0t000000000000000000": {"Size in GB": 480},
     "c0t000000000000000000": {"Size in GB": 480}
   },
   "Boot Parameters": {
     "console": "vga",
     "vga_mode": "115200,8,n,1,-",
     "root_shadow": "$000000000000000000",
     "smartos": "true",
     "boot_args": "",
     "bootargs": ""
   },
   "Network Interfaces": {
     "i40e0": {"MAC Address": "00:00:00:00:00:00", "ip4addr": "", "Link
Status": "up", "NIC Names": ["vlan"]},
     "i40e1": {"MAC Address": "00:00:00:00:00:00", "ip4addr": "", "Link
Status": "unknown", "NIC Names": ["test"]},
     "igb0": {"MAC Address": "00:00:00:00:00:00", "ip4addr": "", "Link
Status": "up", "NIC Names": []},
     "igb2": {"MAC Address": "00:00:00:00:00:00", "ip4addr": "10.1.1.10",
"Link Status": "up", "NIC Names": ["admin"]},
     "igb1": {"MAC Address": "00:00:00:00:00:00", "ip4addr": "", "Link
Status": "up", "NIC Names": []},
     "igb3": {"MAC Address": "00:00:00:00:00:00", "ip4addr": "", "Link
Status": "up", "NIC Names": []},
     "aggr0": {"MAC Address": "00:00:00:00:00:00", "ip4addr": "", "Link
Status": "up", "NIC Names": ["untag"]}
   },
   "Virtual Network Interfaces": {
   },
   "Link Aggregations": {
     "aggr0": {"LACP mode": "active", "Interfaces": ["igb3", "igb0",
"igb1"]}
   }
}
Kind regards

DANIEL PLOMINSKI
Head of IT

Phone 09265 808-151 | Mobile 0151 58026316 | d...@ass.de
PGP Key: http://pgp.ass.de/2B4EB20A.key

ASS-Einrichtungssysteme GmbH
ASS-Adam-Stegner-Straße 19 | D-96342 Stockheim

Managing Directors: Matthias Stegner, Michael Stegner, Stefan Weiß
Coburg District Court HRB 3395 | VAT ID: DE218715721






