Re: [BUG] amdgpu GPU fault detected in VM context

2024-01-30 Thread Jose Maldonado
On Tue, 30 Jan 2024 15:28:00 -0400
Jose Maldonado wrote:
> 
> Hello everyone!
> 
> I have been hitting a bug in the new DRM code for amdgpu on an RX580
> (Polaris10). I am currently running -current:
> 
> **OpenBSD 7.4 GENERIC.MP#1637 amd64**
> 
> And I'm seeing these errors that appear shortly after starting
> Xenocara.
> 
> drm:pid13609:gmc_v8_0_process_interrupt *ERROR* GPU fault detected:
> 146 0x0420920c for process Xorg pid 63811 thread Xorg pid 349016
> drm:pid13609:gmc_v8_0_process_interrupt *ERROR*
> VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0010C284
> drm:pid13609:gmc_v8_0_process_interrupt *ERROR*
> VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0809200C
> 

Adding more information, since the error was just triggered and I was
able to capture a different output in the dmesg.

This line in particular caught my attention:

Jan 30 18:22:29 volfread /bsd: WARNING acrtc_attach->pflip_status !=
AMDGPU_FLIP_NONE failed at
/usr/src/sys/dev/pci/drm/amd/display/amdgpu_dm/amdgpu_dm.c:8293

Reviewing the code in question I see that it is related to the
management of Page Flipping and its interaction with VRR (FreeSync),
VSync and Mesa.

I have disabled the VRR option on the monitor. In Xenocara I have
always used the defaults, without a configuration file, which should
mean that VRR is inactive, as it is off by default in amdgpu.

I'll see whether this solves the problem and let you know if there's
any progress.



-- 
*
God in his heaven, all is well on Earth


dmesg-on-crash
Description: Binary data


[BUG] amdgpu GPU fault detected in VM context

2024-01-30 Thread Jose Maldonado

Hello everyone!

I have been hitting a bug in the new DRM code for amdgpu on an RX580
(Polaris10). I am currently running -current:

**OpenBSD 7.4 GENERIC.MP#1637 amd64**

And I'm seeing these errors that appear shortly after starting Xenocara.

drm:pid13609:gmc_v8_0_process_interrupt *ERROR* GPU fault detected: 146
0x0420920c for process Xorg pid 63811 thread Xorg pid 349016
drm:pid13609:gmc_v8_0_process_interrupt *ERROR*
VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0010C284
drm:pid13609:gmc_v8_0_process_interrupt *ERROR*
VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0809200C
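
For anyone reading the status word above, here is a rough Python sketch
that splits VM_CONTEXT1_PROTECTION_FAULT_STATUS into bit fields. The
field layout (PROTECTIONS in bits 0-7, MEMORY_CLIENT_ID in bits 12-19,
MEMORY_CLIENT_RW in bit 24, VMID in bits 25-28) is an assumption based
on the Linux gmc_8_1 register headers and should be checked against
gmc_8_1_sh_mask.h before relying on it. The name decode_fault_status is
made up for this example.

```python
# ASSUMED field layout (shift, mask) for VM_CONTEXT1_PROTECTION_FAULT_STATUS
# on GMC v8; verify against the gmc_8_1 register headers.
FIELDS = {
    "protections": (0, 0xFF),        # bits 0-7
    "memory_client_id": (12, 0xFF),  # bits 12-19
    "memory_client_rw": (24, 0x1),   # bit 24: 0 = read, 1 = write
    "vmid": (25, 0xF),               # bits 25-28
}


def decode_fault_status(status):
    """Split a 32-bit fault status word into the assumed bit fields."""
    return {
        name: (status >> shift) & mask
        for name, (shift, mask) in FIELDS.items()
    }


# Status value copied from the dmesg lines above.
for name, value in decode_fault_status(0x0809200C).items():
    print(f"{name}: {value:#x}")
```

Under that assumed layout, 0x0809200C decodes to protections 0x0c,
memory client id 0x92, a read access, and VMID 4.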

The addresses change every time the messages are logged. I have been
investigating and the error is not new; it has appeared before, as you
can see in the following links:

https://www.mail-archive.com/misc@openbsd.org/msg179677.html

https://www.mail-archive.com/bugs@openbsd.org/msg17844.html

The behavior is similar in all cases: the bug appears, spams the dmesg
and, from one moment to the next, crashes Xenocara and restarts the
graphics server. I may be browsing the Internet, watching a video or
simply moving a window when the error appears and triggers the Xenocara
crash.

Checking further, I found that a possible solution is to downgrade the
amdgpu firmware:

https://bugzilla.kernel.org/show_bug.cgi?id=201957

But I have checked the earlier firmware packages back to OpenBSD 7.1
(amdgpu-firmware-20211027.tgz) and the firmware for Polaris10 is the
same in every one of them, so I doubt a downgrade will solve the dmesg
spam and the Xenocara crash.

Complete dmesg attached.

-- 
*
God in his heaven, all is well on Earth
OpenBSD 7.4-current (GENERIC.MP) #1633: Sat Jan 27 08:06:43 MST 2024
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 34228117504 (32642MB)
avail mem = 33169588224 (31632MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xdd9fb000 (62 entries)
bios0: vendor American Megatrends International, LLC. version "A.E0" date 
06/27/2023
bios0: Micro-Star International Co., Ltd. MS-7C95
efi0 at bios0: UEFI 2.7
efi0: American Megatrends rev 0x50011
acpi0 at bios0: ACPI 6.2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SSDT SSDT SSDT FIDT MCFG HPET IVRS FPDT VFCT BGRT TPM2 
PCCT SSDT CRAT CDIT SSDT SSDT SSDT SSDT WSMT APIC SSDT SSDT SSDT
acpi0: wakeup devices GP12(S4) GP13(S4) XHC0(S4) GP30(S4) GP31(S4) GPP0(S4) 
GPP8(S4) GPP1(S4) PTXH(S4) PT20(S4) PT24(S4) PT26(S4) PT27(S4) PT28(S4) PT29(S4)
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimcfg0 at acpi0
acpimcfg0: addr 0xf000, bus 0-127
acpihpet0 at acpi0: 14318180 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Ryzen 7 3800X 8-Core Processor, 4200.01 MHz, 17-71-00, patch 08701030
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,HWPSTATE,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,IBPB,STIBP,IBRS_PREF,IBRS_SM,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 512KB 64b/line 
8-way L2 cache, 16MB 64b/line 16-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 100MHz
cpu0: mwait min=64, max=64, C-substates=1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: AMD Ryzen 7 3800X 8-Core Processor, 4200.00 MHz, 17-71-00, patch 08701030
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,HWPSTATE,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,IBPB,STIBP,IBRS_PREF,IBRS_SM,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 512KB 64b/line 
8-way L2 cache, 16MB 64b/line 16-way L3 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: AMD Ryzen 7 3800X 8-Core Processor, 4200.00 MHz, 17-71-00, patch 08701030
cpu2: 

chromium on -current #1637 crashes (perhaps graphic error?)

2024-01-30 Thread hahahahacker2009

Hi,
after upgrading to snapshot #1637 and upgrading packages, I found that
chromium cannot render properly.
I suspect it is an issue with the graphics driver, since I encounter
the issue on Alpine Linux (running an unstable mesa driver) as well.
Chromium generates kilobytes of logs and megabytes of ktrace output.
The log just repeats:
Location of variable sk_FragColor conflicts with another variable.

OpenBSD 7.4-current (GENERIC.MP) #1637: Mon Jan 29 11:59:31 MST 2024
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 8401256448 (8012MB)
avail mem = 8125845504 (7749MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xe (87 entries)
bios0: vendor Dell Inc. version "A25" date 05/30/2019
bios0: Dell Inc. OptiPlex 9020
efi0 at bios0: UEFI 2.3.1
efi0: American Megatrends rev 0x4028d
acpi0 at bios0: ACPI 5.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC FPDT SLIC LPIT SSDT SSDT HPET SSDT MCFG 
SSDT ASF! SSDT BGRT DMAR TCPA
acpi0: wakeup devices UAR1(S3) PXSX(S4) PXSX(S4) PXSX(S4) PXSX(S4) 
PXSX(S4) PXSX(S4) PXSX(S4) GLAN(S4) EHC1(S3) EHC2(S3) XHC_(S4) HDEF(S4) 
PEG0(S4) PEGP(S4) PEG1(S4) [...]

acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz, 3192.88 MHz, 06-3c-03, 
patch 0028
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 
64b/line 8-way L2 cache, 6MB 64b/line 12-way L3 cache

cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz, 3192.81 MHz, 06-3c-03, 
patch 0028
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 
64b/line 8-way L2 cache, 6MB 64b/line 12-way L3 cache

cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz, 3192.77 MHz, 06-3c-03, 
patch 0028
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu2: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 
64b/line 8-way L2 cache, 6MB 64b/line 12-way L3 cache

cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 6 (application processor)
cpu3: Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz, 3193.00 MHz, 06-3c-03, 
patch 0028
cpu3: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu3: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 
64b/line 8-way L2 cache, 6MB 64b/line 12-way L3 cache

cpu3: smt 0, core 3, package 0
ioapic0 at mainbus0: apid 8 pa 0xfec0, version 20, 24 pins
acpihpet0 at acpi0: 14318179 Hz
acpimcfg0 at acpi0
acpimcfg0: addr 0xf800, bus 0-63
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus -1 (PEG0)
acpiprt2 at acpi0: bus -1 (PEG1)
acpiprt3 at acpi0: bus -1 (PEG2)
acpiec0 at acpi0: not present
acpipci0 at acpi0 PCI0: 0x 0x0011 0x0001
acpicmos0 at acpi0
com0 at acpi0 UAR1 addr 0x3f8/0x8 irq 4: ns16550a, 16 byte fifo
acpibtn0 at acpi0: PWRB(wakeup)
"PNP0C14" at acpi0 not configured
tpm0 at acpi0 TPM_ 1.2 (TIS) addr 

Re: TSO em(4) problem

2024-01-30 Thread Hrvoje Popovski
On 30.1.2024. 13:33, Alexander Bluhm wrote:
> On Tue, Jan 30, 2024 at 12:07:08PM +0100, Hrvoje Popovski wrote:
>> On 30.1.2024. 9:27, Hrvoje Popovski wrote:
>>> I will prepare one box for this kind of traffic and will contact you and
>>> marcus
>>>
>>>> In theory when going through vlan interface it should remove
>>>> M_VLANTAG.  But something must be wrong and I wonder what.
>>>>
>>>> bluhm
>>
>> Hi,
>>
>> I've managed to trigger the watchdog in the lab. It wouldn't have been
>> possible without bluhm@'s information about ix vlan, thank you.
> 
> Great, now we can debug the details.
> 
> I have to know how ix and em are connected.
> 
> Do you have any bridge or veb?  Where are your vlan trunks?
> Any aggr, trunk, carp?

no, only vlan on ix0.


> Is my understanding of your setup correct?
> 
> ix -> vlan -> forward -> em

yes, forwarding only, without pf.
I'm sending traffic from a host connected to vlan/ix0 and forwarding it
through em5 to another host.
I'm sending 1Gbps of traffic with cisco t-rex.

> Can something more happen, like
> 
> ix -> forward -> em
> 

In the setup without vlan on ix I got only one watchdog at the beginning
of testing and that's it.
With vlan I'm getting around 6 or 7 watchdogs per minute, which means
the link goes down and up 6 or 7 times.


without vlan
smc4# netstat -sp tcp | grep TSO
0 output TSO packets software chopped
268 output TSO packets hardware processed
0 output TSO packets generated
0 output TSO packets dropped
smc4# netstat -sp tcp | grep LRO
0 input LRO packets passed through pseudo device
7666573 input LRO generated packets from hardware
21667579 input LRO coalesced packets by network device
0 input bad LRO packets dropped




kernel panic, PCe APU3, unknown trap in user mode, nodejs?

2024-01-30 Thread Mikolaj Kucharski
Hi,

I found one of my PC Engines APU3 boards in a kernel panic. What changed
recently on this machine: I started node.js on it about two days ago.

- it runs local installation of Zigbee2MQTT
- nodejs / Zigbee2MQTT opens /dev/cuaU0
- cuaU0 is ITEAD SONOFF Zigbee 3.0 USB Dongle Plus V2

This is a very new workload on this machine. I never had kernel panics
on it before, but that machine was never really busy with anything as
it's mainly for experiments.

I have it at the ddb prompt, if anyone would be interested in seeing
something more there. I will probably reboot it tomorrow or so...


...
starting local daemons: cron.
Mon Jan 29 19:46:28 UTC 2024

OpenBSD/amd64 (pce-3967.home.local) (tty00)

login: unknown trap 763636304 in user mode
trap type 34043 code 4 rip a0bce1e2a cs 2b rflags 10206 cr2 cb5a038 cpl 0 rsp 
cbb524cc0
uvm_fault(0xfd81135c2858, 0x84fb, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at      trap_print+0xed:        leave
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
 478162  61157  32767  0x1823  0x4002  node
*477361  61157  32767  0x1823  0x4003  node
 291026  61157  32767  0x1823  0x4001  node
8163851d(uvm_fault(0xfd81135c2858, 0x84f3, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at      db_read_bytes+0x43:     movq    0(%rdi),%rax
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
 478162  61157  32767  0x1823  0x4002  node
*477361  61157  32767  0x1823  0x4003  node
 291026  61157  32767  0x1823  0x4001  node
db_read_bytes(84f3,8,80002d8427c8) at db_read_bytes+0x43
db_get_value(84f3,8,0) at db_get_value+0x43
db_stack_trace_print(8163851d,0,e,82130d25,817ce280) at
 db_stack_trace_print+0x2dd
db_trap(6,0) at db_trap+0xef
db_ktrap(6,0,80002d8429b0) at db_ktrap+0x111
kerntrap(80002d8429b0) at kerntrap+0xa7
alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
end of kernel
end trace frame: 0x84fb, count: 8
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.

ddb{3}> set $maxwidth = 0
ddb{3}> set $lines = 0

ddb{3}> x/s version
version:OpenBSD 7.4-current (GENERIC.MP) #1633: Sat Jan 27 08:06:43 MST 
2024\012
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP\012

ddb{3}> show panic
*cpu3: uvm_fault(0xfd81135c2858, 0x84f3, 0, 1) -> e

ddb{3}> trace
db_read_bytes(84f3,8,80002d8427c8) at db_read_bytes+0x43
db_get_value(84f3,8,0) at db_get_value+0x43
db_stack_trace_print(8163851d,0,e,82130d25,817ce280) at 
db_stack_trace_print+0x2dd
db_trap(6,0) at db_trap+0xef
db_ktrap(6,0,80002d8429b0) at db_ktrap+0x111
kerntrap(80002d8429b0) at kerntrap+0xa7
alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
end of kernel
end trace frame: 0x84fb, count: -7

ddb{3}> machine cpuinfo
0: stopped
1: stopped
2: stopped
*   3: ddb

ddb{3}> show registers
rdi   0x84f3__ALIGN_SIZE+0x74f3
rsi  0x8
rbp   0x80002d8427b0
rbx  0xe
rdx   0x80002d8427c8
rcx 0x4f
rax   0x84fb__ALIGN_SIZE+0x74fb
r80x80002d8427e0
r9 0
r10   0x60437e0337a9e3d2
r11   0xcc55262893bf16eb
r12  0x8
r130
r14  0x8
r150
rip   0x812ef833db_read_bytes+0x43
cs   0x8
rflags   0x10246__ALIGN_SIZE+0xf246
rsp   0x80002d842790
ss 0
db_read_bytes+0x43:     movq    0(%rdi),%rax

ddb{3}> show proc
PROC (node) tid=477361 pid=61157 tcnt=11 stat=onproc
flags process=1823 proc=400
runpri=32, usrpri=51, slppri=32, nice=20
wchan=0x0, wmesg=, ps_single=0x0
forw=0x, list=0x80002d6fbaa0,0x80002d74dad0
process=0x8000fffed940 user=0x80002d83d000, 
vmspace=0xfd81135c2858
estcpu=1, cpticks=5, pctcpu=0.9, user=1, sys=2, intr=0

ddb{3}> ps
   PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
 48956   20103  1  0  3  0x18100083  ttyin getty
  6441   46980  1  0  3  0x18100098  kqreadcron
 40773  359395  35505  32767  3  0x18100083  kqreadtail
 35505   66509  26482  32767  3   0x810008b  sigsusp   sh
 61157  134400  26482  32767  3  0x18200083  kqreadnode
 61157  182681  26482  32767  3  0x1c200083  kqreadnode
 61157  478162  26482  32767  7  0x1c23node
 61157  277278  26482  32767  3  0x1c200083  fsleepnode
*61157  477361  26482  32767  7  0x1c23node
 61157  291026  26482  32767  7  

Re: TSO em(4) problem

2024-01-30 Thread Alexander Bluhm
On Tue, Jan 30, 2024 at 12:07:08PM +0100, Hrvoje Popovski wrote:
> On 30.1.2024. 9:27, Hrvoje Popovski wrote:
> > I will prepare one box for this kind of traffic and will contact you and
> > marcus
> > 
> >> In theory when going through vlan interface it should remove
> >> M_VLANTAG.  But something must be wrong and I wonder what.
> >>
> >> bluhm
> 
> Hi,
> 
> I've managed to trigger the watchdog in the lab. It wouldn't have been
> possible without bluhm@'s information about ix vlan, thank you.

Great, now we can debug the details.

I have to know how ix and em are connected.

Do you have any bridge or veb?  Where are your vlan trunks?
Any aggr, trunk, carp?

Is my understanding of your setup correct?

ix -> vlan -> forward -> em

Can something more happen, like

ix -> forward -> em

bluhm

> Jan 30 12:01:09 smc4 /bsd: em5: watchdog: head 123 tail 187 TDH 187 TDT 123
> Jan 30 12:01:18 smc4 /bsd: em5: watchdog: head 243 tail 307 TDH 307 TDT 243
> Jan 30 12:01:28 smc4 /bsd: em5: watchdog: head 463 tail 15 TDH 15 TDT 463
> Jan 30 12:01:37 smc4 /bsd: em5: watchdog: head 413 tail 477 TDH 477 TDT 413
> Jan 30 12:01:46 smc4 /bsd: em5: watchdog: head 195 tail 259 TDH 259 TDT 195
> Jan 30 12:01:55 smc4 /bsd: em5: watchdog: head 259 tail 323 TDH 323 TDT 259
> Jan 30 12:02:05 smc4 /bsd: em5: watchdog: head 333 tail 397 TDH 397 TDT 333
> Jan 30 12:02:14 smc4 /bsd: em5: watchdog: head 33 tail 97 TDH 97 TDT 33
> Jan 30 12:02:24 smc4 /bsd: em5: watchdog: head 459 tail 11 TDH 11 TDT 459
> Jan 30 12:02:33 smc4 /bsd: em5: watchdog: head 447 tail 511 TDH 511 TDT 447
> 
> 
> em0 at pci7 dev 0 function 0 "Intel 82576" rev 0x01: msi, address
> 00:1b:21:61:8a:94
> em1 at pci7 dev 0 function 1 "Intel 82576" rev 0x01: msi, address
> 00:1b:21:61:8a:95
> em2 at pci8 dev 0 function 0 "Intel I210" rev 0x03: msi, address
> 00:25:90:5d:c9:98
> em3 at pci9 dev 0 function 0 "Intel I210" rev 0x03: msi, address
> 00:25:90:5d:c9:99
> em4 at pci12 dev 0 function 0 "Intel I350" rev 0x01: msi, address
> 00:25:90:5d:c9:9a
> em5 at pci12 dev 0 function 1 "Intel I350" rev 0x01: msi, address
> 00:25:90:5d:c9:9b
> em6 at pci12 dev 0 function 2 "Intel I350" rev 0x01: msi, address
> 00:25:90:5d:c9:9c
> em7 at pci12 dev 0 function 3 "Intel I350" rev 0x01: msi, address
> 00:25:90:5d:c9:9d
> 
> 
> smc4# netstat -sp tcp | grep LRO
> 0 input LRO packets passed through pseudo device
> 4696315 input LRO generated packets from hardware
> 13205047 input LRO coalesced packets by network device
> 0 input bad LRO packets dropped
> smc4# netstat -sp tcp | grep TSO
> 0 output TSO packets software chopped
> 3672 output TSO packets hardware processed
> 0 output TSO packets generated
> 0 output TSO packets dropped
> 
> 
> 
> 
> smc4# ifconfig em5 hwfeatures
> em5: flags=8c43 mtu 1500
>  
> hwfeatures=31b7
>  hardmtu 9216
> lladdr 00:25:90:5d:c9:9b
> index 8 priority 0 llprio 3
> media: Ethernet autoselect (1000baseT
> full-duplex,master,rxpause,txpause)
> status: active
> inet 192.168.20.1 netmask 0xff00 broadcast 192.168.20.255
> 



pfsync in 7.4 generating much larger amount of traffic than 7.3

2024-01-30 Thread Ryan Freeman
Hello,

I've been trying to track down what exactly is causing such a large increase
in traffic on pfsync(4) interfaces on several of the firewall pairs I've
upgraded to 7.4.  All have recent errata applied.

custedge1$ doas syspatch -l
002_msplit
003_patch
004_ospfd
005_tmux
006_httpd
007_perl
008_vmm
009_pf
011_ssh

I've seen both CPU graphs and network graphs jump greatly since the upgrades,
by about 3-4x in some cases.

I've included some data in-line, but due to the production nature of these
firewalls I sent the pcap data and pfctl -ss -vv outputs directly to dlg.

The issue wasn't noticed immediately, and even now for the most part everything
is performing 'OK' in that resources behind these firewalls are loading fine
and latency is low.

This has happened on a few virtualized fw pairs running on proxmox nodes
with virtio-backed vio(4) interfaces that tend to be like so:
- one vio for 'outside'
- one vio for 'inside'
- one vio for pfsync/also ssh pf conf changes to secondary
All generally have lots of vlan/carp interfaces.

However, I also have a physical pair of fw that use em(4) and the pfsync there
is also showing these same symptoms, linked directly to each other with a single
cable.

Here is bwm-ng output from an affected host; the pfsync interface
is yarding around data at a higher rate than nearly every other interface
(this is a quieter time of day):

  bwm-ng v0.6.3 (probing every 0.500s), press 'h' for help
  input: getifaddrs type: rate
  |    iface              Rx             Tx          Total
  ========================================================
         lo0:      0.00  b/s      0.00  b/s      0.00  b/s
        vio0:    630.04 kb/s      1.13 Mb/s      1.76 Mb/s
        vio1:     27.78 Mb/s     28.12 Mb/s     55.90 Mb/s
        vio2:      1.47 Mb/s    199.67 kb/s      1.67 Mb/s
    carp2100:      0.00  b/s      1.10 kb/s      1.10 kb/s
     carp600:      0.00  b/s      1.10 kb/s      1.10 kb/s
     carp601:      1.10 kb/s      1.10 kb/s      2.19 kb/s
     carp602:      0.00  b/s      1.10 kb/s      1.10 kb/s
     carp605:      0.00  b/s      1.10 kb/s      1.10 kb/s
     carp607:     16.05 kb/s      1.10 kb/s     17.14 kb/s
     carp612:      0.00  b/s      1.10 kb/s      1.10 kb/s
     carp614:      0.00  b/s      1.10 kb/s      1.10 kb/s
     carp615:      0.00  b/s      1.10 kb/s      1.10 kb/s
      carp99:      1.06 Mb/s      1.10 kb/s      1.06 Mb/s
     pfsync0:     25.35 Mb/s     27.17 Mb/s     52.52 Mb/s
    vlan2100:      0.00  b/s      1.10 kb/s      1.10 kb/s
     vlan600:      0.00  b/s      1.10 kb/s      1.10 kb/s
     vlan601:      1.10 kb/s      1.10 kb/s      2.19 kb/s
     vlan602:      0.00  b/s      1.10 kb/s      1.10 kb/s
     vlan605:      0.00  b/s      1.10 kb/s      1.10 kb/s
     vlan607:     16.05 kb/s     82.25 kb/s     98.30 kb/s
     vlan612:      0.00  b/s      1.10 kb/s      1.10 kb/s
     vlan614:      0.00  b/s      1.10 kb/s      1.10 kb/s
     vlan615:      0.00  b/s      1.10 kb/s      1.10 kb/s
      vlan99:      1.06 Mb/s    102.76 kb/s      1.16 Mb/s
         lo1:      0.00  b/s      0.00  b/s      0.00  b/s
      pflog0:      0.00  b/s      3.32 kb/s      3.32 kb/s
     vlan616:    222.81 kb/s      1.12 Mb/s      1.35 Mb/s
     carp616:    192.00 kb/s      1.10 kb/s    193.10 kb/s
  --------------------------------------------------------
       total:     57.80 Mb/s     57.95 Mb/s    115.75 Mb/s


custedge1$ ifconfig vio1
vio1: flags=8843 mtu 1500
lladdr 82:7b:8a:de:50:6e
index 2 priority 0 llprio 3
media: Ethernet autoselect
status: active
inet 10.100.1.5 netmask 0xfffc broadcast 10.100.1.7
custedge1$ ifconfig pfsync0
pfsync0: flags=41 mtu 1500
index 18 priority 0 llprio 3
encap: parent vio1
pfsync: syncdev: vio1 syncpeer: 10.100.1.6 maxupd: 128 defer: off
groups: carp pfsync

I've verified maxupd is the same on both sides.
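
To put the bwm-ng numbers above in proportion, here is a small Python
sketch that compares the pfsync0 total with the total on its syncdev
(vio1) and with the grand total. The helper names to_bps and share are
made up for this example, and treating the bwm-ng units as decimal
multiples of bits per second is an assumption.

```python
# ASSUMPTION: bwm-ng rate units are decimal multiples of bits per second.
UNIT_TO_BPS = {"b/s": 1.0, "kb/s": 1e3, "Mb/s": 1e6, "Gb/s": 1e9}


def to_bps(rate):
    """Convert a string such as '52.52 Mb/s' to bits per second."""
    value, unit = rate.split()
    return float(value) * UNIT_TO_BPS[unit]


def share(part, whole):
    """Return part/whole as a fraction; both are rate strings."""
    return to_bps(part) / to_bps(whole)


# "Total" column values copied from the bwm-ng output above.
pfsync_vs_vio1 = share("52.52 Mb/s", "55.90 Mb/s")
pfsync_vs_all = share("52.52 Mb/s", "115.75 Mb/s")
print(f"pfsync0 / vio1:  {pfsync_vs_vio1:.1%}")
print(f"pfsync0 / total: {pfsync_vs_all:.1%}")
```

With these figures pfsync0 comes to roughly 94% of what vio1 carries.
The bwm-ng total line may count the sync traffic on both pfsync0 and
vio1, so the vio1 comparison is probably the more useful of the two.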

I recently switched to using syncpeer to see if that would lessen the load.
It does appear to have helped, but I suspect that over time the amount of
duplicates (maybe?) increases linearly.  It seemed like downing the pfsync if
and then upping it 'reset' the issue for awhile, at least with much lower than
what I saw the other day (>100mbit.. on 

Re: TSO em(4) problem

2024-01-30 Thread Hrvoje Popovski
On 30.1.2024. 9:27, Hrvoje Popovski wrote:
> I will prepare one box for this kind of traffic and will contact you and
> marcus
> 
>> In theory when going through vlan interface it should remove
>> M_VLANTAG.  But something must be wrong and I wonder what.
>>
>> bluhm

Hi,

I've managed to trigger the watchdog in the lab. It wouldn't have been
possible without bluhm@'s information about ix vlan, thank you.



Jan 30 12:01:09 smc4 /bsd: em5: watchdog: head 123 tail 187 TDH 187 TDT 123
Jan 30 12:01:18 smc4 /bsd: em5: watchdog: head 243 tail 307 TDH 307 TDT 243
Jan 30 12:01:28 smc4 /bsd: em5: watchdog: head 463 tail 15 TDH 15 TDT 463
Jan 30 12:01:37 smc4 /bsd: em5: watchdog: head 413 tail 477 TDH 477 TDT 413
Jan 30 12:01:46 smc4 /bsd: em5: watchdog: head 195 tail 259 TDH 259 TDT 195
Jan 30 12:01:55 smc4 /bsd: em5: watchdog: head 259 tail 323 TDH 323 TDT 259
Jan 30 12:02:05 smc4 /bsd: em5: watchdog: head 333 tail 397 TDH 397 TDT 333
Jan 30 12:02:14 smc4 /bsd: em5: watchdog: head 33 tail 97 TDH 97 TDT 33
Jan 30 12:02:24 smc4 /bsd: em5: watchdog: head 459 tail 11 TDH 11 TDT 459
Jan 30 12:02:33 smc4 /bsd: em5: watchdog: head 447 tail 511 TDH 511 TDT 447
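
A short Python sketch for reading the watchdog lines above: it pulls out
head, tail, TDH and TDT and computes the ring distance from head to
tail. RING_SIZE = 512 is an assumption (the thread does not state the
number of TX descriptors; the logged values just never exceed 511), and
the names parse_watchdog and ring_gap are made up for this example.

```python
import re

RING_SIZE = 512  # ASSUMPTION: TX descriptor count, not stated in the thread

WATCHDOG_RE = re.compile(
    r"(?P<ifname>\w+): watchdog: head (?P<head>\d+) tail (?P<tail>\d+) "
    r"TDH (?P<tdh>\d+) TDT (?P<tdt>\d+)"
)


def parse_watchdog(line):
    """Return the fields of one em(4) watchdog log line, or None."""
    m = WATCHDOG_RE.search(line)
    if m is None:
        return None
    rec = {k: int(v) for k, v in m.groupdict().items() if k != "ifname"}
    rec["ifname"] = m.group("ifname")
    return rec


def ring_gap(start, end, ring_size=RING_SIZE):
    """Distance from start to end on a ring, allowing for wraparound."""
    return (end - start) % ring_size


line = ("Jan 30 12:01:28 smc4 /bsd: em5: watchdog: "
        "head 463 tail 15 TDH 15 TDT 463")
rec = parse_watchdog(line)
print(rec["ifname"], ring_gap(rec["head"], rec["tail"]))
```

Under the 512-entry assumption, every line in the log above gives a
head-to-tail distance of 64, and TDH/TDT always mirror tail/head.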


em0 at pci7 dev 0 function 0 "Intel 82576" rev 0x01: msi, address
00:1b:21:61:8a:94
em1 at pci7 dev 0 function 1 "Intel 82576" rev 0x01: msi, address
00:1b:21:61:8a:95
em2 at pci8 dev 0 function 0 "Intel I210" rev 0x03: msi, address
00:25:90:5d:c9:98
em3 at pci9 dev 0 function 0 "Intel I210" rev 0x03: msi, address
00:25:90:5d:c9:99
em4 at pci12 dev 0 function 0 "Intel I350" rev 0x01: msi, address
00:25:90:5d:c9:9a
em5 at pci12 dev 0 function 1 "Intel I350" rev 0x01: msi, address
00:25:90:5d:c9:9b
em6 at pci12 dev 0 function 2 "Intel I350" rev 0x01: msi, address
00:25:90:5d:c9:9c
em7 at pci12 dev 0 function 3 "Intel I350" rev 0x01: msi, address
00:25:90:5d:c9:9d


smc4# netstat -sp tcp | grep LRO
0 input LRO packets passed through pseudo device
4696315 input LRO generated packets from hardware
13205047 input LRO coalesced packets by network device
0 input bad LRO packets dropped
smc4# netstat -sp tcp | grep TSO
0 output TSO packets software chopped
3672 output TSO packets hardware processed
0 output TSO packets generated
0 output TSO packets dropped
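
The counters above can be turned into a ratio with a few lines of
Python. This is only a sketch: parse_counters is a made-up helper, and
reading "coalesced / generated" as the average number of packets merged
into each LRO packet is an assumption about what the two counters mean.

```python
def parse_counters(text):
    """Map each 'N description' line of netstat -s output to {description: N}."""
    counters = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        # Skip prompts and anything that does not start with a number.
        if len(parts) == 2 and parts[0].isdigit():
            counters[parts[1]] = int(parts[0])
    return counters


# Output copied from the netstat run above.
SAMPLE = """\
smc4# netstat -sp tcp | grep LRO
0 input LRO packets passed through pseudo device
4696315 input LRO generated packets from hardware
13205047 input LRO coalesced packets by network device
0 input bad LRO packets dropped
"""

counters = parse_counters(SAMPLE)
generated = counters["input LRO generated packets from hardware"]
coalesced = counters["input LRO coalesced packets by network device"]
print(f"coalesced per generated LRO packet: {coalesced / generated:.2f}")
```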




smc4# ifconfig em5 hwfeatures
em5: flags=8c43 mtu 1500
 
hwfeatures=31b7
 hardmtu 9216
lladdr 00:25:90:5d:c9:9b
index 8 priority 0 llprio 3
media: Ethernet autoselect (1000baseT
full-duplex,master,rxpause,txpause)
status: active
inet 192.168.20.1 netmask 0xff00 broadcast 192.168.20.255




Re: TSO em(4) problem

2024-01-30 Thread Hrvoje Popovski
On 29.1.2024. 15:29, Alexander Bluhm wrote:
> On Sat, Jan 27, 2024 at 08:08:35AM +0100, Hrvoje Popovski wrote:
>> On 26.1.2024. 22:47, Alexander Bluhm wrote:
>>> On Fri, Jan 26, 2024 at 11:41:49AM +0100, Hrvoje Popovski wrote:
>>>> I've managed to reproduce the TSO em problem on another setup,
>>>> unfortunately production.
>>> What helped debugging a similar issue with ixl(4) and TSO was to
>>> remove all TSO specific code from the driver.  Then only this part
>>> remains from the original em(4) TSO diff.
>>>
>>> error = bus_dmamap_create(sc->sc_dmat, EM_TSO_SIZE,
>>>     EM_MAX_SCATTER / (sc->pcix_82544 ? 2 : 1),
>>>     EM_TSO_SEG_SIZE, 0, BUS_DMA_NOWAIT, &pkt->pkt_map);
>>>
>>> The parameters that changed when adding TSO are:
>>>
>>> bus_size_t size:      MAX_JUMBO_FRAME_SIZE 16128 -> EM_TSO_SIZE 65535
>>> bus_size_t maxsegsz:  MAX_JUMBO_FRAME_SIZE 16128 -> EM_TSO_SEG_SIZE 4096
>>>
>>> I suspect that this is the cause for the regression as disabling
>>> TSO did not help.  Would it be possible to run the diff below?  I
>>> expect that the problem will still be there.  But then we know it
>>> must be the change of one of the bus_dmamap_create() arguments.
>>>
>>> bluhm
>>
>> Hi,
>>
>> with this diff em0 seems happy and em watchdog is gone.
> 
> This is very interesting.  That means that the bus_dmamap_create()
> argument does not cause the regression.
> 
> Did you see "output TSO packets hardware processed" anywhere in
> netstat -s?  In some iteration of testing you turned TSO off with
> sysctl net.inet.tcp.tso=0, but it did not help.  So no TSO packets
> from the stack.
> 
> In another mail you mentioned
> 
>> Setup is very simple
>> em0 - carp <- uplink
>> em1 - pfsync
>> ix1 - vlans - carp
> 
> ix supports LRO.  If you forward from ix1 to em0 the LRO packets
> from ix hardware are split by TSO on em hardware.  And the ix does
> vlan offloading + LRO, so em must do vlan offloading properly with
> TSO.  Or do you use a vlan interface?
> 
> Does it help to disable LRO, ifconfig ix1 -tcplro ?


Yes, it helps... Thank you

uplink
em0: flags=8b43
mtu 1500

hwfeatures=31b7
hardmtu 9216
lladdr 0c:c4:7a:da:cd:5a
index 3 priority 0 llprio 3
groups: egress
media: Ethernet autoselect (1000baseT full-duplex,master,rxpause)
status: active


vlans are on ix1 - I've disabled LRO
ix1: flags=8b43
mtu 1500
lladdr 90:e2:ba:d7:1b:f5
index 2 priority 0 llprio 3
media: Ethernet autoselect (10GbaseSR full-duplex,rxpause,txpause)
status: active


Before I disabled LRO on ix1 I got a lot of watchdogs on em0:

bcbnfw1# uptime
 9:25AM  up 8 mins, 1 user, load averages: 0.14, 0.13, 0.06
bcbnfw1# cat /var/log/messages| grep watchdog
Jan 30 09:18:51 bcbnfw1 /bsd: em0: watchdog: head 148 tail 213 TDH 213 TDT 148
Jan 30 09:19:01 bcbnfw1 /bsd: em0: watchdog: head 160 tail 224 TDH 224 TDT 160
Jan 30 09:19:12 bcbnfw1 /bsd: em0: watchdog: head 163 tail 228 TDH 228 TDT 163
Jan 30 09:19:22 bcbnfw1 /bsd: em0: watchdog: head 128 tail 192 TDH 192 TDT 128
Jan 30 09:19:32 bcbnfw1 /bsd: em0: watchdog: head 309 tail 373 TDH 373 TDT 309
Jan 30 09:19:41 bcbnfw1 /bsd: em0: watchdog: head 113 tail 177 TDH 177 TDT 113
Jan 30 09:19:51 bcbnfw1 /bsd: em0: watchdog: head 402 tail 466 TDH 466 TDT 402
Jan 30 09:20:01 bcbnfw1 /bsd: em0: watchdog: head 114 tail 178 TDH 178 TDT 114
Jan 30 09:20:16 bcbnfw1 /bsd: em0: watchdog: head 111 tail 175 TDH 175 TDT 111
Jan 30 09:20:26 bcbnfw1 /bsd: em0: watchdog: head 199 tail 263 TDH 263 TDT 199



without LRO on ix1 everything seems to work just fine ...


> 
> I see this vlan code with mac_type checks.  Can we end in a
> configuration where we enable TSO but cannot do VLAN offloading?
> 
> #if NVLAN > 0
> /* Find out if we are in VLAN mode */
> if (m->m_flags & M_VLANTAG && (sc->hw.mac_type < em_82575 ||
> sc->hw.mac_type > em_i210)) {
> /* Set the VLAN id */
> desc->upper.fields.special = htole16(m->m_pkthdr.ether_vtag);
> 
> /* Tell hardware to add tag */
> desc->lower.data |= htole32(E1000_TXD_CMD_VLE);
> }
> #endif
> 
> Hrvoje, I know you do great tests in your lab.  Did you try this
> setup:
> 
> Send bulk TCP traffic in vlan that will trigger LRO.
> Do VLAN + LRO offloading in ix.
> Forward it to em with TSO.
> 

I will prepare one box for this kind of traffic and will contact you and
marcus

> In theory when going through vlan interface it should remove
> M_VLANTAG.  But something must be wrong and I wonder what.
> 
> bluhm
> 



Re: OpenBSD 7.4/amd64 on APU4D4 - kernel panic

2024-01-30 Thread Radek
> "uhidev" not "unidev",
There was a typo in my previous email, but there is a correct entry in 
/etc/bsd.re-config.

I removed "disable upd" from /etc/bsd.re-config and now my UPS is handled by 
ugen.

ugen0 at uhub0 port 4 "American Power Conversion Smart-UPS 2200 FW:UPS 09.3 / 
ID=18" rev 2.00/1.06 addr 2

# cat /etc/bsd.re-config
disable uhidev

Thanks!


On Sun, 28 Jan 2024 18:18:04 +
Stuart Henderson  wrote:

> On 2024/01/28 15:14, Radek wrote:
> > In this case UPS is handled by uhid but I need it to be handled by ugen. 
> > How can I make it work without rebuilding the kernel?
> > Disabling unidev in /etc/bsd.re-config doesn't work in this case.
> 
> "uhidev" not "unidev",
> 
> > On Sat, 27 Jan 2024 18:48:48 +
> > Stuart Henderson  wrote:
> > 
> > > On 2024/01/27 10:36, Radek wrote:
> > > > but it doesn't work for another APC UPS hardware
> > > 
> > > > uhidev0 at uhub0 port 3 configuration 1 interface 0 "American Power 
> > > > Conversion Smart-UPS 2200 FW:UPS 09.3 / ID=18" rev 2.00/1.06 addr 2
> > > 
> > > ...
> > > 
> > > > > > On 2024-01-19 13:21, Stuart Henderson wrote:
> > > > > > > ...actually, maybe it needs to be "disable uhidev".
> > > 
> > > ^^
> > > 
> > 
> > 
> > Radek
> > 
> 


Radek