Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-09 Thread Vitaliy Makkoveev
Hi,

I’m not a fan of this.

> 
> + if (span_port_pool.pr_size == 0) {
> + pool_init(&span_port_pool, sizeof(struct veb_span_port),
> + 0, IPL_SOFTNET, 0, "vebspl", NULL);
> + }

Does an initialized pool consume significant resources? Why don’t we
do this within vebattach()? The same applies to the `veb_rule_pool’
initialization.
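
To make the suggestion concrete, here is a minimal userspace sketch of
initializing both pools once at attach time instead of checking pr_size on
every clone create. It is only a model: struct model_pool, model_pool_init(),
model_attach() and model_clone_create() are hypothetical stand-ins and not
the kernel's pool(9) API or the real vebattach().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct pool. */
struct model_pool {
	size_t pr_size;		/* item size, 0 while uninitialized */
	int init_calls;		/* how many times init ran */
};

static struct model_pool rule_pool, span_pool;

static void
model_pool_init(struct model_pool *pp, size_t size)
{
	pp->pr_size = size;
	pp->init_calls++;
}

/* Runs once, when the pseudo-device is attached. */
static void
model_attach(void)
{
	model_pool_init(&rule_pool, 64);	/* sizes are arbitrary here */
	model_pool_init(&span_pool, 16);
}

/* Per-interface creation no longer needs a pr_size == 0 check. */
static int
model_clone_create(void)
{
	assert(rule_pool.pr_size != 0 && span_pool.pr_size != 0);
	return (0);
}
```

With this shape each pool is set up exactly once, however many interfaces
are created afterwards.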



Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-09 Thread Alexandr Nedvedicky
Hello,

On Mon, May 09, 2022 at 06:01:07PM +0300, Barbaros Bilek wrote:
> Hello,
> 
> I have been using veb (veb+vlan+ixl) interfaces, quite stably, since 6.9.
> My system ran as a firewall under OpenBSD 6.9 and 7.0 without problems.
> I also used 7.1 for a limited time and there were no crashes.
> After OpenBSD's NET_TASKQ was raised to 4, it crashed after 5 days.
> Here are the crash report and dmesg:
> 
> ether_input(8520e000,fd8053616700) at ether_input+0x3ad
> vport_if_enqueue(8520e000,fd8053616700) at vport_if_enqueue+0x19
> veb_port_input(851c3800,fd806064c200,,82066600)
>  at veb_port_input+0x4d2
> ether_input(851c3800,fd806064c200) at ether_input+0x100
> end trace frame: 0x800025575290, count: 0
> ddb{1}> show panic
> 
> *cpu1: kernel diagnostic assertion "curcpu()->ci_schedstate.spc_smrdepth == 0" failed: file "/usr/src/sys/kern/subr_xxx.c", line 163
> 
> ddb{1}> trace
> 
> db_enter() at db_enter+0x10
> 
> panic(81f22e39) at panic+0xbf
> 
> __assert(81f96c9d,81f85ebc,a3,81fd252f) at __assert+0x25
> 

The diff below attempts to fix this particular panic, which is triggered by
the veb_span() function. It is a fairly straightforward change:

    - we grab references to the veb ports inside the SMR read section
      (between smr_read_enter() and smr_read_leave()),

    - we keep those references in a singly linked list,

    - as soon as we leave the SMR read section we process the list:
        dispatch packets
        drop references to the ports

The change may uncover similar panics in other veb/bridge areas.
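
The steps listed above can be modelled in userspace roughly as follows. This
is only a sketch under stated assumptions: struct model_port, model_enqueue(),
model_span() and the fixed MODEL_MAX_SPANS bound are hypothetical names, a
plain counter stands in for the SMR read depth, and a small array replaces
the pool-backed SLIST used in the diff.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_SPANS 8	/* hypothetical bound, for the sketch only */

struct model_port {
	int running;	/* stands in for IFF_RUNNING on p_ifp0 */
	int refs;	/* stands in for the port reference count */
	int delivered;	/* packets handed to this port */
};

static int model_read_depth;	/* stands in for spc_smrdepth */

/* Stands in for if_enqueue(); the real one may end up sleeping. */
static void
model_enqueue(struct model_port *p)
{
	assert(model_read_depth == 0);
	p->delivered++;
}

static int
model_span(struct model_port *ports, size_t nports)
{
	struct model_port *held[MODEL_MAX_SPANS];
	size_t i, n = 0;

	model_read_depth++;		/* smr_read_enter() */
	for (i = 0; i < nports && n < MODEL_MAX_SPANS; i++) {
		if (!ports[i].running)
			continue;
		ports[i].refs++;	/* take a reference */
		held[n++] = &ports[i];
	}
	model_read_depth--;		/* smr_read_leave() */

	for (i = 0; i < n; i++) {
		model_enqueue(held[i]);	/* dispatch outside the section */
		held[i]->refs--;	/* drop the reference */
	}
	return ((int)n);
}
```

The point of the reference is that a port collected inside the read section
must stay valid until it has been processed after the section ends.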

The diff applies to -current.

Thanks for testing and reporting back.

regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/if_veb.c b/sys/net/if_veb.c
index 2976cc200f1..a02dbac887f 100644
--- a/sys/net/if_veb.c
+++ b/sys/net/if_veb.c
@@ -159,6 +159,11 @@ struct veb_softc {
struct veb_ports sc_spans;
 };
 
+struct veb_span_port {
+   SLIST_ENTRY(veb_span_port)   sp_entry;
+   struct veb_port *sp_port;
+};
+
 #define DPRINTF(_sc, fmt...)do { \
if (ISSET((_sc)->sc_if.if_flags, IFF_DEBUG)) \
printf(fmt); \
@@ -225,6 +230,7 @@ static struct if_clone veb_cloner =
 IF_CLONE_INITIALIZER("veb", veb_clone_create, veb_clone_destroy);
 
 static struct pool veb_rule_pool;
+static struct pool span_port_pool;
 
 static int vport_clone_create(struct if_clone *, int);
 static int vport_clone_destroy(struct ifnet *);
@@ -266,6 +272,11 @@ veb_clone_create(struct if_clone *ifc, int unit)
0, IPL_SOFTNET, 0, "vebrpl", NULL);
}
 
+   if (span_port_pool.pr_size == 0) {
+   pool_init(&span_port_pool, sizeof(struct veb_span_port),
+   0, IPL_SOFTNET, 0, "vebspl", NULL);
+   }
+
sc = malloc(sizeof(*sc), M_DEVBUF, M_WAITOK|M_ZERO|M_CANFAIL);
if (sc == NULL)
return (ENOMEM);
@@ -352,22 +363,38 @@ veb_span(struct veb_softc *sc, struct mbuf *m0)
struct veb_port *p;
struct ifnet *ifp0;
struct mbuf *m;
+   struct veb_span_port *sp;
+   SLIST_HEAD(, veb_span_port) span_list;
 
+   SLIST_INIT(&span_list)
smr_read_enter();
SMR_TAILQ_FOREACH(p, &sc->sc_spans.l_list, p_entry) {
ifp0 = p->p_ifp0;
if (!ISSET(ifp0->if_flags, IFF_RUNNING))
continue;
 
-   m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
-   if (m == NULL) {
-   /* XXX count error */
-   continue;
-   }
+   sp = pool_get(&span_port_pool, PR_NOWAIT);
+   if (sp == NULL)
+   continue;   /* XXX count error */
 
-   if_enqueue(ifp0, m); /* XXX count error */
+   veb_eb_brport_take(p);
+   sp->sp_port = p;
+   SLIST_INSERT_HEAD(&span_list, sp, sp_entry);
}
smr_read_leave();
+
+   while (!SLIST_EMPTY(&span_list)) {
+   sp = SLIST_FIRST(&span_list);
+   SLIST_REMOVE_HEAD(&span_list, sp_entry);
+
+   m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
+   if (m != NULL)
+   if_enqueue(sp->sp_port->p_ifp0, m);
+   /* XXX count error */
+
+   veb_eb_brport_rele(sp->sp_port);
+   pool_put(&span_port_pool, sp);
+   }
 }
 
 static int



Re: uhid spam: uhidev_intr: bad repid 33

2022-05-09 Thread Stuart Henderson
On 2022/05/09 20:39, Mark Kettenis wrote:
> > Date: Mon, 9 May 2022 17:44:29 +0100
> > From: Stuart Henderson 
> > 
> > I have a USB combi keyboard/trackpad thing which is triggering "bad
> > repid 33" frequently while attached (between a couple of times a minute,
> > and once every few minutes). It does work but it's annoying.
> > 
> > Presumably this is because it has non-contiguous report IDs?
> 
> That shouldn't be a problem.
> 
> > Anyone have an idea how to handle it?
> 
> No.  But showing dmesg output might help.

Here's one (the machine I had it connected to previously had been up
for long enough that the live dmesg wasn't any help, and it wasn't
connected early enough for dmesg.boot).

OpenBSD 7.1 (GENERIC.MP) #0: Sun Apr 24 09:30:43 MDT 2022

r...@syspatch-71-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 4169531392 (3976MB)
avail mem = 4025880576 (3839MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xec240 (83 entries)
bios0: vendor Intel Corp. version "WYLPT10H.86A.0054.2019.0902.1752" date 
09/02/2019
bios0: Intel Corporation D34010WYK
acpi0 at bios0: ACPI 5.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC FPDT FIDT SSDT SSDT MCFG HPET SSDT SSDT DMAR CSRT
acpi0: wakeup devices RP01(S4) PXSX(S4) PXSX(S4) PXSX(S4) RP04(S4) PXSX(S4) 
PXSX(S4) PXSX(S4) PXSX(S4) PXSX(S4) GLAN(S4) EHC1(S4) EHC2(S4) XHC_(S4) 
HDEF(S4) PEG0(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i3-4010U CPU @ 1.70GHz, 1696.37 MHz, 06-45-01
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i3-4010U CPU @ 1.70GHz, 1696.08 MHz, 06-45-01
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 1 (application processor)
cpu2: Intel(R) Core(TM) i3-4010U CPU @ 1.70GHz, 1696.08 MHz, 06-45-01
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 1, core 0, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Core(TM) i3-4010U CPU @ 1.70GHz, 1696.08 MHz, 06-45-01
cpu3: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 1, core 1, package 0
ioapic0 at mainbus0: apid 8 pa 0xfec0, version 20, 40 pins
acpimcfg0 at acpi0
acpimcfg0: addr 0xf800, bus 0-63
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 1 (RP01)
acpiprt2 at acpi0: bus 2 (RP04)
acpiprt3 at acpi0: bus -1 (PEG0)
acpiec0 at acpi0: not present
acpipci0 at acpi0 PCI0: 0x0010 0x0011 0x
acpicmos0 at acpi0
"PNP0C14" at acpi0 not configured
acpibtn0 at acpi0: PWRB
"PNP0C0B" at acpi0 not configured
"PNP0C0B" at acpi0 not configured
"PNP0C0B" at acpi0 not configured
"PNP0C0B" at acpi0 not configured
"PNP0C0B" at acpi0 not configured
acpicpu0 at acpi0: C2(500@67 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpicpu1 at acpi0: C2(500@67 mwait.1@0x10), 

Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-09 Thread Alexander Bluhm
On Mon, May 09, 2022 at 06:01:07PM +0300, Barbaros Bilek wrote:
> I have been using veb (veb+vlan+ixl) interfaces, quite stably, since 6.9.
> My system ran as a firewall under OpenBSD 6.9 and 7.0 without problems.
> I also used 7.1 for a limited time and there were no crashes.
> After OpenBSD's NET_TASKQ was raised to 4, it crashed after 5 days.

To me this looks like a bug in veb(4).

> ddb{1}> trace
> db_enter() at db_enter+0x10
> panic(81f22e39) at panic+0xbf
> __assert(81f96c9d,81f85ebc,a3,81fd252f) at 
> __assert+0x25
> assertwaitok() at assertwaitok+0xcc
> mi_switch() at mi_switch+0x40
> sleep_finish(800025574da0,1) at sleep_finish+0x10b
> rw_enter(822cfe50,1) at rw_enter+0x1cb
> pf_test(2,1,8520e000,800025575058) at pf_test+0x1088
> ip_input_if(800025575058,800025575064,4,0,8520e000) at 
> ip_input_if+0xcd
> ipv4_input(8520e000,fd8053616700) at ipv4_input+0x39
> ether_input(8520e000,fd8053616700) at ether_input+0x3ad
> vport_if_enqueue(8520e000,fd8053616700) at vport_if_enqueue+0x19
> veb_port_input(851c3800,fd806064c200,,82066600)
>  at veb_port_input+0x4d2
> ether_input(851c3800,fd806064c200) at ether_input+0x100
> vlan_input(8095a050,fd806064c200,8000255752bc) at 
> vlan_input+0x23d
> ether_input(8095a050,fd806064c200) at ether_input+0x85
> if_input_process(8095a050,800025575358) at if_input_process+0x6f
> ifiq_process(8095a460) at ifiq_process+0x69
> taskq_thread(80035080) at taskq_thread+0x100

veb_port_input -> veb_broadcast -> smr_read_enter; tp->p_enqueue
-> vport_if_enqueue -> if_vinput -> ifp->if_input -> ether_input ->
ipv4_input -> ip_input_if -> pf_test -> PF_LOCK -> rw_enter_write()

After calling smr_read_enter(), sleeping is not allowed according to
the man page.  pf sleeps because it uses a read/write lock.  It looks
like we have some contention on the pf lock.  With more forwarding
threads, sleeping in pf is more likely.
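
As a rough illustration of why the assertion fires, here is a small userspace
model. It is a sketch only: model_smr_depth plays the role of spc_smrdepth,
and model_wait_ok(), model_rw_enter_contended() and the two path functions
are hypothetical names, not kernel interfaces.

```c
#include <assert.h>

static int model_smr_depth;	/* models ci_schedstate.spc_smrdepth */

static void model_read_enter(void) { model_smr_depth++; }
static void model_read_leave(void) { model_smr_depth--; }

/* 1 if sleeping would be permitted here, 0 if the kernel would panic. */
static int
model_wait_ok(void)
{
	return (model_smr_depth == 0);
}

/* Models a contended rw lock: the caller has to be allowed to sleep. */
static int
model_rw_enter_contended(void)
{
	return (model_wait_ok());
}

/* Path from the trace: the lock is taken inside the read section. */
static int
model_input_inside_section(void)
{
	int ok;

	model_read_enter();
	ok = model_rw_enter_contended();
	model_read_leave();
	return (ok);
}

/* Same lock taken with no read section active. */
static int
model_input_outside_section(void)
{
	return (model_rw_enter_contended());
}
```

In this model the first path reports that sleeping is not allowed, which
corresponds to the failed assertion, while the second path is fine. The
check only matters when the lock is actually contended, which fits the
observation that more forwarding threads make the panic more likely.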

> __mp_lock(823d986c) at __mp_lock+0x72
> wakeup_n(822cfe50,) at wakeup_n+0x32
> pf_test(2,2,80948050,80002557b300) at pf_test+0x11f6
> pf_route(80002557b388,fd89fb379938) at pf_route+0x1f6
> pf_test(2,1,80924050,80002557b598) at pf_test+0xa1f
> ip_input_if(80002557b598,80002557b5a4,4,0,80924050) at 
> ip_input_if+0xcd
> ipv4_input(80924050,fd8053540f00) at ipv4_input+0x39
> ether_input(80924050,fd8053540f00) at ether_input+0x3ad
> if_input_process(80924050,80002557b688) at if_input_process+0x6f
> ifiq_process(80926500) at ifiq_process+0x69
> taskq_thread(80035100) at taskq_thread+0x100

> __mp_lock(823d986c) at __mp_lock+0x72
> wakeup_n(822cfe50,) at wakeup_n+0x32
> pf_test(2,2,80948050,80002557b300) at pf_test+0x11f6
> pf_route(80002557b388,fd89fb379938) at pf_route+0x1f6
> pf_test(2,1,80924050,80002557b598) at pf_test+0xa1f
> ip_input_if(80002557b598,80002557b5a4,4,0,80924050) at 
> ip_input_if+0xcd
> ipv4_input(80924050,fd8053540f00) at ipv4_input+0x39
> ether_input(80924050,fd8053540f00) at ether_input+0x3ad
> if_input_process(80924050,80002557b688) at if_input_process+0x6f
> ifiq_process(80926500) at ifiq_process+0x69
> taskq_thread(80035100) at taskq_thread+0x100

Can some veb or smr hacker explain how this is supposed to work?

Sleeping in pf is also not ideal as it is in the hot path and slows
down packets.  But that is not easy to fix as we have to refactor
the memory allocations before converting pf lock to a mutex.  sashan@
is working on that.

bluhm



Re: uhid spam: uhidev_intr: bad repid 33

2022-05-09 Thread Mark Kettenis
> Date: Mon, 9 May 2022 17:44:29 +0100
> From: Stuart Henderson 
> 
> I have a USB combi keyboard/trackpad thing which is triggering "bad
> repid 33" frequently while attached (between a couple of times a minute,
> and once every few minutes). It does work but it's annoying.
> 
> Presumably this is because it has non-contiguous report IDs?

That shouldn't be a problem.

> Anyone have an idea how to handle it?

No.  But showing dmesg output might help.

> Bus 000 Device 002: ID 045e:0800 Microsoft Corp. 
> Device Descriptor:
>   bLength18
>   bDescriptorType 1
>   bcdUSB   2.00
>   bDeviceClass0 (Defined at Interface level)
>   bDeviceSubClass 0 
>   bDeviceProtocol 0 
>   bMaxPacketSize064
>   idVendor   0x045e Microsoft Corp.
>   idProduct  0x0800 
>   bcdDevice9.44
>   iManufacturer   1 Microsoft
>   iProduct2 Microsoft? Nano Transceiver v2.0
>   iSerial 0 
>   bNumConfigurations  1
>   Configuration Descriptor:
> bLength 9
> bDescriptorType 2
> wTotalLength   84
> bNumInterfaces  3
> bConfigurationValue 1
> iConfiguration  0 
> bmAttributes 0xa0
>   (Bus Powered)
>   Remote Wakeup
> MaxPower  100mA
> Interface Descriptor:
>   bLength 9
>   bDescriptorType 4
>   bInterfaceNumber0
>   bAlternateSetting   0
>   bNumEndpoints   1
>   bInterfaceClass 3 Human Interface Device
>   bInterfaceSubClass  1 Boot Interface Subclass
>   bInterfaceProtocol  1 Keyboard
>   iInterface  0 
> HID Device Descriptor:
>   bLength 9
>   bDescriptorType33
>   bcdHID   1.11
>   bCountryCode0 Not supported
>   bNumDescriptors 1
>   bDescriptorType34 Report
>   wDescriptorLength  57
>   Report Descriptor: (length is 57)
> Item(Global): Usage Page, data= [ 0x01 ] 1
> Generic Desktop Controls
> Item(Local ): Usage, data= [ 0x06 ] 6
> Keyboard
> Item(Main  ): Collection, data= [ 0x01 ] 1
> Application
> Item(Global): Usage Page, data= [ 0x08 ] 8
> LEDs
> Item(Local ): Usage Minimum, data= [ 0x01 ] 1
> NumLock
> Item(Local ): Usage Maximum, data= [ 0x03 ] 3
> Scroll Lock
> Item(Global): Logical Minimum, data= [ 0x00 ] 0
> Item(Global): Logical Maximum, data= [ 0x01 ] 1
> Item(Global): Report Size, data= [ 0x01 ] 1
> Item(Global): Report Count, data= [ 0x03 ] 3
> Item(Main  ): Output, data= [ 0x02 ] 2
> Data Variable Absolute No_Wrap Linear
> Preferred_State No_Null_Position Non_Volatile 
> Bitfield
> Item(Global): Report Count, data= [ 0x05 ] 5
> Item(Main  ): Output, data= [ 0x01 ] 1
> Constant Array Absolute No_Wrap Linear
> Preferred_State No_Null_Position Non_Volatile 
> Bitfield
> Item(Global): Usage Page, data= [ 0x07 ] 7
> Keyboard
> Item(Local ): Usage Minimum, data= [ 0xe0 0x00 ] 224
> Control Left
> Item(Local ): Usage Maximum, data= [ 0xe7 0x00 ] 231
> GUI Right
> Item(Global): Report Count, data= [ 0x08 ] 8
> Item(Main  ): Input, data= [ 0x02 ] 2
> Data Variable Absolute No_Wrap Linear
> Preferred_State No_Null_Position Non_Volatile 
> Bitfield
> Item(Global): Report Size, data= [ 0x08 ] 8
> Item(Global): Report Count, data= [ 0x01 ] 1
> Item(Main  ): Input, data= [ 0x01 ] 1
> Constant Array Absolute No_Wrap Linear
> Preferred_State No_Null_Position Non_Volatile 
> Bitfield
> Item(Local ): Usage Minimum, data= [ 0x00 ] 0
> No Event
> Item(Local ): Usage Maximum, data= [ 0x91 0x00 ] 145
> LANG 2 (Hanja Conversion, Korea)
> Item(Global): Logical Maximum, data= [ 0xff 0x00 ] 255
> Item(Global): Report Count, data= [ 0x06 ] 6
> Item(Main  ): Input, data= [ 0x00 ] 0
> Data Array Absolute No_Wrap Linear
> Preferred_State No_Null_Position Non_Volatile 
> Bitfield
> Item(Main  ): End Collection, data=none
>   Endpoint Descriptor:
>  

Re: 'less -F' broken?

2022-05-09 Thread Lyndon Nerenberg (VE7TFX/VE6BBM)
Oh!  I completely missed -X.  Adding that fixed it.  I'm running
with LESS=-aicFX and all is well under TERM=xterm now.  Thanks!

--lyndon



Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-09 Thread Alexandr Nedvedicky
Hello Barbaros,

thank you for testing and for the excellent report.



> ddb{1}> trace
> db_enter() at db_enter+0x10
> panic(81f22e39) at panic+0xbf
> __assert(81f96c9d,81f85ebc,a3,81fd252f) at 
> __assert+0x25
> assertwaitok() at assertwaitok+0xcc
> mi_switch() at mi_switch+0x40

The assert indicates that we attempted to sleep inside an SMR section,
which must be avoided.

> sleep_finish(800025574da0,1) at sleep_finish+0x10b
> rw_enter(822cfe50,1) at rw_enter+0x1cb
> pf_test(2,1,8520e000,800025575058) at pf_test+0x1088
> ip_input_if(800025575058,800025575064,4,0,8520e000) at 
> ip_input_if+0xcd
> ipv4_input(8520e000,fd8053616700) at ipv4_input+0x39
> ether_input(8520e000,fd8053616700) at ether_input+0x3ad
> vport_if_enqueue(8520e000,fd8053616700) at vport_if_enqueue+0x19
> veb_port_input(851c3800,fd806064c200,,82066600)
>  at veb_port_input+0x4d2
> ether_input(851c3800,fd806064c200) at ether_input+0x100
> vlan_input(8095a050,fd806064c200,8000255752bc) at 
> vlan_input+0x23d
> ether_input(8095a050,fd806064c200) at ether_input+0x85
> if_input_process(8095a050,800025575358) at if_input_process+0x6f
> ifiq_process(8095a460) at ifiq_process+0x69
> taskq_thread(80035080) at taskq_thread+0x100

The call stack above shows the offending path (sleeping inside an SMR section).

In my opinion the primary suspect is veb_port_input(), whose code reads as
follows:

 966 static struct mbuf *
 967 veb_port_input(struct ifnet *ifp0, struct mbuf *m, uint64_t dst, void *brport)
 968 {
 969 struct veb_port *p = brport;
 970 struct veb_softc *sc = p->p_veb;
 971 struct ifnet *ifp = &sc->sc_if;
 972 struct ether_header *eh;
 ...
1021 counters_pkt(ifp->if_counters, ifc_ipackets, ifc_ibytes,
1022 m->m_pkthdr.len);
1023 
1024 /* force packets into the one routing domain for pf */
1025 m->m_pkthdr.ph_rtableid = ifp->if_rdomain;
1026 
1027 #if NBPFILTER > 0
1028 if_bpf = READ_ONCE(ifp->if_bpf);
1029 if (if_bpf != NULL) {
1030 if (bpf_mtap_ether(if_bpf, m, 0) != 0)
1031 goto drop;
1032 }
1033 #endif
1034 
1035 veb_span(sc, m);
1036 
1037 if (ISSET(p->p_bif_flags, IFBIF_BLOCKNONIP) &&
1038 veb_ip_filter(m))
1039 goto drop;
1040 
1041 if (!ISSET(ifp->if_flags, IFF_LINK0) &&
1042 veb_vlan_filter(m))
1043 goto drop;
1044 
1045 if (veb_rule_filter(p, VEB_RULE_LIST_IN, m, src, dst))
1046 goto drop;

The call to veb_span() at line 1035 seems to be the culprit (in my opinion):

 356 smr_read_enter();
 357 SMR_TAILQ_FOREACH(p, &sc->sc_spans.l_list, p_entry) {
 358 ifp0 = p->p_ifp0;
 359 if (!ISSET(ifp0->if_flags, IFF_RUNNING))
 360 continue;
 361 
 362 m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
 363 if (m == NULL) {
 364 /* XXX count error */
 365 continue;
 366 }
 367 
 368 if_enqueue(ifp0, m); /* XXX count error */
 369 }
 370 smr_read_leave();

The loop above comes from veb_span(), which calls if_enqueue() from within
an SMR section. Line 368 ends up here:

2191 static int
2192 vport_if_enqueue(struct ifnet *ifp, struct mbuf *m)
2193 {
2194 /*
2195  * switching an l2 packet toward a vport means pushing it
2196  * into the network stack. this function exists to make
2197  * if_vinput compat with veb calling if_enqueue.
2198  */
2199 
2200 if_vinput(ifp, m);
2201
2202 return (0);
2203 }  

which in turn calls if_vinput(), which calls further down into the IP stack,
and the IP stack may sleep. We must change veb_span() so that calls to
if_vinput() happen outside of the SMR section.

I don't have a setup complex enough to use vlans and virtual ports. I'll try
to cook up a diff and pass it to you for testing.

Thanks again for coming back to us with the report.

regards
sashan




Re: 'less -F' broken?

2022-05-09 Thread Hashim Mahmoud
On Mon, 09 May 2022 00:37:58 -0700
"Lyndon Nerenberg (VE7TFX/VE6BBM)"  wrote:

> It seems that the -F flag to less is broken. Instead of reverting to
> cat-like behaviour on short files, it prints nothing at all.
> 
>   : lyndon@orthanc:/home/lyndon; cat typescript
> Script started on Mon May  9 00:22:41 2022
>   : lyndon@orthanc:/home/lyndon; seq 10 > numbers
>   : lyndon@orthanc:/home/lyndon; cat numbers
>   1
>   2
>   3
>   4
>   5
>   6
>   7
>   8
>   9
>   10
>   : lyndon@orthanc:/home/lyndon; less -F numbers
>   : lyndon@orthanc:/home/lyndon; ^D
>   
>   Script done on Mon May  9 00:22:58 2022
> 
> OpenBSD orthanc.ca 7.1 GENERIC.MP#465 amd64
> 
> This is with TERM=xterm. If I set TERM=vt100, it works.  This seems
> to relate to some long-standing oddities I've noticed with the
> 'xterm' terminfo definition OpenBSD uses.  In particular, programs
> like man always switch to the alternate screen buffer when displaying
> output, then switch back afterwards.  I find this very annoying so
> I always compile my own terminfo definition for 'xterm', which works
> everyplace else but not on OpenBSD.
> 
> I've tried poking into this a few times, but I just don't have the
> energy to dive into the guts of curses.  Has anyone else run into
> this, and maybe have some suggestions on where to start digging?
> 
> Here's the terminfo definition I use, FWIW:
> 
> # An xterm without the internal screen memory buffer.  This variant
> # does not save/restore the screen when running termcap based applications.
> # This means the man page you were reading doesn't disappear from the screen
> # when you quit the pager.
> xterm|vs100|xterm terminal emulator,
>   am, xenl, km, mir, msgr,
>   cols#80, it#8, lines#65,
>   acsc=``aaffggjjkkllmmnnooppqqrrssttuuvvwwxxyyzz{{||}}~~,
>   bel=^G, cr=^M, csr=\E[%i%p1%d;%p2%dr, tbc=\E[3g,
>   clear=\E[H\E[2J, el1=\E[1K$<3>, el=\E[K, ed=\E[J,
>   cup=\E[%i%p1%d;%p2%dH, cud1=^J, home=\E[H, cub1=^H,
>   cuf1=\E[C, cuu1=\E[A, dch1=\E[P, dl1=\E[M, enacs=\E(B\E)0,
>   smacs=^N, blink=\E[5m, bold=\E[1m, rev=\E[7m, smso=\E[7m,
>   smul=\E[4m, rmacs=^O, sgr0=\E[m, rmso=\E[m, rmul=\E[m,
>   ich1=\E[@, il1=\E[L, ka1=\EOq, ka3=\EOs, kb2=\EOr, kbs=^H,
>   kc1=\EOp, kc3=\EOn, kcud1=\EOB, kent=\EOM, kf0=\E[21~,
>   kf1=\E[11~, kf10=\EOx, kf2=\E[12~, kf3=\E[13~, kf4=\E[14~,
>   kf5=\E[15~, kf6=\E[17~, kf7=\E[18~, kf8=\E[19~, kf9=\E[20~,
>   kcub1=\EOD, kcuf1=\EOC, kcuu1=\EOA, rmkx=\E[?1l\E>,
>   smkx=\E[?1h\E=, dch=\E[%p1%dP, dl=\E[%p1%dM,
>   cud=\E[%p1%dB, ich=\E[%p1%d@, il=\E[%p1%dL,
>   cub=\E[%p1%dD, cuf=\E[%p1%dC, cuu=\E[%p1%dA,
>   rs1=\E>\E[1;3;4;5;6l\E[?7h\E[m\E[r\E[2J\E[H, rs2=@,  
>   rc=\E8, sc=\E7, ind=^J, ri=\EM,
>   
> sgr=\E[0%?%p1%p6%|%t;1%;%?%p2%t;4%;%?%p1%p3%|%t;7%;%?%p4%t;5%;m%?%p9%t\016%e\017%;,
>   hts=\EH, ht=^I,
> 
> 
> --lyndon
> 

Just replying to confirm this bug appears for me too.
Using tmux under xterm makes $TERM equal to "screen", and running `less
-F` on a file that fits on less than one screen just makes xterm
flicker, with no output. But if I do `less -F foo > bar`, then bar
has the contents of foo, so it is presumably still writing to stdout.

If I manually set $TERM to `xterm` I get no output or flickering, but
`less -F foo > bar` exhibits the same behaviour as above.

TERM=vt100 works for me, likewise.

This is on a machine recently updated to OpenBSD 7.1, if it matters.



uhid spam: uhidev_intr: bad repid 33

2022-05-09 Thread Stuart Henderson
I have a USB combi keyboard/trackpad thing which is triggering "bad
repid 33" frequently while attached (between a couple of times a minute,
and once every few minutes). It does work but it's annoying.

Presumably this is because it has non-contiguous report IDs?
Anyone have an idea how to handle it?

Bus 000 Device 002: ID 045e:0800 Microsoft Corp. 
Device Descriptor:
  bLength18
  bDescriptorType 1
  bcdUSB   2.00
  bDeviceClass0 (Defined at Interface level)
  bDeviceSubClass 0 
  bDeviceProtocol 0 
  bMaxPacketSize064
  idVendor   0x045e Microsoft Corp.
  idProduct  0x0800 
  bcdDevice9.44
  iManufacturer   1 Microsoft
  iProduct2 Microsoft? Nano Transceiver v2.0
  iSerial 0 
  bNumConfigurations  1
  Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength   84
bNumInterfaces  3
bConfigurationValue 1
iConfiguration  0 
bmAttributes 0xa0
  (Bus Powered)
  Remote Wakeup
MaxPower  100mA
Interface Descriptor:
  bLength 9
  bDescriptorType 4
  bInterfaceNumber0
  bAlternateSetting   0
  bNumEndpoints   1
  bInterfaceClass 3 Human Interface Device
  bInterfaceSubClass  1 Boot Interface Subclass
  bInterfaceProtocol  1 Keyboard
  iInterface  0 
HID Device Descriptor:
  bLength 9
  bDescriptorType33
  bcdHID   1.11
  bCountryCode0 Not supported
  bNumDescriptors 1
  bDescriptorType34 Report
  wDescriptorLength  57
  Report Descriptor: (length is 57)
Item(Global): Usage Page, data= [ 0x01 ] 1
Generic Desktop Controls
Item(Local ): Usage, data= [ 0x06 ] 6
Keyboard
Item(Main  ): Collection, data= [ 0x01 ] 1
Application
Item(Global): Usage Page, data= [ 0x08 ] 8
LEDs
Item(Local ): Usage Minimum, data= [ 0x01 ] 1
NumLock
Item(Local ): Usage Maximum, data= [ 0x03 ] 3
Scroll Lock
Item(Global): Logical Minimum, data= [ 0x00 ] 0
Item(Global): Logical Maximum, data= [ 0x01 ] 1
Item(Global): Report Size, data= [ 0x01 ] 1
Item(Global): Report Count, data= [ 0x03 ] 3
Item(Main  ): Output, data= [ 0x02 ] 2
Data Variable Absolute No_Wrap Linear
Preferred_State No_Null_Position Non_Volatile 
Bitfield
Item(Global): Report Count, data= [ 0x05 ] 5
Item(Main  ): Output, data= [ 0x01 ] 1
Constant Array Absolute No_Wrap Linear
Preferred_State No_Null_Position Non_Volatile 
Bitfield
Item(Global): Usage Page, data= [ 0x07 ] 7
Keyboard
Item(Local ): Usage Minimum, data= [ 0xe0 0x00 ] 224
Control Left
Item(Local ): Usage Maximum, data= [ 0xe7 0x00 ] 231
GUI Right
Item(Global): Report Count, data= [ 0x08 ] 8
Item(Main  ): Input, data= [ 0x02 ] 2
Data Variable Absolute No_Wrap Linear
Preferred_State No_Null_Position Non_Volatile 
Bitfield
Item(Global): Report Size, data= [ 0x08 ] 8
Item(Global): Report Count, data= [ 0x01 ] 1
Item(Main  ): Input, data= [ 0x01 ] 1
Constant Array Absolute No_Wrap Linear
Preferred_State No_Null_Position Non_Volatile 
Bitfield
Item(Local ): Usage Minimum, data= [ 0x00 ] 0
No Event
Item(Local ): Usage Maximum, data= [ 0x91 0x00 ] 145
LANG 2 (Hanja Conversion, Korea)
Item(Global): Logical Maximum, data= [ 0xff 0x00 ] 255
Item(Global): Report Count, data= [ 0x06 ] 6
Item(Main  ): Input, data= [ 0x00 ] 0
Data Array Absolute No_Wrap Linear
Preferred_State No_Null_Position Non_Volatile 
Bitfield
Item(Main  ): End Collection, data=none
  Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81  EP 1 IN
bmAttributes3
  Transfer TypeInterrupt
  Synch Type   None
  Usage Type   Data
wMaxPacketSize 0x0008  1x 8 bytes
bInterval   4

7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-09 Thread Barbaros Bilek
Hello,

I have been using veb (veb+vlan+ixl) interfaces, quite stably, since 6.9.
My system ran as a firewall under OpenBSD 6.9 and 7.0 without problems.
I also used 7.1 for a limited time and there were no crashes.
After OpenBSD's NET_TASKQ was raised to 4, it crashed after 5 days.
Here are the crash report and dmesg:

ether_input(8520e000,fd8053616700) at ether_input+0x3ad

vport_if_enqueue(8520e000,fd8053616700) at vport_if_enqueue+0x19

veb_port_input(851c3800,fd806064c200,,82066600)

 at veb_port_input+0x4d2

ether_input(851c3800,fd806064c200) at ether_input+0x100

end trace frame: 0x800025575290, count: 0

https://www.openbsd.org/ddb.html describes the minimum info required in bug

reports.  Insufficient info makes it difficult to find and fix bugs.

ddb{1}> show panic

*cpu1: kernel diagnostic assertion "curcpu()->ci_schedstate.spc_smrdepth == 0" failed: file "/usr/src/sys/kern/subr_xxx.c", line 163

ddb{1}> trace

db_enter() at db_enter+0x10

panic(81f22e39) at panic+0xbf

__assert(81f96c9d,81f85ebc,a3,81fd252f) at __assert+0x25

assertwaitok() at assertwaitok+0xcc

mi_switch() at mi_switch+0x40

sleep_finish(800025574da0,1) at sleep_finish+0x10b

rw_enter(822cfe50,1) at rw_enter+0x1cb

pf_test(2,1,8520e000,800025575058) at pf_test+0x1088

ip_input_if(800025575058,800025575064,4,0,8520e000) at ip_input_if+0xcd

ipv4_input(8520e000,fd8053616700) at ipv4_input+0x39

ether_input(8520e000,fd8053616700) at ether_input+0x3ad

vport_if_enqueue(8520e000,fd8053616700) at vport_if_enqueue+0x19

veb_port_input(851c3800,fd806064c200,,82066600)

 at veb_port_input+0x4d2

ether_input(851c3800,fd806064c200) at ether_input+0x100

vlan_input(8095a050,fd806064c200,8000255752bc) at vlan_input+0x23d

ether_input(8095a050,fd806064c200) at ether_input+0x85

if_input_process(8095a050,800025575358) at if_input_process+0x6f

ifiq_process(8095a460) at ifiq_process+0x69

taskq_thread(80035080) at taskq_thread+0x100

end trace frame: 0x0, count: -19

ddb{1}> ps /o

TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND

 422021  80579  0 0x2  07  ifconfig

 292011   89065020x12  0x4008  mariadbd

 427181   89065020x12  0x4006K mariadbd

  86788   89065020x12  0x4003  mariadbd

 302453  98158  0 0x14000  0x2009  softnet

  88346  66890  0 0x14000  0x2005  softnet

ddb{1}> machine ddbcpu 2

Stopped at  x86_ipi_db+0x12:leave

x86_ipi_db(80001d1c3ff0) at x86_ipi_db+0x12

x86_ipi_handler() at x86_ipi_handler+0x80

Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23

acpicpu_idle() at acpicpu_idle+0x203

sched_idle(80001d1c3ff0) at sched_idle+0x280

end trace frame: 0x0, count: 10

ddb{2}> trace

x86_ipi_db(80001d1c3ff0) at x86_ipi_db+0x12

x86_ipi_handler() at x86_ipi_handler+0x80

Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23

acpicpu_idle() at acpicpu_idle+0x203

sched_idle(80001d1c3ff0) at sched_idle+0x280

end trace frame: 0x0, count: -5

ddb{2}> machine ddbcpu 2

Invalid cpu 2

ddb{2}> t[A[A

Bad character

x86_ipi_db(80001d1c3ff0) at x86_ipi_db+0x12

x86_ipi_handler() at x86_ipi_handler+0x80

Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23

acpicpu_idle() at acpicpu_idle+0x203

sched_idle(80001d1c3ff0) at sched_idle+0x280

end trace frame: 0x0, count: -5

ddb{2}> machine ddbcpu 3

Stopped at  x86_ipi_db+0x12:leave

x86_ipi_db(80001d1ccff0) at x86_ipi_db+0x12

x86_ipi_handler() at x86_ipi_handler+0x80

Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23

__mp_lock(823d986c) at __mp_lock+0x72

wakeup_n(8000fffeca88,1) at wakeup_n+0x32

futex_requeue(c13fb4a32e0,1,0,0,2) at futex_requeue+0xe4

sys_futex(8000fffc2008,8000265ca780,8000265ca7d0) at sys_futex+0xe6


syscall(8000265ca840) at syscall+0x374

Xsyscall() at Xsyscall+0x128

end of kernel

end trace frame: 0xc13f5b4b090, count: 6

ddb{3}> trace

x86_ipi_db(80001d1ccff0) at x86_ipi_db+0x12

x86_ipi_handler() at x86_ipi_handler+0x80

Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23

__mp_lock(823d986c) at __mp_lock+0x72

wakeup_n(8000fffeca88,1) at wakeup_n+0x32

futex_requeue(c13fb4a32e0,1,0,0,2) at futex_requeue+0xe4

sys_futex(8000fffc2008,8000265ca780,8000265ca7d0) at sys_futex+0xe6


syscall(8000265ca840) at syscall+0x374

Xsyscall() at Xsyscall+0x128

end of kernel

end trace frame: 0xc13f5b4b090, count: -9

ddb{3}> machine ddbcpu 4

Stopped at  x86_ipi_db+0x12:leave

x86_ipi_db(80001d1d5ff0) at x86_ipi_db+0x12

x86_ipi_handler() at x86_ipi_handler+0x80

Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23

acpicpu_idle() at acpicpu_idle+0x203

sched_idle(80001d1d5ff0) at 

Re: 'less -F' broken?

2022-05-09 Thread Stuart Henderson
On 2022/05/09 00:37, Lyndon Nerenberg (VE7TFX/VE6BBM) wrote:
> It seems that the -F flag to less is broken. Instead of reverting to
> cat-like behaviour on short files, it prints nothing at all.

You'll probably be happier with at least LESS=Xcq in the environment.
(I use MXciq).

> This is with TERM=xterm. If I set TERM=vt100, it works.  This seems
> to relate to some long-standing oddities I've noticed with the
> 'xterm' terminfo definition OpenBSD uses.  In particular, programs
> like man always switch to the alternate screen buffer when displaying
> output, then switch back afterwards.  I find this very annoying so

Me too.



'less -F' broken?

2022-05-09 Thread Lyndon Nerenberg (VE7TFX/VE6BBM)
It seems that the -F flag to less is broken. Instead of reverting to
cat-like behaviour on short files, it prints nothing at all.

  : lyndon@orthanc:/home/lyndon; cat typescript 
 
  Script started on Mon May  9 00:22:41 2022
  : lyndon@orthanc:/home/lyndon; seq 10 > numbers
  : lyndon@orthanc:/home/lyndon; cat numbers
  1
  2
  3
  4
  5
  6
  7
  8
  9
  10
  : lyndon@orthanc:/home/lyndon; less -F numbers
  : lyndon@orthanc:/home/lyndon; ^D
  
  Script done on Mon May  9 00:22:58 2022

OpenBSD orthanc.ca 7.1 GENERIC.MP#465 amd64

This is with TERM=xterm. If I set TERM=vt100, it works.  This seems
to relate to some long-standing oddities I've noticed with the
'xterm' terminfo definition OpenBSD uses.  In particular, programs
like man always switch to the alternate screen buffer when displaying
output, then switch back afterwards.  I find this very annoying so
I always compile my own terminfo definition for 'xterm', which works
everyplace else but not on OpenBSD.

I've tried poking into this a few times, but I just don't have the
energy to dive into the guts of curses.  Has anyone else run into
this, and maybe have some suggestions on where to start digging?

Here's the terminfo definition I use, FWIW:

# An xterm without the internal screen memory buffer.  This variant
# does not save/restore the screen when running termcap based applications.
# This means the man page you were reading doesn't disappear from the screen
# when you quit the pager.
xterm|vs100|xterm terminal emulator,
am, xenl, km, mir, msgr,
cols#80, it#8, lines#65,
acsc=``aaffggjjkkllmmnnooppqqrrssttuuvvwwxxyyzz{{||}}~~,
bel=^G, cr=^M, csr=\E[%i%p1%d;%p2%dr, tbc=\E[3g,
clear=\E[H\E[2J, el1=\E[1K$<3>, el=\E[K, ed=\E[J,
cup=\E[%i%p1%d;%p2%dH, cud1=^J, home=\E[H, cub1=^H,
cuf1=\E[C, cuu1=\E[A, dch1=\E[P, dl1=\E[M, enacs=\E(B\E)0,
smacs=^N, blink=\E[5m, bold=\E[1m, rev=\E[7m, smso=\E[7m,
smul=\E[4m, rmacs=^O, sgr0=\E[m, rmso=\E[m, rmul=\E[m,
ich1=\E[@, il1=\E[L, ka1=\EOq, ka3=\EOs, kb2=\EOr, kbs=^H,
kc1=\EOp, kc3=\EOn, kcud1=\EOB, kent=\EOM, kf0=\E[21~,
kf1=\E[11~, kf10=\EOx, kf2=\E[12~, kf3=\E[13~, kf4=\E[14~,
kf5=\E[15~, kf6=\E[17~, kf7=\E[18~, kf8=\E[19~, kf9=\E[20~,
kcub1=\EOD, kcuf1=\EOC, kcuu1=\EOA, rmkx=\E[?1l\E>,
smkx=\E[?1h\E=, dch=\E[%p1%dP, dl=\E[%p1%dM,
cud=\E[%p1%dB, ich=\E[%p1%d@, il=\E[%p1%dL,
cub=\E[%p1%dD, cuf=\E[%p1%dC, cuu=\E[%p1%dA,
rs1=\E>\E[1;3;4;5;6l\E[?7h\E[m\E[r\E[2J\E[H, rs2=@,
rc=\E8, sc=\E7, ind=^J, ri=\EM,

sgr=\E[0%?%p1%p6%|%t;1%;%?%p2%t;4%;%?%p1%p3%|%t;7%;%?%p4%t;5%;m%?%p9%t\016%e\017%;,
hts=\EH, ht=^I,


--lyndon