Re: alc0 DMA write error
On Fri, May 13, 2022 at 11:50:56AM -0400, Scott C. MacCallum wrote: > > During the installation of OpenBSD 7.1 amd64 release, after the network > interface was auto configured for DHCP, a repeated error message was > displayed to the screen at random times, "DMA write error! --resetting." > This prevented the network installation of the operating system on this > interface, so I worked around the problem using an USB network interface. > After I booted into the newly installed system, the error continued to be > displayed on the screen at random. Syspatch did not fix the issue. > > I installed OpenBSD 7.1-current (GENERIC.MP) #516 and the error continues to > be displayed on the screen at random. It seems that AR816x/AR817x also needs to force maximum data payload size to 128 bytes. Does the diff below help? Thanks. Index: sys/dev/pci/if_alc.c === RCS file: /cvs/src/sys/dev/pci/if_alc.c,v retrieving revision 1.55 diff -u -p -u -p -r1.55 if_alc.c --- sys/dev/pci/if_alc.c11 Mar 2022 18:00:45 - 1.55 +++ sys/dev/pci/if_alc.c13 May 2022 17:45:42 - @@ -1354,10 +1354,11 @@ alc_attach(struct device *parent, struct sc->alc_dma_wr_burst = 3; /* * Force maximum payload size to 128 bytes for -* E2200/E2400/E2500. +* E2200/E2400/E2500/AR8162/AR8171/AR8172. * Otherwise it triggers DMA write error. */ - if ((sc->alc_flags & ALC_FLAG_E2X00) != 0) + if ((sc->alc_flags & + (ALC_FLAG_E2X00 | ALC_FLAG_AR816X_FAMILY)) != 0) sc->alc_dma_wr_burst = 0; alc_init_pcie(sc, base); }
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On Fri, May 13, 2022 at 05:53:27PM +0200, Alexandr Nedvedicky wrote: > at this point we hold a NET_LOCK(). So basically if there won't > be enough memory we might start sleeping waiting for memory > while we will be holding a NET_LOCK. > > This is something we should try to avoid, however this can be > sorted out later. At this point I just want to point out > this problem, which can be certainly solved in follow up > commit. pf(4) also has its homework to be done around > sleeping mallocs. I think sleeping with netlock is not a problem in general. In pf(4) ioctl we sleep with netlock and pflock while doing copyin() or copyout(). This results in a lock order reversal due to a hack in uvn_io(). In my opinion we should not sleep within pf lock, so we can convert it to mutex or someting better later. In veb configuration we are holding the netlock and sleep in smr_barrier() and refcnt_finalize(). An additional sleep in malloc() is fine here. Holding the netlock and sleeping in m_get() is worse. There is no recovery after reaching the mbuf limit. Sleeping rules are inconsistent and depend on the area of the stack. Different people have multiple ideas how it should be done. bluhm
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On Fri, May 13, 2022 at 12:19:46PM +1000, David Gwynne wrote: > sorry i'm late to the party. can you try this diff? Thanks for having a look. I added veb(4) to my setup. With this diff, I cannot trigger a crash anymore. OK bluhm@ > this diff replaces the list of ports with an array/map of ports. > the map takes references to all the ports, so the forwarding paths > just have to hold a reference to the map to be able to use all the > ports. the forwarding path uses smr to get hold of a map, takes a map > ref, and then leaves the smr crit section before iterating over the map > and pushing packets. > > this means we should only take and release a single refcnt when > we're pushing packets out any number of ports. > > if no span ports are configured, then there's no span port map and > we don't try and take a ref, we can just return early. > > we also only take and release a single refcnt when we forward the > actual packet. forwarding to a single port provided by an etherbridge > lookup already takes/releases the single port ref. if it falls > through that for unknown unicast or broadcast/multicast, then it's > a single refcnt for the current map of all ports. > > Index: if_veb.c > === > RCS file: /cvs/src/sys/net/if_veb.c,v > retrieving revision 1.25 > diff -u -p -r1.25 if_veb.c > --- if_veb.c 4 Jan 2022 06:32:39 - 1.25 > +++ if_veb.c 13 May 2022 02:01:43 - > @@ -139,13 +139,13 @@ struct veb_port { > struct veb_rule_list p_vr_list[2]; > #define VEB_RULE_LIST_OUT0 > #define VEB_RULE_LIST_IN 1 > - > - SMR_TAILQ_ENTRY(veb_port)p_entry; > }; > > struct veb_ports { > - SMR_TAILQ_HEAD(, veb_port) l_list; > - unsigned int l_count; > + struct refcntm_refs; > + unsigned int m_count; > + > + /* followed by an array of veb_port pointers */ > }; > > struct veb_softc { > @@ -155,8 +155,8 @@ struct veb_softc { > struct etherbridge sc_eb; > > struct rwlocksc_rule_lock; > - struct veb_ports sc_ports; > - struct veb_ports sc_spans; > + struct veb_ports*sc_ports; > + struct veb_ports*sc_spans; > }; > > #define DPRINTF(_sc, fmt...)do { \ > @@ -184,8 +184,25 @@ static int veb_p_ioctl(struct ifnet *, u > static int veb_p_output(struct ifnet *, struct mbuf *, > struct sockaddr *, struct rtentry *); > > -static void veb_p_dtor(struct veb_softc *, struct veb_port *, > - const char *); > +static inline size_t > +veb_ports_size(unsigned int n) > +{ > + /* use of _ALIGN is inspired by CMSGs */ > + return _ALIGN(sizeof(struct veb_ports)) + > + n * sizeof(struct veb_port *); > +} > + > +static inline struct veb_port ** > +veb_ports_array(struct veb_ports *m) > +{ > + return (struct veb_port **)((caddr_t)m + _ALIGN(sizeof(*m))); > +} > + > +static void veb_ports_free(struct veb_ports *); > + > +static void veb_p_unlink(struct veb_softc *, struct veb_port *); > +static void veb_p_fini(struct veb_port *); > +static void veb_p_dtor(struct veb_softc *, struct veb_port *); > static int veb_add_port(struct veb_softc *, > const struct ifbreq *, unsigned int); > static int veb_del_port(struct veb_softc *, > @@ -271,8 +288,8 @@ veb_clone_create(struct if_clone *ifc, i > return (ENOMEM); > > rw_init(>sc_rule_lock, "vebrlk"); > - SMR_TAILQ_INIT(>sc_ports.l_list); > - SMR_TAILQ_INIT(>sc_spans.l_list); > + sc->sc_ports = NULL; > + sc->sc_spans = NULL; > > ifp = >sc_if; > > @@ -314,7 +331,10 @@ static int > veb_clone_destroy(struct ifnet *ifp) > { > struct veb_softc *sc = ifp->if_softc; > - struct veb_port *p, *np; > + struct veb_ports *mp, *ms; > + struct veb_port **ps; > + struct veb_port *p; > + unsigned int i; > > NET_LOCK(); > sc->sc_dead = 1; > @@ -326,10 +346,60 @@ veb_clone_destroy(struct ifnet *ifp) > if_detach(ifp); > > NET_LOCK(); > - SMR_TAILQ_FOREACH_SAFE_LOCKED(p, >sc_ports.l_list, p_entry, np) > - veb_p_dtor(sc, p, "destroy"); > - SMR_TAILQ_FOREACH_SAFE_LOCKED(p, >sc_spans.l_list, p_entry, np) > - veb_p_dtor(sc, p, "destroy"); > + > + /* > + * this is an upside down version of veb_p_dtor() and > + * veb_ports_destroy() to avoid a lot of malloc/free and > + * smr_barrier calls if we remove ports one by one. > + */ > + > + mp = SMR_PTR_GET_LOCKED(>sc_ports); > + SMR_PTR_SET_LOCKED(>sc_ports, NULL); > + if (mp != NULL) { > + ps = veb_ports_array(mp); > + for (i = 0; i < mp->m_count; i++) { > + veb_p_unlink(sc, ps[i]); > + } > + } > +
Re: uhid spam: uhidev_intr: bad repid 33
On Fri, May 13, 2022 at 05:46:35PM +0100, Stuart Henderson wrote: > On 2022/05/11 07:54, Anton Lindqvist wrote: > > On Tue, May 10, 2022 at 02:51:07PM +0100, Stuart Henderson wrote: > > > On 2022/05/10 08:12, Anton Lindqvist wrote: > > > > On Mon, May 09, 2022 at 05:44:29PM +0100, Stuart Henderson wrote: > > > > > I have a USB combi keyboard/trackpad thing which is triggering "bad > > > > > repid 33" frequently while attached (between a couple of times a > > > > > minute, > > > > > and once every few minutes). It does work but it's annoying. > > > > > > > > > > Presumably this is because it has non-contiguous report IDs? > > > > > Anyone have an idea how to handle it? > > > > > > > > Could you send me the raw report descriptors: > > > > > > > > $ (set -e; i=0; while :; do doas usbhidctl -f /dev/uhid$i -R > > > > >/tmp/uhid$i.raw; i=$((i + 1)); done) > > > > > > > > My guess here is that uhidev_maxrepid() does not observe all hid items > > as it passes hid_none as the kind to hid_start_parse(). The userspace > > equivalent of hid_start_parse() accepts a kind bitmask allowing > > hid_get_item() to return items of varying kinds. For the kernel, we > > could treat hid_none as a sentinel representing all possible hid kinds. > > Does the following diff get rid of the bad repid output? > > Running it now and I haven't seen any yet, including after pressing the > strange keys and using the trackpad, so I think it works :) Thanks. Great, the diff will be in snaps soon.
Re: uhid spam: uhidev_intr: bad repid 33
On 2022/05/11 07:54, Anton Lindqvist wrote: > On Tue, May 10, 2022 at 02:51:07PM +0100, Stuart Henderson wrote: > > On 2022/05/10 08:12, Anton Lindqvist wrote: > > > On Mon, May 09, 2022 at 05:44:29PM +0100, Stuart Henderson wrote: > > > > I have a USB combi keyboard/trackpad thing which is triggering "bad > > > > repid 33" frequently while attached (between a couple of times a minute, > > > > and once every few minutes). It does work but it's annoying. > > > > > > > > Presumably this is because it has non-contiguous report IDs? > > > > Anyone have an idea how to handle it? > > > > > > Could you send me the raw report descriptors: > > > > > > $ (set -e; i=0; while :; do doas usbhidctl -f /dev/uhid$i -R > > > >/tmp/uhid$i.raw; i=$((i + 1)); done) > > > > > My guess here is that uhidev_maxrepid() does not observe all hid items > as it passes hid_none as the kind to hid_start_parse(). The userspace > equivalent of hid_start_parse() accepts a kind bitmask allowing > hid_get_item() to return items of varying kinds. For the kernel, we > could treat hid_none as a sentinel representing all possible hid kinds. > Does the following diff get rid of the bad repid output? Running it now and I haven't seen any yet, including after pressing the strange keys and using the trackpad, so I think it works :) Thanks. > diff --git sys/dev/hid/hid.c sys/dev/hid/hid.c > index 1c4d5fa45e0..dd03d6d8943 100644 > --- sys/dev/hid/hid.c > +++ sys/dev/hid/hid.c > @@ -229,7 +229,7 @@ hid_get_item(struct hid_data *s, struct hid_item *h) >* Only copy HID item, increment position and return >* if correct kind! >*/ > - if (s->kind == c->kind) { > + if (s->kind == hid_none || s->kind == c->kind) { > *h = *c; > DPRINTF("%u,%u,%u\n", h->loc.pos, > h->loc.size, h->loc.count); >
alc0 DMA write error
During the installation of OpenBSD 7.1 amd64 release, after the network interface was auto configured for DHCP, a repeated error message was displayed to the screen at random times, "DMA write error! --resetting." This prevented the network installation of the operating system on this interface, so I worked around the problem using an USB network interface. After I booted into the newly installed system, the error continued to be displayed on the screen at random. Syspatch did not fix the issue. I installed OpenBSD 7.1-current (GENERIC.MP) #516 and the error continues to be displayed on the screen at random. dmesg OpenBSD 7.1-current (GENERIC.MP) #516: Wed May 11 21:42:03 MDT 2022 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP real mem = 7983210496 (7613MB) avail mem = 7723905024 (7366MB) random: good seed from bootblocks mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xe4800 (20 entries) bios0: vendor Insyde Corp. version "V2.03" date 08/20/2013 bios0: Acer Aspire V5-123 acpi0 at bios0: ACPI 5.0 acpi0: sleep states S0 S3 S4 S5 acpi0: tables DSDT FACP UEFI HPET APIC MCFG ASF! BOOT WDRT WDAT FPDT MSDM SSDT SSDT SSDT SSDT SSDT SSDT acpi0: wakeup devices GPP1(S4) GPP2(S4) OHC1(S3) OHC2(S3) OHC3(S3) XHC0(S3) acpitimer0 at acpi0: 3579545 Hz, 32 bits acpihpet0 at acpi0: 14318180 Hz acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: AMD E1-2100 APU with Radeon(TM) HD Graphics, 998.41 MHz, 16-00-01 cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PCTRL3,ITSC,BMI1,XSAVEOPT cpu0: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 1MB 64b/line 16-way L2 cache cpu0: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative cpu0: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges cpu0: apic clock running at 99MHz cpu0: mwait min=64, max=64, IBE cpu1 at mainbus0: apid 1 (application processor) cpu1: AMD E1-2100 APU with Radeon(TM) HD Graphics, 998.13 MHz, 16-00-01 cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PCTRL3,ITSC,BMI1,XSAVEOPT cpu1: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 1MB 64b/line 16-way L2 cache cpu1: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative cpu1: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative cpu1: smt 0, core 1, package 0 ioapic0 at mainbus0: apid 4 pa 0xfec0, version 21, 24 pins, remapped ioapic1 at mainbus0: apid 5 pa 0xfec01000, version 21, 32 pins, remapped acpimcfg0 at acpi0 acpimcfg0: addr 0xf800, bus 0-63 acpiprt0 at acpi0: bus 0 (PCI0) acpiprt1 at acpi0: bus -1 (GPP0) acpiprt2 at acpi0: bus 1 (GPP1) acpiprt3 at acpi0: bus 2 (GPP2) acpiprt4 at acpi0: bus -1 (GPP3) acpiprt5 at acpi0: bus -1 (GFX_) acpiec0 at acpi0 acpibtn0 at acpi0: PWRB acpipci0 at acpi0 PCI0: 0x0010 0x0011 0x acpicmos0 at acpi0 "ETD0501" at acpi0 not configured "PNP0C14" at acpi0 not configured acpibtn1 at acpi0: SLPB acpiac0 at acpi0: AC unit offline acpibat0 at acpi0: BAT1 model "AL12B32" serial 16051 type LION oem "SANYO" acpibtn2 at acpi0: LID_ acpicpu0 at acpi0: C2(0@400 io@0x414), C1(@1 halt!), PSS acpicpu1 at acpi0: C2(0@400 io@0x414), C1(@1 halt!), PSS acpitz0 at acpi0: critical temperature is 100 degC acpivideo0 at acpi0: VGA_ acpivout0 at acpivideo0: LCD_ acpivideo1 at acpi0: VGA_ cpu0: 998 MHz: speeds: 1000 900 800 MHz pci0 at mainbus0 bus 0 pchb0 at pci0 dev 0 function 0 "AMD 16h Host" rev 0x00 radeondrm0 at pci0 dev 1 function 0 "ATI Kabini" rev 0x00 drm0 at radeondrm0 radeondrm0: msi azalia0 at pci0 dev 1 function 1 "ATI Radeon HD Audio" rev 0x00: msi azalia0: no supported codecs pchb1 at pci0 dev 2 function 0 vendor "AMD", unknown product 0x1538 rev 0x00 ppb0 at pci0 dev 2 function 3 "AMD 16h PCIE" rev 0x00: msi pci1 at ppb0 bus 1 "Atheros AR9565" rev 0x01 at pci1 dev 0 function 0 not configured ppb1 at pci0 dev 2 function 4 "AMD 16h PCIE" rev 0x00: msi pci2 at ppb1 bus 2 alc0 at pci2 dev 0 function 0 "Attansic Technology AR8171" rev 0x10: msi, address 08:9e:01:e5:a5:9e atphy0 at alc0 phy 0: AR8035 10/100/1000 PHY, rev. 9 xhci0 at pci0 dev 16 function 0 "AMD Bolton xHCI" rev 0x01: msi, xHCI 1.0 usb0 at xhci0: USB revision 3.0 uhub0 at usb0 configuration 1 interface 0 "AMD xHCI root hub" rev 3.00/1.00 addr 1 ahci0 at pci0 dev 17 function 0 "AMD Hudson-2
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
Hello Dave, > > sorry i'm late to the party. can you try this diff? glad to see you are here. I think you diff looks good. I'm just concerned about the memory allocation in veb_ports_insert(). The memory is allocated with `M_WAITOK` flag, which essentially means we may give up CPU. the veb_ports_insert() is being called from veb_add_port() here: 1500 } 1501 1502 p->p_brport.eb_port_take = veb_eb_brport_take; 1503 p->p_brport.eb_port_rele = veb_eb_brport_rele; 1504 1505 om = SMR_PTR_GET_LOCKED(ports_ptr); 1506 nm = veb_ports_insert(om, p); 1507 1508 /* this might have changed if we slept for malloc or ifpromisc */ 1509 error = ether_brport_isset(ifp0); at this point we hold a NET_LOCK(). So basically if there won't be enough memory we might start sleeping waiting for memory while we will be holding a NET_LOCK. This is something we should try to avoid, however this can be sorted out later. At this point I just want to point out this problem, which can be certainly solved in follow up commit. pf(4) also has its homework to be done around sleeping mallocs. I think your diff should go in as is. thanks and regards OK sashan
Re: amdgpu not reliably resuming?
On Fri, May 13, 2022 at 11:11:47AM +0100, Laurence Tratt wrote: > On Tue, May 10, 2022 at 09:26:48PM +0100, Laurence Tratt wrote: > > > Maybe! Would that device have some sort of interaction with X one way or > > another? Because it seems that I can reliably suspend/resume if xenodm is > > on the login page, but after I've logged in via xenodm, suspend rarely > > works and, if it does, resume almost certainly doesn't. > > Partly based on a prompt from Mike Larkin, I went through all my running > applications to see if anything caused suspend problems and I believe I've > now found the culprit: web browsers, both Firefox and Chrome. > > To cut a long story short, and for the benefit of the mailing list archive, > if I used Firefox or Chrome (with long-standing config/cache), suspend didn't > work. Blowing away their config & cache makes suspend (and resume!) work. > [Firefox was auto-started by XFCE hence why I'd conflated "login via xenodm" > with "can't suspend". But Chrome also caused the same problem if I ran it > even if Firefox hadn't been started.] > > I don't know what in the old config/cache could have caused this, other than > the config/cache for both browsers was pretty old. Building the config back > up to what I believe I had before has not made the problem come back so far. > Should that change, I'll post a follow-up. > > Thanks Jonathan and Mike for your help and suggestions! > > > Laurie > Probably audio and/or DRM doing something weird that gets that hardware in an un-zzzable state. Although I think we'd have seen that across a bunch of machines if it was something systemic. Probably just a quirk of your machine. -ml
Re: amdgpu not reliably resuming?
On Tue, May 10, 2022 at 09:26:48PM +0100, Laurence Tratt wrote: > Maybe! Would that device have some sort of interaction with X one way or > another? Because it seems that I can reliably suspend/resume if xenodm is > on the login page, but after I've logged in via xenodm, suspend rarely > works and, if it does, resume almost certainly doesn't. Partly based on a prompt from Mike Larkin, I went through all my running applications to see if anything caused suspend problems and I believe I've now found the culprit: web browsers, both Firefox and Chrome. To cut a long story short, and for the benefit of the mailing list archive, if I used Firefox or Chrome (with long-standing config/cache), suspend didn't work. Blowing away their config & cache makes suspend (and resume!) work. [Firefox was auto-started by XFCE hence why I'd conflated "login via xenodm" with "can't suspend". But Chrome also caused the same problem if I ran it even if Firefox hadn't been started.] I don't know what in the old config/cache could have caused this, other than the config/cache for both browsers was pretty old. Building the config back up to what I believe I had before has not made the problem come back so far. Should that change, I'll post a follow-up. Thanks Jonathan and Mike for your help and suggestions! Laurie
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On Fri, May 13, 2022 at 12:19:46PM +1000, David Gwynne wrote: > On Thu, May 12, 2022 at 08:07:09PM +0200, Hrvoje Popovski wrote: > > On 12.5.2022. 20:04, Hrvoje Popovski wrote: > > > On 12.5.2022. 16:22, Hrvoje Popovski wrote: > > >> On 12.5.2022. 14:48, Claudio Jeker wrote: > > >>> I think the diff below may be enough to fix this issue. It drops the SMR > > >>> critical secition around the enqueue operation but uses a reference on > > >>> the > > >>> port insteadt to ensure that the device can't be removed during the > > >>> enqueue. Once the enqueue is finished we enter the SMR critical section > > >>> again and drop the reference. > > >>> > > >>> To make it clear that the SMR_TAILQ remains intact while a refcount is > > >>> held I moved refcnt_finalize() above SMR_TAILQ_REMOVE_LOCKED(). This is > > >>> not strictly needed since the next pointer remains valid up until the > > >>> smr_barrier() call but I find this a bit easier to understand. > > >>> First make sure nobody else holds a reference then remove the port from > > >>> the list. > > >>> > > >>> I currently have no test setup to verify this but I hope someone else > > >>> can > > >>> give this a spin. > > >> Hi, > > >> > > >> for now veb seems stable and i can't panic box although it should, but > > >> please give me few more hours to torture it properly. > > > > > > > > > I can trigger panic in veb with this diff. > > > > > > Thank you .. > > > > > > > > > > I can't trigger ... :))) sorry .. > > sorry i'm late to the party. can you try this diff? > > this diff replaces the list of ports with an array/map of ports. > the map takes references to all the ports, so the forwarding paths > just have to hold a reference to the map to be able to use all the > ports. the forwarding path uses smr to get hold of a map, takes a map > ref, and then leaves the smr crit section before iterating over the map > and pushing packets. > > this means we should only take and release a single refcnt when > we're pushing packets out any number of ports. > > if no span ports are configured, then there's no span port map and > we don't try and take a ref, we can just return early. > > we also only take and release a single refcnt when we forward the > actual packet. forwarding to a single port provided by an etherbridge > lookup already takes/releases the single port ref. if it falls > through that for unknown unicast or broadcast/multicast, then it's > a single refcnt for the current map of all ports. This is very smart and probably better since it uses less atomic instructions per packet. Diff reads fine. OK claudio@ > Index: if_veb.c > === > RCS file: /cvs/src/sys/net/if_veb.c,v > retrieving revision 1.25 > diff -u -p -r1.25 if_veb.c > --- if_veb.c 4 Jan 2022 06:32:39 - 1.25 > +++ if_veb.c 13 May 2022 02:01:43 - > @@ -139,13 +139,13 @@ struct veb_port { > struct veb_rule_list p_vr_list[2]; > #define VEB_RULE_LIST_OUT0 > #define VEB_RULE_LIST_IN 1 > - > - SMR_TAILQ_ENTRY(veb_port)p_entry; > }; > > struct veb_ports { > - SMR_TAILQ_HEAD(, veb_port) l_list; > - unsigned int l_count; > + struct refcntm_refs; > + unsigned int m_count; > + > + /* followed by an array of veb_port pointers */ > }; > > struct veb_softc { > @@ -155,8 +155,8 @@ struct veb_softc { > struct etherbridge sc_eb; > > struct rwlocksc_rule_lock; > - struct veb_ports sc_ports; > - struct veb_ports sc_spans; > + struct veb_ports*sc_ports; > + struct veb_ports*sc_spans; > }; > > #define DPRINTF(_sc, fmt...)do { \ > @@ -184,8 +184,25 @@ static int veb_p_ioctl(struct ifnet *, u > static int veb_p_output(struct ifnet *, struct mbuf *, > struct sockaddr *, struct rtentry *); > > -static void veb_p_dtor(struct veb_softc *, struct veb_port *, > - const char *); > +static inline size_t > +veb_ports_size(unsigned int n) > +{ > + /* use of _ALIGN is inspired by CMSGs */ > + return _ALIGN(sizeof(struct veb_ports)) + > + n * sizeof(struct veb_port *); > +} > + > +static inline struct veb_port ** > +veb_ports_array(struct veb_ports *m) > +{ > + return (struct veb_port **)((caddr_t)m + _ALIGN(sizeof(*m))); > +} > + > +static void veb_ports_free(struct veb_ports *); > + > +static void veb_p_unlink(struct veb_softc *, struct veb_port *); > +static void veb_p_fini(struct veb_port *); > +static void veb_p_dtor(struct veb_softc *, struct veb_port *); > static int veb_add_port(struct veb_softc *, > const struct ifbreq *, unsigned int); > static int veb_del_port(struct veb_softc *, > @@ -271,8 +288,8 @@ veb_clone_create(struct if_clone *ifc, i
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On 13.5.2022. 4:19, David Gwynne wrote: > sorry i'm late to the party. can you try this diff? > > this diff replaces the list of ports with an array/map of ports. > the map takes references to all the ports, so the forwarding paths > just have to hold a reference to the map to be able to use all the > ports. the forwarding path uses smr to get hold of a map, takes a map > ref, and then leaves the smr crit section before iterating over the map > and pushing packets. > > this means we should only take and release a single refcnt when > we're pushing packets out any number of ports. > > if no span ports are configured, then there's no span port map and > we don't try and take a ref, we can just return early. > > we also only take and release a single refcnt when we forward the > actual packet. forwarding to a single port provided by an etherbridge > lookup already takes/releases the single port ref. if it falls > through that for unknown unicast or broadcast/multicast, then it's > a single refcnt for the current map of all ports. Hi, and with this diff i can't trigger panic ...