Re: Set soname for shared libraries

2017-11-13 Thread Philip Guenther
On Mon, Nov 13, 2017 at 12:25 PM, Mark Kettenis 
wrote:

> The diff below changes bsd.lib.mk such that we set the DT_SONAME entry
> for shared libraries to the name of the shared library.  This brings
> us in line with other ELF systems and allows us to use symlinks to
> tell the linker which shared libraries to link against.  That in turn
> would allow us to drop some OpenBSD-specific local modifications from
> our linkers.
>
> ok?
>

Looks good.  ok guenther@


Re: multiple interface input queues

2017-11-13 Thread Hrvoje Popovski
On 14.11.2017. 5:42, David Gwynne wrote:
> this replaces the single mbuf_queue and task in struct ifnet with
> a new ifiqueue structure modelled on ifqueues.


i've tested this diff with ix, myx, em and bge with down/up interfaces
and everything is working fine ...



Re: Remove commented out pledge in tsort

2017-11-13 Thread Ori Bernstein
On Sun, 12 Nov 2017 13:31:10 +0100, Marc Espie  wrote:

> I did this in an eager way a while back, was told in no uncertain terms
> "not yet, we're still looking at things".
> 
> Well, see pledge(2). Whenever the dust settles, sure we can kill this line,
> or activate it instead.
> 
> Dust is not yet settled.
> 

In this specific case, the line is already activated. It's just duplicated
in a comment one line before the pledge gets called.

-- 
Ori Bernstein



multiple interface input queues

2017-11-13 Thread David Gwynne
this replaces the single mbuf_queue and task in struct ifnet with
a new ifiqueue structure modelled on ifqueues.

the main motivation behind this was to show mpsafe input counters.

ifiqueues, like ifqueues, allow a driver to configure multiple
queueus. this in turn allows a driver with multiple rx rings to
have them queue and account for packets independently.

ifiq_input generally replaces if_input. if_input now simply queues
on ifnets builtin ifiqueue (ifp->if_rcv) to support the current set
of drivers that only configure a single rx context/ring.

it is up to the driver to ensure that an ifiqueue is not used
concurrently. in practice this should be easy cos nics generally
only rx packets from processing a single ring from a single interrupt
source.

ifiq counters are updated and read using the same semantic as percpu
counters, ie, there's a generation number updated before and after
a counter update. this means writers dont block, but a reader may
loop a couple of times waiting for a writer to finish. readers call
yield(), which causes splasserts to fire if if_getdata is still
holding the kernel lock.

ifiq_input is set up to interact with the interface rx ring moderation
code (ie, if_rxring code). you pass what the current rxr watermark
is, and it will look at the backlog of packets on that ifiq to
determine if the ring is producing too fast. if it is, itll return
1 to slow down packet reception. i have a later diff that adds
functionality to the if_rxring bits so a driver can say say a ring
is livelocked, rather than relying on the system to detect it. so
drv_rxeof would have the following at the end of it:

if (ifiq_input(rxr->ifiq, , if_rxr_cwm(>rx_ring))
if_rxr_livelocked(>rx_ring);
drv_rx_refill(rxr);

ive run with that on a hacked up ix(4) that runs with 8 rx rings,
and this works pretty well. if you're doing a single stream, you
see one rx ring grow the number of descs it will handle, but if you
flood it with a range of traffic you'll see that one ring scale
down and balance out with the rest of the rings. it turns out you
can still get reasonable throughput even if the ifiqs are dynamically
scaling themselves to only 100 packets. however, the interactivity
of the system improves a lot.

currently if one interface is being DoSed, it'll end up with 8192
packet on if_inputqueue. that takes a while to process those packets,
which blocks the processing of packets from the other interface.
by scaling the input queues to relatively small counts, softnet can
service packets frmo other interfaces sooner.

Index: if.c
===
RCS file: /cvs/src/sys/net/if.c,v
retrieving revision 1.527
diff -u -p -r1.527 if.c
--- if.c14 Nov 2017 04:08:11 -  1.527
+++ if.c14 Nov 2017 04:18:41 -
@@ -156,7 +156,6 @@ int if_group_egress_build(void);
 
 void   if_watchdog_task(void *);
 
-void   if_input_process(void *);
 void   if_netisr(void *);
 
 #ifdef DDB
@@ -437,8 +436,6 @@ if_attachsetup(struct ifnet *ifp)
 
ifidx = ifp->if_index;
 
-   mq_init(>if_inputqueue, 8192, IPL_NET);
-   task_set(ifp->if_inputtask, if_input_process, (void *)ifidx);
task_set(ifp->if_watchdogtask, if_watchdog_task, (void *)ifidx);
task_set(ifp->if_linkstatetask, if_linkstate_task, (void *)ifidx);
 
@@ -563,6 +560,30 @@ if_attach_queues(struct ifnet *ifp, unsi
 }
 
 void
+if_attach_iqueues(struct ifnet *ifp, unsigned int niqs)
+{
+   struct ifiqueue **map;
+   struct ifiqueue *ifiq;
+   unsigned int i;
+
+   KASSERT(niqs != 0);
+
+   map = mallocarray(niqs, sizeof(*map), M_DEVBUF, M_WAITOK);
+
+   ifp->if_rcv.ifiq_softc = NULL;
+   map[0] = >if_rcv;
+
+   for (i = 1; i < niqs; i++) {
+   ifiq = malloc(sizeof(*ifiq), M_DEVBUF, M_WAITOK|M_ZERO);
+   ifiq_init(ifiq, ifp, i);
+   map[i] = ifiq;
+   }
+
+   ifp->if_iqs = map;
+   ifp->if_niqs = niqs;
+}
+
+void
 if_attach_common(struct ifnet *ifp)
 {
KASSERT(ifp->if_ioctl != NULL);
@@ -587,6 +608,12 @@ if_attach_common(struct ifnet *ifp)
ifp->if_ifqs = ifp->if_snd.ifq_ifqs;
ifp->if_nifqs = 1;
 
+   ifiq_init(>if_rcv, ifp, 0);
+
+   ifp->if_rcv.ifiq_ifiqs[0] = >if_rcv;
+   ifp->if_iqs = ifp->if_rcv.ifiq_ifiqs;
+   ifp->if_niqs = 1;
+
ifp->if_addrhooks = malloc(sizeof(*ifp->if_addrhooks),
M_TEMP, M_WAITOK);
TAILQ_INIT(ifp->if_addrhooks);
@@ -605,8 +632,6 @@ if_attach_common(struct ifnet *ifp)
M_TEMP, M_WAITOK|M_ZERO);
ifp->if_linkstatetask = malloc(sizeof(*ifp->if_linkstatetask),
M_TEMP, M_WAITOK|M_ZERO);
-   ifp->if_inputtask = malloc(sizeof(*ifp->if_inputtask),
-   M_TEMP, M_WAITOK|M_ZERO);
ifp->if_llprio = IFQ_DEFPRIO;
 
SRPL_INIT(>if_inputs);
@@ -694,47 +719,7 @@ if_enqueue(struct ifnet *ifp, struct mbu
 void
 if_input(struct ifnet *ifp, struct 

more ethernet media types

2017-11-13 Thread David Gwynne
apparently active optical cables are distinct from direct attached
cables, so here's a bunch of extra media types for them.

you can also get LR and ER(?) optics for 25g.

ok?

Index: if_media.h
===
RCS file: /cvs/src/sys/net/if_media.h,v
retrieving revision 1.39
diff -u -p -r1.39 if_media.h
--- if_media.h  24 Jan 2017 10:08:30 -  1.39
+++ if_media.h  14 Nov 2017 03:37:46 -
@@ -196,6 +196,13 @@ uint64_t   ifmedia_baudrate(uint64_t);
 #defineIFM_25G_SR  49  /* 25GBase-SR */
 #defineIFM_50G_CR2 50  /* 50GBase-CR2 */
 #defineIFM_50G_KR2 51  /* 50GBase-KR2 */
+#defineIFM_25G_LR  52  /* 25GBase-LR */
+#defineIFM_25G_ER  53  /* 25GBase-ER */
+#defineIFM_10G_AOC 54  /* 10G Active Optical Cable */
+#defineIFM_25G_AOC 55  /* 25G Active Optical Cable */
+#defineIFM_40G_AOC 56  /* 40G Active Optical Cable */
+#defineIFM_100G_AOC57  /* 100G Active Optical Cable */
+#defineIFM_25G_SFP_CU  58  /* 25G SFP28 direct attached 
cable */
 
 #defineIFM_ETH_MASTER  0x0001ULL   /* master mode 
(1000baseT) */
 #defineIFM_ETH_RXPAUSE 0x0002ULL   /* receive PAUSE frames 
*/
@@ -541,6 +548,7 @@ struct ifmedia_description {
{ IFM_ETHER|IFM_10G_KR, "10GBASE-KR" }, \
{ IFM_ETHER|IFM_10G_CR1,"10GbaseCR1" }, \
{ IFM_ETHER|IFM_10G_CR1,"10GBASE-CR1" },\
+   { IFM_ETHER|IFM_10G_AOC,"10G-AOC" },\
{ IFM_ETHER|IFM_20G_KR2,"20GbaseKR2" }, \
{ IFM_ETHER|IFM_20G_KR2,"20GBASE-KR2" },\
{ IFM_ETHER|IFM_2500_KX,"2500baseKX" }, \
@@ -559,6 +567,7 @@ struct ifmedia_description {
{ IFM_ETHER|IFM_1000_CX_SGMII,  "1000BASE-CX-SGMII" },  \
{ IFM_ETHER|IFM_40G_KR4,"40GbaseKR4" }, \
{ IFM_ETHER|IFM_40G_KR4,"40GBASE-KR4" },\
+   { IFM_ETHER|IFM_40G_AOC,"40G-AOC" },\
{ IFM_ETHER|IFM_10G_ER, "10GbaseER" },  \
{ IFM_ETHER|IFM_10G_ER, "10GBASE-ER" }, \
{ IFM_ETHER|IFM_100G_CR4,   "100GbaseCR4" },\
@@ -569,6 +578,7 @@ struct ifmedia_description {
{ IFM_ETHER|IFM_100G_KR4,   "100GBASE-KR4" },   \
{ IFM_ETHER|IFM_100G_LR4,   "100GbaseLR4" },\
{ IFM_ETHER|IFM_100G_LR4,   "100GBASE-LR4" },   \
+   { IFM_ETHER|IFM_100G_AOC,   "100G-AOC" },   \
{ IFM_ETHER|IFM_56G_R4, "56GbaseR4" },  \
{ IFM_ETHER|IFM_56G_R4, "56GBASE-R4" }, \
{ IFM_ETHER|IFM_25G_CR, "25GbaseCR" },  \
@@ -577,6 +587,12 @@ struct ifmedia_description {
{ IFM_ETHER|IFM_25G_KR, "25GBASE-KR" }, \
{ IFM_ETHER|IFM_25G_SR, "25GbaseSR" },  \
{ IFM_ETHER|IFM_25G_SR, "25GBASE-SR" }, \
+   { IFM_ETHER|IFM_25G_LR, "25GbaseLR" },  \
+   { IFM_ETHER|IFM_25G_LR, "25GBASE-LR" }, \
+   { IFM_ETHER|IFM_25G_ER, "25GbaseER" },  \
+   { IFM_ETHER|IFM_25G_ER, "25GBASE-ER" }, \
+   { IFM_ETHER|IFM_25G_SFP_CU  "25GSFP28" },   \
+   { IFM_ETHER|IFM_10G_AOC,"25G-AOC" },\
{ IFM_ETHER|IFM_50G_CR2,"50GbaseCR2" }, \
{ IFM_ETHER|IFM_50G_CR2,"50GBASE-CR2" },\
{ IFM_ETHER|IFM_50G_KR2,"50GbaseKR2" }, \



move reading of ifq counters for if_data into ifq.c

2017-11-13 Thread David Gwynne
this moves the reading of counters out of an interfaces ifqs into
ifq.c.

there's no semantic change, i just want to keep the knowledge about
locking in an ifq all in the same place.

ok?

Index: if.c
===
RCS file: /cvs/src/sys/net/if.c,v
retrieving revision 1.523
diff -u -p -r1.523 if.c
--- if.c4 Nov 2017 16:58:46 -   1.523
+++ if.c14 Nov 2017 00:22:13 -
@@ -2276,30 +2276,14 @@ void
 if_getdata(struct ifnet *ifp, struct if_data *data)
 {
unsigned int i;
-   struct ifqueue *ifq;
-   uint64_t opackets = 0;
-   uint64_t obytes = 0;
-   uint64_t omcasts = 0;
-   uint64_t oqdrops = 0;
+
+   *data = ifp->if_data;
 
for (i = 0; i < ifp->if_nifqs; i++) {
-   ifq = ifp->if_ifqs[i];
+   struct ifqueue *ifq = ifp->if_ifqs[i];
 
-   mtx_enter(>ifq_mtx);
-   opackets += ifq->ifq_packets;
-   obytes += ifq->ifq_bytes;
-   oqdrops += ifq->ifq_qdrops;
-   omcasts += ifq->ifq_mcasts;
-   mtx_leave(>ifq_mtx);
-   /* ifq->ifq_errors */
+   ifq_add_data(ifq, data);
}
-
-   *data = ifp->if_data;
-   data->ifi_opackets += opackets;
-   data->ifi_obytes += obytes;
-   data->ifi_oqdrops += oqdrops;
-   data->ifi_omcasts += omcasts;
-   /* ifp->if_data.ifi_oerrors */
 }
 
 /*
Index: ifq.c
===
RCS file: /cvs/src/sys/net/ifq.c,v
retrieving revision 1.13
diff -u -p -r1.13 ifq.c
--- ifq.c   14 Nov 2017 00:00:35 -  1.13
+++ ifq.c   14 Nov 2017 00:22:13 -
@@ -289,6 +289,18 @@ ifq_destroy(struct ifqueue *ifq)
ml_purge();
 }
 
+void
+ifq_add_data(struct ifqueue *ifq, struct if_data *data)
+{
+   mtx_enter(>ifq_mtx);
+   data->ifi_opackets += ifq->ifq_packets;
+   data->ifi_obytes += ifq->ifq_bytes;
+   data->ifi_oqdrops += ifq->ifq_qdrops;
+   data->ifi_omcasts += ifq->ifq_mcasts;
+   /* ifp->if_data.ifi_oerrors */
+   mtx_leave(>ifq_mtx);
+}
+
 int
 ifq_enqueue(struct ifqueue *ifq, struct mbuf *m)
 {
Index: ifq.h
===
RCS file: /cvs/src/sys/net/ifq.h,v
retrieving revision 1.14
diff -u -p -r1.14 ifq.h
--- ifq.h   14 Nov 2017 00:00:35 -  1.14
+++ ifq.h   14 Nov 2017 00:22:13 -
@@ -379,6 +379,7 @@ struct ifq_ops {
 voidifq_init(struct ifqueue *, struct ifnet *, unsigned int);
 voidifq_attach(struct ifqueue *, const struct ifq_ops *, void *);
 voidifq_destroy(struct ifqueue *);
+voidifq_add_data(struct ifqueue *, struct if_data *);
 int ifq_enqueue(struct ifqueue *, struct mbuf *);
 voidifq_start(struct ifqueue *);
 struct mbuf*ifq_deq_begin(struct ifqueue *);



Re: README patch

2017-11-13 Thread Ingo Schwarze
Hi,

Andras Farkas wrote on Mon, Nov 13, 2017 at 06:21:26PM -0500:
> On Mon, Nov 13, 2017 at 8:16 AM, Sebastian Benoit  wrote:

>> http://ftp.openbsd.org/pub/OpenBSD/6.2/README

> Much more minor,

It is now known that that file is outdated in a number of respects.

However, it is not subject to version control, and consequently
normal developers cannot change it, so please be patient.
There is no need to report additional errors in that file.

I expect that it will get updated for the next release at the latest.

Yours,
  Ingo



Re: README patch

2017-11-13 Thread Andras Farkas
On Mon, Nov 13, 2017 at 8:16 AM, Sebastian Benoit  wrote:
> http://ftp.openbsd.org/pub/OpenBSD/6.2/README
Much more minor, but "This is our 43nd release." should be "This is
our 43rd release."
Also,
"For further details, please visit https://OpenBSD.org/faq;
actually ends up being https://www.openbsd.org/faq/ with a slash at
the very end.



Re: ofw_clock: reset_deassert and clock_enable order of use Qs

2017-11-13 Thread Mark Kettenis
> Date: Sun, 12 Nov 2017 23:53:10 +0200
> From: Artturi Alm 
> 
> On Sat, Oct 07, 2017 at 10:16:05AM +0300, Artturi Alm wrote:
> > Hi,
> > 
> > 
> > what was the cause of these delays? i just spotted this, so untested on
> > likely more affected HW(sunxi A64/H3 and rockchips), but just for discussion
> > about the delays/ordering? are they really needed like that?
> > 
> > i don't remember having read anything about the order of these from
> > devicetree-binding .txt's i have read so far from u-boot/linux.
> > 
> > "Make sure that the reset signal has been released
> > before the release of module clock gating;"
> > 
> > above can be found from under the short CCU "x.3.6 Programming 
> > Guidelines"[0].
> > maybe this ordering should be verified from somewhere else, as atleast
> > for my sorry non-nativeNlazy grammar i could understand wrong, what is being
> > meant w/"the release" above.. but even w/above discarded, my intuition does
> > favor deasserting reset before gating sclki, hence the diff.
> > 
> > the diff below booted faster with no observable changes in dmesg on cubie2,
> > and otherwise succesfully on panda and wandboard.
> > 
> > -Artturi
> > 
> 
> Ping?
> 
> i think i've passed by sources in both fbsd since email above
> suggesting the order should be like in the diff.
> does anyone want me to provide more on my findings about this,
> or anything else?

As far as I can tell, the Linux code always enables the clock before
deasserting the reset.  If the ordering is SoC-specific, I guess the
clock/reset controller driver should handle it as some of these
drivers are not SoC-specific.

For that reason I'm going to commit the sxitwi diff with the
"canonical" ordering.

Cheers,

Mark

> > [0] sources:
> > A64 User Manual(Revision 1.0):
> > "3.3.6.4. Gating and reset" Page 147
> > H3 Datasheet(Revision1.2):
> > "4.3.6.4. Gating and reset" Page 142
> > 
> > 
> > 
> > diff --git a/sys/dev/fdt/ehci_fdt.c b/sys/dev/fdt/ehci_fdt.c
> > index 55fe75f1a9c..fae54ac11ee 100644
> > --- a/sys/dev/fdt/ehci_fdt.c
> > +++ b/sys/dev/fdt/ehci_fdt.c
> > @@ -91,9 +91,10 @@ ehci_fdt_attach(struct device *parent, struct device 
> > *self, void *aux)
> >  
> > pinctrl_byname(sc->sc_node, "default");
> >  
> > -   clock_enable_all(sc->sc_node);
> > reset_deassert_all(sc->sc_node);
> >  
> > +   clock_enable_all(sc->sc_node);
> > +
> > /* Disable interrupts, so we don't get any spurious ones. */
> > sc->sc.sc_offs = EREAD1(>sc, EHCI_CAPLENGTH);
> > EOWRITE2(>sc, EHCI_USBINTR, 0);
> > diff --git a/sys/dev/fdt/if_dwge_fdt.c b/sys/dev/fdt/if_dwge_fdt.c
> > index edfe5acb992..00668980f4a 100644
> > --- a/sys/dev/fdt/if_dwge_fdt.c
> > +++ b/sys/dev/fdt/if_dwge_fdt.c
> > @@ -125,10 +125,10 @@ dwge_fdt_attach(struct device *parent, struct device 
> > *self, void *aux)
> > else if (OF_is_compatible(faa->fa_node, "rockchip,rk3399-gmac"))
> > dwge_fdt_attach_rockchip(fsc);
> >  
> > +   reset_deassert(faa->fa_node, "stmmaceth");
> > +
> > /* Enable clock. */
> > clock_enable(faa->fa_node, "stmmaceth");
> > -   reset_deassert(faa->fa_node, "stmmaceth");
> > -   delay(5000);
> >  
> > /* Power up PHY. */
> > phy_supply = OF_getpropint(faa->fa_node, "phy-supply", 0);
> > diff --git a/sys/dev/fdt/if_dwxe.c b/sys/dev/fdt/if_dwxe.c
> > index 22a383c06ef..6e303745ec3 100644
> > --- a/sys/dev/fdt/if_dwxe.c
> > +++ b/sys/dev/fdt/if_dwxe.c
> > @@ -376,10 +376,10 @@ dwxe_attach(struct device *parent, struct device 
> > *self, void *aux)
> >  
> > pinctrl_byname(faa->fa_node, "default");
> >  
> > +   reset_deassert(faa->fa_node, "stmmaceth");
> > +
> > /* Enable clock. */
> > clock_enable(faa->fa_node, "stmmaceth");
> > -   reset_deassert(faa->fa_node, "stmmaceth");
> > -   delay(5000);
> >  
> > /* Power up PHY. */
> > phy_supply = OF_getpropint(faa->fa_node, "phy-supply", 0);
> > diff --git a/sys/dev/fdt/sximmc.c b/sys/dev/fdt/sximmc.c
> > index 7acb8f55d56..9ed2ae09fe4 100644
> > --- a/sys/dev/fdt/sximmc.c
> > +++ b/sys/dev/fdt/sximmc.c
> > @@ -366,11 +366,10 @@ sximmc_attach(struct device *parent, struct device 
> > *self, void *aux)
> >  
> > pinctrl_byname(faa->fa_node, "default");
> >  
> > +   reset_deassert_all(faa->fa_node);
> > +
> > /* enable clock */
> > clock_enable(faa->fa_node, NULL);
> > -   delay(5000);
> > -
> > -   reset_deassert_all(faa->fa_node);
> >  
> > /*
> >  * The FIFO register is in a different location on the
> > diff --git a/sys/dev/fdt/sxitwi.c b/sys/dev/fdt/sxitwi.c
> > index f53f2bfd594..c80bdc2ab06 100644
> > --- a/sys/dev/fdt/sxitwi.c
> > +++ b/sys/dev/fdt/sxitwi.c
> > @@ -218,6 +218,8 @@ sxitwi_attach(struct device *parent, struct device 
> > *self, void *aux)
> >  
> > pinctrl_byname(faa->fa_node, "default");
> >  
> > +   reset_deassert_all(faa->fa_node);
> > +
> > /* Enable clock */
> > clock_enable(faa->fa_node, NULL);
> >  
> 



Re: armv7/arm64: enable fifo in com_fdt for "snps,dw-apb-uart"

2017-11-13 Thread Mark Kettenis
> Date: Fri, 10 Nov 2017 00:16:26 +0200
> From: Artturi Alm 
> 
> Hi,
> 
> does also remove ti,omapX-uart compatibles from arm64 com_fdt,
> as they're never to be found there.
> iirc. the lenght is based on minimal lenght i found from any datasheet
> for SoCs w/"snps,dw-apb-uart", and conclusions drawn from gpl-sources.

As far as I can tell the FIFO functionality is optional for the
Synopys "IP".  It's just safer to not bother with the FIFO.

> diff --git a/sys/arch/arm64/dev/com_fdt.c b/sys/arch/arm64/dev/com_fdt.c
> index 9f2a8eb760a..1dd67c7c3e0 100644
> --- a/sys/arch/arm64/dev/com_fdt.c
> +++ b/sys/arch/arm64/dev/com_fdt.c
> @@ -66,9 +66,7 @@ com_fdt_init_cons(void)
>   void *node;
>  
>   if ((node = fdt_find_cons("brcm,bcm2835-aux-uart")) == NULL &&
> - (node = fdt_find_cons("snps,dw-apb-uart")) == NULL &&
> - (node = fdt_find_cons("ti,omap3-uart")) == NULL &&
> - (node = fdt_find_cons("ti,omap4-uart")) == NULL)
> + (node = fdt_find_cons("snps,dw-apb-uart")) == NULL)
>   return;
>   if (fdt_get_reg(node, 0, ))
>   return;
> @@ -95,9 +93,7 @@ com_fdt_match(struct device *parent, void *match, void *aux)
>   struct fdt_attach_args *faa = aux;
>  
>   return (OF_is_compatible(faa->fa_node, "brcm,bcm2835-aux-uart") ||
> - OF_is_compatible(faa->fa_node, "snps,dw-apb-uart") ||
> - OF_is_compatible(faa->fa_node, "ti,omap3-uart") ||
> - OF_is_compatible(faa->fa_node, "ti,omap4-uart"));
> + OF_is_compatible(faa->fa_node, "snps,dw-apb-uart"));
>  }
>  
>  void
> @@ -136,12 +132,11 @@ com_fdt_attach(struct device *parent, struct device 
> *self, void *aux)
>   sc->sc.sc_uarttype = COM_UART_16550;
>   sc->sc.sc_frequency = freq ? freq : COM_FREQ;
>  
> - if (OF_is_compatible(faa->fa_node, "snps,dw-apb-uart"))
> + if (OF_is_compatible(faa->fa_node, "snps,dw-apb-uart")) {
> + sc->sc.sc_uarttype = COM_UART_16550A;
> + sc->sc.sc_fifolen = 64;
>   intr = com_fdt_intr_designware;
> -
> - if (OF_is_compatible(faa->fa_node, "ti,omap3-uart") ||
> - OF_is_compatible(faa->fa_node, "ti,omap4-uart"))
> - sc->sc.sc_uarttype = COM_UART_TI16750;
> + }
>  
>   if (stdout_node == faa->fa_node) {
>   SET(sc->sc.sc_hwflags, COM_HW_CONSOLE);
> diff --git a/sys/arch/armv7/dev/com_fdt.c b/sys/arch/armv7/dev/com_fdt.c
> index 00504801850..97c535a5a05 100644
> --- a/sys/arch/armv7/dev/com_fdt.c
> +++ b/sys/arch/armv7/dev/com_fdt.c
> @@ -140,8 +140,11 @@ com_fdt_attach(struct device *parent, struct device 
> *self, void *aux)
>   sc->sc.sc_uarttype = COM_UART_16550;
>   sc->sc.sc_frequency = freq ? freq : COM_FREQ;
>  
> - if (OF_is_compatible(faa->fa_node, "snps,dw-apb-uart"))
> + if (OF_is_compatible(faa->fa_node, "snps,dw-apb-uart")) {
> + sc->sc.sc_uarttype = COM_UART_16550A;
> + sc->sc.sc_fifolen = 64;
>   intr = com_fdt_intr_designware;
> + }
>  
>   if (OF_is_compatible(faa->fa_node, "ti,omap3-uart") ||
>   OF_is_compatible(faa->fa_node, "ti,omap4-uart"))
> 
> 



Set soname for shared libraries

2017-11-13 Thread Mark Kettenis
The diff below changes bsd.lib.mk such that we set the DT_SONAME entry
for shared libraries to the name of the shared library.  This brings
us in line with other ELF systems and allows us to use symlinks to
tell the linker which shared libraries to link against.  That in turn
would allow us to drop some OpenBSD-specific local modifications from
our linkers.

ok?


Index: share/mk/bsd.lib.mk
===
RCS file: /cvs/src/share/mk/bsd.lib.mk,v
retrieving revision 1.91
diff -u -p -r1.91 bsd.lib.mk
--- share/mk/bsd.lib.mk 5 Nov 2017 10:29:24 -   1.91
+++ share/mk/bsd.lib.mk 13 Nov 2017 20:20:47 -
@@ -211,19 +211,19 @@ ${FULLSHLIBNAME}: ${SOBJS} ${DPADD}
@echo building shared ${LIB} library \(version 
${SHLIB_MAJOR}.${SHLIB_MINOR}\)
@rm -f ${.TARGET}
 .if defined(SYSPATCH_PATH)
-   ${CC} -shared ${PICFLAG} -o ${.TARGET} \
+   ${CC} -shared -Wl,-soname,${FULLSHLIBNAME} ${PICFLAG} -o ${.TARGET} \
`readelf -Ws ${SYSPATCH_PATH}${LIBDIR}/${.TARGET} | \
awk '/ FILE/{sub(".*/", "", $$NF); gsub(/\..*/, ".so", $$NF); print 
$$NF}' | \
egrep -v "(cmll-586|libgcc2|unwind-dw2|mul(d|s|x)c3)" | awk 
'!x[$$0]++'` ${LDADD}
 .else
-   ${CC} -shared ${PICFLAG} -o ${.TARGET} \
+   ${CC} -shared -Wl,-soname,${FULLSHLIBNAME} ${PICFLAG} -o ${.TARGET} \
`echo ${SOBJS} | tr ' ' '\n' | sort -R` ${LDADD}
 .endif
 
 ${FULLSHLIBNAME}.a: ${SOBJS}
@echo building shared ${LIB} library \(version 
${SHLIB_MAJOR}.${SHLIB_MINOR}\) ar
@rm -f ${.TARGET}
-   @echo ${PICFLAG} ${LDADD} > .ldadd
+   @echo -Wl,-soname,${FULLSHLIBNAME} ${PICFLAG} ${LDADD} > .ldadd
ar cqD ${FULLSHLIBNAME}.a ${SOBJS} .ldadd ${SYMBOLSMAP}
 
 # all .do files...



Skip a few more ACPI PNP IDs

2017-11-13 Thread Mark Kettenis
Diff below adds a a few more entries that show up in dmesglog.
PNP0303 gets renamed to match the acpica/iasl output.

ok?


Index: dev/acpi/acpi.c
===
RCS file: /cvs/src/sys/dev/acpi/acpi.c,v
retrieving revision 1.332
diff -u -p -r1.332 acpi.c
--- dev/acpi/acpi.c 17 Aug 2017 05:16:27 -  1.332
+++ dev/acpi/acpi.c 13 Nov 2017 20:06:10 -
@@ -2784,9 +2784,11 @@ acpi_parsehid(struct aml_node *node, voi
 const char *acpi_skip_hids[] = {
"INT0800",  /* Intel 82802Firmware Hub Device */
"PNP",  /* 8259-compatible Programmable Interrupt Controller */
+   "PNP0001",  /* EISA Interrupt Controller */
"PNP0100",  /* PC-class System Timer */
"PNP0103",  /* HPET System Timer */
"PNP0200",  /* PC-class DMA Controller */
+   "PNP0201",  /* EISA DMA Controller */
"PNP0800",  /* Microsoft Sound System Compatible Device */
"PNP0A03",  /* PCI Bus */
"PNP0A08",  /* PCI Express Bus */
@@ -2801,8 +2803,13 @@ const char *acpi_skip_hids[] = {
 
 /* ISA devices for which we attach a driver later */
 const char *acpi_isa_hids[] = {
-   "PNP0303",  /* 8042 PS/2 Controller */
+   "PNP0303",  /* IBM Enhanced Keyboard (101/102-key, PS/2 Mouse) */
+   "PNP0400",  /* Standard LPT Parallel Port */
+   "PNP0401",  /* ECP Parallel Port */
"PNP0501",  /* 16550A-compatible COM Serial Port */
+   "PNP0700",  /* PC-class Floppy Disk Controller */
+   "PNP0F03",  /* Microsoft PS/2-style Mouse */
+   "PNP0F13",  /* PS/2 Mouse */
NULL
 };
 



Re: Introduce ipsec_sysctl()

2017-11-13 Thread Alexander Bluhm
On Mon, Nov 13, 2017 at 01:30:43PM +0100, Martin Pieuchot wrote:
> This move all IPsec tunables to netinet/ipsec_input.c without breaking
> the "net.inet.ip" sysctl(3) namespace.   
> 
> The reason for this move is to properly separate IPsec and IP globals
> in order to ease the removal of the NET_LOCK() in these layers.
> 
> ok?

OK bluhm@

> Index: netinet/in.h
> ===
> RCS file: /cvs/src/sys/netinet/in.h,v
> retrieving revision 1.125
> diff -u -p -r1.125 in.h
> --- netinet/in.h  6 Oct 2017 21:14:55 -   1.125
> +++ netinet/in.h  13 Nov 2017 12:11:16 -
> @@ -745,19 +745,19 @@ struct ip_mreq {
>   _hifirstauto, \
>   _hilastauto, \
>   _maxqueue, \
> - , \
> + NULL /* encdebug */, \
>   NULL, \
> - _expire_acquire, \
> - _keep_invalid, \
> - _require_pfs, \
> - _soft_allocations, \
> - _exp_allocations, \
> - _soft_bytes, \
> - _exp_bytes, \
> - _exp_timeout, \
> - _soft_timeout, \
> - _soft_first_use, \
> - _exp_first_use, \
> + NULL /* ipsec_expire_acquire */, \
> + NULL /* ipsec_keep_invalid */, \
> + NULL /* ipsec_require_pfs */, \
> + NULL /* ipsec_soft_allocations */, \
> + NULL /* ipsec_exp_allocations */, \
> + NULL /* ipsec_soft_bytes */, \
> + NULL /* ipsec_exp_bytes */, \
> + NULL /* ipsec_exp_timeout */, \
> + NULL /* ipsec_soft_timeout */, \
> + NULL /* ipsec_soft_first_use */, \
> + NULL /* ipsec_exp_first_use */, \
>   NULL, \
>   NULL, \
>   NULL, \
> Index: netinet/ip_input.c
> ===
> RCS file: /cvs/src/sys/netinet/ip_input.c,v
> retrieving revision 1.331
> diff -u -p -r1.331 ip_input.c
> --- netinet/ip_input.c10 Nov 2017 08:55:49 -  1.331
> +++ netinet/ip_input.c13 Nov 2017 08:51:37 -
> @@ -84,22 +84,6 @@
>  #include 
>  #endif
>  
> -int encdebug = 0;
> -int ipsec_keep_invalid = IPSEC_DEFAULT_EMBRYONIC_SA_TIMEOUT;
> -int ipsec_require_pfs = IPSEC_DEFAULT_PFS;
> -int ipsec_soft_allocations = IPSEC_DEFAULT_SOFT_ALLOCATIONS;
> -int ipsec_exp_allocations = IPSEC_DEFAULT_EXP_ALLOCATIONS;
> -int ipsec_soft_bytes = IPSEC_DEFAULT_SOFT_BYTES;
> -int ipsec_exp_bytes = IPSEC_DEFAULT_EXP_BYTES;
> -int ipsec_soft_timeout = IPSEC_DEFAULT_SOFT_TIMEOUT;
> -int ipsec_exp_timeout = IPSEC_DEFAULT_EXP_TIMEOUT;
> -int ipsec_soft_first_use = IPSEC_DEFAULT_SOFT_FIRST_USE;
> -int ipsec_exp_first_use = IPSEC_DEFAULT_EXP_FIRST_USE;
> -int ipsec_expire_acquire = IPSEC_DEFAULT_EXPIRE_ACQUIRE;
> -char ipsec_def_enc[20];
> -char ipsec_def_auth[20];
> -char ipsec_def_comp[20];
> -
>  /* values controllable via sysctl */
>  int  ipforwarding = 0;
>  int  ipmforwarding = 0;
> @@ -211,10 +195,6 @@ ip_init(void)
>   for (i = 0; defrootonlyports_udp[i] != 0; i++)
>   DP_SET(rootonlyports.udp, defrootonlyports_udp[i]);
>  
> - strlcpy(ipsec_def_enc, IPSEC_DEFAULT_DEF_ENC, sizeof(ipsec_def_enc));
> - strlcpy(ipsec_def_auth, IPSEC_DEFAULT_DEF_AUTH, sizeof(ipsec_def_auth));
> - strlcpy(ipsec_def_comp, IPSEC_DEFAULT_DEF_COMP, sizeof(ipsec_def_comp));
> -
>   mq_init(_mq, 64, IPL_SOFTNET);
>  
>  #ifdef IPSEC
> @@ -1643,26 +1623,25 @@ ip_sysctl(int *name, u_int namelen, void
> ip_mtudisc_timeout);
>   NET_UNLOCK();
>   return (error);
> +#ifdef IPSEC
> + case IPCTL_ENCDEBUG:
> + case IPCTL_IPSEC_EXPIRE_ACQUIRE:
> + case IPCTL_IPSEC_EMBRYONIC_SA_TIMEOUT:
> + case IPCTL_IPSEC_REQUIRE_PFS:
> + case IPCTL_IPSEC_SOFT_ALLOCATIONS:
> + case IPCTL_IPSEC_ALLOCATIONS:
> + case IPCTL_IPSEC_SOFT_BYTES:
> + case IPCTL_IPSEC_BYTES:
> + case IPCTL_IPSEC_TIMEOUT:
> + case IPCTL_IPSEC_SOFT_TIMEOUT:
> + case IPCTL_IPSEC_SOFT_FIRSTUSE:
> + case IPCTL_IPSEC_FIRSTUSE:
>   case IPCTL_IPSEC_ENC_ALGORITHM:
> - NET_LOCK();
> - error = sysctl_tstring(oldp, oldlenp, newp, newlen,
> -ipsec_def_enc, sizeof(ipsec_def_enc));
> - NET_UNLOCK();
> - return (error);
>   case IPCTL_IPSEC_AUTH_ALGORITHM:
> - NET_LOCK();
> - error = sysctl_tstring(oldp, oldlenp, newp, newlen,
> -ipsec_def_auth,
> -sizeof(ipsec_def_auth));
> - NET_UNLOCK();
> - return (error);
>   case IPCTL_IPSEC_IPCOMP_ALGORITHM:
> - NET_LOCK();
> - error = sysctl_tstring(oldp, oldlenp, newp, newlen,
> -ipsec_def_comp,
> -sizeof(ipsec_def_comp));
> - NET_UNLOCK();
> - return (error);
> + return (ipsec_sysctl(name, namelen, oldp, oldlenp, newp,
> + newlen));
> +#endif
>   case IPCTL_IFQUEUE:
>

Re: un-KERNEL_LOCK() TCP/UDP input & co

2017-11-13 Thread Visa Hankala
On Mon, Nov 13, 2017 at 01:25:42PM +0100, Martin Pieuchot wrote:
> I'd love is somebody could do the changes in the rwlock API to let us
> check if we're holding a lock.  Maybe this is already present in
> witness(4), visa?

witness(4) does maintain data about lock ownership. The patch below
might do the trick for rwlocks without adding extra state tracking.

The patch also fixes the witness-side initialization of statically
initialized locks, in functions witness_checkorder() and
witness_lock(). The fix belongs to a separate commit though.

Index: share/man/man9/rwlock.9
===
RCS file: src/share/man/man9/rwlock.9,v
retrieving revision 1.20
diff -u -p -r1.20 rwlock.9
--- share/man/man9/rwlock.9 30 Oct 2017 13:33:36 -  1.20
+++ share/man/man9/rwlock.9 13 Nov 2017 17:05:51 -
@@ -33,6 +33,7 @@
 .Nm rw_assert_anylock ,
 .Nm rw_assert_unlocked ,
 .Nm rw_status ,
+.Nm rw_status_curproc ,
 .Nm RWLOCK_INITIALIZER ,
 .Nm rrw_init ,
 .Nm rrw_init_flags ,
@@ -68,6 +69,8 @@
 .Fn rw_assert_unlocked "struct rwlock *rwl"
 .Ft int
 .Fn rw_status "struct rwlock *rwl"
+.Ft int
+.Fn rw_status_curproc "struct rwlock *rwl"
 .Fn RWLOCK_INITIALIZER "const char *name"
 .Ft void
 .Fn rrw_init "struct rrwlock *rrwl" "const char *name"
@@ -181,7 +184,7 @@ functions check the status
 .Fa rwl ,
 panicking if it is not write-, read-, any-, or unlocked, respectively.
 .Pp
-.Nm rw_status
+.Fn rw_status
 returns the current state of the lock:
 .Pp
 .Bl -tag -width "RW_WRITE_OTHER" -offset indent -compact
@@ -194,6 +197,22 @@ Lock is read locked.
 The current thread may be one of the threads that has it locked.
 .It 0
 Lock is not locked.
+.El
+.Pp
+.Fn rw_status_curproc
+returns the current state of the lock relative to the calling thread:
+.Pp
+.Bl -tag -width "RW_WRITE_OTHER" -offset indent -compact
+.It Dv RW_WRITE
+Lock is write locked by the calling thread.
+.It Dv RW_READ
+Lock is read locked.
+The current thread may be one of the threads that has it locked.
+If the kernel has been compiled with
+.Xr witness 4 ,
+the status is exact, and the lock is read locked by the current thread.
+.It 0
+Lock is not locked by the calling thread.
 .El
 .Pp
 A lock declaration may be initialised with the
Index: sys/kern/kern_rwlock.c
===
RCS file: src/sys/kern/kern_rwlock.c,v
retrieving revision 1.32
diff -u -p -r1.32 kern_rwlock.c
--- sys/kern/kern_rwlock.c  24 Oct 2017 08:53:15 -  1.32
+++ sys/kern/kern_rwlock.c  13 Nov 2017 17:05:51 -
@@ -325,6 +325,34 @@ rw_status(struct rwlock *rwl)
return (0);
 }
 
+int
+rw_status_curproc(struct rwlock *rwl)
+{
+   unsigned long owner;
+#ifdef WITNESS
+   int status = witness_status(>rwl_lock_obj);
+
+   if (status != -1) {
+   if (status & LA_XLOCKED)
+   return (RW_WRITE);
+   if (status & LA_SLOCKED)
+   return (RW_READ);
+   return (0);
+   }
+#endif
+
+   owner = rwl->rwl_owner;
+   if (owner & RWLOCK_WRLOCK) {
+   if (RW_PROC(curproc) == RW_PROC(owner))
+   return (RW_WRITE);
+   else
+   return (0);
+   }
+   if (owner)
+   return RW_READ;
+   return (0);
+}
+
 #ifdef DIAGNOSTIC
 void
 rw_assert_wrlock(struct rwlock *rwl)
Index: sys/kern/subr_witness.c
===
RCS file: src/sys/kern/subr_witness.c,v
retrieving revision 1.4
diff -u -p -r1.4 subr_witness.c
--- sys/kern/subr_witness.c 12 Aug 2017 03:13:23 -  1.4
+++ sys/kern/subr_witness.c 13 Nov 2017 17:05:51 -
@@ -744,8 +744,8 @@ witness_checkorder(struct lock_object *l
struct witness *w, *w1;
int i, j, s;
 
-   if (witness_cold || witness_watch < 1 || lock->lo_witness == NULL ||
-   panicstr != NULL)
+   if (witness_cold || witness_watch < 1 || panicstr != NULL ||
+   (lock->lo_flags & LO_WITNESS) == 0)
return;
 
w = lock->lo_witness;
@@ -1078,8 +1078,8 @@ witness_lock(struct lock_object *lock, i
struct witness *w;
int s;
 
-   if (witness_cold || witness_watch == -1 || lock->lo_witness == NULL ||
-   panicstr != NULL)
+   if (witness_cold || witness_watch == -1 || panicstr != NULL ||
+   (lock->lo_flags & LO_WITNESS) == 0)
return;
 
w = lock->lo_witness;
@@ -2004,6 +2004,42 @@ witness_assert(const struct lock_object 
 
}
 #endif /* INVARIANT_SUPPORT */
+}
+
+int
+witness_status(struct lock_object *lock)
+{
+   struct lock_instance *instance;
+   struct lock_class *class;
+   int status = 0;
+
+   if (witness_cold || witness_watch < 1 || panicstr != NULL ||
+   (lock->lo_flags & LO_WITNESS) == 0)
+   return -1;
+
+   class = LOCK_CLASS(lock);

Re: Question about tables in nested anchor on pf since 6.1

2017-11-13 Thread Leonardo Guardati
If I change the load statement in pf.uno using
the full path ( /uno/due instead of due ); thens
 there is no error; but still no table is loaded.

/etc/pf.conf:
###

block log


anchor "uno"
load anchor "uno" from "/etc/pf.uno"

###


/etc/pf.uno
###


anchor "due"
load anchor "/uno/due" from "/etc/pf.due"

###


/etc/pf.due
###

table  { 10.0.0.1 }

pass from 

###




Now no error is given:

# pfctl -ef /etc/pf.conf  
pfctl: pfctl_rules
pfctl: load anchors
# 


But the table is not loaded:

# pfctl -a uno/due -t foo -T show  
pfctl: Table does not exist.
# 



Re: xf86-video-intel patch to test

2017-11-13 Thread Stefan Sperling
On Sat, Nov 11, 2017 at 07:57:16PM +0100, Matthieu Herrb wrote:
> Hi,
> 
> the patch below should not affect the intel(4) X.Org driver
> functionality. It's sole purpose is to make it compatible with the
> future upgrade to the X.Org 1.19 xserver.
> 
> But since I don't have much hardware still using the intel driver (we
> switched to modesettings(4) for many devices), I'd like to have this
> tested against the current X server as much as possible.
> 
> To test it, you need to check out the /usr/xenocara source tree and
> then:
> 
> cd /usr/xenocara/driver/xf86-video-intel
> patch -p0 -E < /this/patch
> doas make -f Makefile.bsd-wrapper obj
> doas make -f Makefile.bsd-wrapper build
> 
> Then restart the X server.
> 
> Comments, oks are welcome too.
> 
> Thanks in advance.

No regression on my x201.



Question about tables in nested anchor on pf since 6.1

2017-11-13 Thread Leonardo Guardati
Hi,
   there is a confusing error message in 6.1 and 6.2
(not in 6.0) when using a table inside a nested anchor.

here the rules:

/etc/pf.conf:
###

block log


anchor "uno"
load anchor "uno" from "/etc/pf.uno"

###


/etc/pf.uno
###


anchor "due"
load anchor "due" from "/etc/pf.due"

###


/etc/pf.due
###

table  { 10.0.0.1 }

pass from 

###


on OpenBSD 6.0:

# pfctl -ef /etc/pf.conf
pfctl: pf already enabled



on 6.1 and 6.2:

# pfctl -ef /etc/pf.conf
/etc/pf.due:1: cannot define table foo: Device busy
pfctl: Syntax error in config file: pf rules not loaded
pfctl: load anchors




I've tried to debug, and here is the backtrace for 6.0 and 6.1:


OpenBSD-6.0:

Thread 3 hit Breakpoint 1, pfr_ina_define (tbl=0x80314800, 
addr=0x10cf6f2a7300, size=1, nadd=0x80314c3c, naddr=0x80314c38, 
ticket=11, flags=268435472)
at ../../../../net/pf_table.c:1609
1609{
(gdb) bt
#0  pfr_ina_define (tbl=0x80314800, addr=0x10cf6f2a7300, size=1, 
nadd=0x80314c3c, naddr=0x80314c38, ticket=11, flags=268435472) 
at ../../../../net/pf_table.c:1609
#1  0x811ca27a in pfioctl (dev=18688, cmd=3293594701, 
addr=0x80314800 "uno/due", flags=3, p=0x8000212a5c88) at 
../../../../net/pf_ioctl.c:1999
#2  0x8129b086 in spec_ioctl (v=0x8000212eeb40) at 
../../../../kern/spec_vnops.c:370
#3  0x812979b7 in VOP_IOCTL (vp=0xff006fa93cc0, command=3293594701, 
data=0x80314800, fflag=3, cred=0xff0005bfc840, p=0x8000212a5c88)
at ../../../../kern/vfs_vops.c:259
#4  0x81299600 in vn_ioctl (fp=0xff006db65558, com=3293594701, 
data=0x80314800 "uno/due", p=0x8000212a5c88) at 
../../../../kern/vfs_vnops.c:485
#5  0x8125b746 in sys_ioctl (p=0x8000212a5c88, 
v=0x8000212eee50, retval=0x8000212eeea0) at 
../../../../kern/sys_generic.c:516
#6  0x8147fea0 in mi_syscall (p=0x8000212a5c88, code=54, 
callp=0x81b87040 , argp=0x8000212eee50, 
retval=0x8000212eeea0)
at ../../../../sys/syscall_mi.h:77
#7  0x8147fc94 in syscall (frame=0x8000212eef20) at 
../../../../arch/amd64/amd64/trap.c:597
#8  0x8100180b in Xsyscall ()
#9  0x0003 in ?? ()
#10 0xc450444d in ?? ()
#11 0x7f7d0e40 in ?? ()
#12 0x10cd57535c1a in ?? ()
#13 0x7f7d1268 in ?? ()
#14 0x7f7d1728 in ?? ()
#15 0x in ?? ()
(gdb) 







OpenBSD-6.1:
Thread 1 hit Breakpoint 1, pfr_ina_define (tbl=0x8035c800, 
addr=0x1fced50fc300, size=1, nadd=0x8035cc3c, naddr=0x8035cc38, 
ticket=7, flags=268435472)
at /usr/src/sys/net/pf_table.c:1599
1599{
(gdb) bt
#0  pfr_ina_define (tbl=0x8035c800, addr=0x1fced50fc300, size=1, 
nadd=0x8035cc3c, naddr=0x8035cc38, ticket=7, flags=268435472) 
at /usr/src/sys/net/pf_table.c:1599
#1  0x811cb163 in pfioctl (dev=18688, cmd=3293594701, 
addr=0x8035c800 "/due", flags=3, p=0x8000212ab0d8) at 
/usr/src/sys/net/pf_ioctl.c:2000
#2  0x8129a8f6 in spec_ioctl (v=0x80002132cb40) at 
/usr/src/sys/kern/spec_vnops.c:370
#3  0x81297223 in VOP_IOCTL (vp=0xff0056011230, command=3293594701, 
data=0x8035c800, fflag=3, cred=0xff007f7ac840, p=0x8000212ab0d8)
at /usr/src/sys/kern/vfs_vops.c:259
#4  0x81298e71 in vn_ioctl (fp=0xff005c9d1aa0, com=3293594701, 
data=0x8035c800 "/due", p=0x8000212ab0d8) at 
/usr/src/sys/kern/vfs_vnops.c:487
#5  0x8125c5ba in sys_ioctl (p=0x8000212ab0d8, 
v=0x80002132ce50, retval=0x80002132cea0) at 
/usr/src/sys/kern/sys_generic.c:516
#6  0x8148a642 in mi_syscall (p=0x8000212ab0d8, code=54, 
callp=0x81bc1260 , argp=0x80002132ce50, 
retval=0x80002132cea0)
at /usr/src/sys/sys/syscall_mi.h:77
#7  0x8148a436 in syscall (frame=0x80002132cf20) at 
/usr/src/sys/arch/amd64/amd64/trap.c:600
#8  0x8100180b in Xsyscall ()
#9  0x0003 in ?? ()
#10 0xc450444d in ?? ()
#11 0x7f7bbae0 in ?? ()
#12 0x1fcccfb2f47a in ?? ()
#13 0x7f7bbf08 in ?? ()
#14 0x7f7bc3c8 in ?? ()
#15 0x in ?? ()
(gdb) 





I can see that in 6.0 pfioctl() is passed "uno/due"; while in 6.1 there is only 
"/due" in addr.

Also, I see the code execution difference at /usr/src/sys/net/pf_table.c:1624 
when calling:

  rs = pf_find_ruleset(tbl->pfrt_anchor);

in 6.0 I have rs filled, with :

(gdb) p rs
$2 = (struct pf_ruleset *) 0x80310490
(gdb) p rs->topen
$3 = 1
(gdb) p ticket
$4 = 11
(gdb) p rs->tticket
$5 = 11

that make the the following test pass:

1625if (rs == NULL || !rs->topen || ticket != rs->tticket)
1626return (EBUSY);


this is not happening in 6.1:

(gdb) 

NET_RLOCK in ifioctl()

2017-11-13 Thread Theo Buehler
The diff below pushes the NET_LOCK into ifioctl() and reduces it to
NET_RLOCK where possible. In particular, this will allow SIOCG* requests
to run in parallel.

ok?

Index: sys/kern/sys_socket.c
===
RCS file: /var/cvs/src/sys/kern/sys_socket.c,v
retrieving revision 1.33
diff -u -p -r1.33 sys_socket.c
--- sys/kern/sys_socket.c   11 Aug 2017 21:24:19 -  1.33
+++ sys/kern/sys_socket.c   13 Nov 2017 08:52:44 -
@@ -125,9 +125,7 @@ soo_ioctl(struct file *fp, u_long cmd, c
 * different entry since a socket's unnecessary
 */
if (IOCGROUP(cmd) == 'i') {
-   NET_LOCK();
error = ifioctl(so, cmd, data, p);
-   NET_UNLOCK();
return (error);
}
if (IOCGROUP(cmd) == 'r')
Index: sys/net/if.c
===
RCS file: /var/cvs/src/sys/net/if.c,v
retrieving revision 1.526
diff -u -p -r1.526 if.c
--- sys/net/if.c12 Nov 2017 14:11:15 -  1.526
+++ sys/net/if.c13 Nov 2017 08:52:44 -
@@ -1810,16 +1810,26 @@ ifioctl(struct socket *so, u_long cmd, c
 
switch (cmd) {
case SIOCIFCREATE:
+   if ((error = suser(p, 0)) != 0)
+   return (error);
+   NET_LOCK();
+   error = if_clone_create(ifr->ifr_name, 0);
+   NET_UNLOCK();
+   return (error);
case SIOCIFDESTROY:
if ((error = suser(p, 0)) != 0)
return (error);
-   return ((cmd == SIOCIFCREATE) ?
-   if_clone_create(ifr->ifr_name, 0) :
-   if_clone_destroy(ifr->ifr_name));
+   NET_LOCK();
+   error = if_clone_destroy(ifr->ifr_name);
+   NET_UNLOCK();
+   return (error);
case SIOCSIFGATTR:
if ((error = suser(p, 0)) != 0)
return (error);
-   return (if_setgroupattribs(data));
+   NET_LOCK();
+   error = if_setgroupattribs(data);
+   NET_UNLOCK();
+   return (error);
case SIOCGIFCONF:
case SIOCIFGCLONERS:
case SIOCGIFGMEMB:
@@ -1845,6 +1855,8 @@ ifioctl(struct socket *so, u_long cmd, c
oif_flags = ifp->if_flags;
oif_xflags = ifp->if_xflags;
 
+   NET_LOCK();
+
switch (cmd) {
case SIOCIFAFATTACH:
case SIOCIFAFDETACH:
@@ -2093,6 +2105,8 @@ ifioctl(struct socket *so, u_long cmd, c
if (((oif_flags ^ ifp->if_flags) & IFF_UP) != 0)
getmicrotime(>if_lastchange);
 
+   NET_UNLOCK();
+
return (error);
 }
 
@@ -2109,19 +2123,33 @@ ifioctl_get(u_long cmd, caddr_t data)
 
switch(cmd) {
case SIOCGIFCONF:
-   return (ifconf(data));
+   NET_RLOCK();
+   error = ifconf(data);
+   NET_RUNLOCK();
+   return (error);
case SIOCIFGCLONERS:
-   return (if_clone_list((struct if_clonereq *)data));
+   NET_RLOCK();
+   error = if_clone_list((struct if_clonereq *)data);
+   NET_RUNLOCK();
+   return (error);
case SIOCGIFGMEMB:
-   return (if_getgroupmembers(data));
+   NET_RLOCK();
+   error = if_getgroupmembers(data);
+   NET_RUNLOCK();
+   return (error);
case SIOCGIFGATTR:
-   return (if_getgroupattribs(data));
+   NET_RLOCK();
+   error = if_getgroupattribs(data);
+   NET_RUNLOCK();
+   return (error);
}
 
ifp = ifunit(ifr->ifr_name);
if (ifp == NULL)
return (ENXIO);
 
+   NET_RLOCK();
+
switch(cmd) {
case SIOCGIFFLAGS:
ifr->ifr_flags = ifp->if_flags;
@@ -2187,6 +2215,8 @@ ifioctl_get(u_long cmd, caddr_t data)
default:
panic("invalid ioctl %lu", cmd);
}
+
+   NET_RUNLOCK();
 
return (error);
 }
Index: sys/nfs/nfs_boot.c
===
RCS file: /var/cvs/src/sys/nfs/nfs_boot.c,v
retrieving revision 1.43
diff -u -p -r1.43 nfs_boot.c
--- sys/nfs/nfs_boot.c  11 Aug 2017 21:24:20 -  1.43
+++ sys/nfs/nfs_boot.c  13 Nov 2017 08:52:44 -
@@ -159,15 +159,11 @@ nfs_boot_init(struct nfs_diskless *nd, s
 */
if ((error = socreate(AF_INET, , SOCK_DGRAM, 0)) != 0)
panic("nfs_boot: socreate, error=%d", error);
-   NET_LOCK();
error = ifioctl(so, SIOCGIFFLAGS, (caddr_t), procp);
-   NET_UNLOCK();
if (error)
panic("nfs_boot: GIFFLAGS, error=%d", error);
ireq.ifr_flags |= IFF_UP;
-   NET_LOCK();
error = ifioctl(so, 

ifpromic() change & bridge/switch(4) cleanup

2017-11-13 Thread Martin Pieuchot
In order to reduce the number of ifp->if_ioctl() calls, I'd like to
simplify ifpromic().

There's no need for the underlying interface to be UP when calling
ifpromic().  If the interface is DOWN, set or remove the flag and
return.  Next time the interface will be brought UP its multicast
filters/promiscuous mode will be configured.

This allows us to remove duplicated code in bridge/switch(4) to work
around the stupid requirement.

ok?

Index: net/if.c
===
RCS file: /cvs/src/sys/net/if.c,v
retrieving revision 1.526
diff -u -p -r1.526 if.c
--- net/if.c12 Nov 2017 14:11:15 -  1.526
+++ net/if.c13 Nov 2017 15:19:44 -
@@ -2639,32 +2639,33 @@ int
 ifpromisc(struct ifnet *ifp, int pswitch)
 {
struct ifreq ifr;
+   unsigned short oif_flags;
+   int oif_pcount, error;
 
+   oif_flags = ifp->if_flags;
+   oif_pcount = ifp->if_pcount;
if (pswitch) {
-   /*
-* If the device is not configured up, we cannot put it in
-* promiscuous mode.
-*/
-   if ((ifp->if_flags & IFF_UP) == 0)
-   return (ENETDOWN);
if (ifp->if_pcount++ != 0)
return (0);
ifp->if_flags |= IFF_PROMISC;
} else {
if (--ifp->if_pcount > 0)
return (0);
-   ifp->if_flags &= ~IFF_PROMISC;
-   /*
-* If the device is not configured up, we should not need to
-* turn off promiscuous mode (device should have turned it
-* off when interface went down; and will look at IFF_PROMISC
-* again next time interface comes up).
-*/
-   if ((ifp->if_flags & IFF_UP) == 0)
-   return (0);
+ifp->if_flags &= ~IFF_PROMISC;
}
+
+   if ((ifp->if_flags & IFF_UP) == 0)
+   return (0);
+
+   memset(, 0, sizeof(ifr));
ifr.ifr_flags = ifp->if_flags;
-   return ((*ifp->if_ioctl)(ifp, SIOCSIFFLAGS, (caddr_t)));
+   error = ((*ifp->if_ioctl)(ifp, SIOCSIFFLAGS, (caddr_t)));
+   if (error) {
+   ifp->if_flags = oif_flags;
+   ifp->if_pcount = oif_pcount;
+   }
+
+   return (error);
 }
 
 void
Index: net/if_bridge.c
===
RCS file: /cvs/src/sys/net/if_bridge.c,v
retrieving revision 1.298
diff -u -p -r1.298 if_bridge.c
--- net/if_bridge.c 17 Aug 2017 10:14:08 -  1.298
+++ net/if_bridge.c 13 Nov 2017 15:21:11 -
@@ -299,41 +299,9 @@ bridge_ioctl(struct ifnet *ifp, u_long c
}
 
if (ifs->if_type == IFT_ETHER) {
-   if ((ifs->if_flags & IFF_UP) == 0) {
-   struct ifreq ifreq;
-
-   /*
-* Bring interface up long enough to set
-* promiscuous flag, then shut it down again.
-*/
-   strlcpy(ifreq.ifr_name, req->ifbr_ifsname,
-   IFNAMSIZ);
-   ifs->if_flags |= IFF_UP;
-   ifreq.ifr_flags = ifs->if_flags;
-   error = (*ifs->if_ioctl)(ifs, SIOCSIFFLAGS,
-   (caddr_t));
-   if (error != 0)
-   break;
-
-   error = ifpromisc(ifs, 1);
-   if (error != 0)
-   break;
-
-   strlcpy(ifreq.ifr_name, req->ifbr_ifsname,
-   IFNAMSIZ);
-   ifs->if_flags &= ~IFF_UP;
-   ifreq.ifr_flags = ifs->if_flags;
-   error = (*ifs->if_ioctl)(ifs, SIOCSIFFLAGS,
-   (caddr_t));
-   if (error != 0) {
-   ifpromisc(ifs, 0);
-   break;
-   }
-   } else {
-   error = ifpromisc(ifs, 1);
-   if (error != 0)
-   break;
-   }
+   error = ifpromisc(ifs, 1);
+   if (error != 0)
+   break;
}
 #if NMPW > 0
else if (ifs->if_type == IFT_MPLSTUNNEL) {
Index: net/if_switch.c
===
RCS file: /cvs/src/sys/net/if_switch.c,v
retrieving revision 1.20
diff -u -p -r1.20 if_switch.c
--- net/if_switch.c 31 May 2017 

Re: un-KERNEL_LOCK() TCP/UDP input & co

2017-11-13 Thread Alexandr Nedvedicky
Hello,

> > So I'd like to know, what's the plan for NET_ASSERT_LOCKED() macro?
> > 
> > a) are we going to relax it, so it will test if the net_lock is
> > locked?
> 
> Yep, that's already done.

cool, thanks a lot. I've just noticed the change is there while ago, while
reading another thread here [1] (assertion "_kernel_lock_held()" failed,
uipc_socket2.c: ipsec)


thanks and
regards
sasha

[1] https://marc.info/?l=openbsd-tech=151052226224227=2 



Re: README patch

2017-11-13 Thread Sebastian Benoit
Jason McIntyre(j...@kerhand.co.uk) on 2017.11.09 19:45:53 +:
> On Thu, Nov 09, 2017 at 08:42:32PM +0100, Ingo Schwarze wrote:
> > Hi,
> > 
> > Jason McIntyre wrote on Thu, Nov 09, 2017 at 07:26:32PM +:
> > > On Wed, Nov 08, 2017 at 08:14:24PM -0600, Edgar Pettijohn wrote:
> > 
> > >> --- README.orig 2017-11-08 20:11:47.091955000 -0600
> > >> +++ README  2017-11-08 20:12:19.787639000 -0600
> > >> @@ -49,8 +49,8 @@
> > >>  
> > >>  No major funding or cost-sharing of the project comes from any company
> > >>  or educational institution. Theo works full-time on improving OpenBSD
> > >> -and paying bills, many other developers expend spend significant
> > >> -quantities of time as well.
> > >> +and paying bills, many other developers spend significant quantities
> > >> +of time as well.
> > >>  
> > >>  For those unable to make their contributions as straightforward gifts,
> > >>  the OpenBSD Foundation (http://OpenBSDFoundation.org) is a Canadian
> > >> 
> > >> 
> > >> Or perhaps it should be expend instead. Not sure, but both sound weird.
> > 
> > > both suggested fixes would be fine.
> > 
> > No it would not.  The first sentence is no longer true, see
> > 
> >   http://www.openbsdfoundation.org/contributors.html
> > 
> > So this would need a much larger fix.
> > 
> 
> fair enough. my comment was about the grammar really, rather than the
> content.
> 
> > > however i have no idea which README this is...
> > 
> > Indeed, i see no evidence that this old text still exists anywhere.
> > At least
> > 
> >   locate README | xargs grep -F 'No major funding'
> > 
> > turns up nothing for me.

http://ftp.openbsd.org/pub/OpenBSD/6.2/README



Re: assertion "_kernel_lock_held()" failed, uipc_socket2.c: ipsec

2017-11-13 Thread Martin Pieuchot
On 13/11/17(Mon) 12:33, Stuart Henderson wrote:
> On 2017/11/13 13:17, Martin Pieuchot wrote:
> > [...]
> > So it seems that two of your CPU end up looking at/dealing with
> > corrupted memory...
> 
> Is that for sure? 2 does normally print a trace, 3 also drops into ddb.

But none of them print:

panic: spl assertion failure in soassertlocked.

However it might just be a race because the other CPU just entered panic
and set splassert_ctl to 0.

> Same after an hour or two uptime, but this time I get some "netlock:
> lock not held" from some cpu or other, and some functions in the bits of
> the trace that get displayed:
> 
> login: panic: kernel diagnostic assertion "_kernel_lock_held()" failed: file 
> "/src/cvs-openbsd/sys/kern/uipc_socket2.c", line 310
> Starting stack trace...
> panic() at panic+0x11b
> __assert(812105d4,80001f898a70,ff0063dc5b00,ff0061804318) 
> at __assert+0x24
> sbappendaddr(0,ff0061804318,ff005fca5600,0,ff0063dc5b00) at 
> sbappendaddrpanic: netlock: lock not held

Does the diff below help?  It should in any case reduce the "netlock:
lock not held" noises.

Index: net/pfkeyv2.c
===
RCS file: /cvs/src/sys/net/pfkeyv2.c,v
retrieving revision 1.173
diff -u -p -r1.173 pfkeyv2.c
--- net/pfkeyv2.c   12 Nov 2017 14:11:15 -  1.173
+++ net/pfkeyv2.c   13 Nov 2017 12:57:36 -
@@ -428,12 +428,14 @@ pfkeyv2_sendmessage(void **headers, int 
 * Search for promiscuous listeners, skipping the
 * original destination.
 */
+   KERNEL_LOCK();
LIST_FOREACH(s, _sockets, kcb_list) {
if ((s->flags & PFKEYV2_SOCKETFLAGS_PROMISC) &&
(s->rcb.rcb_socket != so) &&
(s->rdomain == rdomain))
pfkey_sendup(s, packet, 1);
}
+   KERNEL_UNLOCK();
m_freem(packet);
break;
 
@@ -442,6 +444,7 @@ pfkeyv2_sendmessage(void **headers, int 
 * Send the message to all registered sockets that match
 * the specified satype (e.g., all IPSEC-ESP negotiators)
 */
+   KERNEL_LOCK();
LIST_FOREACH(s, _sockets, kcb_list) {
if ((s->flags & PFKEYV2_SOCKETFLAGS_REGISTERED) &&
(s->rdomain == rdomain)) {
@@ -454,6 +457,7 @@ pfkeyv2_sendmessage(void **headers, int 
}
}
}
+   KERNEL_UNLOCK();
/* Free last/original copy of the packet */
m_freem(packet);
 
@@ -472,21 +476,25 @@ pfkeyv2_sendmessage(void **headers, int 
goto ret;
 
/* Send to all registered promiscuous listeners */
+   KERNEL_LOCK();
LIST_FOREACH(s, _sockets, kcb_list) {
if ((s->flags & PFKEYV2_SOCKETFLAGS_PROMISC) &&
!(s->flags & PFKEYV2_SOCKETFLAGS_REGISTERED) &&
(s->rdomain == rdomain))
pfkey_sendup(s, packet, 1);
}
+   KERNEL_UNLOCK();
m_freem(packet);
break;
 
case PFKEYV2_SENDMESSAGE_BROADCAST:
/* Send message to all sockets */
+   KERNEL_LOCK();
LIST_FOREACH(s, _sockets, kcb_list) {
if (s->rdomain == rdomain)
pfkey_sendup(s, packet, 1);
}
+   KERNEL_UNLOCK();
m_freem(packet);
break;
}
@@ -1010,11 +1018,13 @@ pfkeyv2_send(struct socket *so, void *me
goto ret;
 
/* Send to all promiscuous listeners */
+   KERNEL_LOCK();
LIST_FOREACH(bkp, _sockets, kcb_list) {
if ((bkp->flags & PFKEYV2_SOCKETFLAGS_PROMISC) &&
(bkp->rdomain == rdomain))
pfkey_sendup(bkp, packet, 1);
}
+   KERNEL_UNLOCK();
 
m_freem(packet);
 
@@ -1788,12 +1798,15 @@ pfkeyv2_send(struct socket *so, void *me
if ((rval = pfdatatopacket(message, len, )) != 0)
goto ret;
 
-   LIST_FOREACH(bkp, _sockets, kcb_list)
+   KERNEL_LOCK();
+   LIST_FOREACH(bkp, _sockets, kcb_list) {
if ((bkp != kp) &&
(bkp->rdomain == rdomain) &&
(!smsg->sadb_msg_seq ||
(smsg->sadb_msg_seq == kp->pid)))
pfkey_sendup(bkp, packet, 1);
+

Re: assertion "_kernel_lock_held()" failed, uipc_socket2.c: ipsec

2017-11-13 Thread Sebastien Marie
On Mon, Nov 13, 2017 at 12:33:35PM +, Stuart Henderson wrote:
> 
> Same after an hour or two uptime, but this time I get some "netlock:
> lock not held" from some cpu or other, and some functions in the bits of
> the trace that get displayed:
> 
> login: panic: kernel diagnostic assertion "_kernel_lock_held()" failed: file 
> "/src/cvs-openbsd/sys/kern/uipc_socket2.c", line 310

just a simple question regarding the previous line.

does the start of the line ("login: ") is part of the kernel output or
it is just the login(1) prompt on console (printed long time before the
panic) and you copied the whole line ?

thanks.
-- 
Sebastien Marie



Re: try to bundle multiple packets on an ifq before sending

2017-11-13 Thread Martin Pieuchot
On 13/11/17(Mon) 20:28, David Gwynne wrote:
> 
> > On 13 Nov 2017, at 5:37 pm, Martin Pieuchot  wrote:
> > 
> > On 13/11/17(Mon) 10:56, David Gwynne wrote:
> >> On Sun, Nov 12, 2017 at 02:45:05PM +0100, Martin Pieuchot wrote:
> >>> [...]
> >>> We're currently using net_tq() to distribute load for incoming packets.
> >>> So I believe you should schedule the task on the current taskq or the
> >>> first one if coming from userland.
> >> 
> >> /em shrug.
> >> 
> >> i dont know how to tell what the current softnet tq is.
> > 
> > That's something we need to know.  This will allow to have per taskq
> > data structure.
> 
> maybe we could use curproc()->p_spare as an index into the array of softnets.

I'd would be handy.

> >> be enough to simply not mix if the ifq_idx into the argument to
> >> net_tq?
> > 
> > Maybe.  If a CPU processing packets from myx0 in softnet0 and want to
> > send forward via myx1, using net_tq() like you do it know will pick
> > softnet1.
> > My suggestion was to keep softnet1 busy with processing packets coming
> > from myx1 and enqueue the task on softnet0.  This way no inter CPU
> > communication is needed.
> > But without testing we can't tell if it's more efficient the other way.
> 
> scheduling the bundled start routine as a task on the current softnet would 
> require each ifq to allocate a task for each softnet, and would probably 
> result in a lot of thrashing if multiple threads are pushing packets onto a 
> ring. one of them would have to remove the work from all of them.
> 
> i think this is good enough for now. my tests with 2 softnets mixed in with 
> this diff seemed ok. with 8 it sucked cos there was so much waiting on the 
> netlock, so it's hard to say what the effect of this is.

Good to know.  Then let's move forward the way it is.
 
>   because ifq work could now be pending in a softnet taskq,
>  ifq_barrier also needs to put a barrier in the taskq. this is
>  implemented using taskq_barrier, which i wrote ages ago but didn't
>  have a use case for at the time.
> >> 
> >> both hrvoje and i hit a deadlock when downing an interface. if
> >> softnet (or softnets) are waiting on the netlock that and ioctl
> >> path holds, then an ifq_barrier in that ioctl path will sleep forever
> >> waiting for a task to run in softnets that are waiting for the
> >> netlock.
> >> 
> >> to mitigate this ive added code like what uvm has. thoughts?
> > 
> > That's only called in ioctl(2) path, right?  Which commands end up
> > there?
> 
> ifq_barrier is generally called from the drv_stop/drv_down path by mpsafe 
> drivers. if you go ifconfig down is the most obvious. other ioctl paths 
> include those that generate ENETRESET, eg, configuring lladdr. some drivers 
> also call their down routine during its own reconfiguration, or to fix a 
> stuck ring.
> 
> this diff also adds a call to ifq_barrier inside ifq_destroy, which is called 
> when an interface is detached.

Then please add the following comment on top of the unlock:

/* XXXSMP breaks atomicity */

It helps me keep track of the recursions that need to be fixed.  With
that ok mpi@

> > The problem with driver *_ioctl() routines is that pseudo-driver need
> > the NET_LOCK() while the others should not.  So I'd rather see the
> > NET_LOCK() released before calling ifp->if_ioctl().
> > 
> >> Index: share/man/man9/task_add.9
> >> ===
> >> RCS file: /cvs/src/share/man/man9/task_add.9,v
> >> retrieving revision 1.16
> >> diff -u -p -r1.16 task_add.9
> >> --- share/man/man9/task_add.9  14 Sep 2015 15:14:55 -  1.16
> >> +++ share/man/man9/task_add.9  13 Nov 2017 00:46:17 -
> >> @@ -20,6 +20,7 @@
> >> .Sh NAME
> >> .Nm taskq_create ,
> >> .Nm taskq_destroy ,
> >> +.Nm taskq_barrier ,
> >> .Nm task_set ,
> >> .Nm task_add ,
> >> .Nm task_del ,
> >> @@ -37,6 +38,8 @@
> >> .Ft void
> >> .Fn taskq_destroy "struct taskq *tq"
> >> .Ft void
> >> +.Fn taskq_barrier "struct taskq *tq"
> >> +.Ft void
> >> .Fn task_set "struct task *t" "void (*fn)(void *)" "void *arg"
> >> .Ft int
> >> .Fn task_add "struct taskq *tq" "struct task *t"
> >> @@ -88,6 +91,15 @@ Calling
> >> against the system taskq is an error and will lead to undefined
> >> behaviour or a system fault.
> >> .Pp
> >> +.Fn taskq_barrier
> >> +guarantees that any task that was running on the
> >> +.Fa tq
> >> +taskq when the barrier was called has finished by the time the barrier
> >> +returns.
> >> +.Fn taskq_barrier
> >> +is only supported on taskqs serviced by 1 thread,
> >> +and may not be called by a task running in the specified taskq.
> >> +.Pp
> >> It is the responsibility of the caller to provide the
> >> .Fn task_set ,
> >> .Fn task_add ,
> >> @@ -163,6 +175,8 @@ argument given in
> >> and
> >> .Fn taskq_destroy
> >> can be called during autoconf, or from process context.
> >> +.Fn taskq_barrier
> >> +can be called from process context.
> >> .Fn 

Re: assertion "_kernel_lock_held()" failed, uipc_socket2.c: ipsec

2017-11-13 Thread Stuart Henderson
On 2017/11/13 13:17, Martin Pieuchot wrote:
> On 13/11/17(Mon) 10:03, Stuart Henderson wrote:
> > On 2017/11/13 08:44, Martin Pieuchot wrote:
> > > On 12/11/17(Sun) 22:10, Stuart Henderson wrote:
> > > > On 2017/11/12 22:48, Martin Pieuchot wrote:
> > > > > On 12/11/17(Sun) 21:30, Stuart Henderson wrote:
> > > > > > iked box, GENERIC.MP + WITNESS, -current as of Friday 10th:
> > > > > 
> > > > > Weird, did you tweak "kern.splassert" on this box?   Otherwise is 
> > > > > looks
> > > > > like a major corruption.
> > > > 
> > > > It would have kern.splassert=2. (I know this can cause problems
> > > > sometimes, though this would be the first time in 5+ years I've bumped
> > > > into it, most of my routers where I have serial console have this set).
> > > 
> > > Well the panic below correspond to a value of 0 or > 3.
> > 
> > Confirmed, it was definitely set to 2.
> 
> So it seems that two of your CPU end up looking at/dealing with
> corrupted memory...

Is that for sure? 2 does normally print a trace, 3 also drops into ddb.

> > > > I'm trying to get more information because it had either hanged or
> > > > panicked previously (it didn't have serial connected at the time and
> > > > the machine was needed so it had to be rebooted before I had chance
> > > > to dig into it).
> > > 
> > > From which snapshot was the kernel that hanged or panic'd?
> > > 
> > 
> > It was running this:
> > 
> > OpenBSD 6.2-current (GENERIC.MP) #199: Tue Nov  7 18:41:54 MST 2017
> > 
> > I've got it onto a remote control PDU now, now looking for some machine
> > with an old enough ssh client to be able to connect to the PDU :-|
> > 
> > Which kernel would be most useful to run now?
> 
> -current
> 
> > I have now moved it to -current GENERIC.MP with the "fast path chunk
> > removed from amd64/amd64/fpu.c fpu_kernel_enter() which we still suspect
> > as maybe having some issues.
> 
> That's perfect from my point of view.
> 

Same after an hour or two uptime, but this time I get some "netlock:
lock not held" from some cpu or other, and some functions in the bits of
the trace that get displayed:

login: panic: kernel diagnostic assertion "_kernel_lock_held()" failed: file 
"/src/cvs-openbsd/sys/kern/uipc_socket2.c", line 310
Starting stack trace...
panic() at panic+0x11b
__assert(812105d4,80001f898a70,ff0063dc5b00,ff0061804318) 
at __assert+0x24
sbappendaddr(0,ff0061804318,ff005fca5600,0,ff0063dc5b00) at 
sbappendaddrpanic: netlock: lock not held
Faulted in traceback, aborting...
+0x276
pfkey_sendup(4,c,808f8b00) at pfkey_sendup+0x75
pfkeyv2_sendmessage(ff00617e9160,80902700,ff00617e00a0,1,809027d8,2)
 at pfkeyv2_sendmessage+0x228
pfkeyv2_acquire(ff00617e924c,ff0067772090,ff006777201c,ff00617e9160,80001f898dc8)
 at pfkeyv2_acquire+0x553
ipsp_acquire_sa(ff00617e9160,0,804d3880,80001f898f20,0) at 
panic: netlock: lock not heldipsp_acquire_sa
Faulted in traceback, aborting...
+0x4c6panic: netlock: lock not held
Faulted in traceback, aborting...

panic: netlock: lock not held
Faulted in traceback, aborting...
ipsp_spd_lookup(panic: ff0005747400,netlock: lock not held
Faulted in traceback, aborting...
0,panic: netlock: lock not held804dc900,80001f898fb0
Faulted in traceback, aborting...
,panic: netlock: lock not held
Faulted in traceback, aborting...
0,panic: netlock: lock not held
Faulted in traceback, aborting...
9c519d9d517a98c1) at panic: netlock: lock not held
Faulted in traceback, aborting...
ipsp_spd_lookuppanic: netlock: lock not held+0xcbe
Faulted in traceback, aborting...

panic: netlock: lock not held
Faulted in traceback, aborting...
ip_output_ipsec_lookup(panic: netlock: lock not held
Faulted in traceback, aborting...
80001f898fc0,panic: netlock: lock not held
Faulted in traceback, aborting...
ff006276f4d4,panic: netlock: lock not held804dc900
Faulted in traceback, aborting...
,panic: netlock: lock not held
Faulted in traceback, aborting...
80001f898fb0,panic: netlock: lock not held
Faulted in traceback, aborting...
0) at panic: netlock: lock not held
Faulted in traceback, aborting...
ip_output_ipsec_lookuppanic: netlock: lock not held+0x34
Faulted in traceback, aborting...

panic: netlock: lock not held
Faulted in traceback, aborting...
ip_output(panic: netlock: lock not held
Faulted in traceback, aborting...
0,panic: 0,netlock: lock not held
Faulted in traceback, aborting...
1,panic: netlock: lock not held
Faulted in traceback, aborting...
ff00615ed020panic: netlock: lock not held
Faulted in traceback, aborting...
,panic: ff0005747400,netlock: lock not held
Faulted in traceback, aborting...
9c519d9d517a98c1) at panic: ip_outputnetlock: lock not held
Faulted in traceback, aborting...
+0x3e7panic: netlock: lock not held
Faulted in traceback, aborting...

panic: netlock: lock not held
Faulted in traceback, aborting...
ip_forward(panic: netlock: lock not held
Faulted in traceback, 

Introduce ipsec_sysctl()

2017-11-13 Thread Martin Pieuchot
This move all IPsec tunables to netinet/ipsec_input.c without breaking
the "net.inet.ip" sysctl(3) namespace.   

The reason for this move is to properly separate IPsec and IP globals
in order to ease the removal of the NET_LOCK() in these layers.

ok?

Index: netinet/in.h
===
RCS file: /cvs/src/sys/netinet/in.h,v
retrieving revision 1.125
diff -u -p -r1.125 in.h
--- netinet/in.h6 Oct 2017 21:14:55 -   1.125
+++ netinet/in.h13 Nov 2017 12:11:16 -
@@ -745,19 +745,19 @@ struct ip_mreq {
_hifirstauto, \
_hilastauto, \
_maxqueue, \
-   , \
+   NULL /* encdebug */, \
NULL, \
-   _expire_acquire, \
-   _keep_invalid, \
-   _require_pfs, \
-   _soft_allocations, \
-   _exp_allocations, \
-   _soft_bytes, \
-   _exp_bytes, \
-   _exp_timeout, \
-   _soft_timeout, \
-   _soft_first_use, \
-   _exp_first_use, \
+   NULL /* ipsec_expire_acquire */, \
+   NULL /* ipsec_keep_invalid */, \
+   NULL /* ipsec_require_pfs */, \
+   NULL /* ipsec_soft_allocations */, \
+   NULL /* ipsec_exp_allocations */, \
+   NULL /* ipsec_soft_bytes */, \
+   NULL /* ipsec_exp_bytes */, \
+   NULL /* ipsec_exp_timeout */, \
+   NULL /* ipsec_soft_timeout */, \
+   NULL /* ipsec_soft_first_use */, \
+   NULL /* ipsec_exp_first_use */, \
NULL, \
NULL, \
NULL, \
Index: netinet/ip_input.c
===
RCS file: /cvs/src/sys/netinet/ip_input.c,v
retrieving revision 1.331
diff -u -p -r1.331 ip_input.c
--- netinet/ip_input.c  10 Nov 2017 08:55:49 -  1.331
+++ netinet/ip_input.c  13 Nov 2017 08:51:37 -
@@ -84,22 +84,6 @@
 #include 
 #endif
 
-int encdebug = 0;
-int ipsec_keep_invalid = IPSEC_DEFAULT_EMBRYONIC_SA_TIMEOUT;
-int ipsec_require_pfs = IPSEC_DEFAULT_PFS;
-int ipsec_soft_allocations = IPSEC_DEFAULT_SOFT_ALLOCATIONS;
-int ipsec_exp_allocations = IPSEC_DEFAULT_EXP_ALLOCATIONS;
-int ipsec_soft_bytes = IPSEC_DEFAULT_SOFT_BYTES;
-int ipsec_exp_bytes = IPSEC_DEFAULT_EXP_BYTES;
-int ipsec_soft_timeout = IPSEC_DEFAULT_SOFT_TIMEOUT;
-int ipsec_exp_timeout = IPSEC_DEFAULT_EXP_TIMEOUT;
-int ipsec_soft_first_use = IPSEC_DEFAULT_SOFT_FIRST_USE;
-int ipsec_exp_first_use = IPSEC_DEFAULT_EXP_FIRST_USE;
-int ipsec_expire_acquire = IPSEC_DEFAULT_EXPIRE_ACQUIRE;
-char ipsec_def_enc[20];
-char ipsec_def_auth[20];
-char ipsec_def_comp[20];
-
 /* values controllable via sysctl */
 intipforwarding = 0;
 intipmforwarding = 0;
@@ -211,10 +195,6 @@ ip_init(void)
for (i = 0; defrootonlyports_udp[i] != 0; i++)
DP_SET(rootonlyports.udp, defrootonlyports_udp[i]);
 
-   strlcpy(ipsec_def_enc, IPSEC_DEFAULT_DEF_ENC, sizeof(ipsec_def_enc));
-   strlcpy(ipsec_def_auth, IPSEC_DEFAULT_DEF_AUTH, sizeof(ipsec_def_auth));
-   strlcpy(ipsec_def_comp, IPSEC_DEFAULT_DEF_COMP, sizeof(ipsec_def_comp));
-
mq_init(_mq, 64, IPL_SOFTNET);
 
 #ifdef IPSEC
@@ -1643,26 +1623,25 @@ ip_sysctl(int *name, u_int namelen, void
  ip_mtudisc_timeout);
NET_UNLOCK();
return (error);
+#ifdef IPSEC
+   case IPCTL_ENCDEBUG:
+   case IPCTL_IPSEC_EXPIRE_ACQUIRE:
+   case IPCTL_IPSEC_EMBRYONIC_SA_TIMEOUT:
+   case IPCTL_IPSEC_REQUIRE_PFS:
+   case IPCTL_IPSEC_SOFT_ALLOCATIONS:
+   case IPCTL_IPSEC_ALLOCATIONS:
+   case IPCTL_IPSEC_SOFT_BYTES:
+   case IPCTL_IPSEC_BYTES:
+   case IPCTL_IPSEC_TIMEOUT:
+   case IPCTL_IPSEC_SOFT_TIMEOUT:
+   case IPCTL_IPSEC_SOFT_FIRSTUSE:
+   case IPCTL_IPSEC_FIRSTUSE:
case IPCTL_IPSEC_ENC_ALGORITHM:
-   NET_LOCK();
-   error = sysctl_tstring(oldp, oldlenp, newp, newlen,
-  ipsec_def_enc, sizeof(ipsec_def_enc));
-   NET_UNLOCK();
-   return (error);
case IPCTL_IPSEC_AUTH_ALGORITHM:
-   NET_LOCK();
-   error = sysctl_tstring(oldp, oldlenp, newp, newlen,
-  ipsec_def_auth,
-  sizeof(ipsec_def_auth));
-   NET_UNLOCK();
-   return (error);
case IPCTL_IPSEC_IPCOMP_ALGORITHM:
-   NET_LOCK();
-   error = sysctl_tstring(oldp, oldlenp, newp, newlen,
-  ipsec_def_comp,
-  sizeof(ipsec_def_comp));
-   NET_UNLOCK();
-   return (error);
+   return (ipsec_sysctl(name, namelen, oldp, oldlenp, newp,
+   newlen));
+#endif
case IPCTL_IFQUEUE:
return (sysctl_niq(name + 1, namelen - 1,
oldp, oldlenp, newp, newlen, ));
Index: netinet/ip_ipsp.h

Re: un-KERNEL_LOCK() TCP/UDP input & co

2017-11-13 Thread Martin Pieuchot
On 13/11/17(Mon) 12:14, Alexandr Nedvedicky wrote:
> your change looks good to me. Though I have a comment/question to your diff.
> 
> > Index: net/if_vxlan.c
> > ===
> > RCS file: /cvs/src/sys/net/if_vxlan.c,v
> > retrieving revision 1.63
> > diff -u -p -r1.63 if_vxlan.c
> > --- net/if_vxlan.c  25 Oct 2017 09:24:09 -  1.63
> > +++ net/if_vxlan.c  8 Nov 2017 14:49:58 -
> > @@ -611,6 +611,7 @@ vxlan_lookup(struct mbuf *m, struct udph
> > vni = VXLAN_VNI_UNSET;
> > }
> >  
> > +   NET_ASSERT_LOCKED();
> > /* First search for a vxlan(4) interface with the packet's VNI */
> > LIST_FOREACH(sc, _tagh[VXLAN_TAGHASH(vni)], sc_entry) {
> > if ((uh->uh_dport == sc->sc_dstport) &&
> > Index: net/pf.c
> 
> I think we are approaching point, which requires us to revisit
> NET_ASSERT_LOCKED(). The assert currently tests caller is writer
> on net_lock.

Since last week NET_ASSERT_LOCKED() is checking if the calling thread
owns the (write) lock or if it is hold by at least one reader.

Without changing our rwlock implementation there's no way to check if
the calling thread is holding a read lock.  That would be definitively
useful.

> Since we are going to 'soften' the NET_LOCK() to R-lock for
> some tasks/threads the NET_ASSERT_LOCKED() will become invalid
> (false positive) assertion for functions, which are being grabbed
> occasionally as readers and other times as writers (?typically in
> local outbound path?). I've seen such smoke already in my diffs
> I'm working on currently.
> 
> So I'd like to know, what's the plan for NET_ASSERT_LOCKED() macro?
> 
>   a) are we going to relax it, so it will test if the net_lock is
>   locked?

Yep, that's already done.

>   b) are we going to keep it, but a new 'soft' assert will be introduced
>   e.g. NET_ASSERT_ALOCKED() (A for any lock)?

We should turn NET_ASSERT_LOCKED() macros that require a write lock to
NET_ASSERT_WLOCKED().  But that's mostly in the ioctl(2) path.

>   c) add parameter to current NET_ASSERT_LOCKED() stating desired
>   lock level: 0 - unlocked, 1 - reader, 2 - writer

I don't care how the API looks like, I'd just argue for coherency
between all or locks.

> As I've said the patch looks good to me as-is and should go in. I just
> would like to hear some greater plan for NET_ASSERT_LOCKED(). Perhaps
> we should go for it before we will be further playing with parallel
> softnet.

I'd love is somebody could do the changes in the rwlock API to let us
check if we're holding a lock.  Maybe this is already present in
witness(4), visa?



Re: assertion "_kernel_lock_held()" failed, uipc_socket2.c: ipsec

2017-11-13 Thread Martin Pieuchot
On 13/11/17(Mon) 10:03, Stuart Henderson wrote:
> On 2017/11/13 08:44, Martin Pieuchot wrote:
> > On 12/11/17(Sun) 22:10, Stuart Henderson wrote:
> > > On 2017/11/12 22:48, Martin Pieuchot wrote:
> > > > On 12/11/17(Sun) 21:30, Stuart Henderson wrote:
> > > > > iked box, GENERIC.MP + WITNESS, -current as of Friday 10th:
> > > > 
> > > > Weird, did you tweak "kern.splassert" on this box?   Otherwise is looks
> > > > like a major corruption.
> > > 
> > > It would have kern.splassert=2. (I know this can cause problems
> > > sometimes, though this would be the first time in 5+ years I've bumped
> > > into it, most of my routers where I have serial console have this set).
> > 
> > Well the panic below correspond to a value of 0 or > 3.
> 
> Confirmed, it was definitely set to 2.

So it seems that two of your CPU end up looking at/dealing with
corrupted memory...

> > > I'm trying to get more information because it had either hanged or
> > > panicked previously (it didn't have serial connected at the time and
> > > the machine was needed so it had to be rebooted before I had chance
> > > to dig into it).
> > 
> > From which snapshot was the kernel that hanged or panic'd?
> > 
> 
> It was running this:
> 
> OpenBSD 6.2-current (GENERIC.MP) #199: Tue Nov  7 18:41:54 MST 2017
> 
> I've got it onto a remote control PDU now, now looking for some machine
> with an old enough ssh client to be able to connect to the PDU :-|
> 
> Which kernel would be most useful to run now?

-current

> I have now moved it to -current GENERIC.MP with the "fast path chunk
> removed from amd64/amd64/fpu.c fpu_kernel_enter() which we still suspect
> as maybe having some issues.

That's perfect from my point of view.



Re: un-KERNEL_LOCK() TCP/UDP input & co

2017-11-13 Thread Alexandr Nedvedicky
Hello,

your change looks good to me. Though I have a comment/question to your diff.


> Index: net/if_vxlan.c
> ===
> RCS file: /cvs/src/sys/net/if_vxlan.c,v
> retrieving revision 1.63
> diff -u -p -r1.63 if_vxlan.c
> --- net/if_vxlan.c25 Oct 2017 09:24:09 -  1.63
> +++ net/if_vxlan.c8 Nov 2017 14:49:58 -
> @@ -611,6 +611,7 @@ vxlan_lookup(struct mbuf *m, struct udph
>   vni = VXLAN_VNI_UNSET;
>   }
>  
> + NET_ASSERT_LOCKED();
>   /* First search for a vxlan(4) interface with the packet's VNI */
>   LIST_FOREACH(sc, _tagh[VXLAN_TAGHASH(vni)], sc_entry) {
>   if ((uh->uh_dport == sc->sc_dstport) &&
> Index: net/pf.c

I think we are approaching point, which requires us to revisit
NET_ASSERT_LOCKED(). The assert currently tests caller is writer
on net_lock.

Since we are going to 'soften' the NET_LOCK() to R-lock for
some tasks/threads the NET_ASSERT_LOCKED() will become invalid
(false positive) assertion for functions, which are being grabbed
occasionally as readers and other times as writers (?typically in
local outbound path?). I've seen such smoke already in my diffs
I'm working on currently.

So I'd like to know, what's the plan for NET_ASSERT_LOCKED() macro?

a) are we going to relax it, so it will test if the net_lock is
locked?

b) are we going to keep it, but a new 'soft' assert will be introduced
e.g. NET_ASSERT_ALOCKED() (A for any lock)?

c) add parameter to current NET_ASSERT_LOCKED() stating desired
lock level: 0 - unlocked, 1 - reader, 2 - writer

I'm leaning towards option b) introduce a new assertion for cases
where we require lock, but we don't care if it is reader/writer.

As I've said the patch looks good to me as-is and should go in. I just
would like to hear some greater plan for NET_ASSERT_LOCKED(). Perhaps
we should go for it before we will be further playing with parallel
softnet.

thanks a lot
regards
sasha



Re: try to bundle multiple packets on an ifq before sending

2017-11-13 Thread David Gwynne

> On 13 Nov 2017, at 5:37 pm, Martin Pieuchot  wrote:
> 
> On 13/11/17(Mon) 10:56, David Gwynne wrote:
>> On Sun, Nov 12, 2017 at 02:45:05PM +0100, Martin Pieuchot wrote:
>>> [...]
>>> We're currently using net_tq() to distribute load for incoming packets.
>>> So I believe you should schedule the task on the current taskq or the
>>> first one if coming from userland.
>> 
>> /em shrug.
>> 
>> i dont know how to tell what the current softnet tq is.
> 
> That's something we need to know.  This will allow to have per taskq
> data structure.

maybe we could use curproc()->p_spare as an index into the array of softnets.

> 
>>would it
>> be enough to simply not mix if the ifq_idx into the argument to
>> net_tq?
> 
> Maybe.  If a CPU processing packets from myx0 in softnet0 and want to
> send forward via myx1, using net_tq() like you do it know will pick
> softnet1.
> My suggestion was to keep softnet1 busy with processing packets coming
> from myx1 and enqueue the task on softnet0.  This way no inter CPU
> communication is needed.
> But without testing we can't tell if it's more efficient the other way.

scheduling the bundled start routine as a task on the current softnet would 
require each ifq to allocate a task for each softnet, and would probably result 
in a lot of thrashing if multiple threads are pushing packets onto a ring. one 
of them would have to remove the work from all of them.

i think this is good enough for now. my tests with 2 softnets mixed in with 
this diff seemed ok. with 8 it sucked cos there was so much waiting on the 
netlock, so it's hard to say what the effect of this is.

> 
  because ifq work could now be pending in a softnet taskq,
 ifq_barrier also needs to put a barrier in the taskq. this is
 implemented using taskq_barrier, which i wrote ages ago but didn't
 have a use case for at the time.
>> 
>> both hrvoje and i hit a deadlock when downing an interface. if
>> softnet (or softnets) are waiting on the netlock that and ioctl
>> path holds, then an ifq_barrier in that ioctl path will sleep forever
>> waiting for a task to run in softnets that are waiting for the
>> netlock.
>> 
>> to mitigate this ive added code like what uvm has. thoughts?
> 
> That's only called in ioctl(2) path, right?  Which commands end up
> there?

ifq_barrier is generally called from the drv_stop/drv_down path by mpsafe 
drivers. if you go ifconfig down is the most obvious. other ioctl paths include 
those that generate ENETRESET, eg, configuring lladdr. some drivers also call 
their down routine during its own reconfiguration, or to fix a stuck ring.

this diff also adds a call to ifq_barrier inside ifq_destroy, which is called 
when an interface is detached.

> The problem with driver *_ioctl() routines is that pseudo-driver need
> the NET_LOCK() while the others should not.  So I'd rather see the
> NET_LOCK() released before calling ifp->if_ioctl().
> 
>> Index: share/man/man9/task_add.9
>> ===
>> RCS file: /cvs/src/share/man/man9/task_add.9,v
>> retrieving revision 1.16
>> diff -u -p -r1.16 task_add.9
>> --- share/man/man9/task_add.914 Sep 2015 15:14:55 -  1.16
>> +++ share/man/man9/task_add.913 Nov 2017 00:46:17 -
>> @@ -20,6 +20,7 @@
>> .Sh NAME
>> .Nm taskq_create ,
>> .Nm taskq_destroy ,
>> +.Nm taskq_barrier ,
>> .Nm task_set ,
>> .Nm task_add ,
>> .Nm task_del ,
>> @@ -37,6 +38,8 @@
>> .Ft void
>> .Fn taskq_destroy "struct taskq *tq"
>> .Ft void
>> +.Fn taskq_barrier "struct taskq *tq"
>> +.Ft void
>> .Fn task_set "struct task *t" "void (*fn)(void *)" "void *arg"
>> .Ft int
>> .Fn task_add "struct taskq *tq" "struct task *t"
>> @@ -88,6 +91,15 @@ Calling
>> against the system taskq is an error and will lead to undefined
>> behaviour or a system fault.
>> .Pp
>> +.Fn taskq_barrier
>> +guarantees that any task that was running on the
>> +.Fa tq
>> +taskq when the barrier was called has finished by the time the barrier
>> +returns.
>> +.Fn taskq_barrier
>> +is only supported on taskqs serviced by 1 thread,
>> +and may not be called by a task running in the specified taskq.
>> +.Pp
>> It is the responsibility of the caller to provide the
>> .Fn task_set ,
>> .Fn task_add ,
>> @@ -163,6 +175,8 @@ argument given in
>> and
>> .Fn taskq_destroy
>> can be called during autoconf, or from process context.
>> +.Fn taskq_barrier
>> +can be called from process context.
>> .Fn task_set ,
>> .Fn task_add ,
>> and
>> Index: sys/sys/task.h
>> ===
>> RCS file: /cvs/src/sys/sys/task.h,v
>> retrieving revision 1.11
>> diff -u -p -r1.11 task.h
>> --- sys/sys/task.h   7 Jun 2016 07:53:33 -   1.11
>> +++ sys/sys/task.h   13 Nov 2017 00:46:17 -
>> @@ -43,6 +43,7 @@ extern struct taskq *const systqmp;
>> 
>> struct taskq *taskq_create(const char *, unsigned 

Re: assertion "_kernel_lock_held()" failed, uipc_socket2.c: ipsec

2017-11-13 Thread Stuart Henderson
On 2017/11/13 08:44, Martin Pieuchot wrote:
> On 12/11/17(Sun) 22:10, Stuart Henderson wrote:
> > On 2017/11/12 22:48, Martin Pieuchot wrote:
> > > On 12/11/17(Sun) 21:30, Stuart Henderson wrote:
> > > > iked box, GENERIC.MP + WITNESS, -current as of Friday 10th:
> > > 
> > > Weird, did you tweak "kern.splassert" on this box?   Otherwise is looks
> > > like a major corruption.
> > 
> > It would have kern.splassert=2. (I know this can cause problems
> > sometimes, though this would be the first time in 5+ years I've bumped
> > into it, most of my routers where I have serial console have this set).
> 
> Well the panic below correspond to a value of 0 or > 3.

Confirmed, it was definitely set to 2.

> > > > login: panic: kernel diagnostic assertion "_kernel_lock_held()" failed: 
> > > > file "/src/cvs-openbsd/sys/kern/uipc_socket2.c", line 310
> > > ^^^
> > > Looks like one CPU is triggering this.
> > > 
> > > > splassert: soassertlocked: want 1 have 256
> > > > 
> > > > panic: spl assertion failure in soassertlocked
> > > ^^^
> > > That can't be coming from the same CPU..
> > > 
> > > 
> > > 
> > > 
> > > > Starting stack trace...
> > > > Faulted in traceback, aborting...
> > > > panic(splassert: if_down: want 1 have 256
> > > > panic: spl assertion failure in if_down) at
> > > > Faulted in traceback, aborting...
> > > > panicsplassert: if_down: want 1 have 256
> > > > +0x133panic: spl assertion failure in if_down
> > > > Faulted in traceback, aborting...
> > > > 
> > > > 
> > > > 
> > > > It's stuck at this point, I can't enter ddb.
> > > 
> > > Are you running with WITNESS on purpose?  Can you reproduce such problem
> > > without it?  I'm not saying it's WITNESS fault, but it's clear that
> > > WITNESS kernels aren't ready for production yet.
> > > 
> > 
> > I'm trying to get more information because it had either hanged or
> > panicked previously (it didn't have serial connected at the time and
> > the machine was needed so it had to be rebooted before I had chance
> > to dig into it).
> 
> From which snapshot was the kernel that hanged or panic'd?
> 

It was running this:

OpenBSD 6.2-current (GENERIC.MP) #199: Tue Nov  7 18:41:54 MST 2017

I've got it onto a remote control PDU now, now looking for some machine
with an old enough ssh client to be able to connect to the PDU :-|

Which kernel would be most useful to run now?

I have now moved it to -current GENERIC.MP with the "fast path chunk
removed from amd64/amd64/fpu.c fpu_kernel_enter() which we still suspect
as maybe having some issues.