Re: libcrypto for powerpc g5 xonly

2023-01-31 Thread Theo de Raadt
All the functions in libcrypto need to be fixed, or for the ones which
are not fixed, they need to be disabled to use the C versions instead.

There should be no broken functions in the library.  It's not about
what you manage to use, it's about what something else will eventually
use.

The same can be done in the ports tree.

It is a rarely used architecture, so I think a brutish approach is fine.



libcrypto for powerpc g5 xonly

2023-01-31 Thread George Koehler
OpenBSD/macppc can enforce xonly on the PowerPC G5.  libcrypto linked
with cc -Wl,--execute-only will SIGSEGV as the PowerPC asm of sha256
tries to read a table from text.  The fix is to move the table to
rodata.  To find the table, I would do

	bcl	20, 31, 1f
1:	mflr	%r7
	addis	%r7, %r7, .Ltable-1b@ha
	addi	%r7, %r7, .Ltable-1b@l

This diff does so in perlasm syntax.  The literal "@ha" and "@l" in
this diff are for an ELF platform (like OpenBSD) and might break the
build for AIX or Mac OS, but I suspect that nobody builds this asm
for those platforms.  (PowerPC Mac OS is long obsolete, ended at
Mac OS X 10.5.8.)  If someone wants to try the PowerPC asm on a
not-ELF platform, please tell me.
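For readers unfamiliar with the @ha/@l pair: the linker splits the 32-bit offset so that an addis/addi sequence reconstructs it exactly, with the high half adjusted for the sign extension of the low half. A small C model of that arithmetic (function and variable names are mine, not from the diff):

```c
#include <assert.h>
#include <stdint.h>

/* Model of the ELF @ha/@l relocation arithmetic: the offset is split so
 * that target == (ha << 16) + (int16_t)lo.  "ha" is "high adjusted":
 * it absorbs the sign extension of the low half that addi performs. */
void
ha_lo_split(int32_t off, int32_t *ha, int16_t *lo)
{
	*lo = (int16_t)(off & 0xffff);
	*ha = (off - (int32_t)*lo) >> 16;
}
```

When the low 16 bits form a negative signed value, ha comes out one larger than the plain high half, which is exactly what addis needs.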

aes-ppc.pl would have the same problem, but we don't use aes-ppc.pl,
so I provide no fix.  ports/security/openssl/{1.0.2,1.1,3.0} has
copies of aes-ppc.pl and sha512-ppc.pl with the same problem, but
doesn't enable them on OpenBSD, so I don't plan to edit them.

sha512-ppc.pl can emit code for sha256 or sha512, but we only use it
for sha256.  The code uses simple ops (add, subtract, bit logic,
bit rotation), nothing more fancy.  I don't know why it runs faster
than the (not asm) sha256 in ports/security/openssl.
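As an illustration of "simple ops only", here is a sketch of the SHA-256 message-schedule sigma functions per FIPS 180-4, which such asm implements with rotate/shift/xor sequences; this is the standard definition, not code from the diff:

```c
#include <stdint.h>

/* SHA-256 message-schedule sigmas (FIPS 180-4): nothing but rotates,
 * shifts and xors, which is all the PowerPC asm needs. */
static inline uint32_t
rotr32(uint32_t x, int n)
{
	return (x >> n) | (x << (32 - n));	/* n in 1..31 here */
}

static inline uint32_t
sigma0(uint32_t x)
{
	return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3);
}

static inline uint32_t
sigma1(uint32_t x)
{
	return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10);
}
```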

ok for this diff in src/lib/libcrypto?

--George

Index: sha/asm/sha512-ppc.pl
===
RCS file: /cvs/src/lib/libcrypto/sha/asm/sha512-ppc.pl,v
retrieving revision 1.3
diff -u -p -r1.3 sha512-ppc.pl
--- sha/asm/sha512-ppc.pl   14 Nov 2015 14:53:13 -  1.3
+++ sha/asm/sha512-ppc.pl   31 Jan 2023 22:03:47 -
@@ -220,8 +220,11 @@ $func:
$LD $G,`6*$SZ`($ctx)
$LD $H,`7*$SZ`($ctx)
 
-   bl  LPICmeup
-LPICedup:
+   bcl 20,31,Lpc
+Lpc:
+	mflr	$Tbl
+	addis	$Tbl,$Tbl,Ltable-Lpc\@ha
+	addi	$Tbl,$Tbl,Ltable-Lpc\@l
andi.   r0,$inp,3
bne Lunaligned
 Laligned:
@@ -377,22 +380,8 @@ $code.=<<___;
blr
.long   0
.byte   0,12,0x14,0,0,0,0,0
-___
-
-# Ugly hack here, because PPC assembler syntax seem to vary too
-# much from platforms to platform...
-$code.=<<___;
-.align 6
-LPICmeup:
-	mflr	r0
-	bcl	20,31,\$+4
-	mflr	$Tbl	; vv "distance" between . and 1st data entry
-	addi	$Tbl,$Tbl,`64-8`
-	mtlr	r0
-	blr
-   .long   0
-   .byte   0,12,0x14,0,0,0,0,0
-   .space  `64-9*4`
+   .rodata
+Ltable:
 ___
 $code.=<<___ if ($SZ==8);
.long   0x428a2f98,0xd728ae22,0x71374491,0x23ef65cd



Re: refactor mbuf parsing on driver level

2023-01-31 Thread Jan Klemkow
On Tue, Jan 31, 2023 at 09:12:51PM +0100, Christian Weisgerber wrote:
> Jan Klemkow:
> 
> >  - I turned the KASSERTS to returns.
> >  - Check if the mbuf is large enough for an ether header.
> >  - additionally #ifdef'd INET6 around the ip6_hdr in the new struct
> 
> For non-initial fragments of TCP/UDP packets, ether_extract_headers()
> will create ext.tcp/ext.udp pointers that do not point to a protocol
> header.  Should there be a check to exclude fragments?

yes.  bluhm also suggested this solution to me.
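For context, the kind of check being discussed could look roughly like this; a sketch only, using the macro names from netinet/ip.h, and whether the committed version tests IP_MF in addition to the offset is an assumption on my part:

```c
#include <stdint.h>

#define IP_MF		0x2000	/* more-fragments flag */
#define IP_OFFMASK	0x1fff	/* fragment offset field */

/* A packet that is any kind of fragment cannot be trusted to carry a
 * TCP/UDP header (non-initial fragments certainly do not), so header
 * extraction should leave ext.tcp/ext.udp NULL for it.  ip_off is
 * assumed already converted to host byte order here. */
static int
ip_is_fragment(uint16_t ip_off)
{
	return (ip_off & (IP_MF | IP_OFFMASK)) != 0;
}
```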

ok?

Thanks,
Jan

Index: dev/pci/if_ix.c
===
RCS file: /cvs/src/sys/dev/pci/if_ix.c,v
retrieving revision 1.191
diff -u -p -r1.191 if_ix.c
--- dev/pci/if_ix.c 26 Jan 2023 07:32:39 -  1.191
+++ dev/pci/if_ix.c 31 Jan 2023 21:05:40 -
@@ -2477,25 +2477,16 @@ static inline int
 ixgbe_csum_offload(struct mbuf *mp, uint32_t *vlan_macip_lens,
 uint32_t *type_tucmd_mlhl, uint32_t *olinfo_status)
 {
-   struct ether_header *eh = mtod(mp, struct ether_header *);
-   struct mbuf *m;
-   int hoff;
+   struct ether_extracted ext;
int offload = 0;
uint32_t iphlen;
-   uint8_t ipproto;
 
-   *vlan_macip_lens |= (sizeof(*eh) << IXGBE_ADVTXD_MACLEN_SHIFT);
+   ether_extract_headers(mp, &ext);
 
-   switch (ntohs(eh->ether_type)) {
-   case ETHERTYPE_IP: {
-   struct ip *ip;
+   *vlan_macip_lens |= (sizeof(*ext.eh) << IXGBE_ADVTXD_MACLEN_SHIFT);
 
-   m = m_getptr(mp, sizeof(*eh), &hoff);
-   KASSERT(m != NULL && m->m_len - hoff >= sizeof(*ip));
-   ip = (struct ip *)(mtod(m, caddr_t) + hoff);
-
-   iphlen = ip->ip_hl << 2;
-   ipproto = ip->ip_p;
+   if (ext.ip4) {
+   iphlen = ext.ip4->ip_hl << 2;
 
if (ISSET(mp->m_pkthdr.csum_flags, M_IPV4_CSUM_OUT)) {
*olinfo_status |= IXGBE_TXD_POPTS_IXSM << 8;
@@ -2503,46 +2494,30 @@ ixgbe_csum_offload(struct mbuf *mp, uint
}
 
*type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_IPV4;
-   break;
-   }
-
 #ifdef INET6
-   case ETHERTYPE_IPV6: {
-   struct ip6_hdr *ip6;
-
-   m = m_getptr(mp, sizeof(*eh), &hoff);
-   KASSERT(m != NULL && m->m_len - hoff >= sizeof(*ip6));
-   ip6 = (struct ip6_hdr *)(mtod(m, caddr_t) + hoff);
-
-   iphlen = sizeof(*ip6);
-   ipproto = ip6->ip6_nxt;
+   } else if (ext.ip6) {
+   iphlen = sizeof(*ext.ip6);
 
*type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_IPV6;
-   break;
-   }
 #endif
-
-   default:
+   } else {
return offload;
}
 
*vlan_macip_lens |= iphlen;
 
-   switch (ipproto) {
-   case IPPROTO_TCP:
+   if (ext.tcp) {
*type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_TCP;
if (ISSET(mp->m_pkthdr.csum_flags, M_TCP_CSUM_OUT)) {
*olinfo_status |= IXGBE_TXD_POPTS_TXSM << 8;
offload = 1;
}
-   break;
-   case IPPROTO_UDP:
+   } else if (ext.udp) {
*type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_UDP;
if (ISSET(mp->m_pkthdr.csum_flags, M_UDP_CSUM_OUT)) {
*olinfo_status |= IXGBE_TXD_POPTS_TXSM << 8;
offload = 1;
}
-   break;
}
 
return offload;
Index: dev/pci/if_ixl.c
===
RCS file: /cvs/src/sys/dev/pci/if_ixl.c,v
retrieving revision 1.86
diff -u -p -r1.86 if_ixl.c
--- dev/pci/if_ixl.c26 Jan 2023 07:32:39 -  1.86
+++ dev/pci/if_ixl.c31 Jan 2023 21:05:40 -
@@ -2784,10 +2784,8 @@ ixl_load_mbuf(bus_dma_tag_t dmat, bus_dm
 static uint64_t
 ixl_tx_setup_offload(struct mbuf *m0)
 {
-   struct mbuf *m;
-   int hoff;
+   struct ether_extracted ext;
uint64_t hlen;
-   uint8_t ipproto;
uint64_t offload = 0;
 
if (ISSET(m0->m_flags, M_VLANTAG)) {
@@ -2800,39 +2798,21 @@ ixl_tx_setup_offload(struct mbuf *m0)
M_IPV4_CSUM_OUT|M_TCP_CSUM_OUT|M_UDP_CSUM_OUT))
return (offload);
 
-   switch (ntohs(mtod(m0, struct ether_header *)->ether_type)) {
-   case ETHERTYPE_IP: {
-   struct ip *ip;
-
-   m = m_getptr(m0, ETHER_HDR_LEN, &hoff);
-   KASSERT(m != NULL && m->m_len - hoff >= sizeof(*ip));
-   ip = (struct ip *)(mtod(m, caddr_t) + hoff);
+   ether_extract_headers(m0, &ext);
 
+   if (ext.ip4) {
offload |= ISSET(m0->m_pkthdr.csum_flags, M_IPV4_CSUM_OUT) ?
IXL_TX_DESC_CMD_IIPT_IPV4_CSUM :
IXL_TX_DESC_CMD_IIPT_IPV4;
  
-   hlen = ip->ip_hl << 2;
-   ipproto = ip->ip_p;
-   break;
-   }
-
+

Re: refactor mbuf parsing on driver level

2023-01-31 Thread Christian Weisgerber
Jan Klemkow:

>  - I turned the KASSERTS to returns.
>  - Check if the mbuf is large enough for an ether header.
>  - additionally #ifdef'd INET6 around the ip6_hdr in the new struct

For non-initial fragments of TCP/UDP packets, ether_extract_headers()
will create ext.tcp/ext.udp pointers that do not point to a protocol
header.  Should there be a check to exclude fragments?

-- 
Christian "naddy" Weisgerber  na...@mips.inka.de



Re: Replace selwakeup() with KNOTE() in tun(4) and tap(4)

2023-01-31 Thread Vitaliy Makkoveev
> On 30 Jan 2023, at 06:39, Visa Hankala  wrote:
> 
> Replace selwakeup() with KNOTE() in tun(4) and tap(4).
> 
> This patch makes the tun(4) and tap(4) event filters MP-safe.
> 
> This is similar to the change that just got committed to pppac(4)
> and pppx(4). However, tun(4) and tap(4) can be destroyed abruptly,
> so klist_invalidate() has to be kept in tun_clone_destroy().
> 
> The selwakeup() call in tun_dev_close() can be removed. If the device
> is closed peacefully, the klists get cleared automatically and waiters
> notified before the close routine is invoked. On abrupt detach,
> klist_invalidate() in tun_clone_destroy() should clear any lingering
> knotes.
> 
> OK?
> 

ok mvs@



Re: Replace selwakeup() with KNOTE() in tun(4) and tap(4)

2023-01-31 Thread Vitaliy Makkoveev
On Tue, Jan 31, 2023 at 06:21:01PM +, Visa Hankala wrote:
> On Mon, Jan 30, 2023 at 08:34:29PM +0300, Vitaliy Makkoveev wrote:
> > > On 30 Jan 2023, at 06:39, Visa Hankala  wrote:
> > > 
> > > Replace selwakeup() with KNOTE() in tun(4) and tap(4).
> > > 
> > > This patch makes the tun(4) and tap(4) event filters MP-safe.
> > > 
> > > This is similar to the change that just got committed to pppac(4)
> > > and pppx(4). However, tun(4) and tap(4) can be destroyed abruptly,
> > > so klist_invalidate() has to be kept in tun_clone_destroy().
> > > 
> > > The selwakeup() call in tun_dev_close() can be removed. If the device
> > > is closed peacefully, the klists get cleared automatically and waiters
> > > notified before the close routine is invoked. On abrupt detach,
> > > klist_invalidate() in tun_clone_destroy() should clear any lingering
> > > knotes.
> > > 
> > > OK?
> > > 
> > 
> > Does it make sense to introduce something like KNOTE_UNLOCKED()
> > to push lock acquisition within?
> 
> I have not been keen to add a new variant of KNOTE() because the common
> pattern is, or at least has been, that the klist lock is already held
> when KNOTE() is called.
> 
> The idea is that the klist is protected by the lock that also covers the
> related object, if possible. Examples of this are audio_lock, pipe_lock,
> and solock.
>

We have the new case, where we have no related objects like in
pppx/pppac and tun/tap:

mtx_enter(&sc->sc_mtx);
KNOTE(&sc->sc_rklist, 0);
mtx_leave(&sc->sc_mtx);

So I propose to move lock acquisition within KNOTE(9). This looks
reasonable if such cases start appearing in many places.



Re: Move duplicating initialization to soalloc()

2023-01-31 Thread Vitaliy Makkoveev
On Tue, Jan 31, 2023 at 06:00:45PM +, Visa Hankala wrote:
> On Tue, Jan 31, 2023 at 12:44:47PM +0300, Vitaliy Makkoveev wrote:
> > Since we have soalloc() to do common socket initialization, move the
> > rest within. I mostly need to do this because standalone socket's buffer
> > locking require to introduce another klistops data for buffers and there
> > is no reason to add more copypaste to sonewconn().
> > 
> > Also this makes `socket_klistops' private to kern/uipc_socket.c
> > 
> > @@ -226,9 +225,6 @@ sonewconn(struct socket *head, int conns
> > so->so_rcv.sb_lowat = head->so_rcv.sb_lowat;
> > so->so_rcv.sb_timeo_nsecs = head->so_rcv.sb_timeo_nsecs;
> >  
> > -   klist_init(&so->so_rcv.sb_klist, &socket_klistops, so);
> > -   klist_init(&so->so_snd.sb_klist, &socket_klistops, so);
> > -   sigio_init(&so->so_sigio);
> > sigio_copy(&so->so_sigio, &head->so_sigio);
> 
> With this change, something should call klist_free() and sigio_free()
> for 'so' if soreserve() fails in sonewconn().
> 

klist_init() and sigio_init() allocate nothing, but for consistency
reasons the corresponding free calls should still be made.

I'd like to do this in the combined error path for sonewconn() and
pru_attach().

Index: sys/kern/uipc_socket.c
===
RCS file: /cvs/src/sys/kern/uipc_socket.c,v
retrieving revision 1.299
diff -u -p -r1.299 uipc_socket.c
--- sys/kern/uipc_socket.c  27 Jan 2023 21:01:59 -  1.299
+++ sys/kern/uipc_socket.c  31 Jan 2023 18:44:57 -
@@ -112,6 +112,16 @@ const struct filterops soexcept_filtops 
.f_process  = filt_soprocess,
 };
 
+void   klist_soassertlk(void *);
+intklist_solock(void *);
+void   klist_sounlock(void *, int);
+
+const struct klistops socket_klistops = {
+   .klo_assertlk   = klist_soassertlk,
+   .klo_lock   = klist_solock,
+   .klo_unlock = klist_sounlock,
+};
+
 #ifndef SOMINCONN
 #define SOMINCONN 80
 #endif /* SOMINCONN */
@@ -148,6 +158,11 @@ soalloc(int wait)
return (NULL);
rw_init_flags(&so->so_lock, "solock", RWL_DUPOK);
refcnt_init(&so->so_refcnt);
+   klist_init(&so->so_rcv.sb_klist, &socket_klistops, so);
+   klist_init(&so->so_snd.sb_klist, &socket_klistops, so);
+   sigio_init(&so->so_sigio);
+   TAILQ_INIT(&so->so_q0);
+   TAILQ_INIT(&so->so_q);
 
return (so);
 }
@@ -176,11 +191,6 @@ socreate(int dom, struct socket **aso, i
if (prp->pr_type != type)
return (EPROTOTYPE);
so = soalloc(M_WAIT);
-   klist_init(&so->so_rcv.sb_klist, &socket_klistops, so);
-   klist_init(&so->so_snd.sb_klist, &socket_klistops, so);
-   sigio_init(&so->so_sigio);
-   TAILQ_INIT(&so->so_q0);
-   TAILQ_INIT(&so->so_q);
so->so_type = type;
if (suser(p) == 0)
so->so_state = SS_PRIV;
@@ -2333,12 +2343,6 @@ klist_sounlock(void *arg, int ls)
 
sounlock(so);
 }
-
-const struct klistops socket_klistops = {
-   .klo_assertlk   = klist_soassertlk,
-   .klo_lock   = klist_solock,
-   .klo_unlock = klist_sounlock,
-};
 
 #ifdef DDB
 void
Index: sys/kern/uipc_socket2.c
===
RCS file: /cvs/src/sys/kern/uipc_socket2.c,v
retrieving revision 1.134
diff -u -p -r1.134 uipc_socket2.c
--- sys/kern/uipc_socket2.c 27 Jan 2023 18:46:34 -  1.134
+++ sys/kern/uipc_socket2.c 31 Jan 2023 18:44:57 -
@@ -41,7 +41,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 /*
@@ -213,12 +212,8 @@ sonewconn(struct socket *head, int conns
/*
 * Inherit watermarks but those may get clamped in low mem situations.
 */
-   if (soreserve(so, head->so_snd.sb_hiwat, head->so_rcv.sb_hiwat)) {
-   if (persocket)
-   sounlock(so);
-   pool_put(&socket_pool, so);
-   return (NULL);
-   }
+   if (soreserve(so, head->so_snd.sb_hiwat, head->so_rcv.sb_hiwat))
+   goto error;
so->so_snd.sb_wat = head->so_snd.sb_wat;
so->so_snd.sb_lowat = head->so_snd.sb_lowat;
so->so_snd.sb_timeo_nsecs = head->so_snd.sb_timeo_nsecs;
@@ -226,9 +221,6 @@ sonewconn(struct socket *head, int conns
so->so_rcv.sb_lowat = head->so_rcv.sb_lowat;
so->so_rcv.sb_timeo_nsecs = head->so_rcv.sb_timeo_nsecs;
 
-   klist_init(&so->so_rcv.sb_klist, &socket_klistops, so);
-   klist_init(&so->so_snd.sb_klist, &socket_klistops, so);
-   sigio_init(&so->so_sigio);
sigio_copy(&so->so_sigio, &head->so_sigio);
 
soqinsque(head, so, 0);
@@ -259,13 +251,7 @@ sonewconn(struct socket *head, int conns
 
if (error) {
soqremque(so, 0);
-   if (persocket)
-   sounlock(so);
-   sigio_free(&so->so_sigio);
-   klist_free(&so->so_rcv.sb_klist);
-   klist_free(&so->so_snd.sb_klist);
-   pool_put(&socket_pool, so);
-   return (NULL);
+   goto error;
}
 
if (connstatus) {
@@ -280,6 +266,16 @@ sonewconn(struct socket 

timecounting: remove incomplete PPS support

2023-01-31 Thread Scott Cheloha
When the timecounting code was ported from FreeBSD in 2004 [1], stubs
for pulse-per-second (PPS) polling were brought in but left disabled.
They remain disabled [2]:

1.1   tholo 710:
711: #ifdef notyet
712:    /*
713:     * Hardware latching timecounters may not generate interrupts on
714:     * PPS events, so instead we poll them.  There is a finite risk that
715:     * the hardware might capture a count which is later than the one we
716:     * got above, and therefore possibly in the next NTP second which might
717:     * have a different rate than the current NTP second.  It doesn't
718:     * matter in practice.
719:     */
720:    if (tho->th_counter->tc_poll_pps)
721:            tho->th_counter->tc_poll_pps(tho->th_counter);
722: #endif

It's been almost two decades.  I don't expect anyone to finish adding
support, so let's remove the stubs.

This patch gets rid of the tc_poll_pps symbol.

otto: No plans to use this?

ok?

[1] http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/kern/kern_tc.c?annotate=1.1
[2] 
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/kern/kern_tc.c?annotate=1.81

Index: sys/sys/timetc.h
===
RCS file: /cvs/src/sys/sys/timetc.h,v
retrieving revision 1.13
diff -u -p -r1.13 timetc.h
--- sys/sys/timetc.h12 Aug 2022 02:20:36 -  1.13
+++ sys/sys/timetc.h31 Jan 2023 18:12:57 -
@@ -44,7 +44,6 @@
 
 struct timecounter;
 typedef u_int timecounter_get_t(struct timecounter *);
-typedef void timecounter_pps_t(struct timecounter *);
 
 /*
  * Locks used to protect struct members in this file:
@@ -59,13 +58,6 @@ struct timecounter {
 * This function reads the counter.  It is not required to
 * mask any unimplemented bits out, as long as they are
 * constant.
-*/
-   timecounter_pps_t   *tc_poll_pps;   /* [I] */
-   /*
-* This function is optional.  It will be called whenever the
-* timecounter is rewound, and is intended to check for PPS
-* events.  Normal hardware does not need it but timecounters
-* which latch PPS in hardware (like sys/pci/xrpu.c) do.
 */
u_int   tc_counter_mask;/* [I] */
/* This mask should mask off any unimplemented bits. */
Index: sys/kern/kern_tc.c
===
RCS file: /cvs/src/sys/kern/kern_tc.c,v
retrieving revision 1.81
diff -u -p -r1.81 kern_tc.c
--- sys/kern/kern_tc.c  13 Dec 2022 17:30:36 -  1.81
+++ sys/kern/kern_tc.c  31 Jan 2023 18:12:57 -
@@ -56,7 +56,6 @@ dummy_get_timecount(struct timecounter *
 
 static struct timecounter dummy_timecounter = {
.tc_get_timecount = dummy_get_timecount,
-   .tc_poll_pps = NULL,
.tc_counter_mask = ~0u,
.tc_frequency = 100,
.tc_name = "dummy",
@@ -707,19 +706,6 @@ tc_windup(struct bintime *new_boottime, 
naptime = th->th_naptime.sec;
th->th_offset = *new_offset;
}
-
-#ifdef notyet
-   /*
-* Hardware latching timecounters may not generate interrupts on
-* PPS events, so instead we poll them.  There is a finite risk that
-* the hardware might capture a count which is later than the one we
-* got above, and therefore possibly in the next NTP second which might
-* have a different rate than the current NTP second.  It doesn't
-* matter in practice.
-*/
-   if (tho->th_counter->tc_poll_pps)
-   tho->th_counter->tc_poll_pps(tho->th_counter);
-#endif
 
/*
 * If changing the boot time or clock adjustment, do so before
Index: share/man/man9/tc_init.9
===
RCS file: /cvs/src/share/man/man9/tc_init.9,v
retrieving revision 1.10
diff -u -p -r1.10 tc_init.9
--- share/man/man9/tc_init.917 Jan 2023 10:10:11 -  1.10
+++ share/man/man9/tc_init.931 Jan 2023 18:12:57 -
@@ -46,7 +46,6 @@ structure:
 .Bd -literal -offset indent
 struct timecounter {
timecounter_get_t   *tc_get_timecount;
-   timecounter_pps_t   *tc_poll_pps;
u_int   tc_counter_mask;
u_int64_t   tc_frequency;
char*tc_name;
@@ -64,12 +63,6 @@ structure are described below.
 This function reads the counter.
 It is not required to mask any unimplemented bits out, as long as they
 are constant.
-.It Ft void Fn (*tc_poll_pps) "struct timecounter *"
-This function is optional and can be set to 

Re: Replace selwakeup() with KNOTE() in tun(4) and tap(4)

2023-01-31 Thread Visa Hankala
On Mon, Jan 30, 2023 at 08:34:29PM +0300, Vitaliy Makkoveev wrote:
> > On 30 Jan 2023, at 06:39, Visa Hankala  wrote:
> > 
> > Replace selwakeup() with KNOTE() in tun(4) and tap(4).
> > 
> > This patch makes the tun(4) and tap(4) event filters MP-safe.
> > 
> > This is similar to the change that just got committed to pppac(4)
> > and pppx(4). However, tun(4) and tap(4) can be destroyed abruptly,
> > so klist_invalidate() has to be kept in tun_clone_destroy().
> > 
> > The selwakeup() call in tun_dev_close() can be removed. If the device
> > is closed peacefully, the klists get cleared automatically and waiters
> > notified before the close routine is invoked. On abrupt detach,
> > klist_invalidate() in tun_clone_destroy() should clear any lingering
> > knotes.
> > 
> > OK?
> > 
> 
> Does it make sense to introduce something like KNOTE_UNLOCKED()
> to push lock acquisition within?

I have not been keen to add a new variant of KNOTE() because the common
pattern is, or at least has been, that the klist lock is already held
when KNOTE() is called.

The idea is that the klist is protected by the lock that also covers the
related object, if possible. Examples of this are audio_lock, pipe_lock,
and solock.

klist_insert() and klist_remove() did not touch locks initially.
The locking was added when it turned out that there would be repetition
in very many places otherwise.

If a new flavor of KNOTE() is really wanted, I would rather cook up
a patch that renames KNOTE() to KNOTE_LOCKED() and adds KNOTE() that
acquires the klist lock internally. This way the naming would remain
consistent with the rest of the klist functions.
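As a userland model of that proposed naming scheme (purely illustrative; the real klist API, its lock-state return value, and the macro forms differ):

```c
#include <pthread.h>

/* Toy model of the proposed split: the plain notify entry point takes
 * the list lock itself, while the _locked variant assumes the caller
 * already holds it.  All names here are illustrative, not kernel API. */
struct toy_klist {
	pthread_mutex_t	lock;
	int		notified;
};

static void
toy_knote_locked(struct toy_klist *kl)
{
	kl->notified++;		/* caller holds kl->lock */
}

static void
toy_knote(struct toy_klist *kl)
{
	pthread_mutex_lock(&kl->lock);
	toy_knote_locked(kl);
	pthread_mutex_unlock(&kl->lock);
}
```

The point of the rename is that callers which already hold the object lock keep their current fast path, while the lock-acquiring form gets the short, default name.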



Re: Move duplicating initialization to soalloc()

2023-01-31 Thread Visa Hankala
On Tue, Jan 31, 2023 at 12:44:47PM +0300, Vitaliy Makkoveev wrote:
> Since we have soalloc() to do common socket initialization, move the
> rest within. I mostly need to do this because standalone socket's buffer
> locking require to introduce another klistops data for buffers and there
> is no reason to add more copypaste to sonewconn().
> 
> Also this makes `socket_klistops' private to kern/uipc_socket.c
> 
> Index: sys/kern/uipc_socket.c
> ===
> RCS file: /cvs/src/sys/kern/uipc_socket.c,v
> retrieving revision 1.299
> diff -u -p -r1.299 uipc_socket.c
> --- sys/kern/uipc_socket.c27 Jan 2023 21:01:59 -  1.299
> +++ sys/kern/uipc_socket.c31 Jan 2023 09:16:04 -
> @@ -112,6 +112,16 @@ const struct filterops soexcept_filtops 
>   .f_process  = filt_soprocess,
>  };
>  
> +void klist_soassertlk(void *);
> +int  klist_solock(void *);
> +void klist_sounlock(void *, int);
> +
> +const struct klistops socket_klistops = {
> + .klo_assertlk   = klist_soassertlk,
> + .klo_lock   = klist_solock,
> + .klo_unlock = klist_sounlock,
> +};
> +
>  #ifndef SOMINCONN
>  #define SOMINCONN 80
>  #endif /* SOMINCONN */
> @@ -148,6 +158,11 @@ soalloc(int wait)
>   return (NULL);
>   rw_init_flags(&so->so_lock, "solock", RWL_DUPOK);
>   refcnt_init(&so->so_refcnt);
> + klist_init(&so->so_rcv.sb_klist, &socket_klistops, so);
> + klist_init(&so->so_snd.sb_klist, &socket_klistops, so);
> + sigio_init(&so->so_sigio);
> + TAILQ_INIT(&so->so_q0);
> + TAILQ_INIT(&so->so_q);
>  
>   return (so);
>  }
> @@ -176,11 +191,6 @@ socreate(int dom, struct socket **aso, i
>   if (prp->pr_type != type)
>   return (EPROTOTYPE);
>   so = soalloc(M_WAIT);
> - klist_init(&so->so_rcv.sb_klist, &socket_klistops, so);
> - klist_init(&so->so_snd.sb_klist, &socket_klistops, so);
> - sigio_init(&so->so_sigio);
> - TAILQ_INIT(&so->so_q0);
> - TAILQ_INIT(&so->so_q);
>   so->so_type = type;
>   if (suser(p) == 0)
>   so->so_state = SS_PRIV;
> @@ -2333,12 +2343,6 @@ klist_sounlock(void *arg, int ls)
>  
>   sounlock(so);
>  }
> -
> -const struct klistops socket_klistops = {
> - .klo_assertlk   = klist_soassertlk,
> - .klo_lock   = klist_solock,
> - .klo_unlock = klist_sounlock,
> -};
>  
>  #ifdef DDB
>  void
> Index: sys/kern/uipc_socket2.c
> ===
> RCS file: /cvs/src/sys/kern/uipc_socket2.c,v
> retrieving revision 1.134
> diff -u -p -r1.134 uipc_socket2.c
> --- sys/kern/uipc_socket2.c   27 Jan 2023 18:46:34 -  1.134
> +++ sys/kern/uipc_socket2.c   31 Jan 2023 09:16:04 -
> @@ -41,7 +41,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  
>  /*
> @@ -226,9 +225,6 @@ sonewconn(struct socket *head, int conns
>   so->so_rcv.sb_lowat = head->so_rcv.sb_lowat;
>   so->so_rcv.sb_timeo_nsecs = head->so_rcv.sb_timeo_nsecs;
>  
> - klist_init(&so->so_rcv.sb_klist, &socket_klistops, so);
> - klist_init(&so->so_snd.sb_klist, &socket_klistops, so);
> - sigio_init(&so->so_sigio);
>   sigio_copy(&so->so_sigio, &head->so_sigio);

With this change, something should call klist_free() and sigio_free()
for 'so' if soreserve() fails in sonewconn().

>  
>   soqinsque(head, so, 0);
> Index: sys/sys/event.h
> ===
> RCS file: /cvs/src/sys/sys/event.h,v
> retrieving revision 1.67
> diff -u -p -r1.67 event.h
> --- sys/sys/event.h   31 Mar 2022 01:41:22 -  1.67
> +++ sys/sys/event.h   31 Jan 2023 09:16:04 -
> @@ -286,7 +286,6 @@ struct timespec;
>  
>  extern const struct filterops sig_filtops;
>  extern const struct filterops dead_filtops;
> -extern const struct klistops socket_klistops;
>  
>  extern void  kqpoll_init(unsigned int);
>  extern void  kqpoll_done(unsigned int);
> 



Re: bgpd: improve RTR error handling

2023-01-31 Thread Claudio Jeker
On Tue, Jan 31, 2023 at 12:13:00PM +, Job Snijders wrote:
> When the RTR's Session ID changes (for example when the RTR server is
> restarted), bgpd would incorrectly branch into the "received %s: bad
> msg len:" path.
> 
> The length fields in the RTR PDU error messages are 32-bits, so we
> should use ntohl() instead of ntohs(). While there, add an additional
> length check against the length listed in the RTR payload.
> 
> The resulting logs are now more useful:
> 
> Jan 31 11:54:39 feather bgpd[27547]: rtr 127.0.0.1: state change idle -> 
> idle, reason: connection open
> Jan 31 11:54:39 feather bgpd[27547]: rtr 127.0.0.1: received error: Corrupt 
> Data: Session ID doesn't match.
> Jan 31 11:54:39 feather bgpd[27547]: rtr 127.0.0.1: state change idle -> 
> closed, reason: connection closed
> Jan 31 11:54:46 feather bgpd[27547]: rtr 127.0.0.1: state change idle -> 
> idle, reason: connection open
> Jan 31 11:54:46 feather bgpd[27547]: rtr 127.0.0.1: received error: Corrupt 
> Data: Session ID doesn't match.
> Jan 31 11:54:46 feather bgpd[27547]: rtr 127.0.0.1: state change idle -> 
> closed, reason: connection closed
> (... this goes on and on)
> 
> OK to commit the below?

A few comments below.
 
> There still is an open question: if we receive "Corrupt Data" (because
> RTR Session ID changed), should bgpd really wait until
> RTR_EVNT_TIMER_EXPIRE? Would it be desirable to somehow be able to
> establish a proper connection sooner?

It should indeed just reset the session_id. But I'm a bit confused about
the output. So we issue a Serial Query (since the session_id is set)
and then the server responds with an error message. Now this feels like an
implementation error on the server side, since it should issue a cache
reset instead.

I don't think we should adjust our code to fix this because the case
cannot be distinguished properly on our side. The generic 'corrupt data'
error is triggered by many possible cases. We should not do a full cache
reset in those cases. But the error handling in the RTR RFC is way too
optimistic and is the cause of such loops.

> Index: usr.sbin/bgpctl/output.c
> ===
> RCS file: /cvs/src/usr.sbin/bgpctl/output.c,v
> retrieving revision 1.35
> diff -u -p -r1.35 output.c
> --- usr.sbin/bgpctl/output.c  24 Jan 2023 15:50:10 -  1.35
> +++ usr.sbin/bgpctl/output.c  31 Jan 2023 12:00:54 -
> @@ -1083,11 +1083,13 @@ show_rtr(struct ctl_show_rtr *rtr)
>   log_reason(rtr->last_sent_msg));
>   }
>   if (rtr->last_recv_error != NO_ERROR) {
> - printf("Last received error: %s\n",
> + printf(" Last received error: %s",
> log_rtr_error(rtr->last_recv_error));
>   if (rtr->last_recv_msg[0])
>   printf(" with reason \"%s\"",
>   log_reason(rtr->last_recv_msg));
> + else
> + printf("\n");

I think this should just printf("\n"); all the time so that the
last_recv_msg printf is also finished with a newline.

I think I wanted to have the extra newline after the first line because
the error messages can be long and so you don't end up with overly long
line, like:
 Last received error: Duplicate Announcement Received with reason "Whatever the 
server thinks"

Please do the same adjustment to the last_sent_error case above. They
should stay the same.

So I would prefer to just add a \n to the 'printf(" with reason \"%s\"",'
string.

>   }
>  
>   printf("\n");
> Index: usr.sbin/bgpd/rtr_proto.c
> ===
> RCS file: /cvs/src/usr.sbin/bgpd/rtr_proto.c,v
> retrieving revision 1.8
> diff -u -p -r1.8 rtr_proto.c
> --- usr.sbin/bgpd/rtr_proto.c 28 Dec 2022 21:30:16 -  1.8
> +++ usr.sbin/bgpd/rtr_proto.c 31 Jan 2023 12:00:55 -
> @@ -635,15 +635,24 @@ rtr_parse_error(struct rtr_session *rs, 
>   uint16_t errcode;
>  
>   memcpy(&rh, buf, sizeof(rh));
> + errcode = ntohs(rh.session_id);
> + rh.length = ntohl(rh.length);
> +
> + if (len != rh.length) {
> + log_warnx("rtr %s: received %s: bad len: %u byte",
> + log_rtr(rs), log_rtr_type(ERROR_REPORT), rh.length);
> + rtr_fsm(rs, RTR_EVNT_CON_CLOSED);
> + return -1;
> + }
> +

I don't think this makes sense.  You just compare len, which is extracted
from the same header by rtr_parse_header(), with rh.length.
So this can never fail.

>   buf += sizeof(struct rtr_header);
>   len -= sizeof(struct rtr_header);
> - errcode = ntohs(rh.session_id);
>  
>   memcpy(&pdu_len, buf, sizeof(pdu_len));
> - pdu_len = ntohs(pdu_len);
> + pdu_len = ntohl(pdu_len);
>  
>   if (len < pdu_len + sizeof(pdu_len)) {
> - log_warnx("rtr %s: received %s: bad pdu len: %u byte",
> + log_warnx("rtr %s: received %s: bad encapped pdu len: %u byte",

I would just go 

Re: PKU ?

2023-01-31 Thread Dave Voutila


And I should have prefaced this with: the reason we have to use PKU is
that it's the only way we can get a read-deny bit on Intel that still
allows instruction fetches.  Otherwise, PROT_EXEC implies PROT_READ.

Dave Voutila  writes:

> Marc Espie  writes:
>
>> I'm curious about the new enforcement strategies. Unfortunately I'm a bit
>> lost in the 1000+ pages of the intel manual.
>
> The protection keys documentation is thin because it's just another
> layer in the rules for paging. I'll try to summarize and I'm sure
> someone will correct me if I'm wrong :)
>
> Its design was to allow userland the ability to change read-deny and
> write-deny permissions without having to go back through a syscall to
> manipulate the page table entries. You'd have the kernel tie a "key" (32
> bit number) to entries in the page table and userland can manage the
> permissions assigned to each key via the PKRU register.
>
> We toss most of that noise out the window.
>
> Instead, we have the kernel enforce that all keys except key=0 (the
> default) have read-deny and write-deny set.
>
> When the kernel needs to mark a page as X-Only, it assigns a non-zero
> key to the pke bits. (Each page can have at most 1 key assigned...no key
> is effectively key=0.)
>
> A lot of effort went into minimizing the opportunity for userland to
> change the permissions on keys, which was the whole point of the PKRU
> being writable by non-supervisor processes.
>
> We can't close this window 100%, but since we start the userland program
> with its .text x-only, we minimize the ability to find a gadget or
> inject code to execute a WRPKRU instruction to modify the permissions on
> the keys. Any trap into the kernel, we check that the program didn't
> attempt this. If that register changed, we kill the program
> defensively.
>
> There are some implementation details I'm glossing over, like how the
> PKRU is per-cpu, but that's the gist.
>
> If you'd like to see an implementation of what Intel *wanted* people to
> *do* with PKU, the Linux kernel describes its API pretty well:
>
> https://www.kernel.org/doc/html/latest/core-api/protection-keys.html
>
> -dv



Re: PKU ?

2023-01-31 Thread Dave Voutila


Marc Espie  writes:

> I'm curious about the new enforcement strategies. Unfortunately I'm a bit
> lost in the 1000+ pages of the intel manual.

The protection keys documentation is thin because it's just another
layer in the rules for paging. I'll try to summarize and I'm sure
someone will correct me if I'm wrong :)

Its design was to allow userland the ability to change read-deny and
write-deny permissions without having to go back through a syscall to
manipulate the page table entries. You'd have the kernel tie a "key" (32
bit number) to entries in the page table and userland can manage the
permissions assigned to each key via the PKRU register.

We toss most of that noise out the window.

Instead, we have the kernel enforce that all keys except key=0 (the
default) have read-deny and write-deny set.

When the kernel needs to mark a page as X-Only, it assigns a non-zero
key to the pke bits. (Each page can have at most 1 key assigned...no key
is effectively key=0.)
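Concretely, PKRU carries two bits per key (access-disable and write-disable), so "deny everything except key 0" is a fixed bit pattern. A sketch of computing it, based on the register layout in the Intel SDM; illustrative only, not the kernel's actual code:

```c
#include <stdint.h>

/* PKRU layout (Intel SDM): bit 2k is access-disable (AD) and bit 2k+1
 * is write-disable (WD) for protection key k, with 16 keys in 32 bits.
 * Denying read and write for every key except key 0 is therefore a
 * constant mask: all bits set except the two for key 0. */
static uint32_t
pkru_deny_all_but_key0(void)
{
	uint32_t pkru = 0;
	int k;

	for (k = 1; k < 16; k++)
		pkru |= (1u << (2 * k)) | (1u << (2 * k + 1));
	return pkru;
}
```

With AD set on a key, loads from its pages fault while instruction fetches still succeed, which is the read-deny-but-execute behavior described above.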

A lot of effort went into minimizing the opportunity for userland to
change the permissions on keys, which was the whole point of the PKRU
being writable by non-supervisor processes.

We can't close this window 100%, but since we start the userland program
with its .text x-only, we minimize the ability to find a gadget or
inject code to execute a WRPKRU instruction to modify the permissions on
the keys. Any trap into the kernel, we check that the program didn't
attempt this. If that register changed, we kill the program
defensively.

There are some implementation details I'm glossing over, like how the
PKRU is per-cpu, but that's the gist.

If you'd like to see an implementation of what Intel *wanted* people to
*do* with PKU, the Linux kernel describes its API pretty well:

https://www.kernel.org/doc/html/latest/core-api/protection-keys.html

-dv



bgpd: improve RTR error handling

2023-01-31 Thread Job Snijders
When the RTR's Session ID changes (for example when the RTR server is
restarted), bgpd would incorrectly branch into the "received %s: bad
msg len:" path.

The length fields in the RTR PDU error messages are 32 bits, so we
should use ntohl() instead of ntohs(). While there, add an additional
length check against the length listed in the RTR payload.

The resulting logs are now more useful:

Jan 31 11:54:39 feather bgpd[27547]: rtr 127.0.0.1: state change idle -> idle, reason: connection open
Jan 31 11:54:39 feather bgpd[27547]: rtr 127.0.0.1: received error: Corrupt Data: Session ID doesn't match.
Jan 31 11:54:39 feather bgpd[27547]: rtr 127.0.0.1: state change idle -> closed, reason: connection closed
Jan 31 11:54:46 feather bgpd[27547]: rtr 127.0.0.1: state change idle -> idle, reason: connection open
Jan 31 11:54:46 feather bgpd[27547]: rtr 127.0.0.1: received error: Corrupt Data: Session ID doesn't match.
Jan 31 11:54:46 feather bgpd[27547]: rtr 127.0.0.1: state change idle -> closed, reason: connection closed
(... this goes on and on)

OK to commit the below?

There still is an open question: if we receive "Corrupt Data" (because
RTR Session ID changed), should bgpd really wait until
RTR_EVNT_TIMER_EXPIRE? Would it be desirable to somehow be able to
establish a proper connection sooner?

Kind regards,

Job

Index: usr.sbin/bgpctl/output.c
===
RCS file: /cvs/src/usr.sbin/bgpctl/output.c,v
retrieving revision 1.35
diff -u -p -r1.35 output.c
--- usr.sbin/bgpctl/output.c24 Jan 2023 15:50:10 -  1.35
+++ usr.sbin/bgpctl/output.c31 Jan 2023 12:00:54 -
@@ -1083,11 +1083,13 @@ show_rtr(struct ctl_show_rtr *rtr)
log_reason(rtr->last_sent_msg));
}
if (rtr->last_recv_error != NO_ERROR) {
-   printf("Last received error: %s\n",
+   printf(" Last received error: %s",
  log_rtr_error(rtr->last_recv_error));
if (rtr->last_recv_msg[0])
printf(" with reason \"%s\"",
log_reason(rtr->last_recv_msg));
+   else
+   printf("\n");
}
 
printf("\n");
Index: usr.sbin/bgpd/rtr_proto.c
===
RCS file: /cvs/src/usr.sbin/bgpd/rtr_proto.c,v
retrieving revision 1.8
diff -u -p -r1.8 rtr_proto.c
--- usr.sbin/bgpd/rtr_proto.c   28 Dec 2022 21:30:16 -  1.8
+++ usr.sbin/bgpd/rtr_proto.c   31 Jan 2023 12:00:55 -
@@ -635,15 +635,24 @@ rtr_parse_error(struct rtr_session *rs, 
uint16_t errcode;
 
memcpy(&rh, buf, sizeof(rh));
+   errcode = ntohs(rh.session_id);
+   rh.length = ntohl(rh.length);
+
+   if (len != rh.length) {
+   log_warnx("rtr %s: received %s: bad len: %u byte",
+   log_rtr(rs), log_rtr_type(ERROR_REPORT), rh.length);
+   rtr_fsm(rs, RTR_EVNT_CON_CLOSED);
+   return -1;
+   }
+
buf += sizeof(struct rtr_header);
len -= sizeof(struct rtr_header);
-   errcode = ntohs(rh.session_id);
 
memcpy(&pdu_len, buf, sizeof(pdu_len));
-   pdu_len = ntohs(pdu_len);
+   pdu_len = ntohl(pdu_len);
 
if (len < pdu_len + sizeof(pdu_len)) {
-   log_warnx("rtr %s: received %s: bad pdu len: %u byte",
+   log_warnx("rtr %s: received %s: bad encapped pdu len: %u byte",
log_rtr(rs), log_rtr_type(ERROR_REPORT), pdu_len);
rtr_fsm(rs, RTR_EVNT_CON_CLOSED);
return -1;
@@ -654,7 +663,7 @@ rtr_parse_error(struct rtr_session *rs, 
len -= pdu_len + sizeof(pdu_len);
 
memcpy(&msg_len, buf, sizeof(msg_len));
-   msg_len = ntohs(msg_len);
+   msg_len = ntohl(msg_len);
 
if (len < msg_len + sizeof(msg_len)) {
log_warnx("rtr %s: received %s: bad msg len: %u byte",



Re: hardclock: don't call statclock(), stathz is always non-zero

2023-01-31 Thread Mark Kettenis
> Date: Tue, 31 Jan 2023 04:50:59 -0600
> From: Scott Cheloha 
> 
> On Mon, Jan 30, 2023 at 05:08:38PM +0100, Mark Kettenis wrote:
> > > Date: Sat, 21 Jan 2023 17:02:48 -0600
> > > From: Scott Cheloha 
> > > 
> > > All the platforms have switched to clockintr.
> > > 
> > > Let's start by isolating statclock() from hardclock().  stathz is now
> > > always non-zero: statclock() must be called separately.  Update
> > > several of the the stathz users to reflect that the value is always
> > > non-zero.
> > > 
> > > This is a first step toward making hardclock and statclock into
> > > schedulable entities.
> > > 
> > > ok?
> > 
> > If you are confident enough to start burning bridges, yes ok kettenis@
> > 
> > Maybe you want to add
> > 
> > KASSERT(stathz != 0);
> > KASSERT(profhz != 0);
> > 
> > at the start of initclocks() just to be sure?
> > 
> > Either way is fine with me.
> 
> I thought about doing that, but those checks are done during
> cpu_initclocks(), in clockintr_init():
> 
> 60  void
> 61  clockintr_init(u_int flags)
> 62  {
> 63  KASSERT(CPU_IS_PRIMARY(curcpu()));
> 64  KASSERT(clockintr_flags == 0);
> 65  KASSERT(!ISSET(flags, ~CL_FLAG_MASK));
> 66  
> 67  KASSERT(hz > 0 && hz <= 1000000000);
> 68  hardclock_period = 1000000000 / hz;
> 69  
> 70  KASSERT(stathz >= 1 && stathz <= 1000000000);
> 
> Checking them again might make intent more explicit...  still, I'm
> leaning toward leaving them out.

Ah, sure, yes, that's fine.



Re: hardclock: don't call statclock(), stathz is always non-zero

2023-01-31 Thread Scott Cheloha
On Mon, Jan 30, 2023 at 05:08:38PM +0100, Mark Kettenis wrote:
> > Date: Sat, 21 Jan 2023 17:02:48 -0600
> > From: Scott Cheloha 
> > 
> > All the platforms have switched to clockintr.
> > 
> > Let's start by isolating statclock() from hardclock().  stathz is now
> > always non-zero: statclock() must be called separately.  Update
> > several of the stathz users to reflect that the value is always
> > non-zero.
> > 
> > This is a first step toward making hardclock and statclock into
> > schedulable entities.
> > 
> > ok?
> 
> If you are confident enough to start burning bridges, yes ok kettenis@
> 
> Maybe you want to add
> 
> KASSERT(stathz != 0);
> KASSERT(profhz != 0);
> 
> at the start of initclocks() just to be sure?
> 
> Either way is fine with me.

I thought about doing that, but those checks are done during
cpu_initclocks(), in clockintr_init():

60  void
61  clockintr_init(u_int flags)
62  {
63  KASSERT(CPU_IS_PRIMARY(curcpu()));
64  KASSERT(clockintr_flags == 0);
65  KASSERT(!ISSET(flags, ~CL_FLAG_MASK));
66  
67  KASSERT(hz > 0 && hz <= 1000000000);
68  hardclock_period = 1000000000 / hz;
69  
70  KASSERT(stathz >= 1 && stathz <= 1000000000);

Checking them again might make intent more explicit...  still, I'm
leaning toward leaving them out.



Re: PKU ?

2023-01-31 Thread Crystal Kolipe
On Tue, Jan 31, 2023 at 11:27:17AM +0100, Marc Espie wrote:
> I'm curious about the new enforcement strategies. Unfortunately I'm a bit
> lost in the 1000+ pages of the intel manual.
> 
> Could someone point me to the document that describes PKU, specifically ?

Well the intel SDM is surely the definitive reference?

Volume 3, chapters 2.7 and 4.6 give an overview of how it's supposed to work.



PKU ?

2023-01-31 Thread Marc Espie
I'm curious about the new enforcement strategies. Unfortunately I'm a bit
lost in the 1000+ pages of the intel manual.

Could someone point me to the document that describes PKU, specifically ?



Move duplicating initialization to soalloc()

2023-01-31 Thread Vitaliy Makkoveev
Since we have soalloc() to do the common socket initialization, move the
rest into it. I mostly need this because standalone socket buffer
locking requires introducing another klistops data for the buffers, and
there is no reason to add more copy-paste to sonewconn().

Also, this makes `socket_klistops' private to kern/uipc_socket.c.

Index: sys/kern/uipc_socket.c
===
RCS file: /cvs/src/sys/kern/uipc_socket.c,v
retrieving revision 1.299
diff -u -p -r1.299 uipc_socket.c
--- sys/kern/uipc_socket.c  27 Jan 2023 21:01:59 -  1.299
+++ sys/kern/uipc_socket.c  31 Jan 2023 09:16:04 -
@@ -112,6 +112,16 @@ const struct filterops soexcept_filtops 
.f_process  = filt_soprocess,
 };
 
+void   klist_soassertlk(void *);
+int    klist_solock(void *);
+void   klist_sounlock(void *, int);
+
+const struct klistops socket_klistops = {
+   .klo_assertlk   = klist_soassertlk,
+   .klo_lock   = klist_solock,
+   .klo_unlock = klist_sounlock,
+};
+
 #ifndef SOMINCONN
 #define SOMINCONN 80
 #endif /* SOMINCONN */
@@ -148,6 +158,11 @@ soalloc(int wait)
return (NULL);
rw_init_flags(&so->so_lock, "solock", RWL_DUPOK);
refcnt_init(&so->so_refcnt);
+   klist_init(&so->so_rcv.sb_klist, &socket_klistops, so);
+   klist_init(&so->so_snd.sb_klist, &socket_klistops, so);
+   sigio_init(&so->so_sigio);
+   TAILQ_INIT(&so->so_q0);
+   TAILQ_INIT(&so->so_q);
 
return (so);
 }
@@ -176,11 +191,6 @@ socreate(int dom, struct socket **aso, i
if (prp->pr_type != type)
return (EPROTOTYPE);
so = soalloc(M_WAIT);
-   klist_init(&so->so_rcv.sb_klist, &socket_klistops, so);
-   klist_init(&so->so_snd.sb_klist, &socket_klistops, so);
-   sigio_init(&so->so_sigio);
-   TAILQ_INIT(&so->so_q0);
-   TAILQ_INIT(&so->so_q);
so->so_type = type;
if (suser(p) == 0)
so->so_state = SS_PRIV;
@@ -2333,12 +2343,6 @@ klist_sounlock(void *arg, int ls)
 
sounlock(so);
 }
-
-const struct klistops socket_klistops = {
-   .klo_assertlk   = klist_soassertlk,
-   .klo_lock   = klist_solock,
-   .klo_unlock = klist_sounlock,
-};
 
 #ifdef DDB
 void
Index: sys/kern/uipc_socket2.c
===
RCS file: /cvs/src/sys/kern/uipc_socket2.c,v
retrieving revision 1.134
diff -u -p -r1.134 uipc_socket2.c
--- sys/kern/uipc_socket2.c 27 Jan 2023 18:46:34 -  1.134
+++ sys/kern/uipc_socket2.c 31 Jan 2023 09:16:04 -
@@ -41,7 +41,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 /*
@@ -226,9 +225,6 @@ sonewconn(struct socket *head, int conns
so->so_rcv.sb_lowat = head->so_rcv.sb_lowat;
so->so_rcv.sb_timeo_nsecs = head->so_rcv.sb_timeo_nsecs;
 
-   klist_init(&so->so_rcv.sb_klist, &socket_klistops, so);
-   klist_init(&so->so_snd.sb_klist, &socket_klistops, so);
-   sigio_init(&so->so_sigio);
sigio_copy(&so->so_sigio, &head->so_sigio);
 
soqinsque(head, so, 0);
Index: sys/sys/event.h
===
RCS file: /cvs/src/sys/sys/event.h,v
retrieving revision 1.67
diff -u -p -r1.67 event.h
--- sys/sys/event.h 31 Mar 2022 01:41:22 -  1.67
+++ sys/sys/event.h 31 Jan 2023 09:16:04 -
@@ -286,7 +286,6 @@ struct timespec;
 
 extern const struct filterops sig_filtops;
 extern const struct filterops dead_filtops;
-extern const struct klistops socket_klistops;
 
extern void    kqpoll_init(unsigned int);
extern void    kqpoll_done(unsigned int);



Re: npppd(8): remove "pipex" option

2023-01-31 Thread Vitaliy Makkoveev
On Tue, Jan 31, 2023 at 01:40:19PM +0900, YASUOKA Masahiko wrote:
> Hi,
> 
> On Sun, 29 Jan 2023 14:35:05 +0300
> Vitaliy Makkoveev  wrote:
> > While switching pppx(4) and pppac(4) from selwakeup() to KNOTE(9), I
> > found npppd(8) doesn't create a pppx interface with "pipex no" in
> > npppd.conf, but successfully connects the client, so packets don't
> > flow. However, pppac(4) doesn't have this problem, because the
> > corresponding pppac interface is always created when npppd(8) opens
> > the device node.
> > 
> > In fact, npppd(8) will not work with pppx(4) interfaces without
> > pipex(4) support; otherwise npppd(8) would have to create pppx(4)
> > sessions with something other than the pipex(4)-specific
> > PIPEXASESSION ioctl(2) command.
> > 
> > I propose to remove the "pipex" option from npppd(8). We already have
> > the "net.pipex.enable" sysctl MIB to control pipex behaviour. In the
> > case when "net.pipex.enable" is set to 0, pipex(4) sessions will
> > still always be created, but the traffic will go outside the pipex(4)
> > layer.
> > 
> > The "#ifdef USE_NPPPD_PIPEX" blocks are left as is. If we decide to
> > remove them, I will do this with the next diffs.
> 
> Will the next diff remove the networking part (MPPE, IP) as well?
> 
> > Please note, we never have complains about the problem described above,
> > so I doubt someone uses npppd(8) with "pipex no" in the npppd.conf(5).
> 
> I don't know why you configured "pipex no", I suppose it was for
> debug.  I also actually use "pipex no" when debug or development.

I used this option to test my/visa@'s diff, which replaced selwakeup()
with KNOTE(9), and found that the pppx(4) case is broken if this option
is set to "no". Since we have the ability to enable/disable pipex(4)
globally, I propose to remove this option.

In fact, for the pppx(4) case npppd(8) is absolutely useless without
pipex(4) support, so I don't see any reason to build it without
pipex(4). I don't propose to remove the whole "#ifdef USE_NPPPD_PIPEX"
blocks, only the preprocessor directives. Sorry if my suggestion was not
clear.

> 
> If having "pipex yes/no" configuration is misleading, we can improve
> the man page or the configuration itself.

 pipex yes | no
 Specify whether npppd(8) uses pipex(4).  The default is
 “yes”. The sysctl(8) variable net.pipex.enable should
 also be enabled to use pipex(4).

The wording is not misleading. But with "pipex no", npppd(8) is usable
with pppac(4) while with pppx(4) it is not. Also, I don't like that it
still successfully creates the connection. I guess it would be better to
deny "pipex no" for pppx(4).