Re: macppc bsd.mp pmap's hash lock

2021-05-18 Thread George Koehler
On Thu, 13 May 2021 02:20:45 -0400
George Koehler  wrote:

> My last diff (11 May 2021) still has a potential problem with memory
> barriers.  I will mail a new diff if I think of a fix.

Here's my new diff.  It is a copy of what I sent to ppc@ a moment ago.
I would ask, "Is it ok to commit?", but while writing this mail, I
found a 2nd potential problem with memory barriers, so I will need to
edit this diff yet again.

I add a 2nd membar_enter() to __ppc_lock() to avoid my 1st "potential
problem".  I also simplify the code by changing __ppc_lock_spin to
check the owning cpu, not the count.

A moment ago, I sent a copy of this diff to ppc@, in reply to another
report of bsd.mp hanging at "using parent ..." on a G5.  My own hang
happened when the function __ppc_lock(), in a randomly reordered
bsd.mp, crossed a page boundary and caused a page fault with a
recursive call to __ppc_lock().  The goal of my diff is to prevent
such hangs, by allowing such recursive calls to work.

I acquire the lock with an atomic 32-bit write, so the lock is always
in a valid state before or after the write.  The old code spun
until mpl_count == 0.  In my earlier diffs, I acquired the lock by
setting both the owning cpu and a nonzero count in one write, so I
needed to pack both owner and count in a 32-bit integer.

This diff spins until mpl_cpu == NULL.  I acquire the lock by setting
only mpl_cpu and leaving mpl_count at 0.  So mpl_cpu is the spinlock,
and mpl_count is a variable protected by the lock.  I no longer need
to pack both fields in a 32-bit integer, so I get rid of my BOLT*
macros for packing and unpacking the fields.

I need 2 memory barriers:
  1. after acquiring the lock, before accessing the locked data.
  2. after accessing the locked data, before releasing the lock.

I added the 2nd membar_enter() because I feared that a cpu might
acquire the lock but take a page fault (and a recursive call) before
reaching the memory barrier.  I hadn't considered the opposite case,
where a cpu does membar_exit() but takes a page fault before releasing
the lock.  This is the 2nd potential problem, which this diff doesn't
fix.  I didn't observe an actual problem after running this diff for
more than 24 hours.
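
For clarity, here is a rough sketch of the scheme described above.  It
is not the code from the diff below (atomic_cas_ptr() stands in for
whatever atomic 32-bit write the real code ends up using); it only
illustrates the acquire/release ordering with the two barriers:

        struct cpu_info *ci = curcpu();

        /* acquire: mpl_cpu is the spinlock */
        if (mpl->mpl_cpu == ci) {
                /* recursive call, e.g. from pte_spill_r() */
                mpl->mpl_count++;
        } else {
                while (atomic_cas_ptr(&mpl->mpl_cpu, NULL, ci) != NULL)
                        CPU_BUSY_CYCLE();
                membar_enter();         /* 1: before the locked data */
        }

        /* ... access data protected by the lock, e.g. mpl_count ... */

        /* release */
        if (mpl->mpl_count != 0) {
                /* an outer call still holds the lock */
                mpl->mpl_count--;
        } else {
                membar_exit();          /* 2: after the locked data */
                mpl->mpl_cpu = NULL;    /* one store releases the lock */
        }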

--George

Index: sys/arch/powerpc/include/mplock.h
===
RCS file: /cvs/src/sys/arch/powerpc/include/mplock.h,v
retrieving revision 1.4
diff -u -p -r1.4 mplock.h
--- sys/arch/powerpc/include/mplock.h   15 Apr 2020 08:09:00 -  1.4
+++ sys/arch/powerpc/include/mplock.h   16 May 2021 00:51:41 -
@@ -30,13 +30,14 @@
 #define __USE_MI_MPLOCK
 
 /*
+ * __ppc_lock exists because pte_spill_r() can't use __mp_lock.
  * Really simple spinlock implementation with recursive capabilities.
  * Correctness is paramount, no fancyness allowed.
  */
 
 struct __ppc_lock {
-   volatile struct cpu_info *mpl_cpu;
-   volatile long   mpl_count;
+   struct cpu_info *volatile   mpl_cpu;
+   longmpl_count;
 };
 
 #ifndef _LOCORE
@@ -44,10 +45,6 @@ struct __ppc_lock {
 void __ppc_lock_init(struct __ppc_lock *);
 void __ppc_lock(struct __ppc_lock *);
 void __ppc_unlock(struct __ppc_lock *);
-int __ppc_release_all(struct __ppc_lock *);
-int __ppc_release_all_but_one(struct __ppc_lock *);
-void __ppc_acquire_count(struct __ppc_lock *, int);
-int __ppc_lock_held(struct __ppc_lock *, struct cpu_info *);
 
 #endif
 
Index: sys/arch/powerpc/powerpc/lock_machdep.c
===
RCS file: /cvs/src/sys/arch/powerpc/powerpc/lock_machdep.c,v
retrieving revision 1.9
diff -u -p -r1.9 lock_machdep.c
--- sys/arch/powerpc/powerpc/lock_machdep.c 15 Apr 2020 08:09:00 -  
1.9
+++ sys/arch/powerpc/powerpc/lock_machdep.c 16 May 2021 00:51:41 -
@@ -1,6 +1,7 @@
 /* $OpenBSD: lock_machdep.c,v 1.9 2020/04/15 08:09:00 mpi Exp $*/
 
 /*
+ * Copyright (c) 2021 George Koehler 
  * Copyright (c) 2007 Artur Grabowski 
  *
  * Permission to use, copy, modify, and distribute this software for any
@@ -22,10 +23,21 @@
 #include 
 
 #include 
-#include 
 
 #include 
 
+/*
+ * If __ppc_lock() crosses a page boundary in the kernel text, then it
+ * may cause a page fault (on G5 with ppc_nobat), and pte_spill_r()
+ * would recursively call __ppc_lock().  The lock must be in a valid
+ * state when the page fault happens.  We acquire or release the lock
+ * with a 32-bit atomic write to mpl_cpu, so the lock is always in a
+ * valid state, before or after the write.
+ *
+ * Acquired the lock:  mpl->mpl_cpu == curcpu()
+ * Released the lock:  mpl->mpl_cpu == NULL
+ */
+
 void
 __ppc_lock_init(struct __ppc_lock *lock)
 {
@@ -46,12 +58,12 @@ static __inline void
 __ppc_lock_spin(struct __ppc_lock *mpl)
 {
 #ifndef MP_LOCKDEBUG
-   while (mpl->mpl_count != 0)
+   while (mpl->mpl_cpu != NULL)
CPU_BUSY_CYCLE();
 #else
int nticks = __mp_lock_spinout;
 

[patch] tcpdump: Sync DNS types with IANA

2021-05-18 Thread Matthew Martin
Sync the DNS types with IANA[1] and upstream[2]. With this, Type65
queries show up as HTTPS.

Removed the UNSPECA type parsing as IANA has that query type number
assigned to NID now.

Also added a const on ns_class2str.

1: https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml
2: https://github.com/the-tcpdump-group/tcpdump/pull/912

diff --git nameser.h nameser.h
index ddb5e065c4a..616e0028cff 100644
--- nameser.h
+++ nameser.h
@@ -145,25 +145,47 @@
 #define T_ATMA 34  /* ATM Address */
 #define T_NAPTR35  /* Naming Authority PoinTeR */
 #define T_KX   36  /* Key Exchanger */
-#define T_CERT 37  /* certificate */
+#define T_CERT 37  /* Certificates in the DNS */
 #define T_A6   38  /* IP6 address */
 #define T_DNAME39  /* non-terminal redirection */
-#define T_SINK 40  /* SINK */
+#define T_SINK 40  /* unknown */
 #define T_OPT  41  /* EDNS0 option (meta-RR) */
-#define T_APL  42  /* APL */
+#define T_APL  42  /* lists of address prefixes */
 #define T_DS   43  /* Delegation Signer */
-#define T_SSHFP44  /* SSH Key Fingerprint */
+#define T_SSHFP44  /* SSH Fingerprint */
 #define T_IPSECKEY 45  /* IPsec keying material */
-#define T_RRSIG46  /* RRSIG */
-#define T_NSEC 47  /* NSEC */
-#define T_DNSKEY   48  /* DNSKEY */
+#define T_RRSIG46  /* new security signature */
+#define T_NSEC 47  /* provable insecure information */
+#define T_DNSKEY   48  /* new security key */
+#define T_DHCID49  /* DHCP IDentifier */
+#define T_NSEC350  /* Next SECure record v3 */
+#define T_NSEC3PARAM   51  /* NSEC3 PARAMeter */
+#define T_TLSA 52  /* TLS Authentication */
+#define T_SMIMEA   53  /* S/MIME Authentication */
+/* Unassigned */
+#define T_HIP  55  /* Host Identity Protocol */
+#define T_NINFO56  /* zone status information */
+#define T_RKEY 57  /* Record encryption KEY */
+#define T_TALINK   58  /* Trust Anchor LINK */
+#define T_CDS  59  /* Child Delegation Signer */
+#define T_CDNSKEY  60  /* Child DNSKEY */
+#define T_OPENPGPKEY   61  /* OpenPGP KEY */
+#define T_CSYNC62  /* Child to parent 
SYNCronization */
+#define T_ZONEMD   63  /* ZONE data Message Digest */
+#define T_SVCB 64  /* SerViCe Binding */
+#define T_HTTPS65  /* HTTPS binding */
/* non standard */
 #define T_SPF  99  /* sender policy framework */
 #define T_UINFO100 /* user (finger) information */
 #define T_UID  101 /* user ID */
 #define T_GID  102 /* group ID */
 #define T_UNSPEC   103 /* Unspecified format (binary data) */
-#define T_UNSPECA  104 /* "unspecified ascii". Ugly MIT hack */
+#define T_NID  104 /* Node IDentifier */
+#define T_L32  105 /* Locator 32-bit */
+#define T_L64  106 /* Locator 64-bit */
+#define T_LP   107 /* Locator Pointer */
+#define T_EUI48108 /* an EUI-48 address */
+#define T_EUI64109 /* an EUI-64 address */
/* Query type values which do not appear in resource records */
 #define T_TKEY 249 /* Transaction Key [RFC2930] */
 #define T_TSIG 250 /* Transaction Signature [RFC2845] */
@@ -172,6 +194,13 @@
 #define T_MAILB253 /* transfer mailbox records */
 #define T_MAILA254 /* transfer mail agent records 
*/
 #define T_ANY  255 /* wildcard match */
+#define T_URI  256 /* uri records [RFC7553] */
+#define T_CAA  257 /* Certification Authority 
Authorization */
+#define T_AVC  258 /* Application Visibility and Control */
+#define T_DOA  259 /* Digital Object Architecture */
+#define T_AMTRELAY 260 /* Automatic Multicast Tunneling RELAY 
*/
+#define T_TA   32768   /* DNSSEC Trust Authorities */
+#define T_DLV  32769   /* DNSSEC Lookaside Validation */
 
 /*
  * Values for class field
diff --git print-domain.c print-domain.c
index f5d74e8227c..023f8020738 100644
--- print-domain.c
+++ print-domain.c
@@ -252,7 

Re: [External] : Re: parallel forwarding vs. bridges

2021-05-18 Thread Alexandr Nedvedicky
Hello,

just for the record...



> > in the current tree ether_input() is protected by NET_LOCK(), which is
> > grabbed by the caller as a writer. bluhm's diff changes NET_LOCK() to a
> > read lock, so ether_input() can run concurrently. Switching NET_LOCK()
> > to an r-lock has implications on the smr read section above. The thing
> > is that the call to eb->eb_input() can sleep now. This is something
> > that needs to be avoided within an smr section.
> 
> Is the new sleeping point introduced by the fact that PF_LOCK() is an
> rwlock?  Did you consider using a mutex, at least for the time being,
> in order not to run into such issues?

Below is a diff which trades both pf(4) rw-locks for mutexes.

The diff compiles; it still needs testing/experimenting.
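
To illustrate the point (this is only a sketch, not part of the diff;
the bridge input path is reduced to comments): a mutex may be taken
inside an SMR read section because mtx_enter() only spins, while
rw_enter_write() may sleep and would be a bug there:

        smr_read_enter();
        mtx_enter(&pf_lock);            /* ok: never sleeps */
        /* ... run pf_test() etc. under PF_LOCK() ... */
        mtx_leave(&pf_lock);
        /* rw_enter_write(&pf_lock);       NOT ok here: may sleep */
        smr_read_leave();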

regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
index ae7bb008351..30c78da7d16 100644
--- a/sys/net/pf_ioctl.c
+++ b/sys/net/pf_ioctl.c
@@ -146,8 +146,8 @@ TAILQ_HEAD(pf_tags, pf_tagname) pf_tags = 
TAILQ_HEAD_INITIALIZER(pf_tags),
  * grab the lock as writer. Whenever packet creates state it grabs pf_lock
  * first then it locks pf_state_lock as the writer.
  */
-struct rwlock   pf_lock = RWLOCK_INITIALIZER("pf_lock");
-struct rwlock   pf_state_lock = RWLOCK_INITIALIZER("pf_state_lock");
+struct mutex    pf_lock = MUTEX_INITIALIZER(IPL_NET);
+struct mutex    pf_state_lock = MUTEX_INITIALIZER(IPL_NET);
 
 #if (PF_QNAME_SIZE != PF_TAG_NAME_SIZE)
 #error PF_QNAME_SIZE must be equal to PF_TAG_NAME_SIZE
diff --git a/sys/net/pfvar_priv.h b/sys/net/pfvar_priv.h
index b1a122af409..c5ae392b468 100644
--- a/sys/net/pfvar_priv.h
+++ b/sys/net/pfvar_priv.h
@@ -39,7 +39,7 @@
 
 #include 
 
-extern struct rwlock pf_lock;
+extern struct mutex pf_lock;
 
 struct pf_pdesc {
struct {
@@ -104,52 +104,41 @@ extern struct timeout pf_purge_to;
 struct pf_state*pf_state_ref(struct pf_state *);
 voidpf_state_unref(struct pf_state *);
 
-extern struct rwlock   pf_lock;
-extern struct rwlock   pf_state_lock;
+extern struct mutex    pf_lock;
+extern struct mutex    pf_state_lock;
 
 #define PF_LOCK()  do {\
NET_ASSERT_LOCKED();\
-   rw_enter_write(&pf_lock);   \
+   mtx_enter(&pf_lock);\
} while (0)
 
 #define PF_UNLOCK()do {\
PF_ASSERT_LOCKED(); \
-   rw_exit_write(&pf_lock);\
+   mtx_leave(&pf_lock);\
} while (0)
 
-#define PF_ASSERT_LOCKED() do {\
-   if (rw_status(&pf_lock) != RW_WRITE)\
-   splassert_fail(RW_WRITE,\
-   rw_status(&pf_lock),__func__);\
-   } while (0)
+#define PF_ASSERT_LOCKED() (void)(0)
 
-#define PF_ASSERT_UNLOCKED()   do {\
-   if (rw_status(&pf_lock) == RW_WRITE)\
-   splassert_fail(0, rw_status(&pf_lock), __func__);\
-   } while (0)
+#define PF_ASSERT_UNLOCKED()   (void)(0)
 
 #define PF_STATE_ENTER_READ()  do {\
-   rw_enter_read(&pf_state_lock);  \
+   mtx_enter(&pf_state_lock);  \
} while (0)
 
 #define PF_STATE_EXIT_READ()   do {\
-   rw_exit_read(&pf_state_lock);   \
+   mtx_leave(&pf_state_lock);  \
} while (0)
 
 #define PF_STATE_ENTER_WRITE() do {\
-   rw_enter_write(&pf_state_lock); \
+   mtx_enter(&pf_state_lock);  \
} while (0)
 
 #define PF_STATE_EXIT_WRITE()  do {\
PF_ASSERT_STATE_LOCKED();   \
-   rw_exit_write(&pf_state_lock);  \
+   mtx_leave(&pf_state_lock);  \
} while (0)
 
-#define PF_ASSERT_STATE_LOCKED()   do {\
-   if (rw_status(&pf_state_lock) != RW_WRITE)\
-   splassert_fail(RW_WRITE,\
-   rw_status(&pf_state_lock), __func__);\
-   } while (0)
+#define PF_ASSERT_STATE_LOCKED()   (void)(0)
 
 extern void pf_purge_timeout(void *);
 extern void pf_purge(void *);



rpki-client 7.1 released

2021-05-18 Thread Sebastian Benoit
rpki-client 7.1 has just been released and will be available in the
rpki-client directory of any OpenBSD mirror soon.

rpki-client is a FREE, easy-to-use implementation of the Resource
Public Key Infrastructure (RPKI) for Relying Parties (RP) to
facilitate validation of the Route Origin of a BGP announcement. The
program queries the RPKI repository system and outputs Validated ROA
Payloads in the configuration format of OpenBGPD, BIRD, and also as
CSV or JSON objects for consumption by other routing stacks.

See RFC 6811 for a description of how BGP Prefix Origin Validation
secures the Internet's global routing system.

rpki-client was primarily developed by Kristaps Dzonsons, Claudio
Jeker, Job Snijders, and Sebastian Benoit as part of the OpenBSD
Project.

This release includes the following changes to the previous release:

 * Add keep-alive support to the HTTP client code for RRDP,
 * Reference-count and delete unused files synced via RRDP, as far as
   possible,
 * In the JSON output, change the AS Number from a string ("AS123") to
   an integer ("123") to make processing of the output easier,
 * Add an 'expires' column to CSV & JSON output, based on certificate
   and CRL validity times. The 'expires' value can be used to avoid route
   selection based on stale data when generating VRP sets, when faced
   with loss of communication between consumer and validator, or
   validator and CA repository,
 * Make the runtime timeout (-s option) also trigger in
   child processes,
 * Improved RRDP support; we encourage testing of RRDP with the -r
   option so that RRDP can be enabled by default in a future release.
   Please report any issues found.

In the portable version,

 * Improve support for older libressl versions (although the latest
   stable release is recommended),
 * Add missing compat headers in release packages so they build on
   Alpine Linux and macOS.

rpki-client is known to compile and run on at least the following
operating systems: Alpine 3.12, Debian 9, 10, Fedora 31, 32, 33, macOS,
RHEL/CentOS 7, 8, Windows Subsystem for Linux 2, and OpenBSD.

It is our hope that packagers take interest and help adapt
rpki-client-portable to more distributions.

The mirrors where rpki-client can be found are on
https://www.rpki-client.org/portable.html

Reporting Bugs:
===

General bugs may be reported to tech@openbsd.org

Portable bugs may be filed at 
https://github.com/rpki-client/rpki-client-portable

We welcome feedback and improvements from the broader community.
Thanks to all of the contributors who helped make this release
possible.



Re: snmpd rename context to pdutype

2021-05-18 Thread Jan Klemkow
On Fri, May 07, 2021 at 04:18:50PM +0200, Martijn van Duren wrote:
> When moving the traphandler to the snmpe process I overlooked the fact
> that "type" is being saved inside the switch statement under the
> sm_context name. RFC3411 talks about pduType, and the name context means
> something completely different in the v3 world.
> 
> The diff below moves our naming closer to the RFCs, which should
> hopefully prevent further confusion in the future.
> 
> While here I made the debug output print the pduType in a human readable
> format.
> 
> The invalid varbind check can be simplified to a simple "{}" in the
> ober_scanf_elements call, allowing me to just drop the type variable.
> 
> OK?

I tested it and the diff looks good and correct to me.
 
> martijn@
> 
> Index: snmp.h
> ===
> RCS file: /cvs/src/usr.sbin/snmpd/snmp.h,v
> retrieving revision 1.16
> diff -u -p -r1.16 snmp.h
> --- snmp.h30 Jun 2020 17:11:49 -  1.16
> +++ snmp.h7 May 2021 14:17:12 -
> @@ -77,7 +77,7 @@ enum snmp_version {
>   SNMP_V3 = 3
>  };
>  
> -enum snmp_context {
> +enum snmp_pdutype {
>   SNMP_C_GETREQ   = 0,
>   SNMP_C_GETNEXTREQ   = 1,
>   SNMP_C_GETRESP  = 2,
> Index: snmpd.h
> ===
> RCS file: /cvs/src/usr.sbin/snmpd/snmpd.h,v
> retrieving revision 1.94
> diff -u -p -r1.94 snmpd.h
> --- snmpd.h   5 Feb 2021 10:30:45 -   1.94
> +++ snmpd.h   7 May 2021 14:17:12 -
> @@ -384,7 +384,7 @@ struct snmp_message {
>   socklen_tsm_slen;
>   int  sm_sock_tcp;
>   int  sm_aflags;
> - int  sm_type;
> + enum snmp_pdutype    sm_pdutype;
>   struct event sm_sockev;
>   char sm_host[HOST_NAME_MAX+1];
>   in_port_tsm_port;
> @@ -405,7 +405,6 @@ struct snmp_message {
>  
>   /* V1, V2c */
>   char sm_community[SNMPD_MAXCOMMUNITYLEN];
> - int  sm_context;
>  
>   /* V3 */
>   long longsm_msgid;
> Index: snmpe.c
> ===
> RCS file: /cvs/src/usr.sbin/snmpd/snmpe.c,v
> retrieving revision 1.70
> diff -u -p -r1.70 snmpe.c
> --- snmpe.c   22 Feb 2021 11:31:09 -  1.70
> +++ snmpe.c   7 May 2021 14:17:12 -
> @@ -41,6 +41,7 @@
>  #include "mib.h"
>  
>  void  snmpe_init(struct privsep *, struct privsep_proc *, void *);
> +const char *snmpe_pdutype2string(enum snmp_pdutype);
>  int   snmpe_parse(struct snmp_message *);
>  void  snmpe_tryparse(int, struct snmp_message *);
>  int   snmpe_parsevarbinds(struct snmp_message *);
> @@ -194,6 +195,36 @@ snmpe_bind(struct address *addr)
>   return (-1);
>  }
>  
> +const char *
> +snmpe_pdutype2string(enum snmp_pdutype pdutype)
> +{
> + static char unknown[sizeof("Unknown (4294967295)")];
> +
> + switch (pdutype) {
> + case SNMP_C_GETREQ:
> + return "GetRequest";
> + case SNMP_C_GETNEXTREQ:
> + return "GetNextRequest";
> + case SNMP_C_GETRESP:
> + return "Response";
> + case SNMP_C_SETREQ:
> + return "SetRequest";
> + case SNMP_C_TRAP:
> + return "Trap";
> + case SNMP_C_GETBULKREQ:
> + return "GetBulkRequest";
> + case SNMP_C_INFORMREQ:
> + return "InformRequest";
> + case SNMP_C_TRAPV2:
> + return "SNMPv2-Trap";
> + case SNMP_C_REPORT:
> + return "Report";
> + }
> +
> + snprintf(unknown, sizeof(unknown), "Unknown (%u)", pdutype);
> + return unknown;
> +}
> +
>  int
>  snmpe_parse(struct snmp_message *msg)
>  {
> @@ -202,7 +233,6 @@ snmpe_parse(struct snmp_message *msg)
>   struct ber_element  *a;
>   long longver, req;
>   long longerrval, erridx;
> - unsigned int type;
>   u_intclass;
>   char*comn;
>   char*flagstr, *ctxname;
> @@ -271,15 +301,15 @@ snmpe_parse(struct snmp_message *msg)
>   goto fail;
>   }
>  
> - if (ober_scanf_elements(msg->sm_pdu, "t{e", &class, &type, &a) != 0)
> + if (ober_scanf_elements(msg->sm_pdu, "t{e", &class, &(msg->sm_pdutype),
> +     &a) != 0)
>   goto parsefail;
>  
>   /* SNMP PDU context */
>   if (class != BER_CLASS_CONTEXT)
>   goto parsefail;
>  
> - msg->sm_type = type;
> - switch (type) {
> + switch (msg->sm_pdutype) {
>   case SNMP_C_GETBULKREQ:
>   if (msg->sm_version == SNMP_V1) {
>   stats->snmp_inbadversions++;
> @@ -294,7 +324,7 @@ snmpe_parse(struct snmp_message *msg)
>   /* FALLTHROUGH */
>  
>   case SNMP_C_GETNEXTREQ:
> - if (type == 

bgpd extend capability support (add-path, enhanced rr)

2021-05-18 Thread Claudio Jeker
bgpd(8) will soon support ADD-PATH (RFC7911) and enhanced route refresh
(RFC7313). This is the first step toward that.
It adds the capability parsers, extends the capability struct and adds the
capability negotiation bits. The route refresh parser and generator are
extended to support the BoRR and EoRR messages, and last but not least
bgpctl is adjusted to print the new capabilities.

For now the system has no way of enabling the two new capabilities;
bgpctl will only show if the peer sends the capability, but there should
be no other effect.
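
As background (not part of the diff, and example_decode_add_path() is a
made-up name): RFC 7911 encodes the ADD-PATH capability value as a list
of AFI (2 bytes), SAFI (1 byte) and Send/Receive (1 byte) triples, where
the last byte is 1 for receive, 2 for send and 3 for both.  A rough
sketch of decoding such a value could look like this; the real parser in
the diff below may differ:

        static int
        example_decode_add_path(const u_char *p, u_int16_t len)
        {
                u_int16_t afi;
                u_int8_t safi, mode;

                if (len % 4 != 0)
                        return (-1);    /* malformed capability */
                for (; len >= 4; p += 4, len -= 4) {
                        afi = (p[0] << 8) | p[1];
                        safi = p[2];
                        mode = p[3];    /* CAPA_AP_RECV/SEND/BIDIR */
                        /* map (afi, safi) to an AID and remember mode */
                }
                return (0);
        }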

The RDE bits for enhanced route refresh are almost ready, add-path will
take a bit more since the RDE needs to grow an extra indirection.

I decided to split this work up to simplify review. Let's see if this
works :)
-- 
:wq Claudio


Index: bgpctl/bgpctl.c
===
RCS file: /cvs/src/usr.sbin/bgpctl/bgpctl.c,v
retrieving revision 1.267
diff -u -p -r1.267 bgpctl.c
--- bgpctl/bgpctl.c 3 May 2021 14:01:56 -   1.267
+++ bgpctl/bgpctl.c 18 May 2021 08:49:34 -
@@ -1351,6 +1351,13 @@ print_capability(u_int8_t capa_code, u_c
} else
printf("bad length");
break;
+   case CAPA_ADD_PATH:
+   printf("add-path capability");
+   /* XXX there is more needed here */
+   break;
+   case CAPA_ENHANCED_RR:
+   printf("enhanced route refresh capability");
+   break;
default:
printf("unknown capability %u length %u", capa_code, len);
break;
Index: bgpctl/output.c
===
RCS file: /cvs/src/usr.sbin/bgpctl/output.c,v
retrieving revision 1.16
diff -u -p -r1.16 output.c
--- bgpctl/output.c 26 Apr 2021 18:23:20 -  1.16
+++ bgpctl/output.c 18 May 2021 15:17:39 -
@@ -147,6 +147,34 @@ show_neighbor_capa_mp(struct capabilitie
 }
 
 static void
+show_neighbor_capa_add_path(struct capabilities *capa)
+{
+   const char  *mode;
+   int comma;
+   u_int8_t    i;
+
+   printf("Add-path: ");
+   for (i = 0, comma = 0; i < AID_MAX; i++) {
+   switch (capa->add_path[i]) {
+   case 0:
+   default:
+   continue;
+   case CAPA_AP_RECV:
+   mode = "recv";
+   break;
+   case CAPA_AP_SEND:
+   mode = "send";
+   break;
+   case CAPA_AP_BIDIR:
+   mode = "bidir";
+   }
+   printf("%s%s %s", comma ? ", " : "", aid2str(i), mode);
+   comma = 1;
+   }
+   printf("\n");
+}
+
+static void
 show_neighbor_capa_restart(struct capabilities *capa)
 {
int comma;
@@ -202,6 +230,13 @@ show_neighbor_msgstats(struct peer *p)
p->stats.prefix_sent_withdraw, p->stats.prefix_rcvd_withdraw);
printf("  %-15s %10llu %10llu\n", "End-of-Rib",
p->stats.prefix_sent_eor, p->stats.prefix_rcvd_eor);
+   printf("  Route Refresh statistics:\n");
+   printf("  %-15s %10llu %10llu\n", "Request",
+   p->stats.refresh_sent_req, p->stats.refresh_rcvd_req);
+   printf("  %-15s %10llu %10llu\n", "Begin-of-RR",
+   p->stats.refresh_sent_borr, p->stats.refresh_rcvd_borr);
+   printf("  %-15s %10llu %10llu\n", "End-of-RR",
+   p->stats.refresh_sent_eorr, p->stats.refresh_rcvd_eorr);
 }
 
 static void
@@ -210,7 +245,7 @@ show_neighbor_full(struct peer *p, struc
const char  *errstr;
struct in_addr   ina;
char*s;
-   int  hascapamp = 0;
+   int  hascapamp, hascapaap;
u_int8_t i;
 
if ((p->conf.remote_addr.aid == AID_INET &&
@@ -279,35 +314,57 @@ show_neighbor_full(struct peer *p, struc
fmt_monotime(p->stats.last_read),
p->holdtime, p->holdtime/3);
printf("  Last write %s\n", fmt_monotime(p->stats.last_write));
-   for (i = 0; i < AID_MAX; i++)
+
+   hascapamp = 0;
+   hascapaap = 0;
+   for (i = AID_MIN; i < AID_MAX; i++) {
if (p->capa.peer.mp[i])
hascapamp = 1;
-   if (hascapamp || p->capa.peer.refresh ||
-   p->capa.peer.grestart.restart || p->capa.peer.as4byte) {
+   if (p->capa.peer.add_path[i])
+   hascapaap = 1;
+   }
+   if (hascapamp || hascapaap || p->capa.peer.grestart.restart ||
+   p->capa.peer.refresh || p->capa.peer.enhanced_rr ||
+   p->capa.peer.as4byte) {
printf("  Neighbor capabilities:\n");
if (hascapamp)
+   show_neighbor_capa_mp(&p->capa.peer);
+   if (p->capa.peer.as4byte)
+   

Add f_modify and f_process callbacks to socket filterops

2021-05-18 Thread Visa Hankala
This diff adds f_modify and f_process callbacks to socket event filters.
As a result, socket events are handled using the non-legacy paths in
filter_modify() and filter_process() of kern_event.c.  This is a step
toward MP-safety.  However, everything still runs under the kernel lock.

The change has three intended effects:

* Socket events are handled without raising the system priority level.
  This makes the activity observable with btrace(8).

* kqueue itself no longer calls f_event of socket filterops, which
  allows replacing the conditional, NOTE_SUBMIT-based locking with
  a fixed call pattern.

* The state of a socket event is now always rechecked before delivery
  to user. Before, the recheck was skipped if the event was registered
  with EV_ONESHOT.

However, the change of behaviour with EV_ONESHOT is questionable.
When an activated event is being processed, the code will acquire the
socket lock anyway. Skipping the state check would only be a minor
optimization. In addition, I think the behaviour becomes more
consistent as now a delivered EV_ONESHOT event really was active at
the time of delivery.

Consider the following program. It creates a socket pair, writes a byte
to the socket, registers an EV_ONESHOT event, and reads the byte from
the socket. Next it checks how kevent(2) behaves.

#include <sys/types.h>
#include <sys/event.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
struct kevent kev[1];
struct timespec ts = {};
int fds[2], flags, kq, n;
char b;

if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
err(1, "socketpair");
flags = fcntl(fds[0], F_GETFL, 0);
fcntl(fds[0], F_SETFL, flags | O_NONBLOCK);

printf("write 1\n");
write(fds[1], "x", 1);

kq = kqueue();
if (kq == -1)
err(1, "kqueue");
EV_SET(&kev[0], fds[0], EVFILT_READ, EV_ADD | EV_ONESHOT, 0, 0, NULL);
if (kevent(kq, kev, 1, NULL, 0, NULL) == -1)
err(1, "kevent");

n = read(fds[0], &b, 1);
printf("read %d\n", n);
n = read(fds[0], &b, 1);
printf("read %d\n", n);

n = kevent(kq, NULL, 0, kev, 1, &ts);
printf("kevent %d\n", n);
n = read(fds[0], &b, 1);
printf("read %d\n", n);

n = kevent(kq, NULL, 0, kev, 1, &ts);
printf("kevent %d\n", n);

printf("write 1\n");
write(fds[1], "x", 1);

n = kevent(kq, NULL, 0, kev, 1, &ts);
printf("kevent %d\n", n);
n = read(fds[0], &b, 1);
printf("read %d\n", n);

return 0;
}

With an unpatched kernel, the EV_ONESHOT event gets activated by the
pending byte when the event is registered. The event remains active
until delivery, and the delivery happens even though it is clear that
reading from the socket will fail. The program prints:

write 1
read 1
read -1
kevent 1
read -1
kevent 0
write 1
kevent 0
read 1

With the patch applied, the event gets delivered only if the socket
has bytes pending.

write 1
read 1
read -1
kevent 0
read -1
kevent 0
write 1
kevent 1
read 1

So, is this EV_ONESHOT change reasonable, or should the implementation
stick with the old way? FreeBSD appears to follow the old way. macOS
might behave differently, though I am not sure about that.

It is not essential to change EV_ONESHOT, however.

Feedback and tests are welcome.

Index: kern/uipc_socket.c
===
RCS file: src/sys/kern/uipc_socket.c,v
retrieving revision 1.261
diff -u -p -r1.261 uipc_socket.c
--- kern/uipc_socket.c  13 May 2021 19:43:11 -  1.261
+++ kern/uipc_socket.c  18 May 2021 12:56:24 -
@@ -70,15 +70,26 @@ voidsorflush(struct socket *);
 
 void   filt_sordetach(struct knote *kn);
 int    filt_soread(struct knote *kn, long hint);
+int    filt_soreadmodify(struct kevent *kev, struct knote *kn);
+int    filt_soreadprocess(struct knote *kn, struct kevent *kev);
+int    filt_soread_common(struct knote *kn, struct socket *so);
 void   filt_sowdetach(struct knote *kn);
 int    filt_sowrite(struct knote *kn, long hint);
+int    filt_sowritemodify(struct kevent *kev, struct knote *kn);
+int    filt_sowriteprocess(struct knote *kn, struct kevent *kev);
+int    filt_sowrite_common(struct knote *kn, struct socket *so);
 int    filt_solisten(struct knote *kn, long hint);
+int    filt_solistenmodify(struct kevent *kev, struct knote *kn);
+int    filt_solistenprocess(struct knote *kn, struct kevent *kev);
+int    filt_solisten_common(struct knote *kn, struct socket *so);
 
 const struct filterops solisten_filtops = {
.f_flags= FILTEROP_ISFD,
.f_attach   = NULL,
.f_detach   = filt_sordetach,
.f_event= filt_solisten,
+   .f_modify   = filt_solistenmodify,
+   .f_process  = filt_solistenprocess,
 };
 
 const struct filterops soread_filtops = {
@@ -86,6 +97,8 @@ const struct filterops soread_filtops = 

Re: snmpd rename context to pdutype

2021-05-18 Thread Martijn van Duren
ping

On Fri, 2021-05-07 at 16:18 +0200, Martijn van Duren wrote:
> When moving the traphandler to the snmpe process I overlooked the fact
> that "type" is being saved inside the switch statement under the
> sm_context name. RFC3411 talks about pduType, and the name context means
> something completely different in the v3 world.
> 
> The diff below moves our naming closer to the RFCs, which should
> hopefully prevent further confusion in the future.
> 
> While here I made the debug output print the pduType in a human readable
> format.
> 
> The invalid varbind check can be simplified to a simple "{}" in the
> ober_scanf_elements call, allowing me to just drop the type variable.
> 
> OK?
> 
> martijn@
> 
> Index: snmp.h
> ===
> RCS file: /cvs/src/usr.sbin/snmpd/snmp.h,v
> retrieving revision 1.16
> diff -u -p -r1.16 snmp.h
> --- snmp.h  30 Jun 2020 17:11:49 -  1.16
> +++ snmp.h  7 May 2021 14:17:12 -
> @@ -77,7 +77,7 @@ enum snmp_version {
> SNMP_V3 = 3
>  };
>  
> -enum snmp_context {
> +enum snmp_pdutype {
> SNMP_C_GETREQ   = 0,
> SNMP_C_GETNEXTREQ   = 1,
> SNMP_C_GETRESP  = 2,
> Index: snmpd.h
> ===
> RCS file: /cvs/src/usr.sbin/snmpd/snmpd.h,v
> retrieving revision 1.94
> diff -u -p -r1.94 snmpd.h
> --- snmpd.h 5 Feb 2021 10:30:45 -   1.94
> +++ snmpd.h 7 May 2021 14:17:12 -
> @@ -384,7 +384,7 @@ struct snmp_message {
> socklen_t    sm_slen;
> int  sm_sock_tcp;
> int  sm_aflags;
> -   int  sm_type;
> +   enum snmp_pdutype    sm_pdutype;
> struct event sm_sockev;
> char sm_host[HOST_NAME_MAX+1];
> in_port_t    sm_port;
> @@ -405,7 +405,6 @@ struct snmp_message {
>  
> /* V1, V2c */
> char sm_community[SNMPD_MAXCOMMUNITYLEN];
> -   int  sm_context;
>  
> /* V3 */
> long long    sm_msgid;
> Index: snmpe.c
> ===
> RCS file: /cvs/src/usr.sbin/snmpd/snmpe.c,v
> retrieving revision 1.70
> diff -u -p -r1.70 snmpe.c
> --- snmpe.c 22 Feb 2021 11:31:09 -  1.70
> +++ snmpe.c 7 May 2021 14:17:12 -
> @@ -41,6 +41,7 @@
>  #include "mib.h"
>  
>  void    snmpe_init(struct privsep *, struct privsep_proc *, void *);
> +const char *snmpe_pdutype2string(enum snmp_pdutype);
>  int snmpe_parse(struct snmp_message *);
>  void    snmpe_tryparse(int, struct snmp_message *);
>  int snmpe_parsevarbinds(struct snmp_message *);
> @@ -194,6 +195,36 @@ snmpe_bind(struct address *addr)
> return (-1);
>  }
>  
> +const char *
> +snmpe_pdutype2string(enum snmp_pdutype pdutype)
> +{
> +   static char unknown[sizeof("Unknown (4294967295)")];
> +
> +   switch (pdutype) {
> +   case SNMP_C_GETREQ:
> +   return "GetRequest";
> +   case SNMP_C_GETNEXTREQ:
> +   return "GetNextRequest";
> +   case SNMP_C_GETRESP:
> +   return "Response";
> +   case SNMP_C_SETREQ:
> +   return "SetRequest";
> +   case SNMP_C_TRAP:
> +   return "Trap";
> +   case SNMP_C_GETBULKREQ:
> +   return "GetBulkRequest";
> +   case SNMP_C_INFORMREQ:
> +   return "InformRequest";
> +   case SNMP_C_TRAPV2:
> +   return "SNMPv2-Trap";
> +   case SNMP_C_REPORT:
> +   return "Report";
> +   }
> +
> +   snprintf(unknown, sizeof(unknown), "Unknown (%u)", pdutype);
> +   return unknown;
> +}
> +
>  int
>  snmpe_parse(struct snmp_message *msg)
>  {
> @@ -202,7 +233,6 @@ snmpe_parse(struct snmp_message *msg)
> struct ber_element  *a;
> long long    ver, req;
> long long    errval, erridx;
> -   unsigned int type;
> u_int    class;
> char*comn;
> char*flagstr, *ctxname;
> @@ -271,15 +301,15 @@ snmpe_parse(struct snmp_message *msg)
> goto fail;
> }
>  
> -   if (ober_scanf_elements(msg->sm_pdu, "t{e", &class, &type, &a) != 0)
> +   if (ober_scanf_elements(msg->sm_pdu, "t{e", &class,
> &(msg->sm_pdutype),
> +   &a) != 0)
> goto parsefail;
>  
> /* SNMP PDU context */
> if (class != BER_CLASS_CONTEXT)
> goto parsefail;
>  
> -   msg->sm_type = type;
> -   switch (type) {
> +   switch (msg->sm_pdutype) {
> case SNMP_C_GETBULKREQ:
> if (msg->sm_version == SNMP_V1) {
> stats->snmp_inbadversions++;
> @@ -294,7 +324,7 @@ snmpe_parse(struct snmp_message *msg)
> 

bgpd adjust graceful restart capability negotiation

2021-05-18 Thread Claudio Jeker
When I adjusted the capability negotiation to check both sides for
presence, I made the graceful restart capability lose all AFI/SAFI
elements from the peer capabilities.

This can be seen with bgpctl show nei, e.g.:

 Description: beznau-1
  BGP version 4, remote router-id 192.168.0.252
  BGP state = Established, up for 02:11:07
  Last read 00:00:09, holdtime 90s, keepalive interval 30s
  Last write 00:00:06
  Neighbor capabilities:
Multiprotocol extensions: IPv4 unicast
4-byte AS numbers
Route Refresh
Graceful Restart: Timeout: 90,
  Negotiated capabilities:
Multiprotocol extensions: IPv4 unicast
4-byte AS numbers
Route Refresh

  Message statistics:

Instead of

 Description: beznau-1
  BGP version 4, remote router-id 192.168.0.252
  BGP state = Established, up for 00:02:31
  Last read 00:00:00, holdtime 90s, keepalive interval 30s
  Last write 00:00:00
  Neighbor capabilities:
Multiprotocol extensions: IPv4 unicast
4-byte AS numbers
Route Refresh
Graceful Restart: Timeout: 90, restarted, IPv4 unicast
  Negotiated capabilities:
Multiprotocol extensions: IPv4 unicast
4-byte AS numbers
Route Refresh

This is just a visual issue. In both cases the flush happens and
graceful restart remains disabled.

-- 
:wq Claudio

Index: session.c
===
RCS file: /cvs/src/usr.sbin/bgpd/session.c,v
retrieving revision 1.415
diff -u -p -r1.415 session.c
--- session.c   16 May 2021 09:09:11 -  1.415
+++ session.c   18 May 2021 11:38:59 -
@@ -2585,24 +2585,24 @@ capa_neg_calc(struct peer *p)
int8_t  negflags;
 
/* disable GR if the AFI/SAFI is not present */
-   if (p->capa.ann.grestart.restart == 0 ||
-   (p->capa.peer.grestart.flags[i] & CAPA_GR_PRESENT &&
+   if ((p->capa.peer.grestart.flags[i] & CAPA_GR_PRESENT &&
p->capa.neg.mp[i] == 0))
p->capa.peer.grestart.flags[i] = 0; /* disable */
/* look at current GR state and decide what to do */
negflags = p->capa.neg.grestart.flags[i];
p->capa.neg.grestart.flags[i] = p->capa.peer.grestart.flags[i];
if (negflags & CAPA_GR_RESTARTING) {
-   if (!(p->capa.peer.grestart.flags[i] &
-   CAPA_GR_FORWARD)) {
+   if (p->capa.ann.grestart.restart != 0 ||
+   p->capa.peer.grestart.flags[i] & CAPA_GR_FORWARD) {
+   p->capa.neg.grestart.flags[i] |=
+   CAPA_GR_RESTARTING;
+   } else {
if (imsg_rde(IMSG_SESSION_FLUSH, p->conf.id,
    &i, sizeof(i)) == -1)
return (-1);
log_peer_warnx(&p->conf, "graceful restart of "
"%s, not restarted, flushing", aid2str(i));
-   } else
-   p->capa.neg.grestart.flags[i] |=
-   CAPA_GR_RESTARTING;
+   }
}
}
p->capa.neg.grestart.timeout = p->capa.peer.grestart.timeout;



Re: Use atomic op for UVM map refcount

2021-05-18 Thread Vitaliy Makkoveev
On Tue, May 18, 2021 at 01:24:42PM +0200, Martin Pieuchot wrote:
> On 18/05/21(Tue) 12:07, Mark Kettenis wrote:
> > > Date: Tue, 18 May 2021 12:02:19 +0200
> > > From: Martin Pieuchot 
> > > 
> > > This allows us to not rely on the KERNEL_LOCK() to check reference
> > > counts.
> > > 
> > > Also reduces differences with NetBSD and shrinks my upcoming `vmobjlock'
> > > diff.
> > > 
> > > ok?
> > 
> > Shouldn't we make ref_count volatile in that case?
> 
> I don't know,  I couldn't find any evidence about where to use "volatile"
> in the kernel.
> 
> My understanding is that using "volatile" tells the compiler to not
> "cache" the value of such field in a register because it can change at
> any time.  Is it so?
> 
> If that's correct, we should look at any piece of code reading such field
> multiple times without using atomic operation, right?
> 
> In this case `ref_count' is used once for sanity checks in UVM_MAP_REQ_WRITE()
> and after calling atomic_dec_int_nv() in uvm_map_deallocate().  So, I don't
> see "volatile" necessary here.  Did I miss anything?
> 
> There's only a couple of 'volatile' usages in sys/sys.  These annotations
> do not explicitly indicate which piece of code requires it.  Maybe it would
> be clearer to use a cast or a macro where necessary.  This might help us
> understand why and where "volatile" is needed.
> 

Also "volatile" tells compiler to not drop or reorder expression due to
optimisation. But this is not such case.

This diff is ok by me.

> > > Index: uvm/uvm_map.c
> > > ===
> > > RCS file: /cvs/src/sys/uvm/uvm_map.c,v
> > > retrieving revision 1.274
> > > diff -u -p -r1.274 uvm_map.c
> > > --- uvm/uvm_map.c 26 Mar 2021 13:40:05 -  1.274
> > > +++ uvm/uvm_map.c 18 May 2021 09:36:55 -
> > > @@ -491,12 +491,13 @@ uvm_mapent_addr_remove(struct vm_map *ma
> > >  /*
> > >   * uvm_map_reference: add reference to a map
> > >   *
> > > - * XXX check map reference counter lock
> > > + * => map need not be locked
> > >   */
> > > -#define uvm_map_reference(_map)  
> > > \
> > > - do {\
> > > - map->ref_count++;   \
> > > - } while (0)
> > > +void
> > > +uvm_map_reference(struct vm_map *map)
> > > +{
> > > + atomic_inc_int(&map->ref_count);
> > > +}
> > >  
> > >  /*
> > >   * Calculate the dused delta.
> > > @@ -4292,7 +4293,7 @@ uvm_map_deallocate(vm_map_t map)
> > >   int c;
> > >   struct uvm_map_deadq dead;
> > >  
> > > - c = --map->ref_count;
> > > + c = atomic_dec_int_nv(&map->ref_count);
> > >   if (c > 0) {
> > >   return;
> > >   }
> > > Index: uvm/uvm_map.h
> > > ===
> > > RCS file: /cvs/src/sys/uvm/uvm_map.h,v
> > > retrieving revision 1.69
> > > diff -u -p -r1.69 uvm_map.h
> > > --- uvm/uvm_map.h 12 Mar 2021 14:15:49 -  1.69
> > > +++ uvm/uvm_map.h 18 May 2021 09:36:36 -
> > > @@ -259,6 +259,7 @@ RBT_PROTOTYPE(uvm_map_addr, vm_map_entry
> > >   * read_locks and write_locks are used in lock debugging code.
> > >   *
> > >   *  Locks used to protect struct members in this file:
> > > + *   a   atomic operations
> > >   *   I   immutable after creation or exec(2)
> > >   *   v   `vm_map_lock' (this map `lock' or `mtx')
> > >   */
> > > @@ -272,7 +273,7 @@ struct vm_map {
> > >   struct uvm_map_addr addr;   /* [v] Entry tree, by addr */
> > >  
> > >   vsize_t size;   /* virtual size */
> > > - int ref_count;  /* Reference count */
> > > + int ref_count;  /* [a] Reference count */
> > >   int flags;  /* flags */
> > >   struct mutexflags_lock; /* flags lock */
> > >   unsigned inttimestamp;  /* Version number */
> > > 
> > > 
> 



Re: Use atomic op for UVM map refcount

2021-05-18 Thread Martin Pieuchot
On 18/05/21(Tue) 12:07, Mark Kettenis wrote:
> > Date: Tue, 18 May 2021 12:02:19 +0200
> > From: Martin Pieuchot 
> > 
> > This allows us to not rely on the KERNEL_LOCK() to check reference
> > counts.
> > 
> > Also reduces differences with NetBSD and shrinks my upcoming `vmobjlock'
> > diff.
> > 
> > ok?
> 
> Shouldn't we make ref_count volatile in that case?

I don't know,  I couldn't find any evidence about where to use "volatile"
in the kernel.

My understanding is that using "volatile" tells the compiler to not
"cache" the value of such field in a register because it can change at
any time.  Is it so?

If that's correct, we should look at any piece of code reading such field
multiple times without using atomic operation, right?
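
As a concrete illustration (made up, not from the diff), the pattern
where "volatile" would matter is a plain re-read in a loop, where the
compiler is otherwise free to keep the value in a register and never
fetch it from memory again:

        /* hypothetical busy-wait, for illustration only */
        while (map->ref_count > 1)
                CPU_BUSY_CYCLE();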

In this case `ref_count' is used once for sanity checks in UVM_MAP_REQ_WRITE()
and after calling atomic_dec_int_nv() in uvm_map_deallocate().  So, I don't
see "volatile" necessary here.  Did I miss anything?

There's only a couple of 'volatile' usages in sys/sys.  These annotations
do not explicitly indicate which piece of code requires it.  Maybe it would
be clearer to use a cast or a macro where necessary.  This might help us
understand why and where "volatile" is needed.

> > Index: uvm/uvm_map.c
> > ===
> > RCS file: /cvs/src/sys/uvm/uvm_map.c,v
> > retrieving revision 1.274
> > diff -u -p -r1.274 uvm_map.c
> > --- uvm/uvm_map.c   26 Mar 2021 13:40:05 -  1.274
> > +++ uvm/uvm_map.c   18 May 2021 09:36:55 -
> > @@ -491,12 +491,13 @@ uvm_mapent_addr_remove(struct vm_map *ma
> >  /*
> >   * uvm_map_reference: add reference to a map
> >   *
> > - * XXX check map reference counter lock
> > + * => map need not be locked
> >   */
> > -#define uvm_map_reference(_map)
> > \
> > -   do {\
> > -   map->ref_count++;   \
> > -   } while (0)
> > +void
> > +uvm_map_reference(struct vm_map *map)
> > +{
> > +   atomic_inc_int(&map->ref_count);
> > +}
> >  
> >  /*
> >   * Calculate the dused delta.
> > @@ -4292,7 +4293,7 @@ uvm_map_deallocate(vm_map_t map)
> > int c;
> > struct uvm_map_deadq dead;
> >  
> > -   c = --map->ref_count;
> > +   c = atomic_dec_int_nv(&map->ref_count);
> > if (c > 0) {
> > return;
> > }
> > Index: uvm/uvm_map.h
> > ===
> > RCS file: /cvs/src/sys/uvm/uvm_map.h,v
> > retrieving revision 1.69
> > diff -u -p -r1.69 uvm_map.h
> > --- uvm/uvm_map.h   12 Mar 2021 14:15:49 -  1.69
> > +++ uvm/uvm_map.h   18 May 2021 09:36:36 -
> > @@ -259,6 +259,7 @@ RBT_PROTOTYPE(uvm_map_addr, vm_map_entry
> >   * read_locks and write_locks are used in lock debugging code.
> >   *
> >   *  Locks used to protect struct members in this file:
> > + * a   atomic operations
> >   * I   immutable after creation or exec(2)
> >   * v   `vm_map_lock' (this map `lock' or `mtx')
> >   */
> > @@ -272,7 +273,7 @@ struct vm_map {
> > struct uvm_map_addr addr;   /* [v] Entry tree, by addr */
> >  
> > vsize_t size;   /* virtual size */
> > -   int ref_count;  /* Reference count */
> > +   int ref_count;  /* [a] Reference count */
> > int flags;  /* flags */
> > struct mutexflags_lock; /* flags lock */
> > unsigned inttimestamp;  /* Version number */
> > 
> > 



OpenIKED 6.9.0 Released

2021-05-18 Thread Tobias Heider
OpenIKED 6.9.0 has just been released. It will be arriving in the
OpenIKED directory of your local OpenBSD mirror soon.

OpenIKED is a free, permissively licensed Internet Key Exchange
(IKEv2) implementation, developed as part of the OpenBSD project.
It is intended to be a lean, secure and interoperable daemon that
allows for easy setup and management of IPsec VPNs.

This release is based on OpenBSD 6.9.  The portable versions take
the OpenBSD based source code and add compatibility functions and
build infrastructure for other operating systems.

Version 6.9.0 is the first release of OpenIKED-portable in quite
some time.  Since the last portable release a significant amount
of features and bug fixes have been added.  If you have not used
OpenIKED-portable in a while, you will be positively surprised!

OpenIKED-portable is known to compile and run on at least the
following operating systems: Arch Linux, Debian 10, FreeBSD 12,
FreeBSD 13 and NetBSD 9.

It is our hope that packagers take interest and help adapt
OpenIKED-portable to more distributions.

OpenIKED-portable can be downloaded from any of the mirrors listed
at https://www.openbsd.org/ftp.html, from the /pub/OpenBSD/OpenIKED
directory.

General bugs may be reported to b...@openbsd.org.  Portable bugs
may be filed at https://github.com/openiked/openiked-portable.

We welcome feedback and improvements from the broader community.
Thanks to all of the contributors who helped make this release
possible.



bgpd upgrade to RFC6793

2021-05-18 Thread Claudio Jeker
Our four-byte AS support dates back to the days of the original draft.
Since then RFC 6793 was released, which adjusted the error handling a
bit: it treats any error on the AS4_PATH attribute with "attribute
discard", in the hope that the AS_PATH is better.

The reason is a bit questionable:
   Given that the two-octet AS numbers dominate during the transition
   and are carried in the AS_PATH attribute by an OLD BGP speaker, in
   this document the "attribute discard" approach is chosen to handle a
   malformed AS4_PATH attribute.
My IPv4 feed has 41148 of 71059 or 57% two-octet AS numbers. I would not
call that "dominate" (maybe the transition period is over...)

Anyway, let's follow the standard.
-- 
:wq Claudio

Index: bgpd.8
===
RCS file: /cvs/src/usr.sbin/bgpd/bgpd.8,v
retrieving revision 1.66
diff -u -p -r1.66 bgpd.8
--- bgpd.8  27 Apr 2021 11:34:58 -  1.66
+++ bgpd.8  18 May 2021 10:04:47 -
@@ -317,14 +317,6 @@ has been started.
 .Re
 .Pp
 .Rs
-.%A Q. Vohra
-.%A E. Chen
-.%D May 2007
-.%R RFC 4893
-.%T BGP Support for Four-octet AS Number Space
-.Re
-.Pp
-.Rs
 .%A V. Gill
 .%A J. Heasley
 .%A D. Meyer
@@ -349,6 +341,14 @@ has been started.
 .%D June 2011
 .%R RFC 6286
 .%T Autonomous-System-Wide Unique BGP Identifier for BGP-4
+.Re
+.Pp
+.Rs
+.%A Q. Vohra
+.%A E. Chen
+.%D Dec 2012
+.%R RFC 6793
+.%T BGP Support for Four-Octet Autonomous System (AS) Number Space
 .Re
 .Pp
 .Rs
Index: rde.c
===
RCS file: /cvs/src/usr.sbin/bgpd/rde.c,v
retrieving revision 1.520
diff -u -p -r1.520 rde.c
--- rde.c   6 May 2021 09:18:54 -   1.520
+++ rde.c   18 May 2021 10:29:15 -
@@ -1918,27 +1918,11 @@ bad_flags:
goto bad_flags;
if ((error = aspath_verify(p, attr_len, 1,
rde_no_as_set(peer))) != 0) {
-   /*
-* XXX RFC does not specify how to handle errors.
-* XXX Instead of dropping the session because of a
-* XXX bad path just mark the full update as having
-* XXX a parse error which makes the update no longer
-* XXX eligible and will not be considered for routing
-* XXX or redistribution.
-* XXX We follow draft-ietf-idr-optional-transitive
-* XXX by looking at the partial bit.
-* XXX Consider soft errors similar to a partial attr.
-*/
-   if (flags & ATTR_PARTIAL || error == AS_ERR_SOFT) {
-   a->flags |= F_ATTR_PARSE_ERR;
-   log_peer_warnx(>conf, "bad AS4_PATH, "
-   "path invalidated and prefix withdrawn");
-   goto optattr;
-   } else {
-   rde_update_err(peer, ERR_UPDATE, ERR_UPD_ASPATH,
-   NULL, 0);
-   return (-1);
-   }
+   /* As per RFC6793 use "attribute discard" here. */
+   log_peer_warnx(>conf, "bad AS4_PATH, "
+   "attribute discarded");
+   plen += attr_len;
+   break;
}
a->flags |= F_ATTR_AS4BYTE_NEW;
goto optattr;



Re: Use atomic op for UVM map refcount

2021-05-18 Thread Mark Kettenis
> Date: Tue, 18 May 2021 12:02:19 +0200
> From: Martin Pieuchot 
> 
> This allows us to not rely on the KERNEL_LOCK() to check reference
> counts.
> 
> Also reduces differences with NetBSD and shrinks my upcoming `vmobjlock'
> diff.
> 
> ok?

Shouldn't we make ref_count volatile in that case?

> Index: uvm/uvm_map.c
> ===
> RCS file: /cvs/src/sys/uvm/uvm_map.c,v
> retrieving revision 1.274
> diff -u -p -r1.274 uvm_map.c
> --- uvm/uvm_map.c 26 Mar 2021 13:40:05 -  1.274
> +++ uvm/uvm_map.c 18 May 2021 09:36:55 -
> @@ -491,12 +491,13 @@ uvm_mapent_addr_remove(struct vm_map *ma
>  /*
>   * uvm_map_reference: add reference to a map
>   *
> - * XXX check map reference counter lock
> + * => map need not be locked
>   */
> -#define uvm_map_reference(_map)  
> \
> - do {\
> - map->ref_count++;   \
> - } while (0)
> +void
> +uvm_map_reference(struct vm_map *map)
> +{
> + atomic_inc_int(&map->ref_count);
> +}
>  
>  /*
>   * Calculate the dused delta.
> @@ -4292,7 +4293,7 @@ uvm_map_deallocate(vm_map_t map)
>   int c;
>   struct uvm_map_deadq dead;
>  
> - c = --map->ref_count;
> + c = atomic_dec_int_nv(&map->ref_count);
>   if (c > 0) {
>   return;
>   }
> Index: uvm/uvm_map.h
> ===
> RCS file: /cvs/src/sys/uvm/uvm_map.h,v
> retrieving revision 1.69
> diff -u -p -r1.69 uvm_map.h
> --- uvm/uvm_map.h 12 Mar 2021 14:15:49 -  1.69
> +++ uvm/uvm_map.h 18 May 2021 09:36:36 -
> @@ -259,6 +259,7 @@ RBT_PROTOTYPE(uvm_map_addr, vm_map_entry
>   * read_locks and write_locks are used in lock debugging code.
>   *
>   *  Locks used to protect struct members in this file:
> + *   a   atomic operations
>   *   I   immutable after creation or exec(2)
>   *   v   `vm_map_lock' (this map `lock' or `mtx')
>   */
> @@ -272,7 +273,7 @@ struct vm_map {
>   struct uvm_map_addr addr;   /* [v] Entry tree, by addr */
>  
>   vsize_t size;   /* virtual size */
> - int ref_count;  /* Reference count */
> + int ref_count;  /* [a] Reference count */
>   int flags;  /* flags */
>   struct mutexflags_lock; /* flags lock */
>   unsigned inttimestamp;  /* Version number */
> 
> 



Use atomic op for UVM map refcount

2021-05-18 Thread Martin Pieuchot
This allows us to not rely on the KERNEL_LOCK() to check reference
counts.

Also reduces differences with NetBSD and shrinks my upcoming `vmobjlock'
diff.

ok?

Index: uvm/uvm_map.c
===
RCS file: /cvs/src/sys/uvm/uvm_map.c,v
retrieving revision 1.274
diff -u -p -r1.274 uvm_map.c
--- uvm/uvm_map.c   26 Mar 2021 13:40:05 -  1.274
+++ uvm/uvm_map.c   18 May 2021 09:36:55 -
@@ -491,12 +491,13 @@ uvm_mapent_addr_remove(struct vm_map *ma
 /*
  * uvm_map_reference: add reference to a map
  *
- * XXX check map reference counter lock
+ * => map need not be locked
  */
-#define uvm_map_reference(_map)
\
-   do {\
-   map->ref_count++;   \
-   } while (0)
+void
+uvm_map_reference(struct vm_map *map)
+{
+   atomic_inc_int(&map->ref_count);
+}
 
 /*
  * Calculate the dused delta.
@@ -4292,7 +4293,7 @@ uvm_map_deallocate(vm_map_t map)
int c;
struct uvm_map_deadq dead;
 
-   c = --map->ref_count;
+   c = atomic_dec_int_nv(&map->ref_count);
if (c > 0) {
return;
}
Index: uvm/uvm_map.h
===
RCS file: /cvs/src/sys/uvm/uvm_map.h,v
retrieving revision 1.69
diff -u -p -r1.69 uvm_map.h
--- uvm/uvm_map.h   12 Mar 2021 14:15:49 -  1.69
+++ uvm/uvm_map.h   18 May 2021 09:36:36 -
@@ -259,6 +259,7 @@ RBT_PROTOTYPE(uvm_map_addr, vm_map_entry
  * read_locks and write_locks are used in lock debugging code.
  *
  *  Locks used to protect struct members in this file:
+ * a   atomic operations
  * I   immutable after creation or exec(2)
  * v   `vm_map_lock' (this map `lock' or `mtx')
  */
@@ -272,7 +273,7 @@ struct vm_map {
struct uvm_map_addr addr;   /* [v] Entry tree, by addr */
 
vsize_t size;   /* virtual size */
-   int ref_count;  /* Reference count */
+   int ref_count;  /* [a] Reference count */
int flags;  /* flags */
struct mutexflags_lock; /* flags lock */
unsigned inttimestamp;  /* Version number */



Re: Regarding May 17 patch

2021-05-18 Thread Florian Obser
No, EVBUFFER_LENGTH(clt->clt_srvevb) will always be 8 for an
FCGI_END_REQUEST, see:
http://www.mit.edu/~yandros/doc/specs/fcgi-spec.html#S5.5
(Assuming the fcgi server is well behaved).
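
For reference, the 8 bytes are the FCGI_EndRequestBody from that section
of the spec (reproduced from the spec, not from httpd's sources):

        typedef struct {
                unsigned char appStatusB3;
                unsigned char appStatusB2;
                unsigned char appStatusB1;
                unsigned char appStatusB0;
                unsigned char protocolStatus;
                unsigned char reserved[3];
        } FCGI_EndRequestBody;          /* sizeof == 8 */
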
On 2021-05-18 09:12 +02, Matthias Pressfreund  wrote:
> Hi Florian, isn't this what you actually wanted?
>
> Index: usr.sbin/httpd/server_fcgi.c
> ===
> RCS file: /cvs/src/usr.sbin/httpd/server_fcgi.c,v
> retrieving revision 1.86
> diff -u -p -u -p -r1.86 server_fcgi.c
> --- usr.sbin/httpd/server_fcgi.c17 May 2021 09:26:52 -  1.86
> +++ usr.sbin/httpd/server_fcgi.c18 May 2021 07:08:24 -
> @@ -616,7 +616,7 @@ server_fcgi_header(struct client *clt, u
> if (kv_add(>http_headers, "Server", HTTPD_SERVERNAME) == NULL)
> return (-1);
>
> -   if (clt->clt_fcgi.type == FCGI_END_REQUEST ||
> +   if (clt->clt_fcgi.type == FCGI_END_REQUEST &&
> EVBUFFER_LENGTH(clt->clt_srvevb) == 0) {
> /* Can't chunk encode an empty body. */
> clt->clt_fcgi.chunked = 0;
>

-- 
I'm not entirely sure you are real.



Re: httpd(8): don't try to chunk-encode an empty body

2021-05-18 Thread Florian Obser
On 2021-05-18 00:47 +02, Sebastian Benoit  wrote:
> The comments in server_fcgi_header seem to suggest more dragons lurk in this
> area.

Sush!

-- 
I'm not entirely sure you are real.