Re: macppc bsd.mp pmap's hash lock
On Thu, 13 May 2021 02:20:45 -0400 George Koehler wrote:
> My last diff (11 May 2021) still has a potential problem with memory
> barriers.  I will mail a new diff if I think of a fix.

Here's my new diff.  It is a copy of what I sent to ppc@ a moment ago.
I would ask, "Is it ok to commit?", but while writing this mail, I
found a 2nd potential problem with memory barriers, so I will need to
edit this diff yet again.

I add a 2nd membar_enter() to __ppc_lock() to avoid my 1st "potential
problem".  I also simplify the code by changing __ppc_lock_spin to
check the owning cpu, not the count.

A moment ago, I sent a copy of this diff to ppc@, in reply to another
report of bsd.mp hanging at "using parent ..." on a G5.  My own hang
happened when the function __ppc_lock(), in a randomly reordered
bsd.mp, crossed a page boundary and caused a page fault with a
recursive call to __ppc_lock().  The goal of my diff is to prevent
such hangs by allowing such recursive calls to work.

I acquire the lock with an atomic 32-bit write, so the lock is always
in a valid state before or after the write.  The old code spun until
mpl_count == 0.  In my earlier diffs, I acquired the lock by setting
both the owning cpu and a nonzero count in one write, so I needed to
pack both owner and count into a 32-bit integer.

This diff spins until mpl_cpu == NULL.  I acquire the lock by setting
only mpl_cpu and leaving mpl_count at 0.  So mpl_cpu is the spinlock,
and mpl_count is a variable protected by the lock.  I no longer need
to pack both fields into a 32-bit integer, so I get rid of my BOLT*
macros for packing and unpacking the fields.

I need 2 memory barriers:
 1. after acquiring the lock, before accessing the locked data.
 2. after accessing the locked data, before releasing the lock.

I added the 2nd membar_enter() because I feared that a cpu would
acquire the lock, but get a page fault (and recursive call) before it
would reach the memory barrier.
I didn't think of the opposite case, where a cpu would do
membar_exit(), but gets a page fault before it would release the lock.
This is the 2nd potential problem, which I didn't fix.  I didn't
observe an actual problem after running this diff for more than 24
hours.

--George

Index: sys/arch/powerpc/include/mplock.h
===================================================================
RCS file: /cvs/src/sys/arch/powerpc/include/mplock.h,v
retrieving revision 1.4
diff -u -p -r1.4 mplock.h
--- sys/arch/powerpc/include/mplock.h	15 Apr 2020 08:09:00 -0000	1.4
+++ sys/arch/powerpc/include/mplock.h	16 May 2021 00:51:41 -0000
@@ -30,13 +30,14 @@
 #define __USE_MI_MPLOCK
 
 /*
+ * __ppc_lock exists because pte_spill_r() can't use __mp_lock.
  * Really simple spinlock implementation with recursive capabilities.
  * Correctness is paramount, no fancyness allowed.
  */
 struct __ppc_lock {
-	volatile struct cpu_info *mpl_cpu;
-	volatile long	mpl_count;
+	struct cpu_info *volatile mpl_cpu;
+	long		mpl_count;
 };
 
 #ifndef _LOCORE
@@ -44,10 +45,6 @@ struct __ppc_lock {
 void __ppc_lock_init(struct __ppc_lock *);
 void __ppc_lock(struct __ppc_lock *);
 void __ppc_unlock(struct __ppc_lock *);
-int __ppc_release_all(struct __ppc_lock *);
-int __ppc_release_all_but_one(struct __ppc_lock *);
-void __ppc_acquire_count(struct __ppc_lock *, int);
-int __ppc_lock_held(struct __ppc_lock *, struct cpu_info *);
 
 #endif

Index: sys/arch/powerpc/powerpc/lock_machdep.c
===================================================================
RCS file: /cvs/src/sys/arch/powerpc/powerpc/lock_machdep.c,v
retrieving revision 1.9
diff -u -p -r1.9 lock_machdep.c
--- sys/arch/powerpc/powerpc/lock_machdep.c	15 Apr 2020 08:09:00 -0000	1.9
+++ sys/arch/powerpc/powerpc/lock_machdep.c	16 May 2021 00:51:41 -0000
@@ -1,6 +1,7 @@
 /*	$OpenBSD: lock_machdep.c,v 1.9 2020/04/15 08:09:00 mpi Exp $	*/
 
 /*
+ * Copyright (c) 2021 George Koehler
  * Copyright (c) 2007 Artur Grabowski
  *
  * Permission to use, copy, modify, and distribute this software for any
@@ -22,10 +23,21 @@
 #include
 #include
 
-#include
 #include
 
+/*
+ * If __ppc_lock() crosses a page boundary in the kernel text, then it
+ * may cause a page fault (on G5 with ppc_nobat), and pte_spill_r()
+ * would recursively call __ppc_lock().  The lock must be in a valid
+ * state when the page fault happens.  We acquire or release the lock
+ * with a 32-bit atomic write to mpl_owner, so the lock is always in a
+ * valid state, before or after the write.
+ *
+ * Acquired the lock: mpl->mpl_cpu == curcpu()
+ * Released the lock: mpl->mpl_cpu == NULL
+ */
+
 void
 __ppc_lock_init(struct __ppc_lock *lock)
 {
@@ -46,12 +58,12 @@ static __inline void
 __ppc_lock_spin(struct __ppc_lock *mpl)
 {
 #ifndef MP_LOCKDEBUG
-	while (mpl->mpl_count != 0)
+	while (mpl->mpl_cpu != NULL)
 		CPU_BUSY_CYCLE();
 #else
 	int nticks = __mp_lock_spinout;
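The scheme described above — mpl_cpu as the spinlock, mpl_count as a recursion depth protected by the lock — can be sketched in userland C11. This is an illustrative stand-in, not the kernel code: the acquire/release memory orders play the role of the membar_enter()/membar_exit() pairs, and the names are made up for the sketch.

```c
#include <stdatomic.h>
#include <stddef.h>

struct ppc_lock {
	_Atomic(void *) mpl_cpu;	/* spinlock: owning cpu, NULL when free */
	long mpl_count;			/* recursion depth, protected by mpl_cpu */
};

void
ppc_lock(struct ppc_lock *mpl, void *self)
{
	/* Recursive entry (e.g. a page fault taken while holding the lock):
	 * only we could have stored `self`, so a plain read suffices. */
	if (atomic_load_explicit(&mpl->mpl_cpu, memory_order_relaxed) == self) {
		mpl->mpl_count++;
		return;
	}
	for (;;) {
		void *unowned = NULL;
		/* Acquire with a single atomic write, so the lock is valid
		 * before and after; acquire ordering stands in for the
		 * membar_enter() that fences before touching locked data. */
		if (atomic_compare_exchange_weak_explicit(&mpl->mpl_cpu,
		    &unowned, self, memory_order_acquire, memory_order_relaxed))
			return;
		/* otherwise spin until mpl_cpu == NULL */
	}
}

void
ppc_unlock(struct ppc_lock *mpl)
{
	if (mpl->mpl_count > 0) {
		mpl->mpl_count--;	/* drop one recursive hold */
		return;
	}
	/* Release ordering stands in for membar_exit(): writes to the
	 * locked data become visible before the lock is seen as free. */
	atomic_store_explicit(&mpl->mpl_cpu, NULL, memory_order_release);
}
```

Note how, exactly as in the diff, acquiring sets only mpl_cpu and leaves mpl_count at 0, so a recursive caller finds the lock in a valid state at every point.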
[patch] tcpdump: Sync DNS types with IANA
Sync the DNS types with IANA[1] and upstream[2]. With this, the Type65
queries show up as HTTPS. Removed the UNSPECA type parsing as IANA has
that query type number assigned to NID now. Also added a const on
ns_class2str.

1: https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml
2: https://github.com/the-tcpdump-group/tcpdump/pull/912

diff --git nameser.h nameser.h
index ddb5e065c4a..616e0028cff 100644
--- nameser.h
+++ nameser.h
@@ -145,25 +145,47 @@
 #define T_ATMA		34	/* ATM Address */
 #define T_NAPTR		35	/* Naming Authority PoinTeR */
 #define T_KX		36	/* Key Exchanger */
-#define T_CERT		37	/* certificate */
+#define T_CERT		37	/* Certificates in the DNS */
 #define T_A6		38	/* IP6 address */
 #define T_DNAME		39	/* non-terminal redirection */
-#define T_SINK		40	/* SINK */
+#define T_SINK		40	/* unknown */
 #define T_OPT		41	/* EDNS0 option (meta-RR) */
-#define T_APL		42	/* APL */
+#define T_APL		42	/* lists of address prefixes */
 #define T_DS		43	/* Delegation Signer */
-#define T_SSHFP		44	/* SSH Key Fingerprint */
+#define T_SSHFP		44	/* SSH Fingerprint */
 #define T_IPSECKEY	45	/* IPsec keying material */
-#define T_RRSIG		46	/* RRSIG */
-#define T_NSEC		47	/* NSEC */
-#define T_DNSKEY	48	/* DNSKEY */
+#define T_RRSIG		46	/* new security signature */
+#define T_NSEC		47	/* provable insecure information */
+#define T_DNSKEY	48	/* new security key */
+#define T_DHCID		49	/* DHCP IDentifier */
+#define T_NSEC3		50	/* Next SECure record v3 */
+#define T_NSEC3PARAM	51	/* NSEC3 PARAMeter */
+#define T_TLSA		52	/* TLS Authentication */
+#define T_SMIMEA	53	/* S/MIME Authentication */
+/* Unassigned */
+#define T_HIP		55	/* Host Identity Protocol */
+#define T_NINFO		56	/* zone status information */
+#define T_RKEY		57	/* Record encryption KEY */
+#define T_TALINK	58	/* Trust Anchor LINK */
+#define T_CDS		59	/* Child Delegation Signer */
+#define T_CDNSKEY	60	/* Child DNSKEY */
+#define T_OPENPGPKEY	61	/* OpenPGP KEY */
+#define T_CSYNC		62	/* Child to parent SYNCronization */
+#define T_ZONEMD	63	/* ZONE data Message Digest */
+#define T_SVCB		64	/* SerViCe Binding */
+#define T_HTTPS		65	/* HTTPS binding */
 /* non standard */
 #define T_SPF		99	/* sender policy framework */
 #define T_UINFO		100	/* user (finger) information */
 #define T_UID		101	/* user ID */
 #define T_GID		102	/* group ID */
 #define T_UNSPEC	103	/* Unspecified format (binary data) */
-#define T_UNSPECA	104	/* "unspecified ascii". Ugly MIT hack */
+#define T_NID		104	/* Node IDentifier */
+#define T_L32		105	/* Locator 32-bit */
+#define T_L64		106	/* Locator 64-bit */
+#define T_LP		107	/* Locator Pointer */
+#define T_EUI48		108	/* an EUI-48 address */
+#define T_EUI64		109	/* an EUI-64 address */
 /* Query type values which do not appear in resource records */
 #define T_TKEY		249	/* Transaction Key [RFC2930] */
 #define T_TSIG		250	/* Transaction Signature [RFC2845] */
@@ -172,6 +194,13 @@
 #define T_MAILB		253	/* transfer mailbox records */
 #define T_MAILA		254	/* transfer mail agent records */
 #define T_ANY		255	/* wildcard match */
+#define T_URI		256	/* uri records [RFC7553] */
+#define T_CAA		257	/* Certification Authority Authorization */
+#define T_AVC		258	/* Application Visibility and Control */
+#define T_DOA		259	/* Digital Object Architecture */
+#define T_AMTRELAY	260	/* Automatic Multicast Tunneling RELAY */
+#define T_TA		32768	/* DNSSEC Trust Authorities */
+#define T_DLV		32769	/* DNSSEC Lookaside Validation */
 
 /*
  * Values for class field

diff --git print-domain.c print-domain.c
index f5d74e8227c..023f8020738 100644
--- print-domain.c
+++ print-domain.c
@@ -252,7
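For illustration, the effect of the sync on decoding can be sketched with a tok2str-style lookup. This is a standalone sketch, not tcpdump's actual table or helper (the real code registers every type and uses tcpdump's own tok2str); only a few of the newly added entries are shown.

```c
#include <stdio.h>

/* Minimal type-number -> name mapping in the spirit of ns_type2str. */
struct tok {
	int v;
	const char *s;
};

static const struct tok ns_type2str[] = {
	{ 63,  "ZONEMD" },
	{ 64,  "SVCB" },
	{ 65,  "HTTPS" },	/* previously printed as "Type65" */
	{ 104, "NID" },		/* 104 used to be UNSPECA */
	{ 257, "CAA" },
	{ 0,   NULL }
};

static const char *
tok2str(const struct tok *t, char *buf, size_t len, int v)
{
	for (; t->s != NULL; t++)
		if (t->v == v)
			return t->s;
	snprintf(buf, len, "Type%d", v);	/* fallback for unknown types */
	return buf;
}
```

With the table entry present, a query for type 65 now decodes as "HTTPS" instead of falling through to the generic "Type65" formatter.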
Re: [External] : Re: parallel forwarding vs. bridges
Hello, just for the record...

> > in current tree the ether_input() is protected by NET_LOCK(), which is
> > grabbed by caller as a writer. bluhm's diff changes NET_LOCK() to a
> > readlock, so ether_input() can run concurrently. Switching NET_LOCK()
> > to r-lock has implications on the smr read section above. The thing is
> > the call to eb->eb_input() can sleep now. This is something that needs
> > to be avoided within an smr section.
>
> Is the new sleeping point introduced by the fact the PF_LOCK() is a
> rwlock? Did you consider using a mutex, at least for the time being,
> in order to not run in such issues?

below is a diff, which trades both pf(4) rw-locks for mutexes. diff
compiles, it still needs testing/experimenting.

regards
sashan

8<---8<---8<--8<

diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
index ae7bb008351..30c78da7d16 100644
--- a/sys/net/pf_ioctl.c
+++ b/sys/net/pf_ioctl.c
@@ -146,8 +146,8 @@ TAILQ_HEAD(pf_tags, pf_tagname)	pf_tags = TAILQ_HEAD_INITIALIZER(pf_tags),
  * grab the lock as writer. Whenever packet creates state it grabs pf_lock
  * first then it locks pf_state_lock as the writer.
  */
-struct rwlock		 pf_lock = RWLOCK_INITIALIZER("pf_lock");
-struct rwlock		 pf_state_lock = RWLOCK_INITIALIZER("pf_state_lock");
+struct mutex		 pf_lock = MUTEX_INITIALIZER(IPL_NET);
+struct mutex		 pf_state_lock = MUTEX_INITIALIZER(IPL_NET);
 
 #if (PF_QNAME_SIZE != PF_TAG_NAME_SIZE)
 #error PF_QNAME_SIZE must be equal to PF_TAG_NAME_SIZE

diff --git a/sys/net/pfvar_priv.h b/sys/net/pfvar_priv.h
index b1a122af409..c5ae392b468 100644
--- a/sys/net/pfvar_priv.h
+++ b/sys/net/pfvar_priv.h
@@ -39,7 +39,7 @@
 
 #include
 
-extern struct rwlock	pf_lock;
+extern struct mutex	pf_lock;
 
 struct pf_pdesc {
 	struct {
@@ -104,52 +104,41 @@ extern struct timeout	pf_purge_to;
 
 struct pf_state		*pf_state_ref(struct pf_state *);
 void			 pf_state_unref(struct pf_state *);
 
-extern struct rwlock	pf_lock;
-extern struct rwlock	pf_state_lock;
+extern struct mutex	pf_lock;
+extern struct mutex	pf_state_lock;
 
 #define PF_LOCK()	do {			\
 		NET_ASSERT_LOCKED();		\
-		rw_enter_write(&pf_lock);	\
+		mtx_enter(&pf_lock);		\
 	} while (0)
 
 #define PF_UNLOCK()	do {			\
 		PF_ASSERT_LOCKED();		\
-		rw_exit_write(&pf_lock);	\
+		mtx_leave(&pf_lock);		\
 	} while (0)
 
-#define PF_ASSERT_LOCKED()	do {				\
-		if (rw_status(&pf_lock) != RW_WRITE)		\
-			splassert_fail(RW_WRITE,		\
-			    rw_status(&pf_lock), __func__);	\
-	} while (0)
+#define PF_ASSERT_LOCKED()	(void)(0)
 
-#define PF_ASSERT_UNLOCKED()	do {				\
-		if (rw_status(&pf_lock) == RW_WRITE)		\
-			splassert_fail(0, rw_status(&pf_lock), __func__);\
-	} while (0)
+#define PF_ASSERT_UNLOCKED()	(void)(0)
 
 #define PF_STATE_ENTER_READ()	do {		\
-		rw_enter_read(&pf_state_lock);	\
+		mtx_enter(&pf_state_lock);	\
 	} while (0)
 
 #define PF_STATE_EXIT_READ()	do {		\
-		rw_exit_read(&pf_state_lock);	\
+		mtx_leave(&pf_state_lock);	\
 	} while (0)
 
 #define PF_STATE_ENTER_WRITE()	do {		\
-		rw_enter_write(&pf_state_lock);	\
+		mtx_enter(&pf_state_lock);	\
 	} while (0)
 
 #define PF_STATE_EXIT_WRITE()	do {		\
		PF_ASSERT_STATE_LOCKED();	\
-		rw_exit_write(&pf_state_lock);	\
+		mtx_leave(&pf_state_lock);	\
 	} while (0)
 
-#define PF_ASSERT_STATE_LOCKED()	do {				\
-		if (rw_status(&pf_state_lock) != RW_WRITE)		\
-			splassert_fail(RW_WRITE,			\
-			    rw_status(&pf_state_lock), __func__);	\
-	} while (0)
+#define PF_ASSERT_STATE_LOCKED()	(void)(0)
 
 extern void			 pf_purge_timeout(void *);
 extern void			 pf_purge(void *);
rpki-client 7.1 released
rpki-client 7.1 has just been released and will be available in the
rpki-client directory of any OpenBSD mirror soon.

rpki-client is a FREE, easy-to-use implementation of the Resource
Public Key Infrastructure (RPKI) for Relying Parties (RP) to
facilitate validation of the Route Origin of a BGP announcement. The
program queries the RPKI repository system and outputs Validated ROA
Payloads in the configuration format of OpenBGPD, BIRD, and also as
CSV or JSON objects for consumption by other routing stacks.

See RFC 6811 for a description of how BGP Prefix Origin Validation
secures the Internet's global routing system.

rpki-client was primarily developed by Kristaps Dzonsons, Claudio
Jeker, Job Snijders, and Sebastian Benoit as part of the OpenBSD
Project.

This release includes the following changes to the previous release:

* Add keep-alive support to the HTTP client code for RRDP,
* Reference-count and delete unused files synced via RRDP, as far as
  possible,
* In the JSON output, change the AS Number from a string ("AS123") to
  an integer ("123") to make processing of the output easier,
* Add an 'expires' column to CSV & JSON output, based on certificate
  and CRL validity times. The 'expires' value can be used to avoid
  route selection based on stale data when generating VRP sets, when
  faced with loss of communication between consumer and validator, or
  validator and CA repository,
* Make the runtime timeout (-s option) also trigger in child
  processes,
* Improved RRDP support; we encourage testing of RRDP with the -r
  option so that RRDP can be enabled by default in a future release.
  Please report any issues found.

In the portable version:

* Improve support for older LibreSSL versions (although the latest
  stable release is recommended),
* Add missing compat headers in release packages so they build on
  Alpine Linux and macOS.
rpki-client is known to compile and run on at least the following
operating systems: Alpine 3.12, Debian 9, 10, Fedora 31, 32, 33,
macOS, RHEL/CentOS 7, 8, Windows Subsystem for Linux 2, and OpenBSD.
It is our hope that packagers take interest and help adapt
rpki-client-portable to more distributions.

The mirrors where rpki-client can be found are on
https://www.rpki-client.org/portable.html

Reporting Bugs:
===============
General bugs may be reported to tech@openbsd.org

Portable bugs may be filed at
https://github.com/rpki-client/rpki-client-portable

We welcome feedback and improvements from the broader community.
Thanks to all of the contributors who helped make this release
possible.
Re: snmpd rename context to pdutype
On Fri, May 07, 2021 at 04:18:50PM +0200, Martijn van Duren wrote:
> When moving the traphandler to the snmpe process I overlooked the fact
> that "type" is being saved inside the switch statement under the
> sm_context name. RFC3411 talks about pduType, and the name context means
> something completely different in the v3 world.
> 
> The diff below moves our naming closer to the RFCs, which should
> hopefully prevent further confusion in the future.
> 
> While here I made the debug output print the pduType in a human readable
> format.
> 
> The invalid varbind check can be simplified to a simple "{}" in the
> ober_scanf_elements, allowing me to just drop the type variable.
> 
> OK?

I tested it and the diff looks good and legit to me.

> martijn@
> 
> Index: snmp.h
> ===================================================================
> RCS file: /cvs/src/usr.sbin/snmpd/snmp.h,v
> retrieving revision 1.16
> diff -u -p -r1.16 snmp.h
> --- snmp.h	30 Jun 2020 17:11:49 -0000	1.16
> +++ snmp.h	7 May 2021 14:17:12 -0000
> @@ -77,7 +77,7 @@ enum snmp_version {
>  	SNMP_V3 = 3
>  };
>  
> -enum snmp_context {
> +enum snmp_pdutype {
>  	SNMP_C_GETREQ = 0,
>  	SNMP_C_GETNEXTREQ = 1,
>  	SNMP_C_GETRESP = 2,
> Index: snmpd.h
> ===================================================================
> RCS file: /cvs/src/usr.sbin/snmpd/snmpd.h,v
> retrieving revision 1.94
> diff -u -p -r1.94 snmpd.h
> --- snmpd.h	5 Feb 2021 10:30:45 -0000	1.94
> +++ snmpd.h	7 May 2021 14:17:12 -0000
> @@ -384,7 +384,7 @@ struct snmp_message {
>  	socklen_t		sm_slen;
>  	int			sm_sock_tcp;
>  	int			sm_aflags;
> -	int			sm_type;
> +	enum snmp_pdutype	sm_pdutype;
>  	struct event		sm_sockev;
>  	char			sm_host[HOST_NAME_MAX+1];
>  	in_port_t		sm_port;
> @@ -405,7 +405,6 @@ struct snmp_message {
>  
>  	/* V1, V2c */
>  	char			sm_community[SNMPD_MAXCOMMUNITYLEN];
> -	int			sm_context;
>  
>  	/* V3 */
>  	long long		sm_msgid;
> Index: snmpe.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/snmpd/snmpe.c,v
> retrieving revision 1.70
> diff -u -p -r1.70 snmpe.c
> --- snmpe.c	22 Feb 2021 11:31:09 -0000	1.70
> +++ snmpe.c	7 May 2021 14:17:12 -0000
> @@ -41,6 +41,7 @@
>  #include "mib.h"
>  
>  void	 snmpe_init(struct privsep *, struct privsep_proc *, void *);
> +const char *snmpe_pdutype2string(enum snmp_pdutype);
>  int	 snmpe_parse(struct snmp_message *);
>  void	 snmpe_tryparse(int, struct snmp_message *);
>  int	 snmpe_parsevarbinds(struct snmp_message *);
> @@ -194,6 +195,36 @@ snmpe_bind(struct address *addr)
>  	return (-1);
>  }
>  
> +const char *
> +snmpe_pdutype2string(enum snmp_pdutype pdutype)
> +{
> +	static char unknown[sizeof("Unknown (4294967295)")];
> +
> +	switch (pdutype) {
> +	case SNMP_C_GETREQ:
> +		return "GetRequest";
> +	case SNMP_C_GETNEXTREQ:
> +		return "GetNextRequest";
> +	case SNMP_C_GETRESP:
> +		return "Response";
> +	case SNMP_C_SETREQ:
> +		return "SetRequest";
> +	case SNMP_C_TRAP:
> +		return "Trap";
> +	case SNMP_C_GETBULKREQ:
> +		return "GetBulkRequest";
> +	case SNMP_C_INFORMREQ:
> +		return "InformRequest";
> +	case SNMP_C_TRAPV2:
> +		return "SNMPv2-Trap";
> +	case SNMP_C_REPORT:
> +		return "Report";
> +	}
> +
> +	snprintf(unknown, sizeof(unknown), "Unknown (%u)", pdutype);
> +	return unknown;
> +}
> +
>  int
>  snmpe_parse(struct snmp_message *msg)
>  {
> @@ -202,7 +233,6 @@ snmpe_parse(struct snmp_message *msg)
>  	struct ber_element	*a;
>  	long long		 ver, req;
>  	long long		 errval, erridx;
> -	unsigned int		 type;
>  	u_int			 class;
>  	char			*comn;
>  	char			*flagstr, *ctxname;
> @@ -271,15 +301,15 @@ snmpe_parse(struct snmp_message *msg)
>  		goto fail;
>  	}
>  
> -	if (ober_scanf_elements(msg->sm_pdu, "t{e", &class, &type, &a) != 0)
> +	if (ober_scanf_elements(msg->sm_pdu, "t{e", &class, &(msg->sm_pdutype),
> +	    &a) != 0)
>  		goto parsefail;
>  
>  	/* SNMP PDU context */
>  	if (class != BER_CLASS_CONTEXT)
>  		goto parsefail;
>  
> -	msg->sm_type = type;
> -	switch (type) {
> +	switch (msg->sm_pdutype) {
>  	case SNMP_C_GETBULKREQ:
>  		if (msg->sm_version == SNMP_V1) {
>  			stats->snmp_inbadversions++;
> @@ -294,7 +324,7 @@ snmpe_parse(struct snmp_message *msg)
>  		/* FALLTHROUGH */
>  
>  	case SNMP_C_GETNEXTREQ:
> -		if (type ==
bgpd extend capability support (add-path, enhanced rr)
bgpd(8) will soon support ADD-PATH (RFC7911) and enhanced route
refresh (RFC7313). This is the first step toward this. It adds the
capability parsers, extends the capability struct and adds the
capability negotiation bits. The route refresh parser and generator
are extended to support the BoRR and EoRR messages, and last but not
least bgpctl is adjusted to print the new capabilities.

For now the system has no way of enabling the two new capabilities, so
bgpctl will only show if the peer sends the capability, but there
should be no other effect. The RDE bits for enhanced route refresh are
almost ready; add-path will take a bit more since the RDE needs to
grow an extra indirection. I decided to split this work up to simplify
review. Let's see if this works :)
-- 
:wq Claudio

Index: bgpctl/bgpctl.c
===================================================================
RCS file: /cvs/src/usr.sbin/bgpctl/bgpctl.c,v
retrieving revision 1.267
diff -u -p -r1.267 bgpctl.c
--- bgpctl/bgpctl.c	3 May 2021 14:01:56 -0000	1.267
+++ bgpctl/bgpctl.c	18 May 2021 08:49:34 -0000
@@ -1351,6 +1351,13 @@ print_capability(u_int8_t capa_code, u_c
 		} else
 			printf("bad length");
 		break;
+	case CAPA_ADD_PATH:
+		printf("add-path capability");
+		/* XXX there is more needed here */
+		break;
+	case CAPA_ENHANCED_RR:
+		printf("enhanced route refresh capability");
+		break;
 	default:
 		printf("unknown capability %u length %u", capa_code, len);
 		break;
Index: bgpctl/output.c
===================================================================
RCS file: /cvs/src/usr.sbin/bgpctl/output.c,v
retrieving revision 1.16
diff -u -p -r1.16 output.c
--- bgpctl/output.c	26 Apr 2021 18:23:20 -0000	1.16
+++ bgpctl/output.c	18 May 2021 15:17:39 -0000
@@ -147,6 +147,34 @@ show_neighbor_capa_mp(struct capabilitie
 }
 
 static void
+show_neighbor_capa_add_path(struct capabilities *capa)
+{
+	const char	*mode;
+	int		 comma;
+	u_int8_t	 i;
+
+	printf("Add-path: ");
+	for (i = 0, comma = 0; i < AID_MAX; i++) {
+		switch (capa->add_path[i]) {
+		case 0:
+		default:
+			continue;
+		case CAPA_AP_RECV:
+			mode = "recv";
+			break;
+		case CAPA_AP_SEND:
+			mode = "send";
+			break;
+		case CAPA_AP_BIDIR:
+			mode = "bidir";
+		}
+		printf("%s%s %s", comma ? ", " : "", aid2str(i), mode);
+		comma = 1;
+	}
+	printf("\n");
+}
+
+static void
 show_neighbor_capa_restart(struct capabilities *capa)
 {
 	int	comma;
@@ -202,6 +230,13 @@ show_neighbor_msgstats(struct peer *p)
 	    p->stats.prefix_sent_withdraw, p->stats.prefix_rcvd_withdraw);
 	printf(" %-15s %10llu %10llu\n", "End-of-Rib",
 	    p->stats.prefix_sent_eor, p->stats.prefix_rcvd_eor);
+	printf(" Route Refresh statistics:\n");
+	printf(" %-15s %10llu %10llu\n", "Request",
+	    p->stats.refresh_sent_req, p->stats.refresh_rcvd_req);
+	printf(" %-15s %10llu %10llu\n", "Begin-of-RR",
+	    p->stats.refresh_sent_borr, p->stats.refresh_rcvd_borr);
+	printf(" %-15s %10llu %10llu\n", "End-of-RR",
+	    p->stats.refresh_sent_eorr, p->stats.refresh_rcvd_eorr);
 }
 
 static void
@@ -210,7 +245,7 @@ show_neighbor_full(struct peer *p, struc
 	const char	*errstr;
 	struct in_addr	 ina;
 	char		*s;
-	int		 hascapamp = 0;
+	int		 hascapamp, hascapaap;
 	u_int8_t	 i;
 
 	if ((p->conf.remote_addr.aid == AID_INET &&
@@ -279,35 +314,57 @@ show_neighbor_full(struct peer *p, struc
 	    fmt_monotime(p->stats.last_read),
 	    p->holdtime, p->holdtime/3);
 	printf("  Last write %s\n", fmt_monotime(p->stats.last_write));
-	for (i = 0; i < AID_MAX; i++)
+
+	hascapamp = 0;
+	hascapaap = 0;
+	for (i = AID_MIN; i < AID_MAX; i++) {
 		if (p->capa.peer.mp[i])
 			hascapamp = 1;
-	if (hascapamp || p->capa.peer.refresh ||
-	    p->capa.peer.grestart.restart || p->capa.peer.as4byte) {
+		if (p->capa.peer.add_path[i])
+			hascapaap = 1;
+	}
+	if (hascapamp || hascapaap || p->capa.peer.grestart.restart ||
+	    p->capa.peer.refresh || p->capa.peer.enhanced_rr ||
+	    p->capa.peer.as4byte) {
 		printf("  Neighbor capabilities:\n");
 		if (hascapamp)
 			show_neighbor_capa_mp(&p->capa.peer);
+		if (p->capa.peer.as4byte)
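For reference, RFC 7911 lays the ADD-PATH capability value out as a list of (AFI, SAFI, Send/Receive) triples, which is what the new parser walks to fill capa.peer.add_path[]. A hedged sketch of one triple's wire encoding follows; the CAPA_AP_* values mirror the constants visible in the diff (1 = receive, 2 = send, 3 = both, per the RFC), but the encoder itself is illustrative, not bgpd's code.

```c
#include <stdint.h>
#include <stddef.h>

#define CAPA_AP_RECV	0x01	/* willing to receive additional paths */
#define CAPA_AP_SEND	0x02	/* able to send additional paths */
#define CAPA_AP_BIDIR	0x03	/* both directions */

/*
 * Encode one RFC 7911 ADD-PATH triple: 2-byte AFI (network order),
 * 1-byte SAFI, 1-byte Send/Receive.  Returns bytes written, or 0 on
 * overflow or an invalid mode.
 */
static size_t
encode_add_path(uint8_t *buf, size_t len, uint16_t afi, uint8_t safi,
    uint8_t mode)
{
	if (len < 4 || mode < CAPA_AP_RECV || mode > CAPA_AP_BIDIR)
		return 0;
	buf[0] = afi >> 8;
	buf[1] = afi & 0xff;
	buf[2] = safi;
	buf[3] = mode;
	return 4;
}
```

A capability carrying several AFI/SAFI pairs simply concatenates such 4-byte triples, which is why the parser can iterate the value in fixed-size steps.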
Add f_modify and f_process callbacks to socket filterops
This diff adds f_modify and f_process callbacks to socket event
filters. As a result, socket events are handled using the non-legacy
paths in filter_modify() and filter_process() of kern_event.c. This is
a step toward MP-safety. However, everything still runs under the
kernel lock.

The change has three intended effects:

* Socket events are handled without raising the system priority level.
  This makes the activity observable with btrace(8).

* kqueue itself no longer calls f_event of socket filterops, which
  allows replacing the conditional, NOTE_SUBMIT-based locking with a
  fixed call pattern.

* The state of a socket event is now always rechecked before delivery
  to user. Before, the recheck was skipped if the event was registered
  with EV_ONESHOT.

However, the change of behaviour with EV_ONESHOT is questionable. When
an activated event is being processed, the code will acquire the
socket lock anyway. Skipping the state check would only be a minor
optimization. In addition, I think the behaviour becomes more
consistent as now a delivered EV_ONESHOT event really was active at
the time of delivery.

Consider the following program. It creates a socket pair, writes a
byte to the socket, registers an EV_ONESHOT event, and reads the byte
from the socket. Next it checks how kevent(2) behaves.
#include <sys/types.h>
#include <sys/event.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	struct kevent kev[1];
	struct timespec ts = {};
	int fds[2], flags, kq, n;
	char b;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
		err(1, "socketpair");
	flags = fcntl(fds[0], F_GETFL, 0);
	fcntl(fds[0], F_SETFL, flags | O_NONBLOCK);

	printf("write 1\n");
	write(fds[1], "x", 1);

	kq = kqueue();
	if (kq == -1)
		err(1, "kqueue");
	EV_SET(&kev[0], fds[0], EVFILT_READ, EV_ADD | EV_ONESHOT, 0, 0, NULL);
	if (kevent(kq, kev, 1, NULL, 0, NULL) == -1)
		err(1, "kevent");

	n = read(fds[0], &b, 1);
	printf("read %d\n", n);
	n = read(fds[0], &b, 1);
	printf("read %d\n", n);

	n = kevent(kq, NULL, 0, kev, 1, &ts);
	printf("kevent %d\n", n);

	n = read(fds[0], &b, 1);
	printf("read %d\n", n);

	n = kevent(kq, NULL, 0, kev, 1, &ts);
	printf("kevent %d\n", n);

	printf("write 1\n");
	write(fds[1], "x", 1);

	n = kevent(kq, NULL, 0, kev, 1, &ts);
	printf("kevent %d\n", n);

	n = read(fds[0], &b, 1);
	printf("read %d\n", n);

	return 0;
}

With an unpatched kernel, the EV_ONESHOT event gets activated by the
pending byte when the event is registered. The event remains active
until delivery, and the delivery happens even though it is clear that
reading from the socket will fail. The program prints:

write 1
read 1
read -1
kevent 1
read -1
kevent 0
write 1
kevent 0
read 1

With the patch applied, the event gets delivered only if the socket
has bytes pending:

write 1
read 1
read -1
kevent 0
read -1
kevent 0
write 1
kevent 1
read 1

So, is this EV_ONESHOT change reasonable, or should the implementation
stick with the old way? FreeBSD appears to follow the old way. MacOS
might perform differently, though I am not sure about that. It is not
essential to change EV_ONESHOT, however.

Feedback and tests are welcome.
Index: kern/uipc_socket.c
===================================================================
RCS file: src/sys/kern/uipc_socket.c,v
retrieving revision 1.261
diff -u -p -r1.261 uipc_socket.c
--- kern/uipc_socket.c	13 May 2021 19:43:11 -0000	1.261
+++ kern/uipc_socket.c	18 May 2021 12:56:24 -0000
@@ -70,15 +70,26 @@ void	sorflush(struct socket *);
 
 void	filt_sordetach(struct knote *kn);
 int	filt_soread(struct knote *kn, long hint);
+int	filt_soreadmodify(struct kevent *kev, struct knote *kn);
+int	filt_soreadprocess(struct knote *kn, struct kevent *kev);
+int	filt_soread_common(struct knote *kn, struct socket *so);
 void	filt_sowdetach(struct knote *kn);
 int	filt_sowrite(struct knote *kn, long hint);
+int	filt_sowritemodify(struct kevent *kev, struct knote *kn);
+int	filt_sowriteprocess(struct knote *kn, struct kevent *kev);
+int	filt_sowrite_common(struct knote *kn, struct socket *so);
 int	filt_solisten(struct knote *kn, long hint);
+int	filt_solistenmodify(struct kevent *kev, struct knote *kn);
+int	filt_solistenprocess(struct knote *kn, struct kevent *kev);
+int	filt_solisten_common(struct knote *kn, struct socket *so);
 
 const struct filterops solisten_filtops = {
 	.f_flags	= FILTEROP_ISFD,
 	.f_attach	= NULL,
 	.f_detach	= filt_sordetach,
 	.f_event	= filt_solisten,
+	.f_modify	= filt_solistenmodify,
+	.f_process	= filt_solistenprocess,
 };
 
 const struct filterops soread_filtops = {
@@ -86,6 +97,8 @@ const struct filterops soread_filtops =
Re: snmpd rename context to pdutype
ping

On Fri, 2021-05-07 at 16:18 +0200, Martijn van Duren wrote:
> When moving the traphandler to the snmpe process I overlooked the fact
> that "type" is being saved inside the switch statement under the
> sm_context name. RFC3411 talks about pduType, and the name context means
> something completely different in the v3 world.
> 
> The diff below moves our naming closer to the RFCs, which should
> hopefully prevent further confusion in the future.
> 
> While here I made the debug output print the pduType in a human readable
> format.
> 
> The invalid varbind check can be simplified to a simple "{}" in the
> ober_scanf_elements, allowing me to just drop the type variable.
> 
> OK?
> 
> martijn@
> 
> Index: snmp.h
> ===================================================================
> RCS file: /cvs/src/usr.sbin/snmpd/snmp.h,v
> retrieving revision 1.16
> diff -u -p -r1.16 snmp.h
> --- snmp.h	30 Jun 2020 17:11:49 -0000	1.16
> +++ snmp.h	7 May 2021 14:17:12 -0000
> @@ -77,7 +77,7 @@ enum snmp_version {
>  	SNMP_V3 = 3
>  };
>  
> -enum snmp_context {
> +enum snmp_pdutype {
>  	SNMP_C_GETREQ = 0,
>  	SNMP_C_GETNEXTREQ = 1,
>  	SNMP_C_GETRESP = 2,
> Index: snmpd.h
> ===================================================================
> RCS file: /cvs/src/usr.sbin/snmpd/snmpd.h,v
> retrieving revision 1.94
> diff -u -p -r1.94 snmpd.h
> --- snmpd.h	5 Feb 2021 10:30:45 -0000	1.94
> +++ snmpd.h	7 May 2021 14:17:12 -0000
> @@ -384,7 +384,7 @@ struct snmp_message {
>  	socklen_t		sm_slen;
>  	int			sm_sock_tcp;
>  	int			sm_aflags;
> -	int			sm_type;
> +	enum snmp_pdutype	sm_pdutype;
>  	struct event		sm_sockev;
>  	char			sm_host[HOST_NAME_MAX+1];
>  	in_port_t		sm_port;
> @@ -405,7 +405,6 @@ struct snmp_message {
>  
>  	/* V1, V2c */
>  	char			sm_community[SNMPD_MAXCOMMUNITYLEN];
> -	int			sm_context;
>  
>  	/* V3 */
>  	long long		sm_msgid;
> Index: snmpe.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/snmpd/snmpe.c,v
> retrieving revision 1.70
> diff -u -p -r1.70 snmpe.c
> --- snmpe.c	22 Feb 2021 11:31:09 -0000	1.70
> +++ snmpe.c	7 May 2021 14:17:12 -0000
> @@ -41,6 +41,7 @@
>  #include "mib.h"
>  
>  void	 snmpe_init(struct privsep *, struct privsep_proc *, void *);
> +const char *snmpe_pdutype2string(enum snmp_pdutype);
>  int	 snmpe_parse(struct snmp_message *);
>  void	 snmpe_tryparse(int, struct snmp_message *);
>  int	 snmpe_parsevarbinds(struct snmp_message *);
> @@ -194,6 +195,36 @@ snmpe_bind(struct address *addr)
>  	return (-1);
>  }
>  
> +const char *
> +snmpe_pdutype2string(enum snmp_pdutype pdutype)
> +{
> +	static char unknown[sizeof("Unknown (4294967295)")];
> +
> +	switch (pdutype) {
> +	case SNMP_C_GETREQ:
> +		return "GetRequest";
> +	case SNMP_C_GETNEXTREQ:
> +		return "GetNextRequest";
> +	case SNMP_C_GETRESP:
> +		return "Response";
> +	case SNMP_C_SETREQ:
> +		return "SetRequest";
> +	case SNMP_C_TRAP:
> +		return "Trap";
> +	case SNMP_C_GETBULKREQ:
> +		return "GetBulkRequest";
> +	case SNMP_C_INFORMREQ:
> +		return "InformRequest";
> +	case SNMP_C_TRAPV2:
> +		return "SNMPv2-Trap";
> +	case SNMP_C_REPORT:
> +		return "Report";
> +	}
> +
> +	snprintf(unknown, sizeof(unknown), "Unknown (%u)", pdutype);
> +	return unknown;
> +}
> +
>  int
>  snmpe_parse(struct snmp_message *msg)
>  {
> @@ -202,7 +233,6 @@ snmpe_parse(struct snmp_message *msg)
>  	struct ber_element	*a;
>  	long long		 ver, req;
>  	long long		 errval, erridx;
> -	unsigned int		 type;
>  	u_int			 class;
>  	char			*comn;
>  	char			*flagstr, *ctxname;
> @@ -271,15 +301,15 @@ snmpe_parse(struct snmp_message *msg)
>  		goto fail;
>  	}
>  
> -	if (ober_scanf_elements(msg->sm_pdu, "t{e", &class, &type, &a) != 0)
> +	if (ober_scanf_elements(msg->sm_pdu, "t{e", &class, &(msg->sm_pdutype),
> +	    &a) != 0)
>  		goto parsefail;
>  
>  	/* SNMP PDU context */
>  	if (class != BER_CLASS_CONTEXT)
>  		goto parsefail;
>  
> -	msg->sm_type = type;
> -	switch (type) {
> +	switch (msg->sm_pdutype) {
>  	case SNMP_C_GETBULKREQ:
>  		if (msg->sm_version == SNMP_V1) {
>  			stats->snmp_inbadversions++;
> @@ -294,7 +324,7 @@ snmpe_parse(struct snmp_message *msg)
bgpd adjust graceful restart capability negotiation
When I adjusted the capability negotiation to check both sides for
presence, I made the graceful restart capability lose all AFI/SAFI
elements for the peer capabilities. This can be viewed with bgpctl
show nei, e.g.:

 Description: beznau-1
  BGP version 4, remote router-id 192.168.0.252
  BGP state = Established, up for 02:11:07
  Last read 00:00:09, holdtime 90s, keepalive interval 30s
  Last write 00:00:06
  Neighbor capabilities:
    Multiprotocol extensions: IPv4 unicast
    4-byte AS numbers
    Route Refresh
    Graceful Restart: Timeout: 90,
  Negotiated capabilities:
    Multiprotocol extensions: IPv4 unicast
    4-byte AS numbers
    Route Refresh
  Message statistics:

Instead of

 Description: beznau-1
  BGP version 4, remote router-id 192.168.0.252
  BGP state = Established, up for 00:02:31
  Last read 00:00:00, holdtime 90s, keepalive interval 30s
  Last write 00:00:00
  Neighbor capabilities:
    Multiprotocol extensions: IPv4 unicast
    4-byte AS numbers
    Route Refresh
    Graceful Restart: Timeout: 90, restarted, IPv4 unicast
  Negotiated capabilities:
    Multiprotocol extensions: IPv4 unicast
    4-byte AS numbers
    Route Refresh

This is just a visual issue. In both cases the flush happens and
graceful restart remains disabled.
-- 
:wq Claudio

Index: session.c
===================================================================
RCS file: /cvs/src/usr.sbin/bgpd/session.c,v
retrieving revision 1.415
diff -u -p -r1.415 session.c
--- session.c	16 May 2021 09:09:11 -0000	1.415
+++ session.c	18 May 2021 11:38:59 -0000
@@ -2585,24 +2585,24 @@ capa_neg_calc(struct peer *p)
 		int8_t	negflags;
 
 		/* disable GR if the AFI/SAFI is not present */
-		if (p->capa.ann.grestart.restart == 0 ||
-		    (p->capa.peer.grestart.flags[i] & CAPA_GR_PRESENT &&
+		if ((p->capa.peer.grestart.flags[i] & CAPA_GR_PRESENT &&
 		    p->capa.neg.mp[i] == 0))
 			p->capa.peer.grestart.flags[i] = 0;	/* disable */
 		/* look at current GR state and decide what to do */
 		negflags = p->capa.neg.grestart.flags[i];
 		p->capa.neg.grestart.flags[i] = p->capa.peer.grestart.flags[i];
 		if (negflags & CAPA_GR_RESTARTING) {
-			if (!(p->capa.peer.grestart.flags[i] &
-			    CAPA_GR_FORWARD)) {
+			if (p->capa.ann.grestart.restart != 0 ||
+			    p->capa.peer.grestart.flags[i] & CAPA_GR_FORWARD) {
+				p->capa.neg.grestart.flags[i] |=
+				    CAPA_GR_RESTARTING;
+			} else {
 				if (imsg_rde(IMSG_SESSION_FLUSH, p->conf.id,
 				    &i, sizeof(i)) == -1)
 					return (-1);
 				log_peer_warnx(&p->conf, "graceful restart of "
 				    "%s, not restarted, flushing", aid2str(i));
-			} else
-				p->capa.neg.grestart.flags[i] |=
-				    CAPA_GR_RESTARTING;
+			}
 		}
 	}
 	p->capa.neg.grestart.timeout = p->capa.peer.grestart.timeout;
Re: Use atomic op for UVM map refcount
On Tue, May 18, 2021 at 01:24:42PM +0200, Martin Pieuchot wrote:
> On 18/05/21(Tue) 12:07, Mark Kettenis wrote:
> > > Date: Tue, 18 May 2021 12:02:19 +0200
> > > From: Martin Pieuchot
> > > 
> > > This allows us to not rely on the KERNEL_LOCK() to check reference
> > > counts.
> > > 
> > > Also reduces differences with NetBSD and shrink my upcoming
> > > `vmobjlock' diff.
> > > 
> > > ok?
> > 
> > Shouldn't we make ref_count volatile in that case?
> 
> I don't know, I couldn't find any evidence about where to use "volatile"
> in the kernel.
> 
> My understanding is that using "volatile" tells the compiler to not
> "cache" the value of such field in a register because it can change at
> any time.  Is it so?
> 
> If that's correct, we should look at any piece of code reading such
> field multiple times without using atomic operation, right?
> 
> In this case `ref_count' is used once for sanity checks in
> UVM_MAP_REQ_WRITE() and after calling atomic_dec_int_nv() in
> uvm_map_deallocate().  So, I don't see "volatile" necessary here.
> Did I miss anything?
> 
> There's only a couple of 'volatile' usages in sys/sys.  These
> annotations do not explicitly indicate which piece of code requires it.
> Maybe it would be clearer to use a cast or a macro where necessary.
> This might help us understand why and where "volatile" is needed.
> 

Also, "volatile" tells the compiler not to drop or reorder an expression
due to optimisation.  But this is not such a case.

This diff is ok by me.
> > > Index: uvm/uvm_map.c
> > > ===================================================================
> > > RCS file: /cvs/src/sys/uvm/uvm_map.c,v
> > > retrieving revision 1.274
> > > diff -u -p -r1.274 uvm_map.c
> > > --- uvm/uvm_map.c	26 Mar 2021 13:40:05 -0000	1.274
> > > +++ uvm/uvm_map.c	18 May 2021 09:36:55 -0000
> > > @@ -491,12 +491,13 @@ uvm_mapent_addr_remove(struct vm_map *ma
> > >  /*
> > >   * uvm_map_reference: add reference to a map
> > >   *
> > > - * XXX check map reference counter lock
> > > + * => map need not be locked
> > >   */
> > > -#define uvm_map_reference(_map)				\
> > > -	do {							\
> > > -		map->ref_count++;				\
> > > -	} while (0)
> > > +void
> > > +uvm_map_reference(struct vm_map *map)
> > > +{
> > > +	atomic_inc_int(&map->ref_count);
> > > +}
> > > 
> > >  /*
> > >   * Calculate the dused delta.
> > > @@ -4292,7 +4293,7 @@ uvm_map_deallocate(vm_map_t map)
> > >  	int c;
> > >  	struct uvm_map_deadq dead;
> > > 
> > > -	c = --map->ref_count;
> > > +	c = atomic_dec_int_nv(&map->ref_count);
> > >  	if (c > 0) {
> > >  		return;
> > >  	}
> > > Index: uvm/uvm_map.h
> > > ===================================================================
> > > RCS file: /cvs/src/sys/uvm/uvm_map.h,v
> > > retrieving revision 1.69
> > > diff -u -p -r1.69 uvm_map.h
> > > --- uvm/uvm_map.h	12 Mar 2021 14:15:49 -0000	1.69
> > > +++ uvm/uvm_map.h	18 May 2021 09:36:36 -0000
> > > @@ -259,6 +259,7 @@ RBT_PROTOTYPE(uvm_map_addr, vm_map_entry
> > >   * read_locks and write_locks are used in lock debugging code.
> > >   *
> > >   * Locks used to protect struct members in this file:
> > > + *	a	atomic operations
> > >   *	I	immutable after creation or exec(2)
> > >   *	v	`vm_map_lock' (this map `lock' or `mtx')
> > >   */
> > > @@ -272,7 +273,7 @@ struct vm_map {
> > >  	struct uvm_map_addr	addr;		/* [v] Entry tree, by addr */
> > > 
> > >  	vsize_t			size;		/* virtual size */
> > > -	int			ref_count;	/* Reference count */
> > > +	int			ref_count;	/* [a] Reference count */
> > >  	int			flags;		/* flags */
> > >  	struct mutex		flags_lock;	/* flags lock */
> > >  	unsigned int		timestamp;	/* Version number */
Re: Use atomic op for UVM map refcount
On 18/05/21(Tue) 12:07, Mark Kettenis wrote:
> > Date: Tue, 18 May 2021 12:02:19 +0200
> > From: Martin Pieuchot
> > 
> > This allows us to not rely on the KERNEL_LOCK() to check reference
> > counts.
> > 
> > Also reduces differences with NetBSD and shrink my upcoming
> > `vmobjlock' diff.
> > 
> > ok?
> 
> Shouldn't we make ref_count volatile in that case?

I don't know, I couldn't find any evidence about where to use "volatile"
in the kernel.

My understanding is that using "volatile" tells the compiler to not
"cache" the value of such field in a register because it can change at
any time.  Is it so?

If that's correct, we should look at any piece of code reading such
field multiple times without using atomic operation, right?

In this case `ref_count' is used once for sanity checks in
UVM_MAP_REQ_WRITE() and after calling atomic_dec_int_nv() in
uvm_map_deallocate().  So, I don't see "volatile" necessary here.
Did I miss anything?

There's only a couple of 'volatile' usages in sys/sys.  These
annotations do not explicitly indicate which piece of code requires it.
Maybe it would be clearer to use a cast or a macro where necessary.
This might help us understand why and where "volatile" is needed.

> > Index: uvm/uvm_map.c
> > ===================================================================
> > RCS file: /cvs/src/sys/uvm/uvm_map.c,v
> > retrieving revision 1.274
> > diff -u -p -r1.274 uvm_map.c
> > --- uvm/uvm_map.c	26 Mar 2021 13:40:05 -0000	1.274
> > +++ uvm/uvm_map.c	18 May 2021 09:36:55 -0000
> > @@ -491,12 +491,13 @@ uvm_mapent_addr_remove(struct vm_map *ma
> >  /*
> >   * uvm_map_reference: add reference to a map
> >   *
> > - * XXX check map reference counter lock
> > + * => map need not be locked
> >   */
> > -#define uvm_map_reference(_map)				\
> > -	do {							\
> > -		map->ref_count++;				\
> > -	} while (0)
> > +void
> > +uvm_map_reference(struct vm_map *map)
> > +{
> > +	atomic_inc_int(&map->ref_count);
> > +}
> > 
> >  /*
> >   * Calculate the dused delta.
> > @@ -4292,7 +4293,7 @@ uvm_map_deallocate(vm_map_t map)
> >  	int c;
> >  	struct uvm_map_deadq dead;
> > 
> > -	c = --map->ref_count;
> > +	c = atomic_dec_int_nv(&map->ref_count);
> >  	if (c > 0) {
> >  		return;
> >  	}
> > Index: uvm/uvm_map.h
> > ===================================================================
> > RCS file: /cvs/src/sys/uvm/uvm_map.h,v
> > retrieving revision 1.69
> > diff -u -p -r1.69 uvm_map.h
> > --- uvm/uvm_map.h	12 Mar 2021 14:15:49 -0000	1.69
> > +++ uvm/uvm_map.h	18 May 2021 09:36:36 -0000
> > @@ -259,6 +259,7 @@ RBT_PROTOTYPE(uvm_map_addr, vm_map_entry
> >   * read_locks and write_locks are used in lock debugging code.
> >   *
> >   * Locks used to protect struct members in this file:
> > + *	a	atomic operations
> >   *	I	immutable after creation or exec(2)
> >   *	v	`vm_map_lock' (this map `lock' or `mtx')
> >   */
> > @@ -272,7 +273,7 @@ struct vm_map {
> >  	struct uvm_map_addr	addr;		/* [v] Entry tree, by addr */
> > 
> >  	vsize_t			size;		/* virtual size */
> > -	int			ref_count;	/* Reference count */
> > +	int			ref_count;	/* [a] Reference count */
> >  	int			flags;		/* flags */
> >  	struct mutex		flags_lock;	/* flags lock */
> >  	unsigned int		timestamp;	/* Version number */
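For readers following along, the pattern under discussion can be sketched in user space with C11 atomics. This is a toy model: `refobj` and its functions are invented names, and `atomic_fetch_add()`/`atomic_fetch_sub()` stand in for the kernel's `atomic_inc_int()` and `atomic_dec_int_nv()`.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy model of a map-style reference count. */
struct refobj {
	atomic_uint	ref_count;
	int		freed;	/* set when the last reference drops */
};

static void
refobj_reference(struct refobj *o)
{
	/* No lock needed: the increment itself is atomic. */
	atomic_fetch_add(&o->ref_count, 1);
}

static void
refobj_deallocate(struct refobj *o)
{
	/*
	 * The free/no-free decision must use the value returned by the
	 * atomic op itself (the "new value" semantics of
	 * atomic_dec_int_nv); re-reading ref_count afterwards could
	 * race with a concurrent caller.  atomic_fetch_sub() returns
	 * the old value, so 1 means we dropped the last reference.
	 */
	if (atomic_fetch_sub(&o->ref_count, 1) != 1)
		return;		/* somebody still holds a reference */
	o->freed = 1;		/* last reference gone: tear down */
}
```

Because every access to the counter goes through an atomic operation here, the field needs no "volatile" qualifier, which matches the conclusion of the thread above.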
OpenIKED 6.9.0 Released
OpenIKED 6.9.0 has just been released.  It will be arriving in the
OpenIKED directory of your local OpenBSD mirror soon.

OpenIKED is a free, permissively licensed Internet Key Exchange (IKEv2)
implementation, developed as part of the OpenBSD project.  It is
intended to be a lean, secure and interoperable daemon that allows for
easy setup and management of IPsec VPNs.

This release is based on OpenBSD 6.9.  The portable versions take the
OpenBSD based source code and add compatibility functions and build
infrastructure for other operating systems.

Version 6.9.0 is the first release of OpenIKED-portable in quite some
time.  Since the last portable release a significant amount of features
and bug fixes have been added.  If you have not used OpenIKED-portable
in a while, you will be positively surprised!

OpenIKED-portable is known to compile and run on at least the following
operating systems: Arch Linux, Debian 10, FreeBSD 12, FreeBSD 13 and
NetBSD 9.  It is our hope that packagers take interest and help adapt
OpenIKED-portable to more distributions.

OpenIKED-portable can be downloaded from any of the mirrors listed at
https://www.openbsd.org/ftp.html, from the /pub/OpenBSD/OpenIKED
directory.

General bugs may be reported to b...@openbsd.org.  Portable bugs may be
filed at https://github.com/openiked/openiked-portable.  We welcome
feedback and improvements from the broader community.

Thanks to all of the contributors who helped make this release possible.
bgpd upgrade to RFC6793
Our four-byte AS support dates back to the days of the original draft.
Since then a new RFC 6793 got released that adjusted the error handling
a bit.  RFC 6793 just treats any error on the AS4_PATH attribute with
attribute discard, in the hope that the AS_PATH is better.  The
reasoning is a bit questionable:

   Given that the two-octet AS numbers dominate during the transition
   and are carried in the AS_PATH attribute by an OLD BGP speaker, in
   this document the "attribute discard" approach is chosen to handle
   a malformed AS4_PATH attribute.

My IPv4 feed has 41148 of 71059 or 57% two-octet AS numbers.  I would
not call that "dominate" (maybe the transition period is over...).

Anyway, let's follow the standard.

--
:wq Claudio

Index: bgpd.8
===================================================================
RCS file: /cvs/src/usr.sbin/bgpd/bgpd.8,v
retrieving revision 1.66
diff -u -p -r1.66 bgpd.8
--- bgpd.8	27 Apr 2021 11:34:58 -0000	1.66
+++ bgpd.8	18 May 2021 10:04:47 -0000
@@ -317,14 +317,6 @@ has been started.
 .Re
 .Pp
 .Rs
-.%A Q. Vohra
-.%A E. Chen
-.%D May 2007
-.%R RFC 4893
-.%T BGP Support for Four-octet AS Number Space
-.Re
-.Pp
-.Rs
 .%A V. Gill
 .%A J. Heasley
 .%A D. Meyer
@@ -349,6 +341,14 @@ has been started.
 .%D June 2011
 .%R RFC 6286
 .%T Autonomous-System-Wide Unique BGP Identifier for BGP-4
+.Re
+.Pp
+.Rs
+.%A Q. Vohra
+.%A E. Chen
+.%D Dec 2012
+.%R RFC 6793
+.%T BGP Support for Four-Octet Autonomous System (AS) Number Space
 .Re
 .Pp
 .Rs
Index: rde.c
===================================================================
RCS file: /cvs/src/usr.sbin/bgpd/rde.c,v
retrieving revision 1.520
diff -u -p -r1.520 rde.c
--- rde.c	6 May 2021 09:18:54 -0000	1.520
+++ rde.c	18 May 2021 10:29:15 -0000
@@ -1918,27 +1918,11 @@ bad_flags:
 			goto bad_flags;
 		if ((error = aspath_verify(p, attr_len, 1,
 		    rde_no_as_set(peer))) != 0) {
-			/*
-			 * XXX RFC does not specify how to handle errors.
-			 * XXX Instead of dropping the session because of a
-			 * XXX bad path just mark the full update as having
-			 * XXX a parse error which makes the update no longer
-			 * XXX eligible and will not be considered for routing
-			 * XXX or redistribution.
-			 * XXX We follow draft-ietf-idr-optional-transitive
-			 * XXX by looking at the partial bit.
-			 * XXX Consider soft errors similar to a partial attr.
-			 */
-			if (flags & ATTR_PARTIAL || error == AS_ERR_SOFT) {
-				a->flags |= F_ATTR_PARSE_ERR;
-				log_peer_warnx(&peer->conf, "bad AS4_PATH, "
-				    "path invalidated and prefix withdrawn");
-				goto optattr;
-			} else {
-				rde_update_err(peer, ERR_UPDATE, ERR_UPD_ASPATH,
-				    NULL, 0);
-				return (-1);
-			}
+			/* As per RFC6793 use "attribute discard" here. */
+			log_peer_warnx(&peer->conf, "bad AS4_PATH, "
+			    "attribute discarded");
+			plen += attr_len;
+			break;
 		}
 		a->flags |= F_ATTR_AS4BYTE_NEW;
 		goto optattr;
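The behavioural change above can be summarized as a small decision function. This is a hypothetical sketch: the enum and function names are invented for illustration and do not exist in bgpd.

```c
#include <assert.h>

/* Possible outcomes when AS4_PATH fails to verify. */
enum as4path_action {
	AS4_DISCARD_ATTR,	/* drop AS4_PATH, keep the update */
	AS4_INVALIDATE_PATH,	/* old soft/partial case: update ineligible */
	AS4_RESET_SESSION	/* old hard case: NOTIFICATION, session down */
};

static enum as4path_action
as4path_error_action(int rfc6793, int partial, int soft_error)
{
	if (rfc6793)
		/* RFC 6793: every AS4_PATH error is "attribute discard". */
		return AS4_DISCARD_ATTR;
	/* Pre-RFC 6793 behaviour, per draft-ietf-idr-optional-transitive. */
	if (partial || soft_error)
		return AS4_INVALIDATE_PATH;
	return AS4_RESET_SESSION;
}
```

The key difference: under RFC 6793 a malformed AS4_PATH can no longer tear down the session; the parser simply skips attr_len bytes and continues with the remaining attributes.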
Re: Use atomic op for UVM map refcount
> Date: Tue, 18 May 2021 12:02:19 +0200
> From: Martin Pieuchot
> 
> This allows us to not rely on the KERNEL_LOCK() to check reference
> counts.
> 
> Also reduces differences with NetBSD and shrink my upcoming
> `vmobjlock' diff.
> 
> ok?

Shouldn't we make ref_count volatile in that case?

> Index: uvm/uvm_map.c
> ===================================================================
> RCS file: /cvs/src/sys/uvm/uvm_map.c,v
> retrieving revision 1.274
> diff -u -p -r1.274 uvm_map.c
> --- uvm/uvm_map.c	26 Mar 2021 13:40:05 -0000	1.274
> +++ uvm/uvm_map.c	18 May 2021 09:36:55 -0000
> @@ -491,12 +491,13 @@ uvm_mapent_addr_remove(struct vm_map *ma
>  /*
>   * uvm_map_reference: add reference to a map
>   *
> - * XXX check map reference counter lock
> + * => map need not be locked
>   */
> -#define uvm_map_reference(_map)				\
> -	do {							\
> -		map->ref_count++;				\
> -	} while (0)
> +void
> +uvm_map_reference(struct vm_map *map)
> +{
> +	atomic_inc_int(&map->ref_count);
> +}
> 
>  /*
>   * Calculate the dused delta.
> @@ -4292,7 +4293,7 @@ uvm_map_deallocate(vm_map_t map)
>  	int c;
>  	struct uvm_map_deadq dead;
> 
> -	c = --map->ref_count;
> +	c = atomic_dec_int_nv(&map->ref_count);
>  	if (c > 0) {
>  		return;
>  	}
> Index: uvm/uvm_map.h
> ===================================================================
> RCS file: /cvs/src/sys/uvm/uvm_map.h,v
> retrieving revision 1.69
> diff -u -p -r1.69 uvm_map.h
> --- uvm/uvm_map.h	12 Mar 2021 14:15:49 -0000	1.69
> +++ uvm/uvm_map.h	18 May 2021 09:36:36 -0000
> @@ -259,6 +259,7 @@ RBT_PROTOTYPE(uvm_map_addr, vm_map_entry
>   * read_locks and write_locks are used in lock debugging code.
>   *
>   * Locks used to protect struct members in this file:
> + *	a	atomic operations
>   *	I	immutable after creation or exec(2)
>   *	v	`vm_map_lock' (this map `lock' or `mtx')
>   */
> @@ -272,7 +273,7 @@ struct vm_map {
>  	struct uvm_map_addr	addr;		/* [v] Entry tree, by addr */
> 
>  	vsize_t			size;		/* virtual size */
> -	int			ref_count;	/* Reference count */
> +	int			ref_count;	/* [a] Reference count */
>  	int			flags;		/* flags */
>  	struct mutex		flags_lock;	/* flags lock */
>  	unsigned int		timestamp;	/* Version number */
Use atomic op for UVM map refcount
This allows us to not rely on the KERNEL_LOCK() to check reference
counts.

Also reduces differences with NetBSD and shrink my upcoming `vmobjlock'
diff.

ok?

Index: uvm/uvm_map.c
===================================================================
RCS file: /cvs/src/sys/uvm/uvm_map.c,v
retrieving revision 1.274
diff -u -p -r1.274 uvm_map.c
--- uvm/uvm_map.c	26 Mar 2021 13:40:05 -0000	1.274
+++ uvm/uvm_map.c	18 May 2021 09:36:55 -0000
@@ -491,12 +491,13 @@ uvm_mapent_addr_remove(struct vm_map *ma
 /*
  * uvm_map_reference: add reference to a map
  *
- * XXX check map reference counter lock
+ * => map need not be locked
  */
-#define uvm_map_reference(_map)					\
-	do {							\
-		map->ref_count++;				\
-	} while (0)
+void
+uvm_map_reference(struct vm_map *map)
+{
+	atomic_inc_int(&map->ref_count);
+}
 
 /*
  * Calculate the dused delta.
@@ -4292,7 +4293,7 @@ uvm_map_deallocate(vm_map_t map)
 	int c;
 	struct uvm_map_deadq dead;
 
-	c = --map->ref_count;
+	c = atomic_dec_int_nv(&map->ref_count);
 	if (c > 0) {
 		return;
 	}
Index: uvm/uvm_map.h
===================================================================
RCS file: /cvs/src/sys/uvm/uvm_map.h,v
retrieving revision 1.69
diff -u -p -r1.69 uvm_map.h
--- uvm/uvm_map.h	12 Mar 2021 14:15:49 -0000	1.69
+++ uvm/uvm_map.h	18 May 2021 09:36:36 -0000
@@ -259,6 +259,7 @@ RBT_PROTOTYPE(uvm_map_addr, vm_map_entry
  * read_locks and write_locks are used in lock debugging code.
  *
  * Locks used to protect struct members in this file:
+ *	a	atomic operations
  *	I	immutable after creation or exec(2)
  *	v	`vm_map_lock' (this map `lock' or `mtx')
  */
@@ -272,7 +273,7 @@ struct vm_map {
 	struct uvm_map_addr	addr;		/* [v] Entry tree, by addr */
 
 	vsize_t			size;		/* virtual size */
-	int			ref_count;	/* Reference count */
+	int			ref_count;	/* [a] Reference count */
 	int			flags;		/* flags */
 	struct mutex		flags_lock;	/* flags lock */
 	unsigned int		timestamp;	/* Version number */
Re: Regarding May 17 patch
No, EVBUFFER_LENGTH(clt->clt_srvevb) will always be 8 for an
FCGI_END_REQUEST, see:
http://www.mit.edu/~yandros/doc/specs/fcgi-spec.html#S5.5
(Assuming the fcgi server is well behaved.)

On 2021-05-18 09:12 +02, Matthias Pressfreund wrote:
> Hi Florian, isn't this what you actually wanted?
> 
> Index: usr.sbin/httpd/server_fcgi.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/httpd/server_fcgi.c,v
> retrieving revision 1.86
> diff -u -p -u -p -r1.86 server_fcgi.c
> --- usr.sbin/httpd/server_fcgi.c	17 May 2021 09:26:52 -0000	1.86
> +++ usr.sbin/httpd/server_fcgi.c	18 May 2021 07:08:24 -0000
> @@ -616,7 +616,7 @@ server_fcgi_header(struct client *clt, u
>  	if (kv_add(&resp->http_headers, "Server", HTTPD_SERVERNAME) == NULL)
>  		return (-1);
> 
> -	if (clt->clt_fcgi.type == FCGI_END_REQUEST ||
> +	if (clt->clt_fcgi.type == FCGI_END_REQUEST &&
>  	    EVBUFFER_LENGTH(clt->clt_srvevb) == 0) {
>  		/* Can't chunk encode an empty body. */
>  		clt->clt_fcgi.chunked = 0;

-- 
I'm not entirely sure you are real.
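The 8 bytes Florian mentions are the FCGI_EndRequestBody payload from section 5.5 of the FastCGI specification linked above. A struct mirroring that layout (the struct and field names here are illustrative; httpd works on the raw bytes) shows why the buffer holds 8 bytes, not 0, when an FCGI_END_REQUEST record has just been read, so the proposed `&&` condition could never fire.

```c
#include <assert.h>
#include <stdint.h>

/*
 * FCGI_EndRequestBody per section 5.5 of the FastCGI spec:
 * four bytes of big-endian application exit status, one protocol
 * status byte, three reserved bytes -- always 8 bytes total.
 */
struct fcgi_end_request_body {
	uint8_t	app_status[4];
	uint8_t	protocol_status;
	uint8_t	reserved[3];
};
```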
Re: httpd(8): don't try to chunk-encode an empty body
On 2021-05-18 00:47 +02, Sebastian Benoit wrote:
> The comments in server_fcgi_header seem to suggest more dragons lurk
> in this area.

Sush!

-- 
I'm not entirely sure you are real.