Re: atomic read write

2022-03-10 Thread Visa Hankala
On Thu, Mar 10, 2022 at 07:17:43PM +0100, Alexander Bluhm wrote:
> On Thu, Mar 10, 2022 at 04:39:49PM +0100, Alexander Bluhm wrote:
> > > Below is a patch that shows how to accomplish release semantics with
> > > the file and refcnt APIs. (The added memory barriers ensure that the
> > > CPU completes its loads and stores to the object before dropping the
> > > reference. Another CPU might delete the object immediately after.
> > > The barrier might not be strictly necessary with refcnt_finalize(),
> > > though.)
> 
> The enter and exit membars that protect the critical section should
> be symmetric.  Maybe this diff is better.

No, symmetry is not always present. See below.

> And to avoid memory barriers that nobody understands we should convert
> FRELE to refcnt_rele.

I am not sure about that. I think as long as file reference counting
does unusual things that refcnt does not implement, f_count handling
should be separate.

> Index: kern/kern_synch.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_synch.c,v
> retrieving revision 1.183
> diff -u -p -r1.183 kern_synch.c
> --- kern/kern_synch.c 10 Mar 2022 15:21:08 -  1.183
> +++ kern/kern_synch.c 10 Mar 2022 18:11:39 -
> @@ -805,6 +805,7 @@ void
>  refcnt_init(struct refcnt *r)
>  {
>   atomic_store_int(&r->r_refs, 1);
> + membar_enter_after_atomic();
>  }

I think membar is unnecessary here. Concurrent access can only happen
after the object has been "published", and the publishing should have
the appropriate memory barriers.

Without proper publishing, the code is prone to race conditions.

Also note that membar_enter_after_atomic() is not valid with
atomic_store_* or atomic_load_*. See my comment about cond_signal().

>  
>  void
> @@ -818,6 +819,7 @@ refcnt_take(struct refcnt *r)
>  #else
>   atomic_inc_int(&r->r_refs);
>  #endif
> + membar_enter_after_atomic();
>  }

This is unnecessary. The caller already has a reference to the object.
refcnt_take() only has the intent of increasing the reference count.

>  
>  int
> @@ -825,6 +827,7 @@ refcnt_rele(struct refcnt *r)
>  {
>   u_int refcnt;
>  
> + membar_exit_before_atomic();
>   refcnt = atomic_dec_int_nv(&r->r_refs);
>   KASSERT(refcnt != ~0);
>  
> @@ -844,6 +847,7 @@ refcnt_finalize(struct refcnt *r, const 
>   struct sleep_state sls;
>   u_int refcnt;
>  
> + membar_exit_before_atomic();
>   refcnt = atomic_dec_int_nv(&r->r_refs);
>   while (refcnt) {
>   sleep_setup(&sls, r, PWAIT, wmesg, 0);
> @@ -856,11 +860,13 @@ void
>  cond_init(struct cond *c)
>  {
>   atomic_store_int(&c->c_wait, 1);
> + membar_enter_after_atomic();
>  }

Same point here as with refcnt_init().

>  
>  void
>  cond_signal(struct cond *c)
>  {
> + membar_exit_before_atomic();
>   atomic_store_int(&c->c_wait, 0);

This should use membar_exit(). membar_exit_before_atomic() is valid
only when accompanied with a true read-modify-write atomic operation.

The atomic_ prefix with the store and load instructions confuses this
somewhat.

The wakeup call that follows provides a membar function, but it comes
too late as c_wait has already been cleared.

>  
>   wakeup_one(c);
> @@ -872,9 +878,11 @@ cond_wait(struct cond *c, const char *wm
>   struct sleep_state sls;
>   unsigned int wait;
>  
> + membar_exit_before_atomic();
>   wait = atomic_load_int(&c->c_wait);
>   while (wait) {
>   sleep_setup(&sls, c, PWAIT, wmesg, 0);
> + membar_exit_before_atomic();
>   wait = atomic_load_int(&c->c_wait);
>   sleep_finish(&sls, wait);
>   }
> 

I think this should use membar_enter() after the loop.

cond_wait() is supposed to provide acquire semantics; once cond_wait()
returns, the CPU sees a state that is at least as recent as the one
that the caller of cond_signal() saw.

In a way, cond_wait() is similar to lock acquisition, and cond_signal()
similar to lock release.
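[Editorial illustration.] The release/acquire pairing described above can be sketched in userland C11 terms. This is not the kernel code: the names cond_signal_sketch and cond_wait_sketch are hypothetical, atomic_thread_fence(memory_order_release) stands in for membar_exit(), and the acquire fence for membar_enter().

```c
#include <stdatomic.h>

/* Shared state handed from signaller to waiter. */
static int payload;
static atomic_uint c_wait = 1;

/*
 * Signaller: a release fence (analogue of membar_exit()) before
 * clearing c_wait ensures the waiter cannot observe c_wait == 0
 * without also seeing every store made before the signal.
 */
static void
cond_signal_sketch(void)
{
	payload = 1234;		/* work completed before signalling */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&c_wait, 0, memory_order_relaxed);
}

/*
 * Waiter: after observing c_wait == 0, an acquire fence (analogue of
 * membar_enter()) orders the subsequent reads of the signaller's
 * state, giving the lock-acquisition-like semantics described above.
 */
static int
cond_wait_sketch(void)
{
	while (atomic_load_explicit(&c_wait, memory_order_relaxed) != 0)
		;	/* the kernel would sleep here instead of spinning */
	atomic_thread_fence(memory_order_acquire);
	return payload;
}
```

The fences pair across threads: the waiter's acquire synchronizes with the signaller's release, which is the asymmetry Visa points out (release on one side, acquire on the other, not the same barrier on both).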



Re: initial iwx(4) 11ac patch for testing

2022-03-10 Thread Kevin Lo
On Thu, Mar 10, 2022 at 12:35:20PM +0100, Stefan Sperling wrote:
> 
> On Thu, Mar 10, 2022 at 12:25:17PM +0100, Stefan Sperling wrote:
> > Unless anyone else finds a problem, this patch can be considered ready
> > for review and commit.
> 
> Of course, I forgot to apply my sysassert fix to the second phy context
> command function...  Fixed here.

Tested:

iwx0 at pci7 dev 0 function 0 "Intel Wi-Fi 6 AX200" rev 0x1a, msix
iwx0: hw rev 0x340, fw ver 67.8f59b80b.0, address 50:e0:85:xx:xx:xx

iwx0: flags=808843 mtu 1500
lladdr 50:e0:85:xx:xx:xx
index 2 priority 4 llprio 3
groups: wlan egress
media: IEEE802.11 autoselect (VHT-MCS9 mode 11ac)
status: active
ieee80211: join wing chan 36 bssid b4:8c:9d:49:7d:c8 100% wpakey 
wpaprotos wpa2 wpaakms psk wpaciphers ccmp wpagroupcipher ccmp
inet 10.0.1.178 netmask 0xff00 broadcast 10.255.255.255

I've been running a kernel with this over 5 hours without encountering any
issues.  Thanks for your hard work!



Re: pluart(4): fifo support

2022-03-10 Thread Anton Lindqvist
On Tue, Mar 08, 2022 at 01:44:47PM +, Visa Hankala wrote:
> On Tue, Mar 08, 2022 at 08:04:36AM +0100, Anton Lindqvist wrote:
> > On Mon, Mar 07, 2022 at 07:36:35AM +, Visa Hankala wrote:
> > > I still think that checking TXFF and using the same code for both
> > > SBSA and true PL011 UARTs would be the best choice. This would avoid
> > > fragmenting the code and improve robustness by relying on functionality
> > > that is common to the different controller variants.
> > 
> > Fair enough, new diff.
> 
> Maybe the comments should omit the FIFO space description and just
> mention the lack of the level control register in the SBSA UART
> register interface.

I ended up tweaking the comments before committing. Thanks for all the
feedback.



Re: initial iwx(4) 11ac patch for testing

2022-03-10 Thread Stefan Sperling
On Wed, Mar 09, 2022 at 01:07:47PM +0100, Stefan Sperling wrote:
> This patch adds initial 11ac support to the iwx(4) driver.
> This means that 80MHz channels can be used. No other 11ac features
> are enabled yet.
> 
> This is not yet a patch which could be committed. Apart from debug
> prints which need to go, there is a known issue found by dv@ where
> this patch causes a firmware error, sysassert 0x20101A25. The reason
> for this is not known.
> It would help to get more testing to see if more clues can be found
> based on where this error pops up. I cannot reproduce the error myself.

Thanks to a hint from Johannes Berg from Linux/Intel, the above problem
has been found.

In some channel configurations we would end up running the PHY at 20MHz
but ask for Tx rates using 80MHz, which caused sysassert 0x20101A25.
This is fixed here.

I have also removed debug prints.

Unless anyone else finds a problem, this patch can be considered ready
for review and commit.

diff refs/heads/master refs/heads/11ac
blob - 57bdcce64458e7f9d5802ce4247d5651f9183200
blob + f423379b313d1dbdba3c05e45c7d35e74056409b
--- sys/dev/pci/if_iwx.c
+++ sys/dev/pci/if_iwx.c
@@ -304,6 +304,7 @@ int iwx_schedule_session_protection(struct iwx_softc *
uint32_t);
 void   iwx_init_channel_map(struct iwx_softc *, uint16_t *, uint32_t *, int);
 void   iwx_setup_ht_rates(struct iwx_softc *);
+void   iwx_setup_vht_rates(struct iwx_softc *);
 intiwx_mimo_enabled(struct iwx_softc *);
 void   iwx_mac_ctxt_task(void *);
 void   iwx_phy_ctxt_task(void *);
@@ -363,12 +364,13 @@ void  iwx_clear_oactive(struct iwx_softc *, struct 
iwx_
 void   iwx_rx_bmiss(struct iwx_softc *, struct iwx_rx_packet *,
struct iwx_rx_data *);
 intiwx_binding_cmd(struct iwx_softc *, struct iwx_node *, uint32_t);
+uint8_t iwx_get_vht_ctrl_pos(struct ieee80211com *, struct ieee80211_channel *);
 intiwx_phy_ctxt_cmd_uhb_v3(struct iwx_softc *, struct iwx_phy_ctxt *, 
uint8_t,
-   uint8_t, uint32_t, uint8_t);
+   uint8_t, uint32_t, uint8_t, uint8_t);
 intiwx_phy_ctxt_cmd_v3(struct iwx_softc *, struct iwx_phy_ctxt *, uint8_t,
-   uint8_t, uint32_t, uint8_t);
+   uint8_t, uint32_t, uint8_t, uint8_t);
 intiwx_phy_ctxt_cmd(struct iwx_softc *, struct iwx_phy_ctxt *, uint8_t,
-   uint8_t, uint32_t, uint32_t, uint8_t);
+   uint8_t, uint32_t, uint32_t, uint8_t, uint8_t);
 intiwx_send_cmd(struct iwx_softc *, struct iwx_host_cmd *);
 intiwx_send_cmd_pdu(struct iwx_softc *, uint32_t, uint32_t, uint16_t,
const void *);
@@ -430,10 +432,12 @@ int   iwx_scan_abort(struct iwx_softc *);
 intiwx_enable_mgmt_queue(struct iwx_softc *);
 intiwx_rs_rval2idx(uint8_t);
 uint16_t iwx_rs_ht_rates(struct iwx_softc *, struct ieee80211_node *, int);
+uint16_t iwx_rs_vht_rates(struct iwx_softc *, struct ieee80211_node *, int);
 intiwx_rs_init(struct iwx_softc *, struct iwx_node *);
 intiwx_enable_data_tx_queues(struct iwx_softc *);
 intiwx_phy_ctxt_update(struct iwx_softc *, struct iwx_phy_ctxt *,
-   struct ieee80211_channel *, uint8_t, uint8_t, uint32_t, uint8_t);
+   struct ieee80211_channel *, uint8_t, uint8_t, uint32_t, uint8_t,
+   uint8_t);
 intiwx_auth(struct iwx_softc *);
 intiwx_deauth(struct iwx_softc *);
 intiwx_run(struct iwx_softc *);
@@ -2812,6 +2816,12 @@ iwx_init_channel_map(struct iwx_softc *sc, uint16_t *c
if (ch_flags & IWX_NVM_CHANNEL_40MHZ)
channel->ic_flags |= IEEE80211_CHAN_40MHZ;
}
+
+   if (is_5ghz && data->sku_cap_11ac_enable) {
+   channel->ic_flags |= IEEE80211_CHAN_VHT;
+   if (ch_flags & IWX_NVM_CHANNEL_80MHZ)
+   channel->ic_xflags |= IEEE80211_CHANX_80MHZ;
+   }
}
 }
 
@@ -2846,6 +2856,34 @@ iwx_setup_ht_rates(struct iwx_softc *sc)
 }
 
 void
+iwx_setup_vht_rates(struct iwx_softc *sc)
+{
+   struct ieee80211com *ic = &sc->sc_ic;
+   uint8_t rx_ant = iwx_fw_valid_rx_ant(sc);
+   int n;
+
+   ic->ic_vht_rxmcs = (IEEE80211_VHT_MCS_0_9 <<
+   IEEE80211_VHT_MCS_FOR_SS_SHIFT(1));
+
+   if (iwx_mimo_enabled(sc) &&
+   ((rx_ant & IWX_ANT_AB) == IWX_ANT_AB ||
+   (rx_ant & IWX_ANT_BC) == IWX_ANT_BC)) {
+   ic->ic_vht_rxmcs |= (IEEE80211_VHT_MCS_0_9 <<
+   IEEE80211_VHT_MCS_FOR_SS_SHIFT(2));
+   } else {
+   ic->ic_vht_rxmcs |= (IEEE80211_VHT_MCS_SS_NOT_SUPP <<
+   IEEE80211_VHT_MCS_FOR_SS_SHIFT(2));
+   }
+
+   for (n = 3; n <= IEEE80211_VHT_NUM_SS; n++) {
+   ic->ic_vht_rxmcs |= (IEEE80211_VHT_MCS_SS_NOT_SUPP <<
+   IEEE80211_VHT_MCS_FOR_SS_SHIFT(n));
+   }
+
+   ic->ic_vht_txmcs = ic->ic_vht_rxmcs;
+}
+
+void
 iwx_init_reorder_buffer(struct iwx_reorder_buffer *reorder_buf,
 uint16_t ssn, 

Re: initial iwx(4) 11ac patch for testing

2022-03-10 Thread Stefan Sperling
On Thu, Mar 10, 2022 at 12:25:17PM +0100, Stefan Sperling wrote:
> Unless anyone else finds a problem, this patch can be considered ready
> for review and commit.

Of course, I forgot to apply my sysassert fix to the second phy context
command function...  Fixed here.

diff refs/heads/master refs/heads/11ac
blob - 57bdcce64458e7f9d5802ce4247d5651f9183200
blob + 61022a02f457141e31ec6c04351519fedf7b5782
--- sys/dev/pci/if_iwx.c
+++ sys/dev/pci/if_iwx.c
@@ -304,6 +304,7 @@ int iwx_schedule_session_protection(struct iwx_softc *
uint32_t);
 void   iwx_init_channel_map(struct iwx_softc *, uint16_t *, uint32_t *, int);
 void   iwx_setup_ht_rates(struct iwx_softc *);
+void   iwx_setup_vht_rates(struct iwx_softc *);
 intiwx_mimo_enabled(struct iwx_softc *);
 void   iwx_mac_ctxt_task(void *);
 void   iwx_phy_ctxt_task(void *);
@@ -363,12 +364,13 @@ void  iwx_clear_oactive(struct iwx_softc *, struct 
iwx_
 void   iwx_rx_bmiss(struct iwx_softc *, struct iwx_rx_packet *,
struct iwx_rx_data *);
 intiwx_binding_cmd(struct iwx_softc *, struct iwx_node *, uint32_t);
+uint8_t iwx_get_vht_ctrl_pos(struct ieee80211com *, struct ieee80211_channel *);
 intiwx_phy_ctxt_cmd_uhb_v3(struct iwx_softc *, struct iwx_phy_ctxt *, 
uint8_t,
-   uint8_t, uint32_t, uint8_t);
+   uint8_t, uint32_t, uint8_t, uint8_t);
 intiwx_phy_ctxt_cmd_v3(struct iwx_softc *, struct iwx_phy_ctxt *, uint8_t,
-   uint8_t, uint32_t, uint8_t);
+   uint8_t, uint32_t, uint8_t, uint8_t);
 intiwx_phy_ctxt_cmd(struct iwx_softc *, struct iwx_phy_ctxt *, uint8_t,
-   uint8_t, uint32_t, uint32_t, uint8_t);
+   uint8_t, uint32_t, uint32_t, uint8_t, uint8_t);
 intiwx_send_cmd(struct iwx_softc *, struct iwx_host_cmd *);
 intiwx_send_cmd_pdu(struct iwx_softc *, uint32_t, uint32_t, uint16_t,
const void *);
@@ -430,10 +432,12 @@ int   iwx_scan_abort(struct iwx_softc *);
 intiwx_enable_mgmt_queue(struct iwx_softc *);
 intiwx_rs_rval2idx(uint8_t);
 uint16_t iwx_rs_ht_rates(struct iwx_softc *, struct ieee80211_node *, int);
+uint16_t iwx_rs_vht_rates(struct iwx_softc *, struct ieee80211_node *, int);
 intiwx_rs_init(struct iwx_softc *, struct iwx_node *);
 intiwx_enable_data_tx_queues(struct iwx_softc *);
 intiwx_phy_ctxt_update(struct iwx_softc *, struct iwx_phy_ctxt *,
-   struct ieee80211_channel *, uint8_t, uint8_t, uint32_t, uint8_t);
+   struct ieee80211_channel *, uint8_t, uint8_t, uint32_t, uint8_t,
+   uint8_t);
 intiwx_auth(struct iwx_softc *);
 intiwx_deauth(struct iwx_softc *);
 intiwx_run(struct iwx_softc *);
@@ -2812,6 +2816,12 @@ iwx_init_channel_map(struct iwx_softc *sc, uint16_t *c
if (ch_flags & IWX_NVM_CHANNEL_40MHZ)
channel->ic_flags |= IEEE80211_CHAN_40MHZ;
}
+
+   if (is_5ghz && data->sku_cap_11ac_enable) {
+   channel->ic_flags |= IEEE80211_CHAN_VHT;
+   if (ch_flags & IWX_NVM_CHANNEL_80MHZ)
+   channel->ic_xflags |= IEEE80211_CHANX_80MHZ;
+   }
}
 }
 
@@ -2846,6 +2856,34 @@ iwx_setup_ht_rates(struct iwx_softc *sc)
 }
 
 void
+iwx_setup_vht_rates(struct iwx_softc *sc)
+{
+   struct ieee80211com *ic = &sc->sc_ic;
+   uint8_t rx_ant = iwx_fw_valid_rx_ant(sc);
+   int n;
+
+   ic->ic_vht_rxmcs = (IEEE80211_VHT_MCS_0_9 <<
+   IEEE80211_VHT_MCS_FOR_SS_SHIFT(1));
+
+   if (iwx_mimo_enabled(sc) &&
+   ((rx_ant & IWX_ANT_AB) == IWX_ANT_AB ||
+   (rx_ant & IWX_ANT_BC) == IWX_ANT_BC)) {
+   ic->ic_vht_rxmcs |= (IEEE80211_VHT_MCS_0_9 <<
+   IEEE80211_VHT_MCS_FOR_SS_SHIFT(2));
+   } else {
+   ic->ic_vht_rxmcs |= (IEEE80211_VHT_MCS_SS_NOT_SUPP <<
+   IEEE80211_VHT_MCS_FOR_SS_SHIFT(2));
+   }
+
+   for (n = 3; n <= IEEE80211_VHT_NUM_SS; n++) {
+   ic->ic_vht_rxmcs |= (IEEE80211_VHT_MCS_SS_NOT_SUPP <<
+   IEEE80211_VHT_MCS_FOR_SS_SHIFT(n));
+   }
+
+   ic->ic_vht_txmcs = ic->ic_vht_rxmcs;
+}
+
+void
 iwx_init_reorder_buffer(struct iwx_reorder_buffer *reorder_buf,
 uint16_t ssn, uint16_t buf_size)
 {
@@ -3147,7 +3185,7 @@ iwx_phy_ctxt_task(void *arg)
struct ieee80211com *ic = >sc_ic;
struct iwx_node *in = (void *)ic->ic_bss;
struct ieee80211_node *ni = >in_ni;
-   uint8_t chains, sco;
+   uint8_t chains, sco, vht_chan_width;
int err, s = splnet();
 
if ((sc->sc_flags & IWX_FLAG_SHUTDOWN) ||
@@ -3159,13 +3197,23 @@ iwx_phy_ctxt_task(void *arg)
}
 
chains = iwx_mimo_enabled(sc) ? 2 : 1;
-   if (ieee80211_node_supports_ht_chan40(ni))
+   if ((ni->ni_flags & IEEE80211_NODE_HT) &&
+   IEEE80211_CHAN_40MHZ_ALLOWED(ni->ni_chan) &&
+   ieee80211_node_supports_ht_chan40(ni))
sco = 

Re: initial iwx(4) 11ac patch for testing

2022-03-10 Thread Stefan Sperling
On Wed, Mar 09, 2022 at 07:16:41PM +0300, Mikhail wrote:
> On Wed, Mar 09, 2022 at 01:07:47PM +0100, Stefan Sperling wrote:
> >  /*
> > + * Install received VHT caps information in the node's state block.
> > + */
> > +void
> > +ieee80211_setup_vhtcaps(struct ieee80211_node *ni, const uint8_t *data,
> > +uint8_t len)
> > +{
> > +   if (len != 12)
> > +   return;
> > +
> > +   ni->ni_vhtcaps = (data[0] | (data[1] << 8) | data[2] << 16 |
> > +   data[3] << 24);
> > +   ni->ni_vht_rxmcs = (data[4] | (data[5] << 8));
> > +   ni->ni_vht_rx_max_lgi_mbit_s = ((data[6] | (data[7] << 8)) &
> > +   IEEE80211_VHT_MAX_LGI_MBIT_S_MASK);
> > +   ni->ni_vht_txmcs = (data[8] | (data[9] << 8));
> > +   ni->ni_vht_tx_max_lgi_mbit_s = ((data[10] | (data[11] << 8)) &
> > +   IEEE80211_VHT_MAX_LGI_MBIT_S_MASK);
> > +
> > +   ni->ni_flags |= IEEE80211_NODE_VHTCAP;
> > +}
> 
> I understand that this function does things like ieee80211_setup_htcaps()
> (HT, 802.11n), which is mature, working and stable, but is there a
> reason not to return 0 on error, 1 on success, and check it in the
> caller (like in ieee80211_setup_vhtop())?

Doesn't really matter. We can decide to ignore just the VHT cap IE, or
we can discard the whole beacon. Either way, VHT won't be enabled.

> > +/* Check if the peer supports VHT short guard interval (SGI) on 160 MHz. */
> > +static inline int
> > +ieee80211_node_supports_vht_sgi160(struct ieee80211_node *ni)
> > +{
> > +   return ieee80211_node_supports_vht(ni) &&
> > +   (ni->ni_vhtcaps & IEEE80211_VHTCAP_SGI160);
> > +}
> 
> This function is unused.

I added it for future use.

> The patch for whitespacing:

Thanks. I will keep this for later instead of mixing it in right now.
I can then commit it separately and attribute it to you in the log message.



Re: atomic read write

2022-03-10 Thread Laurence Tratt
On Thu, Mar 10, 2022 at 09:05:54AM +, Visa Hankala wrote:

Hello Visa,

> In general, atomic_* functions have not provided implicit memory
> barriers on OpenBSD.

I've used atomics fairly extensively in other settings. Forgive me if I'm
explaining the obvious, but I had a devil of a job making sense of this
stuff a few years back, and so perhaps others might find it useful to expand
on this point.

Quick background: modern CPUs come in two main flavours, weakly ordered
(e.g. most Arm systems) and strongly ordered (e.g. x86), which determine the
rules of when multiple cores can see the reads/writes of other cores. Weakly
ordered systems can move/optimise loads/stores around more than strongly
ordered systems (so code that seems to work fine on x86 can then fail on
Arm).

There are in a sense two "safe" ways to use atomics: to assume that each
atomic is isolated and that reading/writing to it tells you nothing about any
other location in memory; or that every atomic is fully ordered with respect
to every other atomic (i.e. no reorderings of atomic operations are allowed).
The former is fast but (without additional operations) can't even express
a mutex safely. The latter doesn't have very good performance.

C11 thus allows you to do various atomic operations with different memory
orderings [1] so that you can choose on a case-by-case basis what you're
prepared to tolerate. "relaxed" is the most efficient but has the least
guarantees; "seq_cst" is the least efficient but has the most guarantees.
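As a concrete illustration of choosing orderings per operation, consider reference counting (the subject of this thread): the increment can be relaxed, while the final decrement wants release, paired with an acquire on the path that sees the count hit zero. A hypothetical C11 sketch (illustrative only, not the OpenBSD refcnt API):

```c
#include <stdatomic.h>

struct refcnt {
	atomic_uint r_refs;
};

/*
 * Taking a reference needs no ordering guarantee: the caller already
 * holds a reference, so only the counter update itself must be atomic.
 */
static void
ref_take(struct refcnt *r)
{
	atomic_fetch_add_explicit(&r->r_refs, 1, memory_order_relaxed);
}

/*
 * Dropping a reference uses release so this thread's accesses to the
 * object complete before the count drops; the thread that sees the
 * count reach zero pairs that with an acquire fence before freeing.
 * Returns 1 when the last reference was dropped, 0 otherwise.
 */
static int
ref_rele(struct refcnt *r)
{
	if (atomic_fetch_sub_explicit(&r->r_refs, 1,
	    memory_order_release) == 1) {
		atomic_thread_fence(memory_order_acquire);
		return 1;
	}
	return 0;
}
```

Making both operations seq_cst would also be correct, just needlessly expensive on weakly ordered CPUs; making both relaxed would let the freeing thread observe a stale object.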

I would be very nervous about adding further atomic functions (as in the
original diff) to OpenBSD that don't allow the user to specify what ordering
they want: it's impossible to pick a memory ordering that suits all use
cases. For example, neither READ_ONCE nor the existing atomic_* instructions
define an ordering: I suppose I'd have to assume they're relaxed. I worry
that people might assume these atomic operations provide greater guarantees
than they actually do.

Fortunately since, AFAICT, we already use C11 (or C17?!) for base, and LLVM
includes all of the relevant functions (e.g. the compare_exchange family
[2]) I don't think we need to add any functions of our own? It might not
even be a bad idea to deprecate the current atomic_* functions in base
and migrate to the C11 alternatives?


Laurie

[1] https://en.cppreference.com/w/c/atomic/memory_order
[2] https://en.cppreference.com/w/c/atomic/atomic_compare_exchange



Re: atomic read write

2022-03-10 Thread Visa Hankala
On Wed, Mar 09, 2022 at 08:45:35PM +0100, Alexander Bluhm wrote:
> On Tue, Mar 08, 2022 at 04:55:56PM +0100, Alexander Bluhm wrote:
> > Once we had the discussion where we need the READ_ONCE() macro.  As
> > modern C compiler has much freedom how to access memory, I came to
> > the conclusion that it would be wise to use READ_ONCE() and
> > WRITE_ONCE() everywhere when we use atomic operations variables.
> > Using atomic operations on one side and do whatever the compiler
> > thinks at the other side of the variable feels wrong.
> > 
> > The rule use READ_ONCE, WRITE_ONCE, atomic_inc, atomic_dec consistently
> > would be easy to follow.  Thinking about where the compiler might
> > reorder things and break MP code is much more complicated.
> >
> > Do we want to go this direction?
> 
> mvs@ mentioned that FreeBSD has atomic load and store instructions
> for that.  I decided to implement them as static inline functions
> as they provide stronger type checks.  Also I add them for int and
> long only, everything else is not atomic.
> 
> > If yes, here is the change for struct refcnt and cond.  While there
> > rename the field to r_refs which is easier to grep.
> 
> Note that the _init functions do not need atomic operations.  But
> the whole idea is to make it consistent and have a simple rule.  If
> an MP variable locking is marked as atomic, use the atomic_ functions.
> 
> As a bonus alpha gets the membar it needs.

In general, atomic_* functions have not provided implicit memory
barriers on OpenBSD.

I am not sure if the data dependency barrier is needed where
atomic_load_int() and atomic_load_long() are used. The memory ordering
guarantee is very weak and does not seem useful in any of the use cases
in the patch. However, the barrier does not appear to make things worse
in terms of correctness. Except maybe in assertions where they cause
subtle side effects.

However, the patch looks good.

OK visa@

> Index: sys/dev/pci/if_iwm.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_iwm.c,v
> retrieving revision 1.391
> diff -u -p -r1.391 if_iwm.c
> --- sys/dev/pci/if_iwm.c  8 Feb 2022 14:24:36 -   1.391
> +++ sys/dev/pci/if_iwm.c  9 Mar 2022 18:52:52 -
> @@ -9975,7 +9975,7 @@ iwm_init(struct ifnet *ifp)
>  
>   generation = ++sc->sc_generation;
>  
> - KASSERT(sc->task_refs.refs == 0);
> + KASSERT(atomic_load_int(&sc->task_refs.r_refs) == 0);
>   refcnt_init(>task_refs);
>  
>   err = iwm_preinit(sc);
> @@ -10116,7 +10116,7 @@ iwm_stop(struct ifnet *ifp)
>   iwm_del_task(sc, systq, >mac_ctxt_task);
>   iwm_del_task(sc, systq, >phy_ctxt_task);
>   iwm_del_task(sc, systq, >bgscan_done_task);
> - KASSERT(sc->task_refs.refs >= 1);
> + KASSERT(atomic_load_int(&sc->task_refs.r_refs) >= 1);
>   refcnt_finalize(>task_refs, "iwmstop");
>  
>   iwm_stop_device(sc);
> Index: sys/dev/pci/if_iwx.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_iwx.c,v
> retrieving revision 1.134
> diff -u -p -r1.134 if_iwx.c
> --- sys/dev/pci/if_iwx.c  21 Jan 2022 15:51:02 -  1.134
> +++ sys/dev/pci/if_iwx.c  9 Mar 2022 18:53:50 -
> @@ -8017,7 +8017,7 @@ iwx_init(struct ifnet *ifp)
>   if (sc->sc_nvm.sku_cap_11n_enable)
>   iwx_setup_ht_rates(sc);
>  
> - KASSERT(sc->task_refs.refs == 0);
> + KASSERT(atomic_load_int(&sc->task_refs.r_refs) == 0);
>   refcnt_init(>task_refs);
>   ifq_clr_oactive(>if_snd);
>   ifp->if_flags |= IFF_RUNNING;
> @@ -8139,7 +8139,7 @@ iwx_stop(struct ifnet *ifp)
>   iwx_del_task(sc, systq, >mac_ctxt_task);
>   iwx_del_task(sc, systq, >phy_ctxt_task);
>   iwx_del_task(sc, systq, >bgscan_done_task);
> - KASSERT(sc->task_refs.refs >= 1);
> + KASSERT(atomic_load_int(&sc->task_refs.r_refs) >= 1);
>   refcnt_finalize(>task_refs, "iwxstop");
>  
>   iwx_stop_device(sc);
> Index: sys/kern/kern_synch.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_synch.c,v
> retrieving revision 1.182
> diff -u -p -r1.182 kern_synch.c
> --- sys/kern/kern_synch.c 19 Feb 2022 23:56:18 -  1.182
> +++ sys/kern/kern_synch.c 9 Mar 2022 18:57:53 -
> @@ -804,7 +804,7 @@ sys___thrwakeup(struct proc *p, void *v,
>  void
>  refcnt_init(struct refcnt *r)
>  {
> - r->refs = 1;
> + atomic_store_int(&r->r_refs, 1);
>  }
>  
>  void
> @@ -813,10 +813,10 @@ refcnt_take(struct refcnt *r)
>  #ifdef DIAGNOSTIC
>   u_int refcnt;
>  
> - refcnt = atomic_inc_int_nv(&r->refs);
> + refcnt = atomic_inc_int_nv(&r->r_refs);
>   KASSERT(refcnt != 0);
>  #else
> - atomic_inc_int(&r->refs);
> + atomic_inc_int(&r->r_refs);
>  #endif
>  }
>  
> @@ -825,7 +825,7 @@ refcnt_rele(struct refcnt *r)
>  {
>   u_int refcnt;
>  
> - refcnt = atomic_dec_int_nv(&r->refs);
> +   

Re: atomic read write

2022-03-10 Thread Alexander Bluhm
On Thu, Mar 10, 2022 at 09:05:54AM +, Visa Hankala wrote:
> > As a bonus alpha gets the membar it needs.
> 
> In general, atomic_* functions have not provided implicit memory
> barriers on OpenBSD.
> 
> I am not sure if the data dependency barrier is needed where
> atomic_load_int() and atomic_load_long() are used. The memory ordering
> guarantee is very weak and does not seem useful in any of the use cases
> in the patch. However, the barrier does not appear to make things worse
> in terms of correctness. Except maybe in assertions where they cause
> subtle side effects.

Our atomic functions don't provide barriers.  The only exception
is reading on Alpha.  The idea of membar_datadep_consumer() is taken
from Linux arch/alpha/include/asm/rwonce.h.

As I understand it, Alpha CPU may have more than one data cache.
So you need a barrier between reading a pointer and using its value.
Without knowing the details, we just do what Linux does.

bluhm
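[Editorial illustration.] The dependent-load pattern discussed above looks like this in userland C11 terms (hypothetical names; memory_order_consume is the standard's way of expressing the pointer-to-field dependency that Alpha cannot order for free):

```c
#include <stdatomic.h>
#include <stdlib.h>

struct item {
	int value;
};

static _Atomic(struct item *) head;

/* Publisher: initialize the object fully, then release-store the pointer. */
static void
publish(int v)
{
	struct item *it = malloc(sizeof(*it));

	it->value = v;
	atomic_store_explicit(&head, it, memory_order_release);
}

/*
 * Reader: loading the pointer and then a field through it is a data
 * dependency.  Most CPUs order such dependent loads for free; Alpha's
 * split data caches do not, hence the extra barrier between the two
 * loads.  memory_order_consume expresses exactly that dependency
 * (compilers currently implement it as the stronger acquire).
 */
static int
read_value(void)
{
	struct item *it = atomic_load_explicit(&head, memory_order_consume);

	return it->value;
}
```

This mirrors what Linux's READ_ONCE() on Alpha does: a plain load plus a read barrier, so the field load cannot observe pre-initialization contents of the object.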



Re: initial iwx(4) 11ac patch for testing

2022-03-10 Thread Matthias Schmidt
Hi Stefan,

* Stefan Sperling wrote:
> On Thu, Mar 10, 2022 at 12:25:17PM +0100, Stefan Sperling wrote:
> > Unless anyone else finds a problem, this patch can be considered ready
> > for review and commit.
> 
> Of course, I forgot to apply my sysassert fix to the second phy context
> command function...  Fixed here.

I have this patch (and the previous ones as well) running on the
following HW connected to a Fritzbox 7560 which reports that the Thinkpad
is connected via 80MHz and 11ac.

iwx0 at pci2 dev 0 function 0 "Intel Wi-Fi 6 AX200" rev 0x1a, msix
iwx0: hw rev 0x340, fw ver 67.8f59b80b.0, address bc:f1:71:cf:3d:f7

iwx0: flags=8943 mtu 1500
lladdr 50:7b:9d:73:aa:8a
index 2 priority 4 llprio 3
trunk: trunkdev trunk0
groups: wlan
media: IEEE802.11 autoselect (VHT-MCS3 mode 11ac)
status: active
ieee80211: join karlsruhe.v01d chan 44 bssid cc:ce:1e:8b:cf:d1 56% 
wpakey wpaprotos wpa2 wpaakms psk wpaciphers ccmp wpagroupcipher ccmp

So far all is stable and the speed difference is measurable.
Previously, the max speed I had was around 17M/s, with the patch I get
around 25M/s.

Cheers and thanks for the work!

Matthias



Re: atomic read write

2022-03-10 Thread Visa Hankala
On Thu, Mar 10, 2022 at 10:45:47AM +, Laurence Tratt wrote:
> On Thu, Mar 10, 2022 at 09:05:54AM +, Visa Hankala wrote:
> 
> Hello Visa,
> 
> > In general, atomic_* functions have not provided implicit memory
> > barriers on OpenBSD.
> 
> I've used atomics fairly extensively in other settings. Forgive me if I'm
> explaining the obvious, but I had a devil of a job making sense of this
> stuff a few years back, and so perhaps others might find it useful to expand
> on this point.
> 
> Quick background: modern CPUs come in two main flavours, weakly ordered
> (e.g. most Arm systems) and strongly ordered (e.g. x86), which determine the
> rules of when multiple cores can see the reads/writes of other cores. Weakly
> ordered systems can move/optimise loads/stores around more than strongly
> ordered systems (so code that seems to work fine on x86 can then fail on
> Arm).
> 
> There are in a sense two "safe" ways to use atomics: to assume that each
> atomic is isolated and that reading/writing to it tells you nothing about any
> other location in memory; or that every atomic is fully ordered with respect
> to every other atomic (i.e. no reorderings of atomic operations are allowed).
> The former is fast but (without additional operations) can't even express
> a mutex safely. The latter doesn't have very good performance.
> 
> C11 thus allows you to do various atomic operations with different memory
> orderings [1] so that you can choose on a case-by-case basis what you're
> prepared to tolerate. "relaxed" is the most efficient but has the least
> guarantees; "seq_cst" is the least efficient but has the most guarantees.
> 
> I would be very nervous about adding further atomic functions (as in the
> original diff) to OpenBSD that don't allow the user to specify what ordering
> they want: it's impossible to pick a memory ordering that suits all use
> cases. For example, neither READ_ONCE nor the existing atomic_* instructions
> define an ordering: I suppose I'd have to assume they're relaxed. I worry
> that people might assume these atomic operations provide greater guarantees
> than they actually do.

My understanding is that OpenBSD's atomic_* instructions are relaxed
in terms of memory order. To accomplish a specific form of semantics,
the user adds the appropriate memory barrier instruction. Well, this
is the plan at least, I think.

Below is a patch that shows how to accomplish release semantics with
the file and refcnt APIs. (The added memory barriers ensure that the
CPU completes its loads and stores to the object before dropping the
reference. Another CPU might delete the object immediately after.
The barrier might not be strictly necessary with refcnt_finalize(),
though.)

> Fortunately since, AFAICT, we already use C11 (or C17?!) for base, and LLVM
> includes all of the relevant functions (e.g. the compare_exchange family
> [2]) I don't think we need to add any functions of our own? It might not
> even be a bad idea to deprecate the current atomic_* functions in base
> and migrate to the C11 alternatives?

Some of the architectures are still using GCC 4.2.1.

Base does not use C11 at the moment.


Index: kern/kern_synch.c
===================================================================
RCS file: src/sys/kern/kern_synch.c,v
retrieving revision 1.182
diff -u -p -r1.182 kern_synch.c
--- kern/kern_synch.c   19 Feb 2022 23:56:18 -  1.182
+++ kern/kern_synch.c   10 Mar 2022 13:37:50 -
@@ -825,6 +825,7 @@ refcnt_rele(struct refcnt *r)
 {
u_int refcnt;
 
+   membar_exit_before_atomic();
	refcnt = atomic_dec_int_nv(&r->refs);
KASSERT(refcnt != ~0);
 
@@ -844,6 +845,7 @@ refcnt_finalize(struct refcnt *r, const 
struct sleep_state sls;
u_int refcnt;
 
+   membar_exit_before_atomic();
	refcnt = atomic_dec_int_nv(&r->refs);
	while (refcnt) {
		sleep_setup(&sls, r, PWAIT, wmesg, 0);
Index: sys/file.h
===================================================================
RCS file: src/sys/sys/file.h,v
retrieving revision 1.65
diff -u -p -r1.65 file.h
--- sys/file.h  20 Jan 2022 03:43:31 -  1.65
+++ sys/file.h  10 Mar 2022 13:37:50 -
@@ -36,6 +36,7 @@
 #include 
 
 #else /* _KERNEL */
+#include 
 #include 
 #include 
 #endif /* _KERNEL */
@@ -113,13 +114,21 @@ struct file {
atomic_inc_int(&(fp)->f_count); \
} while (0)
 
-#define FRELE(fp,p) \
-   (atomic_dec_int_nv(&fp->f_count) == 0 ? fdrop(fp, p) : 0)
-
 #define FDUP_MAX_COUNT (UINT_MAX - 2 * MAXCPUS)
 
 intfdrop(struct file *, struct proc *);
 
+static inline int
+FRELE(struct file *fp, struct proc *p)
+{
+   int error = 0;
+
+   membar_exit_before_atomic();
+   if (atomic_dec_int_nv(&fp->f_count) == 0)
+   error = fdrop(fp, p);
+   return (error);
+}
+
 static inline off_t
 foffset(struct file *fp)
 {



Re: mg and trailing whitespaces

2022-03-10 Thread Omar Polo
Mikhail  wrote:
> On Wed, Mar 02, 2022 at 09:47:31AM +0100, Omar Polo wrote:
> > Hello tech,
> > 
> > mg(1) has this tendency to leave a lot of trailing whitespaces around in
> > auto-indent-mode and c-mode which I find annoying.  Yes, there's
> > delete-trailing-space but it works only on the current line (and not the
> > whole buffer as in emacs) and one has to manually call it anyway.
> > Emacs, and even vi in base, are clever in this regard: trailing
> > whitespaces before the cursor are delete upon RET before adding a new
> > line.
> > 
> > So, here's the same heuristic for mg when auto-indent-mode or c-mode is
> > enabled.  It's still possible to leave trailing whitespace in a buffer
> > in those modes, it only gets a little bit harder to do.
> > 
> > (as a next step I guess we could also garbage collect cc_strip_trailp,
> > it was used in only one place before this patch and is unconditionally
> > set to TRUE.)
> > 
> > Thoughts/OK?
> 
> I've tested the patch and it works as intended.

thanks for testing!

> I like new behavior for c-mode, but not sure if it is good for
> auto-indent one. If user presses  and then  I'd argue that
> removing first tab is good idea in this particular mode, but no hard
> objections, since I don't use auto-indent.

even though I think the behaviour in my diff is more useful/intuitive
(and coherent with vi's auto-indent), what you're suggesting is actually
in line with what GNU emacs does.

Here's an updated diff that calls delwhite before computing the
indentation for auto-indent-mode.

diff 00f81dd44029396ae6f0e446c345d3f3fe7dfca2 fa684de7df0c9db2c4f5e78b8ff0f31b29f58e2a
blob - 6c0ef5b897536c8c2d6e0faa6500e5a64ea57598
blob + 7b47402aa1145e41a7e8b14ea62fcd3d49bab074
--- usr.bin/mg/cmode.c
+++ usr.bin/mg/cmode.c
@@ -205,6 +205,8 @@ cc_lfindent(int f, int n)
 {
if (n < 0)
return (FALSE);
+   if (cc_strip_trailp)
+   (void)delwhite(FFRAND, 1);
if (enewline(FFRAND, 1) == FALSE)
return (FALSE);
return (cc_indent(FFRAND, n));
blob - 4d38284f3b886043336ff967602354f0fc81a4fd
blob + d128ac230e48a9b3569ae552f4b3411b7ce97d1e
--- usr.bin/mg/util.c
+++ usr.bin/mg/util.c
@@ -364,6 +364,7 @@ lfindent(int f, int n)
 
undo_boundary_enable(FFRAND, 0);
while (n--) {
+   (void)delwhite(FFRAND, 1);
nicol = 0;
for (i = 0; i < llength(curwp->w_dotp); ++i) {
c = lgetc(curwp->w_dotp, i);
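As an aside, the effect of calling delwhite() right before the newline is
inserted can be sketched as a plain C string operation (illustrative only;
mg's real delwhite() operates on buffer lines, and strip_trailing_ws below
is a made-up name):

```c
#include <assert.h>
#include <string.h>

/*
 * Sketch: trim trailing spaces and tabs from the current line, as
 * delwhite() does before enewline() adds the line break.  mg's
 * actual implementation works on struct line, not on C strings.
 */
static void
strip_trailing_ws(char *line)
{
	size_t len = strlen(line);

	while (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t'))
		line[--len] = '\0';
}
```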



openrsync man page fix

2022-03-10 Thread Mikhail
Replace rsync command with openrsync in EXAMPLES.

diff --git usr.bin/rsync/rsync.1 usr.bin/rsync/rsync.1
index 8144754695a..9bcf3566bb9 100644
--- usr.bin/rsync/rsync.1
+++ usr.bin/rsync/rsync.1
@@ -290,7 +290,7 @@ with the local
 and
 .Pa ../src/baz :
 .Pp
-.Dl % rsync -t ../src/bar ../src/baz host:dest
+.Dl % openrsync -t ../src/bar ../src/baz host:dest
 .Pp
 To update the out-of-date local files
 .Pa bar
@@ -301,7 +301,7 @@ with the remote files
 and
 .Pa host:src/baz :
 .Pp
-.Dl % rsync -t host:src/bar :src/baz \&.
+.Dl % openrsync -t host:src/bar :src/baz \&.
 .Pp
 To update the out-of-date local files
 .Pa ../dest/bar
@@ -312,7 +312,7 @@ with
 and
 .Pa baz :
 .Pp
-.Dl % rsync -t bar baz ../dest
+.Dl % openrsync -t bar baz ../dest
 .\" .Sh DIAGNOSTICS
 .Sh SEE ALSO
 .Xr ssh 1



Re: rpki-client: fix wrong conditional

2022-03-10 Thread Martin Vahlensieck
On Thu, Mar 10, 2022 at 06:15:48PM +0100, Theo Buehler wrote:
> On Thu, Mar 10, 2022 at 06:03:14PM +0100, Claudio Jeker wrote:
> > On Thu, Mar 10, 2022 at 05:54:21PM +0100, Theo Buehler wrote:
> > > On Thu, Mar 10, 2022 at 05:51:46PM +0100, Claudio Jeker wrote:
> > > > On Thu, Mar 10, 2022 at 05:33:28PM +0100, Martin Vahlensieck wrote:
> > > > > Hi
> > > > > 
> > > > > This pulls up and adjusts the check if i exceeds the bounds of pfds.
> > > > > Before it was technically wrong, as i > NPFDS means that the last
> > > > > write (i == NPFDS) was already out of bounds.
> > > >  
> > > > I see no reason to pull up the check, but the if condition should
> > > > indeed be greater or equal. One could consider changing this into an
> > > > assert(), but I think I'll stick with the errx().
> > > 
> > > Agreed. ok for the diff that just changes the checks to >=
> > 
> > Actually I was wrong: the check needs to happen at the start of the loop,
> > not at the end, else it does not work if the list has exactly the number of
> > elements needed to fill NPFDS.
> 
> Ah right, we need to bail out when we would actually go past the limit.
> However, I see no reason to fiddle with the timeout before checking.
> So let's move the check to the beginning.

Here you go:

Index: http.c
===
RCS file: /cvs/src/usr.sbin/rpki-client/http.c,v
retrieving revision 1.53
diff -u -p -r1.53 http.c
--- http.c  10 Feb 2022 11:10:40 -  1.53
+++ http.c  10 Mar 2022 17:28:21 -
@@ -1809,6 +1809,9 @@ proc_http(char *bind_addr, int fd)
timeout = INFTIM;
now = getmonotime();
LIST_FOREACH(conn, , entry) {
+   if (i >= NPFDS)
+   errx(1, "too many connections");
+
if (conn->io_time == 0)
conn->io_time = now + HTTP_IO_TIMEOUT;
 
@@ -1828,10 +1831,11 @@ proc_http(char *bind_addr, int fd)
pfds[i].events = conn->events;
conn->pfd = [i];
i++;
-   if (i > NPFDS)
-   errx(1, "too many connections");
}
LIST_FOREACH(conn, , entry) {
+   if (i >= NPFDS)
+   errx(1, "too many connections");
+
if (conn->idle_time <= now)
timeout = 0;
else {
@@ -1844,8 +1848,6 @@ proc_http(char *bind_addr, int fd)
pfds[i].events = POLLIN;
conn->pfd = [i];
i++;
-   if (i > NPFDS)
-   errx(1, "too many connections");
}
 
if (poll(pfds, i, timeout) == -1) {



Re: atomic read write

2022-03-10 Thread Alexander Bluhm
On Thu, Mar 10, 2022 at 02:03:26PM +, Visa Hankala wrote:
> My understanding is that OpenBSD's atomic_* instructions are relaxed
> in terms of memory order. To accomplish a specific form of semantics,
> the user adds the appropriate memory barrier instruction. Well, this
> is the plan at least, I think.

I also think that.

> Below is a patch that shows how to accomplish release semantics with
> the file and refcnt APIs. (The added memory barriers ensure that the
> CPU completes its loads and stores to the object before dropping the
> reference. Another CPU might delete the object immediately after.
> The barrier might not be strictly necessary with refcnt_finalize(),
> though.)

OK bluhm@

> Index: kern/kern_synch.c
> ===
> RCS file: src/sys/kern/kern_synch.c,v
> retrieving revision 1.182
> diff -u -p -r1.182 kern_synch.c
> --- kern/kern_synch.c 19 Feb 2022 23:56:18 -  1.182
> +++ kern/kern_synch.c 10 Mar 2022 13:37:50 -
> @@ -825,6 +825,7 @@ refcnt_rele(struct refcnt *r)
>  {
>   u_int refcnt;
>  
> + membar_exit_before_atomic();
>   refcnt = atomic_dec_int_nv(>refs);
>   KASSERT(refcnt != ~0);
>  
> @@ -844,6 +845,7 @@ refcnt_finalize(struct refcnt *r, const 
>   struct sleep_state sls;
>   u_int refcnt;
>  
> + membar_exit_before_atomic();
>   refcnt = atomic_dec_int_nv(>refs);
>   while (refcnt) {
>   sleep_setup(, r, PWAIT, wmesg, 0);
> Index: sys/file.h
> ===
> RCS file: src/sys/sys/file.h,v
> retrieving revision 1.65
> diff -u -p -r1.65 file.h
> --- sys/file.h20 Jan 2022 03:43:31 -  1.65
> +++ sys/file.h10 Mar 2022 13:37:50 -
> @@ -36,6 +36,7 @@
>  #include 
>  
>  #else /* _KERNEL */
> +#include 
>  #include 
>  #include 
>  #endif /* _KERNEL */
> @@ -113,13 +114,21 @@ struct file {
>   atomic_inc_int(&(fp)->f_count); \
>   } while (0)
>  
> -#define FRELE(fp,p) \
> - (atomic_dec_int_nv(>f_count) == 0 ? fdrop(fp, p) : 0)
> -
>  #define FDUP_MAX_COUNT   (UINT_MAX - 2 * MAXCPUS)
>  
>  int  fdrop(struct file *, struct proc *);
>  
> +static inline int
> +FRELE(struct file *fp, struct proc *p)
> +{
> + int error = 0;
> +
> + membar_exit_before_atomic();
> + if (atomic_dec_int_nv(>f_count) == 0)
> + error = fdrop(fp, p);
> + return (error);
> +}
> +
>  static inline off_t
>  foffset(struct file *fp)
>  {
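The release ordering in the FRELE() diff above can be sketched in portable
C11 terms (an illustration only; struct obj and obj_rele are hypothetical
names, and the kernel uses membar_exit_before_atomic() plus
atomic_dec_int_nv() rather than C11 atomics):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * C11 sketch of the FRELE()/refcnt_rele() release ordering: the
 * release fence makes all prior loads and stores to the object
 * globally visible before the decrement, so the thread that drops
 * the last reference may safely tear the object down.
 */
struct obj {
	atomic_uint refs;
	bool torn_down;
};

static int
obj_rele(struct obj *o)
{
	/* equivalent of membar_exit_before_atomic() */
	atomic_thread_fence(memory_order_release);
	if (atomic_fetch_sub_explicit(&o->refs, 1,
	    memory_order_relaxed) == 1) {
		o->torn_down = true;	/* stand-in for fdrop()/free */
		return (1);
	}
	return (0);
}
```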



rpki-client: fix wrong conditional

2022-03-10 Thread Martin Vahlensieck
Hi

This pulls up and adjusts the check if i exceeds the bounds of pfds.
Before it was technically wrong, as i > NPFDS means that the last
write (i == NPFDS) was already out of bounds.

Best,

Martin


Index: http.c
===
RCS file: /cvs/src/usr.sbin/rpki-client/http.c,v
retrieving revision 1.53
diff -u -p -r1.53 http.c
--- http.c  10 Feb 2022 11:10:40 -  1.53
+++ http.c  10 Mar 2022 16:28:48 -
@@ -1820,6 +1820,10 @@ proc_http(char *bind_addr, int fd)
if (timeout == INFTIM || diff < timeout)
timeout = diff;
}
+
+   if (i >= NPFDS)
+   errx(1, "too many connections");
+
if (conn->state == STATE_WRITE_DATA)
pfds[i].fd = conn->req->outfd;
else
@@ -1828,8 +1832,6 @@ proc_http(char *bind_addr, int fd)
pfds[i].events = conn->events;
conn->pfd = [i];
i++;
-   if (i > NPFDS)
-   errx(1, "too many connections");
}
LIST_FOREACH(conn, , entry) {
if (conn->idle_time <= now)
@@ -1840,12 +1842,14 @@ proc_http(char *bind_addr, int fd)
if (timeout == INFTIM || diff < timeout)
timeout = diff;
}
+
+   if (i >= NPFDS)
+   errx(1, "too many connections");
+
pfds[i].fd = conn->fd;
pfds[i].events = POLLIN;
conn->pfd = [i];
i++;
-   if (i > NPFDS)
-   errx(1, "too many connections");
}
 
if (poll(pfds, i, timeout) == -1) {



Re: rpki-client: fix wrong conditional

2022-03-10 Thread Claudio Jeker
On Thu, Mar 10, 2022 at 05:33:28PM +0100, Martin Vahlensieck wrote:
> Hi
> 
> This pulls up and adjusts the check if i exceeds the bounds of pfds.
> Before it was technically wrong, as i > NPFDS means that the last
> write (i == NPFDS) was already out of bounds.
 
I see no reason to pull up the check, but the if condition should indeed be
greater or equal. One could consider changing this into an assert(), but I
think I'll stick with the errx().

> Martin
> 
> 
> Index: http.c
> ===
> RCS file: /cvs/src/usr.sbin/rpki-client/http.c,v
> retrieving revision 1.53
> diff -u -p -r1.53 http.c
> --- http.c10 Feb 2022 11:10:40 -  1.53
> +++ http.c10 Mar 2022 16:28:48 -
> @@ -1820,6 +1820,10 @@ proc_http(char *bind_addr, int fd)
>   if (timeout == INFTIM || diff < timeout)
>   timeout = diff;
>   }
> +
> + if (i >= NPFDS)
> + errx(1, "too many connections");
> +
>   if (conn->state == STATE_WRITE_DATA)
>   pfds[i].fd = conn->req->outfd;
>   else
> @@ -1828,8 +1832,6 @@ proc_http(char *bind_addr, int fd)
>   pfds[i].events = conn->events;
>   conn->pfd = [i];
>   i++;
> - if (i > NPFDS)
> - errx(1, "too many connections");
>   }
>   LIST_FOREACH(conn, , entry) {
>   if (conn->idle_time <= now)
> @@ -1840,12 +1842,14 @@ proc_http(char *bind_addr, int fd)
>   if (timeout == INFTIM || diff < timeout)
>   timeout = diff;
>   }
> +
> + if (i >= NPFDS)
> + errx(1, "too many connections");
> +
>   pfds[i].fd = conn->fd;
>   pfds[i].events = POLLIN;
>   conn->pfd = [i];
>   i++;
> - if (i > NPFDS)
> - errx(1, "too many connections");
>   }
>  
>   if (poll(pfds, i, timeout) == -1) {
> 

-- 
:wq Claudio



Re: rpki-client: fix wrong conditional

2022-03-10 Thread Claudio Jeker
On Thu, Mar 10, 2022 at 05:54:21PM +0100, Theo Buehler wrote:
> On Thu, Mar 10, 2022 at 05:51:46PM +0100, Claudio Jeker wrote:
> > On Thu, Mar 10, 2022 at 05:33:28PM +0100, Martin Vahlensieck wrote:
> > > Hi
> > > 
> > > This pulls up and adjusts the check if i exceeds the bounds of pfds.
> > > Before it was technically wrong, as i > NPFDS means that the last
> > > write (i == NPFDS) was already out of bounds.
> >  
> > I see no reason to pull up the check, but the if condition should indeed be
> > greater or equal. One could consider changing this into an assert(), but I
> > think I'll stick with the errx().
> 
> Agreed. ok for the diff that just changes the checks to >=

Actually I was wrong: the check needs to happen at the start of the loop,
not at the end, else it does not work if the list has exactly the number of
elements needed to fill NPFDS.

for (first element; if not end; next element) {
if (i >= NPFDS)
errx();

/* do work */

i++;
/* here condition would trigger on last element */
}
 
> > 
> > > Martin
> > > 
> > > 
> > > Index: http.c
> > > ===
> > > RCS file: /cvs/src/usr.sbin/rpki-client/http.c,v
> > > retrieving revision 1.53
> > > diff -u -p -r1.53 http.c
> > > --- http.c10 Feb 2022 11:10:40 -  1.53
> > > +++ http.c10 Mar 2022 16:28:48 -
> > > @@ -1820,6 +1820,10 @@ proc_http(char *bind_addr, int fd)
> > >   if (timeout == INFTIM || diff < timeout)
> > >   timeout = diff;
> > >   }
> > > +
> > > + if (i >= NPFDS)
> > > + errx(1, "too many connections");
> > > +
> > >   if (conn->state == STATE_WRITE_DATA)
> > >   pfds[i].fd = conn->req->outfd;
> > >   else
> > > @@ -1828,8 +1832,6 @@ proc_http(char *bind_addr, int fd)
> > >   pfds[i].events = conn->events;
> > >   conn->pfd = [i];
> > >   i++;
> > > - if (i > NPFDS)
> > > - errx(1, "too many connections");
> > >   }
> > >   LIST_FOREACH(conn, , entry) {
> > >   if (conn->idle_time <= now)
> > > @@ -1840,12 +1842,14 @@ proc_http(char *bind_addr, int fd)
> > >   if (timeout == INFTIM || diff < timeout)
> > >   timeout = diff;
> > >   }
> > > +
> > > + if (i >= NPFDS)
> > > + errx(1, "too many connections");
> > > +
> > >   pfds[i].fd = conn->fd;
> > >   pfds[i].events = POLLIN;
> > >   conn->pfd = [i];
> > >   i++;
> > > - if (i > NPFDS)
> > > - errx(1, "too many connections");
> > >   }
> > >  
> > >   if (poll(pfds, i, timeout) == -1) {
> > > 
> > 
> > -- 
> > :wq Claudio
> > 
> 

-- 
:wq Claudio
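Claudio's pseudo-code above can be made concrete with a self-contained
sketch (NPFDS and fill_pfds are illustrative stand-ins, not the actual
rpki-client code):

```c
#include <assert.h>

#define NPFDS 4	/* illustrative; rpki-client's value differs */

/*
 * Check the index at the top of the loop, before pfds[i] is written.
 * With the check after i++, a list of exactly NPFDS entries would
 * errx() even though every write was in bounds; with the original
 * i > NPFDS condition, one out-of-bounds write was even allowed.
 * Returns the number of slots filled, or -1 for "too many".
 */
static int
fill_pfds(int nconn)
{
	int pfds[NPFDS];
	int i = 0, n;

	for (n = 0; n < nconn; n++) {
		if (i >= NPFDS)
			return (-1);	/* bail before writing out of bounds */
		pfds[i] = n;		/* stand-in for pfds[i].fd = ... */
		i++;
	}
	return (i > 0 ? pfds[i - 1] + 1 : 0);	/* equals i */
}
```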



Re: rpki-client: fix wrong conditional

2022-03-10 Thread Theo Buehler
On Thu, Mar 10, 2022 at 06:03:14PM +0100, Claudio Jeker wrote:
> On Thu, Mar 10, 2022 at 05:54:21PM +0100, Theo Buehler wrote:
> > On Thu, Mar 10, 2022 at 05:51:46PM +0100, Claudio Jeker wrote:
> > > On Thu, Mar 10, 2022 at 05:33:28PM +0100, Martin Vahlensieck wrote:
> > > > Hi
> > > > 
> > > > This pulls up and adjusts the check if i exceeds the bounds of pfds.
> > > > Before it was technically wrong, as i > NPFDS means that the last
> > > > write (i == NPFDS) was already out of bounds.
> > >  
> > > I see no reason to pull up the check, but the if condition should indeed
> > > be greater or equal. One could consider changing this into an assert(),
> > > but I think I'll stick with the errx().
> > 
> > Agreed. ok for the diff that just changes the checks to >=
> 
> Actually I was wrong: the check needs to happen at the start of the loop,
> not at the end, else it does not work if the list has exactly the number of
> elements needed to fill NPFDS.

Ah right, we need to bail out when we would actually go past the limit.
However, I see no reason to fiddle with the timeout before checking.
So let's move the check to the beginning.

> 
>   for (first element; if not end; next element) {
>   if (i >= NPFDS)
>   errx();
> 
>   /* do work */
> 
>   i++;
>   /* here condition would trigger on last element */
>   }
>  
> > > 
> > > > Martin
> > > > 
> > > > 
> > > > Index: http.c
> > > > ===
> > > > RCS file: /cvs/src/usr.sbin/rpki-client/http.c,v
> > > > retrieving revision 1.53
> > > > diff -u -p -r1.53 http.c
> > > > --- http.c  10 Feb 2022 11:10:40 -  1.53
> > > > +++ http.c  10 Mar 2022 16:28:48 -
> > > > @@ -1820,6 +1820,10 @@ proc_http(char *bind_addr, int fd)
> > > > if (timeout == INFTIM || diff < timeout)
> > > > timeout = diff;
> > > > }
> > > > +
> > > > +   if (i >= NPFDS)
> > > > +   errx(1, "too many connections");
> > > > +
> > > > if (conn->state == STATE_WRITE_DATA)
> > > > pfds[i].fd = conn->req->outfd;
> > > > else
> > > > @@ -1828,8 +1832,6 @@ proc_http(char *bind_addr, int fd)
> > > > pfds[i].events = conn->events;
> > > > conn->pfd = [i];
> > > > i++;
> > > > -   if (i > NPFDS)
> > > > -   errx(1, "too many connections");
> > > > }
> > > > LIST_FOREACH(conn, , entry) {
> > > > if (conn->idle_time <= now)
> > > > @@ -1840,12 +1842,14 @@ proc_http(char *bind_addr, int fd)
> > > > if (timeout == INFTIM || diff < timeout)
> > > > timeout = diff;
> > > > }
> > > > +
> > > > +   if (i >= NPFDS)
> > > > +   errx(1, "too many connections");
> > > > +
> > > > pfds[i].fd = conn->fd;
> > > > pfds[i].events = POLLIN;
> > > > conn->pfd = [i];
> > > > i++;
> > > > -   if (i > NPFDS)
> > > > -   errx(1, "too many connections");
> > > > }
> > > >  
> > > > if (poll(pfds, i, timeout) == -1) {
> > > > 
> > > 
> > > -- 
> > > :wq Claudio
> > > 
> > 
> 
> -- 
> :wq Claudio



Re: rpki-client: fix wrong conditional

2022-03-10 Thread Theo Buehler
On Thu, Mar 10, 2022 at 05:51:46PM +0100, Claudio Jeker wrote:
> On Thu, Mar 10, 2022 at 05:33:28PM +0100, Martin Vahlensieck wrote:
> > Hi
> > 
> > This pulls up and adjusts the check if i exceeds the bounds of pfds.
> > Before it was technically wrong, as i > NPFDS means that the last
> > write (i == NPFDS) was already out of bounds.
>  
> I see no reason to pull up the check, but the if condition should indeed be
> greater or equal. One could consider changing this into an assert(), but I
> think I'll stick with the errx().

Agreed. ok for the diff that just changes the checks to >=

> 
> > Martin
> > 
> > 
> > Index: http.c
> > ===
> > RCS file: /cvs/src/usr.sbin/rpki-client/http.c,v
> > retrieving revision 1.53
> > diff -u -p -r1.53 http.c
> > --- http.c  10 Feb 2022 11:10:40 -  1.53
> > +++ http.c  10 Mar 2022 16:28:48 -
> > @@ -1820,6 +1820,10 @@ proc_http(char *bind_addr, int fd)
> > if (timeout == INFTIM || diff < timeout)
> > timeout = diff;
> > }
> > +
> > +   if (i >= NPFDS)
> > +   errx(1, "too many connections");
> > +
> > if (conn->state == STATE_WRITE_DATA)
> > pfds[i].fd = conn->req->outfd;
> > else
> > @@ -1828,8 +1832,6 @@ proc_http(char *bind_addr, int fd)
> > pfds[i].events = conn->events;
> > conn->pfd = [i];
> > i++;
> > -   if (i > NPFDS)
> > -   errx(1, "too many connections");
> > }
> > LIST_FOREACH(conn, , entry) {
> > if (conn->idle_time <= now)
> > @@ -1840,12 +1842,14 @@ proc_http(char *bind_addr, int fd)
> > if (timeout == INFTIM || diff < timeout)
> > timeout = diff;
> > }
> > +
> > +   if (i >= NPFDS)
> > +   errx(1, "too many connections");
> > +
> > pfds[i].fd = conn->fd;
> > pfds[i].events = POLLIN;
> > conn->pfd = [i];
> > i++;
> > -   if (i > NPFDS)
> > -   errx(1, "too many connections");
> > }
> >  
> > if (poll(pfds, i, timeout) == -1) {
> > 
> 
> -- 
> :wq Claudio
> 



Re: initial iwx(4) 11ac patch for testing

2022-03-10 Thread Peter Hessler
On 2022 Mar 10 (Thu) at 12:35:20 +0100 (+0100), Stefan Sperling wrote:
:On Thu, Mar 10, 2022 at 12:25:17PM +0100, Stefan Sperling wrote:
:> Unless anyone else finds a problem, this patch can be considered ready
:> for review and commit.
:
:Of course, I forgot to apply my sysassert fix to the second phy context
:command function...  Fixed here.
:

This bumped my speeds up to 150Mbps when running a speedtest.

I also tried some latency sensitive apps, and no regression was noticed.

  iwx0 at pci3 dev 0 function 0 "Intel Wi-Fi 6 AX200" rev 0x1a, msix
  iwx0: hw rev 0x340, fw ver 67.8f59b80b.0, address c0:3c:59:03:ae:53

  chan 64
  media: IEEE802.11 autoselect (VHT-MCS9 mode 11ac)

Over an Asus Wireless Router AC-RT58U.

Running fine for 2 hours so far.

-- 
Security check: INTRUDER ALERT!



document membar atomic

2022-03-10 Thread Alexander Bluhm
Hi,

claudio@ mentioned that documentation for membar_enter_after_atomic
and membar_exit_before_atomic is missing.

ok?

bluhm

Index: membar_sync.9
===
RCS file: /data/mirror/openbsd/cvs/src/share/man/man9/membar_sync.9,v
retrieving revision 1.3
diff -u -p -r1.3 membar_sync.9
--- membar_sync.9   14 Feb 2014 05:11:55 -  1.3
+++ membar_sync.9   10 Mar 2022 20:51:01 -
@@ -35,7 +35,9 @@
 .Nm membar_exit ,
 .Nm membar_producer ,
 .Nm membar_consumer ,
-.Nm membar_sync
+.Nm membar_sync ,
+.Nm membar_enter_after_atomic ,
+.Nm membar_exit_before_atomic
 .Nd memory access barrier operations
 .Sh SYNOPSIS
 .In sys/atomic.h
@@ -49,6 +51,10 @@
 .Fn membar_consumer "void"
 .Ft void
 .Fn membar_sync "void"
+.Ft void
+.Fn membar_enter_after_atomic "void"
+.Ft void
+.Fn membar_exit_before_atomic "void"
 .Sh DESCRIPTION
 The membar set of functions provide an interface for issuing memory barrier
 access operations with respect to multiple processors in the system.
@@ -79,14 +85,27 @@ after the memory barrier complete.
 All loads and stores preceding the memory barrier will complete and
 reach global visibility before any loads and stores after the memory
 barrier complete and reach global visibility.
+.It Fn membar_enter_after_atomic
+An atomic operation preceding
+.Fn membar_enter_after_atomic
+will reach global visibility before all loads and stores following it.
+The atomic operation is used to protect the start of a critical section.
+.It Fn membar_exit_before_atomic
+All loads and stores preceding
+.Fn membar_exit_before_atomic
+will reach global visibility before the atomic operation that follows it.
+The atomic operation is used to protect the end of a critical section.
 .El
 .Sh CONTEXT
 .Fn membar_enter ,
 .Fn membar_exit ,
 .Fn membar_producer ,
 .Fn membar_consumer ,
-.Fn membar_sync
-can all be called during autoconf, from process context, or from interrupt context.
+.Fn membar_sync ,
+.Fn membar_enter_after_atomic ,
+.Fn membar_exit_before_atomic
+can all be called during autoconf, from process context, or from
+interrupt context.
 .Sh HISTORY
 The membar functions first appeared in
 .Nx 5.0
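The critical-section pairing the new man page text describes can be
sketched with a C11 spinlock (a hedged illustration under C11 atomics;
spin_lock/spin_unlock/bump are made-up names, and the kernel's mutex code
is the real consumer of these membars):

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Sketch of the enter/exit pairing: the acquire on lock plays the
 * role of atomic op + membar_enter_after_atomic(), the release on
 * unlock the role of membar_exit_before_atomic() + atomic op.
 */
static atomic_flag lock = ATOMIC_FLAG_INIT;
static int protected_value;

static void
spin_lock(void)
{
	/* atomic op, then "enter" ordering: acquire */
	while (atomic_flag_test_and_set_explicit(&lock,
	    memory_order_acquire))
		;
}

static void
spin_unlock(void)
{
	/* "exit" ordering, then atomic op: release */
	atomic_flag_clear_explicit(&lock, memory_order_release);
}

static int
bump(void)
{
	int v;

	spin_lock();
	v = ++protected_value;	/* loads/stores stay inside the section */
	spin_unlock();
	return (v);
}
```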



Re: atomic read write

2022-03-10 Thread Mark Kettenis
> Date: Thu, 10 Mar 2022 21:43:02 +0100
> From: Alexander Bluhm 
> 
> On Thu, Mar 10, 2022 at 07:47:16PM +0100, Mark Kettenis wrote:
> > > In general, atomic_* functions have not provided implicit memory
> > > barriers on OpenBSD.
> > >
> > > I am not sure if the data dependency barrier is needed where
> > > atomic_load_int() and atomic_load_long() are used. The memory ordering
> > > guarantee is very weak and does not seem useful in any of the use cases
> > > in the patch. However, the barrier does not appear to make things worse
> > > in terms of correctness. Except maybe in assertions where they cause
> > > subtle side effects.
> > >
> > > However, the patch looks good.
> > >
> > > OK visa@
> >
> > I have some doubts about using atomic_load_xxx() in KASSERTs.  A
> > KASSERT really shouldn't have any side-effects.  But atomic_load_xxx()
> > has a memory barrier in it, which does have the side-effect of
> > ordering memory accesses.
> 
> The memory barrier is #ifdef __alpha__.  Without it the CPU might
> read the wrong value.  But you are right, a barrier should not
> depend on DIAGNOSTIC.  I guess not many Alpha MP users have Intel
> wireless devices.

If you think about it, the invariants being tested by those KASSERTs
should not depend on whether the old or the new value is read if
another CPU is modifying that variable at the same time.  Unless of
course there is a refcounting bug.  But even with the barrier we're
not guaranteed to catch that bug.

> ok?

ok kettenis@

> Index: dev/pci/if_iwm.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_iwm.c,v
> retrieving revision 1.392
> diff -u -p -r1.392 if_iwm.c
> --- dev/pci/if_iwm.c  10 Mar 2022 15:21:08 -  1.392
> +++ dev/pci/if_iwm.c  10 Mar 2022 20:34:18 -
> @@ -9975,7 +9975,7 @@ iwm_init(struct ifnet *ifp)
> 
>   generation = ++sc->sc_generation;
> 
> - KASSERT(atomic_load_int(>task_refs.r_refs) == 0);
> + KASSERT(sc->task_refs.r_refs == 0);
>   refcnt_init(>task_refs);
> 
>   err = iwm_preinit(sc);
> @@ -10116,7 +10116,7 @@ iwm_stop(struct ifnet *ifp)
>   iwm_del_task(sc, systq, >mac_ctxt_task);
>   iwm_del_task(sc, systq, >phy_ctxt_task);
>   iwm_del_task(sc, systq, >bgscan_done_task);
> - KASSERT(atomic_load_int(>task_refs.r_refs) >= 1);
> + KASSERT(sc->task_refs.r_refs >= 1);
>   refcnt_finalize(>task_refs, "iwmstop");
> 
>   iwm_stop_device(sc);
> Index: dev/pci/if_iwx.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_iwx.c,v
> retrieving revision 1.135
> diff -u -p -r1.135 if_iwx.c
> --- dev/pci/if_iwx.c  10 Mar 2022 15:21:08 -  1.135
> +++ dev/pci/if_iwx.c  10 Mar 2022 20:34:40 -
> @@ -8017,7 +8017,7 @@ iwx_init(struct ifnet *ifp)
>   if (sc->sc_nvm.sku_cap_11n_enable)
>   iwx_setup_ht_rates(sc);
> 
> - KASSERT(atomic_load_int(>task_refs.r_refs) == 0);
> + KASSERT(sc->task_refs.r_refs == 0);
>   refcnt_init(>task_refs);
>   ifq_clr_oactive(>if_snd);
>   ifp->if_flags |= IFF_RUNNING;
> @@ -8139,7 +8139,7 @@ iwx_stop(struct ifnet *ifp)
>   iwx_del_task(sc, systq, >mac_ctxt_task);
>   iwx_del_task(sc, systq, >phy_ctxt_task);
>   iwx_del_task(sc, systq, >bgscan_done_task);
> - KASSERT(atomic_load_int(>task_refs.r_refs) >= 1);
> + KASSERT(sc->task_refs.r_refs >= 1);
>   refcnt_finalize(>task_refs, "iwxstop");
> 
>   iwx_stop_device(sc);
> 



Re: initial iwx(4) 11ac patch for testing

2022-03-10 Thread Dave Voutila


Stefan Sperling  writes:

> On Thu, Mar 10, 2022 at 01:56:19PM -0500, Dave Voutila wrote:
>>
>> Stefan Sperling  writes:
>>
>> > On Thu, Mar 10, 2022 at 12:25:17PM +0100, Stefan Sperling wrote:
>> >> Unless anyone else finds a problem, this patch can be considered ready
>> >> for review and commit.
>> >
>> > Of course, I forgot to apply my sysassert fix to the second phy context
>> > command function...  Fixed here.
>> >
>>
>> Looking good on the 3 devices I've tested, with one minor nit. All 3
>> devices are similar AX200s like so:
>>
>> iwx0 at pci3 dev 0 function 0 "Intel Wi-Fi 6 AX200" rev 0x1a, msix
>> iwx0: hw rev 0x340, fw ver 67.8f59b80b.0, address ...
>>
>> For some reason, on my lenovo x13 I had to manually enable 11ac mode via
>> ifconfig. No idea why. It was sitting on the table next to my Go 2 that
>> automatically picked up on 11ac mode.
>
> Which channel was it on while not in 11ac mode?
> Only the 5GHz channel range (chan 36+) can use 11ac.
>
> In the 2GHz channel range (chan 1-12 for the US) 11ac cannot be used.
> In that range it should autoselect mode 11n, or 11g/11b for very old APs.
>

Yeah, chan 4 when on 11n for me. 36 for 11ac.

> We could tweak access point selection heuristics to prefer 11ac over 11n
> if both are available for a given SSID. But that can be done in a later
> patch.
>

It does look like it's simply the heuristics and a slight difference in
signal strength or whatever the % represents.

$ ifconfig iwx0 scan | grep PrunoWifi
ieee80211: join PrunoWifi chan 4 bssid 00:11:32:91:72:d7 94% wpakey wpaprotos wpa2 wpaakms psk wpaciphers ccmp wpagroupcipher ccmp
nwid PrunoWifi chan 4 bssid 00:11:32:91:72:d7 94% HT-MCS31 privacy,short_preamble,short_slottime,radio_measurement,wpa2
nwid PrunoWifi chan 36 bssid 00:11:32:91:72:d8 88% HT-MCS31 privacy,spectrum_mgmt,short_slottime,radio_measurement,wpa2

-dv



Re: atomic read write

2022-03-10 Thread Alexander Bluhm
On Thu, Mar 10, 2022 at 07:47:16PM +0100, Mark Kettenis wrote:
> > In general, atomic_* functions have not provided implicit memory
> > barriers on OpenBSD.
> > 
> > I am not sure if the data dependency barrier is needed where
> > atomic_load_int() and atomic_load_long() are used. The memory ordering
> > guarantee is very weak and does not seem useful in any of the use cases
> > in the patch. However, the barrier does not appear to make things worse
> > in terms of correctness. Except maybe in assertions where they cause
> > subtle side effects.
> > 
> > However, the patch looks good.
> > 
> > OK visa@
> 
> I have some doubts about using atomic_load_xxx() in KASSERTs.  A
> KASSERT really shouldn't have any side-effects.  But atomic_load_xxx()
> has a memory barrier in it, which does have the side-effect of
> ordering memory accesses.

The memory barrier is #ifdef __alpha__.  Without it the CPU might
read the wrong value.  But you are right, a barrier should not
depend on DIAGNOSTIC.  I guess not many Alpha MP users have Intel
wireless devices.

ok?

bluhm

Index: dev/pci/if_iwm.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_iwm.c,v
retrieving revision 1.392
diff -u -p -r1.392 if_iwm.c
--- dev/pci/if_iwm.c10 Mar 2022 15:21:08 -  1.392
+++ dev/pci/if_iwm.c10 Mar 2022 20:34:18 -
@@ -9975,7 +9975,7 @@ iwm_init(struct ifnet *ifp)
 
generation = ++sc->sc_generation;
 
-   KASSERT(atomic_load_int(>task_refs.r_refs) == 0);
+   KASSERT(sc->task_refs.r_refs == 0);
refcnt_init(>task_refs);
 
err = iwm_preinit(sc);
@@ -10116,7 +10116,7 @@ iwm_stop(struct ifnet *ifp)
iwm_del_task(sc, systq, >mac_ctxt_task);
iwm_del_task(sc, systq, >phy_ctxt_task);
iwm_del_task(sc, systq, >bgscan_done_task);
-   KASSERT(atomic_load_int(>task_refs.r_refs) >= 1);
+   KASSERT(sc->task_refs.r_refs >= 1);
refcnt_finalize(>task_refs, "iwmstop");
 
iwm_stop_device(sc);
Index: dev/pci/if_iwx.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_iwx.c,v
retrieving revision 1.135
diff -u -p -r1.135 if_iwx.c
--- dev/pci/if_iwx.c10 Mar 2022 15:21:08 -  1.135
+++ dev/pci/if_iwx.c10 Mar 2022 20:34:40 -
@@ -8017,7 +8017,7 @@ iwx_init(struct ifnet *ifp)
if (sc->sc_nvm.sku_cap_11n_enable)
iwx_setup_ht_rates(sc);
 
-   KASSERT(atomic_load_int(>task_refs.r_refs) == 0);
+   KASSERT(sc->task_refs.r_refs == 0);
refcnt_init(>task_refs);
ifq_clr_oactive(>if_snd);
ifp->if_flags |= IFF_RUNNING;
@@ -8139,7 +8139,7 @@ iwx_stop(struct ifnet *ifp)
iwx_del_task(sc, systq, >mac_ctxt_task);
iwx_del_task(sc, systq, >phy_ctxt_task);
iwx_del_task(sc, systq, >bgscan_done_task);
-   KASSERT(atomic_load_int(>task_refs.r_refs) >= 1);
+   KASSERT(sc->task_refs.r_refs >= 1);
refcnt_finalize(>task_refs, "iwxstop");
 
iwx_stop_device(sc);



Re: document membar atomic

2022-03-10 Thread Mark Kettenis
> Date: Thu, 10 Mar 2022 21:53:13 +0100
> From: Alexander Bluhm 
> 
> Hi,
> 
> claudio@ mentioned that documentation for membar_enter_after_atomic
> and membar_exit_before_atomic is missing.
> 
> ok?

I think this is fine, although it doesn't actually define what an
"atomic operation" is in this context.  Now we only actually
provide an optimized implementation on x86; on all other architectures
that we support, these are just aliases for membar_enter() and
membar_exit().  So all x86 operations that perform a bus lock are
supported.  However the atomic_load_xxx() and atomic_store_xxx()
operations you just introduced do not fall into this category.

Not sure if that needs spelling out.  And if it does, maybe the chosen
names for those functions are wrong?


> Index: membar_sync.9
> ===
> RCS file: /data/mirror/openbsd/cvs/src/share/man/man9/membar_sync.9,v
> retrieving revision 1.3
> diff -u -p -r1.3 membar_sync.9
> --- membar_sync.9 14 Feb 2014 05:11:55 -  1.3
> +++ membar_sync.9 10 Mar 2022 20:51:01 -
> @@ -35,7 +35,9 @@
>  .Nm membar_exit ,
>  .Nm membar_producer ,
>  .Nm membar_consumer ,
> -.Nm membar_sync
> +.Nm membar_sync ,
> +.Nm membar_enter_after_atomic ,
> +.Nm membar_exit_before_atomic
>  .Nd memory access barrier operations
>  .Sh SYNOPSIS
>  .In sys/atomic.h
> @@ -49,6 +51,10 @@
>  .Fn membar_consumer "void"
>  .Ft void
>  .Fn membar_sync "void"
> +.Ft void
> +.Fn membar_enter_after_atomic "void"
> +.Ft void
> +.Fn membar_exit_before_atomic "void"
>  .Sh DESCRIPTION
>  The membar set of functions provide an interface for issuing memory barrier
>  access operations with respect to multiple processors in the system.
> @@ -79,14 +85,27 @@ after the memory barrier complete.
>  All loads and stores preceding the memory barrier will complete and
>  reach global visibility before any loads and stores after the memory
>  barrier complete and reach global visibility.
> +.It Fn membar_enter_after_atomic
> +An atomic operation preceding
> +.Fn membar_enter_after_atomic
> +will reach global visibility before all loads and stores following it.
> +The atomic operation is used to protect the start of a critical section.
> +.It Fn membar_exit_before_atomic
> +All loads and stores preceding
> +.Fn membar_exit_before_atomic
> +will reach global visibility before the atomic operation that follows it.
> +The atomic operation is used to protect the end of a critical section.
>  .El
>  .Sh CONTEXT
>  .Fn membar_enter ,
>  .Fn membar_exit ,
>  .Fn membar_producer ,
>  .Fn membar_consumer ,
> -.Fn membar_sync
> -can all be called during autoconf, from process context, or from interrupt context.
> +.Fn membar_sync ,
> +.Fn membar_enter_after_atomic ,
> +.Fn membar_exit_before_atomic
> +can all be called during autoconf, from process context, or from
> +interrupt context.
>  .Sh HISTORY
>  The membar functions first appeared in
>  .Nx 5.0
> 
> 



Re: atomic read write

2022-03-10 Thread David Gwynne
On Thu, Mar 10, 2022 at 10:45:47AM +, Laurence Tratt wrote:
> On Thu, Mar 10, 2022 at 09:05:54AM +, Visa Hankala wrote:
> 
> Hello Visa,
> 
> > In general, atomic_* functions have not provided implicit memory
> > barriers on OpenBSD.
> 
> I've used atomics fairly extensively in other settings. Forgive me if I'm
> explaining the obvious, but I had a devil of a job making sense of this
> stuff a few years back, and so perhaps others might find it useful to expand
> on this point.
> 
> Quick background: modern CPUs come in two main flavours, weakly ordered
> (e.g. most Arm systems) and strongly ordered (e.g. x86), which determine the
> rules of when multiple cores can see the reads/writes of other cores. Weakly
> ordered systems can move/optimise loads/stores around more than strongly
> ordered systems (so code that seems to work fine on x86 can then fail on
> Arm).
> 
> There are in a sense two "safe" ways to use atomics: to assume that each
> atomic is isolated and that reading/writing to it tells you nothing about any
> other location in memory; or that every atomic is fully ordered with respect
> to every other atomic (i.e. no reorderings of atomic operations are allowed).
> The former is fast but (without additional operations) can't even express
> a mutex safely. The latter doesn't have very good performance.
> 
> C11 thus allows you to do various atomic operations with different memory
> orderings [1] so that you can choose on a case-by-case basis what you're
> prepared to tolerate. "relaxed" is the most efficient but has the least
> guarantees; "seq_cst" is the least efficient but has the most guarantees.
> 
> I would be very nervous about adding further atomic functions (as in the
> original diff) to OpenBSD that don't allow the user to specify what ordering
> they want: it's impossible to pick a memory ordering that suits all use
> cases. For example, neither READ_ONCE nor the existing atomic_* instructions
> define an ordering: I suppose I'd have to assume they're relaxed. I worry
> that people might assume these atomic operations provide greater guarantees
> than they actually do.

the existing atomic_foo ops provided in the kernel are implicitly
relaxed. as you say, they make no claims about ordering. if you want
ordering then you use the explicit membar_foo ops.

> Fortunately since, AFAICT, we already use C11 (or C17?!) for base, and LLVM
> includes all of the relevant functions (e.g. the compare_exchange family
> [2]) I don't think we need to add any functions of our own? It might not
> even be a bad idea to deprecate the current atomic_* functions in base
> and migrate to the C11 alternatives?

not all our archs are on llvm/clang or even a recent gcc, and there
are (what i will generously call) gaps in the set of atomic ops
different cpus actually implement themselves. having the atomic_foo
and membar_foo apis in the kernel distinct from the c/compiler provided
ones makes it a bit more obvious that the rules are different and
some care should be taken.

cheers,
dlg

> 
> 
> Laurie
> 
> [1] https://en.cppreference.com/w/c/atomic/memory_order
> [2] https://en.cppreference.com/w/c/atomic/atomic_compare_exchange
> 



Re: rpki-client: fix wrong conditional

2022-03-10 Thread Theo Buehler
On Thu, Mar 10, 2022 at 06:29:40PM +0100, Martin Vahlensieck wrote:
> On Thu, Mar 10, 2022 at 06:15:48PM +0100, Theo Buehler wrote:
> > On Thu, Mar 10, 2022 at 06:03:14PM +0100, Claudio Jeker wrote:
> > > On Thu, Mar 10, 2022 at 05:54:21PM +0100, Theo Buehler wrote:
> > > > On Thu, Mar 10, 2022 at 05:51:46PM +0100, Claudio Jeker wrote:
> > > > > On Thu, Mar 10, 2022 at 05:33:28PM +0100, Martin Vahlensieck wrote:
> > > > > > Hi
> > > > > > 
> > > > > > This pulls up and adjusts the check of whether i exceeds the bounds
> > > > > > of pfds.  Before, it was technically wrong: by the time i > NPFDS
> > > > > > fires, the last write (at i == NPFDS) was already out of bounds.
> > > > >  
> > > > > I see no reason to pull up the check but the if condition should
> > > > > indeed be greater or equal. One could consider changing this into
> > > > > an assert() but I think I'll stick with the errx().
> > > > 
> > > > Agreed. ok for the diff that just changes the checks to >=
> > > 
> > > Actually I was wrong: the check needs to happen at the start of the
> > > loop, not at the end, or else it does not work when the list has
> > > exactly enough elements to fill NPFDS.
> > 
> > Ah right, we need to bail out when we would actually go past the limit.
> > However, I see no reason to fiddle with the timeout before checking.
> > So let's move the check to the beginning.
> 
> Here you go:
> 

Thanks. ok

> Index: http.c
> ===
> RCS file: /cvs/src/usr.sbin/rpki-client/http.c,v
> retrieving revision 1.53
> diff -u -p -r1.53 http.c
> --- http.c10 Feb 2022 11:10:40 -  1.53
> +++ http.c10 Mar 2022 17:28:21 -
> @@ -1809,6 +1809,9 @@ proc_http(char *bind_addr, int fd)
>   timeout = INFTIM;
>   now = getmonotime();
>   LIST_FOREACH(conn, , entry) {
> + if (i >= NPFDS)
> + errx(1, "too many connections");
> +
>   if (conn->io_time == 0)
>   conn->io_time = now + HTTP_IO_TIMEOUT;
>  
> @@ -1828,10 +1831,11 @@ proc_http(char *bind_addr, int fd)
>   pfds[i].events = conn->events;
>   conn->pfd = &pfds[i];
>   i++;
> - if (i > NPFDS)
> - errx(1, "too many connections");
>   }
>   LIST_FOREACH(conn, , entry) {
> + if (i >= NPFDS)
> + errx(1, "too many connections");
> +
>   if (conn->idle_time <= now)
>   timeout = 0;
>   else {
> @@ -1844,8 +1848,6 @@ proc_http(char *bind_addr, int fd)
>   pfds[i].events = POLLIN;
>   conn->pfd = &pfds[i];
>   i++;
> - if (i > NPFDS)
> - errx(1, "too many connections");
>   }
>  
>   if (poll(pfds, i, timeout) == -1) {
> 



Re: atomic read write

2022-03-10 Thread Alexander Bluhm
On Thu, Mar 10, 2022 at 04:39:49PM +0100, Alexander Bluhm wrote:
> > Below is a patch that shows how to accomplish release semantics with
> > the file and refcnt APIs. (The added memory barriers ensure that the
> > CPU completes its loads and stores to the object before dropping the
> > reference. Another CPU might delete the object immediately after.
> > The barrier might not be strictly necessary with refcnt_finalize(),
> > though.)

The enter and exit membars that protect the critical section should
be symmetric.  Maybe this diff is better.

And to avoid memory barriers that nobody understands we should convert
FRELE to refcnt_rele.

bluhm

Index: kern/kern_synch.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.183
diff -u -p -r1.183 kern_synch.c
--- kern/kern_synch.c   10 Mar 2022 15:21:08 -  1.183
+++ kern/kern_synch.c   10 Mar 2022 18:11:39 -
@@ -805,6 +805,7 @@ void
 refcnt_init(struct refcnt *r)
 {
	atomic_store_int(&r->r_refs, 1);
+   membar_enter_after_atomic();
 }
 
 void
@@ -818,6 +819,7 @@ refcnt_take(struct refcnt *r)
 #else
	atomic_inc_int(&r->r_refs);
 #endif
+   membar_enter_after_atomic();
 }
 
 int
@@ -825,6 +827,7 @@ refcnt_rele(struct refcnt *r)
 {
u_int refcnt;
 
+   membar_exit_before_atomic();
	refcnt = atomic_dec_int_nv(&r->r_refs);
KASSERT(refcnt != ~0);
 
@@ -844,6 +847,7 @@ refcnt_finalize(struct refcnt *r, const 
struct sleep_state sls;
u_int refcnt;
 
+   membar_exit_before_atomic();
	refcnt = atomic_dec_int_nv(&r->r_refs);
	while (refcnt) {
		sleep_setup(&sls, r, PWAIT, wmesg, 0);
@@ -856,11 +860,13 @@ void
 cond_init(struct cond *c)
 {
	atomic_store_int(&c->c_wait, 1);
+   membar_enter_after_atomic();
 }
 
 void
 cond_signal(struct cond *c)
 {
+   membar_exit_before_atomic();
	atomic_store_int(&c->c_wait, 0);
 
wakeup_one(c);
@@ -872,9 +878,11 @@ cond_wait(struct cond *c, const char *wm
struct sleep_state sls;
unsigned int wait;
 
+   membar_exit_before_atomic();
	wait = atomic_load_int(&c->c_wait);
while (wait) {
		sleep_setup(&sls, c, PWAIT, wmesg, 0);
+   membar_exit_before_atomic();
		wait = atomic_load_int(&c->c_wait);
		sleep_finish(&sls, wait);
}



Re: atomic read write

2022-03-10 Thread Mark Kettenis
> Date: Thu, 10 Mar 2022 09:05:54 +
> From: Visa Hankala 
> 
> On Wed, Mar 09, 2022 at 08:45:35PM +0100, Alexander Bluhm wrote:
> > On Tue, Mar 08, 2022 at 04:55:56PM +0100, Alexander Bluhm wrote:
> > > Once we had the discussion where we need the READ_ONCE() macro.  As
> > > modern C compiler has much freedom how to access memory, I came to
> > > the conclusion that it would be wise to use READ_ONCE() and
> > > WRITE_ONCE() everywhere when we use atomic operations variables.
> > > Using atomic operations on one side and do whatever the compiler
> > > thinks at the other side of the variable feels wrong.
> > > 
> > > The rule use READ_ONCE, WRITE_ONCE, atomic_inc, atomic_dec consistently
> > > would be easy to follow.  Thinking about where the compiler might
> > > reorder things and break MP code is much more complicated.
> > >
> > > Do we want to go this direction?
> > 
> > mvs@ mentioned that FreeBSD has atomic load and store instructions
> > for that.  I decided to implement them as static inline functions
> > as they provide stronger type checks.  Also I add them for int and
> > long only, everything else is not atomic.
> > 
> > > If yes, here is the change for struct refcnt and cond.  While there
> > > rename the field to r_refs which is easier to grep.
> > 
> > Note that the _init functions do not need atomic operations.  But
> > the whole idea is to make it consistent and have a simple rule.  If
> > an MP variable locking is marked as atomic, use the atomic_ functions.
> > 
> > As a bonus alpha gets the membar it needs.
> 
> In general, atomic_* functions have not provided implicit memory
> barriers on OpenBSD.
> 
> I am not sure if the data dependency barrier is needed where
> atomic_load_int() and atomic_load_long() are used. The memory ordering
> guarantee is very weak and does not seem useful in any of the use cases
> in the patch. However, the barrier does not appear to make things worse
> in terms of correctness. Except maybe in assertions where they cause
> subtle side effects.
> 
> However, the patch looks good.
> 
> OK visa@

I have some doubts about using atomic_load_xxx() in KASSERTs.  A
KASSERT really shouldn't have any side-effects.  But atomic_load_xxx()
has a memory barrier in it, which does have the side-effect of
ordering memory accesses.

> > Index: sys/dev/pci/if_iwm.c
> > ===
> > RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_iwm.c,v
> > retrieving revision 1.391
> > diff -u -p -r1.391 if_iwm.c
> > --- sys/dev/pci/if_iwm.c8 Feb 2022 14:24:36 -   1.391
> > +++ sys/dev/pci/if_iwm.c9 Mar 2022 18:52:52 -
> > @@ -9975,7 +9975,7 @@ iwm_init(struct ifnet *ifp)
> >  
> > generation = ++sc->sc_generation;
> >  
> > -   KASSERT(sc->task_refs.refs == 0);
> > +   KASSERT(atomic_load_int(&sc->task_refs.r_refs) == 0);
> > refcnt_init(&sc->task_refs);
> >  
> > err = iwm_preinit(sc);
> > @@ -10116,7 +10116,7 @@ iwm_stop(struct ifnet *ifp)
> > iwm_del_task(sc, systq, >mac_ctxt_task);
> > iwm_del_task(sc, systq, >phy_ctxt_task);
> > iwm_del_task(sc, systq, >bgscan_done_task);
> > -   KASSERT(sc->task_refs.refs >= 1);
> > +   KASSERT(atomic_load_int(&sc->task_refs.r_refs) >= 1);
> > refcnt_finalize(&sc->task_refs, "iwmstop");
> >  
> > iwm_stop_device(sc);
> > Index: sys/dev/pci/if_iwx.c
> > ===
> > RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_iwx.c,v
> > retrieving revision 1.134
> > diff -u -p -r1.134 if_iwx.c
> > --- sys/dev/pci/if_iwx.c21 Jan 2022 15:51:02 -  1.134
> > +++ sys/dev/pci/if_iwx.c9 Mar 2022 18:53:50 -
> > @@ -8017,7 +8017,7 @@ iwx_init(struct ifnet *ifp)
> > if (sc->sc_nvm.sku_cap_11n_enable)
> > iwx_setup_ht_rates(sc);
> >  
> > -   KASSERT(sc->task_refs.refs == 0);
> > +   KASSERT(atomic_load_int(&sc->task_refs.r_refs) == 0);
> > refcnt_init(&sc->task_refs);
> > ifq_clr_oactive(&ifp->if_snd);
> > ifp->if_flags |= IFF_RUNNING;
> > @@ -8139,7 +8139,7 @@ iwx_stop(struct ifnet *ifp)
> > iwx_del_task(sc, systq, >mac_ctxt_task);
> > iwx_del_task(sc, systq, >phy_ctxt_task);
> > iwx_del_task(sc, systq, >bgscan_done_task);
> > -   KASSERT(sc->task_refs.refs >= 1);
> > +   KASSERT(atomic_load_int(&sc->task_refs.r_refs) >= 1);
> > refcnt_finalize(&sc->task_refs, "iwxstop");
> >  
> > iwx_stop_device(sc);
> > Index: sys/kern/kern_synch.c
> > ===
> > RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_synch.c,v
> > retrieving revision 1.182
> > diff -u -p -r1.182 kern_synch.c
> > --- sys/kern/kern_synch.c   19 Feb 2022 23:56:18 -  1.182
> > +++ sys/kern/kern_synch.c   9 Mar 2022 18:57:53 -
> > @@ -804,7 +804,7 @@ sys___thrwakeup(struct proc *p, void *v,
> >  void
> >  refcnt_init(struct refcnt *r)
> >  {
> > -   r->refs = 1;
> > +   atomic_store_int(&r->r_refs, 1);
> >  }
> >  
> >  void
> > 

Re: initial iwx(4) 11ac patch for testing

2022-03-10 Thread Dave Voutila


Stefan Sperling  writes:

> On Thu, Mar 10, 2022 at 12:25:17PM +0100, Stefan Sperling wrote:
>> Unless anyone else finds a problem, this patch can be considered ready
>> for review and commit.
>
> Of course, I forgot to apply my sysassert fix to the second phy context
> command function...  Fixed here.
>

Looking good on the 3 devices I've tested, with one minor nit. All 3
devices are similar AX200s like so:

iwx0 at pci3 dev 0 function 0 "Intel Wi-Fi 6 AX200" rev 0x1a, msix
iwx0: hw rev 0x340, fw ver 67.8f59b80b.0, address ...

For some reason, on my lenovo x13 I had to manually enable 11ac mode via
ifconfig. No idea why. It was sitting on the table next to my Go 2 that
automatically picked up on 11ac mode.

In all 3 cases I tested with tcpbench between my apu2 gateway connected
to my Synology RT2600ac and the client machine. In all 3 cases I saw
close to a 100% increase in average bandwidth reported. :-)

-dv



Re: initial iwx(4) 11ac patch for testing

2022-03-10 Thread Stefan Sperling
On Thu, Mar 10, 2022 at 01:56:19PM -0500, Dave Voutila wrote:
> 
> Stefan Sperling  writes:
> 
> > On Thu, Mar 10, 2022 at 12:25:17PM +0100, Stefan Sperling wrote:
> >> Unless anyone else finds a problem, this patch can be considered ready
> >> for review and commit.
> >
> > Of course, I forgot to apply my sysassert fix to the second phy context
> > command function...  Fixed here.
> >
> 
> Looking good on the 3 devices I've tested, with one minor nit. All 3
> devices are similar AX200s like so:
> 
> iwx0 at pci3 dev 0 function 0 "Intel Wi-Fi 6 AX200" rev 0x1a, msix
> iwx0: hw rev 0x340, fw ver 67.8f59b80b.0, address ...
> 
> For some reason, on my lenovo x13 I had to manually enable 11ac mode via
> ifconfig. No idea why. It was sitting on the table next to my Go 2 that
> automatically picked up on 11ac mode.

Which channel was it on while not in 11ac mode?
Only the 5GHz channel range (chan 36+) can use 11ac.

In the 2GHz channel range (chan 1-12 for the US) 11ac cannot be used.
In that range it should autoselect mode 11n, or 11g/11b for very old APs.

We could tweak access point selection heuristics to prefer 11ac over 11n
if both are available for a given SSID. But that can be done in a later
patch.

> In all 3 cases I tested with tcpbench between my apu2 gateway connected
> to my Synology RT2600ac and the client machine. In all 3 cases I saw
> close to a 100% increase in average bandwidth reported. :-)

Very nice. Thanks for testing!



Re: atomic read write

2022-03-10 Thread David Gwynne
On Thu, Mar 10, 2022 at 10:26:21PM +, Laurence Tratt wrote:
> On Thu, Mar 10, 2022 at 09:52:27PM +0100, Mark Kettenis wrote:
> 
> Hello Mark,
> 
> > If you think about it, the invariants being tested by those KASSERTs should
> > not depend on whether the old or the new value is read if another CPU is
> > modifying that variable at the same time.  Unless of course there is a
> > refcounting bug.  But even with the barrier we're not guaranteed to catch
> > that bug.
> ...
> > > - KASSERT(atomic_load_int(&sc->task_refs.r_refs) == 0);
> > > + KASSERT(sc->task_refs.r_refs == 0);
> 
> Unfortunately this transformation almost certainly isn't safe: for example,
> the non-atomic load can return values that were never written by any thread
> (e.g. due to load/store tearing amongst other fun effects).

is that true even when care is taken to only use int/long sized
variables that are naturally aligned? are compilers that pathological
now?



Re: riscv64: ld.lld is too picky on ABI mismatch

2022-03-10 Thread Mark Kettenis
> From: Jeremie Courreges-Anglas 
> Cc: Mark Kettenis , Patrick Wildt 
> Date: Thu, 10 Mar 2022 02:03:10 +0100
> Content-Type: text/plain
> 
> On Mon, Oct 25 2021, Jeremie Courreges-Anglas  wrote:
> > On Mon, Oct 25 2021, Patrick Wildt  wrote:
> >> Am Mon, Oct 25, 2021 at 11:43:55AM +0200 schrieb Mark Kettenis:
> >>> > From: Jeremie Courreges-Anglas 
> >>> > Date: Sun, 24 Oct 2021 17:30:46 +0100
> >>> > 
> >>> > clang(1) defaults to FP ABI but ld(1) -melf64lriscv doesn't.  This is
> >>> > a problem as soon as a port tries to use raw ld(1) -b binary to embed
> >>> > data in a .o file.
> >>> > 
> >>> > Downgrading this hard error to a warning seems more useful as far as the
> >>> > ports tree is concerned.  The diff below fixes
> >>> > databases/postgresql-pllua and should also fix textproc/mupdf and 
> >>> > net/utox.
> >>> > 
> >>> > There's another diff here: https://reviews.llvm.org/D106378
> >>> > but it's slightly more complicated and it received seemingly negative
> >>> > feedback.  So I'm just using a minimal change to fit our needs.
> >>> > 
> >>> > ok?
> >>> 
> >>> I think we should try to avoid deviating from upstream as much as
> >>> possible.  And I agree with the folks who argue that the mismatching
> >>> objects are created with the wrong tools.
> >
> > Well the only alternative they suggest is using assembly and .incbin.
> > Maybe that works for the FreeBSD folks working on low-level stuff, but
> > not so much as a general purpose tool to include binary data in builds.
> >
> >>> But if mortimer@ and
> >>> patrick@ can deal with carrying this local modification I won't
> >>> object.
> >
> > I can't speak for them but given the local changes we have in clang and
> > lld land this two lines diff doesn't seem daunting. ;)
> >
> >> Well, I was about to send an LLVM 13 diff... so can we revisit this diff
> >> when we're updated to LLVM 13?
> 
> I have added comments on https://reviews.llvm.org/D106378#3219454 but
> the discussion has stalled again (I'll try to revive it).
> 
> I'm still using the diff below on the riscv64 ports build machines, and
> I'd like to get rid of the burden of keeping it in my tree.
> 
> ok?

ok kettenis@

> Index: ELF/Arch/RISCV.cpp
> ===
> RCS file: /cvs/src/gnu/llvm/lld/ELF/Arch/RISCV.cpp,v
> retrieving revision 1.1.1.3
> diff -u -p -r1.1.1.3 RISCV.cpp
> --- ELF/Arch/RISCV.cpp17 Dec 2021 12:25:02 -  1.1.1.3
> +++ ELF/Arch/RISCV.cpp28 Jan 2022 09:11:18 -
> @@ -124,8 +124,8 @@ uint32_t RISCV::calcEFlags() const {
>target |= EF_RISCV_RVC;
>  
>  if ((eflags & EF_RISCV_FLOAT_ABI) != (target & EF_RISCV_FLOAT_ABI))
> -  error(toString(f) +
> -": cannot link object files with different floating-point ABI");
> +  warn(toString(f) +
> +": linking object files with different floating-point ABI");
>  
>  if ((eflags & EF_RISCV_RVE) != (target & EF_RISCV_RVE))
>error(toString(f) +
> 
> 
> -- 
> jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE
> 



Re: atomic read write

2022-03-10 Thread Laurence Tratt
On Thu, Mar 10, 2022 at 09:52:27PM +0100, Mark Kettenis wrote:

Hello Mark,

> If you think about it, the invariants being tested by those KASSERTs should
> not depend on whether the old or the new value is read if another CPU is
> modifying that variable at the same time.  Unless of course there is a
> refcounting bug.  But even with the barrier we're not guaranteed to catch
> that bug.
...
> > -   KASSERT(atomic_load_int(&sc->task_refs.r_refs) == 0);
> > +   KASSERT(sc->task_refs.r_refs == 0);

Unfortunately this transformation almost certainly isn't safe: for example,
the non-atomic load can return values that were never written by any thread
(e.g. due to load/store tearing amongst other fun effects).


Laurie



Re: atomic read write

2022-03-10 Thread Laurence Tratt
On Fri, Mar 11, 2022 at 09:00:57AM +1000, David Gwynne wrote:

Hello David,

>> Unfortunately this transformation almost certainly isn't safe: for
>> example, the non-atomic load can return values that were never written by
>> any thread (e.g. due to load/store tearing amongst other fun effects).
> is that true even when care is taken to only use int/long sized variables
> that are naturally aligned? are compilers that pathological now?

Yes and yes, I'm afraid -- though rare, there are real examples of compilers
doing this (pity the poor person who debugs it!). My working guess is that
they'll be increasingly likely to occur as compilers become ever more
aggressive at taking advantage of undefined behaviour.

Jade Alglave et al. did an approachable intro to many (all?) of the horrors
one might observe in the context of the Linux kernel [1], which is worth a
read IMHO.


Laurie

[1] https://lwn.net/Articles/793253/