Re: svn commit: r358733 - head/sys/sys
On 12/03/2020 09:48, Alan Cox wrote:
> Just to be clear, this is not just a matter of whether our release
> fences are poorly or inefficiently implemented.  Release fences (as
> defined by the C/C++ standards that inspire our atomics) have
> *different* semantics than release stores.  Specifically, a release
> fence places more ordering restrictions on the nearby store accesses
> than a release store does.  (In fact, "man 9 atomic" describes this.)
> Thus, it is almost inevitable that a release fence will have a higher
> cost than the best possible implementation of a release store.

For people like me who are new to these things, here is a good
explanation (perhaps a bad title, though):
https://preshing.com/20131125/acquire-and-release-fences-dont-work-the-way-youd-expect/

-- 
Andriy Gapon
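[For a concrete picture of the distinction Alan describes, here is a
minimal C11 sketch (the variable names are illustrative, not from the
kernel): a release store orders earlier accesses only against that one
store, while a release fence orders all earlier accesses against every
store that follows it.]

#include <stdatomic.h>

atomic_int data, flag, other;

void
publish_with_release_store(void)
{
	atomic_store_explicit(&data, 42, memory_order_relaxed);
	/* Earlier accesses may not move past THIS store... */
	atomic_store_explicit(&flag, 1, memory_order_release);
	/* ...but this later store may still be reordered before it. */
	atomic_store_explicit(&other, 7, memory_order_relaxed);
}

void
publish_with_release_fence(void)
{
	atomic_store_explicit(&data, 42, memory_order_relaxed);
	/*
	 * The fence orders ALL earlier accesses before ALL later stores
	 * (both flag and other) -- a strictly stronger, and typically
	 * more expensive, constraint than the single release store above.
	 */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&flag, 1, memory_order_relaxed);
	atomic_store_explicit(&other, 7, memory_order_relaxed);
}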
Re: svn commit: r358733 - head/sys/sys
On 3/11/20 9:16 PM, Mateusz Guzik wrote:
> On 3/10/20, Konstantin Belousov wrote:
>> [...]
>> For refcount(9) use of release fence removal, we would need
>> atomic_fetchadd_rel_int().
>
> To reiterate, if atomic_thread_fence_rel is indeed very expensive on
> certain architectures compared to other _rel variants, the kernel is
> already heavily pessimized because of the use of said standalone fence
> in refcount(9) and a few other places, and this needs to be fixed.

Just to be clear, this is not just a matter of whether our release
fences are poorly or inefficiently implemented.  Release fences (as
defined by the C/C++ standards that inspire our atomics) have
*different* semantics than release stores.  Specifically, a release
fence places more ordering restrictions on the nearby store accesses
than a release store does.  (In fact, "man 9 atomic" describes this.)
Thus, it is almost inevitable that a release fence will have a higher
cost than the best possible implementation of a release store.

> If the cost can't be reduced (e.g., perhaps this can do a no-op store +
> fence somewhere, akin to how the lock xadd 0,(%esp) trickery used to
> work), then the atomic APIs do indeed need to be completed w.r.t.
> adding all the missing acq/rel variants (that's at least fetchadd and
> swap).  We can also extend the API to do relevant addition/subtraction
> without
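[As a sketch of the direction being discussed: refcount(9) at the time
used a standalone atomic_thread_fence_rel() before a relaxed fetchadd.
An atomic_fetchadd_rel_int() primitive would fold the ordering into the
RMW itself.  That primitive did not exist at this point -- it is exactly
the API gap being pointed out -- so its name and semantics below are an
assumption.]

#include <sys/types.h>
#include <machine/atomic.h>

/*
 * Hypothetical sketch: assumes atomic_fetchadd_rel_int() exists and
 * behaves like atomic_fetchadd_int() with release semantics attached to
 * the RMW itself, replacing the standalone fence + relaxed fetchadd pair.
 */
static __inline bool
refcount_release_sketch(volatile u_int *count)
{
	u_int old;

	/* refcount(9) then: atomic_thread_fence_rel(); atomic_fetchadd_int(count, -1); */
	old = atomic_fetchadd_rel_int(count, -1);	/* hypothetical primitive */
	if (old > 1)
		return (false);

	/*
	 * Last reference dropped: the acquire fence pairs with the release
	 * above so the freeing thread observes all modifications made
	 * before every earlier release.
	 */
	atomic_thread_fence_acq();
	return (true);
}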
Re: svn commit: r358733 - head/sys/sys
On 3/10/20, Konstantin Belousov wrote:
> On Tue, Mar 10, 2020 at 12:22:09AM +0100, Mateusz Guzik wrote:
>> On 3/9/20, Konstantin Belousov wrote:
>> > [...]
>> > But the point was that on non-amd64, i.e. armv8 and potentially on
>> > powerpc 3.0, the patch substitutes a release store with a full
>> > barrier.  The latter is (much) slower.
>>
>> If an arch performs something significantly more expensive here it's
>> probably a performance bug in the port.
> It is a question of how much hardware support there is for fine-grained
> fences.  Relaxed architectures add some optimized operations already,
> but have not (yet) implemented the complete C11 optimized set.  I think
> they will eventually converge.
>
> Until they do, I think it is more important to not pessimize to-be
> Tier 1 arches than to collapse three fast integer instructions on amd64
> into one rw op.
>
>> But perhaps more importantly there are significantly more frequent
>> users of atomic_thread_fence_rel, like refcount(9).  If the issue is
>> real that should be looked at.
> For refcount(9) use of release fence removal, we would need
> atomic_fetchadd_rel_int().

To reiterate, if atomic_thread_fence_rel is indeed very expensive on
certain architectures compared to other _rel variants, the kernel is
already heavily pessimized because of the use of said standalone fence in
refcount(9) and a few other places, and this needs to be fixed.

If the cost can't be reduced (e.g., perhaps this can do a no-op store +
fence somewhere, akin to how the lock xadd 0,(%esp) trickery used to
work), then the atomic APIs do indeed need to be completed w.r.t. adding
all the missing acq/rel variants (that's at least fetchadd and swap).  We
can also extend the API to do relevant
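[The "lock xadd 0,(%esp)" trickery referenced above is the classic x86
idiom where a lock-prefixed read-modify-write on a harmless stack slot
doubles as a full barrier; i386 once used it in place of mfence.  A
sketch of the idea, for illustration only -- the function name is made
up:]

/*
 * Any lock-prefixed RMW is a full barrier on x86; targeting the top of
 * the stack touches a cache line that is almost certainly already
 * exclusive to this CPU, keeping the operation cheap.
 */
static inline void
full_barrier_via_locked_rmw(void)
{
#if defined(__amd64__)
	__asm __volatile("lock; addl $0,(%%rsp)" : : : "memory", "cc");
#elif defined(__i386__)
	__asm __volatile("lock; addl $0,(%%esp)" : : : "memory", "cc");
#endif
}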
Re: svn commit: r358733 - head/sys/sys
On Tue, Mar 10, 2020 at 12:22:09AM +0100, Mateusz Guzik wrote:
> On 3/9/20, Konstantin Belousov wrote:
> > On Mon, Mar 09, 2020 at 01:56:17AM +0100, Mateusz Guzik wrote:
> >> [...]
> >> On patched kernel this is:
> >> addl   $0x1,(%rdx)
> >>
> >> which is clearly much nicer.
> > I am not sure that the new code on amd64 is much nicer.
> >
> > But the point was that on non-amd64, i.e. armv8 and potentially on
> > powerpc 3.0, the patch substitutes a release store with a full
> > barrier.  The latter is (much) slower.
>
> If an arch performs something significantly more expensive here it's
> probably a performance bug in the port.
It is a question of how much hardware support there is for fine-grained
fences.  Relaxed architectures add some optimized operations already, but
have not (yet) implemented the complete C11 optimized set.  I think they
will eventually converge.

Until they do, I think it is more important to not pessimize to-be Tier 1
arches than to collapse three fast integer instructions on amd64 into one
rw op.

> But perhaps more importantly there are significantly more frequent
> users of atomic_thread_fence_rel, like refcount(9).  If the issue is
> real that should be looked at.
For refcount(9) use of release fence removal, we would need
atomic_fetchadd_rel_int().

> >> [...]
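[In C11 terms, the missing primitive corresponds to a fetch-add with
release ordering on the RMW, which relaxed architectures can often
implement more cheaply than a standalone fence.  A sketch of the
equivalent in standard C11 (outside the kernel):]

#include <stdatomic.h>

/*
 * C11 analogue of the proposed atomic_fetchadd_rel_int(): the release
 * ordering is attached to the read-modify-write itself, so no separate
 * fence is needed.
 */
static inline unsigned
fetchadd_rel(atomic_uint *p, unsigned v)
{
	return (atomic_fetch_add_explicit(p, v, memory_order_release));
}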
Re: svn commit: r358733 - head/sys/sys
On 3/9/20, Konstantin Belousov wrote:
> On Mon, Mar 09, 2020 at 01:56:17AM +0100, Mateusz Guzik wrote:
>> [...]
>> On patched kernel this is:
>> addl   $0x1,(%rdx)
>>
>> which is clearly much nicer.
> I am not sure that the new code on amd64 is much nicer.
>
> But the point was that on non-amd64, i.e. armv8 and potentially on
> powerpc 3.0, the patch substitutes a release store with a full barrier.
> The latter is (much) slower.

If an arch performs something significantly more expensive here it's
probably a performance bug in the port.  But perhaps more importantly,
there are significantly more frequent users of atomic_thread_fence_rel,
like refcount(9).  If the issue is real, that should be looked at.

>> [...]
Re: svn commit: r358733 - head/sys/sys
On Mon, Mar 09, 2020 at 01:56:17AM +0100, Mateusz Guzik wrote:
> On 3/8/20, Mateusz Guzik wrote:
> > Author: mjg
> > Date: Sun Mar  8 00:22:32 2020
> > New Revision: 358733
> > URL: https://svnweb.freebsd.org/changeset/base/358733
> > [...]
>
> [...]
>
> the win is: the previous code forces the compiler to compute an
> incremented value and then store it.  On amd64 this translates to the
> following (with rdx holding the address of the counter):
>
> mov    (%rdx),%eax  /* load the value */
> add    $0x1,%eax    /* add 1 */
> mov    %eax,(%rdx)  /* store it */
>
> On patched kernel this is:
> addl   $0x1,(%rdx)
>
> which is clearly much nicer.
I am not sure that the new code on amd64 is much nicer.

But the point was that on non-amd64, i.e. armv8 and potentially on
powerpc 3.0, the patch substitutes a release store with a full barrier.
The latter is (much) slower.

> [...]
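[To make kib's point concrete: on armv8 a release store can be a single
stlr instruction, while a standalone release fence typically compiles to
a dmb barrier followed by a plain str, which is considerably heavier.  A
C11 sketch of the two shapes; the instruction sequences in the comments
describe typical compiler output, not anything measured in this thread:]

#include <stdatomic.h>

/* Typically compiles on armv8 to: ldr; add; stlr (one release store). */
void
write_end_store_rel(atomic_uint *seqcp)
{
	unsigned v = atomic_load_explicit(seqcp, memory_order_relaxed);

	atomic_store_explicit(seqcp, v + 1, memory_order_release);
}

/*
 * Typically compiles on armv8 to: dmb ish; ldr; add; str -- the dmb is
 * the full(er) barrier that the "(much) slower" remark is about.
 */
void
write_end_fence_rel(atomic_uint *seqcp)
{
	unsigned v;

	atomic_thread_fence(memory_order_release);
	v = atomic_load_explicit(seqcp, memory_order_relaxed);
	atomic_store_explicit(seqcp, v + 1, memory_order_relaxed);
}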
Re: svn commit: r358733 - head/sys/sys
On 3/8/20, Mateusz Guzik wrote:
> Author: mjg
> Date: Sun Mar  8 00:22:32 2020
> New Revision: 358733
> URL: https://svnweb.freebsd.org/changeset/base/358733
>
> Log:
>   seqc: tidy up
>
>   - avoid hand-rolled read
>   - match begin/end in terms of fence style
>

There were off-list questions about this, so let me clarify.

The first bit is a cosmetic change, but the second one is not.

> Modified:
>   head/sys/sys/seqc.h
>
> Modified: head/sys/sys/seqc.h
> ==============================================================================
> --- head/sys/sys/seqc.h	Sat Mar  7 15:37:23 2020	(r358732)
> +++ head/sys/sys/seqc.h	Sun Mar  8 00:22:32 2020	(r358733)
> @@ -66,7 +66,8 @@ static __inline void
>  seqc_write_end(seqc_t *seqcp)
>  {
>
> -	atomic_store_rel_int(seqcp, *seqcp + 1);
> +	atomic_thread_fence_rel();
> +	*seqcp += 1;
>  	MPASS(!seqc_in_modify(*seqcp));
>  	critical_exit();
>  }

For correct operation the counter has to be modified *before* any work is
done and *after* it is completed.

Consider a trivial case:
seqc++;
foo[0] = 0;
foo[1] = 1;
seqc++;

There are 2 ways in which this can be mucked with:
- the compiler can be looking at reordering or reworking the increments
  (e.g., to collapse them to just += 2 instead).  A compiler barrier
  prevents that.
- even with generated machine code which increments in the correct
  places, the CPU can be looking at reordering the operations (which is
  heavily dependent on architecture); in particular it could end up
  issuing seqc++; foo[1] = 1; which would defeat the point.  This is
  where release fences come in.

With the patched code there is:
seqc++;
atomic_thread_fence_rel(); /* make sure seqc++ is issued before foo
starts getting modified */
foo[0] = 0;
foo[1] = 1;
atomic_thread_fence_rel(); /* make sure modifications to foo don't leak
past this spot */
seqc++;

In comparison, the previous code was:
seqc++;
atomic_thread_fence_rel(); /* make sure seqc++ is issued before foo
starts getting modified */
foo[0] = 0;
foo[1] = 1;
atomic_store_rel_int(seqcp, *seqcp + 1); /* make sure modifications to
foo don't leak past this spot and update seqc immediately */

There are 2 differences here -- one is an improvement and the second one
does not matter.

The win: the previous code forces the compiler to compute an incremented
value and then store it.  On amd64 this translates to the following (with
rdx holding the address of the counter):

mov    (%rdx),%eax  /* load the value */
add    $0x1,%eax    /* add 1 */
mov    %eax,(%rdx)  /* store it */

On a patched kernel this is:
addl   $0x1,(%rdx)

which is clearly much nicer.

The not-a-problem: the code does not issue a release fence after
incrementing the counter the second time, meaning that in principle the
update can happen arbitrarily late, possibly disturbing code which waits
for the counter to become even.  This is not a problem because whoever
modifies the counter is supposed to hold a lock and release it shortly
after, which also posts a release fence and consequently makes sure the
counter update is visible.  Also note these routines come with preemption
disablement around the modification -- if the thread gets preempted as a
result of critical_exit from seqc_write_end, it will also publish the
updated counter.  If the updater never does anything which flushes the
store buffer, then in the absolute worst case they will get preempted, at
which point once more we are covered.  Note that code which wants more
deterministic behavior will want to seqc_read_any, test the counter once,
and if it is caught uneven resort to locking.

> @@ -84,7 +85,7 @@ seqc_read(const seqc_t *seqcp)
>  	seqc_t ret;
>
>  	for (;;) {
> -		ret = atomic_load_acq_int(__DECONST(seqc_t *, seqcp));
> +		ret = seqc_read_any(seqcp);
>  		if (__predict_false(seqc_in_modify(ret))) {
>  			cpu_spinwait();
>  			continue;

-- 
Mateusz Guzik
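[Putting the explanation together, a sketch of the write-side pattern
using the seqc(9) API shown in the diff; the obj structure and its lock
are illustrative assumptions, not anything from the tree:]

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/seqc.h>

/*
 * Illustrative object: writers serialize on a lock (not shown), readers
 * go lockless and rely on the sequence counter.
 */
struct obj {
	seqc_t	seqc;
	int	foo[2];
};

static void
obj_update(struct obj *o, int a, int b)
{
	/* Assumption: the caller holds the lock serializing writers. */
	seqc_write_begin(&o->seqc);	/* counter becomes odd; critical_enter() */
	o->foo[0] = a;
	o->foo[1] = b;
	seqc_write_end(&o->seqc);	/* release fence; counter becomes even */
}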
svn commit: r358733 - head/sys/sys
Author: mjg
Date: Sun Mar  8 00:22:32 2020
New Revision: 358733
URL: https://svnweb.freebsd.org/changeset/base/358733

Log:
  seqc: tidy up

  - avoid hand-rolled read
  - match begin/end in terms of fence style

Modified:
  head/sys/sys/seqc.h

Modified: head/sys/sys/seqc.h
==============================================================================
--- head/sys/sys/seqc.h	Sat Mar  7 15:37:23 2020	(r358732)
+++ head/sys/sys/seqc.h	Sun Mar  8 00:22:32 2020	(r358733)
@@ -66,7 +66,8 @@ static __inline void
 seqc_write_end(seqc_t *seqcp)
 {
 
-	atomic_store_rel_int(seqcp, *seqcp + 1);
+	atomic_thread_fence_rel();
+	*seqcp += 1;
 	MPASS(!seqc_in_modify(*seqcp));
 	critical_exit();
 }
@@ -84,7 +85,7 @@ seqc_read(const seqc_t *seqcp)
 	seqc_t ret;
 
 	for (;;) {
-		ret = atomic_load_acq_int(__DECONST(seqc_t *, seqcp));
+		ret = seqc_read_any(seqcp);
 		if (__predict_false(seqc_in_modify(ret))) {
 			cpu_spinwait();
 			continue;
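[For completeness, the matching read side, reusing the illustrative obj
structure from the writer sketch earlier in the thread.  seqc_read()
spins while the counter is odd, and the snapshot is rechecked against the
counter afterwards; the seqc_consistent() helper used for that recheck is
an assumption about the seqc(9) API of this revision:]

static void
obj_snapshot(struct obj *o, int *a, int *b)
{
	seqc_t seqc;

	for (;;) {
		seqc = seqc_read(&o->seqc);	/* acquire; waits out odd values */
		*a = o->foo[0];
		*b = o->foo[1];
		if (seqc_consistent(&o->seqc, seqc))
			break;	/* no writer interleaved; snapshot is coherent */
		/* A writer raced us; retry. */
	}
}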