On 18/03/19 12:29, Peter Maydell wrote:
> In the accessor functions ld*_he_p() and st*_he_p() we use memcpy()
> to perform a load or store to a pointer which might not be aligned
> for the size of the type. We rely on the compiler to optimize this
> memcpy() into an efficient load or store instruction where possible.
> This is required for good performance, but at the moment it is also
> required for correct operation, because some users of these functions
> require that the access is atomic if the pointer is aligned, which
> will only be the case if the compiler has optimized out the memcpy().
> (The particular example where we discovered this is the virtio
> vring_avail_idx() which calls virtio_lduw_phys_cached() which
> eventually ends up calling lduw_he_p().)
>
> Unfortunately some compile environments, such as the fortify-source
> setup used in Alpine Linux, define memcpy() to a wrapper function
> in a way that inhibits this compiler optimization.
>
> The correct long-term fix here is to add a set of functions for
> doing atomic accesses into AddressSpaces (and to other relevant
> families of accessor functions like the virtio_*_phys_cached()
> ones), and make sure that callsites which want atomic behaviour
> use the correct functions.
>
> In the meantime, switch to using __builtin_memcpy() in the
> bswap.h accessor functions. This will make us robust against things
> like this fortify library in the short term. In the longer term
> it will mean that we don't end up with these functions being really
> badly-performing even if the semantics of the out-of-line memcpy()
> are correct.
>
> Reported-by: Fernando Casas Schössow <casasferna...@outlook.com>
> Signed-off-by: Peter Maydell <peter.mayd...@linaro.org>
> ---
>  include/qemu/bswap.h | 26 ++++++++++++++++----------
>  1 file changed, 16 insertions(+), 10 deletions(-)
>
> diff --git a/include/qemu/bswap.h b/include/qemu/bswap.h
> index 5a70f78c0ba..2a9f3fe783e 100644
> --- a/include/qemu/bswap.h
> +++ b/include/qemu/bswap.h
> @@ -316,51 +316,57 @@ static inline void stb_p(void *ptr, uint8_t v)
>      *(uint8_t *)ptr = v;
>  }
>
> -/* Any compiler worth its salt will turn these memcpy into native unaligned
> -   operations. Thus we don't need to play games with packed attributes, or
> -   inline byte-by-byte stores. */
> +/*
> + * Any compiler worth its salt will turn these memcpy into native unaligned
> + * operations. Thus we don't need to play games with packed attributes, or
> + * inline byte-by-byte stores.
> + * Some compilation environments (eg some fortify-source implementations)
> + * may intercept memcpy() in a way that defeats the compiler optimization,
> + * though, so we use __builtin_memcpy() to give ourselves the best chance
> + * of good performance.
> + */
>
>  static inline int lduw_he_p(const void *ptr)
>  {
>      uint16_t r;
> -    memcpy(&r, ptr, sizeof(r));
> +    __builtin_memcpy(&r, ptr, sizeof(r));
>      return r;
>  }
>
>  static inline int ldsw_he_p(const void *ptr)
>  {
>      int16_t r;
> -    memcpy(&r, ptr, sizeof(r));
> +    __builtin_memcpy(&r, ptr, sizeof(r));
>      return r;
>  }
>
>  static inline void stw_he_p(void *ptr, uint16_t v)
>  {
> -    memcpy(ptr, &v, sizeof(v));
> +    __builtin_memcpy(ptr, &v, sizeof(v));
>  }
>
>  static inline int ldl_he_p(const void *ptr)
>  {
>      int32_t r;
> -    memcpy(&r, ptr, sizeof(r));
> +    __builtin_memcpy(&r, ptr, sizeof(r));
>      return r;
>  }
>
>  static inline void stl_he_p(void *ptr, uint32_t v)
>  {
> -    memcpy(ptr, &v, sizeof(v));
> +    __builtin_memcpy(ptr, &v, sizeof(v));
>  }
>
>  static inline uint64_t ldq_he_p(const void *ptr)
>  {
>      uint64_t r;
> -    memcpy(&r, ptr, sizeof(r));
> +    __builtin_memcpy(&r, ptr, sizeof(r));
>      return r;
>  }
>
>  static inline void stq_he_p(void *ptr, uint64_t v)
>  {
> -    memcpy(ptr, &v, sizeof(v));
> +    __builtin_memcpy(ptr, &v, sizeof(v));
>  }
>
>  static inline int lduw_le_p(const void *ptr)
>
Queued, thanks.

Paolo
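
For readers reaching this thread later, the mechanism is easiest to see with a
small sketch. The code below is a hypothetical stand-in, not the actual
Alpine/fortify-headers or QEMU code: checked_memcpy() and the wrapper macro are
invented names that only illustrate why redirecting memcpy() to an out-of-line
function blocks the single-load optimisation, while __builtin_memcpy() is
resolved by the compiler itself and so is unaffected by the macro.

/*
 * Hypothetical sketch of a fortify-style environment. Real
 * implementations differ; the effect on the accessors is similar.
 */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for an out-of-line "checked" copy; noinline keeps it a real call. */
static __attribute__((noinline)) void *checked_memcpy(void *dst,
                                                      const void *src,
                                                      size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--) {
        *d++ = *s++;   /* byte-by-byte, no single-load folding possible */
    }
    return dst;
}

/* The wrapper macro is what defeats the compiler's memcpy() folding. */
#define memcpy(d, s, n) checked_memcpy((d), (s), (n))

/* With the macro active, this stays a function call, not one 16-bit load. */
static inline int lduw_he_p_wrapped(const void *ptr)
{
    uint16_t r;
    memcpy(&r, ptr, sizeof(r));            /* expands to checked_memcpy() */
    return r;
}

/*
 * __builtin_memcpy() is handled by the compiler itself, so the macro does
 * not apply and the copy can fold into a native load as before.
 */
static inline int lduw_he_p_builtin(const void *ptr)
{
    uint16_t r;
    __builtin_memcpy(&r, ptr, sizeof(r));
    return r;
}

int main(void)
{
    uint16_t idx = 0x1234;
    printf("wrapped: 0x%x, builtin: 0x%x\n",
           lduw_he_p_wrapped(&idx), lduw_he_p_builtin(&idx));
    return 0;
}

Built at -O2, the __builtin_memcpy() variant typically compiles to a single
half-word load (and so is atomic on an aligned pointer), whereas the wrapped
variant must go through the out-of-line call, which is both slower and gives
no single-copy atomicity guarantee.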