Re: [PATCH 1/1]: MINOR __builtin_memcpy_inline usage introduction
On Mon, 20 Jun 2022 at 07:27, Willy Tarreau wrote: > > Hi David, > > On Sat, Jun 18, 2022 at 12:52:23PM +0100, David CARLIER wrote: > > From 9d7b6448a2407451c3115b701c51f97ab2bf6a59 Mon Sep 17 00:00:00 2001 > > From: David Carlier > > Date: Sat, 18 Jun 2022 12:41:11 +0100 > > Subject: [PATCH] MINOR: compiler __builtin_memcpy_inline usage introduction. > > > > Optimised version of the existing __builtin_memcpy builtin, differences > > reside by the fact it works only with constant time sizes and does > > generate extra calls. > > At the moment, is clang exclusive even tough GCC does not seem to > > want to implement it, the comments try not to reflect this current > > state. > > Usage can be expanded, used purposely only in few places for starter. > > --- > > include/haproxy/compiler.h | 12 > > src/haproxy.c | 4 ++-- > > src/log.c | 2 +- > > src/proxy.c| 2 +- > > 4 files changed, 16 insertions(+), 4 deletions(-) > > > > diff --git a/include/haproxy/compiler.h b/include/haproxy/compiler.h > > index 66b5f5835..dca9f6aef 100644 > > --- a/include/haproxy/compiler.h > > +++ b/include/haproxy/compiler.h > > @@ -192,6 +192,18 @@ > > #endif > > #endif > > > > +/* > > + * This builtin works only with compile time lengths unlike > > __builtin_memcpy. > > + * Also guarantees no external call is generated. > > + * We could `replicate` the aforementioned constraint with a > > + * _Static_assert/__builtin_constant_p theoretically, but that might be > > + * too much trouble to make it happens (C11 min) > > + * so here we trust the proper usage with other compilers (and the CI > > infrastructure). > > + */ > > +#if !defined(__clang__) || __clang_major__ < 11 > > +#define __builtin_memcpy_inline(x, y, s) memcpy(x, y, s) > > +#endif > > + > > Hmmm please no, this is not the right way to do it for two reasons: > - it will encourage use of __builtin_memcpy_inline() making one believe > it's the expected one when it isn't ; > > - developing on non-clang systems will not report the problem with the > non-constant size that __builtin_memcpy_inline() expects, resulting > in breakage from time to time. > > I'd suggest that instead you create a macro named ha_memcpy_inline() that > maps to __builtin_memcpy_inline() when present and s is a builtin constant, > or maps to __builtin_memcpy() for the rest. Please note, by the way, that > gcc since at least 3.4 has been emitting the same instructions (rep movsb) > as clang's __builtin_memcpy_inline(), but only when optimizing for size. > When optimizing for anything else, it can emit ton of inlined garbage^Wvector > instructions. > > Thus I suspect we could achieve the same result by doing a clever > arrangement of #pragma GCC push_options / #pragma GCC optimize("Os,inline") > around an inline function definition that does the memcpy, and that is > called by the macro. I have not tried, but probably something like this > would do it: > > #pragma GCC push_options > #pragma GCC optimize("Os,inline") > static inline void _ha_memcpy_inline(void *restrict dest, const void > *restrict src, size_t n) > { > __builtin_memcpy(dest, src, n); > } > #pragma GCC pop_options > > #if defined(clang...) > #define ha_memcpy_inline(d,s,n) do { if (__builtin_constant_p(n)) > __builtin_memcpy_inline(d,s,n); else _ha_memcpy_inline(d,s,n); } while (0) > #else > #define ha_memcpy_inline(d,s,n) _ha_memcpy_inline(d,s,n) > #endif > > That's not even compile-tested and I haven't looked at the result, but > you get the idea. Alright thanks here a newer version. > > Now regarding where to use it... I'd say it depends on a lot of factors, > size, speed, undesired dependency on external libs. I think that each and > every call place does warrant an individual commit with a justification > and a check that the benefit matches expectations. There could be some > value in using this in various low-level protocol layers (QUIC, HTX, SPOE). Exactly. > Maybe more, I don't know. Over time its value will display more. > > Thanks, > Willy From 90c951d29c32a345cf38d6c0b43ee904f12b8ac7 Mon Sep 17 00:00:00 2001 From: David Carlier Date: Sat, 18 Jun 2022 12:41:11 +0100 Subject: [PATCH] MINOR: compiler __builtin_memcpy_inline usage introduction. Optimised version of the existing __builtin_memcpy builtin, differences reside by the fact it works only with constant time sizes and does not generate extra calls. At the moment, is clang exclusive even tough GCC does not seem to want to implement it, the comments try not to reflect this current state. Usage can be expanded, used purposely only in few places for starter. --- include/haproxy/compiler.h | 32 src/haproxy.c | 4 ++-- src/log.c | 2 +- src/proxy.c| 2 +- 4 files changed, 36 insertions(+), 4 deletions(-) diff --git a/include/haproxy/compiler.h b/include/haproxy/compiler.h index
Re: [PATCH 1/1]: MINOR __builtin_memcpy_inline usage introduction
Hi David, On Sat, Jun 18, 2022 at 12:52:23PM +0100, David CARLIER wrote: > From 9d7b6448a2407451c3115b701c51f97ab2bf6a59 Mon Sep 17 00:00:00 2001 > From: David Carlier > Date: Sat, 18 Jun 2022 12:41:11 +0100 > Subject: [PATCH] MINOR: compiler __builtin_memcpy_inline usage introduction. > > Optimised version of the existing __builtin_memcpy builtin, differences > reside by the fact it works only with constant time sizes and does > generate extra calls. > At the moment, is clang exclusive even tough GCC does not seem to > want to implement it, the comments try not to reflect this current > state. > Usage can be expanded, used purposely only in few places for starter. > --- > include/haproxy/compiler.h | 12 > src/haproxy.c | 4 ++-- > src/log.c | 2 +- > src/proxy.c| 2 +- > 4 files changed, 16 insertions(+), 4 deletions(-) > > diff --git a/include/haproxy/compiler.h b/include/haproxy/compiler.h > index 66b5f5835..dca9f6aef 100644 > --- a/include/haproxy/compiler.h > +++ b/include/haproxy/compiler.h > @@ -192,6 +192,18 @@ > #endif > #endif > > +/* > + * This builtin works only with compile time lengths unlike __builtin_memcpy. > + * Also guarantees no external call is generated. > + * We could `replicate` the aforementioned constraint with a > + * _Static_assert/__builtin_constant_p theoretically, but that might be > + * too much trouble to make it happens (C11 min) > + * so here we trust the proper usage with other compilers (and the CI > infrastructure). > + */ > +#if !defined(__clang__) || __clang_major__ < 11 > +#define __builtin_memcpy_inline(x, y, s) memcpy(x, y, s) > +#endif > + Hmmm please no, this is not the right way to do it for two reasons: - it will encourage use of __builtin_memcpy_inline() making one believe it's the expected one when it isn't ; - developing on non-clang systems will not report the problem with the non-constant size that __builtin_memcpy_inline() expects, resulting in breakage from time to time. I'd suggest that instead you create a macro named ha_memcpy_inline() that maps to __builtin_memcpy_inline() when present and s is a builtin constant, or maps to __builtin_memcpy() for the rest. Please note, by the way, that gcc since at least 3.4 has been emitting the same instructions (rep movsb) as clang's __builtin_memcpy_inline(), but only when optimizing for size. When optimizing for anything else, it can emit ton of inlined garbage^Wvector instructions. Thus I suspect we could achieve the same result by doing a clever arrangement of #pragma GCC push_options / #pragma GCC optimize("Os,inline") around an inline function definition that does the memcpy, and that is called by the macro. I have not tried, but probably something like this would do it: #pragma GCC push_options #pragma GCC optimize("Os,inline") static inline void _ha_memcpy_inline(void *restrict dest, const void *restrict src, size_t n) { __builtin_memcpy(dest, src, n); } #pragma GCC pop_options #if defined(clang...) #define ha_memcpy_inline(d,s,n) do { if (__builtin_constant_p(n)) __builtin_memcpy_inline(d,s,n); else _ha_memcpy_inline(d,s,n); } while (0) #else #define ha_memcpy_inline(d,s,n) _ha_memcpy_inline(d,s,n) #endif That's not even compile-tested and I haven't looked at the result, but you get the idea. Now regarding where to use it... I'd say it depends on a lot of factors, size, speed, undesired dependency on external libs. I think that each and every call place does warrant an individual commit with a justification and a check that the benefit matches expectations. There could be some value in using this in various low-level protocol layers (QUIC, HTX, SPOE). Maybe more, I don't know. Thanks, Willy
[PATCH 1/1]: MINOR __builtin_memcpy_inline usage introduction
Hi here a little proposal to use this specific clang builtin, only placed in few places but over time can be expanded cautiously. Kindest regards. Thanks. From 9d7b6448a2407451c3115b701c51f97ab2bf6a59 Mon Sep 17 00:00:00 2001 From: David Carlier Date: Sat, 18 Jun 2022 12:41:11 +0100 Subject: [PATCH] MINOR: compiler __builtin_memcpy_inline usage introduction. Optimised version of the existing __builtin_memcpy builtin, differences reside by the fact it works only with constant time sizes and does generate extra calls. At the moment, is clang exclusive even tough GCC does not seem to want to implement it, the comments try not to reflect this current state. Usage can be expanded, used purposely only in few places for starter. --- include/haproxy/compiler.h | 12 src/haproxy.c | 4 ++-- src/log.c | 2 +- src/proxy.c| 2 +- 4 files changed, 16 insertions(+), 4 deletions(-) diff --git a/include/haproxy/compiler.h b/include/haproxy/compiler.h index 66b5f5835..dca9f6aef 100644 --- a/include/haproxy/compiler.h +++ b/include/haproxy/compiler.h @@ -192,6 +192,18 @@ #endif #endif +/* + * This builtin works only with compile time lengths unlike __builtin_memcpy. + * Also guarantees no external call is generated. + * We could `replicate` the aforementioned constraint with a + * _Static_assert/__builtin_constant_p theoretically, but that might be + * too much trouble to make it happens (C11 min) + * so here we trust the proper usage with other compilers (and the CI infrastructure). + */ +#if !defined(__clang__) || __clang_major__ < 11 +#define __builtin_memcpy_inline(x, y, s) memcpy(x, y, s) +#endif + /* Linux-like "container_of". It returns a pointer to the structure of type * which has its member stored at address . */ diff --git a/src/haproxy.c b/src/haproxy.c index 32eb4bdc6..1464d29ee 100644 --- a/src/haproxy.c +++ b/src/haproxy.c @@ -1265,11 +1265,11 @@ static void ha_random_boot(char *const *argv) /* stack address (benefit form operating system's ASLR) */ l = (unsigned long) - memcpy(m, , sizeof(l)); m += sizeof(l); + __builtin_memcpy_inline(m, , sizeof(l)); m += sizeof(l); /* argv address (benefit form operating system's ASLR) */ l = (unsigned long) - memcpy(m, , sizeof(l)); m += sizeof(l); + __builtin_memcpy_inline(m, , sizeof(l)); m += sizeof(l); /* use tv_usec again after all the operations above */ gettimeofday(, NULL); diff --git a/src/log.c b/src/log.c index 304e3cb68..a631b1e46 100644 --- a/src/log.c +++ b/src/log.c @@ -808,7 +808,7 @@ int parse_logsrv(char **args, struct list *logsrvs, int do_del, const char *file } node = malloc(sizeof(*node)); - memcpy(node, logsrv, sizeof(struct logsrv)); + __builtin_memcpy_inline(node, logsrv, sizeof(struct logsrv)); node->ref = logsrv; LIST_INIT(>list); LIST_APPEND(logsrvs, >list); diff --git a/src/proxy.c b/src/proxy.c index c10a8cf83..7d02f8c19 100644 --- a/src/proxy.c +++ b/src/proxy.c @@ -1818,7 +1818,7 @@ static int proxy_defproxy_cpy(struct proxy *curproxy, const struct proxy *defpro memprintf(errmsg, "proxy '%s': out of memory", curproxy->id); return 1; } - memcpy(node, tmplogsrv, sizeof(struct logsrv)); + __builtin_memcpy_inline(node, tmplogsrv, sizeof(struct logsrv)); node->ref = tmplogsrv->ref; LIST_INIT(>list); LIST_APPEND(>logsrvs, >list); -- 2.34.1