On Sun, Dec 08, 2013 at 10:50:15PM +0000, David Holland wrote: > > > I have done it by having the original, non-_r functions provide a > > > thunk for the comparison function, as this is least invasive. If we > > > think this is too expensive, an alternative is generating a union of > > > function pointers and making tests at the call sites; another option > > > is to duplicate the code (hopefully with cpp rather than C&P) but that > > > seems like a bad plan. > > > > I'd prefer to not have another indirect call. The only difference > > is the definition and expanding a CMP macro differently? > > Yes. But I'd rather not duplicate the code...
So, to ground this a bit, the realistic options are: starting from this (various details elided): void sort(void *p, size_t n, size_t sz, int (*cmp)(const void *, const void *)) { ... r = cmp(a, b); ... } (1) thunks [what's in my current patch] void sort_r(void *p, size_t n, size_t sz, int (*cmp)(const void *, const void *, void *), void *data) { ... r = cmp(a, b, data); ... } struct sort { int (*cmp)(const void *, const void *); }; static int thunk(const void *a, const void *b, void *data) { return ((struct sort *)data)->cmp(a, b) } void sort(void *p, size_t n, size_t sz, int (*cmp)(const void *, const void *)) { struct sort x = { .cmp = cmp }; sort_r(p, n, sz, thunk, &x); } pros: + already written + noninvasive + reasonably tidy + does not duplicate the output code cons: - extra indirect calls (2) union of function pointers union sort { int (*cmp)(const void *, const void *); int (*cmp_r)(const void *, const void *, void *); }; static void sort_internal(void *p, size_t n, size_t sz, bool is_r, union sort cmpu, void *data) { ... r = is_r ? cmpu.cmp(a, b) : cmpu.cmp_r(a, b, data); ... } void sort(void *p, size_t n, size_t sz, int (*cmp)(const void *, const void *)) { union sort cmpu = { .cmp = cmp }; sort_internal(p, n, sz, true, cmpu, NULL); } void sort_r(void *p, size_t n, size_t sz, int (*cmp_r)(const void *, const void *, void *), void *data) { union sort cmpu = { .cmp_r = cmp }; sort_internal(p, n, sz, true, cmpu, data); } pros: + not excessively invasive + no indirect calls + does not duplicate the output code + most likely to have unnecessary logic removed by the compiler cons: + a bit ugly (3) cloning the code with cpp --- in sort.c --- #ifdef MAKE_ME_REENTRANT #define SORT sort_r #define EXTRA , void *data #define CMP(a, b) cmp(a, b, data) #else #define SORT sort #define EXTRA #define CMP(a, b) cmp(a, b) #endif void SORT(void *p, size_t n, size_t sz, int (*cmp)(const void *, const void * EXTRA) EXTRA) { ... r = CMP(a, b); ... } --- in sort_r.c --- #define MAKE_ME_REENTRANT #include sort.c pros: + noninvasive + no runtime overhead cons: + ugly + duplicates the output code Of these I think if we're concerned about the overhead of the thunks in #1, #2 is the best bet. While with normal calling conventions sort can end up being identical object code to sort_r, as various people have noted, I think this kind of optimization is best left to the compiler. With #2, the compiler has a fighting chance of being able to do this. (I haven't checked; I'd be moderately, but not greatly, surprised if any of the compilers we have do.) -- David A. Holland dholl...@netbsd.org