Hello everyone,

I was doing some profiling on my two relays running on FreeBSD 13.1 and noticed that they were spending a lot of time in clock_gettime(), which prompted me to have a look at the implementation.

Time implementation
===================

The time implementation is abstracted in src/lib/time/compat_time.c, where different mechanisms are used for different operating systems. On Linux, CLOCK_MONOTONIC_COARSE is a clock that gives worse precision than CLOCK_MONOTONIC but is faster. The abstraction layer checks for its presence and uses this faster, less precise clock where applicable.

On FreeBSD, there is also a fast monotonic time source available, called CLOCK_MONOTONIC_FAST. In the header file src/lib/time/compat_time.h, a comment references this clock, but it is not used.

I thought it might be worth seeing what difference it would make to enable CLOCK_MONOTONIC_FAST on FreeBSD, and on the VM where I run my two FreeBSD relays the difference was stunning. I made a quick patch that simply replaces CLOCK_MONOTONIC_COARSE with CLOCK_MONOTONIC_FAST (see the attached patches), then compiled and tested it. I traced system calls to make sure the correct call was being used, which it was.

Results
=======

This reduced the CPU usage of the patched relay by about 50 % compared to the unpatched relay. I was a bit shocked, so I wrote a small benchmark program and ran it on my VM, giving the following results:

CLOCK_MONOTONIC: 4.776675 s
CLOCK_MONOTONIC_FAST: 0.260002 s

This shows that on my VM CLOCK_MONOTONIC_FAST is about 18 times faster than CLOCK_MONOTONIC (4.776675 / 0.260002 is roughly 18.4).

I have tested on a few different systems, and I think the performance increase of CLOCK_MONOTONIC_FAST is thanks to commit 60b0ad10dd0fc7ff6892ecc7ba3458482fcc064c - "vdso: lower precision of vdso implementation of CLOCK_MONOTONIC_FAST and CLOCK_UPTIME_FAST", which was cherry-picked to 13.1.

Try it yourself and report your results
=======================================

If you want to benchmark your server to see whether switching clocks could benefit you, you can compile and run my attached test program by doing

user>clang bench.c -o bench
user>./bench

In case the program terminates too quickly or too slowly for your liking, adjust

const unsigned long iterations = 1000000;

up or down to change the execution time.

My supplied patches appear to work fine on my system, but they aren't really appropriate for upstream, since a solution that works for both FreeBSD and Linux is needed. If you want to test them and you're building Tor from the ports tree, drop them in /usr/ports/security/tor/files, then build and install.

I'm very interested in seeing some performance data from other people, to judge whether it is worth either pestering some Tor devs to have a look at this or putting in some effort myself to write an upstreamable patch.

Thank you for reading!

Cordially,
Andreas Kempe

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

const unsigned long iterations = 1000000;

/* Time `iterations` calls to clock_gettime(id, ...) and print the
 * elapsed wall time in seconds. Returns 0 on success, 1 on error. */
int run_bench(clockid_t id)
{
    struct timespec tp_start;
    struct timespec tp;
    struct timespec tp_end;

    if (clock_gettime(CLOCK_MONOTONIC_FAST, &tp_start) == -1) {
        perror("clock_gettime");
        return 1;
    }

    for (unsigned long i = 0; i < iterations; i++) {
        if (clock_gettime(id, &tp) == -1) {
            perror("clock_gettime");
            return 1;
        }
    }

    if (clock_gettime(CLOCK_MONOTONIC_FAST, &tp_end) == -1) {
        perror("clock_gettime");
        return 1;
    }

    /* Elapsed time is end minus start, not the last loop sample. */
    printf("%f s\n",
           (double)(tp_end.tv_sec - tp_start.tv_sec) +
           ((double)tp_end.tv_nsec - (double)tp_start.tv_nsec) / 1000000000);

    return 0;
}

int main(void)
{
    printf("CLOCK_MONOTONIC: ");
    if (run_bench(CLOCK_MONOTONIC))
        return 1;

    printf("CLOCK_MONOTONIC_FAST: ");
    if (run_bench(CLOCK_MONOTONIC_FAST))
        return 1;

    return 0;
}

--- src/lib/time/compat_time.c.orig	2022-06-20 22:28:59 UTC
+++ src/lib/time/compat_time.c
@@ -368,27 +368,27 @@ monotime_add_msec(monotime_t *out, const monotime_t *v
 /* end of "__APPLE__" */
 #elif defined(HAVE_CLOCK_GETTIME)
 
-#ifdef CLOCK_MONOTONIC_COARSE
+#ifdef CLOCK_MONOTONIC_FAST
 /**
  * Which clock should we use for coarse-grained monotonic time? By default
- * this is CLOCK_MONOTONIC_COARSE, but it might not work -- for example,
+ * this is CLOCK_MONOTONIC_FAST, but it might not work -- for example,
  * if we're compiled with newer Linux headers and then we try to run on
  * an old Linux kernel. In that case, we will fall back to CLOCK_MONOTONIC.
  */
-static int clock_monotonic_coarse = CLOCK_MONOTONIC_COARSE;
-#endif /* defined(CLOCK_MONOTONIC_COARSE) */
+static int clock_monotonic_coarse = CLOCK_MONOTONIC_FAST;
+#endif /* defined(CLOCK_MONOTONIC_FAST) */
 
 static void
 monotime_init_internal(void)
 {
-#ifdef CLOCK_MONOTONIC_COARSE
+#ifdef CLOCK_MONOTONIC_FAST
   struct timespec ts;
-  if (clock_gettime(CLOCK_MONOTONIC_COARSE, &ts) < 0) {
-    log_info(LD_GENERAL, "CLOCK_MONOTONIC_COARSE isn't working (%s); "
+  if (clock_gettime(CLOCK_MONOTONIC_FAST, &ts) < 0) {
+    log_info(LD_GENERAL, "CLOCK_MONOTONIC_FAST isn't working (%s); "
              "falling back to CLOCK_MONOTONIC.", strerror(errno));
     clock_monotonic_coarse = CLOCK_MONOTONIC;
   }
-#endif /* defined(CLOCK_MONOTONIC_COARSE) */
+#endif /* defined(CLOCK_MONOTONIC_FAST) */
 }
 
 void
@@ -405,7 +405,7 @@ monotime_get(monotime_t *out)
   tor_assert(r == 0);
 }
 
-#ifdef CLOCK_MONOTONIC_COARSE
+#ifdef CLOCK_MONOTONIC_FAST
 void
 monotime_coarse_get(monotime_coarse_t *out)
 {
@@ -419,7 +419,7 @@ monotime_coarse_get(monotime_coarse_t *out)
   int r = clock_gettime(clock_monotonic_coarse, &out->ts_);
   if (PREDICT_UNLIKELY(r < 0) &&
       errno == EINVAL &&
-      clock_monotonic_coarse == CLOCK_MONOTONIC_COARSE) {
+      clock_monotonic_coarse == CLOCK_MONOTONIC_FAST) {
     /* We should have caught this at startup in monotime_init_internal!
      */
     log_warn(LD_BUG, "Falling back to non-coarse monotonic time %s initial "
@@ -430,7 +430,7 @@ monotime_coarse_get(monotime_coarse_t *out)
 
   tor_assert(r == 0);
 }
-#endif /* defined(CLOCK_MONOTONIC_COARSE) */
+#endif /* defined(CLOCK_MONOTONIC_FAST) */
 
 int64_t
 monotime_diff_nsec(const monotime_t *start,

--- src/lib/time/compat_time.h.orig	2022-06-20 22:43:26 UTC
+++ src/lib/time/compat_time.h
@@ -172,7 +172,7 @@ typedef struct monotime_t {
 #endif /* defined(__APPLE__) || ... */
 } monotime_t;
 
-#if defined(CLOCK_MONOTONIC_COARSE) && \
+#if defined(CLOCK_MONOTONIC_FAST) && \
   defined(HAVE_CLOCK_GETTIME)
 #define MONOTIME_COARSE_FN_IS_DIFFERENT
 #define monotime_coarse_t monotime_t
@@ -188,7 +188,7 @@ typedef struct monotime_coarse_t {
 #define monotime_coarse_t monotime_t
 #else
 #define monotime_coarse_t monotime_t
-#endif /* defined(CLOCK_MONOTONIC_COARSE) && ... || ... */
+#endif /* defined(CLOCK_MONOTONIC_FAST) && ... || ... */
 
 /**
  * Initialize the timing subsystem. This function is idempotent.

_______________________________________________
tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays