Btw, it is probably wise to use the syscall() function just so you are always sure you are testing system call details rather than libc details.
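
For illustration, here is a minimal sketch of a timing loop that goes through syscall() instead of the libc wrapper. The function name ns_per_raw_getpid and the choice of CLOCK_MONOTONIC are my own assumptions for the example, not anything taken from an existing test. It assumes Linux with glibc.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>   /* __NR_getpid */
#include <time.h>          /* clock_gettime, CLOCK_MONOTONIC */
#include <unistd.h>        /* syscall */

/*
 * Hypothetical helper: average nanoseconds per raw getpid system call,
 * measured over `iters` iterations. Calling syscall() directly means
 * the kernel is entered on every iteration, whatever the libc wrapper
 * for getpid() happens to do.
 */
double ns_per_raw_getpid(long iters)
{
    struct timespec start, end;
    long i;

    assert(iters > 0);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < iters; i++)
        syscall(__NR_getpid);
    clock_gettime(CLOCK_MONOTONIC, &end);

    /* Elapsed time in nanoseconds, divided by the iteration count. */
    return ((end.tv_sec - start.tv_sec) * 1e9 +
            (end.tv_nsec - start.tv_nsec)) / (double)iters;
}
```

Call it from main() with a large iteration count (a few million, say), once with no probes active and once with each probe type loaded, then compare the per-call figures.
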
The standard microbenchmark is syscall(__NR_getpid). That is the minimal system call, whereas close takes locks and so forth, so it brings more issues into the test than the one you are looking at.

The microbenchmark makes that comparison seem more sensible than it really is. The two are really apples and oranges. The TIF_SYSCALL_TRACE types (process.syscall) add some overhead to every system call. The probe types (kprobe/tracepoint/marker) add overhead only to the probed call.

In real situations, many different syscalls will be made. In tracing scenarios where you are probing only a few individual ones (especially if they are not the cheapest or most frequent), the distribution of overheads will be quite different.

Thanks,
Roland