https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79086
Bug ID: 79086 Summary: Localtime_r does a spinlock unnecessarily killing performance when run millions of times Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: dgotwisn at netscout dot com Target Milestone: --- gcc (GCC) 4.8.1 Copyright (C) 2013 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Linux nfw-devperf-01.newwireless.com 2.6.32-431.23.3.el6.x86_64 #1 SMP Thu Jul 31 17:20:51 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Libc version libc-2.12.so Running perf top to look at what's going on in my app shows 7% of CPU in _tz_convert and 5% in spin lock, with no locks being done in my code. This means that we are spending almost as much time in spin lock (without ever waiting on it) as we are in the rest of _tz_convert. The code in question is a single threaded tight loop iterating (c++11 for (auto &it: mymap)) a std::multimap with a significant number of lines of code, only a few of which call into libc (or any other library). What goes into the libraries are array accessors and iterator accessors to the std::multimap (which has no locks) and the following code block: struct tm t; localtime_r(&nowTime.tv_sec, &t). Nowtime is a struct timeval (previously filled by gettimeofday() calls by another program run days before). This is called once per record (my test case has We also call once per record, there are 157,669,045 records (for my small test sample)). For every record, we also call fwrite_unlocked(), which according to the manual page doesn't have a lock (and by the name). In looking inside localtime_r via http://code.metager.de/source/xref/gnu/glibc/time/localtime.c, the code is as follows: struct tm * localtime (const time_t *t) { return __tz_convert (t, 1, &_tmbuf); } libc_hidden_def (localtime) Drilling down into __tz_convert (website http://osxr.org:8080/glibc/source/time/tzset.c?v=glibc-2.18), we have the following block of code: __libc_lock_lock (tzset_lock); /* Update internal database according to current TZ setting. POSIX.1 8.3.7.2 says that localtime_r is not required to set tzname. This is a good idea since this allows at least a bit more parallelism. */ tzset_internal (tp == &_tmbuf && use_localtime, 1); followed by much more code that access tp and calls various other _tz_ routines. The correct solution should only be to lock if tp == &tmbuf (or the equivalent for gmtime vs gmtime_r). The lock prevents the parallelism mentioned in the comment. To test this, create a routine that iterates for a long time calling localtime_r for a given time_t value (it doesn't matter what the value is, as long as it's valid). Then view with "perf top -p <pid>" of the program. Run this with permissions so you can see into the kernel. A routine that uses a user-provided buffer and has no shared resources should never need a lock. If "spin lock" shows up, the bug exists. If it doesn't show up, the bug was fixed.