https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79086

            Bug ID: 79086
           Summary: Localtime_r does a spinlock unnecessarily killing
                    performance when run millions of times
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dgotwisn at netscout dot com
  Target Milestone: ---

gcc (GCC) 4.8.1
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Linux nfw-devperf-01.newwireless.com 2.6.32-431.23.3.el6.x86_64 #1 SMP Thu Jul
31 17:20:51 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Libc version libc-2.12.so

Running perf top to look at what's going on in my app shows 7% of CPU in
_tz_convert and 5% in spin lock, with no locks being done in my code.  This
means that we are spending almost as much time in spin lock (without ever
waiting on it) as we are in the rest of _tz_convert.

The code in question is a single threaded tight loop iterating (c++11 for (auto
&it: mymap)) a std::multimap with a significant number of lines of code, only a
few of which call into libc (or any other library).  What goes into the
libraries are array accessors and iterator accessors to the std::multimap
(which has no locks) and the following code block:

    struct tm t;
    localtime_r(&nowTime.tv_sec, &t).

Nowtime is a struct timeval (previously filled by gettimeofday() calls by
another program run days before).

This is called once per record (my test case has We also call once per record,
there are 157,669,045 records (for my small test sample)).

For every record, we also call fwrite_unlocked(), which according to the manual
page doesn't have a lock (and by the name).

In looking inside localtime_r via
http://code.metager.de/source/xref/gnu/glibc/time/localtime.c, the code is as
follows:

struct tm *
localtime (const time_t *t)
{
  return __tz_convert (t, 1, &_tmbuf);
}
libc_hidden_def (localtime)

Drilling down into __tz_convert (website
http://osxr.org:8080/glibc/source/time/tzset.c?v=glibc-2.18), we have the
following block of code:


   __libc_lock_lock (tzset_lock);

   /* Update internal database according to current TZ setting.
      POSIX.1 8.3.7.2 says that localtime_r is not required to set tzname.
      This is a good idea since this allows at least a bit more parallelism. 
*/
   tzset_internal (tp == &_tmbuf && use_localtime, 1);

followed by much more code that access tp and calls various other _tz_
routines.

The correct solution should only be to lock if tp == &tmbuf (or the equivalent
for gmtime vs gmtime_r).

The lock prevents the parallelism mentioned in the comment.

To test this, create a routine that iterates for a long time calling
localtime_r for a given time_t value (it doesn't matter what the value is, as
long as it's valid).  Then view with "perf top -p <pid>" of the program.  Run
this with permissions so you can see into the kernel.

A routine that uses a user-provided buffer and has no shared resources should
never need a lock.  If "spin lock" shows up, the bug exists.  If it doesn't
show up, the bug was fixed.

Reply via email to