Re: [RFC/PATCH v4 1/3] add high resolution timer function to debug performance issues

2014-05-21 Thread Karsten Blees
On 22.05.2014 00:14, Richard Hansen wrote:
> On 2014-05-20 15:11, Karsten Blees wrote:
>> Add a getnanotime() function that returns nanoseconds since 01/01/1970 as
>> unsigned 64-bit integer (i.e. overflows in july 2554).
> 
> Must it be relative to epoch?  If it was relative to system boot (like
> the NetBSD kernel's nanouptime() function), then you wouldn't have to
> worry about clock adjustments messing with performance stats and you
> might have more options for implementing getnanotime() on various platforms.
> 
> -Richard
> 

Normalizing to the epoch makes it possible to use the same timestamps (divided
by 1e9) with other time-related functions (e.g. gmtime(), ctime() etc.), at very
little cost (one 64-bit integer addition per call).
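
As a rough illustration of that point (not part of the patch), an epoch-based
nanosecond value can be split into whole seconds and a fraction and handed
straight to gmtime(). The helper name format_nanotime() is made up for this
sketch:

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper: format nanoseconds since the epoch as UTC
 * "YYYY-MM-DD HH:MM:SS.nnnnnnnnn". Returns the snprintf() result,
 * or -1 if the time could not be converted.
 */
static int format_nanotime(uint64_t ns, char *buf, size_t len)
{
	time_t secs = (time_t)(ns / 1000000000);
	unsigned long frac = (unsigned long)(ns % 1000000000);
	struct tm *tm = gmtime(&secs);
	char date[32];

	if (!tm || !strftime(date, sizeof(date), "%Y-%m-%d %H:%M:%S", tm))
		return -1;
	return snprintf(buf, len, "%s.%09lu", date, frac);
}
```

For example, format_nanotime(1400000000123456789ULL, buf, sizeof(buf)) fills
buf with "2014-05-13 16:53:20.123456789".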

The getnanotime() implementation is actually platform independent and can be
backed by any time source that returns nanoseconds relative to anything.
getnanotime() is synced to the system clock only once (on the first call), so
if your time source is monotonic (which I think NetBSD's nanouptime() is), you
don't have to worry about clock adjustments.
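
To make the "synced only once" idea concrete, here is a minimal sketch with a
pluggable time source. It is not the code from the patch: synced_nanotime(),
nanos_source_fn and the two clock helpers are hypothetical names, and the
sketch assumes that POSIX clock_gettime() is available and that only one
source is used per process.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical time source: nanoseconds relative to an arbitrary point. */
typedef uint64_t (*nanos_source_fn)(void);

/* Monotonic nanoseconds (reference point unspecified), 0 on error. */
static uint64_t monotonic_nanos(void)
{
	struct timespec ts;
	if (clock_gettime(CLOCK_MONOTONIC, &ts))
		return 0;
	return (uint64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Wall clock nanoseconds since the epoch, 0 on error. */
static uint64_t realtime_nanos(void)
{
	struct timespec ts;
	if (clock_gettime(CLOCK_REALTIME, &ts))
		return 0;
	return (uint64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/*
 * Epoch-normalized time: read the wall clock once to compute an offset,
 * then rely on the given source alone. Later wall clock adjustments do
 * not affect the result. Error handling is omitted for brevity.
 */
static uint64_t synced_nanotime(nanos_source_fn source)
{
	static uint64_t offset;
	static int synced;

	if (!synced) {
		offset = realtime_nanos() - source();
		synced = 1;
	}
	return offset + source();
}
```

Two consecutive calls such as synced_nanotime(monotonic_nanos) never go
backwards as long as the source is monotonic, and their difference is
independent of the offset.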

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH v4 1/3] add high resolution timer function to debug performance issues

2014-05-21 Thread Richard Hansen
On 2014-05-21 18:14, Richard Hansen wrote:
> On 2014-05-20 15:11, Karsten Blees wrote:
>> Add a getnanotime() function that returns nanoseconds since 01/01/1970 as
>> unsigned 64-bit integer (i.e. overflows in july 2554).
> 
> Must it be relative to epoch?  If it was relative to system boot (like
> the NetBSD kernel's nanouptime() function),

or relative to some other arbitrary reference point

> then you wouldn't have to
> worry about clock adjustments messing with performance stats and you
> might have more options for implementing getnanotime() on various platforms.
> 
> -Richard


Re: [RFC/PATCH v4 1/3] add high resolution timer function to debug performance issues

2014-05-21 Thread Richard Hansen
On 2014-05-20 15:11, Karsten Blees wrote:
> Add a getnanotime() function that returns nanoseconds since 01/01/1970 as
> unsigned 64-bit integer (i.e. overflows in july 2554).

Must it be relative to epoch?  If it was relative to system boot (like
the NetBSD kernel's nanouptime() function), then you wouldn't have to
worry about clock adjustments messing with performance stats and you
might have more options for implementing getnanotime() on various platforms.

-Richard


Re: [RFC/PATCH v4 1/3] add high resolution timer function to debug performance issues

2014-05-21 Thread Karsten Blees
On 21.05.2014 09:31, Noel Grandin wrote:
> On 2014-05-20 21:11, Karsten Blees wrote:
>>   * implement Mac OSX version using mach_absolute_time
>>
>>
> 
> 
> Note that unlike the Windows and Linux APIs, mach_absolute_time does not do 
> correction for frequency-scaling

I don't have a Mac so I can't test any of this, but supposedly
mach_timebase_info() returns the time base of mach_absolute_time() (a
numerator/denominator pair for converting ticks to nanoseconds), so you
could do frequency scaling similar to what I do for Windows with
QueryPerformanceFrequency().
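
A sketch of how that scaling might look, under the same caveat that the Mac
part is untested: mach_timebase_info() fills in a numerator/denominator pair,
and ticks * numer / denom gives nanoseconds. ticks_to_nanos() is a name made
up here, and the highres_nanos() variant is only an assumption about how it
could plug into the patch.

```c
#include <stdint.h>

/*
 * Convert ticks to nanoseconds using a numer/denom time base. The
 * division is split so that the intermediate product does not overflow
 * as long as the final result fits in 64 bits.
 */
static uint64_t ticks_to_nanos(uint64_t ticks, uint32_t numer, uint32_t denom)
{
	uint64_t whole = ticks / denom;
	uint64_t rest = ticks % denom;
	return whole * numer + rest * numer / denom;
}

#ifdef __APPLE__
#include <mach/mach_time.h>

/* Untested sketch of a Mac OS X highres_nanos(); returns 0 on failure. */
static uint64_t highres_nanos(void)
{
	static mach_timebase_info_data_t tb;

	if (!tb.denom && mach_timebase_info(&tb) != KERN_SUCCESS)
		return 0;
	return ticks_to_nanos(mach_absolute_time(), tb.numer, tb.denom);
}
#endif
```

With a time base of 125/3 (i.e. a 24 MHz counter), 24000000 ticks convert to
exactly 1000000000 ns.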

> and cross-CPU synchronization with the TSC.
> 

The TSC is synchronized across cores and sockets on modern x86 hardware [1] (at 
least since Intel Nehalem, i.e. all Core i[357] processors). On older machines, 
I would expect the OS API to choose a more appropriate time source, e.g. the 
HPET. I'm not proposing to use asm("rdtsc") or anything like that...

[1] 
https://software.intel.com/en-us/articles/best-timing-function-for-measuring-ipp-api-timing



Re: [RFC/PATCH v4 1/3] add high resolution timer function to debug performance issues

2014-05-21 Thread Noel Grandin

On 2014-05-20 21:11, Karsten Blees wrote:
>   * implement Mac OSX version using mach_absolute_time

Note that unlike the Windows and Linux APIs, mach_absolute_time does not do correction for frequency-scaling and 
cross-CPU synchronization with the TSC.


I'm not aware of anything else that you could use on MacOS, so your best bet is probably just to use mach_absolute_time 
and document its shortcomings.


Regards, Noel.





[RFC/PATCH v4 1/3] add high resolution timer function to debug performance issues

2014-05-20 Thread Karsten Blees
Add a getnanotime() function that returns nanoseconds since 01/01/1970 as
unsigned 64-bit integer (i.e. overflows in July 2554). This is easier to
work with than e.g. struct timeval or struct timespec.

The implementation uses gettimeofday() by default and supports high-precision
time sources on the following platforms:
 * Linux: using clock_gettime(CLOCK_MONOTONIC)
 * Windows: using QueryPerformanceCounter()

Todo:
 * enable clock_gettime() on more platforms
 * implement Mac OSX version using mach_absolute_time

Signed-off-by: Karsten Blees 
---
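A quick check of the "overflows in July 2554" figure from the commit message:
an unsigned 64-bit nanosecond counter covers roughly 584.55 years. The helper
name below is hypothetical, and the year length is assumed to be the Gregorian
average of 365.2425 days.

```c
#include <stdint.h>

/* Years until an unsigned 64-bit nanosecond counter wraps around. */
static double nanos_range_years(void)
{
	/* assumed average Gregorian year: 365.2425 days of 86400 seconds */
	const double ns_per_year = 365.2425 * 86400.0 * 1e9;
	return (double)UINT64_MAX / ns_per_year;
}
```

1970 plus about 584.55 years lands a little past the middle of 2554.
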
 Makefile |  7 +
 cache.h  |  1 +
 config.mak.uname |  1 +
 trace.c  | 82 
 4 files changed, 91 insertions(+)

diff --git a/Makefile b/Makefile
index a53f3a8..3c05f8c 100644
--- a/Makefile
+++ b/Makefile
@@ -341,6 +341,8 @@ all::
 #
 # Define GMTIME_UNRELIABLE_ERRORS if your gmtime() function does not
 # return NULL when it receives a bogus time_t.
+#
+# Define HAVE_CLOCK_GETTIME if your platform has clock_gettime in librt.
 
 GIT-VERSION-FILE: FORCE
    @$(SHELL_PATH) ./GIT-VERSION-GEN
@@ -1497,6 +1499,11 @@ ifdef GMTIME_UNRELIABLE_ERRORS
    BASIC_CFLAGS += -DGMTIME_UNRELIABLE_ERRORS
 endif
 
+ifdef HAVE_CLOCK_GETTIME
+   BASIC_CFLAGS += -DHAVE_CLOCK_GETTIME
+   EXTLIBS += -lrt
+endif
+
 ifeq ($(TCLTK_PATH),)
 NO_TCLTK = NoThanks
 endif
diff --git a/cache.h b/cache.h
index 107ac61..48fc616 100644
--- a/cache.h
+++ b/cache.h
@@ -1362,6 +1362,7 @@ extern int trace_want(const char *key);
 __attribute__((format (printf, 2, 3)))
 extern void trace_printf_key(const char *key, const char *fmt, ...);
 extern void trace_strbuf(const char *key, const struct strbuf *buf);
+extern uint64_t getnanotime(void);
 
 void packet_trace_identity(const char *prog);
 
diff --git a/config.mak.uname b/config.mak.uname
index 23a8803..5e3b1dd 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -33,6 +33,7 @@ ifeq ($(uname_S),Linux)
    HAVE_PATHS_H = YesPlease
    LIBC_CONTAINS_LIBINTL = YesPlease
    HAVE_DEV_TTY = YesPlease
+   HAVE_CLOCK_GETTIME = YesPlease
 endif
 ifeq ($(uname_S),GNU/kFreeBSD)
    NO_STRLCPY = YesPlease
diff --git a/trace.c b/trace.c
index 08180a9..3d72084 100644
--- a/trace.c
+++ b/trace.c
@@ -187,3 +187,85 @@ int trace_want(const char *key)
        return 0;
    return 1;
 }
+
+#ifdef HAVE_CLOCK_GETTIME
+
+static inline uint64_t highres_nanos(void)
+{
+   struct timespec ts;
+   if (clock_gettime(CLOCK_MONOTONIC, &ts))
+       return 0;
+   return (uint64_t) ts.tv_sec * 1000000000 + ts.tv_nsec;
+}
+
+#elif defined (GIT_WINDOWS_NATIVE)
+
+static inline uint64_t highres_nanos(void)
+{
+   static uint64_t high_ns, scaled_low_ns;
+   static int scale;
+   LARGE_INTEGER cnt;
+
+   if (!scale) {
+       if (!QueryPerformanceFrequency(&cnt))
+           return 0;
+
+       /* high_ns = number of ns per cnt.HighPart */
+       high_ns = (1000000000LL << 32) / (uint64_t) cnt.QuadPart;
+
+       /*
+        * Number of ns per cnt.LowPart is 10^9 / frequency (or
+        * high_ns >> 32). For maximum precision, we scale this factor
+        * so that it just fits within 32 bit (i.e. won't overflow if
+        * multiplied with cnt.LowPart).
+        */
+       scaled_low_ns = high_ns;
+       scale = 32;
+       while (scaled_low_ns >= 0x100000000LL) {
+           scaled_low_ns >>= 1;
+           scale--;
+       }
+   }
+
+   /* if QPF worked on initialization, we expect QPC to work as well */
+   QueryPerformanceCounter(&cnt);
+
+   return (high_ns * cnt.HighPart) +
+          ((scaled_low_ns * cnt.LowPart) >> scale);
+}
+
+#else
+# define highres_nanos() 0
+#endif
+
+static inline uint64_t gettimeofday_nanos(void)
+{
+   struct timeval tv;
+   gettimeofday(&tv, NULL);
+   return (uint64_t) tv.tv_sec * 1000000000 + tv.tv_usec * 1000;
+}
+
+/*
+ * Returns nanoseconds since the epoch (01/01/1970), for performance tracing
+ * (i.e. favoring high precision over wall clock time accuracy).
+ */
+inline uint64_t getnanotime(void)
+{
+   static uint64_t offset;
+   if (offset > 1) {
+       /* initialization succeeded, return offset + high res time */
+       return offset + highres_nanos();
+   } else if (offset == 1) {
+       /* initialization failed, fall back to gettimeofday */
+       return gettimeofday_nanos();
+   } else {
+       /* initialize offset if high resolution timer works */
+       uint64_t now = gettimeofday_nanos();
+       uint64_t highres = highres_nanos();
+       if (highres)
+           offset = now - highres;
+       else
+           offset = 1;
+       return now;
+   }
+}
-- 
1.9.2.msysgit.0.493