On 01/24/2013 10:08 AM, John Stultz wrote:
On 01/24/2013 07:00 AM, Romain Francoise wrote:
Willy Tarreau <[email protected]> writes:

What is your workload?
There are two separate workloads that are affected by this issue:
- one is a pretty typical whole-distribution build, which gets 1.5x to 2x
   slower with v2.6.32.60
- the other is a code simulator, which is basically a bunch of processes
   communicating together using POSIX message queues (mq_timedsend() and
   mq_timedreceive()). It runs approximately 10x slower, but doesn't seem
   to use more CPU, it just sits there doing pretty much nothing.

I managed to narrow the second workload down to a single test case, which runs in approximately 0.7s in v2.6.32.59 and 8s (!) in v2.6.32.60. Having a simple test case allowed me to do a full bisection in a test VM, and it
ends on commit 61b76840ddee647c0c223365378c3f394355b7d7 ("time: Avoid
making adjustments if we haven't accumulated anything").

And indeed, if I revert this commit from v2.6.32.60 then everything is
back to normal.

Huh. That's curious.
Especially since "time: Avoid making adjustments if we haven't accumulated anything" should really be shortcutting unnecessary work.

Do you see any similar regression behavior between 3.5 and 3.6 (when the same fix landed upstream)?

If not, I suspect there's some extra detail w/ xtime_cache and the non-logarithmic accumulation that we did prior to 2.6.33, that I missed.

Yep, that's the issue. As soon as I hit send it made sense.

Basically, with kernels from 2.6.20-something through 2.6.32, we accumulate time in half-second chunks, rather than every timer-tick. This was added because when NOHZ landed, if you were idle for a few seconds, you had to spin for every tick we skipped in the accumulation loop, which created some bad latencies.

However, this required that we create xtime_cache, which is still updated each tick, so that filesystem timestamps, etc. continue to see time increment normally.

Of course, the xtime_cache is updated at the bottom of update_wall_time(). So the early return on (offset < timekeeper.cycle_interval) causes the xtime_cache to never be updated. Then in hrtimer_get_softirq_time(), we pull current_kernel_time(), which uses the stale xtime_cache, and that likely causes timers to fire with half-second granularity.

So I think reverting 61b76840ddee647c0c223365378c3f394355b7d7 is the right call, since it isn't really appropriate for 2.6.32.

Although the issue it addresses (inconsistencies in the _COARSE clockids) would return, those had been present for quite some time without complaint. So another possible solution would be the following (fair warning, totally untested). Mind giving it a try to see if it resolves the issue?

If it looks good to you I'll put it through some of my own tests as well.

thanks
-john


diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 3d35af3..f65a0fb 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -809,7 +809,7 @@ void update_wall_time(void)
 #endif
        /* Check if there's really nothing to do */
        if (offset < timekeeper.cycle_interval)
-               return;
+               goto out;
 
        timekeeper.xtime_nsec = (s64)xtime.tv_nsec << timekeeper.shift;
@@ -881,6 +881,7 @@ void update_wall_time(void)
        timekeeper.ntp_error += timekeeper.xtime_nsec <<
                                timekeeper.ntp_error_shift;
 
+out:
        nsecs = clocksource_cyc2ns(offset, timekeeper.mult, timekeeper.shift);
        update_xtime_cache(nsecs);

--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
