On Fri, Jan 6, 2012 at 8:00 AM, Akemi Yagi <[email protected]> wrote:

> Hi,
>
> Is there anyone who has/had SL-6 machines running > 200 days?
>
> There is a kernel bug that causes a system crash when the uptime goes
> over 208.5 days. This was noted by a Scientific Linux user on the SL
> Japanese mailing list [1].
>
> According to available info, the patch [2] is now in kernel 3.1.5.
> RHEL/SL 6 is affected in the sense that the buggy code is there. SL 6
> has been out long enough to see this bug in action, so I wondered
> whether someone has already encountered a crash. I searched TUV's
> bugzilla but have not been able to find one that looks related.
No wonder I did not see it; it is private. :(

Here's a copy of the reply from a Red Hat engineer to my post on the
RHEL-6 mailing list:

From: Robin Price II <rprice redhat com>
Date: Fri, 06 Jan 2012 11:55:08 -0500

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=765720

This is private due to private information from customer use cases. If
you need further details, I would highly encourage you to contact Red
Hat support or your TAM.

Here is the initial information opened in the BZ:

"The following patch is an urgent fix in Linus' branch that avoids an
unnecessary overflow in sched_clock; without it, the kernel will crash
after 209~250 days.

http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=patch;h=4cecf6d401a01d054afc1e5f605bcbfe553cb9b9

After hundreds of days of uptime, the __cycles_2_ns calculation in
sched_clock overflows: cyc * per_cpu(cyc2ns, cpu) exceeds 64 bits,
causing the final value to become zero. We can solve this without
losing any precision: decompose the TSC value into the quotient and
remainder of its division by the scale factor, and then use both parts
to convert TSC into nanoseconds."

~rp
