On Fri, Jan 6, 2012 at 8:00 AM, Akemi Yagi <[email protected]> wrote:

> Hi,
>
> Is there anyone who has/had SL-6 machines running > 200 days?
>
> There is a kernel bug that causes a system crash when the uptime goes
> over 208.5 days. This was noted by a Scientific Linux user on the SL
> Japanese mailing list [1].
>
> According to available info, the patch [2] is now in kernel 3.1.5.
> RHEL/SL 6 is affected in the sense that the buggy code is there. SL 6
> has been out long enough to see this bug in action, so I wondered
> whether someone has already encountered a crash. I searched TUV's
> bugzilla but have not been able to find one that looks related.
No wonder I did not see it; it is private. :(

Here's a copy of the reply from a Red Hat engineer to my post on the
RHEL-6 mailing list:

From: Robin Price II <rprice redhat com>
Date: Fri, 06 Jan 2012 11:55:08 -0500

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=765720

This is private due to private information from customer use cases. If
you need further details, I would highly encourage you to contact Red
Hat support or your TAM.

Here is the initial information opened in the BZ:

"The following patch is an urgent fix in Linus' branch that avoids an
unnecessary overflow in sched_clock; without it, the kernel will crash
after 209~250 days.

http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=patch;h=4cecf6d401a01d054afc1e5f605bcbfe553cb9b9

After hundreds of days of uptime, the __cycles_2_ns calculation in
sched_clock overflows: cyc * per_cpu(cyc2ns, cpu) exceeds 64 bits,
causing the final value to become zero. We can solve this without
losing any precision: decompose the TSC value into the quotient and
remainder of its division by the scale factor, and then use both parts
to convert TSC into nanoseconds."

~rp
