** Description changed:
- To be elaborated soon.
+ [Impact]
+ * We recently received a report of TSC refinement occasionally being
skipped across multiple reboots of a server with an Intel Skylake-based
processor. This was only reproducible on Bionic with kernels older than 5.0.
+
+ * After checking kernel commits, we came up with 2 commits that largely
+ improve the situation: a786ef152cdc ("x86/tsc: Make calibration
+ refinement more robust") [git.kernel.org/linus/a786ef152cdc] and
+ 604dc9170f24 ("x86/tsc: Use CPUID.0x16 to calculate missing crystal
+ frequency") [git.kernel.org/linus/604dc9170f24]. We hereby request SRU
+ for both of them.
+
+ * The first commit improves some comments and adjusts an offset to match
more recent (faster) machines, but the important part is a retry mechanism in
the TSC refinement, used when the refinement fails due to some disturbance
during the TSC read (such as NMIs/SMIs).
+
+ * The second commit improves TSC calibration for Skylake (and some other
models) by querying CPUID leaf 0x16 to compute the crystal frequency instead
of relying on hardcoded, table-based values.
+
+ * A note for Xenial (kernel 4.4): the second patch would require the
+ inclusion of more commits, so given the "maturity" of this release (and
+ the fact that kernel 4.15 is available as an HWE kernel for Xenial), I've
+ kept it out of Xenial, backporting only the first and more important
+ patch to 4.4.
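The retry idea described for the first commit can be sketched as follows. This is a simplified user-space model, not the kernel code: the names `read_refs`, `refine`, `threshold` and the `max_attempts` cap are assumptions made for illustration, and the reference clock stands in for whatever timer (HPET or ACPI PM timer) the kernel would use.

```python
ULLONG_MAX = 2**64 - 1  # sentinel meaning "no undisturbed reading obtained"


def read_refs(read_tsc, read_ref, threshold, max_retries=5):
    """Read the reference clock bracketed by two TSC reads.

    A reading counts as clean only if the two TSC reads are closer than
    `threshold` cycles, i.e. nothing (NMI, SMI, ...) interrupted the sequence.
    """
    for _ in range(max_retries):
        t1 = read_tsc()
        ref = read_ref()
        t2 = read_tsc()
        if t2 - t1 < threshold:
            return t2, ref
    return ULLONG_MAX, None


def refine(read_tsc, read_ref, threshold, wait, max_attempts=3):
    """Return TSC cycles per reference tick, or None if never refined.

    Instead of giving up after one disturbed sample, the whole measurement
    window is restarted (max_attempts is a hypothetical cap for this sketch).
    """
    for _ in range(max_attempts):
        start_tsc, start_ref = read_refs(read_tsc, read_ref, threshold)
        if start_tsc == ULLONG_MAX:
            continue  # disturbed start sample: try again
        wait()  # let some time pass between the two samples
        stop_tsc, stop_ref = read_refs(read_tsc, read_ref, threshold)
        if stop_tsc == ULLONG_MAX:
            continue  # disturbed stop sample: restart the window
        if stop_ref == start_ref:
            return None  # reference clock did not advance, cannot refine
        return (stop_tsc - start_tsc) / (stop_ref - start_ref)
    return None
```

In this model a single disturbed window only costs one extra attempt, which is the behaviour the commit aims for; without the retry, the first disturbed stop sample would leave the TSC unrefined for that boot.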
+
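For the second commit, the arithmetic behind "Use CPUID.0x16 to calculate missing crystal frequency" can be illustrated with a small sketch. It assumes the usual layout of these leaves (CPUID.0x15 reporting the TSC/crystal ratio as numerator and denominator, CPUID.0x16 reporting the base frequency in MHz); the function names and the sample values are hypothetical and the kernel implementation may differ in detail.

```python
def crystal_khz_from_cpuid(base_mhz, numerator, denominator):
    """Derive a missing crystal frequency (kHz) from the base frequency.

    base_mhz:    processor base frequency from CPUID.0x16 (MHz)
    numerator:   TSC/crystal ratio numerator from CPUID.0x15
    denominator: TSC/crystal ratio denominator from CPUID.0x15
    Returns 0 when the ratio is not enumerated.
    """
    if numerator == 0 or denominator == 0:
        return 0
    return base_mhz * 1000 * denominator // numerator


def tsc_khz_from_crystal(crystal_khz, numerator, denominator):
    """TSC frequency (kHz) = crystal frequency * numerator / denominator."""
    if denominator == 0:
        return 0
    return crystal_khz * numerator // denominator


# Illustrative values only: 2400 MHz base frequency, ratio 200/2.
crystal = crystal_khz_from_cpuid(2400, 200, 2)
print(crystal, tsc_khz_from_crystal(crystal, 200, 2))
```

The point of the approach is that the crystal frequency no longer has to come from a per-model table when the CPU does not report it directly.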
+ [Test case]
+ * Unfortunately there is no easy way to test the effectiveness of the
commits, especially the refinement improvement.
+
+ * The user who reported the missing refinements tested 300 reboots
+ with a regular Bionic kernel and reproduced the issue at least once,
+ whereas with a Bionic kernel plus both proposed commits the problem
+ did not happen.
+
+ * Regarding the calibration commit, it was well tested by the community
+ on multiple machines, comparing the TSC calibration readings against
+ the tables available at instlatx64.atw.hu.
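A reboot test like the one above could be scored with a small log check. This is only a sketch: it assumes the kernel log of each boot has been saved as text (for example from dmesg) and that a successful refinement is reported with a line containing "Refined TSC clocksource calibration: <value> MHz"; the function names are hypothetical.

```python
import re

# Assumed message format for a successful refinement in the kernel log.
REFINED = re.compile(r"Refined TSC clocksource calibration: (\d+\.\d+) MHz")


def refined_tsc_mhz(log_text):
    """Return the refined TSC frequency in MHz, or None if no refinement was logged."""
    match = REFINED.search(log_text)
    return float(match.group(1)) if match else None


def count_missing_refinements(boot_logs):
    """Count the boots whose log shows no refinement message."""
    return sum(1 for log in boot_logs if refined_tsc_mhz(log) is None)
```

Feeding it one saved log per reboot gives the number of boots where the refinement was skipped, which is the figure compared between the regular and the patched kernel.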
+
+ [Regression potential]
+ * We consider the regression potential low, especially due to the nature of
the patches: the first is basically a retry mechanism (plus an adjustment of
an offset to reflect more recent machines), and the second is an improvement
to TSC calibration on some platforms (whose values are currently hardcoded in
a table in the kernel). Also, the patches have been upstream for a while
and I couldn't find any fixes for them.
+
+ * A hypothetical regression from the 2nd patch could be in the TSC
+ precision calculation, which the refinement itself might well correct.
+ For the first patch, a bug in the code is the only hypothetical
+ regression I could think of.
** Also affects: linux (Ubuntu Bionic)
Importance: Undecided
Status: New
** Also affects: linux (Ubuntu Xenial)
Importance: Undecided
Status: New
** Changed in: linux (Ubuntu Xenial)
Status: New => In Progress
** Changed in: linux (Ubuntu Xenial)
Importance: Undecided => High
** Changed in: linux (Ubuntu Bionic)
Status: New => In Progress
** Changed in: linux (Ubuntu Bionic)
Importance: Undecided => High
** Changed in: linux (Ubuntu Bionic)
Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)
** Changed in: linux (Ubuntu Xenial)
Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1877858
Title:
Improve TSC refinement (and calibration) reliability