** Description changed:

- To be elaborated soon.
+ [Impact]
+ * We recently received a report of a missing TSC refinement across multiple
+ reboots of a server with an Intel Skylake-based processor. This was only
+ reproducible on Bionic kernels prior to 5.0.
+ 
+ * After checking kernel commits, we identified 2 commits that largely
+ improve the situation: a786ef152cdc ("x86/tsc: Make calibration
+ refinement more robust") [git.kernel.org/linus/a786ef152cdc] and
+ 604dc9170f24 ("x86/tsc: Use CPUID.0x16 to calculate missing crystal
+ frequency") [git.kernel.org/linus/604dc9170f24]. We hereby request an SRU
+ for both of them.
+ 
+ * The first commit improves some comments and adjusts an offset to better
+ match more recent (faster) machines, but its most important part is a retry
+ mechanism in the TSC refinement, used when refinement fails due to some
+ disturbance during the TSC read (such as NMIs/SMIs).
+
+ * The second commit improves TSC calibration on Skylake (and some other
+ models) by reading CPUID leaf 0x16 instead of relying on hardcoded,
+ table-based values.
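The arithmetic behind the CPUID-based calibration can be sketched as a pure C function. This is an illustration, not the kernel's implementation: tsc_khz_from_cpuid() is a hypothetical name, the register values are passed in rather than read with the CPUID instruction, and it assumes (as the upstream commit does, to our understanding) that the TSC runs at the processor base frequency reported by CPUID.0x16. CPUID.0x15 gives the TSC/crystal ratio (EBX/EAX) and the crystal frequency in Hz (ECX); when ECX is 0, the crystal frequency is recovered from the base frequency and the ratio.

```c
#include <stdint.h>

/*
 * Derive the TSC frequency (kHz) from CPUID leaves 0x15 and 0x16.
 * CPUID.0x15: EAX = denominator, EBX = numerator of the TSC/crystal ratio,
 *             ECX = crystal frequency in Hz (0 if not enumerated).
 * CPUID.0x16: EAX = processor base frequency in MHz.
 * Returns 0 if the frequency cannot be determined from these values.
 */
static uint64_t tsc_khz_from_cpuid(uint32_t eax_denominator,
                                   uint32_t ebx_numerator,
                                   uint32_t ecx_crystal_hz,
                                   uint32_t eax_base_mhz)
{
    if (eax_denominator == 0 || ebx_numerator == 0)
        return 0; /* ratio not enumerated */

    uint64_t crystal_khz = ecx_crystal_hz / 1000;

    /* Crystal not reported: assuming base = crystal * num / den,
     * recover crystal = base * den / num. */
    if (crystal_khz == 0 && eax_base_mhz != 0)
        crystal_khz = (uint64_t)eax_base_mhz * 1000 * eax_denominator /
                      ebx_numerator;

    if (crystal_khz == 0)
        return 0;

    /* TSC frequency = crystal frequency * numerator / denominator. */
    return crystal_khz * ebx_numerator / eax_denominator;
}
```

For example, with hypothetical values EAX=2, EBX=250, ECX=0 from leaf 0x15 and a 3000 MHz base frequency from leaf 0x16, the recovered crystal frequency is 24000 kHz and the resulting TSC frequency is 3000000 kHz (3 GHz), without consulting any per-model table.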
+ 
+ * A note for Xenial (kernel 4.4): the second patch would require the
+ inclusion of more commits, so given the "maturity" of this release (and
+ the fact that kernel 4.15 is available as an HWE kernel for Xenial), I've
+ kept it out of Xenial, backporting only the first, and more important,
+ patch to 4.4.
+ 
+ [Test case]
+ * Unfortunately, there is no easy way to test the effectiveness of the
+ commits, especially the refinement improvement.
+ 
+ * The user who reported the missing refinements to us tested 300
+ reboots with a regular Bionic kernel (and the issue reproduced at
+ least once), whereas when they tested the Bionic kernel plus both
+ commits proposed here, the problem didn't happen.
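A reboot test like the one above comes down to checking, for each boot, whether the kernel logged a successful refinement. The helper below is a minimal sketch under stated assumptions: it assumes one saved kernel log per boot, and that a successful refinement is logged with the "Refined TSC clocksource calibration" message from arch/x86/kernel/tsc.c; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Message logged (with a "tsc: " prefix) once TSC refinement succeeds. */
#define REFINED_MSG "Refined TSC clocksource calibration"

/* Return true if the given log text shows a successful TSC refinement. */
static bool log_has_refinement(const char *log)
{
    return strstr(log, REFINED_MSG) != NULL;
}

/* Return true if the saved kernel log at `path` contains the refinement
 * message; false if the message is missing or the file is unreadable.
 * Lines longer than the buffer are read in chunks, which could in rare
 * cases split the message across two reads. */
static bool file_has_refinement(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return false;

    char line[1024];
    bool found = false;
    while (fgets(line, sizeof line, f)) {
        if (log_has_refinement(line)) {
            found = true;
            break;
        }
    }
    fclose(f);
    return found;
}
```

Running file_has_refinement() over every saved per-boot log and counting the false results gives the number of boots that ended up without a refined TSC calibration.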
+ 
+ * Regarding the calibration commit, it was well tested by the community
+ on multiple machines, comparing the calibrated TSC frequency against the
+ tables available at instlatx64.atw.hu.
+ 
+ [Regression potential]
+ * We consider the regression potential low, especially due to the nature of
+ the patches: the first is basically a retry mechanism (plus an adjustment of
+ an offset to reflect more recent machines), and the second improves TSC
+ calibration on some platforms whose values are currently hardcoded in a
+ table in the kernel. Also, the patches have been upstream for a while and
+ I couldn't find any fixes for them.
+ 
+ * A hypothetical regression from the second patch could be in the
+ precision of the TSC calculation, which the refinement itself might also
+ compensate for. For the first patch, the only hypothetical regression I
+ could think of is a bug in the code itself.

** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Xenial)
       Status: New => In Progress

** Changed in: linux (Ubuntu Xenial)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Bionic)
       Status: New => In Progress

** Changed in: linux (Ubuntu Bionic)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Bionic)
     Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

** Changed in: linux (Ubuntu Xenial)
     Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1877858

Title:
  Improve TSC refinement (and calibration) reliability

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1877858/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
