Re: TSC calibration in virtual machines
On Wed, 2018-06-27 at 10:05 -0700, Rodney W. Grimes wrote:
> > On Wed, Jun 27, 2018 at 10:36 AM, Jung-uk Kim wrote:
> > > On 06/27/2018 03:14, Andriy Gapon wrote:
> > > >
> > > > It seems that TSC calibration in virtual machines sometimes can
> > > > do more harm than good. Should we default to trusting the
> > > > information provided by a hypervisor?
> > > >
> > > > Specifically, I am observing a problem on GCE instances where
> > > > calibrated TSC frequency is about 10% lower than advertised
> > > > frequency. And apparently the advertised frequency is the right
> > > > one.
> > > >
> > > > I found this thread with similar reports and a variety of
> > > > workarounds from administratively disabling the calibration to
> > > > switching to a different timecounter:
> > > > https://lists.freebsd.org/pipermail/freebsd-cloud/2017-January/80.html
> > >
> > > We already do that for VMware hosts since r221214.
> > >
> > > https://svnweb.freebsd.org/changeset/base/221214
> > >
> > > We should do the same for each hypervisor.
> > >
> > > Jung-uk Kim
> >
> > We probably should. But why does calibration fail in the first
> > place? If it can fail in a VM, then it can probably fail on bare
> > metal too. It would be worth investigating.
>
> No, the failure in a VM is unique to a VM, it has to do with the fact
> you have the hypervisor timeslicing a CPU that you believe to be 100%
> dedicated to you.
>
> There are several white papers, including one from VMware, about what
> they have done to help with the time keeping problems.
>
> What is suggested above would be a correct thing to do.
> Bhyve creates these issues as well, and use of certain timers
> in a bhyve guest can cause you nightmares with ntp.

Iirc, bhyve's arithmetic when doing timer emulation leads to roundoff
errors that accumulate to effectively make the emulated timer run
off-frequency. The hpet timer was trivial to fix by just redefining it
to run at a power-of-2 frequency to eliminate rounding errors. The
other timers have to run at fixed frequencies, so better arithmetic
will be the way to fix them.

I vaguely remember that being harder to do than to say because of the
way the code is currently structured, which is why I just did the easy
fix to the hpet so that people would have at least one usable timer
that didn't give ntpd fits in guest OSes.

-- Ian

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
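
The roundoff effect Ian describes can be illustrated with a minimal sketch. This is not bhyve's code: it assumes a hypothetical emulation that stores the tick period as a truncated fixed-point fraction of a second with 32 fractional bits, and the two frequencies (10 MHz and 2^24 Hz) are illustrative choices. The helper names `accumulated_drift` and `drift_ppm` are made up for this example.

```python
from fractions import Fraction

FRAC_BITS = 32  # hypothetical fixed-point precision, not taken from bhyve


def accumulated_drift(freq_hz, ticks, frac_bits=FRAC_BITS):
    """Return emulated minus actual elapsed seconds after `ticks` ticks.

    The emulated timer advances by a per-tick period that was truncated
    to `frac_bits` fractional bits, so any truncation error is repeated
    on every tick and accumulates.
    """
    scale = 1 << frac_bits
    period_fixed = scale // freq_hz           # truncated period, in 1/scale s
    emulated = Fraction(period_fixed * ticks, scale)
    actual = Fraction(ticks, freq_hz)
    return emulated - actual


def drift_ppm(freq_hz, frac_bits=FRAC_BITS):
    """Drift accumulated over one nominal second, in parts per million."""
    return float(accumulated_drift(freq_hz, freq_hz, frac_bits) * 1_000_000)


if __name__ == "__main__":
    for freq in (10_000_000, 2 ** 24):
        print(f"{freq} Hz: {drift_ppm(freq):.1f} ppm")
```

With a power-of-2 frequency the period divides the fixed-point scale exactly, so the drift is zero. With a frequency that does not divide it, the truncation error in this model is large enough to matter to a clock-disciplining daemon. The size of the effect depends entirely on the assumed precision.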
Re: TSC calibration in virtual machines
On Wed, Jun 27, 2018, 12:48 PM Alan Somers wrote:
> On Wed, Jun 27, 2018 at 10:36 AM, Jung-uk Kim wrote:
> > On 06/27/2018 03:14, Andriy Gapon wrote:
> > >
> > > It seems that TSC calibration in virtual machines sometimes can do
> > > more harm than good. Should we default to trusting the information
> > > provided by a hypervisor?
> > >
> > > Specifically, I am observing a problem on GCE instances where
> > > calibrated TSC frequency is about 10% lower than advertised
> > > frequency. And apparently the advertised frequency is the right one.
> > >
> > > I found this thread with similar reports and a variety of
> > > workarounds from administratively disabling the calibration to
> > > switching to a different timecounter:
> > > https://lists.freebsd.org/pipermail/freebsd-cloud/2017-January/80.html
> >
> > We already do that for VMware hosts since r221214.
> >
> > https://svnweb.freebsd.org/changeset/base/221214
> >
> > We should do the same for each hypervisor.
> >
> > Jung-uk Kim
>
> We probably should. But why does calibration fail in the first place?
> If it can fail in a VM, then it can probably fail on bare metal too.
> It would be worth investigating.

The main problem is you can't be assured that the DELAY call will be
accurate in those cases. Also, the way that some VMs implement the rdtsc
instruction may not be as accurate as we would need. In some cases it
can compound the issue.

-Steve
Re: TSC calibration in virtual machines
On 06/27/2018 13:05, Rodney W. Grimes wrote:
> There are several white papers, including one from VMware, about what
> they have done to help with the time keeping problems.

https://www.vmware.com/files/pdf/techpaper/Timekeeping-In-VirtualMachines.pdf

Jung-uk Kim
Re: TSC calibration in virtual machines
> On Wed, Jun 27, 2018 at 10:36 AM, Jung-uk Kim wrote:
> > On 06/27/2018 03:14, Andriy Gapon wrote:
> > >
> > > It seems that TSC calibration in virtual machines sometimes can do
> > > more harm than good. Should we default to trusting the information
> > > provided by a hypervisor?
> > >
> > > Specifically, I am observing a problem on GCE instances where
> > > calibrated TSC frequency is about 10% lower than advertised
> > > frequency. And apparently the advertised frequency is the right one.
> > >
> > > I found this thread with similar reports and a variety of
> > > workarounds from administratively disabling the calibration to
> > > switching to a different timecounter:
> > > https://lists.freebsd.org/pipermail/freebsd-cloud/2017-January/80.html
> >
> > We already do that for VMware hosts since r221214.
> >
> > https://svnweb.freebsd.org/changeset/base/221214
> >
> > We should do the same for each hypervisor.
> >
> > Jung-uk Kim
>
> We probably should. But why does calibration fail in the first place?
> If it can fail in a VM, then it can probably fail on bare metal too.
> It would be worth investigating.

No, the failure in a VM is unique to a VM, it has to do with the fact
you have the hypervisor timeslicing a CPU that you believe to be 100%
dedicated to you.

There are several white papers, including one from VMware, about what
they have done to help with the time keeping problems.

What is suggested above would be a correct thing to do.
Bhyve creates these issues as well, and use of certain timers
in a bhyve guest can cause you nightmares with ntp.

--
Rod Grimes                                      rgri...@freebsd.org
Re: TSC calibration in virtual machines
On 06/27/2018 13:01, Ryan Stone wrote:
> I would guess that the calibration can fail because when running under
> the hypervisor, the FreeBSD guest code can be descheduled at the wrong
> time. As I recall, the current algorithm looks like:
>
> 1. Sample rdtsc
> 2. Use a fixed-frequency timer to busy-wait for exactly 1 second
> 3. Sample rdtsc again
> 4. tsc_freq = sample2 - sample1;
>
> If we are descheduled between 2 and 3, the time we spend off-cpu will
> not be accounted for at step 4. On bare-metal this is not possible as
> neither the scheduler nor interrupts are running yet.
>
> Although, come to think of it, I seem to recall something about SMI
> interrupts mucking this up long in the past, for exactly the same
> reason.

I think it was legacy USB device emulation for certain Intel
chipset-based motherboards.

Jung-uk Kim
Re: TSC calibration in virtual machines
I would guess that the calibration can fail because when running under
the hypervisor, the FreeBSD guest code can be descheduled at the wrong
time. As I recall, the current algorithm looks like:

1. Sample rdtsc
2. Use a fixed-frequency timer to busy-wait for exactly 1 second
3. Sample rdtsc again
4. tsc_freq = sample2 - sample1;

If we are descheduled between 2 and 3, the time we spend off-cpu will
not be accounted for at step 4. On bare-metal this is not possible as
neither the scheduler nor interrupts are running yet.

Although, come to think of it, I seem to recall something about SMI
interrupts mucking this up long in the past, for exactly the same
reason.
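
The four steps above can be modelled with a short sketch. It is a toy simulation rather than the kernel's calibration code, and it assumes the TSC keeps counting at a constant rate while the guest is off-CPU. The function name `calibrate_tsc` and its parameters are hypothetical.

```python
def calibrate_tsc(true_tsc_hz, wait_s=1.0, descheduled_s=0.0):
    """Model the sample / busy-wait / sample / subtract calibration.

    true_tsc_hz   -- real TSC rate (assumed constant, counts while off-CPU)
    wait_s        -- length of the busy-wait measured by the reference timer
    descheduled_s -- time spent off-CPU after the busy-wait finishes but
                     before the second rdtsc sample is taken
    Returns the frequency the guest would compute, in Hz.
    """
    sample1 = 0                                   # step 1: first rdtsc
    real_elapsed = wait_s + descheduled_s         # step 2 plus off-CPU time
    sample2 = sample1 + round(true_tsc_hz * real_elapsed)  # step 3
    return (sample2 - sample1) / wait_s           # step 4: assumes wait_s only


if __name__ == "__main__":
    print(calibrate_tsc(2_000_000_000))                     # undisturbed
    print(calibrate_tsc(2_000_000_000, descheduled_s=0.1))  # preempted 100 ms
```

In this model the unaccounted off-CPU time always skews the estimate away from the true rate. It covers only the descheduling case described above; an inaccurate reference timer, as mentioned elsewhere in the thread, would need a separate term.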
Re: TSC calibration in virtual machines
On Wed, Jun 27, 2018 at 11:05 AM, Jung-uk Kim wrote:
> On 06/27/2018 12:47, Alan Somers wrote:
> > On Wed, Jun 27, 2018 at 10:36 AM, Jung-uk Kim wrote:
> > > On 06/27/2018 03:14, Andriy Gapon wrote:
> > > >
> > > > It seems that TSC calibration in virtual machines sometimes can
> > > > do more harm than good. Should we default to trusting the
> > > > information provided by a hypervisor?
> > > >
> > > > Specifically, I am observing a problem on GCE instances where
> > > > calibrated TSC frequency is about 10% lower than advertised
> > > > frequency. And apparently the advertised frequency is the right
> > > > one.
> > > >
> > > > I found this thread with similar reports and a variety of
> > > > workarounds from administratively disabling the calibration to
> > > > switching to a different timecounter:
> > > > https://lists.freebsd.org/pipermail/freebsd-cloud/2017-January/80.html
> > >
> > > We already do that for VMware hosts since r221214.
> > >
> > > https://svnweb.freebsd.org/changeset/base/221214
> > >
> > > We should do the same for each hypervisor.
> >
> > We probably should. But why does calibration fail in the first place?
>
> Because multiple guests are sharing the same physical CPUs and the
> guest OS has no control, timing cannot be 100% accurate.
>
> > If it can fail in a VM, then it can probably fail on bare metal too.
> > It would be worth investigating.
>
> It does not "fail" in bare metal because we have almost complete control.
>
> Jung-uk Kim

Makes sense. I didn't realize that it ran before the scheduler or
interrupts were started.
Re: TSC calibration in virtual machines
On 06/27/2018 12:47, Alan Somers wrote:
> On Wed, Jun 27, 2018 at 10:36 AM, Jung-uk Kim wrote:
> > On 06/27/2018 03:14, Andriy Gapon wrote:
> > >
> > > It seems that TSC calibration in virtual machines sometimes can do
> > > more harm than good. Should we default to trusting the information
> > > provided by a hypervisor?
> > >
> > > Specifically, I am observing a problem on GCE instances where
> > > calibrated TSC frequency is about 10% lower than advertised
> > > frequency. And apparently the advertised frequency is the right one.
> > >
> > > I found this thread with similar reports and a variety of
> > > workarounds from administratively disabling the calibration to
> > > switching to a different timecounter:
> > > https://lists.freebsd.org/pipermail/freebsd-cloud/2017-January/80.html
> >
> > We already do that for VMware hosts since r221214.
> >
> > https://svnweb.freebsd.org/changeset/base/221214
> >
> > We should do the same for each hypervisor.
>
> We probably should. But why does calibration fail in the first place?

Because multiple guests are sharing the same physical CPUs and the guest
OS has no control, timing cannot be 100% accurate.

> If it can fail in a VM, then it can probably fail on bare metal too.
> It would be worth investigating.

It does not "fail" in bare metal because we have almost complete control.

Jung-uk Kim
Re: TSC calibration in virtual machines
On Wed, Jun 27, 2018 at 10:36 AM, Jung-uk Kim wrote:
> On 06/27/2018 03:14, Andriy Gapon wrote:
> >
> > It seems that TSC calibration in virtual machines sometimes can do
> > more harm than good. Should we default to trusting the information
> > provided by a hypervisor?
> >
> > Specifically, I am observing a problem on GCE instances where
> > calibrated TSC frequency is about 10% lower than advertised
> > frequency. And apparently the advertised frequency is the right one.
> >
> > I found this thread with similar reports and a variety of workarounds
> > from administratively disabling the calibration to switching to a
> > different timecounter:
> > https://lists.freebsd.org/pipermail/freebsd-cloud/2017-January/80.html
>
> We already do that for VMware hosts since r221214.
>
> https://svnweb.freebsd.org/changeset/base/221214
>
> We should do the same for each hypervisor.
>
> Jung-uk Kim

We probably should. But why does calibration fail in the first place?
If it can fail in a VM, then it can probably fail on bare metal too.
It would be worth investigating.
Re: TSC calibration in virtual machines
On 06/27/2018 03:14, Andriy Gapon wrote:
>
> It seems that TSC calibration in virtual machines sometimes can do more harm
> than good. Should we default to trusting the information provided by a
> hypervisor?
>
> Specifically, I am observing a problem on GCE instances where calibrated TSC
> frequency is about 10% lower than advertised frequency. And apparently the
> advertised frequency is the right one.
>
> I found this thread with similar reports and a variety of workarounds from
> administratively disabling the calibration to switching to a different
> timecounter:
> https://lists.freebsd.org/pipermail/freebsd-cloud/2017-January/80.html

We already do that for VMware hosts since r221214.

https://svnweb.freebsd.org/changeset/base/221214

We should do the same for each hypervisor.

Jung-uk Kim
Re: TSC calibration in virtual machines
On 6/27/18 12:14 AM, Andriy Gapon wrote:
>
> It seems that TSC calibration in virtual machines sometimes can do more harm
> than good. Should we default to trusting the information provided by a
> hypervisor?
>
> Specifically, I am observing a problem on GCE instances where calibrated TSC
> frequency is about 10% lower than advertised frequency. And apparently the
> advertised frequency is the right one.
>
> I found this thread with similar reports and a variety of workarounds from
> administratively disabling the calibration to switching to a different
> timecounter:
> https://lists.freebsd.org/pipermail/freebsd-cloud/2017-January/80.html

I suspect you are probably right that we should just "trust" TSC
frequencies provided by a hypervisor. We could perhaps choose to
whitelist hypervisors known to provide accurate values if we wanted to
be cautious.

--
John Baldwin
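
The whitelist idea can be sketched as a small decision function. This is only an illustration of the policy, not FreeBSD's implementation: `TRUSTED_HYPERVISORS`, `select_tsc_freq` and the `calibrate` callback are hypothetical names, and the set contains only VMware's CPUID vendor string because the thread says VMware is already handled.

```python
# Hypothetical allow list of hypervisor vendor strings whose advertised
# TSC frequency is trusted. Which vendors belong here is a policy choice.
TRUSTED_HYPERVISORS = frozenset({"VMwareVMware"})


def select_tsc_freq(hv_vendor, advertised_hz, calibrate):
    """Pick the TSC frequency to use at boot.

    hv_vendor     -- hypervisor vendor string, or None on bare metal
    advertised_hz -- frequency reported by the hypervisor, or None
    calibrate     -- zero-argument callable that measures the frequency
    """
    if hv_vendor in TRUSTED_HYPERVISORS and advertised_hz:
        # Trusted hypervisor with a usable value: skip calibration.
        return advertised_hz
    # Bare metal, unknown hypervisor, or no advertised value: measure.
    return calibrate()


if __name__ == "__main__":
    measured = lambda: 1_800_000_000   # stand-in for a skewed calibration
    print(select_tsc_freq("VMwareVMware", 2_000_000_000, measured))
    print(select_tsc_freq(None, None, measured))
```

Falling back to calibration for anything not on the list keeps the cautious behaviour for hypervisors whose advertised values have not been checked.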
TSC calibration in virtual machines
It seems that TSC calibration in virtual machines sometimes can do more harm
than good. Should we default to trusting the information provided by a
hypervisor?

Specifically, I am observing a problem on GCE instances where calibrated TSC
frequency is about 10% lower than advertised frequency. And apparently the
advertised frequency is the right one.

I found this thread with similar reports and a variety of workarounds from
administratively disabling the calibration to switching to a different
timecounter:
https://lists.freebsd.org/pipermail/freebsd-cloud/2017-January/80.html

--
Andriy Gapon