On Wednesday, 01.07.2015 at 00:11, Antti Kantee wrote:

> Well, that's a lot of code, apparently a good deal of which comes
> from having to convert RTC into seconds :(
You can thank IBM for that :-)

> Does using rdtsc really work as a basis for timekeeping? Doesn't
> the calibration go off when the clock rate changes?

On older processors and laptop systems, the TSC does indeed change
frequency when the clock rate changes. On some even older "broken"
processors, it even does things like halting when the CPU is idle. All
the kernels I've looked at contain a maze of code to deal with this and
to avoid the TSC if it is broken.

However, all Intel processors since Nehalem (introduced 2007,
manufactured 2008) have an invariant TSC, which is completely fine for
our purposes:

http://marc.info/?l=xen-devel&m=128475199115727&w=2

I can put in a check for the invariant TSC using CPUID; the question is,
if that check fails, should we refuse to boot, or warn and try anyway?
For example, the system I tested on is a 2005-era Pentium M which does
not have an invariant TSC, but as long as you run it on AC power all is
fine.

There's a bit more to using the TSC once SMP is involved, but that is
not something I want to even think about now :-)

> Besides, rdtsc is not available on a 486, which I understood was one
> of your targets.

I changed my mind, for two reasons. First, I don't have a clear idea of
how to implement a monotonic clock without a TSC. Second, I'm not
trying to build a general solution that will run on any PC-compatible
system since the dawn of the 80386. The TSC is available on any
Pentium-class processor, and new embedded offerings from Intel such as
Quark are also Pentium-class. Hence, I don't think it's worth the work
and extra complexity involved. It's also not a regression, since the
current code uses the TSC.

Having said that, if someone comes along and wants a massive deployment
of rumprun on 486-class CPUs, I'm open to consulting offers :-)

> I don't understand the fascination with the 100ms calibration delay.
> Why is 99ms not a good value? Or 10ms? or 1ms?
> I'd assume 100ms is a value that someone picked out of a hat back
> when clock rates were around 8MHz and minimizing it simply didn't
> matter since computers booted for minutes anyway.

That particular algorithm is based on what NetBSD does and happens to
be the simplest option, which is why I used it. Linux is even more
paranoid and takes longer calibrating the TSC, see here:

http://lxr.free-electrons.com/source/arch/x86/kernel/tsc.c#L646

The best algorithm I've found so far would appear to be the OpenSolaris
code at:

http://fxr.watson.org/fxr/source/i86pc/ml/locore.s?v=OPENSOLARIS#L1734

Unfortunately, that is also all hand-coded assembly and CDDL-licensed,
so not usable for us. If someone wants to do a clean-room
implementation of that code for rumprun, be my guest.

In any case, does it really matter how long actual bare metal takes to
boot? There are much longer delays all over the place once you start
enumerating devices, etc. What does matter is unikernels on KVM, and
there the delay should (if I understand it correctly) go away entirely
once I implement KVM pvclock, since I can grab the TSC multiplier from
that interface and not bother with TSC calibration at all.

> I don't understand why you need assembly to do multiplication.

I need to operate on the intermediate product, which may be larger than
64 bits. It is much easier to reason about what happens if you write it
in assembly, and it also allows use of a single mulq instruction on
x86-64. Doing the latter in C would depend on GCC-specific 128-bit
types.

> > Critically examine need for critical sections.

Good point. bmk_cpu_clock_now() should have cli()/sti() around it, or
is there some other mechanism you'd like me to use?

> > I'd just get rid of HZ, it serves no purpose.

You mean replace TIMER_HZ / HZ with TIMER_HZ / 100? Sure.

> bmk_cpu_block() is wrong. Just because a timer interrupt fired
> doesn't mean another interrupt didn't. Seems rather painful doing
> tickless with i8254...
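(To make the 128-bit point concrete, here is roughly what the C variant
would look like -- a sketch using a 32.32 fixed-point multiplier, not
the actual rumprun code; the names are made up. The scaling factor is
computed once at calibration time, so the hot path is a single widening
multiply and a shift:)

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Precompute a 32.32 fixed-point multiplier at calibration time:
 * mult = 2^32 * NSEC_PER_SEC / tsc_freq_hz.
 * (NSEC_PER_SEC << 32 still fits in 64 bits, so this is safe.)
 */
static uint64_t
tsc_mult(uint64_t tsc_freq_hz)
{

	return (NSEC_PER_SEC << 32) / tsc_freq_hz;
}

/*
 * Convert a TSC delta to nanoseconds. The intermediate product can be
 * up to 96 bits wide, hence the GCC/Clang-specific unsigned __int128;
 * on x86-64 this boils down to essentially one mulq plus a shift.
 */
static uint64_t
tsc_to_ns(uint64_t tsc_delta, uint64_t mult)
{

	return (uint64_t)(((unsigned __int128)tsc_delta * mult) >> 32);
}
```

E.g. for a 2 GHz TSC the multiplier comes out as 2^31, so tsc_to_ns()
halves the cycle count, as expected.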
A correct but wasteful solution would be to just always return back
into schedule() after the hlt(). It'll be inefficient for long sleeps,
but will work fine. Any better ideas are much appreciated!

Regarding the i8254: I used it because, despite being limited, it is
easy to program and well documented. The alternative would be the APIC
timer, which is available on the Pentium and newer. However, that is
much more complex to set up (since you have to enable the APIC instead
of the legacy i8259 PICs) and it requires calibration against the PIT,
since it runs at the CPU bus clock frequency. KVM does not tell us
anything about the APIC, so we'd be stuck with the initial boot delay
even there :-/

One nice thing about the APIC timer is that on fairly new processors
(Sandy Bridge and newer, 2011 vintage) it supports a TSC-deadline mode,
which is exactly what we need: it fires an IRQ once the TSC passes a
given deadline. So that may be worth implementing where supported, as
(guessing) it should exhibit much lower overhead in virtualized
scenarios.

> No need to expose everything that the original clock_subr.h exposes.
>
> uint64_t dt_year? Well that's not going to suffer from y2k issues
> anytime soon. Why is it unsigned anyway? Does counting start from
> -bigbang or what? ;)

That was all lifted rather hurriedly from NetBSD, I'll clean it up a
bit more :-)

> > I'm not entirely happy about the MD/MI split of the code, perhaps
> > that could be improved. Antti?
>
> Can you elaborate?

What is the intended division between bmk_platform_X() and bmk_cpu_X()?
Is it ok to just call e.g. bmk_cpu_block() from bmk_platform_block() as
I'm doing now? Similarly for bmk_platform_clock_epochoffset() just
calling bmk_cpu_clock_epochoffset().
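P.S. To sketch what I mean by the "wasteful but correct" blocking
above -- the fake clock and hlt() below are only there to make the
example self-contained and runnable; the real thing would read the TSC
and execute a real hlt (with the sti;hlt interrupt-enable dance that a
kernel needs, which I'm glossing over here):

```c
#include <stdint.h>

/* Fake monotonic clock and hlt(): each hlt() "wakes up" on a
 * simulated interrupt ~1ms later. Stand-ins for rdtsc + real hlt. */
static uint64_t fake_now_ns;

static uint64_t
clock_now(void)
{

	return fake_now_ns;
}

static void
hlt(void)
{

	fake_now_ns += 1000000;	/* some interrupt fired after ~1ms */
}

/*
 * Wasteful but correct: halt until any interrupt, then return to the
 * caller (the scheduler) unconditionally. The scheduler re-checks its
 * timeout and calls us again if the deadline hasn't passed, so a
 * wakeup from a non-timer interrupt can neither oversleep nor hang.
 */
static void
cpu_block(uint64_t until_ns)
{

	if (clock_now() < until_ns)
		hlt();
}

/* What schedule() effectively does with it: */
static void
sleep_until(uint64_t until_ns)
{

	while (clock_now() < until_ns)
		cpu_block(until_ns);
}
```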
