After much joy with this, I thought I would post it to a bigger audience.
After migrating to Xen 4.1.1, booting HVM guests had several
issues. Some were related to interrupts not being set up correctly (for
which Stefano has posted patches), but even with those applied, 3.0 guests
seem to hang for me while 2.6.38 or older kernels were ok.

After digging deeply into this, I think I found the issue. However, if
that is true, it seems rather lucky if pv spinlocks in HVM worked for
anybody.

The xen_hvm_smp_init() call changes the smp_ops hooks, one of which is
smp_prepare_cpus. This happens early in start_kernel (via setup_arch), and at
that point pv_lock_ops is unchanged and still points to the ticket versions.
Later in start_kernel, check_bugs is called, and part of that takes
pv_lock_ops and patches the kernel with the correct jumps.
_After_ that, kernel_init is called, which in turn runs smp_prepare_cpus.
That now changes pv_lock_ops, *but does not* run any patching again. So the
_raw_spin_*lock calls still use the ticket calls.

start_kernel
  setup_arch -> xen_hvm_smp_init (set smp_ops)
  ...
  check_bugs -> alternative_instructions (patches pv_locks sites)
  ...
  rest_init (triggers kernel_init as a thread)
    kernel_init
      ...
      smp_prepare_cpus (calls xen_init_spinlocks through smp_ops hook)

To make things even more fun, there is the possibility that some
spinlock functions are forced to be inlined and others are not
(CONFIG_INLINE_SPIN_*). In our special case only two versions of
spin_unlock were inlined. Put that together into a pot with modules,
shake well, and there is the fun. Basically, at module load time, the
non-inlined calls remain pointing to the unmodified ticket implementation
(main kernel). But the unlock calls (which are inlined) get modified because
the loaded module gets patched up. One can imagine that this does not
work too well.

Anyway, even without the secondary issue, I am sure that just replacing
the functions in pv_lock_ops without the spinlock calls getting actually
modified is not the intended behaviour.

Unfortunately I have not yet been able to make it work. Any attempt to
move xen_init_spinlocks so that it is called before check_bugs, or to add a
call to alternative_instructions, results in another hang on boot. At least
the latter method results in a more usable dump for crash, which shows
that one spinlock was taken (slow) plus two spurious taken ones (this
gives me more to play with).

-- 
https://bugs.launchpad.net/bugs/838026

Title:
  Running Oneiric kernel as Xen HVM guest with pvlocks hangs on boot
