Just to provide background for this commit that went in today:

--- a/xen/arch/powerpc/powerpc64/domain.c
+++ b/xen/arch/powerpc/powerpc64/domain.c
@@ -55,7 +55,10 @@ void load_sprs(struct vcpu *v)
     /* adjust the DEC value to account for cycles while not
      * running this OS */
     timebase_delta = mftb() - v->arch.timebase;
-    v->arch.dec -= timebase_delta;
+    if (timebase_delta > v->arch.dec)
+        v->arch.dec = 0;
+    else
+        v->arch.dec -= timebase_delta;

In the patch titled "Schedule idle domain on secondary processors", 
I mentioned that sometimes the entire system would freeze, so I didn't
want the patch to be considered for merging.

The problem turned out to be that we don't sync the timebases between
the processors.  So if load_sprs() is executed on a different CPU than
save_sprs() was, the delta computed from mftb() is bogus.  When the new
CPU's timebase is behind the saved value, timebase_delta wraps around
to a large unsigned value, worth up to 149 seconds on JS21.  So the
domU was not wrecking the machine; the decrementer was just being
loaded with a huge value every time that domU's vcpu was loaded on a
particular physical CPU, including cpu0.
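
To make the wraparound concrete, here is a minimal standalone sketch of
the arithmetic before and after the fix.  It is not the Xen code:
struct vcpu_time, dec_unclamped() and dec_clamped() are hypothetical
names, and the field widths (64-bit timebase, 32-bit dec) are my
assumption for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two fields load_sprs() uses; the real
 * struct vcpu layout and field widths in Xen may differ. */
struct vcpu_time {
    uint64_t timebase; /* timebase read when the vcpu was descheduled */
    uint32_t dec;      /* decrementer value saved at the same time */
};

/* Pre-patch arithmetic: subtract the delta unconditionally. */
static uint32_t dec_unclamped(const struct vcpu_time *t, uint64_t now)
{
    uint64_t delta = now - t->timebase; /* wraps if now < t->timebase */
    return t->dec - (uint32_t)delta;
}

/* Post-patch arithmetic: clamp to 0 when the delta exceeds what is left. */
static uint32_t dec_clamped(const struct vcpu_time *t, uint64_t now)
{
    uint64_t delta = now - t->timebase; /* same wrap, but caught below */
    if (delta > t->dec)
        return 0;
    return t->dec - (uint32_t)delta;
}
```

With a saved timebase of 2000 and a saved dec of 500, reading a timebase
of 1000 on the other CPU makes dec_unclamped() return 1500, so the
decrementer grows instead of shrinking, while dec_clamped() returns 0.
The further apart the two timebases are, the larger the unclamped value
gets.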

This patch also went in, to pin dom0 to cpu0:

--- a/xen/arch/powerpc/setup.c  Fri Sep 01 12:31:56 2006 -0400
+++ b/xen/arch/powerpc/setup.c  Fri Sep 01 12:37:29 2006 -0400
@@ -343,6 +343,10 @@ static void __init __start_xen(multiboot
     if (NULL == alloc_vcpu(dom0, 0, 0))
         panic("Error creating domain 0 vcpu 0\n");

+    /* The Interrupt Controller will route everything to CPU 0 so we
+     * need to make sure Dom0's vVCPU 0 is pinned to the CPU */
+    dom0->vcpu[0]->cpu_affinity = cpumask_of_cpu(0);

We are currently thinking about how best to sync the timebases.  Right
now it looks like pulling in Linux's implementation is the best option.
Any comments would be appreciated.

We did have a real memory controller hang, as discussed on this list in
response to my original post.  It only occurred on Maple, where PIBS
does not clear the HIOR for secondary CPUs, so their first exception was
delivered to 0xX00 + Y.  Hence this patch that went in yesterday:

+        cpu0_hior = 0;

+    mthior(cpu0_hior);

Xen-ppc-devel mailing list
