Re: [XenPPC] [PATCH/RFC] Schedule idle domain on secondary processors
On Tue, Aug 29, 2006 at 09:02:58AM +0200, Segher Boessenkool wrote:
> > It is quite stable in that the secondary processors reliably join the idle domain and wait for free pages to scrub, handling 0x980 interrupts with no problem.
>
> What's this 980 exception?

Perhaps my phrasing is bad. I was referring to the hypervisor decrementer interrupt (hdec).

> > However, the domUs sometimes hang during initialization. When the domU hangs, it seems the whole machine freezes, including the serial console.
>
> Most common cause of this is hanging the U3/U4. Do you have a hardware debugger to see where this happens?

I had a friend take a look at the state of cpu 0, but everything seems ok. It looks like there is a race and occasionally one of the secondary processors is hanging the U4.

___
Xen-ppc-devel mailing list
Xen-ppc-devel@lists.xensource.com
http://lists.xensource.com/xen-ppc-devel
Re: [XenPPC] [PATCH/RFC] Schedule idle domain on secondary processors
> > > It is quite stable in that the secondary processors reliably join the idle domain and wait for free pages to scrub, handling 0x980 interrupts with no problem.
> >
> > What's this 980 exception?
>
> Perhaps my phrasing is bad. I was referring to the hypervisor decrementer interrupt (hdec).

Ah yes, I forgot, thanks.

> > > However, the domUs sometimes hang during initialization. When the domU hangs, it seems the whole machine freezes, including the serial console.
> >
> > Most common cause of this is hanging the U3/U4. Do you have a hardware debugger to see where this happens?
>
> I had a friend take a look at the state of cpu 0, but everything seems ok. It looks like there is a race and occasionally one of the secondary processors is hanging the U4.

Doing a cacheable load/store to HT (or something else on U4) perhaps?

Segher
Re: [XenPPC] [PATCH/RFC] Schedule idle domain on secondary processors
> > Most common cause of this is hanging the U3/U4. Do you have a hardware debugger to see where this happens?
>
> It's been my experience that RISCWatch isn't very helpful in these situations (e.g. can't stop the processor). When the northbridge goes, JTAG becomes unhappy.

Works fine for me, don't know what the difference is -- different debugger? The CPU JTAG chain is not connected to the U4 in any way, fwiw.

What happens is, the API (EI, whatever -- the CPU bus) becomes unusable after the bad I/O to U4; the CPU waits forever for the bad load/store to finish.

Segher