Re: [XenPPC] PHDR link failure testcase
Perhaps, this is just mythology/warm-n-fuzzy for me, but I really like having 1 PHDR. Lemmy collect my thoughts and come up with a rational reason. 1 PHDR works just as well; the important thing is to explicitly define your PHDRs in the linker script. Segher ___ Xen-ppc-devel mailing list Xen-ppc-devel@lists.xensource.com http://lists.xensource.com/xen-ppc-devel
Re: [XenPPC] Error creating domain on JS20 (Fw: [Prose-jvm] Brief Status in TRL (2006/08/24))
I also probably have an old blade. But your statement is correct: Almost all JS20. However, that is not the same as all the JS20. Yes indeed. We hardly ever test on those old machines; there's not many of them around. So you're our tester now, heh. Don't worry: you found one of the two differences already -- see if you can spot the other ;-) 970, 970FX, 970GX, 970MP, actually. FWIW: first two of those are the two generations of JS20; last two are 2-core and 4-core JS21. Segher
Re: [XenPPC] [PATCH/RFC] Schedule idle domain on secondary processors
It is quite stable in that the secondary processors reliably join the idle domain and wait for free pages to scrub, handling 0x980 interrupts with no problem. What's this 980 exception? Perhaps my phrasing is bad. I was referring to the hypervisor decrementer interrupt (hdec). Ah yes, I forgot, thanks. However, the domU's sometimes hang during initialization. When the domU hangs, it seems the whole machine freezes, including the serial console. Most common cause of this is hanging the U3/U4. Do you have a hardware debugger to see where this happens? I had a friend take a look at the state of cpu 0, but everything seems ok. It looks like there is a race and occasionally one of the secondary processors is hanging the U4. Doing a cacheable load/store to HT (or something else on U4) perhaps? Segher
Re: [XenPPC] [PATCH/RFC] Schedule idle domain on secondary processors
Most common cause of this is hanging the U3/U4. Do you have a hardware debugger to see where this happens? It's been my experience that RISCWatch isn't very helpful in these situations (e.g. can't stop the processor). When the northbridge goes, JTAG becomes unhappy. Works fine for me, don't know what the difference is -- different debugger? The CPU JTAG chain is not connected to the U4 in any way, fwiw. What happens is, the API (EI, whatever -- the CPU bus) becomes unusable after the bad I/O to U4; the CPU waits forever for the bad load/store to finish. Segher
Re: Hang on boot Was: [XenPPC] [xenppc-unstable] [POWERPC][XEN] Detect bad spurious interrupt condition and panic instead of hang
Sometimes when Xen is booted and we let Linux init the MPIC for the second time Xen could end up in a loop where the CPU is constantly being interrupted by the MPIC. Because of console buffering, the last message you see is some message from early kernel boot. Anyway.. we detect this now and you see a panic. There seems to be a problem with the U3/U4 MPIC, where edge-triggered interrupts are delivered to more than one CPU. Every CPU other than the one that ACKed it first, will get the spurious vector (so functionally, the impact of this bug isn't that bad; performance-wise it might be different). The UART IRQ [on JS2x and Maple] is an edge IRQ; if you produce console output for every spurious interrupt, you'll get a nice little storm. Is that what's happening? Yes, I believe, it has something to do with temperature. Interesting observation, never thought of investigating that -- it's in line with my suspicion that something in the MPIC is metastable though. Segher
Re: [XenPPC] [PATCH] Print backtrace on BUG
Bah, it's too early for GCC asm:
+asm("mr %0, 1" : "=r" (sp)); \
+asm("mflr %0" : "=r" (lr)); \
+asm("mflr %0; bl 1f; 1: mflr %1; mtlr %0" : "=r" (tp), "=r" (pc)); \
asm("bl $+4 ; mflr %0; mtlr %1" : "=r"(pc) : "r"(lr)); \
+show_backtrace(sp, lr, pc); \
+__asm__ __volatile__ ( "trap" ); \
+} while ( 0 )
...and the one asm where you put volatile on is the only one that doesn't need it :-) (and no __ is needed either). Alternatively (and preferred), you can make a single statement out of the first three asm statements. In fact, you _have_ to make those three into one; even volatile asms can be reordered by the compiler, if there's no data dependency. The trap asm is always volatile (it has no parameters); it can still be reordered though. You can use __builtin_trap() here instead. Segher
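The advice about merging the asm statements can be sketched roughly as follows. This is only an illustration, not the code from the patch: CAPTURE_REGS, HAVE_CAPTURE_REGS, capture_selftest() and the printf-based show_backtrace() stub are hypothetical names made up for the example, and the non-PowerPC fallback exists only so the sketch compiles elsewhere. All three values are read inside one asm statement, so the compiler has nothing to reorder between them. The sketch spells the keyword __asm__ so it also builds in strict ISO C modes; in GNU C mode plain asm works, as noted in the mail.

```c
#include <stdio.h>

#if defined(__powerpc__) || defined(__powerpc64__)
#define HAVE_CAPTURE_REGS 1
/* One asm statement: read r1 (stack pointer), save LR, use "bl" to the
 * next instruction to get the current PC into LR, read it, restore LR. */
#define CAPTURE_REGS(sp, lr, pc)                          \
    __asm__ volatile("mr %0,1\n\t"                        \
                     "mflr %1\n\t"                        \
                     "bl 1f\n"                            \
                     "1:\tmflr %2\n\t"                    \
                     "mtlr %1"                            \
                     : "=r" (sp), "=r" (lr), "=r" (pc))
#else
/* Fallback for other architectures: nothing is captured. */
#define HAVE_CAPTURE_REGS 0
#define CAPTURE_REGS(sp, lr, pc) \
    do { (sp) = 0; (lr) = 0; (pc) = 0; } while (0)
#endif

/* Stand-in for the real backtrace printer (hypothetical stub). */
static void show_backtrace(unsigned long sp, unsigned long lr,
                           unsigned long pc)
{
    printf("sp=%#lx lr=%#lx pc=%#lx\n", sp, lr, pc);
}

/* Capture, print, then trap via the builtin instead of a "trap" asm. */
#define BUG_SKETCH()                       \
    do {                                   \
        unsigned long sp_, lr_, pc_;       \
        CAPTURE_REGS(sp_, lr_, pc_);       \
        show_backtrace(sp_, lr_, pc_);     \
        __builtin_trap();                  \
    } while (0)

/* Returns 1 if the captured values look plausible for this build. */
static int capture_selftest(void)
{
    unsigned long sp, lr, pc;

    CAPTURE_REGS(sp, lr, pc);
    show_backtrace(sp, lr, pc);
    if (!HAVE_CAPTURE_REGS)
        return sp == 0 && lr == 0 && pc == 0;
    return sp != 0 && pc != 0;
}
```

Calling capture_selftest() exercises the capture without trapping; BUG_SKETCH() is the shape a BUG-style macro might take.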
Re: [XenPPC] [xenppc-unstable] [XEN][POWERPC] SCOM access is fully known and working
+/* these give iface errors because the address is ambiguous after
+ * the above bit dropping */
+BUG_ON(addr == 0x8000);
Anything with the high bit set isn't available via SCOMC/SCOMD, only via the external interfaces.
+/* WARNING! older 970s (pre FX) shift the bits right 1 position */
They also don't have the exact same stuff at the exact same registers -- SCOM is very CPU-specific, check every one you want to use. That is, if you do the fix for the shifted bits; if not, don't bother ;-)
+if (c.bits.iface_error)
+udelay(10);
Why the udelay()?
+/* SCOMC addresses are 16bit but we are given 24 bits in the
+ * books. The low order 8 bits are some kinda parity thing and should
+ * be ignored */
The low bit is the odd parity of the other 23 bits; everything accessible via SCOMC/SCOMD has bits 16..22 zero. All these comments are pretty minor, congratz on finally having it working Jimi :-) Segher
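The relationship between the 24-bit address in the books and the 16-bit SCOMC address might be sketched like this. The function names are hypothetical, and two things are assumed rather than taken from the thread: that bits are numbered IBM-style (bit 0 is the most significant of the 24, so bit 23 is the low bit and bits 16..22 are the seven bits above it), and that "odd parity" means the total number of 1 bits, parity bit included, is odd.

```c
#include <stdint.h>

/* Count the set bits in the low 24 bits of x. */
static unsigned popcount24(uint32_t x)
{
    unsigned n = 0;

    for (x &= 0xFFFFFFu; x != 0; x >>= 1)
        n += x & 1u;
    return n;
}

/* Build the 24-bit "book" address from a 16-bit SCOMC address.
 * ASSUMPTION: the low bit makes the total number of 1 bits odd. */
static uint32_t scom_book_addr(uint16_t scomc_addr)
{
    uint32_t upper = (uint32_t)scomc_addr << 8; /* bits 16..22 stay zero */
    uint32_t parity = (popcount24(upper) % 2 == 0) ? 1u : 0u;

    return upper | parity;
}

/* Recover the 16-bit SCOMC address from a 24-bit book address.
 * Returns -1 if the value cannot be reached via SCOMC/SCOMD under the
 * assumptions above (too wide, bits 16..22 non-zero, or bad parity). */
static int32_t scomc_from_book_addr(uint32_t book_addr)
{
    if (book_addr > 0xFFFFFFu)
        return -1;
    if ((book_addr & 0xFEu) != 0)        /* bits 16..22 must be zero */
        return -1;
    if (popcount24(book_addr) % 2 != 1)  /* odd parity check */
        return -1;
    return (int32_t)(book_addr >> 8);
}
```

For instance, scom_book_addr(0x0003) gives 0x000301 under these assumptions, and scomc_from_book_addr(0x000301) gives back 3.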
Re: [XenPPC] Help with JS21 disk solution
@@ -126,6 +126,8 @@ static void u4_inv_entry(ulong pgn)
union dart_ctl dc;
ulong retries = 0;
+return u4_inv_all();
If you need inv_all here, you have a bug elsewhere... Segher
Re: [XenPPC] Help with JS21 disk solution
If you need inv_all here, you have a bug elsewhere... I agree, I'm just trying to corner the beast :) Ok, this seems to work, it's pretty solid, so somehow our invalidation logic is sufficient for network but not disk activity. One theory is that disk uses short lived TCE entries and not batching as network does. So we have a workaround and later we can investigate the single entry issue. Do you map the DART table as M=1 or M=0? U3 should use M=0 (and needs logic to flush the data to main memory), while U4 should use M=1... Segher
Re: [XenPPC] Cannot boot from local disk
Good question, I think it may do iso-9660 and fat16, and I heard that ext2 might be supported, but I'd be surprised if SLOF can do reiserfs. It can do ext2/ext3 and fat12/fat16 (both versions)/fat32. The various CD and DVD filesystems are next on the list of filesystems we want to support, for obvious reasons ;-) Segher
Re: [XenPPC] Spurious interrupt count
One day, all mpic operations will happen in Xen, Ah, you got me worried already. But this MPIC-sharing architecture is temporary, good :-) Segher
Re: [XenPPC] [linux-ppc-2.6] [LINUX][XEN][POWEPRC] def config changes
Why did you change the date? Jimi did not change the date by hand. The Kconfig logic discussed above did so, and I believe that the myriad benefits of that logic outweigh the cost of resolving the trivial merge conflict caused by the date. Perhaps we can investigate removing the date insertion logic if this becomes a real maintainer burden. KCONFIG_NOTIMESTAMP=1 Segher
Re: [XenPPC] [PATCH] Remove timestamp from xen_maple_defconfig
Doesn't this require that everybody build like this all the time? In that case, I'm not sure a one-time checkin makes sense. Yes. You can edit your Makefiles to always include it, if you want. Segher
Re: [XenPPC] Re: [Xen Wiki] Update of XenPPC/Run by JeroneYoung
+ SLOTH firmware loads up. You will PXE boot the Xen image built putting an example of the command line to enter when using Sloth. I was It's not called SLOTH, it's SLOF :-) Segher
Re: [XenPPC] [PATCH] Flush the ERAT early for secondary CPUs
p = probability of success = .997 (897 / 900)
q = probability of failure = .003 (1.0 - .997)
n = number of trials = 2323
X = number of successes = 2323
Applying these to the binomial probability formula, we get:
P(2323) = 2323! / ((2323 - 2323)! * 2323!) * .997**2323 * .003**(2323-2323) = .0009307922
So we conclude that the probability that our trials with this patch applied achieved exactly 2323 successes because of chance alone is .0009.
Not prematurely rounding p to a useless precision gives you p**2323 ~ 0.000428 even. And that just calculates the chance that 2323 tries all succeed given that the chance for one to succeed is 897/900; it doesn't compare two hypotheses at all. Segher
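The effect of rounding p can be checked with a few lines of C. This is a minimal sketch: ipow() and all_succeed() are hypothetical helper names, and an integer-power loop is used only to avoid depending on libm. With X = n the binomial coefficient is 1 and q**0 is 1, so the whole formula reduces to p**n.

```c
/* Raise base to a non-negative integer power by repeated squaring. */
static double ipow(double base, unsigned n)
{
    double result = 1.0;

    while (n > 0) {
        if (n & 1u)
            result *= base;
        base *= base;
        n >>= 1;
    }
    return result;
}

/* Probability that n independent trials all succeed, each with
 * success probability p (the X = n case of the binomial formula). */
static double all_succeed(double p, unsigned n)
{
    return ipow(p, n);
}
```

all_succeed(0.997, 2323) comes out near 0.00093, while all_succeed(897.0 / 900.0, 2323) comes out near 0.00043, which is the gap the reply points at.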
Re: [XenPPC] [PATCH] Disable DPM until code is audited
Most JS20 and JS21 have DPM disabled on the board, What does this mean? SLOF/js2x enables DPM always, for example; there is no hardware override that I'm aware of. According to S9.9 of 970FX UM: Dynamic power management can be disabled in the RAS units by asserting bit[0] in the JTAG register with modifier address 0x000800. Oh okay, that's not a *hardware* disable. Well could be that it's enabled on JS2x, I dunno. which is why we have not seen any SMP problems with them. However the Maple-D and the JS20 model Amos cites both have had problems with one of these two modes. That model seems to be the newest JS20 we've run on. Sounds like the problem manifests itself on all 970FX and no other CPUs from the 970 family. I was under the impression that we had other 970FX js20s but perhaps we do not. The 2.2GHz ones are 970FX, the 1.6GHz ones are not. My question remains: did you try with NAP disabled and DPM enabled? I see, so: HID0[NAP]=1 HID0[DPM]=1 MSR[POW]=1 is NAP and is different than: HID0[NAP]=0 HID0[DPM]=1 MSR[POW]=1 which is something else? NAP=0 DPM=1 POW=whatever is what I was after. DPM is not a power-down mode; it's just (fine-grained) clock gating AFAIK, it shouldn't make anything slower ever. Sure I'll try that. Thanks! Segher
Re: [XenPPC] [PATCH] Disable DPM until code is audited
-/* FIXME Do not set the NAP and DPM bits in HID0 until we have had a
- * chance to audit the safe halt and idle loop code. */
+/* FIXME Do not set the NAP bit in HID0 until we have had a chance
+ * to audit the safe halt and idle loop code. */
hid0.bits.nap = 0; /* NAP */
-hid0.bits.dpm = 0; /* Dynamic Power Management */
+hid0.bits.dpm = 1; /* Dynamic Power Management */
hid0.bits.nhr = 1; /* Not Hard Reset */
hid0.bits.hdice_en = 1; /* enable HDEC */
This works on the JS20 in TRL.
Great to hear DPM is just fine. NAP is expected to have problems on CPUs before 970MP; it needs special setup. Segher
Re: [XenPPC] Machine check: instruction-fetch TLB tablewalk
[NOTE: I'm assuming the decode here is correct]
(XEN) MACHINE CHECK: IS Recoverable
(XEN) SRR1: 0x000cf032
(XEN) 0b11: Exception caused by a hardware uncorrectable
(XEN) error (UE) detected while doing a reload of an
(XEN) instruction-fetch TLB tablewalk.
(XEN)
(XEN) DSISR: 0x0220
There was a parity error in the ITLB CAM array. The hardware won't recover this, but software can (blast the entry away, reload it -- or just blast all TLBs away, probably easier, and performance isn't an issue, this shouldn't happen often at all). You should hardly ever see this. If you add recovery routines, there are some special settings (in HID4 I think) that will introduce bit errors for you; it's almost impossible to test this stuff otherwise, unless you have serious hardware problems :-) Segher
Re: [XenPPC] Re: OF claim untrustworthy?
Repeated identical claims cause an unknown exception at the Forth prompt, but don't succeed. I'm not sure if that becomes an error via the client interface. It does, the throw method would return an OF failure, this is expected. The OF side of the specific client interface call has to catch the error and return the appropriate kind of error, this stuff cannot be automated. Segher
Re: [XenPPC] [xenppc-unstable] [XEN][POWERPC] Linker script simplification broke optimized builds.
[XEN][POWERPC] Linker script simplification broke optimized builds. offending changeset was: changeset: 14126:c759c733f77d So put it back and just update the symbols like a good little boy.
What, you're replacing one bug by a big bag of other bugs? Wouldn't it have been smarter to just fix the bug you had? Is there any bug report about the original problem (I didn't see it)?
+SEARCH_DIR("=/usr/local/lib64"); SEARCH_DIR("=/lib64"); SEARCH_DIR("=/usr/lib64"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib");
For example, this obviously is very very wrong. I don't dare look at the rest of this patch (well I did, but I don't know where to start commenting on it ;-) ) Segher
Re: [XenPPC] [PATCH] Linux shim code for ACM hypercalls
This patch provides the implementation of the shim layer for ACM hypercalls on PPC. Signed-off-by: Stefan Berger [EMAIL PROTECTED] -ENOPATCH Segher