Re: [XenPPC] PHDR link failure testcase

2006-08-16 Thread Segher Boessenkool
Perhaps this is just mythology/warm-n-fuzzy for me, but I really
like having 1 PHDR.

Let me collect my thoughts and come up with a rational reason.


1 PHDR works just as well; the important thing is to explicitly define
your PHDRs in the linker script.


Segher


___
Xen-ppc-devel mailing list
Xen-ppc-devel@lists.xensource.com
http://lists.xensource.com/xen-ppc-devel


Re: [XenPPC] Error creating domain on JS20 (Fw: [Prose-jvm] Brief Status in TRL (2006/08/24))

2006-08-25 Thread Segher Boessenkool

I also probably have an old blade.  But your statement is correct:
Almost all JS20.  However, that is not the same as all the JS20.


Yes indeed.

We hardly ever test on those old machines; there's not many of them
around.  So you're our tester now, heh.  Don't worry: you found one
of the two differences already -- see if you can spot the other ;-)


970, 970FX, 970GX, 970MP, actually.


FWIW: first two of those are the two generations of JS20; last two
are 2-core and 4-core JS21.


Segher



Re: [XenPPC] [PATCH/RFC] Schedule idle domain on secondary processors

2006-08-29 Thread Segher Boessenkool
It is quite stable in that the secondary processors reliably join the
idle domain and wait for free pages to scrub, handling 0x980 interrupts
with no problem.


What's this 980 exception?


Perhaps my phrasing is bad.  I was referring to the hypervisor
decrementer interrupt (hdec).


Ah yes, I forgot, thanks.


However, the domU's sometimes hang during initialization.  When the
domU hangs, it seems the whole machine freezes, including the serial
console.


Most common cause of this is hanging the U3/U4.  Do you have a
hardware debugger to see where this happens?


I had a friend take a look at the state of cpu 0, but everything  
seems ok.

It looks like there is a race and occasionally one of the secondary
processors is hanging the U4.


Doing a cacheable load/store to HT (or something else on U4)
perhaps?


Segher



Re: [XenPPC] [PATCH/RFC] Schedule idle domain on secondary processors

2006-08-29 Thread Segher Boessenkool
Most common cause of this is hanging the U3/U4.  Do you have a hardware
debugger to see where this happens?


It's been my experience that RISCWatch isn't very helpful in these
situations (e.g. can't stop the processor). When the northbridge goes,
JTAG becomes unhappy.


Works fine for me, don't know what the difference is -- different
debugger?  The CPU JTAG chain is not connected to the U4 in any
way, fwiw.

What happens is, the API (EI, whatever -- the CPU bus) becomes
unusable after the bad I/O to U4; the CPU waits forever for the
bad load/store to finish.


Segher



Re: Hang on boot Was: [XenPPC] [xenppc-unstable] [POWERPC][XEN] Detect bad spurious interrupt condition and panic instead of hang

2006-09-12 Thread Segher Boessenkool
Sometimes when Xen is booted and we let Linux init the MPIC for the
second time Xen could end up in a loop where the CPU is constantly
being interrupted by the MPIC.


Because of console buffering, the last message you see is some message
from early kernel boot.

Anyway.. we detect this now and you see a panic.


There seems to be a problem with the U3/U4 MPIC, where edge-
triggered interrupts are delivered to more than one CPU.  Every
CPU other than the one that ACKed it first, will get the spurious
vector (so functionally, the impact of this bug isn't that bad;
performance-wise it might be different).

The UART IRQ [on JS2x and Maple] is an edge IRQ; if you produce
console output for every spurious interrupt, you'll get a nice
little storm.  Is that what's happening?
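
To illustrate the point about console output, here is a minimal sketch
(not the actual Xen code) that counts spurious vectors and only asks for
a report when the count reaches a power of two, so that reporting cannot
itself feed the storm.  The names spurious_count and note_spurious() and
the reporting policy are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical counter; real code would presumably keep one per CPU. */
static unsigned long spurious_count;

/*
 * Record one spurious vector.  Returns true only when the running count
 * is a power of two (1, 2, 4, 8, ...), so the number of reports grows
 * logarithmically with the number of spurious interrupts.
 */
static inline bool note_spurious(void)
{
    unsigned long n = ++spurious_count;

    return (n & (n - 1)) == 0;
}
```

With this policy, 1000 spurious interrupts would produce 10 reports
(at counts 1, 2, 4, ..., 512) instead of 1000 console messages.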


Yes, I believe, it has something to do with temperature.


Interesting observation, never thought of investigating that --
it's in line with my suspicion that something in the MPIC is
metastable though.


Segher



Re: [XenPPC] [PATCH] Print backtrace on BUG

2006-09-21 Thread Segher Boessenkool

Bah, it's too early for GCC asm:

+asm("mr %0, 1" : "=r" (sp));                                       \
+asm("mflr %0" : "=r" (lr));                                        \
+asm("mflr %0; bl 1f; 1: mflr %1; mtlr %0" : "=r" (tp), "=r" (pc)); \


asm("bl $+4 ; mflr %0; mtlr %1" : "=r"(pc) : "r"(lr));              \

+show_backtrace(sp, lr, pc);                                        \
+__asm__ __volatile__ ( "trap" );                                   \
+} while ( 0 )


...and the one asm where you put volatile on is the only one that
doesn't need it :-)  (and no __ is needed either).

Alternatively (and preferred), you can make a single statement out
of the first three asm statements.


In fact, you _have_ to make those three into one; even volatile asms
can be reordered by the compiler, if there's no data dependency.

The trap asm is always volatile (it has no parameters); it can still
be reordered though.  You can use __builtin_trap() here instead.
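
A sketch of what folding the three reads into one statement could look
like.  The macro name CAPTURE_CONTEXT, the stub show_backtrace_stub()
and the non-PowerPC fallback are assumptions made for illustration; they
are not taken from the patch under review.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Read r1, LR and the current PC in ONE asm statement, so the compiler
 * has nothing to reorder between the three reads.
 */
#if defined(__powerpc__) || defined(__powerpc64__)
#define CAPTURE_CONTEXT(sp, lr, pc)                                   \
    __asm__ __volatile__(                                             \
        "mr   %0, 1\n\t"   /* sp = r1 (stack pointer)          */     \
        "mflr %1\n\t"      /* lr = current link register       */     \
        "bl   1f\n"        /* LR = address of local label 1    */     \
        "1:\tmflr %2\n\t"  /* pc = that address                */     \
        "mtlr %1"          /* put the original LR back         */     \
        : "=r"(sp), "=r"(lr), "=r"(pc))
#else
/* Stand-in so the sketch builds on other targets; pc is approximated. */
#define CAPTURE_CONTEXT(sp, lr, pc)                                   \
    do {                                                              \
        (sp) = (uintptr_t)__builtin_frame_address(0);                 \
        (lr) = (uintptr_t)__builtin_return_address(0);                \
        (pc) = (lr);                                                  \
    } while (0)
#endif

/* Hypothetical replacement for the real backtrace printer. */
void show_backtrace_stub(uintptr_t sp, uintptr_t lr, uintptr_t pc)
{
    printf("sp=%#lx lr=%#lx pc=%#lx\n",
           (unsigned long)sp, (unsigned long)lr, (unsigned long)pc);
}

/* BUG-style macro: capture, print, then trap via the compiler builtin. */
#define BUG_SKETCH()                                                  \
    do {                                                              \
        uintptr_t sp_, lr_, pc_;                                      \
        CAPTURE_CONTEXT(sp_, lr_, pc_);                               \
        show_backtrace_stub(sp_, lr_, pc_);                           \
        __builtin_trap();                                             \
    } while (0)
```

The three outputs get distinct registers because the statement has no
inputs, and LR is restored before the statement ends, so nothing outside
it should observe the temporary change.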


Segher



Re: [XenPPC] [xenppc-unstable] [XEN][POWERPC] SCOM access is fully known and working

2006-09-21 Thread Segher Boessenkool

+/* these give iface errors because the address is ambiguous after
+ * the above bit dropping */
+BUG_ON(addr == 0x8000);


Anything with the high bit set isn't available via SCOMC/SCOMD,
only via the external interfaces.

+/* WARNING! older 970s (pre FX) shift the bits right 1 position */


They also don't have the exact same stuff at the exact same
registers -- SCOM is very CPU-specific, check every one you
want to use.  That is, if you do the fix for the shifted bits,
if not, don't bother ;-)


+if (c.bits.iface_error)
+udelay(10);


Why the udelay()?


+/* SCOMC addresses are 16bit but we are given 24 bits in the
+ * books. The low order 8 bits are some kinda parity thing and should
+ * be ignored */


The low bit is the odd parity of the other 23 bits; everything
accessible via SCOMC/SCOMD has bits 16..22 zero.
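
The address layout described here can be sketched as follows.  This
assumes IBM bit numbering (bit 0 is the most significant of the 24 bits),
so the 16-bit SCOMC address sits in bits 0..15, bits 16..22 are zero and
bit 23 is the parity bit.  The function names are made up for the
example.

```c
#include <stdint.h>

/*
 * Build the 24-bit address as printed in the books from the 16-bit
 * address used with SCOMC.  The low bit is chosen so that the total
 * number of 1 bits in the 24-bit value is odd (odd parity).
 */
static inline uint32_t scom_addr24(uint16_t addr16)
{
    uint32_t v = (uint32_t)addr16 << 8;   /* bits 16..22 stay zero */
    unsigned ones = 0;

    for (uint32_t t = v; t != 0; t &= t - 1)  /* count set bits */
        ones++;

    if ((ones & 1) == 0)   /* even so far: parity bit must be 1 */
        v |= 1;
    return v;
}

/* Inverse direction: drop the parity bit and the seven zero bits. */
static inline uint16_t scom_addr16(uint32_t addr24)
{
    return (uint16_t)(addr24 >> 8);
}
```

Under these assumptions scom_addr24(0x0aa0) gives 0x0aa001 and
scom_addr24(0x0008) gives 0x000800.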


All these comments are pretty minor, congratz on finally having
it working Jimi :-)


Segher



Re: [XenPPC] Help with JS21 disk solution

2006-09-25 Thread Segher Boessenkool

@@ -126,6 +126,8 @@ static void u4_inv_entry(ulong pgn)
 union dart_ctl dc;
 ulong retries = 0;
+return u4_inv_all();


If you need inv_all here, you have a bug elsewhere...


Segher



Re: [XenPPC] Help with JS21 disk solution

2006-09-27 Thread Segher Boessenkool

If you need inv_all here, you have a bug elsewhere...


I agree, I'm just trying to corner the beast :)


Ok, this seems to work, it's pretty solid, so somehow our invalidation
logic is sufficient for network but not disk activity.  One theory is
that disk uses short lived TCE entries and not batching as network does.


So we have a workaround and later we can investigate the single entry
issue.


Do you map the DART table as M=1 or M=0?  U3 should use M=0
(and needs logic to flush the data to main memory), while U4
should use M=1...


Segher



Re: [XenPPC] Cannot boot from local disk

2006-10-06 Thread Segher Boessenkool
Good question, I think it may do iso-9660 and fat16, and I heard that
ext2 might be supported, but I'd be surprised if SLOF can do reiserfs


It can do ext2/ext3 and fat12/fat16 (both versions)/fat32.

The various CD and DVD filesystems are next on the list of
filesystems we want to support, for obvious reasons ;-)


Segher



Re: [XenPPC] Spurious interrupt count

2006-10-07 Thread Segher Boessenkool

One day, all mpic operations will happen in Xen,


Ah, you got me worried already.  But this MPIC-sharing
architecture is temporary, good :-)


Segher



Re: [XenPPC] [linux-ppc-2.6] [LINUX][XEN][POWEPRC] def config changes

2006-10-26 Thread Segher Boessenkool

Why did you change the date?


Jimi did not change the date by hand.  The Kconfig logic discussed above
did so, and I believe that the myriad benefits of that logic outweigh
the cost of resolving the trivial merge conflict caused by the date.

Perhaps we can investigate removing the date insertion logic if this
becomes a real maintainer burden.


KCONFIG_NOTIMESTAMP=1


Segher



Re: [XenPPC] [PATCH] Remove timestamp from xen_maple_defconfig

2006-10-26 Thread Segher Boessenkool

Doesn't this require that everybody build like this all the time? In
that case, I'm not sure a one-time checkin makes sense.


Yes.  You can edit your Makefiles to always include it, if
you want.


Segher



Re: [XenPPC] Re: [Xen Wiki] Update of XenPPC/Run by JeroneYoung

2006-11-01 Thread Segher Boessenkool

+ SLOTH firmware loads up. You will PXE boot the Xen image built


putting an example of the command line to enter when using Sloth. I was

It's not called SLOTH, it's SLOF :-)


Segher



Re: [XenPPC] [PATCH] Flush the ERAT early for secondary CPUs

2006-11-09 Thread Segher Boessenkool

 p = probability of success = .997 (897 / 900)
 q = probability of failure = .003 (1.0 - .997)
 n = number of trials       = 2323
 X = number of successes    = 2323

Applying these to the binomial probability formula, we get:

 P(2323) = 2323! / ((2323 - 2323)! * 2323!) * .997**2323 * .003**(2323-2323)
         = .0009307922

So we conclude that the probability that our trials with this patch
applied achieved exactly 2323 successes because of chance alone is .0009.


Not prematurely rounding p to a useless precision gives you

p**2323 ~ 0.000428

even.  And that just calculates the chance that 2323 tries all
succeed given that the chance for one to succeed is 897/900; it
doesn't compare two hypotheses at all.
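
The effect of the rounding can be checked with a few lines of C.  This
is only a sanity check of the arithmetic above; the helper name power()
is made up, and plain repeated multiplication is used to avoid depending
on libm.

```c
/* base**n by repeated multiplication; adequate for this sanity check. */
static inline double power(double base, unsigned n)
{
    double r = 1.0;

    while (n--)
        r *= base;
    return r;
}

/* Chance that all n independent tries succeed when a single try
 * succeeds with probability successes / trials. */
static inline double all_succeed(unsigned successes, unsigned trials,
                                 unsigned n)
{
    return power((double)successes / (double)trials, n);
}
```

all_succeed(897, 900, 2323) comes out at about 0.000428, while
power(0.997, 2323) gives about 0.000931, so rounding p to three digits
more than doubles the result.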


Segher



Re: [XenPPC] [PATCH] Disable DPM until code is audited

2006-12-02 Thread Segher Boessenkool

Most JS20 and JS21 have DPM disabled on the board,


What does this mean?  SLOF/js2x enables DPM always, for
example; there is no hardware override that I'm aware of.


According to S9.9 of 970FX UM:
  Dynamic power management can be disabled in the RAS units by
  asserting bit[0] in the JTAG register with modifier address 0x000800.


Oh okay, that's not a *hardware* disable.  Well could be that
it's enabled on JS2x, I dunno.

which is why we have not seen any SMP problems with them.  However the
Maple-D and the JS20 model Amos cites both have had problems with one
of these two modes.  That model seems to be the newest JS20 we've run on.


Sounds like the problem manifests itself on all 970FX and
no other CPUs from the 970 family.


I was under the impression that we had other 970FX js20s but perhaps
we do not


The 2.2GHz ones are 970FX, the 1.6GHz ones are not.


My question remains: did you try with NAP disabled and
DPM enabled?


I see, so:
  HID0[NAP]=1
  HID0[DPM]=1
  MSR[POW]=1

is NAP and is different than:
  HID0[NAP]=0
  HID0[DPM]=1
  MSR[POW]=1
which is something else?


NAP=0 DPM=1 POW=whatever is what I was after.

DPM is not a power-down mode; it's just (fine-grained) clock
gating AFAIK, it shouldn't make anything slower ever.
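
The distinction between the two configurations can be written down as a
small model.  This is only a reading of the exchange above: the struct
and function names are hypothetical and the fields do not reflect the
real HID0 or MSR bit layout.

```c
#include <stdbool.h>

/* Hypothetical model of the three control bits named in the thread. */
struct pm_bits {
    bool hid0_nap;  /* HID0[NAP] */
    bool hid0_dpm;  /* HID0[DPM] */
    bool msr_pow;   /* MSR[POW]  */
};

/* As read from the thread: nap needs both HID0[NAP]=1 and MSR[POW]=1. */
static inline bool enters_nap(struct pm_bits b)
{
    return b.hid0_nap && b.msr_pow;
}

/* DPM (clock gating) depends on HID0[DPM] alone, whatever POW is. */
static inline bool clock_gating_enabled(struct pm_bits b)
{
    return b.hid0_dpm;
}
```

The configuration asked about, NAP=0 DPM=1, never naps in this model
regardless of POW, yet still has clock gating enabled.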


Sure I'll try that.


Thanks!


Segher



Re: [XenPPC] [PATCH] Disable DPM until code is audited

2006-12-04 Thread Segher Boessenkool
-/* FIXME Do not set the NAP and DPM bits in HID0 until we have had a
- * chance to audit the safe halt and idle loop code. */
+/* FIXME Do not set the NAP bit in HID0 until we have had a chance
+ * to audit the safe halt and idle loop code. */
 hid0.bits.nap = 0;  /* NAP */
-hid0.bits.dpm = 0;  /* Dynamic Power Management */
+hid0.bits.dpm = 1;  /* Dynamic Power Management */
 hid0.bits.nhr = 1;  /* Not Hard Reset */
 hid0.bits.hdice_en = 1; /* enable HDEC */


This works on the JS20 in TRL.


Great to hear DPM is just fine.  NAP is expected to have
problems on CPUs before 970MP, it needs special setup.


Segher



Re: [XenPPC] Machine check: instruction-fetch TLB tablewalk

2006-12-17 Thread Segher Boessenkool

[NOTE: I'm assuming the decode here is correct]


(XEN) MACHINE CHECK: IS Recoverable



(XEN) SRR1: 0x000cf032
(XEN) 0b11: Exception caused by a hardware uncorrectable
(XEN)   error (UE) detected while doing a reload of an
(XEN)   instruction-fetch TLB tablewalk.
(XEN)
(XEN) DSISR: 0x0220


There was a parity error in the ITLB CAM array.  The hardware
won't recover this, but software can (blast the entry away,
reload it -- or just blast all TLBs away, probably easier, and
performance isn't an issue, this shouldn't happen often at all).

You should hardly ever see this.  If you add recovery routines,
there are some special settings (in HID4 I think) that will
introduce bit errors for you, it's almost impossible to test
this stuff otherwise, unless you have serious hardware problems :-)


Segher



Re: [XenPPC] Re: OF claim untrustworthy?

2007-01-10 Thread Segher Boessenkool

Repeated identical claims cause an unknown exception at the Forth
prompt, but don't succeed. I'm not sure if that becomes an error via
the client interface.


It does, the throw method would return an OF failure, this is expected.


The OF side of the specific client interface call has to
catch the error and return the appropriate kind of error,
this stuff cannot be automated.


Segher



Re: [XenPPC] [xenppc-unstable] [XEN][POWERPC] Linker script simplification broke optimized builds.

2007-01-23 Thread Segher Boessenkool

[XEN][POWERPC] Linker script simplification broke optimized builds.

offending changeset was: changeset:   14126:c759c733f77d
So put it back and just update the symbols like a good little boy.


What, you're replacing one bug by a big bag of other
bugs?  Wouldn't it have been smarter to just fix the
bug you had?  Is there any bug report about the original
problem (I didn't see it)?

+SEARCH_DIR(=/usr/local/lib64); SEARCH_DIR(=/lib64); SEARCH_DIR(=/usr/lib64); SEARCH_DIR(=/usr/local/lib); SEARCH_DIR(=/lib); SEARCH_DIR(=/usr/lib);


For example, this obviously is very very wrong.

I don't dare look at the rest of this patch (well I
did, but I don't know where to start commenting on
it ;-) )


Segher



Re: [XenPPC] [PATCH] Linux shim code for ACM hypercalls

2007-06-04 Thread Segher Boessenkool

This patch provides the implementation of the shim layer for ACM
hypercalls on PPC.

Signed-off-by: Stefan Berger [EMAIL PROTECTED]


-ENOPATCH


Segher

