Quoting Roland Dreier <[EMAIL PROTECTED]>:
> Subject: Re: [openib-general] Re: Re: Userspace testing results (many 
> kernels, many svn trees)
> 
>     Michael> Could the high/low bits be swapped?  What happens if you
>     Michael> change cycles_t from long long to long?  Could you try
>     Michael> running the clock_test utility?
> 
> What seems to be happening is that mftb is giving the low 32 bits of
> the timebase (as expected on ppc32).  Since your get_cycles() is
> returning a long long, those 32 bits get put in the most significant
> 32 bits of the return value, and the low 32 bits are garbage (ppc is
> big endian).
> 
> If I compile clock_test for ppc32, I see that get_cycles() compiles to:
> 
>       1000064c <get_cycles>:
>       1000064c:       7c 6c 42 e6     mftb    r3
>       10000650:       4e 80 00 20     blr
> 
> For comparison, a function like
> 
>       unsigned long long blah(void) { return 0x100000002ull; }
> 
> compiles to
> 
>       00000000 <blah>:
>          0:   38 60 00 01     li      r3,1
>          4:   38 80 00 02     li      r4,2
>          8:   4e 80 00 20     blr
> 
> In other words the convention on ppc32 is that unsigned long long
> return values have the high 32 bits in r3 and the low 32 bits in r4.
> 
> I think you want to use something like
> 
>       typedef unsigned long long cycles_t;
>       static inline cycles_t get_cycles(void)
>       {
>               unsigned long low, hi, hi2;
>       
>               do {
>                       asm volatile ("mftbu %0" : "=r" (hi));
>                       asm volatile ("mftb  %0" : "=r" (low));
>                       asm volatile ("mftbu %0" : "=r" (hi2));
>               } while (hi != hi2);
>       
>               return ((unsigned long long) hi << 32) | low;
>       }
> 
> for ppc32.
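
To see why the mftbu/mftb/mftbu loop above re-reads the upper half, here is a
minimal portable sketch of the same pattern. The names fake_timebase,
read_upper, read_lower and read_timebase64 are hypothetical stand-ins (they
are not part of perftest), and the counter is simulated so the code runs on
any machine: every read advances it, which lets a carry from the low word
into the high word land between two reads.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit counter readable only 32 bits at a time. */
struct fake_timebase {
    uint64_t value;  /* current counter value */
    uint64_t step;   /* how far the counter advances on every read */
};

/* Plays the role of mftbu: return the upper 32 bits, then let time pass. */
static uint32_t read_upper(struct fake_timebase *tb)
{
    uint32_t hi = (uint32_t)(tb->value >> 32);
    tb->value += tb->step;
    return hi;
}

/* Plays the role of mftb on ppc32: return the lower 32 bits, then let time pass. */
static uint32_t read_lower(struct fake_timebase *tb)
{
    uint32_t low = (uint32_t)tb->value;
    tb->value += tb->step;
    return low;
}

/* Retry until the upper half is the same before and after the lower read,
 * so that hi and low are known to belong to the same 64-bit value. */
static uint64_t read_timebase64(struct fake_timebase *tb)
{
    uint32_t hi, low, hi2;

    do {
        hi  = read_upper(tb);
        low = read_lower(tb);
        hi2 = read_upper(tb);
    } while (hi != hi2);

    return ((uint64_t)hi << 32) | low;
}
```

Starting the simulated counter at 0x1FFFFFFFF with a step of 1 forces a carry
during the first pass; a single unguarded read would pair the old upper word
with the new lower word and come out almost 2^32 ticks too low, while the loop
retries and returns a consistent value.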

I'm convinced. I moved it back to 32 bits.

> However, this is not quite enough to make things work on
> all powerpc systems, because the timebase does not necessarily run at
> the same speed as the CPU.  For example, on an IBM JS20 blade,
> clock_test prints
> 
>       1 sec = 6536.8 usec
>       1 sec = 6537.05 usec
> 
> (both as a 32-bit and 64-bit executable) because, as /proc/cpuinfo shows:
> 
>       processor       : 0
>       cpu             : PPC970FX, altivec supported
>       clock           : 2194.624509MHz
>       revision        : 3.0
>       
>       processor       : 1
>       cpu             : PPC970FX, altivec supported
>       clock           : 2194.624509MHz
>       revision        : 3.0
>       
>       timebase        : 14318000
>       machine         : CHRP IBM,8842-P2C
> 
> the timebase runs at about 14.3 MHz, or approx 153 times slower than
> the CPU clock.
> 
> I'm not sure how you want to fix this in perftest.
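
The arithmetic behind the numbers quoted above can be sketched as follows.
The helper names ticks_to_usec and clock_ratio are made up for illustration,
and the sketch assumes the timebase frequency is known (here taken from the
/proc/cpuinfo output), which is not necessarily how clock_test obtains it.

```c
#include <stdint.h>

/* Convert a tick count to microseconds, given the counter frequency in Hz. */
static double ticks_to_usec(uint64_t ticks, double timebase_hz)
{
    return (double)ticks * 1e6 / timebase_hz;
}

/* How many CPU clock cycles elapse per timebase tick. */
static double clock_ratio(double cpu_hz, double timebase_hz)
{
    return cpu_hz / timebase_hz;
}
```

With the JS20 values, clock_ratio(2194624509.0, 14318000.0) is about 153.3.
Dividing one second's worth of ticks (14318000) by the CPU clock in MHz
(2194.62) instead of the timebase in MHz gives roughly 6524 usec, which is
close to the 6536.8 usec printed by clock_test.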

I just added some cycle calibration code to get_cpu_mhz().
Check it out (you can just run clock_test).
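
For readers without the tree at hand, one way such a calibration could look
is sketched below. This is an illustration only, not the code that went into
get_cpu_mhz(): the names counter_fn, now_nsec, fake_counter and
measure_counter_mhz are assumptions, and fake_counter is a stand-in that
ticks once per nanosecond so the sketch runs on any machine. On ppc the
counter would be get_cycles() instead.

```c
#include <stdint.h>
#include <time.h>

/* Any function that returns a free-running tick count. */
typedef uint64_t (*counter_fn)(void);

/* Wall-clock time in nanoseconds (C11 timespec_get; not monotonic). */
static uint64_t now_nsec(void)
{
    struct timespec ts;

    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Stand-in counter for illustration: one tick per nanosecond, i.e. 1000 MHz. */
static uint64_t fake_counter(void)
{
    return now_nsec();
}

/* Count ticks across a busy-wait of at least interval_nsec and return the
 * counter frequency in MHz (ticks per microsecond). */
static double measure_counter_mhz(counter_fn read_counter, uint64_t interval_nsec)
{
    uint64_t t0 = now_nsec();
    uint64_t c0 = read_counter();
    uint64_t t1, c1;

    do {
        t1 = now_nsec();
    } while (t1 - t0 < interval_nsec);
    c1 = read_counter();

    /* ticks per nanosecond, times 1000, is ticks per microsecond */
    return (double)(c1 - c0) * 1000.0 / (double)(t1 - t0);
}
```

Calling measure_counter_mhz(fake_counter, 20000000) should report something
near 1000. Measuring the counter against the wall clock like this yields the
rate of whatever the counter really is, so on the JS20 it would find the
timebase rate rather than the CPU clock from /proc/cpuinfo.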

-- 
MST
_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general