On Apr 14, 2010, at 6:01 AM, Mogens Lindholdt Lauridsen wrote:

> First of all... I don't know what went wrong, but I apparently  
> didn't have your patch in the valgrind binary when I ran that test.  
> Sorry.

No worries, it's easy to do.

> I have now tried your test program without valgrind, and it hits an  
> assert:
> # ./dcbzl
> dcbzl: dcbzl.c:38: main: Assertion `block[(128)+i] == 0x00' failed.
> Aborted
> #
>
> It seems like the PPC core I use (e300c4) handles dcbzl in another  
> way. It only clears 32 bytes and not 128 as on your PPC970. A  
> colleague of mine found this comment in FFMpeg code  
> (libavcodec/ppc/dsputil_ppc.c):
[...]
> The page 
> http://developer.apple.com/legacy/mac/library/technotes/tn/tn2087.html 
>  explains it quite well.
>
> I have looked at the FFMpeg code and they run the dcbzl instruction  
> and check how it works, to see if they can use it.

That may not be a valid instruction for your particular processor.  I  
was working under the assumption that you were using a PPC970.  Does  
my test program run correctly *outside* of valgrind, or does it give  
you a SIGILL?

This page discusses dcbz versus dcbzl: 
http://www.powerdeveloper.org/forums/viewtopic.php?p=9842&sid=b54491befbf4ec2df4844d4d09657ddb#9842

My PPC970 user manual generally agrees with that comment.

Since the only difference between a dcbz and dcbzl is that bit 10 (in  
PPC-manual-speak, bit 21 in normal/VEX numbering) is 0/1, it may be  
that your e300 core is just ignoring that bit.  However, those bits  
(PPC:6-10 / VEX:25-21) are reserved, which means the instruction can  
result in boundedly undefined behavior if they aren't zeroed.
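
To make the bit numbering concrete, here is a minimal sketch in C that
builds both encodings from the X-form fields (primary opcode 31,
extended opcode 1014).  The function and macro names are my own
illustration and are not taken from valgrind/VEX or any manual; shifts
use the LSB-0 ("normal/VEX") numbering.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions in LSB-0 numbering; PPC-manual numbering in comments.
   All names here are hypothetical, chosen only for this sketch. */
#define PPC_PRIMARY_OPCODE_31  (31u << 26)    /* PPC bits 0-5              */
#define PPC_XO_DCBZ            (1014u << 1)   /* PPC bits 21-30            */
#define PPC_DCBZL_BIT          (1u << 21)     /* PPC bit 10                */
#define PPC_DCBZ_RESERVED      (0x1Fu << 21)  /* PPC bits 6-10 (reserved)  */

/* Encode "dcbz rA,rB": rA sits in PPC bits 11-15, rB in PPC bits 16-20. */
uint32_t encode_dcbz(unsigned ra, unsigned rb)
{
    return PPC_PRIMARY_OPCODE_31 | ((ra & 31u) << 16) | ((rb & 31u) << 11)
           | PPC_XO_DCBZ;
}

/* dcbzl is the same word with PPC bit 10 (LSB-0 bit 21) set. */
uint32_t encode_dcbzl(unsigned ra, unsigned rb)
{
    return encode_dcbz(ra, rb) | PPC_DCBZL_BIT;
}

/* Nonzero if any of the bits that are reserved for plain dcbz are set. */
int sets_reserved_bits(uint32_t insn)
{
    return (insn & PPC_DCBZ_RESERVED) != 0;
}

/* Sanity checks: the two encodings differ in exactly that one bit. */
void check_encodings(void)
{
    assert(encode_dcbz(0, 3)  == 0x7C001FECu);
    assert(encode_dcbzl(0, 3) == 0x7C201FECu);
    assert((encode_dcbz(0, 3) ^ encode_dcbzl(0, 3)) == PPC_DCBZL_BIT);
    assert(!sets_reserved_bits(encode_dcbz(0, 3)));
    assert(sets_reserved_bits(encode_dcbzl(0, 3)));
}
```

A core that ignores the reserved field would decode both words as a
plain dcbz, which would be consistent with the 32-byte behavior you
are seeing, though I can't confirm that is what the e300 does.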

See pages 111 ("Boundedly Undefined") and 358 ("dcbz encoding") of the  
e300 manual: http://www.freescale.com/files/32bit/doc/ref_manual/e300coreRM.pdf

> So I have changed your test program so the BLOCK_SIZE is 32. And now  
> the program can run on my target. However valgrind crashed because  
> of the "vassert(lineszB == 128);" in your patch.
> I have also changed this, and now it works!


I'm glad this gets you unstuck, but I think that assertion (or perhaps  
a more informative error message) is still appropriate for the dcbzl  
instruction.  I think that FFMpeg shouldn't be using that instruction  
on older/embedded PPC processors.

But as I said before, I'm far from being a PPC expert, so I could be  
totally wrong on this.  I also don't know what the typical valgrind/ 
VEX behavior is in these cases of minor platform variation.  I would  
guess that it tries to "do the right thing" most of the time, which in  
this case would be to just zero the cache line size given by lineszB.   
This is especially true if most/all non-970 processors ignore those  
reserved bits.
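
As a rough illustration of what "zero the cache line size given by
lineszB" would mean, here is a small portable C model.  It is only a
sketch under my own assumptions: it is not the VEX implementation,
the names are hypothetical, and it aligns buffer offsets rather than
real effective addresses.  The probe function mimics the idea of the
FFMpeg check (fill a buffer with 0xFF, zero one line, count the zero
bytes), with the emulation standing in for the real instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of dcbz/dcbzl on a byte array: zero the whole line that contains
   `offset`.  Returns the number of bytes zeroed, or 0 if the line size is
   not a power of two or the line does not fit inside the buffer. */
size_t zero_cache_line(uint8_t *mem, size_t mem_size, size_t offset,
                       size_t linesz_b)
{
    size_t start;

    if (linesz_b == 0 || (linesz_b & (linesz_b - 1)) != 0)
        return 0;
    start = offset & ~(linesz_b - 1);      /* round down to line boundary */
    if (offset >= mem_size || linesz_b > mem_size - start)
        return 0;
    memset(mem + start, 0, linesz_b);
    return linesz_b;
}

/* Probe in the style of the FFMpeg check: pre-fill with 0xFF, zero one
   line in the middle, then count how many bytes were actually cleared. */
size_t probe_zeroed_bytes(size_t linesz_b)
{
    uint8_t buf[1024];
    size_t i, zeroed = 0;

    memset(buf, 0xFF, sizeof buf);
    zero_cache_line(buf, sizeof buf, sizeof buf / 2, linesz_b);
    for (i = 0; i < sizeof buf; i++)
        if (buf[i] == 0)
            zeroed++;
    return zeroed;
}

/* The two line sizes seen in this thread: 32 (e300c4) and 128 (PPC970). */
void demo_probe(void)
{
    assert(probe_zeroed_bytes(32) == 32);
    assert(probe_zeroed_bytes(128) == 128);
}
```

With something along these lines, a hard-coded 128 would not be
needed; the line size would just be a parameter, and an unsupported
value could produce an informative error instead of a bare assertion.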

-Dave


_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users
