>That may not be a valid instruction for your particular processor.  I 
>was working under the assumption that you were using a PPC970.  Does 
>my test program run correctly *outside* of valgrind, or does it give 
>you a SIGILL?
If I change BLOCK_SIZE from 128 to 32, your test program runs fine on 
my CPU. I will investigate whether this is pure luck or whether dcbzl 
really is supported. 

I will also take a deeper look into how FFmpeg is handling/using this 
instruction.

Thanks very much for your help so far!

-Mogens






Dave Goodell <good...@mcs.anl.gov> 
14-04-2010 19:35

To
Mogens Lindholdt Lauridsen <m...@bang-olufsen.dk>
cc
valgrind-users@lists.sourceforge.net
Subject
Re: [Valgrind-users] PPC: unhandled instruction: 0x7C2907EC






On Apr 14, 2010, at 6:01 AM, Mogens Lindholdt Lauridsen wrote:

> First of all... I don't know what went wrong, but I apparently 
> didn't have your patch in the valgrind binary when I ran that test. 
> Sorry.

No worries, it's easy to do.

> I have now tried your test program without valgrind, and it hits an 
> assert:
> # ./dcbzl
> dcbzl: dcbzl.c:38: main: Assertion `block[(128)+i] == 0x00' failed.
> Aborted
> #
>
> It seems like the PPC core I use (e300c4) handles dcbzl in another 
> way. It only clears 32 bytes and not 128 as on your PPC970. A 
> colleague of mine found this comment in the FFmpeg code 
> (libavcodec/ppc/dsputil_ppc.c):
[...]
> The page 
> http://developer.apple.com/legacy/mac/library/technotes/tn/tn2087.html 
> explains it quite well.
>
> I have looked at the FFmpeg code: it runs the dcbzl instruction 
> and checks how it behaves, to see whether it can be used.
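
A minimal sketch of that run-and-check idea, not FFmpeg's actual code: 
fill a buffer with 0xFF, zero one block in the middle, and count how 
many bytes ended up zero. The names probe_zeroed_bytes, ppc_dcbz and 
simulate_clear32 are hypothetical. The PowerPC path issues plain dcbz 
and assumes the buffer is in cacheable memory; simulate_clear32 is only 
a stand-in so the counting logic can be exercised on other hosts.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback type: zero "one block" starting at p. */
typedef void (*zero_block_fn)(void *p);

/* Stand-in for non-PowerPC hosts: pretends the block is 32 bytes. */
static void simulate_clear32(void *p)
{
    memset(p, 0, 32);
}

#if defined(__powerpc__) || defined(__ppc__)
/* Real instruction; assumes p points into cacheable memory. */
static void ppc_dcbz(void *p)
{
    __asm__ volatile("dcbz 0,%0" : : "r"(p) : "memory");
}
#endif

/* Returns how many bytes the callback actually zeroed. */
static size_t probe_zeroed_bytes(zero_block_fn zero_block)
{
    static unsigned char buf[1024];
    uintptr_t base = (uintptr_t)buf;
    /* First 256-byte-aligned address at least 256 bytes into buf, so a
       clear of up to 256 bytes stays inside the buffer. */
    uintptr_t aligned = (base + 511) & ~(uintptr_t)255;
    unsigned char *target = buf + (aligned - base);
    size_t i, zeroed = 0;

    memset(buf, 0xFF, sizeof buf);
    zero_block(target);

    for (i = 0; i < sizeof buf; i++)
        if (buf[i] == 0x00)
            zeroed++;
    return zeroed;
}
```

On a PowerPC target, probe_zeroed_bytes(ppc_dcbz) would report the 
block size the core really clears, which the caller can compare 
against the size it expects before relying on the instruction.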

That may not be a valid instruction for your particular processor.  I 
was working under the assumption that you were using a PPC970.  Does 
my test program run correctly *outside* of valgrind, or does it give 
you a SIGILL?

This page discusses dcbz versus dcbzl: 
http://www.powerdeveloper.org/forums/viewtopic.php?p=9842&sid=b54491befbf4ec2df4844d4d09657ddb#9842


My PPC970 user manual generally agrees with that comment.

Since the only difference between a dcbz and dcbzl is that bit 10 (in 
PPC-manual-speak, bit 21 in normal/VEX numbering) is 0/1, it may be 
that your e300 core is just ignoring that bit.  However those bits 
(PPC:6-10 / VEX:25-21) are reserved, which means they can result in 
boundedly undefined behavior if they aren't zeroed.
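
To make the bit numbering concrete, here is a small sketch that picks 
apart the instruction word from the subject line (0x7C2907EC). The 
helper names are my own, not anything from VEX. It uses the normal/VEX 
numbering (bit 0 is the least significant bit) and relies on dcbz 
being primary opcode 31 with extended opcode 1014.

```c
#include <assert.h>
#include <stdint.h>

/* All shifts use LSB-0 ("normal/VEX") bit numbering. */
static unsigned primary_opcode(uint32_t insn)
{
    return (insn >> 26) & 0x3Fu;
}

static unsigned extended_opcode(uint32_t insn)
{
    return (insn >> 1) & 0x3FFu;
}

/* True for both dcbz and dcbzl: opcode 31, extended opcode 1014. */
static int is_dcbz_family(uint32_t insn)
{
    return primary_opcode(insn) == 31 && extended_opcode(insn) == 1014;
}

/* dcbzl is dcbz with VEX bit 21 (PPC bit 10) set. */
static int is_dcbzl(uint32_t insn)
{
    return is_dcbz_family(insn) && ((insn >> 21) & 1u) != 0;
}

/* RA and RB register fields of a dcbz/dcbzl word. */
static unsigned reg_ra(uint32_t insn)
{
    assert(is_dcbz_family(insn));
    return (insn >> 16) & 0x1Fu;
}

static unsigned reg_rb(uint32_t insn)
{
    assert(is_dcbz_family(insn));
    return (insn >> 11) & 0x1Fu;
}

/* Clear the reserved field (VEX bits 25-21), giving a plain dcbz. */
static uint32_t clear_reserved(uint32_t insn)
{
    return insn & ~(UINT32_C(0x1F) << 21);
}
```

For 0x7C2907EC this gives RA = 9, RB = 0, and the dcbzl bit set; 
clearing the reserved field yields 0x7C0907EC, the same operands 
encoded as a plain dcbz.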

See pages 111 ("Boundedly Undefined") and 358 ("dcbz encoding") of the 
e300 manual: 
http://www.freescale.com/files/32bit/doc/ref_manual/e300coreRM.pdf

> So I have changed your test program so that BLOCK_SIZE is 32, and now 
> the program runs on my target. However, valgrind crashed because 
> of the "vassert(lineszB == 128);" in your patch.
> I have also changed this, and now it works!


I'm glad this gets you unstuck, but I think that assertion (or perhaps 
a more informative error message) is still appropriate for the dcbzl 
instruction.  I think that FFMpeg shouldn't be using that instruction 
on older/embedded PPC processors.

But as I said before, I'm far from being a PPC expert, so I could be 
totally wrong on this.  I also don't know what the typical valgrind/ 
VEX behavior is in these cases of minor platform variation.  I would 
guess that it tries to "do the right thing" most of the time, which in 
this case would be to just zero the cache line size given by lineszB. 
This is especially true if most/all non-970 processors ignore those 
reserved bits.

-Dave



_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users
