> And the only difference would be to relax pattern recognition so that 
> delay slot is examined for %o7-based arithmetic for all call instructions, 
> not only call .+8 in particular. Is this correctly understood?

Yes, you understand this correctly. But it's not as easy as that. I don't 
need to get into Purify implementation details, but remember, if the 
target gets pushed beyond the reach of a 13-bit immediate, we need to turn 
the call/add into sethi/or/call/add or something like that. The fact that 
this is in a delay slot, and that the %o7 value from the call is a source 
register, complicates things even more. It would not be impossible to 
handle this, but there is the ROI to consider.

You asked when, specifically, Purify stretches code. The short answer is: 
anywhere we need to. Definitely at the top of a function, and at every 
memory load or store instruction, and after function calls. Beyond that, 
we might do insertion on any instruction at all, subject to our needs.

<sales_pitch>
The basic Purify insertion is on load and store instructions; everything 
else is in support of that. Purify's whole value proposition is to 
pinpoint memory errors like reading uninitialized memory, or touching 
beyond the end of a block or the end of the current stack, or touching 
memory you've already freed. In contrast, malloc-debug libraries only 
report bad writes, and only after the fact. They spray patterns into freed 
memory in the hopes that bad reads will cause visible misbehavior in the 
program's future. Unlike those, Purify sees both reads and writes when 
they happen, pinpointing the faulting instruction instead of telling you 
"a bad thing happened sometime in the past."
</sales_pitch>

Best case (on SPARC) is that we insert two instructions before each load 
or store. Worst case, we "unravel" instructions out of delay slots, add 
more instructions to "shadow" certain types of register usage, and deal 
with offsets that have grown too large by inserting additional math.

You asked how you can know that Purify will *not* do insertion or stretch 
your code. That's a little tricky. If you have two non-global symbols that 
identify data blocks, and there are no global symbols or code 
(instructions) between them, there won't be any stretching from today's 
Purify. But any instructions at all are subject to insertion, and in some 
cases we insert dead space (a "red zone") before a global data symbol.

Now, back to libcrypto: while you and I have been talking, our resident 
genius instrumentation engine guy has actually coded some modifications to 
support the .PIC.me.up pattern as it appears in 0.9.8j. This supports our 
current customers who use past, released versions of libcrypto on SPARC. I 
expect this change to appear in an upcoming release of PurifyPlus, but I 
can't promise that or give a date: I'm not authorized to commit to future 
product features or support in a public forum.

The new pattern recognizer is pretty specific, intending to support 
existing customers with libcrypto binaries. It's not a general-purpose 
recognizer for optimized interprocedural PIC sequences. It recognizes 
patterns that stay very close to this:

   call target
   mov offset,%o0
   ...

target:
   add %o0,%o7,%o0

The new code recognizes this when "offset" is the distance from the call 
instruction to "target," and the "add" really is the very first 
instruction at the call target. We'll even patch the offset if the 
distance from the caller to the target grows past 13 bits.
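
To make the "13 bits" limit concrete: the immediate field in SPARC 
arithmetic instructions (simm13) is a 13-bit signed value, so it covers 
-4096 through 4095. Here is a minimal sketch of that range check in 
Python. The function name is made up for illustration and is not anything 
from Purify.

```python
# Range of SPARC's 13-bit signed immediate field (simm13).
SIMM13_MIN = -(1 << 12)      # -4096
SIMM13_MAX = (1 << 12) - 1   #  4095


def fits_simm13(offset: int) -> bool:
    """Return True if offset can be encoded directly as a simm13 immediate.

    Hypothetical helper for illustration only.
    """
    return SIMM13_MIN <= offset <= SIMM13_MAX


# An offset that fits before code is stretched may stop fitting afterward.
print(fits_simm13(4000))         # small distance: fits
print(fits_simm13(4000 + 200))   # distance after some insertion: no longer fits
```

Once the distance stops fitting, a single "mov offset,reg" or 
"add %o7,offset,reg" can no longer encode it, which is why the offset has 
to be patched into a longer sequence.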

The developer also coded changes to recognize and patch the self-relative 
offset in data from .PIC.DES_SPtrans to DES_SPtrans. I don't know the 
details and restrictions on this one. Like I said, it's really meant for 
customers with current libcrypto binaries.

Regardless of any new recognizers which might appear in the future, there 
are two Purify-safe ways to do PIC stuff on SPARC:

Short form:
L1:     call8
        add     %o7,(target-L1),regZ

Long form:
        sethi   %hi(target-L2),regX
        or      regX,%lo(target-L2),regY
L2:     call8
        add     regY,%o7,regZ

Here "call8" is shorthand for "call .+8". The short form will work even 
if Purify stretches the distance farther than 13 bits will reach, because 
Purify recognizes the pattern and patches it. Purify is flexible: regX, 
regY, and regZ can be different or they can overlap, and the call8 can 
happen any time before the add. You can also move the %o7 result of the 
call8 to another register and then use that; it doesn't have to stay in 
%o7. You can use the same call8-derived base register for multiple PIC 
computations, but you can't use one computed address (like regZ) as the 
base for another. These two patterns work for both 32-bit and 64-bit 
programs.
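
For readers less familiar with the long form, here is a rough Python model 
of its arithmetic. It assumes 32-bit registers only (the 64-bit case is 
not modeled), and the function names and addresses are invented for the 
example. %hi() is the upper 22 bits of a 32-bit value and %lo() is the 
lower 10 bits.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit register wraparound


def hi22(value: int) -> int:
    """Upper 22 bits of a 32-bit value, as %hi() selects them."""
    return (value & MASK32) >> 10


def lo10(value: int) -> int:
    """Lower 10 bits of a value, as %lo() selects them."""
    return value & 0x3FF


def long_form_address(call_pc: int, target: int) -> int:
    """Model sethi/or/call/add for a call located at call_pc (label L2)."""
    offset = (target - call_pc) & MASK32
    reg_x = hi22(offset) << 10       # sethi %hi(target-L2),regX
    reg_y = reg_x | lo10(offset)     # or    regX,%lo(target-L2),regY
    o7 = call_pc                     # the call leaves its own address in %o7
    return (reg_y + o7) & MASK32     # add   regY,%o7,regZ


# Hypothetical addresses, one forward and one backward reference.
print(hex(long_form_address(0x00010000, 0x12345678)))
print(hex(long_form_address(0x00020000, 0x00001000)))
```

The sethi/or pair rebuilds the full 32-bit offset, so the distance between 
the call and the target is not limited by the 13-bit immediate.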

Regarding the patch you referred to 
(http://cvs.openssl.org/chngview?cn=17898): I'm sorry to say Purify is not 
as flexible as you might want. In the short form we recognize "add" using 
%o7 after call8, but not "sub." So the patched aes_sparcv9 module is *not* 
Purify-friendly yet. To fix this, change "sub" to "add" and reverse the 
subtraction that computes the offset:

BAD:
1:      call    .+8
        sub     %o7,1b-AES_Te,%o4

GOOD:
1:      call    .+8
        add     %o7,AES_Te-1b,%o4
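
The two forms compute the same address; only the instruction pattern 
differs. A small Python check of that arithmetic, using made-up addresses 
and 32-bit wraparound as assumptions:

```python
MASK32 = 0xFFFFFFFF  # model 32-bit register wraparound


def sub_form(o7: int, label_1b: int, aes_te: int) -> int:
    # sub %o7,1b-AES_Te,%o4
    return (o7 - (label_1b - aes_te)) & MASK32


def add_form(o7: int, label_1b: int, aes_te: int) -> int:
    # add %o7,AES_Te-1b,%o4
    return (o7 + (aes_te - label_1b)) & MASK32


# Hypothetical layout: the table sits 0x800 bytes before label 1.
label_1b = 0x00012800
aes_te = 0x00012000
o7 = label_1b  # call .+8 leaves the address of the call itself in %o7

assert sub_form(o7, label_1b, aes_te) == aes_te
assert add_form(o7, label_1b, aes_te) == aes_te
```

So the change costs nothing at run time; it just puts the computation in 
the "add" shape that the recognizer looks for.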

Thanks for working with us on this. Let me know if you have more thoughts 
or questions.

-- Allan Pratt, apr...@us.ibm.com
Rational software division of IBM

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           majord...@openssl.org
