> And the only difference would be to relax pattern recognition so that the delay slot is examined for %o7-based arithmetic for all call instructions, not only call .+8 in particular. Is this correctly understood?
Yes, you understand this correctly. But it's not as easy as that. I don't need to get into Purify implementation details, but remember: if the target gets pushed farther away than a signed 13-bit immediate can reach, we need to turn the call/add into sethi/or/call/add or something like that. The fact that this is in a delay slot, and that the %o7 value from the call is a source register, complicates things even more. It would not be impossible to handle this, but there is the ROI to consider.

You asked when, specifically, Purify stretches code. The short answer is: anywhere we need to. Definitely at the top of a function, at every memory load or store instruction, and after function calls. Beyond that, we might insert code at any instruction at all, subject to our needs.

<sales_pitch>
The basic Purify insertion is on load and store instructions; everything else is in support of that. Purify's whole value proposition is to pinpoint memory errors like reading uninitialized memory, touching beyond the end of a block or the end of the current stack, or touching memory you've already freed. In contrast, malloc-debug libraries only report bad writes, and only after the fact. They spray patterns into freed memory in the hope that bad reads will cause visible misbehavior later in the program's run. Unlike those, Purify sees both reads and writes when they happen, pinpointing the faulting instruction instead of telling you "a bad thing happened sometime in the past."
</sales_pitch>

Best case (on SPARC), we insert two instructions before each load or store. Worst case, we "unravel" instructions out of delay slots, add more instructions to "shadow" certain types of register usage, and deal with offsets that have grown too large by inserting additional math.

You asked how you can know that Purify will *not* do insertion or stretch your code. That's a little tricky.
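The 13-bit limit comes from the signed immediate field of SPARC arithmetic instructions, which holds values from -4096 to 4095. The following is a minimal Python sketch of the arithmetic only, assuming 32-bit registers. It checks whether a call-relative offset fits that field and, if it does not, builds the offset with a sethi/or pair before adding %o7, which is the same shape as the sethi/or/call/add expansion described above. The helper names (fits_simm13, short_form, long_form, pic_address) are hypothetical. This is not Purify's implementation, and it ignores delay slots and register allocation entirely.

```python
from __future__ import annotations

MASK32 = 0xFFFFFFFF                   # assumption: 32-bit register arithmetic
SIMM13_MIN, SIMM13_MAX = -4096, 4095  # range of a signed 13-bit immediate


def fits_simm13(value: int) -> bool:
    """True if value can be encoded as the immediate operand of 'add'."""
    return SIMM13_MIN <= value <= SIMM13_MAX


def hi22(value: int) -> int:
    """%hi(value): the upper 22 bits that 'sethi' places in a register."""
    return (value & MASK32) >> 10


def lo10(value: int) -> int:
    """%lo(value): the low 10 bits supplied as the immediate of 'or'."""
    return value & 0x3FF


def short_form(call_addr: int, target: int) -> int:
    """call .+8 ; add %o7,(target-call_addr),regZ -> value left in regZ."""
    offset = target - call_addr
    if not fits_simm13(offset):
        raise ValueError("offset needs more than a signed 13-bit immediate")
    o7 = call_addr                    # 'call' writes its own address to %o7
    return (o7 + offset) & MASK32


def long_form(call_addr: int, target: int) -> int:
    """sethi/or build (target-call_addr) in a register, then add %o7."""
    offset = target - call_addr
    reg_x = hi22(offset) << 10        # sethi %hi(offset),regX
    reg_y = reg_x | lo10(offset)      # or    regX,%lo(offset),regY
    o7 = call_addr                    # call .+8 located at call_addr
    return (reg_y + o7) & MASK32      # add   regY,%o7,regZ


def pic_address(call_addr: int, target: int) -> tuple[str, int]:
    """Use the short form when the offset fits, otherwise the long form."""
    if fits_simm13(target - call_addr):
        return "short", short_form(call_addr, target)
    return "long", long_form(call_addr, target)
```

For example, pic_address(0x10000, 0x10800) stays in the short form, while pic_address(0x10000, 0x90000) needs the long form. In both cases the register ends up holding the target address.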
If you have two non-global symbols that identify data blocks, and there are no global symbols or code (instructions) between them, today's Purify won't do any stretching between them. But any instruction at all is subject to insertion, and in some cases we insert dead space (a "red zone") before a global data symbol.

Now, back to libcrypto: while you and I have been talking, our resident instrumentation-engine genius has actually coded some modifications to support the .PIC.me.up pattern as it appears in 0.9.8j. This supports our current customers who use past, released versions of libcrypto on SPARC. I expect this change to appear in an upcoming release of PurifyPlus, but I can't commit to it or give a date, because I'm not authorized to commit to future product features or support in a public forum.

The new pattern recognizer is pretty specific; it is intended to support existing customers with libcrypto binaries. It's not a general-purpose recognizer for optimized interprocedural PIC sequences. It recognizes patterns that stay very close to this:

            call    target
            mov     offset,%o0
            ...
    target: add     %o0,%o7,%o0

The new code recognizes this when "offset" is the distance from the call instruction to "target," and the "add" really is the very first instruction at the call target. We'll even patch the offset if the distance from the caller to the target grows past 13 bits.

The developer also coded changes to recognize and patch the self-relative offset in data from .PIC.DES_SPtrans to DES_SPtrans. I don't know the details and restrictions on this one. Like I said, it's really meant for customers with current libcrypto binaries.
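To make the matching conditions above concrete, here is a hypothetical Python sketch of such a check over a toy map from address to decoded instruction. The Insn type, the operand encoding, and the function names are assumptions for illustration only. Purify's actual recognizer and data structures are not described here, and this sketch does not attempt the offset patching.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    """A decoded instruction: mnemonic plus operands (hypothetical model)."""
    op: str
    args: tuple = ()


def matches_pic_me_up(code: dict[int, Insn], call_addr: int) -> bool:
    """Check 'call target / mov offset,%o0 / target: add %o0,%o7,%o0'."""
    call = code.get(call_addr)
    if call is None or call.op != "call" or len(call.args) != 1:
        return False
    target = call.args[0]

    # The delay slot (next word) must move the offset into %o0.
    delay = code.get(call_addr + 4)
    if delay is None or delay.op != "mov" or delay.args[1:] != ("%o0",):
        return False

    # "offset" must be the distance from the call instruction to the target.
    if delay.args[0] != target - call_addr:
        return False

    # The add must be the very first instruction at the call target.
    return code.get(target) == Insn("add", ("%o0", "%o7", "%o0"))


def computed_address(code: dict[int, Insn], call_addr: int) -> int:
    """%o0 after the add: %o7 (the call's own address) plus the offset."""
    offset = code[call_addr + 4].args[0]
    return call_addr + offset
```

A matching toy program looks like this:

```python
prog = {
    0x100: Insn("call", (0x200,)),
    0x104: Insn("mov", (0x100, "%o0")),
    0x200: Insn("add", ("%o0", "%o7", "%o0")),
    0x204: Insn("retl"),
}
```

With that program, matches_pic_me_up(prog, 0x100) holds, and the add leaves the target address 0x200 in %o0.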
Regardless of any new recognizers that might appear in the future, there are two Purify-safe ways to do PIC stuff on SPARC. Here "call8" means "call .+8".

Short form:

    L1: call8
        add     %o7,(target-L1),regZ

Long form:

        sethi   %hi(target-L2),regX
        or      regX,%lo(target-L2),regY
    L2: call8
        add     regY,%o7,regZ

The short form will work even if Purify stretches the distance farther than 13 bits will reach. Purify is flexible: regX, regY, and regZ can be different or they can overlap; the call8 can happen any time before the add; and you can move the %o7 result of the call8 to another register and use that instead, since it doesn't have to stay in %o7. You can use the same call8-derived base register for multiple PIC computations, but you can't use one computed address (like regZ) as the base for another. These two patterns work for both 32-bit and 64-bit programs.

Regarding the patch you referred to (http://cvs.openssl.org/chngview?cn=17898): I'm sorry to say Purify is not as flexible as you might want. In the short form we recognize "add" using %o7 after call8, but not "sub." So the patched aes_sparcv9 module is *not* Purify-friendly yet. To fix this, change "sub" to "add" and reverse the subtraction that computes the offset:

BAD:

    1:  call    .+8
        sub     %o7,1b-AES_Te,%o4

GOOD:

    1:  call    .+8
        add     %o7,AES_Te-1b,%o4

Both compute the same address, since %o7 - (1b - AES_Te) equals %o7 + (AES_Te - 1b); only the second form matches what Purify recognizes.

Thanks for working with us on this. Let me know if you have more thoughts or questions.

--
Allan Pratt, apr...@us.ibm.com
Rational software division of IBM
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           majord...@openssl.org