Here is a more detailed reply specifically about generating correct and optimal Sparc PIC sequences.
Let's get the non-PIC static case out of the way first. There we should always use:

    set    symbol, %reg              ! 32-bit
    setx   symbol, %tmp_reg, %reg    ! 64-bit

Using calls to PIC stubs is completely pointless overhead when we are doing a static build.

If we are generating PIC, we need a stub function, and there are many ways to provide one. One scheme is to simply emit a stub in each source file where it is needed.

If the assembler and linker support gotdata optimizations, we can emit the following sequence:

    sethi  %hi(_GLOBAL_OFFSET_TABLE_-4), %PIC_REG
    call   __sparc_pic_stub
    or     %PIC_REG, %lo(_GLOBAL_OFFSET_TABLE_+4), %PIC_REG
    sethi  %gdop_hix22(symbol), %TMP
    xor    %TMP, %gdop_lox10(symbol), %TMP
    LDPTR  [%PIC_REG + %TMP], %REG, %gdop(symbol)

If the linker finds that "symbol" can be resolved at final link time (e.g. the symbol is static to the compilation unit, or marked 'hidden'), the LDPTR above will be optimized into:

    add    %PIC_REG, %TMP, %REG

The symbol offset in the %gdop_*() sethi and xor instructions will also be adjusted as needed. Finally, the global offset table slot that would otherwise have been generated for 'symbol' will be removed.

Otherwise, if the linker and assembler lack gotdata optimization support, we use just a plain PIC sequence:

    sethi  %hi(_GLOBAL_OFFSET_TABLE_-4), %PIC_REG
    call   __sparc_pic_stub
    or     %PIC_REG, %lo(_GLOBAL_OFFSET_TABLE_+4), %PIC_REG
    sethi  %hi(symbol), %TMP
    or     %TMP, %lo(symbol), %TMP
    LDPTR  [%PIC_REG + %TMP], %REG

If this doesn't work in some cases, we need to discover exactly why instead of dismissing my approach completely.

Now, of course, all of the above is for -fPIC, but I see no sparc target (nor any target except one strange hpux case) that specifies -fpic instead of -fPIC in Configure. However, that case is simple to accommodate as well, and I'd be happy to do so in my macros.
About the cost of RAS (return address stack) misses: every Sun-produced UltraSPARC chip pushes unconditionally onto the RAS and does not special-case the call .+8 pattern.

Thinking about this logically, a RAS miss costs, at best, as much as a full branch misprediction. On UltraSPARC that means a full pipeline flush, because the mispredicted instructions that were fetched need to be cancelled and cleared out of the pipeline before we can begin executing down the correct path. This can be huge, depending upon the contents of the improperly fetched path of instructions. In the worst case, up to 18 instructions may need to be cancelled (UltraSPARC-I programmer's manual, section 16.2.9, page 270).

Worse than the immediate cost of the RAS corruption is that every subsequent function return out of openssl is going to miss the RAS and incur the penalty as well.

I consider it absolutely critical that the PIC sequences support being used in leaf functions, without save and restore instructions, and my macros have been designed with this in mind. When they are used, one need not allocate a register window merely for the sake of performing a PIC sequence.

When we get past these initial patches and I post my DES work, you will see that I adjusted des_enc.m4 to use the new PIC interfaces I created. In fact I had to, because the 13-bit relocations used there no longer fit once the crypto opcode code was added.

There are other problems in des_enc.m4, which I have fixed in my patches. As one other example, it doesn't include opensslconf.h, so OPENSSL_SYSNAME_ULTRASPARC is never defined and the V9 sequences are never used for 32-bit builds, which hurts performance.

There is only one valid set of CPP tests for the various cases we care about on sparc: __PIC__ means PIC code generation is in use, __arch64__ means 64-bit code generation, and __sparc_v9__ means V9 code can be used. These are fully standardized, and both SunPRO and GCC set them consistently.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           majord...@openssl.org