Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.

David Miller Fri, 21 Sep 2012 10:03:07 -0700

Here is a more detailed reply specifically about generating
correct and optimal Sparc PIC sequences.


Let's get the non-PIC static case out of the way first.  There we
should always use:

        set     symbol, %reg            ! 32-bit
        setx    symbol, %tmp_reg, %reg  ! 64-bit

Using calls to PIC stubs is completely pointless overhead when we are
doing a static build.

If we are generating PIC we need a stub function, and there are a lot of
ways to do this.  One scheme is to simply emit a stub in each source
file where the stub is needed.

If the assembler and linker support got-data optimizations, we can
emit the following sequence:

        sethi   %hi(_GLOBAL_OFFSET_TABLE_-4), %PIC_REG
        call    __sparc_pic_stub
         or     %PIC_REG, %lo(_GLOBAL_OFFSET_TABLE_+4), %PIC_REG
        sethi   %gdop_hix22(symbol), %TMP
        xor     %TMP, %gdop_lox10(symbol), %TMP
        LDPTR   [%PIC_REG + %TMP], %REG, %gdop(symbol)

If the linker finds that "symbol" can be resolved at final link time
(e.g. the symbol is static to the compilation unit, or marked as
'hidden'), the LDPTR above will be optimized into:

        add     %PIC_REG, %TMP, %REG

The symbol offset will also be adjusted, as needed, in the %gdop_*()
sethi and xor instructions.  And finally, the reference to the global
offset table slot that would have been generated for 'symbol', will be
removed.
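
To make the sethi/xor pairing concrete, here is a small C model of how
the two %gdop_*() immediates could recombine into a signed 32-bit
offset.  The formulas are my reading of the SPARC GOTDATA relocation
definitions and should be treated as an assumption, and the helper
names are made up for illustration.  The point it tries to show is why
the second instruction is an xor rather than an or: a negative offset
is stored inverted in the sethi and repaired by the sign-extended
13-bit immediate.

```c
#include <assert.h>
#include <stdint.h>

/* 22-bit immediate for the sethi under %gdop_hix22 (assumed formula):
 * the upper 22 bits of the offset, inverted when the offset is negative. */
static uint32_t gdop_hix22(int32_t off)
{
    uint32_t u = (uint32_t)off;
    uint32_t sign = (off < 0) ? 0x3fffffu : 0u;
    return (u >> 10) ^ sign;
}

/* 13-bit immediate for the xor under %gdop_lox10 (assumed formula):
 * the low 10 bits, with the top three bits set for a negative offset. */
static uint32_t gdop_lox10(int32_t off)
{
    return ((uint32_t)off & 0x3ffu) | ((off < 0) ? 0x1c00u : 0u);
}

/* Value left in %TMP after "sethi imm22" followed by "xor simm13". */
static int32_t sethi_xor(uint32_t imm22, uint32_t simm13)
{
    assert(imm22 <= 0x3fffffu && simm13 <= 0x1fffu);
    uint32_t reg = imm22 << 10;                   /* sethi clears the low 10 bits */
    uint32_t ext = (simm13 ^ 0x1000u) - 0x1000u;  /* sign-extend the 13-bit field */
    return (int32_t)(reg ^ ext);                  /* xor */
}
```

Under these assumptions, sethi_xor(gdop_hix22(x), gdop_lox10(x)) gives
back x for any 32-bit x, positive or negative.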

Otherwise, if the linker and assembler lack gotdata optimization
support, we use just a plain PIC sequence:

        sethi   %hi(_GLOBAL_OFFSET_TABLE_-4), %PIC_REG
        call    __sparc_pic_stub
         or     %PIC_REG, %lo(_GLOBAL_OFFSET_TABLE_+4), %PIC_REG
        sethi   %hi(symbol), %TMP
        or      %TMP, %lo(symbol), %TMP
        LDPTR   [%PIC_REG + %TMP], %REG

If this doesn't work in some cases, we need to discover exactly
why instead of dismissing my approach completely.
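
For anyone wondering about the -4 and +4 in the first three
instructions of both sequences, the arithmetic can be sketched in C.
This assumes two things not spelled out above: that the assembler
treats _GLOBAL_OFFSET_TABLE_ inside %hi()/%lo() as relative to the
address of the instruction carrying it, and that __sparc_pic_stub
simply adds %o7 (the address of the call) into %PIC_REG.  The function
name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* sethi %hi(v): keep the upper 22 bits, clear the low 10. */
static uint32_t hi22(uint32_t v) { return v & ~0x3ffu; }

/* %lo(v): the low 10 bits. */
static uint32_t lo10(uint32_t v) { return v & 0x3ffu; }

/* Model of the three-instruction prologue with the sethi at address pc
 * and the GOT at absolute address got.  Returns the final %PIC_REG. */
static uint32_t pic_reg_after_stub(uint32_t pc, uint32_t got)
{
    assert((pc & 3u) == 0);            /* instructions are 4-byte aligned */

    uint32_t sethi_addr = pc;
    uint32_t call_addr  = pc + 4;
    uint32_t or_addr    = pc + 8;      /* delay slot of the call */

    /* Both operands reduce to (got - call_addr): the -4 and +4 cancel
     * the distance of each instruction from the call. */
    uint32_t reg = hi22((got - 4) - sethi_addr);
    uint32_t o7  = call_addr;          /* call writes its own address to %o7 */
    reg |= lo10((got + 4) - or_addr);  /* delay slot runs before the stub */

    return reg + o7;                   /* assumed stub body: add %o7 into %PIC_REG */
}
```

With those assumptions the result is the absolute GOT address wherever
the code is loaded, e.g. pic_reg_after_stub(0x10000, 0x2345678) is
0x2345678.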

Now, of course, all of the above is for -fPIC, but I see no sparc
target (nor any target except one strange hpux case) that specifies
-fpic instead of -fPIC in Configure.

However that case is simple to accommodate as well, and I'd be happy to
do so in my macros.

About the cost of missing the RAS (return address stack): every
Sun-produced UltraSPARC chip pushes unconditionally onto the RAS and
does not special-case the

        call    .+8

pattern.

Thinking about this logically, a RAS miss can (at best) perform like a
full branch misprediction, which on UltraSPARC results in a full
pipeline flush, as the mispredicted fetched instructions need to be
cancelled and cleared out of the pipeline so we can begin executing
down the correct path.

This can be huge, depending upon the contents of the improperly
fetched path of instructions.  In the worst possible case, up to 18
instructions may need to be cancelled (UltraSPARC-I programmers
manual, section 16.2.9, page 270).

Worse than the immediate cost of the RAS corruption is that every
subsequent function return out of openssl is going to miss the RAS
and incur the penalty as well.
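
A toy model shows why one unmatched push poisons every later return.
This is only a sketch of a generic return address stack, not a
description of any real UltraSPARC predictor; the depth, the addresses
and the function names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define RAS_DEPTH 8   /* hypothetical depth */

struct ras {
    unsigned long slot[RAS_DEPTH];
    size_t top;
    unsigned misses;
};

/* Every call pushes its return address, with no special cases. */
static void ras_call(struct ras *r, unsigned long return_addr)
{
    assert(r->top < RAS_DEPTH);
    r->slot[r->top++] = return_addr;
}

/* A return pops the predicted target and compares it with the real one. */
static void ras_return(struct ras *r, unsigned long actual_target)
{
    assert(r->top > 0);
    if (r->slot[--r->top] != actual_target)
        r->misses++;
}

/* Three nested calls ending in a crypto routine that needs its PC.
 * use_stub != 0: call a stub that returns (push matched by a pop).
 * use_stub == 0: "call .+8" (push that is never returned to).
 * Returns the number of mispredicted returns while unwinding. */
static unsigned simulate(int use_stub)
{
    struct ras r = { {0}, 0, 0 };

    ras_call(&r, 0x1000);
    ras_call(&r, 0x2000);
    ras_call(&r, 0x3000);        /* application calls the crypto routine */

    ras_call(&r, 0x4000);        /* the PIC sequence's call */
    if (use_stub)
        ras_return(&r, 0x4000);  /* stub returns, RAS stays balanced */

    ras_return(&r, 0x3000);      /* crypto routine returns */
    ras_return(&r, 0x2000);
    ras_return(&r, 0x1000);
    return r.misses;
}
```

In this model simulate(1) reports no misses, while simulate(0) misses
on all three returns: the stale entry shifts every prediction by one
slot.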

I consider it absolutely critical that the PIC sequences support being
used in leaf functions, without save and restore instructions.  And my
macros have been designed with this in mind.

When used, one need not allocate a register window merely for the sake
of performing a PIC sequence.

When we get past these initial patches and I post my DES work, you
will see that I adjusted des_enc.m4 to use the new PIC interfaces I
created.  In fact I had to, because the 13-bit relocations used there
no longer fit with the crypto opcode code added.

There are other problems in des_enc.m4, which I have fixed in my
patches.  As just one other example, it doesn't include opensslconf.h
and therefore OPENSSL_SYSNAME_ULTRASPARC is never defined and the V9
sequences are never used for 32-bit, which hurts performance.

Only one valid set of CPP tests exists for the various cases we care
about on sparc.  __PIC__ means PIC code generation is in use,
__arch64__ means 64-bit code generation, and __sparc_v9__ means V9
code can be used.  These are fully standardized and both SunPRO and
GCC set them consistently.
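
As a sketch of how those three tests might be used, the following C
fragment folds each one into a 0/1 constant.  The BUILD_* names and
describe_build() are made up for illustration and are not part of my
patches; only the three predefined macros come from the compiler.

```c
#include <assert.h>
#include <stdio.h>

#ifdef __PIC__
# define BUILD_IS_PIC   1   /* PIC code generation is in use */
#else
# define BUILD_IS_PIC   0
#endif

#ifdef __arch64__
# define BUILD_IS_64BIT 1   /* 64-bit code generation */
#else
# define BUILD_IS_64BIT 0
#endif

#ifdef __sparc_v9__
# define BUILD_HAS_V9   1   /* V9 instructions can be used */
#else
# define BUILD_HAS_V9   0
#endif

/* Writes a one-line summary of the build mode into buf and returns the
 * number of characters snprintf() would have written. */
static int describe_build(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "pic=%d arch64=%d v9=%d",
                    BUILD_IS_PIC, BUILD_IS_64BIT, BUILD_HAS_V9);
}
```

The same #ifdef tests work unchanged in a preprocessed assembler file,
which is where the choice between the PIC and non-PIC sequences has to
be made.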
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           majord...@openssl.org
