>>>> SIGBUS normally denotes unaligned access, but the instruction in
>>>> question pulls a 16-bit value and the effective address is 16-bit
>>>> aligned...
> 
> I just tried a test .S file with
> 
>       ldda    [%sp+0+16]%asi, %f0
>       ldda    [%sp+0+8]%asi, %f0
>       ldda    [%sp+0+4]%asi, %f0
>       ldda    [%sp+0+2]%asi, %f0
> 
> And +4 is the first one it SIGBUS'd on. So if the alignment in
> sparcv9a-mont is increased to +8, it would also work on T1.

Yes, but sparcv9a-mont *relies* on +2, +4 and even +6. The offsets are
used to pick the 16-bit words constituting a single [naturally aligned]
64-bit value, i.e. the words reside at adjacent +2n offsets [with
n=0-3]. It does work on UltraSPARC-I-IV and SPARC64 V-VII.
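
To make the offset arithmetic concrete, here is a minimal C sketch, not
the actual sparcv9a-mont code. The helper names word16_at and
offset_is_aligned are made up for illustration. It shows that the +2n
offsets are always 16-bit aligned, but only n=0 is 64-bit aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: fetch the 16-bit word at byte offset 2*n
 * (n = 0..3) inside one naturally aligned 64-bit value. memcpy reads
 * in host byte order; on big-endian SPARC n=0 is the most significant
 * word.
 */
static uint16_t word16_at(const uint64_t *v, unsigned n)
{
    uint16_t w;

    assert(n < 4);
    memcpy(&w, (const unsigned char *)v + 2 * n, sizeof(w));
    return w;
}

/*
 * Hypothetical helper: is byte offset 'off' from a naturally aligned
 * base a multiple of 'align' bytes?
 */
static int offset_is_aligned(unsigned off, unsigned align)
{
    return off % align == 0;
}
```

With these, offset_is_aligned(2*n, 2) holds for every n=0-3, while
offset_is_aligned(2*n, 8) holds only for n=0, which is why raising the
alignment requirement to +8 is not an option for this code.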

But getting bn_mul_mont_fpu working on T1 is *not* the goal, because
performance would be *horrible* (1/10th or worse). The idea implemented
in the updated sparcv9cap.c is to use this SIGBUS to heuristically
detect T1 and to disable the FP code in favor of the pure IALU
bn_mul_mont_int...
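
For readers unfamiliar with this kind of probing, below is a rough C
sketch of the general technique: run a probe under a temporary SIGBUS
handler and see whether it traps. This is an illustration only, not the
code in sparcv9cap.c. The names probe_raises_sigbus, probe_ok and
probe_fault are hypothetical, and the stand-in probe_fault simply
raises the signal where a real probe would execute the ldda
instruction.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf probe_env;

/* Temporary handler: unwind back to the sigsetjmp() point. */
static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

/*
 * Returns 1 if probe() raised SIGBUS, 0 if it completed normally,
 * -1 if the handler could not be installed.
 */
static int probe_raises_sigbus(void (*probe)(void))
{
    struct sigaction sa, old;
    volatile int faulted = 0;

    assert(probe != NULL);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) != 0)
        return -1;

    /* savemask=1 so the signal mask is restored after siglongjmp. */
    if (sigsetjmp(probe_env, 1) == 0)
        probe();
    else
        faulted = 1;

    sigaction(SIGBUS, &old, NULL);   /* restore previous handler */
    return faulted;
}

/* Stand-ins for the real probe, for illustration only. */
static void probe_ok(void) { }
static void probe_fault(void) { raise(SIGBUS); }
```

A caller would then pick bn_mul_mont_int whenever the probe reports a
fault. Note that the result is only as good as the assumption that the
kernel delivers the trap to user space.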

... But wait... The fact that I remember the 1/10th coefficient must
mean that sparcv9a-mont did work under Solaris on T1. The question is
how. Chances are that the Solaris kernel transparently fixes up the
unaligned ldda access in its trap handler. Meaning that *if/when* Linux
chooses to do the same, the above-mentioned heuristic test will fail to
detect T1... A.



______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           majord...@openssl.org