#5219: Build ATLAS in dist mode with SSE2 only
--------------------------+-------------------------------------------------
Reporter: mabshoff | Owner: mabshoff
Type: defect | Status: assigned
Priority: critical | Milestone: sage-3.3
Component: distribution | Resolution:
Keywords: |
--------------------------+-------------------------------------------------
Comment (by mabshoff):
This was not as simple as I thought it would be. To do this we need to
do two things:
* disable the SSE3 detection by making it return "FAILURE"
unconditionally
* select ARCH defaults that allow SSE2 on 32 and 64 bit boxen. ATLAS
3.8.2 only offers that for Hammer, i.e. ARCH=20.
When doing both of the above on sage.math we get a libatlas.a without any
SSE3 instructions:
{{{
atlas-3.8.2.p2/Hammer/lib$ ~/SSE2-project/sse-2.bash libatlas.a
found SSE2 addpd: 2
found SSE2 addsd: 2
found SSE2 movapd: 208
found SSE2 movlpd: 131
found SSE2 movsd: 4057
found SSE2 movupd: 1
found SSE2 mulpd: 2
found SSE2 mulsd: 2
found SSE2 orpd: 174
found SSE2 unpcklpd: 1
found SSE2 xorpd: 174
}}}
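The sse-2.bash script itself is not shown in this ticket. As a rough sketch of how such a count could be produced, the following assumes the input is the text output of `objdump -d` (tab-separated address, bytes, instruction). The mnemonic sets are illustrative subsets, and the function names are made up for this example.

```python
import subprocess
from collections import Counter

# Illustrative subsets only; the lists used by the real script are unknown.
SSE2_MNEMONICS = {
    "addpd", "addsd", "movapd", "movlpd", "movsd", "movupd",
    "mulpd", "mulsd", "orpd", "unpcklpd", "xorpd",
}
SSE3_MNEMONICS = {
    "addsubpd", "addsubps", "haddpd", "haddps", "hsubpd", "hsubps",
    "lddqu", "movddup", "movshdup", "movsldup", "fisttp",
}


def disassemble(path):
    """Return the disassembly of an object file or archive via objdump -d."""
    result = subprocess.run(
        ["objdump", "-d", path], check=True, capture_output=True, text=True
    )
    return result.stdout


def count_mnemonics(disassembly):
    """Count SSE2/SSE3 mnemonics in objdump -d text output."""
    counts = Counter()
    for line in disassembly.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # headers, labels, or byte-only continuation lines
        tokens = fields[2].split()
        if not tokens:
            continue
        mnemonic = tokens[0]
        if mnemonic in SSE2_MNEMONICS:
            counts[("SSE2", mnemonic)] += 1
        elif mnemonic in SSE3_MNEMONICS:
            counts[("SSE3", mnemonic)] += 1
    return counts


def format_report(counts):
    """Format counts in the same 'found SSE2 addpd: 2' style as above."""
    return "\n".join(
        "found %s %s: %d" % (level, name, n)
        for (level, name), n in sorted(counts.items())
    )
```

With this sketch, `print(format_report(count_mnemonics(disassemble("libatlas.a"))))` would print one line per mnemonic found; an SSE2-only build should produce no `found SSE3` lines.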
Contrast this with a PNI-enabled (i.e. SSE3-enabled) ATLAS built on the same machine:
{{{
atlas-3.8.2.p2/Hammer/lib$ ~/SSE2-project/sse-2.bash /scratch/mabshoff/sage-3.3.rc1/local/lib/libatlas.a
found SSE2 pshufd: 394
found SSE2 addpd: 41840
found SSE2 addsd: 74197
found SSE2 andnpd: 3
found SSE2 andpd: 34
found SSE2 comisd: 1393
found SSE2 cvtsd2ss: 8
found SSE2 cvtsi2sd: 4
found SSE2 cvtss2sd: 20
found SSE2 divsd: 304
found SSE2 maxpd: 4
found SSE2 maxsd: 4
found SSE2 movapd: 108245
found SSE2 movhpd: 1092
found SSE2 movlpd: 1111
found SSE2 movmskpd: 8
found SSE2 movsd: 27295
found SSE2 movupd: 80
found SSE2 mulpd: 41882
found SSE2 mulsd: 79686
found SSE2 orpd: 1152
found SSE2 sqrtsd: 8
found SSE2 subsd: 1658
found SSE2 ucomisd: 1392
found SSE2 unpckhpd: 86
found SSE2 unpcklpd: 90
found SSE2 xorpd: 1151
found SSE3 haddpd: 1224
found SSE3 haddps: 530
found SSE3 movddup: 4
found SSE3 movshdup: 2
found SSE3 movsldup: 3
}}}
It is unclear how large the performance penalty is when selecting a
Hammer ATLAS for a P4 arch, but it could be substantial. Someone needs to
collect some numbers. It might be a good idea to use the tuned P4 kernels by
selecting {{{-A 16}}}, but this would require adding tuning info for that
config in 64-bit mode.
In the long term it might be beneficial to build ATLAS libs on various
CPUs and then use a runtime selection to put the best version in
LD_LIBRARY_PATH.
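A minimal sketch of what such a runtime selection could look like on Linux, assuming one ATLAS build per directory under a common root. The directory names and function names here are hypothetical; the only fact relied on is that Linux reports SSE3 as the `pni` flag in /proc/cpuinfo.

```python
import os

# Hypothetical layout: best variant first, each a directory under a common root.
VARIANTS = [("pni", "atlas-sse3"), ("sse2", "atlas-sse2")]
FALLBACK = "atlas-generic"


def cpu_flags(cpuinfo_text):
    """Return the CPU flags of the first processor listed in cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pick_variant(flags):
    """Pick the first (best) variant whose required flag is present."""
    for flag, dirname in VARIANTS:
        if flag in flags:
            return dirname
    return FALLBACK


def library_path(root, cpuinfo_text, current=""):
    """Build an LD_LIBRARY_PATH value with the chosen variant in front."""
    chosen = os.path.join(root, pick_variant(cpu_flags(cpuinfo_text)))
    return chosen + os.pathsep + current if current else chosen
```

A launcher script could read /proc/cpuinfo, pass its contents to `library_path` together with the existing LD_LIBRARY_PATH, and export the result before starting Sage.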
I will build an spkg with the above changes since the SSE3 issue is really
becoming a problem. One should note that for optimum performance one needs
to build from sources.
Cheers,
Michael
--
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/5219#comment:3>
Sage <http://sagemath.org/>
Sage - Open Source Mathematical Software: Building the Car Instead of
Reinventing the Wheel