#5219: Build ATLAS in dist mode with SSE2 only
--------------------------+-------------------------------------------------
 Reporter:  mabshoff      |        Owner:  mabshoff
     Type:  defect        |       Status:  assigned
 Priority:  critical      |    Milestone:  sage-3.3
Component:  distribution  |   Resolution:          
 Keywords:                |  
--------------------------+-------------------------------------------------
Comment (by mabshoff):

 This was not as simple as I thought it would be. To do this we need to
 do two things:

  * disable the SSE3 detection by making it return "FAILURE"
 unconditionally
  * select ARCH defaults that allow SSE2 on 32 and 64 bit boxen. ATLAS
 3.8.2 only offers that for Hammer, i.e. ARCH=20.

 When doing both of the above on sage.math we get a libatlas.a without any
 SSE3 instructions:

 {{{
 atlas-3.8.2.p2/Hammer/lib$ ~/SSE2-project/sse-2.bash libatlas.a
 found SSE2 addpd: 2
 found SSE2 addsd: 2
 found SSE2 movapd: 208
 found SSE2 movlpd: 131
 found SSE2 movsd: 4057
 found SSE2 movupd: 1
 found SSE2 mulpd: 2
 found SSE2 mulsd: 2
 found SSE2 orpd: 174
 found SSE2 unpcklpd: 1
 found SSE2 xorpd: 174
 }}}

 Contrast this with a PNI (i.e. SSE3) enabled ATLAS from the same machine:
 {{{
 atlas-3.8.2.p2/Hammer/lib$ ~/SSE2-project/sse-2.bash /scratch/mabshoff/sage-3.3.rc1/local/lib/libatlas.a
 found SSE2 pshufd: 394
 found SSE2 addpd: 41840
 found SSE2 addsd: 74197
 found SSE2 andnpd: 3
 found SSE2 andpd: 34
 found SSE2 comisd: 1393
 found SSE2 cvtsd2ss: 8
 found SSE2 cvtsi2sd: 4
 found SSE2 cvtss2sd: 20
 found SSE2 divsd: 304
 found SSE2 maxpd: 4
 found SSE2 maxsd: 4
 found SSE2 movapd: 108245
 found SSE2 movhpd: 1092
 found SSE2 movlpd: 1111
 found SSE2 movmskpd: 8
 found SSE2 movsd: 27295
 found SSE2 movupd: 80
 found SSE2 mulpd: 41882
 found SSE2 mulsd: 79686
 found SSE2 orpd: 1152
 found SSE2 sqrtsd: 8
 found SSE2 subsd: 1658
 found SSE2 ucomisd: 1392
 found SSE2 unpckhpd: 86
 found SSE2 unpcklpd: 90
 found SSE2 xorpd: 1151
 found SSE3 haddpd: 1224
 found SSE3 haddps: 530
 found SSE3 movddup: 4
 found SSE3 movshdup: 2
 found SSE3 movsldup: 3
 }}}
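
 For anyone who wants to reproduce counts like the ones above, here is a
 minimal sketch of how they could be gathered. It is not the sse-2.bash
 script itself. It assumes GNU objdump is installed and that its
 tab-separated AT&T-syntax disassembly is being parsed. The function names
 and the two mnemonic sets are illustrative assumptions, and the sets are
 deliberately incomplete.

```python
import subprocess
from collections import Counter

# Illustrative subsets only (assumption): extend these for a real audit.
SSE2 = {"addpd", "addsd", "movapd", "movsd", "mulpd", "mulsd", "xorpd"}
SSE3 = {"haddpd", "haddps", "movddup", "movshdup", "movsldup"}


def count_mnemonics(disassembly):
    """Count SSE2/SSE3 mnemonics in `objdump -d` style text."""
    counts = Counter()
    for line in disassembly.splitlines():
        # Instruction lines look like "addr:<TAB>bytes<TAB>mnemonic operands".
        fields = line.split("\t")
        if len(fields) < 3 or not fields[2].strip():
            continue  # headers, labels, or byte-continuation lines
        mnemonic = fields[2].split()[0]
        if mnemonic in SSE2:
            counts[("SSE2", mnemonic)] += 1
        elif mnemonic in SSE3:
            counts[("SSE3", mnemonic)] += 1
    return counts


def scan_library(path):
    """Disassemble a library with objdump and count the mnemonics."""
    result = subprocess.run(
        ["objdump", "-d", path],
        capture_output=True, text=True, check=True,
    )
    return count_mnemonics(result.stdout)


def report(counts):
    """Format the counts in the style of the listings above."""
    return ["found %s %s: %d" % (isa, name, n)
            for (isa, name), n in sorted(counts.items())]
```

 Calling {{{report(scan_library("libatlas.a"))}}} would then give one line
 per mnemonic found. An SSE2-only build should produce no SSE3 lines.
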
 It is unclear how large the performance penalty is when using a Hammer
 ATLAS on a P4 arch, but it could be substantial. Someone needs to collect
 some numbers. It might be a good idea to tune the P4 kernels by selecting
 {{{-A 16}}}, but this would require adding tuning info for that config in
 64-bit mode.

 In the long term it might be beneficial to build ATLAS libs on various
 CPUs and then use a runtime selection to put the best version in
 LD_LIBRARY_PATH.
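
 A rough sketch of what such a runtime selection might look like on Linux
 follows. It assumes the CPU flags are read from /proc/cpuinfo, where SSE3
 is reported as "pni". The directory names are hypothetical, nothing like
 this exists in the spkg today, and a launcher would have to set
 LD_LIBRARY_PATH before Sage starts.

```python
import os


def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pick_atlas_dir(flags):
    """Pick the most capable ATLAS build the CPU can run.

    The directory names are hypothetical placeholders.
    """
    if "pni" in flags:  # Linux reports SSE3 as "pni"
        return "atlas-sse3"
    if "sse2" in flags:
        return "atlas-sse2"
    return "atlas-generic"


def library_path(base, flags, current=""):
    """Build an LD_LIBRARY_PATH value with the chosen directory first."""
    chosen = os.path.join(base, pick_atlas_dir(flags))
    return chosen + (os.pathsep + current if current else "")
```

 A launcher could read /proc/cpuinfo, pass the text to {{{cpu_flags}}}, and
 export the result of {{{library_path}}} before starting Sage.
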

 I will build an spkg with the above changes since the SSE3 issue is really
 becoming a problem. One should note that for optimum performance one needs
 to build from sources.

 Cheers,

 Michael

-- 
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/5219#comment:3>
Sage <http://sagemath.org/>
Sage - Open Source Mathematical Software: Building the Car Instead of 
Reinventing the Wheel