I usually configure --with-debugging=0 COPTFLAGS='-O2 -march=native' or 
similar. There's a tension here between optimizing aggressively for the current 
machine and making binaries that work on other machines. Most configure systems 
default to making somewhat portable binaries, so that's a principle of least 
surprise. (Though you're no novice and seem to have been surprised anyway.)
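
As a rough illustration of what the flags change, here is a minimal sketch that reports which of the three macros from the test program below the compiler defined for a given build. The function name report_simd_macros is made up for this example and is not part of PETSc. With the default x86_64 target I would expect it to return 0, and with -march=native on a machine like the one below it should find all three, though that depends on the compiler version.

```c
#include <stdio.h>

/* Hypothetical helper (not a PETSc function): prints each ISA macro that the
   compiler defined for this translation unit and returns how many it found.
   The result depends only on the compile flags, not on the CPU it runs on. */
int report_simd_macros(void)
{
  int count = 0;
#if defined(__FMA__)
  puts("__FMA__ is defined");
  count++;
#endif
#if defined(__AVX2__)
  puts("__AVX2__ is defined");
  count++;
#endif
#if defined(__AVX512F__)
  puts("__AVX512F__ is defined");
  count++;
#endif
  if (count == 0) puts("none of __FMA__, __AVX2__, __AVX512F__ is defined");
  return count;
}
```

Call it from a main() and build once with plain -O2 and once with -O2 -march=native to compare. This is also why the #error test below reached the linker: none of the #error lines fired, so the only complaint left was the missing main.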

I'd kinda prefer if we recommended making portable binaries that run-time 
detected when to use newer instructions where it matters.
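
A minimal sketch of what I mean by run-time detection, assuming GCC or Clang on x86_64 (it relies on the __builtin_cpu_supports builtin and the target function attribute; other compilers need a different mechanism). The axpy names are invented for illustration and this is not how PETSc does it today.

```c
#include <stddef.h>

/* Baseline version: built for the default target (SSE2 on x86_64), so it
   runs on any machine the binary can run on. */
static void axpy_generic(size_t n, double a, const double *x, double *y)
{
  for (size_t i = 0; i < n; i++) y[i] += a * x[i];
}

/* Same loop, but the compiler is allowed to use AVX2 and FMA instructions
   for this one function, regardless of the flags used for the rest of the file. */
__attribute__((target("avx2,fma")))
static void axpy_avx2(size_t n, double a, const double *x, double *y)
{
  for (size_t i = 0; i < n; i++) y[i] += a * x[i];
}

/* Hypothetical entry point: asks the running CPU what it supports and
   dispatches to the matching implementation. */
void axpy(size_t n, double a, const double *x, double *y)
{
  if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma")) {
    axpy_avx2(n, a, x, y);
  } else {
    axpy_generic(n, a, x, y);
  }
}
```

The binary stays portable because only axpy_avx2 contains the newer instructions, and it is reached only after the check passes. In real code you would probably do the check once and cache a function pointer rather than test on every call.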

Barry Smith <[email protected]> writes:

>   Shouldn't configure be setting something appropriate for this 
> automatically? This is nuts: it means that when users do a ./configure and 
> make, unless they pass weird compiler arguments they sure as heck don't know 
> about, they won't get any of the glory that they expect and that has been in 
> almost all Intel systems forever.
>
>   Barry
>
>   I run ./configure --with-debugging=0 and I get none of the stuff added by 
> Intel for 15+ years?
>
>
>> On Feb 13, 2021, at 11:26 PM, Jed Brown <[email protected]> wrote:
>> 
>> Use -march=native or similar. The default target is basic x86_64, which has 
>> only SSE2.
>> 
>> Barry Smith <[email protected]> writes:
>> 
>>> PETSc source has code like defined(__AVX2__) in the source but it does not 
>>> seem to be able to find any of these macros (icc or gcc) on the petsc-02 
>>> system
>>> 
>>> Are these macros supposed to be defined? How does one get them to be 
>>> defined? Why are they not defined? What am I doing wrong?
>>> 
>>> Keep reading
>>> 
>>> $ lscpu 
>>> Architecture:        x86_64
>>> CPU op-mode(s):      32-bit, 64-bit
>>> Byte Order:          Little Endian
>>> CPU(s):              64
>>> On-line CPU(s) list: 0-63
>>> Thread(s) per core:  2
>>> Core(s) per socket:  16
>>> Socket(s):           2
>>> NUMA node(s):        2
>>> Vendor ID:           GenuineIntel
>>> CPU family:          6
>>> Model:               85
>>> Model name:          Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
>>> Stepping:            7
>>> CPU MHz:             1000.603
>>> CPU max MHz:         2301.0000
>>> CPU min MHz:         1000.0000
>>> BogoMIPS:            4600.00
>>> Virtualization:      VT-x
>>> L1d cache:           32K
>>> L1i cache:           32K
>>> L2 cache:            1024K
>>> L3 cache:            22528K
>>> NUMA node0 CPU(s):   0-15,32-47
>>> NUMA node1 CPU(s):   16-31,48-63
>>> Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
>>> mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall 
>>> nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl 
>>> xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl 
>>> vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic 
>>> movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 
>>> 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd 
>>> mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid 
>>> fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f 
>>> avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw 
>>> avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total 
>>> cqm_mbm_local dtherm ida arat pln pts pku ospke avx512_vnni md_clear 
>>> flush_l1d arch_capabilities
>>> 
>>> Test program 
>>> 
>>> #if defined(__FMA__)
>>> #error FMA
>>> #endif
>>> 
>>> #if defined(__AVX512F__)
>>> #error AVX512F
>>> #endif
>>> 
>>> #if defined(__AVX2__)
>>> #error AVX2
>>> #endif
>>> 
>>> 
>>> icc mytest.c
>>> /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o: In 
>>> function `_start':
>>> (.text+0x20): undefined reference to `main'
