I usually configure --with-debugging=0 COPTFLAGS='-O2 -march=native' or similar. There's a tension here between optimizing aggressively for the current machine and making binaries that work on other machines. Most configure systems default to making somewhat portable binaries, which follows the principle of least surprise. (Though you're no novice and seem to have been surprised anyway.)
I'd kinda prefer if we recommended making portable binaries that run-time detected when to use newer instructions where it matters.

Barry Smith <[email protected]> writes:

> Shouldn't configure be setting something appropriate for this
> automatically? This is nuts, it means when users do a ./configure make unless
> they pass weird arguments they sure as heck don't know about to the compiler
> they won't get any of the glory that they expect and that has been in almost
> all Intel systems forever.
>
> Barry
>
> I run ./configure --with-debugging=0 and I get none of the stuff added by
> Intel for 15+ years?
>
>
>> On Feb 13, 2021, at 11:26 PM, Jed Brown <[email protected]> wrote:
>>
>> Use -march=native or similar. The default target is basic x86_64, which has
>> only SSE2.
>>
>> Barry Smith <[email protected]> writes:
>>
>>> PETSc source has code like defined(__AVX2__) in the source but it does not
>>> seem to be able to find any of these macros (icc or gcc) on the petsc-02
>>> system
>>>
>>> Are these macros supposed to be defined? How does on get them to be
>>> defined? Why are they not define? What am I doing wrong?
>>>
>>> Keep reading
>>>
>>> $ lscpu
>>> Architecture: x86_64
>>> CPU op-mode(s): 32-bit, 64-bit
>>> Byte Order: Little Endian
>>> CPU(s): 64
>>> On-line CPU(s) list: 0-63
>>> Thread(s) per core: 2
>>> Core(s) per socket: 16
>>> Socket(s): 2
>>> NUMA node(s): 2
>>> Vendor ID: GenuineIntel
>>> CPU family: 6
>>> Model: 85
>>> Model name: Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
>>> Stepping: 7
>>> CPU MHz: 1000.603
>>> CPU max MHz: 2301.0000
>>> CPU min MHz: 1000.0000
>>> BogoMIPS: 4600.00
>>> Virtualization: VT-x
>>> L1d cache: 32K
>>> L1i cache: 32K
>>> L2 cache: 1024K
>>> L3 cache: 22528K
>>> NUMA node0 CPU(s): 0-15,32-47
>>> NUMA node1 CPU(s): 16-31,48-63
>>> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
>>> mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall
>>> nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl
>>> xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl
>>> vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic
>>> movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm
>>> 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd
>>> mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid
>>> fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f
>>> avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw
>>> avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total
>>> cqm_mbm_local dtherm ida arat pln pts pku ospke avx512_vnni md_clear
>>> flush_l1d arch_capabilities
>>>
>>> Test program
>>>
>>> #if defined(__FMA__)
>>> #error FMA
>>> #endif
>>>
>>> #if defined(__AVX512F__)
>>> #error AVX512F
>>> #endif
>>>
>>> #if defined(__AVX2__)
>>> #error AVX2
>>> #endif
>>>
>>>
>>> icc mytest.c
>>> /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o: In
>>> function `_start':
>>> (.text+0x20): undefined reference to `main'
