> On 14 Feb 2021, at 4:52 PM, Zhang, Hong via petsc-dev wrote:
>
>> On Feb 14, 2021, at 5:05 AM, Patrick Sanan wrote:
>>
>>> On 14.02.2021 at 07:22, Barry Smith wrote:
>>>
>>>> On Feb 13, 2021, at 11:58 PM, Jed Brown wrote:
>>>>
>>>> I usually configure --with-debugging=0 COPTFLAGS='-O2 -march=native' or
>>>> similar. There's a tension here between optimizing aggressively for the
>>>> current machine and making binaries that work on other machines. Most
>>>> configure systems default to making somewhat portable binaries, so that's
>>>> a principle of least surprise. (Though you're no novice and seem to have
>>>> been surprised anyway.)
>>>>
>>>> I'd kinda prefer if we recommended making portable binaries that run-time
>>>> detected when to use newer instructions where it matters.
>>>
>>> How do we do this? What can we put in configure to do this?
>>>
>>> Yes, I never paid attention to the AVX nonsense over the years and never
>>> realized that Intel and GNU (and hence PETSc) both compile by default for
>>> machines I used in my twenties.
>>>
>>> Expecting PETSc users to automatically add -march= is not realistic. I
>>> will try to rig something up in configure where if the user does not
>>> provide march something reasonable is selected.
>> A softer (yet trivial to implement) option might also be to just alert the
>> user that these flags exist in the usual message about using default
>> optimization flags. Something like this would encourage users to do what Jed
>> is doing:
>>
>> ***** WARNING: Using default optimization C flags -g -O3
>> You might consider manually setting optimal optimization flags for your
>> system with
>> COPTFLAGS="optimization flags" see config/examples/arch-*-opt.py for
>> examples.
>> In particular, you may want to supply specific flags (e.g. -march=native)
>> to take advantage of higher-performance instructions.
>
> I think this is a reasonable thing to do.
This is a reasonable message to print on the screen, but I don’t think this is a reasonable flag to impose by default. You are basically asking all package managers to add a new flag (-march=generic) which was previously not needed.
I’m crossing my fingers Jed has a clever way of "making portable binaries that run-time detected when to use newer instructions where it matters", because -march=native by default is just not practical when deploying software.

Thanks,
Pierre

> We should also inform users that tuning -march options may enable
> vectorization instructions such as SSE (3 and above) and AVX but generate
> nonportable binaries.
>
> If we add -march=native to the configure test, we will need to run
> executables to make sure the specified instruction sets are supported by the
> CPU where the code is running. For PETSc, the executables should cover all
> the intrinsics we use in the code ideally; otherwise, users will get run-time
> errors when there is a mismatch in vectorization instructions between
> compiler support and CPU support.
>
> Hong
>
>>
>> None of the examples in config/examples actually use -march=native, and this
>> is a very common thing to do that, as you point out, isn't obvious until you
>> know you have to do it, so it seems to be worth the screen space.
>>
>>>
>>> Barry
>>>
>>>>
>>>> Barry Smith writes:
>>>>
>>>>> Shouldn't configure be setting something appropriate for this
>>>>> automatically? This is nuts, it means when users do a ./configure make
>>>>> unless they pass weird arguments they sure as heck don't know about to
>>>>> the compiler they won't get any of the glory that they expect and that
>>>>> has been in almost all Intel systems forever.
>>>>>
>>>>> Barry
>>>>>
>>>>> I run ./configure --with-debugging=0 and I get none of the stuff added by
>>>>> Intel for 15+ years?
>>>>>
>>>>>> On Feb 13, 2021, at 11:26 PM, Jed Brown wrote:
>>>>>>
>>>>>> Use -march=native or similar. The default target is basic x86_64, which
>>>>>> has only SSE2.
>>>>>>
>>>>>> Barry Smith writes:
>>>>>>
>>>>>>> PETSc source has code like defined(__AVX2__) in the source but it does
>>>>>>> not seem to be able to find any of these macros (icc or gcc) on the
>>>>>>> petsc-02 system
>>>>>>>
>>>>>>> Are these macros supposed to be defined? How does one get them to be
>>>>>>> defined? Why are they not defined? What am I doing wrong?
>>>>>>>
>>>>>>> Keep reading
>>>>>>>
>>>>>>> $ lscpu
>>>>>>> Architecture:        x86_64
>>>>>>> CPU op-mode(s):      32-bit, 64-bit
>>>>>>> Byte Order:          Little Endian
>>>>>>> CPU(s):              64
>>>>>>> On-line CPU(s) list: 0-63
>>>>>>> Thread(s) per core:  2
>>>>>>> Core(s) per socket:  16
>>>>>>> Socket(s):           2
>>>>>>> NUMA node(s):        2
>>>>>>> Vendor ID:           GenuineIntel
>>>>>>> CPU family:          6
>>>>>>> Model:               85
>>>>>>> Model name:          Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
>>>>>>> Stepping:            7
>>>>>>> CPU MHz:             1000.603
>>>>>>> CPU max MHz:         2301.0000
>>>>>>> CPU min MHz:         1000.0000
>>>>>>> BogoMIPS:            4600.00
>>>>>>> Virtualization:      VT-x
>>>>>>> L1d cache:           32K
>>>>>>> L1i cache:           32K
>>>>>>> L2 cache:            1024K
>>>>>>> L3 cache:            22528K
>>>>>>> NUMA node0 CPU(s):   0-15,32-47
>>>>>>> NUMA node1 CPU(s):   16-31,48-63
>>>>>>> Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
>>>>>>> pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
>>>>>>> syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts
>>>>>>> rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
>>>>>>> dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm
>>>>>>> pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave
>>>>>>> avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3
>>>>>>> invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced
>>>>>>> tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2
>>>>>>> smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap
>>>>>>> clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec
>>>>>>> xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm
>>>>>>> ida arat pln pts pku ospke avx512_vnni md_clear flush_l1d
>>>>>>> arch_capabilities
>>>>>>>
>>>>>>> Test program
>>>>>>>
>>>>>>> #if defined(__FMA__)
>>>>>>> #error FMA
>>>>>>> #endif
>>>>>>>
>>>>>>> #if defined(__AVX512F__)
>>>>>>> #error AVX512F
>>>>>>> #endif
>>>>>>>
>>>>>>> #if defined(__AVX2__)
>>>>>>> #error AVX2
>>>>>>> #endif
>>>>>>>
>>>>>>>
>>>>>>> icc mytest.c
>>>>>>> /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o: In
>>>>>>> function `_start':
>>>>>>> (.text+0x20): undefined reference to `main'
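
For reference, here is a rough sketch of what "portable binaries that run-time detect when to use newer instructions" could look like in C. This is not PETSc code and not necessarily what Jed has in mind. It assumes GCC or Clang on x86 (it relies on the `__builtin_cpu_supports` builtin and the `target` function attribute; icc and other compilers may need a different mechanism), and the names `axpy_baseline`, `axpy_avx2`, `select_axpy`, `compile_time_isa`, and `dispatch_demo` are made up for illustration. The file can be compiled without any -march flag: macros like __AVX2__ then stay undefined, as in Barry's test, but the AVX2 variant is still built and is only chosen if the CPU reports support.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: GCC/Clang extensions on x86; everything else uses the baseline. */
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

/* Compile-time view: what the compiler flags allowed for the whole file. */
static const char *compile_time_isa(void)
{
#if defined(__AVX512F__)
  return "AVX512F";
#elif defined(__AVX2__)
  return "AVX2";
#else
  return "baseline (no AVX2 macros defined)";
#endif
}

/* Portable kernel, built for the default target: y += a*x. */
static void axpy_baseline(size_t n, double a, const double *x, double *y)
{
  for (size_t i = 0; i < n; i++) y[i] += a * x[i];
}

#if HAVE_X86_DISPATCH
/* Same loop, but the compiler may emit AVX2 code for this one function. */
static __attribute__((target("avx2"))) void axpy_avx2(size_t n, double a, const double *x, double *y)
{
  for (size_t i = 0; i < n; i++) y[i] += a * x[i];
}
#endif

typedef void (*axpy_fn)(size_t, double, const double *, double *);

/* Run-time view: choose a kernel based on what the running CPU reports. */
static axpy_fn select_axpy(void)
{
#if HAVE_X86_DISPATCH
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx2")) return axpy_avx2;
#endif
  return axpy_baseline;
}

/* Small demonstration; returns 1 if the AVX2 kernel was selected, else 0. */
static int dispatch_demo(void)
{
  double   x[3] = {1.0, 2.0, 3.0};
  double   y[3] = {1.0, 1.0, 1.0};
  axpy_fn  axpy = select_axpy();

  axpy(3, 2.0, x, y);
  assert(y[0] == 3.0 && y[1] == 5.0 && y[2] == 7.0);
  printf("compiled for: %s, run-time kernel: %s\n", compile_time_isa(),
         axpy == axpy_baseline ? "baseline" : "avx2");
  return axpy == axpy_baseline ? 0 : 1;
}
```

Calling `dispatch_demo()` from a scratch `main` shows both views side by side. The trade-off is that each hot kernel has to be written (or at least compiled) more than once, so this only seems worth it "where it matters", as Jed put it.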
