> On 14 Feb 2021, at 4:52 PM, Zhang, Hong via petsc-dev <[email protected]> 
> wrote:
> 
> 
> 
>> On Feb 14, 2021, at 5:05 AM, Patrick Sanan <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> 
>> 
>>> On 14.02.2021, at 07:22, Barry Smith <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>> 
>>> 
>>>> On Feb 13, 2021, at 11:58 PM, Jed Brown <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> I usually configure --with-debugging=0 COPTFLAGS='-O2 -march=native' or 
>>>> similar. There's a tension here between optimizing aggressively for the 
>>>> current machine and making binaries that work on other machines. Most 
>>>> configure systems default to making somewhat portable binaries, so that's 
>>>> a principle of least surprise. (Though you're no novice and seem to have 
>>>> been surprised anyway.)
>>>> 
>>>> I'd kinda prefer if we recommended making portable binaries that run-time 
>>>> detected when to use newer instructions where it matters.
>>> 
>>>   How do we do this? What can we put in configure to do this?
>>> 
>>>   Yes, I never paid attention to the AVX nonsense over the years and never 
>>> realized that Intel and GNU (and hence PETSc) all compile by default for 
>>> machines I used in my twenties.
>>> 
>>>   Expecting PETSc users to automatically add -march= is not realistic.  I 
>>> will try to rig something up in configure so that, if the user does not 
>>> provide -march, something reasonable is selected. 
>> A softer (yet trivial to implement) option might also be to just alert the 
>> user that these flags exist in the usual message about using default 
>> optimization flags. Something like this would encourage users to do what Jed 
>> is doing:
>> 
>>       ***** WARNING: Using default optimization C flags -g -O3
>> You might consider manually setting optimal optimization flags for your 
>> system with
>> COPTFLAGS="optimization flags" see config/examples/arch-*-opt.py for 
>> examples. 
>> In particular, you may want to supply specific flags (e.g. -march=native) 
>> to take advantage of higher-performance instructions.
> 
> I think this is a reasonable thing to do.

This is a reasonable message to print on the screen, but I don’t think this is 
a reasonable flag to impose by default.
You are basically asking all package managers to add a new flag 
(-march=generic) which was previously not needed.

I’m crossing my fingers Jed has a clever way of "making portable binaries that 
run-time detected when to use newer instructions where it matters", because 
-march=native by default is just not practical when deploying software.
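
For reference, one common way to get this kind of run-time detection with GCC 
or Clang is to build a portable baseline plus a variant compiled for newer 
instructions, and pick between them once at run time. The block below is only 
a minimal sketch under that assumption, not something PETSc does: the names 
dot_generic, dot_avx2, select_dot, dot and HAVE_X86_DISPATCH are hypothetical, 
and it relies on the GCC/Clang builtins __builtin_cpu_init and 
__builtin_cpu_supports and on the target function attribute.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical guard: only use the GCC/Clang x86 builtins where they are expected to exist. */
#if (defined(__GNUC__) || defined(__clang__)) && !defined(__INTEL_COMPILER) && defined(__x86_64__)
#define HAVE_X86_DISPATCH 1
#endif

/* Portable baseline: built with whatever the default target is (plain x86-64 has only SSE2). */
static double dot_generic(const double *x, const double *y, size_t n)
{
  double s = 0.0;
  for (size_t i = 0; i < n; i++) s += x[i] * y[i];
  return s;
}

#ifdef HAVE_X86_DISPATCH
/* Same loop, but the compiler is allowed to use AVX2 and FMA for this function only. */
__attribute__((target("avx2,fma")))
static double dot_avx2(const double *x, const double *y, size_t n)
{
  double s = 0.0;
  for (size_t i = 0; i < n; i++) s += x[i] * y[i];
  return s;
}
#endif

typedef double (*dot_fn)(const double *, const double *, size_t);

/* Ask the CPU at run time what it supports and return the matching variant. */
static dot_fn select_dot(void)
{
#ifdef HAVE_X86_DISPATCH
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma")) return dot_avx2;
#endif
  return dot_generic;
}

/* Public entry point: resolves the implementation on first use (not thread-safe as written). */
double dot(const double *x, const double *y, size_t n)
{
  static dot_fn impl = NULL;
  if (!impl) impl = select_dot();
  assert(impl != NULL);
  return impl(x, y, n);
}
```

The binary built this way needs no -march flag and still runs on a CPU without 
AVX2, since dot_avx2 is only called after the check succeeds. The cost is one 
indirect call per invocation, so this only pays off for kernels where the 
newer instructions matter.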

Thanks,
Pierre

> We should also inform users that tuning -march options may enable 
> vectorization instructions such as SSE (3 and above) and AVX, but will 
> generate nonportable binaries.
> 
> If we add -march=native to the configure test, we will need to run 
> executables to make sure the specified instruction sets are supported by the 
> CPU where the code is running. For PETSc, the executables should cover all 
> the intrinsics we use in the code ideally; otherwise, users will get run-time 
> errors when there is a mismatch in vectorization instructions between 
> compiler support and CPU support.
> 
> Hong
> 
>> 
>> None of the examples in config/examples actually use -march=native, and this 
>> is a very common thing to do that, as you point out, isn't obvious until you 
>> know you have to do it, so it seems to be worth the screen space.
>> 
>> 
>> 
>> 
>> 
>>> 
>>>  Barry
>>> 
>>> 
>>>> 
>>>> Barry Smith <[email protected] <mailto:[email protected]>> writes:
>>>> 
>>>>> Shouldn't configure be setting something appropriate for this 
>>>>> automatically? This is nuts, it means when users do a ./configure make 
>>>>> unless they pass weird arguments they sure as heck don't know about to 
>>>>> the compiler they won't get any of the glory that they expect and that 
>>>>> has been in almost all Intel systems forever.
>>>>> 
>>>>> Barry
>>>>> 
>>>>> I run ./configure --with-debugging=0 and I get none of the stuff added by 
>>>>> Intel for 15+ years?
>>>>> 
>>>>> 
>>>>>> On Feb 13, 2021, at 11:26 PM, Jed Brown <[email protected] 
>>>>>> <mailto:[email protected]>> wrote:
>>>>>> 
>>>>>> Use -march=native or similar. The default target is basic x86_64, which 
>>>>>> has only SSE2.
>>>>>> 
>>>>>> Barry Smith <[email protected] <mailto:[email protected]>> writes:
>>>>>> 
>>>>>>> PETSc source has code like defined(__AVX2__) in the source but it does 
>>>>>>> not seem to be able to find any of these macros (icc or gcc) on the 
>>>>>>> petsc-02 system
>>>>>>> 
>>>>>>> Are these macros supposed to be defined? How does one get them to be 
>>>>>>> defined? Why are they not defined? What am I doing wrong?
>>>>>>> 
>>>>>>> Keep reading
>>>>>>> 
>>>>>>> $ lscpu 
>>>>>>> Architecture:        x86_64
>>>>>>> CPU op-mode(s):      32-bit, 64-bit
>>>>>>> Byte Order:          Little Endian
>>>>>>> CPU(s):              64
>>>>>>> On-line CPU(s) list: 0-63
>>>>>>> Thread(s) per core:  2
>>>>>>> Core(s) per socket:  16
>>>>>>> Socket(s):           2
>>>>>>> NUMA node(s):        2
>>>>>>> Vendor ID:           GenuineIntel
>>>>>>> CPU family:          6
>>>>>>> Model:               85
>>>>>>> Model name:          Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
>>>>>>> Stepping:            7
>>>>>>> CPU MHz:             1000.603
>>>>>>> CPU max MHz:         2301.0000
>>>>>>> CPU min MHz:         1000.0000
>>>>>>> BogoMIPS:            4600.00
>>>>>>> Virtualization:      VT-x
>>>>>>> L1d cache:           32K
>>>>>>> L1i cache:           32K
>>>>>>> L2 cache:            1024K
>>>>>>> L3 cache:            22528K
>>>>>>> NUMA node0 CPU(s):   0-15,32-47
>>>>>>> NUMA node1 CPU(s):   16-31,48-63
>>>>>>> Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr 
>>>>>>> pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe 
>>>>>>> syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts 
>>>>>>> rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq 
>>>>>>> dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm 
>>>>>>> pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave 
>>>>>>> avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 
>>>>>>> invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced 
>>>>>>> tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 
>>>>>>> smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap 
>>>>>>> clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec 
>>>>>>> xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm 
>>>>>>> ida arat pln pts pku ospke avx512_vnni md_clear flush_l1d 
>>>>>>> arch_capabilities
>>>>>>> 
>>>>>>> Test program 
>>>>>>> 
>>>>>>> #if defined(__FMA__)
>>>>>>> #error FMA
>>>>>>> #endif
>>>>>>> 
>>>>>>> #if defined(__AVX512F__)
>>>>>>> #error AVX512F
>>>>>>> #endif
>>>>>>> 
>>>>>>> #if defined(__AVX2__)
>>>>>>> #error AVX2
>>>>>>> #endif
>>>>>>> 
>>>>>>> 
>>>>>>> icc mytest.c
>>>>>>> /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o: In 
>>>>>>> function `_start':
>>>>>>> (.text+0x20): undefined reference to `main'
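
The link error above only says that mytest.c has no main. None of the #error 
lines fired, so with the default flags none of the three macros was defined. 
A variant of the test that reports the result instead of aborting the compile 
could look like the sketch below. The names isa_compiled_in, isa_report and 
the ISA_* constants are hypothetical; the macros tested are the same ones as 
in the original test program.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bit flags, one per macro checked in the original test program. */
enum { ISA_FMA = 1, ISA_AVX2 = 2, ISA_AVX512F = 4 };

/* Returns a bitmask of the instruction-set macros the compiler defined for this build. */
unsigned isa_compiled_in(void)
{
  unsigned mask = 0;
#if defined(__FMA__)
  mask |= ISA_FMA;
#endif
#if defined(__AVX2__)
  mask |= ISA_AVX2;
#endif
#if defined(__AVX512F__)
  mask |= ISA_AVX512F;
#endif
  return mask;
}

/* Prints one yes/no line per macro to the given stream. */
void isa_report(FILE *out)
{
  unsigned m = isa_compiled_in();
  assert(out != NULL);
  fprintf(out, "__FMA__:     %s\n", (m & ISA_FMA) ? "yes" : "no");
  fprintf(out, "__AVX2__:    %s\n", (m & ISA_AVX2) ? "yes" : "no");
  fprintf(out, "__AVX512F__: %s\n", (m & ISA_AVX512F) ? "yes" : "no");
}
```

Calling isa_report(stdout) from a main and building once with default flags 
and once with -march=native should show the difference on a machine like the 
one above. With gcc, the predefined macros can also be listed directly with 
gcc -march=native -dM -E - < /dev/null and filtered for AVX or FMA.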
