Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
On Thu, Aug 10, 2023 at 7:13 PM Richard Biener wrote: > > On Thu, Aug 10, 2023 at 11:16 AM Hongtao Liu wrote: > > > > On Thu, Aug 10, 2023 at 4:07 PM Hongtao Liu wrote: > > > > > > On Thu, Aug 10, 2023 at 3:55 PM Hongtao Liu wrote: > > > > > > > > On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches > > > > wrote: > > > > > > > > > > On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak wrote: > > > > > > > > > > > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener > > > > > > wrote: > > > > > > > > > > > > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt > > > > > > > wrote: > > > > > > > > > > > > > > > > Currently we have 3 different independent tunes for gather > > > > > > > > "use_gather,use_gather_2parts,use_gather_4parts", > > > > > > > > similar for scatter, there're > > > > > > > > "use_scatter,use_scatter_2parts,use_scatter_4parts" > > > > > > > > > > > > > > > > The patch support 2 standardizing options to enable/disable > > > > > > > > vectorization for all gather/scatter instructions. The options > > > > > > > > is > > > > > > > > interpreted by driver to 3 tunes. > > > > > > > > > > > > > > > > bootstrapped and regtested on x86_64-pc-linux-gnu. > > > > > > > > Ok for trunk? > > > > > > > > > > > > > > I think -mgather/-mscatter are too close to -mfma suggesting they > > > > > > > enable part of an ISA but they won't disable the use of intrinsics > > > > > > > or enable gather/scatter on CPUs where the ISA doesn't have them. > > > > > > > > > > > > > > May I suggest to invent a more generic "short-cut" to > > > > > > > -mtune-ctrl=^X, maybe -mdisable=X? And for gather/scatter > > > > > > > tunables add ^use_gather_any to cover all cases? (or > > > > > > > change what use_gather controls - it seems we changed its > > > > > > > meaning before, and instead add use_gather_8parts and > > > > > > > use_gather_16parts) > > > > > > > > > > > > > > That is, what's the point of this? 
> > The point of this is to keep consistent between GCC, LLVM, and > > ICX(Intel® oneAPI DPC++/C++ Compiler) . > > LLVM,ICX will support that option. > > GCC has very many options that are not the same as LLVM or ICX, > I don't see a good reason to special case this one. As said, it's > a very bad name IMHO.

In general terms, yes. But this is a new option, so wouldn't it be better to be consistent? And the problem with -mfma is mainly that the cpuid bit is itself called fma, whereas we don't have a cpuid bit called gather or scatter. With clear documentation that the option only affects auto-vectorization, -m{no-,}{gather,scatter} looks fine to me. As Honza mentioned, users need an option to turn gather/scatter auto-vectorization on and off, and I don't think they will expect the option to also apply to intrinsics. If -mtune-ctrl= is not suitable for direct exposure to users, then the original proposal should be OK? Developers will maintain the mapping between -m[no-]gather/-m[no-]scatter and -mtune-ctrl=XXX so that it stays consistent between GCC versions.

> > Richard. > > > > > > > > > > > > > https://www.phoronix.com/review/downfall > > > > > > > > > > > > that caused: > > > > > > > > > > > > https://www.phoronix.com/review/intel-downfall-benchmarks > > > > > > > > > > Yes, I know. But there's -mtune-ctl= doing the trick. > > > > > GCC 11 had only 'use_gather', covering all number of lanes. I suggest > > > > > to resurrect that behavior and add use_gather_8+parts (or two, IIRC > > > > > gather works only on SI/SFmode or larger). > > > > > > > > > > Then -mtune-ctl=^use_gather works which I think is nice enough? > > > > So basically, -mtune-ctrl=^use_gather is used to turn off all gather > > > > vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them? > > > > We don't have an extrat explicit flag for target tune, just single bit > > > > - ix86_tune_features[X86_TUNE_USE_GATHER] > > > Looks like I can handle it specially in parse_mtune_ctrl_str, let me try. > > > > > > > > > > Richard.
> > > > > > > > > > > Uros. > > > > > > > > > > > > > > > > -- > > > > BR, > > > > Hongtao > > > > > > > > > > > > -- > > > BR, > > > Hongtao > > > > > > > > -- > > BR, > > Hongtao -- BR, Hongtao
Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
> On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak wrote: > > > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener > > wrote: > > > > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt wrote: > > > > > > > > Currently we have 3 different independent tunes for gather > > > > "use_gather,use_gather_2parts,use_gather_4parts", > > > > similar for scatter, there're > > > > "use_scatter,use_scatter_2parts,use_scatter_4parts" > > > > > > > > The patch support 2 standardizing options to enable/disable > > > > vectorization for all gather/scatter instructions. The options is > > > > interpreted by driver to 3 tunes. > > > > > > > > bootstrapped and regtested on x86_64-pc-linux-gnu. > > > > Ok for trunk? > > > > > > I think -mgather/-mscatter are too close to -mfma suggesting they > > > enable part of an ISA but they won't disable the use of intrinsics > > > or enable gather/scatter on CPUs where the ISA doesn't have them. > > > > > > May I suggest to invent a more generic "short-cut" to > > > -mtune-ctrl=^X, maybe -mdisable=X? And for gather/scatter > > > tunables add ^use_gather_any to cover all cases? (or > > > change what use_gather controls - it seems we changed its > > > meaning before, and instead add use_gather_8parts and > > > use_gather_16parts) > > > > > > That is, what's the point of this? > > > > https://www.phoronix.com/review/downfall > > > > that caused: > > > > https://www.phoronix.com/review/intel-downfall-benchmarks > > Yes, I know. But there's -mtune-ctl= doing the trick. > GCC 11 had only 'use_gather', covering all number of lanes. I suggest > to resurrect that behavior and add use_gather_8+parts (or two, IIRC > gather works only on SI/SFmode or larger). > > Then -mtune-ctl=^use_gather works which I think is nice enough?

-mtune-ctrl is really intended for GCC developers. It is neither backward compatible nor fully documented, and bad sets of values may trigger ICEs. If gathers become very slow, I think normal users may want to disable them, and in that situation a specialized command-line option makes sense to me.

Honza

> > Richard. > > > Uros.
Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
On Thu, Aug 10, 2023 at 11:16 AM Hongtao Liu wrote: > > On Thu, Aug 10, 2023 at 4:07 PM Hongtao Liu wrote: > > > > On Thu, Aug 10, 2023 at 3:55 PM Hongtao Liu wrote: > > > > > > On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches > > > wrote: > > > > > > > > On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak wrote: > > > > > > > > > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener > > > > > wrote: > > > > > > > > > > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt > > > > > > wrote: > > > > > > > > > > > > > > Currently we have 3 different independent tunes for gather > > > > > > > "use_gather,use_gather_2parts,use_gather_4parts", > > > > > > > similar for scatter, there're > > > > > > > "use_scatter,use_scatter_2parts,use_scatter_4parts" > > > > > > > > > > > > > > The patch support 2 standardizing options to enable/disable > > > > > > > vectorization for all gather/scatter instructions. The options is > > > > > > > interpreted by driver to 3 tunes. > > > > > > > > > > > > > > bootstrapped and regtested on x86_64-pc-linux-gnu. > > > > > > > Ok for trunk? > > > > > > > > > > > > I think -mgather/-mscatter are too close to -mfma suggesting they > > > > > > enable part of an ISA but they won't disable the use of intrinsics > > > > > > or enable gather/scatter on CPUs where the ISA doesn't have them. > > > > > > > > > > > > May I suggest to invent a more generic "short-cut" to > > > > > > -mtune-ctrl=^X, maybe -mdisable=X? And for gather/scatter > > > > > > tunables add ^use_gather_any to cover all cases? (or > > > > > > change what use_gather controls - it seems we changed its > > > > > > meaning before, and instead add use_gather_8parts and > > > > > > use_gather_16parts) > > > > > > > > > > > > That is, what's the point of this? > The point of this is to keep consistent between GCC, LLVM, and > ICX(Intel® oneAPI DPC++/C++ Compiler) . > LLVM,ICX will support that option. 
GCC has very many options that are not the same as LLVM or ICX, I don't see a good reason to special case this one. As said, it's a very bad name IMHO. Richard. > > > > > > > > > > https://www.phoronix.com/review/downfall > > > > > > > > > > that caused: > > > > > > > > > > https://www.phoronix.com/review/intel-downfall-benchmarks > > > > > > > > Yes, I know. But there's -mtune-ctl= doing the trick. > > > > GCC 11 had only 'use_gather', covering all number of lanes. I suggest > > > > to resurrect that behavior and add use_gather_8+parts (or two, IIRC > > > > gather works only on SI/SFmode or larger). > > > > > > > > Then -mtune-ctl=^use_gather works which I think is nice enough? > > > So basically, -mtune-ctrl=^use_gather is used to turn off all gather > > > vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them? > > > We don't have an extrat explicit flag for target tune, just single bit > > > - ix86_tune_features[X86_TUNE_USE_GATHER] > > Looks like I can handle it specially in parse_mtune_ctrl_str, let me try. > > > > > > > > Richard. > > > > > > > > > Uros. > > > > > > > > > > > > -- > > > BR, > > > Hongtao > > > > > > > > -- > > BR, > > Hongtao > > > > -- > BR, > Hongtao
Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
On Thu, Aug 10, 2023 at 9:55 AM Hongtao Liu wrote: > > On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches > wrote: > > > > On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak wrote: > > > > > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener > > > wrote: > > > > > > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt wrote: > > > > > > > > > > Currently we have 3 different independent tunes for gather > > > > > "use_gather,use_gather_2parts,use_gather_4parts", > > > > > similar for scatter, there're > > > > > "use_scatter,use_scatter_2parts,use_scatter_4parts" > > > > > > > > > > The patch support 2 standardizing options to enable/disable > > > > > vectorization for all gather/scatter instructions. The options is > > > > > interpreted by driver to 3 tunes. > > > > > > > > > > bootstrapped and regtested on x86_64-pc-linux-gnu. > > > > > Ok for trunk? > > > > > > > > I think -mgather/-mscatter are too close to -mfma suggesting they > > > > enable part of an ISA but they won't disable the use of intrinsics > > > > or enable gather/scatter on CPUs where the ISA doesn't have them. > > > > > > > > May I suggest to invent a more generic "short-cut" to > > > > -mtune-ctrl=^X, maybe -mdisable=X? And for gather/scatter > > > > tunables add ^use_gather_any to cover all cases? (or > > > > change what use_gather controls - it seems we changed its > > > > meaning before, and instead add use_gather_8parts and > > > > use_gather_16parts) > > > > > > > > That is, what's the point of this? > > > > > > https://www.phoronix.com/review/downfall > > > > > > that caused: > > > > > > https://www.phoronix.com/review/intel-downfall-benchmarks > > > > Yes, I know. But there's -mtune-ctl= doing the trick. > > GCC 11 had only 'use_gather', covering all number of lanes. I suggest > > to resurrect that behavior and add use_gather_8+parts (or two, IIRC > > gather works only on SI/SFmode or larger). > > > > Then -mtune-ctl=^use_gather works which I think is nice enough? 
> So basically, -mtune-ctrl=^use_gather is used to turn off all gather > vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them?

No, -mtune-ctrl=use_gather should turn them all on as well.

> We don't have an extrat explicit flag for target tune, just single bit > - ix86_tune_features[X86_TUNE_USE_GATHER]

GCC 11 just had that single bit for all. I'm not sure how awkward it is to have use_gather alias use_gather_2parts, use_gather_4parts ...

> > > > Richard. > > > > > Uros. > > > > -- > BR, > Hongtao
Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
On Thu, Aug 10, 2023 at 4:07 PM Hongtao Liu wrote: > > On Thu, Aug 10, 2023 at 3:55 PM Hongtao Liu wrote: > > > > On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches > > wrote: > > > > > > On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak wrote: > > > > > > > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener > > > > wrote: > > > > > > > > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt > > > > > wrote: > > > > > > > > > > > > Currently we have 3 different independent tunes for gather > > > > > > "use_gather,use_gather_2parts,use_gather_4parts", > > > > > > similar for scatter, there're > > > > > > "use_scatter,use_scatter_2parts,use_scatter_4parts" > > > > > > > > > > > > The patch support 2 standardizing options to enable/disable > > > > > > vectorization for all gather/scatter instructions. The options is > > > > > > interpreted by driver to 3 tunes. > > > > > > > > > > > > bootstrapped and regtested on x86_64-pc-linux-gnu. > > > > > > Ok for trunk? > > > > > > > > > > I think -mgather/-mscatter are too close to -mfma suggesting they > > > > > enable part of an ISA but they won't disable the use of intrinsics > > > > > or enable gather/scatter on CPUs where the ISA doesn't have them. > > > > > > > > > > May I suggest to invent a more generic "short-cut" to > > > > > -mtune-ctrl=^X, maybe -mdisable=X? And for gather/scatter > > > > > tunables add ^use_gather_any to cover all cases? (or > > > > > change what use_gather controls - it seems we changed its > > > > > meaning before, and instead add use_gather_8parts and > > > > > use_gather_16parts) > > > > > > > > > > That is, what's the point of this? The point of this is to keep consistent between GCC, LLVM, and ICX(Intel® oneAPI DPC++/C++ Compiler) . LLVM,ICX will support that option. > > > > > > > > https://www.phoronix.com/review/downfall > > > > > > > > that caused: > > > > > > > > https://www.phoronix.com/review/intel-downfall-benchmarks > > > > > > Yes, I know. But there's -mtune-ctl= doing the trick. 
> > > GCC 11 had only 'use_gather', covering all number of lanes. I suggest > > > to resurrect that behavior and add use_gather_8+parts (or two, IIRC > > > gather works only on SI/SFmode or larger). > > > > > > Then -mtune-ctl=^use_gather works which I think is nice enough? > > So basically, -mtune-ctrl=^use_gather is used to turn off all gather > > vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them? > > We don't have an extrat explicit flag for target tune, just single bit > > - ix86_tune_features[X86_TUNE_USE_GATHER] > Looks like I can handle it specially in parse_mtune_ctrl_str, let me try. > > > > > > Richard. > > > > > > > Uros. > > > > > > > > -- > > BR, > > Hongtao > > > > -- > BR, > Hongtao -- BR, Hongtao
Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
On Thu, Aug 10, 2023 at 3:55 PM Hongtao Liu wrote: > > On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches > wrote: > > > > On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak wrote: > > > > > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener > > > wrote: > > > > > > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt wrote: > > > > > > > > > > Currently we have 3 different independent tunes for gather > > > > > "use_gather,use_gather_2parts,use_gather_4parts", > > > > > similar for scatter, there're > > > > > "use_scatter,use_scatter_2parts,use_scatter_4parts" > > > > > > > > > > The patch support 2 standardizing options to enable/disable > > > > > vectorization for all gather/scatter instructions. The options is > > > > > interpreted by driver to 3 tunes. > > > > > > > > > > bootstrapped and regtested on x86_64-pc-linux-gnu. > > > > > Ok for trunk? > > > > > > > > I think -mgather/-mscatter are too close to -mfma suggesting they > > > > enable part of an ISA but they won't disable the use of intrinsics > > > > or enable gather/scatter on CPUs where the ISA doesn't have them. > > > > > > > > May I suggest to invent a more generic "short-cut" to > > > > -mtune-ctrl=^X, maybe -mdisable=X? And for gather/scatter > > > > tunables add ^use_gather_any to cover all cases? (or > > > > change what use_gather controls - it seems we changed its > > > > meaning before, and instead add use_gather_8parts and > > > > use_gather_16parts) > > > > > > > > That is, what's the point of this? > > > > > > https://www.phoronix.com/review/downfall > > > > > > that caused: > > > > > > https://www.phoronix.com/review/intel-downfall-benchmarks > > > > Yes, I know. But there's -mtune-ctl= doing the trick. > > GCC 11 had only 'use_gather', covering all number of lanes. I suggest > > to resurrect that behavior and add use_gather_8+parts (or two, IIRC > > gather works only on SI/SFmode or larger). > > > > Then -mtune-ctl=^use_gather works which I think is nice enough? 
> So basically, -mtune-ctrl=^use_gather is used to turn off all gather > vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them? > We don't have an extrat explicit flag for target tune, just single bit > - ix86_tune_features[X86_TUNE_USE_GATHER] Looks like I can handle it specially in parse_mtune_ctrl_str, let me try. > > > > Richard. > > > > > Uros. > > > > -- > BR, > Hongtao -- BR, Hongtao
Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
On Thu, Aug 10, 2023 at 3:49 PM Richard Biener via Gcc-patches wrote: > > On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak wrote: > > > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener > > wrote: > > > > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt wrote: > > > > > > > > Currently we have 3 different independent tunes for gather > > > > "use_gather,use_gather_2parts,use_gather_4parts", > > > > similar for scatter, there're > > > > "use_scatter,use_scatter_2parts,use_scatter_4parts" > > > > > > > > The patch support 2 standardizing options to enable/disable > > > > vectorization for all gather/scatter instructions. The options is > > > > interpreted by driver to 3 tunes. > > > > > > > > bootstrapped and regtested on x86_64-pc-linux-gnu. > > > > Ok for trunk? > > > > > > I think -mgather/-mscatter are too close to -mfma suggesting they > > > enable part of an ISA but they won't disable the use of intrinsics > > > or enable gather/scatter on CPUs where the ISA doesn't have them. > > > > > > May I suggest to invent a more generic "short-cut" to > > > -mtune-ctrl=^X, maybe -mdisable=X? And for gather/scatter > > > tunables add ^use_gather_any to cover all cases? (or > > > change what use_gather controls - it seems we changed its > > > meaning before, and instead add use_gather_8parts and > > > use_gather_16parts) > > > > > > That is, what's the point of this? > > > > https://www.phoronix.com/review/downfall > > > > that caused: > > > > https://www.phoronix.com/review/intel-downfall-benchmarks > > Yes, I know. But there's -mtune-ctl= doing the trick. > GCC 11 had only 'use_gather', covering all number of lanes. I suggest > to resurrect that behavior and add use_gather_8+parts (or two, IIRC > gather works only on SI/SFmode or larger). > > Then -mtune-ctl=^use_gather works which I think is nice enough? So basically, -mtune-ctrl=^use_gather is used to turn off all gather vectorization, but -mtune-ctrl=use_gather doesn't turn on all of them? 
We don't have an extra explicit flag for the target tune, just a single bit - ix86_tune_features[X86_TUNE_USE_GATHER] > > Richard. > > > Uros. -- BR, Hongtao
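The aliasing idea under discussion (one use_gather name that switches every gather tune on or off, with a leading ^ meaning "off") can be sketched as a small standalone C routine. This is only an illustration under stated assumptions: apply_tune_ctrl, tune_names and the TUNE_* enumerators are hypothetical names invented for this sketch, and the code is not GCC's parse_mtune_ctrl_str.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical tune indices for this sketch only (not GCC's X86_TUNE_*).  */
enum
{
  TUNE_USE_GATHER_2PARTS,
  TUNE_USE_GATHER_4PARTS,
  TUNE_USE_GATHER,
  TUNE_LAST
};

static const char *const tune_names[TUNE_LAST] = {
  "use_gather_2parts", "use_gather_4parts", "use_gather"
};

/* Apply a comma-separated tune list to FEATURES.  A leading '^' clears
   the named tune; otherwise the tune is set.  Assumption of this sketch:
   "use_gather" acts as an alias that toggles all gather tunes at once.
   Returns false as soon as an unknown (or empty) name is seen.  */
static bool
apply_tune_ctrl (const char *str, bool features[TUNE_LAST])
{
  assert (str != NULL && features != NULL);

  while (*str != '\0')
    {
      size_t len = strcspn (str, ",");	/* length of the current item */
      const char *name = str;
      size_t name_len = len;
      bool value = true;

      if (name_len > 0 && *name == '^')
	{
	  value = false;
	  name++;
	  name_len--;
	}

      bool found = false;
      for (int i = 0; i < TUNE_LAST; i++)
	if (strlen (tune_names[i]) == name_len
	    && strncmp (name, tune_names[i], name_len) == 0)
	  {
	    found = true;
	    if (i == TUNE_USE_GATHER)
	      /* Alias: set or clear every gather tune together.  */
	      for (int j = 0; j < TUNE_LAST; j++)
		features[j] = value;
	    else
	      features[i] = value;
	  }

      if (!found)
	return false;

      str += len;
      if (*str == ',')
	str++;
    }
  return true;
}
```

With this shape, "^use_gather" clears all three bits and "use_gather" sets all three, while a later item such as "^use_gather_2parts" can still override a single lane count because items are applied left to right.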
Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak wrote: > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener > wrote: > > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt wrote: > > > > > > Currently we have 3 different independent tunes for gather > > > "use_gather,use_gather_2parts,use_gather_4parts", > > > similar for scatter, there're > > > "use_scatter,use_scatter_2parts,use_scatter_4parts" > > > > > > The patch support 2 standardizing options to enable/disable > > > vectorization for all gather/scatter instructions. The options is > > > interpreted by driver to 3 tunes. > > > > > > bootstrapped and regtested on x86_64-pc-linux-gnu. > > > Ok for trunk? > > > > I think -mgather/-mscatter are too close to -mfma suggesting they > > enable part of an ISA but they won't disable the use of intrinsics > > or enable gather/scatter on CPUs where the ISA doesn't have them. > > > > May I suggest to invent a more generic "short-cut" to > > -mtune-ctrl=^X, maybe -mdisable=X? And for gather/scatter > > tunables add ^use_gather_any to cover all cases? (or > > change what use_gather controls - it seems we changed its > > meaning before, and instead add use_gather_8parts and > > use_gather_16parts) > > > > That is, what's the point of this? > > https://www.phoronix.com/review/downfall > > that caused: > > https://www.phoronix.com/review/intel-downfall-benchmarks Yes, I know. But there's -mtune-ctl= doing the trick. GCC 11 had only 'use_gather', covering all number of lanes. I suggest to resurrect that behavior and add use_gather_8+parts (or two, IIRC gather works only on SI/SFmode or larger). Then -mtune-ctl=^use_gather works which I think is nice enough? Richard. > Uros.
Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
On Thu, Aug 10, 2023 at 9:40 AM Richard Biener wrote: > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt wrote: > > > > Currently we have 3 different independent tunes for gather > > "use_gather,use_gather_2parts,use_gather_4parts", > > similar for scatter, there're > > "use_scatter,use_scatter_2parts,use_scatter_4parts" > > > > The patch support 2 standardizing options to enable/disable > > vectorization for all gather/scatter instructions. The options is > > interpreted by driver to 3 tunes. > > > > bootstrapped and regtested on x86_64-pc-linux-gnu. > > Ok for trunk? > > I think -mgather/-mscatter are too close to -mfma suggesting they > enable part of an ISA but they won't disable the use of intrinsics > or enable gather/scatter on CPUs where the ISA doesn't have them. > > May I suggest to invent a more generic "short-cut" to > -mtune-ctrl=^X, maybe -mdisable=X? And for gather/scatter > tunables add ^use_gather_any to cover all cases? (or > change what use_gather controls - it seems we changed its > meaning before, and instead add use_gather_8parts and > use_gather_16parts) > > That is, what's the point of this? https://www.phoronix.com/review/downfall that caused: https://www.phoronix.com/review/intel-downfall-benchmarks Uros.
Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
On Thu, Aug 10, 2023 at 3:13 AM liuhongt wrote: > > Currently we have 3 different independent tunes for gather > "use_gather,use_gather_2parts,use_gather_4parts", > similar for scatter, there're > "use_scatter,use_scatter_2parts,use_scatter_4parts" > > The patch support 2 standardizing options to enable/disable > vectorization for all gather/scatter instructions. The options is > interpreted by driver to 3 tunes. > > bootstrapped and regtested on x86_64-pc-linux-gnu. > Ok for trunk? I think -mgather/-mscatter are too close to -mfma suggesting they enable part of an ISA but they won't disable the use of intrinsics or enable gather/scatter on CPUs where the ISA doesn't have them. May I suggest to invent a more generic "short-cut" to -mtune-ctrl=^X, maybe -mdisable=X? And for gather/scatter tunables add ^use_gather_any to cover all cases? (or change what use_gather controls - it seems we changed its meaning before, and instead add use_gather_8parts and use_gather_16parts) That is, what's the point of this? Richard. > gcc/ChangeLog: > > * config/i386/i386.h (DRIVER_SELF_SPECS): Add > GATHER_SCATTER_DRIVER_SELF_SPECS. > (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro. > * config/i386/i386.opt (mgather): New option. > (mscatter): Ditto. 
> ---
>  gcc/config/i386/i386.h   | 12 +++-
>  gcc/config/i386/i386.opt |  8
>  2 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index ef342fcee9b..d9ac2c29bde 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -565,7 +565,17 @@ extern GTY(()) tree x86_mfence;
>  # define SUBTARGET_DRIVER_SELF_SPECS ""
>  #endif
>
> -#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS
> +#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS
> +# define GATHER_SCATTER_DRIVER_SELF_SPECS \
> +  "%{mno-gather:-mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} \
> +   %{mgather:-mtune-ctrl=use_gather_2parts,use_gather_4parts,use_gather} \
> +   %{mno-scatter:-mtune-ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} \
> +   %{mscatter:-mtune-ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}"
> +#endif
> +
> +#define DRIVER_SELF_SPECS \
> +  SUBTARGET_DRIVER_SELF_SPECS " " \
> +  GATHER_SCATTER_DRIVER_SELF_SPECS
>
>  /* -march=native handling only makes sense with compiler running on
>     an x86 or x86_64 chip.  If changing this condition, also change
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index ddb7f110aa2..99948644a8d 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -424,6 +424,14 @@ mdaz-ftz
>  Target
>  Set the FTZ and DAZ Flags.
>
> +mgather
> +Target
> +Enable vectorization for gather instruction.
> +
> +mscatter
> +Target
> +Enable vectorization for scatter instruction.
> +
>  mpreferred-stack-boundary=
>  Target RejectNegative Joined UInteger Var(ix86_preferred_stack_boundary_arg)
>  Attempt to keep stack aligned to this power of 2.
> --
> 2.31.1
>
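To make the spec mapping in the patch concrete, here is a minimal standalone sketch of the same four-way expansion written as ordinary C. expand_gather_scatter_option is a hypothetical helper invented for illustration and is not part of GCC; the returned strings are copied from the GATHER_SCATTER_DRIVER_SELF_SPECS macro in the patch, which in the real driver is handled by spec processing rather than by code like this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not GCC code): return the -mtune-ctrl= string that
   the proposed driver spec would substitute for OPT, or NULL when OPT is
   not one of the four proposed options.  */
static const char *
expand_gather_scatter_option (const char *opt)
{
  assert (opt != NULL);

  if (strcmp (opt, "-mno-gather") == 0)
    return "-mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather";
  if (strcmp (opt, "-mgather") == 0)
    return "-mtune-ctrl=use_gather_2parts,use_gather_4parts,use_gather";
  if (strcmp (opt, "-mno-scatter") == 0)
    return "-mtune-ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter";
  if (strcmp (opt, "-mscatter") == 0)
    return "-mtune-ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter";

  /* Anything else is left untouched by this mapping.  */
  return NULL;
}
```

The point of the sketch is that each user-facing option is nothing more than a fixed alias for three tune names, which is why the thread debates whether the alias belongs in a new option or inside -mtune-ctrl= itself.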
Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
On Thu, Aug 10, 2023 at 2:04 PM Uros Bizjak via Gcc-patches wrote: > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt wrote: > > > > Currently we have 3 different independent tunes for gather > > "use_gather,use_gather_2parts,use_gather_4parts", > > similar for scatter, there're > > "use_scatter,use_scatter_2parts,use_scatter_4parts" > > > > The patch support 2 standardizing options to enable/disable > > vectorization for all gather/scatter instructions. The options is > > interpreted by driver to 3 tunes. > > > > bootstrapped and regtested on x86_64-pc-linux-gnu. > > Ok for trunk? > > > > gcc/ChangeLog: > > > > * config/i386/i386.h (DRIVER_SELF_SPECS): Add > > GATHER_SCATTER_DRIVER_SELF_SPECS. > > (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro. > > * config/i386/i386.opt (mgather): New option. > > (mscatter): Ditto. > > --- > > gcc/config/i386/i386.h | 12 +++- > > gcc/config/i386/i386.opt | 8 > > 2 files changed, 19 insertions(+), 1 deletion(-) > > > > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h > > index ef342fcee9b..d9ac2c29bde 100644 > > --- a/gcc/config/i386/i386.h > > +++ b/gcc/config/i386/i386.h > > @@ -565,7 +565,17 @@ extern GTY(()) tree x86_mfence; > > # define SUBTARGET_DRIVER_SELF_SPECS "" > > #endif > > > > -#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS > > +#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS > > +# define GATHER_SCATTER_DRIVER_SELF_SPECS \ > > + > > "%{mno-gather:-mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} > > \ > > + %{mgather:-mtune-ctrl=use_gather_2parts,use_gather_4parts,use_gather} \ > > + > > %{mno-scatter:-mtune-ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} > > \ > > + > > %{mscatter:-mtune-ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}" > > +#endif > > + > > +#define DRIVER_SELF_SPECS \ > > + SUBTARGET_DRIVER_SELF_SPECS " " \ > > + GATHER_SCATTER_DRIVER_SELF_SPECS > > > > /* -march=native handling only makes sense with compiler running on > > an x86 or x86_64 chip. 
If changing this condition, also change > > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt > > index ddb7f110aa2..99948644a8d 100644 > > --- a/gcc/config/i386/i386.opt > > +++ b/gcc/config/i386/i386.opt > > @@ -424,6 +424,14 @@ mdaz-ftz > > Target > > Set the FTZ and DAZ Flags. > > > > +mgather > > +Target > > +Enable vectorization for gather instruction. > > + > > +mscatter > > +Target > > +Enable vectorization for scatter instruction. > > Are gather and scatter instructions affected in a separate way, or > should we use one -mgather-scatter option to cover all gather/scatter > tunings? A separate way. Gather Data Sampling is only for gather. https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/gather-data-sampling.html > > Uros. > > > + > > mpreferred-stack-boundary= > > Target RejectNegative Joined UInteger > > Var(ix86_preferred_stack_boundary_arg) > > Attempt to keep stack aligned to this power of 2. > > -- > > 2.31.1 > > -- BR, Hongtao
Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
On Thu, Aug 10, 2023 at 3:13 AM liuhongt wrote: > > Currently we have 3 different independent tunes for gather > "use_gather,use_gather_2parts,use_gather_4parts", > similar for scatter, there're > "use_scatter,use_scatter_2parts,use_scatter_4parts" > > The patch support 2 standardizing options to enable/disable > vectorization for all gather/scatter instructions. The options is > interpreted by driver to 3 tunes. > > bootstrapped and regtested on x86_64-pc-linux-gnu. > Ok for trunk? > > gcc/ChangeLog: > > * config/i386/i386.h (DRIVER_SELF_SPECS): Add > GATHER_SCATTER_DRIVER_SELF_SPECS. > (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro. > * config/i386/i386.opt (mgather): New option. > (mscatter): Ditto. > --- > gcc/config/i386/i386.h | 12 +++- > gcc/config/i386/i386.opt | 8 > 2 files changed, 19 insertions(+), 1 deletion(-) > > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h > index ef342fcee9b..d9ac2c29bde 100644 > --- a/gcc/config/i386/i386.h > +++ b/gcc/config/i386/i386.h > @@ -565,7 +565,17 @@ extern GTY(()) tree x86_mfence; > # define SUBTARGET_DRIVER_SELF_SPECS "" > #endif > > -#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS > +#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS > +# define GATHER_SCATTER_DRIVER_SELF_SPECS \ > + > "%{mno-gather:-mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} \ > + %{mgather:-mtune-ctrl=use_gather_2parts,use_gather_4parts,use_gather} \ > + > %{mno-scatter:-mtune-ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} > \ > + %{mscatter:-mtune-ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}" > +#endif > + > +#define DRIVER_SELF_SPECS \ > + SUBTARGET_DRIVER_SELF_SPECS " " \ > + GATHER_SCATTER_DRIVER_SELF_SPECS > > /* -march=native handling only makes sense with compiler running on > an x86 or x86_64 chip. 
If changing this condition, also change > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt > index ddb7f110aa2..99948644a8d 100644 > --- a/gcc/config/i386/i386.opt > +++ b/gcc/config/i386/i386.opt > @@ -424,6 +424,14 @@ mdaz-ftz > Target > Set the FTZ and DAZ Flags. > > +mgather > +Target > +Enable vectorization for gather instruction. > + > +mscatter > +Target > +Enable vectorization for scatter instruction. Are gather and scatter instructions affected in a separate way, or should we use one -mgather-scatter option to cover all gather/scatter tunings? Uros. > + > mpreferred-stack-boundary= > Target RejectNegative Joined UInteger Var(ix86_preferred_stack_boundary_arg) > Attempt to keep stack aligned to this power of 2. > -- > 2.31.1 >
RE: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
> -----Original Message----- > From: Xi Ruoyao > Sent: Thursday, August 10, 2023 9:48 AM > To: Liu, Hongtao ; gcc-patches@gcc.gnu.org > Cc: richard.guent...@gmail.com; ubiz...@gmail.com; hubi...@ucw.cz > Subject: Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable > vectorization for all gather/scatter instructions. > > On Thu, 2023-08-10 at 09:11 +0800, liuhongt via Gcc-patches wrote: > > Currently we have 3 different independent tunes for gather > > "use_gather,use_gather_2parts,use_gather_4parts", > > similar for scatter, there're > > "use_scatter,use_scatter_2parts,use_scatter_4parts" > > > > The patch support 2 standardizing options to enable/disable > > vectorization for all gather/scatter instructions. The options is > > interpreted by driver to 3 tunes. > > > > bootstrapped and regtested on x86_64-pc-linux-gnu. > > Ok for trunk? > > And should we set -mno-gather as the default for GDS affected processors? > We'll likely apply the ucode update for them, and then the gathering > instructions will be much slower.

I assume you're talking about https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/gather-data-sampling.html

Yes, there will be a separate patch for microarchitecture tuning.

> > > gcc/ChangeLog: > > > > * config/i386/i386.h (DRIVER_SELF_SPECS): Add > > GATHER_SCATTER_DRIVER_SELF_SPECS. > > (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro. > > * config/i386/i386.opt (mgather): New option. > > (mscatter): Ditto.
> > --- > > gcc/config/i386/i386.h | 12 +++- > > gcc/config/i386/i386.opt | 8 > > 2 files changed, 19 insertions(+), 1 deletion(-) > > > > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index > > ef342fcee9b..d9ac2c29bde 100644 > > --- a/gcc/config/i386/i386.h > > +++ b/gcc/config/i386/i386.h > > @@ -565,7 +565,17 @@ extern GTY(()) tree x86_mfence; > > # define SUBTARGET_DRIVER_SELF_SPECS "" > > #endif > > > > -#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS > > +#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS # define > > +GATHER_SCATTER_DRIVER_SELF_SPECS \ > > + "%{mno-gather:-mtune- > > ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} \ > > + %{mgather:-mtune- > > ctrl=use_gather_2parts,use_gather_4parts,use_gather} \ > > + %{mno-scatter:-mtune- > > ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} \ > > + %{mscatter:-mtune- > > ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}" > > +#endif > > + > > +#define DRIVER_SELF_SPECS \ > > + SUBTARGET_DRIVER_SELF_SPECS " " \ > > + GATHER_SCATTER_DRIVER_SELF_SPECS > > > > /* -march=native handling only makes sense with compiler running on > > an x86 or x86_64 chip. If changing this condition, also change > > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index > > ddb7f110aa2..99948644a8d 100644 > > --- a/gcc/config/i386/i386.opt > > +++ b/gcc/config/i386/i386.opt > > @@ -424,6 +424,14 @@ mdaz-ftz > > Target > > Set the FTZ and DAZ Flags. > > > > +mgather > > +Target > > +Enable vectorization for gather instruction. > > + > > +mscatter > > +Target > > +Enable vectorization for scatter instruction. > > + > > mpreferred-stack-boundary= > > Target RejectNegative Joined UInteger > > Var(ix86_preferred_stack_boundary_arg) > > Attempt to keep stack aligned to this power of 2. > > -- > Xi Ruoyao > School of Aerospace Science and Technology, Xidian University
Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
On Thu, 2023-08-10 at 09:11 +0800, liuhongt via Gcc-patches wrote:
> Currently we have 3 independent tunes for gather,
> "use_gather,use_gather_2parts,use_gather_4parts",
> and similarly for scatter,
> "use_scatter,use_scatter_2parts,use_scatter_4parts".
>
> The patch supports 2 standardizing options to enable/disable
> vectorization for all gather/scatter instructions. The options are
> interpreted by the driver into the 3 tunes.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu.
> Ok for trunk?

And should we set -mno-gather as the default for GDS-affected processors?
We'll likely apply the ucode update for them, and then the gather
instructions will be much slower.

> gcc/ChangeLog:
>
>         * config/i386/i386.h (DRIVER_SELF_SPECS): Add
>         GATHER_SCATTER_DRIVER_SELF_SPECS.
>         (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro.
>         * config/i386/i386.opt (mgather): New option.
>         (mscatter): Ditto.
> ---
>  gcc/config/i386/i386.h   | 12 +++++++++++-
>  gcc/config/i386/i386.opt |  8 ++++++++
>  2 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index ef342fcee9b..d9ac2c29bde 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -565,7 +565,17 @@ extern GTY(()) tree x86_mfence;
>  # define SUBTARGET_DRIVER_SELF_SPECS ""
>  #endif
>
> -#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS
> +#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS
> +# define GATHER_SCATTER_DRIVER_SELF_SPECS \
> +  "%{mno-gather:-mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} \
> +   %{mgather:-mtune-ctrl=use_gather_2parts,use_gather_4parts,use_gather} \
> +   %{mno-scatter:-mtune-ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} \
> +   %{mscatter:-mtune-ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}"
> +#endif
> +
> +#define DRIVER_SELF_SPECS \
> +  SUBTARGET_DRIVER_SELF_SPECS " " \
> +  GATHER_SCATTER_DRIVER_SELF_SPECS
>
>  /* -march=native handling only makes sense with compiler running on
>     an x86 or x86_64 chip.  If changing this condition, also change
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index ddb7f110aa2..99948644a8d 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -424,6 +424,14 @@ mdaz-ftz
>  Target
>  Set the FTZ and DAZ Flags.
>
> +mgather
> +Target
> +Enable vectorization for gather instruction.
> +
> +mscatter
> +Target
> +Enable vectorization for scatter instruction.
> +
>  mpreferred-stack-boundary=
>  Target RejectNegative Joined UInteger Var(ix86_preferred_stack_boundary_arg)
>  Attempt to keep stack aligned to this power of 2.

--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
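For readers following the thread, the loops at stake look roughly like the
ones below. This is only an illustrative sketch and not code from the patch:
the function names gather_loop and scatter_loop are made up here, and whether
GCC actually emits gather or scatter instructions for them is an assumption
that depends on the target ISA, the optimization level, and the tuning flags
under discussion.

```c
#include <stddef.h>

/* Indexed load: out[i] is read through an index vector.  A vectorizer
   may implement the table[idx[i]] access with a gather instruction when
   the target supports one and the tuning allows it.  */
void gather_loop(float *restrict out, const float *restrict table,
                 const int *restrict idx, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = table[idx[i]];
}

/* Indexed store: the mirror image of the loop above.  A vectorizer may
   implement the table[idx[i]] store with a scatter instruction.  */
void scatter_loop(float *restrict table, const float *restrict in,
                  const int *restrict idx, size_t n)
{
  for (size_t i = 0; i < n; i++)
    table[idx[i]] = in[i];
}
```

With the gather tunes disabled, the first loop would be expected to fall back
to scalar loads or some other strategy; the result of the loop is the same
either way, only the generated code differs.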
[PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
Currently we have 3 independent tunes for gather,
"use_gather,use_gather_2parts,use_gather_4parts",
and similarly for scatter,
"use_scatter,use_scatter_2parts,use_scatter_4parts".

The patch supports 2 standardizing options to enable/disable
vectorization for all gather/scatter instructions. The options are
interpreted by the driver into the 3 tunes.

Bootstrapped and regtested on x86_64-pc-linux-gnu.
Ok for trunk?

gcc/ChangeLog:

        * config/i386/i386.h (DRIVER_SELF_SPECS): Add
        GATHER_SCATTER_DRIVER_SELF_SPECS.
        (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro.
        * config/i386/i386.opt (mgather): New option.
        (mscatter): Ditto.
---
 gcc/config/i386/i386.h   | 12 +++++++++++-
 gcc/config/i386/i386.opt |  8 ++++++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index ef342fcee9b..d9ac2c29bde 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -565,7 +565,17 @@ extern GTY(()) tree x86_mfence;
 # define SUBTARGET_DRIVER_SELF_SPECS ""
 #endif

-#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS
+#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS
+# define GATHER_SCATTER_DRIVER_SELF_SPECS \
+  "%{mno-gather:-mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} \
+   %{mgather:-mtune-ctrl=use_gather_2parts,use_gather_4parts,use_gather} \
+   %{mno-scatter:-mtune-ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} \
+   %{mscatter:-mtune-ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}"
+#endif
+
+#define DRIVER_SELF_SPECS \
+  SUBTARGET_DRIVER_SELF_SPECS " " \
+  GATHER_SCATTER_DRIVER_SELF_SPECS

 /* -march=native handling only makes sense with compiler running on
    an x86 or x86_64 chip.  If changing this condition, also change
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index ddb7f110aa2..99948644a8d 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -424,6 +424,14 @@ mdaz-ftz
 Target
 Set the FTZ and DAZ Flags.

+mgather
+Target
+Enable vectorization for gather instruction.
+
+mscatter
+Target
+Enable vectorization for scatter instruction.
+
 mpreferred-stack-boundary=
 Target RejectNegative Joined UInteger Var(ix86_preferred_stack_boundary_arg)
 Attempt to keep stack aligned to this power of 2.
--
2.31.1
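As a reading aid, the mapping that the new spec string encodes can be written
out as a plain lookup table. This is a minimal sketch, assuming that a spec of
the form %{S:X} substitutes X when option -S was passed to the driver. The
helper name gather_scatter_tune_ctrl is hypothetical and not part of GCC; the
real driver performs this rewriting through its spec machinery, not through a
function like this one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry per %{...} clause of GATHER_SCATTER_DRIVER_SELF_SPECS as
   proposed in the patch above.  */
struct tune_mapping
{
  const char *flag;       /* Option given on the command line.  */
  const char *tune_ctrl;  /* Text the spec substitutes for it.  */
};

static const struct tune_mapping gather_scatter_map[] = {
  { "-mno-gather",
    "-mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather" },
  { "-mgather",
    "-mtune-ctrl=use_gather_2parts,use_gather_4parts,use_gather" },
  { "-mno-scatter",
    "-mtune-ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter" },
  { "-mscatter",
    "-mtune-ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter" },
};

/* Hypothetical helper: return the -mtune-ctrl= text that corresponds to
   FLAG, or NULL if FLAG is not one of the four options in the table.  */
const char *gather_scatter_tune_ctrl(const char *flag)
{
  assert(flag != NULL);
  size_t n = sizeof gather_scatter_map / sizeof gather_scatter_map[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp(flag, gather_scatter_map[i].flag) == 0)
      return gather_scatter_map[i].tune_ctrl;
  return NULL;
}
```

The table makes the intent visible: each user-facing option toggles all three
tunes of its kind at once, with the ^ prefix marking a tune as disabled.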