Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector
Hongtao Liu writes: > On Thu, Nov 17, 2022 at 9:59 PM Richard Sandiford > wrote: >> >> Hongtao Liu writes: >> > On Thu, Nov 17, 2022 at 5:39 PM Richard Sandiford >> > wrote: >> >> >> >> Hongtao Liu writes: >> >> > On Wed, Nov 16, 2022 at 1:39 AM Richard Sandiford >> >> > wrote: >> >> >> >> >> >> Tamar Christina writes: >> >> >> >> -Original Message- >> >> >> >> From: Hongtao Liu >> >> >> >> Sent: Tuesday, November 15, 2022 9:37 AM >> >> >> >> To: Tamar Christina >> >> >> >> Cc: Richard Sandiford ; Tamar Christina >> >> >> >> via >> >> >> >> Gcc-patches ; nd ; >> >> >> >> rguent...@suse.de >> >> >> >> Subject: Re: [PATCH 3/8]middle-end: Support extractions of >> >> >> >> subvectors from >> >> >> >> arbitrary element position inside a vector >> >> >> >> >> >> >> >> On Tue, Nov 15, 2022 at 4:51 PM Tamar Christina >> >> >> >> wrote: >> >> >> >> > >> >> >> >> > > -Original Message- >> >> >> >> > > From: Hongtao Liu >> >> >> >> > > Sent: Tuesday, November 15, 2022 8:36 AM >> >> >> >> > > To: Tamar Christina >> >> >> >> > > Cc: Richard Sandiford ; Tamar >> >> >> >> > > Christina >> >> >> >> > > via Gcc-patches ; nd ; >> >> >> >> > > rguent...@suse.de >> >> >> >> > > Subject: Re: [PATCH 3/8]middle-end: Support extractions of >> >> >> >> > > subvectors from arbitrary element position inside a vector >> >> >> >> > > >> >> >> >> > > Hi: >> >> >> >> > > I'm from https://gcc.gnu.org/pipermail/gcc-patches/2022- >> >> >> >> > > November/606040.html. >> >> >> >> > > > } >> >> >> >> > > > >> >> >> >> > > >/* See if we can get a better vector mode before >> >> >> >> > > > extracting. >> >> >> >> > > > */ diff --git a/gcc/optabs.cc b/gcc/optabs.cc index >> >> >> >> > > > >> >> >> >> > > >> >> >> >> cff37ccb0dfc3dd79b97d0abfd872f340855dc96..f338df410265dfe55b68961600 >> >> >> >> > > 9 >> >> >> >> > > 0 >> >> >> >> > > > a453cc6a28d9 100644 >> >> >> >> > > > --- a/gcc/optabs.cc >> >> >> >> > > > +++ b/gcc/optabs.cc >> >> >> >> > > > @@ -6267,6 +6267,7 @@ expand_vec_perm_const (machine_mode >> >> >> >> mode, >> >> >> >> > > rtx v0, rtx v1, >> >> >> >> > > >v0_qi = gen_lowpart (qimode, v0); >> >> >> >> > > >v1_qi = gen_lowpart (qimode, v1); >> >> >> >> > > >if (targetm.vectorize.vec_perm_const != NULL >> >> >> >> > > > + && targetm.can_change_mode_class (mode, qimode, >> >> >> >> > > > + ALL_REGS) >> >> >> >> > > It looks like you want to guard gen_lowpart, shouldn't it be >> >> >> >> > > better >> >> >> >> > > to use validate_subreg or (tmp = gen_lowpart_if_possible (mode, >> >> >> >> target_qi)). >> >> >> >> > > IMHO, targetm.can_change_mode_class is mostly used for RA, but >> >> >> >> > > not >> >> >> >> > > to guard gen_lowpart. >> >> >> >> > >> >> >> >> > Hmm I don't think this is quite true, there are existing usages in >> >> >> >> > expr.cc and rtanal.cc That do this and aren't part of RA. As I >> >> >> >> > mentioned before for instance the canoncalization of vec_select >> >> >> >> > to subreg >> >> >> >> in rtlanal for ins
Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector
On Thu, Nov 17, 2022 at 9:59 PM Richard Sandiford wrote: > > Hongtao Liu writes: > > On Thu, Nov 17, 2022 at 5:39 PM Richard Sandiford > > wrote: > >> > >> Hongtao Liu writes: > >> > On Wed, Nov 16, 2022 at 1:39 AM Richard Sandiford > >> > wrote: > >> >> > >> >> Tamar Christina writes: > >> >> >> -Original Message- > >> >> >> From: Hongtao Liu > >> >> >> Sent: Tuesday, November 15, 2022 9:37 AM > >> >> >> To: Tamar Christina > >> >> >> Cc: Richard Sandiford ; Tamar Christina > >> >> >> via > >> >> >> Gcc-patches ; nd ; > >> >> >> rguent...@suse.de > >> >> >> Subject: Re: [PATCH 3/8]middle-end: Support extractions of > >> >> >> subvectors from > >> >> >> arbitrary element position inside a vector > >> >> >> > >> >> >> On Tue, Nov 15, 2022 at 4:51 PM Tamar Christina > >> >> >> wrote: > >> >> >> > > >> >> >> > > -Original Message- > >> >> >> > > From: Hongtao Liu > >> >> >> > > Sent: Tuesday, November 15, 2022 8:36 AM > >> >> >> > > To: Tamar Christina > >> >> >> > > Cc: Richard Sandiford ; Tamar > >> >> >> > > Christina > >> >> >> > > via Gcc-patches ; nd ; > >> >> >> > > rguent...@suse.de > >> >> >> > > Subject: Re: [PATCH 3/8]middle-end: Support extractions of > >> >> >> > > subvectors from arbitrary element position inside a vector > >> >> >> > > > >> >> >> > > Hi: > >> >> >> > > I'm from https://gcc.gnu.org/pipermail/gcc-patches/2022- > >> >> >> > > November/606040.html. > >> >> >> > > > } > >> >> >> > > > > >> >> >> > > >/* See if we can get a better vector mode before extracting. > >> >> >> > > > */ diff --git a/gcc/optabs.cc b/gcc/optabs.cc index > >> >> >> > > > > >> >> >> > > > >> >> >> cff37ccb0dfc3dd79b97d0abfd872f340855dc96..f338df410265dfe55b68961600 > >> >> >> > > 9 > >> >> >> > > 0 > >> >> >> > > > a453cc6a28d9 100644 > >> >> >> > > > --- a/gcc/optabs.cc > >> >> >> > > > +++ b/gcc/optabs.cc > >> >> >> > > > @@ -6267,6 +6267,7 @@ expand_vec_perm_const (machine_mode > >> >> >> mode, > >> >> >> > > rtx v0, rtx v1, > >> >> >> > > >v0_qi = gen_lowpart (qimode, v0); > >> >> >> > > >v1_qi = gen_lowpart (qimode, v1); > >> >> >> > > >if (targetm.vectorize.vec_perm_const != NULL > >> >> >> > > > + && targetm.can_change_mode_class (mode, qimode, > >> >> >> > > > + ALL_REGS) > >> >> >> > > It looks like you want to guard gen_lowpart, shouldn't it be > >> >> >> > > better > >> >> >> > > to use validate_subreg or (tmp = gen_lowpart_if_possible (mode, > >> >> >> target_qi)). > >> >> >> > > IMHO, targetm.can_change_mode_class is mostly used for RA, but > >> >> >> > > not > >> >> >> > > to guard gen_lowpart. > >> >> >> > > >> >> >> > Hmm I don't think this is quite true, there are existing usages in > >> >> >> > expr.cc and rtanal.cc That do this and aren't part of RA. As I > >> >> >> > mentioned before for instance the canoncalization of vec_select to > >> >> >> > subreg > >> >> >> in rtlanal for instances uses this. > >> >> >> In theory, we need to iterate through all reg classes that can be > >> >> >> assigned for > >> >> >> both qimode and mode, if any regclass returns true for > >> >> >> targetm.can_change_mode_class, the bitcast(validate_subreg) should > >> >> >>
Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector
Hongtao Liu writes: > On Thu, Nov 17, 2022 at 5:39 PM Richard Sandiford > wrote: >> >> Hongtao Liu writes: >> > On Wed, Nov 16, 2022 at 1:39 AM Richard Sandiford >> > wrote: >> >> >> >> Tamar Christina writes: >> >> >> -Original Message- >> >> >> From: Hongtao Liu >> >> >> Sent: Tuesday, November 15, 2022 9:37 AM >> >> >> To: Tamar Christina >> >> >> Cc: Richard Sandiford ; Tamar Christina via >> >> >> Gcc-patches ; nd ; >> >> >> rguent...@suse.de >> >> >> Subject: Re: [PATCH 3/8]middle-end: Support extractions of subvectors >> >> >> from >> >> >> arbitrary element position inside a vector >> >> >> >> >> >> On Tue, Nov 15, 2022 at 4:51 PM Tamar Christina >> >> >> wrote: >> >> >> > >> >> >> > > -Original Message----- >> >> >> > > From: Hongtao Liu >> >> >> > > Sent: Tuesday, November 15, 2022 8:36 AM >> >> >> > > To: Tamar Christina >> >> >> > > Cc: Richard Sandiford ; Tamar Christina >> >> >> > > via Gcc-patches ; nd ; >> >> >> > > rguent...@suse.de >> >> >> > > Subject: Re: [PATCH 3/8]middle-end: Support extractions of >> >> >> > > subvectors from arbitrary element position inside a vector >> >> >> > > >> >> >> > > Hi: >> >> >> > > I'm from https://gcc.gnu.org/pipermail/gcc-patches/2022- >> >> >> > > November/606040.html. >> >> >> > > > } >> >> >> > > > >> >> >> > > >/* See if we can get a better vector mode before extracting. >> >> >> > > > */ diff --git a/gcc/optabs.cc b/gcc/optabs.cc index >> >> >> > > > >> >> >> > > >> >> >> cff37ccb0dfc3dd79b97d0abfd872f340855dc96..f338df410265dfe55b68961600 >> >> >> > > 9 >> >> >> > > 0 >> >> >> > > > a453cc6a28d9 100644 >> >> >> > > > --- a/gcc/optabs.cc >> >> >> > > > +++ b/gcc/optabs.cc >> >> >> > > > @@ -6267,6 +6267,7 @@ expand_vec_perm_const (machine_mode >> >> >> mode, >> >> >> > > rtx v0, rtx v1, >> >> >> > > >v0_qi = gen_lowpart (qimode, v0); >> >> >> > > >v1_qi = gen_lowpart (qimode, v1); >> >> >> > > >if (targetm.vectorize.vec_perm_const != NULL >> >> >> > > > + && targetm.can_change_mode_class (mode, qimode, >> >> >> > > > + ALL_REGS) >> >> >> > > It looks like you want to guard gen_lowpart, shouldn't it be better >> >> >> > > to use validate_subreg or (tmp = gen_lowpart_if_possible (mode, >> >> >> target_qi)). >> >> >> > > IMHO, targetm.can_change_mode_class is mostly used for RA, but not >> >> >> > > to guard gen_lowpart. >> >> >> > >> >> >> > Hmm I don't think this is quite true, there are existing usages in >> >> >> > expr.cc and rtanal.cc That do this and aren't part of RA. As I >> >> >> > mentioned before for instance the canoncalization of vec_select to >> >> >> > subreg >> >> >> in rtlanal for instances uses this. >> >> >> In theory, we need to iterate through all reg classes that can be >> >> >> assigned for >> >> >> both qimode and mode, if any regclass returns true for >> >> >> targetm.can_change_mode_class, the bitcast(validate_subreg) should be >> >> >> ok. >> >> >> Here we just passed ALL_REGS. >> >> > >> >> > Yes, and most targets where this transformation is valid return true >> >> > here. >> >> > >> >> > I've checked: >> >> > * alpha >> >> > * arm >> >> > * aarch64 >> >> > * rs6000 >> >> > * s390 >> >> > * sparc >> >> > * pa >> >> > * mips >> >> > >> >> > And even the defa
Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector
On Thu, Nov 17, 2022 at 5:39 PM Richard Sandiford wrote: > > Hongtao Liu writes: > > On Wed, Nov 16, 2022 at 1:39 AM Richard Sandiford > > wrote: > >> > >> Tamar Christina writes: > >> >> -Original Message- > >> >> From: Hongtao Liu > >> >> Sent: Tuesday, November 15, 2022 9:37 AM > >> >> To: Tamar Christina > >> >> Cc: Richard Sandiford ; Tamar Christina via > >> >> Gcc-patches ; nd ; > >> >> rguent...@suse.de > >> >> Subject: Re: [PATCH 3/8]middle-end: Support extractions of subvectors > >> >> from > >> >> arbitrary element position inside a vector > >> >> > >> >> On Tue, Nov 15, 2022 at 4:51 PM Tamar Christina > >> >> wrote: > >> >> > > >> >> > > -Original Message----- > >> >> > > From: Hongtao Liu > >> >> > > Sent: Tuesday, November 15, 2022 8:36 AM > >> >> > > To: Tamar Christina > >> >> > > Cc: Richard Sandiford ; Tamar Christina > >> >> > > via Gcc-patches ; nd ; > >> >> > > rguent...@suse.de > >> >> > > Subject: Re: [PATCH 3/8]middle-end: Support extractions of > >> >> > > subvectors from arbitrary element position inside a vector > >> >> > > > >> >> > > Hi: > >> >> > > I'm from https://gcc.gnu.org/pipermail/gcc-patches/2022- > >> >> > > November/606040.html. > >> >> > > > } > >> >> > > > > >> >> > > >/* See if we can get a better vector mode before extracting. > >> >> > > > */ diff --git a/gcc/optabs.cc b/gcc/optabs.cc index > >> >> > > > > >> >> > > > >> >> cff37ccb0dfc3dd79b97d0abfd872f340855dc96..f338df410265dfe55b68961600 > >> >> > > 9 > >> >> > > 0 > >> >> > > > a453cc6a28d9 100644 > >> >> > > > --- a/gcc/optabs.cc > >> >> > > > +++ b/gcc/optabs.cc > >> >> > > > @@ -6267,6 +6267,7 @@ expand_vec_perm_const (machine_mode > >> >> mode, > >> >> > > rtx v0, rtx v1, > >> >> > > >v0_qi = gen_lowpart (qimode, v0); > >> >> > > >v1_qi = gen_lowpart (qimode, v1); > >> >> > > >if (targetm.vectorize.vec_perm_const != NULL > >> >> > > > + && targetm.can_change_mode_class (mode, qimode, > >> >> > > > + ALL_REGS) > >> >> > > It looks like you want to guard gen_lowpart, shouldn't it be better > >> >> > > to use validate_subreg or (tmp = gen_lowpart_if_possible (mode, > >> >> target_qi)). > >> >> > > IMHO, targetm.can_change_mode_class is mostly used for RA, but not > >> >> > > to guard gen_lowpart. > >> >> > > >> >> > Hmm I don't think this is quite true, there are existing usages in > >> >> > expr.cc and rtanal.cc That do this and aren't part of RA. As I > >> >> > mentioned before for instance the canoncalization of vec_select to > >> >> > subreg > >> >> in rtlanal for instances uses this. > >> >> In theory, we need to iterate through all reg classes that can be > >> >> assigned for > >> >> both qimode and mode, if any regclass returns true for > >> >> targetm.can_change_mode_class, the bitcast(validate_subreg) should be > >> >> ok. > >> >> Here we just passed ALL_REGS. > >> > > >> > Yes, and most targets where this transformation is valid return true > >> > here. > >> > > >> > I've checked: > >> > * alpha > >> > * arm > >> > * aarch64 > >> > * rs6000 > >> > * s390 > >> > * sparc > >> > * pa > >> > * mips > >> > > >> > And even the default example that other targets use from the > >> > documentation > >> > would return true as the size of the modes are the same. > >> > > >> > X86 and RISCV are the only two targets that I found (but didn't check > >> > all) that > >> > blankly return a result based on just the register classes. &
Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector
Hongtao Liu writes: > On Wed, Nov 16, 2022 at 1:39 AM Richard Sandiford > wrote: >> >> Tamar Christina writes: >> >> -Original Message- >> >> From: Hongtao Liu >> >> Sent: Tuesday, November 15, 2022 9:37 AM >> >> To: Tamar Christina >> >> Cc: Richard Sandiford ; Tamar Christina via >> >> Gcc-patches ; nd ; >> >> rguent...@suse.de >> >> Subject: Re: [PATCH 3/8]middle-end: Support extractions of subvectors from >> >> arbitrary element position inside a vector >> >> >> >> On Tue, Nov 15, 2022 at 4:51 PM Tamar Christina >> >> wrote: >> >> > >> >> > > -Original Message- >> >> > > From: Hongtao Liu >> >> > > Sent: Tuesday, November 15, 2022 8:36 AM >> >> > > To: Tamar Christina >> >> > > Cc: Richard Sandiford ; Tamar Christina >> >> > > via Gcc-patches ; nd ; >> >> > > rguent...@suse.de >> >> > > Subject: Re: [PATCH 3/8]middle-end: Support extractions of >> >> > > subvectors from arbitrary element position inside a vector >> >> > > >> >> > > Hi: >> >> > > I'm from https://gcc.gnu.org/pipermail/gcc-patches/2022- >> >> > > November/606040.html. >> >> > > > } >> >> > > > >> >> > > >/* See if we can get a better vector mode before extracting. >> >> > > > */ diff --git a/gcc/optabs.cc b/gcc/optabs.cc index >> >> > > > >> >> > > >> >> cff37ccb0dfc3dd79b97d0abfd872f340855dc96..f338df410265dfe55b68961600 >> >> > > 9 >> >> > > 0 >> >> > > > a453cc6a28d9 100644 >> >> > > > --- a/gcc/optabs.cc >> >> > > > +++ b/gcc/optabs.cc >> >> > > > @@ -6267,6 +6267,7 @@ expand_vec_perm_const (machine_mode >> >> mode, >> >> > > rtx v0, rtx v1, >> >> > > >v0_qi = gen_lowpart (qimode, v0); >> >> > > >v1_qi = gen_lowpart (qimode, v1); >> >> > > >if (targetm.vectorize.vec_perm_const != NULL >> >> > > > + && targetm.can_change_mode_class (mode, qimode, >> >> > > > + ALL_REGS) >> >> > > It looks like you want to guard gen_lowpart, shouldn't it be better >> >> > > to use validate_subreg or (tmp = gen_lowpart_if_possible (mode, >> >> target_qi)). >> >> > > IMHO, targetm.can_change_mode_class is mostly used for RA, but not >> >> > > to guard gen_lowpart. >> >> > >> >> > Hmm I don't think this is quite true, there are existing usages in >> >> > expr.cc and rtanal.cc That do this and aren't part of RA. As I >> >> > mentioned before for instance the canoncalization of vec_select to >> >> > subreg >> >> in rtlanal for instances uses this. >> >> In theory, we need to iterate through all reg classes that can be >> >> assigned for >> >> both qimode and mode, if any regclass returns true for >> >> targetm.can_change_mode_class, the bitcast(validate_subreg) should be ok. >> >> Here we just passed ALL_REGS. >> > >> > Yes, and most targets where this transformation is valid return true here. >> > >> > I've checked: >> > * alpha >> > * arm >> > * aarch64 >> > * rs6000 >> > * s390 >> > * sparc >> > * pa >> > * mips >> > >> > And even the default example that other targets use from the documentation >> > would return true as the size of the modes are the same. >> > >> > X86 and RISCV are the only two targets that I found (but didn't check all) >> > that >> > blankly return a result based on just the register classes. >> > >> > That is to say, there are more targets that adhere to the interpretation >> > that >> > rclass here means "should be possible in some class in rclass" rather than >> > "should be possible in ALL classes of rclass". >> >> Yeah, I agree. A query "can something stored in ALL_REGS change from >> mode M1 to mode M2?" is meaningful if at least one register R in ALL_REGS >> can hold both M1 and M2. It's then the target's job to answer >> conservatively so that the resul
Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector
On Wed, Nov 16, 2022 at 1:39 AM Richard Sandiford wrote: > > Tamar Christina writes: > >> -Original Message- > >> From: Hongtao Liu > >> Sent: Tuesday, November 15, 2022 9:37 AM > >> To: Tamar Christina > >> Cc: Richard Sandiford ; Tamar Christina via > >> Gcc-patches ; nd ; > >> rguent...@suse.de > >> Subject: Re: [PATCH 3/8]middle-end: Support extractions of subvectors from > >> arbitrary element position inside a vector > >> > >> On Tue, Nov 15, 2022 at 4:51 PM Tamar Christina > >> wrote: > >> > > >> > > -Original Message- > >> > > From: Hongtao Liu > >> > > Sent: Tuesday, November 15, 2022 8:36 AM > >> > > To: Tamar Christina > >> > > Cc: Richard Sandiford ; Tamar Christina > >> > > via Gcc-patches ; nd ; > >> > > rguent...@suse.de > >> > > Subject: Re: [PATCH 3/8]middle-end: Support extractions of > >> > > subvectors from arbitrary element position inside a vector > >> > > > >> > > Hi: > >> > > I'm from https://gcc.gnu.org/pipermail/gcc-patches/2022- > >> > > November/606040.html. > >> > > > } > >> > > > > >> > > >/* See if we can get a better vector mode before extracting. > >> > > > */ diff --git a/gcc/optabs.cc b/gcc/optabs.cc index > >> > > > > >> > > > >> cff37ccb0dfc3dd79b97d0abfd872f340855dc96..f338df410265dfe55b68961600 > >> > > 9 > >> > > 0 > >> > > > a453cc6a28d9 100644 > >> > > > --- a/gcc/optabs.cc > >> > > > +++ b/gcc/optabs.cc > >> > > > @@ -6267,6 +6267,7 @@ expand_vec_perm_const (machine_mode > >> mode, > >> > > rtx v0, rtx v1, > >> > > >v0_qi = gen_lowpart (qimode, v0); > >> > > >v1_qi = gen_lowpart (qimode, v1); > >> > > >if (targetm.vectorize.vec_perm_const != NULL > >> > > > + && targetm.can_change_mode_class (mode, qimode, > >> > > > + ALL_REGS) > >> > > It looks like you want to guard gen_lowpart, shouldn't it be better > >> > > to use validate_subreg or (tmp = gen_lowpart_if_possible (mode, > >> target_qi)). > >> > > IMHO, targetm.can_change_mode_class is mostly used for RA, but not > >> > > to guard gen_lowpart. > >> > > >> > Hmm I don't think this is quite true, there are existing usages in > >> > expr.cc and rtanal.cc That do this and aren't part of RA. As I > >> > mentioned before for instance the canoncalization of vec_select to subreg > >> in rtlanal for instances uses this. > >> In theory, we need to iterate through all reg classes that can be assigned > >> for > >> both qimode and mode, if any regclass returns true for > >> targetm.can_change_mode_class, the bitcast(validate_subreg) should be ok. > >> Here we just passed ALL_REGS. > > > > Yes, and most targets where this transformation is valid return true here. > > > > I've checked: > > * alpha > > * arm > > * aarch64 > > * rs6000 > > * s390 > > * sparc > > * pa > > * mips > > > > And even the default example that other targets use from the documentation > > would return true as the size of the modes are the same. > > > > X86 and RISCV are the only two targets that I found (but didn't check all) > > that > > blankly return a result based on just the register classes. > > > > That is to say, there are more targets that adhere to the interpretation > > that > > rclass here means "should be possible in some class in rclass" rather than > > "should be possible in ALL classes of rclass". > > Yeah, I agree. A query "can something stored in ALL_REGS change from > mode M1 to mode M2?" is meaningful if at least one register R in ALL_REGS > can hold both M1 and M2. It's then the target's job to answer > conservatively so that the result covers all such R. > > In principle it's OK for a target to err on the side of caution and forbid > things that are actually OK. But that's going to risk losing performance > in some cases, and sometimes that loss of performance will be unacceptable. > IMO that's what's happening here. The target is applying x87 rules to > things that (AIUI) are never stored in x87 registers, and so losing Y
Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector
Tamar Christina writes: >> -Original Message- >> From: Hongtao Liu >> Sent: Tuesday, November 15, 2022 9:37 AM >> To: Tamar Christina >> Cc: Richard Sandiford ; Tamar Christina via >> Gcc-patches ; nd ; >> rguent...@suse.de >> Subject: Re: [PATCH 3/8]middle-end: Support extractions of subvectors from >> arbitrary element position inside a vector >> >> On Tue, Nov 15, 2022 at 4:51 PM Tamar Christina >> wrote: >> > >> > > -Original Message- >> > > From: Hongtao Liu >> > > Sent: Tuesday, November 15, 2022 8:36 AM >> > > To: Tamar Christina >> > > Cc: Richard Sandiford ; Tamar Christina >> > > via Gcc-patches ; nd ; >> > > rguent...@suse.de >> > > Subject: Re: [PATCH 3/8]middle-end: Support extractions of >> > > subvectors from arbitrary element position inside a vector >> > > >> > > Hi: >> > > I'm from https://gcc.gnu.org/pipermail/gcc-patches/2022- >> > > November/606040.html. >> > > > } >> > > > >> > > >/* See if we can get a better vector mode before extracting. >> > > > */ diff --git a/gcc/optabs.cc b/gcc/optabs.cc index >> > > > >> > > >> cff37ccb0dfc3dd79b97d0abfd872f340855dc96..f338df410265dfe55b68961600 >> > > 9 >> > > 0 >> > > > a453cc6a28d9 100644 >> > > > --- a/gcc/optabs.cc >> > > > +++ b/gcc/optabs.cc >> > > > @@ -6267,6 +6267,7 @@ expand_vec_perm_const (machine_mode >> mode, >> > > rtx v0, rtx v1, >> > > >v0_qi = gen_lowpart (qimode, v0); >> > > >v1_qi = gen_lowpart (qimode, v1); >> > > >if (targetm.vectorize.vec_perm_const != NULL >> > > > + && targetm.can_change_mode_class (mode, qimode, >> > > > + ALL_REGS) >> > > It looks like you want to guard gen_lowpart, shouldn't it be better >> > > to use validate_subreg or (tmp = gen_lowpart_if_possible (mode, >> target_qi)). >> > > IMHO, targetm.can_change_mode_class is mostly used for RA, but not >> > > to guard gen_lowpart. >> > >> > Hmm I don't think this is quite true, there are existing usages in >> > expr.cc and rtanal.cc That do this and aren't part of RA. As I >> > mentioned before for instance the canoncalization of vec_select to subreg >> in rtlanal for instances uses this. >> In theory, we need to iterate through all reg classes that can be assigned >> for >> both qimode and mode, if any regclass returns true for >> targetm.can_change_mode_class, the bitcast(validate_subreg) should be ok. >> Here we just passed ALL_REGS. > > Yes, and most targets where this transformation is valid return true here. > > I've checked: > * alpha > * arm > * aarch64 > * rs6000 > * s390 > * sparc > * pa > * mips > > And even the default example that other targets use from the documentation > would return true as the size of the modes are the same. > > X86 and RISCV are the only two targets that I found (but didn't check all) > that > blankly return a result based on just the register classes. > > That is to say, there are more targets that adhere to the interpretation that > rclass here means "should be possible in some class in rclass" rather than > "should be possible in ALL classes of rclass". Yeah, I agree. A query "can something stored in ALL_REGS change from mode M1 to mode M2?" is meaningful if at least one register R in ALL_REGS can hold both M1 and M2. It's then the target's job to answer conservatively so that the result covers all such R. In principle it's OK for a target to err on the side of caution and forbid things that are actually OK. But that's going to risk losing performance in some cases, and sometimes that loss of performance will be unacceptable. IMO that's what's happening here. The target is applying x87 rules to things that (AIUI) are never stored in x87 registers, and so losing performance as a result. Note that the RA also uses ALL_REGS for some things, so this usage isn't specific to non-RA code. IMO it's not the job of target-independent code to iterate through individual classes and aggregate the result. One of the reasons for having union classes is to avoid the need to do that. And ALL_REGS is the ultimate union class. :-) The patch looks correct to me. Thanks, Richard >> > >> > So there are already existing precedence for this. And the >> > documentation for the hook says: >
RE: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector
> -Original Message- > From: Hongtao Liu > Sent: Tuesday, November 15, 2022 9:37 AM > To: Tamar Christina > Cc: Richard Sandiford ; Tamar Christina via > Gcc-patches ; nd ; > rguent...@suse.de > Subject: Re: [PATCH 3/8]middle-end: Support extractions of subvectors from > arbitrary element position inside a vector > > On Tue, Nov 15, 2022 at 4:51 PM Tamar Christina > wrote: > > > > > -Original Message- > > > From: Hongtao Liu > > > Sent: Tuesday, November 15, 2022 8:36 AM > > > To: Tamar Christina > > > Cc: Richard Sandiford ; Tamar Christina > > > via Gcc-patches ; nd ; > > > rguent...@suse.de > > > Subject: Re: [PATCH 3/8]middle-end: Support extractions of > > > subvectors from arbitrary element position inside a vector > > > > > > Hi: > > > I'm from https://gcc.gnu.org/pipermail/gcc-patches/2022- > > > November/606040.html. > > > > } > > > > > > > >/* See if we can get a better vector mode before extracting. > > > > */ diff --git a/gcc/optabs.cc b/gcc/optabs.cc index > > > > > > > > cff37ccb0dfc3dd79b97d0abfd872f340855dc96..f338df410265dfe55b68961600 > > > 9 > > > 0 > > > > a453cc6a28d9 100644 > > > > --- a/gcc/optabs.cc > > > > +++ b/gcc/optabs.cc > > > > @@ -6267,6 +6267,7 @@ expand_vec_perm_const (machine_mode > mode, > > > rtx v0, rtx v1, > > > >v0_qi = gen_lowpart (qimode, v0); > > > >v1_qi = gen_lowpart (qimode, v1); > > > >if (targetm.vectorize.vec_perm_const != NULL > > > > + && targetm.can_change_mode_class (mode, qimode, > > > > + ALL_REGS) > > > It looks like you want to guard gen_lowpart, shouldn't it be better > > > to use validate_subreg or (tmp = gen_lowpart_if_possible (mode, > target_qi)). > > > IMHO, targetm.can_change_mode_class is mostly used for RA, but not > > > to guard gen_lowpart. > > > > Hmm I don't think this is quite true, there are existing usages in > > expr.cc and rtanal.cc That do this and aren't part of RA. As I > > mentioned before for instance the canoncalization of vec_select to subreg > in rtlanal for instances uses this. > In theory, we need to iterate through all reg classes that can be assigned for > both qimode and mode, if any regclass returns true for > targetm.can_change_mode_class, the bitcast(validate_subreg) should be ok. > Here we just passed ALL_REGS. Yes, and most targets where this transformation is valid return true here. I've checked: * alpha * arm * aarch64 * rs6000 * s390 * sparc * pa * mips And even the default example that other targets use from the documentation would return true as the size of the modes are the same. X86 and RISCV are the only two targets that I found (but didn't check all) that blankly return a result based on just the register classes. That is to say, there are more targets that adhere to the interpretation that rclass here means "should be possible in some class in rclass" rather than "should be possible in ALL classes of rclass". > > > > So there are already existing precedence for this. And the > > documentation for the hook says: > > > > "This hook returns true if it is possible to bitcast values held in > > registers of > class rclass from mode from to mode to and if doing so preserves the low- > order bits that are common to both modes. The result is only meaningful if > rclass has registers that can hold both from and to. The default > implementation returns true" > > > > So it looks like it's use outside of RA is perfectly valid.. and the > > documentation also mentions in the example the use from the mid-end as > an example. > > > > But if the mid-end maintainers are happy I'll use something else. > > > > Tamar > > > > > I did similar things in > > > https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579296.html > > > (and ALL_REGS doesn't cover all cases for registers which are both > > > available for qimode and mode, ALL_REGS fail doesn't mean it can't > > > be subreg, it just means parts of ALL_REGS can't be subreg. but with > > > a subset of ALL_REGS, there could be a reg class which return true > > > for > > > targetm.can_change_mode_class) > > > > && targetm.vectorize.vec_perm_const (qimode, qimode, > > > > target_qi, > > > v0_qi, > > > >v1_
Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector
On Tue, Nov 15, 2022 at 4:51 PM Tamar Christina wrote: > > > -Original Message- > > From: Hongtao Liu > > Sent: Tuesday, November 15, 2022 8:36 AM > > To: Tamar Christina > > Cc: Richard Sandiford ; Tamar Christina via > > Gcc-patches ; nd ; > > rguent...@suse.de > > Subject: Re: [PATCH 3/8]middle-end: Support extractions of subvectors from > > arbitrary element position inside a vector > > > > Hi: > > I'm from https://gcc.gnu.org/pipermail/gcc-patches/2022- > > November/606040.html. > > > } > > > > > >/* See if we can get a better vector mode before extracting. */ > > > diff --git a/gcc/optabs.cc b/gcc/optabs.cc index > > > > > cff37ccb0dfc3dd79b97d0abfd872f340855dc96..f338df410265dfe55b689616009 > > 0 > > > a453cc6a28d9 100644 > > > --- a/gcc/optabs.cc > > > +++ b/gcc/optabs.cc > > > @@ -6267,6 +6267,7 @@ expand_vec_perm_const (machine_mode mode, > > rtx v0, rtx v1, > > >v0_qi = gen_lowpart (qimode, v0); > > >v1_qi = gen_lowpart (qimode, v1); > > >if (targetm.vectorize.vec_perm_const != NULL > > > + && targetm.can_change_mode_class (mode, qimode, ALL_REGS) > > It looks like you want to guard gen_lowpart, shouldn't it be better to use > > validate_subreg or (tmp = gen_lowpart_if_possible (mode, target_qi)). > > IMHO, targetm.can_change_mode_class is mostly used for RA, but not to > > guard gen_lowpart. > > Hmm I don't think this is quite true, there are existing usages in expr.cc > and rtanal.cc > That do this and aren't part of RA. As I mentioned before for instance the > canoncalization of vec_select to subreg in rtlanal for instances uses this. In theory, we need to iterate through all reg classes that can be assigned for both qimode and mode, if any regclass returns true for targetm.can_change_mode_class, the bitcast(validate_subreg) should be ok. Here we just passed ALL_REGS. > > So there are already existing precedence for this. And the documentation for > the hook says: > > "This hook returns true if it is possible to bitcast values held in registers > of class rclass from mode from to mode to and if doing so preserves the > low-order bits that are common to both modes. The result is only meaningful > if rclass has registers that can hold both from and to. The default > implementation returns true" > > So it looks like it's use outside of RA is perfectly valid.. and the > documentation also mentions > in the example the use from the mid-end as an example. > > But if the mid-end maintainers are happy I'll use something else. > > Tamar > > > I did similar things in > > https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579296.html > > (and ALL_REGS doesn't cover all cases for registers which are both available > > for qimode and mode, ALL_REGS fail doesn't mean it can't be subreg, it just > > means parts of ALL_REGS can't be subreg. but with a subset of ALL_REGS, > > there could be a reg class which return true for > > targetm.can_change_mode_class) > > > && targetm.vectorize.vec_perm_const (qimode, qimode, target_qi, > > v0_qi, > > >v1_qi, qimode_indices)) > > > return gen_lowpart (mode, target_qi); @@ -6311,7 +6312,8 @@ > > > expand_vec_perm_const (machine_mode mode, rtx v0, rtx v1, > > > } > > > > > >if (qimode != VOIDmode > > > - && selector_fits_mode_p (qimode, qimode_indices)) > > > + && selector_fits_mode_p (qimode, qimode_indices) > > > + && targetm.can_change_mode_class (mode, qimode, ALL_REGS)) > > > { > > >icode = direct_optab_handler (vec_perm_optab, qimode); > > >if (icode != CODE_FOR_nothing) > > > diff --git a/gcc/testsuite/gcc.target/aarch64/ext_1.c > > > b/gcc/testsuite/gcc.target/aarch64/ext_1.c > > > new file mode 100644 > > > index > > > > > ..18a10a14f1161584267a8472e5 > > 71 > > > b3bc2ddf887a > > > > > > > > > > -- > > BR, > > Hongtao -- BR, Hongtao
RE: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector
> -Original Message- > From: Hongtao Liu > Sent: Tuesday, November 15, 2022 8:36 AM > To: Tamar Christina > Cc: Richard Sandiford ; Tamar Christina via > Gcc-patches ; nd ; > rguent...@suse.de > Subject: Re: [PATCH 3/8]middle-end: Support extractions of subvectors from > arbitrary element position inside a vector > > Hi: > I'm from https://gcc.gnu.org/pipermail/gcc-patches/2022- > November/606040.html. > > } > > > >/* See if we can get a better vector mode before extracting. */ > > diff --git a/gcc/optabs.cc b/gcc/optabs.cc index > > > cff37ccb0dfc3dd79b97d0abfd872f340855dc96..f338df410265dfe55b689616009 > 0 > > a453cc6a28d9 100644 > > --- a/gcc/optabs.cc > > +++ b/gcc/optabs.cc > > @@ -6267,6 +6267,7 @@ expand_vec_perm_const (machine_mode mode, > rtx v0, rtx v1, > >v0_qi = gen_lowpart (qimode, v0); > >v1_qi = gen_lowpart (qimode, v1); > >if (targetm.vectorize.vec_perm_const != NULL > > + && targetm.can_change_mode_class (mode, qimode, ALL_REGS) > It looks like you want to guard gen_lowpart, shouldn't it be better to use > validate_subreg or (tmp = gen_lowpart_if_possible (mode, target_qi)). > IMHO, targetm.can_change_mode_class is mostly used for RA, but not to > guard gen_lowpart. Hmm I don't think this is quite true, there are existing usages in expr.cc and rtanal.cc That do this and aren't part of RA. As I mentioned before for instance the canoncalization of vec_select to subreg in rtlanal for instances uses this. So there are already existing precedence for this. And the documentation for the hook says: "This hook returns true if it is possible to bitcast values held in registers of class rclass from mode from to mode to and if doing so preserves the low-order bits that are common to both modes. The result is only meaningful if rclass has registers that can hold both from and to. The default implementation returns true" So it looks like it's use outside of RA is perfectly valid.. and the documentation also mentions in the example the use from the mid-end as an example. But if the mid-end maintainers are happy I'll use something else. Tamar > I did similar things in > https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579296.html > (and ALL_REGS doesn't cover all cases for registers which are both available > for qimode and mode, ALL_REGS fail doesn't mean it can't be subreg, it just > means parts of ALL_REGS can't be subreg. but with a subset of ALL_REGS, > there could be a reg class which return true for > targetm.can_change_mode_class) > > && targetm.vectorize.vec_perm_const (qimode, qimode, target_qi, > v0_qi, > >v1_qi, qimode_indices)) > > return gen_lowpart (mode, target_qi); @@ -6311,7 +6312,8 @@ > > expand_vec_perm_const (machine_mode mode, rtx v0, rtx v1, > > } > > > >if (qimode != VOIDmode > > - && selector_fits_mode_p (qimode, qimode_indices)) > > + && selector_fits_mode_p (qimode, qimode_indices) > > + && targetm.can_change_mode_class (mode, qimode, ALL_REGS)) > > { > >icode = direct_optab_handler (vec_perm_optab, qimode); > >if (icode != CODE_FOR_nothing) > > diff --git a/gcc/testsuite/gcc.target/aarch64/ext_1.c > > b/gcc/testsuite/gcc.target/aarch64/ext_1.c > > new file mode 100644 > > index > > > ..18a10a14f1161584267a8472e5 > 71 > > b3bc2ddf887a > > > > > -- > BR, > Hongtao
Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector
Hi: I'm from https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606040.html. > } > >/* See if we can get a better vector mode before extracting. */ > diff --git a/gcc/optabs.cc b/gcc/optabs.cc > index > cff37ccb0dfc3dd79b97d0abfd872f340855dc96..f338df410265dfe55b6896160090a453cc6a28d9 > 100644 > --- a/gcc/optabs.cc > +++ b/gcc/optabs.cc > @@ -6267,6 +6267,7 @@ expand_vec_perm_const (machine_mode mode, rtx v0, rtx > v1, >v0_qi = gen_lowpart (qimode, v0); >v1_qi = gen_lowpart (qimode, v1); >if (targetm.vectorize.vec_perm_const != NULL > + && targetm.can_change_mode_class (mode, qimode, ALL_REGS) It looks like you want to guard gen_lowpart, shouldn't it be better to use validate_subreg or (tmp = gen_lowpart_if_possible (mode, target_qi)). IMHO, targetm.can_change_mode_class is mostly used for RA, but not to guard gen_lowpart. I did similar things in https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579296.html (and ALL_REGS doesn't cover all cases for registers which are both available for qimode and mode, ALL_REGS fail doesn't mean it can't be subreg, it just means parts of ALL_REGS can't be subreg. but with a subset of ALL_REGS, there could be a reg class which return true for targetm.can_change_mode_class) > && targetm.vectorize.vec_perm_const (qimode, qimode, target_qi, > v0_qi, >v1_qi, qimode_indices)) > return gen_lowpart (mode, target_qi); > @@ -6311,7 +6312,8 @@ expand_vec_perm_const (machine_mode mode, rtx v0, rtx > v1, > } > >if (qimode != VOIDmode > - && selector_fits_mode_p (qimode, qimode_indices)) > + && selector_fits_mode_p (qimode, qimode_indices) > + && targetm.can_change_mode_class (mode, qimode, ALL_REGS)) > { >icode = direct_optab_handler (vec_perm_optab, qimode); >if (icode != CODE_FOR_nothing) > diff --git a/gcc/testsuite/gcc.target/aarch64/ext_1.c > b/gcc/testsuite/gcc.target/aarch64/ext_1.c > new file mode 100644 > index > ..18a10a14f1161584267a8472e571b3bc2ddf887a -- BR, Hongtao
RE: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector
Hi, > > ...can we use expand_vec_perm_const here? It will try the constant > expansion first, which is the preferred order. It also has a few variations > up > its sleeve. > We can, however it this function seems to be incorrectly assuming it can always Convert the input mode to a QI vector mode. When I started using it we got a number of miscompilations in the AArch64 codegen. This had the knock-on effect of uncovering bugs in both the AArch64 backend and i386. I'll send patched out for those separately. For now here's the new patch using that hook and updating the permute expansion code: Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * expmed.cc (extract_bit_field_1): Add support for vector element extracts. * optabs.cc (expand_vec_perm_const): Add checks before converting permute to QImode fallback. gcc/testsuite/ChangeLog: * gcc.target/aarch64/ext_1.c: New. --- inline copy of patch --- diff --git a/gcc/expmed.cc b/gcc/expmed.cc index bab020c07222afa38305ef8d7333f271b1965b78..7d38045ae525c8a4665a0c1384fc515e4de88c67 100644 --- a/gcc/expmed.cc +++ b/gcc/expmed.cc @@ -1718,6 +1718,21 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum, return target; } } + else if (!known_eq (bitnum, 0U) + && multiple_p (GET_MODE_UNIT_BITSIZE (tmode), bitnum, )) + { + /* The encoding has a single stepped pattern. */ + poly_uint64 nunits = GET_MODE_NUNITS (new_mode); + vec_perm_builder sel (nunits, 1, 3); + sel.quick_push (pos); + sel.quick_push (pos + 1); + sel.quick_push (pos + 2); + + rtx res + = expand_vec_perm_const (new_mode, op0, op0, sel, new_mode, NULL); + if (res) + return simplify_gen_subreg (tmode, res, new_mode, 0); + } } /* See if we can get a better vector mode before extracting. */ diff --git a/gcc/optabs.cc b/gcc/optabs.cc index cff37ccb0dfc3dd79b97d0abfd872f340855dc96..f338df410265dfe55b6896160090a453cc6a28d9 100644 --- a/gcc/optabs.cc +++ b/gcc/optabs.cc @@ -6267,6 +6267,7 @@ expand_vec_perm_const (machine_mode mode, rtx v0, rtx v1, v0_qi = gen_lowpart (qimode, v0); v1_qi = gen_lowpart (qimode, v1); if (targetm.vectorize.vec_perm_const != NULL + && targetm.can_change_mode_class (mode, qimode, ALL_REGS) && targetm.vectorize.vec_perm_const (qimode, qimode, target_qi, v0_qi, v1_qi, qimode_indices)) return gen_lowpart (mode, target_qi); @@ -6311,7 +6312,8 @@ expand_vec_perm_const (machine_mode mode, rtx v0, rtx v1, } if (qimode != VOIDmode - && selector_fits_mode_p (qimode, qimode_indices)) + && selector_fits_mode_p (qimode, qimode_indices) + && targetm.can_change_mode_class (mode, qimode, ALL_REGS)) { icode = direct_optab_handler (vec_perm_optab, qimode); if (icode != CODE_FOR_nothing) diff --git a/gcc/testsuite/gcc.target/aarch64/ext_1.c b/gcc/testsuite/gcc.target/aarch64/ext_1.c new file mode 100644 index ..18a10a14f1161584267a8472e571b3bc2ddf887a --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/ext_1.c @@ -0,0 +1,54 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-O" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#include + +typedef unsigned int v4si __attribute__((vector_size (16))); +typedef unsigned int v2si __attribute__((vector_size (8))); + +/* +** extract: { xfail *-*-* } +** ext v0.16b, v0.16b, v0.16b, #4 +** ret +*/ +v2si extract (v4si x) +{ +v2si res = {x[1], x[2]}; +return res; +} + +/* +** extract1: { xfail *-*-* } +** ext v0.16b, v0.16b, v0.16b, #4 +** ret +*/ +v2si extract1 (v4si x) +{ +v2si res; +memcpy (, ((int*))+1, sizeof(res)); +return res; +} + +typedef struct cast { + int a; + v2si b __attribute__((packed)); +} cast_t; + +typedef union Data { + v4si x; + cast_t y; +} data; + +/* +** extract2: +** ext v0.16b, v0.16b, v0.16b, #4 +** ret +*/ +v2si extract2 (v4si x) +{ +data d; +d.x = x; +return d.y.b; +} + rb16242.patch Description: rb16242.patch
Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector
Tamar Christina via Gcc-patches writes: > Hi All, > > The current vector extract pattern can only extract from a vector when the > position to extract is a multiple of the vector bitsize as a whole. > > That means extract something like a V2SI from a V4SI vector from position 32 > isn't possible as 32 is not a multiple of 64. Ideally this optab should have > worked on multiple of the element size, but too many targets rely on this > semantic now. > > So instead add a new case which allows any extraction as long as the bit pos > is a multiple of the element size. We use a VEC_PERM to shuffle the elements > into the bottom parts of the vector and then use a subreg to extract the > values > out. This now allows various vector operations that before were being > decomposed into very inefficient scalar operations. > > NOTE: I added 3 testcases, I only fixed the 3rd one. > > The 1st one missed because we don't optimize VEC_PERM expressions into > bitfields. The 2nd one is missed because extract_bit_field only works on > vector modes. In this case the intermediate extract is DImode. > > On targets where the scalar mode is tiable to vector modes the extract should > work fine. > > However I ran out of time to fix the first two and so will do so in GCC 14. > For now this catches the case that my pattern now introduces more easily. > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu > and no issues. > > Ok for master? > > Thanks, > Tamar > > gcc/ChangeLog: > > * expmed.cc (extract_bit_field_1): Add support for vector element > extracts. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/ext_1.c: New. > > --- inline copy of patch -- > diff --git a/gcc/expmed.cc b/gcc/expmed.cc > index > bab020c07222afa38305ef8d7333f271b1965b78..ffdf65210d17580a216477cfe4ac1598941ac9e4 > 100644 > --- a/gcc/expmed.cc > +++ b/gcc/expmed.cc > @@ -1718,6 +1718,45 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, > poly_uint64 bitnum, > return target; > } > } > + else if (!known_eq (bitnum, 0U) > +&& multiple_p (GET_MODE_UNIT_BITSIZE (tmode), bitnum, )) > + { > + /* The encoding has a single stepped pattern. */ > + poly_uint64 nunits = GET_MODE_NUNITS (new_mode); > + int nelts = nunits.to_constant (); > + vec_perm_builder sel (nunits, nelts, 1); > + int delta = -pos.to_constant (); > + for (int i = 0; i < nelts; ++i) > + sel.quick_push ((i - delta) % nelts); > + vec_perm_indices indices (sel, 1, nunits); Thanks for doing this, looks good. But I don't think the to_constant calls are safe. new_mode and pos could in principle be nonconstant. To build a stepped pattern, we just need: vec_perm_builder sel (nunits, 1, 3); and then push pos, pos + 1, and pos + 2 to it. There's no need to clamp the position to nelts, it happens automatically. > + > + if (can_vec_perm_const_p (new_mode, new_mode, indices, false)) > + { > + class expand_operand ops[4]; > + machine_mode outermode = new_mode; > + machine_mode innermode = tmode; > + enum insn_code icode > + = direct_optab_handler (vec_perm_optab, outermode); > + target = gen_reg_rtx (outermode); > + if (icode != CODE_FOR_nothing) > + { > + rtx sel = vec_perm_indices_to_rtx (outermode, indices); > + create_output_operand ([0], target, outermode); > + ops[0].target = 1; > + create_input_operand ([1], op0, outermode); > + create_input_operand ([2], op0, outermode); > + create_input_operand ([3], sel, outermode); I think this should be GET_MODE (sel). Looks like the current version would ICE for float vectors. That said... > + if (maybe_expand_insn (icode, 4, ops)) > + return simplify_gen_subreg (innermode, target, outermode, > 0); > + } > + else if (targetm.vectorize.vec_perm_const != NULL) > + { > + if (targetm.vectorize.vec_perm_const (outermode, outermode, > + target, op0, op0, > indices)) > + return simplify_gen_subreg (innermode, target, outermode, > 0); > + } ...can we use expand_vec_perm_const here? It will try the constant expansion first, which is the preferred order. It also has a few variations up its sleeve. Thanks, Richard > + } > + } > } > >/* See if we can get a better vector mode before extracting. */ > diff --git a/gcc/testsuite/gcc.target/aarch64/ext_1.c > b/gcc/testsuite/gcc.target/aarch64/ext_1.c > new file mode 100644 > index > ..18a10a14f1161584267a8472e571b3bc2ddf887a > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/ext_1.c > @@ -0,0 +1,54 @@ > +/* { dg-do compile } */ > +/* {
Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector
On 10/31/22 05:57, Tamar Christina wrote: Hi All, The current vector extract pattern can only extract from a vector when the position to extract is a multiple of the vector bitsize as a whole. That means extract something like a V2SI from a V4SI vector from position 32 isn't possible as 32 is not a multiple of 64. Ideally this optab should have worked on multiple of the element size, but too many targets rely on this semantic now. So instead add a new case which allows any extraction as long as the bit pos is a multiple of the element size. We use a VEC_PERM to shuffle the elements into the bottom parts of the vector and then use a subreg to extract the values out. This now allows various vector operations that before were being decomposed into very inefficient scalar operations. NOTE: I added 3 testcases, I only fixed the 3rd one. The 1st one missed because we don't optimize VEC_PERM expressions into bitfields. The 2nd one is missed because extract_bit_field only works on vector modes. In this case the intermediate extract is DImode. On targets where the scalar mode is tiable to vector modes the extract should work fine. However I ran out of time to fix the first two and so will do so in GCC 14. For now this catches the case that my pattern now introduces more easily. Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * expmed.cc (extract_bit_field_1): Add support for vector element extracts. gcc/testsuite/ChangeLog: * gcc.target/aarch64/ext_1.c: New. OK. jeff
[PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector
Hi All, The current vector extract pattern can only extract from a vector when the position to extract is a multiple of the vector bitsize as a whole. That means extract something like a V2SI from a V4SI vector from position 32 isn't possible as 32 is not a multiple of 64. Ideally this optab should have worked on multiple of the element size, but too many targets rely on this semantic now. So instead add a new case which allows any extraction as long as the bit pos is a multiple of the element size. We use a VEC_PERM to shuffle the elements into the bottom parts of the vector and then use a subreg to extract the values out. This now allows various vector operations that before were being decomposed into very inefficient scalar operations. NOTE: I added 3 testcases, I only fixed the 3rd one. The 1st one missed because we don't optimize VEC_PERM expressions into bitfields. The 2nd one is missed because extract_bit_field only works on vector modes. In this case the intermediate extract is DImode. On targets where the scalar mode is tiable to vector modes the extract should work fine. However I ran out of time to fix the first two and so will do so in GCC 14. For now this catches the case that my pattern now introduces more easily. Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * expmed.cc (extract_bit_field_1): Add support for vector element extracts. gcc/testsuite/ChangeLog: * gcc.target/aarch64/ext_1.c: New. --- inline copy of patch -- diff --git a/gcc/expmed.cc b/gcc/expmed.cc index bab020c07222afa38305ef8d7333f271b1965b78..ffdf65210d17580a216477cfe4ac1598941ac9e4 100644 --- a/gcc/expmed.cc +++ b/gcc/expmed.cc @@ -1718,6 +1718,45 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum, return target; } } + else if (!known_eq (bitnum, 0U) + && multiple_p (GET_MODE_UNIT_BITSIZE (tmode), bitnum, )) + { + /* The encoding has a single stepped pattern. */ + poly_uint64 nunits = GET_MODE_NUNITS (new_mode); + int nelts = nunits.to_constant (); + vec_perm_builder sel (nunits, nelts, 1); + int delta = -pos.to_constant (); + for (int i = 0; i < nelts; ++i) + sel.quick_push ((i - delta) % nelts); + vec_perm_indices indices (sel, 1, nunits); + + if (can_vec_perm_const_p (new_mode, new_mode, indices, false)) + { + class expand_operand ops[4]; + machine_mode outermode = new_mode; + machine_mode innermode = tmode; + enum insn_code icode + = direct_optab_handler (vec_perm_optab, outermode); + target = gen_reg_rtx (outermode); + if (icode != CODE_FOR_nothing) + { + rtx sel = vec_perm_indices_to_rtx (outermode, indices); + create_output_operand ([0], target, outermode); + ops[0].target = 1; + create_input_operand ([1], op0, outermode); + create_input_operand ([2], op0, outermode); + create_input_operand ([3], sel, outermode); + if (maybe_expand_insn (icode, 4, ops)) + return simplify_gen_subreg (innermode, target, outermode, 0); + } + else if (targetm.vectorize.vec_perm_const != NULL) + { + if (targetm.vectorize.vec_perm_const (outermode, outermode, + target, op0, op0, indices)) + return simplify_gen_subreg (innermode, target, outermode, 0); + } + } + } } /* See if we can get a better vector mode before extracting. */ diff --git a/gcc/testsuite/gcc.target/aarch64/ext_1.c b/gcc/testsuite/gcc.target/aarch64/ext_1.c new file mode 100644 index ..18a10a14f1161584267a8472e571b3bc2ddf887a --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/ext_1.c @@ -0,0 +1,54 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-O" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#include + +typedef unsigned int v4si __attribute__((vector_size (16))); +typedef unsigned int v2si __attribute__((vector_size (8))); + +/* +** extract: { xfail *-*-* } +** ext v0.16b, v0.16b, v0.16b, #4 +** ret +*/ +v2si extract (v4si x) +{ +v2si res = {x[1], x[2]}; +return res; +} + +/* +** extract1: { xfail *-*-* } +** ext v0.16b, v0.16b, v0.16b, #4 +** ret +*/ +v2si extract1 (v4si x) +{ +v2si res; +memcpy (, ((int*))+1, sizeof(res)); +return res; +} + +typedef struct cast { + int a; + v2si b __attribute__((packed)); +} cast_t; + +typedef union Data { + v4si x; + cast_t y; +} data; + +/* +** extract2: +** ext v0.16b, v0.16b, v0.16b, #4 +** ret +*/ +v2si