Re: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.

2022-12-06 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Tuesday, December 6, 2022 10:28 AM
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> ; Marcus Shawcroft
>> ; Kyrylo Tkachov 
>> Subject: Re: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.
>> 
>> Tamar Christina  writes:
>> > Hi,
>> >
>> >
>> >> This name might cause confusion with the SVE iterators, where FULL
>> >> means "every bit of the register is used".  How about something like
>> >> VMOVE instead?
>> >>
>> >> With this change, I guess VALL_F16 represents "The set of all modes
>> >> for which the vld1 intrinsics are provided" and VMOVE or whatever is
>> >> "All Advanced SIMD modes suitable for moving, loading, and storing".
>> >> That is, VMOVE extends VALL_F16 with modes that are not manifested
>> >> via intrinsics.
>> >>
>> >
>> > Done.
>> >
>> >> Where is the 2h used, and is it valid syntax in that context?
>> >>
>> >> Same for later instances of 2h.
>> >
>> > They are, but they weren't meant to be in this patch.  They belong in
>> > a separate FP16 series that I won't get to finish for GCC 13 due to not
>> > being able to finish writing all the tests.  I have moved them to that
>> > patch series though.
>> >
>> > While the addp patch series has been killed, this patch is still good
>> > standalone and improves codegen as shown in the updated testcase.
>> >
>> > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>> >
>> > Ok for master?
>> >
>> > Thanks,
>> > Tamar
>> >
>> > gcc/ChangeLog:
>> >
>> >* config/aarch64/aarch64-simd.md (*aarch64_simd_movv2hf): New.
>> >(mov, movmisalign, aarch64_dup_lane,
>> >aarch64_store_lane0, aarch64_simd_vec_set,
>> >@aarch64_simd_vec_copy_lane, vec_set,
>> >reduc__scal_, reduc__scal_,
>> >aarch64_reduc__internal,
>> aarch64_get_lane,
>> >vec_init, vec_extract): Support V2HF.
>> >(aarch64_simd_dupv2hf): New.
>> >* config/aarch64/aarch64.cc (aarch64_classify_vector_mode):
>> >Add E_V2HFmode.
>> >* config/aarch64/iterators.md (VHSDF_P): New.
>> >(V2F, VMOVE, nunits, Vtype, Vmtype, Vetype, stype, VEL,
>> >Vel, q, vp): Add V2HF.
>> >* config/arm/types.md (neon_fp_reduc_add_h): New.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> >* gcc.target/aarch64/sve/slp_1.c: Update testcase.
>> >
>> > --- inline copy of patch ---
>> >
>> > diff --git a/gcc/config/aarch64/aarch64-simd.md
>> > b/gcc/config/aarch64/aarch64-simd.md
>> > index
>> >
>> f4152160084d6b6f34bd69f0ba6386c1ab50f77e..487a31010245accec28e779661
>> e6
>> > c2d578fca4b7 100644
>> > --- a/gcc/config/aarch64/aarch64-simd.md
>> > +++ b/gcc/config/aarch64/aarch64-simd.md
>> > @@ -19,10 +19,10 @@
>> >  ;; <http://www.gnu.org/licenses/>.
>> >
>> >  (define_expand "mov"
>> > -  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
>> > -  (match_operand:VALL_F16 1 "general_operand"))]
>> > +  [(set (match_operand:VMOVE 0 "nonimmediate_operand")
>> > +  (match_operand:VMOVE 1 "general_operand"))]
>> >"TARGET_SIMD"
>> > -  "
>> > +{
>> >/* Force the operand into a register if it is not an
>> >   immediate whose use can be replaced with xzr.
>> >   If the mode is 16 bytes wide, then we will be doing @@ -46,12
>> > +46,11 @@ (define_expand "mov"
>> >aarch64_expand_vector_init (operands[0], operands[1]);
>> >DONE;
>> >  }
>> > -  "
>> > -)
>> > +})
>> >
>> >  (define_expand "movmisalign"
>> > -  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
>> > -(match_operand:VALL_F16 1 "general_operand"))]
>> > +  [(set (match_operand:VMOVE 0 "nonimmediate_operand")
>> > +(match_operand:VMOVE 1 "general_operand"))]
>> >"TARGET_SIMD && !STRICT_ALIGNMENT"
>> >  {
>> >/* This pattern is not permitted to fail during expansion: if both
>> > arguments @@ -73,6 +72,16 @@ (define_insn
>> "aarch64_simd_dup"
>> >[(set_attr "type" "neon_dup, neon_from_gp")]
>> >  )
>> >
>> > +(define_insn "aarch64_simd_dupv2hf"
>> > +  [(set (match_operand:V2HF 0 "register_operand" "=w")
>> > +  (vec_duplicate:V2HF
>> > +(match_operand:HF 1 "register_operand" "0")))]
>> 
>> Seems like this should be "w" rather than "0", since SLI is a two-register
>> instruction.
>
> Yes, but for a dup it's only valid when the same register is used,
> i.e. it has to write into the original src register.

Ah, right.  In that case it might be better to use %d0 for the source
operand:

  For operands to match in a particular case usually means that they
  are identical-looking RTL expressions.  But in a few special cases
  specific kinds of dissimilarity are allowed.  For example, @code{*x}
  as an input operand will match @code{*x++} as an output operand.
  For proper results in such cases, the output template should always
  use the output-operand's number when printing the operand.
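
A minimal sketch of what the pattern might look like with that suggestion
applied (an assumed respin, not taken from the thread):

(define_insn "aarch64_simd_dupv2hf"
  [(set (match_operand:V2HF 0 "register_operand" "=w")
	(vec_duplicate:V2HF
	  (match_operand:HF 1 "register_operand" "0")))]
  "TARGET_SIMD"
  ;; Operand 1 is constrained to match operand 0, so the template
  ;; prints the output operand's number for both, as quoted above.
  "sli\\t%d0, %d0, 16"
  [(set_attr "type" "neon_shift_imm")]
)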

Thanks,
Richard


RE: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.

2022-12-06 Thread Tamar Christina via Gcc-patches

Re: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.

2022-12-06 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi,
>
>
>> This name might cause confusion with the SVE iterators, where FULL means
>> "every bit of the register is used".  How about something like VMOVE
>> instead?
>> 
>> With this change, I guess VALL_F16 represents "The set of all modes for
>> which the vld1 intrinsics are provided" and VMOVE or whatever is "All
>> Advanced SIMD modes suitable for moving, loading, and storing".
>> That is, VMOVE extends VALL_F16 with modes that are not manifested via
>> intrinsics.
>> 
>
> Done.
>
>> Where is the 2h used, and is it valid syntax in that context?
>> 
>> Same for later instances of 2h.
>
> They are, but they weren't meant to be in this patch.  They belong in a
> separate FP16 series that I won't get to finish for GCC 13 due to not
> being able to finish writing all the tests.  I have moved them to that
> patch series though.
>
> While the addp patch series has been killed, this patch is still good
> standalone and improves codegen as shown in the updated testcase.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-simd.md (*aarch64_simd_movv2hf): New.
>   (mov, movmisalign, aarch64_dup_lane,
>   aarch64_store_lane0, aarch64_simd_vec_set,
>   @aarch64_simd_vec_copy_lane, vec_set,
>   reduc__scal_, reduc__scal_,
>   aarch64_reduc__internal, aarch64_get_lane,
>   vec_init, vec_extract): Support V2HF.
>   (aarch64_simd_dupv2hf): New.
>   * config/aarch64/aarch64.cc (aarch64_classify_vector_mode):
>   Add E_V2HFmode.
>   * config/aarch64/iterators.md (VHSDF_P): New.
>   (V2F, VMOVE, nunits, Vtype, Vmtype, Vetype, stype, VEL,
>   Vel, q, vp): Add V2HF.
>   * config/arm/types.md (neon_fp_reduc_add_h): New.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/sve/slp_1.c: Update testcase.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> f4152160084d6b6f34bd69f0ba6386c1ab50f77e..487a31010245accec28e779661e6c2d578fca4b7
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -19,10 +19,10 @@
>  ;; <http://www.gnu.org/licenses/>.
>  
>  (define_expand "mov"
> -  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
> - (match_operand:VALL_F16 1 "general_operand"))]
> +  [(set (match_operand:VMOVE 0 "nonimmediate_operand")
> + (match_operand:VMOVE 1 "general_operand"))]
>"TARGET_SIMD"
> -  "
> +{
>/* Force the operand into a register if it is not an
>   immediate whose use can be replaced with xzr.
>   If the mode is 16 bytes wide, then we will be doing
> @@ -46,12 +46,11 @@ (define_expand "mov"
>aarch64_expand_vector_init (operands[0], operands[1]);
>DONE;
>  }
> -  "
> -)
> +})
>  
>  (define_expand "movmisalign"
> -  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
> -(match_operand:VALL_F16 1 "general_operand"))]
> +  [(set (match_operand:VMOVE 0 "nonimmediate_operand")
> +(match_operand:VMOVE 1 "general_operand"))]
>"TARGET_SIMD && !STRICT_ALIGNMENT"
>  {
>/* This pattern is not permitted to fail during expansion: if both 
> arguments
> @@ -73,6 +72,16 @@ (define_insn "aarch64_simd_dup"
>[(set_attr "type" "neon_dup, neon_from_gp")]
>  )
>  
> +(define_insn "aarch64_simd_dupv2hf"
> +  [(set (match_operand:V2HF 0 "register_operand" "=w")
> + (vec_duplicate:V2HF
> +   (match_operand:HF 1 "register_operand" "0")))]

Seems like this should be "w" rather than "0", since SLI is a
two-register instruction.
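
For context, a rough sketch of the SLI semantics at issue (assumed
behaviour, not text from the thread):

;;   sli  d0, d1, #16   =>  d0 = (d1 << 16) | (d0 & 0xffff)
;; The shifted source is inserted while the destination's bits below the
;; shift amount are kept.  With the "0" constraint forcing the source to
;; be the same register, the low 16-bit lane is copied into bits 31:16:
;;   sli  d0, d0, #16   =>  lane 1 = lane 0, lane 0 unchanged
;; which is what lets the instruction act as a two-lane duplicate.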

> +  "TARGET_SIMD"
> +  "@
> +   sli\\t%d0, %d1, 16"
> +  [(set_attr "type" "neon_shift_imm")]
> +)
> +
>  (define_insn "aarch64_simd_dup"
>[(set (match_operand:VDQF_F16 0 "register_operand" "=w,w")
>   (vec_duplicate:VDQF_F16
> @@ -85,10 +94,10 @@ (define_insn "aarch64_simd_dup"
>  )
>  
>  (define_insn "aarch64_dup_lane"
> -  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
> - (vec_duplicate:VALL_F16
> +  [(set (match_operand:VMOVE 0 "register_operand" "=w")
> + (vec_duplicate:VMOVE
> (vec_select:
> - (match_operand:VALL_F16 1 "register_operand" "w")
> + (match_operand:VMOVE 1 "register_operand" "w")
>   (parallel [(match_operand:SI 2 "immediate_operand" "i")])
>)))]
>"TARGET_SIMD"
> @@ -142,6 +151,29 @@ (define_insn "*aarch64_simd_mov"
>mov_reg, neon_move")]
>  )
>  
> +(define_insn "*aarch64_simd_movv2hf"
> +  [(set (match_operand:V2HF 0 "nonimmediate_operand"
> + "=w, m,  m,  w, ?r, ?w, ?r, w, w")
> + (match_operand:V2HF 1 "general_operand"
> + "m,  Dz, w,  w,  w,  r,  r, Dz, Dn"))]
> +  "TARGET_SIMD_F16INST
> +   && (register_operand (operands[0], V2HFmode)
> +   || aarch64_simd_reg_or_zero (operands[1], V2HFmode))"
> +   "@
> +ldr\\t%s0, %1
> +str\\twzr, %0
> +

RE: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.

2022-11-29 Thread Tamar Christina via Gcc-patches
Ping x3

> -Original Message-
> From: Tamar Christina
> Sent: Tuesday, November 22, 2022 4:01 PM
> To: Tamar Christina ; Richard Sandiford
> 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: RE: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.
> 
> Ping
> 
> > -Original Message-
> > From: Gcc-patches  > bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Tamar
> > Christina via Gcc-patches
> > Sent: Friday, November 11, 2022 2:40 PM
> > To: Richard Sandiford 
> > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> > ; Marcus Shawcroft
> > ; Kyrylo Tkachov
> 
> > Subject: RE: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.

RE: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.

2022-11-22 Thread Tamar Christina via Gcc-patches
Ping

> -Original Message-
> From: Gcc-patches  bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Tamar
> Christina via Gcc-patches
> Sent: Friday, November 11, 2022 2:40 PM
> To: Richard Sandiford 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: RE: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.

RE: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.

2022-11-11 Thread Tamar Christina via Gcc-patches
Hi,


> This name might cause confusion with the SVE iterators, where FULL means
> "every bit of the register is used".  How about something like VMOVE
> instead?
> 
> With this change, I guess VALL_F16 represents "The set of all modes for
> which the vld1 intrinsics are provided" and VMOVE or whatever is "All
> Advanced SIMD modes suitable for moving, loading, and storing".
> That is, VMOVE extends VALL_F16 with modes that are not manifested via
> intrinsics.
> 

Done.

> Where is the 2h used, and is it valid syntax in that context?
> 
> Same for later instances of 2h.

They are, but they weren't meant to be in this patch.  They belong in a
separate FP16 series that I won't get to finish for GCC 13 due to not
being able to finish writing all the tests.  I have moved them to that
patch series though.

While the addp patch series has been killed, this patch is still good
standalone and improves codegen as shown in the updated testcase.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (*aarch64_simd_movv2hf): New.
(mov, movmisalign, aarch64_dup_lane,
aarch64_store_lane0, aarch64_simd_vec_set,
@aarch64_simd_vec_copy_lane, vec_set,
reduc__scal_, reduc__scal_,
aarch64_reduc__internal, aarch64_get_lane,
vec_init, vec_extract): Support V2HF.
(aarch64_simd_dupv2hf): New.
* config/aarch64/aarch64.cc (aarch64_classify_vector_mode):
Add E_V2HFmode.
* config/aarch64/iterators.md (VHSDF_P): New.
(V2F, VMOVE, nunits, Vtype, Vmtype, Vetype, stype, VEL,
Vel, q, vp): Add V2HF.
* config/arm/types.md (neon_fp_reduc_add_h): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/slp_1.c: Update testcase.

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
f4152160084d6b6f34bd69f0ba6386c1ab50f77e..487a31010245accec28e779661e6c2d578fca4b7
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -19,10 +19,10 @@
 ;; <http://www.gnu.org/licenses/>.
 
 (define_expand "mov"
-  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
-   (match_operand:VALL_F16 1 "general_operand"))]
+  [(set (match_operand:VMOVE 0 "nonimmediate_operand")
+   (match_operand:VMOVE 1 "general_operand"))]
   "TARGET_SIMD"
-  "
+{
   /* Force the operand into a register if it is not an
  immediate whose use can be replaced with xzr.
  If the mode is 16 bytes wide, then we will be doing
@@ -46,12 +46,11 @@ (define_expand "mov"
   aarch64_expand_vector_init (operands[0], operands[1]);
   DONE;
 }
-  "
-)
+})
 
 (define_expand "movmisalign"
-  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
-(match_operand:VALL_F16 1 "general_operand"))]
+  [(set (match_operand:VMOVE 0 "nonimmediate_operand")
+(match_operand:VMOVE 1 "general_operand"))]
   "TARGET_SIMD && !STRICT_ALIGNMENT"
 {
   /* This pattern is not permitted to fail during expansion: if both arguments
@@ -73,6 +72,16 @@ (define_insn "aarch64_simd_dup"
   [(set_attr "type" "neon_dup, neon_from_gp")]
 )
 
+(define_insn "aarch64_simd_dupv2hf"
+  [(set (match_operand:V2HF 0 "register_operand" "=w")
+   (vec_duplicate:V2HF
+ (match_operand:HF 1 "register_operand" "0")))]
+  "TARGET_SIMD"
+  "@
+   sli\\t%d0, %d1, 16"
+  [(set_attr "type" "neon_shift_imm")]
+)
+
 (define_insn "aarch64_simd_dup"
   [(set (match_operand:VDQF_F16 0 "register_operand" "=w,w")
(vec_duplicate:VDQF_F16
@@ -85,10 +94,10 @@ (define_insn "aarch64_simd_dup"
 )
 
 (define_insn "aarch64_dup_lane"
-  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
-   (vec_duplicate:VALL_F16
+  [(set (match_operand:VMOVE 0 "register_operand" "=w")
+   (vec_duplicate:VMOVE
  (vec_select:
-   (match_operand:VALL_F16 1 "register_operand" "w")
+   (match_operand:VMOVE 1 "register_operand" "w")
(parallel [(match_operand:SI 2 "immediate_operand" "i")])
   )))]
   "TARGET_SIMD"
@@ -142,6 +151,29 @@ (define_insn "*aarch64_simd_mov"
 mov_reg, neon_move")]
 )
 
+(define_insn "*aarch64_simd_movv2hf"
+  [(set (match_operand:V2HF 0 "nonimmediate_operand"
+   "=w, m,  m,  w, ?r, ?w, ?r, w, w")
+   (match_operand:V2HF 1 "general_operand"
+   "m,  Dz, w,  w,  w,  r,  r, Dz, Dn"))]
+  "TARGET_SIMD_F16INST
+   && (register_operand (operands[0], V2HFmode)
+   || aarch64_simd_reg_or_zero (operands[1], V2HFmode))"
+   "@
+ldr\\t%s0, %1
+str\\twzr, %0
+str\\t%s1, %0
+mov\\t%0.2s[0], %1.2s[0]
+umov\\t%w0, %1.s[0]
+fmov\\t%s0, %1
+mov\\t%0, %1
+movi\\t%d0, 0
+* return aarch64_output_simd_mov_immediate (operands[1], 32);"
+  [(set_attr "type" "neon_load1_1reg, store_8, neon_store1_1reg,\
+neon_logic, neon_to_gp, f_mcr,\
+   

RE: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.

2022-11-01 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Sandiford 
> Sent: Tuesday, November 1, 2022 2:59 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: Re: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.
> 
> Tamar Christina  writes:
> > Hi All,
> >
> > The backend has an existing V2HFmode that is used by pairwise operations.
> > This mode was however never made fully functional.  Amongst other
> > things it was never declared as a vector type, which made it unusable
> > from the mid-end.
> >
> > It's also lacking an implementation for loads/stores, so reload ICEs if
> > this mode is ever used.  This finishes the implementation by providing
> > the above.
> >
> > Note that I have created a new iterator VHSDF_P instead of extending
> > VHSDF because the previous iterator is used in far more things than
> > just loads/stores.
> >
> > It's also used, for instance, in intrinsics, and extending it would
> > force me to provide support for mangling the type while we never
> > expose it through intrinsics.
> >
> > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-simd.md (*aarch64_simd_movv2hf): New.
> > (mov, movmisalign, aarch64_dup_lane,
> > aarch64_store_lane0, aarch64_simd_vec_set,
> > @aarch64_simd_vec_copy_lane, vec_set,
> > reduc__scal_, reduc__scal_,
> > aarch64_reduc__internal,
> aarch64_get_lane,
> > vec_init, vec_extract): Support V2HF.
> > * config/aarch64/aarch64.cc (aarch64_classify_vector_mode):
> > Add E_V2HFmode.
> > * config/aarch64/iterators.md (VHSDF_P): New.
> > (V2F, VALL_F16_FULL, nunits, Vtype, Vmtype, Vetype, stype, VEL,
> > Vel, q, vp): Add V2HF.
> > * config/arm/types.md (neon_fp_reduc_add_h): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/sve/slp_1.c: Update testcase.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/config/aarch64/aarch64-simd.md
> > b/gcc/config/aarch64/aarch64-simd.md
> > index
> >
> 25aed74f8cf939562ed65a578fe32ca76605b58a..93a2888f567460ad10ec050ea7
> d4
> > f701df4729d1 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -19,10 +19,10 @@
> >  ;; <http://www.gnu.org/licenses/>.
> >
> >  (define_expand "mov"
> > -  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
> > -   (match_operand:VALL_F16 1 "general_operand"))]
> > +  [(set (match_operand:VALL_F16_FULL 0 "nonimmediate_operand")
> > +   (match_operand:VALL_F16_FULL 1 "general_operand"))]
> >"TARGET_SIMD"
> > -  "
> > +{
> >/* Force the operand into a register if it is not an
> >   immediate whose use can be replaced with xzr.
> >   If the mode is 16 bytes wide, then we will be doing @@ -46,12
> > +46,11 @@ (define_expand "mov"
> >aarch64_expand_vector_init (operands[0], operands[1]);
> >DONE;
> >  }
> > -  "
> > -)
> > +})
> >
> >  (define_expand "movmisalign"
> > -  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
> > -(match_operand:VALL_F16 1 "general_operand"))]
> > +  [(set (match_operand:VALL_F16_FULL 0 "nonimmediate_operand")
> > +(match_operand:VALL_F16_FULL 1 "general_operand"))]
> >"TARGET_SIMD && !STRICT_ALIGNMENT"
> >  {
> >/* This pattern is not permitted to fail during expansion: if both
> > arguments @@ -85,10 +84,10 @@ (define_insn
> "aarch64_simd_dup"
> >  )
> >
> >  (define_insn "aarch64_dup_lane"
> > -  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
> > -   (vec_duplicate:VALL_F16
> > +  [(set (match_operand:VALL_F16_FULL 0 "register_operand" "=w")
> > +   (vec_duplicate:VALL_F16_FULL
> >   (vec_select:
> > -   (match_operand:VALL_F16 1 "register_operand" "w")
> > +   (match_operand:VALL_F16_FULL 1 "register_operand" "w")
> > (parallel [(match_operand:SI 2 "immediate_operand" "i")])
> >)))]
> >"TARGET_SIMD"
> > @@ -142,6 +141,29 @@ (define_insn

Re: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.

2022-11-01 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi All,
>
> The backend has an existing V2HFmode that is used by pairwise operations.
> This mode was however never made fully functional.  Amongst other things
> it was never declared as a vector type, which made it unusable from the
> mid-end.
>
> It's also lacking an implementation for loads/stores, so reload ICEs if
> this mode is ever used.  This finishes the implementation by providing
> the above.
>
> Note that I have created a new iterator VHSDF_P instead of extending
> VHSDF because the previous iterator is used in far more things than
> just loads/stores.
>
> It's also used, for instance, in intrinsics, and extending it would
> force me to provide support for mangling the type while we never expose
> it through intrinsics.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-simd.md (*aarch64_simd_movv2hf): New.
>   (mov, movmisalign, aarch64_dup_lane,
>   aarch64_store_lane0, aarch64_simd_vec_set,
>   @aarch64_simd_vec_copy_lane, vec_set,
>   reduc__scal_, reduc__scal_,
>   aarch64_reduc__internal, aarch64_get_lane,
>   vec_init, vec_extract): Support V2HF.
>   * config/aarch64/aarch64.cc (aarch64_classify_vector_mode):
>   Add E_V2HFmode.
>   * config/aarch64/iterators.md (VHSDF_P): New.
>   (V2F, VALL_F16_FULL, nunits, Vtype, Vmtype, Vetype, stype, VEL,
>   Vel, q, vp): Add V2HF.
>   * config/arm/types.md (neon_fp_reduc_add_h): New.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/sve/slp_1.c: Update testcase.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> 25aed74f8cf939562ed65a578fe32ca76605b58a..93a2888f567460ad10ec050ea7d4f701df4729d1
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -19,10 +19,10 @@
>  ;; <http://www.gnu.org/licenses/>.
>  
>  (define_expand "mov"
> -  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
> - (match_operand:VALL_F16 1 "general_operand"))]
> +  [(set (match_operand:VALL_F16_FULL 0 "nonimmediate_operand")
> + (match_operand:VALL_F16_FULL 1 "general_operand"))]
>"TARGET_SIMD"
> -  "
> +{
>/* Force the operand into a register if it is not an
>   immediate whose use can be replaced with xzr.
>   If the mode is 16 bytes wide, then we will be doing
> @@ -46,12 +46,11 @@ (define_expand "mov"
>aarch64_expand_vector_init (operands[0], operands[1]);
>DONE;
>  }
> -  "
> -)
> +})
>  
>  (define_expand "movmisalign"
> -  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
> -(match_operand:VALL_F16 1 "general_operand"))]
> +  [(set (match_operand:VALL_F16_FULL 0 "nonimmediate_operand")
> +(match_operand:VALL_F16_FULL 1 "general_operand"))]
>"TARGET_SIMD && !STRICT_ALIGNMENT"
>  {
>/* This pattern is not permitted to fail during expansion: if both 
> arguments
> @@ -85,10 +84,10 @@ (define_insn "aarch64_simd_dup"
>  )
>  
>  (define_insn "aarch64_dup_lane"
> -  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
> - (vec_duplicate:VALL_F16
> +  [(set (match_operand:VALL_F16_FULL 0 "register_operand" "=w")
> + (vec_duplicate:VALL_F16_FULL
> (vec_select:
> - (match_operand:VALL_F16 1 "register_operand" "w")
> + (match_operand:VALL_F16_FULL 1 "register_operand" "w")
>   (parallel [(match_operand:SI 2 "immediate_operand" "i")])
>)))]
>"TARGET_SIMD"
> @@ -142,6 +141,29 @@ (define_insn "*aarch64_simd_mov"
>mov_reg, neon_move")]
>  )
>  
> +(define_insn "*aarch64_simd_movv2hf"
> +  [(set (match_operand:V2HF 0 "nonimmediate_operand"
> + "=w, m,  m,  w, ?r, ?w, ?r, w, w")
> + (match_operand:V2HF 1 "general_operand"
> + "m,  Dz, w,  w,  w,  r,  r, Dz, Dn"))]
> +  "TARGET_SIMD_F16INST
> +   && (register_operand (operands[0], V2HFmode)
> +   || aarch64_simd_reg_or_zero (operands[1], V2HFmode))"
> +   "@
> +ldr\\t%s0, %1
> +str\\twzr, %0
> +str\\t%s1, %0
> +mov\\t%0.2s[0], %1.2s[0]
> +umov\\t%w0, %1.s[0]
> +fmov\\t%s0, %1
> +mov\\t%0, %1
> +movi\\t%d0, 0
> +* return aarch64_output_simd_mov_immediate (operands[1], 32);"
> +  [(set_attr "type" "neon_load1_1reg, store_8, neon_store1_1reg,\
> +  neon_logic, neon_to_gp, f_mcr,\
> +  mov_reg, neon_move, neon_move")]
> +)
> +
>  (define_insn "*aarch64_simd_mov"
>[(set (match_operand:VQMOV 0 "nonimmediate_operand"
>   "=w, Umn,  m,  w, ?r, ?w, ?r, w")
> @@ -182,7 +204,7 @@ (define_insn "*aarch64_simd_mov"
>  
>  (define_insn "aarch64_store_lane0"
>[(set (match_operand: 0 "memory_operand" "=m")
> - (vec_select: (match_operand:VALL_F16 1 "register_operand" "w")
> + (vec_select: (match_operand:VALL_F16_FULL 1 "register_operand" "w")
>   

[PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.

2022-10-31 Thread Tamar Christina via Gcc-patches
Hi All,

The backend has an existing V2HFmode that is used by pairwise operations.
This mode was however never made fully functional.  Amongst other things it
was never declared as a vector type, which made it unusable from the mid-end.

It's also lacking an implementation for loads/stores, so reload ICEs if this
mode is ever used.  This finishes the implementation by providing the above.

Note that I have created a new iterator VHSDF_P instead of extending VHSDF,
because the previous iterator is used in far more things than just
loads/stores.

It's also used, for instance, in intrinsics, and extending it would force me
to provide support for mangling the type while we never expose it through
intrinsics.
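
For illustration, a rough sketch of the iterator split being described
(the VHSDF entries follow its upstream definition; the V2HF guard is an
assumption):

(define_mode_iterator VHSDF [(V4HF "TARGET_SIMD_F16INST")
			     (V8HF "TARGET_SIMD_F16INST")
			     V2SF V4SF V2DF])

;; New: the same modes plus V2HF, so that moves, loads and stores can be
;; supported without exposing V2HF through the intrinsics machinery.
(define_mode_iterator VHSDF_P [(V4HF "TARGET_SIMD_F16INST")
			       (V8HF "TARGET_SIMD_F16INST")
			       (V2HF "TARGET_SIMD_F16INST")
			       V2SF V4SF V2DF])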

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (*aarch64_simd_movv2hf): New.
(mov, movmisalign, aarch64_dup_lane,
aarch64_store_lane0, aarch64_simd_vec_set,
@aarch64_simd_vec_copy_lane, vec_set,
reduc__scal_, reduc__scal_,
aarch64_reduc__internal, aarch64_get_lane,
vec_init, vec_extract): Support V2HF.
* config/aarch64/aarch64.cc (aarch64_classify_vector_mode):
Add E_V2HFmode.
* config/aarch64/iterators.md (VHSDF_P): New.
(V2F, VALL_F16_FULL, nunits, Vtype, Vmtype, Vetype, stype, VEL,
Vel, q, vp): Add V2HF.
* config/arm/types.md (neon_fp_reduc_add_h): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/slp_1.c: Update testcase.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
25aed74f8cf939562ed65a578fe32ca76605b58a..93a2888f567460ad10ec050ea7d4f701df4729d1
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -19,10 +19,10 @@
 ;; <http://www.gnu.org/licenses/>.
 
 (define_expand "mov"
-  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
-   (match_operand:VALL_F16 1 "general_operand"))]
+  [(set (match_operand:VALL_F16_FULL 0 "nonimmediate_operand")
+   (match_operand:VALL_F16_FULL 1 "general_operand"))]
   "TARGET_SIMD"
-  "
+{
   /* Force the operand into a register if it is not an
  immediate whose use can be replaced with xzr.
  If the mode is 16 bytes wide, then we will be doing
@@ -46,12 +46,11 @@ (define_expand "mov"
   aarch64_expand_vector_init (operands[0], operands[1]);
   DONE;
 }
-  "
-)
+})
 
 (define_expand "movmisalign"
-  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
-(match_operand:VALL_F16 1 "general_operand"))]
+  [(set (match_operand:VALL_F16_FULL 0 "nonimmediate_operand")
+(match_operand:VALL_F16_FULL 1 "general_operand"))]
   "TARGET_SIMD && !STRICT_ALIGNMENT"
 {
   /* This pattern is not permitted to fail during expansion: if both arguments
@@ -85,10 +84,10 @@ (define_insn "aarch64_simd_dup"
 )
 
 (define_insn "aarch64_dup_lane"
-  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
-   (vec_duplicate:VALL_F16
+  [(set (match_operand:VALL_F16_FULL 0 "register_operand" "=w")
+   (vec_duplicate:VALL_F16_FULL
  (vec_select:
-   (match_operand:VALL_F16 1 "register_operand" "w")
+   (match_operand:VALL_F16_FULL 1 "register_operand" "w")
(parallel [(match_operand:SI 2 "immediate_operand" "i")])
   )))]
   "TARGET_SIMD"
@@ -142,6 +141,29 @@ (define_insn "*aarch64_simd_mov"
 mov_reg, neon_move")]
 )
 
+(define_insn "*aarch64_simd_movv2hf"
+  [(set (match_operand:V2HF 0 "nonimmediate_operand"
+   "=w, m,  m,  w, ?r, ?w, ?r, w, w")
+   (match_operand:V2HF 1 "general_operand"
+   "m,  Dz, w,  w,  w,  r,  r, Dz, Dn"))]
+  "TARGET_SIMD_F16INST
+   && (register_operand (operands[0], V2HFmode)
+   || aarch64_simd_reg_or_zero (operands[1], V2HFmode))"
+   "@
+ldr\\t%s0, %1
+str\\twzr, %0
+str\\t%s1, %0
+mov\\t%0.2s[0], %1.2s[0]
+umov\\t%w0, %1.s[0]
+fmov\\t%s0, %1
+mov\\t%0, %1
+movi\\t%d0, 0
+* return aarch64_output_simd_mov_immediate (operands[1], 32);"
+  [(set_attr "type" "neon_load1_1reg, store_8, neon_store1_1reg,\
+neon_logic, neon_to_gp, f_mcr,\
+mov_reg, neon_move, neon_move")]
+)
+
 (define_insn "*aarch64_simd_mov"
   [(set (match_operand:VQMOV 0 "nonimmediate_operand"
"=w, Umn,  m,  w, ?r, ?w, ?r, w")
@@ -182,7 +204,7 @@ (define_insn "*aarch64_simd_mov"
 
 (define_insn "aarch64_store_lane0"
   [(set (match_operand: 0 "memory_operand" "=m")
-   (vec_select: (match_operand:VALL_F16 1 "register_operand" "w")
+   (vec_select: (match_operand:VALL_F16_FULL 1 "register_operand" "w")
(parallel [(match_operand 2 "const_int_operand" 
"n")])))]
   "TARGET_SIMD
&& ENDIAN_LANE_N (, INTVAL (operands[2])) == 0"
@@ -1035,11 +1057,11 @@ (define_insn "one_cmpl2"
 )
 
 (define_insn