Re: [PATCH][rs6000] avoid using unaligned vsx or lxvd2x/stxvd2x for memcpy/memmove inline expansion

2019-01-16 Thread Segher Boessenkool
On Mon, Jan 14, 2019 at 12:49:33PM -0600, Aaron Sawdey wrote:
> The patch for this was committed to trunk as 267562 (see below). Is this also 
> ok for backport to 8?

Yes please.  Thanks!


Segher


> On 12/20/18 5:44 PM, Segher Boessenkool wrote:
> > On Thu, Dec 20, 2018 at 05:34:54PM -0600, Aaron Sawdey wrote:
> >> On 12/20/18 3:51 AM, Segher Boessenkool wrote:
> >>> On Wed, Dec 19, 2018 at 01:53:05PM -0600, Aaron Sawdey wrote:
>  Because of POWER9 dd2.1 issues with certain unaligned vsx instructions
>  to cache inhibited memory, here is a patch that keeps memmove (and memcpy)
>  inline expansion from doing unaligned vector or using vector load/store
>  other than lvx/stvx. More description of the issue is here:
> 
>  https://patchwork.ozlabs.org/patch/814059/
> 
>  OK for trunk if bootstrap/regtest ok?
> >>>
> >>> Okay, but see below.
> >>>
> >> [snip]
> >>>
> >>> This is extraordinarily clumsy :-)  Maybe something like:
> >>>
> >>> static rtx
> >>> gen_lvx_v4si_move (rtx dest, rtx src)
> >>> {
> >>>   gcc_assert (!(MEM_P (dest) && MEM_P (src)));
> >>>   gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
> >>>   if (MEM_P (dest))
> >>>     return gen_altivec_stvx_v4si_internal (dest, src);
> >>>   else if (MEM_P (src))
> >>>     return gen_altivec_lvx_v4si_internal (dest, src);
> >>>   else
> >>>     gcc_unreachable ();
> >>> }
> >>>
> >>> (Or do you allow VOIDmode for src as well?)  Anyway, at least get rid of
> >>> the useless extra variable.
> >>
> >> I think this should be better:
> > 
> > The gcc_unreachable at the end catches the non-mem to non-mem case.
> > 
> >> static rtx
> >> gen_lvx_v4si_move (rtx dest, rtx src)
> >> {
> >>   gcc_assert ((MEM_P (dest) && !MEM_P (src)) || (MEM_P (src) && !MEM_P (dest)));
> > 
> > But if you prefer this, how about
> > 
> > {
> >   gcc_assert (MEM_P (dest) ^ MEM_P (src));
> >   gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
> > 
> >   if (MEM_P (dest))
> >     return gen_altivec_stvx_v4si_internal (dest, src);
> >   else
> >     return gen_altivec_lvx_v4si_internal (dest, src);
> > }
> > 
> > :-)
> > 
> > 
> > Segher
> > 
> 
> 2019-01-03  Aaron Sawdey  
> 
> * config/rs6000/rs6000-string.c (expand_block_move): Don't use
> unaligned vsx and avoid lxvd2x/stxvd2x.
> (gen_lvx_v4si_move): New function.
> 
> 
> Index: gcc/config/rs6000/rs6000-string.c
> ===================================================================
> --- gcc/config/rs6000/rs6000-string.c (revision 267299)
> +++ gcc/config/rs6000/rs6000-string.c (working copy)
> @@ -2669,6 +2669,25 @@
>return true;
>  }
> 
> +/* Generate loads and stores for a move of v4si mode using lvx/stvx.
> +   This uses altivec_{l,st}vx_<mode>_internal which use unspecs to
> +   keep combine from changing what instruction gets used.
> +
> +   DEST is the destination for the data.
> +   SRC is the source of the data for the move.  */
> +
> +static rtx
> +gen_lvx_v4si_move (rtx dest, rtx src)
> +{
> +  gcc_assert (MEM_P (dest) ^ MEM_P (src));
> +  gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
> +
> +  if (MEM_P (dest))
> +    return gen_altivec_stvx_v4si_internal (dest, src);
> +  else
> +    return gen_altivec_lvx_v4si_internal (dest, src);
> +}
> +
>  /* Expand a block move operation, and return 1 if successful.  Return 0
> if we should let the compiler generate normal code.
> 
> @@ -2721,11 +2740,11 @@
> 
>/* Altivec first, since it will be faster than a string move
>when it applies, and usually not significantly larger.  */
> -  if (TARGET_ALTIVEC && bytes >= 16 && (TARGET_EFFICIENT_UNALIGNED_VSX || align >= 128))
> +  if (TARGET_ALTIVEC && bytes >= 16 && align >= 128)
>   {
> move_bytes = 16;
> mode = V4SImode;
> -   gen_func.mov = gen_movv4si;
> +   gen_func.mov = gen_lvx_v4si_move;
>   }
>else if (bytes >= 8 && TARGET_POWERPC64
>  && (align >= 64 || !STRICT_ALIGNMENT))
> 
> 
> 
> -- 
> Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
> 050-2/C113  (507) 253-7520 home: 507/263-0782
> IBM Linux Technology Center - PPC Toolchain


Re: [PATCH][rs6000] avoid using unaligned vsx or lxvd2x/stxvd2x for memcpy/memmove inline expansion

2019-01-14 Thread Aaron Sawdey
The patch for this was committed to trunk as 267562 (see below). Is this also 
ok for backport to 8?

Thanks,
   Aaron

On 12/20/18 5:44 PM, Segher Boessenkool wrote:
> On Thu, Dec 20, 2018 at 05:34:54PM -0600, Aaron Sawdey wrote:
>> On 12/20/18 3:51 AM, Segher Boessenkool wrote:
>>> On Wed, Dec 19, 2018 at 01:53:05PM -0600, Aaron Sawdey wrote:
 Because of POWER9 dd2.1 issues with certain unaligned vsx instructions
 to cache inhibited memory, here is a patch that keeps memmove (and memcpy)
 inline expansion from doing unaligned vector or using vector load/store
 other than lvx/stvx. More description of the issue is here:

 https://patchwork.ozlabs.org/patch/814059/

 OK for trunk if bootstrap/regtest ok?
>>>
>>> Okay, but see below.
>>>
>> [snip]
>>>
>>> This is extraordinarily clumsy :-)  Maybe something like:
>>>
>>> static rtx
>>> gen_lvx_v4si_move (rtx dest, rtx src)
>>> {
>>>   gcc_assert (!(MEM_P (dest) && MEM_P (src)));
>>>   gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
>>>   if (MEM_P (dest))
>>>     return gen_altivec_stvx_v4si_internal (dest, src);
>>>   else if (MEM_P (src))
>>>     return gen_altivec_lvx_v4si_internal (dest, src);
>>>   else
>>>     gcc_unreachable ();
>>> }
>>>
>>> (Or do you allow VOIDmode for src as well?)  Anyway, at least get rid of
>>> the useless extra variable.
>>
>> I think this should be better:
> 
> The gcc_unreachable at the end catches the non-mem to non-mem case.
> 
>> static rtx
>> gen_lvx_v4si_move (rtx dest, rtx src)
>> {
>>   gcc_assert ((MEM_P (dest) && !MEM_P (src)) || (MEM_P (src) && !MEM_P (dest)));
> 
> But if you prefer this, how about
> 
> {
>   gcc_assert (MEM_P (dest) ^ MEM_P (src));
>   gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
> 
>   if (MEM_P (dest))
>     return gen_altivec_stvx_v4si_internal (dest, src);
>   else
>     return gen_altivec_lvx_v4si_internal (dest, src);
> }
> 
> :-)
> 
> 
> Segher
> 

2019-01-03  Aaron Sawdey  

* config/rs6000/rs6000-string.c (expand_block_move): Don't use
unaligned vsx and avoid lxvd2x/stxvd2x.
(gen_lvx_v4si_move): New function.


Index: gcc/config/rs6000/rs6000-string.c
===================================================================
--- gcc/config/rs6000/rs6000-string.c   (revision 267299)
+++ gcc/config/rs6000/rs6000-string.c   (working copy)
@@ -2669,6 +2669,25 @@
   return true;
 }

+/* Generate loads and stores for a move of v4si mode using lvx/stvx.
+   This uses altivec_{l,st}vx_<mode>_internal which use unspecs to
+   keep combine from changing what instruction gets used.
+
+   DEST is the destination for the data.
+   SRC is the source of the data for the move.  */
+
+static rtx
+gen_lvx_v4si_move (rtx dest, rtx src)
+{
+  gcc_assert (MEM_P (dest) ^ MEM_P (src));
+  gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
+
+  if (MEM_P (dest))
+    return gen_altivec_stvx_v4si_internal (dest, src);
+  else
+    return gen_altivec_lvx_v4si_internal (dest, src);
+}
+
 /* Expand a block move operation, and return 1 if successful.  Return 0
if we should let the compiler generate normal code.

@@ -2721,11 +2740,11 @@

   /* Altivec first, since it will be faster than a string move
 when it applies, and usually not significantly larger.  */
-  if (TARGET_ALTIVEC && bytes >= 16 && (TARGET_EFFICIENT_UNALIGNED_VSX || align >= 128))
+  if (TARGET_ALTIVEC && bytes >= 16 && align >= 128)
{
  move_bytes = 16;
  mode = V4SImode;
- gen_func.mov = gen_movv4si;
+ gen_func.mov = gen_lvx_v4si_move;
}
   else if (bytes >= 8 && TARGET_POWERPC64
   && (align >= 64 || !STRICT_ALIGNMENT))



-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain



Re: [PATCH][rs6000] avoid using unaligned vsx or lxvd2x/stxvd2x for memcpy/memmove inline expansion

2018-12-20 Thread Aaron Sawdey
On 12/20/18 5:44 PM, Segher Boessenkool wrote:
> On Thu, Dec 20, 2018 at 05:34:54PM -0600, Aaron Sawdey wrote:
>> On 12/20/18 3:51 AM, Segher Boessenkool wrote:
>>> On Wed, Dec 19, 2018 at 01:53:05PM -0600, Aaron Sawdey wrote:
 Because of POWER9 dd2.1 issues with certain unaligned vsx instructions
 to cache inhibited memory, here is a patch that keeps memmove (and memcpy)
 inline expansion from doing unaligned vector or using vector load/store
 other than lvx/stvx. More description of the issue is here:

 https://patchwork.ozlabs.org/patch/814059/

 OK for trunk if bootstrap/regtest ok?
>>>
>>> Okay, but see below.
>>>
>> [snip]
>>>
>>> This is extraordinarily clumsy :-)  Maybe something like:
>>>
>>> static rtx
>>> gen_lvx_v4si_move (rtx dest, rtx src)
>>> {
>>>   gcc_assert (!(MEM_P (dest) && MEM_P (src)));
>>>   gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
>>>   if (MEM_P (dest))
>>>     return gen_altivec_stvx_v4si_internal (dest, src);
>>>   else if (MEM_P (src))
>>>     return gen_altivec_lvx_v4si_internal (dest, src);
>>>   else
>>>     gcc_unreachable ();
>>> }
>>>
>>> (Or do you allow VOIDmode for src as well?)  Anyway, at least get rid of
>>> the useless extra variable.
>>
>> I think this should be better:
> 
> The gcc_unreachable at the end catches the non-mem to non-mem case.
> 
>> static rtx
>> gen_lvx_v4si_move (rtx dest, rtx src)
>> {
>>   gcc_assert ((MEM_P (dest) && !MEM_P (src)) || (MEM_P (src) && !MEM_P (dest)));
> 
> But if you prefer this, how about
> 
> {
>   gcc_assert (MEM_P (dest) ^ MEM_P (src));
>   gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
> 
>   if (MEM_P (dest))
>     return gen_altivec_stvx_v4si_internal (dest, src);
>   else
>     return gen_altivec_lvx_v4si_internal (dest, src);
> }
> 
> :-)
> 
> 
> Segher
> 

I like that even better, thanks!

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain



Re: [PATCH][rs6000] avoid using unaligned vsx or lxvd2x/stxvd2x for memcpy/memmove inline expansion

2018-12-20 Thread Segher Boessenkool
On Thu, Dec 20, 2018 at 05:34:54PM -0600, Aaron Sawdey wrote:
> On 12/20/18 3:51 AM, Segher Boessenkool wrote:
> > On Wed, Dec 19, 2018 at 01:53:05PM -0600, Aaron Sawdey wrote:
> >> Because of POWER9 dd2.1 issues with certain unaligned vsx instructions
> >> to cache inhibited memory, here is a patch that keeps memmove (and memcpy)
> >> inline expansion from doing unaligned vector or using vector load/store
> >> other than lvx/stvx. More description of the issue is here:
> >>
> >> https://patchwork.ozlabs.org/patch/814059/
> >>
> >> OK for trunk if bootstrap/regtest ok?
> > 
> > Okay, but see below.
> > 
> [snip]
> > 
> > This is extraordinarily clumsy :-)  Maybe something like:
> > 
> > static rtx
> > gen_lvx_v4si_move (rtx dest, rtx src)
> > {
> >   gcc_assert (!(MEM_P (dest) && MEM_P (src)));
> >   gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
> >   if (MEM_P (dest))
> >     return gen_altivec_stvx_v4si_internal (dest, src);
> >   else if (MEM_P (src))
> >     return gen_altivec_lvx_v4si_internal (dest, src);
> >   else
> >     gcc_unreachable ();
> > }
> > 
> > (Or do you allow VOIDmode for src as well?)  Anyway, at least get rid of
> > the useless extra variable.
> 
> I think this should be better:

The gcc_unreachable at the end catches the non-mem to non-mem case.

> static rtx
> gen_lvx_v4si_move (rtx dest, rtx src)
> {
>   gcc_assert ((MEM_P (dest) && !MEM_P (src)) || (MEM_P (src) && !MEM_P (dest)));

But if you prefer this, how about

{
  gcc_assert (MEM_P (dest) ^ MEM_P (src));
  gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);

  if (MEM_P (dest))
    return gen_altivec_stvx_v4si_internal (dest, src);
  else
    return gen_altivec_lvx_v4si_internal (dest, src);
}

:-)


Segher


Re: [PATCH][rs6000] avoid using unaligned vsx or lxvd2x/stxvd2x for memcpy/memmove inline expansion

2018-12-20 Thread Aaron Sawdey
On 12/20/18 3:51 AM, Segher Boessenkool wrote:
> On Wed, Dec 19, 2018 at 01:53:05PM -0600, Aaron Sawdey wrote:
>> Because of POWER9 dd2.1 issues with certain unaligned vsx instructions
>> to cache inhibited memory, here is a patch that keeps memmove (and memcpy)
>> inline expansion from doing unaligned vector or using vector load/store
>> other than lvx/stvx. More description of the issue is here:
>>
>> https://patchwork.ozlabs.org/patch/814059/
>>
>> OK for trunk if bootstrap/regtest ok?
> 
> Okay, but see below.
> 
[snip]
> 
> This is extraordinarily clumsy :-)  Maybe something like:
> 
> static rtx
> gen_lvx_v4si_move (rtx dest, rtx src)
> {
>   gcc_assert (!(MEM_P (dest) && MEM_P (src)));
>   gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
>   if (MEM_P (dest))
>     return gen_altivec_stvx_v4si_internal (dest, src);
>   else if (MEM_P (src))
>     return gen_altivec_lvx_v4si_internal (dest, src);
>   else
>     gcc_unreachable ();
> }
> 
> (Or do you allow VOIDmode for src as well?)  Anyway, at least get rid of
> the useless extra variable.

I think this should be better:

static rtx
gen_lvx_v4si_move (rtx dest, rtx src)
{
  gcc_assert ((MEM_P (dest) && !MEM_P (src)) || (MEM_P (src) && !MEM_P (dest)));
  gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
  if (MEM_P (dest))
    return gen_altivec_stvx_v4si_internal (dest, src);
  else if (MEM_P (src))
    return gen_altivec_lvx_v4si_internal (dest, src);
  gcc_unreachable ();
}

I'll commit after I re-regstrap.

Thanks!
   Aaron

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain



Re: [PATCH][rs6000] avoid using unaligned vsx or lxvd2x/stxvd2x for memcpy/memmove inline expansion

2018-12-20 Thread Segher Boessenkool
On Wed, Dec 19, 2018 at 01:53:05PM -0600, Aaron Sawdey wrote:
> Because of POWER9 dd2.1 issues with certain unaligned vsx instructions
> to cache inhibited memory, here is a patch that keeps memmove (and memcpy)
> inline expansion from doing unaligned vector or using vector load/store
> other than lvx/stvx. More description of the issue is here:
> 
> https://patchwork.ozlabs.org/patch/814059/
> 
> OK for trunk if bootstrap/regtest ok?

Okay, but see below.

> 2018-12-19  Aaron Sawdey  
> 
>   * config/rs6000/rs6000-string.c (expand_block_move): Don't use
>   unaligned vsx and avoid lxvd2x/stxvd2x.
>   (gen_lvx_v4si_move): New function.

> +static rtx
> +gen_lvx_v4si_move (rtx dest, rtx src)
> +{
> +  rtx rv = NULL;
> +  if (MEM_P (dest))
> +    {
> +      gcc_assert (!MEM_P (src));
> +      gcc_assert (GET_MODE (src) == V4SImode);
> +      rv = gen_altivec_stvx_v4si_internal (dest, src);
> +    }
> +  else if (MEM_P (src))
> +    {
> +      gcc_assert (!MEM_P (dest));
> +      gcc_assert (GET_MODE (dest) == V4SImode);
> +      rv = gen_altivec_lvx_v4si_internal (dest, src);
> +    }
> +  else
> +    gcc_unreachable ();
> +
> +  return rv;
> +}

This is extraordinarily clumsy :-)  Maybe something like:

static rtx
gen_lvx_v4si_move (rtx dest, rtx src)
{
  gcc_assert (!(MEM_P (dest) && MEM_P (src)));
  gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
  if (MEM_P (dest))
    return gen_altivec_stvx_v4si_internal (dest, src);
  else if (MEM_P (src))
    return gen_altivec_lvx_v4si_internal (dest, src);
  else
    gcc_unreachable ();
}

(Or do you allow VOIDmode for src as well?)  Anyway, at least get rid of
the useless extra variable.

Thanks!


Segher