Re: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

2018-04-09 Thread Jeff Law
On 04/04/2018 06:43 AM, Peter Bergner wrote:
> On 4/4/18 4:06 AM, Tamar Christina wrote:
>> Now that I know how the loads are done, I have a patch that should be both
>> correct and generate better code in most cases.
>> It just calculates bitsize inside the loop and does the copying in the
>> largest mode possible that's equal to or less than the bits that are to be
>> copied. This avoids the issue with reading too much, honors padding and
>> alignment, and still generates better code in most cases.
>>
>> I'm running the regression tests and should have a final version soon.
> 
> If you give me your patch when it's ready, I'll do bootstrap and regression
> testing on powerpc*-linux and verify it fixes the issues we hit.
Similarly, I've got a Jenkins instance here where I can get a bootstrap
and regression test on the usual targets like aarch64, armv7, i686,
powerpc (32, 64, 64le), s390x, sparc64, x86_64.  But more interestingly
it'll also do a bootstrap test on alpha, hppa, m68k, sh4 and other
oddballs like aarch64-be.

A patch for the tip of the trunk is all I need.

It doesn't run the testsuite, but the ability to bootstrap on the lesser
used targets gives a level of code generator validation that is helpful.

It takes about 24 hours to cycle through everything.

jeff


Re: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

2018-04-04 Thread Peter Bergner
On 4/4/18 4:06 AM, Tamar Christina wrote:
> Now that I know how the loads are done, I have a patch that should be both
> correct and generate better code in most cases.
> It just calculates bitsize inside the loop and does the copying in the
> largest mode possible that's equal to or less than the bits that are to be
> copied. This avoids the issue with reading too much, honors padding and
> alignment, and still generates better code in most cases.
> 
> I'm running the regression tests and should have a final version soon.

If you give me your patch when it's ready, I'll do bootstrap and regression
testing on powerpc*-linux and verify it fixes the issues we hit.

Peter




RE: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

2018-04-04 Thread Tamar Christina


> -Original Message-
> From: Alan Modra <amo...@gmail.com>
> Sent: Tuesday, April 3, 2018 14:16
> To: Richard Biener <rguent...@suse.de>
> Cc: Peter Bergner <berg...@vnet.ibm.com>; Tamar Christina
> <tamar.christ...@arm.com>; gcc-patches@gcc.gnu.org; nd <n...@arm.com>;
> l...@redhat.com; i...@airs.com
> Subject: Re: [PATCH][GCC][mid-end] Allow larger copies when target
> supports unaligned access [Patch (1/2)]
> 
> On Tue, Apr 03, 2018 at 02:30:23PM +0200, Richard Biener wrote:
> > On Tue, 3 Apr 2018, Alan Modra wrote:
> >
> > > On Tue, Apr 03, 2018 at 01:01:23PM +0200, Richard Biener wrote:
> > > > On Fri, 30 Mar 2018, Peter Bergner wrote:
> > > >
> > > > > On 3/29/18 9:35 AM, Alan Modra wrote:
> > > > > > On Thu, Nov 16, 2017 at 03:27:01PM +, Tamar Christina wrote:
> > > > > >> --- a/gcc/expr.c
> > > > > >> +++ b/gcc/expr.c
> > > > > >> @@ -2769,7 +2769,9 @@ copy_blkmode_to_reg (machine_mode
> mode,
> > > > > >> tree src)
> > > > > >>
> > > > > >>n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
> > > > > >>dst_words = XALLOCAVEC (rtx, n_regs);
> > > > > >> -  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)),
> > > > > >> BITS_PER_WORD);
> > > > > >> +  bitsize = BITS_PER_WORD;
> > > > > >> +  if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN
> (TREE_TYPE (src))))
> > > > > >> +bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)),
> > > > > >> + BITS_PER_WORD);
> > > > > >>
> > > > > >>/* Copy the structure BITSIZE bits at a time.  */
> > > > > >>for (bitpos = 0, xbitpos = padding_correction;
> > > > > >
> > > > > > I believe this patch is wrong.  Please revert.  See the PR84762
> testcase.
> > > > > >
> > > > > > There are two problems.  Firstly, if padding_correction is
> > > > > > non-zero, then xbitpos % BITS_PER_WORD is non-zero and in
> > > > > >
> > > > > >   store_bit_field (dst_word, bitsize, xbitpos % BITS_PER_WORD,
> > > > > >0, 0, word_mode,
> > > > > >extract_bit_field (src_word, bitsize,
> > > > > >   bitpos % BITS_PER_WORD, 1,
> > > > > >   NULL_RTX, word_mode,
> word_mode,
> > > > > >   false, NULL),
> > > > > >false);
> > > > > >
> > > > > > the stored bit-field exceeds the destination register size.
> > > > > > You could fix that by making bitsize the gcd of bitsize and
> padding_correction.
> > > > > >
> > > > > > However, that doesn't fix the second problem which is that the
> > > > > > extracted bit-field can exceed the source size.  That will
> > > > > > result in rubbish being read into a register.
> > > > >
> > > > > FYI, I received an automated response saying Tamar is away on
> > > > > vacation with no return date specified.  That means he won't be
> > > > > able to revert the patch.  What do we do now?
> > > >
> > > > The code before the change already looks fishy to me.
> > >
> > > Yes, it is fishy so far as the code in the loop relies on alignment
> > > determining a small enough bitsize.  Adding
> > >
> > >   if (bytes % UNITS_PER_WORD != 0)
> > > bitsize = gcd (bitsize, (bytes % UNITS_PER_WORD) *
> > > BITS_PER_UNIT);
> > >
> > > after bitsize is calculated from alignment would make the code
> > > correct, I believe.  But I think that will fail the testcase Tamar
> > > added.
> > >
> > > >   x = expand_normal (src);
> > > >
> > > > so what do we expect this to expand to in general?  Fortunately it
> > > > seems there are exactly two callers so hopefully a gcc_assert
> > > > (MEM_P (x)) would work?
> > > >
> > > > The fishy part is looking at TYPE_ALIGN (TREE_TYPE (src)).
> > > >
> > > > In case x is a MEM we should use MEM_ALIGN and if not then we
> > > > shouldn't do anything but just use word_mode here.
> > >
>

Re: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

2018-04-03 Thread Alan Modra
On Tue, Apr 03, 2018 at 02:30:23PM +0200, Richard Biener wrote:
> On Tue, 3 Apr 2018, Alan Modra wrote:
> 
> > On Tue, Apr 03, 2018 at 01:01:23PM +0200, Richard Biener wrote:
> > > On Fri, 30 Mar 2018, Peter Bergner wrote:
> > > 
> > > > On 3/29/18 9:35 AM, Alan Modra wrote:
> > > > > On Thu, Nov 16, 2017 at 03:27:01PM +, Tamar Christina wrote:
> > > > >> --- a/gcc/expr.c
> > > > >> +++ b/gcc/expr.c
> > > > >> @@ -2769,7 +2769,9 @@ copy_blkmode_to_reg (machine_mode mode, tree 
> > > > >> src)
> > > > >>  
> > > > >>n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
> > > > >>dst_words = XALLOCAVEC (rtx, n_regs);
> > > > >> -  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
> > > > >> +  bitsize = BITS_PER_WORD;
> > > > >> +  if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN 
> > > > >> (TREE_TYPE (src))))
> > > > >> +bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
> > > > >>  
> > > > >>/* Copy the structure BITSIZE bits at a time.  */
> > > > >>for (bitpos = 0, xbitpos = padding_correction;
> > > > > 
> > > > > I believe this patch is wrong.  Please revert.  See the PR84762 
> > > > > testcase.
> > > > > 
> > > > > There are two problems.  Firstly, if padding_correction is non-zero,
> > > > > then xbitpos % BITS_PER_WORD is non-zero and in
> > > > > 
> > > > >   store_bit_field (dst_word, bitsize, xbitpos % BITS_PER_WORD,
> > > > >  0, 0, word_mode,
> > > > >  extract_bit_field (src_word, bitsize,
> > > > > bitpos % BITS_PER_WORD, 1,
> > > > > NULL_RTX, word_mode, 
> > > > > word_mode,
> > > > > false, NULL),
> > > > >  false);
> > > > > 
> > > > > the stored bit-field exceeds the destination register size.  You could
> > > > > fix that by making bitsize the gcd of bitsize and padding_correction.
> > > > > 
> > > > > However, that doesn't fix the second problem which is that the
> > > > > extracted bit-field can exceed the source size.  That will result in
> > > > > rubbish being read into a register.
> > > > 
> > > > FYI, I received an automated response saying Tamar is away on vacation
> > > > with no return date specified.  That means he won't be able to revert
> > > > the patch.  What do we do now?
> > > 
> > > The code before the change already looks fishy to me.
> > 
> > Yes, it is fishy so far as the code in the loop relies on alignment
> > determining a small enough bitsize.  Adding
> > 
> >   if (bytes % UNITS_PER_WORD != 0)
> > bitsize = gcd (bitsize, (bytes % UNITS_PER_WORD) * BITS_PER_UNIT);
> > 
> > after bitsize is calculated from alignment would make the code
> > correct, I believe.  But I think that will fail the testcase Tamar
> > added.
> > 
> > >   x = expand_normal (src);
> > > 
> > > so what do we expect this to expand to in general?  Fortunately
> > > it seems there are exactly two callers so hopefully a
> > > gcc_assert (MEM_P (x)) would work?
> > > 
> > > The fishy part is looking at TYPE_ALIGN (TREE_TYPE (src)).
> > > 
> > > In case x is a MEM we should use MEM_ALIGN and if not then we
> > > shouldn't do anything but just use word_mode here.
> > 
> > No!  You can't use a bitsize of BITS_PER_WORD here.  See the code I
> > quoted above.
> 
> Ah, so it should possibly use TYPE_SIZE instead of TYPE_ALIGN

Not that either.  We already have a byte count.

> otherwise
> it will handle the over-aligned case (align to 8 bytes, size 4 bytes)
> incorrectly as well?

Yes, that case is mishandled too, even before Tamar's patch.  Peter's
pr85123 testcase with vfloat1 aligned(8) is an example.

If we want correct code, then

  if (bytes % UNITS_PER_WORD != 0)
bitsize = gcd (bitsize, (bytes % UNITS_PER_WORD) * BITS_PER_UNIT);

will fix the algorithm inside the copy_blkmode_to_reg loop.  Otherwise
the loop itself needs changes.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

2018-04-03 Thread Tamar Christina
Hi all,

> I believe this patch is wrong.  Please revert.  See the PR84762 testcase.
>

I'm happy to revert it given the regression and the imminent GCC 8 release,
but looking at the code again it seems to me that the original code may also
not be fully correct. In particular, I'm wondering what happens if you
overalign the struct.

> There are two problems.  Firstly, if padding_correction is non-zero,
> then xbitpos % BITS_PER_WORD is non-zero and in
>
>  store_bit_field (dst_word, bitsize, xbitpos % BITS_PER_WORD,
>   0, 0, word_mode,
>   extract_bit_field (src_word, bitsize,
>  bitpos % BITS_PER_WORD, 1,
>  NULL_RTX, word_mode, word_mode,
>  false, NULL),
>   false);
>
> the stored bit-field exceeds the destination register size.  You could
> fix that by making bitsize the gcd of bitsize and padding_correction.
>
> However, that doesn't fix the second problem which is that the
> extracted bit-field can exceed the source size.  That will result in
> rubbish being read into a register.

Yes, this is because I misunderstood how extract_bit_field works. I had thought
that word_mode was the size of the load and that bitsize and bitpos were just
used to determine the mask and shift. But it seems to do the load based on
bitsize, and word_mode is just the mode you want the result in?

In which case, wouldn't just adjusting bitsize to be the largest size, alignment
permitting, that is smaller than the number of bits we want to copy in each loop
iteration work?

Regards,
Tamar


From: Alan Modra <amo...@gmail.com>
Sent: Tuesday, April 3, 2018 1:25 PM
To: Richard Biener
Cc: Peter Bergner; Tamar Christina; gcc-patches@gcc.gnu.org; nd; 
l...@redhat.com; i...@airs.com
Subject: Re: [PATCH][GCC][mid-end] Allow larger copies when target supports 
unaligned access [Patch (1/2)]

On Tue, Apr 03, 2018 at 01:01:23PM +0200, Richard Biener wrote:
> On Fri, 30 Mar 2018, Peter Bergner wrote:
>
> > On 3/29/18 9:35 AM, Alan Modra wrote:
> > > On Thu, Nov 16, 2017 at 03:27:01PM +, Tamar Christina wrote:
> > >> --- a/gcc/expr.c
> > >> +++ b/gcc/expr.c
> > >> @@ -2769,7 +2769,9 @@ copy_blkmode_to_reg (machine_mode mode, tree src)
> > >>
> > >>n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
> > >>dst_words = XALLOCAVEC (rtx, n_regs);
> > >> -  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
> > >> +  bitsize = BITS_PER_WORD;
> > >> +  if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE 
> > >> (src))))
> > >> +bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
> > >>
> > >>/* Copy the structure BITSIZE bits at a time.  */
> > >>for (bitpos = 0, xbitpos = padding_correction;
> > >
> > > I believe this patch is wrong.  Please revert.  See the PR84762 testcase.
> > >
> > > There are two problems.  Firstly, if padding_correction is non-zero,
> > > then xbitpos % BITS_PER_WORD is non-zero and in
> > >
> > >   store_bit_field (dst_word, bitsize, xbitpos % BITS_PER_WORD,
> > >  0, 0, word_mode,
> > >  extract_bit_field (src_word, bitsize,
> > > bitpos % BITS_PER_WORD, 1,
> > > NULL_RTX, word_mode, word_mode,
> > > false, NULL),
> > >  false);
> > >
> > > the stored bit-field exceeds the destination register size.  You could
> > > fix that by making bitsize the gcd of bitsize and padding_correction.
> > >
> > > However, that doesn't fix the second problem which is that the
> > > extracted bit-field can exceed the source size.  That will result in
> > > rubbish being read into a register.
> >
> > FYI, I received an automated response saying Tamar is away on vacation
> > with no return date specified.  That means he won't be able to revert
> > the patch.  What do we do now?
>
> The code before the change already looks fishy to me.

Yes, it is fishy so far as the code in the loop relies on alignment
determining a small enough bitsize.  Adding

  if (bytes % UNITS_PER_WORD != 0)
bitsize = gcd (bitsize, (bytes % UNITS_PER_WORD) * BITS_PER_UNIT);

after bitsize is calculated from alignment would make the code
correct, I believe.  But I think that will fail the testcase Tamar
added.

>   x = expand_normal (src);

Re: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

2018-04-03 Thread Richard Biener
On Tue, 3 Apr 2018, Alan Modra wrote:

> On Tue, Apr 03, 2018 at 01:01:23PM +0200, Richard Biener wrote:
> > On Fri, 30 Mar 2018, Peter Bergner wrote:
> > 
> > > On 3/29/18 9:35 AM, Alan Modra wrote:
> > > > On Thu, Nov 16, 2017 at 03:27:01PM +, Tamar Christina wrote:
> > > >> --- a/gcc/expr.c
> > > >> +++ b/gcc/expr.c
> > > >> @@ -2769,7 +2769,9 @@ copy_blkmode_to_reg (machine_mode mode, tree src)
> > > >>  
> > > >>n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
> > > >>dst_words = XALLOCAVEC (rtx, n_regs);
> > > >> -  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
> > > >> +  bitsize = BITS_PER_WORD;
> > > >> +  if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE 
> > > >> (src))))
> > > >> +bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
> > > >>  
> > > >>/* Copy the structure BITSIZE bits at a time.  */
> > > >>for (bitpos = 0, xbitpos = padding_correction;
> > > > 
> > > > I believe this patch is wrong.  Please revert.  See the PR84762 
> > > > testcase.
> > > > 
> > > > There are two problems.  Firstly, if padding_correction is non-zero,
> > > > then xbitpos % BITS_PER_WORD is non-zero and in
> > > > 
> > > >   store_bit_field (dst_word, bitsize, xbitpos % BITS_PER_WORD,
> > > >0, 0, word_mode,
> > > >extract_bit_field (src_word, bitsize,
> > > >   bitpos % BITS_PER_WORD, 1,
> > > >   NULL_RTX, word_mode, 
> > > > word_mode,
> > > >   false, NULL),
> > > >false);
> > > > 
> > > > the stored bit-field exceeds the destination register size.  You could
> > > > fix that by making bitsize the gcd of bitsize and padding_correction.
> > > > 
> > > > However, that doesn't fix the second problem which is that the
> > > > extracted bit-field can exceed the source size.  That will result in
> > > > rubbish being read into a register.
> > > 
> > > FYI, I received an automated response saying Tamar is away on vacation
> > > with no return date specified.  That means he won't be able to revert
> > > the patch.  What do we do now?
> > 
> > The code before the change already looks fishy to me.
> 
> Yes, it is fishy so far as the code in the loop relies on alignment
> determining a small enough bitsize.  Adding
> 
>   if (bytes % UNITS_PER_WORD != 0)
> bitsize = gcd (bitsize, (bytes % UNITS_PER_WORD) * BITS_PER_UNIT);
> 
> after bitsize is calculated from alignment would make the code
> correct, I believe.  But I think that will fail the testcase Tamar
> added.
> 
> >   x = expand_normal (src);
> > 
> > so what do we expect this to expand to in general?  Fortunately
> > it seems there are exactly two callers so hopefully a
> > gcc_assert (MEM_P (x)) would work?
> > 
> > The fishy part is looking at TYPE_ALIGN (TREE_TYPE (src)).
> > 
> > In case x is a MEM we should use MEM_ALIGN and if not then we
> > shouldn't do anything but just use word_mode here.
> 
> No!  You can't use a bitsize of BITS_PER_WORD here.  See the code I
> quoted above.

Ah, so it should possibly use TYPE_SIZE instead of TYPE_ALIGN, otherwise
it will handle the over-aligned case (align to 8 bytes, size 4 bytes)
incorrectly as well?

Richard.


Re: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

2018-04-03 Thread Alan Modra
On Tue, Apr 03, 2018 at 01:01:23PM +0200, Richard Biener wrote:
> On Fri, 30 Mar 2018, Peter Bergner wrote:
> 
> > On 3/29/18 9:35 AM, Alan Modra wrote:
> > > On Thu, Nov 16, 2017 at 03:27:01PM +, Tamar Christina wrote:
> > >> --- a/gcc/expr.c
> > >> +++ b/gcc/expr.c
> > >> @@ -2769,7 +2769,9 @@ copy_blkmode_to_reg (machine_mode mode, tree src)
> > >>  
> > >>n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
> > >>dst_words = XALLOCAVEC (rtx, n_regs);
> > >> -  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
> > >> +  bitsize = BITS_PER_WORD;
> > >> +  if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE 
> > >> (src))))
> > >> +bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
> > >>  
> > >>/* Copy the structure BITSIZE bits at a time.  */
> > >>for (bitpos = 0, xbitpos = padding_correction;
> > > 
> > > I believe this patch is wrong.  Please revert.  See the PR84762 testcase.
> > > 
> > > There are two problems.  Firstly, if padding_correction is non-zero,
> > > then xbitpos % BITS_PER_WORD is non-zero and in
> > > 
> > >   store_bit_field (dst_word, bitsize, xbitpos % BITS_PER_WORD,
> > >  0, 0, word_mode,
> > >  extract_bit_field (src_word, bitsize,
> > > bitpos % BITS_PER_WORD, 1,
> > > NULL_RTX, word_mode, word_mode,
> > > false, NULL),
> > >  false);
> > > 
> > > the stored bit-field exceeds the destination register size.  You could
> > > fix that by making bitsize the gcd of bitsize and padding_correction.
> > > 
> > > However, that doesn't fix the second problem which is that the
> > > extracted bit-field can exceed the source size.  That will result in
> > > rubbish being read into a register.
> > 
> > FYI, I received an automated response saying Tamar is away on vacation
> > with no return date specified.  That means he won't be able to revert
> > the patch.  What do we do now?
> 
> The code before the change already looks fishy to me.

Yes, it is fishy so far as the code in the loop relies on alignment
determining a small enough bitsize.  Adding

  if (bytes % UNITS_PER_WORD != 0)
bitsize = gcd (bitsize, (bytes % UNITS_PER_WORD) * BITS_PER_UNIT);

after bitsize is calculated from alignment would make the code
correct, I believe.  But I think that will fail the testcase Tamar
added.

>   x = expand_normal (src);
> 
> so what do we expect this to expand to in general?  Fortunately
> it seems there are exactly two callers so hopefully a
> gcc_assert (MEM_P (x)) would work?
> 
> The fishy part is looking at TYPE_ALIGN (TREE_TYPE (src)).
> 
> In case x is a MEM we should use MEM_ALIGN and if not then we
> shouldn't do anything but just use word_mode here.

No!  You can't use a bitsize of BITS_PER_WORD here.  See the code I
quoted above.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

2018-04-03 Thread Richard Biener
On Fri, 30 Mar 2018, Peter Bergner wrote:

> On 3/29/18 9:35 AM, Alan Modra wrote:
> > On Thu, Nov 16, 2017 at 03:27:01PM +, Tamar Christina wrote:
> >> --- a/gcc/expr.c
> >> +++ b/gcc/expr.c
> >> @@ -2769,7 +2769,9 @@ copy_blkmode_to_reg (machine_mode mode, tree src)
> >>  
> >>n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
> >>dst_words = XALLOCAVEC (rtx, n_regs);
> >> -  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
> >> +  bitsize = BITS_PER_WORD;
> >> +  if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE 
> >> (src))))
> >> +bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
> >>  
> >>/* Copy the structure BITSIZE bits at a time.  */
> >>for (bitpos = 0, xbitpos = padding_correction;
> > 
> > I believe this patch is wrong.  Please revert.  See the PR84762 testcase.
> > 
> > There are two problems.  Firstly, if padding_correction is non-zero,
> > then xbitpos % BITS_PER_WORD is non-zero and in
> > 
> >   store_bit_field (dst_word, bitsize, xbitpos % BITS_PER_WORD,
> >0, 0, word_mode,
> >extract_bit_field (src_word, bitsize,
> >   bitpos % BITS_PER_WORD, 1,
> >   NULL_RTX, word_mode, word_mode,
> >   false, NULL),
> >false);
> > 
> > the stored bit-field exceeds the destination register size.  You could
> > fix that by making bitsize the gcd of bitsize and padding_correction.
> > 
> > However, that doesn't fix the second problem which is that the
> > extracted bit-field can exceed the source size.  That will result in
> > rubbish being read into a register.
> 
> FYI, I received an automated response saying Tamar is away on vacation
> with no return date specified.  That means he won't be able to revert
> the patch.  What do we do now?

The code before the change already looks fishy to me.

  x = expand_normal (src);

so what do we expect this to expand to in general?  Fortunately
it seems there are exactly two callers so hopefully a
gcc_assert (MEM_P (x)) would work?

The fishy part is looking at TYPE_ALIGN (TREE_TYPE (src)).

In case x is a MEM we should use MEM_ALIGN and if not then we
shouldn't do anything but just use word_mode here.


Re: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

2018-03-30 Thread Peter Bergner
On 3/29/18 9:35 AM, Alan Modra wrote:
> On Thu, Nov 16, 2017 at 03:27:01PM +, Tamar Christina wrote:
>> --- a/gcc/expr.c
>> +++ b/gcc/expr.c
>> @@ -2769,7 +2769,9 @@ copy_blkmode_to_reg (machine_mode mode, tree src)
>>  
>>n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
>>dst_words = XALLOCAVEC (rtx, n_regs);
>> -  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
>> +  bitsize = BITS_PER_WORD;
>> +  if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE 
>> (src))))
>> +bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
>>  
>>/* Copy the structure BITSIZE bits at a time.  */
>>for (bitpos = 0, xbitpos = padding_correction;
> 
> I believe this patch is wrong.  Please revert.  See the PR84762 testcase.
> 
> There are two problems.  Firstly, if padding_correction is non-zero,
> then xbitpos % BITS_PER_WORD is non-zero and in
> 
>   store_bit_field (dst_word, bitsize, xbitpos % BITS_PER_WORD,
>  0, 0, word_mode,
>  extract_bit_field (src_word, bitsize,
> bitpos % BITS_PER_WORD, 1,
> NULL_RTX, word_mode, word_mode,
> false, NULL),
>  false);
> 
> the stored bit-field exceeds the destination register size.  You could
> fix that by making bitsize the gcd of bitsize and padding_correction.
> 
> However, that doesn't fix the second problem which is that the
> extracted bit-field can exceed the source size.  That will result in
> rubbish being read into a register.

FYI, I received an automated response saying Tamar is away on vacation
with no return date specified.  That means he won't be able to revert
the patch.  What do we do now?

Peter




Re: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

2018-03-30 Thread Peter Bergner
On 3/29/18 9:35 AM, Alan Modra wrote:
> On Thu, Nov 16, 2017 at 03:27:01PM +, Tamar Christina wrote:
>> --- a/gcc/expr.c
>> +++ b/gcc/expr.c
>> @@ -2769,7 +2769,9 @@ copy_blkmode_to_reg (machine_mode mode, tree src)
>>  
>>n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
>>dst_words = XALLOCAVEC (rtx, n_regs);
>> -  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
>> +  bitsize = BITS_PER_WORD;
>> +  if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE 
>> (src))))
>> +bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
>>  
>>/* Copy the structure BITSIZE bits at a time.  */
>>for (bitpos = 0, xbitpos = padding_correction;
> 
> I believe this patch is wrong.  Please revert.  See the PR84762 testcase.

I'm seeing this patch cause an error on a different test case, too:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85123#c2

I've closed PR85123 as a dup of PR84762, even though the test case
in PR84762 is failing for -m32 while the test case in PR85123 is
failing for -m64, since they're both caused by the same bug.

Peter



Re: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

2018-03-29 Thread Alan Modra
On Thu, Nov 16, 2017 at 03:27:01PM +, Tamar Christina wrote:
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -2769,7 +2769,9 @@ copy_blkmode_to_reg (machine_mode mode, tree src)
>  
>n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
>dst_words = XALLOCAVEC (rtx, n_regs);
> -  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
> +  bitsize = BITS_PER_WORD;
> +  if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE 
> (src))))
> +bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
>  
>/* Copy the structure BITSIZE bits at a time.  */
>for (bitpos = 0, xbitpos = padding_correction;

I believe this patch is wrong.  Please revert.  See the PR84762 testcase.

There are two problems.  Firstly, if padding_correction is non-zero,
then xbitpos % BITS_PER_WORD is non-zero and in

  store_bit_field (dst_word, bitsize, xbitpos % BITS_PER_WORD,
   0, 0, word_mode,
   extract_bit_field (src_word, bitsize,
  bitpos % BITS_PER_WORD, 1,
  NULL_RTX, word_mode, word_mode,
  false, NULL),
   false);

the stored bit-field exceeds the destination register size.  You could
fix that by making bitsize the gcd of bitsize and padding_correction.

However, that doesn't fix the second problem which is that the
extracted bit-field can exceed the source size.  That will result in
rubbish being read into a register.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

2017-11-17 Thread Richard Biener
On Thu, 16 Nov 2017, Tamar Christina wrote:

> Hi Richard,
> 
> > 
> > I'd have made it
> > 
> >   if { ([is-effective-target non_strict_align]
> > && ! ( [istarget ...] || ))
> > 
> > thus default it to 1 for non-strict-align targets.
> > 
> 
> Fair, I've switched it to a blacklist and have excluded the only one I know
> should not work. Most of the rest will get blocked by non_strict_align, and
> for the few others I'll adjust the testcase accordingly if there are any
> issues.
> 
> > > But this also raises a question, some targets have defined 
> > > SLOW_UNALIGNED_ACCESS
> > > in a way that uses only internal state to determine the value where 
> > > STRICT_ALIGNMENT
> > > is essentially ignored. e.g. PowerPC and riscv.
> > > 
> > > The code generation *might* change for them but the tests won't run. I
> > > see no way to make the test accurate (as in, runs in all cases where the
> > > codegen changed) unless I expose SLOW_UNALIGNED_ACCESS as a define so I
> > > can test for it.
> > > 
> > > Would this be the way to go?
> > 
> > I don't think so.  SLOW_UNALIGNED_ACCESS is per mode and specific to
> > a certain alignment.
> > 
> 
> Ah, right! that slipped my mind for a bit.
> 
> Ok for trunk?

Ok.

Richard.


Re: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

2017-11-16 Thread Tamar Christina
Hi Richard,

> 
> I'd have made it
> 
>   if { ([is-effective-target non_strict_align]
> && ! ( [istarget ...] || ))
> 
> thus default it to 1 for non-strict-align targets.
> 

Fair, I've switched it to a blacklist and have excluded the only one I know
should not work. Most of the rest will get blocked by non_strict_align and for
the few others I'll adjust the testcase accordingly if there are any issues.

> > But this also raises a question, some targets have defined 
> > SLOW_UNALIGNED_ACCESS
> > in a way that uses only internal state to determine the value where 
> > STRICT_ALIGNMENT
> > is essentially ignored. e.g. PowerPC and riscv.
> > 
> > The code generation *might* change for them but the tests won't run. I see
> > no way to make the test accurate (as in, runs in all cases where the
> > codegen changed) unless I expose SLOW_UNALIGNED_ACCESS as a define so I can
> > test for it.
> > 
> > Would this be the way to go?
> 
> I don't think so.  SLOW_UNALIGNED_ACCESS is per mode and specific to
> a certain alignment.
> 

Ah, right! that slipped my mind for a bit.

Ok for trunk?

Thanks for the review,
Tamar
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 1646d0a99911aa7b2e66762e5907fbb0454ed00d..3b200964462a82ebbe68bbe798cc91ed27337034 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2178,8 +2178,12 @@ Target supports @code{wchar_t} that is compatible with @code{char32_t}.
 
 @item comdat_group
 Target uses comdat groups.
+
+@item word_mode_no_slow_unalign
+Target does not have slow unaligned access when doing word size accesses.
 @end table
 
+
 @subsubsection Local to tests in @code{gcc.target/i386}
 
 @table @code
diff --git a/gcc/expr.c b/gcc/expr.c
index 2f8432d92ccac17c0a548faf4a16eff0656cef1b..afcea8fef58155d0a81c10cd485ba8af888d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -2769,7 +2769,9 @@ copy_blkmode_to_reg (machine_mode mode, tree src)
 
   n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
   dst_words = XALLOCAVEC (rtx, n_regs);
-  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
+  bitsize = BITS_PER_WORD;
+  if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE (src))))
+bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
 
   /* Copy the structure BITSIZE bits at a time.  */
   for (bitpos = 0, xbitpos = padding_correction;
diff --git a/gcc/testsuite/gcc.dg/struct-simple.c b/gcc/testsuite/gcc.dg/struct-simple.c
new file mode 100644
index 0000000000000000000000000000000000000000..17b956022e4efb37044c7a74cc8baa9fb779221a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/struct-simple.c
@@ -0,0 +1,52 @@
+/* { dg-do-run } */
+/* { dg-require-effective-target word_mode_no_slow_unalign } */
+/* { dg-additional-options "-fdump-rtl-final" } */
+
+/* Copyright 1996, 1999, 2007 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.
+
+   Please email any bugs, comments, and/or additions to this file to:
+   bug-...@prep.ai.mit.edu  */
+
+#include <stdio.h>
+
+struct struct3 { char a, b, c; };
+struct struct3 foo3 = { 'A', 'B', 'C'},  L3;
+
+struct struct3  fun3()
+{
+  return foo3;
+}
+
+#ifdef PROTOTYPES
+void Fun3(struct struct3 foo3)
+#else
+void Fun3(foo3)
+ struct struct3 foo3;
+#endif
+{
+  L3 = foo3;
+}
+
+int main()
+{
+  struct struct3 x = fun3();
+
+  printf("a:%c, b:%c, c:%c\n", x.a, x.b, x.c);
+}
+
+/* { dg-final { scan-rtl-dump-not {zero_extract:.+\[\s*foo3\s*\]} "final" } } */
+
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index b6f9e51c4817cf8235c8e33b14e2763308eb482a..03413c323d00e88872879a741ab3c015e052311d 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6037,6 +6037,31 @@ proc check_effective_target_unaligned_stack { } {
 return $et_unaligned_stack_saved
 }
 
+# Return 1 if the target plus current options does not have
+# slow unaligned access when using word size accesses.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_word_mode_no_slow_unalign { } {
+    global et_word_mode_no_slow_unalign_saved
+    global et_index
+
+    if [info exists et_word_mode_no_slow_unalign_saved($et_index)] {
+        verbose "check_effective_target_word_mode_no_slow_unalign: \
+            using cached result" 

Re: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

2017-11-16 Thread Richard Biener
On Thu, 16 Nov 2017, Tamar Christina wrote:

> > > > 
> > > > I see.  But then the slow_unaligned_access implementation should use
> > > > non_strict_align as default somehow as SLOW_UNALIGNED_ACCESS is
> > > > defaulted to STRICT_ALIGN.
> > > > 
> > > > Given that SLOW_UNALIGNED_ACCESS has different values for different
> > > > modes it would also make sense to be more specific for the testcase in
> > > > question, like word_mode_slow_unaligned_access to tell this only 
> > > > applies to
> > > > word_mode?
> > > 
> > > Ah, that's fair enough. I've updated the patch and the new changelog is:
> > 
> > Did you attach the old patch? I don't see strict_align being tested in
> > the word_mode_no_slow_unalign test.
> > 
> 
> Sorry! I misunderstood your previous email. I've added the check accordingly.

+    if { ([istarget x86_64-*-*]
+          || [istarget aarch64*-*-*])
+         && [is-effective-target non_strict_align]
+       } {
+        set et_word_mode_no_slow_unalign_saved($et_index) 1
+    }

I'd have made it

  if { ([is-effective-target non_strict_align]
&& ! ( [istarget ...] || ))

thus default it to 1 for non-strict-align targets.

> But this also raises a question, some targets have defined 
> SLOW_UNALIGNED_ACCESS
> in a way that uses only internal state to determine the value where 
> STRICT_ALIGNMENT
> is essentially ignored. e.g. PowerPC and riscv.
> 
> The code generation *might* change for them but the tests won't run. I see
> no way to
> make the test accurate (as in, runs in all cases where the codegen changed)
> unless I expose SLOW_UNALIGNED_ACCESS as a define so I can test for it.
> 
> Would this be the way to go?

I don't think so.  SLOW_UNALIGNED_ACCESS is per mode and specific to
a certain alignment.
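The per-mode, per-alignment nature of the hook can be illustrated with a toy predicate. The modes and the alignment cutoff below are hypothetical, not taken from any real target or from GCC's actual hook:

```c
#include <stdbool.h>

/* Toy model: whether an unaligned access is slow depends on both the
   mode being accessed and the alignment available, so no single
   preprocessor define can capture the answer.  */
enum toy_mode { TOY_WORD_MODE, TOY_VECTOR_MODE };

bool
toy_slow_unaligned_access (enum toy_mode mode, unsigned int align_bits)
{
  switch (mode)
    {
    case TOY_WORD_MODE:
      return false;             /* unaligned word loads are cheap here */
    case TOY_VECTOR_MODE:
      return align_bits < 128;  /* vectors want 128-bit alignment */
    }
  return true;
}
```

On this toy target a word-mode test would always run, while a vector-mode test would depend on the alignment of the operand, which is why a single exported define would be misleading.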

Richard.

> Thanks,
> Tamar
> 
> > Richard.
> > 
> > > 
> > > gcc/
> > > 2017-11-15  Tamar Christina  
> > > 
> > >   * expr.c (copy_blkmode_to_reg): Fix bitsize for targets
> > >   with fast unaligned access.
> > >   * doc/sourcebuild.texi (word_mode_no_slow_unalign): New.
> > >   
> > > gcc/testsuite/
> > > 2017-11-15  Tamar Christina  
> > > 
> > >   * gcc.dg/struct-simple.c: New.
> > >   * lib/target-supports.exp
> > >   (check_effective_target_word_mode_no_slow_unalign): New.
> > > 
> > > Ok for trunk?
> > > 
> > > Thanks,
> > > Tamar
> > > 
> > > > 
> > > > Thanks,
> > > > Richard.
> > > > 
> > > > > Thanks,
> > > > > Tamar
> > > > > >
> > > > > > Otherwise the expr.c change looks ok.
> > > > > >
> > > > > > Thanks,
> > > > > > Richard.
> > > > > >
> > > > > > > Thanks,
> > > > > > > Tamar
> > > > > > >
> > > > > > >
> > > > > > > gcc/
> > > > > > > 2017-11-14  Tamar Christina  
> > > > > > >
> > > > > > >   * expr.c (copy_blkmode_to_reg): Fix bitsize for targets
> > > > > > >   with fast unaligned access.
> > > > > > >   * doc/sourcebuild.texi (no_slow_unalign): New.
> > > > > > >
> > > > > > > gcc/testsuite/
> > > > > > > 2017-11-14  Tamar Christina  
> > > > > > >
> > > > > > >   * gcc.dg/struct-simple.c: New.
> > > > > > >   * lib/target-supports.exp
> > > > > > >   (check_effective_target_no_slow_unalign): New.
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Richard Biener 
> > > > > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
> > > > > > Norton, HRB 21284 (AG Nuernberg)
> > > > >
> > > > >
> > > > 
> > > > --
> > > > Richard Biener 
> > > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
> > > > HRB 21284 (AG Nuernberg)
> > > 
> > 
> > -- 
> > Richard Biener 
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> > 21284 (AG Nuernberg)
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

2017-11-16 Thread Tamar Christina
> > > 
> > > I see.  But then the slow_unaligned_access implementation should use
> > > non_strict_align as default somehow as SLOW_UNALIGNED_ACCESS is
> > > defaulted to STRICT_ALIGN.
> > > 
> > > Given that SLOW_UNALIGNED_ACCESS has different values for different
> > > modes it would also make sense to be more specific for the testcase in
> > > question, like word_mode_slow_unaligned_access to tell this only applies 
> > > to
> > > word_mode?
> > 
> > Ah, that's fair enough. I've updated the patch and the new changelog is:
> 
> Did you attach the old patch? I don't see strict_align being tested in
> the word_mode_no_slow_unalign test.
> 

Sorry! I misunderstood your previous email. I've added the check accordingly.

But this also raises a question, some targets have defined SLOW_UNALIGNED_ACCESS
in a way that uses only internal state to determine the value where 
STRICT_ALIGNMENT
is essentially ignored. e.g. PowerPC and riscv.

The code generation *might* change for them but the tests won't run. I see no
way to
make the test accurate (as in, runs in all cases where the codegen changed)
unless I expose SLOW_UNALIGNED_ACCESS as a define so I can test for it.

Would this be the way to go?

Thanks,
Tamar

> Richard.
> 
> > 
> > gcc/
> > 2017-11-15  Tamar Christina  
> > 
> > * expr.c (copy_blkmode_to_reg): Fix bitsize for targets
> > with fast unaligned access.
> > * doc/sourcebuild.texi (word_mode_no_slow_unalign): New.
> > 
> > gcc/testsuite/
> > 2017-11-15  Tamar Christina  
> > 
> > * gcc.dg/struct-simple.c: New.
> > * lib/target-supports.exp
> > (check_effective_target_word_mode_no_slow_unalign): New.
> > 
> > Ok for trunk?
> > 
> > Thanks,
> > Tamar
> > 
> > > 
> > > Thanks,
> > > Richard.
> > > 
> > > > Thanks,
> > > > Tamar
> > > > >
> > > > > Otherwise the expr.c change looks ok.
> > > > >
> > > > > Thanks,
> > > > > Richard.
> > > > >
> > > > > > Thanks,
> > > > > > Tamar
> > > > > >
> > > > > >
> > > > > > gcc/
> > > > > > 2017-11-14  Tamar Christina  
> > > > > >
> > > > > > * expr.c (copy_blkmode_to_reg): Fix bitsize for targets
> > > > > > with fast unaligned access.
> > > > > > * doc/sourcebuild.texi (no_slow_unalign): New.
> > > > > >
> > > > > > gcc/testsuite/
> > > > > > 2017-11-14  Tamar Christina  
> > > > > >
> > > > > > * gcc.dg/struct-simple.c: New.
> > > > > > * lib/target-supports.exp
> > > > > > (check_effective_target_no_slow_unalign): New.
> > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener 
> > > > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
> > > > > Norton, HRB 21284 (AG Nuernberg)
> > > >
> > > >
> > > 
> > > --
> > > Richard Biener 
> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
> > > HRB 21284 (AG Nuernberg)
> > 
> 
> -- 
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)

-- 
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 1646d0a99911aa7b2e66762e5907fbb0454ed00d..3b200964462a82ebbe68bbe798cc91ed27337034 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2178,8 +2178,12 @@ Target supports @code{wchar_t} that is compatible with @code{char32_t}.
 
 @item comdat_group
 Target uses comdat groups.
+
+@item word_mode_no_slow_unalign
+Target does not have slow unaligned access when doing word size accesses.
 @end table
 
+
 @subsubsection Local to tests in @code{gcc.target/i386}
 
 @table @code
diff --git a/gcc/expr.c b/gcc/expr.c
index 2f8432d92ccac17c0a548faf4a16eff0656cef1b..afcea8fef58155d0a81c10cd485ba8af888d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -2769,7 +2769,9 @@ copy_blkmode_to_reg (machine_mode mode, tree src)
 
   n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
   dst_words = XALLOCAVEC (rtx, n_regs);
-  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
+  bitsize = BITS_PER_WORD;
+  if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE (src))))
+    bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
 
   /* Copy the structure BITSIZE bits at a time.  */
   for (bitpos = 0, xbitpos = padding_correction;
diff --git a/gcc/testsuite/gcc.dg/struct-simple.c b/gcc/testsuite/gcc.dg/struct-simple.c
new file mode 100644
index ..17b956022e4efb37044c7a74cc8baa9fb779221a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/struct-simple.c
@@ -0,0 +1,52 @@
+/* { dg-do run } */
+/* { dg-require-effective-target word_mode_no_slow_unalign } */
+/* { dg-additional-options "-fdump-rtl-final" } */
+
+/* Copyright 1996, 1999, 2007 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free 

RE: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

2017-11-15 Thread Richard Biener
On Wed, 15 Nov 2017, tamar.christ...@arm.com wrote:

> > -Original Message-
> > From: Richard Biener [mailto:rguent...@suse.de]
> > Sent: Wednesday, November 15, 2017 12:50
> > To: Tamar Christina <tamar.christ...@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; l...@redhat.com;
> > i...@airs.com
> > Subject: RE: [PATCH][GCC][mid-end] Allow larger copies when target
> > supports unaligned access [Patch (1/2)]
> > 
> > On Wed, 15 Nov 2017, Tamar Christina wrote:
> > 
> > >
> > >
> > > > -Original Message-
> > > > From: Richard Biener [mailto:rguent...@suse.de]
> > > > Sent: Wednesday, November 15, 2017 08:24
> > > > To: Tamar Christina <tamar.christ...@arm.com>
> > > > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; l...@redhat.com;
> > > > i...@airs.com
> > > > Subject: Re: [PATCH][GCC][mid-end] Allow larger copies when target
> > > > supports unaligned access [Patch (1/2)]
> > > >
> > > > On Tue, 14 Nov 2017, Tamar Christina wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > This patch allows larger bitsizes to be used as copy size when the
> > > > > target does not have SLOW_UNALIGNED_ACCESS.
> > > > >
> > > > > fun3:
> > > > >   adrp x2, .LANCHOR0
> > > > >   add x2, x2, :lo12:.LANCHOR0
> > > > >   mov x0, 0
> > > > >   sub sp, sp, #16
> > > > >   ldrh w1, [x2, 16]
> > > > >   ldrb w2, [x2, 18]
> > > > >   add sp, sp, 16
> > > > >   bfi x0, x1, 0, 8
> > > > >   ubfx x1, x1, 8, 8
> > > > >   bfi x0, x1, 8, 8
> > > > >   bfi x0, x2, 16, 8
> > > > >   ret
> > > > >
> > > > > is turned into
> > > > >
> > > > > fun3:
> > > > >   adrp x0, .LANCHOR0
> > > > >   add x0, x0, :lo12:.LANCHOR0
> > > > >   sub sp, sp, #16
> > > > >   ldrh w1, [x0, 16]
> > > > >   ldrb w0, [x0, 18]
> > > > >   strh w1, [sp, 8]
> > > > >   strb w0, [sp, 10]
> > > > >   ldr w0, [sp, 8]
> > > > >   add sp, sp, 16
> > > > >   ret
> > > > >
> > > > > which avoids the bfi's for a simple 3 byte struct copy.
> > > > >
> > > > > Regression tested on aarch64-none-linux-gnu and
> > > > > x86_64-pc-linux-gnu and
> > > > no regressions.
> > > > >
> > > > > This patch is just splitting off from the previous combined patch
> > > > > with
> > > > > AArch64 and adding a testcase.
> > > > >
> > > > > I assume Jeff's ACK from
> > > > > https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01523.html is still
> > > > > valid as
> > > > the code did not change.
> > > >
> > > > Given your no_slow_unalign isn't mode specific can't you use the
> > > > existing non_strict_align?
> > >
> > > No because non_strict_align checks if the target supports unaligned
> > > access at all,
> > >
> > > This no_slow_unalign corresponds instead to the target
> > > slow_unaligned_access which checks that the access you want to make
> > > has a greater cost than doing an aligned access. ARM for instance
> > > always return 1 (value of STRICT_ALIGNMENT) for slow_unaligned_access
> > > while for non_strict_align it may return 0 or 1 based on the options
> > provided to the compiler.
> > >
> > > The problem is I have no way to test STRICT_ALIGNMENT or
> > > slow_unaligned_access So I had to hardcode some targets that I know it
> > does work on.
> > 
> > I see.  But then the slow_unaligned_access implementation should use
> > non_strict_align as default somehow as SLOW_UNALIGNED_ACCESS is
> > defaulted to STRICT_ALIGN.
> > 
> > Given that SLOW_UNALIGNED_ACCESS has different values for different
> > modes it would also make sense to be more specific for the testcase in
> > question, like word_mode_slow_unaligned_access to tell this only applies to
> > word_mode?
> 
> Ah, that's fair enough. I've updated the patch and the new changelog is:

Did you attach the old p

RE: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

2017-11-15 Thread tamar . christina
> -Original Message-
> From: Richard Biener [mailto:rguent...@suse.de]
> Sent: Wednesday, November 15, 2017 12:50
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; l...@redhat.com;
> i...@airs.com
> Subject: RE: [PATCH][GCC][mid-end] Allow larger copies when target
> supports unaligned access [Patch (1/2)]
> 
> On Wed, 15 Nov 2017, Tamar Christina wrote:
> 
> >
> >
> > > -Original Message-
> > > From: Richard Biener [mailto:rguent...@suse.de]
> > > Sent: Wednesday, November 15, 2017 08:24
> > > To: Tamar Christina <tamar.christ...@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; l...@redhat.com;
> > > i...@airs.com
> > > Subject: Re: [PATCH][GCC][mid-end] Allow larger copies when target
> > > supports unaligned access [Patch (1/2)]
> > >
> > > On Tue, 14 Nov 2017, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > This patch allows larger bitsizes to be used as copy size when the
> > > > target does not have SLOW_UNALIGNED_ACCESS.
> > > >
> > > > fun3:
> > > > adrp x2, .LANCHOR0
> > > > add x2, x2, :lo12:.LANCHOR0
> > > > mov x0, 0
> > > > sub sp, sp, #16
> > > > ldrh w1, [x2, 16]
> > > > ldrb w2, [x2, 18]
> > > > add sp, sp, 16
> > > > bfi x0, x1, 0, 8
> > > > ubfx x1, x1, 8, 8
> > > > bfi x0, x1, 8, 8
> > > > bfi x0, x2, 16, 8
> > > > ret
> > > >
> > > > is turned into
> > > >
> > > > fun3:
> > > > adrp x0, .LANCHOR0
> > > > add x0, x0, :lo12:.LANCHOR0
> > > > sub sp, sp, #16
> > > > ldrh w1, [x0, 16]
> > > > ldrb w0, [x0, 18]
> > > > strh w1, [sp, 8]
> > > > strb w0, [sp, 10]
> > > > ldr w0, [sp, 8]
> > > > add sp, sp, 16
> > > > ret
> > > >
> > > > which avoids the bfi's for a simple 3 byte struct copy.
> > > >
> > > > Regression tested on aarch64-none-linux-gnu and
> > > > x86_64-pc-linux-gnu and
> > > no regressions.
> > > >
> > > > This patch is just splitting off from the previous combined patch
> > > > with
> > > > AArch64 and adding a testcase.
> > > >
> > > > I assume Jeff's ACK from
> > > > https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01523.html is still
> > > > valid as
> > > the code did not change.
> > >
> > > Given your no_slow_unalign isn't mode specific can't you use the
> > > existing non_strict_align?
> >
> > No because non_strict_align checks if the target supports unaligned
> > access at all,
> >
> > This no_slow_unalign corresponds instead to the target
> > slow_unaligned_access which checks that the access you want to make
> > has a greater cost than doing an aligned access. ARM for instance
> > always return 1 (value of STRICT_ALIGNMENT) for slow_unaligned_access
> > while for non_strict_align it may return 0 or 1 based on the options
> provided to the compiler.
> >
> > The problem is I have no way to test STRICT_ALIGNMENT or
> > slow_unaligned_access So I had to hardcode some targets that I know it
> does work on.
> 
> I see.  But then the slow_unaligned_access implementation should use
> non_strict_align as default somehow as SLOW_UNALIGNED_ACCESS is
> defaulted to STRICT_ALIGN.
> 
> Given that SLOW_UNALIGNED_ACCESS has different values for different
> modes it would also make sense to be more specific for the testcase in
> question, like word_mode_slow_unaligned_access to tell this only applies to
> word_mode?

Ah, that's fair enough. I've updated the patch and the new changelog is:


gcc/
2017-11-15  Tamar Christina  <tamar.christ...@arm.com>

* expr.c (copy_blkmode_to_reg): Fix bitsize for targets
with fast unaligned access.
* doc/sourcebuild.texi (word_mode_no_slow_unalign): New.

gcc/testsuite/
2017-11-15  Tamar Christina  <tamar.christ...@arm.com>

* gcc.dg/struct-simple.c: New.
* lib/target-supports.exp
(check_effective_target_word_mode_no_slow_unalign): New.

Ok for trunk?

Thanks,
Tamar

> 
> Thanks,
> Richard.
> 
> &

RE: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

2017-11-15 Thread Richard Biener
On Wed, 15 Nov 2017, Tamar Christina wrote:

> 
> 
> > -Original Message-
> > From: Richard Biener [mailto:rguent...@suse.de]
> > Sent: Wednesday, November 15, 2017 08:24
> > To: Tamar Christina <tamar.christ...@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; l...@redhat.com;
> > i...@airs.com
> > Subject: Re: [PATCH][GCC][mid-end] Allow larger copies when target
> > supports unaligned access [Patch (1/2)]
> > 
> > On Tue, 14 Nov 2017, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > This patch allows larger bitsizes to be used as copy size when the
> > > target does not have SLOW_UNALIGNED_ACCESS.
> > >
> > > fun3:
> > >   adrp x2, .LANCHOR0
> > >   add x2, x2, :lo12:.LANCHOR0
> > >   mov x0, 0
> > >   sub sp, sp, #16
> > >   ldrh w1, [x2, 16]
> > >   ldrb w2, [x2, 18]
> > >   add sp, sp, 16
> > >   bfi x0, x1, 0, 8
> > >   ubfx x1, x1, 8, 8
> > >   bfi x0, x1, 8, 8
> > >   bfi x0, x2, 16, 8
> > >   ret
> > >
> > > is turned into
> > >
> > > fun3:
> > >   adrp x0, .LANCHOR0
> > >   add x0, x0, :lo12:.LANCHOR0
> > >   sub sp, sp, #16
> > >   ldrh w1, [x0, 16]
> > >   ldrb w0, [x0, 18]
> > >   strh w1, [sp, 8]
> > >   strb w0, [sp, 10]
> > >   ldr w0, [sp, 8]
> > >   add sp, sp, 16
> > >   ret
> > >
> > > which avoids the bfi's for a simple 3 byte struct copy.
> > >
> > > Regression tested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu and
> > no regressions.
> > >
> > > This patch is just splitting off from the previous combined patch with
> > > AArch64 and adding a testcase.
> > >
> > > I assume Jeff's ACK from
> > > https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01523.html is still valid as
> > the code did not change.
> > 
> > Given your no_slow_unalign isn't mode specific can't you use the existing
> > non_strict_align?
> 
> No because non_strict_align checks if the target supports unaligned access at 
> all,
> 
> This no_slow_unalign corresponds instead to the target slow_unaligned_access
> which checks that the access you want to make has a greater cost than doing an
> aligned access. ARM for instance always return 1 (value of STRICT_ALIGNMENT)
> for slow_unaligned_access while for non_strict_align it may return 0 or 1 
> based
> on the options provided to the compiler.
> 
> The problem is I have no way to test STRICT_ALIGNMENT or slow_unaligned_access
> So I had to hardcode some targets that I know it does work on.

I see.  But then the slow_unaligned_access implementation should use
non_strict_align as default somehow as SLOW_UNALIGNED_ACCESS is defaulted
to STRICT_ALIGN.

Given that SLOW_UNALIGNED_ACCESS has different values for different modes
it would also make sense to be more specific for the testcase in question,
like word_mode_slow_unaligned_access to tell this only applies to 
word_mode?

Thanks,
Richard.

> Thanks,
> Tamar
> > 
> > Otherwise the expr.c change looks ok.
> > 
> > Thanks,
> > Richard.
> > 
> > > Thanks,
> > > Tamar
> > >
> > >
> > > gcc/
> > > 2017-11-14  Tamar Christina  <tamar.christ...@arm.com>
> > >
> > >   * expr.c (copy_blkmode_to_reg): Fix bitsize for targets
> > >   with fast unaligned access.
> > >   * doc/sourcebuild.texi (no_slow_unalign): New.
> > >
> > > gcc/testsuite/
> > > 2017-11-14  Tamar Christina  <tamar.christ...@arm.com>
> > >
> > >   * gcc.dg/struct-simple.c: New.
> > >   * lib/target-supports.exp
> > >   (check_effective_target_no_slow_unalign): New.
> > >
> > >
> > 
> > --
> > Richard Biener <rguent...@suse.de>
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
> > HRB 21284 (AG Nuernberg)
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


RE: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

2017-11-15 Thread Tamar Christina


> -Original Message-
> From: Richard Biener [mailto:rguent...@suse.de]
> Sent: Wednesday, November 15, 2017 08:24
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; l...@redhat.com;
> i...@airs.com
> Subject: Re: [PATCH][GCC][mid-end] Allow larger copies when target
> supports unaligned access [Patch (1/2)]
> 
> On Tue, 14 Nov 2017, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This patch allows larger bitsizes to be used as copy size when the
> > target does not have SLOW_UNALIGNED_ACCESS.
> >
> > fun3:
> > adrp x2, .LANCHOR0
> > add x2, x2, :lo12:.LANCHOR0
> > mov x0, 0
> > sub sp, sp, #16
> > ldrh w1, [x2, 16]
> > ldrb w2, [x2, 18]
> > add sp, sp, 16
> > bfi x0, x1, 0, 8
> > ubfx x1, x1, 8, 8
> > bfi x0, x1, 8, 8
> > bfi x0, x2, 16, 8
> > ret
> >
> > is turned into
> >
> > fun3:
> > adrp x0, .LANCHOR0
> > add x0, x0, :lo12:.LANCHOR0
> > sub sp, sp, #16
> > ldrh w1, [x0, 16]
> > ldrb w0, [x0, 18]
> > strh w1, [sp, 8]
> > strb w0, [sp, 10]
> > ldr w0, [sp, 8]
> > add sp, sp, 16
> > ret
> >
> > which avoids the bfi's for a simple 3 byte struct copy.
> >
> > Regression tested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu and
> no regressions.
> >
> > This patch is just splitting off from the previous combined patch with
> > AArch64 and adding a testcase.
> >
> > I assume Jeff's ACK from
> > https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01523.html is still valid as
> the code did not change.
> 
> Given your no_slow_unalign isn't mode specific can't you use the existing
> non_strict_align?

No because non_strict_align checks if the target supports unaligned access at 
all,

This no_slow_unalign corresponds instead to the target slow_unaligned_access
which checks that the access you want to make has a greater cost than doing an
aligned access. ARM for instance always return 1 (value of STRICT_ALIGNMENT)
for slow_unaligned_access while for non_strict_align it may return 0 or 1 based
on the options provided to the compiler.

The problem is I have no way to test STRICT_ALIGNMENT or slow_unaligned_access
So I had to hardcode some targets that I know it does work on.
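The bitsize choice that falls out of this distinction can be sketched in a few lines. This is a simplified model of the copy_blkmode_to_reg change, not GCC code: plain integers stand in for machine_mode and tree types, and the hook's answer is passed in as a flag:

```c
/* Sketch: pick how many bits copy_blkmode_to_reg copies per iteration.
   When unaligned word accesses are cheap, copy a full word at a time
   even for a type that is only byte- or halfword-aligned; otherwise
   honor the type's alignment, capped at the word size.  */
unsigned int
choose_copy_bitsize (unsigned int type_align_bits,
                     unsigned int bits_per_word,
                     int word_unaligned_is_slow)
{
  if (!word_unaligned_is_slow)
    return bits_per_word;

  return type_align_bits < bits_per_word ? type_align_bits : bits_per_word;
}
```

For a byte-aligned 3-byte struct on a 64-bit target with fast unaligned access this picks 64-bit chunks, which is what removes the per-byte bfi sequence in the example above.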

Thanks,
Tamar
> 
> Otherwise the expr.c change looks ok.
> 
> Thanks,
> Richard.
> 
> > Thanks,
> > Tamar
> >
> >
> > gcc/
> > 2017-11-14  Tamar Christina  <tamar.christ...@arm.com>
> >
> > * expr.c (copy_blkmode_to_reg): Fix bitsize for targets
> > with fast unaligned access.
> > * doc/sourcebuild.texi (no_slow_unalign): New.
> >
> > gcc/testsuite/
> > 2017-11-14  Tamar Christina  <tamar.christ...@arm.com>
> >
> > * gcc.dg/struct-simple.c: New.
> > * lib/target-supports.exp
> > (check_effective_target_no_slow_unalign): New.
> >
> >
> 
> --
> Richard Biener <rguent...@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
> HRB 21284 (AG Nuernberg)


Re: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

2017-11-15 Thread Richard Biener
On Tue, 14 Nov 2017, Tamar Christina wrote:

> Hi All,
> 
> This patch allows larger bitsizes to be used as copy size
> when the target does not have SLOW_UNALIGNED_ACCESS.
> 
> fun3:
>   adrp x2, .LANCHOR0
>   add x2, x2, :lo12:.LANCHOR0
>   mov x0, 0
>   sub sp, sp, #16
>   ldrh w1, [x2, 16]
>   ldrb w2, [x2, 18]
>   add sp, sp, 16
>   bfi x0, x1, 0, 8
>   ubfx x1, x1, 8, 8
>   bfi x0, x1, 8, 8
>   bfi x0, x2, 16, 8
>   ret
>
> is turned into
>
> fun3:
>   adrp x0, .LANCHOR0
>   add x0, x0, :lo12:.LANCHOR0
>   sub sp, sp, #16
>   ldrh w1, [x0, 16]
>   ldrb w0, [x0, 18]
>   strh w1, [sp, 8]
>   strb w0, [sp, 10]
>   ldr w0, [sp, 8]
>   add sp, sp, 16
>   ret
> 
> which avoids the bfi's for a simple 3 byte struct copy.
> 
> Regression tested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu and no 
> regressions.
> 
> This patch is just splitting off from the previous combined patch with 
> AArch64 and adding
> a testcase.
> 
> I assume Jeff's ACK from 
> https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01523.html is still
> valid as the code did not change.

Given your no_slow_unalign isn't mode specific can't you use the existing
non_strict_align?

Otherwise the expr.c change looks ok.

Thanks,
Richard.

> Thanks,
> Tamar
> 
> 
> gcc/
> 2017-11-14  Tamar Christina  
> 
>   * expr.c (copy_blkmode_to_reg): Fix bitsize for targets
>   with fast unaligned access.
>   * doc/sourcebuild.texi (no_slow_unalign): New.
>   
> gcc/testsuite/
> 2017-11-14  Tamar Christina  
> 
>   * gcc.dg/struct-simple.c: New.
>   * lib/target-supports.exp
>   (check_effective_target_no_slow_unalign): New.
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)