Re: [PATCH][rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1

2015-01-21 Thread Richard Sandiford
Eric Botcazou ebotca...@adacore.com writes:
 Seems like the thread might have died down, so just wanted to ping it.
 As Marcus says, this is holding up other patches so it'd be good to get
 something in soon.  Would it be OK to commit the original patch or should
 we wait?

 Yes, go ahead, but add a FIXME or ??? comment.

Thanks, here's what I installed.

Richard


gcc/
2015-01-25  Alan Hayward  alan.hayw...@arm.com

* rtlanal.c (subreg_get_info): Exit early for simple and common
cases.

Index: gcc/rtlanal.c
===
--- gcc/rtlanal.c   2015-01-21 17:06:33.594664514 +
+++ gcc/rtlanal.c   2015-01-21 17:10:22.863830284 +
@@ -3440,6 +3440,22 @@ subreg_get_info (unsigned int xregno, ma
  info-offset = offset / regsize_xmode;
  return;
}
+  /* Quick exit for the simple and common case of extracting whole
+subregisters from a multiregister value.  */
+  /* ??? It would be better to integrate this into the code below,
+if we can generalize the concept enough and figure out how
+odd-sized modes can coexist with the other weird cases we support.  */
+  if (!rknown
+  WORDS_BIG_ENDIAN == REG_WORDS_BIG_ENDIAN
+  regsize_xmode == regsize_ymode
+  (offset % regsize_ymode) == 0)
+   {
+ info-representable_p = true;
+ info-nregs = nregs_ymode;
+ info-offset = offset / regsize_ymode;
+ gcc_assert (info-offset + info-nregs = nregs_xmode);
+ return;
+   }
 }
 
   /* Lowpart subregs are otherwise valid.  */



Re: [PATCH][rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1

2015-01-20 Thread Richard Sandiford
Seems like the thread might have died down, so just wanted to ping it.
As Marcus says, this is holding up other patches so it'd be good to get
something in soon.  Would it be OK to commit the original patch or should
we wait?

Marcus Shawcroft marcus.shawcr...@gmail.com writes:
 On 14 January 2015 at 07:35, Jeff Law l...@redhat.com wrote:
 On 01/13/15 11:55, Eric Botcazou wrote:


 (1) we have a non-paradoxical subreg;
 (2) both (reg:ymode xregno) and (reg:xmode xregno) occupy full
  hard registers (no padding or unused upper bits);
 (3) (reg:ymode xregno) and (reg:xmode xregno) store the same number
  of bytes (X) in each constituent hard register;
 (4) the offset is a multiple of X, i.e. the data we're accessing
  is aligned to a register boundary; and
 (5) endianness is regular (no differences between words and bytes,
  or between registers and memory)


 OK, that's a nice translation of the new code. :-)

 It seems to me that the patch wants to extend the support of generic
 subregs
 to modes whose sizes are not multiple of each other, which is a
 requirement of
 the existing code, but does that in a very specific case for the sake of
 the
 ARM port without saying where all the above restrictions come from.

 Basically we're lifting the restriction that the the sizes are multiples of
 each other.  The requirements above are the set where we know it will work.
 They are target independent, but happen to match what the ARM needs.

 The certainly do short circuit the meat of the function, that's the whole
 point, there's this set of conditions under which we know this will work and
 when they hold, we bypass.

 Now one could argue that instead of bypassing we should put the code to
 handle this situation further down.  I'd be leery of doing that just from a
 complexity standpoint.  But one could also argue that short circuiting like
 the patch does adds complexity as well and may be a bit kludgy.

Yeah, I'm worried about the complexity too.  We allow subregs that
have padding and subregs where the number of bytes in the mode doesn't
divide equally between the number of registers.  We also have subregs where
a DImode value in R can take a different number of registers from a DFmode
value in R, despite the two modes having the same number of bits.  I've
no idea how we'd generalise the code so that those cases and the new one
just fall out as particular inputs to an overarching equation.  Or how
we make sure that the equation doesn't give nonsense results for cases
that would be better off triggering an abort (e.g. DFmode subregs of
CImode when DImode and DFmode occupy different numbers of registers).

I don't think we want to allow subregs in all cases where there is
padding.  We hit a similar case with 8-byte subregs of 24-byte values
stored in 16-byte registers (DImode, EImode and TImode respectively).
That doesn't do what we want because all three DImode pieces of the
EImode aren't independently addressable, so the abort actually helped.

TBH I find even the current code too hard to understand.  I can plug
specific inputs in and follow what happens, but I don't have a feel for
why that's the right way of handling all possible inputs.

In some ways I think we've made life hard for ourselves by trying to
implement all these rules in a target-independent way.  Subregs on
memory are easy (even though we should generally be avoiding them :-):
they start SUBREG_BYTE bytes into the MEM and occupy the number of bytes
in the outer mode.  At least AFAIK, we never have situation where an N-bit
float can occupy a different number of memory bytes from an N-bit integer.
But treating REGs as an image of memory (which I think is effectively
what we're doing) has caused problems.

As well as being complicated, doing things this way is pretty restrictive.
One of the main uses of CANNOT_CHANGE_MODE_CLASS seems to be to work
around cases where the generic rules get it wrong.  Sometimes it seems
like it would be better to let the target define which subregs it can
form and on which registers.  It would be less complicated, more general,
nd (in cases where it allows C_C_M_C to be removed) hopefully more optimal.

 Maybe the way forward here is for someone to try and integrate this support
 in the main part of the code and see how it looks.  Then we can pick one.

I still have a mental block on how to do that :-)

 The downside is since this probably isn't a regression that work would need
 to happen quickly to make it into gcc-5.

 Which leads to another option, get the release managers to sign off on the
 kludge after gcc-5 branches and only install the kludge on the gcc-5 branch
 and insisting the other solution go in for gcc-6 and beyond.  Not sure if
 they'd do that, but it's a discussion that could happen.

 This issue is currently gating a number of patches that get big endian
 working on aarch64 (all of which are on the list), it would be good if
 we could get this addressed in some form in gcc-5 

Re: [PATCH][rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1

2015-01-20 Thread Eric Botcazou
 Seems like the thread might have died down, so just wanted to ping it.
 As Marcus says, this is holding up other patches so it'd be good to get
 something in soon.  Would it be OK to commit the original patch or should
 we wait?

Yes, go ahead, but add a FIXME or ??? comment.

-- 
Eric Botcazou


Re: [PATCH][rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1

2015-01-14 Thread Marcus Shawcroft
On 14 January 2015 at 07:35, Jeff Law l...@redhat.com wrote:
 On 01/13/15 11:55, Eric Botcazou wrote:


 (1) we have a non-paradoxical subreg;
 (2) both (reg:ymode xregno) and (reg:xmode xregno) occupy full
  hard registers (no padding or unused upper bits);
 (3) (reg:ymode xregno) and (reg:xmode xregno) store the same number
  of bytes (X) in each constituent hard register;
 (4) the offset is a multiple of X, i.e. the data we're accessing
  is aligned to a register boundary; and
 (5) endianness is regular (no differences between words and bytes,
  or between registers and memory)


 OK, that's a nice translation of the new code. :-)

 It seems to me that the patch wants to extend the support of generic
 subregs
 to modes whose sizes are not multiple of each other, which is a
 requirement of
 the existing code, but does that in a very specific case for the sake of
 the
 ARM port without saying where all the above restrictions come from.

 Basically we're lifting the restriction that the the sizes are multiples of
 each other.  The requirements above are the set where we know it will work.
 They are target independent, but happen to match what the ARM needs.

 The certainly do short circuit the meat of the function, that's the whole
 point, there's this set of conditions under which we know this will work and
 when they hold, we bypass.

 Now one could argue that instead of bypassing we should put the code to
 handle this situation further down.  I'd be leery of doing that just from a
 complexity standpoint.  But one could also argue that short circuiting like
 the patch does adds complexity as well and may be a bit kludgy.

 Maybe the way forward here is for someone to try and integrate this support
 in the main part of the code and see how it looks.  Then we can pick one.

 The downside is since this probably isn't a regression that work would need
 to happen quickly to make it into gcc-5.

 Which leads to another option, get the release managers to sign off on the
 kludge after gcc-5 branches and only install the kludge on the gcc-5 branch
 and insisting the other solution go in for gcc-6 and beyond.  Not sure if
 they'd do that, but it's a discussion that could happen.

This issue is currently gating a number of patches that get big endian
working on aarch64 (all of which are on the list), it would be good if
we could get this addressed in some form in gcc-5 rather than put out
a second release of gcc with borked BE aarch64 support.

Cheers
/Marcus


Re: [PATCH][rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1

2015-01-13 Thread Jeff Law

On 01/10/15 06:05, Richard Sandiford wrote:

Sorry for the slow response.  Jeff has approved the patch in the
meantime, but I didn't want to go ahead and apply it while there
was still disagreement...
Thanks.  I didn't realize there was a disagreement when I approved. 
Let's continue to hash this out a bit in the hopes that we can all get 
to a place where we're comfortable with the final change, whatever it 
happens to me.


jeff



Re: [PATCH][rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1

2015-01-13 Thread Jeff Law

On 01/13/15 11:55, Eric Botcazou wrote:



(1) we have a non-paradoxical subreg;
(2) both (reg:ymode xregno) and (reg:xmode xregno) occupy full
 hard registers (no padding or unused upper bits);
(3) (reg:ymode xregno) and (reg:xmode xregno) store the same number
 of bytes (X) in each constituent hard register;
(4) the offset is a multiple of X, i.e. the data we're accessing
 is aligned to a register boundary; and
(5) endianness is regular (no differences between words and bytes,
 or between registers and memory)


OK, that's a nice translation of the new code. :-)

It seems to me that the patch wants to extend the support of generic subregs
to modes whose sizes are not multiple of each other, which is a requirement of
the existing code, but does that in a very specific case for the sake of the
ARM port without saying where all the above restrictions come from.
Basically we're lifting the restriction that the the sizes are multiples 
of each other.  The requirements above are the set where we know it will 
work.  They are target independent, but happen to match what the ARM needs.


The certainly do short circuit the meat of the function, that's the 
whole point, there's this set of conditions under which we know this 
will work and when they hold, we bypass.


Now one could argue that instead of bypassing we should put the code to 
handle this situation further down.  I'd be leery of doing that just 
from a complexity standpoint.  But one could also argue that short 
circuiting like the patch does adds complexity as well and may be a bit 
kludgy.


Maybe the way forward here is for someone to try and integrate this 
support in the main part of the code and see how it looks.  Then we can 
pick one.


The downside is since this probably isn't a regression that work would 
need to happen quickly to make it into gcc-5.


Which leads to another option, get the release managers to sign off on 
the kludge after gcc-5 branches and only install the kludge on the gcc-5 
branch and insisting the other solution go in for gcc-6 and beyond.  Not 
sure if they'd do that, but it's a discussion that could happen.



jeff


Re: [PATCH][rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1

2015-01-13 Thread Eric Botcazou
 Sorry for the slow response.  Jeff has approved the patch in the
 meantime, but I didn't want to go ahead and apply it while there
 was still disagreement...

I still think that it isn't appropriate to short-circuit the main computation 
as the patch does, but I don't want to block it after Jeff's approval.

 (1) we have a non-paradoxical subreg;
 (2) both (reg:ymode xregno) and (reg:xmode xregno) occupy full
 hard registers (no padding or unused upper bits);
 (3) (reg:ymode xregno) and (reg:xmode xregno) store the same number
 of bytes (X) in each constituent hard register;
 (4) the offset is a multiple of X, i.e. the data we're accessing
 is aligned to a register boundary; and
 (5) endianness is regular (no differences between words and bytes,
 or between registers and memory)

OK, that's a nice translation of the new code. :-)

It seems to me that the patch wants to extend the support of generic subregs 
to modes whose sizes are not multiple of each other, which is a requirement of 
the existing code, but does that in a very specific case for the sake of the 
ARM port without saying where all the above restrictions come from.

-- 
Eric Botcazou


Re: [PATCH][rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1

2015-01-10 Thread Richard Sandiford
Sorry for the slow response.  Jeff has approved the patch in the
meantime, but I didn't want to go ahead and apply it while there
was still disagreement...

Eric Botcazou ebotca...@adacore.com writes:
 Please be more specific though.  If you don't think the patch is correct,
 what do you think the requirement should be and how should it be integrated
 into the existing checks?

 Good question, but I have asked it first. :-)

 So what are the new subregs that we want to accept here?  Can someone write 
 this down explicitly, I think that we cannot go ahead without that.

The idea is that if:

(1) we have a non-paradoxical subreg;
(2) both (reg:ymode xregno) and (reg:xmode xregno) occupy full
hard registers (no padding or unused upper bits);
(3) (reg:ymode xregno) and (reg:xmode xregno) store the same number
of bytes (X) in each constituent hard register;
(4) the offset is a multiple of X, i.e. the data we're accessing
is aligned to a register boundary; and
(5) endianness is regular (no differences between words and bytes,
or between registers and memory)

then the register offset is always the byte offset divided by X.

 E.g. the assert is there because the main calculation is based on:
 
   /* Size of ymode must not be greater than the size of xmode.  */
   mode_multiple = GET_MODE_SIZE (xmode) / GET_MODE_SIZE (ymode);
   gcc_assert (mode_multiple != 0);
 
 which clearly isn't a useful value if the division isn't exact.
 Do you mean that, since mode_multiple isn't correct for the
 DI-of-a-CI case, we should reformulate the end of the function
 to avoid using mode_multiple at all?

 Yes.

It's not really obvious to me how to do that though.  Maybe I just
don't understand the cases that the existing code is trying to handle
well enough.

Thanks,
Richard


Re: [PATCH][rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1

2015-01-08 Thread Jeff Law

On 12/12/14 04:07, Alan Hayward wrote:

[Cleaning this thread up to submit patch again, with better explanation]

This patch causes subreg_get_info() to exit early in the simple cases
where we are extracting a whole register from a multi register.

In aarch64 for Big Endian we were producing a subreg of a OImode (256bits)
from a CImode (384bits) This would hit the following assert in
subreg_get_info:

gcc_assert ((GET_MODE_SIZE (xmode) % GET_MODE_SIZE (ymode)) == 0);


This is a rule we should be able to relax a little - if the subreg we want
fits into a whole register then this is a valid result and can be easily
detected earlier in the function.

This has the bonus that we should be slightly reducing the execution time
for more common cases, for example a subreg of 64bits from 256bits.

This patch is required for the second part of the patch, which is aarch64
specific, and fixes up aarch64 Big Endian movoi/ci/xi. This second part
has already been approved.

This patch will apply cleanly by itself and no regressions were seen when
testing aarch64 and x86_64 on make check.

Cheers,
Alan


Changelog:

2014-11-14  Alan Hayward  alan.hayw...@arm.com

 * rtlanal.c
 (subreg_get_info): Exit early for simple and common cases

ChangeLog nit.  Should be:

* rtlanal.c (subreg_get_info): Exit early for simple and common
cases.

With ChangeLog fixed, this patch is OK for the trunk.


If you have a testcase, please submit it as well.

Thanks for your patience,
Jeff




Re: [PATCH][rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1

2014-12-19 Thread Eric Botcazou
 Please be more specific though.  If you don't think the patch is correct,
 what do you think the requirement should be and how should it be integrated
 into the existing checks?

Good question, but I have asked it first. :-)

So what are the new subregs that we want to accept here?  Can someone write 
this down explicitly, I think that we cannot go ahead without that.

 E.g. the assert is there because the main calculation is based on:
 
   /* Size of ymode must not be greater than the size of xmode.  */
   mode_multiple = GET_MODE_SIZE (xmode) / GET_MODE_SIZE (ymode);
   gcc_assert (mode_multiple != 0);
 
 which clearly isn't a useful value if the division isn't exact.
 Do you mean that, since mode_multiple isn't correct for the
 DI-of-a-CI case, we should reformulate the end of the function
 to avoid using mode_multiple at all?

Yes.

-- 
Eric Botcazou


Re: [PATCH][rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1

2014-12-15 Thread Richard Sandiford
Eric Botcazou ebotca...@adacore.com writes:
 FWIW I agree this is the right approach, although I can't approve it.
 The assert above is guarding code that deals with a very general case,
 including some unusual combinations, so I don't think it would be a
 good idea to try to remove it entirely.

 Yes, but the patch is a bit of kludge since it short-circuits the meat of the 
 function:

/* This should always pass, otherwise we don't know how to verify
  the constraint.  These conditions may be relaxed but
  subreg_regno_offset would need to be redesigned.  */
   gcc_assert ((GET_MODE_SIZE (xmode) % GET_MODE_SIZE (ymode)) == 0);
   gcc_assert ((nregs_xmode % nregs_ymode) == 0);

 So what would it take to do things properly here, i.e. relax the conditions 
 and adjust downstream?

What do you think we should relax it to though?  Obviously there's a balance
here between relaxing things enough and not relaxing them too far (so that
the EImode AArch64 thing I mentioned is still a noisy failure, for example).
ISTM the patch deals with the only significant case that is obviously safe
for modes that are not a power of 2 in size.

If you're saying that the condition itself is OK, but that the code should
be further down in the function, then I don't think that would gain much.
We already have early-outs for the simple cases, such as:

  /* Paradoxical subregs are otherwise valid.  */
  if (!rknown
   offset == 0
   GET_MODE_PRECISION (ymode)  GET_MODE_PRECISION (xmode))
{
  info-representable_p = true;
  /* If this is a big endian paradoxical subreg, which uses more
 actual hard registers than the original register, we must
 return a negative offset so that we find the proper highpart
 of the register.  */
  if (GET_MODE_SIZE (ymode)  UNITS_PER_WORD
  ? REG_WORDS_BIG_ENDIAN : BYTES_BIG_ENDIAN)
info-offset = nregs_xmode - nregs_ymode;
  else
info-offset = 0;
  info-nregs = nregs_ymode;
  return;
}
  [...]
  /* Lowpart subregs are otherwise valid.  */
  if (!rknown  offset == subreg_lowpart_offset (ymode, xmode))
{
  info-representable_p = true;
  rknown = true;

  if (offset == 0 || nregs_xmode == nregs_ymode)
{
  info-offset = 0;
  info-nregs = nregs_ymode;
  return;
}
}

which also come before the assert.

Thanks,
Richard



Re: [PATCH][rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1

2014-12-15 Thread Eric Botcazou
 What do you think we should relax it to though?  Obviously there's a balance
 here between relaxing things enough and not relaxing them too far (so that
 the EImode AArch64 thing I mentioned is still a noisy failure, for
 example). ISTM the patch deals with the only significant case that is
 obviously safe for modes that are not a power of 2 in size.

Apparently the change wants to accept general subregs with not only modes 
whose sizes are multiple of each other but also whose sizes are multiple of a 
common large enough value.  That clearly goes against:

  /* This should always pass, otherwise we don't know how to verify
 the constraint.  These conditions may be relaxed but
 subreg_regno_offset would need to be redesigned.  */
  gcc_assert ((GET_MODE_SIZE (xmode) % GET_MODE_SIZE (ymode)) == 0);
  gcc_assert ((nregs_xmode % nregs_ymode) == 0);

so I think that we should formulate the new requirement and implement it in 
the main part of the function, instead of adding it as a kludge.

 If you're saying that the condition itself is OK, but that the code should
 be further down in the function, then I don't think that would gain much.
 We already have early-outs for the simple cases, such as:

Right, but they are more of special cases and this one is not.

-- 
Eric Botcazou


Re: [PATCH][rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1

2014-12-15 Thread Richard Sandiford
Eric Botcazou ebotca...@adacore.com writes:
 What do you think we should relax it to though?  Obviously there's a balance
 here between relaxing things enough and not relaxing them too far (so that
 the EImode AArch64 thing I mentioned is still a noisy failure, for
 example). ISTM the patch deals with the only significant case that is
 obviously safe for modes that are not a power of 2 in size.

 Apparently the change wants to accept general subregs with not only modes 
 whose sizes are multiple of each other but also whose sizes are multiple of a 
 common large enough value.  That clearly goes against:

   /* This should always pass, otherwise we don't know how to verify
  the constraint.  These conditions may be relaxed but
  subreg_regno_offset would need to be redesigned.  */
   gcc_assert ((GET_MODE_SIZE (xmode) % GET_MODE_SIZE (ymode)) == 0);
   gcc_assert ((nregs_xmode % nregs_ymode) == 0);

 so I think that we should formulate the new requirement and implement it in 
 the main part of the function, instead of adding it as a kludge.

Please be more specific though.  If you don't think the patch is correct,
what do you think the requirement should be and how should it be integrated
into the existing checks?

E.g. the assert is there because the main calculation is based on:

  /* Size of ymode must not be greater than the size of xmode.  */
  mode_multiple = GET_MODE_SIZE (xmode) / GET_MODE_SIZE (ymode);
  gcc_assert (mode_multiple != 0);

which clearly isn't a useful value if the division isn't exact.
Do you mean that, since mode_multiple isn't correct for the
DI-of-a-CI case, we should reformulate the end of the function
to avoid using mode_multiple at all?

Thanks,
Richard


Re: [PATCH][rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1

2014-12-13 Thread Eric Botcazou
 FWIW I agree this is the right approach, although I can't approve it.
 The assert above is guarding code that deals with a very general case,
 including some unusual combinations, so I don't think it would be a
 good idea to try to remove it entirely.

Yes, but the patch is a bit of kludge since it short-circuits the meat of the 
function:

   /* This should always pass, otherwise we don't know how to verify
 the constraint.  These conditions may be relaxed but
 subreg_regno_offset would need to be redesigned.  */
  gcc_assert ((GET_MODE_SIZE (xmode) % GET_MODE_SIZE (ymode)) == 0);
  gcc_assert ((nregs_xmode % nregs_ymode) == 0);

So what would it take to do things properly here, i.e. relax the conditions 
and adjust downstream?

-- 
Eric Botcazou


[PATCH][rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1

2014-12-12 Thread Alan Hayward
[Cleaning this thread up to submit patch again, with better explanation]

This patch causes subreg_get_info() to exit early in the simple cases
where we are extracting a whole register from a multi register.

In aarch64 for Big Endian we were producing a subreg of a OImode (256bits)
from a CImode (384bits) This would hit the following assert in
subreg_get_info:

gcc_assert ((GET_MODE_SIZE (xmode) % GET_MODE_SIZE (ymode)) == 0);


This is a rule we should be able to relax a little - if the subreg we want
fits into a whole register then this is a valid result and can be easily
detected earlier in the function.

This has the bonus that we should be slightly reducing the execution time
for more common cases, for example a subreg of 64bits from 256bits.

This patch is required for the second part of the patch, which is aarch64
specific, and fixes up aarch64 Big Endian movoi/ci/xi. This second part
has already been approved.

This patch will apply cleanly by itself and no regressions were seen when
testing aarch64 and x86_64 on make check.

Cheers,
Alan


Changelog:

2014-11-14  Alan Hayward  alan.hayw...@arm.com

* rtlanal.c
(subreg_get_info): Exit early for simple and common cases



---
gcc/rtlanal.c | 13 +
1 file changed, 13 insertions(+)

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index c9bf69c..a3f7b78 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -3561,6 +3561,19 @@ subreg_get_info (unsigned int xregno, machine_mode
xmode,
  info-offset = offset / regsize_xmode;
  return;
}
+   /* Quick exit for the simple and common case of extracting whole
+  subregisters from a multiregister value.  */
+  if (!rknown
+  WORDS_BIG_ENDIAN == REG_WORDS_BIG_ENDIAN
+  regsize_xmode == regsize_ymode
+  (offset % regsize_ymode) == 0)
+   {
+ info-representable_p = true;
+ info-nregs = nregs_ymode;
+ info-offset = offset / regsize_ymode;
+ gcc_assert (info-offset + info-nregs = nregs_xmode);
+ return;
+   }
 }
   /* Lowpart subregs are otherwise valid.  */
--
1.9.1



0001-BE-fix-load-stores.-Common-code.patch
Description: Binary data


Re: [PATCH][rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1

2014-12-12 Thread Richard Sandiford
Alan Hayward alan.hayw...@arm.com writes:
 [Cleaning this thread up to submit patch again, with better explanation]

 This patch causes subreg_get_info() to exit early in the simple cases
 where we are extracting a whole register from a multi register.

 In aarch64 for Big Endian we were producing a subreg of a OImode (256bits)
 from a CImode (384bits) This would hit the following assert in
 subreg_get_info:

 gcc_assert ((GET_MODE_SIZE (xmode) % GET_MODE_SIZE (ymode)) == 0);


 This is a rule we should be able to relax a little - if the subreg we want
 fits into a whole register then this is a valid result and can be easily
 detected earlier in the function.

 This has the bonus that we should be slightly reducing the execution time
 for more common cases, for example a subreg of 64bits from 256bits.

 This patch is required for the second part of the patch, which is aarch64
 specific, and fixes up aarch64 Big Endian movoi/ci/xi. This second part
 has already been approved.

 This patch will apply cleanly by itself and no regressions were seen when
 testing aarch64 and x86_64 on make check.

FWIW I agree this is the right approach, although I can't approve it.
The assert above is guarding code that deals with a very general case,
including some unusual combinations, so I don't think it would be a
good idea to try to remove it entirely.

E.g. Tejas hit the same assert because we were trying to create subregs
of EImode SIMD registers on AArch64.  EImode is 24 bytes, so it's
one-*and-a-half* SIMD registers.  Taking subregs of something like that is
very dangerous and I think we want the assert to continue to trigger there.
This patch deals with a much simpler and more obvious case.

Thanks,
Richard



Re: [PATCH][rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1

2014-12-10 Thread Alan Hayward

On 02/12/2014 12:36, Alan Hayward alan.hayw...@arm.com wrote:


On 21/11/2014 14:08, Alan Hayward alan.hayw...@arm.com wrote:


On 14/11/2014 16:48, Alan Hayward alan.hayw...@arm.com wrote:

This is a new version of my BE patch from a few weeks ago.
This is part 1 and covers rtlanal.c. The second part will be aarch64
specific.

When combined with the second patch, It fixes up movoi/ci/xi for Big
Endian, so that we end up with the lab of a big-endian integer to be in
the low byte of the highest-numbered register.

This will apply cleanly by itself and no regressions were seen when
testing aarch64 and x86_64 on make check.


Changelog:

2014-11-14  Alan Hayward  alan.hayw...@arm.com

* rtlanal.c
(subreg_get_info): Exit early for simple and common cases


Alan.

Hi,

The second part to this patch (aarch64 specific) has been approved.


Could someone review this one please.


Thanks,
Alan.


Ping.


Thanks,
Alan.


Ping ping.

Thanks,
Alan.


0001-BE-fix-load-stores.-Common-code.patch
Description: Binary data


Re: [PATCH][rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1

2014-12-10 Thread Marcus Shawcroft
On 10 December 2014 at 09:51, Alan Hayward alan.hayw...@arm.com wrote:

This is a new version of my BE patch from a few weeks ago.
This is part 1 and covers rtlanal.c. The second part will be aarch64
specific.

When combined with the second patch, It fixes up movoi/ci/xi for Big
Endian, so that we end up with the lab of a big-endian integer to be in
the low byte of the highest-numbered register.

This will apply cleanly by itself and no regressions were seen when
testing aarch64 and x86_64 on make check.

Hi Alan, I'm not a maintainer in this area of the compiler... however
I would suggest that the relevant maintainers of the middle end
rtlanal.c etc are more likely to pick up this review if the text of
the email focussed on how rtlanal.c is broken and why the proposed
patch is the right way to go.  You should also CC one of the
maintainers...

Cheers
/Marcus


Re: [rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1

2014-12-02 Thread Alan Hayward

On 21/11/2014 14:08, Alan Hayward alan.hayw...@arm.com wrote:


On 14/11/2014 16:48, Alan Hayward alan.hayw...@arm.com wrote:

This is a new version of my BE patch from a few weeks ago.
This is part 1 and covers rtlanal.c. The second part will be aarch64
specific.

When combined with the second patch, It fixes up movoi/ci/xi for Big
Endian, so that we end up with the lab of a big-endian integer to be in
the low byte of the highest-numbered register.

This will apply cleanly by itself and no regressions were seen when
testing aarch64 and x86_64 on make check.


Changelog:

2014-11-14  Alan Hayward  alan.hayw...@arm.com

* rtlanal.c
(subreg_get_info): Exit early for simple and common cases


Alan.

Hi,

The second part to this patch (aarch64 specific) has been approved.


Could someone review this one please.


Thanks,
Alan.


Ping.


Thanks,
Alan.


0001-BE-fix-load-stores.-Common-code.patch
Description: Binary data


Re: [rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1

2014-11-21 Thread Alan Hayward

On 14/11/2014 16:48, Alan Hayward alan.hayw...@arm.com wrote:

This is a new version of my BE patch from a few weeks ago.
This is part 1 and covers rtlanal.c. The second part will be aarch64
specific.

When combined with the second patch, It fixes up movoi/ci/xi for Big
Endian, so that we end up with the lab of a big-endian integer to be in
the low byte of the highest-numbered register.

This will apply cleanly by itself and no regressions were seen when
testing aarch64 and x86_64 on make check.


Changelog:

2014-11-14  Alan Hayward  alan.hayw...@arm.com

* rtlanal.c
(subreg_get_info): Exit early for simple and common cases


Alan.

Hi,

The second part to this patch (aarch64 specific) has been approved.


Could someone review this one please.


Thanks,
Alan.





[rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1

2014-11-14 Thread Alan Hayward
This is a new version of my BE patch from a few weeks ago.
This is part 1 and covers rtlanal.c. The second part will be aarch64
specific.

When combined with the second patch, It fixes up movoi/ci/xi for Big
Endian, so that we end up with the lab of a big-endian integer to be in
the low byte of the highest-numbered register.

This will apply cleanly by itself and no regressions were seen when
testing aarch64 and x86_64 on make check.


Changelog:

2014-11-14  Alan Hayward  alan.hayw...@arm.com

* rtlanal.c
(subreg_get_info): Exit early for simple and common cases


Alan.



0001-BE-fix-load-stores.-Common-code.patch
Description: Binary data