Re: Question about make_extraction() in combine.c

2019-08-29 Thread Michael Eager

On 8/28/19 12:33 PM, Segher Boessenkool wrote:

Hi!

On Tue, Aug 27, 2019 at 09:37:59AM -0700, Michael Eager wrote:

Combine is complex, but I don't think that target descriptions should
conform to its behaviors;


But they have to, in some ways.  If combine writes something that can be
written in multiple ways in some way X, then your machine description has
to recognise X (perhaps in addition to other ways it can be written), or
you will not get as much optimisation as you might like: some combine
attempts will fail.


combine should adapt to the target.


How?


By not making arbitrary restrictions on the instructions which a target 
can implement, simply to avoid a bug in a different target.


The target has an instruction which can extract a bit (or bit field) 
from a word in memory.  The code in combine prevents that instruction 
from being used without creating awkward workarounds.





diff --git a/gcc/combine.c b/gcc/combine.c
index 93bd3da26d7..fdc79ab7d3e 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -,17 +,7 @@ make_extraction (machine_mode mode, rtx inner, 
HOST_WIDE_INT pos,


This patch is against some older version of combine.c?  The line number is
off by 70 or so.


The patch was created last November.


&& partial_subreg_p (extraction_mode, mode))
  extraction_mode = mode;


And current trunk has here

   /* Punt if len is too large for extraction_mode.  */
   if (maybe_gt (len, GET_MODE_PRECISION (extraction_mode)))
 return NULL_RTX;

(See r268913, https://gcc.gnu.org/ml/gcc-patches/2019-02/msg01140.html ).


This patch seems unrelated.


Does that fix your problem already?  Is more needed?  Is your patch
removing some now-dead code?


If you are asking, does my patch remove code that is no longer needed to 
fix the decades-old problem which it fixed (or hid)?  I suspect it does. 
 But this cannot be verified.  There's no test case for the original 
problem, nor is it clear which architecture had the problem.


--
Michael Eagerea...@eagerm.com
1960 Park Blvd., Palo Alto, CA 94306


Re: Question about make_extraction() in combine.c

2019-08-28 Thread Segher Boessenkool
Hi!

On Tue, Aug 27, 2019 at 09:37:59AM -0700, Michael Eager wrote:
> Combine is complex, but I don't think that target descriptions should 
> conform to its behaviors;

But they have to, in some ways.  If combine writes something that can be
written in multiple ways in some way X, then your machine description has
to recognise X (perhaps in addition to other ways it can be written), or
you will not get as much optimisation as you might like: some combine
attempts will fail.

> combine should adapt to the target.

How?


> diff --git a/gcc/combine.c b/gcc/combine.c
> index 93bd3da26d7..fdc79ab7d3e 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -,17 +,7 @@ make_extraction (machine_mode mode, rtx inner, 
> HOST_WIDE_INT pos,

This patch is against some older version of combine.c?  The line number is
off by 70 or so.

>&& partial_subreg_p (extraction_mode, mode))
>  extraction_mode = mode;

And current trunk has here

  /* Punt if len is too large for extraction_mode.  */
  if (maybe_gt (len, GET_MODE_PRECISION (extraction_mode)))
return NULL_RTX;

(See r268913, https://gcc.gnu.org/ml/gcc-patches/2019-02/msg01140.html ).

Does that fix your problem already?  Is more needed?  Is your patch
removing some now-dead code?

> -  if (!MEM_P (inner))
> -wanted_inner_mode = wanted_inner_reg_mode;
> -  else
> -{
> -  /* Be careful not to go beyond the extracted object and maintain the
> -  natural alignment of the memory.  */
> -  wanted_inner_mode = smallest_int_mode_for_size (len);
> -  while (pos % GET_MODE_BITSIZE (wanted_inner_mode) + len
> -  > GET_MODE_BITSIZE (wanted_inner_mode))
> - wanted_inner_mode = GET_MODE_WIDER_MODE (wanted_inner_mode).require ();
> -}
> +  wanted_inner_mode = wanted_inner_reg_mode;


Segher


Re: Question about make_extraction() in combine.c

2019-08-27 Thread Michael Eager



On 8/20/19 4:07 PM, Jeff Law wrote:

On 11/20/18 4:39 PM, Michael Eager wrote:

On 11/20/2018 03:10 PM, Jeff Law wrote:

On 11/20/18 11:07 AM, Michael Eager wrote:

On 11/18/2018 08:14 AM, Jeff Law wrote:

On 11/18/18 8:44 AM, Michael Eager wrote:

On 11/16/18 14:50, Segher Boessenkool wrote:

Hi!

On Wed, Nov 14, 2018 at 11:22:58AM -0800, Michael Eager wrote:

The (mem:SI) is converted to (mem:QI).

The return from make_extract() is
   (zero_extract:SI (mem:QI (reg/v/f:SI 50 [ gp ]))
  (const_int 1 [0x1])
  (const_int 2 [0x2]))

The target has an instruction pattern for zero_extract, but it
matches
SI values, not QI.  So the instruction which implements a test of a
bit
flag is never generated.


The RTL documentation says:

  If @var{loc} is in memory, its mode must be a single-byte
integer
mode.

But obviously various ports use other modes, too.  I wonder if that
ever
worked well.


Thanks.  I hadn't noticed this.

Does anyone have any idea why there is a restriction on the mode?
Other instruction patterns don't place arbitrary restriction on the
memory access mode.

Not offhand.   BUt it's been around for a long time (at least since the
early 90s).  I stumbled over it several times through the years.  It
could well be an artifact of the m68k or the vax.


The internal RTL should not be dictating what the target arch can or
cannot implement.  Reload should insert any needed conversions,
especially ones which narrow the size.

I have a patch which removes this conversion.  Works for my target.  I
built and ran 'make check' for x86 with no additional failures.  I don't
have a VAX or M68K sitting around to test.  :-)

I can do correctness tests for m68k via qemu.  Essentially verifying it
still bootstraps:-)  Just pass along a git formatted patch and I can
throw it into the tester.


Here's the patch.  I'll edit the doc if this works for 68K.

So I'm finally coming back to this.  The test certainly bootstraps on
the m68k and various other targets.  It doesn't seem to cause any
regressions on the various targets that get tested.

I wandered the backends and uses of zero_extract with MEMs not in QImode
are actually relatively rare and not likely things that are being
exercised thoroughly.  Ironically the vax seems to have the most of
these kinds of patterns.

After re-reviewing Paulo's changes from 2006 as well as an old thread
from the gcc-2 era, I'm even more concerned than I was before WRT this
change potentially allowing a read beyond the boundary of the original
object, particularly on targets that don't require strict alignment.
That could be fatal if the object happens to be at the end of a page.

The 1992 thread I suspect is what ultimately led to the note in the
manual which limits memory operands of a zero_extract to QImode.  We
didn't have version control in those days, so I can't be 100% certain.

This kind of situation is rare, but not unheard of (I've tripped over
this kind of stuff myself).

I think we're ultimately better off fixing the target to better match
what combine is doing.


Thanks for investigating this.  It looks like you did quite a bit of 
archaeological research.


It seems to me that the only way that a read outside of the original 
object bounds is if the size is expanded.  There's code in combine.c to 
insure that does not happen.  I guess that an extv could be generated 
with incorrect operands which would be outside the object bounds, but 
that would be a bug.


I don't doubt that the QI restriction on zero_extract resolves some 
long-ago bug report, but I suspect that it hides the bug rather than 
fixes it.  It's also possible that the actual bug has been fixed, but 
the restriction has never been removed.  Without a test case, it isn't 
easy to do more than offer a conjecture.


Combine is complex, but I don't think that target descriptions should 
conform to its behaviors; combine should adapt to the target. 
(Incidentally, the target has a peephole optimization which is supposed 
to do what combine should be doing in this case.  It worked with GCC-4; 
I don't recall why I never investigated why it isn't working with GCC-8. 
 Possibly, the error occurs before peephole optimization.)


As a practical matter, the patch seems to work without problem.  (I've 
attached it here again; I'd sent it previously in an off-list email.) 
The target is not in the GCC tree, but in a separate public repo, and I 
don't believe that it will ever be upstreamed, so there's no issue with 
a future merge conflict.  My project to upgrade the target to GCC-8 was 
completed last year and I'm no longer maintaining the tool chain.


Thanks again for following up on this.


--
Michael Eagerea...@eagerm.com
1960 Park Blvd., Palo Alto, CA 94306
>From 7de0670b38c7b1cc3b805bdebe638357f384a9f8 Mon Sep 17 00:00:00 2001
From: Michael Eager 
Date: Tue, 20 Nov 2018 15:37:10 -0800
Subject: [PATCH] Eliminate narrowing of zero_extract MEM reference

Combine (make_extraction) 

Re: Question about make_extraction() in combine.c

2018-11-29 Thread Jeff Law
On 11/21/18 9:33 AM, Segher Boessenkool wrote:
> On Tue, Nov 20, 2018 at 10:07:35AM -0800, Michael Eager wrote:
>> The internal RTL should not be dictating what the target arch can or 
>> cannot implement.  Reload should insert any needed conversions, 
>> especially ones which narrow the size.
> 
> Well, that depends.  A zero_extract of mem is only defined for byte_mode,
> just like SET is only defined for VOIDmode (on the SET itself, not its
> args).  Because this is guaranteed, nothing in GCC ever needs to check
> this.  That is the theory of course; in reality quite a few targets have
> used other modes for the mem in a zero_extract, and this seems to have
> mostly worked.
Right.

> 
> As another example, closer by, an extract length of 0 is not allowed
> either, for zero_extract.  And this *did* cause problems recently.
Very true.  I don't doubt there's still corner cases that haven't been
well documented.

> 
> Why was it documented as requiring byte mode?  Was this changed, just
> the documentation was not updated?
I don't think it was ever documented why byte mode was required, merely
that it was required.  I can recall stumbling over it in the early/mid
90s.  I've got references to the byte requirement going back to early
1992 in my archives.

After some wandering of those archives, folks were clearly worried about
the extv/insv insns reading/writing beyond the boundary of the current
object.  If you restrict memory ops to QImode, then you resolve that
problem.

But we're unlikely to get a definitive answer here.


I think the real question is can we reasonably lift the restriction.



Jeff


Re: Question about make_extraction() in combine.c

2018-11-21 Thread Michael Eager

On 11/21/18 11:47, Segher Boessenkool wrote:

On Wed, Nov 21, 2018 at 08:52:21AM -0800, Michael Eager wrote:

On 11/21/2018 08:33 AM, Segher Boessenkool wrote:

On Tue, Nov 20, 2018 at 10:07:35AM -0800, Michael Eager wrote:

The internal RTL should not be dictating what the target arch can or
cannot implement.  Reload should insert any needed conversions,
especially ones which narrow the size.


Well, that depends.  A zero_extract of mem is only defined for byte_mode,
just like SET is only defined for VOIDmode (on the SET itself, not its
args).  Because this is guaranteed, nothing in GCC ever needs to check
this.  That is the theory of course; in reality quite a few targets have
used other modes for the mem in a zero_extract, and this seems to have
mostly worked.


This restriction on zero_extract MEM args (and only MEM) seems to be
completely arbitrary.  What is it about the operation of extracting a
bit field which makes it dependent on the memory access size?


If the mode is required to be byte_mode, then all code dealing with it
can assume it to be byte_mode.  Without first having to check it, and
without having to handle other modes.


Essentially all other RTL operations place no restriction on mode.
If there are no arbitrary mode restrictions, then no checking is
required.




The value of SET is VOIDmode, in that it has no value.   Not sure what
your point is here.


See above.


As another example, closer by, an extract length of 0 is not allowed
either, for zero_extract.  And this *did* cause problems recently.


This is a restriction which does make sense.  It isn't clear what
the value of a zero length field is, or how to represent it.  If
something is undefined, then there is a strong argument for making is
invalid.  (Are there architectures which have instructions which extract
a zero length bit field?  I doubt it.)


It doesn't matter if you (or I, or anyone) think it makes sense; the rules
are the rules.  If you want to change the rules, then post a patch.


These are not immutable rules delivered by some deity.  Rules which
have no discernible value or purpose can and should be changed.

A patch is being tested.


Why was it documented as requiring byte mode?  Was this changed, just
the documentation was not updated?


Ancient history.  As Jeff said, perhaps an architectural requirement of
VAX or m68k.  This wasn't changed, as far as I'm aware.


So a mem in a *_extract is still required to be byte_mode you say? >
I don't think we can change this in the documentation without reviewing
all code that deals with *_extract to see if this will work :-/


Have at it.

--
Michael Eagerea...@eagerm.com
1960 Park Blvd., Palo Alto, CA 94306


Re: Question about make_extraction() in combine.c

2018-11-21 Thread Segher Boessenkool
On Wed, Nov 21, 2018 at 08:52:21AM -0800, Michael Eager wrote:
> On 11/21/2018 08:33 AM, Segher Boessenkool wrote:
> >On Tue, Nov 20, 2018 at 10:07:35AM -0800, Michael Eager wrote:
> >>The internal RTL should not be dictating what the target arch can or
> >>cannot implement.  Reload should insert any needed conversions,
> >>especially ones which narrow the size.
> >
> >Well, that depends.  A zero_extract of mem is only defined for byte_mode,
> >just like SET is only defined for VOIDmode (on the SET itself, not its
> >args).  Because this is guaranteed, nothing in GCC ever needs to check
> >this.  That is the theory of course; in reality quite a few targets have
> >used other modes for the mem in a zero_extract, and this seems to have
> >mostly worked.
> 
> This restriction on zero_extract MEM args (and only MEM) seems to be
> completely arbitrary.  What is it about the operation of extracting a
> bit field which makes it dependent on the memory access size?

If the mode is required to be byte_mode, then all code dealing with it
can assume it to be byte_mode.  Without first having to check it, and
without having to handle other modes.

> The value of SET is VOIDmode, in that it has no value.   Not sure what
> your point is here.

See above.

> >As another example, closer by, an extract length of 0 is not allowed
> >either, for zero_extract.  And this *did* cause problems recently.
> 
> This is a restriction which does make sense.  It isn't clear what
> the value of a zero length field is, or how to represent it.  If
> something is undefined, then there is a strong argument for making is
> invalid.  (Are there architectures which have instructions which extract
> a zero length bit field?  I doubt it.)

It doesn't matter if you (or I, or anyone) think it makes sense; the rules
are the rules.  If you want to change the rules, then post a patch.

> >Why was it documented as requiring byte mode?  Was this changed, just
> >the documentation was not updated?
> 
> Ancient history.  As Jeff said, perhaps an architectural requirement of
> VAX or m68k.  This wasn't changed, as far as I'm aware.

So a mem in a *_extract is still required to be byte_mode you say?

I don't think we can change this in the documentation without reviewing
all code that deals with *_extract to see if this will work :-/


Segher


Re: Question about make_extraction() in combine.c

2018-11-21 Thread Michael Eager

On 11/21/2018 08:33 AM, Segher Boessenkool wrote:

On Tue, Nov 20, 2018 at 10:07:35AM -0800, Michael Eager wrote:

The internal RTL should not be dictating what the target arch can or
cannot implement.  Reload should insert any needed conversions,
especially ones which narrow the size.


Well, that depends.  A zero_extract of mem is only defined for byte_mode,
just like SET is only defined for VOIDmode (on the SET itself, not its
args).  Because this is guaranteed, nothing in GCC ever needs to check
this.  That is the theory of course; in reality quite a few targets have
used other modes for the mem in a zero_extract, and this seems to have
mostly worked.


This restriction on zero_extract MEM args (and only MEM) seems to be
completely arbitrary.  What is it about the operation of extracting a
bit field which makes it dependent on the memory access size?

The value of SET is VOIDmode, in that it has no value.   Not sure what
your point is here.


As another example, closer by, an extract length of 0 is not allowed
either, for zero_extract.  And this *did* cause problems recently.


This is a restriction which does make sense.  It isn't clear what
the value of a zero length field is, or how to represent it.  If
something is undefined, then there is a strong argument for making is
invalid.  (Are there architectures which have instructions which extract
a zero length bit field?  I doubt it.)


Why was it documented as requiring byte mode?  Was this changed, just
the documentation was not updated?


Ancient history.  As Jeff said, perhaps an architectural requirement of
VAX or m68k.  This wasn't changed, as far as I'm aware.

--
Michael Eagerea...@eagerm.com
1960 Park Blvd., Palo Alto, CA 94306


Re: Question about make_extraction() in combine.c

2018-11-21 Thread Segher Boessenkool
On Tue, Nov 20, 2018 at 10:07:35AM -0800, Michael Eager wrote:
> The internal RTL should not be dictating what the target arch can or 
> cannot implement.  Reload should insert any needed conversions, 
> especially ones which narrow the size.

Well, that depends.  A zero_extract of mem is only defined for byte_mode,
just like SET is only defined for VOIDmode (on the SET itself, not its
args).  Because this is guaranteed, nothing in GCC ever needs to check
this.  That is the theory of course; in reality quite a few targets have
used other modes for the mem in a zero_extract, and this seems to have
mostly worked.

As another example, closer by, an extract length of 0 is not allowed
either, for zero_extract.  And this *did* cause problems recently.

Why was it documented as requiring byte mode?  Was this changed, just
the documentation was not updated?


Segher


Re: Question about make_extraction() in combine.c

2018-11-20 Thread Jeff Law
On 11/20/18 11:07 AM, Michael Eager wrote:
> On 11/18/2018 08:14 AM, Jeff Law wrote:
>> On 11/18/18 8:44 AM, Michael Eager wrote:
>>> On 11/16/18 14:50, Segher Boessenkool wrote:
 Hi!

 On Wed, Nov 14, 2018 at 11:22:58AM -0800, Michael Eager wrote:
> The (mem:SI) is converted to (mem:QI).
>
> The return from make_extract() is
>  (zero_extract:SI (mem:QI (reg/v/f:SI 50 [ gp ]))
>     (const_int 1 [0x1])
>     (const_int 2 [0x2]))
>
> The target has an instruction pattern for zero_extract, but it matches
> SI values, not QI.  So the instruction which implements a test of a
> bit
> flag is never generated.

 The RTL documentation says:

     If @var{loc} is in memory, its mode must be a single-byte integer
 mode.

 But obviously various ports use other modes, too.  I wonder if that
 ever
 worked well.
>>>
>>> Thanks.  I hadn't noticed this.
>>>
>>> Does anyone have any idea why there is a restriction on the mode?
>>> Other instruction patterns don't place arbitrary restriction on the
>>> memory access mode.
>> Not offhand.   BUt it's been around for a long time (at least since the
>> early 90s).  I stumbled over it several times through the years.  It
>> could well be an artifact of the m68k or the vax.
> 
> The internal RTL should not be dictating what the target arch can or
> cannot implement.  Reload should insert any needed conversions,
> especially ones which narrow the size.
> 
> I have a patch which removes this conversion.  Works for my target.  I
> built and ran 'make check' for x86 with no additional failures.  I don't
> have a VAX or M68K sitting around to test.  :-)
I can do correctness tests for m68k via qemu.  Essentially verifying it
still bootstraps:-)  Just pass along a git formatted patch and I can
throw it into the tester.

jeff


Re: Question about make_extraction() in combine.c

2018-11-20 Thread Michael Eager

On 11/18/2018 08:14 AM, Jeff Law wrote:

On 11/18/18 8:44 AM, Michael Eager wrote:

On 11/16/18 14:50, Segher Boessenkool wrote:

Hi!

On Wed, Nov 14, 2018 at 11:22:58AM -0800, Michael Eager wrote:

The (mem:SI) is converted to (mem:QI).

The return from make_extract() is
     (zero_extract:SI (mem:QI (reg/v/f:SI 50 [ gp ]))
    (const_int 1 [0x1])
    (const_int 2 [0x2]))

The target has an instruction pattern for zero_extract, but it matches
SI values, not QI.  So the instruction which implements a test of a bit
flag is never generated.


The RTL documentation says:

    If @var{loc} is in memory, its mode must be a single-byte integer
mode.

But obviously various ports use other modes, too.  I wonder if that ever
worked well.


Thanks.  I hadn't noticed this.

Does anyone have any idea why there is a restriction on the mode?
Other instruction patterns don't place arbitrary restriction on the
memory access mode.

Not offhand.   BUt it's been around for a long time (at least since the
early 90s).  I stumbled over it several times through the years.  It
could well be an artifact of the m68k or the vax.


The internal RTL should not be dictating what the target arch can or 
cannot implement.  Reload should insert any needed conversions, 
especially ones which narrow the size.


I have a patch which removes this conversion.  Works for my target.  I 
built and ran 'make check' for x86 with no additional failures.  I don't 
have a VAX or M68K sitting around to test.  :-)


I can submit the patch and remove the restriction from the docs, but I 
have no way of telling that this doesn't break (or deoptimize) some 
other target.



--
Michael Eagerea...@eagerm.com
1960 Park Blvd., Palo Alto, CA 94306


Question about make_extraction() in combine.c

2018-11-15 Thread Michael Eager

In combine, simplify_comparison() is being called with the following
arguments:

  code = EQ
  op0 =  (and:SI (mem:SI (reg/v/f:SI 50 [ gp ]) (const_int 4 [0x4]))
  op1 =  (const_int 0 [0])
€
After churning down through make_compound_operation() and
make_compound_operation_int(), processing gets to make_extraction().
Eventually (combine.c:7753), make_extract() decides that the best
pattern is EP_epextzv.  No instruction patterns are provided for
"extz*"; get_best_reg_extraction_insn() returns false, and
make_extraction falls through to this code [indenting modified for
email].

combine.c:7786:
  /* Be careful not to go beyond the extracted object and
 maintain the natural alignment of the memory.  */
  wanted_inner_mode = smallest_int_mode_for_size (len);
  while (pos % GET_MODE_BITSIZE (wanted_inner_mode) + len
 > GET_MODE_BITSIZE (wanted_inner_mode))
wanted_inner_mode = GET_MODE_WIDER_MODE
  (wanted_inner_mode).require ();

Len == 1; the result is that wanted_inner_mode == E_QImode.

No instruction patterns are provided for "extz*";
get_best_reg_extraction_insn() returns false, and make_extraction falls
through to this code

The (mem:SI) is converted to (mem:QI).

The return from make_extract() is
   (zero_extract:SI (mem:QI (reg/v/f:SI 50 [ gp ]))
  (const_int 1 [0x1])
  (const_int 2 [0x2]))

The target has an instruction pattern for zero_extract, but it matches
SI values, not QI.  So the instruction which implements a test of a bit
flag is never generated.

In an old version of GCC, a call to mode_dependent_address_p() 
controlled whether this conversion from SI to QI was done.  This would

not be useful, since QI is a valid address mode in general, just not for
the zero_extract pattern.  A kludge was added to prevent this conversion
from SI to QI.  I'd like to avoid re-implementing the kludge.



Questions:
  1.  What is make_extract() trying to do with the MEM address?
  2.  Why modify the MEM address from SI to QI?
  There's no obvious benefit that I see of
  (zero_extract:SI (mem:QI)...)  over (zero_extract:SI (mem:SI)...).
  3.  What's the best way to fix this?
- Remove the down-sizing of MEM in make_extract()?
- Define patterns for extz*?
- Do something so zero_extend accepts QI?

--
Michael Eagerea...@eagerm.com
1960 Park Blvd., Palo Alto, CA 94306