Re: Question about make_extraction() in combine.c
On 8/28/19 12:33 PM, Segher Boessenkool wrote: Hi! On Tue, Aug 27, 2019 at 09:37:59AM -0700, Michael Eager wrote: Combine is complex, but I don't think that target descriptions should conform to its behaviors; But they have to, in some ways. If combine writes something that can be written in multiple ways in some way X, then your machine description has to recognise X (perhaps in addition to other ways it can be written), or you will not get as much optimisation as you might like: some combine attempts will fail. combine should adapt to the target. How? By not making arbitrary restrictions on the instructions which a target can implement, simply to avoid a bug in a different target. The target has an instruction which can extract a bit (or bit field) from a word in memory. The code in combine prevents that instruction from being used without creating awkward workarounds. diff --git a/gcc/combine.c b/gcc/combine.c index 93bd3da26d7..fdc79ab7d3e 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -,17 +,7 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos, This patch is against some older version of combine.c? The line number is off by 70 or so. The patch was created last November. && partial_subreg_p (extraction_mode, mode)) extraction_mode = mode; And current trunk has here /* Punt if len is too large for extraction_mode. */ if (maybe_gt (len, GET_MODE_PRECISION (extraction_mode))) return NULL_RTX; (See r268913, https://gcc.gnu.org/ml/gcc-patches/2019-02/msg01140.html ). This patch seems unrelated. Does that fix your problem already? Is more needed? Is your patch removing some now-dead code? If you are asking, does my patch remove code that is no longer needed to fix the decades-old problem which it fixed (or hid)? I suspect it does. But this cannot be verified. There's no test case for the original problem, nor is it clear which architecture had the problem. -- Michael Eagerea...@eagerm.com 1960 Park Blvd., Palo Alto, CA 94306
Re: Question about make_extraction() in combine.c
Hi! On Tue, Aug 27, 2019 at 09:37:59AM -0700, Michael Eager wrote: > Combine is complex, but I don't think that target descriptions should > conform to its behaviors; But they have to, in some ways. If combine writes something that can be written in multiple ways in some way X, then your machine description has to recognise X (perhaps in addition to other ways it can be written), or you will not get as much optimisation as you might like: some combine attempts will fail. > combine should adapt to the target. How? > diff --git a/gcc/combine.c b/gcc/combine.c > index 93bd3da26d7..fdc79ab7d3e 100644 > --- a/gcc/combine.c > +++ b/gcc/combine.c > @@ -,17 +,7 @@ make_extraction (machine_mode mode, rtx inner, > HOST_WIDE_INT pos, This patch is against some older version of combine.c? The line number is off by 70 or so. >&& partial_subreg_p (extraction_mode, mode)) > extraction_mode = mode; And current trunk has here /* Punt if len is too large for extraction_mode. */ if (maybe_gt (len, GET_MODE_PRECISION (extraction_mode))) return NULL_RTX; (See r268913, https://gcc.gnu.org/ml/gcc-patches/2019-02/msg01140.html ). Does that fix your problem already? Is more needed? Is your patch removing some now-dead code? > - if (!MEM_P (inner)) > -wanted_inner_mode = wanted_inner_reg_mode; > - else > -{ > - /* Be careful not to go beyond the extracted object and maintain the > - natural alignment of the memory. */ > - wanted_inner_mode = smallest_int_mode_for_size (len); > - while (pos % GET_MODE_BITSIZE (wanted_inner_mode) + len > - > GET_MODE_BITSIZE (wanted_inner_mode)) > - wanted_inner_mode = GET_MODE_WIDER_MODE (wanted_inner_mode).require (); > -} > + wanted_inner_mode = wanted_inner_reg_mode; Segher
Re: Question about make_extraction() in combine.c
On 8/20/19 4:07 PM, Jeff Law wrote: On 11/20/18 4:39 PM, Michael Eager wrote: On 11/20/2018 03:10 PM, Jeff Law wrote: On 11/20/18 11:07 AM, Michael Eager wrote: On 11/18/2018 08:14 AM, Jeff Law wrote: On 11/18/18 8:44 AM, Michael Eager wrote: On 11/16/18 14:50, Segher Boessenkool wrote: Hi! On Wed, Nov 14, 2018 at 11:22:58AM -0800, Michael Eager wrote: The (mem:SI) is converted to (mem:QI). The return from make_extract() is (zero_extract:SI (mem:QI (reg/v/f:SI 50 [ gp ])) (const_int 1 [0x1]) (const_int 2 [0x2])) The target has an instruction pattern for zero_extract, but it matches SI values, not QI. So the instruction which implements a test of a bit flag is never generated. The RTL documentation says: If @var{loc} is in memory, its mode must be a single-byte integer mode. But obviously various ports use other modes, too. I wonder if that ever worked well. Thanks. I hadn't noticed this. Does anyone have any idea why there is a restriction on the mode? Other instruction patterns don't place arbitrary restriction on the memory access mode. Not offhand. BUt it's been around for a long time (at least since the early 90s). I stumbled over it several times through the years. It could well be an artifact of the m68k or the vax. The internal RTL should not be dictating what the target arch can or cannot implement. Reload should insert any needed conversions, especially ones which narrow the size. I have a patch which removes this conversion. Works for my target. I built and ran 'make check' for x86 with no additional failures. I don't have a VAX or M68K sitting around to test. :-) I can do correctness tests for m68k via qemu. Essentially verifying it still bootstraps:-) Just pass along a git formatted patch and I can throw it into the tester. Here's the patch. I'll edit the doc if this works for 68K. So I'm finally coming back to this. The test certainly bootstraps on the m68k and various other targets. It doesn't seem to cause any regressions on the various targets that get tested. I wandered the backends and uses of zero_extract with MEMs not in QImode are actually relatively rare and not likely things that are being exercised thoroughly. Ironically the vax seems to have the most of these kinds of patterns. After re-reviewing Paulo's changes from 2006 as well as an old thread from the gcc-2 era, I'm even more concerned than I was before WRT this change potentially allowing a read beyond the boundary of the original object, particularly on targets that don't require strict alignment. That could be fatal if the object happens to be at the end of a page. The 1992 thread I suspect is what ultimately led to the note in the manual which limits memory operands of a zero_extract to QImode. We didn't have version control in those days, so I can't be 100% certain. This kind of situation is rare, but not unheard of (I've tripped over this kind of stuff myself). I think we're ultimately better off fixing the target to better match what combine is doing. Thanks for investigating this. It looks like you did quite a bit of archaeological research. It seems to me that the only way that a read outside of the original object bounds is if the size is expanded. There's code in combine.c to insure that does not happen. I guess that an extv could be generated with incorrect operands which would be outside the object bounds, but that would be a bug. I don't doubt that the QI restriction on zero_extract resolves some long-ago bug report, but I suspect that it hides the bug rather than fixes it. It's also possible that the actual bug has been fixed, but the restriction has never been removed. Without a test case, it isn't easy to do more than offer a conjecture. Combine is complex, but I don't think that target descriptions should conform to its behaviors; combine should adapt to the target. (Incidentally, the target has a peephole optimization which is supposed to do what combine should be doing in this case. It worked with GCC-4; I don't recall why I never investigated why it isn't working with GCC-8. Possibly, the error occurs before peephole optimization.) As a practical matter, the patch seems to work without problem. (I've attached it here again; I'd sent it previously in an off-list email.) The target is not in the GCC tree, but in a separate public repo, and I don't believe that it will ever be upstreamed, so there's no issue with a future merge conflict. My project to upgrade the target to GCC-8 was completed last year and I'm no longer maintaining the tool chain. Thanks again for following up on this. -- Michael Eagerea...@eagerm.com 1960 Park Blvd., Palo Alto, CA 94306 >From 7de0670b38c7b1cc3b805bdebe638357f384a9f8 Mon Sep 17 00:00:00 2001 From: Michael Eager Date: Tue, 20 Nov 2018 15:37:10 -0800 Subject: [PATCH] Eliminate narrowing of zero_extract MEM reference Combine (make_extraction)
Re: Question about make_extraction() in combine.c
On 11/21/18 9:33 AM, Segher Boessenkool wrote: > On Tue, Nov 20, 2018 at 10:07:35AM -0800, Michael Eager wrote: >> The internal RTL should not be dictating what the target arch can or >> cannot implement. Reload should insert any needed conversions, >> especially ones which narrow the size. > > Well, that depends. A zero_extract of mem is only defined for byte_mode, > just like SET is only defined for VOIDmode (on the SET itself, not its > args). Because this is guaranteed, nothing in GCC ever needs to check > this. That is the theory of course; in reality quite a few targets have > used other modes for the mem in a zero_extract, and this seems to have > mostly worked. Right. > > As another example, closer by, an extract length of 0 is not allowed > either, for zero_extract. And this *did* cause problems recently. Very true. I don't doubt there's still corner cases that haven't been well documented. > > Why was it documented as requiring byte mode? Was this changed, just > the documentation was not updated? I don't think it was ever documented why byte mode was required, merely that it was required. I can recall stumbling over it in the early/mid 90s. I've got references to the byte requirement going back to early 1992 in my archives. After some wandering of those archives, folks were clearly worried about the extv/insv insns reading/writing beyond the boundary of the current object. If you restrict memory ops to QImode, then you resolve that problem. But we're unlikely to get a definitive answer here. I think the real question is can we reasonably lift the restriction. Jeff
Re: Question about make_extraction() in combine.c
On 11/21/18 11:47, Segher Boessenkool wrote: On Wed, Nov 21, 2018 at 08:52:21AM -0800, Michael Eager wrote: On 11/21/2018 08:33 AM, Segher Boessenkool wrote: On Tue, Nov 20, 2018 at 10:07:35AM -0800, Michael Eager wrote: The internal RTL should not be dictating what the target arch can or cannot implement. Reload should insert any needed conversions, especially ones which narrow the size. Well, that depends. A zero_extract of mem is only defined for byte_mode, just like SET is only defined for VOIDmode (on the SET itself, not its args). Because this is guaranteed, nothing in GCC ever needs to check this. That is the theory of course; in reality quite a few targets have used other modes for the mem in a zero_extract, and this seems to have mostly worked. This restriction on zero_extract MEM args (and only MEM) seems to be completely arbitrary. What is it about the operation of extracting a bit field which makes it dependent on the memory access size? If the mode is required to be byte_mode, then all code dealing with it can assume it to be byte_mode. Without first having to check it, and without having to handle other modes. Essentially all other RTL operations place no restriction on mode. If there are no arbitrary mode restrictions, then no checking is required. The value of SET is VOIDmode, in that it has no value. Not sure what your point is here. See above. As another example, closer by, an extract length of 0 is not allowed either, for zero_extract. And this *did* cause problems recently. This is a restriction which does make sense. It isn't clear what the value of a zero length field is, or how to represent it. If something is undefined, then there is a strong argument for making is invalid. (Are there architectures which have instructions which extract a zero length bit field? I doubt it.) It doesn't matter if you (or I, or anyone) think it makes sense; the rules are the rules. If you want to change the rules, then post a patch. These are not immutable rules delivered by some deity. Rules which have no discernible value or purpose can and should be changed. A patch is being tested. Why was it documented as requiring byte mode? Was this changed, just the documentation was not updated? Ancient history. As Jeff said, perhaps an architectural requirement of VAX or m68k. This wasn't changed, as far as I'm aware. So a mem in a *_extract is still required to be byte_mode you say? > I don't think we can change this in the documentation without reviewing all code that deals with *_extract to see if this will work :-/ Have at it. -- Michael Eagerea...@eagerm.com 1960 Park Blvd., Palo Alto, CA 94306
Re: Question about make_extraction() in combine.c
On Wed, Nov 21, 2018 at 08:52:21AM -0800, Michael Eager wrote: > On 11/21/2018 08:33 AM, Segher Boessenkool wrote: > >On Tue, Nov 20, 2018 at 10:07:35AM -0800, Michael Eager wrote: > >>The internal RTL should not be dictating what the target arch can or > >>cannot implement. Reload should insert any needed conversions, > >>especially ones which narrow the size. > > > >Well, that depends. A zero_extract of mem is only defined for byte_mode, > >just like SET is only defined for VOIDmode (on the SET itself, not its > >args). Because this is guaranteed, nothing in GCC ever needs to check > >this. That is the theory of course; in reality quite a few targets have > >used other modes for the mem in a zero_extract, and this seems to have > >mostly worked. > > This restriction on zero_extract MEM args (and only MEM) seems to be > completely arbitrary. What is it about the operation of extracting a > bit field which makes it dependent on the memory access size? If the mode is required to be byte_mode, then all code dealing with it can assume it to be byte_mode. Without first having to check it, and without having to handle other modes. > The value of SET is VOIDmode, in that it has no value. Not sure what > your point is here. See above. > >As another example, closer by, an extract length of 0 is not allowed > >either, for zero_extract. And this *did* cause problems recently. > > This is a restriction which does make sense. It isn't clear what > the value of a zero length field is, or how to represent it. If > something is undefined, then there is a strong argument for making is > invalid. (Are there architectures which have instructions which extract > a zero length bit field? I doubt it.) It doesn't matter if you (or I, or anyone) think it makes sense; the rules are the rules. If you want to change the rules, then post a patch. > >Why was it documented as requiring byte mode? Was this changed, just > >the documentation was not updated? > > Ancient history. As Jeff said, perhaps an architectural requirement of > VAX or m68k. This wasn't changed, as far as I'm aware. So a mem in a *_extract is still required to be byte_mode you say? I don't think we can change this in the documentation without reviewing all code that deals with *_extract to see if this will work :-/ Segher
Re: Question about make_extraction() in combine.c
On 11/21/2018 08:33 AM, Segher Boessenkool wrote: On Tue, Nov 20, 2018 at 10:07:35AM -0800, Michael Eager wrote: The internal RTL should not be dictating what the target arch can or cannot implement. Reload should insert any needed conversions, especially ones which narrow the size. Well, that depends. A zero_extract of mem is only defined for byte_mode, just like SET is only defined for VOIDmode (on the SET itself, not its args). Because this is guaranteed, nothing in GCC ever needs to check this. That is the theory of course; in reality quite a few targets have used other modes for the mem in a zero_extract, and this seems to have mostly worked. This restriction on zero_extract MEM args (and only MEM) seems to be completely arbitrary. What is it about the operation of extracting a bit field which makes it dependent on the memory access size? The value of SET is VOIDmode, in that it has no value. Not sure what your point is here. As another example, closer by, an extract length of 0 is not allowed either, for zero_extract. And this *did* cause problems recently. This is a restriction which does make sense. It isn't clear what the value of a zero length field is, or how to represent it. If something is undefined, then there is a strong argument for making is invalid. (Are there architectures which have instructions which extract a zero length bit field? I doubt it.) Why was it documented as requiring byte mode? Was this changed, just the documentation was not updated? Ancient history. As Jeff said, perhaps an architectural requirement of VAX or m68k. This wasn't changed, as far as I'm aware. -- Michael Eagerea...@eagerm.com 1960 Park Blvd., Palo Alto, CA 94306
Re: Question about make_extraction() in combine.c
On Tue, Nov 20, 2018 at 10:07:35AM -0800, Michael Eager wrote: > The internal RTL should not be dictating what the target arch can or > cannot implement. Reload should insert any needed conversions, > especially ones which narrow the size. Well, that depends. A zero_extract of mem is only defined for byte_mode, just like SET is only defined for VOIDmode (on the SET itself, not its args). Because this is guaranteed, nothing in GCC ever needs to check this. That is the theory of course; in reality quite a few targets have used other modes for the mem in a zero_extract, and this seems to have mostly worked. As another example, closer by, an extract length of 0 is not allowed either, for zero_extract. And this *did* cause problems recently. Why was it documented as requiring byte mode? Was this changed, just the documentation was not updated? Segher
Re: Question about make_extraction() in combine.c
On 11/20/18 11:07 AM, Michael Eager wrote: > On 11/18/2018 08:14 AM, Jeff Law wrote: >> On 11/18/18 8:44 AM, Michael Eager wrote: >>> On 11/16/18 14:50, Segher Boessenkool wrote: Hi! On Wed, Nov 14, 2018 at 11:22:58AM -0800, Michael Eager wrote: > The (mem:SI) is converted to (mem:QI). > > The return from make_extract() is > (zero_extract:SI (mem:QI (reg/v/f:SI 50 [ gp ])) > (const_int 1 [0x1]) > (const_int 2 [0x2])) > > The target has an instruction pattern for zero_extract, but it matches > SI values, not QI. So the instruction which implements a test of a > bit > flag is never generated. The RTL documentation says: If @var{loc} is in memory, its mode must be a single-byte integer mode. But obviously various ports use other modes, too. I wonder if that ever worked well. >>> >>> Thanks. I hadn't noticed this. >>> >>> Does anyone have any idea why there is a restriction on the mode? >>> Other instruction patterns don't place arbitrary restriction on the >>> memory access mode. >> Not offhand. BUt it's been around for a long time (at least since the >> early 90s). I stumbled over it several times through the years. It >> could well be an artifact of the m68k or the vax. > > The internal RTL should not be dictating what the target arch can or > cannot implement. Reload should insert any needed conversions, > especially ones which narrow the size. > > I have a patch which removes this conversion. Works for my target. I > built and ran 'make check' for x86 with no additional failures. I don't > have a VAX or M68K sitting around to test. :-) I can do correctness tests for m68k via qemu. Essentially verifying it still bootstraps:-) Just pass along a git formatted patch and I can throw it into the tester. jeff
Re: Question about make_extraction() in combine.c
On 11/18/2018 08:14 AM, Jeff Law wrote: On 11/18/18 8:44 AM, Michael Eager wrote: On 11/16/18 14:50, Segher Boessenkool wrote: Hi! On Wed, Nov 14, 2018 at 11:22:58AM -0800, Michael Eager wrote: The (mem:SI) is converted to (mem:QI). The return from make_extract() is (zero_extract:SI (mem:QI (reg/v/f:SI 50 [ gp ])) (const_int 1 [0x1]) (const_int 2 [0x2])) The target has an instruction pattern for zero_extract, but it matches SI values, not QI. So the instruction which implements a test of a bit flag is never generated. The RTL documentation says: If @var{loc} is in memory, its mode must be a single-byte integer mode. But obviously various ports use other modes, too. I wonder if that ever worked well. Thanks. I hadn't noticed this. Does anyone have any idea why there is a restriction on the mode? Other instruction patterns don't place arbitrary restriction on the memory access mode. Not offhand. BUt it's been around for a long time (at least since the early 90s). I stumbled over it several times through the years. It could well be an artifact of the m68k or the vax. The internal RTL should not be dictating what the target arch can or cannot implement. Reload should insert any needed conversions, especially ones which narrow the size. I have a patch which removes this conversion. Works for my target. I built and ran 'make check' for x86 with no additional failures. I don't have a VAX or M68K sitting around to test. :-) I can submit the patch and remove the restriction from the docs, but I have no way of telling that this doesn't break (or deoptimize) some other target. -- Michael Eagerea...@eagerm.com 1960 Park Blvd., Palo Alto, CA 94306
Question about make_extraction() in combine.c
In combine, simplify_comparison() is being called with the following arguments: code = EQ op0 = (and:SI (mem:SI (reg/v/f:SI 50 [ gp ]) (const_int 4 [0x4])) op1 = (const_int 0 [0]) After churning down through make_compound_operation() and make_compound_operation_int(), processing gets to make_extraction(). Eventually (combine.c:7753), make_extract() decides that the best pattern is EP_epextzv. No instruction patterns are provided for "extz*"; get_best_reg_extraction_insn() returns false, and make_extraction falls through to this code [indenting modified for email]. combine.c:7786: /* Be careful not to go beyond the extracted object and maintain the natural alignment of the memory. */ wanted_inner_mode = smallest_int_mode_for_size (len); while (pos % GET_MODE_BITSIZE (wanted_inner_mode) + len > GET_MODE_BITSIZE (wanted_inner_mode)) wanted_inner_mode = GET_MODE_WIDER_MODE (wanted_inner_mode).require (); Len == 1; the result is that wanted_inner_mode == E_QImode. No instruction patterns are provided for "extz*"; get_best_reg_extraction_insn() returns false, and make_extraction falls through to this code The (mem:SI) is converted to (mem:QI). The return from make_extract() is (zero_extract:SI (mem:QI (reg/v/f:SI 50 [ gp ])) (const_int 1 [0x1]) (const_int 2 [0x2])) The target has an instruction pattern for zero_extract, but it matches SI values, not QI. So the instruction which implements a test of a bit flag is never generated. In an old version of GCC, a call to mode_dependent_address_p() controlled whether this conversion from SI to QI was done. This would not be useful, since QI is a valid address mode in general, just not for the zero_extract pattern. A kludge was added to prevent this conversion from SI to QI. I'd like to avoid re-implementing the kludge. Questions: 1. What is make_extract() trying to do with the MEM address? 2. Why modify the MEM address from SI to QI? There's no obvious benefit that I see of (zero_extract:SI (mem:QI)...) over (zero_extract:SI (mem:SI)...). 3. What's the best way to fix this? - Remove the down-sizing of MEM in make_extract()? - Define patterns for extz*? - Do something so zero_extend accepts QI? -- Michael Eagerea...@eagerm.com 1960 Park Blvd., Palo Alto, CA 94306