[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968 Thomas Preud'homme changed: What|Removed |Added CC||robotux at celest dot fr --- Comment #17 from Thomas Preud'homme --- (In reply to Maxim Kuvyrkov from comment #16) > AFAIK, Thomas isn't working on this. Indeed, not anymore, sorry.
[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968 Maxim Kuvyrkov changed: What|Removed |Added Status|ASSIGNED|UNCONFIRMED Ever confirmed|1 |0
[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968 Maxim Kuvyrkov changed: What|Removed |Added CC||mkuvyrkov at gcc dot gnu.org Assignee|thomas.preudhomme at celest dot fr |unassigned at gcc dot gnu.org --- Comment #16 from Maxim Kuvyrkov --- AFAIK, Thomas isn't working on this.
[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968 --- Comment #15 from Eric Botcazou --- > Right makes sense. So I tried your suggestion (guard the first if with > !reverse but not the second) and it didn't work. Problem as you suggested is > adjust_bit_field_mem_for_reg which refuses to do an unaligned load (or > rather bit_field_mode_iterator's next_mode method refuses). I think > get_best_mem_extraction_insn does not have this problem because instead it > just queries whether an instruction to do unaligned access exist. > > Are you aware of a reason why next_mode does not do the same? The alignment of modes is enforced on strict-alignment targets like ARM: /* Stop if the mode requires too much alignment. */ if (GET_MODE_ALIGNMENT (mode) > m_align && targetm.slow_unaligned_access (mode, m_align)) break; So you'll probably need to fiddle with the extv path, i.e. either to implement the generic handling of reverse storage in extract_bit_field_using_extv or do it only for simple bitfields in extract_integral_bit_field as discussed previously.
[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968 --- Comment #14 from Thomas Preud'homme --- (In reply to Eric Botcazou from comment #13) > > Forgive my naive question as I'm not too familiar with that part of the > > compiler: why should the get_best_mem_extraction_insn be guarded with > > reverse? I thought I'd just ad an if (reverse) if it succeeds and call > > flip_storage_order there, likewise after the call to extract_bit_field_1 > > below if successful. > > No, the numbering of bits depends on the endianness, i.e. you need to know > the endianness of the source to do a correct extraction. For example, if > you extract bit #2 - bit #9 of a structure in big-endian using HImode, then > you cannot do it in little-endian and just swap the bytes afterwards (as a > matter of fact, there is nothing to swap since the result is byte-sized). > The LE extraction is: > HImode load + HImode right_shift (2) > whereas the BE extraction is: > HImode load + HImode right_shift (6) > > The extv machinery cannot handle reverse SSO for the time being so the guard > is still needed for it in the general case; on the contrary, > extract_bit_field_1 can already and doesn't need an additional call to > flip_storage_order. > > Of course, for specific bitfields, typically verifying > simple_mem_bitfield_p, then you can extract in native order and do > flip_storage_order on the result. > > In other words, the extv path can be used as you envision, but only for > specific bitfields modeled on those accepted by simple_mem_bitfield_p, and > then the call to flip_storage_order will indeed be needed. Right makes sense. So I tried your suggestion (guard the first if with !reverse but not the second) and it didn't work. Problem as you suggested is adjust_bit_field_mem_for_reg which refuses to do an unaligned load (or rather bit_field_mode_iterator's next_mode method refuses). I think get_best_mem_extraction_insn does not have this problem because instead it just queries whether an instruction to do unaligned access exist. Are you aware of a reason why next_mode does not do the same?
[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968 --- Comment #13 from Eric Botcazou --- > Forgive my naive question as I'm not too familiar with that part of the > compiler: why should the get_best_mem_extraction_insn be guarded with > reverse? I thought I'd just ad an if (reverse) if it succeeds and call > flip_storage_order there, likewise after the call to extract_bit_field_1 > below if successful. No, the numbering of bits depends on the endianness, i.e. you need to know the endianness of the source to do a correct extraction. For example, if you extract bit #2 - bit #9 of a structure in big-endian using HImode, then you cannot do it in little-endian and just swap the bytes afterwards (as a matter of fact, there is nothing to swap since the result is byte-sized). The LE extraction is: HImode load + HImode right_shift (2) whereas the BE extraction is: HImode load + HImode right_shift (6) The extv machinery cannot handle reverse SSO for the time being so the guard is still needed for it in the general case; on the contrary, extract_bit_field_1 can already and doesn't need an additional call to flip_storage_order. Of course, for specific bitfields, typically verifying simple_mem_bitfield_p, then you can extract in native order and do flip_storage_order on the result. In other words, the extv path can be used as you envision, but only for specific bitfields modeled on those accepted by simple_mem_bitfield_p, and then the call to flip_storage_order will indeed be needed.
[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968 --- Comment #12 from Thomas Preud'homme --- (In reply to Eric Botcazou from comment #11) > > Therefore unaligned access are handled by extract_bit_field. This in turns > > call extract_bit_field_1 and later extract_integral_bit_field where things > > are different between little endian and big endian. For little endian, it > > goes in the following if block: > > > > /* If OP0 is a memory, try copying it to a register and seeing if a > > cheap register alternative is available. */ > > if (MEM_P (op0) & !reverse) > > > > For big endian it continues and calls extract_fixed_bit_field. I'm wondering > > if an extra call to flip_storage_order when reverse is true would solve the > > issue. Need to understand better whe is it safe to call flip_storage_order. > > Where do you want to add a call to flip_storage_order exactly? The correct > thing to do would be to move the !reverse test from the top-level if to the > first inner if (in order to bypass only the extv business) and see what > happens next (and be prepared for adjusting adjust_bit_field_mem_for_reg to > reverse SSO). Forgive my naive question as I'm not too familiar with that part of the compiler: why should the get_best_mem_extraction_insn be guarded with reverse? I thought I'd just ad an if (reverse) if it succeeds and call flip_storage_order there, likewise after the call to extract_bit_field_1 below if successful.
[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968 Eric Botcazou changed: What|Removed |Added CC||ebotcazou at gcc dot gnu.org --- Comment #11 from Eric Botcazou --- > Therefore unaligned access are handled by extract_bit_field. This in turns > call extract_bit_field_1 and later extract_integral_bit_field where things > are different between little endian and big endian. For little endian, it > goes in the following if block: > > /* If OP0 is a memory, try copying it to a register and seeing if a > cheap register alternative is available. */ > if (MEM_P (op0) & !reverse) > > For big endian it continues and calls extract_fixed_bit_field. I'm wondering > if an extra call to flip_storage_order when reverse is true would solve the > issue. Need to understand better whe is it safe to call flip_storage_order. Where do you want to add a call to flip_storage_order exactly? The correct thing to do would be to move the !reverse test from the top-level if to the first inner if (in order to bypass only the extv business) and see what happens next (and be prepared for adjusting adjust_bit_field_mem_for_reg to reverse SSO).
[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968 Thomas Preud'homme changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2018-10-09 Assignee|unassigned at gcc dot gnu.org |thopre01 at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #10 from Thomas Preud'homme --- (In reply to Thomas Preud'homme from comment #8) > (In reply to Thomas Preud'homme from comment #7) > > (In reply to Thomas Preud'homme from comment #6) > > > Happens at expand time. Diving in. > > > > There's a giant if in expand_expr_real_1 with the following comment: > > > > /* In cases where an aligned union has an unaligned object > >as a field, we might be extracting a BLKmode value from > >an integer-mode (e.g., SImode) object. Handle this case > >by doing the extract into an object as wide as the field > >(which we know to be the width of a basic mode), then > >storing into memory, and changing the mode to BLKmode. */ > > > > The "if" is entered in the big endian unaligned case but not in the other > > case. In the aligned case, it continues after the if until the call to > > flip_storage_order which will generate the bswap. > > In the aligned case, the if is not taken because alignment of the memory Vs > access is sufficient. There is provision to call flip_storage_order but only > if the access is a RECORD and here the mode class is INT. > > Therefore unaligned access are handled by extract_bit_field. This in turns > call extract_bit_field_1 and later extract_integral_bit_field where things > are different between little endian and big endian. For little endian, it > goes in the following if block: > > /* If OP0 is a memory, try copying it to a register and seeing if a > cheap register alternative is available. */ > if (MEM_P (op0) & !reverse) > > For big endian it continues and calls extract_fixed_bit_field. I'm wondering > if an extra call to flip_storage_order when reverse is true would solve the > issue. Need to understand better whe is it safe to call flip_storage_order. It gives me the expected assembly but I need to convince myself that this is always safe.
[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968 --- Comment #9 from Thomas Preud'homme --- (In reply to Thomas Preud'homme from comment #8) > (In reply to Thomas Preud'homme from comment #7) > > (In reply to Thomas Preud'homme from comment #6) > > > Happens at expand time. Diving in. > > > > There's a giant if in expand_expr_real_1 with the following comment: > > > > /* In cases where an aligned union has an unaligned object > >as a field, we might be extracting a BLKmode value from > >an integer-mode (e.g., SImode) object. Handle this case > >by doing the extract into an object as wide as the field > >(which we know to be the width of a basic mode), then > >storing into memory, and changing the mode to BLKmode. */ > > > > The "if" is entered in the big endian unaligned case but not in the other > > case. In the aligned case, it continues after the if until the call to > > flip_storage_order which will generate the bswap. > > In the aligned case, the if is not taken because alignment of the memory Vs > access is sufficient. There is provision to call flip_storage_order but only > if the access is a RECORD and here the mode class is INT. > > Therefore unaligned access are handled by extract_bit_field. This in turns > call extract_bit_field_1 and later extract_integral_bit_field where things > are different between little endian and big endian. For little endian, it > goes in the following if block: > > /* If OP0 is a memory, try copying it to a register and seeing if a > cheap register alternative is available. */ > if (MEM_P (op0) & !reverse) > > For big endian it continues and calls extract_fixed_bit_field. I'm wondering > if an extra call to flip_storage_order when reverse is true would solve the > issue. Need to understand better whe is it safe to call flip_storage_order. It gives me the expected assembly but I need to convince myself that this is always safe.
[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968 --- Comment #8 from Thomas Preud'homme --- (In reply to Thomas Preud'homme from comment #7) > (In reply to Thomas Preud'homme from comment #6) > > Happens at expand time. Diving in. > > There's a giant if in expand_expr_real_1 with the following comment: > > /* In cases where an aligned union has an unaligned object >as a field, we might be extracting a BLKmode value from >an integer-mode (e.g., SImode) object. Handle this case >by doing the extract into an object as wide as the field >(which we know to be the width of a basic mode), then >storing into memory, and changing the mode to BLKmode. */ > > The "if" is entered in the big endian unaligned case but not in the other > case. In the aligned case, it continues after the if until the call to > flip_storage_order which will generate the bswap. In the aligned case, the if is not taken because alignment of the memory Vs access is sufficient. There is provision to call flip_storage_order but only if the access is a RECORD and here the mode class is INT. Therefore unaligned access are handled by extract_bit_field. This in turns call extract_bit_field_1 and later extract_integral_bit_field where things are different between little endian and big endian. For little endian, it goes in the following if block: /* If OP0 is a memory, try copying it to a register and seeing if a cheap register alternative is available. */ if (MEM_P (op0) & !reverse) For big endian it continues and calls extract_fixed_bit_field. I'm wondering if an extra call to flip_storage_order when reverse is true would solve the issue. Need to understand better whe is it safe to call flip_storage_order.
[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968 Thomas Preud'homme changed: What|Removed |Added CC||thopre01 at gcc dot gnu.org --- Comment #7 from Thomas Preud'homme --- (In reply to Thomas Preud'homme from comment #6) > Happens at expand time. Diving in. There's a giant if in expand_expr_real_1 with the following comment: /* In cases where an aligned union has an unaligned object as a field, we might be extracting a BLKmode value from an integer-mode (e.g., SImode) object. Handle this case by doing the extract into an object as wide as the field (which we know to be the width of a basic mode), then storing into memory, and changing the mode to BLKmode. */ The "if" is entered in the big endian unaligned case but not in the other case. In the aligned case, it continues after the if until the call to flip_storage_order which will generate the bswap.
[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968 --- Comment #6 from Thomas Preud'homme --- Happens at expand time. Diving in.
[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968 Ramana Radhakrishnan changed: What|Removed |Added CC||ramana at gcc dot gnu.org --- Comment #5 from Ramana Radhakrishnan --- (In reply to jos...@codesourcery.com from comment #4) > Any unaligned access things that don't work for big-endian ARM are > probably fallout from the issues with big-endian NEON (NEON architectural > lane numbers are different from the architecture-independent lane numbers > in GNU C vector extensions and GCC IR, and GCC expects each machine mode > to have a single defined memory layout and a single defined layout in any > given register, and to be able to move between core and NEON registers, > and between core registers and memory, in the respective layouts used for > those registers, but some NEON loads and stores for big-endian don't work > with those expectations, so unaligned vector operations are limited for > big-endian ARM). Correct, we don't allow misaligned access for Neon because of exactly the above mentioned reasons. I would have however expected misaligned access to work with -march=armv7-a -munaligned-access -mfpu=vfpv3-d16 -mfloat-abi=softfp/hard on the command line for the afore mentioned testcase as we do have a movmisalign pattern in arm.md that should kick in overriding the movmisalign pattern in neon.md. It probably needs a little more detailed investigation.