[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev

2021-10-01 Thread robotux at celest dot fr via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968

Thomas Preud'homme  changed:

   What|Removed |Added

 CC||robotux at celest dot fr

--- Comment #17 from Thomas Preud'homme  ---
(In reply to Maxim Kuvyrkov from comment #16)
> AFAIK, Thomas isn't working on this.

Indeed, not anymore, sorry.

[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev

2021-10-01 Thread mkuvyrkov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968

Maxim Kuvyrkov  changed:

   What|Removed |Added

 Status|ASSIGNED|UNCONFIRMED
 Ever confirmed|1   |0

[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev

2021-10-01 Thread mkuvyrkov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968

Maxim Kuvyrkov  changed:

   What|Removed |Added

 CC||mkuvyrkov at gcc dot gnu.org
   Assignee|thomas.preudhomme at celest dot fr |unassigned at gcc dot 
gnu.org

--- Comment #16 from Maxim Kuvyrkov  ---
AFAIK, Thomas isn't working on this.

[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev

2018-10-12 Thread ebotcazou at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968

--- Comment #15 from Eric Botcazou  ---
> Right makes sense. So I tried your suggestion (guard the first if with
> !reverse but not the second) and it didn't work. Problem as you suggested is
> adjust_bit_field_mem_for_reg which refuses to do an unaligned load (or
> rather bit_field_mode_iterator's next_mode method refuses). I think
> get_best_mem_extraction_insn does not have this problem because instead it
> just queries whether an instruction to do unaligned access exist.
> 
> Are you aware of a reason why next_mode does not do the same?

The alignment of modes is enforced on strict-alignment targets like ARM:

  /* Stop if the mode requires too much alignment.  */
  if (GET_MODE_ALIGNMENT (mode) > m_align
  && targetm.slow_unaligned_access (mode, m_align))
break;

So you'll probably need to fiddle with the extv path, i.e. either to implement
the generic handling of reverse storage in extract_bit_field_using_extv or do
it only for simple bitfields in extract_integral_bit_field as discussed
previously.

[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev

2018-10-12 Thread thopre01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968

--- Comment #14 from Thomas Preud'homme  ---
(In reply to Eric Botcazou from comment #13)
> > Forgive my naive question as I'm not too familiar with that part of the
> > compiler: why should the get_best_mem_extraction_insn be guarded with
> > reverse? I thought I'd just ad an if (reverse) if it succeeds and call
> > flip_storage_order there, likewise after the call to extract_bit_field_1
> > below if successful.
> 
> No, the numbering of bits depends on the endianness, i.e. you need to know
> the endianness of the source to do a correct extraction.  For example, if
> you extract bit #2 - bit #9 of a structure in big-endian using HImode, then
> you cannot do it in little-endian and just swap the bytes afterwards (as a
> matter of fact, there is nothing to swap since the result is byte-sized). 
> The LE extraction is:
>   HImode load + HImode right_shift (2)
> whereas the BE extraction is:
>   HImode load + HImode right_shift (6)
> 
> The extv machinery cannot handle reverse SSO for the time being so the guard
> is still needed for it in the general case; on the contrary,
> extract_bit_field_1 can already and doesn't need an additional call to
> flip_storage_order.
> 
> Of course, for specific bitfields, typically verifying
> simple_mem_bitfield_p, then you can extract in native order and do
> flip_storage_order on the result.
> 
> In other words, the extv path can be used as you envision, but only for
> specific bitfields modeled on those accepted by simple_mem_bitfield_p, and
> then the call to flip_storage_order will indeed be needed.

Right makes sense. So I tried your suggestion (guard the first if with !reverse
but not the second) and it didn't work. Problem as you suggested is
adjust_bit_field_mem_for_reg which refuses to do an unaligned load (or rather
bit_field_mode_iterator's next_mode method refuses). I think
get_best_mem_extraction_insn does not have this problem because instead it just
queries whether an instruction to do unaligned access exist.

Are you aware of a reason why next_mode does not do the same?

[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev

2018-10-10 Thread ebotcazou at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968

--- Comment #13 from Eric Botcazou  ---
> Forgive my naive question as I'm not too familiar with that part of the
> compiler: why should the get_best_mem_extraction_insn be guarded with
> reverse? I thought I'd just ad an if (reverse) if it succeeds and call
> flip_storage_order there, likewise after the call to extract_bit_field_1
> below if successful.

No, the numbering of bits depends on the endianness, i.e. you need to know the
endianness of the source to do a correct extraction.  For example, if you
extract bit #2 - bit #9 of a structure in big-endian using HImode, then you
cannot do it in little-endian and just swap the bytes afterwards (as a matter
of fact, there is nothing to swap since the result is byte-sized).  The LE
extraction is:
  HImode load + HImode right_shift (2)
whereas the BE extraction is:
  HImode load + HImode right_shift (6)

The extv machinery cannot handle reverse SSO for the time being so the guard is
still needed for it in the general case; on the contrary, extract_bit_field_1
can already and doesn't need an additional call to flip_storage_order.

Of course, for specific bitfields, typically verifying simple_mem_bitfield_p,
then you can extract in native order and do flip_storage_order on the result.

In other words, the extv path can be used as you envision, but only for
specific bitfields modeled on those accepted by simple_mem_bitfield_p, and then
the call to flip_storage_order will indeed be needed.

[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev

2018-10-10 Thread thopre01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968

--- Comment #12 from Thomas Preud'homme  ---
(In reply to Eric Botcazou from comment #11)
> > Therefore unaligned access are handled by extract_bit_field. This in turns
> > call extract_bit_field_1 and later extract_integral_bit_field where things
> > are different between little endian and big endian. For little endian, it
> > goes in the following if block:
> > 
> >   /* If OP0 is a memory, try copying it to a register and seeing if a
> >  cheap register alternative is available.  */
> >   if (MEM_P (op0) & !reverse)
> > 
> > For big endian it continues and calls extract_fixed_bit_field. I'm wondering
> > if an extra call to flip_storage_order when reverse is true would solve the
> > issue. Need to understand better whe is it safe to call flip_storage_order.
> 
> Where do you want to add a call to flip_storage_order exactly?  The correct
> thing to do would be to move the !reverse test from the top-level if to the
> first inner if (in order to bypass only the extv business) and see what
> happens next (and be prepared for adjusting adjust_bit_field_mem_for_reg to
> reverse SSO).

Forgive my naive question as I'm not too familiar with that part of the
compiler: why should the get_best_mem_extraction_insn be guarded with reverse?
I thought I'd just ad an if (reverse) if it succeeds and call
flip_storage_order there, likewise after the call to extract_bit_field_1 below
if successful.

[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev

2018-10-10 Thread ebotcazou at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968

Eric Botcazou  changed:

   What|Removed |Added

 CC||ebotcazou at gcc dot gnu.org

--- Comment #11 from Eric Botcazou  ---
> Therefore unaligned access are handled by extract_bit_field. This in turns
> call extract_bit_field_1 and later extract_integral_bit_field where things
> are different between little endian and big endian. For little endian, it
> goes in the following if block:
> 
>   /* If OP0 is a memory, try copying it to a register and seeing if a
>  cheap register alternative is available.  */
>   if (MEM_P (op0) & !reverse)
> 
> For big endian it continues and calls extract_fixed_bit_field. I'm wondering
> if an extra call to flip_storage_order when reverse is true would solve the
> issue. Need to understand better whe is it safe to call flip_storage_order.

Where do you want to add a call to flip_storage_order exactly?  The correct
thing to do would be to move the !reverse test from the top-level if to the
first inner if (in order to bypass only the extv business) and see what happens
next (and be prepared for adjusting adjust_bit_field_mem_for_reg to reverse
SSO).

[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev

2018-10-09 Thread thopre01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968

Thomas Preud'homme  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2018-10-09
   Assignee|unassigned at gcc dot gnu.org  |thopre01 at gcc dot 
gnu.org
 Ever confirmed|0   |1

--- Comment #10 from Thomas Preud'homme  ---
(In reply to Thomas Preud'homme from comment #8)
> (In reply to Thomas Preud'homme from comment #7)
> > (In reply to Thomas Preud'homme from comment #6)
> > > Happens at expand time. Diving in.
> > 
> > There's a giant if in expand_expr_real_1 with the following comment:
> > 
> > /* In cases where an aligned union has an unaligned object
> >as a field, we might be extracting a BLKmode value from
> >an integer-mode (e.g., SImode) object.  Handle this case
> >by doing the extract into an object as wide as the field
> >(which we know to be the width of a basic mode), then
> >storing into memory, and changing the mode to BLKmode.  */
> > 
> > The "if" is entered in the big endian unaligned case but not in the other
> > case. In the aligned case, it continues after the if until the call to
> > flip_storage_order which will generate the bswap.
> 
> In the aligned case, the if is not taken because alignment of the memory Vs
> access is sufficient. There is provision to call flip_storage_order but only
> if the access is a RECORD and here the mode class is INT.
> 
> Therefore unaligned access are handled by extract_bit_field. This in turns
> call extract_bit_field_1 and later extract_integral_bit_field where things
> are different between little endian and big endian. For little endian, it
> goes in the following if block:
> 
>   /* If OP0 is a memory, try copying it to a register and seeing if a
>  cheap register alternative is available.  */
>   if (MEM_P (op0) & !reverse)
> 
> For big endian it continues and calls extract_fixed_bit_field. I'm wondering
> if an extra call to flip_storage_order when reverse is true would solve the
> issue. Need to understand better whe is it safe to call flip_storage_order.

It gives me the expected assembly but I need to convince myself that this is
always safe.

[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev

2018-10-09 Thread thopre01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968

--- Comment #9 from Thomas Preud'homme  ---
(In reply to Thomas Preud'homme from comment #8)
> (In reply to Thomas Preud'homme from comment #7)
> > (In reply to Thomas Preud'homme from comment #6)
> > > Happens at expand time. Diving in.
> > 
> > There's a giant if in expand_expr_real_1 with the following comment:
> > 
> > /* In cases where an aligned union has an unaligned object
> >as a field, we might be extracting a BLKmode value from
> >an integer-mode (e.g., SImode) object.  Handle this case
> >by doing the extract into an object as wide as the field
> >(which we know to be the width of a basic mode), then
> >storing into memory, and changing the mode to BLKmode.  */
> > 
> > The "if" is entered in the big endian unaligned case but not in the other
> > case. In the aligned case, it continues after the if until the call to
> > flip_storage_order which will generate the bswap.
> 
> In the aligned case, the if is not taken because alignment of the memory Vs
> access is sufficient. There is provision to call flip_storage_order but only
> if the access is a RECORD and here the mode class is INT.
> 
> Therefore unaligned access are handled by extract_bit_field. This in turns
> call extract_bit_field_1 and later extract_integral_bit_field where things
> are different between little endian and big endian. For little endian, it
> goes in the following if block:
> 
>   /* If OP0 is a memory, try copying it to a register and seeing if a
>  cheap register alternative is available.  */
>   if (MEM_P (op0) & !reverse)
> 
> For big endian it continues and calls extract_fixed_bit_field. I'm wondering
> if an extra call to flip_storage_order when reverse is true would solve the
> issue. Need to understand better whe is it safe to call flip_storage_order.

It gives me the expected assembly but I need to convince myself that this is
always safe.

[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev

2018-10-09 Thread thopre01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968

--- Comment #8 from Thomas Preud'homme  ---
(In reply to Thomas Preud'homme from comment #7)
> (In reply to Thomas Preud'homme from comment #6)
> > Happens at expand time. Diving in.
> 
> There's a giant if in expand_expr_real_1 with the following comment:
> 
> /* In cases where an aligned union has an unaligned object
>as a field, we might be extracting a BLKmode value from
>an integer-mode (e.g., SImode) object.  Handle this case
>by doing the extract into an object as wide as the field
>(which we know to be the width of a basic mode), then
>storing into memory, and changing the mode to BLKmode.  */
> 
> The "if" is entered in the big endian unaligned case but not in the other
> case. In the aligned case, it continues after the if until the call to
> flip_storage_order which will generate the bswap.

In the aligned case, the if is not taken because alignment of the memory Vs
access is sufficient. There is provision to call flip_storage_order but only if
the access is a RECORD and here the mode class is INT.

Therefore unaligned access are handled by extract_bit_field. This in turns call
extract_bit_field_1 and later extract_integral_bit_field where things are
different between little endian and big endian. For little endian, it goes in
the following if block:

  /* If OP0 is a memory, try copying it to a register and seeing if a
 cheap register alternative is available.  */
  if (MEM_P (op0) & !reverse)

For big endian it continues and calls extract_fixed_bit_field. I'm wondering if
an extra call to flip_storage_order when reverse is true would solve the issue.
Need to understand better whe is it safe to call flip_storage_order.

[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev

2018-10-09 Thread thopre01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968

Thomas Preud'homme  changed:

   What|Removed |Added

 CC||thopre01 at gcc dot gnu.org

--- Comment #7 from Thomas Preud'homme  ---
(In reply to Thomas Preud'homme from comment #6)
> Happens at expand time. Diving in.

There's a giant if in expand_expr_real_1 with the following comment:

/* In cases where an aligned union has an unaligned object
   as a field, we might be extracting a BLKmode value from
   an integer-mode (e.g., SImode) object.  Handle this case
   by doing the extract into an object as wide as the field
   (which we know to be the width of a basic mode), then
   storing into memory, and changing the mode to BLKmode.  */

The "if" is entered in the big endian unaligned case but not in the other case.
In the aligned case, it continues after the if until the call to
flip_storage_order which will generate the bswap.

[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev

2018-10-09 Thread thopre01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968

--- Comment #6 from Thomas Preud'homme  ---
Happens at expand time. Diving in.

[Bug target/86968] Unaligned big-endian (scalar_storage_order) access on armv7-a yields 4 ldrb instructions rather than ldr+rev

2018-10-09 Thread ramana at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86968

Ramana Radhakrishnan  changed:

   What|Removed |Added

 CC||ramana at gcc dot gnu.org

--- Comment #5 from Ramana Radhakrishnan  ---
(In reply to jos...@codesourcery.com from comment #4)
> Any unaligned access things that don't work for big-endian ARM are 
> probably fallout from the issues with big-endian NEON (NEON architectural 
> lane numbers are different from the architecture-independent lane numbers 
> in GNU C vector extensions and GCC IR, and GCC expects each machine mode 
> to have a single defined memory layout and a single defined layout in any 
> given register, and to be able to move between core and NEON registers, 
> and between core registers and memory, in the respective layouts used for 
> those registers, but some NEON loads and stores for big-endian don't work 
> with those expectations, so unaligned vector operations are limited for 
> big-endian ARM).

Correct, we don't allow misaligned access for Neon because of exactly the above
mentioned reasons. 


I would have however expected misaligned access to work with -march=armv7-a
-munaligned-access -mfpu=vfpv3-d16 -mfloat-abi=softfp/hard on the command line
for the afore mentioned testcase as we do have a movmisalign pattern in arm.md
that should kick in overriding the movmisalign pattern in neon.md.  It probably
needs a little more detailed investigation.