Re: RFC: New mechanism for hard reg operands to inline asm

2024-03-15 Thread Stefan Schulze Frielinghaus
On Fri, Jun 04, 2021 at 06:02:27PM +, Andreas Krebbel via Gcc wrote:
> Hi,
> 
> I wonder if we could replace the register asm construct for
> inline assemblies with something a bit nicer and more obvious.
> E.g. turning this (real world example from IBM Z kernel code):
> 
> int diag8_response(int cmdlen, char *response, int *rlen)
> {
> register unsigned long reg2 asm ("2") = (addr_t) cpcmd_buf;
> register unsigned long reg3 asm ("3") = (addr_t) response;
> register unsigned long reg4 asm ("4") = cmdlen | 0x4000L;
> register unsigned long reg5 asm ("5") = *rlen; /* <-- */
> asm volatile(
> "   diag%2,%0,0x8\n"
> "   brc 8,1f\n"
> "   agr %1,%4\n"
> "1:\n"
> : "+d" (reg4), "+d" (reg5)
> : "d" (reg2), "d" (reg3), "d" (*rlen): "cc");
> *rlen = reg5;
> return reg4;
> }
> 
> into this:
> 
> int diag8_response(int cmdlen, char *response, int *rlen)
> {
> unsigned long len = cmdlen | 0x4000L;
> 
> asm volatile(
> "   diag%2,%0,0x8\n"
> "   brc 8,1f\n"
> "   agr %1,%4\n"
> "1:\n"
> : "+{r4}" (len), "+{r5}" (*rlen)
> : "{r2}" ((addr_t)cpcmd_buf), "{r3}" ((addr_t)response), "d" 
> (*rlen): "cc");
> return len;
> }
> 
> Apart from being much easier to read because the hard regs become part
> of the inline assembly it solves also a couple of other issues:
> 
> - function calls might clobber register asm variables see BZ100908
> - the constraints for the register asm operands are superfluous
> - one register asm variable cannot be used for 2 different inline
>   assemblies if the value is expected in different hard regs
> 
> I've started with a hackish implementation for IBM Z using the
> TARGET_MD_ASM_ADJUST hook and let all the places parsing constraints
> skip over the {} parts.  But perhaps it would be useful to make this a
> generic mechanism for all targets?!
> 
> Andrea

Hi all,

I would like to resurrect this topic
https://gcc.gnu.org/pipermail/gcc/2021-June/236269.html and have been
coming up with a first implementation in order to discuss this further.

Basically, I see two ways to implement this.  First is by letting LRA
assign the registers and the second one by introducing extra moves just
before/after asm statements.  Currently I went for the latter and emit
extra moves during expand into hard regs as specified by the
input/output constraints.

Before going forward I would like to get some feedback whether this approach
makes sense to you at all or whether you see some show stoppers.  I was
wondering whether my current approach is robust enough in the sense that no
other pass could potentially remove the extra moves I introduced before.
In particular I was first worried about code motion.  Initially I thought I
have to make use not only of hard regs but hard regs which are flagged as
register-asms in order to prevent optimizations to fiddly around with those
moves.  However, after some more investigation I tend to conclude that this is
not necessary.  Any thoughts about this approach?

With the current approach I can at least handle cases like:

int __attribute__ ((noipa))
foo (int x) { return x; }

int test (int x)
{
  asm ("foo %0,%1\n" :: "{r3}" (foo (x + 1)), "{r2}" (x));
  return x;
}

Note, this is written with the s390 ABI in mind where the first int argument
and return value are passed in register r2.  The point here is that r2 needs to
be altered and restored multiple times until we reach } of function test().
Luckily, during expand we get all this basically for free.

This brings me to the general question what should be allowed and what not?
Evaluation order of input expressions is probably unspecified similar to
function arguments.  However, what about this one:

int test (int x)
{
  register int y asm ("r5") = x + 1;
  asm ("foo %0,%1\n" : "={r4}" (y) : "{r1}" (y));
  return y;
}

IMHO the input is just fine but the output constraint is misleading and it is
not obvious in which register variable y resides after the asm statement.
With my current implementation, were I don't bail out, it is register r4
contrary to the decl.  Interestingly, the other way around where one register
is "aliased" by multiple variables is accepted by vanilla GCC:

int foo (int x, int y)
{
  register int a asm ("r1") = x;
  register int b asm ("r1") = y;
  return a + b;
}

Though, probably not intentionally.

Cheers,
Stefan


Re: RFC: New mechanism for hard reg operands to inline asm

2021-06-07 Thread Richard Sandiford via Gcc
Paul Koning via Gcc  writes:
>> On Jun 4, 2021, at 2:02 PM, Andreas Krebbel via Gcc  wrote:
>> 
>> Hi,
>> 
>> I wonder if we could replace the register asm construct for
>> inline assemblies with something a bit nicer and more obvious.
>> E.g. turning this (real world example from IBM Z kernel code):
>> 
>> int diag8_response(int cmdlen, char *response, int *rlen)
>> {
>>register unsigned long reg2 asm ("2") = (addr_t) cpcmd_buf;
>>register unsigned long reg3 asm ("3") = (addr_t) response;
>>register unsigned long reg4 asm ("4") = cmdlen | 0x4000L;
>>register unsigned long reg5 asm ("5") = *rlen; /* <-- */
>>asm volatile(
>>"   diag%2,%0,0x8\n"
>>"   brc 8,1f\n"
>>"   agr %1,%4\n"
>>"1:\n"
>>: "+d" (reg4), "+d" (reg5)
>>: "d" (reg2), "d" (reg3), "d" (*rlen): "cc");
>>*rlen = reg5;
>>return reg4;
>> }
>> 
>> into this:
>> 
>> int diag8_response(int cmdlen, char *response, int *rlen)
>> {
>>unsigned long len = cmdlen | 0x4000L;
>> 
>>asm volatile(
>>"   diag%2,%0,0x8\n"
>>"   brc 8,1f\n"
>>"   agr %1,%4\n"
>>"1:\n"
>>: "+{r4}" (len), "+{r5}" (*rlen)
>>: "{r2}" ((addr_t)cpcmd_buf), "{r3}" ((addr_t)response), "d" 
>> (*rlen): "cc");
>>return len;
>> }
>> 
>> Apart from being much easier to read because the hard regs become part
>> of the inline assembly it solves also a couple of other issues:
>> 
>> - function calls might clobber register asm variables see BZ100908
>> - the constraints for the register asm operands are superfluous
>> - one register asm variable cannot be used for 2 different inline
>>  assemblies if the value is expected in different hard regs
>> 
>> I've started with a hackish implementation for IBM Z using the
>> TARGET_MD_ASM_ADJUST hook and let all the places parsing constraints
>> skip over the {} parts.  But perhaps it would be useful to make this a
>> generic mechanism for all targets?!
>> 
>> Andreas
>
> Yes, I would think this should be made a general mechanism that any target 
> could use.

+1 FWIW.  I think this would also avoid the common confusion around
the semantics of register asms.

We would need to check whether { is used as a constraint string by
any target (hepefully not).

Richard


Re: RFC: New mechanism for hard reg operands to inline asm

2021-06-04 Thread Andreas Krebbel via Gcc
On 6/4/21 8:18 PM, Paul Koning wrote:
...
> Yes, I would think this should be made a general mechanism that any target 
> could use.
> 
> I wonder if instead of creating a new mechanism you could do this simply by 
> creating new constraint names, where each name matches exactly one hard 
> register.  That's roughly what this amounts to, isn't it? 

I thought about this as well but I'm not sure this would work well without 
changing LRA. It would
mean to introduce many register classes with single registers. The presence of 
many "sparse"
register classes to my understanding would affect register allocation.

Perhaps it would work to just emit all the moves into or out of hard regs at 
expand time.

Andreas


Re: RFC: New mechanism for hard reg operands to inline asm

2021-06-04 Thread Paul Koning via Gcc



> On Jun 4, 2021, at 2:02 PM, Andreas Krebbel via Gcc  wrote:
> 
> Hi,
> 
> I wonder if we could replace the register asm construct for
> inline assemblies with something a bit nicer and more obvious.
> E.g. turning this (real world example from IBM Z kernel code):
> 
> int diag8_response(int cmdlen, char *response, int *rlen)
> {
>register unsigned long reg2 asm ("2") = (addr_t) cpcmd_buf;
>register unsigned long reg3 asm ("3") = (addr_t) response;
>register unsigned long reg4 asm ("4") = cmdlen | 0x4000L;
>register unsigned long reg5 asm ("5") = *rlen; /* <-- */
>asm volatile(
>"   diag%2,%0,0x8\n"
>"   brc 8,1f\n"
>"   agr %1,%4\n"
>"1:\n"
>: "+d" (reg4), "+d" (reg5)
>: "d" (reg2), "d" (reg3), "d" (*rlen): "cc");
>*rlen = reg5;
>return reg4;
> }
> 
> into this:
> 
> int diag8_response(int cmdlen, char *response, int *rlen)
> {
>unsigned long len = cmdlen | 0x4000L;
> 
>asm volatile(
>"   diag%2,%0,0x8\n"
>"   brc 8,1f\n"
>"   agr %1,%4\n"
>"1:\n"
>: "+{r4}" (len), "+{r5}" (*rlen)
>: "{r2}" ((addr_t)cpcmd_buf), "{r3}" ((addr_t)response), "d" 
> (*rlen): "cc");
>return len;
> }
> 
> Apart from being much easier to read because the hard regs become part
> of the inline assembly it solves also a couple of other issues:
> 
> - function calls might clobber register asm variables see BZ100908
> - the constraints for the register asm operands are superfluous
> - one register asm variable cannot be used for 2 different inline
>  assemblies if the value is expected in different hard regs
> 
> I've started with a hackish implementation for IBM Z using the
> TARGET_MD_ASM_ADJUST hook and let all the places parsing constraints
> skip over the {} parts.  But perhaps it would be useful to make this a
> generic mechanism for all targets?!
> 
> Andreas

Yes, I would think this should be made a general mechanism that any target 
could use.

I wonder if instead of creating a new mechanism you could do this simply by 
creating new constraint names, where each name matches exactly one hard 
register.  That's roughly what this amounts to, isn't it? 

paul



RFC: New mechanism for hard reg operands to inline asm

2021-06-04 Thread Andreas Krebbel via Gcc
Hi,

I wonder if we could replace the register asm construct for
inline assemblies with something a bit nicer and more obvious.
E.g. turning this (real world example from IBM Z kernel code):

int diag8_response(int cmdlen, char *response, int *rlen)
{
register unsigned long reg2 asm ("2") = (addr_t) cpcmd_buf;
register unsigned long reg3 asm ("3") = (addr_t) response;
register unsigned long reg4 asm ("4") = cmdlen | 0x4000L;
register unsigned long reg5 asm ("5") = *rlen; /* <-- */
asm volatile(
"   diag%2,%0,0x8\n"
"   brc 8,1f\n"
"   agr %1,%4\n"
"1:\n"
: "+d" (reg4), "+d" (reg5)
: "d" (reg2), "d" (reg3), "d" (*rlen): "cc");
*rlen = reg5;
return reg4;
}

into this:

int diag8_response(int cmdlen, char *response, int *rlen)
{
unsigned long len = cmdlen | 0x4000L;

asm volatile(
"   diag%2,%0,0x8\n"
"   brc 8,1f\n"
"   agr %1,%4\n"
"1:\n"
: "+{r4}" (len), "+{r5}" (*rlen)
: "{r2}" ((addr_t)cpcmd_buf), "{r3}" ((addr_t)response), "d" 
(*rlen): "cc");
return len;
}

Apart from being much easier to read because the hard regs become part
of the inline assembly it solves also a couple of other issues:

- function calls might clobber register asm variables see BZ100908
- the constraints for the register asm operands are superfluous
- one register asm variable cannot be used for 2 different inline
  assemblies if the value is expected in different hard regs

I've started with a hackish implementation for IBM Z using the
TARGET_MD_ASM_ADJUST hook and let all the places parsing constraints
skip over the {} parts.  But perhaps it would be useful to make this a
generic mechanism for all targets?!

Andreas