
Re: libgcc: strange optimization

2011-08-09 Thread Richard Earnshaw

On 02/08/11 13:22, Richard Guenther wrote:
 On Tue, Aug 2, 2011 at 2:06 PM, Mikael Pettersson mi...@it.uu.se wrote:
 Michael Walle writes:
  
   Hi,
  
To confirm that try -fno-tree-ter.
  
   lm32-gcc -O1 -fno-tree-ter -S -c test.c generates the following working
   assembly code:
  
   f2:
        addi  sp, sp, -4
        sw    (sp+4), ra
        addi  r2, r0, 10
        calli __ashrsi3
        addi  r8, r0, 10
        scall
        lw    ra, (sp+4)
        addi  sp, sp, 4
        b     ra

 -fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.
 
 It's of course only a workaround, not a real fix as nothing prevents
 other optimizers from performing the re-scheduling TER does.
 
 I suggest to amend the documentation for local call-clobbered register
 variables to say that the only valid sequence using them is from a
 non-inlinable function that contains only direct initializations of the
 register variables from constants or parameters.
 
 Or go one step further and deprecate local register variables altogether
 (they IMHO don't make much sense, and rather the targets should provide
 a way to properly constrain asm inputs and outputs).
 
 Richard.
 
 

Better still would be to change the specification and implementation of
local register variables to only guarantee them at the beginning of ASM
statements.  At other times they are simply the same as other local
variables.  Now we have a problem that the register allocator knows how
to solve.

In other words, if the user writes

bar (int y)
{
  register int x asm ("r0") = y;

  foo ();

  asm volatile ("mov r1, r0");

}

The compiler will generate
(set (reg:SI 999 x) (reg:SI y))
(call foo)
(set (reg:SI 0 r0) (reg:SI 999 x))
(asm mov r1, r0)
(set (reg:SI 999 x) (reg:SI 0 r0))

That is, it inserts appropriate set insns around asm blocks.  Of course,
the register allocator can try to allocate reg 999 to r0 and if it
succeeds, then the sets become dead.  But if it fails then at least the
code will continue to execute as intended.

R.
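
The effect proposed above (the register is only guaranteed at the asm boundary) can be approximated by hand in today's GNU C: do all calls and arithmetic on ordinary locals, then initialize the register variables immediately before the asm that consumes them. A minimal sketch, assuming GCC or Clang on x86-64; the function names and the choice of rdi/rax are illustrative only and not from the thread, and other targets fall back to plain C.

```c
/* Stand-in for anything that may turn into a call (for example a libgcc
   helper such as __ashrsi3 in the original report). */
static long halve(long v)
{
    return v / 2;
}

/* Returns halve(a) + 1, with the asm operands pinned to rdi and rax. */
long halve_plus_one(long a)
{
    long t = halve(a);                      /* calls use ordinary locals */
#if defined(__GNUC__) && defined(__x86_64__)
    register long in __asm__("rdi") = t;    /* bound right before the asm */
    register long out __asm__("rax");
    /* Both variables are passed as operands, so the compiler knows
       that the asm reads rdi and writes rax. */
    __asm__("leaq 1(%1), %0" : "=r"(out) : "r"(in));
    return out;
#else
    return t + 1;                           /* portable fallback, no pinning */
#endif
}
```

Nothing that could become a call sits between the register initialization and the asm, which is the ordering the lm32 and ARM test cases depended on.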

Re: libgcc: strange optimization

2011-08-09 Thread Ulrich Weigand

Richard Earnshaw wrote:

 Better still would be to change the specification and implementation of
 local register variables to only guarantee them at the beginning of ASM
 statements.  At other times they are simply the same as other local
 variables.  Now we have a problem that the register allocator knows how
 to solve.

This seems to be pretty much the same as my proposal here:
http://gcc.gnu.org/ml/gcc/2011-08/msg00064.html

But there was some push-back on requiring additional semantics
by some users ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com

Re: libgcc: strange optimization

2011-08-09 Thread Hans-Peter Nilsson

On Tue, 9 Aug 2011, Ulrich Weigand wrote:
 Richard Earnshaw wrote:

  Better still would be to change the specification and implementation of
  local register variables to only guarantee them at the beginning of ASM
  statements.  At other times they are simply the same as other local
  variables.  Now we have a problem that the register allocator knows how
  to solve.

 This seems to be pretty much the same as my proposal here:
 http://gcc.gnu.org/ml/gcc/2011-08/msg00064.html

 But there was some push-back on requiring additional semantics
 by some users ...

Don't feel bad, at least we seem to have overwhelming consensus
on what to do for local asm-declared register variables when
they feed asm statements! :)

I found an example where I have an asm-declared register that
was used not just for the primary asm statement, but I'm ok with
those other uses not using the declared register, just as warned
by the documentation.  (I don't think gcc can better assign
another register, but that's beside the point.)

brgds, H-P

Re: libgcc: strange optimization

2011-08-09 Thread Hans-Peter Nilsson

On Tue, 9 Aug 2011, Richard Earnshaw wrote:
 Better still would be to change the specification and implementation of
 local register variables to only guarantee them at the beginning of ASM
 statements.

Only for those asm statements taking the same asm-register
variables as arguments.

  At other times they are simply the same as other local
 variables.  Now we have a problem that the register allocator knows how
 to solve.

 In other words, if the user writes

 bar (int y)
 {
   register int x asm ("r0") = y;

   foo ();

   asm volatile ("mov r1, r0");

 }

 The compiler will generate
 (set (reg:SI 999 x) (reg:SI y))
 (call foo)
 (set (reg:SI 0 r0) (reg:SI 999 x))
 (asm mov r1, r0)
 (set (reg:SI 999 x) (reg:SI 0 r0))

It should rather eliminate the variable x and its assignment as
it isn't used in a way properly conveyed to gcc: the occurrence
of the string r0 in the asm should not be considered.

I like Ulrich Weigand's proposal better, not the least because
it's how it's already documented to work.

brgds, H-P

Re: libgcc: strange optimization

2011-08-08 Thread Richard Guenther

On Sat, Aug 6, 2011 at 5:00 PM, Paolo Bonzini bonz...@gnu.org wrote:
 On 08/04/2011 01:10 PM, Andrew Haley wrote:

   It's the sort of thing that gets done in threaded interpreters,
   where you really need to keep a few pointers in registers and
   the interpreter itself is a very long function.  gcc has always
   done a dreadful job of register allocation in such cases.

 
   Sure, but what I have seen is people using global register variables
   for this (which means they get taken away from the register
  allocator).

 Not always though, and the x86 has so few registers that using a
 global register variable is very problematic.  I suppose you could
 compile the threaded interpreter in a file of its own, but I'm not
 sure that has quite the same semantics as local register variables.

 Indeed, local register variables give almost the same benefit as globals
 with half the burden.  The idea is that you don't care about the exact
 register that holds the contents but, by specifying a callee-save register,
 GCC will use those instead of memory across calls.  This reduces _a lot_ the
 number of spills.

 The problem is that people who care about this stuff very much don't
 always read gcc@gcc.gnu.org so won't be heard.  But in their own world
 (LISP, Forth) nice features like register variables and labels as
 values have led to gcc being the preferred compiler for this kind of
 work.

 /me raises hands.

 For GNU Smalltalk, using

 #if defined(__i386__)
 # define __DECL_REG1 __asm("%esi")
 # define __DECL_REG2 __asm("%edi")
 # define __DECL_REG3 /* no more callee-save regs if PIC is in use!  */
 #endif

 #if defined(__x86_64__)
 # define __DECL_REG1 __asm("%r12")
 # define __DECL_REG2 __asm("%r13")
 # define __DECL_REG3 __asm("%rbx")
 #endif

 ...

  register unsigned char *ip __DECL_REG1;
  register OOP * sp __DECL_REG2;
  register intptr_t arg __DECL_REG3;

 improves performance by up to 20% if I remember correctly.  I can benchmark
 it if desired.

 It does not come for free, in some cases the register allocator does some
 stupid things due to the hard register declaration.  But it gets much better
 code overall, so who cares about the microoptimization.

 Of course, if the register allocator did the right thing, or if I could use
 simply

  unsigned char *ip __attribute__((__do_not_spill_me__(20)));
  OOP *sp __attribute__((__do_not_spill_me__(10)));
  intptr_t arg __attribute__((__do_not_spill_me__(0)));

 that would be just fine.

Like if

register unsigned char *ip;

would increase spill cost of ip compared to

unsigned char *ip;

?  It is, after all, a cost issue - forcefully pinning down registers can
lead to problems.  We'd of course have to somehow preserve the
register state of ip for all relevant pseudos (and avoid coalescing with
non-register ones).

Richard.

 Paolo

Re: libgcc: strange optimization

2011-08-08 Thread Paolo Bonzini


On 08/08/2011 10:06 AM, Richard Guenther wrote:

Like if

register unsigned char *ip;

would increase spill cost of ip compared to

unsigned char *ip;

?


Remember we're talking about a function with 11000 pseudos and 4000 
allocnos (not to mention 1500 basic blocks).  You cannot really blame 
IRA for not doing the right thing.  And actually, ip and sp are live 
everywhere, so there's no hope of reserving a register for them, 
especially since all x86 callee-save registers have special uses in 
string functions.


If I understand the huge dumps correctly, the missing part is trying to 
use callee-save registers for spilling, rather than memory.  However, 
perhaps another way to do it is a specialized region management scheme 
for large switch statements, treating each switch arm as a separate 
region?  There are few registers live across the switch, and all of 
them are used either a lot or almost never (and always in cold blocks).


BTW, here are some measurements on x86-64:

1) with regalloc hints: 450060432 bytecodes/sec; 12819996 calls/sec
2) without regalloc hints: 263002439 bytecodes/sec; 9458816 sends/sec

Probably even worse on x86-32.

None of -fira-region=all, -fira-region=one, -fira-algorithm=priority had 
significant changes.  In fact, it's pretty much a binary result: I'd 
expect register allocation results to be either on par with (1) or 
similar to (2); everything else is mostly noise.


Paolo

Re: libgcc: strange optimization

2011-08-06 Thread Paolo Bonzini


On 08/04/2011 01:10 PM, Andrew Haley wrote:

  It's the sort of thing that gets done in threaded interpreters,
  where you really need to keep a few pointers in registers and
  the interpreter itself is a very long function.  gcc has always
  done a dreadful job of register allocation in such cases.


  Sure, but what I have seen is people using global register variables
  for this (which means they get taken away from the register allocator).


Not always though, and the x86 has so few registers that using a
global register variable is very problematic.  I suppose you could
compile the threaded interpreter in a file of its own, but I'm not
sure that has quite the same semantics as local register variables.


Indeed, local register variables give almost the same benefit as globals 
with half the burden.  The idea is that you don't care about the exact 
register that holds the contents but, by specifying a callee-save 
register, GCC will use those instead of memory across calls.  This 
reduces _a lot_ the number of spills.



The problem is that people who care about this stuff very much don't
always read gcc@gcc.gnu.org so won't be heard.  But in their own world
(LISP, Forth) nice features like register variables and labels as
values have led to gcc being the preferred compiler for this kind of
work.


/me raises hands.

For GNU Smalltalk, using

#if defined(__i386__)
# define __DECL_REG1 __asm("%esi")
# define __DECL_REG2 __asm("%edi")
# define __DECL_REG3 /* no more callee-save regs if PIC is in use!  */
#endif

#if defined(__x86_64__)
# define __DECL_REG1 __asm("%r12")
# define __DECL_REG2 __asm("%r13")
# define __DECL_REG3 __asm("%rbx")
#endif

...

  register unsigned char *ip __DECL_REG1;
  register OOP * sp __DECL_REG2;
  register intptr_t arg __DECL_REG3;

improves performance by up to 20% if I remember correctly.  I can 
benchmark it if desired.


It does not come for free, in some cases the register allocator does 
some stupid things due to the hard register declaration.  But it gets 
much better code overall, so who cares about the microoptimization.


Of course, if the register allocator did the right thing, or if I could 
use simply


  unsigned char *ip __attribute__((__do_not_spill_me__(20)));
  OOP *sp __attribute__((__do_not_spill_me__(10)));
  intptr_t arg __attribute__((__do_not_spill_me__(0)));

that would be just fine.

Paolo
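
A reduced version of the pattern described in this message might look like the sketch below: two locals that are live across a call in a hot loop are declared as register variables in callee-saved registers. It assumes GCC or Clang on x86-64 (elsewhere the macros expand to nothing); DECL_REG1, DECL_REG2, weigh and weigh_all are hypothetical names, not GNU Smalltalk code. Note that the GCC manual only guarantees the register when the variable is an operand of an extended asm, so the placement here is a hint that happens to work in practice, not a promise.

```c
#include <stddef.h>

#if defined(__GNUC__) && defined(__x86_64__)
# define DECL_REG1 __asm__("r12")   /* callee-saved on x86-64 */
# define DECL_REG2 __asm__("r13")   /* callee-saved on x86-64 */
#else
# define DECL_REG1                  /* no pinning on other targets */
# define DECL_REG2
#endif

/* Hypothetical per-byte handler; kept out of line so that the loop
   below really contains a call. */
#if defined(__GNUC__)
__attribute__((noinline))
#endif
static unsigned long weigh(unsigned char b)
{
    return (unsigned long)b * 2u;
}

/* Sums weigh() over the buffer; ip and end are live across every call. */
unsigned long weigh_all(const unsigned char *buf, size_t len)
{
    register const unsigned char *ip DECL_REG1 = buf;
    register const unsigned char *end DECL_REG2 = buf + len;
    unsigned long total = 0;

    while (ip != end)
        total += weigh(*ip++);
    return total;
}
```

Because r12 and r13 survive calls, a compiler that honors the declarations has no reason to spill ip and end to the stack around weigh().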

Re: libgcc: strange optimization

2011-08-04 Thread Andreas Schwab

Hans-Peter Nilsson h...@bitrange.com writes:

 To make sure, it'd be nice if someone could perhaps grep an
 entire GNU/Linux-or-other distribution including the kernel for
 uses of asm-declared *local* registers that don't directly feed
 into asms and not being the stack-pointer?

One frequent candidate is the global pointer.

Andreas.

-- 
Andreas Schwab, sch...@redhat.com
GPG Key fingerprint = D4E8 DBE3 3813 BB5D FA84  5EC7 45C6 250E 6F00 984E
And now for something completely different.

Re: libgcc: strange optimization

2011-08-04 Thread Andrew Haley

On 08/04/2011 01:19 AM, Hans-Peter Nilsson wrote:

 To make sure, it'd be nice if someone could perhaps grep an
 entire GNU/Linux-or-other distribution including the kernel for
 uses of asm-declared *local* registers that don't directly feed
 into asms and not being the stack-pointer?  Or can we get away
 with just saying that local asm registers haven't had any other
 documented meaning for the last seven years?

It's the sort of thing that gets done in threaded interpreters,
where you really need to keep a few pointers in registers and
the interpreter itself is a very long function.  gcc has always
done a dreadful job of register allocation in such cases.

Andrew.
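
For readers who have not seen the kind of code meant here: a threaded interpreter usually dispatches with GCC's labels-as-values extension and keeps the instruction pointer and stack pointer in locals for the whole (long) function. A minimal sketch, assuming GCC or Clang; the opcode set and the name run_bytecode are made up for illustration and are not taken from any interpreter mentioned in this thread.

```c
/* Opcodes of a toy stack machine (illustrative only). */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Runs the bytecode and returns the top of stack at OP_HALT.
   No bounds checking: the program must be well formed and must
   never hold more than 64 values on the stack. */
long run_bytecode(const unsigned char *code)
{
    /* One label address per opcode, indexed by opcode value. */
    static void *const dispatch[] = {
        &&do_push, &&do_add, &&do_mul, &&do_halt
    };
    long stack[64];
    const unsigned char *ip = code;   /* live across the whole function */
    long *sp = stack;                 /* points one past the top element */

#define NEXT() goto *dispatch[*ip++]

    NEXT();

do_push:
    *sp++ = (long)*ip++;              /* operand byte follows the opcode */
    NEXT();
do_add:
    sp--;
    sp[-1] += sp[0];
    NEXT();
do_mul:
    sp--;
    sp[-1] *= sp[0];
    NEXT();
do_halt:
    return sp[-1];

#undef NEXT
}
```

In a real interpreter with hundreds of such labels, ip and sp are used in every arm, which is why their register allocation dominates performance.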

Re: libgcc: strange optimization

2011-08-04 Thread Richard Guenther

On Thu, Aug 4, 2011 at 11:50 AM, Andrew Haley a...@redhat.com wrote:
 On 08/04/2011 01:19 AM, Hans-Peter Nilsson wrote:

 To make sure, it'd be nice if someone could perhaps grep an
 entire GNU/Linux-or-other distribution including the kernel for
 uses of asm-declared *local* registers that don't directly feed
 into asms and not being the stack-pointer?  Or can we get away
 with just saying that local asm registers haven't had any other
 documented meaning for the last seven years?

 It's the sort of thing that gets done in threaded interpreters,
 where you really need to keep a few pointers in registers and
 the interpreter itself is a very long function.  gcc has always
 done a dreadful job of register allocation in such cases.

Sure, but what I have seen is people using global register variables
for this (which means they get taken away from the register allocator).

Richard.

 Andrew.

Re: libgcc: strange optimization

2011-08-04 Thread Andrew Haley

On 08/04/2011 10:52 AM, Richard Guenther wrote:
 On Thu, Aug 4, 2011 at 11:50 AM, Andrew Haley a...@redhat.com wrote:
 On 08/04/2011 01:19 AM, Hans-Peter Nilsson wrote:

 To make sure, it'd be nice if someone could perhaps grep an
 entire GNU/Linux-or-other distribution including the kernel for
 uses of asm-declared *local* registers that don't directly feed
 into asms and not being the stack-pointer?  Or can we get away
 with just saying that local asm registers haven't had any other
 documented meaning for the last seven years?

 It's the sort of thing that gets done in threaded interpreters,
 where you really need to keep a few pointers in registers and
 the interpreter itself is a very long function.  gcc has always
 done a dreadful job of register allocation in such cases.
 
 Sure, but what I have seen is people using global register variables
 for this (which means they get taken away from the register allocator).

Not always though, and the x86 has so few registers that using a
global register variable is very problematic.  I suppose you could
compile the threaded interpreter in a file of its own, but I'm not
sure that has quite the same semantics as local register variables.

The problem is that people who care about this stuff very much don't
always read gcc@gcc.gnu.org so won't be heard.  But in their own world
(LISP, Forth) nice features like register variables and labels as
values have led to gcc being the preferred compiler for this kind of
work.

Andrew.

Re: libgcc: strange optimization

2011-08-04 Thread Richard Guenther

On Thu, Aug 4, 2011 at 1:10 PM, Andrew Haley a...@redhat.com wrote:
 On 08/04/2011 10:52 AM, Richard Guenther wrote:
 On Thu, Aug 4, 2011 at 11:50 AM, Andrew Haley a...@redhat.com wrote:
 On 08/04/2011 01:19 AM, Hans-Peter Nilsson wrote:

 To make sure, it'd be nice if someone could perhaps grep an
 entire GNU/Linux-or-other distribution including the kernel for
 uses of asm-declared *local* registers that don't directly feed
 into asms and not being the stack-pointer?  Or can we get away
 with just saying that local asm registers haven't had any other
 documented meaning for the last seven years?

 It's the sort of thing that gets done in threaded interpreters,
 where you really need to keep a few pointers in registers and
 the interpreter itself is a very long function.  gcc has always
 done a dreadful job of register allocation in such cases.

 Sure, but what I have seen is people using global register variables
 for this (which means they get taken away from the register allocator).

 Not always though, and the x86 has so few registers that using a
 global register variable is very problematic.  I suppose you could
 compile the threaded interpreter in a file of its own, but I'm not
 sure that has quite the same semantics as local register variables.

 The problem is that people who care about this stuff very much don't
 always read gcc@gcc.gnu.org so won't be heard.  But in their own world
 (LISP, Forth) nice features like register variables and labels as
 values have led to gcc being the preferred compiler for this kind of
 work.

Well, the uses won't break with the idea - they would simply work
as if they were not using local register variables.

Richard.

 Andrew.

Re: libgcc: strange optimization

2011-08-04 Thread Hans-Peter Nilsson

On Thu, 4 Aug 2011, Andreas Schwab wrote:
 Hans-Peter Nilsson h...@bitrange.com writes:

  To make sure, it'd be nice if someone could perhaps grep an
  entire GNU/Linux-or-other distribution including the kernel for
  uses of asm-declared *local* registers that don't directly feed
  into asms and not being the stack-pointer?

 One frequent candidate is the global pointer.

Yes, that too, but it's usually fixed isn't it?  What I really
meant was not being a fixed register but I don't think many
willing to grep a whole distro can tell which registers in which
gcc port are fixed and remember to look for uses of
-ffixed-reg-.

brgds, H-P

Re: libgcc: strange optimization

2011-08-04 Thread Andrew Haley

On 08/04/2011 12:19 PM, Richard Guenther wrote:
 On Thu, Aug 4, 2011 at 1:10 PM, Andrew Haley a...@redhat.com wrote:
 On 08/04/2011 10:52 AM, Richard Guenther wrote:
 On Thu, Aug 4, 2011 at 11:50 AM, Andrew Haley a...@redhat.com wrote:
 On 08/04/2011 01:19 AM, Hans-Peter Nilsson wrote:

 To make sure, it'd be nice if someone could perhaps grep an
 entire GNU/Linux-or-other distribution including the kernel for
 uses of asm-declared *local* registers that don't directly feed
 into asms and not being the stack-pointer?  Or can we get away
 with just saying that local asm registers haven't had any other
 documented meaning for the last seven years?

 It's the sort of thing that gets done in threaded interpreters,
 where you really need to keep a few pointers in registers and
 the interpreter itself is a very long function.  gcc has always
 done a dreadful job of register allocation in such cases.

 Sure, but what I have seen is people using global register variables
 for this (which means they get taken away from the register allocator).

 Not always though, and the x86 has so few registers that using a
 global register variable is very problematic.  I suppose you could
 compile the threaded interpreter in a file of its own, but I'm not
 sure that has quite the same semantics as local register variables.

 The problem is that people who care about this stuff very much don't
 always read gcc@gcc.gnu.org so won't be heard.  But in their own world
 (LISP, Forth) nice features like register variables and labels as
 values have led to gcc being the preferred compiler for this kind of
 work.
 
 Well, the uses won't break with the idea - they would simply work
 as if they were not using local register variables.

I don't understand this remark.  Surely if they work like they were
not using local register variables, you'll get dreadful register
allocation.  But this is a big reason to use gcc.  Efficient code
really does matter to people writing this kind of thing.

Andrew.

Re: libgcc: strange optimization

2011-08-03 Thread Ulrich Weigand

Richard Guenther wrote:
 On Tue, Aug 2, 2011 at 3:23 PM, Ian Lance Taylor i...@google.com wrote:
  Richard Guenther richard.guent...@gmail.com writes:
  I suggest to amend the documentation for local call-clobbered register
  variables to say that the only valid sequence using them is from a
  non-inlinable function that contains only direct initializations of the
  register variables from constants or parameters.
 
  Let's just implement those requirements in the compiler itself.
 
 Doesn't work for existing code, no?  And if thinking new code then
 I'd rather have explicit dependences (and a way to represent them).
 Thus, for example
 
 asm ("scall" : : "asm(r0)" (10), ...)
 
 thus, why force new constraints when we already can figure out
 local register vars by register name?  Why not extend the constraint
 syntax somehow to allow specifying the same effect?

Maybe it would be possible to implement this while keeping the syntax
of existing code by (re-)defining the semantics of register asm to
basically say that:

 If a variable X is declared as register asm for register Y, and X
 is later on used as operand to an inline asm, the register allocator
 will choose register Y to hold that asm operand.  (And this is the
 full specification of register asm semantics, nothing beyond this
 is guaranteed.)

It seems this semantics could be implemented very early on, probably
in the frontend itself.  The frontend would mark the *asm* statement
as using the specified register (there would be no special handling
of the *variable* as such, after the frontend is done).  The optimizers
would then simply be required to pass the asm-statement register
annotations through, much like today they pass constraints through.
At the point where register allocation decisions are made, those
register annotations would then be acted on.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com
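
The one guarantee in the proposed wording (a register-asm variable used as an asm operand is placed in the named register) can be observed directly. In the sketch below the asm template names r12 literally, so the result is only correct if the compiler really keeps the operand in r12. It assumes GCC or Clang on x86-64 with the default AT&T assembler syntax; the function name is hypothetical, and other targets fall back to plain C.

```c
/* Returns x + 1. On x86-64 the addition is done by an asm that
   hard-codes r12, relying on the operand binding described above. */
long add_one_in_r12(long x)
{
#if defined(__GNUC__) && defined(__x86_64__)
    register long v __asm__("r12") = x;
    /* v is an in/out operand, so the compiler loads it into r12 before
       the asm and reads the result back from r12 afterwards. */
    __asm__("addq $1, %%r12" : "+r"(v));
    return v;
#else
    return x + 1;   /* portable fallback, no pinning */
#endif
}
```

Dropping the "+r"(v) operand would leave nothing connecting v to the asm, which is the user error discussed later in the thread.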

Re: libgcc: strange optimization

2011-08-03 Thread Georg-Johann Lay

Ulrich Weigand wrote:
 Richard Guenther wrote:
 On Tue, Aug 2, 2011 at 3:23 PM, Ian Lance Taylor i...@google.com wrote:
 Richard Guenther richard.guent...@gmail.com writes:
 I suggest to amend the documentation for local call-clobbered register
 variables to say that the only valid sequence using them is from a
 non-inlinable function that contains only direct initializations of the
 register variables from constants or parameters.
 Let's just implement those requirements in the compiler itself.
 Doesn't work for existing code, no?  And if thinking new code then
 I'd rather have explicit dependences (and a way to represent them).
 Thus, for example

 asm ("scall" : : "asm(r0)" (10), ...)

 thus, why force new constraints when we already can figure out
 local register vars by register name?  Why not extend the constraint
 syntax somehow to allow specifying the same effect?

Yes this would be exact equivalence of

  register int var asm ("r0") = 10;
  ...
  asm ("scall" : : "r" (var), ...)


 Maybe it would be possible to implement this while keeping the syntax
 of existing code by (re-)defining the semantics of register asm to
 basically say that:
 
  If a variable X is declared as register asm for register Y, and X
  is later on used as operand to an inline asm, the register allocator
  will choose register Y to hold that asm operand.  (And this is the
  full specification of register asm semantics, nothing beyond this
  is guaranteed.)

Yes, that's reasonable.  As I understand the docs, in code like

void foo ()
{
   register int var asm ("r1") = 10;
   asm (";; use r1");
}

there is nothing that connects var to the asm and assuming that
r1 holds 10 in the asm is a user error.

The only place where the asm attached to a variable needs to have
effect are the inline asm sequences that explicitly refer to
respective variables.  If there is no inline asm referencing a
local register variable, there is no difference to a non-register
auto variable; there could even be a warning in such a case
that

   register int var asm ("r1") = 10;

is equivalent to

   int var = 10;

 It seems this semantics could be implemented very early on, probably
 in the frontend itself.  The frontend would mark the *asm* statement
 as using the specified register (there would be no special handling
 of the *variable* as such, after the frontend is done).  The optimizers
 would then simply be required to pass the asm-statement register
 annotations through, much like today they pass constraints through.
 At the point where register allocation decisions are made, those
 register annotations would then be acted on.
 
 Bye,
 Ulrich

I wonder why it does not work like that in the current implementation.
Local register variable is just like using a similar constraint
(with the only difference that in general there is no such constraint,
otherwise the developer would use it). A pass like .asmcons could
take care of it just the same way it does for constraints and no
optimizer pass would have to bother if a variable is a local register
or not.

This would render local register variables even more functional
because no one needed to care if there were implicit library calls
or things like that.

Johann

Re: libgcc: strange optimization

2011-08-03 Thread Richard Guenther

On Wed, Aug 3, 2011 at 11:50 AM, Georg-Johann Lay a...@gjlay.de wrote:
 Ulrich Weigand wrote:
 Richard Guenther wrote:
 On Tue, Aug 2, 2011 at 3:23 PM, Ian Lance Taylor i...@google.com wrote:
 Richard Guenther richard.guent...@gmail.com writes:
 I suggest to amend the documentation for local call-clobbered register
 variables to say that the only valid sequence using them is from a
 non-inlinable function that contains only direct initializations of the
 register variables from constants or parameters.
 Let's just implement those requirements in the compiler itself.
 Doesn't work for existing code, no?  And if thinking new code then
 I'd rather have explicit dependences (and a way to represent them).
 Thus, for example

 asm ("scall" : : "asm(r0)" (10), ...)

 thus, why force new constraints when we already can figure out
 local register vars by register name?  Why not extend the constraint
 syntax somehow to allow specifying the same effect?

 Yes this would be exact equivalence of

  register int var asm ("r0") = 10;
  ...
  asm ("scall" : : "r" (var), ...)


 Maybe it would be possible to implement this while keeping the syntax
 of existing code by (re-)defining the semantics of register asm to
 basically say that:

  If a variable X is declared as register asm for register Y, and X
  is later on used as operand to an inline asm, the register allocator
  will choose register Y to hold that asm operand.  (And this is the
  full specification of register asm semantics, nothing beyond this
  is guaranteed.)

 Yes, that's reasonable.  As I understand the docs, in code like

 void foo ()
 {
   register int var asm ("r1") = 10;
   asm (";; use r1");
 }

 there is nothing that connects var to the asm and assuming that
 r1 holds 10 in the asm is a user error.

 The only place where the asm attached to a variable needs to have
 effect are the inline asm sequences that explicitly refer to
 respective variables.  If there is no inline asm referencing a
 local register variable, there is no difference to a non-register
 auto variable; there could even be a warning in such a case
 that

   register int var asm ("r1") = 10;

 is equivalent to

   int var = 10;

 It seems this semantics could be implemented very early on, probably
 in the frontend itself.  The frontend would mark the *asm* statement
 as using the specified register (there would be no special handling
 of the *variable* as such, after the frontend is done).  The optimizers
 would then simply be required to pass the asm-statement register
 annotations through, much like today they pass constraints through.
 At the point where register allocation decisions are made, those
 register annotations would then be acted on.

 Bye,
 Ulrich

 I wonder why it does not work like that in the current implementation.
 Local register variable is just like using a similar constraint
 (with the only difference that in general there is no such constraint,
 otherwise the developer would use it). A pass like .asmcons could
 take care of it just the same way it does for constraints and no
 optimizer pass would have to bother if a variable is a local register
 or not.

 This would render local register variables even more functional
 because no one needed to care if there were implicit library calls
 or things like that.

Yes, I like that idea.

Richard.

Re: libgcc: strange optimization

2011-08-03 Thread Michael Matz

Hi,

On Wed, 3 Aug 2011, Richard Guenther wrote:

  Yes, that's reasonable.  As I understand the docs, in code like
 
  void foo ()
  {
    register int var asm ("r1") = 10;
    asm (";; use r1");
  }
 
  there is nothing that connects var to the asm and assuming that
  r1 holds 10 in the asm is a user error.
 
  The only place where the asm attached to a variable needs to have
  effect are the inline asm sequences that explicitly refer to
  respective variables.  If there is no inline asm referencing a
  local register variable, there is no difference to a non-register
  auto variable; there could even be a warning in such a case
  that
 
    register int var asm ("r1") = 10;
 
  is equivalent to
 
    int var = 10;
 
  This would render local register variables even more functional 
  because no one needed to care if there were implicit library calls or 
  things like that.
 
 Yes, I like that idea.

I do too.  Except it doesn't work :)

There's a common idiom of accessing registers read-only by declaring local 
register vars.  E.g. to (*grasp*) the stack pointer.  There won't be a DEF 
for that register var, and hence at use-points we couldn't reload any 
sensible values into those registers (and we really shouldn't clobber the 
stack pointer in this way).

We could introduce that special semantic only for non-reserved registers, 
and require no writes to register vars for reserved registers.

Or we could simply do:

  if (any_local_reg_vars)
    optimize = 0;

But I already see people wanting to _do_ optimization also with local reg 
vars, just not the wrong optimizations ;-/


Ciao,
Michael.

Re: libgcc: strange optimization

2011-08-03 Thread Richard Guenther

On Wed, Aug 3, 2011 at 3:27 PM, Michael Matz m...@suse.de wrote:
 Hi,

 On Wed, 3 Aug 2011, Richard Guenther wrote:

  Yes, that's reasonable.  As I understand the docs, in code like
 
  void foo ()
  {
    register int var asm ("r1") = 10;
    asm (";; use r1");
  }
 
  there is nothing that connects var to the asm and assuming that
  r1 holds 10 in the asm is a user error.
 
  The only place where the asm attached to a variable needs to have
  effect are the inline asm sequences that explicitly refer to
  respective variables.  If there is no inline asm referencing a
  local register variable, there is no difference to a non-register
  auto variable; there could even be a warning in such a case
  that
 
    register int var asm ("r1") = 10;
 
  is equivalent to
 
    int var = 10;
 
  This would render local register variables even more functional
  because no one needed to care if there were implicit library calls or
  things like that.

 Yes, I like that idea.

 I do too.  Except it doesn't work :)

 There's a common idiom of accessing registers read-only by declaring local
 register vars.  E.g. to (*grasp*) the stack pointer.  There won't be a DEF
 for that register var, and hence at use-points we couldn't reload any
 sensible values into those registers (and we really shouldn't clobber the
 stack pointer in this way).

 We could introduce that special semantic only for non-reserved registers,
 and require no writes to register vars for reserved registers.

 Or we could simply do:

  if (any_local_reg_vars)
    optimize = 0;

 But I already see people wanting to _do_ optimization also with local reg
 vars, just not the wrong optimizations ;-/

I'd say we should start rejecting all these bogus constructs by default
(maybe accepting them with -fpermissive and then, well, maybe generate
some dwim code).  That is, local register var decls are only valid
with an initializer, they are implicitly constant (you can't re-assign to them).
Reserved registers are a no-go (like %esp), either global or local.

Richard.


 Ciao,
 Michael.

Re: libgcc: strange optimization

2011-08-03 Thread Georg-Johann Lay

Richard Guenther wrote:
 On Wed, Aug 3, 2011 at 3:27 PM, Michael Matz m...@suse.de wrote:
 Hi,

 On Wed, 3 Aug 2011, Richard Guenther wrote:

 Yes, that's reasonable.  As I understand the docs, in code like

 void foo ()
 {
   register int var asm ("r1") = 10;
   asm (";; use r1");
 }

 there is nothing that connects var to the asm and assuming that
 r1 holds 10 in the asm is a user error.

 The only place where the asm attached to a variable needs to have
 effect are the inline asm sequences that explicitly refer to
 respective variables.  If there is no inline asm referencing a
 local register variable, there is no difference to a non-register
 auto variable; there could even be a warning in such a case that

   register int var asm ("r1") = 10;

 is equivalent to

   int var = 10;

 This would render local register variables even more functional
 because no one needed to care if there were implicit library calls or
 things like that.
 Yes, I like that idea.
 I do too.  Except it doesn't work :)

 There's a common idiom of accessing registers read-only by declaring local
 register vars.  E.g. to (*grasp*) the stack pointer.  There won't be a DEF
 for that register var, and hence at use-points we couldn't reload any
 sensible values into those registers (and we really shouldn't clobber the
 stack pointer in this way).

 We could introduce that special semantic only for non-reserved registers,
 and require no writes to register vars for reserved registers.

 Or we could simply do:

  if (any_local_reg_vars)
    optimize = 0;

 But I already see people wanting to _do_ optimization also with local reg
 vars, just not the wrong optimizations ;-/

Definitely yes.  As I wrote above, if you see asm it's not unlikely that it
is a piece of performance-critical code.

 I'd say we should start rejecting all these bogus constructs by default
 (maybe accepting them with -fpermissive and then, well, maybe generate
 some dwim code).  That is, local register var decls are only valid
 with an initializer, they are implicitly constant (you can't re-assign to 
 them).
 Reserved registers are a no-go (like %esp), either global or local.

Would that help? Like in code

static inline void foo (int arg)
{
   register const int reg asm ("r1") = arg;
   asm ("..." :: "r" (reg));
}

And with output constraints like "=r,0" or "+r".  Or in local blocks:

static inline void foo (int arg)
{
   register const int reg asm ("r1") = arg;

   ...
   {
   register const int reg2 asm ("r1") = reg;
   asm ("..." :: "r" (reg2));
   }
}



Do the current optimizers shred inline asm with ordinary constraints
but without local registers?

If yes, there is a considerable problem in the optimizers and/or in GCC.

If not, why can't local register variables work similarly, i.e. propagate
the register information into respective asms and forget about it for
the variables?

Johann

 Richard.
 
 Ciao,
 Michael.

Re: libgcc: strange optimization

2011-08-03 Thread Richard Henderson

On 08/03/2011 07:02 AM, Richard Guenther wrote:
 Reserved registers are a no-go (like %esp), either global or local.

Local register variables referring to anything in fixed_regs
are trivial to handle -- continue to treat them exactly as we
currently do.  They won't be clobbered by random code movement
because they're fixed.


r~

Re: libgcc: strange optimization

2011-08-03 Thread Hans-Peter Nilsson

On Wed, 3 Aug 2011, Ulrich Weigand wrote:
 Richard Guenther wrote:
  asm ("scall" : : asm("r0") (10), ...)
 Maybe it would be possible to implement this while keeping the syntax
 of existing code by (re-)defining the semantics of register asm to
 basically say that:

  If a variable X is declared as register asm for register Y, and X
  is later on used as operand to an inline asm, the register allocator
  will choose register Y to hold that asm operand.

me too: Nice idea!

  (And this is the
  full specification of register asm semantics, nothing beyond this
  is guaranteed.)

You'd have to handle global registers differently, and local
fixed registers not feeding into asms.  For everything else,
error or warning.  That should be ok, because local asm
registers are wonderfully already documented to have that
restriction: "Local register variables in specific registers do
not reserve the registers, except at the point where they are
used as input or output operands in an @code{asm} statement and
the @code{asm} statement itself is not deleted."

So, it's just a small matter of programming to make that happen
for real. :-)

To make sure, it'd be nice if someone could perhaps grep an
entire GNU/Linux-or-other distribution including the kernel for
uses of asm-declared *local* registers that don't directly feed
into asms and not being the stack-pointer?  Or can we get away
with just saying that local asm registers haven't had any other
documented meaning for the last seven years?

 It seems this semantics could be implemented very early on, probably
 in the frontend itself.  The frontend would mark the *asm* statement
 as using the specified register (there would be no special handling
 of the *variable* as such, after the frontend is done).  The optimizers
 would then simply be required to pass the asm-statement register
 annotations through, much like today they pass constraints through.
 At the point where register allocation decisions are made, those
 register annotations would then be acted on.

People ask why it's not already like that, probably because they
assume the ideal sequence of events.  At least the quote above
is a late addition (close to seven years now).  IIUC, asms and
register asms weren't originally tied together and the current
implementation with early register tying just happened to work
well together, well, that is until the SSA revolution. ;)

brgds, H-P

Re: libgcc: strange optimization

2011-08-02 Thread Hans-Peter Nilsson

On Mon, 1 Aug 2011, Georg-Johann Lay wrote:
 Michael Walle schrieb:
  Hi list,
 
  consider the following test code:
   static void inline f1(int arg)
   {
     register int a1 asm("r8") = 10;
     register int a2 asm("r1") = arg;

     asm("scall" : : "r"(a1), "r"(a2));
   }

   void f2(int arg)
   {
     f1(arg >> 10);
   }
 
 
  If you compile this code with 'lm32-gcc -O1 -S -c test.c' (see end of this
  email), the "a1 = 10;" assignment is optimized away.

 Your asm has no output operands and no side effects, with more aggressive
 optimization the whole asm would disappear.

No, for the record that's not supposed to happen for asms
*without outputs*.

"If an @code{asm} has output operands, GCC assumes for
optimization purposes the instruction has no side effects except
to change the output operands."

 What you want is maybe something like

asm volatile ("scall" : : "r"(a1), "r"(a2));

For the code at hand, the scall should be described to both have
an output and be marked volatile, since the system call is a
side effect that GCC can't see and might otherwise optimize away
if the system call return value is unused.  A plain volatile
marking as the above should not be necessary, modulo gcc bugs.

The real problem is quite worrisome.  I don't think a port
(lm32) should have to solve it with constraints; the (inline)
function parameter *should* cause a non-clobbering temporary to
hold any intermediate operations, but it looks as if you'll
otherwise have to debug it yourself.

brgds, H-P

Re: libgcc: strange optimization

2011-08-02 Thread Hans-Peter Nilsson

On Mon, 1 Aug 2011, Richard Henderson wrote:

 On 08/01/2011 01:30 PM, Michael Walle wrote:
   1) function inlining
   2) deferred argument evaluation
   3) because our target has no barrel shifter, (arg >> 10) is emitted as a
      function call to libgcc's __ashrsi3 (_in place_!)
   4) BAM! dead code elimination optimizes r8 assignment away because calli
      may clobber r1-r10 (caller-saved registers on lm32).

 I'm afraid the only solution I can think of is to force F1 out-of-line.

Or another temporary - but the parameter should already have
that effect.

brgds, H-P

Re: libgcc: strange optimization

2011-08-02 Thread Georg-Johann Lay

Michael Walle wrote:
 Hi,
 
 That was quick :)
 
 Your asm has no output operands and no side effects, with more
 aggressive optimization the whole asm would disappear.
 Sorry, that was just a small test file, the original code has output operands.
 
 The new test code:
  static int inline f1(int arg)
  {
    register int ret asm("r1");
    register int a1 asm("r8") = 10;
    register int a2 asm("r1") = arg;

    asm volatile ("scall" : "=r"(ret) : "r"(a1), "r"(a2) : "memory");

    return ret;
  }

  int f2(int arg1, int arg2)
  {
    return f1(arg1 >> 10);
  }
 
 translates to the same assembly:
 f2:
         addi     sp, sp, -4
         sw       (sp+4), ra
         addi     r2, r0, 10
         calli    __ashrsi3
         scall
         lw       ra, (sp+4)
         addi     sp, sp, 4
         b        ra
 
 PS. R1 is the return register in the target architecture ABI.

I'd guess you ran into

http://gcc.gnu.org/onlinedocs/gcc/Local-Reg-Vars.html#Local-Reg-Vars

"A common pitfall is to initialize multiple call-clobbered registers
with arbitrary expressions, where a function call or library call
for an arithmetic operator will overwrite a register value from a
previous assignment, for example r0 below:

 register int *p1 asm ("r0") = ...;
 register int *p2 asm ("r1") = ...;

In those cases, a solution is to use a temporary variable for each
arbitrary expression."
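
The temporary-variable advice quoted above can be sketched as follows. This is only an illustration, assuming a GCC-compatible compiler on x86-64 (other targets take a plain C fallback); scale and add_in_fixed_regs are hypothetical names, and rdi/rsi are picked purely as examples of call-clobbered registers.

```c
/* Hypothetical helper standing in for an arbitrary expression that the
   compiler might turn into a function or library call. */
static long scale(long v)
{
    return v / 3;
}

static inline long add_in_fixed_regs(long a, long b)
{
    /* Evaluate every arbitrary expression into a plain temporary first. */
    long ta = scale(a);
    long tb = scale(b);

#if defined(__GNUC__) && defined(__x86_64__)
    /* Only trivial copies sit between the register variables and the asm. */
    register long ra __asm__("rdi") = ta;
    register long rb __asm__("rsi") = tb;
    long out;

    __asm__("lea (%1,%2), %0" : "=r"(out) : "r"(ra), "r"(rb));
    return out;
#else
    /* Assumed fallback for other targets: same result without asm. */
    return ta + tb;
#endif
}
```

As the rest of the thread shows, this pattern reduces the risk but was not enough on its own in the inlined lm32 case.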

So I'd try to rewrite it as

static int inline f1 (int arg0)
{
    int arg = arg0;
    register int ret asm("r1");
    register int a1 asm("r8") = 10;
    register int a2 asm("r1") = arg;

    asm volatile ("scall" : "=r"(ret) : "r"(a1), "r"(a2) : "memory");

    return ret;
}

and if that does not help the rather hackish

static int inline f1 (int arg0)
{
    int arg = arg0;
    register int ret asm("r1");
    register int a1 asm("r8");
    register int a2 asm("r1");

    asm ("" : "+r" (arg));

    a1 = 10;
    a2 = arg;

    asm volatile ("scall" : "=r"(ret) : "r"(a1), "r"(a2) : "memory");

    return ret;
}
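
The empty asm with a "+r" operand in the second variant is portable across GCC-compatible compilers, so it can be tried in isolation. A minimal sketch, with shifted_arg as a hypothetical name: the asm emits no instructions, but the compiler must treat arg as read and rewritten there, so its value has to be fully computed into a register before that point. Whether this is enough to keep a libcall away from later register setup is not guaranteed by anything in the documentation; it is a hack, as noted above.

```c
static inline int shifted_arg(int arg0)
{
    int arg = arg0 >> 10;

    /* Empty template: no code is generated, but "+r" tells the compiler
       that arg is both an input and an output of this statement, so the
       shift cannot be substituted into a later use of arg. */
    __asm__("" : "+r"(arg));

    return arg;
}
```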

Re: libgcc: strange optimization

2011-08-02 Thread Mikael Pettersson

Hans-Peter Nilsson writes:
  On Mon, 1 Aug 2011, Richard Henderson wrote:
  
   On 08/01/2011 01:30 PM, Michael Walle wrote:
 1) function inlining
 2) deferred argument evaluation
 3) because our target has no barrel shifter, (arg >> 10) is emitted as a
    function call to libgcc's __ashrsi3 (_in place_!)
 4) BAM! dead code elimination optimizes r8 assignment away because calli
    may clobber r1-r10 (caller-saved registers on lm32).
  
   I'm afraid the only solution I can think of is to force F1 out-of-line.
  
  Or another temporary - but the parameter should already have
  that effect.

It should, but doesn't.  See PR48863 for similar breakage on ARM.

/Mikael

Re: libgcc: strange optimization

2011-08-02 Thread Richard Guenther

On Tue, Aug 2, 2011 at 10:48 AM, Mikael Pettersson mi...@it.uu.se wrote:
 Hans-Peter Nilsson writes:
   On Mon, 1 Aug 2011, Richard Henderson wrote:
  
    On 08/01/2011 01:30 PM, Michael Walle wrote:
      1) function inlining
      2) deferred argument evaluation
       3) because our target has no barrel shifter, (arg >> 10) is emitted
          as a function call to libgcc's __ashrsi3 (_in place_!)
       4) BAM! dead code elimination optimizes r8 assignment away because
          calli may clobber r1-r10 (caller-saved registers on lm32).
   
    I'm afraid the only solution I can think of is to force F1 out-of-line.
  
   Or another temporary - but the parameter should already have
   that effect.

 It should, but doesn't.  See PR48863 for similar breakage on ARM.

On GIMPLE we see neither the libcall nor those dependencies.

Don't use register vars.

Richard.

 /Mikael

Re: libgcc: strange optimization

2011-08-02 Thread Georg-Johann Lay

Richard Guenther wrote:
 On Tue, Aug 2, 2011 at 10:48 AM, Mikael Pettersson mi...@it.uu.se wrote:
 Hans-Peter Nilsson writes:
   On Mon, 1 Aug 2011, Richard Henderson wrote:
  
On 08/01/2011 01:30 PM, Michael Walle wrote:
  1) function inlining
  2) deferred argument evaluation
  3) because our target has no barrel shifter, (arg >> 10) is emitted
     as a function call to libgcc's __ashrsi3 (_in place_!)
  4) BAM! dead code elimination optimizes r8 assignment away because
     calli may clobber r1-r10 (caller-saved registers on lm32).
   
I'm afraid the only solution I can think of is to force F1 out-of-line.
  
   Or another temporary - but the parameter should already have
   that effect.

 It should, but doesn't.  See PR48863 for similar breakage on ARM.
 
 On GIMPLE we don't see either the libcall nor those dependencies.
 
 Don't use register vars.

IMO such code is supposed to work, e.g. in order to write an interface
to a non-ABI assembler function.  In general this cannot be expressed
by means of constraints because that would imply plethora of different
constraints for each register/mode.

From the documentation a user will expect that he wrote correct code
and is not supposed to bother with GCC internals like implicit library
calls or GIMPLE or whatever.

Johann

 Richard.

Re: libgcc: strange optimization

2011-08-02 Thread Richard Guenther

On Tue, Aug 2, 2011 at 12:01 PM, Georg-Johann Lay a...@gjlay.de wrote:
 Richard Guenther wrote:
 On Tue, Aug 2, 2011 at 10:48 AM, Mikael Pettersson mi...@it.uu.se wrote:
 Hans-Peter Nilsson writes:
   On Mon, 1 Aug 2011, Richard Henderson wrote:
  
    On 08/01/2011 01:30 PM, Michael Walle wrote:
      1) function inlining
      2) deferred argument evaluation
       3) because our target has no barrel shifter, (arg >> 10) is emitted
          as a function call to libgcc's __ashrsi3 (_in place_!)
       4) BAM! dead code elimination optimizes r8 assignment away because
          calli may clobber r1-r10 (caller-saved registers on lm32).
   
    I'm afraid the only solution I can think of is to force F1 out-of-line.
  
   Or another temporary - but the parameter should already have
   that effect.

 It should, but doesn't.  See PR48863 for similar breakage on ARM.

 On GIMPLE we don't see either the libcall nor those dependencies.

 Don't use register vars.

 IMO such code is supposed to work, e.g. in order to write an interface
 to a non-ABI assembler function.  In general this cannot be expressed
 by means of constraints because that would imply plethora of different
 constraints for each register/mode.

 From the documentation a user will expect that he wrote correct code
 and is not supposed to bother with GCC inerts like implicit library
 calls or GIMPLE or whatever.

Well.  I suppose what is happening is that we expand from

  register int a1 __asm__ ("*edi");
  register int a2 __asm__ ("*eax");
  int D.2700;

<bb 2>:
  D.2700_2 = arg_1(D) >> 10;
  a1 = 10;
  a2 = D.2700_2;
  __asm__ __volatile__("scall" :  : "r" a1, "r" a2);
  return;

and end up TERing D.2700_2 = arg_1(D) >> 10, materializing a
libcall after the setting of a1 and before the asm.  To confirm that
try -fno-tree-ter.  I don't see how we can easily avoid this without
exposing libcalls at the gimple level.  Maybe disable TER if
we see any register variable use.
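
The effect of that reordering can be modelled without any target-specific asm. The sketch below is only an illustration: regs, libcall_ashr, scall_service and the GARBAGE marker are made-up names, and the clobber set r1-r10 is taken from the lm32 description earlier in the thread. It simulates a register file in plain C and compares the order the source asks for with the order that results once the shift is substituted into its use.

```c
enum { NREGS = 16, GARBAGE = -1 };

static int regs[NREGS];

/* Model of a libcall such as __ashrsi3: the result lands in r1 and every
   register in r1..r10 must be assumed clobbered afterwards. */
static void libcall_ashr(int value, int count)
{
    for (int r = 1; r <= 10; r++)
        regs[r] = GARBAGE;
    regs[1] = value >> count;
}

/* Model of "scall": the service number is whatever r8 holds. */
static int scall_service(void)
{
    return regs[8];
}

/* Order written in the source: evaluate the argument, then load r8 and r1. */
static int source_order(int arg)
{
    libcall_ashr(arg, 10);
    int tmp = regs[1];
    regs[8] = 10;
    regs[1] = tmp;
    return scall_service();
}

/* Order after substitution: the call sits between the r8 setup and the
   asm, so the value stored in r8 is lost. */
static int substituted_order(int arg)
{
    regs[8] = 10;
    libcall_ashr(arg, 10);
    return scall_service();
}
```

In the model, source_order sees 10 in r8 while substituted_order sees the clobber marker, which corresponds to the missing "addi r8, r0, 10" in the broken assembly.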

Richard.

 Johann

 Richard.

Re: libgcc: strange optimization

2011-08-02 Thread Michael Walle


Hi,

 To confirm that try -fno-tree-ter.

lm32-gcc -O1 -fno-tree-ter -S -c test.c generates the following working
assembly code:

f2:
        addi     sp, sp, -4
        sw       (sp+4), ra
        addi     r2, r0, 10
        calli    __ashrsi3
        addi     r8, r0, 10
        scall
        lw       ra, (sp+4)
        addi     sp, sp, 4
        b        ra

-- 
Michael

Re: libgcc: strange optimization

2011-08-02 Thread Mikael Pettersson

Michael Walle writes:
  
  Hi,
  
   To confirm that try -fno-tree-ter.
  
  lm32-gcc -O1 -fno-tree-ter -S -c test.c generates the following working
  assembly code:
  
  f2:
          addi     sp, sp, -4
          sw       (sp+4), ra
          addi     r2, r0, 10
          calli    __ashrsi3
          addi     r8, r0, 10
          scall
          lw       ra, (sp+4)
          addi     sp, sp, 4
          b        ra

-fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.

Re: libgcc: strange optimization

2011-08-02 Thread Richard Guenther

On Tue, Aug 2, 2011 at 2:06 PM, Mikael Pettersson mi...@it.uu.se wrote:
 Michael Walle writes:
  
   Hi,
  
    To confirm that try -fno-tree-ter.
  
   lm32-gcc -O1 -fno-tree-ter -S -c test.c generates the following working
   assembly code:
  
   f2:
        addi     sp, sp, -4
        sw       (sp+4), ra
        addi     r2, r0, 10
        calli    __ashrsi3
        addi     r8, r0, 10
        scall
        lw       ra, (sp+4)
        addi     sp, sp, 4
        b        ra

 -fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.

It's of course only a workaround, not a real fix as nothing prevents
other optimizers from performing the re-scheduling TER does.

I suggest to amend the documentation for local call-clobbered register
variables to say that the only valid sequence using them is from a
non-inlinable function that contains only direct initializations of the
register variables from constants or parameters.

Or go one step further and deprecate local register variables altogether
(they IMHO don't make much sense, and rather the targets should provide
a way to properly constrain asm inputs and outputs).

Richard.

Re: libgcc: strange optimization

2011-08-02 Thread Georg-Johann Lay

Richard Guenther wrote:
 On Tue, Aug 2, 2011 at 2:06 PM, Mikael Pettersson mi...@it.uu.se wrote:
 Michael Walle writes:
  
   Hi,
  
To confirm that try -fno-tree-ter.
  
   lm32-gcc -O1 -fno-tree-ter -S -c test.c generates the following working
   assembly code:
  
   f2:
           addi     sp, sp, -4
           sw       (sp+4), ra
           addi     r2, r0, 10
           calli    __ashrsi3
           addi     r8, r0, 10
           scall
           lw       ra, (sp+4)
           addi     sp, sp, 4
           b        ra

 -fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.
 
 It's of course only a workaround, not a real fix as nothing prevents
 other optimizers from performing the re-scheduling TER does.
 
 I suggest to amend the documentation for local call-clobbered register
 variables to say that the only valid sequence using them is from a
 non-inlinable function that contains only direct initializations of the
 register variables from constants or parameters.
 
 Or go one step further and deprecate local register variables alltogether
 (they IMHO don't make much sense, and rather the targets should provide
 a way to properly constrain asm inputs and outputs).

Strongly oppose.

Local register variables are very useful; maybe not on a linux machine
but on embedded systems there are situations you cannot do without.

Have you ever counted the constraint alternatives that would be needed?

You'd need different constraints for QI/HI/SI/DI for each register
resulting in myriads of register classes increasing register allocation
time and dumps would become impossible to read.  I once tried it for
a target...never again.

Besides that, with local register vars the developer can write code that
meets his requirements, whereas for constraints you will always have to
change the compiler and make existing sources incompatible.

If GCC provides such a feature it must work properly and not be sacrificed
for this or that optimization.

Correct code is always preferred over non-functional code.

Johann

 
 Richard.

Re: libgcc: strange optimization

2011-08-02 Thread Hans-Peter Nilsson

On Tue, 2 Aug 2011, Richard Guenther wrote:
 On Tue, Aug 2, 2011 at 2:06 PM, Mikael Pettersson mi...@it.uu.se wrote:
  Michael Walle writes:
   
    Hi,
   
     To confirm that try -fno-tree-ter.
   
    lm32-gcc -O1 -fno-tree-ter -S -c test.c generates the following working
    assembly code:
   
    f2:
         addi     sp, sp, -4
         sw       (sp+4), ra
         addi     r2, r0, 10
         calli    __ashrsi3
         addi     r8, r0, 10
         scall
         lw       ra, (sp+4)
         addi     sp, sp, 4
         b        ra
 
  -fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.

 It's of course only a workaround, not a real fix as nothing prevents
 other optimizers from performing the re-scheduling TER does.

 I suggest to amend the documentation for local call-clobbered register
 variables to say that the only valid sequence using them is from a
 non-inlinable function that contains only direct initializations of the
 register variables from constants or parameters.

I'd be ok with that, FWIW; I see the problem with keeping the
scheduling of operations in a working order (yuck) and I don't
see how else to keep it working ...except perhaps make gcc flag
functions with register asms as non-inlinable, maybe even flag
down any of the dangerous re-scheduling?

Maybe I can do that with some hand-holding?

 Or go one step further and deprecate local register variables alltogether
 (they IMHO don't make much sense, and rather the targets should provide
 a way to properly constrain asm inputs and outputs).

They do make sense when implementing e.g. system calls, and
they're documented to work as discussed.  (I almost regret
making that happen, though.)  Fortunately such functions are
small, and not relatively much helped by inlining (it's a
*syscall*; much more happening beyond the call than is affected
by inlining some parameter initialization).  Sure, new targets
are much better off by implementing that through other means,
but preferably intrinsic functions to asms.

brgds, H-P

Re: libgcc: strange optimization

2011-08-02 Thread Richard Guenther

On Tue, Aug 2, 2011 at 2:53 PM, Hans-Peter Nilsson h...@bitrange.com wrote:
 On Tue, 2 Aug 2011, Richard Guenther wrote:
 On Tue, Aug 2, 2011 at 2:06 PM, Mikael Pettersson mi...@it.uu.se wrote:
  Michael Walle writes:
   
    Hi,
   
     To confirm that try -fno-tree-ter.
   
    lm32-gcc -O1 -fno-tree-ter -S -c test.c generates the following 
  working
    assembly code:
   
    f2:
         addi     sp, sp, -4
         sw       (sp+4), ra
         addi     r2, r0, 10
         calli    __ashrsi3
         addi     r8, r0, 10
         scall
         lw       ra, (sp+4)
         addi     sp, sp, 4
         b        ra
 
  -fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.

 It's of course only a workaround, not a real fix as nothing prevents
 other optimizers from performing the re-scheduling TER does.

 I suggest to amend the documentation for local call-clobbered register
 variables to say that the only valid sequence using them is from a
 non-inlinable function that contains only direct initializations of the
 register variables from constants or parameters.

 I'd be ok with that, FWIW; I see the problem with keeping the
 scheduling of operations in a working order (yuck) and I don't
 see how else to keep it working ...except perhaps make gcc flag
 functions with register asms as non-inlinable, maybe even flag
 down any of the dangerous re-scheduling?

But then can't people use a pure assembler stub instead?  Without
inlining there isn't much benefit left from writing

 void f1(int arg)
 {
  register int a1 asm("r8") = 10;
  register int a2 asm("r1") = arg;

  asm("scall" : : "r"(a1), "r"(a2));
 }

instead of

f1:
 mov r8, 10
 mov r1, rX
 scall
 ret

in a .s file, no?  I doubt much prologue/epilogue is needed.

Or even write

void f1(int arg)
{
 asm("mov r8, %0; mov r1, %1; scall;" : : "g"(10), "g"(arg) : "r8", "r1");
}

which should be inlinable again (yes, in inlined form not optimally
register-allocated, but compared to the non-inline routine?).
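
The same explicit-move pattern can be tried on a common host. A minimal sketch, assuming a GCC-compatible compiler on x86-64 with AT&T syntax (other targets take a plain C fallback); add_explicit_moves is a hypothetical name and the lea merely stands in for the system call. The asm moves its "g" operands into the wanted registers itself and lists those registers as clobbers, so no local register variables are involved.

```c
static inline long add_explicit_moves(long a, long b)
{
#if defined(__GNUC__) && defined(__x86_64__)
    long out;

    /* rdi and rsi are named in the clobber list, so the compiler will not
       place any operand in them; the output is written only by the last
       instruction, after both inputs have been read. */
    __asm__("mov %1, %%rdi\n\t"
            "mov %2, %%rsi\n\t"
            "lea (%%rdi,%%rsi), %0"
            : "=r"(out)
            : "g"(a), "g"(b)
            : "rdi", "rsi");
    return out;
#else
    /* Assumed fallback for other targets: same result without asm. */
    return a + b;
#endif
}
```

The cost is the one mentioned above: the moves are always emitted, even when the register allocator could have put the values in the right registers to begin with.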

Richard.

 Maybe I can do that with some hand-holding?

 Or go one step further and deprecate local register variables alltogether
 (they IMHO don't make much sense, and rather the targets should provide
 a way to properly constrain asm inputs and outputs).

 They do make sense when implementing e.g. system calls, and
 they're documented to work as discussed.  (I almost regret
 making that happen, though.)  Fortunately such functions are
 small, and not relatively much helped by inlining (it's a
 *syscall*; much more happening beyond the call than is affected
 by inlining some parameter initialization).  Sure, new targets
 are much better off by implementing that through other means,
 but preferably intrinsic functions to asms.

 brgds, H-P

Re: libgcc: strange optimization

2011-08-02 Thread Hans-Peter Nilsson

On Tue, 2 Aug 2011, Richard Guenther wrote:
  I'd be ok with that, FWIW; I see the problem with keeping the
  scheduling of operations in a working order (yuck) and I don't
  see how else to keep it working ...except perhaps make gcc flag
  functions with register asms as non-inlinable, maybe even flag
  down any of the dangerous re-scheduling?

 But then can't people use a pure assembler stub instead?

I see your point, but you're thinking new code, I'm thinking
let's keep existing code used by several targets and documented
as working the last seven years working.  Maybe breakable with a
major release; in gcc-5.  (Oh no, I see what's coming. :)

brgds, H-P

Re: libgcc: strange optimization

2011-08-02 Thread Ian Lance Taylor

Richard Guenther richard.guent...@gmail.com writes:

 Or go one step further and deprecate local register variables alltogether
 (they IMHO don't make much sense, and rather the targets should provide
 a way to properly constrain asm inputs and outputs).

No, local register variables are documented as working and many programs
rely on them.  They are a straightforward way to get an asm argument in
a specific register, and I don't see any reason to break that.

 I suggest to amend the documentation for local call-clobbered register
 variables to say that the only valid sequence using them is from a
 non-inlinable function that contains only direct initializations of the
 register variables from constants or parameters.

Let's just implement those requirements in the compiler itself.

Ian

Re: libgcc: strange optimization

2011-08-02 Thread Richard Guenther

On Tue, Aug 2, 2011 at 3:23 PM, Ian Lance Taylor i...@google.com wrote:
 Richard Guenther richard.guent...@gmail.com writes:

 Or go one step further and deprecate local register variables alltogether
 (they IMHO don't make much sense, and rather the targets should provide
 a way to properly constrain asm inputs and outputs).

 No, local register variables are documented as working and many programs
 rely on them.  They are a straightforward way to get an asm argument in
 a specific register, and I don't see any reason to break that.

Well, maybe they look like so.  But in reality there is _no_ connection
from the register setup to the actual asm.  Which is the problem the
compiler faces here (apart from the libcall issue).  If there should be
an implicit dependence of all asms to all local register var setters
and users then this isn't implemented on gimple (or rather it works
by chance there as we treat register vars as memory and do not
disambiguate anything across asms (yet)).

 I suggest to amend the documentation for local call-clobbered register
 variables to say that the only valid sequence using them is from a
 non-inlinable function that contains only direct initializations of the
 register variables from constants or parameters.

 Let's just implement those requirements in the compiler itself.

Doesn't work for existing code, no?  And if thinking new code then
I'd rather have explicit dependences (and a way to represent them).
Thus, for example

asm ("scall" : : asm("r0") (10), ...)

thus, why force new constraints when we already can figure out
local register vars by register name?  Why not extend the constraint
syntax somehow to allow specifying the same effect?

Richard.

 Ian

Re: libgcc: strange optimization

2011-08-02 Thread Ian Lance Taylor

Richard Guenther richard.guent...@gmail.com writes:

 On Tue, Aug 2, 2011 at 3:23 PM, Ian Lance Taylor i...@google.com wrote:
 Richard Guenther richard.guent...@gmail.com writes:

 Or go one step further and deprecate local register variables alltogether
 (they IMHO don't make much sense, and rather the targets should provide
 a way to properly constrain asm inputs and outputs).

 No, local register variables are documented as working and many programs
 rely on them.  They are a straightforward way to get an asm argument in
 a specific register, and I don't see any reason to break that.

 Well, maybe they look like so.  But in reality there is _no_ connection
 from the register setup to the actual asm.  Which is the problem the
 compiler faces here (apart from the libcall issue).  If there should be
 an implicit dependence of all asms to all local register var setters
 and users then this isn't implemented on gimple (or rather it works
 by chance there as we treat register vars as memory and do not
 disambiguate anything across asms (yet)).

I'm not sure why we need to do anything at the GIMPLE level other than
disable some optimizations.  There is a connection from the register
variable to the asm--the asm refers to the variable.  There is nothing
specific about the register in there, but at the GIMPLE level there
doesn't have to be.

We should not break a useful existing feature because we find it
inconvenient.  Let's just disable some optimizations so that it
continues to work.


 I suggest to amend the documentation for local call-clobbered register
 variables to say that the only valid sequence using them is from a
 non-inlinable function that contains only direct initializations of the
 register variables from constants or parameters.

 Let's just implement those requirements in the compiler itself.

 Doesn't work for existing code, no?

Why not?


 And if thinking new code then
 I'd rather have explicit dependences (and a way to represent them).
 Thus, for example

 asm ("scall" : : asm("r0") (10), ...)

 thus, why force new constraints when we already can figure out
 local register vars by register name?  Why not extend the constraint
 syntax somehow to allow specifying the same effect?

I agree that it would be a good idea to permit asms to indicate the
specific register the operand should go into.
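
For comparison, some targets already provide constraint letters that each name a single register, which gives a similar effect for those registers. A minimal sketch, assuming a GCC-compatible compiler on x86-64, where the documented machine constraints "a", "D" and "S" select rax, rdi and rsi; add_with_reg_constraints is a hypothetical name and other targets take a plain C fallback. This is not the asm("r0") operand syntax floated above, which is only a proposal.

```c
static inline long add_with_reg_constraints(long a, long b)
{
#if defined(__GNUC__) && defined(__x86_64__)
    long out;

    /* "D" pins a to rdi, "S" pins b to rsi, "=a" pins the result to rax.
       The register choice is attached to the asm operand itself, so there
       is no separate register setup that could be moved away from it. */
    __asm__("lea (%1,%2), %0" : "=a"(out) : "D"(a), "S"(b));
    return out;
#else
    /* Assumed fallback for other targets: same result without asm. */
    return a + b;
#endif
}
```

Targets without such per-register letters would need one constraint per register and mode, which is the objection raised earlier in the thread.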

Ian

Re: libgcc: strange optimization

2011-08-02 Thread Richard Henderson

On 08/02/2011 05:22 AM, Richard Guenther wrote:
 -fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.
 
 It's of course only a workaround, not a real fix as nothing prevents
 other optimizers from performing the re-scheduling TER does.
 
 I suggest to amend the documentation for local call-clobbered register
 variables to say that the only valid sequence using them is from a
 non-inlinable function that contains only direct initializations of the
 register variables from constants or parameters.
 
 Or go one step further and deprecate local register variables alltogether
 (they IMHO don't make much sense, and rather the targets should provide
 a way to properly constrain asm inputs and outputs).

Neither of these is a viable option.

What we might be able to do is throttle TER when the destination
is a local register variable.  This should unbreak the common case
of local regs immediately surrounding an asm.


r~

Re: libgcc: strange optimization

2011-08-02 Thread Georg-Johann Lay


Richard Guenther schrieb:

I suggest to amend the documentation for local call-clobbered register
variables to say that the only valid sequence using them is from a
non-inlinable function that contains only direct initializations of the
register variables from constants or parameters.

Richard.


That's completely counterproductive.

If a developer uses asm or local register variables he has a
very good reason for that choice, like meeting hard (with hard as
in HARD) real time constraints.  Disabling inlining of a function
that uses local register vars would make many uses of local
register vars impossible because there would no longer be a way to
write down the exact register usage footprint of a piece of
asm.  Typical use cases are interfacing to a function that has a
smaller register footprint than an ordinary black-box function, or
doing some arithmetic that needs special hard regs for which there
are no fitting constraints.  All this will be impossible if inlining
is disabled for such functions; then it is no longer possible to
describe such a low-overhead piece of code without calling a black-box
function, clobbering all call-clobbered registers, turning a possible
tail call into a non-tail call, etc.


Embedded systems, such as ARM- or PowerPC-based ones, have become more
and more important over the years, and GCC should pay more attention to
them and their needs, not only to high-powered servers/PCs.
This includes systems with hard real-time constraints.

Johann

Re: libgcc: strange optimization

2011-08-02 Thread Richard Guenther

On Tue, Aug 2, 2011 at 6:02 PM, Richard Henderson r...@redhat.com wrote:
 On 08/02/2011 05:22 AM, Richard Guenther wrote:
 -fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.

 It's of course only a workaround, not a real fix as nothing prevents
 other optimizers from performing the re-scheduling TER does.

 I suggest to amend the documentation for local call-clobbered register
 variables to say that the only valid sequence using them is from a
 non-inlinable function that contains only direct initializations of the
 register variables from constants or parameters.

 Or go one step further and deprecate local register variables altogether
 (they IMHO don't make much sense, and rather the targets should provide
 a way to properly constrain asm inputs and outputs).

 Neither of these is a viable option.

 What we might be able to do is throttle TER when the destination
 is a local register variable.  This should unbreak the common case
 of local regs immediately surrounding an asm.

Sure, similar to disabling TER for functions containing such vars.
But it isn't a solution for the general issue that nothing prevents
scheduling gimple statements between register variable def and use.

Richard.


 r~

Re: libgcc: strange optimization

2011-08-02 Thread Miles Bader

Richard Guenther richard.guent...@gmail.com writes:
 But then can't people use a pure assembler stub instead?  Without
 inlining there isn't much benefit left from writing

  void f1(int arg)
  {
   register int a1 asm("r8") = 10;
   register int a2 asm("r1") = arg;

   asm("scall" : : "r"(a1), "r"(a2));
  }

 instead of

 f1:
  mov r8, 10
  mov r1, rX
  scall
  ret

 in a .s file no?  I doubt much prologue/epilogue is needed.

 Or even write

 void f1(int arg)
 {
  asm("mov r8, %0; mov r1, %1; scall;" : : "g"(10), "g"(arg) : "r8", "r1");
 }

Of course in practice people _do_ want to use it with f1 inlined, where
using reg variables (or alternatively, some expanded constraint language
for the asm parameters) can really get rid of tons of unnecessary asm
moves, and they want the compiler to guard against conflicts.

-Miles

-- 
Whatever you do will be insignificant, but it is very important that
 you do it.  Mahatma Gandhi

libgcc: strange optimization

2011-08-01 Thread Michael Walle

Hi list,


consider the following test code:
 static void inline f1(int arg)
 {
   register int a1 asm("r8") = 10;
   register int a2 asm("r1") = arg;

   asm("scall" : : "r"(a1), "r"(a2));
 }

 void f2(int arg)
 {
   f1(arg >> 10);
 }


If you compile this code with 'lm32-gcc -O1 -S -c test.c' (see end of this
email), the "a1 = 10;" assignment is optimized away. According to my
understanding the following happens:

 1) function inlining
 2) deferred argument evaluation
 3) because our target has no barrel shifter, (arg >> 10) is emitted as a
function call to libgcc's __ashrsi3 (_in place_!)
 4) BAM! dead code elimination optimizes the r8 assignment away because calli
may clobber r1-r10 (caller-saved registers on lm32).
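
The four steps above can be illustrated with a toy model in plain C. This
is only a sketch and not how GCC represents anything internally: the
array `regs`, the helper `toy_libcall_ashr`, and `toy_scall` are
hypothetical stand-ins for the hard register file, the libgcc call, and
the `scall` instruction. It assumes, as described above, that a call may
overwrite r1-r10.

```c
#include <string.h>

#define NREGS 16

/* Toy register file; index n stands for hard register rn. */
static int regs[NREGS];

/* Hypothetical stand-in for a libgcc helper such as __ashrsi3: the result
   comes back in r1 and, like any call, it may trash r2..r10 (modelled
   here by overwriting them with the marker value -1). */
static void toy_libcall_ashr(int value, int count)
{
    for (int r = 2; r <= 10; r++)
        regs[r] = -1;               /* call-clobbered */
    regs[1] = value >> count;       /* return value */
}

/* Hypothetical stand-in for scall: reports what it finds in r8. */
static int toy_scall(void)
{
    return regs[8];
}

/* Order the programmer expects: evaluate the argument, then set up r8. */
int expected_order(int arg)
{
    memset(regs, 0, sizeof regs);
    toy_libcall_ashr(arg, 10);
    regs[8] = 10;
    return toy_scall();
}

/* Order after the argument evaluation is deferred: r8 is set up first and
   the helper call then overwrites it. */
int deferred_order(int arg)
{
    memset(regs, 0, sizeof regs);
    regs[8] = 10;
    toy_libcall_ashr(arg, 10);
    return toy_scall();
}
```

In the deferred order the store to r8 is overwritten before anything reads
it, which is presumably why dead code elimination considers it removable.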

If you use:
 void f2(int arg)
 {
   f1(__ashrsi3(arg, 10));
 }
everything works as expected: __ashrsi3 is evaluated before the body of f1.

According to Wikipedia [1], function calls are sequence points and all
side effects for the arguments are completed before entering the function.
So in my understanding the deferred argument evaluation is wrong if that
operation is emitted as a call to a libgcc helper.
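
The sequence-point rule referred to here can be observed in portable C. A
minimal sketch, with hypothetical names `make_arg`, `callee` and
`observed_order`; it only demonstrates the ordering guarantee for ordinary
side effects in the abstract machine and says nothing about the contents
of hard registers, which is the point under dispute in this thread.

```c
/* Records the order in which the two steps run. */
static int order[2];
static int steps;

/* Argument expression with a visible side effect. */
static int make_arg(void)
{
    order[steps++] = 1;
    return 4096 >> 10;
}

/* Callee body; by the time it runs, make_arg() has already finished. */
static int callee(int arg)
{
    order[steps++] = 2;
    return arg;
}

/* Returns 12 when the argument was evaluated before the callee body. */
int observed_order(void)
{
    steps = 0;
    (void)callee(make_arg());
    return order[0] * 10 + order[1];
}
```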

I tried that on other architectures too (microblaze and avr). All show the
same behaviour. If an integer arithmetic opcode is translated to a call to
libgcc, every assignment to a register which is clobbered by the call is
optimized away.

The GCC manual mentions some caveats when using explicit register variables [2]:
  In the above example, beware that a register that is call-clobbered by
  the target ABI will be overwritten by any function call in the
  assignment, including library calls for arithmetic operators. Also a
  register may be clobbered when generating some operations, like variable
  shift, memory copy or memory move on x86. Assuming it is a call-clobbered
  register, this may happen to r0 above by the assignment to p2. If you
  have to use such a register, use temporary variables for expressions
  between the register assignment.

But I think this may not apply to the case above, where the arithmetic
operator is an argument of the called function, i.e. there is a sequence
point and the statements must not be reordered.


Assembler output (lm32-gcc -O1 -S -c test.c):
f2:
addi sp, sp, -4
sw   (sp+4), ra
addi r2, r0, 10
calli __ashrsi3
scall
lw   ra, (sp+4)
addi sp, sp, 4
b    ra

Assembler output with no DCE (lm32-gcc -O1 -S -fno-dce -c test.c)
f2:
addi sp, sp, -4
sw   (sp+4), ra
addi r8, r0, 10
addi r2, r0, 10
calli __ashrsi3
scall
lw   ra, (sp+4)
addi sp, sp, 4
b    ra

[1] http://en.wikipedia.org/wiki/Sequence_point
[2] http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Example%20of%20asm%20with%20clobbered%20asm%20reg

-- 
Michael

Re: libgcc: strange optimization

2011-08-01 Thread Georg-Johann Lay


Michael Walle schrieb:

Hi list,

consider the following test code:
 static void inline f1(int arg)
 {
   register int a1 asm("r8") = 10;
   register int a2 asm("r1") = arg;

   asm("scall" : : "r"(a1), "r"(a2));
 }

 void f2(int arg)
 {
   f1(arg >> 10);
 }


If you compile this code with 'lm32-gcc -O1 -S -c test.c' (see end of this
email), the "a1 = 10;" assignment is optimized away.


Your asm has no output operands and no side effects; with more
aggressive optimization the whole asm would disappear.


What you want is maybe something like

   asm volatile ("scall" : : "r"(a1), "r"(a2));

Johann

Re: libgcc: strange optimization

2011-08-01 Thread Michael Walle


Hi,

That was quick :)

 Your asm has no output operands and no side effects; with more
 aggressive optimization the whole asm would disappear.
Sorry, that was just a small test file, the original code has output operands.

The new test code:
 static int inline f1(int arg)
 {
   register int ret asm("r1");
   register int a1 asm("r8") = 10;
   register int a2 asm("r1") = arg;

   asm volatile ("scall" : "=r"(ret) : "r"(a1), "r"(a2) : "memory");

   return ret;
 }

 int f2(int arg1, int arg2)
 {
   return f1(arg1 >> 10);
 }

translates to the same assembly:
f2:
addi sp, sp, -4
sw   (sp+4), ra
addi r2, r0, 10
calli__ashrsi3
scall
lw   ra, (sp+4)
addi sp, sp, 4
bra

PS. R1 is the return register in the target architecture ABI.

-- 
Michael

Re: libgcc: strange optimization

2011-08-01 Thread Richard Henderson

On 08/01/2011 01:30 PM, Michael Walle wrote:
  1) function inlining
  2) deferred argument evaluation
  3) because our target has no barrel shifter, (arg >> 10) is emitted as a
 function call to libgcc's __ashrsi3 (_in place_!)
  4) BAM! dead code elimination optimizes the r8 assignment away because calli
 may clobber r1-r10 (caller-saved registers on lm32).

I'm afraid the only solution I can think of is to force F1 out-of-line.
That's the only safe way to make sure that arguments are completely
evaluated before forcing them into hard register variables.

Alternately, expose new constraints such that you don't need the
hard register variables at all.  E.g.

  asm("scall" : : "R08"(a1), "R01"(a2));

where Rxx is defined in constraints.md for every relevant register.
That'll prevent a reference to the hard register until register
allocation, at which point we'll have done the right thing with
the shift.
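
The first option, forcing F1 out-of-line, might look roughly like the
sketch below. It assumes a GCC-compatible compiler for the `noinline`
attribute. So that it builds on any host, the lm32 `scall` asm and the
hard register variables are replaced by a hypothetical stand-in,
`fake_scall`; the comments show where the real declarations would go.

```c
/* Hypothetical stand-in for the real scall asm so the sketch compiles on
   any host; it just combines its two inputs. */
static int fake_scall(int nr, int arg)
{
    return nr + arg;
}

/* GCC-specific attribute: keep f1 out of line, so the argument (including
   any libgcc helper call it needs) is fully evaluated before f1's body
   starts executing. */
static int __attribute__((noinline)) f1(int arg)
{
    int a1 = 10;    /* real code: register int a1 asm("r8") = 10;  */
    int a2 = arg;   /* real code: register int a2 asm("r1") = arg; */

    return fake_scall(a1, a2);
}

int f2(int arg)
{
    return f1(arg >> 10);
}
```

The price is a real call to f1 at every use, which is the overhead that
inlining was meant to avoid.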


r~
