Re: Special Memory Constraint [was Re: Indirect memory addresses vs. lra]

2019-08-19 Thread Richard Biener
On Mon, Aug 19, 2019 at 8:06 PM John Darrington
 wrote:
>
> On Mon, Aug 19, 2019 at 10:07:11AM -0500, Segher Boessenkool wrote:
>
>  >   As I remember there were a few other ideas from Richard Biener and
>  > Segher Boessenkool.  I also proposed to add a new address register which
>  > will be always a fixed stack memory slot at the end. Unfortunately I am
>  > not familiar with the target and the port to say in details how to do
>  > it.  But I think it is worth to try.
>
>  The m68hc11 port used the fake Z register approach, and I believe it had
>  some special machine pass to get rid of it right before assembler output.
>
>  (r171302 is when it was removed -- last version was
>  
> https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/m68hc11/m68hc11.c;h=1e414102c3f1fed985e4fb8db7954342e965190b;hb=bae8bb65d842d7ffefe990c1f0ac004491f3c105#l4061
>  for the machine reorg stuff).
>
>  No idea how well it works...  But it's only needed if you are forced to
>  have a frame pointer IIUC?
>
>
>  Segher
>
>
> Most of these suggestions involve adding some sort of virtual registers.
> So I hacked the machine description to add two new registers Z1 and Z2
> with the same mode as X and Y.
>
> Obviously the assembler balks at this.  However the compiler still
> ICEs at the same place as before.
>
> So this suggests that our original diagnosis, viz. that there are not
> enough address registers, was not accurate, and in fact there is some
> other problem?

That sounds likely.  Given you have indirect addressing you could
simulate N virtual regs by placing them in a virtual reg table in memory
and accessing them via a fixed address register (assuming all instructions
that would need an address reg can also take it indirectly from memory).
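
A minimal sketch of that idea in plain C, purely as an illustration of the
data layout and not of any real port: the table, NUM_VREGS and both helper
names are hypothetical. Each table slot plays the role of one extra address
register, and every access goes through the table, which on the real machine
would be reached via the one fixed address register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration only: none of these names exist in GCC
   or in any port.  */
#define NUM_VREGS 4

/* The "virtual register table": each slot holds an address and stands in
   for one additional address register.  */
static uint8_t *vreg_table[NUM_VREGS];

/* Store an address into virtual address register VREG.  */
void
set_vreg (unsigned vreg, uint8_t *addr)
{
  assert (vreg < NUM_VREGS);
  vreg_table[vreg] = addr;
}

/* Load one byte from the address held in virtual register VREG, plus
   OFFSET.  On the target this would be a single memory-indirect access
   relative to the fixed register that points at vreg_table.  */
uint8_t
load_via_vreg (unsigned vreg, unsigned offset)
{
  assert (vreg < NUM_VREGS);
  return vreg_table[vreg][offset];
}
```

This only works out on the target if, as noted above, every instruction that
wants an address register can also take the address indirectly from memory.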

Richard.

> J'
>
> --
> Avoid eavesdropping.  Send strong encrypted email.
> PGP Public key ID: 1024D/2DE827B3
> fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
> See http://sks-keyservers.net or any PGP keyserver for public key.
>


Re: Re: asking for __attribute__((aligned()) clarification

2019-08-19 Thread Markus Fröschle
Thank you (and others) for your answers. Now I'm just as smart as before, 
however.

Is it a supported, documented, 'long term' feature we can rely on or not?

If yes, I would expect it to be properly documented. If not, never mind.

> Sent: Monday, 19 August 2019 at 16:08
> From: "Alexander Monakov"
> To: "Richard Earnshaw (lists)"
> Cc: "Paul Koning", "Markus Fröschle", gcc@gcc.gnu.org
> Subject: Re: asking for __attribute__((aligned()) clarification
>
> On Mon, 19 Aug 2019, Richard Earnshaw (lists) wrote:
> 
> > Correct, but note that you can only pack structs and unions, not basic
> > types.  There is no way of under-aligning a basic type except by wrapping
> > it in a struct.
> 
> I don't think that's true. In GCC-9 the doc for 'aligned' attribute has been
> significantly revised, and now ends with
> 
>   When used as part of a typedef, the aligned attribute can both increase and
>   decrease alignment, and specifying the packed attribute generates a
>   warning.
> 
> (but I'm sure de facto behavior of accepting and honoring reduced alignment on
> a typedef'ed scalar type goes way earlier than gcc-9)
> 
> Alexander
>


[GSoC-19] Final Evaluations

2019-08-19 Thread Tejas Joshi
Hello.
The deadline for final evaluations is the 26th of August, and there are
certain things that I need to submit along with the code.
I have to submit a link to the code I have written, and I am thinking
of doing it as a GitHub gist along with links to the commits in my gcc
fork. I know that the documentation in the respective .texi files for
the changes I have made is still outstanding. I will do it as soon as
I can, but I have to give some of my time to university exams from the
20th to the 24th of this month.
I also need to mention the code that has been merged into the upstream
GCC sources. I don't know which parts are OK to be merged into GCC.
Patches I have submitted so far:





Other things that needed to be completed in the course of GSoC, like
the expansion of the FADD variants and of the fromfp variants, are still
outstanding, and I am trying my best to complete them by the deadline.
Even if some work remains, I will continue working on it to completion,
and I would also like to extend my work to other areas, including the
ISO/IEC specifications/extensions.

Thanks,
Tejas


Re: Help with bug in GCC garbage collector

2019-08-19 Thread Steve Ellcey
On Mon, 2019-08-19 at 17:05 -0600, Jeff Law wrote:
> 
> There's a real good chance Martin Liska has already fixed this.  He's
> made a couple fixes in the last week or so with the interactions
> between
> the GC system and the symbol tables.
> 
> 
> 2019-08-15  Martin Liska  
> 
> PR ipa/91404
> * passes.c (order): Remove.
> (uid_hash_t): Likewise).
> (remove_cgraph_node_from_order): Remove from set
> of pointers (cgraph_node *).
> (insert_cgraph_node_to_order): New.
> (duplicate_cgraph_node_to_order): New.
> (do_per_function_toporder): Register all 3 cgraph hooks.
> Skip removed_nodes now as we know about all of them.
> 
> 
> The way I'd approach it would be to configure a compiler with
> --enable-checking=gc,gcac, just build it through stage1.  Then run your
> test through that compiler which should fail.  Then apply Martin's patch
> (or update to the head of the trunk), rebuild the stage1 compiler and
> verify it works.

I had already built a compiler with --enable-checking=gc,gcac, that did
not catch the bug (I still got a segfault).  I did update my sources
though and the bug does not happen at ToT so it looks like Martin's
patch did fix my bug.

Steve Ellcey
sell...@marvell.com


Re: Help with bug in GCC garbage collector

2019-08-19 Thread Jeff Law
On 8/19/19 4:59 PM, Steve Ellcey wrote:
> I was wondering if anyone could help me investigate a bug I am
> seeing in the GCC garbage collector.  This bug (which may or may not
> be PR 89179) is causing a segfault in GCC, but when I try to create
> a preprocessed source file, the bug doesn't trigger.  The problem is
> with the garbage collector trying to mark some memory that has
> already been freed.  I have tracked down the initial allocation to:
> 
> symbol_table::allocate_cgraph_symbol
> 
> It has:
> 
> node = ggc_cleared_alloc<cgraph_node> ();
> 
> to allocate a cgraph node.  With the GGC debugging on I see this 
> allocated:
> 
> Allocating object, requested size=360, actual=360 at 0x7029c210 on 0x41b148c0
> 
> then freed:
> 
> Freeing object, actual size=360, at 0x7029c210 on 0x41b148c0
> 
> And then later, while the garbage collector is marking nodes, I see:
> 
> Marking 0x7029c210
> 
> The garbage collector shouldn't be marking this node if it has already
> been freed.
> 
> So I guess my main question is how do I figure out how the garbage 
> collector got to this memory location?  I am guessing some GTY
> pointer is still pointing to it and hadn't got nulled out when the
> memory was freed.  Does that seem like the most likely cause?
> 
> I am not sure why I am only running into this with one particular 
> application on my Aarch64 platform.  I am building it with -fopenmp, 
> which could have something to do with it (though there are no simd
> functions in the application).  The application is not that large as
> C++ programs go.
There's a real good chance Martin Liska has already fixed this.  He's
made a couple fixes in the last week or so with the interactions between
the GC system and the symbol tables.


2019-08-15  Martin Liska  

PR ipa/91404
* passes.c (order): Remove.
(uid_hash_t): Likewise).
(remove_cgraph_node_from_order): Remove from set
of pointers (cgraph_node *).
(insert_cgraph_node_to_order): New.
(duplicate_cgraph_node_to_order): New.
(do_per_function_toporder): Register all 3 cgraph hooks.
Skip removed_nodes now as we know about all of them.


The way I'd approach it would be to configure a compiler with
--enable-checking=gc,gcac, just build it through stage1.  Then run your
test through that compiler which should fail.  Then apply Martin's patch
(or update to the head of the trunk), rebuild the stage1 compiler and
verify it works.


jeff


Help with bug in GCC garbage collector

2019-08-19 Thread Steve Ellcey
I was wondering if anyone could help me investigate a bug I am seeing
in the GCC garbage collector.  This bug (which may or may not be PR
89179) is causing a segfault in GCC, but when I try to create a
preprocessed source file, the bug doesn't trigger.  The problem is with
the garbage collector trying to mark some memory that has already been
freed.  I have tracked down the initial allocation to:

symbol_table::allocate_cgraph_symbol

It has:

node = ggc_cleared_alloc<cgraph_node> ();

to allocate a cgraph node.  With the GGC debugging on I see this
allocated:

Allocating object, requested size=360, actual=360 at 0x7029c210 on 0x41b148c0

then freed:

Freeing object, actual size=360, at 0x7029c210 on 0x41b148c0

And then later, while the garbage collector is marking nodes, I see:

Marking 0x7029c210

The garbage collector shouldn't be marking this node if it has already
been freed.

So I guess my main question is how do I figure out how the garbage
collector got to this memory location?  I am guessing some GTY pointer
is still pointing to it and hadn't got nulled out when the memory was
freed.  Does that seem like the most likely cause?
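
The suspected failure mode can be modelled with a toy collector. This is a
sketch under loud assumptions: toy_obj, roots, toy_free and toy_mark_roots
are invented names and have nothing to do with the real ggc implementation.
It only shows how a root that was not cleared when its object was freed
leads the mark phase to a dead object, and how a check in the mark loop can
report which root was stale.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names exist in GCC.  */
struct toy_obj
{
  bool freed;   /* set by toy_free, standing in for an explicit free */
  bool marked;  /* set by the mark phase */
};

#define MAX_ROOTS 8

/* Stand-in for the set of GTY roots the collector walks.  */
struct toy_obj *roots[MAX_ROOTS];

/* Free OBJ without touching any root that may still point at it.  */
void
toy_free (struct toy_obj *obj)
{
  obj->freed = true;
}

/* Mark every live object reachable from a root.  Return the index of the
   first root that still points at a freed object, or -1 if there is none.  */
int
toy_mark_roots (void)
{
  int stale = -1;
  for (int i = 0; i < MAX_ROOTS; i++)
    {
      struct toy_obj *obj = roots[i];
      if (obj == NULL)
        continue;
      if (obj->freed)
        {
          if (stale < 0)
            stale = i;   /* dangling root: this is the bug being hunted */
          continue;
        }
      obj->marked = true;
    }
  return stale;
}
```

In the toy, clearing the root when the object is freed makes the problem go
away; the hard part in the real compiler is finding out which pointer plays
the role of the stale root.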

I am not sure why I am only running into this with one particular
application on my Aarch64 platform.  I am building it with -fopenmp,
which could have something to do with it (though there are no simd functions in 
the application).  The application is not that large as C++ programs go.

Steve Ellcey
sell...@marvell.com


Re: Special Memory Constraint [was Re: Indirect memory addresses vs. lra]

2019-08-19 Thread John Darrington
On Mon, Aug 19, 2019 at 10:07:11AM -0500, Segher Boessenkool wrote:

 >   As I remember there were a few other ideas from Richard Biener and
 > Segher Boessenkool.  I also proposed to add a new address register which
 > will be always a fixed stack memory slot at the end. Unfortunately I am
 > not familiar with the target and the port to say in details how to do
 > it.  But I think it is worth to try.
 
 The m68hc11 port used the fake Z register approach, and I believe it had
 some special machine pass to get rid of it right before assembler output.
 
 (r171302 is when it was removed -- last version was
 
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/m68hc11/m68hc11.c;h=1e414102c3f1fed985e4fb8db7954342e965190b;hb=bae8bb65d842d7ffefe990c1f0ac004491f3c105#l4061
 for the machine reorg stuff).
 
 No idea how well it works...  But it's only needed if you are forced to
 have a frame pointer IIUC?
 
 
 Segher


Most of these suggestions involve adding some sort of virtual registers.
So I hacked the machine description to add two new registers Z1 and Z2
with the same mode as X and Y.

Obviously the assembler balks at this.  However the compiler still
ICEs at the same place as before.

So this suggests that our original diagnosis, viz. that there are not
enough address registers, was not accurate, and in fact there is some
other problem?

J'

-- 
Avoid eavesdropping.  Send strong encrypted email.
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.



Re: Faster number printing and pointer alignment algorithms.

2019-08-19 Thread Ruslan Kabatsayev
Hi Cale,

On Mon, 19 Aug 2019 at 19:50, Cale McCollough  wrote:
>
> My name is Cale McCollough and I'm the author of the Fastest Method to
> Print Integers and Floating-point Numbers, the Puff algorithm, that
> eliminates over half of the division instructions from the
> industry-standard mod 100 div 100 technique, saving hundreds to thousands
> of clock cycles and eliminating one division instruction from Grisu2, the
> world's fastest floating-point-to-string algorithm.

How does performance of your method compare to that of Ryū
(sources[1], paper[2])?

[1]: https://github.com/ulfjack/ryu
[2]: https://dl.acm.org/citation.cfm?id=3192369

> The article can be found
> at:
> https://github.com/kabuki-starship/script2/wiki/Fastest-Method-to-Print-Integers-and-Floating-point-Numbers
>
> The other Visual-C++ optimization I have for the community is the Fastest
> Method to Align Pointers, which uses only 3 instructions plus loading the
> mask. I have verified it is faster than the Microsoft and GCC implementations. The
> article can be found at:
> https://github.com/kabuki-starship/script2/wiki/Fastest-Method-to-Align-Pointers
>
> Also, I have invested all of my time and labor into these open-source
> technologies, have worked for years without pay, and I am looking for a
> cool job that can facilitate my MongoDB/BJSON competitor called the ASCII
> Abstract Data Specification, Chinese Room Abstract Stack Machine (Crabs),
> SCRIPT Protocol, and Script2. Thanks.

Regards,
Ruslan


Faster number printing and pointer alignment algorithms.

2019-08-19 Thread Cale McCollough
My name is Cale McCollough and I'm the author of the Fastest Method to
Print Integers and Floating-point Numbers, the Puff algorithm, that
eliminates over half of the division instructions from the
industry-standard mod 100 div 100 technique, saving hundreds to thousands
of clock cycles and eliminating one division instruction from Grisu2, the
world's fastest floating-point-to-string algorithm. The article can be found
at:
https://github.com/kabuki-starship/script2/wiki/Fastest-Method-to-Print-Integers-and-Floating-point-Numbers
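
For readers who have not seen it, the baseline "mod 100 div 100" technique
mentioned above can be sketched as follows. This is the conventional
two-digits-per-division approach, not the Puff algorithm from the linked
article; pair_table, init_pair_table and print_u32 are names made up for
this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* "00" "01" ... "99", filled in on first use.  */
static char pair_table[200];

static void
init_pair_table (void)
{
  for (int i = 0; i < 100; i++)
    {
      pair_table[2 * i] = (char) ('0' + i / 10);
      pair_table[2 * i + 1] = (char) ('0' + i % 10);
    }
}

/* Write VALUE in decimal into BUF (at least 11 bytes), NUL-terminated,
   and return the number of digits written.  Two digits are produced per
   division by 100 instead of one digit per division by 10.  */
size_t
print_u32 (uint32_t value, char *buf)
{
  char tmp[10];                 /* UINT32_MAX has 10 decimal digits */
  size_t pos = sizeof tmp;

  if (pair_table[0] == 0)
    init_pair_table ();

  while (value >= 100)
    {
      uint32_t rem = value % 100;
      value /= 100;
      pos -= 2;
      memcpy (&tmp[pos], &pair_table[2 * rem], 2);
    }
  if (value >= 10)
    {
      pos -= 2;
      memcpy (&tmp[pos], &pair_table[2 * value], 2);
    }
  else
    tmp[--pos] = (char) ('0' + value);

  size_t len = sizeof tmp - pos;
  memcpy (buf, &tmp[pos], len);
  buf[len] = '\0';
  return len;
}
```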

The other Visual-C++ optimization I have for the community is the Fastest
Method to Align Pointers, which uses only 3 instructions plus loading the
mask. I have verified it is faster than the Microsoft and GCC implementations. The
article can be found at:
https://github.com/kabuki-starship/script2/wiki/Fastest-Method-to-Align-Pointers
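
As a point of reference, the conventional mask-based way of rounding a
pointer up to an alignment boundary looks like this. It is the textbook
add-and-mask form and is not claimed to be the method from the linked
article; align_up is a name chosen for this sketch, and the alignment is
assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Round P up to the next multiple of ALIGNMENT, which must be a nonzero
   power of two.  */
void *
align_up (void *p, uintptr_t alignment)
{
  assert (alignment != 0 && (alignment & (alignment - 1)) == 0);
  uintptr_t mask = alignment - 1;
  return (void *) (((uintptr_t) p + mask) & ~mask);
}
```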

Also, I have invested all of my time and labor into these open-source
technologies, have worked for years without pay, and I am looking for a
cool job that can facilitate my MongoDB/BJSON competitor called the ASCII
Abstract Data Specification, Chinese Room Abstract Stack Machine (Crabs),
SCRIPT Protocol, and Script2. Thanks.


Re: Special Memory Constraint [was Re: Indirect memory addresses vs. lra]

2019-08-19 Thread Segher Boessenkool
On Mon, Aug 19, 2019 at 09:14:22AM -0400, Vladimir Makarov wrote:
> On 2019-08-19 3:35 a.m., John Darrington wrote:
> >On Fri, Aug 16, 2019 at 10:50:13AM -0400, Vladimir Makarov wrote:
> >  No I meant something like that
> >  
> >  (define_special_memory_constraint "a" ...)
> >  (define_predicate "my_special_predicate" ...
> > 
> >   {
> > if (lra_in_progress_p)
> >   return REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER && 
> >   reg_renumber[REGNO(op)] < 0;
> > return true if memory with sp addressing;
> >  })
> >  
> >  I think LRA spills pseudo-register and it will be memory addressed 
> >  by sp
> >  at the end of LRA.
> >
> >What I've done is this:
> >
> >(define_predicate "my_special_predicate"
> > (match_operand 0 "memory_operand")
> >  {
> >debug_rtx (op);
> >gcc_assert (MEM_P (op));
> >op = XEXP (op, 0);
> >if (GET_CODE (op) == PLUS)
> >  op = XEXP (op, 0);
> >
> >if (lra_in_progress)
> >  {
> >fprintf (stderr, "%s:%d\n", __FILE__, __LINE__);
> >return REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER && 
> >reg_renumber[REGNO(op)] < 0;
> >  }
> >
> >
> >if (REG_P (op))
> >  {
> >int regno = REGNO (op);
> >return (regno == 10); // register is the stack pointer
> >  }
> >
> >return true;
> >  })
> >
> >  (and many variations)  Unfortunately, any moderately complicated input
> >  still results in a (mem (reg) ) insn repeatedly entering the
> >  lra_in_progress case and returning false, and eventually terminating with
> >  
> >  "internal compiler error: maximum number of generated reload insns per 
> >  insn achieved (90)"
> >
> >
> >Any other ideas?
>   As I remember there were a few other ideas from Richard Biener and 
> Segher Boessenkool.  I also proposed to add a new address register which 
> will be always a fixed stack memory slot at the end. Unfortunately I am 
> not familiar with the target and the port to say in details how to do 
> it.  But I think it is worth to try.

The m68hc11 port used the fake Z register approach, and I believe it had
some special machine pass to get rid of it right before assembler output.

(r171302 is when it was removed -- last version was
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/m68hc11/m68hc11.c;h=1e414102c3f1fed985e4fb8db7954342e965190b;hb=bae8bb65d842d7ffefe990c1f0ac004491f3c105#l4061
for the machine reorg stuff).

No idea how well it works...  But it's only needed if you are forced to
have a frame pointer IIUC?


Segher


Re: [GSoC-19] Expanding fromfp variants on AArch64

2019-08-19 Thread Joseph Myers
On Mon, 19 Aug 2019, Tejas Joshi wrote:

> How can I add a target hook to specify the FP_INT_* values from libm ?

See target.def.

You'll need a GCC-specific enum (GCC_FP_INT_*, say) that GCC uses 
internally, and a hook that maps between that and FP_INT_*.  I'm guessing 
that for the likely uses, maybe the hook should map from FP_INT_* to 
GCC_FP_INT_* (so it gets used on constant arguments to the built-in 
function to say which rounding direction they are in GCC's internal enum).  
It will need to be able to return that a constant doesn't map to a known 
rounding mode (not an error, just means that call can't be expanded inline 
or optimized to a constant).
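
A rough sketch of what such a mapping could look like, written as
standalone C rather than as a real target hook. Everything here is an
assumption for illustration: the enum, the LIBM_FP_INT_* constants and
map_libm_fp_int do not exist in GCC, and the numeric values are what I
believe glibc's <math.h> uses, which would need checking against the libm
in question.

```c
/* Hypothetical GCC-internal enum following the GCC_FP_INT_* naming
   suggested above; it does not exist in GCC.  */
enum gcc_fp_int_round
{
  GCC_FP_INT_UPWARD,
  GCC_FP_INT_DOWNWARD,
  GCC_FP_INT_TOWARDZERO,
  GCC_FP_INT_TONEARESTFROMZERO,
  GCC_FP_INT_TONEAREST,
  GCC_FP_INT_UNKNOWN    /* constant does not name a known rounding mode */
};

/* Assumed libm ABI values (believed to match glibc; an assumption).  */
enum
{
  LIBM_FP_INT_UPWARD = 0,
  LIBM_FP_INT_DOWNWARD = 1,
  LIBM_FP_INT_TOWARDZERO = 2,
  LIBM_FP_INT_TONEARESTFROMZERO = 3,
  LIBM_FP_INT_TONEAREST = 4
};

/* Map a constant rounding-direction argument, as seen in a call to the
   built-in, to the internal enum.  An unknown value is not an error; it
   only means the call cannot be expanded inline or folded.  */
enum gcc_fp_int_round
map_libm_fp_int (long libm_value)
{
  switch (libm_value)
    {
    case LIBM_FP_INT_UPWARD:
      return GCC_FP_INT_UPWARD;
    case LIBM_FP_INT_DOWNWARD:
      return GCC_FP_INT_DOWNWARD;
    case LIBM_FP_INT_TOWARDZERO:
      return GCC_FP_INT_TOWARDZERO;
    case LIBM_FP_INT_TONEARESTFROMZERO:
      return GCC_FP_INT_TONEARESTFROMZERO;
    case LIBM_FP_INT_TONEAREST:
      return GCC_FP_INT_TONEAREST;
    default:
      return GCC_FP_INT_UNKNOWN;
    }
}
```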

Then the relevant macro giving the default for glibc systems should be 
defined in config/gnu-user.h (see how it defines e.g. 
TARGET_LIBC_HAS_FUNCTION).

> Also as this includes rounding to integers, does it involve any RTL
> related complications that we have encountered in FADD ?

The new RTL would effectively be variants of the fix_trunc and 
fixuns_trunc patterns, which can use (fix) and (unsigned_fix) RTL; the new 
variants would take an argument in a floating-point mode, returning one in 
an integer mode - but with extra information involved about the number of 
bits, rounding direction, handling of "inexact".

fix_trunc and fixuns_trunc / (fix) and (unsigned_fix) always use 
FP_INT_TOWARDZERO, always use the width of the mode and have unspecified 
"inexact" handling for non-integer in-range values (they correspond to C 
casts) so are not themselves suitable for implementing the new built-in 
functions (but the particular instructions those patterns expand to are 
likely to be suitable for certain arguments to certain of the new built-in 
functions).
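
The correspondence to C casts is easy to see at the source level. In this
small sketch (cast_to_int is just an illustrative name), the conversion
always discards the fractional part, for negative values too, which is the
FP_INT_TOWARDZERO behaviour described above. Values outside the range of
int are undefined in C and are not handled here.

```c
/* Convert X to int the way a C cast does: truncate toward zero.
   X must be within the range of int.  */
int
cast_to_int (double x)
{
  return (int) x;
}
```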

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: asking for __attribute__((aligned()) clarification

2019-08-19 Thread Paul Koning



> On Aug 19, 2019, at 10:08 AM, Alexander Monakov  wrote:
> 
> On Mon, 19 Aug 2019, Richard Earnshaw (lists) wrote:
> 
>> Correct, but note that you can only pack structs and unions, not basic types.
>> There is no way of under-aligning a basic type except by wrapping it in a
>> struct.
> 
> I don't think that's true. In GCC-9 the doc for 'aligned' attribute has been
> significantly revised, and now ends with
> 
>  When used as part of a typedef, the aligned attribute can both increase and
>  decrease alignment, and specifying the packed attribute generates a warning. 
> 
> (but I'm sure de facto behavior of accepting and honoring reduced alignment on
> a typedef'ed scalar type goes way earlier than gcc-9)

Interesting.  It certainly wasn't that way a decade ago.  And for the old code 
pattern to generate a warning seems like a bad incompatible change.  Honoring 
reduced alignment is one thing; complaining about packed is not good.

paul



Re: asking for __attribute__((aligned()) clarification

2019-08-19 Thread Alexander Monakov
On Mon, 19 Aug 2019, Richard Earnshaw (lists) wrote:

> Correct, but note that you can only pack structs and unions, not basic types.
> There is no way of under-aligning a basic type except by wrapping it in a
> struct.

I don't think that's true. In GCC-9 the doc for 'aligned' attribute has been
significantly revised, and now ends with

  When used as part of a typedef, the aligned attribute can both increase and
  decrease alignment, and specifying the packed attribute generates a warning. 

(but I'm sure de facto behavior of accepting and honoring reduced alignment on
a typedef'ed scalar type goes way earlier than gcc-9)

Alexander


Re: asking for __attribute__((aligned()) clarification

2019-08-19 Thread Richard Earnshaw (lists)

On 19/08/2019 14:57, Paul Koning wrote:
>> On Aug 19, 2019, at 8:46 AM, Markus Fröschle wrote:
>>
>> All,
>>
>> this is my first post on these lists, so please bear with me.
>>
>> My question is about gcc's __attribute__((aligned()). Please consider the
>> following code:
>>
>> #include <stdint.h>
>>
>> typedef uint32_t uuint32_t __attribute__((aligned(1)));
>>
>> uint32_t getuuint32(uint8_t p[]) {
>>     return *(uuint32_t*)p;
>> }
>>
>> This is meant to prevent gcc from producing hard faults/address errors on
>> architectures that do not support unaligned access to shorts/ints (e.g. some
>> ARMs, some m68k). On these architectures, gcc is supposed to replace the 32
>> bit access with a series of 8 or 16 bit accesses.
>>
>> I originally came from gcc 4.6.4 (yes, pretty old) where this did not work
>> and gcc does not respect the aligned(1) attribute for its code generation
>> (i.e. it generates a 'normal' pointer dereference, consequently crashing when
>> the code executes). To be fair, it is my understanding that the gcc manuals
>> never promised this *would* work.
>
> That has never been my understanding.  I've always read the manual to say that
> "aligned" only INCREASES the alignment.  The normal alignment is that specified
> by the ABI for the given data type (often, but not always, the size of the
> primitive type) -- or it is 1 for "packed".
>
> So I use __attribute__ ((packed)) to request byte alignment, and, say,
> __attribute__ ((packed, aligned(2))) to specify alignment to 2 byte multiples.
>
> paul

Correct, but note that you can only pack structs and unions, not basic
types.  There is no way of under-aligning a basic type except by
wrapping it in a struct.


R.


Re: asking for __attribute__((aligned()) clarification

2019-08-19 Thread Paul Koning



> On Aug 19, 2019, at 8:46 AM, Markus Fröschle  wrote:
> 
> All,
> 
> this is my first post on these lists, so please bear with me.
> 
> My question is about gcc's __attribute__((aligned()). Please consider the 
> following code:
> 
> #include <stdint.h>
> 
> typedef uint32_t uuint32_t __attribute__((aligned(1)));
> 
> uint32_t getuuint32(uint8_t p[]) {
>return *(uuint32_t*)p;
> }
> 
> This is meant to prevent gcc to produce hard faults/address errors on 
> architectures that do not support unaligned access to shorts/ints (e.g some 
> ARMs, some m68k). On these architectures, gcc is supposed to replace the 32 
> bit access with a series of 8 or 16 bit accesses.
> 
> I originally came from gcc 4.6.4 (yes, pretty old) where this did not work 
> and gcc does not respect the aligned(1) attribute for its code generation 
> (i.e. it generates a 'normal' pointer dereference, consequently crashing when 
> the code executes). To be fair, it is my understanding that the gcc manuals 
> never promised this *would* work.

That has never been my understanding.  I've always read the manual to say that 
"aligned" only INCREASES the alignment.  The normal alignment is that specified 
by the ABI for the given data type (often, but not always, the size of the 
primitive type) -- or it is 1 for "packed".

So I use __attribute__ ((packed)) to request byte alignment, and, say, 
__attribute__ ((packed, aligned(2))) to specify alignment to 2 byte multiples.
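
A small sketch of that usage on structs, with the resulting layout checked
at compile time. The struct names and get_value are made up for the
example; the sizes and alignments asserted are what GCC produces for these
two definitions.

```c
#include <stdint.h>

/* Byte-aligned: no padding between members, alignment 1.  */
struct rec_packed
{
  uint8_t tag;
  uint32_t value;
} __attribute__ ((packed));

/* Packed members, but the struct as a whole is aligned to 2 bytes.  */
struct rec_packed2
{
  uint8_t tag;
  uint32_t value;
} __attribute__ ((packed, aligned (2)));

_Static_assert (_Alignof (struct rec_packed) == 1, "packed gives alignment 1");
_Static_assert (sizeof (struct rec_packed) == 5, "no padding inside");
_Static_assert (_Alignof (struct rec_packed2) == 2, "aligned(2) raises it to 2");
_Static_assert (sizeof (struct rec_packed2) == 6, "size rounded up to alignment");

/* The compiler knows VALUE may be misaligned and chooses the access
   sequence accordingly.  */
uint32_t
get_value (const struct rec_packed *r)
{
  return r->value;
}
```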

paul




Re: Special Memory Constraint [was Re: Indirect memory addresses vs. lra]

2019-08-19 Thread Vladimir Makarov



On 2019-08-19 3:35 a.m., John Darrington wrote:
> On Fri, Aug 16, 2019 at 10:50:13AM -0400, Vladimir Makarov wrote:
>
>   No I meant something like that
>
>   (define_special_memory_constraint "a" ...)
>   (define_predicate "my_special_predicate" ...
>
>    {
>      if (lra_in_progress_p)
>        return REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER &&
>               reg_renumber[REGNO(op)] < 0;
>      return true if memory with sp addressing;
>    })
>
>   I think LRA spills pseudo-register and it will be memory addressed by sp
>   at the end of LRA.
>
> What I've done is this:
>
> (define_predicate "my_special_predicate"
>   (match_operand 0 "memory_operand")
> {
>   debug_rtx (op);
>   gcc_assert (MEM_P (op));
>   op = XEXP (op, 0);
>   if (GET_CODE (op) == PLUS)
>     op = XEXP (op, 0);
>
>   if (lra_in_progress)
>     {
>       fprintf (stderr, "%s:%d\n", __FILE__, __LINE__);
>       return REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER &&
>              reg_renumber[REGNO(op)] < 0;
>     }
>
>   if (REG_P (op))
>     {
>       int regno = REGNO (op);
>       return (regno == 10); // register is the stack pointer
>     }
>
>   return true;
> })
>
>   (and many variations)  Unfortunately, any moderately complicated input
>   still results in a (mem (reg) ) insn repeatedly entering the
>   lra_in_progress case and returning false, and eventually terminating with
>
>   "internal compiler error: maximum number of generated reload insns per
>   insn achieved (90)"
>
> Any other ideas?
As I remember, there were a few other ideas from Richard Biener and
Segher Boessenkool.  I also proposed to add a new address register which
will always be a fixed stack memory slot at the end. Unfortunately I am
not familiar enough with the target and the port to say in detail how to
do it.  But I think it is worth trying.


Re: Expansion of narrowing math built-ins into power instructions

2019-08-19 Thread Segher Boessenkool
Hi Richard,

On Sat, Aug 17, 2019 at 09:21:00AM +0100, Richard Sandiford wrote:
> Tejas Joshi  writes:
> >> It's just a different name, nothing more, nothing less.  Because it is
> >> a different name it can not be accidentally generated from actual
> >> truncations.
> >
> > I have introduced float_narrow but I could not find the appropriate places
> > to generate it for a call to fadd instead of generating a CALL. I
> > used GDB to set breakpoints which hit fold_rtx and cse_insn but I got
> > confused with the rtx codes and passes which generate respective RTL.
> > It should not be similar to FLOAT_TRUNCATE if we want to avoid
> > generating it for actual truncations?
> 
> Please don't do it this way.  The whole point of the work is that this
> is a single operation that cannot be modelled as a post-processing of
> a normal double addition result.  It's a single operation at the source
> level, a single IFN, a single optab, and a single instruction.  Splitting
> it apart into two operations for rtl only, and making it look in rtl terms
> like a post-processing of a normal addition result, seems like it's going
> to come back to bite us.
> 
> In lisp terms we're saying that the operand to the float_narrow is
> implicitly quoted:
> 
>   (float_narrow:m '(plus:n a b))
> 
> so that when float_narrow is evaluated, the argument is the unevaluated
> rtl expression "(plus a b)" rather than the evaluated result a + b.
> float_narrow then does its own evaluation of a and b and performs a
> fused addition and narrowing on the result.
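
The quoted point that the narrowing add is not a post-processing of a
normal double addition can be checked with a small C sketch. It assumes
IEEE 754 binary32/binary64 arithmetic in the default round-to-nearest-even
mode; add_then_narrow is an illustrative name and the inputs are chosen
only to hit the double-rounding case.

```c
/* Add in double, then narrow to float: two separate roundings.  The
   volatile forces the intermediate to be rounded to double even where
   excess precision would otherwise be kept.  */
float
add_then_narrow (double a, double b)
{
  volatile double sum = a + b;  /* first rounding, to double */
  return (float) sum;           /* second rounding, to float */
}

/* With a = 0x1.000001p0 (1 + 2^-24) and b = 0x1p-54:
   - the exact sum is 1 + 2^-24 + 2^-54, just above the midpoint between
     the floats 1.0 and 0x1.000002p0, so a single correctly rounded
     narrowing add would give 0x1.000002p0;
   - in double, the 2^-54 term is a quarter ulp and is rounded away,
     leaving exactly the midpoint, which then rounds to even: 1.0.  */
float
double_rounding_example (void)
{
  return add_then_narrow (0x1.000001p0, 0x1p-54);   /* returns 1.0f */
}
```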

RTL isn't Lisp.  RTL doesn't have quotations.  RTL doesn't have
*evaluation*.

RTL is just a data structure that describes your program instructions.
A large part of what means what is system-specific.  Rounding of floating
point is not defined, for example.

And yes, various parts of GCC can manipulate RTL, doing substitution and
algebraic simplification and whatnot.  All within the rules of RTL.  And
that means nothing ever can "pass" a float_narrow, because there are no
rules that allow it to.

> No other rtx rvalue works like this.

A lot of unspecs are used like this, for example.

> Using float_narrow would also be inconsistent with the way we handle
> saturating arithmetic.  There we use US_PLUS and SS_PLUS rtx codes for
> unsigned and signed saturating plus respectively, rather than:
> 
>   (unsigned_sat '(plus a b))
>   (signed_sat '(plus a b))
> 
> Using dedicated codes might seem clunky.  But it's simple, safe, and fits
> the existing model without special cases. :-)

And you need many many more RTX codes, which you will not handle in
almost all places, because there are too many.


I agree this construct is not as nice as could be hoped for.  I don't
agree that 60 new RTX codes is an acceptable solution (or that that will
ever really work out, even).


It would be nice if somehow we could make a variant of RTL codes, so that
we could have nice and simple code that applies to all variants of some
code.  Not sure how that would work out.  Maybe we don't have to do this
very generically, how often will we need this anyway?

I have three examples so far:
1) Saturating arithmetic;
2) This float_narrow thing;
3) Ordered compares, that is, fp compares that set an exception on NaNs.

Something that works for all three would be nice!
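
For the first item, the semantics that the US_PLUS and SS_PLUS codes stand
for can be written down in C. This is only a sketch of the arithmetic on
8-bit values, with function names invented for the example; it says
nothing about how the RTL codes are implemented.

```c
#include <stdint.h>

/* Unsigned saturating add: clamp to UINT8_MAX instead of wrapping.  */
uint8_t
us_plus_u8 (uint8_t a, uint8_t b)
{
  unsigned sum = (unsigned) a + b;
  return sum > UINT8_MAX ? UINT8_MAX : (uint8_t) sum;
}

/* Signed saturating add: clamp to [INT8_MIN, INT8_MAX].  */
int8_t
ss_plus_s8 (int8_t a, int8_t b)
{
  int sum = (int) a + b;
  if (sum > INT8_MAX)
    return INT8_MAX;
  if (sum < INT8_MIN)
    return INT8_MIN;
  return (int8_t) sum;
}
```

As with the narrowing add, the result is not "plus followed by something":
the clamp needs the exact sum, which an ordinary wrapping 8-bit plus has
already lost.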


Segher


asking for __attribute__((aligned()) clarification

2019-08-19 Thread Markus Fröschle
All,

this is my first post on these lists, so please bear with me.

My question is about gcc's __attribute__((aligned()). Please consider the 
following code:

#include <stdint.h>

typedef uint32_t uuint32_t __attribute__((aligned(1)));

uint32_t getuuint32(uint8_t p[]) {
return *(uuint32_t*)p;
}

This is meant to prevent gcc from producing hard faults/address errors on 
architectures that do not support unaligned access to shorts/ints (e.g. some 
ARMs, some m68k). On these architectures, gcc is supposed to replace the 32 bit 
access with a series of 8 or 16 bit accesses.

I originally came from gcc 4.6.4 (yes, pretty old) where this did not work: 
gcc did not respect the aligned(1) attribute for its code generation (i.e. it 
generated a 'normal' pointer dereference, consequently crashing when the code 
executed). To be fair, it is my understanding that the gcc manuals never 
promised this *would* work.

As far as I can tell, the documentation has not really changed regarding 
lowering alignment for variables, and it does not appear to say anything 
specific about dereferencing pointers to single, misaligned variables. I was 
therefore pretty astonished to see this working on newer gcc versions (I tried 
6.2 and 7.4). gcc even appears to know the differences within an architecture 
(the 68000 gets a bytewise copy, while ColdFire, which supports unaligned 
access, copies a 32 bit value).
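
For comparison, here is a sketch of the attribute-based load next to the
memcpy idiom, which does not depend on the attribute at all. The function
names are invented for the example, and the may_alias attribute is an
addition that is not in the code above: it is there so that reading a
uint8_t buffer through the typedef does not run into strict-aliasing
rules, which are a separate issue from alignment.

```c
#include <stdint.h>
#include <string.h>

/* Under-aligned typedef as above, plus may_alias (an addition made for
   this sketch).  Both attributes are GCC extensions.  */
typedef uint32_t uuint32_t __attribute__ ((aligned (1), may_alias));

/* Load through the under-aligned type.  */
uint32_t
load_u32_attr (const uint8_t *p)
{
  return *(const uuint32_t *) p;
}

/* Alternative that needs no attribute: copy the bytes into a properly
   aligned object and let the compiler pick the access sequence.  */
uint32_t
load_u32_memcpy (const uint8_t *p)
{
  uint32_t v;
  memcpy (&v, p, sizeof v);
  return v;
}
```

Both functions read the value in the target's native byte order, so they
should agree for any pointer, aligned or not.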

My question: is this now intended behaviour we can rely on? If yes, can we have 
the documentation updated to clearly state that this use case is valid?

Thank you.
Markus


[GSoC-19] Expanding fromfp variants on AArch64

2019-08-19 Thread Tejas Joshi
Hello.
The fromfp/fromfpx variants round to integers with a specified number
of bits, in a specified rounding mode. They come with their own
complications, as Joseph stated in an initial mail, and I am expected
to expand them on AArch64:

> The fromfp / fromfpx / ufromfp / ufromfpx functions (round to integers
> of a specified number of bits, in a specified rounding mode, with
> specified handling of inexact results) are a case with some other
> complications.  Typically I'd expect them to be expanded inline only (for
> constant arguments or) for constant values of the number of bits and
> rounding mode, if the target machine has an appropriate instruction.  A
> target hook would need adding for a target to specify the FP_INT_* values
> used in libm, since that's an ABI that's defined by libm, not by GCC.
> Then you'd need instruction patterns that might only be supported in
> certain cases.

How can I add a target hook to specify the FP_INT_* values from libm ?
Also as this includes rounding to integers, does it involve any RTL
related complications that we have encountered in FADD ?

Thanks,
Tejas


Re: Expansion of narrowing math built-ins into power instructions

2019-08-19 Thread Tejas Joshi
> but an unspec is of course easiest for now.

So, at this point, should I proceed with UNSPEC considering the
complications that might arise as Richard points out?


On Sat, 17 Aug 2019 at 13:51, Richard Sandiford
 wrote:
>
> Tejas Joshi  writes:
> > Hi,
> >
> >> It's just a different name, nothing more, nothing less.  Because it is
> >> a different name it can not be accidentally generated from actual
> >> truncations.
> >
> > I have introduced float_narrow but I could not find appropriate places
> > to generate it for a call to fadd instead of generating a CALL. I
> > used GDB to set breakpoints which hit fold_rtx and cse_insn but I got
> > confused with the rtx codes and passes which generate respective RTL.
> > Should it not be different from FLOAT_TRUNCATE if we want to avoid
> > generating it for actual truncations?
>
> Please don't do it this way.  The whole point of the work is that this
> is a single operation that cannot be modelled as a post-processing of
> a normal double addition result.  It's a single operation at the source
> level, a single IFN, a single optab, and a single instruction.  Splitting
> it apart into two operations for rtl only, and making it look in rtl terms
> like a post-processing of a normal addition result, seems like it's going
> to come back to bite us.
>
> In lisp terms we're saying that the operand to the float_narrow is
> implicitly quoted:
>
>   (float_narrow:m '(plus:n a b))
>
> so that when float_narrow is evaluated, the argument is the unevaluated
> rtl expression "(plus a b)" rather than the evaluated result a + b.
> float_narrow then does its own evaluation of a and b and performs a
> fused addition and narrowing on the result.
>
> No other rtx rvalue works like this.  rtx mappings like simplification
> or evaluation are normally depth-first, so that the mapping is applied
> to the operands first, and then the root is mapped/simplified/evaluated
> with the results.  Adding implicit lisp quoting would require special
> cases in these routines for float_narrow.
>
> The only current analogue I can think of for this is the handling
> of zero_extend on const_ints.  Because const_ints are modeless, we have
> to avoid cases in which the recursion produces things like:
>
>   (zero_extend:m (const_int -1))
>
> because it's no longer clear what mode the zero_extend is extending from.
> But I think that's seen as a wart of having modeless const_ints.  I don't
> think it's something we should actively embrace by adding float_narrow.
>
> Using float_narrow would also be inconsistent with the way we handle
> saturating arithmetic.  There we use US_PLUS and SS_PLUS rtx codes for
> unsigned and signed saturating plus respectively, rather than:
>
>   (unsigned_sat '(plus a b))
>   (signed_sat '(plus a b))
>
> Using dedicated codes might seem clunky.  But it's simple, safe, and fits
> the existing model without special cases. :-)
>
> Thanks,
> Richard
>
> >
> > Thanks,
> > Tejas
> >
> >
> > On Fri, 16 Aug 2019 at 15:53, Richard Sandiford
> >  wrote:
> >>
> >> Segher Boessenkool  writes:
> >> > On Thu, Aug 15, 2019 at 01:47:47PM +0100, Richard Sandiford wrote:
> >> >> Tejas Joshi  writes:
> >> >> > Hello.
> >> >> > I just wanted to make sure that I am looking at the correct code here.
> >> >> > Except for rtl.def where I should be introducing something like
> >> >> > float_contract (or float_narrow?) and also simplify-rtx.c, breakpoints
> >> >
> >> > I like that "float_narrow" name :-)
> >> >
> >> >> > set on functions around expr.c, cfgexpand.c where I grep for
> >> >> > float_truncate/FLOAT_TRUNCATE did not hit.
> >> >> > Also, in what manner should float_contract/narrow be different from
> >> >> > float_truncate as both are trying to do similar things? (truncation
> >> >> > from DF to SF)
> >> >>
> >> >> I think the code should instead be a fused addition and truncation,
> >> >> a bit like FMA is a fused addition and multiplication.  Describing it as
> >> >> a DFmode addition followed by some conversion to SF would still involve
> >> >> double rounding.
> >> >
> >> > How so?  It would *mean* there is only single rounding, even!  That's
> >> > the whole point of it.
> >>
> >> But a PLUS should behave as a PLUS in any context.  Making its
> >> behaviour dependent on the containing rtxes (if any) would be a
> >> can of worms.
> >>
> >> Richard
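
The double-rounding concern in this thread can be reproduced in plain C. The
sketch below contrasts a DFmode-style addition followed by a separate
conversion with an addition that is rounded to SFmode only once. It assumes
round-to-nearest, finite inputs without overflow, IEEE binary64/binary32, and
double arithmetic evaluated in double precision (FLT_EVAL_METHOD == 0). The
function names are made up for illustration, and the single-rounding variant
uses the rounding-to-odd technique as a stand-in; it is not taken from GCC or
from any particular libm.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Two roundings: a + b is rounded to double, then the result to float.  */
static float add_then_truncate(double a, double b)
{
    double s = a + b;
    return (float) s;
}

/* One effective rounding: compute the double sum, and if it was inexact,
   force its last significand bit to 1 (round to odd) before converting.
   The odd bit keeps the "sticky" information the float rounding needs.  */
static float narrowing_add_sketch(double a, double b)
{
    double s = a + b;
    assert(s - s == 0.0);                    /* finite results only */

    /* TwoSum: err is the exact rounding error of a + b.  */
    double bv = s - a;
    double err = (a - (s - bv)) + (b - bv);

    if (err != 0.0) {
        uint64_t bits;
        memcpy(&bits, &s, sizeof bits);
        if ((bits & 1) == 0) {
            /* Step one ulp toward the exact sum; that neighbour is odd.  */
            int away = (err > 0.0) == (s > 0.0);
            bits = away ? bits + 1 : bits - 1;
            memcpy(&s, &bits, sizeof s);
        }
    }
    return (float) s;
}
```

For a = 0x1.000001p0 (that is, 1 + 2^-24) and b = 0x1p-54, the exact sum lies
just above the midpoint between the floats 1.0 and 0x1.000002p0. The two-step
version first rounds the sum down to exactly the midpoint and then resolves
the tie to 1.0f, whereas the single-rounding version returns 0x1.000002p0f.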


Re: Special Memory Constraint [was Re: Indirect memory addresses vs. lra]

2019-08-19 Thread John Darrington
On Fri, Aug 16, 2019 at 10:50:13AM -0400, Vladimir Makarov wrote:
 
 
 No I meant something like that
 
 (define_special_memory_constraint "a" ...)
 (define_predicate "my_special_predicate" ...
   {
     if (lra_in_progress_p)
       return REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER
              && reg_renumber[REGNO (op)] < 0;
     return true if memory with sp addressing;
   })
 
 I think LRA spills pseudo-register and it will be memory addressed by sp
 at the end of LRA.

What I've done is this:

(define_predicate "my_special_predicate"
  (match_operand 0 "memory_operand")
{
  debug_rtx (op);
  gcc_assert (MEM_P (op));
  op = XEXP (op, 0);
  if (GET_CODE (op) == PLUS)
    op = XEXP (op, 0);

  if (lra_in_progress)
    {
      fprintf (stderr, "%s:%d\n", __FILE__, __LINE__);
      return (REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER
              && reg_renumber[REGNO (op)] < 0);
    }

  if (REG_P (op))
    {
      int regno = REGNO (op);
      return (regno == 10); // register is the stack pointer
    }

  return true;
})

(and many variations).  Unfortunately, any moderately complicated input
still results in a (mem (reg)) insn repeatedly entering the
lra_in_progress case and returning false, and eventually terminating with

"internal compiler error: maximum number of generated reload insns per insn achieved (90)"


Any other ideas?

J'
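
The decision logic of the predicate posted above can be modelled outside GCC
to see which base registers it accepts. This is a toy model only: the
constants TOY_FIRST_PSEUDO and TOY_SP_REGNO and the function
toy_special_predicate are hypothetical, the operand is reduced to the base
register of the address, and `renumber` stands in for reg_renumber[regno].

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for FIRST_PSEUDO_REGISTER and the sp number.  */
enum { TOY_FIRST_PSEUDO = 16, TOY_SP_REGNO = 10 };

/* Model of the predicate for an address whose base is register `regno`.
   `renumber` is the hard register assigned to a pseudo, or negative if
   the pseudo has none.  */
static bool toy_special_predicate(int regno, int renumber,
                                  bool lra_in_progress)
{
    assert(regno >= 0);
    if (lra_in_progress)
        /* Only pseudos without a hard register are accepted during LRA.  */
        return regno >= TOY_FIRST_PSEUDO && renumber < 0;
    /* Outside LRA only the stack pointer is accepted as a base.  */
    return regno == TOY_SP_REGNO;
}
```

In this model, any address whose base is a hard register, or a pseudo that has
been given a hard register, is rejected while LRA is in progress, which matches
the observation that the (mem (reg)) operand keeps returning false in that
branch.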