[RFC] GCC 8 Project proposal: Extensions supporting C Metaprogramming, pseudo-templates

2017-05-08 Thread Daniel Santos
I would like to make some changes in GCC 8, so I thought that a formal 
proposal and RFC would be the best path. I'm still relatively new to the 
GCC project.


I began experimenting with C metaprogramming techniques back in 2012, in 
order to implement more efficient generic libraries in C. The code is 
ANSI C compliant and can build anywhere, but exploits GCC attributes, 
checks and optimizations (when available) to assure optimal efficiency. 
The underlying mechanisms involve the exploitation of constant 
propagation and -findirect-inlining to cause an inline function to behave 
similarly to a C++ templatized function that is instantiated at the call 
site. The primary aim is to facilitate high-performance generic C 
libraries for software where C++ is not suitable, but the cost of 
run-time abstraction is unacceptable. A good example is the Linux 
kernel, where the source tree is littered with more than 100 hand-coded 
or boiler-plate (copy, paste and edit) search cores required to use the 
red-black tree library.


Here is a brief example of a simplified qsort algo:

/* A header file for a generic qsort library.  */

struct qsort_def {
  size_t size;
  size_t align;
  int (*less_r)(const void *a, const void *b, void *context);
};

inline __attribute__((always_inline, flatten)) void
qsort_template(const struct qsort_def *def, void *const pbase, size_t n,
   void *context)
{
  /* details omitted... */
}

/* An implementation file using qsort.  */

static inline int my_less_r (const void *a, const void *b, void *context)
{
  const struct my_struct *_a = a;
  const struct my_struct *_b = b;

  return _a->somefield < _b->somefield;
}

static const struct qsort_def my_qsort_def = {
  .size   = sizeof (struct my_struct),
  .align  = 16,
  .less_r = my_less_r,
};

void __attribute__((flatten)) my_sort (struct my_struct *o, size_t n)
{
  qsort_template (&my_qsort_def, o, n, NULL);
}


The purpose of the "my_sort" wrapper function is to contain the template 
expansion. Beginning with GCC 4.4, when qsort_template inline expansion 
occurs, the entire struct qsort_def is compiled away. The "my_less_r" 
function is inlined, and by using __builtin_assume_aligned(), all 
manipulation of data is performed with the best available instructions 
and unneeded memcpy alignment pro/epilogues are omitted. This results in 
code that's both faster and smaller and is analogous in form and 
performance to C++'s std::sort, except that we are using the first 
parameter (const struct qsort_def *def) in lieu of formalized template 
parameters.
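
To make the mechanism concrete, here is a minimal compilable sketch of the 
same pattern. It is not the qsort library described above: the names 
sort_def and isort_template are hypothetical, an insertion sort stands in 
for the omitted qsort body, the align member and __builtin_assume_aligned() 
are left out, and elements are assumed to be at most 64 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pseudo-template "parameters" (hypothetical, modelled on struct qsort_def). */
struct sort_def {
  size_t size;
  int (*less_r)(const void *a, const void *b, void *context);
};

/* Stand-in for qsort_template: a plain insertion sort.  When def points to
   a compile-time constant, the optimizer can fold def->size and inline
   def->less_r at the call site.  */
static inline __attribute__((always_inline)) void
isort_template (const struct sort_def *def, void *const pbase, size_t n,
                void *context)
{
  unsigned char *base = pbase;
  unsigned char tmp[64];   /* assumption: elements fit in 64 bytes */
  size_t i, j;

  assert (def->size <= sizeof tmp);
  for (i = 1; i < n; i++)
    {
      memcpy (tmp, base + i * def->size, def->size);
      for (j = i;
           j > 0 && def->less_r (tmp, base + (j - 1) * def->size, context);
           j--)
        memcpy (base + j * def->size, base + (j - 1) * def->size, def->size);
      memcpy (base + j * def->size, tmp, def->size);
    }
}

struct my_struct { int somefield; };

static inline int
my_less_r (const void *a, const void *b, void *context)
{
  const struct my_struct *_a = a;
  const struct my_struct *_b = b;

  (void) context;
  return _a->somefield < _b->somefield;
}

static const struct sort_def my_sort_def = {
  .size   = sizeof (struct my_struct),
  .less_r = my_less_r,
};

/* The wrapper contains the expansion, one "instantiation" per wrapper.  */
void
my_sort (struct my_struct *o, size_t n)
{
  isort_template (&my_sort_def, o, n, NULL);
}
```

Calling my_sort (v, n) on an array of struct my_struct sorts it by 
somefield. Whether the struct is actually compiled away depends on the 
compiler and optimization level, so inspecting the generated assembly is 
the only way to confirm it for a given build.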


To further the usefulness of such techniques, I propose the addition of 
a c-family attribute to declare a parameter, variable (and possibly 
other declarations) as "constprop" or some similar word. The purpose of 
the attribute is to:


1.) Emit a warning or error when the value is not optimized away, and
2.) Direct various optimization passes to prefer (or force) either 
cloning or inlining of a function with such a parameter.


This will enable the use of pseudo-templates and:

1.) Eliminate the need for __attribute__((always_inline, flatten)) and 
complicated ASSERT_CONST() macros,
2.) Eliminate the need for an __attribute__((flatten)) wrapper function,
3.) Reduce the need for the programmer to think about what the compiler 
is doing, and
4.) Allow gcc to decide whether inlining or cloning is better.

While not as powerful as C++ template metaprogramming (type programming), 
there are certainly many more possibilities that haven't yet been 
discovered. I would like to be able to put something in GCC 8. Below is 
my current progress.


diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index f2a88e147ba..5ec7b615e24 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -139,6 +139,7 @@ static tree handle_bnd_variable_size_attribute (tree *, tree, tree, int, bool *)
 static tree handle_bnd_legacy (tree *, tree, tree, int, bool *);
 static tree handle_bnd_instrument (tree *, tree, tree, int, bool *);
 static tree handle_fallthrough_attribute (tree *, tree, tree, int, bool *);
+static tree handle_constprop_attribute (tree *, tree , tree , int , bool *);
 
 /* Table of machine-independent attributes common to all C-like languages.
 
@@ -345,6 +346,8 @@ const struct attribute_spec c_common_attribute_table[] =

  handle_bnd_instrument, false },
   { "fallthrough", 0, 0, false, false, false,
  handle_fallthrough_attribute, false },
+  { "constprop", 0, 0, false, false, false,
+ handle_constprop_attribute, false },
   { NULL, 0, 0, false, false, false, NULL, false }
 };
 
@@ -3173,3 +3176,15 @@ handle_fallthrough_attribute (tree *, tree name, tree, int,

   *no_add_attrs = true;
   return NULL_TREE;
 }
+
+static tree
+handle_constprop_attribute (tree *node, tree name, tree ARG_UNUSED (args),
+   int flags, bo

Re: GCC target_clone support (functionality question)

2017-05-08 Thread Jeff Law

On 05/06/2017 12:44 AM, Richard Sandiford wrote:

Michael Meissner  writes:

This message is separated from the question about moving code, as it is a
question about the functionality of target_clone support.

Right now it looks like target_clone only generates the ifunc handler if there
is a call to the function in the object file.  It does not generate the ifunc
handler if there is no call.

For the default function, it generates the normal name.  This means that any
function that calls the function from a different object module will only get
the standard function.  From a library and user perspective, I think this is
wrong.  Instead the default function should be generated with a different name,
and the ifunc function should list the standard name.  Then you don't have to
change all of the other calls in the object file, the normal ifunc handling
will handle it.  It also means you can more easily put this function in a
library and automatically call the appropriate version.


Yeah, that sounds much more useful.  One thing on my wish list is to
support target_clone for Advanced SIMD vs. SVE on AArch64 (although
it's unlikely that'll happen in the GCC 8 timeframe).  I'd naively
assumed it would already work in the way you suggested.

Likewise.
jeff
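
For readers unfamiliar with the naming scheme proposed above, the following 
portable sketch emulates it by hand with a function pointer. It is only an 
illustration and not what GCC emits: cpu_has_wide_vectors, sum_default, 
sum_wide and resolve_sum are hypothetical names, and the CPU probe is a 
stub. The point is that the default version gets a distinct name while the 
standard name (sum) belongs to the dispatcher, so callers in other object 
files reach the selected version without any change.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU probe; a stub that always selects the default.  */
static int
cpu_has_wide_vectors (void)
{
  return 0;
}

/* Default version, deliberately NOT given the standard name.  */
static long
sum_default (const int *v, size_t n)
{
  long s = 0;
  size_t i;

  for (i = 0; i < n; i++)
    s += v[i];
  return s;
}

/* Stand-in for a target-specific clone: same result, unrolled by two.  */
static long
sum_wide (const int *v, size_t n)
{
  long s0 = 0, s1 = 0;
  size_t i;

  for (i = 0; i + 1 < n; i += 2)
    {
      s0 += v[i];
      s1 += v[i + 1];
    }
  if (i < n)
    s0 += v[i];
  return s0 + s1;
}

typedef long (*sum_fn) (const int *, size_t);

/* Plays the role of the ifunc resolver.  */
static sum_fn
resolve_sum (void)
{
  return cpu_has_wide_vectors () ? sum_wide : sum_default;
}

/* The standard name is the dispatcher, so external callers get the
   selected version automatically.  */
long
sum (const int *v, size_t n)
{
  static sum_fn impl;

  if (!impl)
    impl = resolve_sum ();
  return impl (v, n);
}
```

A real ifunc is resolved by the dynamic loader rather than on first call, 
so this emulation only mirrors the naming, not the mechanism.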


Re: Question about dump_printf/dump_printf_loc

2017-05-08 Thread Jeff Law

On 05/05/2017 04:37 PM, Steve Ellcey wrote:

I have a simple question about dump_printf and dump_printf_loc.  I notice
that most (all?) of the uses of these functions are of the form:

if (dump_enabled_p ())
dump_printf_loc (MSG_*, ..);

Since dump_enabled_p() is just checking to see if dump_file or alt_dump_file
is set and since dump_printf_loc has checks for these as well, is there
any reason why we shouldn't or couldn't just use:

dump_printf_loc (MSG_*, ..);

without the call to dump_enabled_p and have the dump function do nothing
when there is no dump file set?  I suppose the first version would have
some performance advantage since dump_enabled_p is an inlined function,
but is that enough of a reason to do it?  The second version seems like
it would look cleaner in the code where we are making these calls.

I doubt the performance difference is measurable.  More likely than not 
it's just habit.


I tend to prefer the second as it makes the code easier (IMHO) to read.

jeff


Re: Backporting Patches to GCC 7

2017-05-08 Thread Jonathan Wakely
On 5 May 2017 at 21:35, Palmer Dabbelt wrote:
> I just submitted two patches against trunk.  I'd like to also have them on the
> 7 branch, so when 7.2 comes out we'll have them.  These patches only touch the
> RISC-V backend, which I'm a maintainer of.  Is there a branch maintainer I'm
> supposed to have sign off on the patches or am I meant to just decide on my 
> own
> what I should commit?
>
> For reference, here's the patches
>
> 284b54c RISC-V: Add -mstrict-align option
> 70218e8 RISC-V: Unify indention in riscv.md

In general, backports that aren't fixing regressions or documentation
would need the release managers' approval. There's some leeway for target
maintainers of ports and other subsystems, for example I sometimes
make executive decisions about the C++ runtime libraries when the
backport only affects an isolated part of the library, or is clearly
safe and an obvious improvement. For bigger changes that aren't
regressions but I'd like to backport I still seek RM approval.

I would guess that for RISC-V which is new in 7.1, if you think the
backport is important and it doesn't affect other targets then it
should be OK.

Maybe one of the release managers can confirm that though.


Re: Feature Request for C

2017-05-08 Thread Florian Weimer via gcc

On 05/06/2017 07:09 PM, Taylor Holberton wrote:

Except instead of using c++ style mangling, it would simply just
prepend the name of the namespace to the symbols in the file.


Would this also apply to type names and struct tags?  How does it 
interfere with the name resolution in the body of inline functions?


Personally, I don't think we need to duplicate more things from C++ into 
C as GNU extensions.  If you want C++ features, just use a C++ subset.


Florian


Re: Question about loop induction variables

2017-05-08 Thread Richard Biener via gcc
On Sun, May 7, 2017 at 11:22 AM, Fredrik Hederstierna
 wrote:
> Hi,
>
> I have a question about loop induction variables, related to
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67213
>
> Consider a simple loop like
>
>   int ix;
>   for (ix = 0; ix < 6; ix++) {
> data[ix] = ix;
>   }
>
> In this case variable 'ix' is used as counting variable for array index,
> but also used as value for storage, so in some sense it's used in two 
> different "ways".
>
> Several target architectures might have support to auto-increment pointer 
> registers when storing or reading data,
> but if same variable is used as both array index to pointer and as data it 
> might be more complicated?
>
> In the example above, it seems like GCC loop analyzer (for completely 
> peeling) thinks 'ix' is an induction variable that can be folded away:
> from 'tree-ssa-loop-ivcanon.c':
>
>   Loop 1 iterates 5 times.
>   Loop 1 iterates at most 5 times.
>   Estimating sizes for loop 1
>BB: 3, after_exit: 0
> size:   0 _4 = (char) i_9;
>  Induction variable computation will be folded away.
> size:   1 data[i_9] = _4;
> size:   1 i_6 = i_9 + 1;   <- OK?
>  Induction variable computation will be folded away.
> size:   1 ivtmp_7 = ivtmp_1 - 1;
>  Induction variable computation will be folded away.
> size:   2 if (ivtmp_7 != 0)
>  Exit condition will be eliminated in peeled copies.
>BB: 4, after_exit: 1
>   size: 5-4, last_iteration: 5-2
> Loop size: 5
> Estimated size after unrolling: 5
>
> Then index 'ix' is considered to be an induction variable, but possibly it 
> cannot be simply folded since it's used in other ways as well?

Well, but all the listed stmts _can_ be folded away because the
starting value is zero and thus they "fold away" to constants.

> Though completely-peeling loop resulted in longer code:
>
>
>   int i;
>   char _4;
>   unsigned int ivtmp_7;
>   char _12;
>   unsigned int ivtmp_15;
>   char _19;
>   unsigned int ivtmp_22;
>   char _26;
>   unsigned int ivtmp_29;
>   char _33;
>   unsigned int ivtmp_36;
>   char _40;
>   unsigned int ivtmp_43;
>
>   :
>   _12 = 0;
>   data[0] = _12;
>   i_14 = 1;
>   ivtmp_15 = 5;
>   _19 = (char) i_14;
>   data[i_14] = _19;
>   i_21 = i_14 + 1;
>   ivtmp_22 = ivtmp_15 + 4294967295;
>   _26 = (char) i_21;
>   data[i_21] = _26;
>   i_28 = i_21 + 1;
>   ivtmp_29 = ivtmp_22 + 4294967295;
>   _33 = (char) i_28;
>   data[i_28] = _33;
>   i_35 = i_28 + 1;
>   ivtmp_36 = ivtmp_29 + 4294967295;
>   _40 = (char) i_35;
>   data[i_35] = _40;
>   i_42 = i_35 + 1;
>   ivtmp_43 = ivtmp_36 + 4294967295;
>   _4 = (char) i_42;
>   data[i_42] = _4;
>   i_6 = i_42 + 1;
>   ivtmp_7 = ivtmp_43 + 4294967295;
>   return;
>
>
> instead of original and shorter
>
>   int i;
>   unsigned int ivtmp_1;
>   char _4;
>   unsigned int ivtmp_7;
>
>   :
>
>   :
>   # i_9 = PHI 
>   # ivtmp_1 = PHI 
>   _4 = (char) i_9;
>   data[i_9] = _4;
>   i_6 = i_9 + 1;
>   ivtmp_7 = ivtmp_1 - 1;
>   if (ivtmp_7 != 0)
> goto ;
>   else
> goto ;
>
>   :
>   goto ;
>
>   :
>   return;

The cost metrics assume constant propagation happened which in the
above code didn't yet:

>   i_14 = 1;
>   _19 = (char) i_14;
...

So you are comparing (visually) apples and oranges.
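
To illustrate the point, here is a small sketch of the quoted loop next to 
the form the peeled GIMPLE takes once constant propagation has run, which 
is the state the size estimate assumes. The function names are made up for 
this example, and the second function is written by hand rather than taken 
from a compiler dump.

```c
#include <assert.h>
#include <string.h>

static char data[6];

/* The loop from the question.  */
void
fill_loop (void)
{
  int ix;

  for (ix = 0; ix < 6; ix++)
    data[ix] = (char) ix;
}

/* Fully peeled, after constant propagation: i_14 = 1 feeds
   _19 = (char) i_14 and data[i_14] = _19, which fold to data[1] = 1,
   and so on.  The ivtmp counters become dead and disappear.  */
void
fill_peeled (void)
{
  data[0] = 0;
  data[1] = 1;
  data[2] = 2;
  data[3] = 3;
  data[4] = 4;
  data[5] = 5;
}

/* Returns nonzero when both forms store the same bytes.  */
int
forms_agree (void)
{
  char a[6], b[6];

  fill_loop ();
  memcpy (a, data, sizeof a);
  memset (data, 0x7f, sizeof data);
  fill_peeled ();
  memcpy (b, data, sizeof b);
  return memcmp (a, b, sizeof a) == 0;
}
```

The six stores in fill_peeled correspond one to one to the mov/strb pairs 
and the movb instructions in the machine code quoted below.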

>
> Example for ARM target machine code it became longer:
>
> 001c :
>   1c:   e59f3030    ldr   r3, [pc, #48]   ; 54 
>   20:   e3a02000    mov   r2, #0
>   24:   e5c32000    strb  r2, [r3]
>   28:   e3a02001    mov   r2, #1
>   2c:   e5c32001    strb  r2, [r3, #1]
>   30:   e3a02002    mov   r2, #2
>   34:   e5c32002    strb  r2, [r3, #2]
>   38:   e3a02003    mov   r2, #3
>   3c:   e5c32003    strb  r2, [r3, #3]
>   40:   e3a02004    mov   r2, #4
>   44:   e5c32004    strb  r2, [r3, #4]
>   48:   e3a02005    mov   r2, #5
>   4c:   e5c32005    strb  r2, [r3, #5]
>   50:   e12fff1e    bx    lr
>   54:   .word   0x
>
> compared to if complete-loop-peeling was not done
>
> 001c :
>   1c:   e59f2014    ldr   r2, [pc, #20]   ; 38 
>   20:   e3a03000    mov   r3, #0
>   24:   e7c33002    strb  r3, [r3, r2]
>   28:   e2833001    add   r3, r3, #1
>   2c:   e3530006    cmp   r3, #6
>   30:   1afb        bne   24 
>   34:   e12fff1e    bx    lr
>   38:   .word   0x
>
> Producing 15 instead of 8 words, giving ~100% larger code size.
>
> And same for x86 arch:
>
>f:   c6 05 00 00 00 00 00movb   $0x0,0x0(%rip)# 16
> 
>   16:   c6 05 00 00 00 00 01movb   $0x1,0x0(%rip)# 1d
> 
>   1d:   c6 05 00 00 00 00 02movb   $0x2,0x0(%rip)# 24
> 
>   24:   c6 05 00 00 00 00 03movb   $0x3,0x0(%rip)# 2b
> 
>   2b:   c6 05 00 00 00 00 04movb   $0x4,0x0(%rip)# 32
> 
>   32:   c6 05 00 00 00 00 05movb   $0x5,0x0(%rip)# 39
> 
>   39:   c3  retq
>
>
> Without com

Re: Question about dump_printf/dump_printf_loc

2017-05-08 Thread Richard Biener via gcc
On Sat, May 6, 2017 at 12:37 AM, Steve Ellcey  wrote:
> I have a simple question about dump_printf and dump_printf_loc.  I notice
> that most (all?) of the uses of these functions are of the form:
>
> if (dump_enabled_p ())
> dump_printf_loc (MSG_*, ..);
>
> Since dump_enabled_p() is just checking to see if dump_file or alt_dump_file
> is set and since dump_printf_loc has checks for these as well, is there
> any reason why we shouldn't or couldn't just use:
>
> dump_printf_loc (MSG_*, ..);
>
> without the call to dump_enabled_p and have the dump function do nothing
> when there is no dump file set?  I suppose the first version would have
> some performance advantage since dump_enabled_p is an inlined function,
> but is that enough of a reason to do it?  The second version seems like
> it would look cleaner in the code where we are making these calls.

The purpose of dump_enabled_p () is to save compile-time for the common case,
esp. when guarding multiple dump_* calls.  It also helps in the single-call case.
You could try to improve things by having inline wrappers for all dump_* cases that
inline a dump_enabled_p () call but that would be somewhat gross.

Richard.

> Steve Ellcey
> sell...@cavium.com
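
One concrete way the guard can save work, even for a single call, is that C 
evaluates all arguments before the callee gets a chance to return early. 
The sketch below shows this with stand-in functions. The mock_* names and 
expensive_summary are hypothetical and do not match the signatures of the 
real dump API in GCC.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

static FILE *mock_dump_file;   /* NULL means dumping is disabled */
static int expensive_calls;    /* counts evaluations of the argument */

/* Stand-in for dump_enabled_p: a cheap inline test.  */
static inline int
mock_dump_enabled_p (void)
{
  return mock_dump_file != NULL;
}

/* Stand-in for dump_printf: checks again internally, like the real one.  */
static void
mock_dump_printf (const char *fmt, ...)
{
  va_list ap;

  if (!mock_dump_file)
    return;
  va_start (ap, fmt);
  vfprintf (mock_dump_file, fmt, ap);
  va_end (ap);
}

/* Pretend this walks a large data structure.  */
static int
expensive_summary (void)
{
  expensive_calls++;
  return 42;
}

/* Unguarded: the argument is computed even though nothing is printed.  */
void
report_unguarded (void)
{
  mock_dump_printf ("summary: %d\n", expensive_summary ());
}

/* Guarded: with dumping disabled, neither the call nor its argument
   evaluation happens.  */
void
report_guarded (void)
{
  if (mock_dump_enabled_p ())
    mock_dump_printf ("summary: %d\n", expensive_summary ());
}
```

With mock_dump_file left NULL, report_unguarded () still runs 
expensive_summary () once, while report_guarded () does not run it at all. 
When the arguments are trivial, the remaining difference is one 
out-of-line call.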