Re: Partial PRE optimization causing slowdown

2013-12-13 Thread Richard Biener
On Thu, Dec 12, 2013 at 8:19 PM, Steve Ellcey  wrote:
> I have a question about the partial pre (-ftree-partial-pre) optimization
> that was added in GCC 4.8.  I have noticed that this optimization is slowing
> down the bitmnp01 benchmark in the EEMBC1.1 suite on MIPS.  I see this with
> the 4.8 GCC and with ToT GCC.  Comparing "-O3 -fno-tree-partial-pre" vs.
> just "-O3" on the ToT GCC with MIPS, I see almost a 50% slowdown in the
> benchmark due to the partial pre optimization.

Note that partial PRE wasn't added in 4.8 but much earlier.  What 4.8 got is an
option to disable it, which means you now have a workaround at hand.

Richard.

> I was wondering if anyone else has seen a slowdown like this on other
> platforms or in other benchmarks.  I haven't submitted a bugzilla report
> because I don't have a test case that I can include (EEMBC is licensed).
>
> Steve Ellcey
> sell...@mips.com
>


arm ttype encoding

2013-12-13 Thread Tristan Gingold
Hi,

we are currently working on the use of the arm ehabi for Ada exceptions,
and we aren't sure about which encoding has to be used for ttype.

The patch http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00765.html
explains that `Older ARM EABI toolchains set this value
[ttype_encoding] incorrectly`, and adds the _GLIBCXX_OVERRIDE_TTYPE_ENCODING
workaround.

But that patch doesn't change it: in gcc/config/arm/arm.h,
ASM_PREFERRED_EH_DATA_FORMAT is defined only if ARM_TARGET2_DWARF_FORMAT isn't.

We are not sure whether this is intended or an oversight.

The position of the #endif is important:

--- gcc/config/arm/arm.h(revision 178807)
+++ gcc/config/arm/arm.h(working copy)
@@ -825,6 +825,16 @@ extern int arm_arch_thumb_hwdiv;
 #define ARM_EH_STACKADJ_REGNUM 2
 #define EH_RETURN_STACKADJ_RTX gen_rtx_REG (SImode, ARM_EH_STACKADJ_REGNUM)
 
+#ifndef ARM_TARGET2_DWARF_FORMAT
+#define ARM_TARGET2_DWARF_FORMAT DW_EH_PE_pcrel
+
+/* ttype entries (the only interesting data references used)
+   use TARGET2 relocations.  */
+#define ASM_PREFERRED_EH_DATA_FORMAT(code, data) \
+  (((code) == 0 && (data) == 1 && ARM_UNWIND_INFO) ? ARM_TARGET2_DWARF_FORMAT \
+  : DW_EH_PE_absptr)
+#endif
+
 /* The native (Norcroft) Pascal compiler for the ARM passes the static chain
as an invisible last argument (possible since varargs don't exist in
Pascal), so the following is not true.  */

Can this be clarified?
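
To make the question concrete, here is a minimal standalone sketch of how
the #ifndef guard in the diff above behaves. It is not the real arm.h: the
ARM_UNWIND_INFO value and the preferred_eh_format() helper are hypothetical
stand-ins added for illustration, and the two DW_EH_PE_* constants are
copied in by value so that the file compiles on its own.

```c
/* DWARF EH pointer encodings, copied by value for this sketch.  */
#define DW_EH_PE_absptr 0x00
#define DW_EH_PE_pcrel  0x10

/* Hypothetical stand-in: pretend ARM unwind tables are in use.  */
#define ARM_UNWIND_INFO 1

/* Uncomment to mimic a target header that pre-defines the format.
   The whole block below is then skipped, including the definition
   of ASM_PREFERRED_EH_DATA_FORMAT.  */
/* #define ARM_TARGET2_DWARF_FORMAT (DW_EH_PE_pcrel | 0x80) */

#ifndef ARM_TARGET2_DWARF_FORMAT
#define ARM_TARGET2_DWARF_FORMAT DW_EH_PE_pcrel

/* ttype entries (the only interesting data references used)
   use TARGET2 relocations.  */
#define ASM_PREFERRED_EH_DATA_FORMAT(code, data) \
  (((code) == 0 && (data) == 1 && ARM_UNWIND_INFO) ? ARM_TARGET2_DWARF_FORMAT \
   : DW_EH_PE_absptr)
#endif

/* Hypothetical helper: returns the selected encoding, or -1 when the
   guard above left ASM_PREFERRED_EH_DATA_FORMAT undefined.  */
static int
preferred_eh_format (int code, int data)
{
#ifdef ASM_PREFERRED_EH_DATA_FORMAT
  return ASM_PREFERRED_EH_DATA_FORMAT (code, data);
#else
  (void) code;
  (void) data;
  return -1;
#endif
}
```

With the guard left as written, preferred_eh_format (0, 1) selects
DW_EH_PE_pcrel and every other combination falls back to DW_EH_PE_absptr.
If ARM_TARGET2_DWARF_FORMAT is pre-defined, the helper reports that the
macro was never defined, which is the situation we are asking about.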

Thanks,
Tristan.



Re: Partial PRE optimization causing slowdown

2013-12-13 Thread Steve Ellcey
On Fri, 2013-12-13 at 11:26 +0100, Richard Biener wrote:
> On Thu, Dec 12, 2013 at 8:19 PM, Steve Ellcey  wrote:
> > I have a question about the partial pre (-ftree-partial-pre) optimization
> > that was added in GCC 4.8.  I have noticed that this optimization is slowing
> > down the bitmnp01 benchmark in the EEMBC1.1 suite on MIPS.  I see this with
> > the 4.8 GCC and with ToT GCC.  Comparing "-O3 -fno-tree-partial-pre" vs.
> > just "-O3" on the ToT GCC with MIPS, I see almost a 50% slowdown in the
> > benchmark due to the partial pre optimization.
> 
> Note that partial PRE wasn't added in 4.8 but much earlier.  What 4.8 got is an
> option to disable it, which means you now have a workaround at hand.
> 
> Richard.

That is interesting because we saw this slowdown on MIPS between 4.7 and
4.8 and someone tracked it down to this patch:


2012-11-16  Jan Hubicka  

PR tree-optimization/54717
* tree-ssa-pre.c (do_partial_partial_insertion): Consider also edges
with ANTIC_IN.

Steve Ellcey
sell...@mips.com




The Linux binutils 2.24.51.0.2 is released

2013-12-13 Thread H.J. Lu
It is also available as hjl/linux/release/2.24.51.0.2 tag at
 
https://sourceware.org/git/?p=binutils-gdb.git;a=summary


H.J.
---
This is the beta release of binutils 2.24.51.0.2 for Linux, which is
based on binutils 2013 1213 master branch on sourceware.org plus
various changes. It is purely for Linux.

All relevant patches in the patches directory have been applied to the
source tree. You can take a look at patches/README to see what has been
applied and in what order.

Starting from the 2.23.52.0.2 release, when creating executables, the BFD
linker will issue an error for an undefined weak reference that is
defined in a shared library from DT_NEEDED.  Previously the BFD linker
would silently include the shared library from DT_NEEDED.

Starting from the 2.21.51.0.3 release, you must remove .ctors/.dtors
section sentinels when building glibc or other C run-time libraries.
Otherwise, you will run into:

http://sourceware.org/bugzilla/show_bug.cgi?id=12343

Starting from the 2.21.51.0.2 release, the BFD linker has working LTO
plugin support. It can be used with GCC 4.5 and above. For GCC 4.5, you
need to configure GCC with --enable-gold to enable LTO plugin support.

Starting from the 2.21.51.0.2 release, binutils fully supports compressed
debug sections.  However, compressed debug sections aren't turned on by
default in the assembler. I am planning to turn them on for the x86
assembler in a future release, which may lead to Linux kernel bug messages like

WARNING: lib/ts_kmp.o (.zdebug_aranges): unexpected non-allocatable section.

But the resulting kernel works fine.

Starting from the 2.20.51.0.4 release, no diffs against the previous
release will be provided.

You can enable both gold and bfd ld with --enable-gold=both.  Gold will
be installed as ld.gold and bfd ld will be installed as ld.bfd.  By
default, ld.bfd will be installed as ld.  You can use the configure
option, --enable-gold=both/gold to choose gold as the default linker,
ld.  The IA-32 and x86_64 binary tarballs are configured with
--enable-gold=both/ld --enable-plugins --enable-threads.

Starting from the 2.18.50.0.4 release, the x86 assembler no longer
accepts

fnstsw %eax

fnstsw stores 16 bits into %ax and leaves the upper 16 bits of %eax
unchanged. Please use

fnstsw %ax
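
For code that reaches this instruction through GCC-style inline assembly,
a minimal sketch of the accepted form follows. The read_x87_status() helper
is a hypothetical name used for illustration only. It assumes a GCC-compatible
compiler and returns 0 on non-x86 targets so that the file still compiles there.

```c
#include <stdint.h>

/* Hypothetical helper: read the x87 status word.  The "=a" constraint
   with a 16-bit variable makes the operand print as %ax, which is the
   only register form the assembler accepts for fnstsw.  */
static uint16_t
read_x87_status (void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
  uint16_t sw;
  __asm__ volatile ("fnstsw %0" : "=a" (sw));
  return sw;
#else
  return 0;  /* No x87 unit assumed on other targets.  */
#endif
}
```

Declaring the variable as a 32-bit type instead would make the operand
print as %eax, which is the form the assembler now rejects.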

Starting from the 2.17.50.0.4 release, the default output section LMA
(load memory address) has changed for allocatable sections from being
equal to VMA (virtual memory address), to keeping the difference between
LMA and VMA the same as the previous output section in the same region.

For

.data.init_task : { *(.data.init_task) }

LMA of .data.init_task section is equal to its VMA with the old linker.
With the new linker, it depends on the previous output section. You
can use

.data.init_task : AT (ADDR(.data.init_task)) { *(.data.init_task) }

to ensure that LMA of .data.init_task section is always equal to its
VMA. The linker script in the older 2.6 x86-64 kernel depends on the
old behavior.  You can add AT (ADDR(section)) to force LMA of
.data.init_task section equal to its VMA. It will work with both old
and new linkers. The x86-64 kernel linker script in kernel 2.6.13 and
above is OK.

The new x86_64 assembler no longer accepts

monitor %eax,%ecx,%edx

You should use

monitor %rax,%ecx,%edx

or

monitor

which works with both old and new x86_64 assemblers. They should
generate the same opcode.

The new i386/x86_64 assemblers no longer accept instructions for moving
between a segment register and a 32bit memory location, i.e.,

movl (%eax),%ds
movl %ds,(%eax)

To generate instructions for moving between a segment register and a
16bit memory location without the 16bit operand size prefix, 0x66,

mov (%eax),%ds
mov %ds,(%eax)

should be used. It will work with both new and old assemblers. The
assembler starting from 2.16.90.0.1 will also support

movw (%eax),%ds
movw %ds,(%eax)

without the 0x66 prefix. Patches for 2.4 and 2.6 Linux kernels are
available at

http://www.kernel.org/pub/linux/devel/binutils/linux-2.4-seg-4.patch
http://www.kernel.org/pub/linux/devel/binutils/linux-2.6-seg-5.patch

The ia64 assembler now defaults to tuning for Itanium 2 processors.
To build a kernel for Itanium 1 processors, you will need to add

ifeq ($(CONFIG_ITANIUM),y)
CFLAGS += -Wa,-mtune=itanium1
AFLAGS += -Wa,-mtune=itanium1
endif

to arch/ia64/Makefile in your kernel source tree.

Please report any bugs related to binutils 2.24.51.0.2 to
hjl.to...@gmail.com

and

http://www.sourceware.org/bugzilla/

Changes from binutils 2.24.51.0.1:

1. Update from binutils 2013 1213.
2. Fix ld and objcopy to set the SHF_INFO_LINK bit for SHT_REL/SHT_RELA
sections.  PR 16317.
3. Fix ld and objcopy to properly generate PT_GNU_RELRO segment. PRs
14207/16322/16323.
4. Fix objcopy to copy EI_OSABI field.  PR 16318.
5. Change ld to set e_type in ELF header to ET_EXEC for -pie
-Ttext-segment=.
6. Fix a ld bu

Re: replace do-while macros with static inline functions

2013-12-13 Thread Trevor Saunders
On Wed, Dec 11, 2013 at 08:33:03PM +0530, Prathamesh Kulkarni wrote:
> I was wondering if it was a good idea to replace do-while macros with
> static inline functions returning void, where appropriate ?
> By "where appropriate" I mean:
> a) call to macro contains no side-effects
> b) macro does not modify the arguments.
> c) macro does not use any preprocessor operators (like ##)
> d) macro does not get undefined or is conditionally defined.
> e) macro is not type independent (use inline template for these?)
> f) Any other case ?

In general I'm in favor of replacing macros with functions / constants /
templates etc.

> Example:
> Consider C_EXPR_APPEND macro defined in c-tree.h:
> 
> /* Append a new c_expr_t element to V.  */
> #define C_EXPR_APPEND(V, ELEM) \
>   do { \
> c_expr_t __elem = (ELEM); \
> vec_safe_push (V, __elem); \
>   } while (0)

It's not my code, but that macro looks like a totally useless
abstraction, why not just inline the vec_safe_push() call?

Trev

> 
> It is called at two places in c-parser.c:
> 0 c-parser.c  6140 C_EXPR_APPEND (cexpr_list, expr);
> 1 c-parser.c  6145 C_EXPR_APPEND (cexpr_list, expr);
> 
> Shall be replaced by:
> 
> static inline void
> C_EXPR_APPEND( vec * V, c_expr_t ELEM)
> {
> vec_safe_push(V, ELEM);
> }
> 
> I will volunteer to do it, if it's accepted.
> 
> Thanks and Regards,
> Prathamesh
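
For illustration, the two forms discussed in this thread can be put side by
side in a small self-contained sketch. It does not use GCC's vec or c_expr_t:
struct int_vec, int_vec_push(), INT_VEC_APPEND and int_vec_append() are
hypothetical stand-ins, chosen only so that the example compiles on its own.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a growable vector of ints.  */
struct int_vec
{
  int *data;
  size_t len;
  size_t cap;
};

/* Append ELEM to V, doubling the capacity when the vector is full.  */
static void
int_vec_push (struct int_vec *v, int elem)
{
  if (v->len == v->cap)
    {
      size_t new_cap = v->cap ? v->cap * 2 : 4;
      int *p = realloc (v->data, new_cap * sizeof *p);
      if (!p)
        abort ();
      v->data = p;
      v->cap = new_cap;
    }
  v->data[v->len++] = elem;
}

/* Macro form, with the same shape as C_EXPR_APPEND: the do-while (0)
   wrapper makes the expansion behave as a single statement.  */
#define INT_VEC_APPEND(V, ELEM) \
  do { \
    int elem_ = (ELEM); \
    int_vec_push ((V), elem_); \
  } while (0)

/* Function form: the same effect, but the compiler checks the argument
   types at the call site and a debugger sees a real function.  */
static inline void
int_vec_append (struct int_vec *v, int elem)
{
  int_vec_push (v, elem);
}
```

Both forms are thin wrappers around int_vec_push(), so in this sketch the
wrapper could also be dropped entirely and the push function called
directly, as suggested above for C_EXPR_APPEND.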


Re: replace do-while macros with static inline functions

2013-12-13 Thread Diego Novillo
Bah. Forgot to remove html.

On Fri, Dec 13, 2013 at 2:47 PM, Diego Novillo  wrote:
>
>
>
> On Wed, Dec 11, 2013 at 10:03 AM, Prathamesh Kulkarni
>  wrote:
>>
>> I was wondering if it was a good idea to replace do-while macros with
>> static inline functions returning void, where appropriate ?
>> By "where appropriate" I mean:
>> a) call to macro contains no side-effects
>> b) macro does not modify the arguments.
>> c) macro does not use any preprocessor operators (like ##)
>> d) macro does not get undefined or is conditionally defined.
>> e) macro is not type independent (use inline template for these?)
>> f) Any other case ?
>>
>> Example:
>> Consider C_EXPR_APPEND macro defined in c-tree.h:
>>
>> /* Append a new c_expr_t element to V.  */
>> #define C_EXPR_APPEND(V, ELEM) \
>>   do { \
>> c_expr_t __elem = (ELEM); \
>> vec_safe_push (V, __elem); \
>>   } while (0)
>>
>
> Yes, it's a good idea.  One thing you will likely need to do, however, is to
> add some of the replaced functions into gdb's list of ignored inline
> functions. Some folks are used to gdb stepping over macro calls when 's' is
> used.
>
> Not all macros need to be turned into functions. Some can be completely
> removed (Trevor mentions an example in his reply).
>
>
> Diego.


Re: replace do-while macros with static inline functions

2013-12-13 Thread Ondřej Bílka
On Fri, Dec 13, 2013 at 02:42:23PM -0500, Trevor Saunders wrote:
> On Wed, Dec 11, 2013 at 08:33:03PM +0530, Prathamesh Kulkarni wrote:
> > I was wondering if it was a good idea to replace do-while macros with
> > static inline functions returning void, where appropriate ?
> > By "where appropriate" I mean:
> > a) call to macro contains no side-effects
> > b) macro does not modify the arguments.
> > c) macro does not use any preprocessor operators (like ##)
> > d) macro does not get undefined or is conditionally defined.
> > e) macro is not type independent (use inline template for these?)
> > f) Any other case ?
> 
> In general I'm in favor of replacing macros with functions / constants /
> templates etc.
> 
> > Example:
> > Consider C_EXPR_APPEND macro defined in c-tree.h:
> > 
> > /* Append a new c_expr_t element to V.  */
> > #define C_EXPR_APPEND(V, ELEM) \
> >   do { \
> > c_expr_t __elem = (ELEM); \
> > vec_safe_push (V, __elem); \
> >   } while (0)
> 
> It's not my code, but that macro looks like a totally useless
> abstraction, why not just inline the vec_safe_push() call?
> 
Anyway, if you turn macros into inline functions you typically need to use
always_inline.

Quite often gcc makes the mistake of not inlining these, because the
function body looks much larger than its actual inline expansion. Forcing
the inlining is understandable, since the reason for the macro could have
been to avoid function call overhead.
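
A minimal sketch of what that looks like with the GCC function attribute
follows. FORCE_INLINE and clamp_to_byte() are hypothetical names used for
illustration, and the fallback branch assumes a compiler without
GNU-style attributes.

```c
/* always_inline is a GCC extension (also accepted by Clang); fall back
   to a plain static inline function elsewhere.  */
#if defined(__GNUC__)
#define FORCE_INLINE static inline __attribute__ ((always_inline))
#else
#define FORCE_INLINE static inline
#endif

/* A small helper of the kind that might previously have been a macro.
   With always_inline, GCC inlines it regardless of its size estimate,
   even when optimization is disabled.  */
FORCE_INLINE int
clamp_to_byte (int x)
{
  return x < 0 ? 0 : (x > 255 ? 255 : x);
}
```

Unlike a macro, the function evaluates its argument exactly once, so a
call such as clamp_to_byte (i++) increments i only once.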


Re: replace do-while macros with static inline functions

2013-12-13 Thread Prathamesh Kulkarni
On Sat, Dec 14, 2013 at 1:17 AM, Diego Novillo  wrote:
>
>
>
> On Wed, Dec 11, 2013 at 10:03 AM, Prathamesh Kulkarni
>  wrote:
>>
>> I was wondering if it was a good idea to replace do-while macros with
>> static inline functions returning void, where appropriate ?
>> By "where appropriate" I mean:
>> a) call to macro contains no side-effects
>> b) macro does not modify the arguments.
>> c) macro does not use any preprocessor operators (like ##)
>> d) macro does not get undefined or is conditionally defined.
>> e) macro is not type independent (use inline template for these?)
>> f) Any other case ?
>>
>> Example:
>> Consider C_EXPR_APPEND macro defined in c-tree.h:
>>
>> /* Append a new c_expr_t element to V.  */
>> #define C_EXPR_APPEND(V, ELEM) \
>>   do { \
>> c_expr_t __elem = (ELEM); \
>> vec_safe_push (V, __elem); \
>>   } while (0)
>>
>
> Yes, it's a good idea.  One thing you will likely need to do, however, is to
> add some of the replaced functions into gdb's list of ignored inline
> functions. Some folks are used to gdb stepping over macro calls when 's' is
> used.
>
> Not all macros need to be turned into functions. Some can be completely
> removed (Trevor mentions an example in his reply).

Thanks. I shall start working on it.
>
>
> Diego.