[PATCH, ARM] Fix redefinition of cpp macros with #pragma GCC pop,reset

2016-02-17 Thread Christian Bruel
target_option_current_node, used in c-pragma.c to check whether a state
should be popped or reset to the previous value, was not set when
switching state with #pragma GCC target (I missed this, since it is
already set for pop and reset). So in some cases the state might not be
reset correctly.

This patch sets it on the #pragma GCC target paths and updates the
comments to clarify this point.

As a benefit, we now use this cached value instead of calling
build_target_option_node (&global_options), which should speed up this
path a little when processing arm_neon.h.

One effect of the bug was that some predicate tests (e.g. arm_neonv2_ok)
in the testsuite were returning the wrong value, thus marking some tests
as UNRESOLVED instead of PASS. See the reduced test case attached in the
patch.

Regtested, with a few new PASSes for -mfpu=neon-fp-armv8.
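
To see why the cached node matters, the push/target/pop sequence can be modelled with a toy state machine. This is only a sketch: the toy_* names, the integer node ids and the single saved slot are illustrative assumptions, not GCC's data structures. The one behaviour taken from the description above is that pop compares the saved node against the cached current node before restoring anything.

```c
#include <stdbool.h>

/* Toy stand-ins (assumptions): an option node is an int id, and the
   preprocessor macro state is represented by the id of the node whose
   macros are currently defined.  */
struct toy_state
{
  int current_node;   /* models target_option_current_node */
  int active_macros;  /* models the cpp macro state */
  int saved_node;     /* models the node saved by push_options */
};

/* Models #pragma GCC push_options.  */
static void
toy_push (struct toy_state *s)
{
  s->saved_node = s->current_node;
}

/* Models #pragma GCC target.  UPDATE_CACHE selects the patched behaviour
   (true) or the old behaviour (false).  */
static void
toy_target (struct toy_state *s, int node, bool update_cache)
{
  s->active_macros = node;
  if (update_cache)
    s->current_node = node;
}

/* Models #pragma GCC pop_options: the macros are only restored when the
   saved node differs from the cached current node.  */
static void
toy_pop (struct toy_state *s)
{
  if (s->saved_node != s->current_node)
    {
      s->active_macros = s->saved_node;
      s->current_node = s->saved_node;
    }
}

/* Run push / target / pop starting from node 1 and return the macro
   state left behind.  */
static int
toy_run (bool update_cache)
{
  struct toy_state s = { 1, 1, 0 };
  toy_push (&s);
  toy_target (&s, 2, update_cache);
  toy_pop (&s);
  return s.active_macros;
}
```

In this model, toy_run (false) leaves the macros of node 2 active after the pop, which mirrors the stale __ARM_FEATURE_FMA definition the new test checks for, while toy_run (true) restores node 1.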








2016-02-17  Christian Bruel  <christian.br...@st.com>

	* config/arm/arm.c (arm_option_override): Initialize
	target_option_current_node.
	* config/arm/arm-c.c (arm_pragma_target_parse): Replace
	build_target_option_node call by target_option_current_node.
	Set target_option_current_node.	Fix comments.
	
2016-02-17  Christian Bruel  <christian.br...@st.com>

	* gcc.target/arm/pragma_cpp_fma.c: New test.

Index: gcc/config/arm/arm-c.c
===
--- gcc/config/arm/arm-c.c	(revision 233489)
+++ gcc/config/arm/arm-c.c	(working copy)
@@ -199,7 +199,7 @@ arm_cpu_cpp_builtins (struct cpp_reader * pfile)
 static bool
 arm_pragma_target_parse (tree args, tree pop_target)
 {
-  tree prev_tree = build_target_option_node (&global_options);
+  tree prev_tree = target_option_current_node;
   tree cur_tree;
   struct cl_target_option *prev_opt;
   struct cl_target_option *cur_opt;
@@ -220,9 +220,14 @@ arm_pragma_target_parse (tree args, tree pop_targe
 TREE_TARGET_OPTION (prev_tree));
 	  return false;
 	}
+
+  /* handle_pragma_pop_options and handle_pragma_reset_options will set
+   target_option_current_node, but not handle_pragma_target.  */
+  target_option_current_node = cur_tree;
 }
 
-  /* Figure out the previous mode.  */
+  /* Update macros if target_node changes. The global state will be restored
+ by arm_set_current_function.  */
   prev_opt  = TREE_TARGET_OPTION (prev_tree);
   cur_opt   = TREE_TARGET_OPTION (cur_tree);
 
Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c	(revision 233489)
+++ gcc/config/arm/arm.c	(working copy)
@@ -3446,7 +3446,8 @@ arm_option_override (void)
 
   /* Save the initial options in case the user does function specific
  options or #pragma target.  */
-  target_option_default_node = build_target_option_node (&global_options);
+  target_option_default_node = target_option_current_node
+    = build_target_option_node (&global_options);
 
   /* Init initial mode for testing.  */
   thumb_flipper = TARGET_THUMB;
Index: gcc/testsuite/gcc.target/arm/pragma_cpp_fma.c
===
--- gcc/testsuite/gcc.target/arm/pragma_cpp_fma.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/pragma_cpp_fma.c	(working copy)
@@ -0,0 +1,36 @@
+/* Test that FMA macro is correctly undefined.  */
+/* { dg-do compile } */
+/* { dg-skip-if "Default no fma" { *-*-* } { "-mfpu=*vfpv4*" "-mfpu=*armv8"} } */
+/* { dg-require-effective-target arm_fp_ok } */
+/* { dg-add-options arm_fp } */
+
+#pragma GCC push_options
+#pragma GCC target ("fpu=crypto-neon-fp-armv8")
+
+#ifndef __ARM_FEATURE_FMA
+#error "__ARM_FEATURE_FMA is not defined but should be"
+#endif
+
+#ifndef __ARM_FEATURE_CRYPTO
+#error "__ARM_FEATURE_CRYPTO is not defined but should be"
+#endif
+
+#if __ARM_NEON_FP != 6
+#error "__ARM_NEON_FP"
+#endif
+
+#if __ARM_FP != 14
+#error "__ARM_FP"
+#endif
+
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("fpu=neon-vfpv4")
+#pragma GCC pop_options
+
+#ifdef __ARM_FEATURE_FMA
+#error "__ARM_FEATURE_FMA is defined but should not be"
+#endif
+
+


Re: [PATCH, ARM] attribute target (thumb,arm) [3/6] respin (4th)

2015-05-11 Thread Christian Bruel




Re: [PATCH, ARM] attribute target (thumb,arm) [1/6] respin (4th)

2015-05-11 Thread Christian Bruel



> OK with those changes.
>
> Ramana

Thanks, done.

Following up on the thumb_code cleanup, here is a missing chunk for the
vxworks config.

arm-vxworks build checked. OK for trunk?

Thanks,

Christian
 2015-05-11  Christian Bruel  christian.br...@st.com
 
	* config/arm/arm-protos.h (thumb_code, thumb1_code): Remove.
	* config/arm/vxworks.h (thumb_code): Replace with TARGET_THUMB.

Index: gcc/config/arm/arm-protos.h
===
--- gcc/config/arm/arm-protos.h	(revision 222997)
+++ gcc/config/arm/arm-protos.h	(working copy)
@@ -462,12 +462,6 @@
 /* Nonzero if tuning for Cortex-A9.  */
 extern int arm_tune_cortex_a9;
 
-/* Nonzero if generating Thumb instructions.  */
-extern int thumb_code;
-
-/* Nonzero if generating Thumb-1 instructions.  */
-extern int thumb1_code;
-
 /* Nonzero if we should define __THUMB_INTERWORK__ in the
preprocessor.
XXX This is a bit of a hack, it's intended to help work around
Index: gcc/config/arm/vxworks.h
===
--- gcc/config/arm/vxworks.h	(revision 222997)
+++ gcc/config/arm/vxworks.h	(working copy)
@@ -40,7 +40,7 @@
 	  builtin_define ("CPU=ARMARCH5");		\
 	else if (arm_arch4)				\
 	  {						\
-	    if (thumb_code)				\
+	    if (TARGET_THUMB)				\
 	      builtin_define ("CPU=ARMARCH4_T");	\
 	    else					\
 	      builtin_define ("CPU=ARMARCH4");		\


Re: [PATCH, ARM] committed: attribute target (thumb,arm) [1/6] respin (4th)

2015-05-11 Thread Christian Bruel

Here is the p1 patch, as committed at revision 222995.

On 05/11/2015 11:56 AM, Ramana Radhakrishnan wrote:

> On Mon, May 11, 2015 at 10:13 AM, Christian Bruel
> christian.br...@st.com wrote:
>>> OK with those changes.
>>>
>>> Ramana
>>
>> thanks, done
>>
>> following up the thumb_code cleanup, here is a missing chunk for the vxworks
>> config.
>>
>> arm-vxworks build checked. ok for trunk ?
>>
>> thanks,
>>
>> Christian
>
> OK thanks  - please post the version of p1 that you committed for
> archival purposes.
>
> Ramana

Index: gcc/ChangeLog
===
--- gcc/ChangeLog	(revision 222994)
+++ gcc/ChangeLog	(revision 222995)
@@ -1,3 +1,15 @@
+2014-09-23  Christian Bruel  christian.br...@st.com
+
+	* config/arm/arm.c (arm_option_override): Reorganized and split into:
+	(arm_option_params_internal): New function.
+	(arm_option_check_internal): New function.
+	(arm_option_override_internal): New function.
+	(thumb_code, thumb1_code): Remove.
+	* config/arm/arm.h (TREE_TARGET_THUMB, TREE_TARGET_THUMB1): New macros.
+	(TREE_TARGET_THUMB2, TREE_TARGET_ARM): Likewise.
+	(thumb_code, thumb1_code): Remove.
+	* config/arm/arm.md (is_thumb, is_thumb1): Check TARGET flag.
+
 2015-05-11  Uros Bizjak  ubiz...@gmail.com
 
 	* config/alpha/alpha.c (alpha_emit_set_const_1)
Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c	(revision 222994)
+++ gcc/config/arm/arm.c	(revision 222995)
@@ -846,12 +846,6 @@
 /* Nonzero if tuning for Cortex-A9.  */
 int arm_tune_cortex_a9 = 0;
 
-/* Nonzero if generating Thumb instructions.  */
-int thumb_code = 0;
-
-/* Nonzero if generating Thumb-1 instructions.  */
-int thumb1_code = 0;
-
 /* Nonzero if we should define __THUMB_INTERWORK__ in the
preprocessor.
XXX This is a bit of a hack, it's intended to help work around
@@ -2669,6 +2663,150 @@
   return std_gimplify_va_arg_expr (valist, type, pre_p, post_p);
 }
 
+/* Check any incompatible options that the user has specified.  */
+static void
+arm_option_check_internal (struct gcc_options *opts)
+{
+  /* Make sure that the processor choice does not conflict with any of the
+ other command line choices.  */
+  if (TREE_TARGET_ARM (opts) && !(insn_flags & FL_NOTM))
+    error ("target CPU does not support ARM mode");
+
+  /* TARGET_BACKTRACE calls leaf_function_p, which causes a crash if done
+     from here where no function is being compiled currently.  */
+  if ((TARGET_TPCS_FRAME || TARGET_TPCS_LEAF_FRAME) && TREE_TARGET_ARM (opts))
+    warning (0, "enabling backtrace support is only meaningful when compiling for the Thumb");
+
+  if (TREE_TARGET_ARM (opts) && TARGET_CALLEE_INTERWORKING)
+    warning (0, "enabling callee interworking support is only meaningful when compiling for the Thumb");
+
+  /* If this target is normally configured to use APCS frames, warn if they
+     are turned off and debugging is turned on.  */
+  if (TREE_TARGET_ARM (opts)
+      && write_symbols != NO_DEBUG
+      && !TARGET_APCS_FRAME
+      && (TARGET_DEFAULT & MASK_APCS_FRAME))
+    warning (0, "-g with -mno-apcs-frame may not give sensible debugging");
+
+  /* iWMMXt unsupported under Thumb mode.  */
+  if (TREE_TARGET_THUMB (opts) && TARGET_IWMMXT)
+    error ("iWMMXt unsupported under Thumb mode");
+
+  if (TARGET_HARD_TP && TREE_TARGET_THUMB1 (opts))
+    error ("can not use -mtp=cp15 with 16-bit Thumb");
+
+  if (TREE_TARGET_THUMB (opts) && TARGET_VXWORKS_RTP && flag_pic)
+    {
+      error ("RTP PIC is incompatible with Thumb");
+      flag_pic = 0;
+    }
+
+  /* We only support -mslow-flash-data on armv7-m targets.  */
+  if (target_slow_flash_data
+      && ((!(arm_arch7 && !arm_arch_notm) && !arm_arch7em)
+	  || (TREE_TARGET_THUMB1 (opts) || flag_pic || TARGET_NEON)))
+    error ("-mslow-flash-data only supports non-pic code on armv7-m targets");
+}
+
+/* Set params depending on attributes and optimization options.  */
+static void
+arm_option_params_internal (struct gcc_options *opts)
+{
+ /* If we are not using the default (ARM mode) section anchor offset
+ ranges, then set the correct ranges now.  */
+  if (TREE_TARGET_THUMB1 (opts))
+{
+  /* Thumb-1 LDR instructions cannot have negative offsets.
+ Permissible positive offset ranges are 5-bit (for byte loads),
+ 6-bit (for halfword loads), or 7-bit (for word loads).
+ Empirical results suggest a 7-bit anchor range gives the best
+ overall code size.  */
+  targetm.min_anchor_offset = 0;
+  targetm.max_anchor_offset = 127;
+}
+  else if (TREE_TARGET_THUMB2 (opts))
+{
+  /* The minimum is set such that the total size of the block
+ for a particular anchor is 248 + 1 + 4095 bytes, which is
+ divisible by eight, ensuring natural spacing of anchors.  */
+  targetm.min_anchor_offset = -248;
+  targetm.max_anchor_offset = 4095;
+}
+  else
+{
+  targetm.min_anchor_offset = TARGET_MIN_ANCHOR_OFFSET;
+  targetm.max_anchor_offset = TARGET_MAX_ANCHOR_OFFSET
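
The anchor-offset selection in arm_option_params_internal above can be sketched in isolation. The Thumb-1 and Thumb-2 limits are the ones in the hunk; the toy_* names are hypothetical, and the ARM-mode defaults are passed in by the caller because the values of TARGET_MIN_ANCHOR_OFFSET and TARGET_MAX_ANCHOR_OFFSET are not shown in this message.

```c
/* Hypothetical mode tags; GCC derives the mode from the target flags.  */
enum toy_mode { TOY_ARM, TOY_THUMB1, TOY_THUMB2 };

struct toy_anchor_range
{
  int min_offset;
  int max_offset;
};

/* Pick the section anchor range for MODE.  ARM_DEFAULT stands in for the
   TARGET_MIN/MAX_ANCHOR_OFFSET pair, whose values are an input here.  */
static struct toy_anchor_range
toy_select_anchor_range (enum toy_mode mode,
                         struct toy_anchor_range arm_default)
{
  struct toy_anchor_range r = arm_default;

  if (mode == TOY_THUMB1)
    {
      /* No negative offsets; 7-bit positive range.  */
      r.min_offset = 0;
      r.max_offset = 127;
    }
  else if (mode == TOY_THUMB2)
    {
      r.min_offset = -248;
      r.max_offset = 4095;
    }
  return r;
}

/* Number of bytes addressable from one anchor, both ends included.  */
static int
toy_block_size (struct toy_anchor_range r)
{
  return r.max_offset - r.min_offset + 1;
}
```

For Thumb-2 the block size is 248 + 1 + 4095 = 4344 bytes, which is 8 * 543, matching the "divisible by eight" remark in the comment.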

[Cec-weeklies] ST40 tools #1517,#1518,#1519

2015-05-07 Thread Christian Bruel

Highlights
---
 - Beta Breakpad delivered
 - Upstreaming (almost accepted)

Issues
---
 - Current FSF trunk bootstrap broken (delays validations)

Completed
--
 - Fixed FSF attribute optimize inlining
 - Fixed FSF bad handling of ipa-cp (inlining of functions with constant
propagation)
 - Respun ARM attribute target (means a week of merging/validation)

Details

 - Breakpad-SH4
* Beta version delivered to customer for pre-integration testing
  . No news good news. Seems to be integrated into the customer's box.
  . Google upstreaming. All issues addressed. No objection to landing
  . Cross and on boards testing validation.
  . Rewrote syscall layer for more preprocessor factorization
(denser but more obfuscated)

 - Misc ST40
 obsp,seinit integration
* Received a B2147 (Alicante) Board
 Board setup done.
* Meeting (Pierre Gobin, Bruno Tisseran, Seraphin Bonnaffe, Loic
Pallardy, Denis Hory, Ludovic Barre) to discuss SEINIT future
  . Originally maintained in the OBSP by Bristol
  . Integrated with Docsis modem using a bpl elf patching tool by
OS team (remoteproc)
  . Knowledge seems to be lost
- seinit.o integration ?
- Can't rebuild seinit.o. Who is in charge of the sources
(currently embedded in the obsp) ?

  proposal:
- CEC replaces the elf patching tool by a board spec and a set
of linker commands.
- CEC Volunteer to get the hands on the seinit sources. (own?)
- First Experiment to take them out of the OBSP tree (lot of
dependencies).
- Issue with MMU/TLB/OS setups. Unknown specifications


 - ST40 STlinux Toolchain
 Working on GCC 5.0.0 distribution port
 Progressing on the packages. glibc link error fixes

 - GCC generic and x-targets
 - Working on ipa-cp inconsistencies (small functions not inlined)
 Reduced to a generic case
 x86 fixes accepted.
 - Working on function alignments with attribute ((optimize))
 Generic part fixed.
 Need hooks into the target backends (spread to all backends)
 x86, aarch64, alpha fixed
 other targets still under approvals or still undetected

 - Misc
 Respun attribute-target patch
 Fixed documentation





Re: [PATCH, ARM] attribute target (thumb,arm) [4/6] respin (4th)

2015-05-07 Thread Christian Bruel
+ Sandra's doc review fixes.

tested with make doc pdf

thanks

Christian

On 05/06/2015 04:24 PM, Christian Bruel wrote:
 Implements and documents the hooks to support target_attributes.
 
 The emission of blx is handled directly for armv5 to overcome a bug in
 the current binutils, which fails on calls to a static symbol in a
 different section (e.g. .text -> .text.startup) in different modes.
 
 (ref https://sourceware.org/bugzilla/show_bug.cgi?id=17505)
 
 Regtests included
 
 Thanks
 
 Christian
 
2014-09-23  Christian Bruel  christian.br...@st.com

	* config/arm/arm.opt (THUMB, arm_restrict_it, inline_asm_unified): Save.
	* config/arm/arm.h (arm_valid_target_attribute_tree): Declare.
	(arm_reset_previous_fndecl, arm_change_mode_p): Likewise.
	(SWITCHABLE_TARGET): Define.
	* config/arm/arm.c (arm_reset_previous_fndecl): New functions.
	(arm_valid_target_attribute_tree, arm_change_mode_p): Likewise.
	(arm_valid_target_attribute_p): Likewise.
	(arm_set_current_function, arm_can_inline_p): Likewise.
	(arm_valid_target_attribute_rec): Likewise.
	(arm_previous_fndecl): New variable.
	(TARGET_SET_CURRENT_FUNCTION, TARGET_OPTION_VALID_ATTRIBUTE_P): Define.
	(TARGET_CAN_INLINE_P): Define.
	(arm_asm_trampoline_template): Emit mode.
	(arm_file_start): Don't set unified syntax.
	(arm_declare_function_name): Set unified syntax and mode.
	(arm_option_override): Init target_option_default_node
	and target_option_current_node.
	* config/arm/arm.md (*call_value_symbol): Set mode when possible.
	(*call_symbol): Likewise.
	* doc/extend.texi: Document ARM target and pragma attribute.
	* doc/invoke.texi: Likewise.

2014-09-23  Christian Bruel  christian.br...@st.com

	* gcc.target/arm/attr_arm.c: New test
	* gcc.target/arm/attr_arm-err.c: New test
	* gcc.target/arm/attr_thumb.c: New test
	* gcc.target/arm/attr_thumb-static.c: New test



diff '--exclude=.svn' -ruN gnu_trunk.p3/gcc/gcc/config/arm/arm.c gnu_trunk.p4/gcc/gcc/config/arm/arm.c
--- gnu_trunk.p3/gcc/gcc/config/arm/arm.c	2015-05-06 14:31:48.750726995 +0200
+++ gnu_trunk.p4/gcc/gcc/config/arm/arm.c	2015-05-06 15:03:29.393992051 +0200
@@ -94,6 +94,7 @@
 #include "opts.h"
 #include "dumpfile.h"
 #include "gimple-expr.h"
+#include "target-globals.h"
 #include "builtins.h"
 #include "tm-constrs.h"
 #include "rtl-iter.h"
@@ -264,6 +265,9 @@
 static void arm_expand_builtin_va_start (tree, rtx);
 static tree arm_gimplify_va_arg_expr (tree, tree, gimple_seq *, gimple_seq *);
 static void arm_option_override (void);
+static void arm_set_current_function (tree);
+static bool arm_can_inline_p (tree, tree);
+static bool arm_valid_target_attribute_p (tree, tree, tree, int);
 static unsigned HOST_WIDE_INT arm_shift_truncation_mask (machine_mode);
 static bool arm_macro_fusion_p (void);
 static bool arm_cannot_copy_insn_p (rtx_insn *);
@@ -412,6 +416,9 @@
 #undef  TARGET_ASM_FUNCTION_EPILOGUE
 #define TARGET_ASM_FUNCTION_EPILOGUE arm_output_function_epilogue
 
+#undef TARGET_CAN_INLINE_P
+#define TARGET_CAN_INLINE_P arm_can_inline_p
+
 #undef  TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE arm_option_override
 
@@ -430,6 +437,12 @@
 #undef  TARGET_SCHED_ADJUST_COST
 #define TARGET_SCHED_ADJUST_COST arm_adjust_cost
 
+#undef TARGET_SET_CURRENT_FUNCTION
+#define TARGET_SET_CURRENT_FUNCTION arm_set_current_function
+
+#undef TARGET_OPTION_VALID_ATTRIBUTE_P
+#define TARGET_OPTION_VALID_ATTRIBUTE_P arm_valid_target_attribute_p
+
 #undef TARGET_SCHED_REORDER
 #define TARGET_SCHED_REORDER arm_sched_reorder
 
@@ -2750,6 +2763,9 @@
 }
 }
 
+/* Options after initial target override.  */
+static GTY(()) tree init_optimize;
+
 /* Reset options between modes that the user has specified.  */
 static void
 arm_option_override_internal (struct gcc_options *opts,
@@ -2772,6 +2788,10 @@
   if (TREE_TARGET_THUMB (opts) && TARGET_CALLEE_INTERWORKING)
     opts->x_target_flags |= MASK_INTERWORK;
 
+  /* Need to remember initial values so combinations of options like
+     -mflip-thumb -mthumb -fno-schedule-insns work for any attribute.  */
+  cl_optimization *to = TREE_OPTIMIZATION (init_optimize);
+
   if (! opts_set->x_arm_restrict_it)
     opts->x_arm_restrict_it = arm_arch8;
 
@@ -2779,15 +2799,17 @@
     opts->x_arm_restrict_it = 0;
 
   if (TREE_TARGET_THUMB1 (opts))
-    {
-      /* Don't warn since it's on by default in -O2.  */
-      opts->x_flag_schedule_insns = 0;
-    }
+    /* Don't warn since it's on by default in -O2.  */
+    opts->x_flag_schedule_insns = 0;
+  else
+    opts->x_flag_schedule_insns = to->x_flag_schedule_insns;
 
   /* Disable shrink-wrap when optimizing function for size, since it tends to
      generate additional returns.  */
   if (optimize_function_for_size_p (cfun) && TREE_TARGET_THUMB2 (opts))
     opts->x_flag_shrink_wrap = false;
+  else
+    opts->x_flag_shrink_wrap = to->x_flag_shrink_wrap;
 
   /* In Thumb1 mode, we emit the epilogue in RTL, but the last insn
  - epilogue_insns - does not accurately model the corresponding insns
@@ -2799,6 +2821,8 @@
  fipa-ra
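
The reason for the added else branches in the hunk above can be shown with a small model. It is a sketch under stated assumptions: toy_opts and the toy_* functions are hypothetical, and a single integer flag stands in for x_flag_schedule_insns. The point is that once the override runs again for every mode switch, a flag cleared for Thumb-1 has to be restored from the remembered initial value when leaving Thumb-1.

```c
#include <stdbool.h>

/* Hypothetical, reduced option set.  */
struct toy_opts
{
  bool thumb1;
  int flag_schedule_insns;
};

/* Behaviour before the hunk: the flag is cleared for Thumb-1 and never
   restored.  */
static void
toy_override_old (struct toy_opts *opts)
{
  if (opts->thumb1)
    opts->flag_schedule_insns = 0;
}

/* Behaviour after the hunk: outside Thumb-1 the flag is reset to the
   value remembered in INIT (the role played by init_optimize).  */
static void
toy_override_new (struct toy_opts *opts, const struct toy_opts *init)
{
  if (opts->thumb1)
    opts->flag_schedule_insns = 0;
  else
    opts->flag_schedule_insns = init->flag_schedule_insns;
}

/* Switch ARM -> Thumb-1 -> ARM and return the final flag value.  */
static int
toy_round_trip (bool use_new)
{
  const struct toy_opts init = { false, 1 };
  struct toy_opts opts = init;

  opts.thumb1 = true;
  if (use_new)
    toy_override_new (&opts, &init);
  else
    toy_override_old (&opts);

  opts.thumb1 = false;
  if (use_new)
    toy_override_new (&opts, &init);
  else
    toy_override_old (&opts);

  return opts.flag_schedule_insns;
}
```

With the old logic the round trip ends with the flag stuck at 0; with the new logic it returns to its initial value of 1.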

Re: [Cec-weeklies] ST40 tools #1517,#1518,#1519

2015-05-07 Thread Christian Bruel
please ignore

On 05/07/2015 10:43 AM, Christian Bruel wrote:
 
 
 
 


Re: [PATCH, ARM] attribute target (thumb,arm) [2/6] respin (4th)

2015-05-07 Thread Christian Bruel


On 05/07/2015 10:49 AM, Ramana Radhakrishnan wrote:
 
 
 On 06/05/15 15:20, Christian Bruel wrote:
 In preparation of the pragma target

 reorganize TARGET_CPU_CPP_BUILTINS to redefine mode dependent macros
 based on current thumb_p.
 
 I'm not entirely happy with this patch as it appears to be too tied to 
 just the thumbness of the attributes.

OK, I'll change arm_cpp_builtins (struct cpp_reader *in, bool thumb_p)
into arm_cpp_builtins (struct cpp_reader *in, int target_flags)

 Additionally there appears to be
 recomputing of booleans which could well have been done using the 
 standard headers, given that this includes tm.h why don't you have the 
 definitions from arm.h for the various TARGET_* macros. See a couple of 
 examples below.

This patch prepares the ground for attribute_target: the flags might not
be global. As I remember, TARGET_THUMB checks
global_options.x_target_flags, but at the time the #pragma target is
processed I think global_options is not yet adjusted.

See arm_pragma_target_parse: TARGET_THUMB != TARGET_THUMB_P
(cur_opt->x_target_flags)
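
The distinction between a macro that reads the global flags and one that reads an explicit flag word can be sketched as follows. All TOY_* names are hypothetical stand-ins; only the shape of the comparison is taken from the arm_pragma_target_parse reference above.

```c
#include <stdbool.h>

/* Hypothetical flag bit standing in for the Thumb mask.  */
#define TOY_MASK_THUMB 0x1

/* Stand-in for global_options.x_target_flags.  */
static int toy_global_target_flags;

/* Reads the global state, like TARGET_THUMB.  */
#define TOY_TARGET_THUMB \
  ((toy_global_target_flags & TOY_MASK_THUMB) != 0)

/* Reads an explicit flag word, like TARGET_THUMB_P (flags).  */
#define TOY_TARGET_THUMB_P(flags) (((flags) & TOY_MASK_THUMB) != 0)

/* True when the flags of the node being parsed select a different mode
   than the global state does.  */
static bool
toy_mode_changes (int cur_flags)
{
  return TOY_TARGET_THUMB != TOY_TARGET_THUMB_P (cur_flags);
}
```

Because the pragma is processed before the global state is updated, only the explicit-flags form can describe the mode being switched to.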


 
 While you are here , can you look at moving all the CPP builtins into 
 one function and then working from there. That would be a better 
 restructuring in the long term rather than a piece meal move in this 
 case. Makes rebuild times smaller for folks by doing this in one place.

Sure,

 
 
 

 Thanks,

 Christian

 
 2014-09-23  Christian Bruel  christian.br...@st.com

  * config/arm/arm-c.c (cpp_def_or_undef): New functions.
  (arm_cpp_builtins): Likewise.
  * config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Move mode dependant
  macros to arm_cpp_builtins.
  * config/arm/arm-protos.h (arm_cpp_builtins): Declare.

 diff '--exclude=.svn' -ruN gnu_trunk.p1/gcc/gcc/config/arm/arm-c.c 
 gnu_trunk.p2/gcc/gcc/config/arm/arm-c.c
 --- gnu_trunk.p1/gcc/gcc/config/arm/arm-c.c  2015-05-06 14:06:27.508142998 
 +0200
 +++ gnu_trunk.p2/gcc/gcc/config/arm/arm-c.c  2015-05-06 14:27:45.362310057 
 +0200
 @@ -51,3 +51,73 @@
  {
arm_lang_output_object_attributes_hook = arm_output_c_attributes;
  }
 +
 +/* Define or undefine macro.  */
 +
 +static void
 +cpp_def_or_undef (struct cpp_reader *in, const char *str, bool def_p)
 +{
 +  if (def_p)
 +cpp_define (in, str);
 +  else
 +cpp_undef (in, str);
 +}
 +
 +/* Define or undefine macros based on the current target.  If the user does
 +   #pragma GCC target, we need to adjust the macros dynamically.  */
 +
 +void
 +arm_cpp_builtins (struct cpp_reader *in, bool thumb_p)
 +{
 +  bool target_32bit_p = !thumb_p || arm_arch_thumb2;
 
 Why isn't this TARGET_32BIT ?

TARGET_32BIT checks global_options. We will need the options from the
current tree.

 
 +  bool thumb2_p = thumb_p && arm_arch_thumb2;
 
 TARGET_THUMB2 ?

idem.

 
 +  bool have_ldrex_p = (arm_arch6 && !thumb_p) || arm_arch7;
 +  bool have_ldrexbh_p = (arm_arch6k && !thumb_p) || arm_arch7;
 +  bool have_ldrexd_p = ((arm_arch6k && !thumb_p) || arm_arch7)
 +    && arm_arch_notm;
 
 TARGET_HAVE_LDREX* ?

idem.

 
 +
 +  int arm_feature_ldrex = (have_ldrex_p ? 4 : 0)
 +    | (have_ldrexbh_p ? 3 : 0) | (have_ldrexd_p ? 8 : 0);
 +
 +  cpp_def_or_undef (in, "__thumb__", thumb_p);
 +  if (arm_arch_thumb2)
 +    cpp_def_or_undef (in, "__thumb2__", thumb_p);
 +  if (TARGET_BIG_END)
 +    cpp_def_or_undef (in, "__THUMBEB__", thumb_p);
 +  else
 +    cpp_def_or_undef (in, "__THUMBEL__", thumb_p);
 +
 +  cpp_def_or_undef (in, "__ARM_32BIT_STATE", target_32bit_p); /* TARGET_32BIT  */
 +
 +  if (arm_arch5e && (arm_arch_notm || arm_arch7))   /* TARGET_ARM_QBIT  */
 +    cpp_def_or_undef (in, "__ARM_FEATURE_QBIT", target_32bit_p);
 +
 +  if (arm_arch6 && (arm_arch_notm || arm_arch7))    /* TARGET_ARM_SAT  */
 +    cpp_def_or_undef (in, "__ARM_FEATURE_SAT", target_32bit_p);
 +
 +  if (arm_arch5e && (arm_arch_notm || arm_arch7em)) /* TARGET_DSP_MULTIPLY  */
 +    cpp_def_or_undef (in, "__ARM_FEATURE_DSP", target_32bit_p);
 +
 +  if (arm_arch6 && (arm_arch_notm || arm_arch7em))  /* TARGET_INT_SIMD  */
 +    cpp_def_or_undef (in, "__ARM_FEATURE_SIMD32", target_32bit_p);
 +
 +  /* TARGET_IDIV  */
 +  cpp_def_or_undef (in, "__ARM_ARCH_EXT_IDIV__",
 +                    ((!thumb_p && arm_arch_arm_hwdiv)
 +                     || (thumb2_p && arm_arch_thumb_hwdiv)));
 +
 +  cpp_def_or_undef (in, "__ARM_FEATURE_IDIV",
 +                    ((!thumb_p && arm_arch_arm_hwdiv)
 +                     || (thumb2_p && arm_arch_thumb_hwdiv)));
 +
 +  if (arm_feature_ldrex)
 +    cpp_define_formatted (in, "__ARM_FEATURE_LDREX=%d", arm_feature_ldrex);
 +  else
 +    cpp_undef (in, "__ARM_FEATURE_LDREX");
 +
 +  cpp_def_or_undef (in, "__ARM_FEATURE_CLZ",
 +                    ((TARGET_ARM_ARCH >= 5 && !thumb_p)
 +                     || TARGET_ARM_ARCH_ISA_THUMB >= 2));
 +
 +  cpp_def_or_undef (in, "__ARM_ASM_SYNTAX_UNIFIED__", inline_asm_unified);
 +}
 +
 diff '--exclude=.svn' -ruN gnu_trunk.p1/gcc/gcc/config/arm/arm.h 
 gnu_trunk.p2/gcc/gcc/config/arm/arm.h
 --- gnu_trunk.p1/gcc/gcc/config/arm/arm.h2015-05-06 14:24:41.149994939 
 +0200
 +++ gnu_trunk.p2/gcc
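
The __ARM_FEATURE_LDREX value computed in the quoted arm_cpp_builtins hunk can be reproduced standalone. This is a sketch: struct toy_arch and toy_feature_ldrex are hypothetical names, and the boolean fields stand in for the arm_arch6, arm_arch6k, arm_arch7 and arm_arch_notm globals. The constants 4, 3 and 8 are copied from the hunk; as far as I recall the ACLE definition, the bits denote byte, halfword, word and doubleword exclusives.

```c
#include <stdbool.h>

/* Hypothetical bundle of the architecture predicates used by the hunk.  */
struct toy_arch
{
  bool arch6;
  bool arch6k;
  bool arch7;
  bool notm;   /* true for non-M-profile cores */
};

/* Compute the LDREX feature mask for architecture A in ARM mode
   (thumb_p false) or Thumb mode (thumb_p true).  */
static int
toy_feature_ldrex (const struct toy_arch *a, bool thumb_p)
{
  bool have_ldrex = (a->arch6 && !thumb_p) || a->arch7;
  bool have_ldrexbh = (a->arch6k && !thumb_p) || a->arch7;
  bool have_ldrexd = ((a->arch6k && !thumb_p) || a->arch7) && a->notm;

  return (have_ldrex ? 4 : 0)
         | (have_ldrexbh ? 3 : 0)
         | (have_ldrexd ? 8 : 0);
}
```

The mask depends on thumb_p for pre-v7 cores, which is why the macro has to be recomputed when a pragma switches mode.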

[PATCH, ARM] attribute target (thumb,arm) [0/6] respin (4th)

2015-05-06 Thread Christian Bruel
 A general note: please reply to each of the patches with a rebased
 patch as a separate email. Furthermore, all your patches appear to
 have DOS line endings, so they don't seem to apply cleanly. Please
 don't have spurious headers in your patch submission; it makes the
 patch hard to apply. Please create it in a way that it is easily
 applied by someone trying it out. It looks like p4 needs a respin, as
 I got a reject when trying to apply the documentation patch to my
 tree.


Let's go... rebased (4th time since 2014!), no conflicts.

So here is the attribute patch set, revisited and rebased:

 - thumb1 is now supported
 - -mflip-thumb option added for testing.
 - inlining is allowed between modes.

This set of patches was tested on arm-none-eabi as:

no regressions with:
arm-sim/
arm-sim/-march=armv7-a
arm-sim/-mthumb
arm-sim/-mthumb/-march=armv7-a

a few artifacts, all of them analyzed, with
arm-sim/-mflip-thumb/
arm-sim/-mflip-thumb//-march=armv7-a
arm-sim/-mflip-thumb//-mthumb
arm-sim/-mflip-thumb//-mthumb/-march=armv7-a


Artifacts are analyzed; they are mostly the fault of the testsuite being
unable to mix modes in the expected results (e.g. thumb[1,2] or arm test
predicates, mixing setjmp/longjmp between modes...)

The support is split as follows:

[1/6] Reorganized arm_option_override to dynamically redefine the flags
depending on the attribute mode.


[2/6] Reorganized macro settings to be set/unset for each #pragma targets

[3/6] Change ARM_DECLARE_FUNCTION_NAME into a function

[4/6] Implement hooks to support attribute target

[5/6] Implement #pragma target

[6/6] Add -mflip-thumb option for testing

thanks

Christian


[PATCH, ARM] attribute target (thumb,arm) [1/6] respin (4th)

2015-05-06 Thread Christian Bruel
In preparation of the target attribute,

reorganize arm_option_override into 3 entities:
arm_option_override_internal
arm_option_check_internal
arm_option_params_internal

Also define and use TREE_TARGET macros instead of file-scope variables
in the machine description.

Thanks,

Christian
2014-09-23  Christian Bruel  christian.br...@st.com

	* config/arm/arm.c (arm_option_override): Reorganized and split.
	(arm_option_params_internal): New function.
	(arm_option_check_internal): New function.
	(arm_option_override_internal): New function.
	(restrict_default): New boolean.
	(thumb_code, thumb1_code): Remove.
	* config/arm/arm.h (TREE_TARGET_THUMB, TREE_TARGET_THUMB1): New macros.
	(TREE_TARGET_THUMB2, TREE_TARGET_ARM): Likewise.
	(thumb_code, thumb1_code): Remove.
	* config/arm/arm.md (is_thumb, is_thumb1): Check TARGET flag.

diff '--exclude=.svn' -ruN gnu_trunk.ref/gcc/gcc/config/arm/arm.c gnu_trunk.p1/gcc/gcc/config/arm/arm.c
--- gnu_trunk.ref/gcc/gcc/config/arm/arm.c	2015-05-05 14:35:30.214153999 +0200
+++ gnu_trunk.p1/gcc/gcc/config/arm/arm.c	2015-05-06 14:24:41.125994898 +0200
@@ -846,12 +846,6 @@
 /* Nonzero if tuning for Cortex-A9.  */
 int arm_tune_cortex_a9 = 0;
 
-/* Nonzero if generating Thumb instructions.  */
-int thumb_code = 0;
-
-/* Nonzero if generating Thumb-1 instructions.  */
-int thumb1_code = 0;
-
 /* Nonzero if we should define __THUMB_INTERWORK__ in the
preprocessor.
XXX This is a bit of a hack, it's intended to help work around
@@ -2669,6 +2663,148 @@
   return std_gimplify_va_arg_expr (valist, type, pre_p, post_p);
 }
 
+/* Check any incompatible options that the user has specified.  */
+static void
+arm_option_check_internal (struct gcc_options *opts)
+{
+  /* Make sure that the processor choice does not conflict with any of the
+ other command line choices.  */
+  if (TREE_TARGET_ARM (opts) && !(insn_flags & FL_NOTM))
+    error ("target CPU does not support ARM mode");
+
+  /* TARGET_BACKTRACE calls leaf_function_p, which causes a crash if done
+     from here where no function is being compiled currently.  */
+  if ((TARGET_TPCS_FRAME || TARGET_TPCS_LEAF_FRAME) && TREE_TARGET_ARM (opts))
+    warning (0, "enabling backtrace support is only meaningful when compiling for the Thumb");
+
+  if (TREE_TARGET_ARM (opts) && TARGET_CALLEE_INTERWORKING)
+    warning (0, "enabling callee interworking support is only meaningful when compiling for the Thumb");
+
+  /* If this target is normally configured to use APCS frames, warn if they
+     are turned off and debugging is turned on.  */
+  if (TREE_TARGET_ARM (opts)
+      && write_symbols != NO_DEBUG
+      && !TARGET_APCS_FRAME
+      && (TARGET_DEFAULT & MASK_APCS_FRAME))
+    warning (0, "-g with -mno-apcs-frame may not give sensible debugging");
+
+  /* iWMMXt unsupported under Thumb mode.  */
+  if (TREE_TARGET_THUMB (opts) && TARGET_IWMMXT)
+    error ("iWMMXt unsupported under Thumb mode");
+
+  if (TARGET_HARD_TP && TREE_TARGET_THUMB1 (opts))
+    error ("can not use -mtp=cp15 with 16-bit Thumb");
+
+  if (TREE_TARGET_THUMB (opts) && TARGET_VXWORKS_RTP && flag_pic)
+    {
+      error ("RTP PIC is incompatible with Thumb");
+      flag_pic = 0;
+    }
+
+  /* We only support -mslow-flash-data on armv7-m targets.  */
+  if (target_slow_flash_data
+      && ((!(arm_arch7 && !arm_arch_notm) && !arm_arch7em)
+	  || (TREE_TARGET_THUMB1 (opts) || flag_pic || TARGET_NEON)))
+    error ("-mslow-flash-data only supports non-pic code on armv7-m targets");
+}
+
+/* Check any params depending on attributes that the user has specified.  */
+static void
+arm_option_params_internal (struct gcc_options *opts)
+{
+ /* If we are not using the default (ARM mode) section anchor offset
+ ranges, then set the correct ranges now.  */
+  if (TREE_TARGET_THUMB1 (opts))
+{
+  /* Thumb-1 LDR instructions cannot have negative offsets.
+ Permissible positive offset ranges are 5-bit (for byte loads),
+ 6-bit (for halfword loads), or 7-bit (for word loads).
+ Empirical results suggest a 7-bit anchor range gives the best
+ overall code size.  */
+  targetm.min_anchor_offset = 0;
+  targetm.max_anchor_offset = 127;
+}
+  else if (TREE_TARGET_THUMB2 (opts))
+{
+  /* The minimum is set such that the total size of the block
+ for a particular anchor is 248 + 1 + 4095 bytes, which is
+ divisible by eight, ensuring natural spacing of anchors.  */
+  targetm.min_anchor_offset = -248;
+  targetm.max_anchor_offset = 4095;
+}
+  else
+{
+  targetm.min_anchor_offset = TARGET_MIN_ANCHOR_OFFSET;
+  targetm.max_anchor_offset = TARGET_MAX_ANCHOR_OFFSET;
+}
+
+  if (optimize_size)
+{
+  /* If optimizing for size, bump the number of instructions that we
+ are prepared to conditionally execute (even on a StrongARM).  */
+  max_insns_skipped = 6;
+
+  /* For THUMB2, we limit the conditional sequence to one IT block.  */
+  if (TREE_TARGET_THUMB2 (opts))
+	max_insns_skipped

[PATCH, ARM] attribute target (thumb,arm) [6/6] respin (4th)

2015-05-06 Thread Christian Bruel
Implement the -mflip-thumb option. It is undocumented and intended for
internal testing only. This option artificially inserts alternating
thumb/arm target attributes on functions.

This closes the patch set. Thanks for your review,

Christian
2014-09-23  Christian Bruel  christian.br...@st.com

	* config/arm/arm.c (add_attribute, arm_insert_attributes): New functions
	(TARGET_INSERT_ATTRIBUTES): Define.
	(thumb_flipper): New var.
	* config/arm/arm.opt (-mflip-thumb): New switch.

diff '--exclude=.svn' -ruN gnu_trunk.p5/gcc/gcc/config/arm/arm.c gnu_trunk.p6/gcc/gcc/config/arm/arm.c
--- gnu_trunk.p5/gcc/gcc/config/arm/arm.c	2015-05-06 15:03:29.393992051 +0200
+++ gnu_trunk.p6/gcc/gcc/config/arm/arm.c	2015-05-06 15:04:15.970072133 +0200
@@ -99,6 +99,7 @@
 #include "tm-constrs.h"
 #include "rtl-iter.h"
 #include "sched-int.h"
+#include "tree.h"
 
 /* Forward definitions of types.  */
 typedef struct minipool_node    Mnode;
@@ -232,6 +233,7 @@
 
 static void arm_file_end (void);
 static void arm_file_start (void);
+static void arm_insert_attributes (tree, tree *);
 
 static void arm_setup_incoming_varargs (cumulative_args_t, machine_mode,
 	tree, int *, int);
@@ -390,6 +392,9 @@
 #undef  TARGET_ATTRIBUTE_TABLE
 #define TARGET_ATTRIBUTE_TABLE arm_attribute_table
 
+#undef  TARGET_INSERT_ATTRIBUTES
+#define TARGET_INSERT_ATTRIBUTES arm_insert_attributes
+
 #undef TARGET_ASM_FILE_START
 #define TARGET_ASM_FILE_START arm_file_start
 #undef TARGET_ASM_FILE_END
@@ -2763,6 +2768,10 @@
 }
 }
 
+/* True if -mflip-thumb should next add an attribute for the default
+   mode, false if it should next add an attribute for the opposite mode.  */
+static GTY(()) bool thumb_flipper;
+
 /* Options after initial target override.  */
 static GTY(()) tree init_optimize;
 
@@ -3329,6 +3338,9 @@
  options.  */
   target_option_default_node = target_option_current_node
  = build_target_option_node (&global_options);
+
+  /* Init initial mode for testing.  */
+  thumb_flipper = TARGET_THUMB;
 }
 
 static void
@@ -29459,6 +29471,52 @@
   return build_target_option_node (opts);
 }
 
+static void 
+add_attribute  (const char * mode, tree *attributes)
+{
+  size_t len = strlen (mode);
+  tree value = build_string (len, mode);
+
+  TREE_TYPE (value) = build_array_type (char_type_node,
+	build_index_type (size_int (len)));
+
+  *attributes = tree_cons (get_identifier ("target"),
+			   build_tree_list (NULL_TREE, value),
+			   *attributes);
+}
+
+/* For testing. Insert thumb or arm modes alternatively on functions.  */
+
+static void
+arm_insert_attributes (tree fndecl, tree * attributes)
+{
+  const char *mode;
+
+  if (! TARGET_FLIP_THUMB)
+    return;
+
+  if (TREE_CODE (fndecl) != FUNCTION_DECL || DECL_EXTERNAL(fndecl)
+      || DECL_BUILT_IN (fndecl) || DECL_ARTIFICIAL (fndecl))
+    return;
+
+  /* Nested definitions must inherit mode.  */
+  if (current_function_decl)
+    {
+      mode = TARGET_THUMB ? "thumb" : "arm";
+      add_attribute (mode, attributes);
+      return;
+    }
+
+  /* If there is already a setting don't change it.  */
+  if (lookup_attribute ("target", *attributes) != NULL)
+    return;
+
+  mode = thumb_flipper ? "thumb" : "arm";
+  add_attribute (mode, attributes);
+
+  thumb_flipper = !thumb_flipper;
+}
+
 /* Hook to validate attribute((target(string))).  */
 
 static bool
@@ -29470,13 +29528,15 @@
   tree cur_tree, new_optimize;
   gcc_assert ((fndecl != NULL_TREE) && (args != NULL_TREE));
 
+  tree old_optimize = build_optimization_node (&global_options);
+
   /* Get the optimization options of the current function.  */
   tree func_optimize = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (fndecl);
 
   /* If the function changed the optimization levels as well as setting target
  options, start with the optimizations specified.  */
   if (!func_optimize)
-func_optimize = optimization_default_node;
+func_optimize = old_optimize;
 
   /* Init func_options.  */
   memset (&func_options, 0, sizeof (func_options));
@@ -29494,14 +29554,17 @@
   cur_tree = arm_valid_target_attribute_tree (args, &func_options,
 	  &global_options_set);
 
-  if (cur_tree == NULL_TREE)
-ret = false;
-
   new_optimize = build_optimization_node (&func_options);
 
-  DECL_FUNCTION_SPECIFIC_TARGET (fndecl) = cur_tree;
+  if (cur_tree == NULL_TREE)
+ret = false;
+  else
+{
+  DECL_FUNCTION_SPECIFIC_TARGET (fndecl) = cur_tree;
 
-  DECL_FUNCTION_SPECIFIC_OPTIMIZATION (fndecl) = new_optimize;
+  if (old_optimize != new_optimize)
+	DECL_FUNCTION_SPECIFIC_OPTIMIZATION (fndecl) = new_optimize;
+}
 
   return ret;
 }
diff '--exclude=.svn' -ruN gnu_trunk.p5/gcc/gcc/config/arm/arm-c.c gnu_trunk.p6/gcc/gcc/config/arm/arm-c.c
--- gnu_trunk.p5/gcc/gcc/config/arm/arm-c.c	2015-05-06 14:36:21.987195830 +0200
+++ gnu_trunk.p6/gcc/gcc/config/arm/arm-c.c	2015-05-06 14:37:58.799362130 +0200
@@ -20,7 +20,6 @@
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"
-#include "tm_p.h"
 #include "hash-set.h"
 #include "machmode.h"
 #include "vec.h"
@@ -31,7 +30,11

[PATCH, ARM] attribute target (thumb,arm) [3/6] respin (4th)

2015-05-06 Thread Christian Bruel
Re-implement ARM_DECLARE_FUNCTION_NAME as a function. That will make
changes related to unified/divided syntax and mode directives easier to insert.
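
As a rough illustration of why a function is easier to extend than a
multi-line do/while macro, here is a standalone sketch of the directive
selection. It is not the GCC implementation: struct mode_flags and
format_mode_directives are hypothetical names, the flags stand in for
TARGET_THUMB, TARGET_THUMB1 and is_called_in_ARM_mode, and the thunk
condition is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flags; not GCC's real interface.  */
struct mode_flags
{
  bool thumb;            /* compiling this function as Thumb */
  bool thumb1;           /* Thumb-1 rather than Thumb-2 */
  bool entered_in_arm;   /* function must be entered in ARM mode */
};

/* Write the mode directives for one function into BUF and return the
   number of characters written (excluding the terminating NUL).  */
int
format_mode_directives (char *buf, size_t size, const struct mode_flags *f)
{
  assert (buf != NULL && size > 0 && f != NULL);
  buf[0] = '\0';

  if (!f->thumb)
    return 0;   /* ARM mode: nothing to emit in this sketch */
  if (f->entered_in_arm)
    return snprintf (buf, size, "\t.code 32\n");
  if (f->thumb1)
    return snprintf (buf, size, "\t.code\t16\n\t.thumb_func\n");
  return snprintf (buf, size, "\t.thumb\n\t.thumb_func\n");
}
```

A new case (for example a syntax directive) becomes one more branch in
an ordinary function, with no line continuations to maintain.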

Thanks

Christian

2014-09-23  Christian Bruel  christian.br...@st.com

	* config/arm/arm-protos.h (arm_declare_function_name): Declare.
	(is_called_in_ARM_mode): Remove.
	* config/arm/arm.c (is_called_in_ARM_mode): Declare static bool.
	(arm_declare_function_name): Moved from ARM_DECLARE_FUNCTION_NAME.
	* config/arm/arm.h (ARM_DECLARE_FUNCTION_NAME): Call
	 arm_declare_function_name.

diff '--exclude=.svn' -ruN gnu_trunk.p2/gcc/gcc/config/arm/arm.c gnu_trunk.p3/gcc/gcc/config/arm/arm.c
--- gnu_trunk.p2/gcc/gcc/config/arm/arm.c	2015-05-06 14:27:41.042302661 +0200
+++ gnu_trunk.p3/gcc/gcc/config/arm/arm.c	2015-05-06 14:31:48.750726995 +0200
@@ -23451,6 +23451,23 @@
   fprintf (f, "}\n");
 }
 
+/* Return nonzero if FUNC must be entered in ARM mode.  */
+static bool
+is_called_in_ARM_mode (tree func)
+{
+  gcc_assert (TREE_CODE (func) == FUNCTION_DECL);
+
+  /* Ignore the problem about functions whose address is taken.  */
+  if (TARGET_CALLEE_INTERWORKING && TREE_PUBLIC (func))
+    return true;
+
+#ifdef ARM_PE
+  return lookup_attribute ("interfacearm", DECL_ATTRIBUTES (func)) != NULL_TREE;
+#else
+  return false;
+#endif
+}
+
 /* Generate code to return from a thumb function.
If 'reg_containing_return_addr' is -1, then the return address is
actually on the stack, at the stack pointer.  */
@@ -23886,22 +23903,6 @@
   return 0;
 }
 
-/* Return nonzero if FUNC must be entered in ARM mode.  */
-int
-is_called_in_ARM_mode (tree func)
-{
-  gcc_assert (TREE_CODE (func) == FUNCTION_DECL);
-
-  /* Ignore the problem about functions whose address is taken.  */
-  if (TARGET_CALLEE_INTERWORKING && TREE_PUBLIC (func))
-    return TRUE;
-
-#ifdef ARM_PE
-  return lookup_attribute ("interfacearm", DECL_ATTRIBUTES (func)) != NULL_TREE;
-#else
-  return FALSE;
-#endif
-}
 
 /* Given the stack offsets and register mask in OFFSETS, decide how
many additional registers to push instead of subtracting a constant
@@ -29270,6 +29271,25 @@
 	  && CONSTANT_POOL_ADDRESS_P (XEXP (x, 0)));
 }
 
+void
+arm_declare_function_name (FILE *stream, const char *name, tree decl)
+{
+  if (TARGET_THUMB)
+{
+      if (is_called_in_ARM_mode (decl)
+	  || (TARGET_THUMB1 && !TARGET_THUMB1_ONLY
+	      && cfun->is_thunk))
+	fprintf (stream, "\t.code 32\n");
+      else if (TARGET_THUMB1)
+	fprintf (stream, "\t.code\t16\n\t.thumb_func\n");
+      else
+	fprintf (stream, "\t.thumb\n\t.thumb_func\n");
+}
+
+  if (TARGET_POKE_FUNCTION_NAME)
+arm_poke_function_name (stream, (const char *) name);
+}
+
 /* If MEM is in the form of [base+offset], extract the two parts
of address and set to BASE and OFFSET, otherwise return false
after clearing BASE and OFFSET.  */
@@ -29390,4 +29410,5 @@
   *pri = tmp;
   return;
 }
+
 #include "gt-arm.h"
diff '--exclude=.svn' -ruN gnu_trunk.p2/gcc/gcc/config/arm/arm.h gnu_trunk.p3/gcc/gcc/config/arm/arm.h
--- gnu_trunk.p2/gcc/gcc/config/arm/arm.h	2015-05-06 14:27:45.362310057 +0200
+++ gnu_trunk.p3/gcc/gcc/config/arm/arm.h	2015-05-06 14:31:48.750726995 +0200
@@ -2157,23 +2157,7 @@
? 1 : 0)
 
 #define ARM_DECLARE_FUNCTION_NAME(STREAM, NAME, DECL) 	\
-  do			\
-{			\
-  if (TARGET_THUMB) \
-{		\
-  if (is_called_in_ARM_mode (DECL)		\
-	  || (TARGET_THUMB1 && !TARGET_THUMB1_ONLY	\
-	      && cfun->is_thunk))	\
-fprintf (STREAM, "\t.code 32\n") ;		\
-  else if (TARGET_THUMB1)			\
-   fprintf (STREAM, "\t.code\t16\n\t.thumb_func\n") ;	\
-  else		\
-   fprintf (STREAM, "\t.thumb\n\t.thumb_func\n") ;	\
-}		\
-  if (TARGET_POKE_FUNCTION_NAME)			\
-arm_poke_function_name (STREAM, (const char *) NAME);	\
-}			\
-  while (0)
+  arm_declare_function_name ((STREAM), (NAME), (DECL));
 
 /* For aliases of functions we use .thumb_set instead.  */
 #define ASM_OUTPUT_DEF_FROM_DECLS(FILE, DECL1, DECL2)		\
diff '--exclude=.svn' -ruN gnu_trunk.p2/gcc/gcc/config/arm/arm-protos.h gnu_trunk.p3/gcc/gcc/config/arm/arm-protos.h
--- gnu_trunk.p2/gcc/gcc/config/arm/arm-protos.h	2015-05-06 14:27:45.362310057 +0200
+++ gnu_trunk.p3/gcc/gcc/config/arm/arm-protos.h	2015-05-06 14:31:48.750726995 +0200
@@ -30,6 +30,7 @@
 extern int arm_volatile_func (void);
 extern void arm_expand_prologue (void);
 extern void arm_expand_epilogue (bool);
+extern void arm_declare_function_name (FILE *, const char *, tree);
 extern void thumb2_expand_return (bool);
 extern const char *arm_strip_name_encoding (const char *);
 extern void arm_asm_output_labelref (FILE *, const char *);
@@ -181,9 +182,6 @@
 extern void thumb1_expand_prologue (void);
 extern void thumb1_expand_epilogue (void);
 extern const char *thumb1_output_interwork (void);
-#ifdef TREE_CODE
-extern int is_called_in_ARM_mode (tree);
-#endif
 extern int thumb_shiftable_const (unsigned HOST_WIDE_INT);
 #ifdef RTX_CODE
 extern enum

[PATCH, ARM] attribute target (thumb,arm) [4/6] respin (4th)

2015-05-06 Thread Christian Bruel
Implements and documents the hooks to support target attributes.

The emission of blx is handled directly for armv5 to work around a bug in
the current binutils, which fails on calls to a static symbol in a
different section (e.g. .text -> .text.startup) in different modes.

(ref https://sourceware.org/bugzilla/show_bug.cgi?id=17505)

Regtests included

Thanks

Christian

2014-09-23  Christian Bruel  christian.br...@st.com

	* config/arm/arm.opt (THUMB, arm_restrict_it, inline_asm_unified): Save.
	* config/arm/arm.h (arm_valid_target_attribute_tree): Declare.
	(arm_reset_previous_fndecl, arm_change_mode_p): Likewise.
	(SWITCHABLE_TARGET): Define.
	* config/arm/arm.c (arm_reset_previous_fndecl): New functions.
	(arm_valid_target_attribute_tree, arm_change_mode_p): Likewise.
	(arm_valid_target_attribute_p): Likewise.
	(arm_set_current_function, arm_can_inline_p): Likewise.
	(arm_valid_target_attribute_rec): Likewise.
	(arm_previous_fndecl): New variable.
	(TARGET_SET_CURRENT_FUNCTION, TARGET_OPTION_VALID_ATTRIBUTE_P): Define.
	(TARGET_CAN_INLINE_P): Define.
	(arm_asm_trampoline_template): Emit mode.
	(arm_file_start): Don't set unified syntax.
	(arm_declare_function_name): Set unified syntax and mode.
	(arm_option_override): Init target_option_default_node.
	and target_option_current_node.
	* config/arm/arm.md (*call_value_symbol): Set mode when possible.
	(*call_symbol): Likewise.
	* doc/extend.texi: Document ARM target and pragma attribute.
	* doc/invoke.texi: Likewise.

diff '--exclude=.svn' -ruN gnu_trunk.p3/gcc/gcc/config/arm/arm.c gnu_trunk.p4/gcc/gcc/config/arm/arm.c
--- gnu_trunk.p3/gcc/gcc/config/arm/arm.c	2015-05-06 14:31:48.750726995 +0200
+++ gnu_trunk.p4/gcc/gcc/config/arm/arm.c	2015-05-06 15:03:29.393992051 +0200
@@ -94,6 +94,7 @@
 #include "opts.h"
 #include "dumpfile.h"
 #include "gimple-expr.h"
+#include "target-globals.h"
 #include "builtins.h"
 #include "tm-constrs.h"
 #include "rtl-iter.h"
@@ -264,6 +265,9 @@
 static void arm_expand_builtin_va_start (tree, rtx);
 static tree arm_gimplify_va_arg_expr (tree, tree, gimple_seq *, gimple_seq *);
 static void arm_option_override (void);
+static void arm_set_current_function (tree);
+static bool arm_can_inline_p (tree, tree);
+static bool arm_valid_target_attribute_p (tree, tree, tree, int);
 static unsigned HOST_WIDE_INT arm_shift_truncation_mask (machine_mode);
 static bool arm_macro_fusion_p (void);
 static bool arm_cannot_copy_insn_p (rtx_insn *);
@@ -412,6 +416,9 @@
 #undef  TARGET_ASM_FUNCTION_EPILOGUE
 #define TARGET_ASM_FUNCTION_EPILOGUE arm_output_function_epilogue
 
+#undef TARGET_CAN_INLINE_P
+#define TARGET_CAN_INLINE_P arm_can_inline_p
+
 #undef  TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE arm_option_override
 
@@ -430,6 +437,12 @@
 #undef  TARGET_SCHED_ADJUST_COST
 #define TARGET_SCHED_ADJUST_COST arm_adjust_cost
 
+#undef TARGET_SET_CURRENT_FUNCTION
+#define TARGET_SET_CURRENT_FUNCTION arm_set_current_function
+
+#undef TARGET_OPTION_VALID_ATTRIBUTE_P
+#define TARGET_OPTION_VALID_ATTRIBUTE_P arm_valid_target_attribute_p
+
 #undef TARGET_SCHED_REORDER
 #define TARGET_SCHED_REORDER arm_sched_reorder
 
@@ -2750,6 +2763,9 @@
 }
 }
 
+/* Options after initial target override.  */
+static GTY(()) tree init_optimize;
+
 /* Reset options between modes that the user has specified.  */
 static void
 arm_option_override_internal (struct gcc_options *opts,
@@ -2772,6 +2788,10 @@
   if (TREE_TARGET_THUMB (opts) && TARGET_CALLEE_INTERWORKING)
     opts->x_target_flags |= MASK_INTERWORK;
 
+  /* Need to remember initial values so combinations of options like
+     -mflip-thumb -mthumb -fno-schedule-insns work for any attribute.  */
+  cl_optimization *to = TREE_OPTIMIZATION (init_optimize);
+
   if (! opts_set->x_arm_restrict_it)
     opts->x_arm_restrict_it = arm_arch8;
 
@@ -2779,15 +2799,17 @@
     opts->x_arm_restrict_it = 0;
 
   if (TREE_TARGET_THUMB1 (opts))
-    {
-      /* Don't warn since it's on by default in -O2.  */
-      opts->x_flag_schedule_insns = 0;
-    }
+    /* Don't warn since it's on by default in -O2.  */
+    opts->x_flag_schedule_insns = 0;
+  else
+    opts->x_flag_schedule_insns = to->x_flag_schedule_insns;
 
   /* Disable shrink-wrap when optimizing function for size, since it tends to
      generate additional returns.  */
   if (optimize_function_for_size_p (cfun) && TREE_TARGET_THUMB2 (opts))
     opts->x_flag_shrink_wrap = false;
+  else
+    opts->x_flag_shrink_wrap = to->x_flag_shrink_wrap;
 
   /* In Thumb1 mode, we emit the epilogue in RTL, but the last insn
  - epilogue_insns - does not accurately model the corresponding insns
@@ -2799,6 +2821,8 @@
  fipa-ra.  */
   if (TREE_TARGET_THUMB1 (opts))
     opts->x_flag_ipa_ra = 0;
+  else
+    opts->x_flag_ipa_ra = to->x_flag_ipa_ra;
 
   /* Thumb2 inline assembly code should always use unified syntax.
  This will apply to ARM and Thumb1 eventually.  */
@@ -3291,12 +3315,20 @@
       && (!arm_arch7 || !current_tune->prefer_ldrd_strd))
 flag_schedule_fusion = 0

[PATCH, ARM] attribute target (thumb,arm) [5/6] respin (4th)

2015-05-06 Thread Christian Bruel
Implements the hooks for #pragma GCC target.

A test is included to check that macros are correctly defined/undefined in
pragma regions.

Thanks

Christian
2014-09-23  Christian Bruel  christian.br...@st.com

	* config/arm/arm.h (REGISTER_TARGET_PRAGMAS):
	 Call arm_register_target_pragmas.
	* config/arm/arm-protos.h (arm_register_target_pragmas): Declare.
	* config/arm/arm-c.c (arm_register_target_pragmas): New function.
	(arm_pragma_target_parse): Likewise.

diff '--exclude=.svn' -ruN gnu_trunk.p4/gcc/gcc/testsuite/gcc.target/arm/pragma_attribute.c gnu_trunk.p5/gcc/gcc/testsuite/gcc.target/arm/pragma_attribute.c
--- gnu_trunk.p4/gcc/gcc/testsuite/gcc.target/arm/pragma_attribute.c	1970-01-01 01:00:00.0 +0100
+++ gnu_trunk.p5/gcc/gcc/testsuite/gcc.target/arm/pragma_attribute.c	2015-05-06 14:37:31.215314738 +0200
@@ -0,0 +1,35 @@
+/* Test for #pragma target macros.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_thumb1_ok } */
+
+#pragma GCC target ("thumb")
+
+#ifndef __thumb__
+#error __thumb__ is not defined
+#endif
+
+#ifdef __thumb2__
+#ifndef __ARM_32BIT_STATE
+#error  __ARM_32BIT_STATE is not defined
+#endif
+#else /* thumb1 */
+#ifdef __ARM_32BIT_STATE
+#error  __ARM_32BIT_STATE is defined
+#endif
+#endif /* thumb1 */
+
+#pragma GCC target ("arm")
+
+#ifdef __thumb__
+#error __thumb__ is defined
+#endif
+
+#if defined (__thumb2__) || defined (__thumb1__)
+#error thumb is defined
+#endif 
+
+#ifndef __ARM_32BIT_STATE
+#error  __ARM_32BIT_STATE is not defined
+#endif
+
+#pragma GCC reset_options


[PATCH, ARM] attribute target (thumb,arm) [2/6] respin (4th)

2015-05-06 Thread Christian Bruel
In preparation for the pragma target,

reorganize TARGET_CPU_CPP_BUILTINS to redefine mode-dependent macros
based on the current thumb_p.
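
To make the mode dependence concrete, here is a small standalone sketch
of how two of these macro values follow from thumb_p. It only mirrors the
boolean expressions used in the patch; struct arch_flags and both
function names are hypothetical and are not part of GCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the architecture flags the patch consults
   (arm_arch6, arm_arch6k, arm_arch7, arm_arch_notm, arm_arch_thumb2).  */
struct arch_flags
{
  bool arch6, arch6k, arch7, notm, thumb2;
};

/* __ARM_32BIT_STATE holds in ARM mode, or in Thumb mode on a core
   that has Thumb-2.  */
bool
target_32bit_p (const struct arch_flags *a, bool thumb_p)
{
  assert (a != NULL);
  return !thumb_p || a->thumb2;
}

/* Value of __ARM_FEATURE_LDREX for the given mode; 0 means the macro
   would be undefined.  Bits: 4 = word, 3 = byte and halfword,
   8 = doubleword.  */
int
feature_ldrex (const struct arch_flags *a, bool thumb_p)
{
  assert (a != NULL);
  bool have_ldrex = (a->arch6 && !thumb_p) || a->arch7;
  bool have_ldrexbh = (a->arch6k && !thumb_p) || a->arch7;
  bool have_ldrexd = ((a->arch6k && !thumb_p) || a->arch7) && a->notm;

  return (have_ldrex ? 4 : 0) | (have_ldrexbh ? 3 : 0)
         | (have_ldrexd ? 8 : 0);
}
```

On a pre-v7 core with only Thumb-1, the same translation unit therefore
sees different macro values on each side of a mode switch, which is why
they can no longer be defined once at startup.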

Thanks,

Christian

2014-09-23  Christian Bruel  christian.br...@st.com

	* config/arm/arm-c.c (cpp_def_or_undef): New functions.
	(arm_cpp_builtins): Likewise.
	* config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Move mode dependent
	macros to arm_cpp_builtins.
	* config/arm/arm-protos.h (arm_cpp_builtins): Declare.

diff '--exclude=.svn' -ruN gnu_trunk.p1/gcc/gcc/config/arm/arm-c.c gnu_trunk.p2/gcc/gcc/config/arm/arm-c.c
--- gnu_trunk.p1/gcc/gcc/config/arm/arm-c.c	2015-05-06 14:06:27.508142998 +0200
+++ gnu_trunk.p2/gcc/gcc/config/arm/arm-c.c	2015-05-06 14:27:45.362310057 +0200
@@ -51,3 +51,73 @@
 {
   arm_lang_output_object_attributes_hook = arm_output_c_attributes;
 }
+
+/* Define or undefine macro.  */
+
+static void
+cpp_def_or_undef (struct cpp_reader *in, const char *str, bool def_p)
+{
+  if (def_p)
+cpp_define (in, str);
+  else
+cpp_undef (in, str);
+}
+
+/* Define or undefine macros based on the current target.  If the user does
+   #pragma GCC target, we need to adjust the macros dynamically.  */
+
+void
+arm_cpp_builtins (struct cpp_reader *in, bool thumb_p)
+{
+  bool target_32bit_p = !thumb_p || arm_arch_thumb2;
+  bool thumb2_p = thumb_p && arm_arch_thumb2;
+  bool have_ldrex_p = (arm_arch6 && !thumb_p) || arm_arch7;
+  bool have_ldrexbh_p = (arm_arch6k && !thumb_p) || arm_arch7;
+  bool have_ldrexd_p = ((arm_arch6k && !thumb_p) || arm_arch7)
+    && arm_arch_notm;
+
+  int arm_feature_ldrex = (have_ldrex_p ? 4 : 0)
+    | (have_ldrexbh_p ? 3 : 0) | (have_ldrexd_p ? 8 : 0);
+
+  cpp_def_or_undef (in, "__thumb__", thumb_p);
+  if (arm_arch_thumb2)
+    cpp_def_or_undef (in, "__thumb2__", thumb_p);
+  if (TARGET_BIG_END)
+    cpp_def_or_undef (in, "__THUMBEB__", thumb_p);
+  else
+    cpp_def_or_undef (in, "__THUMBEL__", thumb_p);
+
+  cpp_def_or_undef (in, "__ARM_32BIT_STATE", target_32bit_p); /* TARGET_32BIT  */
+
+  if (arm_arch5e && (arm_arch_notm || arm_arch7))   /* TARGET_ARM_QBIT  */
+    cpp_def_or_undef (in, "__ARM_FEATURE_QBIT", target_32bit_p);
+
+  if (arm_arch6 && (arm_arch_notm || arm_arch7))    /* TARGET_ARM_SAT  */
+    cpp_def_or_undef (in, "__ARM_FEATURE_SAT", target_32bit_p);
+
+  if (arm_arch5e && (arm_arch_notm || arm_arch7em)) /* TARGET_DSP_MULTIPLY  */
+    cpp_def_or_undef (in, "__ARM_FEATURE_DSP", target_32bit_p);
+
+  if (arm_arch6 && (arm_arch_notm || arm_arch7em))  /* TARGET_INT_SIMD  */
+    cpp_def_or_undef (in, "__ARM_FEATURE_SIMD32", target_32bit_p);
+
+  /* TARGET_IDIV  */
+  cpp_def_or_undef (in, "__ARM_ARCH_EXT_IDIV__",
+		((!thumb_p && arm_arch_arm_hwdiv)
+		 || (thumb2_p && arm_arch_thumb_hwdiv)));
+
+  cpp_def_or_undef (in, "__ARM_FEATURE_IDIV",
+		((!thumb_p && arm_arch_arm_hwdiv)
+		 || (thumb2_p && arm_arch_thumb_hwdiv)));
+
+ if (arm_feature_ldrex)
+   cpp_define_formatted (in, "__ARM_FEATURE_LDREX=%d", arm_feature_ldrex);
+ else
+   cpp_undef (in, "__ARM_FEATURE_LDREX");
+
+ cpp_def_or_undef (in, "__ARM_FEATURE_CLZ",
+		   ((TARGET_ARM_ARCH >= 5 && !thumb_p) || TARGET_ARM_ARCH_ISA_THUMB >=2));
+
+ cpp_def_or_undef (in, "__ARM_ASM_SYNTAX_UNIFIED__", inline_asm_unified);
+}
+
diff '--exclude=.svn' -ruN gnu_trunk.p1/gcc/gcc/config/arm/arm.h gnu_trunk.p2/gcc/gcc/config/arm/arm.h
--- gnu_trunk.p1/gcc/gcc/config/arm/arm.h	2015-05-06 14:24:41.149994939 +0200
+++ gnu_trunk.p2/gcc/gcc/config/arm/arm.h	2015-05-06 14:27:45.362310057 +0200
@@ -48,29 +48,12 @@
 #define TARGET_CPU_CPP_BUILTINS()			\
   do			\
 {			\
-	if (TARGET_DSP_MULTIPLY)			\
-	   builtin_define (__ARM_FEATURE_DSP);	\
-if (TARGET_ARM_QBIT)\
-   builtin_define ("__ARM_FEATURE_QBIT");	\
-if (TARGET_ARM_SAT)\
-   builtin_define ("__ARM_FEATURE_SAT");	\
 if (TARGET_CRYPTO)\
 	   builtin_define ("__ARM_FEATURE_CRYPTO");	\
 	if (unaligned_access)\
 	  builtin_define ("__ARM_FEATURE_UNALIGNED");	\
 	if (TARGET_CRC32)\
 	  builtin_define ("__ARM_FEATURE_CRC32");	\
-	if (TARGET_32BIT)\
-	  builtin_define ("__ARM_32BIT_STATE");		\
-	if (TARGET_ARM_FEATURE_LDREX)\
-	  builtin_define_with_int_value (			\
-	"__ARM_FEATURE_LDREX", TARGET_ARM_FEATURE_LDREX);	\
-	if ((TARGET_ARM_ARCH >= 5 && !TARGET_THUMB)		\
-	 || TARGET_ARM_ARCH_ISA_THUMB >=2)			\
-	  builtin_define ("__ARM_FEATURE_CLZ");			\
-	if (TARGET_INT_SIMD)	\
-	  builtin_define ("__ARM_FEATURE_SIMD32");		\
-\
 	builtin_define_with_int_value (\
 	  "__ARM_SIZEOF_MINIMAL_ENUM",\
 	  flag_short_enums ? 1 : 4);\
@@ -89,10 +72,6 @@
 	if (arm_arch_notm)\
 	  builtin_define ("__ARM_ARCH_ISA_ARM");	\
 	builtin_define ("__APCS_32__");			\
-	if (TARGET_THUMB)\
-	  builtin_define ("__thumb__");			\
-	if (TARGET_THUMB2)\
-	  builtin_define ("__thumb2__");		\
 	if (TARGET_ARM_ARCH_ISA_THUMB)			\
 	  builtin_define_with_int_value (		\
 	"__ARM_ARCH_ISA_THUMB",			\
@@ -102,15 +81,9 @@
 	  {		\
 	builtin_define ("__ARMEB__");		\
 	builtin_define (__ARM_BIG_ENDIAN

Re: [PATCH, PR target/66015]: Fix alignments with attribute_optimize for aarch64

2015-05-06 Thread Christian Bruel


On 05/05/2015 02:42 PM, Marcus Shawcroft wrote:
 On 5 May 2015 at 12:07, Christian Bruel christian.br...@st.com wrote:
 This fixes PR target/66015 and a latent issue revealed by
 gcc.dg/ipa/iinline-attr.c since
 https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01609.html

 Regtested on aarch64-linux-gnu by Linaro.

 OK for trunk ?
 
 OK.  Is this issue present in gcc-5? If so can you backport?

Yes, even though there is no gcc.dg/ipa/iinline-attr.c to expose it.

Done, thanks.

Christian
 
 Thanks
 /Marcus
 


Re: [Bug target/66015] New: align directives not propagated after __attribute__ ((__optimize__ (O2)))

2015-05-05 Thread Christian Bruel
Hi Jim, Steve, Andreas

Please find here a fix for the issue reported by Andreas
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64835 for ia64.

Same as for x86 and aarch64.

I don't have the environment to run the testsuite for ia64. Would you mind
giving it a try and verifying that it fixes the issue?

many thanks

Christian





2015-05-05  Christian Bruel  christian.br...@st.com

	PR target/66015
	* config/ia64/ia64.c (ia64_option_override): Move align_loops,
	and align_functions into ia64_override_options_after_change.

2015-05-05  Christian Bruel  christian.br...@st.com

	PR target/66015
	* gcc.target/ia64/iinline-attr-1.c: New test.

Index: gcc/config/ia64/ia64.c
===
--- gcc/config/ia64/ia64.c	(revision 222803)
+++ gcc/config/ia64/ia64.c	(working copy)
@@ -6051,10 +6051,6 @@
 
   init_machine_status = ia64_init_machine_status;
 
-  if (align_functions <= 0)
-    align_functions = 64;
-  if (align_loops <= 0)
-    align_loops = 32;
   if (TARGET_ABI_OPEN_VMS)
 flag_no_common = 1;
 
@@ -6066,6 +6062,11 @@
 static void
 ia64_override_options_after_change (void)
 {
+  if (align_functions <= 0)
+    align_functions = 64;
+  if (align_loops <= 0)
+    align_loops = 32;
+
   if (optimize >= 3
       && !global_options_set.x_flag_selective_scheduling
       && !global_options_set.x_flag_selective_scheduling2)
Index: gcc/testsuite/gcc.target/ia64/iinline-attr-1.c
===
--- gcc/testsuite/gcc.target/ia64/iinline-attr-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/ia64/iinline-attr-1.c	(working copy)
@@ -0,0 +1,28 @@
+/* Verify that alignment flags are set when  attribute __optimize is used.  */
+/* { dg-do compile } */
+
+extern void non_existent(int);
+
+__attribute__ ((__optimize__ ("O2")))
+static void hooray ()
+{
+  non_existent (1);
+}
+
+__attribute__ ((__optimize__ ("O2")))
+static void hiphip (void (*f)())
+{
+  non_existent (2);
+  f ();
+}
+
+__attribute__ ((__optimize__ ("O2")))
+int test (void)
+{
+  hiphip (hooray);
+  return 0;
+}
+
+/* { dg-final { scan-assembler ".align 64" } } */
+
+


Re: [PATCH, x86] Add TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook

2015-05-05 Thread Christian Bruel
thanks for the validation and the confirmation that iinline-attr.c is
now fixed on aarch64. I can now send the patch for submission request
(this one was just illustrative).

thanks

Christian

On 05/05/2015 11:10 AM, Yvan Roux wrote:
 Hi Christian,
 
 On 4 May 2015 at 11:29, Christian Bruel christian.br...@st.com wrote:

 Hi Christian,
 I noticed case gcc.dg/ipa/iinline-attr.c failed on aarch64.  The
 original patch is x86 specific, while the case is added as general
 one.  Could you please have a look at this?

 FAIL: gcc.dg/ipa/iinline-attr.c scan-ipa-dump inline
 hooray[^\\n]*inline copy in test


 that is the same latent bug for aarch64:  alignment flags are not
 propagated with attribute optimize (O2).

 testing attached patch
 
 The patch looks good to me, maybe you can just fix the original typo
 on optimizing in the comments while moving the code.  I've
 bootstrapped and regtested it on aarch64-linux-gnu with the same
 target testcase you added on i386 (as we discussed offline) and
 everything is ok (gcc.dg/ipa/iinline-attr.c now PASS).  Notice that I
 can't approve your patch.
 
 Cheers,
 Yvan
 


Re: [PATCH, x86] Add TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook

2015-05-04 Thread Christian Bruel

 Hi Christian,
 I noticed case gcc.dg/ipa/iinline-attr.c failed on aarch64.  The
 original patch is x86 specific, while the case is added as general
 one.  Could you please have a look at this?
 
 FAIL: gcc.dg/ipa/iinline-attr.c scan-ipa-dump inline
 hooray[^\\n]*inline copy in test
 

That is the same latent bug for aarch64: alignment flags are not
propagated with attribute optimize ("O2").
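
For readers following along, the problem can be modelled roughly as
below. This is a simplified sketch, not GCC code: struct options,
TARGET_FUNCTION_ALIGN (with its value of 16) and both function names are
assumptions made for illustration. It shows why target defaults applied
only once at startup are missing from the options rebuilt for an
optimize attribute, and why applying them in the after-change hook
restores them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of the option state; the field names echo
   optimize and align_functions but this is not GCC's structure.  */
struct options
{
  int optimize;
  int align_functions;   /* 0 means "not set, use the target default" */
};

enum { TARGET_FUNCTION_ALIGN = 16 };   /* assumed target default */

/* Applied every time the option set changes, in the spirit of
   TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE.  */
void
override_after_change (struct options *o)
{
  assert (o != NULL);
  if (o->align_functions <= 0)
    o->align_functions = TARGET_FUNCTION_ALIGN;
}

/* Build the options for a function carrying optimize ("O<level>").
   The rebuilt set starts from unset alignment, so the target default
   is lost unless the after-change hook runs.  */
struct options
options_for_attribute (int level, bool run_hook)
{
  struct options o = { level, 0 };
  if (run_hook)
    override_after_change (&o);
  return o;
}
```

With run_hook false, the attribute's option set differs from the
command-line one even at the same -O level, which is the kind of
mismatch that makes the inliner refuse the call.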

testing attached patch

Christian


Index: config/aarch64/aarch64.c
===
--- config/aarch64/aarch64.c	(revision 222627)
+++ config/aarch64/aarch64.c	(working copy)
@@ -6908,18 +6908,6 @@
 #endif
 }
 
-  /* If not opzimizing for size, set the default
- alignment to what the target wants */
-  if (!optimize_size)
-{
-      if (align_loops <= 0)
-	align_loops = aarch64_tune_params->loop_align;
-      if (align_jumps <= 0)
-	align_jumps = aarch64_tune_params->jump_align;
-      if (align_functions <= 0)
-	align_functions = aarch64_tune_params->function_align;
-}
-
   if (AARCH64_TUNE_FMA_STEERING)
 aarch64_register_fma_steering ();
 
@@ -6935,6 +6923,18 @@
 flag_omit_leaf_frame_pointer = false;
   else if (flag_omit_leaf_frame_pointer)
 flag_omit_frame_pointer = true;
+
+  /* If not opzimizing for size, set the default
+ alignment to what the target wants */
+  if (!optimize_size)
+{
+      if (align_loops <= 0)
+	align_loops = aarch64_tune_params->loop_align;
+      if (align_jumps <= 0)
+	align_jumps = aarch64_tune_params->jump_align;
+      if (align_functions <= 0)
+	align_functions = aarch64_tune_params->function_align;
+}
 }
 
 static struct machine_function *


Re: [PATCH, x86] Add TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook

2015-04-30 Thread Christian Bruel
OK, I'll have a look,

thanks

Christian


On 04/30/2015 10:27 AM, Bin.Cheng wrote:
 On Mon, Apr 27, 2015 at 8:01 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Wed, Feb 4, 2015 at 2:21 PM, Christian Bruel christian.br...@st.com 
 wrote:
 While trying to reduce the PR64835 case for ARM and x86, I noticed that the
 alignment flags are cleared for x86 when attribute optimized is used.

 With the attached testcases, the visible effects are twofold :

 1) Functions compiled in with attribute optimize (-O2) are not aligned as if
 they were with the -O2 flag.

 2) can_inline_edge_p fails because opts_for_fn (caller->decl) != opts_for_fn
 (callee->decl), even though they are compiled with the same optimization
 level.

 2015-02-06  Christian Bruel  christian.br...@st.com

 PR target/64835
 * config/i386/i386.c (ix86_default_align): New function.
 (ix86_override_options_after_change): Call ix86_default_align.
 (TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE): New hook.
 (ix86_override_options_after_change): New function.

 2015-02-06  Christian Bruel  christian.br...@st.com

 PR target/64835
 * gcc.dg/ipa/iinline-attr.c: New test.
 * gcc.target/i386/iinline-attr-2.c: New test.

 OK for mainline.
 
 Hi Christian,
 I noticed case gcc.dg/ipa/iinline-attr.c failed on aarch64.  The
 original patch is x86 specific, while the case is added as general
 one.  Could you please have a look at this?
 
 FAIL: gcc.dg/ipa/iinline-attr.c scan-ipa-dump inline
 hooray[^\\n]*inline copy in test
 
 Thanks,
 bin

 Thanks,
 Uros


Re: ping: [PATCH, ARM] attribute target (thumb,arm) [0-6]

2015-04-30 Thread Christian Bruel


On 04/30/2015 09:43 AM, Ramana Radhakrishnan wrote:
 On Mon, Apr 20, 2015 at 9:35 AM, Christian Bruel christian.br...@st.com 
 wrote:
 Hello Ramana



 Can you respin this now that we are in stage1 again ?

 Ramana


 Attached the rebased, rechecked set of patches. Original with comments
 posted in

 https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02455.html
 https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02458.html
 https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02460.html
 https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02461.html
 https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02463.html
 https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02467.html
 https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02468.html

 many thanks,

 Christian
 
 
 A general note, please reply to each of the patches with a rebased
 patch as a separate email. Further more all your patches appear to
 have dos line endings so they don't seem to apply cleanly. Please
 don't have spurious headers in your patch submission - it then makes
 it hard to , please create it in a way that it is easily applied by
 someone trying it out. It looks like p4 needs a respin as I got a
 reject trying to apply the documentation patch to my tree while trying
 to apply it.
 

OK, thanks for the suggestions, and sorry for the p4 reject. The sources
are moving fast and I have a hard time keeping up with the rebases.

 I tried the following decoration on foo in gcc.target/arm/attr_arm.c
 
 
 int __attribute__((target(arm, fpu=vfpv4)))
 foo(int a)
 {
   return a ? 1 : 5;
 }
 
 
 And the compiler accepts it just fine.

Indeed, it's a mistake for now. Attributes other than the arm/thumb ones
should be rejected (eventually with a "not yet implemented" warning for
the fpu, and an error for the others) until we extend it.
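
For reference, the arm/thumb-only usage this series is meant to accept
looks roughly like the sketch below. The guard macros IN_THUMB and IN_ARM
and the function names are hypothetical, and the preprocessor condition
is an assumption about which compilers and cores accept the attribute; on
any other target the macros expand to nothing so the file still builds.

```c
/* Only apply the attributes where both instruction sets are expected to
   be available (assumed condition); elsewhere they expand to nothing.  */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 6 \
    && defined(__arm__) && defined(__ARM_ARCH_ISA_ARM) \
    && defined(__ARM_ARCH_ISA_THUMB)
# define IN_THUMB __attribute__ ((target ("thumb")))
# define IN_ARM   __attribute__ ((target ("arm")))
#else
# define IN_THUMB
# define IN_ARM
#endif

/* Compiled as Thumb code when the attribute is active.  */
IN_THUMB int
select_thumb (int a)
{
  return a ? 1 : 5;
}

/* Compiled as ARM code; the call below crosses the mode boundary
   when the attributes are active.  */
IN_ARM int
call_from_arm (int a)
{
  return select_thumb (a) + 1;
}
```

Only the mode changes between the two functions; anything else in the
attribute string (such as an fpu setting) is outside what is described
here.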

 
 Given that with LTO we are now using target attributes to decide
 inlining - I'm not convinced that the inline asm case goes away. In
 fact it only makes things worse so I'm almost convinced to forbid
 inlining from arm to thumb or vice-versa, which is a reversal of
 my earlier position. I hadn't twigged that LTO would reuse this
 infrastructure and it's probably simpler to prevent inlining in those
 cases.

I can resurrect the inline check chunk. FYI, with a few small examples,
the arm/thumb attribute is correctly handled by LTO.

 
 Thoughts ?
 
 So in essence I'm still playing with this and would like to iterate
 towards a quick solution.
 

Thanks, it would be good if we could land the arm/thumb attribute and
start the fpu extensions separately. (I'm currently playing with
fpu=neon, but it will take time to have something solid.)

Christian

 Ramana
 


ping*3: [PATCH, x86] [PR target/64835] Add TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook

2015-04-27 Thread Christian Bruel
Hi,

I'd like to re-ping the following patch for GCC 5.2. It fixes the
__attribute__ ((__optimize__ (...))) on x86. Testcase is in the patch.

thanks

Christian

On 04/13/2015 04:24 PM, Christian Bruel wrote:
 https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00243.html
 
 thanks,
 
 Christian
 


Re: ping: [PATCH, ARM] attribute target (thumb,arm) [0-6]

2015-04-20 Thread Christian Bruel
Hello Ramana


 
 Can you respin this now that we are in stage1 again ?
 
 Ramana
 

Attached the rebased, rechecked set of patches. Original with comments
posted in

https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02455.html
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02458.html
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02460.html
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02461.html
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02463.html
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02467.html
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02468.html

many thanks,

Christian
2014-09-23  Christian Bruel  christian.br...@st.com

	* config/arm/arm.c (arm_option_override): Reorganized and split.
	(arm_option_params_internal): New function.
	(arm_option_check_internal): New function.
	(arm_option_override_internal): New function.
	(restrict_default): New boolean.
	(thumb_code, thumb1_code): Remove.
	* config/arm/arm.h (TREE_TARGET_THUMB, TREE_TARGET_THUMB1): New macros.
	(TREE_TARGET_THUM2, TREE_TARGET_ARM): Likewise.
	(thumb_code, thumb1_code): Remove.
	* config/arm/arm.md (is_thumb, is_thumb1): Check TARGET flag.

diff -ruN '--exclude=.svn' a/gcc/gcc/config/arm/arm.c a1/gcc/gcc/config/arm/arm.c
--- a/gcc/gcc/config/arm/arm.c	2015-02-04 09:14:26.120602737 +0100
+++ a1/gcc/gcc/config/arm/arm.c	2015-02-05 09:19:32.853338616 +0100
@@ -846,12 +846,6 @@
 /* Nonzero if tuning for Cortex-A9.  */
 int arm_tune_cortex_a9 = 0;
 
-/* Nonzero if generating Thumb instructions.  */
-int thumb_code = 0;
-
-/* Nonzero if generating Thumb-1 instructions.  */
-int thumb1_code = 0;
-
 /* Nonzero if we should define __THUMB_INTERWORK__ in the
preprocessor.
XXX This is a bit of a hack, it's intended to help work around
@@ -2623,6 +2617,148 @@
   return std_gimplify_va_arg_expr (valist, type, pre_p, post_p);
 }
 
+/* Check any incompatible options that the user has specified.  */
+static void
+arm_option_check_internal (struct gcc_options *opts)
+{
+  /* Make sure that the processor choice does not conflict with any of the
+     other command line choices.  */
+  if (TREE_TARGET_ARM (opts) && !(insn_flags & FL_NOTM))
+    error ("target CPU does not support ARM mode");
+
+  /* TARGET_BACKTRACE calls leaf_function_p, which causes a crash if done
+     from here where no function is being compiled currently.  */
+  if ((TARGET_TPCS_FRAME || TARGET_TPCS_LEAF_FRAME) && TREE_TARGET_ARM (opts))
+    warning (0, "enabling backtrace support is only meaningful when compiling for the Thumb");
+
+  if (TREE_TARGET_ARM (opts) && TARGET_CALLEE_INTERWORKING)
+    warning (0, "enabling callee interworking support is only meaningful when compiling for the Thumb");
+
+  /* If this target is normally configured to use APCS frames, warn if they
+     are turned off and debugging is turned on.  */
+  if (TREE_TARGET_ARM (opts)
+      && write_symbols != NO_DEBUG
+      && !TARGET_APCS_FRAME
+      && (TARGET_DEFAULT & MASK_APCS_FRAME))
+    warning (0, "-g with -mno-apcs-frame may not give sensible debugging");
+
+  /* iWMMXt unsupported under Thumb mode.  */
+  if (TREE_TARGET_THUMB (opts) && TARGET_IWMMXT)
+    error ("iWMMXt unsupported under Thumb mode");
+
+  if (TARGET_HARD_TP && TREE_TARGET_THUMB1 (opts))
+    error ("can not use -mtp=cp15 with 16-bit Thumb");
+
+  if (TREE_TARGET_THUMB (opts) && TARGET_VXWORKS_RTP && flag_pic)
+    {
+      error ("RTP PIC is incompatible with Thumb");
+      flag_pic = 0;
+    }
+
+  /* We only support -mslow-flash-data on armv7-m targets.  */
+  if (target_slow_flash_data
+      && ((!(arm_arch7 && !arm_arch_notm) && !arm_arch7em)
+	  || (TREE_TARGET_THUMB1 (opts) || flag_pic || TARGET_NEON)))
+    error ("-mslow-flash-data only supports non-pic code on armv7-m targets");
+}
+
+/* Check any params depending on attributes that the user has specified.  */
+static void
+arm_option_params_internal (struct gcc_options *opts)
+{
+ /* If we are not using the default (ARM mode) section anchor offset
+ ranges, then set the correct ranges now.  */
+  if (TREE_TARGET_THUMB1 (opts))
+{
+  /* Thumb-1 LDR instructions cannot have negative offsets.
+ Permissible positive offset ranges are 5-bit (for byte loads),
+ 6-bit (for halfword loads), or 7-bit (for word loads).
+ Empirical results suggest a 7-bit anchor range gives the best
+ overall code size.  */
+  targetm.min_anchor_offset = 0;
+  targetm.max_anchor_offset = 127;
+}
+  else if (TREE_TARGET_THUMB2 (opts))
+{
+  /* The minimum is set such that the total size of the block
+ for a particular anchor is 248 + 1 + 4095 bytes, which is
+ divisible by eight, ensuring natural spacing of anchors.  */
+  targetm.min_anchor_offset = -248;
+  targetm.max_anchor_offset = 4095;
+}
+  else
+{
+  targetm.min_anchor_offset = TARGET_MIN_ANCHOR_OFFSET;
+  targetm.max_anchor_offset = TARGET_MAX_ANCHOR_OFFSET;
+}
+
+  if (optimize_size)
+{
+  /* If optimizing for size, bump the number

Re: [PATCH] Fix inlining checks wrt optimize attribute

2015-04-17 Thread Christian Bruel
On 01/22/2015 11:33 AM, Richard Biener wrote:
 On Thu, 22 Jan 2015, Christian Bruel wrote:
 
 Hi Richard,

 I thought one of my current issues would be solved by this patch, but it is
 not: I have some inlining failures with the attribute target on ARM (e.g.
 inline-3.c), where an obvious early inline fails because we fall into the
 last can_inline_edge_p case:

 opt_for_fn (callee->decl, optimize)
 >= opt_for_fn (caller->decl, optimize)))

 When callee and caller are both -O2 and targetm.target_option.can_inline_p
 was true, they should be inlined as in the general case (no
 DECL_FUNCTION_SPECIFIC_OPTIMIZATION).

 I'm currently testing this additional change:

 Index: ipa-inline.c
 ===
 --- ipa-inline.c (revision 219989)
 +++ ipa-inline.c (working copy)
 @@ -489,7 +489,7 @@
        else if (opt_for_fn (callee->decl, optimize_size)
                 < opt_for_fn (caller->decl, optimize_size)
                 || (opt_for_fn (callee->decl, optimize)
 -                   >= opt_for_fn (caller->decl, optimize)))
 +                   > opt_for_fn (caller->decl, optimize)))
          {
            if (estimate_edge_time (e)
                >= 20 + inline_edge_summary (e)->call_stmt_time)

 Since this is a hot topic for you, I thought you would have useful comments on
 this before I ask for a commit (when stage 4 closes)?
 
 Yeah - the above looks like an obvious change to me.  Thus,
 approved if it passes bootstrap/regtest.
 
 Thanks,
 Richard.
 

Thanks, and sorry for the delay (I had to wait for stage1).

Committed with the ChangeLog:

* ipa-inline.c (can_inline_edge_p): Allow inlining of functions with
same attributes.

This clears the way for

 - [PATCH, x86] [PR target/64835] Add
TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook
 https://gcc.gnu.org/ml/gcc-patches/2015-04/msg00594.html

 and

 - [PATCH, ARM] attribute target (thumb,arm) [0-6]
  https://gcc.gnu.org/ml/gcc-patches/2015-04/msg00706.html

with new regression tests.





Re: [PATCH, DWARF] re-init dw_frame_pointer_regnum between functions

2015-04-14 Thread Christian Bruel
committed, thanks

sorry for the delay.

Christian

On 10/14/2014 08:25 PM, Richard Henderson wrote:
 On 10/14/2014 06:02 AM, Christian Bruel wrote:
 2014-09-23  Christian Bruel  christian.br...@st.com

  * execute_dwarf2_frame (dw_frame_pointer_regnum): Reinitialize for each 
 function.
 
 It's tempting to make this a local variable within dwarf2out_frame_debug_expr
 and not try to cache it at all.
 
 But this is ok.
 
 
 r~
 


ping: [PATCH, x86] [PR target/64835] Add TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook

2015-04-13 Thread Christian Bruel
https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00243.html

thanks,

Christian



ping: [PATCH, ARM] attribute target (thumb,arm) [0-6]

2015-02-09 Thread Christian Bruel

Hello,

I'd like to ping with a respin of the 7 patches for
the attribute target (thumb,arm) [0-6] :

https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02455.html
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02458.html
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02460.html
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02461.html
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02463.html
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02467.html
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02468.html

In order to fix the various conflicts that have happened since, please
find attached the patches rebased to trunk rev #220529 (respectively,
from above: p0.patch, p1.patch, p2.patch, p3.patch, p4.patch, p5.patch,
p6.patch).


I understand the difficulty of reviewing those due to the code
reorganization, but maintaining them is really a pain since a conflict
happens at almost every update in the ARM back-end :-(


Comments, questions are welcome,

Many thanks

Christian




Index: gcc/ChangeLog
===
--- gcc/ChangeLog	(revision 220436)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,11 @@
+2015-02-06  Christian Bruel  christian.br...@st.com
+
+	PR target/64835
+	* config/i386/i386.c (ix86_default_align): New function.
+	(ix86_override_options_after_change): Call ix86_default_align.
+	(TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE): New hook.
+	(ix86_override_options_after_change): New function.
+
 2015-02-04  Jan Hubicka  hubi...@ucw.cz
 	Trevor Saunders  tsaund...@mozilla.com
 
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c	(revision 220436)
+++ gcc/config/i386/i386.c	(working copy)
@@ -3105,6 +3105,35 @@
 }
 
 
+/* Default align_* from the processor table.  */
+
+static void
+ix86_default_align (struct gcc_options *opts)
+{
+  if (opts->x_align_loops == 0)
+    {
+      opts->x_align_loops = processor_target_table[ix86_tune].align_loop;
+      align_loops_max_skip = processor_target_table[ix86_tune].align_loop_max_skip;
+    }
+  if (opts->x_align_jumps == 0)
+    {
+      opts->x_align_jumps = processor_target_table[ix86_tune].align_jump;
+      align_jumps_max_skip = processor_target_table[ix86_tune].align_jump_max_skip;
+    }
+  if (opts->x_align_functions == 0)
+    {
+      opts->x_align_functions = processor_target_table[ix86_tune].align_func;
+    }
+}
+
+/* Implement TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook.  */
+
+static void
+ix86_override_options_after_change (void)
+{
+  ix86_default_align (&global_options);
+}
+
 /* Override various settings based on options.  If MAIN_ARGS_P, the
    options are from the command line, otherwise they are from
    attributes.  */
@@ -3902,20 +3931,7 @@
     opts->x_ix86_regparm = REGPARM_MAX;
 
   /* Default align_* from the processor table.  */
-  if (opts->x_align_loops == 0)
-    {
-      opts->x_align_loops = processor_target_table[ix86_tune].align_loop;
-      align_loops_max_skip = processor_target_table[ix86_tune].align_loop_max_skip;
-    }
-  if (opts->x_align_jumps == 0)
-    {
-      opts->x_align_jumps = processor_target_table[ix86_tune].align_jump;
-      align_jumps_max_skip = processor_target_table[ix86_tune].align_jump_max_skip;
-    }
-  if (opts->x_align_functions == 0)
-    {
-      opts->x_align_functions = processor_target_table[ix86_tune].align_func;
-    }
+  ix86_default_align (opts);
 
   /* Provide default for -mbranch-cost= value.  */
   if (!opts_set->x_ix86_branch_cost)
@@ -51928,6 +51944,9 @@
 #undef TARGET_PROMOTE_FUNCTION_MODE
 #define TARGET_PROMOTE_FUNCTION_MODE ix86_promote_function_mode
 
+#undef  TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE
+#define TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE ix86_override_options_after_change
+
 #undef TARGET_MEMBER_TYPE_FORCES_BLK
 #define TARGET_MEMBER_TYPE_FORCES_BLK ix86_member_type_forces_blk
 
Index: gcc/dwarf2cfi.c
===
--- gcc/dwarf2cfi.c	(revision 220436)
+++ gcc/dwarf2cfi.c	(working copy)
@@ -2941,7 +2941,6 @@
   dw_trace_info cie_trace;
 
   dw_stack_pointer_regnum = DWARF_FRAME_REGNUM (STACK_POINTER_REGNUM);
-  dw_frame_pointer_regnum = DWARF_FRAME_REGNUM (HARD_FRAME_POINTER_REGNUM);
 
   memset (&cie_trace, 0, sizeof (cie_trace));
   cur_trace = &cie_trace;
@@ -2994,6 +2993,9 @@
 static unsigned int
 execute_dwarf2_frame (void)
 {
+  /* Different HARD_FRAME_POINTER_REGNUM might coexist in the same file.  */
+  dw_frame_pointer_regnum = DWARF_FRAME_REGNUM (HARD_FRAME_POINTER_REGNUM);
+
   /* The first time we're called, compute the incoming frame state.  */
   if (cie_cfi_vec == NULL)
 create_cie_data ();
Index: gcc/ipa-inline.c
===
--- gcc/ipa-inline.c	(revision 220436)
+++ gcc/ipa-inline.c	(working copy)
@@ -489,7 +489,7 @@
   else if (opt_for_fn (callee->decl, optimize_size)
 	< opt_for_fn (caller->decl, optimize_size

Re: ping: [PATCH, ARM] attribute target (thumb,arm) [0-6]

2015-02-09 Thread Christian Bruel



In order to fix the various conflicts that have happened since, please
find attached the re-based patches to trunk rev #220529 (respectively
from above p0.patch, p1.patch, p2.patch, p3.patch, p4.patch, p5.patch,
p6.patch).



oops, please don't review p0.patch here. This last one will be reviewed 
separately by the i386 and middle-end maintainers. It was posted now 
accidentally and is useful only for testing.


Thanks

Christian



[PATCH, x86] Add TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook

2015-02-04 Thread Christian Bruel
While trying to reduce the PR64835 case for ARM and x86, I noticed that
the alignment flags are cleared for x86 when attribute optimize is used.


With the attached testcases, the visible effects are twofold:

1) Functions compiled with attribute optimize (-O2) are not aligned
as they would be with the -O2 flag.


2) can_inline_edge_p fails because opts_for_fn (caller->decl) !=
opts_for_fn (callee->decl), even though they are compiled with the same
optimization level.


indeed we have:

 optimization_node 0x77522000
align_functions (0x10)
align_jumps (0x10)
align_loops (0x10)
flag_sched_stalled_insns_dep (0x1)
flag_tree_parallelize_loops (0x1)
flag_fp_contract_mode (0x2)
flag_ira_algorithm (0)
flag_ira_region (0x2)
flag_simd_cost_model (0)
flag_vect_cost_model (0x1)
optimize (0x2)
flag_aggressive_loop_optimizations (0x1)
...

optimization_node 0x77522188
flag_sched_stalled_insns_dep (0x1)
flag_tree_parallelize_loops (0x1)
flag_fp_contract_mode (0x2)
flag_ira_algorithm (0)
flag_ira_region (0x2)
flag_simd_cost_model (0)
flag_vect_cost_model (0x1)
optimize (0x2)
flag_aggressive_loop_optimizations (0x1)
...

The problem is that the alignment flags are not recomputed when setting 
the attribute flags in DECL_FUNCTION_SPECIFIC_OPTIMIZATION. Implementing 
the TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook to set them fixes the problem.
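
For illustration only, here is a minimal sketch of the kind of source that exposes the problem. It is not the attached testcase: the function names are hypothetical, and it assumes the file itself is compiled with -O2 so that caller and callee nominally use the same optimization level.

```c
/* Sketch only: the callee carries an explicit optimize attribute while
   the rest of the file is assumed to be compiled with plain -O2.
   The names callee and caller are hypothetical.  */

__attribute__ ((__optimize__ ("O2")))
int
callee (int x)
{
  return x * 2 + 1;
}

/* Compiled with the command-line options only.  Before the fix, the
   optimization node recorded for callee lacks the align_* defaults,
   so the two nodes compare as different although both are -O2.  */
int
caller (int x)
{
  return callee (x) + callee (x + 1);
}
```

Building this with -O2 -fdump-ipa-inline and looking at the inline dump is one way to check whether the call was inlined.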


NB: TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE is also used to hold options
from attribute target, so this patch is a prerequisite to fix PR64835
on ARM and x86.


Bootstrapped and regtested with no new failures for x86_64-unknown-linux-gnu.

Comments? I'd like to propose this for trunk when stage1 opens again.

Many Thanks

Christian






2015-02-06  Christian Bruel  christian.br...@st.com

	PR target/64835
	* config/i386/i386.c (ix86_default_align): New function.
	(ix86_override_options_after_change): Call ix86_default_align.
	(TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE): New hook.
	(ix86_override_options_after_change): New function.

2015-02-06  Christian Bruel  christian.br...@st.com

	PR target/64835
	* gcc.dg/ipa/iinline-attr.c: New test.
	* gcc.target/i386/iinline-attr-2.c: New test.

Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c	(revision 220394)
+++ gcc/config/i386/i386.c	(working copy)
@@ -3105,6 +3105,35 @@
 }
 
 
+/* Default align_* from the processor table.  */
+
+static void
+ix86_default_align (struct gcc_options *opts)
+{
+  if (opts->x_align_loops == 0)
+    {
+      opts->x_align_loops = processor_target_table[ix86_tune].align_loop;
+      align_loops_max_skip = processor_target_table[ix86_tune].align_loop_max_skip;
+    }
+  if (opts->x_align_jumps == 0)
+    {
+      opts->x_align_jumps = processor_target_table[ix86_tune].align_jump;
+      align_jumps_max_skip = processor_target_table[ix86_tune].align_jump_max_skip;
+    }
+  if (opts->x_align_functions == 0)
+    {
+      opts->x_align_functions = processor_target_table[ix86_tune].align_func;
+    }
+}
+
+/* Implement TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook.  */
+
+static void
+ix86_override_options_after_change (void)
+{
+  ix86_default_align (&global_options);
+}
+
 /* Override various settings based on options.  If MAIN_ARGS_P, the
    options are from the command line, otherwise they are from
    attributes.  */
@@ -3902,20 +3931,7 @@
     opts->x_ix86_regparm = REGPARM_MAX;
 
   /* Default align_* from the processor table.  */
-  if (opts->x_align_loops == 0)
-    {
-      opts->x_align_loops = processor_target_table[ix86_tune].align_loop;
-      align_loops_max_skip = processor_target_table[ix86_tune].align_loop_max_skip;
-    }
-  if (opts->x_align_jumps == 0)
-    {
-      opts->x_align_jumps = processor_target_table[ix86_tune].align_jump;
-      align_jumps_max_skip = processor_target_table[ix86_tune].align_jump_max_skip;
-    }
-  if (opts->x_align_functions == 0)
-    {
-      opts->x_align_functions = processor_target_table[ix86_tune].align_func;
-    }
+  ix86_default_align (opts);
 
   /* Provide default for -mbranch-cost= value.  */
   if (!opts_set->x_ix86_branch_cost)
@@ -51928,6 +51944,9 @@
 #undef TARGET_PROMOTE_FUNCTION_MODE
 #define TARGET_PROMOTE_FUNCTION_MODE ix86_promote_function_mode
 
+#undef  TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE
+#define TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE ix86_override_options_after_change
+
 #undef TARGET_MEMBER_TYPE_FORCES_BLK
 #define TARGET_MEMBER_TYPE_FORCES_BLK ix86_member_type_forces_blk
 
Index: gcc/testsuite/gcc.dg/ipa/iinline-attr.c
===
--- gcc/testsuite/gcc.dg/ipa/iinline-attr.c	(revision 0)
+++ gcc/testsuite/gcc.dg/ipa/iinline-attr.c	(working copy)
@@ -0,0 +1,27 @@
+/* Verify that simple indirect calls are inlined even when
+   attribute __optimize is used.  */
+/* { dg-do compile } */
+/* { dg-options -O2 -fdump-ipa-inline

Re: [PATCH] Fix inlining checks wrt optimize attribute

2015-01-22 Thread Christian Bruel

Hi Richard,

I thought one of my current issues would be solved by this patch, but it
is not: I have some inlining failures with the attribute target on ARM
(e.g. inline-3.c), where an obvious early inline fails because we fall
into the last can_inline_edge_p case:


opt_for_fn (callee->decl, optimize)
   >= opt_for_fn (caller->decl, optimize)))

When callee and caller are both -O2 and
targetm.target_option.can_inline_p was true, they should be inlined as
in the general case (no DECL_FUNCTION_SPECIFIC_OPTIMIZATION).


I'm currently testing this additional change:

Index: ipa-inline.c
===
--- ipa-inline.c	(revision 219989)
+++ ipa-inline.c	(working copy)
@@ -489,7 +489,7 @@
       else if (opt_for_fn (callee->decl, optimize_size)
                < opt_for_fn (caller->decl, optimize_size)
                || (opt_for_fn (callee->decl, optimize)
-                   >= opt_for_fn (caller->decl, optimize)))
+                   > opt_for_fn (caller->decl, optimize)))
         {
           if (estimate_edge_time (e)
               >= 20 + inline_edge_summary (e)->call_stmt_time)

Since this is a hot topic for you, I thought you would have useful
comments on this before I ask for a commit (when stage 4 closes)?
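
To make the effect of the one-character change concrete, here is a simplified standalone model of the condition. It is a sketch, not GCC code: struct fn_opts_model and both function names are hypothetical stand-ins for what opt_for_fn reads from the caller and callee declarations.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-function options read by
   opt_for_fn; only the two fields used by the condition are kept.  */
struct fn_opts_model
{
  int optimize;
  int optimize_size;
};

/* Condition before the change: equal optimize levels also enter the
   branch that only accepts cheap callees.  */
bool
cheap_callee_branch_old (const struct fn_opts_model *callee,
                         const struct fn_opts_model *caller)
{
  return callee->optimize_size < caller->optimize_size
         || callee->optimize >= caller->optimize;
}

/* Condition after the change: the branch is entered only when the
   callee is strictly more optimized than the caller.  */
bool
cheap_callee_branch_new (const struct fn_opts_model *callee,
                         const struct fn_opts_model *caller)
{
  return callee->optimize_size < caller->optimize_size
         || callee->optimize > caller->optimize;
}
```

With both functions at -O2, the old condition is true and the new one is false, so the edge is left to the general inlining heuristics.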


Cheers

Christian





On 01/22/2015 10:24 AM, Richard Biener wrote:


As said in the other thread - this makes sure we don't perform inlining
that might end up generating invalid code.  It also preserves
user-provided optimize attributes more properly.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-01-22  Richard Biener  rguent...@suse.de

* ipa-inline.c (can_inline_edge_p): Disable inlining of edges
with IL incompatible options.  Properly honor user optimize
attributes.

Index: gcc/ipa-inline.c
===
--- gcc/ipa-inline.c(revision 219929)
+++ gcc/ipa-inline.c(working copy)
@@ -404,17 +404,56 @@ can_inline_edge_p (struct cgraph_edge *e
   optimization attribute.  */
else if (caller_tree != callee_tree)
  {
-      /* gcc.dg/pr43564.c.  Look at forced inline even in -O0.  */
-      if (DECL_DISREGARD_INLINE_LIMITS (callee->decl))
+      /* There are some options that change IL semantics which means
+         we cannot inline in these cases for correctness reason.
+         Not even for always_inline declared functions.  */
+      /* Strictly speaking only when the callee contains signed integer
+         math where overflow is undefined.  */
+      if ((opt_for_fn (e->caller->decl, flag_strict_overflow)
+           != opt_for_fn (e->caller->decl, flag_strict_overflow))
+          || (opt_for_fn (e->caller->decl, flag_wrapv)
+              != opt_for_fn (e->caller->decl, flag_wrapv))
+          || (opt_for_fn (e->caller->decl, flag_trapv)
+              != opt_for_fn (e->caller->decl, flag_trapv))
+          /* Strictly speaking only when the callee contains memory
+             accesses that are not using alias-set zero anyway.  */
+          || (opt_for_fn (e->caller->decl, flag_strict_aliasing)
+              != opt_for_fn (e->caller->decl, flag_strict_aliasing))
+          /* Strictly speaking only when the callee uses FP math.  */
+          || (opt_for_fn (e->caller->decl, flag_rounding_math)
+              != opt_for_fn (e->caller->decl, flag_rounding_math))
+          || (opt_for_fn (e->caller->decl, flag_trapping_math)
+              != opt_for_fn (e->caller->decl, flag_trapping_math))
+          || (opt_for_fn (e->caller->decl, flag_unsafe_math_optimizations)
+              != opt_for_fn (e->caller->decl, flag_unsafe_math_optimizations))
+          || (opt_for_fn (e->caller->decl, flag_finite_math_only)
+              != opt_for_fn (e->caller->decl, flag_finite_math_only))
+          || (opt_for_fn (e->caller->decl, flag_signaling_nans)
+              != opt_for_fn (e->caller->decl, flag_signaling_nans))
+          || (opt_for_fn (e->caller->decl, flag_cx_limited_range)
+              != opt_for_fn (e->caller->decl, flag_cx_limited_range))
+          || (opt_for_fn (e->caller->decl, flag_signed_zeros)
+              != opt_for_fn (e->caller->decl, flag_signed_zeros))
+          || (opt_for_fn (e->caller->decl, flag_associative_math)
+              != opt_for_fn (e->caller->decl, flag_associative_math))
+          || (opt_for_fn (e->caller->decl, flag_reciprocal_math)
+              != opt_for_fn (e->caller->decl, flag_reciprocal_math))
+          /* Strictly speaking only when the callee contains function
+             calls that may end up setting errno.  */
+          || (opt_for_fn (e->caller->decl, flag_errno_math)
+              != opt_for_fn (e->caller->decl, flag_errno_math)))
+        {
+          e->inline_failed = CIF_OPTIMIZATION_MISMATCH;
+          inlinable = false;
+        }
+      /* gcc.dg/pr43564.c.  Apply user-forced inline even at -O0.  */
+      else if (DECL_DISREGARD_INLINE_LIMITS (callee->decl)
+               && lookup_attribute ("always_inline",
+  

[PATCH][SH] Check for 0 length with inlined strnlen builtin

2015-01-06 Thread Christian Bruel

Hello,

We should not enter the first iteration when the length is 0. Testcase
attached. It was difficult to reduce because register allocation
accidentally produced the correct return value.
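
As a reference for the intended semantics, here is a plain C model of a byte-wise bounded compare with the zero-length guard. It is only a sketch: model_strncmp is a hypothetical name, and it does not reflect the RTL that sh_expand_cmpnstr emits.

```c
#include <stddef.h>

/* Hypothetical reference model of a byte-wise strncmp loop.  */
int
model_strncmp (const char *s1, const char *s2, size_t len)
{
  unsigned char c1 = 0;
  unsigned char c2 = 0;

  /* The guard: with a zero length no byte may be loaded and the
     result must be 0, so the loop body is never entered.  */
  if (len == 0)
    return 0;

  do
    {
      c1 = (unsigned char) *s1++;
      c2 = (unsigned char) *s2++;
      /* Stop at the end of the string or at the first difference.  */
      if (c1 == '\0' || c1 != c2)
        break;
    }
  while (--len != 0);

  return c1 - c2;
}
```

Without the guard, a do/while loop of this shape would compare the first byte even when len is 0, which is the behaviour the patch avoids.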


testsuite OK

OK for 4.9 and trunk ?

Christian

2015-01-08  Christian Bruel  christian.br...@st.com

	PR target/64507
	* config/sh/sh-mem.cc (sh_expand_cmpnstr): Check 0 length.

2015-01-08  Christian Bruel  christian.br...@st.com

	PR target/64507
	* gcc.target/sh/pr64507.c: New test.

Index: gcc/config/sh/sh-mem.cc
===
--- gcc/config/sh/sh-mem.cc	(revision 219182)
+++ gcc/config/sh/sh-mem.cc	(working copy)
@@ -1,5 +1,5 @@
 /* Helper routines for memory move and comparison insns.
-   Copyright (C) 2013-2014 Free Software Foundation, Inc.
+   Copyright (C) 2013-2015 Free Software Foundation, Inc.
 
 This file is part of GCC.
 
@@ -421,6 +421,7 @@
 	  /* end loop.  Reached max iterations.  */
 	  if (sbytes == 0)
 	{
+	  emit_insn (gen_subsi3 (operands[0], tmp1, tmp2));
 	  jump = emit_jump_insn (gen_jump_compact (L_return));
 	  emit_barrier_after (jump);
 	}
@@ -496,6 +497,13 @@
   jump = emit_jump_insn (gen_jump_compact( L_end_loop_byte));
   emit_barrier_after (jump);
 }
+  else
+{
+  emit_insn (gen_tstsi_t (len, len));
+  emit_move_insn (operands[0], const0_rtx);
+  jump = emit_jump_insn (gen_branch_true (L_return));
+  add_int_reg_note (jump, REG_BR_PROB, prob_unlikely);
+}
 
   addr1 = adjust_automodify_address (addr1, QImode, s1_addr, 0);
   addr2 = adjust_automodify_address (addr2, QImode, s2_addr, 0);
@@ -536,10 +544,10 @@
 emit_insn (gen_zero_extendqisi2 (tmp2, gen_lowpart (QImode, tmp2)));
   emit_insn (gen_zero_extendqisi2 (tmp1, gen_lowpart (QImode, tmp1)));
 
-  emit_label (L_return);
-
   emit_insn (gen_subsi3 (operands[0], tmp1, tmp2));
 
+  emit_label (L_return);
+
   return true;
 }
 
Index: gcc/testsuite/gcc.target/sh/pr64507.c
===
--- gcc/testsuite/gcc.target/sh/pr64507.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/pr64507.c	(working copy)
@@ -0,0 +1,25 @@
+/* Check that __builtin_strncmp returns 0 with a
+   non-constant 0 length.  */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+extern int snprintf(char *, int, const char *, ...);
+extern void abort (void);
+
+int main()
+ {
+   int i;
+   int cmp = 0;
+   char buffer[1024];
+   const char* s = "the string";
+
+   snprintf(buffer, 4, "%s", s);
+
+   for (i = 1; i < 4; i++)
+     cmp += __builtin_strncmp(buffer, s, i - 1);
+
+  if (cmp)
+abort();
+
+  return 0;
+}


Re: [PATCH][SH] Check for 0 length with inlined strnlen builtin

2015-01-06 Thread Christian Bruel



Please use 'gen_cmpeqsi_t (len, const0_rtx)' for comparing a value
against zero instead of the bit test insn.


OK. Is it then also OK to replace the other occurrences of the idiom for
coding consistency? (Not sure whether I could commit this as obvious.)


Cheers

Christian




2015-01-08  Christian Bruel  christian.br...@st.com

	* config/sh/sh-mem.cc (sh_expand_cmpnstr, sh_expand_setmem):
	 Use gen_cmpeqsi instead of gen_tstsi for comparing against 0.

Index: gcc/config/sh/sh-mem.cc
===
--- gcc/config/sh/sh-mem.cc	(revision 219257)
+++ gcc/config/sh/sh-mem.cc	(working copy)
@@ -410,7 +410,7 @@
 	  else
 	{
 	  emit_insn (gen_addsi3 (lenw, lenw, GEN_INT (-1)));
-	  emit_insn (gen_tstsi_t (lenw, lenw));
+	  emit_insn (gen_cmpeqsi_t (lenw, const0_rtx));
 	}
 
 	  jump = emit_jump_insn (gen_branch_false (L_loop_long));
@@ -531,7 +531,7 @@
   else
 {
   emit_insn (gen_addsi3 (len, len, GEN_INT (-1)));
-  emit_insn (gen_tstsi_t (len, len));
+  emit_insn (gen_cmpeqsi_t (len, const0_rtx));
 }
 
   jump = emit_jump_insn (gen_branch_false (L_loop_byte));
@@ -691,7 +691,7 @@
   else
 	{
 	  emit_insn (gen_addsi3 (lenw, lenw, GEN_INT (-1)));
-	  emit_insn (gen_tstsi_t (lenw, lenw));
+	  emit_insn (gen_cmpeqsi_t (lenw, const0_rtx));
 	}
 
   emit_move_insn (dest, val);
@@ -728,7 +728,7 @@
   else
 {
   emit_insn (gen_addsi3 (len, len, GEN_INT (-1)));
-  emit_insn (gen_tstsi_t (len, len));
+  emit_insn (gen_cmpeqsi_t (len, const0_rtx));
 }
 
   val = gen_lowpart (QImode, val);


Re: [PATCH, ARM] attribute target (thumb,arm) [6/6] - [7/7]

2014-12-18 Thread Christian Bruel

Hello Ramana,

I don't know if you have started to look at it, but the attribute
support fails after upgrading.


This patch aims to catch up on the changes around the -fipa-ra and
-masm-syntax-unified options since the initial posting. They were not
tested/supported with the attribute, and of course generated new
failures after an update, as they need resetting depending on thumb mode.


It also fixes various minor thinkos. One prevented global flags
(e.g. -fschedule-insns) that depend on thumb mode from being correctly
reset in situations where -mthumb was passed on the command line and the
thumb/arm attribute was used alternately. The other is around the
-mflip-thumb option setting.


Last, a minor fix in the test attr_arm-err.c, which had false failures
with conflicting -march.


Aside, a funny thing: the ACLE doc
(https://gcc.gnu.org/onlinedocs/gcc/ARM-C-Language-Extensions-_0028ACLE_0029.html#ARM-C-Language-Extensions-_0028ACLE_0029)
already has a visionary description of the attribute. I might have missed
something, but this time the doc comes earlier than the implementation
:-) without the historical background.


Best Regards

Christian

2014-12-14  Christian Bruel  <christian.br...@st.com>

	* config/arm/arm.c (arm_option_override_internal): Add opts_set param.
	Use it to set restrict_it.  Handle flag_ipa_ra and inline_asm_unified.
	(arm_valid_target_attribute_tree): Add opts_set param.
	Initialize init_optimize.
	(init_optimize): New static variable.
	(thumb_flipper): Set.
	(arm_valid_target_attribute_p): Rewrite.
	* config/arm/arm-protos.h (arm_valid_target_attribute_tree): New
	opts_set param.
	* config/arm/arm-c.c (arm_pragma_target_parse): Likewise.
	* config/arm/arm.opt (inline_asm_unified): Save.

2014-12-14  Christian Bruel  christian.br...@st.com

* gcc.target/arm/attr_arm-err.c: Check conflicting -march options.

diff '--exclude=*~' '--exclude=.svn' -ru a/gcc/gcc/config/arm/arm.c b/gcc/gcc/config/arm/arm.c
--- a/gcc/gcc/config/arm/arm.c	2014-12-18 14:36:06.0 +0100
+++ b/gcc/gcc/config/arm/arm.c	2014-12-18 14:35:12.0 +0100
@@ -2625,9 +2625,16 @@
 }
 }
 
+/* True if -mflip-thumb should next add an attribute for the default
+   mode, false if it should next add an attribute for the opposite mode.  */
+static GTY(()) bool thumb_flipper;
+
+static GTY(()) tree init_optimize;
+
 /* Reset options between modes that the user has specified.  */
 static void
-arm_option_override_internal (struct gcc_options *opts)
+arm_option_override_internal (struct gcc_options *opts,
+			  struct gcc_options *opts_set)
 {
   if (TREE_TARGET_THUMB (opts) && !(insn_flags & FL_THUMB))
     {
@@ -2646,13 +2653,13 @@
   if (TREE_TARGET_THUMB (opts) && TARGET_CALLEE_INTERWORKING)
     opts->x_target_flags |= MASK_INTERWORK;
 
-  if (restrict_default)
+  if (! opts_set->x_arm_restrict_it)
     opts->x_arm_restrict_it = arm_arch8;
 
   if (!TREE_TARGET_THUMB2 (opts))
     opts->x_arm_restrict_it = 0;
 
-  if (TREE_TARGET_THUMB1 (opts) && opts->x_flag_schedule_insns)
+  if (TREE_TARGET_THUMB1 (opts))
     {
       /* Don't warn since it's on by default in -O2.  */
       opts->x_flag_schedule_insns = 0;
@@ -2663,9 +2670,21 @@
   if (optimize_function_for_size_p (cfun) && TREE_TARGET_THUMB2 (opts))
     opts->x_flag_shrink_wrap = false;
 
+  /* In Thumb1 mode, we emit the epilogue in RTL, but the last insn
+     - epilogue_insns - does not accurately model the corresponding insns
+     emitted in the asm file.  In particular, see the comment in thumb_exit
+     'Find out how many of the (return) argument registers we can corrupt'.
+     As a consequence, the epilogue may clobber registers without fipa-ra
+     finding out about it.  Therefore, disable fipa-ra in Thumb1 mode.
+     TODO: Accurately model clobbers for epilogue_insns and reenable
+     fipa-ra.  */
+  if (TREE_TARGET_THUMB1 (opts))
+    opts->x_flag_ipa_ra = 0;
+
   /* Thumb2 inline assembly code should always use unified syntax.
      This will apply to ARM and Thumb1 eventually.  */
-  opts->x_inline_asm_unified = TREE_TARGET_THUMB2 (opts);
+  if (TREE_TARGET_THUMB2 (opts))
+    opts->x_inline_asm_unified = 1;
 }
 
 /* Fix up any incompatible options that the user has specified.  */
@@ -3127,27 +3146,17 @@
   if (target_slow_flash_data)
 arm_disable_literal_pool = true;
 
-  /* Override flags, but not the user's one.  */
-  restrict_default = (arm_restrict_it == 2);
-
   /* Disable scheduling fusion by default if it's not armv7 processor
  or doesn't prefer ldrd/strd.  */
   if (flag_schedule_fusion == 2
       && (!arm_arch7 || !current_tune->prefer_ldrd_strd))
     flag_schedule_fusion = 0;
 
-  /* In Thumb1 mode, we emit the epilogue in RTL, but the last insn
- - epilogue_insns - does not accurately model the corresponding insns
- emitted in the asm file.  In particular, see the comment in thumb_exit
- 'Find out how many of the (return) argument registers we can corrupt'.
- As a consequence, the epilogue may clobber

Re: [PATCH, ARM] attribute target (thumb,arm) [0/6]

2014-11-28 Thread Christian Bruel

Hi Ramana,

On 11/27/2014 11:36 AM, Ramana Radhakrishnan wrote:

On Wed, Nov 19, 2014 at 2:54 PM, Christian Bruel christian.br...@st.com wrote:



On 11/19/2014 03:18 PM, Ramana Radhakrishnan wrote:


On Wed, Nov 19, 2014 at 1:24 PM, Christian Bruel christian.br...@st.com
wrote:


I think I missed the stage3, Anyway would it be OK for stage1 when it
reopens ?



Since you submitted this well during stage1 and given that these
patches address comments from earlier in the review process we should
aim to get these in for 5.0. If at some point during the review
process it looks risky and there needs to be significant rework we can
always stop. It's on the list of patches to be reviewed and I will
find some dedicated time later this week to sit down and review / play
with the patches in an attempt to move this forward as it is a
reasonably large chunk of work.



Thanks, also I forgot to mention that you need
https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01231.html
to play with the attribute. Will be part of the same shot.


Isn't that Ok'd for committing ? Why then isn't it applied ?



Yes it was; it is in my single changeset with the attribute target, since
it's the only way to trigger the issue.


Cheers

Christian





Ramana



Christian




Thanks for continuing to work on these patches and addressing the
earlier review comments.

regards
Ramana



Christian





[PATCH, ARM] attribute target (thumb,arm) [0/6]

2014-11-19 Thread Christian Bruel

Hello Ramana,

Here is the attribute revisited after your comments, so

 - thumb1 is now supported
 - -mflip-thumb option added for testing.
 - inlining is allowed between modes.

This set of patches was tested on rev#217709 as:

no regressions with:
arm-sim/
arm-sim/-march=armv7-a
arm-sim/-mthumb
arm-sim/-mthumb/-march=armv7-a

a few artifacts, all of them analyzed, with
arm-sim/-mflip-thumb/
arm-sim/-mflip-thumb//-march=armv7-a
arm-sim/-mflip-thumb//-mthumb
arm-sim/-mflip-thumb//-mthumb/-march=armv7-a


The artifacts have been analyzed; they mostly come from the testsuite being
unable to handle mixed modes in the expected results (e.g. thumb[1,2] or arm
test predicates, mixing setjmp/longjmp between modes...)


The support is split as follows:

[1/6] Reorganized arm_option_override to dynamically redefine the flags 
depending on the attribute mode.


[2/6] Reorganized macro settings to be set/unset for each #pragma targets

[3/6] Change ARM_DECLARE_FUNCTION_NAME into a function

[4/6] Implement hooks to support attribute target

[5/6] Implement #pragma target

[6/6] Add -mflip-thumb option for testing

I think I missed stage3. Anyway, would it be OK for stage1 when it
reopens?


Christian
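
As a rough illustration of what the series is meant to enable, here is a minimal sketch of per-function mode selection with the target attribute. It assumes a GCC recent enough to include this support (the `__GNUC__ >= 6` check is an assumption) and an ARM target that implements both instruction sets. The macros THUMB_MODE and ARM_MODE and the function names are hypothetical; on any other compiler or target the attributes expand to nothing.

```c
#include <assert.h>

/* Hypothetical helper macros: only use the target attribute where the
   compiler is expected to accept it (the version check is an assumption).  */
#if defined(__arm__) && defined(__GNUC__) && !defined(__clang__) \
    && __GNUC__ >= 6
# define THUMB_MODE __attribute__ ((target ("thumb")))
# define ARM_MODE   __attribute__ ((target ("arm")))
#else
# define THUMB_MODE
# define ARM_MODE
#endif

/* Requested as Thumb code, independently of -marm/-mthumb.  */
THUMB_MODE int
add_in_thumb (int a, int b)
{
  return a + b;
}

/* Requested as ARM code; the call below crosses modes on ARM targets.  */
ARM_MODE int
twice_in_arm (int a)
{
  return add_in_thumb (a, a);
}

/* Small self-check showing the intended use.  */
void
mode_attribute_demo (void)
{
  assert (twice_in_arm (21) == 42);
}
```

On a Thumb-only or ARM-only core one of the two attributes would be expected to be rejected, so a real test would need the matching effective-target checks.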



[PATCH, ARM] attribute target (thumb,arm) [1/6]

2014-11-19 Thread Christian Bruel

In preparation for the target attribute,

reorganize arm_option_override into 3 entities:
arm_option_override_internal
arm_option_check_internal
arm_option_params_internal

Also define and use TREE_TARGET macros instead of file-scope variables 
in the machine description.


Thanks,

Christian
2014-09-23  Christian Bruel  christian.br...@st.com

	* config/arm/arm.c (arm_option_override): Reorganized and split.
	(arm_option_params_internal): New function.
	(arm_option_check_internal): New function.
	(arm_option_override_internal): New function.
	(restrict_default): New boolean.
	(thumb_code, thumb1_code): Remove.
	* config/arm/arm.h (TREE_TARGET_THUMB, TREE_TARGET_THUMB1): New macros.
	(TREE_TARGET_THUMB2, TREE_TARGET_ARM): Likewise.
	(thumb_code, thumb1_code): Remove.
	* config/arm/arm.md (is_thumb, is_thumb1): Check TARGET flag.

diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN a/gcc/gcc/config/arm/arm.c b/gcc/gcc/config/arm/arm.c
--- a/gcc/gcc/config/arm/arm.c	2014-11-18 08:35:38.0 +0100
+++ b/gcc/gcc/config/arm/arm.c	2014-11-18 09:11:19.0 +0100
@@ -744,6 +744,9 @@ const struct arm_fpu_desc *arm_fpu_desc;
 rtx thumb_call_via_label[14];
 static int thumb_call_reg_needed;
 
+/* Remember default option is used.  */
+static bool restrict_default;
+
 /* Bit values used to identify processor capabilities.  */
 #define FL_CO_PROC    (1 << 0)        /* Has external co-processor bus */
 #define FL_ARCH3M     (1 << 1)        /* Extended multiply */
@@ -886,12 +889,6 @@ int arm_tune_wbuf = 0;
 /* Nonzero if tuning for Cortex-A9.  */
 int arm_tune_cortex_a9 = 0;
 
-/* Nonzero if generating Thumb instructions.  */
-int thumb_code = 0;
-
-/* Nonzero if generating Thumb-1 instructions.  */
-int thumb1_code = 0;
-
 /* Nonzero if we should define __THUMB_INTERWORK__ in the
preprocessor.
XXX This is a bit of a hack, it's intended to help work around
@@ -2601,6 +2598,136 @@ arm_gimplify_va_arg_expr (tree valist, t
   return std_gimplify_va_arg_expr (valist, type, pre_p, post_p);
 }
 
+/* Check any incompatible options that the user has specified.  */
+static void
+arm_option_check_internal (struct gcc_options *opts)
+{
+  /* Make sure that the processor choice does not conflict with any of the
+     other command line choices.  */
+  if (TREE_TARGET_ARM (opts) && !(insn_flags & FL_NOTM))
+    error ("target CPU does not support ARM mode");
+
+  /* TARGET_BACKTRACE calls leaf_function_p, which causes a crash if done
+     from here where no function is being compiled currently.  */
+  if ((TARGET_TPCS_FRAME || TARGET_TPCS_LEAF_FRAME) && TREE_TARGET_ARM (opts))
+    warning (0, "enabling backtrace support is only meaningful when compiling for the Thumb");
+
+  if (TREE_TARGET_ARM (opts) && TARGET_CALLEE_INTERWORKING)
+    warning (0, "enabling callee interworking support is only meaningful when compiling for the Thumb");
+
+  /* If this target is normally configured to use APCS frames, warn if they
+     are turned off and debugging is turned on.  */
+  if (TREE_TARGET_ARM (opts)
+      && write_symbols != NO_DEBUG
+      && !TARGET_APCS_FRAME
+      && (TARGET_DEFAULT & MASK_APCS_FRAME))
+    warning (0, "-g with -mno-apcs-frame may not give sensible debugging");
+
+  /* iWMMXt unsupported under Thumb mode.  */
+  if (TREE_TARGET_THUMB (opts) && TARGET_IWMMXT)
+    error ("iWMMXt unsupported under Thumb mode");
+
+  if (TARGET_HARD_TP && TREE_TARGET_THUMB1 (opts))
+    error ("can not use -mtp=cp15 with 16-bit Thumb");
+
+  if (TREE_TARGET_THUMB (opts) && TARGET_VXWORKS_RTP && flag_pic)
+    {
+      error ("RTP PIC is incompatible with Thumb");
+      flag_pic = 0;
+    }
+
+  /* We only support -mslow-flash-data on armv7-m targets.  */
+  if (target_slow_flash_data
+      && ((!(arm_arch7 && !arm_arch_notm) && !arm_arch7em)
+	  || (TREE_TARGET_THUMB1 (opts) || flag_pic || TARGET_NEON)))
+    error ("-mslow-flash-data only supports non-pic code on armv7-m targets");
+}
+
+/* Check any params depending on attributes that the user has specified.  */
+static void
+arm_option_params_internal (struct gcc_options *opts)
+{
+ /* If we are not using the default (ARM mode) section anchor offset
+ ranges, then set the correct ranges now.  */
+  if (TREE_TARGET_THUMB1 (opts))
+{
+  /* Thumb-1 LDR instructions cannot have negative offsets.
+ Permissible positive offset ranges are 5-bit (for byte loads),
+ 6-bit (for halfword loads), or 7-bit (for word loads).
+ Empirical results suggest a 7-bit anchor range gives the best
+ overall code size.  */
+  targetm.min_anchor_offset = 0;
+  targetm.max_anchor_offset = 127;
+}
+  else if (TREE_TARGET_THUMB2 (opts))
+{
+  /* The minimum is set such that the total size of the block
+ for a particular anchor is 248 + 1 + 4095 bytes, which is
+ divisible by eight, ensuring natural spacing of anchors.  */
+  targetm.min_anchor_offset = -248;
+  targetm.max_anchor_offset = 4095;
+}
+  else
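
The section-anchor ranges chosen in arm_option_params_internal can be checked with a small standalone sketch. The struct and function names below are hypothetical stand-ins for the two targetm fields; the ARM-mode branch is cut off in the hunk above, so the sketch only reports that no override applies in that case instead of guessing its values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for targetm.min_anchor_offset/max_anchor_offset.  */
struct anchor_range
{
  int min_offset;
  int max_offset;
};

/* Fill *R and return true when the mode overrides the range; return
   false for ARM mode, whose values are not visible in the quoted hunk.  */
bool
anchor_range_for_mode (bool thumb1, bool thumb2, struct anchor_range *r)
{
  if (thumb1)
    {
      /* Positive 7-bit offsets only: 0 .. 2^7 - 1.  */
      r->min_offset = 0;
      r->max_offset = (1 << 7) - 1;
      return true;
    }
  if (thumb2)
    {
      r->min_offset = -248;
      r->max_offset = 4095;
      return true;
    }
  return false;
}

/* Number of bytes reachable from one anchor, counting offset 0.  */
int
anchor_block_size (const struct anchor_range *r)
{
  return -r->min_offset + 1 + r->max_offset;
}

/* Self-check of the arithmetic stated in the patch comments.  */
void
anchor_range_demo (void)
{
  struct anchor_range r;

  assert (anchor_range_for_mode (true, false, &r));
  assert (r.min_offset == 0 && r.max_offset == 127);

  assert (anchor_range_for_mode (false, true, &r));
  assert (anchor_block_size (&r) == 248 + 1 + 4095);
  assert (anchor_block_size (&r) % 8 == 0);

  assert (!anchor_range_for_mode (false, false, &r));
}
```

The Thumb-2 block size works out to 4344 bytes, which is 543 * 8, matching the "divisible by eight" remark in the comment.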

[PATCH, ARM] attribute target (thumb,arm) [2/6]

2014-11-19 Thread Christian Bruel

In preparation for the pragma target,

reorganize TARGET_CPU_CPP_BUILTINS to redefine mode-dependent macros
based on the current thumb_p.


Thanks,

Christian
2014-09-23  Christian Bruel  christian.br...@st.com

	* config/arm/arm-c.c (cpp_def_or_undef): New functions.
	(arm_cpp_builtins): Likewise.
	* config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Move mode dependent
	macros to arm_cpp_builtins.
	* config/arm/arm-protos.h (arm_cpp_builtins): Declare.

diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN b/gcc/gcc/config/arm/arm-c.c c/gcc/gcc/config/arm/arm-c.c
--- b/gcc/gcc/config/arm/arm-c.c	2014-11-05 14:34:49.0 +0100
+++ c/gcc/gcc/config/arm/arm-c.c	2014-11-13 11:16:59.0 +0100
@@ -42,3 +42,73 @@ arm_lang_object_attributes_init (void)
 {
   arm_lang_output_object_attributes_hook = arm_output_c_attributes;
 }
+
+/* Define or undefine macro.  */
+
+static void
+cpp_def_or_undef (struct cpp_reader *in, const char *str, bool def_p)
+{
+  if (def_p)
+    cpp_define (in, str);
+  else
+    cpp_undef (in, str);
+}
+
+/* Define or undefine macros based on the current target.  If the user does
+   #pragma GCC target, we need to adjust the macros dynamically.  */
+
+void
+arm_cpp_builtins (struct cpp_reader *in, bool thumb_p)
+{
+  bool target_32bit_p = !thumb_p || arm_arch_thumb2;
+  bool thumb2_p = thumb_p && arm_arch_thumb2;
+  bool have_ldrex_p = (arm_arch6 && !thumb_p) || arm_arch7;
+  bool have_ldrexbh_p = (arm_arch6k && !thumb_p) || arm_arch7;
+  bool have_ldrexd_p = ((arm_arch6k && !thumb_p) || arm_arch7)
+    && arm_arch_notm;
+
+  int arm_feature_ldrex = (have_ldrex_p ? 4 : 0)
+    | (have_ldrexbh_p ? 3 : 0) | (have_ldrexd_p ? 8 : 0);
+
+  cpp_def_or_undef (in, "__thumb__", thumb_p);
+  if (arm_arch_thumb2)
+    cpp_def_or_undef (in, "__thumb2__", thumb_p);
+  if (TARGET_BIG_END)
+    cpp_def_or_undef (in, "__THUMBEB__", thumb_p);
+  else
+    cpp_def_or_undef (in, "__THUMBEL__", thumb_p);
+
+  cpp_def_or_undef (in, "__ARM_32BIT_STATE", target_32bit_p); /* TARGET_32BIT  */
+
+  if (arm_arch5e && (arm_arch_notm || arm_arch7))   /* TARGET_ARM_QBIT  */
+    cpp_def_or_undef (in, "__ARM_FEATURE_QBIT", target_32bit_p);
+
+  if (arm_arch6 && (arm_arch_notm || arm_arch7))    /* TARGET_ARM_SAT  */
+    cpp_def_or_undef (in, "__ARM_FEATURE_SAT", target_32bit_p);
+
+  if (arm_arch5e && (arm_arch_notm || arm_arch7em)) /* TARGET_DSP_MULTIPLY  */
+    cpp_def_or_undef (in, "__ARM_FEATURE_DSP", target_32bit_p);
+
+  if (arm_arch6 && (arm_arch_notm || arm_arch7em))  /* TARGET_INT_SIMD  */
+    cpp_def_or_undef (in, "__ARM_FEATURE_SIMD32", target_32bit_p);
+
+ /* TARGET_IDIV  */
+  cpp_def_or_undef (in, "__ARM_ARCH_EXT_IDIV__",
+		((!thumb_p && arm_arch_arm_hwdiv)
+		 || (thumb2_p && arm_arch_thumb_hwdiv)));
+
+  cpp_def_or_undef (in, "__ARM_FEATURE_IDIV",
+		((!thumb_p && arm_arch_arm_hwdiv)
+		 || (thumb2_p && arm_arch_thumb_hwdiv)));
+
+ if (arm_feature_ldrex)
+   cpp_define_formatted (in, "__ARM_FEATURE_LDREX=%d", arm_feature_ldrex);
+ else
+   cpp_undef (in, "__ARM_FEATURE_LDREX");
+
+ cpp_def_or_undef (in, "__ARM_FEATURE_CLZ",
+		   ((TARGET_ARM_ARCH >= 5 && !thumb_p) || TARGET_ARM_ARCH_ISA_THUMB >=2));
+
+ cpp_def_or_undef (in, "__ARM_ASM_SYNTAX_UNIFIED__", inline_asm_unified);
+}
+
+
diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN b/gcc/gcc/config/arm/arm.h c/gcc/gcc/config/arm/arm.h
--- b/gcc/gcc/config/arm/arm.h	2014-11-13 12:16:50.0 +0100
+++ c/gcc/gcc/config/arm/arm.h	2014-11-13 12:16:37.0 +0100
@@ -48,29 +48,12 @@ extern char arm_arch_name[];
 #define TARGET_CPU_CPP_BUILTINS()			\
   do			\
 {			\
-	if (TARGET_DSP_MULTIPLY)			\
-	   builtin_define ("__ARM_FEATURE_DSP");	\
-        if (TARGET_ARM_QBIT)				\
-           builtin_define ("__ARM_FEATURE_QBIT");	\
-        if (TARGET_ARM_SAT)				\
-           builtin_define ("__ARM_FEATURE_SAT");	\
         if (TARGET_CRYPTO)				\
 	   builtin_define ("__ARM_FEATURE_CRYPTO");	\
 	if (unaligned_access)				\
 	  builtin_define ("__ARM_FEATURE_UNALIGNED");	\
 	if (TARGET_CRC32)				\
 	  builtin_define ("__ARM_FEATURE_CRC32");	\
-	if (TARGET_32BIT)				\
-	  builtin_define ("__ARM_32BIT_STATE");		\
-	if (TARGET_ARM_FEATURE_LDREX)			\
-	  builtin_define_with_int_value (		\
-	    "__ARM_FEATURE_LDREX", TARGET_ARM_FEATURE_LDREX);	\
-	if ((TARGET_ARM_ARCH >= 5 && !TARGET_THUMB)	\
-	     || TARGET_ARM_ARCH_ISA_THUMB >=2)		\
-	  builtin_define ("__ARM_FEATURE_CLZ");		\
-	if (TARGET_INT_SIMD)				\
-	  builtin_define ("__ARM_FEATURE_SIMD32");	\
-							\
 	builtin_define_with_int_value (			\
 	  "__ARM_SIZEOF_MINIMAL_ENUM",			\
 	  flag_short_enums ? 1 : 4);			\
@@ -89,10 +72,6 @@ extern char arm_arch_name[];
 	if (arm_arch_notm)				\
 	  builtin_define ("__ARM_ARCH_ISA_ARM");	\
 	builtin_define ("__APCS_32__");			\
-	if (TARGET_THUMB)				\
-	  builtin_define ("__thumb__");			\
-	if (TARGET_THUMB2)				\
-	  builtin_define ("__thumb2__");		\
 	if (TARGET_ARM_ARCH_ISA_THUMB)			\
 	  builtin_define_with_int_value (		\
 	__ARM_ARCH_ISA_THUMB
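
To see why macros such as __ARM_FEATURE_LDREX have to be recomputed when the mode changes, the bitmask logic from arm_cpp_builtins can be lifted into a standalone sketch. The struct arch_flags and the function name are hypothetical; in the patch these values come from GCC's global arm_arch* variables. The flag combinations in the self-check are illustrative inputs, not a claim about any specific CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bundle of the architecture flags the patch reads from
   global variables.  */
struct arch_flags
{
  bool arch6;
  bool arch6k;
  bool arch7;
  bool notm;	/* Not an M-profile core.  */
};

/* Same expression as in the patch: bit 2 for word, bits 0-1 for
   byte/halfword, bit 3 for doubleword exclusive accesses.  */
int
ldrex_feature_mask (const struct arch_flags *a, bool thumb_p)
{
  bool have_ldrex = (a->arch6 && !thumb_p) || a->arch7;
  bool have_ldrexbh = (a->arch6k && !thumb_p) || a->arch7;
  bool have_ldrexd = ((a->arch6k && !thumb_p) || a->arch7) && a->notm;

  return (have_ldrex ? 4 : 0) | (have_ldrexbh ? 3 : 0)
	 | (have_ldrexd ? 8 : 0);
}

/* Self-check: the value depends on the mode for pre-v7 flag sets.  */
void
ldrex_mask_demo (void)
{
  struct arch_flags v6_like = { true, false, false, true };
  struct arch_flags v7_like = { true, true, true, true };
  struct arch_flags v7_m_like = { true, true, true, false };

  assert (ldrex_feature_mask (&v6_like, false) == 4);
  assert (ldrex_feature_mask (&v6_like, true) == 0);
  assert (ldrex_feature_mask (&v7_like, false) == 15);
  assert (ldrex_feature_mask (&v7_like, true) == 15);
  assert (ldrex_feature_mask (&v7_m_like, true) == 7);
}
```

With the first flag set the macro would be defined as 4 in ARM mode and undefined in Thumb mode, which is the case the define-or-undefine helper is there to handle.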

[PATCH, ARM] attribute target (thumb,arm) [3/6]

2014-11-19 Thread Christian Bruel
Re-implement ARM_DECLARE_FUNCTION_NAME as a function. That will make
changes related to unified/divided and mode directives easier to insert.


Thanks

Christian
2014-09-23  Christian Bruel  christian.br...@st.com

	* config/arm/arm-protos.h (arm_declare_function_name): Declare.
	(is_called_in_ARM_mode): Remove.
	* config/arm/arm.c (is_called_in_ARM_mode): Declare static bool.
	(arm_declare_function_name): Moved from ARM_DECLARE_FUNCTION_NAME.
	* config/arm/arm.h (ARM_DECLARE_FUNCTION_NAME): Call
	 arm_declare_function_name.

diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN c/gcc/gcc/config/arm/arm.c d/gcc/gcc/config/arm/arm.c
--- c/gcc/gcc/config/arm/arm.c	2014-11-18 08:52:48.0 +0100
+++ d/gcc/gcc/config/arm/arm.c	2014-11-18 08:51:50.0 +0100
@@ -26422,6 +26422,23 @@ thumb_pop (FILE *f, unsigned long mask)
   fprintf (f, "}\n");
 }
 
+/* Return nonzero if FUNC must be entered in ARM mode.  */
+static bool
+is_called_in_ARM_mode (tree func)
+{
+  gcc_assert (TREE_CODE (func) == FUNCTION_DECL);
+
+  /* Ignore the problem about functions whose address is taken.  */
+  if (TARGET_CALLEE_INTERWORKING && TREE_PUBLIC (func))
+    return true;
+
+#ifdef ARM_PE
+  return lookup_attribute ("interfacearm", DECL_ATTRIBUTES (func)) != NULL_TREE;
+#else
+  return false;
+#endif
+}
+
 /* Generate code to return from a thumb function.
If 'reg_containing_return_addr' is -1, then the return address is
actually on the stack, at the stack pointer.  */
@@ -26857,22 +26874,6 @@ thumb_far_jump_used_p (void)
   return 0;
 }
 
-/* Return nonzero if FUNC must be entered in ARM mode.  */
-int
-is_called_in_ARM_mode (tree func)
-{
-  gcc_assert (TREE_CODE (func) == FUNCTION_DECL);
-
-  /* Ignore the problem about functions whose address is taken.  */
-  if (TARGET_CALLEE_INTERWORKING && TREE_PUBLIC (func))
-    return TRUE;
-
-#ifdef ARM_PE
-  return lookup_attribute ("interfacearm", DECL_ATTRIBUTES (func)) != NULL_TREE;
-#else
-  return FALSE;
-#endif
-}
 
 /* Given the stack offsets and register mask in OFFSETS, decide how
many additional registers to push instead of subtracting a constant
@@ -32386,6 +32387,25 @@ arm_is_constant_pool_ref (rtx x)
 	  && CONSTANT_POOL_ADDRESS_P (XEXP (x, 0)));
 }
 
+void
+arm_declare_function_name (FILE *stream, const char *name, tree decl)
+{
+  if (TARGET_THUMB)
+    {
+      if (is_called_in_ARM_mode (decl)
+	  || (TARGET_THUMB1 && !TARGET_THUMB1_ONLY
+	      && cfun->is_thunk))
+	fprintf (stream, "\t.code 32\n");
+      else if (TARGET_THUMB1)
+	fprintf (stream, "\t.code\t16\n\t.thumb_func\n");
+      else
+	fprintf (stream, "\t.thumb\n\t.thumb_func\n");
+    }
+
+  if (TARGET_POKE_FUNCTION_NAME)
+    arm_poke_function_name (stream, (const char *) name);
+}
+
 /* If MEM is in the form of [base+offset], extract the two parts
of address and set to BASE and OFFSET, otherwise return false
after clearing BASE and OFFSET.  */
@@ -32506,4 +32526,5 @@ arm_sched_fusion_priority (rtx_insn *ins
   *pri = tmp;
   return;
 }
+
 #include "gt-arm.h"
diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN c/gcc/gcc/config/arm/arm.h d/gcc/gcc/config/arm/arm.h
--- c/gcc/gcc/config/arm/arm.h	2014-11-13 12:16:37.0 +0100
+++ d/gcc/gcc/config/arm/arm.h	2014-11-13 12:19:45.0 +0100
@@ -2184,23 +2184,7 @@ extern int making_const_table;
? 1 : 0)
 
 #define ARM_DECLARE_FUNCTION_NAME(STREAM, NAME, DECL) 	\
-  do							\
-    {							\
-      if (TARGET_THUMB) 				\
-        {						\
-          if (is_called_in_ARM_mode (DECL)		\
-	      || (TARGET_THUMB1 && !TARGET_THUMB1_ONLY	\
-		  && cfun->is_thunk))	\
-            fprintf (STREAM, "\t.code 32\n") ;		\
-          else if (TARGET_THUMB1)			\
-           fprintf (STREAM, "\t.code\t16\n\t.thumb_func\n") ;	\
-          else						\
-           fprintf (STREAM, "\t.thumb\n\t.thumb_func\n") ;	\
-        }						\
-      if (TARGET_POKE_FUNCTION_NAME)			\
-        arm_poke_function_name (STREAM, (const char *) NAME);	\
-    }							\
-  while (0)
+  arm_declare_function_name ((STREAM), (NAME), (DECL));
 
 /* For aliases of functions we use .thumb_set instead.  */
 #define ASM_OUTPUT_DEF_FROM_DECLS(FILE, DECL1, DECL2)		\
diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN c/gcc/gcc/config/arm/arm-protos.h d/gcc/gcc/config/arm/arm-protos.h
--- c/gcc/gcc/config/arm/arm-protos.h	2014-11-13 11:16:17.0 +0100
+++ d/gcc/gcc/config/arm/arm-protos.h	2014-11-13 12:21:36.0 +0100
@@ -30,6 +30,7 @@ extern void arm_load_pic_register (unsig
 extern int arm_volatile_func (void);
 extern void arm_expand_prologue (void);
 extern void arm_expand_epilogue (bool);
+extern void arm_declare_function_name (FILE *, const char *, tree);
 extern void thumb2_expand_return (bool);
 extern const char *arm_strip_name_encoding (const char *);
 extern void arm_asm_output_labelref (FILE *, const char *);
@@ -178,9 +179,6 @@ extern const char *thumb1_unexpanded_epi
 extern void
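
The directive selection that moves from the macro into arm_declare_function_name can be exercised outside the compiler with a small sketch. The function name and its boolean parameters are hypothetical substitutes for TARGET_THUMB, TARGET_THUMB1, TARGET_THUMB1_ONLY, is_called_in_ARM_mode (decl) and cfun->is_thunk; the strings are the ones emitted by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return the assembler directives patch 3/6 would print before a
   function label, given the (hypothetical) flattened inputs.  */
const char *
function_mode_directive (bool thumb, bool thumb1, bool thumb1_only,
			 bool called_in_arm_mode, bool is_thunk)
{
  if (!thumb)
    return "";	/* Nothing is emitted for ARM-mode functions here.  */

  if (called_in_arm_mode || (thumb1 && !thumb1_only && is_thunk))
    return "\t.code 32\n";

  if (thumb1)
    return "\t.code\t16\n\t.thumb_func\n";

  return "\t.thumb\n\t.thumb_func\n";
}

/* Self-check covering each branch once.  */
void
function_mode_directive_demo (void)
{
  assert (strcmp (function_mode_directive (false, false, false,
					   false, false), "") == 0);
  assert (strcmp (function_mode_directive (true, true, false,
					   false, true),
		  "\t.code 32\n") == 0);
  assert (strcmp (function_mode_directive (true, true, true,
					   false, true),
		  "\t.code\t16\n\t.thumb_func\n") == 0);
  assert (strcmp (function_mode_directive (true, false, false,
					   false, false),
		  "\t.thumb\n\t.thumb_func\n") == 0);
}
```

Having this logic in a function instead of a macro is what lets patch 4/6 add the syntax and mode directives in one place.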

[PATCH, ARM] attribute target (thumb,arm) [4/6]

2014-11-19 Thread Christian Bruel

Implement and document the hooks to support target attributes.

The emission of blx is handled directly for armv5 to work around a bug in
current binutils, which fails on calls to a static symbol in a different
section (e.g. .text -> .text.startup) in different modes.

(ref https://sourceware.org/bugzilla/show_bug.cgi?id=17505)

Regtests included

Thanks

Christian
2014-09-23  Christian Bruel  christian.br...@st.com

	* config/arm/arm.opt (mthumb): Save.
	* config/arm/arm.h (arm_valid_target_attribute_tree): Declare.
	(arm_reset_previous_fndecl, arm_change_mode_p): Likewise.
	(SWITCHABLE_TARGET): Define.
	* config/arm/arm.c (arm_reset_previous_fndecl): New functions.
	(arm_valid_target_attribute_tree, arm_change_mode_p): Likewise.
	(arm_valid_target_attribute_p): Likewise.
	(arm_set_current_function, arm_can_inline_p): Likewise.
	(arm_valid_target_attribute_rec): Likewise.
	(arm_previous_fndecl): New variable.
	(TARGET_SET_CURRENT_FUNCTION, TARGET_OPTION_VALID_ATTRIBUTE_P): Define.
	(TARGET_CAN_INLINE_P): Define.
	(arm_asm_trampoline_template): Emit mode.
	(arm_file_start): Don't set unified syntax.
	(arm_declare_function_name): Set unified syntax and mode.
	(arm_option_override): Init target_option_default_node
	and target_option_current_node.
	* config/arm/arm.md (*call_value_symbol): Set mode when possible.
	(*call_symbol): Likewise.
	* doc/extend.texi: Document ARM target and pragma attribute.
	* doc/invoke.texi: Likewise.

diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN d/gcc/gcc/config/arm/arm.c e/gcc/gcc/config/arm/arm.c
--- d/gcc/gcc/config/arm/arm.c	2014-11-18 08:51:50.0 +0100
+++ e/gcc/gcc/config/arm/arm.c	2014-11-18 09:05:57.0 +0100
@@ -80,6 +80,7 @@
 #include "opts.h"
 #include "dumpfile.h"
 #include "gimple-expr.h"
+#include "target-globals.h"
 #include "builtins.h"
 #include "tm-constrs.h"
 #include "rtl-iter.h"
@@ -258,6 +259,9 @@ static tree arm_build_builtin_va_list (v
 static void arm_expand_builtin_va_start (tree, rtx);
 static tree arm_gimplify_va_arg_expr (tree, tree, gimple_seq *, gimple_seq *);
 static void arm_option_override (void);
+static void arm_set_current_function (tree);
+static bool arm_can_inline_p (tree, tree);
+static bool arm_valid_target_attribute_p (tree, tree, tree, int);
 static unsigned HOST_WIDE_INT arm_shift_truncation_mask (machine_mode);
 static bool arm_cannot_copy_insn_p (rtx_insn *);
 static int arm_issue_rate (void);
@@ -400,6 +404,9 @@ static const struct attribute_spec arm_a
 #undef  TARGET_ASM_FUNCTION_EPILOGUE
 #define TARGET_ASM_FUNCTION_EPILOGUE arm_output_function_epilogue
 
+#undef TARGET_CAN_INLINE_P
+#define TARGET_CAN_INLINE_P arm_can_inline_p
+
 #undef  TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE arm_option_override
 
@@ -412,6 +419,12 @@ static const struct attribute_spec arm_a
 #undef  TARGET_SCHED_ADJUST_COST
 #define TARGET_SCHED_ADJUST_COST arm_adjust_cost
 
+#undef TARGET_SET_CURRENT_FUNCTION
+#define TARGET_SET_CURRENT_FUNCTION arm_set_current_function
+
+#undef TARGET_OPTION_VALID_ATTRIBUTE_P
+#define TARGET_OPTION_VALID_ATTRIBUTE_P arm_valid_target_attribute_p
+
 #undef TARGET_SCHED_REORDER
 #define TARGET_SCHED_REORDER arm_sched_reorder
 
@@ -3205,6 +3218,11 @@ arm_option_override (void)
 
   /* Register global variables with the garbage collector.  */
   arm_add_gc_roots ();
+
+  /* Save the initial options in case the user does function specific
+     options.  */
+  target_option_default_node = target_option_current_node
+    = build_target_option_node (&global_options);
 }
 
 static void
@@ -3358,13 +3376,20 @@ arm_warn_func_return (tree decl)
 static void
 arm_asm_trampoline_template (FILE *f)
 {
+  if (TARGET_UNIFIED_ASM)
+    fprintf (f, "\t.syntax unified\n");
+  else
+    fprintf (f, "\t.syntax divided\n");
+
   if (TARGET_ARM)
     {
+      fprintf (f, "\t.arm\n");
       asm_fprintf (f, "\tldr\t%r, [%r, #0]\n", STATIC_CHAIN_REGNUM, PC_REGNUM);
       asm_fprintf (f, "\tldr\t%r, [%r, #0]\n", PC_REGNUM, PC_REGNUM);
     }
   else if (TARGET_THUMB2)
     {
+      fprintf (f, "\t.thumb\n");
       /* The Thumb-2 trampoline is similar to the arm implementation.
 	 Unlike 16-bit Thumb, we enter the stub in thumb mode.  */
       asm_fprintf (f, "\tldr.w\t%r, [%r, #4]\n",
@@ -26874,6 +26899,23 @@ thumb_far_jump_used_p (void)
   return 0;
 }
 
+/* Check that FUNC is called with a different mode.  */
+
+bool
+arm_change_mode_p (tree func)
+{
+  if (TREE_CODE (func) != FUNCTION_DECL)
+return false;
+
+  tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (func);
+
+  if (!callee_tree)
+callee_tree = target_option_default_node;
+
+  struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
+
+  return (TREE_TARGET_THUMB (callee_opts) != TARGET_THUMB);
+}
 
 /* Given the stack offsets and register mask in OFFSETS, decide how
many additional registers to push instead of subtracting a constant
@@ -28448,9 +28490,6 @@ arm_file_start (void)
 {
   int val
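
The decision made by arm_change_mode_p, including the fallback to the default options when a function carries no target attribute, can be modelled in a few lines. The structs target_opts and fn_decl below are hypothetical simplifications of cl_target_option and a FUNCTION_DECL; only the comparison logic is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the saved per-function target options.  */
struct target_opts
{
  bool thumb;
};

/* Hypothetical stand-in for a function declaration; SPECIFIC is NULL
   when the function has no target attribute of its own.  */
struct fn_decl
{
  const struct target_opts *specific;
};

/* True when calling CALLEE from code compiled with CURRENT_THUMB
   switches instruction set.  */
bool
change_mode_p (const struct fn_decl *callee,
	       const struct target_opts *default_opts,
	       bool current_thumb)
{
  const struct target_opts *opts
    = callee->specific ? callee->specific : default_opts;

  return opts->thumb != current_thumb;
}

/* Self-check with an ARM default and one Thumb-attributed callee.  */
void
change_mode_demo (void)
{
  struct target_opts arm_default = { false };
  struct target_opts thumb_attr = { true };
  struct fn_decl plain = { NULL };
  struct fn_decl thumb_fn = { &thumb_attr };

  assert (!change_mode_p (&plain, &arm_default, false));
  assert (change_mode_p (&plain, &arm_default, true));
  assert (change_mode_p (&thumb_fn, &arm_default, false));
  assert (!change_mode_p (&thumb_fn, &arm_default, true));
}
```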

[PATCH, ARM] attribute target (thumb,arm) [5/6]

2014-11-19 Thread Christian Bruel

Implement the hooks for #pragma GCC target.

A test is included to check that macros are correctly defined/undefined in
pragma regions.


Thanks

Christian
2014-09-23  Christian Bruel  christian.br...@st.com

	* config/arm/arm.h (REGISTER_TARGET_PRAGMAS):
	 Call arm_register_target_pragmas.
	* config/arm/arm-protos.h (arm_register_target_pragmas): Declare.
	* config/arm/arm-c.c (arm_register_target_pragmas): New function.
	(arm_pragma_target_parse): Likewise.

diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN e/gcc/gcc/config/arm/arm-c.c f/gcc/gcc/config/arm/arm-c.c
--- e/gcc/gcc/config/arm/arm-c.c	2014-11-13 13:03:27.0 +0100
+++ f/gcc/gcc/config/arm/arm-c.c	2014-11-13 13:53:45.0 +0100
@@ -20,9 +20,12 @@
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"
-#include "tm_p.h"
 #include "tree.h"
+#include "tm_p.h"
 #include "c-family/c-common.h"
+#include "target.h"
+#include "target-def.h"
+#include "c-family/c-pragma.h"
 
 /* Output C specific EABI object attributes.  These can not be done in
arm.c because they require information from the C frontend.  */
@@ -112,3 +115,78 @@ arm_cpp_builtins (struct cpp_reader *in,
  cpp_def_or_undef (in, "__ARM_ASM_SYNTAX_UNIFIED__", inline_asm_unified);
 }
 
+/* Hook to validate the current #pragma GCC target and set the arch custom
+   mode state.  If ARGS is NULL, then POP_TARGET is used to reset
+   the options.  */
+static bool
+arm_pragma_target_parse (tree args, tree pop_target)
+{
+  tree prev_tree = build_target_option_node (&global_options);
+  tree cur_tree;
+  struct cl_target_option *prev_opt;
+  struct cl_target_option *cur_opt;
+  bool cur_mode, prev_mode;
+
+  if (! args)
+    {
+      cur_tree = ((pop_target) ? pop_target : target_option_default_node);
+      cl_target_option_restore (&global_options,
+				TREE_TARGET_OPTION (cur_tree));
+    }
+  else
+    {
+      cur_tree = arm_valid_target_attribute_tree (args, &global_options);
+      if (cur_tree == NULL_TREE)
+	{
+	  cl_target_option_restore (&global_options,
+				    TREE_TARGET_OPTION (prev_tree));
+	  return false;
+	}
+    }
+
+  target_option_current_node = cur_tree;
+  arm_reset_previous_fndecl ();
+
+  /* Figure out the previous mode.  */
+  prev_opt  = TREE_TARGET_OPTION (prev_tree);
+  cur_opt   = TREE_TARGET_OPTION (cur_tree);
+
+  gcc_assert (prev_opt);
+  gcc_assert (cur_opt);
+
+  cur_mode = TARGET_THUMB_P (cur_opt->x_target_flags);
+  prev_mode = TARGET_THUMB_P (prev_opt->x_target_flags);
+
+  if (prev_mode != cur_mode)
+    {
+      /* For the definitions, ensure all newly defined macros are considered
+	 as used for -Wunused-macros.  There is no point warning about the
+	 compiler predefined macros.  */
+      cpp_options *cpp_opts = cpp_get_options (parse_in);
+      unsigned char saved_warn_unused_macros = cpp_opts->warn_unused_macros;
+      cpp_opts->warn_unused_macros = 0;
+
+      /* Update macros.  */
+      arm_cpp_builtins (parse_in, cur_mode);
+
+      cpp_opts->warn_unused_macros = saved_warn_unused_macros;
+    }
+
+  return true;
+}
+
+/* Register target pragmas.  We need to add the hook for parsing #pragma GCC
+   option here rather than in arm.c since it will pull in various preprocessor
+   functions, and those are not present in languages like fortran without a
+   preprocessor.  */
+
+void
+arm_register_target_pragmas (void)
+{
+  /* Update pragma hook to allow parsing #pragma GCC target.  */
+  targetm.target_option.pragma_parse = arm_pragma_target_parse;
+
+#ifdef REGISTER_SUBTARGET_PRAGMAS
+  REGISTER_SUBTARGET_PRAGMAS ();
+#endif
+}
diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN e/gcc/gcc/config/arm/arm.h f/gcc/gcc/config/arm/arm.h
--- e/gcc/gcc/config/arm/arm.h	2014-11-13 13:48:59.0 +0100
+++ f/gcc/gcc/config/arm/arm.h	2014-11-13 13:48:01.0 +0100
@@ -2097,7 +2097,8 @@ extern int making_const_table;
   c_register_pragma (0, "long_calls", arm_pr_long_calls);		\
   c_register_pragma (0, "no_long_calls", arm_pr_no_long_calls);		\
   c_register_pragma (0, "long_calls_off", arm_pr_long_calls_off);	\
-  arm_lang_object_attributes_init(); \
+  arm_lang_object_attributes_init();	\
+  arm_register_target_pragmas();   \
 } while (0)
 
 /* Condition code information.  */
diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN e/gcc/gcc/config/arm/arm-protos.h f/gcc/gcc/config/arm/arm-protos.h
--- e/gcc/gcc/config/arm/arm-protos.h	2014-11-13 13:07:47.0 +0100
+++ f/gcc/gcc/config/arm/arm-protos.h	2014-11-13 13:49:50.0 +0100
@@ -306,6 +306,7 @@ extern const char *arm_rewrite_selected_
 
 /* Defined in gcc/common/config/arm-c.c.  */
 extern void arm_lang_object_attributes_init(void);
+extern void arm_register_target_pragmas (void);
 extern void arm_cpp_builtins (struct cpp_reader *, bool);
 
 extern bool arm_is_constant_pool_ref (rtx);
2014-09-23  Christian Bruel  christian.br...@st.com

	* gcc.target/arm
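
The test entry above is cut off, so the following is not the test from the patch. It is a minimal sketch, in the same spirit, of checking that __thumb__ follows #pragma GCC target inside a push/pop region. It assumes an ordinary compile with a GCC that includes this support and an ARM target with Thumb available (the version check is an assumption); the macro THUMB_SEEN_IN_REGION and the function names are hypothetical. On other compilers or targets the check is skipped and reports -1.

```c
#include <assert.h>

#if defined(__arm__) && defined(__GNUC__) && !defined(__clang__) \
    && __GNUC__ >= 6
# pragma GCC push_options
# pragma GCC target ("thumb")
/* Inside the region the pragma hook is expected to have defined
   __thumb__, whatever the command-line mode was.  */
# ifdef __thumb__
#  define THUMB_SEEN_IN_REGION 1
# else
#  define THUMB_SEEN_IN_REGION 0
# endif
# pragma GCC pop_options
#else
/* Not applicable on this compiler/target.  */
# define THUMB_SEEN_IN_REGION (-1)
#endif

/* 1 if __thumb__ was visible in the region, 0 if it was not,
   -1 if the check did not apply.  */
int
thumb_seen_in_pragma_region (void)
{
  return THUMB_SEEN_IN_REGION;
}

/* Self-check: a working implementation never yields 0.  */
void
pragma_region_demo (void)
{
  int seen = thumb_seen_in_pragma_region ();

  assert (seen == 1 || seen == -1);
}
```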

[PATCH, ARM] attribute target (thumb,arm) [6/6]

2014-11-19 Thread Christian Bruel
Implement the -mflip-thumb option. It is undocumented and intended for
internal testing only. This option artificially inserts alternating thumb/arm
target attributes on functions.


This closes the patch set. Thanks for your review,

Christian
2014-09-23  Christian Bruel  christian.br...@st.com

	* config/arm/arm.c (add_attribute, arm_insert_attributes): New functions
	(TARGET_INSERT_ATTRIBUTES): Define.
	(thumb_flipper): New var.
	* config/arm/arm.opt (-mflip-thumb): New switch.

diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN f/gcc/gcc/config/arm/arm.c g/gcc/gcc/config/arm/arm.c
--- f/gcc/gcc/config/arm/arm.c	2014-11-18 13:16:44.0 +0100
+++ g/gcc/gcc/config/arm/arm.c	2014-11-19 14:04:22.0 +0100
@@ -218,6 +218,7 @@ static void arm_encode_section_info (tre
 
 static void arm_file_end (void);
 static void arm_file_start (void);
+static void arm_insert_attributes (tree, tree *);
 
 static void arm_setup_incoming_varargs (cumulative_args_t, machine_mode,
 	tree, int *, int);
@@ -370,6 +371,9 @@ static const struct attribute_spec arm_a
 #undef  TARGET_ATTRIBUTE_TABLE
 #define TARGET_ATTRIBUTE_TABLE arm_attribute_table
 
+#undef  TARGET_INSERT_ATTRIBUTES
+#define TARGET_INSERT_ATTRIBUTES arm_insert_attributes
+
 #undef TARGET_ASM_FILE_START
 #define TARGET_ASM_FILE_START arm_file_start
 #undef TARGET_ASM_FILE_END
@@ -29403,6 +29407,56 @@ arm_valid_target_attribute_tree (tree ar
   return t;
 }
 
+/* True if -mflip-thumb should next add an attribute for the default
+   mode, false if it should next add an attribute for the opposite mode.  */
+static GTY(()) bool thumb_flipper = TARGET_THUMB;
+
+static void 
+add_attribute  (const char * mode, tree *attributes)
+{
+  size_t len = strlen (mode);
+  tree value = build_string (len, mode);
+
+  TREE_TYPE (value) = build_array_type (char_type_node,
+	build_index_type (size_int (len)));
+
+  *attributes = tree_cons (get_identifier ("target"),
+			   build_tree_list (NULL_TREE, value),
+			   *attributes);
+}
+
+/* For testing. Insert thumb or arm modes alternatively on functions.  */
+
+static void
+arm_insert_attributes (tree fndecl, tree * attributes)
+{
+  const char *mode;
+
+  if (TREE_CODE (fndecl) != FUNCTION_DECL || DECL_EXTERNAL(fndecl)
+  || DECL_BUILT_IN (fndecl) || DECL_ARTIFICIAL (fndecl))
+   return;
+
+  /* Nested definitions must inherit mode.  */
+  if (current_function_decl)
+   {
+     mode = TARGET_THUMB ? "thumb" : "arm";
+ add_attribute (mode, attributes);
+ return;
+   }
+
+  if (! TARGET_FLIP_THUMB)
+return;
+
+  /* If there is already a setting don't change it.  */
+  if (lookup_attribute ("target", *attributes) != NULL)
+    return;
+
+  mode = thumb_flipper ? "thumb" : "arm";
+  add_attribute (mode, attributes);
+
+  thumb_flipper = !thumb_flipper;
+}
+
 /* Hook to validate attribute((target(string))).  */
 
 static bool
diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN f/gcc/gcc/config/arm/arm.opt g/gcc/gcc/config/arm/arm.opt
--- f/gcc/gcc/config/arm/arm.opt	2014-11-13 14:05:34.0 +0100
+++ g/gcc/gcc/config/arm/arm.opt	2014-11-19 13:59:46.0 +0100
@@ -122,6 +122,10 @@ Enum(float_abi_type) String(softfp) Valu
 EnumValue
 Enum(float_abi_type) String(hard) Value(ARM_FLOAT_ABI_HARD)
 
+mflip-thumb
+Target Report Var(TARGET_FLIP_THUMB)
+Switch ARM/Thumb modes on alternating functions for compiler testing
+
 mfp16-format=
 Target RejectNegative Joined Enum(arm_fp16_format_type) Var(arm_fp16_format) Init(ARM_FP16_FORMAT_NONE)
 Specify the __fp16 floating-point format
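
The alternation performed by arm_insert_attributes under -mflip-thumb can be sketched without any GCC internals. The function flip_mode and its parameters are hypothetical; the nested-function case from the patch (inherit the enclosing mode) is left out to keep the sketch short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Pick the target attribute string for the next function.  An existing
   attribute is kept and does not advance the flipper; otherwise the
   current flipper value is used and then inverted.  */
const char *
flip_mode (bool *flipper, const char *existing_target)
{
  if (existing_target != NULL)
    return existing_target;

  const char *mode = *flipper ? "thumb" : "arm";
  *flipper = !*flipper;
  return mode;
}

/* Self-check starting from an ARM default (flipper == false).  */
void
flip_mode_demo (void)
{
  bool flipper = false;

  assert (strcmp (flip_mode (&flipper, NULL), "arm") == 0);
  assert (strcmp (flip_mode (&flipper, NULL), "thumb") == 0);
  /* Explicit attribute: returned unchanged, flipper untouched.  */
  assert (strcmp (flip_mode (&flipper, "thumb"), "thumb") == 0);
  assert (strcmp (flip_mode (&flipper, NULL), "arm") == 0);
  assert (flipper == true);
}
```

Starting the flipper at the default mode means the first function keeps the command-line mode and the second one is the first to be switched.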


Re: [PATCH, ARM] attribute target (thumb,arm) [0/6]

2014-11-19 Thread Christian Bruel



On 11/19/2014 03:18 PM, Ramana Radhakrishnan wrote:

On Wed, Nov 19, 2014 at 1:24 PM, Christian Bruel christian.br...@st.com wrote:


I think I missed the stage3, Anyway would it be OK for stage1 when it
reopens ?


Since you submitted this well during stage1 and given that these
patches address comments from earlier in the review process we should
aim to get these in for 5.0. If at some point during the review
process it looks risky and there needs to be significant rework we can
always stop. It's on the list of patches to be reviewed and I will
find some dedicated time later this week to sit down and review / play
with the patches in an attempt to move this forward as it is a
reasonably large chunk of work.


Thanks, also I forgot to mention that you need
https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01231.html
to play with the attribute. Will be part of the same shot.

Christian



Thanks for continuing to work on these patches and addressing the
earlier review comments.

regards
Ramana



Christian



Re: [2/2][PATCH,ARM]Generate UAL assembly code for Thumb-1 target

2014-11-07 Thread Christian Bruel

hi,

the ARM bootstrap seems to fail for libgcc2.c on the thumb multilib for 
libgcc2: muldi3 -mthumb -O2  -g


/tmp/ccYrycUw.s: Assembler messages:
/tmp/ccYrycUw.s:69: Error: MOV Rd, Rs with two low registers is not 
permitted on this architecture -- `mov r6,r7'


preprocessed attached.

Thanks

Christian


typedef int ptrdiff_t;
typedef unsigned int size_t;
typedef unsigned int wchar_t;
typedef struct {
  long long __max_align_ll __attribute__((__aligned__(__alignof__(long long))));
  long double __max_align_ld __attribute__((__aligned__(__alignof__(long double))));
} max_align_t;
extern void *malloc (size_t);
extern void free (void *);
extern int atexit (void (*)(void));
extern void abort (void) __attribute__ ((__noreturn__));
extern size_t strlen (const char *);
extern void *memcpy (void *, const void *, size_t);
extern void *memset (void *, int, size_t);
typedef unsigned int hashval_t;
typedef hashval_t (*htab_hash) (const void *);
typedef int (*htab_eq) (const void *, const void *);
typedef void (*htab_del) (void *);
typedef int (*htab_trav) (void **, void *);
typedef void *(*htab_alloc) (size_t, size_t);
typedef void (*htab_free) (void *);
typedef void *(*htab_alloc_with_arg) (void *, size_t, size_t);
typedef void (*htab_free_with_arg) (void *, void *);
struct htab {
  htab_hash hash_f;
  htab_eq eq_f;
  htab_del del_f;
  void ** entries;
  size_t size;
  size_t n_elements;
  size_t n_deleted;
  unsigned int searches;
  unsigned int collisions;
  htab_alloc alloc_f;
  htab_free free_f;
  void * alloc_arg;
  htab_alloc_with_arg alloc_with_arg_f;
  htab_free_with_arg free_with_arg_f;
  unsigned int size_prime_index;
};
typedef struct htab *htab_t;
enum insert_option {NO_INSERT, INSERT};
extern htab_t htab_create_alloc (size_t, htab_hash,
htab_eq, htab_del,
htab_alloc, htab_free);
extern htab_t htab_create_alloc_ex (size_t, htab_hash,
  htab_eq, htab_del,
  void *, htab_alloc_with_arg,
  htab_free_with_arg);
extern htab_t htab_create_typed_alloc (size_t, htab_hash, htab_eq, htab_del,
 htab_alloc, htab_alloc, htab_free);
extern htab_t htab_create (size_t, htab_hash, htab_eq, htab_del);
extern htab_t htab_try_create (size_t, htab_hash, htab_eq, htab_del);
extern void htab_set_functions_ex (htab_t, htab_hash,
   htab_eq, htab_del,
   void *, htab_alloc_with_arg,
   htab_free_with_arg);
extern void htab_delete (htab_t);
extern void htab_empty (htab_t);
extern void * htab_find (htab_t, const void *);
extern void ** htab_find_slot (htab_t, const void *, enum insert_option);
extern void * htab_find_with_hash (htab_t, const void *, hashval_t);
extern void ** htab_find_slot_with_hash (htab_t, const void *,
   hashval_t, enum insert_option);
extern void htab_clear_slot (htab_t, void **);
extern void htab_remove_elt (htab_t, void *);
extern void htab_remove_elt_with_hash (htab_t, void *, hashval_t);
extern void htab_traverse (htab_t, htab_trav, void *);
extern void htab_traverse_noresize (htab_t, htab_trav, void *);
extern size_t htab_size (htab_t);
extern size_t htab_elements (htab_t);
extern double htab_collisions (htab_t);
extern htab_hash htab_hash_pointer;
extern htab_eq htab_eq_pointer;
extern hashval_t htab_hash_string (const void *);
extern hashval_t iterative_hash (const void *, size_t, hashval_t);
extern int filename_cmp (const char *s1, const char *s2);
extern int filename_ncmp (const char *s1, const char *s2,
 size_t n);
extern hashval_t filename_hash (const void *s);
extern int filename_eq (const void *s1, const void *s2);
struct _dont_use_rtx_here_;
struct _dont_use_rtvec_here_;
struct _dont_use_rtx_insn_here_;
union _dont_use_tree_here_;
enum function_class {
  function_c94,
  function_c99_misc,
  function_c99_math_complex,
  function_sincos,
  function_c11_misc
};
enum memmodel
{
  MEMMODEL_RELAXED = 0,
  MEMMODEL_CONSUME = 1,
  MEMMODEL_ACQUIRE = 2,
  MEMMODEL_RELEASE = 3,
  MEMMODEL_ACQ_REL = 4,
  MEMMODEL_SEQ_CST = 5,
  MEMMODEL_LAST = 6
};
typedef void (*gt_pointer_operator) (void *, void *);
typedef unsigned char uchar;
enum debug_info_type
{
  NO_DEBUG,
  DBX_DEBUG,
  SDB_DEBUG,
  DWARF2_DEBUG,
  XCOFF_DEBUG,
  VMS_DEBUG,
  VMS_AND_DWARF2_DEBUG
};
enum debug_info_levels
{
  DINFO_LEVEL_NONE,
  DINFO_LEVEL_TERSE,
  DINFO_LEVEL_NORMAL,
  DINFO_LEVEL_VERBOSE
};
enum debug_info_usage
{
  DINFO_USAGE_DFN,
  DINFO_USAGE_DIR_USE,
  DINFO_USAGE_IND_USE,
  DINFO_USAGE_NUM_ENUMS
};
enum debug_struct_file
{
  DINFO_STRUCT_FILE_NONE,
  DINFO_STRUCT_FILE_BASE,
  DINFO_STRUCT_FILE_SYS,
  DINFO_STRUCT_FILE_ANY
};
enum symbol_visibility
{
  VISIBILITY_DEFAULT,
  VISIBILITY_PROTECTED,
  VISIBILITY_HIDDEN,
  VISIBILITY_INTERNAL
};
enum ivar_visibility
{
  IVAR_VISIBILITY_PRIVATE,
  IVAR_VISIBILITY_PROTECTED,
  

[PATCH, DWARF] re-init dw_frame_pointer_regnum between functions

2014-10-14 Thread Christian Bruel
Hello,

ARM and Thumb modes use different hard_frame_pointer_regnum ABIs. The
problem is that dwarf2cfi.c:dw_frame_pointer_regnum cache is initialized
only once per file, when creating the CIE. 
While testing the ARM target attribute to switch modes between
functions, I got a few assertion failures with -g, because this value
becomes inconsistent with the respective FDEs, which have different
hard_frame_pointer_rtx values.

The following snippet from dwarf2cfi.c illustrates the potential issue: a
mismatch between hard_frame_pointer_rtx and a badly set CFA register:

 if (dest == hard_frame_pointer_rtx)
   ...
  cur_cfa->reg = dw_frame_pointer_regnum;
  ...

I'm not aware of other targets that make it possible to change the
frame_pointer_regnum ABI within a file, so the issue will only show up
with the ARM target attribute. However, I would very much like your feedback
on this change before I send the remaining ARM parts.
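
To make the stale-cache problem concrete, here is a minimal standalone sketch.
It is not GCC code: function_state, the helper names and the register numbers
(r11 for ARM, r7 for Thumb) are assumptions chosen only for illustration. It
contrasts a file-scope value filled once with one refreshed for every function.

```c
/* Hypothetical DWARF register numbers for the two frame pointers:
   r11 in ARM mode, r7 in Thumb mode.  */
enum { ARM_FP_REGNUM = 11, THUMB_FP_REGNUM = 7 };

/* Stand-in for the per-function target state.  */
struct function_state { int thumb_p; };

int hard_frame_pointer_regnum (const struct function_state *fn)
{
  return fn->thumb_p ? THUMB_FP_REGNUM : ARM_FP_REGNUM;
}

/* File-scope cache, playing the role of dw_frame_pointer_regnum.  */
int cached_fp_regnum = -1;

/* Old scheme: the cache is filled once, by the first function seen.  */
int cfa_reg_cached_once (const struct function_state *fn)
{
  if (cached_fp_regnum < 0)
    cached_fp_regnum = hard_frame_pointer_regnum (fn);
  return cached_fp_regnum;
}

/* Patched scheme: the cache is refreshed on entry to every function.  */
int cfa_reg_per_function (const struct function_state *fn)
{
  cached_fp_regnum = hard_frame_pointer_regnum (fn);
  return cached_fp_regnum;
}
```

With an ARM function followed by a Thumb function, the first scheme keeps
reporting the ARM register for the Thumb function, which is the inconsistency
described above.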

Tested manually for arm-none-eabi with gdb, unwinding and frame access
seem OK when mixing modes.
x86 bootstrap and regression tests are running.

Many thanks,

Christian





2014-09-23  Christian Bruel  <christian.br...@st.com>

	* dwarf2cfi.c (execute_dwarf2_frame): Reinitialize
	dw_frame_pointer_regnum for each function.

Index: dwarf2cfi.c
===
--- dwarf2cfi.c	(revision 216146)
+++ dwarf2cfi.c	(working copy)
@@ -2860,7 +2860,6 @@
   dw_trace_info cie_trace;
 
   dw_stack_pointer_regnum = DWARF_FRAME_REGNUM (STACK_POINTER_REGNUM);
-  dw_frame_pointer_regnum = DWARF_FRAME_REGNUM (HARD_FRAME_POINTER_REGNUM);
 
   memset (&cie_trace, 0, sizeof (cie_trace));
   cur_trace = &cie_trace;
@@ -2913,6 +2912,9 @@
 static unsigned int
 execute_dwarf2_frame (void)
 {
+  /* Different HARD_FRAME_POINTER_REGNUM might coexist in the same file.  */
+  dw_frame_pointer_regnum = DWARF_FRAME_REGNUM (HARD_FRAME_POINTER_REGNUM);
+
   /* The first time we're called, compute the incoming frame state.  */
   if (cie_cfi_vec == NULL)
     create_cie_data ();


Re: [PATCH, ARM] attribute target (thumb,arm)

2014-10-10 Thread Christian Bruel

On 10/09/2014 04:11 PM, Richard Earnshaw wrote:
 On 09/10/14 12:35, Christian Bruel wrote:
 On 10/08/2014 06:56 PM, Ramana Radhakrishnan wrote:
 Hi Christian,
  snipped agreed stuff 
 3) about inlining
I dislike inlining different modes, From a conceptual use, a user
 might want to switch mode only when changing a function's hotness.
 Usually inlining a cold function into a hot one is not what the user
 explicitly required when setting different mode attributes for them,
 __attribute__((thumb)) should not imply coldness or hotness. Inlining 
 between cold and hot functions should be done based on profile feedback. 
 The choice of compiling in Thumb1 state for coldness is a separate one 
 because that's where the choice needs to be made.
 Ideally yes. but I think that a user motivation to use target attribute
 ((thumb) is to reduce code size even in the cases where PFO is not
 available (libraries, kernel or user application build spec packaged
 without profile data). And there are cases where static probabilities
 are not enough and that a user wants it own control with gprof or oprofile.
 But in this case, we could point to the __attribute__ ((cold)) on the
 function ? That would probably be the best workaround to propose if we
 recommend this

 Hot vs cold is interesting, but arm/thumb shouldn't be used to imply
 that.  The days when ARM=fast, thumb=small are in the past now, and
 thumb2 code should be both fast and small.  Indeed, smaller thumb2 code
 can be faster than larger ARM code simply because you can get more of it
 in the cache.  The use of arm vs thumb is likely to be much more subtle now.

I'm also very interested in this. In my last benchmarking session, ARM mode
could bring a speedup (from noise up to 5-6%) but with a very big size
penalty. So I believe there is room for fine tuning at the application
level, and I agree this is very subtle and difficult, but this is another
topic. (That was with GCC 4.8; maybe the gap has narrowed since.)


 But here is another scenario: Using of attribute ((arm)) for exception
 entry points is indeed not related to hotness. But consider a typical
 thumb binary with an entry point in arm compiled in C (ex handler, a
 kernel...). Today due to the file boundary the thumb part is not inlined
 into the arm point. (Using -flto is not possible because the whole
 gimple would be thumb).

 We have no-inline attributes for scenarios like that.  I don't think a
 specific use case should dominate other cases.

That's severe: a no-inline attribute would also disable inlining between
functions of the same mode!


 Now, using attribute ((target)) for the functions others than the
 entry point, with your approach they would all be inlined (assuming the
 cost allow this) and we would end up with a arm binary instead of a
 thumb binary...

 But there are still 3 points  :

 - At least 2 other target (i386, Powerpc) that support attribute_target
 disable inlining between modes that are not subsets. I like to think
 about homogeneity between targets and I find odd to have different
 inlining rules...

 That's because use of specific instructions must not be allowed to leak
 past a gating check that is in the caller.  It would be a disaster if a
 function that used a neon register, for example, was allowed to leak
 into code that might run on a target with no Neon register file.  The
 ARM/thumb distinction shouldn't, by default, be limited in that manner.

 I believe inlining could happen from a subset of the architecture into a
 function using a superset, just not vice-versa.

I'm afraid I misunderstand this. Do you want inlining from Thumb into a
function using ARM because you consider Thumb to be a subset of ARM?
You know this better than I do, but I never thought of it that way. Or maybe
it has something to do with the unified assembler?

In that case I see the problem in a new light. With the unified
assembly syntax, we could indeed inline from any mode as long as there is no
divided-syntax asm inside.
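
For what it's worth, the subset rule mentioned above can be sketched as a
bitmask test. The feature bits below are hypothetical and are not GCC's real
target flags; the point is only that the callee must not rely on anything the
caller lacks.

```c
#include <stdbool.h>

/* Hypothetical feature bits; these are not GCC's real flags.  */
enum {
  FEAT_VFP  = 1 << 0,
  FEAT_NEON = 1 << 1,
  FEAT_FMA  = 1 << 2
};

/* The callee may be inlined only if every feature it relies on is also
   enabled in the caller, i.e. the callee's features are a subset of the
   caller's.  */
bool features_allow_inlining (unsigned caller, unsigned callee)
{
  return (callee & ~caller) == 0;
}
```

A NEON callee is rejected for a VFP-only caller, while a VFP callee is accepted
by a VFP+NEON caller. ARM versus Thumb state does not fit this model directly,
which is exactly the question raised above.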


 - Scanning the function body to check for ASM_INPUT does not look very
 elegant (if this matters) because the asm could well be unrelated

 The only case when it will always be a win to inline thumb into arm is
 when the cost of the inlined body is less than a BX instruction (but
 still, with branch prediction this cost is pondered).

 One of the problems with not inlining is that the C++ abstraction
 penalty is likely to shoot up.  There will be many major lost
 optimization opportunities if we start down that path.

I would not expect users to use this attribute extensively on
inlined member functions. But I take your point.

 So I think inlining should only be disabled if there's some technical
 reason why it should be disabled, not because of some 'it might not
 always be ideal' feelings.  Furthermore, we should expect users to use
 the other attributes consistently when they expect specific behaviours
 to occur.

Sure, I would also have preferred objective benchmark results, but
it's a little early to have that experience.

Re: [PATCH, ARM] attribute target (thumb,arm)

2014-10-09 Thread Christian Bruel

On 10/08/2014 06:56 PM, Ramana Radhakrishnan wrote:
 Hi Christian,

 snipped agreed stuff 
 3) about inlining
I dislike inlining different modes, From a conceptual use, a user
 might want to switch mode only when changing a function's hotness.
 Usually inlining a cold function into a hot one is not what the user
 explicitly required when setting different mode attributes for them,
 __attribute__((thumb)) should not imply coldness or hotness. Inlining 
 between cold and hot functions should be done based on profile feedback. 
 The choice of compiling in Thumb1 state for coldness is a separate one 
 because that's where the choice needs to be made.

Ideally yes, but I think that a user's motivation for using the target
attribute ((thumb)) is to reduce code size even in cases where PFO is not
available (libraries, kernels, or user applications whose build spec is
packaged without profile data). And there are cases where static probabilities
are not enough and a user wants their own control with gprof or oprofile.
But in this case, we could point to __attribute__ ((cold)) on the
function? That would probably be the best workaround to propose if we
recommend this.
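
As an illustration of that workaround, a user without profile data could mark
hotness explicitly. This is only a sketch: the function names are made up, and
the cold and hot attributes are standard GCC function attributes, independent
of the ARM/Thumb attribute discussed in this thread.

```c
/* Rarely executed error path: ask GCC to optimize it for size and keep it
   out of line.  */
__attribute__ ((cold, noinline))
int report_failure (void)
{
  return -1;
}

/* Frequently executed path.  */
__attribute__ ((hot))
int accumulate (const int *v, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    {
      if (v[i] < 0)
        return report_failure ();
      sum += v[i];
    }
  return sum;
}
```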

But here is another scenario: using attribute ((arm)) for exception
entry points is indeed not related to hotness. But consider a typical
Thumb binary with an ARM entry point compiled from C (e.g. a handler, a
kernel...). Today, due to the file boundary, the Thumb part is not inlined
into the ARM entry point. (Using -flto is not possible because the whole
GIMPLE would be Thumb.)

Now, using attribute ((target)) for the functions other than the
entry point, with your approach they would all be inlined (assuming the
cost allows this) and we would end up with an ARM binary instead of a
Thumb binary...

But there are still 3 points:

- At least 2 other targets (i386, PowerPC) that support the target attribute
disable inlining between modes that are not subsets. I like to think
about homogeneity between targets and I find it odd to have different
inlining rules...

- Scanning the function body to check for ASM_INPUT does not look very
elegant (if this matters) because the asm could well be unrelated.

The only case where it will always be a win to inline Thumb into ARM is
when the cost of the inlined body is less than a BX instruction (but
even then, with branch prediction, this cost is moderated).



 The compiler would take a decision that is not what the user wrote. And
 in addition if you consider the few instructions to modify R15 to switch
 state that would end up with more code executed in the critical path,
 voiding a possible size of speed gain.
 I do not expect there to be any additional instructions needed to switch 
 state. If function x is inlined into function y the state would be lost 
 and the state would be in terms of the state of function x.
Yes, indeed. I was in an LCM/mode-switching frame of mind when writing
this. In this case the mode is inherited.

 Obviously if the user doesn't want inlining - the user would add 
 attributes to disable inlining. You do have extensions such as 
 __attribute__((noinline)) and __attribute__((never_inline)) to give the 
 user that control and those bits need to be used in addition.

Those attributes are overkill. They would also disable inlining between
a caller and a callee of the same mode. This is not what we want.


 The attribute then purely reflects then the output instruction state of 
 the function if a copy of it's body is laid out separately in the output.

 IMHO, the heuristics for inlining should be the best judge of when 
 functions should be inlined between one and another and we shouldn't be 
 second guessing that in the backend

 If there is a copy of the function to be put out by the compiler, only 
 then should we choose this based on the state of the target i.e. arm 
 or thumb.

Yes,

So to summarize, we can:

  1) not inline between different modes. Same behavior as other
targets. Solves the asm case.
  2) always inline unless the function contains asm statements. (I
reject adding a new compilation switch.)
  3) always inline, but recommend the use of attribute ((noinline)) to
handle asm, or attribute ((cold,hot)) in the absence of profile data.

I obviously prefer 1), which is safe and homogeneous; 3) is the worst as it
requires additional user action (poor user); 2) is less bad.
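
The three options can be written down as one decision function. This is only a
sketch with made-up types (fn_info, inline_policy); the real hook would work on
GCC trees and target option nodes, not on a struct like this.

```c
#include <stdbool.h>

/* Hypothetical description of a function, for illustration only.  */
struct fn_info {
  bool thumb_p;   /* compiled in Thumb mode?  */
  bool has_asm;   /* body contains inline assembly?  */
};

enum inline_policy {
  POLICY_SAME_MODE_ONLY,   /* option 1 */
  POLICY_UNLESS_ASM,       /* option 2 */
  POLICY_ALWAYS            /* option 3 */
};

bool can_inline_p (enum inline_policy policy,
                   const struct fn_info *caller,
                   const struct fn_info *callee)
{
  bool same_mode = caller->thumb_p == callee->thumb_p;

  switch (policy)
    {
    case POLICY_SAME_MODE_ONLY:
      return same_mode;
    case POLICY_UNLESS_ASM:
      /* Cross-mode inlining is refused only when the callee has asm.  */
      return same_mode || !callee->has_asm;
    case POLICY_ALWAYS:
      return true;
    }
  return false;
}
```

Under option 3 the asm case is left entirely to the user, which is why it needs
the extra noinline annotation mentioned above.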

Thanks for supporting me ::)

Christian



Re: [PATCH, ARM] attribute target (thumb,arm)

2014-10-08 Thread Christian Bruel
Hi Ramana,

Thanks for your feedback. Just a few comments while you continue the review

1) About the documentation in extend.texi: it was already in the patch.
Did I miss a part?
  * doc/extend.texi (arm, thumb): Document target attributes.
  * doc/invoke.texi (arm, thumb): Mention target attributes.

2) about supporting thumb1
   OK, I'll remove this limitation. But I covered testing only for
thumb2 as I don't have a thumb1 platform; if it's OK with you, thumb1
will only be covered by visual inspection. Can you help test this mode?
   
3) about inlining
  I dislike inlining across different modes. Conceptually, a user
might want to switch mode only when changing a function's hotness.
Usually, inlining a cold function into a hot one is not what the user
explicitly asked for when setting different mode attributes on them.

The compiler would make a decision that is not what the user wrote. In
addition, if you consider the few instructions needed to modify R15 to switch
state, that would end up with more code executed in the critical path,
voiding a possible size or speed gain.

4) about coverage.
   Thanks for your idea about an mflip-like internal option for the
testsuite. I'll give it a try. Note that in the meantime I made a few
successful attempts with LTO, and I'm in the process of running a
combinatorial exploration of a set of larger benchmarks.
   Thanks for your hint about testing the -march=armv7em -march=armv7m
error cases. This is indeed needed.

5) I'm still not sure what to do with TARGET_UNIFIED_ASM. On the one
hand I'm reluctant to tie to this development an improvement that
should be orthogonal (or a prerequisite); on the other hand I don't really
like the logic with emit_thumb. If your recommendation is to make
TARGET_UNIFIED_ASM the default for ARM, that's great, but I'm still
worried about thumb1. Terry's feedback might be useful for this.

I'll resend the patch split into parts, with thumb1 support. Regarding your
inlining concern, do you agree that inlining across different modes might not
be mandatory (or might even be counter-productive) at this stage?

Best Regards

Christian

On 10/08/2014 03:05 PM, Ramana Radhakrishnan wrote:
 Hi Christian,

   Thanks for looking at this. I will need to read the code in detail but 
 this is a first top level review.

 On 09/29/14 12:03, Christian Bruel wrote:
 Hi Ramana, Richard,

 This patch implements the attribute target (and pragma) to allow
 function based interworking.

 as in the updated documentation, the syntax is:

   __attribute__((target("thumb"))) int foo()
 Forces thumb mode for function foo only. If the file was compiled with
 -mthumb it has no effect.
 Indeed


 Similarly

   __attribute__((target("arm"))) int foo()
 Forces arm mode for function foo. It has no effect if the file was not
 compiled with -mthumb.
 Indeed.

 and regions can be grouped together with

 #pragma GCC target ("thumb")
 or
 #pragma GCC target ("arm")

 a few notes
 - Inlining is allowed between functions of the same mode (compilation
 switch, #pragma and attribute)
 Why shouldn't we allow inlining between functions of ARM mode vs Thumb 
 mode ? After all the choice of mode is irrelevant at the time of 
 inlining (except possibly for inline assembler).

 Perhaps an option is to try to disable inlining in the presence of 
 inline assembler or if not gate it from a command line option.

 - 'arm_option_override' is now reorganized around
 'arm_option_override_internal' for thumb related macros
 Looks like a reasonable start - We need a couple of tests to make sure 
 that __attribute__((arm)) on a file compiled for the M profile results 
 in a syntax error. v7(e)m is Thumb2 only.

 for bonus points it would be great to get __attribute__((target)) 
 working properly in the backend. I suspect a number of the tuning flags 
 and the global architecture state needs to be moved into this as well to 
 handle cases where __attribute__((arm)) used with M profile options is 
 error'd out.

 - I kept TARGET_UNIFIED_ASM to minimize changes. Although removing it
 would avoid to switch between unified/divided asms
 I know Terry's been trying to get Thumb1 to also switch by default to 
 unified asm. So I think a lot of the logic with emit_thumb could just 
 go away. Maybe we should just consider switching ARM state to unified 
 syntax and that would be as simple as changing TARGET_UNIFIED_SYNTAX in 
 arm.h to be TARGET_32BIT. Long overdue IMHO.

 The only gotcha here is inline assembler but GAS is so permissive that 
 I'm not too worried about it in ARM state and Thumb2 state. I'm a bit 
 worried about Thumb1.


and simplify arm_declare_function_name. Should be considered at some
 point.
 I think that can be done for a lot of newer cores - some of that logic 
 is dated now IIUC.

 I remember why my original project failed - I couldn't get enough of the 
 backend in shape for the state to be saved and restored and then I moved 
 on to other more interesting things, so whatever is done here

[Patch ARM] Turn on hot cold partitioning ?

2014-10-01 Thread Christian Bruel
Hi Ramana,

Your patch https://gcc.gnu.org/ml/gcc-patches/2012-02/msg01492.html
seems to have not been applied for 4.10. Are there any stoppers or is it
an omission ?

Many Thanks

Christian



Re: [Patch ARM] Turn on hot cold partitioning ?

2014-10-01 Thread Christian Bruel
OK, thanks for the update. Partitioning would be very important for my
current work, so I'd like to understand what is so special about ARM that
it's the only target that can't achieve it (on the V7 at least).
Christophe, Matthew, did you have a test case (I don't have direct
access to the Linaro archives)?

Thanks a lot,

Christian

On 10/01/2014 12:43 PM, Ramana Radhakrishnan wrote:
 On Wed, Oct 1, 2014 at 10:03 AM, Christian Bruel christian.br...@st.com 
 wrote:
 Hi Ramana,

 Your patch https://gcc.gnu.org/ml/gcc-patches/2012-02/msg01492.html
 seems to have not been applied for 4.10. Are there any stoppers or is it
 an omission ?
 Short answer, no, not an omission.  It could not be made to work
 properly for a few reasons. When I continued with it the problem I was
 hitting was the assumption that you can branch anywhere. IIRC using an
 indirect branch wasn't possible because you couldn't find enough
 registers because of pass ordering issues.

 Matthew Gretton-Dann and Christophe Lyon at Linaro worked on it for
 sometime and they hit other problems. There are probably enough mails
 in the archive to document this history.

 regards
 Ramana



[PATCH, ARM] attribute target (thumb,arm)

2014-09-29 Thread Christian Bruel
Hi Ramana, Richard,

This patch implements the attribute target (and pragma) to allow
function based interworking.

As in the updated documentation, the syntax is:

 __attribute__((target("thumb"))) int foo()
Forces Thumb mode for function foo only. If the file was compiled with
-mthumb it has no effect.

Similarly

 __attribute__((target("arm"))) int foo()
Forces ARM mode for function foo. It has no effect if the file was not
compiled with -mthumb.

Regions can be grouped together with

#pragma GCC target ("thumb")
or
#pragma GCC target ("arm")
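
For illustration, here is a small self-contained file using both forms. It is
a sketch that assumes a GCC carrying this patch; the THUMB_FN and ARM_FN macros
and the function names are invented for the example, and the guards only keep
the file compilable on non-ARM hosts.

```c
/* The target attribute and pragma are only meaningful when compiling for
   ARM, so they are guarded here to keep the file portable.  */
#if defined (__arm__) && defined (__GNUC__)
# define THUMB_FN __attribute__ ((target ("thumb")))
# define ARM_FN   __attribute__ ((target ("arm")))
#else
# define THUMB_FN
# define ARM_FN
#endif

/* Compiled in Thumb mode regardless of the command-line mode.  */
THUMB_FN int small_helper (int x)
{
  return x + 1;
}

/* Compiled in ARM mode regardless of the command-line mode.  */
ARM_FN int fast_path (int x)
{
  return small_helper (x) * 2;
}

#if defined (__arm__) && defined (__GNUC__)
# pragma GCC push_options
# pragma GCC target ("thumb")
#endif

/* Everything in this region is compiled in Thumb mode.  */
int region_fn (int x)
{
  return x - 1;
}

#if defined (__arm__) && defined (__GNUC__)
# pragma GCC pop_options
#endif
```

Note that on an M-profile target the "arm" variant is expected to be rejected,
since those cores are Thumb only.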

A few notes:
- Inlining is allowed only between functions of the same mode (whether set by
compilation switch, #pragma or attribute).
- 'arm_option_override' is now reorganized around
'arm_option_override_internal' for Thumb-related macros.
- I kept TARGET_UNIFIED_ASM to minimize changes, although removing it
would avoid switching between unified and divided asm syntax
and would simplify arm_declare_function_name. This should be considered at
some point.
- It is only available for Thumb2 variants (thumb1 is left out for lack of
interest and a few complications I was unable to test, although it could be
added easily if needed, I think).

Tested for no regression for arm-none-eabi [,-with-arch=armv7-a]

  OK for trunk ?

many thanks,

Christian









2014-09-23  Christian Bruel  <christian.br...@st.com>

	PR target/52144
	* config/arm/arm.opt (THUMB): Save target option.
	* config/arm/arm-protos.h (arm_declare_function_name,
	arm_valid_target_attribute_tree, arm_register_target_pragmas,
	arm_reset_previous_fndecl): Declare.
	* config/arm/arm.c (arm_declare_function_name): Move here.
	Add attribute target support.
	(emit_thumb): New boolean.
	(arm_file_start): Set emit_thumb mode.
	(arm_pragma_target_parse): New function.
	(arm_valid_target_attribute_p, arm_valid_target_attribute_tree,
	arm_valid_target_attribute_rec): New functions.
	(arm_can_inline_p): New function.
	(arm_set_current_function, arm_reset_previous_fndecl): New functions.
	(arm_option_override): Split.
	(arm_option_override_internal): New function.
	(TARGET_CAN_INLINE_P, TARGET_SET_CURRENT_FUNCTION,
	TARGET_OPTION_VALID_ATTRIBUTE_P): Define.
	* config/arm/arm-c.c (arm_target_modify_macros,
	arm_pragma_target_parse, arm_register_target_pragmas): New functions.
	* config/arm/arm.h (SWITCHABLE_TARGET): Define.
	(ARM_DECLARE_FUNCTION_NAME): Call arm_declare_function_name.
	(REGISTER_TARGET_PRAGMAS): Call arm_register_target_pragmas.
	(TREE_TARGET_THUMB): New macro.
	* doc/extend.texi (arm, thumb): Document target attributes.
	* doc/invoke.texi (arm, thumb): Mention target attributes.

2014-09-23  Christian Bruel  <christian.br...@st.com>

	PR target/52144
	* gcc.target/arm/attr_thumb.c: New test.

Index: gcc/config/arm/arm-c.c
===
--- gcc/config/arm/arm-c.c	(revision 215680)
+++ gcc/config/arm/arm-c.c	(working copy)
@@ -20,9 +20,12 @@
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"
-#include "tm_p.h"
 #include "tree.h"
+#include "tm_p.h"
 #include "c-family/c-common.h"
+#include "target.h"
+#include "target-def.h"
+#include "c-family/c-pragma.h"
 
 /* Output C specific EABI object attributes.  These can not be done in
arm.c because they require information from the C frontend.  */
@@ -42,3 +45,109 @@
 {
   arm_lang_output_object_attributes_hook = arm_output_c_attributes;
 }
+
+
+/* Define or undefine macros based on the current target.  If the user does
+   #pragma GCC target, we need to adjust the macros dynamically.  */
+
+static void
+arm_target_modify_macros (bool thumb_p)
+{
+  if (thumb_p)
+    {
+      cpp_define (parse_in, "__thumb__");
+      if (arm_arch_thumb2)
+        cpp_define (parse_in, "__thumb2__");
+      if (TARGET_BIG_END)
+        cpp_define (parse_in, "__THUMBEB__");
+      else
+        cpp_define (parse_in, "__THUMBEL__");
+    }
+  else
+    {
+      cpp_undef (parse_in, "__thumb__");
+      if (arm_arch_thumb2)
+        cpp_undef (parse_in, "__thumb2__");
+      if (TARGET_BIG_END)
+        cpp_undef (parse_in, "__THUMBEB__");
+      else
+        cpp_undef (parse_in, "__THUMBEL__");
+    }
+
+}
+
+/* Hook to validate the current #pragma GCC target and set the FPU custom
+   code option state.  If ARGS is NULL, then POP_TARGET is used to reset
+   the options.  */
+static bool
+arm_pragma_target_parse (tree args, tree pop_target)
+{
+  tree prev_tree = build_target_option_node (&global_options);
+  tree cur_tree;
+  struct cl_target_option *prev_opt;
+  struct cl_target_option *cur_opt;
+  bool cur_mode, prev_mode;
+
+  if (! args)
+    {
+      cur_tree = ((pop_target) ? pop_target : target_option_default_node);
+      cl_target_option_restore (&global_options,
+                                TREE_TARGET_OPTION (cur_tree));
+    }
+  else
+    {
+      cur_tree = arm_valid_target_attribute_tree (args, &global_options);
+      if (cur_tree == NULL_TREE)
+        {
+          cl_target_option_restore (&global_options,
+                                    TREE_TARGET_OPTION (prev_tree));
+          return false;
+        }
+    }
+
+  target_option_current_node = cur_tree

[PING*3][PATCH RTL] Extend mode-switching to support toggle

2014-06-30 Thread Christian Bruel
Hello,

I still miss an approval for the middle-end part of

http://gcc.gnu.org/ml/gcc-patches/2014-06/msg01038.html

thanks,

Christian



Re: [PING*2][PATCH] Extend mode-switching to support toggle (1/2)

2014-06-12 Thread Christian Bruel
On 06/11/2014 02:00 PM, Christian Bruel wrote:
 On 06/11/2014 06:17 AM, Joern Rennecke wrote:
 Joern, is this new target macro interface OK with you ?
 Yes, this interface should allow me to do switches between rounding
 and truncating
 floating-point modes with an add/subtract immediate.

 However, the implementation, as posted, doesn't work - it causes memory
 corruption.

 It appears to work with the attached amendment patch.

 Indeed,  thanks for pointing out the bad reusing of the aux field
 between multiple entities.

 In fact, rereading this part of the implementation, I find the allocation
 of aux*n_entities awkward. It is simpler to set the mode directly in
 eg->aux within the entity loop, with no array allocation
 (which also fixes a memory leak, by the way).


Here is the revised version fixing the aforementioned issue found by
Joern on Epiphany. It also simplifies the allocation of the aux edges
field to carry the modes.

Now that everyone agrees on the interface, is this OK for trunk ?

bootstrapped/regtested for X86 and SH4a.

thanks,

Christian






2014-06-12  Christian Bruel  <christian.br...@st.com>

	* mode-switching.c (struct bb_info): Add mode_out, mode_in caches.
	(make_preds_opaque): Delete.
	(clear_mode_bit, mode_bit_p, set_mode_bit): New macros.
	(commit_mode_sets): New function.
	(optimize_mode_switching): Handle current_mode to mode_switching_emit.
	Process all modes at once.
	* basic-block.h (pre_edge_lcm_avs): Declare.
	* lcm.c (pre_edge_lcm_avs): Renamed from pre_edge_lcm.
	Call clear_aux_for_edges. Fix comments.
	(pre_edge_lcm): New wrapper function to call pre_edge_lcm_avs.
	(pre_edge_rev_lcm): Idem.
	* config/epiphany/epiphany.c (emit_set_fp_mode): Add prev_mode parameter.
	* config/epiphany/epiphany-protos.h (emit_set_fp_mode): Idem.
	* config/epiphany/resolve-sw-modes.c (pass_resolve_sw_modes::execute): Idem.
	* config/i386/i386.c (ix86_emit_mode_set): Idem.
	* config/sh/sh.c (sh_emit_mode_set): Likewise. Handle PR toggle.
	* config/sh/sh.md (toggle_pr): Defined if TARGET_FPU_SINGLE.
	(fpscr_toggle): Disallow from delay slot.
	* target.def (emit_mode_set): Add prev_mode parameter.
	* doc/tm.texi: Regenerate.

2014-06-12  Christian Bruel  <christian.br...@st.com>

	* gcc.target/sh/fpchg.c: New test.

Index: gcc/basic-block.h
===
--- gcc/basic-block.h	(revision 211436)
+++ gcc/basic-block.h	(working copy)
@@ -711,6 +711,9 @@ extern void bitmap_union_of_preds (sbitmap, sbitma
 extern struct edge_list *pre_edge_lcm (int, sbitmap *, sbitmap *,
    sbitmap *, sbitmap *, sbitmap **,
    sbitmap **);
+extern struct edge_list *pre_edge_lcm_avs (int, sbitmap *, sbitmap *,
+	   sbitmap *, sbitmap *, sbitmap *,
+	   sbitmap *, sbitmap **, sbitmap **);
 extern struct edge_list *pre_edge_rev_lcm (int, sbitmap *,
 	   sbitmap *, sbitmap *,
 	   sbitmap *, sbitmap **,
Index: gcc/config/epiphany/epiphany-protos.h
===
--- gcc/config/epiphany/epiphany-protos.h	(revision 211436)
+++ gcc/config/epiphany/epiphany-protos.h	(working copy)
@@ -40,7 +40,8 @@ extern int epiphany_initial_elimination_offset (in
 extern void epiphany_init_expanders (void);
 extern int hard_regno_mode_ok (int regno, enum machine_mode mode);
 #ifdef HARD_CONST
-extern void emit_set_fp_mode (int entity, int mode, HARD_REG_SET regs_live);
+extern void emit_set_fp_mode (int entity, int mode, int prev_mode,
+			  HARD_REG_SET regs_live);
 #endif
 extern void epiphany_insert_mode_switch_use (rtx insn, int, int);
 extern void epiphany_expand_set_fp_mode (rtx *operands);
Index: gcc/config/epiphany/epiphany.c
===
--- gcc/config/epiphany/epiphany.c	(revision 211436)
+++ gcc/config/epiphany/epiphany.c	(working copy)
@@ -2543,7 +2543,8 @@ epiphany_mode_exit (int entity)
 }
 
 void
-emit_set_fp_mode (int entity, int mode, HARD_REG_SET regs_live ATTRIBUTE_UNUSED)
+emit_set_fp_mode (int entity, int mode, int prev_mode ATTRIBUTE_UNUSED,
+		  HARD_REG_SET regs_live ATTRIBUTE_UNUSED)
 {
   rtx save_cc, cc_reg, mask, src, src2;
   enum attr_fp_mode fp_mode;
Index: gcc/config/epiphany/resolve-sw-modes.c
===
--- gcc/config/epiphany/resolve-sw-modes.c	(revision 211436)
+++ gcc/config/epiphany/resolve-sw-modes.c	(working copy)
@@ -170,7 +170,7 @@ pass_resolve_sw_modes::execute (function *fun)
 	}
 	  start_sequence ();
 	  emit_set_fp_mode (EPIPHANY_MSW_ENTITY_ROUND_UNKNOWN,
-			jilted_mode, NULL);
+			jilted_mode, FP_MODE_NONE, NULL);
 	  seq = get_insns ();
 	  end_sequence ();
 	  need_commit = true;
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c	(revision 211436)
+++ gcc/config/i386/i386.c	(working copy)
@@ -16447,7 +16447,8 @@ ix86_avx_emit_vzeroupper

Re: [PING*2][PATCH] Extend mode-switching to support toggle (1/2)

2014-06-11 Thread Christian Bruel

On 06/11/2014 06:17 AM, Joern Rennecke wrote:

 Joern, is this new target macro interface OK with you ?
 Yes, this interface should allow me to do switches between rounding
 and truncating
 floating-point modes with an add/subtract immediate.

 However, the implementation, as posted, doesn't work - it causes memory
 corruption.

 It appears to work with the attached amendment patch.


Indeed,  thanks for pointing out the bad reusing of the aux field
between multiple entities.

In fact, rereading this part of the implementation, I find the allocation
of aux*n_entities awkward. It is simpler to set the mode directly in
eg->aux within the entity loop, with no array allocation
(which also fixes a memory leak, by the way).
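
A minimal sketch of the idea, outside of GCC: the mode is stored in the edge's
aux pointer itself, so nothing has to be allocated or freed. The edge_def
struct and helper names are stand-ins, and the bias by one (so that a cleared
aux still means "no mode recorded") is my assumption for the example, not
necessarily what the revised patch does.

```c
#include <stdint.h>

/* Minimal stand-in for a CFG edge; only the aux field matters here.  */
struct edge_def { void *aux; };

/* Store MODE in the edge itself.  Modes are biased by one so that a
   cleared (NULL) aux field still means "no mode recorded".  */
void edge_set_mode (struct edge_def *e, int mode)
{
  e->aux = (void *) (intptr_t) (mode + 1);
}

/* Return the recorded mode, or -1 if none was stored.  */
int edge_get_mode (const struct edge_def *e)
{
  return (int) (intptr_t) e->aux - 1;
}
```

Since aux is then a single slot per edge, it has to be cleared between
entities, which is the reuse problem Joern pointed out.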

cheers,

Christian



Re: [PING*2][PATCH] Extend mode-switching to support toggle (1/2)

2014-06-11 Thread Christian Bruel

On 06/10/2014 04:03 PM, Joern Rennecke wrote:
> On 13 May 2014 22:41, Oleg Endo <oleg.e...@t-online.de> wrote:
>
>> Right.  I was thinking to add FPSCR.SZ mode switching to SH, in order to
>> do float vector moves.  For that SZ and PR need to be switched both at
>> the same time (only SH4A has both, fpchg and fschg).  So basically I'd
>> add another mode entity, which would emit SZ mode changes in addition to
>> the PR mode changes.  But then adjacent FPSCR-changing insns could be
>> combined ... any idea/suggestion how to accomplish that?
> If they are sufficiently adjacent, you can use a peephole2 pattern for this.
>
> I see Christian's patch addresses this in a different way - keeping size and
> precision in the same entity, and emitting toggles as appropriate.

Yes, I was only interested in optimizing the SH4A case when PR=1, with a
good-enough implementation. To cover all the other possibilities a new
entity would be better, but then, as you say, recombining them might be
difficult. An alternative, hackish way could be to have a single entity
with 4 modes covering all PR*SZ combinations.

But I'm not sure that covering the case where PR=0 and SZ=1 is worth it
(for code size only, maybe?), as the 64-bit move would be implemented as
2*32-bit moves anyway.
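
As an illustration of the single-entity alternative, here is a small sketch
of a 4-mode encoding where the set of toggles to emit is the XOR of the
previous and the next mode. The enum values, the TOGGLE_* masks and the
function name are all assumptions made for this example; nothing here is
taken from the SH backend.

```c
/* Hypothetical encoding: bit 0 mirrors FPSCR.PR, bit 1 mirrors FPSCR.SZ.  */
enum fp_mode_sketch
{
  FP_PR0_SZ0 = 0,
  FP_PR1_SZ0 = 1,
  FP_PR0_SZ1 = 2,
  FP_PR1_SZ1 = 3
};

#define TOGGLE_PR 1	/* would correspond to an fpchg */
#define TOGGLE_SZ 2	/* would correspond to an fschg */

/* Return the mask of toggles needed to go from PREV to NEXT.
   A zero result means no instruction is needed at all.  */
int
toggles_needed (enum fp_mode_sketch prev, enum fp_mode_sketch next)
{
  return (int) prev ^ (int) next;
}
```

With such an encoding, going from PR=1,SZ=0 to PR=0,SZ=1 yields both
toggles, which is exactly the case where two adjacent FPSCR changes would
have to be emitted.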


> The problem gets a bit more interesting if you have some instruction patterns
> that care about one setting but not the other.
> Describing this exactly allows lazy code motion to be a bit more lazy, but
> OTOH it can make it harder to combine mode switching instructions if you
> still want to do that.



[PATCH, tree-ssa] Optimize loop invariant phi defs constants

2014-06-05 Thread Christian Bruel
Hello,

while checking why a loop snippet like

  for (i = 0; i = 5000; i++)
    if (b)
      a = 2;
    else
      a = x;

was not optimized at -O2 (unless loop unrolling or loop unswitching is
enabled), I found out that the case was already partially handled by
Richard in PR43934. So this patch just adds a cost to the PHI defs'
constant arguments, to allow the whole test to be hoisted out of the loop.
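
For illustration, here is what the hoisting amounts to at the source level.
This is a hand-written sketch, not compiler output; the function names and
the explicit trip count n are assumptions made for the example.

```c
/* Original shape: the test on B is evaluated on every iteration.  */
int
loop_original (int b, int x, int n)
{
  int a = 0;
  for (int i = 0; i < n; i++)
    {
      if (b)
	a = 2;
      else
	a = x;
    }
  return a;
}

/* Hand-hoisted shape: the value selected by the PHI (2 or X) does not
   depend on the loop, so it is computed once before entering it.  */
int
loop_hoisted (int b, int x, int n)
{
  int a = 0;
  int inv = b ? 2 : x;
  for (int i = 0; i < n; i++)
    a = inv;
  return a;
}
```

Both functions return the same value for any input; only the position of
the test changes.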

Richard, does this seem reasonable and OK for 4.10 ?

bootstrapped/regtested for x86

many thanks

Christian

2014-06-03  Christian Bruel  <christian.br...@st.com>

	PR tree-optimization/43934
	* tree-ssa-loop-im.c (determine_max_movement): Add PHI def constant cost.

2014-06-03  Christian Bruel  <christian.br...@st.com>

	PR tree-optimization/43934
	* gcc.dg/tree-ssa/ssa-lim-8.c: New testcase.

Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-8.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-8.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-8.c	(working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-lim1-details" } */
+
+void bar (int);
+void foo (int n, int m)
+{
+  unsigned i;
+  for (i = 0; i < n; ++i)
+{
+  int x;
+  if (m  0)
+	x = 1;
+  else
+	x = m;
+  bar (x);
+}
+}
+
+/* { dg-final { scan-tree-dump-times "Moving PHI node" 1 "lim1"  } } */
+/* { dg-final { cleanup-tree-dump "lim1" } } */
Index: gcc/tree-ssa-loop-im.c
===
--- gcc/tree-ssa-loop-im.c	(revision 211255)
+++ gcc/tree-ssa-loop-im.c	(working copy)
@@ -719,8 +719,21 @@ determine_max_movement (gimple stmt, bool must_pre
   FOR_EACH_PHI_ARG (use_p, stmt, iter, SSA_OP_USE)
 	{
 	  val = USE_FROM_PTR (use_p);
+
 	  if (TREE_CODE (val) != SSA_NAME)
-	continue;
+	{
+	  unsigned cst_cost = 1;
+
+	  gcc_assert (TREE_CODE (val) == INTEGER_CST
+			  || TREE_CODE (val) == REAL_CST
+			  || TREE_CODE (val) == VECTOR_CST
+			  || TREE_CODE (val) == COMPLEX_CST
+			  || TREE_CODE (val) == ADDR_EXPR);
+
+	  min_cost = MIN (min_cost, cst_cost);
+	  total_cost += cst_cost;
+	  continue;
+	}
 	  if (!add_dependency (val, lim_data, loop, false))
 	return false;
 	  def_data = get_lim_data (SSA_NAME_DEF_STMT (val));


Re: [PING*2][PATCH] Extend mode-switching to support toggle (1/2)

2014-06-02 Thread Christian Bruel
Hello,

Any feedback on this? I'd like to commit only once it is OK for Epiphany.

many thanks,

Christian

On 05/26/2014 05:32 PM, Christian Bruel wrote:
 On 04/28/2014 10:08 AM, Christian Bruel wrote:
 Hello,

 I'd like to ping the following patches

 [Hookize mode-switching]
 http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01003.html

 [Add new hooks to support toggle and SH4A fpchg instruction]
 http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01005.html
 Sorry, I only saw the first part and thought I' d need to wait till I
 see the second part - and I somehow missed that.

 I think the previous known mode should be passed to the
 TARGET_MODE_EMIT hook - no need to have extra hooks
 for toggling, and, as I mentioned earlier, fixating on the toggle is
 actually an SH artifact - other ports have multi-way
 modes settings that can benefit from knowing the previous mode.
 OK I'll change the bool toggle parameter by the previous mode in
 TARGET_MODE_EMIT and remove the bool XOR hooks in the SH description.

 Hello,

 This is the interface for targets that could use the previous mode for
 switching. TARGET_MODE_EMIT now takes a new prev_mode parameter. If the
 previous mode cannot be determined, MODE_NONE (value depends on  the
 entity) is used.

 The implementation is less trivial than just supporting a boolean toggle
 bit, as the previous modes information have to be carried along the
 edges. For this I recycle the auxiliary edge field that is made
 unnecessary by the removal of make_pred_opaque and a change in the
 implementation to call LCM for every modes from every identity
 simultaneously. This idea was suggested by Joern in PR29349.  Another
 speed improvement is that we process the modes to no_mode instead of
 max_num_modes for each entity.
 Thanks to all this, the only additional data to support prev_mode is
 that for each BB, the avin/avout lcm computation are cached inside the
 bb_info mode_in/mode_out fields,  the xor toggle bit handling  could
 have been removed.

 bootstrapped/regtested for x86 and sh4, sh4a, sh4a-single,
 epiphany build is OK. testsuite not ran.

 Joern, is this new target macro interface OK with you ? Jeff, (or other
 RTL maintainer) since this is a new implementation for
 optimize_mode_switching I suppose your previous approval doesn't held
 anymore... is this new one OK for trunk as well ?
 No change for x86/sh4/2a interfaces.

 Many thanks

 Christian





Re: [PING*2][PATCH] Extend mode-switching to support toggle (1/2)

2014-05-26 Thread Christian Bruel

 On 04/28/2014 10:08 AM, Christian Bruel wrote:
 Hello,

 I'd like to ping the following patches

 [Hookize mode-switching]
 http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01003.html

 [Add new hooks to support toggle and SH4A fpchg instruction]
 http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01005.html
 Sorry, I only saw the first part and thought I' d need to wait till I
 see the second part - and I somehow missed that.

 I think the previous known mode should be passed to the
 TARGET_MODE_EMIT hook - no need to have extra hooks
 for toggling, and, as I mentioned earlier, fixating on the toggle is
 actually an SH artifact - other ports have multi-way
 modes settings that can benefit from knowing the previous mode.
 OK I'll change the bool toggle parameter by the previous mode in
 TARGET_MODE_EMIT and remove the bool XOR hooks in the SH description.

Hello,

This is the interface for targets that could use the previous mode for
switching. TARGET_MODE_EMIT now takes a new prev_mode parameter. If the
previous mode cannot be determined, MODE_NONE (whose value depends on the
entity) is used.
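
As a rough sketch of how a two-state target could exploit prev_mode, the
decision inside such an emit hook might look like the following. The enum
values and choose_emit_action are hypothetical names used only for this
illustration; this is not the SH implementation.

```c
/* Hypothetical mode numbering for a two-state entity.  */
enum { SKETCH_MODE_NONE = -1, SKETCH_MODE_SINGLE = 0, SKETCH_MODE_DOUBLE = 1 };

enum emit_action { EMIT_NOTHING, EMIT_TOGGLE, EMIT_FULL_SET };

/* Decide what to emit when switching to MODE given PREV_MODE.  */
enum emit_action
choose_emit_action (int mode, int prev_mode)
{
  /* Previous mode unknown: the control register must be fully set.  */
  if (prev_mode == SKETCH_MODE_NONE)
    return EMIT_FULL_SET;

  /* Already in the requested mode: nothing to do.  */
  if (prev_mode == mode)
    return EMIT_NOTHING;

  /* Two states and a known, different previous mode: flipping the bit
     is enough.  */
  return EMIT_TOGGLE;
}
```

A target with more than two states would replace the last case with
whatever cheap transition it has from prev_mode to mode, and fall back to
the full set otherwise.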

The implementation is less trivial than just supporting a boolean toggle
bit, as the previous-mode information has to be carried along the
edges. For this I recycle the auxiliary edge field, whose previous use is
made unnecessary by the removal of make_preds_opaque and by a change in
the implementation to call LCM for every mode of every entity
simultaneously. This idea was suggested by Joern in PR29349.  Another
speed improvement is that we process the modes up to no_mode instead of
max_num_modes for each entity.
Thanks to all this, the only additional data needed to support prev_mode
is that, for each BB, the avin/avout LCM computations are cached inside
the bb_info mode_in/mode_out fields, and the XOR toggle bit handling could
be removed.

Bootstrapped/regtested for x86 and sh4, sh4a, sh4a-single.
The Epiphany build is OK; its testsuite was not run.

Joern, is this new target macro interface OK with you? Jeff (or another
RTL maintainer), since this is a new implementation of
optimize_mode_switching I suppose your previous approval doesn't hold
anymore... is this new one OK for trunk as well?
No change for the x86/sh4/2a interfaces.

Many thanks

Christian


2014-05-23  Christian Bruel  <christian.br...@st.com>

	* mode-switching.c (struct bb_info): Add mode_out, mode_in caches.
	(make_preds_opaque): Delete function.
	(clear_mode_bit, mode_bit_p, set_mode_bit): New macros.
	(add_mode_set, get_mode_set, alloc_mode_aux, free_modes_edges): New functions.
	(commit_mode_sets): New function.
	(optimize_mode_switching): Handle current_mode to mode_switching_emit.
	Process all modes at once. 
	* basic-block.h (pre_edge_lcm_avs): Declare.
	* lcm.c (pre_edge_lcm_avs): Renamed from pre_edge_lcm.
	Call clear_aux_for_edges. Fix comments.
	(pre_edge_lcm): New wrapper function to call pre_edge_lcm_avs.
	(pre_edge_rev_lcm): Idem.
	* config/epiphany/epiphany.c (emit_set_fp_mode): Add prev_mode parameter.
	* config/epiphany/epiphany-protos.h (emit_set_fp_mode): Idem.
	* config/epiphany/resolve-sw-modes.c (pass_resolve_sw_modes::execute): Idem.
	* config/i386/i386.c (ix86_emit_mode_set): Idem.
	* config/sh/sh.c (sh_emit_mode_set): Likewise. Handle PR toggle.
	* config/sh/sh.md (toggle_pr): Defined if TARGET_FPU_SINGLE.
	(fpscr_toggle): Disallow from delay slot.
	* target.def (emit_mode_set): Add prev_mode parameter.
	* doc/tm.texi: Regenerate.

2014-05-19  Christian Bruel  <christian.br...@st.com>

	* gcc.target/sh/fpchg.c: New test.

Index: gcc/basic-block.h
===
--- gcc/basic-block.h	(revision 210845)
+++ gcc/basic-block.h	(working copy)
@@ -711,6 +711,9 @@ extern void bitmap_union_of_preds (sbitmap, sbitma
 extern struct edge_list *pre_edge_lcm (int, sbitmap *, sbitmap *,
    sbitmap *, sbitmap *, sbitmap **,
    sbitmap **);
+extern struct edge_list *pre_edge_lcm_avs (int, sbitmap *, sbitmap *,
+	   sbitmap *, sbitmap *, sbitmap *,
+	   sbitmap *, sbitmap **, sbitmap **);
 extern struct edge_list *pre_edge_rev_lcm (int, sbitmap *,
 	   sbitmap *, sbitmap *,
 	   sbitmap *, sbitmap **,
Index: gcc/config/epiphany/epiphany-protos.h
===
--- gcc/config/epiphany/epiphany-protos.h	(revision 210845)
+++ gcc/config/epiphany/epiphany-protos.h	(working copy)
@@ -40,7 +40,8 @@ extern int epiphany_initial_elimination_offset (in
 extern void epiphany_init_expanders (void);
 extern int hard_regno_mode_ok (int regno, enum machine_mode mode);
 #ifdef HARD_CONST
-extern void emit_set_fp_mode (int entity, int mode, HARD_REG_SET regs_live);
+extern void emit_set_fp_mode (int entity, int mode, int prev_mode,
+			  HARD_REG_SET regs_live);
 #endif
 extern void epiphany_insert_mode_switch_use (rtx insn, int, int);
 extern void epiphany_expand_set_fp_mode (rtx *operands);
Index: gcc/config/epiphany/epiphany.c

[PATCH SH] Don't switch mode around fmov (pr61195)

2014-05-16 Thread Christian Bruel
Hi,

This patch reduces the number of unnecessary PR mode switches for
single-precision moves when FPSCR.SZ is not forced. A test illustrating
this is included in the patch.

Regtested for sh-none-elf with -m4 (-m2a still running), bootstrapped
on an sh4-linux-elf board. OK for trunk?

Many thanks,

Christian

2014-05-16  Christian Bruel  <christian.br...@st.com>

	PR target/61195
	* config/sh/sh.md (movsf_ie): Unset fp_mode for fmov.

2014-05-16  Christian Bruel  <christian.br...@st.com>

	PR target/61195
	* gcc.target/sh/pr61195.c: New test.

Index: config/sh/sh.md
===
--- config/sh/sh.md	(revision 210475)
+++ config/sh/sh.md	(working copy)
@@ -8357,9 +8357,26 @@ label:
   (const_int 2)
   (const_int 2)
   (const_int 0)])
-   (set (attr "fp_mode") (if_then_else (eq_attr "fmovd" "yes")
-	   (const_string "single")
-	   (const_string "single")))])
+  (set_attr_alternative "fp_mode"
+ [(if_then_else (eq_attr "fmovd" "yes") (const_string "single") (const_string "none"))
+  (const_string "none")
+  (const_string "single")
+  (const_string "single")
+  (const_string "none")
+  (if_then_else (eq_attr "fmovd" "yes") (const_string "single") (const_string "none"))
+  (if_then_else (eq_attr "fmovd" "yes") (const_string "single") (const_string "none"))
+  (const_string "none")
+  (const_string "none")
+  (const_string "none")
+  (const_string "none")
+  (const_string "none")
+  (const_string "none")
+  (const_string "none")
+  (const_string "none")
+  (const_string "none")
+  (const_string "none")
+  (const_string "none")
+  (const_string "none")])])
 
 (define_split
   [(set (match_operand:SF 0 "register_operand" "")
Index: testsuite/gcc.target/sh/pr61195.c
===
--- testsuite/gcc.target/sh/pr61195.c	(revision 0)
+++ testsuite/gcc.target/sh/pr61195.c	(working copy)
@@ -0,0 +1,19 @@
+/* Verify that we don't switch mode for single moves.  */
+/* { dg-do compile }  */
+/* { dg-require-effective-target hard_float } */
+/* { dg-skip-if "" { *-*-* }  { "mfmovd" } { "" } } */
+/* { dg-final { scan-assembler-not "fpscr" } } */
+
+float *g;
+
+float
+foo(float f)
+{
+  return f;
+}
+
+float
+foo1(void)
+{
+  return *g;
+}


Re: [PING*2][PATCH] Extend mode-switching to support toggle (1/2)

2014-05-12 Thread Christian Bruel
Hello,

I'd still like to ping the following set of patches. These changes
do not impact targets other than SH4 but, as suggested by Joern, I
have hookized the macros and moved the SH4A-specific support to the target
parts (so a different target can eventually implement models other than
dual mode).

Patch 2 only does very little restructuring, but if it is not interesting
enough for all targets, patch 1 should not be that intrusive.

For the RTL middle-end and target (x86, SH, Epiphany) reviewers,

Many thanks,

Christian

On 04/28/2014 10:08 AM, Christian Bruel wrote:
> Hello,
>
> I'd like to ping the following patches
>
> [Hookize mode-switching]
> http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01003.html
>
> [Add new hooks to support toggle and SH4A fpchg instruction]
> http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01005.html
>
> Many thanks

Re: [PING*2][PATCH] Extend mode-switching to support toggle (1/2)

2014-05-12 Thread Christian Bruel
Just saw Jeff's approval for the RTL part. Sorry for the crossed messages.

That leaves the target maintainers.  Joern, Kaz?

Many thanks.

Christian

On 05/12/2014 10:44 AM, Christian Bruel wrote:
> Hello,
>
> I'd still like to ping the following set of patches. These changes
> do not impact targets other than SH4 but, as suggested by Joern, I
> have hookized the macros and moved the SH4A-specific support to the target
> parts (so a different target can eventually implement models other than
> dual mode).
>
> Patch 2 only does very little restructuring, but if it is not interesting
> enough for all targets, patch 1 should not be that intrusive.
>
> For the RTL middle-end and target (x86, SH, Epiphany) reviewers,
>
> Many thanks,
>
> Christian
>
> On 04/28/2014 10:08 AM, Christian Bruel wrote:
>> Hello,
>>
>> I'd like to ping the following patches
>>
>> [Hookize mode-switching]
>> http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01003.html
>>
>> [Add new hooks to support toggle and SH4A fpchg instruction]
>> http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01005.html
>>
>> Many thanks

Re: [PING*2][PATCH] Extend mode-switching to support toggle (1/2)

2014-05-12 Thread Christian Bruel
On 04/28/2014 10:08 AM, Christian Bruel wrote:
>> Hello,
>>
>> I'd like to ping the following patches
>>
>> [Hookize mode-switching]
>> http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01003.html
>>
>> [Add new hooks to support toggle and SH4A fpchg instruction]
>> http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01005.html
> Sorry, I only saw the first part and thought I'd need to wait till I
> see the second part - and I somehow missed that.
>
> I think the previous known mode should be passed to the
> TARGET_MODE_EMIT hook - no need to have extra hooks
> for toggling, and, as I mentioned earlier, fixating on the toggle is
> actually an SH artifact - other ports have multi-way
> modes settings that can benefit from knowing the previous mode.

OK, I'll replace the bool toggle parameter with the previous mode in
TARGET_MODE_EMIT and remove the bool XOR hooks from the SH description.
Just out of curiosity, which other targets have multi-way toggling
support?

I'll commit the first patch as approved and repost the second one.

Many thanks,

Christian




[PING][PATCH] Extend mode-switching to support toggle (1/2)

2014-04-28 Thread Christian Bruel
Hello,

I'd like to ping the following patches

[Hookize mode-switching]
http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01003.html

[Add new hooks to support toggle and SH4A fpchg instruction]
http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01005.html

Many thanks






[PATCH, SH] Extend HIQI mode constants

2014-04-22 Thread Christian Bruel
This patch allows constant propagation from HIQI modes, as illustrated
by the attached testcase, by loading the constants into a new SImode pseudo.
It also merges the HI and QI mode patterns, using general_movdst_operand
for both.

No regressions on sh-none-elf. OK for trunk?

Thanks,


2014-04-22  Christian Bruel  <christian.br...@st.com>

	* config/sh/sh.md (mov<mode>): Replace movQIHI.
	Force immediates to SImode.

2014-04-22  Christian Bruel  <christian.br...@st.com>

	* gcc.target/sh/hiconst.c: New test.

Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 209556)
+++ gcc/config/sh/sh.md	(working copy)
@@ -6978,20 +6978,20 @@ label:
   [(set_attr "type" "sfunc")
    (set_attr "needs_delay_slot" "yes")])
 
-(define_expand "movhi"
-  [(set (match_operand:HI 0 "general_movdst_operand" "")
-	(match_operand:HI 1 "general_movsrc_operand" ""))]
+(define_expand "mov<mode>"
+  [(set (match_operand:QIHI 0 "general_movdst_operand")
+	(match_operand:QIHI 1 "general_movsrc_operand"))]
   ""
 {
-  prepare_move_operands (operands, HImode);
-})
+ if (can_create_pseudo_p () && CONST_INT_P (operands[1])
+     && REG_P (operands[0]) && REGNO (operands[0]) != R0_REG)
+{
+rtx reg = gen_reg_rtx(SImode);
+emit_move_insn (reg, operands[1]);
+operands[1] = gen_lowpart (<MODE>mode, reg);
+}
 
-(define_expand "movqi"
-  [(set (match_operand:QI 0 "general_operand" "")
-	(match_operand:QI 1 "general_operand" ""))]
-  ""
-{
-  prepare_move_operands (operands, QImode);
+  prepare_move_operands (operands, <MODE>mode);
 })
 
 ;; Specifying the displacement addressing load / store patterns separately
Index: gcc/testsuite/gcc.target/sh/hiconst.c
===
--- gcc/testsuite/gcc.target/sh/hiconst.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/hiconst.c	(working copy)
@@ -0,0 +1,22 @@
+/* { dg-do compile { target sh*-*-* } } */
+/* { dg-options "-O1" } */
+
+char a;
+int b;
+
+foo(char *pt, int *pti)
+{
+  a = 0;
+  b = 0;
+  *pt = 0;
+  *pti = 0;
+}
+
+rab(char *pt, int *pti)
+{
+  pt[2] = 0;
+  pti[3] = 0;
+}
+
+/* { dg-final { scan-assembler-times "mov\t#0" 2 } } */
+


[ PATCH] Extend mode-switching to support toggle (1/2)

2014-04-17 Thread Christian Bruel
Hello,

Here is a new version of the patch. It hookizes the mode-setting and
mode-toggling macros, and is split into 2 parts.
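
For readers unfamiliar with the term, "hookizing" means replacing a target
macro by a function pointer in a hook table with a default. The sketch
below only illustrates that pattern; the struct, the sketch_* functions and
the behaviour of the default are assumptions made for the example, not the
contents of target.def.

```c
#include <stddef.h>

/* Hypothetical hook table; the real target vector is much larger.  */
struct mode_hooks_sketch
{
  int (*entry_mode) (int entity);
  int (*exit_mode) (int entity);
  int (*priority) (int entity, int n);
};

/* Assumed default: priority N maps to mode N.  */
int
sketch_default_priority (int entity, int n)
{
  (void) entity;
  return n;
}

/* Generic code calls through the table instead of expanding a macro,
   falling back to the default when the target sets nothing.  */
int
sketch_call_priority (const struct mode_hooks_sketch *hooks, int entity, int n)
{
  if (hooks->priority != NULL)
    return hooks->priority (entity, n);
  return sketch_default_priority (entity, n);
}

/* A hypothetical target override that reverses the order of 3 modes.  */
int
sketch_target_priority (int entity, int n)
{
  (void) entity;
  return 2 - n;
}
```

The benefit over macros is that the generic pass no longer depends on
target headers at compile time; a target only fills in the entries it
cares about.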

Successfully bootstrapped/regtested on ix86 and SH4/SH4a.

I was only able to do a limited build on Epiphany; if someone could give
it a try there, that would be great.

Comments? Suggestions?

many thanks,

Christian

2014-04-02  Christian Bruel  <christian.br...@st.com>

	* target.def (mode_switching): New hook vector.
	(mode_emit, mode_needed, mode_after, mode_entry): New hooks.
	(mode_exit, modepriority_to_mode): Likewise.
	* mode-switching.c (MODE_NEEDED, MODE_AFTER, MODE_ENTRY): Hookify.
	(MODE_EXIT, MODE_PRIORITY_TO_MODE, EMIT_MODE_SET): Likewise.
	(default_priority_to_mode): Define.
	* targhooks.h (default_priority_to_mode): Declare.
	* target.h: Include tm.h and hard-reg-set.h.
	* doc/tm.texi.in (EMIT_MODE_SET, MODE_NEEDED, MODE_AFTER, MODE_ENTRY)
	(MODE_EXIT, MODE_PRIORITY_TO_MODE): Delete and hookify.
	* doc/tm.texi: Regenerate.
	* config/sh/sh.h (MODE_NEEDED, MODE_AFTER, MODE_ENTRY): Delete
	(MODE_EXIT, MODE_PRIORITY_TO_MODE, EMIT_MODE_SET): Likewise.
	* config/sh/sh.c (emit_fpu_toggle): New function.
	(sh4_emit_mode_set, sh4_mode_needed): Hookify.
	(sh4_mode_after, sh4_mode_entry, sh4_mode_exit): Likewise.
	* config/i386/i386.h (MODE_NEEDED, MODE_AFTER, MODE_ENTRY): Delete
	(MODE_EXIT, MODE_PRIORITY_TO_MODE, EMIT_MODE_SET): Likewise.
	* config/i386/i386-protos.h (ix86_mode_needed, ix86_mode_after)
	(ix86_mode_entry, ix86_emit_mode_set): Remove external declaration.
	* config/i386/i386.c (ix86_mode_needed, ix86_mode_after, ix86_mode_exit)
	(ix86_mode_entry, ix86_mode_priority, ix86_emit_mode_set): Hookify.
	* config/epiphany/epiphany.h (MODE_NEEDED, MODE_AFTER, MODE_ENTRY):
	Delete
	(MODE_EXIT, MODE_PRIORITY_TO_MODE, EMIT_MODE_SET): Likewise.
	* config/epiphany/epiphany-protos.h (epiphany_mode_needed)
	(emit_set_fp_mode, epiphany_mode_entry_exit, epiphany_mode_after)
	(epiphany_mode_priority_to_mode): Remove declaration.
	* config/epiphany/epiphany.c (emit_set_fp_mode): Hookify.
	(epiphany_mode_needed, epiphany_mode_priority_to_mode): Likewise.
	(epiphany_mode_entry, epiphany_mode_exit, epiphany_mode_after):
	Likewise.
	(epiphany_mode_priority_to_mode): Change priority type. Hookify.
	(epiphany_mode_needed, epiphany_mode_entry_exit): Hookify.
	(epiphany_mode_after, epiphany_mode_entry, emit_set_fp_mode): Hookify.

--- gcc/config/epiphany/epiphany-protos.h	(revision 209415)
+++ gcc/config/epiphany/epiphany-protos.h	(working copy)
@@ -45,9 +45,7 @@ extern void emit_set_fp_mode (int entity, int mode
 extern void epiphany_insert_mode_switch_use (rtx insn, int, int);
 extern void epiphany_expand_set_fp_mode (rtx *operands);
 extern int epiphany_mode_needed (int entity, rtx insn);
-extern int epiphany_mode_entry_exit (int entity, bool);
 extern int epiphany_mode_after (int entity, int last_mode, rtx insn);
-extern int epiphany_mode_priority_to_mode (int entity, unsigned priority);
 extern bool epiphany_epilogue_uses (int regno);
 extern bool epiphany_optimize_mode_switching (int entity);
 extern bool epiphany_is_interrupt_p (tree);
--- gcc/config/epiphany/epiphany.c	(revision 209415)
+++ gcc/config/epiphany/epiphany.c	(working copy)
@@ -152,6 +152,20 @@ static rtx frame_insn (rtx);
 /* We further restrict the minimum to be a multiple of eight.  */
 #define TARGET_MIN_ANCHOR_OFFSET (optimize_size ? 0 : -2040)
 
+/* Mode switching hooks.  */
+
+#define TARGET_MODE_EMIT emit_set_fp_mode
+
+#define TARGET_MODE_NEEDED epiphany_mode_needed
+
+#define TARGET_MODE_PRIORITY epiphany_mode_priority
+
+#define TARGET_MODE_ENTRY epiphany_mode_entry
+
+#define TARGET_MODE_EXIT epiphany_mode_exit
+
+#define TARGET_MODE_AFTER epiphany_mode_after
+
 #include "target-def.h"
 
 #undef TARGET_ASM_ALIGNED_HI_OP
@@ -2306,8 +2320,8 @@ epiphany_optimize_mode_switching (int entity)
   gcc_unreachable ();
 }
 
-int
-epiphany_mode_priority_to_mode (int entity, unsigned priority)
+static int
+epiphany_mode_priority (int entity, int priority)
 {
   if (entity == EPIPHANY_MSW_ENTITY_AND || entity == EPIPHANY_MSW_ENTITY_OR
   || entity== EPIPHANY_MSW_ENTITY_CONFIG)
@@ -2415,7 +2429,7 @@ epiphany_mode_needed (int entity, rtx insn)
   }
 }
 
-int
+static int
 epiphany_mode_entry_exit (int entity, bool exit)
 {
   int normal_mode = epiphany_normal_fp_mode ;
@@ -2502,6 +2516,18 @@ epiphany_mode_after (int entity, int last_mode, rt
   return last_mode;
 }
 
+static int
+epiphany_mode_entry (int entity)
+{
+  return epiphany_mode_entry_exit (entity, false);
+}
+
+static int
+epiphany_mode_exit (int entity)
+{
+  return epiphany_mode_entry_exit (entity, true);
+}
+
 void
 emit_set_fp_mode (int entity, int mode, HARD_REG_SET regs_live ATTRIBUTE_UNUSED)
 {
--- gcc/config

[ PATCH] Extend mode-switching to support toggle (2/2)

2014-04-17 Thread Christian Bruel
And here is the hookized toggle support.

many thanks,

Christian



2014-04-02  Christian Bruel  <christian.br...@st.com>

	* target.def (mode_switching): New hook vector.
	(toggle_init, toggle_destroy, toggle_set, toggle_test):
	New mode toggle hooks.
	* targhooks.h (default_toggle_test): Declare.
	* basic-block.h (pre_edge_lcm_avs): Declare.
	* lcm.c (pre_edge_lcm_avs): Renamed from pre_edge_lcm.
	Call clear_aux_for_edges. Fix comments.
	(pre_edge_lcm): New wrapper function to call pre_edge_lcm_avs.
	(pre_edge_rev_lcm): Idem.
	* mode-switching.c (init_modes_infos): New function.
	(free_modes_infos): Likewise.
	(add_mode_set): Likewise.
	(get_mode): Likewise.
	(commit_mode_sets): Likewise.
	(merge_modes): Likewise.
	(optimize_mode_switching): Support mode toggle.
	(default_priority_to_mode, default_toggle_test): Define.
	* doc/tm.texi.in (TARGET_MODE_TOGGLE_INIT, TARGET_MODE_TOGGLE_TEST)
	(TARGET_MODE_TOGGLE_DESTROY, TARGET_MODE_TOGGLE_SET):
	 New target hooks.
	* doc/tm.texi: Regenerate.
	* config/sh/sh.c (sh4_toggle_init, sh4_toggle_destroy): Add hook and define.
	(sh4_toggle_set, sh4_toggle_test): Likewise.
	(mode_in_flip, mode_out_flip): Add bitmap to compute mode flipping.
	(TARGET_MODE_EMIT): New toggle parameter.
	* config/sh/sh.md (toggle_pr): Defined for TARGET_SH4_300 and TARGET_SH4A_FP.
	(in_delay_slot): fpscr_toggle don't go in delay slot.
	* config/i386/i386.c (ix86_emit_mode_set): Add bool unused parameter.
	* config/epiphany/epiphany.c (emit_set_fp_mode): Add bool unused parameter.

--- gcc/basic-block.h	2014-01-07 10:30:59.0 +0100
+++ /work1/bruel/superh_elf/gnu_trunk.devs/gcc/gcc/basic-block.h	2014-04-15 16:17:53.0 +0200
@@ -711,6 +711,9 @@
 extern struct edge_list *pre_edge_lcm (int, sbitmap *, sbitmap *,
    sbitmap *, sbitmap *, sbitmap **,
    sbitmap **);
+extern struct edge_list *pre_edge_lcm_avs (int, sbitmap *, sbitmap *,
+	   sbitmap *, sbitmap *, sbitmap *,
+	   sbitmap *, sbitmap **, sbitmap **);
 extern struct edge_list *pre_edge_rev_lcm (int, sbitmap *,
 	   sbitmap *, sbitmap *,
 	   sbitmap *, sbitmap **,
--- gcc/config/epiphany/epiphany.c	2014-04-17 13:23:48.0 +0200
+++ /work1/bruel/superh_elf/gnu_trunk.devs/gcc/gcc/config/epiphany/epiphany.c	2014-04-17 13:25:54.0 +0200
@@ -2529,7 +2529,8 @@
 }
 
 void
-emit_set_fp_mode (int entity, int mode, HARD_REG_SET regs_live ATTRIBUTE_UNUSED)
+emit_set_fp_mode (int entity, int mode, bool toggle ATTRIBUTE_UNUSED,
+		  HARD_REG_SET regs_live ATTRIBUTE_UNUSED)
 {
   rtx save_cc, cc_reg, mask, src, src2;
   enum attr_fp_mode fp_mode;
--- gcc/config/epiphany/epiphany-protos.h	2014-04-17 11:10:36.0 +0200
+++ /work1/bruel/superh_elf/gnu_trunk.devs/gcc/gcc/config/epiphany/epiphany-protos.h	2014-04-17 11:22:02.0 +0200
@@ -40,7 +40,8 @@
 extern void epiphany_init_expanders (void);
 extern int hard_regno_mode_ok (int regno, enum machine_mode mode);
 #ifdef HARD_CONST
-extern void emit_set_fp_mode (int entity, int mode, HARD_REG_SET regs_live);
+extern void emit_set_fp_mode (int entity, int mode,
+			  bool toggle ATTRIBUTE_UNUSED, HARD_REG_SET regs_live);
 #endif
 extern void epiphany_insert_mode_switch_use (rtx insn, int, int);
 extern void epiphany_expand_set_fp_mode (rtx *operands);
--- gcc/config/epiphany/resolve-sw-modes.c	2014-04-17 11:10:36.0 +0200
+++ /work1/bruel/superh_elf/gnu_trunk.devs/gcc/gcc/config/epiphany/resolve-sw-modes.c	2014-04-17 11:21:07.0 +0200
@@ -147,7 +147,7 @@
 	}
 	  start_sequence ();
 	  emit_set_fp_mode (EPIPHANY_MSW_ENTITY_ROUND_UNKNOWN,
-			jilted_mode, NULL);
+			jilted_mode, false, NULL);
 	  seq = get_insns ();
 	  end_sequence ();
 	  need_commit = true;
--- gcc/config/i386/i386.c	2014-04-17 13:02:49.0 +0200
+++ /work1/bruel/superh_elf/gnu_trunk.devs/gcc/gcc/config/i386/i386.c	2014-04-17 13:04:18.0 +0200
@@ -16409,7 +16409,8 @@
are to be inserted.  */
 
 static void
-ix86_emit_mode_set (int entity, int mode, HARD_REG_SET regs_live)
+ix86_emit_mode_set (int entity, int mode, bool toggle ATTRIBUTE_UNUSED,
+		HARD_REG_SET regs_live)
 {
   switch (entity)
 {
--- gcc/config/sh/sh.c	2014-04-17 13:23:07.0 +0200
+++ /work1/bruel/superh_elf/gnu_trunk.devs/gcc/gcc/config/sh/sh.c	2014-04-17 13:25:27.0 +0200
@@ -202,7 +202,7 @@
 static int calc_live_regs (HARD_REG_SET *);
 static HOST_WIDE_INT rounded_frame_size (int);
 static bool sh_frame_pointer_required (void);
-static void sh4_emit_mode_set (int, int, HARD_REG_SET);
+static void sh4_emit_mode_set (int, int, bool, HARD_REG_SET);
 static int sh4_mode_needed (int, rtx);
 static int sh4_mode_after (int, int, rtx);
 static int sh4_mode_entry (int);
@@ -590,9 +590,21 @@
 #undef TARGET_MODE_EXIT
 #define TARGET_MODE_EXIT sh4_mode_exit
 
+#undef TARGET_MODE_TOGGLE_INIT
+#define TARGET_MODE_TOGGLE_INIT sh4_toggle_init
+
 #undef TARGET_MODE_PRIORITY
 #define TARGET_MODE_PRIORITY sh4_mode_priority
 
+#undef

[PING PATCH] Extend mode-switching to support toggle

2014-04-15 Thread Christian Bruel
Hello,

I guess this is for the RTL maintainers. I'm also interested in comments
from the last contributors to mode-switching.c (from past ChangeLog entries):
http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00196.html

Many thanks

Christian


Re: [PING PATCH] Extend mode-switching to support toggle

2014-04-15 Thread Christian Bruel

On 04/15/2014 01:13 PM, Joern Rennecke wrote:
> On 15 April 2014 10:20, Christian Bruel <christian.br...@st.com> wrote:
>> Hello,
>>
>> I guess it's for RTL maintainers. Also interested by mode-switching.c
>> last contributors (from past ChangeLog entries) comments,
>>
>> http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00196.html
> This only helps if there are exactly two modes for an entity.
> An interface which extends EMIT_MODE_SET with a parameter for the known
> mode would be more versatile.  I.e. if you need to manipulate a control
> register with an AND and OR to set a specific mode with no knowledge of the
> previous mode, having the known previous mode allows to use a single add or
> xor to make the desired switch.  An unknown mode could be represented by
> no_mode or -1.
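
The point above about AND/OR versus a single XOR can be sketched as
follows. The two-bit field, its mask and the function names are
hypothetical; the example only shows the arithmetic, not any target's
actual control register layout.

```c
#include <stdint.h>

#define MODE_FIELD_MASK 0x3u	/* hypothetical two-bit mode field */

/* Previous mode unknown: clear the field, then set it (AND + OR).  */
uint32_t
set_mode_blind (uint32_t ctrl, uint32_t mode)
{
  return (ctrl & ~MODE_FIELD_MASK) | (mode & MODE_FIELD_MASK);
}

/* Previous mode known: PREV ^ MODE is a constant at the point where the
   switch is emitted, so a single XOR performs the switch.  */
uint32_t
set_mode_known_prev (uint32_t ctrl, uint32_t prev, uint32_t mode)
{
  return ctrl ^ ((prev ^ mode) & MODE_FIELD_MASK);
}
```

Both functions give the same register value as long as the field really
holds prev on entry, which is what the mode-switching pass would have to
guarantee before using the cheaper form.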

Yes, I didn't have 3-state (or more) toggling in mind. My
implementation only works for 2-state entities (flipping a bit on/off).
More than that would require not using an XOR, and would mean exposing the
test/set_toggle_status machinery to the machine description. This is
a limitation of my proposal; if it turns out to be a missing extension for
some target, we will need to move the test/set toggle machinery out of the
machine-independent part. That may be overkill as of today.

I agree that extending the current EMIT_MODE_SET might be more flexible
than a new EMIT_TOGGLE. I was hesitating between the two interfaces...
thanks for this point.

> While you are at it, you should also hookize the thing.

OK,


> FWIW, I have noted down some weaknesses/improvement opportunities of
> the mode switching pass in: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29349 .
> This also touches the fpchg issue

Thanks, I also have some examples where this can be improved; I hope to
revisit this problem one day.


[PATCH] Extend mode-switching to support toggle

2014-04-04 Thread Christian Bruel
Respin of a long standing forgotten patch
(http://gcc.gnu.org/ml/gcc-patches/2006-12/msg01562.html).

This patch extends the mode-switching pass to toggle a control register
status bit on/off instead of setting its value, e.g. on the SH4A, to
support the FPCHG instruction used to switch the FPU precision mode.

The idea is to use the AVOUT/AVIN information computed by LCM for each
mode on the CFG edges. When 2 modes for a given entity conflict, a
regular mode set is emitted. Otherwise, when the same mode is set for all
outgoing/incoming edges, a toggle is possible. The only important
general change is that we have to postpone the mode toggling (or
setting) until after the modes have been computed for each edge.
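
The toggle-versus-set decision described above can be sketched like this.
It is a simplified model working on a plain array of per-edge modes rather
than on the AVIN/AVOUT bitmaps; NO_MODE_SKETCH and both function names are
assumptions made for the illustration.

```c
#define NO_MODE_SKETCH (-1)

/* Return the mode shared by all N incoming edges, or NO_MODE_SKETCH if
   the edges disagree (or there are none).  */
int
common_incoming_mode (const int *edge_modes, int n)
{
  if (n == 0)
    return NO_MODE_SKETCH;

  int mode = edge_modes[0];
  for (int i = 1; i < n; i++)
    if (edge_modes[i] != mode)
      return NO_MODE_SKETCH;
  return mode;
}

/* A toggle is only usable when the incoming mode is known on every edge
   and differs from the mode NEEDED; a conflict requires a regular set.  */
int
can_toggle (const int *edge_modes, int n, int needed)
{
  int prev = common_incoming_mode (edge_modes, n);
  return prev != NO_MODE_SKETCH && prev != needed;
}
```

This also shows why the emission has to be postponed: the answer depends
on the modes of all edges, so none of them can be committed before the
whole set has been computed.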

Bootstrapped/regtested for x86_64-unknown-linux-gnu and
sh-none-elf/sh-linux with -m4a and -m4.

Comments appreciated. If OK, can it go in trunk/stage1?

many thanks,

2014-04-02  Christian Bruel  <christian.br...@st.com>

	* basic-block.h (pre_edge_lcm_avs): Declare.
	* doc/tm.texi (EMIT_MODE_TOGGLE): Document.
	* doc/tm.texi.in (EMIT_MODE_TOGGLE): Idem.
	* config/sh/sh.h (EMIT_MODE_TOGGLE): Define.
	* config/sh/sh-protos.h (emit_fpu_toggle): Declare.
	* config/sh/sh.c (emit_fpu_toggle): New function.
	* config/sh/sh.md (toggle_pr): Defined for TARGET_SH4_300 and
	TARGET_SH4A_FP.
	(in_delay_slot): fpscr_toggle don't go in delay slot.
	* lcm.c (pre_edge_lcm_avs): Renamed from pre_edge_lcm.
	Call clear_aux_for_edges. Fix comments.
	(pre_edge_lcm): New wrapper function to call pre_edge_lcm_avs.
	(pre_edge_rev_lcm): Idem.
	* mode-switching.c (init_modes_infos): New function.
	(free_modes_infos): Idem.
	(init_modes_infos): Idem
	(add_mode_set): Idem.
	(get_mode): Idem.
	(commit_mode_sets): Idem.
	(merge_modes): Idem.
	(set_flip_status): Idem
	(test_flip_status): Idem.
	(optimize_mode_switching): Add support to toggle modes.

2014-04-02  Christian Bruel  <christian.br...@st.com>

	* gcc.target/sh/fpchg1.c: New test.

Index: gcc/basic-block.h
===
--- gcc/basic-block.h	(revision 208956)
+++ gcc/basic-block.h	(working copy)
@@ -711,6 +711,9 @@ extern void bitmap_union_of_preds (sbitmap, sbitma
 extern struct edge_list *pre_edge_lcm (int, sbitmap *, sbitmap *,
    sbitmap *, sbitmap *, sbitmap **,
    sbitmap **);
+extern struct edge_list *pre_edge_lcm_avs (int, sbitmap *, sbitmap *,
+	   sbitmap *, sbitmap *, sbitmap *,
+	   sbitmap *, sbitmap **, sbitmap **);
 extern struct edge_list *pre_edge_rev_lcm (int, sbitmap *,
 	   sbitmap *, sbitmap *,
 	   sbitmap *, sbitmap **,
Index: gcc/config/sh/sh-protos.h
===
--- gcc/config/sh/sh-protos.h	(revision 208956)
+++ gcc/config/sh/sh-protos.h	(working copy)
@@ -210,6 +210,7 @@ extern bool check_use_sfunc_addr (rtx, rtx);
 #ifdef HARD_CONST
 extern void fpscr_set_from_mem (int, HARD_REG_SET);
 #endif
+extern bool emit_fpu_toggle (int);
 
 extern void sh_pr_interrupt (struct cpp_reader *);
 extern void sh_pr_trapa (struct cpp_reader *);
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 208956)
+++ gcc/config/sh/sh.c	(working copy)
@@ -10042,6 +10042,17 @@ get_free_reg (HARD_REG_SET regs_live)
   return gen_rtx_REG (Pmode, 7);
 }
 
+/* This function switches the fpscr.  */
+bool
+emit_fpu_toggle (int e ATTRIBUTE_UNUSED)
+{
+  emit_insn (gen_toggle_pr ());
+  if (TARGET_FMOVD)
+emit_insn (gen_toggle_sz ());
+
+  return true;
+}
+
 /* This function will set the fpscr from memory.
MODE is the mode we are setting it to.  */
 void
Index: gcc/config/sh/sh.h
===
--- gcc/config/sh/sh.h	(revision 208956)
+++ gcc/config/sh/sh.h	(working copy)
@@ -2259,6 +2259,9 @@ extern int current_function_interrupt;
 #define MODE_PRIORITY_TO_MODE(ENTITY, N) \
   ((TARGET_FPU_SINGLE != 0) ^ (N) ? FP_MODE_SINGLE : FP_MODE_DOUBLE)
 
+#define EMIT_MODE_TOGGLE(ENTITY, MODE) \
+  ((TARGET_SH4A_FP || TARGET_SH4_300) ? emit_fpu_toggle (ENTITY) : false)
+
 #define EMIT_MODE_SET(ENTITY, MODE, HARD_REGS_LIVE) \
   fpscr_set_from_mem ((MODE), (HARD_REGS_LIVE))
 
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 208956)
+++ gcc/config/sh/sh.md	(working copy)
@@ -504,6 +504,7 @@
 (define_attr "in_delay_slot" "yes,no"
   (cond [(eq_attr "type" "cbranch") (const_string "no")
 	 (eq_attr "type" "pcload,pcload_si") (const_string "no")
+	 (eq_attr "type" "fpscr_toggle") (const_string "no")
 	 (eq_attr "needs_delay_slot" "yes") (const_string "no")
 	 (eq_attr "length" "2") (const_string "yes")
 	 ] (const_string "no")))
@@ -12182,15 +12183,10 @@ label:
   "fschg"
   [(set_attr "type" "fpscr_toggle") (set_attr "fp_set" "unknown")])
 
-;; There's no way we can use it today, since optimize mode switching
-;; doesn't enable us to know from which mode we're switching to the
-;; mode

Re: [PATH, SH] Small builtin_strlen improvement

2014-03-31 Thread Christian Bruel

On 03/30/2014 11:02 PM, Oleg Endo wrote:
 Hi,

 On Wed, 2014-03-26 at 08:58 +0100, Christian Bruel wrote:

 This patch adds a few instructions to the inlined builtin_strlen to
 unroll the remaining bytes of the word-at-a-time loop. This gives
 2 distinct execution paths (no fall-thru into the byte-at-a-time loop),
 allowing block alignment to be assigned. This partially improves the
 problem reported by Oleg in [Bug target/0539] New: [SH] builtin
 string functions ignore loop and label alignment
 Actually, my original concern was the (mis)alignment of the 4 byte inner
 loop.  AFAIR it's better for the SH pipeline if the first insn of a loop
 is 4 byte aligned.

Yes, this is why I haven't closed the PR. IMHO the problem with the
non-aligned loop stems from the generic alignment code in final.c.
Changing branch frequencies also has quite an impact on BB reordering.
Further tuning of the static branch estimations, or of the LOOP_ALIGN
macro, is needed. Note that my branch estimations in this code are very
empirical; benchmarking with dynamic profiling would be nice as well.
My point was just that forcing a local .align in this code is a
workaround, as we should be able to rely on the generic reordering/align
code for this. So the tuning of loop alignment is a more global issue
(and well exhibited here indeed).


 whereas the test now expands (-O2 -m4) as
 mov r4,r0
 tst #3,r0
 mov r4,r2
 bf/s    .L12
 mov r4,r3
 mov #0,r2
 .L4:
 mov.l   @r4+,r1
 cmp/str r2,r1
 bf  .L4
 add #-4,r4
 mov.b   @r4,r1
 tst r1,r1
 bt  .L2
 add #1,r4
 mov.b   @r4,r1
 tst r1,r1
 bt  .L2
 add #1,r4
 mov.b   @r4,r1
 tst r1,r1
 mov #-1,r1
 negc    r1,r1
 add r1,r4
 .L2:
 mov r4,r0
 rts
 sub r3,r0
 .align 1
 .L12:
 mov.b   @r4+,r1
 tst r1,r1
 bf/s    .L12
 mov r2,r3
 add #1,r3
 mov r4,r0
 rts
 sub r3,r0


 The best gain I measured over the compact version is ~1% on a C++
 regular expression benchmark, but well, the code looks best this way.
 I haven't done any measurements but doesn't this introduce some
 performance regressions here and there due to the increased code size?
 Maybe the byte unrolling should not be done at -O2 but at -O3?

Maybe. Actually, as I said, I have seen only small improvements and
no regressions on my test base, which is already very large. This is
tuned for a 32-byte cache line so that the exit code fits.


 Moreover, post-inc addressing on the bytes could be used.  Ideally we'd
 get something like this:

Maybe. From what I measured (I tried this), it was slightly worse (but
only by peanuts). I preferred to avoid the final pointer fix-up, and
since we have a very small number of unrolled iterations (3), it counts.
This is a proposed implementation; there are always alternatives and I
might have missed some coding patterns. I'd be happy to benchmark other
codings if you have a patch to try.

Cheers,


 mov r4,r0
 tst #3,r0
 bf/s    .L12
 mov r4,r3
 mov #0,r2
 .L4:
 mov.l   @r4+,r1
 cmp/str r2,r1
 bf  .L4

 add #-4,r4

 mov.b   @r4+,r1
 tst r1,r1
 bt  .L2

 mov.b   @r4+,r1
 tst r1,r1
 bt  .L2

 mov.b   @r4+,r1
 tst r1,r1
 mov #-1,r1
 subc    r1,r4
 sett
 .L2:
 mov r4,r0
 rts
 subc    r3,r0
 .align 1
 .L12:
 mov.b   @r4+,r1
 tst r1,r1
 bf .L12

 mov r4,r0
 rts
 subc    r3,r0


 I'll have a look at the missed 'subc' cases.
 Cheers,
 Oleg




[PATH, SH] Small builtin_strlen improvement

2014-03-26 Thread Christian Bruel
Hello,

This patch adds a few instructions to the inlined builtin_strlen to
unroll the remaining bytes of the word-at-a-time loop. This gives
2 distinct execution paths (no fall-thru into the byte-at-a-time loop),
allowing block alignment to be assigned. This partially improves the
problem reported by Oleg in [Bug target/0539] New: [SH] builtin
string functions ignore loop and label alignment

whereas the test now expands (-O2 -m4) as
mov r4,r0
tst #3,r0
mov r4,r2
bf/s    .L12
mov r4,r3
mov #0,r2
.L4:
mov.l   @r4+,r1
cmp/str r2,r1
bf  .L4
add #-4,r4
mov.b   @r4,r1
tst r1,r1
bt  .L2
add #1,r4
mov.b   @r4,r1
tst r1,r1
bt  .L2
add #1,r4
mov.b   @r4,r1
tst r1,r1
mov #-1,r1
negc    r1,r1
add r1,r4
.L2:
mov r4,r0
rts
sub r3,r0
.align 1
.L12:
mov.b   @r4+,r1
tst r1,r1
bf/s    .L12
mov r2,r3
add #1,r3
mov r4,r0
rts
sub r3,r0


The best gain I measured over the compact version is ~1% on a C++
regular expression benchmark, but well, the code looks best this way.
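
For readers less familiar with SH assembly, the expansion above behaves
roughly like the following C model. This is only a sketch: strlen_model
and has_zero_byte are hypothetical names, the bit trick stands in for
cmp/str against zero, and the model assumes that reading a whole aligned
word containing the terminator is safe (true for the hardware, but not
guaranteed by ISO C for arbitrary objects).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any byte of W is zero (stand-in for cmp/str with 0).  */
static int
has_zero_byte (uint32_t w)
{
  return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}

/* Hypothetical C model of the inlined strlen expansion.  */
size_t
strlen_model (const char *s)
{
  const char *p = s;

  assert (s != NULL);

  if (((uintptr_t) p & 3) != 0)
    {
      /* Unaligned start: byte-at-a-time loop (.L12).  */
      while (*p != '\0')
	p++;
      return (size_t) (p - s);
    }

  /* Word-at-a-time loop (.L4).  */
  for (;;)
    {
      uint32_t w;
      memcpy (&w, p, sizeof w);
      p += 4;
      if (has_zero_byte (w))
	break;
    }

  /* Back up to the last word and test its first three bytes; if none
     is zero, the fourth one must be the terminator.  */
  p -= 4;
  if (p[0] == '\0')
    return (size_t) (p - s);
  p++;
  if (p[0] == '\0')
    return (size_t) (p - s);
  p++;
  if (p[0] != '\0')
    p++;			/* Done without a branch (negc/add).  */
  return (size_t) (p - s);
}
```

The two return paths correspond to the two distinct execution paths
mentioned above: the unrolled tail never falls through into the byte
loop.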

Regtested for -m2, -m4.

OK for trunk ?


2014-03-20  Christian Bruel  <christian.br...@st.com>

	* config/sh/sh-mem.cc (sh_expand_strlen): Unroll last word.

Index: gcc/config/sh/sh-mem.cc
===
--- gcc/config/sh/sh-mem.cc	(revision 208745)
+++ gcc/config/sh/sh-mem.cc	(working copy)
@@ -586,9 +586,35 @@ sh_expand_strlen (rtx *operands)
 
   emit_move_insn (current_addr, plus_constant (Pmode, current_addr, -4));
 
-  /* start byte loop.  */
   addr1 = adjust_address (addr1, QImode, 0);
 
+  /* unroll remaining bytes.  */
+  emit_insn (gen_extendqisi2 (tmp1, addr1));
+  emit_insn (gen_cmpeqsi_t (tmp1, const0_rtx));
+  jump = emit_jump_insn (gen_branch_true (L_return));
+  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+
+  emit_move_insn (current_addr, plus_constant (Pmode, current_addr, 1));
+
+  emit_insn (gen_extendqisi2 (tmp1, addr1));
+  emit_insn (gen_cmpeqsi_t (tmp1, const0_rtx));
+  jump = emit_jump_insn (gen_branch_true (L_return));
+  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+
+  emit_move_insn (current_addr, plus_constant (Pmode, current_addr, 1));
+
+  emit_insn (gen_extendqisi2 (tmp1, addr1));
+  emit_insn (gen_cmpeqsi_t (tmp1, const0_rtx));
+  jump = emit_jump_insn (gen_branch_true (L_return));
+  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+
+  emit_move_insn (current_addr, plus_constant (Pmode, current_addr, 1));
+
+  emit_insn (gen_extendqisi2 (tmp1, addr1));
+  jump = emit_jump_insn (gen_jump_compact (L_return));
+  emit_barrier_after (jump);
+
+  /* start byte loop.  */
   emit_label (L_loop_byte);
 
   emit_insn (gen_extendqisi2 (tmp1, addr1));
@@ -600,11 +626,12 @@ sh_expand_strlen (rtx *operands)
 
   /* end loop.  */
 
+  emit_insn (gen_addsi3 (start_addr, start_addr, GEN_INT (1)));
+
   emit_label (L_return);
 
-  emit_insn (gen_addsi3 (start_addr, start_addr, GEN_INT (1)));
-
   emit_insn (gen_subsi3 (operands[0], current_addr, start_addr));
 
   return true;
 }
+


[PATCH, SH] inline builtin_memset

2014-03-26 Thread Christian Bruel
Hello,

This patch inlines builtin_memset when the size is a constant with
128 > size > 15. Small sizes are better unrolled with mov_insn sequences.
Large sizes (or non-constant sizes) are better handled by a libc
implementation that does cache-line-aligned copying and unrolling or
prefetching.
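
As a rough guide to the patch below, this is a C model of the code that
the expander emits, under my reading of it: a word loop when the value
is 0 or -1, the count is above 8 and the destination is word aligned,
and a plain byte loop otherwise. setmem_model is a hypothetical name
used only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the inlined memset; COUNT must be nonzero.  */
void
setmem_model (unsigned char *dest, int val, size_t count)
{
  assert (dest != NULL && count > 0);

  if ((val == 0 || val == -1) && count > 8
      && ((uintptr_t) dest & 3) == 0)
    {
      /* All bytes of VAL are identical, so whole words can be stored.  */
      uint32_t word = (uint32_t) val;
      size_t words = count >> 2;

      do
	{
	  memcpy (dest, &word, sizeof word);
	  dest += 4;
	}
      while (--words != 0);

      /* The remaining count % 4 bytes are unrolled in the expansion.  */
      for (size_t tail = count % 4; tail != 0; tail--)
	*dest++ = (unsigned char) val;
      return;
    }

  /* Byte loop: decrement, store, branch while not zero.  */
  do
    *dest++ = (unsigned char) val;
  while (--count != 0);
}
```

For example, setmem_model (buf, 0, 18) on an aligned buffer takes the
word path (4 words, then 2 bytes), while any other fill value goes
through the byte loop.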

No new regressions for sh-none-elf and sh-linux-elf.

OK for trunk ?

many thanks,








2014-03-20  Christian Bruel  <christian.br...@st.com>

	* config/sh/sh.md (setmemqi): New expand pattern.
	(CLEAR_RATIO): Define.
	* config/sh/sh-mem.cc (sh_expand_setmem): Define.
	* config/sh/sh-protos.h (sh_expand_setmem): Declare.

2014-01-20  Christian Bruel  <christian.br...@st.com>

	* gcc.target/sh/memset.c: New test.

Index: gcc/config/sh/sh-mem.cc
===
--- gcc/config/sh/sh-mem.cc	(revision 208745)
+++ gcc/config/sh/sh-mem.cc	(working copy)
@@ -608,3 +608,106 @@ sh_expand_strlen (rtx *operands)
 
   return true;
 }
+
+/* Emit code to perform a memset
+
+   OPERANDS[0] is the destination.
+   OPERANDS[1] is the size;
+   OPERANDS[2] is the value to store.
+   OPERANDS[3] is the alignment.  */
+void
+sh_expand_setmem (rtx *operands)
+{
+  rtx L_loop_byte = gen_label_rtx ();
+  rtx L_loop_word = gen_label_rtx ();
+  rtx L_return = gen_label_rtx ();
+  rtx jump;
+  rtx dest = copy_rtx (operands[0]);
+  rtx dest_addr = copy_addr_to_reg (XEXP (dest, 0));
+  rtx val = force_reg (SImode, operands[2]);
+  int align = INTVAL (operands[3]);
+  int count = 0;
+  rtx len = force_reg (SImode, operands[1]);
+
+  if (! CONST_INT_P (operands[1]))
+return;
+
+  count = INTVAL (operands[1]);
+
+  if (CONST_INT_P (operands[2])
+   && (INTVAL (operands[2]) == 0 || INTVAL (operands[2]) == -1) && count > 8)
+{
+  rtx lenw = gen_reg_rtx (SImode);
+
+  if (align < 4)
+{
+  emit_insn (gen_tstsi_t (GEN_INT (3), dest_addr));
+  jump = emit_jump_insn (gen_branch_false (L_loop_byte));
+  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+}
+
+  /* word count. Do we have iterations ? */
+  emit_insn (gen_lshrsi3 (lenw, len, GEN_INT (2)));
+
+  dest = adjust_automodify_address (dest, SImode, dest_addr, 0);
+
+  /* start loop.  */
+  emit_label (L_loop_word);
+
+  if (TARGET_SH2)
+emit_insn (gen_dect (lenw, lenw));
+  else
+{
+  emit_insn (gen_addsi3 (lenw, lenw, GEN_INT (-1)));
+  emit_insn (gen_tstsi_t (lenw, lenw));
+}
+
+  emit_move_insn (dest, val);
+  emit_move_insn (dest_addr, plus_constant (Pmode, dest_addr,
+GET_MODE_SIZE (SImode)));
+
+
+  jump = emit_jump_insn (gen_branch_false (L_loop_word));
+  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+  count = count % 4;
+
+  dest = adjust_address (dest, QImode, 0);
+
+  val = gen_lowpart (QImode, val);
+
+  while (count--)
+{
+  emit_move_insn (dest, val);
+  emit_move_insn (dest_addr, plus_constant (Pmode, dest_addr,
+GET_MODE_SIZE (QImode)));
+}
+
+  jump = emit_jump_insn (gen_jump_compact (L_return));
+  emit_barrier_after (jump);
+}
+
+  dest = adjust_automodify_address (dest, QImode, dest_addr, 0);
+
+  /* start loop.  */
+  emit_label (L_loop_byte);
+
+  if (TARGET_SH2)
+emit_insn (gen_dect (len, len));
+  else
+{
+  emit_insn (gen_addsi3 (len, len, GEN_INT (-1)));
+  emit_insn (gen_tstsi_t (len, len));
+}
+
+  val = gen_lowpart (QImode, val);
+  emit_move_insn (dest, val);
+  emit_move_insn (dest_addr, plus_constant (Pmode, dest_addr,
+GET_MODE_SIZE (QImode)));
+
+  jump = emit_jump_insn (gen_branch_false (L_loop_byte));
+  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+
+  emit_label (L_return);
+
+  return;
+}
Index: gcc/config/sh/sh-protos.h
===
--- gcc/config/sh/sh-protos.h	(revision 208745)
+++ gcc/config/sh/sh-protos.h	(working copy)
@@ -119,6 +119,7 @@ extern void prepare_move_operands (rtx[], enum mac
 extern bool sh_expand_cmpstr (rtx *);
 extern bool sh_expand_cmpnstr (rtx *);
 extern bool sh_expand_strlen  (rtx *);
+extern void sh_expand_setmem (rtx *);
 extern enum rtx_code prepare_cbranch_operands (rtx *, enum machine_mode mode,
 	   enum rtx_code comparison);
 extern void expand_cbranchsi4 (rtx *operands, enum rtx_code comparison, int);
Index: gcc/config/sh/sh.h
===
--- gcc/config/sh/sh.h	(revision 208745)
+++ gcc/config/sh/sh.h	(working copy)
@@ -1594,6 +1594,11 @@ struct sh_args {
 
 #define SET_BY_PIECES_P(SIZE, ALIGN) STORE_BY_PIECES_P(SIZE, ALIGN)
 
+/* If a memory clear move would take CLEAR_RATIO or more simple
+   move-instruction pairs, we will do a setmem instead.  */
+
+#define CLEAR_RATIO(speed) ((speed

Re: [PATCH, SH] inline builtin_memset

2014-03-26 Thread Christian Bruel

On 03/26/2014 11:22 AM, Christian Bruel wrote:
 Hello,

 This patch inlines builtin_memset when the size is a constant with
 128 > size > 15. Small sizes are better unrolled with mov_insn sequences.
 Large sizes (or non-constant sizes) are better handled by a libc
 implementation that does cache-line-aligned copying and unrolling or
 prefetching.

Correction, it's memcpy that can do that, but nevertheless, a
specialized implementation in the glibc is better for big sizes (and
absorbs the cost of the jump).



 No new regressions for sh-none-elf and sh-linux-elf.

 OK for trunk ?

 many thanks,











Re: [PATCH ARM]: Fix more -mapcs-frame failures

2014-03-07 Thread Christian Bruel
Hi Ramana,

Thanks for your comments,

 Please respin using plus_constant instead of gen_addsi3. 

Here is my feeling about this:

I experimented with using plus_constant instead of gen_addsi3, but there
are cases where the emitted code is not equivalent for large frames
(!const_ok_for_op (val, PLUS)), which leads to complications.
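
To illustrate why large frames are awkward: in ARM state, the immediate
of an add/sub must be an 8-bit value rotated right by an even amount.
The sketch below only approximates what const_ok_for_op (val, PLUS)
accepts (the real check covers more cases, e.g. other instruction
sets); arm_imm_encodable and add_fits_one_insn are hypothetical names,
not GCC functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate V left by N bits (N taken modulo 32).  */
static uint32_t
rotl32 (uint32_t v, unsigned n)
{
  n &= 31;
  return n ? (v << n) | (v >> (32 - n)) : v;
}

/* True if V is an 8-bit value rotated right by an even amount, i.e. a
   valid ARM-state data-processing immediate.  */
bool
arm_imm_encodable (uint32_t v)
{
  for (unsigned rot = 0; rot < 32; rot += 2)
    if (rotl32 (v, rot) <= 0xFFu)
      return true;
  return false;
}

/* True if "reg = reg + AMOUNT" fits in a single add or sub.  */
bool
add_fits_one_insn (int32_t amount)
{
  uint32_t u = (uint32_t) amount;
  return arm_imm_encodable (u) || arm_imm_encodable (0u - u);
}
```

A small frame adjustment such as -12 fits a single sub, whereas an
offset like 0x12345 does not and has to be split into several
instructions.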

We could fix this case with a call to arm_split_constant (PLUS, Pmode,
NULL, amount, stack_pointer_rtx, stack_pointer_rtx, 0), but I'm not
sure we gain in clarity here. Also, for consistency, the same interface
change should preferably be made in the other parts of arm.c (that I
didn't modify) that share the same sequence, for instance
arm_expand_epilogue.

 Otherwise
 this looks good to me.

 Please repost updated patch and I will look at it again.

For the reasons expressed above, I'd prefer to consider this new change
as a separate development with a new patch.

For the time being (considering only the original apcs issue), is it OK
to apply the patch as is, and to handle a global
gen_addsi3/plus_constant interface review as a separate step?

Many thanks,

Christian


 Ramana



[PING] [PATCH ARM]: Fix more -mapcs-frame failures

2014-03-03 Thread Christian Bruel
http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01402.html

fixes -mapcs-frame -g ICEs.

ok for trunk ?





[PATCH ARM]: Fix more -mapcs-frame failures

2014-02-24 Thread Christian Bruel
This patch improves the one sent previously,
(http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01159.html),  to fix a few
more failures in the testsuite that could arise with shrink-wrap and
-fexceptions.

To recall, the problem it fixes is that with -mapcs-frame:

- the epilogue pops as

 sub     sp, fp, #12       @ does not set FRAME_RELATED_P
 ldmia   sp, {fp, sp, lr}  @ XXX assert: def_cfa->reg is FP instead of SP

- with VFP this is worse; we have

   fldmfdd ip!, {d8}                 @ FRAME_RELATED_P
   sub     sp, fp, #20               ...
   ldmfd   sp, {r3, r4, fp, sp, pc}  @ XXX assert: def_cfa->reg is IP instead of SP

This is fixed by inserting a REG_CFA_DEF_CFA note, fixing the
arm_unwind_emit machinery and setting FRAME_RELATED_P. The comment says:

/* The INSN is generated in epilogue.  It is set as RTX_FRAME_RELATED_P
   to get correct dwarf information for shrink-wrap.  We should not
   emit unwind information for it because these are used either for
   pretend arguments or notes to adjust sp and restore registers from
   stack.  */

The testsuite score improves without regression (improvements from -g
and -fexceptions tests):

=== gcc Summary for arm-sim//-mapcs-frame ===

# of expected passes            77545
# of unexpected failures        31
# of unexpected successes       2
# of expected failures          172
# of unsupported tests          1336

=== g++ Summary for arm-sim//-mapcs-frame ===

# of expected passes            50116
# of unexpected failures        9
# of unexpected successes       3
# of expected failures          280
# of unsupported tests          1229

instead of

=== gcc Summary for arm-sim//-mapcs-frame ===

# of expected passes            77106
# of unexpected failures        500
# of unexpected successes       2
# of expected failures          172
# of unresolved testcases       111
# of unsupported tests          1336

=== g++ Summary for arm-sim//-mapcs-frame ===

# of expected passes            50021
# of unexpected failures        136
# of unexpected successes       3
# of expected failures          280
# of unsupported tests          1229

Comments ? OK for trunk ?

Many thanks


2014-02-18  Christian Bruel  <christian.br...@st.com>

	PR target/60264
	* config/arm/arm.c (arm_emit_vfp_multi_reg_pop): Emit a	REG_CFA_DEF_CFA
	note.
	(arm_expand_epilogue_apcs_frame): call arm_add_cfa_adjust_cfa_note.
	(arm_unwind_emit): Allow REG_CFA_DEF_CFA.

2014-02-18  Christian Bruel  <christian.br...@st.com>

	PR target/60264
	* gcc.target/arm/pr60264.c
	* gcc.target/arm/pr60264-2.c

Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c	(revision 207942)
+++ gcc/config/arm/arm.c	(working copy)
@@ -19909,8 +19909,15 @@ arm_emit_vfp_multi_reg_pop (int first_reg, int num
   par = emit_insn (par);
   REG_NOTES (par) = dwarf;
 
-  arm_add_cfa_adjust_cfa_note (par, 2 * UNITS_PER_WORD * num_regs,
-			   base_reg, base_reg);
+  /* Make sure cfa doesn't leave with IP_REGNUM to allow unwinding from FP.  */
+  if (TARGET_VFP && REGNO (base_reg) == IP_REGNUM)
+{
+  RTX_FRAME_RELATED_P (par) = 1;
+  add_reg_note (par, REG_CFA_DEF_CFA, hard_frame_pointer_rtx);
+}
+  else
+arm_add_cfa_adjust_cfa_note (par, 2 * UNITS_PER_WORD * num_regs,
+ base_reg, base_reg);
 }
 
 /* Generate and emit a pattern that will be recognized as LDRD pattern.  If even
@@ -27098,15 +27105,19 @@ arm_expand_epilogue_apcs_frame (bool really_return
   if (TARGET_HARD_FLOAT && TARGET_VFP)
 {
   int start_reg;
+  rtx ip_rtx = gen_rtx_REG (SImode, IP_REGNUM);
 
   /* The offset is from IP_REGNUM.  */
   int saved_size = arm_get_vfp_saved_size ();
   if (saved_size > 0)
 {
+	  rtx insn;
   floats_from_frame += saved_size;
-  emit_insn (gen_addsi3 (gen_rtx_REG (SImode, IP_REGNUM),
- hard_frame_pointer_rtx,
- GEN_INT (-floats_from_frame)));
+  insn = emit_insn (gen_addsi3 (ip_rtx,
+	hard_frame_pointer_rtx,
+	GEN_INT (-floats_from_frame)));
+	  arm_add_cfa_adjust_cfa_note (insn, -floats_from_frame,
+   ip_rtx, hard_frame_pointer_rtx);
 }
 
   /* Generate VFP register multi-pop.  */
@@ -27179,11 +27190,15 @@ arm_expand_epilogue_apcs_frame (bool really_return
   num_regs = bit_count (saved_regs_mask);
   if ((offsets->outgoing_args != (1 + num_regs)) || cfun->calls_alloca)
 {
+  rtx insn;
   emit_insn (gen_blockage ());
   /* Unwind the stack to just below the saved registers.  */
-  emit_insn (gen_addsi3 (stack_pointer_rtx,
- hard_frame_pointer_rtx,
- GEN_INT (- 4 * num_regs)));
+  insn = emit_insn (gen_addsi3 (stack_pointer_rtx,
+hard_frame_pointer_rtx,
+GEN_INT (- 4 * num_regs)));
+
+  arm_add_cfa_adjust_cfa_note (insn, - 4 * num_regs,
+   stack_pointer_rtx, hard_frame_pointer_rtx

Re: [PATCH ARM]: Fix more -mapcs-frame failures

2014-02-24 Thread Christian Bruel

On 02/24/2014 11:11 AM, Zhenqiang Chen wrote:
 Please also check the two test cases in patch
 https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg72712.html

Just checked; they both pass now.

Cheers,



 Thanks!
 -Zhenqiang

 On 24 February 2014 17:11, Christian Bruel christian.br...@st.com wrote:
 This patch improves the one sent previously,
 (http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01159.html),  to fix a few
 more failures in the testsuite that could arise with shrink-wrap and
 -fexceptions.

 To recall, the problem it fixes is that with -mapcs-frame:

 - the epilogue pops as

  sub     sp, fp, #12       @ does not set FRAME_RELATED_P
  ldmia   sp, {fp, sp, lr}  @ XXX assert: def_cfa->reg is FP instead of SP

 - with VFP this is worse; we have

    fldmfdd ip!, {d8}                 @ FRAME_RELATED_P
    sub     sp, fp, #20               ...
    ldmfd   sp, {r3, r4, fp, sp, pc}  @ XXX assert: def_cfa->reg is IP instead of SP

 This is fixed by inserting a REG_CFA_DEF_CFA note, fixing the
 arm_unwind_emit machinery and setting FRAME_RELATED_P. The comment says:

 /* The INSN is generated in epilogue.  It is set as RTX_FRAME_RELATED_P
to get correct dwarf information for shrink-wrap.  We should not
emit unwind information for it because these are used either for
pretend arguments or notes to adjust sp and restore registers from
stack.  */

 The testsuite score improves without regression (improvements from -g
 and -fexceptions tests):

 === gcc Summary for arm-sim//-mapcs-frame ===

 # of expected passes            77545
 # of unexpected failures        31
 # of unexpected successes       2
 # of expected failures          172
 # of unsupported tests          1336

 === g++ Summary for arm-sim//-mapcs-frame ===

 # of expected passes            50116
 # of unexpected failures        9
 # of unexpected successes       3
 # of expected failures          280
 # of unsupported tests          1229

 instead of

 === gcc Summary for arm-sim//-mapcs-frame ===

 # of expected passes            77106
 # of unexpected failures        500
 # of unexpected successes       2
 # of expected failures          172
 # of unresolved testcases       111
 # of unsupported tests          1336

 === g++ Summary for arm-sim//-mapcs-frame ===

 # of expected passes            50021
 # of unexpected failures        136
 # of unexpected successes       3
 # of expected failures          280
 # of unsupported tests          1229

 Comments ? OK for trunk ?

 Many thanks





[PATCH ARM] Fix PR60264 (ICE in dwarf2out_frame_debug_adjust_cfa) part 2

2014-02-19 Thread Christian Bruel
Hello,

This patch is a followup of
http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01042.html

It fixes a bunch of ICEs in the testsuite run with
--target_board=arm-sim/\{-mapcs-frame\}, noticed on a reference branch
while testing the former patch.

One of the strange issues I had to deal with, for instance with
./gcc.c-torture/compile/991202-1.c, is that the epilogue emitted the
CFA notes in the following order:

(set/f (reg:SI 12 ip)
(plus:SI (reg:SI 12 ip)
(const_int 16 [0x10])))
(set/f (reg:DF 32 s16)
(mem/c:DF (reg:SI 12 ip) [3  S8 A64]))
(set/f (reg:DF 34 s18)
(mem/c:DF (plus:SI (reg:SI 12 ip)
(const_int 8 [0x8])) [3  S8 A64]))
]) /home/bruelc/tmp/991202-1.c:18 347
{*vfp_pop_multiple_with_writeback}
 (expr_list:REG_UNUSED (reg:SI 12 ip)
(expr_list:REG_CFA_ADJUST_CFA (set (reg:SI 12 ip)
(plus:SI (reg:SI 12 ip)
(const_int 16 [0x10])))
(expr_list:REG_CFA_DEF_CFA (reg/f:SI 11 fp)
(expr_list:REG_CFA_RESTORE (reg:DF 34 s18)
(expr_list:REG_CFA_RESTORE (reg:DF 32 s16)
(nil)))

but shrink-wrapping duplicates it as

(insn/f:TI 140 137 171 (parallel [
(set/f (reg:SI 12 ip)
(plus:SI (reg:SI 12 ip)
(const_int 16 [0x10])))
(set/f (reg:DF 32 s16)
(mem/c:DF (reg:SI 12 ip) [3  S8 A64]))
(set/f (reg:DF 34 s18)
(mem/c:DF (plus:SI (reg:SI 12 ip)
(const_int 8 [0x8])) [3  S8 A64]))
]) /home/bruelc/tmp/991202-1.c:18 347
{*vfp_pop_multiple_with_writeback}
 (expr_list:REG_UNUSED (reg:SI 12 ip)
(expr_list:REG_CFA_RESTORE (reg:DF 32 s16)
(expr_list:REG_CFA_RESTORE (reg:DF 34 s18)
(expr_list:REG_CFA_DEF_CFA (reg/f:SI 11 fp)
(expr_list:REG_CFA_ADJUST_CFA (set (reg:SI 12 ip)
(plus:SI (reg:SI 12 ip)
(const_int 16 [0x10])))
(nil)))

Since the CFA_RESTORE order is inverted with respect to CFA_DEF_CFA,
cur_cfa->reg was set to IP instead of FP.

I fixed this by not emitting the CFA_ADJUST_CFA note in this case, since
it is not needed anyway, as we have:

fldmfdd ip!, {d8-d9}      @ 140 *vfp_pop_multiple_with_writeback
sub     sp, fp, #12       @ 142
ldmfd   sp, {fp, sp, pc}  @ 143

so after @140 cur_cfa can't be IP.

Regression tested for armv7-a
--target_board=arm-sim/\{,-mapcs-frame\}. Fixes a large number of tests.

OK for trunk ?

Many thanks


--- config/arm/arm.c	2014-02-19 15:28:34.0 +0100
+++ /work1/bruel/superh_elf/gnu_trunk.devs/gcc/gcc/config/arm/arm.c	2014-02-19 14:30:44.0 +0100
@@ -19911,10 +19911,14 @@
 
   /* Make sure cfa doesn't leave with IP_REGNUM.  */
   if (TARGET_VFP && REGNO (base_reg) == IP_REGNUM)
-add_reg_note (par, REG_CFA_DEF_CFA, hard_frame_pointer_rtx);
+{
+  RTX_FRAME_RELATED_P (par) = 1;
+  add_reg_note (par, REG_CFA_DEF_CFA, hard_frame_pointer_rtx);
+}
+  else
+arm_add_cfa_adjust_cfa_note (par, 2 * UNITS_PER_WORD * num_regs,
+ base_reg, base_reg);
 
-  arm_add_cfa_adjust_cfa_note (par, 2 * UNITS_PER_WORD * num_regs,
-			   base_reg, base_reg);
 
 }
 
@@ -27109,8 +27113,8 @@
   if (saved_size > 0)
 {
 	  rtx insn;
-	  floats_from_frame += saved_size;
-	  insn = emit_insn (gen_addsi3 (gen_rtx_REG (SImode, IP_REGNUM),
+  floats_from_frame += saved_size;
+  insn = emit_insn (gen_addsi3 (gen_rtx_REG (SImode, IP_REGNUM),
 	hard_frame_pointer_rtx,
 	GEN_INT (-floats_from_frame)));
 	  RTX_FRAME_RELATED_P (insn) = 1;
@@ -27192,7 +27196,9 @@
   insn = emit_insn (gen_addsi3 (stack_pointer_rtx,
 hard_frame_pointer_rtx,
 GEN_INT (- 4 * num_regs)));
-  RTX_FRAME_RELATED_P (insn) = 1;
+
+  arm_add_cfa_adjust_cfa_note (insn, - 4 * num_regs,
+   stack_pointer_rtx, hard_frame_pointer_rtx);
 }
 
   arm_emit_multi_reg_pop (saved_regs_mask);


[PATCH ARM] Fix PR60264 (ICE in dwarf2out_frame_debug_adjust_cfa)

2014-02-18 Thread Christian Bruel
Hello,

Consider the attached trivial case, with the epilogue:

 sub     sp, fp, #12
 ldmia   sp, {fp, sp, lr}    frame_related_p

The sub instruction should also be frame_related_p (a gcc_assert
triggers in dwarf2out_frame_debug_adjust_cfa).

This patch sets RTX_FRAME_RELATED_P on stack restore instructions for
the -mapcs ABI.

A second problem arises with -mfloat-abi=hard, hidden by the above: a VFP
popping instruction in the epilogue (see the testcase from the PR) sets
the CFA register to IP, although the following instruction updates FP:

 fldmfdd ip!, {d8}           frame_related_p
 sub     sp, fp, #12         frame_related_p
 ldmia   sp, {fp, sp, lr}    frame_related_p

This patch adds a REG_CFA_DEF_CFA note so the sub instruction gets
FP as expected.

Regression tested for armv7-a
--target_board=arm-sim/\{,-mapcs-frame\}. Fixes a large number of
compilation errors in the testsuite.

OK for trunk ?

Many Thanks








Re: [PATCH ARM] Fix PR60264 (ICE in dwarf2out_frame_debug_adjust_cfa)

2014-02-18 Thread Christian Bruel
Probably easier to review with the patch attached...

2014-02-18  Christian Bruel  <christian.br...@st.com>

	PR target/60264
	* config/arm/arm.c (arm_emit_vfp_multi_reg_pop): Restore cfa register.
	(arm_expand_epilogue_apcs_frame): Set RTX_FRAME_RELATED_P.

2014-02-18  Christian Bruel  <christian.br...@st.com>

	PR target/60264
	* gcc.target/arm/pr60264.c
	* gcc.target/arm/pr60264-2.c

Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c	(revision 207817)
+++ gcc/config/arm/arm.c	(working copy)
@@ -19909,8 +19909,13 @@ arm_emit_vfp_multi_reg_pop (int first_reg, int num
   par = emit_insn (par);
   REG_NOTES (par) = dwarf;
 
+  /* Make sure cfa doesn't leave with IP_REGNUM.  */
+  if (TARGET_VFP && REGNO (base_reg) == IP_REGNUM)
+add_reg_note (par, REG_CFA_DEF_CFA, hard_frame_pointer_rtx);
+
   arm_add_cfa_adjust_cfa_note (par, 2 * UNITS_PER_WORD * num_regs,
 			   base_reg, base_reg);
+
 }
 
 /* Generate and emit a pattern that will be recognized as LDRD pattern.  If even
@@ -27103,10 +27108,12 @@ arm_expand_epilogue_apcs_frame (bool really_return
   int saved_size = arm_get_vfp_saved_size ();
   if (saved_size > 0)
 {
-  floats_from_frame += saved_size;
-  emit_insn (gen_addsi3 (gen_rtx_REG (SImode, IP_REGNUM),
- hard_frame_pointer_rtx,
- GEN_INT (-floats_from_frame)));
+	  rtx insn;
+	  floats_from_frame += saved_size;
+	  insn = emit_insn (gen_addsi3 (gen_rtx_REG (SImode, IP_REGNUM),
+	hard_frame_pointer_rtx,
+	GEN_INT (-floats_from_frame)));
+	  RTX_FRAME_RELATED_P (insn) = 1;
 }
 
   /* Generate VFP register multi-pop.  */
@@ -27179,11 +27186,13 @@ arm_expand_epilogue_apcs_frame (bool really_return
   num_regs = bit_count (saved_regs_mask);
   if ((offsets->outgoing_args != (1 + num_regs)) || cfun->calls_alloca)
 {
+  rtx insn;
   emit_insn (gen_blockage ());
   /* Unwind the stack to just below the saved registers.  */
-  emit_insn (gen_addsi3 (stack_pointer_rtx,
- hard_frame_pointer_rtx,
- GEN_INT (- 4 * num_regs)));
+  insn = emit_insn (gen_addsi3 (stack_pointer_rtx,
+hard_frame_pointer_rtx,
+GEN_INT (- 4 * num_regs)));
+  RTX_FRAME_RELATED_P (insn) = 1;
 }
 
   arm_emit_multi_reg_pop (saved_regs_mask);
Index: gcc/testsuite/gcc.target/arm/pr60264-2.c
===
--- gcc/testsuite/gcc.target/arm/pr60264-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/pr60264-2.c	(working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mapcs -mfloat-abi=hard -g" } */
+
+double bar(void);
+
+int foo(void)
+{
+  int i = bar() + bar();
+
+  return i;
+}
+
Index: gcc/testsuite/gcc.target/arm/pr60264.c
===
--- gcc/testsuite/gcc.target/arm/pr60264.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/pr60264.c	(working copy)
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-mapcs -g" } */
+
+void
+bar()
+{
+  foo();
+  foo();
+}


[PATCH, SH] fix builtin_strncmp

2014-01-24 Thread Christian Bruel
Hi,

This patch fixes a bug of mine where more bytes than needed were read
when processing the remaining bytes after an aligned word-at-a-time loop.
This case was caught neither by the gcc testsuite nor by the glibc
tests. A regression test is included.
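
To show the intended behaviour, here is a small C model of the fixed
scheme, assuming both buffers have at least n readable bytes. It is only
a sketch: strncmp_model, cmp_bytes and has_zero_byte are hypothetical
names, and the alignment checks of the real expansion are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any byte of W is zero.  */
static int
has_zero_byte (uint32_t w)
{
  return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}

/* Compare at most N bytes one at a time, stopping at a NUL.  */
static int
cmp_bytes (const unsigned char *a, const unsigned char *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (a[i] != b[i] || a[i] == 0)
      return (int) a[i] - (int) b[i];
  return 0;
}

/* Hypothetical model: n / 4 whole words, then exactly n % 4 bytes.  */
int
strncmp_model (const char *s1, const char *s2, size_t n)
{
  const unsigned char *a = (const unsigned char *) s1;
  const unsigned char *b = (const unsigned char *) s2;

  assert (s1 != NULL && s2 != NULL);

  for (size_t words = n / 4; words != 0; words--)
    {
      uint32_t w1, w2;
      memcpy (&w1, a, sizeof w1);
      memcpy (&w2, b, sizeof w2);
      if (w1 != w2 || has_zero_byte (w1))
	return cmp_bytes (a, b, 4);	/* Redo the last word bytewise.  */
      a += 4;
      b += 4;
    }

  /* Max iterations reached: only n % 4 bytes remain to be checked.  */
  return cmp_bytes (a, b, n % 4);
}
```

With "abcdef" and "abcdeg" and n = 5, the tail must look at one byte
only and report equality; looking at the sixth byte would be the bug
described above.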

No new regressions with the GCC and glibc testsuites. OK for trunk ?

thanks,
Christian




2014-01-23  Christian Bruel  <christian.br...@st.com>

	* config/sh/sh-mem.cc (sh_expand_cmpnstr): Fix remaining bytes after
	words comparisons.

2014-01-23  Christian Bruel  <christian.br...@st.com>

	* gcc.target/sh/torture/strncmp.c: New tests.

Index: gcc/config/sh/sh-mem.cc
===
--- gcc/config/sh/sh-mem.cc	(revision 206918)
+++ gcc/config/sh/sh-mem.cc	(working copy)
@@ -344,7 +344,6 @@ sh_expand_cmpnstr (rtx *operands)
 
   rtx L_loop_long = gen_label_rtx ();
   rtx L_end_loop_long = gen_label_rtx ();
-  rtx L_small = gen_label_rtx ();
 
   int align = INTVAL (operands[4]);
   int bytes = INTVAL (operands[3]);
@@ -403,34 +402,60 @@ sh_expand_cmpnstr (rtx *operands)
   jump = emit_jump_insn (gen_branch_false (L_loop_long));
   add_int_reg_note (jump, REG_BR_PROB, prob_likely);
 
+ int sbytes = bytes % 4;
+
   /* end loop.  Reached max iterations.  */
-  if (bytes % 4 == 0)
+  if (! sbytes)
 {
-  /* Done.  */
   jump = emit_jump_insn (gen_jump_compact (L_return));
   emit_barrier_after (jump);
 }
   else
 {
-  /* Remaining bytes to read.   */
-  jump = emit_jump_insn (gen_jump_compact (L_small));
+  /* Remaining bytes to check.  */
+
+  addr1 = adjust_automodify_address (addr1, QImode, s1_addr, 0);
+  addr2 = adjust_automodify_address (addr2, QImode, s2_addr, 0);
+
+  while (sbytes--)
+{
+  emit_insn (gen_extendqisi2 (tmp1, addr1));
+  emit_insn (gen_extendqisi2 (tmp2, addr2));
+
+  emit_insn (gen_cmpeqsi_t (tmp2, const0_rtx));
+  jump = emit_jump_insn (gen_branch_true (L_end_loop_byte));
+  add_int_reg_note (jump, REG_BR_PROB, prob_unlikely);
+
+  emit_insn (gen_cmpeqsi_t (tmp1, tmp2));
+  if (flag_delayed_branch)
+emit_insn (gen_zero_extendqisi2 (tmp2,
+ gen_lowpart (QImode,
+  tmp2)));
+  jump = emit_jump_insn (gen_branch_false (L_end_loop_byte));
+  add_int_reg_note (jump, REG_BR_PROB, prob_unlikely);
+
+  addr1 = adjust_address (addr1, QImode,
+  GET_MODE_SIZE (QImode));
+  addr2 = adjust_address (addr2, QImode,
+  GET_MODE_SIZE (QImode));
+}
+
+  jump = emit_jump_insn (gen_jump_compact( L_end_loop_byte));
   emit_barrier_after (jump);
 }
 
   emit_label (L_end_loop_long);
 
   /* Found last word.  Restart it byte per byte. */
-  bytes =  4;
+
   emit_move_insn (s1_addr, plus_constant (Pmode, s1_addr,
   -GET_MODE_SIZE (SImode)));
   emit_move_insn (s2_addr, plus_constant (Pmode, s2_addr,
   -GET_MODE_SIZE (SImode)));
+
+  /* fall thru.  */
 }
 
-  emit_label (L_small);
-
-  gcc_assert (bytes <= 7);
-
   addr1 = adjust_automodify_address (addr1, QImode, s1_addr, 0);
   addr2 = adjust_automodify_address (addr2, QImode, s2_addr, 0);
 
@@ -445,7 +470,8 @@ sh_expand_cmpnstr (rtx *operands)
 
   emit_insn (gen_cmpeqsi_t (tmp1, tmp2));
   if (flag_delayed_branch)
-emit_insn (gen_zero_extendqisi2 (tmp2, gen_lowpart (QImode, tmp2)));
+emit_insn (gen_zero_extendqisi2 (tmp2,
+ gen_lowpart (QImode, tmp2)));
   jump = emit_jump_insn (gen_branch_false (L_end_loop_byte));
   add_int_reg_note (jump, REG_BR_PROB, prob_unlikely);
 
Index: gcc/testsuite/gcc.target/sh/torture/strncmp.c
===
--- gcc/testsuite/gcc.target/sh/torture/strncmp.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/torture/strncmp.c	(working copy)
@@ -0,0 +1,22 @@
+/* { dg-do run } */
+
+extern void abort (void);
+
+const char *s="astc";
+const char *s1="-----BEGIN RSA PRIVATE KEY-----";
+const char *s2="atextaac";
+
+main()
+{
+  if (! __builtin_strncmp ("astb", s, 4))
+    abort();
+
+  if (__builtin_strncmp(s1, "-----BEGIN ", 11))
+    abort();
+
+  if (! __builtin_strncmp ("atextaacb", s2, 9))
+abort();
+
+  return 0;
+}
+


[PATCH, SH] Improve builtin strnlen for small lengths

2014-01-10 Thread Christian Bruel
Hello,

This patch unrolls the string compare for lengths < 8 and for the residual
bytes left after the word-at-a-time loop (with cmp/str), using the
base+offset addressing mode.
It also allows the builtin to be inlined for non-constant lengths.

No new regressions. Upgraded test case to handle former case.

OK for trunk ?

thanks






2014-01-09  Christian Bruel  christian.br...@st.com

	* gcc/config/sh/sh-mem.cc (sh_expand_cmpnstr): Unroll small sizes and
	  optimized non constant lengths.

2014-01-09  Christian Bruel  christian.br...@st.com

	* gcc.target/sh/cmpstrn.c: New case.

Index: gcc/config/sh/sh-mem.cc
===
--- gcc/config/sh/sh-mem.cc	(revision 206385)
+++ gcc/config/sh/sh-mem.cc	(working copy)
@@ -324,7 +324,6 @@ sh_expand_cmpnstr (rtx *operands)
   rtx addr2 = operands[2];
   rtx s1_addr = copy_addr_to_reg (XEXP (addr1, 0));
   rtx s2_addr = copy_addr_to_reg (XEXP (addr2, 0));
-  rtx tmp0 = gen_reg_rtx (SImode);
   rtx tmp1 = gen_reg_rtx (SImode);
   rtx tmp2 = gen_reg_rtx (SImode);
 
@@ -334,98 +333,128 @@ sh_expand_cmpnstr (rtx *operands)
   rtx L_end_loop_byte = gen_label_rtx ();
 
   rtx len = force_reg (SImode, operands[3]);
-  int constp = (CONST_INT_P (operands[3]));
-  int bytes = (constp ? INTVAL (operands[3]) : 0);
-  int witers = bytes / 4;
+  int constp = CONST_INT_P (operands[3]);
 
-  /* We could still loop on a register count. Not found very
- convincing to optimize yet.  */
-  if (! constp)
-return false;
+  /* Loop on a register count. */
+  if (constp)
+{
+  rtx tmp0 = gen_reg_rtx (SImode);
+  rtx tmp3 = gen_reg_rtx (SImode);
+  rtx lenw = gen_reg_rtx (SImode);
 
-  if (witers > 1)
-{
   rtx L_loop_long = gen_label_rtx ();
   rtx L_end_loop_long = gen_label_rtx ();
-  rtx tmp3 = gen_reg_rtx (SImode);
-  rtx lenw = gen_reg_rtx (SImode);
+  rtx L_small = gen_label_rtx ();
+
   int align = INTVAL (operands[4]);
+  int bytes = INTVAL (operands[3]);
+  int witers = bytes / 4;
 
-  emit_move_insn (tmp0, const0_rtx);
+  if (witers > 1)
+{
+  addr1 = adjust_automodify_address (addr1, SImode, s1_addr, 0);
+  addr2 = adjust_automodify_address (addr2, SImode, s2_addr, 0);
 
-  if (align < 4)
-	{
-	  emit_insn (gen_iorsi3 (tmp1, s1_addr, s2_addr));
-	  emit_insn (gen_tstsi_t (GEN_INT (3), tmp1));
-	  jump = emit_jump_insn (gen_branch_false (L_loop_byte));
-	  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
-	}
+  emit_move_insn (tmp0, const0_rtx);
 
-  addr1 = adjust_automodify_address (addr1, SImode, s1_addr, 0);
-  addr2 = adjust_automodify_address (addr2, SImode, s2_addr, 0);
+  if (align < 4)
+{
+  emit_insn (gen_iorsi3 (tmp1, s1_addr, s2_addr));
+  emit_insn (gen_tstsi_t (GEN_INT (3), tmp1));
+  jump = emit_jump_insn (gen_branch_false (L_loop_byte));
+  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+}
 
-  /* word count. Do we have iterations ? */
-  emit_insn (gen_lshrsi3 (lenw, len, GEN_INT (2)));
+  /* word count. Do we have iterations ? */
+  emit_insn (gen_lshrsi3 (lenw, len, GEN_INT (2)));
 
-  /*start long loop.  */
-  emit_label (L_loop_long);
+  /*start long loop.  */
+  emit_label (L_loop_long);
 
-  /* tmp2 is aligned, OK to load.  */
-  emit_move_insn (tmp2, addr2);
-  emit_move_insn (s2_addr, plus_constant (Pmode, s2_addr, 4));
+  /* tmp2 is aligned, OK to load.  */
+  emit_move_insn (tmp2, addr2);
+  emit_move_insn (s2_addr, plus_constant (Pmode, s2_addr, GET_MODE_SIZE (SImode)));
 
-  /* tmp1 is aligned, OK to load.  */
-  emit_move_insn (tmp1, addr1);
-  emit_move_insn (s1_addr, plus_constant (Pmode, s1_addr, 4));
+  /* tmp1 is aligned, OK to load.  */
+  emit_move_insn (tmp1, addr1);
+  emit_move_insn (s1_addr, plus_constant (Pmode, s1_addr, GET_MODE_SIZE (SImode)));
 
-  /* Is there a 0 byte ?  */
-  emit_insn (gen_andsi3 (tmp3, tmp2, tmp1));
+  /* Is there a 0 byte ?  */
+  emit_insn (gen_andsi3 (tmp3, tmp2, tmp1));
 
-  emit_insn (gen_cmpstr_t (tmp0, tmp3));
-  jump = emit_jump_insn (gen_branch_true (L_end_loop_long));
-  add_int_reg_note (jump, REG_BR_PROB, prob_unlikely);
+  emit_insn (gen_cmpstr_t (tmp0, tmp3));
+  jump = emit_jump_insn (gen_branch_true (L_end_loop_long));
+  add_int_reg_note (jump, REG_BR_PROB, prob_unlikely);
 
-  emit_insn (gen_cmpeqsi_t (tmp1, tmp2));
-  jump = emit_jump_insn (gen_branch_false (L_end_loop_long));
-  add_int_reg_note (jump, REG_BR_PROB, prob_unlikely);
+  emit_insn (gen_cmpeqsi_t (tmp1, tmp2));
+  jump = emit_jump_insn (gen_branch_false (L_end_loop_long));
+  add_int_reg_note (jump, REG_BR_PROB, prob_unlikely);
 
-  if (TARGET_SH2)
-	emit_insn (gen_dect (lenw, lenw

Re: [PATCH, SH] Implement builtin_strlen

2013-11-06 Thread Christian Bruel

On 11/05/2013 02:12 PM, Kaz Kojima wrote:
> Christian Bruel christian.br...@st.com wrote:
>> No regressions for sh-none-elf. OK for trunk ?
> OK.
>
> Regards,
>   kaz

Thanks, applied together with the cleanup referenced earlier and a
slight variable renaming (start_addr -> curr_addr, end_addr -> start_addr)
for readability, as obvious.

Christian


[PATCH, SH] Cleanup/simplify str builtins

2013-11-05 Thread Christian Bruel
Hello,

This patch is just for code simplification. The main changes are that I now
use adjust_address instead of adjust_automodify_address, and that the prob
variables are made global to ease tuning and avoid duplication.

No regression for sh-none-elf. Prerequisite for upcoming builtin_strlen
patch.

Many thanks

Christian





Index: gcc/ChangeLog
===
--- gcc/ChangeLog	(revision 204346)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2013-11-05  Christian Bruel  christian.br...@st.com
+
+	* gcc/config/sh/sh-mem.cc (sh_expand_cmpnstr, sh_expand_cmpstr):
+	Factorize probabilities, Use adjust_address instead of
+	adjust_automodify_address when possible. Enable for optimize.
+
 2013-11-04  Richard Sandiford  rdsandif...@googlemail.com
 
 	* config/avr/avr-log.c (avr_double_int_pop_digit): Delete.
Index: gcc/config/sh/sh-mem.cc
===
--- gcc/config/sh/sh-mem.cc	(revision 204346)
+++ gcc/config/sh/sh-mem.cc	(working copy)
@@ -179,32 +179,31 @@ expand_block_move (rtx *operands)
   return false;
 }
 
+static int prob_unlikely = REG_BR_PROB_BASE / 10;
+static int prob_likely = REG_BR_PROB_BASE / 4;
+
 /* Emit code to perform a strcmp.
 
OPERANDS[0] is the destination.
OPERANDS[1] is the first string.
OPERANDS[2] is the second string.
-   OPERANDS[3] is the align.  */
+   OPERANDS[3] is the known alignment.  */
 bool
 sh_expand_cmpstr (rtx *operands)
 {
-  rtx s1 = copy_rtx (operands[1]);
-  rtx s2 = copy_rtx (operands[2]);
-  rtx s1_addr = copy_addr_to_reg (XEXP (s1, 0));
-  rtx s2_addr = copy_addr_to_reg (XEXP (s2, 0));
+  rtx addr1 = operands[1];
+  rtx addr2 = operands[2];
+  rtx s1_addr = copy_addr_to_reg (XEXP (addr1, 0));
+  rtx s2_addr = copy_addr_to_reg (XEXP (addr2, 0));
   rtx tmp0 = gen_reg_rtx (SImode);
   rtx tmp1 = gen_reg_rtx (SImode);
   rtx tmp2 = gen_reg_rtx (SImode);
   rtx tmp3 = gen_reg_rtx (SImode);
 
+  rtx jump;
   rtx L_return = gen_label_rtx ();
   rtx L_loop_byte = gen_label_rtx ();
   rtx L_end_loop_byte = gen_label_rtx ();
-
-  rtx jump, addr1, addr2;
-  int prob_unlikely = REG_BR_PROB_BASE / 10;
-  int prob_likely = REG_BR_PROB_BASE / 4;
-
   rtx L_loop_long = gen_label_rtx ();
   rtx L_end_loop_long = gen_label_rtx ();
 
@@ -220,8 +219,8 @@ sh_expand_cmpstr (rtx *operands)
   add_int_reg_note (jump, REG_BR_PROB, prob_likely);
 }
 
-  addr1 = adjust_automodify_address (s1, SImode, s1_addr, 0);
-  addr2 = adjust_automodify_address (s2, SImode, s2_addr, 0);
+  addr1 = adjust_automodify_address (addr1, SImode, s1_addr, 0);
+  addr2 = adjust_automodify_address (addr2, SImode, s2_addr, 0);
 
   /* tmp2 is aligned, OK to load.  */
   emit_move_insn (tmp3, addr2);
@@ -276,8 +275,8 @@ sh_expand_cmpstr (rtx *operands)
   emit_move_insn (s2_addr, plus_constant (Pmode, s2_addr, -4));
 
   /* start byte loop.  */
-  addr1 = adjust_automodify_address (s1, QImode, s1_addr, 0);
-  addr2 = adjust_automodify_address (s2, QImode, s2_addr, 0);
+  addr1 = adjust_address (addr1, QImode, 0);
+  addr2 = adjust_address (addr2, QImode, 0);
 
   emit_label (L_loop_byte);
 
@@ -317,27 +316,23 @@ sh_expand_cmpstr (rtx *operands)
OPERANDS[1] is the first string.
OPERANDS[2] is the second string.
OPERANDS[3] is the length.
-   OPERANDS[4] is the align.  */
+   OPERANDS[4] is the known alignment.  */
 bool
 sh_expand_cmpnstr (rtx *operands)
 {
-  rtx s1 = copy_rtx (operands[1]);
-  rtx s2 = copy_rtx (operands[2]);
-
-  rtx s1_addr = copy_addr_to_reg (XEXP (s1, 0));
-  rtx s2_addr = copy_addr_to_reg (XEXP (s2, 0));
+  rtx addr1 = operands[1];
+  rtx addr2 = operands[2];
+  rtx s1_addr = copy_addr_to_reg (XEXP (addr1, 0));
+  rtx s2_addr = copy_addr_to_reg (XEXP (addr2, 0));
   rtx tmp0 = gen_reg_rtx (SImode);
   rtx tmp1 = gen_reg_rtx (SImode);
   rtx tmp2 = gen_reg_rtx (SImode);
 
+  rtx jump;
   rtx L_return = gen_label_rtx ();
   rtx L_loop_byte = gen_label_rtx ();
   rtx L_end_loop_byte = gen_label_rtx ();
 
-  rtx jump, addr1, addr2;
-  int prob_unlikely = REG_BR_PROB_BASE / 10;
-  int prob_likely = REG_BR_PROB_BASE / 4;
-
   rtx len = force_reg (SImode, operands[3]);
   int constp = (CONST_INT_P (operands[3]));
   int bytes = (constp ? INTVAL (operands[3]) : 0);
@@ -366,10 +361,10 @@ sh_expand_cmpnstr (rtx *operands)
 	  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
 	}
 
-  addr1 = adjust_automodify_address (s1, SImode, s1_addr, 0);
-  addr2 = adjust_automodify_address (s2, SImode, s2_addr, 0);
+  addr1 = adjust_automodify_address (addr1, SImode, s1_addr, 0);
+  addr2 = adjust_automodify_address (addr2, SImode, s2_addr, 0);
 
-  /* words count.  */
+  /* word count. Do we have iterations ? */
   emit_insn (gen_lshrsi3 (lenw, len, GEN_INT (2)));
 
   /*start long loop.  */
@@ -429,48 +424,48 @@ sh_expand_cmpnstr (rtx *operands)
   emit_move_insn (s2_addr, plus_constant (Pmode, s2_addr, -4));
 }
 
-addr1

[PATCH, SH] Implement builtin_strlen

2013-11-05 Thread Christian Bruel
Hello,

This patch inlines strlen when optimizing for speed.

A strlen body is now inlined as:

        mov     r4,r0
        tst     #3,r0
        bf/s    .L6
        mov     r4,r1
        mov     #0,r3
.L4:
        mov.l   @r1+,r2
        cmp/str r3,r2
        bf      .L4
        add     #-4,r1
.L6:
        mov.b   @r1+,r2
        tst     r2,r2
        bf/s    .L6
        sett
        mov     r1,r0
        rts
        subc    r4,r0

This gives a few percent performance improvement here and there for
regexp-based benchmarks, but worth highlighting is a 70% speedup for EEMBC
networking/qos, which now nicely combines sequences like !strncmp(*av,
any, strlen(*av))
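
For readers less familiar with SH assembly, the inlined sequence can be approximated in C as follows. This is only a sketch under stated assumptions: `has_zero_byte` is a hypothetical stand-in for `cmp/str` against a zero register, `strlen_sketch` is an illustrative name, and the word loop assumes that reading the whole aligned word containing the terminator is safe (true on the hardware, but in portable C it requires a padded buffer).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for "cmp/str r3,r2" with r3 == 0: true if any byte of w is 0.  */
static int has_zero_byte(uint32_t w)
{
    return !(w & 0x000000ffu) || !(w & 0x0000ff00u)
        || !(w & 0x00ff0000u) || !(w & 0xff000000u);
}

size_t strlen_sketch(const char *s)
{
    const char *p = s;

    /* tst #3,r0: take the word loop only when the pointer is aligned.  */
    if (((uintptr_t)p & 3) == 0) {
        uint32_t w;
        do {                    /* .L4: mov.l @r1+,r2 ; cmp/str r3,r2 */
            memcpy(&w, p, 4);
            p += 4;
        } while (!has_zero_byte(w));
        p -= 4;                 /* add #-4,r1: back to the word with the NUL */
    }

    /* .L6: byte loop with post-increment; p ends one past the NUL.  */
    while (*p++ != '\0')
        ;

    /* subc r4,r0 with T set computes r0 - r4 - 1.  */
    return (size_t)(p - s) - 1;
}
```

The final subtraction of one mirrors the `sett`/`subc` pair, which compensates for the post-increment having stepped past the terminator.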

No regressions for sh-none-elf. OK for trunk ?

Many thanks

Christian



2013-11-05  Christian Bruel  christian.br...@st.com

	* gcc/config/sh/sh-mem.cc (sh_expand_strlen): New function.
	* gcc/config/sh/sh-protos.h (sh_expand_strlen): Declare.
	* gcc/config/sh/sh.md (strlensi): New pattern.
	(UNSPEC_BUILTIN_STRLEN): Define.

2013-11-05  Christian Bruel  christian.br...@st.com

	* gcc.target/sh/strlen.c: New test.

diff --exclude='*~' --exclude=.svn -ruN gcc/config/sh/sh.md ../../gnu_trunk.test/gcc/gcc/config/sh/sh.md
--- gcc/config/sh/sh.md	2013-11-05 12:28:38.0 +0100
+++ ../../gnu_trunk.test/gcc/gcc/config/sh/sh.md	2013-11-05 11:16:00.0 +0100
@@ -161,6 +161,9 @@
   ;; (unspec [OFFSET ANCHOR] UNSPEC_PCREL_SYMOFF) == OFFSET - (ANCHOR - .).
   (UNSPEC_PCREL_SYMOFF	46)
 
+  ;; Misc builtins
+  (UNSPEC_BUILTIN_STRLEN 47)
+
   ;; These are used with unspec_volatile.
   (UNSPECV_BLOCKAGE	0)
   (UNSPECV_ALIGN	1)
@@ -12081,6 +12084,20 @@
 FAIL;
 })
 
+(define_expand "strlensi"
+  [(set (match_operand:SI 0 "register_operand")
+	(unspec:SI [(match_operand:BLK 1 "memory_operand")
+		   (match_operand:SI 2 "immediate_operand")
+		   (match_operand:SI 3 "immediate_operand")]
+		  UNSPEC_BUILTIN_STRLEN))]
+  "TARGET_SH1 && optimize"
+{
+ if (! optimize_insn_for_size_p () && sh_expand_strlen (operands))
+   DONE;
+ else
+   FAIL;
+})
+
 
 ;; -
 ;; Floating point instructions.
diff --exclude='*~' --exclude=.svn -ruN gcc/config/sh/sh-mem.cc ../../gnu_trunk.test/gcc/gcc/config/sh/sh-mem.cc
--- gcc/config/sh/sh-mem.cc	2013-11-05 12:30:33.0 +0100
+++ ../../gnu_trunk.test/gcc/gcc/config/sh/sh-mem.cc	2013-11-04 15:34:05.0 +0100
@@ -469,3 +469,83 @@
 
   return true;
 }
+
+/* Emit code to perform a strlen
+
+   OPERANDS[0] is the destination.
+   OPERANDS[1] is the string.
+   OPERANDS[2] is the char to search.
+   OPERANDS[3] is the alignment.  */
+bool
+sh_expand_strlen (rtx *operands)
+{
+  rtx addr1 = operands[1];
+  rtx start_addr = copy_addr_to_reg (XEXP (addr1, 0));
+  rtx end_addr = gen_reg_rtx (Pmode);
+  rtx tmp0 = gen_reg_rtx (SImode);
+  rtx tmp1 = gen_reg_rtx (SImode);
+  rtx L_return = gen_label_rtx ();
+  rtx L_loop_byte = gen_label_rtx ();
+
+  rtx jump;
+  rtx L_loop_long = gen_label_rtx ();
+  rtx L_end_loop_long = gen_label_rtx ();
+
+  int align = INTVAL (operands[3]);
+
+  emit_move_insn (operands[0], GEN_INT (-1));
+
+  /* remember start of string.  */
+  emit_move_insn (end_addr, start_addr);
+
+  if (align < 4)
+{
+  emit_insn (gen_tstsi_t (GEN_INT (3), start_addr));
+  jump = emit_jump_insn (gen_branch_false (L_loop_byte));
+  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+}
+
+  emit_move_insn (tmp0, operands[2]);
+
+  addr1 = adjust_automodify_address (addr1, SImode, start_addr, 0);
+
+  /*start long loop.  */
+  emit_label (L_loop_long);
+
+  /* tmp1 is aligned, OK to load.  */
+  emit_move_insn (tmp1, addr1);
+  emit_move_insn (start_addr, plus_constant (Pmode, start_addr, 4));
+
+  /* Is there a 0 byte ?  */
+  emit_insn (gen_cmpstr_t (tmp0, tmp1));
+
+  jump = emit_jump_insn (gen_branch_false (L_loop_long));
+  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+  /* end loop.  */
+
+  emit_label (L_end_loop_long);
+
+  emit_move_insn (start_addr, plus_constant (Pmode, start_addr, -4));
+
+  /* start byte loop.  */
+  addr1 = adjust_address (addr1, QImode, 0);
+
+  emit_label (L_loop_byte);
+
+  emit_insn (gen_extendqisi2 (tmp1, addr1));
+  emit_move_insn (start_addr, plus_constant (Pmode, start_addr, 1));
+
+  emit_insn (gen_cmpeqsi_t (tmp1, const0_rtx));
+  jump = emit_jump_insn (gen_branch_false (L_loop_byte));
+  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+
+  /* end loop.  */
+
+  emit_label (L_return);
+
+  emit_insn (gen_addsi3 (end_addr, end_addr, GEN_INT (1)));
+
+  emit_insn (gen_subsi3 (operands[0], start_addr, end_addr));
+
+  return true;
+}
diff --exclude='*~' --exclude=.svn -ruN gcc/config/sh/sh-protos.h ../../gnu_trunk.test/gcc/gcc/config/sh/sh-protos.h
--- gcc/config/sh/sh-protos.h	2013-11-05 12:47:44.0 +0100
+++ ../../gnu_trunk.test/gcc/gcc/config/sh/sh-protos.h	2013-11-05 10:14:48.0 +0100
@@ -118,6 +118,7 @@
 extern void prepare_move_operands (rtx[], enum machine_mode mode);
 extern bool

[PATCH, SH] Add support for inlined builtin_strncmp

2013-10-25 Thread Christian Bruel
Hello,

This patch implements the cmpstrnsi pattern to support the strncmp
builtin for constant lengths. The cmp/str instruction is used for sizes
>= 8 bytes; otherwise we fall back to the byte-at-a-time check to favor
small strings.

I now also handle the cases where the alignment is known for both cmpstr
and cmpstrn, so we can avoid the pointer check. I also added a scheduling
improvement that speculates the extu.b  r1,r1 instruction into the delay
slot, saving an instruction when the end of string is reached (we know
that r1 is 0 there). The byte-at-a-time loop becomes:

        mov.b   @r4+,r1
        tst     r1,r1
        bt/s    .L4
        mov.b   @r3+,r0
        cmp/eq  r1,r0
        bt/s    .L9
        extu.b  r1,r1
.L4:
        extu.b  r0,r0
        rts
        sub     r1,r0
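
The control flow of this loop can be modelled in C as below. This is a hedged sketch rather than compiler output: `byte_loop_sketch` is a hypothetical name, `p` stands for the stream read through r3 into r0, and `q` for the stream read through r4 into r1. Which of the two is the first strncmp argument is not stated in the listing, so the sketch does not assume it.

```c
/* p models the @r3+ stream (loaded into r0), q the @r4+ stream (loaded
   into r1).  Returns r0 - r1 on zero-extended bytes, as the final
   "sub r1,r0" does.  */
int byte_loop_sketch(const char *p, const char *q)
{
    unsigned char c0, c1;

    for (;;) {
        c1 = (unsigned char)*q++;   /* mov.b @r4+,r1 */
        c0 = (unsigned char)*p++;   /* mov.b @r3+,r0, in the delay slot,
                                       so it runs even when r1 is 0 */
        if (c1 == 0)                /* tst r1,r1 ; bt/s .L4 */
            break;                  /* r1 is 0: no extu.b needed for it */
        if (c0 != c1)               /* cmp/eq r1,r0 ; bt/s .L9 falls through */
            break;                  /* extu.b r1,r1 already done in the slot */
    }
    return (int)c0 - (int)c1;       /* extu.b r0,r0 ; sub r1,r0 */
}
```

The saving comes from the first exit: when the loop leaves because r1 is zero, zero-extending r1 would be a no-op, so the extension only needs to sit in the delay slot of the second branch.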

Enabled the existing execute/builtins/strncmp-2.c for functional check
and added 2 new target specific tests.

No regressions for -m2 and -m4 for sh-elf.
OK for trunk ?

Many thanks,

Christian





2013-10-27  Christian Bruel  christian.br...@st.com

	* gcc/config/sh/sh-mem.cc (sh_expand_cmpnstr): Moved here.
	(sh_expand_cmpstr): Handle known align and schedule improvements.
	* gcc/config/sh/sh-protos.h (sh_expand_cmpstrn): Declare.
	* gcc/config/sh/sh.md (cmpstrnsi): New pattern.

	* gcc.c-torture/execute/builtins/strncmp-2.c: Enable for SH.
	* gcc.target/sh/cmpstr.c: New test.
	* gcc.target/sh/cmpstrn.c: New test.

Index: config/sh/sh-mem.cc
===
--- config/sh/sh-mem.cc	(revision 204013)
+++ config/sh/sh-mem.cc	(working copy)
@@ -200,22 +200,25 @@ sh_expand_cmpstr (rtx *operands)
   rtx L_return = gen_label_rtx ();
   rtx L_loop_byte = gen_label_rtx ();
   rtx L_end_loop_byte = gen_label_rtx ();
-  rtx L_loop_long = gen_label_rtx ();
-  rtx L_end_loop_long = gen_label_rtx ();
 
   rtx jump, addr1, addr2;
   int prob_unlikely = REG_BR_PROB_BASE / 10;
   int prob_likely = REG_BR_PROB_BASE / 4;
 
-  emit_insn (gen_iorsi3 (tmp1, s1_addr, s2_addr));
-  emit_move_insn (tmp0, GEN_INT (3));
+  rtx L_loop_long = gen_label_rtx ();
+  rtx L_end_loop_long = gen_label_rtx ();
 
-  emit_insn (gen_tstsi_t (tmp0, tmp1));
+  int align = INTVAL (operands[3]);
 
   emit_move_insn (tmp0, const0_rtx);
 
-  jump = emit_jump_insn (gen_branch_false (L_loop_byte));
-  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+  if (align < 4)
+{
+  emit_insn (gen_iorsi3 (tmp1, s1_addr, s2_addr));
+  emit_insn (gen_tstsi_t (GEN_INT (3), tmp1));
+  jump = emit_jump_insn (gen_branch_false (L_loop_byte));
+  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+}
 
   addr1 = adjust_automodify_address (s1, SImode, s1_addr, 0);
   addr2 = adjust_automodify_address (s2, SImode, s2_addr, 0);
@@ -250,7 +253,7 @@ sh_expand_cmpstr (rtx *operands)
   add_int_reg_note (jump, REG_BR_PROB, prob_likely);
   /* end loop.  */
 
-  /* Fallthu, check if one of the word is greater.  */
+  /* Fallthu, diff results r.  */
   if (TARGET_LITTLE_ENDIAN)
 {
   rtx low_1 = gen_lowpart (HImode, tmp1);
@@ -267,15 +270,15 @@ sh_expand_cmpstr (rtx *operands)
   jump = emit_jump_insn (gen_jump_compact (L_return));
   emit_barrier_after (jump);
 
-  /* start byte loop.  */
-  addr1 = adjust_automodify_address (s1, QImode, s1_addr, 0);
-  addr2 = adjust_automodify_address (s2, QImode, s2_addr, 0);
-
   emit_label (L_end_loop_long);
 
   emit_move_insn (s1_addr, plus_constant (Pmode, s1_addr, -4));
   emit_move_insn (s2_addr, plus_constant (Pmode, s2_addr, -4));
 
+  /* start byte loop.  */
+  addr1 = adjust_automodify_address (s1, QImode, s1_addr, 0);
+  addr2 = adjust_automodify_address (s2, QImode, s2_addr, 0);
+
   emit_label (L_loop_byte);
 
   emit_insn (gen_extendqisi2 (tmp2, addr2));
@@ -289,13 +292,16 @@ sh_expand_cmpstr (rtx *operands)
   add_int_reg_note (jump, REG_BR_PROB, prob_unlikely);
 
   emit_insn (gen_cmpeqsi_t (tmp1, tmp2));
-  emit_jump_insn (gen_branch_true (L_loop_byte));
+  if (flag_delayed_branch)
+emit_insn (gen_zero_extendqisi2 (tmp2, gen_lowpart (QImode, tmp2)));
+  jump = emit_jump_insn (gen_branch_true (L_loop_byte));
   add_int_reg_note (jump, REG_BR_PROB, prob_likely);
   /* end loop.  */
 
   emit_label (L_end_loop_byte);
 
-  emit_insn (gen_zero_extendqisi2 (tmp2, gen_lowpart (QImode, tmp2)));
+  if (! flag_delayed_branch)
+emit_insn (gen_zero_extendqisi2 (tmp2, gen_lowpart (QImode, tmp2)));
   emit_insn (gen_zero_extendqisi2 (tmp1, gen_lowpart (QImode, tmp1)));
 
   emit_label (L_return);
@@ -305,3 +311,166 @@ sh_expand_cmpstr (rtx *operands)
   return true;
 }
 
+/* Emit code to perform a strcmp.
+
+   OPERANDS[0] is the destination.
+   OPERANDS[1] is the first string.
+   OPERANDS[2] is the second string.
+   OPERANDS[3] is the length.
+   OPERANDS[4] is the align.  */
+bool
+sh_expand_cmpnstr (rtx *operands)
+{
+  rtx s1 = copy_rtx (operands[1]);
+  rtx s2 = copy_rtx (operands[2]);
+
+  rtx s1_addr = copy_addr_to_reg (XEXP (s1, 0));
+  rtx s2_addr = copy_addr_to_reg

Re: [PATCH, SH] Add support for inlined builtin_strncmp

2013-10-25 Thread Christian Bruel
In the ChangeLog, the entry

* gcc/config/sh/sh-mem.cc (sh_expand_cmpnstr): Moved here.

should instead read

 * gcc/config/sh/sh-mem.cc (sh_expand_cmpnstr): New function.

Sorry for this,

Christian



Re: [PATCH, SH] Add support for inlined builtin-strcmp (2/2)

2013-10-20 Thread Christian Bruel
Hi Oleg,

On 10/19/2013 11:30 AM, Oleg Endo wrote:



> I've attached two test cases, tested with 
> make -k check-gcc RUNTESTFLAGS=sh.exp=strcmp* --target_board=sh-sim
> \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}
>
> Could you please include them?
>
> Cheers,
> Oleg


Thanks for having retested this. The tests are still not complete for
RTL-generated functions: there are cases where no cmp/str will be
emitted, because we can predict that the size is less than 4 and so fall
through directly into the byte-at-a-time loop.

Also, I will consider a size-optimized implementation, so that we don't
jump to the library.

I will post examples for this shortly (and add them as a testcase) with
a strncmp implementation helper, which by the way also pertains to strcmp
with constant strings. Please allow me some time to complete my benchmarking.

Thanks for the hint about removing the empty "" constraints in the expanders.

Kaz, before proceeding with the next patch, was your approval for 1/2
only or 2/2 with the expander cleanup ?

Many thanks,

Christian






Re: [PATCH, SH] Add support for inlined builtin-strcmp (1/2)

2013-10-18 Thread Christian Bruel

On 10/18/2013 12:53 AM, Oleg Endo wrote:
> Hi,
>
> On Thu, 2013-10-17 at 16:13 +0200, Christian Bruel wrote:
>> Hello,
>>
>> This patch just reorganizes the SH code used for memory builtins into
>> its own file, in preparation of the RTL strcmp hoisting in the next part.
>
> Since GCC is now being compiled as C++, it's probably better to name
> newly added source files .cc instead of .c.  Could you please rename the
> new file to sh-mem.cc?
>
> Thanks,
> Oleg

Hello Oleg,

I have no objection to renaming a pure C file with a C++ suffix.
I'll conform to whatever the general guidelines for pure C code are.

For now it doesn't seem to be the tendency.

grep -i "ew File" ChangeLog | grep .c:
* gimple-builder.c: New File.
* config/winnt-c.c: New file
* ipa-profile.c: New file.
* ubsan.c: New file.
* ipa-devirt.c: New file.
* vtable-verify.c: New file.
* config/arm/aarch-common.c: ... here.  New file.
* diagnostic-color.c: New file.
* config/linux-android.c: New file.

I haven't seen any reference to this in the GCC coding guidelines.
Should we prefer .cc, .cxx, .C, .cpp, .c++, ... ?

Also, I'm wondering if there is any plan to rename all files in the tree
so that we have a consistent source tree.

Is there a general recommendation from the maintainers ?

Many thanks

Christian





Re: [PATCH, SH] Add support for inlined builtin-strcmp (2/2)

2013-10-18 Thread Christian Bruel
On 10/18/2013 01:05 AM, Oleg Endo wrote:
> I was wondering, in file sh-mem.c, the new function
> 'sh4_expand_cmpstr' ... why is it SH4-something?  It's a bit confusing,
> since cmp/str has been around since ever (i.e. since SH1). Maybe just
> rename it to 'sh_expand_cmpstr' instead?

Just historical (SH4* are our primary SH platforms). The code is
enabled and tested for everything from SH1 up, of course. I will rename it. Thanks.

>  Maybe just
> rename it to 'sh_expand_cmpstr' instead?  The function always returns
> 'true', so maybe just make it return 'void'?

Yes, it's for genericity, as I plan to reuse and specialize the code based
on the count parameter for strncmp, which will be contributed next.

> Also, in the expander ...
>
> +  [(set (match_operand:SI 0 "register_operand" "")
> + (compare:SI (match_operand:BLK 1 "memory_operand" "")
>
> ... no need to use empty "" constraints

OK, thanks

Christian

> Cheers,
> Oleg




[PATCH, SH] Add support for inlined builtin-strcmp (1/2)

2013-10-17 Thread Christian Bruel
Hello,

This patch just reorganizes the SH code used for memory builtins into
its own file, in preparation for the RTL strcmp hoisting in the next part.

OK for trunk ?

Thanks

Christian





2013-10-17  Christian Bruel  christian.br...@st.com

	* config.gcc (sh-*): Add sh-mem.o to extra_obj.
	* gcc/config/sh/t-sh (sh-mem.o): New rule.
	* gcc/config/sh/sh-mem (expand_block_move): Moved here.
	* gcc/config/sh/sh.c (force_into, expand_block_move): Move to sh-mem.c

Index: gcc/config/sh/sh-mem.c
===
--- gcc/config/sh/sh-mem.c	(revision 0)
+++ gcc/config/sh/sh-mem.c	(working copy)
@@ -0,0 +1,176 @@
+/* Helper routines for memory move and comparison insns.
+   Copyright (C) 2013 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+http://www.gnu.org/licenses/.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "expr.h"
+#include "tm_p.h"
+
+/* Like force_operand, but guarantees that VALUE ends up in TARGET.  */
+static void
+force_into (rtx value, rtx target)
+{
+  value = force_operand (value, target);
+  if (! rtx_equal_p (value, target))
+emit_insn (gen_move_insn (target, value));
+}
+
+/* Emit code to perform a block move.  Choose the best method.
+
+   OPERANDS[0] is the destination.
+   OPERANDS[1] is the source.
+   OPERANDS[2] is the size.
+   OPERANDS[3] is the alignment safe to use.  */
+bool
+expand_block_move (rtx *operands)
+{
+  int align = INTVAL (operands[3]);
+  int constp = (CONST_INT_P (operands[2]));
+  int bytes = (constp ? INTVAL (operands[2]) : 0);
+
+  if (! constp)
+return false;
+
+  /* If we could use mov.l to move words and dest is word-aligned, we
+ can use movua.l for loads and still generate a relatively short
+ and efficient sequence.  */
+  if (TARGET_SH4A_ARCH && align < 4
+   && MEM_ALIGN (operands[0]) >= 32
+   && can_move_by_pieces (bytes, 32))
+{
+  rtx dest = copy_rtx (operands[0]);
+  rtx src = copy_rtx (operands[1]);
+  /* We could use different pseudos for each copied word, but
+	 since movua can only load into r0, it's kind of
+	 pointless.  */
+  rtx temp = gen_reg_rtx (SImode);
+  rtx src_addr = copy_addr_to_reg (XEXP (src, 0));
+  int copied = 0;
+
+  while (copied + 4 <= bytes)
+	{
+	  rtx to = adjust_address (dest, SImode, copied);
+	  rtx from = adjust_automodify_address (src, BLKmode,
+		src_addr, copied);
+
+	  set_mem_size (from, 4);
+	  emit_insn (gen_movua (temp, from));
+	  emit_move_insn (src_addr, plus_constant (Pmode, src_addr, 4));
+	  emit_move_insn (to, temp);
+	  copied += 4;
+	}
+
+  if (copied < bytes)
+	move_by_pieces (adjust_address (dest, BLKmode, copied),
+			adjust_automodify_address (src, BLKmode,
+		   src_addr, copied),
+			bytes - copied, align, 0);
+
+  return true;
+}
+
+  /* If it isn't a constant number of bytes, or if it doesn't have 4 byte
+ alignment, or if it isn't a multiple of 4 bytes, then fail.  */
+  if (align < 4 || (bytes % 4 != 0))
+return false;
+
+  if (TARGET_HARD_SH4)
+{
+  if (bytes < 12)
+	return false;
+  else if (bytes == 12)
+	{
+	  rtx func_addr_rtx = gen_reg_rtx (Pmode);
+	  rtx r4 = gen_rtx_REG (SImode, 4);
+	  rtx r5 = gen_rtx_REG (SImode, 5);
+
+	  function_symbol (func_addr_rtx, "__movmemSI12_i4", SFUNC_STATIC);
+	  force_into (XEXP (operands[0], 0), r4);
+	  force_into (XEXP (operands[1], 0), r5);
+	  emit_insn (gen_block_move_real_i4 (func_addr_rtx));
+	  return true;
+	}
+  else if (! optimize_size)
+	{
+	  const char *entry_name;
+	  rtx func_addr_rtx = gen_reg_rtx (Pmode);
+	  int dwords;
+	  rtx r4 = gen_rtx_REG (SImode, 4);
+	  rtx r5 = gen_rtx_REG (SImode, 5);
+	  rtx r6 = gen_rtx_REG (SImode, 6);
+
+	  entry_name = (bytes & 4 ? "__movmem_i4_odd" : "__movmem_i4_even");
+	  function_symbol (func_addr_rtx, entry_name, SFUNC_STATIC);
+	  force_into (XEXP (operands[0], 0), r4);
+	  force_into (XEXP (operands[1], 0), r5);
+
+	  dwords = bytes >> 3;
+	  emit_insn (gen_move_insn (r6, GEN_INT (dwords - 1)));
+	  emit_insn (gen_block_lump_real_i4 (func_addr_rtx));
+	  return true;
+	}
+  else
+	return false;
+}
+  if (bytes < 64)
+{
+  char entry[30];
+  rtx func_addr_rtx = gen_reg_rtx (Pmode);
+  rtx r4 = gen_rtx_REG (SImode, 4);
+  rtx r5 = gen_rtx_REG (SImode, 5);
+
+  sprintf (entry, "__movmemSI%d", bytes);
+  function_symbol

[PATCH, SH] Add support for inlined builtin-strcmp (2/2)

2013-10-17 Thread Christian Bruel
Hello,

This patch adds support to inline an optimized version of strcmp when
not optimizing for size. The generated code makes use of the cmp/str
instruction to test 4 bytes at a time when correctly aligned.

Note that a new pattern was added to match the cmp/str instruction, but
no attempt was made to catch it from combine.
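
To make the semantics of that pattern concrete, here is a small C model of what cmp/str computes into the T bit. It is an illustrative sketch only; the function name `cmp_str_model` is hypothetical and is not part of the patch.

```c
#include <stdint.h>

/* Hypothetical C model of "cmp/str Rm,Rn": returns 1 (T bit set) when at
   least one byte of a equals the byte at the same position in b.  */
int cmp_str_model(uint32_t a, uint32_t b)
{
    uint32_t temp = a ^ b;   /* bytes that are equal become 0x00 */

    /* T is clear only if all four byte lanes of the XOR are non-zero.  */
    return !((temp & 0xff000000u) && (temp & 0x00ff0000u)
             && (temp & 0x0000ff00u) && (temp & 0x000000ffu));
}
```

Comparing a loaded word against a register holding zero therefore answers "does this word contain a NUL byte?" in one instruction, which is what makes the four-bytes-at-a-time loop possible.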

This results in general cycle improvements (against both the newlib and
glibc implementations), one of which is a 10% cycle improvement for a
famous, still standard, strcmp-biased benchmark whose name starts with a D.

This optimization can be disabled with -fno-builtin-strcmp.

No regressions on sh4 in big and little endian, and on sh2 (sh3 and sh4a
are still running in big and little endian for sanity).

OK for trunk ?

Thanks

Christian



 
2013-10-17  Christian Bruel  christian.br...@st.com

	* gcc/config/sh/sh-mem.c (sh4_expand_cmpstr): New function.
	* gcc/config/sh/sh-protos.h (sh4_expand_cmpstr): Declare.
	* gcc/config/sh/sh.md (cmpstrsi, cmpstr_t): New patterns.
	(rotlhi3_8): Rename.

--- gcc/config/sh/sh.md	2013-10-17 15:14:18.0 +0200
+++ gcc-new/config/sh/sh.md	2013-10-16 16:13:49.0 +0200
@@ -31,9 +31,6 @@
 ;; ??? The MAC.W and MAC.L instructions are not supported.  There is no
 ;; way to generate them.
 
-;; ??? The cmp/str instruction is not supported.  Perhaps it can be used
-;; for a str* inline function.
-
 ;; BSR is not generated by the compiler proper, but when relaxing, it
 ;; generates .uses pseudo-ops that allow linker relaxation to create
 ;; BSR.  This is actually implemented in bfd/{coff,elf32}-sh.c
@@ -4037,7 +4034,7 @@
   DONE;
 })
 
-(define_insn "*rotlhi3_8"
+(define_insn "rotlhi3_8"
   [(set (match_operand:HI 0 "arith_reg_dest" "=r")
 	(rotate:HI (match_operand:HI 1 "arith_reg_operand" "r")
 		   (const_int 8)))]
@@ -11912,6 +11909,41 @@
   "jsr	@%0%#"
   [(set_attr "type" "sfunc")
    (set_attr "needs_delay_slot" "yes")])
+
+;; byte compare pattern
+;; temp = a ^ b;
+;; !((temp & 0xF000) && (temp & 0x0F00) && (temp & 0x00F0) && (temp & 0x000F))
+(define_insn "cmpstr_t"
+  [(set (reg:SI T_REG)
+	(eq:SI (and:SI
+		(and:SI
+		 (and:SI
+		  (zero_extract:SI (xor:SI (match_operand:SI 0 "arith_reg_operand" "r")
+					   (match_operand:SI 1 "arith_reg_operand" "r"))
+				   (const_int 8) (const_int 0))
+		  (zero_extract:SI (xor:SI (match_dup 0) (match_dup 1))
+				   (const_int 8) (const_int 8)))
+		  (zero_extract:SI (xor:SI (match_dup 0) (match_dup 1))
+				   (const_int 8) (const_int 16)))
+		(zero_extract:SI (xor:SI (match_dup 0) (match_dup 1))
+				 (const_int 8) (const_int 24))) (const_int 0)))]
+  "TARGET_SH1"
+  "cmp/str	%0,%1"
+  [(set_attr "type" "mt_group")])
+
+(define_expand "cmpstrsi"
+  [(set (match_operand:SI 0 "register_operand" "")
+	(compare:SI (match_operand:BLK 1 "memory_operand" "")
+		(match_operand:BLK 2 "memory_operand" "")))
+   (use (match_operand 3 "immediate_operand" ""))]
+  "TARGET_SH1"
+  ""
+{
+   if (! optimize_insn_for_size_p () && sh4_expand_cmpstr(operands))
+      DONE;
+   else FAIL;
+})
+
 
 ;; -
 ;; Floating point instructions.
diff -ru gcc/config/sh/sh-mem.c gcc-new/config/sh/sh-mem.c
--- gcc/config/sh/sh-mem.c	2013-10-17 14:59:02.0 +0200
+++ gcc-new/config/sh/sh-mem.c	2013-10-17 14:57:57.0 +0200
@@ -23,6 +23,7 @@
 #include "tm.h"
 #include "expr.h"
 #include "tm_p.h"
+#include "basic-block.h"
 
 /* Like force_operand, but guarantees that VALUE ends up in TARGET.  */
 static void
@@ -174,3 +175,130 @@
 
   return false;
 }
+
+/* Emit code to perform a strcmp.
+
+   OPERANDS[0] is the destination.
+   OPERANDS[1] is the first string.
+   OPERANDS[2] is the second string.
+   OPERANDS[3] is the align.  */
+bool
+sh4_expand_cmpstr (rtx *operands)
+{
+  rtx s1 = copy_rtx (operands[1]);
+  rtx s2 = copy_rtx (operands[2]);
+  rtx s1_addr = copy_addr_to_reg (XEXP (s1, 0));
+  rtx s2_addr = copy_addr_to_reg (XEXP (s2, 0));
+  rtx tmp0 = gen_reg_rtx (SImode);
+  rtx tmp1 = gen_reg_rtx (SImode);
+  rtx tmp2 = gen_reg_rtx (SImode);
+  rtx tmp3 = gen_reg_rtx (SImode);
+
+  rtx L_return = gen_label_rtx ();
+  rtx L_loop_byte = gen_label_rtx ();
+  rtx L_end_loop_byte = gen_label_rtx ();
+  rtx L_loop_long = gen_label_rtx ();
+  rtx L_end_loop_long = gen_label_rtx ();
+
+  rtx jump, addr1, addr2;
+  int prob_unlikely = REG_BR_PROB_BASE / 10;
+  int prob_likely = REG_BR_PROB_BASE / 4;
+
+  emit_insn (gen_iorsi3 (tmp1, s1_addr, s2_addr));
+  emit_move_insn (tmp0, GEN_INT (3));
+
+  emit_insn (gen_tstsi_t (tmp0, tmp1));
+
+  emit_move_insn (tmp0, const0_rtx);
+
+  jump = emit_jump_insn (gen_branch_false (L_loop_byte));
+  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+
+  addr1 = adjust_automodify_address (s1, SImode, s1_addr, 0);
+  addr2 = adjust_automodify_address (s2, SImode, s2_addr, 0);
+
+  /* tmp2 is aligned, OK to load.  */
+  emit_move_insn (tmp3, addr2);
+  emit_move_insn (s2_addr, plus_constant (Pmode, s2_addr, 4));
+
+  /*start long loop.  */
+  emit_label (L_loop_long);
+
+  emit_move_insn

Re: [SH] PR 51244 - Fix defects introduced in 4.8

2013-10-07 Thread Christian Bruel
Hi Oleg,

+/*
+This pass tries to optimize for example this:
+   mov.l   @(4,r4),r1
+   tst     r1,r1
+   movt    r1
+   tst     r1,r1
+   bt/s    .L5
+
+into something simpler:
+   mov.l   @(4,r4),r1
+   tst     r1,r1
+   bf/s    .L5
+
+Such sequences can be identified by looking for conditional branches and
+checking whether the ccreg is set before the conditional branch
+by testing another register for != 0, which was set by a ccreg store.
+This can be optimized by eliminating the redundant comparison and
+inverting the branch condition.  There can be multiple comparisons in
+different basic blocks that all end up in the redundant test insn before the
+conditional branch.  Some example RTL ...
+
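
To make the equivalence of the two quoted sequences concrete, here is a small C model. It is only a sketch: the T bit is modelled as a bool, each statement mirrors one instruction as annotated, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Original sequence: tst r1,r1; movt r1; tst r1,r1; bt/s .L5.
   Returns whether the branch to .L5 is taken.  */
static bool
branch_taken_original (int r1)
{
  bool t = (r1 == 0);	/* tst r1,r1: T = (r1 == 0) */
  r1 = t;		/* movt r1: r1 = T */
  t = (r1 == 0);	/* tst r1,r1: T = (r1 == 0) */
  return t;		/* bt/s .L5: branch if T is set */
}

/* Simplified sequence: tst r1,r1; bf/s .L5.  */
static bool
branch_taken_optimized (int r1)
{
  bool t = (r1 == 0);	/* tst r1,r1 */
  return !t;		/* bf/s .L5: branch if T is clear */
}

/* Hypothetical check that both models agree on a few sample values.  */
static void
check_equivalence (void)
{
  static const int samples[] = { 0, 1, -1, 42 };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert (branch_taken_original (samples[i])
	    == branch_taken_optimized (samples[i]));
}
```

The second test only negates the stored T bit, which is why dropping it and inverting the branch condition preserves the behaviour.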

Nice work on optimizing these sequences, where the T-bit values are not
recognized because of the branch directions. I have two questions:

1) I find the name "if-conversion" for this pass a little bit forced, since you
don't aim to remove branches. It looks more like some kind of extension of
value numbering.

2) I'm wondering to what extent this case could be handled by a more global,
generic target value numbering that handles boolean operations. Maybe it is
just a phasing problem, as the branch directions are not yet computed in
gimple-ssa, which would mean reworking it in RTL?

Many thanks,

Christian 



Re: [PATCH, SH4] Fix PR58475 insn swapb does not satisfy its constraints

2013-09-23 Thread Christian Bruel

On 09/20/2013 01:07 AM, Kaz Kojima wrote:
 Christian Bruel christian.br...@st.com wrote:
 This patch fixes the aforementioned PR by refusing FPUL_REG to be an
 acceptable reg for any arithmetic_operand on TARGET_SH4. (This was a
 strange SH4 singularity with regards to the SH family).

 The only impacted insn is movsf_ie used for reg-fpreg transfers. So the
 condition now mentions explicitly fpul_operand, allowing to simplify a
 bit the logic to match by removing the extra checks.

 The testsuite survived (no regression) for 
 -m2,-m2a,-m2a-nofpu,-m2a-single,-m2a-single-only,-m3,-m3e,-m4,-m4-single,-m4-single-only,-m4a,-m4a-single,-m4a-single-only

 No performance impact on a large number of benchmarks (CSIBE, EEMBC,
 Coremark, ...)

 sh4-linux-elf survived a full Linux distribution rebuild

 OK for trunk?
 OK.

 Regards,
   kaz

Committed. Note that this fixes the ICE:

testsuite/gcc.c-torture/compile/pr55921.c:21:1: error: insn does not
satisfy its constraints:
(insn 128 33 129 (parallel [
(set (reg:SF 65 fr1 [ cf ])
(reg:SF 3 r3))
(use (reg/v:PSI 151 ))
(clobber (scratch:SI))
])

for -m2a (but it still fails with "error: 'asm' operand requires impossible
reload", as it did previously with the other configurations).

Thanks

Christian









Re: [PATCH, committed] SH: Fix PR58314 (unsatisfied constraints)

2013-09-19 Thread Christian Bruel
Hi Kaz, Oleg,

On 09/19/2013 01:15 AM, Kaz Kojima wrote:
 Christian Bruel christian.br...@st.com wrote:
  && (!can_create_pseudo_p () && REG_P (operands[0]) && REG_P (operands[1]))

 is necessary ?
 It looks an another hack to allow the 2nd and 3rd alternatives only
 when reloading.  If so, it might be a bit cleaner to use a special
 predicate like


This still looks complicated to me. I have tested the attached patch on
sh-superh-elf and sh-linux. It just fixes the issue reported by
Richard, with no regressions and absolutely no differences in code
generation for CSIBe and a few other benchmarks (eembc, coremark, ...).
The spill alternatives are correctly selected and the original PR still
passes.

If OK, I'd like to apply it to trunk/4.8. If there is a need for an
additional hack, how about sending it separately?

Many thanks,

Christian
2013-09-13  Christian Bruel  christian.br...@st.com

	* config/sh/sh.md (mov<mode>_reg_reg): Use general_movd*_operand
	predicate and guard insn with reg only operand.

Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 202699)
+++ gcc/config/sh/sh.md	(working copy)
@@ -6894,9 +6894,11 @@ label:
 ;; reloading MAC subregs otherwise.  For that probably special patterns
 ;; would be required.
 (define_insn "*mov<mode>_reg_reg"
-  [(set (match_operand:QIHI 0 "arith_reg_dest" "=r,m,*z")
-	(match_operand:QIHI 1 "register_operand" "r,*z,m"))]
-  "TARGET_SH1 && !t_reg_operand (operands[1], VOIDmode)"
+  [(set (match_operand:QIHI 0 "general_movdst_operand" "=r,m,*z")
+	(match_operand:QIHI 1 "general_movsrc_operand" "r,*z,m"))]
+  "TARGET_SH1 && !t_reg_operand (operands[1], VOIDmode)
+    && arith_reg_dest (operands[0], <MODE>mode)
+    && register_operand (operands[1], <MODE>mode)"
   "@
 mov		%1,%0
 mov.<bw>	%1,%0


[PATCH, SH4] Fix PR58475 insn swapb does not satisfy its constraints

2013-09-19 Thread Christian Bruel
Hello,

This patch fixes the aforementioned PR by refusing FPUL_REG to be an
acceptable reg for any arithmetic_operand on TARGET_SH4. (This was a
strange SH4 singularity with regards to the SH family).

The only impacted insn is movsf_ie used for reg-fpreg transfers. So the
condition now mentions explicitly fpul_operand, allowing to simplify a
bit the logic to match by removing the extra checks.
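
The effect of the predicate change can be sketched in plain C. This is a simplified model, not the backend code: the register numbers are hypothetical placeholders, the TARGET_REGISTER_P test is left out, and the function names are invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register numbers, for illustration only; the real values
   are defined in the SH backend.  */
enum sketch_regno { R0_REG, T_REG, PR_REG, FPUL_REG, MACH_REG, MACL_REG };

/* Register check of arith_reg_operand before the patch.  */
static bool
arith_regno_ok_before (int regno, bool target_sh4)
{
  return (regno != T_REG && regno != PR_REG
	  && (regno != FPUL_REG || target_sh4)
	  && regno != MACH_REG && regno != MACL_REG);
}

/* Same check after the patch: FPUL_REG is rejected on every target.  */
static bool
arith_regno_ok_after (int regno)
{
  return (regno != T_REG && regno != PR_REG
	  && regno != FPUL_REG
	  && regno != MACH_REG && regno != MACL_REG);
}

/* The only behavioural difference is FPUL_REG when targeting SH4.  */
static void
check_fpul_change (void)
{
  assert (arith_regno_ok_before (FPUL_REG, true));
  assert (!arith_regno_ok_before (FPUL_REG, false));
  assert (!arith_regno_ok_after (FPUL_REG));
  assert (arith_regno_ok_after (R0_REG));
}
```

Because FPUL is no longer an arithmetic register, an insn such as movsf_ie has to name fpul_operand explicitly in its condition, which is what the sh.md hunk of the patch does.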

The testsuite survived (no regression) for 
-m2,-m2a,-m2a-nofpu,-m2a-single,-m2a-single-only,-m3,-m3e,-m4,-m4-single,-m4-single-only,-m4a,-m4a-single,-m4a-single-only

No performance impact on a large number of benchmarks (CSIBE, EEMBC,
Coremark, ...)

sh4-linux-elf survived a full Linux distribution rebuild

OK for trunk?

many thanks,

Christian


2013-09-19  Christian Bruel  christian.br...@st.com

	PR target/58475
	* config/sh/sh.md (movsf_ie): Allow fpul_operand.
	* config/sh/predicate.md (arith_reg_operand): Disallow FPUL_REG.

2013-09-19  Christian Bruel  christian.br...@st.com

	PR target/58475
	* gcc.target/sh/torture/pr58475.c: New test.

Index: gcc/config/sh/predicates.md
===
--- gcc/config/sh/predicates.md	(revision 202699)
+++ gcc/config/sh/predicates.md	(working copy)
@@ -154,7 +154,7 @@
 
   return (regno != T_REG && regno != PR_REG
 	  && ! TARGET_REGISTER_P (regno)
-	  && (regno != FPUL_REG || TARGET_SH4)
+	  && regno != FPUL_REG
 	  && regno != MACH_REG && regno != MACL_REG);
 }
   /* Allow a no-op sign extension - compare LOAD_EXTEND_OP.
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 202699)
+++ gcc/config/sh/sh.md	(working copy)
@@ -8203,15 +8205,9 @@ label:
(use (match_operand:PSI 2 fpscr_operand c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,c))
(clobber (match_scratch:SI 3 =X,X,Bsc,Bsc,z,X,X,X,X,X,X,X,X,y,X,X,X,X,X))]
   "TARGET_SH2E
-   && (arith_reg_operand (operands[0], SFmode)
-       || arith_reg_operand (operands[1], SFmode)
-       || arith_reg_operand (operands[3], SImode)
-       || (fpul_operand (operands[0], SFmode)
-	   && memory_operand (operands[1], SFmode)
-	   && GET_CODE (XEXP (operands[1], 0)) == POST_INC)
-       || (fpul_operand (operands[1], SFmode)
-	   && memory_operand (operands[0], SFmode)
-	   && GET_CODE (XEXP (operands[0], 0)) == PRE_DEC))"
+   && (arith_reg_operand (operands[0], SFmode) || fpul_operand (operands[0], SFmode)
+       || arith_reg_operand (operands[1], SFmode) || fpul_operand (operands[1], SFmode)
+       || arith_reg_operand (operands[3], SImode))"
   "@
 	fmov	%1,%0
 	mov	%1,%0

Index: gcc/testsuite/gcc.target/sh/torture/pr58475.c
===
--- gcc/testsuite/gcc.target/sh/torture/pr58475.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/torture/pr58475.c	(working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile { target sh*-*-* } } */
+
+int
+kerninfo(int __bsx, double tscale)
+{
+ return (
+	 (int)(__extension__
+	   ({
+		 ((((__bsx) & 0xff000000u) >> 24)
+		  | (((__bsx) & 0x00ff0000) >> 8)
+		  | (((__bsx) & 0x0000ff00) << 8)
+		  | (((__bsx) & 0x000000ff) << 24)
+		  ); }))
+	   * tscale);
+}


Re: [PATCH, committed] SH: Fix PR58314 (unsatisfied constraints)

2013-09-18 Thread Christian Bruel
Hi Richard,

On 09/16/2013 07:10 PM, Richard Sandiford wrote:
 Hi Christian,

 Christian Bruel christian.br...@st.com writes:
 @@ -6893,11 +6894,14 @@ label:
  ;; reloading MAC subregs otherwise.  For that probably special patterns
  ;; would be required.
  (define_insn *movmode_reg_reg
 -  [(set (match_operand:QIHI 0 arith_reg_dest =r)
 -(match_operand:QIHI 1 register_operand r))]
 +  [(set (match_operand:QIHI 0 arith_reg_dest =r,m,*z)
 +(match_operand:QIHI 1 register_operand r,*z,m))]
 If the constraints allow m, the predicates need to accept memories too.
 (It'd be worth having an insn condition that rejects both operands
 being memories though.)

 Thanks,
 Richard
Thanks for your comment,

I was wondering about this too when doing the fix. I felt that a memory
operand would be matched by the *movhi patterns below, and I wanted
to fix only the spilling case, where the original operand is a pseudo reg
that has matched the register predicate.
Since the predicate does not accept memory, I wonder why I never hit some
kind of "insn not found" error. Well, I'll try adding a memory
condition to the predicate, but I fear that the movhi patterns will stop
matching.

Cheers

Christian




Re: [PATCH, committed] SH: Fix PR58314 (unsatisfied constraints)

2013-09-18 Thread Christian Bruel
Hi Oleg,

On 09/18/2013 02:59 PM, Oleg Endo wrote:
 On Wed, 2013-09-18 at 09:55 +0200, Christian Bruel wrote:
 Hi Richard,

 On 09/16/2013 07:10 PM, Richard Sandiford wrote:
 Hi Christian,

 Christian Bruel christian.br...@st.com writes:
 @@ -6893,11 +6894,14 @@ label:
  ;; reloading MAC subregs otherwise.  For that probably special patterns
  ;; would be required.
  (define_insn *movmode_reg_reg
 -  [(set (match_operand:QIHI 0 arith_reg_dest =r)
 -  (match_operand:QIHI 1 register_operand r))]
 +  [(set (match_operand:QIHI 0 arith_reg_dest =r,m,*z)
 +  (match_operand:QIHI 1 register_operand r,*z,m))]
 If the constraints allow m, the predicates need to accept memories too.
 (It'd be worth having an insn condition that rejects both operands
 being memories though.)

 Thanks,
 Richard
 Thanks for your comment,

 I was wondering this too when doing the fix. I felt that a memory
 operand would be matched by the *movhi patterns bellow.  As  I wanted
 to fix only the spilling case, so the original operand is a pseudo reg
 having matched the register predicate.
 Without the predicate memory not found, I wonder how I never hit a kind
 of insn not found error,  well, 'll give a try to adding a memory
 condition in the predicate, 
 but I fear that the movhi patterns will stop
 to match,
 Yes, this will be the case.  The order of the movhi and movqi patterns
 in the md file is important.  To address the predicates vs. constraints
 issue, the following seems to work:

 (define_insn "*mov<mode>_reg_reg"
   [(set (match_operand:QIHI 0 "general_movsrc_operand" "=r,m,*z")
 	(match_operand:QIHI 1 "general_movdst_operand" "r,*z,m"))]
   "TARGET_SH1 && !t_reg_operand (operands[1], VOIDmode)
    && (arith_reg_operand (operands[0], <MODE>mode)
        || arith_reg_operand (operands[1], <MODE>mode))
    && (!can_create_pseudo_p () && REG_P (operands[0]) && REG_P (operands[1]))"
   "@
     mov	%1,%0
     mov.<bw>	%1,%0
     mov.<bw>	%1,%0"
   [(set_attr "type" "move,store,load")])

 .. at least it survives the test case for this PR.  I haven't done
 further tests.

Yes, I agree (although it seems a weird idea to have a predicate set
larger than what the insn can accept :-).

I was validating a similar, simpler change:

(define_insn "*mov<mode>_reg_reg"
  [(set (match_operand:QIHI 0 "general_movsrc_operand" "=r,m,*z")
	(match_operand:QIHI 1 "general_movdst_operand" "r,*z,m"))]
  "TARGET_SH1 && !t_reg_operand (operands[1], VOIDmode)
    && arith_reg_operand (operands[0], <MODE>mode)
    && arith_reg_operand (operands[1], <MODE>mode)"

Do you think your line:

    && (!can_create_pseudo_p () && REG_P (operands[0]) && REG_P (operands[1]))

is necessary?



 BTW, in the test case (gcc.target/sh/torture/pr58314.c), this 

 /* { dg-options -Os } */

 defeats the purpose of the torture tests.

Does it? I thought that dg-options would be forced in addition to
the torture option list (which should include -Os anyway, but it's just
to make sure). In my log I have:

PASS: gcc.target/sh/torture/pr58314.c  -O0  (test for excess errors)
PASS: gcc.target/sh/torture/pr58314.c  -O1  (test for excess errors)
PASS: gcc.target/sh/torture/pr58314.c  -O2  (test for excess errors)
PASS: gcc.target/sh/torture/pr58314.c  -O3 -funroll-loops  (test for
excess errors)
PASS: gcc.target/sh/torture/pr58314.c  -O3 -funroll-all-loops
-finline-functions  (test for excess errors)
PASS: gcc.target/sh/torture/pr58314.c  -O3 -g  (test for excess errors)
PASS: gcc.target/sh/torture/pr58314.c  -Os  (test for excess errors)

Cheers

Christian


 Cheers,
 Oleg




[PATCH, committed] SH: Fix PR58314 (unsatisfied constraints)

2013-09-13 Thread Christian Bruel
For 4.8 and 4.9

2013-09-13  Christian Bruel  christian.br...@st.com

	PR target/58314
	* config/sh/sh.md (movmode_reg_reg): Allow memory reloads.

2013-09-13  Christian Bruel  christian.br...@st.com

	PR target/58314
	* gcc.target/sh/torture/pr58314.c: New test.

Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 202556)
+++ gcc/config/sh/sh.md	(working copy)
@@ -6878,10 +6878,11 @@ label:
 ;; If movqi_reg_reg is specified as an alternative of movqi, movqi will be
 ;; selected to copy QImode regs.  If one of them happens to be allocated
 ;; on the stack, reload will stick to movqi insn and generate wrong
-;; displacement addressing because of the generic m alternatives.  
-;; With the movqi_reg_reg being specified before movqi it will be initially 
-;; picked to load/store regs.  If the regs regs are on the stack reload will
-;; try other insns and not stick to movqi_reg_reg.
+;; displacement addressing because of the generic m alternatives.
+;; With the movqi_reg_reg being specified before movqi it will be initially
+;; picked to load/store regs.  If the regs regs are on the stack reload
+;; try other insns and not stick to movqi_reg_reg, unless there were spilled
+;; pseudos in which case 'm' constraints pertain.
 ;; The same applies to the movhi variants.
 ;;
 ;; Notice, that T bit is not allowed as a mov src operand here.  This is to
@@ -6893,11 +6894,14 @@ label:
 ;; reloading MAC subregs otherwise.  For that probably special patterns
 ;; would be required.
 (define_insn "*mov<mode>_reg_reg"
-  [(set (match_operand:QIHI 0 "arith_reg_dest" "=r")
-	(match_operand:QIHI 1 "register_operand" "r"))]
+  [(set (match_operand:QIHI 0 "arith_reg_dest" "=r,m,*z")
+	(match_operand:QIHI 1 "register_operand" "r,*z,m"))]
   "TARGET_SH1 && !t_reg_operand (operands[1], VOIDmode)"
-  "mov	%1,%0"
-  [(set_attr "type" "move")])
+  "@
+    mov		%1,%0
+    mov.<bw>	%1,%0
+    mov.<bw>	%1,%0"
+  [(set_attr "type" "move,store,load")])
 
 ;; FIXME: The non-SH2A and SH2A variants should be combined by adding
 ;; enabled attribute as it is done in other targets.
Index: gcc/testsuite/gcc.target/sh/torture/pr58314.c
===
--- gcc/testsuite/gcc.target/sh/torture/pr58314.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/torture/pr58314.c	(working copy)
@@ -0,0 +1,102 @@
+/* { dg-do compile { target sh*-*-* } } */
+/* { dg-options "-Os" } */
+
+typedef unsigned short __u16;
+typedef unsigned int __u32;
+
+typedef signed short s16;
+
+
+static inline __attribute__((always_inline)) __attribute__((__const__)) __u16 __arch_swab16(__u16 x)
+{
+ __asm__(
+  "swap.b		%1, %0"
+  : "=r" (x)
+  : "r" (x));
+ return x;
+}
+
+void u16_add_cpu(__u16 *var)
+{
+  *var = __arch_swab16(*var);
+}
+
+typedef struct xfs_mount {
+ int m_attr_magicpct;
+} xfs_mount_t;
+
+typedef struct xfs_da_args {
+ struct xfs_mount *t_mountp;
+ int index;
+} xfs_da_args_t;
+
+typedef struct xfs_dabuf {
+ void *data;
+} xfs_dabuf_t;
+
+typedef struct xfs_attr_leaf_map {
+ __u16 base;
+ __u16 size;
+} xfs_attr_leaf_map_t;
+typedef struct xfs_attr_leaf_hdr {
+ __u16 count;
+ xfs_attr_leaf_map_t freemap[3];
+} xfs_attr_leaf_hdr_t;
+
+typedef struct xfs_attr_leaf_entry {
+  __u16 nameidx;
+} xfs_attr_leaf_entry_t;
+
+typedef struct xfs_attr_leafblock {
+ xfs_attr_leaf_hdr_t hdr;
+ xfs_attr_leaf_entry_t entries[1];
+} xfs_attr_leafblock_t;
+
+int
+xfs_attr_leaf_remove(xfs_attr_leafblock_t *leaf, xfs_da_args_t *args)
+{
+ xfs_attr_leaf_hdr_t *hdr;
+ xfs_attr_leaf_map_t *map;
+ xfs_attr_leaf_entry_t *entry;
+ int before, after, smallest, entsize;
+ int tablesize, tmp, i;
+ xfs_mount_t *mp;
+ hdr = &leaf->hdr;
+ mp = args->t_mountp;
+
+ entry = &leaf->entries[args->index];
+
+ tablesize = __arch_swab16(hdr->count);
+
+ map = &hdr->freemap[0];
+ tmp = map->size;
+ before = after = -1;
+ smallest = 3 - 1;
+ entsize = xfs_attr_leaf_entsize(leaf, args->index);
+
+ for (i = 0; i < 2; map++, i++) {
+
+  if (map->base == tablesize)
+    u16_add_cpu(&map->base);
+
+  if (__arch_swab16(map->base)  + __arch_swab16(map->size)  == __arch_swab16(entry->nameidx))
+   before = i;
+  else if (map->base == entsize)
+   after = i;
+  else if (__arch_swab16(map->size) < tmp)
+   smallest = i;
+ }
+
+ if (before >= 0)
+  {
+   map = &hdr->freemap[after];
+   map->base = entry->nameidx;
+
+  }
+
+  map = &hdr->freemap[smallest];
+
+  map->base = __arch_swab16(entry->nameidx);
+
+ return(tmp < mp->m_attr_magicpct);
+}


[PATCH, SH4] Fix PR58314 (unsatisfied constraints)

2013-09-12 Thread Christian Bruel
The attached patch fixes an ICE while building the Linux kernel, reduced
in the included testcase.

The problem is that we are generating a movhi_reg_reg insn that accepts
only registers as operands. Spilling a pseudo to the stack then results in
invalid memory load/store constraints.

The attached patch allows memory for reload.
Tested with the testsuite on sh4-linux and sh-superh-gcc.
No performance impact on a large number of benchmarks (EEMBC, CSIBe,
spec2006, ...)

Oleg, since you moved the r,r constraints out of *movhi into
movhi_reg_reg, do you agree?

OK for trunk and 4.8?

Thanks

Christian


2013-09-13  Christian Bruel  christian.br...@st.com

	PR target/58314
	* config/sh/sh.md (movmode_reg_reg): Allow memory for reload.

2013-09-13  Christian Bruel  christian.br...@st.com

	PR target/58314
	* gcc.target/sh/pr58314.c: New test.

Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 202517)
+++ gcc/config/sh/sh.md	(working copy)
@@ -6893,11 +6893,14 @@ label:
 ;; reloading MAC subregs otherwise.  For that probably special patterns
 ;; would be required.
 (define_insn "*mov<mode>_reg_reg"
-  [(set (match_operand:QIHI 0 "arith_reg_dest" "=r")
-	(match_operand:QIHI 1 "register_operand" "r"))]
+  [(set (match_operand:QIHI 0 "arith_reg_dest" "=r,m,*z")
+	(match_operand:QIHI 1 "register_operand" "r,*z,m"))]
   "TARGET_SH1 && !t_reg_operand (operands[1], VOIDmode)"
-  "mov	%1,%0"
-  [(set_attr "type" "move")])
+  "@
+    mov		%1,%0
+    mov.<bw>	%1,%0
+    mov.<bw>	%1,%0"
+  [(set_attr "type" "move,store,load")])
 
 ;; FIXME: The non-SH2A and SH2A variants should be combined by adding
 ;; enabled attribute as it is done in other targets.
Index: gcc/testsuite/gcc.target/sh/pr58314.c
===
--- gcc/testsuite/gcc.target/sh/pr58314.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/pr58314.c	(working copy)
@@ -0,0 +1,102 @@
+/* { dg-do compile { target sh*-*-* } } */
+/* { dg-options "-Os" } */
+
+typedef unsigned short __u16;
+typedef unsigned int __u32;
+
+typedef signed short s16;
+
+
+static inline __attribute__((always_inline)) __attribute__((__const__)) __u16 __arch_swab16(__u16 x)
+{
+ __asm__(
+  "swap.b		%1, %0"
+  : "=r" (x)
+  : "r" (x));
+ return x;
+}
+
+void u16_add_cpu(__u16 *var)
+{
+  *var = __arch_swab16(*var);
+}
+
+typedef struct xfs_mount {
+ int m_attr_magicpct;
+} xfs_mount_t;
+
+typedef struct xfs_da_args {
+ struct xfs_mount *t_mountp;
+ int index;
+} xfs_da_args_t;
+
+typedef struct xfs_dabuf {
+ void *data;
+} xfs_dabuf_t;
+
+typedef struct xfs_attr_leaf_map {
+ __u16 base;
+ __u16 size;
+} xfs_attr_leaf_map_t;
+typedef struct xfs_attr_leaf_hdr {
+ __u16 count;
+ xfs_attr_leaf_map_t freemap[3];
+} xfs_attr_leaf_hdr_t;
+
+typedef struct xfs_attr_leaf_entry {
+  __u16 nameidx;
+} xfs_attr_leaf_entry_t;
+
+typedef struct xfs_attr_leafblock {
+ xfs_attr_leaf_hdr_t hdr;
+ xfs_attr_leaf_entry_t entries[1];
+} xfs_attr_leafblock_t;
+
+int
+xfs_attr_leaf_remove(xfs_attr_leafblock_t *leaf, xfs_da_args_t *args)
+{
+ xfs_attr_leaf_hdr_t *hdr;
+ xfs_attr_leaf_map_t *map;
+ xfs_attr_leaf_entry_t *entry;
+ int before, after, smallest, entsize;
+ int tablesize, tmp, i;
+ xfs_mount_t *mp;
+ hdr = &leaf->hdr;
+ mp = args->t_mountp;
+
+ entry = &leaf->entries[args->index];
+
+ tablesize = __arch_swab16(hdr->count);
+
+ map = &hdr->freemap[0];
+ tmp = map->size;
+ before = after = -1;
+ smallest = 3 - 1;
+ entsize = xfs_attr_leaf_entsize(leaf, args->index);
+
+ for (i = 0; i < 2; map++, i++) {
+
+  if (map->base == tablesize)
+    u16_add_cpu(&map->base);
+
+  if (__arch_swab16(map->base)  + __arch_swab16(map->size)  == __arch_swab16(entry->nameidx))
+   before = i;
+  else if (map->base == entsize)
+   after = i;
+  else if (__arch_swab16(map->size) < tmp)
+   smallest = i;
+ }
+
+ if (before >= 0)
+  {
+   map = &hdr->freemap[after];
+   map->base = entry->nameidx;
+
+  }
+
+  map = &hdr->freemap[smallest];
+
+  map->base = __arch_swab16(entry->nameidx);
+
+ return(tmp < mp->m_attr_magicpct);
+}


Re: Fwd: Re: [PATCH]. Fix HAVE_SYS_SDT_H for cross-compilation

2013-09-05 Thread Christian Bruel

On 08/30/2013 05:50 PM, Joseph S. Myers wrote:
 On Fri, 30 Aug 2013, Christian Bruel wrote:

 So to cross build a target library |
 --with-build-sysroot=|dir looks appropriate to specify the alternative
 host root path.
 but
 --with-sysroot looks not appropriate because it changes the search paths
 (that should still be /usr/include on the target tree).
 ...
 Your patch suggests you are actually using a cross compiler to build a 
 native compiler - $build != $host == $target.  In that case, it's best not 
 to build target libraries at all, as they will already have been built 
 with the $build-x-$target cross compiler that must have previously been 
 built from the same GCC sources, with the same configuration.  That is, 
 make all-host and make install-host, and copy the libraries from the 
 previous build.  And since you already have such a $build-x-$target 
 compiler, it would seem best to determine what directory that compiler 
 actually searches for headers and compute target_header_dir that way, to 
 the extent that the target headers need examining to determine 
 configuration of the compiler proper.

Thanks for the hint. Indeed, in the normal case it doesn't make sense to
build the target libraries twice: the target build is simplified with
the all-host rules, and reusing those built with the cross-gcc is best.
But there are cases where it is still interesting to rebuild them (e.g.
the target gcc is not from the same version or has different
CFLAGS_FOR_TARGET, or for cross-checking reasons). Let me ping the
patch (from http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01748.html).

Many thanks

Christian


