RE: Fixing ifcvt issue as exposed by BZ89430

2019-05-09 Thread JiangNing OS


> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, May 8, 2019 3:35 PM
> To: JiangNing OS 
> Cc: gcc-patches@gcc.gnu.org; Richard Biener ;
> pins...@gcc.gnu.org
> Subject: Re: Fixing ifcvt issue as exposed by BZ89430
> 
> On Thu, Feb 28, 2019 at 1:26 PM JiangNing OS
>  wrote:
> >
> > To solve BZ89430 the followings are needed,
> >
> > (1) The code below in noce_try_cmove_arith needs to be fixed.
> >
> >   /* ??? We could handle this if we knew that a load from A or B could
> >  not trap or fault.  This is also true if we've already loaded
> >  from the address along the path from ENTRY.  */
> >   else if (may_trap_or_fault_p (a) || may_trap_or_fault_p (b))
> > return FALSE;
> >
> > Finding dominating memory access could help to decide whether the
> memory access following the conditional move is valid or not.
> > * If there is a dominating memory write to the same memory address in
> test_bb as the one from x=a, it is always safe.
> > * When the dominating memory access to the same memory address in
> test_bb is a read, for the load out of x=a, it is always safe. For the store 
> out
> of x=a, if the memory is on stack, it is still safe.
> >
> > Besides, there is a bug around rearranging the then_bb and else_bb layout,
> which needs to be fixed.
> >
> > (2) The fix by https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02998.html
> is an overkill. If the write target following conditional move is a memory
> access, it exits shortly.
> >
> >   if (!set_b && MEM_P (orig_x))
> > /* We want to avoid store speculation to avoid cases like
> >  if (pthread_mutex_trylock(mutex))
> >++global_variable;
> >Rather than go to much effort here, we rely on the SSA optimizers,
> >which do a good enough job these days.  */
> > return FALSE;
> >
> > It looks a bit hard for compiler to know the program semantic is
> > related to mutex and avoid store speculation. Without removing this
> > overkill, the fix mentioned by (1) would not work. Any idea? An
> > alternative solution is to detect the following pattern
> > conservatively,
> 
> But it's important to not break this.  Note we do have --param allow-store-
> data-races which the user can use to override this.
> Note that accesses to the stack can obviously not cause store speculation if
> its address didn't escape.  But that's probably what is refered to by "rely on
> the SSA optimizers".

Yes! So I have sent out a patch with title "[PATCH] improve ifcvt optimization 
(PR rtl-optimization/89430)" to detect the access to stack two months ago.
The SSA Optimization called "tree-if-conv" in middle-end can't really cover
this case. The key part of my change is like,

-/* We want to avoid store speculation to avoid cases like
-if (pthread_mutex_trylock(mutex))
-  ++global_variable;
-   Rather than go to much effort here, we rely on the SSA optimizers,
-   which do a good enough job these days.  */
-return FALSE;
+{
+  /* We want to avoid store speculation to avoid cases like
+   if (pthread_mutex_trylock(mutex))
+ ++global_variable;  
+ Tree if conversion cannot handle this case well, and it intends to
+ help vectorization for loops only. */
+  if (!noce_mem_is_on_stack(insn_a, orig_x))
+return FALSE;
 
+  /* For case like,
+   if (pthread_mutex_trylock(mutex))
+ ++local_variable;
+ If any stack variable address is taken, potentially this local
+ variable could be modified by other threads and introduce store
+ speculation. */
+  if (!cfun_no_stack_address_taken)
+return FALSE;
+}

Can you please take a look if you have time? Thank you!

> 
> >  if (a_function_call(...))
> >++local_variable;
> >
> > If the local_variable doesn't have address taken at all in current 
> > function, it
> mustn't be a pthread mutex lock semantic, and then the following patch
> would work to solve (1) and pass the last case as described in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89430.
> >
> > Index: gcc/cse.c
> >
> 
> ===
> > --- gcc/cse.c   (revision 268256)
> > +++ gcc/cse.c   (working copy)
> > @@ -540,7 +540,6 @@
> > already as part of an already processed extended basic block.  */
> > static sbitmap cse_visited_basic_blocks;
> >
> > -static bool fixed_base_plus_p (rtx x);  static int notreg_cost (rtx,
> > machine_mode, enum rtx_code, int);  static int preferable (int, in

Re: Fixing ifcvt issue as exposed by BZ89430

2019-05-08 Thread Richard Biener
On Thu, Feb 28, 2019 at 1:26 PM JiangNing OS
 wrote:
>
> To solve BZ89430 the followings are needed,
>
> (1) The code below in noce_try_cmove_arith needs to be fixed.
>
>   /* ??? We could handle this if we knew that a load from A or B could
>  not trap or fault.  This is also true if we've already loaded
>  from the address along the path from ENTRY.  */
>   else if (may_trap_or_fault_p (a) || may_trap_or_fault_p (b))
> return FALSE;
>
> Finding dominating memory access could help to decide whether the memory 
> access following the conditional move is valid or not.
> * If there is a dominating memory write to the same memory address in test_bb 
> as the one from x=a, it is always safe.
> * When the dominating memory access to the same memory address in test_bb is 
> a read, for the load out of x=a, it is always safe. For the store out of x=a, 
> if the memory is on stack, it is still safe.
>
> Besides, there is a bug around rearranging the then_bb and else_bb layout, 
> which needs to be fixed.
>
> (2) The fix by https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02998.html  is 
> an overkill. If the write target following conditional move is a memory 
> access, it exits shortly.
>
>   if (!set_b && MEM_P (orig_x))
> /* We want to avoid store speculation to avoid cases like
>  if (pthread_mutex_trylock(mutex))
>++global_variable;
>Rather than go to much effort here, we rely on the SSA optimizers,
>which do a good enough job these days.  */
> return FALSE;
>
> It looks a bit hard for compiler to know the program semantic is related to 
> mutex and avoid store speculation. Without removing this overkill, the fix 
> mentioned by (1) would not work. Any idea? An alternative solution is to 
> detect the following pattern conservatively,

But it's important to not break this.  Note we do have
--param allow-store-data-races which the user can use to override this.
Note that accesses to the stack can obviously not cause store
speculation if its address didn't escape.  But that's probably what is
refered to by "rely on the SSA optimizers".

>  if (a_function_call(...))
>++local_variable;
>
> If the local_variable doesn't have address taken at all in current function, 
> it mustn't be a pthread mutex lock semantic, and then the following patch 
> would work to solve (1) and pass the last case as described in 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89430.
>
> Index: gcc/cse.c
> ===
> --- gcc/cse.c   (revision 268256)
> +++ gcc/cse.c   (working copy)
> @@ -540,7 +540,6 @@
> already as part of an already processed extended basic block.  */
>  static sbitmap cse_visited_basic_blocks;
>
> -static bool fixed_base_plus_p (rtx x);
>  static int notreg_cost (rtx, machine_mode, enum rtx_code, int);
>  static int preferable (int, int, int, int);
>  static void new_basic_block (void);
> @@ -606,30 +605,7 @@
>
>  static const struct rtl_hooks cse_rtl_hooks = RTL_HOOKS_INITIALIZER;
>
>
> -/* Nonzero if X has the form (PLUS frame-pointer integer).  */
>
> -static bool
> -fixed_base_plus_p (rtx x)
> -{
> -  switch (GET_CODE (x))
> -{
> -case REG:
> -  if (x == frame_pointer_rtx || x == hard_frame_pointer_rtx)
> -   return true;
> -  if (x == arg_pointer_rtx && fixed_regs[ARG_POINTER_REGNUM])
> -   return true;
> -  return false;
> -
> -case PLUS:
> -  if (!CONST_INT_P (XEXP (x, 1)))
> -   return false;
> -  return fixed_base_plus_p (XEXP (x, 0));
> -
> -default:
> -  return false;
> -}
> -}
> -
>  /* Dump the expressions in the equivalence class indicated by CLASSP.
> This function is used only for debugging.  */
>  DEBUG_FUNCTION void
> Index: gcc/ifcvt.c
> ===
> --- gcc/ifcvt.c (revision 268256)
> +++ gcc/ifcvt.c (working copy)
> @@ -76,6 +76,9 @@
>  /* Whether conditional execution changes were made.  */
>  static int cond_exec_changed_p;
>
> +/* bitmap for stack frame pointer definition insns. */
> +static bitmap bba_sets_sfp;
> +
>  /* Forward references.  */
>  static int count_bb_insns (const_basic_block);
>  static bool cheap_bb_rtx_cost_p (const_basic_block, profile_probability, 
> int);
> @@ -99,6 +102,7 @@
>edge, int);
>  static void noce_emit_move_insn (rtx, rtx);
>  static rtx_insn *block_has_only_trap (basic_block);
> +static void collect_all_fp_insns (void);
>
>
>  /* Count the number of non-jump active insns in BB.  */
>
> @@ -2029,6 +2033,110 @@
>return true;
>  }
>
> +/* Return true if MEM x is on stack. a_insn contains x, if it exists. */
> +
> +static bool
> +noce_mem_is_on_stack (rtx_insn *a_insn, const_rtx x)
> +{
> +  df_ref use;
> +
> +  gcc_assert (x);
> +  gcc_assert (MEM_P (x));
> +
> +  /* Early exits if find base register is a stack register. */
> +  rtx a = XEXP (x, 0);
> +  if 

Re: Fixing ifcvt issue as exposed by BZ89430

2019-03-19 Thread Jeff Law
On 2/28/19 5:26 AM, JiangNing OS wrote:
> To solve BZ89430 the followings are needed,
> 
> (1) The code below in noce_try_cmove_arith needs to be fixed.
> 
>   /* ??? We could handle this if we knew that a load from A or B could
>  not trap or fault.  This is also true if we've already loaded
>  from the address along the path from ENTRY.  */
>   else if (may_trap_or_fault_p (a) || may_trap_or_fault_p (b))
> return FALSE;
> 
> Finding dominating memory access could help to decide whether the memory 
> access following the conditional move is valid or not. 
> * If there is a dominating memory write to the same memory address in test_bb 
> as the one from x=a, it is always safe.
> * When the dominating memory access to the same memory address in test_bb is 
> a read, for the load out of x=a, it is always safe. For the store out of x=a, 
> if the memory is on stack, it is still safe.
> 
> Besides, there is a bug around rearranging the then_bb and else_bb layout, 
> which needs to be fixed.
> 
> (2) The fix by https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02998.html  is 
> an overkill. If the write target following conditional move is a memory 
> access, it exits shortly.
> 
>   if (!set_b && MEM_P (orig_x))
> /* We want to avoid store speculation to avoid cases like
>  if (pthread_mutex_trylock(mutex))
>++global_variable;
>Rather than go to much effort here, we rely on the SSA optimizers,
>which do a good enough job these days.  */
> return FALSE;
> 
> It looks a bit hard for compiler to know the program semantic is related to 
> mutex and avoid store speculation. Without removing this overkill, the fix 
> mentioned by (1) would not work. Any idea? An alternative solution is to 
> detect the following pattern conservatively,
> 
>  if (a_function_call(...))
>++local_variable;
> 
> If the local_variable doesn't have address taken at all in current function, 
> it mustn't be a pthread mutex lock semantic, and then the following patch 
> would work to solve (1) and pass the last case as described in 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89430. 
Just a note. I'm deferring this to gcc-10.
jeff