RE: [RFC] ipa: Adjust references to identify read-only globals

2021-07-27 Thread JiangNing OS via Gcc-patches



> -Original Message-
> From: Martin Jambor 
> Sent: Tuesday, July 27, 2021 5:39 PM
> To: JiangNing OS ; Richard Biener
> 
> Cc: GCC Patches ; Jan Hubicka 
> Subject: RE: [RFC] ipa: Adjust references to identify read-only globals
> 
> Hi,
> 
> On Tue, Jul 27 2021, JiangNing OS wrote:
> >> Since it has been pre-approved by Honza, I would like to commit it to
> >> master soon.  Nevertheless, Jiangning, I am OK to wait a day or so if
> >> you can give it another test on your setup.
> >>
> >
> > I failed to apply your patch, so could you please provide a patch file 
> > instead?
> 
> I have committed the patch this morning as commit 13586172d0b - so please
> try that.
> 
> But I really hope the issues are resolved.

Yes. The ICEs I saw before are fixed on trunk.

Thanks,
-Jiangning


RE: [RFC] ipa: Adjust references to identify read-only globals

2021-07-27 Thread JiangNing OS via Gcc-patches
> Since it has been pre-approved by Honza, I would like to commit it to master
> soon.  Nevertheless, Jiangning, I am OK to wait a day or so if you can give it
> another test on your setup.
> 

I failed to apply your patch, so could you please provide a patch file instead?

patch:  malformed patch at line 45: @@ -4008,7 +4009,8 @@ 
ipcp_discover_new_direct_edges (struct cgraph_node *node,

Thanks,
-Jiangning


RE: [RFC] ipa: Adjust references to identify read-only globals

2021-07-20 Thread JiangNing OS via Gcc-patches
> -Original Message-
> From: Gcc-patches  bounces+jiangning=os.amperecomputing@gcc.gnu.org> On Behalf Of
> Martin Jambor
> Sent: Wednesday, June 30, 2021 4:19 AM
> To: GCC Patches 
> Cc: Jan Hubicka 
> Subject: [RFC] ipa: Adjust references to identify read-only globals
> 
> Hi,
> 
> this patch has been motivated by SPEC 2017's 544.nab_r in which there is a
> static variable which is never written to and so zero throughout the run-time
> of the benchmark.  However, it is passed by reference to a function in which
> it is read and (after some multiplications) passed into __builtin_exp which in
> turn unnecessarily consumes almost 10% of the total benchmark run-time.

I do see ~8.5% runtime reduction on aarch64.

> The situation is illustrated by the added testcase remref-3.c.
> 
> The patch adds a flag to ipa-prop descriptor of each parameter to mark such
> parameters.  IPA-CP and inling then take the effort to remove IPA_REF_ADDR
> references in the caller and only add IPA_REF_LOAD reference to the
> clone/overall inlined function.  This is sufficient for subsequent symbol 
> table
> analysis code to identify the read-only variable as such and optimize the 
> code.
> 
> I plan to compile a number of packages with the patch to test it some more
> and get a bit better idea of its impact.  But it has passed bootstrap,
> LTObootstrap and testing on x86_64-linux and i686-linux and so unless I find
> any problem, I would like to commit it at some point next month without any
> major changes, so I'd be grateful for any feedback even now.

I see 3 cases in SPEC2017 failed to compile on aarch64, i.e. 521.wrf_r, 
527.cam4_r, 554.roms_r. For example,

pre_step3d.fppized.f90:1260:35: internal compiler error: Segmentation fault
 1260 |   CALL wclock_on (ng, iNLM, 22)
  |   ^
0x1645c6b internal_error(char const*, ...)
???:0
0xe1f4f4 place_block_symbol(rtx_def*)
???:0
0x84ab33 use_anchored_address(rtx_def*)
???:0
0x868203 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
???:0
0x868793 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
???:0
0x75b593 expand_call(tree_node*, rtx_def*, int)
???:0
0x86a09f expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
???:0
Please submit a full bug report

Thanks,
-Jiangning


RE: Should ARMv8-A generic tuning default to -moutline-atomics

2020-05-01 Thread JiangNing OS via Gcc-patches
In reality, a lot of users are still using old gcc versions running on new 
hardware. OpenJDK is a typical example, I think.

> -Original Message-
> From: Richard Earnshaw 
> Sent: Friday, May 1, 2020 6:41 PM
> To: JiangNing OS ; Kyrylo Tkachov
> ; Andrew Pinski ; Florian
> Weimer 
> Cc: gcc-patches@gcc.gnu.org; nmeye...@amzn.com
> Subject: Re: Should ARMv8-A generic tuning default to -moutline-atomics
> 
> On 01/05/2020 11:38, JiangNing OS via Gcc-patches wrote:
> > Hi Kyrill,
> >
> > Can it be backported to gcc 8/9/10 branches?
> >
> 
> I'm not sure changing the defaults of things like this is a good idea on 'dot'
> releases.
> 
> Having the option is one thing.  Changing the default another thing entirely.
> 
> R.
> 
> > Thanks,
> > -Jiangning
> >
> >> -Original Message-
> >> From: Gcc-patches  On Behalf Of
> >> Kyrylo Tkachov
> >> Sent: Thursday, April 30, 2020 8:27 PM
> >> To: Kyrylo Tkachov ; Andrew Pinski
> >> ; Florian Weimer 
> >> Cc: gcc-patches@gcc.gnu.org; nmeye...@amzn.com
> >> Subject: RE: Should ARMv8-A generic tuning default to
> >> -moutline-atomics
> >>
> >>
> >>
> >>> -Original Message-
> >>> From: Gcc-patches  On Behalf Of
> >>> Kyrylo Tkachov
> >>> Sent: 30 April 2020 11:57
> >>> To: Andrew Pinski ; Florian Weimer
> >>> 
> >>> Cc: gcc-patches@gcc.gnu.org; nmeye...@amzn.com
> >>> Subject: RE: Should ARMv8-A generic tuning default to
> >>> -moutline-atomics
> >>>
> >>> [Moving to gcc-patches]
> >>>
> >>>> -Original Message-
> >>>> From: Gcc  On Behalf Of Andrew Pinski via
> >>>> Gcc
> >>>> Sent: 30 April 2020 07:21
> >>>> To: Florian Weimer 
> >>>> Cc: GCC Mailing List ; nmeye...@amzn.com
> >>>> Subject: Re: Should ARMv8-A generic tuning default to
> >>>> -moutline-atomics
> >>>>
> >>>> On Wed, Apr 29, 2020 at 6:25 AM Florian Weimer via Gcc
> >>>> 
> >>>> wrote:
> >>>>>
> >>>>> Distributions are receiving requests to build things with
> >>>>> -moutline-atomics:
> >>>>>
> >>>>>   <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=956418>
> >>>>>
> >>>>> Should this be reflected in the GCC upstream defaults for ARMv8-A
> >>>>> generic tuning?  It does not make much sense to me if every
> >>>>> distribution has to overide these flags, either in their build
> >>>>> system or by patching GCC.
> >>>>
> >>>> At least we should make it a configure option.
> >>>> I do want the ability to default it for our (Marvell) toolchain for
> >>>> Linux (our bare metal toolchain will be defaulting to ARMv8.2-a
> >>>> anyways).
> >>>
> >>> After some internal discussions, I am open to having it on as a default.
> >>> Here are two versions. One has it as a tuning setting that CPUs can
> >>> override, the other just switches it on by default always unless
> >>> overridden by -mno- outline-atomics.
> >>> I slightly prefer the second one as it's cleaner and simpler, but
> >>> happy to take either.
> >>> Any preferences?
> >>> Thanks,
> >>> Kyrill
> >>>
> >>> ChangeLogs:
> >>>
> >>> 2020-04-30  Kyrylo Tkachov  
> >>>
> >>>   * config/aarch64/aarch64-tuning-flags.def (no_outline_atomics):
> >>> Declare.
> >>>   * config/aarch64/aarch64.h (TARGET_OUTLINE_ATOMICS): Define.
> >>>   * config/aarch64/aarch64.opt (moutline-atomics): Change to Int
> >>> variable.
> >>>
> >>> 2020-04-30  Kyrylo Tkachov  
> >>>
> >>>   * config/aarch64/aarch64.h (TARGET_OUTLINE_ATOMICS): Define.
> >>>   * config/aarch64/aarch64.opt (moutline-atomics): Change to Int
> >>> variable.
> >>>   * doc/invoke.texi (moutline-atomics): Document as on by default.
> >>>
> >>
> >> I've pushed this second variant after bootstrap and testing on
> >> aarch64-none- linux-gnu.
> >> Compiled a simple atomic-using testcase with all relevant
> >> combinations of - moutline-atomics and LSE and no-LSE -march options.
> >> Before the results were (as expected):
> >>  |-moutline-atomics | -mno-outline-atomics |  >> outline-atomics option
> >> 
> >> LSE  |  inline LSE  | inline LSE| inline LSE
> >> no-LSE   |  outline | inline LDXR/STXR  | inline LDX/STXR
> >>
> >>
> >> with this patch they are:
> >>  -moutline-atomics  | -mno-outline-atomics |  >> outline-atomics option>
> >> 
> >> LSE  |  inline LSE  | inline LSE   | inline LSE
> >> no-LSE   |  outline | inline LDXR/STXR | outline
> >>
> >> Thanks,
> >> Kyrill



RE: Should ARMv8-A generic tuning default to -moutline-atomics

2020-05-01 Thread JiangNing OS via Gcc-patches
Hi Kyrill,

Can it be backported to gcc 8/9/10 branches?

Thanks,
-Jiangning

> -Original Message-
> From: Gcc-patches  On Behalf Of Kyrylo
> Tkachov
> Sent: Thursday, April 30, 2020 8:27 PM
> To: Kyrylo Tkachov ; Andrew Pinski
> ; Florian Weimer 
> Cc: gcc-patches@gcc.gnu.org; nmeye...@amzn.com
> Subject: RE: Should ARMv8-A generic tuning default to -moutline-atomics
> 
> 
> 
> > -Original Message-
> > From: Gcc-patches  On Behalf Of
> > Kyrylo Tkachov
> > Sent: 30 April 2020 11:57
> > To: Andrew Pinski ; Florian Weimer
> > 
> > Cc: gcc-patches@gcc.gnu.org; nmeye...@amzn.com
> > Subject: RE: Should ARMv8-A generic tuning default to
> > -moutline-atomics
> >
> > [Moving to gcc-patches]
> >
> > > -Original Message-
> > > From: Gcc  On Behalf Of Andrew Pinski via
> > > Gcc
> > > Sent: 30 April 2020 07:21
> > > To: Florian Weimer 
> > > Cc: GCC Mailing List ; nmeye...@amzn.com
> > > Subject: Re: Should ARMv8-A generic tuning default to
> > > -moutline-atomics
> > >
> > > On Wed, Apr 29, 2020 at 6:25 AM Florian Weimer via Gcc
> > > 
> > > wrote:
> > > >
> > > > Distributions are receiving requests to build things with
> > > > -moutline-atomics:
> > > >
> > > >   
> > > >
> > > > Should this be reflected in the GCC upstream defaults for ARMv8-A
> > > > generic tuning?  It does not make much sense to me if every
> > > > distribution has to overide these flags, either in their build
> > > > system or by patching GCC.
> > >
> > > At least we should make it a configure option.
> > > I do want the ability to default it for our (Marvell) toolchain for
> > > Linux (our bare metal toolchain will be defaulting to ARMv8.2-a
> > > anyways).
> >
> > After some internal discussions, I am open to having it on as a default.
> > Here are two versions. One has it as a tuning setting that CPUs can
> > override, the other just switches it on by default always unless
> > overridden by -mno- outline-atomics.
> > I slightly prefer the second one as it's cleaner and simpler, but
> > happy to take either.
> > Any preferences?
> > Thanks,
> > Kyrill
> >
> > ChangeLogs:
> >
> > 2020-04-30  Kyrylo Tkachov  
> >
> > * config/aarch64/aarch64-tuning-flags.def (no_outline_atomics):
> > Declare.
> > * config/aarch64/aarch64.h (TARGET_OUTLINE_ATOMICS): Define.
> > * config/aarch64/aarch64.opt (moutline-atomics): Change to Int
> > variable.
> >
> > 2020-04-30  Kyrylo Tkachov  
> >
> > * config/aarch64/aarch64.h (TARGET_OUTLINE_ATOMICS): Define.
> > * config/aarch64/aarch64.opt (moutline-atomics): Change to Int
> > variable.
> > * doc/invoke.texi (moutline-atomics): Document as on by default.
> >
> 
> I've pushed this second variant after bootstrap and testing on aarch64-none-
> linux-gnu.
> Compiled a simple atomic-using testcase with all relevant combinations of -
> moutline-atomics and LSE and no-LSE -march options.
> Before the results were (as expected):
>  |-moutline-atomics | -mno-outline-atomics |  option
> 
> LSE  |  inline LSE  | inline LSE| inline LSE
> no-LSE   |  outline | inline LDXR/STXR  | inline LDX/STXR
> 
> 
> with this patch they are:
>  -moutline-atomics  | -mno-outline-atomics |  option>
> 
> LSE  |  inline LSE  | inline LSE   | inline LSE
> no-LSE   |  outline | inline LDXR/STXR | outline
> 
> Thanks,
> Kyrill


RE: [C++ PATCH 2/4] Fix conversions for built-in operator overloading candidates.

2019-09-18 Thread JiangNing OS
Hi Jason,

This commit caused boot-strap failure on aarch64. Is it a bug? Can this be 
fixed ASAP?

../../gcc/gcc/expmed.c:5602:19: error: ���int_mode��� may be used uninitialized 
in this function [-Werror=maybe-uninitialized]
 5602 |   scalar_int_mode int_mode;
  |   ^~~~

Thanks,
-Jiangning

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org 
> On Behalf Of Jason Merrill
> Sent: Monday, September 16, 2019 12:33 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [C++ PATCH 2/4] Fix conversions for built-in operator overloading
> candidates.
> 
> While working on C++20 operator<=>, I noticed that build_new_op_1 was
> doing too much conversion when a built-in candidate was selected; the
> standard says it should only perform user-defined conversions, and then
> leave the normal operator semantics to handle any standard conversions.
> This is important for operator<=> because a comparison of two different
> unscoped enums is ill-formed; if we promote the enums to int here,
> cp_build_binary_op never gets to see the original operand types, so we can't
> give the error.
> 
> Tested x86_64-pc-linux-gnu, applying to trunk.
> 
>   * call.c (build_new_op_1): Don't apply any standard conversions to
>   the operands of a built-in operator.  Don't suppress conversions in
>   cp_build_unary_op.
>   * typeck.c (cp_build_unary_op): Do integral promotions for enums.
> ---
>  gcc/cp/call.c| 51 
>  gcc/cp/typeck.c  |  4 ++--
>  gcc/cp/ChangeLog |  7 +++
>  3 files changed, 34 insertions(+), 28 deletions(-)
> 
> diff --git a/gcc/cp/call.c b/gcc/cp/call.c index c3045d948c5..457fa6605c2
> 100644
> --- a/gcc/cp/call.c
> +++ b/gcc/cp/call.c
> @@ -6139,41 +6139,40 @@ build_new_op_1 (const op_location_t ,
> enum tree_code code, int flags,
> break;
>   }
> 
> -   /* We need to strip any leading REF_BIND so that bitfields
> -  don't cause errors.  This should not remove any important
> -  conversions, because builtins don't apply to class
> -  objects directly.  */
> +   /* "If a built-in candidate is selected by overload resolution, the
> +  operands of class type are converted to the types of the
> +  corresponding parameters of the selected operation function,
> +  except that the second standard conversion sequence of a
> +  user-defined conversion sequence (12.3.3.1.2) is not applied."
> +*/
> conv = cand->convs[0];
> -   if (conv->kind == ck_ref_bind)
> - conv = next_conversion (conv);
> -   arg1 = convert_like (conv, arg1, complain);
> +   if (conv->user_conv_p)
> + {
> +   while (conv->kind != ck_user)
> + conv = next_conversion (conv);
> +   arg1 = convert_like (conv, arg1, complain);
> + }
> 
> if (arg2)
>   {
> conv = cand->convs[1];
> -   if (conv->kind == ck_ref_bind)
> - conv = next_conversion (conv);
> -   else
> - arg2 = decay_conversion (arg2, complain);
> -
> -   /* We need to call warn_logical_operator before
> -  converting arg2 to a boolean_type, but after
> -  decaying an enumerator to its value.  */
> -   if (complain & tf_warning)
> - warn_logical_operator (loc, code, boolean_type_node,
> -code_orig_arg1, arg1,
> -code_orig_arg2, arg2);
> -
> -   arg2 = convert_like (conv, arg2, complain);
> +   if (conv->user_conv_p)
> + {
> +   while (conv->kind != ck_user)
> + conv = next_conversion (conv);
> +   arg2 = convert_like (conv, arg2, complain);
> + }
>   }
> +
> if (arg3)
>   {
> conv = cand->convs[2];
> -   if (conv->kind == ck_ref_bind)
> - conv = next_conversion (conv);
> -   convert_like (conv, arg3, complain);
> +   if (conv->user_conv_p)
> + {
> +   while (conv->kind != ck_user)
> + conv = next_conversion (conv);
> +   arg3 = convert_like (conv, arg3, complain);
> + }
>   }
> -
>   }
>  }
> 
> @@ -6241,7 +6240,7 @@ build_new_op_1 (const op_location_t , enum
> tree_code code, int flags,
>  case REALPART_EXPR:
>  case IMAGPART_EXPR:
>  case ABS_EXPR:
> -  return cp_build_unary_op (code, arg1, candidates != 0, complain);
> +  return cp_build_unary_op (code, arg1, false, complain);
> 
>  case ARRAY_REF:
>return cp_build_array_ref (input_location, arg1, arg2, complain); diff 
> --git
> a/gcc/cp/typeck.c b/gcc/cp/typeck.c index 70094d1b426..620f2c9afdf 100644
> --- a/gcc/cp/typeck.c
> +++ b/gcc/cp/typeck.c
> @@ -6242,7 +6242,7 @@ cp_build_unary_op (enum tree_code code, tree
> xarg, bool noconvert,
>   

RE: PR90724 - ICE with __sync_bool_compare_and_swap with -march=armv8.2-a

2019-08-21 Thread JiangNing OS


> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org  On
> Behalf Of JiangNing OS
> Sent: Thursday, August 22, 2019 8:24 AM
> To: Prathamesh Kulkarni ; James Greenhalgh
> 
> Cc: gcc Patches ; Richard Sandiford
> ; Kyrill Tkachov ;
> nd 
> Subject: RE: PR90724 - ICE with __sync_bool_compare_and_swap with -
> march=armv8.2-a
> 
> > -Original Message-
> > From: gcc-patches-ow...@gcc.gnu.org  On
> > Behalf Of Prathamesh Kulkarni
> > Sent: Thursday, August 22, 2019 2:36 AM
> > To: James Greenhalgh 
> > Cc: gcc Patches ; Richard Sandiford
> > ; Kyrill Tkachov
> > ; nd 
> > Subject: Re: PR90724 - ICE with __sync_bool_compare_and_swap with -
> > march=armv8.2-a
> >
> > On Mon, 19 Aug 2019 at 22:14, James Greenhalgh
> >  wrote:
> > >
> > > On Thu, Aug 15, 2019 at 02:11:25PM +0100, Prathamesh Kulkarni wrote:
> > > > On Thu, 8 Aug 2019 at 11:22, Prathamesh Kulkarni
> > > >  wrote:
> > > > >
> > > > > On Thu, 1 Aug 2019 at 15:34, Prathamesh Kulkarni
> > > > >  wrote:
> > > > > >
> > > > > > On Thu, 25 Jul 2019 at 11:56, Prathamesh Kulkarni
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Wed, 17 Jul 2019 at 18:15, Prathamesh Kulkarni
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Wed, 17 Jul 2019 at 13:45, Kyrill Tkachov
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > Hi Prathamesh
> > > > > > > > >
> > > > > > > > > On 7/10/19 12:24 PM, Prathamesh Kulkarni wrote:
> > > > > > > > > > Hi,
> > > > > > > > > > For following test-case, static long long AL[24];
> > > > > > > > > >
> > > > > > > > > > int
> > > > > > > > > > check_ok (void)
> > > > > > > > > > {
> > > > > > > > > >   return (__sync_bool_compare_and_swap (AL+1,
> > > > > > > > > > 0x20003ll, 0x1234567890ll)); }
> > > > > > > > > >
> > > > > > > > > > Compiling with -O2 -march=armv8.2-a results in:
> > > > > > > > > > pr90724.c: In function ‘check_ok’:
> > > > > > > > > > pr90724.c:7:1: error: unrecognizable insn:
> > > > > > > > > > 7 | }
> > > > > > > > > >   | ^
> > > > > > > > > > (insn 11 10 12 2 (set (reg:CC 66 cc)
> > > > > > > > > > (compare:CC (reg:DI 95)
> > > > > > > > > > (const_int 8589934595 [0x20003])))
> > "pr90724.c":6:11 -1
> > > > > > > > > >  (nil))
> > > > > > > > > >
> > > > > > > > > > IIUC, the issue is that 0x20003 falls outside the
> > > > > > > > > > range of allowable immediate in cmp ? If it's replaced
> > > > > > > > > > by a small constant then it works.
> > > > > > > > > >
> > > > > > > > > > The ICE results with -march=armv8.2-a because, we
> > > > > > > > > > enter if
> > > > > > > > > > (TARGET_LSE) { ... } condition in
> > > > > > > > > > aarch64_expand_compare_and_swap, while with
> > > > > > > > > > -march=armv8.a it goes into else, which forces oldval
> > > > > > > > > > into register if the predicate fails to match.
> > > > > > > > > >
> > > > > > > > > > The attached patch checks if y (oldval) satisfies
> > > > > > > > > > aarch64_plus_operand predicate and if not, forces it
> > > > > > > > > > to be in
> > register, which resolves ICE.
> > > > > > > > > > Does it look OK ?
> > > > > > > > > >
> > > > > > > > > > Bootstrap+testing in progress on aarch64-linux-gnu.
> > > > > > > > > >
> > > > > > > > > > PS: The issue has nothing to do with SVE, which I
> > > > > > > > > > incorrectly mentioned in bug report.
> > > > > > > > > >
> > > > > > > > > This looks ok to me (but you'll need maintainer approval).
> > > > > > > > >
> > > > > > > > > Does this fail on the branches as well?
> > > > > > > > Hi Kyrill,
> > > > > > > > Thanks for the review. The test also fails on gcc-9-branch
> > > > > > > > (but not on
> > gcc-8).
> > > > > > > Hi James,
> > > > > > > Is the patch OK to commit  ?
> > > > > > > https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00793.html
> > > > > > ping * 3:
> > > > > > https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00793.html
> > > > > ping * 4:
> > > > > https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00793.html
> > > > ping * 5: https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00793.html
> > >
> > > Hi,
> > >
> > > Sorry, this missed my filters as it didn't mention AArch64 in the
> > > subject line.
> > >
> > > Thais is good for trunk, thanks for waiting.
> > Thanks, committed to trunk in r274805.
> 
> Trunk aarch64 build failure is exposed by this commit.
> 
> ../../gcc/gcc/config/aarch64/aarch64.c: In functionbool
> aarch64_evpc_sel(expand_vec_perm_d*)   :
> ../../gcc/gcc/config/aarch64/aarch64.c:18018:75: error:gen_vcond_mask
> was not declared in this scope
>emit_insn (gen_vcond_mask (vmode, vmode, d->target, d->op1, d->op0,
> pred));
>^
> make[3]: *** [aarch64.o] Error 1
> make[3]: Leaving directory `/home/amptest/gcc/build/gcc'

Oh. It is caused by a different commit 274810 rather than 274805.

> 
> > Is this OK to backport to gcc-9-branch ?
> >
> > Thanks,
> > Prathamesh
> > >
> > > James
> > >


RE: PR90724 - ICE with __sync_bool_compare_and_swap with -march=armv8.2-a

2019-08-21 Thread JiangNing OS
> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org  On
> Behalf Of Prathamesh Kulkarni
> Sent: Thursday, August 22, 2019 2:36 AM
> To: James Greenhalgh 
> Cc: gcc Patches ; Richard Sandiford
> ; Kyrill Tkachov ;
> nd 
> Subject: Re: PR90724 - ICE with __sync_bool_compare_and_swap with -
> march=armv8.2-a
> 
> On Mon, 19 Aug 2019 at 22:14, James Greenhalgh
>  wrote:
> >
> > On Thu, Aug 15, 2019 at 02:11:25PM +0100, Prathamesh Kulkarni wrote:
> > > On Thu, 8 Aug 2019 at 11:22, Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Thu, 1 Aug 2019 at 15:34, Prathamesh Kulkarni
> > > >  wrote:
> > > > >
> > > > > On Thu, 25 Jul 2019 at 11:56, Prathamesh Kulkarni
> > > > >  wrote:
> > > > > >
> > > > > > On Wed, 17 Jul 2019 at 18:15, Prathamesh Kulkarni
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Wed, 17 Jul 2019 at 13:45, Kyrill Tkachov
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Hi Prathamesh
> > > > > > > >
> > > > > > > > On 7/10/19 12:24 PM, Prathamesh Kulkarni wrote:
> > > > > > > > > Hi,
> > > > > > > > > For following test-case, static long long AL[24];
> > > > > > > > >
> > > > > > > > > int
> > > > > > > > > check_ok (void)
> > > > > > > > > {
> > > > > > > > >   return (__sync_bool_compare_and_swap (AL+1,
> > > > > > > > > 0x20003ll, 0x1234567890ll)); }
> > > > > > > > >
> > > > > > > > > Compiling with -O2 -march=armv8.2-a results in:
> > > > > > > > > pr90724.c: In function ‘check_ok’:
> > > > > > > > > pr90724.c:7:1: error: unrecognizable insn:
> > > > > > > > > 7 | }
> > > > > > > > >   | ^
> > > > > > > > > (insn 11 10 12 2 (set (reg:CC 66 cc)
> > > > > > > > > (compare:CC (reg:DI 95)
> > > > > > > > > (const_int 8589934595 [0x20003])))
> "pr90724.c":6:11 -1
> > > > > > > > >  (nil))
> > > > > > > > >
> > > > > > > > > IIUC, the issue is that 0x20003 falls outside the
> > > > > > > > > range of allowable immediate in cmp ? If it's replaced
> > > > > > > > > by a small constant then it works.
> > > > > > > > >
> > > > > > > > > The ICE results with -march=armv8.2-a because, we enter
> > > > > > > > > if
> > > > > > > > > (TARGET_LSE) { ... } condition in
> > > > > > > > > aarch64_expand_compare_and_swap, while with
> > > > > > > > > -march=armv8.a it goes into else, which forces oldval
> > > > > > > > > into register if the predicate fails to match.
> > > > > > > > >
> > > > > > > > > The attached patch checks if y (oldval) satisfies
> > > > > > > > > aarch64_plus_operand predicate and if not, forces it to be in
> register, which resolves ICE.
> > > > > > > > > Does it look OK ?
> > > > > > > > >
> > > > > > > > > Bootstrap+testing in progress on aarch64-linux-gnu.
> > > > > > > > >
> > > > > > > > > PS: The issue has nothing to do with SVE, which I
> > > > > > > > > incorrectly mentioned in bug report.
> > > > > > > > >
> > > > > > > > This looks ok to me (but you'll need maintainer approval).
> > > > > > > >
> > > > > > > > Does this fail on the branches as well?
> > > > > > > Hi Kyrill,
> > > > > > > Thanks for the review. The test also fails on gcc-9-branch (but 
> > > > > > > not on
> gcc-8).
> > > > > > Hi James,
> > > > > > Is the patch OK to commit  ?
> > > > > > https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00793.html
> > > > > ping * 3:
> > > > > https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00793.html
> > > > ping * 4: https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00793.html
> > > ping * 5: https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00793.html
> >
> > Hi,
> >
> > Sorry, this missed my filters as it didn't mention AArch64 in the
> > subject line.
> >
> > Thais is good for trunk, thanks for waiting.
> Thanks, committed to trunk in r274805.

Trunk aarch64 build failure is exposed by this commit.

../../gcc/gcc/config/aarch64/aarch64.c: In function ���bool 
aarch64_evpc_sel(expand_vec_perm_d*)���:
../../gcc/gcc/config/aarch64/aarch64.c:18018:75: error: ���gen_vcond_mask��� 
was not declared in this scope
   emit_insn (gen_vcond_mask (vmode, vmode, d->target, d->op1, d->op0, pred));
   ^
make[3]: *** [aarch64.o] Error 1
make[3]: Leaving directory `/home/amptest/gcc/build/gcc'

> Is this OK to backport to gcc-9-branch ?
> 
> Thanks,
> Prathamesh
> >
> > James
> >


RE: [PATCH] PR91195: fix -Wmaybe-uninitialized warning for conditional store optimization

2019-07-24 Thread JiangNing OS
> -Original Message-
> From: Martin Sebor 
> Sent: Thursday, July 25, 2019 2:08 AM
> To: Jeff Law ; JiangNing OS
> ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] PR91195: fix -Wmaybe-uninitialized warning for
> conditional store optimization
> 
> On 7/24/19 11:12 AM, Jeff Law wrote:
> > On 7/24/19 10:09 AM, Martin Sebor wrote:
> >> On 7/24/19 9:25 AM, Jeff Law wrote:
> >>> On 7/23/19 10:20 AM, Martin Sebor wrote:
> >>>> On 7/22/19 10:26 PM, JiangNing OS wrote:
> >>>>> This patch is to fix PR91195. Is it OK for trunk?
> >>>>>
> >>>>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog index
> >>>>> 711a31ea597..4db36644160 100644
> >>>>> --- a/gcc/ChangeLog
> >>>>> +++ b/gcc/ChangeLog
> >>>>> @@ -1,3 +1,9 @@
> >>>>> +2019-07-22  Jiangning Liu  
> >>>>> +
> >>>>> +    PR middle-end/91195
> >>>>> +    * tree-ssa-phiopt.c (cond_store_replacement): Work around
> >>>>> +    -Wmaybe-uninitialized warning.
> >>>>> +
> >>>>>     2019-07-22  Stafford Horne  
> >>>>>       * config/or1k/or1k.c (or1k_expand_compare): Check for
> >>>>> int before diff --git a/gcc/tree-ssa-phiopt.c
> >>>>> b/gcc/tree-ssa-phiopt.c index b64bde695f4..7a86007d087 100644
> >>>>> --- a/gcc/tree-ssa-phiopt.c
> >>>>> +++ b/gcc/tree-ssa-phiopt.c
> >>>>> @@ -2240,6 +2240,14 @@ cond_store_replacement (basic_block
> >>>>> middle_bb, basic_block join_bb,
> >>>>>   tree base = get_base_address (lhs);
> >>>>>   if (!auto_var_p (base) || TREE_ADDRESSABLE (base))
> >>>>>     return false;
> >>>>> +
> >>>>> +  /* The transformation below will inherently introduce a
> >>>>> +memory
> >>>>> load,
> >>>>> + for which LHS may not be initialized yet if it is not in
> >>>>> +NOTRAP,
> >>>>> + so a -Wmaybe-uninitialized warning message could be triggered.
> >>>>> + Since it's a bit hard to have a very accurate
> >>>>> +uninitialization
> >>>>> + analysis to memory reference, we disable the warning here to
> >>>>> avoid
> >>>>> + confusion.  */
> >>>>> +  TREE_NO_WARNING (lhs) = 1;
> >>>>
> >>>> The no-warning bit is sometimes (typically?) set by the middle-end
> >>>> after a warning has been issued, to avoid triggering other warnings
> >>>> down the line for an already diagnosed expression.  Unless it's
> >>>> done just for the purposes of a single pass and the bit is cleared
> >>>> afterwards, using it to avoid potential false positives doesn't
> >>>> seem like a robust solution.  It will mask warnings for constructs
> >>>> that have been determined to be invalid.
> >>>>
> >>>> FWIW, the middle-end is susceptible to quite a few false positives
> >>>> that would nice to avoid.  We have discussed various approaches to
> >>>> the problem but setting the no-warning bit seems like too blunt of
> >>>> an instrument.
> >>> All true.
> >>>
> >>> But in the case JiangNing is working with the transformation
> >>> inherently can introduce an uninitialized read.  It seems reasonable
> >>> to mark those loads he generates that don't have a dominating read.
> >>>
> >>> His code takes something like
> >>>
> >>>     if (x)
> >>>   *p = 
> >>>
> >>> And turns it into
> >>>
> >>>     t1 = *p;
> >>>     t2 = x ?  : t1;
> >>>     *p = t2;
> >>>
> >>> In the past we required there be a dominating read from *p (among
> >>> other restrictions).  That requirement was meant to ensure *p isn't
> >>> going to fault.  Jiang's work relaxes that requirement somewhat for
> >>> objects that we can prove aren't going to fault via other means.
> >>>
> >>> Can setting TREE_NO_WARNING on the inserted loads inhibit warnings?
> >>> Certainly.  However, I believe we use it in other places where we
> >>> know the code we're emitting is safe, but can cause a warning.  I
> >>> think Jiang's work falls into that category.
>

[PATCH] PR91195: fix -Wmaybe-uninitialized warning for conditional store optimization

2019-07-22 Thread JiangNing OS
This patch is to fix PR91195. Is it OK for trunk?

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 711a31ea597..4db36644160 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2019-07-22  Jiangning Liu  
+
+   PR middle-end/91195
+   * tree-ssa-phiopt.c (cond_store_replacement): Work around
+   -Wmaybe-uninitialized warning.
+
 2019-07-22  Stafford Horne  
 
* config/or1k/or1k.c (or1k_expand_compare): Check for int before
diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index b64bde695f4..7a86007d087 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -2240,6 +2240,14 @@ cond_store_replacement (basic_block middle_bb, 
basic_block join_bb,
   tree base = get_base_address (lhs);
   if (!auto_var_p (base) || TREE_ADDRESSABLE (base))
return false;
+
+  /* The transformation below will inherently introduce a memory load,
+for which LHS may not be initialized yet if it is not in NOTRAP,
+so a -Wmaybe-uninitialized warning message could be triggered.
+Since it's a bit hard to have a very accurate uninitialization
+analysis to memory reference, we disable the warning here to avoid
+confusion.  */
+  TREE_NO_WARNING (lhs) = 1;
 }
 
   /* Now we've checked the constraints, so do the transformation:

Thanks,
-Jiangning


RE: Remove array_index inliner hint

2019-07-15 Thread JiangNing OS
Hi Honza,

It seems this commit caused gcc bootstrap failure on aarch64 as below,

Comparing stages 2 and 3
warning: gcc/cc1obj-checksum.o differs
Bootstrap comparison failure!
gcc/dwarf2out.o differs
gcc/var-tracking.o differs
gcc/aarch64.o differs
make[2]: *** [compare] Error 1

Do you know the reason? I'm not sure if you have an aarch64 machine to try. Can 
it be fixed?

Thanks,
-Jiangning

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org 
> On Behalf Of Jan Hubicka
> Sent: Sunday, July 14, 2019 8:03 PM
> To: gcc-patches@gcc.gnu.org
> Subject: Remove array_index inliner hint
> 
> Hi,
> array_index hint marks functions contains an array reference that is indexed
> by value that will become constant after inlining.
> This hint is later used by ipa-inline in rather agressive way updating inline-
> insns-auto to inline-insns-single for such functions.  While tuning -finline-
> functions for -O2 I have noticed that this leads to 30% code size growth of
> SPEC versions of GCC. This is because all predicates like register_operand
> contains such array references based on mode parameter and this makes
> GCC to conclude about agressively inlining them all which bloats
> insn-* a lot.
> 
> This hint was quite early experiment with propagating additional stuff
> through inliner.  I think it is better to actualy account the instructions 
> which
> will be generated for array indexing rather then handling this specially.
> 
> This patch makes us to accound 1 instruction for every non-constant array
> access.  This is still probably making inliner overly optimistic about 
> inlining
> benefits since these accesses will later be CSEed, but it kills bit of magic 
> and
> makes things more robust.
> 
> This will make inliner notice that function will simplify and give it higher
> priority for inlining possibly still bypassing the bounds if big speedup is
> achieved. This is however a lot more rare than before.
> 
> Bootstrapped/regtested x86_64-linux, comitted.
> 
> Honza
> 
>   * ipa-fnsummary.c (ipa_dump_hints): Do not dump array_index.
>   (ipa_fn_summary::~ipa_fn_summary): Do not destroy array_index.
>   (ipa_fn_summary_t::duplicate): Do not duplicate array_index.
>   (array_index_predicate): Remove.
>   (analyze_function_body): Account cost for variable ofsetted array
>   indexing.
>   (estimate_node_size_and_time): Do not compute array index hint.
>   (ipa_merge_fn_summary_after_inlining): Do not merge array index
> hint.
>   (inline_read_section): Do not read array index hint.
>   (ipa_fn_summary_write): Do not write array index hint.
>   * doc/invoke.texi (ipa-cp-array-index-hint-bonus): Remove.
>   * ipa-cp.c (hint_time_bonus): Remove.
>   * ipa-fnsummary.h (ipa_hints_vals): Remove array_index.
>   (ipa_fnsummary): Remove array_index.
>   * ipa-inline.c (want_inline_small_function_p): Do not use
>   array_index.
>   (edge_badness): Likewise.
>   * params.def (PARAM_IPA_CP_ARRAY_INDEX_HINT_BONUS):
> Remove.
> 
> Index: doc/invoke.texi
> 
> ===
> --- doc/invoke.texi   (revision 273450)
> +++ doc/invoke.texi   (working copy)
> @@ -11895,12 +11895,6 @@ of iterations of a loop known, it adds a
> @option{ipa-cp-loop-hint-bonus} to the profitability score of  the candidate.
> 
> -@item ipa-cp-array-index-hint-bonus
> -When IPA-CP determines that a cloning candidate would make the index of -
> an array access known, it adds a bonus of -@option{ipa-cp-array-index-hint-
> bonus} to the profitability -score of the candidate.
> -
>  @item ipa-max-aa-steps
>  During its analysis of function bodies, IPA-CP employs alias analysis  in 
> order
> to track values pointed to by function parameters.  In order
> Index: ipa-cp.c
> 
> ===
> --- ipa-cp.c  (revision 273450)
> +++ ipa-cp.c  (working copy)
> @@ -2607,8 +2607,6 @@ hint_time_bonus (ipa_hints hints)
>int result = 0;
>if (hints & (INLINE_HINT_loop_iterations | INLINE_HINT_loop_stride))
>  result += PARAM_VALUE (PARAM_IPA_CP_LOOP_HINT_BONUS);
> -  if (hints & INLINE_HINT_array_index)
> -result += PARAM_VALUE (PARAM_IPA_CP_ARRAY_INDEX_HINT_BONUS);
>return result;
>  }
> 
> Index: ipa-fnsummary.c
> 
> ===
> --- ipa-fnsummary.c   (revision 273450)
> +++ ipa-fnsummary.c   (working copy)
> @@ -134,11 +134,6 @@ ipa_dump_hints (FILE *f, ipa_hints hints
>hints &= ~INLINE_HINT_declared_inline;
>fprintf (f, " declared_inline");
>  }
> -  if (hints & INLINE_HINT_array_index)
> -{
> -  hints &= ~INLINE_HINT_array_index;
> -  fprintf (f, " array_index");
> -}
>if (hints & INLINE_HINT_known_hot)
>  {
>hints &= ~INLINE_HINT_known_hot;
> @@ -549,8 +544,6 @@ ipa_fn_summary::~ipa_fn_summary ()
>  edge_predicate_pool.remove (loop_iterations);

RE: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)

2019-07-08 Thread JiangNing OS
Hi Jeff and Richard B.,

Following your tips, I've found a much simpler solution in tree-ssa-phiopt.c. 
Attached is the new patch. Review again, please!

Thanks a lot!
-Jiangning

> -Original Message-
> From: Jeff Law 
> Sent: Saturday, June 22, 2019 12:10 AM
> To: Richard Biener 
> Cc: JiangNing OS ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)
> 
> On 6/21/19 6:23 AM, Richard Biener wrote:
> >
> > That looks like a pretty easy form to analyze.  I'd suggest looking
> > through tree-ssa-phiopt.c closely.  There's several transformations in
> > there that share similarities with yours.
> >
> >
> > More specifically there is the conditional store elimination (cselim)
> > pass inside this file.
> That's the one I was thinking of.  It looks reasonably close to the cases
> JiangNing is trying to capture.
> 
> 
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >> 2) My current solution fits into current back-end if-conversion
> > pass very well. I don't want to invent
> >> a new framework to solve this relatively small issue. Besides,
> > this back-end patch doesn't only
> >> enhance store speculation detection, but also fix a bug in the
> > original code. Understood, but I still wonder if we're better off
> > addressing this in gimple.
> >
> >
> > Traditionally if-conversions profitability heavily depends on the
> > target and esp. if memory is involved costing on GIMPLE is hard.
> > That's also one reason why we turned off loop if-conversion on GIMPLE
> > for non-vectorized code.
> Yea.  That's always the concern for transformations that aren't trivial to 
> show
> are better.
> 
> Conditional store elimination as implemented right now in phiopt requires a
> single store in the then/else clauses.  So we're really just sinking the 
> stores
> through a PHI.  We're really not changing the number of loads or stores on
> any path.
> 
> In the cases I think JiangNing is trying to handle we are adding a store on a
> path where it didn't previously exist because there is no else clause.  So we
> likely need to check the edge probabilities to avoid doing something dumb.  I
> don't have a good sense where the cutoff should be.  tree-ssa-sink might
> have some nuggets of information to help guide.
> 
> >
> > phiopt is really about followup optimization opportunities in passes
> > that do not handle conditionals well.
> >
> > cselim is on the border...
> ACK.  In fact, looking at it it I wonder if it'd be better in tree-ssa-sink.c 
> :-)
> 
> 
> jeff


csel5.patch
Description: csel5.patch


RE: Use ODR for canonical types construction in LTO

2019-07-01 Thread JiangNing OS
Hi,

FYI. This patch works for my application LTO build on aarch64.

Thanks,
-Jiangning

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org 
> On Behalf Of Jan Hubicka
> Sent: Monday, July 1, 2019 8:22 PM
> To: Christophe Lyon 
> Cc: Eric Botcazou ; gcc Patches  patc...@gcc.gnu.org>; Richard Biener ; d...@dcepelik.cz;
> Martin Liška 
> Subject: Re: Use ODR for canonical types construction in LTO
> 
> Hi,
> this patch fixes the ICE. Problem is that va_arg type is pre-streamed and thus
> at stream-in time we have main variant constructed by LTO FE which
> is !CXX_ODR_P while vairants are ones comming from C++ FE which are ODR.
> It is safe to drop the flags here since we only care about main variants
> anyway.
> 
> I am testing the patch on aarch64 now.
> 
> Honza
> 
> Index: lto/lto-common.c
> 
> ===
> --- lto/lto-common.c  (revision 272846)
> +++ lto/lto-common.c  (working copy)
> @@ -568,8 +568,17 @@ lto_register_canonical_types_for_odr_typ
> 
>/* Register all remaining types.  */
>FOR_EACH_VEC_ELT (*types_to_register, i, t)
> -if (!TYPE_CANONICAL (t))
> -  gimple_register_canonical_type (t);
> +{
> +  /* For pre-streamed types like va-arg it is possible that main variant
> +  is !CXX_ODR_P while the variant (which is streamed) is.
> +  Copy CXX_ODR_P to make type verifier happy.  This is safe because
> +  in canonical type calculation we only consider main variants.
> +  However we can not change this flag before streaming is finished
> +  to not affect tree merging.  */
> +  TYPE_CXX_ODR_P (t) = TYPE_CXX_ODR_P (TYPE_MAIN_VARIANT (t));
> +  if (!TYPE_CANONICAL (t))
> +gimple_register_canonical_type (t);
> +}
>  }
> 
> 


RE: Use ODR for canonical types construction in LTO

2019-06-27 Thread JiangNing OS
No. Since this is LTO, it's very hard to simplify the big application. Sorry 
for that.

I think Christophe is mentioning the case from g++.dg is reporting the similar 
issue like " type variant differs by TYPE_CXX_ODR_P  ", right?

Thanks,
-Jiangning

> -Original Message-
> From: Jan Hubicka 
> Sent: Thursday, June 27, 2019 2:29 PM
> To: JiangNing OS 
> Cc: Eric Botcazou ; Christophe Lyon
> ; gcc Patches ;
> Richard Biener ; d...@dcepelik.cz; Martin Liška
> 
> Subject: Re: Use ODR for canonical types construction in LTO
> 
> > Hi,
> >
> > This commit
> > https://gcc.gnu.org/viewcvs/gcc?view=revision=272628 is
> > breaking trunk LTO on some real benchmarks, so can it be fixed or
> > reverted? For example,
> 
> Do you have a testcase?
> Honza
> >
> > lto1: error: type variant differs by TYPE_CXX_ODR_P   > 0x99943d08 __va_list BLK
> > size  bitsizetype> constant 256>
> > unit-size  sizetype> constant 32>
> > align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
> 0x99943d08
> > fields  > type  > void>
> > public unsigned DI
> > size 
> > unit-size 
> > align:64 warn_if_not_align:0 symtab:0 alias-set 7 
> > structural-equality
> > pointer_to_this >
> > unsigned DI :0:0 size  
> > unit-size
> 
> > align:64 warn_if_not_align:0 offset_align 128
> > offset 
> > bit-offset  context
> 
> > chain  0x99940fc0>
> > unsigned DI :0:0 size  
> > unit-
> size 
> > align:64 warn_if_not_align:0 offset_align 128 offset 
> >  0x99890c30 0> bit-offset  context
>  chain  __vr_top>>>
> > reference_to_this  chain  > 0x99960098 __va_list>>   BLK
> > size  bitsizetype> constant 256>
> > unit-size  sizetype> constant 32>
> > align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
> 0x99943d08
> > fields  > type  > void>
> > public unsigned DI
> > size 
> > unit-size 
> > align:64 warn_if_not_align:0 symtab:0 alias-set 7 
> > structural-equality
> > pointer_to_this >
> > unsigned DI :0:0 size  
> > unit-size
> 
> > align:64 warn_if_not_align:0 offset_align 128
> > offset 
> > bit-offset  context
> 
> > chain  0x99940fc0>
> > unsigned DI :0:0 size  
> > unit-
> size 
> > align:64 warn_if_not_align:0 offset_align 128 offset 
> >  0x99890c30 0> bit-offset  context
>  chain  __vr_top>>>
> > pointer_to_this  reference_to_this
> > >
> > lto1: internal compiler error: 'verify_type' failed
> > 0xe33e93 verify_type(tree_node const*)
> > ../../gcc/gcc/tree.c:14655
> > 0x5efc4b lto_fixup_state
> > ../../gcc/gcc/lto/lto-common.c:2429
> > 0x5fc01b lto_fixup_decls
> > ../../gcc/gcc/lto/lto-common.c:2460
> > 0x5fc01b read_cgraph_and_symbols(unsigned int, char const**)
> > ../../gcc/gcc/lto/lto-common.c:2693
> > 0x5ded23 lto_main()
> > ../../gcc/gcc/lto/lto.c:616
> > Please submit a full bug report,
> > with preprocessed source if appropriate.
> > Please include the complete backtrace with any bug report.
> > See <https://gcc.gnu.org/bugs/> for instructions.
> > lto-wrapper: fatal error: /home/amptest/gcc/install_last//bin/g++
> > returned 1 exit status compilation terminated.
> > /usr/bin/ld: error: lto-wrapper failed
> > collect2: error: ld returned 1 exit status
> >
> > Thanks,
> > -Jiangning
> >
> > > -Original Message-
> > > From: gcc-patches-ow...@gcc.gnu.org 
> > > On Behalf Of Christophe Lyon
> > > Sent: Tuesday, June 25, 2019 8:30 PM
> > > To: Jan Hubicka 
> > > Cc: Eric Botcazou ; gcc Patches  > > patc...@gcc.gnu.org>; Richard Biener ;
> > > d...@dcepelik.cz; Martin Liška 
> > > Subject: Re: Use ODR for canonical types construction in LTO
> > >
> > > Hi,
> > >
> > >
> > > On Tue, 25 Jun 2019 at 10:20, Jan Hubicka  wrote:
> > > >
> > > > > > * gcc-interface/decl.c (gnat_to_gnu_entity): Check that
> > > > > > type is array or integer prior checking string flag.
> > > > >
> > > > > The test for array is superfluous h

RE: Use ODR for canonical types construction in LTO

2019-06-26 Thread JiangNing OS
Hi,

This commit https://gcc.gnu.org/viewcvs/gcc?view=revision=272628 is 
breaking trunk LTO on some real benchmarks, so can it be fixed or reverted? For 
example,

lto1: error: type variant differs by TYPE_CXX_ODR_P
  constant 256>
unit-size  constant 32>
align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
0x99943d08
fields 
public unsigned DI
size 
unit-size 
align:64 warn_if_not_align:0 symtab:0 alias-set 7 
structural-equality
pointer_to_this >
unsigned DI :0:0 size  
unit-size 
align:64 warn_if_not_align:0 offset_align 128
offset 
bit-offset  context 
chain 
unsigned DI :0:0 size  
unit-size 
align:64 warn_if_not_align:0 offset_align 128 offset  bit-offset  context 
 chain >>
reference_to_this  chain >
  constant 256>
unit-size  constant 32>
align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
0x99943d08
fields 
public unsigned DI
size 
unit-size 
align:64 warn_if_not_align:0 symtab:0 alias-set 7 
structural-equality
pointer_to_this >
unsigned DI :0:0 size  
unit-size 
align:64 warn_if_not_align:0 offset_align 128
offset 
bit-offset  context 
chain 
unsigned DI :0:0 size  
unit-size 
align:64 warn_if_not_align:0 offset_align 128 offset  bit-offset  context 
 chain >>
pointer_to_this  reference_to_this 
>
lto1: internal compiler error: 'verify_type' failed
0xe33e93 verify_type(tree_node const*)
../../gcc/gcc/tree.c:14655
0x5efc4b lto_fixup_state
../../gcc/gcc/lto/lto-common.c:2429
0x5fc01b lto_fixup_decls
../../gcc/gcc/lto/lto-common.c:2460
0x5fc01b read_cgraph_and_symbols(unsigned int, char const**)
../../gcc/gcc/lto/lto-common.c:2693
0x5ded23 lto_main()
../../gcc/gcc/lto/lto.c:616
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.
lto-wrapper: fatal error: /home/amptest/gcc/install_last//bin/g++ returned 1 
exit status
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status

Thanks,
-Jiangning

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org 
> On Behalf Of Christophe Lyon
> Sent: Tuesday, June 25, 2019 8:30 PM
> To: Jan Hubicka 
> Cc: Eric Botcazou ; gcc Patches  patc...@gcc.gnu.org>; Richard Biener ; d...@dcepelik.cz;
> Martin Liška 
> Subject: Re: Use ODR for canonical types construction in LTO
> 
> Hi,
> 
> 
> On Tue, 25 Jun 2019 at 10:20, Jan Hubicka  wrote:
> >
> > > > * gcc-interface/decl.c (gnat_to_gnu_entity): Check that
> > > > type is array or integer prior checking string flag.
> > >
> > > The test for array is superfluous here.
> > >
> > > > * gcc-interface/gigi.h (gnat_signed_type_for,
> > > > maybe_character_value): Likewise.
> > >
> > > Wrong ChangeLog, the first modified function is maybe_character_type.
> > >
> > > I have installed the attached patchlet after testing it on x86-64/Linux.
> > >
> > >
> > >   * gcc-interface/decl.c (gnat_to_gnu_entity): Remove superfluous test
> in
> > >   previous change.
> > >   * gcc-interface/gigi.h (maybe_character_type): Fix formatting.
> > >   (maybe_character_value): Likewise.
> >
> > Thanks a lot. I was not quite sure if ARRAY_TYPEs can happen there and
> > I should have added you to the CC.
> >
> 
> After the main commit (r272628), I have noticed regressions on arm and
> aarch64:
> 
> g++.dg/lto/pr60336 cp_lto_pr60336_0.o-cp_lto_pr60336_0.o link, -O0 -flto
> -flto-partition=1to1 -fno-use-linker-plugin  (internal compiler
> error)
> g++.dg/lto/pr60336 cp_lto_pr60336_0.o-cp_lto_pr60336_0.o link, -O0 -flto
> -flto-partition=none -fuse-linker-plugin (internal compiler
> error)
> g++.dg/lto/pr60336 cp_lto_pr60336_0.o-cp_lto_pr60336_0.o link, -O0 -flto
> -fuse-linker-plugin -fno-fat-lto-objects  (internal compiler
> error)
> g++.dg/lto/pr60336 cp_lto_pr60336_0.o-cp_lto_pr60336_0.o link, -O2 -flto
> -flto-partition=1to1 -fno-use-linker-plugin  (internal compiler
> error)
> g++.dg/lto/pr60336 cp_lto_pr60336_0.o-cp_lto_pr60336_0.o link, -O2 -flto
> -flto-partition=none -fuse-linker-plugin -fno-fat-lto-objects (internal 
> compiler
> error)
> g++.dg/lto/pr60336 cp_lto_pr60336_0.o-cp_lto_pr60336_0.o link, -O2 -flto
> -fuse-linker-plugin (internal compiler error)
> g++.dg/torture/pr45843.C   -O2 -flto -fno-use-linker-plugin
> -flto-partition=none  (internal compiler error)
> g++.dg/torture/pr45843.C   -O2 -flto -fuse-linker-plugin
> -fno-fat-lto-objects  (internal compiler error)
> g++.dg/torture/stackalign/eh-vararg-1.C   -O2 -flto
> -fno-use-linker-plugin -flto-partition=none  (internal compiler error)
> 

RE: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)

2019-06-20 Thread JiangNing OS
Hi Jeff,

Appreciate your effort to review my patch! I've updated my patch as attached. 
See my answers below.

> in current function, so the store speculation can be avoided.
> So at a high level should we be doing this in gimple rather than RTL?
> We're going to have a lot more information about types, better
> infrastructure for looking at uses/defs, access to the alias oracle, we should
> be able to accurately distinguish between potentially shared objects vs those
> which are local to the thread, etc.  We lose the low level costing information
> though.
> 
> I'm still going to go through the patch and do some level of review, but I do
> think we need to answer the higher level question though.
> 
I have the following reasons,

1) Following the clue Richard B gave me before about parameter --param 
allow-store-data-races,
I did check the middle-end pass tree-if-conv, but I think this pass at the 
moment doesn't work
for the issue I'm trying to solve. Tree-if-conv is to do if conversion for 
loop, and its final goal is to
help loop vectorization, while my case doesn't have a loop at all. 
2) My current solution fits into current back-end if-conversion pass very well. 
I don't want to invent
a new framework to solve this relatively small issue. Besides, this back-end 
patch doesn't only
enhance store speculation detection, but also fix a bug in the original code. 

> Nits: We typically refer to parameters, variables, etc in comments using
> upper case.  You'll need to review the entire patch for these its.
> 
> So perhaps the comment should be something like:
> 
> /* Return true of X, a MEM expression, is on the stack.  A_INSN contains
>X if A_INSN exists.  */
> 
Fixed in attached new patch.

> 
> Just from a design standpoint, what are the consequences if this function
> returns true for something that isn't actually in the stack or false for
> something that is on the stack?
> 
If noce_mem_is_on_stack returns true for something that isn't actually in the 
stack, 
it could potentially introduce store speculation, then the if-conversion 
optimization
will be incorrect. If this function returns false for something that is on 
stack, it doesn't
matter, because the optimization will not be triggered. 

> Nit: Space between the function name and its open paren for arguments.  ie
> 
> if (fixed_base_plus_p (a)
>  ^
> I see other instances of this nit, please review the patch and correct them.
> 
Fixed in attached new patch.

> 
> > +
> > +  if (!a_insn)
> > +return false;
> I'm not sure what the calling profile is for this function, but this is a 
> cheaper
> test, so you might consider moving it before the test of fixed_base_plus_p.
> 
Fixed in attached new patch.

> 
> > +
> > +  if (!reg_mentioned_p (x, a_insn))
> > +return false;
> > +
> > +  /* Check if x is on stack. Assume a mem expression using registers
> > + related to stack register is always on stack. */
> > + FOR_EACH_INSN_USE (use, a_insn)
> > +if (reg_mentioned_p (DF_REF_REG (use), x)
> > +&& bitmap_bit_p (bba_sets_must_be_sfp, DF_REF_REGNO (use)))
> > +  return true;
> > +
> > +  return false;
> > +}
> So is X always a MEM?  Just wanted to make sure I understand.
> reg_mentioned_p will do the right thing (using rtx_equal_p) for the
> comparison.
> 
Yes. X is always a MEM. There is an assertion for this in the code above.

> 
> > +
> > +/* Always return true, if there is a dominating write.
> > +
> > +   When there is a dominating read from memory on stack,
> > +   1) if x = a is a memory read, return true.
> > +   2) if x = a is a memory write, return true if the memory is on stack.
> > +  This is the guarantee the memory is *not* readonly. */
> > +
> > +static bool
> > +noce_valid_for_dominating (basic_block bb, rtx_insn *a_insn,
> > +   const_rtx x, bool is_store) {
> > +  rtx_insn *insn;
> > +  rtx set;
> > +
> > +  gcc_assert (MEM_P (x));
> > +
> > +  FOR_BB_INSNS (bb, insn)
> > +{
> > +  set = single_set (insn);
> > +  if (!set)
> > +continue;
> > +
> > +  /* Dominating store */
> > +  if (rtx_equal_p (x, SET_DEST (set)))
> > +return true;
> > +
> > +  /* Dominating load */
> > +  if (rtx_equal_p (x, SET_SRC (set)))
> > +if (is_store && noce_mem_is_on_stack (a_insn, x))
> > +  return true;
> > +}
> > +
> > +  return false;
> > +}
> So what would be the consequences here of returning false when in fact
> there was a dominating read or write?  That could easily happen if the
> dominating read or write was not a single_set.
If return false when in fact there is a dominating read or write, the 
optimization will not
be triggered, because noce_mem_maybe_invalid_p will be true, which is playing 
the same
role as may_trap_or_fault_p.

> 
> I'm guessing that from a design standpoint you're trying to find cases where
> you know an object was written to within the block and does not escape.  So
> 

RE: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)

2019-06-16 Thread JiangNing OS
Hi,

May I know any feedback comments to this patch?

Thanks,
-Jiangning

From: Kyrill Tkachov 
Sent: Friday, May 24, 2019 10:55 PM
To: JiangNing OS ; gcc-patches@gcc.gnu.org
Cc: ebotca...@adacore.com; ste...@gcc.gnu.org; l...@redhat.com
Subject: Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)


Hi Jiangning
On 3/15/19 4:46 AM, JiangNing OS wrote:
This patch is to fix a missing ifcvt opportunity in back-end. For the simple 
case below,

if (...)
x = a;  /* x is memory */
/* no else */

We can generate conditional move and remove the branch as below if the target 
cost is acceptable.

r1 = x
r2 = a
cmp ...
csel r3, r1, r2, cond
x = r3

This could be safe if x is a stack variable, and there isn't any address taken 
in current function, so the store speculation can be avoided.

In practice, this optimization can improve a real application performance by %4 
on aarch64.



Now that GCC 10 development is open, this should appropriate for considering.

I've cc'ed folks who are either listed maintainers in this area or have 
reviewed patches in this area in my recent memory.

Thanks,

Kyrill


Thanks,
-Jiangning


RE: Fixing ifcvt issue as exposed by BZ89430

2019-05-09 Thread JiangNing OS


> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, May 8, 2019 3:35 PM
> To: JiangNing OS 
> Cc: gcc-patches@gcc.gnu.org; Richard Biener ;
> pins...@gcc.gnu.org
> Subject: Re: Fixing ifcvt issue as exposed by BZ89430
> 
> On Thu, Feb 28, 2019 at 1:26 PM JiangNing OS
>  wrote:
> >
> > To solve BZ89430 the followings are needed,
> >
> > (1) The code below in noce_try_cmove_arith needs to be fixed.
> >
> >   /* ??? We could handle this if we knew that a load from A or B could
> >  not trap or fault.  This is also true if we've already loaded
> >  from the address along the path from ENTRY.  */
> >   else if (may_trap_or_fault_p (a) || may_trap_or_fault_p (b))
> > return FALSE;
> >
> > Finding dominating memory access could help to decide whether the
> memory access following the conditional move is valid or not.
> > * If there is a dominating memory write to the same memory address in
> test_bb as the one from x=a, it is always safe.
> > * When the dominating memory access to the same memory address in
> test_bb is a read, for the load out of x=a, it is always safe. For the store 
> out
> of x=a, if the memory is on stack, it is still safe.
> >
> > Besides, there is a bug around rearranging the then_bb and else_bb layout,
> which needs to be fixed.
> >
> > (2) The fix by https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02998.html
> is an overkill. If the write target following conditional move is a memory
> access, it exits shortly.
> >
> >   if (!set_b && MEM_P (orig_x))
> > /* We want to avoid store speculation to avoid cases like
> >  if (pthread_mutex_trylock(mutex))
> >++global_variable;
> >Rather than go to much effort here, we rely on the SSA optimizers,
> >which do a good enough job these days.  */
> > return FALSE;
> >
> > It looks a bit hard for compiler to know the program semantic is
> > related to mutex and avoid store speculation. Without removing this
> > overkill, the fix mentioned by (1) would not work. Any idea? An
> > alternative solution is to detect the following pattern
> > conservatively,
> 
> But it's important to not break this.  Note we do have --param allow-store-
> data-races which the user can use to override this.
> Note that accesses to the stack can obviously not cause store speculation if
> its address didn't escape.  But that's probably what is refered to by "rely on
> the SSA optimizers".

Yes! So I have sent out a patch with title "[PATCH] improve ifcvt optimization 
(PR rtl-optimization/89430)" to detect the access to stack two months ago.
The SSA Optimization called "tree-if-conv" in middle-end can't really cover
this case. The key part of my change is like,

-/* We want to avoid store speculation to avoid cases like
-if (pthread_mutex_trylock(mutex))
-  ++global_variable;
-   Rather than go to much effort here, we rely on the SSA optimizers,
-   which do a good enough job these days.  */
-return FALSE;
+{
+  /* We want to avoid store speculation to avoid cases like
+   if (pthread_mutex_trylock(mutex))
+ ++global_variable;  
+ Tree if conversion cannot handle this case well, and it intends to
+ help vectorization for loops only. */
+  if (!noce_mem_is_on_stack(insn_a, orig_x))
+return FALSE;
 
+  /* For case like,
+   if (pthread_mutex_trylock(mutex))
+ ++local_variable;
+ If any stack variable address is taken, potentially this local
+ variable could be modified by other threads and introduce store
+ speculation. */
+  if (!cfun_no_stack_address_taken)
+return FALSE;
+}

Can you please take a look if you have time? Thank you!

> 
> >  if (a_function_call(...))
> >++local_variable;
> >
> > If the local_variable doesn't have address taken at all in current 
> > function, it
> mustn't be a pthread mutex lock semantic, and then the following patch
> would work to solve (1) and pass the last case as described in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89430.
> >
> > Index: gcc/cse.c
> >
> 
> ===
> > --- gcc/cse.c   (revision 268256)
> > +++ gcc/cse.c   (working copy)
> > @@ -540,7 +540,6 @@
> > already as part of an already processed extended basic block.  */
> > static sbitmap cse_visited_basic_blocks;
> >
> > -static bool fixed_base_plus_p (rtx x);  static int notreg_cost (rtx,
> > machine_mode, enum rtx_code, int);  static int preferable (int, in

RE: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)

2019-05-04 Thread JiangNing OS
Hi Jeff,

Yes. The latter one "[PATCH] improve ifcvt optimization (PR 
rtl-optimization/89430)" supersedes the earlier one " Fixing ifcvt issue as 
exposed by BZ89430".

Thanks,
-Jiangning

-Original Message-
From: Jeff Law  
Sent: Tuesday, April 30, 2019 11:54 PM
To: JiangNing OS ; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)

On 3/14/19 10:46 PM, JiangNing OS wrote:
> This patch is to fix a missing ifcvt opportunity in back-end. For the 
> simple case below,
> 
> if (...)
> x = a;  /* x is memory */
> /* no else */
> 
> We can generate conditional move and remove the branch as below if the target 
> cost is acceptable. 
> 
> r1 = x
> r2 = a
> cmp ...
> csel r3, r1, r2, cond
> x = r3
> 
> This could be safe if x is a stack variable, and there isn't any address 
> taken in current function, so the store speculation can be avoided. 
> 
> In practice, this optimization can improve a real application performance by 
> %4 on aarch64.
[ ... ]
I notice that a couple weeks you'd sent a similar patch under the heading 
"Fixing ifcvt issue as exposed by BZ89430".  Does this patch supersede the 
earlier patch?

Jeff



[PATCH] improve ifcvt optimization (PR rtl-optimization/89430)

2019-03-14 Thread JiangNing OS
This patch is to fix a missing ifcvt opportunity in back-end. For the simple 
case below,

if (...)
x = a;  /* x is memory */
/* no else */

We can generate conditional move and remove the branch as below if the target 
cost is acceptable. 

r1 = x
r2 = a
cmp ...
csel r3, r1, r2, cond
x = r3

This could be safe if x is a stack variable, and there isn't any address taken 
in current function, so the store speculation can be avoided. 

In practice, this optimization can improve a real application performance by %4 
on aarch64.

Thanks,
-Jiangning


csel3.patch
Description: csel3.patch


Fixing ifcvt issue as exposed by BZ89430

2019-02-28 Thread JiangNing OS
To solve BZ89430 the followings are needed,

(1) The code below in noce_try_cmove_arith needs to be fixed.

  /* ??? We could handle this if we knew that a load from A or B could
 not trap or fault.  This is also true if we've already loaded
 from the address along the path from ENTRY.  */
  else if (may_trap_or_fault_p (a) || may_trap_or_fault_p (b))
return FALSE;

Finding dominating memory access could help to decide whether the memory access 
following the conditional move is valid or not. 
* If there is a dominating memory write to the same memory address in test_bb 
as the one from x=a, it is always safe.
* When the dominating memory access to the same memory address in test_bb is a 
read, for the load out of x=a, it is always safe. For the store out of x=a, if 
the memory is on stack, it is still safe.

Besides, there is a bug around rearranging the then_bb and else_bb layout, 
which needs to be fixed.

(2) The fix by https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02998.html  is an 
overkill. If the write target following conditional move is a memory access, it 
exits shortly.

  if (!set_b && MEM_P (orig_x))
/* We want to avoid store speculation to avoid cases like
 if (pthread_mutex_trylock(mutex))
   ++global_variable;
   Rather than go to much effort here, we rely on the SSA optimizers,
   which do a good enough job these days.  */
return FALSE;

It looks a bit hard for compiler to know the program semantic is related to 
mutex and avoid store speculation. Without removing this overkill, the fix 
mentioned by (1) would not work. Any idea? An alternative solution is to detect 
the following pattern conservatively,

 if (a_function_call(...))
   ++local_variable;

If the local_variable doesn't have address taken at all in current function, it 
mustn't be a pthread mutex lock semantic, and then the following patch would 
work to solve (1) and pass the last case as described in 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89430. 

Index: gcc/cse.c
===
--- gcc/cse.c   (revision 268256)
+++ gcc/cse.c   (working copy)
@@ -540,7 +540,6 @@
already as part of an already processed extended basic block.  */
 static sbitmap cse_visited_basic_blocks;
 
-static bool fixed_base_plus_p (rtx x);
 static int notreg_cost (rtx, machine_mode, enum rtx_code, int);
 static int preferable (int, int, int, int);
 static void new_basic_block (void);
@@ -606,30 +605,7 @@
 
 static const struct rtl_hooks cse_rtl_hooks = RTL_HOOKS_INITIALIZER;
 

-/* Nonzero if X has the form (PLUS frame-pointer integer).  */
 
-static bool
-fixed_base_plus_p (rtx x)
-{
-  switch (GET_CODE (x))
-{
-case REG:
-  if (x == frame_pointer_rtx || x == hard_frame_pointer_rtx)
-   return true;
-  if (x == arg_pointer_rtx && fixed_regs[ARG_POINTER_REGNUM])
-   return true;
-  return false;
-
-case PLUS:
-  if (!CONST_INT_P (XEXP (x, 1)))
-   return false;
-  return fixed_base_plus_p (XEXP (x, 0));
-
-default:
-  return false;
-}
-}
-
 /* Dump the expressions in the equivalence class indicated by CLASSP.
This function is used only for debugging.  */
 DEBUG_FUNCTION void
Index: gcc/ifcvt.c
===
--- gcc/ifcvt.c (revision 268256)
+++ gcc/ifcvt.c (working copy)
@@ -76,6 +76,9 @@
 /* Whether conditional execution changes were made.  */
 static int cond_exec_changed_p;
 
+/* bitmap for stack frame pointer definition insns. */
+static bitmap bba_sets_sfp;
+
 /* Forward references.  */
 static int count_bb_insns (const_basic_block);
 static bool cheap_bb_rtx_cost_p (const_basic_block, profile_probability, int);
@@ -99,6 +102,7 @@
   edge, int);
 static void noce_emit_move_insn (rtx, rtx);
 static rtx_insn *block_has_only_trap (basic_block);
+static void collect_all_fp_insns (void);
 

 /* Count the number of non-jump active insns in BB.  */
 
@@ -2029,6 +2033,110 @@
   return true;
 }
 
+/* Return true if MEM x is on stack. a_insn contains x, if it exists. */
+
+static bool
+noce_mem_is_on_stack (rtx_insn *a_insn, const_rtx x)
+{
+  df_ref use;
+
+  gcc_assert (x);
+  gcc_assert (MEM_P (x));
+
+  /* Early exits if find base register is a stack register. */
+  rtx a = XEXP (x, 0);
+  if (fixed_base_plus_p(a))
+return true;
+
+  if (!a_insn)
+return false;
+
+  /* Check if x is on stack. Assume a mem expression using registers
+ related to stack register is always on stack. */
+  FOR_EACH_INSN_USE (use, a_insn)
+{
+  if (bitmap_bit_p (bba_sets_sfp, DF_REF_REGNO (use)))
+return true;
+}
+
+  return false;
+}
+
+/* Always return true, if there is a dominating write.
+   
+   When there is a dominating read from memory on stack,
+   1) if x = a is a memory read, return true.
+   2) if x = a is a memory write, return true if the memory is on stack.
+  This