from:"xinliangli at gmail dot com"

[Bug middle-end/35545] tracer pass is run too late

2014-09-29 Thread xinliangli at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #23 from davidxl xinliangli at gmail dot com ---
(In reply to Jan Hubicka from comment #22)
   Doing it at same approximately the same place as loop header copying 
   seems to
   make most sense to me.  It benefits from early cleanups and DCE definitly 
   and
   it should enable more fun with the later scalar passes that are almost all
   rerun then.
  
  We need to make sure tracer doesn't mess too much with loops then.
  Btw, useless tracing may be undone again by tail-merging.
 
 Tracer already does:
 
   /* We have the tendency to duplicate the loop header
  of all do { } while loops.  Do not do that - it is
  not profitable and it might create a loop with multiple
  entries or at least rotate the loop.  */
bb2-loop_father-header != bb2)
 
 so it won't kill natural loops and peel (I should find time to update the
 peeling pass). It also has:
 
   if (bb_seen_p (bb2) || (e-flags  (EDGE_DFS_BACK | EDGE_COMPLEX))
   || find_best_successor (bb2) != e)
 break;
 
 to not unroll.  So i think it is safe for loop optimizer - all it can do
 is expanding a loop that has control flow in it reducing its unrollability
 later.  
  
  Tracer seems to consume only profile information and thus doesn't
  rely on any other transforms (well, apart from cleanups which could
  affect its cost function).  Why not schedule it even earlier?
  Like to before pass_build_alias?  (the pipeline up to loop transforms
  is quite a mess...)
 
 It uses profile information and code size estiamtes.  I expect that
 (especially
 for C++ stuff) the early post inline passes will remove a lot of code and
 thus
 improve traceability.  This is why I looked for spot after first DCE/DSE.
 
 David, can you please be more specific about how you tested? Was it with
 profile feedback?  What about code size metrics?

It is with profile feedback. There is also reduction in code size, though very
tiny change.

David

 
 Honza

[Bug middle-end/35545] tracer pass is run too late

2014-09-28 Thread xinliangli at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #19 from davidxl xinliangli at gmail dot com ---
(In reply to Jan Hubicka from comment #18)
  WE can try some internal benchmarks with this change too.
 
 That would be very welcome.  Tracer used to be quite useful pass in old days,
 doing 1.6% on -O3+FDO SPECint (for 1.4% code size cost) that put it very
 close
 to the inliner (1.8%) that however needed 12.9% of code size.
 
 http://www.ucw.cz/~hubicka/papers/amd64/node4.html (see aggressive
 optimization table)
 
 I do not think it was ever resonably returned for GIMPLE. Martin, do you have
 any current scores? Part of benefits came from lack of global optimizers, but
 I think tail duplication has a potential in modern compiler, too.
 
 Perhaps we should try to do that and given that we now have tail merging
 pass,
 consider getting it useful without FDO, too.
 
 Honza

I saw small improvement across most of the benchmarks with this change on
sandybridge machines with FDO. The geomean improvement is about 0.3%. The
baseline compiler is Google gcc-49 compiler, and the test compiler is Google
gcc-49 with the patch.

[Bug middle-end/35545] tracer pass is run too late

2014-09-26 Thread xinliangli at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #17 from davidxl xinliangli at gmail dot com ---
(In reply to Jan Hubicka from comment #16)
 I have moved tracer before the late cleanups that seems to be rather obbious
 thing to do. This lets us to optimize the testcase (with -O2):
 int main() ()
 {
   struct A * ap;
   int i;
   int _6;
 
   bb 2:
 
   bb 3:
   # i_29 = PHI i_22(6), 0(2)
   _6 = i_29 % 7;
   if (_6 == 0)
 goto bb 4;
   else
 goto bb 5;
 
   bb 4:
   ap_8 = operator new (16);
   ap_8-i = 0;
   ap_8-_vptr.A = MEM[(void *)_ZTV1A + 16B];
   goto bb 6;
 
   bb 5:
   ap_13 = operator new (16);
   MEM[(struct B *)ap_13].D.2244.i = 0;
   MEM[(struct B *)ap_13].b = 0;
   MEM[(struct B *)ap_13].D.2244._vptr.A = MEM[(void *)_ZTV1B + 16B];
 
   bb 6:
   # ap_4 = PHI ap_13(5), ap_8(4)
   operator delete (ap_4);
   i_22 = i_29 + 1;
   if (i_22 != 1)
 goto bb 3;
   else
 goto bb 7;
 
   bb 7:
   return 0;
 
 }
 
 Martin, I do not have SPEC setup, do you think you can benchmark the
 attached patch with SPEC and profile feedback and also non-FDO -O3 -ftracer
 compared to -O3, please?
 It would be nice to know code size impact, too.
 Index: passes.def
 ===
 --- passes.def  (revision 215651)
 +++ passes.def  (working copy)
 @@ -155,6 +155,7 @@ along with GCC; see the file COPYING3.
NEXT_PASS (pass_dce);
NEXT_PASS (pass_call_cdce);
NEXT_PASS (pass_cselim);
 +  NEXT_PASS (pass_tracer);
NEXT_PASS (pass_copy_prop);
NEXT_PASS (pass_tree_ifcombine);
NEXT_PASS (pass_phiopt);
 @@ -252,7 +253,6 @@ along with GCC; see the file COPYING3.
NEXT_PASS (pass_cse_reciprocals);
NEXT_PASS (pass_reassoc);
NEXT_PASS (pass_strength_reduction);
 -  NEXT_PASS (pass_tracer);
NEXT_PASS (pass_dominator);
NEXT_PASS (pass_strlen);
NEXT_PASS (pass_vrp);
 
 Doing it at same approximately the same place as loop header copying seems
 to make most sense to me.  It benefits from early cleanups and DCE definitly
 and it should enable more fun with the later scalar passes that are almost
 all rerun then.


WE can try some internal benchmarks with this change too.

David

[Bug middle-end/45631] Indirect call profiling can be improved to handle multiple targets

2014-09-25 Thread xinliangli at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45631

davidxl xinliangli at gmail dot com changed:

   What|Removed |Added

 CC||xur at google dot com

--- Comment #8 from davidxl xinliangli at gmail dot com ---
(In reply to Jan Hubicka from comment #7)
 Any news on this code?


Rong, I think it is probably good time to extract the multi-target value
profiling code and upstream it.

David

[Bug middle-end/63220] error: inlining failed in call to always_inline

2014-09-11 Thread xinliangli at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63220

--- Comment #2 from davidxl xinliangli at gmail dot com ---
(In reply to Richard Biener from comment #1)
 First of all you should mark the functions 'inline' as well.

This does not help.

  Then the issue
 is that 'eq' is called indirectly which isn't allowed for always_inline
 functions:


Is this documented somewhere? A function can be called indirectly and directly.
What is the right way to force inlining the direct calls?

A warning is already emitted about always_inline might not be inlinable, why
the error?

 
 t.C:81:66: error: inlining failed in call to always_inline 'static constexpr
 bool std::__1::char_traitschar::eq(std::__1::char_traitschar::char_type,
 std::__1::char_traitschar::char_type)': indirect function call with a yet
 undetermined callee
__attribute__ (( __always_inline__)) static constexpr bool
 eq(char_type __c1, char_type __c2)  {
   ^
 t.C:75:37: error: called from here
  if (!__pred(*__m1, *__m2)) { } 
  ^
 
 which means this is a missed-optimization only.  The error is your fault.
 
 Note that getting the error is unreliable so -O0 simply doesn't discover the
 failed inlining.

-O2 works fine -- I have not debugged the problem -- but it seems to be some
newly cloned cgraph edge to be in inconsistent state.

David

[Bug target/63209] [ARM] Wrong conditional move generated

2014-09-10 Thread xinliangli at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63209

davidxl xinliangli at gmail dot com changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from davidxl xinliangli at gmail dot com ---
Fixed in trunk: r215136, in gcc-4_9: r215137

2014-09-10  Xinliang David Li  davi...@google.com

PR target/63209
* config/arm/arm.md (movcond_addsi): Handle case where source
and target operands are the same.

[Bug middle-end/63220] New: error: inlining failed in call to always_inline

2014-09-10 Thread xinliangli at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63220

Bug ID: 63220
   Summary: error: inlining failed in call to always_inline
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xinliangli at gmail dot com

Created attachment 33466
  -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=33466action=edit
test case

Compile the attached source with -std=gnu++11 -O1 option, the compilation will
fail with the error message

 error: inlining failed in call to always_inline 


There is no problem with O2.

[Bug rtl-optimization/63209] New: [ARM] Wrong conditional move generated

2014-09-08 Thread xinliangli at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63209

Bug ID: 63209
   Summary: [ARM] Wrong conditional move generated
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xinliangli at gmail dot com

Created attachment 33460
  -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=33460action=edit
patch file

GCC (arm-unknown-linux-gnueabi) generates wrong code (O2) for the following
program. The problem is that it uses a source register in the condition move
that has already been overwritten in the previous move.

The program produces output:


top[-1]: ff7a7a7a top[0]: ff7a7a7a left: ff7b7b7b pred: ff7a7a7a
pred != left


The bad assembly code is:

 cmn r1, r2
 mov r0, r3
 movgt   r0, r3



The expected output is:


top[-1]: ff7a7a7a top[0]: ff7a7a7a left: ff7b7b7b pred: ff7b7b7b

The expected assembly:

 cmn r1, r2
 movle   r0, r3


The attached patch fixed the problem. The patch passes regression test.

// tests.c

#include stdio.h

static int Sub(int a, int b) {
  return  b -a;
}

static unsigned Select(unsigned a, unsigned b, unsigned c) {
  const int pa_minus_pb =
  Sub((a   8)  0xff, (b   8)  0xff) +
  Sub((a   0)  0xff, (b   0)  0xff);
  return (pa_minus_pb = 0) ? a : b;
}

__attribute__((noinline)) unsigned Predictor(unsigned left, const unsigned*
const top) {
  const unsigned pred = Select(top[1], left, top[0]);
  return pred;
}

int main(void) {
  const unsigned top[2] = {0xff7a7a7a, 0xff7a7a7a};
  const unsigned left = 0xff7b7b7b;
  const unsigned pred = Predictor(left, top /*+ 1*/);
  fprintf(stderr, top[-1]: %8x top[0]: %8x left: %8x pred: %8x\n,
  top[0], top[1], left, pred);
  if (pred == left)
return 0;
  fprintf(stderr, pred != left\n);
  return 1;
}

[Bug rtl-optimization/62040] New: internal compiler error: in simplify_const_unary_operation, at simplify-rtx.c:1555

2014-08-06 Thread xinliangli at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62040

Bug ID: 62040
   Summary: internal compiler error: in
simplify_const_unary_operation, at simplify-rtx.c:1555
   Product: gcc
   Version: 4.10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xinliangli at gmail dot com

Compile the attached source file with option -Os -g  using  aarch64 compiler,
the compiler ICEs:

et2.c: In function 'foo':
et2.c:189:1: internal compiler error: in simplify_const_unary_operation, at
simplify-rtx.c:1555
 }

[Bug rtl-optimization/62040] internal compiler error: in simplify_const_unary_operation, at simplify-rtx.c:1555

2014-08-06 Thread xinliangli at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62040

--- Comment #1 from davidxl xinliangli at gmail dot com ---
Created attachment 33264
  -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=33264action=edit
test case

[Bug middle-end/61776] [4.9/4.10 Regression] ICE: verify_flow_info failed: control flow in the middle of basic block with -fprofile-generate

2014-07-17 Thread xinliangli at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61776

davidxl xinliangli at gmail dot com changed:

   What|Removed |Added

 CC||xinliangli at gmail dot com

--- Comment #3 from davidxl xinliangli at gmail dot com ---
In tree-profiling (), in the intrumentation loop (for each def function) -- a
pending list of functions calling pure functions should be recorded.

After the loop that reset the const/pure flags, a loop should iterate though
the pending list, and perform cfg fixup on them.

David


(In reply to Richard Biener from comment #1)
 There are several possibilities:
 
  - don't instrument pure/const functions
  - do not reset the pure/const flag (I see no reason to - the compiler might
not insert side-effects and we can consider the counter updates as
non-side-effect)
 
 That is, I wonder why we do
 
   /* Drop pure/const flags from instrumented functions.  */
   FOR_EACH_DEFINED_FUNCTION (node)
 {
   if (!gimple_has_body_p (node-decl)
   || !(!node-clone_of
   || node-decl != node-clone_of-decl))
 continue;
 
   /* Don't profile functions produced for builtin stuff.  */
   if (DECL_SOURCE_LOCATION (node-decl) == BUILTINS_LOCATION)
 continue;
 
   cgraph_set_const_flag (node, false, false);
   cgraph_set_pure_flag (node, false, false);
 }
 
 but don't then call execute_fixup_cfg () again (but it doesn't yet deal
 with a const/pure function becoming non-pure/const and thus a bb-ending
 stmt).
 
 Btw, this is yet another case where transitioning to call flags instead
 of decl flags would save us - definitely even the instrumented call cannot
 return abnormally.
 
 Dropping the clearing of const/pure fixes the bug.  Honza?  I only can think
 of the case where we have
 
   for (..)
{
  const ();
  const ();
}
 
 and inline only one of the calls where after that loop store-motion might
 apply store-motion to the counter updates of the inline clone, overwriting
 the updates from the non-inline call.  But do we really care?
 
 Anyway, confirmed.  Btw, goo() should be leaf but it seems we don't
 auto-compute 'leaf' yet?

[Bug middle-end/61776] [4.9/4.10 Regression] ICE: verify_flow_info failed: control flow in the middle of basic block with -fprofile-generate

2014-07-17 Thread xinliangli at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61776

--- Comment #5 from davidxl xinliangli at gmail dot com ---
(In reply to wmi from comment #4)
 Can we move the pure/const resetting loop to an earlier place: inside
 branch_prob , after instrument_edges and before gsi_commit_edge_inserts
 (where stmt_ends_bb_p  is checked), so that gsi_commit_edge_inserts() which
 changes cfg could take reset const/pure flags into consideration?

Sounds plausible. Have you tried it?

David

[Bug middle-end/61776] [4.9/4.10 Regression] ICE: verify_flow_info failed: control flow in the middle of basic block with -fprofile-generate

2014-07-17 Thread xinliangli at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61776

--- Comment #7 from davidxl xinliangli at gmail dot com ---
(In reply to wmi from comment #6)
 (In reply to davidxl from comment #5)
  (In reply to wmi from comment #4)
   Can we move the pure/const resetting loop to an earlier place: inside
   branch_prob , after instrument_edges and before gsi_commit_edge_inserts
   (where stmt_ends_bb_p  is checked), so that gsi_commit_edge_inserts() 
   which
   changes cfg could take reset const/pure flags into consideration?
  
  Sounds plausible. Have you tried it?
  
  David
 
 I just tried but found it was not very easy.
 
 FOR_EACH_DEFINED_FUNCTION (node) {
   execute_fixup_cfg() and cleanup_tree_cfg()
   branch_prob()
 }
 
 For the above loop, branch_prob is called one by one for each defined func.
 Because a func could possibly call any other funcs on the cgraph, we need to
 reset the const/pure flags for every defined func before any branch_prob()
 is called, so we cannot put the const/pure reset code inside branch_prob().
 

As you noted below, resetting the flag before branch_prob will lead to cfg
mismatch between instr and profile-use build.

David

 We also cannot move the const/pure reset loop before the branch_prob() loop,
 because execute_fixup_cfg will use const/pure flags to generate different
 cfg. If we put the const/pure reset code before the branch_prob() loop, the
 const/pure reset code should only be executed in intrumentation phase, not
 in annotation phase, so that we may get different cfg between intrumentation
 and annotation.
 
 Wei.

[Bug middle-end/61571] bad aliasing -- wrong FRE

2014-06-23 Thread xinliangli at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61571

--- Comment #5 from davidxl xinliangli at gmail dot com ---
Thanks for the analysis.  I agree it is invalid to use -fstrict-aliasing for
the code. 

The implementation tries to save some space in __list_impl class by making the
sentinel marker __end_ to be just __list_node_base, but it is linked with other
__list_node type objects.

David

[Bug middle-end/61571] New: bad aliasing -- wrong FRE

2014-06-20 Thread xinliangli at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61571

Bug ID: 61571
   Summary: bad aliasing -- wrong FRE
   Product: gcc
   Version: 4.10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xinliangli at gmail dot com

Compile the attached source with -m32 -std=c++11 -O2 on x86, the program will
crash. It runs fine with -m32 -std=c++11 -O2 -fno-tree-pre -fno-tree-fre.

(same failure is also observed on arm target).

The bad FRE happens in listint::merge function. listint::splice needs to be
inlined to trigger the bad CSE.

The problematic CSE is this one:

listint::merge (...)
 {

 if (this != __c)
{
iterator __f1 = begin();   
iterator __e1 = end();
iterator __f2 = __c.begin(); // this one
...
while (...)
{
}

   // inline instance of splice:


if (!__c.empty())
{
__node_pointer __f = __c.__end_.__next_;   // and this one
__node_pointer __l = __c.__end_.__prev_;
base::__unlink_nodes(__f, __l);
__link_nodes(__p.__ptr_, __f, __l);
base::__sz() += __c.__sz();
__c.__sz() = 0;
}

Note begin() method accesses __c.__end_.__next_.

[Bug middle-end/61571] bad aliasing -- wrong FRE

2014-06-20 Thread xinliangli at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61571

--- Comment #1 from davidxl xinliangli at gmail dot com ---
Created attachment 32979
  -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=32979action=edit
source file

[Bug middle-end/61571] bad aliasing -- wrong FRE

2014-06-20 Thread xinliangli at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61571

davidxl xinliangli at gmail dot com changed:

   What|Removed |Added

 CC||richard.guenther at gmail dot 
com

--- Comment #2 from davidxl xinliangli at gmail dot com ---
The issue exists in trunk and as early as 4.7 (that i have tested).

[Bug tree-optimization/61289] New: Bad jump threading generates infinite loop

2014-05-22 Thread xinliangli at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61289

Bug ID: 61289
   Summary: Bad jump threading generates infinite loop
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xinliangli at gmail dot com

Build the following program with -fno-exceptions -O2, the program aborts at
runtime.

Adding -fno-tree-dominator-opts, the program runs fine.

To trigger the problem, the first iteration needs to return list == new_list
-- the code will then get into the cloned path which is actually an infinite
loop.

(the toy program does something nonsense, but it should finish without
aborting).

jpthread.h
-

struct MyList {
 inline int LastHit () const { return last_; }
 // inline int FirstHit () const { return last_-1; }

 MyList* PushBack(int);
// MyList* PushFront(int);
 void Destroy();

 static MyList* Create(int);
 MyList(int i): last_(i) {}
 void PopBack();
 int last_;
};

jpthread.cc


#include jpthread.h

void test  ()
{

  MyList *list = MyList::Create(20);

  for (int i = 0; i  2; ) {
MyList* new_list = list-PushBack(
list-LastHit() + 10);
if (new_list != list) {
  list-Destroy();
  list = new_list;
  ++i;
}
  }

  list-Destroy();
}
int main()
{
 test();
}


m.cc
-
#include stdlib.h
#include stdio.h
#include jpthread.h


int cnt = 0;
MyList *prev = 0;

MyList* MyList::PushBack(int i)
{
  cnt ++;

  if (cnt == 1)
 return prev;

  if (cnt  3) {
 fprintf (stderr, Loop not ending, aborting ..\n);
 abort ();
  }

  prev = this;
  return new MyList(i+last_);
}

MyList* MyList::Create(int i)
{
  prev = new MyList(i);

  return prev;
}

void MyList::Destroy() {
   delete this;
}

[Bug tree-optimization/61009] Incorrect jump threading in dom

2014-05-08 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61009

--- Comment #10 from davidxl xinliangli at gmail dot com ---
(In reply to Jeffrey A. Law from comment #9)
 Sorry, personal issue taking an enormous amount of my time right now.  I
 have a fully tested patch and just need to twiddle the attached test into an
 executable testcase for the regression suite.

Can you post the early version of the patch? We can help test it with larger
test cases that exposed the problem in the first place.

David

[Bug tree-optimization/61009] Incorrect jump threading in dom

2014-05-07 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61009

davidxl xinliangli at gmail dot com changed:

   What|Removed |Added

 CC||xinliangli at gmail dot com

--- Comment #8 from davidxl xinliangli at gmail dot com ---
(In reply to Jeffrey A. Law from comment #7)
 I see what's happening here...  I need to think about how best to handle
 this situation.  Oh how threading across loop backedges perilous.

Any update on this issue?

David

[Bug tree-optimization/60899] undef reference generated with -fdevirtualize-speculatively

2014-04-20 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60899

--- Comment #3 from davidxl xinliangli at gmail dot com ---
(In reply to Jan Hubicka from comment #2)
 David,
 it seems a_m.C should be different form a.C.  From chain of events you
 describe I think
 we need to figure out why the last folding happens.  Does the function pass 
 can_refer_decl_in_current_unit_p and if so, how does cgraph node look at
 that time?
 
 Honza

Cut  paste error:

// a_m.cc

#include a.h
struct D2: public DI {
  virtual int doit () { return 3; }
};

extern int bar(DI*);
int main()
{
  D2 d2;
  return bar(d2);
}

[Bug tree-optimization/60899] undef reference generated with -fdevirtualize-speculatively

2014-04-20 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60899

--- Comment #4 from davidxl xinliangli at gmail dot com ---
(In reply to davidxl from comment #3)
 (In reply to Jan Hubicka from comment #2)
  David,
  it seems a_m.C should be different form a.C.  From chain of events you
  describe I think
  we need to figure out why the last folding happens.  Does the function pass 
  can_refer_decl_in_current_unit_p and if so, how does cgraph node look at
  that time?
  
  Honza
 
 Cut  paste error:
 
 // a_m.cc
 
 #include a.h
 struct D2: public DI {
   virtual int doit () { return 3; }
 };
 
 extern int bar(DI*);
 int main()
 {
   D2 d2;
   return bar(d2);
 }


stepping into can_refer_decl_in_current_unit_p indicates it returns true (for
A::foo and A::vtable) at the condition @line 106.

[Bug tree-optimization/60899] undef reference generated with -fdevirtualize-speculatively

2014-04-20 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60899

--- Comment #7 from davidxl xinliangli at gmail dot com ---
(In reply to Paul Pluzhnikov from comment #6)
 Google ref: b/13453242

Verified that the proposed patch fixed the problem in b/1345242.

[Bug tree-optimization/60899] New: undef reference generated with -fdevirtualize-speculatively

2014-04-19 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60899

Bug ID: 60899
   Summary: undef reference generated with
-fdevirtualize-speculatively
   Product: gcc
   Version: 4.10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xinliangli at gmail dot com

Build the following code with the following command line:

g++ -O2 -fdisable-tree-einline a.cc a_m.cc 

results in:

/tmp/cci31j3N.o: In function `D::doit()':
a.cc:(.text._ZN1D4doitEv[_ZN1D4doitEv]+0x5): undefined reference to `A::foo()'
collect2: error: ld returned 1 exit status

It builds fine when devirtualization is disabled:

-O2 -fno-devirtualization-speculatively -fdisable-tree-einline 


The problem is there is no instantiation of any class A instances (final or
subclass) in the program, so vtables and A::foo are all eliminated. The
reference to A::foo is from D::doit. In a successful build, there are no D
instances either, so D::doit won't be emitted. However with speculative
devirtualization, D::doit may be speculatively referenced even though there are
no D instances.

What happens is that during ipa-inline, goo is inlined into D::doit, the
virtual call to foo should become an direct call to A::foo, but the new edge is
not discovered. Since there is no call edge to A::foo, A::foo gets removed
right after ipa-inline (before inline transform). However during inline
transform, gimple-fold-call converts the virtual call into a direct call.


The test case is extracted from a very large real program. The explicit
reference to D::doit in bar is to demonstrate the problem -- in the real
program, the reference is from spec-devirt.


//a.h
struct B {
 virtual int foo() = 0;
 int goo() { return foo(); }
 int i;
};

struct A  : public B {
 A() : i(0)  {}
 int foo() { return 1;}
 int i;
};
struct A2  : public B {
 int foo() { return 2;}
};

struct DI {
  virtual int doit() = 0;
};

struct D : public DI {
   virtual int doit () { return m.goo(); }

   A m;
};

// a.cc

#include a.h

int cond;
int bar (DI* ap) {
   if (cond) return static_castD*(ap)-D::doit();  // Mimic speculative
devirtualization
   return ap-doit();
}


// a_m.cc

#include a.h

int cond;
int bar (DI* ap) {
   if (cond) return static_castD*(ap)-D::doit();  // Mimic speculative
devirtualization
   return ap-doit();
}

[Bug tree-optimization/48316] missed CSE / reassoc with array offsets

2014-03-12 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48316

davidxl xinliangli at gmail dot com changed:

   What|Removed |Added

 CC||xinliangli at gmail dot com

--- Comment #1 from davidxl xinliangli at gmail dot com ---
int a[8][8];


int foo(int i)
{
  return a[i][3] + a[i+1][3] + a[i+2][3];
}

ICC generates:

movslq%edi, %rdi#5.1
shlq  $5, %rdi  #6.10
movl  12+a(%rdi), %eax  #6.10
addl  44+a(%rdi), %eax  #6.20
addl  76+a(%rdi), %eax

GCC can not CSE the common code for address arithmetic:

   leal1(%rdi), %eax
movslq  %edi, %rdx
addl$2, %edi
cltq
movslq  %edi, %rdi
salq$5, %rdx
salq$5, %rax
salq$5, %rdi
movla+12(%rax), %eax
addla+12(%rdx), %eax
addla+12(%rdi), %eax
ret

[Bug tree-optimization/59303] [4.9 Regression] uninit-pred-8_b.c and uninit-pred-9_b.c fail after better optimizations

2013-12-19 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59303

--- Comment #5 from davidxl xinliangli at gmail dot com ---
Fixing this requires more powerful predicate analysis with the help of value
equivalent classes.

From the dump:

  Use in stmt blah (v_1);
  is guarded by :
  if (_23 != 0)


_23 = pephitmp_22 | _8, so the use guard is

prephitmp_22 != 0 || _8 != 0

prephitmp_22 = PHI 0(13), prephitmp_21(3), so the guard can be translated to

prephitmp_21 != 0 || _8 != 0

prephitmp_21 = r_9(D) = 19, so the predicate becomes

 r_9(D) = 19 || _8 != 0


For the def:

[CHECK] Found def edge 1 in v_1 = PHI v_14(D)(13), r_9(D)(3)
Operand defs of phi v_1 = PHI v_14(D)(13), r_9(D)(3)
is guarded by :
   if (_8 != 0)
  (.OR.)
(.NOT.) if (_8 != 0)
(.AND.)
 if (_13 != 0)

the predicate is
   _8 != 0 || _13 != 0

where _13 = _10 | _12, 10_ = (r_9(D) = 19), _12 = l_11(D) != 0

so the predicate is
   _8 != 0 || r_9(D) = 19 || 1_11(D) != 0

which is the superset for the use.


Perhaps we need to do around of predicate normalization before comparison.

David

[Bug tree-optimization/59303] [4.9 Regression] uninit-pred-8_b.c and uninit-pred-9_b.c fail after better optimizations

2013-12-19 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59303

--- Comment #6 from davidxl xinliangli at gmail dot com ---
I am working on a solution (and some cleanups).

David

[Bug middle-end/45631] devirtualization with profile feedback does not work for function pointers

2013-12-17 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45631

davidxl xinliangli at gmail dot com changed:

   What|Removed |Added

 CC||xinliangli at gmail dot com

--- Comment #6 from davidxl xinliangli at gmail dot com ---
(In reply to Jan Hubicka from comment #5)
 Google has mentioned to have patches for smarter multi-target collection.
 So perhaps for next stage1?

Yes -- we will contribute that code back to trunk in next stage1 (was planning
to do so in this cycle, but Rong was busy with contributing other patches: fast
coverage mode, atomic update, libgcov refactoring, and profile-tool).

David

[Bug target/59376] -mmemcpy-strategy= and -mmemset-strategy= are undocumented

2013-12-03 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59376

davidxl xinliangli at gmail dot com changed:

   What|Removed |Added

 CC||xinliangli at gmail dot com

--- Comment #1 from davidxl xinliangli at gmail dot com ---
Where should it be documented (they are documented in doc/invoke.texi already)?

[Bug tree-optimization/41488] IVOpts cannot coalesce multiple induction variables

2013-10-29 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41488

--- Comment #5 from davidxl xinliangli at gmail dot com ---
Alternative approach -- introduce a special forward propagation before or after
the ivopt to get rid of the redundant iv. This propagation needs to propagate
through header phi.

[Bug middle-end/57287] [4.9 Regression] Bogous uninitialized warning with abnormal control flow

2013-08-28 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57287

--- Comment #16 from davidxl xinliangli at gmail dot com ---
(In reply to Richard Biener from comment #15)
 Confirmed.  David, can you have a look here?  I had a hard time following
 what
 exactly to do with the dataflow in the uninit pass for abnormal control flow
 (abnormal control flow should be considered receiving an initialized value).
 
 Thanks.


I looked at it a little. The warning is not from the predicated uninit analysis
which checks uses of phi defs.  The warning is emitted from tree-ssa.c:1644 in
function warn_uninitialized_vars. It warns about the 'definitely' uninitialized
variable that may be executed. In this case it is use of 'buf_16(ab)' in BB3. 
Not sure where this statement comes from:

buf_24 = buf_16(ab);

The dot file is attached.

digraph uninit.c.131t.uninit1 {
overlap=false;
subgraph baz {
color=black;
label=baz;
fn_11_basic_block_0
[shape=Mdiamond,style=filled,fillcolor=white,label=ENTRY];

fn_11_basic_block_1
[shape=Mdiamond,style=filled,fillcolor=white,label=EXIT];

fn_11_basic_block_2 [shape=record,style=filled,fillcolor=lightgrey,label={
FREQ:1 |\bb\ 2\:\l\
|varseen_11(ab)\ =\ 0;\l\
|_14\ =\ head\ ();\l\
}];

fn_11_basic_block_3 [shape=record,style=filled,fillcolor=lightgrey,label={
FREQ:7100 |\bb\ 3\:\l\
|list\ =\ _14;\l\
|buf_24\ =\ buf_16(ab);\l\
|if\ (_14\ !=\ 0B)\l\
\ \ goto\ \bb\ 4\;\l\
else\l\
\ \ goto\ \bb\ 8\;\l\
}];

fn_11_basic_block_4 [shape=record,style=filled,fillcolor=lightgrey,label={
FREQ:8500 |\bb\ 4\:\l\
|#\ varseen_3(ab)\ =\ PHI\ \1(3),\ 1(6)\\l\
|#\ n_26(ab)\ =\ PHI\ \_14(3),\ n_2(ab)(6)\\l\
|#\ buf_27(ab)\ =\ PHI\ \buf_24(3),\ buf_7(ab)(6)\\l\
|buf_21\ =\ bar\ ();\l\
}];

fn_11_basic_block_5 [shape=record,style=filled,fillcolor=lightgrey,label={
FREQ:4250 |\bb\ 5\:\l\
}];

fn_11_basic_block_6 [shape=record,style=filled,fillcolor=lightgrey,label={
FREQ:2900 |\bb\ 6\:\l\
|#\ n_1(ab)\ =\ PHI\ \n_15(ab)(2),\ n_26(ab)(4),\ n_26(ab)(5)\\l\
|#\ varseen_4(ab)\ =\ PHI\ \varseen_11(ab)(2),\ varseen_3(ab)(4),\
varseen_3(ab)(5)\\l\
|#\ buf_6(ab)\ =\ PHI\ \buf_16(ab)(2),\ buf_27(ab)(4),\ buf_21(5)\\l\
|__sigsetjmp\ (buf_6(ab),\ 1);\l\
|n_19\ =\ n_1(ab)-\next;\l\
|n_2(ab)\ =\ n_19;\l\
|varseen_5\ =\ varseen_4(ab);\l\
|buf_7(ab)\ =\ buf_6(ab);\l\
|if\ (n_2(ab)\ !=\ 0B)\l\
\ \ goto\ \bb\ 4\;\l\
else\l\
\ \ goto\ \bb\ 7\;\l\
}];

fn_11_basic_block_7 [shape=record,style=filled,fillcolor=lightgrey,label={
FREQ:1500 |\bb\ 7\:\l\
|_25\ =\ varseen_5;\l\
}];

fn_11_basic_block_8 [shape=record,style=filled,fillcolor=lightgrey,label={
FREQ:1500 |\bb\ 8\:\l\
|#\ _8\ =\ PHI\ \_25(7),\ 0(3)\\l\
|return\ _8;\l\
}];

fn_11_basic_block_0:s - fn_11_basic_block_2:n
[style=solid,bold,color=blue,weight=100,constraint=true, label=[100%]];
fn_11_basic_block_2:s - fn_11_basic_block_6:n
[style=solid,bold,color=red,weight=10,constraint=true, label=[29%]];
fn_11_basic_block_2:s - fn_11_basic_block_3:n
[style=solid,bold,color=blue,weight=100,constraint=true, label=[71%]];
fn_11_basic_block_3:s - fn_11_basic_block_4:n
[style=solid,bold,color=black,weight=10,constraint=true, label=[85%]];
fn_11_basic_block_3:s - fn_11_basic_block_8:n
[style=solid,bold,color=black,weight=10,constraint=true, label=[15%]];
fn_11_basic_block_4:s - fn_11_basic_block_6:n
[style=dotted,bold,color=red,weight=10,constraint=false, label=[50%]];
fn_11_basic_block_4:s - fn_11_basic_block_5:n
[style=solid,bold,color=blue,weight=100,constraint=true, label=[50%]];
fn_11_basic_block_5:s - fn_11_basic_block_6:n
[style=dotted,bold,color=blue,weight=10,constraint=false, label=[100%]];
fn_11_basic_block_6:s - fn_11_basic_block_4:n
[style=solid,bold,color=black,weight=10,constraint=true, label=[85%]];
fn_11_basic_block_6:s - fn_11_basic_block_7:n
[style=solid,bold,color=black,weight=10,constraint=true, label=[15%]];
fn_11_basic_block_7:s - fn_11_basic_block_8:n
[style=solid,bold,color=blue,weight=100,constraint=true, label=[100%]];
fn_11_basic_block_8:s - fn_11_basic_block_1:n
[style=solid,bold,color=black,weight=10,constraint=true, label=[100%]];
fn_11_basic_block_0:s - fn_11_basic_block_1:n
[style=invis,constraint=true];
}
}

[Bug tree-optimization/56490] [4.6/4.7/4.8 Regression] -Wall triggering infinite loop

2013-03-08 Thread xinliangli at gmail dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56490



--- Comment #5 from davidxl xinliangli at gmail dot com 2013-03-08 16:21:13 
UTC ---

(In reply to comment #4)

 Clearly the limiting is bougs ... the issue is with large fanout BBs

 we recurse in compute_control_dep_chain which leads to exponential

 complexity.  Limiting that to 8^5 isn't really a fix ... (yes, you

 can call that O(1)).

 

 compute_control_dep_chain is called 65 million times(!) for the testcase

 after the patch.

 

 There has to be a better algorithmic way of implementing this.

 

 IMHO still not fixed in 4.8.



Right. 



However for on demand computation like this,  limiting is still needed. For

this particular case, limiting the depth to 3 (or may be smaller) won't cause

any regressions.

[Bug middle-end/56490] [4.6/4.7/4.8 Regression] -Wall triggering infinite loop

2013-03-01 Thread xinliangli at gmail dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56490



davidxl xinliangli at gmail dot com changed:



   What|Removed |Added



 CC||xinliangli at gmail dot com



--- Comment #2 from davidxl xinliangli at gmail dot com 2013-03-01 22:45:48 
UTC ---

The function has a control flow that post-dom chain is as long as 103. Limiting

the max post-dom walk length solve the problem.

[Bug c++/55742] [4.8 regression] attribute in class function declaration cause prototype does not match errors.

2013-01-18 Thread xinliangli at gmail dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55742



--- Comment #34 from davidxl xinliangli at gmail dot com 2013-01-18 17:27:43 
UTC ---



The patch is missing changes in documentation on the new attribute.



David





(In reply to comment #32)

 Created attachment 29207 [details]

 gcc48-pr55742.patch

 

 This bug is open for way too long given its severity, so let's start talking

 over patches.

 

 This patch attempts to implement what I understand from Jason's comments, just

 with default instead of any, because it is indeed the default target

 attribute (whatever you specify on the command line).

 

 Say on:

 void foo ();

 void foo () __attribute__((target (avx)));

 void foo () __attribute__((target (default)));

 __attribute__((target (default))) void foo ()

 {

 }

 __attribute__((target (avx))) void foo ()

 {

 }

 void (*fn) () = foo;

 

 first we merge the first two decls, because only if target attribute is 
 present

 on both, we consider it for multi-versioning, for compatibility with 4.7 and

 older.  On e.g.

 void foo ();

 void foo () __attribute__((target (sse4)));

 void foo () __attribute__((target (default)));

 void foo ()

 {

 }

 we reject the last fn definition, because at that point foo is already known 
 to

 be multi-versioned, thus it is required that target attribute is specified for

 foo (either default, or some other).  Unfortunately, for this case the error

 is reported twice for some reason.

 

 The #c0 testcase now compiles.

 

 Now, the issues I discovered with multiversioning, still unfixed by the patch:

 1) the mv*.C testcases should be moved, probably to g++.dg/ext/mv*.C

 2) can you please explain the mess in handle_target_attribute?

   /* Do not strip invalid target attributes for targets which support function

  multiversioning as the target string is used to determine versioned

  functions.  */

   else if (! targetm.target_option.valid_attribute_p (*node, name, args,

   flags)

 ! targetm.target_option.supports_function_versions ())

 *no_add_attrs = true;

 Why do you need that?  Consider complete garbage in target attribute 
 arguments,

 which is errored about, but the above for i386/x86_64 keeps the target

 attribute around anyway, leading to lots of ICEs everywhere:

 Consider e.g.:

 __attribute__((target (default))) void foo (void)

 {

 }

 __attribute__((target (128))) void foo (void)

 {

 }

 3) the multiversioning code assumes that target has a single argument, but it

 can have more than one.  Say for:

 __attribute__((target (avx,popcnt))) void foo (void)

 {

 }

 __attribute__((target (popcnt,avx))) void bar (void)

 {

 }

 the compiler handles those two as equivalent, but with -Dbar=foo

 multi-versioning only considers the first string out of that.

[Bug c++/55742] [4.8 regression] attribute in class function declaration cause prototype does not match errors.

2013-01-17 Thread xinliangli at gmail dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55742



--- Comment #30 from davidxl xinliangli at gmail dot com 2013-01-17 22:45:22 
UTC ---

(In reply to comment #26)

 On Wed, Jan 16, 2013 at 5:02 PM, jakub at gcc dot gnu.org

 gcc-bugzi...@gcc.gnu.org wrote:

 

  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55742

 

  --- Comment #25 from Jakub Jelinek jakub at gcc dot gnu.org 2013-01-16 
  16:02:35 UTC ---

  The actual merging of target attribute isn't that important, what would be 
  more

  important is that other attributes are merged together in that case and the

  decls treated as the same thing.

 

  Anyway, with target(any) attribute, what would happen for

  void foo () __attribute__((target (avx)));

  void foo () __attribute__((target (any)));

 

 IMHO the re-declaration with a different target attribute should be an error.



This can be a clean way to handle declarations. The definition should have

either 'default' attribute or a matching target attribute.



 A proper forward declaration for a function with MV applied shouldn't have

 any target attribute.



What does this sentence mean?





David

[Bug c++/55742] [4.8 regression] attribute in class function declaration cause prototype does not match errors.

2013-01-15 Thread xinliangli at gmail dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55742



--- Comment #22 from davidxl xinliangli at gmail dot com 2013-01-15 19:03:03 
UTC ---

(In reply to comment #21)

 What does it mean to merge two declarations with different target attributes? 

 Are the strings just combined?



This is a good question.  'merge' needs to be clearly defined and can be

implementation defined. There might be conflicts between explicit target

strings) and the compiler can choose to pick one and discard others (with a

proper warning).



David

[Bug c++/55742] [4.8 regression] attribute in class function declaration cause prototype does not match errors.

2013-01-14 Thread xinliangli at gmail dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55742



--- Comment #11 from davidxl xinliangli at gmail dot com 2013-01-14 17:17:21 
UTC ---

(In reply to comment #9)

 I'd say once a target attribute appears at a declaration (non-definition) MV

 needs to be disabled.  Or, the declarations target attribute and those at

 the MV definitions need to be merged.

 

 Thus, a declarations target attribute applies to all definitions (thus all

 MV variants).



This might reduce the usefulness of MV -- using declarations with attributes to

indicate MV candidates is the intended use model.



David

[Bug c++/55742] [4.8 regression] attribute in class function declaration cause prototype does not match errors.

2013-01-14 Thread xinliangli at gmail dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55742



--- Comment #12 from davidxl xinliangli at gmail dot com 2013-01-14 17:20:56 
UTC ---

(In reply to comment #10)

 Either use a different name of the attribute (replace target with mv_target or

 whatever), or require a new attribute (mv?) to be present for multi-versioning

 (mv attribute on any of the decls would enable it, if mv attribute isn't

 present on either of the two decls being merged, then the target attribute is

 merged as before 4.8).





I like this proposal:



require a new attribute (mv?) to be present for multi-versioning

 (mv attribute on any of the decls would enable it, if mv attribute isn't

 present on either of the two decls being merged, then the target attribute is

 merged as before 4.8)





David

[Bug c++/55742] [4.8 regression] attribute in class function declaration cause prototype does not match errors.

2013-01-14 Thread xinliangli at gmail dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55742



--- Comment #14 from davidxl xinliangli at gmail dot com 2013-01-14 17:49:59 
UTC ---

(In reply to comment #13)

 (In reply to comment #12)

  (In reply to comment #10)

   Either use a different name of the attribute (replace target with 
   mv_target or

   whatever), or require a new attribute (mv?) to be present for 
   multi-versioning

   (mv attribute on any of the decls would enable it, if mv attribute isn't

   present on either of the two decls being merged, then the target 
   attribute is

   merged as before 4.8).

  

  

  I like this proposal:

 

 I too like just using a different attribute name. I will prepare a patch asap

 for this.

 

 Thanks

 Sri.

 

  

  require a new attribute (mv?) to be present for multi-versioning

   (mv attribute on any of the decls would enable it, if mv attribute isn't

   present on either of the two decls being merged, then the target 
   attribute is

   merged as before 4.8)

  

  

  David





I mean Jacub's second alternative -- adding additional attribute that alters

the meaning of 'target' attribute -- when it is present, no merging will be

done.





David

[Bug c++/55742] [4.8 regression] attribute in class function declaration cause prototype does not match errors.

2013-01-14 Thread xinliangli at gmail dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55742



--- Comment #18 from davidxl xinliangli at gmail dot com 2013-01-14 20:17:45 
UTC ---

All target attributes on decls not tagged with 'mv' attribute should be merged

into the default definition. Any decl tagged with 'mv' should be treated as a

new decl. 



David

[Bug c++/55742] [4.8 regression] attribute in class function declaration cause prototype does not match errors.

2013-01-14 Thread xinliangli at gmail dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55742



--- Comment #20 from davidxl xinliangli at gmail dot com 2013-01-14 20:29:54 
UTC ---

(In reply to comment #19)

 That wouldn't work, because you would then have the default (non-mv) version,

 possibly mv version with no target attribute, and then some other mv versions

 with target attributes.  The problem with that is that the first two mangle 
 the

 same.  This means that a non-mv and mv with no target attribute needs to be

 treated as the same decl (i.e. merged together).



that makes sense -- mv annotated decl without target attribute gets merged with

default version -- basically it has no actual effect.



David

[Bug c++/55742] attribute in class function declaration cause prototype does not match errors.

2012-12-20 Thread xinliangli at gmail dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55742



davidxl xinliangli at gmail dot com changed:



   What|Removed |Added



 CC||xinliangli at gmail dot com



--- Comment #6 from davidxl xinliangli at gmail dot com 2012-12-20 19:52:32 
UTC ---

Since target attribute merging/conflicts handling is not well specified nor

implemented, with the introduction of MV, it is a good opportunity to

straighten it out and document it.



MV helps providing diagnostics to detect the broken code (where the target

attribute is ignored) :)



David





(In reply to comment #5)

 (In reply to comment #4)

  On Thu, Dec 20, 2012 at 1:21 PM, tmsriram at google dot com

  gcc-bugzi...@gcc.gnu.org wrote:

  

   However, with function multiversioning, this will become a problem as

   multiversioning does not treat two decls with different target attributes

   as

   identical. Since we are enabling multiversioning by default, atleast in

   C++

   front-end for now, IMO, it is better to insist that the definition and

   declaration contain identical target attributes.

  

  Unfortunately, we cannot do that.  A lot of existing code relies on

  this attribute merging.  The cleanest approach here is probably to add

  an additional 'mv' attribute to explicitly enable multiversioning.

  Breaking the existing semantics is going to break a lot of code.

 

 Ok, just to be clear, there are two problems here:

 

 1) Target attribute merging. If the assumption that the target attributes of

 the decls must be merged is valid, there is a bug. Also, this means that using

 target attributes to do multiversioning is wrong.

 

 2) Function multiversioning is exposing the bug, via build failures, the

 problem of declarations and definitions not having identical target 
 attributes.

 

 

 First, we need to decide if target attribute merging is a valid assumption. If

 so, we fix the bug and make function multiversioning use a new attribute.

 

 If the assumption is not valid, changing the source is the only solution. The

 source is invalid now since the intended behaviour is not happening.

 

 

  

  

  Diego.

[Bug c/55771] Negation and type conversion incorrectly exchanged

2012-12-20 Thread xinliangli at gmail dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55771



davidxl xinliangli at gmail dot com changed:



   What|Removed |Added



 CC||xinliangli at gmail dot com



--- Comment #2 from davidxl xinliangli at gmail dot com 2012-12-21 06:58:59 
UTC ---

(In reply to comment #1)

 Google ref b/7902206



The FE behaves correctly when 'double' type is used instead of 'float' .

[Bug middle-end/55595] [google] r172952 (LIPO) broke profiledbootstrap on google/main, and later in google/gcc-4_7

2012-12-04 Thread xinliangli at gmail dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55595



davidxl xinliangli at gmail dot com changed:



   What|Removed |Added



 CC||xinliangli at gmail dot com



--- Comment #1 from davidxl xinliangli at gmail dot com 2012-12-04 20:19:48 
UTC ---

google/main branch is not fully tested, and not supposed to be used. Google's

release branches are.



In very near future, lots of cleanups will be done in google/main, and it will

be synced with trunk. During that period, expect it to be very unstable.



David







(In reply to comment #0)

 starting with r172592, google/main fails profiledbootstrap with the error

 below. going one revision back eliminates the error. later commits

 compound/mutate the profiledbootstrap failures; once this bug is fixed, I'll

 retest.

 

 $ rm -rf *

 

 $ ../google-main/configure --program-suffix=-google-4.7 --prefix=/u/mhargett

 --enable-languages=c,c++,lto --enable-lto --with-fpmath=sse

 --disable-libmudflap --disable-libssp --enable-build-with-cxx 
 --enable-gold=yes

 --with-mpc=/u/mhargett --with-gmp=/u/mhargett/ --with-mpfr=/u/mhargett/

 --disable-multilib --disable-libgomp --disable-werror

 

 $ make -j1 profiledbootstrap

 

 if [ x-fpic != x ]; then \

   /work/mhargett/google-main-obj/./prev-gcc/xgcc

 -B/work/mhargett/google-main-obj/./prev-gcc/

 -B/u/mhargett/x86_64-unknown-linux-gnu/bin/

 -B/u/mhargett/x86_64-unknown-linux-gnu/bin/

 -B/u/mhargett/x86_64-unknown-linux-gnu/lib/ -isystem

 /u/mhargett/x86_64-unknown-linux-gnu/include -isystem

 /u/mhargett/x86_64-unknown-linux-gnu/sys-include-c -DHAVE_CONFIG_H -g -O2

 -fprofile-use  -I. -I../../google-main/libiberty/../include  -W -Wall

 -Wwrite-strings -Wc++-compat -Wstrict-prototypes -pedantic  -fpic

 ../../google-main/libiberty/cp-demangle.c -o pic/cp-demangle.o; \

 else true; fi

 ../../google-main/libiberty/cp-demangle.c: In function

 \u2018is_ctor_or_dtor\u2019:

 ../../google-main/libiberty/cp-demangle.c:5172:1: error: coverage mismatch for

 function \u2018is_ctor_or_dtor\u2019 while reading counter \u2018arcs\u2019

 [-Werror=coverage-mismatch]

 ../../google-main/libiberty/cp-demangle.c:5172:1: note: number of counters is 
 7

 instead of 8

 ../../google-main/libiberty/cp-demangle.c: In function

 \u2018d_demangle_callback\u2019:

 ../../google-main/libiberty/cp-demangle.c:5172:1: error: coverage mismatch for

 function \u2018d_demangle_callback\u2019 while reading counter 
 \u2018arcs\u2019

 [-Werror=coverage-mismatch]

 ../../google-main/libiberty/cp-demangle.c:5172:1: note: number of counters is

 25 instead of 26

 ../../google-main/libiberty/cp-demangle.c: In function

 \u2018d_ctor_dtor_name\u2019:

 ../../google-main/libiberty/cp-demangle.c:5172:1: error: coverage mismatch for

 function \u2018d_ctor_dtor_name\u2019 while reading counter \u2018arcs\u2019

 [-Werror=coverage-mismatch]

 ../../google-main/libiberty/cp-demangle.c:5172:1: note: number of counters is

 20 instead of 18

 ../../google-main/libiberty/cp-demangle.c: In function

 \u2018d_operator_name\u2019:

 ../../google-main/libiberty/cp-demangle.c:5172:1: error: coverage mismatch for

 function \u2018d_operator_name\u2019 while reading counter \u2018arcs\u2019

 [-Werror=coverage-mismatch]

 ../../google-main/libiberty/cp-demangle.c:5172:1: note: number of counters is

 21 instead of 20

 ../../google-main/libiberty/cp-demangle.c: In function 
 \u2018d_make_name\u2019:

 ../../google-main/libiberty/cp-demangle.c:5172:1: error: coverage mismatch for

 function \u2018d_make_name\u2019 while reading counter \u2018arcs\u2019

 [-Werror=coverage-mismatch]

 ../../google-main/libiberty/cp-demangle.c:5172:1: note: number of counters is 
 5

 instead of 4

 cc1: some warnings being treated as errors

[Bug c++/55245] [4.6/4.7/4.8 Regression] Compiler segfault when compiling a small test case

2012-11-21 Thread xinliangli at gmail dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55245



--- Comment #5 from davidxl xinliangli at gmail dot com 2012-11-21 16:17:27 
UTC ---

Should the main variant types gets laid out in gimplify_type_sizes, when the

variant's type has size, but the main variant is incomplete? 



David

[Bug other/55376] [asan] libsanitizer/README.gcc must contain the exact steps to do code changes and to port code from upstream

2012-11-17 Thread xinliangli at gmail dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55376



davidxl xinliangli at gmail dot com changed:



   What|Removed |Added



 CC||xinliangli at gmail dot com



--- Comment #1 from davidxl xinliangli at gmail dot com 2012-11-18 05:18:11 
UTC ---

(In reply to comment #0)

 We need to document the libsanitizer update process in libsanitizer/README.gcc

 

 Few things to cover:

 

 - The changes need to be approved by one of the maintainers (or is it 
 obvious)?

 - Except for *really* trivial changes all patches should go through the

 upstream tree first. 

 - When updating from upstream, the comment header should be tampered with (oh,

 my)

 - Ideally, we need to have a fully automated script to grab the upstream

 changes (including new/deleted/moved files, etc). Suggestions here? 

 - What is the testing procedure when updating from upstream? (e.g. how do we

 avoid regressions on the platforms to which the maintainers have no access?)



Are all upstream changes considered reviewed and automatically approved for gcc

repo? If yes, then this should be automated (including gcc build and libasan

testing).

[Bug c++/55245] New: Compiler segfault when compiling the small test case

2012-11-08 Thread xinliangli at gmail dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55245



 Bug #: 55245

   Summary: Compiler segfault when compiling the small test case

Classification: Unclassified

   Product: gcc

   Version: 4.7.3

Status: UNCONFIRMED

  Severity: normal

  Priority: P3

 Component: c++

AssignedTo: unassig...@gcc.gnu.org

ReportedBy: xinlian...@gmail.com





Compiling the following program without any option, gcc segfaults:



get.cc: In member function 'virtual void Test2::TestBody()':

get.cc:22:27: internal compiler error: Segmentation fault





The problem is that one array type Vector2_f[2]'s size field is reset to zero

during gimplify_type_sizes by one of its variant (which has null size).



The problem does not show up in trunk compiler.  Was this fixed explicitly in

trunk or it happens to work by luck?



David







template typename VType class Vector2 { };

typedef Vector2float Vector2_f;



void GetR( const Vector2_f mosaic_position[3]);



class Test1 {

 private: virtual void TestBody();

};



void Test1::TestBody() {

  Vector2_f mosaic_position[2][1];   // (1)

}



class Test2 {

 private: virtual void TestBody();

};



int tri;

void Test2::TestBody() {

  Vector2_f mosaic_position[2][3] = { };

  GetR(mosaic_position[tri]);

}

[Bug c++/55245] Compiler segfault when compiling the small test case

2012-11-08 Thread xinliangli at gmail dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55245



davidxl xinliangli at gmail dot com changed:



   What|Removed |Added



Version|4.7.3   |4.8.0



--- Comment #1 from davidxl xinliangli at gmail dot com 2012-11-09 00:42:55 
UTC ---

The problem actually exists in main line compiler too.

[Bug c++/55245] Compiler segfault when compiling the small test case

2012-11-08 Thread xinliangli at gmail dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55245



--- Comment #2 from davidxl xinliangli at gmail dot com 2012-11-09 06:05:43 
UTC ---

(In reply to comment #1)

 The problem actually exists in main line compiler too.





This is another test case.



Segfaults without option, but builds ok with -DOK.



The problem seems to be that the main variant of the array type

Vector2float[2] does not have size, while the variant Vector2_f[2] has. The

variant's size later gets reset to 0 (from main variant).



David



template typename VType class Vector2 {

 public:

  Vector2();

  VType x() const;

};

typedef Vector2float Vector2_f;

void GetR(const Vector2_f mosaic_position[3]);

template class C struct DefaultDeleter {

   float  GetColorTexCoord(Vector2_f tex_coord) const  {

return tex_coord.x();

  }

};



class GetT { private: virtual void TestBody(); };



void GetT::TestBody() {

#ifdef OK

  Vector2float mosaic_position[2][3] = { };

#else

  Vector2_f mosaic_position[2][3] = { };

#endif

}

[Bug middle-end/35310] Late struct expansion -- missing PRE (2)

2012-10-31 Thread xinliangli at gmail dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35310



--- Comment #2 from davidxl xinliangli at gmail dot com 2012-10-31 21:17:27 
UTC ---

GCC is still not generating good code here, neither does ICC.



However LLVM does a good job here.



David







(In reply to comment #1)

 Confirmed.  Looks like something for postreload-gcse to handle.  Before that,

 there are no partial redundancies in the RTL (at least, not in the quick look 
 I

 gave it).

[Bug middle-end/35305] Speculative PRE support missing

2012-10-30 Thread xinliangli at gmail dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35305



--- Comment #4 from davidxl xinliangli at gmail dot com 2012-10-30 18:57:18 
UTC ---

The suggested transformation can be useful in some cases, but won't be general

enough. The listed example is an extreme case. For instance, the second a+b

instance does not have to be in the hot trace but still still hot enough to be

PREed. Or there is no one dominating traces available -- the expression is

available in all incoming paths except for one rare path.



switch (a)

{

   case 1:

 g[1] = a + b;

 break;

   case 2:

 g[2] = a + b;

 break;

   case 3: 

 g[3] = a + b;

   ...

   default:

 g[0] = 0;

 }



switch (b)

{

   case 1:

 ... a + b; // partially redundant

   case 2:

  ... a + b; // redundant



   default:

 // does nothing

 break;

}





Regarding handling dereferences, the availability and down-safety analysis

needs to be extended to to recognize safe speculation candidates.





Example 1:  dereference of same pointer fully available -- 



int g1, g2;

struct A

{

  int a;

  int b;

};



void foo(struct A* ap, int k,int m)

{



   if (__builtin_expect (k, 1))

  g1 = ap-b;

   else

  g2 = ap-a;





   if (__builtin_expect (m, 1))

   {

   g2 = ap-b;// Good safe speculative PRE candidate

   }

}





Example 2:  deference of ap fully anticipated







int g1, g2;

struct A

{

  int a;

  int b;

};



void foo(struct A* ap, int k,int m)

{



   if (__builtin_expect (k, 1))

  g1 = ap-b;





   if (__builtin_expect (m, 1))

   g2 = ap-b; // Safe to speculatively hoist across the branch

into the else of the previous branch

   else

   g1 = ap-a;

}





David





(In reply to comment #3)

 Wouldn't this be a candidate for forming a superblock from hot traces of

 a function?  Thus in the testcase

 

   if (k  m)

 {

   g1 = a + b;

   g2 = a + b;

 }

   else

 {

 ... old code

 }

 

 which would also handle the case where we cannot speculatively move code

 (like dereferences)?

[Bug tree-optimization/35357] Loop peeling not happening

2012-10-26 Thread xinliangli at gmail dot com



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35357



--- Comment #2 from davidxl xinliangli at gmail dot com 2012-10-26 16:55:48 
UTC ---

(In reply to comment #1)

 What is the expected optimization and what is its benefit?



Should be transformed into the following the simplified loop body.

void foo(int n)

{

  int i;



  if (n  0) {

   for ( i = 0; i  n; i++)

{

   b[i*2] = a[3*i+1];

   b[i*2] +=1; 

}

b[i*2] = a[3*i + 1];

  }

}



Icc at O2 distributes the loop body:



 for ( i = 0; i = n; i++)

  {

   b[i*2] = a[3*i+1]; 

  }

for ( i = 0; i  n; i++)

  {

   b[i*2] += 1; 

  }

[Bug c++/53845] Another error reporting routines re-entered issue

2012-07-12 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53845

davidxl xinliangli at gmail dot com changed:

   What|Removed |Added

 CC||xinliangli at gmail dot com

--- Comment #2 from davidxl xinliangli at gmail dot com 2012-07-13 00:10:07 
UTC ---
(In reply to comment #1)
 push_tinst_level tries to error out even when called (during diagnostics) by a
 deduction_tsubst_fntype which is passed a complain not including tf_error.


Similar ICE can also be triggered in many cases with -fdump-tree-all option.
The dumper tries to print template parameters which triggers template
instantiation (odd). The instantiation may fail for unknown reason causing the
ICE.

David

[Bug c++/53605] [4.7 Regression] Compiler ICEs in size_binop_loc

2012-06-13 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53605

--- Comment #7 from davidxl xinliangli at gmail dot com 2012-06-13 22:32:20 
UTC ---
thanks for the fix. Is the fix going to be in gcc-4_7 branch?

David

[Bug c++/53605] New: Compiler ICEs in size_binop_loc

2012-06-07 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53605

 Bug #: 53605
   Summary: Compiler ICEs in size_binop_loc
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: xinlian...@gmail.com


Compile the small program with gcc (trunk or 4-7 compiler), the compiler will
ICE.


template bool lhs_is_null_literal
class EqHelper {
 public:
  template typename T1, typename T2
  static int  Compare( const T1 expected,
 const T2 actual);
};
void foo(){
  static const int kData[] = {};
 ::EqHelperfalse::Compare(kData, abc);
}

This is a regression from gcc4_6.

David

[Bug middle-end/49363] [feature request] multiple target attribute (and runtime dispatching based on cpuid)

2012-05-10 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49363

--- Comment #20 from davidxl xinliangli at gmail dot com 2012-05-10 20:45:58 
UTC ---
Auto Cloning is an independent feature. What is requested here is an extension
of a previous auto clone patch by Sri (which is based on command line option
-mversion=...).

David


(In reply to comment #18)
 On Thu, May 10, 2012 at 3:16 AM, vincenzo.innocente at cern dot ch
 gcc-bugzi...@gcc.gnu.org wrote:
  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49363
 
  --- Comment #17 from vincenzo Innocente vincenzo.innocente at cern dot ch 
  2012-05-10 10:16:13 UTC ---
  I tested this
 
  float x[1024], y[1024], z[1024], w[1024];
 
  void foo() {
   for (int i=0; i!=1024; ++i)
      x[i]=y[i]*z[i]+w[i];
  }
 
 
  void __attribute__ ((target(arch=corei7))) foo() {
   for (int i=0; i!=1024; ++i)
      x[i]=y[i]*z[i]+w[i];
  }
 
  void __attribute__ ((target(avx))) foo() {
   for (int i=0; i!=1024; ++i)
      x[i]=y[i]*z[i]+w[i];
  }
 
 
  and see the three versions generated  + the resolver.
 
  As you notice the source code is identical as I'm exploiting compiler
  autovectorization here.
  In this case I was hoping that a single declaration such as  __attribute__
  ((target(arch=corei7,avx)))
  or  __attribute__ ((target(arch=corei7),target(avx))) would generate the 
  two
  versions w/o hang to duplicate the source code.
  Is this possible to support?
 
 Yes, this is on the list of things to add. The front-end should clone
 the bodies and tag the attributes appropriately.
 
 Thanks,
 -Sri,
 
 
 
  --
  Configure bugmail: http://gcc.gnu.org/bugzilla/userprefs.cgi?tab=email
  --- You are receiving this mail because: ---
  You are on the CC list for the bug.

[Bug c++/53220] [4.7/4.8 Regression] g++ mis-compiles compound literals

2012-05-07 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53220

--- Comment #5 from davidxl xinliangli at gmail dot com 2012-05-07 16:18:13 
UTC ---
So it is possible either 
1) to keep the current G++ semantics of compound literals, but change its
behavior due to the implementation change (with clobber marker); 
or
2) to change hte G++ semantics to match C semantic, but keep the compiler
behavior the same

Which way to go? If we go for 1), we probably just need to document this
behavior better in GCC, and let user change their code.

David

[Bug c++/53220] [4.7/4.8 Regression] g++ mis-compiles compound literals

2012-05-07 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53220

--- Comment #7 from davidxl xinliangli at gmail dot com 2012-05-07 17:03:51 
UTC ---
Yes, the array case should be warned or disallowed if 1 is the way to go.

I won't call it a lousy choice -- the C++ semantics of the compound literals
allow more agressive optimization and smaller stack usage.

David



(In reply to comment #6)
 (In reply to comment #5)
 
  1) to keep the current G++ semantics of compound literals, but change its
  behavior due to the implementation change (with clobber marker); 
 
 I would argue that 1 is completely useless for you can also construct an
 array use case from http://gcc.gnu.org/onlinedocs/gcc/Compound-Literals.html
 
 It always initializes the pointer with dangling storage, and is always a bug.
 
 If keep the current g++ semantics, then the code should be rejected at
 compile time, and should *not* work when built without optimization.
 
 IMO, having this code working in C and not working in C++ is a lousy choice.

[Bug c++/53220] [4.7/4.8 Regression] g++ mis-compiles compound literals

2012-05-07 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53220

--- Comment #9 from davidxl xinliangli at gmail dot com 2012-05-08 00:16:30 
UTC ---

c++11 defines the lifetime of a temporary -- does it match C or g++'s semantics
of compound literals or neither?

Note that without your change, the original program may also be subject to
runtime failures due to escaped life ranges of the scoped variables leading to
bad stack layout -- though such bugs are more subtle and less likely to be
triggered.

David


(In reply to comment #8)
 The thing is, C++11 introduces list-initialized temporaries; I could rewrite
 the testcase in C++11 as
 
 extern C int printf (const char *, ...);
 
 int main()
 {
   typedef int AR[4];
   for (int *p = AR{1,2,3,0}; *p; ++p)
 {
   printf (%d\n, *p);
 }
   return 0;
 }
 
 so it made sense to me for compound literals to have the same semantics;
 otherwise you have a difference in lifetime based on whether or not the type 
 is
 wrapped in parentheses.
 
 I definitely agree that we need to give a diagnostic about taking the address
 of a temporary here.

[Bug c++/53220] [4.7/4.8 Regression] g++ mis-compiles compound literals

2012-05-04 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53220

davidxl xinliangli at gmail dot com changed:

   What|Removed |Added

 CC||xinliangli at gmail dot com

--- Comment #3 from davidxl xinliangli at gmail dot com 2012-05-04 18:52:21 
UTC ---
What is the right scope for the compound literals? Is the clobber correctly
added or the program is ill formed?

David

[Bug middle-end/53090] New: suboptimal ivopt

2012-04-23 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53090

 Bug #: 53090
   Summary: suboptimal ivopt
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: xinlian...@gmail.com


Compiling the attached benchmark code with trunk gcc, the code generated for
the hot memory swap loop (line 60) is very inefficient : both icc and llvm use
two ivs and generate a tight loop with 9 instructions, but gcc decides to use 3
ivs, and the loop exit testing code is wierd and inefficient -- it ends up
produce a loop with 11 instructions.

#define XCH(x,y){ Aint t_mp; t_mp=(x); (x)=(y); (y)=t_mp; }

for( i=1, j=k-1 ; ij ; ++i, --j ) {
XCH(perm[i], perm[j])
}

The tight version:


.LBB0_11:   # %for.body57.i
#   Parent Loop BB0_1 Depth=1
# Parent Loop BB0_9 Depth=2
# =This Inner Loop Header: Depth=3
movl(%rbx,%rdi,4), %ebp
movl(%rbx,%rsi,4), %eax
movl%eax, (%rbx,%rdi,4)
movl%ebp, (%rbx,%rsi,4)
decq%rsi
incq%rdi
cmpl%edx, %edi
leal-1(%rdx), %edx
jl.LBB0_11


The gcc version:

.L18:
movl(%rdx), %edi
addl$1, %ecx
movl(%rsi), %eax
movl%eax, (%rdx)
addq$4, %rdx
movl%edi, (%rsi)
movl%r8d, %edi
subq$4, %rsi
subl%ecx, %edi
cmpl%edi, %ecx
jl.L18


However gcc is doing the right thing when applied on the extracted test case:

#define XCH(x,y) { int t_mp; t_mp=(x); (x)=(y); (y)=t_mp; }

void foo (int *perm, int k)
{
 int i,j;
 for( i=1, j=k-1 ; ij ; ++i, --j ) {
XCH(perm[i], perm[j])
 }
}

[Bug middle-end/53090] suboptimal ivopt

2012-04-23 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53090

--- Comment #1 from davidxl xinliangli at gmail dot com 2012-04-23 17:37:40 
UTC ---
Created attachment 27223
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=27223
benchmark source

[Bug middle-end/53081] New: memcpy/memset loop recognition

2012-04-22 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53081

 Bug #: 53081
   Summary: memcpy/memset loop recognition
Classification: Unclassified
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: xinlian...@gmail.com


Both LLVM and icc recognize initialization and copy loop and synthesize calls
to memcpy and memset.  memmove call can also be synthesized when src/target may
overlap.

Option needs to provided to disable such optimization in signal handlers.

I consider this as optimization for benchmarking ;) For instance, the prime
number finder program sieve.c is one of the benchmarks in LLVM. Both LLVM and
icc beats gcc in this one because of the missing optimization.


#ifndef T
#define T int
#endif

T arr[1000];

void foo(int n)
{
  int i;
  for (i = 0; i  n; i++)
{
  arr[i] = 0;
}
}

void foo2(int n, T* p)
{
  int i;
  for (i = 0; i  n; i++)
{
  *p++ = 0;
}
}

#ifndef T
#define T int
#endif

T arr[1000];
T arr2[1000];

void foo(int n)
{
  int i;
  for (i = 0; i  n; i++)
{
  arr[i] = arr2[i];
}
}


// sieve.c

/* -*- mode: c -*-
 * $Id: sieve.c 36673 2007-05-03 16:55:46Z laurov $
 * http://www.bagley.org/~doug/shootout/
 */

#include stdio.h
#include stdlib.h

int
main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 17000
#else
#define LENGTH 17
#endif
int NUM = ((argc == 2) ? atoi(argv[1]) : LENGTH);
static char flags[8192 + 1];
long i, k;
int count = 0;

while (NUM--) {
count = 0; 
for (i=2; i = 8192; i++) {
flags[i] = 1;
}
for (i=2; i = 8192; i++) {
if (flags[i]) {
/* remove all multiples of prime: i */
for (k=i+i; k = 8192; k+=i) {
flags[k] = 0;
}
count++;
}
}
}
printf(Count: %d\n, count);
return(0);
}

[Bug middle-end/53081] memcpy/memset loop recognition

2012-04-22 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53081

--- Comment #3 from davidxl xinliangli at gmail dot com 2012-04-23 05:34:24 
UTC ---
(In reply to comment #2)
 I should mention I made one patch before based on the vectorizer code which 
 did
 detection of at least memset; it was while I was an intern at Apple.  I posted
 it and there was some review.  And a few years back there was a paper at the
 GCC summit about it and expanding it to memcpy.  I don't know what happened to
 that code though.

Some simple analysis using scev to identify loads and stores with linear
address should be good enough. 

LLVM's version also tries to merge smaller memsets into a larger one.

[Bug middle-end/53081] memcpy/memset loop recognition

2012-04-22 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53081

--- Comment #4 from davidxl xinliangli at gmail dot com 2012-04-23 05:34:55 
UTC ---
(In reply to comment #2)
 I should mention I made one patch before based on the vectorizer code which 
 did
 detection of at least memset; it was while I was an intern at Apple.  I posted
 it and there was some review.  And a few years back there was a paper at the
 GCC summit about it and expanding it to memcpy.  I don't know what happened to
 that code though.

Some simple analysis using scev to identify loads and stores with linear
address should be good enough. 

LLVM's version also tries to merge smaller memsets into a larger one.

[Bug middle-end/53082] New: local malloc/free optimization

2012-04-22 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53082

 Bug #: 53082
   Summary: local malloc/free optimization
Classification: Unclassified
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: xinlian...@gmail.com


Malloc/free pairs sometimes are used in a way that behaves like a stack memory.
The optimizer should recognize the pattern and take advantage of it. For
instance if the memory is not escaped, dead stores to the memory can be
removed. In extreme case, the malloc/free pair can be completely removed.

For instance:

for (i = 0; i  100; i++)
  {
int *s = (int*) malloc (sizeof(int));
*s = 10;
free (s);
   }

This code can be completely removed (which LLVM does).

The above can be considered as a benchmarking optimization -- see the one
appended (objinst.c in LLVM's benchmark suite).

The following more advanced optimization (which as far as I know only HP
compiler does) can also be done: coalescing multiple malloc calls into one
alloca call, and eliminating the free calls. The memory can be escaped.

for (...)
 {
   int *s1 = malloc (...);
   int *s2 = malloc (...);
   int *s3 = malloc (...);
   ...

   free(s1); free(s2); free(s3);
}

==

for (...)
  {
 int s = alloca (...);
 s1 = s;
 s2 = s+ ..
 s3 = s+ ..

...
  }



// objinst.c


/* -*- mode: c -*-
 * $Id: objinst.c 36673 2007-05-03 16:55:46Z laurov $
 * http://www.bagley.org/~doug/shootout/
 */

#include stdio.h
#include stdlib.h


enum {false, true};

#define TOGGLE \
char state; \
char (*value)(struct Toggle *); \
struct Toggle *(*activate)(struct Toggle *)

#define DESTROY  free

typedef struct Toggle {
TOGGLE;
} Toggle;

char toggle_value(Toggle *this) {
return(this-state);
}
Toggle *toggle_activate(Toggle *this) {
this-state = !this-state;
return(this);
}
Toggle *init_Toggle(Toggle *this, char start_state) {
this-state = start_state;
this-value = toggle_value;
this-activate = toggle_activate;
return(this);
}
Toggle *new_Toggle(char start_state) {
Toggle *this = (Toggle *)malloc(sizeof(Toggle));
return(init_Toggle(this, start_state));
}


typedef struct NthToggle {
Toggle base;
int count_max;
int counter;
} NthToggle;

NthToggle *nth_toggle_activate(NthToggle *this) {
if (++this-counter = this-count_max) {
this-base.state = !this-base.state;
this-counter = 0;
}
return(this);

}
NthToggle *init_NthToggle(NthToggle *this, int max_count) {
this-count_max = max_count;
this-counter = 0;
this-base.activate = (Toggle *(*)(Toggle *))nth_toggle_activate;
return(this);
}
NthToggle *new_NthToggle(char start_state, int max_count) {
NthToggle *this = (NthToggle *)malloc(sizeof(NthToggle));
this = (NthToggle *)init_Toggle((Toggle *)this, start_state);
return(init_NthToggle(this, max_count));
}


int main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 700
#else
#define LENGTH 7000
#endif
int i, n = ((argc == 2) ? atoi(argv[1]) : LENGTH);
Toggle *tog;
NthToggle *ntog;

tog = new_Toggle(true);
for (i=0; i5; i++) {
puts((tog-activate(tog)-value(tog)) ? true : false);
}
DESTROY(tog);
for (i=0; in; i++) {
tog = new_Toggle(true);
DESTROY(tog);
}

puts();

ntog = new_NthToggle(true, 3);
for (i=0; i8; i++) {
const char *Msg;
if (ntog-base.activate((Toggle*)ntog)-value((Toggle*)ntog))
  Msg = true;
else
  Msg = false;
puts(Msg);
}
DESTROY(ntog);
for (i=0; in; i++) {
ntog = new_NthToggle(true, 3);
DESTROY(ntog);
}
return 0;
}

[Bug c++/52957] Missing suggestions on '=' and '==' confusion

2012-04-14 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52957

--- Comment #11 from davidxl xinliangli at gmail dot com 2012-04-15 04:09:53 
UTC ---
(In reply to comment #7)
   are against it). I don't think there is any GCC contributor that is paid 
   to
   improve GCC's diagnostics (am I wrong?).
  
  Right, but partially because it is a hard thing to do improve and the ROI is
  low.
 
 It is not *technically* hard. Anyone that knows C and a bit of C++ can fix
 hundreds (if you know C++ well and have a some experience with the C++ parser,
 you can probably fix thousands of bugs on your own, each fix taking no more
 than a couple of hours of debugging and few minutes of programming). However,
 it is time-consuming because of all the bureaucracy: copyright assignment,
 setting up the environment, bootstrapping+testing, patch submission, patch
 review, communication delays, style nits, changelogs nits, and updating the
 testsuite. That is why even current contributors often don't bother with
 one-liners, because all the associated tasks can make a one minute change
 require days of effort.

For many compiler hobbyists,  the bureaucratic issues you mentioned don't exist
yet before they start hacking on GCC FE. If the learning curve is steep and the
interfaces are obscure, GCC will be less appealing to them. There is a
difference between 'can do' and 'want to do'.

 
   This is very different from Clang,
   where Chris obviously cares a great deal about Clang's diagnostics, and it
   seems to be supported by a team in Apple (and in Google?).
  
  Yes, and in Google, but do you know why? Because it is easy to improve their
  diagnostics and benefit can be quickly seen.
 
 But why is it easy? For example, the C++ FE, although I don't like many of its
 design decisions, is quite easy to hack.
 

Clang can be used to do lots of interesting stuff -- such as source code
indexing (for semantic based code search), complicated automatic code
refactoring etc. The Clang's design makes it possible because 1) source
location information is well preserved; 2) information about macros are
preserved; 3) there is no on the fly constant folding etc.

 And I don't think this is the reason. GCC is obviously easy to hack for their
 current developers, but they focus on different things than the Clang/LLVM
 developers. The whole attitude is quite different, for better or worse, and in
 terms of popularity, the LLVM crowd wins (of course, the GCC crowd may say 
 that
 this is not a popularity contest, but if you want to attract new developers,
 who now have a choice, then it is).
 
   However, there are a lot of them, and getting
   patches reviewed and accepted requires herculean amounts of perseverance. 
   But
   probably you can share with me more about your (or Googlers) thoughts on 
   the
   latter, since you have the experience of submitting the same project to 
   both
   LLVM and GCC.
  
  Infrastructure makes a big difference here -- see more about this below.
 
 Indeed, but my point is that moving to C++ does little (in my opinion) to fix
 the infrastructure issues in GCC: dejagnu is awful, the wiki is unmaintained,
 patch submission is a pain, patch review is a bottleneck, no automatic style
 checks, useless obligatory changelogs, no marketing, poor documentation for
 beginners, poor coordination with other projects (binutils, gdb, glibc), etc.
 (and, worse, the expectation that new/non-paid contributors should take care 
 of
 fixing these issues).
 
   
   Nothing of the above gets fixed by switching to C++ or by having a clearly
   defined AST...
  
  I think they are clearly related.
 
 In any case, I hope you are right, and the move to C++ helps. I imagine that 
 it
 can help to attract more corporate contributors, who don't have to bother with
 legal paperwork and are paid to do also boring bits. For sporadic and/or
 non-paid contributors, there are other bigger issues to solve.

Moving to C++ itself won't do anything useful -- it is the new design as a
result of moving to C++ that will make a difference.

David

[Bug c++/52954] New: Missing bounds check warning

2012-04-12 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52954

 Bug #: 52954
   Summary: Missing bounds check warning
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: xinlian...@gmail.com


int foo() {
  int arr[100];
  // ...
 return arr[100];
 }

 /*  From clang:

 array.cpp:4:10: warning: array index 100 is past the end of the array
(which contains 100 elements) [-Warray-bounds]
 return arr[100];
^   ~~~
array.cpp:2:3: note: array 'arr' declared here
 int arr[100];
 ^
1 warning generated.

*/

[Bug c++/52954] Missing bounds check warning

2012-04-12 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52954

--- Comment #1 from davidxl xinliangli at gmail dot com 2012-04-12 21:56:07 
UTC ---
Another example:

constexpr int arr_size = 42;
 constexpr int N = 44;
 void f(int);
 int test() {
   int arr[arr_size];
  // ...
   f(arr[N]);
   // ...
  if (N  arr_size) return arr[N];
  return 0;
 }

/*

deadcode.cpp:7:5: warning: array index 44 is past the end of the array
(which contains 42 elements) [-Warray-bounds]
 f(arr[N]);
   ^   ~
deadcode.cpp:5:3: note: array 'arr' declared here
 int arr[arr_size];
 ^
1 warning generated.

*/

[Bug c++/52955] New: Missing warning on wrong sizeof usage

2012-04-12 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52955

 Bug #: 52955
   Summary: Missing warning on wrong sizeof usage
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: xinlian...@gmail.com


Example:


 #include memory.h
 struct S { int a, b, c; float vec[16]; };
 void test(S *s) {
  // ...
  memset(s, 0, sizeof(s));
  // ...
 }

 /*

memset.cpp:5:23: warning: argument to 'sizeof' in 'memset' call is the
same expression as the destination; did you mean to dereference it? [-
Wsizeof-pointer-memaccess]
 memset(s, 0, sizeof(s));
~^
1 warning generated.

*/

[Bug c++/52956] New: Missing overflow warning

2012-04-12 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52956

 Bug #: 52956
   Summary: Missing overflow warning
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: xinlian...@gmail.com


static const long long
  DiskCacheSize = 8  30; // 8 Gigs

/*

overflow.cpp:1:42: warning: signed shift result (0x2) requires 35
bits to represent, but 'int' only has 32 bits [-Wshift-overflow]
static const long long DiskCacheSize = 8  30; // 8 Gigs
  ~ ^  ~~
1 warning generated.

*/

[Bug c++/52957] New: Missing suggestions on '=' and '==' confusion

2012-04-12 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52957

 Bug #: 52957
   Summary: Missing suggestions on '=' and '==' confusion
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: xinlian...@gmail.com


void foo();
 void bar();
 void test(int i) {
  // ...
 if (i = 42) foo();
  else bar();
 }

 /*

 parentheses1.cpp:5:9: warning: using the result of an assignment as a
condition without parentheses [-Wparentheses]
 if (i = 42) foo();
 ~~^~~~
parentheses1.cpp:5:9: note: place parentheses around the assignment to
silence this warning
 if (i = 42) foo();
   ^
 ( )
parentheses1.cpp:5:9: note: use '==' to turn this assignment into an
equality comparison
 if (i = 42) foo();
   ^
   ==
1 warning generated.

*/

[Bug c++/52958] New: Missing warning on missed parehthesis

2012-04-12 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52958

 Bug #: 52958
   Summary: Missing warning on missed parehthesis
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: xinlian...@gmail.com


int test(bool b, int x, int y) {
  return 42 + b ? x : y;
 }

/*

parentheses4.cpp:2:17: warning: operator '?:' has lower precedence than
'+'; '+' will be evaluated first [-Wparentheses]
 return 42 + b ? x : y;
~~ ^
parentheses4.cpp:2:17: note: place parentheses around the '+' expression
to silence this warning
 return 42 + b ? x : y;
   ^
( )
parentheses4.cpp:2:17: note: place parentheses around the '?:' expression
to evaluate it first
 return 42 + b ? x : y;
   ^
 ()
1 warning generated.

*/

[Bug c++/52959] New: Missing typo suggestions

2012-04-12 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52959

 Bug #: 52959
   Summary: Missing typo suggestions
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: xinlian...@gmail.com


struct BaseType {};
struct DerivedType : public BaseType {
 static int base_type;
 DerivedType() : basetype() {}
};


/*

typo1.cpp:4:19: error: initializer 'basetype' does not name a non-static
data member or base class; did you mean the base class 'BaseType'?
 DerivedType() : basetype() {}
 ^~~~
 BaseType
typo1.cpp:2:22: note: base class 'BaseType' specified here
struct DerivedType : public BaseType {
^~~
1 error generated.

*/


namespace fiz { namespace bang {
  int foobar();
 } }
 int test() {
  return bang::Foobar();
 }

/*

typo2.cpp:5:10: error: use of undeclared identifier 'bang'; did you mean
'fiz::bang'?
 return bang::Foobar();
^~~~
fiz::bang
typo2.cpp:1:27: note: 'fiz::bang' declared here
namespace fiz { namespace bang {
 ^
typo2.cpp:5:16: error: no member named 'Foobar' in namespace 'fiz::bang';
did you mean 'foobar'?
 return bang::Foobar();
~~^~
  foobar
typo2.cpp:2:7: note: 'foobar' declared here
 int foobar();
 ^
2 errors generated.

*/

[Bug c++/52960] New: Missing warnings on ambiguous source : function decl vs local var decl

2012-04-12 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52960

 Bug #: 52960
   Summary: Missing warnings on ambiguous source : function decl
vs local var decl
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: xinlian...@gmail.com


struct S { int n; };
 S bar();
 S test() {
  S foo,
  bar();
  // ...
 return foo;
 }

/*

conversions.cppreturn.cpvexing.cpp
vexing.cpp:5:6: warning: empty parentheses interpreted as a function
declaration [-Wvexing-parse]
 bar();
^~
vexing.cpp:4:8: note: change this ',' to a ';' to call 'bar'
 S foo,
  ^
  ;
vexing.cpp:5:6: note: replace parentheses with an initializer to declare
a variable
 bar();
^~
{}
1 warning generated.

*/

[Bug c++/52961] New: Missing warning on empty if

2012-04-12 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52961

 Bug #: 52961
   Summary: Missing warning on empty if
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: xinlian...@gmail.com


int test(int i) {
  if (i  42);
return 42;
  return i;
 }

 /*

 emptyif.cpp:2:14: warning: if statement has empty body [-Wempty-body]
 if (i  42);
^
1 warning generated.

*/

[Bug c++/52962] New: Column number incorrect in error

2012-04-12 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52962

 Bug #: 52962
   Summary: Column number incorrect in error
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: xinlian...@gmail.com


#define M1(x, y, z) y();
#define M2(x, y, z) M1(x, y, z)
void test() {
  M2(a, b, c);
}

/* = Clang

macros1.cpp:4:9: error: use of undeclared identifier 'b'
 M2(a, b, c);
   ^
macros1.cpp:2:27: note: expanded from macro 'M2'
#define M2(x, y, z) M1(x, y, z)
 ^
macros1.cpp:1:21: note: expanded from macro 'M1'
#define M1(x, y, z) y();
   ^
1 error generated.

 */

[Bug c++/52963] New: Missing error on pack expansion

2012-04-12 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52963

 Bug #: 52963
   Summary: Missing error on pack expansion
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: xinlian...@gmail.com


template class T, class U struct S {};
  template typename... Ts void f(Ts...);

  template typename... Ts struct X {
template typename... Us void g() {
  f(STs, Us()...);
}
  };

 void test(Xint, float x) {
   x.gint, float, double();
 }


/*

oscar-wilde.cpp:6:18: error: pack expansion contains parameter packs 'Ts'
and 'Us' that have different lengths (2 vs. 3)
f(STs, Us()...);
~~  ~~   ^
oscar-wilde.cpp:11:5: note: in instantiation of function template
specialization 'Xint, float::gint, float, double' requested here
  x.gint, float, double();
^
1 error generated.

*/

[Bug c++/52964] New: No warning on negative array size in template instantatiation

2012-04-12 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52964

 Bug #: 52964
   Summary: No warning on negative array size in template
instantatiation
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: xinlian...@gmail.com


template int N struct S1 { int arr[N - 5]; };
template typename T struct S2 { S1sizeof(T) s1; };
template typename T void foo(T x) { S2T s2; }
void test() { foo(42); }

/* From Clang:
templates1.cpp:1:38: error: 'arr' declared as an array with a negative
size
template int N struct S1 { int arr[N - 5]; };
^
templates1.cpp:2:49: note: in instantiation of template class 'S14'
requested here
template typename T struct S2 { S1sizeof(T) s1; };
   ^
templates1.cpp:3:45: note: in instantiation of template class 'S2int'
requested here
template typename T void foo(T x) { S2T s2; }
   ^
templates1.cpp:4:15: note: in instantiation of function template
specialization 'fooint' requested here
void test() { foo(42); }
== */

[Bug c++/52961] Missing warning on empty if

2012-04-12 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52961

--- Comment #3 from davidxl xinliangli at gmail dot com 2012-04-12 23:11:10 
UTC ---
(In reply to comment #2)
 Yes, see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36478 .

thanks. I tried -Wempty-body, gcc gives warning as expected:


emptyif.cpp: In function 'int test(int)':
emptyif.cpp:2:15: warning: suggest braces around empty body in an 'if'
statement [-Wempty-body]
if (i  42);

[Bug c++/52964] No warning on negative array size in template instantatiation

2012-04-12 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52964

--- Comment #3 from davidxl xinliangli at gmail dot com 2012-04-12 23:19:33 
UTC ---
(In reply to comment #2)
 And with trunk we print:
 
 pr52964.cc: In instantiation of ‘struct S14’:
 pr52964.cc:2:49:   required from ‘struct S2int’
 pr52964.cc:3:45:   required from ‘void foo(T) [with T = int]’
 pr52964.cc:4:21:   required from here
 pr52964.cc:1:43: error: size of array is negative
  template int N struct S1 { int arr[N - 5]; };
^
 
 We could easily print the caret for each line. And the locations could be much
 better.
 
 I also like more the order of Clang and the extra info about what each thing
 is, but we would have to convince Gabriel and/or Jason.
 
 David, what else do you see that could be improved?

With -fsyntax-only, the warning is not emitted -- any reason for that?

David

[Bug c++/52957] Missing suggestions on '=' and '==' confusion

2012-04-12 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52957

--- Comment #4 from davidxl xinliangli at gmail dot com 2012-04-12 23:28:38 
UTC ---
(In reply to comment #3)
 Yep, a dup.
 
 I am sorry David, but many of these deficiencies are well-known. What GCC
 sorely needs is people working on fixing them, and that is an issue that has 
 no
 easy fix and it is only going to get worse. :-(
 
 *** This bug has been marked as a duplicate of bug 29280 ***

This brings up the issue of programmer productivity where Clang can claim that
their C++ based code base and clearly defined AST can make fixing things like
this easier .. 

David

[Bug c++/52957] Missing suggestions on '=' and '==' confusion

2012-04-12 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52957

--- Comment #6 from davidxl xinliangli at gmail dot com 2012-04-13 05:01:56 
UTC ---
I think you indirectly proved my point (see below) :)

(In reply to comment #5)
 Maintainers (those who decide) are too few and they either do not care or do
 not have time to fix these things. Existing or new contributors that are paid
 to do something specific don't care about anything else as long as it doesn't
 interfere with their job (if it does, whatever improvement it may be, then 
 they
 are against it). I don't think there is any GCC contributor that is paid to
 improve GCC's diagnostics (am I wrong?).

Right, but partially because it is a hard thing to do improve and the ROI is
low.

 This is very different from Clang,
 where Chris obviously cares a great deal about Clang's diagnostics, and it
 seems to be supported by a team in Apple (and in Google?).

Yes, and in Google, but do you know why? Because it is easy to improve their
diagnostics and benefit can be quickly seen.

 
 Contributors to GCC that care about these things, either unpaid or in their
 free time, are very few and completely overworked. It is true that the GCC 
 code
 base has a huge technical debt, but diagnostic fixes tend to be easy to fix, 
 if
 you have a little of experience. 

But Clang is even easier to hack on. Any person without any compiler background
can build tools on top of it. In other words, the bar of working on compiler
becomes very low -- and this is really exciting and attractive for developers
who used to think it as untouchable area to work on.

 However, there are a lot of them, and getting
 patches reviewed and accepted requires herculean amounts of perseverance. But
 probably you can share with me more about your (or Googlers) thoughts on the
 latter, since you have the experience of submitting the same project to both
 LLVM and GCC.

Infrastructure makes a big difference here -- see more about this below.

 
 New contributors to GCC are almost non-existent (except those that are paid to
 work on very specific stuff, like a new port, but these new contributors also
 don't care about GCC in general, as long as it doesn't interfere with their
 job). The reasons why GCC fails to attract new contributors are very
 well-known: the copyright assignment is a major hurdle for casual/unpaid
 contributors, patches from new contributors are routinely ignored and the 
 patch
 review process is hostile to new contributors (in reality, much less than in
 the Linux kernel, but they have enough developers to afford being hostile).

I am aware of this.

 Interacting for the first time with GCC developers can easily discourage a lot
 of people (there is a whole page just on the subject!
 http://gcc.gnu.org/wiki/GCC_Research). In Clang for example, small obvious
 fixes proposed by new contributors without any changelog or even without a
 patch are often committed immediately. I have never seen this in +6 years
 contributing to GCC, on the contrary, the contributor will be asked to
 bootstrap, regtest, provide a Changelog, fix the Changelog, provide a patch,
 read the code style, fix the code style, some bikeshedding, and between each
 step, there may be weeks of no feedback at all.

The root cause is still GCC's older infrastructure. Lack of modern design 
leads to weird interactions between various components, lots of hidden
assumptions, lack of good verification etc -- all these lead to the problem you
mention -- GCC patches from new contributors are bound to be buggy or to
trigger other latent bugs here and there, which means the heavy weight process
in GCC is required. Being an expert in GCC requires years of knowledge
accumulation and hostility is bound to happen (and ironically really required
to ensure quality) because qualified reviewers are so few and it just does not
scale. 

Clang can afford to lower the bar for contributors because its infrastructure.

 
 GCC also has a huge marketing problem, no overall vision, no clear leadership
 (it is easy to get contradictory answers by the people who decide), and GCC
 developers show both a lack of enthusiasm (just compare presentations by
 Clang/LLVM developers, full of amazing features, awesome graphics,
 never-seen-before numbers and LLVM webpages are full of how good their 
 compiler
 is, how much better than anything else, etc.) and a sense of superiority 

I think part of the reasons is that contributors in that community are much
younger on average (the requirement on contributor experience is lower). Young
kids tend to be very enthusiastic.

(we
 don't want to be like Clang/LLVM! It produces slower code by 0.0001%! 

I will assure you the performance difference is much more than that.

It
 doesn't support mmix-knuth-mmixware). These kind of things are essential to
 attract new contributors, however, they do not seem to provide any advantage 
 to
 existing developers, and they are seen as pointless extra work better done by,
 wait

[Bug target/50182] Performance degradation from gcc 4.1 (x86_64)

2012-01-11 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #28 from davidxl xinliangli at gmail dot com 2012-01-11 17:26:46 
UTC ---
See comment 24 for shorter test case.

Summary:

1) the regression reported by Oleg in gcc4_6 and earlier versions is due to FE
code generation difference which lead to the backend to generate code leading
to partial register stall.
2) the RAT stall problem is fixed in gcc4_7 
3) however in 4_7, there is a different problem -- redundant sign-extension and
move instruction is generated. It could be due to the limitation in RTL forward
propagation and combine pass to deal with multiple downward uses

David

[Bug c++/23383] builtin array operator new is not marked with malloc attribute

2012-01-05 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23383

--- Comment #20 from davidxl xinliangli at gmail dot com 2012-01-05 18:11:18 
UTC ---
(In reply to comment #19)
 On Wed, 4 Jan 2012, xinliangli at gmail dot com wrote:
 
  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23383
  
  --- Comment #18 from davidxl xinliangli at gmail dot com 2012-01-04 
  17:11:26 UTC ---
  (In reply to comment #17)
   On Wed, 4 Jan 2012, xinliangli at gmail dot com wrote:
   
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23383

--- Comment #16 from davidxl xinliangli at gmail dot com 2012-01-04 
00:28:55 UTC ---
A related topic - aliasing property of realloc -- the malloc attribute 
is not
applied in the glibc header and the comment is like

/* __attribute_malloc__ is not used, because if realloc returns
   the same pointer that was passed to it, aliasing needs to be allowed
   between objects pointed by the old and new pointers.  *


It is true that the realloc can return an address is physically aliased 
with
the pointer passed to it -- but assuming no-alias by the compiler 
should cause
no harm -- as all code motions/CSEs across the realloc call will not be
possible because realloc may modify/use the memory location.


Any comment on this? 
   
   The malloc attribute assumes that the contents of the memory pointed
   to by the return value is undefined, so the comment is inaccurate
   but the malloc attribute can indeed be not used.
  
  Which part of the optimizer takes advantage of the 'undefinedness' of 
  returned
  memory?
 
 points-to analysis.  It assumes that the returned blob of memory
 points to nothing (yet).  So for
 
  int i;
  int **p = malloc (8);
  *p = i;
  int **q = realloc (p, 8);
 
 you'd get that *q points to nothing insteda of i.

The malloc attribute documentation is confusing:

malloc
The malloc attribute is used to tell the compiler that a function may be
treated as if any non-NULL pointer it returns cannot alias any other pointer
valid when the function returns. This will often improve optimization. Standard
functions with this property include malloc and calloc. realloc-like functions
have this property as long as the old pointer is never referred to (including
comparing it to the new pointer) after the function returns a non-NULL value. 


It does not mention the undefineness explicitly.

It might be better to fix the semantics of this attribute to only specify the
aliasing property so that it can also be applied to realloc. Points-to needs to
be fixed to special case realloc for the initialization.

David


 
   We can explicitly handle REALLOC in the points-to code to honor the
   fact that it is not (we do not at the moment).
  
  ok.
 
 which needs to be fixed here.
 
 Richard.

[Bug c++/23383] builtin array operator new is not marked with malloc attribute

2012-01-05 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23383

--- Comment #22 from davidxl xinliangli at gmail dot com 2012-01-05 18:54:51 
UTC ---
(In reply to comment #21)
 But can't a valid code also compare the result from realloc with the old
 pointer, and if they are equal, do something, otherwise do something else?
 I think it is pretty common e.g. if the malloced block contains pointers to
 parts of the malloced area and upon realloc that didn't return the passed
 address wants to adjust all those pointers.
 Having a malloc attribute on realloc would still break this.
 I'd say we want realloc attribute and handle it where we currently handle
 BUILT_IN_REALLOC.

Right, the case you describe may be common. On the other hand, the compiler
probably should not optimize away pointer comparisons without knowing if the
pointer is used after 'free' or 'free' like function 'realloc'.

David

[Bug c++/23383] builtin array operator new is not marked with malloc attribute

2012-01-04 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23383

--- Comment #18 from davidxl xinliangli at gmail dot com 2012-01-04 17:11:26 
UTC ---
(In reply to comment #17)
 On Wed, 4 Jan 2012, xinliangli at gmail dot com wrote:
 
  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23383
  
  --- Comment #16 from davidxl xinliangli at gmail dot com 2012-01-04 
  00:28:55 UTC ---
  A related topic - aliasing property of realloc -- the malloc attribute is 
  not
  applied in the glibc header and the comment is like
  
  /* __attribute_malloc__ is not used, because if realloc returns
 the same pointer that was passed to it, aliasing needs to be allowed
 between objects pointed by the old and new pointers.  *
  
  
  It is true that the realloc can return an address is physically aliased with
  the pointer passed to it -- but assuming no-alias by the compiler should 
  cause
  no harm -- as all code motions/CSEs across the realloc call will not be
  possible because realloc may modify/use the memory location.
  
  
  Any comment on this? 
 
 The malloc attribute assumes that the contents of the memory pointed
 to by the return value is undefined, so the comment is inaccurate
 but the malloc attribute can indeed be not used.

Which part of the optimizer takes advantage of the 'undefinedness' of returned
memory?


 
 We can explicitly handle REALLOC in the points-to code to honor the
 fact that it is not (we do not at the moment).

ok.

[Bug c++/23383] builtin array operator new is not marked with malloc attribute

2012-01-03 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23383

--- Comment #16 from davidxl xinliangli at gmail dot com 2012-01-04 00:28:55 
UTC ---
A related topic - aliasing property of realloc -- the malloc attribute is not
applied in the glibc header and the comment is like

/* __attribute_malloc__ is not used, because if realloc returns
   the same pointer that was passed to it, aliasing needs to be allowed
   between objects pointed by the old and new pointers.  *


It is true that the realloc can return an address is physically aliased with
the pointer passed to it -- but assuming no-alias by the compiler should cause
no harm -- as all code motions/CSEs across the realloc call will not be
possible because realloc may modify/use the memory location.


Any comment on this? 

David


(In reply to comment #15)
 (In reply to comment #14)
  We do the exact opposite - type-based rules override points-to must-alias
  information (or really may-alias information).  Also for the proposed scheme
  to work you need to guarantee that you always can compute correct points-to
  relations (I mean, if points-to information says pt_anything and if you
  then assume must-alias and thus a conflict then you simply disable TBAA
  completely).
  
 
 Right, in general, type alias rules should override field and flow insensitive
 pointer aliasing information as they really have very low confidence level
 (especially for pt_anything case which is just a baseless guess) -- but 
 precise/trustworthy aliasing info should be checked before assertion based
 alias information and decide whether to proceed. 
 
 For example:
 
 if (no_alias_according_to_conservative_pointer_info) return no_alias;
 if (no_alias_according_to_precise_pointer_info) return no_alias;
 if (must_alias or definitely_may_alias) return may/must_alias;   (1)
 
 // now proceed with type based rules, etc.
 
 
 This is in theory. In practice, it can be tricky to tag the confidence level 
 of
 aliasing info.
 
 David

[Bug target/50182] Performance degradation from gcc 4.1 (x86_64)

2011-10-24 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #20 from davidxl xinliangli at gmail dot com 2011-10-24 19:33:18 
UTC ---
The test.cpp attached seems to be the same as the old version.

David

[Bug target/50182] Performance degradation from gcc 4.1 (x86_64)

2011-10-24 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #22 from davidxl xinliangli at gmail dot com 2011-10-24 19:58:23 
UTC ---
(In reply to comment #21)
 OK, just in case, here is my current test.

Preprocessed test case? I saw the main assembly difference that can explain the
performance diff, but want to make sure it is not due to your new source change
(I saw some print statement addeded).

David

[Bug target/50182] Performance degradation from gcc 4.1 (x86_64)

2011-10-24 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #24 from davidxl xinliangli at gmail dot com 2011-10-24 23:00:22 
UTC ---
(In reply to comment #23)
 Here is the source preprocessed for gcc47. The test exhibits the 
 slowdown mentioned in comment 11.


The problem can be reproduced with a simplified test case -- basically
depending on how the result value from the inner loop is used in the outer loop
(related to casting), the inner loop code is quite different - in the slow
case, there are two redundant sign extension and a move instructions generated.

# the fast version
gcc -O3 -DFAST_VER bug.cpp

./a.out 
rv=4282167296

test description   absolute   operations   ratio with
number time   per second   test0

 0 int8_t constant add   1.05 sec   1523.81 M 1.00

Total absolute time for int8_t constant folding: 1.05 sec

# the slow version:

gcc -O3 bug.cpp
./a.out 
rv=4282167296

test description   absolute   operations   ratio with
number time   per second   test0

 0 int8_t constant add   1.57 sec   1019.11 M 1.00

Total absolute time for int8_t constant folding: 1.57 sec


# however, when disabling inlining of check_shifted_sum_1 in the slow case, the
runtime is recovered:

gcc -O3 -DNOINLINE bug.cpp

./a.out 
rv=4282167296

test description   absolute   operations   ratio with
number time   per second   test0

 0 int8_t constant add   1.05 sec   1523.81 M 1.00

Total absolute time for int8_t constant folding: 1.05 sec



The inner loop body in faster case:

.L60:
movzbl0(%rbp,%rcx), %r9d
addq$1, %rcx
cmpl%ecx, %ebx
leal10(%r8,%r9), %r8d
# SUCC: 4 [91.0%]  (dfs_back,can_fallthru) 5 [9.0%] 
(fallthru,can_fallthru,loop_exit)
jg.L60

while for the slow case:

.L60:
movzbl(%r12,%rcx), %eax
movsbl%r8b, %r8d
addq$1, %rcx
leal10(%rax), %r9d
movsbl%r9b, %r9d
addl%r8d, %r9d
cmpl%ecx, %ebp
movl%r9d, %r8d
# SUCC: 4 [91.0%]  (dfs_back,can_fallthru) 5 [9.0%] 
(fallthru,can_fallthru,loop_exit)
jg.L60


The relevant source change:

#ifdef NOINLINE
#define INL __attribute__((noinline))
#else
#define INL inline
#endif

template typename T, typename T2, typename Shifter
INL void check_shifted_sum_1(T2 result) {
 T temp = (T)SIZE * Shifter::do_shift((T)init_value);
 if (!tolerance_equalT((T)result,temp))
  printf(test %i failed\n, current_test);
}

#ifdef FAST_VER
#define TYPE u_int32_t
#else
#define TYPE int8_t
#endif


template typename T, typename Shifter
__attribute__((noinline)) u_int32_t test_constant(T* first, int count, const
char *label)
{
int i;
u_int32_t rv = 0;

start_timer();

for (i = 0; i  iterations; ++i) {
T result = 0;
for (int n = 0; n  count; ++n) {
result += Shifter::do_shift( first[n] );
}
rv += result;
check_shifted_sum_1T, TYPE, Shifter(result);
}

record_result( timer(), label );
return rv;
}

[Bug target/50182] Performance degradation from gcc 4.1 (x86_64)

2011-10-24 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #25 from davidxl xinliangli at gmail dot com 2011-10-24 23:02:14 
UTC ---
Created attachment 25600
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25600
test case for 47



Note that with gcc46, the result is even slower -- it has the RAT stall problem
which is fixed in 47.


David

[Bug target/50182] Performance degradation from gcc 4.1 (x86_64)

2011-10-21 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #15 from davidxl xinliangli at gmail dot com 2011-10-21 23:02:16 
UTC ---
(In reply to comment #14)
 (In reply to comment #13)
  David, it looks like we are seeing different things with v4.7... See my 
  comment 11 - I am still observing the slowdown. Do you have access to 
  v4.1 and v4.6? Could you try reproducing my test please?
 
 Sorry for the delay -- I am pretty swamped these days (till mid October). I
 will try to look at the problem more then.
 
 David


I still can not reproduce the problem with trunk compiler:


rv=4282167296

test description   absolute   operations   ratio with
number time   per second   test0

 0 int8_t constant add   1.09 sec   1467.89 M 1.00

Total absolute time for int8_t constant folding: 1.09 sec


Can you attach the output of -v and the assembly file with -fverbose-asm -dA
and the optimized dump file with option -fdump-tree-optimized-blocks using
trunk compiler?

thanks,

David

[Bug target/50182] Performance degradation from gcc 4.1 (x86_64)

2011-09-15 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #14 from davidxl xinliangli at gmail dot com 2011-09-15 17:28:10 
UTC ---
(In reply to comment #13)
 David, it looks like we are seeing different things with v4.7... See my 
 comment 11 - I am still observing the slowdown. Do you have access to 
 v4.1 and v4.6? Could you try reproducing my test please?

Sorry for the delay -- I am pretty swamped these days (till mid October). I
will try to look at the problem more then.

David

[Bug target/50182] Performance degradation from gcc 4.1 (x86_64)

2011-08-25 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #8 from davidxl xinliangli at gmail dot com 2011-08-25 16:17:10 
UTC ---
gcc46 and gcc47 difference can be reproduced using -O2 -m64.

David

[Bug target/50182] Performance degradation from gcc 4.1 (x86_64)

2011-08-24 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

davidxl xinliangli at gmail dot com changed:

   What|Removed |Added

 CC||xinliangli at gmail dot com

--- Comment #2 from davidxl xinliangli at gmail dot com 2011-08-24 23:15:44 
UTC ---
The problem is fixed in trunk compiler:

1) with 4.6 compiler:


test description   absolute   operations   ratio with
number time   per second   test0

 0 int8_t constant add   3.29 sec   486.32 M 1.00


RAT_STALLS.registers = 288249 (sampling count 10001)


2) with trunk compiler:


test description   absolute   operations   ratio with
number time   per second   test0

 0 int8_t constant add   1.34 sec   1194.03 M 1.00

No partial register stalls from user functions.


Inner loop from trunk compiler:

.L55:
movzbl0(%rbp,%rcx), %r9d
addq$1, %rcx
cmpl%ecx, %ebx
leal10(%r8,%r9), %r8d
jg.L55


Inner loop from 46 compiler:

.L43:
addl$10, %eax
addb(%rdx), %al
addq$1, %rdx
cmpq$data8+8000, %rdx
jne.L43


RAT stalls (not precise event so the instruction causing stalls is a little
off)
   :  400e27:nopw   0x0(%rax,%rax,1)
   127  0.0440 :  400e30:add$0xa,%eax
  5869  2.0330 :  400e33:add(%rdx),%al
282125 97.7263 :  400e35:add$0x1,%rdx
   :  400e39:cmp$0x404560,%rdx
   :  400e40:jne400e30 main+0xd0


David

[Bug target/50182] Performance degradation from gcc 4.1 (x86_64)

2011-08-24 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #3 from davidxl xinliangli at gmail dot com 2011-08-25 00:13:00 
UTC ---
Caused by differences in FE generated code:

46:


D.6887 = (int) D.6886;
D.6888 = custom_constant_addsigned char::do_shift (D.6887);
D.6889 = (unsigned char) D.6888;
result.8 = (unsigned char) result;
D.6891 = D.6889 + result.8;
result = (signed char) D.6891;
n = n + 1;


trunk:


D.6938 = (int) D.6937;
D.6874 = custom_constant_addsigned char::do_shift (D.6938);
D.6939 = (int) result;   -- promoted to int
D.6940 = (int) D.6874;   ---promoted to int
D.6941 = D.6939 + D.6940;
result = (signed char) D.6941;
n = n + 1;

[Bug c++/49855] New: internal compiler error: in fold_convert_const_int_from_real

2011-07-26 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49855

   Summary: internal compiler error: in
fold_convert_const_int_from_real
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: xinlian...@gmail.com


In the following code, FE should generate FIX_TRUNC_EXPR instead of NOP_EXPR
for the float to int conversion. 

extern void foo(int);

template class Key, class Value void Basic() {

  const int kT = 1.5e6;// --- causes ICE 
  int size = kT*2/3;
  do {
foo(size);
size = size * 0.5 - 1;
  } while (size = 0 );

}

Note that removing the template specification, the right code is generated and
there is no ICE.


David

[Bug middle-end/49363] [feature request] multiple target attribute (and runtime dispatching based on cpuid)

2011-07-23 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49363

--- Comment #6 from davidxl xinliangli at gmail dot com 2011-07-23 17:28:30 
UTC ---
(In reply to comment #5)
 Any news on this item? 
 Is this feature still foreseen for 4.7?
 A patch to test for instance.

Sri is working on this. He will post a formal specification of feature soon.
Part of the proposal also includes implementation of the MV runtime support
(runtime initialization to determine cpu topology and intrinsics to query) and
command line option extensions.  

David

[Bug regression/49498] [4.7 Regression]: gcc.dg/uninit-pred-8_b.c bogus warning line 20

2011-07-07 Thread xinliangli at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49498

--- Comment #9 from davidxl xinliangli at gmail dot com 2011-07-07 19:31:53 
UTC ---
On Thu, Jul 7, 2011 at 9:23 AM, law at redhat dot com 
gcc-bugzi...@gcc.gnu.org wrote:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49498

 --- Comment #8 from Jeffrey A. Law law at redhat dot com 2011-07-07
 16:22:48 UTC ---
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 On 07/05/11 16:52, xinliangli at gmail dot com wrote:
  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49498
 
  --- Comment #7 from davidxl xinliangli at gmail dot com 2011-07-05
 22:51:49 UTC ---
  (In reply to comment #6)
  Created attachment 24693 [details]
  k.c.127t.uninit
 
 
  1) is the complicated control flow generated by if-merging +
 jump-threading?
 I think jump-threading is the primarly culprit.  At one time I tried to
 track how things changed through the passes, but ultimately decided the
 end result either had to be analyzable or not by the uninit code.

  2) On the targets that have the problem, is branch cost considered cheap?
 No idea.  I didn't bother to look at why cris-elf triggers the problem,
 but x86 doesn't.  Presumably it's a branch-cost or similar issue.

  3) Looks like there are more jump-threading opportunities that can clean
 up the
  control flow, namely,
 Yes.  Jump threading is inherently an iterative process.  It was decided
 some time ago to remove the iterative step as it doesn't buy a whole lot
 in terms of code quality and it's a fairly significant compile-time cost.


Does it need to be iterative? Any simple example to show dependency of
jump-threading analysis on actual transformation of some other
jump-threading instance (e.g, opportunity only gets exposed after
transformation)?



 During my investigations noticed the same thing and manually reran DOM
 from within GDB and verified it further simplified the CFG, but
 rerunning DOM isn't really going to be an option.


Is incremental DOM update a choice?

David




 
  With the complicated control flow, the uninit analysis is definitely
 confused
  -- requires powerful symbolic evaluation to simplify the predicates: the
  predicates guarding the use in BLOCK 6:
 
  (n=9 AND  m==0) OR (n = 9 AND m != 0 AND (n =9 OR m  100)) OR (n  9
 AND m
  100) OR (n  9 AND m = 100 AND (n=9 OR m  100))
 
  of course there might be other bugs in the analysis that prevent the
 above from
  formulated, but nonetheless, it is unlikely to be handled).
 OK.  That's kindof what I needed to know -- we're unlikely to handle
 this particular case.

 
  Solution
  1) fix the test case by disabling jumpthreading and if-merging
 Or xfailing it for the affected targets.

  2) or maybe move the uninit analysis earlier in the pipeline.
 As you know, that runs the risk of introducing other false positives.

 I can live with #1 or the target-specific xfailing approach.

 jeff
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.11 (GNU/Linux)
 Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

 iQEcBAEBAgAGBQJOFd1BAAoJEBRtltQi2kC7wpsIAJ1FStgYozE/6o8o5mZrj+5V
 hWP7WmspSVFQMLTGDZw7hNeCEnE2rIl/8Tmin6/GDWM50oNGati2HqTRqTSZwU0y
 YvMY8NlH1/YY4hJ94YPEpNrALAIwD8w3qdhiPVlS7eTWgOl8iTmXLJmJqk6OnT+Z
 BeYPxpYDkQgYvicyjT4VVcWpwcmCbE/D9bKTNt68ABAH8RTmkba/VaK1wtGpt3qt
 hy0qXCi59OUPh7TbcKvgrDLL3GDmy68C9afHUiEKyfDu3JxV9awl4y4Bfr1lOURF
 bvTOuhKQo0MOlbtgJo24nGYK2TU/1Nv1DkcdhgvsCCBciO8LPoocnSZ176ohE5E=
 =/ti9
 -END PGP SIGNATURE-

 --
 Configure bugmail: http://gcc.gnu.org/bugzilla/userprefs.cgi?tab=email
 --- You are receiving this mail because: ---
 You are on the CC list for the bug.

1 2 3 >

1 - 100 of 224 matches

Mail list logo