Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0

2013-01-25 Thread Richard Biener
On Thu, Jan 24, 2013 at 3:17 PM, Uday P. Khedker u...@cse.iitb.ac.in wrote:


 Richard Biener wrote, On Thursday 24 January 2013 05:28 PM:


 In the program below, we have a global pointer p that has conditional
 assignments before its
 use on the RHS of a copy statement.

 -
 #includestdio.h

 int **p;
 int main()
 {
  int *x,*y,a,c,*z;
  x = a;
  z = c;
  if(a)
  p = x;
  else
  p = y;
   int **t = p;
   printf(%d,%d,*z,*x);
}

 -
 The assignment t=p does not seem to load p inspite of p being a global
 pointer. Here's
 the relevant code fragment from test.c.017t.ssa:

 
 bb 3:
p = x;
goto bb 5;

 bb 4:
p = y;

 bb 5:
t_3 = p;


 here p is loaded directly into t_3, you can consider it optimized
 (copy propagated)
 from

D.1234_8 = p;
t_3 = D.1234_8;

x.1_4 = x;
D.2524_5 = *x.1_4;
D.2525_6 = *z_1;

 ---
 Note that p is global but it has been loaded neither loaded, nor does it
 have a PHI function.


 As p resides in memory it does not need a PHI - values are merged by
 storing them to memory.


 Yes, but because it is in memory, should it not be loaded? I thought every
 memory
 access is preceded by a load. At least that is how I interpreted your
 original statement.

My original statement was that if you have a memory access through a
pointer, *p, in C terms then p will be loaded into an SSA name first (if p is
not already an SSA name).  In your example p is not dereferenced.
Thus, more precise every indirect memory access to *p is preceeded by a load
of the pointer p to an SSA name.



 There is no way we can put the pointees of p in SSA_NAME_PTR_INFO of p in
 a
 flow sensitive
 manner.

 So if a later optimization pass were to find out aliases of p or pointees
 of
 p for the statement
 t_3 = p, I don't know how will we supply the information that was
 discovered
 by our analysis.


 Your analysis should have come across that statement

t_3 = p;

 and record that t_3 points to what p points to at this point of the
 program,
 thus compute a solution for all SSA name pointers.


 Yes that is easy to do. But my concern is what if there is a pass that
 requires/uses pointer
 information of p in the statement t_3 = p.

Then it would have no way of getting at it.  Likewise if there is a global x

struct { int i; int *p; double d; } x;

and a pass would want to get at pointer information of x.p.

But why would any pass want pointer information of p or x.p if there
is no use of p or x.p as pointer in the program?  And if there is, then
the pointer will be loaded to an SSA name and used as SSA name.
Thus the pass will never want to query pointer information on memory
but only on SSA names (as those are the only ones that get dereferenced
and thus used).

At least it should work 99% of the cases in this way - and I can't currently
think of an example for the remaining 1% - though I'm sure there may be
some obscure case.


 This is what the present points-to algorithm does, and at the end it
 simply
 discards anything but the solutions computed for SSA name pointers
 which are stored in SSA_NAME_PTR_INFO.


 Oh, does it mean that the SSA_NAME_PTR_INFO would not be present for p
 but would be present only for t_3 (because p is not an SSA name and t_3 is)?

Yes.

 If this is true, can we assume the invariant that any pass would seek
 pointer
 info only for SSA names and not for any non-SSA name?

Yes, that is my belief.

 If that is the case, then I think there is a simple way in which we can make
 the
 result of our analysis available to all other passes without they having to
 consult our data structure.

Yes.

Richard.


 Uday.



Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0

2013-01-24 Thread Uday P. Khedker



Richard Biener wrote, On Thursday 24 January 2013 01:57 AM:

On Wed, Jan 23, 2013 at 8:12 PM, Uday Khedker u...@cse.iitb.ac.in wrote:

Hi Richard,

I am trying to understand the full implications of your statement:



Yes, that's what I say.  Any pointer that is dereferenced is first
copied to
an SSA name.  Basically everything residing in memory first has to be
loaded to an SSA name before it can be dereferenced.  That load is
explicit
in the IL


There are many parts and let me take one by one:

- You talk about dereference. So a statement p=q would not mean loading
   q into memory. Had q been a global scalar variable, it would have been
   loaded into an SSA name. Then why is it not loaded when it is global
   pointer. Why a load only for p=*q and not for p=q?


If q is a global pointer it is of course also first loaded into an SSA name.


- When we have p=*q, agreed that we are loading a value from memory but
   then it is *q that is being loaded. Why should an SSA name be created
   for q even when q is a local pointer?


If q is a local pointer that does not have its address taken then it will be
in SSA form.  If it is not in SSA form then q is first loaded into an SSA name
before it can be dereferenced.



What am I missing?


If I remember the discussion correctly it was about where to put
points-to results.


Yes that is the context and we are trying to devise a way of implementing your 
suggestion.

My student Pritam has come up with an example that seems to differ from your 
explanation (and
we don't know how to proceed).

In the program below, we have a global pointer p that has conditional 
assignments before its
use on the RHS of a copy statement.
-
#includestdio.h

int **p;
int main()
{
int *x,*y,a,c,*z;
x = a;
z = c;
if(a)
p = x;
else
p = y;
 int **t = p;
 printf(%d,%d,*z,*x);
  }
-
The assignment t=p does not seem to load p inspite of p being a global pointer. 
Here's
the relevant code fragment from test.c.017t.ssa:

bb 3:
  p = x;
  goto bb 5;

bb 4:
  p = y;

bb 5:
  t_3 = p;
  x.1_4 = x;
  D.2524_5 = *x.1_4;
  D.2525_6 = *z_1;
---
Note that p is global but it has been loaded neither loaded, nor does it have a 
PHI function.
There is no way we can put the pointees of p in SSA_NAME_PTR_INFO of p in a 
flow sensitive
manner.

So if a later optimization pass were to find out aliases of p or pointees of p 
for the statement
t_3 = p, I don't know how will we supply the information that was discovered by 
our analysis.

Uday.


Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0

2013-01-24 Thread Richard Biener
On Thu, Jan 24, 2013 at 9:02 AM, Uday P. Khedker u...@cse.iitb.ac.in wrote:


 Richard Biener wrote, On Thursday 24 January 2013 01:57 AM:

 On Wed, Jan 23, 2013 at 8:12 PM, Uday Khedker u...@cse.iitb.ac.in wrote:

 Hi Richard,

 I am trying to understand the full implications of your statement:


 Yes, that's what I say.  Any pointer that is dereferenced is first
 copied to
 an SSA name.  Basically everything residing in memory first has to be
 loaded to an SSA name before it can be dereferenced.  That load is
 explicit
 in the IL


 There are many parts and let me take one by one:

 - You talk about dereference. So a statement p=q would not mean loading
q into memory. Had q been a global scalar variable, it would have been
loaded into an SSA name. Then why is it not loaded when it is global
pointer. Why a load only for p=*q and not for p=q?


 If q is a global pointer it is of course also first loaded into an SSA
 name.

 - When we have p=*q, agreed that we are loading a value from memory but
then it is *q that is being loaded. Why should an SSA name be created
for q even when q is a local pointer?


 If q is a local pointer that does not have its address taken then it will
 be
 in SSA form.  If it is not in SSA form then q is first loaded into an SSA
 name
 before it can be dereferenced.


 What am I missing?


 If I remember the discussion correctly it was about where to put
 points-to results.


 Yes that is the context and we are trying to devise a way of implementing
 your suggestion.

 My student Pritam has come up with an example that seems to differ from your
 explanation (and
 we don't know how to proceed).

 In the program below, we have a global pointer p that has conditional
 assignments before its
 use on the RHS of a copy statement.
 -
 #includestdio.h

 int **p;
 int main()
 {
 int *x,*y,a,c,*z;
 x = a;
 z = c;
 if(a)
 p = x;
 else
 p = y;
  int **t = p;
  printf(%d,%d,*z,*x);
   }
 -
 The assignment t=p does not seem to load p inspite of p being a global
 pointer. Here's
 the relevant code fragment from test.c.017t.ssa:
 
 bb 3:
   p = x;
   goto bb 5;

 bb 4:
   p = y;

 bb 5:
   t_3 = p;

here p is loaded directly into t_3, you can consider it optimized
(copy propagated)
from

  D.1234_8 = p;
  t_3 = D.1234_8;

   x.1_4 = x;
   D.2524_5 = *x.1_4;
   D.2525_6 = *z_1;
 ---
 Note that p is global but it has been loaded neither loaded, nor does it
 have a PHI function.

As p resides in memory it does not need a PHI - values are merged by
storing them to memory.

 There is no way we can put the pointees of p in SSA_NAME_PTR_INFO of p in a
 flow sensitive
 manner.

 So if a later optimization pass were to find out aliases of p or pointees of
 p for the statement
 t_3 = p, I don't know how will we supply the information that was discovered
 by our analysis.

Your analysis should have come across that statement

  t_3 = p;

and record that t_3 points to what p points to at this point of the program,
thus compute a solution for all SSA name pointers.

This is what the present points-to algorithm does, and at the end it simply
discards anything but the solutions computed for SSA name pointers
which are stored in SSA_NAME_PTR_INFO.  What the present points-to
fails to do is to properly handle changes of 'p' in a flow-sensitive manner
if you'd consider another store to p like

   p = z;

after

  t_3 = p;

then if no other transform would have optimized this in some way then
it would compute that t_3 also may point to z (then it of course also
does not handle killing defs of memory).

Hope this helps,
Richard.

 Uday.


Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0

2013-01-24 Thread Uday P. Khedker



Richard Biener wrote, On Thursday 24 January 2013 05:28 PM:


In the program below, we have a global pointer p that has conditional
assignments before its
use on the RHS of a copy statement.
-
#includestdio.h

int **p;
int main()
{
 int *x,*y,a,c,*z;
 x = a;
 z = c;
 if(a)
 p = x;
 else
 p = y;
  int **t = p;
  printf(%d,%d,*z,*x);
   }
-
The assignment t=p does not seem to load p inspite of p being a global
pointer. Here's
the relevant code fragment from test.c.017t.ssa:

bb 3:
   p = x;
   goto bb 5;

bb 4:
   p = y;

bb 5:
   t_3 = p;


here p is loaded directly into t_3, you can consider it optimized
(copy propagated)
from

   D.1234_8 = p;
   t_3 = D.1234_8;


   x.1_4 = x;
   D.2524_5 = *x.1_4;
   D.2525_6 = *z_1;
---
Note that p is global but it has been loaded neither loaded, nor does it
have a PHI function.


As p resides in memory it does not need a PHI - values are merged by
storing them to memory.


Yes, but because it is in memory, should it not be loaded? I thought every 
memory
access is preceded by a load. At least that is how I interpreted your original 
statement.




There is no way we can put the pointees of p in SSA_NAME_PTR_INFO of p in a
flow sensitive
manner.

So if a later optimization pass were to find out aliases of p or pointees of
p for the statement
t_3 = p, I don't know how will we supply the information that was discovered
by our analysis.


Your analysis should have come across that statement

   t_3 = p;

and record that t_3 points to what p points to at this point of the program,
thus compute a solution for all SSA name pointers.


Yes that is easy to do. But my concern is what if there is a pass that 
requires/uses pointer
information of p in the statement t_3 = p.



This is what the present points-to algorithm does, and at the end it simply
discards anything but the solutions computed for SSA name pointers
which are stored in SSA_NAME_PTR_INFO.


Oh, does it mean that the SSA_NAME_PTR_INFO would not be present for p
but would be present only for t_3 (because p is not an SSA name and t_3 is)?

If this is true, can we assume the invariant that any pass would seek pointer
info only for SSA names and not for any non-SSA name?

If that is the case, then I think there is a simple way in which we can make the
result of our analysis available to all other passes without they having to
consult our data structure.


Uday.



Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0

2013-01-23 Thread Uday Khedker

Hi Richard,

I am trying to understand the full implications of your statement:

 Yes, that's what I say.  Any pointer that is dereferenced is first
 copied to
 an SSA name.  Basically everything residing in memory first has to be
 loaded to an SSA name before it can be dereferenced.  That load is
 explicit
 in the IL

There are many parts and let me take one by one:

- You talk about dereference. So a statement p=q would not mean loading
  q into memory. Had q been a global scalar variable, it would have been
  loaded into an SSA name. Then why is it not loaded when it is global
  pointer. Why a load only for p=*q and not for p=q?
- When we have p=*q, agreed that we are loading a value from memory but
  then it is *q that is being loaded. Why should an SSA name be created
  for q even when q is a local pointer?

What am I missing?

Uday.
On Friday 12 October 2012 03:25 PM, Uday P. Khedker wrote:

Excellent! Thanks.

Uday.

Richard Biener wrote, On Friday 12 October 2012 03:20 PM:

On Fri, Oct 12, 2012 at 11:46 AM, Uday P. Khedker
u...@cse.iitb.ac.in wrote:



Richard Biener wrote, On Friday 12 October 2012 02:51 PM:



we _always_ see

ssa_name_1 = a;
use (ssa_name_1);

so you have a place to associate your flow-sensitive and
context-sensitive
points-to-info with (the SSA name).  My point is that for _using_ the
points-to info flow-sensitivity provided by SSA form is enough.  For
computing
it you of course need to flow-sensitively process assignments to 'a'.



This is VERY interesting! I had not thought about the difference between
computing
and using values. Now that you point it out, I think all we need to
do is to
map
flow-sensitively computed values to ssa names.

What about variables that do not have ssa names? Or are you saying
that all
such
variables would be copied into an artificial variables that have ssa
names?
I seem
to observe this in the dumps but I don't know if it holds in general.


Yes, that's what I say.  Any pointer that is dereferenced is first
copied to
an SSA name.  Basically everything residing in memory first has to be
loaded to an SSA name before it can be dereferenced.  That load is
explicit
in the IL so you should already compute points-to sets for the SSA name
destination of the load.

Richard.



Uday.


Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0

2013-01-23 Thread Richard Biener
On Wed, Jan 23, 2013 at 8:12 PM, Uday Khedker u...@cse.iitb.ac.in wrote:
 Hi Richard,

 I am trying to understand the full implications of your statement:


 Yes, that's what I say.  Any pointer that is dereferenced is first
 copied to
 an SSA name.  Basically everything residing in memory first has to be
 loaded to an SSA name before it can be dereferenced.  That load is
 explicit
 in the IL

 There are many parts and let me take one by one:

 - You talk about dereference. So a statement p=q would not mean loading
   q into memory. Had q been a global scalar variable, it would have been
   loaded into an SSA name. Then why is it not loaded when it is global
   pointer. Why a load only for p=*q and not for p=q?

If q is a global pointer it is of course also first loaded into an SSA name.

 - When we have p=*q, agreed that we are loading a value from memory but
   then it is *q that is being loaded. Why should an SSA name be created
   for q even when q is a local pointer?

If q is a local pointer that does not have its address taken then it will be
in SSA form.  If it is not in SSA form then q is first loaded into an SSA name
before it can be dereferenced.


 What am I missing?

If I remember the discussion correctly it was about where to put
points-to results.
I said that fully flow-sensitive points-to info can be attached to SSA
name definitions
via SSA_NAME_PTR_INFO (as it is done now) and that this is the only place where
we are looking at this information via the alias oracle.  Because all
indirect memory references - which
are those with a non-trivial points-to set - happen through an SSA name pointer.

The existing points-to solver of course keeps points-to sets for more
things internally
to be able to precisely deal with (sub-)memory objects.  Internally it
does so in a
flow-insensitive way.  At the end of the solving process all is
translated to points-to
solutions for the existing SSA name pointers in the program.

Maybe that clarifies my answer - if not, can you re-phrase your question again?

Thanks,
Richard.

 Uday.

 On Friday 12 October 2012 03:25 PM, Uday P. Khedker wrote:

 Excellent! Thanks.

 Uday.

 Richard Biener wrote, On Friday 12 October 2012 03:20 PM:

 On Fri, Oct 12, 2012 at 11:46 AM, Uday P. Khedker
 u...@cse.iitb.ac.in wrote:



 Richard Biener wrote, On Friday 12 October 2012 02:51 PM:


 we _always_ see

 ssa_name_1 = a;
 use (ssa_name_1);

 so you have a place to associate your flow-sensitive and
 context-sensitive
 points-to-info with (the SSA name).  My point is that for _using_ the
 points-to info flow-sensitivity provided by SSA form is enough.  For
 computing
 it you of course need to flow-sensitively process assignments to 'a'.



 This is VERY interesting! I had not thought about the difference between
 computing
 and using values. Now that you point it out, I think all we need to
 do is to
 map
 flow-sensitively computed values to ssa names.

 What about variables that do not have ssa names? Or are you saying
 that all
 such
 variables would be copied into an artificial variables that have ssa
 names?
 I seem
 to observe this in the dumps but I don't know if it holds in general.


 Yes, that's what I say.  Any pointer that is dereferenced is first
 copied to
 an SSA name.  Basically everything residing in memory first has to be
 loaded to an SSA name before it can be dereferenced.  That load is
 explicit
 in the IL so you should already compute points-to sets for the SSA name
 destination of the load.

 Richard.


 Uday.


Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0

2012-10-17 Thread Uday Khedker




On Saturday 13 October 2012 02:34 AM, Xinliang David Li wrote:

Somewhere it is mentioned that heap is handled conservatively. Does
it mean the algorithm can not disambiguate heap objects all all, or
it can but does not track pointer values stored in heap objects?

How about field sensitivity? For many programs (mostly C++ ones),
this is very important.

For CS, virtual calls in C++ will be one of the the biggest
challenges. How is that dealt with? It can be combined with type
propagation/analysis.

thanks,

David



My apologies for the delay. I wanted to check some details with the
original implementor Prashant who had his exams. Here's my final
response.

- In our earlier version, we represented heap locations in terms of base
  and offset pairs and all locations allocated at the same pointed were
  treated alike. We used pointer arithmetic to map offset calculation to
  find out which heap locations can possibly coincide. Where we could
  not determine unambiguously, we assumed that the location could
  coincide with any other location.

- For non-heap structures (allocated on stack or static area), our
  analysis is completely field sensitive.

- In our updated implementation, we have removed offsets completely
  for simplicity and made the implementation more conservative. This was
  primarily to simplify the implementation and gain some efficiency.
  But a bigger reason is that we have a much better technique of heap
  analysis in the pipeline. It essentially uses the same abstract idea
  of restricting pointer information to live data but would use heap
  liveness analysis [Khedker-Sanyal-Karkare, TOPLAS 2007]. This analysis
  discovers accesses in deep heap through a data flow analysis that
  represents the this data flow information flow sensitively in the
  form of graphs rooted at stack variables.

- We handle calls through function pointers by adjusting the call graph
  as and when we get pointees of the function pointers. In case of a
  function pointer fptr, we see what fptr may point to, and consider
  all those functions called from that point. We haven't looked at
  GCC's IR for a C++ program with virtual functions. But maybe we can
  store the type of the classes pointed to by an object during the
  pointer analysis to determine the appropriate function that needs
  to be invoked.

Thanks and regards,

Uday.


Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0

2012-10-12 Thread Uday P. Khedker



Andrew Pinski wrote, On Friday 12 October 2012 11:26 AM:



Except the problem here is just about what f1 clobbers.  Since a has
not escaped by the time f1 is called, f1 cannot clobber a (or d).
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23384 for reference on why
GCC gets this incorrect.  GCC does not have a flow sensitive clobber
list yet.



Agreed. However, two pertinent points are:

- I can replace the call to f1 by some statically indeterminate boolean
  condition that does not involve any function call. So my basic point
  although details could vary.

- It is this flow insensitivity that I dislike :-) I have been trying for
  over five years to do away with flow and context insensitivity without
  hampering efficiency. Now I know that avoiding them could actually aid
  efficiency, apart from the guaranteed benefit of precision :-)

Uday.


Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0

2012-10-12 Thread Richard Biener
On Fri, Oct 12, 2012 at 7:41 AM, Uday P. Khedker u...@cse.iitb.ac.in wrote:


 Andrew Pinski wrote, On Friday 12 October 2012 10:29 AM:


 Here's an example:

 main()
 {
  int **p;
  int *a, *d;
  int w, x;

  a = w;
  f1(a);
  p = a;
  a = x;
  f2(p);
  d = a;

  return *d;
 }

 It is clear that d can only point to x and can never point to w.


 I think you are wrong there.

 int *a1;
 void f1(int *a)
 {
a1 = a;
 }

 void f2(int **p)
 {
*p = a1;
 }

 That will change a to w after f2 is called.  So it looks like your
 aliasing analysis does not take into account escaping like it should.
 This is the whole point of marking a as escaped.  Maybe I missed
 something here though but d can point w with my functions for f1 and
 f2.


 Ah, you caught me there, but I think I can escape, at least in this
 situation :-)

 The call to f1 is not central to the point I am making; I had included it
 only to ensure
 that the assignment a=w doesn't get eliminated by dead code elimination.
 Since you
 decided to hold the address of a into a1 through function f1, let me
 eliminate the
 call to f1 and make the assignment a=w live in some other way. Here's the
 changed code:


 main()
 {
 int **p;
 int *a, *d;
 int w, x;

 d = x;
 a = w;
 if (f1())

 {
 p = a;
 a = x;
 f2(p);
 d = a;
 }

 return *d + *a;
 }

 Now when f2 is called, a definitely does not point to w. Hence d should not
 point to w.
 And yet, the dump shows that d continue to point to w.

 In any case, your point about escaping variables strengthens my point about
 inappropriateness
 of SSA for pointer analysis, although not through this example.

The point is we are only ever interested at what SSA names point to
when we use points-to information.  Even the current IPA-PTA code
uses internal representation for representing globals and aggregates
(but yes, not flow-sensitively).  For

int *a;

foo ()
{
   a = ...;
...

   use (a);
}

we _always_ see

  ssa_name_1 = a;
  use (ssa_name_1);

so you have a place to associate your flow-sensitive and context-sensitive
points-to-info with (the SSA name).  My point is that for _using_ the
points-to info flow-sensitivity provided by SSA form is enough.  For computing
it you of course need to flow-sensitively process assignments to 'a'.

Richard.

 Uday.




Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0

2012-10-12 Thread Uday P. Khedker



Richard Biener wrote, On Friday 12 October 2012 02:51 PM:


we _always_ see

   ssa_name_1 = a;
   use (ssa_name_1);

so you have a place to associate your flow-sensitive and context-sensitive
points-to-info with (the SSA name).  My point is that for _using_ the
points-to info flow-sensitivity provided by SSA form is enough.  For computing
it you of course need to flow-sensitively process assignments to 'a'.


This is VERY interesting! I had not thought about the difference between 
computing
and using values. Now that you point it out, I think all we need to do is to map
flow-sensitively computed values to ssa names.

What about variables that do not have ssa names? Or are you saying that all such
variables would be copied into an artificial variables that have ssa names? I 
seem
to observe this in the dumps but I don't know if it holds in general.

Uday.


Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0

2012-10-12 Thread Richard Biener
On Fri, Oct 12, 2012 at 11:46 AM, Uday P. Khedker u...@cse.iitb.ac.in wrote:


 Richard Biener wrote, On Friday 12 October 2012 02:51 PM:


 we _always_ see

ssa_name_1 = a;
use (ssa_name_1);

 so you have a place to associate your flow-sensitive and context-sensitive
 points-to-info with (the SSA name).  My point is that for _using_ the
 points-to info flow-sensitivity provided by SSA form is enough.  For
 computing
 it you of course need to flow-sensitively process assignments to 'a'.


 This is VERY interesting! I had not thought about the difference between
 computing
 and using values. Now that you point it out, I think all we need to do is to
 map
 flow-sensitively computed values to ssa names.

 What about variables that do not have ssa names? Or are you saying that all
 such
 variables would be copied into an artificial variables that have ssa names?
 I seem
 to observe this in the dumps but I don't know if it holds in general.

Yes, that's what I say.  Any pointer that is dereferenced is first copied to
an SSA name.  Basically everything residing in memory first has to be
loaded to an SSA name before it can be dereferenced.  That load is explicit
in the IL so you should already compute points-to sets for the SSA name
destination of the load.

Richard.


 Uday.


Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0

2012-10-12 Thread Uday P. Khedker

Excellent! Thanks.

Uday.

Richard Biener wrote, On Friday 12 October 2012 03:20 PM:

On Fri, Oct 12, 2012 at 11:46 AM, Uday P. Khedker u...@cse.iitb.ac.in wrote:



Richard Biener wrote, On Friday 12 October 2012 02:51 PM:



we _always_ see

ssa_name_1 = a;
use (ssa_name_1);

so you have a place to associate your flow-sensitive and context-sensitive
points-to-info with (the SSA name).  My point is that for _using_ the
points-to info flow-sensitivity provided by SSA form is enough.  For
computing
it you of course need to flow-sensitively process assignments to 'a'.



This is VERY interesting! I had not thought about the difference between
computing
and using values. Now that you point it out, I think all we need to do is to
map
flow-sensitively computed values to ssa names.

What about variables that do not have ssa names? Or are you saying that all
such
variables would be copied into an artificial variables that have ssa names?
I seem
to observe this in the dumps but I don't know if it holds in general.


Yes, that's what I say.  Any pointer that is dereferenced is first copied to
an SSA name.  Basically everything residing in memory first has to be
loaded to an SSA name before it can be dereferenced.  That load is explicit
in the IL so you should already compute points-to sets for the SSA name
destination of the load.

Richard.



Uday.




Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0

2012-10-12 Thread Xinliang David Li
Somewhere it is mentioned that heap is handled conservatively. Does it
mean the algorithm can not disambiguate heap objects all all, or it
can but does not track pointer values stored in heap objects?

How about field sensitivity? For many programs (mostly C++ ones), this
is very important.

For CS, virtual calls in C++ will be one of the the biggest
challenges. How is that dealt with? It can be combined with type
propagation/analysis.

thanks,

David

On Wed, Oct 10, 2012 at 10:56 AM, Uday P. Khedker u...@cse.iitb.ac.in wrote:

 We have designed and implemented a fully flow and context sensitive
 points-to analysis in gcc-4.6.0. For simplicity, we have made a dynamic
 plugin available at http://www.cse.iitb.ac.in/grc/index.php?page=l-fcpa.
 This page also provides an overview of the method, and links to the paper,
 slides, and instructions on how to use the plugin. Our method questions the
 conventional wisdom that precise flow and context sensitive pointer
 information is prohibitively large and hence we cannot hope to compute it
 efficiently. We show that the actual usable information is rather small and
 sparse and hence an increase in precision actually improves the efficiency
 too, rather than worsen it.

 Needless to say, precise pointer information is critical for the
 effectiveness of almost all analyses and optimizations in any compiler. Now
 that we have some ray of hope of precise points-to analysis in GCC, we would
 like to invite collaboration from like minded people. There is a lot of work
 that needs to be done further; some details of future work are available on
 the given URL and we will be happy to provide more information to the
 interested people.

 Looking forward to hearing from people who would like to contribute to this
 important project.

 Thanks and regards,

 Uday Khedker.
 --
 Dr. Uday Khedker
 Professor
 Department of Computer Science  Engg.
 IIT Bombay, Powai, Mumbai 400 076, India.
 Email   :   u...@cse.iitb.ac.in
 Homepage:   http://www.cse.iitb.ac.in/~uday
 Phone   :
 Office -91 (22) 2572 2545 x 7717, 91 (22) 2576 7717 (Direct)
 Res.   -91 (22) 2572 2545 x 8717, 91 (22) 2576 8717 (Direct)




 --
 --
 Dr. Uday Khedker
 Professor
 Department of Computer Science  Engg.
 IIT Bombay, Powai, Mumbai 400 076, India.
 Email   :   u...@cse.iitb.ac.in
 Homepage:   http://www.cse.iitb.ac.in/~uday
 Phone   :
 Office -91 (22) 2572 2545 x 7717, 91 (22) 2576 7717 (Direct)
 Res.   -91 (22) 2572 2545 x 8717, 91 (22) 2576 8717 (Direct)




Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0

2012-10-11 Thread Richard Biener
On Thu, Oct 11, 2012 at 5:04 AM, Uday P. Khedker u...@cse.iitb.ac.in wrote:
 Hi David,

 This is great progress.


 Thanks.



 If I understand the experiments, your implementtion has very small
 cost to perform the analysis, at least for the SPEC benchmarks you are
 testing.  Have you connected the analysis to any optimizations?  Is
 there any improvement in performance on SPEC CPU or other
 applications?


 Unfortunately we could not do this as it requires changing the way the
 pointer information
 is made available to other passes. Right now, it is encoded in the TREE
 nodes of
 variables which make is same at all program points. In other words, GCC, by
 definition
 assumes flow insensitive information. Supporting flow sensitivity implies
 being able
 to provide different information for the same pointer at different program
 points.

That's actually not true.  In fact existing GCC pointer analysis is
flow-sensitive for all SSA pointers.  Points-to information is associated
with each (and only with!) SSA pointer variable via SSA_NAME_PTR_INFO.

 We are investigating how this can be done and this is one of the most
 important future
 works on which we seek collaboration. Unless we are able to do this, we will
 not be able
 to take advantage of the analysis.

I suggest to look at the above and the disambiguators that use points-to
information in tree-ssa-alias.c (and their helpers in tree-ssa-struct-alias.c).


 You ask for collaboration, but it's not clear what state the
 infrastructure is in, how complete it is, and what more needs to be
 done.


 The infrastructure is in a reasonably complete state except that
  (a) It is not directly useful to other analyses and optimizations for the
 reasons described above.
 (b) It uses rather naive and inefficient data structures in which sets are
 stored as linked lists and set operations are implemented using linear
 searches.
 (c) Some corner cases related identifying pointers need to be fixed.
 (d) It handles heap locations rather conservatively.

 We seek collaborations on (a) and (b). We have designed APIs to hide the
 data structures
 and now these are good student projects. Some details can be found at
 http://www.cse.iitb.ac.in/grc/gcc-workshop-12/index.php?page=projects#Projects_with_Concrete_and_Detailed.

 We are carrying out research on (d) and have some ideas on what needs to be
 done but it will
 be some time before the infrastructure is enhanced to include it. We are
 committed to doing it.

I'd be interested to know how you analysis (is it interprocedural at all?)
scales on programs the existing IPA PTA blows up (in both compile-time
and memory-usage, usually first memory-usage ;)).  Did you investigate
on how to handle whole-program PTA at link-time with our distributed
WHOPR mode (thus at no point a single compiler process sees all
function bodies of the program but WPA phase sees the whole program
callgraph).

Thanks,
Richard.

 Thanks and regards,

 Uday.




Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0

2012-10-11 Thread Uday P. Khedker




That's actually not true.  In fact existing GCC pointer analysis is
flow-sensitive for all SSA pointers.


SSA provides partial flow sensitivity to the top level pointers. For deeper
pointers, one needs to interleave SSA and points-to analysis. Besides, it cannot
handle global pointers which are important at the interprocedural level.


Points-to information is associated
with each (and only with!) SSA pointer variable via SSA_NAME_PTR_INFO.


I have enclosed a program fragment and dump fragment that shows
(a) flow insensitivity of GCC's points-to analysis
(b) points-to information for non-SSA variables.

At the same time, yes I reckon that the points-to information is encoded
in SSA_NAME_PTR_INFO and I wondered why the prefix SSA. Could you help
me by explaining the dump?

 

I suggest to look at the above and the disambiguators that use points-to
information in tree-ssa-alias.c (and their helpers in tree-ssa-struct-alias.c).


Since we are interested in interprocedural analysis, we would like to include
global variables and pointers at all levels of indirection. Hence we do not 
depend
on SSA variables. However, our analysis can become more efficient by separating 
SSA
names and other names.


I'd be interested to know how you analysis (is it interprocedural at all?)


Yes it is interprocedural and does not introduce approximations even in
the presence of recursion. Without interprocedural analysis, pointer analysis
is not much useful.

We use full call strings method (Sharir-Pnueli, 1981) for full flow and
context sensitivity but use value based termination (Khedker-Karkare, CC 2008)
that reduces the time and space by orders of magnitude (actually by a factor of 
a
million for recursive programs).


scales on programs the existing IPA PTA blows up (in both compile-time
and memory-usage, usually first memory-usage ;)).


You are right. As our paper shows, even with our value based termination,
the blow up is significant. Hence we factor in strong liveness. This reduces
the size of information dramatically without missing out on any useful 
information.
And there are more avenues of speeding up our analysis at the algorithmic level
(and I am not counting the use of better data structures which is the most
obvious thing to do.)


Did you investigate
on how to handle whole-program PTA at link-time with our distributed
WHOPR mode (thus at no point a single compiler process sees all
function bodies of the program but WPA phase sees the whole program
callgraph).


We have use LTO framework but did not use WHOPR mode because context sensitive
analysis is not possible for recursive programs by creating context independent
summaries by looking at one procedure at a time. Well, one can always compromise
on precision but we did not want to do that. So we use -flto-partition=none to
load all procedure bodies.

Thanks.

Uday.

-
Program fragment
-
#include stdio.h
int a, b, c, *e;
int main()
{

if (a == b)
e = c;   /* statement n1 */
else
e = b;   /* statement n2 */
e = a;   /* statement n3 */
p();
}

p()
{
printf (%d, e);
}

Dump fragment. Note that flow sensitive analysis should say that
- e points only to c after n1,
- e points only to b after n2,
- e points only to c and b before n3,
- e points only a after n3.
However the analysis simply says that e points to a, b, c (ap0art from ESCAPED 
and NONLOCAL).

Points-to sets

NULL = { }
ANYTHING = { ANYTHING }
READONLY = { READONLY }
ESCAPED = { READONLY ESCAPED NONLOCAL a b c }
NONLOCAL = { ESCAPED NONLOCAL }
CALLUSED = { }
STOREDANYTHING = { }
INTEGER = { ANYTHING }
e.0_1 = same as e
e = { ESCAPED NONLOCAL a b c }
a.1_1 = { ESCAPED NONLOCAL }
a = same as a.1_1
b.2_2 = { ESCAPED NONLOCAL }
b = same as b.2_2
c = { ESCAPED NONLOCAL }





Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0

2012-10-11 Thread Diego Novillo
On Thu, Oct 11, 2012 at 11:53 AM, Uday P. Khedker u...@cse.iitb.ac.in wrote:


 That's actually not true.  In fact existing GCC pointer analysis is
 flow-sensitive for all SSA pointers.


 SSA provides partial flow sensitivity to the top level pointers. For deeper
 pointers, one needs to interleave SSA and points-to analysis. Besides, it
 cannot
 handle global pointers which are important at the interprocedural level.

Yes, for global pointers, but in GIMPLE we do not have 'deeper'
pointers.  Unless I'm misunderstanding you.  By 'deep pointers' you
mean 'foo-ptr1-ptr2'?  There are no memory expressions of that kind
in GIMPLE.


Diego.


Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0

2012-10-11 Thread Uday P. Khedker



Diego Novillo wrote, On Friday 12 October 2012 01:41 AM:

On Thu, Oct 11, 2012 at 11:53 AM, Uday P. Khedker u...@cse.iitb.ac.in wrote:




That's actually not true.  In fact existing GCC pointer analysis is
flow-sensitive for all SSA pointers.



SSA provides partial flow sensitivity to the top level pointers. For deeper
pointers, one needs to interleave SSA and points-to analysis. Besides, it
cannot
handle global pointers which are important at the interprocedural level.


Yes, for global pointers, but in GIMPLE we do not have 'deeper'
pointers.  Unless I'm misunderstanding you.  By 'deep pointers' you
mean 'foo-ptr1-ptr2'?  There are no memory expressions of that kind
in GIMPLE.


I meant pointers that are pointed to by other pointers. However I see that
GCC manages to solve common cases as a consequence of other optimizations so
I will need some time to create an example that shows problems with deeper
pointers but my main reason for not using SSA is that it seems fine for
local analysis of scalars. When we introduce globals (the example that I gave in
my previous mail http://gcc.gnu.org/ml/gcc/2012-10/msg00164.html) OR
have function calls to which we pass pointers, SSA is helpless.

Here's an example:

main()
{
int **p;
int *a, *d;
int w, x;

a = w;
f1(a);
p = a;
a = x;
f2(p);
d = a;

return *d;
}

It is clear that d can only point to x and can never point to w. However,
the points-to sets dumped in file .052i.pta show that d can point to w.

ANYTHING = { ANYTHING }
READONLY = { READONLY }
ESCAPED = { ESCAPED NONLOCAL a w x } same as main.clobber
NONLOCAL = { ESCAPED NONLOCAL } same as f1
STOREDANYTHING = { }
INTEGER = { ANYTHING }
main.clobber = { ESCAPED NONLOCAL a w x }
main.use = { ESCAPED NONLOCAL a w x } same as main.clobber
main.result = { ESCAPED NONLOCAL } same as D.1958_4
main.varargs = { }
a = { ESCAPED NONLOCAL w x } same as a.0_1
w = { ESCAPED NONLOCAL }
a.0_1 = { ESCAPED NONLOCAL w x }
f1 = { ESCAPED NONLOCAL }
x = { ESCAPED NONLOCAL }
f2 = { ESCAPED NONLOCAL } same as f1
d_3 = { ESCAPED NONLOCAL w x } same as a.0_1
D.1958_4 = { ESCAPED NONLOCAL }

The basic point I am trying to make is that SSA is primarily defined for
locally scoped scalars and works excellently for them. In some cases
it can be extended to handle other situations too. However, for
pointers, starting from the first principles could be cleaner (and
certainly more precise).

Uday.


Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0

2012-10-11 Thread Andrew Pinski
On Thu, Oct 11, 2012 at 9:41 PM, Uday P. Khedker u...@cse.iitb.ac.in wrote:


 Diego Novillo wrote, On Friday 12 October 2012 01:41 AM:

 On Thu, Oct 11, 2012 at 11:53 AM, Uday P. Khedker u...@cse.iitb.ac.in
 wrote:



 That's actually not true.  In fact existing GCC pointer analysis is
 flow-sensitive for all SSA pointers.



 SSA provides partial flow sensitivity to the top level pointers. For
 deeper
 pointers, one needs to interleave SSA and points-to analysis. Besides, it
 cannot
 handle global pointers which are important at the interprocedural level.


 Yes, for global pointers, but in GIMPLE we do not have 'deeper'
 pointers.  Unless I'm misunderstanding you.  By 'deep pointers' you
 mean 'foo-ptr1-ptr2'?  There are no memory expressions of that kind
 in GIMPLE.


 I meant pointers that are pointed to by other pointers. However I see that
 GCC manages to solve common cases as a consequence of other optimizations so
 I will need some time to create an example that shows problems with deeper
 pointers but my main reason for not using SSA is that it seems fine for
 local analysis of scalars. When we introduce globals (the example that I
 gave in
 my previous mail http://gcc.gnu.org/ml/gcc/2012-10/msg00164.html) OR
 have function calls to which we pass pointers, SSA is helpless.

 Here's an example:

 main()
 {
 int **p;
 int *a, *d;
 int w, x;

 a = w;
 f1(a);
 p = a;
 a = x;
 f2(p);
 d = a;

 return *d;
 }

 It is clear that d can only point to x and can never point to w.

I think you are wrong there.

int *a1;
void f1(int *a)
{
  a1 = a;
}

void f2(int **p)
{
  *p = a1;
}

That will change a to w after f2 is called.  So it looks like your
aliasing analysis does not take into account escaping like it should.
This is the whole point of marking a as escaped.  Maybe I missed
something here though but d can point w with my functions for f1 and
f2.

Thanks,
Andrew Pinski

 However,
 the points-to sets dumped in file .052i.pta show that d can point to w.


 ANYTHING = { ANYTHING }
 READONLY = { READONLY }
 ESCAPED = { ESCAPED NONLOCAL a w x } same as main.clobber
 NONLOCAL = { ESCAPED NONLOCAL } same as f1

 STOREDANYTHING = { }
 INTEGER = { ANYTHING }
 main.clobber = { ESCAPED NONLOCAL a w x }
 main.use = { ESCAPED NONLOCAL a w x } same as main.clobber
 main.result = { ESCAPED NONLOCAL } same as D.1958_4
 main.varargs = { }
 a = { ESCAPED NONLOCAL w x } same as a.0_1
 w = { ESCAPED NONLOCAL }
 a.0_1 = { ESCAPED NONLOCAL w x }
 f1 = { ESCAPED NONLOCAL }
 x = { ESCAPED NONLOCAL }
 f2 = { ESCAPED NONLOCAL } same as f1
 d_3 = { ESCAPED NONLOCAL w x } same as a.0_1
 D.1958_4 = { ESCAPED NONLOCAL }

 The basic point I am trying to make is that SSA is primarily defined for
 locally scoped scalars and works excellently for them. In some cases
 it can be extended to handle other situations too. However, for
 pointers, starting from the first principles could be cleaner (and
 certainly more precise).

 Uday.


Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0

2012-10-11 Thread Uday P. Khedker



Andrew Pinski wrote, On Friday 12 October 2012 10:29 AM:


Here's an example:

main()
{
 int **p;
 int *a, *d;
 int w, x;

 a = w;
 f1(a);
 p = a;
 a = x;
 f2(p);
 d = a;

 return *d;
}

It is clear that d can only point to x and can never point to w.


I think you are wrong there.

int *a1;
void f1(int *a)
{
   a1 = a;
}

void f2(int **p)
{
   *p = a1;
}

That will change a to w after f2 is called.  So it looks like your
aliasing analysis does not take into account escaping like it should.
This is the whole point of marking a as escaped.  Maybe I missed
something here though but d can point w with my functions for f1 and
f2.


Ah, you caught me there, but I think I can escape, at least in this situation 
:-)

The call to f1 is not central to the point I am making; I had included it only 
to ensure
that the assignment a=w doesn't get eliminated by dead code elimination. Since 
you
decided to hold the address of a into a1 through function f1, let me eliminate 
the
call to f1 and make the assignment a=w live in some other way. Here's the 
changed code:

main()
{
int **p;
int *a, *d;
int w, x;

d = x;
a = w;
if (f1())
{
p = a;
a = x;
f2(p);
d = a;
}

return *d + *a;
}

Now when f2 is called, a definitely does not point to w. Hence d should not 
point to w.
And yet, the dump shows that d continue to point to w.

In any case, your point about escaping variables strengthens my point about 
inappropriateness
of SSA for pointer analysis, although not through this example.

Uday.




Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0

2012-10-11 Thread Uday P. Khedker



decided to hold the address of a into a1 through function f1, let me eliminate 
the
call to f1 and make the assignment a=w live in some other way. Here's the 
changed code:


Please read it as eliminate the call passing a to f1.

Uday.


Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0

2012-10-11 Thread Andrew Pinski
On Thu, Oct 11, 2012 at 10:41 PM, Uday P. Khedker u...@cse.iitb.ac.in wrote:


 Andrew Pinski wrote, On Friday 12 October 2012 10:29 AM:


 Here's an example:

 main()
 {
  int **p;
  int *a, *d;
  int w, x;

  a = w;
  f1(a);
  p = a;
  a = x;
  f2(p);
  d = a;

  return *d;
 }

 It is clear that d can only point to x and can never point to w.


 I think you are wrong there.

 int *a1;
 void f1(int *a)
 {
a1 = a;
 }

 void f2(int **p)
 {
*p = a1;
 }

 That will change a to w after f2 is called.  So it looks like your
 aliasing analysis does not take into account escaping like it should.
 This is the whole point of marking a as escaped.  Maybe I missed
 something here though but d can point w with my functions for f1 and
 f2.


 Ah, you caught me there, but I think I can escape, at least in this
 situation :-)

 The call to f1 is not central to the point I am making; I had included it
 only to ensure
 that the assignment a=w doesn't get eliminated by dead code elimination.
 Since you
 decided to hold the address of a into a1 through function f1, let me
 eliminate the
 call to f1 and make the assignment a=w live in some other way. Here's the
 changed code:


 main()
 {
 int **p;
 int *a, *d;
 int w, x;

 d = x;
 a = w;
 if (f1())

 {
 p = a;
 a = x;
 f2(p);
 d = a;
 }

 return *d + *a;
 }

 Now when f2 is called, a definitely does not point to w. Hence d should not
 point to w.
 And yet, the dump shows that d continue to point to w.

 In any case, your point about escaping variables strengthens my point about
 inappropriateness
 of SSA for pointer analysis, although not through this example.


Except the problem here is just about what f1 clobbers.  Since a has
not escaped by the time f1 is called, f1 cannot clobber a (or d).
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23384 for reference on why
GCC gets this incorrect.  GCC does not have a flow sensitive clobber
list yet.


 Uday.




Fully flow and context sensitive points-to analysis in GCC 4.6.0

2012-10-10 Thread Uday P. Khedker


We have designed and implemented a fully flow and context sensitive points-to 
analysis in gcc-4.6.0. For simplicity, we have made a dynamic plugin available 
at http://www.cse.iitb.ac.in/grc/index.php?page=l-fcpa. This page also provides 
an overview of the method, and links to the paper, slides, and instructions on 
how to use the plugin. Our method questions the conventional wisdom that 
precise flow and context sensitive pointer information is prohibitively large 
and hence we cannot hope to compute it efficiently. We show that the actual 
usable information is rather small and sparse and hence an increase in 
precision actually improves the efficiency too, rather than worsen it.

Needless to say, precise pointer information is critical for the effectiveness 
of almost all analyses and optimizations in any compiler. Now that we have some 
ray of hope of precise points-to analysis in GCC, we would like to invite 
collaboration from like minded people. There is a lot of work that needs to be 
done further; some details of future work are available on the given URL and we 
will be happy to provide more information to the interested people.

Looking forward to hearing from people who would like to contribute to this 
important project.

Thanks and regards,

Uday Khedker.
--
Dr. Uday Khedker
Professor
Department of Computer Science  Engg.
IIT Bombay, Powai, Mumbai 400 076, India.
Email   :   u...@cse.iitb.ac.in
Homepage:   http://www.cse.iitb.ac.in/~uday
Phone   :   
Office -91 (22) 2572 2545 x 7717, 91 (22) 2576 7717 (Direct)
Res.   -91 (22) 2572 2545 x 8717, 91 (22) 2576 8717 (Direct)




--
--
Dr. Uday Khedker
Professor
Department of Computer Science  Engg.
IIT Bombay, Powai, Mumbai 400 076, India.
Email   :   u...@cse.iitb.ac.in
Homepage:   http://www.cse.iitb.ac.in/~uday
Phone   :   
Office -91 (22) 2572 2545 x 7717, 91 (22) 2576 7717 (Direct)
Res.   -91 (22) 2572 2545 x 8717, 91 (22) 2576 8717 (Direct)




Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0

2012-10-10 Thread David Edelsohn
On Wed, Oct 10, 2012 at 1:56 PM, Uday P. Khedker u...@cse.iitb.ac.in wrote:

 We have designed and implemented a fully flow and context sensitive
 points-to analysis in gcc-4.6.0. For simplicity, we have made a dynamic
 plugin available at http://www.cse.iitb.ac.in/grc/index.php?page=l-fcpa.
 This page also provides an overview of the method, and links to the paper,
 slides, and instructions on how to use the plugin. Our method questions the
 conventional wisdom that precise flow and context sensitive pointer
 information is prohibitively large and hence we cannot hope to compute it
 efficiently. We show that the actual usable information is rather small and
 sparse and hence an increase in precision actually improves the efficiency
 too, rather than worsen it.

 Needless to say, precise pointer information is critical for the
 effectiveness of almost all analyses and optimizations in any compiler. Now
 that we have some ray of hope of precise points-to analysis in GCC, we would
 like to invite collaboration from like minded people. There is a lot of work
 that needs to be done further; some details of future work are available on
 the given URL and we will be happy to provide more information to the
 interested people.

Uday,

This is great progress.

If I understand the experiments, your implementtion has very small
cost to perform the analysis, at least for the SPEC benchmarks you are
testing.  Have you connected the analysis to any optimizations?  Is
there any improvement in performance on SPEC CPU or other
applications?

You ask for collaboration, but it's not clear what state the
infrastructure is in, how complete it is, and what more needs to be
done.

Thanks, David


Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0

2012-10-10 Thread Uday P. Khedker

Hi David,


This is great progress.


Thanks.
 

If I understand the experiments, your implementtion has very small
cost to perform the analysis, at least for the SPEC benchmarks you are
testing.  Have you connected the analysis to any optimizations?  Is
there any improvement in performance on SPEC CPU or other
applications?


Unfortunately we could not do this as it requires changing the way the pointer 
information
is made available to other passes. Right now, it is encoded in the TREE nodes of
variables which make is same at all program points. In other words, GCC, by 
definition
assumes flow insensitive information. Supporting flow sensitivity implies being 
able
to provide different information for the same pointer at different program 
points.

We are investigating how this can be done and this is one of the most important 
future
works on which we seek collaboration. Unless we are able to do this, we will 
not be able
to take advantage of the analysis.



You ask for collaboration, but it's not clear what state the
infrastructure is in, how complete it is, and what more needs to be
done.


The infrastructure is in a reasonably complete state except that
 
(a) It is not directly useful to other analyses and optimizations for the

reasons described above.
(b) It uses rather naive and inefficient data structures in which sets are
stored as linked lists and set operations are implemented using linear
searches.
(c) Some corner cases related identifying pointers need to be fixed.
(d) It handles heap locations rather conservatively.

We seek collaborations on (a) and (b). We have designed APIs to hide the data 
structures
and now these are good student projects. Some details can be found at
http://www.cse.iitb.ac.in/grc/gcc-workshop-12/index.php?page=projects#Projects_with_Concrete_and_Detailed.

We are carrying out research on (d) and have some ideas on what needs to be 
done but it will
be some time before the infrastructure is enhanced to include it. We are 
committed to doing it.

Thanks and regards,

Uday.