Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0
On Thu, Jan 24, 2013 at 3:17 PM, Uday P. Khedker u...@cse.iitb.ac.in wrote: Richard Biener wrote, On Thursday 24 January 2013 05:28 PM: In the program below, we have a global pointer p that has conditional assignments before its use on the RHS of a copy statement. - #includestdio.h int **p; int main() { int *x,*y,a,c,*z; x = a; z = c; if(a) p = x; else p = y; int **t = p; printf(%d,%d,*z,*x); } - The assignment t=p does not seem to load p inspite of p being a global pointer. Here's the relevant code fragment from test.c.017t.ssa: bb 3: p = x; goto bb 5; bb 4: p = y; bb 5: t_3 = p; here p is loaded directly into t_3, you can consider it optimized (copy propagated) from D.1234_8 = p; t_3 = D.1234_8; x.1_4 = x; D.2524_5 = *x.1_4; D.2525_6 = *z_1; --- Note that p is global but it has been loaded neither loaded, nor does it have a PHI function. As p resides in memory it does not need a PHI - values are merged by storing them to memory. Yes, but because it is in memory, should it not be loaded? I thought every memory access is preceded by a load. At least that is how I interpreted your original statement. My original statement was that if you have a memory access through a pointer, *p, in C terms then p will be loaded into an SSA name first (if p is not already an SSA name). In your example p is not dereferenced. Thus, more precise every indirect memory access to *p is preceeded by a load of the pointer p to an SSA name. There is no way we can put the pointees of p in SSA_NAME_PTR_INFO of p in a flow sensitive manner. So if a later optimization pass were to find out aliases of p or pointees of p for the statement t_3 = p, I don't know how will we supply the information that was discovered by our analysis. Your analysis should have come across that statement t_3 = p; and record that t_3 points to what p points to at this point of the program, thus compute a solution for all SSA name pointers. Yes that is easy to do. But my concern is what if there is a pass that requires/uses pointer information of p in the statement t_3 = p. Then it would have no way of getting at it. Likewise if there is a global x struct { int i; int *p; double d; } x; and a pass would want to get at pointer information of x.p. But why would any pass want pointer information of p or x.p if there is no use of p or x.p as pointer in the program? And if there is, then the pointer will be loaded to an SSA name and used as SSA name. Thus the pass will never want to query pointer information on memory but only on SSA names (as those are the only ones that get dereferenced and thus used). At least it should work 99% of the cases in this way - and I can't currently think of an example for the remaining 1% - though I'm sure there may be some obscure case. This is what the present points-to algorithm does, and at the end it simply discards anything but the solutions computed for SSA name pointers which are stored in SSA_NAME_PTR_INFO. Oh, does it mean that the SSA_NAME_PTR_INFO would not be present for p but would be present only for t_3 (because p is not an SSA name and t_3 is)? Yes. If this is true, can we assume the invariant that any pass would seek pointer info only for SSA names and not for any non-SSA name? Yes, that is my belief. If that is the case, then I think there is a simple way in which we can make the result of our analysis available to all other passes without they having to consult our data structure. Yes. Richard. Uday.
Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0
Richard Biener wrote, On Thursday 24 January 2013 01:57 AM: On Wed, Jan 23, 2013 at 8:12 PM, Uday Khedker u...@cse.iitb.ac.in wrote: Hi Richard, I am trying to understand the full implications of your statement: Yes, that's what I say. Any pointer that is dereferenced is first copied to an SSA name. Basically everything residing in memory first has to be loaded to an SSA name before it can be dereferenced. That load is explicit in the IL There are many parts and let me take one by one: - You talk about dereference. So a statement p=q would not mean loading q into memory. Had q been a global scalar variable, it would have been loaded into an SSA name. Then why is it not loaded when it is global pointer. Why a load only for p=*q and not for p=q? If q is a global pointer it is of course also first loaded into an SSA name. - When we have p=*q, agreed that we are loading a value from memory but then it is *q that is being loaded. Why should an SSA name be created for q even when q is a local pointer? If q is a local pointer that does not have its address taken then it will be in SSA form. If it is not in SSA form then q is first loaded into an SSA name before it can be dereferenced. What am I missing? If I remember the discussion correctly it was about where to put points-to results. Yes that is the context and we are trying to devise a way of implementing your suggestion. My student Pritam has come up with an example that seems to differ from your explanation (and we don't know how to proceed). In the program below, we have a global pointer p that has conditional assignments before its use on the RHS of a copy statement. - #includestdio.h int **p; int main() { int *x,*y,a,c,*z; x = a; z = c; if(a) p = x; else p = y; int **t = p; printf(%d,%d,*z,*x); } - The assignment t=p does not seem to load p inspite of p being a global pointer. Here's the relevant code fragment from test.c.017t.ssa: bb 3: p = x; goto bb 5; bb 4: p = y; bb 5: t_3 = p; x.1_4 = x; D.2524_5 = *x.1_4; D.2525_6 = *z_1; --- Note that p is global but it has been loaded neither loaded, nor does it have a PHI function. There is no way we can put the pointees of p in SSA_NAME_PTR_INFO of p in a flow sensitive manner. So if a later optimization pass were to find out aliases of p or pointees of p for the statement t_3 = p, I don't know how will we supply the information that was discovered by our analysis. Uday.
Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0
On Thu, Jan 24, 2013 at 9:02 AM, Uday P. Khedker u...@cse.iitb.ac.in wrote: Richard Biener wrote, On Thursday 24 January 2013 01:57 AM: On Wed, Jan 23, 2013 at 8:12 PM, Uday Khedker u...@cse.iitb.ac.in wrote: Hi Richard, I am trying to understand the full implications of your statement: Yes, that's what I say. Any pointer that is dereferenced is first copied to an SSA name. Basically everything residing in memory first has to be loaded to an SSA name before it can be dereferenced. That load is explicit in the IL There are many parts and let me take one by one: - You talk about dereference. So a statement p=q would not mean loading q into memory. Had q been a global scalar variable, it would have been loaded into an SSA name. Then why is it not loaded when it is global pointer. Why a load only for p=*q and not for p=q? If q is a global pointer it is of course also first loaded into an SSA name. - When we have p=*q, agreed that we are loading a value from memory but then it is *q that is being loaded. Why should an SSA name be created for q even when q is a local pointer? If q is a local pointer that does not have its address taken then it will be in SSA form. If it is not in SSA form then q is first loaded into an SSA name before it can be dereferenced. What am I missing? If I remember the discussion correctly it was about where to put points-to results. Yes that is the context and we are trying to devise a way of implementing your suggestion. My student Pritam has come up with an example that seems to differ from your explanation (and we don't know how to proceed). In the program below, we have a global pointer p that has conditional assignments before its use on the RHS of a copy statement. - #includestdio.h int **p; int main() { int *x,*y,a,c,*z; x = a; z = c; if(a) p = x; else p = y; int **t = p; printf(%d,%d,*z,*x); } - The assignment t=p does not seem to load p inspite of p being a global pointer. Here's the relevant code fragment from test.c.017t.ssa: bb 3: p = x; goto bb 5; bb 4: p = y; bb 5: t_3 = p; here p is loaded directly into t_3, you can consider it optimized (copy propagated) from D.1234_8 = p; t_3 = D.1234_8; x.1_4 = x; D.2524_5 = *x.1_4; D.2525_6 = *z_1; --- Note that p is global but it has been loaded neither loaded, nor does it have a PHI function. As p resides in memory it does not need a PHI - values are merged by storing them to memory. There is no way we can put the pointees of p in SSA_NAME_PTR_INFO of p in a flow sensitive manner. So if a later optimization pass were to find out aliases of p or pointees of p for the statement t_3 = p, I don't know how will we supply the information that was discovered by our analysis. Your analysis should have come across that statement t_3 = p; and record that t_3 points to what p points to at this point of the program, thus compute a solution for all SSA name pointers. This is what the present points-to algorithm does, and at the end it simply discards anything but the solutions computed for SSA name pointers which are stored in SSA_NAME_PTR_INFO. What the present points-to fails to do is to properly handle changes of 'p' in a flow-sensitive manner if you'd consider another store to p like p = z; after t_3 = p; then if no other transform would have optimized this in some way then it would compute that t_3 also may point to z (then it of course also does not handle killing defs of memory). Hope this helps, Richard. Uday.
Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0
Richard Biener wrote, On Thursday 24 January 2013 05:28 PM: In the program below, we have a global pointer p that has conditional assignments before its use on the RHS of a copy statement. - #includestdio.h int **p; int main() { int *x,*y,a,c,*z; x = a; z = c; if(a) p = x; else p = y; int **t = p; printf(%d,%d,*z,*x); } - The assignment t=p does not seem to load p inspite of p being a global pointer. Here's the relevant code fragment from test.c.017t.ssa: bb 3: p = x; goto bb 5; bb 4: p = y; bb 5: t_3 = p; here p is loaded directly into t_3, you can consider it optimized (copy propagated) from D.1234_8 = p; t_3 = D.1234_8; x.1_4 = x; D.2524_5 = *x.1_4; D.2525_6 = *z_1; --- Note that p is global but it has been loaded neither loaded, nor does it have a PHI function. As p resides in memory it does not need a PHI - values are merged by storing them to memory. Yes, but because it is in memory, should it not be loaded? I thought every memory access is preceded by a load. At least that is how I interpreted your original statement. There is no way we can put the pointees of p in SSA_NAME_PTR_INFO of p in a flow sensitive manner. So if a later optimization pass were to find out aliases of p or pointees of p for the statement t_3 = p, I don't know how will we supply the information that was discovered by our analysis. Your analysis should have come across that statement t_3 = p; and record that t_3 points to what p points to at this point of the program, thus compute a solution for all SSA name pointers. Yes that is easy to do. But my concern is what if there is a pass that requires/uses pointer information of p in the statement t_3 = p. This is what the present points-to algorithm does, and at the end it simply discards anything but the solutions computed for SSA name pointers which are stored in SSA_NAME_PTR_INFO. Oh, does it mean that the SSA_NAME_PTR_INFO would not be present for p but would be present only for t_3 (because p is not an SSA name and t_3 is)? If this is true, can we assume the invariant that any pass would seek pointer info only for SSA names and not for any non-SSA name? If that is the case, then I think there is a simple way in which we can make the result of our analysis available to all other passes without they having to consult our data structure. Uday.
Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0
Hi Richard, I am trying to understand the full implications of your statement: Yes, that's what I say. Any pointer that is dereferenced is first copied to an SSA name. Basically everything residing in memory first has to be loaded to an SSA name before it can be dereferenced. That load is explicit in the IL There are many parts and let me take one by one: - You talk about dereference. So a statement p=q would not mean loading q into memory. Had q been a global scalar variable, it would have been loaded into an SSA name. Then why is it not loaded when it is global pointer. Why a load only for p=*q and not for p=q? - When we have p=*q, agreed that we are loading a value from memory but then it is *q that is being loaded. Why should an SSA name be created for q even when q is a local pointer? What am I missing? Uday. On Friday 12 October 2012 03:25 PM, Uday P. Khedker wrote: Excellent! Thanks. Uday. Richard Biener wrote, On Friday 12 October 2012 03:20 PM: On Fri, Oct 12, 2012 at 11:46 AM, Uday P. Khedker u...@cse.iitb.ac.in wrote: Richard Biener wrote, On Friday 12 October 2012 02:51 PM: we _always_ see ssa_name_1 = a; use (ssa_name_1); so you have a place to associate your flow-sensitive and context-sensitive points-to-info with (the SSA name). My point is that for _using_ the points-to info flow-sensitivity provided by SSA form is enough. For computing it you of course need to flow-sensitively process assignments to 'a'. This is VERY interesting! I had not thought about the difference between computing and using values. Now that you point it out, I think all we need to do is to map flow-sensitively computed values to ssa names. What about variables that do not have ssa names? Or are you saying that all such variables would be copied into an artificial variables that have ssa names? I seem to observe this in the dumps but I don't know if it holds in general. Yes, that's what I say. Any pointer that is dereferenced is first copied to an SSA name. Basically everything residing in memory first has to be loaded to an SSA name before it can be dereferenced. That load is explicit in the IL so you should already compute points-to sets for the SSA name destination of the load. Richard. Uday.
Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0
On Wed, Jan 23, 2013 at 8:12 PM, Uday Khedker u...@cse.iitb.ac.in wrote: Hi Richard, I am trying to understand the full implications of your statement: Yes, that's what I say. Any pointer that is dereferenced is first copied to an SSA name. Basically everything residing in memory first has to be loaded to an SSA name before it can be dereferenced. That load is explicit in the IL There are many parts and let me take one by one: - You talk about dereference. So a statement p=q would not mean loading q into memory. Had q been a global scalar variable, it would have been loaded into an SSA name. Then why is it not loaded when it is global pointer. Why a load only for p=*q and not for p=q? If q is a global pointer it is of course also first loaded into an SSA name. - When we have p=*q, agreed that we are loading a value from memory but then it is *q that is being loaded. Why should an SSA name be created for q even when q is a local pointer? If q is a local pointer that does not have its address taken then it will be in SSA form. If it is not in SSA form then q is first loaded into an SSA name before it can be dereferenced. What am I missing? If I remember the discussion correctly it was about where to put points-to results. I said that fully flow-sensitive points-to info can be attached to SSA name definitions via SSA_NAME_PTR_INFO (as it is done now) and that this is the only place where we are looking at this information via the alias oracle. Because all indirect memory references - which are those with a non-trivial points-to set - happen through an SSA name pointer. The existing points-to solver of course keeps points-to sets for more things internally to be able to precisely deal with (sub-)memory objects. Internally it does so in a flow-insensitive way. At the end of the solving process all is translated to points-to solutions for the existing SSA name pointers in the program. Maybe that clarifies my answer - if not, can you re-phrase your question again? Thanks, Richard. Uday. On Friday 12 October 2012 03:25 PM, Uday P. Khedker wrote: Excellent! Thanks. Uday. Richard Biener wrote, On Friday 12 October 2012 03:20 PM: On Fri, Oct 12, 2012 at 11:46 AM, Uday P. Khedker u...@cse.iitb.ac.in wrote: Richard Biener wrote, On Friday 12 October 2012 02:51 PM: we _always_ see ssa_name_1 = a; use (ssa_name_1); so you have a place to associate your flow-sensitive and context-sensitive points-to-info with (the SSA name). My point is that for _using_ the points-to info flow-sensitivity provided by SSA form is enough. For computing it you of course need to flow-sensitively process assignments to 'a'. This is VERY interesting! I had not thought about the difference between computing and using values. Now that you point it out, I think all we need to do is to map flow-sensitively computed values to ssa names. What about variables that do not have ssa names? Or are you saying that all such variables would be copied into an artificial variables that have ssa names? I seem to observe this in the dumps but I don't know if it holds in general. Yes, that's what I say. Any pointer that is dereferenced is first copied to an SSA name. Basically everything residing in memory first has to be loaded to an SSA name before it can be dereferenced. That load is explicit in the IL so you should already compute points-to sets for the SSA name destination of the load. Richard. Uday.
Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0
On Saturday 13 October 2012 02:34 AM, Xinliang David Li wrote: Somewhere it is mentioned that heap is handled conservatively. Does it mean the algorithm can not disambiguate heap objects all all, or it can but does not track pointer values stored in heap objects? How about field sensitivity? For many programs (mostly C++ ones), this is very important. For CS, virtual calls in C++ will be one of the the biggest challenges. How is that dealt with? It can be combined with type propagation/analysis. thanks, David My apologies for the delay. I wanted to check some details with the original implementor Prashant who had his exams. Here's my final response. - In our earlier version, we represented heap locations in terms of base and offset pairs and all locations allocated at the same pointed were treated alike. We used pointer arithmetic to map offset calculation to find out which heap locations can possibly coincide. Where we could not determine unambiguously, we assumed that the location could coincide with any other location. - For non-heap structures (allocated on stack or static area), our analysis is completely field sensitive. - In our updated implementation, we have removed offsets completely for simplicity and made the implementation more conservative. This was primarily to simplify the implementation and gain some efficiency. But a bigger reason is that we have a much better technique of heap analysis in the pipeline. It essentially uses the same abstract idea of restricting pointer information to live data but would use heap liveness analysis [Khedker-Sanyal-Karkare, TOPLAS 2007]. This analysis discovers accesses in deep heap through a data flow analysis that represents the this data flow information flow sensitively in the form of graphs rooted at stack variables. - We handle calls through function pointers by adjusting the call graph as and when we get pointees of the function pointers. In case of a function pointer fptr, we see what fptr may point to, and consider all those functions called from that point. We haven't looked at GCC's IR for a C++ program with virtual functions. But maybe we can store the type of the classes pointed to by an object during the pointer analysis to determine the appropriate function that needs to be invoked. Thanks and regards, Uday.
Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0
Andrew Pinski wrote, On Friday 12 October 2012 11:26 AM: Except the problem here is just about what f1 clobbers. Since a has not escaped by the time f1 is called, f1 cannot clobber a (or d). http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23384 for reference on why GCC gets this incorrect. GCC does not have a flow sensitive clobber list yet. Agreed. However, two pertinent points are: - I can replace the call to f1 by some statically indeterminate boolean condition that does not involve any function call. So my basic point although details could vary. - It is this flow insensitivity that I dislike :-) I have been trying for over five years to do away with flow and context insensitivity without hampering efficiency. Now I know that avoiding them could actually aid efficiency, apart from the guaranteed benefit of precision :-) Uday.
Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0
On Fri, Oct 12, 2012 at 7:41 AM, Uday P. Khedker u...@cse.iitb.ac.in wrote: Andrew Pinski wrote, On Friday 12 October 2012 10:29 AM: Here's an example: main() { int **p; int *a, *d; int w, x; a = w; f1(a); p = a; a = x; f2(p); d = a; return *d; } It is clear that d can only point to x and can never point to w. I think you are wrong there. int *a1; void f1(int *a) { a1 = a; } void f2(int **p) { *p = a1; } That will change a to w after f2 is called. So it looks like your aliasing analysis does not take into account escaping like it should. This is the whole point of marking a as escaped. Maybe I missed something here though but d can point w with my functions for f1 and f2. Ah, you caught me there, but I think I can escape, at least in this situation :-) The call to f1 is not central to the point I am making; I had included it only to ensure that the assignment a=w doesn't get eliminated by dead code elimination. Since you decided to hold the address of a into a1 through function f1, let me eliminate the call to f1 and make the assignment a=w live in some other way. Here's the changed code: main() { int **p; int *a, *d; int w, x; d = x; a = w; if (f1()) { p = a; a = x; f2(p); d = a; } return *d + *a; } Now when f2 is called, a definitely does not point to w. Hence d should not point to w. And yet, the dump shows that d continue to point to w. In any case, your point about escaping variables strengthens my point about inappropriateness of SSA for pointer analysis, although not through this example. The point is we are only ever interested at what SSA names point to when we use points-to information. Even the current IPA-PTA code uses internal representation for representing globals and aggregates (but yes, not flow-sensitively). For int *a; foo () { a = ...; ... use (a); } we _always_ see ssa_name_1 = a; use (ssa_name_1); so you have a place to associate your flow-sensitive and context-sensitive points-to-info with (the SSA name). My point is that for _using_ the points-to info flow-sensitivity provided by SSA form is enough. For computing it you of course need to flow-sensitively process assignments to 'a'. Richard. Uday.
Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0
Richard Biener wrote, On Friday 12 October 2012 02:51 PM: we _always_ see ssa_name_1 = a; use (ssa_name_1); so you have a place to associate your flow-sensitive and context-sensitive points-to-info with (the SSA name). My point is that for _using_ the points-to info flow-sensitivity provided by SSA form is enough. For computing it you of course need to flow-sensitively process assignments to 'a'. This is VERY interesting! I had not thought about the difference between computing and using values. Now that you point it out, I think all we need to do is to map flow-sensitively computed values to ssa names. What about variables that do not have ssa names? Or are you saying that all such variables would be copied into an artificial variables that have ssa names? I seem to observe this in the dumps but I don't know if it holds in general. Uday.
Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0
On Fri, Oct 12, 2012 at 11:46 AM, Uday P. Khedker u...@cse.iitb.ac.in wrote: Richard Biener wrote, On Friday 12 October 2012 02:51 PM: we _always_ see ssa_name_1 = a; use (ssa_name_1); so you have a place to associate your flow-sensitive and context-sensitive points-to-info with (the SSA name). My point is that for _using_ the points-to info flow-sensitivity provided by SSA form is enough. For computing it you of course need to flow-sensitively process assignments to 'a'. This is VERY interesting! I had not thought about the difference between computing and using values. Now that you point it out, I think all we need to do is to map flow-sensitively computed values to ssa names. What about variables that do not have ssa names? Or are you saying that all such variables would be copied into an artificial variables that have ssa names? I seem to observe this in the dumps but I don't know if it holds in general. Yes, that's what I say. Any pointer that is dereferenced is first copied to an SSA name. Basically everything residing in memory first has to be loaded to an SSA name before it can be dereferenced. That load is explicit in the IL so you should already compute points-to sets for the SSA name destination of the load. Richard. Uday.
Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0
Excellent! Thanks. Uday. Richard Biener wrote, On Friday 12 October 2012 03:20 PM: On Fri, Oct 12, 2012 at 11:46 AM, Uday P. Khedker u...@cse.iitb.ac.in wrote: Richard Biener wrote, On Friday 12 October 2012 02:51 PM: we _always_ see ssa_name_1 = a; use (ssa_name_1); so you have a place to associate your flow-sensitive and context-sensitive points-to-info with (the SSA name). My point is that for _using_ the points-to info flow-sensitivity provided by SSA form is enough. For computing it you of course need to flow-sensitively process assignments to 'a'. This is VERY interesting! I had not thought about the difference between computing and using values. Now that you point it out, I think all we need to do is to map flow-sensitively computed values to ssa names. What about variables that do not have ssa names? Or are you saying that all such variables would be copied into an artificial variables that have ssa names? I seem to observe this in the dumps but I don't know if it holds in general. Yes, that's what I say. Any pointer that is dereferenced is first copied to an SSA name. Basically everything residing in memory first has to be loaded to an SSA name before it can be dereferenced. That load is explicit in the IL so you should already compute points-to sets for the SSA name destination of the load. Richard. Uday.
Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0
Somewhere it is mentioned that heap is handled conservatively. Does it mean the algorithm can not disambiguate heap objects all all, or it can but does not track pointer values stored in heap objects? How about field sensitivity? For many programs (mostly C++ ones), this is very important. For CS, virtual calls in C++ will be one of the the biggest challenges. How is that dealt with? It can be combined with type propagation/analysis. thanks, David On Wed, Oct 10, 2012 at 10:56 AM, Uday P. Khedker u...@cse.iitb.ac.in wrote: We have designed and implemented a fully flow and context sensitive points-to analysis in gcc-4.6.0. For simplicity, we have made a dynamic plugin available at http://www.cse.iitb.ac.in/grc/index.php?page=l-fcpa. This page also provides an overview of the method, and links to the paper, slides, and instructions on how to use the plugin. Our method questions the conventional wisdom that precise flow and context sensitive pointer information is prohibitively large and hence we cannot hope to compute it efficiently. We show that the actual usable information is rather small and sparse and hence an increase in precision actually improves the efficiency too, rather than worsen it. Needless to say, precise pointer information is critical for the effectiveness of almost all analyses and optimizations in any compiler. Now that we have some ray of hope of precise points-to analysis in GCC, we would like to invite collaboration from like minded people. There is a lot of work that needs to be done further; some details of future work are available on the given URL and we will be happy to provide more information to the interested people. Looking forward to hearing from people who would like to contribute to this important project. Thanks and regards, Uday Khedker. -- Dr. Uday Khedker Professor Department of Computer Science Engg. IIT Bombay, Powai, Mumbai 400 076, India. Email : u...@cse.iitb.ac.in Homepage: http://www.cse.iitb.ac.in/~uday Phone : Office -91 (22) 2572 2545 x 7717, 91 (22) 2576 7717 (Direct) Res. -91 (22) 2572 2545 x 8717, 91 (22) 2576 8717 (Direct) -- -- Dr. Uday Khedker Professor Department of Computer Science Engg. IIT Bombay, Powai, Mumbai 400 076, India. Email : u...@cse.iitb.ac.in Homepage: http://www.cse.iitb.ac.in/~uday Phone : Office -91 (22) 2572 2545 x 7717, 91 (22) 2576 7717 (Direct) Res. -91 (22) 2572 2545 x 8717, 91 (22) 2576 8717 (Direct)
Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0
On Thu, Oct 11, 2012 at 5:04 AM, Uday P. Khedker u...@cse.iitb.ac.in wrote: Hi David, This is great progress. Thanks. If I understand the experiments, your implementtion has very small cost to perform the analysis, at least for the SPEC benchmarks you are testing. Have you connected the analysis to any optimizations? Is there any improvement in performance on SPEC CPU or other applications? Unfortunately we could not do this as it requires changing the way the pointer information is made available to other passes. Right now, it is encoded in the TREE nodes of variables which make is same at all program points. In other words, GCC, by definition assumes flow insensitive information. Supporting flow sensitivity implies being able to provide different information for the same pointer at different program points. That's actually not true. In fact existing GCC pointer analysis is flow-sensitive for all SSA pointers. Points-to information is associated with each (and only with!) SSA pointer variable via SSA_NAME_PTR_INFO. We are investigating how this can be done and this is one of the most important future works on which we seek collaboration. Unless we are able to do this, we will not be able to take advantage of the analysis. I suggest to look at the above and the disambiguators that use points-to information in tree-ssa-alias.c (and their helpers in tree-ssa-struct-alias.c). You ask for collaboration, but it's not clear what state the infrastructure is in, how complete it is, and what more needs to be done. The infrastructure is in a reasonably complete state except that (a) It is not directly useful to other analyses and optimizations for the reasons described above. (b) It uses rather naive and inefficient data structures in which sets are stored as linked lists and set operations are implemented using linear searches. (c) Some corner cases related identifying pointers need to be fixed. (d) It handles heap locations rather conservatively. We seek collaborations on (a) and (b). We have designed APIs to hide the data structures and now these are good student projects. Some details can be found at http://www.cse.iitb.ac.in/grc/gcc-workshop-12/index.php?page=projects#Projects_with_Concrete_and_Detailed. We are carrying out research on (d) and have some ideas on what needs to be done but it will be some time before the infrastructure is enhanced to include it. We are committed to doing it. I'd be interested to know how you analysis (is it interprocedural at all?) scales on programs the existing IPA PTA blows up (in both compile-time and memory-usage, usually first memory-usage ;)). Did you investigate on how to handle whole-program PTA at link-time with our distributed WHOPR mode (thus at no point a single compiler process sees all function bodies of the program but WPA phase sees the whole program callgraph). Thanks, Richard. Thanks and regards, Uday.
Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0
That's actually not true. In fact existing GCC pointer analysis is flow-sensitive for all SSA pointers. SSA provides partial flow sensitivity to the top level pointers. For deeper pointers, one needs to interleave SSA and points-to analysis. Besides, it cannot handle global pointers which are important at the interprocedural level. Points-to information is associated with each (and only with!) SSA pointer variable via SSA_NAME_PTR_INFO. I have enclosed a program fragment and dump fragment that shows (a) flow insensitivity of GCC's points-to analysis (b) points-to information for non-SSA variables. At the same time, yes I reckon that the points-to information is encoded in SSA_NAME_PTR_INFO and I wondered why the prefix SSA. Could you help me by explaining the dump? I suggest to look at the above and the disambiguators that use points-to information in tree-ssa-alias.c (and their helpers in tree-ssa-struct-alias.c). Since we are interested in interprocedural analysis, we would like to include global variables and pointers at all levels of indirection. Hence we do not depend on SSA variables. However, our analysis can become more efficient by separating SSA names and other names. I'd be interested to know how you analysis (is it interprocedural at all?) Yes it is interprocedural and does not introduce approximations even in the presence of recursion. Without interprocedural analysis, pointer analysis is not much useful. We use full call strings method (Sharir-Pnueli, 1981) for full flow and context sensitivity but use value based termination (Khedker-Karkare, CC 2008) that reduces the time and space by orders of magnitude (actually by a factor of a million for recursive programs). scales on programs the existing IPA PTA blows up (in both compile-time and memory-usage, usually first memory-usage ;)). You are right. As our paper shows, even with our value based termination, the blow up is significant. Hence we factor in strong liveness. This reduces the size of information dramatically without missing out on any useful information. And there are more avenues of speeding up our analysis at the algorithmic level (and I am not counting the use of better data structures which is the most obvious thing to do.) Did you investigate on how to handle whole-program PTA at link-time with our distributed WHOPR mode (thus at no point a single compiler process sees all function bodies of the program but WPA phase sees the whole program callgraph). We have use LTO framework but did not use WHOPR mode because context sensitive analysis is not possible for recursive programs by creating context independent summaries by looking at one procedure at a time. Well, one can always compromise on precision but we did not want to do that. So we use -flto-partition=none to load all procedure bodies. Thanks. Uday. - Program fragment - #include stdio.h int a, b, c, *e; int main() { if (a == b) e = c; /* statement n1 */ else e = b; /* statement n2 */ e = a; /* statement n3 */ p(); } p() { printf (%d, e); } Dump fragment. Note that flow sensitive analysis should say that - e points only to c after n1, - e points only to b after n2, - e points only to c and b before n3, - e points only a after n3. However the analysis simply says that e points to a, b, c (ap0art from ESCAPED and NONLOCAL). Points-to sets NULL = { } ANYTHING = { ANYTHING } READONLY = { READONLY } ESCAPED = { READONLY ESCAPED NONLOCAL a b c } NONLOCAL = { ESCAPED NONLOCAL } CALLUSED = { } STOREDANYTHING = { } INTEGER = { ANYTHING } e.0_1 = same as e e = { ESCAPED NONLOCAL a b c } a.1_1 = { ESCAPED NONLOCAL } a = same as a.1_1 b.2_2 = { ESCAPED NONLOCAL } b = same as b.2_2 c = { ESCAPED NONLOCAL }
Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0
On Thu, Oct 11, 2012 at 11:53 AM, Uday P. Khedker u...@cse.iitb.ac.in wrote: That's actually not true. In fact existing GCC pointer analysis is flow-sensitive for all SSA pointers. SSA provides partial flow sensitivity to the top level pointers. For deeper pointers, one needs to interleave SSA and points-to analysis. Besides, it cannot handle global pointers which are important at the interprocedural level. Yes, for global pointers, but in GIMPLE we do not have 'deeper' pointers. Unless I'm misunderstanding you. By 'deep pointers' you mean 'foo-ptr1-ptr2'? There are no memory expressions of that kind in GIMPLE. Diego.
Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0
Diego Novillo wrote, On Friday 12 October 2012 01:41 AM: On Thu, Oct 11, 2012 at 11:53 AM, Uday P. Khedker u...@cse.iitb.ac.in wrote: That's actually not true. In fact existing GCC pointer analysis is flow-sensitive for all SSA pointers. SSA provides partial flow sensitivity to the top level pointers. For deeper pointers, one needs to interleave SSA and points-to analysis. Besides, it cannot handle global pointers which are important at the interprocedural level. Yes, for global pointers, but in GIMPLE we do not have 'deeper' pointers. Unless I'm misunderstanding you. By 'deep pointers' you mean 'foo-ptr1-ptr2'? There are no memory expressions of that kind in GIMPLE. I meant pointers that are pointed to by other pointers. However I see that GCC manages to solve common cases as a consequence of other optimizations so I will need some time to create an example that shows problems with deeper pointers but my main reason for not using SSA is that it seems fine for local analysis of scalars. When we introduce globals (the example that I gave in my previous mail http://gcc.gnu.org/ml/gcc/2012-10/msg00164.html) OR have function calls to which we pass pointers, SSA is helpless. Here's an example: main() { int **p; int *a, *d; int w, x; a = w; f1(a); p = a; a = x; f2(p); d = a; return *d; } It is clear that d can only point to x and can never point to w. However, the points-to sets dumped in file .052i.pta show that d can point to w. ANYTHING = { ANYTHING } READONLY = { READONLY } ESCAPED = { ESCAPED NONLOCAL a w x } same as main.clobber NONLOCAL = { ESCAPED NONLOCAL } same as f1 STOREDANYTHING = { } INTEGER = { ANYTHING } main.clobber = { ESCAPED NONLOCAL a w x } main.use = { ESCAPED NONLOCAL a w x } same as main.clobber main.result = { ESCAPED NONLOCAL } same as D.1958_4 main.varargs = { } a = { ESCAPED NONLOCAL w x } same as a.0_1 w = { ESCAPED NONLOCAL } a.0_1 = { ESCAPED NONLOCAL w x } f1 = { ESCAPED NONLOCAL } x = { ESCAPED NONLOCAL } f2 = { ESCAPED NONLOCAL } same as f1 d_3 = { ESCAPED NONLOCAL w x } same as a.0_1 D.1958_4 = { ESCAPED NONLOCAL } The basic point I am trying to make is that SSA is primarily defined for locally scoped scalars and works excellently for them. In some cases it can be extended to handle other situations too. However, for pointers, starting from the first principles could be cleaner (and certainly more precise). Uday.
Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0
On Thu, Oct 11, 2012 at 9:41 PM, Uday P. Khedker u...@cse.iitb.ac.in wrote: Diego Novillo wrote, On Friday 12 October 2012 01:41 AM: On Thu, Oct 11, 2012 at 11:53 AM, Uday P. Khedker u...@cse.iitb.ac.in wrote: That's actually not true. In fact existing GCC pointer analysis is flow-sensitive for all SSA pointers. SSA provides partial flow sensitivity to the top level pointers. For deeper pointers, one needs to interleave SSA and points-to analysis. Besides, it cannot handle global pointers which are important at the interprocedural level. Yes, for global pointers, but in GIMPLE we do not have 'deeper' pointers. Unless I'm misunderstanding you. By 'deep pointers' you mean 'foo-ptr1-ptr2'? There are no memory expressions of that kind in GIMPLE. I meant pointers that are pointed to by other pointers. However I see that GCC manages to solve common cases as a consequence of other optimizations so I will need some time to create an example that shows problems with deeper pointers but my main reason for not using SSA is that it seems fine for local analysis of scalars. When we introduce globals (the example that I gave in my previous mail http://gcc.gnu.org/ml/gcc/2012-10/msg00164.html) OR have function calls to which we pass pointers, SSA is helpless. Here's an example: main() { int **p; int *a, *d; int w, x; a = w; f1(a); p = a; a = x; f2(p); d = a; return *d; } It is clear that d can only point to x and can never point to w. I think you are wrong there. int *a1; void f1(int *a) { a1 = a; } void f2(int **p) { *p = a1; } That will change a to w after f2 is called. So it looks like your aliasing analysis does not take into account escaping like it should. This is the whole point of marking a as escaped. Maybe I missed something here though but d can point w with my functions for f1 and f2. Thanks, Andrew Pinski However, the points-to sets dumped in file .052i.pta show that d can point to w. ANYTHING = { ANYTHING } READONLY = { READONLY } ESCAPED = { ESCAPED NONLOCAL a w x } same as main.clobber NONLOCAL = { ESCAPED NONLOCAL } same as f1 STOREDANYTHING = { } INTEGER = { ANYTHING } main.clobber = { ESCAPED NONLOCAL a w x } main.use = { ESCAPED NONLOCAL a w x } same as main.clobber main.result = { ESCAPED NONLOCAL } same as D.1958_4 main.varargs = { } a = { ESCAPED NONLOCAL w x } same as a.0_1 w = { ESCAPED NONLOCAL } a.0_1 = { ESCAPED NONLOCAL w x } f1 = { ESCAPED NONLOCAL } x = { ESCAPED NONLOCAL } f2 = { ESCAPED NONLOCAL } same as f1 d_3 = { ESCAPED NONLOCAL w x } same as a.0_1 D.1958_4 = { ESCAPED NONLOCAL } The basic point I am trying to make is that SSA is primarily defined for locally scoped scalars and works excellently for them. In some cases it can be extended to handle other situations too. However, for pointers, starting from the first principles could be cleaner (and certainly more precise). Uday.
Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0
Andrew Pinski wrote, On Friday 12 October 2012 10:29 AM: Here's an example: main() { int **p; int *a, *d; int w, x; a = w; f1(a); p = a; a = x; f2(p); d = a; return *d; } It is clear that d can only point to x and can never point to w. I think you are wrong there. int *a1; void f1(int *a) { a1 = a; } void f2(int **p) { *p = a1; } That will change a to w after f2 is called. So it looks like your aliasing analysis does not take into account escaping like it should. This is the whole point of marking a as escaped. Maybe I missed something here though but d can point w with my functions for f1 and f2. Ah, you caught me there, but I think I can escape, at least in this situation :-) The call to f1 is not central to the point I am making; I had included it only to ensure that the assignment a=w doesn't get eliminated by dead code elimination. Since you decided to hold the address of a into a1 through function f1, let me eliminate the call to f1 and make the assignment a=w live in some other way. Here's the changed code: main() { int **p; int *a, *d; int w, x; d = x; a = w; if (f1()) { p = a; a = x; f2(p); d = a; } return *d + *a; } Now when f2 is called, a definitely does not point to w. Hence d should not point to w. And yet, the dump shows that d continue to point to w. In any case, your point about escaping variables strengthens my point about inappropriateness of SSA for pointer analysis, although not through this example. Uday.
Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0
decided to hold the address of a into a1 through function f1, let me eliminate the call to f1 and make the assignment a=w live in some other way. Here's the changed code: Please read it as eliminate the call passing a to f1. Uday.
Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0
On Thu, Oct 11, 2012 at 10:41 PM, Uday P. Khedker u...@cse.iitb.ac.in wrote: Andrew Pinski wrote, On Friday 12 October 2012 10:29 AM: Here's an example: main() { int **p; int *a, *d; int w, x; a = w; f1(a); p = a; a = x; f2(p); d = a; return *d; } It is clear that d can only point to x and can never point to w. I think you are wrong there. int *a1; void f1(int *a) { a1 = a; } void f2(int **p) { *p = a1; } That will change a to w after f2 is called. So it looks like your aliasing analysis does not take into account escaping like it should. This is the whole point of marking a as escaped. Maybe I missed something here though but d can point w with my functions for f1 and f2. Ah, you caught me there, but I think I can escape, at least in this situation :-) The call to f1 is not central to the point I am making; I had included it only to ensure that the assignment a=w doesn't get eliminated by dead code elimination. Since you decided to hold the address of a into a1 through function f1, let me eliminate the call to f1 and make the assignment a=w live in some other way. Here's the changed code: main() { int **p; int *a, *d; int w, x; d = x; a = w; if (f1()) { p = a; a = x; f2(p); d = a; } return *d + *a; } Now when f2 is called, a definitely does not point to w. Hence d should not point to w. And yet, the dump shows that d continue to point to w. In any case, your point about escaping variables strengthens my point about inappropriateness of SSA for pointer analysis, although not through this example. Except the problem here is just about what f1 clobbers. Since a has not escaped by the time f1 is called, f1 cannot clobber a (or d). http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23384 for reference on why GCC gets this incorrect. GCC does not have a flow sensitive clobber list yet. Uday.
Fully flow and context sensitive points-to analysis in GCC 4.6.0
We have designed and implemented a fully flow and context sensitive points-to analysis in gcc-4.6.0. For simplicity, we have made a dynamic plugin available at http://www.cse.iitb.ac.in/grc/index.php?page=l-fcpa. This page also provides an overview of the method, and links to the paper, slides, and instructions on how to use the plugin. Our method questions the conventional wisdom that precise flow and context sensitive pointer information is prohibitively large and hence we cannot hope to compute it efficiently. We show that the actual usable information is rather small and sparse and hence an increase in precision actually improves the efficiency too, rather than worsen it. Needless to say, precise pointer information is critical for the effectiveness of almost all analyses and optimizations in any compiler. Now that we have some ray of hope of precise points-to analysis in GCC, we would like to invite collaboration from like minded people. There is a lot of work that needs to be done further; some details of future work are available on the given URL and we will be happy to provide more information to the interested people. Looking forward to hearing from people who would like to contribute to this important project. Thanks and regards, Uday Khedker. -- Dr. Uday Khedker Professor Department of Computer Science Engg. IIT Bombay, Powai, Mumbai 400 076, India. Email : u...@cse.iitb.ac.in Homepage: http://www.cse.iitb.ac.in/~uday Phone : Office -91 (22) 2572 2545 x 7717, 91 (22) 2576 7717 (Direct) Res. -91 (22) 2572 2545 x 8717, 91 (22) 2576 8717 (Direct) -- -- Dr. Uday Khedker Professor Department of Computer Science Engg. IIT Bombay, Powai, Mumbai 400 076, India. Email : u...@cse.iitb.ac.in Homepage: http://www.cse.iitb.ac.in/~uday Phone : Office -91 (22) 2572 2545 x 7717, 91 (22) 2576 7717 (Direct) Res. -91 (22) 2572 2545 x 8717, 91 (22) 2576 8717 (Direct)
Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0
On Wed, Oct 10, 2012 at 1:56 PM, Uday P. Khedker u...@cse.iitb.ac.in wrote: We have designed and implemented a fully flow and context sensitive points-to analysis in gcc-4.6.0. For simplicity, we have made a dynamic plugin available at http://www.cse.iitb.ac.in/grc/index.php?page=l-fcpa. This page also provides an overview of the method, and links to the paper, slides, and instructions on how to use the plugin. Our method questions the conventional wisdom that precise flow and context sensitive pointer information is prohibitively large and hence we cannot hope to compute it efficiently. We show that the actual usable information is rather small and sparse and hence an increase in precision actually improves the efficiency too, rather than worsen it. Needless to say, precise pointer information is critical for the effectiveness of almost all analyses and optimizations in any compiler. Now that we have some ray of hope of precise points-to analysis in GCC, we would like to invite collaboration from like minded people. There is a lot of work that needs to be done further; some details of future work are available on the given URL and we will be happy to provide more information to the interested people. Uday, This is great progress. If I understand the experiments, your implementtion has very small cost to perform the analysis, at least for the SPEC benchmarks you are testing. Have you connected the analysis to any optimizations? Is there any improvement in performance on SPEC CPU or other applications? You ask for collaboration, but it's not clear what state the infrastructure is in, how complete it is, and what more needs to be done. Thanks, David
Re: Fully flow and context sensitive points-to analysis in GCC 4.6.0
Hi David, This is great progress. Thanks. If I understand the experiments, your implementtion has very small cost to perform the analysis, at least for the SPEC benchmarks you are testing. Have you connected the analysis to any optimizations? Is there any improvement in performance on SPEC CPU or other applications? Unfortunately we could not do this as it requires changing the way the pointer information is made available to other passes. Right now, it is encoded in the TREE nodes of variables which make is same at all program points. In other words, GCC, by definition assumes flow insensitive information. Supporting flow sensitivity implies being able to provide different information for the same pointer at different program points. We are investigating how this can be done and this is one of the most important future works on which we seek collaboration. Unless we are able to do this, we will not be able to take advantage of the analysis. You ask for collaboration, but it's not clear what state the infrastructure is in, how complete it is, and what more needs to be done. The infrastructure is in a reasonably complete state except that (a) It is not directly useful to other analyses and optimizations for the reasons described above. (b) It uses rather naive and inefficient data structures in which sets are stored as linked lists and set operations are implemented using linear searches. (c) Some corner cases related identifying pointers need to be fixed. (d) It handles heap locations rather conservatively. We seek collaborations on (a) and (b). We have designed APIs to hide the data structures and now these are good student projects. Some details can be found at http://www.cse.iitb.ac.in/grc/gcc-workshop-12/index.php?page=projects#Projects_with_Concrete_and_Detailed. We are carrying out research on (d) and have some ideas on what needs to be done but it will be some time before the infrastructure is enhanced to include it. We are committed to doing it. Thanks and regards, Uday.