Re: Great example of why everything is a tree sucks
On 13/11/13 17:32, Jeff Law wrote: On 11/13/13 03:15, Richard Biener wrote: You know - 'tree's were a design decision (well, just my guess - I wasn't around 25 years ago ...). They are a perfect match to represent an AST. So I'd say whoever introduced that middle-end between the FE's AST and RTL was at fault :P (luckily I wasn't around either ... ;)) Yea, you can blame Diego, Andrew and myself largely for the decision to re-use trees in gimple. Reusing trees was a conscious decision made in large part because doing something different for the middle end would have been more work than we could justify at the time. We would have had to do all the things we're doing now, back then, to get it right. The only difference is now we've got a lot more gimple bits that know about trees than we did in the early gimple/ssa days. I don't think blame is the right word. It was a different era; we now have more RAM than could be imagined back then, and as you say, at the time it made sense. I like to think there are a few eras. You had the start: single processor, limited speed, limited RAM. Then processors got a lot faster (the MHz wars lasted a long time) but there was still not much RAM. Then today, a 2 GB stick, usually bought in pairs, is the smallest you can commonly get (1 GB sticks are rare), and processors can do a huge amount very quickly; parallel stuff doesn't really apply to GCC. Anyway, that tangent done with, each era changes software. You started with the undefined behavior wizards/gods, then came the era where storing huge programs could actually be done; the backing store wasn't measured in KB any more, and so forth. I fear we have entered the era of crap, software doing the same or less with more resources: the era of the cloud (the next version of Eclipse is browser based, with the motto code.anywhere=true;, part of me died), HTML 5 and other such nonsense (Caja, the Nautilus fork within MATE, implemented copying files as a Python script, which had a memory-leaking endless loop, I do not know why). BUT some stuff continues to get better. GCC is making some pretty huge/amazing changes right now (the tree is a great example); this is an exciting time and I really hope I get to be a part of it. Now let us, as Eric Raymond would say, plan for the future, for it will be here sooner than we think. Alec The point of this mail is to hopefully get you guys pondering on the future. I really enjoy reading these mailing lists and watching the annual tub/pot (I forget...) conference videos you guys make; while my motives are selfish, an ounce of prevention is worth a pound of cure :P We did the best we could, now it's time to correct that problem and make sense out of our datastructures. Jeff
Re: Great example of why everything is a tree sucks
On Wed, Nov 13, 2013 at 5:50 PM, Jakub Jelinek ja...@redhat.com wrote: On Wed, Nov 13, 2013 at 04:43:45PM +, Joseph S. Myers wrote: On Wed, 13 Nov 2013, Jeff Law wrote: On 11/13/13 08:59, Joseph S. Myers wrote: On Wed, 13 Nov 2013, Steven Bosscher wrote: Really the best place to start IMHO would be to evict 'tree' from the front ends. That would really be a step towards making the front ends independent of the rest of the compiler, and it would simplify changes towards static 'tree' types. From a C perspective, a useful change that would facilitate moving the IR away from tree would be moving most of fold to operate on GIMPLE instead of on trees (that is, rewriting it as GIMPLE optimizations; I don't think this can be a mechanical refactoring). [ ... ] Yes. That is most certainly part of the plan. Andrew, myself and others have discussed it extensively. It's a lot of work, but getting the tree folder disentangled from the gimple optimizers is definitely on the hit list. Note that *removing* things from the tree folder (and convert.c, and shorten_compare, and shorten_binary_op, and any other such fold-like things) once they've been moved to GIMPLE is a critical part of making it easier to clean up front-end IR; having them in both places won't help. Richard B. had the idea of generating parts of fold-const and corresponding GIMPLE ops from some meta definition file. Yeah, I hope to tackle the fold-const.c vs. GIMPLE mess during stage3 when everyone else is fixing bugs. (haha) Richard. Note, in many cases, removing optimizations from fold-const.c leads to regressions on code assuming something is folded (especially in initializers). Sure, that is all typically undocumented GNU extensions, but we had several such problems in the past already. Jakub
Re: Great example of why everything is a tree sucks
On Tue, Nov 12, 2013 at 9:52 PM, Diego Novillo dnovi...@google.com wrote: On Tue, Nov 12, 2013 at 3:35 PM, Jakub Jelinek ja...@redhat.com wrote: Note that we have tons of code which accept either objects or types, both in the frontends and in the middle-end, so changing TREE_TYPE from tree to something else is definitely non-trivial. Well, sure it's hard. This is the whole point behind Andrew's refactoring project: setting up the groundwork for this kind of conversion to be possible. Well, he doesn't even consider that the exact same tree rant applies to frontend code. Thus fixing trees would be far better as you'd win for both frontend and middle-end code! Fix the trees! Don't invent new ones! Richard. The software engineering atrocities that we have committed in the code base are going to take a few iterations to fix. But fix them, we must. I am convinced that this is the only way for GCC to avoid untimely oblivion; and allow it to evolve in ways that are now hard or impossible to implement. Diego.
Re: Great example of why everything is a tree sucks
On Tue, Nov 12, 2013 at 10:25 PM, Jeff Law l...@redhat.com wrote: On 11/12/13 13:35, Jakub Jelinek wrote: On Tue, Nov 12, 2013 at 12:59:47PM -0700, Jeff Law wrote: So I lost something like 3 hrs last night due to writing a hunk of code like this: if (INTEGRAL_TYPE_P (gimple_assign_lhs (stmt))) INTEGRAL_TYPE_P is a macro, which accepts everything, just adding a TYPE_CHECK to that macro would be sufficient to catch that. Yes, I know full well that I could hack up INTEGRAL_TYPE_P to detect this case and my brain damage would have been caught via the check sometime during building the runtime libraries or the stage2 build. My point is the mere need to hack up INTEGRAL_TYPE_P in that way is a result of a fundamental misdesign of the tree structures. If the structures were properly designed what I did would have been flagged as a compile error. It's that fundamental mis-design that we're trying to correct now with the work from Andrew, David and others. You know - 'tree's were a design decision (well, just my guess - I wasn't around 25 years ago ...). They are a perfect match to represent an AST. So I'd say whoever introduced that middle-end between the FE's AST and RTL was at fault :P (luckily I wasn't around either ... ;)) Still splitting 'tree's into a few separate classes is not hard. It's just work - and in the end even the frontends will benefit. Oh, and I believe it is a project that has a much higher success rate than trying to replace trees with something else in the GIMPLE and RTL middle-end only and have that cooperate sanely with the rest of the compiler. Something along 1 man year for the first with a 75% success chance against 6 man years for the second with a 20% success chance. And the nice thing is that the first can be done incrementally (we _are_ already in the process of that transition - see the changes done to 'tree' during the last 6 years). And the other nice thing is that even non-100% success will end up in something better than we have now. With the second project it's an all-or-nothing (or rather all-or-just-uglification). Richard. Jeff
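A minimal sketch of the distinction Jeff draws above, between a mistake caught by a runtime check and one rejected by the compiler. Nothing here is GCC code: `Node`, `Code`, `Type`, `Value` and all the function names are hypothetical stand-ins, chosen only to show the shape of the problem with a single catch-all node type.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical miniature of a single node type that stands for both
// types and values, told apart only by a runtime code.
enum class Code { IntegerType, RealType, SsaName };

struct Node {
  Code code;
  const Node *type;  // for a value: its type node; for a type: nullptr
};

// Unchecked predicate: accepts any node, like a macro would. Handing it
// a value instead of a type compiles and quietly answers "false".
inline bool integral_type_p_unchecked(const Node *n) {
  assert(n != nullptr);
  return n->code == Code::IntegerType;
}

// Runtime-checked predicate: the same mistake is reported, but only
// when this code path actually executes.
inline bool integral_type_p_checked(const Node *n) {
  assert(n != nullptr);
  if (n->code != Code::IntegerType && n->code != Code::RealType)
    throw std::logic_error("expected a type node");
  return n->code == Code::IntegerType;
}

// Statically typed alternative: types and values are distinct C++
// types, so the mistake is rejected by the compiler.
struct Type { bool integral; };
struct Value { const Type *type; };

inline bool integral_type_p(const Type &t) { return t.integral; }

inline bool demo_static(const Value &lhs) {
  // integral_type_p(lhs);  // would not compile: a Value is not a Type
  return integral_type_p(*lhs.type);
}
```

With the unchecked predicate the wrong call simply takes the "not integral" branch, which is the kind of bug that can cost hours. The checked variant finds it only if the path runs, and the split types move the failure to compile time.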
Re: Great example of why everything is a tree sucks
On Wed, Nov 13, 2013 at 11:10 AM, Richard Biener wrote: Well, he doesn't even consider that the exact same tree rant applies to frontend code. That's not entirely true, either. Most front ends already use their own IL in the parser (only C++ uses 'tree' for everything). What worries me most, is that we may end up with: FE-IL = 'tree' = gimple where 'tree' is just a memory-consuming, complicated intermediate step between the front ends and the middle end. I don't think that's the interface we want. Really the best place to start IMHO would be to evict 'tree' from the front ends. That would really be a step towards making the front ends independent of the rest of the compiler, and it would simplify changes towards static 'tree' types. Ciao! Steven
Re: Great example of why everything is a tree sucks
On Wed, Nov 13, 2013 at 11:15 AM, Richard Biener wrote: You know - 'tree's were a design decision (well, just my guess - I wasn't around 25 years ago ...). They are a perfect match to represent an AST. I'd argue against that, but perhaps some other time, in a different thread... So I'd say whoever introduced that middle-end between the FE's AST and RTL was at fault :P (luckily I wasn't around either ... ;)) A brief history of 'tree' for you, then! :-) Originally 'tree' wasn't a very long-living structure. The front ends would build some declaration or expression as a 'tree', expand it to RTL immediately, and free the 'tree'. Then we got functions-as-trees, at first for a C++ front-end specific inliner and later for a generic tree inliner for all front ends (you can still see the C++ front-end specific code in tree-inline.c). Next up was tree-ssa as an experiment. Turned out perhaps a somewhat bigger project than initially anticipated, but the idea was: Hey, we have functions as 'tree' now, let's see if we can do some code transformations on it! I don't think it was a conscious design decision at the time to make 'tree' the new IL for a complete new optimization framework. What started as convenient (it's there, let's do something with it) is now considered inappropriate use of FE-trees in the middle end. The funny thing is that 'tree' also doesn't really work for the front ends because many language specific things are more easily expressed in front-end specific types and data structures. Remember the discussions about the C++ front end AST being too far away from the source language? That's still true, and one reason for it is 'tree'. The Ada, Fortran, and even C front ends use front-end specific data structures for most parsed entities and only produce 'tree' from it in the process of generating GENERIC. Still splitting 'tree's into a few separate classes is not hard. It's just work - and in the end even the frontends will benefit. Oh, and I believe it is a project that has a much higher success rate than trying to replace trees with something else in the GIMPLE and RTL middle-end only and have that cooperate sanely with the rest of the compiler. Something along 1 man year for the first with a 75% success chance against 6 man years for the second with a 20% success chance. And the nice thing is that the first can be done incrementally (we _are_ already in the process of that transition - see the changes done to 'tree' during the last 6 years). And the other nice thing is that even non-100% success will end up in something better than we have now. With the second project it's an all-or-nothing (or rather all-or-just-uglification). Agreed. Obviously something has to be done, but I've got the feeling we may be attacking the problem from the wrong end... Ciao! Steven
Re: Great example of why everything is a tree sucks
On Wed, Nov 13, 2013 at 11:49:56AM +0100, Steven Bosscher wrote: source language? That's still true, and one reason for it is 'tree'. The Ada, Fortran, and even C front ends use front-end specific data structures for most parsed entities and only produce 'tree' from it in the process of generating GENERIC. You are clearly misreading what the C front end does then. It doesn't have any IL other than trees, and uses trees for everything from the start. And, the problems with the FEs you've talked about aren't caused by using trees, but by folding stuff too early. It got somewhat better in the C FE, but the C++ FE still folds things too early everywhere. Jakub
Re: Great example of why everything is a tree sucks
On Wed, Nov 13, 2013 at 11:37 AM, Steven Bosscher stevenb@gmail.com wrote: On Wed, Nov 13, 2013 at 11:10 AM, Richard Biener wrote: Well, he doesn't even consider that the exact same tree rant applies to frontend code. That's not entirely true, either. Most front ends already use their own IL in the parser (only C++ uses 'tree' for everything). What worries me most, is that we may end up with: FE-IL = 'tree' = gimple where 'tree' is just a memory-consuming, complicated intermediate step between the front ends and the middle end. I don't think that's the interface we want. True. The above applies to Fortran and Ada, and maybe also to Go and Java. But yes, as far as I understand Andrew's plan we'd still require FE-IL = GENERIC = gimple where GENERIC is 'tree' based and for the C-family frontends FE-IL is already GENERIC with some added tree codes that are lowered during GENERICIZATION (well, not entirely true - the gimplifier is fed not with GENERIC but with GENERIC+ and uses langhooks to handle the '+' part ... ugh). Really the best place to start IMHO would be to evict 'tree' from the front ends. That would really be a step towards making the front ends independent of the rest of the compiler, and it would simplify changes towards static 'tree' types. Right. The question is what we'd like the end result to look like? FE-IL = GIMPLE I suppose. That means gimplification will be the frontend's job. The current hand-off point for IL - cgraph_finalize_function - would get a function with GIMPLE IL already. It would be reasonably easy to support both - GENERIC and GIMPLE IL at this point but I'm not sure how difficult converting a frontend to have its own gimplifier would be. If anybody is eager to try I'd try on Fortran ;) Note that this would somewhat break the fact that now frontends see all of the TU's IL before gimplification starts, so we could also get the GIMPLE IL at finalize_compilation_unit time, making gimplification a langhook that gets us at the IL. Eric, would emitting GIMPLE from gigi make that a lot more complicated? That is, would you prefer to have an even higher-level early GIMPLE (considering stuff like TARGET_EXPR and WITH_CLEANUP_EXPR, etc.)? Richard. Ciao! Steven
Re: Great example of why everything is a tree sucks
Eric, would emitting GIMPLE from gigi make that a lot more complicated? That is, would you prefer to have an even higher-level early GIMPLE (considering stuff like TARGET_EXPR and WITH_CLEANUP_EXPR, etc.)? This would mean privatizing in gigi all the machinery needed to support types with variable size scattered over the middle-end (tree.c, fold-const.c and stor-layout.c) unless you want to make them first-class citizens in GIMPLE, which is very unlikely I presume. Not undoable, but IMO that would be a step backwards from GENERIC in which you can express a lot of things, and not only the semantics of C and its close relatives. -- Eric Botcazou
Re: Great example of why everything is a tree sucks
On Wed, Nov 13, 2013 at 5:10 AM, Richard Biener richard.guent...@gmail.com wrote: Thus fixing trees would be far better as you'd win for both frontend and middle-end code! For FEs, sure. I agree. But right now the focus is on fixing the interface between FEs and the ME. One thing at a time. Finally separating FE ASTs from the ME will give us some of the modularity aspects we are looking for. Similar work would be beneficial on the FE side, as well. Particularly those using 'tree'. Front ends using their own data structures for ASTs will only need to worry about emitting GIMPLE. Front ends have their own share of issues. If we can at least isolate them to the particular FE, we will have gained something. I also believe that g++ needs a lot of similar work. But, again, one thing at a time. Diego.
Re: Great example of why everything is a tree sucks
On Wed, Nov 13, 2013 at 5:15 AM, Richard Biener richard.guent...@gmail.com wrote: You know - 'tree's were a design decision (well, just my guess - I wasn't around 25 years ago ...). They are a perfect match to represent an AST. Yes, of course. It may have been the right decision at the time. But design is a dynamic entity. It moves, it evolves. Few decisions are ever correct in absolute terms. They are the best compromise at the time. Revising them from time to time is useful. Still splitting 'tree's into a few separate classes is not hard. It's just work - and in the end even the frontends will benefit. Oh, and I believe it is a project that has a much higher success rate than trying to replace trees with something else in the GIMPLE and RTL middle-end only and have that cooperate sanely with the rest of the compiler. Co-opting the same data structure for two different needs is the design decision that we are trying to move away from. It ends up being the bloated mess that we have today. Diego.
Re: Great example of why everything is a tree sucks
On Wed, Nov 13, 2013 at 5:37 AM, Steven Bosscher stevenb@gmail.com wrote: Really the best place to start IMHO would be to evict 'tree' from the front ends. That would really be a step towards making the front ends independent of the rest of the compiler, and it would simplify changes towards static 'tree' types. Each FE will be free to choose their own data structure for their ASTs. In the case of the C based FEs, their AST *is* tree. Once the FEs don't need to emit symbols and types using 'tree', they will be able to use the more streamlined gimple variants. I don't know what the gimple variants for symbols and types will look like. They will need to have enough attributes to describe the needs of debug info, but they will not need anything related to semantic analysis and codegen for the source language. So, there will be shared attributes, but they will be on completely separate modules and type systems. That orthogonality will allow the FEs to add more features without impacting the rest of the compiler. Diego.
Re: Great example of why everything is a tree sucks
On Wed, Nov 13, 2013 at 1:07 PM, Eric Botcazou ebotca...@adacore.com wrote: Eric, would emitting GIMPLE from gigi make that a lot more complicated? That is, would you prefer to have an even higher-level early GIMPLE (considering stuff like TARGET_EXPR and WITH_CLEANUP_EXPR, etc.)? This would mean privatizing in gigi all the machinery needed to support types with variable size scattered over the middle-end (tree.c, fold-const.c and stor-layout.c) unless you want to make them first-class citizens in GIMPLE, which is very unlikely I presume. Not undoable, but IMO that would be a step backwards from GENERIC in which you can express a lot of things, and not only the semantics of C and its close relatives. Yeah, that's why I am asking. I suppose you are referring to gimplify_type_sizes and the required lowering of ARRAY_REFs and COMPONENT_REFs? Yes, I'd not make them first class citizens in the GIMPLE that the optimizers see but I'm fine with having high-GIMPLE support for them, doing a lowering pass on GIMPLE. For a partial transition (that is, have frontends emitting GENERIC and frontends emitting GIMPLE) we'd split out these pieces into a 2nd gimplification stage and call its input very-high-GIMPLE ;) Do you see any benefits in emitting high GIMPLE (besides eventually some minor compile-time and memory-usage improvements)? That is, would some things be easier to do? [you'd also lose any folding that generating GENERIC would have done] Richard. -- Eric Botcazou
Re: Great example of why everything is a tree sucks
On Wed, 13 Nov 2013, Eric Botcazou wrote: Eric, would emitting GIMPLE from gigi make that a lot more complicated? That is, would you prefer to have an even higher-level early GIMPLE (considering stuff like TARGET_EXPR and WITH_CLEANUP_EXPR, etc.)? This would mean privatizing in gigi all the machinery needed to support types with variable size scattered over the middle-end (tree.c, fold-const.c and stor-layout.c) unless you want to make them first-class citizens in GIMPLE, which is very unlikely I presume. Not undoable, but IMO that would be a step backwards from GENERIC in which you can express a lot of things, and not only the semantics of C and its close relatives. C has already (GCC 4.5 and later) largely moved to using its own logic for variable size types, to ensure that sizes get evaluated at exactly the right time according to C language semantics; I think the only generic pieces it relies on are the lowering of VLA objects to explicit stack allocation and deallocation. Other places where GENERIC provides something that no doubt seemed convenient originally, but is less useful when you want to ensure things are evaluated at exactly the time implied by language semantics, include SAVE_EXPRs and pre/post increment/decrement. The generic gimplification logic for the latter caused issues in the past when it turned out not to match C11 semantics; C now explicitly creates the required temporaries in the case of increment / decrement / compound assignment of atomic types, and will probably need to move to using more of its own logic for compound assignment in general to fix bug 58943. I'd be happy for front ends to move to doing all these things themselves, rather than trying to define GENERIC in a way that works for every language's requirements on when things are evaluated and when storage is allocated / deallocated. -- Joseph S. Myers jos...@codesourcery.com
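To make "explicitly creates the required temporaries" concrete, here is a hand-written sketch of what a compound assignment such as `x *= factor` on an atomic object can be lowered to. It uses C++ `std::atomic` as a stand-in for C11 `_Atomic`, and the function names are hypothetical. This is not the code the C front end emits, only an illustration of why the read value and the value to be stored each need their own temporary.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical hand-lowering of `x *= factor` for an atomic int.
// Returns the value of the compound assignment expression.
inline int atomic_multiply(std::atomic<int> &x, int factor) {
  int old_value = x.load();  // temporary 1: the value that was read
  int new_value;             // temporary 2: the value to be stored
  do {
    // The arithmetic is done on the temporaries only; x itself is
    // touched just by the load above and the exchange below.
    new_value = old_value * factor;
    // On failure, compare_exchange_weak refreshes old_value with the
    // current contents of x, so the multiplication is redone.
  } while (!x.compare_exchange_weak(old_value, new_value));
  return new_value;
}

// Usage: behaves like `x *= 2` on an atomic object holding 21.
inline void demo_atomic_multiply() {
  std::atomic<int> x{21};
  int result = atomic_multiply(x, 2);
  assert(result == 42);
  assert(x.load() == 42);
}
```

The order of the load, the computation and the store is spelled out by the lowering itself, so nothing is left to a language-independent layer to decide.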
Re: Great example of why everything is a tree sucks
On Wed, Nov 13, 2013 at 8:25 AM, Joseph S. Myers jos...@codesourcery.com wrote: assignment in general to fix bug 58943. I'd be happy for front ends to move to doing all these things themselves, rather than trying to define GENERIC in a way that works for every language's requirements on when things are evaluated and when storage is allocated / deallocated. I agree. I would move away from trying to give FEs a target in the form of another high-level AST. When doing codegen, it would be preferable if they express all the needed semantics in GIMPLE directly. Diego.
Re: Great example of why everything is a tree sucks
On Wed, Nov 13, 2013 at 5:30 AM, Diego Novillo dnovi...@google.com wrote: On Wed, Nov 13, 2013 at 8:25 AM, Joseph S. Myers jos...@codesourcery.com wrote: assignment in general to fix bug 58943. I'd be happy for front ends to move to doing all these things themselves, rather than trying to define GENERIC in a way that works for every language's requirements on when things are evaluated and when storage is allocated / deallocated. I agree. I would move away from trying to give FEs a target in the form of another high-level AST. When doing codegen, it would be preferable if they express all the needed semantics in GIMPLE directly. That is what I wanted to do in the Go frontend, but unfortunately the current frontend interface actually requires me to generate trees for statements. These trees are then immediately gimplified. It's a waste of time and space. (I haven't tried to fix this because there are unfortunately even bigger wastes of space in the Go frontend.) Ian
Re: Great example of why everything is a tree sucks
On Wed, 13 Nov 2013, Steven Bosscher wrote: Really the best place to start IMHO would be to evict 'tree' from the front ends. That would really be a step towards making the front ends independent of the rest of the compiler, and it would simplify changes towards static 'tree' types. From a C perspective, a useful change that would facilitate moving the IR away from tree would be moving most of fold to operate on GIMPLE instead of on trees (that is, rewriting it as GIMPLE optimizations; I don't think this can be a mechanical refactoring). There are at least the following types of folding that can get called from the C front end: (a) Folding required for standard language semantics of constant expressions. (b) Other trivial folding (e.g. || ?: with constant controlling expression, even if the ignored half isn't valid in ISO C constant expressions; comma operators whose LHS has no side effects; __builtin function calls with constant operands; differences of addresses within an object with static storage duration). Needed in many cases for GNU C semantics (implementing ISO C constant expression semantics showed that many cases did in fact need folding, with just pedwarns-if-pedantic if something should be a constant expression but isn't in ISO C terms, because code bases such as the Linux kernel used such not-ISO-constant expressions in places needing constant expressions). Needed in some cases for ISO C semantics in the standard library (e.g. __builtin_nanf () is what we provide for a library to define NAN, which must be usable in static initializers). (c) More complicated optimization transforms that aren't really constant folding but are done by fold-const.c. (d) Like (c), but done in convert.c (see convert_to_real for example). (e) Like (c), but done in the front end (including c-family/ code, see shorten_compare). 
Types (c), (d) and (e) should become GIMPLE optimizations (everything in (a) and (b) should *also* be done on GIMPLE when GIMPLE optimizations result in operands being constant, but I hope that's already the case). That would massively reduce the amount of folding code called by the C front end, so making what's left much easier to reimplement on a better IR (the reimplementation would still need to call into language-independent code e.g. for built-in functions with constant operands, but I don't think that's a problem to handle with different IR in front end and middle end). However: (i) Distinguishing (c), (d) and (e) from (b) can be tricky; GNU/Linux distribution rebuilds would be helpful in making sure changes didn't remove too much from fold. (ii) Some places where the C front end calls c_fully_fold before complete expressions have been built up are because warnings rely on transformations carried out by fold in order to avoid false positives (i.e. there were failures of warning testcases unless the folding was done). I think this is generally about warnings relating to implicit conversions. So to the extent that such transformations would otherwise fall in (c), (d) or (e), maybe the GIMPLE needs annotating in some way ("if this conversion can change the numerical value of its operand, give this warning"), with the warnings then being output after enough optimization has taken place. -- Joseph S. Myers jos...@codesourcery.com
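As an illustration of folding "done on GIMPLE when GIMPLE optimizations result in operands being constant", the sketch below folds constants through a list of three-address statements instead of a nested expression tree. The `Operand`, `Stmt` and `fold_constants` names are hypothetical and this is a toy model, not GCC's GIMPLE API. It assumes straight-line code in which each left-hand side is assigned exactly once.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical three-address operand: a literal or a named temporary.
struct Operand {
  std::optional<long> constant;
  std::string name;
};
inline Operand lit(long v) { return Operand{v, ""}; }
inline Operand var(const std::string &n) { return Operand{std::nullopt, n}; }

// Hypothetical statement of the form `lhs = a op b`.
struct Stmt {
  std::string lhs;
  char op;  // '+' or '*'
  Operand a, b;
};

// Walk the statements once, recording every lhs whose operands are
// (or have become) constant. Returns the folded temporaries.
inline std::map<std::string, long> fold_constants(const std::vector<Stmt> &body) {
  std::map<std::string, long> known;
  auto value_of = [&known](const Operand &o) -> std::optional<long> {
    if (o.constant) return o.constant;
    auto it = known.find(o.name);
    if (it != known.end()) return it->second;
    return std::nullopt;
  };
  for (const Stmt &s : body) {
    assert(known.count(s.lhs) == 0);  // single-assignment assumption
    std::optional<long> a = value_of(s.a);
    std::optional<long> b = value_of(s.b);
    if (!a || !b) continue;  // not constant: leave the statement alone
    if (s.op == '+') known[s.lhs] = *a + *b;
    else if (s.op == '*') known[s.lhs] = *a * *b;
  }
  return known;
}
```

For the statements `t1 = 2 + 3; t2 = t1 * 4; t3 = n + t2`, the walk records `t1` and `t2` as constants and leaves `t3` untouched because `n` is unknown. The second statement only becomes foldable after the first has been folded, which is the situation Joseph describes.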
Re: Great example of why everything is a tree sucks
On 11/13/13 08:59, Joseph S. Myers wrote: On Wed, 13 Nov 2013, Steven Bosscher wrote: Really the best place to start IMHO would be to evict 'tree' from the front ends. That would really be a step towards making the front ends independent of the rest of the compiler, and it would simplify changes towards static 'tree' types. From a C perspective, a useful change that would facilitate moving the IR away from tree would be moving most of fold to operate on GIMPLE instead of on trees (that is, rewriting it as GIMPLE optimizations; I don't think this can be a mechanical refactoring). [ ... ] Yes. That is most certainly part of the plan. Andrew, myself and others have discussed it extensively. It's a lot of work, but getting the tree folder disentangled from the gimple optimizers is definitely on the hit list. jeff
Re: Great example of why everything is a tree sucks
On Wed, 13 Nov 2013, Jeff Law wrote: On 11/13/13 08:59, Joseph S. Myers wrote: On Wed, 13 Nov 2013, Steven Bosscher wrote: Really the best place to start IMHO would be to evict 'tree' from the front ends. That would really be a step towards making the front ends independent of the rest of the compiler, and it would simplify changes towards static 'tree' types. From a C perspective, a useful change that would facilitate moving the IR away from tree would be moving most of fold to operate on GIMPLE instead of on trees (that is, rewriting it as GIMPLE optimizations; I don't think this can be a mechanical refactoring). [ ... ] Yes. That is most certainly part of the plan. Andrew, myself and others have discussed it extensively. It's a lot of work, but getting the tree folder disentangled from the gimple optimizers is definitely on the hit list. Note that *removing* things from the tree folder (and convert.c, and shorten_compare, and shorten_binary_op, and any other such fold-like things) once they've been moved to GIMPLE is a critical part of making it easier to clean up front-end IR; having them in both places won't help. -- Joseph S. Myers jos...@codesourcery.com
Re: Great example of why everything is a tree sucks
On Wed, Nov 13, 2013 at 04:43:45PM +, Joseph S. Myers wrote: On Wed, 13 Nov 2013, Jeff Law wrote: On 11/13/13 08:59, Joseph S. Myers wrote: On Wed, 13 Nov 2013, Steven Bosscher wrote: Really the best place to start IMHO would be to evict 'tree' from the front ends. That would really be a step towards making the front ends independent of the rest of the compiler, and it would simplify changes towards static 'tree' types. From a C perspective, a useful change that would facilitate moving the IR away from tree would be moving most of fold to operate on GIMPLE instead of on trees (that is, rewriting it as GIMPLE optimizations; I don't think this can be a mechanical refactoring). [ ... ] Yes. That is most certainly part of the plan. Andrew, myself and others have discussed it extensively. It's a lot of work, but getting the tree folder disentangled from the gimple optimizers is definitely on the hit list. Note that *removing* things from the tree folder (and convert.c, and shorten_compare, and shorten_binary_op, and any other such fold-like things) once they've been moved to GIMPLE is a critical part of making it easier to clean up front-end IR; having them in both places won't help. Richard B. had the idea of generating parts of fold-const and corresponding GIMPLE ops from some meta definition file. Note, in many cases, removing optimizations from fold-const.c leads to regressions on code assuming something is folded (especially in initializers). Sure, that is all typically undocumented GNU extensions, but we had several such problems in the past already. Jakub
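The idea of generating parts of fold-const and the corresponding GIMPLE operations from one meta definition can be sketched, very loosely, as a single rule table consumed by two simplifiers. Everything below is hypothetical (`Rule`, `Expr`, `Stmt`, the helper names) and says nothing about how such a definition file would actually look in GCC. It only shows the benefit named in the thread: a transformation is written once instead of being maintained in two places.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical rule: "x OP constant" simplifies to "x".
struct Rule { char op; long rhs_constant; };

// The single shared description; both consumers below read only this.
inline const std::vector<Rule> &rules() {
  static const std::vector<Rule> table = {{'+', 0}, {'*', 1}};
  return table;
}

inline bool matches_identity(char op, bool rhs_is_constant, long rhs_value) {
  if (!rhs_is_constant) return false;
  for (const Rule &r : rules())
    if (r.op == op && r.rhs_constant == rhs_value) return true;
  return false;
}

// Consumer 1: a nested expression tree. kind is 'v' (variable),
// 'c' (constant) or an operator character.
struct Expr {
  char kind;
  std::string name;
  long value;
  std::unique_ptr<Expr> lhs, rhs;
};
inline std::unique_ptr<Expr> variable(std::string n) {
  return std::unique_ptr<Expr>(new Expr{'v', std::move(n), 0, nullptr, nullptr});
}
inline std::unique_ptr<Expr> constant(long v) {
  return std::unique_ptr<Expr>(new Expr{'c', "", v, nullptr, nullptr});
}
inline std::unique_ptr<Expr> binary(char op, std::unique_ptr<Expr> l,
                                    std::unique_ptr<Expr> r) {
  assert(l && r);
  return std::unique_ptr<Expr>(new Expr{op, "", 0, std::move(l), std::move(r)});
}

inline std::unique_ptr<Expr> simplify_tree(std::unique_ptr<Expr> e) {
  if (!e || !e->lhs || !e->rhs) return e;  // leaf: nothing to do
  e->lhs = simplify_tree(std::move(e->lhs));
  e->rhs = simplify_tree(std::move(e->rhs));
  if (matches_identity(e->kind, e->rhs->kind == 'c', e->rhs->value))
    return std::move(e->lhs);  // replace the whole node by its left operand
  return e;
}

// Consumer 2: a flat statement `lhs = a OP b`, with b possibly constant.
struct Stmt {
  std::string lhs;
  char op;  // operator, or '=' for a plain copy
  std::string a;
  bool b_is_constant;
  long b_value;
};

inline bool simplify_stmt(Stmt &s) {
  if (!matches_identity(s.op, s.b_is_constant, s.b_value)) return false;
  s.op = '=';  // `lhs = a OP c` becomes the copy `lhs = a`
  return true;
}
```

Adding a rule to `rules()` changes both simplifiers at once. Removing a transformation from one side without the other, the regression risk Jakub raises, becomes harder to do by accident.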
Re: Great example of why everything is a tree sucks
On 11/13/13 09:43, Joseph S. Myers wrote: On Wed, 13 Nov 2013, Jeff Law wrote: On 11/13/13 08:59, Joseph S. Myers wrote: On Wed, 13 Nov 2013, Steven Bosscher wrote: Really the best place to start IMHO would be to evict 'tree' from the front ends. That would really be a step towards making the front ends independent of the rest of the compiler, and it would simplify changes towards static 'tree' types. From a C perspective, a useful change that would facilitate moving the IR away from tree would be moving most of fold to operate on GIMPLE instead of on trees (that is, rewriting it as GIMPLE optimizations; I don't think this can be a mechanical refactoring). [ ... ] Yes. That is most certainly part of the plan. Andrew, myself and others have discussed it extensively. It's a lot of work, but getting the tree folder disentangled from the gimple optimizers is definitely on the hit list. Note that *removing* things from the tree folder (and convert.c, and shorten_compare, and shorten_binary_op, and any other such fold-like things) once they've been moved to GIMPLE is a critical part of making it easier to clean up front-end IR; having them in both places won't help. Yup, absolutely. Kai had the idea that this might make a good GSOC project. Take a transformation in fold-const.c, move it into gimple. Evaluate over body of code to verify we haven't regressed, if no regressions, submit. Repeat until GSOC time is up. It's not the most sexy work, but it comes out in bite sized hunks and anything they get done stands on its own and helps the overall goal of reducing the amount of unnecessary folding done in fold-const.c jeff
Re: Great example of why everything is a tree sucks
On Wed, 13 Nov 2013, Jakub Jelinek wrote: Note, in many cases, removing optimizations from fold-const.c leads to regressions on code assuming something is folded (especially in initializers). Sure, that is all typically undocumented GNU extensions, but we had several such problems in the past already. That's what I said about distinguishing (c), (d) and (e) from (b), with distribution rebuilds as a way of testing whether there is an issue with removing some particular optimizations. (In the static initializer case there's the option of the front end generating GIMPLE code for the initializer and telling the middle end that it should (ped)warn if it ends up being optimized to a constant, error if it doesn't. But I rather hope that isn't needed.) -- Joseph S. Myers jos...@codesourcery.com
Re: Great example of why everything is a tree sucks
Yeah, that's why I am asking. I suppose you are referring to gimplify_type_sizes and the required lowering of ARRAY_REFs and COMPONENT_REFs? Yes, I'd not make them first class citizens in the GIMPLE that the optimizers see but I'm fine with having high-GIMPLE support for them, doing a lowering pass on GIMPLE. Yes, plus the machinery in tree.c and stor-layout.c for types with self-referential size. Do you see any benefits in emitting high GIMPLE (besides eventually some minor compile-time and memory-usage improvements)? That is, would some things be easier to do? [you'd also lose any folding that generating GENERIC would have done] We cannot lose the folding on offset/size expressions of types in Ada, otherwise Ada software making heavy use of types wouldn't compile anymore. That's why I also mentioned fold-const.c, all the size_* routines and their dependencies would need to be privatized as well. I can think of marginal benefits like more direct transfer of optimization hints on loops for example, but nothing really decisive. -- Eric Botcazou
Re: Great example of why everything is a tree sucks
Other places where GENERIC provides something that no doubt seemed convenient originally, but is less useful when you want to ensure things are evaluated at exactly the time implied by language semantics, include SAVE_EXPRs and pre/post increment/decrement. I'm a little skeptical here because Ada is second to no other language (at least those supported by GCC) when it comes to precise rules of evaluation of expressions or elaboration of types; of course GENERIC knows nothing about them and, precisely because of that, it's possible to support a significant range of semantics with it, including that of Ada. I'd be happy for front ends to move to doing all these things themselves, rather than trying to define GENERIC in a way that works for every language's requirements on when things are evaluated and when storage is allocated / deallocated. I'm under the impression that the view is skewed here because the C family of compilers essentially build their ASTs in GENERIC, so they give it properties that it is not meant to have. Instead it's a very flexible IL framework where you can express a lot of things, unlike GIMPLE which is much more narrow. -- Eric Botcazou
Re: Great example of why everything is a tree sucks
On 11/13/13 03:15, Richard Biener wrote: You know - 'tree's were a design decision (well, just my guess - I wasn't around 25 years ago ...). They are a perfect match to represent an AST. So I'd say whoever introduced that middle-end between the FEs AST and RTL was at fault :P (luckily I wasn't around either ... ;)) Yea, you can blame Diego, Andrew and myself largely for the decision to re-use trees in gimple. Reusing trees was a conscious decision made in large part because doing something different for the middle end would have been more work than we could justify at the time. We would have had to do all the things we're doing now, back then to get it right. The only difference is now we've got a lot more gimple bits that know about trees than we did in the early gimple/ssa days. We did the best we could, now it's time to correct that problem and make sense out of our datastructures. Jeff
Re: Great example of why everything is a tree sucks
Richard Biener richard.guent...@gmail.com writes: On Wed, Nov 13, 2013 at 11:37 AM, Steven Bosscher stevenb@gmail.com wrote: On Wed, Nov 13, 2013 at 11:10 AM, Richard Biener wrote: Well, he doesn't even consider that the exact same tree rant applies to frontend code. That's not entirely true, either. Most front ends already use their own IL in the parser (only C++ uses 'tree' for everything). What worries me most, is that we may end up with: FE-IL -> 'tree' -> gimple where 'tree' is just a memory-consuming, complicated intermediate step between the front ends and the middle end. I don't think that's the interface we want. True. The above applies to Fortran and Ada, and maybe also to Go and Java. just for completeness this also applies to Modula-2 which uses the technique of double book keeping. Not sure if this is totally relevant but gm2 needs (would like :-) the ability to create a SET_TYPE which maps onto appropriate debugging type info. regards, Gaius
Re: Great example of why everything is a tree sucks
On Wed, 13 Nov 2013, Gaius Mulley wrote: just for completeness this also applies to Modula-2 which uses the technique of double book keeping. Not sure if this is totally relevant but gm2 needs (would like :-) the ability to create a SET_TYPE which maps onto appropriate debugging type info. Note we removed SET_TYPE in 2004 for lack of in-tree uses (there was a suspicion it might have been for Pascal, but GNU Pascal is now a moribund project (largely I think because of the combination of (a) being closely tied to GCC internals through use of trees as front-end internal representation and (b) not being developed in the GCC repository but externally with attempts to support multiple GCC versions from one GNU Pascal version, meaning lots of effort needed outside the GCC community to update it to new GCC versions - also (c) there's another more actively developed GPL Pascal compiler, Free Pascal, albeit for different versions of the Pascal language)). In general, for GCC development to consider requirements of your front end or back end, getting it into the GCC repository and developing it there is strongly recommended. -- Joseph S. Myers jos...@codesourcery.com
Re: Great example of why everything is a tree sucks
On 11/13/13 11:30, Joseph S. Myers wrote: On Wed, 13 Nov 2013, Gaius Mulley wrote: just for completeness this also applies to Modula-2 which uses the technique of double book keeping. Not sure if this is totally relevant but gm2 needs (would like :-) the ability to create a SET_TYPE which maps onto appropriate debugging type info. Note we removed SET_TYPE in 2004 for lack of in-tree uses (there was a suspicion it might have been for Pascal, but GNU Pascal is now a moribund project (largely I think because of the combination of (a) being closely tied to GCC internals through use of trees as front-end internal representation and (b) not being developed in the GCC repository but externally with attempts to support multiple GCC versions from one GNU Pascal version, meaning lots of effort needed outside the GCC community to update it to new GCC versions - also (c) there's another more actively developed GPL Pascal compiler, Free Pascal, albeit for different versions of the Pascal language)). In general, for GCC development to consider requirements of your front end or back end, getting it into the GCC repository and developing it there is strongly recommended. Sadly, I tried multiple times in the late 90s to bring the folks doing GNU Pascal development into the GCC project without any success. Eventually I gave up. jeff
Re: Great example of why everything is a tree sucks
Jeff Law l...@redhat.com writes: On 11/13/13 11:30, Joseph S. Myers wrote: On Wed, 13 Nov 2013, Gaius Mulley wrote: just for completeness this also applies to Modula-2 which uses the technique of double book keeping. Not sure if this is totally relevant but gm2 needs (would like :-) the ability to create a SET_TYPE which maps onto appropriate debugging type info. Note we removed SET_TYPE in 2004 for lack of in-tree uses (there was a suspicion it might have been for Pascal, but GNU Pascal is now a moribund project (largely I think because of the combination of (a) being closely tied to GCC internals through use of trees as front-end internal representation and (b) not being developed in the GCC repository but externally with attempts to support multiple GCC versions from one GNU Pascal version, meaning lots of effort needed outside the GCC community to update it to new GCC versions - also (c) there's another more actively developed GPL Pascal compiler, Free Pascal, albeit for different versions of the Pascal language)). In general, for GCC development to consider requirements of your front end or back end, getting it into the GCC repository and developing it there is strongly recommended. Sadly, I tried multiple times in the late 90s to bring the folks doing GNU Pascal development into the GCC project without any success. Eventually I gave up. I'd be delighted to see gm2 in the gcc repository. The gm2 repository is currently in git format (changed from cvs 2 weeks ago). All fsf copyright assignment forms have been done some years ago. At present the gm2 master can be grafted onto gcc-4.7.3 with 10 patches applied. Under Debian Wheezy x86_64 the regression tests show 332 failures and 8298 passes. From reading http://gnu.gcc.org/wiki/SvnBranch I wonder whether it would seem sensible to create two branches one at 4.7.3 and another at branch at the head (maybe) and mercilessly merge from the head.
Maybe one of the earlier activities should be to forward port the 10 patches and post them to the appropriate mailing list? regards, Gaius
Re: Great example of why everything is a tree sucks
On 11/13/13 12:33, Gaius Mulley wrote: I'd be delighted to see gm2 in the gcc repository. The gm2 repository is currently in git format (changed from cvs 2 weeks ago). All fsf copyright assignment forms have been done some years ago. At present the gm2 master can be grafted onto gcc-4.7.3 with 10 patches applied. Under Debian Wheezy x86_64 the regression tests show 332 failures and 8298 passes. From reading http://gnu.gcc.org/wiki/SvnBranch I wonder whether it would seem sensible to create two branches one at 4.7.3 and another at branch at the head (maybe) and mercilessly merge from the head. Maybe one of the earlier activities should be to forward port the 10 patches and post them to the appropriate mailing list? I'm totally swamped right now so I can't really shepherd this at the moment. Other maintainers may be interested. A lot has changed since 4.7, so you're definitely going to have to rebase against something more modern. jeff
Re: Great example of why everything is a tree sucks
On Wed, 13 Nov 2013, Gaius Mulley wrote: In general, for GCC development to consider requirements of your front end or back end, getting it into the GCC repository and developing it there is strongly recommended. Sadly, I tried multiple times in the late 90s to bring the folks doing GNU Pascal development into the GCC project without any success. Eventually I gave up. I'd be delighted to see gm2 in the gcc repository. The gm2 repository is currently in git format (changed from cvs 2 weeks ago). All fsf copyright assignment forms have been done some years ago. Personally I'd welcome addition of front ends for any mainstream languages (of the sort that are suited to ahead-of-time compilation to machine code as in GCC) where there are developers interested in maintaining them in GCC following the usual GCC coding standards and development practices. (sourcebuild.texi, Front End, provides a checklist so you don't have anything obviously missing, but of course there's lots more involved in following what are currently considered good coding practices in GCC - some things existing front ends do may not now be considered good practice for new front ends.) This doesn't say whether a new front end would be built by default (though my personal view is that --enable-languages=all should mean all languages supported for the given target with the available build tools, and should be the default, but the expectations for routine patch testing could be something less). Of course, if a front end then ceases to be maintained it may get removed, as the CHILL front end was removed. From reading http://gnu.gcc.org/wiki/SvnBranch I wonder whether it would seem sensible to create two branches one at 4.7.3 and another at branch at the head (maybe) and mercilessly merge from the head. Maybe one of the earlier activities should be to forward port the 10 patches and post them to the appropriate mailing list?
Any patches that aren't simply filling in checklist items for a new front end certainly need posting separately with their own rationale (for example, if there's an optimizer bug fix for a bug the front end shows up). If the patch only makes sense with the new front end, but isn't simply checklist-filling, it's probably best posted at the same time as the front end itself, but as a separate patch in the series (for example, if you want a new language-independent GIMPLE operation). I think it's reasonable to give SVN write access to anyone with a copyright assignment on file who is interested in maintaining a front end on a branch and hopefully contributing it to trunk in future. -- Joseph S. Myers jos...@codesourcery.com
Re: Great example of why everything is a tree sucks
Joseph S. Myers jos...@codesourcery.com writes: On Wed, 13 Nov 2013, Gaius Mulley wrote: In general, for GCC development to consider requirements of your front end or back end, getting it into the GCC repository and developing it there is strongly recommended. Sadly, I tried multiple times in the late 90s to bring the folks doing GNU Pascal development into the GCC project without any success. Eventually I gave up. I'd be delighted to see gm2 in the gcc repository. The gm2 repository is currently in git format (changed from cvs 2 weeks ago). All fsf copyright assignment forms have been done some years ago. Personally I'd welcome addition of front ends for any mainstream languages (of the sort that are suited to ahead-of-time compilation to machine code as in GCC) where there are developers interested in maintaining them in GCC following the usual GCC coding standards and development practices. (sourcebuild.texi, Front End, provides a checklist so you don't have anything obviously missing, but of course there's lots more involved in following what are currently considered good coding practices in GCC - some things existing front ends do may not now be considered good practice for new front ends.) This doesn't say whether a new front end would be built by default (though my personal view is that --enable-languages=all should mean all languages supported for the given target with the available build tools, and should be the default, but the expectations for routine patch testing could be something less). Of course, if a front end then ceases to be maintained it may get removed, as the CHILL front end was removed. From reading http://gnu.gcc.org/wiki/SvnBranch I wonder whether it would seem sensible to create two branches one at 4.7.3 and another at branch at the head (maybe) and mercilessly merge from the head. Maybe one of the earlier activities should be to forward port the 10 patches and post them to the appropriate mailing list?
Any patches that aren't simply filling in checklist items for a new front end certainly need posting separately with their own rationale (for example, if there's an optimizer bug fix for a bug the front end shows up). If the patch only makes sense with the new front end, but isn't simply checklist-filling, it's probably best posted at the same time as the front end itself, but as a separate patch in the series (for example, if you want a new language-independent GIMPLE operation). I think it's reasonable to give SVN write access to anyone with a copyright assignment on file who is interested in maintaining a front end on a branch and hopefully contributing it to trunk in future. Many thanks for the great advice and pointers! I'm leaning towards three branches: head, 4.8.2 and 4.7.3. The last since it is so close to passing all gm2 regression tests and thus useful to the m2 community, 4.8.2 to allow patch testing against the whole gcc testsuite, and finally head. Once all gm2 regression tests pass on 4.7.3, that branch would go into maintenance and my focus would be on 4.8.2 and head. regards, Gaius
Re: Great example of why everything is a tree sucks
On Tue, Nov 12, 2013 at 2:59 PM, Jeff Law l...@redhat.com wrote: It's time to move on and do something sensible with the core parts of our ILs so that we're all more effective in the long run. My sentiments, exactly! Diego.
Re: Great example of why everything is a tree sucks
On Tue, Nov 12, 2013 at 12:59:47PM -0700, Jeff Law wrote: So I lost something like 3 hrs last night due to writing a hunk of code like this if (INTEGRAL_TYPE_P (gimple_assign_lhs (stmt))) INTEGRAL_TYPE_P is a macro, which accepts everything, just adding a TYPE_CHECK to that macro would be sufficient to catch that (but of course could break various parts of GCC, so if you wanted to change the representation of types from tree to tree_type or something similar, the addition of the TYPE_CHECK would be probably first step towards that). Note that we have tons of code which accept either objects or types, both in the frontends and in the middle-end, so changing TREE_TYPE from tree to something else is definitely non-trivial. Jakub
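For readers who have not hit this themselves, a minimal C++ sketch of the difference between a predicate that accepts everything and one with a run-time type check. It is not GCC's implementation: node, node_code, is_type_node and both predicates are hypothetical stand-ins, and an exception stands in for whatever the real check would report.

```cpp
#include <stdexcept>

// Hypothetical, heavily simplified stand-ins for tree nodes and codes.
enum node_code { INTEGER_TYPE_NODE, REAL_TYPE_NODE, SSA_NAME_NODE };

struct node
{
  node_code code;
  node *type;   // for objects: their type; null for type nodes
};

// Unchecked predicate in the spirit of the macro: it only looks at the
// code, so an object passed by mistake silently yields "false".
inline bool integral_type_p_unchecked (const node *n)
{
  return n->code == INTEGER_TYPE_NODE;
}

inline bool is_type_node (const node *n)
{
  return n->code == INTEGER_TYPE_NODE || n->code == REAL_TYPE_NODE;
}

// Checked variant: refuses anything that is not a type, at run time.
inline bool integral_type_p_checked (const node *n)
{
  if (!is_type_node (n))
    throw std::logic_error ("type check failed: node is not a type");
  return n->code == INTEGER_TYPE_NODE;
}
```

With the unchecked form, passing the lhs object instead of its type just takes the "not integral" branch, which is the kind of mistake that costs hours. The checked form turns it into a loud failure, but only once that code path actually runs.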
Re: Great example of why everything is a tree sucks
On Tue, Nov 12, 2013 at 3:35 PM, Jakub Jelinek ja...@redhat.com wrote: Note that we have tons of code which accept either objects or types, both in the frontends and in the middle-end, so changing TREE_TYPE from tree to something else is definitely non-trivial. Well, sure it's hard. This is the whole point behind Andrew's refactoring project: setting up the groundwork for this kind of conversion to be possible. The software engineering atrocities that we have committed in the code base are going to take a few iterations to fix. But fix them, we must. I am convinced that this is the only way for GCC to avoid untimely oblivion; and allow it to evolve in ways that are now hard or impossible to implement. Diego.
Re: Great example of why everything is a tree sucks
The name David Malcolm comes to mind, I remember watching a GCC ... bucket, tub, some sort of large container (pot?) talk on it. He was replacing all the macros with a class with no virtuals (only one data member, as used by the macros in effect) and so forth and using inheritance, doesn't that solve this? (or won't that solve this? --future tense) C++11 has a lot of great things (like std::is_base_of and std::remove_pointer in type_traits) that help with this, I'm pretty sure these came from Boost, most good things come from Boost (read: I am certain they came from Boost and that Boost lets us do them, but it's been so long I couldn't tell you exactly how without reading documentation again). If C++11 stuff can't be used (I'm not saying we should, just observing, I agree that fairly old compilers should be able to build GCC) can't we just use what Boost does? I never spend much time looking but Boost does say which versions of what compilers are supported. If they can do it surely we can? This would allow some pretty solid compile time checks to be introduced I would have thought? Alec On 12/11/13 20:52, Diego Novillo wrote: On Tue, Nov 12, 2013 at 3:35 PM, Jakub Jelinek ja...@redhat.com wrote: Note that we have tons of code which accept either objects or types, both in the frontends and in the middle-end, so changing TREE_TYPE from tree to something else is definitely non-trivial. Well, sure it's hard. This is the whole point behind Andrew's refactoring project: setting up the groundwork for this kind of conversion to be possible. The software engineering atrocities that we have committed in the code base are going to take a few iterations to fix. But fix them, we must. I am convinced that this is the only way for GCC to avoid untimely oblivion; and allow it to evolve in ways that are now hard or impossible to implement. Diego.
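A rough C++11 sketch of the kind of compile-time check std::is_base_of and std::remove_pointer allow. The class names (tree_base, tree_type_node, tree_decl_node) and the helper checked_static_cast are hypothetical, made up for illustration, and not taken from any GCC patch.

```cpp
#include <type_traits>

// Hypothetical small hierarchy with no virtual functions.
struct tree_base { int code; };
struct tree_type_node : tree_base {};
struct tree_decl_node : tree_base {};

// Downcast helper: compiles only when the target class derives from (or
// is) the source class.  The cast itself is still an unchecked
// static_cast, so the caller must know the object's real class.
template <typename To, typename From>
To checked_static_cast (From *p)
{
  typedef typename std::remove_pointer<To>::type target;
  static_assert (std::is_base_of<From, target>::value,
                 "target must derive from the source class");
  return static_cast<To> (p);
}
```

Calling checked_static_cast<tree_type_node *> on a tree_base * compiles, while calling it on a tree_decl_node * is rejected by the compiler with the message above rather than surfacing later as odd behaviour.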
Re: Great example of why everything is a tree sucks
On 11/12/13 13:35, Jakub Jelinek wrote: On Tue, Nov 12, 2013 at 12:59:47PM -0700, Jeff Law wrote: So I lost something like 3 hrs last night due to writing a hunk of code like this if (INTEGRAL_TYPE_P (gimple_assign_lhs (stmt))) INTEGRAL_TYPE_P is a macro, which accepts everything, just adding a TYPE_CHECK to that macro would be sufficient to catch that Yes, I know full well that I could hack up INTEGRAL_TYPE_P to detect this case and my brain damage would have been caught via the check sometime during building the runtime libraries or the stage2 build. My point is the mere need to hack up INTEGRAL_TYPE_P in that way is a result of a fundamental misdesign of the tree structures. If the structures were properly designed what I did would have been flagged as a compile error. It's that fundamental mis-design that we're trying to correct now with the work from Andrew, David and others. Jeff
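A minimal sketch of what "flagged as a compile error" could look like once types and objects are distinct static types. The names type_node, value_node and integral_type_p are hypothetical and only illustrate the idea; this is not the actual design being worked on.

```cpp
#include <type_traits>

// Hypothetical: types and objects are separate, unrelated classes.
struct type_node  { bool integral; };
struct value_node { const type_node *type; };

// The predicate accepts types only.
inline bool integral_type_p (const type_node *t)
{
  return t->integral;
}

// Passing an object where a type is expected cannot compile, because no
// conversion from value_node * to const type_node * exists.
static_assert (!std::is_convertible<value_node *, const type_node *>::value,
               "objects must not convert to types");
```

Here integral_type_p (lhs.type) is fine, while integral_type_p (&lhs) is rejected by the compiler immediately, with no need to wait for a run-time check during a stage2 build.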
Re: Great example of why everything is a tree sucks
On 11/12/13 14:19, Alec Teal wrote: The name David Malcolm comes to mind, I remember watching a GCC ... bucket, tub, some sort of large container (pot?) talk on it. He was replacing all the macros with a class with no virtuals (only one data member, as used by the macros in effect) and so forth and using inheritance, doesn't that solve this? (or won't that solve this? --future tense) Yup. My rant was to show a very clear, current, example of the kind of things that we're looking to fix with Andrew and David's work on our core data structures. Jeff