Re: Activate -mrecip with -ffast-math?
On 6/18/07, Uros Bizjak [EMAIL PROTECTED] wrote:
> On 6/18/07, tbp [EMAIL PROTECTED] wrote:
>> Until now, the contract was: you have to deal with (and contain) NaN and infinities. Fair enough, even if tricky, that remained manageable. But if I can't expect a mere division by 0, or sqrt of 0 (quite common with FTZ/DAZ on), to give me respectively an infinity and 0, and instead get a NaN (which I can't filter, remember?) because of the NR round, that's pure madness.
>
> The attached patch should fix these troubles for the cost of 2 extra clocks. The trick is to limit the result of rsqrt to just below infinity, and this keeps 0.0*(inf-) -> 0.0.

I guess I'm still confused how this will fix sqrt(x) -> rsqrt for x == 0, so can we have a testcase enumerating the now-bogus cases?

Thx,
Richard.

> Uros.

Index: i386.c
===================================================================
--- i386.c (revision 125790)
+++ i386.c (working copy)
@@ -22590,7 +22590,7 @@ void ix86_emit_swdivsf (rtx res, rtx a,
 void ix86_emit_swsqrtsf (rtx res, rtx a, enum machine_mode mode,
                          bool recip)
 {
-  rtx x0, e0, e1, e2, e3, three, half;
+  rtx x0, e0, e1, e2, e3, three, half, bignum;
 
   x0 = gen_reg_rtx (mode);
   e0 = gen_reg_rtx (mode);
@@ -22600,15 +22600,18 @@ void ix86_emit_swsqrtsf (rtx res, rtx a,
 
   three = CONST_DOUBLE_FROM_REAL_VALUE (dconst3, SFmode);
   half = CONST_DOUBLE_FROM_REAL_VALUE (dconsthalf, SFmode);
+  bignum = gen_lowpart (SFmode, GEN_INT (0x7f7f));
 
   if (VECTOR_MODE_P (mode))
     {
       three = ix86_build_const_vector (SFmode, true, three);
       half = ix86_build_const_vector (SFmode, true, half);
+      bignum = ix86_build_const_vector (SFmode, true, bignum);
     }
 
   three = force_reg (mode, three);
   half = force_reg (mode, half);
+  bignum = force_reg (mode, bignum);
 
   /* sqrt(a) = 0.5 * a * rsqrtss(a) * (3.0 - a * rsqrtss(a) * rsqrtss(a))
      1.0 / sqrt(a) = 0.5 * rsqrtss(a) * (3.0 - a * rsqrtss(a) * rsqrtss(a)) */
@@ -22617,6 +22620,9 @@ void ix86_emit_swsqrtsf (rtx res, rtx a,
   emit_insn (gen_rtx_SET (VOIDmode, x0,
                           gen_rtx_UNSPEC (mode, gen_rtvec (1, a),
                                           UNSPEC_RSQRT)));
+  emit_insn (gen_rtx_SET (VOIDmode, x0,
+                          gen_rtx_SMIN (mode, x0, bignum)));
+
   /* e0 = x0 * a */
   emit_insn (gen_rtx_SET (VOIDmode, e0,
                           gen_rtx_MULT (mode, x0, a)));
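To make the x == 0 case concrete, here is a scalar sketch of one Newton-Raphson round, following the two formulas in the patch comment. It rests on assumptions: the hypothetical `rsqrt_estimate` stands in for rsqrtss and returns an exact 1/sqrt(a) (the real instruction only gives an approximation), plus +inf for a zero input (-0 is not modeled). `FLT_MAX` (bit pattern 0x7f7fffff) plays the role of "just below infinity". The sketch must be compiled without -ffast-math, or the NaN behavior it demonstrates is not preserved.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <limits>

// Hypothetical stand-in for the rsqrtss estimate. Assumption: modeled as an
// exact 1/sqrt; the property that matters here is that a zero input gives +inf.
static float rsqrt_estimate(float a) {
    if (a == 0.0f)
        return std::numeric_limits<float>::infinity();
    return 1.0f / std::sqrt(a);
}

// One Newton-Raphson round, following the formulas in the patch comment:
//   sqrt(a)   = 0.5 * a * x0 * (3.0 - a * x0 * x0)
//   1/sqrt(a) = 0.5 * x0     * (3.0 - a * x0 * x0)
// When clamp is true, x0 is first limited to FLT_MAX, mimicking the SMIN
// against "bignum" that the patch adds.
static float nr_sqrt(float a, bool recip, bool clamp) {
    float x0 = rsqrt_estimate(a);
    if (clamp)
        x0 = (x0 < FLT_MAX) ? x0 : FLT_MAX;  // min(x0, bignum)
    float e0 = x0 * a;                       // inf * 0 -> NaN without the clamp
    float e1 = e0 * x0;                      // a * x0 * x0
    float e2 = 3.0f - e1;
    float e3 = 0.5f * (recip ? x0 : e0);
    return e3 * e2;
}
```

Without the clamp, a == 0 yields NaN for both sqrt and 1/sqrt because e0 is inf * 0. With it, e0 becomes FLT_MAX * 0 = 0, so sqrt(0) comes out as 0, and 1/sqrt(0) computes 0.5 * FLT_MAX * 3, which overflows to +inf. Those are the results tbp asked for.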
Re: Activate -mrecip with -ffast-math?
On 6/18/07, Richard Guenther [EMAIL PROTECTED] wrote:
> No, that's not the contract with -ffast-math. Note that -ffast-math enables -funsafe-math-optimizations, which is allowed to change results (add/remove rounding operations, contract expressions, do transforms like a/b to a * 1/b, do transformations that get you bigger errors than 0.5ulp, etc.)

I can't expect a division by a constant to survive -ffast-math unscathed, but then that's a change in precision and manageable. Being handed a NaN that I'm not supposed to see, in a common case, because of some transformation is something else entirely.

>> But if I can't expect a mere division by 0, or sqrt of 0 (quite common with FTZ/DAZ on), to give me respectively an infinity and 0, and instead get a NaN (which I can't filter, remember?) because of the NR round, that's pure madness.
>
> Hm, which particular case are you concerned about (maybe it was mentioned, but I don't remember the details)? Note that -ffast-math enables -ffinite-math-only as well, so the compiler assumes nothing will result in NaNs or Infs.

Yes, and that's why it's such a pain to handle them correctly while in -ffast-math. But if I generate some, then I get what I've asked for (and I'm in for a local fix). Fair enough. I'm not going to give up, for example, fast robust SSE ray/AABB slab tests (or ray/plane or...) because of some arbitrary rule; the hardware handles it just fine (yes, there's a penalty, but it's still way faster than branching).

For example, when doing 1/x and sqrt(x) via reciprocal + NR, you first get an inf from said reciprocal, which then turns into a NaN in the NR stage, but you can correct it by, say, doing a comparison to 0 and an 'and'. That's what ICC used to do behind your back, and that's what you'll find on page 151 of the AMD Family 10h (amdfam10) optimization manual, because it's a common case. As far as I can see, there's no such provision in the current patch.
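The compare-and-mask fix-up described above can be sketched in scalar C++. In this sketch, `and_mask` plays the role of andps, and the `a != 0.0f` test plays the role of a not-equal compare that produces an all-ones or all-zeros mask. The rsqrt estimate is again modeled, by assumption, as an exact 1/sqrt with +inf at zero. This illustrates the idea only; it is not the exact sequence ICC or the AMD manual emits.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Bitwise AND of a float with a 32-bit mask: the scalar analogue of andps.
static float and_mask(float v, std::uint32_t mask) {
    std::uint32_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    bits &= mask;
    std::memcpy(&v, &bits, sizeof v);
    return v;
}

// sqrt(a) computed as a * rsqrt(a), with the a == 0 case repaired by a
// compare-and-mask. Assumption: rsqrt(0) is modeled as +inf.
static float sqrt_via_rsqrt_masked(float a) {
    float r = (a == 0.0f) ? std::numeric_limits<float>::infinity()
                          : 1.0f / std::sqrt(a);
    float s = a * r;                                      // 0 * inf -> NaN when a == 0
    std::uint32_t mask = (a != 0.0f) ? 0xFFFFFFFFu : 0u;  // all ones unless a == 0
    return and_mask(s, mask);                             // NaN & 0 -> +0.0f
}
```

The mask costs a compare and an AND but no branch, which is why this approach suits SIMD code where some lanes may be zero and others not.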
At the very least, provide a means to take care of those NaNs without losing my sanity, like a way to enforce the argument order of min/max[ss|ps|pd] without resorting to inline asm.

> Well - certainly another reason for the Math BOF ;) We all expect very different things from -ffast-math or -funsafe-math-optimizations.

You mean fast unsafe? I think there's quite a margin between letting someone shoot himself in the foot and putting a gun to his head.
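The argument-order request relies on a rule of SSE: minss/maxss return the second (source) operand whenever the comparison is false, and that includes the case where either operand is NaN. The sketch below models that rule with `a < b ? a : b` and shows how operand order decides whether a NaN is filtered out. The helper names are hypothetical. The sketch only models the instruction semantics; it does not show what GCC guarantees to emit under -ffast-math, which is exactly the point under discussion.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <limits>

// Scalar models of minss/maxss: the first operand is returned only when the
// comparison is true; otherwise (either operand NaN, or both zero) the
// second operand is returned.
static float sse_min(float a, float b) { return a < b ? a : b; }
static float sse_max(float a, float b) { return a > b ? a : b; }

// Putting the value to be filtered first and the bound second turns a NaN
// into the bound: max(NaN, lo) yields lo, and min(lo, hi) yields lo.
static float clamp_filter_nan(float x, float lo, float hi) {
    return sse_min(sse_max(x, lo), hi);
}
```

With the operands swapped, `sse_min(FLT_MAX, x)` would pass a NaN in `x` straight through. That asymmetry is why the order matters.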
Re: More vectorizer testcases?
On 6/18/2007 1:26 PM, Dorit Nuzman wrote:
> these 3 are actually not so simple... the main thing that's blocking 2 of them right now is that they need support for stores with gaps, which can be added, except the other problem is that the vectorizer thinks it's not profitable to vectorize them (or rather 2 of them, as does ICC by the way).

When you say not profitable, is that target-dependent? I would be satisfied if the vectorizer can vectorize it *but* prefers not to do it because it can be done more efficiently on the specific target. Of course, it would be interesting to still force the vectorizer to produce the code, so as to compare the vectorized version with the non-vectorized version and see if it is really right. Is there (will there be) an option to turn off cost-based estimation within the vectorizer?

> Since the time you opened these PRs we came quite a bit closer to vectorizing these (the support for interleaved accesses and for multiple data-types were also required). It will be fun to add the last missing bit - the support for the stores-with-gaps. I hope we'll get to it before too long...

Nice! I'm looking forward to it!

> If you have other (hot) code examples that expose different missing features I think that's always interesting to know about (but if it's like the codes above then maybe it will not have much added value...).

I have dozens and dozens of loops which I believe could be vectorized and are not. I don't know whether they are related to store-with-gaps or not, though. So, well, I'll open the bug reports and let you do the analysis. Feel free to close them as duplicates if you think they're not worth keeping open on their own.

--
Giovanni Bajo
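For readers unfamiliar with the term, here is a hypothetical loop whose store group has a gap. Each iteration writes three of the four floats in a record and leaves the fourth untouched, so full-width vector stores would clobber the untouched field unless the vectorizer supports stores with gaps (or masked stores). This example does not claim that any particular GCC version vectorizes the loop, or refuses to.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 4-float record; only x, y and z are written by the loop.
struct Rec { float x, y, z, pad; };

// Interleaved store group with a gap: members 0, 1 and 2 of each record
// are stored, member 3 (pad) never is.
static void scale_xyz(Rec* out, const Rec* in, std::size_t n, float s) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i].x = in[i].x * s;  // store group member 0
        out[i].y = in[i].y * s;  // store group member 1
        out[i].z = in[i].z * s;  // store group member 2
        // out[i].pad is never written: this is the gap
    }
}
```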
Re: Activate -mrecip with -ffast-math?
On 6/18/07, Giovanni Bajo [EMAIL PROTECTED] wrote:
> I understand your problems, but let me state that your objections are totally subjective. *You* need a specific behaviour from -ffast-math (eg: keep NaN/Inf), but that's not what *I* need. So, we have different goals.

No. My NaNs are my problem. Those generated by gcc aren't. At the very least, provide a canonical (efficient) way to filter them (i.e., SSE min/max).
Re: Some thoughts about steerring commitee work
>>> However, as far as I know (also from talking with the SLP authors) pretty much all the opportunities they had found at the time were in loops.
>>
>> I can hand you more than the testcases I've given so far. There is tons of code out there that would benefit from straight line
>
> Interesting. I wasn't aware of this potential. Please do send some of this code. thanks!
>
>> vectorization. In fact, we have some that gets written in loop form right now just so it gets vectorized!
>
> that doesn't sound like such a bad idea to me... :-) (seriously - isn't it more intuitive for the programmer and informative for the compiler to use loops when possible? of course, I haven't seen the code you're talking about so maybe it doesn't apply to the cases you're referring to)
>
>>> we'll have to have a much better cost model before we start packing random sequences of stmts out of loops.
>>
>> This I'm happy to agree on, but it does not change that I am disappointed that you have tied the SLP implementation to loops so heavily.
>
> I think the SLP that we did should really be viewed more as extending the vectorizer to also consider intra-iteration DLP (in addition to inter-iteration DLP). We used the term SLP because there is a lot of analogy to SLP, and a lot of people are familiar with this term, but we are not doing SLP per se. And, as I said before, I see the dependence on the existing loop infrastructure as an advantage, and a way to efficiently vectorize a lot of SLP-like code without writing a whole new vectorizer. I don't debate that there's room to also implement real basic-block SLP (well, at least now that I hear that there's tons of code that can benefit from it), but I don't think you should be so disappointed... :-)
>
> thanks,
> dorit
>
>> Simply because you can't find cases in SPEC2000 doesn't mean it's not useful.
>
> I don't know where you're taking this from. SPEC2000 is really not so interesting vectorization-wise, inside or outside loops.

This was from some private mails I received about how it is not useful for benchmarks.

HTH,
Dan
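As an illustration of the two forms being discussed, here are two hypothetical functions, not taken from anyone's code base. The first writes four independent additions as straight-line code, which only a basic-block SLP vectorizer would see. The second rewrites the same computation as a loop so that the loop vectorizer can pick it up. Both compute the same result; whether a given compiler actually vectorizes either one is not claimed here.

```cpp
#include <cassert>

// Straight-line form: four independent, isomorphic statements.
static void add4_straight(float* a, const float* b, const float* c) {
    a[0] = b[0] + c[0];
    a[1] = b[1] + c[1];
    a[2] = b[2] + c[2];
    a[3] = b[3] + c[3];
}

// The same computation "materialized" as a loop for a loop-based vectorizer.
static void add4_loop(float* a, const float* b, const float* c) {
    for (int i = 0; i < 4; ++i)
        a[i] = b[i] + c[i];
}
```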
Re: Incorrect bitfield aliasing with Tree SSA
> If it was designed properly in the first place, there simply would *be no problem at the tree level*, because nothing would have broken.

That's certainly a point of view. The other is that the RTL implementation predates the Tree one and works fine in GCC 3.x, including for the C compiler. One would have thought that the Tree implementation would be aware of it instead of overlooking it, given that alias.c is shared among them.

> So far you guys have resisted what seem like perfectly reasonable solutions by Adam

You mean the patch that would have disabled the whole thing at the RTL level? I'm sure that we can devise something better.

--
Eric Botcazou
Re: Incorrect bitfield aliasing with Tree SSA
> That is not the example case we have given where this breaks. The case where it breaks is exactly the case I have shown you. We have a pointer to a structure, and because you have not recorded the type's alias relationships properly, we claim dereferences that are offset from the structure cannot access the field. This is a direct consequence of trying to use the parent's alias set for that of the child type, instead of creating a new alias set.

Let me try to explain it this way: if you have a structure where all fields are nonaddressable, EVERY reference should have the same alias set, that of the structure. So they all conflict, as they should, and no other reference will conflict, which is also correct.

It sounds like there's a bug here in that somebody is using the wrong alias set somewhere. All the RTL dumps posted in this thread (and the related one in gcc-patches) look correct, so it's right at the RTL level. That means the only place it could be wrong is at the tree level, but get_alias_set also does the right thing.
Re: More vectorizer testcases?
Giovanni Bajo [EMAIL PROTECTED] wrote on 18/06/2007 15:06:48:

> On 6/18/2007 1:26 PM, Dorit Nuzman wrote:
>> these 3 are actually not so simple... the main thing that's blocking 2 of them right now is that they need support for stores with gaps, which can be added, except the other problem is that the vectorizer thinks it's not profitable to vectorize them (or rather 2 of them, as does ICC by the way).
>
> When you say not profitable, is that target-dependent? I would be satisfied if the vectorizer can vectorize it *but* prefers not to do it

that's a fair point. In the one case (the 3x3 matrices and loop-bounds=3) the vectorizer just can't handle sizes that don't evenly divide the vector size. Even when that is extended (by conceptually unrolling the loop by 4 to be able to pack into 3 vectors of size 4), it won't help this case because the loop bound is only 3. This particular testcase fails to vectorize even without the newly added initial cost model, just based on the fact that the loop count is less than the vector size (this is not a target-dependent decision). ICC is reported to also choose not to vectorize this loop. The other two loops we just can't vectorize yet (and ICC chooses not to vectorize either of them because it thinks it's not profitable).

> because it can be done more efficiently on the specific target. Of course, it would be interesting to still force the vectorizer to produce the code, so as to compare the vectorized version with the non-vectorized version and see if it is really right. Is there (will there be) an option to turn off cost-based estimation within the vectorizer?

the choice not to vectorize when the loop bound is less than the vectorization factor is the only cost estimation that is hard-coded. The rest of the cost model is controlled by the flag -fvect-cost-model.

dorit

>> Since the time you opened these PRs we came quite a bit closer to vectorizing these (the support for interleaved accesses and for multiple data-types were also required).
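The arithmetic behind the 3x3 case can be written down directly. The helpers below are hypothetical and only mirror Dorit's description. An interleaving group of 3 elements per iteration with 4-lane vectors gives lcm(3, 4) = 12, so 4 scalar iterations must be unrolled to fill 3 whole vectors, which is more iterations than a loop bound of 3 provides. The last helper encodes the one hard-coded rule she mentions: do not vectorize when the loop count is below the vectorization factor.

```cpp
#include <cassert>

static unsigned gcd_u(unsigned a, unsigned b) {
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

static unsigned lcm_u(unsigned a, unsigned b) { return a / gcd_u(a, b) * b; }

// Scalar iterations to unroll so that a group of `group` elements per
// iteration fills whole vectors of `vf` lanes: lcm(group, vf) / group.
static unsigned unroll_for_group(unsigned group, unsigned vf) {
    return lcm_u(group, vf) / group;
}

// Number of full vectors produced by that unrolled body: lcm(group, vf) / vf.
static unsigned vectors_for_group(unsigned group, unsigned vf) {
    return lcm_u(group, vf) / vf;
}

// The hard-coded check described above: a loop with fewer iterations than
// the vectorization factor is not vectorized.
static bool enough_iterations(unsigned niters, unsigned vf) {
    return niters >= vf;
}
```

For the 3x3 case, `unroll_for_group(3, 4)` is 4 and `vectors_for_group(3, 4)` is 3, while `enough_iterations(3, 4)` is false.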
gcov / gcov-dump
I'm writing a tool which reads information (arcs) from the .gcno file produced by GCC with -ftest-coverage. It calculates the NPATH complexity (the number of execution paths in a function).

While doing this, I found that the graph generated by GCC contains more paths than I expected. For every function call it seems to generate emergency exits in case exceptions occur (I'm guessing here). I found some documentation in gcov-io.h, but I still have trouble figuring out how to interpret these paths. In some cases inline C++ code appears in the graph, but in simple cases it doesn't.

The .gcno files seemed a simple way to get access to the arcs, but I'm not sure the data is rich enough to work out the NPATH complexity (for simple cases, it works well).

Any ideas where I can find more information to solve this?

Eddy
--
http://sourceforge.net/projects/gnocchi
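Here is a minimal sketch of the path counting such a tool needs. It makes several assumptions: the arcs have already been read from the .gcno file into a hypothetical `Arc` struct, back edges have been removed, and blocks are numbered so that every arc goes from a lower to a higher block number. It is a simplified stand-in for NPATH, because it counts acyclic entry-to-exit paths rather than applying Nejmeh's exact definition. The `skip_fake` switch shows one way to leave out the extra call-to-exit arcs. The flag value `kArcFake` = 2 is an assumption meant to mirror `GCOV_ARC_FAKE` in gcov-io.h, so check it against the header of the GCC version that wrote the file.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical arc record: source block, destination block, flags word.
struct Arc { unsigned src, dst, flags; };

// Assumption: bit value of the "fake" arc flag (check GCOV_ARC_FAKE).
constexpr unsigned kArcFake = 2;

// Counts entry-to-exit paths by dynamic programming. paths[b] is the number
// of paths from block b to the exit block. Blocks are visited from the
// highest number down, which is valid only if every arc goes from a lower
// to a higher block number (no back edges).
static std::uint64_t count_paths(unsigned nblocks, const std::vector<Arc>& arcs,
                                 unsigned entry, unsigned exit, bool skip_fake) {
    std::vector<std::uint64_t> paths(nblocks, 0);
    paths[exit] = 1;
    for (unsigned b = nblocks; b-- > 0;) {
        if (b == exit)
            continue;
        for (const Arc& a : arcs)
            if (a.src == b && !(skip_fake && (a.flags & kArcFake)))
                paths[b] += paths[a.dst];
    }
    return paths[entry];
}
```

For example, a diamond (0 to 1, 0 to 2, 1 to 3, 2 to 3) has 2 paths from block 0 to block 3. Adding a fake arc from 0 to 3 raises the count to 3 unless fake arcs are skipped.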
Re: I'm sorry, but this is unacceptable (union members and ctors)
On 6/17/07, michael.a [EMAIL PROTECTED] wrote:
> I appreciate the thought, but there is sort of an imperative with this effort to shy away from Boost/STL/virtual inheritance completely.

You'd be hard-pressed to find any instance of dynamic polymorphism anywhere in Boost. Most of Boost is based on compile-time template tricks. For example, you have been pointed to boost::optional<T>; here's boost/optional/optional.hpp, where it's implemented. See if you can find "virtual" anywhere:

http://www.boost.org/boost/optional/optional.hpp
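To illustrate that this kind of class needs no virtual functions, here is a deliberately minimal optional-like holder written with C++11 features. It is not Boost's implementation. It only uses the same general technique: raw aligned storage plus placement new, so that T's constructor and destructor run only while a value is engaged. Copying is deleted to keep the sketch short, and the `tiny_optional` name is hypothetical.

```cpp
#include <cassert>
#include <new>
#include <utility>

// Minimal optional-like holder: no virtual functions, everything is
// resolved at compile time through the template parameter T.
template <typename T>
class tiny_optional {
public:
    tiny_optional() : ptr_(nullptr) {}
    ~tiny_optional() { reset(); }
    tiny_optional(const tiny_optional&) = delete;             // kept minimal
    tiny_optional& operator=(const tiny_optional&) = delete;

    // Constructs a T in place inside the raw storage.
    template <typename... Args>
    T& emplace(Args&&... args) {
        reset();
        ptr_ = ::new (static_cast<void*>(storage_)) T(std::forward<Args>(args)...);
        return *ptr_;
    }

    // Runs T's destructor if a value is engaged.
    void reset() {
        if (ptr_) {
            ptr_->~T();
            ptr_ = nullptr;
        }
    }

    bool has_value() const { return ptr_ != nullptr; }
    T& get() { return *ptr_; }  // precondition: has_value()

private:
    alignas(T) unsigned char storage_[sizeof(T)];
    T* ptr_;  // points into storage_ while engaged, null otherwise
};
```

Keeping the pointer returned by placement new, rather than reinterpreting the storage on each access, avoids relying on casts that later standards restrict.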
Re: Incorrect bitfield aliasing with Tree SSA
> If it was designed properly in the first place, there simply would *be no problem at the tree level*, because nothing would have broken.

It's possible to have bugs at any time, and that's all we have here: somebody is using the wrong alias set someplace. We fix that and all is OK.

> So far you guys have resisted what seem like perfectly reasonable solutions by Adam

Because they turn off the feature rather than use it.

I still don't understand what the difficulty is here, and why you persist in thinking that the alias set of the type of a non-addressable field has any use at all: it doesn't, and it should be COMPLETELY ignored. All the code has been written to do that. Somebody is trying to compute an alias set directly instead of using get_alias_set, and when you find that, you'll find the bug.
missing symbols
In the following code, compiled with

g++ cls.cc -Wall -W -g3 -o cls

why are only the virtual functions f1, f2 and the constructor listed by nm? Are debugging symbols included in the executable output file only for virtual functions?

//cls.cc
#include <iostream>
using namespace std;

class test {
public:
    int u;
    test(int t) { u = t; }
    virtual void f1() { cout << u << endl; }
    virtual void f2(int t) { u = t; }
    void f3(int t) { u = t; }
};

int main(int argc, char **argv)
{
    test t(100);
}

$ nm -C cls | grep test::
0804874e W test::f1()
08048740 W test::f2(int)
08048728 W test::test(int)
Re: missing symbols
On 6/18/07, costin_c [EMAIL PROTECTED] wrote:
> In the following code, compiled with g++ cls.cc -Wall -W -g3 -o cls, why are only the virtual functions f1, f2 and the constructor listed by nm? Are debugging symbols included in the executable output file only for virtual functions?
>
> //cls.cc
> #include <iostream>
> using namespace std;
> class test {
> public:
>     int u;
>     test(int t) { u = t; }
>     virtual void f1() { cout << u << endl; }
>     virtual void f2(int t) { u = t; }
>     void f3(int t) { u = t; }
> };
> int main(int argc, char **argv) { test t(100); }
>
> $ nm -C cls | grep test::
> 0804874e W test::f1()
> 08048740 W test::f2(int)
> 08048728 W test::test(int)

Weird: the assembler file generated with the -S option includes all the information about the test class methods (test, f1, f2, f3):

..
.string "test::test"
.long 0x5d40
.string "test::f2"
.long 0x5d70
.string "test::f3"
.long 0x5e4d
.string "test::f1"
.long 0x5e71
.string "main"
Re: Incorrect bitfield aliasing with Tree SSA
On 6/18/07, Eric Botcazou [EMAIL PROTECTED] wrote:
>> If it was designed properly in the first place, there simply would *be no problem at the tree level*, because nothing would have broken.
>
> That's certainly a point of view. The other is that the RTL implementation predates the Tree one and works fine in GCC 3.x, including for the C compiler. One would have thought that the Tree implementation would be aware of it instead of overlooking it, given that alias.c is shared among them.

Uh, except as we've discovered, the RTL uses alias set 0, so whatever alias set you choose for these doesn't matter anyway at the RTL level.

>> So far you guys have resisted what seem like perfectly reasonable solutions by Adam
>
> You mean the patch that would have disabled the whole thing at the RTL level? I'm sure that we can devise something better.

No, I mean the idea of making it a different alias set than the parent, but a subset of the parent.

> --
> Eric Botcazou
Re: missing symbols
On 6/18/07, costin_c [EMAIL PROTECTED] wrote:
> On 6/18/07, costin_c [EMAIL PROTECTED] wrote:
>> In the following code, compiled with g++ cls.cc -Wall -W -g3 -o cls, why are only the virtual functions f1, f2 and the constructor listed by nm?

Because f1 and f2 are needed for the vtable (and the constructor is called from main). f3, on the other hand, is implicitly inline (it is defined inside the class body) and is not used, so it is not output.

> Weird: the assembler file, generated by the -S option, includes all the information about the test class methods: test, f1, f2, f3

Not really, because that is what you get with -g3: debug information describing all the methods, whether or not code is emitted for them.

Thanks,
Andrew Pinski
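If the goal is to see f3 in the nm output, one option is GCC's `used` function attribute. The GCC manual says this attribute forces code to be emitted for a function even when it appears to be unreferenced. Another option is `-fkeep-inline-functions`, which makes G++ emit all inline functions. Below is a sketch of the class with that single change. The resulting nm output is not shown here because it depends on the toolchain.

```cpp
#include <cassert>
#include <iostream>

class test {
public:
    int u;
    test(int t) : u(t) {}
    virtual void f1() { std::cout << u << std::endl; }
    virtual void f2(int t) { u = t; }
    // "used" asks GCC to emit code for f3 even though nothing calls it,
    // so test::f3(int) should then appear in the nm listing.
    __attribute__((used)) void f3(int t) { u = t; }
};
```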
Re: Incorrect bitfield aliasing with Tree SSA
On 6/18/07, Daniel Berlin [EMAIL PROTECTED] wrote:
> On 6/18/07, Eric Botcazou [EMAIL PROTECTED] wrote:
>>> If it was designed properly in the first place, there simply would *be no problem at the tree level*, because nothing would have broken.
>>
>> That's certainly a point of view. The other is that the RTL implementation predates the Tree one and works fine in GCC 3.x, including for the C compiler. One would have thought that the Tree implementation would be aware of it instead of overlooking it, given that alias.c is shared among them.
>
> Uh, except as we've discovered, the RTL uses alias set 0, so whatever alias set you choose for these doesn't matter anyway at the RTL level.
>
>>> So far you guys have resisted what seem like perfectly reasonable solutions by Adam
>>
>> You mean the patch that would have disabled the whole thing at the RTL level? I'm sure that we can devise something better.
>
> No, I mean the idea of making it a different alias set than the parent, but a subset of the parent.

Also, unique from the alias set of the other type (i.e., int:31 has a different alias set than int).
Re: I'm sorry, but this is unacceptable (union members and ctors)
michael.a: A proper C++ style fix would require the introduction of new syntax rather than tagging unions or such. The dominant ctors would have to be specified, or unions themselves could simply be allowed ctors that override the member ctors. Call them constructor overloads or something: no new syntax or revolutionary semantics, just a quick and easy fix.

If all you need is one member that has constructors / destructors, and all other members are PODs that provide an alternate view of the contents, then I think that would make a logical extension of the transparent union extension. A transparent union is passed to functions in the same manner as its first member. You could define that a transparent union is allowed to have as its first member a class with constructors and/or destructors, and that these constructors / destructors are then the constructors / destructors of the union.

Caveat: if the union is larger or more aligned than its first member, the argument-passing semantics don't make sense. This is documented in extend.texi: "Second, the argument is passed to the function using the calling conventions of the first member of the transparent union, not the calling conventions of the union itself. All members of the union must have the same machine representation; this is necessary for this argument passing to work properly."

There is also a syntax example for __attribute__ ((__transparent_union__)) in extend.texi. Inside the compiler, you can check whether a union is a transparent union using the TYPE_TRANSPARENT_UNION macro.

Joe Buck: I wouldn't object if someone implemented a clean extension. The problem with extensions, though, is documenting how all the corner cases work, and making sure that they all get tested. This is somewhat easier when you're cloning someone else's extension, because the other implementation can be used for comparison.
To avoid having too many corner cases, you can keep the defined functionality small and well delineated, and declare anything beyond this scope either as invoking undefined behaviour (simplest for the implementation: just make sure you don't ICE) or as a constraint violation (i.e. you should make sure that the compiler produces an error; the benefit is that it prevents people from accidentally starting to use accidental functionality that is not covered by the documented extension).

michael.a: Sometimes extensions just have to be quick and dirty. Microsoft is a major influence. The facilities should be there to match MS whenever within reason... and there should be ever-present warnings not to abuse such facilities.

If you make a quick and dirty hack, you have to be prepared for it not to be maintainable for any length of time.

michael.a: I went to compile a tainted build last night, but I ran into a build error apparently related only to subversion checkouts, which might also be particular to the target Debian distribution / hardware support for some esoteric reason, according to what can be gleaned from Google. So I went to just download the release sources, but all of the mirrors were down for some reason. The error is related to a bison/flex build step, which for some reason can't be completed by autotools or something... I figure it's easier to just go with the release sources as suggested (the relevant .c files are pregenerated in the release trees).

Try contrib/gcc_update --touch after the checkout.
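For comparison, the semantics sketched above can be written out by hand with the unrestricted unions that C++11 later standardized: a first member with constructors, the other members as POD views, and the union's constructors and destructor taken from that first member. This only illustrates the idea in standard syntax; it is not the proposed transparent-union extension. The `Vec3` and `Vec3View` names are hypothetical.

```cpp
#include <cassert>

// First member: a class with user-provided constructors.
struct Vec3 {
    float x, y, z;
    Vec3() : x(0.0f), y(0.0f), z(0.0f) {}
    Vec3(float a, float b, float c) : x(a), y(b), z(c) {}
};

union Vec3View {
    Vec3 v;        // "dominant" member: its constructors become the union's
    float raw[3];  // POD member intended as an alternate view of the contents

    // Because Vec3 has a non-trivial default constructor, C++11 deletes the
    // union's implicit one, so the union forwards to the first member explicitly.
    Vec3View() : v() {}
    Vec3View(float a, float b, float c) : v(a, b, c) {}
    ~Vec3View() { v.~Vec3(); }
};
```

Reading `raw` after writing `v` is type punning, which standard C++ does not bless, so the sketch only accesses `v` through its constructors.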
Re: Incorrect bitfield aliasing with Tree SSA
> Uh, except as we've discovered, the RTL uses alias set 0, so whatever alias set you choose for these doesn't matter anyway at the RTL level.

Only in some cases. That was a kludge put in to fix some obscure bug and left there. I hope we can remove it at some point, and I think we can.

> No, I mean the idea of making it a different alias set than the parent, but a subset of the parent.

Because it *is* the parent in most places, so it should be in all.
Re: Incorrect bitfield aliasing with Tree SSA
> Again, the tree level relies on the documented (in the comments of alias.c) fact that given a structure, the fields contained in a structure will have alias sets that are strict subsets of the parent.

That is ONLY true for fields that don't have DECL_NONADDRESSABLE_P, and that's been the case forever. The documentation might be confusing, but the code has never been.

> The bug reports are about cases where we have a struct foo * (where struct foo contains int a:31), and <foo pointer>->a is claimed to not alias with foo.a.

How can you take a pointer to the bitfield?

> I would much rather maintain the strict subset invariant than the component_uses_parent_alias_set stuff, since this is the documented invariant, and makes sense.

But throws away the entire DECL_NONADDRESSABLE_P mechanism!

Also, how do we handle TYPE_NONALIASED_COMPONENT? It's exactly the same issue?
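A reduced testcase of the shape described in the bug reports might look like the following. It is hypothetical and not taken from the PRs. The store through `p` must be visible to the later load of `f.a` when `p == &f`. A compiler that decides `p->a` and `f.a` cannot alias could return the stale value 1. The `noinline` attribute keeps the call from being folded away at the call site.

```cpp
#include <cassert>

struct foo { int a : 31; };

// Store through the object, then through a pointer that may point to the
// same object, then reload through the object. Correct result when
// p == &f is 2.
__attribute__((noinline)) static int store_then_load(foo* p, foo& f) {
    f.a = 1;
    p->a = 2;
    return f.a;
}
```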
Re: Incorrect bitfield aliasing with Tree SSA
> His first patch, which simply makes #1 true, would cause missed optimization.

It doesn't cause missed optimizations: it completely removes all the functionality of DECL_NONADDRESSABLE_P!
Re: Incorrect bitfield aliasing with Tree SSA
On 6/18/07, Richard Kenner [EMAIL PROTECTED] wrote:
>> Again, the tree level relies on the documented (in the comments of alias.c) fact that given a structure, the fields contained in a structure will have alias sets that are strict subsets of the parent.
>
> That is ONLY true for fields that don't have DECL_NONADDRESSABLE_P, and that's been the case forever. The documentation might be confusing, but the code has never been.
>
>> The bug reports are about cases where we have a struct foo * (where struct foo contains int a:31), and <foo pointer>->a is claimed to not alias with foo.a.
>
> How can you take a pointer to the bitfield?
>
>> I would much rather maintain the strict subset invariant than the component_uses_parent_alias_set stuff, since this is the documented invariant, and makes sense.
>
> But throws away the entire DECL_NONADDRESSABLE_P mechanism!

No, an int * will still not conflict with int:31, and a short * will still not conflict with short:31.

> Also, how do we handle TYPE_NONALIASED_COMPONENT? It's exactly the same issue?

Tell me what TYPE_NONALIASED_COMPONENT does, and I'll tell you what will happen right now :)
Re: Incorrect bitfield aliasing with Tree SSA
On 6/18/07, Richard Kenner [EMAIL PROTECTED] wrote:
>> His first patch, which simply makes #1 true, would cause missed optimization.
>
> It doesn't cause missed optimizations: it completely removes all the functionality of DECL_NONADDRESSABLE_P!

Hence the reason for the second suggestion.
Re: I'm sorry, but this is unacceptable (union members and ctors)
Robert Dewar writes:
> Ross Ridge wrote:
>> t formal definition. Most of GCC's long list of extensions to C are also implemented as extensions to C++, so you've already lost this battle in GNU C++.
>
> And many of them are ill-defined (and some would agree ill-considered). Mistakes in the past are not a good reason for mistakes in the future.
>
>> Trying to add a new feature without an existing implementation only makes it harder to get both a correct formal definition and something that people will actually want to use.
>
> I think the best procedure is to discuss new features from a language design point of view, and the committee is the best forum for that, then implement them as *part* of the (typically fairly drawn out) process of adding a new feature.

There's always a chicken-and-egg problem here: language features that might be good for a standardization proposal need to be tested in real-world applications before anyone knows that they will be useful. Of course, some of gcc's C extensions are ill-considered and caused problems, but one of the reasons we know how ill-considered they are is that they were implemented and people tried to use them. gcc has a role to play as a deployment vehicle for language extensions. The trouble is that it's very hard to kill an extension once people are using it...

Andrew.
Re: Incorrect bitfield aliasing with Tree SSA
On 6/18/07, Richard Kenner [EMAIL PROTECTED] wrote:
>>> But throws away the entire DECL_NONADDRESSABLE_P mechanism!
>>
>> No, an int * will still not conflict with int:31, and a short * will still not conflict with short:31.
>
> Using what mechanism? That's what DECL_NONADDRESSABLE_P does!

Please read what the *second* proposal was again:

1. The alias set is a subset of the parent set, and, more importantly,
2. The alias set is different from that of the underlying type and from the parent set.

Thus, the alias sets will not conflict with the underlying type, but will conflict with the parent set, which is exactly what you want.

> How does this get a different result for trees than RTL?

As I've explained, we rely on the property of the TBAA forest that, given

             struct foo (set 1)
              /              \
    int :31 (set 2)    short :31 (set 3)

the sets for int :31 and short :31 are strict subsets of that of struct foo. This is how it is documented and, except for this one little wart introduced by DECL_NONADDRESSABLE_P, how it is. For the sake of a complete example, we also have int (set 4) and short (set 5), which are both roots of this forest, and thus do not conflict with set 3 or 2.

The forest we have now says:

    struct foo, int :31, short :31 (set 1)

(and int = set 4 and short = set 5).

Note again that in neither forest does set 1 conflict with 4 or 5, but in the first forest, the subset relationship between int:31 and struct foo is properly represented as "subset but different". As I said to Eric, you can also change the strict subset to subset_or_equals, but that is really not quite in accord with reality. They *really are different alias sets than their parent*.

Note also that the first forest is more precise. If you were ever to ask "can int:31 touch short:31?", the first forest would correctly say no; what we have now would say yes.

>> Tell me what TYPE_NONALIASED_COMPONENT does, and I'll tell you what will happen right now :)
>
> Very similar. If I have typedef xyz[100] foo; and mark that type with that flag, it means that int * will not conflict with it, just foo *.

Then these should simply have different, non-conflicting alias sets :)
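Dan's two forests can be checked with a toy model of alias-set conflicts. This is a hypothetical sketch, not GCC's alias.c. Each set records its parent, set 0 conflicts with everything, and two sets conflict when they are equal or one is an ancestor (a superset) of the other.

```cpp
#include <cassert>
#include <vector>

// Toy TBAA forest. parent[s] is the parent of set s, or -1 for a root.
struct AliasForest {
    std::vector<int> parent{-1};  // index 0: the "conflicts with everything" set

    // Creates a new set, optionally as a child (subset) of `par`.
    int new_set(int par = -1) {
        parent.push_back(par);
        return static_cast<int>(parent.size()) - 1;
    }

    // True if `anc` is `s` itself or one of its ancestors.
    bool is_ancestor(int anc, int s) const {
        for (int p = s; p != -1; p = parent[p])
            if (p == anc)
                return true;
        return false;
    }

    // Two sets conflict if either is set 0, or one contains the other.
    bool conflict(int a, int b) const {
        if (a == 0 || b == 0)
            return true;
        return is_ancestor(a, b) || is_ancestor(b, a);
    }
};
```

In the proposed forest, the bitfields get their own child sets of struct foo. There, int:31 conflicts with struct foo but neither with int nor with short:31. In the current forest, the bitfields simply reuse set 1, so asking about int:31 versus short:31 amounts to asking whether set 1 conflicts with set 1, and the answer is yes.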
Re: Some thoughts about steerring commitee work
> I can hand you more than the testcases I've given so far. There is tons of code out there that would benefit from straight line vectorization.

I'm interested in these test cases. Thanks!

> In fact, we have some that gets written in loop form right now just so it gets vectorized!

Maybe loop materialization is useful in such situations?

-
Devang
Re: RFC: Make dllimport/dllexport imply default visibility
[Richard E., please see below for a question re. RealView's behavior.]

Chris Lattner wrote:
>> You and Chris are taking the view that the type has a location. But, a lot of people don't look at it this way. The things that have locations are variables (including class data) and functions. After all, types don't appear in object files; only variables and functions do.
>
> That is a limited view of things based on the current implementation of GCC. When future developments (e.g. LTO) occur, this will change: types certainly do live in object files.

I don't see how LTO changes this. Yes, type definitions will appear in one or more object files. But the intended semantics of LTO are just to do what the linker would do, plus some consistency checking.

>> Furthermore, you're taking the view that __attribute__((visibility("hidden"))) on a type means something about visibility of the type in a linguistic sense, i.e., that it provides some kind of scoping, perhaps like an anonymous namespace that is different in each shared library.
>
> Yes.

That's a possible meaning, but it's not the meaning that was intended. As Danny has said, it's not the meaning that Windows users want. It's also not the meaning that SymbianOS users want.

>> But, the visibility attribute is only specified in terms of its effects on ELF symbols, not as having C++ semantics per se. The hidden visibility attribute says that all members of the class have hidden visibility, unless otherwise specified.
>
> I'll paraphrase this as saying: this is already an extension, not a standard - we can extend the extension without remorse.

Currently, the compiler generates wrong code: it generates a hidden reference to a dllimport'd function. There are two ways to fix that: declare the construct invalid, or make the compiler generate a non-hidden reference. The second option might be an extension to an extension, but it might also just be a bug fix.
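Here is a compilable illustration of the rule that members of a hidden class are hidden unless otherwise specified. It uses GCC attribute syntax, where `visibility` takes a string argument. The member bodies are added only so that the example links. Under this rule, `g` inherits hidden visibility from `S`, while `f` overrides it to default. Visibility only affects which symbols a shared object exports, so nothing observable changes inside a single program.

```cpp
#include <cassert>

// Class-level visibility sets the default for all members; a member-level
// attribute overrides that default.
struct __attribute__((visibility("hidden"))) S {
    __attribute__((visibility("default"))) int f() { return 1; }  // overridden: default
    int g() { return 2; }                                          // inherited: hidden
};
```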
We also allow: struct __attribute__((visibility("hidden"))) S { __attribute__((visibility("default"))) void f(); void g(); }; Because there is no standard to reference, I think it's important to consider these things in terms of explainability. It is very easy (and common) to explain visibility and anon namespaces in terms of types (when applied to a type). Here would be my explanation: The visibility attribute to a class specifies the default visibility for all of its members, including compiler-generated functions and variables. You can override that default by explicitly specifying a different visibility for the members. That seems acceptably simple to me. As long as we allow visibility specifications for the members that are different from the class (independently of whether that is narrower or wider visibility) an explanation in terms of namespaces will require a caveat. For example: Giving a class hidden visibility is similar to putting it in an anonymous namespace shared not just within a single translation unit, but across all translation units in a shared object. However, if you override the visibility of the members of the class, then they may have more or less visibility than specified by the class. ELF operates at a level below C++, and can be used to do things that C++ does not allow. For example, the C++ standard (via the ODR) forbids a single program from having two classes with the same name. But, one of the goals of ELF hidden visibility is to allow that, so that, for example, two plugins can have classes with the same name without conflicting. You can also give two C++ functions the same address via appropriate ELF magic. These sorts of things must be done with care, but they are techniques used by many real programs, and in the hands of experts, useful. I suspect that the realview compiler accepts this as an oversight or a bug, not as an intentional feature. Let's ask.
Richard E., is the fact that RealView 3.0SP1 accepts: class __declspec(notshared) S { __declspec(dllimport) void f(); }; a bug or a feature? If this is considered a bug, is it something that RealView is likely to change in a future release, or will it be preserved for the foreseeable future for backwards compatibility? There are two conflicting goals to balance: 1. Define our extensions as well as possible and make their semantics as explainable and logical as possible. 2. Compile existing code with maximum compatibility. To me, the best way to handle this is to reject this by default (based on #1). To handle #2, add a flag (defaulting to off) to enable this extended extension. In the diagnostic, tell the user about the option, and in the manual document the option and the issue. Good; at this point we've agreed that we should accept the code. Now we're just arguing about whether we accept it by default. That's a less important issue, since at least there will be some way to get the behavior that users want. We have accepted this code: struct
Re: Suffix for __float128 FP constants
On Sun, Jun 17, 2007 at 09:06:36PM +, Joseph S. Myers wrote: On Sun, 17 Jun 2007, Uros Bizjak wrote: I was trying to load a full 128 bit constant into __float128 variable, but with L suffix, I was able to load only XFmode constant. Is there a special suffix for __float128 available in gcc? No; since the x86-64 ABI is what defines the __float128 name, you could ask the associated mailing list about a standard suffix to associate with it. Lack of standard for __float128 is always a problem. Suffix for __float128 constant is one, scanf/printf specifier for __float128 is another. We also don't have a name for string to __float128 function. H.J.
Re: Suffix for __float128 FP constants
H. J. Lu wrote: I was trying to load a full 128 bit constant into __float128 variable, but with L suffix, I was able to load only XFmode constant. Is there a special suffix for __float128 available in gcc? No; since the x86-64 ABI is what defines the __float128 name, you could ask the associated mailing list about a standard suffix to associate with it. Lack of standard for __float128 is always a problem. Suffix for __float128 constant is one, scanf/printf specifier for __float128 is another. We also don't have a name for string to __float128 function. While the __float128 scanf/printf specifier is part of library (and this way, a custom library can provide these functions), the suffix for constant should be covered by the compiler. Otherwise there is no (clear) way to load the 128bit register with a 128bit constant value. BTW: IA64 has the same issues with two FP types (long double XFmode and longer double TFmode). How is this solved for IA64? Uros.
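The precision loss Uros describes can be reproduced one level down with float and double, which every C compiler supports. A minimal sketch, assuming IEEE 754 float and double (the function name is hypothetical): once a decimal literal has been rounded to the narrower type, widening the result cannot restore the lost bits, just as an L-suffixed (XFmode) constant assigned to a __float128 on x86 keeps only long double precision.

```c
#include <stdbool.h>

/* Returns true when a literal rounded to float and then widened to double
   differs from the same literal rounded directly to double.  Storing into
   a float object forces float rounding even where float constants may be
   evaluated with excess precision.  */
static bool narrow_literal_loses_bits (void)
{
  float narrow = 0.1f;          /* 0.1 rounded to float precision        */
  double via_float = narrow;    /* widening keeps the float rounding     */
  double via_double = 0.1;      /* 0.1 rounded once, to double precision */
  return via_float != via_double;
}
```

Without a suffix that names the wider type, the compiler has no way to parse the literal at full 128-bit precision in the first place, which is the point being made about the compiler rather than the library.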
Re: Some thoughts about steerring commitee work
On 6/18/07, Dorit Nuzman [EMAIL PROTECTED] wrote: I can hand you more than the test cases I've given so far. There is tons of code out there that would benefit from straight line Interesting. I wasn't aware of this potential. Please do send some of this code. Thanks! I'm thinking about loops whose bodies contain a call that is not inlined, so the code in that function looks like straight line code, but in fact is called from inside a loop. This could happen even because the compiler decided to outline the body of some loop, as is the case for the OpenMP code gen, or autoparallelization. Sebastian
Re: Some thoughts about steerring commitee work
On 6/18/07, Sebastian Pop [EMAIL PROTECTED] wrote: On 6/18/07, Dorit Nuzman [EMAIL PROTECTED] wrote: I can hand you more than the test cases I've given so far. There is tons of code out there that would benefit from straight line Interesting. I wasn't aware of this potential. Please do send some of this code. Thanks! I'm thinking about loops whose bodies contain a call that is not inlined, so the code in that function looks like straight line code, but in fact is called from inside a loop. This could happen even because the compiler decided to outline the body of some loop, as is the case for the OpenMP code gen, or autoparallelization. This is in fact most of the code I will send to Dorit. It is, IMHO, not reasonable to say that we should inline everything that may ever turn out to be vectorizable :) If you throw virtual functions into the mix, it may not even be possible ;)
Re: Suffix for __float128 FP constants
On Mon, Jun 18, 2007 at 07:25:06PM +0200, Uros Bizjak wrote: H. J. Lu wrote: I was trying to load a full 128 bit constant into __float128 variable, but with L suffix, I was able to load only XFmode constant. Is there a special suffix for __float128 available in gcc? No; since the x86-64 ABI is what defines the __float128 name, you could ask the associated mailing list about a standard suffix to associate with it. Lack of standard for __float128 is always a problem. Suffix for __float128 constant is one, scanf/printf specifier for __float128 is another. We also don't have a name for string to __float128 function. While the __float128 scanf/printf specifier is part of library (and this way, a custom library can provide these functions), the suffix for constant should be covered by the compiler. Otherwise there is no (clear) way to load the 128bit register with a 128bit constant value. BTW: IA64 has the same issues with two FP types (long double XFmode and longer double TFmode). How is this solved for IA64? The same as x86-64 :-(. That is there is __float128 in ia64 psABI. But it isn't fully implemented in gcc and glibc. H.J.
New LTO branch ready
Hi guys, I have merged all patches touching lto/ into the new lto branch. I'm almost 100% positive the result will not compile. There are no interesting conflicts to report (most were just formatting changes). I have not merged ChangeLog.lto onto the new branch, since it looked like it only contained info about changes that were outside of lto/, and the changes to lto/ went into lto/ChangeLog. If you want it, I'm happy to copy it. I will perform merges from mainline to branch every week or two, unless you guys see a good reason not to.
Re: Suffix for __float128 FP constants
BTW: IA64 has the same issues with two FP types (long double XFmode and longer double TFmode). How is this solved for IA64? Uros. This is different on IA64 HP-UX and IA64 Linux. On HP-UX, 128 bits is the standard long double and 80 bits is __float80. We use the 'W' suffix for a __float80 constant on HP-UX. HP-UX also uses a lower case 'w' in math names for functions (e.g. sqrtw) for __float80 functions. Since __float128 == long double on HP-UX we can just use 'L' and 'l' for those. None of which helps on Linux. Steve Ellcey [EMAIL PROTECTED]
Re: Activate -mrecip with -ffast-math?
tbp wrote: For example, when doing 1/x and sqrt(x) via reciprocal + NR, you first get an inf from said reciprocal which then turns into a NaN in the NR stage, unless you correct it by, say, doing a comparison to 0 and an 'and'. That's what ICC used to do behind your back. That's what you'll find on page 151 of the amdfam10 optimization manual. Because that's a common case. As far as I can see, there's no such provision in the current patch. At the very least provide a means to look after those NaNs without losing sanity, like a way to enforce argument order of min/max[ss|ps|pd] without resorting to inline asm. But even if sqrt is corrected for 0.0 * inf, there would still be a lot of problems with the combinations of NR-enhanced rsqrt and rcp. Consider for example: 1.0/sqrt(a/b) alias rsqrt(a/b). Having a=0, b != 0, the result is inf. This expression is mathematically equal to sqrt(b/a) and the compiler is free to do this optimization. In this case, b*rcp(a) produces NaN due to NR of rcp(a) and here we lose. Let's correct both the rsqrt and rcp NR steps for 0.0, so we have NR-rsqrt(0.0) = inf, NR-rcp(0.0) = inf. Again, sqrt(b/a) will create sqrt(inf) = inf * rsqrt(inf), so the NR step for rsqrt will hit (0.0 * inf) from the other side. We lose, because there is no correction for the case where the input operand is infinity. IMO, due to the limited range of operands for the -mrecip pass, (-inf, inf) with 0.0 excluded, it should be kept out of -ffast-math. There is no point in fixing reciprocals only for 0.0; we need to fix both conversions for infinity and 0.0, even in -ffast-math. Uros.
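The failure mode can be reproduced in plain C. A minimal sketch, assuming IEEE 754 single precision and compilation without -ffast-math: nr_sqrt_step (a hypothetical name) applies one refinement, sqrt(a) ~= 0.5 * a * x0 * (3.0 - a * x0 * x0), to an estimate x0 that the caller supplies in place of the rsqrtss result. For a == 0 the estimate is +inf and 0 * inf yields NaN. Clamping the estimate to FLT_MAX, just below infinity, restores sqrt(0) = 0; for a == inf, however, the estimate is 0 and inf * 0 still yields NaN, which is the uncorrected case Uros points out.

```c
#include <float.h>

/* One Newton-Raphson step turning an rsqrt estimate x0 of input a into
   an approximation of sqrt(a):
     sqrt(a) ~= 0.5 * a * x0 * (3.0 - a * x0 * x0)
   x0 stands in for the hardware estimate, which is +inf for a == 0 and
   0 for a == +inf.  If clamp is nonzero, x0 is first limited to FLT_MAX.  */
static float nr_sqrt_step (float a, float x0, int clamp)
{
  if (clamp && x0 > FLT_MAX)
    x0 = FLT_MAX;              /* keep the estimate just below infinity */
  float e = a * x0;            /* NaN when one factor is 0 and the other inf */
  return 0.5f * e * (3.0f - e * x0);
}
```

For example, nr_sqrt_step (4.0f, 0.5f, 1) returns exactly 2.0f, nr_sqrt_step (0.0f, inf, 1) returns 0.0f, and both nr_sqrt_step (0.0f, inf, 0) and nr_sqrt_step (inf, 0.0f, 1) return NaN.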
RE: Fixed-point branch?
Bernd Schmidt wrote: I attached a diff file for 14 files of the new structures and documents. You and other maintainers are welcome to check it. Thanks a lot! Note: 14 files are = genmodes.c mode-classes.def machmode.def machmode.h tree.def tree.h tree.c rtl.def rtl.h rtl.c fixed-value.h fixed-value.c doc/extend.texi doc/rtl.texi doc/c-tree.texi doc/md.texi Random comments.. + unsigned saturating_flag : 1; /* FIXME. This new flag increases the size of + tree_common by a full word. */ Sounds undesirable. We need to look hard for a way to avoid this. Yes, we can get one of 24 spare bits for this flag. We just fixed this issue last week. +ACCUM_MODE (HA, 2, 8, 7); /* s8.7 */ +ACCUM_MODE (SA, 4, 16, 15); /* s16.15 */ +ACCUM_MODE (DA, 8, 32, 31); /* s32.31 */ +ACCUM_MODE (TA, 16, 64, 63); /* s64.63 */ Lots of predefined types and modes in this patch. What about targets with other requirements (the Blackfin has 40 bit (8 + 32) accumulators)? In bfin-modes.def, we can adjust the DA mode to (s7.32) by using ADJUST_IBIT(DA, 7) ADJUST_FBIT(DA, 32) For vectors, we let the targets define the supported modes. Why do we want something else for fractional support? I am not clear about this question. The new modes (FRACT, UFRACT, ACCUM, and UACCUM) enable GCC to recognize the formats of the underlying values to perform constant folding (e.g., + - * /). To use the DA mode for vector, we can use: VECTOR_MODE (ACCUM, DA, 2); +int +fixed_zerop (tree expr) +{ + return TREE_CODE (expr) == FIXED_CST + && double_int_zero_p (TREE_FIXED_CST (expr).data); +} Formatting - this needs parentheses. Elsewhere too. Ok. +static tree +make_or_reuse_fract_type (unsigned size, int unsignedp, int satp) Comments before functions. Ok. Thanks! Regards, Chao-ying
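For readers unfamiliar with the notation, "s16.15" in the SA entry means one sign bit, 16 integer bits and 15 fractional bits packed into 4 bytes, and the saturating_flag presumably marks types whose arithmetic clamps instead of wrapping. A minimal C sketch of such an accumulator in software, assuming two's-complement int32_t; all names are hypothetical and unrelated to the patch's fixed-value.c, which keeps constants in a double_int.

```c
#include <stdint.h>

/* Hypothetical software model of an s16.15 accumulator: a 32-bit signed
   integer whose value is raw / 2^15.  */
typedef int32_t s16_15;

#define S16_15_FBITS 15
#define S16_15_ONE   ((s16_15) 1 << S16_15_FBITS)

/* Convert from double, truncating toward zero; assumes the value fits.  */
static s16_15 s16_15_from_double (double d)
{
  return (s16_15) (d * S16_15_ONE);
}

static double s16_15_to_double (s16_15 x)
{
  return (double) x / S16_15_ONE;
}

/* Saturating addition: add in 64 bits, then clamp to the 32-bit range
   instead of wrapping around.  */
static s16_15 s16_15_add_sat (s16_15 a, s16_15 b)
{
  int64_t sum = (int64_t) a + b;
  if (sum > INT32_MAX)
    return INT32_MAX;
  if (sum < INT32_MIN)
    return INT32_MIN;
  return (s16_15) sum;
}
```

Computing the sum in a wider type turns overflow detection into a plain range check. A target such as the Blackfin, with 40-bit accumulators, would use different integer and fractional bit counts, which is what the ADJUST_IBIT/ADJUST_FBIT lines above address.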
Re: Activate -mrecip with -ffast-math?
Giovanni Bajo wrote: Both our goals are legitimate. But that's not the point. The point is what -ffast-math semantically means (the simplistic list of suboptions activated by it is of course insufficient because it doesn't explain how to behave in the face of new options, like -mrecip). My proposal is: -ffast-math activates all the mathematical-related optimizations that improve code speed while destroying floating point accuracy. I don't think that's a workable proposal. If it is taken literally, it means that the optimization of converting all floating-point arithmetic to no-ops and replacing all references to floating-point variables with zeros is allowed (and would be appropriate under this option). And, personally, I don't think that documentation is of use if it can't be taken reasonably literally. There's a line between what's acceptable and what's not, and regardless of where exactly it is, the documentation needs to fairly clearly indicate its location. - Brooks
Re: Activate -mrecip with -ffast-math?
On Jun 18, 2007, at 2:14 PM, Uros Bizjak wrote: tbp wrote: For example, when doing 1/x and sqrt(x) via reciprocal + NR, you first get an inf from said reciprocal which then turns into a NaN in the NR stage, unless you correct it by, say, doing a comparison to 0 and an 'and'. That's what ICC used to do behind your back. That's what you'll find on page 151 of the amdfam10 optimization manual. Because that's a common case. As far as I can see, there's no such provision in the current patch. At the very least provide a means to look after those NaNs without losing sanity, like a way to enforce argument order of min/max[ss|ps|pd] without resorting to inline asm. But even if sqrt is corrected for 0.0 * inf, there would still be a lot of problems with the combinations of NR-enhanced rsqrt and rcp. Consider for example: 1.0/sqrt(a/b) alias rsqrt(a/b). Having a=0, b != 0, the result is inf. As already stated, -ffast-math turns on -ffinite-math-only, which allows the compiler to assume that a result of inf cannot happen, so gcc is allowed to ignore this possibility. Producing NaN instead of inf seems to be allowed. This expression is mathematically equal to sqrt(b/a) and the compiler is free to do this optimization. In this case, b*rcp(a) produces NaN due to NR of rcp(a) and here we lose. Let's correct both the rsqrt and rcp NR steps for 0.0, so we have NR-rsqrt(0.0) = inf, NR-rcp(0.0) = inf. Again, sqrt(b/a) will create sqrt(inf) = inf * rsqrt(inf), so the NR step for rsqrt will hit (0.0 * inf) from the other side. We lose, because there is no correction for the case where the input operand is infinity. IMO, due to the limited range of operands for the -mrecip pass, (-inf, inf) with 0.0 excluded, it should be kept out of -ffast-math. There is no point in fixing reciprocals only for 0.0; we need to fix both conversions for infinity and 0.0, even in -ffast-math. I think that tbp wants just to ensure that sqrt(0.0)=0.0 even with your various reciprocal and sqrt optimizations.
(I can't test the new code now, but I think he claims that with the new sqrt optimizations sqrt(0.) = NaN; if indeed it does this then I would consider this a bug.) I don't think he wants the optimizations to have to do the right thing when an argument or result of one of these operations is infinite or a NaN. Of course, he can correct me if I'm wrong. Brad
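The compare-and-AND correction tbp refers to can be emulated in scalar C. A minimal sketch, assuming IEEE 754 binary32 floats (mask_estimate and nr_sqrt_masked are hypothetical names): a mask that is all ones where a != 0 and all zeros where a == 0 is ANDed into the rsqrt estimate, mimicking an SSE cmpneqps/andps pair. This turns the +inf estimate for a == 0 into +0.0, so the Newton-Raphson step yields sqrt(0) = 0. Like a clamp, it does nothing for a == inf, which matches Brad's reading that only finite arguments need to be handled.

```c
#include <stdint.h>
#include <string.h>

/* Scalar emulation of "compare a != 0, then AND": returns x0 unchanged
   when a != 0 and +0.0 when a == 0.  */
static float mask_estimate (float a, float x0)
{
  uint32_t bits;
  uint32_t mask = (a != 0.0f) ? UINT32_C (0xFFFFFFFF) : UINT32_C (0);
  memcpy (&bits, &x0, sizeof bits);   /* reinterpret the float's bits */
  bits &= mask;
  memcpy (&x0, &bits, sizeof x0);
  return x0;
}

/* sqrt(a) ~= 0.5 * a * x0 * (3 - a * x0 * x0), using the masked estimate.  */
static float nr_sqrt_masked (float a, float x0)
{
  x0 = mask_estimate (a, x0);
  float e = a * x0;
  return 0.5f * e * (3.0f - e * x0);
}
```

The mask costs a compare and an AND per element, which is the kind of small fixed overhead the thread is weighing against correctness at zero.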
Re: Activate -mrecip with -ffast-math?
On Jun 18, 2007, at 2:27 PM, Bradley Lucier wrote: But even if sqrt is corrected for 0.0 * inf, there would still be a lot of problems with the combinations of NR-enhanced rsqrt and rcp. Consider for example: 1.0/sqrt(a/b) alias rsqrt(a/b). Having a=0, b != 0, the result is inf. As already stated, -ffast-math turns on -ffinite-math-only, which allows the compiler to assume that a result of inf cannot happen, so gcc is allowed to ignore this possibility. Producing NaN instead of inf seems to be allowed. Let me restate this. If -ffinite-math-only is specified, then producing NaN instead of inf should be allowed. If -fno-finite-math-only is specified, then the generated code should do the right thing if an argument or result is inf or NaN. In any case, I would consider it an error if the argument is finite, the result is supposed to be finite, and inf or NaN is produced. Brad
Re: Incorrect bitfield aliasing with Tree SSA
I am glad to see we are converging toward implementation issues now! I am storing it in a new field under the alias_set_entry: get_alias_set_entry (TYPE_ALIAS_SET (t))->nonaddr_alias_set. Where T is which type?
Re: Incorrect bitfield aliasing with Tree SSA
It gives you the alias set of the parent, which, for the reason that OTHER THINGS USE THE ALIAS SET SPLAY TREES, gives the wrong answer. Can you give a few sentence explanation of what alias set splay trees are and why they aren't using the alias set mechanism? I'm not sure what a TBAA forest is, but keep in mind that, at least in Ada, we have many different types (meaning different tree nodes) that have the same alias set and we really do mean that they are to conflict. That's nice. But are they handled properly? There are other questions we ask about alias sets other than do these two alias sets conflict (which is asking whether they are subsets of each other, or equal). We have good reasons to ask these questions. Can you give examples of those questions?
Re: Incorrect bitfield aliasing with Tree SSA
Richard Kenner writes: I am glad to see we are converging toward implementation issues now! I am storing it in a new field under the alias_set_entry: get_alias_set_entry (TYPE_ALIAS_SET (t))->nonaddr_alias_set. Where T is which type? Type of the expression passed to get_alias_set. And without the component_uses_parent_alias_set loop. Adam
Re: Incorrect bitfield aliasing with Tree SSA
Type of the expression passed to get_alias_set. And without the component_uses_parent_alias_set loop. So you mean the type of the *field*? That can't work. That type can't be used for *anything*! Otherwise, if you have struct foo {int a: 32; int b: 32; }; struct bar {int c: 32; int d: 32; }; you have the fields A and C conflicting, which is wrong. The T has to be the *record type*, so that when you share alias sets, it's the same for every type in the same record, not every occurrence of some random type in different records.
Re: Incorrect bitfield aliasing with Tree SSA
On 6/18/07, Richard Kenner [EMAIL PROTECTED] wrote: It gives you the alias set of the parent, which, for the reason that OTHER THINGS USE THE ALIAS SET SPLAY TREES, gives the wrong answer. Can you give a few-sentence explanation of what alias set splay trees are and why they aren't using the alias set mechanism? They are the alias set mechanism, which you don't seem to understand. They always have been. How do you believe we determine whether two alias sets conflict, or are related at all? I'm not sure what a TBAA forest is, but keep in mind that, at least in Ada, we have many different types (meaning different tree nodes) that have the same alias set and we really do mean that they are to conflict. That's nice. But are they handled properly? Yes. There are other questions we ask about alias sets other than do these two alias sets conflict (which is asking whether they are subsets of each other, or equal). We have good reasons to ask these questions. Can you give examples of those questions? I'd rather not explain all of alias.c to you in an email message, to be honest.
Re: GCC Status Report (2007-06-15)
H. J. Lu wrote: Good. I have another question. Intel BID patch itself doesn't change any sources in DFP nor libdecnumber. The only significant change is to change Makefile in libgcc to use Intel BID library for DFP intrinsics when BID encoding is selected. Currently, DFP is only supported on Linux/PPC, which uses DPD encoding, and Linux/x86, which uses BID encoding. So Intel BID patch only affects Linux/x86 as it changes libgcc/Makefile.in to use Intel BID library. Who has the final say on this patch? The build system maintainers and the x86 maintainers. -- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
Re: Incorrect bitfield aliasing with Tree SSA
They are the alias set mechanism, which you don't seem to understand. They always have been. I certainly understand the alias set mechanism. It sounded like you were talking about something else since if the only thing we're using is alias sets, I'm mystified as to what the issue is. I'd rather not explain all of alias.c to you in an email message, to be honest. As I said, I completely understand alias.c. It sounded like you were trying to do something OUTSIDE of that. So let's start again: why is it suddenly necessary that there be a hierarchy of alias sets when no fields are addressable? If I have struct foo {int a: 1; int b: 1;}; why do we need more than one alias set? Who is it that requires any subsetting at all? Certainly nothing in alias.c does.
RE: Some thoughts about steerring commitee work
Hi Dorit, loop-context when it helps you do things more efficiently. In any case, we'll have to have a much better cost model before we start packing random sequences of stmts out of loops. This is off topic from the discussion at hand, but we would be happy to help with changing the cost model to handle this in the autovect-branch or in mainline as you see fit. Thanks, Harsha
Re: Incorrect bitfield aliasing with Tree SSA
On 6/18/07, Richard Kenner [EMAIL PROTECTED] wrote: They are the alias set mechanism, which you don't seem to understand. They always have been. I certainly understand the alias set mechanism. It sounded like you were talking about something else since if the only thing we're using is alias sets, I'm mystified as to what the issue is. I'd rather not explain all of alias.c to you in an email message, to be honest. As I said, I completely understand alias.c. You clearly do not. It sounded like you were trying to do something OUTSIDE of that. So let's start again: why is it suddenly necessary that there be a hierarchy of alias sets when no fields are addressable? If I have struct foo {int a: 1; int b: 1;}; why do we need more than one alias set? Who is it that requires any subsetting at all? Certainly nothing in alias.c does. I'm not going through this again; I'm just going to fix the problem. I've wasted enough time on this.
Re: Incorrect bitfield aliasing with Tree SSA
On 6/18/07, Richard Kenner [EMAIL PROTECTED] wrote: Type of the expression passed to get_alias_set. And without the component_uses_parent_alias_set loop. So you mean the type of the *field*? That can't work. That type can't be used for *anything*! Otherwise, if you have struct foo {int a: 32; int b: 32; }; struct bar {int c: 32; int d: 32; }; you have the fields A and C conflicting, which is wrong. With the current scheme you have fields a and b conflicting and c and d conflicting. Both are wrong. HTH, Dan
Re: Incorrect bitfield aliasing with Tree SSA
On 6/18/07, Richard Kenner [EMAIL PROTECTED] wrote: you have the fields A and C conflicting, which is wrong. Well, that is where structure-field aliasing comes in. The two cannot even alias for addressable fields: At tree level I'll take your word for it, but what about RTL level? Is that nonconflicting status passed to RTL? What *is* the problem with just using the parent alias set? At the RTL level, nothing. If you ever wanted to make the RTL level do *better* than it does now, you'd run into the same problem the tree level does. I continue to strongly feel that the field type shouldn't be used for ANYTHING! Then you will continue to get worse code generation than you could, in addition to bugs like we have now. HTH, Dan
The Linux binutils 2.17.50.0.17 is released
This is the beta release of binutils 2.17.50.0.17 for Linux, which is based on binutils 2007 0615 in CVS on sourceware.org plus various changes. It is purely for Linux. All relevant patches in patches have been applied to the source tree. You can take a look at patches/README to see what has been applied and in what order. Starting from the 2.17.50.0.4 release, the default output section LMA (load memory address) has changed for allocatable sections from being equal to VMA (virtual memory address), to keeping the difference between LMA and VMA the same as the previous output section in the same region. For .data.init_task : { *(.data.init_task) } LMA of .data.init_task section is equal to its VMA with the old linker. With the new linker, it depends on the previous output section. You can use .data.init_task : AT (ADDR(.data.init_task)) { *(.data.init_task) } to ensure that LMA of .data.init_task section is always equal to its VMA. The linker script in the older 2.6 x86-64 kernel depends on the old behavior. You can add AT (ADDR(section)) to force LMA of .data.init_task section equal to its VMA. It will work with both old and new linkers. The x86-64 kernel linker script in kernel 2.6.13 and above is OK. The new x86_64 assembler no longer accepts monitor %eax,%ecx,%edx You should use monitor %rax,%ecx,%edx or monitor which works with both old and new x86_64 assemblers. They should generate the same opcode. The new i386/x86_64 assemblers no longer accept instructions for moving between a segment register and a 32bit memory location, i.e., movl (%eax),%ds movl %ds,(%eax) To generate instructions for moving between a segment register and a 16bit memory location without the 16bit operand size prefix, 0x66, mov (%eax),%ds mov %ds,(%eax) should be used. It will work with both new and old assemblers. The assembler starting from 2.16.90.0.1 will also support movw (%eax),%ds movw %ds,(%eax) without the 0x66 prefix.
Patches for 2.4 and 2.6 Linux kernels are available at http://www.kernel.org/pub/linux/devel/binutils/linux-2.4-seg-4.patch http://www.kernel.org/pub/linux/devel/binutils/linux-2.6-seg-5.patch The ia64 assembler now tunes for Itanium 2 processors by default. To build a kernel for Itanium 1 processors, you will need to add ifeq ($(CONFIG_ITANIUM),y) CFLAGS += -Wa,-mtune=itanium1 AFLAGS += -Wa,-mtune=itanium1 endif to arch/ia64/Makefile in your kernel source tree. Please report any bugs related to binutils 2.17.50.0.17 to [EMAIL PROTECTED] and http://www.sourceware.org/bugzilla/ Changes from binutils 2.17.50.0.16: 1. Update from binutils 2007 0615. 2. Preserve section alignment for copy relocation. PR 4504. 3. Properly fix regression with objcopy --only-keep-debug. PR 4479. 4. Fix ELF eh frame handling. PR 4497. 5. Fix ia64 string merge. PR 4590. 6. Don't use PE target on EFI files nor EFI target on PE files. 7. Speed up linker with many input files. 8. Support cross compiling windres. PR 2737. 9. Fix various windres bugs. 10. Fix various arm bugs. 11. Fix various m68k bugs. 12. Fix various mips bugs. 13. Fix various ppc bugs. 14. Fix various sparc bugs. 15. Fix various spu bugs. 16. Fix various xtensa bugs. Changes from binutils 2.17.50.0.15: 1. Update from binutils 2007 0511. 2. Fix objcopy --only-keep-debug and linker multiple BSS sections handling. PR 4479. 3. Fix readelf -s -D for gnu hash. PR 4476. 4. Fix ia64 linker crash with --unresolved-symbols=ignore-all. PR 4409. 5. Improve crc32 support in x86 assembler/disassembler. 6. Improve displacement handling in x86 disassembler. PR 4430. 7. Correct PC relative displacement handling in x86-64 disassembler for Intel mode. PR 4429. 8. Fix various PPC bugs. 9. Fix various SPU bugs. 10. Fix various ARM bugs. 11. Fix various m68k bugs. 12. Fix various xtensa bugs. Changes from binutils 2.17.50.0.14: 1. Update from binutils 2007 0418. 2. Support Intel SSE4 instructions. 3.
Fix linker --fatal-warnings for --warn-shared-textrel. PR 4304. 4. Improve linker error message to identify linker script error location. PR 4090. 5. Fix objcopy to allow removing all sections. PR 4348. 6. Don't print addresses of 32-bit targets as 64-bit values on 64bit host. PR 4292. 7. Improve checking for corrupted input files. PR 4110. 8. Improve alpha linker performance. 9. Add a new linker option, -l:foo. 10. Fix a PPC linker bug. PR 4267. 11. Misc vxworks bug fixes. 12. Misc SH bug fixes. 13. Misc SPU bug fixes. 14. Misc ARM bug fixes. 15. Misc MIPS bug fixes. 16. Misc xtensa bug fixes. Changes from binutils 2.17.50.0.13: 1. Update from binutils 2007 0322. 2. Fix 16byte nop padding regression in x86 assembler. 3. Fix x86-64 disassembler for xchg. PR 4218. 4. Optimize opcode for x86-64 xchg. 5. Allow register operand with x86 nop. 6. Properly handle holes between sections for PE-COFF. PR 4210. 7. Print more PE-COFF info for
Re: Incorrect bitfield aliasing with Tree SSA
struct foo {int a: 32; int b: 32; }; struct bar {int c: 32; int d: 32; }; you have the fields A and C conflicting, which is wrong. With the current scheme you have fields a and b conflict and c and d conflicting Both of which are wrong But nothing is changing that! This is true whether or not the fields are addressable and for all proposals given so far. The only way to change this would be to make a new unique alias set for each nonaddressable field in a record and mark each as a subset of the record. This would be optimal, but is expensive for large records (e.g., ones with thousands of fields) and there's no good place to store such an alias set. However, you don't really NEED to deconflict such fields using alias sets since there are already mechanisms at both the tree and RTL level to know that such accesses can't conflict (being different FIELD_DECLs).
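The trade-off being argued here can be modelled in a few lines of C. This is a toy model, not GCC's alias.c (which keeps each set's children in splay trees); the names and set numbers are hypothetical. Two sets conflict when either is 0, they are equal, or one has been recorded as a subset of the other. If every int bit-field uses the int set, foo.a, foo.b and bar.c all share one set and so conflict. Giving each non-addressable field its own set, recorded only as a subset of its record, keeps the fields apart while each still conflicts with its record.

```c
#include <stdbool.h>

#define MAX_SETS 16

/* Toy model: subset[s][c] is true when alias set c has been recorded as
   a subset of alias set s.  Set 0 conflicts with everything.  */
static bool subset[MAX_SETS][MAX_SETS];

static void record_subset (int superset, int child)
{
  subset[superset][child] = true;
  /* Children already recorded under the child become children of the
     superset as well.  */
  for (int c = 0; c < MAX_SETS; c++)
    if (subset[child][c])
      subset[superset][c] = true;
}

/* Two sets conflict when either is 0, they are equal, or one is a
   recorded subset of the other.  */
static bool sets_conflict (int a, int b)
{
  return a == 0 || b == 0 || a == b || subset[a][b] || subset[b][a];
}
```

The cost Richard mentions shows up directly: a record with thousands of fields needs thousands of new sets and subset entries.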
Re: Incorrect bitfield aliasing with Tree SSA
I continue to strongly feel that the field type shouldn't be used for ANYTHING! Then you will continue to get worse code generation than you could, in addition to bugs like we have now. Explain to me why in the following case: struct s1 {int a;}; struct s2 {short a;}; there should be any difference. Why should one reference something having to do with int and the other short? How does knowing the type of the field here help anything? Perhaps you are forgetting about MEM_EXPR (which I understand *very well* since I was the implementor of it)!
Re: Suffix for __float128 FP constants
On Mon, Jun 18, 2007 at 11:10:43AM -0700, Steve Ellcey wrote: BTW: IA64 has the same issues with two FP types (long double XFmode and longer double TFmode). How is this solved for IA64? Uros. This is different on IA64 HP-UX and IA64 Linux. On HP-UX, 128 bits is the standard long double and 80 bits is __float80. We use the 'W' suffix for a __float80 constant on HP-UX. HP-UX also uses a lower case 'w' in math names for functions (e.g. sqrtw) for __float80 functions. Since __float128 == long double on HP-UX we can just use 'L' and 'l' for those. We need a standard for __float128. Otherwise, a program using __float128 may generate different results with different compilers on different platforms. BTW, I had a __float128 patch for glibc. Because there is no __float128 standard, it wasn't accepted. H.J.
Re: Fixed-point branch?
Fu, Chao-Ying wrote: +ACCUM_MODE (HA, 2, 8, 7); /* s8.7 */ +ACCUM_MODE (SA, 4, 16, 15); /* s16.15 */ +ACCUM_MODE (DA, 8, 32, 31); /* s32.31 */ +ACCUM_MODE (TA, 16, 64, 63); /* s64.63 */ Lots of predefined types and modes in this patch. What about targets with other requirements (the Blackfin has 40 bit (8 + 32) accumulators)? In bfin-modes.def, we can adjust the DA mode to (s7.32) by using ADJUST_IBIT(DA, 7) ADJUST_FBIT(DA, 32) For vectors, we let the targets define the supported modes. Why do we want something else for fractional support? I am not clear about this question. The new modes (FRACT, UFRACT, ACCUM, and UACCUM) enable GCC to recognize the formats of the underlying values to perform constant folding (e.g., + - * /). To use the DA mode for vector, we can use: VECTOR_MODE (ACCUM, DA, 2); No, I was trying to make an analogy of how ports explicitly define the modes their hardware supports, e.g. for arm: /* Vector modes. */ VECTOR_MODES (INT, 4);/*V4QI V2HI */ VECTOR_MODES (INT, 8);/* V8QI V4HI V2SI */ VECTOR_MODES (INT, 16); /* V16QI V8HI V4SI V2DI */ VECTOR_MODES (FLOAT, 8); /*V4HF V2SF */ VECTOR_MODES (FLOAT, 16); /* V8HF V4SF V2DF */ I'm wondering whether it's a good idea to have a lot of pre-defined fractional modes and types that may or may not match the target hardware. Not saying it's necessarily wrong; I'm just interested to hear why you chose to do it this way. (I also just noticed that things like SHORT_ACCUM_TYPE_SIZE are used but apparently not defined in the patch - does it actually compile?) Bernd -- This footer brought to you by insane German lawmakers. Analog Devices GmbH Wilhelm-Wagenfeld-Str. 6 80807 Muenchen Registered office: Muenchen; Registration court: Muenchen, HRB 40368; Managing directors: Thomas Wessel, William A. Martin, Margaret Seif
Re: Suffix for __float128 FP constants
We need a standard for __float128. Otherwise, a program using __float128 may generate different results with different compilers on different platforms. BTW, I had a __float128 patch for glibc. Because there is no __float128 standard, it wasn't accepted. H.J. The HP compiler has an option that allows it to accept 'extended' as a type that is equivalent to __float80 and 'quad' as equivalent to 'long double' which is __float128 on HP-UX. For the quad type it uses the Q suffix for quad constants (and a lower case q for quad functions like sqrtq). I don't think this is a standard, but it is a precedent. Steve Ellcey [EMAIL PROTECTED]
Re: I'm sorry, but this is unacceptable (union members and ctors)
If all you need is one member that has constructors / destructors, and all other members are PODs that provide an alternate view of the contents, then I think that would make a logical extension of the transparent union extension. A transparent union is passed to functions in the same manner as its first member. You could define that a transparent union is allowed to have as its first member a class with constructors and/or destructors, and that these constructors / destructors are then the constructors / destructors of the union. Caveat: If the union is larger or more aligned than its first member, the argument passing semantics don't make sense. This is documented in extend.texi: Second, the argument is passed to the function using the calling conventions of the first member of the transparent union, not the calling conventions of the union itself. All members of the union must have the same machine representation; this is necessary for this argument passing to work properly. There is also a syntax example for __attribute__ ((__transparent_union__)) in extend.texi. Inside the compiler, you can check whether a union is a transparent union using the TYPE_TRANSPARENT_UNION macro. If the initial union member can be an anonymous struct, and further members, rather than being required to be PODs, simply have their ctors/dtors ignored, then that would work for anything I can come up with. Try contrib/gcc_update --touch after the checkout. This suggestion made some ground. But I just can't get a build to complete. The newest checkout / release aren't compatible with my C libraries it seems, and I'm not sure it's safe dependency-wise to just replace the C libraries. So I rewound my subversion checkout to the same branch as is in my debian distribution repository. That build gave up when it couldn't find a directory called config/i386 I think it was.
So I downloaded the same major release as my distro just now (4.0.0) and this one is trying to access the gcc/include directory with ../include from build*/liberty which obviously should be ../../include, so it gives up. I just can't win. -- View this message in context: http://www.nabble.com/I%27m-sorry%2C-but-this-is-unacceptable-%28union-members-and-ctors%29-tf3930964.html#a11184663 Sent from the gcc - Dev mailing list archive at Nabble.com.
Re: Suffix for __float128 FP constants
On Mon, Jun 18, 2007 at 02:33:07PM -0700, Steve Ellcey wrote: We need a standard for __float128. Otherwise, a program using __float128 may generate different results with different compilers on different platforms. BTW, I had a __float128 patch for glibc. Because there is no __float128 standard, it wasn't accepted. H.J. The HP compiler has an option that allows it to accept 'extended' as a type that is equivalent to __float80 and 'quad' as equivalent to 'long double' which is __float128 on HP-UX. For the quad type it uses the Q suffix for quad constants (and a lower case q for quad functions like sqrtq). I don't think this is a standard, but it is a precedent. I used `q' as suffix for __float128 functions like __isinfq/__isnanq. But I used strtoqd since we have strtold. I like `Q' suffix in __float128 constants. H.J.
Re: I'm sorry, but this is unacceptable (union members and ctors)
On Jun 18, 2007, at 2:36 PM, michael.a wrote: This suggestion made some ground. But I just can't get a build to complete. The newest checkout / release aren't compatible with my C libraries it seems, and I'm not sure its safe dependency wise to just replace the C libraries. So I rewound my subversion checkout to the same branch as is in my debian distribution repository. That build gave up when it couldn't find a directory called config/i386 I think it was. So I downloaded the same major release as my distro just now (4.0.0) and this one is trying to access the gcc/include directory with ../include from build*/liberty which obviously should be ../../include, so it gives up. I just can't win. Sounds like you're using ./configure. Are you following the directions at: http://gcc.gnu.org/install/configure.html -eric
RE: Fixed-point branch?
Bernd Schmidt wrote:
+ACCUM_MODE (HA, 2, 8, 7); /* s8.7 */
+ACCUM_MODE (SA, 4, 16, 15); /* s16.15 */
+ACCUM_MODE (DA, 8, 32, 31); /* s32.31 */
+ACCUM_MODE (TA, 16, 64, 63); /* s64.63 */
Lots of predefined types and modes in this patch. What about targets with other requirements (the Blackfin has 40-bit (8 + 32) accumulators)? In bfin-modes.def, we can adjust the DA mode to s7.32 by using ADJUST_IBIT (DA, 7) and ADJUST_FBIT (DA, 32). For vectors, we let the targets define the supported modes. Why do we want something else for fractional support? I am not clear about this question. The new modes (FRACT, UFRACT, ACCUM, and UACCUM) enable GCC to recognize the formats of the underlying values to perform constant folding (e.g., + - * /). To use the DA mode for vectors, we can use: VECTOR_MODE (ACCUM, DA, 2); No, I was trying to make an analogy with how ports explicitly define the modes their hardware supports, e.g. for arm:
/* Vector modes.  */
VECTOR_MODES (INT, 4);        /*            V4QI V2HI */
VECTOR_MODES (INT, 8);        /*       V8QI V4HI V2SI */
VECTOR_MODES (INT, 16);       /* V16QI V8HI V4SI V2DI */
VECTOR_MODES (FLOAT, 8);      /*            V4HF V2SF */
VECTOR_MODES (FLOAT, 16);     /*       V8HF V4SF V2DF */
I'm wondering whether it's a good idea to have a lot of pre-defined fractional modes and types that may or may not match the target hardware. Not saying it's necessarily wrong; I'm just interested to hear why you chose to do it this way. (I also just noticed that things like SHORT_ACCUM_TYPE_SIZE are used but apparently not defined in the patch - does it actually compile?) OK, I get it. Maybe it's because we treat fixed-point modes as first-class modes, like the other scalar modes (integer, floating-point, etc.), so we pre-define them. One could likewise ask why other machine modes (e.g. floating-point, decimal floating-point) are not left for targets to define, as vector modes are. I think the default fixed-point formats are the efficient ones for 32-bit/64-bit processors (with or without hardware support).
One of the goals for the fixed-point extension is that all targets in GCC will enable the extension, so efficient formats may be set by default. We have all FRACT and ACCUM sizes in defaults.h. Thanks!
#ifndef SHORT_FRACT_TYPE_SIZE
#define SHORT_FRACT_TYPE_SIZE BITS_PER_UNIT
#endif
#ifndef FRACT_TYPE_SIZE
#define FRACT_TYPE_SIZE (BITS_PER_UNIT * 2)
#endif
#ifndef LONG_FRACT_TYPE_SIZE
#define LONG_FRACT_TYPE_SIZE (BITS_PER_UNIT * 4)
#endif
#ifndef LONG_LONG_FRACT_TYPE_SIZE
#define LONG_LONG_FRACT_TYPE_SIZE (BITS_PER_UNIT * 8)
#endif
#ifndef SHORT_ACCUM_TYPE_SIZE
#define SHORT_ACCUM_TYPE_SIZE (SHORT_FRACT_TYPE_SIZE * 2)
#endif
#ifndef ACCUM_TYPE_SIZE
#define ACCUM_TYPE_SIZE (FRACT_TYPE_SIZE * 2)
#endif
#ifndef LONG_ACCUM_TYPE_SIZE
#define LONG_ACCUM_TYPE_SIZE (LONG_FRACT_TYPE_SIZE * 2)
#endif
#ifndef LONG_LONG_ACCUM_TYPE_SIZE
#define LONG_LONG_ACCUM_TYPE_SIZE (LONG_LONG_FRACT_TYPE_SIZE * 2)
#endif
Regards, Chao-ying
Re: I'm sorry, but this is unacceptable (union members and ctors)
Eric Christopher-2 wrote: Sounds like you're using ./configure. Are you following the directions at: http://gcc.gnu.org/install/configure.html -eric Thank you, I guess I missed that page somehow. Only I ran into the same libc wall again, so I'm temporarily stumped:
/usr/bin/ld: skipping incompatible /usr/lib/../lib/libc.so when searching for -lc
/usr/bin/ld: skipping incompatible /usr/lib/../lib/libc.a when searching for -lc
/usr/bin/ld: skipping incompatible /usr/bin/../lib/libc.so when searching for -lc
/usr/bin/ld: skipping incompatible /usr/bin/../lib/libc.a when searching for -lc
/usr/bin/ld: skipping incompatible /usr/lib/libc.so when searching for -lc
/usr/bin/ld: skipping incompatible /usr/lib/libc.a when searching for -lc
/usr/bin/ld: cannot find -lc
collect2: ld returned 1 exit status
Re: I'm sorry, but this is unacceptable (union members and ctors)
Thank you, I guess I missed that page somehow. Only I ran into the same libc wall again, so I'm temporarily stumped:
/usr/bin/ld: skipping incompatible /usr/lib/../lib/libc.so when searching for -lc
/usr/bin/ld: skipping incompatible /usr/lib/../lib/libc.a when searching for -lc
/usr/bin/ld: skipping incompatible /usr/bin/../lib/libc.so when searching for -lc
/usr/bin/ld: skipping incompatible /usr/bin/../lib/libc.a when searching for -lc
/usr/bin/ld: skipping incompatible /usr/lib/libc.so when searching for -lc
/usr/bin/ld: skipping incompatible /usr/lib/libc.a when searching for -lc
/usr/bin/ld: cannot find -lc
collect2: ld returned 1 exit status
You might want to make sure you're passing the same configure options that the distro did when building. It might cause some incompatibility somewhere that ld is detecting. From a quick look it seems that ld believes that the libc that you have doesn't match the gcc that you're building (i.e. the bfd arch is incompatible). -eric
gcc-4.1-20070618 is now available
Snapshot gcc-4.1-20070618 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/4.1-20070618/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 4.1 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_1-branch revision 125829
You'll find:
gcc-4.1-20070618.tar.bz2            Complete GCC (includes all of below)
gcc-core-4.1-20070618.tar.bz2       C front end and core compiler
gcc-ada-4.1-20070618.tar.bz2        Ada front end and runtime
gcc-fortran-4.1-20070618.tar.bz2    Fortran front end and runtime
gcc-g++-4.1-20070618.tar.bz2        C++ front end and runtime
gcc-java-4.1-20070618.tar.bz2       Java front end and runtime
gcc-objc-4.1-20070618.tar.bz2       Objective-C front end and runtime
gcc-testsuite-4.1-20070618.tar.bz2  The GCC testsuite
Diffs from 4.1-20070611 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-4.1 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
RE: I'm sorry, but this is unacceptable (union members and ctors)
On 18 June 2007 23:46, michael.a wrote: Eric Christopher-2 wrote: You might want to make sure you're passing the same configure options that the distro did when building. It might cause some incompatibility somewhere that ld is detecting. From a quick look it seems that ld believes that the libc that you have doesn't match the gcc that you're building. (i.e. the bfd arch is incompatible). -eric I'm sure plenty will look down their nose at me for asking in this forum (mailing list) ...but I recall a tool for checking libraries and build options (libtool libraries only maybe) ...I can't recall how or where, or think what to feed Google (The architecture I'm building on is amd64 if that stirs up any ideas) I always need to export LD_LIBRARY_PATH=/usr/lib64:/usr/lib on linux-x86_64 before I can do a build. (Yes, I should probably set it in my .bashrc or whatever...) cheers, DaveK -- Can't think of a witty .sigline today
Re: I'm sorry, but this is unacceptable (union members and ctors)
Eric Christopher-2 wrote: 'gcc -v' will give you the information on how the system gcc was configured. -eric Here is the gcc -v output for the binaries installed by the distro:
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,java,f95,objc,ada,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --program-suffix=-4.0 --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug --enable-java-awt=gtk-default --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-4.0-1.4.2.0/jre --enable-mpfr --disable-werror --enable-checking=release x86_64-linux-gnu
Thread model: posix
gcc version 4.0.3 (Ubuntu 4.0.3-1ubuntu5)
I'm attempting to build from the 4.0.0 releases.
Re: I'm sorry, but this is unacceptable (union members and ctors)
michael.a wrote: gcc version 4.0.3 (Ubuntu 4.0.3-1ubuntu5) This belongs on gcc-help not here. Debian-based distros use a 32/64 bit /usr/lib configuration that is backwards from what the rest of the world uses and requires a patched gcc to multilib correctly. You'll probably need to --disable-multilib if you're building FSF gcc. Brian
Re: I'm sorry, but this is unacceptable (union members and ctors)
Brian Dessent wrote: michael.a wrote: gcc version 4.0.3 (Ubuntu 4.0.3-1ubuntu5) This belongs on gcc-help not here. Debian-based distros use a 32/64 bit /usr/lib configuration that is backwards from what the rest of the world uses and requires a patched gcc to multilib correctly. You'll probably need to --disable-multilib if you're building FSF gcc. Brian Yeah, I know (mailing lists are so particular -- I guess I fail to see the value beyond a noncentralized discussion) In any case, without multilib it makes it to here:
make[2]: Leaving directory `/home/users/michael/gcc.obj/gcc'
echo timestamp stmp-multilib
cp doc/gcc.1 doc/g++.1
cp: cannot stat `doc/gcc.1': No such file or directory
make[1]: *** [doc/g++.1] Error 1
make[1]: Leaving directory `/home/users/michael/gcc.obj/gcc'
make: *** [all-gcc] Error 2
Not sure exactly what is going on here. The gcc/doc directory is empty. I'm assuming everything made it through. There are about a billion targets in the Makefile and no explanatory header. Any suggestions for just building the essentials? I've only really recently taken on serious linux development and haven't much actual build experience outside the usual routine. I have autotools under my belt, but I glossed over most of the auxiliary stuff.
Re: Activate -mrecip with -ffast-math?
On 6/18/07, Uros Bizjak [EMAIL PROTECTED] wrote: IMO, due to the limited range of operands for the -mrecip pass, (-inf, inf) with 0.0 excluded, it should be kept out of -ffast-math. There is no point in fixing reciprocals only for 0.0; we need to fix both conversions for infinity and 0.0, even in -ffast-math. Indeed there are holes in every direction when you pull in such a transformation, and the cost of plugging every one of them would be prohibitive; the next batch of c2d supposedly will leave you with only ~6 cycles of headroom to make it worthwhile for a sqrt. Of course it only gets worse when you start composing. My point merely was that, considering one operation, you'd introduce NaN for a not so special value (0) which, in a *fast* math scenario, could be produced at any previous stage due to denormal clamping, with no sane way to take care of it. Again, if you look at prior art (icc, AMD's manual...), that's the only special case they covered. Admittedly that's a trade-off, but not that unreasonable. Now, an option to remove such transformations from the -ffast-math bag of tricks would be fine and would still buy gcc some Spec bragging rights :)
virtual stack regs.
I would like to get some more information about pr32374. I do not know what virtual_stack_vars are and there is no documentation in the doc directory.
1) What are these?
2) Why are they uninitialized?
3) If they really are uninitialized, why is it a problem to assign zero to them?
4) If they are in fact initialized, where is the initialization code? Why does df not see it?
5) How can I tell if a reg is a virtual_stack_reg?
Re: I'm sorry, but this is unacceptable (union members and ctors)
On Mon, Jun 18, 2007 at 04:57:46PM -0700, michael.a wrote: Yeah, I know (mailing lists are so particular -- I guess I fail to see the value beyond a noncentralized discussion) But since I believe three different people have asked you to move this problem to a different mailing list now, could you please do so? Thanks. -- Daniel Jacobowitz CodeSourcery
Object attribute tagging
The question was raised a while back on the gcc-patches and gdb-patches lists of how GCC should tag objects with some ABI information for the use of GDB, noting that various different methods have been in use http://gcc.gnu.org/ml/gcc-patches/2006-05/msg00395.html. Mark suggested http://gcc.gnu.org/ml/gcc-patches/2006-05/msg00854.html that we use the ARM EABI object attribute mechanism; there were no objections to this at that time. This provides for both processor-specific and vendor-specific tags, and for tags at the level of object files, sections or individual functions (although binutils only really supports tags at the object file level at present). Tags can be merged from input files (with compatibility and merging rules defined) to produce the tag in an output file; the linker can give a warning or error for incompatible tags (e.g. object files using different ABI variants). I now propose implementing this to mark MIPS and Power objects with whether they are using the hard-float or soft-float ABI, both so the linker can complain if users accidentally link incompatible objects together, and so GDB knows the ABI in use by a binary. (It's desirable to know the ABI even in the absence of debug information, e.g. to call functions in libc, so DW_AT_calling_convention doesn't seem a sufficient alternative.) The ARM EABI uses a .ARM.attributes section of type SHT_ARM_ATTRIBUTES. For platforms where there isn't such a specification for that processor, I propose .GNU.attributes and SHT_GNU_ATTRIBUTES, and an assembler directive .gnu_attribute in place of .eabi_attribute. This would generate entries under the gnu vendor (whereas .eabi_attribute uses the standard aeabi); if more processor ABI specifications pick up the attributes specification then we could switch to appropriate processor-specific sections. On ARM, both .gnu_attribute and .eabi_attribute could be used, and would both generate entries in .ARM.attributes, under the gnu and aeabi vendors respectively. 
Appropriate parts of the ARM binutils code would be made available to all ELF binutils targets. The ARM EABI says that only standard entries under aeabi should affect link-compatibility of object files, not vendor entries such as gnu, but in the absence of corresponding standards for other processors I don't think we can avoid use of gnu for link-compatibility on non-ARM processors for now - if processor ABIs standardize things in future we can deprecate the associated gnu attributes. Additional object tagging may be of use in future with LTO, to mark objects with information about command-line options used where such options are relevant to code generation but not recorded directly in the IR (e.g., target-specific options selecting CPU features that may be used or built-in functions that are enabled). We can allocate such tags in future as and when needed. I propose to establish some convention for which gnu attributes are target-dependent and which are target-independent. Any comments on either the general approach or the details? -- Joseph S. Myers [EMAIL PROTECTED]
Re: I'm sorry, but this is unacceptable (union members and ctors)
Daniel Jacobowitz-2 wrote: On Mon, Jun 18, 2007 at 04:57:46PM -0700, michael.a wrote: Yeah, I know (mailing lists are so particular -- I guess I fail to see the value beyond a noncentralized discussion) But since I believe three different people have asked you to move this problem to a different mailing list now, could you please do so? Thanks. -- Daniel Jacobowitz CodeSourcery I'm sorry, it just occurred to me that gcc-help was another forum in this Nabble interface (I'm not really sure how everything is related -- but mailing list subscriptions drive me crazy, so I was reticent to deal with another) Just for the record... michael.a wrote: In any case, without multilib it makes it to here: make[2]: Leaving directory `/home/users/michael/gcc.obj/gcc' echo timestamp stmp-multilib cp doc/gcc.1 doc/g++.1 cp: cannot stat `doc/gcc.1': No such file or directory make[1]: *** [doc/g++.1] Error 1 make[1]: Leaving directory `/home/users/michael/gcc.obj/gcc' make: *** [all-gcc] Error 2 Not sure exactly what is going on here. The gcc/doc directory is empty. I'm assuming everything made it through. There are about a billion targets in the Makefile and no explanatory header. Any suggestions for just building the essentials? This hack http://gcc.gnu.org/ml/gcc-bugs/2005-04/msg03614.html seemed to get through that bug (so many pitfalls) Since I'm already posting, now I'm seeing: /home/users/michael/gcc.obj/gcc/f951: symbol lookup error: /home/users/michael/gcc.obj/gcc/f951: undefined symbol: __gmp_get_memory_functions I installed the latest GMP libraries earlier, so I'm not really sure what to think, unless the libraries aren't backwards compatible. I will mention it in gcc-help tomorrow, unless I hear something. I hope this conversation isn't otherwise dead at this point. 
sincerely, michael
Re: Object attribute tagging
On Tue, Jun 19, 2007 at 01:50:27AM +, Joseph S. Myers wrote: The question was raised a while back on the gcc-patches and gdb-patches lists of how GCC should tag objects with some ABI information for the use of GDB, noting that various different methods have been in use http://gcc.gnu.org/ml/gcc-patches/2006-05/msg00395.html. Mark suggested http://gcc.gnu.org/ml/gcc-patches/2006-05/msg00854.html that we use the ARM EABI object attribute mechanism; there were no objections to this at that time. This provides for both processor-specific and vendor-specific tags, and for tags at the level of object files, sections or individual functions (although binutils only really supports tags at the object file level at present). Tags can be merged from input files (with compatibility and merging rules defined) to produce the tag in an output file; the linker can give a warning or error for incompatible tags (e.g. object files using different ABI variants). I now propose implementing this to mark MIPS and Power objects with whether they are using the hard-float or soft-float ABI, both so the linker can complain if users accidentally link incompatible objects together, and so GDB knows the ABI in use by a binary. (It's desirable to know the ABI even in the absence of debug information, e.g. to call functions in libc, so DW_AT_calling_convention doesn't seem a sufficient alternative.) The ARM EABI uses a .ARM.attributes section of type SHT_ARM_ATTRIBUTES. For platforms where there isn't such a specification for that processor, I propose .GNU.attributes and SHT_GNU_ATTRIBUTES, and an assembler directive .gnu_attribute in place of .eabi_attribute. This would generate entries under the gnu vendor (whereas .eabi_attribute uses the standard aeabi); if more processor ABI specifications pick up the attributes specification then we could switch to appropriate processor-specific sections. 
On ARM, both .gnu_attribute and .eabi_attribute could be used, and would both generate entries in .ARM.attributes, under the gnu and aeabi vendors respectively. Appropriate parts of the ARM binutils code would be made available to all ELF binutils targets. The ARM EABI says that only standard entries under aeabi should affect link-compatibility of object files, not vendor entries such as gnu, but in the absence of corresponding standards for other processors I don't think we can avoid use of gnu for link-compatibility on non-ARM processors for now - if processor ABIs standardize things in future we can deprecate the associated gnu attributes. Additional object tagging may be of use in future with LTO, to mark objects with information about command-line options used where such options are relevant to code generation but not recorded directly in the IR (e.g., target-specific options selecting CPU features that may be used or built-in functions that are enabled). We can allocate such tags in future as and when needed. I propose to establish some convention for which gnu attributes are target-dependent and which are target-independent. Any comments on either the general approach or the details? I like this initiative. For x86, we currently have no way to make an object/shared library indicate:
1. Different parameter passing schemes: on stack vs. in registers. It could even be per function.
2. Different alignment requirements: -malign-double.
3. Different long double: -m128bit-long-double vs. -m96bit-long-double.
4. Different ISAs: x87, SSE, SSE2.
5. Different fpmath: x87 vs. SSE.
6. Different x86-64 models.
7. With or without the x86-64 red zone.
8. Different x86-64 ABIs: ELF vs. Win64.
9. Different ia32 stack alignment requirements: the psABI requires less stack alignment than the 16 bytes gcc wants.
It would be nice to address those issues in a general and extensible way. H.J.
Re: Activate -mrecip with -ffast-math?
Bradley Lucier wrote: If -ffinite-math-only is specified, then producing NaN instead of inf should be allowed. Agreed. After all, -ffinite-math-only says: Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs. Since the compiler can assume the output isn't a NaN or an Inf, it can freely switch one and the other. If -fno-finite-math-only is specified, then the generated code should do the right thing if an argument or result is inf or NaN. Also agreed. -- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
Re: GCC Status Report (2007-06-15)
On Sat, 2007-06-16 at 06:17 -0700, H. J. Lu wrote: BTW, an x86 DFP configure bug was reported 3 months ago. But it still hasn't been fixed. I opened a DFP bug report: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32351 with a patch. I hope it will be fixed before gcc 4.3 is released :-). Sorry about the delay. Yes, I assure you it will be fixed by then and perhaps by the end of this week .. :-) Cheers, Ben
Re: Help in understanding ccp propagator
On Sun, 17 Jun 2007, Revital1 Eres wrote: Hello, I have one more question regarding the comment in the tree-ssa-ccp.c file:
/* Note that for propagation purposes, we are only interested in
   visiting statements that load the exact same memory reference
   stored here.  Those statements will have the exact same list
   of virtual uses, so it is enough to set the output of this
   statement to be its first virtual definition.  */
*output_p = first_vdef (stmt);
I wonder whether this comment also holds if the vuses are not immediate, as for stmt no. 1 in the following example:
1) arr[i].x = tmp1;
...
2) arr[i].y = tmp2;
...
3) reg1 = arr[i].x;
...
4) arr[i].z = tmp2;
...
5) reg2 = arr[i].x;
Is it because we are looking for the exact same memory reference (although not immediate) that it is enough to look at only the first_vdef of every store we encounter in our walk through the virtual def-use chain? Or, by looking only at the first vdef, could we miss vuses that could have been reached by vdefs other than the first one? Well, in the current code we do not walk virtual use-def chains, but make sure all virtual operands are the same. So the above works. If you walk the chains you need to make sure that all virtual operands of the final def you are using are the same - then the above will work again. Richard.
Re: PR other/32351 [Was: Re: GCC Status Report (2007-06-15)]
It is a libdecnumber bug, which only affects x86. The patch is ok. Paolo
Re: Resuming SPEC performance tracking at RedHat
Hi, On Fri, 15 Jun 2007, Richard Guenther wrote: so, no PPC testing from us (the old testing machine died and we don't have a replacement for it). Actually it's back, and just needs to be partitioned and set up. Ciao, Michael.
Re: Activate -mrecip with -ffast-math?
On 6/17/07, Uros Bizjak [EMAIL PROTECTED] wrote: Hello! I was wondering if there are objections to automatically activating Uros' new -mrecip flag when -ffast-math is specified. It looks like a good match since -mrecip is exactly about fast non-precise mathematics. There is a discussion in gcc-patches@ mailing list about this topic, in Re: [PATCH, middle-end, i386]: reciprocal rsqrt pass + full recip x86 backend support thread [1]. The main problem is, that one of the polyhedron tests segfaults with this patch (not the problem of the recip patch, but usage of questionable FP equivalence tests and FP indexes in the array). Of course there are cases with every optimization enabled by -ffast-math that can break existing programs. Just that we know of one case beforehand shouldn't prevent us from enabling -mrecip at -ffast-math (provided -mno-recip still works, regardless of whether it is given before or after -ffast-math). [We'll at least get some more testing coverage this way] Richard.
Re: Some thoughts about steering committee work
Dorit Nuzman wrote: H. J. Lu wrote: Why don't we turn on vectorizer at -O3 or even -O2, depending on ISA? I added -ftree-vectorize to BOOT_CFLAGS on x86-64. According to -ftree-vectorizer-verbose=1, there are 82 loops vectorized in gcc source. There are no regressions. There are not much changes in bootstrap time as well as make check time. We have about two dozen cases of packages that break when -ftree-vectorize is used. I'm sure there are several more as we tend to discourage such bug reports. If you could take the time to find the reduced testcases and file PRs for these, that would be most appreciated. I believe the majority of them can be traced back to PR 25413. For example building zlib with -O2 -march=pentium4 -ftree-vectorize will cause several apps that link to it (firefox, openoffice, poppler, etc.) to segfault. The vectorizer generates movdqa instructions with datarefs that are not aligned on a 16-byte boundary. There is an old patch floating around by Devang to address this problem (as mentioned in the PR). We should push this forward, it's really a simple fix. I'll try to get to it soonish. Other than that, I went through the rest of our -ftree-vectorize bugs this morning and found that many of them have been fixed in 4.2, so the situation is much better than I originally thought. Cool! Thanks, dorit -- dirtyepic salesman said this vacuum's guaranteed gentoo org it could suck an ancient virus from the sea 9B81 6C9F E791 83BB 3AB3 5B2D E625 A073 8379 37E8 (0x837937E8)
Re: Activate -mrecip with -ffast-math?
On 6/18/07, Richard Guenther [EMAIL PROTECTED] wrote: Of course there are cases with every optimization enabled by -ffast-math that can break existing programs. Just that we know of one case beforehand shouldn't prevent us from enabling -mrecip at -ffast-math (provided -mno-recip still works, regardless of whether it is given before or after -ffast-math). [We'll at least get some more testing coverage this way] Argh! Please do not make -ffast-math even more of a pain to work with than it is already. You have to enable it, on the whole compilation unit, to get anywhere near decent performance; there's no escape: either you do not turn it on and everything slows to a crawl, or you pay for not being able to inline from another unit. Until now, the contract was: you have to deal with (and contain) NaN and infinities. Fair enough, even if tricky that remained manageable. But if I can't expect a mere division by 0, or sqrt of 0 (quite common with FTZ/DAZ on) to give me respectively an infinity and 0, and instead get a NaN (which I can't filter, you remember?) because of the NR round, that's pure madness. So please, for the love of everything sacred, leave such stunts out of -ffast-math. PS: and it's not as if such reciprocals + NR couldn't be done with intrinsics, where such a common case is easy to handle.
Re: Some thoughts about steering committee work
Daniel Berlin [EMAIL PROTECTED] wrote on 17/06/2007 18:18:19: ... The whole purpose of SLP was to enable straight-line code vectorization outside of loops. I wouldn't say that's the whole purpose of SLP. I think the purpose and beauty of SLP is that it's a simple algorithm that makes vectorization (including vectorization of loops) easier by removing the need to prove all kinds of properties about the loop as a whole, as well as the need to transform loops to make them vectorizable. The fact that this scheme also works outside of loops is a neat property, because it makes loop vectorization a special case of SLP. However, as far as I know (also from talking with the SLP authors), pretty much all the opportunities they found at the time were in loops. Also, a lot of the SLP-based work that followed focused on loops, and on analysis to determine by how much to unroll loops in order to accommodate SLP. So in reality, as always, there is no free lunch: you often have to compensate for the simple, loop-ignorant SLP analysis by doing a lot of loop-level analysis and transformation beforehand. While it cannot replace the classic SLP algorithm outside of loops (e.g. for completely unrolled loops), doing SLP in loops actually makes a lot of sense, IMHO. It lets us leverage already existing infrastructure (the SLP patch recently committed to autovect-branch is really not big), and opens up a lot of opportunities we couldn't vectorize before (partially unrolled loops, partially vectorizable loops, accesses to consecutive struct fields, and in the future also permutations), while taking advantage of the loop context when it helps do things more efficiently. In any case, we'll need a much better cost model before we start packing random sequences of stmts outside of loops. Simply because you can't find cases in SPEC2000 doesn't mean it's not useful. I don't know where you got this from.
SPEC2000 is really not so interesting vectorization-wise, inside or outside loops. dorit
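To make the straight-line case concrete, here is a minimal sketch assuming a hypothetical `struct rgba` with four consecutive float fields (the "accesses to consecutive struct fields" case). `add_scalar` contains four isomorphic statements of the kind SLP looks for; `add_packed` shows by hand, using the GCC/Clang vector extension, the single 4-wide add that SLP would ideally produce. It illustrates the idea only and is not how GCC's SLP pass is implemented.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical struct: four consecutive float fields. */
struct rgba { float r, g, b, a; };

/* Straight-line code: four isomorphic statements, one per field. */
static void add_scalar(struct rgba *dst, const struct rgba *x,
                       const struct rgba *y)
{
    dst->r = x->r + y->r;
    dst->g = x->g + y->g;
    dst->b = x->b + y->b;
    dst->a = x->a + y->a;
}

/* GCC/Clang vector extension: a 16-byte vector of four floats. */
typedef float v4sf __attribute__((vector_size(16)));

_Static_assert(sizeof(struct rgba) == sizeof(v4sf),
               "struct rgba must be exactly four packed floats");

/* Hand-packed equivalent: one 4-wide add instead of four scalar adds.
   memcpy sidesteps alignment and strict-aliasing concerns. */
static void add_packed(struct rgba *dst, const struct rgba *x,
                       const struct rgba *y)
{
    v4sf vx, vy, vd;
    memcpy(&vx, x, sizeof vx);
    memcpy(&vy, y, sizeof vy);
    vd = vx + vy;              /* element-wise IEEE float addition */
    memcpy(dst, &vd, sizeof vd);
}
```

Both functions compute the same element-wise sums; the packed form is what "packing a group of statements" means in SLP terms.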
Re: missed vectorization (was Some thoughts about steering committee work)
Tim Prince [EMAIL PROTECTED] wrote on 17/06/2007 19:47:10: [EMAIL PROTECTED] wrote: Tim Prince [EMAIL PROTECTED] wrote on 17/06/2007 04:15:56: [EMAIL PROTECTED] wrote: On Sat, Jun 16, 2007 at 06:54:46PM +0300, Dorit Nuzman wrote: There are quite a few known simple cases which the vectorizer fails to vectorize. By "known" do you mean there are open missed-optimization PRs for them? (if Yes, that is what I meant. I'd be happy to file some PRs along these lines, if there is interest (in C or C++, if there's more interest in that than in Fortran). yes, there is. But gfortran fails to vectorize more than 50% of the stuff I run into every day, including almost everything that involves distinct sections of the same array or COMMON block. I thought there was already a PR opened for this issue (probably by Toon), but I can't find it :-( thanks, dorit There are several issues. EQUIVALENCE produces such a problem (PR32373), as do various kinds of references to multiple sections of the same array (PR32375, 32376, 32377, 32378, 32379, 32380). Only 2 of those PRs involve actual source/destination overlap, where the vectorizer would have to choose the correct direction (loop reversed or not). In the bigger case (PR32380) there are loops which vectorize in isolation but not in the presence of other loops. thanks for taking the time to extract the testcases and open the PRs. I guess the discussion can continue in bugzilla now... There are existing PRs on a somewhat similar issue involving type casting in C. IMHO, not vectorizing those might seem excusable. I think we should teach the vectorizer to handle those as well (another issue I've been wanting to get to for a while...) thanks, dorit Thanks, Tim
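The "direction" issue for overlapping sections can be shown with a small C sketch, assuming Fortran array-assignment semantics for a(2:n) = a(1:n-1) + b(2:n), where the whole right-hand side is evaluated before any element is stored. The function names are hypothetical and the loops are not taken from the PRs listed above.

```c
#include <assert.h>
#include <stddef.h>

/* Correct for Fortran semantics: the destination a[1..n-1] overlaps the
   source a[0..n-2] shifted by one, so iterate backwards and every a[i-1]
   is read before it is overwritten. */
static void shift_add_backward(float *a, const float *b, size_t n)
{
    if (n < 2)
        return;
    for (size_t i = n - 1; i >= 1; i--)
        a[i] = a[i - 1] + b[i];
}

/* Wrong for Fortran semantics: a forward loop reads a[i-1] after the
   previous iteration has already updated it. */
static void shift_add_forward_wrong(float *a, const float *b, size_t n)
{
    for (size_t i = 1; i < n; i++)
        a[i] = a[i - 1] + b[i];
}
```

With a = {1, 2, 3, 4} and b all zero, the backward loop gives {1, 1, 2, 3} (the intended shift), while the forward loop smears a[0] across the array and gives {1, 1, 1, 1}. A vectorizer handling such overlap has to pick the safe direction in the same way.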
Re: Activate -mrecip with -ffast-math?
On 6/18/07, tbp [EMAIL PROTECTED] wrote: On 6/18/07, Richard Guenther [EMAIL PROTECTED] wrote: Of course there are cases with every optimization enabled by -ffast-math that can break existing programs. Just that we know of one case beforehand shouldn't prevent us from enabling -mrecip at -ffast-math (provided -mno-recip still works, regardless of whether it is given before or after -ffast-math). [We'll at least get some more testing coverage this way] Argh! Please do not make -ffast-math even more of a pain to work with than it already is. You have to enable it on the whole compilation unit to get anywhere near decent performance; there's no escape: either you do not turn it on and everything slows to a crawl, or you pay for not being able to inline from another unit. Until now, the contract was: you have to deal with (and contain) NaNs and infinities. Fair enough; even if tricky, that remained manageable. No, that's not the contract with -ffast-math. Note that -ffast-math enables -funsafe-math-optimizations, which is allowed to change results (add/remove rounding operations, contract expressions, do transforms like a/b to a * (1/b), do transformations that give you errors bigger than 0.5ulp, etc.). But if I can't expect a mere division by 0, or sqrt of 0 (quite common with FTZ/DAZ on), to give me respectively an infinity and 0, and instead get a NaN (which I can't filter, remember?) because of the NR round, that's pure madness. Hm, which particular case are you concerned about (maybe it was mentioned, but I don't remember the details)? Note that -ffast-math enables -ffinite-math-only as well, so the compiler assumes nothing will result in NaNs or Infs. So please, for the love of everything that's sacred, leave such stunts out of -ffast-math. Well, that's certainly another reason for the Math BOF ;) We all expect very different things from -ffast-math or -funsafe-math-optimizations.
PS: and it's not as if such reciprocals + NR couldn't be done with intrinsics, or couldn't easily handle such a common case. Well, most optimization challenges can be solved if we are allowed to touch the source ;) Thanks, Richard.
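The point that -funsafe-math-optimizations may turn a/b into a * (1/b) can be illustrated with a small C sketch. The second form rounds twice (once for 1/b, once for the product), so its result can differ from the correctly rounded quotient; 3.0/10.0 is a convenient example in IEEE double precision. This assumes x86-64 with SSE2 arithmetic (no x87 excess precision) and that the example itself is compiled without -ffast-math.

```c
#include <assert.h>

/* Correctly rounded quotient: one rounding, as IEEE division requires. */
static double div_exact(double a, double b)
{
    return a / b;
}

/* The reciprocal rewrite: 1/b is rounded first, then a * r is rounded
   again, so the error can exceed 0.5ulp of the true quotient. */
static double div_by_recip(double a, double b)
{
    double r = 1.0 / b;
    return a * r;
}
```

Here `div_exact(3.0, 10.0)` equals the double nearest to 0.3, while `div_by_recip(3.0, 10.0)` yields 0.30000000000000004, one ulp higher.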
[M16C] : 20 bit data access
Hi,

We have come up with two possible solutions to the 20-bit data access problem on M16C targets. We are very grateful for all the suggestions on this issue so far.

Solution 1 is based on the discussion at the following link: http://gcc.gnu.org/ml/gcc/2007-04/msg00402.html
- Two new attributes, far_data (to use external memory for data storage) and far_rodata, will be added.
- Non-constant initialized variables specified with the attribute far_data will be placed in the section .fdata (far memory).
- Constant variables specified with the attribute far_rodata will be placed in the section .frodata (far memory).
- The default linker script will be modified to add the two new sections .fdata and .frodata.
- LDE/STE instructions will be used to access variables specified with the attributes far_data and far_rodata.
- Default constant strings (e.g. strings in printf) and constant variables without the attribute far_rodata will be placed in the section '.rodata' (current implementation).
- The section '.rodata' still has to be copied from ROM to RAM (current implementation for M16C devices that do not have flash in near memory).

Solution 2 is based on the discussion at the following link: http://sources.redhat.com/ml/binutils/2007-05/msg00381.html
- By default, LDE instructions will be used to access all constant variables.
- A new target-specific option, -mno-far-constdata, will be added.
- This option can be used to override the default generation of LDE instructions; 'MOV' instructions will be used to access these variables instead (current implementation).
- A new attribute far_data (to use external memory for data storage) will be added.
- Non-constant initialized variables specified with the attribute far_data will be placed in a section .fdata (far memory).
- LDE/STE instructions will be used to access the non-constant variables specified with the attribute far_data.
- A new attribute near_rodata will be added. This attribute will be used for the latest M16C targets that have 4K/8K of flash in near memory.
- Constant variables specified with the attribute near_rodata will be placed in a section .nrodata (near memory).
- MOV instructions will be used to access the constant variables specified with the attribute near_rodata.
- The default linker script will be modified to place the default section '.rodata' in far memory.
- The default linker script will be modified to add a new section '.fdata' in far memory and '.nrodata' in near memory.

Please comment on the proposed solutions, and also let us know whether either of them would be acceptable to the FSF.

Regards,
Naveen.H.S.
KPIT Cummins Infosystems Ltd, Pune (INDIA)
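A sketch of how user code might look under the proposals above, assuming Solution 1's attribute names. `far_data` and `far_rodata` are only proposed, so they are hidden behind a hypothetical `USE_PROPOSED_M16C_ATTRS` macro and expand to nothing otherwise, letting the file compile anywhere. `needs_far_access` is likewise hypothetical; it encodes our understanding that MOV reaches only the 16-bit (64 KB) near area of the M16C's 20-bit address space, while LDE/STE reach all of it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: far_data / far_rodata are proposed names only; no
   released GCC accepts them. USE_PROPOSED_M16C_ATTRS is also made up. */
#ifdef USE_PROPOSED_M16C_ATTRS
#define FAR_DATA   __attribute__((far_data))
#define FAR_RODATA __attribute__((far_rodata))
#else
#define FAR_DATA
#define FAR_RODATA
#endif

/* Initialized, writable: under Solution 1 this would go to .fdata and
   be accessed with LDE/STE. */
int sample_buffer[4] FAR_DATA = {1, 2, 3, 4};

/* Constant: under Solution 1 this would go to .frodata and be read
   with LDE. */
const int lookup_table[4] FAR_RODATA = {10, 20, 30, 40};

/* Hypothetical helper: an address above 0xFFFF lies outside the near
   area, so (as we understand it) it needs the 20-bit LDE/STE forms. */
static int needs_far_access(uint32_t addr)
{
    assert(addr <= 0xFFFFFu);   /* 20-bit address space: up to 1 MB */
    return addr > 0xFFFFu;
}
```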
Re: More vectorizer testcases?
Giovanni Bajo [EMAIL PROTECTED] wrote on 17/06/2007 20:43:15: Hi Dorit, some years ago I posted these testcases to GCC's Bugzilla: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18437 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18438 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18439 It looks like none of those are vectorized as of GCC 4.3. I read today that you're asking for more vectorizer testcases, so I was wondering: 1) Shall we add a GCC Bugzilla component for the vectorizer? Currently the bugs are filed under tree-optimization, which might be a little too generic these days. Maybe, I'm not sure. A lot of the time, vectorizer missed-optimization bugs depend on other components of the compiler, so I don't know if filing them under a vectorizer component would help. I don't have a strong preference about this. 2) Do you need more testcases from geometric code like those above? Those 3 above are pretty simple in fact, so I doubt more complex ones can be of help, but I can extract something more from my code if you want... These 3 are actually not so simple... The main thing blocking 2 of them right now is that they need support for stores with gaps, which can be added; the other problem is that the vectorizer thinks it's not profitable to vectorize them (or rather 2 of them, as does ICC by the way). Since you opened these PRs, we have come quite a bit closer to vectorizing them (support for interleaved accesses and for multiple data types was also required). It will be fun to add the last missing bit, the support for stores with gaps. I hope we'll get to it before too long... If you have other (hot) code examples that expose different missing features, I think that's always interesting to know about (but if they're like the code above, they may not add much value...). thanks! dorit -- Giovanni Bajo
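For readers unfamiliar with the term, here is a hypothetical loop (not taken from PRs 18437-18439) with a "store with gaps": each iteration writes the x and z fields of a 3-float struct but not y, so the interleaved store group {x, y, z} has a hole. A vectorizer that packs these stores must leave every y untouched, which is why it needs dedicated support for this pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 3-float struct, as is common in geometric code. */
struct vec3 { float x, y, z; };

/* Stores to x and z only: the store group has a gap at y. */
static void scale_xz(struct vec3 *p, size_t n, float s)
{
    for (size_t i = 0; i < n; i++) {
        p[i].x *= s;
        p[i].z *= s;   /* p[i].y is deliberately not written */
    }
}
```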
Re: Incorrect bitfield aliasing with Tree SSA
On 6/18/07, Eric Botcazou [EMAIL PROTECTED] wrote: I'm completely unsurprised this is broken at the tree level given how it is implemented Nice tautology. :-) You have resisted implementing anything at the tree level to fix the problem and now you're complaining there is a problem... Pardon? If it was designed properly in the first place, there simply would *be no problem at the tree level*, because nothing would have broken. Everyone in the bug reports in question has told you this, not just me. Let's try and devise something plausible at the tree level. If we eventually fail, we could indeed consider disabling the optimization at the RTL level. So far you guys have resisted what seem like perfectly reasonable solutions by Adam
Re: Activate -mrecip with -ffast-math?
On 6/18/07, tbp [EMAIL PROTECTED] wrote: Until now, the contract was: you have to deal with (and contain) NaNs and infinities. Fair enough; even if tricky, that remained manageable. But if I can't expect a mere division by 0, or sqrt of 0 (quite common with FTZ/DAZ on), to give me respectively an infinity and 0, and instead get a NaN (which I can't filter, remember?) because of the NR round, that's pure madness. The attached patch should fix these troubles at the cost of 2 extra clocks. The trick is to clamp the rsqrt result to just below infinity, which keeps 0.0*(inf-) -> 0.0 instead of producing a NaN. Uros.
Index: i386.c
===
--- i386.c (revision 125790)
+++ i386.c (working copy)
@@ -22590,7 +22590,7 @@ void ix86_emit_swdivsf (rtx res, rtx a,
 void ix86_emit_swsqrtsf (rtx res, rtx a, enum machine_mode mode, bool recip)
 {
-  rtx x0, e0, e1, e2, e3, three, half;
+  rtx x0, e0, e1, e2, e3, three, half, bignum;
   x0 = gen_reg_rtx (mode);
   e0 = gen_reg_rtx (mode);
@@ -22600,15 +22600,18 @@ void ix86_emit_swsqrtsf (rtx res, rtx a,
   three = CONST_DOUBLE_FROM_REAL_VALUE (dconst3, SFmode);
   half = CONST_DOUBLE_FROM_REAL_VALUE (dconsthalf, SFmode);
+  bignum = gen_lowpart (SFmode, GEN_INT (0x7f7f));
   if (VECTOR_MODE_P (mode))
     {
       three = ix86_build_const_vector (SFmode, true, three);
       half = ix86_build_const_vector (SFmode, true, half);
+      bignum = ix86_build_const_vector (SFmode, true, bignum);
     }
   three = force_reg (mode, three);
   half = force_reg (mode, half);
+  bignum = force_reg (mode, bignum);
   /* sqrt(a) = 0.5 * a * rsqrtss(a) * (3.0 - a * rsqrtss(a) * rsqrtss(a))
      1.0 / sqrt(a) = 0.5 * rsqrtss(a) * (3.0 - a * rsqrtss(a) * rsqrtss(a)) */
@@ -22617,6 +22620,9 @@ void ix86_emit_swsqrtsf (rtx res, rtx a,
   emit_insn (gen_rtx_SET (VOIDmode, x0,
                           gen_rtx_UNSPEC (mode, gen_rtvec (1, a),
                                           UNSPEC_RSQRT)));
+  emit_insn (gen_rtx_SET (VOIDmode, x0,
+                          gen_rtx_SMIN (mode, x0, bignum)));
+
   /* e0 = x0 * a */
   emit_insn (gen_rtx_SET (VOIDmode, e0,
                           gen_rtx_MULT (mode, x0, a)));
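Why clamping helps for a == 0 can be seen by tracing the Newton-Raphson step in scalar C. The sketch below follows the formulas in the patch comment and takes the hardware estimate as an input `x0` (for a == +0.0, rsqrtss returns +inf). `nr_refine` is a hypothetical name; this models the arithmetic only, not the RTL the patch emits. For reference, FLT_MAX has the bit pattern 0x7f7fffff.

```c
#include <assert.h>
#include <float.h>
#include <math.h>   /* INFINITY, isnan, isinf: macros, no libm calls */

/* One Newton-Raphson refinement of an estimate x0 ~= 1/sqrt(a):
     sqrt(a)   = 0.5 * a * x0 * (3.0 - a * x0 * x0)
     1/sqrt(a) = 0.5 *     x0 * (3.0 - a * x0 * x0)
   With clamp != 0, x0 is first limited to FLT_MAX, modelling the SMIN
   against "bignum" in the patch. */
static float nr_refine(float a, float x0, int recip, int clamp)
{
    if (clamp && x0 > FLT_MAX)
        x0 = FLT_MAX;
    float e0 = x0 * a;        /* a == 0: inf * 0 = NaN, FLT_MAX * 0 = 0 */
    float e1 = e0 * x0;       /* a * x0 * x0 */
    float e2 = 3.0f - e1;
    float e3 = recip ? 0.5f * x0 : 0.5f * e0;
    return e2 * e3;
}
```

Unclamped, sqrt(0) comes out as NaN because inf * 0 is NaN. Clamped, e0 is 0, so sqrt(0) gives 3 * 0 = 0, and 1/sqrt(0) gives 3 * (0.5 * FLT_MAX), which overflows to +inf, matching what tbp expects.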
[Bug tree-optimization/19910] [4.2/4.3 regression] ICE with -ftree-loop-linear
--- Comment #12 from pinskia at gcc dot gnu dot org 2007-06-18 06:01 --- This no longer crashes for me. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19910
[Bug tree-optimization/21485] [4.0/4.1/4.2/4.3 Regression] codegen regression due to PRE increasing register pressure (missing load PRE really)
--- Comment #22 from pinskia at gcc dot gnu dot org 2007-06-18 06:12 --- This is basically fixed by the pointer_plus except we still have some combinable code (though this is not PRE's fault); see http://gcc.gnu.org/ml/gcc-patches/2007-05/msg01996.html for how to fix that issue. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
[Bug middle-end/30784] [4.3 regression] ICE on loop vectorization (-O1 -march=athlon-xp -ftree-vectorize)
--- Comment #11 from pinskia at gcc dot gnu dot org 2007-06-18 06:16 ---
*** Bug 30958 has been marked as a duplicate of this bug. ***

--
pinskia at gcc dot gnu dot org changed:

What |Removed |Added
CC   |        |dcb314 at hotmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30784
[Bug tree-optimization/30958] [4.3 Regression] ice for legal code with -ftree-vectorize -Os (-m64)
--- Comment #7 from pinskia at gcc dot gnu dot org 2007-06-18 06:16 ---
*** This bug has been marked as a duplicate of 30784 ***

--
pinskia at gcc dot gnu dot org changed:

What       |Removed |Added
Status     |NEW     |RESOLVED
Resolution |        |DUPLICATE

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30958
[Bug tree-optimization/32383] [4.3 regression] ICE with reciprocals and -ffast-math
--- Comment #2 from ubizjak at gmail dot com 2007-06-18 06:41 ---
Patch in testing.

--
ubizjak at gmail dot com changed:

What                  |Removed                           |Added
AssignedTo            |unassigned at gcc dot gnu dot org |ubizjak at gmail dot com
Status                |UNCONFIRMED                       |ASSIGNED
Ever Confirmed        |0                                 |1
Last reconfirmed date |-00-00 00:00:00                   |2007-06-18 06:41:08

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32383
[Bug middle-end/32176] [4.3 Regression] ICE tree-type mismatch: expected integer_cst, have plus_expr in int_cst_value, at tree.c:7720
--- Comment #7 from pinskia at gcc dot gnu dot org 2007-06-18 06:42 ---
There is a cast which confuses SCEV.

--
pinskia at gcc dot gnu dot org changed:

What |Removed |Added
CC   |        |pinskia at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32176
[Bug libstdc++/32354] libstdc++.so.6 missing RPATH
--- Comment #5 from stephan dot bergmann at sun dot com 2007-06-18 06:54 --- Re #3: http://gcc.gnu.org/onlinedocs/libstdc++/install.html#usage is not relevant here. That info is about how client code can find libstdc++.so. This issue is about how libstdc++.so can find the libraries it itself depends on. Re #4: Not sure I understand you completely. If you move libstdc++.so and libgcc_s.so somewhere else but keep their relative locations intact (i.e., both in the same directory), RPATH=$ORIGIN in libstdc++.so still works to locate the matching libgcc_s.so. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32354
[Bug middle-end/20983] [4.0/4.1/4.2/4.3 Regression] varargs functions force va_list variable to stack unnecessarily
--- Comment #10 from pinskia at gcc dot gnu dot org 2007-06-18 06:55 ---
For -O1, it is even worse. I think we need to mark va_start/va_end as not clobbering their inputs at the tree level. This should at least fix the -O1 issue. It might also help code generation in other cases that I am not currently thinking of.

--
pinskia at gcc dot gnu dot org changed:

What |Removed |Added
CC   |        |pinskia at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20983
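A minimal sketch of the kind of function PR 20983 concerns: `ap` is used only through va_start, va_arg and va_end. If the optimizers must assume those calls may clobber it, they keep `ap` in memory on the stack; whether it could otherwise live in registers depends on the target's va_list type. `sum_ints` is a hypothetical example, and the comment above is described here only as we read it.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical varargs function: `ap` never escapes except through the
   va_* macros, so in principle it need not be forced to the stack. */
static int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```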
[Bug bootstrap/32334] Bootstrap comparison failure when comparing stage 2 and 3
--- Comment #2 from redriver at korea dot ac dot kr 2007-06-18 07:40 --- (In reply to comment #1) What version of GCC are you starting with? This works for me on an i686-linux-gnu machine (a pentium 4D). I think I found the problem. Previously, I used gcc-3.2.2 to bootstrap gcc-4.2.0. But when I used a newer version of gcc (gcc-4.1.2) to bootstrap, there was no error. I think there is some incompatibility between the two versions, 3.2.2 and 4.2.0. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32334
[Bug tree-optimization/30175] [4.3 Regression] Runtime regressions with mem-ssa merge in Polyhedron and tramp3d-v4
--- Comment #2 from rguenth at gcc dot gnu dot org 2007-06-18 07:54 ---
Fixed.

--
rguenth at gcc dot gnu dot org changed:

What       |Removed     |Added
Status     |UNCONFIRMED |RESOLVED
Resolution |            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30175
[Bug fortran/20373] INTRINSIC symbols can be given the wrong type
--- Comment #13 from patchapp at dberlin dot org 2007-06-18 08:00 --- Subject: Bug number PR20373 A patch for this bug has been added to the patch tracker. The mailing list url for the patch is http://gcc.gnu.org/ml/gcc-patches/2007-06/msg01216.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20373
[Bug tree-optimization/32383] [4.3 regression] ICE with reciprocals and -ffast-math
--- Comment #3 from uros at gcc dot gnu dot org 2007-06-18 08:31 ---
Subject: Bug 32383

Author: uros
Date: Mon Jun 18 08:30:47 2007
New Revision: 125790

URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=125790
Log:
	PR tree-optimization/32383
	* targhooks.c (default_builtin_reciprocal): Add new bool argument.
	* targhooks.h (default_builtin_reciprocal): Update prototype.
	* target.h (struct gcc_target): Update builtin_reciprocal.
	* doc/tm.texi (TARGET_BUILTIN_RECIPROCAL): Update description.
	* tree-ssa-math-opts (execute_cse_reciprocals): Skip statements where arg1 is not SSA_NAME. Pass true to targetm.builtin_reciprocal when fndecl is in BUILT_IN_MD class.
	(execute_convert_to_rsqrt): Ditto.
	* config/i386/i386.c (ix86_builtin_reciprocal): Update for new bool argument. Convert IX86_BUILTIN_SQRTPS code only when md_fn is true. Convert BUILT_IN_SQRTF code only when md_fn is false.

testsuite/ChangeLog:

	PR tree-optimization/32383
	* testsuite/g++.dg/opt/pr32383.C: New test.

Added:
    trunk/gcc/testsuite/g++.dg/opt/pr32383.C
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/doc/tm.texi
    trunk/gcc/target.h
    trunk/gcc/targhooks.c
    trunk/gcc/targhooks.h
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-math-opts.c

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32383
[Bug tree-optimization/32383] [4.3 regression] ICE with reciprocals and -ffast-math
--- Comment #4 from ubizjak at gmail dot com 2007-06-18 08:33 ---
Fixed.

--
ubizjak at gmail dot com changed:

What       |Removed  |Added
Status     |ASSIGNED |RESOLVED
Resolution |         |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32383