Re: CUMULATIVE_ARGS in hooks (Was: RFC: semi-automatic hookization)
Quoting Ian Lance Taylor : The scheme that Paolo describes avoids virtual functions. But for this usage I personally would prefer virtual functions, since there is no efficiency cost compared to a target hook. Well, actually, there is: you first fetch the object pointer, then you find the vtable pointer, and then you load the function pointer. With the target hook, you load the function pointer. And with the function-name-valued macro, you directly call the function. Does it matter? I don't know, but I would guess it doesn't.
Re: CUMULATIVE_ARGS in hooks (Was: RFC: semi-automatic hookization)
Nathan Froyd writes: > On Wed, Nov 17, 2010 at 03:40:39AM +0100, Paolo Bonzini wrote: >> True, but you can hide that cast in a base class. For example you >> can use a hierarchy >> >> Target // abstract base >> TargetImplBase // provides strong typing >> TargetI386 // actual implementation >> >> The Target class would indeed take a void *, but the middle class >> would let TargetI386 think in terms of TargetI386::CumulativeArgs >> with something like >> >> void f(void *x) { >> // T needs to provide void T::f(T::CumulativeArgs *) >> f(static_cast<T::CumulativeArgs *> (x)); >> } >> >> The most similar thing in C (though not suitable for multitarget) is >> a struct, which is why I suggest using that now rather than void * >> (which would be an implementation detail). > > I am admittedly a C++ newbie; the first thing I thought of was: > > class gcc::cumulative_args { > virtual void advance (...) = 0; > virtual rtx arg (...) = 0; > virtual rtx incoming_arg (...) { return this->arg (...); }; > virtual int arg_partial_bytes (...) = 0; > // ...and so on for many of the hooks that take CUMULATIVE_ARGS * > // possibly with default implementations instead of pure virtual > // functions. > }; > > class i386::cumulative_args : gcc::cumulative_args { > // concrete implementations of virtual functions > }; > > // the hook interface is then solely for the backend to return > // `cumulative_args *' things (the current INIT_*_ARGS macros), which > // are then manipulated via the virtual functions above. > > AFAICS, this eliminates the casting issues Joern described. What are > the advantages of the scheme you describe above? (Honest question.) Or > are we talking about the same thing in slightly different terms? The scheme that Paolo describes avoids virtual functions. But for this usage I personally would prefer virtual functions, since there is no efficiency cost compared to a target hook. Ian
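[Editorial note: filled out into a self-contained toy, the virtual-function scheme sketched above looks like this. The class and member names, and the plain int payload standing in for the real argument-passing state, are invented for illustration; the real hooks take machine modes and trees.]

```cpp
#include <cassert>

// Toy stand-in for the middle end's view: an abstract interface that
// replaces the CUMULATIVE_ARGS * target hooks with virtual functions.
// The payload (a running byte offset) is invented for illustration.
struct cumulative_args {
  virtual ~cumulative_args() {}
  virtual void advance(int bytes) = 0;
  virtual int arg_offset() const = 0;
  // A default implementation, as suggested for incoming_arg above.
  virtual int incoming_arg_offset() const { return arg_offset(); }
};

// A hypothetical backend's concrete implementation.
struct i386_cumulative_args : cumulative_args {
  int offset;
  i386_cumulative_args() : offset(0) {}
  void advance(int bytes) { offset += bytes; }
  int arg_offset() const { return offset; }
};

// Middle-end code manipulates the state only through the base class;
// no casts appear on this side of the interface.
int scan_args(cumulative_args *ca, int nargs, int size) {
  for (int i = 0; i < nargs; ++i)
    ca->advance(size);
  return ca->arg_offset();
}
```

The cost per call is one vtable load plus an indirect call, which is the indirection being weighed against a plain target-hook function pointer in the discussion above.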
Re: CUMULATIVE_ARGS in hooks (Was: RFC: semi-automatic hookization)
Quoting Nathan Froyd : I am admittedly a C++ newbie; the first thing I thought of was: class gcc::cumulative_args { virtual void advance (...) = 0; virtual rtx arg (...) = 0; virtual rtx incoming_arg (...) { return this->arg (...); }; virtual int arg_partial_bytes (...) = 0; // ...and so on for many of the hooks that take CUMULATIVE_ARGS * // possibly with default implementations instead of pure virtual // functions. }; Trying to put a target-derived object of that into struct rtl_data would be nonsensical. You might store a pointer, of course. But at any rate, the member function implementations would not be part of the globally-visible target vector. They would be in a smaller vector, and only the pieces of the middle end that deal with argument passing would get to see them. Does that mean you acknowledge that we shouldn't have CUMULATIVE_ARGS-taking hooks in the global target vector?
Re: CUMULATIVE_ARGS in hooks (Was: RFC: semi-automatic hookization)
On Wed, Nov 17, 2010 at 03:40:39AM +0100, Paolo Bonzini wrote: > True, but you can hide that cast in a base class. For example you > can use a hierarchy > > Target // abstract base > TargetImplBase // provides strong typing > TargetI386 // actual implementation > > The Target class would indeed take a void *, but the middle class > would let TargetI386 think in terms of TargetI386::CumulativeArgs > with something like > > void f(void *x) { > // T needs to provide void T::f(T::CumulativeArgs *) > f(static_cast<T::CumulativeArgs *> (x)); > } > > The most similar thing in C (though not suitable for multitarget) is > a struct, which is why I suggest using that now rather than void * > (which would be an implementation detail). I am admittedly a C++ newbie; the first thing I thought of was: class gcc::cumulative_args { virtual void advance (...) = 0; virtual rtx arg (...) = 0; virtual rtx incoming_arg (...) { return this->arg (...); }; virtual int arg_partial_bytes (...) = 0; // ...and so on for many of the hooks that take CUMULATIVE_ARGS * // possibly with default implementations instead of pure virtual // functions. }; class i386::cumulative_args : gcc::cumulative_args { // concrete implementations of virtual functions }; // the hook interface is then solely for the backend to return // `cumulative_args *' things (the current INIT_*_ARGS macros), which // are then manipulated via the virtual functions above. AFAICS, this eliminates the casting issues Joern described. What are the advantages of the scheme you describe above? (Honest question.) Or are we talking about the same thing in slightly different terms? -Nathan
Re: CUMULATIVE_ARGS in hooks (Was: RFC: semi-automatic hookization)
On 11/17/2010 03:10 AM, Ian Lance Taylor wrote: Joern Rennecke writes: I don't see how going to a struct cumulative_args gets us closer to a viable solution for a multi-target executable, even if you threw in C++. Having the target describe a type, and shoe-horning this through a target hook interface that is described in supposedly target-independent terms will require a cast at some point. [...] Converting an empty base class to a derived class is not really safer than converting a void * to a struct pointer. True, but you can hide that cast in a base class. For example you can use a hierarchy Target // abstract base TargetImplBase // provides strong typing TargetI386 // actual implementation The Target class would indeed take a void *, but the middle class would let TargetI386 think in terms of TargetI386::CumulativeArgs with something like void f(void *x) { // T needs to provide void T::f(T::CumulativeArgs *) f(static_cast<T::CumulativeArgs *> (x)); } The most similar thing in C (though not suitable for multitarget) is a struct, which is why I suggest using that now rather than void * (which would be an implementation detail). Paolo
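[Editorial note: Paolo's hierarchy can be written out as a compilable toy. The middle class is a template over the concrete target (the curiously recurring template pattern), so the single cast from void * is confined to one place; all names and the int payload are invented for illustration.]

```cpp
#include <cassert>

// Abstract base: the target-independent interface takes a void *,
// exactly as a target vector entry would.
struct Target {
  virtual ~Target() {}
  virtual int arg_size(void *cum) = 0;
};

// The middle layer hides the cast: the derived target only ever sees
// its own strongly typed CumulativeArgs.
template <typename T>
struct TargetImplBase : Target {
  int arg_size(void *cum) {
    // T must provide int T::arg_size(typename T::CumulativeArgs *).
    return static_cast<T *>(this)->arg_size(
        static_cast<typename T::CumulativeArgs *>(cum));
  }
};

// A hypothetical concrete target; the payload is invented.
struct TargetI386 : TargetImplBase<TargetI386> {
  struct CumulativeArgs { int bytes; };
  int arg_size(CumulativeArgs *ca) { return ca->bytes; }
};
```

The middle end calls through `Target *` with a void *; the strongly typed overload in TargetI386 hides the inherited one, so target code never mentions void * at all.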
Re: CUMULATIVE_ARGS in hooks (Was: RFC: semi-automatic hookization)
Joern Rennecke writes: > I don't see how going to a struct cumulative_args gets us closer to > a viable solution for a multi-target executable, even if you threw in > C++. Having the target describe a type, and shoe-horning this through > a target > hook interface that is described in supposedly target-independent terms > will require a cast at some point - either of the hook argument that > describes the cumulative args, the hook pointer (not valid C / C++), or > a pointer to the target vector, or a pointer to some factored-out part of > the target vector. Converting an empty base class to a derived class > is not really safer than converting a void * to a struct pointer. > And switching to a dynamically typed language is not really on the > agenda... In C++ we would use a pure abstract base class in the target hooks and the targets would have to provide an implementation for the base class. Ian
Re: CUMULATIVE_ARGS in hooks (Was: RFC: semi-automatic hookization)
Quoting Paolo Bonzini : I think a multi-target executable would be just too ugly in C due to issues such as this. I don't think it's worthwhile to sacrifice type safety now, so a struct cumulative_args is preferable. I don't see how going to a struct cumulative_args gets us closer to a viable solution for a multi-target executable, even if you threw in C++. Having the target describe a type, and shoe-horning this through a target hook interface that is described in supposedly target-independent terms will require a cast at some point - either of the hook argument that describes the cumulative args, the hook pointer (not valid C / C++), or a pointer to the target vector, or a pointer to some factored-out part of the target vector. Converting an empty base class to a derived class is not really safer than converting a void * to a struct pointer. And switching to a dynamically typed language is not really on the agenda... Fully hookizing the CUMULATIVE_ARGS-taking macros has really landed us with this typing mess. If we had only used targhooks.c wrappers around the original macros, we could still enjoy type safety for the targhooks.c / target interface, a sane include hierarchy, and easy extension to a multi-target compiler. I'm afraid the only sane way to have these hooks is changing the CUMULATIVE_ARGS pointers into void pointers. As I said before, we can make this more readable by using a typedef cumulative_args_t; but there has to be a cast in every CUMULATIVE_ARGS-taking target hook implementation, or in a helper function which the hook uses (unless the argument is unused). All in all it's a 136 KB patch; I'm currently writing the ChangeLog and running 38 builds. I've tried auto-generating a union before, and for some targets there are macros that cause conflicts. To get a cumulative_args union reliably would require separate header files for each target's definition.
And you'd still have to select the target's field inside of each hook implementation - that is a direct consequence of an interface that connects not the target-specific middle-end to one target, but all parts of the compiler to potentially every target. The alternative would be to undo the hookization of the CUMULATIVE_ARGS-taking hooks. That would tie the middle-end code that deals with calls a bit closer to the target again, but allow all the other parts of the compiler to be blissfully ignorant of these interfaces. In C++, you could make the middle-end a template that takes the target as a parameter, including a CUMULATIVE_ARGS type. But that's not much more than syntactic sugar for having the targets set different macros and compiling the middle-end accordingly.
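[Editorial note: the template alternative mentioned above might look like this minimal sketch. The toy target, its hooks, and the int payload are invented; a real middle end would be parameterized over far more than argument passing.]

```cpp
#include <cassert>

// A toy "target description": the target supplies its CUMULATIVE_ARGS
// type and the hooks that operate on it, all statically typed.
struct toy_target {
  struct cumulative_args { int next_reg; };
  static void init_cumulative_args(cumulative_args *ca) { ca->next_reg = 0; }
  static void function_arg_advance(cumulative_args *ca) { ++ca->next_reg; }
};

// The "middle end" is compiled against a target parameter; every use of
// the cumulative-args state is checked against the target's own type,
// so no void * or casts appear anywhere.
template <typename TARGET>
int count_args_passed(int nargs) {
  typename TARGET::cumulative_args ca;
  TARGET::init_cumulative_args(&ca);
  for (int i = 0; i < nargs; ++i)
    TARGET::function_arg_advance(&ca);
  return ca.next_reg;
}
```

As the message says, this trades the single shared middle end for one instantiation per target, which is essentially what per-target macro configuration already does.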
Re: CUMULATIVE_ARGS in hooks (Was: RFC: semi-automatic hookization)
On 11/16/2010 10:17 PM, Ian Lance Taylor wrote: I don't know how we want to get there, but it seems to me that the place we want to end up is with the target hooks defined to take an argument of type struct cumulative_args * (or a better name if we can think of one). Actually, this doesn't work, because then different target vectors have different types. You might get away with it now, but LTO on a multi-target compiler would fail. Good point. I think we should just typedef void *cumulative_args_t; and use that for our hooks. Another area where we can do something much nicer when we move to C++. This something could be something like target_i386::cumulative_args, implemented e.g. using the curiously recurring template pattern (http://en.wikipedia.org/wiki/Curiously_recurring_template_pattern). I think a multi-target executable would be just too ugly in C due to issues such as this. I don't think it's worthwhile to sacrifice type safety now, so a struct cumulative_args is preferable. Paolo
RE: __gthread_recursive_mutex_destroy missing
> The gthreads portability layer is missing a function for destroying a > __gthread_recursive_mutex object. > > For pthreads-based models the recursive mutex type is the same as the > normal mutex type so __gthread_mutex_destroy handles both, but they're > distinct types for (at least) gthr-win32.h, so we can't properly > clean up recursive mutexes in libstdc++. > > Any objections if I prepare a patch to add > __gthread_recursive_mutex_destroy to each gthr header? It makes sense. libobjc could use these as well; all mutexes in libobjc are recursive mutexes. At the moment libobjc uses __gthread_objc_mutex_xxx and similar, but it should probably move to use __gthread_recursive_mutex_xxx. Thanks
gcc-4.4-20101116 is now available
Snapshot gcc-4.4-20101116 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20101116/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 4.4 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch revision 166830

You'll find:

gcc-4.4-20101116.tar.bz2           Complete GCC (includes all of below)
  MD5=97ccc9bf753f6de5efed685b93a9b49c
  SHA1=0df0f8f102ee40f05fa5141805c36cee712d448c
gcc-core-4.4-20101116.tar.bz2      C front end and core compiler
  MD5=9cee461ea45ad893964e5b3ce8ae0c15
  SHA1=961c76a219af48778e72d472e55ce73cf03e1292
gcc-ada-4.4-20101116.tar.bz2       Ada front end and runtime
  MD5=0cf9434083986e61d0a5db6ab07b330b
  SHA1=6ec6e632b612bf1c9ae03af0d3829b0d2d19a840
gcc-fortran-4.4-20101116.tar.bz2   Fortran front end and runtime
  MD5=02b1543c6c9a0906757ed63dfe1ed9cc
  SHA1=f38c7ddfb83a50ef6171da06933be1001ad28f13
gcc-g++-4.4-20101116.tar.bz2       C++ front end and runtime
  MD5=a44b703fc3b75265095ca8b17d8e9733
  SHA1=31ac3d39a382e90516c643c9589fe80b306bafa7
gcc-java-4.4-20101116.tar.bz2      Java front end and runtime
  MD5=0c398e643705f2bc5f31c6f5ebf203ef
  SHA1=d31c275e188c5e22ad395be88d720ca95e50e72f
gcc-objc-4.4-20101116.tar.bz2      Objective-C front end and runtime
  MD5=a5fd4c4adc4c0e825163b0df2813b02c
  SHA1=23d03525473e2b11f63afdb757d77b7d0be5db43
gcc-testsuite-4.4-20101116.tar.bz2 The GCC testsuite
  MD5=433962a9cfbcd076fb0dfd381aaeca66
  SHA1=ac5af875f8d3ea4443ed8a10221db73cd9eefcc3

Diffs from 4.4-20101109 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-4.4 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
Re: CUMULATIVE_ARGS in hooks (Was: RFC: semi-automatic hookization)
Joern Rennecke writes: > Quoting Ian Lance Taylor : > >> Joern Rennecke writes: >> >>> Before I go and make all these target changes & test them, is there at >>> least agreement that this is the right approach, i.e. replacing >>> CUMULATIVE_ARGS * >>> with void *, and splitting up x_rtl into two variables. >> >> I don't know how we want to get there, but it seems to me that the place >> we want to end up is with the target hooks defined to take an argument >> of type struct cumulative_args * (or a better name if we can think of >> one). > > Actually, this doesn't work, because then different target vectors have > different types. You might get away with it now, but LTO on a multi-target > compiler would fail. Good point. > I think we should just > typedef void *cumulative_args_t; > > and use that for our hooks. Another area where we can do something much nicer when we move to C++. Ian
Re: Mailing lists for back-end development?
On 11/16/2010 11:24 AM, Dave Korn wrote: > I think it's probably an over-engineered solution to a problem we could > really address best by remembering to use []-tags in the subject lines. OK, that seems to be as close to consensus as we're probably going to get. Let's try and do that. Thank you, -- Mark Mitchell CodeSourcery m...@codesourcery.com (650) 331-3385 x713
CUMULATIVE_ARGS in hooks (Was: RFC: semi-automatic hookization)
Quoting Ian Lance Taylor : Joern Rennecke writes: Before I go and make all these target changes & test them, is there at least agreement that this is the right approach, i.e. replacing CUMULATIVE_ARGS * with void *, and splitting up x_rtl into two variables. I don't know how we want to get there, but it seems to me that the place we want to end up is with the target hooks defined to take an argument of type struct cumulative_args * (or a better name if we can think of one). Actually, this doesn't work, because then different target vectors have different types. You might get away with it now, but LTO on a multi-target compiler would fail. I think we should just typedef void *cumulative_args_t; and use that for our hooks.
Re: RFC: semi-automatic hookization
Quoting Ian Lance Taylor : Joern Rennecke writes: Before I go and make all these target changes & test them, is there at least agreement that this is the right approach, i.e. replacing CUMULATIVE_ARGS * with void *, and splitting up x_rtl into two variables. I don't know how we want to get there, but it seems to me that the place we want to end up is with the target hooks defined to take an argument of type struct cumulative_args * (or a better name if we can think of one). We could consider moving the struct definition into CPU.c, and having the target structure just report the size, or perhaps a combined allocation/INIT_CUMULATIVE_ARGS function. Ian
Re: Mailing lists for back-end development?
On 16/11/2010 17:29, Mark Mitchell wrote: > I spoke with a partner today who suggested that perhaps it would be a > bit easier to follow the voluminous GCC mailing list if we had separate (Do you mean "the voluminous gcc-patches mailing list" perhaps?) > lists for patches related to particular back-ends (e.g., ARM, MIPS, > Power, SuperH, x86, etc.). I think it's probably an over-engineered solution to a problem we could really address best by remembering to use []-tags in the subject lines. If usenet taught us anything, it's that you can't solve real problems just by renaming (or subdividing) groups. I think it would also be more-or-less counter-productive; as all the back-ends share a common interface, I think most backend maintainers need to keep an eye on what's going on with other backends anyway, even when not directly involved. So I think we'd all just end up subscribed to a dozen-plus mailing lists instead of one and still have pretty much the same amount of incoming mail to sift through anyway. That being so, doing it at our clients by filtering on tags makes as much sense as anything else. cheers, DaveK
Re: Mailing lists for back-end development?
On Tue, Nov 16, 2010 at 09:57, Richard Henderson wrote: > I think that splitting things all the way down to $arch is probably > not useful in that things that affect all backends will not get > addressed promptly if backend reviewers are so narrowly focused. Agreed. A backend specific list may work, but I don't think it would be useful to make it too specific. I'm not too sanguine about the whole idea, though. Perhaps encourage the use of [prefix] tags like we do for branches and large modules? I would rather use tagging than a fixed taxonomy. It's more flexible and easier to change if our needs change. Diego.
Re: Mailing lists for back-end development?
On 11/16/2010 09:29 AM, Mark Mitchell wrote: > The idea here is that (as with libstdc++), we'd send patches to > gcc-patches@ and gcc-$arch@, but that reviewers for a particular > back-end would find it easier to keep track of things on the > architecture-specific lists, and also that this would make it easier > when trying to track down patches to backport to distribution versions > of the compiler. > > What do people think about this idea? I think that splitting things all the way down to $arch is probably not useful in that things that affect all backends will not get addressed promptly if backend reviewers are so narrowly focused. I would, however, be amenable to a gcc-backend list, and let's say a strong suggestion that all messages to that list have [$arch] or [all] as a subject line prefix. Unless I miss the purpose of these lists? r~
Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64
On Tue, Nov 16, 2010 at 6:35 AM, Jan Hubicka wrote: >> More FDO related performance numbers >> >> Experiment 1: trunk gcc O2 + FDO vs O2: FDO improves performance >> by 5% geomean >> Experiment 2: our internal gcc compiler (4.4.3 based with many local >> patches) O2 + FDO vs O2 (trunk gcc): FDO improves perf by 6.6% >> geomean >> Experiment 3: our internal gcc (4.4.3 with local patches) O2 + LIPO vs >> O2 (trunk gcc): LIPO improves by 12% >> Experiment 4: trunk gcc O2 + LTO + -fwhole-program + FDO vs O2: LTO + >> FDO improves by 10.8% >> >> >> 1. Trunk gcc FDO vs O2 (5%) >> >> 164.gzip 1324 1302 -1.64% >> 175.vpr 1694 1725 1.84% >> 176.gcc 2293 2387 4.07% >> 181.mcf 1772 1756 -0.88% >> 186.crafty 2320 2280 -1.75% >> 197.parser 1166 1556 33.42% >> 252.eon 2443 2552 4.45% >> 253.perlbmk 2410 2586 7.28% >> 254.gap 1987 2021 1.71% >> 255.vortex 2392 2720 13.71% >> 256.bzip2 1719 1717 -0.12% >> 300.twolf 2288 2331 1.86% >> >> 2. 4.4.3 gcc with local patches FDO vs trunk O2 (6.6%) > > Interesting, any idea where this 1.6% is coming from? Probably due to local patches (inliner, lrs, etc.) we have, but I have not studied it. > I guess this might > also be the reason for that 2% difference in LIPO results (in general LTO > -fwhole-program + FDO should be stronger, but it is not tuned at all yet). > > Since the LIPO branch was updated to mainline some time ago, it would be nice > to compare the LIPO from the branch with mainline LTO. I guess a fairer > comparison > would be O2+FDO+LTO WRT O2+LIPO, as LIPO makes no whole-program assumptions > at all, right? Yes. Raksit maintains the upstream LIPO branch, but it has not been tuned for performance yet. We have open-sourced our compiler changes via Android. It is better to use that if anyone is interested. Thanks, David > > Honza >
Re: Mailing lists for back-end development?
On Tue, Nov 16, 2010 at 9:29 AM, Mark Mitchell wrote: > What do people think about this idea? I think this is a really bad idea. A lot of the time, back-end patches for one target inspire some folks to do patches for another target. For an example, look at how FMA has been done recently. Those patches would have a lot of overlap. More mailing lists would also signal that GCC development is splitting up. -- Pinski
Mailing lists for back-end development?
I spoke with a partner today who suggested that perhaps it would be a bit easier to follow the voluminous GCC mailing list if we had separate lists for patches related to particular back-ends (e.g., ARM, MIPS, Power, SuperH, x86, etc.). The idea here is that (as with libstdc++), we'd send patches to gcc-patches@ and gcc-$arch@, but that reviewers for a particular back-end would find it easier to keep track of things on the architecture-specific lists, and also that this would make it easier when trying to track down patches to backport to distribution versions of the compiler. What do people think about this idea? Thank you, -- Mark Mitchell CodeSourcery m...@codesourcery.com (650) 331-3385 x713
Invoking atomic functions from a C++ shared lib (or should I force linking with -lgcc?)
Hi, I have been investigating a problem I have while building Qt-embedded with GCC-4.5.0 for ARM/Linux, and managed to produce the reduced test case as follows. Consider this shared library (C++): atomic.cxx int atomicIncrement(int volatile* addend) { return __sync_fetch_and_add(addend, 1) + 1; } Compiled with: $ arm-linux-g++ atomic.cxx -fPIC -shared -o libatomic.so Now the main program: atomain.cxx extern int atomicIncrement(int volatile* addend); volatile int myvar; int main() { return atomicIncrement(&myvar); } Compiled & linked with: $ arm-linux-g++ atomain.cxx -o atomain -L. -latomic .../ld: atomain: hidden symbol `__sync_fetch_and_add_4' in /.../libgcc.a(linux-atomic.o) is referenced by DSO What I have found is that g++ (unlike gcc) links with -lgcc_s instead of -lgcc, and that the atomic functions are present in libgcc.a and not in libgcc_s.so. If I create libatomic.so with -lgcc, it works. What I don't understand is whether this is the intended behaviour, and whether adding -lgcc is the right fix, or not. [This surprises me, because as I said, I faced this problem when compiling Qt-embedded for ARM/Linux and I don't think I am the only one doing that, so I expected it to just work ;-)] Thanks, Christophe.
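[Editorial note: as background, __sync_fetch_and_add returns the value the operand had *before* the addition, which is why the wrapper in the reduced test case adds 1. On targets without a native instruction, GCC emits a call to an out-of-line helper such as __sync_fetch_and_add_4, which is the libgcc symbol surfacing in the link error above. The wrapper itself can be checked natively:]

```cpp
#include <cassert>

// Same wrapper as in the reduced test case: returns the *incremented*
// value, since the builtin returns the value before the addition.
int atomicIncrement(int volatile *addend) {
  return __sync_fetch_and_add(addend, 1) + 1;
}
```

On a host with native atomics this inlines to a single locked add and no libgcc helper is referenced, which is why the problem only shows up on targets like older ARM that fall back to linux-atomic.o.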
Re: decimal float, LIBGCC2_FLOAT_WORDS_BIG_ENDIAN, and ARM ABI issues
On Tue, 16 Nov 2010, Nathan Froyd wrote: > The saving grace here is that decimal float is not enabled by default > for arm platforms, so there are likely very few, if any, users of > decimal float on ARM; it might be worthwhile to go ahead and fix things, > ignoring the fallout from earlier versions. Not enabled by default generally implies not usable at all for decimal floating point; you generally need at least some support for the ABI, saying what modes are allowed in what registers, etc. - and the current revision of AAPCS doesn't include decimal floating point at all so there is no ABI to follow at present. It seems likely to me that if you enabled decimal floating point on ARM you'd get ICEs. -- Joseph S. Myers jos...@codesourcery.com
Re: RFC: semi-automatic hookization
Quoting Ian Lance Taylor : Joern Rennecke writes: Before I go and make all these target changes & test them, is there at least agreemwent that this is the right approach, i.e replacing CUMULATIVE_ARG * with void *, and splitting up x_rtl into two variables. I don't know how we want to get there, but it seems to me that the place we want to end up is with the target hooks defined to take an argument of type struct cumulative_args * (or a better name if we can think of one). We could consider moving the struct definition into CPU.c, and having the target structure just report the size, or perhaps a combined allocation/INIT_CUMULATIVE_ARGS function. If every target defines struct cumulative_args, allocation is straightforward. ctmrtl (or if you think a better name, propose one) is a macro for the global variable x_tm_rtl, which is defined in target-oriented middle-end code that includes tm.h . What is not quite clear is what is to happen with the args member of x_rtl. Should I remove the info member from struct incoming_args, and shift that to x_tm_rtl, or should I rather move the entire args member of x_rtl to x_tm_rtl? The latter would mean that struct incoming_args would remain intact - but OTOH more churn in config/*/*, because every access to crtl->args will have to be changed. Or maybe we should leave the target-specific stuff in x_rtl / crtl and instead move out the stuff that emit-rtl.h makes visible to non-rtl code, e.g. x_first_insn, x_last_insn ...
decimal float, LIBGCC2_FLOAT_WORDS_BIG_ENDIAN, and ARM ABI issues
The easiest way to deal with the use of LIBGCC2_FLOAT_WORDS_BIG_ENDIAN in libgcc is to define a preprocessor macro __FLOAT_WORD_ORDER__ similar to how WORDS_BIG_ENDIAN was converted. That is, cppbuiltin.c will do: cpp_define_formatted (FOO, "__FLOAT_WORD_ORDER__=%s", (FLOAT_WORDS_BIG_ENDIAN ? "__ORDER_BIG_ENDIAN__" : "__ORDER_LITTLE_ENDIAN__")); and change any uses of LIBGCC2_FLOAT_WORDS_BIG_ENDIAN to consult __FLOAT_WORD_ORDER__ instead. A grep reveals that there are no target definitions of LIBGCC2_FLOAT_WORDS_BIG_ENDIAN, so we should be OK with the straightforward conversion, right? This runs into a curious case in the arm backend, though, which has: #define FLOAT_WORDS_BIG_ENDIAN (arm_float_words_big_endian ()) with no corresponding LIBGCC2_FLOAT_WORDS_BIG_ENDIAN. I think what this means is that the places that care about the order of float words (currently libdecnumber, libbid, and dfp-bit.h) will always use the order indicated by __BYTE_ORDER__/WORDS_BIG_ENDIAN, even when the backend is secretly using a different order. ARM has probably gotten lucky wrt dfp-bit.h because it has its own assembler fp routines that presumably DTRT for unusual float word orderings. (dfp-bit.h also does not *use* the setting of LIBGCC2_FLOAT_WORDS_BIG_ENDIAN, so that helps.) But IIUC, using __FLOAT_WORD_ORDER__ in the relevant libraries will break pre-existing code that used libdecnumber and/or libbid. I am not conversant enough with ARM ABIs and/or targets to know which ones would break. The saving grace here is that decimal float is not enabled by default for arm platforms, so there are likely very few, if any, users of decimal float on ARM; it might be worthwhile to go ahead and fix things, ignoring the fallout from earlier versions. What do the ARM maintainers think? Should I prepare a patch for getting rid of LIBGCC2_FLOAT_WORDS_BIG_ENDIAN and we'll declare decimal float horribly broken pre-4.6? Or is there a better way forward? -Nathan
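[Editorial note: what __FLOAT_WORD_ORDER__ would encode can be illustrated with a runtime probe; the function names here are invented, and real library code would of course test the predefined macro at compile time. On ordinary targets the two probes below agree; the old mixed-endian ARM FPA double format, where the high word comes first even though integers are little-endian, is exactly the case where they would differ.]

```cpp
#include <cassert>
#include <cstring>

// 1.0 has the bit pattern 0x3FF0000000000000: its high 32-bit word is
// nonzero and its low word is all zero.  So if the first four bytes in
// memory contain any set bits, the high-order word is stored first,
// i.e. the float word order is big-endian.
bool float_words_big_endian() {
  double d = 1.0;
  unsigned char b[sizeof(double)];
  std::memcpy(b, &d, sizeof d);
  return b[0] | b[1] | b[2] | b[3];
}

// Ordinary integer byte order, for comparison.
bool integer_big_endian() {
  unsigned u = 1;
  unsigned char b[sizeof(unsigned)];
  std::memcpy(b, &u, sizeof u);
  return b[0] == 0;  // LSB stored last means big-endian
}
```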
Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64
> More FDO related performance numbers
> 
> Experiment 1: trunk gcc O2 + FDO vs O2: FDO improves performance
> by 5% geomean
> Experiment 2: our internal gcc compiler (4.4.3 based with many local
> patches) O2 + FDO vs O2 (trunk gcc): FDO improves perf by 6.6%
> geomean
> Experiment 3: our internal gcc (4.4.3 with local patches) O2 + LIPO vs
> O2 (trunk gcc): LIPO improves by 12%
> Experiment 4: trunk gcc O2 + LTO + -fwhole-program + FDO vs O2: LTO +
> FDO improves by 10.8%
> 
> 1. Trunk gcc FDO vs O2 (5%)
> 
> 164.gzip    1324  1302  -1.64%
> 175.vpr     1694  1725   1.84%
> 176.gcc     2293  2387   4.07%
> 181.mcf     1772  1756  -0.88%
> 186.crafty  2320  2280  -1.75%
> 197.parser  1166  1556  33.42%
> 252.eon     2443  2552   4.45%
> 253.perlbmk 2410  2586   7.28%
> 254.gap     1987  2021   1.71%
> 255.vortex  2392  2720  13.71%
> 256.bzip2   1719  1717  -0.12%
> 300.twolf   2288  2331   1.86%
> 
> 2. 4.4.3 gcc with local patches FDO vs trunk O2 (6.6%)

Interesting, any idea where this 1.6% is coming from? I guess this might also be the reason for that 2% difference in LIPO results (in general LTO -fwhole-program + FDO should be stronger, but it is not tuned at all yet). Since the LIPO branch was updated to mainline some time ago, it would be nice to compare the LIPO from the branch with mainline LTO. I guess a fairer comparison would be O2+FDO+LTO WRT O2+LIPO, as LIPO makes no whole-program assumptions at all, right? Honza
Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64
2010/11/16 Jan Hubicka : >> On Mon, Nov 15, 2010 at 5:39 PM, Jan Hubicka wrote: >> >> > Fortunately linker plugin solves the problem here and this is why I >> >> > want to >> >> > have it by default. GCC then can do effectively -fwhole-program for >> >> > binaries >> >> > (since linker knows what will be bound elsewhere) and take advantage of >> >> > visibility((hidden)) hints for shared libraries same way. Most of >> >> > important >> >> > shared libraries get visibility ((hidden)) right. >> >> > >> >> > It is sad that LTO w/o linker plugin doesn't give that much benefit. >> >> > Ideas are welcome here. >> >> >> >> Linker feedback will be limited here -- mostly global variable >> >> aliasing (as I remember only 2/3 spec programs benefit from it), it >> >> helps. You don't get whole program points-to, whole program mod-ref >> >> (with context sensitivity), whole program structure layout. The latter >> >> are the real kickers (in terms of SPEC performance), but promoting LTO >> >> with those numbers can be misleading as many programs won't get it. >> > >> > Well, I am speaking of our linker plugin here. What it does is to pass GCC >> > resolution information so it knows what symbols are bound externally. Since >> > typically you link LTO alone or with a small non-LTO part, most symbols >> > are >> > not bound and thus effectively you get -fwhole-program (-fwhole-program >> > just >> > declares everything static except for main ()) >> > >> > We don't really do whole program points-to or structure layout. >> >> gcc will eventually, right? > > Sure hope so ;) > We really need to solve scalability with our IPA points-to and make it > compatible with WHOPR. >> >> > Mod-ref is just >> > simple ipa-reference code. How do you get context sensitivity on mod/ref? >> >> mod-ref relies on points-to. With context-sensitive points-to, you can >> also get CS mod-ref -- basically mod-ref info per call site. > > Ah sure, I was too focused on our current "mod/ref" :) Btw, IPA-PTA also performs mod/ref analysis (but of course it is context insensitive). Richard. > Honza >
__gthread_recursive_mutex_destroy missing
The gthreads portability layer is missing a function for destroying a __gthread_recursive_mutex object. For pthreads-based models the recursive mutex type is the same as the normal mutex type, so __gthread_mutex_destroy handles both; but they're distinct types for (at least) gthr-win32.h, so we can't properly clean up recursive mutexes in libstdc++. Any objections if I prepare a patch to add __gthread_recursive_mutex_destroy to each gthr header?
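[Editorial note: for the pthreads-based models the new function would be trivial, since the two mutex types coincide. A sketch of what the addition might look like, modelled on (but not copied from) the existing gthr-posix.h wrappers; the init helper is shown alongside so the pair is self-contained:]

```cpp
#include <pthread.h>

// In the pthreads model a recursive mutex is just a pthread_mutex_t
// initialized with the PTHREAD_MUTEX_RECURSIVE attribute (sketch of the
// existing init wrapper, for context).
static inline int
__gthread_recursive_mutex_init_function(pthread_mutex_t *mutex) {
  pthread_mutexattr_t attr;
  int r = pthread_mutexattr_init(&attr);
  if (!r) r = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
  if (!r) r = pthread_mutex_init(mutex, &attr);
  if (!r) r = pthread_mutexattr_destroy(&attr);
  return r;
}

// The proposed destroy function: since recursive and plain mutexes
// share a type here, destruction is the same call.  gthr-win32.h is
// where a genuinely different implementation would be needed.
static inline int
__gthread_recursive_mutex_destroy(pthread_mutex_t *mutex) {
  return pthread_mutex_destroy(mutex);
}
```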
Re: extern "C" applied liberally?
On Mon, Nov 15, 2010 at 7:19 PM, Jay K wrote: > > I know it is debatable and I could be convinced otherwise, but I would > suggest: > > > > #ifdef __cplusplus > extern "C" { > #endif > > ... > > > #ifdef __cplusplus > } /* extern "C" */ > #endif > > > be applied liberally in gcc. > Not "around" #includes, it is the job of each .h file, and mindful of #ifdefs > (ie: correctly). > > > Rationale: > Any folks that get to see the mangled names, debugging, working on binutils, > whatever, are saved from them. > They are generally believed to be ugly, right? Yeah yeah, not a technical > argument. binutils is good at handling that stuff these days. In the long term, that change looks counterproductive. [...] > I think it is a good idea for any C or historically C code when moving to a > C++ compiler. It may or may not be. In this case, I don't think it is. The transition is complete now. > They could/would be removed as templates/function overloads/operator > overloading are introduced. Why introduce a kludge that we may have to remove later, when the kludge fixes no glaring problem?
Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64
More FDO related performance numbers

Experiment 1: trunk gcc O2 + FDO vs O2: FDO improves performance by 5% geomean
Experiment 2: our internal gcc compiler (4.4.3 based with many local patches) O2 + FDO vs O2 (trunk gcc): FDO improves perf by 6.6% geomean
Experiment 3: our internal gcc (4.4.3 with local patches) O2 + LIPO vs O2 (trunk gcc): LIPO improves by 12%
Experiment 4: trunk gcc O2 + LTO + -fwhole-program + FDO vs O2: LTO + FDO improves by 10.8%

1. Trunk gcc FDO vs O2 (5%)

164.gzip    1324  1302  -1.64%
175.vpr     1694  1725   1.84%
176.gcc     2293  2387   4.07%
181.mcf     1772  1756  -0.88%
186.crafty  2320  2280  -1.75%
197.parser  1166  1556  33.42%
252.eon     2443  2552   4.45%
253.perlbmk 2410  2586   7.28%
254.gap     1987  2021   1.71%
255.vortex  2392  2720  13.71%
256.bzip2   1719  1717  -0.12%
300.twolf   2288  2331   1.86%

2. 4.4.3 gcc with local patches FDO vs trunk O2 (6.6%)

164.gzip    1324  1317  -0.48%
175.vpr     1694  1758   3.76%
176.gcc     2293  2472   7.79%
181.mcf     1772  1730  -2.35%
186.crafty  2320  2353   1.40%
197.parser  1166  1652  41.70%
252.eon     2443  2610   6.82%
253.perlbmk 2410  2561   6.23%
254.gap     1987  1987  -0.04%
255.vortex  2392  2801  17.09%
256.bzip2   1719  1748   1.68%
300.twolf   2288  2335   2.04%

3. LIPO vs trunk O2 (12%)

164.gzip    1324  1350   1.99%
175.vpr     1694  1758   3.77%
176.gcc     2293  2519   9.83%
181.mcf     1772  1766  -0.33%
186.crafty  2320  2394   3.16%
197.parser  1166  1683  44.32%
252.eon     2443  2879  17.80%
253.perlbmk 2410  2556   6.04%
254.gap     1987  2139   7.61%
255.vortex  2392  3669  53.40%
256.bzip2   1719  1824   6.09%
300.twolf   2288  2345   2.49%

4. LTO + -fwhole-program + O2 + FDO vs O2 (10.8%)

164.gzip    1324  1340   1.25%
175.vpr     1694  1709   0.87%
176.gcc     2293  2411   5.13%
181.mcf     1772  1757  -0.80%
186.crafty  2320  2566  10.59%
197.parser  1166  1614  38.44%
252.eon     2443  2785  13.98%
253.perlbmk 2410  2618   8.61%
254.gap     1987  2063   3.81%
255.vortex  2392  3294  37.69%
256.bzip2   1719  1956  13.77%
300.twolf   2288  2404   5.07%

David

On Mon, Nov 15, 2010 at 6:18 PM, Xinliang David Li wrote: > More performance data: > > -O2 -funroll-all-loops vs O2: +1.1% geomean > > O2 O2 unroll-all-loops > 164.gzip 1324 1336 0.94% > 175.vpr 1694 1670 -1.44% > 176.gcc 2293 2353 2.60% > 181.mcf 1772 1793 1.20% > 186.crafty 2320 2300 -0.86% > 197.parser 1166 1171 0.39% > 252.eon 2443 2515 2.93% > 253.perlbmk 2410 2250 -6.66% > 254.gap 1987 2041 2.68% > 255.vortex