date:20070618


On 6/18/07, Uros Bizjak [EMAIL PROTECTED] wrote:

On 6/18/07, tbp [EMAIL PROTECTED] wrote:

 Until now, the contract was: you have to deal with (and contain) NaN
 and infinities. Fair enough, even if tricky that remained manageable.
 But if i can't expect a mere division by 0, or sqrt of 0 (quite common
 with FTZ/DAZ on) to give me respectively an infinite and 0 and instead
 get a NaN (which i can't filter, you remember?) because of the NR
 round, that's pure madness.

The attached patch should fix these troubles for the cost of 2 extra
clocks. The trick is to limit the rsqrt result to just below infinity,
which keeps 0.0 * (inf-) -> 0.0.


I guess I'm still confused how this will fix sqrt(x) -> rsqrt for x == 0, so,
can we have a testcase enumerating the now bogus cases?

Thx,
Richard.


Uros.

Index: i386.c
===
--- i386.c  (revision 125790)
+++ i386.c  (working copy)
@@ -22590,7 +22590,7 @@ void ix86_emit_swdivsf (rtx res, rtx a,
 void ix86_emit_swsqrtsf (rtx res, rtx a, enum machine_mode mode,
 bool recip)
 {
-  rtx x0, e0, e1, e2, e3, three, half;
+  rtx x0, e0, e1, e2, e3, three, half, bignum;

   x0 = gen_reg_rtx (mode);
   e0 = gen_reg_rtx (mode);
@@ -22600,15 +22600,18 @@ void ix86_emit_swsqrtsf (rtx res, rtx a,

   three = CONST_DOUBLE_FROM_REAL_VALUE (dconst3, SFmode);
   half = CONST_DOUBLE_FROM_REAL_VALUE (dconsthalf, SFmode);
+  bignum = gen_lowpart (SFmode, GEN_INT (0x7f7f));

   if (VECTOR_MODE_P (mode))
 {
   three = ix86_build_const_vector (SFmode, true, three);
   half = ix86_build_const_vector (SFmode, true, half);
+  bignum = ix86_build_const_vector (SFmode, true, bignum);
 }

   three = force_reg (mode, three);
   half = force_reg (mode, half);
+  bignum = force_reg (mode, bignum);

   /* sqrt(a) = 0.5 * a * rsqrtss(a) * (3.0 - a * rsqrtss(a) * rsqrtss(a))
  1.0 / sqrt(a) = 0.5 * rsqrtss(a) * (3.0 - a * rsqrtss(a) * rsqrtss(a)) */
@@ -22617,6 +22620,9 @@ void ix86_emit_swsqrtsf (rtx res, rtx a,
   emit_insn (gen_rtx_SET (VOIDmode, x0,
  gen_rtx_UNSPEC (mode, gen_rtvec (1, a),
  UNSPEC_RSQRT)));
+  emit_insn (gen_rtx_SET (VOIDmode, x0,
+ gen_rtx_SMIN (mode, x0, bignum)));
+
   /* e0 = x0 * a */
   emit_insn (gen_rtx_SET (VOIDmode, e0,
  gen_rtx_MULT (mode, x0, a)));
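
The effect of the clamp in the patch above can be sketched in scalar C++.
This is only an illustration, not the code the patch generates: the
rsqrtss estimate is replaced by an exact 1/sqrt (an assumption; like the
instruction, it yields +inf for 0), and the largest finite float stands
in for the patch's bignum constant. The helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Stand-in for the rsqrtss estimate (assumption: exact 1/sqrt, which
// yields +inf for a == 0, as the hardware instruction does).
inline float rsqrt_estimate(float a) { return 1.0f / std::sqrt(a); }

// sqrt(a) = 0.5 * a * x0 * (3.0 - a * x0 * x0): one Newton-Raphson step,
// written with the same temporaries (x0, e0, e1, e2) as the patch.
inline float nr_sqrt(float a, bool clamp) {
    assert(a >= 0.0f);
    float x0 = rsqrt_estimate(a);
    if (clamp) {
        // Limit x0 to just below infinity, mirroring the SMIN in the patch.
        x0 = std::min(x0, std::numeric_limits<float>::max());
    }
    float e0 = x0 * a;     // without the clamp: inf * 0 = NaN
    float e1 = e0 * x0;
    float e2 = 3.0f - e1;
    return 0.5f * e0 * e2;
}
```

With the clamp, nr_sqrt(0.0f, true) evaluates to 0; without it, the
inf * 0 product turns the result into a NaN.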

Re: Activate -mrecip with -ffast-math?


On 6/18/07, Richard Guenther [EMAIL PROTECTED] wrote:

No, that's not the contract with -ffast-math.  Note that -ffast-math
enables -funsafe-math-optimizations which is allowed to change results
(add/remove rounding operations, contract expressions, do transforms
like a/b to a * 1/b, do transformations that get you bigger errors than
0.5ulp, etc.)

I can't expect a division by a constant to survive -ffast-math
unscathed, but then that's a change in precision and manageable.
Being returned a NaN i'm not supposed to see for a common case
depending on some transformation is something else, entirely.


 But if i can't expect a mere division by 0, or sqrt of 0 (quite common
 with FTZ/DAZ on) to give me respectively an infinite and 0 and instead
 get a NaN (which i can't filter, you remember?) because of the NR
 round, that's pure madness.

Hm, which particular case are you concerned about (maybe it was mentioned,
but I don't remember the details)?  Note that -ffast-math enables
-ffinite-math-only as well, so the compiler assumes nothing will result in
NaNs or Infs.

Yes and that's why it's such a pain to handle them correctly while in
-ffast-math. But if i generate some, then i get what i've asked for
(and i'm in for a local fix). Fair enough. I'm not going to give up ie
fast & robust SSE ray/aabb slab tests (or ray/plane or...) because of
some arbitrary rule; the hardware handles it just fine (yes there's a
penalty, but then it's way faster than branching).

For example, when doing 1/x and sqrt(x) via reciprocal + NR, you first
get an inf from said reciprocal, which then turns into a NaN in the NR
stage unless you correct it by, say, doing a comparison to 0 and an
'and'.
That's what ICC used to do behind your back. That's what you'll find on
page 151 of the amdfam10 optimization manual. Because that's a common case.

As far as i can see, there's no such provision in the current patch.
At the very least provide a means to look after those NaNs without
losing sanity, like a way to enforce argument order of
min/max[ss|ps|pd] without resorting to inline asm.


Well - certainly another reason for the Math BOF ;)  We all expect very
different things from -ffast-math or -funsafe-math-optimizations.

You mean fast & unsafe?
I think there's quite a margin between letting someone shoot himself in
the foot and putting a gun to his head.
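
The "comparison to 0 and an 'and'" correction mentioned in this message
can be sketched in scalar C++ as follows. It is a rough emulation under
stated assumptions: the SSE compare-and-mask pair is modelled with integer
bit operations, the rsqrtss estimate is replaced by an exact 1/sqrt, and
both function names are hypothetical. It is not taken from ICC or from the
AMD manual.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "32-bit float expected");

// Emulates a compare result (all ones or all zeros) applied with a bitwise AND.
inline float and_mask(float value, bool keep) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    bits &= keep ? 0xFFFFFFFFu : 0u;
    float out;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}

// sqrt(a) via reciprocal square root plus one Newton-Raphson step, with
// the estimate forced to 0 when a == 0 so that no inf enters the NR stage.
inline float masked_nr_sqrt(float a) {
    assert(a >= 0.0f);
    float x0 = 1.0f / std::sqrt(a);    // stand-in for rsqrtss: +inf for a == 0
    x0 = and_mask(x0, a != 0.0f);      // inf becomes 0.0 when a == 0
    return 0.5f * a * x0 * (3.0f - a * x0 * x0);
}
```

The masking costs a compare and an and, but it needs no branch, which is
the property this message cares about.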

Re: More vectorizer testcases?

2007-06-18 Thread Giovanni Bajo


On 6/18/2007 1:26 PM, Dorit Nuzman wrote:


these 3 are actually not so simple... the main thing that's blocking 2 of
them right now is that they need support for stores with gaps, which can be
added except the other problem is that the vectorizer thinks it's not
profitable to vectorize them (or rather 2 of them. as does ICC by the way).


When you say not profitable, is that target-dependent? I would be
satisfied when the vectorizer can vectorize it *but* prefers not to do it
because it can be done more efficiently on the specific target.


Of course, it would be interesting to still force the vectorizer to
produce the code, so as to compare the vectorized version with the
non-vectorized version and see if it is really right. Is there (will
there be) an option to turn off cost-based estimation within the vectorizer?



Since the time you opened these PRs we came quite a bit closer to
vectorizing these (the support for interleaved accesses and for multiple
data-types were also required). It will be fun to add the last missing bit
- the support for the stores-with-gaps. I hope we'll get to it before too
long...


Nice! I'm looking forward to it!


If you have other (hot) code examples that expose different missing
features I think that's always interesting to know about (but if it's like
the codes above then maybe it will not have much added value...).


I have dozens and dozens of loops which I believe could be
vectorized and are not. I don't know whether they are related to
store-with-gaps or not, though. So, well, I'll open the bug reports and
let you do the analysis. Feel free to close them as duplicates if you
think they're not worth keeping open on their own.

--
Giovanni Bajo
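
For readers unfamiliar with the term, a loop with "stores with gaps" might
look like the sketch below. This is one plausible reading only: the PRs
discussed in this thread are not quoted here, so the loop and the name
scatter_even are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Writes only the even-indexed elements of `out`. The odd-indexed elements
// are the "gaps": a vectorized store must leave them untouched.
inline void scatter_even(const std::vector<float>& in, std::vector<float>& out) {
    assert(out.size() >= 2 * in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[2 * i] = in[i] * 2.0f;
}
```

A plain full-width vector store would overwrite the gaps as well, which is
presumably why extra support is needed in the vectorizer.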

Re: Activate -mrecip with -ffast-math?


On 6/18/07, Giovanni Bajo [EMAIL PROTECTED] wrote:

I understand your problems, but let me state that your objections are
totally subjective. *You* need a specific behaviour from -ffast-math
(eg: keep NaN/Inf), but that's not what *I* need. So, we have different
goals.

No. My NaN are my problem. Those generated by gcc, aren't.
At the very least provide a canonical (efficient) way to filter them
(ie SSE min/max).
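
Why argument order matters for filtering NaNs with SSE min/max can be
sketched in scalar C++. The two helpers model the documented behaviour of
minss/maxss, which return the second operand whenever either input is a
NaN. The names sse_min, sse_max and clamp_filtering_nan are hypothetical,
and this is a model, not intrinsic code.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Scalar model of minss/maxss: the comparison is false when either operand
// is a NaN, so the second operand is returned in that case.
inline float sse_min(float a, float b) { return a < b ? a : b; }
inline float sse_max(float a, float b) { return a > b ? a : b; }

// Clamp x to [lo, hi]. x is always the first operand, so a NaN in x is
// replaced by the bound in the second slot instead of propagating.
inline float clamp_filtering_nan(float x, float lo, float hi) {
    assert(lo <= hi);
    return sse_max(sse_min(x, hi), lo);
}
```

Swapping the operands, as a compiler may do if it treats min and max as
commutative, gives sse_min(hi, x), which returns the NaN. That is the
reason for asking for a way to pin the order.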

Re: Some thoughts about steerring commitee work


  However, as far as I know (also from talking with the
  SLP authors) pretty much all the opportunities they had found at the
  time were in loops.

 I can hand you more than the testcases i've given so far.  There is
 tons of code out there that would  benefit from straight line

Interesting. I wasn't aware of this potential. Please do send some of this
code. thanks!

 vectorization.  In fact, we have some that gets written in loop form
 right now just so it gets vectorized!


that doesn't sound like such a bad idea to me... :-) (seriously - isn't it
more intuitive for the programmer and informative for the compiler to use
loops when possible? of course, I haven't seen the code you're talking
about so maybe it doesn't apply to the cases you're referring to)


  we'll have to have a much better cost model before we start packing
  random sequences of stmts out of loops.

 This i'm happy to agree on, but it does not change that I am
 disappointed that you have tied the SLP implementation to loops so
 heavily.


I think the SLP that we did should really be viewed more as extending the
vectorizer to also consider intra-iteration DLP (in addition to
inter-iteration DLP). We used the term SLP cause there is a lot of analogy
to SLP, and a lot of people are familiar with this term, but we are not
doing SLP per se. And, as I said before, I see the dependence on the
existing loop infrastructure as an advantage, and a way to efficiently
vectorize a lot of SLP-like-codes without writing a whole new vectorizer. I
don't debate that there's room to also implement real basic-block SLP
(well, at least now that I hear that there are tons of code that can benefit
from it), but I don't think you should be so disappointed... :-)

thanks,
dorit

 
   Simply because you can't find cases in SPEC2000 doesn't mean it's not
  useful.
 
  I don't know where you're taking this from. SPEC2000 is really not so
  interesting vectorization wise, inside or outside loops.

 This was from some private mails I received about how it is not useful
 for benchmarks.
 HTH,
 Dan
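
The difference between the two code shapes debated in this message can be
sketched as follows. Both functions are hypothetical illustrations, not
taken from any code mentioned in the thread. The first is straight-line
code that only a basic-block SLP pass could pack. The second is the same
work rewritten in loop form so that a loop-based vectorizer can handle it.

```cpp
#include <cassert>

struct Vec4 { float x, y, z, w; };

// Straight-line form: four isomorphic, independent statements outside any loop.
inline Vec4 add_straight(const Vec4& a, const Vec4& b) {
    Vec4 r;
    r.x = a.x + b.x;
    r.y = a.y + b.y;
    r.z = a.z + b.z;
    r.w = a.w + b.w;
    return r;
}

// Loop form: the same computation expressed over arrays.
inline void add_loop(const float* a, const float* b, float* r, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        r[i] = a[i] + b[i];
}
```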

Re: Incorrect bitfield aliasing with Tree SSA

2007-06-18 Thread Eric Botcazou

 If it was designed properly in the first place, there simply would *be
 no problem at the tree level*, because nothing would have broken.

That's certainly a point of view.  The other is that the RTL implementation 
predates the Tree one, works fine in GCC 3.x, including for the C compiler.
One would have thought that the Tree implementation would be aware of it 
instead of overlooking it, given that alias.c is shared among them.

 So far you guys have resisted what seem like perfectly reasonable
 solutions by Adam

You mean the patch that would have disabled the whole thing at the RTL level?  
I'm sure that we can devise something better.

-- 
Eric Botcazou

Re: Incorrect bitfield aliasing with Tree SSA

 That is not the example case we have given where this breaks.
 The case where it breaks is exactly the case i have shown you.
 
 We have a pointer to a structure, and because you have not recorded
 the type's alias relationships properly, we claim dereferences that are
 offset from the structure can not access the field.
  This is a direct consequence of trying to use the parent's alias set
 for that of the child type, instead of creating a new alias set.

Let me try to explain it this way: if you have a structure where all fields
are nonaddressable, EVERY reference should have the same alias set, that
of the structure.  So they all conflict, as they should, and no other
reference will conflict, which is also correct.

It sounds like there's a bug here in that somebody is using the wrong alias
set somewhere.  All the RTL dumps posted in this thread (and the related
one in gcc-patches) look correct, so it's right at the RTL level.  That
means that the only place it's wrong would be at tree level, but get_alias_set
also does the right thing.

Re: More vectorizer testcases?

Giovanni Bajo [EMAIL PROTECTED] wrote on 18/06/2007 15:06:48:

 On 6/18/2007 1:26 PM, Dorit Nuzman wrote:

  these 3 are actually not so simple... the main thing that's blocking 2
  of them right now is that they need support for stores with gaps, which
  can be added except the other problem is that the vectorizer thinks it's
  not profitable to vectorize them (or rather 2 of them. as does ICC by the
  way).

 When you say not profitable, is that target-dependent? I would be
 satisfied when the vectorizer can vectorize it *but* prefer not to do it

that's a fair point. In the one case (the 3X3 matrices and loop-bounds=3)
the vectorizer just can't handle sizes that don't evenly divide the
vector-size. Even when that will be extended (by conceptually unrolling the
loop by 4 to be able to pack into 3 vectors of size 4) it won't help this
case cause the loop bound is only 3. This particular testcase fails to
vectorize even without the newly added initial cost-model, just based on
the fact that the loop-count is less than the vector-size (this is not a
target dependent decision). ICC is reported to also choose to not vectorize
this loop. The other two loops we just can't vectorize yet (both of which
ICC chooses not to vectorize because it thinks it's not profitable).

 because it can be done more efficiently on the specific target.

 Of course, it would be interesting to still force the vectorizer to
 produce the code, so as to compare the vectorized version with the
 non-vectorized version and see if it is really right. Is there (will
 there be) an option to turn off cost-based estimation within the
 vectorizer?


the choice not to vectorize when the loop-bound is less than the
vectorization factor is the only cost estimation that is hard coded. The
rest of the cost model is controlled by the flag -fvect-cost-model.

dorit
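
The 3X3 case described above presumably resembles the sketch below. The
actual testcase is not quoted in this thread, so the function is a
hypothetical stand-in. Every loop has a trip count of 3, which is less
than a vector of 4 floats, so by the hard-coded rule mentioned above it is
rejected before the -fvect-cost-model cost model is consulted.

```cpp
#include <cassert>

const int kDim = 3;  // loop bound 3, smaller than a 4-float vector

// c = a * b for row-major 3x3 matrices stored as 9 floats each.
inline void mat3_mul(const float* a, const float* b, float* c) {
    assert(c != a && c != b);  // the output must not overlap the inputs
    for (int i = 0; i < kDim; ++i) {
        for (int j = 0; j < kDim; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < kDim; ++k)
                sum += a[i * kDim + k] * b[k * kDim + j];
            c[i * kDim + j] = sum;
        }
    }
}
```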

  Since the time you opened these PRs we came quite a bit closer to
  vectorizing these (the support for interleaved accesses and for
  multiple data-types were also required). It will be fun to add the last
  missing bit - the support for the stores-with-gaps. I hope we'll get to
  it before too long...

 Nice! I'm looking forward to it!

  If you have other (hot) code examples that expose different missing
  features I think that's always interesting to know about (but if it's
  like the codes above then maybe it will not have much added value...).

 I have dozens and dozens of loops which I believe could be
 vectorized and are not. I don't know whether they are related to
 store-with-gaps or not, though. So, well, I'll open the bug reports and
 let you do the analysis. Feel free to close them as duplicates if you
 think they're not worth keeping open on their own.
 --
 Giovanni Bajo

gcov / gcov-dump

2007-06-18 Thread Eddy Pronk

I'm writing a tool which reads information (arcs) from the .gcno file 
produced by GCC with -ftest-coverage.


It calculates the NPATH complexity (number of execution paths in a function)

By doing this I found out the graph generated by GCC contains more paths
than I expected. For every function call it generates emergency exits
in case exceptions occur. (I'm guessing here)


I found some documentation in the gcov-io.h but I still have trouble 
figuring out how to interpret these paths. In some cases inline C++ code 
appears in the graph, but for simple cases it doesn't.


The .gcno files seemed a simple way to get access to the arcs, but I'm 
not sure the data is rich enough to work out the NPATH complexity. (for 
simple cases, it works well)


Any ideas how I can find more information to solve this?

Eddy

--
http://sourceforge.net/projects/gnocchi
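
The effect of the extra arcs on a path count can be sketched as follows.
This is a simplified model, not the tool described above and not the .gcno
format: the arcs are assumed to be already loaded into a successor list,
the graph is assumed to be acyclic, and the names ArcGraph, count_from and
count_paths are hypothetical. Counting entry-to-exit paths this way only
matches NPATH for loop-free code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// successors[b] lists the blocks reachable from block b by one arc.
typedef std::vector<std::vector<int> > ArcGraph;

// Memoized depth-first count of the paths from `block` to `exit_block`.
inline long long count_from(const ArcGraph& g, int block, int exit_block,
                            std::vector<long long>& memo) {
    if (block == exit_block) return 1;
    if (memo[block] >= 0) return memo[block];
    long long total = 0;
    for (std::size_t i = 0; i < g[block].size(); ++i)
        total += count_from(g, g[block][i], exit_block, memo);
    memo[block] = total;
    return total;
}

inline long long count_paths(const ArcGraph& g, int entry, int exit_block) {
    assert(entry >= 0 && static_cast<std::size_t>(entry) < g.size());
    assert(exit_block >= 0 && static_cast<std::size_t>(exit_block) < g.size());
    std::vector<long long> memo(g.size(), -1);
    return count_from(g, entry, exit_block, memo);
}
```

In this model, a straight-line function with two calls has one path, but
giving each call block an additional arc straight to the exit block raises
the count to three.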

Re: I'm sorry, but this is unacceptable (union members and ctors)

2007-06-18 Thread KJKHyperion


On 6/17/07, michael.a [EMAIL PROTECTED] wrote:

I appreciate the thought, but there is sort of an imperative with this
effort to shy away from Boost/STL/virtual inheritance completely.


You'd be hard-pressed to find any instance of dynamic polymorphism
anywhere in Boost. Most of Boost is based on compile-time template
tricks. For example, you have been pointed to boost::optional<T>;
here's boost/optional/optional.hpp, where it's implemented, see if
you can find virtual anywhere:

http://www.boost.org/boost/optional/optional.hpp
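
To give an idea of how an optional-like wrapper can hold a type with
constructors without any virtual functions, here is a minimal sketch. It
is not Boost's implementation: OptionalSketch and its members are
hypothetical, and it relies on alignas and deleted functions from C++11,
which is newer than this thread.

```cpp
#include <cassert>
#include <new>
#include <string>

// Raw aligned storage plus placement new. The object is constructed only
// on demand and destroyed explicitly, with no dynamic polymorphism.
template <typename T>
class OptionalSketch {
public:
    OptionalSketch() : ptr_(nullptr) {}
    ~OptionalSketch() { reset(); }
    OptionalSketch(const OptionalSketch&) = delete;
    OptionalSketch& operator=(const OptionalSketch&) = delete;

    void emplace(const T& v) { reset(); ptr_ = new (storage_) T(v); }
    void reset() { if (ptr_) { ptr_->~T(); ptr_ = nullptr; } }
    bool has_value() const { return ptr_ != nullptr; }
    T& value() { assert(ptr_ != nullptr); return *ptr_; }

private:
    alignas(T) unsigned char storage_[sizeof(T)];
    T* ptr_;  // points into storage_ while a value is alive
};
```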

Re: Incorrect bitfield aliasing with Tree SSA

 If it was designed properly in the first place, there simply would *be
 no problem at the tree level*, because nothing would have broken.

It's possible to have bugs anytime and that's all we have here: somebody
is using the wrong alias set someplace.  We fix that and all is OK.

 So far you guys have resisted what seem like perfectly reasonable
 solutions by Adam

Because they turn off the feature rather than use it.  I still don't
understand what the difficulty is here and why you persist in thinking
that type alias set of the type of a non-addressable field has any
use at all: it doesn't and should be COMPLETELY ignored.  All the code has
been written to do that.  Somebody is trying to directly compute an alias
set instead of using get_alias_set and when you find that, you'll find
the bug.

missing symbols

2007-06-18 Thread costin_c


In the following code, compiled with
g++   cls.cc -Wall -W -g3 -o cls

why are only the virtual functions f1 and f2 and the constructor listed by nm?

Are only debugging symbols for virtual functions included in the
executable output file?

//cls.cc
#include <iostream>
using namespace std;

class test
{
public:
   int u;
   test(int t){u=t;};
   virtual void f1(){
   cout << u << endl;
   }
   virtual void f2(int t){u=t;};
   void f3(int t){u=t;};

};

int main(int argc, char **argv)
{
   test t(100);

}
$nm -C  cls | grep test::

0804874e W test::f1()
08048740 W test::f2(int)
08048728 W test::test(int)

Re: missing symbols

2007-06-18 Thread costin_c



Weird: the assembler file generated with the -S parameter includes
information about all of the test class methods: test, f1, f2, f3

..
   .string "test::test"
   .long   0x5d40
   .string "test::f2"
   .long   0x5d70
   .string "test::f3"
   .long   0x5e4d
   .string "test::f1"
   .long   0x5e71
   .string "main"

Re: Incorrect bitfield aliasing with Tree SSA


On 6/18/07, Eric Botcazou [EMAIL PROTECTED] wrote:

 If it was designed properly in the first place, there simply would *be
 no problem at the tree level*, because nothing would have broken.

That's certainly a point of view.  The other is that the RTL implementation
predates the Tree one, works fine in GCC 3.x, including for the C compiler.
One would have thought that the Tree implementation would be aware of it
instead of overlooking it, given that alias.c is shared among them.


Uh, except as we've discovered, the RTL uses alias set 0, so whatever
alias set you choose for these doesn't matter anyway to the RTL level.




 So far you guys have resisted what seem like perfectly reasonable
 solutions by Adam

You mean the patch that would have disabled the whole thing at the RTL level?
I'm sure that we can devise something better.


No i mean the idea of making it a different alias set than the parent,
but a subset of the parent.


--
Eric Botcazou

Re: missing symbols

2007-06-18 Thread Andrew Pinski


On 6/18/07, costin_c [EMAIL PROTECTED] wrote:

On 6/18/07, costin_c [EMAIL PROTECTED] wrote:
 In the following code, compiled with
 g++   cls.cc -Wall -W -g3 -o cls

 why only only virtual functions  f1, f2 and constructor is listed by nm.


Because they are needed for the vtable.  f3, on the other hand, is declared
inline and not used, so it is not output.



Weird: the assembler file generated with the -S parameter includes
information about all of the test class methods: test, f1, f2, f3


Not really because that is what you get with -g3.

Thanks,
Andrew Pinski
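
As a sketch of the point about inline functions: if f3 is only declared in
the class and defined outside it without "inline", it becomes an ordinary
external function, and one would expect the compiler to emit it even when
nothing calls it. The class is renamed test_v2 here to keep it apart from
the original cls.cc, and self_check is a hypothetical helper.

```cpp
#include <cassert>
#include <iostream>

class test_v2 {
public:
    int u;
    test_v2(int t) { u = t; }                           // defined in class: implicitly inline
    virtual void f1() { std::cout << u << std::endl; }  // referenced by the vtable
    virtual void f2(int t) { u = t; }                   // referenced by the vtable
    void f3(int t);                                     // only declared here
};

// Defined outside the class and not marked inline, so it is not subject to
// the "unused inline function is dropped" rule described above.
void test_v2::f3(int t) { u = t; }

// Small self-check showing that f3 behaves as before.
inline void self_check() {
    test_v2 t(100);
    t.f3(7);
    assert(t.u == 7);
}
```

Running nm -C on the result should then list test_v2::f3(int) alongside
the other members, though that was not verified as part of this thread.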

Re: Incorrect bitfield aliasing with Tree SSA


On 6/18/07, Daniel Berlin [EMAIL PROTECTED] wrote:

On 6/18/07, Eric Botcazou [EMAIL PROTECTED] wrote:
  If it was designed properly in the first place, there simply would *be
  no problem at the tree level*, because nothing would have broken.

 That's certainly a point of view.  The other is that the RTL implementation
 predates the Tree one, works fine in GCC 3.x, including for the C compiler.
 One would have thought that the Tree implementation would be aware of it
 instead of overlooking it, given that alias.c is shared among them.

Uh, except as we've discovered, the RTL uses alias set 0, so whatever
alias set you choose for these doesn't matter anyway to the RTL level.



  So far you guys have resisted what seem like perfectly reasonable
  solutions by Adam

 You mean the patch that would have disabled the whole thing at the RTL level?
 I'm sure that we can devise something better.

No i mean the idea of making it a different alias set than the parent,
but a subset of the parent.


also, unique from the alias set of the other type (IE int:31 has a
different alias set than int).

Re: I'm sorry, but this is unacceptable (union members and ctors)

2007-06-18 Thread Joern Rennecke

michael.a:
 A proper C++ style fix would require the introduction of new syntax rather
 than tagging unions or such. The dominant ctors would have to be specified,
 or unions themselves could simply be allowed ctors that override the member
 ctors. Call them constructor overloads or something, no new syntax or
 revolutionary semantics, just a quick and easy fix.

If all you need is one member that has constructors / destructors, and
all other members are PODs that provide an alternate view of the contents,
then I think that would make a logical extension of the transparent union
extension.  A transparent union is passed to functions in the same manner as
its first member.  You could define that a transparent union is allowed
to have as its first member a class with constructors and/or destructors,
and that these constructors / destructors are then the constructors /
destructors of the union.

Caveat: If the union is larger or more aligned than its first member,
the argument passing semantics don't make sense.  This is documented in
extend.texi:
  Second, the argument is passed to the function using the calling
  conventions of the first member of the transparent union, not the calling
  conventions of the union itself.  All members of the union must have the
  same machine representation; this is necessary for this argument passing
  to work properly.

There is also a syntax example for __attribute__ ((__transparent_union__)) in
extend.texi.  
Inside the compiler, you can check if a union is a transparent union using the
TYPE_TRANSPARENT_UNION macro.

Joe Buck:
 I wouldn't object if someone implemented a clean extension.  The problem
 with extensions, though, is documenting how all the corner cases work,
 and making sure that they all get tested.  This is somewhat easier when
 you're cloning someone else's extension, because the other implementation
 can be used for comparison.

To avoid having too many corner cases, you can keep the defined functionality
small and well delineated, declare anything beyond this scope as invoking
undefined behaviour (simplest for implementation - just make sure you
don't ICE) or as a constraint violation (i.e. you should make sure that
the compiler produces an error - the benefit is that it prevents people
from accidentally starting to use accidental functionality that is not
covered by the documented extension).

michael.a:
 Sometimes extensions just have to be quick and dirty. Microsoft is a major
 influence. The facilities should be there to match MS whenever within
 reason... as well as should be ever present warnings not to abuse such
 facilities.

If you make a quick and dirty hack, you have to be prepared for it not to
be maintainable for any length of time.

michael.a:
 I went to compile a tainted build last night, but I ran into a build error
 apparently related only to subversion checkouts, which might also be
 particular to the target debian distribution / hardware support for some
 esoteric reason according to what can be gleaned from google. So I went to
 just download the release sources, but all of the mirrors were down for some
 reason. 
 
 The error is related to a bison/flex build event, which for some reason
 can't be completed by autotools or something... I figure it easier to just
 go with the release sources as suggested (the relevant .c files are
 pregenerated in the release trees)

Try contrib/gcc_update --touch after the checkout.

Re: Incorrect bitfield aliasing with Tree SSA

 Uh, except as we've discovered, the RTL uses alias set 0, so whatever
 alias set you choose for these doesn't matter anyway to the RTL level.

Only in some cases.  That was a kludge put in to fix some obscure bug and
left there. I hope we can remove it at some point, and think we can.

 No i mean the idea of making it a different alias set than the parent,
 but a subset of the parent.

Because it *is* the parent in most places, so it should be in all.

Re: Incorrect bitfield aliasing with Tree SSA

 Again, the tree level relies on the documented (in the comments of
 alias.c) fact that given a structure, the fields contained in a
 structure will have alias sets that are strict subsets of the parent.

That is ONLY true for fields that don't have DECL_NONADDRESSABLE_P
and that's been the case forever.  The documentation might be confusing,
but the code has never been.

 The bug reports are about cases where we have a struct foo * (where
 struct foo contains int a:31), and foo pointer->a is claimed to not
 alias with foo.a.

How can you take a pointer to the bitfield?

 I would much rather maintain the strict subset invariant than the
 component_uses_parent_alias_set stuff, since this is the documented
 invariant, and makes sense.

But throws away the entire DECL_NONADDRESSABLE_P mechanism!  Also, how
do we handle TYPE_NONALIASED_COMPONENT?  It's exactly the same issue?
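
The case in the quoted bug reports seems to be an access through a pointer
to the enclosing structure, not a pointer to the bit-field itself. A
minimal sketch of that pattern, with a hypothetical function name:

```cpp
#include <cassert>

struct foo {
    int a : 31;
};

// The store through p->a and the accesses to g.a touch the same bit-field
// whenever p == &g, so the store must not be treated as independent of them.
inline int store_then_read(foo* p, foo& g) {
    assert(p != nullptr);
    g.a = 1;
    p->a = 2;      // may modify g.a
    return g.a;    // must observe 2 when p points at g
}
```

If the two accesses were given alias sets that are claimed not to
conflict, an optimizer could legitimately return 1 even when p points at
g. That is the kind of miscompilation the thread is about.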

Re: Incorrect bitfield aliasing with Tree SSA

 His first patch, which simply makes #1 true, would cause missed optimization.

It doesn't cause missed optimizations: it completely removes all the
functionality of DECL_NONADDRESSABLE_P!

Re: Incorrect bitfield aliasing with Tree SSA


On 6/18/07, Richard Kenner [EMAIL PROTECTED] wrote:

 Again, the tree level relies on the documented (in the comments of
 alias.c) fact that given a structure, the fields contained in a
 structure will have alias sets that are strict subsets of the parent.

That is ONLY true for fields that don't have DECL_NONADDRESSABLE_P
and that's been the case forever.  The documentation might be confusing,
but the code has never been.

 The bug reports are about cases where we have a struct foo * (where
 struct foo contains int a:31), and foo pointer->a is claimed to not
 alias with foo.a.

How can you take a pointer to the bitfield?

 I would much rather maintain the strict subset invariant than the
 component_uses_parent_alias_set stuff, since this is the documented
 invariant, and makes sense.

But throws away the entire DECL_NONADDRESSABLE_P mechanism!


No, an int* will still not conflict with int:31
a short * will still not conflict with short:31


Also, how
do we handle TYPE_NONALIASED_COMPONENT?  It's exactly the same issue?


Tell me what TYPE_NONALIASED_COMPONENT does, and i'll tell you what
will happen right now :)

Re: Incorrect bitfield aliasing with Tree SSA


On 6/18/07, Richard Kenner [EMAIL PROTECTED] wrote:

 His first patch, which simply makes #1 true, would cause missed optimization.

It doesn't cause missed optimizations: it completely removes all the
functionality of DECL_NONADDRESSABLE_P!



Hence the reason for the second suggestion.

Re: I'm sorry, but this is unacceptable (union members and ctors)

2007-06-18 Thread Andrew Haley

Robert Dewar writes:
  Ross Ridge wrote:
  t formal definition.
  
   Most of GCC's long list of extensions to C are also implemented as
   extensions to C++, so you've already lost this battle in GNU C++.
  
  And many of them are ill-defined (and some would agree ill-considered).
  Mistakes in the past are not a good reason for mistakes in the future.
  
   Trying to add a new feature without an existing implementation only
   makes it harder to get both a correct formal definition and something
   that people will actually want to use.
  
  I think the best procedure is to discuss new features from a language
  design point of view, and the committee is the best forum for that,
  then implement them as *part* of the (typically fairly drawn out)
  process of adding a new feature.

There's always a chicken and egg problem here: language features
that might be good for a standardization proposal need to be tested in
real-world applications before anyone knows that they will be useful.

Of course, some of gcc's C extensions are ill-considered and caused
problems, but one of the reasons we know how ill-considered they are
is that they were implemented and people tried to use them.  gcc has a
role to play as a deployment vehicle for language extensions.

The trouble is that it's very hard to kill an extension once people
are using it...

Andrew.

Re: Incorrect bitfield aliasing with Tree SSA


On 6/18/07, Richard Kenner [EMAIL PROTECTED] wrote:

  But throws away the entire DECL_NONADDRESSABLE_P mechanism!

 No, an int* will still not conflict with int:31
 a short * will still not conflict with short:31

Using what mechanism?  That's what DECL_NONADDRESSABLE_P does!


Please read what the *second* proposal was again

1. The alias set is a subset of the parent set
and more importantly
2. The alias set is different than that of the underlying type and the
parent set.

Thus, the alias sets will not conflict with the underlying type, but
will conflict with the parent set which is exactly what you want.

How does this get a different result for trees than RTL?

As i've explained, we rely on the property of the TBAA forest that given

           struct foo (set 1)
            /              \
 int :31 (set 2)     short :31 (set 3)

sets for int :31 and short :31 are strict subsets of that of struct
foo.  This is how it is documented and except for this one little wart
introduced by DECL_NONADDRESSABLE_P, how it is.

For the sake of a complete example, we also have int (set 4) and short
(set 5), which are both roots of this forest, and thus, do not
conflict with set 3 or 2.

The forest we have now says:

struct foo, int :31, short :31   (set 1)

(and int = set 4 and short = set 5).
Note again that in neither forest  does set 1 conflict with 4 or 5,
but in the first forest, the subset relationship between int:31 and
struct foo is properly represented as subset but different.

As I said to Eric, you can also change the strict subset to
subset_or_equals, but that is really not quite in accord with reality.
They *really are different alias sets than their parent*.

Note also that it is more precise in the first case.
If you were ever to ask can int:31 touch short:31, the first forest
would correctly say no, what we have now would say yes.



 Tell me what TYPE_NONALIASED_COMPONENT does, and i'll tell you what
 will happen right now :)

Very similar.  If I have

typedef xyz[100] foo;

and mark that type with that flag, it means that int * will not conflict
with it, just foo *.



Then these should simply have different non-conflicting aliasing sets :)
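
The two forests described in this message can be modelled with a small
parent table. This is a simplified sketch, not GCC's alias.c data
structure: AliasForest and its members are hypothetical, and two sets are
taken to conflict when they are equal or one is a descendant of the other.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct AliasForest {
    // parent[s] is the set that s is a strict subset of, or 0 for a root.
    // Index 0 is unused so that indices match the set numbers in the mail.
    std::vector<int> parent;

    // True if s equals `of` or lies below it in the forest.
    bool is_subset_or_same(int s, int of) const {
        for (; s != 0; s = parent[s])
            if (s == of) return true;
        return false;
    }

    bool conflict(int a, int b) const {
        assert(a > 0 && static_cast<std::size_t>(a) < parent.size());
        assert(b > 0 && static_cast<std::size_t>(b) < parent.size());
        return is_subset_or_same(a, b) || is_subset_or_same(b, a);
    }
};
```

With parents {-, 0, 1, 1, 0, 0} for sets 1 to 5, sets 2 and 3 both
conflict with set 1 but not with each other, nor with sets 4 and 5. In the
current scheme both bit-fields are simply set 1, so a query for int:31
against short:31 becomes conflict(1, 1), which is true.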

Re: Some thoughts about steerring commitee work

2007-06-18 Thread Devang Patel


I can hand you more than the testcases i've given so far.  There is
tons of code out there that would  benefit from straight line
vectorization.


I'm interested in these test cases. Thanks!


In fact, we have some that gets written in loop form
right now just so it gets vectorized!


Maybe loop materialization is useful in such situations?

-
Devang

Re: RFC: Make dllimport/dllexport imply default visibility

2007-06-18 Thread Mark Mitchell

Chris Lattner wrote:

[Richard E., please see below for a question re. RealView's behavior.]

 You and Chris are taking the view that the type has a location.  But,
 a lot of people don't look at it this way.  The things that have
 locations are variables (including class data) and functions.  After
 all, types don't appear in object files; only variables and functions do.
 
 That is a limited view of things based on the current implementation of
 GCC.  When future developments (e.g. LTO) occur, this will change: types
 certainly do live in object files.

I don't see how LTO changes this.  Yes, type definitions will appear in
one or more object files.  But, the intended semantics of LTO are just
to do what the linker would do -- plus some consistency checking.

 Furthermore, you're taking the view that:

   __attribute__((visibility ("hidden")))

 on a type means something about visibility of the type in a linguistic
 sense, i.e., that it provides some kind of scoping, perhaps like an
 anonymous namespace that is different in each shared library.
 
 Yes.

That's a possible meaning, but it's not the meaning that was intended.
As Danny has said, it's not the meaning that Windows users want.  It's
also not the meaning that SymbianOS users want.

 But, the visibility attribute is only specified in terms of its effects
 on ELF symbols, not as having C++ semantics per se.  The hidden
 visibility attribute says that all members of the class have hidden
 visibility, unless otherwise specified.
 
 I'll paraphrase this as saying: this is already an extension, not a
 standard - we can extend the extension without remorse.

Currently, the compiler generates wrong code: it generates a hidden
reference to a dllimport'd function.  There are two ways to fix that:
declare the construct invalid, or make the compiler generate a
non-hidden reference.  The second option might be an extension to an
extension but it might also just be a bug fix.

 We also allow:

 struct __attribute__((visibility("hidden"))) S {
   __attribute__((visibility("default"))) void f();
   void g();
 };

 Because there is no standard to reference, I think it's important to
 consider these things in terms of explainability.  It is very easy (and
 common) to explain visibility and anon namespaces in terms of types
 (when applied to a type).

Here would be my explanation:

The visibility attribute to a class specifies the default visibility
for all of its members, including compiler-generated functions and
variables.  You can override that default by explicitly specifying a
different visibility for the members.
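
The class-level rule is C++-specific, but plain C has a loose analogue that may help picture "a default that individual declarations can override": GCC's `#pragma GCC visibility push(...)` sets the default for the declarations that follow, and an explicit attribute on one declaration overrides it. The function names below are made up for illustration, and this is only a sketch of the analogy, not of the class semantics under discussion.

```c
#include <assert.h>

/* Everything declared up to the matching "pop" defaults to hidden visibility. */
#pragma GCC visibility push(hidden)

/* Picks up the hidden default: not exported from a shared object. */
int plugin_helper(int x)
{
    return x + 1;
}

/* An explicit attribute overrides the pushed default for this one function. */
__attribute__((visibility("default")))
int plugin_entry(int x)
{
    assert(x >= 0);              /* hypothetical precondition for the sketch */
    return plugin_helper(x) * 2;
}

#pragma GCC visibility pop
```

Built into a shared object, only `plugin_entry` would be expected to show up among the exported dynamic symbols; inside the translation unit both functions remain callable as usual.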

That seems acceptably simple to me.  As long as we allow visibility
specifications for the members that are different from the class
(independently of whether that is narrower or wider visibility) an
explanation in terms of namespaces will require a caveat.  For example:

Giving a class hidden visibility is similar to putting it in an
anonymous namespace shared not just within a single translation unit,
but across all translation units in a shared object.  However, if you
override the visibility of the members of the class, then they may have
more or less visibility than specified by the class.

ELF operates at a level below C++, and can be used to do things that C++
does not allow.  For example, the C++ standard (via the ODR) forbids a
single program from having two classes with the same name.  But, one of
the goals of ELF hidden visibility is to allow that, so that, for
example, two plugins can have classes with the same name without
conflicting.  You can also give two C++ functions the same address via
appropriate ELF magic.  These sorts of things must be done with care,
but they are techniques used by many real programs, and in the hands of
experts, useful.

 I suspect that the realview compiler accepts
 this as an oversight or a bug, not as an intentional feature.

Let's ask.

Richard E., is the fact that RealView 3.0SP1 accepts:

  class __declspec(notshared) S {
    __declspec(dllimport) void f();
  };

a bug or a feature?  If this is considered a bug, is it something that
RealView is likely to change in a future release, or will it be
preserved for the foreseeable future for backwards compatibility?

 There are two conflicting goals to balance:
 
 1. Define our extensions as well as possible and make their semantics as
 explainable and logical as possible.
 2. Compile existing code with maximum compatibility.
 
 To me, the best way to handle this is to reject this by default (based
 on #1).  To handle #2, add a flag (defaulting to off) to enable this
 extended extension.  In the diagnostic, tell the user about the option,
 and in the manual document the option and the issue.

Good; at this point we've agreed that we should accept the code.  Now
we're just arguing about whether we accept it by default.  That's a less
important issue, since at least there will be some way to get the
behavior that users want.

We have accepted this code:

 struct

Re: Suffix for __float128 FP constants

On Sun, Jun 17, 2007 at 09:06:36PM +, Joseph S. Myers wrote:
 On Sun, 17 Jun 2007, Uros Bizjak wrote:
 
  I was trying to load a full 128 bit constant into __float128 variable, but
  with L suffix, I was able to load only XFmode constant. Is there a special
  suffix for __float128 available in gcc?
 
 No; since the x86-64 ABI is what defines the __float128 name, you could 
 ask the associated mailing list about a standard suffix to associate with 
 it.
 

Lack of standard for __float128 is always a problem. Suffix for
__float128 constant is one, scanf/printf specifier for __float128
is another. We also don't have a name for string to __float128
function.


H.J.

Re: Suffix for __float128 FP constants

2007-06-18 Thread Uros Bizjak


H. J. Lu wrote:


I was trying to load a full 128 bit constant into __float128 variable, but
with L suffix, I was able to load only XFmode constant. Is there a special
suffix for __float128 available in gcc?
  
No; since the x86-64 ABI is what defines the __float128 name, you could 
ask the associated mailing list about a standard suffix to associate with 
it.


Lack of standard for __float128 is always a problem. Suffix for
__float128 constant is one, scanf/printf specifier for __float128
is another. We also don't have a name for string to __float128
function.
  


While the __float128 scanf/printf specifier is part of the library (so
a custom library can provide these functions), the suffix for
constants should be covered by the compiler. Otherwise there is no
(clear) way to load a 128-bit register with a 128-bit constant value.


BTW: IA64 has the same issues with two FP types (long double XFmode and 
longer double TFmode). How is this solved for IA64?


Uros.

Re: Some thoughts about steering committee work

2007-06-18 Thread Sebastian Pop


On 6/18/07, Dorit Nuzman [EMAIL PROTECTED] wrote:

 I can hand you more than the testcases i've given so far.  There is
 tons of code out there that would  benefit from straight line

Interesting. I wasn't aware of this potential. Please do send some of this
code. thanks!



I'm thinking about loops whose bodies contain a call that is not
inlined, so the code in that function looks like straight-line code,
but in fact is called from inside a loop.  This could happen even
because the compiler decided to outline the body of some loop, as is
the case for the openMP code gen, or autoparallelization.

Sebastian
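
A small C sketch of the situation Sebastian describes: the loop body lives in a separate, non-inlined function, so seen on its own it is straight-line code with no loop to vectorize. The function names are hypothetical, and `noinline` is used only to mimic a call the compiler did not inline (or a body it outlined itself).

```c
#include <assert.h>
#include <stddef.h>

/* Straight-line body: four independent multiply-adds, no loop in sight. */
__attribute__((noinline))
static void axpy4(float *y, const float *x, float a)
{
    y[0] += a * x[0];
    y[1] += a * x[1];
    y[2] += a * x[2];
    y[3] += a * x[3];
}

/* The loop only exists in the caller. */
void axpy(float *y, const float *x, float a, size_t n)
{
    size_t i = 0;

    assert(x != NULL && y != NULL);
    for (; i + 4 <= n; i += 4)
        axpy4(y + i, x + i, a);
    for (; i < n; i++)          /* scalar tail for the last n % 4 elements */
        y[i] += a * x[i];
}
```

A loop vectorizer looking only at `axpy4` has nothing to work on, whereas a straight-line vectorizer could in principle pack its four statements into one vector operation.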

Re: Some thoughts about steering committee work


On 6/18/07, Sebastian Pop [EMAIL PROTECTED] wrote:

On 6/18/07, Dorit Nuzman [EMAIL PROTECTED] wrote:
  I can hand you more than the testcases i've given so far.  There is
  tons of code out there that would  benefit from straight line

 Interesting. I wasn't aware of this potential. Please do send some of this
 code. thanks!


I'm thinking about loops whose bodies contain a call that is not
inlined, so the code in that function looks like straight-line code,
but in fact is called from inside a loop.  This could happen even
because the compiler decided to outline the body of some loop, as is
the case for the openMP code gen, or autoparallelization.


This is, in fact, most of the code I will send to Dorit.

It is, IMHO, not reasonable, to say that we should inline everything
that may ever turn out to be vectorizable :)

If you throw virtual functions into the mix, it may not even be possible ;)

Re: Suffix for __float128 FP constants

On Mon, Jun 18, 2007 at 07:25:06PM +0200, Uros Bizjak wrote:
 H. J. Lu wrote:
 
 I was trying to load a full 128 bit constant into __float128 variable, 
 but
 with L suffix, I was able to load only XFmode constant. Is there a 
 special
 suffix for __float128 available in gcc?
   
 No; since the x86-64 ABI is what defines the __float128 name, you could 
 ask the associated mailing list about a standard suffix to associate with 
 it.
 
 Lack of standard for __float128 is always a problem. Suffix for
 __float128 constant is one, scanf/printf specifier for __float128
 is another. We also don't have a name for string to __float128
 function.
   
 
 While the __float128 scanf/printf specifier is part of the library (so
 a custom library can provide these functions), the suffix for
 constants should be covered by the compiler. Otherwise there is no
 (clear) way to load a 128-bit register with a 128-bit constant value.
 
 BTW: IA64 has the same issues with two FP types (long double XFmode and 
 longer double TFmode). How is this solved for IA64?

The same as x86-64 :-(. That is, there is __float128 in the ia64 psABI.
But it isn't fully implemented in gcc and glibc.


H.J.

New LTO branch ready


Hi guys, I have merged all patches touching lto/ into the new lto branch.

I'm almost 100% positive the result will not compile.  There are no
interesting conflicts to report (most were just formatting changes).
I have not merged ChangeLog.lto onto the new branch, since it looked
like it only contained info about changes that were outside of lto/,
and the changes to lto/ went into lto/ChangeLog.

If you want it, I'm happy to copy it.

I will perform merges from mainline to branch every week or two,
unless you guys see a good reason not to.

Re: Suffix for __float128 FP constants

2007-06-18 Thread Steve Ellcey

 BTW: IA64 has the same issues with two FP types (long double XFmode and 
 longer double TFmode). How is this solved for IA64?
 
 Uros.

This is different on IA64 HP-UX and IA64 Linux.  On HP-UX, 128 bits is
the standard long double and 80 bits is __float80.  We use the 'W'
suffix for a __float80 constant on HP-UX.  HP-UX also uses a lower case
'w' in math names for functions (e.g.  sqrtw) for __float80 functions.

Since __float128 == long double on HP-UX we can just use 'L' and 'l' for
those.

None of which helps on Linux.

Steve Ellcey
[EMAIL PROTECTED]

Re: Activate -mrecip with -ffast-math?

2007-06-18 Thread Uros Bizjak


tbp wrote:


For example, when doing 1/x and sqrt(x) via reciprocal + NR, you first
get an inf from said reciprocal, which then turns into a NaN in the NR
stage unless you correct it by, say, doing a comparison to 0 and an
'and'.
That's what ICC used to do behind your back. That's what you'll find on
page 151 of the amdfam10 optimization manual. Because that's a common case.

As far as I can see, there's no such provision in the current patch.
At the very least provide a means to look after those NaNs without
losing sanity, like a way to enforce argument order of
min/max[ss|ps|pd] without resorting to inline asm.
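
A scalar C sketch of the compare-and-'and' correction tbp mentions, under the assumption that it works by zeroing the reciprocal square root estimate whenever the input is 0. The estimate is passed in as a parameter to stand for the result of rsqrtss; the function names are made up, and the bit manipulation only imitates what a compare/and instruction pair would do.

```c
#include <assert.h>
#include <math.h>     /* INFINITY macro only; no libm function is called */
#include <stdint.h>
#include <string.h>

/* Keep X0 when A != 0, otherwise return +0.0f (all bits cleared). */
static float mask_estimate(float a, float x0)
{
    uint32_t mask = (a != 0.0f) ? UINT32_MAX : 0u;   /* compare against 0 */
    uint32_t bits;

    memcpy(&bits, &x0, sizeof bits);
    bits &= mask;                                    /* the 'and' */
    memcpy(&x0, &bits, sizeof x0);
    return x0;
}

/* sqrt(a) = 0.5 * a * x0 * (3.0 - a * x0 * x0), with x0 ~ 1/sqrt(a). */
static float nr_sqrt_step(float a, float x0)
{
    float e0 = a * x0;

    assert(a >= 0.0f);
    return 0.5f * e0 * (3.0f - e0 * x0);
}
```

Without the mask, `nr_sqrt_step(0.0f, INFINITY)` computes 0.0 * inf and yields NaN; with `mask_estimate(0.0f, INFINITY)` as the estimate, every product is 0 and the result is 0.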


But even if sqrt is corrected for 0.0 * inf, there would still be a lot 
of problems with the combinations of NR-enhanced rsqrt and rcp. Consider 
for example:


1.0/sqrt(a/b) alias rsqrt(a/b)

Having a=0, b != 0, the result is inf.

This expression is mathematically equal to sqrt(b/a) and the compiler is 
free to do this optimization. In this case, b*rcp(a) produces NaN due to 
NR of rcp(a) and here we lose.


Let's correct both, rsqrt and rcp NR steps for 0.0, so we have 
NR-rsqrt(0.0) = inf, NR-rcp(0.0) = inf.


Again, sqrt(b/a) will create sqrt(inf) = inf * rsqrt(inf), so NR step 
for rsqrt will hit (0.0 * inf) from the other side. We lose, because 
there is no correction for the case where input operand is infinity.


IMO,  due to limited range of operands for -mrecip pass (inf, -inf); 
where 0.0 is excluded, it should be kept out of -ffast-math. There is 
no point to fix reciprocals only for 0.0, we need to fix both 
conversions for infinity and 0.0, even in -ffast-math.


Uros.
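
The cases walked through above can be reproduced with a scalar C model. Assumptions, all labelled in the code: `rsqrt_estimate` is a hypothetical software stand-in for rsqrtss (a bit-trick approximation with the special cases rsqrt(0) = inf and rsqrt(inf) = 0 added by hand), and the clamp models the "limit the result just below infinity" idea from the patch as a minimum with FLT_MAX.

```c
#include <assert.h>
#include <float.h>
#include <math.h>     /* INFINITY macro only; no libm function is called */
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the hardware estimate of 1/sqrt(a), a >= 0. */
static float rsqrt_estimate(float a)
{
    uint32_t bits;
    float x;

    assert(a >= 0.0f);
    if (a == 0.0f)
        return INFINITY;                         /* estimate of 1/sqrt(0) */
    if (a == INFINITY)
        return 0.0f;                             /* estimate of 1/sqrt(inf) */
    memcpy(&bits, &a, sizeof bits);
    bits = UINT32_C(0x5f3759df) - (bits >> 1);   /* crude initial guess */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* sqrt(a) = 0.5 * a * x0 * (3.0 - a * x0 * x0); CLAMP limits x0 to FLT_MAX. */
static float nr_sqrt(float a, int clamp)
{
    float x0 = rsqrt_estimate(a);
    float e0;

    if (clamp && x0 > FLT_MAX)
        x0 = FLT_MAX;
    e0 = a * x0;
    return 0.5f * e0 * (3.0f - e0 * x0);
}
```

In this model `nr_sqrt(0.0f, 0)` is NaN, `nr_sqrt(0.0f, 1)` is 0, and `nr_sqrt(INFINITY, 1)` is still NaN, because the 0.0 * inf product now comes from the infinite operand rather than from the estimate.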

RE: Fixed-point branch?

2007-06-18 Thread Fu, Chao-Ying

Bernd Schmidt wrote:

I attached a diff file for 14 files of the new structures
  and documents.  You and other maintainers are welcome to
  check it.  Thanks a lot!
  
  Note: 14 files are =
  genmodes.c mode-classes.def machmode.def machmode.h tree.def tree.h
  tree.c rtl.def rtl.h rtl.c fixed-value.h fixed-value.c
  doc/extend.texi doc/rtl.texi doc/c-tree.texi doc/md.texi
 
 Random comments..
 
  +  unsigned saturating_flag : 1; /* FIXME.  This new flag 
 increases the size of
  +  tree_common by a full word.  */
 
 Sounds undesirable.  We need to look hard for a way to avoid this.

  Yes, we can get one of 24 spare bits for this flag.  We just fixed this
issue last week.

 
  +ACCUM_MODE (HA, 2, 8, 7); /* s8.7 */
  +ACCUM_MODE (SA, 4, 16, 15); /* s16.15 */
  +ACCUM_MODE (DA, 8, 32, 31); /* s32.31 */
  +ACCUM_MODE (TA, 16, 64, 63); /* s64.63 */
 
 Lots of predefined types and modes in this patch.  What about targets
 with other requirements (the Blackfin has 40 bit (8 + 32) 
 accumulators)?

  In bfin-modes.def, we can adjust the DA mode to (s7.32) by using
ADJUST_IBIT(DA, 7)
ADJUST_FBIT(DA, 32)
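
To make the sN.M notation concrete, here is a small C sketch that reads a raw integer as a signed fixed-point value with a given number of fractional bits, plus a saturating add on a 16-bit s8.7 value. The helper names are invented for this example and are not part of the patch; the saturating add only illustrates the kind of behavior the saturating flag refers to.

```c
#include <assert.h>
#include <stdint.h>

/* Value of RAW when the low FBITS bits are fractional: raw / 2^FBITS. */
static double fixed_to_double(int64_t raw, unsigned fbits)
{
    assert(fbits < 63);
    return (double)raw / (double)(INT64_C(1) << fbits);
}

/* Saturating add of two s8.7 values held in 16 bits (1 sign + 8 + 7). */
static int16_t s8_7_add_sat(int16_t x, int16_t y)
{
    int32_t sum = (int32_t)x + (int32_t)y;

    if (sum > INT16_MAX)
        return INT16_MAX;
    if (sum < INT16_MIN)
        return INT16_MIN;
    return (int16_t)sum;
}
```

For example, the raw value 128 in s8.7 is 1.0, and the largest s8.7 value, 32767, is 255.9921875.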

 
 For vectors, we let the targets define the supported modes.  Why do we
 want something else for fractional support?

  I am not clear about this question.  The new modes (FRACT, UFRACT, ACCUM,
and UACCUM) enable GCC to recognize the formats of the underlying values
to perform constant folding (e.g., + - * /).
  To use the DA mode for a vector, we can use:
VECTOR_MODE (ACCUM, DA, 2);

 
  +int
  +fixed_zerop (tree expr)
  +{
  +  return TREE_CODE (expr) == FIXED_CST
  + && double_int_zero_p (TREE_FIXED_CST (expr).data);
  +}
 
 Formatting - this needs parentheses.  Elsewhere too.

  Ok.

 
  +static tree
  +make_or_reuse_fract_type (unsigned size, int unsignedp, int satp)
 
 Comments before functions.

  Ok.  Thanks!

Regards,
Chao-ying

Re: Activate -mrecip with -ffast-math?

2007-06-18 Thread Brooks Moses


Giovanni Bajo wrote:
Both our goals are legitimate. But that's not the point. The point is 
what -ffast-math semantically means (the simplistic list of suboptions 
activated by it is of course insufficient because it doesn't explain how 
to behave in face of new options, like -mrecip). My proposal is:


-ffast-math activates all the mathematical-related optimizations that 
improve code speed while destroying floating point accuracy.


I don't think that's a workable proposal.  If it is taken literally, it 
means that the optimization of converting all floating-point arithmetic 
to no-ops and replacing all references to floating-point variables with 
zeros is allowed (and would be appropriate under this option).


And, personally, I don't think that documentation is of use if it can't 
be taken reasonably literally.  There's a line between what's acceptable 
and what's not, and regardless of where exactly it is, the documentation 
needs to fairly clearly indicate its location.


- Brooks

Re: Activate -mrecip with -ffast-math?

2007-06-18 Thread Bradley Lucier



On Jun 18, 2007, at 2:14 PM, Uros Bizjak wrote:



tbp wrote:

For example, when doing 1/x and sqrt(x) via reciprocal + NR, you first
get an inf from said reciprocal, which then turns into a NaN in the NR
stage unless you correct it by, say, doing a comparison to 0 and an
'and'.
That's what ICC used to do behind your back. That's what you'll find on
page 151 of the amdfam10 optimization manual. Because that's a common case.

As far as I can see, there's no such provision in the current patch.
At the very least provide a means to look after those NaNs without
losing sanity, like a way to enforce argument order of
min/max[ss|ps|pd] without resorting to inline asm.


But even if sqrt is corrected for 0.0 * inf, there would still be a  
lot of problems with the combinations of NR-enhanced rsqrt and rcp.  
Consider for example:


1.0/sqrt(a/b) alias rsqrt(a/b)

Having a=0, b != 0, the result is inf.


As already stated, -ffast-math turns on -ffinite-math-only, which  
allows the compiler to assume that a result of inf cannot happen, so  
gcc is allowed to ignore this possibility.  Producing NaN instead of  
inf seems to be allowed.


This expression is mathematically equal to sqrt(b/a) and the  
compiler is free to do this optimization. In this case, b*rcp(a)  
produces NaN due to NR of rcp(a) and here we lose.


Let's correct both, rsqrt and rcp NR steps for 0.0, so we have NR- 
rsqrt(0.0) = inf, NR-rcp(0.0) = inf.


Again, sqrt(b/a) will create sqrt(inf) = inf * rsqrt(inf), so NR  
step for rsqrt will hit (0.0 * inf) from the other side. We lose,  
because there is no correction for the case where input operand is  
infinity.


IMO,  due to limited range of operands for -mrecip pass (inf, - 
inf); where 0.0 is excluded, it should be kept out of -ffast-math.  
There is no point to fix reciprocals only for 0.0, we need to fix  
both conversions for infinity and 0.0, even in -ffast-math.


I think that tbp wants just to ensure that sqrt(0.0)=0.0 even with  
your various reciprocal and sqrt optimizations.  (I can't test the  
new code now, but I think he claims that with the new sqrt  
optimizations sqrt(0.) = NaN; if indeed it does this then I would  
consider this a bug.)  I don't think he wants the optimizations to  
have to do the right thing when an argument or result of one of  
these operations is infinite or a NaN.


Of course, he can correct me if I'm wrong.

Brad

Re: Activate -mrecip with -ffast-math?

2007-06-18 Thread Bradley Lucier



On Jun 18, 2007, at 2:27 PM, Bradley Lucier wrote:

But even if sqrt is corrected for 0.0 * inf, there would still be  
a lot of problems with the combinations of NR-enhanced rsqrt and  
rcp. Consider for example:


1.0/sqrt(a/b) alias rsqrt(a/b)

Having a=0, b != 0, the result is inf.


As already stated, -ffast-math turns on -ffinite-math-only, which  
allows the compiler to assume that a result of inf cannot happen,  
so gcc is allowed to ignore this possibility.  Producing NaN instead  
of inf seems to be allowed.


Let me restate this.

If -ffinite-math-only is specified, then producing NaN instead of inf  
should be allowed.


If -fno-finite-math-only is specified, then the generated code should  
do the right thing if an argument or result is inf or NaN.


In any case, I would consider it an error if the argument is finite,  
the result is supposed to be finite, and inf or NaN is produced.


Brad

Re: Incorrect bitfield aliasing with Tree SSA

 I am glad to see we are converging toward implementation issues now!
 
 I am storing it in a new field under the alias_set_entry:
 
   get_alias_set_entry (TYPE_ALIAS_SET (t))->nonaddr_alias_set.

Where T is which type?

Re: Incorrect bitfield aliasing with Tree SSA

 It gives you the alias set of the parent, which, for the reason that
 OTHER THINGS USE THE ALIAS SET SPLAY TREES, gives the wrong answer.

Can you give a few sentence explanation of what alias set splay trees
are and why they aren't using the alias set mechanism?

  I'm not sure what a TBAA forest is, but keep in mind that, at least in
  Ada, we have many different types (meaning different tree nodes) that have
  the same alias set and we really do mean that they are to conflict.
 
 That's nice.

But are they handled properly?

 There are other questions we ask about alias sets other than do these
 two alias sets conflict (which is asking whether they are subsets of
 each other, or equal).  We have good reasons to ask these questions.

Can you give examples of those questions?

Re: Incorrect bitfield aliasing with Tree SSA

2007-06-18 Thread Adam Nemet

Richard Kenner writes:
  I am glad to see we are converging toward implementation issues now!
  
  I am storing it in a new field under the alias_set_entry:
  
 get_alias_set_entry (TYPE_ALIAS_SET (t))->nonaddr_alias_set.
 
 Where T is which type? 

Type of the expression passed to get_alias_set.  And without the
component_uses_parent_alias_set loop.

Adam

Re: Incorrect bitfield aliasing with Tree SSA

 Type of the expression passed to get_alias_set.  And without the
 component_uses_parent_alias_set loop.

So you mean the type of the *field*?  That can't work.  That type can't
be used for *anything*!

Otherwise, if you have

struct foo {int a: 32; int b: 32; };
struct bar {int c: 32; int d: 32; };

you have the fields A and C conflicting, which is wrong.

The T has to be the *record type*, so that when you share alias sets,
it's the same for every type in the same record, not every occurrence of
some random type in different records.

Re: Incorrect bitfield aliasing with Tree SSA


On 6/18/07, Richard Kenner [EMAIL PROTECTED] wrote:

 It gives you the alias set of the parent, which, for the reason that
 OTHER THINGS USE THE ALIAS SET SPLAY TREES, gives the wrong answer.

Can you give a few sentence explanation of what alias set splay trees
are and why they aren't using the alias set mechanism?


They are the alias set mechanism, which you don't seem to understand.
They always have been.

How do you believe we determine whether two alias sets conflict, or
are related at all?



  I'm not sure what a TBAA forest is, but keep in mind that, at least in
  Ada, we have many different types (meaning different tree nodes) that have
  the same alias set and we really do mean that they are to conflict.

 That's nice.

But are they handled properly?

Yes



 There are other questions we ask about alias sets other than do these
 two alias sets conflict (which is asking whether they are subsets of
 each other, or equal).  We have good reasons to ask these questions.

Can you give examples of those questions?


I'd rather not explain all of alias.c to you in an email message, to be honest

Re: GCC Status Report (2007-06-15)

2007-06-18 Thread Mark Mitchell

H. J. Lu wrote:

 Good. I have another question. Intel BID patch itself doesn't change
 any sources in DFP nor libdecnumber. The only significant change is
 to change Makefile in libgcc to use Intel BID library for DFP
 intrinsics when BID encoding is selected.  Currently, DFP is only
 supported on Linux/PPC, which uses DPD encoding, and Linux/x86, which
 uses BID encoding. So Intel BID patch only affects Linux/x86 as
 it changes libgcc/Makefile.in to use Intel BID library. Who has
 the final say on this patch?

The build system maintainers and the x86 maintainers.

-- 
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713

Re: Incorrect bitfield aliasing with Tree SSA

 They are the alias set mechanism, which you don't seem to understand.
 They always have been.

I certainly understand the alias set mechanism.  It sounded like you were
talking about something else since if the only thing we're using is alias
sets, I'm mystified as to what the issue is.

 I'd rather not explain all of alias.c to you in an email message, to
 be honest

As I said, I completely understand alias.c.  It sounded like you were
trying to do something OUTSIDE of that.

So let's start again: why is it suddenly necessary that there be a
hierarchy of alias sets when no fields are addressable?  If I have
struct foo {int a: 1; int b: 1;};

why do we need more than one alias set?  Who is it that requires any
subsetting at all?  Certainly nothing in alias.c does.

RE: Some thoughts about steering committee work

2007-06-18 Thread Jagasia, Harsha

Hi Dorit,

loop-context when it helps you do things more efficiently. In any case,
we'll have to have a much better cost model before we start packing
random
sequences of stmts out of loops.

This is off topic from the discussion at hand, but we would be happy to
help with changing the cost model to handle this in the autovect-branch
or in mainline as you see fit.

Thanks,
Harsha

Re: Incorrect bitfield aliasing with Tree SSA


On 6/18/07, Richard Kenner [EMAIL PROTECTED] wrote:

 They are the alias set mechanism, which you don't seem to understand.
 They always have been.

I certainly understand the alias set mechanism.  It sounded like you were
talking about something else since if the only thing we're using is alias
sets, I'm mystified as to what the issue is.

 I'd rather not explain all of alias.c to you in an email message, to
 be honest

As I said, I completely understand alias.c.


You clearly do not


  It sounded like you were
trying to do something OUTSIDE of that.

So let's start again: why is it suddenly necessary that there be a
hierarchy of alias sets when no fields are addressable?  If I have
struct foo {int a: 1; int b: 1;};

why do we need more than one alias set?  Who is it that requires any
subsetting at all?  Certainly nothing in alias.c does.

I'm not going through this again, i'm just going to fix the problem.
I've wasted enough time on this.

Re: Incorrect bitfield aliasing with Tree SSA


On 6/18/07, Richard Kenner [EMAIL PROTECTED] wrote:

 Type of the expression passed to get_alias_set.  And without the
 component_uses_parent_alias_set loop.

So you mean the type of the *field*?  That can't work.  That type can't
be used for *anything*!

Otherwise, if you have

struct foo {int a: 32; int b: 32; };
struct bar {int c: 32; int d: 32; };

you have the fields A and C conflicting, which is wrong.


With the current scheme you have fields a and b conflicting,
and c and d conflicting.

Both of which are wrong.

HTH,
Dan

Re: Incorrect bitfield aliasing with Tree SSA


On 6/18/07, Richard Kenner [EMAIL PROTECTED] wrote:

  you have the fields A and C conflicting, which is wrong.

 Well, that is where structure-field aliasing comes in.  The two cannot
 even alias for addressable fields:

At tree level I'll take your word for it, but what about RTL level?
Is that nonconflicting status passed to RTL?  What *is* the problem
with just using the parent alias set?



At the RTL level, nothing.
If you ever wanted to make the RTL level do *better* than it does now,
you'd run into the same problem the tree level does.


I continue to strongly feel that the field type shouldn't be used
for ANYTHING!


Then you will continue to get worse code generation than you could, in
addition to bugs like we have now.

HTH,
Dan

The Linux binutils 2.17.50.0.17 is released

This is the beta release of binutils 2.17.50.0.17 for Linux, which is
based on binutils 2007 0615 in CVS on sourceware.org plus various
changes. It is purely for Linux.

All relevant patches in patches have been applied to the source tree.
You can take a look at patches/README to see what has been applied and
in what order they have been applied.

Starting from the 2.17.50.0.4 release, the default output section LMA
(load memory address) has changed for allocatable sections from being
equal to VMA (virtual memory address), to keeping the difference between
LMA and VMA the same as the previous output section in the same region.

For

.data.init_task : { *(.data.init_task) }

LMA of .data.init_task section is equal to its VMA with the old linker.
With the new linker, it depends on the previous output section. You
can use

.data.init_task : AT (ADDR(.data.init_task)) { *(.data.init_task) }

to ensure that LMA of .data.init_task section is always equal to its
VMA. The linker script in the older 2.6 x86-64 kernel depends on the
old behavior.  You can add AT (ADDR(section)) to force LMA of
.data.init_task section equal to its VMA. It will work with both old
and new linkers. The x86-64 kernel linker script in kernel 2.6.13 and
above is OK.

The new x86_64 assembler no longer accepts

monitor %eax,%ecx,%edx

You should use

monitor %rax,%ecx,%edx

or
monitor

which works with both old and new x86_64 assemblers. They should
generate the same opcode.

The new i386/x86_64 assemblers no longer accept instructions for moving
between a segment register and a 32bit memory location, i.e.,

movl (%eax),%ds
movl %ds,(%eax)

To generate instructions for moving between a segment register and a
16bit memory location without the 16bit operand size prefix, 0x66,

mov (%eax),%ds
mov %ds,(%eax)

should be used. It will work with both new and old assemblers. The
assembler starting from 2.16.90.0.1 will also support

movw (%eax),%ds
movw %ds,(%eax)

without the 0x66 prefix. Patches for 2.4 and 2.6 Linux kernels are
available at

http://www.kernel.org/pub/linux/devel/binutils/linux-2.4-seg-4.patch
http://www.kernel.org/pub/linux/devel/binutils/linux-2.6-seg-5.patch

The ia64 assembler is now defaulted to tune for Itanium 2 processors.
To build a kernel for Itanium 1 processors, you will need to add

ifeq ($(CONFIG_ITANIUM),y)
CFLAGS += -Wa,-mtune=itanium1
AFLAGS += -Wa,-mtune=itanium1
endif

to arch/ia64/Makefile in your kernel source tree.

Please report any bugs related to binutils 2.17.50.0.17 to [EMAIL PROTECTED]

and

http://www.sourceware.org/bugzilla/

Changes from binutils 2.17.50.0.16:

1. Update from binutils 2007 0615.
2. Preserve section alignment for copy relocation.  PR 4504.
3. Properly fix regression with objcopy --only-keep-debug.  PR 4479.
4. Fix ELF eh frame handling.  PR 4497.
5. Fix ia64 string merge.  PR 4590.
6. Don't use PE target on EFI files nor EFI target on PE files.
7. Speed up linker with many input files.
8. Support cross compiling windres.  PR 2737.
9. Fix various windres bugs.
10. Fix various arm bugs.
11. Fix various m68k bugs.
12. Fix various mips bugs.
13. Fix various ppc bugs.
14. Fix various sparc bugs.
15. Fix various spu bugs.
16. Fix various xtensa bugs.

Changes from binutils 2.17.50.0.15:

1. Update from binutils 2007 0511.
2. Fix objcopy --only-keep-debug and linker multiple BSS sections handling.
PR 4479.
3. Fix readelf -s -D for gnu hash.  PR 4476.
4. Fix ia64 linker crash with --unresolved-symbols=ignore-all. PR 4409.
5. Improve crc32 support in x86 assembler/disassembler.
6. Improve displacement handling in x86 disassembler. PR 4430.
7. Correct PC relative displacement handling in x86-64 disassembler for
Intel mode. PR 4429.
8. Fix various PPC bugs.
9. Fix various SPU bugs.
10. Fix various ARM bugs.
11. Fix various m68k bugs.
12. Fix various xtensa bugs.

Changes from binutils 2.17.50.0.14:

1. Update from binutils 2007 0418.
2. Support Intel SSE4 instructions.
3. Fix linker --fatal-warnings for --warn-shared-textrel. PR 4304.
4. Improve linker error message to identify linker script error
location. PR 4090.
5. Fix objcopy to allow removing all sections. PR 4348.
6. Don't print addresses of 32-bit targets as 64-bit values on 64bit
host. PR 4292.
7. Improve checking for corrupted input files. PR 4110.
8. Improve alpha linker performance.
9. Add a new linker option, -l:foo.
10. Fix a PPC linker bug. PR 4267.
11. Misc vxworks bug fixes.
12. Misc SH bug fixes.
13. Misc SPU bug fixes.
14. Misc ARM bug fixes.
15. Misc MIPS bug fixes.
16. Misc xtensa bug fixes.

Changes from binutils 2.17.50.0.13:

1. Update from binutils 2007 0322.
2. Fix 16byte nop padding regression in x86 assembler.
3. Fix x86-64 disassembler for xchg. PR 4218.
4. Optimize opcode for x86-64 xchg.
5. Allow register operand with x86 nop.
6. Properly handle holes between sections for PE-COFF. PR 4210.
7. Print more PE-COFF info for

Re: Incorrect bitfield aliasing with Tree SSA

  struct foo {int a: 32; int b: 32; };
  struct bar {int c: 32; int d: 32; };
 
  you have the fields A and C conflicting, which is wrong.
 
 With the current scheme you have fields a and b conflict
 and c and d conflicting
 
 Both of which are wrong

But nothing is changing that!  This is true whether or not the fields
are addressable and for all proposals given so far.

The only way to change this would be to make a new unique alias set
for each nonaddressable field in a record and mark each as a subset of
the record.  This would be optimal, but is expensive for large records
(e.g., ones with thousands of fields) and there's no good place to store
such an alias set.

However, you don't really NEED to deconflict such fields using alias sets
since there are already mechanisms at both the tree and RTL level to know
that such accesses can't conflict (being different FIELD_DECLs).

Re: Incorrect bitfield aliasing with Tree SSA

  I continue to strongly feel that the field type shouldn't be used
  for ANYTHING!
 
 Then you will continue to get worse code generation than you could, in
 addition to bugs like we have now.

Explain to me why in the following case:

struct s1 {int a;};
struct s2 {short a;};

there should be any difference.  Why should one reference something having
to do with int and the other short?  How does knowing the type of
the field here help anything?

Perhaps you are forgetting about MEM_EXPR (which I understand *very well* 
since I was the implementor of it)!

Re: Suffix for __float128 FP constants

On Mon, Jun 18, 2007 at 11:10:43AM -0700, Steve Ellcey wrote:
  BTW: IA64 has the same issues with two FP types (long double XFmode and 
  longer double TFmode). How is this solved for IA64?
  
  Uros.
 
 This is different on IA64 HP-UX and IA64 Linux.  On HP-UX, 128 bits is
 the standard long double and 80 bits is __float80.  We use the 'W'
 suffix for a __float80 constant on HP-UX.  HP-UX also uses a lower case
 'w' in math names for functions (e.g.  sqrtw) for __float80 functions.
 
 Since __float128 == long double on HP-UX we can just use 'L' and 'l' for
 those.
 

We need a standard for __float128. Otherwise, a program using
__float128 may generate different results with different
compilers on different platforms.

BTW, I had a __float128 patch for glibc. Because there is no
__float128 standard, it wasn't accepted.


H.J.

Re: Fixed-point branch?

2007-06-18 Thread Bernd Schmidt


Fu, Chao-Ying wrote:


+ACCUM_MODE (HA, 2, 8, 7); /* s8.7 */
+ACCUM_MODE (SA, 4, 16, 15); /* s16.15 */
+ACCUM_MODE (DA, 8, 32, 31); /* s32.31 */
+ACCUM_MODE (TA, 16, 64, 63); /* s64.63 */

Lots of predefined types and modes in this patch.  What about targets
with other requirements (the Blackfin has 40 bit (8 + 32) 
accumulators)?


  In bfin-modes.def, we can adjust the DA mode to (s7.32) by using
ADJUST_IBIT(DA, 7)
ADJUST_FBIT(DA, 32)


For vectors, we let the targets define the supported modes.  Why do we
want something else for fractional support?


  I am not clear about this question.  The new modes (FRACT, UFRACT, ACCUM,
and UACCUM) enable GCC to recognize the formats of the underlying values
to perform constant folding (e.g., + - * /).
  To use the DA mode for vector, we can use:
VECTOR_MODE (ACCUM, DA, 2);


No, I was trying to make an analogy of how ports explicitly define the 
modes their hardware supports, e.g. for arm:


/* Vector modes.  */
VECTOR_MODES (INT, 4);/*V4QI V2HI */
VECTOR_MODES (INT, 8);/*   V8QI V4HI V2SI */
VECTOR_MODES (INT, 16);   /* V16QI V8HI V4SI V2DI */
VECTOR_MODES (FLOAT, 8);  /*V4HF V2SF */
VECTOR_MODES (FLOAT, 16); /*   V8HF V4SF V2DF */

I'm wondering whether it's a good idea to have a lot of pre-defined 
fractional modes and types that may or may not match the target 
hardware.  Not saying it's necessarily wrong; I'm just interested to 
hear why you chose to do it this way.  (I also just noticed that things 
like SHORT_ACCUM_TYPE_SIZE are used but apparently not defined in the 
patch - does it actually compile?)
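
For readers unfamiliar with the sN.M notation in the ACCUM_MODE comments, here is a minimal sketch of what an s16.15 value (1 sign bit, 16 integral bits, 15 fractional bits) looks like when emulated in plain C. The names sa_t, sa_from_double, sa_to_double and sa_mul are invented for this illustration and are not part of the patch; the sketch does no rounding or saturation.

```c
#include <stdint.h>

/* s16.15: 1 sign bit + 16 integral bits + 15 fractional bits = 32 bits. */
#define SA_FBIT 15

typedef int32_t sa_t;   /* hypothetical name for an s16.15 value */

/* Scale by 2^15 on the way in and out (truncating, no saturation). */
sa_t sa_from_double(double x) { return (sa_t)(x * (1 << SA_FBIT)); }
double sa_to_double(sa_t v)   { return (double)v / (1 << SA_FBIT); }

/* The raw product carries 30 fractional bits, so compute it in a
   wider type and divide by 2^15 to get back to 15 fractional bits. */
sa_t sa_mul(sa_t x, sa_t y)
{
    return (sa_t)(((int64_t)x * (int64_t)y) / (1 << SA_FBIT));
}
```

For example, 1.5 is stored as 49152 (1.5 * 32768), and multiplying it by 2.0 gives the representation of 3.0.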



Bernd
--
This footer brought to you by insane German lawmakers.
Analog Devices GmbH  Wilhelm-Wagenfeld-Str. 6  80807 Muenchen
Sitz der Gesellschaft Muenchen, Registergericht Muenchen HRB 40368
Geschaeftsfuehrer Thomas Wessel, William A. Martin, Margaret Seif

Re: Suffix for __float128 FP constants

2007-06-18 Thread Steve Ellcey

 We need a standard for __float128. Otherwise, a program using
 __float128 may generate different results with different
 compilers on different platforms.
 
 BTW, I had a __float128 patch for glibc. Because there is no
 __float128 standard, it wasn't accepted.
 
 
 H.J.

The HP compiler has an option that allows it to accept 'extended' as a
type that is equivalent to __float80 and 'quad' as equivalent to 'long
double' which is __float128 on HP-UX.  For the quad type it uses the Q
suffix for quad constants (and a lower case q for quad functions like
sqrtq).  I don't think this is a standard, but it is a precedent.

Steve Ellcey
[EMAIL PROTECTED]

Re: I'm sorry, but this is unacceptable (union members and ctors)

If all you need is one member that has constructors / destructors, and
all other members are PODs that provide an alternate view of the contents,
then I think that would make a logical extension of the transparent union
extension. A transparent union is passed to functions in the same manner as
its first member. You could define that a transparent union is allowed
to have as its first member a class with constructors and/or destructors,
and that these constructors / destructors are then the constructors /
destructors of the union.

Caveat: If the union is larger or more aligned than its first member,
the argument passing semantics don't make sense. This is documented in
extend.texi:
Second, the argument is passed to the function using the calling
conventions of the first member of the transparent union, not the calling
conventions of the union itself. All members of the union must have the
same machine representation; this is necessary for this argument passing
to work properly.

There is also a syntax example for __attribute__ ((__transparent_union__))
in extend.texi.
Inside the compiler, you can check if a union is a transparent union using the
TYPE_TRANSPARENT_UNION macro.
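
As a rough sketch of how the existing extension is used in C today (this is not the proposed C++ extension, and the type and function names are made up for the example):

```c
/* Both members are pointers, so they share one machine representation,
   as the documentation quoted above requires. */
typedef union {
    int  *ip;
    long *lp;
} int_or_long_ptr __attribute__ ((__transparent_union__));

/* Hypothetical function: callers may pass an int * or a long * directly,
   and the argument is passed as if it were the first member. */
int first_int(int_or_long_ptr p)
{
    return *p.ip;
}

int call_with_plain_pointer(void)
{
    int x = 42;
    return first_int(&x);   /* no cast and no union object needed */
}
```

This compiles as C with GCC; the attribute only changes how the union behaves as a function parameter.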

If the initial union member can be an anonymous struct, and rather than
expecting further members to be POD instead their ctors/dtors are simply
ignored, then that would work for anything I can come up with.

Try contrib/gcc_update --touch after the checkout.

This suggestion made some ground. But I just can't get a build to complete.
The newest checkout / release aren't compatible with my C libraries it
seems, and I'm not sure it's safe dependency wise to just replace the C
libraries. So I rewound my subversion checkout to the same branch as is in
my debian distribution repository. That build gave up when it couldn't find
a directory called config/i386 I think it was. So I downloaded the same
major release as my distro just now (4.0.0) and this one is trying to access
the gcc/include directory with ../include from build*/liberty which
obviously should be ../../include, so it gives up. I just can't win.

--
View this message in context:
http://www.nabble.com/I%27m-sorry%2C-but-this-is-unacceptable-%28union-members-and-ctors%29-tf3930964.html#a11184663
Sent from the gcc - Dev mailing list archive at Nabble.com.

Re: Suffix for __float128 FP constants

On Mon, Jun 18, 2007 at 02:33:07PM -0700, Steve Ellcey wrote:
  We need a standard for __float128. Otherwise, a program using
  __float128 may generate different results with different
  compilers on different platforms.
  
  BTW, I had a __float128 patch for glibc. Because there is no
  __float128 standard, it wasn't accepted.
  
  
  H.J.
 
 The HP compiler has an option that allows it to accept 'extended' as a
 type that is equivalent to __float80 and 'quad' as equivalent to 'long
 double' which is __float128 on HP-UX.  For the quad type it uses the Q
 suffix for quad constants (and a lower case q for quad functions like
 sqrtq).  I don't think this is a standard, but it is a precedent.

I used `q' as suffix for __float128 functions like __isinfq/__isnanq.
But I used strtoqd since we have strtold. I like `Q' suffix in
__float128 constants.
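
To show why the suffix matters, the sketch below uses only the standard suffixes, on the assumption that a 'Q' suffix for __float128 would behave analogously; 'Q' itself is not used because it is not standard C. The function names are invented for the example.

```c
/* An unsuffixed constant has type double; 'f' and 'L' select float and
   long double.  The suffix alone decides the type of the constant. */
int suffix_selects_type(void)
{
    return sizeof(0.1f) == sizeof(float)
        && sizeof(0.1)  == sizeof(double)
        && sizeof(0.1L) == sizeof(long double);
}

/* On targets where constants are evaluated in their own type
   (FLT_EVAL_METHOD == 0), the first function returns 0.1 rounded to
   double and then widened, the second 0.1 rounded to long double. */
long double tenth_unsuffixed(void) { return 0.1;  }
long double tenth_suffixed(void)   { return 0.1L; }
```

Where long double is wider than double, the two tenth_* functions return different values, which is the kind of platform-dependent result a common suffix convention is meant to avoid.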



H.J.

Re: I'm sorry, but this is unacceptable (union members and ctors)

2007-06-18 Thread Eric Christopher



On Jun 18, 2007, at 2:36 PM, michael.a wrote:

This suggestion made some ground. But I just can't get a build to complete.
The newest checkout / release aren't compatible with my C libraries it
seems, and I'm not sure it's safe dependency wise to just replace the C
libraries. So I rewound my subversion checkout to the same branch as is in
my debian distribution repository. That build gave up when it couldn't find
a directory called config/i386 I think it was. So I downloaded the same
major release as my distro just now (4.0.0) and this one is trying to access
the gcc/include directory with ../include from build*/liberty which
obviously should be ../../include, so it gives up. I just can't win.


Sounds like you're using ./configure. Are you following the directions at:

http://gcc.gnu.org/install/configure.html

-eric

RE: Fixed-point branch?

2007-06-18 Thread Fu, Chao-Ying

Bernd Schmidt wrote:

  +ACCUM_MODE (HA, 2, 8, 7); /* s8.7 */
  +ACCUM_MODE (SA, 4, 16, 15); /* s16.15 */
  +ACCUM_MODE (DA, 8, 32, 31); /* s32.31 */
  +ACCUM_MODE (TA, 16, 64, 63); /* s64.63 */
  Lots of predefined types and modes in this patch.  What about targets
  with other requirements (the Blackfin has 40 bit (8 + 32)
  accumulators)?
  
In bfin-modes.def, we can adjust the DA mode to (s7.32) by using
  ADJUST_IBIT(DA, 7)
  ADJUST_FBIT(DA, 32)
  
  For vectors, we let the targets define the supported modes.  Why do we
  want something else for fractional support?
  
I am not clear about this question.  The new modes (FRACT, UFRACT, ACCUM,
  and UACCUM) enable GCC to recognize the formats of the underlying values
  to perform constant folding (e.g., + - * /).
To use the DA mode for vector, we can use:
  VECTOR_MODE (ACCUM, DA, 2);
 
 No, I was trying to make an analogy of how ports explicitly define the
 modes their hardware supports, e.g. for arm:
 
 /* Vector modes.  */
 VECTOR_MODES (INT, 4);/*V4QI V2HI */
 VECTOR_MODES (INT, 8);/*   V8QI V4HI V2SI */
 VECTOR_MODES (INT, 16);   /* V16QI V8HI V4SI V2DI */
 VECTOR_MODES (FLOAT, 8);  /*V4HF V2SF */
 VECTOR_MODES (FLOAT, 16); /*   V8HF V4SF V2DF */
 
 I'm wondering whether it's a good idea to have a lot of pre-defined 
 fractional modes and types that may or may not match the target 
 hardware.  Not saying it's necessarily wrong; I'm just interested to 
 hear why you chose to do it this way.  (I also just noticed that things
 like SHORT_ACCUM_TYPE_SIZE are used but apparently not defined in the
 patch - does it actually compile?)
 

  OK, I get it.  We treat fixed-point modes as first-class modes, like the
other scalar modes (integer, floating-point, etc.), so we pre-define them.
One could equally ask why other machine modes (e.g., floating-point and decimal
floating-point) are pre-defined rather than left to the targets, as vector
modes are.

  I think the default fixed-point formats are the efficient ones
for 32-bit/64-bit processors (with or without hardware support).
One of the goals for the fixed-point extension is that all targets in GCC will
enable the extension, so efficient formats may be set by default.

  We have all FRACT and ACCUM sizes in defaults.h.  Thanks!

#ifndef SHORT_FRACT_TYPE_SIZE
#define SHORT_FRACT_TYPE_SIZE BITS_PER_UNIT
#endif

#ifndef FRACT_TYPE_SIZE
#define FRACT_TYPE_SIZE (BITS_PER_UNIT * 2)
#endif

#ifndef LONG_FRACT_TYPE_SIZE
#define LONG_FRACT_TYPE_SIZE (BITS_PER_UNIT * 4)
#endif

#ifndef LONG_LONG_FRACT_TYPE_SIZE
#define LONG_LONG_FRACT_TYPE_SIZE (BITS_PER_UNIT * 8)
#endif

#ifndef SHORT_ACCUM_TYPE_SIZE
#define SHORT_ACCUM_TYPE_SIZE (SHORT_FRACT_TYPE_SIZE * 2)
#endif

#ifndef ACCUM_TYPE_SIZE
#define ACCUM_TYPE_SIZE (FRACT_TYPE_SIZE * 2)
#endif

#ifndef LONG_ACCUM_TYPE_SIZE
#define LONG_ACCUM_TYPE_SIZE (LONG_FRACT_TYPE_SIZE * 2)
#endif

#ifndef LONG_LONG_ACCUM_TYPE_SIZE
#define LONG_LONG_ACCUM_TYPE_SIZE (LONG_LONG_FRACT_TYPE_SIZE * 2)
#endif
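
As a quick cross-check of these defaults, the sketch below evaluates them for an 8-bit BITS_PER_UNIT (an assumption; targets may differ) and compares them with the widths implied by the ACCUM_MODE lines quoted earlier. Pairing SHORT_ACCUM with HA, ACCUM with SA and so on is inferred from the matching sizes, not stated in the patch, and accum_bits is a made-up helper.

```c
#define BITS_PER_UNIT 8   /* assumption: 8-bit bytes */

/* Defaults copied from the message above (without the #ifndef guards). */
#define SHORT_FRACT_TYPE_SIZE BITS_PER_UNIT
#define FRACT_TYPE_SIZE (BITS_PER_UNIT * 2)
#define LONG_FRACT_TYPE_SIZE (BITS_PER_UNIT * 4)
#define LONG_LONG_FRACT_TYPE_SIZE (BITS_PER_UNIT * 8)
#define SHORT_ACCUM_TYPE_SIZE (SHORT_FRACT_TYPE_SIZE * 2)
#define ACCUM_TYPE_SIZE (FRACT_TYPE_SIZE * 2)
#define LONG_ACCUM_TYPE_SIZE (LONG_FRACT_TYPE_SIZE * 2)
#define LONG_LONG_ACCUM_TYPE_SIZE (LONG_LONG_FRACT_TYPE_SIZE * 2)

/* Hypothetical helper: total width of a signed mode written as sI.F,
   i.e. one sign bit plus I integral bits plus F fractional bits. */
int accum_bits(int ibit, int fbit)
{
    return 1 + ibit + fbit;
}
```

With these values the accumulator sizes come out as 16, 32, 64 and 128 bits, matching s8.7, s16.15, s32.31 and s64.63, while the Blackfin adjustment to s7.32 gives the 40 bits mentioned earlier in the thread.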

Regards,
Chao-ying

Re: I'm sorry, but this is unacceptable (union members and ctors)




Eric Christopher-2 wrote:
 
 
 Sounds like you're using ./configure. Are you following the directions  
 at:
 
 http://gcc.gnu.org/install/configure.html
 
 -eric
 
 

Thank you, I guess I missed that page somehow.

Only I ran into the same Libc wall again, so I'm temporarily stumped:

/usr/bin/ld: skipping incompatible /usr/lib/../lib/libc.so when searching for -lc
/usr/bin/ld: skipping incompatible /usr/lib/../lib/libc.a when searching for -lc
/usr/bin/ld: skipping incompatible /usr/bin/../lib/libc.so when searching for -lc
/usr/bin/ld: skipping incompatible /usr/bin/../lib/libc.a when searching for -lc
/usr/bin/ld: skipping incompatible /usr/lib/libc.so when searching for -lc
/usr/bin/ld: skipping incompatible /usr/lib/libc.a when searching for -lc
/usr/bin/ld: cannot find -lc
collect2: ld returned 1 exit status


Re: I'm sorry, but this is unacceptable (union members and ctors)

2007-06-18 Thread Eric Christopher



Thank you, I guess I missed that page somehow.

Only I ran into the same Libc wall again, so I'm temporarily stumped:

/usr/bin/ld: skipping incompatible /usr/lib/../lib/libc.so when searching for -lc
/usr/bin/ld: skipping incompatible /usr/lib/../lib/libc.a when searching for -lc
/usr/bin/ld: skipping incompatible /usr/bin/../lib/libc.so when searching for -lc
/usr/bin/ld: skipping incompatible /usr/bin/../lib/libc.a when searching for -lc
/usr/bin/ld: skipping incompatible /usr/lib/libc.so when searching for -lc
/usr/bin/ld: skipping incompatible /usr/lib/libc.a when searching for -lc
/usr/bin/ld: cannot find -lc
collect2: ld returned 1 exit status


You might want to make sure you're passing the same configure options  
that the distro did when building. It might cause some incompatibility  
somewhere that ld is detecting. From a quick look it seems that ld  
believes that the libc that you have doesn't match the gcc that you're  
building. (i.e. the bfd arch is incompatible).


-eric

gcc-4.1-20070618 is now available

2007-06-18 Thread gccadmin

Snapshot gcc-4.1-20070618 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.1-20070618/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.1 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_1-branch 
revision 125829

You'll find:

gcc-4.1-20070618.tar.bz2  Complete GCC (includes all of below)

gcc-core-4.1-20070618.tar.bz2 C front end and core compiler

gcc-ada-4.1-20070618.tar.bz2  Ada front end and runtime

gcc-fortran-4.1-20070618.tar.bz2  Fortran front end and runtime

gcc-g++-4.1-20070618.tar.bz2  C++ front end and runtime

gcc-java-4.1-20070618.tar.bz2 Java front end and runtime

gcc-objc-4.1-20070618.tar.bz2 Objective-C front end and runtime

gcc-testsuite-4.1-20070618.tar.bz2The GCC testsuite

Diffs from 4.1-20070611 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.1
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.

RE: I'm sorry, but this is unacceptable (union members and ctors)

2007-06-18 Thread Dave Korn

On 18 June 2007 23:46, michael.a wrote:

 Eric Christopher-2 wrote:
 
 
 You might want to make sure you're passing the same configure options
 that the distro did when building. It might cause some incompatibility
 somewhere that ld is detecting. From a quick look it seems that ld
 believes that the libc that you have doesn't match the gcc that you're
 building. (i.e. the bfd arch is incompatible).
 
 -eric
 
 
 
 I'm sure plenty will look down their nose at me for asking in this
 forum(mailing list) ...but I recall a tool for checking libraries and build
 options (libtool libraries only maybe) ...I can't recall how or where, or
 think what to feed google (The architecture I'm building on is amd64 if that
 stirs up any ideas)

  I always need to 

export LD_LIBRARY_PATH=/usr/lib64:/usr/lib

on linux-x86_64 before I can do a build.  (Yes, I should probably set it in my
.bashrc or whatever...)

cheers,
  DaveK
-- 
Can't think of a witty .sigline today

Re: I'm sorry, but this is unacceptable (union members and ctors)




Eric Christopher-2 wrote:
 
 
 'gcc -v' will give you the information on how the system gcc was  
 configured.
 
 -eric
 
 

Here is the gcc -v output for the binaries installed by the distro:

Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v
--enable-languages=c,c++,java,f95,objc,ada,treelang --prefix=/usr
--enable-shared --with-system-zlib --libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix --enable-nls
--program-suffix=-4.0 --enable-__cxa_atexit --enable-clocale=gnu
--enable-libstdcxx-debug --enable-java-awt=gtk-default --enable-gtk-cairo
--with-java-home=/usr/lib/jvm/java-1.4.2-gcj-4.0-1.4.2.0/jre --enable-mpfr
--disable-werror --enable-checking=release x86_64-linux-gnu
Thread model: posix
gcc version 4.0.3 (Ubuntu 4.0.3-1ubuntu5)

I'm attempting to build from the 4.0.0 releases.

Re: I'm sorry, but this is unacceptable (union members and ctors)

2007-06-18 Thread Brian Dessent

michael.a wrote:

 gcc version 4.0.3 (Ubuntu 4.0.3-1ubuntu5)

This belongs on gcc-help not here.

Debian-based distros use a 32/64 bit /usr/lib configuration that is
backwards from what the rest of the world uses and requires a patched
gcc to multilib correctly.  You'll probably need to --disable-multilib
if you're building FSF gcc.

Brian

Re: I'm sorry, but this is unacceptable (union members and ctors)

Brian Dessent wrote:

michael.a wrote:

gcc version 4.0.3 (Ubuntu 4.0.3-1ubuntu5)

This belongs on gcc-help not here.

Debian-based distros use a 32/64 bit /usr/lib configuration that is
backwards from what the rest of the world uses and requires a patched
gcc to multilib correctly. You'll probably need to --disable-multilib
if you're building FSF gcc.

Brian

Yeah, I know (mailing lists are so particular -- I guess I fail to see the
value beyond a noncentralized discussion)

In any case, without multilib it makes it to here:

make[2]: Leaving directory `/home/users/michael/gcc.obj/gcc'
echo timestamp stmp-multilib
cp doc/gcc.1 doc/g++.1
cp: cannot stat `doc/gcc.1': No such file or directory
make[1]: *** [doc/g++.1] Error 1
make[1]: Leaving directory `/home/users/michael/gcc.obj/gcc'
make: *** [all-gcc] Error 2

Not sure exactly what is going on here. The gcc/doc directory is empty. I'm
assuming everything made it through. There are about a billion targets in
the Makefile and no explanatory header. Any suggestions for just building
the essentials?

I've only really recently taken on serious linux development and haven't
much actual build experience outside the usual routine. I have autotools
under my belt, but I glossed over most of the auxiliary stuff.


Re: Activate -mrecip with -ffast-math?


On 6/18/07, Uros Bizjak [EMAIL PROTECTED] wrote:

IMO, due to limited range of operands for -mrecip pass (inf, -inf);
where 0.0 is excluded, it should be kept out of -ffast-math. There is
no point to fix reciprocals only for 0.0, we need to fix both
conversions for infinity and 0.0, even in -ffast-math.

Indeed there are holes in every direction when you pull in such a
transformation, and the cost of plugging every one of them would be
prohibitive; the next batch of c2d supposedly will leave you with ~6
cycles to make it worthwhile for a sqrt.
Of course it only gets worse when you start composing.

My point merely was that, considering one operation, you'd introduce
NaN for a not so special value (0) which, in a *fast* math scenario,
could be produced at any previous stage due to denormal clamping, with
no sane way to take care of it.
Again, if you look at prior art (icc, AMD's manual...), that's the
only special case they covered.
Admittedly that's a trade off but not that unreasonable.
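
The zero case can be reproduced in plain C. The sketch below applies the Newton-Raphson step from the patch comment earlier in the thread, sqrt(a) = 0.5 * a * x0 * (3.0 - a * x0 * x0), to an estimate x0 of 1/sqrt(a) that the caller supplies (standing in for rsqrtss, which returns +inf for +0). The clamped variant mirrors the idea of limiting the estimate to just below infinity; the function names are invented here, and this is an illustration, not the code the backend emits.

```c
#include <float.h>
#include <math.h>

/* One Newton-Raphson step for sqrt(a), given an estimate x0 of 1/sqrt(a). */
float nr_sqrt(float a, float x0)
{
    float e0 = a * x0;        /* 0 * inf yields NaN when a == 0 */
    float e1 = e0 * x0;
    return 0.5f * e0 * (3.0f - e1);
}

/* Same step, with the estimate limited to the largest finite float,
   so that 0 * x0 stays 0 instead of becoming NaN. */
float nr_sqrt_clamped(float a, float x0)
{
    if (x0 > FLT_MAX)
        x0 = FLT_MAX;
    return nr_sqrt(a, x0);
}

/* NaN is the only value that compares unequal to itself. */
int is_nan(float x)
{
    return x != x;
}
```

With a = 4 and x0 = 0.5 the step returns 2. With a = 0 and x0 = INFINITY the unclamped step returns NaN, while the clamped one returns 0. This assumes IEEE semantics, so it must not be built with -ffast-math itself.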

Now, an option to remove such transformations from -ffast-math
bag-o-tricks would be fine and would still buy gcc some Spec bragging
rights :)

virtual stack regs.

2007-06-18 Thread Kenneth Zadeck

I would like to get some more information about pr32374.

I do not know what virtual_stack_vars are and there is no documentation
in the doc directory.

1) What are these?

2) Why are they uninitialized?

3) If they really are uninitialized, why is it a problem to assign zero
to them?

4) If they are not uninitialized, where is the initialization code? Why
does df not see it?

5) How can I tell if a reg is a virtual_stack_reg?

Re: I'm sorry, but this is unacceptable (union members and ctors)

2007-06-18 Thread Daniel Jacobowitz

On Mon, Jun 18, 2007 at 04:57:46PM -0700, michael.a wrote:
 Yeah, I know (mailing lists are so particular -- I guess I fail to see the
 value beyond a noncentralized discussion)

But since I believe three different people have asked you to move this
problem to a different mailing list now, could you please do so?  Thanks.

-- 
Daniel Jacobowitz
CodeSourcery

Object attribute tagging

2007-06-18 Thread Joseph S. Myers

The question was raised a while back on the gcc-patches and gdb-patches 
lists of how GCC should tag objects with some ABI information for the use 
of GDB, noting that various different methods have been in use 
http://gcc.gnu.org/ml/gcc-patches/2006-05/msg00395.html.

Mark suggested http://gcc.gnu.org/ml/gcc-patches/2006-05/msg00854.html 
that we use the ARM EABI object attribute mechanism; there were no 
objections to this at that time.  This provides for both 
processor-specific and vendor-specific tags, and for tags at the level of 
object files, sections or individual functions (although binutils only 
really supports tags at the object file level at present).  Tags can be 
merged from input files (with compatibility and merging rules defined) to 
produce the tag in an output file; the linker can give a warning or error 
for incompatible tags (e.g. object files using different ABI variants).

I now propose implementing this to mark MIPS and Power objects with 
whether they are using the hard-float or soft-float ABI, both so the 
linker can complain if users accidentally link incompatible objects 
together, and so GDB knows the ABI in use by a binary.  (It's desirable to 
know the ABI even in the absence of debug information, e.g. to call 
functions in libc, so DW_AT_calling_convention doesn't seem a sufficient 
alternative.)
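
To make the merging and compatibility idea concrete, here is a minimal sketch of how a linker might combine a floating-point ABI tag from two input objects. The enum values, the "any" value for objects that do not depend on the float ABI, and the function name are all hypothetical; the proposal above does not define tag numbers or merging rules at this level of detail.

```c
/* Hypothetical tag values; none of these are assigned by the proposal. */
enum fp_abi {
    FP_ABI_ANY  = 0,   /* assumed: object does not depend on the float ABI */
    FP_ABI_HARD = 1,
    FP_ABI_SOFT = 2
};

/* Merge the tags of two input objects into the output tag.
   Returns 0 on success, or -1 if the inputs are incompatible
   (the point where a linker would warn or give an error). */
int merge_fp_abi(enum fp_abi in1, enum fp_abi in2, enum fp_abi *out)
{
    if (in1 == FP_ABI_ANY) {
        *out = in2;
        return 0;
    }
    if (in2 == FP_ABI_ANY || in1 == in2) {
        *out = in1;
        return 0;
    }
    return -1;   /* hard-float object linked with soft-float object */
}
```

A debugger could then read the merged tag from the output file to pick the calling convention even when no debug information is present.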

The ARM EABI uses a .ARM.attributes section of type SHT_ARM_ATTRIBUTES.  
For platforms where there isn't such a specification for that processor, I 
propose .GNU.attributes and SHT_GNU_ATTRIBUTES, and an assembler directive 
.gnu_attribute in place of .eabi_attribute.  This would generate entries 
under the gnu vendor (whereas .eabi_attribute uses the standard 
aeabi); if more processor ABI specifications pick up the attributes 
specification then we could switch to appropriate processor-specific 
sections.  On ARM, both .gnu_attribute and .eabi_attribute could be used, 
and would both generate entries in .ARM.attributes, under the gnu and 
aeabi vendors respectively.  Appropriate parts of the ARM binutils code 
would be made available to all ELF binutils targets.

The ARM EABI says that only standard entries under aeabi should affect 
link-compatibility of object files, not vendor entries such as gnu, but 
in the absence of corresponding standards for other processors I don't 
think we can avoid use of gnu for link-compatibility on non-ARM 
processors for now - if processor ABIs standardize things in future we can 
deprecate the associated gnu attributes.

Additional object tagging may be of use in future with LTO, to mark objects
with information about command-line options used where such options are 
relevant to code generation but not recorded directly in the IR (e.g., 
target-specific options selecting CPU features that may be used or 
built-in functions that are enabled).  We can allocate such tags in future 
as and when needed.  I propose to establish some convention for which 
gnu attributes are target-dependent and which are target-independent.

Any comments on either the general approach or the details?

-- 
Joseph S. Myers
[EMAIL PROTECTED]

Re: I'm sorry, but this is unacceptable (union members and ctors)

Daniel Jacobowitz-2 wrote:

On Mon, Jun 18, 2007 at 04:57:46PM -0700, michael.a wrote:
Yeah, I know (mailing lists are so particular -- I guess I fail to see the
value beyond a noncentralized discussion)

But since I believe three different people have asked you to move this
problem to a different mailing list now, could you please do so? Thanks.

--
Daniel Jacobowitz
CodeSourcery

I'm sorry, it just occurred to me that gcc-help was another forum in this
Nabble interface (I'm not really sure how everything is related -- but
mailing list subscriptions drive me crazy, so I was reticent to deal with
another)

Just for the record...

michael.a wrote:

In any case, without multilib it makes it to here:

Not sure exactly what is going on here. The gcc/doc directory is empty.
I'm assuming everything made it through. There are about a billion targets
in the Makefile and no explanatory header. Any suggestions for just
building the essentials?

This hack http://gcc.gnu.org/ml/gcc-bugs/2005-04/msg03614.html seemed to get
through that bug (so many pitfalls)

Since I'm already posting, now I'm seeing:

/home/users/michael/gcc.obj/gcc/f951: symbol lookup error:
/home/users/michael/gcc.obj/gcc/f951: undefined symbol:
__gmp_get_memory_functions

I installed the latest GMP libraries earlier, so I'm not really sure what to
think, unless the libraries aren't backwards compatible. I will mention it
in gcc-help tomorrow, unless I hear something.

I hope this conversation isn't otherwise dead at this point.

sincerely,

michael

Re: Object attribute tagging

On Tue, Jun 19, 2007 at 01:50:27AM +, Joseph S. Myers wrote:
 The question was raised a while back on the gcc-patches and gdb-patches 
 lists of how GCC should tag objects with some ABI information for the use 
 of GDB, noting that various different methods have been in use 
 http://gcc.gnu.org/ml/gcc-patches/2006-05/msg00395.html.
 
 Mark suggested http://gcc.gnu.org/ml/gcc-patches/2006-05/msg00854.html 
 that we use the ARM EABI object attribute mechanism; there were no 
 objections to this at that time.  This provides for both 
 processor-specific and vendor-specific tags, and for tags at the level of 
 object files, sections or individual functions (although binutils only 
 really supports tags at the object file level at present).  Tags can be 
 merged from input files (with compatibility and merging rules defined) to 
 produce the tag in an output file; the linker can give a warning or error 
 for incompatible tags (e.g. object files using different ABI variants).
 
 I now propose implementing this to mark MIPS and Power objects with 
 whether they are using the hard-float or soft-float ABI, both so the 
 linker can complain if users accidentally link incompatible objects 
 together, and so GDB knows the ABI in use by a binary.  (It's desirable to 
 know the ABI even in the absence of debug information, e.g. to call 
 functions in libc, so DW_AT_calling_convention doesn't seem a sufficient 
 alternative.)
 
 The ARM EABI uses a .ARM.attributes section of type SHT_ARM_ATTRIBUTES.  
 For platforms where there isn't such a specification for that processor, I 
 propose .GNU.attributes and SHT_GNU_ATTRIBUTES, and an assembler directive 
 .gnu_attribute in place of .eabi_attribute.  This would generate entries 
 under the gnu vendor (whereas .eabi_attribute uses the standard 
 aeabi); if more processor ABI specifications pick up the attributes 
 specification then we could switch to appropriate processor-specific 
 sections.  On ARM, both .gnu_attribute and .eabi_attribute could be used, 
 and would both generate entries in .ARM.attributes, under the gnu and 
 aeabi vendors respectively.  Appropriate parts of the ARM binutils code 
 would be made available to all ELF binutils targets.
 
 The ARM EABI says that only standard entries under aeabi should affect 
 link-compatibility of object files, not vendor entries such as gnu, but 
 in the absence of corresponding standards for other processors I don't 
 think we can avoid use of gnu for link-compatibility on non-ARM 
 processors for now - if processor ABIs standardize things in future we can 
 deprecate the associated gnu attributes.
 
 Additional object tagging may be of use in future with LTO, to mark objects
 with information about command-line options used where such options are 
 relevant to code generation but not recorded directly in the IR (e.g., 
 target-specific options selecting CPU features that may be used or 
 built-in functions that are enabled).  We can allocate such tags in future 
 as and when needed.  I propose to establish some convention for which 
 gnu attributes are target-dependent and which are target-independent.
 
 Any comments on either the general approach or the details?
 

I like this initiative. For x86, we currently have no way for an
object or shared library to indicate:

1. Different parameter passing schemes: on stack vs. in registers. It
could be even per function based.
2. Different alignment requirements. -malign-double.
3. Different long double. -m128bit-long-double vs. -m96bit-long-double.
4. Different ISAs, x87, SSE, SSE2, 
5. Different fpmath. x87 vs. SSE.
6. Different x86-64 models.
7. With or without x86-64 red zone.
8. Different x86-64 ABIs. ELF vs. Win64.
9. Different ia32 stack alignment requirements. psABI only requires
   4 byte alignment and gcc wants 16.

It will be nice to address those issues in a general and
extensible way.


H.J.

Re: Activate -mrecip with -ffast-math?

2007-06-18 Thread Mark Mitchell

Bradley Lucier wrote:

 If -ffinite-math-only is specified, then producing NaN instead of inf
 should be allowed.

Agreed.  After all, -ffinite-math-only says:

 Allow optimizations for floating-point arithmetic that assume that
 arguments and results are not NaNs or +-Infs.

Since the compiler can assume the output isn't a NaN or an Inf, it can
freely switch one and the other.

 If -fno-finite-math-only is specified, then the generated code should
 do the right thing if an argument or result is inf or NaN.

Also agreed.

-- 
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713

Re: GCC Status Report (2007-06-15)

2007-06-18 Thread Ben Elliston

On Sat, 2007-06-16 at 06:17 -0700, H. J. Lu wrote:

 BTW, an x86 DFP configure bug was reported 3 months ago.  But it still
 hasn't been fixed. I opened a DFP bug report:
 
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32351
 
 with a patch. I hope it will be fixed before gcc 4.3 is released :-).

Sorry about the delay.  Yes, I assure you it will be fixed by then and
perhaps by the end of this week .. :-)

Cheers, Ben

Re: Help in understanding ccp propagator

On Sun, 17 Jun 2007, Revital1 Eres wrote:

 Hello,
 
 I have one more question regarding the comment in
 tree-ssa-ccp.c file -
 
 
   /* Note that for propagation purposes, we are only interested in
  visiting statements that load the exact same memory reference
  stored here.  Those statements will have the exact same list
  of virtual uses, so it is enough to set the output of this
  statement to be its first virtual definition.  */
   *output_p = first_vdef (stmt);
 
 I wonder if this comment is true also if the vuses are not immediate as
 in stmt no. 1 in the following example:
 
  1) arr[i].x = tmp1;
  ...
  2) arr[i].y = tmp2;
  ...
  3) reg1 = arr[i].x;
  ...
  4) arr[i].z = tmp2;
  ...
  5) reg2 = arr[i].x;
 
 Is it because we are looking for the exact same memory reference (although
 not immediate) it is enough to look at only first_vdef of every store we
 encounter in our walk through the virtual def-use chain; or by looking
 only at the first vdef we could miss vuses that could have been reached
 by vdefs other than the first one?

Well, in the current code we do not walk virtual use-def chains, but make
sure all virtual operands are the same.  So the above works.  If you
walk the chains you need to make sure that all virtual operands of the
final def you are using are the same - then the above will work again.

Richard.
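
For readers following along, the five statements from the question can be written out as a small C function. This is only a source-level sketch: struct elem and loads_after_stores are hypothetical names, and nothing here shows what tree-ssa-ccp.c does internally. It merely fixes the shape of the code, in which no statement between 1) and the loads in 3) and 5) writes arr[i].x, so both loads observe tmp1.

```c
#include <assert.h>

/* Hypothetical element type with the three fields used in the example.  */
struct elem { int x, y, z; };

static int loads_after_stores (struct elem *arr, int i, int tmp1, int tmp2)
{
  arr[i].x = tmp1;        /* 1) store to arr[i].x */
  arr[i].y = tmp2;        /* 2) store to a different field */
  int reg1 = arr[i].x;    /* 3) load of the location stored in 1) */
  arr[i].z = tmp2;        /* 4) another store to a different field */
  int reg2 = arr[i].x;    /* 5) second load of the same location */
  return reg1 + reg2;     /* both loads yield tmp1 */
}

/* Small self-check: the result is always 2 * tmp1.  */
static void loads_after_stores_selfcheck (void)
{
  struct elem a[2] = { { 0, 0, 0 }, { 0, 0, 0 } };
  assert (loads_after_stores (a, 1, 5, 7) == 10);
}
```

Whether a given pass actually replaces reg1 and reg2 by tmp1 depends on the virtual operands of the statements involved, as described in the reply above.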

Re: PR other/32351 [Was: Re: GCC Status Report (2007-06-15)]

2007-06-18 Thread Paolo Bonzini




It is a libdecnumber bug, which only affects x86.


The patch is ok.

Paolo

Re: Resuming SPEC performance tracking at RedHat

2007-06-18 Thread Michael Matz

Hi,

On Fri, 15 Jun 2007, Richard Guenther wrote:

 so, no PPC testing from us (the old testing machine died and we don't 
 have a replacement for it).

Actually it's back, and just needs to be partitioned and set up.


Ciao,
Michael.

Re: Activate -mrecip with -ffast-math?


On 6/17/07, Uros Bizjak [EMAIL PROTECTED] wrote:

Hello!

 I was wondering if there are objections to automatically activating Uros'
 new -mrecip flag when -ffast-math is specified. It looks like a good
 match since -mrecip is exactly about fast non-precise mathematics.

There is a discussion on the gcc-patches@ mailing list about this topic, in the
"Re: [PATCH, middle-end, i386]: reciprocal rsqrt pass + full recip x86
backend support" thread [1]. The main problem is that one of the
polyhedron tests segfaults with this patch (not a problem of the recip
patch itself, but of questionable FP equivalence tests and FP indexes into
an array in the test).


Of course there are cases with every optimization enabled by -ffast-math that
can break existing programs.  Just that we know of one case beforehand shouldn't
prevent us from enabling -mrecip at -ffast-math (provided -mno-recip still
works, regardless if provided before or after -ffast-math).  [We'll at least
get some more testing coverage this way]

Richard.

Re: Some thoughts about steerring commitee work

 Dorit Nuzman wrote:
  H. J. Lu wrote:
 
  Why don't we turn on vectorizer at -O3 or even -O2, depending on
  ISA? I added -ftree-vectorize to BOOT_CFLAGS on x86-64. According to
  -ftree-vectorizer-verbose=1, there are 82 loops vectorized in
  gcc source. There are no regressions. There are not much changes
  in bootstrap time as well as make check time.
  We have about two dozen cases of packages that break when
  -ftree-vectorize is used.  I'm sure there are several more as we tend to
  discourage such bug reports.
 
  If you could take the time to find the reduced testcases and file PRs for
  these, that would be most appreciated.

 I believe the majority of them can be traced back to PR 25413.  For
 example building zlib with -O2 -march=pentium4 -ftree-vectorize will
 cause several apps that link to it (firefox, openoffice, poppler, etc.)
 to segfault.  The vectorizer generates movdqa instructions with datarefs
 that are not aligned on a 16 byte boundary.
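
To make the alignment requirement concrete: movdqa needs its memory operand on a 16-byte boundary. The C sketch below is only an illustration of that constraint and of the usual "peel a few scalar iterations first" idea. It is not the patch mentioned in the PR; is_aligned_16 and elems_to_peel are hypothetical helper names, and the pointer is assumed to be aligned to the element size already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if p sits on a 16-byte boundary, as an aligned 16-byte
   load or store such as movdqa requires.  */
static int is_aligned_16 (const void *p)
{
  return ((uintptr_t) p & 15u) == 0;
}

/* Hypothetical helper: number of leading elements to process in scalar
   code before p reaches a 16-byte boundary.  Assumes p is already a
   multiple of elem_size.  */
static size_t elems_to_peel (const void *p, size_t elem_size)
{
  size_t misalign = (uintptr_t) p & 15u;
  return misalign ? (16 - misalign) / elem_size : 0;
}

/* Small self-check on a buffer whose alignment is known.  */
static void alignment_selfcheck (void)
{
  _Alignas (16) uint32_t buf[8] = { 0 };
  assert (is_aligned_16 (buf));
  assert (!is_aligned_16 (buf + 1));
  assert (elems_to_peel (buf + 1, sizeof buf[0]) == 3);
}
```

A vectorized loop that uses aligned accesses on a pointer like buf + 1 without such a check or peel is the kind of code that faults at run time.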


there is an old patch floating around by Devang to address this problem (as
mentioned in the PR). we should push this forward, it's really a simple
fix. I'll try to get to it soonish

 Other than that, I went through the rest of our -ftree-vectorize bugs
 this morning and found that many of them have been fixed in 4.2, so the
 situation is much better than I originally thought.


cool!

thanks,
dorit


 --
 dirtyepic salesman said this vacuum's guaranteed
  gentoo org  it could suck an ancient virus from the sea
   9B81 6C9F E791 83BB 3AB3  5B2D E625 A073 8379 37E8 (0x837937E8)

Re: Activate -mrecip with -ffast-math?


On 6/18/07, Richard Guenther [EMAIL PROTECTED] wrote:

Of course there are cases with every optimization enabled by -ffast-math that
can break existing programs.  Just that we know of one case beforehand shouldn't
prevent us from enabling -mrecip at -ffast-math (provided -mno-recip still
works, regardless if provided before or after -ffast-math).  [We'll at least
get some more testing coverage this way]

Argh! Please do not make -ffast-math even more of a pain to work with
than it is already.
You have to enable it, on the whole compilation unit, to get anywhere
near decent performance; there's no escape: either you do not turn it
on and everything slows to a crawl, or you pay for not being able to
inline from another unit.

Until now, the contract was: you have to deal with (and contain) NaN
and infinities. Fair enough, even if tricky that remained manageable.
But if i can't expect a mere division by 0, or sqrt of 0 (quite common
with FTZ/DAZ on) to give me respectively an infinite and 0 and instead
get a NaN (which i can't filter, you remember?) because of the NR
round, that's pure madness.

So please, for the love of everything's sacred, leave such stunts out
of  -ffast-math.

PS: and it's not like such reciprocals + NR couldn't be done with
intrinsics or easily handle such common case.

Re: Some thoughts about steerring commitee work

Daniel Berlin [EMAIL PROTECTED] wrote on 17/06/2007 18:18:19:

...

 The whole purpose of SLP was to enable straight line code
 vectorization outside of loops.

I wouldn't say that's the whole purpose of SLP. I think the purpose and
beauty of SLP is that it's a simple algorithm that makes vectorization
(including vectorization of loops) easier by removing the need to prove all
kinds of properties about the loop as a whole, as well as the need to
transform loops to make them vectorizable. The fact that this scheme also
works out of loops is a neat property because it makes loop-vectorization a
special case of SLP. However, as far as I know (also from talking with the
SLP authors) pretty much all the opportunities they had found at the time
were in loops. Also, a lot of the SLP based work that followed it focused
on loops, and analysis to determine by how much to unroll loops in order to
accommodate SLP. So, in reality, as always, there are no free meals - you
often really have to compensate for the simple loop-ignorant SLP analysis
by doing a lot of loop-level analysis and transformations beforehand.

While it cannot replace the classic SLP algorithm out-of-loops (e.g. for
completely unrolled loops), doing SLP in loops actually makes a lot of
sense, IMHO. It lets us leverage already existing infrastructure (the slp
patch recently committed to autovect-branch is really not big), and opens
up a lot of opportunities we couldn't vectorize before (partially unrolled
loops, partially vectorizable loops, accesses to consecutive struct fields,
and in the future also permutations), while taking advantage of the
loop-context when it helps you do things more efficiently. In any case,
we'll have to have a much better cost model before we start packing random
sequences of stmts out of loops.

 Simply because you can't find cases in SPEC2000 doesn't mean it's not useful.

I don't know where you're taking this from. SPEC2000 is really not so
interesting vectorization wise, inside or outside loops.

dorit

Re: missed vectorization (was Some thoughts about steerring commitee work)

Tim Prince [EMAIL PROTECTED] wrote on 17/06/2007 19:47:10:

 [EMAIL PROTECTED] wrote:
  Tim Prince [EMAIL PROTECTED] wrote on 17/06/2007 04:15:56:
 
  [EMAIL PROTECTED] wrote:
  On Sat, Jun 16, 2007 at 06:54:46PM +0300, Dorit Nuzman wrote:
  There are quite a few known simple cases which vectorizer fails to
  vectorize.
  by known you mean there are open missed-optimization PRs for them?
  (if
  Yes, that is what I meant.
 
  I'd be happy to file some PRs along this line, if there is interest.  C
 
  yes, there is
 
  or C++, if there's more interest in that than in Fortran.  But, gfortran
  fails to vectorize more than 50% of the stuff I run into every day,
  including most everything which involves distinct sections of the same
  array or COMMON block.
 
  I thought there was already a PR opened for this issue (probably by Toon),
  but I can't find it :-(
 
  thanks,
  dorit
 
 There are several issues.  EQUIVALENCE produces such a problem (PR32373)
 as do various kinds of references to multiple sections of the same array
 (PR32375,32376,32377,32378,32379,32380).  Only 2 of those PRs involve
 actual source/destination overlap, where the vectorizer would have to
 choose the correct direction (loop reversed or not).
 In the bigger case (PR32380) there are loops which vectorize in
 isolation but not in the presence of other loops.


thanks for taking the time to extract the testcases and open the PRs. I
guess the discussion can continue in bugzilla now...

 There are existing PRs on a somewhat similar issue involving type
 casting in C. IMHO, not vectorizing those might seem excusable.


I think we should teach the vectorizer to handle those as well (another
issue I've been wanting to get to for a while...)

thanks,
dorit

 Thanks,
 Tim

Re: Activate -mrecip with -ffast-math?


On 6/18/07, tbp [EMAIL PROTECTED] wrote:

On 6/18/07, Richard Guenther [EMAIL PROTECTED] wrote:
 Of course there are cases with every optimization enabled by -ffast-math that
 can break existing programs.  Just that we know of one case beforehand shouldn't
 prevent us from enabling -mrecip at -ffast-math (provided -mno-recip still
 works, regardless if provided before or after -ffast-math).  [We'll at least
 get some more testing coverage this way]
Argh! Please do not make -ffast-math even more of a pain to work with
than it is already.
You have to enable it, on the whole compilation unit, to get anywhere
near decent performance; there's no escape: either you do not turn it
on and everything slows to a crawl, or you pay for not being able to
inline from another unit.

Until now, the contract was: you have to deal with (and contain) NaN
and infinities. Fair enough, even if tricky that remained manageable.


No, that's not the contract with -ffast-math.  Note that -ffast-math
enables -funsafe-math-optimizations which is allowed to change results
(add/remove rounding operations, contract expressions, do transforms
like a/b to a * 1/b, do transformations that get you bigger errors than
0.5ulp, etc.)
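
As a small illustration of the a/b to a * 1/b point, the C sketch below compares the two forms by hand. The function names are made up for the example, and it assumes IEEE 754 doubles evaluated without excess precision and compiled without -ffast-math (otherwise the compiler may turn the first form into the second itself).

```c
#include <assert.h>

/* a / b as written in the source: a single correctly rounded division.  */
static double div_direct (double a, double b)
{
  return a / b;
}

/* The rewritten form a * (1/b): the reciprocal is rounded once and the
   product is rounded again, so the result can differ in the last bit.  */
static double div_via_recip (double a, double b)
{
  double r = 1.0 / b;
  return a * r;
}

/* Small self-check: one case where both agree, one where they do not.  */
static void div_selfcheck (void)
{
  assert (div_direct (4.0, 2.0) == div_via_recip (4.0, 2.0));
  assert (div_direct (3.0, 10.0) != div_via_recip (3.0, 10.0));
}
```

For 3.0 and 10.0 the direct division gives the double nearest to 0.3, while the reciprocal form lands one ulp above it.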


But if i can't expect a mere division by 0, or sqrt of 0 (quite common
with FTZ/DAZ on) to give me respectively an infinite and 0 and instead
get a NaN (which i can't filter, you remember?) because of the NR
round, that's pure madness.


Hm, which particular case are you concerned about (maybe it was mentioned,
but I don't remember the details)?  Note that -ffast-math enables
-ffinite-math-only as well, so the compiler assumes nothing will result in
NaNs or Infs.


So please, for the love of everything's sacred, leave such stunts out
of  -ffast-math.


Well - certainly another reason for the Math BOF ;)  We all expect very
different things from -ffast-math or -funsafe-math-optimizations.


PS: and it's not like such reciprocals + NR couldn't be done with
intrinsics or easily handle such common case.


Well, most optimization challenges can be solved if we are allowed to
touch the source ;)

Thanks,
Richard.

[M16C] : 20 bit data access

2007-06-18 Thread Naveen H.S.

Hi,

We have come up with two possible solutions to solve the 20 bit data
access problem in m16c targets.
We are very grateful for all the suggestions until now on this issue.

Solution 1 is based on the discussion at the following link:
http://gcc.gnu.org/ml/gcc/2007-04/msg00402.html

- 2 new attributes far_data (to use external memory for data
storage) and far_rodata will be added.
- Non-constant initialized variables specified with the attribute 
far_data will be placed in the section .fdata (far memory).
- Constant variables specified with the attribute far_rodata will
be placed in the section .frodata (far memory). 
- The default linker script will be modified to add two new
sections, .fdata and .frodata.
- LDE/STE instructions will be used to access the variables specified
with the attribute far_data and far_rodata.
- Default constant strings (ex. strings in printf) and constant 
variable without the attribute far_rodata will be placed in Section 
'.rodata' (current implementation).
- The section '.rodata' still has to be copied from ROM to RAM
(current implementation for M16C devices that do not have Flash in
near memory).

Solution 2 is based on the discussion at the following link:
http://sources.redhat.com/ml/binutils/2007-05/msg00381.html

- By default, LDE instructions will be used to access all constant
variables.
- A new target specific option -mno-far-constdata will be added.
- This option can be used to override default generation of LDE 
instructions. 'MOV' instruction will be used to access these variables
instead (current implementation).
- New attribute far_data (to use external memory for data storage) 
will be added.
- Non-constant initialized variables specified with the attribute
far_data will be placed in a section .fdata (far memory). 
- LDE/STE instructions will be used to access the non-constant 
variables specified with the attribute far_data.
- New attribute near_rodata will be added. This attribute will be
used for the latest M16C targets that have 4K/8K flash in near Memory.
- Constant variables specified with the attribute near_rodata will 
be placed in a section .nrodata (near memory). 
- MOV instructions will be used to access the constant variables 
specified with the attribute near_rodata.
- Default linker script will be modified for placing the default 
section '.rodata' in Far Memory.
- The default linker script will be modified to add a new
section '.fdata' in far memory and '.nrodata' in near memory.

Please comment on above proposed solutions and also let us know the
possibility of acceptance of any of these by FSF.

Regards,
Naveen.H.S.
KPIT Cummins Infosystems Ltd,
Pune (INDIA) 


Re: More vectorizer testcases?

Giovanni Bajo [EMAIL PROTECTED] wrote on 17/06/2007 20:43:15:

 Hi Dorit,

 some years ago I posted these testcases to Bugzilla's GCC:
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18437
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18438
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18439

 It looks like none of those are vectorized as of GCC 4.3. I read today that
 you're asking for more vectorizer testcases, so I was wondering:

 1) Shall we add a GCC bugzilla component for the vectorizer? Currently the
 bugs are filed under tree-optimization, which might be a little too generic
 these days.


Maybe, I'm not sure. A lot of times the vectorizer missed-optimization bugs
depend on other components of the compiler, so I don't know if filing them
under a vectorizer component would help. I don't have a strong preference
about this.

 2) Do you need more testcases from geometric code like those above? Those 3
 above are pretty simple in fact, so I doubt more complex ones can be of help,
 but I can extract something more from my code if you want...

these 3 are actually not so simple... the main thing that's blocking 2 of
them right now is that they need support for stores with gaps. That can be
added, but the other problem is that the vectorizer thinks it's not
profitable to vectorize them (or rather 2 of them; so does ICC, by the way).
Since the time you opened these PRs we came quite a bit closer to
vectorizing these (the support for interleaved accesses and for multiple
data-types were also required). It will be fun to add the last missing bit
- the support for the stores-with-gaps. I hope we'll get to it before too
long...
If you have other (hot) code examples that expose different missing
features I think that's always interesting to know about (but if it's like
the codes above then maybe it will not have much added value...).
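
For anyone unfamiliar with the term, a "store with gaps" is a strided group of stores in which not every member of the group is written. The C loop below is a generic illustration only: it is not taken from PRs 18437-18439, and struct vec4 and scale_xy are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 4-component vector, as is common in geometric code.  */
struct vec4 { float x, y, z, w; };

/* Scales only x and y of each element.  Per iteration the loop stores two
   consecutive floats and then skips two (z and w), so the stores form an
   interleaved group with a gap.  */
static void scale_xy (struct vec4 *v, size_t n, float s)
{
  for (size_t i = 0; i < n; i++)
    {
      v[i].x *= s;
      v[i].y *= s;
      /* z and w are left untouched */
    }
}

/* Small self-check: x and y change, z and w do not.  */
static void scale_xy_selfcheck (void)
{
  struct vec4 v[1] = { { 1.0f, 2.0f, 3.0f, 4.0f } };
  scale_xy (v, 1, 2.0f);
  assert (v[0].x == 2.0f && v[0].y == 4.0f);
  assert (v[0].z == 3.0f && v[0].w == 4.0f);
}
```

Vectorizing such a loop with full-width stores would overwrite z and w unless the old values are loaded and blended back in, which is why the gap needs explicit support.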

thanks!

dorit

 --
 Giovanni Bajo

Re: Incorrect bitfield aliasing with Tree SSA


On 6/18/07, Eric Botcazou [EMAIL PROTECTED] wrote:

 I'm completely unsurprised this is broken at the tree level given how
 it is implemented

Nice tautology. :-)  You have resisted implementing anything at the tree level
to fix the problem and now you're complaining there is a problem...


Pardon?

If it was designed properly in the first place, there simply would *be
no problem at the tree level*, because nothing would have broken.
Everyone in the bug reports in question has told you this, not just me.



Let's try and devise something plausible at the tree level.  If we eventually
fail, we could indeed consider disabling the optimization at the RTL level.



So far you guys have resisted what seem like perfectly reasonable
solutions by Adam

Re: Activate -mrecip with -ffast-math?

2007-06-18 Thread Uros Bizjak


On 6/18/07, tbp [EMAIL PROTECTED] wrote:


Until now, the contract was: you have to deal with (and contain) NaN
and infinities. Fair enough, even if tricky that remained manageable.
But if i can't expect a mere division by 0, or sqrt of 0 (quite common
with FTZ/DAZ on) to give me respectively an infinite and 0 and instead
get a NaN (which i can't filter, you remember?) because of the NR
round, that's pure madness.


The attached patch should fix these troubles at the cost of 2 extra
clocks. The trick is to limit the result of rsqrt to just below infinity,
which keeps 0.0 * (inf-) -> 0.0.

Uros.

Index: i386.c
===
--- i386.c  (revision 125790)
+++ i386.c  (working copy)
@@ -22590,7 +22590,7 @@ void ix86_emit_swdivsf (rtx res, rtx a,
void ix86_emit_swsqrtsf (rtx res, rtx a, enum machine_mode mode,
bool recip)
{
-  rtx x0, e0, e1, e2, e3, three, half;
+  rtx x0, e0, e1, e2, e3, three, half, bignum;

  x0 = gen_reg_rtx (mode);
  e0 = gen_reg_rtx (mode);
@@ -22600,15 +22600,18 @@ void ix86_emit_swsqrtsf (rtx res, rtx a,

  three = CONST_DOUBLE_FROM_REAL_VALUE (dconst3, SFmode);
  half = CONST_DOUBLE_FROM_REAL_VALUE (dconsthalf, SFmode);
+  bignum = gen_lowpart (SFmode, GEN_INT (0x7f7f));

  if (VECTOR_MODE_P (mode))
{
  three = ix86_build_const_vector (SFmode, true, three);
  half = ix86_build_const_vector (SFmode, true, half);
+  bignum = ix86_build_const_vector (SFmode, true, bignum);
}

  three = force_reg (mode, three);
  half = force_reg (mode, half);
+  bignum = force_reg (mode, bignum);

  /* sqrt(a) = 0.5 * a * rsqrtss(a) * (3.0 - a * rsqrtss(a) * rsqrtss(a))
 1.0 / sqrt(a) = 0.5 * rsqrtss(a) * (3.0 - a * rsqrtss(a) * rsqrtss(a)) */
@@ -22617,6 +22620,9 @@ void ix86_emit_swsqrtsf (rtx res, rtx a,
  emit_insn (gen_rtx_SET (VOIDmode, x0,
 gen_rtx_UNSPEC (mode, gen_rtvec (1, a),
 UNSPEC_RSQRT)));
+  emit_insn (gen_rtx_SET (VOIDmode, x0,
+ gen_rtx_SMIN (mode, x0, bignum)));
+
  /* e0 = x0 * a */
  emit_insn (gen_rtx_SET (VOIDmode, e0,
 gen_rtx_MULT (mode, x0, a)));
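
As a rough illustration of what the clamp in the patch above is for, the scalar C sketch below mirrors the Newton-Raphson sequence from the comment in ix86_emit_swsqrtsf. It is not the RTL the backend emits. The name sw_sqrt and its parameters are made up for the example: x0 stands for whatever estimate rsqrtss would return (+inf for an input of +0), and FLT_MAX is assumed to be the "just below infinity" value that the bignum constant is meant to encode. It also assumes IEEE single precision and a build without -ffast-math.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Scalar model of the Newton-Raphson round, given the estimate x0:
     sqrt(a)       = 0.5 * a * x0 * (3.0 - a * x0 * x0)
     1.0 / sqrt(a) = 0.5 * x0 * (3.0 - a * x0 * x0)
   With clamp != 0 the estimate is first limited to FLT_MAX, modelling
   the SMIN against bignum (an assumption of this sketch).  */
static float sw_sqrt (float a, float x0, int recip, int clamp)
{
  if (clamp)
    x0 = x0 < FLT_MAX ? x0 : FLT_MAX;

  float e0 = x0 * a;            /* inf * 0 = NaN without the clamp */
  float e1 = e0 * x0;
  float e2 = 3.0f - e1;
  float e3 = 0.5f * (recip ? x0 : e0);
  return e3 * e2;
}

/* Small self-check of the a == 0 corner case discussed in the thread.  */
static void sw_sqrt_selfcheck (void)
{
  float unclamped = sw_sqrt (0.0f, INFINITY, 0, 0);
  assert (unclamped != unclamped);                          /* NaN */
  assert (sw_sqrt (0.0f, INFINITY, 0, 1) == 0.0f);          /* sqrt(0) -> 0 */
  assert (sw_sqrt (0.0f, INFINITY, 1, 1) > FLT_MAX);        /* 1/sqrt(0) -> inf */
}
```

In this model the clamped estimate makes e0 come out as 0 rather than NaN, so sqrt(0) yields 0, and the reciprocal case overflows back to infinity in the final multiply.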

[Bug tree-optimization/19910] [4.2/4.3 regression] ICE with -ftree-loop-linear



--- Comment #12 from pinskia at gcc dot gnu dot org  2007-06-18 06:01 ---
This no longer crashes for me.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19910

[Bug tree-optimization/21485] [4.0/4.1/4.2/4.3 Regression] codegen regression due to PRE increasing register pressure (missing load PRE really)



--- Comment #22 from pinskia at gcc dot gnu dot org  2007-06-18 06:12 ---
This is basically fixed by the pointer_plus except we still have some
combinable code (though this is not PRE's fault); see
http://gcc.gnu.org/ml/gcc-patches/2007-05/msg01996.html for how to fix that
issue.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485

[Bug middle-end/30784] [4.3 regression] ICE on loop vectorization (-O1 -march=athlon-xp -ftree-vectorize)



--- Comment #11 from pinskia at gcc dot gnu dot org  2007-06-18 06:16 ---
*** Bug 30958 has been marked as a duplicate of this bug. ***


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dcb314 at hotmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30784

[Bug tree-optimization/30958] [4.3 Regression] ice for legal code with -ftree-vectorize -Os (-m64)



--- Comment #7 from pinskia at gcc dot gnu dot org  2007-06-18 06:16 ---


*** This bug has been marked as a duplicate of 30784 ***


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30958

[Bug tree-optimization/32383] [4.3 regression] ICE with reciprocals and -ffast-math

2007-06-18 Thread ubizjak at gmail dot com



--- Comment #2 from ubizjak at gmail dot com  2007-06-18 06:41 ---
Patch in testing.


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |ubizjak at gmail dot com
                   |dot org                     |
             Status|UNCONFIRMED                 |ASSIGNED
     Ever Confirmed|0                           |1
   Last reconfirmed|-00-00 00:00:00             |2007-06-18 06:41:08
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32383

[Bug middle-end/32176] [4.3 Regression] ICE tree-type mismatch: expected integer_cst, have plus_expr in int_cst_value, at tree.c:7720



--- Comment #7 from pinskia at gcc dot gnu dot org  2007-06-18 06:42 ---
There is a cast which confuses SCEV.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pinskia at gcc dot gnu dot
                   |                            |org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32176

[Bug libstdc++/32354] libstdc++.so.6 missing RPATH

2007-06-18 Thread stephan dot bergmann at sun dot com



--- Comment #5 from stephan dot bergmann at sun dot com  2007-06-18 06:54 ---
Re #3:  http://gcc.gnu.org/onlinedocs/libstdc++/install.html#usage is not
relevant here.  That info is about how client code can find libstdc++.so.  This
issue is about how libstdc++.so can find the libraries it itself depends on.

Re #4:  Not sure I understand you completely.  If you move libstdc++.so and
libgcc_s.so somewhere else but keep their relative locations intact (i.e., both
in the same directory), RPATH=$ORIGIN in libstdc++.so still works to locate the
matching libgcc_s.so.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32354

[Bug middle-end/20983] [4.0/4.1/4.2/4.3 Regression] varargs functions force va_list variable to stack unnecessarily