Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-19 Thread Torvald Riegel
On Tue, 2014-02-18 at 23:48 +, Peter Sewell wrote:
 On 18 February 2014 20:43, Torvald Riegel trie...@redhat.com wrote:
  On Tue, 2014-02-18 at 12:12 +, Peter Sewell wrote:
  Several of you have said that the standard and compiler should not
  permit speculative writes of atomics, or (effectively) that the
  compiler should preserve dependencies.  In simple examples it's easy
  to see what that means, but in general it's not so clear what the
  language should guarantee, because dependencies may go via non-atomic
  code in other compilation units, and we have to consider the extent to
  which it's desirable to limit optimisation there.
 
  [...]
 
  2) otherwise, the language definition should prohibit it but the
 compiler would have to preserve dependencies even in compilation
 units that have no mention of atomics.  It's unclear what the
 (runtime and compiler development) cost of that would be in
 practice - perhaps Torvald could comment?
 
  If I'm reading the standard correctly, it requires that data
  dependencies are preserved through loads and stores, including nonatomic
  ones.  That sounds convenient because it allows programmers to use
  temporary storage.
 
 The standard only needs this for consume chains,

That's right, and the runtime cost / implementation problems of
mo_consume was what I was making statements about.  Sorry if that wasn't
clear.



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-19 Thread Torvald Riegel
On Tue, 2014-02-18 at 22:52 +0100, Peter Zijlstra wrote:
   4.Some drivers allow user-mode code to mmap() some of their
 state.  Any changes undertaken by the user-mode code would
 be invisible to the compiler.
  
  A good point, but a compiler that doesn't try to (incorrectly) assume
  something about the semantics of mmap will simply see that the mmap'ed
  data will escape to stuff it can't analyze, so it will not be able to
  make a proof.
  
  This is different from, for example, malloc(), which is guaranteed to
  return fresh nonaliasing memory.
 
 The kernel side of this is different.. it looks like 'normal' memory, we
 just happen to allow it to end up in userspace too.
 
 But on that point; how do you tell the compiler the difference between
 malloc() and mmap()? Is that some function attribute?

Yes:

malloc
The malloc attribute is used to tell the compiler that a
function may be treated as if any non-NULL pointer it returns
cannot alias any other pointer valid when the function returns
and that the memory has undefined content. This often improves
optimization. Standard functions with this property include
malloc and calloc. realloc-like functions do not have this
property as the memory pointed to does not have undefined
content.

I'm not quite sure whether GCC assumes malloc() to be indeed C's malloc
even if the function attribute isn't used, and/or whether that is
different for freestanding environments.
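
As a rough sketch of what the attribute buys the optimizer (assuming GCC or
Clang; pool_alloc and store_then_read are hypothetical names made up for this
illustration), consider a wrapper marked with the malloc attribute.  An
mmap()-style function would simply be left unannotated, so the compiler has to
assume the returned pointer may alias other memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator wrapper.  The malloc attribute promises that any
   non-NULL pointer returned does not alias any other valid pointer. */
__attribute__((malloc))
void *pool_alloc(size_t size)
{
    return malloc(size);
}

/* Because of the attribute, the compiler may assume that the store through
   'fresh' cannot modify *existing, so the final load may be folded to 1. */
int store_then_read(int *existing)
{
    int *fresh;

    assert(existing != NULL);
    *existing = 1;
    fresh = pool_alloc(sizeof *fresh);
    if (fresh == NULL)
        return -1;
    *fresh = 2;          /* cannot alias *existing */
    free(fresh);
    return *existing;    /* may be optimized to "return 1" */
}
```

Whether a given compiler actually performs the folding depends on the
optimization level; the attribute only grants permission.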



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-19 Thread Torvald Riegel
On Tue, 2014-02-18 at 14:58 -0800, Paul E. McKenney wrote:
 On Tue, Feb 18, 2014 at 10:40:15PM +0100, Torvald Riegel wrote:
  
  On Tue, 2014-02-18 at 09:16 -0800, Paul E. McKenney wrote:
   On Tue, Feb 18, 2014 at 08:49:13AM -0800, Linus Torvalds wrote:
On Tue, Feb 18, 2014 at 7:31 AM, Torvald Riegel trie...@redhat.com 
wrote:
 On Mon, 2014-02-17 at 16:05 -0800, Linus Torvalds wrote:
 And exactly because I know enough, I would *really* like atomics to 
 be
 well-defined, and have very clear - and *local* - rules about how 
 they
 can be combined and optimized.

 Local?

Yes.

So I think that one of the big advantages of atomics over volatile is
that they *can* be optimized, and as such I'm not at all against
trying to generate much better code than for volatile accesses.

But at the same time, that can go too far. For example, one of the
things we'd want to use atomics for is page table accesses, where it
is very important that we don't generate multiple accesses to the
values, because parts of the values can be change *by*hardware* (ie
accessed and dirty bits).

So imagine that you have some clever global optimizer that sees that
the program never ever actually sets the dirty bit at all in any
thread, and then uses that kind of non-local knowledge to make
optimization decisions. THAT WOULD BE BAD.
   
   Might as well list other reasons why value proofs via whole-program
   analysis are unreliable for the Linux kernel:
   
   1.As Linus said, changes from hardware.
  
  This is what volatile is for, right?  (Or the weak-volatile idea I
  mentioned).
  
  Compilers won't be able to prove something about the values of such
  variables, if marked (weak-)volatile.
 
 Yep.
 
   2.Assembly code that is not visible to the compiler.
 Inline asms will -normally- let the compiler know what
 memory they change, but some just use the "memory" tag.
 Worse yet, I suspect that most compilers don't look all
 that carefully at .S files.
   
 Any number of other programs contain assembly files.
  
  Are the annotations of changed memory really a problem?  If the "memory"
  tag exists, isn't that supposed to mean all memory?
  
  To make a proof about a program for location X, the compiler has to
  analyze all uses of X.  Thus, as soon as X escapes into an .S file, then
  the compiler will simply not be able to prove a thing (except maybe due
  to the data-race-free requirement for non-atomics).  The attempt to
  prove something isn't unreliable, simply because a correct compiler
  won't claim to be able to prove something.
 
 I am indeed less worried about inline assembler than I am about files
 full of assembly.  Or files full of other languages.
 
  One thing that could corrupt this is if the program addresses objects
  other than through the mechanisms defined in the language.  For example,
  if one thread lays out a data structure at a constant fixed memory
  address, and another one then uses the fixed memory address to get
  access to the object with a cast (e.g., (void*)0x123).
 
 Or if the program uses gcc linker scripts to get the same effect.
 
   3.Kernel modules that have not yet been written.  Now, the
 compiler could refrain from trying to prove anything about
 an EXPORT_SYMBOL() or EXPORT_SYMBOL_GPL() variable, but there
 is currently no way to communicate this information to the
 compiler other than marking the variable volatile.
  
  Even if the variable is just externally accessible, then the compiler
  knows that it can't do whole-program analysis about it.
  
  It is true that whole-program analysis will not be applicable in this
  case, but it will not be unreliable.  I think that's an important
  difference.
 
 Let me make sure that I understand what you are saying.  If my program has
 "extern int foo;", the compiler will refrain from doing whole-program
 analysis involving foo?

Yes.  If it can't be sure to actually have the whole program available,
it can't do whole-program analysis, right?  Things like the linker
scripts you mention or other stuff outside of the language semantics
complicate this somewhat, and maybe some compilers assume too much.
There's also the point that data-race-freedom is required for
non-atomics even if those are shared with non-C-code.

But except those corner cases, a compiler sees whether something escapes
and becomes visible/accessible to other entities.
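
A minimal sketch of the escape distinction, assuming a single translation
unit compiled on its own (private_flag, exported_flag and the two reader
functions are hypothetical names used only for illustration):

```c
/* Internal linkage, address never taken, never stored to in this
   translation unit: nothing outside this file can name it, so the compiler
   can see all of its uses and may prove it is always 0. */
static int private_flag;

/* External linkage: other translation units, .S files or modules may write
   it, so the compiler cannot prove anything about its value here. */
int exported_flag;

int read_private(void)
{
    return private_flag;    /* may be folded to "return 0" */
}

int read_exported(void)
{
    return exported_flag;   /* must actually load the value */
}
```

The (void*)0x123 and linker-script cases mentioned above are exactly the
ones where something could reach private_flag without the compiler seeing
it escape.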

 Or to ask it another way, when you say
 whole-program analysis, are you restricting that analysis to the
 current translation unit?

No.  I mean, you can do analysis of the current translation unit, but
that will do just that; if the variable, for example, is accessible
outside of this translation unit, the compiler can't make a

Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-19 Thread Torvald Riegel
On Tue, 2014-02-18 at 22:47 +0100, Peter Zijlstra wrote:
 On Tue, Feb 18, 2014 at 10:21:56PM +0100, Torvald Riegel wrote:
  Yes, I do.  But that seems to be volatile territory.  It crosses the
  boundaries of the abstract machine, and thus is input/output.  Which
  fraction of your atomic accesses can read values produced by hardware?
  I would still suppose that lots of synchronization is not affected by
  this.
 
 It's not only hardware; also the kernel/user boundary has this same
 problem. We cannot a priori say what userspace will do; in fact, because
 we're a general purpose OS, we must assume it will willfully try its
 bestest to wreck whatever assumptions we make about its behaviour.

That's a good note, and I think a distinct case from those below,
because here you're saying that you can't assume that the userspace code
follows the C11 semantics ...

 We also have loadable modules -- much like regular userspace DSOs -- so
 there too we cannot say what will or will not happen.
 
 We also have JITs that generate code on demand.

.. whereas for those, you might assume that the other code follows C11
semantics and the same ABI, which makes this just a normal case that is
already handled (see my other replies nearby in this thread).

 And I'm absolutely sure (with the exception of the JITs, its not an area
 I've worked on) that we have atomic usage across all those boundaries.

That would be fine as long as all involved parties use the same memory
model and ABI to implement it.

(Of course, I'm assuming here that the compiler is aware of sharing with
other entities, which is always the case except in those corner cases
like accesses to (void*)0x123 magically aliasing with something else).



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-19 Thread Peter Zijlstra
On Wed, Feb 19, 2014 at 12:07:02PM +0100, Torvald Riegel wrote:
  It's not only hardware; also the kernel/user boundary has this same
  problem. We cannot a priori say what userspace will do; in fact, because
  we're a general purpose OS, we must assume it will willfully try its
  bestest to wreck whatever assumptions we make about its behaviour.
 
 That's a good note, and I think a distinct case from those below,
 because here you're saying that you can't assume that the userspace code
 follows the C11 semantics ...

Right; we can malfunction in those cases though; as long as the
malfunctioning happens on the userspace side. That is, whatever
userspace does should not cause the kernel to crash, but userspace
crashing itself, or getting crap data or whatever is its own damn fault
for not following expected behaviour.

To stay on topic; if the kernel/user interface requires memory ordering
and userspace explicitly omits the barriers, all malfunctioning should be
on the user. For instance, it might lose a forward-progress guarantee or
data integrity guarantees.

Specifically, given a kernel/user lockless producer/consumer buffer, if
the user side allows the tail write to happen before its data reads are
complete, the kernel might overwrite the data the user side is still reading.
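
To make the ordering requirement concrete, here is a small
single-producer/single-consumer sketch using C11 atomics.  It is not the
kernel's actual buffer code; struct ring, RING_SIZE, ring_put and ring_get
are hypothetical names.  The point is the release store on tail, which keeps
the consumer's data read from being reordered after the slot is handed back.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 8u   /* hypothetical capacity, a power of two */

struct ring {
    _Atomic unsigned int head;   /* written by the producer (kernel side) */
    _Atomic unsigned int tail;   /* written by the consumer (user side) */
    int data[RING_SIZE];
};

/* Producer: store one value.  Returns 1 on success, 0 if the ring is full. */
int ring_put(struct ring *r, int value)
{
    unsigned int head, tail;

    assert(r != NULL);
    head = atomic_load_explicit(&r->head, memory_order_relaxed);
    /* acquire pairs with the consumer's release store on tail */
    tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return 0;
    r->data[head % RING_SIZE] = value;
    /* release: the data write is visible before the new head is */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}

/* Consumer: fetch one value.  Returns 1 on success, 0 if the ring is empty. */
int ring_get(struct ring *r, int *out)
{
    unsigned int head, tail;

    assert(r != NULL && out != NULL);
    tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return 0;
    *out = r->data[tail % RING_SIZE];
    /* release: the data read above completes before the slot is freed.
       With a relaxed store here the producer could overwrite the slot
       while it is still being read. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 1;
}
```

If the consumer weakens that last store, only its own view of the data is at
risk; the producer's bookkeeping stays consistent, which matches the "only
userspace should suffer" requirement.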

Or in the case of futexes, if the user side doesn't use the appropriate
operations, its lock state gets corrupted, but only userspace should suffer.

But yes, this does require some care and consideration from our side.


Stack layout change during lra

2014-02-19 Thread Joey Ye
Vlad,

When fixing PR60169, I found that reload fails the assertion on
verify_initial_elim_offsets ():
  if (insns_need_reload != 0 || something_needs_elimination
      || something_needs_operands_changed)
    {
      HOST_WIDE_INT old_frame_size = get_frame_size ();

      reload_as_needed (global);

      gcc_assert (old_frame_size == get_frame_size ());

      gcc_assert (verify_initial_elim_offsets ());
    }

The reason is that stack layout changes during reload_as_needed as a result
of a thumb1 backend heuristic.

I have a patch to make sure the heuristic doesn't change stack layout during
and after reload, and the assertion failure disappeared. However, I'm not sure
if it will also be a problem in lra. Here is the more specific question:

Is there any chance during lra_in_progress that stack layout can no longer
be altered, but insns can still be added or changed?

Thanks,
Joey







Re: ARM inline assembly usage in Linux kernel

2014-02-19 Thread Richard Sandiford
Andrew Pinski pins...@gmail.com writes:
 On Tue, Feb 18, 2014 at 6:56 PM, Saleem Abdulrasool
 compn...@compnerd.org wrote:
 Hello.

 I am sending this at the behest of Renato.  I have been working on the ARM
 integrated assembler in LLVM and came across an interesting item in the Linux
 kernel.

 I am wondering if this is an unstated covenant between the kernel and GCC or
 simply a clever use of an unintended/undefined behaviour.

 The Linux kernel uses the *compiler* as a fancy preprocessor to generate a
 specially crafted assembly file.  This file is then post-processed via sed to
 generate a header containing constants which is shared across assembly and C
 sources.

 In order to clarify the question, I am selecting a particular example and
 pulling out the relevant bits of the source code below.

 #define DEFINE(sym, val) asm volatile("\n->" #sym " %0 " #val : : "i" (val))

 #define __NR_PAGEFLAGS 22

 void definitions(void) {
   DEFINE(NR_PAGEFLAGS, __NR_PAGEFLAGS);
 }

 This is then assembled to generate the following:

 ->NR_PAGEFLAGS #22 __NR_PAGEFLAGS

 This will later be post-processed to generate:

 #define NR_PAGEFLAGS 22 /* __NR_PAGEFLAGS */

 Using the inline assembler to evaluate (constant) expressions into constant
 values and then emitting them with a special identifier (->) is a fairly
 clever trick.  This leads to my question: is this just use of an
 unintentional feature or something that was worked out between the two
 projects?

If the output is being post-processed by sed then maybe you could put
a comment character at the beginning of the line and sed it out?
But I tend to agree with Andrew that for -S output the compiler should
be prepared to accept asm strings that it can't parse, even if the integrated
assembler thinks it understands every instruction.

 I don't see why this is a bad use of the inline-asm.  GCC does not
 know and is not supposed to know what the string inside the inline-asm
 is going to be.  In fact if you have a newer assembler than the
 compiler, you could use instructions that GCC does not even know
 about.

Yeah, FWIW, I agree this is a valid use of inline asm.  The use of volatile
in a reachable part of definitions() means that the asm (and thus the
asm string) must be kept if definitions() is kept.

I doubt the idea was agreed with GCC developers because no GCC changes were
needed to use inline asm this way.

 This is the purpose of inline-asm.  I think it was a bad
 design decision on LLVM/clang's part that it would check the assembly
 code up front.

Being able to parse it is a useful feature.  E.g. it means you can get an
accurate byte length for the asm, which is something that we otherwise
have to guess by multiplying the number of lines by a constant factor.
(And that's wrong for MIPS assembly macros, unless you use a very
conservative constant factor.)

I agree that having an unrecognised asm shouldn't be a hard error until
assembly time though.  Saleem, is the problem that this is being rejected
earlier?

Thanks,
Richard



Re: TYPE_BINFO and canonical types at LTO

2014-02-19 Thread Richard Biener
On Tue, 18 Feb 2014, Jan Hubicka wrote:

   Non-ODR types born from other frontends will then need to be made to 
   alias all the ODR variants that can be done by storing them into the 
   current canonical type hash.
   (I wonder if we want to support cross language aliasing for non-POD?)
  
  Surely for accessing components of non-POD types, no?  Like
  
  class Foo {
  Foo();
  int *get_data ();
  int *data;
  } glob_foo;
  
  extern "C" int *get_foo_data() { return glob_foo.get_data(); }
 
 OK, if we want to support this, then we want to merge.
 What about types with vtbl pointer? :)

I can easily create a C struct variant covering that.  Basically
in _practice_ I can inter-operate with any language from C if I
know its ABI.  Do we really want to make this undefined?  See
the (even standard) Fortran - C interoperability spec.  I'm sure
something exists for Ada interoperating with C (or even C++).

  ?  But you are talking about the tree merging part using ODR info
  to also merge types which differ in completeness of contained
  pointer types, right?  (exactly equal cases should be already merged)
 
 Actually I was speaking of canonical types here. I want to preserve more 
 of TBAA via honoring ODR and local types.

So, are you positive there will be a net gain in optimization when
doing that?  Please factor in the surprises you'll get when code
gets miscompiled because of slight ODR violations or interoperability
that no longer works.

 I want to change lto to not 
 merge canonical types for pairs of types of same layout (i.e. equivalent 
 in the current canonical type definition) but with different mangled 
 names.

Names are nothing ;)  In C I very often see different _names_ used
in headers vs. implementation (when the implementation uses a different
internal header).  You have struct Foo; in public headers vs.
struct Foo_impl; in the implementation.

 I also want it to never merge when types are local. For 
 inter-language TBAA we will need to ensure aliasing in between non-ODR 
 type of same layout and all unmerged variants of ODR type.
  Can it be 
 done by attaching chains of ODR types into the canonical type hash and 
 when non-ODR type appears, just make it alias with all of them?

No, how would that work?

 It would make sense to ODR merge in tree merging, too, but I am not sure if
 this fits the current design, since you would need to merge SCC components of
 different shape, and that seems hard, right?

Right.  You'd lose the nice incremental SCC merging (where we haven't even
yet implemented the nicest way - avoid re-materializing the SCC until
we know it prevails).

 It may be easier to ODR merge after streaming (during DECL fixup) just to make
 WPA streaming cheaper and to reduce debug info size.  If you use
 -fdump-ipa-devirt, it will dump you ODR types that did not get merged (only
 ones with vtable pointers in them ATM) and there are quite long chains for
 firefox. Surely then hundreds of duplicated ODR types will end up in the 
 ltrans
 partition streams and they eventually hit debug output machinery.
 Eric sent me presentation about doing this in LLVM.
 http://llvm.org/devmtg/2013-11/slides/Christopher-DebugInfo.pdf

Debuginfo is something completely separate and should be done separately
(early debug), which avoids streaming the types in the first place.

  
  The canonical type computation happens separately (only for prevailing
  types, of course), and there we already merge types which differ
  in completeness.  Canonical type merging is conservative the other
  way around - if we merge _all_ types to a single canonical type then
  TBAA is still correct (we get a single alias set).
 
 Yes, I think I understand that. One equivalence is kind of minimal so we merge
 only if we are sure there is no information loss, the other is maximal so we
 are sure that types that need to be equivalent by whatever underlying language
 TBAA rules are actually equivalent.

The former is just not correct - it would mean that not merging at all
would be valid, which it is not (you'd create wrong-code all over the 
place).

We still don't merge enough (because of latent bugs that I didn't manage
to fix in time) - thus we do not merge all structurally equivalent types
right now.

   I also think we want explicit representation of types known to be local 
   to compilation unit - anonymous namespaces in C/C++, types defined 
   within function bodies in C and god knows what in Ada/Fortran/Java.
  
  But here you get into the idea of improving TBAA, thus having
  _more_ distinct canonical types?
 
 Yes.
  
  Just to make sure to not mix those two ;)
  
  And whatever frontend knowledge we want to exercise - please
  make sure we get a reliable way for the middle-end to see
  that frontend knowledge (no langhooks!).  Thus, make it
  middle-end knowledge.
 
 Sure that is what I am proposing - just have DECL_ASSEMBLER_NAME on TYPE_DECL
 and ODR flag. Middle-end when comparing types will test ODR flag and if flag
 is 

Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-19 Thread Torvald Riegel
On Tue, 2014-02-18 at 14:14 -0800, Linus Torvalds wrote:
 On Tue, Feb 18, 2014 at 1:21 PM, Torvald Riegel trie...@redhat.com wrote:
 
  So imagine that you have some clever global optimizer that sees that
  the program never ever actually sets the dirty bit at all in any
  thread, and then uses that kind of non-local knowledge to make
  optimization decisions. THAT WOULD BE BAD.
 
  Do you see what I'm aiming for?
 
  Yes, I do.  But that seems to be volatile territory.  It crosses the
  boundaries of the abstract machine, and thus is input/output.  Which
  fraction of your atomic accesses can read values produced by hardware?
  I would still suppose that lots of synchronization is not affected by
  this.
 
 The "hardware can change things" case is indeed pretty rare.
 
 But quite frankly, even when it isn't hardware, as far as the compiler
 is concerned you have the exact same issue - you have TLB faults
 happening on other CPU's that do the same thing asynchronously using
 software TLB fault handlers. So *semantically*, it really doesn't make
 any difference what-so-ever if it's a software TLB handler on another
 CPU, a microcoded TLB fault, or an actual hardware path.

I think there are a few semantic differences:

* If a SW handler uses the C11 memory model, it will synchronize like
any other thread.  HW might do something else entirely, including
synchronizing differently, not using atomic accesses, etc.  (At least
that's the constraints I had in mind).

* If we can treat any interrupt handler like Just Another Thread, then
the next question is whether the compiler will be aware that there is
another thread.  I think that in practice it will be: You'll set up the
handler in some way by calling a function the compiler can't analyze, so
the compiler will know that stuff accessible to the handler (e.g.,
global variables) will potentially be accessed by other threads. 

* Similarly, if the C code is called from some external thing, it also
has to assume the presence of other threads.  (Perhaps this is what the
compiler has to assume in a freestanding implementation anyway...)

However, accessibility will be different for, say, stack variables that
haven't been shared with other functions yet; those are arguably not
reachable by other things, at least not through mechanisms defined by
the C standard.  So optimizing these should be possible with the
assumption that there is no other thread (at least as default -- I'm not
saying that this is the only reasonable semantics).

 So if the answer for all of the above is "use volatile", then I think
 that means that the C11 atomics are badly designed.
 
 The whole *point* of atomic accesses is that stuff like above should
 JustWork(tm)

I think that it should in the majority of cases.  If the other thing
potentially accessing can do as much as a valid C11 thread can do, the
synchronization itself will work just fine.  In most cases except the
(void*)0x123 example (or linker scripts etc.) the compiler is aware when
data is made visible to other threads or other non-analyzable functions
that may spawn other threads (or just by being a plain global variable
accessible to other (potentially .S) translation units).

  Do you perhaps want a weaker form of volatile?  That is, one that, for
  example, allows combining of two adjacent loads of the dirty bits, but
  will make sure that this is treated as if there is some imaginary
  external thread that it cannot analyze and that may write?
 
 Yes, that's basically what I would want. And it is what I would expect
 an atomic to be. Right now we tend to use ACCESS_ONCE(), which is a
 bit of a misnomer, because technically we really generally want
 ACCESS_AT_MOST_ONCE() (but once is what we get, because we use
 volatile, and is a hell of a lot simpler to write ;^).
 
 So we obviously use volatile for this currently, and generally the
 semantics we really want are:
 
  - the load or store is done as a single access (atomic)
 
  - the compiler must not try to re-materialize the value by reloading
 it from memory (this is the at most once part)

In the presence of other threads performing operations unknown to the
compiler, that's what you should get even if the compiler is trying to
optimize C11 atomics.  The first requirement is clear, and the "at most
once" part follows from another thread potentially writing to the variable.
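
For reference, a sketch of the two styles being compared.  The ACCESS_ONCE()
definition is the one the kernel used at the time (it relies on the GNU
typeof extension); the PTE_* bit values and both function names are
hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Kernel-style single access through a volatile-qualified lvalue. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

#define PTE_ACCESSED 0x1u   /* hypothetical bit layout */
#define PTE_DIRTY    0x2u

/* volatile flavour: exactly one load; both bit tests use the same snapshot,
   so a concurrent update cannot make them disagree with each other. */
int pte_accessed_and_dirty(unsigned int *pte)
{
    unsigned int snap;

    assert(pte != NULL);
    snap = ACCESS_ONCE(*pte);
    return (snap & PTE_ACCESSED) && (snap & PTE_DIRTY);
}

/* C11 flavour: a relaxed atomic load into a local.  The load is a single
   access; whether the compiler may ever reload instead of reusing 'snap'
   is the question discussed in this thread. */
int pte_accessed_and_dirty_c11(_Atomic unsigned int *pte)
{
    unsigned int snap;

    assert(pte != NULL);
    snap = atomic_load_explicit(pte, memory_order_relaxed);
    return (snap & PTE_ACCESSED) && (snap & PTE_DIRTY);
}
```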

The only difference I can see right now is that a compiler may be able
to *prove* that it doesn't matter whether it reloaded the value or not.
But to me this seems very hard to prove, and likely to require
whole-program analysis (which won't be possible because we don't know
what other threads are doing).  I would guess that this isn't a problem
in practice.  I just wanted to note it because it theoretically does
have different semantics than plain volatiles.

 and quite frankly, volatile is a big hammer for this. In practice it
 tends to work pretty well, though, because in _most_ cases, there
 really is just 

Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-19 Thread David Lang

On Tue, 18 Feb 2014, Torvald Riegel wrote:


On Tue, 2014-02-18 at 22:40 +0100, Peter Zijlstra wrote:

On Tue, Feb 18, 2014 at 10:21:56PM +0100, Torvald Riegel wrote:

Well, that's how atomics that aren't volatile are defined in the
standard.  I can see that you want something else too, but that doesn't
mean that the other thing is broken.


Well that other thing depends on being able to see the entire program at
compile time. PaulMck already listed various ways in which this is
not feasible even for normal userspace code.

In particular; DSOs and JITs were mentioned.


No it doesn't depend on whole-program analysis being possible.  Because
if it isn't, then a correct compiler will just not do certain
optimizations simply because it can't prove properties required for the
optimization to hold.  With the exception of access to objects via magic
numbers (e.g., fixed and known addresses; see my reply to Paul), which
are outside of the semantics specified in the standard, I don't see a
correctness problem here.


Are you really sure that the compiler can figure out every possible thing that a 
loadable module or JITed code can access? That seems like a pretty strong claim.


David Lang


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-19 Thread Paul E. McKenney
On Wed, Feb 19, 2014 at 11:59:08AM +0100, Torvald Riegel wrote:
 On Tue, 2014-02-18 at 14:58 -0800, Paul E. McKenney wrote:
  On Tue, Feb 18, 2014 at 10:40:15PM +0100, Torvald Riegel wrote:
   
   On Tue, 2014-02-18 at 09:16 -0800, Paul E. McKenney wrote:
On Tue, Feb 18, 2014 at 08:49:13AM -0800, Linus Torvalds wrote:
 On Tue, Feb 18, 2014 at 7:31 AM, Torvald Riegel trie...@redhat.com 
 wrote:
  On Mon, 2014-02-17 at 16:05 -0800, Linus Torvalds wrote:
  And exactly because I know enough, I would *really* like atomics 
  to be
  well-defined, and have very clear - and *local* - rules about how 
  they
  can be combined and optimized.
 
  Local?
 
 Yes.
 
 So I think that one of the big advantages of atomics over volatile is
 that they *can* be optimized, and as such I'm not at all against
 trying to generate much better code than for volatile accesses.
 
 But at the same time, that can go too far. For example, one of the
 things we'd want to use atomics for is page table accesses, where it
 is very important that we don't generate multiple accesses to the
 values, because parts of the values can be change *by*hardware* (ie
 accessed and dirty bits).
 
 So imagine that you have some clever global optimizer that sees that
 the program never ever actually sets the dirty bit at all in any
 thread, and then uses that kind of non-local knowledge to make
 optimization decisions. THAT WOULD BE BAD.

Might as well list other reasons why value proofs via whole-program
analysis are unreliable for the Linux kernel:

1.  As Linus said, changes from hardware.
   
   This is what volatile is for, right?  (Or the weak-volatile idea I
   mentioned).
   
   Compilers won't be able to prove something about the values of such
   variables, if marked (weak-)volatile.
  
  Yep.
  
2.  Assembly code that is not visible to the compiler.
Inline asms will -normally- let the compiler know what
memory they change, but some just use the memory tag.
Worse yet, I suspect that most compilers don't look all
that carefully at .S files.

Any number of other programs contain assembly files.
   
   Are the annotations of changed memory really a problem?  If the memory
   tag exists, isn't that supposed to mean all memory?
   
   To make a proof about a program for location X, the compiler has to
   analyze all uses of X.  Thus, as soon as X escapes into an .S file, then
   the compiler will simply not be able to prove a thing (except maybe due
   to the data-race-free requirement for non-atomics).  The attempt to
   prove something isn't unreliable, simply because a correct compiler
   won't claim to be able to prove something.
  
  I am indeed less worried about inline assembler than I am about files
  full of assembly.  Or files full of other languages.
  
   One thing that could corrupt this is if the program addresses objects
   other than through the mechanisms defined in the language.  For example,
   if one thread lays out a data structure at a constant fixed memory
   address, and another one then uses the fixed memory address to get
   access to the object with a cast (e.g., (void*)0x123).
  
  Or if the program uses gcc linker scripts to get the same effect.
  
3.  Kernel modules that have not yet been written.  Now, the
compiler could refrain from trying to prove anything about
an EXPORT_SYMBOL() or EXPORT_SYMBOL_GPL() variable, but there
is currently no way to communicate this information to the
compiler other than marking the variable volatile.
   
   Even if the variable is just externally accessible, then the compiler
   knows that it can't do whole-program analysis about it.
   
   It is true that whole-program analysis will not be applicable in this
   case, but it will not be unreliable.  I think that's an important
   difference.
  
  Let me make sure that I understand what you are saying.  If my program has
  extern int foo;, the compiler will refrain from doing whole-program
  analysis involving foo?
 
 Yes.  If it can't be sure to actually have the whole program available,
 it can't do whole-program analysis, right?  Things like the linker
 scripts you mention or other stuff outside of the language semantics
 complicate this somewhat, and maybe some compilers assume too much.
 There's also the point that data-race-freedom is required for
 non-atomics even if those are shared with non-C-code.
 
 But except those corner cases, a compiler sees whether something escapes
 and becomes visible/accessible to other entities.

The traditional response to "except those corner cases" is of course
"Murphy was an optimist".  ;-)

That said, point taken -- you 

Re: Stack layout change during lra

2014-02-19 Thread Vladimir Makarov

On 2/19/2014, 6:54 AM, Joey Ye wrote:

Vlad,

When fixing PR60169, I found that reload fail to assert
verify_initial_elim_offsets ()
   if (insns_need_reload != 0 || something_needs_elimination
   || something_needs_operands_changed)
 {
   HOST_WIDE_INT old_frame_size = get_frame_size ();

   reload_as_needed (global);

   gcc_assert (old_frame_size == get_frame_size ());

   gcc_assert (verify_initial_elim_offsets ());
 }

The reason is that stack layout changes during reload_as_needed as a result
of a thumb1 backend heuristic.

I have a patch to make sure the heuristic doesn't change stack layout during
and after reload, and the assertion failure disappeared. However, I'm not sure
if it will also be a problem in lra. Here is the question, more specifically:

Is there any chance during lra_in_progress that stack layout can no longer
be altered, but insns can still be added or changed?



I believe LRA is less prone to the above reload problem because of its 
design.  You can change stack in the backend and all will work if you 
provide the right sfp to hfp/sp offsets.  In fact, LRA itself can 
allocate stack slots several times and add and change insns between 
these allocations.






Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-19 Thread Torvald Riegel
On Wed, 2014-02-19 at 07:14 -0800, Paul E. McKenney wrote:
 On Wed, Feb 19, 2014 at 11:59:08AM +0100, Torvald Riegel wrote:
  On Tue, 2014-02-18 at 14:58 -0800, Paul E. McKenney wrote:
   On Tue, Feb 18, 2014 at 10:40:15PM +0100, Torvald Riegel wrote:

On Tue, 2014-02-18 at 09:16 -0800, Paul E. McKenney wrote:
 On Tue, Feb 18, 2014 at 08:49:13AM -0800, Linus Torvalds wrote:
  On Tue, Feb 18, 2014 at 7:31 AM, Torvald Riegel trie...@redhat.com wrote:
   On Mon, 2014-02-17 at 16:05 -0800, Linus Torvalds wrote:
   And exactly because I know enough, I would *really* like atomics to be
   well-defined, and have very clear - and *local* - rules about how they
   can be combined and optimized.
  
   Local?
  
  Yes.
  
  So I think that one of the big advantages of atomics over volatile is
  that they *can* be optimized, and as such I'm not at all against
  trying to generate much better code than for volatile accesses.
  
  But at the same time, that can go too far. For example, one of the
  things we'd want to use atomics for is page table accesses, where it
  is very important that we don't generate multiple accesses to the
  values, because parts of the values can be changed *by*hardware* (ie
  accessed and dirty bits).
  
  So imagine that you have some clever global optimizer that sees that
  the program never ever actually sets the dirty bit at all in any
  thread, and then uses that kind of non-local knowledge to make
  optimization decisions. THAT WOULD BE BAD.
 
 Might as well list other reasons why value proofs via whole-program
 analysis are unreliable for the Linux kernel:
 
 1.  As Linus said, changes from hardware.

This is what volatile is for, right?  (Or the weak-volatile idea I
mentioned).

Compilers won't be able to prove something about the values of such
variables, if marked (weak-)volatile.
   
   Yep.
   
 2.  Assembly code that is not visible to the compiler.
   Inline asms will -normally- let the compiler know what
   memory they change, but some just use the "memory" tag.
   Worse yet, I suspect that most compilers don't look all
   that carefully at .S files.
 
   Any number of other programs contain assembly files.

Are the annotations of changed memory really a problem?  If the "memory"
tag exists, isn't that supposed to mean all memory?

To make a proof about a program for location X, the compiler has to
analyze all uses of X.  Thus, as soon as X escapes into an .S file, then
the compiler will simply not be able to prove a thing (except maybe due
to the data-race-free requirement for non-atomics).  The attempt to
prove something isn't unreliable, simply because a correct compiler
won't claim to be able to prove something.
   
   I am indeed less worried about inline assembler than I am about files
   full of assembly.  Or files full of other languages.
   
One thing that could corrupt this is if the program addresses objects
other than through the mechanisms defined in the language.  For example,
if one thread lays out a data structure at a constant fixed memory
address, and another one then uses the fixed memory address to get
access to the object with a cast (e.g., (void*)0x123).
   
   Or if the program uses gcc linker scripts to get the same effect.
   
 3.  Kernel modules that have not yet been written.  Now, the
   compiler could refrain from trying to prove anything about
   an EXPORT_SYMBOL() or EXPORT_SYMBOL_GPL() variable, but there
   is currently no way to communicate this information to the
   compiler other than marking the variable volatile.

Even if the variable is just externally accessible, then the compiler
knows that it can't do whole-program analysis about it.

It is true that whole-program analysis will not be applicable in this
case, but it will not be unreliable.  I think that's an important
difference.
   
   Let me make sure that I understand what you are saying.  If my program has
   "extern int foo;", the compiler will refrain from doing whole-program
   analysis involving foo?
  
  Yes.  If it can't be sure to actually have the whole program available,
  it can't do whole-program analysis, right?  Things like the linker
  scripts you mention or other stuff outside of the language semantics
  complicate this somewhat, and maybe some compilers assume too much.
  There's also the point that data-race-freedom is required for
  non-atomics even if those are shared with non-C-code.
  
  But except those corner cases, a compiler sees whether something escapes
  and becomes visible/accessible 

Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-19 Thread Torvald Riegel
On Wed, 2014-02-19 at 07:23 -0800, David Lang wrote:
 On Tue, 18 Feb 2014, Torvald Riegel wrote:
 
  On Tue, 2014-02-18 at 22:40 +0100, Peter Zijlstra wrote:
  On Tue, Feb 18, 2014 at 10:21:56PM +0100, Torvald Riegel wrote:
  Well, that's how atomics that aren't volatile are defined in the
  standard.  I can see that you want something else too, but that doesn't
  mean that the other thing is broken.
 
  Well that other thing depends on being able to see the entire program at
  compile time. PaulMck already listed various ways in which this is
  not feasible even for normal userspace code.
 
  In particular; DSOs and JITs were mentioned.
 
  No it doesn't depend on whole-program analysis being possible.  Because
  if it isn't, then a correct compiler will just not do certain
  optimizations simply because it can't prove properties required for the
  optimization to hold.  With the exception of access to objects via magic
  numbers (e.g., fixed and known addresses (see my reply to Paul), which
  are outside of the semantics specified in the standard), I don't see a
  correctness problem here.
 
 Are you really sure that the compiler can figure out every possible thing
 that a loadable module or JITed code can access? That seems like a pretty
 strong claim.

If the other code can be produced by a C translation unit that is valid
to be linked with the rest of the program, then I'm pretty sure the
compiler has a well-defined notion of whether it does or does not see
all other potential accesses.  IOW, if the C compiler is dealing with C
semantics and mechanisms only (including the C mechanisms for sharing
with non-C code!), then it will know what to do.

If you're playing tricks behind the C compiler's back using
implementation-defined stuff outside of the C specification, then
there's nothing the compiler really can do.  For example, if you're
trying to access a variable on a function's stack from some other
function, you better know how the register allocator of the compiler
operates.  In contrast, if you let this function simply export the
address of the variable to some external place, all will be fine.

The documentation of GCC's -fwhole-program and -flto might also be
interesting for you.  GCC wouldn't need to have -fwhole-program if it
weren't conservative by default (correctly so).
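
The distinction drawn above, between an object whose address never leaves
a function and one whose address is exported, can be sketched in C.  This
is only an illustration under assumptions: all names below
(exported_counter, export_address, bump_exported, sum_private,
sum_exported) are hypothetical, and everything sits in one translation
unit so that it runs on its own.  In a real program the exporting call
would typically go to code the compiler cannot see.

```c
#include <stddef.h>

/* Hypothetical stand-in for "some external place" that receives an address. */
static int *exported_counter = NULL;

static void export_address(int *p)
{
    exported_counter = p;
}

/* Models other code that modifies the object through the exported address. */
static void bump_exported(void)
{
    if (exported_counter != NULL)
        *exported_counter += 1;
}

/* The address of `local` never escapes, so only this function can touch it
 * within C semantics and the compiler may reason about its value freely. */
static int sum_private(int n)
{
    int local = 0;
    for (int i = 1; i <= n; i++)
        local += i;
    return local;
}

/* The address of `local` escapes through export_address(), so every call
 * the compiler cannot fully analyze must be assumed to read or write it. */
static int sum_exported(int n)
{
    int local = 0;
    export_address(&local);
    for (int i = 1; i <= n; i++) {
        local += i;
        bump_exported();        /* may legitimately change `local` */
    }
    exported_counter = NULL;    /* `local` is about to go out of scope */
    return local;
}
```

With n = 4, sum_private() yields 10 while sum_exported() yields 14,
because the four bump_exported() calls each add one through the escaped
pointer.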



Re: TYPE_BINFO and canonical types at LTO

2014-02-19 Thread Jan Hubicka
 On Tue, 18 Feb 2014, Jan Hubicka wrote:
 
Non-ODR types born from other frontends will then need to be made to 
alias all the ODR variants that can be done by storing them into the 
current canonical type hash.
(I wonder if we want to support cross language aliasing for non-POD?)
   
   Surely for accessing components of non-POD types, no?  Like
   
   class Foo {
   Foo();
   int *get_data ();
   int *data;
   } glob_foo;
   
   extern "C" int *get_foo_data() { return glob_foo.get_data(); }
  
  OK, if we want to support this, then we want to merge.
  What about types with vtbl pointer? :)
 
 I can easily create a C struct variant covering that.  Basically
 in _practice_ I can inter-operate with any language from C if I
 know its ABI.  Do we really want to make this undefined?  See
 the (even standard) Fortran - C interoperability spec.  I'm sure
 something exists for Ada interoperating with C (or even C++).

Well, if you know the ABI, you can interoperate C
struct {int a,b;};
with
struct {struct {int a;} a; struct {int b;}};

So we don't interoperate everything.  We may eventually want to define what
kind of inter-operation we support.  non-POD class layout is not always a
natural extension of C struct layout as one would expect
http://mentorembedded.github.io/cxx-abi/abi.html#layout
so it may make sense to declare it uninteroperable if it helps something in real
world (which I am not quite sure about)

I am not really shooting for this. I just want to get something that would
improve TBAA in predominantly C++ programs (with some C code in them) such as
firefox, chromium, openoffice or GCC. Because I see us giving up on TBAA very
often when playing with cases that should be devirtualized by GVN/aggregate
propagation in ipa-prop but aren't. Also given that the C++ standard actually
has a notion of inter-module type equivalence it may be a good idea to honor it.
 
   ?  But you are talking about the tree merging part using ODR info
   to also merge types which differ in completeness of contained
   pointer types, right?  (exactly equal cases should be already merged)
  
  Actually I was speaking of canonical types here. I want to preserve more 
  of TBAA via honoring ODR and local types.
 
 So, are you positive there will be a net gain in optimization when
 doing that?  Please factor in the surprises you'll get when code
 gets miscompiled because of slight ODR violations or interoperability
 that no longer works.
 
  I want to change lto to not 
  merge canonical types for pairs of types of same layout (i.e. equivalent 
  in the current canonical type definition) but with different mangled 
  names.
 
 Names are nothing ;)  In C I very often see different _names_ used
 in headers vs. implementation (when the implementation uses a different
 internal header).  You have struct Foo; in public headers vs.
 struct Foo_impl; in the implementation.

Yes, I am aware of that - I had to go through all those issues with --combine
code in mid 2000s. C++ is different.
What I want is to make TBAA in between C++ types stronger and TBAA between
C++ and other languages to do pretty much what we do now, if possible.
 
  I also want it to never merge when types are local. For
  inter-language TBAA we will need to ensure aliasing in between non-ODR
  type of same layout and all unmerged variants of ODR type.  Can it be
  done by attaching chains of ODR types into the canonical type hash and
  when non-ODR type appears, just make it alias with all of them?
 
 No, how would that work?

This is what I am asking you about.
At the end of streaming process, we will have canonical type hash with one
leader and list of ODR variants that were not merged based on ODR rules.
If leader happens to be non-ODR, then it must be made to alias all of them.
I think either we can rewrite all ODR cases to have the non-ODR as canonical
type or make something more fine grained so we can force loads/stores through
non-ODR types to alias with all of them, but loads/stores through ODR types
to alias as within C++ compilation unit.
 
  It would make sense to ODR merge in tree merging, too, but I am not sure if
  this fits the current design, since you would need to merge SCC components
  of different shape then that seems hard, right?
 
 Right.  You'd lose the nice incremental SCC merging (where we haven't even
 yet implemented the nicest way - avoid re-materializing the SCC until
 we know it prevails).

Yes, SCC merging w/o re-materialization is a priority, indeed.
 
  It may be easier to ODR merge after streaming (during DECL fixup) just to
  make WPA streaming cheaper and to reduce debug info size.  If you use
  -fdump-ipa-devirt, it will dump you ODR types that did not get merged (only
  ones with vtable pointers in them ATM) and there are quite long chains for
  firefox. Surely then hundreds of duplicated ODR types will end up in the
  ltrans partition streams and they eventually hit debug output machinery.
  Eric sent me 

Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-19 Thread Linus Torvalds
On Wed, Feb 19, 2014 at 6:40 AM, Torvald Riegel trie...@redhat.com wrote:

 If all those other threads written in whichever way use the same memory
 model and ABI for synchronization (e.g., choice of HW barriers for a
 certain memory_order), it doesn't matter whether it's a hardware thread,
 microcode, whatever.  In this case, C11 atomics should be fine.
 (We have this in userspace already, because correct compilers will have
 to assume that the code generated by them has to properly synchronize
 with other code generated by different compilers.)

 If the other threads use a different model, access memory entirely
 differently, etc, then we might be back to volatile because we don't
 know anything, and the very strict rules about execution steps of the
 abstract machine (ie, no as-if rule) are probably the safest thing to
 do.

Oh, I don't even care about architectures that don't have real hardware atomics.

So if there's a software protocol for atomics, all bets are off. The
compiler almost certainly has to do atomics with function calls
anyway, and we'll just plug in our own. And frankly, nobody will ever
care, because those architectures aren't relevant, and never will be.

Sure, there are some ancient Sparc platforms that only support a
single-byte ldstub and there are some embedded chips that don't
really do SMP, but have some pseudo-smp with special separate locking.
Really, nobody cares. The C standard has that crazy lock-free atomic
tests, and talks about address-free, but generally we require both
lock-free and address-free in the kernel, because otherwise it's just
too painful to do interrupt-safe locking, or do atomics in user-space
(for futexes).

So if your worry is just about software protocols for CPU's that
aren't actually designed for modern SMP, that's pretty much a complete
non-issue.

Linus
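
The lock-free tests of the C standard mentioned above can be evaluated
at compile time through the <stdatomic.h> macros, where 0 means never
lock-free, 1 sometimes, and 2 always.  A minimal sketch, assuming a C11
compiler; the function names are made up for illustration and are not
kernel APIs:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Reports the standard's lock-free level for atomic_int:
 * 0 = never lock-free, 1 = sometimes, 2 = always. */
static int atomic_int_lock_free_level(void)
{
    return ATOMIC_INT_LOCK_FREE;
}

/* Hypothetical check: true only if both int-sized and pointer-sized
 * atomics are always lock-free on this target. */
static bool atomics_always_lock_free(void)
{
    return ATOMIC_INT_LOCK_FREE == 2 && ATOMIC_POINTER_LOCK_FREE == 2;
}
```

Note that these macros say nothing about address-freedom, which the
standard only recommends for lock-free atomics.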


Re: Update x86-64 PLT for MPX

2014-02-19 Thread H.J. Lu
On Mon, Jan 27, 2014 at 1:50 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Jan 27, 2014 at 1:42 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Sat, Jan 18, 2014 at 8:11 AM, H.J. Lu hjl.to...@gmail.com wrote:
 Hi,

 Here is the proposal to update x86-64 PLT for MPX.  The linker change
 is implemented on hjl/mpx/pltext8 branch.  Any comments/feedback?

 Thanks.

 --
 H.J.
 ---
 Intel MPX:

 http://software.intel.com/en-us/file/319433-017pdf

 introduces 4 bound registers, which will be used for parameter passing
 in x86-64.  Bound registers are cleared by branch instructions.  Branch
 instructions with BND prefix will keep bound register contents. This leads
 to 2 requirements to 64-bit MPX run-time:

 1. Dynamic linker (ld.so) should save and restore bound registers during
 symbol lookup.
 2. Change the current 16-byte PLT0:

   ff 35 08 00 00 00    pushq  GOT+8(%rip)
   ff 25 00 10 00 00    jmpq   *GOT+16(%rip)
   0f 1f 40 00          nopl   0x0(%rax)

 and 16-byte PLT1:

   ff 25 00 00 00 00    jmpq   *name@GOTPCREL(%rip)
   68 00 00 00 00       pushq  $index
   e9 00 00 00 00       jmpq   PLT0

 which clear bound registers, to preserve bound registers.

 We use 2 new relocations:

 #define R_X86_64_PC32_BND  39 /* PC relative 32 bit signed with BND prefix */
 #define R_X86_64_PLT32_BND 40 /* 32 bit PLT address with BND prefix */

 to mark branch instructions with BND prefix.

 When linker sees any R_X86_64_PC32_BND or R_X86_64_PLT32_BND relocations,
 it switches to a different PLT0:

   ff 35 08 00 00 00       pushq  GOT+8(%rip)
   f2 ff 25 00 10 00 00    bnd jmpq *GOT+16(%rip)
   0f 1f 00                nopl   (%rax)

 to preserve bound registers for symbol lookup and it also creates an
 external PLT section, .plt.bnd.  Linker will create a BND PLT1 entry
 in .plt:

   68 00 00 00 00       pushq  $index
   f2 e9 00 00 00 00    bnd jmpq PLT0
   0f 1f 44 00 00       nopl   0(%rax,%rax,1)

 and an 8-byte BND PLT entry in .plt.bnd:

   f2 ff 25 00 00 00 00  bnd jmpq *name@GOTPCREL(%rip)
   90                    nop

 Otherwise, linker will create a legacy PLT entry in .plt:

   68 00 00 00 00       pushq  $index
   e9 00 00 00 00       jmpq   PLT0
   66 0f 1f 44 00 00    nopw   0(%rax,%rax,1)

 and an 8-byte legacy PLT entry in .plt.bnd:

   ff 25 00 00 00 00 jmpq  *name@GOTPCREL(%rip)
   66 90 xchg  %ax,%ax

 The initial value of the GOT entry for name will be set to the
 pushq instruction in the corresponding entry in .plt.  Linker will
 resolve reference of symbol name to the entry in the second PLT,
 .plt.bnd.

 Prelink stores the offset of pushq of PLT1 (plt_base + 0x10) in GOT[1]
 and GOT[1] is stored in GOT[3].  We can undo prelink in GOT by computing
 the corresponding pushq offset with

 GOT[1] + (GOT offset - GOT[3]) * 2
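
The formula above can be written as a small C helper.  This is a sketch
under assumptions, not the dynamic linker's actual code: the function and
parameter names are hypothetical, "GOT[3]" is taken to mean the address
of the first PLT-related GOT slot, and the factor of 2 is read as the
ratio between a 16-byte .plt entry and an 8-byte GOT slot.

```c
#include <stdint.h>

/* Recompute the address of the pushq instruction belonging to one GOT slot.
 *
 *   got1      - value prelink stored in GOT[1]: address of the pushq in
 *               PLT1 (plt_base + 0x10 in the layout described above)
 *   got3_addr - address of GOT[3], the first PLT-related GOT slot
 *               (assumed interpretation of "GOT[3]" in the formula)
 *   slot_addr - address of the GOT slot being reset ("GOT offset")
 *
 * Each GOT slot is 8 bytes and each .plt entry is 16 bytes, so the byte
 * distance between GOT slots is doubled to get the distance in .plt. */
static uint64_t undo_prelink_plt(uint64_t got1, uint64_t got3_addr,
                                 uint64_t slot_addr)
{
    return got1 + (slot_addr - got3_addr) * 2;
}
```

For example, with plt_base at 0x1000 (so got1 is 0x1010) and GOT[3] at
0x3018, the slot at 0x3028 is the third PLT slot and maps to the pushq at
0x1030.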

 Since for each entry in .plt except for PLT0 we create an 8-byte entry in
 .plt.bnd, there is an extra 8 bytes per PLT symbol.

 We also investigated the 16-byte entry for .plt.bnd.  We compared the
 8-byte entry vs the 16-byte entry for .plt.bnd on Sandy Bridge.
 There are no performance differences in SPEC CPU 2000/2006 as well as
 micro benchmarks.

 Pros:
 No change to undo prelink in dynamic linker.
 Only 8-byte memory overhead for each PLT symbol.
 Cons:
 Extra .plt.bnd section is needed.
 Extra 8 byte for legacy branches to PLT.
 GDB is unaware of the new layout of .plt and .plt.bnd.

 Hi,

 I am enclosing the updated x86-64 psABI with PLT change.
 I checked it onto hjl/mpx/master branch at

 https://github.com/hjl-tools/x86-64-psABI

 I will check in the binutils changes if there are no disagreements
 in 2 weeks.

 Thanks.


 My email is rejected due to PDF attachment.   I am resubmitting it
 without PDF file.

I pushed the MPX binutils change into master:

https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=0ff2b86e7c14177ec7f9e1257f8e697814794017


-- 
H.J.


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-19 Thread Paul E. McKenney
On Wed, Feb 19, 2014 at 06:55:51PM +0100, Torvald Riegel wrote:
 On Wed, 2014-02-19 at 07:14 -0800, Paul E. McKenney wrote:
  On Wed, Feb 19, 2014 at 11:59:08AM +0100, Torvald Riegel wrote:

[ . . . ]

   On both sides, the compiler will see that mmap() (or similar) is called,
   so that means the data escapes to something unknown, which could create
   threads and so on.  So first, it can't do whole-program analysis for
   this state anymore, and has to assume that other C11 threads are
   accessing this memory.  Next, lock-free atomics are specified to be
   address-free, meaning that they must work independent of where in
   memory the atomics are mapped (see C++ (e.g., N3690) 29.4p3; that's a
   "should" and non-normative, but essential IMO).  Thus, this then boils
   down to just a simple case of synchronization.  (Of course, the rest of
   the ABI has to match too for the data exchange to work.)
  
  The compiler will see mmap() on the user side, but not on the kernel
  side.  On the kernel side, something special is required.
 
 Maybe -- you'll certainly know better :)
 
 But maybe it's not that hard: For example, if the memory is in current
 code made available to userspace via calling some function with an asm
 implementation that the compiler can't analyze, then this should be
 sufficient.

The kernel code would need to explicitly tell the compiler what portions
of the kernel address space were covered by this.  I would not want the
compiler to have to work it out based on observing interactions with the
page tables.  ;-)

  Agree that address-free would be nice as "shall" rather than "should".
  
I echo Peter's question about how one tags functions like mmap().

I will also remember this for the next time someone on the committee
discounts volatile.  ;-)

  5.  JITed code produced based on BPF: 
  https://lwn.net/Articles/437981/
 
 This might be special, or not, depending on how the JITed code gets
 access to data.  If this is via fixed addresses (e.g., (void*)0x123),
 then see above.  If this is through function calls that the compiler
 can't analyze, then this is like 4.

It could well be via the kernel reading its own symbol table, sort of
a poor-person's reflection facility.  I guess that would be for all
intents and purposes equivalent to your (void*)0x123.
   
   If it is replacing code generated by the compiler, then yes.  If the JIT
   is just filling in functions that had been undefined yet declared
   before, then the compiler will have seen the data escape through the
   function interfaces, and should be aware that there is other stuff.
  
  So one other concern would then be things like ftrace, kprobes,
  ksplice, and so on.  These rewrite the kernel binary at runtime, though
  in very limited ways.
 
 Yes.  Nonetheless, I wouldn't see a problem if they, say, rewrite with
 C11-compatible code (and same ABI) on a function granularity (and when
 the function itself isn't executing concurrently) -- this seems to be
 similar to just having another compiler compile this particular
 function.

Well, they aren't using C11-compatible code yet.  They do patch within
functions.  And in some cases, they make staged sequences of changes to
allow the patching to happen concurrently with other CPUs executing the
code being patched.  Not sure that any of the latter is actually in the
kernel at the moment, but it has at least been prototyped and discussed.

Thanx, Paul



Re: ARM inline assembly usage in Linux kernel

2014-02-19 Thread Renato Golin
On 19 February 2014 11:58, Richard Sandiford
rsand...@linux.vnet.ibm.com wrote:
 I agree that having an unrecognised asm shouldn't be a hard error until
 assembly time though.  Saleem, is the problem that this is being rejected
 earlier?

Hi Andrew, Richard,

Thanks for your reviews! We agree that we should actually just ignore
the contents until object emission.

Just for context, one of the reasons why we enabled inline assembly
checks is for some obscure cases when the snippet changes the
instruction set (arm -> thumb) and the rest of the function becomes
garbage. Our initial implementation was to always emit .arm/.thumb
after *any* inline assembly, which would become a nop in the worst
case. But since we had easy access to the assembler, we thought: "why
not?".

The idea is now to try to parse the snippet for cases like .arm/.thumb
but only emit a warning IFF -Wbad-inline-asm (or whatever) is set (and
not to make it on by default), otherwise, ignore. We're hoping our
assembler will be able to cope with the multiple levels of indirection
automagically. ;)

Thanks again!
--renato


Re: ARM inline assembly usage in Linux kernel

2014-02-19 Thread Andrew Pinski
On Wed, Feb 19, 2014 at 3:17 PM, Renato Golin renato.go...@linaro.org wrote:
 On 19 February 2014 11:58, Richard Sandiford
 rsand...@linux.vnet.ibm.com wrote:
 I agree that having an unrecognised asm shouldn't be a hard error until
 assembly time though.  Saleem, is the problem that this is being rejected
 earlier?

 Hi Andrew, Richard,

 Thanks for your reviews! We agree that we should actually just ignore
 the contents until object emission.

 Just for context, one of the reasons why we enabled inline assembly
 checks is for some obscure cases when the snippet changes the
 instruction set (arm -> thumb) and the rest of the function becomes
 garbage. Our initial implementation was to always emit .arm/.thumb
 after *any* inline assembly, which would become a nop in the worst
 case. But since we had easy access to the assembler, we thought: "why
 not?".

With the unified assembly format, you should not need those
.arm/.thumb and in fact emitting them can make things even worse.

Thanks,
Andrew Pinski



 The idea is now to try to parse the snippet for cases like .arm/.thumb
 but only emit a warning IFF -Wbad-inline-asm (or whatever) is set (and
 not to make it on by default), otherwise, ignore. We're hoping our
 assembler will be able to cope with the multiple levels of indirection
 automagically. ;)

 Thanks again!
 --renato


Re: ARM inline assembly usage in Linux kernel

2014-02-19 Thread Renato Golin
On 19 February 2014 23:19, Andrew Pinski pins...@gmail.com wrote:
 With the unified assembly format, you should not need those
 .arm/.thumb and in fact emitting them can make things even worse.

If only we could get rid of all pre-UAL inline assembly on the planet... :)

That has been the only reason why we added support for those in our
assembler, because GAS supports them and people still use them (or have
legacy code they won't change).

If the binutils folks (and you guys) are happy to start seriously
de-phasing pre-UAL support, I'd be more than happy to do so on our
end. Do you think I should start that conversation on the binutils
list?

Maybe a new serious compulsory warning, to start?

cheers,
--renato


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-19 Thread Linus Torvalds
On Tue, Feb 18, 2014 at 11:47 AM, Torvald Riegel trie...@redhat.com wrote:
 On Tue, 2014-02-18 at 09:44 -0800, Linus Torvalds wrote:

 Can you point to it? Because I can find a draft standard, and it sure
 as hell does *not* contain any clarity of the model. It has a *lot* of
 verbiage, but it's pretty much impossible to actually understand, even
 for somebody who really understands memory ordering.

 http://www.cl.cam.ac.uk/~mjb220/n3132.pdf
 This has an explanation of the model up front, and then the detailed
 formulae in Section 6.  This is from 2010, and there might have been
 smaller changes since then, but I'm not aware of any bigger ones.

Ahh, this is different from what others pointed at. Same people,
similar name, but not the same paper.

I will read this version too, but from reading the other one and the
standard in parallel and trying to make sense of it, it seems that I
may have originally misunderstood part of the whole control dependency
chain.

The fact that the left side of "? :", "&&" and "||" breaks data
dependencies made me originally think that the standard tried very
hard to break any control dependencies. Which I felt was insane, when
then some of the examples literally were about the testing of the
value of an atomic read. The data dependency matters quite a bit. The
fact that the other Mathematical paper then very much talked about
consume only in the sense of following a pointer made me think so even
more.

But reading it some more, I now think that the whole data dependency
logic (which is where the special left-hand side rule of the ternary
and logical operators come in) are basically an exception to the rule
that sequence points end up being also meaningful for ordering (ok, so
C11 seems to have renamed "sequence points" to "sequenced before").

So while an expression like

atomic_read(p, consume) ? a : b;

doesn't have a data dependency from the atomic read that forces
serialization, writing

   if (atomic_read(p, consume))
  a;
   else
  b;

the standard *does* imply that the atomic read is "happens-before" wrt
"a", and I'm hoping that there is no question that the control
dependency still acts as an ordering point.

THAT was one of my big confusions, the discussion about control
dependencies and the fact that the logical ops broke the data
dependency made me believe that the standard tried to actively avoid
the whole issue with "control dependencies can break ordering
dependencies on some CPU's due to branch prediction and memory
re-ordering by the CPU".

But after all the reading, I'm starting to think that that was never
actually the implication at all, and the "logical ops breaks the data
dependency rule" is simply an exception to the sequence point rule.
All other sequence points still do exist, and do imply an ordering
that matters for "consume".

Am I now reading it right?

So the clarification is basically to the statement that the "if
(consume(p)) a" version *would* have an ordering guarantee between the
read of "p" and "a", but the "consume(p) ? a : b" would *not* have
such an ordering guarantee. Yes?

   Linus


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-19 Thread Paul E. McKenney
On Wed, Feb 19, 2014 at 04:53:49PM -0800, Linus Torvalds wrote:
 On Tue, Feb 18, 2014 at 11:47 AM, Torvald Riegel trie...@redhat.com wrote:
  On Tue, 2014-02-18 at 09:44 -0800, Linus Torvalds wrote:
 
  Can you point to it? Because I can find a draft standard, and it sure
  as hell does *not* contain any clarity of the model. It has a *lot* of
  verbiage, but it's pretty much impossible to actually understand, even
  for somebody who really understands memory ordering.
 
  http://www.cl.cam.ac.uk/~mjb220/n3132.pdf
  This has an explanation of the model up front, and then the detailed
  formulae in Section 6.  This is from 2010, and there might have been
  smaller changes since then, but I'm not aware of any bigger ones.
 
 Ahh, this is different from what others pointed at. Same people,
 similar name, but not the same paper.
 
 I will read this version too, but from reading the other one and the
 standard in parallel and trying to make sense of it, it seems that I
 may have originally misunderstood part of the whole control dependency
 chain.
 
 The fact that the left side of "? :", "&&" and "||" breaks data
 dependencies made me originally think that the standard tried very
 hard to break any control dependencies. Which I felt was insane, when
 then some of the examples literally were about the testing of the
 value of an atomic read. The data dependency matters quite a bit. The
 fact that the other Mathematical paper then very much talked about
 consume only in the sense of following a pointer made me think so even
 more.
 
 But reading it some more, I now think that the whole data dependency
 logic (which is where the special left-hand side rule of the ternary
 and logical operators come in) are basically an exception to the rule
 that sequence points end up being also meaningful for ordering (ok, so
 C11 seems to have renamed "sequence points" to "sequenced before").
 
 So while an expression like
 
 atomic_read(p, consume) ? a : b;
 
 doesn't have a data dependency from the atomic read that forces
 serialization, writing
 
if (atomic_read(p, consume))
   a;
else
   b;
 
 the standard *does* imply that the atomic read is "happens-before" wrt
 "a", and I'm hoping that there is no question that the control
 dependency still acts as an ordering point.

The control dependency should order subsequent stores, at least assuming
that a and b don't start off with identical stores that the compiler
could pull out of the if and merge.  The same might also be true for ?:
for all I know.  (But see below)

That said, in this case, you could substitute relaxed for consume and get
the same effect.  The return value from atomic_read() gets absorbed into
the if condition, so there is no dependency-ordered-before relationship,
so nothing for consume to do.

One caution...  The happens-before relationship requires you to trace a
full path between the two operations of interest.  This is illustrated
by the following example, with both x and y initially zero:

	T1:	atomic_store_explicit(&x, 1, memory_order_relaxed);
		r1 = atomic_load_explicit(&y, memory_order_relaxed);

	T2:	atomic_store_explicit(&y, 1, memory_order_relaxed);
		r2 = atomic_load_explicit(&x, memory_order_relaxed);

There is a happens-before relationship between T1's load and store,
and another happens-before relationship between T2's load and store,
but there is no happens-before relationship from T1 to T2, and none
in the other direction, either.  And you don't get to assume any
ordering based on reasoning about these two disjoint happens-before
relationships.

So it is quite possible for r1==0&&r2==0 after both threads complete.

Which should be no surprise: This misordering can happen even on x86,
which would need a full smp_mb() to prevent it.
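
For reference, here is a minimal C++11 sketch of the two-thread example,
assuming std::atomic and std::thread as stand-ins for the C11 calls; the
function name is made up for illustration.  With relaxed ordering every
combination of 0 and 1 is a permitted result, and the case where both loads
return 0 is the one a full barrier is needed to rule out.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Runs the two-thread litmus test once and returns {r1, r2}.
// Hypothetical helper, not taken from the thread.
std::pair<int, int> run_store_buffering_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_relaxed);   // T1's store
        r1 = y.load(std::memory_order_relaxed);  // T1's load
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_relaxed);   // T2's store
        r2 = x.load(std::memory_order_relaxed);  // T2's load
    });
    t1.join();
    t2.join();
    // Nothing orders T1 against T2, so no outcome in {0,1} x {0,1}
    // can be excluded by reasoning about each thread separately.
    return {r1, r2};
}
```

A single run will usually not show the both-zero outcome; it is allowed, not
guaranteed.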

 THAT was one of my big confusions, the discussion about control
 dependencies and the fact that the logical ops broke the data
 dependency made me believe that the standard tried to actively avoid
 the whole issue with "control dependencies can break ordering
 dependencies on some CPU's due to branch prediction and memory
 re-ordering by the CPU".
 
 But after all the reading, I'm starting to think that that was never
 actually the implication at all, and the "logical ops breaks the data
 dependency" rule is simply an exception to the sequence point rule.
 All other sequence points still do exist, and do imply an ordering
 that matters for "consume".
 
 Am I now reading it right?

As long as there is an unbroken chain of -data- dependencies from the
consume to the later access in question, and as long as that chain
doesn't go through the excluded operations, yes.
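
A small C++11 sketch of such an unbroken data-dependency chain, under the
assumption that the consumer dereferences the loaded pointer directly.  The
Payload type and the function name are hypothetical.  Note that current
compilers are generally understood to implement memory_order_consume by
promoting it to acquire.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Payload { int value; };  // hypothetical payload type

// Publishes a pointer with release and reads through it with consume.
int consume_through_pointer() {
    Payload payload{0};
    std::atomic<Payload*> published{nullptr};

    std::thread producer([&] {
        payload.value = 42;                                    // initialize
        published.store(&payload, std::memory_order_release);  // publish
    });

    Payload* p;
    while ((p = published.load(std::memory_order_consume)) == nullptr) {
        // spin until the pointer has been published
    }
    // The address of p->value is computed from the loaded pointer, so the
    // read carries a data dependency from the consume load.
    int seen = p->value;
    producer.join();
    return seen;
}
```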

 So the clarification is basically to the statement that the "if
 (consume(p)) a" version *would* have an ordering guarantee between the
 read of p and a, but the "consume(p) ? a : b" would *not* have
 such an ordering guarantee. Yes?

Neither has a data-dependency guarantee, because there is no data
dependency from the load to either a or b.  

Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-19 Thread Linus Torvalds
On Wed, Feb 19, 2014 at 8:01 PM, Paul E. McKenney
paul...@linux.vnet.ibm.com wrote:

 The control dependency should order subsequent stores, at least assuming
 that a and b don't start off with identical stores that the compiler
 could pull out of the if and merge.  The same might also be true for ?:
 for all I know.  (But see below)

Stores I don't worry about so much because

 (a) you can't sanely move stores up in a compiler anyway
 (b) no sane CPU moves stores up, since they aren't on the critical path

so a read-cmp-store is actually really hard to make anything sane
re-order. I'm sure it can be done, and I'm sure it's stupid as hell.

But that it's hard to screw up is *not* true for a load-cmp-load.

So let's make this really simple: if you have a consume-cmp-read, is
the ordering of the two reads guaranteed?

 As long as there is an unbroken chain of -data- dependencies from the
 consume to the later access in question, and as long as that chain
 doesn't go through the excluded operations, yes.

So let's make it *really* specific, and make it real code doing a real
operation, that is actually realistic and reasonable in a threaded
environment, and may even be in some critical code.

The issue is the read-side ordering guarantee for 'a' and 'b', for this case:

 - Initial state:

   a = b = 0;

 - Thread 1 (consumer):

    if (atomic_read(&a, consume))
        return b;
    /* not yet initialized */
    return -1;

 - Thread 2 (initializer):

 b = some_value_lets_say_42;
 /* We are now ready to party */
 atomic_write(&a, 1, release);

and quite frankly, if there is no ordering guarantee between the read
of a and the read of b in the consumer thread, then the C atomics
standard is broken.

Put another way: I claim that if thread 1 ever sees a return value
other than -1 or 42, then the whole definition of atomics is broken.

Question 2: and what changes if the atomic_read() is turned into an
acquire, and why? Does it start working?
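
For the acquire variant asked about in Question 2, a minimal C++11 sketch,
assuming std::atomic<int> for the flag and a plain int for b (the function name
is invented for illustration).  A release store that is read by an acquire load
synchronizes-with it, so under these assumptions the function can only return
-1 or 42.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Returns what the consumer thread would return in the example.
int acquire_consumer_result() {
    int b = 0;              // plain, non-atomic payload
    std::atomic<int> a{0};  // "ready" flag

    std::thread initializer([&] {
        b = 42;                                 // set up the data
        a.store(1, std::memory_order_release);  // then publish the flag
    });

    int result = -1;  // "not yet initialized"
    if (a.load(std::memory_order_acquire)) {
        // The acquire load saw the release store, so the write to b
        // happens-before this read.
        result = b;
    }
    initializer.join();
    return result;
}
```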

 Neither has a data-dependency guarantee, because there is no data
 dependency from the load to either a or b.  After all, the value
 loaded got absorbed into the if condition.  However, according to
 discussions earlier in this thread, the if variant would have a
 control-dependency ordering guarantee for any stores in a and b
 (but not loads!).

So exactly what part of the standard allows the loads to be
re-ordered, and why? Quite frankly, I'd think that any sane person
will agree that the above code snippet is realistic, and that my
requirement that thread 1 sees either -1 or 42 is valid.

And if the C standards body has said that control dependencies break
the read ordering, then I really think that the C standards committee
has screwed up.

If the consumer of an atomic load isn't a pointer chasing operation,
then the consume should be defined to be the same as acquire. None of
this "conditionals break consumers". No, conditionals on the
dependency path should turn consumers into acquire, because otherwise
the consume load is dangerous as hell.

And if the definition of acquire doesn't include the control
dependency either, then the C atomic memory model is just completely
and utterly broken, since the above *trivial* and clearly useful
example is broken.

I really think the above example is pretty damn black-and-white.
Either it works, or the standard isn't worth wiping your ass with.

  Linus


Re: Building GCC with -Wmissing-declarations and addressing its warnings

2014-02-19 Thread Jonathan Wakely
On 13 February 2014 20:47, Patrick Palka wrote:
 On a related note, would a patch to officially enable
 -Wmissing-declarations in the build process be well regarded?

What would be the advantage?

  Since
 -Wmissing-prototypes is currently enabled, I assume it is the
 intention of the GCC devs to address these warnings, and that during
 the transition from a C to C++ bootstrap compiler a small oversight
 was made (that -Wmissing-prototypes is a no-op against C++ source
 files).

The additional safety provided by -Wmissing-prototypes is already
guaranteed for C++.

In C a missing prototype causes the compiler to guess, probably
incorrectly, how to call the function.

In C++ a function cannot be called without a previous declaration and
the linker will notice if you declare a function with one signature
and define it differently.


[Bug c++/60258] Member initialization for atomic fail.

2014-02-19 Thread redi at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60258

Jonathan Wakely redi at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2014-02-19
 Ever confirmed|0   |1


[Bug c/57896] [4.8 Regression] ICE in expand_expr_real_2

2014-02-19 Thread mpolacek at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57896

Marek Polacek mpolacek at gcc dot gnu.org changed:

   What|Removed |Added

 CC||mpolacek at gcc dot gnu.org
   Target Milestone|--- |4.8.4
Summary|ICE in expand_expr_real_2   |[4.8 Regression] ICE in
   ||expand_expr_real_2

--- Comment #7 from Marek Polacek mpolacek at gcc dot gnu.org ---
Currently, I see ICE only with 4.8:

./cc1 xx.c -m32 -quiet -march=x86-64

In function ‘__get_cpuid_max’:
Segmentation fault
 }
 ^
0xa3924f crash_signal
/home/marek/src/gcc/gcc/toplev.c:332
0x11fc5fe pp_base_string(pretty_print_info*, char const*)
/home/marek/src/gcc/gcc/pretty-print.c:835
0x11fd0a4 pp_base_format(pretty_print_info*, text_info*)
/home/marek/src/gcc/gcc/pretty-print.c:496
0x11fa7ac diagnostic_report_diagnostic(diagnostic_context*, diagnostic_info*)
/home/marek/src/gcc/gcc/diagnostic.c:755
0x11faabb internal_error(char const*, ...)
/home/marek/src/gcc/gcc/diagnostic.c:1094
0x9b9aa7 rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int,
char const*)
/home/marek/src/gcc/gcc/rtl.c:773
0xc93a5d pro_epilogue_adjust_stack
/home/marek/src/gcc/gcc/config/i386/i386.c:9517
0xca0b20 ix86_expand_prologue()
/home/marek/src/gcc/gcc/config/i386/i386.c:10491
0xd645da gen_prologue()
/home/marek/src/gcc/gcc/config/i386/i386.md:11810
0x80dc79 thread_prologue_and_epilogue_insns
/home/marek/src/gcc/gcc/function.c:5949
0x8110a2 rest_of_handle_thread_prologue_and_epilogue
/home/marek/src/gcc/gcc/function.c:6973
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See http://gcc.gnu.org/bugs.html for instructions.

[Bug target/57896] [4.8 Regression] ICE in expand_expr_real_2

2014-02-19 Thread mpolacek at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57896

Marek Polacek mpolacek at gcc dot gnu.org changed:

   What|Removed |Added

  Component|c   |target

--- Comment #8 from Marek Polacek mpolacek at gcc dot gnu.org ---
Not a C FE issue.


[Bug c/60170] No -Wtype-limits warning with -O1

2014-02-19 Thread mpolacek at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60170

Marek Polacek mpolacek at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2014-02-19
 CC||mpolacek at gcc dot gnu.org
 Ever confirmed|0   |1


[Bug c/59933] for loop goes wild with assert() enabled

2014-02-19 Thread mpolacek at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59933

Marek Polacek mpolacek at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||mpolacek at gcc dot gnu.org
 Resolution|--- |INVALID

--- Comment #4 from Marek Polacek mpolacek at gcc dot gnu.org ---
There's no good reproducer, but if -fno-aggressive-loop-optimizations helps,
most likely the code invokes undefined behavior.  Tentatively closing as
invalid (PR59982 was closed too).


[Bug c++/60267] ICE in c_pp_lookup_pragma, at c-family/c-pragma.c:1232; ICE in tsubst_copy, at cp/pt.c:12887

2014-02-19 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60267

--- Comment #5 from Jakub Jelinek jakub at gcc dot gnu.org ---
Created attachment 32167
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32167&action=edit
gcc49-pr60267.patch

Untested fix for the preprocessing ICE.  So, with this patch you should be able
to preprocess the file now.


[Bug rtl-optimization/60268] [4.9 regression] ICE: in rank_for_schedule, at haifa-sched.c:2557

2014-02-19 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60268

Richard Biener rguenth at gcc dot gnu.org changed:

   What|Removed |Added

   Target Milestone|--- |4.9.0


[Bug ipa/60266] [4.9 Regression] ICE: in ipa_get_parm_lattices, at ipa-cp.c:261 during LibreOffice LTO build

2014-02-19 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60266

Richard Biener rguenth at gcc dot gnu.org changed:

   What|Removed |Added

   Target Milestone|--- |4.9.0

--- Comment #1 from Richard Biener rguenth at gcc dot gnu.org ---
Similar to recently fixed maybe this is either the call fn or the static chain?


[Bug fortran/59537] Automatic array cannot have an initializer, for -finit-real and a SAVE statement present in subroutine

2014-02-19 Thread bugs at stellardeath dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59537

--- Comment #1 from Lorenz Hüdepohl bugs at stellardeath dot org ---
Maybe related to the already fixed #51800?

[Bug fortran/59537] Automatic array cannot have an initializer, for -finit-real and a SAVE statement present in subroutine

2014-02-19 Thread dominiq at lps dot ens.fr
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59537

Dominique d'Humieres dominiq at lps dot ens.fr changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2014-02-19
 Ever confirmed|0   |1

--- Comment #2 from Dominique d'Humieres dominiq at lps dot ens.fr ---
 Maybe related to the already fixed #51800?

Maybe, but this PR is still present on trunk (4.9 r207856), 4.7.4 and 4.8.2.


[Bug ipa/60243] IPA is slow on large cgraph tree

2014-02-19 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

--- Comment #8 from Richard Biener rguenth at gcc dot gnu.org ---
Author: rguenth
Date: Wed Feb 19 09:29:34 2014
New Revision: 207879

URL: http://gcc.gnu.org/viewcvs?rev=207879&root=gcc&view=rev
Log:
2014-02-19  Richard Biener  rguent...@suse.de

PR ipa/60243
* ipa-prop.c: Include stringpool.h and tree-ssanames.h.
(ipa_modify_call_arguments): Emit an argument load explicitely and
preserve virtual SSA form there and for the replacement call.
Do not update SSA form nor free dominance info.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/ipa-prop.c


[Bug rtl-optimization/60268] [4.9 regression] ICE: in rank_for_schedule, at haifa-sched.c:2557

2014-02-19 Thread abel at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60268

Andrey Belevantsev abel at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |abel at gcc dot gnu.org

--- Comment #3 from Andrey Belevantsev abel at gcc dot gnu.org ---
I'm out of office today, so I'll have a look properly tomorrow, but...

(In reply to Jakub Jelinek from comment #2)
 So perhaps:
 --- gcc/haifa-sched.c 2014-02-18 08:18:53.045024428 +0100
 +++ gcc/haifa-sched.c 2014-02-19 07:58:38.191381581 +0100
 @@ -2550,7 +2550,7 @@ rank_for_schedule (const void *x, const
   return INSN_LUID (tmp) - INSN_LUID (tmp2);
  }
  
 -  if (live_range_shrinkage_p)
 +  if (live_range_shrinkage_p && sched_pressure != SCHED_PRESSURE_NONE)
  {
/* Don't use SCHED_PRESSURE_MODEL -- it results in much worse
code.  */

...  the fired assert below this code means that we have turned off
sched-pressure on the new region (unexpectedly to live_range_shrinkage) and I'd
like to know how this region was added.  I guess I missed some entry point
within the new scheduler code when fixing the previous PR.

 
 BTW, why
   if (sched_pressure != SCHED_PRESSURE_NONE)
 free_global_sched_pressure_data ();
 when free_global_sched_pressure_data () contains the same guard and thus
 could be called unconditionally?

Pilot error while being over cautious, I will simplify that too.

[Bug c++/60267] ICE in c_pp_lookup_pragma, at c-family/c-pragma.c:1232; ICE in tsubst_copy, at cp/pt.c:12887

2014-02-19 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60267

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2014-02-19
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #6 from Jakub Jelinek jakub at gcc dot gnu.org ---
Created attachment 32168
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32168&action=edit
gcc49-pr60267-2.patch

Untested fix for the tsubst ICE.  Of course, without preprocessed testcase I
can't be sure if this patch fixed it.


[Bug c/59193] Unused postfix operator temporaries

2014-02-19 Thread manu at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59193

Manuel López-Ibáñez manu at gcc dot gnu.org changed:

   What|Removed |Added

 CC||manu at gcc dot gnu.org

--- Comment #5 from Manuel López-Ibáñez manu at gcc dot gnu.org ---
(In reply to Max TenEyck Woodbury from comment #4)
 Since there are hundreds, if not thousands of instances of this defect in the
 GCC code and there is no urgency in correcting these defects, this bug will
 only get resolved slowly.  Closing it for invalid reasons does the community a
 disservice.

Are you planning to help in fixing these and other problems? If so, please
start the copyright assignment process:
http://gcc.gnu.org/contribute.html#legal

Then, to get your feet wet, it would be better to start with some
uncontroversial bugs like: PR25801, or PR55080, or PR57622 or PR52347.

I have a long list of easy hacks that would help GCC and its users a lot.

[Bug c++/60269] New: #pragma simd tsubst related ICE

2014-02-19 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60269

Bug ID: 60269
   Summary: #pragma simd tsubst related ICE
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jakub at gcc dot gnu.org
CC: bviyer at gcc dot gnu.org

template <int N>
void
foo (int *a, int *b, int *c)
{
#pragma simd vectorlength (N)
  for (int i = 0; i < N; i++)
    a[i] = b[i] * c[i];
}

void
bar (int *a, int *b, int *c)
{
  foo <64> (a, b, c);
}

ICEs with -fcilkplus, so clearly tsubst on it is not performed correctly.


[Bug target/60204] struct with __m512i is mishandled in function parameter passing and return

2014-02-19 Thread ubizjak at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60204

Uroš Bizjak ubizjak at gmail dot com changed:

   What|Removed |Added

 CC||ubizjak at gmail dot com
   Target Milestone|--- |4.9.0
   Severity|normal  |major

--- Comment #3 from Uroš Bizjak ubizjak at gmail dot com ---
(In reply to H.J. Lu from comment #2)

 It is a bug in psABI.  It should read as eight \eightbytes.

So, let's fix the psABI first and then fix gcc.

This PR should be resolved before 4.9 is released.

[Bug target/60204] struct with __m512i is mishandled in function parameter passing and return

2014-02-19 Thread ubizjak at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60204

--- Comment #4 from Uroš Bizjak ubizjak at gmail dot com ---
(In reply to Uroš Bizjak from comment #3)

 So, let's fix the psABI first and then fix gcc.

psABI is fixed in [1].

[1]
https://github.com/hjl-tools/x86-64-psABI/commit/6d7ccd614fe67111d2aecec853c3df0310b372d2

[Bug target/57232] wcstol.c:213:1: internal compiler error

2014-02-19 Thread aoliva at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57232

Alexandre Oliva aoliva at gcc dot gnu.org changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |aoliva at gcc dot gnu.org

--- Comment #15 from Alexandre Oliva aoliva at gcc dot gnu.org ---
Mine.  It looks like the call to cselib_preserve_cfa_base_value in
vt_initialize should be guarded by some conditions, like this:

  if (reg != hard_frame_pointer_rtx && fixed_regs[REGNO (reg)])
    cselib_preserve_cfa_base_value (val, REGNO (reg));

This fixes the reported problem for me (though I haven't otherwise regtested
it).  Daniel, Jon, Nick, Sebastian, does it fix the problem for you?


[Bug c/59193] Unused postfix operator temporaries

2014-02-19 Thread pinskia at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59193

Andrew Pinski pinskia at gcc dot gnu.org changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |INVALID

--- Comment #6 from Andrew Pinski pinskia at gcc dot gnu.org ---
Take:
int f(int a)
{
  a++;
  return a;
}

int g(int a)
{
  ++a;
  return a;
}
--- CUT --- 

The gimplifier produces the exact same IR for both cases:
f (int a)
{
  int D.1790;

  a = a + 1;
  D.1790 = a;
  return D.1790;
}


g (int a)
{
  int D.1792;

  a = a + 1;
  D.1792 = a;
  return D.1792;
}
--- CUT --- 

So the compiler is already smart enough to remove the temporary storage even
at -O0.

With:
int f(int a)
{
  int b = a++;
  return a;
}
It does not remove it but that is because the result of a++ is not unused.

So the gimplifier knows whether the result is unused, and creates the temporary
only when it is used.  This is the same issue as memcpy and its return value.
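
The three cases can be put side by side in a short sketch (function names are
chosen here for illustration only).  The first two discard the value of the
increment expression and so behave identically; only the third keeps the old
value, which is where the temporary is actually needed.

```cpp
#include <cassert>

// Result of a++ is discarded: equivalent to the prefix form below.
int post_inc_unused(int a) {
    a++;
    return a;
}

// Result of ++a is discarded.
int pre_inc_unused(int a) {
    ++a;
    return a;
}

// Result of a++ is used: the old value has to be kept somewhere.
int post_inc_used(int a) {
    int old = a++;
    return old;
}
```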


[Bug fortran/60255] Deferred character length variable at (1) cannot yet be associated with unlimited polymorphic entities

2014-02-19 Thread dominiq at lps dot ens.fr
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60255

Dominique d'Humieres dominiq at lps dot ens.fr changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2014-02-19
 Ever confirmed|0   |1


[Bug fortran/60238] Allow colon-separated triplet in array initialization

2014-02-19 Thread dominiq at lps dot ens.fr
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60238

Dominique d'Humieres dominiq at lps dot ens.fr changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2014-02-19
 Ever confirmed|0   |1

--- Comment #2 from Dominique d'Humieres dominiq at lps dot ens.fr ---
Anybody against closing this PR as WONTFIX?


[Bug c/59193] Unused postfix operator temporaries

2014-02-19 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59193

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek jakub at gcc dot gnu.org ---
Also note that even at -O0, if the value of ++a or a++ isn't used and a has
integral/pointer type, at no point during compilation with GCC/G++ is one form
more efficient than the other.  It is just a different tree code
({PRE,POST}{IN,DE}CREMENT_EXPR), but with the same operand, type etc.
So, there is no waste of any resources, it is not a defect to use either style,
it is purely coding convention matter.


[Bug libstdc++/60270] New: [C++1y] std::quoted is too eager to clear the string

2014-02-19 Thread redi at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60270

Bug ID: 60270
   Summary: [C++1y] std::quoted is too eager to clear the string
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redi at gcc dot gnu.org

Lars Gullik Bjønnes pointed out that we clear the string too early in
std::quoted, which causes this to fail:

#include <string>
#include <sstream>
#include <iomanip>
#include <cassert>

int main()
{
  std::istringstream in;
  std::string s = "xxx";
  in >> s;
  assert( !s.empty() );
  in >> std::quoted(s);
  assert( !s.empty() );  // fails
}

[Bug libstdc++/60271] New: [C++1y] std::max(initializer_list<T>) cannot use std::max_element

2014-02-19 Thread redi at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60271

Bug ID: 60271
   Summary: [C++1y] std::max(initializer_list<T>) cannot use
std::max_element
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redi at gcc dot gnu.org

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3893.html#2350
(approved in Issaquah) adds constexpr to std::max(initializer_list<T>), which
means we can't use std::max_element to implement it.
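
One possible shape for a constexpr-friendly replacement is a hand-written loop,
sketched below under the assumption of C++14 relaxed constexpr rules.  The name
max_of is hypothetical and this is not the libstdc++ implementation; it also
assumes a non-empty list, as std::max does.

```cpp
#include <cassert>
#include <initializer_list>

// Returns the largest element; for equal elements the leftmost one wins.
// Precondition: values is not empty.
template <typename T>
constexpr T max_of(std::initializer_list<T> values) {
    const T* best = values.begin();
    for (const T* it = values.begin(); it != values.end(); ++it) {
        if (*best < *it)
            best = it;
    }
    return *best;
}

// Usable in a constant expression, which is the point of the change.
static_assert(max_of({3, 9, 4}) == 9, "evaluated at compile time");
```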


[Bug target/60204] struct with __m512i is mishandled in function parameter passing and return

2014-02-19 Thread tocarip.intel at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60204

--- Comment #5 from tocarip.intel at gmail dot com ---
Created attachment 32169
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32169&action=edit
Proposed patch.

Currently testing attached patch.


[Bug c++/60272] New: atomic::compare_exchange_weak has spurious store and can cause race conditions

2014-02-19 Thread anthony.ajw at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60272

Bug ID: 60272
   Summary: atomic::compare_exchange_weak has spurious store and
can cause race conditions
   Product: gcc
   Version: 4.8.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: anthony.ajw at gmail dot com

Created attachment 32170
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32170&action=edit
Sample code that demonstrates the problem

G++ 4.8.1 is producing incorrect code for std::atomic::compare_exchange_weak
on x86-64 linux.

In particular, if the exchange succeeds, then there is an additional spurious
store to the expected parameter after the exchange, which may race with other
threads and cause problems.

e.g.

#include <atomic>
struct Node { Node* next; };
void Push(std::atomic<Node*>& head, Node* node)
{
    node->next = head.load();
    while(!head.compare_exchange_weak(node->next, node))
        ;
}

When compiled with

g++-4.8 -S -std=c++11 -pthread -O3 t.cpp

the generated code is:

    movq    (%rdi), %rax
    movq    %rax, (%rsi)
    movq    (%rsi), %rax
    .p2align 4,,10
    .p2align 3
.L3:
    lock; cmpxchgq  %rsi, (%rdi)
    movq    %rax, (%rsi) ***
    jne     .L3
    rep; ret

The line marked *** is an unconditional store to node->next in this
example, and will be executed even if the exchange is successful.

This will cause a race with code that uses the compare-exchange to order memory
operations.

e.g.

void Pop(std::atomic<Node*>& head){
    for(;;){
        Node* value=head.exchange(nullptr);
        if(value){
            delete value;
            break;
        }
    }
}

If the exchange successfully retrieves a non-null value, it should be OK to
delete it (assuming the node was allocated with new). However, if one thread is
calling Push() and is suspended after the CMPXCHG and before the line marked
*** is executed then another thread running Pop() can successfully complete
the exchange and call delete. When the first thread is resumed, the line marked
*** will then store to deallocated memory.

This is in contradiction to 29.6.5p21 of the C++ Standard, which states that
expected is only updated in the case of failure.
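
The required behaviour can be checked in a single thread with a small sketch.
CasOutcome and try_cas are names invented here, and compare_exchange_strong is
used instead of the weak form so that spurious failures do not get in the way.

```cpp
#include <atomic>
#include <cassert>

// What one compare-exchange attempt left behind (hypothetical helper type).
struct CasOutcome {
    bool ok;             // did the exchange succeed?
    int expected_after;  // value of "expected" after the call
    int value_after;     // value of the atomic after the call
};

CasOutcome try_cas(int initial, int expected, int desired) {
    std::atomic<int> a{initial};
    bool ok = a.compare_exchange_strong(expected, desired);
    // On success "expected" must be untouched; on failure it must hold
    // the value that was actually found in the atomic.
    return {ok, expected, a.load()};
}
```

This only shows the observable values.  The race in the report comes from the
store itself being issued on the success path, which a single-threaded check
like this cannot detect.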


[Bug c++/60272] atomic::compare_exchange_weak has spurious store and can cause race conditions

2014-02-19 Thread redi at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60272

Jonathan Wakely redi at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2014-02-19
 Ever confirmed|0   |1


[Bug tree-optimization/60172] ARM performance regression from trunk@207239

2014-02-19 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60172

--- Comment #10 from Joey Ye joey.ye at arm dot com ---
(In reply to rguent...@suse.de from comment #9)
 On Mon, 17 Feb 2014, joey.ye at arm dot com wrote:
 
 
 But that doesn't make sense - it means that -fdisable-tree-forwprop4
 should get numbers back to good speed, no?  Because that's the
 only change forwprop4 does.
-fdisable-tree-forwprop4 dooms other transformations and results in slightly
worse code than before, so the number isn't back to the best. I think forwprop4
does some good stuff here and disabling it isn't the solution.
 
 For completeness please base checks on r207316 (it contains a fix
 for the blamed revision, but as far as I can see it shouldn't make
 a difference for the testcase).
I'm playing with r207686 and it is the same for this case.
 
 Did you check whether my hackish patch fixes things?
I did with trunk 20140208. But it didn't make any difference to Proc_8


[Bug tree-optimization/60172] ARM performance regression from trunk@207239

2014-02-19 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60172

--- Comment #11 from Joey Ye joey.ye at arm dot com ---
Repost from another record. It is annoying that after commenting one record it
automatically jumps to the next.

Here is good expansion:
;; _41 = _42 * 4;

(insn 20 19 0 (set (reg:SI 126 [ D.5038 ])
(ashift:SI (reg/v:SI 131 [ Int_1_Par_Val ])
(const_int 2 [0x2]))) -1
 (nil))

;; _40 = _2 + _41;

(insn 21 20 22 (set (reg:SI 136 [ D.5035 ])
(plus:SI (reg/v/f:SI 130 [ Arr_2_Par_Ref ])
(reg:SI 119 [ D.5036 ]))) -1
 (nil))

(insn 22 21 0 (set (reg/f:SI 125 [ D.5035 ])
(plus:SI (reg:SI 136 [ D.5035 ])
(reg:SI 126 [ D.5038 ]))) -1
 (nil))


;; MEM[(int[25] *)_51 + 20B] = _34;

(insn 29 28 30 (set (reg:SI 139)
(plus:SI (reg/v/f:SI 130 [ Arr_2_Par_Ref ])
(reg:SI 119 [ D.5036 ]))) Proc_8.c:23 -1
 (nil))

(insn 30 29 31 (set (reg:SI 140)
(plus:SI (reg:SI 139)
(reg:SI 126 [ D.5038 ]))) Proc_8.c:23 -1
 (nil))

(insn 31 30 32 (set (reg/f:SI 141)
(plus:SI (reg:SI 140)
(const_int 1000 [0x3e8]))) Proc_8.c:23 -1
 (nil))

(insn 32 31 0 (set (mem:SI (plus:SI (reg/f:SI 141)
(const_int 20 [0x14])) [2 MEM[(int[25] *)_51 + 20B]+0 S4 A32])
(reg:SI 124 [ D.5039 ])) Proc_8.c:23 -1
 (nil))

After cse1, reg 140 can be replaced by reg 125, which leads to a series of
transformations that make the code much more efficient.

Here is bad expansion:
;; _40 = Arr_2_Par_Ref_22(D) + _12;

(insn 22 21 23 (set (reg:SI 138 [ D.5038 ])
(plus:SI (reg:SI 128 [ D.5038 ])
(reg:SI 121 [ D.5036 ]))) -1
 (nil))

(insn 23 22 0 (set (reg/f:SI 127 [ D.5035 ])
(plus:SI (reg/v/f:SI 132 [ Arr_2_Par_Ref ])
(reg:SI 138 [ D.5038 ]))) -1
 (nil))

;; _32 = _20 + 1000;

(insn 29 28 0 (set (reg:SI 124 [ D.5038 ])
(plus:SI (reg:SI 121 [ D.5036 ])
(const_int 1000 [0x3e8]))) Proc_8.c:23 -1
 (nil))

;; MEM[(int[25] *)_51 + 20B] = _34;

(insn 32 31 33 (set (reg:SI 141)
(plus:SI (reg/v/f:SI 132 [ Arr_2_Par_Ref ])
(reg:SI 124 [ D.5038 ]))) Proc_8.c:23 -1
 (nil))

(insn 33 32 34 (set (reg/f:SI 142)
(plus:SI (reg:SI 141)
(reg:SI 128 [ D.5038 ]))) Proc_8.c:23 -1
 (nil))

(insn 34 33 0 (set (mem:SI (plus:SI (reg/f:SI 142)
(const_int 20 [0x14])) [2 MEM[(int[25] *)_51 + 20B]+0 S4 A32])
(reg:SI 126 [ D.5039 ])) Proc_8.c:23 -1
 (nil))

Here cse doesn't happen, resulting in less optimal insns. The reason why cse
doesn't happen is not yet clear.


[Bug tree-optimization/54742] Switch elimination in FSM loop

2014-02-19 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54742

--- Comment #36 from Joey Ye joey.ye at arm dot com ---
Please ignore previous comment as it shouldn't be here.


[Bug fortran/60232] [OOP] The rank of the element in the structure constructor does not match that of the component

2014-02-19 Thread janus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60232

--- Comment #5 from janus at gcc dot gnu.org ---
Author: janus
Date: Wed Feb 19 11:52:39 2014
New Revision: 207896

URL: http://gcc.gnu.org/viewcvs?rev=207896&root=gcc&view=rev
Log:
2014-02-19  Janus Weil  ja...@gcc.gnu.org

PR fortran/60232
* expr.c (gfc_get_variable_expr): Don't add REF_ARRAY for dimensionful
functions, which are used as procedure pointer target.


2014-02-19  Janus Weil  ja...@gcc.gnu.org

PR fortran/60232
* gfortran.dg/typebound_proc_33.f90: New.

Added:
trunk/gcc/testsuite/gfortran.dg/typebound_proc_33.f90
Modified:
trunk/gcc/fortran/ChangeLog
trunk/gcc/fortran/expr.c
trunk/gcc/testsuite/ChangeLog


[Bug fortran/60232] [OOP] The rank of the element in the structure constructor does not match that of the component

2014-02-19 Thread janus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60232

janus at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from janus at gcc dot gnu.org ---
Fixed on trunk with r207896. Closing.

Thanks for the report!


[Bug fortran/60255] Deferred character length variable at (1) cannot yet be associated with unlimited polymorphic entities

2014-02-19 Thread janus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60255

--- Comment #4 from janus at gcc dot gnu.org ---
Antony, is it possible for you to try the patch in comment 2, in order to check
if it produces the expected runtime behavior for your code?


[Bug debug/56563] no debuginfo for explicit operator

2014-02-19 Thread mark at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56563

Mark Wielaard mark at gcc dot gnu.org changed:

   What|Removed |Added

 CC||mark at gcc dot gnu.org

--- Comment #1 from Mark Wielaard mark at gcc dot gnu.org ---
PR debug/37959 did add support for explicit constructors by adding a
LANG_HOOKS_FUNCTION_DECL_EXPLICIT_P that dwarf2out uses to tag functions with
DW_AT_explicit attributes.


[Bug c/59933] for loop goes wild with assert() enabled

2014-02-19 Thread warnerme at ptd dot net
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59933

--- Comment #5 from Mark Warner warnerme at ptd dot net ---
sizeof(NSQ_del_dec_struct) / sizeof(opus_int32) is guaranteed to produce an
even number with a remainder of 0.
Note the __attribute__ ((__aligned__ (8))) to make it a multiple of 8 in size.
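
The size argument can be illustrated with a stand-in struct.  Padded is
hypothetical and is not the real NSQ_del_dec_struct; the sketch uses alignas(8)
in place of the GCC attribute and assumes a 4-byte std::int32_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Three 32-bit fields need 12 bytes, but 8-byte alignment pads the
// struct so that its size is a multiple of 8.
struct alignas(8) Padded {
    std::int32_t a;
    std::int32_t b;
    std::int32_t c;
};

static_assert(sizeof(Padded) % 8 == 0, "size is a multiple of the alignment");

// Number of 32-bit words in the struct; even, with no remainder.
constexpr std::size_t padded_words() {
    return sizeof(Padded) / sizeof(std::int32_t);
}
```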


[Bug c++/60272] atomic::compare_exchange_weak has spurious store and can cause race conditions

2014-02-19 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60272

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org,
   ||rth at gcc dot gnu.org,
   ||torvald at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek jakub at gcc dot gnu.org ---
Even our __atomic_compare_exchange* documentation states that:
If they are not equal, the current contents of
@code{*@var{ptr}} is written into @code{*@var{expected}}.
But then expand_builtin_atomic_compare_exchange doesn't care:
  oldval = expect;
  if (!expand_atomic_compare_and_swap ((target == const0_rtx ? NULL : &target),
                                       &oldval, mem, oldval, desired,
                                       is_weak, success, failure))
    return NULL_RTX;

  if (oldval != expect)
    emit_move_insn (expect, oldval);

That effectively means that expect will be stored unconditionally.
So, either we'd need to change this function so that it sets oldval to NULL_RTX
first, passes ..., &oldval, mem, expected, ..., and always asks
for target, then stores to expected conditionally on target; or perhaps add an
extra parameter to expand_atomic_compare_and_swap and do the store only
conditionally in that case.  Richard/Torvald?
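
The documented semantics can be sketched at user level. This is only an illustration, assuming the GCC __atomic_compare_exchange_n builtin; the helper name cas_store_on_failure_only is made up here, and this is not the proposed change to expand_builtin_atomic_compare_exchange itself. The helper works on a local copy of the expected value and writes back to *expected only when the exchange fails:

```cpp
#include <cassert>

// Hypothetical helper, not part of GCC: compare-and-swap on an int that
// writes to *expected only when the exchange fails.
bool cas_store_on_failure_only(int *ptr, int *expected, int desired)
{
    assert(ptr != nullptr && expected != nullptr);

    // Work on a local copy so the builtin never writes to *expected directly.
    int observed = *expected;

    // Strong variant (weak = false), sequentially consistent on both paths.
    bool ok = __atomic_compare_exchange_n(ptr, &observed, desired,
                                          /*weak=*/false,
                                          __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);

    // Only a failed exchange reports the value found in *ptr to the caller.
    if (!ok)
        *expected = observed;
    return ok;
}
```

With this shape, a successful exchange leaves the caller's expected object untouched, so another thread that reads or writes that object after the exchange succeeds does not race with a redundant store.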


[Bug c++/60273] New: gcc gets confused when one class uses variadic

2014-02-19 Thread walter.mascarenhas at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60273

Bug ID: 60273
   Summary: gcc gets confused when one class uses variadic
   Product: gcc
   Version: 4.8.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: walter.mascarenhas at gmail dot com

Created attachment 32171
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32171&action=edit
gcc asked me to submit this file

//
// When compiling this file in Ubuntu 13.04, gcc 4.8.1 crashes with the
// following message. Clang 3.0 compiles the file with no problems.
//
// /home/walter/code/klein/tests/platform/gcc_bug/main.cc:-1: In instantiation of 'struct Bar<Foo<int> >':
// /home/walter/code/klein/tests/platform/gcc_bug/main.cc:25: required from here
// /home/walter/code/klein/tests/platform/gcc_bug/main.cc:20: internal compiler error: Segmentation fault
//  template<int... Ns> using Buggy = typename X::template S<Ns...>;
// :-1: error: [main.o] Error 1^
//

struct A {};

template <class X>
struct Foo
{
  using Type = int;

  // if the next line is replaced by template <int... N> then all is fine
  template <int N1, int N2>
  using S = A;
};

template <class X>
struct Bar
{
  // if the next line is commented then all is fine.
  using Type = typename X::Type;

  // if the next line is commented then all is fine.
  template<int... Ns> using Buggy = typename X::template S<Ns...>;
};

void foobar()
{
  Bar< Foo<int> > bf;
}


[Bug c++/60267] ICE in c_pp_lookup_pragma, at c-family/c-pragma.c:1232; ICE in tsubst_copy, at cp/pt.c:12887

2014-02-19 Thread slayoo at staszic dot waw.pl
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60267

--- Comment #7 from Sylwester Arabas slayoo at staszic dot waw.pl ---
Created attachment 32172
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32172&action=edit
preprocessed source triggering ICE with g++ snapshot 20140212

Thanks a lot for looking at it.
I'm attaching the source preprocessed with g++ 4.8.2.
It gives the tsubst_copy ICE with the 20140212 g++ snapshot.

HTH,
Sylwester


[Bug c++/60274] New: String as template parameter - regression in 4.8.2

2014-02-19 Thread ondrej.kolacek1 at centrum dot cz
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60274

Bug ID: 60274
   Summary: String as template parameter - regression in 4.8.2
   Product: gcc
   Version: 4.8.2
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ondrej.kolacek1 at centrum dot cz

Greetings, this issue happened to me with Debian's g++ so it is possible it is
just their bug but hopefully (well this is debatable :) ) it is not.


begin file test.cpp
typedef const char *const  ProtocolIdType;
//typedef int ProtocolIdType;

template <ProtocolIdType protocolId>
class C {
public:
    typedef int ProtocolVersion;

    class D
    {
    public:
        ProtocolVersion GetProtocolVersion();
    };

};
template <ProtocolIdType protocolId>
typename C<protocolId>::ProtocolVersion C<protocolId>::D::GetProtocolVersion()
{
    return 1;
}

int main(void)
{
}
end file test.cpp

g++ test.cpp
test.cpp:18:41: error: prototype for ‘typename C<protocolId>::ProtocolVersion
C<protocolId>::D::GetProtocolVersion()’ does not match any in class
‘C<protocolId>::D’
 typename C<protocolId>::ProtocolVersion C<protocolId>::D::GetProtocolVersion()
 ^
test.cpp:13:19: error: candidate is: C<protocolId>::ProtocolVersion
C<protocolId>::D::GetProtocolVersion()
   ProtocolVersion GetProtocolVersion();


The code used to work for ages, is compilable with MSVC, clang and gcc on
various platforms, was compilable with 4.8.1 but broke with 4.8.2. The issue is
with string template parameter; replacing 
typedef const char *const  ProtocolIdType;
by 
typedef int ProtocolIdType;
makes the error go away.


 g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.8/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.8.2-15'
--with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs
--enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr
--program-suffix=-4.8 --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib --enable-nls
--with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-libmudflap
--enable-plugin --with-system-zlib --disable-browser-plugin
--enable-java-awt=gtk --enable-gtk-cairo
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64/jre --enable-java-home
--with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64
--with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.8-amd64
--with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar
--enable-objc-gc --enable-multiarch --with-arch-32=i586 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --with-tune=generic --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.8.2 (Debian 4.8.2-15)

[Bug c++/60267] ICE in c_pp_lookup_pragma, at c-family/c-pragma.c:1232; ICE in tsubst_copy, at cp/pt.c:12887

2014-02-19 Thread slayoo at staszic dot waw.pl
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60267

--- Comment #8 from Sylwester Arabas slayoo at staszic dot waw.pl ---
BTW, I have initially reported it as a comment to
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60198 (the same file/line in ICE
error message).

S.


[Bug c++/60274] [4.8/4.9 Regression] String as template parameter - regression in 4.8.3

2014-02-19 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60274

Richard Biener rguenth at gcc dot gnu.org changed:

   What|Removed |Added

   Priority|P3  |P1
 Status|UNCONFIRMED |NEW
  Known to work||4.8.2
   Keywords||rejects-valid
   Last reconfirmed||2014-02-19
 Ever confirmed|0   |1
Summary|String as template  |[4.8/4.9 Regression] String
   |parameter - regression in   |as template parameter -
   |4.8.2   |regression in 4.8.3
   Target Milestone|--- |4.8.3
  Known to fail||4.8.3, 4.9.0

--- Comment #1 from Richard Biener rguenth at gcc dot gnu.org ---
Actually 4.8.2 works but the top of the branch doesn't.


[Bug sanitizer/60275] New: [UBSAN] Add -f[no-]sanitize-recover/-fsanitize-undefined-trap-on-error to make UBSAN's runtime errors fatal

2014-02-19 Thread burnus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60275

Bug ID: 60275
   Summary: [UBSAN] Add
-f[no-]sanitize-recover/-fsanitize-undefined-trap-on-e
rror to make UBSAN's runtime errors fatal
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: sanitizer
  Assignee: unassigned at gcc dot gnu.org
  Reporter: burnus at gcc dot gnu.org
CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org,
jakub at gcc dot gnu.org, kcc at gcc dot gnu.org,
mpolacek at gcc dot gnu.org

While I personally would like to see more fine tuning via UBSAN_FLAGS, similar
to ASAN, LSAN and TSAN, adding CLANG's -fsanitize-recover/-fno-sanitize-recover
and -fsanitize-undefined-trap-on-error would be useful as an additional feature.

From CLANG:

   Extra features of UndefinedBehaviorSanitizer:

   -  ``-fno-sanitize-recover``: By default, after a sanitizer diagnoses
  an issue, it will attempt to continue executing the program if there
  is a reasonable behavior it can give to the faulting operation. This
  option causes the program to abort instead.
   -  ``-fsanitize-undefined-trap-on-error``: Causes traps to be emitted
  rather than calls to runtime libraries when a problem is detected.
  This option is intended for use in cases where the sanitizer runtime
  cannot be used (for instance, when building libc or a kernel module).
  This is only compatible with the sanitizers in the ``undefined-trap``
  group.

That would be BUILT_IN_UNREACHABLE and BUILT_IN_TRAP. (But unreachable
shouldn't be dressed by SANITIZE_UNREACHABLE ;-)

See also LLVM's
* tools/clang/docs/UsersManual.rst
* tools/clang/lib/CodeGen/CGExpr.cpp (search for SanitizeUndefinedTrapOnError
and SanitizeRecover)
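
As a rough illustration of the difference between recovering and trapping, here is a hand-written stand-in for one instrumented operation. It is only a sketch: the real checks are emitted by the compiler, the names OnError and checked_add are invented for this example, and it assumes a compiler that provides the __builtin_add_overflow and __builtin_trap builtins (recent GCC and Clang releases).

```cpp
#include <cstdio>

// Hypothetical policy switch standing in for the two behaviours.
enum class OnError { Recover, Trap };

// Hand-written stand-in for an instrumented signed addition.
int checked_add(int a, int b, OnError mode)
{
    int result;
    // Builtin returns true on overflow and stores the wrapped sum in result.
    if (__builtin_add_overflow(a, b, &result)) {
        if (mode == OnError::Trap)
            __builtin_trap();   // no runtime library call, just a trap
        // Recover mode: diagnose, then continue with the wrapped value.
        std::fprintf(stderr,
                     "runtime error: signed integer overflow: %d + %d\n",
                     a, b);
    }
    return result;
}
```

The trap path needs no support library at all, which is why that mode suits code such as libc or kernel modules where the sanitizer runtime cannot be linked in.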


[Bug c/59933] for loop goes wild with assert() enabled

2014-02-19 Thread warnerme at ptd dot net
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59933

--- Comment #6 from Mark Warner warnerme at ptd dot net ---
If it is invalid, why does -Wall not trigger anything ?


[Bug ipa/60266] [4.9 Regression] ICE: in ipa_get_parm_lattices, at ipa-cp.c:261 during LibreOffice LTO build

2014-02-19 Thread trippels at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60266

--- Comment #2 from Markus Trippelsdorf trippels at gcc dot gnu.org ---
It's caused by mixing -O0 and -O2 with LTO:

markus@x4 ~S % cat TableCopyHelper.ii
namespace com {
namespace sun {
namespace star {}
}
}
namespace css = com::sun::star;
namespace com {
namespace sun {
namespace star {
class A {};
template <class interface_type> class C : A {
public:
  interface_type *operator->();
};
}
}
typedef struct {
} uno_Any;
namespace sun {
namespace star {
class D : uno_Any {};
class B {
  virtual css::D m_fn1();
  virtual void m_fn2();
  virtual void m_fn3();
};
class F : css::B {
  virtual int m_fn4();
};
namespace sdb {
namespace application {
class XCopyTableWizard : css::F {
public:
  virtual void m_fn5();
  virtual void m_fn6();
};
}
}
}
}
using namespace com::sun::star;
using namespace com::sun::star::sdb::application;
void fn1(C<int> ) try {
  C<XCopyTableWizard> a;
  a->m_fn6();
}
catch (int ) {
}
}


markus@x4 ~S % cat copytablewizard.ii
namespace com {
namespace sun {
namespace star {
class A {};
namespace sdb {
namespace application {
class XCopyTableWizard {
  virtual int m_fn1();
};
}
}
}
}
}
class OPropertyArrayUsageHelper {
public:
  virtual ~OPropertyArrayUsageHelper();
};
using com::sun::star::A;
using com::sun::star::sdb::application::XCopyTableWizard;
class CopyTableWizard : XCopyTableWizard, OPropertyArrayUsageHelper {
  ~CopyTableWizard();
};
CopyTableWizard::~CopyTableWizard() try {}

catch (A ) {
}


markus@x4 ~S % g++ -flto -fPIC -O0 -c copytablewizard.ii
markus@x4 ~S % g++ -flto -fPIC -std=gnu++11 -O2 -c TableCopyHelper.ii
markus@x4 ~S % g++ -w -r -nostdlib -O2 TableCopyHelper.o copytablewizard.o
lto1: internal compiler error: in ipa_get_parm_lattices, at ipa-cp.c:261
0x50c7ac ipa_get_parm_lattices
../../gcc/gcc/ipa-cp.c:261
0xc41824 ipa_get_parm_lattices
../../gcc/gcc/ipa-cp.c:261
0xc41824 propagate_constants_accross_call
../../gcc/gcc/ipa-cp.c:1443
0xc44308 propagate_constants_topo
../../gcc/gcc/ipa-cp.c:2231
0xc44308 ipcp_propagate_stage
../../gcc/gcc/ipa-cp.c:2327
0xc44308 ipcp_driver
../../gcc/gcc/ipa-cp.c:3705
0xc44308 execute
../../gcc/gcc/ipa-cp.c:3804
Please submit a full bug report,
with preprocessed source if appropriate.

(I think the ODR violation got introduced during reduction.)


[Bug sanitizer/60275] [UBSAN] Add -f[no-]sanitize-recover/-fsanitize-undefined-trap-on-error to make UBSAN's runtime errors fatal

2014-02-19 Thread mpolacek at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60275

Marek Polacek mpolacek at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2014-02-19
   Assignee|unassigned at gcc dot gnu.org  |mpolacek at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Marek Polacek mpolacek at gcc dot gnu.org ---
Mine.  I think this is 5.0 material.


[Bug other/50925] [4.7/4.8/4.9 Regression][avr] ICE at spill_failure, at reload1.c:2118

2014-02-19 Thread amylaar at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50925

Jorn Wolfgang Rennecke amylaar at gcc dot gnu.org changed:

   What|Removed |Added

 CC||amylaar at gcc dot gnu.org

--- Comment #28 from Jorn Wolfgang Rennecke amylaar at gcc dot gnu.org ---
I can't reproduce this with the current trunk.  Can we mark this
as known to work for 4.9?


[Bug ipa/60243] IPA is slow on large cgraph tree

2014-02-19 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

--- Comment #9 from Richard Biener rguenth at gcc dot gnu.org ---
Author: rguenth
Date: Wed Feb 19 14:25:47 2014
New Revision: 207899

URL: http://gcc.gnu.org/viewcvs?rev=207899&root=gcc&view=rev
Log:
2014-02-19  Richard Biener  rguent...@suse.de

PR ipa/60243
* tree-inline.c (estimate_num_insns): Avoid calling cgraph_get_node
for all calls.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-inline.c


[Bug c/59933] for loop goes wild with assert() enabled

2014-02-19 Thread mpolacek at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59933

--- Comment #7 from Marek Polacek mpolacek at gcc dot gnu.org ---
(int)(sizeof(NSQ_del_dec_struct) / sizeof(opus_int32)) seems to be 1168/4 = 292,
but sLPC_Q14 has only 112 elements.


[Bug rtl-optimization/60155] ICE: in get_pressure_class_and_nregs at gcse.c:3438

2014-02-19 Thread danglin at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60155

John David Anglin danglin at gcc dot gnu.org changed:

   What|Removed |Added

 CC||mikulas at artax dot karlin.mff.cuni.cz

--- Comment #5 from John David Anglin danglin at gcc dot gnu.org ---
*** Bug 54737 has been marked as a duplicate of this bug. ***


[Bug rtl-optimization/54737] ICE on PA-RISC with LTO and -ftrapv

2014-02-19 Thread danglin at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54737

John David Anglin danglin at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||danglin at gcc dot gnu.org
 Resolution|--- |DUPLICATE

--- Comment #2 from John David Anglin danglin at gcc dot gnu.org ---
There is a patch in 60155 that probably fixes this PR.

*** This bug has been marked as a duplicate of bug 60155 ***


[Bug target/57232] wcstol.c:213:1: internal compiler error

2014-02-19 Thread gcc at jaseg dot net
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57232

--- Comment #16 from Sebastian Götte gcc at jaseg dot net ---
Alexandre, curiously, applying this patch to the cross-compiler source tree
fixes the problem for me building 4.8.2 for rx-elf using a 4.8.2 x86_64 host
gcc. I did not even have to rebuild the host gcc.

[Bug target/59799] aarch64_pass_by_reference never passes arrays by value, contrary to ABI documentation

2014-02-19 Thread yroux at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59799

--- Comment #9 from yroux at gcc dot gnu.org ---
Author: yroux
Date: Wed Feb 19 15:32:54 2014
New Revision: 207908

URL: http://gcc.gnu.org/viewcvs?rev=207908&root=gcc&view=rev
Log:
2014-02-19  Michael Hudson-Doyle  michael.hud...@linaro.org

 PR target/59799
* config/aarch64/aarch64.c (aarch64_pass_by_reference): The rules for
passing arrays in registers are the same as for structs, so remove the
special case for them.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/aarch64/aarch64.c


[Bug target/57232] wcstol.c:213:1: internal compiler error

2014-02-19 Thread nickc at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57232

--- Comment #17 from Nick Clifton nickc at redhat dot com ---
Hi Alex,

   if (reg != hard_frame_pointer_rtx && fixed_regs[REGNO (reg)])
  cselib_preserve_cfa_base_value (val, REGNO (reg));

This works for the RX port - thanks!

Cheers
   Nick


[Bug c++/60274] [4.8/4.9 Regression] String as template parameter - regression in 4.8.3

2014-02-19 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60274

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org,
   ||jason at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek jakub at gcc dot gnu.org ---
Started with r207167.


[Bug c++/60064] [c++1y] ICE with auto as parameter of friend function

2014-02-19 Thread reichelt at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60064

Volker Reichelt reichelt at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |4.9.0

--- Comment #3 from Volker Reichelt reichelt at gcc dot gnu.org ---
Fixed with Adam's patch.


[Bug debug/56563] no debuginfo for explicit operator

2014-02-19 Thread mark at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56563

--- Comment #2 from Mark Wielaard mark at gcc dot gnu.org ---
Jakub proposed a patch:
http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01166.html


[Bug c/59933] for loop goes wild with assert() enabled

2014-02-19 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59933

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #8 from Jakub Jelinek jakub at gcc dot gnu.org ---
The code is not invalid C, just triggers undefined behavior, so it is not
invalid at compile time, just at runtime if you ever hit this.
GCC optimizes based on the assumption that undefined behavior doesn't happen in
a correct program.
While we have the -Waggressive-loop-optimizations warning, it (intentionally)
warns solely about the case where the loop has a single exit and a constant loop
iteration count, which is not the case here: the number of iterations is
i >= 292 ? 0 : 292 - i.
The loop will trigger undefined behavior whenever i is < 292; if it is bigger,
then there is no bug.
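
The arithmetic in this thread can be checked with a small sketch. The struct below is a hypothetical stand-in, not the real NSQ_del_dec_struct: it only reproduces the sizes quoted in the comments (1168 bytes in total, one member with 112 elements), and the names State, trip_count and stays_in_bounds are invented for the illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in: 1168 bytes in total, of which one member
// holds only 112 elements.
struct State {
    std::int32_t samples[112];
    std::int32_t rest[180];
};

// Number of 32-bit words in the whole struct.
constexpr std::size_t kWordsInStruct = sizeof(State) / sizeof(std::int32_t);
// Number of elements in the indexed member.
constexpr std::size_t kSamplesLen =
    sizeof(State::samples) / sizeof(std::int32_t);

// Trip count of `for (; i < bound; i++)`.
constexpr long trip_count(long i, long bound)
{
    return i >= bound ? 0 : bound - i;
}

// True if such a loop, indexing an array of array_len elements with i,
// either never runs or never indexes outside the array.
constexpr bool stays_in_bounds(long i, long bound, long array_len)
{
    return trip_count(i, bound) == 0 || (i >= 0 && bound <= array_len);
}
```

With a bound of 292 and a 112-element array, stays_in_bounds is false for any start below 292, which matches the analysis above: the only inputs without undefined behavior are those for which the loop body never executes.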


[Bug fortran/51976] [F2003] Support deferred-length character components of derived types (allocatable string length)

2014-02-19 Thread janus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51976

--- Comment #13 from janus at gcc dot gnu.org ---
The latest patch posted at

http://gcc.gnu.org/ml/fortran/2014-02/msg00109.html

works smoothly on the test case in comment 12.


[Bug target/59794] [4.7/4.8 Regression] i386 backend fails to detect MMX/SSE/AVX ABI changes

2014-02-19 Thread uros at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59794

--- Comment #17 from uros at gcc dot gnu.org ---
Author: uros
Date: Wed Feb 19 15:53:59 2014
New Revision: 207910

URL: http://gcc.gnu.org/viewcvs?rev=207910&root=gcc&view=rev
Log:
PR target/59794
* config/i386/i386.c (type_natural_mode): Warn for ABI changes
only when -Wpsabi is enabled.

testsuite/ChangeLog:

PR target/59794
* gcc.target/i386/pr39162.c: Add dg-prune-output.
(dg-options): Remove -Wno-psabi.
* gcc.target/i386/59794-2.c: Ditto.
* gcc.target/i386/60205-1.c: Ditto.
* gcc.target/i386/sse-5.c: Ditto.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.c
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/i386/pr39162.c
trunk/gcc/testsuite/gcc.target/i386/pr59794-2.c
trunk/gcc/testsuite/gcc.target/i386/pr59794-3.c
trunk/gcc/testsuite/gcc.target/i386/pr60205-1.c
trunk/gcc/testsuite/gcc.target/i386/sse-5.c


[Bug c++/60251] [4.9 Regression] [c++11] ICE capturing variable-length array

2014-02-19 Thread paolo.carlini at oracle dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60251

--- Comment #2 from Paolo Carlini paolo.carlini at oracle dot com ---
Not sure this is valid. Anyway, the ICE is due to the COMPONENT_REF being
wrapped in a NOP_EXPR.


[Bug target/59797] GCC doesn't warn AVX-512 ABI change

2014-02-19 Thread ubizjak at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59797

Uroš Bizjak ubizjak at gmail dot com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #3 from Uroš Bizjak ubizjak at gmail dot com ---
Dup.

*** This bug has been marked as a duplicate of bug 60205 ***

[Bug target/60205] No ABI warning for AVX-512

2014-02-19 Thread ubizjak at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60205

--- Comment #5 from Uroš Bizjak ubizjak at gmail dot com ---
*** Bug 59797 has been marked as a duplicate of this bug. ***

[Bug rtl-optimization/57320] [4.9 Regression] Shrink-wrapping leaves unreachable blocks in the CFG

2014-02-19 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57320

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek jakub at gcc dot gnu.org ---
This has been fixed by r204211 on the trunk, any reason to keep this PR open?

Note that Steven's patch has been approved, but never committed:
http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01020.html


[Bug c/59933] for loop goes wild with assert() enabled

2014-02-19 Thread ian at g0tcd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59933

--- Comment #9 from Ian Hamilton ian at g0tcd dot com ---
Yes, that's all proper and correct. The invalid C code induces undefined
behaviour. I don't think anyone is disputing that.

However, to be pragmatic for a moment, the experience of thousands of
developers out there, working with legacy code, and trying to update their
toolset to include gcc 4.8 is that code which compiled without warnings and
worked with the old gcc compiler now still compiles without warnings, but fails
at runtime with the 4.8 series compiler.

Sometimes, the runtime failures are occasional and difficult to track down if
(for example) the faulty code lies on an error handling path. This makes it even
harder for these developers to figure out what's going on.

If the compiler could provide a warning when it encounters this sort of invalid
code, that would be a good thing, as it would highlight the old latent bugs and
give developers the opportunity to fix them.

However, it doesn't, so the developers working on legacy code really have no
alternative to either using the -fno-aggressive-loop-optimizations switch to
stabilise their legacy code (even assuming they understand what's happening), or
sticking with the old version of the compiler.

So I think the request to the gcc developers is to find a way of providing a
compiler warning when the loop optimiser encounters problem code, to give
developers a fighting chance of debugging their legacy code.


[Bug c/59933] for loop goes wild with assert() enabled

2014-02-19 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59933

--- Comment #10 from Jakub Jelinek jakub at gcc dot gnu.org ---
We have -fsanitize=undefined, which can catch some issues, though neither the
array bounds instrumentation nor the __builtin_object_size based instrumentation
has been added yet for GCC 4.9; hopefully it will be there in the next release.
As for warnings, even the current -Waggressive-loop-optimizations warning (for
the easy case of a constant number of iterations and a single loop exit, where
you know that if the loop is reachable and there is undefined behavior in some
loop iteration, you will trigger it) still has occasional false positives (there
are various PRs about that; usually the issue is that while it is true there is
such a loop, the loop is actually dead). Further warnings would have a huge
false positive rate, to the extent that they would rarely be useful.


[Bug c++/60267] ICE in c_pp_lookup_pragma, at c-family/c-pragma.c:1232; ICE in tsubst_copy, at cp/pt.c:12887

2014-02-19 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60267

--- Comment #9 from Jakub Jelinek jakub at gcc dot gnu.org ---
Author: jakub
Date: Wed Feb 19 16:45:21 2014
New Revision: 207911

URL: http://gcc.gnu.org/viewcvs?rev=207911&root=gcc&view=rev
Log:
PR c++/60267
* c-pragma.c (init_pragma): Don't call cpp_register_deferred_pragma
for PRAGMA_IVDEP if flag_preprocess_only.

* gcc.dg/pr60267.c: New test.

Added:
trunk/gcc/testsuite/gcc.dg/pr60267.c
Modified:
trunk/gcc/c-family/ChangeLog
trunk/gcc/c-family/c-pragma.c
trunk/gcc/testsuite/ChangeLog


[Bug target/59794] [4.7/4.8 Regression] i386 backend fails to detect MMX/SSE/AVX ABI changes

2014-02-19 Thread uros at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59794

--- Comment #18 from uros at gcc dot gnu.org ---
Author: uros
Date: Wed Feb 19 16:50:22 2014
New Revision: 207912

URL: http://gcc.gnu.org/viewcvs?rev=207912&root=gcc&view=rev
Log:
Backport from mainline
2014-02-19  Uros Bizjak  ubiz...@gmail.com

PR target/59794
* config/i386/i386.c (type_natural_mode): Warn for ABI changes
only when -Wpsabi is enabled.

testsuite/ChangeLog:

Backport from mainline
2014-02-19  Uros Bizjak  ubiz...@gmail.com

PR target/59794
* gcc.target/i386/pr39162.c: Add dg-prune-output.
(dg-options): Remove -Wno-psabi.
* gcc.target/i386/pr59794-2.c: Ditto.
* gcc.target/i386/sse-5.c: Ditto.


Added:
branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/pr59794-1.c
branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/pr59794-2.c
branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/pr59794-3.c
branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/pr59794-4.c
branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/pr59794-5.c
branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/pr59794-6.c
branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/pr59794-7.c
Modified:
branches/gcc-4_8-branch/gcc/ChangeLog
branches/gcc-4_8-branch/gcc/config/i386/i386.c
branches/gcc-4_8-branch/gcc/testsuite/ChangeLog
branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/pr39162.c
branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/sse-5.c


[Bug target/59797] GCC doesn't warn AVX-512 ABI change

2014-02-19 Thread ubizjak at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59797
Bug 59797 depends on bug 59794, which changed state.

Bug 59794 Summary: [4.7/4.8 Regression] i386 backend fails to detect 
MMX/SSE/AVX ABI changes
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59794

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED


[Bug target/59794] [4.7/4.8 Regression] i386 backend fails to detect MMX/SSE/AVX ABI changes

2014-02-19 Thread ubizjak at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59794

Uroš Bizjak ubizjak at gmail dot com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED
   Target Milestone|4.7.4   |4.8.3

--- Comment #19 from Uroš Bizjak ubizjak at gmail dot com ---
This is now implemented for 4.8.3 and 4.9, including -Wpsabi option.

No plan to backport these patches to 4.7.x.

FIXED for 4.8.3+.

[Bug other/50925] [4.7/4.8/4.9 Regression][avr] ICE at spill_failure, at reload1.c:2118

2014-02-19 Thread corsepiu at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50925

Ralf Corsepius corsepiu at gcc dot gnu.org changed:

   What|Removed |Added

 CC||corsepiu at gcc dot gnu.org

--- Comment #29 from Ralf Corsepius corsepiu at gcc dot gnu.org ---
(In reply to Jorn Wolfgang Rennecke from comment #28)
> I can't reproduce this with the current trunk.
Confirmed. gcc-4.9 doesn't show this bug for --target=avr-rtems4.11, anymore.

> Can we mark this
> as known to work for 4.9?
I am inclined to agree.


[Bug c/37743] Bogus printf format warning with __builtin_bswap32.

2014-02-19 Thread joseph at codesourcery dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37743

--- Comment #11 from joseph at codesourcery dot com joseph at codesourcery dot 
com ---
Yes, we could do something like that (but I also think it's time to put 
the targets without this type information on the deprecation list and warn 
their maintainers that the target support will be removed in the absence 
of this information being added soon).


[Bug target/57896] [4.8 Regression] ICE in expand_expr_real_2

2014-02-19 Thread ubizjak at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57896

--- Comment #9 from Uroš Bizjak ubizjak at gmail dot com ---
(In reply to Vittorio Zecca from comment #6)

> As an aside, in gcc 4.8.1 source code, before line 6995 of gcc/expr.c I put
>
> printf("\nexpr.c:6995 value->code=%d NUM_RTX_CODE=%d\n",(int)
> value->code,NUM_RTX_CODE);
> gcc_assert((int) value->code < NUM_RTX_CODE);
>
> and I get an ICE there because value->code is 34816 and NUM_RTX_CODE is 145
>
> Indeed at line 6995 ARITHMETIC_P (value) accesses rtx_class[(int)
> value->code]
> but the array rtx_class has only NUM_RTX_CODE elements.
> However, I do not know how this is relevant to this issue.

This one points to an infrastructure problem.

Adding a debug patch:

--cut here--
Index: explow.c
===
--- explow.c(revision 207910)
+++ explow.c(working copy)
@@ -186,8 +186,13 @@ plus_constant (enum machine_mode mode, rtx x, HOST
 }

   if (c != 0)
-x = gen_rtx_PLUS (mode, x, GEN_INT (c));
+{
+  rtx z = GEN_INT (c);
+  printf ("cc, %li\n", c);
+  debug_rtx (z);

+  x = gen_rtx_PLUS (mode, x, z);
+}
   if (GET_CODE (x) == SYMBOL_REF || GET_CODE (x) == LABEL_REF)
 return x;
   else if (all_constant)
--cut here--

~/gcc-build-48/gcc/cc1 pr57896.c

...
 __get_cpuidcc, -4
(const_int -4 [0xfffc])
cc, -16
(const_int -16 [0xfff0])
cc, -24
(const_int -24 [0xffe8])
cc, -32
(const_int -32 [0xffe0])
cc, -40
(??? bad code 47104
)

pr57896.c: In function ‘__get_cpuid’:
pr57896.c:5:5: internal compiler error: in emit_move_insn_1, at expr.c:3437
 int __get_cpuid (unsigned int __level, unsigned int *__eax, unsigned int
*__ebx, unsigned int *__ecx, unsigned int *__edx) {
 ^
0x62d74d emit_move_insn_1(rtx_def*, rtx_def*)
/home/uros/gcc-svn/branches/gcc-4_8-branch/gcc/expr.c:3437
0x62d7b5 emit_move_insn(rtx_def*, rtx_def*)
/home/uros/gcc-svn/branches/gcc-4_8-branch/gcc/expr.c:3535

Please note that the debug patch only encloses GEN_INT (...)

[Bug target/57896] [4.8 Regression] ICE in expand_expr_real_2

2014-02-19 Thread ubizjak at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57896

--- Comment #10 from Uroš Bizjak ubizjak at gmail dot com ---
(In reply to Vittorio Zecca from comment #5)

> Adding option -m32 I get ICE in ix86_expand_prologue, at
> config/i386/i386.c:10559

I can confirm this with:

gcc version 4.8.3 20140219 (prerelease) [gcc-4_8-branch revision 207910] (GCC)

~/gcc-build-48/gcc/cc1 -m32 -march=x86-64 pr57896.c

pr57896.c: In function ‘__get_cpuid_max’:
pr57896.c:4:1: internal compiler error: in ix86_expand_prologue, at
config/i386/i386.c:10539
 }
 ^
0x98fdf5 ix86_expand_prologue()
/home/uros/gcc-svn/branches/gcc-4_8-branch/gcc/config/i386/i386.c:10539
0xa1ce5a gen_prologue()
   
/home/uros/gcc-svn/branches/gcc-4_8-branch/gcc/config/i386/i386.md:11829
0x673927 thread_prologue_and_epilogue_insns
/home/uros/gcc-svn/branches/gcc-4_8-branch/gcc/function.c:5949
0x673927 rest_of_handle_thread_prologue_and_epilogue
/home/uros/gcc-svn/branches/gcc-4_8-branch/gcc/function.c:6973

The output from the debug patch doesn't look suspicious, so this looks like a
real bug to me. Can someone please bisect which commit fixed/hid this bug?

[Bug c/37743] Bogus printf format warning with __builtin_bswap32.

2014-02-19 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37743

--- Comment #12 from Jakub Jelinek jakub at gcc dot gnu.org ---
Created attachment 32173
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32173&action=edit
gcc49-pr37743.patch

Untested fix.  The deprecation can hopefully be done separately.


[Bug target/60207] Wrong TFmode check in construct_container

2014-02-19 Thread hjl at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60207

--- Comment #2 from hjl at gcc dot gnu.org hjl at gcc dot gnu.org ---
Author: hjl
Date: Wed Feb 19 18:10:04 2014
New Revision: 207913

URL: http://gcc.gnu.org/viewcvs?rev=207913&root=gcc&view=rev
Log:
Remove TFmode check for X86_64_INTEGER_CLASS

PR target/60207
* config/i386/i386.c (construct_container): Remove TFmode check
for X86_64_INTEGER_CLASS.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.c

