Re: [PR lto/77458] Avoid ICE in offloading with differing _FloatN, _FloatNx types (was: Advice sought for debugging a lto1 ICE (was: Implement C _FloatN, _FloatNx types [version 6]))

2016-09-19 Thread Joseph Myers
On Mon, 19 Sep 2016, Thomas Schwinge wrote:

> > The question is whether such a complex type could be a global tree which I
> > don't think it could.
> 
> Specifically, my question was whether for every complex type that is part
> of the global trees, it holds that the complex type's component type also
> is part of the global trees?

That should definitely be the case (but there might be (a) complex types 
that aren't part of those trees, whose components aren't part either, and 
(b) non-complex types that are part of the global trees, whose 
corresponding complex types exist but aren't part of the global trees).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PR lto/77458] Avoid ICE in offloading with differing _FloatN, _FloatNx types (was: Advice sought for debugging a lto1 ICE (was: Implement C _FloatN, _FloatNx types [version 6]))

2016-09-19 Thread Richard Biener
On Mon, Sep 19, 2016 at 1:19 PM, Thomas Schwinge
 wrote:
> Hi!
>
> On Mon, 19 Sep 2016 10:18:35 +0200, Richard Biener 
>  wrote:
>> On Fri, Sep 16, 2016 at 3:32 PM, Thomas Schwinge
>>  wrote:
>> > --- gcc/tree-core.h
>> > +++ gcc/tree-core.h
>> > @@ -553,20 +553,6 @@ enum tree_index {
>> >TI_BOOLEAN_FALSE,
>> >TI_BOOLEAN_TRUE,
>> >
>> > -  TI_COMPLEX_INTEGER_TYPE,
>> > -[...]
>> > -  TI_COMPLEX_FLOAT128X_TYPE,
>> > -
>> >TI_FLOAT_TYPE,
>> >TI_DOUBLE_TYPE,
>> >TI_LONG_DOUBLE_TYPE,
>> > @@ -596,6 +582,23 @@ enum tree_index {
>> >  - TI_FLOATN_NX_TYPE_FIRST  \
>> >  + 1)
>> >
>> > +  /* Put the complex types after their component types, so that in 
>> > (sequential)
>> > + tree streaming we can assert that their component types have already 
>> > been
>> > + handled (see tree-streamer.c:record_common_node).  */
>> > +  TI_COMPLEX_INTEGER_TYPE,
>> > +  TI_COMPLEX_FLOAT_TYPE,
>> > +  TI_COMPLEX_DOUBLE_TYPE,
>> > +  TI_COMPLEX_LONG_DOUBLE_TYPE,
>> > +
>> > +  TI_COMPLEX_FLOAT16_TYPE,
>> > +  TI_COMPLEX_FLOATN_NX_TYPE_FIRST = TI_COMPLEX_FLOAT16_TYPE,
>> > +  TI_COMPLEX_FLOAT32_TYPE,
>> > +  TI_COMPLEX_FLOAT64_TYPE,
>> > +  TI_COMPLEX_FLOAT128_TYPE,
>> > +  TI_COMPLEX_FLOAT32X_TYPE,
>> > +  TI_COMPLEX_FLOAT64X_TYPE,
>> > +  TI_COMPLEX_FLOAT128X_TYPE,
>> > +
>> >TI_FLOAT_PTR_TYPE,
>> >TI_DOUBLE_PTR_TYPE,
>> >TI_LONG_DOUBLE_PTR_TYPE,
>>
>> If the above change alone fixes your issues then it is fine to commit.
>
> That alone won't fix the problem, because we'd still have the recursion
> in gcc/tree-streamer.c:record_common_node done differently for x86_64
> target and nvptx offload target.

Doh - obviously.

>> > --- gcc/tree-streamer.c
>> > +++ gcc/tree-streamer.c
>> > @@ -278,9 +278,23 @@ record_common_node (struct streamer_tree_cache_d 
>> > *cache, tree node)
>> >streamer_tree_cache_append (cache, node, cache->nodes.length ());
>> >
>> >if (POINTER_TYPE_P (node)
>> > -  || TREE_CODE (node) == COMPLEX_TYPE
>> >|| TREE_CODE (node) == ARRAY_TYPE)
>> >  record_common_node (cache, TREE_TYPE (node));
>> > +  else if (TREE_CODE (node) == COMPLEX_TYPE)
>> > +{
>> > +  /* Assert that complex types' component types have already been 
>> > handled
>> > +(and we thus don't need to recurse here).  See PR lto/77458.  */
>> > +  if (cache->node_map)
>> > +   gcc_assert (streamer_tree_cache_lookup (cache, TREE_TYPE (node), 
>> > NULL));
>> > +  else
>> > +   {
>> > + gcc_assert (cache->nodes.exists ());
>> > + bool found = false;
>> > + for (unsigned i = 0; !found && i < cache->nodes.length (); ++i)
>> > +   found = true;
>>
>> hmm, this doesn't actually test anything? ;)
>
> ;-) Haha, hooray for patch review!
>
>> > + gcc_assert (found);
>> > +   }
>> > +}
>> >else if (TREE_CODE (node) == RECORD_TYPE)
>> >  {
>> >/* The FIELD_DECLs of structures should be shared, so that every
>
>> So I very much like to go forward with this kind of change as well
>
> OK, good.  So, in plain text, we'll make it a requirement that:
> integer_types trees must only refer to earlier integer_types trees;
> sizetype_tab trees must only refer to integer_types trees, and earlier
> sizetype_tab trees; and global_trees must only refer to integer_types
> trees, sizetype_tab trees, and earlier global_trees.

Yeah, though I'd put sizetypes first.

>> (the assert code
>> should go to a separate helper function).
>
> Should this checking be done only in
> gcc/tree-streamer.c:record_common_node, or should generally

Yes.

> gcc/tree-streamer.c:streamer_tree_cache_append check/assert that such
> recursive trees are already present in the cache?  And generally do that,
> or "if (flag_checking)" only?

I think we should restrict it to flag_checking only because in general violating
it is harmless plus we never know what happens on all targets/frontend/flag(!)
combinations.

>
>> Did you try it on more than just
>> the complex type case?
>
> Not yet, but now that you have approved the general concept, I'll look
> into that.
>
> Here's the current patch with the assertion condition fixed, but still
> for complex types only.  OK for trunk already?

Ok with the checking blob moved to a helper function,

  bool common_node_recorded_p (cache, node)

and its body guarded with if (flag_checking).

[looks to me we miss handling of vector type components alltogether,
maybe there are no global vector type trees ...]

Thanks,
Richard.

> --- gcc/tree-core.h
> +++ gcc/tree-core.h
> @@ -553,20 +553,6 @@ enum tree_index {
>TI_BOOLEAN_FALSE,
>TI_BOOLEAN_TRUE,
>
> -  TI_COMPLEX_INTEGER_TYPE,
> -  TI_COMPLEX_FLOAT_TYPE,
> -  TI_COMPLEX_DOUBLE_TYPE,
> -  TI_COMPLEX_LONG_DOUBLE_TYPE,
> -
> -  TI_COMPLEX_FLOAT16_TYPE,
> -  TI_COMPLEX_FLOATN_NX_TYPE_FIRST = 

Re: [PR lto/77458] Avoid ICE in offloading with differing _FloatN, _FloatNx types (was: Advice sought for debugging a lto1 ICE (was: Implement C _FloatN, _FloatNx types [version 6]))

2016-09-19 Thread Thomas Schwinge
Hi!

On Mon, 19 Sep 2016 10:18:35 +0200, Richard Biener  
wrote:
> On Fri, Sep 16, 2016 at 3:32 PM, Thomas Schwinge
>  wrote:
> > --- gcc/tree-core.h
> > +++ gcc/tree-core.h
> > @@ -553,20 +553,6 @@ enum tree_index {
> >TI_BOOLEAN_FALSE,
> >TI_BOOLEAN_TRUE,
> >
> > -  TI_COMPLEX_INTEGER_TYPE,
> > -[...]
> > -  TI_COMPLEX_FLOAT128X_TYPE,
> > -
> >TI_FLOAT_TYPE,
> >TI_DOUBLE_TYPE,
> >TI_LONG_DOUBLE_TYPE,
> > @@ -596,6 +582,23 @@ enum tree_index {
> >  - TI_FLOATN_NX_TYPE_FIRST  \
> >  + 1)
> >
> > +  /* Put the complex types after their component types, so that in 
> > (sequential)
> > + tree streaming we can assert that their component types have already 
> > been
> > + handled (see tree-streamer.c:record_common_node).  */
> > +  TI_COMPLEX_INTEGER_TYPE,
> > +  TI_COMPLEX_FLOAT_TYPE,
> > +  TI_COMPLEX_DOUBLE_TYPE,
> > +  TI_COMPLEX_LONG_DOUBLE_TYPE,
> > +
> > +  TI_COMPLEX_FLOAT16_TYPE,
> > +  TI_COMPLEX_FLOATN_NX_TYPE_FIRST = TI_COMPLEX_FLOAT16_TYPE,
> > +  TI_COMPLEX_FLOAT32_TYPE,
> > +  TI_COMPLEX_FLOAT64_TYPE,
> > +  TI_COMPLEX_FLOAT128_TYPE,
> > +  TI_COMPLEX_FLOAT32X_TYPE,
> > +  TI_COMPLEX_FLOAT64X_TYPE,
> > +  TI_COMPLEX_FLOAT128X_TYPE,
> > +
> >TI_FLOAT_PTR_TYPE,
> >TI_DOUBLE_PTR_TYPE,
> >TI_LONG_DOUBLE_PTR_TYPE,
> 
> If the above change alone fixes your issues then it is fine to commit.

That alone won't fix the problem, because we'd still have the recursion
in gcc/tree-streamer.c:record_common_node done differently for x86_64
target and nvptx offload target.

> > --- gcc/tree-streamer.c
> > +++ gcc/tree-streamer.c
> > @@ -278,9 +278,23 @@ record_common_node (struct streamer_tree_cache_d 
> > *cache, tree node)
> >streamer_tree_cache_append (cache, node, cache->nodes.length ());
> >
> >if (POINTER_TYPE_P (node)
> > -  || TREE_CODE (node) == COMPLEX_TYPE
> >|| TREE_CODE (node) == ARRAY_TYPE)
> >  record_common_node (cache, TREE_TYPE (node));
> > +  else if (TREE_CODE (node) == COMPLEX_TYPE)
> > +{
> > +  /* Assert that complex types' component types have already been 
> > handled
> > +(and we thus don't need to recurse here).  See PR lto/77458.  */
> > +  if (cache->node_map)
> > +   gcc_assert (streamer_tree_cache_lookup (cache, TREE_TYPE (node), 
> > NULL));
> > +  else
> > +   {
> > + gcc_assert (cache->nodes.exists ());
> > + bool found = false;
> > + for (unsigned i = 0; !found && i < cache->nodes.length (); ++i)
> > +   found = true;
> 
> hmm, this doesn't actually test anything? ;)

;-) Haha, hooray for patch review!

> > + gcc_assert (found);
> > +   }
> > +}
> >else if (TREE_CODE (node) == RECORD_TYPE)
> >  {
> >/* The FIELD_DECLs of structures should be shared, so that every

> So I very much like to go forward with this kind of change as well

OK, good.  So, in plain text, we'll make it a requirement that:
integer_types trees must only refer to earlier integer_types trees;
sizetype_tab trees must only refer to integer_types trees, and earlier
sizetype_tab trees; and global_trees must only refer to integer_types
trees, sizetype_tab trees, and earlier global_trees.

> (the assert code
> should go to a separate helper function).

Should this checking be done only in
gcc/tree-streamer.c:record_common_node, or should generally
gcc/tree-streamer.c:streamer_tree_cache_append check/assert that such
recursive trees are already present in the cache?  And generally do that,
or "if (flag_checking)" only?

> Did you try it on more than just
> the complex type case?

Not yet, but now that you have approved the general concept, I'll look
into that.

Here's the current patch with the assertion condition fixed, but still
for complex types only.  OK for trunk already?

--- gcc/tree-core.h
+++ gcc/tree-core.h
@@ -553,20 +553,6 @@ enum tree_index {
   TI_BOOLEAN_FALSE,
   TI_BOOLEAN_TRUE,
 
-  TI_COMPLEX_INTEGER_TYPE,
-  TI_COMPLEX_FLOAT_TYPE,
-  TI_COMPLEX_DOUBLE_TYPE,
-  TI_COMPLEX_LONG_DOUBLE_TYPE,
-
-  TI_COMPLEX_FLOAT16_TYPE,
-  TI_COMPLEX_FLOATN_NX_TYPE_FIRST = TI_COMPLEX_FLOAT16_TYPE,
-  TI_COMPLEX_FLOAT32_TYPE,
-  TI_COMPLEX_FLOAT64_TYPE,
-  TI_COMPLEX_FLOAT128_TYPE,
-  TI_COMPLEX_FLOAT32X_TYPE,
-  TI_COMPLEX_FLOAT64X_TYPE,
-  TI_COMPLEX_FLOAT128X_TYPE,
-
   TI_FLOAT_TYPE,
   TI_DOUBLE_TYPE,
   TI_LONG_DOUBLE_TYPE,
@@ -596,6 +582,23 @@ enum tree_index {
 - TI_FLOATN_NX_TYPE_FIRST  \
 + 1)
 
+  /* Put the complex types after their component types, so that in (sequential)
+ tree streaming we can assert that their component types have already been
+ handled (see tree-streamer.c:record_common_node).  */
+  TI_COMPLEX_INTEGER_TYPE,
+  TI_COMPLEX_FLOAT_TYPE,
+  TI_COMPLEX_DOUBLE_TYPE,
+  TI_COMPLEX_LONG_DOUBLE_TYPE,
+
+  TI_COMPLEX_FLOAT16_TYPE,
+  

Re: [PR lto/77458] Avoid ICE in offloading with differing _FloatN, _FloatNx types (was: Advice sought for debugging a lto1 ICE (was: Implement C _FloatN, _FloatNx types [version 6]))

2016-09-19 Thread Thomas Schwinge
Hi!

On Mon, 19 Sep 2016 10:12:48 +0200, Richard Biener  
wrote:
> On Fri, Sep 16, 2016 at 7:07 PM, Joseph Myers  wrote:
> > On Fri, 16 Sep 2016, Thomas Schwinge wrote:
> >
> >> That's what I was afraid of: for example, I can't tell if it holds for
> >> all GCC configurations (back ends), that complex types' component types
> >> will always match one of the already existing global trees?  (I can
> >
> > Well, a component type could certainly match a target-specific type
> > instead (e.g. __ibm128 on powerpc, which if it's not long double won't be
> > any of the other types either).  That's a type registered with
> > lang_hooks.types.register_builtin_type, not one of the global trees.
> > (You can't write _Complex __ibm128, but can get such a type with _Complex
> > float __attribute__ ((__mode__ (__IC__))).  Or similarly, with ARM __fp16,
> > the complex type _Complex float __attribute__ ((__mode__ (__HC__))).)
> 
> The question is whether such a complex type could be a global tree which I
> don't think it could.

Specifically, my question was whether for every complex type that is part
of the global trees, it holds that the complex type's component type also
is part of the global trees?


Grüße
 Thomas


Re: [PR lto/77458] Avoid ICE in offloading with differing _FloatN, _FloatNx types (was: Advice sought for debugging a lto1 ICE (was: Implement C _FloatN, _FloatNx types [version 6]))

2016-09-19 Thread Richard Biener
On Fri, Sep 16, 2016 at 3:32 PM, Thomas Schwinge
 wrote:
> Hi!
>
> On Fri, 16 Sep 2016 10:59:16 +0200, Richard Biener 
>  wrote:
>> On Fri, Sep 16, 2016 at 9:05 AM, Thomas Schwinge
>>  wrote:
>> > On Thu, 08 Sep 2016 13:43:30 +0200, I wrote:
>> >> On Wed, 7 Sep 2016 14:23:18 +0200, Richard Biener 
>> >>  wrote:
>> >> > On Wed, Sep 7, 2016 at 1:52 PM, Thomas Schwinge 
>> >> >  wrote:
>> >> > > As I noted in :
>> >> > >
>> >> > > As of the PR32187 commit r239625 "Implement C _FloatN, _FloatNx 
>> >> > > types", nvptx
>> >> > > offloading is broken, ICEs in LTO stream-in.  Probably some kind 
>> >> > > of data-type
>> >> > > mismatch that is not visible with Intel MIC offloading (using the 
>> >> > > same data
>> >> > > types) but explodes with nvptx.  I'm having a look.
>> >
>> >> > [...] preload_common_nodes.  This is carefully crafted to _not_ diverge 
>> >> > by
>> >> > frontend (!) it wasn't even designed to cope with global trees being 
>> >> > present
>> >> > or not dependent on target (well, because the target is always the
>> >> > same! mind you!)
>> >>
>> >> Scary.  ;-/
>> >>
>> >> > Now -- in theory it should deal with NULLs just fine (resulting in
>> >> > error_mark_node), but it can diverge when there are additional
>> >> > compount types (like vectors, complex
>> >> > or array or record types) whose element types are not in the set of
>> >> > global trees.
>> >> > The complex _FloatN types would be such a case given they appear before 
>> >> > their
>> >> > components.  That mixes up the ordering at least.
>> >>
>> >> ACK, but that's also an issue for "regular" float/complex float, which
>> >> also is in "reverse" order.  But that's "fixed" by the recursion in
>> >> gcc/tree-streamer.c:record_common_node for "TREE_CODE (node) ==
>> >> COMPLEX_TYPE".  This likewise seems to work for the _FloatN types.
>> >
>> > So, that mechanism does work, but what's going wrong is the following:
>> > with differing target vs. offload target, we potentially (and actually
>> > do, in the case of x86_64 vs. nvptx) have different sets of _FloatN and
>> > _FloatNx types.  That is, for nvptx, a few of these don't exist (so,
>> > NULL_TREE), and so it follows their complex variants also don't exist,
>> > and thus the recursion that I just mentioned for complex types is no
>> > longer done in lockstep in the x86_64 cc1 vs. the nvptx lto1, hence we
>> > get an offset (of two, in this specific case), and consequently streaming
>> > explodes, for example, as soon as it hits a forward-reference (due to
>> > looking for tree 185 (x86_64 cc1 view; as encoded in the stream) when it
>> > should be looking for tree 183 (nvptx lto1 view).
>> >
>> >> (I've
>> >> put "fixed" in quotes -- doesn't that recursion mean that we're thus
>> >> putting "complex float", "float", [...], "float" (again) into the cache?
>> >> Anyway that's for later...)
>> >
>> > Maybe it would make sense to do this tree streaming in two passes: first
>> > build a set of what we actually need, and then stream that, without
>> > duplicates.  (Or, is also for these "pickled" trees their order relevant,
>> > so that one tree may only refer to other earlier but not later ones?
>> > Anyway, we could still remember the set of trees already streamed, and
>> > avoid the double streaming I described?)
>> >
>> > So I now understand that due to the stream format, the integer tree IDs
>> > (cache->next_id) have to match in all cc1/lto1s (etc.), so we'll have to
>> > make that work for x86_64 target vs. nvptx offload target being
>> > different.  (I'm pondering some ideas about how to rework that integer
>> > tree ID generation.)
>> >
>> > (I have not digested completely yet what the implications are for the
>> > skipping we're doing for some trees in preload_common_nodes), but here is
>> > a first patch that at least gets us rid of the immediate problem.  (I'll
>> > fix the TODO by adding a "#define TI_COMPLEX_FLOATN_NX_TYPE_LAST" to
>> > gcc/tree-core.h, OK?)  Will such a patch be OK for trunk, at least for
>> > now?
>>
>> Humm ... do we anywhere compare to those global trees by pointer equivalence?
>
> I have not verified that.  Does GCC permit/forbid that on a case-by-case
> basis -- which seems very fragile to me?
>
>> If so then it breaks LTO support for those types.
>
> OK, I think I understand that -- but I do have a "lto_stream_offload_p"
> conditional in my code changes, so these changes should only affect the
> offloading stream, in my understanding?
>
>> I think forcing proper ordering so that we can assert that at the
>> point we'd like
>> to recurse the nodes we recurse for are already in the cache would be a 
>> better
>> fix.
>
> ACK.  That's what I'd planned to look into as a next step.
>
>> This might need some additional global trees in case components do not
>> explicitely exist

Re: [PR lto/77458] Avoid ICE in offloading with differing _FloatN, _FloatNx types (was: Advice sought for debugging a lto1 ICE (was: Implement C _FloatN, _FloatNx types [version 6]))

2016-09-19 Thread Richard Biener
On Fri, Sep 16, 2016 at 7:07 PM, Joseph Myers  wrote:
> On Fri, 16 Sep 2016, Thomas Schwinge wrote:
>
>> That's what I was afraid of: for example, I can't tell if it holds for
>> all GCC configurations (back ends), that complex types' component types
>> will always match one of the already existing global trees?  (I can
>
> Well, a component type could certainly match a target-specific type
> instead (e.g. __ibm128 on powerpc, which if it's not long double won't be
> any of the other types either).  That's a type registered with
> lang_hooks.types.register_builtin_type, not one of the global trees.
> (You can't write _Complex __ibm128, but can get such a type with _Complex
> float __attribute__ ((__mode__ (__IC__))).  Or similarly, with ARM __fp16,
> the complex type _Complex float __attribute__ ((__mode__ (__HC__))).)

The question is whether such a complex type could be a global tree which I
don't think it could.

Richard.

> --
> Joseph S. Myers
> jos...@codesourcery.com


Re: [PR lto/77458] Avoid ICE in offloading with differing _FloatN, _FloatNx types (was: Advice sought for debugging a lto1 ICE (was: Implement C _FloatN, _FloatNx types [version 6]))

2016-09-16 Thread Joseph Myers
On Fri, 16 Sep 2016, Thomas Schwinge wrote:

> That's what I was afraid of: for example, I can't tell if it holds for
> all GCC configurations (back ends), that complex types' component types
> will always match one of the already existing global trees?  (I can

Well, a component type could certainly match a target-specific type 
instead (e.g. __ibm128 on powerpc, which if it's not long double won't be 
any of the other types either).  That's a type registered with 
lang_hooks.types.register_builtin_type, not one of the global trees.  
(You can't write _Complex __ibm128, but can get such a type with _Complex 
float __attribute__ ((__mode__ (__IC__))).  Or similarly, with ARM __fp16, 
the complex type _Complex float __attribute__ ((__mode__ (__HC__))).)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PR lto/77458] Avoid ICE in offloading with differing _FloatN, _FloatNx types (was: Advice sought for debugging a lto1 ICE (was: Implement C _FloatN, _FloatNx types [version 6]))

2016-09-16 Thread Joseph Myers
On Fri, 16 Sep 2016, Richard Biener wrote:

> Humm ... do we anywhere compare to those global trees by pointer equivalence?
> If so then it breaks LTO support for those types.

The C front end compares main variants to those types for handling usual 
arithmetic conversions (and more generally for type compatibility), but 
that's not relevant to LTO.  I don't think there should be such 
comparisons outside the front ends.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PR lto/77458] Avoid ICE in offloading with differing _FloatN, _FloatNx types (was: Advice sought for debugging a lto1 ICE (was: Implement C _FloatN, _FloatNx types [version 6]))

2016-09-16 Thread Thomas Schwinge
Hi!

On Fri, 16 Sep 2016 10:59:16 +0200, Richard Biener  
wrote:
> On Fri, Sep 16, 2016 at 9:05 AM, Thomas Schwinge
>  wrote:
> > On Thu, 08 Sep 2016 13:43:30 +0200, I wrote:
> >> On Wed, 7 Sep 2016 14:23:18 +0200, Richard Biener 
> >>  wrote:
> >> > On Wed, Sep 7, 2016 at 1:52 PM, Thomas Schwinge 
> >> >  wrote:
> >> > > As I noted in :
> >> > >
> >> > > As of the PR32187 commit r239625 "Implement C _FloatN, _FloatNx 
> >> > > types", nvptx
> >> > > offloading is broken, ICEs in LTO stream-in.  Probably some kind 
> >> > > of data-type
> >> > > mismatch that is not visible with Intel MIC offloading (using the 
> >> > > same data
> >> > > types) but explodes with nvptx.  I'm having a look.
> >
> >> > [...] preload_common_nodes.  This is carefully crafted to _not_ diverge 
> >> > by
> >> > frontend (!) it wasn't even designed to cope with global trees being 
> >> > present
> >> > or not dependent on target (well, because the target is always the
> >> > same! mind you!)
> >>
> >> Scary.  ;-/
> >>
> >> > Now -- in theory it should deal with NULLs just fine (resulting in
> >> > error_mark_node), but it can diverge when there are additional
> >> > compount types (like vectors, complex
> >> > or array or record types) whose element types are not in the set of
> >> > global trees.
> >> > The complex _FloatN types would be such a case given they appear before 
> >> > their
> >> > components.  That mixes up the ordering at least.
> >>
> >> ACK, but that's also an issue for "regular" float/complex float, which
> >> also is in "reverse" order.  But that's "fixed" by the recursion in
> >> gcc/tree-streamer.c:record_common_node for "TREE_CODE (node) ==
> >> COMPLEX_TYPE".  This likewise seems to work for the _FloatN types.
> >
> > So, that mechanism does work, but what's going wrong is the following:
> > with differing target vs. offload target, we potentially (and actually
> > do, in the case of x86_64 vs. nvptx) have different sets of _FloatN and
> > _FloatNx types.  That is, for nvptx, a few of these don't exist (so,
> > NULL_TREE), and so it follows their complex variants also don't exist,
> > and thus the recursion that I just mentioned for complex types is no
> > longer done in lockstep in the x86_64 cc1 vs. the nvptx lto1, hence we
> > get an offset (of two, in this specific case), and consequently streaming
> > explodes, for example, as soon as it hits a forward-reference (due to
> > looking for tree 185 (x86_64 cc1 view; as encoded in the stream) when it
> > should be looking for tree 183 (nvptx lto1 view).
> >
> >> (I've
> >> put "fixed" in quotes -- doesn't that recursion mean that we're thus
> >> putting "complex float", "float", [...], "float" (again) into the cache?
> >> Anyway that's for later...)
> >
> > Maybe it would make sense to do this tree streaming in two passes: first
> > build a set of what we actually need, and then stream that, without
> > duplicates.  (Or, is also for these "pickled" trees their order relevant,
> > so that one tree may only refer to other earlier but not later ones?
> > Anyway, we could still remember the set of trees already streamed, and
> > avoid the double streaming I described?)
> >
> > So I now understand that due to the stream format, the integer tree IDs
> > (cache->next_id) have to match in all cc1/lto1s (etc.), so we'll have to
> > make that work for x86_64 target vs. nvptx offload target being
> > different.  (I'm pondering some ideas about how to rework that integer
> > tree ID generation.)
> >
> > (I have not digested completely yet what the implications are for the
> > skipping we're doing for some trees in preload_common_nodes), but here is
> > a first patch that at least gets us rid of the immediate problem.  (I'll
> > fix the TODO by adding a "#define TI_COMPLEX_FLOATN_NX_TYPE_LAST" to
> > gcc/tree-core.h, OK?)  Will such a patch be OK for trunk, at least for
> > now?
> 
> Humm ... do we anywhere compare to those global trees by pointer equivalence?

I have not verified that.  Does GCC permit/forbid that on a case-by-case
basis -- which seems very fragile to me?

> If so then it breaks LTO support for those types.

OK, I think I understand that -- but I do have a "lto_stream_offload_p"
conditional in my code changes, so these changes should only affect the
offloading stream, in my understanding?

> I think forcing proper ordering so that we can assert that at the
> point we'd like
> to recurse the nodes we recurse for are already in the cache would be a better
> fix.

ACK.  That's what I'd planned to look into as a next step.

> This might need some additional global trees in case components do not
> explicitely exist

That's what I was afraid of: for example, I can't tell if it holds for
all GCC configurations (back ends), that complex types' component types
will always match one of the already existing 

Re: [PR lto/77458] Avoid ICE in offloading with differing _FloatN, _FloatNx types (was: Advice sought for debugging a lto1 ICE (was: Implement C _FloatN, _FloatNx types [version 6]))

2016-09-16 Thread Richard Biener
On Fri, Sep 16, 2016 at 9:05 AM, Thomas Schwinge
 wrote:
> Hi!
>
> (CCing Bernd and Jakub -- for your information, or: "amusement" -- as
> you've discussed offloading preload_common_nodes issues before...)
>
> Got to look into this some more, yesterday:
>
> On Thu, 08 Sep 2016 13:43:30 +0200, I wrote:
>> On Wed, 7 Sep 2016 14:23:18 +0200, Richard Biener 
>>  wrote:
>> > On Wed, Sep 7, 2016 at 1:52 PM, Thomas Schwinge  
>> > wrote:
>> > > On Fri, 19 Aug 2016 11:05:59 +, Joseph Myers 
>> > >  wrote:
>> > >> On Fri, 19 Aug 2016, Richard Biener wrote:
>> > >> > Can you quickly verify if LTO works with the new types?  I don't see 
>> > >> > anything
>> > >> > that would prevent it but having new global trees and backends 
>> > >> > initializing them
>> > >> > might come up with surprises (see 
>> > >> > tree-streamer.c:preload_common_nodes)
>> > >>
>> > >> Well, the execution tests are in gcc.dg/torture, which is run with 
>> > >> various
>> > >> options including -flto (and I've checked the testsuite logs to confirm
>> > >> these tests are indeed run with such options).  Is there something else
>> > >> you think should be tested?
>> > >
>> > > As I noted in :
>> > >
>> > > As of the PR32187 commit r239625 "Implement C _FloatN, _FloatNx 
>> > > types", nvptx
>> > > offloading is broken, ICEs in LTO stream-in.  Probably some kind of 
>> > > data-type
>> > > mismatch that is not visible with Intel MIC offloading (using the 
>> > > same data
>> > > types) but explodes with nvptx.  I'm having a look.
>
>> > [...] preload_common_nodes.  This is carefully crafted to _not_ diverge by
>> > frontend (!) it wasn't even designed to cope with global trees being 
>> > present
>> > or not dependent on target (well, because the target is always the
>> > same! mind you!)
>>
>> Scary.  ;-/
>>
>> > Now -- in theory it should deal with NULLs just fine (resulting in
>> > error_mark_node), but it can diverge when there are additional
>> > compount types (like vectors, complex
>> > or array or record types) whose element types are not in the set of
>> > global trees.
>> > The complex _FloatN types would be such a case given they appear before 
>> > their
>> > components.  That mixes up the ordering at least.
>>
>> ACK, but that's also an issue for "regular" float/complex float, which
>> also is in "reverse" order.  But that's "fixed" by the recursion in
>> gcc/tree-streamer.c:record_common_node for "TREE_CODE (node) ==
>> COMPLEX_TYPE".  This likewise seems to work for the _FloatN types.
>
> So, that mechanism does work, but what's going wrong is the following:
> with differing target vs. offload target, we potentially (and actually
> do, in the case of x86_64 vs. nvptx) have different sets of _FloatN and
> _FloatNx types.  That is, for nvptx, a few of these don't exist (so,
> NULL_TREE), and so it follows their complex variants also don't exist,
> and thus the recursion that I just mentioned for complex types is no
> longer done in lockstep in the x86_64 cc1 vs. the nvptx lto1, hence we
> get an offset (of two, in this specific case), and consequently streaming
> explodes, for example, as soon as it hits a forward-reference (due to
> looking for tree 185 (x86_64 cc1 view; as encoded in the stream) when it
> should be looking for tree 183 (nvptx lto1 view).
>
>> (I've
>> put "fixed" in quotes -- doesn't that recursion mean that we're thus
>> putting "complex float", "float", [...], "float" (again) into the cache?
>> Anyway that's for later...)
>
> Maybe it would make sense to do this tree streaming in two passes: first
> build a set of what we actually need, and then stream that, without
> duplicates.  (Or, is also for these "pickled" trees their order relevant,
> so that one tree may only refer to other earlier but not later ones?
> Anyway, we could still remember the set of trees already streamed, and
> avoid the double streaming I described?)
>
> So I now understand that due to the stream format, the integer tree IDs
> (cache->next_id) have to match in all cc1/lto1s (etc.), so we'll have to
> make that work for x86_64 target vs. nvptx offload target being
> different.  (I'm pondering some ideas about how to rework that integer
> tree ID generation.)
>
> (I have not digested completely yet what the implications are for the
> skipping we're doing for some trees in preload_common_nodes), but here is
> a first patch that at least gets us rid of the immediate problem.  (I'll
> fix the TODO by adding a "#define TI_COMPLEX_FLOATN_NX_TYPE_LAST" to
> gcc/tree-core.h, OK?)  Will such a patch be OK for trunk, at least for
> now?

Humm ... do we anywhere compare to those global trees by pointer equivalence?
If so then it breaks LTO support for those types.

I think forcing proper ordering so that we can assert that at the
point we'd like
to recurse the nodes we recurse for are already in the 

[PR lto/77458] Avoid ICE in offloading with differing _FloatN, _FloatNx types (was: Advice sought for debugging a lto1 ICE (was: Implement C _FloatN, _FloatNx types [version 6]))

2016-09-16 Thread Thomas Schwinge
Hi!

(CCing Bernd and Jakub -- for your information, or: "amusement" -- as
you've discussed offloading preload_common_nodes issues before...)

Got to look into this some more, yesterday:

On Thu, 08 Sep 2016 13:43:30 +0200, I wrote:
> On Wed, 7 Sep 2016 14:23:18 +0200, Richard Biener 
>  wrote:
> > On Wed, Sep 7, 2016 at 1:52 PM, Thomas Schwinge  
> > wrote:
> > > On Fri, 19 Aug 2016 11:05:59 +, Joseph Myers 
> > >  wrote:
> > >> On Fri, 19 Aug 2016, Richard Biener wrote:
> > >> > Can you quickly verify if LTO works with the new types?  I don't see 
> > >> > anything
> > >> > that would prevent it but having new global trees and backends 
> > >> > initializing them
> > >> > might come up with surprises (see tree-streamer.c:preload_common_nodes)
> > >>
> > >> Well, the execution tests are in gcc.dg/torture, which is run with 
> > >> various
> > >> options including -flto (and I've checked the testsuite logs to confirm
> > >> these tests are indeed run with such options).  Is there something else
> > >> you think should be tested?
> > >
> > > As I noted in :
> > >
> > > As of the PR32187 commit r239625 "Implement C _FloatN, _FloatNx 
> > > types", nvptx
> > > offloading is broken, ICEs in LTO stream-in.  Probably some kind of 
> > > data-type
> > > mismatch that is not visible with Intel MIC offloading (using the 
> > > same data
> > > types) but explodes with nvptx.  I'm having a look.

> > [...] preload_common_nodes.  This is carefully crafted to _not_ diverge by
> > frontend (!) it wasn't even designed to cope with global trees being present
> > or not dependent on target (well, because the target is always the
> > same! mind you!)
> 
> Scary.  ;-/
> 
> > Now -- in theory it should deal with NULLs just fine (resulting in
> > error_mark_node), but it can diverge when there are additional
> > compount types (like vectors, complex
> > or array or record types) whose element types are not in the set of
> > global trees.
> > The complex _FloatN types would be such a case given they appear before 
> > their
> > components.  That mixes up the ordering at least.
> 
> ACK, but that's also an issue for "regular" float/complex float, which
> also is in "reverse" order.  But that's "fixed" by the recursion in
> gcc/tree-streamer.c:record_common_node for "TREE_CODE (node) ==
> COMPLEX_TYPE".  This likewise seems to work for the _FloatN types.

So, that mechanism does work, but what's going wrong is the following:
with differing target vs. offload target, we potentially (and actually
do, in the case of x86_64 vs. nvptx) have different sets of _FloatN and
_FloatNx types.  That is, for nvptx, a few of these don't exist (so,
NULL_TREE), and so it follows their complex variants also don't exist,
and thus the recursion that I just mentioned for complex types is no
longer done in lockstep in the x86_64 cc1 vs. the nvptx lto1, hence we
get an offset (of two, in this specific case), and consequently streaming
explodes, for example, as soon as it hits a forward-reference (due to
looking for tree 185 (x86_64 cc1 view; as encoded in the stream) when it
should be looking for tree 183 (nvptx lto1 view).

> (I've
> put "fixed" in quotes -- doesn't that recursion mean that we're thus
> putting "complex float", "float", [...], "float" (again) into the cache?
> Anyway that's for later...)

Maybe it would make sense to do this tree streaming in two passes: first
build a set of what we actually need, and then stream that, without
duplicates.  (Or, is also for these "pickled" trees their order relevant,
so that one tree may only refer to other earlier but not later ones?
Anyway, we could still remember the set of trees already streamed, and
avoid the double streaming I described?)

So I now understand that due to the stream format, the integer tree IDs
(cache->next_id) have to match in all cc1/lto1s (etc.), so we'll have to
make that work for x86_64 target vs. nvptx offload target being
different.  (I'm pondering some ideas about how to rework that integer
tree ID generation.)

(I have not digested completely yet what the implications are for the
skipping we're doing for some trees in preload_common_nodes), but here is
a first patch that at least gets us rid of the immediate problem.  (I'll
fix the TODO by adding a "#define TI_COMPLEX_FLOATN_NX_TYPE_LAST" to
gcc/tree-core.h, OK?)  Will such a patch be OK for trunk, at least for
now?

commit d2045bd028104be2ede5ae1d5d7b9395e67a8180
Author: Thomas Schwinge 
Date:   Thu Sep 15 21:56:16 2016 +0200

[PR lto/77458] Avoid ICE in offloading with differing _FloatN, _FloatNx 
types

gcc/
PR lto/77458
* tree-streamer.c (preload_common_nodes): Skip _FloatN and
_FloatNx types if offloading.
---
 gcc/tree-streamer.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git gcc/tree-streamer.c