Re: Understanding tree_swap_operands_p wrt SSA name versions

2018-07-17 Thread Richard Biener
On Tue, Jul 17, 2018 at 5:20 AM Michael Ploujnikov
 wrote:
>
> On 2018-07-16 04:30 AM, Richard Biener wrote:
> > On Mon, Jul 16, 2018 at 12:19 AM Michael Ploujnikov
> >  wrote:
> >>
> >> On 2018-07-04 04:52 AM, Richard Biener wrote:
> >>> On Tue, Jul 3, 2018 at 9:09 PM Jeff Law  wrote:
> 
>  On 07/03/2018 11:55 AM, Michael Ploujnikov wrote:
> > On 2018-07-03 12:46 PM, Richard Biener wrote:
> >> On July 3, 2018 4:56:57 PM GMT+02:00, Michael Ploujnikov 
> >>  wrote:
> >>> On 2018-06-20 04:23 AM, Richard Biener wrote:
>  On Wed, Jun 20, 2018 at 7:31 AM Jeff Law  wrote:
> >
> > On 06/19/2018 12:30 PM, Michael Ploujnikov wrote:
> >> Hi everyone,
> >>
> >> (I hope this is the right place to ask, if not my apologies; please
> >> point me in the right direction)
> >>
> >> I'm trying to get a better understanding of the following part in
> >> tree_swap_operands_p():
> >>
> >>   /* It is preferable to swap two SSA_NAME to ensure a canonical
> >>> form
> >>  for commutative and comparison operators.  Ensuring a
> >>> canonical
> >>  form allows the optimizers to find additional redundancies
> >>> without
> >>  having to explicitly check for both orderings.  */
> >>   if (TREE_CODE (arg0) == SSA_NAME
> >>   && TREE_CODE (arg1) == SSA_NAME
> >>   && SSA_NAME_VERSION (arg0) > SSA_NAME_VERSION (arg1))
> >> return 1;
> >>
> >> My questions in no particular order: It looks like this was added
> >>> in
> >> 2004. I couldn't find any info other than what's in the
> >>> corresponding
> >> commit (cc0bdf913) so I'm wondering if the canonical form/order
> >>> still
> >> relevant/needed today? Does the ordering have to be done based on
> >>> the
> >> name versions specifically? Or can it be based on something more
> >> intrinsic to the input source code rather than a GCC internal
> >>> value, eg:
> >> would alphabetic sort order of the variable names be a reasonable
> >> replacement?
> > Canonicalization is still important and useful.
> 
>  Indeed.
> 
> > However, canonicalizing on SSA_NAMEs is problematical due to the way
> >>> we
> > recycle nodes and re-pack them.
> 
>  In the past we made sure to not disrupt order - hopefully that didn't
> >>> change
>  so the re-packing shoudln't invaidate previous canonicalization:
> 
>  static void
>  release_free_names_and_compact_live_names (function *fun)
>  {
>  ...
>    /* And compact the SSA number space.  We make sure to not change
> >>> the
>   relative order of SSA versions.  */
>    for (i = 1, j = 1; i < fun->gimple_df->ssa_names->length (); ++i)
>  {
> 
> 
> > I think defining additional rules for canonicalization prior to
> >>> using
> > SSA_NAME_VERSION as the fallback would be looked upon favorably.
> 
>  I don't see a good reason to do that, it will be harder to spot
> >>> canonicalization
>  issues and it will take extra compile-time.
> 
> > Note however, that many of the _DECL nodes referenced by SSA_NAMEs
> >>> are
> > temporaries generated by the compiler and do not correspond to any
> > declared/defined object in the original source.  So you'll still
> >>> need
> > the SSA_NAME_VERSION (or something as stable or better)
> >>> canonicalization
> > to handle those cases.
> 
>  And not all SSA_NAMEs have underlying _DECL nodes (or IDENTIFIER_NODE
> >>> names).
> 
>  Richard.
> 
> > Jeff
> >>>
> >>> After a bit more digging I found that insert_phi_nodes inserts PHIs in
> >>> the order of UIDs, which indirectly affects the order of vars in
> >>> old_ssa_names, which in turn affects the order in which
> >>> make_ssa_name_fn
> >>> is called to pick SSA versions from FREE_SSANAMES so in the end
> >>> ordering by SSA_NAME_VERSION's is more or less equivalent to ordering
> >>> by
> >>> UIDs. I'm trying to figure out if there's a way to avoid depending on
> >>> UIDs being ordered in a certain way. So if tree_swap_operands_p stays
> >>> the same I'm wondering if there's some other info available at the
> >>> point
> >>> of insert_phi_nodes that would be a good replacement for UID. From my
> >>> very limited experience with a very small source input, and if I
> >>> understand things correctly, all of the var_infos have a var which is
> >>> DECL_P and thus should have an IDENTIFIER_NODE. Is that true in the
> >>> general case? I don't fully understand what are all the things that
> 

Re: Understanding tree_swap_operands_p wrt SSA name versions

2018-07-16 Thread Michael Ploujnikov
On 2018-07-16 04:30 AM, Richard Biener wrote:
> On Mon, Jul 16, 2018 at 12:19 AM Michael Ploujnikov
>  wrote:
>>
>> On 2018-07-04 04:52 AM, Richard Biener wrote:
>>> On Tue, Jul 3, 2018 at 9:09 PM Jeff Law  wrote:

 On 07/03/2018 11:55 AM, Michael Ploujnikov wrote:
> On 2018-07-03 12:46 PM, Richard Biener wrote:
>> On July 3, 2018 4:56:57 PM GMT+02:00, Michael Ploujnikov 
>>  wrote:
>>> On 2018-06-20 04:23 AM, Richard Biener wrote:
 On Wed, Jun 20, 2018 at 7:31 AM Jeff Law  wrote:
>
> On 06/19/2018 12:30 PM, Michael Ploujnikov wrote:
>> Hi everyone,
>>
>> (I hope this is the right place to ask, if not my apologies; please
>> point me in the right direction)
>>
>> I'm trying to get a better understanding of the following part in
>> tree_swap_operands_p():
>>
>>   /* It is preferable to swap two SSA_NAME to ensure a canonical
>>> form
>>  for commutative and comparison operators.  Ensuring a
>>> canonical
>>  form allows the optimizers to find additional redundancies
>>> without
>>  having to explicitly check for both orderings.  */
>>   if (TREE_CODE (arg0) == SSA_NAME
>>   && TREE_CODE (arg1) == SSA_NAME
>>   && SSA_NAME_VERSION (arg0) > SSA_NAME_VERSION (arg1))
>> return 1;
>>
>> My questions in no particular order: It looks like this was added
>>> in
>> 2004. I couldn't find any info other than what's in the
>>> corresponding
>> commit (cc0bdf913) so I'm wondering if the canonical form/order
>>> still
>> relevant/needed today? Does the ordering have to be done based on
>>> the
>> name versions specifically? Or can it be based on something more
>> intrinsic to the input source code rather than a GCC internal
>>> value, eg:
>> would alphabetic sort order of the variable names be a reasonable
>> replacement?
> Canonicalization is still important and useful.

 Indeed.

> However, canonicalizing on SSA_NAMEs is problematical due to the way
>>> we
> recycle nodes and re-pack them.

 In the past we made sure to not disrupt order - hopefully that didn't
>>> change
 so the re-packing shoudln't invaidate previous canonicalization:

 static void
 release_free_names_and_compact_live_names (function *fun)
 {
 ...
   /* And compact the SSA number space.  We make sure to not change
>>> the
  relative order of SSA versions.  */
   for (i = 1, j = 1; i < fun->gimple_df->ssa_names->length (); ++i)
 {


> I think defining additional rules for canonicalization prior to
>>> using
> SSA_NAME_VERSION as the fallback would be looked upon favorably.

 I don't see a good reason to do that, it will be harder to spot
>>> canonicalization
 issues and it will take extra compile-time.

> Note however, that many of the _DECL nodes referenced by SSA_NAMEs
>>> are
> temporaries generated by the compiler and do not correspond to any
> declared/defined object in the original source.  So you'll still
>>> need
> the SSA_NAME_VERSION (or something as stable or better)
>>> canonicalization
> to handle those cases.

 And not all SSA_NAMEs have underlying _DECL nodes (or IDENTIFIER_NODE
>>> names).

 Richard.

> Jeff
>>>
>>> After a bit more digging I found that insert_phi_nodes inserts PHIs in
>>> the order of UIDs, which indirectly affects the order of vars in
>>> old_ssa_names, which in turn affects the order in which
>>> make_ssa_name_fn
>>> is called to pick SSA versions from FREE_SSANAMES so in the end
>>> ordering by SSA_NAME_VERSION's is more or less equivalent to ordering
>>> by
>>> UIDs. I'm trying to figure out if there's a way to avoid depending on
>>> UIDs being ordered in a certain way. So if tree_swap_operands_p stays
>>> the same I'm wondering if there's some other info available at the
>>> point
>>> of insert_phi_nodes that would be a good replacement for UID. From my
>>> very limited experience with a very small source input, and if I
>>> understand things correctly, all of the var_infos have a var which is
>>> DECL_P and thus should have an IDENTIFIER_NODE. Is that true in the
>>> general case? I don't fully understand what are all the things that
>>> insert_phi_nodes iterates over.
>>
>> Why do you want to remove the dependence on UID ordering? It's pervasive 
>> throughout the whole compilation...
>>
>> Richard.
>>
>>> - Michael
>>
>
>
> Well, I'm working on a reduction of the 

Re: Understanding tree_swap_operands_p wrt SSA name versions

2018-07-16 Thread Richard Biener
On Mon, Jul 16, 2018 at 12:19 AM Michael Ploujnikov
 wrote:
>
> On 2018-07-04 04:52 AM, Richard Biener wrote:
> > On Tue, Jul 3, 2018 at 9:09 PM Jeff Law  wrote:
> >>
> >> On 07/03/2018 11:55 AM, Michael Ploujnikov wrote:
> >>> On 2018-07-03 12:46 PM, Richard Biener wrote:
>  On July 3, 2018 4:56:57 PM GMT+02:00, Michael Ploujnikov 
>   wrote:
> > On 2018-06-20 04:23 AM, Richard Biener wrote:
> >> On Wed, Jun 20, 2018 at 7:31 AM Jeff Law  wrote:
> >>>
> >>> On 06/19/2018 12:30 PM, Michael Ploujnikov wrote:
>  Hi everyone,
> 
>  (I hope this is the right place to ask, if not my apologies; please
>  point me in the right direction)
> 
>  I'm trying to get a better understanding of the following part in
>  tree_swap_operands_p():
> 
>    /* It is preferable to swap two SSA_NAME to ensure a canonical
> > form
>   for commutative and comparison operators.  Ensuring a
> > canonical
>   form allows the optimizers to find additional redundancies
> > without
>   having to explicitly check for both orderings.  */
>    if (TREE_CODE (arg0) == SSA_NAME
>    && TREE_CODE (arg1) == SSA_NAME
>    && SSA_NAME_VERSION (arg0) > SSA_NAME_VERSION (arg1))
>  return 1;
> 
>  My questions in no particular order: It looks like this was added
> > in
>  2004. I couldn't find any info other than what's in the
> > corresponding
>  commit (cc0bdf913) so I'm wondering if the canonical form/order
> > still
>  relevant/needed today? Does the ordering have to be done based on
> > the
>  name versions specifically? Or can it be based on something more
>  intrinsic to the input source code rather than a GCC internal
> > value, eg:
>  would alphabetic sort order of the variable names be a reasonable
>  replacement?
> >>> Canonicalization is still important and useful.
> >>
> >> Indeed.
> >>
> >>> However, canonicalizing on SSA_NAMEs is problematical due to the way
> > we
> >>> recycle nodes and re-pack them.
> >>
> >> In the past we made sure to not disrupt order - hopefully that didn't
> > change
> >> so the re-packing shoudln't invaidate previous canonicalization:
> >>
> >> static void
> >> release_free_names_and_compact_live_names (function *fun)
> >> {
> >> ...
> >>   /* And compact the SSA number space.  We make sure to not change
> > the
> >>  relative order of SSA versions.  */
> >>   for (i = 1, j = 1; i < fun->gimple_df->ssa_names->length (); ++i)
> >> {
> >>
> >>
> >>> I think defining additional rules for canonicalization prior to
> > using
> >>> SSA_NAME_VERSION as the fallback would be looked upon favorably.
> >>
> >> I don't see a good reason to do that, it will be harder to spot
> > canonicalization
> >> issues and it will take extra compile-time.
> >>
> >>> Note however, that many of the _DECL nodes referenced by SSA_NAMEs
> > are
> >>> temporaries generated by the compiler and do not correspond to any
> >>> declared/defined object in the original source.  So you'll still
> > need
> >>> the SSA_NAME_VERSION (or something as stable or better)
> > canonicalization
> >>> to handle those cases.
> >>
> >> And not all SSA_NAMEs have underlying _DECL nodes (or IDENTIFIER_NODE
> > names).
> >>
> >> Richard.
> >>
> >>> Jeff
> >
> > After a bit more digging I found that insert_phi_nodes inserts PHIs in
> > the order of UIDs, which indirectly affects the order of vars in
> > old_ssa_names, which in turn affects the order in which
> > make_ssa_name_fn
> > is called to pick SSA versions from FREE_SSANAMES so in the end
> > ordering by SSA_NAME_VERSION's is more or less equivalent to ordering
> > by
> > UIDs. I'm trying to figure out if there's a way to avoid depending on
> > UIDs being ordered in a certain way. So if tree_swap_operands_p stays
> > the same I'm wondering if there's some other info available at the
> > point
> > of insert_phi_nodes that would be a good replacement for UID. From my
> > very limited experience with a very small source input, and if I
> > understand things correctly, all of the var_infos have a var which is
> > DECL_P and thus should have an IDENTIFIER_NODE. Is that true in the
> > general case? I don't fully understand what are all the things that
> > insert_phi_nodes iterates over.
> 
>  Why do you want to remove the dependence on UID ordering? It's pervasive 
>  throughout the whole compilation...
> 
>  Richard.
> 
> > - Michael
> 
> >>>
> >>>
> >>> Well, I'm working on a reduction of the number of changes seen with
> >>> binary diffing (a 

Re: Understanding tree_swap_operands_p wrt SSA name versions

2018-07-15 Thread Michael Ploujnikov
On 2018-07-04 04:52 AM, Richard Biener wrote:
> On Tue, Jul 3, 2018 at 9:09 PM Jeff Law  wrote:
>>
>> On 07/03/2018 11:55 AM, Michael Ploujnikov wrote:
>>> On 2018-07-03 12:46 PM, Richard Biener wrote:
 On July 3, 2018 4:56:57 PM GMT+02:00, Michael Ploujnikov 
  wrote:
> On 2018-06-20 04:23 AM, Richard Biener wrote:
>> On Wed, Jun 20, 2018 at 7:31 AM Jeff Law  wrote:
>>>
>>> On 06/19/2018 12:30 PM, Michael Ploujnikov wrote:
 Hi everyone,

 (I hope this is the right place to ask, if not my apologies; please
 point me in the right direction)

 I'm trying to get a better understanding of the following part in
 tree_swap_operands_p():

   /* It is preferable to swap two SSA_NAME to ensure a canonical
> form
  for commutative and comparison operators.  Ensuring a
> canonical
  form allows the optimizers to find additional redundancies
> without
  having to explicitly check for both orderings.  */
   if (TREE_CODE (arg0) == SSA_NAME
   && TREE_CODE (arg1) == SSA_NAME
   && SSA_NAME_VERSION (arg0) > SSA_NAME_VERSION (arg1))
 return 1;

 My questions in no particular order: It looks like this was added
> in
 2004. I couldn't find any info other than what's in the
> corresponding
 commit (cc0bdf913) so I'm wondering if the canonical form/order
> still
 relevant/needed today? Does the ordering have to be done based on
> the
 name versions specifically? Or can it be based on something more
 intrinsic to the input source code rather than a GCC internal
> value, eg:
 would alphabetic sort order of the variable names be a reasonable
 replacement?
>>> Canonicalization is still important and useful.
>>
>> Indeed.
>>
>>> However, canonicalizing on SSA_NAMEs is problematical due to the way
> we
>>> recycle nodes and re-pack them.
>>
>> In the past we made sure to not disrupt order - hopefully that didn't
> change
>> so the re-packing shoudln't invaidate previous canonicalization:
>>
>> static void
>> release_free_names_and_compact_live_names (function *fun)
>> {
>> ...
>>   /* And compact the SSA number space.  We make sure to not change
> the
>>  relative order of SSA versions.  */
>>   for (i = 1, j = 1; i < fun->gimple_df->ssa_names->length (); ++i)
>> {
>>
>>
>>> I think defining additional rules for canonicalization prior to
> using
>>> SSA_NAME_VERSION as the fallback would be looked upon favorably.
>>
>> I don't see a good reason to do that, it will be harder to spot
> canonicalization
>> issues and it will take extra compile-time.
>>
>>> Note however, that many of the _DECL nodes referenced by SSA_NAMEs
> are
>>> temporaries generated by the compiler and do not correspond to any
>>> declared/defined object in the original source.  So you'll still
> need
>>> the SSA_NAME_VERSION (or something as stable or better)
> canonicalization
>>> to handle those cases.
>>
>> And not all SSA_NAMEs have underlying _DECL nodes (or IDENTIFIER_NODE
> names).
>>
>> Richard.
>>
>>> Jeff
>
> After a bit more digging I found that insert_phi_nodes inserts PHIs in
> the order of UIDs, which indirectly affects the order of vars in
> old_ssa_names, which in turn affects the order in which
> make_ssa_name_fn
> is called to pick SSA versions from FREE_SSANAMES so in the end
> ordering by SSA_NAME_VERSION's is more or less equivalent to ordering
> by
> UIDs. I'm trying to figure out if there's a way to avoid depending on
> UIDs being ordered in a certain way. So if tree_swap_operands_p stays
> the same I'm wondering if there's some other info available at the
> point
> of insert_phi_nodes that would be a good replacement for UID. From my
> very limited experience with a very small source input, and if I
> understand things correctly, all of the var_infos have a var which is
> DECL_P and thus should have an IDENTIFIER_NODE. Is that true in the
> general case? I don't fully understand what are all the things that
> insert_phi_nodes iterates over.

 Why do you want to remove the dependence on UID ordering? It's pervasive 
 throughout the whole compilation...

 Richard.

> - Michael

>>>
>>>
>>> Well, I'm working on a reduction of the number of changes seen with
>>> binary diffing (a la https://wiki.debian.org/ReproducibleBuilds) and
>>> since current UID assignment is essentially tied to the order of things
>>> in the input source code one function's changes can cascade to others
>>> (even when they're unchanged). As you said UID dependence is quiet
>>> pervasive, and although 

Re: Understanding tree_swap_operands_p wrt SSA name versions

2018-07-04 Thread Richard Biener
On Tue, Jul 3, 2018 at 9:09 PM Jeff Law  wrote:
>
> On 07/03/2018 11:55 AM, Michael Ploujnikov wrote:
> > On 2018-07-03 12:46 PM, Richard Biener wrote:
> >> On July 3, 2018 4:56:57 PM GMT+02:00, Michael Ploujnikov 
> >>  wrote:
> >>> On 2018-06-20 04:23 AM, Richard Biener wrote:
>  On Wed, Jun 20, 2018 at 7:31 AM Jeff Law  wrote:
> >
> > On 06/19/2018 12:30 PM, Michael Ploujnikov wrote:
> >> Hi everyone,
> >>
> >> (I hope this is the right place to ask, if not my apologies; please
> >> point me in the right direction)
> >>
> >> I'm trying to get a better understanding of the following part in
> >> tree_swap_operands_p():
> >>
> >>   /* It is preferable to swap two SSA_NAME to ensure a canonical
> >>> form
> >>  for commutative and comparison operators.  Ensuring a
> >>> canonical
> >>  form allows the optimizers to find additional redundancies
> >>> without
> >>  having to explicitly check for both orderings.  */
> >>   if (TREE_CODE (arg0) == SSA_NAME
> >>   && TREE_CODE (arg1) == SSA_NAME
> >>   && SSA_NAME_VERSION (arg0) > SSA_NAME_VERSION (arg1))
> >> return 1;
> >>
> >> My questions in no particular order: It looks like this was added
> >>> in
> >> 2004. I couldn't find any info other than what's in the
> >>> corresponding
> >> commit (cc0bdf913) so I'm wondering if the canonical form/order
> >>> still
> >> relevant/needed today? Does the ordering have to be done based on
> >>> the
> >> name versions specifically? Or can it be based on something more
> >> intrinsic to the input source code rather than a GCC internal
> >>> value, eg:
> >> would alphabetic sort order of the variable names be a reasonable
> >> replacement?
> > Canonicalization is still important and useful.
> 
>  Indeed.
> 
> > However, canonicalizing on SSA_NAMEs is problematical due to the way
> >>> we
> > recycle nodes and re-pack them.
> 
>  In the past we made sure to not disrupt order - hopefully that didn't
> >>> change
>  so the re-packing shoudln't invaidate previous canonicalization:
> 
>  static void
>  release_free_names_and_compact_live_names (function *fun)
>  {
>  ...
>    /* And compact the SSA number space.  We make sure to not change
> >>> the
>   relative order of SSA versions.  */
>    for (i = 1, j = 1; i < fun->gimple_df->ssa_names->length (); ++i)
>  {
> 
> 
> > I think defining additional rules for canonicalization prior to
> >>> using
> > SSA_NAME_VERSION as the fallback would be looked upon favorably.
> 
>  I don't see a good reason to do that, it will be harder to spot
> >>> canonicalization
>  issues and it will take extra compile-time.
> 
> > Note however, that many of the _DECL nodes referenced by SSA_NAMEs
> >>> are
> > temporaries generated by the compiler and do not correspond to any
> > declared/defined object in the original source.  So you'll still
> >>> need
> > the SSA_NAME_VERSION (or something as stable or better)
> >>> canonicalization
> > to handle those cases.
> 
>  And not all SSA_NAMEs have underlying _DECL nodes (or IDENTIFIER_NODE
> >>> names).
> 
>  Richard.
> 
> > Jeff
> >>>
> >>> After a bit more digging I found that insert_phi_nodes inserts PHIs in
> >>> the order of UIDs, which indirectly affects the order of vars in
> >>> old_ssa_names, which in turn affects the order in which
> >>> make_ssa_name_fn
> >>> is called to pick SSA versions from FREE_SSANAMES so in the end
> >>> ordering by SSA_NAME_VERSION's is more or less equivalent to ordering
> >>> by
> >>> UIDs. I'm trying to figure out if there's a way to avoid depending on
> >>> UIDs being ordered in a certain way. So if tree_swap_operands_p stays
> >>> the same I'm wondering if there's some other info available at the
> >>> point
> >>> of insert_phi_nodes that would be a good replacement for UID. From my
> >>> very limited experience with a very small source input, and if I
> >>> understand things correctly, all of the var_infos have a var which is
> >>> DECL_P and thus should have an IDENTIFIER_NODE. Is that true in the
> >>> general case? I don't fully understand what are all the things that
> >>> insert_phi_nodes iterates over.
> >>
> >> Why do you want to remove the dependence on UID ordering? It's pervasive 
> >> throughout the whole compilation...
> >>
> >> Richard.
> >>
> >>> - Michael
> >>
> >
> >
> > Well, I'm working on a reduction of the number of changes seen with
> > binary diffing (a la https://wiki.debian.org/ReproducibleBuilds) and
> > since current UID assignment is essentially tied to the order of things
> > in the input source code one function's changes can cascade to others
> > (even when they're unchanged). As you said UID dependence is quiet
> > pervasive, and although finding and improving individual cases (such as
> > 

Re: Understanding tree_swap_operands_p wrt SSA name versions

2018-07-03 Thread Jeff Law
On 07/03/2018 11:55 AM, Michael Ploujnikov wrote:
> On 2018-07-03 12:46 PM, Richard Biener wrote:
>> On July 3, 2018 4:56:57 PM GMT+02:00, Michael Ploujnikov 
>>  wrote:
>>> On 2018-06-20 04:23 AM, Richard Biener wrote:
 On Wed, Jun 20, 2018 at 7:31 AM Jeff Law  wrote:
>
> On 06/19/2018 12:30 PM, Michael Ploujnikov wrote:
>> Hi everyone,
>>
>> (I hope this is the right place to ask, if not my apologies; please
>> point me in the right direction)
>>
>> I'm trying to get a better understanding of the following part in
>> tree_swap_operands_p():
>>
>>   /* It is preferable to swap two SSA_NAME to ensure a canonical
>>> form
>>  for commutative and comparison operators.  Ensuring a
>>> canonical
>>  form allows the optimizers to find additional redundancies
>>> without
>>  having to explicitly check for both orderings.  */
>>   if (TREE_CODE (arg0) == SSA_NAME
>>   && TREE_CODE (arg1) == SSA_NAME
>>   && SSA_NAME_VERSION (arg0) > SSA_NAME_VERSION (arg1))
>> return 1;
>>
>> My questions in no particular order: It looks like this was added
>>> in
>> 2004. I couldn't find any info other than what's in the
>>> corresponding
>> commit (cc0bdf913) so I'm wondering if the canonical form/order
>>> still
>> relevant/needed today? Does the ordering have to be done based on
>>> the
>> name versions specifically? Or can it be based on something more
>> intrinsic to the input source code rather than a GCC internal
>>> value, eg:
>> would alphabetic sort order of the variable names be a reasonable
>> replacement?
> Canonicalization is still important and useful.

 Indeed.

> However, canonicalizing on SSA_NAMEs is problematical due to the way
>>> we
> recycle nodes and re-pack them.

 In the past we made sure to not disrupt order - hopefully that didn't
>>> change
 so the re-packing shoudln't invaidate previous canonicalization:

 static void
 release_free_names_and_compact_live_names (function *fun)
 {
 ...
   /* And compact the SSA number space.  We make sure to not change
>>> the
  relative order of SSA versions.  */
   for (i = 1, j = 1; i < fun->gimple_df->ssa_names->length (); ++i)
 {


> I think defining additional rules for canonicalization prior to
>>> using
> SSA_NAME_VERSION as the fallback would be looked upon favorably.

 I don't see a good reason to do that, it will be harder to spot
>>> canonicalization
 issues and it will take extra compile-time.

> Note however, that many of the _DECL nodes referenced by SSA_NAMEs
>>> are
> temporaries generated by the compiler and do not correspond to any
> declared/defined object in the original source.  So you'll still
>>> need
> the SSA_NAME_VERSION (or something as stable or better)
>>> canonicalization
> to handle those cases.

 And not all SSA_NAMEs have underlying _DECL nodes (or IDENTIFIER_NODE
>>> names).

 Richard.

> Jeff
>>>
>>> After a bit more digging I found that insert_phi_nodes inserts PHIs in
>>> the order of UIDs, which indirectly affects the order of vars in
>>> old_ssa_names, which in turn affects the order in which
>>> make_ssa_name_fn
>>> is called to pick SSA versions from FREE_SSANAMES so in the end
>>> ordering by SSA_NAME_VERSION's is more or less equivalent to ordering
>>> by
>>> UIDs. I'm trying to figure out if there's a way to avoid depending on
>>> UIDs being ordered in a certain way. So if tree_swap_operands_p stays
>>> the same I'm wondering if there's some other info available at the
>>> point
>>> of insert_phi_nodes that would be a good replacement for UID. From my
>>> very limited experience with a very small source input, and if I
>>> understand things correctly, all of the var_infos have a var which is
>>> DECL_P and thus should have an IDENTIFIER_NODE. Is that true in the
>>> general case? I don't fully understand what are all the things that
>>> insert_phi_nodes iterates over.
>>
>> Why do you want to remove the dependence on UID ordering? It's pervasive 
>> throughout the whole compilation... 
>>
>> Richard. 
>>
>>> - Michael
>>
> 
> 
> Well, I'm working on a reduction of the number of changes seen with
> binary diffing (a la https://wiki.debian.org/ReproducibleBuilds) and
> since current UID assignment is essentially tied to the order of things
> in the input source code one function's changes can cascade to others
> (even when they're unchanged). As you said UID dependence is quiet
> pervasive, and although finding and improving individual cases (such as
> tree_swap_operands_p) won't make it perfect, I think it will be a step
> in the positive direction.
> 
> Also, I have some ideas for a UID assignment scheme that might improve
> things overall, that I'll try to share after I get back from vacation.
I'm still not sure what the point is.  

Re: Understanding tree_swap_operands_p wrt SSA name versions

2018-07-03 Thread Michael Ploujnikov
On 2018-07-03 12:46 PM, Richard Biener wrote:
> On July 3, 2018 4:56:57 PM GMT+02:00, Michael Ploujnikov 
>  wrote:
>> On 2018-06-20 04:23 AM, Richard Biener wrote:
>>> On Wed, Jun 20, 2018 at 7:31 AM Jeff Law  wrote:

 On 06/19/2018 12:30 PM, Michael Ploujnikov wrote:
> Hi everyone,
>
> (I hope this is the right place to ask, if not my apologies; please
> point me in the right direction)
>
> I'm trying to get a better understanding of the following part in
> tree_swap_operands_p():
>
>   /* It is preferable to swap two SSA_NAME to ensure a canonical
>> form
>  for commutative and comparison operators.  Ensuring a
>> canonical
>  form allows the optimizers to find additional redundancies
>> without
>  having to explicitly check for both orderings.  */
>   if (TREE_CODE (arg0) == SSA_NAME
>   && TREE_CODE (arg1) == SSA_NAME
>   && SSA_NAME_VERSION (arg0) > SSA_NAME_VERSION (arg1))
> return 1;
>
> My questions in no particular order: It looks like this was added
>> in
> 2004. I couldn't find any info other than what's in the
>> corresponding
> commit (cc0bdf913) so I'm wondering if the canonical form/order
>> still
> relevant/needed today? Does the ordering have to be done based on
>> the
> name versions specifically? Or can it be based on something more
> intrinsic to the input source code rather than a GCC internal
>> value, eg:
> would alphabetic sort order of the variable names be a reasonable
> replacement?
 Canonicalization is still important and useful.
>>>
>>> Indeed.
>>>
 However, canonicalizing on SSA_NAMEs is problematical due to the way
>> we
 recycle nodes and re-pack them.
>>>
>>> In the past we made sure to not disrupt order - hopefully that didn't
>> change
>>> so the re-packing shoudln't invaidate previous canonicalization:
>>>
>>> static void
>>> release_free_names_and_compact_live_names (function *fun)
>>> {
>>> ...
>>>   /* And compact the SSA number space.  We make sure to not change
>> the
>>>  relative order of SSA versions.  */
>>>   for (i = 1, j = 1; i < fun->gimple_df->ssa_names->length (); ++i)
>>> {
>>>
>>>
 I think defining additional rules for canonicalization prior to
>> using
 SSA_NAME_VERSION as the fallback would be looked upon favorably.
>>>
>>> I don't see a good reason to do that, it will be harder to spot
>> canonicalization
>>> issues and it will take extra compile-time.
>>>
 Note however, that many of the _DECL nodes referenced by SSA_NAMEs
>> are
 temporaries generated by the compiler and do not correspond to any
 declared/defined object in the original source.  So you'll still
>> need
 the SSA_NAME_VERSION (or something as stable or better)
>> canonicalization
 to handle those cases.
>>>
>>> And not all SSA_NAMEs have underlying _DECL nodes (or IDENTIFIER_NODE
>> names).
>>>
>>> Richard.
>>>
 Jeff
>>
>> After a bit more digging I found that insert_phi_nodes inserts PHIs in
>> the order of UIDs, which indirectly affects the order of vars in
>> old_ssa_names, which in turn affects the order in which
>> make_ssa_name_fn
>> is called to pick SSA versions from FREE_SSANAMES so in the end
>> ordering by SSA_NAME_VERSION's is more or less equivalent to ordering
>> by
>> UIDs. I'm trying to figure out if there's a way to avoid depending on
>> UIDs being ordered in a certain way. So if tree_swap_operands_p stays
>> the same I'm wondering if there's some other info available at the
>> point
>> of insert_phi_nodes that would be a good replacement for UID. From my
>> very limited experience with a very small source input, and if I
>> understand things correctly, all of the var_infos have a var which is
>> DECL_P and thus should have an IDENTIFIER_NODE. Is that true in the
>> general case? I don't fully understand what are all the things that
>> insert_phi_nodes iterates over.
> 
> Why do you want to remove the dependence on UID ordering? It's pervasive 
> throughout the whole compilation... 
> 
> Richard. 
> 
>> - Michael
> 


Well, I'm working on a reduction of the number of changes seen with
binary diffing (a la https://wiki.debian.org/ReproducibleBuilds) and
since current UID assignment is essentially tied to the order of things
in the input source code one function's changes can cascade to others
(even when they're unchanged). As you said UID dependence is quiet
pervasive, and although finding and improving individual cases (such as
tree_swap_operands_p) won't make it perfect, I think it will be a step
in the positive direction.

Also, I have some ideas for a UID assignment scheme that might improve
things overall, that I'll try to share after I get back from vacation.

- Michael



signature.asc
Description: OpenPGP digital signature


Re: Understanding tree_swap_operands_p wrt SSA name versions

2018-07-03 Thread Richard Biener
On July 3, 2018 4:56:57 PM GMT+02:00, Michael Ploujnikov 
 wrote:
>On 2018-06-20 04:23 AM, Richard Biener wrote:
>> On Wed, Jun 20, 2018 at 7:31 AM Jeff Law  wrote:
>>>
>>> On 06/19/2018 12:30 PM, Michael Ploujnikov wrote:
 Hi everyone,

 (I hope this is the right place to ask, if not my apologies; please
 point me in the right direction)

 I'm trying to get a better understanding of the following part in
 tree_swap_operands_p():

   /* It is preferable to swap two SSA_NAME to ensure a canonical
>form
  for commutative and comparison operators.  Ensuring a
>canonical
  form allows the optimizers to find additional redundancies
>without
  having to explicitly check for both orderings.  */
   if (TREE_CODE (arg0) == SSA_NAME
   && TREE_CODE (arg1) == SSA_NAME
   && SSA_NAME_VERSION (arg0) > SSA_NAME_VERSION (arg1))
 return 1;

 My questions in no particular order: It looks like this was added
>in
 2004. I couldn't find any info other than what's in the
>corresponding
 commit (cc0bdf913) so I'm wondering if the canonical form/order
>still
 relevant/needed today? Does the ordering have to be done based on
>the
 name versions specifically? Or can it be based on something more
 intrinsic to the input source code rather than a GCC internal
>value, eg:
 would alphabetic sort order of the variable names be a reasonable
 replacement?
>>> Canonicalization is still important and useful.
>> 
>> Indeed.
>> 
>>> However, canonicalizing on SSA_NAMEs is problematical due to the way
>we
>>> recycle nodes and re-pack them.
>> 
>> In the past we made sure to not disrupt order - hopefully that didn't
>change
>> so the re-packing shoudln't invaidate previous canonicalization:
>> 
>> static void
>> release_free_names_and_compact_live_names (function *fun)
>> {
>> ...
>>   /* And compact the SSA number space.  We make sure to not change
>the
>>  relative order of SSA versions.  */
>>   for (i = 1, j = 1; i < fun->gimple_df->ssa_names->length (); ++i)
>> {
>> 
>> 
>>> I think defining additional rules for canonicalization prior to
>using
>>> SSA_NAME_VERSION as the fallback would be looked upon favorably.
>> 
>> I don't see a good reason to do that, it will be harder to spot
>canonicalization
>> issues and it will take extra compile-time.
>> 
>>> Note however, that many of the _DECL nodes referenced by SSA_NAMEs
>are
>>> temporaries generated by the compiler and do not correspond to any
>>> declared/defined object in the original source.  So you'll still
>need
>>> the SSA_NAME_VERSION (or something as stable or better)
>canonicalization
>>> to handle those cases.
>> 
>> And not all SSA_NAMEs have underlying _DECL nodes (or IDENTIFIER_NODE
>names).
>> 
>> Richard.
>> 
>>> Jeff
>
>After a bit more digging I found that insert_phi_nodes inserts PHIs in
>the order of UIDs, which indirectly affects the order of vars in
>old_ssa_names, which in turn affects the order in which
>make_ssa_name_fn
>is called to pick SSA versions from FREE_SSANAMES so in the end
>ordering by SSA_NAME_VERSION's is more or less equivalent to ordering
>by
>UIDs. I'm trying to figure out if there's a way to avoid depending on
>UIDs being ordered in a certain way. So if tree_swap_operands_p stays
>the same I'm wondering if there's some other info available at the
>point
>of insert_phi_nodes that would be a good replacement for UID. From my
>very limited experience with a very small source input, and if I
>understand things correctly, all of the var_infos have a var which is
>DECL_P and thus should have an IDENTIFIER_NODE. Is that true in the
>general case? I don't fully understand what are all the things that
>insert_phi_nodes iterates over.

Why do you want to remove the dependence on UID ordering? It's pervasive 
throughout the whole compilation... 

Richard. 

>- Michael



Re: Understanding tree_swap_operands_p wrt SSA name versions

2018-07-03 Thread Michael Ploujnikov
On 2018-06-20 04:23 AM, Richard Biener wrote:
> On Wed, Jun 20, 2018 at 7:31 AM Jeff Law  wrote:
>>
>> On 06/19/2018 12:30 PM, Michael Ploujnikov wrote:
>>> Hi everyone,
>>>
>>> (I hope this is the right place to ask, if not my apologies; please
>>> point me in the right direction)
>>>
>>> I'm trying to get a better understanding of the following part in
>>> tree_swap_operands_p():
>>>
>>>   /* It is preferable to swap two SSA_NAME to ensure a canonical form
>>>  for commutative and comparison operators.  Ensuring a canonical
>>>  form allows the optimizers to find additional redundancies without
>>>  having to explicitly check for both orderings.  */
>>>   if (TREE_CODE (arg0) == SSA_NAME
>>>   && TREE_CODE (arg1) == SSA_NAME
>>>   && SSA_NAME_VERSION (arg0) > SSA_NAME_VERSION (arg1))
>>> return 1;
>>>
>>> My questions in no particular order: It looks like this was added in
>>> 2004. I couldn't find any info other than what's in the corresponding
>>> commit (cc0bdf913) so I'm wondering if the canonical form/order still
>>> relevant/needed today? Does the ordering have to be done based on the
>>> name versions specifically? Or can it be based on something more
>>> intrinsic to the input source code rather than a GCC internal value, eg:
>>> would alphabetic sort order of the variable names be a reasonable
>>> replacement?
>> Canonicalization is still important and useful.
> 
> Indeed.
> 
>> However, canonicalizing on SSA_NAMEs is problematical due to the way we
>> recycle nodes and re-pack them.
> 
> In the past we made sure to not disrupt order - hopefully that didn't change
> so the re-packing shoudln't invaidate previous canonicalization:
> 
> static void
> release_free_names_and_compact_live_names (function *fun)
> {
> ...
>   /* And compact the SSA number space.  We make sure to not change the
>  relative order of SSA versions.  */
>   for (i = 1, j = 1; i < fun->gimple_df->ssa_names->length (); ++i)
> {
> 
> 
>> I think defining additional rules for canonicalization prior to using
>> SSA_NAME_VERSION as the fallback would be looked upon favorably.
> 
> I don't see a good reason to do that, it will be harder to spot 
> canonicalization
> issues and it will take extra compile-time.
> 
>> Note however, that many of the _DECL nodes referenced by SSA_NAMEs are
>> temporaries generated by the compiler and do not correspond to any
>> declared/defined object in the original source.  So you'll still need
>> the SSA_NAME_VERSION (or something as stable or better) canonicalization
>> to handle those cases.
> 
> And not all SSA_NAMEs have underlying _DECL nodes (or IDENTIFIER_NODE names).
> 
> Richard.
> 
>> Jeff

After a bit more digging I found that insert_phi_nodes inserts PHIs in
the order of UIDs, which indirectly affects the order of vars in
old_ssa_names, which in turn affects the order in which make_ssa_name_fn
is called to pick SSA versions from FREE_SSANAMES so in the end
ordering by SSA_NAME_VERSION's is more or less equivalent to ordering by
UIDs. I'm trying to figure out if there's a way to avoid depending on
UIDs being ordered in a certain way. So if tree_swap_operands_p stays
the same I'm wondering if there's some other info available at the point
of insert_phi_nodes that would be a good replacement for UID. From my
very limited experience with a very small source input, and if I
understand things correctly, all of the var_infos have a var which is
DECL_P and thus should have an IDENTIFIER_NODE. Is that true in the
general case? I don't fully understand what are all the things that
insert_phi_nodes iterates over.

- Michael



signature.asc
Description: OpenPGP digital signature


Re: Understanding tree_swap_operands_p wrt SSA name versions

2018-06-20 Thread Richard Biener
On Wed, Jun 20, 2018 at 7:31 AM Jeff Law  wrote:
>
> On 06/19/2018 12:30 PM, Michael Ploujnikov wrote:
> > Hi everyone,
> >
> > (I hope this is the right place to ask, if not my apologies; please
> > point me in the right direction)
> >
> > I'm trying to get a better understanding of the following part in
> > tree_swap_operands_p():
> >
> >   /* It is preferable to swap two SSA_NAME to ensure a canonical form
> >  for commutative and comparison operators.  Ensuring a canonical
> >  form allows the optimizers to find additional redundancies without
> >  having to explicitly check for both orderings.  */
> >   if (TREE_CODE (arg0) == SSA_NAME
> >   && TREE_CODE (arg1) == SSA_NAME
> >   && SSA_NAME_VERSION (arg0) > SSA_NAME_VERSION (arg1))
> > return 1;
> >
> > My questions in no particular order: It looks like this was added in
> > 2004. I couldn't find any info other than what's in the corresponding
> > commit (cc0bdf913) so I'm wondering if the canonical form/order still
> > relevant/needed today? Does the ordering have to be done based on the
> > name versions specifically? Or can it be based on something more
> > intrinsic to the input source code rather than a GCC internal value, eg:
> > would alphabetic sort order of the variable names be a reasonable
> > replacement?
> Canonicalization is still important and useful.

Indeed.

> However, canonicalizing on SSA_NAMEs is problematical due to the way we
> recycle nodes and re-pack them.

In the past we made sure to not disrupt order - hopefully that didn't change
so the re-packing shoudln't invaidate previous canonicalization:

static void
release_free_names_and_compact_live_names (function *fun)
{
...
  /* And compact the SSA number space.  We make sure to not change the
 relative order of SSA versions.  */
  for (i = 1, j = 1; i < fun->gimple_df->ssa_names->length (); ++i)
{


> I think defining additional rules for canonicalization prior to using
> SSA_NAME_VERSION as the fallback would be looked upon favorably.

I don't see a good reason to do that, it will be harder to spot canonicalization
issues and it will take extra compile-time.

> Note however, that many of the _DECL nodes referenced by SSA_NAMEs are
> temporaries generated by the compiler and do not correspond to any
> declared/defined object in the original source.  So you'll still need
> the SSA_NAME_VERSION (or something as stable or better) canonicalization
> to handle those cases.

And not all SSA_NAMEs have underlying _DECL nodes (or IDENTIFIER_NODE names).

Richard.

> Jeff


Re: Understanding tree_swap_operands_p wrt SSA name versions

2018-06-19 Thread Jeff Law
On 06/19/2018 12:30 PM, Michael Ploujnikov wrote:
> Hi everyone,
> 
> (I hope this is the right place to ask, if not my apologies; please
> point me in the right direction)
> 
> I'm trying to get a better understanding of the following part in
> tree_swap_operands_p():
> 
>   /* It is preferable to swap two SSA_NAME to ensure a canonical form
>  for commutative and comparison operators.  Ensuring a canonical
>  form allows the optimizers to find additional redundancies without
>  having to explicitly check for both orderings.  */
>   if (TREE_CODE (arg0) == SSA_NAME
>   && TREE_CODE (arg1) == SSA_NAME
>   && SSA_NAME_VERSION (arg0) > SSA_NAME_VERSION (arg1))
> return 1;
> 
> My questions in no particular order: It looks like this was added in
> 2004. I couldn't find any info other than what's in the corresponding
> commit (cc0bdf913) so I'm wondering if the canonical form/order still
> relevant/needed today? Does the ordering have to be done based on the
> name versions specifically? Or can it be based on something more
> intrinsic to the input source code rather than a GCC internal value, eg:
> would alphabetic sort order of the variable names be a reasonable
> replacement?
Canonicalization is still important and useful.

However, canonicalizing on SSA_NAMEs is problematical due to the way we
recycle nodes and re-pack them.

I think defining additional rules for canonicalization prior to using
SSA_NAME_VERSION as the fallback would be looked upon favorably.

Note however, that many of the _DECL nodes referenced by SSA_NAMEs are
temporaries generated by the compiler and do not correspond to any
declared/defined object in the original source.  So you'll still need
the SSA_NAME_VERSION (or something as stable or better) canonicalization
to handle those cases.

Jeff