Re: [PATCH] Cancel bswap opt when intermediate stmts are reused

2014-11-19 Thread Richard Biener
On Tue, Nov 18, 2014 at 5:33 PM, Thomas Preud'homme
thomas.preudho...@arm.com wrote:
 From: Richard Biener [mailto:richard.guent...@gmail.com]
 Sent: Monday, November 17, 2014 12:47 PM

 Hmm.  I am a little bit concerned about the malloc traffic generated here.
 So why not use a vec<usedtree>, get rid of the ->next pointer and
 use a hash_map <gimple, unsigned> to associate the stmt with
 an index into the vector?

 Sure, it even makes things easier. However, I don't understand why a vector is
 better than malloc. Is it a matter of fragmentation?

That and with a vector you get log(N) allocations instead of N.


 At this point I'd rather leave DCE to DCE ...

 I thought that since all the information is there, why not do it. It makes it
 easier to read the dump of the pass.

Fair enough ...


 Ick - do we really have to use gotos here?  Can you refactor this
 a bit to avoid it?

 Yes I can. I needed the same kind of thing for fixing PR63761 (r217409) and
 found a way to avoid it.

Thanks.


 The whole scheme wouldn't exactly fly with the idea of eventually
 using a lattice to propagate the symbolic numbers, but well...


 I think the overall idea is sound and if you have time to adjust according
 to my comments that would be nice.

 To be honest I think it should wait for the next stage1. After several pings I
 took another look at the testcase with the idea of measuring the size reduction
 the patch could give, and was surprised that in all the cases I could construct,
 the size was actually bigger. Performance might be improved nonetheless, but
 I think this needs more careful consideration.

 And as you said, the approach would need to be changed if bswap was
 rewritten to do a forward analysis. Lastly, nobody has reported a code size or
 performance regression so far due to the changes, so this might be a
 non-issue. If such a report happens, then that will be a good time to revisit
 the decision.

 Do you agree?

Yes.  If you can't come up with a testcase that it improves, but only
cases that it pessimizes, then IMHO this would be a very premature optimization.
I suspect that we'd need a more complex cost analysis factoring
in that we _do_ remove a lot of code even if parts need to be kept alive.
So maybe only discard the transform if _all_ stmts need to be kept
live (as when we remove even one of the partial loads then this should
already offset the single load we add ... for the bswap this may be
slightly different).



 Sorry for the very late review.

 That's alright, I know how it is. Thank you for keeping track of it. I actually
 feel sorry I didn't warn you about my findings. I thought the patch had fallen
 through the cracks and didn't want to spam gcc-patches uselessly.

Sure.

Thanks,
Richard.

 Best regards,

 Thomas





RE: [PATCH] Cancel bswap opt when intermediate stmts are reused

2014-11-18 Thread Thomas Preud'homme
 From: Richard Biener [mailto:richard.guent...@gmail.com]
 Sent: Monday, November 17, 2014 12:47 PM
 
 Hmm.  I am a little bit concerned about the malloc traffic generated here.
 So why not use a vec<usedtree>, get rid of the ->next pointer and
 use a hash_map <gimple, unsigned> to associate the stmt with
 an index into the vector?

Sure, it even makes things easier. However, I don't understand why a vector is
better than malloc. Is it a matter of fragmentation?

 
 At this point I'd rather leave DCE to DCE ...

I thought that since all the information is there, why not do it. It makes it
easier to read the dump of the pass.

 
 Ick - do we really have to use gotos here?  Can you refactor this
 a bit to avoid it?

Yes I can. I needed the same kind of thing for fixing PR63761 (r217409) and
found a way to avoid it.

 
 The whole scheme wouldn't exactly fly with the idea of eventually
 using a lattice to propagate the symbolic numbers, but well...
 
 
 I think the overall idea is sound and if you have time to adjust according
 to my comments that would be nice.

To be honest I think it should wait for the next stage1. After several pings I
took another look at the testcase with the idea of measuring the size reduction
the patch could give, and was surprised that in all the cases I could construct,
the size was actually bigger. Performance might be improved nonetheless, but
I think this needs more careful consideration.

And as you said, the approach would need to be changed if bswap was
rewritten to do a forward analysis. Lastly, nobody has reported a code size or
performance regression so far due to the changes, so this might be a
non-issue. If such a report happens, then that will be a good time to revisit
the decision.

Do you agree? 
 

 
 Sorry for the very late review.

That's alright, I know how it is. Thank you for keeping track of it. I actually
feel sorry I didn't warn you about my findings. I thought the patch had fallen
through the cracks and didn't want to spam gcc-patches uselessly.

Best regards,

Thomas





Re: [PATCH] Cancel bswap opt when intermediate stmts are reused

2014-11-17 Thread Richard Biener
On Thu, Aug 7, 2014 at 12:55 PM, Thomas Preud'homme
thomas.preudho...@arm.com wrote:
 Hi all,

 Currently, when an expression doing a manual load or bswap is detected, the
 bswap optimization replaces it by an actual load and/or bswap instruction
 without considering whether the intermediate values computed in the
 expression are reused later. If that is the case, the construction of these
 values has to be retained, and the sum of the load and/or bswap instruction
 and the instructions to construct those values might be more expensive than
 the initial fully manual expression. This patch aims at detecting such a case
 and cancelling the bswap optimization. In addition, it takes advantage of the
 infrastructure necessary for the detection to also clean up the stmts that
 become useless when the bswap optimization is performed.

Comments below

 *** gcc/ChangeLog ***

 2014-08-01  Thomas Preud'homme  thomas.preudho...@arm.com

 * tree-ssa-math-opts.c (struct usedtree): New.
 (find_bswap_or_nop_1): Change prototype to take a hashtable and a list
 of struct usedtree. Fill these respectively with visited stmts and
 trees (along with the stmts where they are defined) that would not be
 defined if the bswap optimization is applied. Adapt recursive call to
 prototype change.
 (find_bswap_or_nop): Adapt to find_bswap_or_nop_1 prototype change.
 Pass the hashtable and the list of struct usedtree from
 pass_optimize_bswap::execute ().
 (do_bswap_p): New.
 (bswap_replace): Fix typo in heading comment. Stop returning whether
 the bswap optimization was performed since this is now handled by
 do_bswap_p (). Move alignment check to do_bswap_p ().
 (free_usedtrees_list): New.
 (pass_optimize_bswap::execute): Add allocation and deallocation
 handling of the hashtable of browsed stmts. Free the list of struct
 usedtree at the end of each iteration using free_usedtrees_list () and
 the new bswap_check_end_iter label. Move some of the logic to perform
 the bswap optimization to do_bswap_p (). Set gsi after performing the
 bswap optimization and loop manually via the new
 bswap_check_start_iter label so that all stmts are checked for
 load/bswap even when cur_stmt is moved around by bswap_replace ().

 *** gcc/testsuite/ChangeLog ***

 2014-08-01  Thomas Preud'homme  thomas.preudho...@arm.com

 * gcc.dg/optimize-bswapsi-2.c (read_le32_1): Add an intermediate
 variable in the mix to check that it is optimized when there is no
 use outside the expression computing a load/bswap.
 (read_be32_1): Likewise.
 * gcc.dg/optimize-bswapsi-3.c: New. Create read_le32_1 () and
 read_be32_1 () based on the identically named functions in
 gcc.dg/optimize-bswapsi-2.c, but reusing the intermediate variable so
 as to check that the bswap optimization is not performed in these cases.

 diff --git a/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c 
 b/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c
 index de6e697..df7856b 100644
 --- a/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c
 +++ b/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c
 @@ -14,7 +14,9 @@ struct uint32_st {

  uint32_t read_le32_1 (void)
  {
 -  return data[0] | (data[1] << 8) | (data[2] << 16) | (data[3] << 24);
 +  uint32_t low = data[0] | (data[1] << 8);
 +  uint32_t ret = low | (data[2] << 16) | (data[3] << 24);
 +  return ret;
  }

  uint32_t read_le32_2 (struct uint32_st data)
 @@ -30,7 +32,9 @@ uint32_t read_le32_3 (unsigned char *data)

  uint32_t read_be32_1 (void)
  {
 -  return data[3] | (data[2] << 8) | (data[1] << 16) | (data[0] << 24);
 +  uint32_t low = data[3] | (data[2] << 8);
 +  uint32_t ret = low | (data[1] << 16) | (data[0] << 24);
 +  return ret;
  }

  uint32_t read_be32_2 (struct uint32_st data)
 diff --git a/gcc/testsuite/gcc.dg/optimize-bswapsi-3.c 
 b/gcc/testsuite/gcc.dg/optimize-bswapsi-3.c
 new file mode 100644
 index 000..dc48d9d
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/optimize-bswapsi-3.c
 @@ -0,0 +1,31 @@
 +/* { dg-do compile } */
 +/* { dg-require-effective-target bswap32 } */
 +/* { dg-require-effective-target stdint_types } */
 +/* { dg-options "-O2 -fdump-tree-bswap" } */
 +/* { dg-additional-options "-march=z900" { target s390-*-* } } */
 +
 +#include <stdint.h>
 +
 +extern unsigned char data[4];
 +
 +/* No bswap optimization as low is reused */
 +uint32_t read_le32_1 (unsigned char *data, uint32_t *low_neg)
 +{
 +  uint32_t low = data[0] | (data[1] << 8);
 +  uint32_t ret = low | (data[2] << 16) | (data[3] << 24);
 +  *low_neg = low;
 +  return ret;
 +}
 +
 +/* No bswap optimization as low is reused */
 +uint32_t read_be32_1 (unsigned char *data, uint32_t *low_neg)
 +{
 +  uint32_t low = data[3] | (data[2] << 8);
 +  uint32_t ret = low | (data[1] << 16) | (data[0] << 24);
 +  *low_neg = low;
 +  return ret;
 +}
 +
 +/* { dg-final { scan-tree-dump-not 32 bit load in target endianness 

RE: [PATCH] Cancel bswap opt when intermediate stmts are reused

2014-08-18 Thread Thomas Preud'homme
Ping?

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
 Sent: Thursday, August 07, 2014 6:56 PM
 To: gcc-patches@gcc.gnu.org
 Subject: [PATCH] Cancel bswap opt when intermediate stmts are reused
 
 Hi all,
 
 Currently, when an expression doing a manual load or bswap is detected, the
 bswap optimization replaces it by an actual load and/or bswap instruction
 without considering whether the intermediate values computed in the
 expression are reused later. If that is the case, the construction of these
 values has to be retained, and the sum of the load and/or bswap instruction
 and the instructions to construct those values might be more expensive than
 the initial fully manual expression. This patch aims at detecting such a case
 and cancelling the bswap optimization. In addition, it takes advantage of the
 infrastructure necessary for the detection to also clean up the stmts that
 become useless when the bswap optimization is performed.
 
 *** gcc/ChangeLog ***
 
 2014-08-01  Thomas Preud'homme  thomas.preudho...@arm.com
 
   * tree-ssa-math-opts.c (struct usedtree): New.
   (find_bswap_or_nop_1): Change prototype to take a hashtable and a list
   of struct usedtree. Fill these respectively with visited stmts and
   trees (along with the stmts where they are defined) that would not be
   defined if the bswap optimization is applied. Adapt recursive call to
   prototype change.
   (find_bswap_or_nop): Adapt to find_bswap_or_nop_1 prototype change.
   Pass the hashtable and the list of struct usedtree from
   pass_optimize_bswap::execute ().
   (do_bswap_p): New.
   (bswap_replace): Fix typo in heading comment. Stop returning whether
   the bswap optimization was performed since this is now handled by
   do_bswap_p (). Move alignment check to do_bswap_p ().
   (free_usedtrees_list): New.
   (pass_optimize_bswap::execute): Add allocation and deallocation
   handling of the hashtable of browsed stmts. Free the list of struct
   usedtree at the end of each iteration using free_usedtrees_list () and
   the new bswap_check_end_iter label. Move some of the logic to perform
   the bswap optimization to do_bswap_p (). Set gsi after performing the
   bswap optimization and loop manually via the new
   bswap_check_start_iter label so that all stmts are checked for
   load/bswap even when cur_stmt is moved around by bswap_replace ().
 
 *** gcc/testsuite/ChangeLog ***
 
 2014-08-01  Thomas Preud'homme  thomas.preudho...@arm.com
 
   * gcc.dg/optimize-bswapsi-2.c (read_le32_1): Add an intermediate
   variable in the mix to check that it is optimized when there is no
   use outside the expression computing a load/bswap.
   (read_be32_1): Likewise.
   * gcc.dg/optimize-bswapsi-3.c: New. Create read_le32_1 () and
   read_be32_1 () based on the identically named functions in
   gcc.dg/optimize-bswapsi-2.c, but reusing the intermediate variable so
   as to check that the bswap optimization is not performed in these cases.
 
 diff --git a/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c
 b/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c
 index de6e697..df7856b 100644
 --- a/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c
 +++ b/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c
 @@ -14,7 +14,9 @@ struct uint32_st {
 
  uint32_t read_le32_1 (void)
  {
 -  return data[0] | (data[1] << 8) | (data[2] << 16) | (data[3] << 24);
 +  uint32_t low = data[0] | (data[1] << 8);
 +  uint32_t ret = low | (data[2] << 16) | (data[3] << 24);
 +  return ret;
  }
 
  uint32_t read_le32_2 (struct uint32_st data)
 @@ -30,7 +32,9 @@ uint32_t read_le32_3 (unsigned char *data)
 
  uint32_t read_be32_1 (void)
  {
 -  return data[3] | (data[2] << 8) | (data[1] << 16) | (data[0] << 24);
 +  uint32_t low = data[3] | (data[2] << 8);
 +  uint32_t ret = low | (data[1] << 16) | (data[0] << 24);
 +  return ret;
  }
 
  uint32_t read_be32_2 (struct uint32_st data)
 diff --git a/gcc/testsuite/gcc.dg/optimize-bswapsi-3.c
 b/gcc/testsuite/gcc.dg/optimize-bswapsi-3.c
 new file mode 100644
 index 000..dc48d9d
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/optimize-bswapsi-3.c
 @@ -0,0 +1,31 @@
 +/* { dg-do compile } */
 +/* { dg-require-effective-target bswap32 } */
 +/* { dg-require-effective-target stdint_types } */
 +/* { dg-options "-O2 -fdump-tree-bswap" } */
 +/* { dg-additional-options "-march=z900" { target s390-*-* } } */
 +
 +#include <stdint.h>
 +
 +extern unsigned char data[4];
 +
 +/* No bswap optimization as low is reused */
 +uint32_t read_le32_1 (unsigned char *data, uint32_t *low_neg)
 +{
 +  uint32_t low = data[0] | (data[1] << 8);
 +  uint32_t ret = low | (data[2] << 16) | (data[3] << 24);
 +  *low_neg = low;
 +  return ret;
 +}
 +
 +/* No bswap optimization as low is reused */
 +uint32_t read_be32_1 (unsigned char *data, uint32_t *low_neg)
 +{
 +  uint32_t low = data[3] | (data[2] << 8);
 +  uint32_t ret = low