Re: [PATCH] Cancel bswap opt when intermediate stmts are reused
On Tue, Nov 18, 2014 at 5:33 PM, Thomas Preud'homme <thomas.preudho...@arm.com> wrote:
>> From: Richard Biener [mailto:richard.guent...@gmail.com]
>> Sent: Monday, November 17, 2014 12:47 PM
>>
>> Hmm.  I am a little bit concerned about the malloc traffic generated
>> here.  So why not use a vec<usedtree>, get rid of the ->next pointer and
>> use a hash_map<gimple, unsigned> to associate the stmt with an index
>> into the vector?
>
> Sure, it even makes things easier.  However, I don't understand why a
> vector is better than malloc.  Is it a matter of fragmentation?

That, and with a vector you get log(N) allocations instead of N.

>> At this point I'd rather leave DCE to DCE ...
>
> I thought that since all the information is there, why not do it.  It
> makes the dump of the pass easier to read.

Fair enough ...

>> Ick - do we really have to use gotos here?  Can you refactor this a bit
>> to avoid it?
>
> Yes, I can.  I needed the same kind of thing for fixing PR63761 (r217409)
> and found a way to avoid it.

Thanks.

>> The whole scheme wouldn't exactly fly with the idea of eventually using
>> a lattice to propagate the symbolic numbers, but well...  I think the
>> overall idea is sound, and if you have time to adjust according to my
>> comments that would be nice.
>
> To be honest, I think it should wait for the next stage1.  After several
> pings I took another look at the testcase with the idea of measuring the
> size reduction the patch could give, and was surprised that in all the
> cases I could construct the size was actually bigger.  Performance might
> be improved nonetheless, but I think this needs more careful
> consideration.  And, as you said, the approach would need to change if
> bswap were rewritten to do a forward analysis.  Finally, nobody has
> reported a code size or performance regression due to the changes so far,
> so this might be a non-issue.  If such a report comes in, that will be a
> good time to revisit the decision.  Do you agree?

Yes.  If you can't come up with a testcase that it improves, and it only
pessimizes cases, then IMHO this would be a very premature optimization.
I suspect we'd need a more complex cost analysis factoring in that we _do_
remove a lot of code even if parts need to be kept alive.  So maybe only
discard the transform if _all_ stmts need to be kept live (when we remove
even one of the partial loads, that should already offset the single load
we add ... for the bswap case this may be slightly different).

>> Sorry for the very late review.
>
> That's alright, I know how it is.  Thank you for keeping track of it.
> I actually feel sorry I didn't warn about my findings: I thought the
> patch had fallen through the cracks and didn't want to spam gcc-patches
> uselessly.

Sure.

Thanks,
Richard.

> Best regards,
> Thomas
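The relaxed cost check suggested above — give up only when every intermediate statement must stay alive — can be sketched as follows. This is a hypothetical illustration, not GCC's actual interface: the `stmt_info` record and the `cancel_bswap_p` name are made up for the example.

```cpp
#include <cassert>
#include <vector>

// Illustrative stand-in for per-statement liveness info gathered while
// matching the manual load/bswap expression.
struct stmt_info
{
  bool kept_live;  // true if the stmt's value is reused outside the pattern
};

// Cancel the bswap transform only when *all* intermediate stmts must be
// kept alive.  If even one partial load/shift dies, its removal already
// offsets the single load (and possible bswap) the transform adds.
static bool
cancel_bswap_p (const std::vector<stmt_info> &stmts)
{
  for (const stmt_info &s : stmts)
    if (!s.kept_live)
      return false;  // something dies -> transform still pays off
  return !stmts.empty ();  // everything reused -> cancel
}
```

Under this heuristic the testcases below, which reuse only one intermediate (`low`), would still be transformed; the original patch cancels on any reuse.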
RE: [PATCH] Cancel bswap opt when intermediate stmts are reused
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: Monday, November 17, 2014 12:47 PM
>
> Hmm.  I am a little bit concerned about the malloc traffic generated
> here.  So why not use a vec<usedtree>, get rid of the ->next pointer and
> use a hash_map<gimple, unsigned> to associate the stmt with an index
> into the vector?

Sure, it even makes things easier.  However, I don't understand why a
vector is better than malloc.  Is it a matter of fragmentation?

> At this point I'd rather leave DCE to DCE ...

I thought that since all the information is there, why not do it.  It
makes the dump of the pass easier to read.

> Ick - do we really have to use gotos here?  Can you refactor this a bit
> to avoid it?

Yes, I can.  I needed the same kind of thing for fixing PR63761 (r217409)
and found a way to avoid it.

> The whole scheme wouldn't exactly fly with the idea of eventually using
> a lattice to propagate the symbolic numbers, but well...  I think the
> overall idea is sound, and if you have time to adjust according to my
> comments that would be nice.

To be honest, I think it should wait for the next stage1.  After several
pings I took another look at the testcase with the idea of measuring the
size reduction the patch could give, and was surprised that in all the
cases I could construct the size was actually bigger.  Performance might
be improved nonetheless, but I think this needs more careful
consideration.  And, as you said, the approach would need to change if
bswap were rewritten to do a forward analysis.  Finally, nobody has
reported a code size or performance regression due to the changes so far,
so this might be a non-issue.  If such a report comes in, that will be a
good time to revisit the decision.  Do you agree?

> Sorry for the very late review.

That's alright, I know how it is.  Thank you for keeping track of it.
I actually feel sorry I didn't warn about my findings: I thought the
patch had fallen through the cracks and didn't want to spam gcc-patches
uselessly.

Best regards,
Thomas
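The allocation scheme proposed in the review — one growable vector of records plus a stmt-to-index hash map instead of a malloc'd linked list — can be sketched with standard containers standing in for GCC's `vec<>` and `hash_map<>`. The `usedtree_rec` and `stmt_table` names are hypothetical, chosen for the example only.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Minimal stand-in for a gimple statement; in GCC this would be gimple.
struct gimple_stmt
{
  int id;
};

// Record for one tracked tree: its defining stmt and whether its value is
// reused outside the bswap expression.
struct usedtree_rec
{
  gimple_stmt *stmt;
  bool used_outside;
};

// All records live in one contiguous vector (roughly log(N) allocations
// instead of one malloc per node); the map gives O(1) lookup from a stmt
// to its slot, replacing the ->next linked-list walk.
struct stmt_table
{
  std::vector<usedtree_rec> recs;
  std::unordered_map<const gimple_stmt *, unsigned> index;

  // Record a stmt, or update its reuse flag if it was already visited.
  void record (gimple_stmt *s, bool used)
  {
    auto it = index.find (s);
    if (it != index.end ())
      recs[it->second].used_outside |= used;
    else
      {
	index.emplace (s, (unsigned) recs.size ());
	recs.push_back (usedtree_rec { s, used });
      }
  }
};
```

Because the records are contiguous, iterating them to free or dump the pass state is a plain loop over `recs`, with no pointer chasing.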
Re: [PATCH] Cancel bswap opt when intermediate stmts are reused
On Thu, Aug 7, 2014 at 12:55 PM, Thomas Preud'homme <thomas.preudho...@arm.com> wrote:

Hi all,

Currently, when an expression doing a manual load or bswap is detected,
the bswap optimization replaces it with an actual load and/or bswap
instruction without considering whether the intermediate values computed
in the expression are reused later.  If they are, the construction of
these values has to be retained, and the sum of the load and/or bswap
instruction plus the instructions to construct those values might be more
expensive than the initial, fully manual expression.  This patch aims at
detecting such a case and canceling the bswap optimization.  In addition,
it takes advantage of the infrastructure necessary for the detection to
also clean up the stmts that become useless when the bswap optimization
is performed.

Comments below.

*** gcc/ChangeLog ***

2014-08-01  Thomas Preud'homme  <thomas.preudho...@arm.com>

	* tree-ssa-math-opts.c (struct usedtree): New.
	(find_bswap_or_nop_1): Change prototype to take a hashtable and a
	list of struct usedtree.  Fill these respectively with visited
	stmts and trees (along with the stmts where they are defined)
	that would not be defined if the bswap optimization is applied.
	Adapt recursive call to prototype change.
	(find_bswap_or_nop): Adapt to find_bswap_or_nop_1 prototype
	change.  Pass the hashtable and the list of struct usedtree from
	pass_optimize_bswap::execute ().
	(do_bswap_p): New.
	(bswap_replace): Fix typo in heading comment.  Stop returning
	whether the bswap optimization was performed since this is now
	handled by do_bswap_p ().  Move alignment check to do_bswap_p ().
	(free_usedtrees_list): New.
	(pass_optimize_bswap::execute): Add allocation and deallocation
	handling of the hashtable of browsed stmts.  Free the list of
	struct usedtree at the end of each iteration using
	free_usedtrees_list () and the new bswap_check_end_iter label.
	Move some of the logic to perform the bswap optimization to
	do_bswap_p ().  Set gsi after performing the bswap optimization
	and loop manually via the new bswap_check_start_iter label so
	that all stmts are checked for load/bswap even when cur_stmt is
	moved around by bswap_replace ().

*** gcc/testsuite/ChangeLog ***

2014-08-01  Thomas Preud'homme  <thomas.preudho...@arm.com>

	* gcc.dg/optimize-bswapsi-2.c (read_le32_1): Add an intermediate
	variable in the mix to check that it is optimized when there is
	no use outside the expression computing a load/bswap.
	(read_be32_1): Likewise.
	* gcc.dg/optimize-bswapsi-3.c: New.  Create read_le32_1 () and
	read_be32_1 () based on the identically named functions in
	gcc.dg/optimize-bswapsi-2.c but reusing the intermediate
	variable, so as to check that the bswap optimization is not
	performed in these cases.

diff --git a/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c b/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c
index de6e697..df7856b 100644
--- a/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c
+++ b/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c
@@ -14,7 +14,9 @@ struct uint32_st {

 uint32_t read_le32_1 (void)
 {
-  return data[0] | (data[1] << 8) | (data[2] << 16) | (data[3] << 24);
+  uint32_t low = data[0] | (data[1] << 8);
+  uint32_t ret = low | (data[2] << 16) | (data[3] << 24);
+  return ret;
 }

 uint32_t read_le32_2 (struct uint32_st data)
@@ -30,7 +32,9 @@ uint32_t read_le32_3 (unsigned char *data)

 uint32_t read_be32_1 (void)
 {
-  return data[3] | (data[2] << 8) | (data[1] << 16) | (data[0] << 24);
+  uint32_t low = data[3] | (data[2] << 8);
+  uint32_t ret = low | (data[1] << 16) | (data[0] << 24);
+  return ret;
 }

 uint32_t read_be32_2 (struct uint32_st data)
diff --git a/gcc/testsuite/gcc.dg/optimize-bswapsi-3.c b/gcc/testsuite/gcc.dg/optimize-bswapsi-3.c
new file mode 100644
index 000..dc48d9d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/optimize-bswapsi-3.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target bswap32 } */
+/* { dg-require-effective-target stdint_types } */
+/* { dg-options "-O2 -fdump-tree-bswap" } */
+/* { dg-additional-options "-march=z900" { target s390-*-* } } */
+
+#include <stdint.h>
+
+extern unsigned char data[4];
+
+/* No bswap optimization as low is reused.  */
+uint32_t read_le32_1 (unsigned char *data, uint32_t *low_neg)
+{
+  uint32_t low = data[0] | (data[1] << 8);
+  uint32_t ret = low | (data[2] << 16) | (data[3] << 24);
+  *low_neg = low;
+  return ret;
+}
+
+/* No bswap optimization as low is reused.  */
+uint32_t read_be32_1 (unsigned char *data, uint32_t *low_neg)
+{
+  uint32_t low = data[3] | (data[2] << 8);
+  uint32_t ret = low | (data[1] << 16) | (data[0] << 24);
+  *low_neg = low;
+  return ret;
+}
+
+/* { dg-final { scan-tree-dump-not "32 bit load in target endianness
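For reference, this is the shape of expression the pass matches, mirroring the testcases above: a 32-bit value assembled byte by byte in little-endian order. When no intermediate is reused, the pass can replace the whole expression with a single 4-byte load (plus a bswap on big-endian targets); the reused-`low` variants above must keep their byte-wise form. A minimal standalone sketch:

```cpp
#include <cassert>
#include <cstdint>

// Manual little-endian 32-bit load: the pattern the bswap pass recognizes.
// Each byte is widened, shifted into position, and OR'd together.
static uint32_t
read_le32 (const unsigned char *p)
{
  return (uint32_t) p[0]
	 | ((uint32_t) p[1] << 8)
	 | ((uint32_t) p[2] << 16)
	 | ((uint32_t) p[3] << 24);
}
```

Reading the bytes 01 02 03 04 yields 0x04030201 regardless of host endianness, which is exactly why the compiler may legally substitute a native load (and bswap where needed).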
RE: [PATCH] Cancel bswap opt when intermediate stmts are reused
Ping?

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
> On Behalf Of Thomas Preud'homme
> Sent: Thursday, August 07, 2014 6:56 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH] Cancel bswap opt when intermediate stmts are reused
>
> [...]