Re: [patch] speed up ifcvt:cond_move_convert_if_block
On Thu, Aug 16, 2012 at 12:06 PM, Richard Guenther richard.guent...@gmail.com wrote: On Thu, Aug 16, 2012 at 1:11 AM, Steven Bosscher stevenb@gmail.com wrote: On Mon, Aug 6, 2012 at 1:27 PM, Paolo Bonzini bonz...@gnu.org wrote: 2. sparseset has the same problem of memory clearing (for valgrind, see sparseset_alloc). ... only the sparse array needs this clearing, but currently we do it for both. And according to the fat comment before the xcalloc, it's not even really needed. Why can't we just tell valgrind to treat the memory as defined, like in this patchlet: Indeed ... I suppose ok if it bootstraps / tests w/o valgrind checking and if a cc1 builds with valgrind checking. Here's what I'll commit: Index: sparseset.c === --- sparseset.c (revision 190474) +++ sparseset.c (working copy) @@ -30,12 +30,14 @@ unsigned int n_bytes = sizeof (struct sparseset_def) + ((n_elms - 1) * 2 * sizeof (SPARSESET_ELT_TYPE)); - /* We use xcalloc rather than xmalloc to silence some valgrind uninitialized + sparseset set = XNEWVAR(struct sparseset_def, n_bytes); + + /* Mark the sparseset as defined to silence some valgrind uninitialized read errors when accessing set-sparse[n] when n is not, and never has been, in the set. These uninitialized reads are expected, by design and - harmless. If this turns into a performance problem due to some future - additional users of sparseset, we can revisit this decision. */ - sparseset set = (sparseset) xcalloc (1, n_bytes); + harmless. */ + VALGRIND_DISCARD (VALGRIND_MAKE_MEM_DEFINED (set, n_bytes)); + set-dense = (set-elms[0]); set-sparse = (set-elms[n_elms]); set-size = n_elms;
Re: [patch] speed up ifcvt:cond_move_convert_if_block
On Thu, Aug 16, 2012 at 1:11 AM, Steven Bosscher stevenb@gmail.com wrote: On Mon, Aug 6, 2012 at 1:27 PM, Paolo Bonzini bonz...@gnu.org wrote: 2. sparseset has the same problem of memory clearing (for valgrind, see sparseset_alloc). ... only the sparse array needs this clearing, but currently we do it for both. And according to the fat comment before the xcalloc, it's not even really needed. Why can't we just tell valgrind to treat the memory as defined, like in this patchlet: Indeed ... I suppose ok if it bootstraps / tests w/o valgrind checking and if a cc1 builds with valgrind checking. Thanks, Richard. Index: sparseset.c === --- sparseset.c (revision 190418) +++ sparseset.c (working copy) @@ -30,12 +30,14 @@ sparseset_alloc (SPARSESET_ELT_TYPE n_elms) unsigned int n_bytes = sizeof (struct sparseset_def) + ((n_elms - 1) * 2 * sizeof (SPARSESET_ELT_TYPE)); - /* We use xcalloc rather than xmalloc to silence some valgrind uninitialized + sparseset set = XNEWVAR(sparseset, n_bytes); + + /* Mark the sparseset as defined to silence some valgrind uninitialized read errors when accessing set-sparse[n] when n is not, and never has been, in the set. These uninitialized reads are expected, by design and - harmless. If this turns into a performance problem due to some future - additional users of sparseset, we can revisit this decision. */ - sparseset set = (sparseset) xcalloc (1, n_bytes); + harmless. */ + VALGRIND_DISCARD (VALGRIND_MAKE_MEM_DEFINED (set, n_bytes)); + set-dense = (set-elms[0]); set-sparse = (set-elms[n_elms]); set-size = n_elms; I have never built GCC with valgrind checking, so I don't know if this is correct. But there has to be a better solution than calling xcalloc... Ciao! Steven
Re: [patch] speed up ifcvt:cond_move_convert_if_block
On Mon, Aug 6, 2012 at 1:27 PM, Paolo Bonzini bonz...@gnu.org wrote: 2. sparseset has the same problem of memory clearing (for valgrind, see sparseset_alloc). ... only the sparse array needs this clearing, but currently we do it for both. And according to the fat comment before the xcalloc, it's not even really needed. Why can't we just tell valgrind to treat the memory as defined, like in this patchlet: Index: sparseset.c === --- sparseset.c (revision 190418) +++ sparseset.c (working copy) @@ -30,12 +30,14 @@ sparseset_alloc (SPARSESET_ELT_TYPE n_elms) unsigned int n_bytes = sizeof (struct sparseset_def) + ((n_elms - 1) * 2 * sizeof (SPARSESET_ELT_TYPE)); - /* We use xcalloc rather than xmalloc to silence some valgrind uninitialized + sparseset set = XNEWVAR(sparseset, n_bytes); + + /* Mark the sparseset as defined to silence some valgrind uninitialized read errors when accessing set-sparse[n] when n is not, and never has been, in the set. These uninitialized reads are expected, by design and - harmless. If this turns into a performance problem due to some future - additional users of sparseset, we can revisit this decision. */ - sparseset set = (sparseset) xcalloc (1, n_bytes); + harmless. */ + VALGRIND_DISCARD (VALGRIND_MAKE_MEM_DEFINED (set, n_bytes)); + set-dense = (set-elms[0]); set-sparse = (set-elms[n_elms]); set-size = n_elms; I have never built GCC with valgrind checking, so I don't know if this is correct. But there has to be a better solution than calling xcalloc... Ciao! Steven
[patch] speed up ifcvt:cond_move_convert_if_block
Hello, In PR54146, ifcvt spends a lot of time just clearing memory. This patch changes the value maps to pointer-maps to fix this issue. Bootstrappedtested on x86_64-unknown-linux-gnu. OK? Ciao! Steven PR54146_ifcvt.diff Description: Binary data
Re: [patch] speed up ifcvt:cond_move_convert_if_block
On Mon, Aug 6, 2012 at 8:54 AM, Steven Bosscher stevenb@gmail.com wrote: Hello, In PR54146, ifcvt spends a lot of time just clearing memory. This patch changes the value maps to pointer-maps to fix this issue. Bootstrappedtested on x86_64-unknown-linux-gnu. OK? Ok! Thanks, Richard. Ciao! Steven
Re: [patch] speed up ifcvt:cond_move_convert_if_block
Il 06/08/2012 08:54, Steven Bosscher ha scritto: Hello, In PR54146, ifcvt spends a lot of time just clearing memory. This patch changes the value maps to pointer-maps to fix this issue. Bootstrappedtested on x86_64-unknown-linux-gnu. OK? Nice, but perhaps we need a sparsemap to do even better? Paolo
Re: [patch] speed up ifcvt:cond_move_convert_if_block
On Mon, Aug 6, 2012 at 1:07 PM, Paolo Bonzini bonz...@gnu.org wrote: Il 06/08/2012 08:54, Steven Bosscher ha scritto: Hello, In PR54146, ifcvt spends a lot of time just clearing memory. This patch changes the value maps to pointer-maps to fix this issue. Bootstrappedtested on x86_64-unknown-linux-gnu. OK? Nice, but perhaps we need a sparsemap to do even better? If you mean sparseset: no, it won't do any better. My first idea was to use a sparseset, but: 1. The amount of memory allocated is simply too big. In fact, it'd be even worse than before: 4*max_regno*sizeof(rtx) instead of only 2*max_regno*sizeof(rtx). 2. sparseset has the same problem of memory clearing (for valgrind, see sparseset_alloc). Remember, we can call this function cond_move_convert_if_block O(n_basic_block) times: On a CFG with mostly regular control flow, i.e. no or few computed jumps, jump tables, and exception handlers, the number of IF-THEN-{,ELSE_}-JOIN blocks will be O(n_basic_blocks) and cond_move_convert_if_block is called on almost all of them. What could work, is to allocate a sparseset once at the beginning of if_convert, but I don't see any good reason to add a pair of global variables for this. My patch reduces the time spent in ifcvt to almost nothing (down from ~250s, which is still not so bad, considering we're looking at a function with 26 basic blocks!). Ciao! Steven
Re: [patch] speed up ifcvt:cond_move_convert_if_block
Il 06/08/2012 13:15, Steven Bosscher ha scritto: On Mon, Aug 6, 2012 at 1:07 PM, Paolo Bonzini bonz...@gnu.org wrote: Il 06/08/2012 08:54, Steven Bosscher ha scritto: Hello, In PR54146, ifcvt spends a lot of time just clearing memory. This patch changes the value maps to pointer-maps to fix this issue. Bootstrappedtested on x86_64-unknown-linux-gnu. OK? Nice, but perhaps we need a sparsemap to do even better? If you mean sparseset: no, it won't do any better. No, I mean inventing a sparseset that cannot do boolean operations, but can store an associative value in a second dense array. That is sparsemap_insert(m, k, v) { if (!sparsemap_contains(m, k)) { s-sparse[k] = s-members++; s-dense_keys[i] = k; } i = s-sparse[k]; s-dense_values[i] = v; } sparsemap_contains(m, k) { if (!sparsemap_contains(m, k)) { return -1; } else { return s-dense_values[i]; } My first idea was to use a sparseset, but: 1. The amount of memory allocated is simply too big. In fact, it'd be even worse than before: 4*max_regno*sizeof(rtx) instead of only 2*max_regno*sizeof(rtx). Yeah, sparseset wastes some memory. I wonder if the dense array should be allocated separately and even made dynamically resizable, also because... 2. sparseset has the same problem of memory clearing (for valgrind, see sparseset_alloc). ... only the sparse array needs this clearing, but currently we do it for both. What could work, is to allocate a sparseset once at the beginning of if_convert, but I don't see any good reason to add a pair of global variables for this. Yes, this is what I meant. Fast clearing is where sparsesets behave well, so it makes sense to make them global in that case. What I missed was that the maps remain small. Otherwise they would also need clearing. Paolo