[PATCH] Fix PR46590

2014-01-17 Thread Richard Biener

This fixes PR46590 - I've worked on this at the beginning of last
year already but appearantly when deciding to not push the last
bits of the LIM reorg failed to check the full testcase again.

So this fixes memory usage of that large testcase (many loops)
from requiring a peak memory usage of 3.5GB (kills my machine)
to a peak memory usage of ~1GB (peak happens during IRA,
the testcase completes compile in 122s at -O1 and in 160s at -O3).

The main issue with LIM was the accesses_in_loop array which
we keep for each distinct memory reference.  This was an
array [number-of-loops][number-of-accesses-in-that-loop]
to be able to easily walk over all accesses of a memory reference
in a loop and its children (see for_all_locs_in_loop).
Obviously allocating a vec of size number-of-loops of vecs
(even if those remain un-allocated) makes memory requirements
O(number-of-accesses * number-of-loops) - very bad for this
testcase.

The fix is to flatten this to a [number-of-accesses] array
and support easy walking over accesses in a loop and its children
by sorting that array after the locations loop postorder number.
So all accesses are clustered and you can bsearch for a member
of that cluster.

Note that restoring this efficient walking isn't necessary
for the testcase in PR46590 but I've created an artificial
testcase where that improves compile-time from 13s to 0s (LIM time).

The patch below includes some more TLC and optimizations I applied
to LIM until I noticed this issue.

The patch also adds bsearch support to vec, alongside its support
for qsort, copying the glibc inline function implementation
(Jakub said we shouldn't rely on bsearch availability nor on
it being implemented with an inline function to make it fast).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

I'll commit this later today (if testing works out ok).

Thanks,
Richard.

2014-01-17  Richard Biener  rguent...@suse.de

PR tree-optimization/46590
* vec.h (vec::bseach): New member function implementing
binary search according to C89 bsearch.
(vec::qsort): Avoid calling ::qsort for vectors with sizes 0 or 1.
* tree-ssa-loop-im.c (struct mem_ref): Make stored member a
bitmap pointer again.  Make accesses_in_loop a flat array.
(mem_ref_obstack): New global.
(outermost_indep_loop): Adjust for mem_ref-stored changes.
(mark_ref_stored): Likewise.
(ref_indep_loop_p_2): Likewise.
(set_ref_stored_in_loop): New helper function.
(mem_ref_alloc): Allocate mem_refs on the mem_ref_obstack obstack.
(memref_free): Adjust.
(record_mem_ref_loc): Simplify.
(gather_mem_refs_stmt): Adjust.
(sort_locs_in_loop_postorder_cmp): New function.
(analyze_memory_references): Sort accesses_in_loop after
loop postorder number.
(find_ref_loc_in_loop_cmp): New function.
(for_all_locs_in_loop): Find relevant cluster of locs in
accesses_in_loop and iterate without recursion.
(execute_sm): Avoid uninit warning.
(struct ref_always_accessed): Simplify.
(ref_always_accessed::operator ()): Likewise.
(ref_always_accessed_p): Likewise.
(tree_ssa_lim_initialize): Initialize mem_ref_obstack, compute
loop postorder numbers here.
(tree_ssa_lim_finalize): Free mem_ref_obstack and loop postorder
numbers.

Index: gcc/vec.h
===
*** gcc/vec.h.orig  2014-01-07 10:19:52.979454286 +0100
--- gcc/vec.h   2014-01-17 12:18:11.352085931 +0100
*** public:
*** 476,481 
--- 476,482 
void unordered_remove (unsigned);
void block_remove (unsigned, unsigned);
void qsort (int (*) (const void *, const void *));
+   T *bsearch (const void *key, int (*compar)(const void *, const void *));
unsigned lower_bound (T, bool (*)(const T , const T )) const;
static size_t embedded_size (unsigned);
void embedded_init (unsigned, unsigned = 0);
*** templatetypename T, typename A
*** 938,944 
  inline void
  vecT, A, vl_embed::qsort (int (*cmp) (const void *, const void *))
  {
!   ::qsort (address (), length (), sizeof (T), cmp);
  }
  
  
--- 939,981 
  inline void
  vecT, A, vl_embed::qsort (int (*cmp) (const void *, const void *))
  {
!   if (length ()  1)
! ::qsort (address (), length (), sizeof (T), cmp);
! }
! 
! 
! /* Search the contents of the sorted vector with a binary search.
!CMP is the comparison function to pass to bsearch.  */
! 
! templatetypename T, typename A
! inline T *
! vecT, A, vl_embed::bsearch (const void *__key,
! int (*__compar) (const void *, const void *))
! {
!   const void *__base = this-address ();
!   size_t __nmemb = this-length ();
!   size_t __size = sizeof (T);
!   /* The following is a copy of glibc stdlib-bsearch.h.  */
!   size_t __l, __u, __idx;
!   const void *__p;
!   int 

Re: [PATCH] Fix PR46590

2014-01-17 Thread Jakub Jelinek
On Fri, Jan 17, 2014 at 12:32:34PM +0100, Richard Biener wrote:
 ! /* Search the contents of the sorted vector with a binary search.
 !CMP is the comparison function to pass to bsearch.  */

Can you please sed -i -e s/__//g in the whole method?
I mean, this isn't in libstdc++ or glibc header, so there is no point
in obfuscating the names, and the __ prefixed names are even reserved
for implementation, which we are not at least in stage1 built compiler.

 ! templatetypename T, typename A
 ! inline T *
 ! vecT, A, vl_embed::bsearch (const void *__key,
 !   int (*__compar) (const void *, const void *))
 ! {
 !   const void *__base = this-address ();
 !   size_t __nmemb = this-length ();
 !   size_t __size = sizeof (T);
 !   /* The following is a copy of glibc stdlib-bsearch.h.  */
 !   size_t __l, __u, __idx;
 !   const void *__p;
 !   int __comparison;
 ! 
 !   __l = 0;
 !   __u = __nmemb;
 !   while (__l  __u)
 ! {
 !   __idx = (__l + __u) / 2;
 !   __p = (const void *) (((const char *) __base) + (__idx * __size));
 !   __comparison = (*__compar) (__key, __p);
 !   if (__comparison  0)
 ! __u = __idx;
 !   else if (__comparison  0)
 ! __l = __idx + 1;
 !   else
 ! return (T *)const_castvoid *(__p);
 ! }
 ! 
 !   return NULL;
   }

Jakub


Re: [PATCH] Fix PR46590

2014-01-17 Thread Richard Biener
On Fri, 17 Jan 2014, Jakub Jelinek wrote:

 On Fri, Jan 17, 2014 at 12:32:34PM +0100, Richard Biener wrote:
  ! /* Search the contents of the sorted vector with a binary search.
  !CMP is the comparison function to pass to bsearch.  */
 
 Can you please sed -i -e s/__//g in the whole method?
 I mean, this isn't in libstdc++ or glibc header, so there is no point
 in obfuscating the names, and the __ prefixed names are even reserved
 for implementation, which we are not at least in stage1 built compiler.

Will do.

Richard.

  ! templatetypename T, typename A
  ! inline T *
  ! vecT, A, vl_embed::bsearch (const void *__key,
  ! int (*__compar) (const void *, const void *))
  ! {
  !   const void *__base = this-address ();
  !   size_t __nmemb = this-length ();
  !   size_t __size = sizeof (T);
  !   /* The following is a copy of glibc stdlib-bsearch.h.  */
  !   size_t __l, __u, __idx;
  !   const void *__p;
  !   int __comparison;
  ! 
  !   __l = 0;
  !   __u = __nmemb;
  !   while (__l  __u)
  ! {
  !   __idx = (__l + __u) / 2;
  !   __p = (const void *) (((const char *) __base) + (__idx * __size));
  !   __comparison = (*__compar) (__key, __p);
  !   if (__comparison  0)
  !   __u = __idx;
  !   else if (__comparison  0)
  !   __l = __idx + 1;
  !   else
  !   return (T *)const_castvoid *(__p);
  ! }
  ! 
  !   return NULL;
}
k