Yup, there are truckloads of cool possibilities of alternatives here.

   Barry

On Jun 9, 2012, at 4:35 PM, Jed Brown wrote:

> Forwarding here in case anyone has comments on this approach to 
> MatMatMultSymbolic. The idea is to use a heap as a priority queue for the 
> next nontrivial entry in the rows of B that need to be merged.
> 
> The asymptotics are basically the same as if you have a list with O(log n) 
> insertion. There are two versions, the _Heap uses the heap to make sure that 
> all duplicate entries come to the top at the same time. The _BTHeap version 
> uses a PetscBT to avoid putting duplicates into the heap (so each nonzero in 
> the merged version involves one heap operation).
> 
> It's possible that merging lists (accelerated by BT) is still the best option 
> in the long run. It seems to be somewhat more memory-friendly. Alternatively, 
> making the heap fatter might give enough of a boost to pay off.
> 
> The attached callgrind/kcachegrind profiles were moved to
> 
> http://i.imgur.com/j7yOh.png (btheap)
> http://i.imgur.com/1ORcH.png (standard)
> 
> 
> ---------- Forwarded message ----------
> From: Jed Brown <jedbrown at mcs.anl.gov>
> Date: Sat, Jun 9, 2012 at 4:18 PM
> Subject: Re: Heaps
> To: Hong Zhang <hzhang at mcs.anl.gov>
> 
> 
> On Sat, Jun 9, 2012 at 3:51 PM, Hong Zhang <hzhang at mcs.anl.gov> wrote:
> Jed :
> 
> Wow, you are so fast!
> I implemented a heap-based MatMatMultSymbolic_SeqAIJ_SeqAIJ_Heap like we 
> discussed earlier, but it's slower than the PetscLLCondensed stuff.
> 
> I left the code in case you want to look at it or in case the heap can be 
> used elsewhere.
> 
> http://petsc.cs.iit.edu/petsc/petsc-dev/rev/d2e6d8577ed8
> http://petsc.cs.iit.edu/petsc/petsc-dev/rev/437bcb349f01
> 
> I'll look at this while supervising the high school kid :-)
> The algorithm gives optimal complexity than current linear search.
> We need investigate and optimize it though. This is a good project for
> the kid.
> 
> I just pushed a better one which has competitive performance (now there is 
> one heap operation per nonzero in C rather than one per entry in the rows of 
> B that need to be merged).
> 
> http://petsc.cs.iit.edu/petsc/petsc-dev/rev/27307154f71e
> http://petsc.cs.iit.edu/petsc/petsc-dev/rev/4ab803940a7f
> 
> This could perhaps be optimized more.
> 
> $ valgrind --tool=callgrind ./ex94 -f0 
> /home/jed/petsc/datafiles/matrices/arco3 -f1 
> /home/jed/petsc/datafiles/matrices/arco3 -viewer_binary_skip_info 
> -matmatmult_btheap
> 
> See attached profiles.
> 
> 


Reply via email to