Forwarding here in case anyone has comments on this approach to MatMatMultSymbolic. The idea is to use a heap as a priority queue for the next nontrivial entry in the rows of B that need to be merged.
The asymptotics are basically the same as if you have a list with O(log n) insertion. There are two versions, the _Heap uses the heap to make sure that all duplicate entries come to the top at the same time. The _BTHeap version uses a PetscBT to avoid putting duplicates into the heap (so each nonzero in the merged version involves one heap operation). It's possible that merging lists (accelerated by BT) is still the best option in the long run. It seems to be somewhat more memory-friendly. Alternatively, making the heap fatter might give enough of a boost to pay off. The attached callgrind/kcachegrind profiles were moved to http://i.imgur.com/j7yOh.png (btheap) http://i.imgur.com/1ORcH.png (standard) ---------- Forwarded message ---------- From: Jed Brown <[email protected]> Date: Sat, Jun 9, 2012 at 4:18 PM Subject: Re: Heaps To: Hong Zhang <hzhang at mcs.anl.gov> On Sat, Jun 9, 2012 at 3:51 PM, Hong Zhang <hzhang at mcs.anl.gov> wrote: > Jed : > > Wow, you are so fast! > >> I implemented a heap-based MatMatMultSymbolic_SeqAIJ_SeqAIJ_Heap like we >> discussed earlier, but it's slower than the PetscLLCondensed stuff. >> >> I left the code in case you want to look at it or in case the heap can be >> used elsewhere. >> >> http://petsc.cs.iit.edu/petsc/petsc-dev/rev/d2e6d8577ed8 >> http://petsc.cs.iit.edu/petsc/petsc-dev/rev/437bcb349f01 >> > > I'll look at this while supervising the high school kid :-) > The algorithm gives optimal complexity than current linear search. > We need investigate and optimize it though. This is a good project for > the kid. > I just pushed a better one which has competitive performance (now there is one heap operation per nonzero in C rather than one per entry in the rows of B that need to be merged). http://petsc.cs.iit.edu/petsc/petsc-dev/rev/27307154f71e http://petsc.cs.iit.edu/petsc/petsc-dev/rev/4ab803940a7f This could perhaps be optimized more. $ valgrind --tool=callgrind ./ex94 -f0 /home/jed/petsc/datafiles/matrices/arco3 -f1 /home/jed/petsc/datafiles/matrices/arco3 -viewer_binary_skip_info -matmatmult_btheap See attached profiles. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20120609/1dda185d/attachment.html>
