Yup, there are truckloads of cool possibilities of alternatives here. Barry
On Jun 9, 2012, at 4:35 PM, Jed Brown wrote: > Forwarding here in case anyone has comments on this approach to > MatMatMultSymbolic. The idea is to use a heap as a priority queue for the > next nontrivial entry in the rows of B that need to be merged. > > The asymptotics are basically the same as if you have a list with O(log n) > insertion. There are two versions, the _Heap uses the heap to make sure that > all duplicate entries come to the top at the same time. The _BTHeap version > uses a PetscBT to avoid putting duplicates into the heap (so each nonzero in > the merged version involves one heap operation). > > It's possible that merging lists (accelerated by BT) is still the best option > in the long run. It seems to be somewhat more memory-friendly. Alternatively, > making the heap fatter might give enough of a boost to pay off. > > The attached callgrind/kcachegrind profiles were moved to > > http://i.imgur.com/j7yOh.png (btheap) > http://i.imgur.com/1ORcH.png (standard) > > > ---------- Forwarded message ---------- > From: Jed Brown <jedbrown at mcs.anl.gov> > Date: Sat, Jun 9, 2012 at 4:18 PM > Subject: Re: Heaps > To: Hong Zhang <hzhang at mcs.anl.gov> > > > On Sat, Jun 9, 2012 at 3:51 PM, Hong Zhang <hzhang at mcs.anl.gov> wrote: > Jed : > > Wow, you are so fast! > I implemented a heap-based MatMatMultSymbolic_SeqAIJ_SeqAIJ_Heap like we > discussed earlier, but it's slower than the PetscLLCondensed stuff. > > I left the code in case you want to look at it or in case the heap can be > used elsewhere. > > http://petsc.cs.iit.edu/petsc/petsc-dev/rev/d2e6d8577ed8 > http://petsc.cs.iit.edu/petsc/petsc-dev/rev/437bcb349f01 > > I'll look at this while supervising the high school kid :-) > The algorithm gives optimal complexity than current linear search. > We need investigate and optimize it though. This is a good project for > the kid. > > I just pushed a better one which has competitive performance (now there is > one heap operation per nonzero in C rather than one per entry in the rows of > B that need to be merged). > > http://petsc.cs.iit.edu/petsc/petsc-dev/rev/27307154f71e > http://petsc.cs.iit.edu/petsc/petsc-dev/rev/4ab803940a7f > > This could perhaps be optimized more. > > $ valgrind --tool=callgrind ./ex94 -f0 > /home/jed/petsc/datafiles/matrices/arco3 -f1 > /home/jed/petsc/datafiles/matrices/arco3 -viewer_binary_skip_info > -matmatmult_btheap > > See attached profiles. > >
