On Sat, Apr 13, 2013 at 1:59 PM, Kassiopi Kassiopi2 <kassiopik at gmail.com> wrote:
> Hello,
>
> I am trying to use PETSc in my code. My numerical scheme is BEM and
> requires a dense matrix. I use the mpidense matrix type, and each matrix
> entry is populated incrementally. This results in many calls to
> MatSetValue, one for every entry of the matrix. However, I don't need to
> get values from the matrix until all the calculations are done. Moreover,
> when the matrix is created I use PETSC_DECIDE for the local rows and
> columns, and I also do preallocation using MatMPIDenseSetPreallocation.
>
> Each process writes to specific rows of the matrix, and assuming that
> mpidense matrices are distributed row-wise across the processes, each
> process should write more or less to its own rows. Moreover, to avoid a
> bottleneck at the final matrix assembly, I do a MatAssemblyBegin/End with
> MAT_FLUSH_ASSEMBLY every time the stash size reaches a critical value (of
> 1 million).

How many off-process values are you writing? This seems like tremendous
overkill.

> However, when all operations are done and MatAssemblyBegin/End is called
> with MAT_FINAL_ASSEMBLY, the whole program gets stuck there. It doesn't
> crash, but it doesn't get through the assembly either. When I do 'top',
> the processes seem to be in sleep status. I have tried waiting for many
> hours, but without any progress. Even though the remaining items in the
> stash number fewer than 1 million, which had an acceptable time cost for
> MAT_FLUSH_ASSEMBLY, it seems as if MAT_FINAL_ASSEMBLY just cannot deal
> with it. I would expect this to take a few seconds, but definitely not
> hours...

This sounds like it goes to virtual (disk) memory for the transfer, which
would explain why it does not happen for smaller sizes. Flush the assembly
more frequently (two sketches of the assembly pattern are appended at the
end of this message).

> The matrix dimensions are 28356 x 28356. For smaller problems, i.e. ~9000
> rows and columns, there is no significant delay.
>
> My questions are the following:
>
> 1) I know that the general advice is to fill the matrix in large blocks,
> but I am trying to avoid that for now. I would expect that doing
> MatAssemblyBegin/End with MAT_FLUSH_ASSEMBLY every now and then would
> reduce the load during the final assembly. Is my assumption wrong?

It is not often enough.

> 2) How is MatAssemblyBegin/End different when called with
> MAT_FINAL_ASSEMBLY instead of MAT_FLUSH_ASSEMBLY?

Several things are set up for sparse matrices, but it should not be
different for MPIDENSE.

> 3) If this is the expected behavior, and it takes so long for a 28000 x
> 28000 linear system, it would be impossible to scale up to millions of
> dofs. It seems hard to believe that the cost of communicating the matrix
> with MatAssemblyBegin/End is much bigger than, or even comparable to, the
> cost of actually calculating the values with numerical integration.

That intuition is exactly wrong. On modern hardware, you can do 1000
floating point operations for each memory reference.

> 4) Unfortunately I am not experienced in debugging parallel programs. Is
> there a way to see if the processes are blocked waiting for each other?

gdb should be easy to use. Run with -start_in_debugger, then hit C-c when
it seems to hang and type 'where' to get the stack trace.

   Matt

> I apologize for the long email, and thank you for taking the time to read
> it.
>
> Best Regards,
> Kassiopik
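[Appended sketch 1] For reference, here is a minimal sketch of the
row-at-a-time pattern that the "fill the matrix in large blocks" advice
points to, assuming a hypothetical compute_row() that performs the BEM
integration for one row, with the global size N taken from the thread.
Because each rank only sets values in rows it owns, nothing goes through
the stash and no intermediate flushes are needed. This is an illustration
of the API, not the poster's actual code.

#include <petscmat.h>

/* Hypothetical BEM kernel: fills all n entries of one row. */
extern void compute_row(PetscInt row, PetscInt n, PetscScalar vals[]);

int main(int argc, char **argv)
{
  Mat            A;
  PetscInt       N = 28356, rstart, rend, i, j;
  PetscInt      *cols;
  PetscScalar   *vals;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);

  /* Parallel dense matrix, rows distributed across the processes. */
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);CHKERRQ(ierr);
  ierr = MatSetType(A, MATMPIDENSE);CHKERRQ(ierr);
  ierr = MatMPIDenseSetPreallocation(A, NULL);CHKERRQ(ierr); /* NULL: PETSc allocates storage */

  /* This rank owns rows [rstart, rend); setting values only in owned rows
     never touches the stash, so no intermediate flushes are required. */
  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  ierr = PetscMalloc(N*sizeof(PetscInt), &cols);CHKERRQ(ierr);
  ierr = PetscMalloc(N*sizeof(PetscScalar), &vals);CHKERRQ(ierr);
  for (j = 0; j < N; j++) cols[j] = j;
  for (i = rstart; i < rend; i++) {
    compute_row(i, N, vals);  /* numerical integration for row i */
    ierr = MatSetValues(A, 1, &i, N, cols, vals, INSERT_VALUES);CHKERRQ(ierr);
  }

  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  ierr = PetscFree(cols);CHKERRQ(ierr);
  ierr = PetscFree(vals);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}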
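[Appended sketch 2] And here is a sketch of the periodic-flush pattern for
the case where entries really must be set one at a time and some land
off-process. The (rows, cols, vals) triples and flush_interval are
hypothetical stand-ins for the poster's integration loop; note the
collectivity caveat in the comment, which can itself cause a hang if
ignored.

/* Insert precomputed (row, col, value) triples one entry at a time,
 * flushing the stash every flush_interval insertions so it stays small.
 * NOTE: MatAssemblyBegin/End is collective over the matrix's communicator,
 * so every rank must execute the flush the same number of times; flushing
 * on a purely rank-local counter deadlocks if the counts diverge across
 * ranks. Real code must coordinate the flush points. */
PetscErrorCode InsertWithPeriodicFlush(Mat A, PetscInt nentries,
                                       const PetscInt rows[],
                                       const PetscInt cols[],
                                       const PetscScalar vals[],
                                       PetscInt flush_interval)
{
  PetscInt       k;
  PetscErrorCode ierr;

  for (k = 0; k < nentries; k++) {
    ierr = MatSetValue(A, rows[k], cols[k], vals[k], INSERT_VALUES);CHKERRQ(ierr);
    if ((k + 1) % flush_interval == 0) {
      /* flush_interval should be far below the poster's 1 million, so the
         stash is emptied before the receiving side spills into swap. */
      ierr = MatAssemblyBegin(A, MAT_FLUSH_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A, MAT_FLUSH_ASSEMBLY);CHKERRQ(ierr);
    }
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  return 0;
}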
-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
   -- Norbert Wiener
