Hello, I am trying to use PETSc in my code. My numerical scheme is BEM and requires a dense matrix. I use the mpidense matrix type, and each matrix entry is populated incrementally. This results in many calls to matSetValue, for every entry of the matrix. However, I don't need to get values from the matrix, until all the calculations are done. Moreover, when the matrix is created I use PETSC_DECIDE for the local rows and columns and I also do preallocation using MatMPIDenseSetPreallocation.
Each process writes to specific rows of the matrix, and assuming that mpidense matrices are distributed row-wise to the processes, each process should write more or less to its own rows. Moreover, to avoid a bottleneck at the final matrix assembly, I do a matAssemblyBegin-End with MAT_FLUSH_ASSEMBLY, every time the stash size reaches a critical value (of 1 million). However, when all operations are done and matAssemblyBegin-End is called with MAT_FINAL_ASSEMBLY the whole program gets stuck there. It doesn't crash, but it doesn't gets through the assembly either. When to do 'top', the processes seem to be in sleep status. I have tried waiting for many hours, but without any development. Even though the remaining items in the stash are less than 1million, which had an acceptable time cost for MAT_FLUSH_ASSEMBLY, it seems as if MAT_FINAL_ASSEMBLY just cannot deal with it. I would expect that this would take a few seconds but definitely not hours... The matrix dimensions are 28356 x 28356. For smaller problems, i.e. ~9000 rows and colums there is no significant delay. My questions are the following: 1) I know that the general advice is to fill the matrix in large blocks, but I am trying to avoid it for now. I would expect that doing matAssemblyBegin-End with MAT_FLUSH_ASSEMBLY every now and then, would reduce the load during the final assembly. Is my assumption wrong? 2) How is MatAssemblyBegin-End different when called with MAT_FINAL_ASSEMBLY instead of MAT_FLUSH_ASSEMBLY? 3) If this is the expected behavior, and it takes so long for a 28000 x 28000 linear system, it would be impossible to scale up to millions of dofs. It seems hard to believe that the cost communicating the matrix with matAssemblyBegin-End is much bigger or even comparable to the cost of actually calculating the values with numerical integration. 4) Unfortunately I am not experienced in debugging parallel programs. Is there a way to see if the processes are blocked waiting for each other? I apologize for the long email and thank you for taking the time reading it. Best Regards, Kassiopik -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20130413/787f21e4/attachment.html>
