> On Jan 19, 2017, at 4:10 PM, Fangbo Wang <[email protected]> wrote:
> 
> Hi,
> 
> Background:
> 
> I am using stochastic finite element to solve a solid mechanics problem with 
> random material properties. At the end of the day, I get a linear system of 
> equations Ax=b to solve.
> 
> The matrix A is very large, 1.3 million by 1.3 million, and storing it 
> needs more than 100 GB of memory. Fortunately, A has a nice structure: it 
> is a block matrix, most of the blocks inside it are similar, and each block 
> is 10,000 by 10,000.
> 
> 
> Hence, I only need to store some of the blocks (45 in my case). Most of the 
> computation in my iterative solver is matrix-vector multiplication, which is 
> why I want to do it using block matrices.
> <FIG-2-Color-online-A-symmetric-block-Toeplitz-matrix-Each-block-is-also-a-symmetric.png>
> 
> Current:
> I tried distributing all 45 block matrices across all the processors, 
> along with the corresponding 45 block vectors. However, the computation 
> is very slow and shows no scalability at all.
> I am thinking of using small groups of processors to split up the 
> computation, for example with intra-communicators and inter-communicators. 
> Maybe this would help reduce the communication.

   No, that would just make things excessively complex.

> 
> Does anyone have experience with this? Is there a PETSc function for 
> this kind of job? I am open to any suggestions.

   Based on your picture, it looks like the matrix would be dense if it were 
explicitly formed; is that right? Or are your 45 "small matrices" sparse? Are 
there any "empty" block matrices in your diagram, or is every block one of the 
45 small ones?

   There are two ways to order your unknowns: block by block, with all 
unknowns for one "block" followed by all unknowns for the next block, and so 
on; or interlaced, with the unknowns of the different blocks interleaved. 
Depending on the structure of the problem, one ordering can be significantly 
better than the other.
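
   As a rough illustration of the two orderings, here is a minimal sketch in 
plain Python (not PETSc code). The function names and the toy sizes nblocks 
and bs (unknowns per block) are hypothetical, chosen only to show how a 
(block, local index) pair maps to a global index under each scheme.

```python
# Illustrative only: index maps for the two orderings of unknowns.
# nblocks and bs (unknowns per block) are hypothetical toy sizes.

def blocked_index(block, local, nblocks, bs):
    """Global index when all unknowns of one block are stored contiguously."""
    return block * bs + local

def interlaced_index(block, local, nblocks, bs):
    """Global index when unknowns of different blocks are interleaved."""
    return local * nblocks + block

nblocks, bs = 3, 4
pairs = [(b, i) for b in range(nblocks) for i in range(bs)]
blocked = [blocked_index(b, i, nblocks, bs) for b, i in pairs]
interlaced = [interlaced_index(b, i, nblocks, bs) for b, i in pairs]
```

   Both maps are permutations of the same index set, so switching between 
them only reorders rows and columns. What changes is which entries end up 
next to each other in memory and on each process.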

   The MatNest construct may be the way to go. It behaves like the fully 
formed matrix, but each block position in the matrix just holds a pointer to 
the correct small matrix, so no individual matrix is stored more than once.
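
   The idea of storing each small matrix once and pointing to it from every 
block position can be sketched in plain Python. This is not the MatNest API; 
the names unique, layout, and block_matvec are hypothetical, and the tiny 
3-by-3 block layout is an assumed toy pattern, not the matrix from the 
question.

```python
# Illustrative only: plain Python, not the PETSc MatNest API.
# "unique" holds each distinct small block once; "layout" records which
# block sits at each block position (None marks an empty block).

def block_matvec(unique, layout, x, bs):
    """Compute y = A x where block (i, j) of A is unique[layout[i][j]]."""
    y = [0.0] * (len(layout) * bs)
    for i, row in enumerate(layout):
        for j, k in enumerate(row):
            if k is None:          # empty block contributes nothing
                continue
            blk = unique[k]        # a reference to the stored block, not a copy
            for r in range(bs):
                acc = 0.0
                for c in range(bs):
                    acc += blk[r][c] * x[j * bs + c]
                y[i * bs + r] += acc
    return y

# Toy data: two unique 2x2 blocks reused across a 3x3 block layout.
bs = 2
unique = [
    [[2.0, 0.0], [0.0, 2.0]],   # block 0
    [[1.0, 1.0], [1.0, 1.0]],   # block 1
]
layout = [
    [0, 1, None],
    [1, 0, 1],
    [None, 1, 0],
]
x = [1.0] * 6
y = block_matvec(unique, layout, x, bs)
```

   Here seven block positions are served by two stored blocks, which is the 
same saving, in miniature, as storing 45 blocks instead of the full matrix.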

   Also, if you get no speedup, you need to verify that it is not due to the 
hardware or to badly configured software, so run the streams benchmark and 
make sure you have a good MPI binding:
http://www.mcs.anl.gov/petsc/documentation/faq.html#computers




  Barry

> 
> Thank you very much!
> 
> 
> 
> Fangbo Wang
> 
> -- 
> Fangbo Wang, PhD student
> Stochastic Geomechanics Research Group
> Department of Civil, Structural and Environmental Engineering
> University at Buffalo
> Email: [email protected]
