On Wed, 10 Oct 2007, Jaime Suarez wrote:
> Also, I would like to perform a matrix-vector multiplication in
> parallel, but this time the matrix would be symmetric, so the
> libraries for sparse matrices do not seem efficient for equally
> distributing the jobs among several processors. Can you give me some
> advice about what to do?
>
> Thanking you in advance,
>
> Jaime Suarez
>
>
This is a good point. Though the symmetric compressed sparse row (CSR)
matrix format is good for reducing memory usage, it makes load balancing
of the __floating point operations__ and __communication__ difficult. If
roughly the same number of rows is assigned to each process, the last
process has fewer nonzeros (and hence less floating point work to do);
in addition, the neighbor-to-neighbor communication is not symmetric.
The first process needs to receive messages from several other processes
but does not send ANY, while the last process receives no messages but
needs to send to several.

The other issue is when the messages are posted. There need to be TWO
"waves" of communication: each process needs to receive the remote
vector entries to do the "upper triangular" part of the multiply, and it
also needs to send the partial results it computed for the "lower
triangular" part of the multiply to the process that owns them. We found
that actually doing this as two separate VecScatters made the whole
multiply a good bit slower than the MPIAIJ format; hence we merged them
so there is only one VecScatter and hence one "wave" of communication.
BUT this introduces its own problem: the first process cannot start its
scatter until it has done ALL of the lower part of the multiply, so
later processes will be waiting a bit longer.
Ways to improve the situation:
1) Give the later processes more rows to balance the number of nonzeros
   (and hence floating point operations) per process. The drawbacks to
   this are
   a) calculating the rows that each process should get (which depends
      on the nonzero pattern),
   b) the synchronization of when sends and receives are posted by
      different processes is worse, and
   c) the rest of the code may now be unbalanced, since the later
      processes have many more entries in the vector.
2) Just use the MPIAIJ format and give up the memory savings.
3) Store the "block diagonal" portion of the matrix (the portion
   associated with the local rows AND columns of the matrix) in SBAIJ
   format, but redundantly store the off-diagonal entries on the
   appropriate processes. This has nice load balancing. If we assume
   the vast majority of the matrix entries are in the "block diagonal"
   portion, then this will still save most of the memory that the
   MPISBAIJ format saves.
4) ???
If there is a lot of demand for an efficient parallel symmetric format,
it may make sense for us to implement 3.
Barry