[petsc-dev] Cache blocking ex2.c

Nystrom, William D Mon, 11 Mar 2013 18:35:03 +0000

This weekend, I spent some time making a modified version of
src/ksp/ksp/examples/tutorials/ex2.c where I added the capability to
divide the m x n mesh into blocks of size mblk x nblk.  I believe I have
this debugged and working properly.  In building the matrix, I travel
through the blocks in the same way as the original scheme travels through
the m x n mesh.  That is, within each block I visit the points with the
nblk direction varying fastest, and I process the blocks themselves with
the n direction varying fastest.
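
To make the traversal order concrete, here is a minimal sketch of the
blocked loop nest as I understand the description above.  It is not the
actual modified ex2.c.  The function name blocked_order and the order
array are hypothetical, and the natural index i*n + j is assumed to match
the row numbering used by the original ex2.c.

```c
#include <assert.h>

/* Hypothetical helper (not part of PETSc or ex2.c): record the order in
 * which an m x n mesh is visited when it is split into mblk x nblk blocks.
 * order[k] receives the natural index i*n + j of the k-th point visited.
 * The caller must provide room for m*n entries.  Returns the number of
 * points visited. */
int blocked_order(int m, int n, int mblk, int nblk, int *order)
{
  int k = 0;

  assert(m > 0 && n > 0 && mblk > 0 && nblk > 0);

  for (int ib = 0; ib < m; ib += mblk) {          /* blocks in the m direction */
    int iend = (ib + mblk < m) ? ib + mblk : m;   /* clip a partial last block */
    for (int jb = 0; jb < n; jb += nblk) {        /* blocks in n vary fastest  */
      int jend = (jb + nblk < n) ? jb + nblk : n;
      for (int i = ib; i < iend; i++) {
        for (int j = jb; j < jend; j++) {         /* nblk direction is fastest */
          order[k++] = i * n + j;
        }
      }
    }
  }
  return k;
}
```

For a 4 x 6 mesh with 2 x 3 blocks, the visit order begins 0, 1, 2, 6, 7,
8 (the first block) and continues 3, 4, 5, 9, 10, 11 (the next block in
the n direction).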


I was interested in seeing if I would get better performance, both with
and without threads, because of getting better cache utilization.
However, when I try testing the new way of building the matrix and vary
mblk and nblk, I'm not really getting a meaningful speedup, either with or
without threads.  Here is the sort of command I am running with my hacked
version of ex2:

ex2 -m 1000 -n 1000 -compute_matrix_flag 3 -mblk 50 -nblk 50 \
       -ksp_type cg -pc_type jacobi -log_summary -ksp_rtol 1.0e-10 \
       -ksp_converged_reason -threadcomm_type pthread \
       -threadcomm_nthreads 12

I have varied mblk and nblk from 10 to 100.  I have not tried non-square
aspect ratios.  I'm running on my local workstation, which is a dual
socket Xeon with 6 cores per socket.  Using 12 threads and the original
matrix build procedure of ex2, I get a speedup of about 3x over a single
thread.  This experiment was motivated by an email exchange with Jed
several months back where he suggested that the organization of the matrix
and mesh for the ex2.c example was poor.  Another motivation was that I
get pretty decent speedup using threads if the problem is small enough,
for instance a 200 x 600 mesh.  But for larger problems like 1000 x 1000,
the speedup decreases dramatically.  I assumed this was because the
smaller problem runs mostly out of cache.  I was hoping a blocking
strategy might help larger problems get better speedup by using cache
better.
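
As a rough check on the cache argument, the sketch below estimates the
storage for the matrix alone, assuming the 5-point stencil that ex2.c
assembles, a CSR-style (AIJ) layout, and the scalar and index sizes passed
in by the caller.  The function names stencil_nnz and csr_bytes are
hypothetical, and the estimate ignores vectors and any extra per-matrix
bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: nonzeros of the 5-point stencil matrix on an
 * m x n mesh.  Interior rows have 5 entries; each boundary edge removes
 * one neighbor per boundary point, giving 5*m*n - 2*m - 2*n in total. */
size_t stencil_nnz(size_t m, size_t n)
{
  assert(m > 0 && n > 0);
  return 5 * m * n - 2 * m - 2 * n;
}

/* Hypothetical helper: bytes for a CSR layout with one scalar and one
 * column index per nonzero, plus a row-pointer array of m*n + 1 indices. */
size_t csr_bytes(size_t m, size_t n, size_t scalar_bytes, size_t index_bytes)
{
  size_t rows = m * n;
  size_t nnz  = stencil_nnz(m, n);
  return nnz * (scalar_bytes + index_bytes) + (rows + 1) * index_bytes;
}
```

With 8-byte scalars and 4-byte indices, csr_bytes(200, 600, 8, 4) is a bit
under 8 MB, while csr_bytes(1000, 1000, 8, 4) is about 64 MB.  Whether the
smaller figure actually fits depends on the cache size of the machine,
which I have not checked here.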

Does what I am trying to do make sense?  Does my approach seem reasonable?
I'm happy to provide my hacked version of ex2.c if anyone wants to look at
it.

Thanks,

Dave

--
Dave Nystrom
LANL HPC-5
Phone: 505-667-7913
Email: wdn at lanl.gov
Smail: Mail Stop B272
       Group HPC-5
       Los Alamos National Laboratory
       Los Alamos, NM 87545
