Things to check are:

1) Run on a dedicated machine (no other jobs competing for cores or memory).

2) Are the matrices partitioned in the same way for both tests?

3) MatVec is memory-bandwidth limited. A speedup of 14 or 15 on 16 shared-memory 
cores is great; 5 or 6 is bad. My experience with old IBM SPs was a speedup 
of about 12, and SPs have a great (and expensive) memory system. I don't know 
what the state of the art is now, but memory bandwidth has generally been 
falling relative to processor speed, so I would double-check the numbers from 
your code.
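For point 2, it is worth checking how evenly the nonzeros, not just the rows, 
are spread. Below is a rough plain-C sketch (the helper names are made up for 
illustration and are not PETSc functions). It assumes an even split by rows: 
each of `size` ranks gets N/size rows and the first N%size ranks get one extra, 
which is my understanding of PETSc's default layout when no local sizes are 
given. It counts the nonzeros each rank would own from the CSR row pointer and 
reports max/mean; a hand-written MPI code that balances nonzeros instead could 
look much better on the same matrix.

```c
#include <assert.h>

/* Rows owned by `rank` under an even row split: every rank gets N/size rows
   and the first N%size ranks get one extra (assumed default split). */
static int rows_owned(int N, int size, int rank) {
    return N / size + (rank < N % size ? 1 : 0);
}

/* Fill nnz_per_rank[0..size-1] with the nonzeros each rank would own when a
   CSR matrix with row pointer `rowptr` (length N+1) is split evenly by rows. */
static void nnz_per_rank_even_rows(int N, const int *rowptr, int size,
                                   long *nnz_per_rank) {
    int start = 0;
    for (int r = 0; r < size; r++) {
        int end = start + rows_owned(N, size, r);
        nnz_per_rank[r] = (long)rowptr[end] - rowptr[start];
        start = end;
    }
    assert(start == N); /* every row was assigned exactly once */
}

/* Load imbalance = max / mean nonzeros per rank; 1.0 means perfect balance. */
static double nnz_imbalance(const long *nnz_per_rank, int size) {
    long max = 0, total = 0;
    for (int r = 0; r < size; r++) {
        total += nnz_per_rank[r];
        if (nnz_per_rank[r] > max) max = nnz_per_rank[r];
    }
    return total == 0 ? 1.0 : (double)max * size / (double)total;
}
```

For example, a 4-row matrix whose last row holds 5 of its 8 nonzeros, split 
over 2 ranks, puts 2 nonzeros on rank 0 and 6 on rank 1, an imbalance of 1.5; 
the slowest rank then sets the MatMult time.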
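To put numbers on point 3, here is a back-of-the-envelope sketch in C (again, 
hypothetical helper names, not part of PETSc). It assumes an AIJ/CSR matrix 
with 8-byte scalars and 4-byte indices, a square matrix, and that x and y are 
each streamed through memory exactly once, which is optimistic. With two flops 
per nonzero against twelve-plus bytes moved, MatMult time is set by bandwidth, 
not by how many cores you have.

```c
#include <assert.h>

/* Bytes moved by one CSR sparse matrix-vector product y = A*x, assuming
   8-byte scalars, 4-byte indices, and x and y each streamed exactly once. */
static double csr_matmult_bytes(double nrows, double nnz) {
    assert(nrows >= 0 && nnz >= 0);
    double values  = 8.0 * nnz;           /* matrix entries */
    double colidx  = 4.0 * nnz;           /* column indices */
    double rowptr  = 4.0 * (nrows + 1.0); /* row pointers */
    double vectors = 2.0 * 8.0 * nrows;   /* read x once, write y once */
    return values + colidx + rowptr + vectors;
}

/* Arithmetic intensity in flops per byte: one multiply and one add per nonzero. */
static double csr_matmult_intensity(double nrows, double nnz) {
    return 2.0 * nnz / csr_matmult_bytes(nrows, nnz);
}

/* Lower bound on the time of one product, given sustained bandwidth in bytes/s. */
static double csr_matmult_min_time(double nrows, double nnz, double bytes_per_sec) {
    return csr_matmult_bytes(nrows, nnz) / bytes_per_sec;
}
```

With 1e6 rows and 1e7 nonzeros this gives about 1.4e8 bytes per product, or 
roughly 0.14 flops per byte. If one core can already pull a large share of the 
machine's sustained bandwidth, 16 cores cannot be 16 times faster: the 
attainable speedup is roughly (bandwidth with 16 cores)/(bandwidth with 1 core), 
which a STREAM-style benchmark can measure.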

Mark

On May 2, 2012, at 12:01 PM, Javier Fresno wrote:

> 
> 
> I have a very simple PETSc program that multiplies a matrix and a vector 
> several times. It works fine, but it has some scalability issues. I run it 
> on a shared-memory machine with 16 processors and it only runs 5 or 6 times 
> faster (timing only the MatMult calls). I have programmed the same 
> algorithm in C with MPI and it shows a proper speedup (around 14 or 15). 
> The matrices I use have millions of nonzero elements, so I think they are 
> big enough.
> 
> What can I do to get the same speedup as in the hand-written C version?
> 
> I enclose an excerpt of the code. Thank you in advance.
> 
> Javier
> 
> 
> 
> #include <petscmat.h>
> 
> /**
> * Main function
> */
> int main(int argc, char ** argv){
> 
>    // Initialize Petsc
>    PetscInitialize(&argc, &argv, (char *) 0, NULL);
> 
>    // Timers
>    PetscLogDouble t_start, t_end;
> 
>    // File Viewer
>    PetscViewer fd;
>    PetscViewerBinaryOpen(PETSC_COMM_WORLD,"matrix_file",FILE_MODE_READ,&fd);
> 
>    // M matrix
>    Mat M;
>    MatCreate(PETSC_COMM_WORLD,&M);
>    MatSetFromOptions(M);
>    MatLoad(M,fd);
>    PetscViewerDestroy(&fd);
>    MatAssemblyBegin(M,MAT_FINAL_ASSEMBLY);
>    MatAssemblyEnd(M,MAT_FINAL_ASSEMBLY);
> 
>    PetscInt n, m, local_n, local_m;
>    MatGetSize(M,&n,&m);
>    MatGetLocalSize(M,&local_n,&local_m);
> 
>    // b and c vectors (usual order: VecCreate, VecSetSizes, VecSetFromOptions)
>    Vec b,c;
>    VecCreate(PETSC_COMM_WORLD,&b);
>    VecSetSizes(b,local_n,n);
>    VecSetFromOptions(b);
> 
>    VecCreate(PETSC_COMM_WORLD,&c);
>    VecSetSizes(c,local_n,n);
>    VecSetFromOptions(c);
> 
>    init_vector_values(b);   // user helper, not shown in this excerpt
> 
>    VecAssemblyBegin(b);
>    VecAssemblyEnd(b);
> 
> 
>    // Main computation
>    PetscGetTime(&t_start);
>    PetscInt i;
>    for(i=0; i<iter/2; i++){   // iter is defined elsewhere (not shown in this excerpt)
>        MatMult(M,b,c);
>        MatMult(M,c,b);
>    }
>    PetscGetTime(&t_end);
> 
>    PetscPrintf(PETSC_COMM_WORLD,"Comp time: %lf\n",t_end-t_start);
> 
>    // Free PETSc objects before finalizing
>    VecDestroy(&b);
>    VecDestroy(&c);
>    MatDestroy(&M);
>    PetscFinalize();
> 
>    return 0;
> }
> 
> 
> 
