Re: [petsc-users] MatMult() returning different values depending on # of processors?

Barry Smith Thu, 26 Jun 2014 19:42:24 -0700

  Using the point wise differences in the two computed vectors is not a 
meaningful way to check if the matrix-vector product is correct. 
I loaded the reduced matrix up in matlab (after doing a 
MatView(C,PETSC_VIEWER_BINARY_WORLD) in the code) and did the calculation


>> A(82:162,1:81)*x(1:81) + A(82:162,82:162)*x(82:162) - A(82:162,:)*x

ans =

    -1.333828737741757e-23
     3.970466940254533e-22
     3.970466940254533e-22
     6.606856988583543e-20
     6.606856988583543e-20
    -4.878909776184770e-19
    -4.878909776184770e-19
                         0
                         0
                         0
                         0
                         0
                         0
    -2.628894374088577e-31
    -2.628894374088577e-31
    -1.709836290543913e-26
    -1.709836290543913e-26
    -2.136210087789533e-24
    -2.205680334546916e-24
    -2.584939414228211e-26
    -2.584939414228211e-26
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0

>> A(1:81,1:81)*x(1:81) + A(1:81,82:162)*x(82:162) - A(1:81,:)*x

ans =

                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
                         0
    -1.734723475976807e-18
                         0
                         0
                         0
     1.925929944387236e-34
     1.925929944387236e-34
     1.972152263052530e-31
     1.972152263052530e-31
     3.453317498695501e-26
     2.019483917365790e-28
    -1.333828737741757e-23

  Note that the two different ways of computing the same product (which would 
be identical in “infinite precision”) gives differences on the same order as 
running the PETSc program on two processes.    When running in parallel the 
order that the computations are done (and any movement of the partial sums from 
registers to memory) is different for different number of processes hence 
different results.

   Note also that the Intel floating point registers are 80 bits wide while 
double precision numbers in memory are 64 bit. Thus moving a partially computed 
sum to memory and then back to register wipes out 16 bits of the number.

   Barry



On Jun 26, 2014, at 9:10 PM, Arthur Kurlej <[email protected]> wrote:

> Hmm, so I made that change, but my program still exhibits the same problem 
> with it's output.
> 
> 
> On Thu, Jun 26, 2014 at 9:05 PM, Peter Brune <[email protected]> wrote:
> Yep!  Looks good.  One other thing is that you can pre/postface all PETSc 
> calls with ierr = ... CHKERRQ(ierr); and it will properly trace other 
> problems if they arise.
> 
> - Peter
> 
> 
> On Thu, Jun 26, 2014 at 8:56 PM, Arthur Kurlej <[email protected]> wrote:
> Hello,
> 
> I'm sorry, but this is a bit new for me, and I'm still not quite sure I 
> follow.
> 
> Are you recommending that opposed to doing this:
> if(procs!=1){
>       for(i=0;i<procs;i++){
>               if(rank==i){
>                       VecGetLocalSize(*x,&length);
>                       VecGetOwnershipRange(*x,&div,NULL);
>                       
> ISCreateStride(PETSC_COMM_WORLD,length,div+begin,1,&iscol);
>                       
> ISCreateStride(PETSC_COMM_WORLD,length,div+begin,1,&isrow);
>                       ierr = 
> MatGetSubMatrix(*A,isrow,iscol,MAT_INITIAL_MATRIX,AA); CHKERRQ(ierr);
>               }
>       }
> }
> else{
> ISCreateStride(PETSC_COMM_SELF,final_size,begin,1,&iscol);
> ISCreateStride(PETSC_COMM_SELF,final_size,begin,1,&isrow);
> ierr = MatGetSubMatrix(*A,isrow,iscol,MAT_INITIAL_MATRIX,AA); CHKERRQ(ierr);
> }
> 
> 
> The proper implementation would instead just be the following:
> VecGetLocalSize(*x,&length);
> VecGetOwnershipRange(*x,&div,NULL);
> ISCreateStride(PETSC_COMM_WORLD,length,div+begin,1,&iscol);
> ISCreateStride(PETSC_COMM_WORLD,length,div+begin,1,&isrow);
> ierr = MatGetSubMatrix(*A,isrow,iscol,MAT_INITIAL_MATRIX,AA); CHKERRQ(ierr);
> ?
> 
> 
> 
> 
> 
> On Thu, Jun 26, 2014 at 5:32 PM, Peter Brune <[email protected]> wrote:
> MatGetSubMatrix() is collective on Mat and ISCreateXXX is collective on the 
> provided comm, so the logic you have built to call it on one proc at a time 
> is unnecessary at best and most likely incorrect and likely to produce 
> strange results.  You can forgo the if statement and loop over processors, 
> create the ISes on the same comm as x, and then call MatGetSubMatrix() once.
> 
> - Peter
> 
> 
> On Thu, Jun 26, 2014 at 4:26 PM, Arthur Kurlej <[email protected]> wrote:
> I cannot send the original code, but I reproduced the problem in another 
> code. I have attached a makefile the code, and the data for the x vector and 
> A matrix.
> 
> I think the problem may be with my ShortenMatrix function, but it's not clear 
> to me what exactly is going wrong and how to fix it. So I would appreciate 
> some assistance there.
> 
> 
> Thanks,
> Arthur
> 
> 
> 
> On Wed, Jun 25, 2014 at 6:24 PM, Barry Smith <[email protected]> wrote:
> 
>    Can you send the code that reproduces this behavior?
> 
>    Barry
> 
> On Jun 25, 2014, at 4:37 PM, Arthur Kurlej <[email protected]> wrote:
> 
> > Hi Barry,
> >
> > So for the matrix C that I am currently testing (size 162x162), the 
> > condition number is roughly 10^4.
> >
> > For reference, I'm porting MATLAB code into PETSc, and for one processor, 
> > the PETSc b vector is roughly equivalent to the MATLAB b vector. So I know 
> > that for one processor, my program is performing as expected.
> >
> > I've included examples below of values for b (also of size 162), ranging 
> > from indices 131 to 141.
> >
> > #processors=1:
> >                          0
> >      1.315217173959314e-20
> >      1.315217173959314e-20
> >      4.843201487740107e-17
> >      4.843201487740107e-17
> >      8.166104700666665e-14
> >      8.166104700666665e-14
> >      6.303834267553249e-11
> >      6.303834267553249e-11
> >      2.227932688485483e-08
> >      2.227932688485483e-08
> >
> > # processors=2:
> >      5.480410831461926e-22
> >      2.892553944350444e-22
> >      2.892553944350444e-22
> >      7.524038923310717e-24
> >      7.524038923214420e-24
> >     -3.340766769043093e-26
> >     -7.558372155761972e-27
> >      5.551561288838557e-25
> >      5.550551546879874e-25
> >     -1.579397982093437e-22
> >      2.655766754178065e-22
> >
> > # processors = 4:
> >      5.480410831461926e-22
> >      2.892553944351728e-22
> >      2.892553944351728e-22
> >      7.524092205125593e-24
> >      7.524092205125593e-24
> >     -2.584939414228212e-26
> >     -2.584939414228212e-26
> >                          0
> >                          0
> >     -1.245940797657998e-23
> >     -1.245940797657998e-23
> >
> > # processors = 8:
> >      5.480410831461926e-22
> >      2.892553944023035e-22
> >      2.892553944023035e-22
> >      7.524065744581494e-24
> >      7.524065744581494e-24
> >     -2.250265175188197e-26
> >     -2.250265175188197e-26
> >     -6.543127892265160e-26
> >     1.544288143499193e-317
> >      8.788794008375919e-25
> >      8.788794008375919e-25
> >
> >
> > Thanks,
> > Arthur
> >
> >
> >
> > On Wed, Jun 25, 2014 at 4:06 PM, Barry Smith <[email protected]> wrote:
> >
> >    How different are the values in b? Can you send back a few examples of 
> > the different b’s? Any idea of the condition number of C?
> >
> >    Barry
> >
> > On Jun 25, 2014, at 3:10 PM, Arthur Kurlej <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > While running my code, I have found that MatMult() returns different 
> > > values depending on the number of processors I use (and there is quite 
> > > the variance in the values).
> > >
> > > The setup of my code is as follows (I can go into more depth/background 
> > > if needed):
> > > -Generate parallel AIJ matrix of size NxN, denoted as A
> > > -Retrieve parallel AIJ submatrix from the last N-1 rows&columns from A, 
> > > denoted as C
> > > -Generate vector of length N-1, denoted as x
> > > -Find C*x=b
> > >
> > > I have already checked that A, C, and x are all equivalent when ran for 
> > > any number of processors, it is only the values of vector b that varies.
> > >
> > > Does anyone have an idea about what's going on?
> > >
> > >
> > > Thanks,
> > > Arthur
> > >
> >
> >
> 
> 
> 
> 
> 
>

Re: [petsc-users] MatMult() returning different values depending on # of processors?

Reply via email to