On 2/21/19 1:48 PM, Smith, Barry F. wrote:




On Feb 20, 2019, at 12:50 PM, Mills, Richard Tran via petsc-dev 
<[email protected]><mailto:[email protected]> wrote:

Folks,

I'm working with some PFLOTRAN examples, and I've been scratching my head for a 
while at errors that are being generated in MatSetValuesBlockedLocal(), related 
to new nonzeros requiring a malloc. E.g.,

[0]PETSC ERROR: Argument out of range
[0]PETSC ERROR: New nonzero at (2,0) caused a malloc

The strange thing is that all of these examples run fine when I use the 
PFLOTRAN default BAIJ matrix type (with blocks column-oriented) -- I only see 
the issue when I switch to using AIJ. If I set the matrix option 
MAT_NEW_NONZERO_ALLOCATION_ERR to PETSC_FALSE to allow things to run with AIJ, 
everything seems to work perfectly. (And once the Jacobian matrix is 
constructed the first time, the memory usage is not growing according to what I 
see from -malloc_log.)

I've spent some time poking around with a debugger in the variants of 
MatSetValues() and MatXAIJSetPreallocation(), and I haven't been able make much 
progress in determining why I'm seeing this in AIJ vs. BAIJ -- it's easy to get 
confused looking at this code, and it's going to take me more time to follow 
what is going on than I've had to spare so far. So, I've got a few questions. 
First, should it even be possible to see what I am seeing? That is, if the 
MatSetValuesBlocked() routine is not causing new allocations when using BAIJ, 
should this be possible with AIJ?




    Sure, because the preallocation for BAIJ matrix is by block, while for AIJ 
it is by point (even if you provide a block size to AIJ). So long as the new 
entry is within an allocated block it will not trigger an error with BAIJ but 
may with AIJ.

Hmm. It seems that I've misunderstood what MatXAIJSetPreallocation() is 
supposed to do. I thought that if I pass in a blocksize > 1, 
MatXAIJSetPreallocation() will turn the block-row preallocation into the 
equivalent scalar-row stuff. This seems to be what is happening in the code, 
unless I'm reading this wrong:

[...]
304:     } else {                    /* Convert block-row precallocation to 
scalar-row */
305:       PetscInt i,m,*sdnnz,*sonnz;
306:       MatGetLocalSize(A,&m,NULL);
307:       PetscMalloc2((!!dnnz)*m,&sdnnz,(!!onnz)*m,&sonnz);
308:       for (i=0; i<m; i++) {
309:         if (dnnz) sdnnz[i] = dnnz[i/bs] * bs;
310:         if (onnz) sonnz[i] = onnz[i/bs] * bs;
311:       }
312:       MatSeqAIJSetPreallocation(A,0,dnnz ? sdnnz : NULL);
313:       MatMPIAIJSetPreallocation(A,0,dnnz ? sdnnz : NULL,0,onnz ? sonnz : 
NULL);
[...]

Though might something be happening later that reduces the amount of nonzeros 
allocated, since the "dense" blocks of the matrix actually will have zero 
entries that never get a value set?

Something does appear to be screwy when I look at the "-snes_view" output for a 
time-dependent problem where I see these errors. When using BAIJ, at every time 
step I see

    Mat Object: Mat_0x84000000_0 (flow_) 1 MPI processes
      type: seqbaij
      rows=144, cols=144, bs=3
      total: nonzeros=2304, allocated nonzeros=2304
      total number of mallocs used during MatSetValues calls =0
          block size is 3

but when I switch to AIJ, I initially see

    Mat Object: Mat_0x84000000_0 (flow_) 1 MPI processes
      type: seqaij
      rows=144, cols=144, bs=3
      total: nonzeros=1584, allocated nonzeros=3324
      total number of mallocs used during MatSetValues calls =68
        using I-node routines: found 96 nodes, limit used is 5

but as things progress, the allocated nonzeros and mallocs count keeps growing; 
by step 43 I'm seeing

      total: nonzeros=1584, allocated nonzeros=115524
      total number of mallocs used during MatSetValues calls =7548

If I dump the matrices and look at them in something like Octave, the matrices 
from the BAIJ and AIJ runs appear identical within round-off error, and we 
certainly aren't adding any entries at each step. Why should PETSc keep calling 
malloc() at each MatSetValues() call, then? I suspect a bug somewhere in the 
AIJ case call sequence of MatSetValuesBlockedLocal() --> MatSetValuesBlocked() 
--> MatSetValues() --> MatSetValues_SeqAIJ(). I have trouble following all of 
the logic in matsetvaluesseqaij_(), though, so I guess I'm going to have try 
harder to understand why MatSeqXAIJReallocateAIJ() is being hit there.

Any educated guesses about what might be going wrong would be appreciated. 
There is a lot of PETSc code I'm not familiar with involved here.

Cheers,
Richard





   Barry



(Clearly it *is* possible, but should it be?) I'm still trying to figure out if 
there is something wrong with what PFLOTRAN is doing, vs. something going wrong 
somewhere inside PETSc.

Any hints about what to look at from someone with more familiarity with this 
code would be appreciated.

--Richard





Reply via email to