Hi, > Yes, I want to preserve the behavior Barry described in 2), i.e. > only error if the user specified an allocation pattern explicitly. > I'm only questioning the way it is currently implemented: > > If the number of nonzeros is specified as PETSC_DECIDE to > MatSeqAIJSetPreallocation___SeqAIJ(), > it does not set MAT_NEW_NONZERO_ALLOCATION___ERR. If the number of > nonzeros is larger than zero, it does. So far, so good. > > > If nnz is nonnegative, yes.
right, it's actually nz >= 0. > MatMPIAIJSetPreallocation___MPIAIJ(), however, first checks for the > number of nonzeros. If it is PETSC_DECIDE, it sets defaults and then > calls MatSeqAIJSetPreallocation___SeqAIJ() with the defaults. > Because MatSeqAIJSetPreallocation___SeqAIJ() no longer sees > PETSC_DECIDE, it sets MAT_NEW_NONZERO_ALLOCATION___ERR, which is > then reset right after in MatMPIAIJSetPreallocation___MPIAIJ(). The > CUSP-implementation does not reset the flag and thus leads to > errors, even though the implementation of the CUSP-Preallocation > routine looks perfectly fine. I rather want to fix the hack in > MatMPIAIJSetPreallocation___MPIAIJ() than to propagate it to the GPU > implementations as well. > > > I thought perhaps you were going to leave it to the _SeqAIJ parts to > store the nonew state. If the _MPIAIJ part resolves PETSC_DECIDE > independently from the _SeqAIJ part, then they have to *always* coincide > exactly, otherwise we'll have very confusing bugs. You can likely > forward PETSC_DECIDE down to the _SeqAIJ parts, then query a->nonew and > b->nonew to set the mpiaij->nonew flag. (I'm not sure it's consistent > now, but we want it to be consistent.) There is no mpiaij->nonew flag in Mat_MPIAIJ, this is all handled in the two diagonal/off-processor matrices of type Mat_SeqAIJ. I don't want to change this ;-) I just want to resolve the way way a->nonew and b->nonew is set, so that it does not require a hack in MatMPIAIJSetPreallocation_MPIAIJ(). A closer look showed three options: a) just forward PETSC_DECIDE to MatSeqAIJSetPreallocation_SeqAIJ(), don't modify the nonew-flags from _MPIAIJ(). This almost fixes the problem, the only drawback is that the number of nonzeros on the off-processor blocks is then also 5 instead of 2 when using PETSC_DECIDE. b) Allow to pass default values to MatSeqAIJSetPreallocation_SeqAIJ(). The cause of the issue is that MatSeqAIJSetPreallocation_SeqAIJ() does two things: It interprets PETSC_DECIDE and it does the preallocation. If we split these two tasks into two functions, i.e. - MatSeqAIJSetPreallocation_SeqAIJ() for interpreting PETSC_DECIDE and dealing with nonew, then calling - MatSeqAIJSetPreallocation_SeqAIJAlloc() for the actual preallocation we can directly call MatSeqAIJSetPreallocation_SeqAIJAlloc() from the _MPIAIJ() part and cleanly set MAT_NEW_NONZERO_ALLOCATION_ERR for the on- and off-processor matrices. c) Improve the existing hack by querying the state of a->nonew and b->nonew before calling MatSeqAIJSetPreallocation_SeqAIJ() and after that reset a user-defined state if necessary. I'll create a pull request for b), as it is consistent with the current behavior and appears to be the cleanest solution to this. Best regards, Karli
