On 10/10/2016 07:11 PM, Satish Balay wrote:
That's from petsc-3.5
Anton - please post the stack trace you get with
--download-superlu_dist-commit=origin/maint
I guess this is it:
[0]PETSC ERROR: [0] SuperLU_DIST:pdgssvx line 421
/home/anton/LIB/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
[0]PETSC ERROR: [0] MatLUFactorNumeric_SuperLU_DIST line 282
/home/anton/LIB/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
[0]PETSC ERROR: [0] MatLUFactorNumeric line 2985
/home/anton/LIB/petsc/src/mat/interface/matrix.c
[0]PETSC ERROR: [0] PCSetUp_LU line 101
/home/anton/LIB/petsc/src/ksp/pc/impls/factor/lu/lu.c
[0]PETSC ERROR: [0] PCSetUp line 930
/home/anton/LIB/petsc/src/ksp/pc/interface/precon.c
According to the line numbers it crashes within
MatLUFactorNumeric_SuperLU_DIST while calling pdgssvx.
Surprisingly, this only happens on the second SNES iteration, not on
the first.
I'm trying to reproduce this behavior with the PETSc KSP and SNES examples.
However, everything I've tried so far with SuperLU_DIST works just fine.
I'm also checking our code in Valgrind to make sure it's clean.
Anton
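For anyone trying the same reproduction, the two checks mentioned above (running a PETSc example with SuperLU_DIST as the LU solver, then repeating the run under Valgrind) might look roughly like the sketch below. It is only a sketch under assumptions: EXE and NP are placeholders, ex5 is just one possible SNES tutorial, the option names are the PETSc 3.7-era spellings, and the Valgrind flags follow the general suggestion in the PETSc FAQ rather than anything stated in this thread. By default the script only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch: prints each command; set RUN=1 to actually execute it.
# EXE and NP are placeholders (assumptions), not taken from the thread.
EXE="${EXE:-./ex5}"   # e.g. a SNES tutorial built in your PETSc tree
NP="${NP:-2}"         # number of MPI ranks

# PETSc 3.7-era option names; later releases renamed
# -pc_factor_mat_solver_package to -pc_factor_mat_solver_type.
SOLVER_OPTS="-ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist -snes_monitor"

# Valgrind flags along the lines of the PETSc FAQ suggestion (assumption).
VALGRIND="valgrind -q --tool=memcheck --num-callers=20 --log-file=valgrind.log.%p"

run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# SOLVER_OPTS and VALGRIND are left unquoted on purpose so they split
# into separate arguments.

# 1) Plain run with SuperLU_DIST as the direct solver.
run mpiexec -n "$NP" "$EXE" $SOLVER_OPTS

# 2) Same run under Valgrind, with PETSc's own malloc wrapper turned off.
run mpiexec -n "$NP" $VALGRIND "$EXE" $SOLVER_OPTS -malloc off
```

Running it with RUN=1 EXE=./your_app NP=4 would execute the commands instead of printing them; each rank then writes its own valgrind.log.<pid> file.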
Satish
On Mon, 10 Oct 2016, Xiaoye S. Li wrote:
Which version of superlu_dist does this capture? I looked at the original
error log; it pointed to pdgssvx: line 161. But that line is in a comment
block, not in the program.
Sherry
On Mon, Oct 10, 2016 at 7:27 AM, Anton Popov <[email protected]> wrote:
On 10/07/2016 05:23 PM, Satish Balay wrote:
On Fri, 7 Oct 2016, Kong, Fande wrote:
On Fri, Oct 7, 2016 at 9:04 AM, Satish Balay <[email protected]> wrote:
On Fri, 7 Oct 2016, Anton Popov wrote:
Hi guys,
is there any news about fixing the buggy behavior of SuperLU_DIST, exactly as
described here:
https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.mcs.anl.gov_pipermail_petsc-2Dusers_2015-2DAugust_026802.html&d=CwIBAg&c=54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r=DUUt3SRGI0_JgtNaS3udV68GRkgV4ts7XKfj2opmiCY&m=RwruX6ckX0t9H89Z6LXKBfJBOAM2vG1sQHw2tIsSQtA&s=bbB62oGLm582JebVs8xsUej_OX0eUwibAKsRRWKafos&e= ?
I'm using 3.7.4 and still get a SEGV in the pdgssvx routine. Everything works
fine with 3.5.4.
Do I still have to stick to the maint branch, and what are the chances of
these fixes being included in 3.7.5?
3.7.4 is off the maint branch [as of a week ago]. So if you are seeing
issues with it - it's best to debug and figure out the cause.
This bug is indeed inside superlu_dist, and we started having this issue
with PETSc-3.6.x. I think the superlu_dist developers should have fixed this
bug. Did we forget to update superlu_dist? This is not something users could
debug and fix.
I have many people at INL suffering from this issue, and they have to stay
with PETSc-3.5.4 to use superlu_dist.
To verify whether the bug is fixed in the latest superlu_dist - you can try
[assuming you have git - either from petsc-3.7/maint/master]:
--download-superlu_dist --download-superlu_dist-commit=origin/maint
Satish
Hi Satish,
I did this:
git clone -b maint https://bitbucket.org/petsc/petsc.git petsc
--download-superlu_dist
--download-superlu_dist-commit=origin/maint (not sure this is needed,
since I'm already in maint)
The problem is still there.
Cheers,
Anton
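The steps described in this message (clone the maint branch, then configure with the two superlu_dist options Satish suggested) could be written out as the sketch below. The clone URL and the two --download options are the ones quoted in the thread; PETSC_SRC is a hypothetical checkout directory, and any other configure options a real build needs (compilers, MPI, BLAS/LAPACK and so on) are deliberately left out. By default the script only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch: prints each command; set RUN=1 to actually execute it.
# PETSC_SRC is a hypothetical checkout directory, not from the thread.
PETSC_SRC="${PETSC_SRC:-petsc}"

# The two options quoted in the thread. Anything else your build needs
# must be added by you.
CONFIGURE_OPTS="--download-superlu_dist --download-superlu_dist-commit=origin/maint"

run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Clone the maint branch, as in the message above.
run git clone -b maint https://bitbucket.org/petsc/petsc.git "$PETSC_SRC"

# Only change directory when really executing; in dry-run mode the
# checkout does not exist.
if [ "${RUN:-0}" = "1" ]; then
    cd "$PETSC_SRC" || exit 1
fi

# CONFIGURE_OPTS is left unquoted on purpose so it splits into two arguments.
run ./configure $CONFIGURE_OPTS
```

The explicit commit option asks configure to check out origin/maint of superlu_dist rather than the commit that PETSc pins by default, which is the point of Satish's suggestion.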