Hi,

OK, I will test 5.1.3 with the option you gave me (--download-superlu_dist-commit=v5.1.3).
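For reference, this is roughly how that option would fit into a full configure call. It is a sketch, not a verified recipe. It assumes the petsc-3.7 configure can fetch SuperLU_DIST from its git repository (the "git externalpackage repos" caveat in the reply below). The compiler, MPI and MKL options from the existing build still need to be added.

```sh
#!/bin/sh
# Sketch: ask PETSc configure to download SuperLU_DIST and check out the
# v5.1.3 tag instead of the version petsc-3.7 installs by default (5.1.0).
# Add the usual PETSC_ARCH, compiler, MPI and MKL options to this call.
./configure \
  --download-superlu_dist \
  --download-superlu_dist-commit=v5.1.3
```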

But from what you and Matthew said, I should get 5.1.3 with petsc-master, yet last night's configure log shows the library file name as 5.1.0:

http://www.giref.ulaval.ca/~cmpgiref/petsc-master-debug/2016.12.31.02h00m01s_configure.log

So I am a bit confused: why did I get 5.1.0 last night? (I use the petsc-master tarball; is that the reason?)
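One way to see which SuperLU_DIST version a build actually picked up is to extract the version string from the library file names recorded in configure.log. The shell function below is a minimal sketch that rests on one assumption: the version appears as an X.Y.Z suffix on a libsuperlu_dist file name, as with the 5.1.0 name in last night's log. The name `superlu_dist_version` is hypothetical and is not part of PETSc.

```sh
#!/bin/sh
# Hypothetical helper (not part of PETSc): print each distinct X.Y.Z version
# found in a libsuperlu_dist* file name inside the given configure.log.
# Assumes the version is encoded in the file name, e.g. libsuperlu_dist.so.5.1.0.
superlu_dist_version() {
  # 1) keep only the tokens that start with libsuperlu_dist
  # 2) extract any X.Y.Z version number from those tokens
  # 3) print each version once, sorted
  grep -o 'libsuperlu_dist[^[:space:]]*' "$1" |
    grep -o '[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*' |
    sort -u
}
```

Usage: `superlu_dist_version configure.log`. An empty result only means that no versioned file name was found. For example, a static `libsuperlu_dist.a` carries no version in its name.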

Thanks,

Eric


On 2016-12-31 at 11:52, Satish Balay wrote:
On Sat, 31 Dec 2016, Eric Chamberland wrote:

Hi,

I have just started debugging a problem that occurs only with SuperLU_DIST
combined with MKL, in a 2-process validation test.

(The same test works fine with MUMPS on 2 processes.)

I just noticed that the SuperLU_DIST version installed by the PETSc configure
script is 5.1.0, while the latest SuperLU_DIST is 5.1.3.
If you use petsc-master, it will install 5.1.3 by default.
Before going further, I just want to ask:

Is there any specific reason to stick to 5.1.0?
We don't usually upgrade external package versions in PETSc releases
[unless the new version is tested to work and fixes known bugs]. There could
be API changes or build changes that potentially conflict.

From what I know, 5.1.3 should work with petsc-3.7 [it fixes a couple of
bugs].

You might be able to do the following with petsc-3.7 [with git externalpackage
repos]:

--download-superlu_dist --download-superlu_dist-commit=v5.1.3

Satish

Here is some more information:

On process 2, this is printed to stdout:

Intel MKL ERROR: Parameter 6 was incorrect on entry to DTRSM .

and in stderr:

Test.ProblemeEFGen.opt: malloc.c:2369: sysmalloc: Assertion `(old_top ==
(((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof
(struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >=
(unsigned long)((((__builtin_offsetof (struct malloc_chunk,
fd_nextsize))+((2 *(sizeof(size_t))) - 1)) & ~((2 *(sizeof(size_t))) - 1))) &&
((old_top)->size & 0x1) && ((unsigned long) old_end & pagemask) == 0)' failed.
[saruman:15771] *** Process received signal ***

This is the 7th call to KSPSolve in the same execution. Here is the last
KSPView:

KSP Object:(o_slin) 2 MPI processes
   type: preonly
   maximum iterations=10000, initial guess is zero
   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
   left preconditioning
   using NONE norm type for convergence test
PC Object:(o_slin) 2 MPI processes
   type: lu
     LU: out-of-place factorization
     tolerance for zero pivot 2.22045e-14
     matrix ordering: natural
     factor fill ratio given 0., needed 0.
       Factored matrix follows:
         Mat Object:         2 MPI processes
           type: mpiaij
           rows=382, cols=382
           package used to perform factorization: superlu_dist
           total: nonzeros=0, allocated nonzeros=0
           total number of mallocs used during MatSetValues calls =0
             SuperLU_DIST run parameters:
               Process grid nprow 2 x npcol 1
               Equilibrate matrix TRUE
               Matrix input mode 1
               Replace tiny pivots FALSE
               Use iterative refinement FALSE
               Processors in row 2 col partition 1
               Row permutation LargeDiag
               Column permutation METIS_AT_PLUS_A
               Parallel symbolic factorization FALSE
               Repeated factorization SamePattern
   linear system matrix = precond matrix:
   Mat Object:  (o_slin)   2 MPI processes
     type: mpiaij
     rows=382, cols=382
     total: nonzeros=4458, allocated nonzeros=4458
     total number of mallocs used during MatSetValues calls =0
       using I-node (on process 0) routines: found 109 nodes, limit used is 5

I know this information is not enough to debug the problem, but I would like
to know whether the PETSc developers plan to upgrade to 5.1.3 before I try to
debug anything.

Thanks,
Eric