Could also be a bug in the new hash table stuff for options?
> ==3965== valgrind: Unrecognised instruction at address 0x20104a53.
> ==3965== at 0x20104A53: kh_resize_HTPrinted (viewreg.c:12)
> ==3965== by 0x20104EE2: kh_put_HTPrinted (viewreg.c:12)
> ==3965== by 0x201061EF: PetscOptionsHelpPrintedCheck (viewreg.c:89)

Barry

> On Jul 3, 2018, at 1:10 PM, Mills, Richard Tran wrote:
>
> Mark, were you trying this in Valgrind with a binary targeting KNL, i.e.,
> built to use AVX-512 instructions? I don't think Valgrind implements all (or
> any?) of those, so a failure is not a surprise. Indeed, I've had Valgrind
> choke on some AVX2 instructions, though maybe the most recent versions of
> Valgrind will handle these properly now.
>
> --Richard
>
> On Tue, Jul 3, 2018 at 4:25 AM, Mark Adams wrote:
> Well this does work without valgrind.
>
> On Tue, Jul 3, 2018 at 6:36 AM Mark Adams wrote:
> I built a 32 bit integer version and now it dies in PetscInit. Ugh.
>
> ==3965== Conditional jump or move depends on uninitialised value(s)
> ==3965== at 0x27AFFD8F: _int_free (malloc.c:3945)
> ==3965== by 0x20074F48: PetscOptionsSetValue (options.c:1152)
> ==3965== by 0x20070450: PetscOptionsInsertArgs_Private (options.c:636)
> ==3965== by 0x2007189F: PetscOptionsInsert (options.c:746)
> ==3965== by 0x20093DB1: PetscInitialize (pinit.c:929)
> ==3965== by 0x2000ABCE: main (ex19.c:106)
> ==3965==
> vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0x7F 0x8 0x7B 0xC2 0xC5
> 0xFB 0x10 0xD
> vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
> vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
> vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
> ==3965== valgrind: Unrecognised instruction at address 0x20104a53.
> ==3965== at 0x20104A53: kh_resize_HTPrinted (viewreg.c:12)
> ==3965== by 0x20104EE2: kh_put_HTPrinted (viewreg.c:12)
> ==3965== by 0x201061EF: PetscOptionsHelpPrintedCheck (viewreg.c:89)
> ==3965== by 0x2009AD61: PetscOptionsBegin_Private (aoptions.c:34)
> ==3965== by 0x2008362E: PetscOptionsSetFromOptions (options.c:2646)
> ==3965== by 0x20094689: PetscInitialize (pinit.c:967)
> ==3965== by 0x2000ABCE: main (ex19.c:106)
> ==3965== Your program just tried to execute an instruction that Valgrind
> ==3965== did not recognise. There are two possible reasons for this.
> ==3965== 1. Your program has a bug and erroneously jumped to a non-code
> ==3965== location. If you are running Memcheck and you just saw a
> ==3965== warning about a bad jump, it's probably your program's fault.
> ==3965== 2. The instruction is legitimate but Valgrind doesn't handle it,
> ==3965== i.e. it's Valgrind's fault. If you think this is the case or
> ==3965== you are not sure, please let us know and we'll try to fix it.
> ==3965== Either way, Valgrind will now raise a SIGILL signal which will
> ==3965== probably kill your program.
> [0]PETSC ERROR: ==3965== Conditional jump or move depends on uninitialised
> value(s)
> ==3965== at 0x27B18B48: strchrnul (strchr.S:106)
> ==3965== by 0x27AE30C8: __find_specmb (printf-parse.h:108)
> ==3965== by 0x27AE30C8: vfprintf (vfprintf.c:1311)
> ==3965== by 0x27AFA045: vsnprintf (vsnprintf.c:119)
> ==3965== by 0x20116802: PetscVSNPrintf (mprint.c:178)
> ==3965== by 0x20117006: PetscVFPrintfDefault (mprint.c:294)
> ==3965== by 0x20136BB2: PetscErrorPrintfDefault (errtrace.c:116)
> ==3965== by 0x2013938C: PetscSignalHandlerDefault (signal.c:131)
> ==3965== by 0x2013903B: PetscSignalHandler_Private (signal.c:43)
> ==3965== by 0x2335D07F: ??? (in
> /global/u2/m/madams/petsc_install/petsc/src/snes/examples/tutorials/ex19)
> ==3965== by 0x20104A52: kh_resize_HTPrinted (viewreg.c:12)
> ==3965== by 0x20104EE2: kh_put_HTPrinted (viewreg.c:12)
> ==3965== by 0x201061EF: PetscOptionsHelpPrintedCheck (viewreg.c:89)
> ==3965==
> ==3965== Conditional jump or move depends on uninitialised value(s)
> ==3965== at 0x27B08184: strlen (strlen.S:210)
> ==3965== by 0x200B3A48: PetscStrlen (str.c:158)
> ==3965== by 0x2011693D: PetscVSNPrintf (mprint.c:188)
> ==3965== by 0x20117006: PetscVFPrintfDefault (mprint.c:294)
> ==3965== by 0x20136BB2: PetscErrorPrintfDefault (errtrace.c:116)
> ==3965== by 0x2013938C: PetscSignalHandlerDefault (signal.c:131)
> ==3965== by 0x2013903B: PetscSignalHandler_Private (signal.c:43)
> ==3965== by 0x2335D07F: ??? (in
> /global/u2/m/madams/petsc_install/petsc/src/snes/examples/tutorials/ex19)
> ==3965== by 0x20104A52: kh_resize_HTPrinted (viewreg.c:12)
> ==3965== by 0x20104EE2: kh_put_HTPrinted (viewreg.c:12)
> ==3965== by 0x201061EF: PetscOptionsHelpPrintedCheck (viewreg.c:89)
> ==3965== by 0x2009AD61: PetscOptionsBegin_Private (aoptions.c:34)
> ==3965==
> ------------------------------------------------------------------------
> [0]PETSC ERROR: Caught signal number 4 Illegal instruction: Likely due to
> memory corruption
>
> On Mon, Jul 2, 2018 at 8:53 PM Mark Adams wrote:
> Looping Treb and Baky back into the thread, and dropping PETSc.
>
> Great, thanks Barry for figuring this out.
>
> Treb and Baky need 64 bit indices, but in the meantime I can build a 32 bit
> version to let them test.
>
> I am all set up to test a 64 bit version. If you can give me a branch I can
> test.
>
> Thanks,
> Mark
>
> On Mon, Jul 2, 2018 at 7:47 PM Smith, Barry F. wrote:
>
> Progress has been made.
> These libraries do contain mkl_set_num_threads(), so ./configure now knows
> that they are MKL libraries (unlike before, when it did not recognize them
> as MKL libraries).
>
> Damn it, here is why:
>
> --with-64-bit-indices=1
>
> Currently the MKL sparse stuff only works with 32 bit integers AND 32 bit
> integer BLAS/LAPACK.
>
> It will never work with 64 bit indices and 32 bit integer BLAS/LAPACK, but
> maybe it could be upgraded to work with 64 bit indices and 64 bit integer
> BLAS/LAPACK, Richard?
>
> Anyways, the requirement in mkl_sparse.py is
>
> self.requires32bitint = 1
>
> The problem is that ./configure does not print enough information to make it
> immediately clear that it is rejecting the package because of this
> incompatibility.
>
> Barry
>
> > On Jul 2, 2018, at 5:57 PM, Mark Adams wrote:
> >
> > Same error:
> >
> > 15:53 nid02517 master *= ~/petsc_install/petsc/src/snes/examples/tutorials$
> > make
> > PETSC_DIR=/global/homes/m/madams/petsc_install/petsc-cori-knl-dbg64-intel-omp
> > PETSC_ARCH="" ex19
> > cc -o ex19.o -c -g -O0 -mkl -static-intel -fopenmp
> > -I/global/homes/m/madams/petsc_install/petsc-cori-knl-dbg64-intel-omp/include
> >
> > -I/global/homes/m/madams/petsc_install/petsc-cori-knl-dbg64-intel-omp/include
> > -I/global/homes/m/madams/tmp/hypre-2.14.0/include
> > -I/opt/intel/compilers_and_libraries_2018.1.163/linux/mkl/include
> > -I/global/homes/m/madams/petsc_install/petsc-cori-knl-dbg64-intel-omp/include
> > `pwd`/ex19.c
> >
> > cc -g -O0 -mkl -static-intel -fopenmp -o ex19 ex19.o
> > -L/global/homes/m/madams/petsc_install/petsc-cori-knl-dbg64-intel-omp/lib
> > -Wl,-rpath,/global/homes/m/madams/tmp/hypre-2.14.0/lib
> > -L/global/homes/m/madams/tmp/hypre-2.14.0/lib
> > -Wl,-rpath,/global/homes/m/madams/petsc_install/petsc-cori-knl-dbg64-intel-omp/lib
> > -L/global/homes/m/madams/petsc_install/petsc-cori-knl-dbg64-intel-omp/lib
> > -L/opt/intel/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64
> > -lpetsc -lHYPRE -lparmetis -lmetis -lstdc++ -ldl -lmkl_intel_ilp64
> > -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
> > /opt/intel/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64/libmkl_core.a(mkl_semaphore.o):
> > In function `mkl_serv_load_inspector':
> >
> > mkl_semaphore.c:(.text+0x123): warning: Using 'dlopen' in statically linked
> > applications requires at runtime the shared libraries from the glibc
> > version used for linking
> > /global/homes/m/madams/petsc_install/petsc-cori-knl-dbg64-intel-omp/lib/libpetsc.a(send.o):
> > In function `PetscOpenSocket':
> >
> > /global/u2/m/madams/petsc_install/petsc/src/sys/classes/viewer/impls/socket/send.c:108:
> > warning: Using 'gethostbyname' in statically linked applications requires
> > at runtime the shared libraries from the glibc version used for linking
> >
> > rm ex19.o
> >
> > 15:54 nid02517 master *= ~/petsc_install/petsc/src/snes/examples/tutorials$
> > make
> > PETSC_DIR=/global/homes/m/madams/petsc_install/petsc-cori-knl-dbg64-intel-omp
> > PETSC_ARCH="" runex19_gamg
> > lid velocity = 0.0625, prandtl # = 1., grashof # = 1.
> >
> > [0]PETSC ERROR: --------------------- Error Message
> > --------------------------------------------------------------
> >
> > [1]PETSC ERROR: --------------------- Error Message
> > --------------------------------------------------------------
> >
> > [1]PETSC ERROR: Unknown type. Check for miss-spelling or missing package:
> > http://www.mcs.anl.gov/petsc/documentation/installation.html#external
> >
> > [1]PETSC ERROR: Unknown Mat type given: aijmkl
> >
> > On Mon, Jul 2, 2018 at 6:43 PM Satish Balay wrote:
> > Hm - I suspect it's an issue with mkl includes - not the libraries.
> >
> > Satish
> >
> > On Mon, 2 Jul 2018, Smith, Barry F. wrote:
> >
> > >
> > > Mark,
> > >
> > > This is not useful. We need the new configure log from when you list
> > > all the libraries NERSC recommends (which may work).
> > >
> > > I already said what was wrong with this configuration.
> > >
> > > Barry
> > >
> > > > On Jul 2, 2018, at 5:25 PM, Mark Adams wrote:
> > > >
> > > > On Mon, Jul 2, 2018 at 6:22 PM Satish Balay wrote:
> > > > I don't understand the problem here..
> > > >
> > > > > > > [0]PETSC ERROR: Unknown type. Check for miss-spelling or missing
> > > > > > > package:
> > > > > > > http://www.mcs.anl.gov/petsc/documentation/installation.html#external
> > > > > > > [0]PETSC ERROR: Unknown Mat type given: aijmkl
> > > >
> > > > If this is the problem - then we'll have to look at configure.log to
> > > > check why PETSC_HAVE_MKL_SPARSE flag is not set.
> > > >
> > > > <configure.log>
> > >
> > > <configure.log>
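
For reference, the integer-width rule Barry describes in this thread (MKL sparse needs 32 bit indices AND 32 bit integer BLAS/LAPACK) can be written as a small check that also states why a package is rejected. This is only a sketch of the idea, not PETSc's configure code: the function name, its parameters, and the message wording are hypothetical. Only the rule itself and the --with-64-bit-indices=1 flag come from the messages above.

```python
# Hypothetical sketch: none of these names come from PETSc's configure system.
# Only the rule (32 bit indices AND 32 bit integer BLAS/LAPACK) and the
# --with-64-bit-indices=1 flag are taken from the thread.

def check_mkl_sparse(index_bits, blas_int_bits, requires_32bit_int=True):
    """Return (accepted, reason) for a requested MKL sparse build."""
    if index_bits not in (32, 64) or blas_int_bits not in (32, 64):
        raise ValueError("integer widths must be 32 or 64")
    if not requires_32bit_int:
        return True, "package places no restriction on integer width"

    # Collect every violated requirement so the message names all of them.
    problems = []
    if index_bits != 32:
        problems.append(
            "PETSc indices are %d bit (--with-64-bit-indices=1)" % index_bits)
    if blas_int_bits != 32:
        problems.append("BLAS/LAPACK integers are %d bit" % blas_int_bits)

    if problems:
        return False, ("rejecting mkl_sparse: requires 32 bit integers, but "
                       + " and ".join(problems))
    return True, "mkl_sparse accepted: 32 bit indices and 32 bit integer BLAS/LAPACK"


if __name__ == "__main__":
    # Try the three combinations discussed in the thread.
    for index_bits, blas_bits in [(32, 32), (64, 32), (64, 64)]:
        accepted, reason = check_mkl_sparse(index_bits, blas_bits)
        print(index_bits, blas_bits, accepted, reason)
```

Printing the reason next to the accept/reject decision is the point here: with a message like this in the configure output, the "Unknown Mat type given: aijmkl" error could be traced back to the index width without reading configure.log.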
