-start_in_debugger noxterm -debugger_nodes 3

Use -start_in_debugger noxterm -debugger_nodes 0

When not opening a window for each debugger, it is best to have the first
rank, which is associated with the tty, as the debugger node.

> On Feb 23, 2021, at 3:46 PM, Francesco Brarda <[email protected]> wrote:
>
> Using the command you suggested I got
>
> fbrarda@srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger noxterm -debugger_nodes 3
> ** PETSc DEPRECATION WARNING ** : the option -debugger_nodes is deprecated as of version 3.14 and will be removed in a future release. Please use the option -debugger_ranks instead. (Silence this warning with -options_suppress_deprecated_warnings)
> method = optimize
> optimize
> algorithm = lbfgs (Default)
> lbfgs
> method = optimize
> optimize
> algorithm = lbfgs (Default)
> lbfgs
> init_alpha = 0.001 (Default)
> tol_obj = 9.9999999999999998e-13 (Default)
> init_alpha = 0.001 (Default)
> tol_obj = 9.9999999999999998e-13 (Default)
> tol_rel_obj = 10000 (Default)
> tol_grad = 1e-08 (Default) tol_rel_obj = 10000 (Default)
> tol_grad = 1e-08 (Default)
> tol_rel_grad = 10000000 (Default)
>
> tol_rel_grad = 10000000 (Default)
> tol_param = 1e-08 (Default) tol_param = 1e-08 (Default)
> history_size = 5 (Default)
> iter = 2000 (Default)
>
> history_size = 5 (Default)
> iter = 2000 (Default)
> save_iterations = 0 (Default)
> id = 0 (Default)
> data
> save_iterations = 0 (Default)
> id = 0 (Default)
> data
> file = (Default)
> file = (Default)
> init = 2 (Default)
> random
> seed = 3623621468 (Default)
> output
> file = output.csv (Default)init = 2 (Default)
> random
> seed = 3623621468 (Default)
> output
> file = output.csv (Default)
>
> diagnostic_file = (Default)
> refresh = 100 (Default)
>
> diagnostic_file = (Default)
> refresh = 100 (Default)
>
> Initial log joint probability = -195.984
>     Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
>       10      -0.97101    0.00292919       1.65855       0.001       0.001       46  LS failed, Hessian reset
>       12     -0.483952      0.001316       1.18542       0.001       0.001       77  LS failed, Hessian reset
>       13     -0.477916     0.0118542      0.163518        0.01       0.001      106  LS failed, Hessian reset
> [1]PETSC ERROR: #1 main() line 12 in src/cmdstan/main.cpp
> [1]PETSC ERROR: PETSc Option Table entries:
> [1]PETSC ERROR: -debugger_nodes 3
> [1]PETSC ERROR: -start_in_debugger noxterm
> [1]PETSC ERROR: ----------------End of Error Message -------send entire error message to [email protected]
>
> And then it does not go any further. With the suggested -debugger_ranks
> option, the output is the same. What do you think, please?
> I am using a cluster (one node, dual-socket system with twelve-core CPUs),
> but when I connect via ssh I do not use the -X flag, if that's what you mean.
>
> Thank you,
> Francesco
>
>
>> On 23 Feb 2021, at 21:59, Matthew Knepley <[email protected]> wrote:
>>
>> On Tue, Feb 23, 2021 at 3:55 PM Francesco Brarda <[email protected]> wrote:
>> Thank you for the quick response.
>> Sorry, you are right. Here is the complete output:
>>
>> fbrarda@srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger
>> PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47803 on display :0.0 on machine srvulx13
>> PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47804 on display :0.0 on machine srvulx13
>> xterm: Xt error: Can't open display: :0.0
>> xterm: DISPLAY is not set
>> xterm: Xt error: Can't open display: :0.0
>> xterm: DISPLAY is not set
>>
>> Do you have an X server running? If not, you can use
>>
>> -start_in_debugger noxterm -debugger_nodes 3
>>
>> and try to get a stack trace from one node.
>>
>> Thanks,
>>
>> Matt
>>
>> method = optimize
>> optimize
>> algorithm = lbfgs (Default)
>> lbfgs
>> method = optimize
>> optimize
>> algorithm = lbfgs (Default)
>> lbfgs
>> init_alpha = 0.001 (Default)
>> tol_obj = 9.9999999999999998e-13 (Default)
>> tol_rel_obj = 10000 (Default)
>> tol_grad = 1e-08 (Default)
>> init_alpha = 0.001 (Default)
>> tol_obj = 9.9999999999999998e-13 (Default)
>> tol_rel_obj = 10000 (Default)
>> tol_grad = 1e-08 (Default)
>> tol_rel_grad = 10000000 (Default)
>> tol_param = 1e-08 (Default)
>> history_size = 5 (Default)
>> tol_rel_grad = 10000000 (Default)
>> tol_param = 1e-08 (Default)
>> history_size = 5 (Default)
>> iter = 2000 (Default)
>> iter = 2000 (Default)
>> save_iterations = 0 (Default)
>> id = 0 (Default)
>> data save_iterations = 0 (Default)
>> id = 0 (Default)
>> data
>> file = (Default)
>>
>> file = (Default)
>> init = 2 (Default)
>> random
>> seed = 3585768430 (Default)
>> init = 2 (Default)
>> random
>> seed = 3585768430 (Default)
>> output
>> file = output.csv (Default)
>> output
>> file = output.csv (Default)
>> diagnostic_file = (Default)
>> refresh = 100 (Default)
>> diagnostic_file = (Default)
>> refresh = 100 (Default)
>>
>> Initial log joint probability = -731.444
>>     Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
>> [1]PETSC ERROR: PetscAbortErrorHandler: main() line 12 in src/cmdstan/main.cpp
>> To prevent termination, change the error handler using PetscPushErrorHandler()
>>
>> ===================================================================================
>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> = PID 47804 RUNNING AT srvulx13
>> = EXIT CODE: 134
>> = CLEANING UP REMAINING PROCESSES
>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> ===================================================================================
>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
>> This typically refers to a problem with your application.
>> Please see the FAQ page for debugging suggestions
>>
>> The code inside main.cpp is the following:
>>
>> #include <cmdstan/command.hpp>
>> #include <stan/services/error_codes.hpp>
>>
>> #include <petsc.h>
>>
>> int main(int argc, char* argv[]) {
>>
>>   PetscErrorCode ierr;
>>   ierr = PetscInitialize(&argc, &argv, 0, 0);CHKERRQ(ierr);
>>
>>   try {
>>     ierr = cmdstan::command(argc, argv);CHKERRQ(ierr);
>>   } catch (const std::exception& e) {
>>     std::cout << e.what() << std::endl;
>>     ierr = stan::services::error_codes::SOFTWARE;CHKERRQ(ierr);
>>   }
>>
>>   ierr = PetscFinalize();CHKERRQ(ierr);
>>   return ierr;
>> }
>>
>> I highlighted line 12 (the call to cmdstan::command). Although I read the
>> page where PetscPushErrorHandler is explained and the example provided
>> (src/ksp/ksp/tutorials/ex27.c), I do not understand how I should actually
>> use it.
>> Should I replace the entire try/catch with
>> PetscPushErrorHandler(PetscIgnoreErrorHandler,NULL); ?
>>
>> Best,
>> Francesco
>>
>>
>>> On 23 Feb 2021, at 11:54, Matthew Knepley <[email protected]> wrote:
>>>
>>> On Tue, Feb 23, 2021 at 3:54 AM Francesco Brarda <[email protected]> wrote:
>>> Hi!
>>>
>>> I am very new to the PETSc world. I am working with a GitHub repo that uses
>>> PETSc together with Stan (open-source statistics software); here you can
>>> find the discussion.
>>> A functor has been defined to convert EigenVector to PetscVec and
>>> vice versa, both sequentially and in parallel.
>>> The file using these functions does the conversions with the sequential
>>> setting. I switched to the MPI versions, that is, from
>>> EigenVectorToPetscVecSeq to EigenVectorToPetscVecMPI and so on, because I
>>> want to evaluate the scaling.
>>> Running the example with mpirun -n 5 examples/rosenbrock/rosenbrock
>>> optimize in debug mode, I get the error Caught signal number 11 SEGV.
>>> I therefore used the option -start_in_debugger and I get the following:
>>>
>>> For some reason, the -start_in_debugger option is not being seen. Are you
>>> showing all the output? Once the debugger is attached,
>>> you run the program (cont) and then when you hit the SEGV you get a stack
>>> trace (where).
>>>
>>> Thanks,
>>>
>>> Matt
>>>
>>> [2]PETSC ERROR: ------------------------------------------------------------------------
>>> [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>> [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>> [2]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>> [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>> [2]PETSC ERROR: likely location of problem given in stack below
>>> [2]PETSC ERROR: --------------------- Stack Frames ------------------------------------
>>> [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>>> [2]PETSC ERROR: INSTEAD the line number of the start of the function
>>> [2]PETSC ERROR: is given.
>>> [3]PETSC ERROR: ------------------------------------------------------------------------
>>> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>> [3]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>> [3]PETSC ERROR: likely location of problem given in stack below
>>> [3]PETSC ERROR: --------------------- Stack Frames ------------------------------------
>>> [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>>> [3]PETSC ERROR: INSTEAD the line number of the start of the function
>>> [3]PETSC ERROR: is given.
>>> [3]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in unknown file (null)
>>> To prevent termination, change the error handler using PetscPushErrorHandler()
>>> [2]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in unknown file (null)
>>> To prevent termination, change the error handler using PetscPushErrorHandler()
>>>
>>> ===================================================================================
>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>> = PID 22939 RUNNING AT srvulx13
>>> = EXIT CODE: 134
>>> = CLEANING UP REMAINING PROCESSES
>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>> ===================================================================================
>>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
>>> This typically refers to a problem with your application.
>>> Please see the FAQ page for debugging suggestions
>>>
>>> I read the documentation regarding PetscAbortErrorHandler, but I do not
>>> know where I should use it. How can I solve the problem?
>>> I hope I have been clear enough.
>>> Attached you can also find my configure.log and make.log files.
>>>
>>> Best,
>>> Francesco
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>
>>
>> --
>> What most experimenters take for granted before they begin their experiments
>> is infinitely more interesting than any results to which their experiments
>> lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
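
For anyone hitting the same xterm errors: the advice in this thread can be wrapped in a small shell helper that picks the debugger options depending on whether an X display is available. This is only a sketch under stated assumptions. The function name petsc_debug_opts is hypothetical, and testing $DISPLAY is a heuristic for "no X server reachable". The flags themselves are the ones quoted in the thread, with -debugger_ranks used in place of -debugger_nodes as the deprecation warning recommends.

```shell
#!/bin/sh
# Hypothetical helper: print PETSc debugger options for the current session.
# Assumption: an empty or unset DISPLAY means no X server can be reached.
petsc_debug_opts() {
  if [ -n "${DISPLAY:-}" ]; then
    # A display is set: let PETSc open the debugger in xterm windows.
    printf '%s\n' "-start_in_debugger"
  else
    # No display: keep the debugger in the current terminal, on rank 0 only.
    printf '%s\n' "-start_in_debugger noxterm -debugger_ranks 0"
  fi
}

# Usage (program path and arguments are the ones from the thread):
#   mpirun -n 2 examples/rosenbrock/rosenbrock optimize $(petsc_debug_opts)
```

Once gdb is attached, cont resumes the program and where prints the stack trace after the SEGV, as described in the thread.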
