This run is with '-n 2', so the -debugger_nodes value should be either 0 or 1.

Satish
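
As a rough sketch of that rule (not a tested recipe): mpirun -n NP starts ranks 0 through NP-1, so the debugger rank has to fall in that range. The variable names NP, RANK and CMD below are only illustrative, and the snippet prints the command line rather than running it. The flags themselves are the ones that already appear in this thread, with -debugger_ranks used in place of the deprecated -debugger_nodes.

```shell
# mpirun -n NP starts ranks 0 .. NP-1, so only those are valid debugger ranks.
NP=2
RANK=1  # rank 1 is the one that printed "[1]PETSC ERROR" in the quoted output

if [ "$RANK" -lt 0 ] || [ "$RANK" -ge "$NP" ]; then
  echo "rank $RANK does not exist in a run with -n $NP" >&2
  exit 1
fi

# Dry run: build and print the command line instead of executing it.
# -debugger_ranks is the non-deprecated spelling of -debugger_nodes.
CMD="\$PETSC_DIR/\$PETSC_ARCH/bin/mpirun -n $NP examples/rosenbrock/rosenbrock optimize -start_in_debugger noxterm -debugger_ranks $RANK"
echo "$CMD"
```

Paste the printed line into the shell on the cluster to launch the run with gdb attached to that one rank.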

On Tue, 23 Feb 2021, Francesco Brarda wrote:

> Using the command you suggested I got
>
> fbrarda@srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2
> examples/rosenbrock/rosenbrock optimize -start_in_debugger noxterm
> -debugger_nodes 3
> ** PETSc DEPRECATION WARNING ** : the option -debugger_nodes is deprecated as
> of version 3.14 and will be removed in a future release. Please use the
> option -debugger_ranks instead. (Silence this warning with
> -options_suppress_deprecated_warnings)
> method = optimize
> optimize
> algorithm = lbfgs (Default)
> lbfgs
> method = optimize
> optimize
> algorithm = lbfgs (Default)
> lbfgs
> init_alpha = 0.001 (Default)
> tol_obj = 9.9999999999999998e-13 (Default)
> init_alpha = 0.001 (Default)
> tol_obj = 9.9999999999999998e-13 (Default)
> tol_rel_obj = 10000 (Default)
> tol_grad = 1e-08 (Default) tol_rel_obj = 10000 (Default)
> tol_grad = 1e-08 (Default)
> tol_rel_grad = 10000000 (Default)
>
> tol_rel_grad = 10000000 (Default)
> tol_param = 1e-08 (Default) tol_param = 1e-08 (Default)
> history_size = 5 (Default)
> iter = 2000 (Default)
>
> history_size = 5 (Default)
> iter = 2000 (Default)
> save_iterations = 0 (Default)
> id = 0 (Default)
> data
> save_iterations = 0 (Default)
> id = 0 (Default)
> data
> file = (Default)
> file = (Default)
> init = 2 (Default)
> random
> seed = 3623621468 (Default)
> output
> file = output.csv (Default)init = 2 (Default)
> random
> seed = 3623621468 (Default)
> output
> file = output.csv (Default)
>
> diagnostic_file = (Default)
> refresh = 100 (Default)
>
> diagnostic_file = (Default)
> refresh = 100 (Default)
>
> Initial log joint probability = -195.984
> Iter   log prob    ||dx||       ||grad||   alpha   alpha0   # evals   Notes
>   10   -0.97101    0.00292919   1.65855    0.001   0.001    46        LS failed, Hessian reset
>   12   -0.483952   0.001316     1.18542    0.001   0.001    77        LS failed, Hessian reset
>   13   -0.477916   0.0118542    0.163518   0.01    0.001    106       LS failed, Hessian reset
> [1]PETSC ERROR: #1 main() line 12 in src/cmdstan/main.cpp
> [1]PETSC ERROR: PETSc Option Table entries:
> [1]PETSC ERROR: -debugger_nodes 3
> [1]PETSC ERROR: -start_in_debugger noxterm
> [1]PETSC ERROR: ----------------End of Error Message -------send entire error
> message to [email protected]----------
>
> And then it does not go any further. With the suggested -debugger_ranks
> option, the output is the same. What do you think, please?
> I am using a cluster (one node, dual-socket system with twelve-core CPUs),
> but when I ssh in I do not use the -X flag, if that's what you mean.
>
> Thank you,
> Francesco
>
>
> > On 23 Feb 2021, at 21:59, Matthew Knepley <[email protected]> wrote:
> >
> > On Tue, Feb 23, 2021 at 3:55 PM Francesco Brarda
> > <[email protected]> wrote:
> > Thank you for the quick response.
> > Sorry, you are right. Here is the complete output:
> >
> > fbrarda@srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2
> > examples/rosenbrock/rosenbrock optimize -start_in_debugger
> > PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47803 on
> > display :0.0 on machine srvulx13
> > PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47804 on
> > display :0.0 on machine srvulx13
> > xterm: Xt error: Can't open display: :0.0
> > xterm: DISPLAY is not set
> > xterm: Xt error: Can't open display: :0.0
> > xterm: DISPLAY is not set
> >
> > Do you have an Xserver running? If not, you can use
> >
> > -start_in_debugger noxterm -debugger_nodes 3
> >
> > and try to get a stack trace from one node.
> >
> > Thanks,
> >
> > Matt
> >
> > method = optimize
> > optimize
> > algorithm = lbfgs (Default)
> > lbfgs
> > method = optimize
> > optimize
> > algorithm = lbfgs (Default)
> > lbfgs
> > init_alpha = 0.001 (Default)
> > tol_obj = 9.9999999999999998e-13 (Default)
> > tol_rel_obj = 10000 (Default)
> > tol_grad = 1e-08 (Default)
> > init_alpha = 0.001 (Default)
> > tol_obj = 9.9999999999999998e-13 (Default)
> > tol_rel_obj = 10000 (Default)
> > tol_grad = 1e-08 (Default)
> > tol_rel_grad = 10000000 (Default)
> > tol_param = 1e-08 (Default)
> > history_size = 5 (Default)
> > tol_rel_grad = 10000000 (Default)
> > tol_param = 1e-08 (Default)
> > history_size = 5 (Default)
> > iter = 2000 (Default)
> > iter = 2000 (Default)
> > save_iterations = 0 (Default)
> > id = 0 (Default)
> > data save_iterations = 0 (Default)
> > id = 0 (Default)
> > data
> > file = (Default)
> >
> > file = (Default)
> > init = 2 (Default)
> > random
> > seed = 3585768430 (Default)
> > init = 2 (Default)
> > random
> > seed = 3585768430 (Default)
> > output
> > file = output.csv (Default)
> > output
> > file = output.csv (Default)
> > diagnostic_file = (Default)
> > refresh = 100 (Default)
> > diagnostic_file = (Default)
> > refresh = 100 (Default)
> >
> >
> > Initial log joint probability = -731.444
> > Iter   log prob   ||dx||   ||grad||   alpha   alpha0   # evals   Notes
> > [1]PETSC ERROR: PetscAbortErrorHandler: main() line 12 in
> > src/cmdstan/main.cpp
> > To prevent termination, change the error handler using
> > PetscPushErrorHandler()
> >
> > ===================================================================================
> > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> > = PID 47804 RUNNING AT srvulx13
> > = EXIT CODE: 134
> > = CLEANING UP REMAINING PROCESSES
> > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> > ===================================================================================
> > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
> > This typically refers to a problem with your application.
> > Please see the FAQ page for debugging suggestions
> >
> >
> > The code inside main.cpp is the following:
> >
> > #include <cmdstan/command.hpp>
> > #include <stan/services/error_codes.hpp>
> >
> > #include <petsc.h>
> >
> > int main(int argc, char* argv[]) {
> >
> >   PetscErrorCode ierr;
> >   ierr = PetscInitialize(&argc, &argv, 0, 0);CHKERRQ(ierr);
> >
> >   try {
> >     ierr = cmdstan::command(argc, argv);CHKERRQ(ierr);
> >   } catch (const std::exception& e) {
> >     std::cout << e.what() << std::endl;
> >     ierr = stan::services::error_codes::SOFTWARE;CHKERRQ(ierr);
> >   }
> >
> >   ierr = PetscFinalize();CHKERRQ(ierr);
> >   return ierr;
> > }
> >
> > I highlighted line 12. Although I read the page where the command
> > PetscPushErrorHandler is explained and the example provided
> > (src/ksp/ksp/tutorials/ex27.c), I do not understand how I should
> > actually use the command.
> > Should I replace the entire try/catch with
> > PetscPushErrorHandler(PetscIgnoreErrorHandler,NULL); ?
> >
> > Best,
> > Francesco
> >
> >
> >> On 23 Feb 2021, at 11:54, Matthew Knepley <[email protected]> wrote:
> >>
> >> On Tue, Feb 23, 2021 at 3:54 AM Francesco Brarda
> >> <[email protected]> wrote:
> >> Hi!
> >>
> >> I am very new to the PETSc world. I am working with a GitHub repo that
> >> uses PETSc together with Stan (open-source statistics software), here
> >> you can find the discussion.
> >> A functor has been defined to convert EigenVector to PetscVec and
> >> vice versa, both sequentially and in parallel.
> >> The file using these functions does the conversions with the sequential
> >> setting. I changed to those using MPI, that is from
> >> EigenVectorToPetscVecSeq to EigenVectorToPetscVecMPI and so on, because I
> >> want to evaluate the scaling.
> >> Running the example with mpirun -n 5 examples/rosenbrock/rosenbrock
> >> optimize in debug mode, I get the error Caught signal number 11 SEGV. I
> >> therefore used the option -start_in_debugger and I get the following:
> >>
> >> For some reason, the -start_in_debugger option is not being seen. Are you
> >> showing all the output? Once the debugger is attached,
> >> you run the program (cont) and then when you hit the SEGV you get a stack
> >> trace (where).
> >>
> >> Thanks,
> >>
> >> Matt
> >>
> >> [2]PETSC ERROR:
> >> ------------------------------------------------------------------------
> >> [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> >> probably memory access out of range
> >> [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> >> [2]PETSC ERROR: or see
> >> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> >> [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X
> >> to find memory corruption errors
> >> [2]PETSC ERROR: likely location of problem given in stack below
> >> [2]PETSC ERROR: --------------------- Stack Frames
> >> ------------------------------------
> >> [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not
> >> available,
> >> [2]PETSC ERROR: INSTEAD the line number of the start of the function
> >> [2]PETSC ERROR: is given.
> >> [3]PETSC ERROR:
> >> ------------------------------------------------------------------------
> >> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> >> probably memory access out of range
> >> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> >> [3]PETSC ERROR: or see
> >> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> >> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X
> >> to find memory corruption errors
> >> [3]PETSC ERROR: likely location of problem given in stack below
> >> [3]PETSC ERROR: --------------------- Stack Frames
> >> ------------------------------------
> >> [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not
> >> available,
> >> [3]PETSC ERROR: INSTEAD the line number of the start of the function
> >> [3]PETSC ERROR: is given.
> >> [3]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in
> >> unknown file (null)
> >> To prevent termination, change the error handler using
> >> PetscPushErrorHandler()
> >> [2]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in
> >> unknown file (null)
> >> To prevent termination, change the error handler using
> >> PetscPushErrorHandler()
> >>
> >> ===================================================================================
> >> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> >> = PID 22939 RUNNING AT srvulx13
> >> = EXIT CODE: 134
> >> = CLEANING UP REMAINING PROCESSES
> >> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> >> ===================================================================================
> >> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
> >> This typically refers to a problem with your application.
> >> Please see the FAQ page for debugging suggestions
> >>
> >> I read the documentation regarding the PetscAbortErrorHandler, but I do
> >> not know where I should use it. How can I solve the problem?
> >> I hope I have been clear enough.
> >> You can also find my configure.log and make.log files attached.
> >>
> >> Best,
> >> Francesco
> >>
> >> --
> >> What most experimenters take for granted before they begin their
> >> experiments is infinitely more interesting than any results to which their
> >> experiments lead.
> >> -- Norbert Wiener
> >>
> >> https://www.cse.buffalo.edu/~knepley/
