Re: [gmx-users] Gromacs 2016.3 orte error while running on cluster
HI! Works like a charm now. Thank you for your hints! :-)

On 17.01.2018 at 13:04, Rainer Rutka wrote:
> HI! Just a question. We are trying to start an MPI job with Gromacs 2016.3
> on our cluster system here in Germany. Unfortunately we get this error:
>
>   An ORTE daemon has unexpectedly failed after launch...
>
> See the attached gromacs-run-error.txt file for details. Our submit script
> is attached, too: gromacs-run-pbs.txt
>
> THANKS IN ADVANCE!

--
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
* High-Performance-Computing (HPC)
* KIM-Support and -Base-Services
Room: V511
78457 Konstanz, Germany
+49 7531 88-5413
[gmx-users] Gromacs 2016.3 orte error while running on cluster
HI! Just a question.

We are trying to start an MPI job with Gromacs 2016.3 on our cluster system
here in Germany. Unfortunately we get this error:

  An ORTE daemon has unexpectedly failed after launch...

See the attached gromacs-run-error.txt file for details. Our submit script is
attached, too: gromacs-run-pbs.txt

THANKS IN ADVANCE!

--
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
* High-Performance-Computing (HPC)
* KIM-Support and -Base-Services
Room: V511
78457 Konstanz, Germany
+49 7531 88-5413

gromacs-run-error.txt:

* hwloc 1.11.2 has encountered an incorrect PCI locality information.
* PCI bus :80 is supposedly close to 2nd NUMA node of 1st package,
* however hwloc believes this is impossible on this architecture.
* Therefore the PCI bus will be moved to 1st NUMA node of 2nd package.
*
* If you feel this fixup is wrong, disable it by setting in your environment
* HWLOC_PCI__80_LOCALCPUS= (empty value), and report the problem
* to the hwloc's user mailing list together with the XML output of lstopo.
*
* You may silence this message by setting HWLOC_HIDE_ERRORS=1 in your environment.
--------------------------------------------------------------------------
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number of factors,
including an inability to create a connection back to mpirun due to a lack
of common network interfaces and/or no route found between them. Please
check network connectivity (including firewalls and network routing
requirements).
--------------------------------------------------------------------------

gromacs-run-pbs.txt:

#!/bin/bash
#MSUB -joe
#MSUB -N XADpr-1
#MSUB -l walltime=48:00:00
#MSUB -l nodes=5:ppn=16

export OMP_NUM_THREADS=1
export PATH=/opt/bwhpc/common/mpi/openmpi/2.1.1-gnu-7.1/bin:/opt/bwhpc/common/compiler/gnu/7.1.0/bin:/opt/bwhpc/common/chem/gromacs/2016.3_gnu7.1/bin:/software/all/bin:/usr/lib64/qt-3.3/bin:/opt/moab/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/st/st_st/st_ac128541/.local/bin:/home/st/st_st/st_ac128541/bin
export LD_BIND_NOW=1

cd /pfs/work2/workspace/scratch/st_ac128541-Lipase-0/1EDB

module purge
module load chem/gromacs/2016.3_gnu7.1

if [ 1 == 1 ] ; then
    # Start simulation
    gmx_mpi grompp -maxwarn 10 -f md.mdp -c XADpr.gro -p XADpr.top \
        -o XADpr-1.tpr -po XADpr-1.mdp > XADpr-1.grompp.out 2>&1
    mpirun -n 80 gmx_mpi mdrun -s XADpr-1.tpr -maxh 47 -npme 20 \
        -cpo XADpr-1.cpt -o XADpr-1.trr -x XADpr-1.xtc -c XADpr-1.gro \
        -e XADpr-1.edr -g XADpr-1.log > XADpr-1.mdrun.out 2>&1
else
    # Continue simulation using the checkpoint feature
    mpirun -n 80 gmx_mpi mdrun -cpi XADpr-0.cpt -cpo XADpr-1.cpt \
        -s XADpr-1.tpr -maxh 47 -npme 20 -o XADpr-1.trr -x XADpr-1.xtc \
        -c XADpr-1.gro -e XADpr-1.edr -g XADpr-1.log > XADpr-1.mdrun.out 2>&1
fi
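[Editor's note: the hints that resolved this thread are not quoted in the reply
above. The following is only a minimal sanity-check sketch, not the fix from the
thread. It assumes a MOAB/Torque allocation that provides $PBS_NODEFILE and the
same module environment as the submit script; it checks whether the mpirun on
PATH matches the Open MPI that gmx_mpi is linked against, and whether ORTE
daemons can be launched on all allocated nodes at all.]

#!/bin/bash
# Hypothetical sanity-check job (assumption, not from the original thread).
#MSUB -joe
#MSUB -N orte-check
#MSUB -l walltime=00:05:00
#MSUB -l nodes=5:ppn=16

module purge
module load chem/gromacs/2016.3_gnu7.1

# Which launcher and which MPI library are actually picked up?
which mpirun
mpirun --version
ldd "$(which gmx_mpi)" | grep -i mpi

# $PBS_NODEFILE is assumed to be provided by the MOAB/Torque environment.
echo "Allocated nodes:"
sort -u "$PBS_NODEFILE"

# Trivial launch test: if this already fails with the "ORTE daemon has
# unexpectedly failed after launch" message, the problem is in the launch
# environment (network/firewall/mismatched MPI), not in GROMACS itself.
mpirun -n 80 hostname

If the plain hostname launch succeeds but mdrun still fails, the mismatch is
more likely between the loaded GROMACS module and the Open MPI it was built
against rather than in node-to-node connectivity.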
[gmx-users] Gromacs 2016.3 FLOATING-POINT EXCEPTION/DIVIDE-BY-ZERO errors
HI! My name is Rainer. I am one of the module/software maintainers for the
bwHPC-C5 project: http://www.bwhpc-c5.de/en/index.php

For some time now we have not been able to run Gromacs 2016.x on our clusters.
Here is one of the errors we received from one of the users:

  I have been using Gromacs on bwUniCluster for a few months and performed
  molecular dynamics simulations without any problem until the end of July.
  Since August, I cannot prepare the input files, and if I prepare them
  elsewhere, the simulation crashes even though the setup is the same as the
  one that worked perfectly a few days before. I do not seem to have this
  problem with smaller systems containing fewer atoms.

[uc1n997:43275] *** Process received signal ***
[uc1n997:43275] Signal: Floating point exception (8)
[uc1n997:43275] Signal code: Integer divide-by-zero (1)
[uc1n997:43275] Failing at address: 0x42fd74
[uc1n997:43275] [ 0] /usr/lib64/libpthread.so.0(+0xf5e0)[0x2b65e89805e0]
[uc1n997:43275] [ 1] gmx_mpi_d(__svml_idiv4_h9+0x64)[0x42fd74]
[uc1n997:43275] [ 2] /pfs/data1/software_uc1/bwhpc/common/chem/gromacs/5.1.2-openmpi-1.8-intel-15.0/bin/../lib64/libgromacs_mpi_d.so.1(count_bonded_distances+0x43c)[0x2b65e6b1f72c]
[uc1n997:43275] [ 3] /pfs/data1/software_uc1/bwhpc/common/chem/gromacs/5.1.2-openmpi-1.8-intel-15.0/bin/../lib64/libgromacs_mpi_d.so.1(pme_load_estimate+0x20)[0x2b65e6b1ef70]
[uc1n997:43275] [ 4] /pfs/data1/software_uc1/bwhpc/common/chem/gromacs/5.1.2-openmpi-1.8-intel-15.0/bin/../lib64/libgromacs_mpi_d.so.1(gmx_grompp+0x257a)[0x2b65e639e84a]
[uc1n997:43275] [ 5] /pfs/data1/software_uc1/bwhpc/common/chem/gromacs/5.1.2-openmpi-1.8-intel-15.0/bin/../lib64/libgromacs_mpi_d.so.1(_ZN3gmx24CommandLineModuleManager3runEiPPc+0x267)[0x2b65e6127e97]
[uc1n997:43275] [ 6] gmx_mpi_d(main+0xbc)[0x40c13c]
[uc1n997:43275] [ 7] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2b65e8baec05]
[uc1n997:43275] [ 8] gmx_mpi_d[0x40bfb9]
[uc1n997:43275] *** End of error message ***

Gromacs 2016.3 was built this way (excerpt):

[...]
# (3) Load required modules for build process
module load compiler/intel/16.0
module load mpi/openmpi/2.1-intel-16.0
module load numlib/mkl/11.3.4
module load devel/cmake/3.3.2
[...]
# double precision
cmake -DCMAKE_VERBOSE_MAKEFILE=ON -DGMX_MPI=ON -DGMX_GPU=OFF -DGMX_DOUBLE=ON \
      -DGMX_THREAD_MPI=OFF -DGMX_FFT_LIBRARY=mkl -DMPIEXEC=${MPI_BIN_DIR}/mpirun \
      -DREGRESSIONTEST_DOWNLOAD=OFF -DCMAKE_INSTALL_PREFIX=${TARGET_DIR}
make 2>&1 | tee ${LOG_DIR}/make_double.out
make install 2>&1 | tee ${LOG_DIR}/make-install_double.out
[...]

ANY HELP IS MUCH APPRECIATED. Thanks in advance. :-)

--
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
* High-Performance-Computing (HPC)
* KIM-Support and -Base-Services
Room: V511
78457 Konstanz, Germany
+49 7531 88-5413
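[Editor's note: the backtrace above resolves libgromacs_mpi_d.so.1 from the
gromacs/5.1.2-openmpi-1.8-intel-15.0 tree, while the build excerpt loads
compiler/intel/16.0 and mpi/openmpi/2.1-intel-16.0 for 2016.3. The following is
only a hypothetical diagnostic sketch, not a reply from the thread: it checks
which GROMACS/MPI combination the failing job actually picks up. Module and
binary names are taken from the post and may differ per cluster.]

#!/bin/bash
# Hypothetical sketch (assumption, not from the original thread): confirm which
# gmx binary and libraries the crashing grompp run really uses.

module list 2>&1                      # modules active in the failing session

which gmx_mpi_d                       # does the double-precision binary come
gmx_mpi_d --version | head -n 20      # from the 2016.3 install or from 5.1.2?

# Library resolution: a binary that still resolves GROMACS or MPI libraries
# from the 5.1.2-openmpi-1.8-intel-15.0 tree (e.g. via LD_LIBRARY_PATH) would
# indicate a mixed installation rather than a bug in 2016.3 itself.
ldd "$(which gmx_mpi_d)" | grep -Ei 'gromacs|mpi|mkl'
echo "$LD_LIBRARY_PATH" | tr ':' '\n' | grep -i gromacs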