I can't think of anything in particular that changed that could affect this. Are you trying this with pvserver? Can you try pvbatch and see whether you hit the same problem?
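For example, something along these lines should exercise the same MPI startup without the client connection in the picture (a rough sketch only: reuse whatever orterun/-x/-mca options you already pass for pvserver, and test.py here is just a placeholder for any small Python script):

    orterun <your usual -x/-mca options> -np 3 pvbatch test.py

If pvbatch fails in MPI_Init the same way, that would point at the MPI environment rather than anything specific to pvserver.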
On Tue, Dec 3, 2013 at 12:44 PM, Angelini, Richard C (Rick) CIV USARMY ARL (US) <[email protected]> wrote:
> Classification: UNCLASSIFIED
> Caveats: NONE
>
> I've built 4.1.0 on a couple of our HPC systems and I'm getting a clean
> build, but it fails on execution of the parallel servers. On both systems
> (an SGI Altix/ICE and an IBM iDataPlex) I'm using gcc and openmpi and the
> exact same build environment that I used to build 4.0.1. However, both
> systems are failing with identical errors that begin with the "Leave
> Pinned" MPI feature, which is a flag set in our mpirun command environment
> and works with 4.0.1. Did something change behind the scenes in ParaView
> 4.1.0 that impacts the build or runtime parameters?
>
> orterun -x MODULE_VERSION_STACK -x MANPATH -x MPI_VER -x HOSTNAME -x
> _MODULESBEGINENV_ -x PBS_ACCOUNT -x HOST -x SHELL -x TMPDIR -x PBS_JOBNAME
> -x PBS_ENVIRONMENT -x PBS_O_WORKDIR -x NCPUS -x DAAC_HOME -x GROUP -x
> PBS_TASKNUM -x USER -x LD_LIBRARY_PATH -x LS_COLORS -x PBS_O_HOME -x
> COMPILER_VER -x HOSTTYPE -x PBS_MOMPORT -x PV_ROOT -x PBS_O_QUEUE -x NLSPATH
> -x MODULE_VERSION -x MAIL -x PBS_O_LOGNAME -x PATH -x PBS_O_LANG -x
> PBS_JOBCOOKIE -x F90 -x PWD -x _LMFILES_ -x PBS_NODENUM -x LANG -x
> MODULEPATH -x LOADEDMODULES -x PBS_JOBDIR -x F77 -x PBS_O_SHELL -x PBS_JOBID
> -x MPICC_F77 -x CXX -x ENVIRONMENT -x SHLVL -x HOME -x OSTYPE -x PBS_O_HOST
> -x MPIHOME -x FC -x VENDOR -x MACHTYPE -x LOGNAME -x MPICC_CXX -x PBS_QUEUE
> -x MPI_HOME -x MODULESHOME -x COMPILER -x LESSOPEN -x OMP_NUM_THREADS -x
> PBS_O_MAIL -x CC -x PBS_O_SYSTEM -x MPICC_F90 -x G_BROKEN_FILENAMES -x
> PBS_NODEFILE -x MPICC_CC -x PBS_O_PATH -x module -x } -x premode -x premod
> -x PBS_HOME -x PBS_GET_IBWINS -x NUM_MPITASKS -np 3 -machinefile
> new.1133.machines.txt --prefix
> /usr/cta/unsupported/openmpi/gcc/4.4.0/openmpi-1.6.3 -mca orte_rsh_agent ssh
> -mca mpi_paffinity_alone 1 -mca maffinity first_use -mca mpi_leave_pinned 1
> -mca btl openib,self -mca orte_default_hostname new.1133.machines.txt
> pvserver --use-offscreen-rendering --server-port=50481
> --client-host=localhost --reverse-connection --timeout=15 --connect-id=30526
>
> [pershing-n0221:01190] Warning: could not find environment variable "}"
> --------------------------------------------------------------------------
> A process attempted to use the "leave pinned" MPI feature, but no
> memory registration hooks were found on the system at run time. This
> may be the result of running on a system that does not support memory
> hooks or having some other software subvert Open MPI's use of the
> memory hooks. You can disable Open MPI's use of memory hooks by
> setting both the mpi_leave_pinned and mpi_leave_pinned_pipeline MCA
> parameters to 0.
>
> Open MPI will disable any transports that are attempting to use the
> leave pinned functionality; your job may still run, but may fall back
> to a slower network transport (such as TCP).
>
>   Mpool name: rdma
>   Process: [[43622,1],0]
>   Local host: xxx-n0221
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> WARNING: There is at least one OpenFabrics device found but there are
> no active ports detected (or Open MPI was unable to use them). This
> is most certainly not what you wanted. Check your cables, subnet
> manager configuration, etc. The openib BTL will be ignored for this
> job.
>
>   Local host: xxx-n0221
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> At least one pair of MPI processes are unable to reach each other for
> MPI communications. This means that no Open MPI device has indicated
> that it can be used to communicate between these processes. This is
> an error; Open MPI requires that all MPI processes be able to reach
> each other. This error can sometimes be the result of forgetting to
> specify the "self" BTL.
>
>   Process 1 ([[43622,1],2]) is on host: xxx-n0221
>   Process 2 ([[43622,1],0]) is on host: xxx-n0221
>   BTLs attempted: self
>
> Your MPI job is now going to abort; sorry.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> MPI_INIT has failed because at least one MPI process is unreachable
> from another. This *usually* means that an underlying communication
> plugin -- such as a BTL or an MTL -- has either not loaded or not
> allowed itself to be used. Your MPI job will now abort.
>
> You may wish to try to narrow down the problem;
>
> * Check the output of ompi_info to see which BTL/MTL plugins are
>   available.
> * Run your application with MPI_THREAD_SINGLE.
> * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
>   if using MTL-based communications) to see exactly which
>   communication plugins were considered and/or discarded.
> --------------------------------------------------------------------------
> [pershing-n0221:1198] *** An error occurred in MPI_Init
> [pershing-n0221:1198] *** on a NULL communicator
> [pershing-n0221:1198] *** Unknown error
> [pershing-n0221:1198] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
> --------------------------------------------------------------------------
> An MPI process is aborting at a time when it cannot guarantee that all
> of its peer processes in the job will be killed properly. You should
> double check that everything has shut down cleanly.
>
>   Reason: Before MPI_INIT completed
>   Local host: pershing-n0221
>   PID: 1198
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> orterun has exited due to process rank 2 with PID 1198 on
> node pershing-n0221 exiting improperly. There are two reasons this could
> occur:
>
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> This may have caused other processes in the application to be
> terminated by signals sent by orterun (as reported here).
> --------------------------------------------------------------------------
> [pershing-n0221:01190] 2 more processes have sent help message
> help-mpool-base.txt / leave pinned failed
> [pershing-n0221:01190] Set MCA parameter "orte_base_help_aggregate" to 0 to
> see all help / error messages
> [pershing-n0221:01190] 2 more processes have sent help message
> help-mpi-btl-openib.txt / no active ports found
> [pershing-n0221:01190] 2 more processes have sent help message
> help-mca-bml-r2.txt / unreachable proc
> [pershing-n0221:01190] 2 more processes have sent help message
> help-mpi-runtime / mpi_init:startup:pml-add-procs-fail
> [pershing-n0221:01190] 2 more processes have sent help message
> help-mpi-errors.txt / mpi_errors_are_fatal unknown handle
> [pershing-n0221:01190] 2 more processes have sent help message
> help-mpi-runtime.txt / ompi mpi abort:cannot guarantee all killed
>
> ________________________________
> Rick Angelini
> USArmy Research Laboratory
> CISD/HPC Architectures Team
> Building 120 Cube 315
> Aberdeen Proving Ground, MD
> Phone: 410-278-6266
>
> Classification: UNCLASSIFIED
> Caveats: NONE
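For what it's worth, the Open MPI help text above already names a few knobs worth trying while you narrow this down (a sketch only; fold these into your existing orterun line as appropriate):

    -mca mpi_leave_pinned 0 -mca mpi_leave_pinned_pipeline 0   (rule out the leave-pinned path)
    -mca btl_base_verbose 100                                   (show which BTLs are considered or discarded)
    -mca orte_base_help_aggregate 0                             (show every help/error message instead of aggregating)

plus running ompi_info on a compute node to confirm which BTL plugins (openib, tcp, self) are actually available there.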
