If it is the application, then there is probably a barrier in
app_coord_init() to make sure all the application processes are up and running.
Once that barrier completes, the global coordinator knows that the application
can be checkpointed.
I don't think orte-checkpoint should be calling a barrier - from wha
Is it orte-checkpoint that is hanging, or the app you are trying to checkpoint?
On Jan 20, 2014, at 2:10 PM, Adrian Reber wrote:
> Thanks for your help. I tried initializing the barrier correctly (see
> attached patch) but now, instead of crashing, it just hangs on the
> barrier while running o
Thanks for your help. I tried initializing the barrier correctly (see
attached patch) but now, instead of crashing, it just hangs on the
barrier while running orte-checkpoint
[dcbz:20150] [[41665,0],0] grpcomm:bad entering barrier
[dcbz:20150] [[41665,0],0] ACTIVATING GRCPCOMM OP 0 at
../../../..
On 1/17/14 6:28 PM, "Paul Hargrove" <phhargr...@lbl.gov> wrote:
I am trying to build the 1.7 nightly tarball (1.7.4rc2r30303) on a Linux/PPC
system with the xlc-11.1 compilers configured for 32-bit output:
$ export OBJECT_MODE=32
$ [pathto]/configure CC=xlc CXX=xlC FC=xlf90 --enable-debu
On 1/17/14 8:00 PM, "Paul Hargrove" <phhargr...@lbl.gov> wrote:
Trying to build 1.7.4rc2r30303 with gcc on linux/mips32 yields the following
failure:
CXX mpicxx.lo
/home/phargrov/OMPI/openmpi-1.7.4-latest-linux-mips32/openmpi-1.7.4rc2r30303/ompi/mpi/cxx/mpicxx.cc:31:2:
warning: #
Just as a follow-up to this: I have added a sensor module to monitor core
temperatures per this email thread. I haven't added the cooling devices from
this last bit as the info I could find under there didn't seem all that helpful
right now - mostly just how fast the fan is running on a scale of