On Feb 7, 2014, at 5:08 PM, Jeff Squyres (jsquyres) wrote:
> AS_IF([test $crs_criu_happy -eq 1],
> [$2],
> [AS_IF([test "x$with_criu" != "x" && test "x$with_criu" != "xno"],
> [AC_MSG_WARN([You asked for CRIU support, but I can't find
> it.])
>
+1
On Feb 7, 2014, at 5:23 PM, Joshua Ladd wrote:
> What: Add an internal random number generator to OPAL.
>
> Why: OMPI uses rand and srand all over the place. Because the middleware is
> mucking with the RNG’s global state, applications that use these library
>
+1.
On Fri, Feb 07, 2014 at 10:23:41PM +0000, Joshua Ladd wrote:
>What: Add an internal random number generator to OPAL.
>
>
>
>Why: OMPI uses rand and srand all over the place. Because the middleware
>is mucking with the RNG's global state, applications that use these
>
That is fantastic! Thanks for the hard work so far getting the C/R
infrastructure back in place.
On Fri, Feb 7, 2014 at 3:46 PM, Adrian Reber wrote:
> I have created a new CRS component using criu (criu.org) to support
> checkpoint/restart in Open MPI. My current patch only
Yes. After batting this around a bit with Jeff and Mike, we came to the
consensus that the interface should be more like "rand_r", so that state is
locally managed by the consumer. The ALFG (additive lagged Fibonacci
generator) offers a powerful yet simple way to do it. We may even expose it to
users since it offers a very scalable and
Joshua,
This is for ticket #2928, right?
-Paul
On Fri, Feb 7, 2014 at 2:23 PM, Joshua Ladd wrote:
> What: Add an internal random number generator to OPAL.
>
>
>
> Why: OMPI uses rand and srand all over the place. Because the middleware
> is mucking with the RNG's
Ralph,
I'll try to test tonight's v1.7 tarball for:
+ ia64 atomics (#4174)
+ bad getpwuid (#4164)
+ opal_path_nfs/EPERM (#4125)
+ torque smp (#4227)
All but torque are fully-automated tests and I need only check my email for
the results.
The torque one will require manual job submission.
-Paul
Sweet -- +1 for CRIU support!
FWIW, I see you modeled your configure.m4 off the blcr configure.m4, but I'd
actually go with making it a bit simpler. For example, I typically structure
my configure.m4's like this (typed in mail client -- forgive mistakes...):
-
AS_IF([...some test],
What: The current probe algorithm in ob1 is linear with respect to the
number of processes in the job. I wish to change the algorithm to be
linear in the number of processes with unexpected messages. To do this I
added an additional opal_list_t to the ob1 communicator and made the ob1
process a
Hi folks
As you may have noticed, I've been working my way thru the CMR backlog on
1.7.5. A large percentage of them were minor fixes (valgrind warning
suppressions, error message typos, etc.), so those went in the first round.
Today's round contains more "meaty" things, but I still consider
I have created a new CRS component using criu (criu.org) to support
checkpoint/restart in Open MPI. My current patch only provides the
framework and necessary configure scripts to detect and link against
criu. With this patch orte-checkpoint can request a checkpoint and the
new CRIU CRS component
Exchange is evil….
Attached.
Best,
P
p4.patch.gz
Description: p4.patch.gz
Can you gzip the patch. The local exchange server has a habit of
converting LF to CRLF.
-Nathan
On Fri, Feb 07, 2014 at 12:14:02PM -0500, Shamis, Pavel wrote:
> Can you please give a try to the attached hot-fix.
> It unrolls most of the spaghetti, except the iboffload component (which is
>
Hah. You beat me to it. More or less identical to what I was doing. I
will give this a try. If this works we should push it and add it to the
coll/ml cmr.
-Nathan
On Fri, Feb 07, 2014 at 12:14:02PM -0500, Shamis, Pavel wrote:
> Can you please give a try to the attached hot-fix.
> It unrolls most
Can you please give a try to the attached hot-fix.
It unrolls most of the spaghetti, except the iboffload component (which is
anyway disabled).
Sorry for the mess.
Best,
Pasha
On Feb 7, 2014, at 10:52 AM, Nathan Hjelm wrote:
On Fri, Feb 07, 2014 at
If the directories are there and populated, then the problem is likely with
your path. Do this:
1. "which mpirun" - if you don't see your /bin, then you know your path
is wrong
2. "printenv PATH" - is it what you expected?
We generally suggest that you put your /bin and /lib at the
beginning
Thank you for considering my case seriously.
Yes sir, both directories along with other directories are there, with files
in them. But I still feel I am missing something, not sure what it is. How can
I check Open MPI? mpirun is not responding, not even mpicc. Any
instruction how to run parallel jobs
Well, it certainly looks okay - try doing "ls" in your prefix directory. Do you
see the bin and lib directories there? Anything in them?
On Feb 7, 2014, at 8:37 AM, Talla wrote:
> Hello sir
> I downloaded openmpi 1.7 and followed the installation instructions:
> cd openmpi
>
Hello sir
I downloaded openmpi 1.7 and followed the installation instructions:
cd openmpi
./configure --prefix="/home/$USER/.openmpi"
make
make install
export PATH="$PATH:/home/$USER/.openmpi/bin"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/home/$USER/.openmpi/lib/"
echo export
On Feb 7, 2014, at 10:52 AM, Nathan Hjelm wrote:
> Should be ready today. The use of that coll/ml structure is unnecessary
> at this time. I am removing it in bcol right now. In the future we will
> put in a better fix but this should work for 1.7.x/1.8.x.
Sweet.
--
Jeff
On Feb 7, 2014, at 7:52 AM, Nathan Hjelm wrote:
> On Fri, Feb 07, 2014 at 07:46:03AM -0800, Ralph Castain wrote:
>> The issue in 1.7 is all the cross-integration, which means we violate our
>> normal behavior when it comes to no-building and user-directed component
>>
On Fri, Feb 07, 2014 at 07:46:03AM -0800, Ralph Castain wrote:
> The issue in 1.7 is all the cross-integration, which means we violate our
> normal behavior when it comes to no-building and user-directed component
> selection. Jeff and I just discussed how this could be resolved using the
>
The issue in 1.7 is all the cross-integration, which means we violate our
normal behavior when it comes to no-building and user-directed component
selection. Jeff and I just discussed how this could be resolved using the
PML-BTL model, but (a) that is not what we have in 1.7, and (b) it isn't
How is this a problem in 1.7? We don't have a .ompi_ignore in
1.7.4. That is there to prevent mtt failures while I fix some
outstanding bcol issues.
I will clean this up on trunk and add it to the cmr.
-Nathan
On Thu, Feb 06, 2014 at 08:42:27PM -0800, Ralph Castain wrote:
> As many of you will
In the original implementation, the OOB ft_event did not do much of
anything on checkpoint preparation and continue. We did not even close the
sockets. However, during restart the OOB will need to renegotiate the
socket connections - usually by calling the finalization function (close
stale
It is difficult to see it from the stack trace, as it happens in the ORTE
threads. But I do have all the output I expect, and as the application I was
running is hello_world, I'm almost certain it happens during MPI_Finalize.
George.
On Feb 7, 2014, at 03:38, Ralph Castain
Think I see the code path that causes this - I'll have to play with it a little
as the race condition is biased heavily towards success, so (as you noted) it
won't happen very often.
On Feb 6, 2014, at 6:38 PM, Ralph Castain wrote:
> Interesting - does it happen in