91-8149399160
>>>>>>>> ___
>>>>>>>> users mailing list
>>>>>>>> us...@open-mpi.org
>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
urther below:
>
>
> - Original Message -
>> From: Joshua Hursey <jjhur...@open-mpi.org>
> [...]
>> What other configure options are you passing to Open MPI? Specifically the
>> configure test will always fail if '--with-ft=cr' is not specified - by
>> default
What version of BLCR are you using?
What other configure options are you passing to Open MPI? Specifically the
configure test will always fail if '--with-ft=cr' is not specified - by default
Open MPI will only build the BLCR component if C/R FT is requested by the user.
Can you send a zip'ed
here are also 2 sample result files (cpu.256^3.8N.*) which show the
> execution time difference between 2 cases.
> Hope you can take some time to find the problem.
> Thanks for your kindness.
>
> Best Regards,
> Nguyen Toan
>
> On Wed, Mar 2, 2011 at 3:00 AM, Joshua Hu
the MCA parameter you mentioned but it did not help, the unknown
> overhead still exists.
> Here I attach the output of 'ompi_info', both version 1.5 and 1.5.1.
> Hope you can find out the problem.
> Thank you.
>
> Regards,
> Nguyen Toan
>
> On Wed, Feb 9, 2011 at 1
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey
e checkpoint
> per application execution for my purpose, but the unknown overhead exists
> even when no checkpoint was taken.
>
> Do you have any other idea?
>
> Regards,
> Nguyen Toan
>
>
> On Wed, Feb 9, 2011 at 12:41 AM, Joshua Hursey <jjhur...@open-mpi
eliminate it?
> Thanks.
>
> Regards,
> Nguyen
On Jan 27, 2011, at 9:47 AM, Reuti wrote:
> Am 27.01.2011 um 15:23 schrieb Joshua Hursey:
>
>> The current version of Open MPI does not support continued operation of an
>> MPI application after process failure within a job. If a process dies, so
>> will the MPI job
this group into
> a working communicator?
>
> Thanks,
> Kirk
roup
> HPC Research Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>
Regards,
> Andrei
>
>
>
>
---
21.71] Finished -
> ompi_global_snapshot_27115.ckpt
> Snapshot Ref.: 0 ompi_global_snapshot_27115.ckpt
>
> As you can see, it takes 200+ seconds to checkpoint. By the way, what do the
> former and latter numbers in [ , ] represent?
>
> Regards
>
> Whchen
>
AQ.html#prelink
If that doesn't work then I would suggest trying the current Open MPI trunk.
There should not be any problem with using NFS, since this is occurring in
MPI_Init, this is well before we ever try to use the file system. I also test
with NFS, and local staging on a fairly regular b
I am pleased to announce that Open MPI now supports checkpoint/restart process
migration and automatic recovery. This is in addition to our current support
for more traditional checkpoint/restart fault tolerance. These new features
were introduced in the Open MPI development trunk in commit
’t
> proceed after that.
>
> I have attached the stack traces of all the MPI processes that are part of
> the mpirun. I would really appreciate it if you could take a look at the stack
> traces and let me know the potential problem. I am kind of stuck at this point and need
> your
led? Were you successful in checkpointing?
>
> - Ananda
> -Original Message-
> Message: 9
> Date: Fri, 13 Aug 2010 10:21:29 -0400
> From: Joshua Hursey <jjhur...@open-mpi.org>
> Subject: Re: [OMPI users] users Digest, Vol 1658, Issue 2
> To: Open MPI Users <u
ronment variables
> to have a successful compilation. I will keep you posted.
>
> BTW, were you successful in reproducing the problem on a system with
> OpenMPI 1.4.2?
>
> Thanks
> Ananda
> -Original Message-----
> Date: Thu, 12 Aug 2010 09:12:26 -0400
> From: J
hnical Architect
> Wipro Technologies
> Ph: 972 765 8093
> ananda.mu...@wipro.com
>
>
> -Original Message-
> Date: Mon, 9 Aug 2010 16:37:58 -0400
> From: Joshua Hursey <jjhur...@open-mpi.org>
> Subject: Re: [OMPI users] Checkpointing mpi4py program
> To: Open MPI Us
I have not tried to checkpoint an mpi4py application, so I cannot say for sure
if it works or not. You might be hitting something with the Python runtime
interacting in an odd way with either Open MPI or BLCR.
Can you attach a debugger and get a backtrace on a stuck checkpoint? That might
show
That is interesting. I cannot think of any reason why this might be causing a
problem just in Open MPI. popen() is similar to fork()/system() so you have to
be careful with interconnects that do not play nice with fork(), like openib.
But since it looks like you are excluding openib, this
cr_thread_sleep_wait=1000
Which will throttle down the thread when the application is in the MPI library.
You might want to play around with these MCA parameters to tune the
aggressiveness of the C/R thread to your performance needs. In the meantime I
will look into finding better default para
There is some overhead involved when activating the current C/R functionality
in Open MPI due to the wrapping of the internal point-to-point stack. The
wrapper (CRCP framework) tracks the signature of each message (not the buffer,
so constant time for any size MPI message) so that when we need
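
To illustrate why tracking a signature rather than the buffer costs the same for any message size, here is a toy sketch in Python. It is not Open MPI's CRCP code: the names Signature, SignatureTracker, on_send, on_recv and comm_id are hypothetical, and the fields chosen for the signature are an assumption made only for illustration.

```python
from collections import Counter
from typing import NamedTuple


class Signature(NamedTuple):
    """Metadata describing one message; the payload is never inspected."""
    peer: int      # rank of the other endpoint
    tag: int       # message tag
    comm_id: int   # hypothetical communicator identifier
    nbytes: int    # payload length only, not its contents


class SignatureTracker:
    """Toy wrapper-side bookkeeping: one counter update per message."""

    def __init__(self) -> None:
        self.sent: Counter = Counter()
        self.received: Counter = Counter()

    def on_send(self, peer: int, tag: int, comm_id: int, payload: bytes) -> None:
        # len() on bytes is O(1), so the cost does not grow with message size.
        self.sent[Signature(peer, tag, comm_id, len(payload))] += 1

    def on_recv(self, peer: int, tag: int, comm_id: int, payload: bytes) -> None:
        self.received[Signature(peer, tag, comm_id, len(payload))] += 1


# Two small sends and one large receive each cost a single counter update.
tracker = SignatureTracker()
tracker.on_send(peer=1, tag=7, comm_id=0, payload=b"x" * 8)
tracker.on_send(peer=1, tag=7, comm_id=0, payload=b"y" * 8)
tracker.on_recv(peer=2, tag=3, comm_id=0, payload=bytes(1_000_000))
```

The one-megabyte receive is recorded with the same amount of work as the eight-byte sends, because only the length and addressing fields are stored.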
On Mar 4, 2010, at 8:17 AM, Fernando Lemos wrote:
> On Wed, Mar 3, 2010 at 10:24 PM, Fernando Lemos wrote:
>
>> Is there anything I can do to provide more information about this bug?
>> E.g. try to compile the code in the SVN trunk? I also have kept the
>> snapshots
On Mar 3, 2010, at 3:42 PM, Fernando Lemos wrote:
> On Wed, Mar 3, 2010 at 5:31 PM, Joshua Hursey <jjhur...@open-mpi.org> wrote:
>
>>
>> Yes, ompi-restart should be printing a helpful message and exiting normally.
>> Thanks for the bug report. I beli
On Mar 2, 2010, at 9:17 AM, Fernando Lemos wrote:
> On Sun, Feb 28, 2010 at 11:11 PM, Fernando Lemos
> wrote:
>> Hello,
>>
>>
>> I'm trying to come up with a fault tolerant OpenMPI setup for research
>> purposes. I'm doing some tests now, but I'm stuck with a segfault
You can use the 'checkpoint to local disk' example to checkpoint and restart
without access to a globally shared storage device. There is an example on the
website that does not use a globally mounted file system:
http://www.osl.iu.edu/research/ft/ompi-cr/examples.php#uc-ckpt-local
What
On Jan 14, 2010, at 8:20 AM, Andreea Costea wrote:
> Hi,
>
> I wanted to try the C/R feature in OpenMPI version 1.4.1, which I
> downloaded today. When I try to checkpoint, I get the following error
> message:
> [[65192,0],0] ORTE_ERROR_LOG: Not found in file orte-checkpoint.c at line
On Jan 14, 2010, at 2:50 AM, Andreea Costea wrote:
> Hei there
>
> I have some questions regarding checkpoint/restart:
>
> 1. Until recently I thought that ompi-checkpoint and ompi-restart are used to
> checkpoint a process inside an MPI application. Now I reread this and I
> realized that
The --preload-* options to 'mpirun' currently use the ssh/scp commands (or
rsh/rcp via an MCA parameter) to move files from the machine local to the
'mpirun' command to the compute nodes during launch. This assumes that you have
Open MPI already installed on all of the machines. It was an
On Sep 16, 2009, at 8:30 AM, Marcin Stolarek wrote:
Hi,
It seems I solved my problem. The root of the error was that I hadn't
loaded the blcr module, so I couldn't checkpoint even a one-thread
application.
I am glad to hear that you have things working now.
However, I still can't find MCA:blcr
That seemed to have done the trick.
Thanks,
Josh
On Jul 6, 2007, at 3:04 PM, Ethan Mallove wrote:
On Fri, Jul/06/2007 01:22:06PM, Joshua Hursey wrote:
Anyone seen the following error from MTT before? It looks like it
is in the
reporter stage.
<->
shell$ /spi
Anyone seen the following error from MTT before? It looks like it is
in the reporter stage.
<->
shell$ /spin/home/jjhursey/testing/mtt//client/mtt --mpi-install --
scratch /spin/home/jjhursey/testing/scratch/20070706 --file /spin/