Re: [OMPI users] Abort

2010-08-13 Thread David Ronis
I'm using mpirun and the nodes are all on the same machine (an 8-CPU box
with an Intel i7).  Coresize is unlimited:


ulimit -a
core file size  (blocks, -c) unlimited

David


On Fri, 2010-08-13 at 13:47 -0400, Jeff Squyres wrote:
> On Aug 13, 2010, at 1:18 PM, David Ronis wrote:
> 
> > Second, coredumpsize is unlimited, and indeed I DO get core dumps when
> > I'm running a single-processor version.  
> 
> What launcher are you using underneath Open MPI?
> 
> You might want to make sure that the underlying launcher actually sets the 
> coredumpsize to unlimited on each server where you're running.  E.g., if 
> you're using rsh/ssh, check that your shell startup files set coredumpsize to 
> unlimited for non-interactive logins.  Or, if you're using (for example) 
> Torque, check to ensure that jobs launched under Torque don't have their 
> coredumpsize automatically reset to 0, etc.
> 



Re: [OMPI users] Abort

2010-08-13 Thread Jeff Squyres
On Aug 13, 2010, at 1:18 PM, David Ronis wrote:

> Second, coredumpsize is unlimited, and indeed I DO get core dumps when
> I'm running a single-processor version.  

What launcher are you using underneath Open MPI?

You might want to make sure that the underlying launcher actually sets the 
coredumpsize to unlimited on each server where you're running.  E.g., if you're 
using rsh/ssh, check that your shell startup files set coredumpsize to 
unlimited for non-interactive logins.  Or, if you're using (for example) 
Torque, check to ensure that jobs launched under Torque don't have their 
coredumpsize automatically reset to 0, etc.
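
For example, something like this shows what a non-interactive shell
actually sees (node01 is an illustrative node name):

ssh node01 'ulimit -c'

If that prints 0, add "ulimit -c unlimited" to a startup file that
non-interactive shells read, e.g. ~/.bashrc for bash.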

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Abort

2010-08-13 Thread David Ronis
Thanks to all who replied.  

First, I'm running openmpi 1.4.2.  

Second, coredumpsize is unlimited, and indeed I DO get core dumps when
I'm running a single-processor version.  Third, the problem isn't
stopping the program; MPI_Abort does that just fine. Rather, it's getting
a coredump.  According to the man page, MPI_Abort sends a SIGTERM, not a
SIGABRT, so perhaps that's the expected behavior.

Finally, my guess as to what's happening if I use the libc abort is that
the other nodes get stuck in an MPI call (I do lots of MPI_Reduces or
MPI_Bcasts in this code), but this doesn't explain why the node calling
abort doesn't exit with a coredump.
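
A workaround that might be worth trying is to force the limit inside the
launched processes themselves, along these lines (./myapp stands in for
the real binary):

mpirun -np 8 sh -c 'ulimit -c unlimited && exec ./myapp'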

David

On Thu, 2010-08-12 at 20:44 -0600, Ralph Castain wrote:
> Sounds very strange - what OMPI version, on what type of machine, and how was 
> it configured?
> 
> 
> On Aug 12, 2010, at 7:49 PM, David Ronis wrote:
> 
> > I've got an MPI program that is supposed to generate a core file if
> > problems arise on any of the nodes.   I tried to do this by adding a
> > call to abort() to my exit routines but this doesn't work; I get no core
> > file, and worse, mpirun doesn't detect that one of my nodes has
> > aborted(?) and doesn't kill off the entire job, except in the trivial
> > case where the number of processors I'm running on is 1.   I've replaced
> > abort with MPI_Abort, which kills everything off, but leaves no core
> > file.  Any suggestions how I can get one and still have mpi exit?
> > 
> > Thanks in advance.
> > 
> > David
> > 
> > 
> > 
> > 
> 




Re: [OMPI users] OpenMPI Run-Time "Freedom" Question

2010-08-13 Thread Michael E. Thomadakis

 On 08/12/10 21:53, Jed Brown wrote:


Or OMPI_CC=icc-xx.y mpicc ...




If we enable a different set of run-time library paths for the Intel
compilers than those used to build OMPI, then when we compile and execute
the MPI app, the OMPI libs will run against these new run-time libs
instead of the ones used when OMPI was built, right? I would think this
may cause problems if, for some reason, something in the newer run-time
libs differs from the ones used when OMPI was built.
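
For what it's worth, one can at least check which run-time libs a given
LD_LIBRARY_PATH would resolve to before launching, e.g. (the app name and
the Intel library names are illustrative):

ldd ./myapp | egrep 'libmpi|libimf|libsvml|libintlc'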


A user is hoping to avoid rebuilding his OMPI app; I guess he wants to
just change LD_LIBRARY_PATH to the latest Intel run-time libs and launch
with the latest and greatest Intel libs. I mentioned to him that the
right way is to build the combination of OMPI + Intel run-time that the
application is known to work with (since some combinations may fail), but
he wants me to fix the run-time lib path for the OMPI libs while using a
different, variable one for the run-time libs of the OMPI application!
It is frustrating to deal with people who get "great ideas" but then
press someone else to make them work instead of doing it themselves.


Anyway, thanks.

Michael


Jed

On Aug 12, 2010 5:18 PM, "Ralph Castain" wrote:



On Aug 12, 2010, at 7:04 PM, Michael E. Thomadakis wrote:

> On 08/12/10 18:59, Tim Prince wrote:
>>...

The "easy" way to accomplish this would be to:

(a) build OMPI with whatever compiler you decide to use as a "baseline"

(b) do -not- use the wrapper compiler to build the application. 
Instead, do "mpicc --showme" (or whatever language equivalent you 
want) to get the compile line, substitute your "new" compiler library 
for the "old" one, and then execute the resulting command manually.


If you then set your LD_LIBRARY_PATH to the "new" libs, it might work 
- but no guarantees. Still, you could try it - and if it worked, you 
could always just explain that this is a case-by-case situation, and 
so it -could- break with other compiler combinations.
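
E.g., something along these lines (the Intel runtime path is
illustrative):

export LD_LIBRARY_PATH=/opt/intel/Compiler/11.1/lib/intel64:$LD_LIBRARY_PATH
mpirun -np 4 ./myapp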


Critical note: the app developers would have to validate the code 
with every combination! Otherwise, correct execution will be a 
complete crap-shoot - just because the app doesn't abnormally 
terminate does -not- mean it generated a correct result!





> Thanks for the information on this. We indeed use Intel Compiler
> set 11.1.XXX + OMPI 1.4.1 and ...










Re: [OMPI users] Checkpointing mpi4py program

2010-08-13 Thread Joshua Hursey
Nope. I probably won't get to it for a while. I'll let you know if I do.

On Aug 13, 2010, at 12:17 PM,   
wrote:

> OK, I will do that.
> 
> But did you try this program on a system where the latest trunk is
> installed? Were you successful in checkpointing?
> 
> - Ananda
> -Original Message-
> Message: 9
> Date: Fri, 13 Aug 2010 10:21:29 -0400
> From: Joshua Hursey 
> Subject: Re: [OMPI users] users Digest, Vol 1658, Issue 2
> To: Open MPI Users 
> Message-ID: <7a43615b-a462-4c72-8112-496653d8f...@open-mpi.org>
> Content-Type: text/plain; charset=us-ascii
> 
> I probably won't have an opportunity to work on reproducing this on the
> 1.4.2 series. The trunk has a bunch of bug fixes that probably will not be
> backported to the 1.4 series (things have changed too much since that
> branch). So I would suggest trying the 1.5 series.
> 
> -- Josh
> 
> On Aug 13, 2010, at 10:12 AM, 
>  wrote:
> 
>> Josh
>> 
>> I am having problems compiling the sources from the latest trunk. It
>> complains of libgomp.spec missing even though that file exists on my
>> system. I will see if I have to change any other environment variables
>> to have a successful compilation. I will keep you posted.
>> 
>> BTW, were you successful in reproducing the problem on a system with
>> OpenMPI 1.4.2?
>> 
>> Thanks
>> Ananda
>> -Original Message-
>> Date: Thu, 12 Aug 2010 09:12:26 -0400
>> From: Joshua Hursey 
>> Subject: Re: [OMPI users] Checkpointing mpi4py program
>> To: Open MPI Users 
>> Message-ID: <1f1445ab-9208-4ef0-af25-5926bd53c...@open-mpi.org>
>> Content-Type: text/plain; charset=us-ascii
>> 
>> Can you try this with the current trunk (r23587 or later)?
>> 
>> I just added a number of new features and bug fixes, and I would be
>> interested to see if it fixes the problem. In particular I suspect that
>> this might be related to the Init/Finalize bounding of the checkpoint
>> region.
>> 
>> -- Josh
>> 
>> On Aug 10, 2010, at 2:18 PM, 
>>  wrote:
>> 
>>> Josh
>>> 
>>> Please find attached the python program that reproduces the hang that
>>> I described. The initial part of this file describes the prerequisite
>>> modules and the steps to reproduce the problem. Please let me know if
>>> you have any questions in reproducing the hang.
>>> 
>>> Please note that, if I add the following lines at the end of the
>>> program (in case sleep_time is True), the problem disappears, i.e.,
>>> the program resumes successfully after successful completion of the
>>> checkpoint.
>>> # Add following lines at the end for sleep_time is True
>>> else:
>>>     time.sleep(0.1)
>>> # End of added lines
>>> 
>>> 
>>> Thanks a lot for your time in looking into this issue.
>>> 
>>> Regards
>>> Ananda
>>> 
>>> Ananda B Mudar, PMP
>>> Senior Technical Architect
>>> Wipro Technologies
>>> Ph: 972 765 8093
>>> ananda.mu...@wipro.com
>>> 
>>> 
>>> -Original Message-
>>> Date: Mon, 9 Aug 2010 16:37:58 -0400
>>> From: Joshua Hursey 
>>> Subject: Re: [OMPI users] Checkpointing mpi4py program
>>> To: Open MPI Users 
>>> Message-ID: <270bd450-743a-4662-9568-1fedfcc6f...@open-mpi.org>
>>> Content-Type: text/plain; charset=windows-1252
>>> 
>>> I have not tried to checkpoint an mpi4py application, so I cannot say
>>> for sure if it works or not. You might be hitting something with the
>>> Python runtime interacting in an odd way with either Open MPI or
> BLCR.
>>> 
>>> Can you attach a debugger and get a backtrace on a stuck checkpoint?
>>> That might show us where things are held up.
>>> 
>>> -- Josh
>>> 
>>> 
>>> On Aug 9, 2010, at 4:04 PM, 
>>>  wrote:
>>> 
 Hi
 
 I have integrated mpi4py with openmpi 1.4.2 that was built with BLCR
>>> 0.8.2. When I run ompi-checkpoint on the program written using mpi4py,
>>> I see that the program sometimes doesn't resume after successful
>>> checkpoint creation. This doesn't always occur, meaning the program
>>> resumes after successful checkpoint creation most of the time and
>>> completes successfully. Has anyone tested the checkpoint/restart
>>> functionality with mpi4py programs? Are there any best practices that
>>> I should keep in mind while checkpointing mpi4py programs?
 
 Thanks for your time
 -  Ananda

Re: [OMPI users] Checkpointing mpi4py program

2010-08-13 Thread ananda.mudar
Josh

I have stack traces of all 8 python processes, taken when I observed the
hang after the successful completion of a checkpoint. They are in the
attached document. Please see if these stack traces provide any clue.
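
(For anyone trying to reproduce this: a backtrace like these can be
grabbed from a stuck process with something like

gdb -p <pid> -batch -ex 'thread apply all bt'

where <pid> is the process id of the hung python process.)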

Thanks
Ananda



From: Ananda Babu Mudar (WT01 - Energy and Utilities)
Sent: Fri 8/13/2010 9:12 AM
To: us...@open-mpi.org
Subject: RE: users Digest, Vol 1658, Issue 2



Josh

I am having problems compiling the sources from the latest trunk. It complains 
of libgomp.spec missing even though that file exists on my system. I will see 
if I have to change any other environment variables to have a successful 
compilation. I will keep you posted.

BTW, were you successful in reproducing the problem on a system with OpenMPI 
1.4.2?

Thanks
Ananda
-Original Message-
Date: Thu, 12 Aug 2010 09:12:26 -0400
From: Joshua Hursey 
Subject: Re: [OMPI users] Checkpointing mpi4py program
To: Open MPI Users 
Message-ID: <1f1445ab-9208-4ef0-af25-5926bd53c...@open-mpi.org>
Content-Type: text/plain; charset=us-ascii

Can you try this with the current trunk (r23587 or later)?

I just added a number of new features and bug fixes, and I would be interested 
to see if it fixes the problem. In particular I suspect that this might be 
related to the Init/Finalize bounding of the checkpoint region.

-- Josh

On Aug 10, 2010, at 2:18 PM,   
wrote:

> Josh
>
> Please find attached the python program that reproduces the hang that
> I described. The initial part of this file describes the prerequisite
> modules and the steps to reproduce the problem. Please let me know if
> you have any questions in reproducing the hang.
>
> Please note that, if I add the following lines at the end of the program
> (in case sleep_time is True), the problem disappears, i.e., the program
> resumes successfully after successful completion of the checkpoint.
> # Add following lines at the end for sleep_time is True
> else:
>   time.sleep(0.1)
> # End of added lines
>
>
> Thanks a lot for your time in looking into this issue.
>
> Regards
> Ananda
>
> Ananda B Mudar, PMP
> Senior Technical Architect
> Wipro Technologies
> Ph: 972 765 8093
> ananda.mu...@wipro.com
>
>
> -Original Message-
> Date: Mon, 9 Aug 2010 16:37:58 -0400
> From: Joshua Hursey 
> Subject: Re: [OMPI users] Checkpointing mpi4py program
> To: Open MPI Users 
> Message-ID: <270bd450-743a-4662-9568-1fedfcc6f...@open-mpi.org>
> Content-Type: text/plain; charset=windows-1252
>
> I have not tried to checkpoint an mpi4py application, so I cannot say
> for sure if it works or not. You might be hitting something with the
> Python runtime interacting in an odd way with either Open MPI or BLCR.
>
> Can you attach a debugger and get a backtrace on a stuck checkpoint?
> That might show us where things are held up.
>
> -- Josh
>
>
> On Aug 9, 2010, at 4:04 PM, 
>  wrote:
>
>> Hi
>>
>> I have integrated mpi4py with openmpi 1.4.2 that was built with BLCR
> 0.8.2. When I run ompi-checkpoint on the program written using mpi4py, I
> see that the program sometimes doesn't resume after successful checkpoint
> creation. This doesn't always occur, meaning the program resumes after
> successful checkpoint creation most of the time and completes
> successfully. Has anyone tested the checkpoint/restart functionality
> with mpi4py programs? Are there any best practices that I should keep in
> mind while checkpointing mpi4py programs?
>>
>> Thanks for your time
>> -  Ananda




Re: [OMPI users] problem with .bashrc setting of openmpi

2010-08-13 Thread Gus Correa

Hi Sunita

My guess is that you are picking up the wrong mpiexec,
because of the way you set your PATH.
What do you get from "which mpiexec"?

Try *pre-pending" the OpenMPI path to the existing PATH,
instead of appending it (that's what you did with the LD_LIBRARY_PATH):

export PATH=/home/sunitap/soft/openmpi/bin:$PATH
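
and then re-check which one is picked up:

which mpiexec
mpiexec --version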

My $0.02
Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-

sun...@chem.iitb.ac.in wrote:

Dear Open-mpi users,

I installed openmpi-1.4.1 in my user area and then set the path for
openmpi in the .bashrc file as follows. However, I am still getting the
following error message whenever I start a parallel molecular dynamics
simulation using GROMACS. So every time I start the MD job, I need to
source the .bashrc file again.

Earlier, on another machine, I did the same thing and did not have any
problem.

Could you guys suggest what the problem might be?

.bashrc
#path for openmpi
export PATH=$PATH:/home/sunitap/soft/openmpi/bin
export CFLAGS="-I/home/sunitap/soft/openmpi/include"
export LDFLAGS="-L/home/sunitap/soft/openmpi/lib"
export LD_LIBRARY_PATH=/home/sunitap/soft/openmpi/lib:$LD_LIBRARY_PATH

== error message ==
mdrun_mpi: error while loading shared libraries: libmpi.so.0: cannot open
shared object file: No such file or directory



Thanks for any help.
Best regards,
Sunita





Re: [OMPI users] problem with .bashrc setting of openmpi

2010-08-13 Thread Jeff Squyres
You might want to make sure that this .bashrc is both the same and is
executed properly upon both interactive and non-interactive logins on all
the systems that you are running on.
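
A quick way to test the non-interactive case is something like this
(node01 is an illustrative node name):

ssh node01 'which mdrun_mpi ; echo $LD_LIBRARY_PATH'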


On Aug 13, 2010, at 1:57 AM, sun...@chem.iitb.ac.in wrote:

> Dear Open-mpi users,
> 
> I installed openmpi-1.4.1 in my user area and then set the path for
> openmpi in the .bashrc file as follows. However, I am still getting the
> following error message whenever I start a parallel molecular dynamics
> simulation using GROMACS. So every time I start the MD job, I need to
> source the .bashrc file again.
> 
> Earlier, on another machine, I did the same thing and did not have any
> problem.
> 
> Could you guys suggest what the problem might be?
> 
> .bashrc
> #path for openmpi
> export PATH=$PATH:/home/sunitap/soft/openmpi/bin
> export CFLAGS="-I/home/sunitap/soft/openmpi/include"
> export LDFLAGS="-L/home/sunitap/soft/openmpi/lib"
> export LD_LIBRARY_PATH=/home/sunitap/soft/openmpi/lib:$LD_LIBRARY_PATH
> 
> == error message ==
> mdrun_mpi: error while loading shared libraries: libmpi.so.0: cannot open
> shared object file: No such file or directory
> 
> 
> 
> Thanks for any help.
> Best regards,
> Sunita
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] users Digest, Vol 1658, Issue 2

2010-08-13 Thread Joshua Hursey
I probably won't have an opportunity to work on reproducing this on the 1.4.2 
The trunk has a bunch of bug fixes that probably will not be backported to the 
1.4 series (things have changed too much since that branch). So I would suggest 
trying the 1.5 series.

-- Josh

On Aug 13, 2010, at 10:12 AM,   
wrote:

> Josh
> 
> I am having problems compiling the sources from the latest trunk. It
> complains of libgomp.spec missing even though that file exists on my
> system. I will see if I have to change any other environment variables
> to have a successful compilation. I will keep you posted.
> 
> BTW, were you successful in reproducing the problem on a system with
> OpenMPI 1.4.2?
> 
> Thanks
> Ananda
> -Original Message-
> Date: Thu, 12 Aug 2010 09:12:26 -0400
> From: Joshua Hursey 
> Subject: Re: [OMPI users] Checkpointing mpi4py program
> To: Open MPI Users 
> Message-ID: <1f1445ab-9208-4ef0-af25-5926bd53c...@open-mpi.org>
> Content-Type: text/plain; charset=us-ascii
> 
> Can you try this with the current trunk (r23587 or later)?
> 
> I just added a number of new features and bug fixes, and I would be
> interested to see if it fixes the problem. In particular I suspect that
> this might be related to the Init/Finalize bounding of the checkpoint
> region.
> 
> -- Josh
> 
> On Aug 10, 2010, at 2:18 PM, 
>  wrote:
> 
>> Josh
>> 
>> Please find attached the python program that reproduces the hang that
>> I described. The initial part of this file describes the prerequisite
>> modules and the steps to reproduce the problem. Please let me know if
>> you have any questions in reproducing the hang.
>> 
>> Please note that, if I add the following lines at the end of the
>> program (in case sleep_time is True), the problem disappears, i.e.,
>> the program resumes successfully after successful completion of the
>> checkpoint.
>> # Add following lines at the end for sleep_time is True
>> else:
>>  time.sleep(0.1)
>> # End of added lines
>> 
>> 
>> Thanks a lot for your time in looking into this issue.
>> 
>> Regards
>> Ananda
>> 
>> Ananda B Mudar, PMP
>> Senior Technical Architect
>> Wipro Technologies
>> Ph: 972 765 8093
>> ananda.mu...@wipro.com
>> 
>> 
>> -Original Message-
>> Date: Mon, 9 Aug 2010 16:37:58 -0400
>> From: Joshua Hursey 
>> Subject: Re: [OMPI users] Checkpointing mpi4py program
>> To: Open MPI Users 
>> Message-ID: <270bd450-743a-4662-9568-1fedfcc6f...@open-mpi.org>
>> Content-Type: text/plain; charset=windows-1252
>> 
>> I have not tried to checkpoint an mpi4py application, so I cannot say
>> for sure if it works or not. You might be hitting something with the
>> Python runtime interacting in an odd way with either Open MPI or BLCR.
>> 
>> Can you attach a debugger and get a backtrace on a stuck checkpoint?
>> That might show us where things are held up.
>> 
>> -- Josh
>> 
>> 
>> On Aug 9, 2010, at 4:04 PM, 
>>  wrote:
>> 
>>> Hi
>>> 
>>> I have integrated mpi4py with openmpi 1.4.2 that was built with BLCR
>> 0.8.2. When I run ompi-checkpoint on the program written using mpi4py,
>> I see that the program sometimes doesn't resume after successful
>> checkpoint creation. This doesn't always occur, meaning the program
>> resumes after successful checkpoint creation most of the time and
>> completes successfully. Has anyone tested the checkpoint/restart
>> functionality with mpi4py programs? Are there any best practices that
>> I should keep in mind while checkpointing mpi4py programs?
>>> 
>>> Thanks for your time
>>> -  Ananda
> 

Re: [OMPI users] users Digest, Vol 1658, Issue 2

2010-08-13 Thread ananda.mudar
Josh

I am having problems compiling the sources from the latest trunk. It
complains of libgomp.spec missing even though that file exists on my
system. I will see if I have to change any other environment variables
to have a successful compilation. I will keep you posted.

BTW, were you successful in reproducing the problem on a system with
OpenMPI 1.4.2?

Thanks
Ananda
-Original Message-
Date: Thu, 12 Aug 2010 09:12:26 -0400
From: Joshua Hursey 
Subject: Re: [OMPI users] Checkpointing mpi4py program
To: Open MPI Users 
Message-ID: <1f1445ab-9208-4ef0-af25-5926bd53c...@open-mpi.org>
Content-Type: text/plain; charset=us-ascii

Can you try this with the current trunk (r23587 or later)?

I just added a number of new features and bug fixes, and I would be
interested to see if it fixes the problem. In particular I suspect that
this might be related to the Init/Finalize bounding of the checkpoint
region.

-- Josh

On Aug 10, 2010, at 2:18 PM, 
 wrote:

> Josh
> 
> Please find attached the python program that reproduces the hang that
> I described. The initial part of this file describes the prerequisite
> modules and the steps to reproduce the problem. Please let me know if
> you have any questions in reproducing the hang.
> 
> Please note that, if I add the following lines at the end of the
> program (in case sleep_time is True), the problem disappears, i.e.,
> the program resumes successfully after successful completion of the
> checkpoint.
> # Add following lines at the end for sleep_time is True
> else:
>   time.sleep(0.1)
> # End of added lines
> 
> 
> Thanks a lot for your time in looking into this issue.
> 
> Regards
> Ananda
> 
> Ananda B Mudar, PMP
> Senior Technical Architect
> Wipro Technologies
> Ph: 972 765 8093
> ananda.mu...@wipro.com
> 
> 
> -Original Message-
> Date: Mon, 9 Aug 2010 16:37:58 -0400
> From: Joshua Hursey 
> Subject: Re: [OMPI users] Checkpointing mpi4py program
> To: Open MPI Users 
> Message-ID: <270bd450-743a-4662-9568-1fedfcc6f...@open-mpi.org>
> Content-Type: text/plain; charset=windows-1252
> 
> I have not tried to checkpoint an mpi4py application, so I cannot say
> for sure if it works or not. You might be hitting something with the
> Python runtime interacting in an odd way with either Open MPI or BLCR.
> 
> Can you attach a debugger and get a backtrace on a stuck checkpoint?
> That might show us where things are held up.
> 
> -- Josh
> 
> 
> On Aug 9, 2010, at 4:04 PM, 
>  wrote:
> 
>> Hi
>> 
>> I have integrated mpi4py with openmpi 1.4.2 that was built with BLCR
> 0.8.2. When I run ompi-checkpoint on the program written using mpi4py,
> I see that the program sometimes doesn't resume after successful
> checkpoint creation. This doesn't always occur, meaning the program
> resumes after successful checkpoint creation most of the time and
> completes successfully. Has anyone tested the checkpoint/restart
> functionality with mpi4py programs? Are there any best practices that
> I should keep in mind while checkpointing mpi4py programs?
>> 
>> Thanks for your time
>> -  Ananda




Re: [OMPI users] problem with .bashrc setting of openmpi

2010-08-13 Thread Terry Dontje

sun...@chem.iitb.ac.in wrote:

Dear Open-mpi users,

I installed openmpi-1.4.1 in my user area and then set the path for
openmpi in the .bashrc file as follows. However, I am still getting the
following error message whenever I start a parallel molecular dynamics
simulation using GROMACS. So every time I start the MD job, I need to
source the .bashrc file again.

Earlier, on another machine, I did the same thing and did not have any
problem.

Could you guys suggest what the problem might be?

  

Have you set OPAL_PREFIX to /home/sunitap/soft/openmpi?

If you do an ldd on mdrun_mpi, does libmpi.so.0 come up as not found?
If so, and there truly is a libmpi.so.0 in /home/sunitap/soft/openmpi/lib,
you may want to make sure the bitness of libmpi.so.0 and mdrun_mpi is the
same by running the file command on both.
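
Something like the following, using the paths from your .bashrc:

export OPAL_PREFIX=/home/sunitap/soft/openmpi
ldd `which mdrun_mpi` | grep libmpi
file /home/sunitap/soft/openmpi/lib/libmpi.so.0
file `which mdrun_mpi`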

--td

.bashrc
#path for openmpi
export PATH=$PATH:/home/sunitap/soft/openmpi/bin
export CFLAGS="-I/home/sunitap/soft/openmpi/include"
export LDFLAGS="-L/home/sunitap/soft/openmpi/lib"
export LD_LIBRARY_PATH=/home/sunitap/soft/openmpi/lib:$LD_LIBRARY_PATH

== error message ==
mdrun_mpi: error while loading shared libraries: libmpi.so.0: cannot open
shared object file: No such file or directory



Thanks for any help.
Best regards,
Sunita




--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.650.633.7054
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com



Re: [OMPI users] problem with .bashrc setting of openmpi

2010-08-13 Thread Cristobal Navarro
Hello Sunita,

What Linux distribution is this?

On Fri, Aug 13, 2010 at 1:57 AM,  wrote:

> Dear Open-mpi users,
>
> I installed openmpi-1.4.1 in my user area and then set the path for
> openmpi in the .bashrc file as follows. However, I am still getting the
> following error message whenever I start a parallel molecular dynamics
> simulation using GROMACS. So every time I start the MD job, I need to
> source the .bashrc file again.
>
> Earlier, on another machine, I did the same thing and did not have any
> problem.
>
> Could you guys suggest what the problem might be?
>
> .bashrc
> #path for openmpi
> export PATH=$PATH:/home/sunitap/soft/openmpi/bin
> export CFLAGS="-I/home/sunitap/soft/openmpi/include"
> export LDFLAGS="-L/home/sunitap/soft/openmpi/lib"
> export LD_LIBRARY_PATH=/home/sunitap/soft/openmpi/lib:$LD_LIBRARY_PATH
>
> == error message ==
> mdrun_mpi: error while loading shared libraries: libmpi.so.0: cannot open
> shared object file: No such file or directory
>
> 
>
> Thanks for any help.
> Best regards,
> Sunita
>


[OMPI users] problem with .bashrc setting of openmpi

2010-08-13 Thread sunita
Dear Open-mpi users,

I installed openmpi-1.4.1 in my user area and then set the path for
openmpi in the .bashrc file as follows. However, I am still getting the
following error message whenever I start a parallel molecular dynamics
simulation using GROMACS. So every time I start the MD job, I need to
source the .bashrc file again.

Earlier, on another machine, I did the same thing and did not have any
problem.

Could you guys suggest what the problem might be?

.bashrc
#path for openmpi
export PATH=$PATH:/home/sunitap/soft/openmpi/bin
export CFLAGS="-I/home/sunitap/soft/openmpi/include"
export LDFLAGS="-L/home/sunitap/soft/openmpi/lib"
export LD_LIBRARY_PATH=/home/sunitap/soft/openmpi/lib:$LD_LIBRARY_PATH

== error message ==
mdrun_mpi: error while loading shared libraries: libmpi.so.0: cannot open
shared object file: No such file or directory



Thanks for any help.
Best regards,
Sunita