[OMPI users] Is Iprobe fast when there is no message to receive

2009-10-01 Thread Peter Lonjers
I am not sure if this is the right place to ask this question, but here
it goes.

Simplified, abstract version of the question:
I have 2 MPI processes and I want one to send an occasional signal to
the other process.  These signals will not happen at predictable times.
I want the other process, sitting in some kind of work loop, to be able
to make a very fast check to see whether a signal has been sent to it.

What is the best way to do this?

Actual problem:
I am working on a realistic neural net simulator. The neurons are split
into groups, with one group simulated on each processor.
Occasionally a neuron will spike and has to send that message to
neurons on a different processor. This is a relatively rare event. The
receiving neurons need to be able to make a very fast check to see if
there is a message from neurons on another processor.

The way I am doing it now is to use simple send and receive commands.
On every loop through the simulation, the receiving cell does an Iprobe
check for every cell that connects to it to see if there is a
message (spike) from that cell. If the Iprobe says there is a message,
it does a receive on that message.
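
In outline, the check looks something like this (the tag, the int
payload, and the helper name are just placeholders, not my real code):

#include <mpi.h>

#define SPIKE_TAG 42   /* arbitrary tag used for spike messages */

/* Poll once per simulation step for spikes from each connected rank. */
void poll_for_spikes(const int *sources, int nsources)
{
    int i;
    for (i = 0; i < nsources; ++i) {
        int flag = 0;
        MPI_Status status;
        /* Non-blocking check: returns immediately, flag = 1 only if a
           matching message is waiting. */
        MPI_Iprobe(sources[i], SPIKE_TAG, MPI_COMM_WORLD, &flag, &status);
        if (flag) {
            int spike;
            MPI_Recv(&spike, 1, MPI_INT, sources[i], SPIKE_TAG,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            /* ... deliver the spike to the local cells ... */
        }
    }
}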

This seems convoluted though. I do not actually need to receive the
message, just know that a message is there. And it seems like, depending
on how Iprobe works, there might be a faster method.

Is Iprobe fast if there is no message to receive?
Would persistent connections work better?



Anyway any help would be greatly appreciated.



Re: [OMPI users] Profiling OpenMPI routines

2009-10-01 Thread Eugene Loh

Aniruddha Marathe wrote:


I am trying to profile (get the call graph/call sequence of) Open MPI
communication routines using GNU Profiler (gprof) since the
communication calls are implemented using macros and it's harder to
trace them statically. In order to do that, I compiled the OpenMPI
source code with the following options supplied to the 'configure' tool:

./configure CFLAGS=-pg CPPFLAGS=-pg --enable-debug
--prefix=/home/amarathe/mpi/svn_openmpi/install

When I recompiled my test MPI application that does MPI_Send and
MPI_Recv with the new library, it generated a gmon.out file as expected
(I ran it as 'mpirun -np 2 send_recv'). However, running 'gprof' on
this file didn't provide any information such as the call graphs for
MPI_Send or MPI_Recv. The following is the only function call that I see
in the output:

$ gprof send_recv gmon.out
...
...
  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
 0.00      0.00     0.00       25     0.00     0.00  data_start
...
...

I would like to know if anyone has done something similar with gprof
or any other open source tool with OpenMPI code.

(I found a similar, fairly recent post on the mailing list, but it
seems to talk about profiling the MPI application itself and not the
OpenMPI library routines -
http://www.open-mpi.org/community/lists/users/2009/04/8999.php)
 

Open source tool or free download?  That is, do you really need to be 
able to see the tool's source code, or are you just interested in 
avoiding license fees?  In any case, since that post you mention, a FAQ 
has appeared on performance tools.  Check 
http://www.open-mpi.org/faq/?category=perftools


You make an important distinction between profiling MPI applications 
versus profiling the library itself, and many tools will help just with 
applications.  But I've used Sun Studio for profiling Open MPI.  
Ideally, you should ./configure with -g among the compilation switches 
so that you get symbolic information about the library, but that isn't 
necessary.  The use of macros and dynamically loaded objects makes 
correlating profiles with source code hard, but it works.  When you 
bring the Analyzer up, I think you also have to unhide the symbols 
within the MPI library, which as I remember are hidden by default.  
Anyhow, it works and I've learned a lot doing things this way.
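
For the application-side case, another lightweight option is the standard
MPI profiling (PMPI) interface: you interpose your own MPI_* wrappers and
forward to the PMPI_* entry points.  That only shows you the MPI API level,
not Open MPI's internals, but it needs no special tools.  A minimal sketch
(the counter and the output format are just illustrative):

#include <mpi.h>
#include <stdio.h>

/* Link this file with the application; the linker resolves MPI_Barrier
   here and we forward to the real implementation via PMPI_Barrier. */
static long barrier_calls = 0;

int MPI_Barrier(MPI_Comm comm)
{
    barrier_calls++;                 /* our instrumentation */
    return PMPI_Barrier(comm);       /* the real barrier */
}

int MPI_Finalize(void)
{
    int rank;
    PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
    fprintf(stderr, "rank %d: MPI_Barrier called %ld times\n",
            rank, barrier_calls);
    return PMPI_Finalize();
}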


Re: [OMPI users] memalign usage in OpenMPI and its consequences for TotalView

2009-10-01 Thread Jeff Squyres

On Oct 1, 2009, at 3:27 PM, Åke Sandgren wrote:


Yes, but perhaps you need to verify that posix_memalign is also
intercepted?



Er... right.  Of course.  :-)

https://svn.open-mpi.org/trac/ompi/changeset/22045

I commented on memalign being obsolete since there are a couple of uses
of it in the rest of the openmpi code apart from that particular case.
They should probably be changed.




Some of those are in ROMIO; we don't really want to change those -- it  
just makes it harder to import new versions (speaking of which, we're  
due for a ROMIO refresh sometime in the 1.5 series).


--
Jeff Squyres
jsquy...@cisco.com




[OMPI users] Profiling OpenMPI routines

2009-10-01 Thread Aniruddha Marathe
Hi All,

I am trying to profile (get the call graph/call sequence of) Open MPI
communication routines using GNU Profiler (gprof) since the
communication calls are implemented using macros and it's harder to
trace them statically. In order to do that, I compiled the OpenMPI
source code with the following options supplied to the 'configure' tool:

./configure CFLAGS=-pg CPPFLAGS=-pg --enable-debug
--prefix=/home/amarathe/mpi/svn_openmpi/install

When I recompiled my test MPI application that does MPI_Send and
MPI_Recv with the new library, it generated a gmon.out file as expected
(I ran it as 'mpirun -np 2 send_recv'). However, running 'gprof' on
this file didn't provide any information such as the call graphs for
MPI_Send or MPI_Recv. The following is the only function call that I see
in the output:

$ gprof send_recv gmon.out
...
...
  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
 0.00      0.00     0.00       25     0.00     0.00  data_start
...
...

I would like to know if anyone has done something similar with gprof
or any other open source tool with OpenMPI code.

(I found a similar, fairly recent post on the mailing list, but it
seems to talk about profiling the MPI application itself and not the
OpenMPI library routines -
http://www.open-mpi.org/community/lists/users/2009/04/8999.php)

Thanks & Regards,
Aniruddha


Re: [OMPI users] memalign usage in OpenMPI and its consequences for TotalView

2009-10-01 Thread Åke Sandgren
On Thu, 2009-10-01 at 15:19 -0400, Jeff Squyres wrote:
> On Oct 1, 2009, at 2:19 PM, Åke Sandgren wrote:
> 
> > No it didn't. And memalign is obsolete according to the manpage.
> > posix_memalign is the one to use.
> >
> 
> 
> This particular call is testing the memalign intercept in the ptmalloc  
> component during startup; we can't replace it with posix_memalign.
> 
> Hence, the values that are passed are fairly meaningless.  It's just  
> testing that the intercept works.

Yes, but perhaps you need to verify that posix_memalign is also
intercepted?

I commented on memalign being obsolete since there are a couple of uses
of it in the rest of the openmpi code apart from that particular case.
They should probably be changed.



Re: [OMPI users] memalign usage in OpenMPI and its consequences for TotalView

2009-10-01 Thread Jeff Squyres

On Oct 1, 2009, at 2:19 PM, Åke Sandgren wrote:


No it didn't. And memalign is obsolete according to the manpage.
posix_memalign is the one to use.




This particular call is testing the memalign intercept in the ptmalloc  
component during startup; we can't replace it with posix_memalign.


Hence, the values that are passed are fairly meaningless.  It's just  
testing that the intercept works.


--
Jeff Squyres
jsquy...@cisco.com




Re: [OMPI users] memalign usage in OpenMPI and its consequences for TotalView

2009-10-01 Thread Peter Thompson
The value of 4 might be invalid (though maybe on a 32-bit machine it would be
okay?), but it's enough to allow TotalView to continue on without raising a
memory event, so I'm okay with it ;-)


PeterT

Ashley Pittman wrote:

Simple malloc() returns pointers that are at least eight-byte aligned
anyway; I'm not sure what the reason for calling memalign() with a value
of four would be.

Ashley,

On Thu, 2009-10-01 at 20:19 +0200, Åke Sandgren wrote:

No it didn't. And memalign is obsolete according to the manpage.
posix_memalign is the one to use.



https://svn.open-mpi.org/trac/ompi/changeset/21744




Re: [OMPI users] memalign usage in OpenMPI and its consequences for TotalView

2009-10-01 Thread Samuel K. Gutierrez
Good point.  That particular call to memalign, however, is part of a
series of OMPI memory hook tests.  The memory allocated by that
memalign call is promptly freed
(opal/mca/memory/ptmalloc2/opal_ptmalloc2_component.c : line 111).
The change is to silence TotalView's memory alignment error when memory
debugging is enabled.


--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Oct 1, 2009, at 12:56 PM, Ashley Pittman wrote:



Simple malloc() returns pointers that are at least eight-byte aligned
anyway; I'm not sure what the reason for calling memalign() with a value
of four would be.

Ashley,

On Thu, 2009-10-01 at 20:19 +0200, Åke Sandgren wrote:

No it didn't. And memalign is obsolete according to the manpage.
posix_memalign is the one to use.



https://svn.open-mpi.org/trac/ompi/changeset/21744


--

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users





Re: [OMPI users] memalign usage in OpenMPI and its consequences for TotalView

2009-10-01 Thread Åke Sandgren
On Thu, 2009-10-01 at 19:56 +0100, Ashley Pittman wrote:
> Simple malloc() returns pointers that are at least eight byte aligned
> anyway, I'm not sure what the reason for calling memalign() with a value
> of four would be anyway.

That is not necessarily true on all systems.

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90 7866126
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se



[OMPI users] Are there ways to reduce the memory used by OpenMPI?

2009-10-01 Thread Blosch, Edwin L
Are there any tuning parameters that I can use to reduce the amount of memory 
used by OpenMPI?  I would very much like to use OpenMPI instead of MVAPICH, but 
I'm on a cluster where memory usage is the most important consideration. Here 
are three results which capture the problem:

With the "leave_pinned" behavior turned on, I get good performance (19.528, 
lower is better)

mpirun --prefix /usr/mpi/intel/openmpi-1.2.8 --machinefile 
/var/spool/torque/aux/7972.fwnaeglingio -np 28 --mca btl ^tcp  --mca 
mpi_leave_pinned 1 --mca mpool_base_use_mem_hooks 1 -x LD_LIBRARY_PATH -x 
MPI_ENVIRONMENT=1 /tmp/7972.fwnaeglingio/falconv4_ibm_openmpi -cycles 100 -ri 
restart.0 -ro /tmp/7972.fwnaeglingio/restart.0
Compute rate (processor-microseconds/cell/cycle):   19.528
Total memory usage:38155.3477 MB (38.1553 GB)


Turning off the leave_pinned behavior, I get considerably slower performance 
(28.788), but the memory usage is unchanged (still 38 GB)

mpirun --prefix /usr/mpi/intel/openmpi-1.2.8 --machinefile 
/var/spool/torque/aux/7972.fwnaeglingio -np 28 -x LD_LIBRARY_PATH -x 
MPI_ENVIRONMENT=1 /tmp/7972.fwnaeglingio/falconv4_ibm_openmpi -cycles 100 -ri 
restart.0 -ro /tmp/7972.fwnaeglingio/restart.0
Compute rate (processor-microseconds/cell/cycle):   28.788
Total memory usage:38335.7656 MB (38.3358 GB)


Using MVAPICH, the performance is in the middle (23.6), but the memory usage is 
reduced by 5 to 6 GB out of 38 GB, a significant decrease to me.

/usr/mpi/intel/mvapich-1.1.0/bin/mpirun_rsh -ssh -np 28 -hostfile 
/var/spool/torque/aux/7972.fwnaeglingio 
LD_LIBRARY_PATH="/usr/mpi/intel/mvapich-1.1.0/lib/shared:/usr/mpi/intel/openmpi-1.2.8/lib64:/appserv/intel/fce/10.1.008/lib:/appserv/intel/cce/10.1.008/lib"
 MPI_ENVIRONMENT=1 /tmp/7972.fwnaeglingio/falconv4_ibm_mvapich -cycles 100 -ri 
restart.0 -ro /tmp/7972.fwnaeglingio/restart.0
Compute rate (processor-microseconds/cell/cycle):   23.608
Total memory usage:32753.0586 MB (32.7531 GB)


I didn't see anything in the FAQ that discusses memory usage other than the 
impact of the "leave_pinned" option, which apparently does not affect the 
memory usage in my case.  But I figure there must be a justification why 
OpenMPI would use 6 GB more than MVAPICH on the same case.

Thanks for any insights.  Also attached is the output of ompi_info -a.






ompi_info.output
Description: ompi_info.output


Re: [OMPI users] memalign usage in OpenMPI and its consequences for TotalView

2009-10-01 Thread Ashley Pittman

Simple malloc() returns pointers that are at least eight-byte aligned
anyway; I'm not sure what the reason for calling memalign() with a value
of four would be.

Ashley,

On Thu, 2009-10-01 at 20:19 +0200, Åke Sandgren wrote:
> No it didn't. And memalign is obsolete according to the manpage.
> posix_memalign is the one to use.

> > > https://svn.open-mpi.org/trac/ompi/changeset/21744

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk




Re: [OMPI users] memalign usage in OpenMPI and its consequences for TotalView

2009-10-01 Thread Peter Thompson
Took a look at the changes and that looks like it should work.  It's certainly 
not in 1.3.3, but as long as you guys are on top of it, that relieves my 
concerns ;-)


Thanks,
PeterT


Samuel K. Gutierrez wrote:

Ticket created (#2040).  I hope it's okay ;-).

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Oct 1, 2009, at 11:58 AM, Jeff Squyres wrote:


Did that make it over to the v1.3 branch?


On Oct 1, 2009, at 1:39 PM, Samuel K. Gutierrez wrote:


Hi,

I think Jeff has already addressed this problem.

https://svn.open-mpi.org/trac/ompi/changeset/21744

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Oct 1, 2009, at 11:25 AM, Peter Thompson wrote:

> We had a question from a user who had turned on memory debugging in
> TotalView and experience a memory event error Invalid memory
> alignment request.  Having a 1.3.3 build of OpenMPI handy, I tested
> it and sure enough, saw the error.  I traced it down to, surprise, a
> call to memalign.  I find there are a few places where memalign is
> called, but the one I think I was dealing with was from malloc.c in
> ompi/mca//io/romio/romio/adio/common in the following lines:
>
>
> #ifdef ROMIO_XFS
>new = (void *) memalign(XFS_MEMALIGN, size);
> #else
>new = (void *) malloc(size);
> #endif
>
> I searched, but couldn't find a value for XFS_MEMALIGN, so maybe it
> was from opal_pt_malloc2_component.c instead, where the call is
>
>p = memalign(1, 1024 * 1024);
>
> There are only 10 to 12 references to memalign in the code that I
> can see, so it shouldn't be too hard to find.  What I can tell you
> is that the value that TotalView saw for alignment, the first arg,
> was 1, and the second, the size, was  0x10, which is probably
> right for 1024 squared.
>
> The man page for memalign says that the first argument is the
> alignment that the allocated memory use, and it must be a power of
> two.  The second is the length you want allocated.  One could argue
> that 1 is a power of two, but it seems a bit specious to me, and
> TotalView's memory debugger certainly objects to it. Can anyone tell
> me what the intent here is, and whether the memalign alignment
> argument is thought to be valid?  Or is this a bug (that might not
> affect anyone other than TotalView memory debug users?)
>
> Thanks,
> Peter Thompson
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




--
Jeff Squyres
jsquy...@cisco.com

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] memalign usage in OpenMPI and its consequences for TotalView

2009-10-01 Thread Åke Sandgren
On Thu, 2009-10-01 at 13:58 -0400, Jeff Squyres wrote:
> Did that make it over to the v1.3 branch?

No it didn't. And memalign is obsolete according to the manpage.
posix_memalign is the one to use.
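
For reference, a minimal sketch of the two calls side by side (the size and
alignment values here are just illustrative); note that posix_memalign
additionally requires the alignment to be a multiple of sizeof(void *):

#include <stdlib.h>
#include <malloc.h>   /* memalign() on glibc */

int main(void)
{
    size_t size = 1024 * 1024;

    /* Legacy interface: alignment must be a power of two. */
    void *a = memalign(64, size);

    /* POSIX replacement: alignment must be a power of two AND a
       multiple of sizeof(void *); the result comes back through a
       pointer and errors through the return code, not errno. */
    void *b = NULL;
    int rc = posix_memalign(&b, 64, size);

    free(a);
    if (rc == 0)
        free(b);
    return 0;
}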

> >
> > I think Jeff has already addressed this problem.
> >
> > https://svn.open-mpi.org/trac/ompi/changeset/21744




Re: [OMPI users] memalign usage in OpenMPI and its consequences for TotalView

2009-10-01 Thread Samuel K. Gutierrez

Ticket created (#2040).  I hope it's okay ;-).

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Oct 1, 2009, at 11:58 AM, Jeff Squyres wrote:


Did that make it over to the v1.3 branch?


On Oct 1, 2009, at 1:39 PM, Samuel K. Gutierrez wrote:


Hi,

I think Jeff has already addressed this problem.

https://svn.open-mpi.org/trac/ompi/changeset/21744

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Oct 1, 2009, at 11:25 AM, Peter Thompson wrote:

> We had a question from a user who had turned on memory debugging in
> TotalView and experience a memory event error Invalid memory
> alignment request.  Having a 1.3.3 build of OpenMPI handy, I tested
> it and sure enough, saw the error.  I traced it down to,  
surprise, a

> call to memalign.  I find there are a few places where memalign is
> called, but the one I think I was dealing with was from malloc.c in
> ompi/mca//io/romio/romio/adio/common in the following lines:
>
>
> #ifdef ROMIO_XFS
>new = (void *) memalign(XFS_MEMALIGN, size);
> #else
>new = (void *) malloc(size);
> #endif
>
> I searched, but couldn't find a value for XFS_MEMALIGN, so maybe it
> was from opal_pt_malloc2_component.c instead, where the call is
>
>p = memalign(1, 1024 * 1024);
>
> There are only 10 to 12 references to memalign in the code that I
> can see, so it shouldn't be too hard to find.  What I can tell you
> is that the value that TotalView saw for alignment, the first arg,
> was 1, and the second, the size, was  0x10, which is probably
> right for 1024 squared.
>
> The man page for memalign says that the first argument is the
> alignment that the allocated memory use, and it must be a power of
> two.  The second is the length you want allocated.  One could argue
> that 1 is a power of two, but it seems a bit specious to me, and
> TotalView's memory debugger certainly objects to it. Can anyone  
tell

> me what the intent here is, and whether the memalign alignment
> argument is thought to be valid?  Or is this a bug (that might not
> affect anyone other than TotalView memory debug users?)
>
> Thanks,
> Peter Thompson
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




--
Jeff Squyres
jsquy...@cisco.com

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] openmpi 1.4 and barrier

2009-10-01 Thread Michael Di Domenico
Hmm, I don't recall seeing that...

On Thu, Oct 1, 2009 at 1:51 PM, Jeff Squyres  wrote:
> FWIW, I saw this bug to have race-condition-like behavior.  I could run a
> few times and then it would work.
>
> On Oct 1, 2009, at 1:42 PM, Michael Di Domenico wrote:
>
>> On Thu, Oct 1, 2009 at 1:37 PM, Jeff Squyres  wrote:
>> > On Oct 1, 2009, at 1:24 PM, Michael Di Domenico wrote:
>> >
>> >> I just upgraded to the devel snapshot of 1.4a1r22031
>> >>
>> >> when i run a simple hello world with a barrier i get
>> >>
>> >> btl_tcp_endpoint.c:484:mca_btl_tcp_endpoint_recv_connect_ack] received
>> >> unexpected process identifier
>> >
>> > I have seen this failure over the last day or three myself.  I'll file a
>> > trac ticket about it.
>> >
>> > (all's fair in love, war, and trunk development snapshots!)
>>
>> Okay, thanks...  Unfortunately i need the dev snap for slurm
>> intergration... :(
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] memalign usage in OpenMPI and its consequences for TotalView

2009-10-01 Thread Jeff Squyres

Did that make it over to the v1.3 branch?


On Oct 1, 2009, at 1:39 PM, Samuel K. Gutierrez wrote:


Hi,

I think Jeff has already addressed this problem.

https://svn.open-mpi.org/trac/ompi/changeset/21744

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Oct 1, 2009, at 11:25 AM, Peter Thompson wrote:

> We had a question from a user who had turned on memory debugging in
> TotalView and experience a memory event error Invalid memory
> alignment request.  Having a 1.3.3 build of OpenMPI handy, I tested
> it and sure enough, saw the error.  I traced it down to, surprise, a
> call to memalign.  I find there are a few places where memalign is
> called, but the one I think I was dealing with was from malloc.c in
> ompi/mca//io/romio/romio/adio/common in the following lines:
>
>
> #ifdef ROMIO_XFS
>new = (void *) memalign(XFS_MEMALIGN, size);
> #else
>new = (void *) malloc(size);
> #endif
>
> I searched, but couldn't find a value for XFS_MEMALIGN, so maybe it
> was from opal_pt_malloc2_component.c instead, where the call is
>
>p = memalign(1, 1024 * 1024);
>
> There are only 10 to 12 references to memalign in the code that I
> can see, so it shouldn't be too hard to find.  What I can tell you
> is that the value that TotalView saw for alignment, the first arg,
> was 1, and the second, the size, was  0x10, which is probably
> right for 1024 squared.
>
> The man page for memalign says that the first argument is the
> alignment that the allocated memory use, and it must be a power of
> two.  The second is the length you want allocated.  One could argue
> that 1 is a power of two, but it seems a bit specious to me, and
> TotalView's memory debugger certainly objects to it. Can anyone tell
> me what the intent here is, and whether the memalign alignment
> argument is thought to be valid?  Or is this a bug (that might not
> affect anyone other than TotalView memory debug users?)
>
> Thanks,
> Peter Thompson
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] openmpi 1.4 and barrier

2009-10-01 Thread Jeff Squyres
FWIW, I have seen this bug show race-condition-like behavior: I could
run a few times and then it would work.


On Oct 1, 2009, at 1:42 PM, Michael Di Domenico wrote:

On Thu, Oct 1, 2009 at 1:37 PM, Jeff Squyres   
wrote:

> On Oct 1, 2009, at 1:24 PM, Michael Di Domenico wrote:
>
>> I just upgraded to the devel snapshot of 1.4a1r22031
>>
>> when i run a simple hello world with a barrier i get
>>
>> btl_tcp_endpoint.c:484:mca_btl_tcp_endpoint_recv_connect_ack]  
received

>> unexpected process identifier
>
> I have seen this failure over the last day or three myself.  I'll  
file a

> trac ticket about it.
>
> (all's fair in love, war, and trunk development snapshots!)

Okay, thanks...  Unfortunately I need the dev snapshot for slurm
integration... :(

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] openmpi 1.4 and barrier

2009-10-01 Thread Michael Di Domenico
On Thu, Oct 1, 2009 at 1:37 PM, Jeff Squyres  wrote:
> On Oct 1, 2009, at 1:24 PM, Michael Di Domenico wrote:
>
>> I just upgraded to the devel snapshot of 1.4a1r22031
>>
>> when i run a simple hello world with a barrier i get
>>
>> btl_tcp_endpoint.c:484:mca_btl_tcp_endpoint_recv_connect_ack] received
>> unexpected process identifier
>
> I have seen this failure over the last day or three myself.  I'll file a
> trac ticket about it.
>
> (all's fair in love, war, and trunk development snapshots!)

Okay, thanks...  Unfortunately I need the dev snapshot for slurm integration... :(


Re: [OMPI users] memalign usage in OpenMPI and its consequences for TotalView

2009-10-01 Thread Samuel K. Gutierrez

Hi,

I think Jeff has already addressed this problem.

https://svn.open-mpi.org/trac/ompi/changeset/21744

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Oct 1, 2009, at 11:25 AM, Peter Thompson wrote:

We had a question from a user who had turned on memory debugging in  
TotalView and experience a memory event error Invalid memory  
alignment request.  Having a 1.3.3 build of OpenMPI handy, I tested  
it and sure enough, saw the error.  I traced it down to, surprise, a  
call to memalign.  I find there are a few places where memalign is  
called, but the one I think I was dealing with was from malloc.c in  
ompi/mca//io/romio/romio/adio/common in the following lines:



#ifdef ROMIO_XFS
   new = (void *) memalign(XFS_MEMALIGN, size);
#else
   new = (void *) malloc(size);
#endif

I searched, but couldn't find a value for XFS_MEMALIGN, so maybe it  
was from opal_pt_malloc2_component.c instead, where the call is


   p = memalign(1, 1024 * 1024);

There are only 10 to 12 references to memalign in the code that I  
can see, so it shouldn't be too hard to find.  What I can tell you  
is that the value that TotalView saw for alignment, the first arg,  
was 1, and the second, the size, was  0x10, which is probably  
right for 1024 squared.


The man page for memalign says that the first argument is the  
alignment that the allocated memory use, and it must be a power of  
two.  The second is the length you want allocated.  One could argue  
that 1 is a power of two, but it seems a bit specious to me, and  
TotalView's memory debugger certainly objects to it. Can anyone tell  
me what the intent here is, and whether the memalign alignment  
argument is thought to be valid?  Or is this a bug (that might not  
affect anyone other than TotalView memory debug users?)


Thanks,
Peter Thompson
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] openmpi 1.4 and barrier

2009-10-01 Thread Jeff Squyres

On Oct 1, 2009, at 1:24 PM, Michael Di Domenico wrote:


I just upgraded to the devel snapshot of 1.4a1r22031

when i run a simple hello world with a barrier i get

btl_tcp_endpoint.c:484:mca_btl_tcp_endpoint_recv_connect_ack] received
unexpected process identifier




I have seen this failure over the last day or three myself.  I'll file  
a trac ticket about it.


(all's fair in love, war, and trunk development snapshots!)

--
Jeff Squyres
jsquy...@cisco.com



[OMPI users] memalign usage in OpenMPI and its consequences for TotalView

2009-10-01 Thread Peter Thompson
We had a question from a user who had turned on memory debugging in TotalView 
and experienced a memory event error, "Invalid memory alignment request".  Having a 
1.3.3 build of OpenMPI handy, I tested it and sure enough, saw the error.  I 
traced it down to, surprise, a call to memalign.  I find there are a few places 
where memalign is called, but the one I think I was dealing with was from 
malloc.c in ompi/mca//io/romio/romio/adio/common in the following lines:



#ifdef ROMIO_XFS
new = (void *) memalign(XFS_MEMALIGN, size);
#else
new = (void *) malloc(size);
#endif

I searched, but couldn't find a value for XFS_MEMALIGN, so maybe it was from 
opal_pt_malloc2_component.c instead, where the call is


p = memalign(1, 1024 * 1024);

There are only 10 to 12 references to memalign in the code that I can see, so it 
shouldn't be too hard to find.  What I can tell you is that the value that 
TotalView saw for alignment, the first arg, was 1, and the second, the size, was 
 0x10, which is probably right for 1024 squared.


The man page for memalign says that the first argument is the alignment that the 
allocated memory uses, and it must be a power of two.  The second is the length 
you want allocated.  One could argue that 1 is a power of two, but it seems a 
bit specious to me, and TotalView's memory debugger certainly objects to it. 
Can anyone tell me what the intent here is, and whether the memalign alignment 
argument is thought to be valid?  Or is this a bug (that might not affect anyone 
other than TotalView memory debug users)?


Thanks,
Peter Thompson


[OMPI users] openmpi 1.4 and barrier

2009-10-01 Thread Michael Di Domenico
I just upgraded to the devel snapshot of 1.4a1r22031.

When I run a simple hello world with a barrier, I get

btl_tcp_endpoint.c:484:mca_btl_tcp_endpoint_recv_connect_ack] received
unexpected process identifier

If I pull the barrier out, the hello world runs fine.

Interestingly enough, I can run IMB, which also uses a barrier, and it
runs just fine.

Any thoughts?
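
For reference, a minimal reproducer along the lines described above (the
printf is just illustrative; the barrier is where the error shows up):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("hello from rank %d of %d\n", rank, size);

    /* This is the call that triggers the TCP BTL error above. */
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}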


Re: [OMPI users] How to force the configure, and make to build a 32 bit opmi on a 64 bit linux ?

2009-10-01 Thread Jeff Squyres
You probably just want to pass the relevant compiler/linker flags to  
Open MPI's configure script, such as:


  ./configure CFLAGS=-m32 CXXFLAGS=-m32 FFLAGS=-m32 FCFLAGS=-m32 ...

You need to pass them to all four language flags (C, C++, F77, F90).   
I used -m32 as an example here; use whatever flag(s) is(are) relevant  
for your compiler.




On Oct 1, 2009, at 10:43 AM, Nader Ahmadi wrote:


Hello,

We have a 64-bit Linux box. For a number of reasons I need to build a
32-bit openMPI.

I have searched the FAQ and archived mail, but I couldn't find a good
answer. There are some references to this question in the developer
mailing list, with no clear response.

What I am looking for is:
How do I force configure and make to build a 32-bit OMPI on a 64-bit
Linux?




Thanks

Nader,

.

___

users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
jsquy...@cisco.com




[OMPI users] How to force the configure, and make to build a 32 bit opmi on a 64 bit linux ?

2009-10-01 Thread Nader Ahmadi

Hello,

We have a 64-bit Linux box. For a number of reasons I need to build a
32-bit openMPI.

I have searched the FAQ and archived mail, but I couldn't find a good
answer. There are some references to this question in the developer
mailing list, with no clear response.

What I am looking for is:
How do I force configure and make to build a 32-bit OMPI on a 64-bit
Linux?

 

 

 

Thanks

 

Nader,

 

.
  
_
Microsoft brings you a new way to search the web.  Try  Bing™ now
http://www.bing.com?form=MFEHPG&publ=WLHMTAG&crea=TEXT_MFEHPG_Core_tagline_try 
bing_1x1

Re: [OMPI users] how to SPMD on openmpi

2009-10-01 Thread Jeff Squyres
Open MPI is a compliant MPI-2.1 implementation, meaning that your MPI  
applications are source compatible with other MPI 2.1  
implementations.  In short: use MPI_Send and all the other MPI_*  
functions that you're used to.
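
For example, a minimal sketch that uses only the standard MPI_* calls and is
built with mpicc and launched with mpirun like any other MPI program (the
payload is just illustrative):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                                  /* illustrative payload */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}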



On Oct 1, 2009, at 6:36 AM, ankur pachauri wrote:


hi vipin,

thanks for the answer, but one thing more: does openmpi have slightly
different library functions than mpi, or is its usage different
(such as having to use ompi_** instead of mpi_** functions)?



thanks in advance

On Thu, Oct 1, 2009 at 2:53 PM, vipin kumar   
wrote:

Hi Ankur,

try this command,

$ mpirun -np 2 -host firstHostIp,secondHostIp a.out

for details read manual page for "mpirun".

$ man mpirun


Regards,


On Wed, Sep 30, 2009 at 3:22 PM, ankur pachauri wrote:

Dear all,

I have been able to install open mpi on two independent machines
running FC 10. The simple hello world programs are running fine on
the independent machines. But can anyone please help me by letting
me know how to connect the two machines and run a common program
between the two? How do we do a "lamboot -v lamhosts" in the case of
openmpi?
How do we get open mpi running on the two computers
simultaneously and execute a common program on the two machines?


Thanks in advance

--
Ankur Pachauri.
09927590910

Research Scholar,
software engineering.
Department of Mathematics
Dayalbagh Educational Institute
Dayalbagh,
AGRA

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Vipin K.
Research Engineer,
C-DOTB, India

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Ankur Pachauri.
09927590910

Research Scholar,
software engineering.
Department of Mathematics
Dayalbagh Educational Institute
Dayalbagh,
AGRA
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] MPI_Comm_accept()/connect() errors

2009-10-01 Thread Blesson Varghese
The following is the information regarding the error. I am running Open MPI
1.2.5 on Ubuntu 4.2.4, kernel version 2.6.24



I ran the server program as mpirun -np 1 server. This program gave me the
output port as 0.1.0:2000. I used this port name value as the command line
argument for the client program: mpirun -np 1 client 0.1.1:2000.
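
For reference, a minimal sketch of the accept/connect pattern being tested
here, with the port name handed to the client on the command line (error
handling is omitted and the structure is only illustrative):

#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Comm intercomm;

    MPI_Init(&argc, &argv);

    if (argc == 1) {                       /* server: no argument given */
        MPI_Open_port(MPI_INFO_NULL, port_name);
        printf("server port: %s\n", port_name);
        MPI_Comm_accept(port_name, MPI_INFO_NULL, 0,
                        MPI_COMM_SELF, &intercomm);
        MPI_Close_port(port_name);
    } else {                               /* client: port name is argv[1] */
        strncpy(port_name, argv[1], MPI_MAX_PORT_NAME - 1);
        port_name[MPI_MAX_PORT_NAME - 1] = '\0';
        MPI_Comm_connect(port_name, MPI_INFO_NULL, 0,
                         MPI_COMM_SELF, &intercomm);
    }

    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}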



- The output of the "ompi_info --all" is attached with the email

- PATH Variable:
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr
/local/maui/bin/:

- LD_LIBRARY_PATH variable was empty

- The following is the output of ifconfig on hpcc00 from where the error has
been generated:

eth0  Link encap:Ethernet  HWaddr 00:12:3f:4c:2d:78

  inet addr:134.225.200.100  Bcast:134.225.200.255
Mask:255.255.255.0

  inet6 addr: fe80::212:3fff:fe4c:2d78/64 Scope:Link

  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

  RX packets:15912728 errors:0 dropped:0 overruns:0 frame:0

  TX packets:15312376 errors:0 dropped:0 overruns:0 carrier:0

  collisions:0 txqueuelen:1000

  RX bytes:2951880321 (2.7 GB)  TX bytes:2788249498 (2.5 GB)

  Interrupt:16



loLink encap:Local Loopback

  inet addr:127.0.0.1  Mask:255.0.0.0

  inet6 addr: ::1/128 Scope:Host

  UP LOOPBACK RUNNING  MTU:16436  Metric:1

  RX packets:3507489 errors:0 dropped:0 overruns:0 frame:0

  TX packets:3507489 errors:0 dropped:0 overruns:0 carrier:0

  collisions:0 txqueuelen:0

  RX bytes:1794266658 (1.6 GB)  TX bytes:1794266658 (1.6 GB)



Regards,

Blesson.



From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Ralph Castain
Sent: 29 September 2009 23:59
To: Open MPI Users
Subject: Re: [OMPI users] MPI_Comm_accept()/connect() errors



I will ask the obvious - what version of Open MPI are you running? In what
environment? What was your command line?



:-)



On Sep 29, 2009, at 3:50 PM, Blesson Varghese wrote:



Hi,



I have been trying to execute the server.c and client.c programs provided in
http://www.mpi-forum.org/docs/mpi21-report/node213.htm#Node213, using the
accept() and connect() functions in MPI. However, the following errors are
generated.



[hpcc00:16522] *** An error occurred in MPI_Comm_connect

[hpcc00:16522] *** on communicator MPI_COMM_WORLD

[hpcc00:16522] *** MPI_ERR_INTERN: internal error

[hpcc00:16522] *** MPI_ERRORS_ARE_FATAL (goodbye)



Could anybody please help me?



Many thanks,
Blesson.

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



Open MPI: 1.2.5
   Open MPI SVN revision: r16989
Open RTE: 1.2.5
   Open RTE SVN revision: r16989
OPAL: 1.2.5
   OPAL SVN revision: r16989
   MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.5)
  MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.5)
   MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.5)
   MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.5)
   MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.5)
 MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.5)
 MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.5)
   MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
   MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.5)
MCA coll: self (MCA v1.0, API v1.0, Component v1.2.5)
MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.5)
MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.5)
  MCA io: romio (MCA v1.0, API v1.0, Component v1.2.5)
   MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.5)
   MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.5)
 MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.5)
 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.5)
 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.5)
  MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.5)
 MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.5)
 MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.5)
 MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.5)
 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.5)
  MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.5)
  MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.5)
  MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.5)
 MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.5)
 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.5)
  

Re: [OMPI users] how to SPMD on openmpi

2009-10-01 Thread ankur pachauri
hi vipin,

thanks for the answer, but one thing more: does openmpi have slightly different
library functions than mpi, or is its usage different (such as having to
use ompi_** instead of mpi_** functions)?


thanks in advance

On Thu, Oct 1, 2009 at 2:53 PM, vipin kumar  wrote:

> Hi Ankur,
>
> try this command,
>
> $ mpirun -np 2 -host firstHostIp,secondHostIp a.out
>
> for details read manual page for "mpirun".
>
> $ man mpirun
>
>
> Regards,
>
>
> On Wed, Sep 30, 2009 at 3:22 PM, ankur pachauri 
> wrote:
>
>> Dear all,
>>
>> I have been able to install open mpi on two independent machines having FC
>> 10. The simple hello world programms are running fine on the independent
>> machinesBut can any one pls help me by letting me know how to connect
>> the two machines and run a common program between the twohow do we a do
>> a lamboot -v lamhosts in case of openmpi?
>> How do we get the open mpi running on the two computers simultaneously and
>> excute a common program on the two machines.
>>
>> Thanks in advance
>>
>> --
>> Ankur Pachauri.
>> 09927590910
>>
>> Research Scholar,
>> software engineering.
>> Department of Mathematics
>> Dayalbagh Educational Institute
>> Dayalbagh,
>> AGRA
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>
>
> --
> Vipin K.
> Research Engineer,
> C-DOTB, India
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Ankur Pachauri.
09927590910

Research Scholar,
software engineering.
Department of Mathematics
Dayalbagh Educational Institute
Dayalbagh,
AGRA


[OMPI users] job fails with "Signal: Bus error (7)"

2009-10-01 Thread Sangamesh B
Hi,

 A Fortran application compiled with ifort 10.1 and Open MPI
1.3.1 on CentOS 5.2 fails after running for 4 days with the following error
message:

[compute-0-7:25430] *** Process received signal ***
[compute-0-7:25433] *** Process received signal ***
[compute-0-7:25433] Signal: Bus error (7)
[compute-0-7:25433] Signal code:  (2)
[compute-0-7:25433] Failing at address: 0x4217b8
[compute-0-7:25431] *** Process received signal ***
[compute-0-7:25431] Signal: Bus error (7)
[compute-0-7:25431] Signal code:  (2)
[compute-0-7:25431] Failing at address: 0x4217b8
[compute-0-7:25432] *** Process received signal ***
[compute-0-7:25432] Signal: Bus error (7)
[compute-0-7:25432] Signal code:  (2)
[compute-0-7:25432] Failing at address: 0x4217b8
[compute-0-7:25430] Signal: Bus error (7)
[compute-0-7:25430] Signal code:  (2)
[compute-0-7:25430] Failing at address: 0x4217b8
[compute-0-7:25431] *** Process received signal ***
[compute-0-7:25431] Signal: Segmentation fault (11)
[compute-0-7:25431] Signal code:  (128)
[compute-0-7:25431] Failing at address: (nil)
[compute-0-7:25430] *** Process received signal ***
[compute-0-7:25433] *** Process received signal ***
[compute-0-7:25433] Signal: Segmentation fault (11)
[compute-0-7:25433] Signal code:  (128)
[compute-0-7:25433] Failing at address: (nil)
[compute-0-7:25432] *** Process received signal ***
[compute-0-7:25432] Signal: Segmentation fault (11)
[compute-0-7:25432] Signal code:  (128)
[compute-0-7:25432] Failing at address: (nil)
[compute-0-7:25430] Signal: Segmentation fault (11)
[compute-0-7:25430] Signal code:  (128)
[compute-0-7:25430] Failing at address: (nil)
--
mpirun noticed that process rank 3 with PID 25433 on node
compute-0-7.local exited on signal 11 (Segmentation fault).


--

This job is run with 4 Open MPI processes, on nodes which are
interconnected with Gigabit Ethernet.
The same job runs well on the nodes with InfiniBand connectivity.

What could be the reason for this? Is this due to loose physical
connections, since it's giving a bus error?


Re: [OMPI users] how to SPMD on openmpi

2009-10-01 Thread vipin kumar
Hi Ankur,

try this command,

$ mpirun -np 2 -host firstHostIp,secondHostIp a.out

for details read manual page for "mpirun".

$ man mpirun


Regards,


On Wed, Sep 30, 2009 at 3:22 PM, ankur pachauri wrote:

> Dear all,
>
> I have been able to install open mpi on two independent machines having FC
> 10. The simple hello world programms are running fine on the independent
> machinesBut can any one pls help me by letting me know how to connect
> the two machines and run a common program between the twohow do we a do
> a lamboot -v lamhosts in case of openmpi?
> How do we get the open mpi running on the two computers simultaneously and
> excute a common program on the two machines.
>
> Thanks in advance
>
> --
> Ankur Pachauri.
> 09927590910
>
> Research Scholar,
> software engineering.
> Department of Mathematics
> Dayalbagh Educational Institute
> Dayalbagh,
> AGRA
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Vipin K.
Research Engineer,
C-DOTB, India


Re: [OMPI users] error in checkpointing an mpi application

2009-10-01 Thread Constantinos Makassikis

Hi,

from what you describe below, it seems as if you did not configure
OpenMPI correctly.


You issued

./configure --with-ft=cr --enable-mpi-threads --with-blcr=/usr/local/bin 
--with-blcr-libdir=/usr/local/lib

while, according to the installation paths you gave, it should have been
more like:


./configure --with-ft=cr --enable-mpi-threads --with-blcr=/root/MS 
--with-blcr-libdir=/root/MS/lib



Apart from that, if you wish to have the BLCR modules loaded at start-up of
your machine, a simple way is to add the following lines to rc.local.
This file is somewhere in /etc; the exact location can vary from one Linux
distribution to another (e.g., /etc/rc.d/rc.local or /etc/rc.local):


/sbin/insmod /usr/local/lib/blcr/2.6.23.1-42.fc8/blcr_imports.ko 
/sbin/insmod /usr/local/lib/blcr/2.6.23.1-42.fc8/blcr.ko


Just in case, if you have multiple MPIs installed, you can check which 
you are using with the following command:


which mpirun


HTH,

--
Constantinos

Mallikarjuna Shastry wrote:

 dear sir


i am sending the details as follows


1. i am using openmpi-1.3.3 and blcr 0.8.2 
2. i have installed blcr 0.8.2 first under /root/MS

3. then i installed openmpi 1.3.3 under /root/MS
4 i have configured and installed open mpi as follows

#./configure --with-ft=cr --enable-mpi-threads --with-blcr=/usr/local/bin 
--with-blcr-libdir=/usr/local/lib
# make 
# make install


then i added the following to the .bash_profile under home directory( i went to 
home directory by doing cd ~)

/sbin/insmod /usr/local/lib/blcr/2.6.23.1-42.fc8/blcr_imports.ko 
/sbin/insmod /usr/local/lib/blcr/2.6.23.1-42.fc8/blcr.ko 
PATH=$PATH:/usr/local/bin

MANPATH=$MANPATH:/usr/local/man
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib

then i compiled and run the file arr_add.c as follows

[root@localhost examples]# mpicc -o res arr_add.c
[root@localhost examples]# mpirun -np 2 -am ft-enable-cr ./res

2   2   2   2   2   2   2   2   2   2
2   2   2   2   2   2   2   2   2   2
2   2   2   2   2   2   2   2   2   2
--
Error: The process with PID 5790 is not checkpointable.
   This could be due to one of the following:
- An application with this PID doesn't currently exist
- The application with this PID isn't checkpointable
- The application with this PID isn't an OPAL application.
   We were looking for the named files:
 /tmp/opal_cr_prog_write.5790
 /tmp/opal_cr_prog_read.5790
--
[localhost.localdomain:05788] local) Error: Unable to initiate the handshake 
with peer [[7788,1],1]. -1
[localhost.localdomain:05788] [[7788,0],0] ORTE_ERROR_LOG: Error in file 
snapc_full_global.c at line 567
[localhost.localdomain:05788] [[7788,0],0] ORTE_ERROR_LOG: Error in file 
snapc_full_global.c at line 1054
2   2   2   2   2   2   2   2   2   2
2   2   2   2   2   2   2   2   2   2
2   2   2   2   2   2   2   2   2   2
2   2   2   2   2   2   2   2   2   2
2   2   2   2   2   2   2   2   2   2
2   2   2   2   2   2   2   2   2   2


NOTE: the PID of mpirun is 5788

i geve the following command for taking the checkpoint

[root@localhost examples]#ompi-checkpoint -s 5788

i got the following output , but it was hanging like this

[localhost.localdomain:05796] Requested - Global Snapshot 
Reference: (null)
[localhost.localdomain:05796]   Pending - Global Snapshot 
Reference: (null)
[localhost.localdomain:05796]   Running - Global Snapshot 
Reference: (null)


can anybody resolve this problem
kindly rectify it.


with regards

mallikarjuna shastry



  
___

users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

  




[OMPI users] error in checkpointing an mpi application

2009-10-01 Thread Mallikarjuna Shastry
 Dear sir,

I am sending the details as follows:

1. I am using openmpi-1.3.3 and blcr 0.8.2.
2. I have installed blcr 0.8.2 first under /root/MS.
3. Then I installed openmpi 1.3.3 under /root/MS.
4. I have configured and installed open mpi as follows:

#./configure --with-ft=cr --enable-mpi-threads --with-blcr=/usr/local/bin 
--with-blcr-libdir=/usr/local/lib
# make 
# make install

Then I added the following to the .bash_profile under the home directory (I went
to the home directory by doing cd ~):

/sbin/insmod /usr/local/lib/blcr/2.6.23.1-42.fc8/blcr_imports.ko 
/sbin/insmod /usr/local/lib/blcr/2.6.23.1-42.fc8/blcr.ko 
PATH=$PATH:/usr/local/bin
MANPATH=$MANPATH:/usr/local/man
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib

Then I compiled and ran the file arr_add.c as follows:

[root@localhost examples]# mpicc -o res arr_add.c
[root@localhost examples]# mpirun -np 2 -am ft-enable-cr ./res

2   2   2   2   2   2   2   2   2   2
2   2   2   2   2   2   2   2   2   2
2   2   2   2   2   2   2   2   2   2
--
Error: The process with PID 5790 is not checkpointable.
   This could be due to one of the following:
- An application with this PID doesn't currently exist
- The application with this PID isn't checkpointable
- The application with this PID isn't an OPAL application.
   We were looking for the named files:
 /tmp/opal_cr_prog_write.5790
 /tmp/opal_cr_prog_read.5790
--
[localhost.localdomain:05788] local) Error: Unable to initiate the handshake 
with peer [[7788,1],1]. -1
[localhost.localdomain:05788] [[7788,0],0] ORTE_ERROR_LOG: Error in file 
snapc_full_global.c at line 567
[localhost.localdomain:05788] [[7788,0],0] ORTE_ERROR_LOG: Error in file 
snapc_full_global.c at line 1054
2   2   2   2   2   2   2   2   2   2
2   2   2   2   2   2   2   2   2   2
2   2   2   2   2   2   2   2   2   2
2   2   2   2   2   2   2   2   2   2
2   2   2   2   2   2   2   2   2   2
2   2   2   2   2   2   2   2   2   2


NOTE: the PID of mpirun is 5788

I gave the following command for taking the checkpoint:

[root@localhost examples]#ompi-checkpoint -s 5788

I got the following output, but it was hanging like this:

[localhost.localdomain:05796] Requested - Global Snapshot 
Reference: (null)
[localhost.localdomain:05796]   Pending - Global Snapshot 
Reference: (null)
[localhost.localdomain:05796]   Running - Global Snapshot 
Reference: (null)


Can anybody resolve this problem? Kindly rectify it.


with regards

mallikarjuna shastry






Re: [OMPI users] Openmpi setup with intel compiler.

2009-10-01 Thread vighnesh
Dear Peter,
 I got info from the net that OPenmpi requires F77 bindings for F90
support. Thats where I was making mistake, i didnt configured for F77
bindings during openmpi setup. I rectified my mistake and after that
openmpi
was installed successfully for both PGI and INTEL compiler.

It was great help from you.  :-)

Thanks and Regards,
Vighnesh





>Dear Peter,
>Your suggestions did worked, it didnt showed any error during make and
>make install. But it didnt got installed with mpif90 support. I tried to
>compile my mpi code, but it gave following error.



>[vighnesh@test_node SIVA]$ /share/apps/mpi/openmpi/intel/bin/mpif90
code.f >-o code.exe
>--
>Unfortunately, this installation of Open MPI was not compiled with
Fortran >90 support.  As such, the mpif90 compiler is non-functional.
>--

> My configure script line is:
>[root@test_node vighnesh]# ./configure
>--prefix=/share/apps/mpi/openmpi/intel FC=ifort --with-tm=/opt/torque

>Please help me.

>Thanks and Regards,
>Vighnesh



> On Wednesday 30 September 2009, vighn...@aero.iitb.ac.in wrote:
> ...
>> during
>> configuring with Intel 9.0 compiler the installation gives following
error.
>>
>> [root@test_node openmpi-1.3.3]# make all install
> ...
>> make[3]: Entering directory `/tmp/openmpi-1.3.3/orte'
>> test -z "/share/apps/mpi/openmpi/intel/lib" || /bin/mkdir -p
>> "/share/apps/mpi/openmpi/intel/lib"
>>  /bin/sh ../libtool   --mode=install /usr/bin/install -c
>> 'libopen-rte.la'
>> '/share/apps/mpi/openmpi/intel/lib/libopen-rte.la'
>> libtool: install: error: cannot install `libopen-rte.la' to a directory
not ending in /share/apps/mpi/openmpi/pgi/lib
>
> The line above indicates that you've somehow attempted this from a dirty
tree
> and/or environment (dirty from the previous pgi installation...).
>
> Try a clean environment, clean build tree. Source the icc/ifort-vars.sh
files
> from your intel install dir, set CC, CXX, FC, F77 and do:
>  "./configure --prefix=... && make && make install"
>
> /Peter
>






Re: [OMPI users] Openmpi setup with intel compiler.

2009-10-01 Thread vighnesh
Dear Peter,
Your suggestions did work; it didn't show any error during make and
make install. But it didn't get installed with mpif90 support. I tried to
compile my MPI code, but it gave the following error.



[vighnesh@test_node SIVA]$ /share/apps/mpi/openmpi/intel/bin/mpif90 code.f
-o code.exe
--
Unfortunately, this installation of Open MPI was not compiled with
Fortran 90 support.  As such, the mpif90 compiler is non-functional.
--

 My configure script line is:
[root@test_node vighnesh]# ./configure
--prefix=/share/apps/mpi/openmpi/intel FC=ifort --with-tm=/opt/torque

Please help me.

Thanks and Regards,
Vighnesh



> On Wednesday 30 September 2009, vighn...@aero.iitb.ac.in wrote:
> ...
>> during
>> configuring with Intel 9.0 compiler the installation gives following
>> error.
>>
>> [root@test_node openmpi-1.3.3]# make all install
> ...
>> make[3]: Entering directory `/tmp/openmpi-1.3.3/orte'
>> test -z "/share/apps/mpi/openmpi/intel/lib" || /bin/mkdir -p
>> "/share/apps/mpi/openmpi/intel/lib"
>>  /bin/sh ../libtool   --mode=install /usr/bin/install -c
>> 'libopen-rte.la'
>> '/share/apps/mpi/openmpi/intel/lib/libopen-rte.la'
>> libtool: install: error: cannot install `libopen-rte.la' to a directory
>> not ending in /share/apps/mpi/openmpi/pgi/lib
>
> The line above indicates that you've somehow attempted this from a dirty
> tree
> and/or environment (dirty from the previous pgi installation...).
>
> Try a clean environment, clean build tree. Source the icc/ifort-vars.sh
> files
> from your intel install dir, set CC, CXX, FC, F77 and do:
>  "./configure --prefix=... && make && make install"
>
> /Peter
>