[OMPI users] compilation error using pgf90 ver 9.0

2010-08-31 Thread mohamed makhyoun
Dear Open MPI users:

 I have got the following error while compiling Open MPI using pgf90 ver 9
and CC=gcc.

 How can I run make while avoiding the -pthread flag?

pgf90-Error-Unknown switch: -pthread
make[4]: *** [libmpi_f90.la] Error 1
make[4]: Leaving directory
`/home/mohamed/bin/openmpi-1.4.1/build/ompi/mpi/f90'
make[3]: *** [all-recursive] Error 1
make[3]: Leaving directory
`/home/mohamed/bin/openmpi-1.4.1/build/ompi/mpi/f90'
make[2]: *** [all] Error 2
make[2]: Leaving directory
`/home/mohamed/bin/openmpi-1.4.1/build/ompi/mpi/f90'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/mohamed/bin/openmpi-1.4.1/build/ompi'
make: *** [all-recursive] Error 1

I would appreciate it if anyone could help me.

Best Regards

M. Makhyoun
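
One possible workaround (a sketch, not verified with this exact setup):
pgf90 accepts -noswitcherror, which makes it ignore command-line switches
it does not recognize, such as -pthread, instead of aborting. Passing it
at configure time could look like the following; paths and versions are
only examples:

  # Sketch: -noswitcherror is a PGI flag that downgrades unknown
  # switches like -pthread so the Fortran wrappers can finish linking.
  ./configure CC=gcc FC=pgf90 F77=pgf90 \
      FFLAGS=-noswitcherror FCFLAGS=-noswitcherror \
      --prefix=$HOME/openmpi-1.4.1-install
  make all install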


[OMPI users] Problem including C MPI code from C++ using C linkage

2010-08-31 Thread Patrik Jonsson
Hi all,

I have a C MPI code that I need to link into my C++ code. As usual,
from my C++ code, I do

extern "C" {
#include "c-code.h"
}

where c-code.h includes, among other things, mpi.h.

This doesn't work, because mpi.h apparently tries to detect whether
it's being compiled as C or C++ and includes mpicxx.h if the language
is C++. The problem is that that doesn't work under C linkage, so the
compilation dies with errors like:

mpic++  -I. -I$HOME/include/libPJutil -I$HOME/code/arepo -m32
arepotest.cc -I$HOME/include -I/sw/include -L/sw/lib
-L$HOME/code/arepo -larepo -lhdf5  -lgsl -lgmp -lmpi
In file included from /usr/include/c++/4.2.1/map:65,
                from /sw/include/openmpi/ompi/mpi/cxx/mpicxx.h:36,
                from /sw/include/mpi.h:1886,
                from /Users/patrik/code/arepo/allvars.h:23,
                from /Users/patrik/code/arepo/proto.h:2,
                from arepo_grid.h:36,
                from arepotest.cc:3:
/usr/include/c++/4.2.1/bits/stl_tree.h:134: error: template with C linkage
/usr/include/c++/4.2.1/bits/stl_tree.h:145: error: declaration of C
function 'const std::_Rb_tree_node_base* std::_Rb_tree_increment(const
std::_Rb_tree_node_base*)' conflicts with
/usr/include/c++/4.2.1/bits/stl_tree.h:142: error: previous
declaration 'std::_Rb_tree_node_base*
std::_Rb_tree_increment(std::_Rb_tree_node_base*)' here
/usr/include/c++/4.2.1/bits/stl_tree.h:151: error: declaration of C
function 'const std::_Rb_tree_node_base* std::_Rb_tree_decrement(const
std::_Rb_tree_node_base*)' conflicts with
/usr/include/c++/4.2.1/bits/stl_tree.h:148: error: previous
declaration 'std::_Rb_tree_node_base*
std::_Rb_tree_decrement(std::_Rb_tree_node_base*)' here
/usr/include/c++/4.2.1/bits/stl_tree.h:153: error: template with C linkage
/usr/include/c++/4.2.1/bits/stl_tree.h:223: error: template with C linkage
/usr/include/c++/4.2.1/bits/stl_tree.h:298: error: template with C linkage
/usr/include/c++/4.2.1/bits/stl_tree.h:304: error: template with C linkage
/usr/include/c++/4.2.1/bits/stl_tree.h:329: error: template with C linkage
etc. etc.

It seems a bit presumptuous of mpi.h to include mpicxx.h just
because __cplusplus is defined, since that makes it impossible to link
C MPI code from C++.

I've had to resort to something like

#ifdef __cplusplus
#undef __cplusplus
#include <mpi.h>
#define __cplusplus
#else
#include <mpi.h>
#endif

in c-code.h, which seems to work but isn't exactly smooth. Is there
another way around this, or has linking C MPI code with C++ never come
up before?

Thanks,

/Patrik Jonsson
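
A less invasive alternative (a sketch, assuming your Open MPI honors the
OMPI_SKIP_MPICXX guard that mpi.h checks before pulling in mpicxx.h;
MPICH has a similar MPICH_SKIP_MPICXX) is to suppress the C++ bindings
instead of undefining __cplusplus:

  /* In c-code.h: skip the C++ MPI bindings so mpi.h stays C-only
     even when included from C++ inside extern "C". */
  #ifdef __cplusplus
  #define OMPI_SKIP_MPICXX 1
  #endif
  #include <mpi.h>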



Re: [OMPI users] Fwd: Problems with OpenMPI

2010-08-31 Thread Gus Correa

Osvaldo

These FAQ entries may help:
http://www.open-mpi.org/faq/?category=rsh

Also, make sure the same Open MPI is either installed on,
or accessible by (say, via NFS), "anotherhost".

Simple test: 'mpirun -np 8 --host localhost,anotherhost hostname'

I hope this helps,
Gus Correa

David Zhang wrote:

Check firewall, network setting like subnet, and ssh keys?








Re: [OMPI users] Fwd: Problems with OpenMPI

2010-08-31 Thread Osvaldo Reis
Thank you, David, the problem was my firewall. The server machine is new and
whoever installed the OS forgot to disable iptables.
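
For reference, the check and fix on a RHEL-style machine look roughly like
this (commands are illustrative; adjust for your distribution):

  # List the active firewall rules that were blocking the daemons.
  /sbin/iptables -L -n
  # Stop the firewall now, and keep it disabled across reboots.
  service iptables stop
  chkconfig iptables off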

Thanks

2010/8/31 David Zhang 

> Check firewall, network setting like subnet, and ssh keys?



-- 
Osvaldo Reis Junior
Engenharia de Computação - UEPG
Laboratório de Genômica e Expressão - LGE
Universidade Estadual de Campinas - UNICAMP
MSN: osvaldorei...@hotmail.com
Skype: osvaldoreisss
Cel: (19) 8128-5273


Re: [OMPI users] Fwd: Problems with OpenMPI

2010-08-31 Thread David Zhang
Check firewall, network setting like subnet, and ssh keys?


-- 
Sent from my mobile device

David Zhang
University of California, San Diego



[OMPI users] Fwd: Problems with OpenMPI

2010-08-31 Thread Osvaldo Reis
Hi, I wanted to use openmpi. I installed it with no errors, and when I run the
examples locally they work well, but when I specify another host to run on, it
doesn't work. There are no errors, but it doesn't show anything, doesn't
start the process on the other host, and doesn't stop running on localhost.
Then I press Ctrl+C to kill the process and it shows:

^Cmpirun: killing job...

--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
anotherhost - daemon did not report back when launched

I looked at /var/log/secure and it started the connection, but then it closed
without executing anything.


Running on localhost

[user@host1 examples]$ mpirun -np 8 --host localhost ./hello_c
Hello, world, I am 0 of 8
Hello, world, I am 1 of 8
Hello, world, I am 2 of 8
Hello, world, I am 4 of 8
Hello, world, I am 5 of 8
Hello, world, I am 6 of 8
Hello, world, I am 7 of 8
Hello, world, I am 3 of 8

Running on another host

[user@host1 examples]$ mpirun -np 8 --host anotherhost ./hello_c


Some help please!



-- 
Osvaldo Reis Junior
Engenharia de Computação - UEPG
Laboratório de Genômica e Expressão - LGE
Universidade Estadual de Campinas - UNICAMP
MSN: osvaldorei...@hotmail.com
Skype: osvaldoreisss
Cel: (19) 8128-5273


[OMPI users] [R] Short survey concerning the use of software engineering in the field of High Performance Computing

2010-08-31 Thread Markus Schmidberger

Dear Colleagues,

this is a short survey (21 questions that take about 10 minutes to
answer) in the context of the research work for my PhD thesis and the Munich
Center of Advanced Computing (Project B2). It would be very helpful if
you would take the time to answer my questions concerning the use of
software engineering in the field of High Performance Computing.


Please note that all questions are mandatory!

http://www.q-set.de/q-set.php?sCode=TCSBHMPZAASZ


Thank you very much, kind regards

Miriam Schmidberger
(Dipl. Medien-Inf.)

schmi...@in.tum.de

Technische Universität München
Institut für Informatik
Boltzmannstr. 3
85748 Garching
Germany
Office 01.07.037
Tel: +49 (89) 289-18226


Re: [OMPI users] Memory allocation error when linking with MPI libraries

2010-08-31 Thread Nicolas Deladerriere
Hi,

Thanks Nysal for these details.

I also fixed my memory allocation issue using the environment variable
OMPI_MCA_memory_ptmalloc2_disable, which is much easier (at least in my
case) than compiling a new openmpi package and installing it.
The point is that it is a bit complicated to find information about this
variable (it seems to be a secret variable!). Actually, I have read that it
cannot be used as a normal MCA parameter and cannot be set in the configuration
file ( http://www.open-mpi.org/community/lists/users/2010/06/13208.php ).

When using this variable, I added the -x OMPI_MCA_memory_ptmalloc2_disable
option to my mpirun command line. Do I really have to do that?
Is the environment variable (plus the -x option if required) still the only
way to set this parameter to 1?

Regards,
Nicolas
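
For reference, the environment-variable route discussed above looks like
this (a sketch; the executable name is an example, and whether -x is
strictly required may depend on your setup):

  # Disable Open MPI's ptmalloc2 memory manager at run time and
  # forward the variable to the remote ranks with -x.
  export OMPI_MCA_memory_ptmalloc2_disable=1
  mpirun -x OMPI_MCA_memory_ptmalloc2_disable -np 8 ./myprog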



2010/8/15 Nysal Jan 

> >What does it exactly imply to compile with this option ?
> Open MPI's internal malloc library (ptmalloc) will not be built/used. If
> you are using an RDMA capable interconnect such as Infiniband, you will not
> be able to use the "mpi_leave_pinned" feature. mpi_leave_pinned might
> improve performance for applications that reuse/repeatedly send from the
> same buffer. If you are not using such interconnects then there is no impact
> on performance. For more details see the FAQ entries (24-28) -
> http://www.open-mpi.org/faq/?category=openfabrics#large-message-leave-pinned
>
> --Nysal
>
>
>
> On Thu, Aug 12, 2010 at 6:30 PM, Nicolas Deladerriere <
> nicolas.deladerri...@gmail.com> wrote:
>
>> building openmpi with the option "--without-memory-manager" fixed my problem.
>>
>> What does it exactly imply to compile with this option?
>> I guess all mallocs use functions from libc instead of Open MPI's, but
>> does it have an effect on performance or something else?
>>
>> Nicolas
>>
>> 2010/8/8 Nysal Jan 
>>
>>> What interconnect are you using? Infiniband? Use the
>>> "--without-memory-manager" option while building ompi in order to disable
>>> ptmalloc.
>>>
>>> Regards
>>> --Nysal
>>>
>>>
>>> On Sun, Aug 8, 2010 at 7:49 PM, Nicolas Deladerriere <
>>> nicolas.deladerri...@gmail.com> wrote:
>>>
 Yes, I'm using a 24G machine on a 64-bit Linux OS.
 If I compile without the wrapper, I do not get any problems.

 It seems that when I link with openmpi, my program uses a kind of
 openmpi-implemented malloc. Is it possible to switch it off in order to
 only use malloc from libc?

 Nicolas

 2010/8/8 Terry Frankcombe 

 You're trying to do a 6GB allocate.  Can your underlying system handle
> that?  If you compile without the wrapper, does it work?
>
> I see your executable is using the OMPI memory stuff.  IIRC there are
> switches to turn that off.
>
>
> On Fri, 2010-08-06 at 15:05 +0200, Nicolas Deladerriere wrote:
> > Hello,
> >
> > I'm having a sigsegv error when using a simple program compiled and
> > linked with openmpi.
> > I have reproduced the problem using really simple Fortran code. It
> > actually does not even use MPI, but just links with the MPI shared
> > libraries. (The problem does not appear when I do not link with the MPI
> > libraries.)
> >% cat allocate.F90
> >program test
> >implicit none
> >integer, dimension(:), allocatable :: z
> >integer(kind=8) :: l
> >
> >write(*,*) "l ?"
> >read(*,*) l
> >
> >ALLOCATE(z(l))
> >z(1) = 111
> >z(l) = 222
> >DEALLOCATE(z)
> >
> >end program test
> >
> > I am using openmpi 1.4.2 and gfortran for my tests. Here is the
> > compilation:
> >
> >% ./openmpi-1.4.2/build/bin/mpif90 --showme -g -o testallocate
> > allocate.F90
> >gfortran -g -o testallocate allocate.F90
> > -I/s0/scr1/TOMOT_19311_HAL_/openmpi-1.4.2/build/include -pthread
> > -I/s0/scr1/TOMOT_19311_HAL_/openmpi-1.4.2/build/lib
> > -L/s0/scr1/TOMOT_19311_HAL_/openmpi-1.4.2/build/lib -lmpi_f90
> > -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl
> > -lutil -lm -ldl -pthread
> >
> > When I run that test with different lengths, I sometimes get a
> > "Segmentation fault" error. Here are two examples using two specific
> > values, but the error happens for many other values of length (I did not
> > manage to find which values of length give the error)
> >
> >%  ./testallocate
> > l ?
> >16
> >Segmentation fault
> >% ./testallocate
> > l ?
> >20
> >
> > I used a debugger with a re-compiled version of openmpi built with the
> > debug flag. I got the following error in the function sYSMALLOc
> >
> >Program received signal SIGSEGV, Segmentation fault.
> >0x2b70b3b3 in sYSMALLOc (nb=640016, av=0x2b930200)

Re: [OMPI users] High Checkpoint Overhead Ratio

2010-08-31 Thread Joshua Hursey
Have you tried testing without using NFS? E.g., setting the mca-params.conf to
something like:
crs_base_snapshot_dir=/tmp/
snapc_base_global_snapshot_dir=/tmp/global
snapc_base_store_in_place=0

This would remove the NFS time from the checkpoint time. However, if you are
using staging, this may or may not reduce the application overhead significantly.

If you want to save to NFS, and it is globally mounted, you could try setting
the 'snapc_base_global_shared' parameter (deprecated in the trunk), which tells
the system to use standard UNIX copy commands (i.e., cp) instead of the rsh
varieties.

You might try changing the '--mca filem_rsh_max_incomming' parameter (default 
10) to increase or decrease the number of concurrent rcp/scp operations.

Something else to try is to look at the SnapC timing to pinpoint where the 
system is taking the most time:
  snapc_full_enable_timing=1

Since you are using the C/R thread, it takes up some CPU cycles that may
interfere with application performance. You can adjust the aggressiveness of
this thread via the 'opal_cr_thread_sleep_wait' parameter. In 1.5.0 it
defaults to 0 microseconds, but on the trunk this has been adjusted to 1000
microseconds. Try setting the parameter:
  opal_cr_thread_sleep_wait=1000

Depending on how much memory is required by CG.C and available on each node, 
you may be hitting a memory barrier that BLCR is struggling to overcome. What 
happens if you reduce the number of processes per node?

Those are some things to play around with to see what works best for your 
system and application. For a full list of parameters available in the C/R 
infrastructure see the link below:
  http://osl.iu.edu/research/ft/ompi-cr/api.php
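
For convenience, here is a sketch of a $HOME/.openmpi/mca-params.conf
combining the suggestions above (values are starting points to tune, not
recommendations):

  # Checkpoint to local disk, stage to a local global dir, enable timing.
  crs_base_snapshot_dir=/tmp/
  snapc_base_global_snapshot_dir=/tmp/global
  snapc_base_store_in_place=0
  snapc_full_enable_timing=1
  # Make the C/R thread less aggressive (microseconds between polls).
  opal_cr_thread_sleep_wait=1000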

-- Josh



Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://www.cs.indiana.edu/~jjhursey







[OMPI users] Checksuming in openmpi 1.4.1

2010-08-31 Thread Gilbert Grosdidier

Hello,

I'm not sure I understand how to trigger CHECKSUM use
inside of OpenMPI 1.4.1 (after digging in the FAQs, I found no
explanation, sorry):

- Is checksumming activated by default and embedded automatically
within the Send/Recv pair mechanism?
- If not, which MCA param(s) should I set to activate it?
- Is there a time penalty for using it?

Thanks in advance for any help.

--
Regards, Gilbert.




--
*-*
  Gilbert Grosdidier gilbert.grosdid...@in2p3.fr
  LAL / IN2P3 / CNRS Phone : +33 1 6446 8909
  Faculté des Sciences, Bat. 200 Fax   : +33 1 6446 8546
  B.P. 34, F-91898 Orsay Cedex (FRANCE)
 -
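
One avenue to check (an assumption to verify, not something I can confirm
for 1.4.1 specifically): some Open MPI builds ship a 'csum' PML component
that checksums point-to-point traffic, and it is not selected by default,
so activating it would look roughly like:

  # See whether the csum PML component exists in this build.
  ompi_info | grep -i csum
  # If so, select it explicitly (expect some time penalty).
  mpirun --mca pml csum -np 4 ./a.out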



[OMPI users] High Checkpoint Overhead Ratio

2010-08-31 Thread 陈文浩
Dear OMPI Users,

I'm now using BLCR-0.8.2 and OpenMPI-1.5rc5. The problem is that it takes a
very long time to checkpoint.

BLCR configuration:
./configure --prefix=/opt/blcr --enable-static

Open MPI configuration:
./configure --prefix=/opt/ompi --with-ft=cr --with-blcr=/opt/blcr
--enable-static --enable-ft-thread --enable-mpi-threads

Our blades use NFS. $HOME and /opt are shared.

In $HOME/.openmpi/mca-params.conf:
crs_base_snapshot_dir=/tmp/
snapc_base_global_snapshot_dir=/home/chenwh
snapc_base_store_in_place=0

Now I run CG NPB (NPROCS=16, CLASS=C) on two nodes (blade02, blade04).
With no checkpoint, 'Time in seconds' is about 100s. That is normal.
But when I take a single checkpoint, 'Time in seconds' is up to 300s. The
overhead ratio is over 200%! Why? How can I improve it?

blade02:~> ompi-checkpoint --status 27115
[blade02:27130] [  0.00 /   0.25] Requested - ...
[blade02:27130] [  0.00 /   0.25]   Pending - ...
[blade02:27130] [  0.21 /   0.46]   Running - ...
[blade02:27130] [221.25 / 221.71]  Finished -
ompi_global_snapshot_27115.ckpt
Snapshot Ref.:   0 ompi_global_snapshot_27115.ckpt

As you see, it takes 200+ seconds to checkpoint. By the way, what do the
former and latter numbers in [ , ] represent?

Regards

Whchen