[OMPI users] Could not execute the executable "/home/MET/hrm/bin/hostlist": Exec format error

2012-02-27 Thread Syed Ahsan Ali
Dear All,

I am running an application with mpirun, but it fails with the error below and
does not pick up the hostlist. Other applications run well with the hostlist;
only this one gives the following error:


 [pmdtest@pmd02 d00_dayfiles]$ tail -f *_hrm
mpirun -np  /home/MET/hrm/bin/hrm
--
Could not execute the executable "/home/MET/hrm/bin/hostlist": Exec format error

This could mean that your PATH or executable name is wrong, or that you do not
have the necessary permissions.  Please ensure that the executable is able to be
found and executed.

--

The following are the permissions in the hostlist directory. Please help me to
resolve this error.

 [pmdtest@pmd02 bin]$ ll
total 7570
-rwxrwxrwx 1 pmdtest pmdtest 2517815 Feb 16  2012 gme2hrm
-rwxrwxrwx 1 pmdtest pmdtest       0 Feb 16  2012 gme2hrm.map
*-rwxrwxrwx 1 pmdtest pmdtest     473 Jan 30  2012 hostlist*
-rwxrwxrwx 1 pmdtest pmdtest 5197698 Feb 16  2012 hrm
-rwxrwxrwx 1 pmdtest pmdtest       0 Dec 31  2010 hrm.map
-rwxrwxrwx 1 pmdtest pmdtest    1680 Dec 31  2010 mpd.hosts


Thank you and Regards
Ahsan
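
"Exec format error" is what the kernel reports when it is asked to run a file
that is not in an executable format it recognizes, regardless of the permission
bits. A quick way to see what kind of file mpirun is being handed is to look at
its first bytes. The sketch below is only an illustration: classify_file is a
made-up helper, and the sample file contents are assumptions, not the real
hostlist.

```python
import os
import tempfile

ELF_MAGIC = b"\x7fELF"   # first four bytes of every ELF binary
SHEBANG = b"#!"          # first two bytes of an interpreter script


def classify_file(path):
    """Return 'elf', 'script', or 'other' based on the file's leading bytes.

    Hypothetical helper for illustration; it only inspects the header and
    does not check permissions or architecture.
    """
    with open(path, "rb") as handle:
        head = handle.read(4)
    if head == ELF_MAGIC:
        return "elf"
    if head.startswith(SHEBANG):
        return "script"
    return "other"


def demo():
    """Classify a made-up host list and a made-up shell script."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        samples = {
            "hostlist": b"pmd01\npmd02\n",      # plain text, assumed content
            "run.sh": b"#!/bin/sh\necho hi\n",  # interpreter script
        }
        for name, content in samples.items():
            path = os.path.join(tmp, name)
            with open(path, "wb") as handle:
                handle.write(content)
            results[name] = classify_file(path)
    return results


if __name__ == "__main__":
    print(demo())
```

If the real hostlist comes back as "other", it is probably a plain host file
that ended up where mpirun expected the program name. The logged command shows
-np without a process count, so one possibility is that an empty variable in
the launch script shifted the arguments.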


Re: [OMPI users] OpenMPI at windows

2012-02-27 Thread Shiqing Fan

Hi Tal,

The released Windows binaries are built only for Microsoft VS compilers.

If you want to use gcc and g++, you have to build Open MPI yourself
using CMake.  If you select "MSYS Makefiles" as the generator, you can
build binaries under MinGW. Please note that this is only supported for
32-bit MinGW, and there may be run-time problems, as it is still
experimental.



Regards,
Shiqing

On 2012-02-26 7:33 AM, Tal Regev wrote:

Hi all,
I am using Windows 7 64-bit,
and I want to compile with mpicc and mpic++.
I downloaded OpenMPI_v1.5.4-1_win32.exe


and when I compile with it, it says I need lc.exe.
Is there a way to configure mpicc and mpic++ to use a custom compiler?
I want mpicc.exe to use MinGW gcc -> gcc.exe
and mpic++ to use MinGW g++ -> g++.exe.
Can you help me with what I need to do?

thx Tal.


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
---
Shiqing Fan
High Performance Computing Center Stuttgart (HLRS)
Tel: ++49(0)711-685-87234  Nobelstrasse 19
Fax: ++49(0)711-685-65832  70569 Stuttgart
http://www.hlrs.de/organization/people/shiqing-fan/
email: f...@hlrs.de



[OMPI users] Problem running over IB with huge data set

2012-02-27 Thread Paul Kapinos

Hello Jeff, Ralph, All Open MPI folks,

We had an off-list discussion about an error in the Serpent program. Ralph said:

>We already have several tickets for that problem, each relating to a different
>scenario:

>https://svn.open-mpi.org/trac/ompi/ticket/2155
>https://svn.open-mpi.org/trac/ompi/ticket/2157
>https://svn.open-mpi.org/trac/ompi/ticket/2295

I've built a fairly small reproducer for the original issue (with a huge memory
footprint) and have sent it to you.


The other week, another user ran into problems when using huge data sets.

A program that runs without any problem with smaller data sets (on the order of
24 GB of data in total and smaller) has problems with huge data sets (on the
order of 100 GB of data in total and more),

_if running over InfiniBand or IPoIB_.

The program essentially hangs, mostly blocking the transport used. In some
scenarios it crashes.
The same program and data set run fine over Ethernet or shared memory (yes,
we have computers with hundreds of GB of memory). The behaviour is reproducible.


Various errors are produced; some of them are listed below. In addition, in
most cases where the program hangs, it also blocks the transport, so that
other programs cannot run over the same interface (just as reported earlier).


More fun: we also found some '#procs x #nodes' combinations where the program
runs fine.


For example:
30 and 60 processes over 6 nodes run through fine,
6 procs over 6 nodes - killed with an error message (see below),
12,18,36,61,62,64,66 procs over 6 nodes - hang and block the interface.
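
One rough check that may help when comparing these process counts: in a
transpose done with MPI_Alltoallv, each rank exchanges one block with every
other rank, and the counts and displacements it passes are C int values. The
sketch below estimates the block size per pair of ranks. It assumes the
reported total of roughly 100 gigabytes is split evenly across all ranks
(taken here as 100 GiB), which the report does not state, and the function
names are made up for illustration. Counts are in elements rather than bytes,
so the byte figure overstates how close the counts come to the int limit.

```python
GIB = 2**30
INT_MAX = 2**31 - 1  # largest value a 32-bit signed C int can hold


def bytes_per_pair(total_bytes, nprocs):
    """Bytes one rank sends to one peer if the data is split evenly."""
    return total_bytes // (nprocs * nprocs)


def report(total_bytes, proc_counts):
    """Return (nprocs, bytes per pair, exceeds INT_MAX) for each count."""
    rows = []
    for nprocs in proc_counts:
        size = bytes_per_pair(total_bytes, nprocs)
        rows.append((nprocs, size, size > INT_MAX))
    return rows


if __name__ == "__main__":
    # Assumption: the total data volume is 100 GiB, evenly distributed.
    for nprocs, size, too_big in report(100 * GIB, [6, 12, 18, 30, 36, 60]):
        print(f"{nprocs:3d} procs: {size / GIB:6.3f} GiB per pair, "
              f"over INT_MAX bytes: {too_big}")
```

Under these assumptions only the 6-process case puts more than 2^31 - 1 bytes
into a single block. That does not explain the hangs seen with other process
counts, so treat it as one thing to check rather than a diagnosis.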

Well, we cannot guarantee that this isn't a bug in the program itself,
because it is still under development. However, since the program works well for
smaller data sets and over TCP and over ShMem, it smells like an MPI
library error, hence this mail.


Or could the puzzling behaviour be a consequence of a bug in the program
itself? If so, what could it be, and how could we try to find it?


I did not attach a reproducer to this mail because the user does not want to
spread the code all over the world, but I can send it to you if you are interested
in reproducing the issue. [The code performs a matrix transpose of huge matrices
and essentially calls MPI_Alltoallv. It is written as 'nice, well-structured' C++
code (nothing stays unwrapped) but is pretty small and readable.]


Ralph, Jeff, anybody - any interest in reproducing this issue?

Best wishes,
Paul Kapinos


P.S. Open MPI 1.5.3 used - still waiting for 1.5.5 ;-)








Some error messages:

with 6 procs over 6 Nodes:
--
mlx4: local QP operation err (QPN 7c0063, WQE index 0, vendor syndrome 6f, 
opcode = 5e)
[[8771,1],5][btl_openib_component.c:3316:handle_wc] from 
linuxbdc07.rz.RWTH-Aachen.DE to: linuxbdc04 error polling LP CQ with status 
LOCAL QP OPERATION ERROR status number 2 for wr_id 6afb70 opcode 0  vendor error 
111 qp_idx 3
mlx4: local QP operation err (QPN 18005f, WQE index 0, vendor syndrome 6f, 
opcode = 5e)
[[8771,1],2][btl_openib_component.c:3316:handle_wc] from 
linuxbdc03.rz.RWTH-Aachen.DE to: linuxbdc02 error polling LP CQ with status 
LOCAL QP OPERATION ERROR status number 2 for wr_id 6afb70 opcode 0  vendor error 
111 qp_idx 3
[[8771,1],1][btl_openib_component.c:3316:handle_wc] from 
linuxbdc02.rz.RWTH-Aachen.DE to: linuxbdc01 error polling LP CQ with status 
LOCAL QP OPERATION ERROR status number 2 for wr_id 6afb70 opcode 0  vendor error 
111 qp_idx 3
mlx4: local QP operation err (QPN 340057, WQE index 0, vendor syndrome 6f, 
opcode = 5e)

--


with 61 processes using IPoIB:
mpiexec -mca btl ^openib -np 61 -host 1,2,3,4,5,6 a.out < dim100G.in
--
[linuxbdc02.rz.RWTH-Aachen.DE][[21403,1],1][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] 
connect() to 134.61.208.202 failed: Connection timed out (110)
[linuxbdc01.rz.RWTH-Aachen.DE][[21403,1],18][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] 
connect() to 134.61.208.203 failed: Connection timed out (110)
[linuxbdc01.rz.RWTH-Aachen.DE][[21403,1],18][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] 
connect() to 134.61.208.203 failed: Connection timed out (110)

--


--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915




Re: [OMPI users] ompi 1.5.5 mpicc linker problem

2012-02-27 Thread Ralph Castain
There was a missing CMR that hadn't been applied yet to the 1.5.5 release 
candidate - it has now been rectified, so please update.


On Feb 27, 2012, at 9:19 AM, Pinero, Pedro_jose wrote:

> Hi,
>  
> I have recently upgraded my Open MPI version to v1.5.5 and I have found a
> problem with mpicc. When I try to compile my program, it returns the
> following error at the linking step:
>  
> gcc: error: dummy: No such file or directory
> gcc: error: mt: No such file or directory
>  
> The same program compiles and links without problems with the previous 
> version (1.5.4). Does anyone know why I am getting these errors?
>  
> Regards
> Pedro
>  



[OMPI users] ompi 1.5.5 mpicc linker problem

2012-02-27 Thread Pinero, Pedro_jose
Hi,

I have recently upgraded my Open MPI version to v1.5.5 and I have found
a problem with mpicc. When I try to compile my program, it returns the
following error at the linking step:

gcc: error: dummy: No such file or directory
gcc: error: mt: No such file or directory

The same program compiles and links without problems with the previous
version (1.5.4). Does anyone know why I am getting these errors?

Regards
Pedro


[OMPI users] orted daemon no found! --- environment not passed to slave nodes?

2012-02-27 Thread yanyg
Greetings!

I have tried to run the ring_c example test from a bash script. In this
bash script, I set up PATH and LD_LIBRARY_PATH (I do not want to
disturb ~/.bashrc, etc.), then use the full path of mpirun to invoke the MPI
processes; mpirun and orted are both on the PATH. However,
according to the Open MPI message, orted was not found; as far as I can
tell, it was not found only on the slave nodes. I then tried setting --prefix
or -x PATH -x LD_LIBRARY_PATH, hoping these envars would be passed to
the slave nodes, but it turned out that they are not forwarded to the slave
nodes.

On the other hand, if I set the same PATH and
LD_LIBRARY_PATH in ~/.bashrc, which is shared by all nodes,
mpirun from the bash script runs fine and orted can be found. This is
easy to understand, but I really do not want to change
~/.bashrc.

It seems the non-interactive bash shell does not pass envars to 
slave nodes. 
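
When debugging this kind of launch from a script, it can help to assemble the
mpirun command line in one place and print it, so that the exact arguments
(--prefix, -x, and so on) can be inspected before anything runs. The sketch
below only builds the argument list and does not start mpirun.
build_mpirun_argv, the /opt/openmpi prefix, and the host names are
assumptions made up for illustration. It makes no claim about how orted is
located on the remote side.

```python
import shlex


def build_mpirun_argv(prefix, nprocs, program, hosts=(), export_vars=()):
    """Assemble an mpirun argument list without running it.

    prefix      -- Open MPI installation directory (hypothetical value below)
    export_vars -- names of environment variables to pass with -x
    """
    argv = [f"{prefix}/bin/mpirun", "--prefix", prefix, "-np", str(nprocs)]
    if hosts:
        argv += ["-host", ",".join(hosts)]
    for name in export_vars:
        argv += ["-x", name]
    argv.append(program)
    return argv


if __name__ == "__main__":
    argv = build_mpirun_argv(
        prefix="/opt/openmpi",            # assumed install location
        nprocs=4,
        program="./ring_c",
        hosts=["node1", "node2"],         # placeholder host names
        export_vars=["PATH", "LD_LIBRARY_PATH"],
    )
    # Print a shell-quoted version for copy and paste into a terminal.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

The printed line can be compared with what the bash script actually executes,
which makes it easier to spot an option that was dropped or expanded to an
empty string.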

Any comments and solutions?

Thanks,
Yiguang



Re: [OMPI users] Environment variables [documentation]

2012-02-27 Thread Ralph Castain

On Feb 27, 2012, at 4:42 AM, Paul Kapinos wrote:

> Dear Open MPI developer,
> here:
> http://www.open-mpi.org/faq/?category=running#mpi-environmental-variables
> four envvars are listed that Open MPI sets for every process. We use them for
> some scripting and thank you for providing them.
> 
> But a simple "mpiexec -np 1 env | grep OMPI" shows lots more envvars.

Yes, we set quite a few more, but those are intended solely for internal use 
and are not guaranteed. The list on the web site only identifies a set that are 
guaranteed to be provided.

> These are interesting for us:
> 
> 1) OMPI_COMM_WORLD_LOCAL_SIZE - seems to contain the number of processes which
> are running on the specific node, see also
> http://www.open-mpi.org/community/lists/users/2008/07/6054.php
> 
> Is this envvar also "stable" as OMPI_COMM_WORLD_LOCAL_RANK is? (This would 
> make sense as it looks like the  OMPI_COMM_WORLD_SIZE, OMPI_COMM_WORLD_RANK 
> pair.)

Yes, and I'll add it to the page

> 
> If yes, maybe it also should be documented in the Wiki page.
> 
> 
> 
> 2) OMPI_COMM_WORLD_NODE_RANK - is that just a duplicate of
> OMPI_COMM_WORLD_LOCAL_RANK ?

No - the "local rank" is your rank on the node within your own job. The "node 
rank" is your rank on the node overall. The two differ when you do a 
comm_spawn. For example, suppose you have two ranks from your initial job on a 
node, and then comm_spawn three additional ranks. Their values would look like 
this:

job/rank   local_rank   node_rank
0/0        0            0
0/1        1            1
1/0        0            2
1/1        1            3
1/2        2            4

Again, I'll add it to the page
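
The numbering in the table above can be modelled in a few lines. This is only
a sketch of the rule as described in this mail, not Open MPI's
implementation; assign_ranks is a made-up name, and the processes are assumed
to be listed in launch order.

```python
def assign_ranks(procs_on_node):
    """Compute (job, rank, local_rank, node_rank) for processes on one node.

    procs_on_node is a list of (job, rank) pairs in launch order.
    local_rank counts within each job; node_rank counts across all jobs.
    """
    next_local = {}  # job id -> next local rank to hand out
    rows = []
    for node_rank, (job, rank) in enumerate(procs_on_node):
        local_rank = next_local.get(job, 0)
        next_local[job] = local_rank + 1
        rows.append((job, rank, local_rank, node_rank))
    return rows


if __name__ == "__main__":
    # Two ranks from the initial job (0), then three spawned ranks (job 1).
    procs = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2)]
    print("job/rank   local_rank   node_rank")
    for job, rank, local_rank, node_rank in assign_ranks(procs):
        print(f"{job}/{rank}        {local_rank}            {node_rank}")
```

Without a comm_spawn there is only one job on the node, and the two values
coincide.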

Thanks
Ralph


> 
> Best wishes,
> Paul Kapinos
> 
> 
> 
> -- 
> Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
> RWTH Aachen University, Center for Computing and Communication
> Seffenter Weg 23,  D 52074  Aachen (Germany)
> Tel: +49 241/80-24915




[OMPI users] Environment variables [documentation]

2012-02-27 Thread Paul Kapinos

Dear Open MPI developer,
here:
http://www.open-mpi.org/faq/?category=running#mpi-environmental-variables
four envvars are listed that Open MPI sets for every process. We use them for some
scripting and thank you for providing them.


But a simple "mpiexec -np 1 env | grep OMPI" shows lots more envvars. These are
interesting for us:


1) OMPI_COMM_WORLD_LOCAL_SIZE - seems to contain the number of processes which
are running on the specific node, see also

http://www.open-mpi.org/community/lists/users/2008/07/6054.php

Is this envvar also "stable" as OMPI_COMM_WORLD_LOCAL_RANK is? (This would make 
sense as it looks like the  OMPI_COMM_WORLD_SIZE, OMPI_COMM_WORLD_RANK pair.)


If yes, maybe it also should be documented in the Wiki page.



2) OMPI_COMM_WORLD_NODE_RANK - is that just a duplicate of
OMPI_COMM_WORLD_LOCAL_RANK ?
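
For context, this is roughly how such variables might be picked up in a
script. It is a minimal sketch: read_ompi_env is a made-up helper, the
variable names are the ones mentioned in this mail, and a variable that is
not set is reported as None rather than assumed to exist.

```python
import os

# Variable names as they appear in this thread.
OMPI_VARS = (
    "OMPI_COMM_WORLD_SIZE",
    "OMPI_COMM_WORLD_RANK",
    "OMPI_COMM_WORLD_LOCAL_SIZE",
    "OMPI_COMM_WORLD_LOCAL_RANK",
    "OMPI_COMM_WORLD_NODE_RANK",
)


def read_ompi_env(environ=None):
    """Return the variables above as ints, or None where one is not set."""
    if environ is None:
        environ = os.environ
    values = {}
    for name in OMPI_VARS:
        raw = environ.get(name)
        values[name] = int(raw) if raw is not None else None
    return values


if __name__ == "__main__":
    # Outside of mpiexec every value is expected to print as None.
    for name, value in read_ompi_env().items():
        print(f"{name}={value}")
```

Treating a missing variable as None keeps the script usable when it is
started without mpiexec, or with a version that does not set that variable.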


Best wishes,
Paul Kapinos



--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915




Re: [OMPI users] IMB-OpenMPI on Centos 6

2012-02-27 Thread Beat Rubischon
Hello!

On 27.02.12 08:10, Venkateswara Rao Dokku wrote:
> Ours is a customized OFED stack [our own driver-specific library and
> kernel drivers for the h/w]; we use the IMB tests for testing it.

> All the tests [PingPong, Exchange, etc.] stall after some time with
> no errors.

I found similar issues with MLNX_OFED_LINUX-1.5.3 on top of RHEL 6.2.
So far I have found a note in the HP documentation [1] about a buggy mlx4
driver introduced in OFED 1.5.3. The workaround seems to help Open MPI
1.5.x too, but it is still not perfectly stable.

[1]
http://h10025.www1.hp.com/ewfrf/wc/document?cc=us=en=en_geoLoc=true=c03113904

HTH
Beat

-- 
 \|/   Beat Rubischon 
   ( 0-0 ) http://www.0x1b.ch/~beat/
oOO--(_)--OOo---
Meine Erlebnisse, Gedanken und Traeume: http://www.0x1b.ch/blog/


[OMPI users] IMB-OpenMPI on Centos 6

2012-02-27 Thread Venkateswara Rao Dokku
Hi,

We are facing a problem while running the IMB [Intel MPI Benchmark] tests
on CentOS 6.0.
All the tests [PingPong, Exchange, etc.] stall after some time with no
errors.

Introduction:
Ours is a customized OFED stack [our own driver-specific library and kernel
drivers for the h/w]; we use the IMB tests for testing it.
We have already tested the same stack on RHEL 5.4 and it was fine.

Observation:
The tests send a few packets, and acknowledgements for all of those packets
are received. But no more Send Work Queue entries are added for
the driver to process.
The test does not return at all; it just stalls after sending a few packets.
This is observed only on CentOS 6/RHEL 6.

Versions of packages installed :
OpenMPI    - 1.4.3
LibIbVerbs - 1.1.4
LibIbUmad  - 1.3.6
IMB        - 3.2.2

Please confirm whether these versions are compatible with RHEL 6. If not, please
suggest the appropriate packages.

Please respond ASAP. Any help will be appreciated.



-- 
Thanks & Regards,
D.Venkateswara Rao,
Software Engineer,One Convergence Devices Pvt Ltd.,
Jubille Hills,Hyderabad.