Re: [OMPI users] TCP connection errors

2007-06-11 Thread Jonathan Underwood

On 11/06/07, Adrian Knoth  wrote:


What's the exact problem? compute-node -> frontend? I don't think you
have two processes on the frontend node, and even if you do, they should
use shared memory.


I stopped there being more than a single process on the frontend node
- this had no effect on the problem. The problem is that the processes
seem unable to communicate data to each other, although I can ssh
between machines with no problem (I have set up passphraseless keys).



Use tcpdump and/or recompile with debug enabled. In addition, set
WANT_PEER_DUMP in ompi/mca/btl/tcp/btl_tcp_endpoint.c to 1 (line 120)
and recompile, thus giving you more debug output.

Depending on your OMPI version, you can also add

mpi_preconnect_all=1

to your ~/.openmpi/mca-params.conf, by this establishing all connections
during MPI_Init().


I can't use tcpdump as I don't have root access, but I have made the
change to btl_tcp_endpoint.c that you mention, rebuilt (make
distclean... ./configure --enable-debug) OpenMPI, rebuilt the
application against the new version of OpenMPI and re-ran the program.
This is the output I see (with -np 3, and only 1 slot on the
frontend):

[steinbeck.phys.ucl.ac.uk:08475] [0,0,0] setting up session dir with
[steinbeck.phys.ucl.ac.uk:08475]universe default-universe-8475
[steinbeck.phys.ucl.ac.uk:08475]user jgu
[steinbeck.phys.ucl.ac.uk:08475]host steinbeck.phys.ucl.ac.uk
[steinbeck.phys.ucl.ac.uk:08475]jobid 0
[steinbeck.phys.ucl.ac.uk:08475]procid 0
[steinbeck.phys.ucl.ac.uk:08475] procdir:
/tmp/openmpi-sessions-...@steinbeck.phys.ucl.ac.uk_0/default-universe-8475/0/0
[steinbeck.phys.ucl.ac.uk:08475] jobdir:
/tmp/openmpi-sessions-...@steinbeck.phys.ucl.ac.uk_0/default-universe-8475/0
[steinbeck.phys.ucl.ac.uk:08475] unidir:
/tmp/openmpi-sessions-...@steinbeck.phys.ucl.ac.uk_0/default-universe-8475
[steinbeck.phys.ucl.ac.uk:08475] top:
openmpi-sessions-...@steinbeck.phys.ucl.ac.uk_0
[steinbeck.phys.ucl.ac.uk:08475] tmp: /tmp
[steinbeck.phys.ucl.ac.uk:08475] [0,0,0] contact_file
/tmp/openmpi-sessions-...@steinbeck.phys.ucl.ac.uk_0/default-universe-8475/universe-s
etup.txt
[steinbeck.phys.ucl.ac.uk:08475] [0,0,0] wrote setup file
[steinbeck.phys.ucl.ac.uk:08475] pls:rsh: local csh: 0, local sh: 1
[steinbeck.phys.ucl.ac.uk:08475] pls:rsh: assuming same remote shell
as local shell
[steinbeck.phys.ucl.ac.uk:08475] pls:rsh: remote csh: 0, remote sh: 1
[steinbeck.phys.ucl.ac.uk:08475] pls:rsh: final template argv:
[steinbeck.phys.ucl.ac.uk:08475] pls:rsh: /usr/bin/ssh 
orted --debug --debug-daemons --bootproxy 1 --name  --num_p
rocs 3 --vpid_start 0 --nodename  --universe
j...@steinbeck.phys.ucl.ac.uk:default-universe-8475 --nsreplica
"0.0.0;tcp://128.40.5
.39:37256;tcp://192.168.1.1:37256" --gprreplica
"0.0.0;tcp://128.40.5.39:37256;tcp://192.168.1.1:37256"
[steinbeck.phys.ucl.ac.uk:08475] pls:rsh: launching on node frontend
[steinbeck.phys.ucl.ac.uk:08475] pls:rsh: frontend is a LOCAL node
[steinbeck.phys.ucl.ac.uk:08475] pls:rsh: changing to directory /homes/jgu
[steinbeck.phys.ucl.ac.uk:08475] pls:rsh: executing:
(/cluster/data/jgu/bin/orted) orted --debug --debug-daemons
--bootproxy 1 --name 0.0.1
--num_procs 3 --vpid_start 0 --nodename frontend --universe
j...@steinbeck.phys.ucl.ac.uk:default-universe-8475 --nsreplica
"0.0.0;tcp://12
8.40.5.39:37256;tcp://192.168.1.1:37256" --gprreplica
"0.0.0;tcp://128.40.5.39:37256;tcp://192.168.1.1:37256" --set-sid
[BIBINPUTS=.::/amp/
tex// NNTPSERVER=nntp-server.ucl.ac.uk SSH_AGENT_PID=8473
HOSTNAME=steinbeck.phys.ucl.ac.uk BSTINPUTS=.::/amp/tex// TERM=screen
SHELL=/bin/
bash HISTSIZE=1000 TMPDIR=/tmp SSH_CLIENT=128.40.5.249 55312 22
QTDIR=/usr/lib64/qt-3.3 SSH_TTY=/dev/pts/0 USER=jgu
LD_LIBRARY_PATH=:/clust
er/data/jgu/lib:/cluster/data/jgu/lib
LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=0
1;05;37;41:ex=01;32:*.cmd=01;32:*.exe=01;32:*.com=01;32:*.btm=01;32:*.bat=01;32:*.sh=01;32:*.csh=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:
*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tz=01;31:*.rpm=01;31:*.cpio=01;31:*.jpg=01;35:*
.gif=01;35:*.bmp=01;35:*.xbm=01;35:*.xpm=01;35:*.png=01;35:*.tif=01;35:
SSH_AUTH_SOCK=/tmp/ssh-KjHUoC8472/agent.8472 TERMCAP=SC|screen|VT 1
00/ANSI X3.64 virtual terminal:\
   :DO=\E[%dB:LE=\E[%dD:RI=\E[%dC:UP=\E[%dA:bs:bt=\E[Z:\
   :cd=\E[J:ce=\E[K:cl=\E[H\E[J:cm=\E[%i%d;%dH:ct=\E[3g:\
   :do=^J:nd=\E[C:pt:rc=\E8:rs=\Ec:sc=\E7:st=\EH:up=\EM:\
   :le=^H:bl=^G:cr=^M:it#8:ho=\E[H:nw=\EE:ta=^I:is=\E)0:\
   :li#24:co#80:am:xn:xv:LP:sr=\EM:al=\E[L:AL=\E[%dL:\
   :cs=\E[%i%d;%dr:dl=\E[M:DL=\E[%dM:dc=\E[P:DC=\E[%dP:\
   :im=\E[4h:ei=\E[4l:mi:IC=\E[%d@:ks=\E[?1h\E=:\
   :ke=\E[?1l\E>:vi=\E[?25l:ve=\E[34h\E[?25h:vs=\E[34l:\
   :ti=\E[?1049h:te=\E[?1049l:us=\E[4m:ue=\E[24m:so=\E[3m:\
   :se=\E[23m:mb=\E[5m:md=\E[1m:mr=\E[7m:me=\E[m:ms:\
   :Co#8:pa#64:AF=\E[3%dm:AB=

Re: [OMPI users] TCP connection errors

2007-06-11 Thread Jonathan Underwood

Hi Adrian,

On 11/06/07, Adrian Knoth  wrote:

Which OMPI version?



1.2.2


> $ perl -e 'die$!=110'
> Connection timed out at -e line 1.

Looks pretty much like a routing issue. Can you sniff on eth1 on the
frontend node?



I don't have root access, so am afraid not.


> This error message occurs the first time one of the compute nodes,
> which are on a private network, attempts to send data to the frontend

> In actual fact, it seems that the error occurs the first time a
> process on the frontend tries to send data to another process on the
> frontend.

What's the exact problem? compute-node -> frontend? I don't think you
have two processes on the frontend node, and even if you do, they should
use shared memory.

> Any advice would be very welcome

Use tcpdump and/or recompile with debug enabled. In addition, set
WANT_PEER_DUMP in ompi/mca/btl/tcp/btl_tcp_endpoint.c to 1 (line 120)
and recompile, thus giving you more debug output.

Depending on your OMPI version, you can also add

mpi_preconnect_all=1

to your ~/.openmpi/mca-params.conf, by this establishing all connections
during MPI_Init().



OK, will try these things.


If nothing helps, exclude the frontend from computation.




OK.

Thanks for the suggestions!

Jonathan


Re: [OMPI users] TCP connection errors

2007-06-11 Thread Adrian Knoth
On Mon, Jun 11, 2007 at 10:55:17PM +0100, Jonathan Underwood wrote:

> Hi,

Hi!

> I am seeing problems with a small linux cluster when running OpenMPI
> jobs. The error message I get is:

Which OMPI version?

> $ perl -e 'die$!=110'
> Connection timed out at -e line 1.

Looks pretty much like a routing issue. Can you sniff on eth1 on the
frontend node?

> This error message occurs the first time one of the compute nodes,
> which are on a private network, attempts to send data to the frontend

> In actual fact, it seems that the error occurs the first time a
> process on the frontend tries to send data to another process on the
> frontend.

What's the exact problem? compute-node -> frontend? I don't think you
have two processes on the frontend node, and even if you do, they should
use shared memory.

> Any advice would be very welcome

Use tcpdump and/or recompile with debug enabled. In addition, set
WANT_PEER_DUMP in ompi/mca/btl/tcp/btl_tcp_endpoint.c to 1 (line 120)
and recompile, thus giving you more debug output.

Depending on your OMPI version, you can also add

mpi_preconnect_all=1

to your ~/.openmpi/mca-params.conf, by this establishing all connections
during MPI_Init().
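
As a concrete sketch of that suggestion (the shell commands and the ./my_app
name are placeholders, not from the original message; only the parameter
itself comes from the advice above):

  # Append the preconnect setting to the per-user MCA parameter file
  mkdir -p ~/.openmpi
  echo "mpi_preconnect_all = 1" >> ~/.openmpi/mca-params.conf

  # The same parameter can also be passed for a single run on the command line
  mpirun --mca mpi_preconnect_all 1 -np 3 ./my_app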

If nothing helps, exclude the frontend from computation.


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


[OMPI users] TCP connection errors

2007-06-11 Thread Jonathan Underwood

Hi,

I am seeing problems with a small linux cluster when running OpenMPI
jobs. The error message I get is:

[frontend][0,1,0][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
connect() failed with errno=110

Following the FAQ, I looked to see what this error code corresponds to:

$ perl -e 'die$!=110'
Connection timed out at -e line 1.

This error message occurs the first time one of the compute nodes,
which are on a private network, attempts to send data to the frontend
(from where the job was started with mpirun).
In actual fact, it seems that the error occurs the first time a
process on the frontend tries to send data to another process on the
frontend.

I tried to play about with  things like --mca btl_tcp_if_exclude
lo,eth0, but that didn't help matters.  Nothing in the FAQ section on
TCP and routing actually seemed to help.
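
For reference, a sketch of how such interface selection is usually passed to
mpirun (the application name is a placeholder, and which interfaces to list is
an assumption based on the configuration shown below, not a confirmed fix):

  # Exclude form, as tried above: skip loopback and the public NIC
  mpirun -np 3 --mca btl_tcp_if_exclude lo,eth0 ./my_app

  # Complementary include form; note that the private NIC is eth1 on the
  # frontend but eth0 on the compute nodes, so one include list may not
  # suit every host in this particular setup
  mpirun -np 3 --mca btl_tcp_if_include eth1 ./my_app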


Any advice would be very welcome


The network configurations are:

a) frontend (2 network adapters, eth1 private for the cluster):

$ /sbin/ifconfig
eth0  Link encap:Ethernet  HWaddr 00:E0:81:30:A1:CE
 inet addr:128.40.5.39  Bcast:128.40.5.255  Mask:255.255.255.0
 inet6 addr: fe80::2e0:81ff:fe30:a1ce/64 Scope:Link
 UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
 RX packets:3496038 errors:0 dropped:0 overruns:0 frame:0
 TX packets:2833685 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:1000
 RX bytes:500939570 (477.7 MiB)  TX bytes:671589665 (640.4 MiB)
 Interrupt:193

eth1  Link encap:Ethernet  HWaddr 00:E0:81:30:A1:CF
 inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
 inet6 addr: fe80::2e0:81ff:fe30:a1cf/64 Scope:Link
 UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
 RX packets:2201778 errors:0 dropped:0 overruns:0 frame:0
 TX packets:2046572 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:1000
 RX bytes:188615778 (179.8 MiB)  TX bytes:247305804 (235.8 MiB)
 Interrupt:201

lo        Link encap:Local Loopback
 inet addr:127.0.0.1  Mask:255.0.0.0
 inet6 addr: ::1/128 Scope:Host
 UP LOOPBACK RUNNING  MTU:16436  Metric:1
 RX packets:1528 errors:0 dropped:0 overruns:0 frame:0
 TX packets:1528 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:0
 RX bytes:363101 (354.5 KiB)  TX bytes:363101 (354.5 KiB)



$ /sbin/route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.1.0     *               255.255.255.0   U     0      0        0 eth1
128.40.5.0      *               255.255.255.0   U     0      0        0 eth0
default         128.40.5.245    0.0.0.0         UG    0      0        0 eth0



b) Compute nodes:

$ /sbin/ifconfig
eth0  Link encap:Ethernet  HWaddr 00:E0:81:30:A0:72
 inet addr:192.168.1.2  Bcast:192.168.1.255  Mask:255.255.255.0
 inet6 addr: fe80::2e0:81ff:fe30:a072/64 Scope:Link
 UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
 RX packets:189207 errors:0 dropped:0 overruns:0 frame:0
 TX packets:203507 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:1000
 RX bytes:23075241 (22.0 MiB)  TX bytes:17693363 (16.8 MiB)
 Interrupt:193

lo        Link encap:Local Loopback
 inet addr:127.0.0.1  Mask:255.0.0.0
 inet6 addr: ::1/128 Scope:Host
 UP LOOPBACK RUNNING  MTU:16436  Metric:1
 RX packets:185 errors:0 dropped:0 overruns:0 frame:0
 TX packets:185 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:0
 RX bytes:12644 (12.3 KiB)  TX bytes:12644 (12.3 KiB)


$ /sbin/route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.1.0     *               255.255.255.0   U     0      0        0 eth0
default         frontend.cluste 0.0.0.0         UG    0      0        0 eth0

TIS
Jonathan


Re: [OMPI users] mpirun hanging when processes started on head node

2007-06-11 Thread Kelley, Sean
Ralph,
 Thanks for the quick response, clarifications below.
  Sean



From: users-boun...@open-mpi.org on behalf of Ralph H Castain
Sent: Mon 6/11/2007 3:49 PM
To: Open MPI Users 
Subject: Re: [OMPI users] mpirun hanging when processes started on head node


Hi Sean

Could you please clarify something? I'm a little confused by your comments 
about where things are running. I'm assuming that you mean everything works 
fine if you type the mpirun command on the head node and just let it launch on 
your compute nodes - that the problems only occur when you specifically tell 
mpirun you want processes on the head node as well (or exclusively). Is that 
correct?

[Sean] This is correct.


There are several possible sources of trouble, if I have understood your 
situation correctly. Our bproc support is somewhat limited at the moment - you 
may be encountering one of those limits. We currently have bproc support 
focused on the configuration here at Los Alamos National Lab as (a) that is 
where the bproc-related developers are working, and (b) it is the only regular 
test environment we have to work with for bproc. We don't normally use bproc in 
combination with hostfiles, so I'm not sure if there is a problem in that 
combination. I can investigate that a little later this week.

[Sean] If it is helpful, running 'export NODES=-1; mpirun -np 1 hostname' 
exhibits identical behaviour.

Similarly, we require that all the nodes being used must be accessible via the 
same launch environment. It sounds like we may be able to launch processes on 
your head node via rsh, but not necessarily bproc. You might check to ensure 
that the head node will allow bproc-based process launch (I know ours don't - 
all jobs are run solely on the compute nodes. I believe that is generally the 
case). We don't currently support mixed environments, and I honestly don't 
expect that to change anytime soon.


[Sean] I'm working through the strace output to follow the progression on the 
head node. It looks like mpirun consults '/bpfs/self' and determines that the 
request is to be run on the local machine so it fork/execs 'orted' which then 
runs 'hostname'. 'mpirun' didn't consult '/bpfs' or utilize 'rsh' after the 
determination to run on the local machine was made. When the 'hostname' command 
completes, 'orted' receives the SIGCHLD signal, performs some work and then 
both 'mpirun' and 'orted' go into what appears to be a poll() waiting for 
events.


Hope that helps at least a little.

[Sean] I appreciate the help. We are running processes on the head node because 
the head node is the only node which can access external resources (storage 
devices). 


Ralph





On 6/11/07 1:04 PM, "Kelley, Sean"  wrote:



I forgot to add that we are using 'bproc'. Launching processes on the 
compute nodes using bproc works well, I'm not sure if bproc is involved when 
processes are launched on the local node.

Sean




From: users-boun...@open-mpi.org on behalf of Kelley, Sean
Sent: Mon 6/11/2007 2:07 PM
To: us...@open-mpi.org
Subject: [OMPI users] mpirun hanging when processes started on head node

Hi,
  We are running the OFED 1.2rc4 distribution containing 
openmpi-1.2.2 on a RedhatEL4U4 system with Scyld Clusterware 4.1. The hardware 
configuration consists of a DELL 2950 as the headnode and 3 DELL 1950 blades as 
compute nodes using Cisco TopSpin InfiniBand HCAs and switches for the 
interconnect.

  When we use 'mpirun' from the OFED/Open MPI distribution to start 
processes on the compute nodes, everything works correctly. However, when we 
try to start processes on the head node, the processes appear to run correctly 
but 'mpirun' hangs and does not terminate until killed. The attached 'run1.tgz' 
file contains detailed information from running the following command:

 mpirun --hostfile hostfile1 --np 1 --byslot --debug-daemons -d 
hostname

where 'hostfile1' contains the following:

-1 slots=2 max_slots=2

The 'run.log' is the output of the above line. The 'strace.out.0' is 
the result of 'strace -f' on the mpirun process (and the 'hostname' child 
process since mpirun simply forks the local processes). The child process (pid 
23415 in this case) runs to completion and exits successfully. The parent 
process (mpirun) doesn't appear to recognize that the child has completed and 
hangs until killed (with a ^c). 

Additionally, when we run a set of processes which span the headnode 
and the compute nodes, the processes on the head node complete successfully, 
but the processes on the compute nodes do not appear to start. mpirun again 
appears to hang.

Do I have a configuration error or is there a problem that I have 
encountered? Thank you in advance for your assistance or suggestions

Re: [OMPI users] Open MPI issue with Iprobe

2007-06-11 Thread Galen Shipman
I think the problem is that we use MPI_STATUS_IGNORE in the C++
bindings but don't check for it properly in mtl_mx_iprobe.

Can you try applying this diff to ompi and having the user try again?
We will also push this into the 1.2 branch.


- Galen


Index: ompi/mca/mtl/mx/mtl_mx_probe.c
===================================================================
--- ompi/mca/mtl/mx/mtl_mx_probe.c  (revision 14997)
+++ ompi/mca/mtl/mx/mtl_mx_probe.c  (working copy)
@@ -58,11 +58,12 @@
     }
 
     if (result) {
-        status->MPI_ERROR = OMPI_SUCCESS;
-        MX_GET_SRC(mx_status.match_info, status->MPI_SOURCE);
-        MX_GET_TAG(mx_status.match_info, status->MPI_TAG);
-        status->_count = mx_status.msg_length;
-
+        if(MPI_STATUS_IGNORE != status) {
+            status->MPI_ERROR = OMPI_SUCCESS;
+            MX_GET_SRC(mx_status.match_info, status->MPI_SOURCE);
+            MX_GET_TAG(mx_status.match_info, status->MPI_TAG);
+            status->_count = mx_status.msg_length;
+        }
         *flag = 1;
     } else {
         *flag = 0;
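
Applying such a diff to an Open MPI source tree and rebuilding might look
roughly like this (the patch file name and the idea of reinstalling into the
existing prefix are assumptions, not part of the message):

  # From the top of the Open MPI 1.2.2 source tree
  cd openmpi-1.2.2
  patch -p0 < mtl_mx_probe.diff   # the diff shown above, saved to a file
  make all && make install        # rebuild and reinstall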




On Jun 11, 2007, at 12:55 PM, Corwell, Sophia wrote:


Hi,

We are seeing the following issue with Iprobe on our clusters running
openmpi-1.2.2. Here is the code and related information:

===
Modules currently loaded:

(sn31)/projects>module list

Currently Loaded Modulefiles:
  1) /opt/modules/oscar-modulefiles/default-manpath/1.0.1
  2) compilers/intel-9.1-f040-c045
  3) misc/env-openmpi-1.2
  4) mpi/openmpi-1.2.2_mx_intel-9.1-f040-c045
  5) libraries/intel-mkl

===

Source code:



(sn31)/projects/>more probeTest.cc

#include <mpi.h>
#include <iostream>

int main(int argc, char* argv[])
{
MPI::Init(argc, argv);

const int rank = MPI::COMM_WORLD.Get_rank();
const int size = MPI::COMM_WORLD.Get_size();
const int sendProc = (rank + size - 1) % size;
const int recvProc = (rank + 1) % size;
const int tag = 1;

// send an asynchronous message
const int sendVal = 1;
MPI::Request sendRequest =
MPI::COMM_WORLD.Isend(&sendVal, 1, MPI_INT, recvProc, tag);

// wait for message to arrive
while (!MPI::COMM_WORLD.Iprobe(sendProc, tag)) {}  // This line
causes problems

// Receive asynchronous message
int recvVal;
MPI::Request recvRequest =
MPI::COMM_WORLD.Irecv(&recvVal, 1, MPI_INT, sendProc, tag);
recvRequest.Wait();

MPI::Finalize();
}

===

Compiled with:


(sn31)/projects>/apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi
-1.2.2_mx/bin/mpicxx
-I/apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_m
x/include -g -c -o probeTest.o probeTest.cc

(sn31)/projects>/apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi
-1.2.2_mx/bin/mpicxx -g -o probeTest
-L/apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/lib
probeTest.o -lmpi

/projects/global/x86_64/compilers/intel/intel-9.1-cce-045/lib/ 
ibimf.so:

warning: warning: feupdateenv is not implemented and will always
fail



===

Error at runtime:



(sn31)/projects>mpiexec -n 1 ./probeTest [sn31:17616] *** Process
received signal *** [sn31:17616] Signal:
Segmentation fault (11) [sn31:17616] Signal code: Address not mapped
(1) [sn31:17616] Failing at address: 0x8 [sn31:17616] [ 0]
/lib64/tls/libpthread.so.0 [0x2a9665a4f0] [sn31:17616] [ 1]
/apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/
lib/openmpi/mca_mtl_mx.so(ompi_mtl_mx_iprobe+0x81)
[0x2a9980b305]
[sn31:17616] [ 2]
/apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/
lib/openmpi/mca_pml_cm.so(mca_pml_cm_iprobe+0x1f)
[0x2a995eb817]
[sn31:17616] [ 3]
/apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/
lib/libmpi.so.0(MPI_Iprobe+0xef)
[0x2a956d363f]
[sn31:17616] [ 4] ./probeTest(_ZNK3MPI4Comm6IprobeEii+0x3a)
[0x4046aa][sn31:17616] [ 5] ./probeTest(main+0x147) [0x40480b]
[sn31:17616] [ 6] /lib64/tls/libc.so.6(__libc_start_main+0xdb)
[0x2a967803fb]
[sn31:17616] [ 7] ./probeTest(_ZNSt8ios_base4InitD1Ev+0x3a)
[0x4038ca][sn31:17616] *** End of error message *** mpiexec noticed
that job rank 0 with PID 17616 on node sn31 exited

on

signal 11 (Segmentation fault).

(sn31)/projects/ceptre/sdpautz/NWCC/temp>mpiexec -n 2 ./probeTest
[sn31:17621] *** Process received signal *** [sn31:17620] ***

Process

received signal *** [sn31:17620] Signal: Segmentation fault (11)
[sn31:17620] Signal code: Address not mapped (1) [sn31:17620]
Failing at address: 0x8 [sn31:17620] [ 0] /lib64/tls/libpthread.so.0



[0x2a9665a4f0] [sn31:17620] [ 1]
/apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/
lib/openmpi/mca_mtl_mx.so(ompi_mtl_mx_iprobe+0x81)
[0x2a9980b305]
[sn31:17620] [ 2]
/apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/
lib/openmpi/mca_pml_cm.so(mca_pml_cm_iprobe+0x1f)
[0x2a995eb817]
[sn31:17620] [ 3]
/apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/
lib/libmpi.so.0(MPI_Iprobe+0xef)
[0x2a956d363f]
[sn31:17620] [ 4] ./probeTest(_ZNK3MPI4Comm6IprobeEii+0x3a)
[0x4046aa][sn31:17620] [ 5] ./probeTest(main+0x147) [0x404

Re: [OMPI users] mixing MX and TCP

2007-06-11 Thread Reese Faucette

! if( (status = mx_get_info( mx_btl->mx_endpoint, MX_LINE_SPEED,
!&nic_id, sizeof(nic_id),
 &value, sizeof(int))) != MX_SUCCESS ) 
{


yes, a NIC ID is required for this call because a host may have multiple 
NICs with different linespeeds, e.g. a 2G card and a 10G card.

-reese




Re: [OMPI users] mpirun hanging when processes started on head node

2007-06-11 Thread Ralph H Castain
Hi Sean

Could you please clarify something? I'm a little confused by your comments
about where things are running. I'm assuming that you mean everything works
fine if you type the mpirun command on the head node and just let it launch
on your compute nodes - that the problems only occur when you specifically
tell mpirun you want processes on the head node as well (or exclusively). Is
that correct?

There are several possible sources of trouble, if I have understood your
situation correctly. Our bproc support is somewhat limited at the moment -
you may be encountering one of those limits. We currently have bproc support
focused on the configuration here at Los Alamos National Lab as (a) that is
where the bproc-related developers are working, and (b) it is the only
regular test environment we have to work with for bproc. We don't normally
use bproc in combination with hostfiles, so I'm not sure if there is a
problem in that combination. I can investigate that a little later this
week.

Similarly, we require that all the nodes being used must be accessible via
the same launch environment. It sounds like we may be able to launch
processes on your head node via rsh, but not necessarily bproc. You might
check to ensure that the head node will allow bproc-based process launch (I
know ours don't - all jobs are run solely on the compute nodes. I believe
that is generally the case). We don't currently support mixed environments,
and I honestly don't expect that to change anytime soon.

Hope that helps at least a little.
Ralph





On 6/11/07 1:04 PM, "Kelley, Sean"  wrote:

> I forgot to add that we are using 'bproc'. Launching processes on the compute
> nodes using bproc works well, I'm not sure if bproc is involved when processes
> are launched on the local node.
>  
> Sean
> 
> 
> From: users-boun...@open-mpi.org on behalf of Kelley, Sean
> Sent: Mon 6/11/2007 2:07 PM
> To: us...@open-mpi.org
> Subject: [OMPI users] mpirun hanging when processes started on head node
> 
> Hi,
>   We are running the OFED 1.2rc4 distribution containing openmpi-1.2.2 on
> a RedhatEL4U4 system with Scyld Clusterware 4.1. The hardware configuration
> consists of a DELL 2950 as the headnode and 3 DELL 1950 blades as compute
> nodes using Cisco TopSpin InfiniBand HCAs and switches for the interconnect.
>  
>When we use 'mpirun' from the OFED/Open MPI distribution to start
> processes on the compute nodes, everything works correctly. However, when we
> try to start processes on the head node, the processes appear to run correctly
> but 'mpirun' hangs and does not terminate until killed. The attached
> 'run1.tgz' file contains detailed information from running the following
> command:
>  
>   mpirun --hostfile hostfile1 --np 1 --byslot --debug-daemons -d hostname
>  
> where 'hostfile1' contains the following:
>  
> -1 slots=2 max_slots=2
>  
> The 'run.log' is the output of the above line. The 'strace.out.0' is the
> result of 'strace -f' on the mpirun process (and the 'hostname' child process
> since mpirun simply forks the local processes). The child process (pid 23415
> in this case) runs to completion and exits successfully. The parent process
> (mpirun) doesn't appear to recognize that the child has completed and hangs
> until killed (with a ^c).
>  
> Additionally, when we run a set of processes which span the headnode and the
> compute nodes, the processes on the head node complete successfully, but the
> processes on the compute nodes do not appear to start. mpirun again appears to
> hang.
>  
> Do I have a configuration error or is there a problem that I have encountered?
> Thank you in advance for your assistance or suggestions
>  
> Sean
>  
> --
> Sean M. Kelley
> sean.kel...@solers.com
>  
>  
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] mpirun hanging when processes started on head node

2007-06-11 Thread Kelley, Sean
I forgot to add that we are using 'bproc'. Launching processes on the compute 
nodes using bproc works well, I'm not sure if bproc is involved when processes 
are launched on the local node.
 
Sean



From: users-boun...@open-mpi.org on behalf of Kelley, Sean
Sent: Mon 6/11/2007 2:07 PM
To: us...@open-mpi.org
Subject: [OMPI users] mpirun hanging when processes started on head node


Hi,
  We are running the OFED 1.2rc4 distribution containing openmpi-1.2.2 on a 
RedhatEL4U4 system with Scyld Clusterware 4.1. The hardware configuration 
consists of a DELL 2950 as the headnode and 3 DELL 1950 blades as compute nodes 
using Cisco TopSpin InfiniBand HCAs and switches for the interconnect.
 
   When we use 'mpirun' from the OFED/Open MPI distribution to start 
processes on the compute nodes, everything works correctly. However, when we 
try to start processes on the head node, the processes appear to run correctly 
but 'mpirun' hangs and does not terminate until killed. The attached 'run1.tgz' 
file contains detailed information from running the following command:
 
  mpirun --hostfile hostfile1 --np 1 --byslot --debug-daemons -d hostname
 
where 'hostfile1' contains the following:
 
-1 slots=2 max_slots=2
 
The 'run.log' is the output of the above line. The 'strace.out.0' is the result 
of 'strace -f' on the mpirun process (and the 'hostname' child process since 
mpirun simply forks the local processes). The child process (pid 23415 in this 
case) runs to completion and exits successfully. The parent process (mpirun) 
doesn't appear to recognize that the child has completed and hangs until killed 
(with a ^c). 
 
Additionally, when we run a set of processes which span the headnode and the 
compute nodes, the processes on the head node complete successfully, but the 
processes on the compute nodes do not appear to start. mpirun again appears to 
hang.
 
Do I have a configuration error or is there a problem that I have encountered? 
Thank you in advance for your assistance or suggestions
 
Sean
 
--
Sean M. Kelley
sean.kel...@solers.com
 
 


[OMPI users] Open MPI issue with Iprobe

2007-06-11 Thread Corwell, Sophia
Hi,

We are seeing the following issue with Iprobe on our clusters running
openmpi-1.2.2. Here is the code and related information:

===
Modules currently loaded:

(sn31)/projects>module list 
> > Currently Loaded Modulefiles:
> >   1) /opt/modules/oscar-modulefiles/default-manpath/1.0.1
> >   2) compilers/intel-9.1-f040-c045
> >   3) misc/env-openmpi-1.2
> >   4) mpi/openmpi-1.2.2_mx_intel-9.1-f040-c045
> >   5) libraries/intel-mkl
===

Source code:

> >
> > (sn31)/projects/>more probeTest.cc
> > 
> > #include <mpi.h>
> > #include <iostream>
> > 
> > int main(int argc, char* argv[])
> > {
> > MPI::Init(argc, argv);
> > 
> > const int rank = MPI::COMM_WORLD.Get_rank();
> > const int size = MPI::COMM_WORLD.Get_size();
> > const int sendProc = (rank + size - 1) % size;
> > const int recvProc = (rank + 1) % size;
> > const int tag = 1;
> > 
> > // send an asynchronous message
> > const int sendVal = 1;
> > MPI::Request sendRequest =
> > MPI::COMM_WORLD.Isend(&sendVal, 1, MPI_INT, recvProc, tag);
> > 
> > // wait for message to arrive
> > while (!MPI::COMM_WORLD.Iprobe(sendProc, tag)) {}  // This line 
> > causes problems
> > 
> > // Receive asynchronous message
> > int recvVal;
> > MPI::Request recvRequest =
> > MPI::COMM_WORLD.Irecv(&recvVal, 1, MPI_INT, sendProc, tag);
> > recvRequest.Wait();
> > 
> > MPI::Finalize();
> > }
===

Compiled with:

> > (sn31)/projects>/apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi
> > -1.2.2_mx/bin/mpicxx
> > -I/apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_m
> > x/include -g -c -o probeTest.o probeTest.cc
> >
> > (sn31)/projects>/apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi
> > -1.2.2_mx/bin/mpicxx -g -o probeTest 
> > -L/apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/lib
> > probeTest.o -lmpi
> >
/projects/global/x86_64/compilers/intel/intel-9.1-cce-045/lib/ibimf.so:
> > warning: warning: feupdateenv is not implemented and will always 
> > fail
> >

===

Error at runtime:

> >
> > (sn31)/projects>mpiexec -n 1 ./probeTest [sn31:17616] *** Process 
> > received signal *** [sn31:17616] Signal:
> > Segmentation fault (11) [sn31:17616] Signal code: Address not mapped
> > (1) [sn31:17616] Failing at address: 0x8 [sn31:17616] [ 0] 
> > /lib64/tls/libpthread.so.0 [0x2a9665a4f0] [sn31:17616] [ 1] 
> > /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/
> > lib/openmpi/mca_mtl_mx.so(ompi_mtl_mx_iprobe+0x81)
> > [0x2a9980b305]
> > [sn31:17616] [ 2]
> > /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/
> > lib/openmpi/mca_pml_cm.so(mca_pml_cm_iprobe+0x1f)
> > [0x2a995eb817]
> > [sn31:17616] [ 3]
> > /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/
> > lib/libmpi.so.0(MPI_Iprobe+0xef)
> > [0x2a956d363f]
> > [sn31:17616] [ 4] ./probeTest(_ZNK3MPI4Comm6IprobeEii+0x3a)
> > [0x4046aa][sn31:17616] [ 5] ./probeTest(main+0x147) [0x40480b] 
> > [sn31:17616] [ 6] /lib64/tls/libc.so.6(__libc_start_main+0xdb)
> > [0x2a967803fb]
> > [sn31:17616] [ 7] ./probeTest(_ZNSt8ios_base4InitD1Ev+0x3a)
> > [0x4038ca][sn31:17616] *** End of error message *** mpiexec noticed 
> > that job rank 0 with PID 17616 on node sn31 exited
on
> > signal 11 (Segmentation fault).
> >
> > (sn31)/projects/ceptre/sdpautz/NWCC/temp>mpiexec -n 2 ./probeTest 
> > [sn31:17621] *** Process received signal *** [sn31:17620] ***
Process
> > received signal *** [sn31:17620] Signal: Segmentation fault (11) 
> > [sn31:17620] Signal code: Address not mapped (1) [sn31:17620] 
> > Failing at address: 0x8 [sn31:17620] [ 0] /lib64/tls/libpthread.so.0

> > [0x2a9665a4f0] [sn31:17620] [ 1] 
> > /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/
> > lib/openmpi/mca_mtl_mx.so(ompi_mtl_mx_iprobe+0x81)
> > [0x2a9980b305]
> > [sn31:17620] [ 2]
> > /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/
> > lib/openmpi/mca_pml_cm.so(mca_pml_cm_iprobe+0x1f)
> > [0x2a995eb817]
> > [sn31:17620] [ 3]
> > /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/
> > lib/libmpi.so.0(MPI_Iprobe+0xef)
> > [0x2a956d363f]
> > [sn31:17620] [ 4] ./probeTest(_ZNK3MPI4Comm6IprobeEii+0x3a)
> > [0x4046aa][sn31:17620] [ 5] ./probeTest(main+0x147) [0x40480b] 
> > [sn31:17620] [ 6] /lib64/tls/libc.so.6(__libc_start_main+0xdb)
> > [0x2a967803fb]
> > [sn31:17620] [ 7] ./probeTest(_ZNSt8ios_base4InitD1Ev+0x3a)
> > [0x4038ca][sn31:17620] *** End of error message *** [sn31:17621]
> > Signal: Segmentation fault (11) [sn31:17621] Signal code: Address 
> > not mapped (1) [sn31:17621] Failing at address: 0x8 [sn31:17621] [ 
> > 0] /lib64/tls/libpthread.so.0 [0x2a9665a4f0] [sn31:17621] [ 1] 
> > /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/
> > lib/openmpi/mca_mtl_mx.so(ompi_mtl_mx_iprobe+0x81)
> > [0x2a9980b305]
> > [sn31:17621] [ 2]
> > /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/
> > lib/openmpi/mca_pml_cm.so(mca_pml_cm_iprobe+0x1f)
> > [0x2a995eb817]
> > [sn31:1762

[OMPI users] mpirun hanging when processes started on head node

2007-06-11 Thread Kelley, Sean
Hi,
  We are running the OFED 1.2rc4 distribution containing openmpi-1.2.2 on a 
RedhatEL4U4 system with Scyld Clusterware 4.1. The hardware configuration 
consists of a DELL 2950 as the headnode and 3 DELL 1950 blades as compute nodes 
using Cisco TopSpin InfiniBand HCAs and switches for the interconnect.
 
   When we use 'mpirun' from the OFED/Open MPI distribution to start 
processes on the compute nodes, everything works correctly. However, when we 
try to start processes on the head node, the processes appear to run correctly 
but 'mpirun' hangs and does not terminate until killed. The attached 'run1.tgz' 
file contains detailed information from running the following command:
 
  mpirun --hostfile hostfile1 --np 1 --byslot --debug-daemons -d hostname
 
where 'hostfile1' contains the following:
 
-1 slots=2 max_slots=2
 
The 'run.log' is the output of the above line. The 'strace.out.0' is the result 
of 'strace -f' on the mpirun process (and the 'hostname' child process since 
mpirun simply forks the local processes). The child process (pid 23415 in this 
case) runs to completion and exits successfully. The parent process (mpirun) 
doesn't appear to recognize that the child has completed and hangs until killed 
(with a ^c). 
 
Additionally, when we run a set of processes which span the headnode and the 
compute nodes, the processes on the head node complete successfully, but the 
processes on the compute nodes do not appear to start. mpirun again appears to 
hang.
 
Do I have a configuration error or is there a problem that I have encountered? 
Thank you in advance for your assistance or suggestions
 
Sean
 
--
Sean M. Kelley
sean.kel...@solers.com
 
 


run1.tgz
Description: run1.tgz


[OMPI users] f90 support not built with gfortran?

2007-06-11 Thread Jeff Pummill

Greetings all,

I downloaded and configured v1.2.2 this morning on an Opteron cluster 
using the following configure directives...


./configure --prefix=/share/apps CC=gcc CXX=g++ F77=g77 FC=gfortran 
CFLAGS=-m64 CXXFLAGS=-m64 FFLAGS=-m64 FCFLAGS=-m64


Compilation seemed to go OK and there IS an mpif90 option in 
/bin..but it gives me the following error when I try to compile my 
source file:


/share/apps/bin/mpif90 -c -I/share/apps/include -O3 ft.f
Unfortunately, this installation of Open MPI was not compiled with
Fortran 90 support.  As such, the mpif90 compiler is non-functional.

I am certain that gfortran is installed and working correctly as I 
tested compilation of a small piece of serial code with it.


Something I am doing wrong?
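
A quick way to check what Fortran support the build actually recorded is to
query ompi_info; a rough sketch (the install prefix is taken from the
configure line above, the grep patterns are assumptions and the exact output
strings vary by version):

  # Show which language bindings and compilers this install was built with
  /share/apps/bin/ompi_info | grep -i fortran

  # configure's log records why Fortran 90 support was or was not enabled
  grep -i "fortran 90" config.log | head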

--
Jeff F. Pummill
Senior Linux Cluster Administrator
University of Arkansas
Fayetteville, Arkansas 72701



Re: [OMPI users] mixing MX and TCP

2007-06-11 Thread George Bosilca
It's about using multiple network interfaces to exchange messages  
between a pair of hosts. The networks can be identical or not.


  george.

On Jun 9, 2007, at 8:19 PM, Alex Tumanov wrote:


forgive a trivial question, but what's a multi-rail?

On 6/8/07, George Bosilca  wrote:

A fix for this problem is now available on the trunk. Please use any
revision after 14963 and your problem will vanish [I hope!]. There
are now some additional parameters which allow you to select which
Myrinet network you want to use in the case there are several
available (--mca btl_mx_if_include and --mca btl_mx_if_exclude). Even
multi-rails should now work over MX.


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users






[OMPI users] rdma over tcp?

2007-06-11 Thread Brock Palen

With openmpi-1.2.0

I ran: ompi_info --param btl tcp

and I see references to:

MCA btl: parameter "btl_tcp_min_rdma_size" (current value: "131072")
MCA btl: parameter "btl_tcp_max_rdma_size" (current value: "2147483647")

Can TCP support RDMA?  I thought you needed fancy hardware to get
such support?  Any light on this subject is highly appreciated.


Also, if a user on Ethernet is trying to raise the limit for 'greedy'
(eager) messages, is that btl_tcp_eager_limit?  Is there a problem with
increasing its size?  We will test it with his app of course, but I
was wondering if there was a 'gotcha' I was going to walk into.
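
For reference, a sketch of how these parameters can be inspected and
overridden per run (the byte value and application name below are arbitrary
placeholders, not recommendations):

  # List the TCP BTL's eager/RDMA thresholds
  ompi_info --param btl tcp | grep -E "eager|rdma"

  # Override the eager limit (in bytes) for a single run
  mpirun -np 4 --mca btl_tcp_eager_limit 262144 ./users_app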

Thanks

Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985




Re: [OMPI users] mixing MX and TCP

2007-06-11 Thread Kees Verstoep

George Bosilca wrote:

A fix for this problem is now available on the trunk. Please use any 
revision after 14963 and your problem will vanish [I hope!]. There are 
now some additional parameters which allow you to select which Myrinet 
network you want to use in the case there are several available (--mca 
btl_mx_if_include and --mca btl_mx_if_exclude). Even multi-rails should 
now work over MX.


I have tried nightly snapshot openmpi-1.3a1r14981 and it (almost)
seems to work.  The version as is, when run in combination with
MX-1.2.0j and the FMA mapper, currently results in the following
error on each node:

mx_get_info(MX_LINE_SPEED) failed with status 35 (Bad info length)

However, with the small patch below, multi-cluster jobs indeed seem
to be running fine (using MX locally). I'll do some more testing
later this week.

Thanks a lot for the fix!
Kees


*** ./ompi/mca/btl/mx/btl_mx_component.c.orig   2007-06-11 17:12:11.0 +0200
--- ./ompi/mca/btl/mx/btl_mx_component.c        2007-06-11 17:13:34.0 +0200
***************
*** 310,316 ****
  #if defined(MX_HAS_NET_TYPE)
      {
          int value;
!         if( (status = mx_get_info( mx_btl->mx_endpoint, MX_LINE_SPEED, NULL, 0,
                                     &value, sizeof(int))) != MX_SUCCESS ) {
              opal_output( 0, "mx_get_info(MX_LINE_SPEED) failed with status %d (%s)\n",
                           status, mx_strerror(status) );
--- 310,317 ----
  #if defined(MX_HAS_NET_TYPE)
      {
          int value;
!         if( (status = mx_get_info( mx_btl->mx_endpoint, MX_LINE_SPEED,
!                                    &nic_id, sizeof(nic_id),
                                     &value, sizeof(int))) != MX_SUCCESS ) {
              opal_output( 0, "mx_get_info(MX_LINE_SPEED) failed with status %d (%s)\n",
                           status, mx_strerror(status) );



Re: [OMPI users] Problem running MPI on a dual-core pentium D

2007-06-11 Thread victor marian
Hi Don,

The first time, I ran the program I am working on. It
is perfectly scalable, and on 20 processors it ran in
27 seconds (versus 300 seconds on two processors).
Then I had the curiosity to run it on a Pentium D. It
ran in 30 seconds on a single core. On two cores it
ran in 37 seconds (I think something is wrong in the
mixture Solaris+Sun Studio+Open MPI+Pentium).

I then ran the NAS benchmarks (CG benchmark, CLASS A or
B), and the benchmark took 16
times longer on a SparcII processor than on my
Pentium D (on one core).
So everything leads to the same conclusion.
I would be curious what such a benchmark would give
between a Pentium D and the latest-generation Sparc
processor.

 Victor


--- Don Kerr  wrote:

> Victor,
> 
> Obviously there are many variables involved with
> getting the best 
> performance out of a machine and understanding the 2
> environments you 
> are comparing would be necessary as well as the job.
> I would not be able 
> to get my hands on another E10K for validation or
> projecting possible 
> gains myself. If your university is looking to
> expand maybe gettting a 
> sales engineer involved for a proper analysis and
> proposal would be of 
> interest.
> 
> I am curious what benchmark you are using to compare
> the two platforms 
> though.
> 
> -DON 
> 
> victor marian wrote:
> 
> >Hi Don,
> >
> >Seeing your mail, I suppose you are working at Sun.
> We
> >
> >have a Sun 1 Server at our university, and my
> >program runs almost as fast on 16 UltraSparc2
> >processors as on a pentium D.The program is
> perfectly
> >scallable. I am a little bit dissapointed. Our
> Sparc
> >II are at 400MHz, and the Pentium D at 2.8GHz. I
> could
> >expect that the pentium is 4 time faster, but not
> 16
> >times.
> >I wonder how a Sparc IV would perform.
> >
> > Victor
> >
> >  
> >
> >--- Don Kerr  wrote:
> >
> >  
> >
> >>Additionally, Solaris comes with the IB drivers
> and
> >>since the libs are 
> >>there OMPI thinks that it is available. You can
> >>suppress this message with 
> >>--mca btl_base_warn_component_unused 0
> >>or specifically call out the btls you wish to use,
> >>example
> >>--mca btl self,sm,tcp
> >>
> >>Brock Palen wrote:
> >>
> >>
> >>
> >>>It means that your OMPI was compiled to support
> >>>  
> >>>
> >>uDAPL  (a type of  
> >>
> >>
> >>>infinibad network)  but that your computer does
> not
> >>>  
> >>>
> >>have such a card  
> >>
> >>
> >>>installed.  Because you dont it will fall back to
> >>>  
> >>>
> >>ethernet.  But  
> >>
> >>
> >>>because you are just running on a single machine.
> 
> >>>  
> >>>
> >>You will use the  
> >>
> >>
> >>>fastest form of communication using shared
> memory. 
> >>>  
> >>>
> >>so you can ignore  
> >>
> >>
> >>>that message.  Unless in the future you add a
> uDAPL
> >>>  
> >>>
> >>powered network  
> >>
> >>
> >>>and you still get that message then you need to
> >>>  
> >>>
> >>worry.
> >>
> >>
> >>>Brock Palen
> >>>Center for Advanced Computing
> >>>bro...@umich.edu
> >>>(734)936-1985
> >>>
> >>>
> >>>On Jun 10, 2007, at 9:18 AM, victor marian wrote:
> >>>
> >>> 
> >>>
> >>>  
> >>>
> Hello,
> 
> I have a Pentium D computer with Solaris 10
> 
> 
> >>installed.
> >>
> >>
> I installed OpenMPI, succesfully compiled my
> 
> 
> >>Fortran
> >>
> >>
> program, but when giving
> mpirun -np 2 progexe
> I receive
> [0,1,0]: uDAPL on host SERVSOLARIS was unable to
> 
> 
> >>find
> >>
> >>
> any NICs.
> Another transport will be used instead, although
> 
> 
> >>this
> >>
> >>
> may result in
> lower performance.
> 
> I am a begginer in MPI and don't know what it
> 
> 
> >>means.
> >>
> >>
> What
> should I do to solve the problem?
> Thank you.
> 
> 
> 
> 
> 
> 
> 
> 
>
>>>__
> >>>  
> >>>
> __
> Moody friends. Drama queens. Your life? Nope! -
> 
> 
> >>their life, your  
> >>
> >>
> story. Play Sims Stories at Yahoo! Games.
> http://sims.yahoo.com/
> ___
> users mailing list
> us...@open-mpi.org
>
http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
>    
> 
> 
> 
> >>>___
> >>>users mailing list
> >>>us...@open-mpi.org
>
>>>http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>> 
> >>>
> >>>  
> 
=== message truncated ===


Victor MARIAN
Department of Machine Elements and Tribology
University Politehnica of Bucharest
Spl. Indepentendei 313
060042 Bucharest
ROMANIA





Re: [OMPI users] Library Definitions

2007-06-11 Thread Brock Palen
Yes, we find it's best to let users benchmark their code (if they
have it already), or a code that uses similar algorithms, and then
have the user run on some machines we set aside.


While we are on the benchmark topic, users might be interested: we
just installed a new set of Opteron 2220SEs. We used HPL with GOTO
BLAS and on 58 machines (232 CPUs) achieved 1.099 Tflops (85% of
theory).
On one node using 4 CPUs (dual-core, dual-socket) I could only get
88%, so for a machine that had __no tuning__ of the IB network or the
sysctls, we were very happy.


Boy, I love that compile-once, run-on-any-network feature of Open MPI.

Info:

OS:  RHEL4
Compiler:  pgi/6.2
mpi:openmpi/1.2.0
BLAS:  GOTO-1.15
Cisco TopSpin InfiniBand using OpenIB provided by Red Hat.

Thanks for all the help list :-)

Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985


On Jun 11, 2007, at 9:06 AM, Jeff Pummill wrote:


Glad to contribute Victor!

I am running on a home workstation that uses an AMD 3800 cpu  
attached to 2 gigs of ram.
My timings for FT were 175 secs with one core and 110 on two cores  
with -O3 and -mtune=amd64 as tuning options.


Brock, Terry and Jeff are all exactly correct in their comments  
regarding benchmarks. There are simply too many variables to  
contend with. In addition, one and two core runs on a single  
workstation probably isn't the best evaluation of OpenMPI. As you  
expand to more devices and generate bigger problems (HPL or HPCC  
for example), a better overall picture will emerge.



Jeff F. Pummill
Senior Linux Cluster Administrator
University of Arkansas



victor marian wrote:

  Thank you, everybody, for the advice.
  I ran the NAS benchmark class B and it runs in 181
seconds on one core and in 90 seconds on two cores, so
it scales almost perfectly.
  What were your timings, Jeff, and exactly what
processor do you have?
  Mine is a Pentium D at 2.8GHz.

 Victor
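
For anyone wanting to reproduce runs like the class B numbers above, building
and launching an NPB test usually looks something like this sketch (directory
names and process counts are illustrative, and the make targets assume a
configured NPB3.2-MPI tree):

  # Build the CG and FT benchmarks as class B for 2 MPI processes
  cd NPB3.2-MPI
  make cg NPROCS=2 CLASS=B
  make ft NPROCS=2 CLASS=B

  # Run them with Open MPI
  mpirun -np 2 bin/cg.B.2
  mpirun -np 2 bin/ft.B.2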


--- Jeff Pummill  wrote:



Victor,

Build the FT benchmark and build it as a class B
problem. This will run
in the 1-2 minute range instead of 2-4 seconds the
CG class A benchmark
does.


Jeff F. Pummill
Senior Linux Cluster Administrator
University of Arkansas



Terry Frankcombe wrote:


Hi Victor

I'd suggest 3 seconds of CPU time is far, far to


small a problem to do


scaling tests with.  Even with only 2 CPUs, I


wouldn't go below 100


times that.


On Mon, 2007-06-11 at 01:10 -0700, victor marian


wrote:




Hi Jeff

I ran the NAS Parallel Bechmark and it gives for


me


-bash%/export/home/vmarian/fortran/benchmarks/NPB3.2/NPB3.2-MPI/bin$


mpirun -np 1 cg.A.1


- 
-



[0,1,0]: uDAPL on host SERVSOLARIS was unable to


find


any NICs.
Another transport will be used instead, although


this


may result in
lower performance.


- 
-



 NAS Parallel Benchmarks 3.2 -- CG Benchmark

 Size:  14000
 Iterations:15
 Number of active processes: 1
 Number of nonzeroes per row:   11
 Eigenvalue shift: .200E+02
 Benchmark completed
 VERIFICATION SUCCESSFUL
 Zeta is  0.171302350540E+02
 Error is 0.512264003323E-13


 CG Benchmark Completed.
 Class   =A
 Size=14000
 Iterations  =   15
 Time in seconds = 3.02
 Total processes =1
 Compiled procs  =1
 Mop/s total =   495.93
 Mop/s/process   =   495.93
 Operation type  =   floating point
 Verification=   SUCCESSFUL
 Version =  3.2
 Compile date=  11 Jun 2007





-bash%/export/home/vmarian/fortran/benchmarks/NPB3.2/NPB3.2-MPI/bin$


mpirun -np 2 cg.A.2


- 
-



[0,1,0]: uDAPL on host SERVSOLARIS was unable to


find


any NICs.
Another transport will be used instead, although


this


may result in
lower performance.


- 
-


- 
-



[0,1,1]: uDAPL on host SERVSOLARIS was unable to


find


any NICs.
Another transport will be used instead, although


this


may result in
lower performance.


- 
-



 NAS Parallel Benchmarks 3.2 -- CG Benchmark

 Size:  14000
 Iterations:15
 Number of active processes: 2
 Number of nonzeroes per row:   11
 Eigenvalue shift: .200E+02

 Benchmark completed
 VERIFICATION SUCCESSFUL
 Zeta is  0.171302350540E+02
 Error is 0.522633719989E-13


 CG Benchmark Completed.
 Class   =A
 Size=14000
 Iterations

Re: [OMPI users] Problem running MPI on a dual-core pentium D

2007-06-11 Thread Don Kerr

Victor,

Obviously there are many variables involved with getting the best 
performance out of a machine and understanding the 2 environments you 
are comparing would be necessary as well as the job. I would not be able 
to get my hands on another E10K for validation or projecting possible 
gains myself. If your university is looking to expand, maybe getting a 
sales engineer involved for a proper analysis and proposal would be of 
interest.


I am curious what benchmark you are using to compare the two platforms 
though.


-DON 


victor marian wrote:


Hi Don,

Seeing your mail, I suppose you are working at Sun. We

have a Sun 1 Server at our university, and my
program runs almost as fast on 16 UltraSparc2
processors as on a pentium D.The program is perfectly
scallable. I am a little bit dissapointed. Our Sparc
II are at 400MHz, and the Pentium D at 2.8GHz. I could
expect that the pentium is 4 time faster, but not 16
times.
I wonder how a Sparc IV would perform.

Victor

 


--- Don Kerr  wrote:

 


Additionally, Solaris comes with the IB drivers and
since the libs are 
there OMPI thinks that it is available. You can
suppress this message with 
   --mca btl_base_warn_component_unused 0

or specifically call out the btls you wish to use,
example
   --mca btl self,sm,tcp

Brock Palen wrote:

   


It means that your OMPI was compiled to support
 

uDAPL  (a type of  
   


infinibad network)  but that your computer does not
 

have such a card  
   


installed.  Because you dont it will fall back to
 

ethernet.  But  
   

because you are just running on a single machine. 
 

You will use the  
   

fastest form of communication using shared memory. 
 

so you can ignore  
   


that message.  Unless in the future you add a uDAPL
 

powered network  
   


and you still get that message then you need to
 


worry.
   


Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985


On Jun 10, 2007, at 9:18 AM, victor marian wrote:



 


Hello,

I have a Pentium D computer with Solaris 10
   


installed.
   


I installed OpenMPI, succesfully compiled my
   


Fortran
   


program, but when giving
mpirun -np 2 progexe
I receive
[0,1,0]: uDAPL on host SERVSOLARIS was unable to
   


find
   


any NICs.
Another transport will be used instead, although
   


this
   


may result in
lower performance.

I am a begginer in MPI and don't know what it
   


means.
   


What
should I do to solve the problem?
Thank you.






   


__
 


__
Moody friends. Drama queens. Your life? Nope! -
   

their life, your  
   


story. Play Sims Stories at Yahoo! Games.
http://sims.yahoo.com/
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


  

   


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


 


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

   







___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
 



Re: [OMPI users] Problem running MPI on a dual-core pentium D

2007-06-11 Thread victor marian
Hi Don,

But as I see you must pay for these debuggers.

   Victor

--- Don Kerr  wrote:

> Victor,
> 
> You are right Prism will not work with Open MPI
> which Sun's ClusterTools 
> 7 is based on.  But Prism was not available for CT 6
> either. Totalview 
> and Allinea's ddt I believe have both been tested to
> work with Open MPI.
> 
> -DON
> 
> victor marian wrote:
> 
> >  I can't turn it off right now to look in BIOS
> (the
> >computer is not at home), but I think the Pentium D
> >which is dual-core doesn't support hyper-threading.
> 
> >  The program I made relies on an MPI library (it
> is
> >not a benchmarking program). I think you are right,
> >maibe I should run a benchmarking program first to
> see
> >what happens. If you have a benchmarking program I
> >would gladly test it. 
> >   What is the best way to debug OpenMPI programs?
> >Until now I ran prism which is part of the
> >SunClusterTools.
> >
> >  Victor
> >
> >--- Jeff Pummill  wrote:
> >
> >  
> >
> >>Victor,
> >>
> >>Just on a hunch, look in your BIOS to see if
> >>Hyperthreading is turned 
> >>on. If so, turn it off. We have seen some unusual
> >>behavior on some of 
> >>our machines unless this is disabled.
> >>
> >>I am interested in your progress as I have just
> >>begun working with 
> >>OpenMPI as well. I have used mpich for quite some
> >>time, but felt 
> >>compelled to get some experience with OpenMPI as
> >>well. I just installed 
> >>it this weekend on an AMD dual-core machine with 2
> >>gigs of ram. Maybe I 
> >>will try and replicate your experiment if you can
> >>direct me to what 
> >>program you are benchmarking.
> >>
> >>Jeff F. Pummill
> >>Senior Linux Cluster Administrator
> >>University of Arkansas
> >>Fayetteville, Arkansas 72701
> >>(479) 575 - 4590
> >>http://hpc.uark.edu
> >>
> >>victor marian wrote:
> >>
> >>
> >>>The problem is that my executable file runs on
> the
> >>>Pentium D in 80 seconds on two cores and in 25
> >>>  
> >>>
> >>seconds
> >>
> >>
> >>>on one core.
> >>>And on another Sun SMP machine with 20 processors
> >>>  
> >>>
> >>it
> >>
> >>
> >>>runs perfectly (the problem is perfectly
> >>>  
> >>>
> >>scallable).
> >>
> >>
> >>>  Victor Marian
> >>>  Laboratory of Machine Elements and
> Tribology
> >>>  University Politehnica of Bucharest
> >>>  Romania
> >>>
> >>>
> >>>--- Brock Palen  wrote:
> >>>
> >>>  
> >>>  
> >>>
> It means that your OMPI was compiled to support
> uDAPL  (a type of  
> infinibad network)  but that your computer does
> 
> 
> >>not
> >>
> >>
> have such a card  
> installed.  Because you dont it will fall back
> to
> ethernet.  But  
> because you are just running on a single
> machine.
> 
> 
> You will use the  
> fastest form of communication using shared
> 
> 
> >>memory. 
> >>
> >>
> so you can ignore  
> that message.  Unless in the future you add a
> 
> 
> >>uDAPL
> >>
> >>
> powered network  
> and you still get that message then you need to
> worry.
> 
> Brock Palen
> Center for Advanced Computing
> bro...@umich.edu
> (734)936-1985
> 
> 
> On Jun 10, 2007, at 9:18 AM, victor marian
> wrote:
> 
> 
> 
> 
> >Hello,
> >
> >I have a Pentium D computer with Solaris 10
> >  
> >  
> >
> installed.
> 
> 
> 
> >I installed OpenMPI, succesfully compiled my
> >  
> >  
> >
> Fortran
> 
> 
> 
> >program, but when giving
> >mpirun -np 2 progexe
> >I receive
> >[0,1,0]: uDAPL on host SERVSOLARIS was unable
> to
> >  
> >  
> >
> find
> 
> 
> 
> >any NICs.
> >Another transport will be used instead,
> although
> >  
> >  
> >
> this
> 
> 
> 
> >may result in
> >lower performance.
> >
> >I am a begginer in MPI and don't know what it
> >  
> >  
> >
> means.
> 
> 
> 
> >What
> > should I do to solve the problem?
> >Thank you.
> >
> >
> >
> 
=== message truncated ===







Re: [OMPI users] Problem running MPI on a dual-core pentium D

2007-06-11 Thread Don Kerr

Victor,

You are right, Prism will not work with Open MPI, which Sun's ClusterTools
7 is based on.  But Prism was not available for CT 6 either. TotalView
and Allinea's DDT, I believe, have both been tested to work with Open MPI.


-DON

victor marian wrote:


 I can't turn it off right now to look in BIOS (the
computer is not at home), but I think the Pentium D
which is dual-core doesn't support hyper-threading. 
 The program I made relies on an MPI library (it is

not a benchmarking program). I think you are right,
maybe I should run a benchmarking program first to see
what happens. If you have a benchmarking program I
would gladly test it. 
  What is the best way to debug OpenMPI programs?

Until now I ran prism which is part of the
SunClusterTools.

 Victor

--- Jeff Pummill  wrote:

 


Victor,

Just on a hunch, look in your BIOS to see if
Hyperthreading is turned 
on. If so, turn it off. We have seen some unusual
behavior on some of 
our machines unless this is disabled.


I am interested in your progress as I have just
begun working with 
OpenMPI as well. I have used mpich for quite some
time, but felt 
compelled to get some experience with OpenMPI as
well. I just installed 
it this weekend on an AMD dual-core machine with 2
gigs of ram. Maybe I 
will try and replicate your experiment if you can
direct me to what 
program you are benchmarking.


Jeff F. Pummill
Senior Linux Cluster Administrator
University of Arkansas
Fayetteville, Arkansas 72701
(479) 575 - 4590
http://hpc.uark.edu

victor marian wrote:
   


The problem is that my executable file runs on the
Pentium D in 80 seconds on two cores and in 25
 


seconds
   


on one core.
And on another Sun SMP machine with 20 processors
 


it
   


runs perfectly (the problem is perfectly
 


scallable).
   


 Victor Marian
 Laboratory of Machine Elements and Tribology
 University Politehnica of Bucharest
 Romania


--- Brock Palen  wrote:

 
 


It means that your OMPI was compiled to support
uDAPL  (a type of  
infinibad network)  but that your computer does
   


not
   

have such a card  
installed.  Because you dont it will fall back to
ethernet.  But  
because you are just running on a single machine.
   

You will use the  
fastest form of communication using shared
   

memory. 
   

so you can ignore  
that message.  Unless in the future you add a
   


uDAPL
   

powered network  
and you still get that message then you need to

worry.

Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985


On Jun 10, 2007, at 9:18 AM, victor marian wrote:

   
   


Hello,

I have a Pentium D computer with Solaris 10
 
 


installed.
   
   


I installed OpenMPI, succesfully compiled my
 
 


Fortran
   
   


program, but when giving
mpirun -np 2 progexe
I receive
[0,1,0]: uDAPL on host SERVSOLARIS was unable to
 
 


find
   
   


any NICs.
Another transport will be used instead, although
 
 


this
   
   


may result in
lower performance.

I am a begginer in MPI and don't know what it
 
 


means.
   
   


What
should I do to solve the problem?
Thank you.







 
 


__
 

 
 


__
Moody friends. Drama queens. Your life? Nope! -
 
 

their life, your  
   
   


story. Play Sims Stories at Yahoo! Games.
http://sims.yahoo.com/
___
users mailing list
us...@open-mpi.org

 


http://www.open-mpi.org/mailman/listinfo.cgi/users
   

 
 


___
users mailing list
us...@open-mpi.org

   


http://www.open-mpi.org/mailman/listinfo.cgi/users
   

   
   



  

 



 


Moody friends. Drama queens. Your life? Nope! -
 


their life, your story. Play Sims Stories at Yahoo!
Games.
   

http://sims.yahoo.com/  
___

users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
 
 


--

   


___
 


users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
   





 

Luggage? GPS? Comic books? 
Check out fitting gifts for grads at Yahoo! Search

http://search.yahoo.com/search?fr=oni_on_mail&p=graduation+gifts&cs=bz
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
 



Re: [OMPI users] Problem running MPI on a dual-core pentium D

2007-06-11 Thread victor marian

Hi Don,

Seeing your mail, I suppose you are working at Sun. We
have a Sun 1 Server at our university, and my
program runs almost as fast on 16 UltraSPARC II
processors as on a Pentium D. The program is perfectly
scalable. I am a little bit disappointed. Our SPARC
II processors are at 400MHz, and the Pentium D at
2.8GHz. I could expect the Pentium to be 4 times
faster, but not 16 times.
I wonder how a SPARC IV would perform.

 Victor



--- Don Kerr  wrote:

> Additionally, Solaris comes with the IB drivers and
> since the libs are 
> there OMPI thinks that it is available. You can
> suppress this message with 
> --mca btl_base_warn_component_unused 0
> or specifically call out the btls you wish to use,
> example
> --mca btl self,sm,tcp
> 
> Brock Palen wrote:
> 
> >It means that your OMPI was compiled to support uDAPL (a type of
> >InfiniBand network) but that your computer does not have such a card
> >installed.  Because you don't, it will fall back to ethernet.  But
> >because you are just running on a single machine, you will use the
> >fastest form of communication, shared memory, so you can ignore
> >that message.  Unless in the future you add a uDAPL-powered network
> >and you still get that message, then you need to worry.
> >
> >Brock Palen
> >Center for Advanced Computing
> >bro...@umich.edu
> >(734)936-1985
> >
> >
> >On Jun 10, 2007, at 9:18 AM, victor marian wrote:
> >
> >  
> >
> >>Hello,
> >>
> >>I have a Pentium D computer with Solaris 10
> installed.
> >>I installed OpenMPI, successfully compiled my
> Fortran
> >>program, but when giving
> >>mpirun -np 2 progexe
> >>I receive
> >>[0,1,0]: uDAPL on host SERVSOLARIS was unable to
> find
> >>any NICs.
> >>Another transport will be used instead, although
> this
> >>may result in
> >>lower performance.
> >>
> >>I am a beginner in MPI and don't know what it
> means.
> >>What
> >> should I do to solve the problem?
> >>Thank you.
> >>
> >>
> >>
> >>
> >>
> >>
>
> >>___
> >>users mailing list
> >>us...@open-mpi.org
> >>http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>
> >>
> >>
> >>
> >
> >___
> >users mailing list
> >us...@open-mpi.org
> >http://www.open-mpi.org/mailman/listinfo.cgi/users
> >  
> >
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 







Re: [OMPI users] v1.2.2 mca base unable to open pls/ras tm

2007-06-11 Thread Jeff Squyres

On Jun 11, 2007, at 8:55 AM, Cupp, Matthew R wrote:


Ah ha!  I didn't know that option was available as I didn't see it in
the documentation or in ./configure --help.


FWIW, the GNU Autoconf application creates configure scripts that  
automatically accept "without" and "disable" versions of all of its  
"with" and "enable" options as negation operators.  This is not Open  
MPI-specific functionality.
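
For example (the install prefix below is just a placeholder), a build
that skips the Torque support entirely would look something like:

  ./configure --prefix=/opt/openmpi --without-tm
  make all install

and the same --without-X / --disable-Y negation applies to any other
--with-X / --enable-Y option that configure advertises.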



I just ended up rebuilding and installing torque to my /opt/torque
share.  Thank you for your help with this.


This will likely be best; Open MPI will use the native Torque  
launching mechanisms for running MPI applications, which has a number  
of advantages for system monitoring and control.
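
In practice that means a Torque job script can simply call mpirun with
no hostfile; a sketch (the resource request and executable name are
only placeholders):

  #PBS -l nodes=2:ppn=2
  cd $PBS_O_WORKDIR
  mpirun -np 4 ./my_mpi_prog

because an Open MPI built with TM support discovers the allocated
nodes directly from Torque.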



Matt

__
Matt Cupp
Battelle Memorial Institute
Statistics and Information Analysis


-Original Message-
From: users-boun...@open-mpi.org [mailto:users-bounces@open- 
mpi.org] On

Behalf Of Brian Barrett
Sent: Friday, June 08, 2007 3:15 PM
To: Open MPI Users
Subject: Re: [OMPI users] v1.2.2 mca base unable to open pls/ras tm

Or tell Open MPI not to build torque support, which can be done at
configure time with the --without-tm option.

Open MPI tries to build support for whatever it finds in the default
search paths, plus whatever things you specify the location of.  Most
of the time, this is what the user wants.  In this case, however,
it's not what you wanted so you'll have to add the --without-tm  
option.


Hope this helps,

Brian


On Jun 8, 2007, at 1:08 PM, Cupp, Matthew R wrote:


So I either have to uninstall torque, make the shared libraries
available on all nodes, or have torque as static libraries on the  
head

node?

__
Matt Cupp
Battelle Memorial Institute
Statistics and Information Analysis


-Original Message-
From: users-boun...@open-mpi.org [mailto:users-bounces@open-
mpi.org] On
Behalf Of Jeff Squyres
Sent: Friday, June 08, 2007 2:21 PM
To: Open MPI Users
Subject: Re: [OMPI users] v1.2.2 mca base unable to open pls/ras tm

On Jun 8, 2007, at 2:06 PM, Cupp, Matthew R wrote:


Yes.  But the /opt/torque directory is just the source, not the
actual
installed directory.  The actual installed directory on the head
node is
the default location of /usr/lib/something.  And that is not
accessible
by every node.

But should it matter if it's not accessible if I don't specify
--with-tm?  I was wondering if ./configure detects torque has been
installed, and then builds the associated components under the
assumption that it's available.


This is what OMPI does.

However, if you only have static libraries for Torque, the issue
should be moot -- the relevant bits should be statically linked into
the OMPI tm plugins.  But if your Torque libraries are shared, then
you do need to have them available on all nodes for OMPI to be able
to leverage native Torque/TM support.

Make sense?

--
Jeff Squyres
Cisco Systems

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Problem running MPI on a dual-core pentium D

2007-06-11 Thread Don Kerr
Additionally, Solaris comes with the IB drivers and since the libs are 
there OMPI thinks that it is available. You can suppress this message with 
   --mca btl_base_warn_component_unused 0

or specifically call out the btls you wish to use, example
   --mca btl self,sm,tcp
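
On the mpirun command line that would look something like (using the
executable name from your earlier mail):

   mpirun --mca btl self,sm,tcp -np 2 progexe

which restricts Open MPI to the self, shared-memory, and TCP
transports and so skips the uDAPL probe altogether.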

Brock Palen wrote:

It means that your OMPI was compiled to support uDAPL (a type of
InfiniBand network) but that your computer does not have such a card
installed.  Because you don't, it will fall back to ethernet.  But
because you are just running on a single machine, you will use the
fastest form of communication, shared memory, so you can ignore
that message.  Unless in the future you add a uDAPL-powered network
and you still get that message, then you need to worry.


Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985


On Jun 10, 2007, at 9:18 AM, victor marian wrote:

 


Hello,

I have a Pentium D computer with Solaris 10 installed.
I installed OpenMPI, successfully compiled my Fortran
program, but when giving
mpirun -np 2 progexe
I receive
[0,1,0]: uDAPL on host SERVSOLARIS was unable to find
any NICs.
Another transport will be used instead, although this
may result in
lower performance.

I am a beginner in MPI and don't know what it means.
What
should I do to solve the problem?
Thank you.






___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


   



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
 



Re: [OMPI users] Library Definitions

2007-06-11 Thread Jeff Pummill

Glad to contribute Victor!

I am running on a home workstation that uses an AMD 3800 cpu attached to 
2 gigs of ram.
My timings for FT were 175 secs with one core and 110 on two cores with 
-O3 and -mtune=amd64 as tuning options.


Brock, Terry and Jeff are all exactly correct in their comments 
regarding benchmarks. There are simply too many variables to contend 
with. In addition, one and two core runs on a single workstation 
probably aren't the best evaluation of OpenMPI. As you expand to more 
devices and generate bigger problems (HPL or HPCC for example), a better 
overall picture will emerge.
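
When you do scale out, the only extra piece needed on the Open MPI side
is a hostfile; for example (host names are made up here):

  # contents of a file called myhosts
  node01 slots=2
  node02 slots=2

  mpirun --hostfile myhosts -np 4 ./xhpl

with ./xhpl (or the HPCC binary) built against the same Open MPI
installation.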



Jeff F. Pummill
Senior Linux Cluster Administrator
University of Arkansas



victor marian wrote:

  Thank you everybody for the advices.
  I ran the NAS benchmark class B and it runs in 181
seconds on one core and in 90 seconds on two cores, so
it scales almost perfectly.
  What were your timings, Jeff, and what processor do
you exactly have?
  Mine is a Pentium D at 2.8GHz.

 Victor


--- Jeff Pummill  wrote:

  

Victor,

Build the FT benchmark and build it as a class B
problem. This will run 
in the 1-2 minute range instead of 2-4 seconds the
CG class A benchmark 
does.



Jeff F. Pummill
Senior Linux Cluster Administrator
University of Arkansas



Terry Frankcombe wrote:


Hi Victor

I'd suggest 3 seconds of CPU time is far too small a problem to do
scaling tests with.  Even with only 2 CPUs, I wouldn't go below 100
times that.


On Mon, 2007-06-11 at 01:10 -0700, victor marian wrote:

Hi Jeff

I ran the NAS Parallel Benchmark and it gives for me
-bash%/export/home/vmarian/fortran/benchmarks/NPB3.2/NPB3.2-MPI/bin$
mpirun -np 1 cg.A.1
--
[0,1,0]: uDAPL on host SERVSOLARIS was unable to find
any NICs.
Another transport will be used instead, although this
may result in
lower performance.
--
 NAS Parallel Benchmarks 3.2 -- CG Benchmark

 Size:  14000
 Iterations:15
 Number of active processes: 1
 Number of nonzeroes per row:   11
 Eigenvalue shift: .200E+02
 Benchmark completed
 VERIFICATION SUCCESSFUL
 Zeta is  0.171302350540E+02
 Error is 0.512264003323E-13


 CG Benchmark Completed.
 Class   =A
 Size=14000
 Iterations  =   15
 Time in seconds = 3.02
 Total processes =1
 Compiled procs  =1
 Mop/s total =   495.93
 Mop/s/process   =   495.93
 Operation type  =   floating point
 Verification=   SUCCESSFUL
 Version =  3.2
 Compile date=  11 Jun 2007


-bash%/export/home/vmarian/fortran/benchmarks/NPB3.2/NPB3.2-MPI/bin$
mpirun -np 2 cg.A.2
--
[0,1,0]: uDAPL on host SERVSOLARIS was unable to find
any NICs.
Another transport will be used instead, although this
may result in
lower performance.
--
--
[0,1,1]: uDAPL on host SERVSOLARIS was unable to find
any NICs.
Another transport will be used instead, although this
may result in
lower performance.
--

 NAS Parallel Benchmarks 3.2 -- CG Benchmark

 Size:  14000
 Iterations:15
 Number of active processes: 2
 Number of nonzeroes per row:   11
 Eigenvalue shift: .200E+02

 Benchmark completed
 VERIFICATION SUCCESSFUL
 Zeta is  0.171302350540E+02
 Error is 0.522633719989E-13


 CG Benchmark Completed.
 Class   =A
 Size=14000
 Iterations  =   15
 Time in seconds = 2.47
 Total processes =2
 Compiled procs  =2
 Mop/s total =   606.32
 Mop/s/process   =   303.16
 Operation type  =   floating point
 Verification=   SUCCESSFUL
 Version =  3.2
 Compile date=  11 Jun 2007


You can remark that the scaling is not as good
as yours. Maybe I am having communication problems
between processors.
   You can also remark that I am faster on one process
compared to your processor.

   Victor




--- Jeff Pummill 

Re: [OMPI users] Library Definitions

2007-06-11 Thread victor marian
  Thank you everybody for the advices.
  I ran the NAS benchmark class B and it runs in 181
seconds on one core and in 90 seconds on two cores, so
it scales almost perfectly.
  What were your timings, Jeff, and what processor do
you exactly have?
  Mine is a Pentium D at 2.8GHz.

 Victor


--- Jeff Pummill  wrote:

> Victor,
> 
> Build the FT benchmark and build it as a class B
> problem. This will run 
> in the 1-2 minute range instead of 2-4 seconds the
> CG class A benchmark 
> does.
> 
> 
> Jeff F. Pummill
> Senior Linux Cluster Administrator
> University of Arkansas
> 
> 
> 
> Terry Frankcombe wrote:
> > Hi Victor
> >
> > I'd suggest 3 seconds of CPU time is far, far too
> small a problem to do
> > scaling tests with.  Even with only 2 CPUs, I
> wouldn't go below 100
> > times that.
> >
> >
> > On Mon, 2007-06-11 at 01:10 -0700, victor marian
> wrote:
> >   
> >> Hi Jeff
> >>
> >> I ran the NAS Parallel Benchmark and it gives for
> me
> >>
>
-bash%/export/home/vmarian/fortran/benchmarks/NPB3.2/NPB3.2-MPI/bin$
> >> mpirun -np 1 cg.A.1
> >>
>
--
> >> [0,1,0]: uDAPL on host SERVSOLARIS was unable to
> find
> >> any NICs.
> >> Another transport will be used instead, although
> this
> >> may result in
> >> lower performance.
> >>
>
--
> >>  NAS Parallel Benchmarks 3.2 -- CG Benchmark
> >>
> >>  Size:  14000
> >>  Iterations:15
> >>  Number of active processes: 1
> >>  Number of nonzeroes per row:   11
> >>  Eigenvalue shift: .200E+02
> >>  Benchmark completed
> >>  VERIFICATION SUCCESSFUL
> >>  Zeta is  0.171302350540E+02
> >>  Error is 0.512264003323E-13
> >>
> >>
> >>  CG Benchmark Completed.
> >>  Class   =A
> >>  Size=14000
> >>  Iterations  =   15
> >>  Time in seconds = 3.02
> >>  Total processes =1
> >>  Compiled procs  =1
> >>  Mop/s total =   495.93
> >>  Mop/s/process   =   495.93
> >>  Operation type  =   floating point
> >>  Verification=   SUCCESSFUL
> >>  Version =  3.2
> >>  Compile date=  11 Jun 2007
> >>
> >>
> >>
>
-bash%/export/home/vmarian/fortran/benchmarks/NPB3.2/NPB3.2-MPI/bin$
> >> mpirun -np 2 cg.A.2
> >>
>
--
> >> [0,1,0]: uDAPL on host SERVSOLARIS was unable to
> find
> >> any NICs.
> >> Another transport will be used instead, although
> this
> >> may result in
> >> lower performance.
> >>
>
--
> >>
>
--
> >> [0,1,1]: uDAPL on host SERVSOLARIS was unable to
> find
> >> any NICs.
> >> Another transport will be used instead, although
> this
> >> may result in
> >> lower performance.
> >>
>
--
> >>
> >>
> >>  NAS Parallel Benchmarks 3.2 -- CG Benchmark
> >>
> >>  Size:  14000
> >>  Iterations:15
> >>  Number of active processes: 2
> >>  Number of nonzeroes per row:   11
> >>  Eigenvalue shift: .200E+02
> >>
> >>  Benchmark completed
> >>  VERIFICATION SUCCESSFUL
> >>  Zeta is  0.171302350540E+02
> >>  Error is 0.522633719989E-13
> >>
> >>
> >>  CG Benchmark Completed.
> >>  Class   =A
> >>  Size=14000
> >>  Iterations  =   15
> >>  Time in seconds = 2.47
> >>  Total processes =2
> >>  Compiled procs  =2
> >>  Mop/s total =   606.32
> >>  Mop/s/process   =   303.16
> >>  Operation type  =   floating point
> >>  Verification=   SUCCESSFUL
> >>  Version =  3.2
> >>  Compile date=  11 Jun 2007
> >>
> >>
> >> You can remark that the scaling is not as good
> >> as yours. Maybe I am having communication
> >> problems between processors.
> >>You can also remark that I am faster on one
> >> process compared to your processor.
> >>
> >>Victor
> >>
> >>
> >>
> >>
> >>
> >> --- Jeff Pummill  wrote:
> >>
> >> 
> >>> Perfect! Thanks Jeff!
> >>>
> >>> The NAS Parallel Benchmark on a dual core AMD
> >>> machine now returns this...
> >>> [jpummil@localhost bin]$ mpirun -np 1 cg.A.1
> >>> NAS Parallel Benchmarks 3.2 -- CG Benchmark
> >>> CG Benchmark Completed.
> >>>  Class   =A
> >>>  Size=14000
> >>>  Iterations  =   15
> >>>  Time in

Re: [OMPI users] v1.2.2 mca base unable to open pls/ras tm

2007-06-11 Thread Cupp, Matthew R
Ah ha!  I didn't know that option was available as I didn't see it in
the documentation or in ./configure --help.

I just ended up rebuilding and installing torque to my /opt/torque
share.  Thank you for your help with this.

Matt

__
Matt Cupp
Battelle Memorial Institute
Statistics and Information Analysis


-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Brian Barrett
Sent: Friday, June 08, 2007 3:15 PM
To: Open MPI Users
Subject: Re: [OMPI users] v1.2.2 mca base unable to open pls/ras tm

Or tell Open MPI not to build torque support, which can be done at  
configure time with the --without-tm option.

Open MPI tries to build support for whatever it finds in the default  
search paths, plus whatever things you specify the location of.  Most  
of the time, this is what the user wants.  In this case, however,  
it's not what you wanted so you'll have to add the --without-tm option.

Hope this helps,

Brian


On Jun 8, 2007, at 1:08 PM, Cupp, Matthew R wrote:

> So I either have to uninstall torque, make the shared libraries
> available on all nodes, or have torque as static libraries on the head
> node?
>
> __
> Matt Cupp
> Battelle Memorial Institute
> Statistics and Information Analysis
>
>
> -Original Message-
> From: users-boun...@open-mpi.org [mailto:users-bounces@open- 
> mpi.org] On
> Behalf Of Jeff Squyres
> Sent: Friday, June 08, 2007 2:21 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] v1.2.2 mca base unable to open pls/ras tm
>
> On Jun 8, 2007, at 2:06 PM, Cupp, Matthew R wrote:
>
>> Yes.  But the /opt/torque directory is just the source, not the  
>> actual
>> installed directory.  The actual installed directory on the head
>> node is
>> the default location of /usr/lib/something.  And that is not
>> accessible
>> by every node.
>>
>> But should it matter if it's not accessible if I don't specify
>> --with-tm?  I was wondering if ./configure detects torque has been
>> installed, and then builds the associated components under the
>> assumption that it's available.
>
> This is what OMPI does.
>
> However, if you only have static libraries for Torque, the issue
> should be moot -- the relevant bits should be statically linked into
> the OMPI tm plugins.  But if your Torque libraries are shared, then
> you do need to have them available on all nodes for OMPI to be able
> to leverage native Torque/TM support.
>
> Make sense?
>
> -- 
> Jeff Squyres
> Cisco Systems
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] Library Definitions

2007-06-11 Thread Jeff Pummill

Victor,

Build the FT benchmark and build it as a class B problem. This will run 
in the 1-2 minute range instead of 2-4 seconds the CG class A benchmark 
does.
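
From the NPB3.2-MPI directory that should be something like the
following (hedging a bit, since the exact targets depend on your
config/make.def):

  make ft CLASS=B NPROCS=2
  mpirun -np 2 bin/ft.B.2

and the resulting run is long enough that OS noise mostly averages out
of the timings.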



Jeff F. Pummill
Senior Linux Cluster Administrator
University of Arkansas



Terry Frankcombe wrote:

Hi Victor

I'd suggest 3 seconds of CPU time is far, far too small a problem to do
scaling tests with.  Even with only 2 CPUs, I wouldn't go below 100
times that.


On Mon, 2007-06-11 at 01:10 -0700, victor marian wrote:
  

Hi Jeff

I ran the NAS Parallel Benchmark and it gives for me
-bash%/export/home/vmarian/fortran/benchmarks/NPB3.2/NPB3.2-MPI/bin$
mpirun -np 1 cg.A.1
--
[0,1,0]: uDAPL on host SERVSOLARIS was unable to find
any NICs.
Another transport will be used instead, although this
may result in
lower performance.
--
 NAS Parallel Benchmarks 3.2 -- CG Benchmark

 Size:  14000
 Iterations:15
 Number of active processes: 1
 Number of nonzeroes per row:   11
 Eigenvalue shift: .200E+02
 Benchmark completed
 VERIFICATION SUCCESSFUL
 Zeta is  0.171302350540E+02
 Error is 0.512264003323E-13


 CG Benchmark Completed.
 Class   =A
 Size=14000
 Iterations  =   15
 Time in seconds = 3.02
 Total processes =1
 Compiled procs  =1
 Mop/s total =   495.93
 Mop/s/process   =   495.93
 Operation type  =   floating point
 Verification=   SUCCESSFUL
 Version =  3.2
 Compile date=  11 Jun 2007


-bash%/export/home/vmarian/fortran/benchmarks/NPB3.2/NPB3.2-MPI/bin$
mpirun -np 2 cg.A.2
--
[0,1,0]: uDAPL on host SERVSOLARIS was unable to find
any NICs.
Another transport will be used instead, although this
may result in
lower performance.
--
--
[0,1,1]: uDAPL on host SERVSOLARIS was unable to find
any NICs.
Another transport will be used instead, although this
may result in
lower performance.
--


 NAS Parallel Benchmarks 3.2 -- CG Benchmark

 Size:  14000
 Iterations:15
 Number of active processes: 2
 Number of nonzeroes per row:   11
 Eigenvalue shift: .200E+02

 Benchmark completed
 VERIFICATION SUCCESSFUL
 Zeta is  0.171302350540E+02
 Error is 0.522633719989E-13


 CG Benchmark Completed.
 Class   =A
 Size=14000
 Iterations  =   15
 Time in seconds = 2.47
 Total processes =2
 Compiled procs  =2
 Mop/s total =   606.32
 Mop/s/process   =   303.16
 Operation type  =   floating point
 Verification=   SUCCESSFUL
 Version =  3.2
 Compile date=  11 Jun 2007


You can remark that the scaling is not as good
as yours. Maybe I am having communication problems
between processors.
   You can also remark that I am faster on one process
compared to your processor.

   Victor





--- Jeff Pummill  wrote:



Perfect! Thanks Jeff!

The NAS Parallel Benchmark on a dual core AMD
machine now returns this...
[jpummil@localhost bin]$ mpirun -np 1 cg.A.1
NAS Parallel Benchmarks 3.2 -- CG Benchmark
CG Benchmark Completed.
 Class   =A
 Size=14000
 Iterations  =   15
 Time in seconds = 4.75
 Total processes =1
 Compiled procs  =1
 Mop/s total =   315.32

...and...

[jpummil@localhost bin]$ mpirun -np 2 cg.A.2
NAS Parallel Benchmarks 3.2 -- CG Benchmark
 CG Benchmark Completed.
 Class   =A
 Size=14000
 Iterations  =   15
 Time in seconds = 2.48
 Total processes =2
 Compiled procs  =2
 Mop/s total =   604.46

Not quite linear, but one must account for all of
the OS traffic that 
one core or the other must deal with.



Jeff F. Pummill
Senior Linux Cluster Administrator
University of Arkansas
Fayetteville, Arkansas 72701
(479) 575 - 4590
http://hpc.uark.edu

"A supercomputer is a device for turning
compute-bound
problems into I/O-bound problems." -Seymour Cray


Jeff Squyres wrote:
  

Just remove the -L and 

Re: [OMPI users] Library Definitions

2007-06-11 Thread Brock Palen

I agree.  I like benchmarks to run 15 minutes to 24 hours.

Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985


On Jun 11, 2007, at 4:17 AM, Terry Frankcombe wrote:



Hi Victor

I'd suggest 3 seconds of CPU time is far, far too small a problem to do
scaling tests with.  Even with only 2 CPUs, I wouldn't go below 100
times that.


On Mon, 2007-06-11 at 01:10 -0700, victor marian wrote:

Hi Jeff

I ran the NAS Parallel Benchmark and it gives for me
-bash%/export/home/vmarian/fortran/benchmarks/NPB3.2/NPB3.2-MPI/bin$
mpirun -np 1 cg.A.1
- 
-

[0,1,0]: uDAPL on host SERVSOLARIS was unable to find
any NICs.
Another transport will be used instead, although this
may result in
lower performance.
- 
-

 NAS Parallel Benchmarks 3.2 -- CG Benchmark

 Size:  14000
 Iterations:15
 Number of active processes: 1
 Number of nonzeroes per row:   11
 Eigenvalue shift: .200E+02
 Benchmark completed
 VERIFICATION SUCCESSFUL
 Zeta is  0.171302350540E+02
 Error is 0.512264003323E-13


 CG Benchmark Completed.
 Class   =A
 Size=14000
 Iterations  =   15
 Time in seconds = 3.02
 Total processes =1
 Compiled procs  =1
 Mop/s total =   495.93
 Mop/s/process   =   495.93
 Operation type  =   floating point
 Verification=   SUCCESSFUL
 Version =  3.2
 Compile date=  11 Jun 2007


-bash%/export/home/vmarian/fortran/benchmarks/NPB3.2/NPB3.2-MPI/bin$
mpirun -np 2 cg.A.2
- 
-

[0,1,0]: uDAPL on host SERVSOLARIS was unable to find
any NICs.
Another transport will be used instead, although this
may result in
lower performance.
- 
-
- 
-

[0,1,1]: uDAPL on host SERVSOLARIS was unable to find
any NICs.
Another transport will be used instead, although this
may result in
lower performance.
- 
-



 NAS Parallel Benchmarks 3.2 -- CG Benchmark

 Size:  14000
 Iterations:15
 Number of active processes: 2
 Number of nonzeroes per row:   11
 Eigenvalue shift: .200E+02

 Benchmark completed
 VERIFICATION SUCCESSFUL
 Zeta is  0.171302350540E+02
 Error is 0.522633719989E-13


 CG Benchmark Completed.
 Class   =A
 Size=14000
 Iterations  =   15
 Time in seconds = 2.47
 Total processes =2
 Compiled procs  =2
 Mop/s total =   606.32
 Mop/s/process   =   303.16
 Operation type  =   floating point
 Verification=   SUCCESSFUL
 Version =  3.2
 Compile date=  11 Jun 2007


You can remark that the scaling is not as good
as yours. Maybe I am having communication problems
between processors.
   You can also remark that I am faster on one process
compared to your processor.

   Victor





--- Jeff Pummill  wrote:


Perfect! Thanks Jeff!

The NAS Parallel Benchmark on a dual core AMD
machine now returns this...
[jpummil@localhost bin]$ mpirun -np 1 cg.A.1
NAS Parallel Benchmarks 3.2 -- CG Benchmark
CG Benchmark Completed.
 Class   =A
 Size=14000
 Iterations  =   15
 Time in seconds = 4.75
 Total processes =1
 Compiled procs  =1
 Mop/s total =   315.32

...and...

[jpummil@localhost bin]$ mpirun -np 2 cg.A.2
NAS Parallel Benchmarks 3.2 -- CG Benchmark
 CG Benchmark Completed.
 Class   =A
 Size=14000
 Iterations  =   15
 Time in seconds = 2.48
 Total processes =2
 Compiled procs  =2
 Mop/s total =   604.46

Not quite linear, but one must account for all of
the OS traffic that
one core or the other must deal with.


Jeff F. Pummill
Senior Linux Cluster Administrator
University of Arkansas
Fayetteville, Arkansas 72701
(479) 575 - 4590
http://hpc.uark.edu

"A supercomputer is a device for turning
compute-bound
problems into I/O-bound problems." -Seymour Cray


Jeff Squyres wrote:

Just remove the -L and -l arguments -- OMPI's

"mpif90" (and other

wrapper compilers) will do all

Re: [OMPI users] Timing communication

2007-06-11 Thread Jeff Squyres
Measuring communications is a very tricky process; there's a lot of  
factors involved.  Check out this FAQ item:


http://www.open-mpi.org/faq/?category=tuning#running-perf-numbers

You might want to use a well-known benchmark program (e.g., NetPIPE,  
link checker, etc.) to run pair-wise communication performance  
analysis rather than write your own application; it's typically not  
as simple as just doing a few sends within a loop.
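
For example, assuming NetPIPE's MPI build (which produces an executable
called NPmpi), a two-process run over whatever transport Open MPI
selects is just:

  mpirun -np 2 ./NPmpi

and comparing a run with both processes on one node against a run
spanning two nodes is the quickest way to see the shared-memory vs.
TCP difference.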


The issue is that MPI may make different decisions on how to send  
messages, including factors such as:


- is this the first time you have sent between these peer pair?
- who are you sending to?
- what is the size of the message?
- are there other messages pending?
- are other messages incoming from different peers while you are  
sending?


Your simplistic loop below can cause some "bad" things to happen  
(i.e., not give a true/absolute measure of what max performance is  
between a pair of peers) by unintentionally stepping on several of  
the things that Open MPI does behind the scenes (e.g., we don't make  
network connections until the first time a message is sent between a  
given peer pair).
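
If you do want to roll your own, a minimal sketch along the following
lines (message size, repeat counts, and variable names are arbitrary
here) at least keeps that connection-setup cost out of the timed part
by doing a few untimed warm-up exchanges first:

#include <mpi.h>
#include <iostream>
#include <vector>

int main(int argc, char** argv)
{
    MPI::Init(argc, argv);
    const int rank   = MPI::COMM_WORLD.Get_rank();
    const int n      = 100000;           // doubles per message (placeholder)
    const int warmup = 10, reps = 100;   // untimed vs. timed round trips
    std::vector<double> buf(n, 0.0);
    double t0 = 0.0;

    for (int k = 0; k < warmup + reps; ++k) {
        if (k == warmup) t0 = MPI::Wtime();   // start the clock after warm-up
        if (rank == 0) {
            MPI::COMM_WORLD.Send(&buf[0], n, MPI::DOUBLE, 1, 0);
            MPI::COMM_WORLD.Recv(&buf[0], n, MPI::DOUBLE, 1, 0);
        } else if (rank == 1) {
            MPI::COMM_WORLD.Recv(&buf[0], n, MPI::DOUBLE, 0, 0);
            MPI::COMM_WORLD.Send(&buf[0], n, MPI::DOUBLE, 0, 0);
        }
    }
    if (rank == 0) {
        std::cout << "average round trip: "
                  << (MPI::Wtime() - t0) / reps << " s" << std::endl;
    }
    MPI::Finalize();
    return 0;
}

Run it with "mpirun -np 2 a.out" once with both processes on the same
node and once with them on different nodes; the sm vs. tcp difference
(if any) should show up in the averaged round-trip time.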


But on the flip side, there's a whole school of thought that micro  
benchmarks are only useful in a limited sense (because they test  
artificial scenarios), and the only thing that *really* matters is  
your application's performance.  Hence, micro benchmarks are good as  
input for guiding tuning issues, but they are not the absolute  
measure of how well a given OS/middleware/network are performing.   
That being said, a poorly-written application will tend to perform  
poorly regardless of how well the OS/middleware/network performs.


And so on.

This is an age-old religious debate, and both sides have some good  
points.  I won't re-hash the entire debate here.  :-)



On Jun 4, 2007, at 10:00 AM, Allan, Mark ((UK Filton)) wrote:


Hi,

I'm new to this list and wonder if anyone can help.  I'm trying to  
measure communication time between parallel processes using  
openmpi.  As an example I might be running on 4 dual core  
processors (8 processes in total).  I was hoping that communication  
using shared memory (comms between dual cores on the same chip)  
would be faster than that over the network.  To measure  
communication time I'm sending a block of data to each process  
(from each process) using a blocking send, and am timing how long  
it takes.  I repeat this 50 times (for example) and take the  
average time.  The code is something like:


 // numProcs = MPI::COMM_WORLD.Get_size(), my_rank = this process's rank
 for(int i=0;i<numProcs;i++)
 {
  for(int j=0;j<numProcs;j++)
  {
   double time=0.0;
   for(int n=0;n<50;n++)
   {
    if(i==my_rank)
    {
     double start = MPI::Wtime();
     MPI::COMM_WORLD.Send(&sendData[0],dataSize,MPI::DOUBLE,j,i);
     double end = MPI::Wtime();
     time+=(end-start);
    }
    if(j==my_rank)
    {
     MPI::COMM_WORLD.Recv(&recvData[0],dataSize,MPI::DOUBLE,i,i);
    }
   }
   if(i==my_rank)
    out << i << " " << j << " " << time/50.0 << std::endl;

   MPI::COMM_WORLD.Barrier();
  }
 }

The problem I am having is that I'm not noticing any appreciable  
difference in communication times between shared memory and network  
protocols.  I expected shared memory to be faster(!?!).


Does anyone have a better way of measuring communication times?

Thanks,

Mark.

This email and any attachments are confidential to the intended
recipient and may also be privileged. If you are not the intended
recipient please delete it from your system and notify the sender.
You should not copy it or use it for any purpose nor disclose or
distribute its contents to any other person.


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Library Definitions

2007-06-11 Thread Terry Frankcombe

Hi Victor

I'd suggest 3 seconds of CPU time is far, far too small a problem to do
scaling tests with.  Even with only 2 CPUs, I wouldn't go below 100
times that.


On Mon, 2007-06-11 at 01:10 -0700, victor marian wrote:
> Hi Jeff
> 
> I ran the NAS Parallel Benchmark and it gives for me
> -bash%/export/home/vmarian/fortran/benchmarks/NPB3.2/NPB3.2-MPI/bin$
> mpirun -np 1 cg.A.1
> --
> [0,1,0]: uDAPL on host SERVSOLARIS was unable to find
> any NICs.
> Another transport will be used instead, although this
> may result in
> lower performance.
> --
>  NAS Parallel Benchmarks 3.2 -- CG Benchmark
> 
>  Size:  14000
>  Iterations:15
>  Number of active processes: 1
>  Number of nonzeroes per row:   11
>  Eigenvalue shift: .200E+02
>  Benchmark completed
>  VERIFICATION SUCCESSFUL
>  Zeta is  0.171302350540E+02
>  Error is 0.512264003323E-13
> 
> 
>  CG Benchmark Completed.
>  Class   =A
>  Size=14000
>  Iterations  =   15
>  Time in seconds = 3.02
>  Total processes =1
>  Compiled procs  =1
>  Mop/s total =   495.93
>  Mop/s/process   =   495.93
>  Operation type  =   floating point
>  Verification=   SUCCESSFUL
>  Version =  3.2
>  Compile date=  11 Jun 2007
> 
> 
> -bash%/export/home/vmarian/fortran/benchmarks/NPB3.2/NPB3.2-MPI/bin$
> mpirun -np 2 cg.A.2
> --
> [0,1,0]: uDAPL on host SERVSOLARIS was unable to find
> any NICs.
> Another transport will be used instead, although this
> may result in
> lower performance.
> --
> --
> [0,1,1]: uDAPL on host SERVSOLARIS was unable to find
> any NICs.
> Another transport will be used instead, although this
> may result in
> lower performance.
> --
> 
> 
>  NAS Parallel Benchmarks 3.2 -- CG Benchmark
> 
>  Size:  14000
>  Iterations:15
>  Number of active processes: 2
>  Number of nonzeroes per row:   11
>  Eigenvalue shift: .200E+02
> 
>  Benchmark completed
>  VERIFICATION SUCCESSFUL
>  Zeta is  0.171302350540E+02
>  Error is 0.522633719989E-13
> 
> 
>  CG Benchmark Completed.
>  Class   =A
>  Size=14000
>  Iterations  =   15
>  Time in seconds = 2.47
>  Total processes =2
>  Compiled procs  =2
>  Mop/s total =   606.32
>  Mop/s/process   =   303.16
>  Operation type  =   floating point
>  Verification=   SUCCESSFUL
>  Version =  3.2
>  Compile date=  11 Jun 2007
> 
> 
> You can remark that the scaling is not as good
> as yours. Maybe I am having communication problems
> between processors.
> You can also remark that I am faster on one process
> compared to your processor.
> 
>Victor
> 
> 
> 
> 
> 
> --- Jeff Pummill  wrote:
> 
> > Perfect! Thanks Jeff!
> > 
> > The NAS Parallel Benchmark on a dual core AMD
> > machine now returns this...
> > [jpummil@localhost bin]$ mpirun -np 1 cg.A.1
> > NAS Parallel Benchmarks 3.2 -- CG Benchmark
> > CG Benchmark Completed.
> >  Class   =A
> >  Size=14000
> >  Iterations  =   15
> >  Time in seconds = 4.75
> >  Total processes =1
> >  Compiled procs  =1
> >  Mop/s total =   315.32
> > 
> > ...and...
> > 
> > [jpummil@localhost bin]$ mpirun -np 2 cg.A.2
> > NAS Parallel Benchmarks 3.2 -- CG Benchmark
> >  CG Benchmark Completed.
> >  Class   =A
> >  Size=14000
> >  Iterations  =   15
> >  Time in seconds = 2.48
> >  Total processes =2
> >  Compiled procs  =2
> >  Mop/s total =   604.46
> > 
> > Not quite linear, but one must account for all of
> > the OS traffic that 
> > one core or the other must deal with.
> > 
> > 
> > Jeff F. Pummill
> > Senior Linux Cluster Administrator
> > University of Arkansas
> > Fayetteville, Arkansas 72701
> > (479) 575 - 4590
> > http://hpc.uark.edu
> > 
> > "A supercomputer is a device for turning
> > compute-bound
> > probl

Re: [OMPI users] Library Definitions

2007-06-11 Thread victor marian
Hi Jeff

I ran the NAS Parallel Benchmark and it gives for me
-bash%/export/home/vmarian/fortran/benchmarks/NPB3.2/NPB3.2-MPI/bin$
mpirun -np 1 cg.A.1
--
[0,1,0]: uDAPL on host SERVSOLARIS was unable to find
any NICs.
Another transport will be used instead, although this
may result in
lower performance.
--
 NAS Parallel Benchmarks 3.2 -- CG Benchmark

 Size:  14000
 Iterations:15
 Number of active processes: 1
 Number of nonzeroes per row:   11
 Eigenvalue shift: .200E+02
 Benchmark completed
 VERIFICATION SUCCESSFUL
 Zeta is  0.171302350540E+02
 Error is 0.512264003323E-13


 CG Benchmark Completed.
 Class   =A
 Size=14000
 Iterations  =   15
 Time in seconds = 3.02
 Total processes =1
 Compiled procs  =1
 Mop/s total =   495.93
 Mop/s/process   =   495.93
 Operation type  =   floating point
 Verification=   SUCCESSFUL
 Version =  3.2
 Compile date=  11 Jun 2007


-bash%/export/home/vmarian/fortran/benchmarks/NPB3.2/NPB3.2-MPI/bin$
mpirun -np 2 cg.A.2
--
[0,1,0]: uDAPL on host SERVSOLARIS was unable to find
any NICs.
Another transport will be used instead, although this
may result in
lower performance.
--
--
[0,1,1]: uDAPL on host SERVSOLARIS was unable to find
any NICs.
Another transport will be used instead, although this
may result in
lower performance.
--


 NAS Parallel Benchmarks 3.2 -- CG Benchmark

 Size:  14000
 Iterations:15
 Number of active processes: 2
 Number of nonzeroes per row:   11
 Eigenvalue shift: .200E+02

 Benchmark completed
 VERIFICATION SUCCESSFUL
 Zeta is  0.171302350540E+02
 Error is 0.522633719989E-13


 CG Benchmark Completed.
 Class   =A
 Size=14000
 Iterations  =   15
 Time in seconds = 2.47
 Total processes =2
 Compiled procs  =2
 Mop/s total =   606.32
 Mop/s/process   =   303.16
 Operation type  =   floating point
 Verification=   SUCCESSFUL
 Version =  3.2
 Compile date=  11 Jun 2007


You can remark that the scaling is not as good
as yours. Maybe I am having communication problems
between processors.
   You can also remark that I am faster on one process
compared to your processor.

   Victor





--- Jeff Pummill  wrote:

> Perfect! Thanks Jeff!
> 
> The NAS Parallel Benchmark on a dual core AMD
> machine now returns this...
> [jpummil@localhost bin]$ mpirun -np 1 cg.A.1
> NAS Parallel Benchmarks 3.2 -- CG Benchmark
> CG Benchmark Completed.
>  Class   =A
>  Size=14000
>  Iterations  =   15
>  Time in seconds = 4.75
>  Total processes =1
>  Compiled procs  =1
>  Mop/s total =   315.32
> 
> ...and...
> 
> [jpummil@localhost bin]$ mpirun -np 2 cg.A.2
> NAS Parallel Benchmarks 3.2 -- CG Benchmark
>  CG Benchmark Completed.
>  Class   =A
>  Size=14000
>  Iterations  =   15
>  Time in seconds = 2.48
>  Total processes =2
>  Compiled procs  =2
>  Mop/s total =   604.46
> 
> Not quite linear, but one must account for all of
> the OS traffic that 
> one core or the other must deal with.
> 
> 
> Jeff F. Pummill
> Senior Linux Cluster Administrator
> University of Arkansas
> Fayetteville, Arkansas 72701
> (479) 575 - 4590
> http://hpc.uark.edu
> 
> "A supercomputer is a device for turning
> compute-bound
> problems into I/O-bound problems." -Seymour Cray
> 
> 
> Jeff Squyres wrote:
> > Just remove the -L and -l arguments -- OMPI's
> "mpif90" (and other  
> > wrapper compilers) will do all that magic for you.
> >
> > Many -L/-l arguments in MPI application Makefiles
> are throwbacks to  
> > older versions of MPICH wrapper compilers that
> didn't always work  
> > properly.  Those days are long gone; most (all?)
> MPI wrapper  
> > compilers do not need you to specify -L/-l these
> days.
> >
> >
> >
> > On Jun 1