Re: [OMPI users] Open MPI instructional videos

2008-05-27 Thread Graham Jenkins
Jeff Squyres wrote:
> Over the past year or two, I have been slowly creating a large set of  
> Open MPI training material that I've used to present to my company's  
> customers and partners.  I have just recently received permission to  
> release all of my slides to the greater HPC community.  Woo hoo!

Great idea Jeff, sounds really useful.  But where do I find them?
-- 
Graham Jenkins
Senior Software Specialist, eResearch
Monash University (Clayton Campus, Bldg 11, Rm S503)

Email: graham.jenk...@its.monash.edu.au
Tel:   +613 9905-5942 (office)   +614 4850-2491 (mobile)


[OMPI users] Different Interfaces on Different Nodes .. OpenMPI 1.2.3, 1.2.4 ..

2008-04-14 Thread Graham Jenkins
We're moving from using a single (eth0) interface on our execute nodes
to using a bond interface (bond0) for resilience.
And what we're seeing on those nodes which have been upgraded is:
--
[0,1,1][btl_tcp_component.c:349:mca_btl_tcp_component_create_instances]
invalid interface "eth0"
--

This, of course, is because all nodes share a common copy of
openmpi-mca-params.conf .. in which it says:
--
btl_tcp_if_include=eth0
--

So .. does anybody have a suggestion for a way around this during our
migration/upgrade period?
If we place "bond0" in there as well, then we get error messages about
whichever one is absent on the node where execution is happening.
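
One possible approach, given as an untested sketch rather than a confirmed fix: Open MPI also reads MCA parameters from environment variables named OMPI_MCA_<parameter>, which normally take precedence over the shared params file. A small wrapper run on each node could therefore pick whichever interface exists there. The script name pick_iface.sh, the function pick_iface and the check against /sys/class/net are assumptions made for illustration.

```shell
#!/bin/sh
# pick_iface.sh (hypothetical name): choose the TCP interface per node.

# Print "bond0" if it exists under the given sysfs net directory
# (default: /sys/class/net on Linux), otherwise fall back to "eth0".
pick_iface() {
    netdir="${1:-/sys/class/net}"
    if [ -e "$netdir/bond0" ]; then
        echo bond0
    else
        echo eth0
    fi
}

# Set the MCA parameter through the environment for this process only.
OMPI_MCA_btl_tcp_if_include="$(pick_iface)"
export OMPI_MCA_btl_tcp_if_include

# Run the real MPI program, if one was given on the command line.
if [ "$#" -gt 0 ]; then
    exec "$@"
fi
```

It would be launched in place of the program itself, for example "mpirun -np 4 ./pick_iface.sh ./ring", so that each process makes the choice on its own node before MPI starts up.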

Regards ..
-- 
Graham Jenkins
Senior Software Specialist, eResearch
Monash University (Clayton Campus, Bldg 11, Rm S503)

Email: graham.jenk...@its.monash.edu.au
Tel:   +613 9905-5942 (office)   +614 4850-2491 (mobile)


[OMPI users] Excessive Use of CPU System Resources with OpenMPI 1.2.4 using TCP only ..

2008-01-22 Thread Graham Jenkins
We've observed an excessive use of CPU system resources with OpenMPI
1.2.4 using TCP connections only on our SL5 x86_64 Cluster. Typically,
for a simple Canonical Ring Program, we're seeing between 30 and 70%
system usage.

Has anybody else noticed this sort of behaviour?
And does anybody have some suggestions for resolving the issue?

Present values we have are:
--
ompi_info --param btl tcp |grep MCA
 MCA btl: parameter "btl_base_debug" (current value: "0")
 MCA btl: parameter "btl" (current value: )
 MCA btl: parameter "btl_base_verbose" (current value: "0")
 MCA btl: parameter "btl_tcp_if_include" (current value: "eth0")
 MCA btl: parameter "btl_tcp_if_exclude" (current value: "lo")
 MCA btl: parameter "btl_tcp_free_list_num" (current value: "8")
 MCA btl: parameter "btl_tcp_free_list_max" (current value: "-1")
 MCA btl: parameter "btl_tcp_free_list_inc" (current value: "32")
 MCA btl: parameter "btl_tcp_sndbuf" (current value: "131072")
 MCA btl: parameter "btl_tcp_rcvbuf" (current value: "131072")
 MCA btl: parameter "btl_tcp_endpoint_cache" (current value: "30720")
 MCA btl: parameter "btl_tcp_exclusivity" (current value: "0")
 MCA btl: parameter "btl_tcp_eager_limit" (current value: "65536")
 MCA btl: parameter "btl_tcp_min_send_size" (current value: "65536")
 MCA btl: parameter "btl_tcp_max_send_size" (current value: "131072")
 MCA btl: parameter "btl_tcp_min_rdma_size" (current value: "131072")
 MCA btl: parameter "btl_tcp_max_rdma_size" (current value: "2147483647")
 MCA btl: parameter "btl_tcp_flags" (current value: "122")
 MCA btl: parameter "btl_tcp_priority" (current value: "0")
 MCA btl: parameter "btl_base_warn_component_unused" (current value: "1")

-- 
Graham Jenkins
Senior Software Specialist, eResearch
Monash University

Email: graham.jenk...@its.monash.edu.au
Tel:   +613 9905-5942 (office)   +614 4850-2491 (mobile)


Re: [OMPI users] NAMD/Charm++ Looking for libmpich

2007-08-06 Thread Graham Jenkins
Brock Palen wrote:
> I have done work before to make namd and charm++ work with openMPI I  
> dont remember what but it is doable.  Something like removing -lmpich  
> was enough i think, maybe a hack to use mpiCC and -fPIC (pgi compilers).
> 
> I could look more if you want.
--

I'd really appreciate that Brock, thanks!  Where would one remove
"-lmpich" from?   I've had some difficulty finding it.

It actually builds OK using:
  ./build charm++ mpi-linux-amd64 ifort \
--basedir /opt/sw/openmpi-1.2.3-i

But it barfs when you try to do: "try out a sample program like
tests/charm++/simplearrayhello"

You can actually make the test compile by doing:
  cd /opt/sw/openmpi-1.2.3-i/lib ; ln -s libmpi.so.0.0.0 libmpich.so
  .. but I'm not sure that it's legit! :)



-- 
Graham Jenkins
Senior Software Specialist, E-Research

Email: graham.jenk...@its.monash.edu.au
Tel:   +613 9905-5942
Mob:   +614 4850-2491


[OMPI users] NAMD/Charm++ Looking for libmpich

2007-08-06 Thread Graham Jenkins
This item was originally sent to the NAMD mailing list, but it occurred
to me that it's something you guys may have seen in another vein .. and
may have a solution for ..

I'm trying to build charm++ on a SL5 x86_64 machine on which the
openmpi-1.1.1-5.el5.x86_64 RPM has been installed.

So here's the sequence:
--
cd charm-5.9
module load openmpi-intel
./build charm++ mpi-linux-amd64 --libdir=/usr/lib64/openmpi \
--incdir=/usr/include/openmpi

 ..
cd tests/charm++/simplearrayhello
 make
../../../bin/charmc  -language charm++ -o hello hello.o
/usr/bin/ld: cannot find -lmpich
collect2: ld returned 1 exit status
--

Bottom line .. charm++ doesn't know about libmpi, even though it exists
thus:
  ls -1 /opt/sw/openmpi-1.2.3-i/lib/libmpi.??
/opt/sw/openmpi-1.2.3-i/lib/libmpi.la
/opt/sw/openmpi-1.2.3-i/lib/libmpi.so

So .. anybody got a solution .. please?
-- 
Graham Jenkins
Senior Software Specialist, E-Research

Email: graham.jenk...@its.monash.edu.au
Tel:   +613 9905-5942
Mob:   +614 4850-2491


[OMPI users] Unable to find any HCAs ..

2007-07-04 Thread Graham Jenkins
I'm using the openmpi-1.1.1-5.el5.x86_64 RPM on a Scientific Linux 5 
cluster, with no installed HCAs. And a simple MPI job submitted to that 
cluster runs OK .. except that it issues messages for each node like the 
one shown below.  Is there some way I can suppress these, perhaps by an 
appropriate entry in /etc/openmpi-mca-params.conf ?


--
libibverbs: Fatal: couldn't open sysfs class 'infiniband_verbs'.
--
[0,1,0]: OpenIB on host localhost was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--

--
Graham Jenkins
Senior Software Specialist, E-Research

Email: graham.jenk...@its.monash.edu.au
Tel:   +613 9905-5942
Mob:   +614 4850-2491