Re: [OMPI devel] Open-MPI backwards compatibility and library version changes

2018-11-18 Thread Christopher Samuel
Hi Brian,

On 17/11/18 5:13 am, Barrett, Brian via devel wrote:

> Unfortunately, I don’t have a good idea of what to do now. We already 
> did the damage on the 3.x series. Our backwards compatibility testing 
> (as lame as it is) just links libmpi, so it’s all good. But if anyone 
> uses libtool, we’ll have a problem, because we install the .la files 
> that allow libtool to see the dependency of libmpi on libopen-pal, and 
> it gets too excited.
> 
> We’ll need to talk about how we think about this change in the future.

Thanks for that. Personally I think it's a misfeature in libtool to add 
these extra dependencies; it would be handy if there were a way to turn 
it off, but that's not your problem.
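
For anyone else bitten by this, one possible workaround (just a sketch, 
untested here; $OMPI_PREFIX stands in for the Open-MPI install prefix) is to 
neuter or remove the installed .la files so that libtool can no longer see 
the private dependencies:

  sed -i "s/^dependency_libs=.*/dependency_libs=''/" $OMPI_PREFIX/lib/*.la
  # or, more drastically, remove them so libtool falls back to the .so files:
  rm $OMPI_PREFIX/lib/*.la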

For us it just means that when we bring in a new Open-MPI we need to 
rebuild our installed libraries and codes against it; fortunately that's 
something that EasyBuild makes (relatively) easy.

Thanks for your time everyone - this is my last week at Swinburne before 
I leave Australia to start at NERSC in December!

All the best,
Chris
-- 
  Christopher Samuel OzGrav Senior Data Science Support
  ARC Centre of Excellence for Gravitational Wave Discovery
  http://www.ozgrav.org/  http://twitter.com/ozgrav

Re: [OMPI devel] Open-MPI backwards compatibility and library version changes

2018-11-14 Thread Christopher Samuel
On 15/11/18 12:10 pm, Christopher Samuel wrote:

> I wonder if it's because they use libtool instead?

Yup, it's libtool - using it to compile my toy example shows the same
behaviour, with "readelf -d" showing the private libraries as direct
dependencies. :-(

[csamuel@farnarkle2 libtool]$ cat hhgttg.c
int answer(void)
{
return(42);
}


[csamuel@farnarkle2 libtool]$ libtool compile gcc hhgttg.c -c -o hhgttg.o
libtool: compile:  gcc hhgttg.c -c  -fPIC -DPIC -o .libs/hhgttg.o
libtool: compile:  gcc hhgttg.c -c -o hhgttg.o >/dev/null 2>&1


[csamuel@farnarkle2 libtool]$ libtool link gcc -o libhhgttg.la hhgttg.lo 
-lmpi -rpath /usr/local/lib
libtool: link: gcc -shared  -fPIC -DPIC  .libs/hhgttg.o   -Wl,-rpath 
-Wl,/apps/skylake/software/compiler/gcc/6.4.0/openmpi/3.0.0/lib 
-Wl,-rpath 
-Wl,/apps/skylake/software/compiler/gcc/6.4.0/openmpi/3.0.0/lib 
/apps/skylake/software/compiler/gcc/6.4.0/openmpi/3.0.0/lib/libmpi.so 
-L/apps/skylake/software/core/gcccore/6.4.0/lib64 
-L/apps/skylake/software/core/gcccore/6.4.0/lib 
-L/apps/skylake/software/compiler/gcc/6.4.0/openmpi/3.0.0/lib 
/apps/skylake/software/compiler/gcc/6.4.0/openmpi/3.0.0/lib/libopen-rte.so 
/apps/skylake/software/compiler/gcc/6.4.0/openmpi/3.0.0/lib/libopen-pal.so 
-ldl -lrt -lutil -lm -lpthread -lz -lhwloc -Wl,-soname 
-Wl,libhhgttg.so.0 -o .libs/libhhgttg.so.0.0.0
libtool: link: (cd ".libs" && rm -f "libhhgttg.so.0" && ln -s 
"libhhgttg.so.0.0.0" "libhhgttg.so.0")
libtool: link: (cd ".libs" && rm -f "libhhgttg.so" && ln -s 
"libhhgttg.so.0.0.0" "libhhgttg.so")
libtool: link: ar cru .libs/libhhgttg.a  hhgttg.o
libtool: link: ranlib .libs/libhhgttg.a
libtool: link: ( cd ".libs" && rm -f "libhhgttg.la" && ln -s 
"../libhhgttg.la" "libhhgttg.la" )


[csamuel@farnarkle2 libtool]$ readelf -d .libs/libhhgttg.so.0| fgrep -i lib
  0x0001 (NEEDED) Shared library: [libmpi.so.40]
  0x0001 (NEEDED) Shared library: [libopen-rte.so.40]
  0x0001 (NEEDED) Shared library: [libopen-pal.so.40]
  0x0001 (NEEDED) Shared library: [libdl.so.2]
  0x0001 (NEEDED) Shared library: [librt.so.1]
  0x0001 (NEEDED) Shared library: [libutil.so.1]
  0x0001 (NEEDED) Shared library: [libm.so.6]
  0x0001 (NEEDED) Shared library: [libpthread.so.0]
  0x0001 (NEEDED) Shared library: [libz.so.1]
  0x0001 (NEEDED) Shared library: [libhwloc.so.5]
  0x0001 (NEEDED) Shared library: [libc.so.6]
  0x000e (SONAME) Library soname: [libhhgttg.so.0]
  0x001d (RUNPATH) Library runpath: [/apps/skylake/software/compiler/gcc/6.4.0/openmpi/3.0.0/lib]


All the best,
Chris
-- 
  Christopher Samuel OzGrav Senior Data Science Support
  ARC Centre of Excellence for Gravitational Wave Discovery
  http://www.ozgrav.org/  http://twitter.com/ozgrav



Re: [OMPI devel] Open-MPI backwards compatibility and library version changes

2018-11-14 Thread Christopher Samuel
On 15/11/18 11:45 am, Christopher Samuel wrote:

> Unfortunately that's not the case, just creating a shared library
> that only links in libmpi.so will create dependencies on the private
> libraries too in the final shared library. :-(

Hmm, I might be misinterpreting the output of "ldd"; it looks like it
reports the dependencies of dependencies, not just the direct
dependencies.  "readelf -d" seems more reliable.

[csamuel@farnarkle2 libtool]$ readelf -d libhhgttg.so.1 | fgrep -i lib
  0x0001 (NEEDED) Shared library: [libmpi.so.40]
  0x0001 (NEEDED) Shared library: [libc.so.6]
  0x000e (SONAME) Library soname: [libhhgttg.so.1]

Whereas the HDF5 libraries really do have them listed as a dependency.

[csamuel@farnarkle2 1.10.1]$ readelf -d ./lib/libhdf5_fortran.so.100 | fgrep -i lib
  0x0001 (NEEDED) Shared library: [libhdf5.so.101]
  0x0001 (NEEDED) Shared library: [libsz.so.2]
  0x0001 (NEEDED) Shared library: [libmpi_usempif08.so.40]
  0x0001 (NEEDED) Shared library: [libmpi_usempi_ignore_tkr.so.40]
  0x0001 (NEEDED) Shared library: [libmpi_mpifh.so.40]
  0x0001 (NEEDED) Shared library: [libmpi.so.40]
  0x0001 (NEEDED) Shared library: [libopen-rte.so.40]
  0x0001 (NEEDED) Shared library: [libopen-pal.so.40]
  0x0001 (NEEDED) Shared library: [libdl.so.2]
  0x0001 (NEEDED) Shared library: [librt.so.1]
  0x0001 (NEEDED) Shared library: [libutil.so.1]
  0x0001 (NEEDED) Shared library: [libpthread.so.0]
  0x0001 (NEEDED) Shared library: [libz.so.1]
  0x0001 (NEEDED) Shared library: [libhwloc.so.5]
  0x0001 (NEEDED) Shared library: [libgfortran.so.3]
  0x0001 (NEEDED) Shared library: [libm.so.6]
  0x0001 (NEEDED) Shared library: [libquadmath.so.0]
  0x0001 (NEEDED) Shared library: [libc.so.6]
  0x0001 (NEEDED) Shared library: [libgcc_s.so.1]
  0x000e (SONAME) Library soname: [libhdf5_fortran.so.100]
  0x001d (RUNPATH) Library runpath: [/apps/skylake/software/mpi/gcc/6.4.0/openmpi/3.0.0/hdf5/1.10.1/lib:/apps/skylake/software/core/szip/2.1.1/lib:/apps/skylake/software/compiler/gcc/6.4.0/openmpi/3.0.0/lib:/apps/skylake/software/core/gcccore/6.4.0/lib/../lib64]

I wonder if it's because they use libtool instead?

All the best,
Chris
-- 
  Christopher Samuel OzGrav Senior Data Science Support
  ARC Centre of Excellence for Gravitational Wave Discovery
  http://www.ozgrav.org/  http://twitter.com/ozgrav



Re: [OMPI devel] Open-MPI backwards compatibility and library version changes

2018-11-14 Thread Christopher Samuel
On 15/11/18 2:16 am, Barrett, Brian via devel wrote:

> In practice, this should not be a problem. The wrapper compilers (and
>  our instructions for linking when not using the wrapper compilers)
> only link against libmpi.so (or a set of libraries if using Fortran),
> as libmpi.so contains the public interface. libmpi.so has a
> dependency on libopen-pal.so so the loader will load the version of
> libopen-pal.so that matches the version of Open MPI used to build
> libmpi.so However, if someone explicitly links against libopen-pal.so
> you end up where we are today.

Unfortunately that's not the case: just creating a shared library
that only links in libmpi.so will create dependencies on the private
libraries too in the final shared library. :-(

Here's a toy example to illustrate that.

[csamuel@farnarkle2 libtool]$ cat hhgttg.c
int answer(void)
{
return(42);
}

[csamuel@farnarkle2 libtool]$ gcc hhgttg.c -c -o hhgttg.o

[csamuel@farnarkle2 libtool]$ gcc -shared -Wl,-soname,libhhgttg.so.1 -o 
libhhgttg.so.1 hhgttg.o -lmpi

[csamuel@farnarkle2 libtool]$ ldd libhhgttg.so.1
linux-vdso.so.1 =>  (0x7ffc625b3000)
libmpi.so.40 => /apps/skylake/software/compiler/gcc/6.4.0/openmpi/3.0.0/lib/libmpi.so.40 (0x7f018a582000)
libc.so.6 => /lib64/libc.so.6 (0x7f018a09e000)
libopen-rte.so.40 => /apps/skylake/software/compiler/gcc/6.4.0/openmpi/3.0.0/lib/libopen-rte.so.40 (0x7f018a4b5000)
libopen-pal.so.40 => /apps/skylake/software/compiler/gcc/6.4.0/openmpi/3.0.0/lib/libopen-pal.so.40 (0x7f0189fde000)
libdl.so.2 => /lib64/libdl.so.2 (0x7f0189dda000)
librt.so.1 => /lib64/librt.so.1 (0x7f0189bd2000)
libutil.so.1 => /lib64/libutil.so.1 (0x7f01899cf000)
libm.so.6 => /lib64/libm.so.6 (0x7f01896cd000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x7f01894b1000)
libz.so.1 => /lib64/libz.so.1 (0x7f018929b000)
libhwloc.so.5 => /lib64/libhwloc.so.5 (0x7f018905e000)
/lib64/ld-linux-x86-64.so.2 (0x7f018a46b000)
libnuma.so.1 => /lib64/libnuma.so.1 (0x7f0188e52000)
libltdl.so.7 => /lib64/libltdl.so.7 (0x7f0188c48000)
libgcc_s.so.1 => /apps/skylake/software/core/gcccore/6.4.0/lib64/libgcc_s.so.1 (0x7f018a499000)


All the best,
Chris
-- 
  Christopher Samuel OzGrav Senior Data Science Support
  ARC Centre of Excellence for Gravitational Wave Discovery
  http://www.ozgrav.org/  http://twitter.com/ozgrav


[OMPI devel] Open-MPI backwards compatibility and library version changes

2018-11-14 Thread Christopher Samuel
Hi folks,

Just resub'd after a long time to ask a question about binary/backwards 
compatibility.

We got bitten when upgrading from 3.0.0 to 3.0.3, which we assumed would be 
binary compatible, and so (after some testing to confirm it was) replaced our 
existing 3.0.0 install with the 3.0.3 one (because we're using hierarchical 
namespaces in Lmod, this meant we avoided needing to recompile everything we'd 
already built over the last 12 months with 3.0.0).

However, once we'd done that we heard from a user that their code would no 
longer run because it couldn't find libopen-pal.so.40, and we saw that 3.0.3 
instead had libopen-pal.so.42.

Initially we thought this was some odd build-system problem, but on digging 
further we realised that they were linking against libraries (HDF5) that had 
in turn been built against OpenMPI, and that those had embedded the 
libopen-pal.so.40 name.

Of course our testing hadn't found that because we weren't linking against 
anything like those for our MPI tests. :-(
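
For reference, a quick check that would have caught this ahead of time (just a 
sketch; adjust the path to wherever the software stack lives) is to scan the 
installed libraries for an embedded libopen-pal dependency:

  for f in $(find /apps -name '*.so*'); do
    readelf -d "$f" 2>/dev/null | grep -q 'libopen-pal' && echo "$f"
  done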

But I was really surprised to see these version numbers changing; I thought 
the idea was to keep things backwards compatible within these series?

Now fortunately our reason for doing the forced upgrade (we found our 3.0.0 
didn't work with our upgrade to Slurm 18.08.3) turned out to be one combination 
we'd missed from our testing whilst fault-finding, so having got that going 
we've been able to drop back to the original 3.0.0 and fix it for them.

But is this something that you folks have come across before?

All the best,
Chris
-- 
  Christopher Samuel OzGrav Senior Data Science Support
  ARC Centre of Excellence for Gravitational Wave Discovery
  http://www.ozgrav.org/  http://twitter.com/ozgrav





Re: [OMPI devel] subcommunicator OpenMPI issues on K

2017-11-07 Thread Christopher Samuel
On 08/11/17 12:30, Kawashima, Takahiro wrote:

> As other people said, Fujitsu MPI used in K is based on old
> Open MPI (v1.6.3 with bug fixes). 

I guess the obvious question is will the vanilla Open-MPI work on K?

-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545



Re: [OMPI devel] Open-MPI killing nodes with mlx5 drivers?

2017-11-05 Thread Christopher Samuel
On 30/10/17 14:07, Christopher Samuel wrote:

> We have an issue where codes compiled with Open-MPI kill nodes with
> ConnectX-4 and ConnectX-5 cards connected to Mellanox Ethernet switches
> using the mlx5 driver from the latest Mellanox OFED

For the record, this crash is fixed in Mellanox OFED 4.2, which came out
with the necessary fix after I wrote that.

-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545



[OMPI devel] Open-MPI killing nodes with mlx5 drivers?

2017-10-29 Thread Christopher Samuel
Hi folks,

Trying the devel list to see if folks here have hit this issue when
testing, as I suspect it's not hardware many users will have access
to yet.

We have an issue where codes compiled with Open-MPI kill nodes with
ConnectX-4 and ConnectX-5 cards connected to Mellanox Ethernet switches
using the mlx5 driver from the latest Mellanox OFED: the kernel hangs
with no oops (or any other error) and we have to power-cycle the node to
get it back.

This happens even with a singleton (no srun or mpirun), and from what I
can see from strace, before the node hangs Open-MPI is starting to probe
for what fabrics are available.
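
One thing we'll try as part of narrowing it down (just a sketch; component
names from memory and the program name is a placeholder) is disabling the
likely fabric components one at a time before starting the singleton:

  export OMPI_MCA_btl=^openib   # skip probing the openib BTL
  ./my_mpi_program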

The folks I'm helping have engaged Mellanox support but I was wondering
if anyone else had run across this?

Distro: RHEL 7.4 (x86-64)
Kernel: 4.12.9 (needed for the CephFS filesystem they use)
OFED: 4.1-1.0.2.0
Open-MPI: 1.10.x, 2.0.2, 3.0.0

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545



Re: [OMPI devel] Segfault on MPI init

2017-02-21 Thread Christopher Samuel
On 15/02/17 00:45, Gilles Gouaillardet wrote:

> i would expect orted generate a core, and then you can use gdb post
> mortem to get the stack trace.
> there should be several threads, so you can
> info threads
> bt
> you might have to switch to an other thread

You can also get a backtrace from all threads at once with:

thread apply all bt

It's not just limited to 'bt' either:

(gdb) help thread apply
Apply a command to a list of threads.

List of thread apply subcommands:

thread apply all -- Apply a command to all threads
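
For unattended runs the same thing can be captured non-interactively, along
the lines of (binary and core file names here are just placeholders):

  gdb -batch -ex 'thread apply all bt' /path/to/orted core.12345 > backtrace.txt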


-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545



Re: [OMPI devel] RFC: Rename nightly snapshot tarballs

2016-10-17 Thread Christopher Samuel
On 18/10/16 07:17, Jeff Squyres (jsquyres) wrote:

> NOTE: It may be desirable to add HHMM in there; it's not common, but
> *sometimes* we do make more than one snapshot in a day (e.g., if one
> snapshot is borked, so we fix it and then generate another
> snapshot).

If it's happened before then I'd suggest allowing for it to happen
again by adding HHMM.  Otherwise looks sensible to me (YMMV).

-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


Re: [OMPI devel] Off-topic re: supporting old systems

2016-08-31 Thread Christopher Samuel
On 31/08/16 14:01, Paul Hargrove wrote:

> So, the sparc platform is a bit more orphaned that it already was when
> support stopped at Wheezy.

Ah sorry, I didn't realise you were on a non-LTS Wheezy architecture.

-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


Re: [OMPI devel] Off-topic re: supporting old systems

2016-08-30 Thread Christopher Samuel
On 31/08/16 12:05, Paul Hargrove wrote:

> As Giles mentions the http: redirects to https: before anything is fetched.
> Replacing "-nv" in the wget command with "-v" shows that redirect clearly.

Agreed, but it still just works on Debian Wheezy for me. :-)

What does "apt-cache policy wget" say for you?

root@db3:/tmp# apt-cache policy wget
wget:
  Installed: 1.13.4-3+deb7u3
  Candidate: 1.13.4-3+deb7u3
[...]

Here's the plain wget, with redirect; I don't even need to disable the
certificate check here on Debian Wheezy (though it still works if you do).

root@db3:/tmp# wget  
http://www.open-mpi.org/software/ompi/v2.0/downloads/openmpi-2.0.1rc2.tar.bz2
--2016-08-31 12:11:59--  
http://www.open-mpi.org/software/ompi/v2.0/downloads/openmpi-2.0.1rc2.tar.bz2
Resolving www.open-mpi.org (www.open-mpi.org)... 192.185.39.252
Connecting to www.open-mpi.org (www.open-mpi.org)|192.185.39.252|:80... 
connected.
HTTP request sent, awaiting response... 302 Found
Location: 
https://www.open-mpi.org/software/ompi/v2.0/downloads/openmpi-2.0.1rc2.tar.bz2 
[following]
--2016-08-31 12:11:59--  
https://www.open-mpi.org/software/ompi/v2.0/downloads/openmpi-2.0.1rc2.tar.bz2
Connecting to www.open-mpi.org (www.open-mpi.org)|192.185.39.252|:443... 
connected.
HTTP request sent, awaiting response... 200 OK
Length: 8192091 (7.8M) [application/x-tar]
Saving to: `openmpi-2.0.1rc2.tar.bz2'

100%[>]
 8,192,091   1.75M/s   in 7.3s

2016-08-31 12:12:08 (1.07 MB/s) - `openmpi-2.0.1rc2.tar.bz2' saved 
[8192091/8192091]


All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


Re: [OMPI devel] Off-topic re: supporting old systems

2016-08-30 Thread Christopher Samuel
On 31/08/16 06:22, Paul Hargrove wrote:

> It seems that a stock Debian Wheezy system cannot even *download* Open
> MPI any more:

Works for me, both http (which shouldn't be using SSL anyway) and https.

Are you behind some weird intercepting proxy?

root@db3:/tmp# wget -nv --no-check-certificate 
http://www.open-mpi.org/software/ompi/v2.0/downloads/openmpi-2.0.1rc2.tar.bz2
2016-08-31 10:42:34 
URL:https://www.open-mpi.org/software/ompi/v2.0/downloads/openmpi-2.0.1rc2.tar.bz2
 [8192091/8192091] -> "openmpi-2.0.1rc2.tar.bz2" [1]

root@db3:/tmp# wget -nv --no-check-certificate 
https://www.open-mpi.org/software/ompi/v2.0/downloads/openmpi-2.0.1rc2.tar.bz2
2016-08-31 10:43:10 
URL:https://www.open-mpi.org/software/ompi/v2.0/downloads/openmpi-2.0.1rc2.tar.bz2
 [8192091/8192091] -> "openmpi-2.0.1rc2.tar.bz2.1" [1]

root@db3:/tmp# cat /etc/issue
Debian GNU/Linux 7 \n \l

cheers,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


Re: [OMPI devel] Migration of mailman mailing lists

2016-07-18 Thread Christopher Samuel
On 19/07/16 02:05, Brice Goglin wrote:

> Yes, kill all netloc lists.

Will the archives be preserved somewhere for historical reference?

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


Re: [OMPI devel] [1.10.3rc4] testing results

2016-06-06 Thread Christopher Samuel
On 06/06/16 15:09, Larry Baker wrote:

> An impressive accomplishment by the development team.  And impressive
> coverage by Paul's testbed.  Well done!

Agreed, it is very impressive to watch both on the breaking & the fixing
side of things. :-)

Thanks so much to all involved with this.

-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


Re: [OMPI devel] RFC: Public Test Repo

2016-05-19 Thread Christopher Samuel
Hi Josh,

On 19/05/16 13:54, Josh Hursey wrote:

> Let me know what you think. Certainly everything here is open for
> discussion, and we will likely need to refine aspects as we go.

I think having an open test suite in conjunction with the current
private one is a great way to go; I think it sends the right message
about openness and hopefully allows a community to build around MPI
testing in general.

Certainly happy to try it out!

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


Re: [OMPI devel] Github pricing plan changes announced today

2016-05-17 Thread Christopher Samuel
On 18/05/16 09:59, Gilles Gouaillardet wrote:

> the (main) reason is none of us are lawyers and none of us know whether
> all test suites can be redistributed for general public use or not.

Thanks Gilles,

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


Re: [OMPI devel] Github pricing plan changes announced today

2016-05-17 Thread Christopher Samuel
On 12/05/16 06:21, Jeff Squyres (jsquyres) wrote:

> We basically have one important private repo (the tests repo).

Possibly a dumb question (sorry), but what's the reason for that repo
being private?

I ask as someone on the Beowulf list today was looking for an MPI
regression test tool and found MTT but commented:

# OpenMPI has the MPI Testing Tool which looks like it would work,
# but most of there tests seem private.

and so moved on to look at other options instead.

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


Re: [OMPI devel] [2.0.0rc2] xlc-13.1.0 ICE (hwloc)

2016-05-05 Thread Christopher Samuel
On 03/05/16 18:11, Paul Hargrove wrote:

> xlc-13.1.0 on Linux dies compiling the embedded hwloc in this rc
> (details below).

In case it's useful xlc 12.1.0.9-140729 (yay for BGQ living in the past)
doesn't ICE on RHEL6 on Power7.

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] mpif.h on Intel build when run with OMPI_FC=gfortran

2016-03-03 Thread Christopher Samuel
Hi Gilles,

On 04/03/16 13:33, Gilles Gouaillardet wrote:

> there is clearly no hope when you use mpi.mod and mpi_f08.mod
> my point was, it is not even possible to expect "legacy" mpif.h work
> with different compilers.

Sorry, my knowledge of FORTRAN is limited to trying to debug why their
code wouldn't compile. :-)

Apologies for the noise.

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] mpif.h on Intel build when run with OMPI_FC=gfortran

2016-03-03 Thread Christopher Samuel
On 04/03/16 12:17, Dave Turner wrote:

>  My understanding is that OpenMPI built with either Intel or
> GNU compilers should be able to use the other compilers using the
> OMPI_CC and OMPI_FC environmental variables.

Sadly not; we tried this, but when one of our very few Fortran users
(who happened to be our director) tried to use it, it failed because the
mpi.mod module created during the build is compiler-dependent. :-(

So ever since we've done separate builds for GCC and for Intel.
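
A quick way to reproduce the mismatch (just a sketch; the file name is
arbitrary) is to compile a trivial program that uses the module with the
overridden compiler:

  echo 'program t'  > t.f90
  echo 'use mpi'   >> t.f90
  echo 'end'       >> t.f90
  OMPI_FC=gfortran mpif90 t.f90   # fails if mpi.mod came from another compiler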

All the best!
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Problem with the 1.8.8 version

2015-12-06 Thread Christopher Samuel
On 05/12/15 01:52, Baldassari Caroline wrote:

> I have installed OpenMPI 1.8.8 (the last version 1.8.8 downloaded on
> your site)

v1.8 morphed into the v1.10 series; I'd suggest trying that:

http://www.slideshare.net/jsquyres/open-mpi-new-version-number-scheme-and-roadmap

cheers!
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Slides from the Open MPI SC'15 State of the Union BOF

2015-11-19 Thread Christopher Samuel
On 20/11/15 03:31, Dasari, Annapurna wrote:

> Jeff, could you check the link it didn't work for me..
> I tried to check out the slides by opening the link and downloading the
> file, I am getting a file damaged error on my system.

Not sure if Jeff has fixed them up since this, but they open fine for me
at the moment (using KDE's Okular PDF viewer).

Thanks for putting them up Jeff!

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] PMI2 in Slurm 14.11.8

2015-09-02 Thread Christopher Samuel
On 02/09/15 13:09, Christopher Samuel wrote:

> Instead PMI2 is in a contrib directory which appears to need manual
> intervention to install.

Confirming from the Slurm list that PMI2 is not built by default; it's
only the RPM build process that will include it without intervention.

Thanks for your help Ralph!

cheers,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



[OMPI devel] PMI2 in Slurm 14.11.8

2015-09-01 Thread Christopher Samuel
Hi all,

The OpenMPI FAQ says:

https://www.open-mpi.org/faq/?category=slurm#slurm-direct-srun-mpi-apps

# Yes, if you have configured OMPI --with-pmi=foo, where foo is
# the path to the directory where pmi.h/pmi2.h is located.
# Slurm (> 2.6, > 14.03) installs PMI-2 support by default.

However, we've found on a new system we're bringing up that this doesn't
appear to be true for the vanilla Slurm 14.11.8 we're installing.

Instead PMI2 is in a contrib directory which appears to need manual
intervention to install.
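
From a quick look it seems to be a matter of building the plugin from the
contribs tree after Slurm itself has been configured, something like this
(just a sketch of what I think is needed, run from the Slurm source tree):

  cd contribs/pmi2
  make
  make install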

I've sent an email to the Slurm list to query this behaviour but I was
wondering if anyone here had run into this too?

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] 1.10.0rc6 - slightly different mx problem

2015-08-24 Thread Christopher Samuel
On 25/08/15 05:08, Jeff Squyres (jsquyres) wrote:

> FWIW, we have had verbal agreement in the past that the v1.8 series
> was the last one to contain MX support.  I think it would be fine for
> all MX-related components to disappear from v1.10.
> 
> Don't forget that Myricom as an HPC company no longer exists.

INRIA does have Open-MX (Myrinet Express over Generic Ethernet
Hardware), last release December 2014.  No idea if it's still developed
or used..

http://open-mx.gforge.inria.fr/

Brice?

Open-MPI is listed as working with it there. ;-)

-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] 1.8.7rc1 testing results

2015-07-14 Thread Christopher Samuel
On 14/07/15 01:49, Ralph Castain wrote:

> Okay, 1.8.7rc3 (we already had an rc2) is now out with all these changes
> - please take one last look.

Looks OK for XRC here, thanks!

-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Error in ./configure for Yocto

2015-07-09 Thread Christopher Samuel
On 10/07/15 01:38, Jeff Squyres (jsquyres) wrote:

> Just curious -- what's Yocto?

It's a system for building embedded Linux distros:

https://www.yoctoproject.org/

Intel announced the switch to Yocto for their MPSS distro
for Xeon Phi a couple of years ago (v3 and later).

https://software.intel.com/en-us/articles/intelr-mpss-transition-to-yocto-faq

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Proposal: update Open MPI's version number and release process

2015-05-20 Thread Christopher Samuel
On 20/05/15 14:37, Howard Pritchard wrote:

> It would also be easy to trap the I-want-to-bypass-PR-because-I
> know-what-I'm-doing-developer with a second level of protection.  Just
> set up a jenkins project that does a smoke test after ever commit to
> master.  If the smoke test fails, send a naughty-gram to the committer
> and copy devel. Pretty soon the developer will get trained to use the PR
> process, unless they are that engineer I've yet to meet who always
> writes flawless code.

VMware used to have a bot that tweeted info about their testing,
including "$USER just broke the build at VMWare"; for example:

https://twitter.com/vmwarepbs/status/4634524702

:-)

-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Proposal: update Open MPI's version number and release process

2015-05-18 Thread Christopher Samuel
On 19/05/15 05:11, Jeff Squyres (jsquyres) wrote:

>  We've reached internal consensus, and would like to present this to the 
> larger community for feedback.

My gut feeling is that this is very good; from a cluster admin point of
view it means we keep a system tracking one level up from where we are
currently, i.e. at V4.x.x (for example) rather than v1.6.x or v1.8.x.

We've got a new system coming up in the next few months (fingers
crossed) and so it'll be interesting to see where we fall in terms of
the v1.10 or v2 releases but either way I see this as making our lives
easier.

Thanks!
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Chris Yeoh

2015-04-30 Thread Christopher Samuel
On 01/05/15 00:41, Jeff Squyres (jsquyres) wrote:

> I am saddened to inform the Open MPI developer community of the death
> of Chris Yeoh.

There is a page for donations to lung cancer research in his memory here
(Chris was not a smoker, but it still took his life):

http://participate.freetobreathe.org/site/TR?px=1582460&fr_id=2710&pg=personal#.VSscH5SUd90

# Chris never smoked, yet was taken too early by this dreadful
# disease. Lung cancer has the greatest kill rate with the
# smallest funding rate because of the stigma associated with
# it being a "smoker's disease", but anyone with lungs can get
# it. We hope that further funding to research will one day
# provide a cure and better trials for others.

Valē Chris.

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Christopher Samuel
On 03/02/15 05:09, Ralph Castain wrote:

> Just out of curiosity: I see you are reporting about a build on the
> headnode of a BG cluster. We've never ported OMPI to BG - are you using
> it on such a system? Or were you just test building the code on a
> convenient server?

Just a convenient server with a not-so-mainstream architecture (and an
older RHEL release through necessity).  Sorry to get your hopes up! :-)

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-01 Thread Christopher Samuel
On 31/01/15 10:51, Jeff Squyres (jsquyres) wrote:

> New tarball posted (same location).  Now featuring 100% fewer "make check" 
> failures.

On our BG/Q front-end node (PPC64, RHEL 6.4) I see:

../../config/test-driver: line 95: 30173 Segmentation fault  (core dumped) 
"$@" > $log_file 2>&1
FAIL: opal_lifo

Stack trace implies the culprit is in:

#0  0x10001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
at 
/vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
51  old = *addr;

I've attached a script of gdb doing "thread apply all bt full" in
case that's helpful.

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci

Script started on Mon 02 Feb 2015 12:32:56 EST

[samuel@avoca class]$ gdb /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo core.32444
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "ppc64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo...done.
[New Thread 32465]
[New Thread 32464]
[New Thread 32466]
[New Thread 32444]
[New Thread 32469]
[New Thread 32467]
[New Thread 32470]
[New Thread 32463]
[New Thread 32468]
Missing separate debuginfo for /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0
Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/de/a09192aa84bbc15579ae5190dc8acd16eb94fe
Missing separate debuginfo for /usr/local/slurm/14.03.10/lib/libpmi.so.0
Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/28/09dfc4706ed44259cc31a5898c8d1a9b76b949
Missing separate debuginfo for /usr/local/slurm/14.03.10/lib/libslurm.so.27
Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/e2/39d8a2994ae061ab7ada0ebb7719b8efa5de96
Missing separate debuginfo for 
Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/1a/063e3d64bb5560021ec2ba5329fb1e420b470f
Reading symbols from /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0...done.
Loaded symbols for /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0
Reading symbols from /usr/local/slurm/14.03.10/lib/libpmi.so.0...done.
Loaded symbols for /usr/local/slurm/14.03.10/lib/libpmi.so.0
Reading symbols from /usr/local/slurm/14.03.10/lib/libslurm.so.27...done.
Loaded symbols for /usr/local/slurm/14.03.10/lib/libslurm.so.27
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libutil.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libutil.so.1
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld64.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld64.so.1
Core was generated by `/vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo '.
Program terminated with signal 11, Segmentation fault.
#0  0x10001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
51	old = *addr;
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6_4.5.ppc64
(gdb) thread apply all bt full

Thread 9 (Thread 0xfff7a0ef200 (LWP 32468)):
#0  0x0080adb6629c in .__libc_write () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x0fff7d6905b4 in show_stackframe (signo=11, info=0xfff7a0ee3d8, p=0xfff7a0edd00)
at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/util/stacktrace.c:81
print_buffer = "[avoca:32444] *** Process received signal ***\n", '\000' 
tmp = 0xfff7a0ed858 "[avoc

Re: [OMPI devel] mlx4 QP operation err

2015-01-28 Thread Christopher Samuel
Hi Dave,

On 29/01/15 11:31, Dave Turner wrote:

>   I've found some old references to similar mlx4 errors dating back to
> 2009 that lead me to believe this may be a firmware error.  I believe we're
> running the most up to date version of the firmware.

There was a new version released a few days ago, 2.33.5100:

http://www.mellanox.com/page/firmware_table_ConnectX3ProEN

Release notes are here:

http://www.mellanox.com/pdf/firmware/ConnectX3Pro-FW-2_33_5100-release_notes.pdf

Bug fixes start on page 23; it looks like there are 29 fixes
in this version, and fix 1 is for RoCE (though of course it may
not be relevant) - "The first Read response was not treated as
implicit ACK" (discovered in 2.30.8000).

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] 1.8.2rc4 problem: only 32 out of 48 cores are working

2014-08-21 Thread Christopher Samuel
On 22/08/14 10:43, Ralph Castain wrote:

> From your earlier concerns, I would have expected only to find 32 of
> them running. Was that not the case in this run?

As I understand it, in his original email he mentioned that with 1.6.5
all 48 processes were running at 100% CPU, and he was wondering if the buggy
BIOS that caused the hwloc issues he reported on the hwloc-users list
might be the cause of this regression in performance.

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



[OMPI devel] Grammar error in git master: 'You job will now abort'

2014-08-13 Thread Christopher Samuel
Hi all,

We spotted this in 1.6.5 and git grep shows it's fixed in the
v1.8 branch but in master it's still there:

samuel@haswell:~/Code/OMPI/ompi-svn-mirror$ git grep -n 'You job will now abort'
orte/tools/orterun/help-orterun.txt:679:You job will now abort.
samuel@haswell:~/Code/OMPI/ompi-svn-mirror$ 

I'm using https://github.com/open-mpi/ompi-svn-mirror.git so
let me know if I should be using something else now.

cheers,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-08 Thread Christopher Samuel

On 09/05/14 00:16, Joshua Ladd wrote:

> The necessary packages will be supported and available in community
> OFED.

We're constrained to what is in RHEL6 I'm afraid.

This is because we have to run GPFS over IB to BG/Q from the same NSDs
that talk GPFS to all our Intel clusters.   We did try MOFED 2.x (in
connected mode) on a new Intel cluster during its bring up last year
which worked for MPI but stopped it talking to the NSDs.  Reverting to
vanilla RHEL6 fixed it.

Not your problem though. :-)  As Ralph has said there is work on an
alternative solution that we will be able to use.

Thanks!
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-08 Thread Christopher Samuel

On 08/05/14 23:45, Ralph Castain wrote:

> Artem and I are working on a new PMIx plugin that will resolve it 
> for non-Mellanox cases.

Ah yes of course, sorry my bad!

- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Christopher Samuel

On 08/05/14 12:54, Ralph Castain wrote:

> I think there was one 2.6.x that was borked, and definitely
> problems in the 14.03.x line. Can't pinpoint it for you, though.

No worries, thanks.

> Sounds good. I'm going to have to dig deeper into those numbers, 
> though, as they don't entirely add up to me. Once the job gets 
> launched, the launch method itself should have no bearing on 
> computational speed - IF all things are equal. In other words, if
> the process layout is the same, and the binding pattern is the
> same, then computational speed should be roughly equivalent
> regardless of how the procs were started.

Not sure if it's significant, but when mpirun was launching processes
it was using srun to start orted, which then started the MPI ranks,
whereas with PMI/PMI2 srun appeared to start the ranks directly.

> My guess is that your data might indicate a difference in the
> layout and/or binding pattern as opposed to PMI2 vs mpirun. At the
> scale you mention later in the thread (only 70 nodes x 16 ppn), the
> difference in launch timing would be zilch. So I'm betting you
> would find (upon further exploration) that (a) you might not have
> been binding processes when launching by mpirun, since we didn't
> bind by default until the 1.8 series, but were binding under direct
> srun launch, and (b) your process mapping would quite likely be
> different as we default to byslot mapping, and I believe srun
> defaults to bynode?

FWIW all our environment modules that do OMPI have:

setenv OMPI_MCA_orte_process_binding core

> Might be worth another comparison run when someone has time.

Yeah, I'll try and queue up some more tests - unfortunately the
cluster we tested on then is flat out at the moment but I'll try and
sneak a 64-core job using identical configs and compare mpirun, srun
on its own and srun with PMI2.
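
When I do, I'll also try to capture the actual layout and binding so we can
rule that out; a sketch of what I have in mind (option names from memory, so
treat them as assumptions, and the binary is just an example):

  mpirun --report-bindings ./namd2 ...
  srun --mpi=pmi2 --cpu_bind=verbose ./namd2 ...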

All the best,
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Christopher Samuel

On 07/05/14 18:00, Ralph Castain wrote:

> Interesting - how many nodes were involved? As I said, the bad 
> scaling becomes more evident at a fairly high node count.

Our x86-64 systems are low node counts (we've got BG/Q for capacity);
the cluster that those tests were run on has 70 nodes, each with 16
cores, so I suspect we're a long, long way away from that pain point.

All the best!
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Christopher Samuel

Hi all,

Apologies for having dropped out of the thread, night intervened here. ;-)

On 08/05/14 00:45, Ralph Castain wrote:

> Okay, then we'll just have to develop a workaround for all those 
> Slurm releases where PMI-2 is borked :-(

Do you know what these releases are?  Are we talking 2.6.x or 14.03?
The 14.03 series has had a fair few rapid point releases and doesn't
appear to be anywhere near as stable as 2.6 was when it came out. :-(

> FWIW: I think people misunderstood my statement. I specifically
> did *not* propose to *lose* PMI-2 support. I suggested that we
> change it to "on-by-request" instead of the current "on-by-default"
> so we wouldn't keep getting asked about PMI-2 bugs in Slurm. Once
> the Slurm implementation stabilized, then we could reverse that
> policy.
> 
> However, given that both you and Chris appear to prefer to keep it 
> "on-by-default", we'll see if we can find a way to detect that
> PMI-2 is broken and then fall back to PMI-1.

My intention was to provide the data that led us to want PMI2, but if
configure had an option to enable PMI2, off by default so that only those
who requested it got it, then I'd be more than happy - we'd just add it
to our build script.
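
For reference, today that script configures with something like (the path is
just our local Slurm install, so treat it as illustrative):

  ./configure --with-pmi=/usr/local/slurm/14.03.10 ...

so an extra opt-in flag for PMI-2 would just be one more line there.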

All the best!
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Christopher Samuel

Hiya Ralph,

On 07/05/14 14:49, Ralph Castain wrote:

> I should have looked closer to see the numbers you posted, Chris -
> those include time for MPI wireup. So what you are seeing is that
> mpirun is much more efficient at exchanging the MPI endpoint info
> than PMI. I suspect that PMI2 is not much better as the primary
> reason for the difference is that mpriun sends blobs, while PMI
> requires that everything be encoded into strings and sent in little
> pieces.
> 
> Hence, mpirun can exchange the endpoint info (the dreaded "modex"
> operation) much faster, and MPI_Init completes faster. Rest of the
> computation should be the same, so long compute apps will see the
> difference narrow considerably.

Unfortunately it looks like I had an enthusiastic cleanup at some point
and so I cannot find the out files from those runs at the moment, but
I did find some comparisons from around that time.

This first pair are comparing running NAMD with OMPI 1.7.3a1r29103
run with mpirun and srun successively from inside the same Slurm job.

mpirun namd2 macpf.conf 
srun --mpi=pmi2 namd2 macpf.conf 

Firstly the mpirun output (grep'ing the interesting bits):

Charm++> Running on MPI version: 2.1
Info: Benchmark time: 512 CPUs 0.0959179 s/step 0.555081 days/ns 1055.19 MB memory
Info: Benchmark time: 512 CPUs 0.0929002 s/step 0.537617 days/ns 1055.19 MB memory
Info: Benchmark time: 512 CPUs 0.0727373 s/step 0.420933 days/ns 1055.19 MB memory
Info: Benchmark time: 512 CPUs 0.0779532 s/step 0.451118 days/ns 1055.19 MB memory
Info: Benchmark time: 512 CPUs 0.0785246 s/step 0.454425 days/ns 1055.19 MB memory
WallClock: 1403.388550  CPUTime: 1403.388550  Memory: 1119.085938 MB

Now the srun output:

Charm++> Running on MPI version: 2.1
Info: Benchmark time: 512 CPUs 0.0906865 s/step 0.524806 days/ns 1036.75 MB memory
Info: Benchmark time: 512 CPUs 0.0874809 s/step 0.506255 days/ns 1036.75 MB memory
Info: Benchmark time: 512 CPUs 0.0746328 s/step 0.431903 days/ns 1036.75 MB memory
Info: Benchmark time: 512 CPUs 0.0726161 s/step 0.420232 days/ns 1036.75 MB memory
Info: Benchmark time: 512 CPUs 0.0710574 s/step 0.411212 days/ns 1036.75 MB memory
WallClock: 1230.784424  CPUTime: 1230.784424  Memory: 1100.648438 MB


The next two pairs are first launched using mpirun from 1.6.x and then with srun
from 1.7.3a1r29103.  Again each pair inside the same Slurm job with the same 
inputs.

First pair mpirun:

Charm++> Running on MPI version: 2.1
Info: Benchmark time: 64 CPUs 0.410424 s/step 2.37514 days/ns 909.57 MB memory
Info: Benchmark time: 64 CPUs 0.392106 s/step 2.26913 days/ns 909.57 MB memory
Info: Benchmark time: 64 CPUs 0.313136 s/step 1.81213 days/ns 909.57 MB memory
Info: Benchmark time: 64 CPUs 0.316792 s/step 1.83329 days/ns 909.57 MB memory
Info: Benchmark time: 64 CPUs 0.313867 s/step 1.81636 days/ns 909.57 MB memory
WallClock: 8341.524414  CPUTime: 8341.524414  Memory: 975.015625 MB

First pair srun:

Charm++> Running on MPI version: 2.1
Info: Benchmark time: 64 CPUs 0.341967 s/step 1.97897 days/ns 903.883 MB memory
Info: Benchmark time: 64 CPUs 0.339644 s/step 1.96553 days/ns 903.883 MB memory
Info: Benchmark time: 64 CPUs 0.284424 s/step 1.64597 days/ns 903.883 MB memory
Info: Benchmark time: 64 CPUs 0.28115 s/step 1.62702 days/ns 903.883 MB memory
Info: Benchmark time: 64 CPUs 0.279536 s/step 1.61769 days/ns 903.883 MB memory
WallClock: 7476.643555  CPUTime: 7476.643555  Memory: 968.867188 MB


Second pair mpirun:

Charm++> Running on MPI version: 2.1
Info: Benchmark time: 64 CPUs 0.366327 s/step 2.11995 days/ns 939.527 MB memory
Info: Benchmark time: 64 CPUs 0.359805 s/step 2.0822 days/ns 939.527 MB memory
Info: Benchmark time: 64 CPUs 0.292342 s/step 1.69179 days/ns 939.527 MB memory
Info: Benchmark time: 64 CPUs 0.293499 s/step 1.69849 days/ns 939.527 MB memory
Info: Benchmark time: 64 CPUs 0.292355 s/step 1.69187 days/ns 939.527 MB memory
WallClock: 7842.831543  CPUTime: 7842.831543  Memory: 1004.050781 MB

Second pair srun:

Charm++> Running on MPI version: 2.1
Info: Benchmark time: 64 CPUs 0.347864 s/step 2.0131 days/ns 904.91 MB memory
Info: Benchmark time: 64 CPUs 0.346367 s/step 2.00444 days/ns 904.91 MB memory
Info: Benchmark time: 64 CPUs 0.29007 s/step 1.67865 days/ns 904.91 MB memory
Info: Benchmark time: 64 CPUs 0.279447 s/step 1.61717 days/ns 904.91 MB memory
Info: Benchmark time: 64 CPUs 0.280824 s/step 1.62514 days/ns 904.91 MB memory
WallClock: 7522.677246  CPUTime: 7522.677246  Memory: 969.433594 MB


So to me it looks like (for NAMD on our system at least) PMI2 does
give better scalability.

All the best!
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-06 Thread Christopher Samuel

On 07/05/14 13:37, Moody, Adam T. wrote:

> Hi Chris,

Hi Adam,

> I'm interested in SLURM / OpenMPI startup numbers, but I haven't
> done this testing myself.  We're stuck with an older version of
> SLURM for various internal reasons, and I'm wondering whether it's
> worth the effort to back port the PMI2 support.  Can you share some
> of the differences in times at different scales?

We've not looked at startup times I'm afraid; this was time to
solution. We noticed it with Slurm when we first started using it on
x86-64 for our NAMD tests (this is from a posting to the list last year
when I raised the issue and was told PMI2 would be the solution):
> Slurm 2.6.0, RHEL 6.4 (latest kernel), FDR IB.
> 
> Here are some timings as reported as the WallClock time by NAMD 
> itself (so not including startup/tear down overhead from Slurm).
> 
> srun:
> 
> run1/slurm-93744.out:WallClock: 695.079773  CPUTime: 695.079773 
> run4/slurm-94011.out:WallClock: 723.907959  CPUTime: 723.907959 
> run5/slurm-94013.out:WallClock: 726.156799  CPUTime: 726.156799 
> run6/slurm-94017.out:WallClock: 724.828918  CPUTime: 724.828918
> 
> Average of 692 seconds
> 
> mpirun:
> 
> run2/slurm-93746.out:WallClock: 559.311035  CPUTime: 559.311035 
> run3/slurm-93910.out:WallClock: 544.116333  CPUTime: 544.116333 
> run7/slurm-94019.out:WallClock: 586.072693  CPUTime: 586.072693
> 
> Average of 563 seconds.
> 
> So that's about 23% slower.
> 
> Everything is identical (they're all symlinks to the same golden 
> master) *except* for the srun / mpirun which is modified by
> copying the batch script and substituting mpirun for srun.



- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-06 Thread Christopher Samuel

On 07/05/14 12:53, Ralph Castain wrote:

> We have been seeing a lot of problems with the Slurm PMI-2 support 
> (not in OMPI - it's the code in Slurm that is having problems). At 
> this time, I'm unaware of any advantage in using PMI-2 over PMI-1
> in Slurm - the scaling is equally poor, and PMI-2 does not supports
> any additional functionality.
> 
> I know that Cray PMI-2 has a definite advantage, so I'm proposing 
> that we turn PMI-2 "off" when under Slurm unless the user 
> specifically requests we use it.

Our local testing has shown that PMI-2 in 1.7.x gives a massive
improvement in scaling when starting jobs with srun over using srun
with OMPI 1.6.x, and now that OMPI 1.8.x is out we're planning on
moving to using PMI2 with OMPI and srun.

Using mpirun gives good performance with OMPI 1.6.x, but Slurm then
gets all its memory stats wrong, and if you run with CR_Core_Memory in
Slurm you have a very high risk that your job will get killed incorrectly.

All the best,
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] SC13 birds of a feather

2013-12-05 Thread Christopher Samuel

On 05/12/13 01:52, Jeff Squyres (jsquyres) wrote:

> Ralph -- let's chat about this in Chicago next Friday.  I'll add
> it to the agenda on the wiki.  I assume this would not be
> difficult stuff; we don't really need to do anything fancy at all.
> I think we just want to sketch out what exactly we want to do, and
> it could probably be done in a day or three.

There's also stuff that ACPI can expose under:

/sys/class/thermal/thermal_zone*/temp

though it might need a bit more prodding to work out what's what there.
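
For anyone who wants to poke at those files, a rough sketch in C of reading
them (not anything we actually ship; the values are millidegrees Celsius on
the kernels I've looked at, and which zone means what varies by machine -
the neighbouring "type" file gives a hint):

#include <stdio.h>

/* Print the temperature of the first few ACPI thermal zones.
   Zones that don't exist on this machine are simply skipped. */
int main(void)
{
    for (int zone = 0; zone < 16; zone++) {
        char path[128];
        long millideg;

        snprintf(path, sizeof(path),
                 "/sys/class/thermal/thermal_zone%d/temp", zone);
        FILE *fp = fopen(path, "r");
        if (fp == NULL)
            continue;               /* no such zone */
        if (fscanf(fp, "%ld", &millideg) == 1)
            printf("zone %d: %.1f C\n", zone, millideg / 1000.0);
        fclose(fp);
    }
    return 0;
}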

> (Thanks for the idea, Samuel!)

My pleasure!

All the best,

Chris

- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] SC13 birds of a feather

2013-12-03 Thread Christopher Samuel

On 04/12/13 09:27, Jeff Squyres (jsquyres) wrote:

> 2. The MPI_T performance variables are new.  There's only a few 
> created right now (e.g., in the Cisco usnic BTL).  But the field
> is pretty wide open here -- the infrastructure is there, but we're 
> really not exposing much information yet.  There's lots that can
> be done here.

Random thought - please shoot it down if crazy...

Would it make any sense to expose system/environmental/thermal
information to the application via MPI_T ?

For our sort of systems, with a grab bag of jobs, it's not likely to
be useful, but if you had a system dedicated to running an in-house
code then you could conceive of situations where you might want to
react to over-temperature cores, nodes, etc.

cheers,
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Openmpi 1.6.5 is freezing under GNU/Linux ia64

2013-12-03 Thread Christopher Samuel

On 03/12/13 23:50, Sylvestre Ledru wrote:

> FYI, Debian has stopped supporting ia64 for its next release
> So, I stopped working on that issue.

Yeah, it's not looking good - here's the context for this:

http://lists.debian.org/debian-devel-announce/2013/11/msg7.html

# We have stopped considering ia64 as a blocker for testing
# migration. This means that the out-of-date and uninstallability
# criteria on ia64 will not hold up transitions from unstable to
# testing for packages. It is expected that unless drastic
# improvement occurs, ia64 will be removed from testing on
# Friday 24th January 2014.


- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] RFC: usnic BTL MPI_T pvar scheme

2013-11-22 Thread Christopher Samuel

On 16/11/13 02:00, Jeff Squyres (jsquyres) wrote:

> This actually raises a point that MPI_T makes you read individual 
> pvars separately -- there's no "atomically read this array of
> pvars" functionality.  That could lead to inconsistent results
> (e.g., first you read a network stat, and then you read an MPI
> layer stat -- but under the covers, the network stat could have
> changed by the time you read the MPI layer stat).  Hmm.

I suspect there's not much of a way around this, other than pausing
all MPI operations until you've read a value back from the OS.   But
then if you want to read multiple values from the OS you're going to
be out of luck there too.   Unless I'm missing something?

So perhaps the best thing is to just document this prominently.
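
For anyone who hasn't played with MPI_T yet, the reading side looks roughly
like this: each pvar is enumerated and read one at a time, which is exactly
where the consistency window comes from. This is only a sketch (no error
checking, and it only bothers with continuous, unsigned-long-long counters
bound to no object):

#include <stdio.h>
#include <mpi.h>

/* Sketch: enumerate the performance variables and read each one
   individually - there is no "read this whole set atomically" call. */
int main(int argc, char **argv)
{
    int provided, npvar;

    MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
    MPI_Init(&argc, &argv);

    MPI_T_pvar_session session;
    MPI_T_pvar_session_create(&session);
    MPI_T_pvar_get_num(&npvar);

    for (int i = 0; i < npvar; i++) {
        char name[256], desc[256];
        int name_len = sizeof(name), desc_len = sizeof(desc);
        int verbosity, var_class, bind, readonly, continuous, atomic, count;
        MPI_Datatype dtype;
        MPI_T_enum enumtype;
        MPI_T_pvar_handle handle;
        unsigned long long value;

        MPI_T_pvar_get_info(i, name, &name_len, &verbosity, &var_class,
                            &dtype, &enumtype, desc, &desc_len,
                            &bind, &readonly, &continuous, &atomic);

        /* keep the sketch simple: only free-standing 64-bit counters */
        if (bind != MPI_T_BIND_NO_OBJECT || !continuous ||
            dtype != MPI_UNSIGNED_LONG_LONG)
            continue;

        MPI_T_pvar_handle_alloc(session, i, NULL, &handle, &count);
        if (count == 1) {
            MPI_T_pvar_read(session, handle, &value);
            printf("%s = %llu\n", name, value);
        }
        MPI_T_pvar_handle_free(session, &handle);
    }

    MPI_T_pvar_session_free(&session);
    MPI_Finalize();
    MPI_T_finalize();
    return 0;
}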

All the best,
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



[OMPI devel] Happy Open-MPI day everyone!

2013-11-22 Thread Christopher Samuel

Hi all,

As Jeff Squyres noted at the Open-MPI State of the Union at SC13, today
(22nd November) is the 10th anniversary of the first commit to the
Open-MPI project (shortly after SC03).

So as Jeff said "I hereby declare November 22nd Open-MPI Day!".

All the best,
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29615 - in trunk: . contrib contrib/dist/linux debian debian/source

2013-11-06 Thread Christopher Samuel

On 07/11/13 04:40, Mike Dubman wrote:

> I did not find debian packaging files in the OMPI tree, could you please
> point me to it?

As Sylvestre explained Debian (and presumably Ubuntu too) will
automatically delete any /debian/ directory in an upstream tarball
and substitute their own packaging.

You can see what they put in for sid (testing) here:

http://ftp.de.debian.org/debian/pool/main/o/openmpi/openmpi_1.6.5-5.debian.tar.gz

Whilst I can understand the enthusiasm I don't think it's
going to be very helpful to Debian; perhaps a better way to
assist would be to help out Sylvestre and the other Debian
maintainers?  This might be a handy place to start:

http://qa.debian.org/developer.php?login=pkg-openmpi-maintainers%40lists.alioth.debian.org

All the best,
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] 1.6.5 large matrix test doesn't pass (decode) ?

2013-10-16 Thread Christopher Samuel

On 05/10/13 01:49, KAWASHIMA Takahiro wrote:

> It is a bug in the test program, test/datatype/ddt_raw.c, and it
> was fixed at r24328 in trunk.
> 
> https://svn.open-mpi.org/trac/ompi/changeset/24328
> 
> I've confirmed the failure occurs with plain v1.6.5 and it doesn't 
> occur with patched v1.6.5.

Perfect, thanks!

Sorry for the delay, been away on holiday.

All the best,
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



[OMPI devel] 1.6.5 large matrix test doesn't pass (decode) ?

2013-10-04 Thread Christopher Samuel

Not sure if this is important, or expected, but I ran a make check out
of interest after seeing recent emails and saw the final one of these
tests reported as "NOT PASSED" (it seems to be the only failure).

The text I see is:

 #
 * TEST UPPER MATRIX
 #

test upper matrix
complete raw in 7 microsec
decode [NOT PASSED]


This happens on both our Nehalem and SandyBridge clusters and we are
building with the system GCC.  I've attached the full log from our
Nehalem cluster (RHEL 6.4).


Our configure script is:

#!/bin/bash

BASE=`basename $PWD | sed -e s,-,/,`

module purge

./configure --prefix=/usr/local/${BASE} --with-slurm --with-openib \
--enable-static  --enable-shared

make -j


I'm away on leave next week (first break for a year, yay!) but back
the week after..

All the best,
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci

Making check in config
make[1]: Entering directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/config'
make[1]: Nothing to be done for `check'.
make[1]: Leaving directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/config'
Making check in contrib
make[1]: Entering directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/contrib'
make[1]: Nothing to be done for `check'.
make[1]: Leaving directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/contrib'
Making check in opal
make[1]: Entering directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal'
Making check in include
make[2]: Entering directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal/include'
make[2]: Leaving directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal/include'
Making check in libltdl
make[2]: Entering directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal/libltdl'
make  check-am
make[3]: Entering directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal/libltdl'
make[3]: Leaving directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal/libltdl'
make[2]: Leaving directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal/libltdl'
Making check in asm
make[2]: Entering directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal/asm'
make[2]: Nothing to be done for `check'.
make[2]: Leaving directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal/asm'
Making check in datatype
make[2]: Entering directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal/datatype'
make[2]: Nothing to be done for `check'.
make[2]: Leaving directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal/datatype'
Making check in etc
make[2]: Entering directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal/etc'
make[2]: Nothing to be done for `check'.
make[2]: Leaving directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal/etc'
Making check in event
make[2]: Entering directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal/event'
Making check in compat
make[3]: Entering directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal/event/compat'
Making check in sys
make[4]: Entering directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal/event/compat/sys'
make[4]: Nothing to be done for `check'.
make[4]: Leaving directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal/event/compat/sys'
make[4]: Entering directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal/event/compat'
make[4]: Nothing to be done for `check-am'.
make[4]: Leaving directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal/event/compat'
make[3]: Leaving directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal/event/compat'
make[3]: Entering directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal/event'
make[3]: Nothing to be done for `check-am'.
make[3]: Leaving directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal/event'
make[2]: Leaving directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal/event'
Making check in util
make[2]: Entering directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal/util'
Making check in keyval
make[3]: Entering directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal/util/keyval'
make[3]: Nothing to be done for `check'.
make[3]: Leaving directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal/util/keyval'
make[3]: Entering directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal/util'
make[3]: Nothing to be done for `check-am'.
make[3]: Leaving directory `/usr/local/src/OPENMPI/openmpi-1.6.5.1/opal/util'
make[2]: Leaving directory `/usr/local/src/OPENMPI/open

Re: [OMPI devel] Openmpi 1.6.5 is freezing under GNU/Linux ia64

2013-09-21 Thread Christopher Samuel

On 21/09/13 14:33, Ralph Castain wrote:

> I think you misunderstood the issue here. The problem is that
> mpirun appears to be hanging before it ever gets to the point of
> launching something.

Ah, quite correct, I hadn't realised the debug info hadn't shown it
getting to the point of launching the executable. Mea culpa.
I blame jet-lag. ;-)

cheers,
Chris (about to get a second dose)
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Openmpi 1.6.5 is freezing under GNU/Linux ia64

2013-09-20 Thread Christopher Samuel

On 21/09/13 05:49, Sylvestre Ledru wrote:

> Does it ring a bell to anyone ?

Possibly. If you run the binary without mpirun, does it do the same?

If so, could you try running it with strace -f and see if you see
repeating SEGVs?

cheers!
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun

2013-09-06 Thread Christopher Samuel

On 06/09/13 14:14, Christopher Samuel wrote:

> However, modifying the test program confirms that variable is getting
> propagated as expected with both mpirun and srun for 1.6.5 and the 1.7
> snapshot. :-(

Investigating further by setting:

export OMPI_MCA_orte_report_bindings=1
export SLURM_CPU_BIND=core
export SLURM_CPU_BIND_VERBOSE=verbose

reveals that only OMPI 1.6.5 with mpirun reports bindings being set
(see below). We cannot understand why Slurm doesn't *appear* to be
setting bindings, as we have the correct settings according to the
documentation.

Whilst that may explain the difference between 1.6.5 mpirun and srun,
it doesn't explain why the 1.7 snapshot is so much better, as you'd
expect both to be hurt in the same way.


==OPENMPI 1.6.5==
==mpirun==
[barcoo003:03633] System has detected external process binding to cores 0001
[barcoo003:03633] MCW rank 0 bound to socket 0[core 0]: [B]
[barcoo004:04504] MCW rank 1 bound to socket 0[core 0]: [B]
Hello, World, I am 0 of 2 on host barcoo003 from app number 0 universe size 2 
universe envar 2
Hello, World, I am 1 of 2 on host barcoo004 from app number 0 universe size 2 
universe envar 2
==srun==
Hello, World, I am 0 of 2 on host barcoo003 from app number 1 universe size 2 
universe envar NULL
Hello, World, I am 1 of 2 on host barcoo004 from app number 1 universe size 2 
universe envar NULL
=
==OPENMPI 1.7.3==
DANGER: YOU ARE LOADING A TEST VERSION OF OPENMPI. THIS MAY BE BAD.
==mpirun==
Hello, World, I am 0 of 2 on host barcoo003 from app number 0 universe size 2 
universe envar 2
Hello, World, I am 1 of 2 on host barcoo004 from app number 0 universe size 2 
universe envar 2
==srun==
Hello, World, I am 0 of 2 on host barcoo003 from app number 0 universe size 2 
universe envar NULL
Hello, World, I am 1 of 2 on host barcoo004 from app number 0 universe size 2 
universe envar NULL
=====



- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun

2013-09-06 Thread Christopher Samuel

On 06/09/13 00:23, Hjelm, Nathan T wrote:

> I assume that process binding is enabled for both mpirun and srun?
> If not that could account for a difference between the runtimes.

You raise an interesting point; we have been doing that with:

[samuel@barcoo ~]$ module show openmpi 2>&1 | grep binding
setenv   OMPI_MCA_orte_process_binding core

However, modifying the test program confirms that variable is getting
propagated as expected with both mpirun and srun for 1.6.5 and the 1.7
snapshot. :-(
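
A rough sketch of checking the binding each rank actually ends up with
(the kernel's view, via sched_getaffinity), rather than just whether the
MCA variable made it through - this is not the test program itself, just
the sort of check I mean:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>
#include <mpi.h>

/* Print the CPUs each rank is actually allowed to run on,
   as reported by the kernel rather than by any MCA setting. */
int main(int argc, char **argv)
{
    int rank;
    cpu_set_t mask;
    char host[64];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    gethostname(host, sizeof(host));

    CPU_ZERO(&mask);
    sched_getaffinity(0, sizeof(mask), &mask);   /* 0 == this process */

    printf("rank %d on %s bound to cpus:", rank, host);
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &mask))
            printf(" %d", cpu);
    printf("\n");

    MPI_Finalize();
    return 0;
}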

cheers,
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun

2013-09-05 Thread Christopher Samuel

Hi Ralph,

On 05/09/13 12:50, Ralph Castain wrote:

> Jeff and I were looking at a similar issue today and suddenly 
> realized that the mappings were different - i.e., what ranks are
> on what nodes differs depending on how you launch. You might want
> to check if that's the issue here as well. Just launch the
> attached program using mpirun vs srun and check to see if the maps
> are the same or not.

Very interesting: the rank-to-node mappings are identical in all
cases (mpirun and srun for 1.6.5 and my test 1.7.3 snapshot), but what
is different is as follows.


For the 1.6.5 build I see mpirun report:

number 0 universe size 64 universe envar 64

whereas srun report:

number 1 universe size 64 universe envar NULL



For the 1.7.3 snapshot both report "number 0" so the only difference
there is that mpirun has:

envar 64

whereas srun has:

envar NULL


Are these differences significant?

I'm intrigued that the problem child (srun 1.6.5) is the only one
where number is 1.
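
For anyone following along: my understanding is that "number" and
"universe size" come from the MPI_APPNUM and MPI_UNIVERSE_SIZE predefined
attributes, and I'm assuming the "envar" column is the OMPI_UNIVERSE_SIZE
environment variable. A rough sketch of querying them (this is not Ralph's
actual test program):

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

/* Query the predefined attributes behind "app number" and "universe size",
   plus (assumed name) the OMPI_UNIVERSE_SIZE environment variable. */
int main(int argc, char **argv)
{
    int rank, flag;
    int *appnum, *usize;
    char *envar;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_APPNUM, &appnum, &flag);
    if (flag)
        printf("rank %d app number %d\n", rank, *appnum);

    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &usize, &flag);
    if (flag)
        printf("rank %d universe size %d\n", rank, *usize);

    envar = getenv("OMPI_UNIVERSE_SIZE");            /* assumed name */
    printf("rank %d universe envar %s\n", rank, envar ? envar : "NULL");

    MPI_Finalize();
    return 0;
}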

All the best,
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun

2013-09-04 Thread Christopher Samuel

On 04/09/13 18:33, George Bosilca wrote:

> You can confirm that the slowdown happen during the MPI
> initialization stages by profiling the application (especially the
> MPI_Init call).

NAMD helpfully prints benchmark and timing numbers during the initial
part of the simulation, so here's what they say.  For both seconds
per step and days per nanosecond of simulation less is better.

I've included the benchmark numbers (every 100 steps or so from the
start) and the final timing number after 25000 steps.  It looks to
me (as a sysadmin and not an MD person) like the final timing number
includes CPU time in seconds per step and wallclock time in seconds
per step.

64 cores over 10 nodes:

OMPI 1.7.3a1r29103 mpirun

Info: Benchmark time: 64 CPUs 0.410424 s/step 2.37514 days/ns 909.57 MB memory
Info: Benchmark time: 64 CPUs 0.392106 s/step 2.26913 days/ns 909.57 MB memory
Info: Benchmark time: 64 CPUs 0.313136 s/step 1.81213 days/ns 909.57 MB memory
Info: Benchmark time: 64 CPUs 0.316792 s/step 1.83329 days/ns 909.57 MB memory
Info: Benchmark time: 64 CPUs 0.313867 s/step 1.81636 days/ns 909.57 MB memory

TIMING: 25000  CPU: 8247.2, 0.330157/step  Wall: 8247.2, 0.330157/step, 
0.0229276 hours remaining, 921.894531 MB of memory in use.

OMPI 1.7.3a1r29103 srun

Info: Benchmark time: 64 CPUs 0.341967 s/step 1.97897 days/ns 903.883 MB memory
Info: Benchmark time: 64 CPUs 0.339644 s/step 1.96553 days/ns 903.883 MB memory
Info: Benchmark time: 64 CPUs 0.284424 s/step 1.64597 days/ns 903.883 MB memory
Info: Benchmark time: 64 CPUs 0.28115 s/step 1.62702 days/ns 903.883 MB memory
Info: Benchmark time: 64 CPUs 0.279536 s/step 1.61769 days/ns 903.883 MB memory

TIMING: 25000  CPU: 7390.15, 0.296/step  Wall: 7390.15, 0.296/step, 0.020 
hours remaining, 915.746094 MB of memory in use.


64 cores over 18 nodes:

OMPI 1.6.5 mpirun

Info: Benchmark time: 64 CPUs 0.366327 s/step 2.11995 days/ns 939.527 MB memory
Info: Benchmark time: 64 CPUs 0.359805 s/step 2.0822 days/ns 939.527 MB memory
Info: Benchmark time: 64 CPUs 0.292342 s/step 1.69179 days/ns 939.527 MB memory
Info: Benchmark time: 64 CPUs 0.293499 s/step 1.69849 days/ns 939.527 MB memory
Info: Benchmark time: 64 CPUs 0.292355 s/step 1.69187 days/ns 939.527 MB memory

TIMING: 25000  CPU: 7754.17, 0.312071/step  Wall: 7754.17, 0.312071/step, 
0.0216716 hours remaining, 950.929688 MB of memory in use.

OMPI 1.7.3a1r29103 srun

Info: Benchmark time: 64 CPUs 0.347864 s/step 2.0131 days/ns 904.91 MB memory
Info: Benchmark time: 64 CPUs 0.346367 s/step 2.00444 days/ns 904.91 MB memory
Info: Benchmark time: 64 CPUs 0.29007 s/step 1.67865 days/ns 904.91 MB memory
Info: Benchmark time: 64 CPUs 0.279447 s/step 1.61717 days/ns 904.91 MB memory
Info: Benchmark time: 64 CPUs 0.280824 s/step 1.62514 days/ns 904.91 MB memory

TIMING: 25000  CPU: 7420.91, 0.296029/step  Wall: 7420.91, 0.296029/step, 
0.0205575 hours remaining, 916.312500 MB of memory in use.


Hope this is useful!

All the best,
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun

2013-09-03 Thread Christopher Samuel

On 04/09/13 11:29, Ralph Castain wrote:

> Your code is obviously doing something much more than just
> launching and wiring up, so it is difficult to assess the
> difference in speed between 1.6.5 and 1.7.3 - my guess is that it
> has to do with changes in the MPI transport layer and nothing to do
> with PMI or not.

I'm testing with what would be our most used application in aggregate
across our systems, the NAMD molecular dynamics code from here:

http://www.ks.uiuc.edu/Research/namd/

so yes,  you're quite right, it's doing a lot more than that and has a
reputation for being a *very* chatty MPI code.

For comparison whilst users see GROMACS also suffer with srun under
1.6.5 they don't see anything like the slow down that NAMD gets.

All the best,
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Possible OMPI 1.6.5 bug? SEGV in malloc.c

2013-09-03 Thread Christopher Samuel

On 04/09/13 04:47, Jeff Squyres (jsquyres) wrote:

> Hmm.  Are you building Open MPI in a special way?  I ask because I'm
> unable to replicate the issue -- I've run your test (and a C
> equivalent) a few hundred times now:

I don't think we do anything unusual; the script we are using is
fairly simple (it does a module purge to ensure we are just using the
system compilers and don't pick up anything strange) and is as follows:

#!/bin/bash

BASE=`basename $PWD | sed -e s,-,/,`

module purge

./configure --prefix=/usr/local/${BASE} --with-slurm --with-openib 
--enable-static  --enable-shared

make -j


- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun

2013-09-03 Thread Christopher Samuel

On 03/09/13 10:56, Ralph Castain wrote:

> Yeah - --with-pmi=

Actually I found that just --with-pmi=/usr/local/slurm/latest worked. :-)

I've got some initial numbers for 64 cores. As I mentioned, the system
I found this on initially is so busy at the moment that I won't be able
to run anything bigger for a while, so I'm going to move my testing to
another system which is a bit quieter, but slower (it's Nehalem vs
SandyBridge).

All the tests below are with the same NAMD 2.9 binary and within the
same Slurm job, so it runs on the same cores each time. It's nice to
find that C code at least seems to be backward compatible!

64 cores over 18 nodes:

Open-MPI 1.6.5 with mpirun - 7842 seconds
Open-MPI 1.7.3a1r29103 with srun - 7522 seconds

so that's about a 4% speedup.

64 cores over 10 nodes:

Open-MPI 1.7.3a1r29103 with mpirun - 8341 seconds
Open-MPI 1.7.3a1r29103 with srun - 7476 seconds

So that's about 11% faster, and the mpirun speed has decreased, though
of course that build uses PMI, so perhaps that's the cause?

cheers,
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun

2013-09-02 Thread Christopher Samuel

On 31/08/13 02:42, Ralph Castain wrote:

> We did some work on the OMPI side and removed the O(N) calls to 
> "get", so it should behave better now. If you get the chance,
> please try the 1.7.3 nightly tarball. We hope to officially release
> it soon.

Stupid question, but never having played with PMI before: is it just
a case of appending the --with-pmi option to our current configure?

thanks,
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun

2013-09-02 Thread Christopher Samuel

On 31/08/13 02:42, Ralph Castain wrote:

> Hi Chris et al

Hiya,

> We did some work on the OMPI side and removed the O(N) calls to 
> "get", so it should behave better now. If you get the chance,
> please try the 1.7.3 nightly tarball. We hope to officially release
> it soon.

Thanks so much, I'll get our folks to rebuild a test version of NAMD
against 1.7.3a1r29103 which I built this afternoon.

It might be some time until I can get a test job of a suitable size
to run, though; it looks like our systems are flat out!

All the best,
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Possible OMPI 1.6.5 bug? SEGV in malloc.c

2013-09-02 Thread Christopher Samuel

On 02/09/13 16:32, Christopher Samuel wrote:

> I cannot duplicate this under valgrind or gdb and given that this
> doesn't happen every time I run it and gdb indicates there are at
> least 2 threads running then we're wondering if this is a race condition.

I have also duplicated this problem with 1.7.3a1r29103.

 Hello, world, I am0  of1
[barcoo:03306] *** Process received signal ***
[barcoo:03306] Signal: Segmentation fault (11)
[barcoo:03306] Signal code: Address not mapped (1)
[barcoo:03306] Failing at address: 0x2009b4298
[barcoo:03306] [ 0] /lib64/libpthread.so.0() [0x3f7b60f500]
[barcoo:03306] [ 1] 
/usr/local/openmpi/1.7.3a1r29103/lib/libopen-pal.so.5(opal_memory_ptmalloc2_int_malloc+0x96a)
 [0x7f47de6935aa]
[barcoo:03306] [ 2] 
/usr/local/openmpi/1.7.3a1r29103/lib/libopen-pal.so.5(opal_memory_ptmalloc2_malloc+0x52)
 [0x7f47de694612]
[barcoo:03306] [ 3] ./1.7-gnumyhello_f90() [0x400dca]
[barcoo:03306] [ 4] ./1.7-gnumyhello_f90() [0x40104a]
[barcoo:03306] [ 5] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3f7b21ecdd]
[barcoo:03306] [ 6] ./1.7-gnumyhello_f90() [0x400bc9]
[barcoo:03306] *** End of error message ***

The backtrace I get from the core file isn't as useful though:

(gdb) bt full
#0  0x7fd9c4c255aa in opal_memory_ptmalloc2_int_malloc () from 
/usr/local/openmpi/1.7.3a1r29103/lib/libopen-pal.so.5
No symbol table info available.
#1  0x7fd9c4c26612 in opal_memory_ptmalloc2_malloc () from 
/usr/local/openmpi/1.7.3a1r29103/lib/libopen-pal.so.5
No symbol table info available.
#2  0x00400dca in main () at gnumyhello_f90.f90:26
ierr = 0
rank = 0
size = 1
work = 
#3  0x0040104a in main ()
No symbol table info available.

OMPI 1.7 is built with exactly the same configure options as 1.6
and the executable is built with -g -O0.

cheers,
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Possible OMPI 1.6.5 bug? SEGV in malloc.c

2013-09-02 Thread Christopher Samuel

On 02/09/13 15:40, Christopher Samuel wrote:

> It dies when it does:
> 
> set_head(remainder, remainder_size | PREV_INUSE);
> 
> where remainder_size=0.

Ignore that; I've shown it to someone who is actually a programmer,
and we've determined that it's remainder that is wrong, not
(necessarily) remainder_size.

(gdb) print remainder
$1 = (struct malloc_chunk *) 0x2008e5700
(gdb) print *remainder
Cannot access memory at address 0x2008e5700

I cannot duplicate this under valgrind or gdb, and given that it
doesn't happen every time I run it and gdb indicates there are at
least two threads running, we're wondering if this is a race condition.

cheers,
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Possible OMPI 1.6.5 bug? SEGV in malloc.c

2013-09-02 Thread Christopher Samuel

On 30/08/13 16:01, Christopher Samuel wrote:

> Thanks for this, I'll take a look further next week..

The code where it's SEGV'ing is here:

  /* check that one of the above allocation paths succeeded */
  if ((unsigned long)(size) >= (unsigned long)(nb + MINSIZE)) {
remainder_size = size - nb;
remainder = chunk_at_offset(p, nb);
av->top = remainder;
set_head(p, nb | PREV_INUSE | (av != &main_arena ? NON_MAIN_ARENA : 0));
set_head(remainder, remainder_size | PREV_INUSE);
check_malloced_chunk(av, p, nb);
return chunk2mem(p);
  }


It dies when it does:

set_head(remainder, remainder_size | PREV_INUSE);

where remainder_size=0.

This implies that size and nb are the same, so I'm wondering if the
test at the top of that block should drop the equals and instead
be this:

  /* check that one of the above allocation paths succeeded */
  if ((unsigned long)(size) > (unsigned long)(nb + MINSIZE)) {

It would ensure that the set_head() macro would never get called
with a 0 argument.

The code would then fall through to the malloc failure part
(which is what I suspect we want).

Thoughts?

All the best,
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Possible OMPI 1.6.5 bug? SEGV in malloc.c

2013-08-30 Thread Christopher Samuel

Hiya Jeff,

On 30/08/13 11:13, Jeff Squyres (jsquyres) wrote:

> FWIW, the stack traces you sent are not during MPI_INIT.

I did say it was a suspicion. ;-)

> What happens with OMPI's memory manager is that it inserts itself
> to be *the* memory allocator for the entire process before main()
> even starts.  We have to do this as part of the horribleness
> that is OpenFabrics/verbs and how it just doesn't match the MPI
> programming model at all.  :-(  (I think I wrote some blog entries
> about this a while ago...  Ah, here's a few:

Thanks!  I'll take a look next week (just got out of a 5.5 hour
meeting and have to head home now).

> Therefore, (in C) if you call malloc() before MPI_Init(), it'll be 
> calling OMPI's ptmalloc.  The stack traces you sent imply that
> it's just when your app is calling the fortran allocate -- which is
> after MPI_Init().

OK, that makes sense.

> FWIW, you can build OMPI with --without-memory-manager, or you can 
> setenv OMPI_MCA_memory_linux_disable to 1 (note: this is NOT a 
> regular MCA parameter -- it *must* be set in the environment
> before the MPI app starts).  If this env variable is set, OMPI will
> *not* interpose its own memory manager in the pre-main hook.  That
> should be a quick/easy way to try with and without the memory
> manager and see what happens.

Well with OMPI_MCA_memory_linux_disable=1 I don't get the crash at all,
or the spin with the Intel compiler build.  Nice!

Thanks for this, I'll take a look further next week..

Very much obliged,
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Possible OMPI 1.6.5 bug? SEGV in malloc.c

2013-08-29 Thread Christopher Samuel

Hi Jeff, Ralph,

On 29/08/13 23:30, Jeff Squyres (jsquyres) wrote:

> Let me try to understand this test:
> 
> - you're simulating a 1GB memory limit via ulimit of virtual
> memory ("ulimit -v $((1*1024*1024))"), or 1,048,576 bytes.

Yeah, basically doing by hand what Torque/Slurm do by default for jobs
(unless the user asks for more).

When this happens for Dalton (compiled with the Intel compilers) it
just sits there spinning its wheels at start up.

> - you're trying to alloc 1070*10^6 = 1,070,000,000 bytes in an MPI 
> app

That was the developer trying to simulate the failure in Dalton.
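
(Conceptually the reproducer is nothing fancy; a hand-waving C analogue of
the Fortran allocate test, run under the reduced ulimit, would look like
the following - the size is arbitrary and this is not the actual code:)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

/* Allocate (much) more memory than the deliberately small ulimit allows
   and see whether the failure is reported cleanly or ends in a SEGV/spin. */
int main(int argc, char **argv)
{
    int rank, size;
    size_t nbytes = 8UL * 1000 * 1000 * 1000;   /* ~8 GB, just an example */
    char *work;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf(" Hello, world, I am %d of %d\n", rank, size);

    work = malloc(nbytes);
    if (work == NULL) {
        fprintf(stderr, "rank %d: allocation of %zu bytes failed\n",
                rank, nbytes);
    } else {
        memset(work, 0, nbytes);     /* touch it so the pages are real */
        free(work);
    }

    MPI_Finalize();
    return 0;
}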

> - OMPI is barfing in the ptmalloc allocator

Sounds like it.

> Meaning: you're trying to allocate 1,000x memory than you're
> allowing in virtual memory -- so I guess part of this test depends
> on how much physical RAM you have, because you're limiting virtual
> memory, right?

No, it only depends on the memory limits for the job in Slurm.

The reason for the test is that he was trying to see whether or not
those limits were successfully being propagated to MPI ranks in
Slurm (and it appears not).

However, in the process he found he could also replicate this
livelock/deadlock in Dalton.

> It's quite possible that the ptmalloc included in OMPI doesn't
> guard well against a failed mmap.  FWIW, I've seen all kinds of
> random badness (not just with OMPI) when malloc/mmap/etc. start
> failing due to lack of memory.

OK, so I'll try testing again with a larger limit to see if that will
ameliorate this issue.  I'm also wondering where this is happening in
OMPI; I have a sneaking suspicion it is at MPI_INIT().

> Do you get the same behavior if you disable ptmalloc in OMPI?
> (your IB large message bandwidth will suffer a bit, though)

Not tried that, but I'll take a look at it if it doesn't seem possible
to fix it with a change to the default memory limits (that'll be the
least intrusive).

Thanks!
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Possible OMPI 1.6.5 bug? SEGV in malloc.c

2013-08-29 Thread Christopher Samuel

On 28/08/13 19:36, Chris Samuel wrote:

> With RHEL 6.4 gfortran it instead SEGV's straight away

Using strace I can see a mmap(2) (called from malloc I presume)
failing just before the SEGV.

Process 6799 detached
Process 6798 detached
 Hello, world, I am0  of1
[pid  6796] mmap(NULL, 8560001024, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[pid  6796] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
[barcoo:06796] *** Process received signal ***
[barcoo:06796] Signal: Segmentation fault (11)
[barcoo:06796] Signal code: Address not mapped (1)
[barcoo:06796] Failing at address: 0x20078d708
[pid  6796] mmap(NULL, 2097152, PROT_NONE, 
MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f75a5fed000
[barcoo:06796] [ 0] /lib64/libpthread.so.0() [0x3f7b60f500]
[barcoo:06796] [ 1] 
/usr/local/openmpi/1.6.5/lib/libmpi.so.1(opal_memory_ptmalloc2_int_malloc+0x982)
 [0x7f77a68c2dd2]
[barcoo:06796] [ 2] 
/usr/local/openmpi/1.6.5/lib/libmpi.so.1(opal_memory_ptmalloc2_malloc+0x52) 
[0x7f77a68c3f42]
[barcoo:06796] [ 3] ./gnumyhello_f90(MAIN__+0x146) [0x400f6a]
[barcoo:06796] [ 4] ./gnumyhello_f90(main+0x2a) [0x4011ea]
[barcoo:06796] [ 5] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3f7b21ecdd]
[barcoo:06796] [ 6] ./gnumyhello_f90() [0x400d69]
[barcoo:06796] *** End of error message ***
[pid  6796] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
[pid  6796] +++ killed by SIGSEGV (core dumped) +++


The SEGV occurs (according to the gdb core dump I have) at the
second set_head() call in this code:

  /* check that one of the above allocation paths succeeded */
  if ((unsigned long)(size) >= (unsigned long)(nb + MINSIZE)) {
remainder_size = size - nb;
remainder = chunk_at_offset(p, nb);
av->top = remainder;
set_head(p, nb | PREV_INUSE | (av != &main_arena ? NON_MAIN_ARENA : 0));
set_head(remainder, remainder_size | PREV_INUSE);
check_malloced_chunk(av, p, nb);
return chunk2mem(p);
  }


The arguments to that function are:

(gdb) print remainder
$1 = (struct malloc_chunk *) 0x2008e5700

(gdb) print remainder_size
$2 = 0

Any ideas?

cheers,
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



[OMPI devel] How to deal with F90 mpi.mod with single stack and multiple compiler suites?

2013-08-22 Thread Christopher Samuel

Hi folks,

We've got what we thought would be a fairly standard OMPI (1.6.5)
install: a single install built with GCC, with the appropriate
variables set to use the Intel compilers when someone loads our
"intel" module:

$ module show intel
[...]
setenv   OMPI_CC icc
setenv   OMPI_CXX icpc
setenv   OMPI_F77 ifort
setenv   OMPI_FC ifort
setenv   OMPI_CFLAGS -xHOST -O3 -mkl=sequential
setenv   OMPI_FFLAGS -xHOST -O3 -mkl=sequential
setenv   OMPI_FCFLAGS -xHOST -O3 -mkl=sequential
setenv   OMPI_CXXFLAGS -xHOST -O3 -mkl=sequential

This works wonderfully, *except* that when our director attempted to
build an F90 program with the Intel compilers it failed to build,
because the mpi.mod F90 module was produced with gfortran rather than
the Intel compilers. :-(

Is there any way to avoid having to do parallel installs of OMPI with
GCC and Intel compilers just to have two different versions of these
files?

My brief googling hasn't indicated anything, and I don't see anything
in the mpif90 manual page (though I have to admit I've had to rush to
try and get this done before I need to leave for the day). :-(

cheers,
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] [slurm-dev] slurm-dev Memory accounting issues with mpirun (was Re: Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun)

2013-08-19 Thread Christopher Samuel

Hi Ralph,

On 12/08/13 06:17, Ralph Castain wrote:

> 1. Slurm has no direct knowledge or visibility into the
> application procs themselves when launched by mpirun. Slurm only
> sees the ORTE daemons. I'm sure that Slurm rolls up all the
> resources used by those daemons and their children, so the totals
> should include them
> 
> 2. Since all Slurm can do is roll everything up, the resources
> shown in sacct will include those used by the daemons and mpirun as
> well as the application procs. Slurm doesn't include their daemons
> or the slurmctld in their accounting. so the two numbers will be 
> significantly different. If you are attempting to limit overall 
> resource usage, you may need to leave some slack for the daemons
> and mpirun.

Thanks for that explanation, makes a lot of sense.

In the end, due to time pressure, we decided to just do what we did
with Torque and patch Slurm to set RLIMIT_AS instead of RLIMIT_DATA
for jobs, so no single sub-process can request more RAM than the job
has asked for.
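
(For anyone wondering why RLIMIT_AS rather than RLIMIT_DATA: on the
kernels we're running, RLIMIT_DATA only constrains the brk/sbrk data
segment, so large mmap-backed allocations sail straight past it, whereas
RLIMIT_AS caps total address space. The patch essentially boils down to a
setrlimit() call made before the task is exec'd; a sketch with a made-up
4 GB figure, not the actual patch:)

#include <stdio.h>
#include <sys/resource.h>

/* Cap total address space before exec'ing the user's task, so no single
   process can grow past the job's memory request.  4 GB is just an example. */
int main(void)
{
    struct rlimit rl;

    rl.rlim_cur = 4UL * 1024 * 1024 * 1024;   /* soft limit */
    rl.rlim_max = 4UL * 1024 * 1024 * 1024;   /* hard limit */

    if (setrlimit(RLIMIT_AS, &rl) != 0) {     /* RLIMIT_DATA would miss mmap */
        perror("setrlimit(RLIMIT_AS)");
        return 1;
    }

    /* ... exec the task here; it inherits the limit ... */
    return 0;
}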

It works nicely and our users are used to it from Torque; we've not hit
any issues with it so far.

In the long term I suspect the jobacct_gather/cgroup plugin will give
better numbers once it's had more work.

All the best,
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun

2013-08-08 Thread Christopher Samuel

Hi Joshua,

On 23/07/13 19:34, Joshua Ladd wrote:

> The proposed solution that "we" (OMPI + SLURM) have come up with
> is to modify OMPI to support PMI2 and to use SLURM 2.6 which has
> support for PMI2 and is (allegedly) much more scalable than PMI1.
> Several folks in the combined communities are working hard, as we
> speak, trying to get this functional to see if it indeed makes a
> difference. Stay tuned, Chris. Hopefully we will have some data by
> the end of the week.

Is there any news on this?

We'd love to be able to test this out if we can, as I currently see a
60% penalty with srun with my test NAMD job from our tame MM person.

thanks!
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] [slurm-dev] Re: slurm-dev Memory accounting issues with mpirun (was Re: Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun)

2013-08-07 Thread Christopher Samuel

On 07/08/13 16:59, Janne Blomqvist wrote:

> That is, the memory accounting is per task, and when launching
> using mpirun the number of tasks does not correspond to the number
> of MPI processes, but rather to the number of "orted" processes (1
> per node).

That appears to be correct: I am seeing 1 task in the batch step and
68 tasks for orted when I use mpirun, whilst I see 1 task in the batch
step and 1104 tasks as namd2 when I use srun.

I could understand how that might result in Slurm (wrongly) thinking
that a single task is using more than its allowed memory per task,
but I'm not sure I understand how that could lead to Slurm thinking
the job is using vastly more memory than it actually is.


cheers,
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] [slurm-dev] slurm-dev Memory accounting issues with mpirun (was Re: Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun)

2013-08-07 Thread Christopher Samuel

On 07/08/13 16:19, Christopher Samuel wrote:

> Anyone seen anything similar, or any ideas on what could be going
> on?

Sorry, this was with:

# ACCOUNTING
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30

Since those initial tests we've started enforcing memory limits (the
system is not yet in full production) and found that this causes jobs
to get killed.

We tried the cgroups gathering method, but jobs still die with mpirun,
and now the numbers don't seem to be right for mpirun or srun either:

mpirun (killed):

[samuel@barcoo-test Mem]$ sacct -j 94564 -o JobID,MaxRSS,MaxVMSize
   JobID MaxRSS  MaxVMSize
-  -- --
94564
94564.batch-523362K  0
94564.0 394525K  0

srun:

[samuel@barcoo-test Mem]$ sacct -j 94565 -o JobID,MaxRSS,MaxVMSize
   JobID MaxRSS  MaxVMSize
-  -- --
94565
94565.batch998K  0
94565.0  88663K  0


All the best,
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Memory accounting issues with mpirun (was Re: [slurm-dev] Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun)

2013-08-07 Thread Christopher Samuel

On 07/08/13 16:18, Christopher Samuel wrote:

> Anyone seen anything similar, or any ideas on what could be going
> on?

Apologies, forgot to mention that Slurm is set up with:

# ACCOUNTING
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30

We are testing with cgroups now.

- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



[OMPI devel] Memory accounting issues with mpirun (was Re: [slurm-dev] Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun)

2013-08-07 Thread Christopher Samuel

On 23/07/13 17:06, Christopher Samuel wrote:

> Bringing up a new IBM SandyBridge cluster I'm running a NAMD test 
> case and noticed that if I run it with srun rather than mpirun it 
> goes over 20% slower.

Following on from this issue, we've found that whilst mpirun gives
acceptable performance the memory accounting doesn't appear to be correct.

Anyone seen anything similar, or any ideas on what could be going on?

Here are two identical NAMD jobs running over 69 nodes using 16 cores
per node; this one was launched with mpirun (Open-MPI 1.6.5):


==> slurm-94491.out <==
WallClock: 101.176193  CPUTime: 101.176193  Memory: 1268.554688 MB
End of program

[samuel@barcoo-test Mem]$ sacct -j 94491 -o JobID,MaxRSS,MaxVMSize
   JobID MaxRSS  MaxVMSize
-  -- --
94491
94491.batch6504068K  11167820K
94491.05952048K   9028060K


This one launched with srun (about 60% slower):

==> slurm-94505.out <==
WallClock: 163.314163  CPUTime: 163.314163  Memory: 1253.511719 MB
End of program

[samuel@barcoo-test Mem]$ sacct -j 94505 -o JobID,MaxRSS,MaxVMSize
   JobID MaxRSS  MaxVMSize
-  -- --
94505
94505.batch   7248K   1582692K
94505.01022744K   1307112K



cheers!
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun

2013-07-24 Thread Christopher Samuel

On 24/07/13 09:42, Ralph Castain wrote:

> Not to 1.6 series, but it is in the about-to-be-released 1.7.3,
> and will be there from that point onwards.

Oh dear, I cannot delay this machine any more to change to 1.7.x. :-(

> Still waiting to see if it resolves the difference.

When I've got the current rush out of the way I'll try a private build
of 1.7 and see how that goes with NAMD.

cheers!
Chris
- -- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun

2013-07-23 Thread Christopher Samuel

On 23/07/13 19:34, Joshua Ladd wrote:

> Hi, Chris

Hi Joshua,

I've quoted you in full as I don't think your message made it through
to the slurm-dev list (at least I've not received it from there yet).

> Funny you should mention this now. We identified and diagnosed the 
> issue some time ago as a combination of SLURM's PMI1
> implementation and some of, what I'll call, OMPI's topology
> requirements (probably not the right word.) Here's what is
> happening, in a nutshell, when you launch with srun:
> 
> 1. Each process pushes his endpoint data up to the PMI "cloud" via
>    PMI put (I think it's about five or six puts, bottom line, O(1).)
> 2. Then executes a PMI commit and PMI barrier to ensure all other
>    processes have finished committing their data to the "cloud".
> 3. Subsequent to this, each process executes O(N) (N is the number of
>    procs in the job) PMI gets in order to get all of the endpoint
>    data for every process regardless of whether or not the process
>    communicates with that endpoint.
> 
> "We" (MLNX et al.) undertook an in-depth scaling study of this and
> identified several poorly scaling pieces with the worst offenders
> being:
> 
> 1. PMI Barrier scales worse than linear.
> 2. At scale, the PMI get phase starts to look quadratic.
> 
> The proposed solution that "we" (OMPI + SLURM) have come up with is
> to modify OMPI to support PMI2 and to use SLURM 2.6 which has
> support for PMI2 and is (allegedly) much more scalable than PMI1.
> Several folks in the combined communities are working hard, as we
> speak, trying to get this functional to see if it indeed makes a
> difference. Stay tuned, Chris. Hopefully we will have some data by
> the end of the week.
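
(For anyone skimming the archives, the pattern described above maps
onto the PMI-1 API roughly as follows - a minimal sketch only: the key
names, value strings and buffer sizes are made up, and real code would
query the PMI length maxima instead of hard-coding them.)

#include <stdio.h>
#include <pmi.h>

int main(void)
{
    int spawned, rank, size, i;
    char kvsname[256], key[64], value[256];

    PMI_Init(&spawned);
    PMI_Get_rank(&rank);
    PMI_Get_size(&size);
    PMI_KVS_Get_my_name(kvsname, sizeof(kvsname));

    /* 1. publish this rank's endpoint data: a handful of O(1) puts */
    snprintf(key, sizeof(key), "endpoint-%d", rank);
    snprintf(value, sizeof(value), "lid:qp:etc-for-rank-%d", rank);
    PMI_KVS_Put(kvsname, key, value);

    /* 2. commit, then barrier so every rank's data is visible */
    PMI_KVS_Commit(kvsname);
    PMI_Barrier();

    /* 3. O(N) gets: fetch endpoint data for *every* rank, needed or not */
    for (i = 0; i < size; i++) {
        snprintf(key, sizeof(key), "endpoint-%d", i);
        PMI_KVS_Get(kvsname, key, value, sizeof(value));
    }

    PMI_Finalize();
    return 0;
}

At scale it's the barrier in step 2 and the N gets in step 3 that hurt.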

Wonderful - great to know that what we're seeing is real and not just
pilot error on our part!  We're happy enough to tell users to keep on
using mpirun, as they're used to that from our other Intel systems, and
to only use srun if the code requires it (one or two commercial apps
that use Intel MPI).

Can I ask: if the PMI2 ideas work out, is that likely to get backported
to OMPI 1.6.x?

All the best,
Chris
- -- 
 Christopher Samuel    Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



[OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun

2013-07-23 Thread Christopher Samuel

Hi there slurm-dev and OMPI devel lists,

Bringing up a new IBM SandyBridge cluster I'm running a NAMD test case
and noticed that if I run it with srun rather than mpirun it goes over
20% slower.  These are all launched from an sbatch script too.

Slurm 2.6.0, RHEL 6.4 (latest kernel), FDR IB.

Here are some timings as reported as the WallClock time by NAMD itself
(so not including startup/tear down overhead from Slurm).

srun:

run1/slurm-93744.out:WallClock: 695.079773  CPUTime: 695.079773
run4/slurm-94011.out:WallClock: 723.907959  CPUTime: 723.907959
run5/slurm-94013.out:WallClock: 726.156799  CPUTime: 726.156799
run6/slurm-94017.out:WallClock: 724.828918  CPUTime: 724.828918

Average of about 717 seconds

mpirun:

run2/slurm-93746.out:WallClock: 559.311035  CPUTime: 559.311035
run3/slurm-93910.out:WallClock: 544.116333  CPUTime: 544.116333
run7/slurm-94019.out:WallClock: 586.072693  CPUTime: 586.072693

Average of 563 seconds.

So that's about 27% slower.

Everything is identical (they're all symlinks to the same golden
master) *except* for the launcher, which is changed by copying the
batch script and substituting mpirun for srun.
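
(For completeness, the two variants of the batch script differ only in
the launcher line - a stripped-down sketch, with the node counts,
binary and input file names made up:)

#!/bin/bash
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=16

# variant 1: direct launch via slurmstepd and Slurm's PMI
srun ./namd2 apoa1.namd

# variant 2: launch via Open-MPI's mpirun / orted tree
#mpirun ./namd2 apoa1.namd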

When they are running I can see that for jobs launched with srun they
are direct children of slurmstepd whereas when started with mpirun
they are children of Open-MPI's orted (or mpirun on the launch node)
which itself is a child of slurmstepd.

Has anyone else seen anything like this, or got any ideas?

cheers,
Chris
- -- 
 Christopher Samuel    Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Any plans to support Intel MIC (Xeon Phi) in Open-MPI?

2013-05-03 Thread Christopher Samuel

On 03/05/13 14:30, Ralph Castain wrote:

> On May 2, 2013, at 9:18 PM, Christopher Samuel 
>  wrote:
> 
>> We're using Slurm, and it supports them already apparently, so I'm 
>> not sure if that helps?
> 
> It does - but to be clear: your saying that you can directly launch 
> processes onto the Phi's via srun?

Ah no, Slurm 2.5 supports them as coprocessors, allocated as GPUs are.

I've been told Slurm 2.6 (under development) may support them as nodes
in their own right, but that's not something I've had time to look into
myself (yet).

> If so, then this may not be a problem, assuming you can get
> confirmation that the Phi's have direct access to the interconnects.

I'll see what I can do.  There is a long README, which will be my
light reading on the train home tonight:

http://registrationcenter.intel.com/irc_nas/3047/readme-en.txt

This seems to indicate how that works, but other parts imply that it
*may* require Intel True Scale InfiniBand adapters:

3.4  Starting Intel(R) MPSS with OFED Support

  1) Start the Intel(R) MPSS service. Section 2.3, "Starting Intel(R) MPSS 
 Services" explains how.  Do not proceed any further if Intel(R) MPSS is not
 started.

  2) Start IB and HCA services. 
user_prompt> sudo service openibd start
user_prompt> sudo service opensmd start

  3) Start The Intel(R) Xeon Phi(TM) coprocessor specific OFED service.
user_prompt> sudo service ofed-mic start

  4) To start the experimental ccl-proxy service (see /etc/mpxyd.conf)
user_prompt> sudo service mpxyd start

3.5  Stopping Intel(R) MPSS with OFED Support 

o If the installed version is earlier than 2.x.28xx unload the driver using:
user_prompt> sudo modprobe -r mic

o If the installed version is 2.x.28xx or later, unload the driver using:   
   
user_prompt> sudo service ofed-mic stop
user_prompt> sudo service mpss stop
user_prompt> sudo service mpss unload
user_prompt> sudo service opensmd stop
user_prompt> sudo service openibd stop

o If the experimental ccl-proxy driver was started, unload the driver using:
user_prompt> sudo service mpxyd stop

> If the answer to both is "yes", then just srun the MPI procs
> directly - we support direct launch and use PMI to wireup. Problem
> solved :-)

That would be ideal; I'll do more digging into Slurm 2.6 (we had
planned on starting off with that anyway, but with the Phis as
coprocessors - this may be enough for us to change).

> And yes - that support is indeed in the 1.6 series...just configure 
> --with-pmi. You may need to provide the path to where pmi.h is 
> located under the slurm install, but probably not.
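
For the archives, that build ends up looking something like the
following (the install prefix and Slurm location here are only
examples):

./configure --prefix=/usr/local/openmpi/1.6.4 \
            --with-slurm \
            --with-pmi=/opt/slurm
make -j 8 && make install

# then jobs can be launched directly, no mpirun needed:
srun -n 128 ./my_mpi_app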

Brilliant, thanks!

All the best,
Chris
- -- 
 Christopher Samuel    Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Any plans to support Intel MIC (Xeon Phi) in Open-MPI?

2013-05-03 Thread Christopher Samuel

Hi Ralph,  very quick reply as I've got an SGI engineer waiting for
me.. ;-)

On 03/05/13 12:21, Ralph Castain wrote:

> So the first problem is: how to know the Phi's are present, how
> many you have on each node, etc? We could push that into something
> like the hostfile, but that requires that someone build the file.
> Still, it would only have to be built once, so maybe that's not too
> bad - could have a "wildcard" entry if every node is the same,
> etc.

We're using Slurm, and it supports them already apparently, so I'm not
sure if that helps?

> Next, we have to launch processes across the PCI bus. We had to do
> an "rsh" launch of the MPI procs onto RR's cell processors as they
> appeared to be separate "hosts", though only visible on the local
> node (i.e., there was a stripped-down OS running on the cell) -
> Paul's cmd line implies this may also be the case here. If the same
> method works here, then we have most of that code still available
> (needs some updating). We would probably want to look at whether or
> not binding could be supported on the Phi local OS.

I believe that is the case - my understanding is that you can log in
to them via SSH.  We've not got that far with ours yet...

> Finally, we have to wire everything up. This is where RR got a
> little tricky, and we may encounter the same thing here. On RR, the
> cell's didn't have direct access to the interconnects - any
> messaging had to be relayed by a process running on the main cpu.
> So we had to create the ability to "route" MPI messages from
> processes running on the cells to processes residing on other
> nodes.

Gotcha.

> Solving the first two is relatively straightforward. In my mind,
> the primary issue is the last one - does anyone know if a process
> on the Phi's can "see" interconnects like a TCP NIC or an
> Infiniband adaptor?

I'm not sure, but I can tell you that the Intel RPMs include an OFED
install that looks like it's used on the Phi (if my reading is correct).
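
Once we can log into the cards I guess the quick checks would be
something along these lines (mic0 being the hostname MPSS sets up -
purely illustrative):

ssh mic0 ibv_devinfo      # does the coprocessor see an IB HCA at all?
ssh mic0 ip -o link show  # and which (virtual) network interfaces it has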

cheers,
Chris
- -- 
 Christopher Samuel    Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Any plans to support Intel MIC (Xeon Phi) in Open-MPI?

2013-05-02 Thread Christopher Samuel

On 03/05/13 10:47, Ralph Castain wrote:

> We had something similar at one time - I developed it for the 
> Roadrunner cluster so you could run MPI tasks on the GPUs. Worked 
> well, but eventually fell into disrepair due to lack of use.

OK, interesting!   RR was Cell rather than GPU though wasn't it?

> In this case, I suspect it will be much easier to do as the Phis 
> appear to be a lot more visible to the host than the GPU did on RR.
>  Looking at the documentation, the Phis just sit directly on the
> PCIe bus, so they should look just like any other processor,

Yup, they show up in lspci:

[root@barcoo061 ~]# lspci -d 8086:2250
2a:00.0 Co-processor: Intel Corporation Device 2250 (rev 11)
90:00.0 Co-processor: Intel Corporation Device 2250 (rev 11)

> and they are Xeon binary compatible - so there is no issue with 
> tracking which binary to run on which processor.

Sadly they're not binary compatible; you have to cross-compile for
them (or compile on the Phi itself).
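
(Cross-compiling is at least only a flag with the Intel compilers -
something like this, with the file and card names made up:)

icc -mmic -O2 hello.c -o hello.mic   # build a native MIC binary on the host
scp hello.mic mic0:                  # copy it onto the card
ssh mic0 ./hello.mic                 # and run it there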

I haven't got any further than having xCAT install the (rebuilt)
kernel module so far, so I can't log into them yet.

> Brice: do the Phis appear in the hwloc topology object?

They appear in lstopo as mic0 and mic1.

> Chris: can you run lstopo on one of the nodes and send me the
> output (off-list)?

One of the hosts?  Not a problem, will do.

All the best!
Chris
- -- 
 Christopher Samuel    Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



[OMPI devel] Any plans to support Intel MIC (Xeon Phi) in Open-MPI?

2013-05-02 Thread Christopher Samuel

Hi folks,

The new system we're bringing up has 10 nodes with dual Xeon Phi MIC
cards; are there any plans to support them by launching MPI tasks
directly on the Phis themselves (rather than just as offload devices
for code on the hosts)?

All the best,
Chris
- -- 
 Christopher Samuel    Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] Choosing an Open-MPI release for a new cluster

2013-05-02 Thread Christopher Samuel

Hi Ralph, Jeff, Paul,

On 02/05/13 14:14, Ralph Castain wrote:

> Depends on what you think you might want, and how tolerant you and 
> your users are about bugs.
> 
> The 1.6 series is clearly more mature and stable. It has nearly
> all the MPI-2 stuff now, but no MPI-3.

Great, thanks!

> If you think there is something in MPI-3 you might want, then the
> 1.7 series could be the way to go - though you'll have to suffer
> thru its growing pains.
[...]

Well, our users are life sciences researchers and as a result very few
of them are developers; they are mostly using applications we build
for them on request (or Java and the occasional commercial package).

So from the sound of it 1.6 is the way to go, and if we ever hit
something that needs MPI-3 we'll install that in parallel but leave
the default at 1.6.
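
(Side-by-side installs are cheap enough with separate prefixes and a
modulefile each; sketching with made-up paths, each from its own build
tree:)

./configure --prefix=/usr/local/openmpi/1.6.4 && make -j 8 && make install
./configure --prefix=/usr/local/openmpi/1.7.0 && make -j 8 && make install
# the default module points at 1.6.4; anyone needing MPI-3 loads the 1.7 one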

Thanks so much to you all!

All the best,
Chris
- -- 
 Christopher Samuel    Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[OMPI devel] Choosing an Open-MPI release for a new cluster

2013-05-01 Thread Christopher Samuel

Hi folks,

We're about to bring up a new cluster (IBM iDataplex with SandyBridge
CPUs including 10 nodes with two Intel Xeon Phi cards) and I'm at the
stage where we need to pick an OMPI release to put on.

Given that this system is at the start of its life, whatever we pick
now is likely to be baked in for the next 4 years or so (with OMPI
point release updates of course), so I'm thinking I should go with
the 1.7.x release rather than the 1.6.x one.

For comparison, the Nehalem iDP this is going in next to is still at
1.4.x; it wouldn't be worth the effort to take it to a later release
given it probably has only another 18 months of life left.

However, not having been able to keep up with this list for some time,
I'd like to throw myself on your tender mercies for advice on whether
that's a good plan or not!

Thoughts please?

All the best,
Chris
- -- 
 Christopher Samuel    Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[OMPI devel] CRIU checkpoint support in Open-MPI?

2012-12-05 Thread Christopher Samuel

Hi folks,

I don't know if people have seen that the Linux kernel community is
following a different checkpoint/restart path from those currently
supported by OMPI, namely the OpenVZ developers' "checkpoint/restore
in user space" project (CRIU).

You can read more about its current state here:

 https://lwn.net/Articles/525675/

The CRIU website is here:

 http://criu.org/

CRIU will also be up for discussion at LCA2013 in Canberra this year
(though I won't be there):

http://linux.conf.au/schedule/30116/view_talk?day=thursday

Is there interest from OMPI in supporting this, given it looks like
it's quite likely to make it into the mainline kernel?

Or is it better to wait for it to be merged, and then take a look?

All the best,
Chris
- -- 
 Christopher Samuel    Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] 1.6.1rc3 - 3 of 5 tests failed on OSX 10.8

2012-08-23 Thread Christopher Samuel

On 21/08/12 05:30, Jeff Squyres wrote:

> I see a clang 3.1 on http://llvm.org/releases/, but I
> don't see a 4.0.  Is that a released version?

Thanks to a colleague at VLSCI who's got Mountain Lion, the version
information for clang and llvm-gcc is as follows (amusingly, they use
entirely different and unrelated version numbers):


Carls-MacBook-Pro:~ carlt$ clang --version
Apple clang version 4.0 (tags/Apple/clang-421.0.60) (based on LLVM 3.1svn)
Target: x86_64-apple-darwin12.1.0
Thread model: posix


Carls-MacBook-Pro:~ carlt$ llvm-gcc --version
i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) 
(LLVM build 2336.11.00)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


Hope this is of use!

cheers,
Chris
- -- 
 Christopher Samuel    Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] RFC: ob1: fallback on put/send on rget failure

2012-03-18 Thread Christopher Samuel

On 16/03/12 08:14, Shamis, Pavel wrote:

> I did not get any patch.

It arrived OK here; you can get it from the archive:

http://www.open-mpi.org/community/lists/devel/2012/03/10717.php

- -- 
    Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/



Re: [OMPI devel] [OMPI svn] svn:open-mpi r26077 (fwd)

2012-03-01 Thread Christopher Samuel

On 02/03/12 02:56, Nathan Hjelm wrote:

> Found a pretty nasty frag leak (and a minor one) in ob1 (see
> commit below). If this fix addresses some hangs we are seeing on
> infiniband LANL might want a 1.4.6 rolled (or a faster rollout for
> 1.6.0).

What symptoms would an affected job show?  Does it fail with an OMPI
error or does it just hang using 0% CPU?

cheers,
Chris
- -- 
    Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/



Re: [OMPI devel] Open MPI nightly tarballs suspended / 1.5.5rc3

2012-02-28 Thread Christopher Samuel

On 29/02/12 07:44, Jeffrey Squyres wrote:

> - BlueGene fixes

rc3 fixes the builds on our front end node, thanks!

- -- 
    Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/



Re: [OMPI devel] poor btl sm latency

2012-02-28 Thread Christopher Samuel

On 13/02/12 22:11, Matthias Jurenz wrote:

> Do you have any idea? Please help!

Do you see the same bad latency in the old branch (1.4.5)?
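
(If it helps, the quick comparison I'd run on one node under each
version is something like this, assuming the OSU micro-benchmarks are
built against each build - illustrative only:)

mpirun -np 2 --mca btl self,sm --bind-to-core ./osu_latency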

cheers,
Chris
- -- 
    Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/



Re: [OMPI devel] 1.5.5rc2

2012-02-23 Thread Christopher Samuel

On 24/02/12 15:12, Christopher Samuel wrote:

> I suspect this is irrelevant, but I got a build failure trying to 
> compile it on our BG/P front end node (login node) with the IBM XL 
> compilers.

Oops, I forgot to include how I built it:

export PATH=/opt/ibmcmp/vac/bg/9.0/bin/:/opt/ibmcmp/vacpp/bg/9.0/bin:/opt/ibmcmp/xlf/bg/11.1/bin:$PATH

CC=xlc CXX=xlC F77=xlf ./configure && make

- -- 
Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/



Re: [OMPI devel] 1.5.5rc2

2012-02-23 Thread Christopher Samuel

On 24/02/12 00:17, Jeffrey Squyres wrote:

> Please test!

I suspect this is irrelevant, but I got a build failure trying to
compile it on our BG/P front end node (login node) with the IBM XL
compilers.

make[5]: Entering directory
`/tmp/chris/openmpi-1.5.5rc2/ompi/contrib/vt/vt/vtlib'
  CC vt_pform_bgp.lo
"/bgsys/drivers/ppcfloor/arch/include/common/bgp_ras.h", line 652.1:
1506-508 (W) Option packed for pragma align is not supported.
.libs/vt_pform_bgp.s: Assembler messages:
.libs/vt_pform_bgp.s:453: Error: Unrecognized opcode: `mfdcrux'
.libs/vt_pform_bgp.s:494: Error: Unrecognized opcode: `mtdcrux'
1500-067: (S) asm statement generates errors in assembler output.
make[5]: *** [vt_pform_bgp.lo] Error 1
make[5]: Leaving directory
`/tmp/chris/openmpi-1.5.5rc2/ompi/contrib/vt/vt/vtlib'
make[4]: *** [all-recursive] Error 1
make[4]: Leaving directory
`/tmp/chris/openmpi-1.5.5rc2/ompi/contrib/vt/vt'
make[3]: *** [all] Error 2
make[3]: Leaving directory
`/tmp/chris/openmpi-1.5.5rc2/ompi/contrib/vt/vt'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/tmp/chris/openmpi-1.5.5rc2/ompi/contrib/vt'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/tmp/chris/openmpi-1.5.5rc2/ompi'
make: *** [all-recursive] Error 1


- -- 
Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/



Re: [OMPI devel] 1.5 supported systems

2012-02-23 Thread Christopher Samuel

On 23/02/12 09:44, Jeffrey Squyres wrote:

> - PBS Pro, Open PBS, Torque

Does anyone actually use OpenPBS these days?  From what I can see it
was abandoned almost 11 years ago (2.3.16 was June 2001).

http://www.pbsworks.com/ResLibSearchResult.aspx?keywords=openpbs

Does anyone test against it?

cheers,
Chris
- -- 
    Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/



Re: [OMPI devel] Compile-time MPI_Datatype checking

2012-02-02 Thread Christopher Samuel

On 29/01/12 10:07, Dmitri Gribenko wrote:

> My colleague and I want to implement a compile-time check (warning)
> for clang compiler that specified buffer type matches passed
> MPI_Datatype.

Interesting - is it possible to do the same for GCC with its plugin
architecture?
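
To make it concrete, this is the kind of mismatch such a check would
flag (a contrived example):

int n = 42;
/* buffer is an int, but the datatype argument says double -
   a compile-time warning here would be very welcome: */
MPI_Send(&n, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);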

cheers!
Chris
- -- 
Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/



Re: [OMPI devel] 1.4.5rc2 now released

2012-01-19 Thread Christopher Samuel

On 20/01/12 04:55, Jeff Squyres wrote:

> Please test:

Great - we can now silence that warning for NFS, thanks!

- -- 
    Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/



Re: [OMPI devel] RFC: Support Cross Memory Attach in sm btl

2012-01-12 Thread Christopher Samuel

On 13/01/12 11:47, Christopher Yeoh wrote:

> Here's some benchmarking results I did a while back on a single 64-way
> (SMT) POWER6 box.

Very nice numbers!  I'd love to test it out here on some real codes,
like Gromacs or NAMD (NAMD I know does a lot of comms and is latency
sensitive), but my Copious Free Time(tm) appears to have run out for the
moment. :-(

But certainly very interesting..

- -- 
Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/



Re: [OMPI devel] RFC: Support Cross Memory Attach in sm btl

2012-01-12 Thread Christopher Samuel

Hi Chris,

On 12/01/12 20:34, Christopher Yeoh wrote:

> Cross Memory Attach (CMA) is a pair of new syscalls (process_vm_readv
> and process_vm_writev) which allow for fast intranode
> communication. It has added to the Linux 3.2 kernel.
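
(For anyone who hasn't met the syscalls yet, the read side boils down
to a single call - a minimal sketch, no error handling:)

#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/uio.h>

/* copy 'len' bytes from address 'src' in process 'pid' into local 'dst' */
static ssize_t cma_read(pid_t pid, void *dst, void *src, size_t len)
{
    struct iovec local  = { .iov_base = dst, .iov_len = len };
    struct iovec remote = { .iov_base = src, .iov_len = len };

    /* one kernel-mediated copy, no bounce through a shared memory segment */
    return process_vm_readv(pid, &local, 1, &remote, 1, 0);
}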

Do you have any figures comparing some code with and without CMA?

cheers,
Chris
- -- 
Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/



Re: [OMPI devel] OMPI 1.4.5rc1 posted

2011-12-14 Thread Christopher Samuel

On 15/12/11 08:33, Ralph Castain wrote:

> That param was intended to catch user-level mistakes
> whereby the user specified a tmpdir location via the
> tmpdir_base MCA param that the system admin wanted to
> protect. It was not intended for someone to specify
> locations to skip.

Ahh, I understand!  Not a problem - if the patch to let us disable
that warning gets CMR'd into 1.4.5 then we'll just upgrade straight to
that; if it doesn't, we might still upgrade and set our systems to use
/dev/shm until 1.4.6 appears.

cheers!
Chris
- -- 
Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/



Re: [OMPI devel] OMPI 1.4.5rc1 posted

2011-12-13 Thread Christopher Samuel

On 14/12/11 12:59, Jeff Squyres wrote:

> Fair enough.  We've definitely seen cases where the
> /tmp filesystem *did* matter, so perhaps it's a kernel
> version issue, or a phase of the moon issue, or...

Well, it's more about where $TMPDIR points, not /tmp per se.  But
yes, I can quite understand why it could be problematic on some
filesystems.

> But yet, the point is valid that the message should be
> disable-able.  Let me file a ticket about it...  Done:
> 
>   https://svn.open-mpi.org/trac/ompi/ticket/2937

That's great - much appreciated!

cheers,
Chris
- -- 
Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/


