Re: [hwloc-users] Build an OS-X Universal version

2021-03-24 Thread Robin Scher
I was eventually able to build what I needed, though I never could get glibtool 
working to build from the git repo. Thankfully I didn’t need to bootstrap 
configure from the tarball. 

In case anyone cares, this is essentially what I’m doing to build a Mac 
universal binary version:

configure --prefix …/arm64 --target arm64-apple-darwin CFLAGS="-target arm64-apple-darwin"; make install
configure --prefix …/x86_64 --target x86_64-apple-darwin CFLAGS="-target x86_64-apple-darwin"; make install
lipo -create -output libhwloc.a …/arm64/lib/libhwloc.a …/x86_64/lib/libhwloc.a

I had to supply the target via CFLAGS to get it to build and link correctly 
with the correct platform when cross compiling. 
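
The three commands above can be wrapped in a small script. This is a minimal sketch, not a tested recipe: the install root passed to `build_universal`, the `RUN` dry-run variable, and the `make distclean` between the two builds are assumptions added here and are not part of the original commands.

```shell
#!/bin/sh
# Sketch of the two-architecture build followed by lipo.
# RUN is a hypothetical dry-run hook: RUN=echo prints each command instead of running it.

build_arch() {
    # $1 = architecture (arm64 or x86_64), $2 = install root (hypothetical)
    ${RUN:-} ./configure --prefix "$2/$1" --target "$1-apple-darwin" \
        CFLAGS="-target $1-apple-darwin" || return 1
    ${RUN:-} make install || return 1
    # Assumption: wipe the build tree so the next architecture starts clean.
    ${RUN:-} make distclean
}

build_universal() {
    # $1 = install root; per-architecture trees land in $1/arm64 and $1/x86_64
    for arch in arm64 x86_64; do
        build_arch "$arch" "$1" || return 1
    done
    # Merge the two static libraries into one universal archive.
    ${RUN:-} lipo -create -output libhwloc.a \
        "$1/arm64/lib/libhwloc.a" "$1/x86_64/lib/libhwloc.a"
}
```

Calling `RUN=echo build_universal "$HOME/hwloc-universal"` previews the commands without running them; after a real run, `lipo -info libhwloc.a` lists the architectures contained in the merged archive.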

-robin


> On Mar 23, 2021, at 10:19 AM, Erik Schnetter  wrote:
> 
> The default MacOS libtool is very different from GNU libtool. Some
> software knows about this, and you then need to make sure none of the
> Homebrew/MacPorts libtool is in your path. (Other software, of course,
> expects a GNU libtool, and then you need to proceed as described
> earlier.)
> 
> -erik
> 
> On Tue, Mar 23, 2021 at 12:47 PM Robin Scher  wrote:
>> 
>> Yes I think that’s the issue and I can’t resolve it with any usual trick to 
>> switch to the glibtool version. Been having issues with this new Mac, so 
>> I’ll keep working that angle for now. I got a little further using the 
>> download tarball, but I can’t seem to reliably build anything by command 
>> line yet so it’s probably a system issue with my machine.
>> 
>> Thanks,
>> 
>> Robin Scher
>> ro...@uberware.net
>> +1 (213) 448-0443
>> 
>>> On Mar 23, 2021, at 5:31 AM, Brice Goglin  wrote:
>>> 
>>> 
>>>> On 23/03/2021 at 08:08, Brice Goglin wrote:
>>>>> On 23/03/2021 at 02:28, ro...@uberware.net wrote:
>>>>> Hi. I'm trying to build hwloc on OS-X Big Sur on an M1. Ultimate plan is
>>>>> to build it as a universal binary. Right now, I cannot even get the git
>>>>> master to autogen. This is what I get:
>>>>> 
>>>>> robin@Robins-Mac-mini hwloc % ./autogen.sh
>>>>> autoreconf: Entering directory `.'
>>>>> autoreconf: configure.ac: not using Gettext
>>>>> autoreconf: running: aclocal --force -I ./config
>>>>> autoreconf: configure.ac: tracing
>>>>> configure.ac:77: error: libtool version 2.2.6 or higher is required
>>>> 
>>>> Hello
>>>> 
>>>> There's something strange in your environment if libtool 2.2.6+ couldn't
>>>> be found. It likely explains the rest of the messages.
>>> 
>>> 
>>> I read somewhere else that brew provides glibtool and glibtoolize while
>>> libtool/libtoolize still points to the old Apple libtool. If so,
>>> aliasing libtool/libtoolize to glibtool/glibtoolize may help.
>>> 
>>> Brice
>>> 
>>> 
>>> ___
>>> hwloc-users mailing list
>>> hwloc-users@lists.open-mpi.org
>>> https://lists.open-mpi.org/mailman/listinfo/hwloc-users
>>> 
>> 
> 
> 
> 
> -- 
> Erik Schnetter 
> http://www.perimeterinstitute.ca/personal/eschnetter/
> 


Re: [hwloc-users] Build an OS-X Universal version

2021-03-23 Thread Robin Scher
Yes I think that’s the issue and I can’t resolve it with any usual trick to 
switch to the glibtool version. Been having issues with this new Mac, so I’ll 
keep working that angle for now. I got a little further using the download 
tarball, but I can’t seem to reliably build anything by command line yet so 
it’s probably a system issue with my machine. 

Thanks,

Robin Scher
ro...@uberware.net
+1 (213) 448-0443

> On Mar 23, 2021, at 5:31 AM, Brice Goglin  wrote:
> 
> 
>> On 23/03/2021 at 08:08, Brice Goglin wrote:
>>> On 23/03/2021 at 02:28, ro...@uberware.net wrote:
>>> Hi. I'm trying to build hwloc on OS-X Big Sur on an M1. Ultimate plan is
>>> to build it as a universal binary. Right now, I cannot even get the git
>>> master to autogen. This is what I get:
>>> 
>>> robin@Robins-Mac-mini hwloc % ./autogen.sh
>>> autoreconf: Entering directory `.'
>>> autoreconf: configure.ac: not using Gettext
>>> autoreconf: running: aclocal --force -I ./config
>>> autoreconf: configure.ac: tracing
>>> configure.ac:77: error: libtool version 2.2.6 or higher is required
>> 
>> Hello
>> 
>> There's something strange in your environment if libtool 2.2.6+ couldn't
>> be found. It likely explains the rest of the messages.
> 
> 
> I read somewhere else that brew provides glibtool and glibtoolize while
> libtool/libtoolize still points to the old Apple libtool. If so,
> aliasing libtool/libtoolize to glibtool/glibtoolize may help.
> 
> Brice
> 
> 
> 


[hwloc-users] Build an OS-X Universal version

2021-03-22 Thread Robin
Hi. I'm trying to build hwloc on OS-X Big Sur on an M1. Ultimate plan is
to build it as a universal binary. Right now, I cannot even get the git
master to autogen. This is what I get:

robin@Robins-Mac-mini hwloc % ./autogen.sh
autoreconf: Entering directory `.'
autoreconf: configure.ac: not using Gettext
autoreconf: running: aclocal --force -I ./config
autoreconf: configure.ac: tracing
configure.ac:77: error: libtool version 2.2.6 or higher is required
configure.ac:77: the top level
autom4te: /usr/bin/m4 failed with exit status: 63
autoreconf: configure.ac: not using Libtool
autoreconf: running: /opt/homebrew/Cellar/autoconf/2.69/bin/autoconf --force
configure.ac:77: error: libtool version 2.2.6 or higher is required
configure.ac:77: the top level
autom4te: /usr/bin/m4 failed with exit status: 63
autoreconf: /opt/homebrew/Cellar/autoconf/2.69/bin/autoconf failed with exit status: 63
Checking whether configure needs patching for MacOS Big Sur libtool.m4 bug... grep: configure: No such file or directory
grep: configure: No such file or directory
yes
Trying to patch configure...
can't find file to patch at input line 9
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--
|Updated from libtool.m4 patch:
|
|[PATCH] Improve macOS version detection to support macOS 11 and simplify legacy logic
|
|Signed-off-by: Jeremy Huddleston Sequoia
|
|--- hwloc/configure.old 2020-11-25 16:03:04.225097149 +0100
|+++ hwloc/configure 2020-11-25 16:02:29.368995613 +0100
--
File to patch:

It hangs there waiting for me to supply a file. I see the patch is trying
to generate configure from configure.old, but there is no configure.old in
the repo. I know next to nothing about autogen and autoconf and have
always just followed basic instructions that have always worked before.

I'm using Homebrew to get the GNU tools, and have autoconf 2.69, automake
1.16.3, and libtool 2.4.6. Thanks for any help to get this building.
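
Homebrew installs GNU libtool under the names glibtool and glibtoolize, and autoreconf honors the LIBTOOLIZE environment variable when choosing which libtoolize to run. A minimal sketch of picking the right name, with a hypothetical helper `find_libtoolize`; this may not be sufficient on its own if aclocal also cannot find Homebrew's libtool.m4.

```shell
# Print the name of a libtoolize found on PATH, trying Homebrew's
# g-prefixed name first and the plain GNU name second.
find_libtoolize() {
    for name in glibtoolize libtoolize; do
        if command -v "$name" >/dev/null 2>&1; then
            echo "$name"
            return 0
        fi
    done
    return 1
}
# Usage (not run here): LIBTOOLIZE="$(find_libtoolize)" ./autogen.sh
```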


Re: [hwloc-users] Having trouble getting CPU Model string on Windows 7 x64

2014-01-28 Thread Robin Scher
Hi, thanks for responding. 

The CPUModel is definitely available on this machine. A 32 bit process on the 
same machine correctly finds the model name using code that calls the cpuid 
inline assembly to get it, and the machine itself is a VM running on a Mac, and 
I get the same model name from the code on Mac and on a Linux VM running on the 
same machine as well. It seems to be an issue with the Windows port of hwloc, 
and possibly only with the 64 bit version (I haven’t tried the 32 bit version 
yet). 
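
A quick way to see whether a given hwloc build reports the attribute at all is to search the verbose lstopo output for it. A minimal sketch, assuming the attribute is printed as `CPUModel="..."` on the object's line in `lstopo -v` output; the function name is hypothetical.

```shell
# Print every CPUModel value found in `lstopo -v` output read from stdin.
# Assumption: info attributes appear as name="value" pairs on the object's line.
cpumodel_from_lstopo() {
    sed -n 's/.*CPUModel="\([^"]*\)".*/\1/p'
}
# Usage (not run here): lstopo -v | cpumodel_from_lstopo
```

Empty output means no object in the topology carries the attribute, which would match the NULL results seen from `hwloc_obj_get_info_by_name`.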

I am using the prebuilt binary on Windows. I haven’t tried (and I’m not sure I 
want to try) building hwloc from source on Windows x64 using MSVC, but I have 
found out so far today that Microsoft makes available a compiler intrinsic to 
get access to cpuid as a C function, and that’s supposed to allow you to do the 
same kind of cpuid call work done previously as inline assembly. Perhaps 
someone out there is more familiar with this specific functionality in hwloc 
and can fix this for the 64 bit Windows build? I can take a stab at it, but 
like I said, the biggest hwloc development I’ve done is setting a flag in the 
configure script when I build on Unix.

As a last question, where is it documented which objects the “CPUModel” 
attribute appears on? I was looking for that but couldn’t find it.

Thank you for any further advice.
-robin

Robin Scher
ro...@uberware.net
+1 (213) 448-0443



On Jan 27, 2014, at 11:10 PM, Brice Goglin <brice.gog...@inria.fr> wrote:

> Hello,
> 
> The CPUModel attribute should be only in Socket or machine/root objects. At 
> least, that's what I documented and what I seem to see in the code. Did you 
> actually see any other place?
> 
> So it may just mean that the CPUModel is not available on your machine? Or 
> maybe the code below is buggy somehow? Does lstopo -v on Windows show a 
> CPUModel attribute? It does in a 32bits binary running on my Win7 64bits, but 
> doesn't seem to find anything when running the 64bits binary. I don't 
> remember well if there was a specific Windows 64bits issue in the cpuid code 
> in the x86 backend.
> 
> Brice
> 
> 
> 
> On 28/01/2014 01:59, Robin Scher wrote:
>> Hi again.
>> 
>> I’m trying to use hwloc 1.8 on Windows, Linux and Mac to get the CPU model 
>> string (e.g., “Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz”). Since hwloc on 
>> different platforms seems to stash this in different objects, I’m using code 
>> like this:
>> 
>> String name;
>> // Object types to probe, from the widest scope to the narrowest.
>> hwloc_obj_type_t objs[] = {
>>     HWLOC_OBJ_MACHINE,
>>     HWLOC_OBJ_SOCKET,
>>     HWLOC_OBJ_CORE,
>>     HWLOC_OBJ_PU,
>> };
>> // Stop at the first object type that carries a CPUModel info string.
>> for( size_t index = 0; index < (sizeof( objs ) / sizeof( hwloc_obj_type_t ))
>>      && name.Empty(); ++index )
>> {
>>     hwloc_obj_t obj = hwloc_get_obj_by_type( topology, objs[ index ], 0 );
>>     if( !obj ) continue;
>>     const char *str = hwloc_obj_get_info_by_name( obj, "CPUModel" );
>>     if( str ) name = String( str ).Trim();
>> }
>> 
>> On Mac, it works (found string at HWLOC_OBJ_MACHINE), and on Linux it works 
>> (found string at HWLOC_OBJ_SOCKET), but on Windows x64, none of these find 
>> the string. They all return a NULL pointer.
>> 
>> Am I missing something? I tried a few of the other object types, but didn’t 
>> find it with them either (I actually tried looping through all integer 
>> values between 0 and HWLOC_OBJ_TYPE_MAX and it didn’t appear in any of them).
>> 
>> Thank you for any help you can provide.
>> -robin
>> 
>> Robin Scher
>> ro...@uberware.net
>> +1 (213) 448-0443
>> 
>> 
>> 
>> 
>> 
>> ___
>> hwloc-users mailing list
>> hwloc-us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
> 



signature.asc
Description: Message signed with OpenPGP using GPGMail


[hwloc-users] Having trouble getting CPU Model string on Windows 7 x64

2014-01-27 Thread Robin Scher
Hi again.

I’m trying to use hwloc 1.8 on Windows, Linux and Mac to get the CPU model 
string (e.g., “Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz”). Since hwloc on 
different platforms seems to stash this in different objects, I’m using code 
like this:

String name;
// Object types to probe, from the widest scope to the narrowest.
hwloc_obj_type_t objs[] = {
    HWLOC_OBJ_MACHINE,
    HWLOC_OBJ_SOCKET,
    HWLOC_OBJ_CORE,
    HWLOC_OBJ_PU,
};
// Stop at the first object type that carries a CPUModel info string.
for( size_t index = 0; index < (sizeof( objs ) / sizeof( hwloc_obj_type_t ))
     && name.Empty(); ++index )
{
    hwloc_obj_t obj = hwloc_get_obj_by_type( topology, objs[ index ], 0 );
    if( !obj ) continue;
    const char *str = hwloc_obj_get_info_by_name( obj, "CPUModel" );
    if( str ) name = String( str ).Trim();
}

On Mac, it works (found string at HWLOC_OBJ_MACHINE), and on Linux it works 
(found string at HWLOC_OBJ_SOCKET), but on Windows x64, none of these find the 
string. They all return a NULL pointer.

Am I missing something? I tried a few of the other object types, but didn’t 
find it with them either (I actually tried looping through all integer values 
between 0 and HWLOC_OBJ_TYPE_MAX and it didn’t appear in any of them).

Thank you for any help you can provide.
-robin

Robin Scher
ro...@uberware.net
+1 (213) 448-0443







[hwloc-users] How to build hwloc static to link into a shared lib on Linux

2014-01-18 Thread Robin Scher
Hi

I’m trying to build hwloc (1.8) on Linux (CentOS 6 x64) as a static library 
that will be linked into my own shared library that is part of my application. 
I am not using very much of hwloc, and I am trying to avoid having the full 
hwloc shared library distributed with my application just for the tiny bit of 
it that I am using. However, this turns out to be a challenge.

I configured with:

./configure --enable-static --disable-shared

which builds the static library just fine, but when I link it to my shared 
library I get this error:

/usr/bin/ld: /usr/local/lib/libhwloc.a(topology.o): relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC

So, I tried re-configuring:

./configure --enable-static --disable-shared CXXFLAGS=-fPIC

but after rebuilding the library, I still get the same link error.

Is this a possible configuration? I can make my app work with hwloc in its own 
shared library distributed with my app, it just seems so wasteful for what I’m 
doing with it. I’m not the biggest Linux expert, so I’m pretty sure I’m doing 
something wrong, but I have managed to get other libraries I’m using 
(boost.regex and zeromq) to work this way, so it seems like it should be 
possible.
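
The linker message asks for position-independent objects. hwloc is written in C, so `CXXFLAGS` never reaches the compiler that builds `topology.o`; the flag has to go through `CFLAGS`, or through the `--with-pic` option that libtool-based configure scripts provide. A minimal sketch, not verified against hwloc 1.8 specifically; the function name and the `RUN` dry-run variable are hypothetical.

```shell
# Configure a static-only build whose objects are position-independent,
# so the resulting libhwloc.a can be linked into a shared library.
# RUN=echo prints the command instead of running it (hypothetical dry-run hook).
configure_static_pic() {
    ${RUN:-} ./configure --enable-static --disable-shared --with-pic \
        CFLAGS="-fPIC" "$@"
}
```

Running `make clean` before rebuilding avoids reusing objects compiled earlier without `-fPIC`.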

Thank you for any help you can provide.

Robin Scher
ro...@uberware.net
+1 (213) 448-0443







Re: [hwloc-users] How do I access CPUModel info string

2012-10-26 Thread Robin Scher


On 10/25/2012 3:06 PM, Samuel Thibault wrote:
> Robin Scher wrote on Thu 25 Oct 2012 23:57:38 +0200:
>> Do you think those could be added to hwloc?
>
> Yes: we already use cpuid for the x86 backend. That will only work on
> x86 hosts of course.


Windows being x86 only for the time being, I'm OK with that.

I would love to get this by my next release, say in the next 3-6 months. 
Is that something that would be possible? Is there anything I can do to 
help?


Thanks,
-robin

--
*Robin Scher* Uberware
ro...@uberware.net
+1 (213) 448-0443




[hwloc-users] How do I access CPUModel info string

2012-10-25 Thread Robin Scher
Is there a way to get this string (e.g. "Intel(R) Core(TM) i7 CPU M 620 
@ 2.67GHz") consistently on Windows, Linux, OS-X and Solaris?


Thanks,
-robin

--
*Robin Scher* Uberware
ro...@uberware.net
+1 (213) 448-0443




[OMPI users] multi-rail failover with IB

2008-04-02 Thread Robin Humble
Hi,

from reading the FAQ and this list it seems OpenMPI can use multiple
InfiniBand rails by round-robining across the ports out of each node (as
long as they're configured to be on separate subnets (I think)).

can OpenMPI also deal with one of the subnets failing?
ie. will OpenMPI automatically fall back to using the last remaining
working IB port out of a node, or even fallback to GigE if all the IB
fails?

the reason I ask is that we're worried about switches failing in the IB
network and whether OpenMPI can solve all our problems for us if we
configure up 2 or more independent IB networks out of each node.

possibly this sort of failover in the MPI isn't needed with ConnectX as
long as its adaptive routing works as advertised? If so then I guess
it's not that important, and I wouldn't want to make you guys do a lot
of unnecessary work :-)

the FAQ entry here:
  http://www.open-mpi.org/faq/?category=ft#ft-future
says
  - Data Reliability and network fault tolerance. Similar to those
implemented in LA-MPI
but I don't actually know what LA-MPI implemented in this area, so that
doesn't really help me.

cheers,
robin


[OMPI users] quadrics

2007-03-19 Thread Robin Humble

does OpenMPI support Quadrics elan3/4 interconnects?

I saw a few hits on google suggesting that support was partial or maybe
planned, but couldn't find much in the openmpi sources to suggest any
support at all.

cheers,
robin


Re: [OMPI users] IB bandwidth vs. kernels

2007-01-19 Thread Robin Humble
On Thu, Jan 18, 2007 at 03:10:15PM +0200, Gleb Natapov wrote:
>On Thu, Jan 18, 2007 at 07:17:13AM -0500, Robin Humble wrote:
>> On Thu, Jan 18, 2007 at 11:08:04AM +0200, Gleb Natapov wrote:
>> >On Thu, Jan 18, 2007 at 03:52:19AM -0500, Robin Humble wrote:
>> >> On Wed, Jan 17, 2007 at 08:55:31AM -0700, Brian W. Barrett wrote:
>> >> >On Jan 17, 2007, at 2:39 AM, Gleb Natapov wrote:
>> >> >> On Wed, Jan 17, 2007 at 04:12:10AM -0500, Robin Humble wrote:
>> >> >>> basically I'm seeing wildly different bandwidths over InfiniBand 4x DDR
>> >> >>> when I use different kernels.
>> >> >> Try to load ib_mthca with tune_pci=1 option on those kernels that are
>> >> >> slow.
>...
>> >> tune_pci=1 makes a huge difference at the top end, and
>> >Well this is broken BIOS then. Look here for more explanation:
>> >https://staging.openfabrics.org/svn/openib/gen2/branches/1.1/ofed/docs/mthca_release_notes.txt
>> >search for "tune_pci=1".
>> ok. thanks :-/
>...
>BIOS should configure MaxReadReq to maximum value supported by chipset.
>Linux shouldn't touch this value at all.

thanks. I'm told there's a bug already open with our vendor on this
issue and they're talking to Intel.

looks similar to this thread:
  http://www.mail-archive.com/openib-general@openib.org/msg25305.html

>> is there a way to check pci burst settings from userland? or BIOS?
>You can see PCI settings with lspci. The newest lspci decodes this value for
>you; with older ones you'll have to dump the PCI config space to a file
>and decode it yourself.

ah, yes, thanks. lspci -vvv can see MaxReadReq.
for the IB card:

 MaxReadReq(bytes)  kernel                           OS
 4096               2.6.16.21-0.8-smp                sles10
 512                2.6.9-42.0.3.ELsmp               centos4.4
 128                2.6.19.2                         centos4.4
 128                2.6.18-1.2732.4.2.el5.OFED_1_1   centos4.4
 128                2.6.20-rc4                       centos4.4
 4096               anything + tune_pci=1            centos4.4

so errr... I have no idea which is the correct one :-/
bandwidth is only crap with 128.
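
Both ways of reading the value can be sketched in a few lines: pulling the decoded number out of `lspci -vvv` output, and decoding it by hand from the raw PCIe Device Control register, where bits 14:12 hold an exponent n and the request size is 128 << n bytes. The function names are hypothetical.

```shell
# Extract the decoded MaxReadReq size (bytes) from `lspci -vvv` output on stdin.
maxreadreq_from_lspci() {
    sed -n 's/.*MaxReadReq \([0-9][0-9]*\) bytes.*/\1/p'
}

# Decode MaxReadReq from a raw PCIe Device Control register value.
# Bits 14:12 hold an exponent n; the size is 128 << n bytes (n = 0..5).
decode_maxreadreq() {
    echo $(( 128 << ( ( $1 >> 12 ) & 7 ) ))
}
# Usage (not run here): lspci -vvv | maxreadreq_from_lspci
```

With a pciutils version that understands capability names, the raw register can be read with something like `setpci -s 01:00.0 CAP_EXP+8.w` (the device address is a placeholder) and the resulting hex word, prefixed with `0x`, passed to `decode_maxreadreq`.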

thanks for all your help.

cheers,
robin


Re: [OMPI users] IB bandwidth vs. kernels

2007-01-18 Thread Robin Humble
On Thu, Jan 18, 2007 at 11:08:04AM +0200, Gleb Natapov wrote:
>On Thu, Jan 18, 2007 at 03:52:19AM -0500, Robin Humble wrote:
>> On Wed, Jan 17, 2007 at 08:55:31AM -0700, Brian W. Barrett wrote:
>> >On Jan 17, 2007, at 2:39 AM, Gleb Natapov wrote:
>> >> On Wed, Jan 17, 2007 at 04:12:10AM -0500, Robin Humble wrote:
>> >>> basically I'm seeing wildly different bandwidths over InfiniBand 4x DDR
>> >>> when I use different kernels.
>> >> Try to load ib_mthca with tune_pci=1 option on those kernels that are
>> >> slow.
>> >when an application has high buffer reuse (like NetPIPE), which can  
>> >be enabled by adding "-mca mpi_leave_pinned 1" to the mpirun command  
>> >line.
>> thanks! :-)
>> tune_pci=1 makes a huge difference at the top end, and
>Well this is broken BIOS then. Look here for more explanation:
>https://staging.openfabrics.org/svn/openib/gen2/branches/1.1/ofed/docs/mthca_release_notes.txt
>search for "tune_pci=1".

ok. thanks :-/

>> -mca mpi_leave_pinned 1 adds lots of midrange bandwidth.
>> 
>> latencies (~4us) and the low end performance are all unchanged.
>> 
>> see attached for details.
>> most curves are for 2.6.19.2 except the last couple (tagged as old)
>> which are for 2.6.9-42.0.3.ELsmp and for which tune_pci changes nothing.
>> 
>> why isn't tune_pci=1 the default I wonder?
>> files in /sys/module/ib_mthca/ tell me it's off by default in
>> 2.6.9-42.0.3.ELsmp, but the results imply that it's on... maybe PCIe
>> handling is very different in that kernel.
>This is explained in the link above.

hmmm...
but (sorry to harp on about this) /sys/module/ib_mthca/tune_pci is 0
for 2.6.9-42.0.3.ELsmp.
and even if that's lying, then mthca_tune_pci() appears identically
invoked in mthca_main.c from both 2.6.9-42.0.3.ELsmp and 2.6.19.2.
mthca_main.c is the only place in infiniband/hw/mthca that
pci_write_config_word() is called from, so you'd think that's got to be
how PCIe for IB was setup.

basically it's not clear to me how or if tune_pci is being set in
2.6.9-42.0.3.ELsmp, nor why it's any different to 2.6.19.2 :-/

maybe it's some other level in the kernel setting up PCIe differently?
but that would presumably be unrelated to OFED.

is there a way to check pci burst settings from userland? or BIOS?

BTW, the card appears to be Voltaire and system is SGI xe (210 and 240)
if that helps. /sys/class/infiniband/mthca0/board_id is VLT0050010001
not that I'm blaming anyone! :-)

cheers,
robin


Re: [OMPI users] IB bandwidth vs. kernels

2007-01-18 Thread Robin Humble

argh. attached.

cheers,
robin

On Thu, Jan 18, 2007 at 03:52:19AM -0500, Robin Humble wrote:
>On Wed, Jan 17, 2007 at 08:55:31AM -0700, Brian W. Barrett wrote:
>>On Jan 17, 2007, at 2:39 AM, Gleb Natapov wrote:
>>> On Wed, Jan 17, 2007 at 04:12:10AM -0500, Robin Humble wrote:
>>>> basically I'm seeing wildly different bandwidths over InfiniBand 4x DDR
>>>> when I use different kernels.
>>> Try to load ib_mthca with tune_pci=1 option on those kernels that are
>>> slow.
>>when an application has high buffer reuse (like NetPIPE), which can  
>>be enabled by adding "-mca mpi_leave_pinned 1" to the mpirun command  
>>line.
>
>thanks! :-)
>tune_pci=1 makes a huge difference at the top end, and
>-mca mpi_leave_pinned 1 adds lots of midrange bandwidth.
>
>latencies (~4us) and the low end performance are all unchanged.
>
>see attached for details.
>most curves are for 2.6.19.2 except the last couple (tagged as old)
>which are for 2.6.9-42.0.3.ELsmp and for which tune_pci changes nothing.
>
>why isn't tune_pci=1 the default I wonder?
>files in /sys/module/ib_mthca/ tell me it's off by default in
>2.6.9-42.0.3.ELsmp, but the results imply that it's on... maybe PCIe
>handling is very different in that kernel.
>
>is ~10Gbit the best I can expect from 4x DDR IB with MPI?
>some docs @HP suggest up to 16Gbit (data rate) should be possible, and
>I've heard that 13 or 14 has been achieved before. but those might be
>verbs numbers, or maybe horsepower >> 4 cores of 2.66GHz core2 is
>required?
>
>>It would be interesting to know if the bandwidth differences appear  
>>when the leave pinned protocol is used.  My guess is that they will  
>
>yeah, it definitely makes a difference in the 10kB to 10MB range.
>at around 100kB there's 2x the bandwidth when using pinned.
>
>thanks again!
>
>>   Brian Barrett
>>   Open MPI Team, CCS-1
>>   Los Alamos National Laboratory
>
>how's OpenMPI on Cell? :)
>
>cheers,
>robin
>___
>users mailing list
>us...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/users


blah5.ps.gz
Description: GNU Zip compressed data


Re: [OMPI users] IB bandwidth vs. kernels

2007-01-18 Thread Robin Humble
On Wed, Jan 17, 2007 at 08:55:31AM -0700, Brian W. Barrett wrote:
>On Jan 17, 2007, at 2:39 AM, Gleb Natapov wrote:
>> On Wed, Jan 17, 2007 at 04:12:10AM -0500, Robin Humble wrote:
>>> basically I'm seeing wildly different bandwidths over InfiniBand 4x DDR
>>> when I use different kernels.
>> Try to load ib_mthca with tune_pci=1 option on those kernels that are
>> slow.
>when an application has high buffer reuse (like NetPIPE), which can  
>be enabled by adding "-mca mpi_leave_pinned 1" to the mpirun command  
>line.

thanks! :-)
tune_pci=1 makes a huge difference at the top end, and
-mca mpi_leave_pinned 1 adds lots of midrange bandwidth.
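
For reference, the two settings can be applied as follows. This is a sketch only: the function names and the `RUN` dry-run variable are hypothetical, reloading the driver needs root and fails while the module is in use, and the benchmark command passed to `mpirun` depends on how NetPIPE was built.

```shell
# Reload the mthca driver with PCI tuning enabled.
# RUN=echo prints the commands instead of running them (hypothetical dry-run hook).
reload_mthca_tuned() {
    ${RUN:-} modprobe -r ib_mthca || return 1
    ${RUN:-} modprobe ib_mthca tune_pci=1
}

# Run a two-process MPI benchmark with the leave-pinned protocol enabled.
# "$@" is the benchmark command and its arguments.
run_leave_pinned() {
    ${RUN:-} mpirun -np 2 -mca mpi_leave_pinned 1 "$@"
}
```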

latencies (~4us) and the low end performance are all unchanged.

see attached for details.
most curves are for 2.6.19.2 except the last couple (tagged as old)
which are for 2.6.9-42.0.3.ELsmp and for which tune_pci changes nothing.

why isn't tune_pci=1 the default I wonder?
files in /sys/module/ib_mthca/ tell me it's off by default in
2.6.9-42.0.3.ELsmp, but the results imply that it's on... maybe PCIe
handling is very different in that kernel.

is ~10Gbit the best I can expect from 4x DDR IB with MPI?
some docs @HP suggest up to 16Gbit (data rate) should be possible, and
I've heard that 13 or 14 has been achieved before. but those might be
verbs numbers, or maybe horsepower >> 4 cores of 2.66GHz core2 is
required?
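
The 16Gbit figure follows from the link arithmetic: a 4x DDR link has four lanes signalling at 5 Gbit/s each, and 8b/10b encoding leaves 8 data bits for every 10 bits on the wire. A small sketch of that calculation (the function name is hypothetical):

```shell
# Usable data rate in Gbit/s for an 8b/10b-encoded InfiniBand link.
# $1 = number of lanes, $2 = signalling rate per lane in Gbit/s (integer).
ib_data_rate_gbit() {
    echo $(( $1 * $2 * 8 / 10 ))
}
```

`ib_data_rate_gbit 4 5` prints 16, the theoretical ceiling before any protocol or bus overhead.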

>It would be interesting to know if the bandwidth differences appear  
>when the leave pinned protocol is used.  My guess is that they will  

yeah, it definitely makes a difference in the 10kB to 10MB range.
at around 100kB there's 2x the bandwidth when using pinned.

thanks again!

>   Brian Barrett
>   Open MPI Team, CCS-1
>   Los Alamos National Laboratory

how's OpenMPI on Cell? :)

cheers,
robin