Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Jeff Squyres
Brian and I chatted a bit about this off-list, and I think we're in  
agreement now:


- do not change the default value or meaning of
btl_base_warn_component_unused.


- major point of confusion: the openib BTL is actually fairly unique  
in that it can (and does) tell the difference between "there are no  
devices present" and "there are devices, but something went wrong".   
Other BTLs have network interfaces that can't tell the difference and
can *only* call the no_nics function, regardless of whether there are  
no relevant network interfaces or some error occurred during  
initialization.


- so a reasonable solution would be an openib-BTL-specific mechanism
that doesn't call the no_nics function (which displays the
btl_base_warn_component_unused warning) if no verbs-capable devices are
found, since mainline Linuxes are starting to ship libibverbs.
Specific mechanism TBD; likely to be an openib MCA param.
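
The mechanism described in the last bullet can be modeled in a few lines of C. This is only an illustration under assumptions: the enum, the function name and the `warn_no_device` flag (standing in for the still-to-be-defined openib MCA param) are hypothetical and are not taken from the Open MPI source.

```c
#include <stdbool.h>

/* Hypothetical outcome of the openib BTL's device probe. */
enum probe_status {
    PROBE_OK,          /* at least one usable device/port was found      */
    PROBE_NO_DEVICES,  /* the verbs stack reported an empty device list  */
    PROBE_INIT_ERROR   /* devices exist, but initialization failed       */
};

/*
 * Decide whether the "component unused" warning (the no_nics path) should
 * be shown.  warn_no_device stands in for the proposed openib MCA param;
 * its name and default are assumptions.
 */
bool should_warn_unused(enum probe_status status, bool warn_no_device)
{
    switch (status) {
    case PROBE_OK:
        return false;           /* component is in use: nothing to report */
    case PROBE_NO_DEVICES:
        return warn_no_device;  /* silent unless the site opts in         */
    case PROBE_INIT_ERROR:
        return true;            /* a real problem: always report          */
    }
    return true;                /* unknown status: err on the noisy side  */
}
```

With the flag off, a host that merely has libibverbs installed stays quiet, while a host whose devices fail to initialize still gets the warning.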



On May 21, 2008, at 9:56 PM, Jeff Squyres wrote:


On May 21, 2008, at 5:02 PM, Brian W. Barrett wrote:


If this is true (for some reason I thought it wasn't), then I think we'd
actually be ok with your proposal, but you're right, you'd need something
new in the IB btl.  I'm not concerned about the dual rail issue -- if
you're smart enough to configure dual rail IB, you're smart enough to
figure out OMPI mca params.  I'm not sure the same is true for a simple
delivered from the white box vendor IB setup that barely works on a good
day (and unfortunately, there seems to be evidence that these exist).



I'm not sure I understand what you're saying -- you agree, but what
"new" do you think we need in the openib BTL?  The MCA params saying
which ports you expect to be ACTIVE?

--
Jeff Squyres
Cisco Systems

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Jeff Squyres

On May 21, 2008, at 5:02 PM, Brian W. Barrett wrote:

If this is true (for some reason I thought it wasn't), then I think we'd
actually be ok with your proposal, but you're right, you'd need something
new in the IB btl.  I'm not concerned about the dual rail issue -- if
you're smart enough to configure dual rail IB, you're smart enough to
figure out OMPI mca params.  I'm not sure the same is true for a simple
delivered from the white box vendor IB setup that barely works on a good
day (and unfortunately, there seems to be evidence that these exist).



I'm not sure I understand what you're saying -- you agree, but what  
"new" do you think we need in the openib BTL?  The MCA params saying  
which ports you expect to be ACTIVE?


--
Jeff Squyres
Cisco Systems



[OMPI devel] Does Open MPI class exist?

2008-05-21 Thread Jennis Pruett

I would dearly like a week-long class on Open MPI -
what it is, does, how to build, parameter tweaking, etc. 

Does anyone know if such a class exists *anywhere* ? 



Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Brian W. Barrett

On Wed, 21 May 2008, Jeff Squyres wrote:


I'm only concerned about the case where there's an IB card, the user
expects the IB card to be used, and the IB card isn't used.


Can you put in a site wide

btl = ^tcp

to avoid the problem?  If the IB card fails, then you'll get
unreachable MPI errors.


And how many users are going to figure that one out before complaining 
loudly?  That's what LANL did (probably still does) and it worked great 
there, but that doesn't mean that others will figure that out (after all, 
not everyone has an OMPI developer on staff...).



If the changes don't silence a warning in that situation, I'm fine with
whatever you do.  But does ibv_get_device_list return an HCA when the port
is down (because the SM failed and the machine rebooted since that time)?


Yes.


If this is true (for some reason I thought it wasn't), then I think we'd 
actually be ok with your proposal, but you're right, you'd need something 
new in the IB btl.  I'm not concerned about the dual rail issue -- if 
you're smart enough to configure dual rail IB, you're smart enough to 
figure out OMPI mca params.  I'm not sure the same is true for a simple 
delivered from the white box vendor IB setup that barely works on a good 
day (and unfortunately, there seems to be evidence that these exist).



Brian


Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Jeff Squyres

On May 21, 2008, at 4:29 PM, Brian W. Barrett wrote:


Previously, there has not been such a distinction, so I really have no
idea which caused the openib BTL to throw its error (and never really cared,
as it was always somebody else's problem at that point).


In the scenarios that I'm talking about, ibv_devinfo(1) and  
ibv_devices(1) commands should return that there are no devices (you  
have OFED or equivalent installed but have no verbs-capable hardware):


-
[15:21] queeg:~/mpi % ibv_devinfo
No IB devices found
[16:41] queeg:~/mpi % ibv_devices
device node GUID
--  
[16:41] queeg:~/mpi %
-

Since there's no need for an immediate change to the code base --  
perhaps you could watch over the next few weeks and when you see  
problems of the kind that you're worried about, run ibv_devices and  
ibv_devinfo.  If you see OMPI-reported openfabrics problems with no  
warnings from libibverbs itself (like I mentioned in my first mail)  
and ibv_dev* are reporting no devices, then we need to worry about  
cases where the verbs stack itself doesn't even see the devices (which  
is a Really Big Error; the OS/driver stack doesn't even see the device).


If ibv_dev* reports that there *are* devices when you see the errors  
that you're worried about, then OMPI would have gotten past this first  
case and reported something a bit more specific.  That is therefore a
different warning than the one I'm proposing to remove [by default].



I'm only concerned about the case where there's an IB card, the user
expects the IB card to be used, and the IB card isn't used.


Can you put in a site wide

btl = ^tcp

to avoid the problem?  If the IB card fails, then you'll get  
unreachable MPI errors.
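
For readers unfamiliar with the syntax: a leading `^` turns the comma-separated component list into an exclusion list, so `btl = ^tcp` means "any BTL except tcp". The following C function is a simplified model of that rule only. It is not Open MPI's actual parser, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if `name` is selected by `spec`, where `spec` is a
 * comma-separated component list and a leading '^' turns it into an
 * exclusion list (as in "btl = ^tcp").
 */
bool component_selected(const char *spec, const char *name)
{
    bool exclude = (spec[0] == '^');
    const char *p = exclude ? spec + 1 : spec;
    size_t name_len = strlen(name);
    bool listed = false;

    while (*p != '\0') {
        size_t len = strcspn(p, ",");   /* length of the current entry */
        if (len == name_len && strncmp(p, name, len) == 0) {
            listed = true;
            break;
        }
        p += len;
        if (*p == ',') {
            p++;                        /* step over the separator */
        }
    }
    return exclude ? !listed : listed;
}
```

With tcp excluded there is nothing to fall back to, which is why a failed IB card shows up as unreachable-peer errors rather than as a silent slowdown.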



If the changes don't silence a warning in that situation, I'm fine with
whatever you do.  But does ibv_get_device_list return an HCA when the port
is down (because the SM failed and the machine rebooted since that time)?


Yes.


If not, we still have a (fairly common, unfortunately) error case that we
need to report (in my opinion).



Agreed.  This scenario is already covered by the checking that the  
openib BTL performs, and I agree that we should not remove this warning.


That being said, note that the current error-checking code in the  
openib BTL only reports if *no* active ports are found on the host.   
If there are multiple ports in a host where some are active and some  
are [erroneously] not active, OMPI does not report this (because some  
real-world users have dual-port HCAs but are only using 1 port).


Two options jump to mind:

1. Add yet another MCA param to say "all my ports should be active;  
warn/error if you find any non-active ports."
2. Add yet another MCA param where ports that *should* be active are  
itemized.  If OMPI finds that any of them are not active, warn/error.


#1 could really be a special case of #2 (e.g., a keyword "all").  Both  
of these options wouldn't be too difficult to do, but we technically  
are feature frozen...
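
A rough C sketch of how option #2 could subsume option #1 through an "all" keyword. Everything here is an assumption made for illustration: the function name, the value format (comma-separated, 1-based port numbers) and the return convention are hypothetical, and no such MCA param exists in the thread.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Count ports that the (hypothetical) param says should be ACTIVE but are
 * not.  `expected` is either the keyword "all" or a comma-separated list of
 * 1-based port numbers such as "1,2".  active[i] describes port i+1.
 * Returns the number of offending ports, or -1 on a malformed list.
 * A port listed twice is counted twice.
 */
int count_inactive_expected_ports(const char *expected,
                                  const bool *active, int num_ports)
{
    int bad = 0;

    if (strcmp(expected, "all") == 0) {
        for (int i = 0; i < num_ports; i++) {
            if (!active[i]) {
                bad++;
            }
        }
        return bad;
    }

    const char *p = expected;
    while (*p != '\0') {
        char *end;
        long port = strtol(p, &end, 10);
        if (end == p || port < 1 || port > num_ports) {
            return -1;              /* not a number, or no such port */
        }
        if (!active[port - 1]) {
            bad++;
        }
        if (*end == ',') {
            end++;                  /* skip the separator */
        } else if (*end != '\0') {
            return -1;              /* trailing garbage */
        }
        p = end;
    }
    return bad;
}
```

For the dual-port HCA case mentioned above (port 1 in use, port 2 deliberately idle), a value of "1" reports nothing, while "all" reports one inactive port.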


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] openib btl build question

2008-05-21 Thread Jeff Squyres

On May 21, 2008, at 4:37 PM, Brian W. Barrett wrote:


ptmalloc2 is not *required* by the openib btl.  But it is required on
Linux if you want to use the mpi_leave_pinned functionality.  I see
one function call to __pthread_initialize in the ptmalloc2 code -- it
*looks* like it's a function of glibc, but I don't know for sure.


There's actually more than that, it's just buried a bit.  There's a whole
bunch of thread-specific data stuff, which is wrapped so that different
thread packages can be used (although OMPI only supports pthreads).  The
wrappers are in ptmalloc2/sysdeps/pthreads.



Doh!  I didn't "grep -r"; my bad...

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] openib btl build question

2008-05-21 Thread Brian W. Barrett

On Wed, 21 May 2008, Jeff Squyres wrote:


On May 21, 2008, at 4:17 PM, Don Kerr wrote:


Just want to make sure what I think I see is true:

Linux build.  openib btl requires ptmalloc2 and ptmalloc2 requires posix
threads, is that correct?


ptmalloc2 is not *required* by the openib btl.  But it is required on
Linux if you want to use the mpi_leave_pinned functionality.  I see
one function call to __pthread_initialize in the ptmalloc2 code -- it
*looks* like it's a function of glibc, but I don't know for sure.


There's actually more than that, it's just buried a bit.  There's a whole 
bunch of thread-specific data stuff, which is wrapped so that different 
thread packages can be used (although OMPI only supports pthreads).  The 
wrappers are in ptmalloc2/sysdeps/pthreads.


Brian


Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Jeff Squyres

On May 21, 2008, at 12:21 PM, Terry Dontje wrote:


So are you proposing to set btl_base_warn_component_unused to 0 or
something more BTL specific?


Probably something more btl-specific.  The
libibverbs-being-shipped-in-main-line-Linux-distros issue doesn't really
affect other BTLs, so I don't think it's appropriate to make this a
global-to-all-of-OMPI issue.


But the question of when exactly the openib BTL should call the  
function that displays that message is up in the air.  Brian and I  
seem to disagree.  :-)


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Brian W. Barrett

On Wed, 21 May 2008, Jeff Squyres wrote:


On May 21, 2008, at 3:38 PM, Jeff Squyres wrote:


It would be great if libibverbs could return two different error messages
- one for "there's no IB card in this machine" and one for "there's an IB
card here, but we can't initialize it".  I think that would make this
argument go away.  Open MPI could probably mimic that behavior by parsing
the PCI tables, but that sounds ... painful.



Thinking about this a bit more -- I think it depends on what kind of
errors you are worried about seeing.  IBV does separate the discovery
of devices (ibv_get_device_list) from trying to open a device
(ibv_open_device).  So hypothetically, we *can* distinguish between
these kinds of errors already.

Do you see devices that are so broken that they don't show up in the
list returned from ibv_get_device_list?

FWIW: the *only* case I'm talking about changing the default for is
when ibv_get_device_list returns an empty list (meaning that according
to the verbs stack, there are no devices in the host).  I think that
we should *always* warn for any kinds of errors that occur after that
(e.g., we find a device but can't open it, we find one or more devices
but no active ports, etc.).


Previously, there has not been such a distinction, so I really have no
idea which caused the openib BTL to throw its error (and never really cared,
as it was always somebody else's problem at that point).


I'm only concerned about the case where there's an IB card, the user 
expects the IB card to be used, and the IB card isn't used.  If the 
changes don't silence a warning in that situation, I'm fine with whatever 
you do.  But does ibv_get_device_list return an HCA when the port is down 
(because the SM failed and the machine rebooted since that time)?  If not, 
we still have a (fairly common, unfortunately) error case that we need to
report (in my opinion).



Brian


Re: [OMPI devel] openib btl build question

2008-05-21 Thread Jeff Squyres

On May 21, 2008, at 4:17 PM, Don Kerr wrote:


Just want to make sure what I think I see is true:

Linux build.  openib btl requires ptmalloc2 and ptmalloc2 requires posix
threads, is that correct?


ptmalloc2 is not *required* by the openib btl.  But it is required on  
Linux if you want to use the mpi_leave_pinned functionality.  I see  
one function call to __pthread_initialize in the ptmalloc2 code -- it  
*looks* like it's a function of glibc, but I don't know for sure.


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Jeff Squyres

On May 21, 2008, at 3:38 PM, Jeff Squyres wrote:


It would be great if libibverbs could return two different error messages
- one for "there's no IB card in this machine" and one for "there's an IB
card here, but we can't initialize it".  I think that would make this
argument go away.  Open MPI could probably mimic that behavior by parsing
the PCI tables, but that sounds ... painful.



Thinking about this a bit more -- I think it depends on what kind of  
errors you are worried about seeing.  IBV does separate the discovery  
of devices (ibv_get_device_list) from trying to open a device  
(ibv_open_device).  So hypothetically, we *can* distinguish between  
these kinds of errors already.


Do you see devices that are so broken that they don't show up in the  
list returned from ibv_get_device_list?


FWIW: the *only* case I'm talking about changing the default for is  
when ibv_get_device_list returns an empty list (meaning that according  
to the verbs stack, there are no devices in the host).  I think that  
we should *always* warn for any kinds of errors that occur after that  
(e.g., we find a device but can't open it, we find one or more devices  
but no active ports, etc.).
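
The staged failure cases listed above can be written down as a small classifier. This is a sketch under assumptions, not the openib BTL's code: the struct stands in for the results of calling ibv_get_device_list and ibv_open_device (so the example compiles without libibverbs), and all type and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical summary of what probing one listed device produced. */
struct device_probe {
    int opened;        /* nonzero if the open step succeeded         */
    int active_ports;  /* number of ports found in the ACTIVE state  */
};

enum probe_result {
    RESULT_USABLE,          /* some device opened and has an active port */
    RESULT_NO_DEVICES,      /* discovery returned an empty list          */
    RESULT_OPEN_FAILED,     /* devices listed, none could be opened      */
    RESULT_NO_ACTIVE_PORTS  /* devices opened, but no port is active     */
};

/* Report the most specific outcome for a list of n probed devices. */
enum probe_result classify_probe(const struct device_probe *devs, size_t n)
{
    size_t opened = 0;

    if (n == 0) {
        return RESULT_NO_DEVICES;   /* discovery stage: nothing there */
    }
    for (size_t i = 0; i < n; i++) {
        if (!devs[i].opened) {
            continue;               /* open stage failed for this device */
        }
        opened++;
        if (devs[i].active_ports > 0) {
            return RESULT_USABLE;
        }
    }
    return opened == 0 ? RESULT_OPEN_FAILED : RESULT_NO_ACTIVE_PORTS;
}
```

Only RESULT_NO_DEVICES corresponds to the case whose default is under discussion; the other two failure results are the ones that should always produce a warning.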


--
Jeff Squyres
Cisco Systems



[OMPI devel] openib btl build question

2008-05-21 Thread Don Kerr


Just want to make sure what I think I see is true:

Linux build.  openib btl requires ptmalloc2 and ptmalloc2 requires posix 
threads, is that correct?


Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Brian W. Barrett
Then we disagree on a core point.  I believe that users should never have 
something silently unexpected happen (like falling back to TCP from a high 
speed interconnect because of a NIC reset / software issue).  You clearly 
don't feel this way.  I don't really work on the project, but do have lots 
of experience being yelled at by users when something unexpected happens.


I guarantee you we'll see a report of poor IB / application performance 
because of the silent fallback to TCP.  There's a reason that error 
message was put in.  I don't get a vote anymore, so do whatever you think 
is best.


Brian


On Wed, 21 May 2008, Jeff Squyres wrote:


One thing I should clarify -- the ibverbs error message from my
previous mail is a red herring.  libibverbs prints that message on
systems where the kernel portions of the OFED stack are not installed
(such as the quick-n-dirty test that I did before -- all I did was
install libibverbs without the corresponding kernel stuff).  I
installed the whole OFED stack on a machine with no verbs-capable
hardware and verified that the libibverbs message does *not* appear
when the kernel bits are properly installed and running.

So we're only talking about the Open MPI warning message here.  More
below.



On May 21, 2008, at 12:17 PM, Brian W. Barrett wrote:


2. An out-of-the-box "mpirun a.out" will print warning messages in
perfectly valid/good configurations (no verbs-capable hardware, but
just happen to have libibverbs installed).  This is a Big Deal.


Which is easily solved with a better error message, as Pasha
suggested.


I guess this is where we disagree: I don't believe that the issue is
solved by making a "better" message.  Specifically: this is the first
case where we're saying "if you run with a valid configuration, you're
going to get a warning message and you have to do something extra to
turn it off."

That just seems darn weird to me, especially when other MPI's don't do
the same thing.  Come to think of it, I can't think of many other
software packages that do that.


In short: I think it's no longer safe to assume that machines with
libibverbs installed must also have verbs-capable hardware.


But here's the real problem -- with our current selection logic, a user
with libibverbs but no IB cards gets an error message saying "hey, we need
you to set this flag to make this error go away" (or would, per Pasha's
suggestion).  A user with a busted IB stack on a node (which we still saw
pretty often at LANL) starts using TCP and their application runs like a
dog.

I guess it's a matter of how often you see errors in the IB stack that
cause nic initialization to fail.  The machines I tend to use still
exhibit this problem pretty often, but it's possible I just work on bad
hardware more often than is usual in the wild.


I guess this is the central issue: what *is* the common case?  Which
set of users should be forced to do something different?

I'm claiming that now that the Linux distros are shipping libibverbs,
the number of users who have the openib BTL installed but do not have
verbs-capable hardware will be *much* larger than those with verbs-
capable hardware.  Hence, I think the pain point should be for the
smaller group (those with verbs-capable hardware): set an MCA param if
you want to see the warning message.

(we can debate the default value for the BTL-wide base param later --
let's first just debate the *concept* as specific to the openib BTL)


It would be great if libibverbs could return two different error messages
- one for "there's no IB card in this machine" and one for "there's an IB
card here, but we can't initialize it".  I think that would make this
argument go away.  Open MPI could probably mimic that behavior by parsing
the PCI tables, but that sounds ... painful.


Yes, this capability in libibverbs would be good.  Parsing the PCI
tables doesn't sound like our role.

I'll ask the libibverbs authors about it...


I guess the root of my concern is that unexpected behavior with no
explanation is (in my mind) the most dangerous case and the one we should
address by default.  And turning this error message off is going to cause
unexpected behavior without explanation.



But more information is available, and subject to normal
troubleshooting techniques.  And if you're in an environment where you
*do* want to use verbs-capable hardware, then setting the MCA param
seems perfectly acceptable to me.  IIRC, LANL sets a whole pile of MCA
params in the top-level openmpi-mca-params.conf file that are specific
to their environment (right?).  If that's true, what's one more param?

Heck, the OMPI installed by OFED can set an MCA param in
openmpi-mca-params.conf by default (which is what most
verbs-capable-hardware-users utilize).  That would solve the issue for 98%
of the IB/iWARP users out there.  Those who compile from source would need
to do it manually.

I agree that this is less than perfect.  My main point is that I
really don't like the idea that "mpirun a.out" will result in warning
messages for perfectly valid configurations.

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Jeff Squyres
One thing I should clarify -- the ibverbs error message from my  
previous mail is a red herring.  libibverbs prints that message on  
systems where the kernel portions of the OFED stack are not installed  
(such as the quick-n-dirty test that I did before -- all I did was  
install libibverbs without the corresponding kernel stuff).  I  
installed the whole OFED stack on a machine with no verbs-capable  
hardware and verified that the libibverbs message does *not* appear  
when the kernel bits are properly installed and running.


So we're only talking about the Open MPI warning message here.  More  
below.




On May 21, 2008, at 12:17 PM, Brian W. Barrett wrote:


2. An out-of-the-box "mpirun a.out" will print warning messages in
perfectly valid/good configurations (no verbs-capable hardware, but
just happen to have libibverbs installed).  This is a Big Deal.


Which is easily solved with a better error message, as Pasha  
suggested.


I guess this is where we disagree: I don't believe that the issue is  
solved by making a "better" message.  Specifically: this is the first  
case where we're saying "if you run with a valid configuration, you're  
going to get a warning message and you have to do something extra to  
turn it off."


That just seems darn weird to me, especially when other MPI's don't do  
the same thing.  Come to think of it, I can't think of many other  
software packages that do that.



In short: I think it's no longer safe to assume that machines with
libibverbs installed must also have verbs-capable hardware.


But here's the real problem -- with our current selection logic, a user
with libibverbs but no IB cards gets an error message saying "hey, we need
you to set this flag to make this error go away" (or would, per Pasha's
suggestion).  A user with a busted IB stack on a node (which we still saw
pretty often at LANL) starts using TCP and their application runs like a
dog.

I guess it's a matter of how often you see errors in the IB stack that
cause nic initialization to fail.  The machines I tend to use still
exhibit this problem pretty often, but it's possible I just work on bad
hardware more often than is usual in the wild.


I guess this is the central issue: what *is* the common case?  Which  
set of users should be forced to do something different?


I'm claiming that now that the Linux distros are shipping libibverbs,  
the number of users who have the openib BTL installed but do not have  
verbs-capable hardware will be *much* larger than those with verbs- 
capable hardware.  Hence, I think the pain point should be for the  
smaller group (those with verbs-capable hardware): set an MCA param if  
you want to see the warning message.


(we can debate the default value for the BTL-wide base param later --  
let's first just debate the *concept* as specific to the openib BTL)


It would be great if libibverbs could return two different error messages
- one for "there's no IB card in this machine" and one for "there's an IB
card here, but we can't initialize it".  I think that would make this
argument go away.  Open MPI could probably mimic that behavior by parsing
the PCI tables, but that sounds ... painful.


Yes, this capability in libibverbs would be good.  Parsing the PCI
tables doesn't sound like our role.


I'll ask the libibverbs authors about it...


I guess the root of my concern is that unexpected behavior with no
explanation is (in my mind) the most dangerous case and the one we should
address by default.  And turning this error message off is going to cause
unexpected behavior without explanation.



But more information is available, and subject to normal  
troubleshooting techniques.  And if you're in an environment where you  
*do* want to use verbs-capable hardware, then setting the MCA param  
seems perfectly acceptable to me.  IIRC, LANL sets a whole pile of MCA  
params in the top-level openmpi-mca-params.conf file that are specific  
to their environment (right?).  If that's true, what's one more param?


Heck, the OMPI installed by OFED can set an MCA param in
openmpi-mca-params.conf by default (which is what most
verbs-capable-hardware-users utilize).  That would solve the issue for 98%
of the IB/iWARP users out there.  Those who compile from source would need
to do it manually.


I agree that this is less than perfect.  My main point is that I
really don't like the idea that "mpirun a.out" will result in warning
messages for perfectly valid configurations.


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Dirk Eddelbuettel
On Wed, May 21, 2008 at 07:41:58PM +0300, Pavel Shamis (Pasha) wrote:
> As far as I know, only the OpenIB kernel drivers are installed by default
> with the distribution.
> But the user-level pieces (libibverbs and other OpenIB stuff) are not
> installed by default. The user needs to go to the package manager and
> explicitly select libibverbs.  So if the user decided to install
> libibverbs, he had reasons for it, and I think it will be OK to show this
> warning.

Debian builds with libibverbs because it is available -- we'd be doing
a disservice to those who have the hardware if we didn't.

Because we build with it, Open MPI packages depend on it. So if you
install Open MPI on Debian (and hence Ubuntu and ), you get libibverbs.

So yes, suppressing the warning would be great as a default. 

And generally speaking, we prefer to not diverge from upstream so if
you made it a default, we wouldn't have to differ in what we
ship. That's how the thread started.

Thanks to all for considering this.

Dirk (who as Debian co-maintainer is rather happy with all your good work :)

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Pavel Shamis (Pasha)
As far as I know, only the OpenIB kernel drivers are installed by default
with the distribution.
But the user-level pieces (libibverbs and other OpenIB stuff) are not
installed by default. The user needs to go to the package manager and
explicitly select libibverbs.  So if the user decided to install
libibverbs, he had reasons for it, and I think it will be OK to show this
warning.


Pasha.

Jeff Squyres wrote:

On May 21, 2008, at 11:14 AM, Brian W. Barrett wrote:

  
I think having a parameter to turn off the warning is a great idea.  So
great in fact, that it already exists in the trunk and v1.2 :)!  Setting
the default value for the btl_base_warn_component_unused flag from 1 to 0
will have the desired effect.



Ah, ok.  I either didn't know about this flag or forgot about it.  :-)

I just tested this myself and see that there are actually *two* error  
messages (on a machine where I installed libibverbs, but with no  
OpenFabrics hardware, with OMPI 1.2.6):


% mpirun -np 1 hello
libibverbs: Fatal: couldn't read uverbs ABI version.
--
[0,1,0]: OpenIB on host eddie.osl.iu.edu was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--

So the MCA param takes care of the OMPI message; I'll contact the  
libibverbs authors about their message.


  
I'm not sure I agree with setting the default to 0, however.  The warning
has proven extremely useful for diagnosing that IB (or less often GM or
MX) isn't properly configured on a compute node due to some random error.

It's trivially easy for any packaging group to have the line

  btl_base_warn_component_unused = 0

added to $prefix/etc/openmpi-mca-params.conf during the install phase of
the package build (indeed, our simple build scripts at LANL used to do
this on a regular basis due to our need to tweak the OOB to keep IPoIB
happier at scale).

I think keeping the Debian guys happy is a good thing.  Giving them an
easy way to turn off silly warnings is a good thing.  Removing a known
useful warning to help them doesn't seem like a good thing.



I guess that this is what I am torn about.  Yes, it's a useful message
-- in some cases.  But now that libibverbs is shipping in Debian and
other Linuxes, the number of machines out there with verbs-capable
hardware is far, far smaller than the number of machines without
verbs-capable hardware.  Specifically:


1. The number of cases where seeing the message by default is *not*  
useful is now potentially [much] larger than the number of cases where  
the default message is useful.


2. An out-of-the-box "mpirun a.out" will print warning messages in  
perfectly valid/good configurations (no verbs-capable hardware, but  
just happen to have libibverbs installed).  This is a Big Deal.


3. Problems with HCA hardware and/or verbs stack are uncommon  
(nowadays).  I'd be ok asking someone to enable a debug flag to get  
more information on configuration problems or hardware faults.


Shouldn't we be optimizing for the common case?

In short: I think it's no longer safe to assume that machines with  
libibverbs installed must also have verbs-capable hardware.


  




Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Terry Dontje
So are you proposing to set btl_base_warn_component_unused to 0 or 
something more BTL specific?


--td
Jeff Squyres wrote:
What: Change default in openib BTL to not complain if no OpenFabrics  
devices are found


Why: Many linuxes are shipping libibverbs these days, but most users  
still don't have OpenFabrics hardware


Where: btl_openib_component.c

When: For v1.3

Timeout: Teleconf, 27 May 2008

Short version
=============

Many major linuxes are shipping libibverbs by default these days.
OMPI will therefore build the openib BTL by default, but then
complain at run time when there's no OpenFabrics hardware.


We should change the default in v1.3 to not complain if no
OpenFabrics devices are found (perhaps have an MCA param to enable the
warning if desired).


Longer version
==============

I just got a request from the Debian Open MPI package maintainers to  
include the following in the default openmpi-mca-params.conf for the  
OMPI v1.2 package:


# Disable the use of InfiniBand
#   btl = ^openib

Having this in the openmpi-mca-params.conf gives Debian an easy  
documentation path for users to shut up these warnings when they build  
on machines with libibverbs present but no OpenFabrics hardware.


I think that this is fine for the v1.2 series (and will file a CMR for  
it).  But for v1.3, I think we should change the default.


The vast majority of users will not have OpenFabrics devices, and we  
should therefore not complain if we can't find any at run-time.  We  
can/should still complain if we find OpenFabrics devices but no active  
ports (i.e., don't change this behavior).


But for optimizing the common case: I think we should (by default) not  
print a warning if no OpenFabrics devices are found.  We can also  
[easily] have an MCA parameter that *will* display a warning if no  
OpenFabrics devices are found.


  




Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Brian W. Barrett

On Wed, 21 May 2008, Jeff Squyres wrote:


2. An out-of-the-box "mpirun a.out" will print warning messages in
perfectly valid/good configurations (no verbs-capable hardware, but
just happen to have libibverbs installed).  This is a Big Deal.


Which is easily solved with a better error message, as Pasha suggested.


3. Problems with HCA hardware and/or verbs stack are uncommon
(nowadays).  I'd be ok asking someone to enable a debug flag to get
more information on configuration problems or hardware faults.

Shouldn't we be optimizing for the common case?

In short: I think it's no longer safe to assume that machines with
libibverbs installed must also have verbs-capable hardware.


But here's the real problem -- with our current selection logic, a user 
with libibverbs but no IB cards gets an error message saying "hey, we need 
you to set this flag to make this error go away" (or would, per Pasha's 
suggestion).  A user with a busted IB stack on a node (which we still saw 
pretty often at LANL) starts using TCP and their application runs like a 
dog.


I guess it's a matter of how often you see errors in the IB stack that 
cause nic initialization to fail.  The machines I tend to use still 
exhibit this problem pretty often, but it's possible I just work on bad 
hardware more often than is usual in the wild.


It would be great if libibverbs could return two different error messages 
- one for "there's no IB card in this machine" and one for "there's an IB 
card here, but we can't initialize it".  I think that would make this 
argument go away.  Open MPI could probably mimic that behavior by parsing 
the PCI tables, but that sounds ... painful.


I guess the root of my concern is that unexpected behavior with no 
explanation is (in my mind) the most dangerous case and the one we should 
address by default.  And turning this error message off is going to cause 
unexpected behavior without explanation.


Just my $0.02.


Brian


Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Jeff Squyres

On May 21, 2008, at 11:14 AM, Brian W. Barrett wrote:

I think having a parameter to turn off the warning is a great idea.  So
great in fact, that it already exists in the trunk and v1.2 :)!  Setting
the default value for the btl_base_warn_component_unused flag from 0 to 1
will have the desired effect.


Ah, ok.  I either didn't know about this flag or forgot about it.  :-)

I just tested this myself and see that there are actually *two* error  
messages (on a machine where I installed libibverbs, but with no  
OpenFabrics hardware, with OMPI 1.2.6):


% mpirun -np 1 hello
libibverbs: Fatal: couldn't read uverbs ABI version.
--
[0,1,0]: OpenIB on host eddie.osl.iu.edu was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--

So the MCA param takes care of the OMPI message; I'll contact the  
libibverbs authors about their message.


I'm not sure I agree with setting the default to 0, however.  The warning
has proven extremely useful for diagnosing that IB (or less often GM or
MX) isn't properly configured on a compute node due to some random error.

It's trivially easy for any packaging group to have the line

  btl_base_warn_component_unused = 0

added to $prefix/etc/openmpi-mca-params.conf during the install phase of
the package build (indeed, our simple build scripts at LANL used to do
this on a regular basis due to our need to tweak the OOB to keep IPoIB
happier at scale).

I think keeping the Debian guys happy is a good thing.  Giving them an
easy way to turn off silly warnings is a good thing.  Removing a known
useful warning to help them doesn't seem like a good thing.


I guess that this is what I am torn about.  Yes, it's a useful message  
-- in some cases.  But now that libibverbs is shipping in Debian and  
other Linuxes, the number of machines out there with verbs-capable  
hardware is far, far smaller than the number of machines without  
verbs-capable hardware.  Specifically:


1. The number of cases where seeing the message by default is *not*  
useful is now potentially [much] larger than the number of cases where  
the default message is useful.


2. An out-of-the-box "mpirun a.out" will print warning messages in  
perfectly valid/good configurations (no verbs-capable hardware, but  
just happen to have libibverbs installed).  This is a Big Deal.


3. Problems with HCA hardware and/or verbs stack are uncommon  
(nowadays).  I'd be ok asking someone to enable a debug flag to get  
more information on configuration problems or hardware faults.


Shouldn't we be optimizing for the common case?

In short: I think it's no longer safe to assume that machines with  
libibverbs installed must also have verbs-capable hardware.


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Brian W. Barrett
And there's a typo in my first paragraph.  The flag currently defaults to 
1 (print the warning).  It should be switched to 0 to turn off the 
warning.  Sorry for any confusion I might have caused -- I blame the lack 
of caffeine in the morning.


Brian

On Wed, 21 May 2008, Pavel Shamis (Pasha) wrote:


I agree with Brian. We could add a detailed description of how to
disable the warning to the warning message itself.

Pasha

Brian W. Barrett wrote:

I think having a parameter to turn off the warning is a great idea.  So
great in fact, that it already exists in the trunk and v1.2 :)!  Setting
the default value for the btl_base_warn_component_unused flag from 0 to 1
will have the desired effect.

I'm not sure I agree with setting the default to 0, however.  The warning
has proven extremely useful for diagnosing that IB (or less often GM or
MX) isn't properly configured on a compute node due to some random error.
It's trivially easy for any packaging group to have the line

   btl_base_warn_component_unused = 0

added to $prefix/etc/openmpi-mca-params.conf during the install phase of
the package build (indeed, our simple build scripts at LANL used to do
this on a regular basis due to our need to tweak the OOB to keep IPoIB
happier at scale).

I think keeping the Debian guys happy is a good thing.  Giving them an
easy way to turn off silly warnings is a good thing.  Removing a known
useful warning to help them doesn't seem like a good thing.


Brian






Re: [OMPI devel] Trunk check-in policy until the branch for 1.3

2008-05-21 Thread Richard Graham
Thanks,
Rich


On 5/20/08 10:37 PM, "Brad Benton"  wrote:

> 
> 
> 2008/5/20 Richard Graham :
>> Brad,
>>   Do you want these for bug fixes too ?
> 
> I think that it's okay to check in small bug fixes without a ticket.  I know
> this is a somewhat nebulous guideline, but I'm thinking bug fixes of a few
> lines as being "small".  So, unless George has an objection, I'm fine with
> that.
> 
> --Brad
> 
> 
>  
>> 
>> 
>> Rich
>> 
>> 
>> 
>> On 5/20/08 5:53 PM, "Brad Benton"  wrote:
>> 
>>> All:
>>> 
>>> In order to better track changes on the trunk until we branch for 1.3, we
>>> (the release managers) would like to ask that all trunk checkins have
>>> corresponding tickets associated with them.  This will help us to keep
>>> better track of the state of the trunk prior to branching.  Note, this is
>>> just until we branch, which, hopefully, will be in a few days.
>>> 
>>> The plan is to branch the trunk for 1.3 this Friday evening (May 23).
>>> However, depending on the state of the trunk and the final items to get in
>>> before the branch, we might decide to delay the branch until the following
>>> Tuesday (May 27).  George, Jeff & I will discuss this on Friday afternoon
>>> and will send out the final plan for branching (Friday or Tuesday) at that
>>> time.
>>> 
>>> Thanks,
>>> --Brad
>>> 
>>> 




Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Pavel Shamis (Pasha)
I agree with Brian. We could add a detailed description of how to 
disable the warning to the warning message itself.


Pasha

Brian W. Barrett wrote:
I think having a parameter to turn off the warning is a great idea.  So 
great in fact, that it already exists in the trunk and v1.2 :)!  Setting 
the default value for the btl_base_warn_component_unused flag from 0 to 1 
will have the desired effect.


I'm not sure I agree with setting the default to 0, however.  The warning 
has proven extremely useful for diagnosing that IB (or less often GM or 
MX) isn't properly configured on a compute node due to some random error. 
It's trivially easy for any packaging group to have the line


   btl_base_warn_component_unused = 0

added to $prefix/etc/openmpi-mca-params.conf during the install phase of 
the package build (indeed, our simple build scripts at LANL used to do 
this on a regular basis due to our need to tweak the OOB to keep IPoIB 
happier at scale).


I think keeping the Debian guys happy is a good thing.  Giving them an 
easy way to turn off silly warnings is a good thing.  Removing a known 
useful warning to help them doesn't seem like a good thing.



Brian



  




[OMPI devel] Intel ifort / Libtool 2.x problem

2008-05-21 Thread Jeff Squyres
Heads up: Tim switched over the trunk tarballs yesterday to use  
Libtool 2.2.4, Autoconf 2.62, and Automake 1.10.1.


MTT shows that there is a problem with ifort and LT 2.2.4 and Fortran  
shared libraries on Linux -- LT seems to be dropping the necessary  
compiler flags to build Fortran shared libraries.  I did some more  
testing and have filed a bug report with our friendly neighborhood  
Libtool developers.  I'll file an OMPI trac ticket when my message  
arrives in the bug-libtool list web archives.


For the time being, if you want to build with the Intel compilers, you  
must pass -fPIC in FCFLAGS:


./configure CC=icc CXX=icpc FC=ifort F77=ifort FCFLAGS=-fPIC ...

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Brian W. Barrett
I think having a parameter to turn off the warning is a great idea.  So 
great in fact, that it already exists in the trunk and v1.2 :)!  Setting 
the default value for the btl_base_warn_component_unused flag from 0 to 1 
will have the desired effect.


I'm not sure I agree with setting the default to 0, however.  The warning 
has proven extremely useful for diagnosing that IB (or less often GM or 
MX) isn't properly configured on a compute node due to some random error. 
It's trivially easy for any packaging group to have the line


  btl_base_warn_component_unused = 0

added to $prefix/etc/openmpi-mca-params.conf during the install phase of 
the package build (indeed, our simple build scripts at LANL used to do 
this on a regular basis due to our need to tweak the OOB to keep IPoIB 
happier at scale).


I think keeping the Debian guys happy is a good thing.  Giving them an 
easy way to turn off silly warnings is a good thing.  Removing a known 
useful warning to help them doesn't seem like a good thing.



Brian






[OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Jeff Squyres
What: Change default in openib BTL to not complain if no OpenFabrics  
devices are found


Why: Many linuxes are shipping libibverbs these days, but most users  
still don't have OpenFabrics hardware


Where: btl_openib_component.c

When: For v1.3

Timeout: Teleconf, 27 May 2008

Short version
=============

Many major linuxes are shipping libibverbs by default these days.   
OMPI will therefore build the openib BTL by default, but then  
complains at run time when there's no OpenFabrics hardware.


We should change the default in v1.3 to not complain if no  
OpenFabrics devices are found (perhaps have an MCA param to enable the  
warning if desired).


Longer version
==============

I just got a request from the Debian Open MPI package maintainers to  
include the following in the default openmpi-mca-params.conf for the  
OMPI v1.2 package:


# Disable the use of InfiniBand
#   btl = ^openib

Having this in the openmpi-mca-params.conf gives Debian an easy  
documentation path for users to shut up these warnings when they build  
on machines with libibverbs present but no OpenFabrics hardware.


I think that this is fine for the v1.2 series (and will file a CMR for  
it).  But for v1.3, I think we should change the default.


The vast majority of users will not have OpenFabrics devices, and we  
should therefore not complain if we can't find any at run-time.  We  
can/should still complain if we find OpenFabrics devices but no active  
ports (i.e., don't change this behavior).


But for optimizing the common case: I think we should (by default) not  
print a warning if no OpenFabrics devices are found.  We can also  
[easily] have an MCA parameter that *will* display a warning if no  
OpenFabrics devices are found.
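
To make the proposed default concrete, here is a minimal C sketch of the warning policy described above. It is only an illustration and is not taken from btl_openib_component.c: the names probe_result_t, classify_probe(), should_warn() and the warn_no_devices flag are all hypothetical, and the flag merely stands in for whatever MCA parameter ends up being chosen.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of looking for verbs-capable hardware. */
typedef enum {
    PROBE_NO_DEVICES,       /* libibverbs is installed, but no devices exist */
    PROBE_NO_ACTIVE_PORTS,  /* devices exist, but no port is ACTIVE */
    PROBE_OK                /* at least one usable port */
} probe_result_t;

/* Classify a probe from the number of devices and ACTIVE ports found. */
probe_result_t classify_probe(int num_devices, int num_active_ports)
{
    assert(num_devices >= 0 && num_active_ports >= 0);
    if (num_devices == 0) {
        return PROBE_NO_DEVICES;
    }
    if (num_active_ports == 0) {
        return PROBE_NO_ACTIVE_PORTS;
    }
    return PROBE_OK;
}

/*
 * Decide whether to print a warning.
 * warn_no_devices is an assumed opt-in setting (default false) that
 * re-enables the "no devices found" warning.
 */
bool should_warn(probe_result_t result, bool warn_no_devices)
{
    switch (result) {
    case PROBE_NO_DEVICES:
        return warn_no_devices;   /* silent by default */
    case PROBE_NO_ACTIVE_PORTS:
        return true;              /* existing behavior is unchanged */
    case PROBE_OK:
        return false;
    }
    return false;
}
```

With the flag left at its assumed default, a machine with libibverbs but no hardware stays quiet, while a machine with devices but no ACTIVE ports still gets the warning.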


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Threaded progress for CPCs

2008-05-21 Thread Jeff Squyres

One more point that Pasha and I hashed out yesterday in IM...

To avoid the problem of posting a short handshake buffer to  
already-existing SRQs, we will only do the extra handshake if there are  
PPRQ's in receive_queues.  The handshake will go across the smallest  
PPRQ, and represent all QPs in receive_queues (even the SRQs).


If there are no PPRQ's in the receive_queues value, we'll just skip  
the handshake and rely on IB's SRQ RNR retransmitting to fix any race  
conditions.
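
The selection rule above could look roughly like the following C sketch. It assumes the receive_queues value has already been parsed into an array; queue_spec_t, queue_type_t and smallest_pprq() are hypothetical names used for illustration and are not the openib BTL's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parsed form of one receive_queues entry. */
typedef enum { QUEUE_PER_PEER, QUEUE_SHARED } queue_type_t;

typedef struct {
    queue_type_t type;    /* PPRQ or SRQ */
    size_t buffer_size;   /* receive buffer size in bytes */
} queue_spec_t;

/*
 * Return the index of the per-peer queue with the smallest buffer size,
 * or -1 if there is no per-peer queue at all.  A result of -1 means
 * "skip the extra handshake and rely on SRQ RNR retransmission".
 */
int smallest_pprq(const queue_spec_t *queues, int num_queues)
{
    assert(num_queues >= 0);
    int best = -1;
    for (int i = 0; i < num_queues; ++i) {
        if (queues[i].type != QUEUE_PER_PEER) {
            continue;   /* never post handshake buffers on an SRQ */
        }
        if (best < 0 || queues[i].buffer_size < queues[best].buffer_size) {
            best = i;
        }
    }
    return best;
}
```

Ties go to the first matching queue, which is an arbitrary choice in this sketch.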


One point that needs clarification: whether IBCM and RDMACM *require*  
posting receive buffers on the new QP's.  If so, this scheme will run  
into trouble because we do not want to post any buffers on SRQs; that  
gets racy and difficult to synchronize right (especially if multiple  
remote peers are simultaneously trying to connect to a single SRQ).   
I'll check this out today or tomorrow.


We'll have to re-visit this when iWARP NICs start supporting SRQ, but  
if the above assumption is true (no need to post any receive buffers  
for IBCM and RDMACM), it will be good enough for v1.3.



On May 20, 2008, at 12:37 PM, Jeff Squyres wrote:


Ok, I think we're mostly converged on a solution.  This might not get
implemented immediately (got some other pending v1.3 stuff to bug fix,
etc.), but it'll happen for v1.3.

- endpoint creation will mpool alloc/register a small buffer for
  handshake
- cpc does not need to call _post_recvs(); instead, it can just post
  the single small buffer on each BSRQ QP (from the small buffer on the
  endpoint)
- cpc will call _connected() (in the main thread, not the CPC progress
  thread) when all BSRQ QPs are connected
  - if _post_recvs() was previously called, do the normal "finish
    setting up" stuff and declare the endpoint CONNECTED
  - if _post_recvs() was not previously called, then:
    - call _post_recvs()
    - send a short CTS message on the 1st BSRQ QP
    - wait for CTS from peer
    - when the CTS from the peer has arrived *and* we have sent our
      CTS, declare endpoint CONNECTED
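
The steps above can be sketched as a small state machine in C. This is a simplified illustration under stated assumptions, not the openib BTL code: endpoint_t, ep_state_t, endpoint_connected() and endpoint_cts_arrived() are hypothetical names, and posting buffers or sending the CTS is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { EP_CONNECTING, EP_WAITING_CTS, EP_CONNECTED } ep_state_t;

/* Hypothetical, stripped-down endpoint. */
typedef struct {
    ep_state_t state;
    bool recvs_posted;   /* has _post_recvs() already run (OOB/XOOB case)? */
    bool cts_sent;       /* have we sent our CTS? */
    bool cts_received;   /* has the peer's CTS arrived? */
} endpoint_t;

/* Declare CONNECTED only once both CTS conditions hold. */
static void finish_if_ready(endpoint_t *ep)
{
    if (ep->cts_sent && ep->cts_received) {
        ep->state = EP_CONNECTED;
    }
}

/* Called in the main thread once all BSRQ QPs are connected. */
void endpoint_connected(endpoint_t *ep)
{
    assert(ep->state == EP_CONNECTING);
    if (ep->recvs_posted) {
        ep->state = EP_CONNECTED;   /* no extra handshake needed */
        return;
    }
    ep->recvs_posted = true;        /* stands in for _post_recvs() */
    ep->cts_sent = true;            /* stands in for sending the CTS */
    ep->state = EP_WAITING_CTS;
    finish_if_ready(ep);            /* the peer's CTS may already be here */
}

/* Called when the peer's CTS lands in the small handshake buffer. */
void endpoint_cts_arrived(endpoint_t *ep)
{
    ep->cts_received = true;
    finish_if_ready(ep);
}
```

The sketch also covers the case where the peer's CTS arrives before the local side has sent its own, since CONNECTED is declared only when both flags are set.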

Doing it this way adds no overhead to OOB/XOOB (who don't need this
extra handshake).  I think the code can be factored nicely to make
this not too complicated.

I'll work on this once I figure out the memory corruption I'm seeing
in the receive_queues patch...

Note that this addresses the wireup multi-threading issues -- not
iWarp SRQ issues. We'll tackle those separately, and possibly not for
the initial v1.3.0 release.


On May 20, 2008, at 6:02 AM, Gleb Natapov wrote:


On Mon, May 19, 2008 at 01:38:53PM -0400, Jeff Squyres wrote:

5. ...?

What about moving the posting of receive buffers into the main thread?
With SRQ it is easy: don't post anything in the CPC thread. The main
thread will prepost buffers automatically after the first fragment is
received on the endpoint (in btl_openib_handle_incoming()). With PPRQ
it's more complicated. What if we prepost dummy buffers (not from the
free list) during the IBCM connection stage and run another three-way
handshake protocol using those buffers, but from the main thread? We
will need to prepost one buffer on the active side and two buffers on
the passive side.



This is probably the most viable alternative -- it would be easiest if
we did this for all CPC's, not just for IBCM:

- for PPRQ: CPCs only post a small number of receive buffers, suitable
  for another handshake that will run in the upper-level openib BTL
- for SRQ: CPCs don't post anything (because the SRQ already "belongs"
  to the upper level openib BTL)

Do we have a BSRQ restriction that there *must* be at least one PPRQ?

No. We don't have such a restriction and I wouldn't want to add it.


If so, we could always run the upper-level openib BTL
really-post-the-buffers handshake over the smallest buffer size BSRQ RC
PPRQ (i.e., have the CPC post a single receive on this QP -- see
below), which would make things much easier.  If we don't already have
this restriction, would we mind adding it?  We have one PPRQ in our
default receive_queues value, anyway.

If there is no PPRQ then we can rely on the RNR/retransmit logic in case
there are not enough buffers in the SRQ. We do that anyway in the openib
BTL code.



With this rationale, once the CPC says "ok, all BSRQ QP's are
connected", then _endpoint.c can run a CTS handshake to post the
"real" buffers, where each side does the following:

- CPC calls _endpoint_connected() to tell the upper level BTL that it
is fully connected (the function is invoked in the main thread)
- _endpoint_connected() posts all the "real" buffers to all the BSRQ
QP's on the endpoint
- _endpoint_connected() then sends a CTS control message to remote
peer via smallest RC PPRQ
- upon receipt of CTS:
 - release the buffer (***)
 - set endpoint state of CONNECTED and let all pending messages
flow... (as it happens today)

So it actually doesn't even have to be a handshake -- it's just an
additional CTS sent over the newly-created RC QP.  Since it's RC, we
don't have to do much -- just wait for the CTS to kn

Re: [OMPI devel] get_iwarp_subnet_id in openib btl

2008-05-21 Thread Pak Lui

Yup. It works. Thanks! With r18470 it works even better!

Jon Mason wrote:

On Tue, May 20, 2008 at 03:44:41PM -0400, Pak Lui wrote:

Hi Jon,

This is CentOS 4.6 on Ranger. Sorry I didn't mention it. So what should
I do?


login3% more /etc/*release*
::
/etc/redhat-release
::
CentOS release 4.6 (Final)
::
/etc/rocks-release
::
Rocks release 4.2.1 (Cydonia)
login3%


Sorry, looks like I busted you.  Please pull the latest bits and verify my
fix solves your issue.

Thanks,
Jon



Jon Mason wrote:

On Tue, May 20, 2008 at 02:48:49PM -0400, Pak Lui wrote:

Hi,

I am not familiar with get_iwarp_subnet_id and I am not sure why it is
causing trunk to barf. I think I am using ofed 1.2.5. See attached for
config.log.

That is in the 1.3 tree, not 1.2.  There was a bug in Solaris that was
fixed recently that was around this area.  Please make sure you are at
the latest level.

Thanks,
Jon


libtool: link: pgCC -O -DNDEBUG -o .libs/ompi_info components.o ompi_info.o output.o param.o version.o ../../../ompi/.libs/libmpi.so -L/opt/ofed/lib64 -libcm -libverbs -lrt /work/00951/paklui/ompi-trunk7/config-data1/orte/.libs/libopen-rte.so /work/00951/paklui/ompi-trunk7/config-data1/opal/.libs/libopen-pal.so -lnuma -ldl -lnsl -lutil -lpthread -Wl,--rpath -Wl,/work/00951/paklui/ompi-trunk7/shared-install1/lib
../../../ompi/.libs/libmpi.so: undefined reference to `get_iwarp_subnet_id'
make[2]: *** [ompi_info] Error 2
make[2]: Leaving directory `/work/00951/paklui/ompi-trunk7/config-data1/ompi/tools/ompi_info'
make[1]: *** [install-recursive] Error 1
make[1]: Leaving directory `/work/00951/paklui/ompi-trunk7/config-data1/ompi'
make: *** [install-recursive] Error 1
"make.install.log.0" 10445L, 2050037C 10445,1 
Bot


--

- Pak Lui
pak@sun.com





--

- Pak Lui
pak@sun.com




--


- Pak Lui
pak@sun.com