Re: [OMPI devel] Announcing Open MPI v4.0.0rc1

2018-09-20 Thread Pavel Shamis
More UCX packages:
Fedora: http://rpms.famillecollet.com/rpmphp/zoom.php?rpm=ucx
OpenSUSE: https://software.opensuse.org/package/openucx


On Thu, Sep 20, 2018 at 7:53 AM Yossi Itigin  wrote:

> Currently the target is RH 8
> And yes, UCX is also available on EPEL, for example:
> https://centos.pkgs.org/7/epel-x86_64/ucx-1.3.1-1.el7.x86_64.rpm.html
>
> -Original Message-
> From: Peter Kjellström 
> Sent: Thursday, September 20, 2018 2:11 PM
> To: Yossi Itigin 
> Cc: Open MPI Developers 
> Subject: Re: [OMPI devel] Announcing Open MPI v4.0.0rc1
>
> On Thu, 20 Sep 2018 09:03:44 +
> Yossi Itigin  wrote:
>
> > Hi,
> >
> > UCX is on the way into RH distro and will be available and ON by
> > default (auto-detectable by OMPI build process) automatically in the
> > near future.
>
> That is good to hear. Will it already be in the upcoming 7.6?
>
> > Meanwhile, users can enable UCX by two methods:
> > 1. Download and install UCX from openucx.org and build Open MPI with it.
> > 2. Download HPC-X from the Mellanox site (Open MPI pre-compiled and
> > packaged with UCX) for the distro of interest (users can re-compile the
> > package with site defaults as well).
>
> There's even:
>  3. enable epel and install it from there.
>
> /Peter K
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel
>
___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] v2.1.5rc1 is out

2018-08-17 Thread Pavel Shamis
It looks to me like an MXM-related failure?
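
For reference, a minimal window-creation program -- a sketch, not the IMB source -- that exercises the same path shown in the backtrace below, where PMPI_Win_create leads into the one-sided (osc) component selection:

#include <mpi.h>
#include <stdio.h>

/* Minimal sketch (not the IMB code): MPI_Win_create() triggers the
 * one-sided (osc) component selection seen in the backtrace
 * (PMPI_Win_create -> ompi_win_create -> ompi_osc_base_select). */
int main(int argc, char **argv)
{
    int buf[16] = {0};
    MPI_Win win;

    MPI_Init(&argc, &argv);

    /* Expose a small local buffer; osc component query/selection runs here. */
    MPI_Win_create(buf, sizeof(buf), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}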

On Thu, Aug 16, 2018 at 1:51 PM Vallee, Geoffroy R. 
wrote:

> Hi,
>
> I ran some tests on Summitdev here at ORNL:
> - the UCX problem is solved and I get the expected results for the tests
> that I am running (netpipe and IMB).
> - without UCX:
> * the performance numbers are below what would be expected but I
> believe at this point that the slight performance deficiency is due to
> other users using other parts of the system.
> * I also encountered the following problem while running IMB_EXT
> and I now realize that I had the same problem with 2.1.4rc1 but did not
> catch it at the time:
> [summitdev-login1:112517:0] Caught signal 11 (Segmentation fault)
> [summitdev-r0c2n13:91094:0] Caught signal 11 (Segmentation fault)
>  backtrace 
>  2 0x00073864 mxm_handle_error()
> /var/tmp/OFED_topdir/BUILD/mxm-3.6.3102/src/mxm/util/debug/debug.c:641
>  3 0x00073fa4 mxm_error_signal_handler()
> /var/tmp/OFED_topdir/BUILD/mxm-3.6.3102/src/mxm/util/debug/debug.c:616
>  4 0x00017b24 ompi_osc_rdma_component_query()
> osc_rdma_component.c:0
>  5 0x000d4634 ompi_osc_base_select()  ??:0
>  6 0x00065e84 ompi_win_create()  ??:0
>  7 0x000a2488 PMPI_Win_create()  ??:0
>  8 0x1000b28c IMB_window()  ??:0
>  9 0x10005764 IMB_init_buffers_iter()  ??:0
> 10 0x10001ef8 main()  ??:0
> 11 0x00024980 generic_start_main.isra.0()  libc-start.c:0
> 12 0x00024b74 __libc_start_main()  ??:0
> ===
>  backtrace 
>  2 0x00073864 mxm_handle_error()
> /var/tmp/OFED_topdir/BUILD/mxm-3.6.3102/src/mxm/util/debug/debug.c:641
>  3 0x00073fa4 mxm_error_signal_handler()
> /var/tmp/OFED_topdir/BUILD/mxm-3.6.3102/src/mxm/util/debug/debug.c:616
>  4 0x00017b24 ompi_osc_rdma_component_query()
> osc_rdma_component.c:0
>  5 0x000d4634 ompi_osc_base_select()  ??:0
>  6 0x00065e84 ompi_win_create()  ??:0
>  7 0x000a2488 PMPI_Win_create()  ??:0
>  8 0x1000b28c IMB_window()  ??:0
>  9 0x10005764 IMB_init_buffers_iter()  ??:0
> 10 0x10001ef8 main()  ??:0
> 11 0x00024980 generic_start_main.isra.0()  libc-start.c:0
> 12 0x00024b74 __libc_start_main()  ??:0
> ===
>
> FYI, the 2.x series is not important to me so it can stay as is. I will
> move on to testing 3.1.2rc1.
>
> Thanks,
>
>
> > On Aug 15, 2018, at 6:07 PM, Jeff Squyres (jsquyres) via devel <
> devel@lists.open-mpi.org> wrote:
> >
> > Per our discussion over the weekend and on the weekly webex yesterday,
> we're releasing v2.1.5.  There are only two changes:
> >
> > 1. A trivial link issue for UCX.
> > 2. A fix for the vader BTL issue.  This is how I described it in NEWS:
> >
> > - A subtle race condition bug was discovered in the "vader" BTL
> >  (shared memory communications) that, in rare instances, can cause
> >  MPI processes to crash or incorrectly classify (or effectively drop)
> >  an MPI message sent via shared memory.  If you are using the "ob1"
> >  PML with "vader" for shared memory communication (note that vader is
> >  the default for shared memory communication with ob1), you need to
> >  upgrade to v2.1.5 to fix this issue.  You may also upgrade to the
> >  following versions to fix this issue:
> >  - Open MPI v3.0.1 (released March, 2018) or later in the v3.0.x
> >series
> >  - Open MPI v3.1.2 (expected end of August, 2018) or later
> >
> > This vader fix was deemed serious enough to warrant a 2.1.5
> release.  This really will be the end of the 2.1.x series.  Trust me; my
> name is Joe Isuzu.
> >
> > 2.1.5rc1 will be available from the usual location in a few minutes (the
> website will update in about 7 minutes):
> >
> >https://www.open-mpi.org/software/ompi/v2.1/
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> >
> > ___
> > devel mailing list
> > devel@lists.open-mpi.org
> > https://lists.open-mpi.org/mailman/listinfo/devel
>
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel
>
___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] Open MPI v2.1.4rc1

2018-08-09 Thread Pavel Shamis
Adding Alina and Yossi.

On Thu, Aug 9, 2018 at 2:34 PM Vallee, Geoffroy R. 
wrote:

> Hi,
>
> I tested on Summitdev here at ORNL and here are my comments (but I only
> have a limited set of data for summitdev so my feedback is somewhat
> limited):
> - netpipe/mpi is showing a slightly lower bandwidth than the 3.x series (I
> do not believe it is a problem).
> - I am facing a problem with UCX; it is unclear to me whether it is relevant,
> since I am using UCX master and I do not know whether it is expected to
> work with OMPI v2.1.x. Note that I am using the same tool for testing all
> other releases of Open MPI and I never had that problem before, keeping in
> mind that I have only tested the 3.x series so far.
>
> make[2]: Entering directory
> `/autofs/nccs-svm1_home1/gvh/.ompi-release-tester/scratch/summitdev/2.1.4rc1/scratch/UCX/ompi_build/ompi/mca/pml/ucx'
> /bin/sh ../../../../libtool  --tag=CC   --mode=link gcc -std=gnu99  -O3
> -DNDEBUG -finline-functions -fno-strict-aliasing -pthread -module
> -avoid-version  -o mca_pml_ucx.la -rpath
> /ccs/home/gvh/.ompi-release-tester/scratch/summitdev/2.1.4rc1/scratch/UCX/ompi_install/lib/openmpi
> pml_ucx.lo pml_ucx_request.lo pml_ucx_datatype.lo pml_ucx_component.lo
> -lucp  -lrt -lm -lutil
> libtool: link: gcc -std=gnu99 -shared  -fPIC -DPIC  .libs/pml_ucx.o
> .libs/pml_ucx_request.o .libs/pml_ucx_datatype.o .libs/pml_ucx_component.o
>  -lucp -lrt -lm -lutil  -O3 -pthread   -pthread -Wl,-soname
> -Wl,mca_pml_ucx.so -o .libs/mca_pml_ucx.so
> /usr/bin/ld: cannot find -lucp
> collect2: error: ld returned 1 exit status
> make[2]: *** [mca_pml_ucx.la] Error 1
> make[2]: Leaving directory
> `/autofs/nccs-svm1_home1/gvh/.ompi-release-tester/scratch/summitdev/2.1.4rc1/scratch/UCX/ompi_build/ompi/mca/pml/ucx'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory
> `/autofs/nccs-svm1_home1/gvh/.ompi-release-tester/scratch/summitdev/2.1.4rc1/scratch/UCX/ompi_build/ompi'
> make: *** [all-recursive] Error 1
>
> My 2 cents,
>
> > On Aug 6, 2018, at 5:04 PM, Jeff Squyres (jsquyres) via devel <
> devel@lists.open-mpi.org> wrote:
> >
> > Open MPI v2.1.4rc1 has been pushed.  It is likely going to be the last
> in the v2.1.x series (since v4.0.0 is now visible on the horizon).  It is
> just a bunch of bug fixes that have accumulated since v2.1.3; nothing
> huge.  We'll encourage users who are still using the v2.1.x series to
> upgrade to this release; it should be a non-event for anyone who has
> already upgraded to the v3.0.x or v3.1.x series.
> >
> >https://www.open-mpi.org/software/ompi/v2.1/
> >
> > If no serious-enough issues are found, we plan to release 2.1.4 this
> Friday, August 10, 2018.
> >
> > Please test!
> >
> > Bug fixes/minor improvements:
> > - Disable the POWER 7/BE block in configure.  Note that POWER 7/BE is
> >  still not a supported platform, but it is no longer automatically
> >  disabled.  See
> >  https://github.com/open-mpi/ompi/issues/4349#issuecomment-374970982
> >  for more information.
> > - Fix bug with request-based one-sided MPI operations when using the
> >  "rdma" component.
> > - Fix issue with large data structure in the TCP BTL causing problems
> >  in some environments.  Thanks to @lgarithm for reporting the issue.
> > - Minor Cygwin build fixes.
> > - Minor fixes for the openib BTL:
> >  - Support for the QLogic RoCE HCA
> >  - Support for the Broadcom Cumulus RoCE HCA
> >  - Enable support for HDR link speeds
> > - Fix MPI_FINALIZED hang if invoked from an attribute destructor
> >  during the MPI_COMM_SELF destruction in MPI_FINALIZE.  Thanks to
> >  @AndrewGaspar for reporting the issue.
> > - Java fixes:
> >  - Modernize Java framework detection, especially on OS X/MacOS.
> >Thanks to Bryce Glover for reporting and submitting the fixes.
> >  - Prefer "javac -h" to "javah" to support newer Java frameworks.
> > - Fortran fixes:
> >  - Use conformant dummy parameter names for Fortran bindings.  Thanks
> >to Themos Tsikas for reporting and submitting the fixes.
> >  - Build the MPI_SIZEOF() interfaces in the "TKR"-style "mpi" module
> >whenever possible.  Thanks to Themos Tsikas for reporting the
> >issue.
> >  - Fix array of argv handling for the Fortran bindings of
> >MPI_COMM_SPAWN_MULTIPLE (and its associated man page).
> >  - Make NAG Fortran compiler support more robust in configure.
> > - Disable the "pt2pt" one-sided MPI component when MPI_THREAD_MULTIPLE
> >  is used.  This component is simply not safe in MPI_THREAD_MULTIPLE
> >  scenarios, and will not be fixed in the v2.1.x series.
> > - Make the "external" hwloc component fail gracefully if it is tries
> >  to use an hwloc v2.x.y installation.  hwloc v2.x.y will not be
> >  supported in the Open MPI v2.1.x series.
> > - Fix "vader" shared memory support for messages larger than 2GB.
> >  Thanks to Heiko Bauke for the bug report.
> > - Configure fixes for external PMI directory detection.  Thanks to
> >  Davide Vanzo for the report.
> >
> >
> > --
> 

Re: [OMPI devel] Time to remove the openib btl? (Was: Users/Eager RDMA causing slow osu_bibw with 3.0.0)

2018-04-11 Thread Pavel Shamis
Personally, I haven't tried iWARP; I just don't have this hardware. As far as I
can remember, one of the "special" requirements for iWARP NICs is
RDMACM. UCT does have support for RDMACM, but somebody has to try it and see
if there are any other gaps.
P.

On Fri, Apr 6, 2018 at 8:17 AM, Jeff Squyres (jsquyres) 
wrote:

> Does UCT support iWARP?
>
>
> > On Apr 6, 2018, at 9:45 AM, Jeff Squyres (jsquyres) 
> wrote:
> >
> > That would be sweet.  Are you aiming for v4.0.0, perchance?
> >
> > I.e., should we have "remove openib BTL / add uct BTL" to the target
> feature list for v4.0.0?
> >
> > (I did laugh at the Kylo name, but I really, really don't want to keep
> propagating the idea of meaningless names for BTLs -- especially since
> BTL names, more so than most other component names, are visible to the
> user.  ...unless someone wants to finally finish the ideas and implement
> the whole network-transport-name system that we've talked about for a few
> years... :-) )
> >
> >
> >> On Apr 5, 2018, at 1:49 PM, Thananon Patinyasakdikul <
> tpati...@vols.utk.edu> wrote:
> >>
> >> Just more information to help with the decision:
> >>
> >> I am working on Nathan’s uct btl to make it work with ob1 and
> infiniband. So this could be a replacement for openib and honestly we
> should totally call this new uct btl Kylo.
> >>
> >> Arm
> >>
> >>> On Apr 5, 2018, at 1:37 PM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
> >>>
> >>> Below is an email exchange from the users mailing list.
> >>>
> >>> I'm moving this over to devel to talk among the developer community.
> >>>
> >>> Multiple times recently on the users list, we've told people with
> problems with the openib BTL that they should be using UCX (per Mellanox's
> publicly-stated support positions).
> >>>
> >>> Is it time to deprecate / print warning messages / remove the openib
> BTL?
> >>>
> >>>
> >>>
>  Begin forwarded message:
> 
>  From: Nathan Hjelm 
>  Subject: Re: [OMPI users] Eager RDMA causing slow osu_bibw with 3.0.0
>  Date: April 5, 2018 at 12:48:08 PM EDT
>  To: Open MPI Users 
>  Cc: Open MPI Users 
>  Reply-To: Open MPI Users 
> 
> 
>  Honestly, this is a configuration issue with the openib btl. There is
> no reason to keep eager RDMA, nor is there a reason to pipeline RDMA.
> I haven't found an app where either of these "features" helps you with
> infiniband. You have the right idea with the parameter changes but Howard
> is correct, for Mellanox the future is UCX not verbs. I would try it and
> see if it works for you but if it doesn't I would set those two parameters
> in your /etc/openmpi-mca-params.conf and run like that.
> 
>  -Nathan
> 
>  On Apr 05, 2018, at 01:18 AM, Ben Menadue 
> wrote:
> 
> > Hi,
> >
> > Another interesting point. I noticed that the last two message sizes
> tested (2MB and 4MB) are lower than expected for both osu_bw and osu_bibw.
> Increasing the minimum size to use the RDMA pipeline to above these sizes
> brings those two data-points up to scratch for both benchmarks:
> >
> > 3.0.0, osu_bw, no rdma for large messages
> >
> >> mpirun -mca btl_openib_min_rdma_pipeline_size 4194304 -map-by
> ppr:1:node -np 2 -H r6,r7 ./osu_bw -m 2097152:4194304
> > # OSU MPI Bi-Directional Bandwidth Test v5.4.0
> > # Size  Bandwidth (MB/s)
> > 2097152  6133.22
> > 4194304  6054.06
> >
> > 3.0.0, osu_bibw, eager rdma disabled, no rdma for large messages
> >
> >> mpirun -mca btl_openib_min_rdma_pipeline_size 4194304 -mca
> btl_openib_use_eager_rdma 0 -map-by ppr:1:node -np 2 -H r6,r7 ./osu_bibw -m
> 2097152:4194304
> > # OSU MPI Bi-Directional Bandwidth Test v5.4.0
> > # Size  Bandwidth (MB/s)
> > 2097152 11397.85
> > 4194304 11389.64
> >
> > This makes me think something odd is going on in the RDMA pipeline.
> >
> > Cheers,
> > Ben
> >
> >
> >
> >> On 5 Apr 2018, at 5:03 pm, Ben Menadue 
> wrote:
> >> Hi,
> >>
> >> We’ve just been running some OSU benchmarks with OpenMPI 3.0.0 and
> noticed that osu_bibw gives nowhere near the bandwidth I’d expect (this is
> on FDR IB). However, osu_bw is fine.
> >>
> >> If I disable eager RDMA, then osu_bibw gives the expected numbers.
> Similarly, if I increase the number of eager RDMA buffers, it gives the
> expected results.
> >>
> >> OpenMPI 1.10.7 gives consistent, reasonable numbers with default
> settings, but they’re not as good as 3.0.0 (when tuned) for large buffers.
> The same option changes produce no difference in performance for 1.10.7.
> >>
> >> I was wondering if anyone else has noticed anything similar, and if
> this is unexpected, if anyone has a suggestion on how to investigate
> further?
> >>
> >> Thanks,
> >> Ben
> >>
> >>
> >> Here’s are the numbers:
> >>
> >> 3.0.0, osu_bw,

Re: [OMPI devel] New Open MPI Community Bylaws to discuss

2016-10-25 Thread Pavel Shamis
Update: I got a conceptual OK from our legal team.
- Pasha

On Tue, Oct 18, 2016 at 11:15 AM, Jeff Squyres (jsquyres) <
jsquy...@cisco.com> wrote:

> Fair point; everyone needs to look at this and decide a) if that's ok, or
> b) if we want to change it to incorporate clause #3.
>
>
> > On Oct 12, 2016, at 12:14 PM, Pavel Shamis 
> wrote:
> >
> > I think one of the main add-ons of the CLA over BSD-3 license was the
> clause #3 (Grant of Patent License). As far as I can tell it does not
> appear in the new sign-off-CLA.
> >
> > On Wed, Oct 12, 2016 at 11:06 AM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
> > On Oct 12, 2016, at 9:02 AM, Pavel Shamis 
> wrote:
> > >
> > > You mentioned that such a change will block contributions.  Did you
> mean only temporarily, while individual Contributor/Member organization
> legal departments are reviewing the new terms?  If so, that one-time "cost"
> may be acceptable, since the goal of the new terms are designed to put us
> in a better place, long-term.
> > >
> > > I hope this will be a temporary freeze, unless legal folks will have
> some other concerns. This is a substantial change since you modify existing
> terms and conditions (as I read it).
> >
> > Not sure what you mean: the license under which Open MPI is distributed
> is still the same.
> >
> > What terms and conditions are you referring to, specifically?
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to: http://www.cisco.com/web/
> about/doing_business/legal/cri/
> >
> > ___
> > devel mailing list
> > devel@lists.open-mpi.org
> > https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
> >
> > ___
> > devel mailing list
> > devel@lists.open-mpi.org
> > https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: http://www.cisco.com/web/
> about/doing_business/legal/cri/
>
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

Re: [OMPI devel] New Open MPI Community Bylaws to discuss

2016-10-12 Thread Pavel Shamis
I think one of the main add-ons of the CLA over the BSD-3 license was
clause #3 (Grant of Patent License). As far as I can tell, it does not
appear in the new sign-off CLA.

On Wed, Oct 12, 2016 at 11:06 AM, Jeff Squyres (jsquyres) <
jsquy...@cisco.com> wrote:

> On Oct 12, 2016, at 9:02 AM, Pavel Shamis  wrote:
> >
> > You mentioned that such a change will block contributions.  Did you mean
> only temporarily, while individual Contributor/Member organization legal
> departments are reviewing the new terms?  If so, that one-time "cost" may
> be acceptable, since the goal of the new terms are designed to put us in a
> better place, long-term.
> >
> > I hope this will be a temporary freeze, unless legal folks will have
> some other concerns. This is a substantial change since you modify existing
> terms and conditions (as I read it).
>
> Not sure what you mean: the license under which Open MPI is distributed is
> still the same.
>
> What terms and conditions are you referring to, specifically?
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: http://www.cisco.com/web/
> about/doing_business/legal/cri/
>
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

Re: [OMPI devel] New Open MPI Community Bylaws to discuss

2016-10-12 Thread Pavel Shamis
>
> You mentioned that such a change will block contributions.  Did you mean
> only temporarily, while individual Contributor/Member organization legal
> departments are reviewing the new terms?  If so, that one-time "cost" may
> be acceptable, since the goal of the new terms are designed to put us in a
> better place, long-term.
>

I hope this will be a temporary freeze, unless the legal folks have some
other concerns. This is a substantial change, since it modifies existing terms
and conditions (as I read it). It is not an administrative change that merely
alters the way the CLA is submitted or signed.
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

Re: [OMPI devel] New Open MPI Community Bylaws to discuss

2016-10-12 Thread Pavel Shamis
Regardless, I would have to notify legal teams about amendment of the
existing CLA. If organizations that already signed the agreement don't have
any say, then this conversation is pointless.

-Pasha

On Wed, Oct 12, 2016 at 9:29 AM, r...@open-mpi.org  wrote:

> The OMPI community members have had their respective legal offices review
> the changes, but we decided to provide notice and get input from others
> prior to the formal vote of acceptance. Once approved, there will no longer
> be a CLA at all. The only requirement for contribution will be the sign-off.
>
> Rationale: the open source world has evolved considerably since we first
> initiated the project. The sign-off method has become the most commonly
> used one for accepting contributions. The CLA was intended primarily to
> protect the contributor, not the project, as it ensured that the
> contributor had discussed their contribution with their employer prior to
> submitting it.
>
> This approach puts more responsibility on the contributor. It doesn’t
> impact the project very much - trying to “relicense” OMPI would be just as
> problematic today as under the revised bylaws, and quite frankly is
> something we would never envision attempting.
>
> The frequency with which OMPI is receiving pull requests from non-members
> is the driving force here. We have traditionally accepted such
> contributions “if they are small”, but that is too arbitrary. We either
> have to reject all such contributions, or move to a model that allows them.
> We collectively decided to pursue the latter approach, and hence the change
> to the bylaws.
>
> Just to be clear: only official OMPI members have a vote in this matter.
> If you are not a “member” (e.g., you are a “contributor” status), then this
> is only informational. We respect and want your input, but you don’t
> actually have a vote on this matter.
>
> HTH
> Ralph
>
>
> On Oct 12, 2016, at 7:24 AM, Pavel Shamis  wrote:
>
> Well, at least on my side I will not be able to provide the answer without
> legal involvement.
>
> On Wed, Oct 12, 2016 at 9:16 AM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com> wrote:
>
>> My understanding is there will be a vote, and the question will be
>> "Do we replace existing CLA with the new one ?"
>> If we vote to do so, then everyone will have to sign-off their commits,
>> regardless they previously had (or not) signed a CA
>>
>> Cheers,
>>
>> Gilles
>>
>>
>> On Wednesday, October 12, 2016, Pavel Shamis 
>> wrote:
>>
>>> a. As a developer I think it is a good idea to lower barriers for code
>>> contribution.
>>> b. IANAL, but this "signature/certification" is not identical to the
>>> existing CLA, which I think has special statement about patents. Seems like
>>> the new model is a bit more relaxed. Does it mean that OMPI amends existing
>>> CLA ? If not - what is the relation between the two. Most likely existing
>>> member would have to take the "new" CLA to the legal for a review.
>>>
>>> -Pasha
>>>
>>> On Wed, Oct 12, 2016 at 8:38 AM, George Bosilca 
>>> wrote:
>>>
>>>> Yes, my understanding is that unsystematic contributors will not have
>>>> to sign the contributor agreement, but instead will have to provide a
>>>> signed patch.
>>>>
>>>>   George.
>>>>
>>>>
>>>> On Wed, Oct 12, 2016 at 9:29 AM, Pavel Shamis 
>>>> wrote:
>>>>
>>>>> Does it mean that contributors don't have to sign contributor
>>>>> agreement ?
>>>>>
>>>>> On Tue, Oct 11, 2016 at 2:35 PM, Geoffrey Paulsen >>>> > wrote:
>>>>>
>>>>>> We have been discussing new Bylaws for the Open MPI Community.  The
>>>>>> primary motivator is to allow non-members to commit code.  Details in the
>>>>>> proposal (link below).
>>>>>>
>>>>>> Old Bylaws / Procedures:  https://github.com/open-mpi/om
>>>>>> pi/wiki/Admistrative-rules
>>>>>>
>>>>>> New Bylaws proposal: https://github.com/open-mpi/om
>>>>>> pi/wiki/Proposed-New-Bylaws
>>>>>>
>>>>>> Open MPI members will be voting on October 25th.  Please voice any
>>>>>> comments or concerns.
>>>>>>
>>>>>>
>>>>>> ___
>>>>>> devel mailing list
>>>>>> devel@lists.open

Re: [OMPI devel] New Open MPI Community Bylaws to discuss

2016-10-12 Thread Pavel Shamis
Well, at least on my side I will not be able to provide the answer without
legal involvement.

On Wed, Oct 12, 2016 at 9:16 AM, Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:

> My understanding is there will be a vote, and the question will be
> "Do we replace existing CLA with the new one ?"
> If we vote to do so, then everyone will have to sign-off their commits,
> regardless they previously had (or not) signed a CA
>
> Cheers,
>
> Gilles
>
>
> On Wednesday, October 12, 2016, Pavel Shamis 
> wrote:
>
>> a. As a developer I think it is a good idea to lower barriers for code
>> contribution.
>> b. IANAL, but this "signature/certification" is not identical to the
>> existing CLA, which I think has special statement about patents. Seems like
>> the new model is a bit more relaxed. Does it mean that OMPI amends existing
>> CLA ? If not - what is the relation between the two. Most likely existing
>> member would have to take the "new" CLA to the legal for a review.
>>
>> -Pasha
>>
>> On Wed, Oct 12, 2016 at 8:38 AM, George Bosilca 
>> wrote:
>>
>>> Yes, my understanding is that unsystematic contributors will not have to
>>> sign the contributor agreement, but instead will have to provide a signed
>>> patch.
>>>
>>>   George.
>>>
>>>
>>> On Wed, Oct 12, 2016 at 9:29 AM, Pavel Shamis 
>>> wrote:
>>>
>>>> Does it mean that contributors don't have to sign contributor agreement
>>>> ?
>>>>
>>>> On Tue, Oct 11, 2016 at 2:35 PM, Geoffrey Paulsen 
>>>> wrote:
>>>>
>>>>> We have been discussing new Bylaws for the Open MPI Community.  The
>>>>> primary motivator is to allow non-members to commit code.  Details in the
>>>>> proposal (link below).
>>>>>
>>>>> Old Bylaws / Procedures:  https://github.com/open-mpi/om
>>>>> pi/wiki/Admistrative-rules
>>>>>
>>>>> New Bylaws proposal: https://github.com/open-mpi/om
>>>>> pi/wiki/Proposed-New-Bylaws
>>>>>
>>>>> Open MPI members will be voting on October 25th.  Please voice any
>>>>> comments or concerns.
>>>>>
>>>>>
>>>>> ___
>>>>> devel mailing list
>>>>> devel@lists.open-mpi.org
>>>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>>>>>
>>>>
>>>>
>>>> ___
>>>> devel mailing list
>>>> devel@lists.open-mpi.org
>>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>>>>
>>>
>>>
>>> ___
>>> devel mailing list
>>> devel@lists.open-mpi.org
>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>>>
>>
>>
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

Re: [OMPI devel] New Open MPI Community Bylaws to discuss

2016-10-12 Thread Pavel Shamis
>
> There might be no more contributor agreement at all...
> (See the discussion on the devel-core ML)
>

My concern (based on experience) is that this may prevent some organizations
from contributing. Obviously, people would have to take this back to legal,
which may lead to a "freeze" in terms of contributions.
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

Re: [OMPI devel] New Open MPI Community Bylaws to discuss

2016-10-12 Thread Pavel Shamis
a. As a developer I think it is a good idea to lower barriers for code
contribution.
b. IANAL, but this "signature/certification" is not identical to the
existing CLA, which I think has a special statement about patents. It seems
the new model is a bit more relaxed. Does it mean that OMPI amends the existing
CLA? If not, what is the relation between the two? Most likely, existing
members would have to take the "new" CLA to legal for a review.

-Pasha

On Wed, Oct 12, 2016 at 8:38 AM, George Bosilca  wrote:

> Yes, my understanding is that unsystematic contributors will not have to
> sign the contributor agreement, but instead will have to provide a signed
> patch.
>
>   George.
>
>
> On Wed, Oct 12, 2016 at 9:29 AM, Pavel Shamis 
> wrote:
>
>> Does it mean that contributors don't have to sign contributor agreement ?
>>
>> On Tue, Oct 11, 2016 at 2:35 PM, Geoffrey Paulsen 
>> wrote:
>>
>>> We have been discussing new Bylaws for the Open MPI Community.  The
>>> primary motivator is to allow non-members to commit code.  Details in the
>>> proposal (link below).
>>>
>>> Old Bylaws / Procedures:  https://github.com/open-mpi/om
>>> pi/wiki/Admistrative-rules
>>>
>>> New Bylaws proposal: https://github.com/open-mpi/om
>>> pi/wiki/Proposed-New-Bylaws
>>>
>>> Open MPI members will be voting on October 25th.  Please voice any
>>> comments or concerns.
>>>
>>>
>>> ___
>>> devel mailing list
>>> devel@lists.open-mpi.org
>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>>>
>>
>>
>> ___
>> devel mailing list
>> devel@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>>
>
>
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

Re: [OMPI devel] New Open MPI Community Bylaws to discuss

2016-10-12 Thread Pavel Shamis
Does it mean that contributors don't have to sign contributor agreement ?

On Tue, Oct 11, 2016 at 2:35 PM, Geoffrey Paulsen 
wrote:

> We have been discussing new Bylaws for the Open MPI Community.  The
> primary motivator is to allow non-members to commit code.  Details in the
> proposal (link below).
>
> Old Bylaws / Procedures:  https://github.com/open-mpi/
> ompi/wiki/Admistrative-rules
>
> New Bylaws proposal: https://github.com/open-mpi/
> ompi/wiki/Proposed-New-Bylaws
>
> Open MPI members will be voting on October 25th.  Please voice any
> comments or concerns.
>
>
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

Re: [OMPI devel] OSHMEM and shmem_ptr

2016-09-29 Thread Pavel Shamis
Hi Nick,

> I agree that symmetric objects in the data segment are a challenging issue.
> Could dynamically-allocated symmetric objects be supported much more easily?

Indeed. At least on the technical level you don't depend on a 3rd
party kernel module.

> Does Open MPI / OSHMEM already have cross-process mappings for symmetric
> allocations (e.g., for local put/get via memcpy)?

I just went over the code. It will depend on the underlying
transport/configuration, but the overall answer is yes.
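
For illustration, a minimal user-level sketch of what such a cross-process mapping enables through shmem_ptr(). It assumes the OpenSHMEM 1.2+ names (shmem_malloc/shmem_ptr); whether a usable address comes back depends on the transport/configuration, as noted, so the NULL check matters:

#include <stdio.h>
#include <shmem.h>

int main(void)
{
    shmem_init();
    int me   = shmem_my_pe();
    int npes = shmem_n_pes();

    int *sym = shmem_malloc(sizeof(int));   /* dynamically allocated symmetric object */
    *sym = me;
    shmem_barrier_all();

    if (npes > 1 && me == 0) {
        /* Ask for a direct load/store pointer to PE 1's copy of "sym". */
        int *remote = (int *) shmem_ptr(sym, 1);
        if (remote != NULL) {
            printf("PE 0 reads PE 1's value via a plain load: %d\n", *remote);
        } else {
            printf("shmem_ptr() is not available for PE 1 on this transport\n");
        }
    }

    shmem_barrier_all();
    shmem_free(sym);
    shmem_finalize();
    return 0;
}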

Thanks,
Pasha
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel


Re: [OMPI devel] OSHMEM and shmem_ptr

2016-09-28 Thread Pavel Shamis
Nick,

IMHO, the major issue here is the support for "static" data objects.
This requirement translates directly into XPMEM support for the
platform; otherwise it is very challenging to implement.
Open MPI/OSHMEM provides support for XPMEM (actually there is more than
one transport), but you have to make sure that the underlying system
has the XPMEM kernel module installed. There is an open-source version
of XPMEM (https://github.com/hjelmn/xpmem) which went through numerous
updates and bugfixes during the last year, even though it is still not a
formally QA'ed product.

-Pasha
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel


Re: [OMPI devel] Infiniband memory usage with XRC

2010-05-23 Thread Pavel Shamis (Pasha)

~2300 KB -- is that the difference per machine or per MPI process?
In OMPI XRC mode we allocate some additional resources that may consume
some memory (the hash table), but even so, ~2 MB sounds like too much to me.
When I have time I will try to calculate the "reasonable" difference.
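
To make the question concrete, a back-of-envelope sketch of the two possible readings (hypothetical arithmetic only, using the ~2300 KB figure and the 32 nodes x 32 cores job quoted below):

#include <stdio.h>

/* Hypothetical illustration of why "per machine" vs. "per MPI process"
 * matters: the same ~2300 KB figure means very different things depending
 * on how it is counted.  Numbers come from the report quoted below. */
int main(void)
{
    const double diff_kb = 2300.0;  /* reported X vs. S difference */
    const int ranks = 1024;         /* 32 nodes x 32 cores */
    const int ppn   = 32;           /* ranks per node */

    printf("If aggregate over the job: %.2f KB per rank, %.1f KB per node\n",
           diff_kb / ranks, diff_kb / ranks * ppn);
    printf("If per rank:               %.1f MB per node, %.1f MB over the job\n",
           diff_kb * ppn / 1024.0, diff_kb * ranks / 1024.0);
    return 0;
}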


Pasha

Sylvain Jeaugey wrote:

On Mon, 17 May 2010, Pavel Shamis (Pasha) wrote:


Sylvain Jeaugey wrote:
The XRC protocol seems to create shared receive queues, which is a 
good thing. However, comparing memory used by an "X" queue versus 
and "S" queue, we can see a large difference. Digging a bit into 
the code, we found some

So, do you see that X consumes more than S? This is really odd.

Yes, but that's what we see. At least after MPI_Init.

What is the difference (in Kb)?
At 32 nodes x 32 cores (1024 MPI processes), I get a difference of 
~2300 KB in favor of "S,65536,16,4,1" versus "X,65536,16,4,1".


The proposed patch doesn't seem to solve the problem however, there's 
still something that's taking more memory than expected.


Sylvain





Re: [OMPI devel] Infiniband memory usage with XRC

2010-05-17 Thread Pavel Shamis (Pasha)

Sylvain Jeaugey wrote:
The XRC protocol seems to create shared receive queues, which is a 
good thing. However, comparing memory used by an "X" queue versus 
and "S" queue, we can see a large difference. Digging a bit into the 
code, we found some

So, do you see that X consumes more than S? This is really odd.

Yes, but that's what we see. At least after MPI_Init.

What is the difference (in Kb)?


strange things, like the completion queue size not being the same as 
"S" queues (the patch below would fix it, but the root of the 
problem may be elsewhere).


Is anyone able to comment on this ?

The fix looks ok, please submit it to trunk.
I don't have an account to do this, so I'll let maintainers push it 
into SVN.

Ok, I will push it.


BTW do you want to prepare the patch for send queue size factor ? It 
should be quite simple.
Maybe we can do this. However, we are playing a little with parameters 
and code without really knowing the deep consequences of what we do. 
Therefore, I would feel more comfortable if someone who knows the 
openib btl well confirms it's not breaking everything.
Well, please feel free to submit a patch for review. Also if you see any 
other issues with XRC, MLNX will be happy to help.


Regards,
Pasha.


Re: [OMPI devel] Infiniband memory usage with XRC

2010-05-17 Thread Pavel Shamis (Pasha)

Please see below.



When using XRC queues, Open MPI is indeed creating only one XRC queue 
per remote node (instead of one per remote process). The problem is that 
the number of send elements in this queue is multiplied by the number of 
processes on the remote host.


So, what are we getting from this ? Not much, except that we can 
reduce the sd_max parameter to 1 element, and still have 8 elements in 
the send queue (on 8-core machines), which may still be ok on the 
performance side.
Don't forget that the QP object itself consumes some memory at the 
driver/verbs level.
But I agree that we need to provide more flexibility. It would be nice if 
the default multiply coefficient were smaller, and I also think we need to 
make it a user-tunable parameter (yep, one more parameter).


Send queues are created lazily, so having a lot of memory for send 
queues is not necessarily blocking. What's


blocking is the receive queues, because they are created during 
MPI_Init, so in a way, they are the "basic fare" of MPI.
BTW, SRQ resources are also allocated on demand. We start with a very small 
SRQ and it is increased on the SRQ limit event.




The XRC protocol seems to create shared receive queues, which is a 
good thing. However, comparing memory used by an "X" queue versus and 
"S" queue, we can see a large difference. Digging a bit into the code, 
we found some

So, do you see that X consumes more than S? This is really odd.
strange things, like the completion queue size not being the same as 
"S" queues (the patch below would fix it, but the root of the problem 
may be elsewhere).


Is anyone able to comment on this ?

The fix looks ok, please submit it to trunk.
BTW do you want to prepare the patch for send queue size factor ? It 
should be quite simple.


Regards,
Pasha


Re: [OMPI devel] v1.4 broken

2010-02-17 Thread Pavel Shamis (Pasha)

I'm checking this issue.

Pasha

Joshua Hursey wrote:

I just noticed that the nightly tarball of v1.4 failed to build in the OpenIB 
BTL last night. The error was:

-
btl_openib_component.c: In function 'init_one_device':
btl_openib_component.c:2089: error: 'mca_btl_openib_component_t' has no member 
named 'default_recv_qps'
-

It looks like CMR #2251 is the problem.

-- Josh
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

  




Re: [OMPI devel] #2163: 1.5 blocker

2010-01-14 Thread Pavel Shamis (Pasha)

Jeff Squyres wrote:

Note that I just moved #2163 into "blocker" state because the bug breaks any 
non-SRQ-capable OpenFabrics device (e.g., both Chelsio and Neteffect RNICs).  The bug came in 
recently with the "resize SRQ" patches.

Mellanox / IBM: we are branching for 1.5 tomorrow.  Can this bug be fixed 
before then?

https://svn.open-mpi.org/trac/ompi/ticket/2163
  
Vasily will submit the patch in a few minutes. Can you please test it on 
iWARP devices?

Pasha.



Re: [OMPI devel] Question about ompi_proc_t

2009-12-08 Thread Pavel Shamis (Pasha)



Both of these types (mca_pml_endpoint_t and mca_pml_base_endpoint_t) are 
meaningless; they can safely be replaced by void*. We have them clearly typed 
just for the sake of understanding, so one can easily figure out what 
is supposed to be stored in this specific field. As such, we can remove one of 
them (mca_pml_base_endpoint_t) and use the other one (mca_pml_endpoint_t) 
everywhere.
  
George, thank you for the clarification! To me it sounds like a good idea to 
leave only one of them.

I wonder what is exactly the reason that drives your questions?
  

The question was raised during an internal top-down code review.

Regards,
Pasha

  

George,
Actually, my original question was correct.

In the ompi code base I found ONLY two places where we "use" the structure.
Actually we only assign values for the pointer in DR and CM PML:

ompi/mca/pml/cm/pml_cm.c:145:procs[i]->proc_pml = (struct 
mca_pml_base_endpoint_t*) endpoints[i];
ompi/mca/pml/dr/pml_dr.c:264:procs[i]->proc_pml = (struct 
mca_pml_base_endpoint_t*) endpoint;

I do not see any struct definition/declaration of mca_pml_base_endpoint_t in the 
OMPI code at all.

But I do see the "struct mca_pml_endpoint_t;" declaration in pml.h. As well, there is a comment 
that says: "A pointer to an mca_pml_endpoint_t is maintained on each ompi_proc_t". So it 
looks like the idea was to use mca_pml_endpoint_t on the ompi_proc_t and not 
mca_pml_base_endpoint_t, is it not?
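
To illustrate the point, a hypothetical sketch (not the actual pml.h/proc.h code): a forward declaration like the one in pml.h gives an opaque, typed handle that the rest of the code can store but never dereference, which is why it is functionally a void*. The example_* names below are made up for illustration.

#include <stdio.h>

struct mca_pml_endpoint_t;                    /* declaration only -- no definition here */

struct example_proc {
    struct mca_pml_endpoint_t *proc_pml;      /* opaque per-PML endpoint handle */
};

/* Only the PML that actually defines the struct can look inside it;
 * everyone else just stores/copies the pointer, exactly as with a void*. */
static void example_set_endpoint(struct example_proc *proc, void *pml_private)
{
    proc->proc_pml = (struct mca_pml_endpoint_t *) pml_private;
}

int main(void)
{
    struct example_proc proc = { NULL };
    int pml_private_state = 42;               /* stand-in for a PML's own endpoint data */

    example_set_endpoint(&proc, &pml_private_state);
    printf("endpoint pointer stored: %p\n", (void *) proc.proc_pml);
    return 0;
}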

Thanks !

Pasha

George Bosilca wrote:


Actually your answer is correct. The endpoint is defined down below in the PML. 
In addition, I think only the MTL and the DR PML use it, all OB1 derivative 
completely ignore it.

 george.

On Dec 7, 2009, at 08:30 , Timothy Hayes wrote:

 
  

Sorry, I think I read your question too quickly. Ignore me. :-)

2009/12/7 Timothy Hayes 
Is it not a forward definition and then defined in the PML components 
individually based on their own requirements?

2009/12/7 Pavel Shamis (Pasha) 

In the ompi_proc_t structure (ompi/proc/proc.h:54) we keep pointer to proc_pml - "struct 
mca_pml_base_endpoint_t* proc_pml" . I tired to find definition for "struct 
mca_pml_base_endpoint_t" , but I failed. Does somebody know where is it defined ?

Regards,
Pasha
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
   

 
  



  




Re: [OMPI devel] Question about ompi_proc_t

2009-12-08 Thread Pavel Shamis (Pasha)

George,
Actually, my original question was correct.

In the ompi code base I found ONLY two places where we "use" the structure.
Actually we only assign values for the pointer in DR and CM PML:

ompi/mca/pml/cm/pml_cm.c:145:procs[i]->proc_pml = (struct 
mca_pml_base_endpoint_t*) endpoints[i];
ompi/mca/pml/dr/pml_dr.c:264:procs[i]->proc_pml = (struct 
mca_pml_base_endpoint_t*) endpoint;


I do not see any struct definition/declaration of mca_pml_base_endpoint_t in 
the OMPI code at all.


But I do see the "struct mca_pml_endpoint_t;" declaration in pml.h. As 
well, there is a comment that says: "A pointer to an mca_pml_endpoint_t is 
maintained on each ompi_proc_t". So it looks like the idea was to use 
mca_pml_endpoint_t on the ompi_proc_t and not 
mca_pml_base_endpoint_t, is it not?


Thanks !

Pasha

George Bosilca wrote:

Actually your answer is correct. The endpoint is defined down below in the PML. 
In addition, I think only the MTL and the DR PML use it, all OB1 derivative 
completely ignore it.

  george.

On Dec 7, 2009, at 08:30 , Timothy Hayes wrote:

  

Sorry, I think I read your question too quickly. Ignore me. :-)

2009/12/7 Timothy Hayes 
Is it not a forward definition and then defined in the PML components 
individually based on their own requirements?

2009/12/7 Pavel Shamis (Pasha) 

In the ompi_proc_t structure (ompi/proc/proc.h:54) we keep pointer to proc_pml - "struct 
mca_pml_base_endpoint_t* proc_pml" . I tired to find definition for "struct 
mca_pml_base_endpoint_t" , but I failed. Does somebody know where is it defined ?

Regards,
Pasha
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




  




Re: [OMPI devel] Question about ompi_proc_t

2009-12-07 Thread Pavel Shamis (Pasha)

Sorry, I wanted to ask about the

struct mca_pml_base_endpoint_t

declaration, not the definition.

Pasha.

George Bosilca wrote:

Actually your answer is correct. The endpoint is defined down below in the PML. 
In addition, I think only the MTL and the DR PML use it, all OB1 derivative 
completely ignore it.

  george.

On Dec 7, 2009, at 08:30 , Timothy Hayes wrote:

  

Sorry, I think I read your question too quickly. Ignore me. :-)

2009/12/7 Timothy Hayes 
Is it not a forward definition and then defined in the PML components 
individually based on their own requirements?

2009/12/7 Pavel Shamis (Pasha) 

In the ompi_proc_t structure (ompi/proc/proc.h:54) we keep pointer to proc_pml - "struct 
mca_pml_base_endpoint_t* proc_pml" . I tired to find definition for "struct 
mca_pml_base_endpoint_t" , but I failed. Does somebody know where is it defined ?

Regards,
Pasha
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




  




[OMPI devel] Question about ompi_proc_t

2009-12-07 Thread Pavel Shamis (Pasha)
In the ompi_proc_t structure (ompi/proc/proc.h:54) we keep a pointer to 
proc_pml - "struct mca_pml_base_endpoint_t* proc_pml". I tried to find 
the definition of "struct mca_pml_base_endpoint_t", but I failed. Does 
somebody know where it is defined?


Regards,
Pasha


Re: [OMPI devel] [PATCH] Not optimal SRQ resource allocation

2009-12-06 Thread Pavel Shamis (Pasha)

Jeff,
During the original code review we found that, by default, we allocate the 
SRQ with size "rd_num + sd_max", but
on the SRQ we post only rd_num receive entries. It means that we do not 
fill the queue completely. Looks like a bug.
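
For reference, a compilable function-level sketch (not the actual BTL code; the variable names just follow the discussion) of how such an SRQ is created at the verbs level with capacity rd_num + sd_max, and how the low watermark is armed so that IBV_EVENT_SRQ_LIMIT_REACHED can drive a lazy resize like the one in the patch below:

#include <stddef.h>
#include <infiniband/verbs.h>

/* Sketch only: create an SRQ sized for rd_num + sd_max work requests and
 * arm the low watermark so the HCA raises IBV_EVENT_SRQ_LIMIT_REACHED once
 * fewer than srq_limit receives remain posted. */
struct ibv_srq *create_srq_with_limit(struct ibv_pd *pd,
                                      int rd_num, int sd_max, int srq_limit)
{
    struct ibv_srq_init_attr init_attr = {
        .srq_context = NULL,
        .attr = {
            .max_wr    = rd_num + sd_max,  /* capacity asked of the HCA */
            .max_sge   = 1,
            .srq_limit = 0,                /* armed separately below */
        },
    };

    struct ibv_srq *srq = ibv_create_srq(pd, &init_attr);
    if (NULL == srq) {
        return NULL;
    }

    /* Arm the limit: the async event fires when posted WQEs drop below it. */
    struct ibv_srq_attr attr = { .srq_limit = srq_limit };
    if (0 != ibv_modify_srq(srq, &attr, IBV_SRQ_LIMIT)) {
        ibv_destroy_srq(srq);
        return NULL;
    }
    return srq;
}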


Pasha

Jeff Squyres wrote:

SRQ hardware vendors -- please review and reply...

More below.


On Dec 2, 2009, at 10:20 AM, Vasily Philipov wrote:

  

diff -r a5938d9dcada ompi/mca/btl/openib/btl_openib.c
--- a/ompi/mca/btl/openib/btl_openib.c  Mon Nov 23 19:00:16 2009 -0800
+++ b/ompi/mca/btl/openib/btl_openib.c  Wed Dec 02 16:24:55 2009 +0200
@@ -214,6 +214,7 @@
static int create_srq(mca_btl_openib_module_t *openib_btl)
{
int qp;
+int32_t rd_num, rd_curr_num; 


/* create the SRQ's */
for(qp = 0; qp < mca_btl_openib_component.num_qps; qp++) {
@@ -242,6 +243,24 @@
   
ibv_get_device_name(openib_btl->device->ib_dev));
return OMPI_ERROR;
}
+
+rd_num = mca_btl_openib_component.qp_infos[qp].rd_num;
+rd_curr_num = openib_btl->qps[qp].u.srq_qp.rd_curr_num = 
mca_btl_openib_component.qp_infos[qp].u.srq_qp.rd_init;
+
+if(true == mca_btl_openib_component.enable_srq_resize) {
+if(0 == rd_curr_num) {
+openib_btl->qps[qp].u.srq_qp.rd_curr_num = 1;
+}
+
+openib_btl->qps[qp].u.srq_qp.rd_low_local = rd_curr_num - 
(rd_curr_num >> 2);
+openib_btl->qps[qp].u.srq_qp.srq_limit_event_flag = true;
+} else {
+openib_btl->qps[qp].u.srq_qp.rd_curr_num = rd_num;
+openib_btl->qps[qp].u.srq_qp.rd_low_local = 
mca_btl_openib_component.qp_infos[qp].rd_low;
+/* Not used in this case, but we don't need a garbage */
+mca_btl_openib_component.qp_infos[qp].u.srq_qp.srq_limit = 0;
+openib_btl->qps[qp].u.srq_qp.srq_limit_event_flag = false;
+}
}
}

diff -r a5938d9dcada ompi/mca/btl/openib/btl_openib.h
--- a/ompi/mca/btl/openib/btl_openib.h  Mon Nov 23 19:00:16 2009 -0800
+++ b/ompi/mca/btl/openib/btl_openib.h  Wed Dec 02 16:24:55 2009 +0200
@@ -87,6 +87,12 @@

struct mca_btl_openib_srq_qp_info_t {
int32_t sd_max;
+/* The init value for rd_curr_num variables of all SRQs */
+int32_t rd_init;
+/* The watermark, threshold - if the number of WQEs in SRQ is less then this 
value =>
+   the SRQ limit event (IBV_EVENT_SRQ_LIMIT_REACHED) will be generated on 
corresponding SRQ.
+   As result the maximal number of pre-posted WQEs on the SRQ will be 
increased */
+int32_t srq_limit;
}; typedef struct mca_btl_openib_srq_qp_info_t mca_btl_openib_srq_qp_info_t;

struct mca_btl_openib_qp_info_t {
@@ -254,6 +260,8 @@
ompi_free_list_t recv_user_free;
/**< frags for coalesced massages */
ompi_free_list_t send_free_coalesced;
+/**< Whether we want a dynamically resizing srq, enabled by default */
+bool enable_srq_resize;



/**< means that the comment refers to the field above.  I think you mean /* or /** 
here (although we haven't used doxygen for a long, long time).  I see that other 
fields are incorrectly marked /**<, but please don't propagate the badness. ;-)


  

}; typedef struct mca_btl_openib_component_t mca_btl_openib_component_t;

OMPI_MODULE_DECLSPEC extern mca_btl_openib_component_t mca_btl_openib_component;
@@ -348,6 +356,16 @@
int32_t sd_credits;  /* the max number of outstanding sends on a QP when 
using SRQ */
 /*  i.e. the number of frags that  can be outstanding 
(down counter) */
opal_list_t pending_frags[2];/**< list of high/low prio frags */
+/**< The number of max rd that we can post in the current time.
+ The value may be increased in the IBV_EVENT_SRQ_LIMIT_REACHED
+ event handler. The value starts from (rd_num / 4) and increased up to 
rd_num */
+int32_t rd_curr_num;



The comment says "max", but the field name is "curr"[ent]?  Seems a little odd.

  

+/**< We post additional WQEs only if a number of WQEs (in specific SRQ) is 
less of this value.
+ The value increased together with rd_curr_num. The value is unique 
for every SRQ. */
+int32_t rd_low_local;
+/**< The flag points if we want to get the 
+ IBV_EVENT_SRQ_LIMIT_REACHED events for dynamically resizing SRQ */

+bool srq_limit_event_flag;



Can you explain how this is different than enable_srq_resize?

  

}; typedef struct mca_btl_openib_module_srq_qp_t mca_btl_openib_module_srq_qp_t;

struct mca_btl_openib_module_qp_t {
diff -r a5938d9dcada ompi/mca/btl/openib/btl_openib_async.c
--- a/ompi/mca/btl/openib/btl_openib_async.cMon Nov 23 19:00:16 2009 -0800
+++ b/ompi/mca/btl/openib/btl_openib_async.cWed Dec 02 16:24:55 2009 +0200
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2008 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2008-2009 Mellanox Technologies. All r

Re: [OMPI devel] [Fwd: strong ordering for data registered memory]

2009-11-13 Thread Pavel Shamis (Pasha)

Jeff Squyres wrote:

On Nov 11, 2009, at 8:13 AM, Terry Dontje wrote:


Sun's IB group has asked me to forward the following email to see if
anyone has any comments on this email.



Tastes great / less filling. :-)

I think (assume) we'll be happy to implement changes like this that 
come from the upstream OpenFabrics verbs API (I see that this mail was 
first directed to the linux-rdma list, which is where the OF verbs API 
is discussed).


Sounds good. Now all that is left is to convince the OFA community to make 
these changes.


Pasha


Re: [OMPI devel] Adding support for RDMAoE devices.

2009-11-01 Thread Pavel Shamis (Pasha)

Jeff,
Can you please check that we do not break iWARP device functionality?

Pasha

Vasily Philipov wrote:
The attached patch adds support for RDMAoE (RDMA over Ethernet) 
devices to Openib BTL. The code changes are very minimal, actually we 
only modified the RDMACM code to provide better support for IB and 
RDMAoE devices. Please let me know if you have any comments.


Regards,Vasily.



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] Trunk is brokem ?

2009-10-21 Thread Pavel Shamis (Pasha)

It was broken :-(
I fixed it - r22119

Pasha

Pavel Shamis (Pasha) wrote:

On my systems I see the following error:

gcc -DHAVE_CONFIG_H -I. -I../../../../opal/include 
-I../../../../orte/include -I../../../../ompi/include 
-I../../../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../../.. 
-O3 -DNDEBUG -Wall -Wundef -Wno-long-long -Wsign-compare 
-Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic 
-Werror-implicit-function-declaration -finline-functions 
-fno-strict-aliasing -pthread -fvisibility=hidden -MT sensor_pru.lo 
-MD -MP -MF .deps/sensor_pru.Tpo -c sensor_pru.c -fPIC -DPIC -o 
.libs/sensor_pru.o

sensor_pru_component.c: In function 'orte_sensor_pru_open':
sensor_pru_component.c:77: error: implicit declaration of function 
'opal_output'


Looks like the sensor code is broken.

Thanks,
Pasha
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel





[OMPI devel] Trunk is brokem ?

2009-10-21 Thread Pavel Shamis (Pasha)

On my systems I see the following error:

gcc -DHAVE_CONFIG_H -I. -I../../../../opal/include 
-I../../../../orte/include -I../../../../ompi/include 
-I../../../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../../.. 
-O3 -DNDEBUG -Wall -Wundef -Wno-long-long -Wsign-compare 
-Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic 
-Werror-implicit-function-declaration -finline-functions 
-fno-strict-aliasing -pthread -fvisibility=hidden -MT sensor_pru.lo -MD 
-MP -MF .deps/sensor_pru.Tpo -c sensor_pru.c  -fPIC -DPIC -o 
.libs/sensor_pru.o

sensor_pru_component.c: In function 'orte_sensor_pru_open':
sensor_pru_component.c:77: error: implicit declaration of function 
'opal_output'


Looks like the sensor code is broken.

Thanks,
Pasha


Re: [OMPI devel] Device failover on ob1

2009-08-03 Thread Pavel Shamis (Pasha)



I have not, but there should be no difference.  The failover code only 
gets triggered when an error happens.  Otherwise, there are no 
differences in the code paths while everything is functioning normally.
Sounds good. I still have not had time to review the code. I will try to 
do it during this week.


Pasha


Rolf

On 08/03/09 11:14, Pavel Shamis (Pasha) wrote:

Rolf,
Did you compare latency/bw for failover-enabled code VS trunk ?

Pasha.

Rolf Vandevaart wrote:

Hi folks:

As some of you know, I have also been looking into implementing 
failover as well.  I took a different approach as I am solving the 
problem within the openib BTL itself.  This of course means that 
this only works for failing from one openib BTL to another but that 
was our area of interest.  This also means that we do not need to 
keep track of fragments as we get them back from the completion 
queue upon failure. We then extract the relevant information and 
repost on the other working endpoint.


My work has been progressing at 
http://bitbucket.org/rolfv/ompi-failover.


This only currently works for send semantics so you have to run with 
-mca btl_openib_flags 1.


Rolf

On 07/31/09 05:49, Mouhamed Gueye wrote:

Hi list,

Here is an update on our work concerning device failover.

As many of you suggested, we reoriented our work on ob1 rather than 
dr and we now have a working prototype on top of ob1. The approach 
is to store btl descriptors sent to peers and delete them when we 
receive proof of delivery. So far, we rely on completion callback 
functions, assuming that the message is delivered when the 
completion function is called, that is the case of openib. When a 
btl module fails, it is removed from the endpoint's btl list and 
the next one is used to retransmit stored descriptors. No 
extra-message is transmitted, it only consists in additions to the 
header. It has been mainly tested with two IB modules, in both 
multi-rail (two separate networks) and multi-path (a big unique 
network).


You can grab and test the patch here (applies on top of the trunk) :
http://bitbucket.org/gueyem/ob1-failover/

To compile with failover support, just define 
--enable-device-failover at configure. You can then run a 
benchmark, disconnect a port and see the failover operate.


A little latency increase (~ 2%) is induced by the failover layer 
when no failover occurs. To accelerate the failover process on 
openib, you can try to lower the btl_openib_ib_timeout openib 
parameter to 15 for example instead of 20 (default value).


Mouhamed
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel












Re: [OMPI devel] Device failover on ob1

2009-08-03 Thread Pavel Shamis (Pasha)

Rolf,
Did you compare latency/bw for failover-enabled code VS trunk ?

Pasha.

Rolf Vandevaart wrote:

Hi folks:

As some of you know, I have also been looking into implementing 
failover as well.  I took a different approach as I am solving the 
problem within the openib BTL itself.  This of course means that this 
only works for failing from one openib BTL to another but that was our 
area of interest.  This also means that we do not need to keep track 
of fragments as we get them back from the completion queue upon 
failure. We then extract the relevant information and repost on the 
other working endpoint.


My work has been progressing at http://bitbucket.org/rolfv/ompi-failover.

This only currently works for send semantics so you have to run with 
-mca btl_openib_flags 1.


Rolf

On 07/31/09 05:49, Mouhamed Gueye wrote:

Hi list,

Here is an update on our work concerning device failover.

As many of you suggested, we reoriented our work on ob1 rather than 
dr and we now have a working prototype on top of ob1. The approach is 
to store btl descriptors sent to peers and delete them when we 
receive proof of delivery. So far, we rely on completion callback 
functions, assuming that the message is delivered when the completion 
function is called, that is the case of openib. When a btl module 
fails, it is removed from the endpoint's btl list and the next one is 
used to retransmit stored descriptors. No extra-message is 
transmitted, it only consists in additions to the header. It has been 
mainly tested with two IB modules, in both multi-rail (two separate 
networks) and multi-path (a big unique network).


You can grab and test the patch here (applies on top of the trunk) :
http://bitbucket.org/gueyem/ob1-failover/

To compile with failover support, just define 
--enable-device-failover at configure. You can then run a benchmark, 
disconnect a port and see the failover operate.


A little latency increase (~ 2%) is induced by the failover layer 
when no failover occurs. To accelerate the failover process on 
openib, you can try to lower the btl_openib_ib_timeout openib 
parameter to 15 for example instead of 20 (default value).


Mouhamed
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel







Re: [OMPI devel] selectively bind MPI to one HCA out of available ones

2009-07-16 Thread Pavel Shamis (Pasha)

Hi,
You can select the IB device used by the openib BTL with the following parameters:
MCA btl: parameter "btl_openib_if_include" (current value: , data 
source: default value)
 Comma-delimited list of devices/ports to be 
used (e.g. "mthca0,mthca1:2"; empty value means to
 use all ports found).  Mutually exclusive with 
btl_openib_if_exclude.
MCA btl: parameter "btl_openib_if_exclude" (current value: , data 
source: default value)
 Comma-delimited list of device/ports to be 
excluded (empty value means to not exclude any
 ports).  Mutually exclusive with 
btl_openib_if_include.


For example, if you want to use the first port on mthca0, your command line 
will look like:


mpirun -np. --mca btl_openib_if_include mthca0:1 

Pasha

nee...@crlindia.com wrote:


Hi all,

I have a cluster where both HCAs of a blade are active, but 
connected to different subnets.
Is there an option in MPI to select one HCA out of the available 
ones? I know it can be done by making changes in the Open MPI code, but I 
need a clean interface, like an option at MPI launch time, to select 
mthca0 or mthca1.


Any help is appreciated. BTW, I just checked MVAPICH and the 
feature is there.


Regards

Neeraj Chourasia (MTS)
Computational Research Laboratories Ltd.
(A wholly Owned Subsidiary of TATA SONS Ltd)
B-101, ICC Trade Towers, Senapati Bapat Road
Pune 411016 (Mah) INDIA
(O) +91-20-6620 9863  (Fax) +91-20-6620 9862
M: +91.9225520634





___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] Multi-rail on openib

2009-06-14 Thread Pavel Shamis (Pasha)

Nifty Tom Mitchell wrote:

On Tue, Jun 09, 2009 at 04:33:51PM +0300, Pavel Shamis (Pasha) wrote:
  
Open MPI currently needs to have connected fabrics, but maybe that's  
something we will like to change in the future, having two separate  
rails. (Btw Pasha, will your current work enable this ?)
  
I do not completely understand what you mean here by two separate 
rails ...
Already today you may connect each port to a different subnet, and ports 
in the same

subnet may talk to each other.




Subnet?  (subnet .vs. fabric)
  

About subnet id definition you may read here.
http://www.open-mpi.org/faq/?category=openfabrics#ofa-set-subnet-id


Does this imply tcp/ip
What IB protocols are involved and
Is there any agent that notices the disconnect and will trigger the switch?

  
In Open MPI we use the RC (reliable connected) protocol. On a connection failure we get 
an error and handle it. If APM is enabled, Open MPI will try to migrate to the other 
path; otherwise we will fail.


Pasha.


Re: [OMPI devel] Multi-rail on openib

2009-06-09 Thread Pavel Shamis (Pasha)


Open MPI currently needs to have connected fabrics, but maybe that's 
something we will like to change in the future, having two separate 
rails. (Btw Pasha, will your current work enable this ?)
I do not completely understand what you mean here by two separate 
rails ...
Already today you may connect each port to a different subnet, and ports 
in the same

subnet may talk to each other.

Pasha.


Re: [OMPI devel] Multi-rail on openib

2009-06-09 Thread Pavel Shamis (Pasha)




Most of the IB protocols used by MPI target a LID.   There is no
existing notification path I know of that can replace LID-xyz with
LID-123.  The subnet manager might be able to do this but begs
security issues.

Interesting problem.
  

That is not exactly correct. For migration between ports on the same HCA 
you may use the IB APM feature; it is already implemented in Open MPI 1.3.x.

In the case of port migration between different HCAs we need to do it in 
software, and I guess that is what Sylvain is doing.
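
As a rough illustration of the HCA-local case, APM is armed at the verbs level 
along these lines (a minimal sketch only; the alternate-path fields would 
normally be filled in from real path information, and error handling is omitted):

/* Minimal sketch: arm Automatic Path Migration (APM) on an RC QP.
 * Assumes the QP is already connected (RTS) and that alt_dlid/alt_port
 * describe a valid alternate path; real code also sets alt_pkey_index,
 * SL, etc., and checks for errors. */
#include <infiniband/verbs.h>
#include <string.h>

static int arm_apm(struct ibv_qp *qp, uint16_t alt_dlid, uint8_t alt_port)
{
    struct ibv_qp_attr attr;
    memset(&attr, 0, sizeof(attr));

    attr.alt_ah_attr.dlid     = alt_dlid;      /* alternate destination LID */
    attr.alt_ah_attr.port_num = alt_port;
    attr.alt_port_num         = alt_port;      /* local port for the alternate path */
    attr.alt_timeout          = 14;
    attr.path_mig_state       = IBV_MIG_REARM; /* ask the HCA to re-arm migration */

    return ibv_modify_qp(qp, &attr,
                         IBV_QP_ALT_PATH | IBV_QP_PATH_MIG_STATE);
}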

Pasha.



Re: [OMPI devel] Open MPI mirrors list

2009-04-26 Thread Pavel Shamis (Pasha)

Thanks Jeff !

Jeff Squyres wrote:
We just added a mirror web site in Israel.  That one was fun because 
they requested to have their web tagline be in Hebrew.


Fun fact: with this addition, Open MPI now has 14 mirror web sites 
around the world.


http://www.open-mpi.org/community/mirrors/

Interested in becoming an Open MPI mirror?  See this web page:

http://www.open-mpi.org/community/mirrors/become-a-mirror.php






Re: [OMPI devel] ***SPAM*** Re: [ewg] Seg fault running OpenMPI-1.3.1rc4

2009-03-30 Thread Pavel Shamis (Pasha)

Steve,
If you compile the OMPI code with CFLAGS="-g", generate a segfault 
core file, and send the core + the IMB-MPI1 binary to me, I will be able to 
understand the problem better.


Regards,
Pasha

Steve Wise wrote:


Hey Pasha,


I just applied r20872 and retested, and I still hit this seg fault.  
So I think this is a new bug.


Lemme pull the trunk and try that.



Pavel Shamis (Pasha) wrote:
I think your problem is related to this bug: 
https://svn.open-mpi.org/trac/ompi/ticket/1823


And it is resolved on the ompi-trunk.

Pasha.

Steve Wise wrote:
When this happens, that node logs this type of message also in 
/var/log/messages:


IMB-MPI1[8859]: segfault at 0018 rip 2b7bfc880800 
rsp 7fffb1021330 error 4


Steve Wise wrote:

Hey Jeff,

Have you seen this?  I'm hitting this regularly running on 
ofed-1.4.1-rc2.


Test:
[ompi@vic12 ~]$ cat doit-ompi
#!/bin/sh
while : ; do
   mpirun -np 16 --host vic12-10g,vic20-10g,vic9-10g,vic21-10g 
--mca btl openib,self,sm  --mca btl_openib_max_btls 1 
/usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 -npmin 16  
bcast scatter sendrecv exchange 
done


Seg Fault output:

[vic21:04047] *** Process received signal ***
[vic21:04047] Signal: Segmentation fault (11)
[vic21:04047] Signal code: Address not mapped (1)
[vic21:04047] Failing at address: 0x18
[vic21:04047] [ 0] /lib64/libpthread.so.0 [0x3dde20e4c0]
[vic21:04047] [ 1] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so 
[0x2b911bc33800]
[vic21:04047] [ 2] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so 
[0x2b911bc38c2d]
[vic21:04047] [ 3] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so 
[0x2b911bc33fcb]
[vic21:04047] [ 4] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so 
[0x2b911bc22af8]
[vic21:04047] [ 5] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libopen-pal.so.0(mca_base_components_close+0x83) 
[0x2b911933da33]
[vic21:04047] [ 6] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libmpi.so.0(mca_btl_base_close+0xe0) 
[0x2b9118ea3fb0]
[vic21:04047] [ 7] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_bml_r2.so 
[0x2b911ba1938f]
[vic21:04047] [ 8] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_pml_ob1.so 
[0x2b911b601cde]
[vic21:04047] [ 9] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libmpi.so.0 
[0x2b9118e7241b]
[vic21:04047] [10] 
/usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1(main+0x178) 
[0x403498]
[vic21:04047] [11] /lib64/libc.so.6(__libc_start_main+0xf4) 
[0x3ddd61d974]
[vic21:04047] [12] 
/usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 [0x403269]

[vic21:04047] *** End of error message ***

___
ewg mailing list
e...@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel








Re: [OMPI devel] [ewg] Seg fault running OpenMPI-1.3.1rc4

2009-03-30 Thread Pavel Shamis (Pasha)
I think your problem is related to this bug: 
https://svn.open-mpi.org/trac/ompi/ticket/1823


And it is resolved on the ompi-trunk.

Pasha.

Steve Wise wrote:
When this happens, that node logs this type of message also in 
/var/log/messages:


IMB-MPI1[8859]: segfault at 0018 rip 2b7bfc880800 rsp 
7fffb1021330 error 4


Steve Wise wrote:

Hey Jeff,

Have you seen this?  I'm hitting this regularly running on 
ofed-1.4.1-rc2.


Test:
[ompi@vic12 ~]$ cat doit-ompi
#!/bin/sh
while : ; do
   mpirun -np 16 --host vic12-10g,vic20-10g,vic9-10g,vic21-10g 
--mca btl openib,self,sm  --mca btl_openib_max_btls 1 
/usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 -npmin 16  bcast 
scatter sendrecv exchange 
done


Seg Fault output:

[vic21:04047] *** Process received signal ***
[vic21:04047] Signal: Segmentation fault (11)
[vic21:04047] Signal code: Address not mapped (1)
[vic21:04047] Failing at address: 0x18
[vic21:04047] [ 0] /lib64/libpthread.so.0 [0x3dde20e4c0]
[vic21:04047] [ 1] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so 
[0x2b911bc33800]
[vic21:04047] [ 2] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so 
[0x2b911bc38c2d]
[vic21:04047] [ 3] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so 
[0x2b911bc33fcb]
[vic21:04047] [ 4] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so 
[0x2b911bc22af8]
[vic21:04047] [ 5] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libopen-pal.so.0(mca_base_components_close+0x83) 
[0x2b911933da33]
[vic21:04047] [ 6] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libmpi.so.0(mca_btl_base_close+0xe0) 
[0x2b9118ea3fb0]
[vic21:04047] [ 7] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_bml_r2.so 
[0x2b911ba1938f]
[vic21:04047] [ 8] 
/usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_pml_ob1.so 
[0x2b911b601cde]
[vic21:04047] [ 9] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libmpi.so.0 
[0x2b9118e7241b]
[vic21:04047] [10] 
/usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1(main+0x178) 
[0x403498]
[vic21:04047] [11] /lib64/libc.so.6(__libc_start_main+0xf4) 
[0x3ddd61d974]
[vic21:04047] [12] 
/usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 [0x403269]

[vic21:04047] *** End of error message ***

___
ewg mailing list
e...@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel





Re: [OMPI devel] Meta Question -- Open MPI: Is it a dessert toppingor is it a floor wax?

2009-03-13 Thread Pavel Shamis (Pasha)

I would like to add my 0.02 to this discussion.
The "code stability" it is not something that should stop project from
development and progress. During MPI Project live we already saw some
pretty critical changes (pml/btl/etc...) and as result after all we have
more stable and more optimized MPI. OMPI already have defined
workflow that allow to us handle big code changes without loosing  code
stability for a long period of time.  As I understand the btl movement
will make the code more modular and will open window for new features
and optimizations.
So if the BTL movement will not effect negatively on MPI performance and
code stability (for a long period of time, temporary unstability may be
reasonable) I do not see any constructive reason to block this
modification. If I understand correct the code will go to the
feature_oriented branch and we will have enough time to run QA cycles
before it will be moved to super_stable mode.
And after all I hope that we will get more stable,modular and feature
rich MPI implementation.

Thanks,
Pasha


Brian W. Barrett wrote:
I'm going to stay out of the debate about whether Andy correctly 
characterized the two points you brought up as a distributed OS or not.


Sandia's position on these two points remains the same as I previously 
stated when the question was distributed OS or not.  The primary goal 
of the Open MPI project was and should remain to be the best MPI 
project available.  Low-cost items to support different run-times or 
different non-MPI communication contexts are worth the work.  But 
high-cost items should be avoided, as they degrade our ability to 
provide the best MPI project available (of course, others, including 
OMPI developers, can take the source and do what they wish outside the 
primary development tree).


High performance is a concern, but so is code maintainability.  If it 
takes twices as long to implement feature A because I have to worry 
about it's impact not only on MPI, but also on projects X, Y, Z, as an 
MPI developer, I've lost something important.


Brian

On Thu, 12 Mar 2009, Richard Graham wrote:

I am assuming that by distributed OS you are referring to the changes 
that

we (not just ORNL) are trying to do.  If this is the case, this is a
mischaracterization of our intentions.  We have two goals:

 - To be able to use a different run-time than ORTE to drive Open MPI
 - To use the communication primitives outside the context of MPI 
(with or

without ORTE)

High performance is critical, and at NO time have we ever said anything
about sacrificing performance - these have been concerns that others
(rightfully) have expressed.

Rich


On 3/12/09 8:24 AM, "Jeff Squyres"  wrote:


I think I have to agree with Terry.

I love to inspire and see new, original, and unintended uses for Open
MPI.  But our primary focus must remain to create, maintain, and
continue to deliver a high performance MPI implementation.

We have a long history of adding "small" things to Open MPI that are
useful to 3rd parties because it helps them, helps further Open MPI
adoption/usefulness, and wasn't difficult for us to do ("small" can
have varying definitions).  I'm in favor of such things, as long as we
maintain a policy of "in cases of conflict, OMPI/high performance MPI
wins".


On Mar 12, 2009, at 9:01 AM, Terry Dontje wrote:


Sun's participation in this community was to obtain a stable and
performant MPI implementation that had some research work done on the
side to improve those goals and the introduction of new features.   We
don't have problems with others using and improving on the OMPI code
base but we need to make sure such usage doesn't detract from our
primary goal of performant MPI implementation.

However, changes to the OMPI code base to allow it to morph or even
support a distributed OS does cause for some concern.  That is are we
opening the door to having more interfaces to support?  If so is this
wise in the fact that it seems to me we have a hard enough time trying
to focus on the MPI items?  Not to mention this definitely starts
detracting from the original goals.

--td

Andrew Lumsdaine wrote:

Hi all -- There is a meta question that I think is underlying some

of

the discussion about what to do with BTLs etc.  Namely, is Open

MPI an

MPI implementation with a portable run time system -- or is it a
distributed OS with an MPI interface?  It seems like some of the
changes being asked for (e.g., with the BTLs) reflect the latter --
but perhaps not everyone shares that view and hence the impedance
mismatch.

I doubt this is the last time that tensions will come up because of
differing views on this question.

I suggest that we come to some kind of common understanding of the
question (and answer) and structure development and administration
accordingly.

Best Regards,
Andrew Lumsdaine

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


__

Re: [OMPI devel] VT compile error: Fwd: [ofa-general] OFED 1.4.1 (rc1) is available

2009-03-05 Thread Pavel Shamis (Pasha)
Adding pointer to OFED bugzilla ticket for more information: 
https://bugs.openfabrics.org/show_bug.cgi?id=1537


Jeff Squyres wrote:

VT guys --

It looks like we still have a compile bug in OMPI 1.3.1rc4...  See below.

Do you think you can get a fix ASAP for OMPI 1.3.1final?


Begin forwarded message:


*From: *"PN" mailto:pok...@gmail.com>>
*Date: *March 5, 2009 12:51:28 AM EST
*To: *"Tziporet Koren" >
*Cc: *mailto:e...@lists.openfabrics.org>>, 
mailto:gene...@lists.openfabrics.org>>

*Subject:** Re: [ofa-general] OFED 1.4.1 is available*

HI,

I have a build error of OFED-1.4.1-rc1 under CentOS 5.2:

.
Build openmpi_gcc RPM
Running  rpmbuild --rebuild  --define '_topdir /var/tmp/OFED_topdir' 
--define 'dist %{nil}' --target x86_64 --define '_name openmpi_gcc' 
--define 'mpi_selector /usr/bin/mpi-selector' --define 
'use_mpi_selector 1' --define 'install_shell_scripts 1' --define 
'shell_scripts_basename mpivars' --define '_usr /usr' --define 'ofed 
0' --define '_prefix /usr/mpi/gcc/openmpi-1.3.1rc4' --define 
'_defaultdocdir /usr/mpi/gcc/openmpi-1.3.1rc4' --define '_mandir 
%{_prefix}/share/man' --define 'mflags -j 4' --define 
'configure_options   --with-openib=/usr 
--with-openib-libdir=/usr/lib64  CC=gcc CXX=g++ F77=gfortran 
FC=gfortran --enable-mpirun-prefix-by-default' --define 
'use_default_rpm_opt_flags 1' 
/opt/software/packages/ofed/OFED-1.4.1-rc1/OFED-1.4.1-rc1/SRPMS/openmpi-1.3.1rc4-1.src.rpm

Failed to build openmpi RPM
See /tmp/OFED.28377.logs/openmpi.rpmbuild.log


In /tmp/OFED.28377.logs/openmpi.rpmbuild.log:
.
gcc -DHAVE_CONFIG_H -I. -I.. -I../tools/opari/lib 
-I../extlib/otf/otflib -I../extlib/otf/otflib -D_GNU_SOURCE 
-DBINDIR=\"/usr/mpi/gcc/openmpi-1.3.1rc4/bin\" 
-DDATADIR=\"/usr/mpi/gcc/openmpi-1.3.1rc4/share\" -DRFG  -DVT_MEMHOOK 
-DVT_IOWRAP  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
-fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -MT 
vt_iowrap_helper.o -MD -MP -MF .deps/vt_iowrap_helper.Tpo -c -o 
vt_iowrap_helper.o vt_iowrap_helper.c

mv -f .deps/vt_memhook.Tpo .deps/vt_memhook.Po
gcc -DHAVE_CONFIG_H -I. -I.. -I../tools/opari/lib 
-I../extlib/otf/otflib -I../extlib/otf/otflib -D_GNU_SOURCE 
-DBINDIR=\"/usr/mpi/gcc/openmpi-1.3.1rc4/bin\" 
-DDATADIR=\"/usr/mpi/gcc/openmpi-1.3.1rc4/share\" -DRFG  -DVT_MEMHOOK 
-DVT_IOWRAP  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
-fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -MT 
rfg_regions.o -MD -MP -MF .deps/rfg_regions.Tpo -c -o rfg_regions.o 
rfg_regions.c
vt_iowrap.c:1242: error: expected declaration specifiers or '...' 
before numeric constant

vt_iowrap.c:1243: error: conflicting types for '__fprintf_chk'
mv -f .deps/vt_iowrap_helper.Tpo .deps/vt_iowrap_helper.Po
make[5]: *** [vt_iowrap.o] Error 1
make[5]: *** Waiting for unfinished jobs
mv -f .deps/vt_comp_gnu.Tpo .deps/vt_comp_gnu.Po
mv -f .deps/rfg_regions.Tpo .deps/rfg_regions.Po
make[5]: Leaving directory 
`/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi/contrib/vt/vt/vtlib'

make[4]: *** [all-recursive] Error 1
make[4]: Leaving directory 
`/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi/contrib/vt/vt'

make[3]: *** [all] Error 2
make[3]: Leaving directory 
`/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi/contrib/vt/vt'

make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory 
`/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi/contrib/vt'

make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory 
`/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi'

make: *** [all-recursive] Error 1
error: Bad exit status from /var/tmp/rpm-tmp.40739 (%build)


RPM build errors:
user jsquyres does not exist - using root
group eng10 does not exist - using root
user jsquyres does not exist - using root
group eng10 does not exist - using root
Bad exit status from /var/tmp/rpm-tmp.40739 (%build)


The error seems similar to 
http://www.open-mpi.org/community/lists/devel/2009/01/5230.php


Regards,
PN


2009/3/5 Tziporet Koren >


Hi,

OFED-1.4.1-rc1 release is available on

_http://www.openfabrics.org/downloads/OFED/ofed-1.4.1/OFED-1.4.1-rc1.tgz_


To get BUILD_ID run ofed_info

Please report any issues in bugzilla
_https://bugs.openfabrics.org/_ for OFED 1.4.1

Vladimir & Tziporet




Release information:

--

Linux Operating Systems:

   - RedHat EL4 up4:  2.6.9-42.ELsmp  *

   - RedHat EL4 up5:  2.6.9-55.ELsmp

   - RedHat EL4 up6:  2.6.9-67.ELsmp

   - RedHat EL4 up7:2.6.9-78.ELsmp

   - RedHat EL5:2.6.18-8.el5

   - RedHat EL5 up1:  2.6.18-53.el5

   - RedHat EL5 up2:  2.6.18-92.el5

   - RedHat EL5 up3:  2.6.18-128.el5

   - OEL 4.5:   2.6.9-55.ELsmp

   - OEL 5.2: 

Re: [OMPI devel] 1.3.1rc3 was borked; 1.3.1rc4 is out

2009-03-05 Thread Pavel Shamis (Pasha)
I tried to build the latest OFED with the new OMPI rc4, but it looks like the VT 
code is broken again:


gcc -DHAVE_CONFIG_H -I. -I.. -I../tools/opari/lib -I../extlib/otf/otflib 
-I../extlib/otf/otflib -D_GNU_SOURCE 
-DBINDIR=\"/usr/local/mpi/gcc/openmpi-1.3.1rc4/bin\" 
-DDATADIR=\"/usr/local/mpi/gcc/openmpi-1.3.1rc4/share\" -DRFG  
-DVT_MEMHOOK -DVT_IOWRAP  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 
-march=i386 -mtune=generic -fasynchronous-unwind-tables -MT 
vt_iowrap_helper.o -MD -MP -MF .deps/vt_iowrap_helper.Tpo -c -o 
vt_iowrap_helper.o vt_iowrap_helper.c
gcc -DHAVE_CONFIG_H -I. -I.. -I../tools/opari/lib -I../extlib/otf/otflib 
-I../extlib/otf/otflib -D_GNU_SOURCE 
-DBINDIR=\"/usr/local/mpi/gcc/openmpi-1.3.1rc4/bin\" 
-DDATADIR=\"/usr/local/mpi/gcc/openmpi-1.3.1rc4/share\" -DRFG  
-DVT_MEMHOOK -DVT_IOWRAP  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 
-march=i386 -mtune=generic -fasynchronous-unwind-tables -MT 
rfg_regions.o -MD -MP -MF .deps/rfg_regions.Tpo -c -o rfg_regions.o 
rfg_regions.c
vt_iowrap.c:1242: error: expected declaration specifiers or '...' 
before numeric constant

vt_iowrap.c:1243: error: conflicting types for '__fprintf_chk'
gcc -DHAVE_CONFIG_H -I. -I.. -I../tools/opari/lib 
-I../extlib/otf/otflib -I../extlib/otf/otflib -D_GNU_SOURCE 
-DBINDIR=\"/usr/local/mpi/gcc/openmpi-1.3.1rc4/bin\" 
-DDATADIR=\"/usr/local/mpi/gcc/openmpi-1.3.1rc4/share\" -DRFG  
-DVT_MEMHOOK -DVT_IOWRAP  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 
-march=i386 -mtune=generic -fasynchronous-unwind-tables -MT rfg_filter.o 
-MD -MP -MF .deps/rfg_filter.Tpo -c -o rfg_filter.o rfg_filter.c

make[5]: *** [vt_iowrap.o] Error 1
make[5]: *** Waiting for unfinished jobs
mv -f .deps/vt_iowrap_helper.Tpo .deps/vt_iowrap_helper.Po
mv -f .deps/rfg_filter.Tpo .deps/rfg_filter.Po
mv -f .deps/rfg_regions.Tpo .deps/rfg_regions.Po
make[5]: Leaving directory 
`/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi/contrib/vt/vt/vtlib'

make[4]: *** [all-recursive] Error 1
make[4]: Leaving directory 
`/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi/contrib/vt/vt'

make[3]: *** [all] Error 2
make[3]: Leaving directory 
`/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi/contrib/vt/vt'

make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory 
`/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi/contrib/vt'

make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory 
`/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi'

make: *** [all-recursive] Error 1
error: Bad exit status from /var/tmp/rpm-tmp.43206 (%build)



Ralph H. Castain wrote:

Looks okay to me Brian - I went ahead and filed the CMR and sent it on to
Brad for approval.

Ralph


  

On Tue, 3 Mar 2009, Brian W. Barrett wrote:



On Tue, 3 Mar 2009, Jeff Squyres wrote:

  

1.3.1rc3 had a race condition in the ORTE shutdown sequence.  The only
difference between rc3 and rc4 was a fix for that race condition.
Please
test ASAP:

  http://www.open-mpi.org/software/ompi/v1.3/


I'm sorry, I've failed to test rc1 & rc2 on Catamount.  I'm getting a
compile
failure in the ORTE code.  I'll do a bit more testing and send Ralph an
e-mail this afternoon.
  

Attached is a patch against v1.3 branch that makes it work on Red Storm.
I'm not sure it's right, so I'm just e-mailing it rather than committing..
Sorry Ralph, but can you take a look? :(

Brian___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

  




Re: [OMPI devel] Fwd: [OMPI users] OpenMPI with openib partitions

2008-10-07 Thread Pavel Shamis (Pasha)

Matt,
For all 1.2.X versions you should use btl_openib_ib_pkey_val
In ongoing 1.3 version the parameter was renamed to btl_openib_of_pkey_val.

BTW we plan to release 1.2.8 version very soon and it will include the 
partition bug fix.


Regards,
Pasha

Matt Burgess wrote:

Pasha,

With your patch and parameter suggestion, it works! So to be clear 
btl_openib_ib_pkey_val is for 1.2.6 and btl_openib_of_pkey_val is for 
1.2.7?


Thanks again,
Matt

On Tue, Oct 7, 2008 at 12:24 PM, Pavel Shamis (Pasha) 
mailto:pa...@dev.mellanox.co.il>> wrote:


Matt,
Can you please run " cat
/sys/class/infiniband/mlx4_0/ports/1/pkeys/* " on your d2-ib,d3-ib.
I would like to check the partition configuration.

    Oh, BTW, I see that the command line in the previous email was wrong.
    Please use the following command line (the parameter name should be
    "btl_openib_ib_pkey_val" for OMPI 1.2.6, and my patch accepts
    HEX/DEC values):
/opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -H d2-ib,d3-ib -mca btl
openib,self -mca btl_openib_ib_pkey_val 0x8109
/cluster/pallas/x86_64-ib/IMB-MPI1

Ompi 1.2.6 version should work ok with this patch.


Thanks,
Pasha

Matt Burgess wrote:

Pasha,

Thanks for the patch. Unfortunately, it doesn't seem like that
fixed the problem. I realized earlier I didn't mention what
version of OpenMPI I was trying - it's 1.2.6. Should I be trying 1.2.7 with this patch?

    Thanks,
Matt

2008/10/7 Pavel Shamis (Pasha) mailto:pa...@dev.mellanox.co.il>
<mailto:pa...@dev.mellanox.co.il
<mailto:pa...@dev.mellanox.co.il>>>


   Matt,
   Can you please try attached patch ? I guess it will resolve
this
   issue.

   Thanks,
   Pasha

   Matt Burgess wrote:

   Lenny,

    Thanks for the info. It still doesn't seem to be
    working.
   My command line is:

   /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -H d2-ib,d3-ib
-mca btl
   openib,self -mca btl_openib_of_pkey_val 33033
   /cluster/pallas/x86_64-ib/IMB-MPI1

   I don't have a
"/sys/class/infiniband/mthca0/ports/1/pkeys/"
   but I do have
"/sys/class/infiniband/mlx4_0/ports/1/pkeys/".
   It's contents are:

    0 through 127  (one file per pkey table index)
   We aren't using the opensm, but voltaire's SM on a 2012
switch.

   Thanks again,
   Matt


   On Tue, Oct 7, 2008 at 9:37 AM, Lenny Verkhovsky
   mailto:lenny.verkhov...@gmail.com>
   <mailto:lenny.verkhov...@gmail.com
<mailto:lenny.verkhov...@gmail.com>>
   <mailto:lenny.verkhov...@gmail.com
<mailto:lenny.verkhov...@gmail.com>
   <mailto:lenny.verkhov...@gmail.com
<mailto:lenny.verkhov...@gmail.com>>>> wrote:

  Hi Matt,

   It seems that the right way to do it is the following:

  -mca btl openib,self -mca btl_openib_ib_pkey_val 33033

  when the value is a decimal number of the pkey, in
your case
  0x8109 = 33033, and no need for
btl_openib_ib_pkey_ix value.

  ex.
  mpirun -np 2 -H witch2,witch3 -mca btl openib,self -mca
  btl_openib_ib_pkey_val 32769 ./mpi_p1_4_1_2 -t lt
  LT (2) (size min max avg) 1 3.511429 3.511429 3.511429

  if it's not working check cat
  /sys/class/infiniband/mthca0/ports/1/pkeys/* for
pkeys ans SM,
  maybe it's a setup.

  Pasha is currently checking this issue.

  Best regards,

  Lenny.





  On 10/7/08, *Jeff Squyres* mailto:jsquy...@cisco.com>
   <mailto:jsquy...@cisco.c

[OMPI devel] OpenIB BTL - removing btl_openib_ib_pkey_ix parameter

2008-10-07 Thread Pavel Shamis (Pasha)

Hi,

I would like to remove the btl_openib_ib_pkey_ix parameter.
The partition key index (pkey_ix) is defined locally by 
each HCA, so in most cases each host will have a different pkey index and
the user has no control over this value. Direct pkey_ix specification is therefore 
not possible, because each HCA will have a different value.
If the user wants to use a specific partition, he should specify only the 
btl_openib_ib_pkey_val parameter, and Open MPI translates it to the 
corresponding pkey_ix.
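
For reference, the value-to-index translation is essentially a scan of the 
port's pkey table. A minimal sketch (the 0x8000 bit only marks full membership, 
so it is masked off before comparing, as the openib BTL does):

/* Minimal sketch: translate a pkey value into the local pkey table index.
 * The top bit (0x8000) only encodes full/limited membership, so it is
 * masked off before comparing.  Returns the index, or -1 if not found. */
#include <infiniband/verbs.h>
#include <arpa/inet.h>

static int pkey_val_to_index(struct ibv_context *ctx, uint8_t port,
                             int max_pkeys, uint16_t wanted_val)
{
    uint16_t pkey;
    int ix;

    for (ix = 0; ix < max_pkeys; ix++) {
        if (ibv_query_pkey(ctx, port, ix, &pkey) != 0)
            continue;                          /* could not read this slot */
        if ((ntohs(pkey) & 0x7fff) == (wanted_val & 0x7fff))
            return ix;
    }
    return -1;
}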


I think the btl_openib_ib_pkey_ix is useless and I would like to remove it.

Comments ?

--
Pavel Shamis (Pasha)
Mellanox Technologies LTD.



Re: [OMPI devel] Fwd: [OMPI users] OpenMPI with openib partitions

2008-10-07 Thread Pavel Shamis (Pasha)

Matt,
Can you please run " cat /sys/class/infiniband/mlx4_0/ports/1/pkeys/* " 
on your d2-ib,d3-ib.

I would like to check the partition configuration.

Oh, BTW, I see that the command line in the previous email was wrong.
Please use the following command line (the parameter name should be 
"btl_openib_ib_pkey_val" for OMPI 1.2.6, and my patch accepts HEX/DEC 
values):
/opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -H d2-ib,d3-ib -mca btl 
openib,self -mca btl_openib_ib_pkey_val 0x8109 
/cluster/pallas/x86_64-ib/IMB-MPI1


Ompi 1.2.6 version should work ok with this patch.

Thanks,
Pasha

Matt Burgess wrote:

Pasha,

Thanks for the patch. Unfortunately, it doesn't seem like that fixed 
the problem. I realized earlier I didn't mention what version of 
OpenMPI I was trying - it's 1.2.6. Should I be trying 
1.2.7 with this patch?


Thanks,
Matt

2008/10/7 Pavel Shamis (Pasha) <mailto:pa...@dev.mellanox.co.il>>


Matt,
Can you please try attached patch ? I guess it will resolve this
issue.

Thanks,
Pasha

Matt Burgess wrote:

Lenny,

Thanks for the info. It still doesn't seem to be working.
My command line is:

/opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -H d2-ib,d3-ib -mca btl
openib,self -mca btl_openib_of_pkey_val 33033
/cluster/pallas/x86_64-ib/IMB-MPI1

I don't have a "/sys/class/infiniband/mthca0/ports/1/pkeys/"
but I do have "/sys/class/infiniband/mlx4_0/ports/1/pkeys/".
It's contents are:

0 through 127  (one file per pkey table index)
We aren't using the opensm, but voltaire's SM on a 2012 switch.

Thanks again,
Matt


On Tue, Oct 7, 2008 at 9:37 AM, Lenny Verkhovsky
mailto:lenny.verkhov...@gmail.com>
<mailto:lenny.verkhov...@gmail.com
<mailto:lenny.verkhov...@gmail.com>>> wrote:

   Hi Matt,

    It seems that the right way to do it is the following:

   -mca btl openib,self -mca btl_openib_ib_pkey_val 33033

   when the value is a decimal number of the pkey, in your case
   0x8109 = 33033, and no need for btl_openib_ib_pkey_ix value.

   ex.
   mpirun -np 2 -H witch2,witch3 -mca btl openib,self -mca
   btl_openib_ib_pkey_val 32769 ./mpi_p1_4_1_2 -t lt
   LT (2) (size min max avg) 1 3.511429 3.511429 3.511429

    if it's not working, check cat
    /sys/class/infiniband/mthca0/ports/1/pkeys/* for the pkeys and the SM;
    maybe it's a setup issue.

   Pasha is currently checking this issue.

   Best regards,

   Lenny.





   On 10/7/08, *Jeff Squyres* mailto:jsquy...@cisco.com>
   <mailto:jsquy...@cisco.com <mailto:jsquy...@cisco.com>>> wrote:

   FWIW, if this configuration is for all of your users, you
   might want to specify these MCA params in the default MCA
   param file, or the environment, ...etc.  Just so that you
   don't have to specify it on every mpirun command line.

   See
 
 http://www.open-mpi.org/faq/?category=tuning#setting-mca-params.




   On Oct 7, 2008, at 5:43 AM, Lenny Verkhovsky wrote:

   Sorry, misunderstood the question,

   thanks for Pasha the right command line will be

   -mca btl openib,self -mca btl_openib_of_pkey_val 0x8109
   -mca btl_openib_of_pkey_ix 1

   ex.

   #mpirun -np 2 -H witch2,witch3 -mca btl openib,self
-mca
   btl_openib_of_pkey_val 0x8001 -mca
btl_openib_of_pkey_ix 1
   ./mpi_p1_4_TRUNK -t lt
   LT (2) (size min max avg) 1 3.443480 3.443480 3.443480


   Best regards

   Lenny.


   On 10/6/08, Jeff Squyres mailto:jsquy...@cisco.com>
   <mailto:jsquy...@cisco.com
<mailto:jsquy...@cisco.com>>> wrote: On Oct 5, 2008, at

   1:22 PM, Lenny Verkhovsky wrote:

   you should probably use -mca tcp,self  -m

Re: [OMPI devel] Fwd: [OMPI users] OpenMPI with openib partitions

2008-10-07 Thread Pavel Shamis (Pasha)
 parameter I am missing?

I was successful using tcp only:

/opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile
machinefile -mca btl tcp,self -mca btl_openib_max_btls 1
-mca btl_openib_ib_pkey_val 0x8109
/cluster/pallas/x86_64-ib/IMB-MPI1



Thanks,
Matt Burgess

___
users mailing list
us...@open-mpi.org <mailto:us...@open-mpi.org>
http://www.open-mpi.org/mailman/listinfo.cgi/users

___
users mailing list
us...@open-mpi.org <mailto:us...@open-mpi.org>
http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres

Cisco Systems


___
users mailing list
us...@open-mpi.org <mailto:us...@open-mpi.org>
http://www.open-mpi.org/mailman/listinfo.cgi/users



-- 
Jeff Squyres

Cisco Systems






_______
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
--
Pavel Shamis (Pasha)
Mellanox Technologies LTD.

Index: ompi/mca/btl/openib/btl_openib_component.c
===
--- ompi/mca/btl/openib/btl_openib_component.c  (revision 19490)
+++ ompi/mca/btl/openib/btl_openib_component.c  (working copy)
@@ -558,7 +558,7 @@ static int init_one_hca(opal_list_t *btl
  goto dealloc_pd;
 }

-ret = OMPI_SUCCESS; 
+ret = OMPI_SUCCESS;
 /* Note ports are 1 based hence j = 1 */
 for(i = 1; i <= hca->ib_dev_attr.phys_port_cnt; i++){
 struct ibv_port_attr ib_port_attr;
@@ -580,7 +580,7 @@ static int init_one_hca(opal_list_t *btl
 uint16_t pkey,j;
 for (j=0; j < hca->ib_dev_attr.max_pkeys; j++) {
 ibv_query_pkey(hca->ib_dev_context, i, j, &pkey);
-pkey=ntohs(pkey);
+pkey=ntohs(pkey) & 0x7fff;
 if(pkey == mca_btl_openib_component.ib_pkey_val){
 ret = init_one_port(btl_list, hca, i, j, 
&ib_port_attr);
 break;
Index: ompi/mca/btl/openib/btl_openib_ini.c
===
--- ompi/mca/btl/openib/btl_openib_ini.c(revision 19490)
+++ ompi/mca/btl/openib/btl_openib_ini.c(working copy)
@@ -90,8 +90,6 @@ static int parse_line(parsed_section_val
 static void reset_section(bool had_previous_value, parsed_section_values_t *s);
 static void reset_values(ompi_btl_openib_ini_values_t *v);
 static int save_section(parsed_section_values_t *s);
-static int intify(char *string);
-static int intify_list(char *str, uint32_t **values, int *len);
 static inline void show_help(const char *topic);


@@ -364,14 +362,14 @@ static int parse_line(parsed_section_val
all whitespace at the beginning and ending of the value. */

 if (0 == strcasecmp(key_buffer, "vendor_id")) {
-if (OMPI_SUCCESS != (ret = intify_list(value, &sv->vendor_ids, 
+if (OMPI_SUCCESS != (ret = ompi_btl_openib_ini_intify_list(value, 
&sv->vendor_ids, 
&sv->vendor_ids_len))) {
 return ret;
 }
 }

 else if (0 == strcasecmp(key_buffer, "vendor_part_id")) {
-if (OMPI_SUCCESS != (ret = intify_list(value, &sv->vendor_part_ids,
+if (OMPI_SUCCESS != (ret = ompi_btl_openib_ini_intify_list(value, 
&sv->vendor_part_ids,
&sv->vendor_part_ids_len))) {
 return ret;
 }
@@ -379,13 +377,13 @@ static int parse_line(parsed_section_val

 else if (0 == strcasecmp(key_buffer, "mtu")) {
 /* Single value */
-sv->values.mtu = (uint32_t) intify(value);
+sv->values.mtu = (uint32_t) ompi_btl_openib_ini_intify(value);
 sv->values.mtu_set = true;
 }

 else if (0 == strcasecmp(key_buffer, "use_eager_rdma")) {
 /* Single value */
-sv->values.use_eager_rdma = (uint32_t) intify(value);
+sv->values.use_eager_rdma = (uint32_t) 
ompi_btl_openib_ini_intify(value);
 sv->values.use_eager_rdma_set = true;
 }

@@ -547,7 +545,7 @@ static int save_section(parsed_section_v
 /*
  * Do string-to-integer conversion, for both hex and decimal numbers
  */
-static int intify(char *str)
+int ompi_btl_openib_ini_intify(char *str)
 {
 while (isspace(*str)) {
 ++str;
@@ -568,7 +566,7 @@ static int intify(char *str)
 /*
  * Take a comma-delimite
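
For context, the intify helper that the patch above renames does the usual 
hex/decimal string conversion. A standalone sketch of that idiom (not the 
exact OMPI code) is simply strtol with base 0:

/* Standalone sketch of an "intify"-style helper: parse a decimal or
 * 0x-prefixed hexadecimal string.  Not the exact OMPI implementation,
 * just the usual strtol idiom it is based on. */
#include <stdio.h>
#include <stdlib.h>

static long intify(const char *str)
{
    /* base 0 lets strtol autodetect "33033", "0x8109", etc. */
    return strtol(str, NULL, 0);
}

int main(void)
{
    printf("%ld %ld\n", intify("33033"), intify("0x8109"));  /* prints: 33033 33033 */
    return 0;
}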

Re: [OMPI devel] openib component lock

2008-08-06 Thread Pavel Shamis (Pasha)

Jeff Squyres wrote:
In working on https://svn.open-mpi.org/trac/ompi/ticket/1434, I see 
fairly inconsistent use of the mca_openib_btl_component.ib_lock, such 
as within btl_openib_proc.c.


In fixing #1434, do I need to be mindful doing all the locking 
properly, or is it already so broken that it doesn't really matter, 
and all I need to do is ensure that I don't put in any bozo deadlocks?


Hmm, good question... I never tested a threaded build of the openib BTL, so I 
don't know if it really works. But we try to keep it thread safe.

(All the threading code in the openib BTL requires serious review.)


Re: [OMPI devel] IBCM error

2008-08-03 Thread Pavel Shamis (Pasha)

Thanks for update.

Sean Hefty wrote:

I've committed a patch to my libibcm git tree with the values

IB_CM_ASSIGN_SERVICE_ID
IB_CM_ASSIGN_SERVICE_ID_MASK

these will be in libibcm release 1.0.3, which will shortly...

- Sean


  




Re: [OMPI devel] trunk hangs since r19010

2008-07-29 Thread Pavel Shamis (Pasha)

Jeff Squyres wrote:


This used to be true, but I think we changed it a while ago (Pasha: do 
you remember?) because Mellanox HCAs are capable of send-to-self 
(process) and there were no code changes necessary to enable it.  So 
it allowed a slightly simpler command line.  This was quite a while 
ago, IIRC.

Yep, correct.

FYI, in my MTT testing I also see a lot of killed tests.


Re: [OMPI devel] IBCM error

2008-07-17 Thread Pavel Shamis (Pasha)

Sean Hefty wrote:

It is not zero, it should be:
#define IB_CM_ASSIGN_SERVICE_ID __cpu_to_be64(0x0200ULL)

Unfortunately, the value is defined in the kernel-level IBCM and is not
exposed to user level.
Can you please expose it at user level (infiniband/cm.h)?



Oops - good catch.  I will add the assign ID and mask values to the header file
for the next release.  Until then, can you try using the values given in the
kernel header file and let me know if it solves the problem?
  

I already prepared a patch for OMPI that defines the value.
A few people have already reported that the patch is OK ( 
https://svn.open-mpi.org/trac/ompi/ticket/1388 )


Pasha


Re: [OMPI devel] IBCM error

2008-07-17 Thread Pavel Shamis (Pasha)

Jeff Squyres wrote:

On Jul 16, 2008, at 11:07 AM, Don Kerr wrote:


Pasha added configure switches for this about a week ago:

   --en|disable-openib-ibcm
   --en|disable-openib-rdmacm
I like these flags but I thought there was going to be a run time 
check for cases where Open MPI is built on a system that has ibcm 
support but is later run on a system without ibcm support.



Yes, there are.

- if the /dev/infiniband/ucm* files aren't there, we silently return 
"not supported" and skip ibcm
- if ib_cm_open_device() (the first function call) fails, we assume 
that IBCM simply isn't supported on this platform and silently return 
"not supported" and skip ibcm


Right now we are skipping IBCM all the time. IBCM will be used only if the user 
specifies it explicitly via the include/exclude interface.


Pasha


Re: [OMPI devel] IBCM error

2008-07-17 Thread Pavel Shamis (Pasha)

Sean Hefty wrote:

If you don't care what the service ID is, you can specify 0, and the kernel will
assign one.  The assigned value can be retrieved by calling ib_cm_attr_id().
(I'm assuming that you communicate the IDs out of band somehow.)
  

It is not zero, it should be:
#define IB_CM_ASSIGN_SERVICE_ID __cpu_to_be64(0x0200ULL)

Unfortunately, the value is defined in the kernel-level IBCM and is not 
exposed to user level.

Can you please expose it at user level (infiniband/cm.h)?
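
Until the constant is exported by infiniband/cm.h, a userspace fallback 
definition along these lines can be used. This is a sketch only: htobe64 from 
<endian.h> stands in for the kernel's __cpu_to_be64, and the full 64-bit value 
shown is assumed to match the kernel header, so verify it against your 
kernel's ib_cm.h before using it:

/* Sketch of a userspace fallback for the "let the kernel assign a service ID"
 * flag, for use until infiniband/cm.h exports it.  htobe64() (<endian.h>)
 * replaces the kernel's __cpu_to_be64(); the constant below is assumed to
 * match the kernel's ib_cm.h definition; verify against your kernel. */
#include <endian.h>
#include <stdint.h>

#ifndef IB_CM_ASSIGN_SERVICE_ID
#define IB_CM_ASSIGN_SERVICE_ID ((uint64_t) htobe64(0x0200000000000000ULL))
#endif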

Regards,
Pasha



Re: [OMPI devel] Segfault in 1.3 branch

2008-07-15 Thread Pavel Shamis (Pasha)

I opened a ticket for the bug:
https://svn.open-mpi.org/trac/ompi/ticket/1389

Ralph Castain wrote:

It looks like a new issue to me, Pasha. Possibly a side consequence of the
IOF change made by Jeff and I the other day. From what I can see, it looks
like you app was a simple "hello" - correct?

If you look at the error, the problem occurs when mpirun is trying to route
a message. Since the app is clearly running at this time, the problem is
probably in the IOF. The error message shows that mpirun is attempting to
route a message to a jobid that doesn't exist. We have a test in the RML
that forces an "abort" if that occurs.

I would guess that there is either a race condition or memory corruption
occurring somewhere, but I have no idea where.

This may be the "new hole in the dyke" I cautioned about in earlier notes
regarding the IOF... :-)

Still, given that this hits rarely, it probably is a more acceptable bug to
leave in the code than the one we just fixed (duplicated stdin)...

Ralph



On 7/14/08 1:11 AM, "Pavel Shamis (Pasha)"  wrote:

  

Please see http://www.open-mpi.org/mtt/index.php?do_redir=764

The error is not consistent; it takes a lot of iterations to reproduce it.
In my MTT testing I have seen it a few times.

Is it a known issue?

Regards,
Pasha
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

  




Re: [OMPI devel] IBCM error

2008-07-15 Thread Pavel Shamis (Pasha)



Guess what - we don't always put them out there because - tada - we don't
use them! What goes out on the backend is a stripped down version of
libraries we require. Given the huge number of libraries people provide
(looking at the bigger, beyond OMPI picture), it consumes a lot of limited
disk space to install every library on every node. So sometimes we build our
own rpm's to pick up only what we need.

As long as --without-rdmacm --without-ibcm are present, then we are happy.

  

FYI
I recently added options that allow enable/disable all the *cm stuff:

 --enable-openib-ibcmEnable Open Fabrics IBCM support in openib BTL
 (default: enabled)
 --enable-openib-rdmacm  Enable Open Fabrics RDMACM support in openib BTL
 (default: enabled)




Re: [OMPI devel] IBCM error

2008-07-15 Thread Pavel Shamis (Pasha)


I need to check on this.  You may want to look at section A3.2.3 of 
the spec.
If you set the first byte (network order) to 0x00, and the 2nd byte 
to 0x01,
then you hit a 'reserved' range that probably isn't being used 
currently.


If you don't care what the service ID is, you can specify 0, and the 
kernel will
assign one.  The assigned value can be retrieved by calling 
ib_cm_attr_id().

(I'm assuming that you communicate the IDs out of band somehow.)



Ok; we'll need to check into this.  I don't remember the ordering -- 
we might actually be communicating the IDs before calling 
ib_cm_listen() (since we were simply using the PIDs, we could do that).


Thanks for the tip!  Pasha -- can you look into this?
It looks like we prepare the modex message during the query stage, so 
the order looks OK.
Unfortunately, on my machines the ibcm module does not create 
"/dev/infiniband/ucm*", so I cannot test the functionality.


Regards,
Pasha.



Re: [OMPI devel] Segfault in 1.3 branch

2008-07-15 Thread Pavel Shamis (Pasha)



It looks like a new issue to me, Pasha. Possibly a side consequence of the
IOF change made by Jeff and I the other day. From what I can see, it looks
like you app was a simple "hello" - correct?
  

Yep, it is a simple hello application.

If you look at the error, the problem occurs when mpirun is trying to route
a message. Since the app is clearly running at this time, the problem is
probably in the IOF. The error message shows that mpirun is attempting to
route a message to a jobid that doesn't exist. We have a test in the RML
that forces an "abort" if that occurs.

I would guess that there is either a race condition or memory corruption
occurring somewhere, but I have no idea where.

This may be the "new hole in the dyke" I cautioned about in earlier notes
regarding the IOF... :-)

Still, given that this hits rarely, it probably is a more acceptable bug to
leave in the code than the one we just fixed (duplicated stdin)...
  
It is not such a rare issue: 19 failures in my MTT run 
(http://www.open-mpi.org/mtt/index.php?do_redir=765).


Pasha

Ralph



On 7/14/08 1:11 AM, "Pavel Shamis (Pasha)"  wrote:

  

Please see http://www.open-mpi.org/mtt/index.php?do_redir=764

The error is not consistent; it takes a lot of iterations to reproduce it.
In my MTT testing I have seen it a few times.

Is it a known issue?

Regards,
Pasha
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

  




Re: [OMPI devel] IBCM error

2008-07-14 Thread Pavel Shamis (Pasha)




Should we not even build support for it?
I think the IBCM CPC build should be enabled by default. IBCM is 
supplied with OFED, so it should not be any problem during install.


PRO: don't even allow the possibility of running with it, because we 
know that there are issues with the ibcm userspace library (i.e., 
reduce problem reports from users)


PRO: users don't have to have libibcm installed on compute nodes 
(we've actually gotten some complaints about this)
We got complaints only for the case when OMPI was built on a platform with 
IBCM and then run on a platform without IBCM.  Also, we did not 
have an option to disable
IBCM during compilation, so there was actually no way to install OMPI 
on the compute node. We added the option and the problem was resolved.
In most cases the OFED install is the same on all nodes, and it should 
not be any problem to build IBCM support by default.


Pasha




Re: [OMPI devel] IBCM error

2008-07-14 Thread Pavel Shamis (Pasha)

I can add at the head of the query function something like:

if (!mca_btl_openib_component.cpc_explicitly_defined)
   return OMPI_ERR_NOT_SUPPORTED;


Jeff Squyres wrote:

On Jul 14, 2008, at 3:59 AM, Lenny Verkhovsky wrote:


Seems to be fixed.


Well, it's "fixed" in that Pasha turned off the error message.  But 
the same issue is undoubtedly happening.


I was asking for something a little stronger: perhaps we should 
actually have IBCM not try to be used unless it's specifically asked 
for.  Or maybe it shouldn't even build itself unless specifically 
asked for (which would obviously take care of the run-time issues as 
well).


The whole point of doing IBCM was to have a simple and fast mechanism 
for IB wireup.  But with these two problems (IBCM not properly 
installed on all systems, and ib_cm_listen() fails periodically), it 
more or less makes it unusable.  Therefore we shouldn't recommend it 
to production customers, and per precedent elsewhere in the code base, 
we should either not build it by default and/or not use it unless 
specifically asked for.






[OMPI devel] Segfault in 1.3 branch

2008-07-14 Thread Pavel Shamis (Pasha)

Please see http://www.open-mpi.org/mtt/index.php?do_redir=764

The error is not consistent; it takes a lot of iterations to reproduce it.
In my MTT testing I have seen it a few times.

Is it a known issue?

Regards,
Pasha


Re: [OMPI devel] IBCM error

2008-07-13 Thread Pavel Shamis (Pasha)

Fixed in https://svn.open-mpi.org/trac/ompi/changeset/18897

Are there any other known IBCM issues?

Regards,
Pasha

Jeff Squyres wrote:
I think you said opposite things: Lenny's command line did not 
specifically ask for ibcm, but it was used anyway.  Lenny -- did you 
explicitly request it somewhere else (e.g., env var or MCA param file)?


I suspect that you did not; I suspect (without looking at the code 
again) that ibcm tried to select itself and failed on the 
ibcm_listen() call, so it fell back to oob.  This might have to be 
another workaround in OMPI, perhaps something like this:


if (ibcm_listen() fails)
   if (ibcm explicitly requested)
   print_warning()
   fail to use ibcm

Has this been filed as a bug at openfabrics.org?  I don't think that I 
filed it when Brad and I were testing on RoadRunner -- it would 
probably be good if someone filed it.




On Jul 13, 2008, at 8:56 AM, Lenny Verkhovsky wrote:


Pasha is right, I didn't disable it.

On 7/13/08, Pavel Shamis (Pasha)  wrote: 
Jeff Squyres wrote:
Brad and I did some scale testing of IBCM and saw this error 
sometimes.  It seemed to happen with higher frequency when you 
increased the number of processes on a single node.


I talked to Sean Hefty about it, but we never figured out a 
definitive cause or solution.  My best guess is that there is 
something wonky about multiple processes simultaneously interacting 
with the IBCM kernel driver from userspace; but I don't know jack 
about kernel stuff, so that's a total SWAG.


Thanks for reminding me of this issue; I admit that I had forgotten 
about it.  :-(  Pasha -- should IBCM not be the default?

It is not the default. I guess Lenny configured it explicitly, didn't he?

Pasha.





On Jul 13, 2008, at 7:08 AM, Lenny Verkhovsky wrote:

Hi,

I am getting this error sometimes.

/home/USERS/lenny/OMPI_COMP_PATH/bin/mpirun -np 100 -hostfile 
/home/USERS/lenny/TESTS/COMPILERS/hostfile 
/home/USERS/lenny/TESTS/COMPILERS/hello
[witch24][[32428,1],96][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_ibcm.c:769:ibcm_component_query] 
failed to ib_cm_listen 10 times: rc=-1, errno=22

Hello world! I'm 0 of 100 on witch2


Best Regards

Lenny.


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel







Re: [OMPI devel] IBCM error

2008-07-13 Thread Pavel Shamis (Pasha)

Jeff Squyres wrote:
Brad and I did some scale testing of IBCM and saw this error 
sometimes.  It seemed to happen with higher frequency when you 
increased the number of processes on a single node.


I talked to Sean Hefty about it, but we never figured out a definitive 
cause or solution.  My best guess is that there is something wonky 
about multiple processes simultaneously interacting with the IBCM 
kernel driver from userspace; but I don't know jack about kernel 
stuff, so that's a total SWAG.


Thanks for reminding me of this issue; I admit that I had forgotten 
about it.  :-(  Pasha -- should IBCM not be the default?

It is not the default. I guess Lenny configured it explicitly, didn't he?

Pasha.





On Jul 13, 2008, at 7:08 AM, Lenny Verkhovsky wrote:


Hi,

I am getting this error sometimes.

/home/USERS/lenny/OMPI_COMP_PATH/bin/mpirun -np 100 -hostfile 
/home/USERS/lenny/TESTS/COMPILERS/hostfile 
/home/USERS/lenny/TESTS/COMPILERS/hello
[witch24][[32428,1],96][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_ibcm.c:769:ibcm_component_query] 
failed to ib_cm_listen 10 times: rc=-1, errno=22

Hello world! I'm 0 of 100 on witch2


Best Regards

Lenny.


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel







Re: [OMPI devel] open ib dependency question

2008-07-10 Thread Pavel Shamis (Pasha)

FYI the issue was resolved - https://svn.open-mpi.org/trac/ompi/ticket/1376
Regards,
Pasha

Bogdan Costescu wrote:

On Thu, 3 Jul 2008, Pavel Shamis (Pasha) wrote:

I had a similar issue recently. It would be nice to have an option to 
enable/disable the *CM support via configure flags.


Not sure if this is related... I am looking at using a nightly 1.3 
snapshot and I get this type of error messages when running:


[n020205][[36506,1],0][connect/btl_openib_connect_ibcm.c:723:ibcm_component_query] 
failed to open IB CM device: /dev/infiniband/ucm0


which is actually right, as /dev/infiniband on the nodes doesn't 
contain ucm0. On the same cluster, OpenMPI 1.2.7rc2 works fine; the 
configure options for building them are the same.


The output of ldd shows for the binary linked with 1.3a:

libibcm.so.1 => /opt/ofed/1.2.5.4/lib64/libibcm.so.1

while this is missing from the binary linked with 1.2. Even after 
printing these messages, the binary linked with 1.3a works; it works 
even when I specify "--mca btl openib,self" so I think that the IB 
stack is still being used (there is also a TCP/GigE network which 
could be chosen otherwise).


I don't know whether this is caused by a somehow inconsistent setup of 
the system, but I would welcome an option to make 1.3a behave like 1.2.






Re: [OMPI devel] open ib dependency question

2008-07-08 Thread Pavel Shamis (Pasha)




Pasha -- do you want to open a ticket?


Done. https://svn.open-mpi.org/trac/ompi/ticket/1376
Pasha.


Re: [OMPI devel] open ib dependency question

2008-07-06 Thread Pavel Shamis (Pasha)

I added a patch to the ticket; please review.

Pasha.
Jeff Squyres wrote:

Ok:

https://svn.open-mpi.org/trac/ompi/ticket/1375

I think any of us could do this -- it's pretty straightforward.  No 
guarantees on when I can get to it; my 1.3 list is already pretty long...



On Jul 3, 2008, at 6:20 AM, Pavel Shamis (Pasha) wrote:


Jeff Squyres wrote:

Do you need configury to disable building ibcm / rdmacm support?

The more I think about it, the more I think that these would be good 
features to have for v1.3...
I had a similar issue recently. It would be nice to have an option to 
enable/disable the *CM support via configure flags.



On Jul 3, 2008, at 2:52 AM, Don Kerr wrote:

I did not think it was required but it hung me up when I built ompi 
on one system which had the ibcm libraries and then ran on a system 
without the ibcm libs. I had another issue on the system without 
ibcm libs which prevented my building there but I will go down that 
path again. Thanks.


Jeff Squyres wrote:
That is the IBCM library for the IBCM CPC -- IB connection manager 
stuff.


It's not *necessary*; you could use the OOB CPC if you want to.

That being said, OMPI currently builds support for it (and links 
it in) if it finds the right headers and library files. We don't 
currently have configury to disable this behavior (and *not* build 
RDMACM and/or IBCM support).


Do you have a problem / need to disable building support for IBCM?



On Jul 2, 2008, at 12:02 PM, Don Kerr wrote:



It appears that the mca_btl_openib.so has a dependency on 
libibcm.so. Is this necessary?



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel





___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel







Re: [OMPI devel] open ib dependency question

2008-07-06 Thread Pavel Shamis (Pasha)
I see the same issue on my Mellanox OFED 1.3 setup: the IBCM module is loaded, but 
there is no such device in the system.

Jeff, this looks like some bug in the IBCM stuff (not OMPI).
I think we should print the error only if ibcm was explicitly 
selected by the user, but at the CPC level there is no way to know

about the explicit selection... Maybe just hide the print?

Bogdan Costescu wrote:

On Thu, 3 Jul 2008, Pavel Shamis (Pasha) wrote:

I had a similar issue recently. It would be nice to have an option to 
enable/disable the *CM support via configure flags.


Not sure if this is related... I am looking at using a nightly 1.3 
snapshot and I get this type of error messages when running:


[n020205][[36506,1],0][connect/btl_openib_connect_ibcm.c:723:ibcm_component_query] 
failed to open IB CM device: /dev/infiniband/ucm0


which is actually right, as /dev/infiniband on the nodes doesn't 
contain ucm0. On the same cluster, OpenMPI 1.2.7rc2 works fine; the 
configure options for building them are the same.


The output of ldd shows for the binary linked with 1.3a:

libibcm.so.1 => /opt/ofed/1.2.5.4/lib64/libibcm.so.1

while this is missing from the binary linked with 1.2. Even after 
printing these messages, the binary linked with 1.3a works; it works 
even when I specify "--mca btl openib,self" so I think that the IB 
stack is still being used (there is also a TCP/GigE network which 
could be chosen otherwise).


I don't know whether this is caused by a somehow inconsistent setup of 
the system, but I would welcome an option to make 1.3a behave like 1.2.






Re: [OMPI devel] open ib dependency question

2008-07-03 Thread Pavel Shamis (Pasha)



Ok:

https://svn.open-mpi.org/trac/ompi/ticket/1375

I think any of us could do this -- it's pretty straightforward.  No 
guarantees on when I can get to it; my 1.3 list is already pretty long...

No problem. I will take this one.
Pasha.



On Jul 3, 2008, at 6:20 AM, Pavel Shamis (Pasha) wrote:


Jeff Squyres wrote:

Do you need configury to disable building ibcm / rdmacm support?

The more I think about it, the more I think that these would be good 
features to have for v1.3...
I had a similar issue recently. It would be nice to have an option to 
enable/disable the *CM support via configure flags.



On Jul 3, 2008, at 2:52 AM, Don Kerr wrote:

I did not think it was required but it hung me up when I built ompi 
on one system which had the ibcm libraries and then ran on a system 
without the ibcm libs. I had another issue on the system without 
ibcm libs which prevented my building there but I will go down that 
path again. Thanks.


Jeff Squyres wrote:
That is the IBCM library for the IBCM CPC -- IB connection manager 
stuff.


It's not *necessary*; you could use the OOB CPC if you want to.

That being said, OMPI currently builds support for it (and links 
it in) if it finds the right headers and library files. We don't 
currently have configury to disable this behavior (and *not* build 
RDMACM and/or IBCM support).


Do you have a problem / need to disable building support for IBCM?



On Jul 2, 2008, at 12:02 PM, Don Kerr wrote:



It appears that the mca_btl_openib.so has a dependency on 
libibcm.so. Is this necessary?



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel





___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel







Re: [OMPI devel] open ib dependency question

2008-07-03 Thread Pavel Shamis (Pasha)

Jeff Squyres wrote:

Do you need configury to disable building ibcm / rdmacm support?

The more I think about it, the more I think that these would be good 
features to have for v1.3...
I had a similar issue recently. It would be nice to have an option to 
disable/enable *CM via configure flags.



On Jul 3, 2008, at 2:52 AM, Don Kerr wrote:

I did not think it was required but it hung me up when I built ompi 
on one system which had the ibcm libraries and then ran on a system 
without the ibcm libs. I had another issue on the system without ibcm 
libs which prevented my building there but I will go down that path 
again. Thanks.


Jeff Squyres wrote:
That is the IBCM library for the IBCM CPC -- IB connection manager 
stuff.


It's not *necessary*; you could use the OOB CPC if you want to.

That being said, OMPI currently builds support for it (and links it 
in) if it finds the right headers and library files. We don't 
currently have configury to disable this behavior (and *not* build 
RDMACM and/or IBCM support).


Do you have a problem / need to disable building support for IBCM?



On Jul 2, 2008, at 12:02 PM, Don Kerr wrote:



It appears that the mca_btl_openib.so has a dependency on 
libibcm.so. Is this necessary?



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel







Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Pavel Shamis (Pasha)

I did a fresh checkout and everything works well.
So it looks like some 'svn up' screwed up my working copy.
Ralph, thanks for the help!

Ralph H Castain wrote:

Hmmm...something isn't right, Pasha. There is simply no way you should be
encountering this error. You are picking up the wrong grpcomm module.

I went ahead and fixed the grpcomm/basic module, but as I note in the commit
message, that is now an experimental area. The grpcomm/bad module is the
default for that reason.

Check to ensure you have the orte/mca/grpcomm/bad directory, and that it is
getting built. My guess is that you have a corrupted checkout or build and
that the component is either missing or not getting built.


On 6/19/08 1:37 PM, "Pavel Shamis (Pasha)"  wrote:

  

Ralph H Castain wrote:


I can't find anything wrong so far. I'm waiting in a queue on Odin to try
there since Jeff indicated you are using rsh as a launcher, and that's the
only access I have to such an environment. Guess Odin is being pounded
because the queue isn't going anywhere.
  
  

 I use ssh., here is command line:
./bin/mpirun -np 2 -H sw214,sw214 -mca btl openib,sm,self
./osu_benchmarks-3.0/osu_latency


Meantime, I'm building on RoadRunner and will test there (TM enviro).


On 6/19/08 1:18 PM, "Pavel Shamis (Pasha)"  wrote:

  
  

You'll have to tell us something more than that, Pasha. What kind of
environment, what rev level were you at, etc.
  
  
  

Ahh, sorry :) I run on Linux x86_64 Sles10 sp1. (Open MPI) 1.3a1r18682M
, OFED 1.3.1
Pasha.



So far as I know, the trunk is fine.


On 6/19/08 12:01 PM, "Pavel Shamis (Pasha)" 
wrote:

  
  
  

I tried to run trunk on my machines and I got follow error:

[sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
end of buffer in file base/grpcomm_base_modex.c at line 451
[sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
end of buffer in file grpcomm_basic_module.c at line 560
[sw214:04365]
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  orte_grpcomm_modex failed
  --> Returned "Data unpack would read past end of buffer" (-26) instead
of "Success" (0)

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

  
  
  

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

  
  

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

  




Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Pavel Shamis (Pasha)

Ralph H Castain wrote:

Ha! I found it - you left out one very important detail. You are specifying
the use of the grpcomm basic module instead of the default "bad" one.
  

Hmm, I did not specify any "grpcomm" module.
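
For the record, explicitly selecting a grpcomm component would look 
something like the line below; the component name "bad" is taken from 
Ralph's note above, and the rest just mirrors my earlier command line:

./bin/mpirun -mca grpcomm bad -np 2 -H sw214,sw214 -mca btl openib,sm,self ./osu_benchmarks-3.0/osu_latency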

I just checked and that module is indeed showing a problem. I'll see what I
can do.

For now, though, just use the default grpcomm and it will work fine.


On 6/19/08 1:18 PM, "Pavel Shamis (Pasha)"  wrote:

  

You'll have to tell us something more than that, Pasha. What kind of
environment, what rev level were you at, etc.
  
  

Ahh, sorry :) I run on Linux x86_64 Sles10 sp1. (Open MPI) 1.3a1r18682M
, OFED 1.3.1
Pasha.


So far as I know, the trunk is fine.


On 6/19/08 12:01 PM, "Pavel Shamis (Pasha)" 
wrote:

  
  

I tried to run trunk on my machines and I got follow error:

[sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
end of buffer in file base/grpcomm_base_modex.c at line 451
[sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
end of buffer in file grpcomm_basic_module.c at line 560
[sw214:04365] 
--

It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  orte_grpcomm_modex failed
  --> Returned "Data unpack would read past end of buffer" (-26) instead
of "Success" (0)

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

  
  

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

  




Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Pavel Shamis (Pasha)

Ralph H Castain wrote:

I can't find anything wrong so far. I'm waiting in a queue on Odin to try
there since Jeff indicated you are using rsh as a launcher, and that's the
only access I have to such an environment. Guess Odin is being pounded
because the queue isn't going anywhere.
  

I use ssh; here is the command line:
./bin/mpirun -np 2 -H sw214,sw214 -mca btl openib,sm,self   
./osu_benchmarks-3.0/osu_latency

Meantime, I'm building on RoadRunner and will test there (TM enviro).


On 6/19/08 1:18 PM, "Pavel Shamis (Pasha)"  wrote:

  

You'll have to tell us something more than that, Pasha. What kind of
environment, what rev level were you at, etc.
  
  

Ahh, sorry :) I run on Linux x86_64 Sles10 sp1. (Open MPI) 1.3a1r18682M
, OFED 1.3.1
Pasha.


So far as I know, the trunk is fine.


On 6/19/08 12:01 PM, "Pavel Shamis (Pasha)" 
wrote:

  
  

I tried to run trunk on my machines and I got follow error:

[sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
end of buffer in file base/grpcomm_base_modex.c at line 451
[sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
end of buffer in file grpcomm_basic_module.c at line 560
[sw214:04365] 
--

It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  orte_grpcomm_modex failed
  --> Returned "Data unpack would read past end of buffer" (-26) instead
of "Success" (0)

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

  
  

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

  




Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Pavel Shamis (Pasha)



You'll have to tell us something more than that, Pasha. What kind of
environment, what rev level were you at, etc.
  
Ahh, sorry :) I run on Linux x86_64 SLES10 SP1, Open MPI 1.3a1r18682M, 
OFED 1.3.1.

Pasha.

So far as I know, the trunk is fine.


On 6/19/08 12:01 PM, "Pavel Shamis (Pasha)" 
wrote:

  

I tried to run trunk on my machines and I got follow error:

[sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
end of buffer in file base/grpcomm_base_modex.c at line 451
[sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
end of buffer in file grpcomm_basic_module.c at line 560
[sw214:04365] 
--

It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  orte_grpcomm_modex failed
  --> Returned "Data unpack would read past end of buffer" (-26) instead
of "Success" (0)

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

  




[OMPI devel] Is trunk broken ?

2008-06-19 Thread Pavel Shamis (Pasha)

I tried to run the trunk on my machines and got the following error:

[sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past 
end of buffer in file base/grpcomm_base_modex.c at line 451
[sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past 
end of buffer in file grpcomm_basic_module.c at line 560
[sw214:04365] 
--

It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

 orte_grpcomm_modex failed
 --> Returned "Data unpack would read past end of buffer" (-26) instead 
of "Success" (0)




Re: [OMPI devel] Memory hooks change testing

2008-06-12 Thread Pavel Shamis (Pasha)

In my MTT testing it looks ok, too.

Brad Benton wrote:

Brian,

This is working smoothly now:  both the configuration/build and 
execution.  So,

from my standpoint, it looks good for inclusion into the trunk.

--brad



On Wed, Jun 11, 2008 at 4:50 PM, Brian W. Barrett 
mailto:brbar...@open-mpi.org>> wrote:


Brad unfortunately figured out I had done something to annoy the
gods of
mercurial and the repository below didn't contain all the changes
advertised (and in fact didn't work).  I've since rebuilt the
repository
and verified it works now.  I'd recommend deleting your existing
clones of
the repository below and starting over.

Sorry about that,

Brian


On Wed, 11 Jun 2008, Brian Barrett wrote:

> Did anyone get a chance to test (or think about testing) this?
 I'd like to
> commit the changes on Friday evening, if I haven't heard any
negative
> feedback.
>
> Brian
>
> On Jun 9, 2008, at 8:56 PM, Brian Barrett wrote:
>
>> Hi all -
>>
>> Per the RFC I sent out last week, I've implemented a revised
behavior of
>> the memory hooks for high speed networks.  It's a bit different
than the
>> RFC proposed, but still very minor and fairly straight foward.
>>
>> The default is to build ptmalloc2 support, but as an almost
complete
>> standalone library.  If the user wants to use ptmalloc2, he
only has to add
>> -lopenmpi-malloc to his link line.  Even when standalone and
openmpi-malloc
>> is not linked in, we'll still intercept munmap as it's needed
for mallopt
>> (below) and we've never had any trouble with that part of
ptmalloc2 (it's
>> easy to intercept).
>>
>> As a *CHANGE* in behavior, if leave_pinned support is turned on
and there's
>> no ptmalloc2 support, we will automatically enable mallopt.  As
a *CHANGE*
>> in behavior, if the user disables mallopt or mallopt is not
available and
>> leave pinned is requested, we'll abort.  I think these both
make sense and
>> are closest to expected behavior, but wanted to point them out.
 It is
>> possible for the user to disable mallopt and enable
leave_pinned, but that
>> will *only* work if there is some other mechanism for
intercepting free
>> (basically, it allows a way to ensure your using ptmalloc2
instead of
>> mallopt).
>>
>> There is also a new memory component, mallopt, which only
intercepts munmap
>> and exists only to allow users to enable mallopt while not
building the
>> ptmalloc2 component at all.  Previously, our mallopt support
was lacking in
>> that it didn't cover the case where users explicitly called
munmap in their
>> applications.  Now, it does.
>>
>> The changes are fairly small and can be seen/tested in the HG
repository
>> bwb/mem-hooks, URL below.  I think this would be a good thing
to push to
>> 1.3, as it will solve an ongoing problem on Linux (basically,
users getting
>> screwed by our ptmalloc2 implementation).
>>
>>   http://www.open-mpi.org/hg/hgwebdir.cgi/bwb/mem-hooks/
>>
>> Brian
>>
>> --
>> Brian Barrett
>> Open MPI developer
>> http://www.open-mpi.org/
>>
>>
>
>
___
devel mailing list
de...@open-mpi.org 
http://www.open-mpi.org/mailman/listinfo.cgi/devel




___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] Pallas fails

2008-06-12 Thread Pavel Shamis (Pasha)

With 1.3a1r18643 the Pallas tests pass on my machine.
But I see new failures (an assertion) in the Intel tests: 
http://www.open-mpi.org/mtt/index.php?do_redir=733


PI_Type_struct_types_c: btl_sm.c:684: mca_btl_sm_sendi: Assertion `max_data == 
payload_size'
failed.
[sw216:32013] *** Process received signal ***
[sw216:32013] Signal: Aborted (6)
[sw216:32013] Signal code:  (-6)
[sw216:32013] [ 0] /lib64/libpthread.so.0 [0x2aba5e51ec10]
[sw216:32013] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x2aba5e657b95]
[sw216:32013] [ 2] /lib64/libc.so.6(abort+0x110) [0x2aba5e658f90]
[sw216:32013] [ 3] /lib64/libc.so.6(__assert_fail+0xf6) [0x2aba5e651256]
[sw216:32013] [ 4]




Pavel Shamis (Pasha) wrote:

Last conf. call Jeff mentioned that he see some collectives failures.
In my MTT testing I also see that Pallas collectives failed - 
http://www.open-mpi.org/mtt/index.php?do_redir=682


 Alltoall

#
# Benchmarking Alltoall 
# #processes = 20 
#

   #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
0 1000 0.03 0.05 0.04
1 1000   179.15   179.22   179.18
2 1000   155.96   156.02   155.98
4 1000   156.93   156.98   156.95
8 1000   163.63   163.67   163.65
   16 1000   115.04   115.08   115.07
   32 1000   123.57   123.62   123.59
   64 1000   129.78   129.82   129.80
  128 1000   141.45   141.49   141.48
  256 1000   960.11   960.24   960.20
  512 1000   900.95   901.11   901.04
 1024 1000   921.95   922.05   922.00
 2048 1000   862.50   862.72   862.60
 4096 1000  1044.90  1044.95  1044.92
 8192 1000  1458.59  1458.77  1458.69
*** An error occurred in MPI_Alltoall
*** on communicator MPI COMMUNICATOR 4 SPLIT FROM 0
*** An error occurred in MPI_Alltoall
*** on communicator MPI COMMUNICATOR 4 SPLIT FROM 0

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

  




Re: [OMPI devel] Memory hooks change testing

2008-06-11 Thread Pavel Shamis (Pasha)

Brian Barrett wrote:
Did anyone get a chance to test (or think about testing) this?  I'd  
like to commit the changes on Friday evening, if I haven't heard any  
negative feedback.
  

I will run it tomorrow on my cluster.

Brian

On Jun 9, 2008, at 8:56 PM, Brian Barrett wrote:

  

Hi all -

Per the RFC I sent out last week, I've implemented a revised  
behavior of the memory hooks for high speed networks.  It's a bit  
different than the RFC proposed, but still very minor and fairly  
straight foward.


The default is to build ptmalloc2 support, but as an almost complete  
standalone library.  If the user wants to use ptmalloc2, he only has  
to add -lopenmpi-malloc to his link line.  Even when standalone and  
openmpi-malloc is not linked in, we'll still intercept munmap as  
it's needed for mallopt (below) and we've never had any trouble with  
that part of ptmalloc2 (it's easy to intercept).


As a *CHANGE* in behavior, if leave_pinned support is turned on and  
there's no ptmalloc2 support, we will automatically enable mallopt.   
As a *CHANGE* in behavior, if the user disables mallopt or mallopt  
is not available and leave pinned is requested, we'll abort.  I  
think these both make sense and are closest to expected behavior,  
but wanted to point them out.  It is possible for the user to  
disable mallopt and enable leave_pinned, but that will *only* work  
if there is some other mechanism for intercepting free (basically,  
it allows a way to ensure your using ptmalloc2 instead of mallopt).


There is also a new memory component, mallopt, which only intercepts  
munmap and exists only to allow users to enable mallopt while not  
building the ptmalloc2 component at all.  Previously, our mallopt  
support was lacking in that it didn't cover the case where users  
explicitly called munmap in their applications.  Now, it does.


The changes are fairly small and can be seen/tested in the HG  
repository bwb/mem-hooks, URL below.  I think this would be a good  
thing to push to 1.3, as it will solve an ongoing problem on Linux  
(basically, users getting screwed by our ptmalloc2 implementation).


   http://www.open-mpi.org/hg/hgwebdir.cgi/bwb/mem-hooks/

Brian

--
 Brian Barrett
 Open MPI developer
 http://www.open-mpi.org/





  




[OMPI devel] Pallas fails

2008-06-04 Thread Pavel Shamis (Pasha)

On the last conf. call Jeff mentioned that he sees some collective failures.
In my MTT testing I also see that the Pallas collectives fail - 
http://www.open-mpi.org/mtt/index.php?do_redir=682


Alltoall

#
# Benchmarking Alltoall 
# #processes = 20 
#

  #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
   0 1000 0.03 0.05 0.04
   1 1000   179.15   179.22   179.18
   2 1000   155.96   156.02   155.98
   4 1000   156.93   156.98   156.95
   8 1000   163.63   163.67   163.65
  16 1000   115.04   115.08   115.07
  32 1000   123.57   123.62   123.59
  64 1000   129.78   129.82   129.80
 128 1000   141.45   141.49   141.48
 256 1000   960.11   960.24   960.20
 512 1000   900.95   901.11   901.04
1024 1000   921.95   922.05   922.00
2048 1000   862.50   862.72   862.60
4096 1000  1044.90  1044.95  1044.92
8192 1000  1458.59  1458.77  1458.69
*** An error occurred in MPI_Alltoall
*** on communicator MPI COMMUNICATOR 4 SPLIT FROM 0
*** An error occurred in MPI_Alltoall
*** on communicator MPI COMMUNICATOR 4 SPLIT FROM 0



Re: [OMPI devel] r18551 - breaks ob1 compilation on SLES10

2008-06-03 Thread Pavel Shamis (Pasha)

The compilation passed on SLES 10 SP1.
So SP1 resolves the gcc/binutils issue.
We need to add a notice somewhere that "OMPI 1.3 cannot be compiled on 
SLES10, please update ..."


Regards,
Pasha

Pavel Shamis (Pasha) wrote:

Ralf Wildenhues wrote:

* Pavel Shamis (Pasha) wrote on Mon, Jun 02, 2008 at 02:25:13PM CEST:
 

r18551 breaks ompi compilation on SLES10 gcc 4.1.0.

I got follow error on my systems  
(http://www.open-mpi.org/mtt/index.php?do_redir=672 ):



[...]
 
/usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: 
.libs/pml_ob1_sendreq.o: relocation R_X86_64_PC32 against  
`mca_pml_ob1_rndv_completion' can not be used when making a shared  
object; recompile with -fPIC



The build log shows that your clock isn't set properly, so I'd first fix
that and do a complete rebuild afterwards.  

I fixed the clock and the problem still was there.

The log also shows that
.libs/pml_ob1_sendreq.o is compiled with -fPIC, so second I'd assume a
compiler or binutils bug.  The GCC bugzilla lists
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30153
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28781
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21382
as possible starting points for further investigation.  Maybe your
distributor has fixed or newer binutils packages for you.
  

I will check for SLES 10 updates.

Cheers,
Ralf

  







Re: [OMPI devel] r18551 - breaks ob1 compilation on SLES10

2008-06-02 Thread Pavel Shamis (Pasha)

Ralf Wildenhues wrote:

* Pavel Shamis (Pasha) wrote on Mon, Jun 02, 2008 at 02:25:13PM CEST:
  

r18551 breaks ompi compilation on SLES10 gcc 4.1.0.

I got follow error on my systems  
(http://www.open-mpi.org/mtt/index.php?do_redir=672 ):



[...]
  
/usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: 
.libs/pml_ob1_sendreq.o: relocation R_X86_64_PC32 against  
`mca_pml_ob1_rndv_completion' can not be used when making a shared  
object; recompile with -fPIC



The build log shows that your clock isn't set properly, so I'd first fix
that and do a complete rebuild afterwards.  

I fixed the clock and the problem still was there.

The log also shows that
.libs/pml_ob1_sendreq.o is compiled with -fPIC, so second I'd assume a
compiler or binutils bug.  The GCC bugzilla lists
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30153
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28781
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21382
as possible starting points for further investigation.  Maybe your
distributor has fixed or newer binutils packages for you.
  

I will check for SLES 10 updates.

Cheers,
Ralf

  




[OMPI devel] r18551 - breaks ob1 compilation on SLES10

2008-06-02 Thread Pavel Shamis (Pasha)

r18551 breaks ompi compilation on SLES10 gcc 4.1.0.

I got the following error on my systems 
(http://www.open-mpi.org/mtt/index.php?do_redir=672):
make[2]: Entering directory 
`/.autodirect/hpc/work/pasha/tmp/mtt-8/installs/5VHm/src/openmpi-1.3a1r18553/ompi/mca/pml/ob1'
/bin/sh ../../../../libtool --tag=CC   --mode=link gcc  -g -pipe -Wall 
-Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes 
-Wstrict-prototypes -Wcomment -pedantic 
-Werror-implicit-function-declaration -finline-functions 
-fno-strict-aliasing -pthread -fvisibility=hidden -module -avoid-version 
-export-dynamic   -o mca_pml_ob1.la -rpath 
/.autodirect/hpc/work/pasha/tmp/mtt-8/installs/5VHm/install/lib/openmpi 
pml_ob1.lo pml_ob1_comm.lo pml_ob1_component.lo pml_ob1_iprobe.lo 
pml_ob1_irecv.lo pml_ob1_isend.lo pml_ob1_progress.lo pml_ob1_rdma.lo 
pml_ob1_rdmafrag.lo pml_ob1_recvfrag.lo pml_ob1_recvreq.lo 
pml_ob1_sendreq.lo pml_ob1_start.lo  -lnsl -lutil  -lm
libtool: link: gcc -shared  .libs/pml_ob1.o .libs/pml_ob1_comm.o 
.libs/pml_ob1_component.o .libs/pml_ob1_iprobe.o .libs/pml_ob1_irecv.o 
.libs/pml_ob1_isend.o .libs/pml_ob1_progress.o .libs/pml_ob1_rdma.o 
.libs/pml_ob1_rdmafrag.o .libs/pml_ob1_recvfrag.o 
.libs/pml_ob1_recvreq.o .libs/pml_ob1_sendreq.o .libs/pml_ob1_start.o   
-lnsl -lutil -lm  -pthread   -pthread -Wl,-soname -Wl,mca_pml_ob1.so -o 
.libs/mca_pml_ob1.so
/usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: 
.libs/pml_ob1_sendreq.o: relocation R_X86_64_PC32 against 
`mca_pml_ob1_rndv_completion' can not be used when making a shared 
object; recompile with -fPIC
/usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: 
final link failed: Bad value

collect2: ld returned 1 exit status

Removing inline from some of the functions (see attached file) resolves 
the problem.


Thanks,
Pasha

--- pml_ob1_sendreq.c   2008-06-01 10:59:51.094063000 +0300
+++ pml_ob1_sendreq.c.new   2008-06-02 15:07:02.612983000 +0300
@@ -192,7 +192,7 @@
 MCA_PML_OB1_PROGRESS_PENDING(bml_btl);
 }

-static inline void
+static void
 mca_pml_ob1_match_completion_free( struct mca_btl_base_module_t* btl,  
struct mca_btl_base_endpoint_t* ep,
struct mca_btl_base_descriptor_t* des,
@@ -235,7 +235,7 @@
  *  Completion of the first fragment of a long message that 
  *  requires an acknowledgement
  */
-static inline void
+static void
 mca_pml_ob1_rndv_completion( mca_btl_base_module_t* btl,
  struct mca_btl_base_endpoint_t* ep,
  struct mca_btl_base_descriptor_t* des,
@@ -269,7 +269,7 @@
  * Completion of a get request.
  */

-static inline void
+static void
 mca_pml_ob1_rget_completion( mca_btl_base_module_t* btl,
  struct mca_btl_base_endpoint_t* ep,
  struct mca_btl_base_descriptor_t* des,
@@ -295,7 +295,7 @@
  * Completion of a control message - return resources.
  */

-static inline void
+static void
 mca_pml_ob1_send_ctl_completion( mca_btl_base_module_t* btl,
  struct mca_btl_base_endpoint_t* ep,
  struct mca_btl_base_descriptor_t* des,
@@ -312,7 +312,7 @@
  * to schedule additional fragments.
  */

-static inline void
+static void
 mca_pml_ob1_frag_completion( mca_btl_base_module_t* btl,
  struct mca_btl_base_endpoint_t* ep,
  struct mca_btl_base_descriptor_t* des,


Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-29 Thread Pavel Shamis (Pasha)


I got some more feedback from Roland off-list explaining that if /sys/ 
class/infiniband does exist and is non-empty and /sys/class/ 
infiniband_verbs/abi_version does not exist, then this is definitely a  
case where we want to warn because it implies that config is screwed  
up -- RDMA devices are present but not usable.
  
Is it possible that the /sys/class/infiniband directory exists but is 
empty? In which cases?


Re: [OMPI devel] New HCA vendor part ID

2008-05-26 Thread Pavel Shamis (Pasha)


it seems that Open MPI doen't have HCA parameters for the following 
InfiniBand adapter

which using the compute nodes of our SGI Altix ICE 8200EX:

Mellanox Technologies MT26418 ConnectX

I assume that the vendor part ID 26418 should be listed in the section 
"Mellanox Hermon"

of the HCA parameter file. Is that correct?
Thanks for pointing it out. I fixed it on the trunk: 
https://svn.open-mpi.org/trac/ompi/changeset/18495

Regards,
Pasha



Matthias

--
Matthias Jurenz,
Center for Information Services and
High Performance Computing (ZIH), TU Dresden,
Willersbau A106, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-31945, fax +49-351-463-37773



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] Memory hooks stuff

2008-05-25 Thread Pavel Shamis (Pasha)

Jeff Squyres wrote:
Who would be interested in discussing this stuff?  (me, Brian, ? 
someone from Sun?, ...?)
  

Me.



Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Pavel Shamis (Pasha)




1. The driver doesn't support the HCA - if I remember correctly, RH40 by 
default doesn't support the ConnectX HCA. The device list will be empty. 
It is a very exotic case.

2. The driver version doesn't correspond with the FW version.
3. The FW was broken.
4. The driver was broken and failed to start - this is not a very exotic 
case. Sometimes users make modifications (upgrade/install/etc.) and 
that breaks the driver.


 In such cases, the ibv_devinfo(1) and ibv_devices(1) commands would 
show the same error.

Yep these utilities will show the same error.

Cases 1-3 we can cover pretty simply. The OpenIB driver creates 
"/dev/infiniband" during its startup. So if /dev/infiniband exists 
and ibv_get_device_list() is empty, we may print a warning.


Ok, that seems reasonable.


I don't know how we can cover case 4 :-(


If the user makes modifications to the driver and breaks it, I don't 
think we can be held responsible for that -- prudence declares that 
you should verify that your [self-modified] driver is not broken first 
before blaming Open MPI.  I'm not that concerned about #4; most of my 
customers do not modify the drivers.

Agree about #4.

The check for /dev/infiniband should be simple and I think we can add it 
to 1.3 .


Pasha.


Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Pavel Shamis (Pasha)





I'm not sure I follow this logic -- can you explain more?

Sure


Why does this only apply to binary distribution?  If libibverbs is 
installed by default, then OMPI will still build the openib BTL (and 
therefore warn if it's not used).  Granted, some distros will only 
install libibverbs if either explicitly or implicitly requested (e.g., 
via dependency).  What if some other dependency pulls in libibverbs, 
even if OMPI was built from a source tarball?
My point is that it is not correct to say that libibverbs is installed by 
default on all Linuxes and that all users who install OMPI will see 
this problem.
It is possible that libibverbs may be installed implicitly as a 
dependency package. But usually (not always) it will be installed as 
part of some native IB (openib only!) application.


If a user decides to upgrade his OMPI + libibverbs rpm/deb package 
install, he will need to do a lot of other "annoying" steps, like 
downloading the source code, installing all required *-dev.rpm packages, 
and compiling.
And I guess that disabling the default warning messages will be the 
simplest step along the way :-)


I don't want to say that the current solution is the best one.
But I would like to find something better than disabling the warning by 
default only for openib.




Let me ask another question: is it common to have the verbs stack / 
hardware so hosed up that ibv_get_device_list() returns an empty list 
when there really is a device there?  My assumption is that this is 
quite uncommon; that ibv_get_device_list() will usually return that 
there *are* devices and errors show up later during initialization, 
etc.  Never say "never", of course; I'm sure that there are degenerate 
corner cases where a badly hosed device will cause 
ibv_get_device_list() to return an empty list -- but I'm assuming that 
those cases are very few and far between.

I cannot say that it is a very uncommon case.
For example:
1. The driver doesn't support the HCA - if I remember correctly, RH40 by 
default doesn't support the ConnectX HCA. The device list will be empty. 
It is a very exotic case.

2. The driver version doesn't correspond with the FW version.
3. The FW was broken.
4. The driver was broken and failed to start - this is not a very exotic 
case. Sometimes users make modifications (upgrade/install/etc.) and 
that breaks the driver.


  In such cases, the ibv_devinfo(1) and ibv_devices(1) commands would 
show the same error.

Yep, these utilities will show the same error.

Cases 1-3 we can cover pretty simply. The OpenIB driver creates 
"/dev/infiniband" during its startup. So if /dev/infiniband exists and 
ibv_get_device_list() is empty, we may print a warning.
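
Something along these lines is what I have in mind (an untested sketch; 
the only assumptions beyond the discussion above are the libibverbs calls 
ibv_get_device_list()/ibv_free_device_list() and a stat() on 
/dev/infiniband):

#include <stdio.h>
#include <sys/stat.h>
#include <infiniband/verbs.h>

/* Untested sketch: warn only when the kernel driver appears loaded
 * (/dev/infiniband exists) but libibverbs reports no usable devices. */
static void check_openib_devices(void)
{
    struct stat st;
    int num_devs = 0;
    struct ibv_device **devs = ibv_get_device_list(&num_devs);

    if (0 == num_devs && 0 == stat("/dev/infiniband", &st)) {
        fprintf(stderr, "WARNING: /dev/infiniband exists but "
                "ibv_get_device_list() returned no devices; "
                "the OpenFabrics stack may be misconfigured\n");
    }
    if (NULL != devs) {
        ibv_free_device_list(devs);
    }
}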

I don't know how we can cover case 4 :-(

BTW, I think this problem is relevant for all BTLs, not only openib, 
and maybe we need to look for some global solution.


Pasha.


Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Pavel Shamis (Pasha)


Brian and I chatted a bit about this off-list, and I think we're in  
agreement now:


- do not change the default value or meaning of  
btl_base_warn_component_unused.


- major point of confusion: the openib BTL is actually fairly unique  
in that it can (and does) tell the difference between "there are no  
devices present" and "there are devices, but something went wrong".   
Other BTL's have network interfaces that can't tell the difference and  
can *only* call the no_nics function, regardless of whether there are  
no relevant network interfaces or some error occurred during  
initialization.


- so a reasonable solution would be an openib-BTL-specific mechanism  
that doesn't call the no_nics function (to display that  
btl_base_warn_component_unused) if there are no verbs-capable devices  
found because of the fact that mainline Linuxes are starting to ship  
libibverbs.  Specific mechanism TBD; likely to be an openib MCA param.
  
OK, we will have our own warning mechanism. But there is still an open 
question: will we show (by default) an error message in the case 
when libibverbs exists but there is no HCA in the HCA list?
I think we should show the error. The problem of the default libibverbs 
install is relevant only for binary distributions that install all OMPI 
dependencies with the OMPI package. In that case the distribution will 
have an openib MCA parameter that allows it to disable the warning 
message by default during the OMPI package install (or build).
I guess that most people still install OMPI from sources. And in this 
case it sounds reasonable to me to print this "no HCA" warning if the 
openib BTL was built.

Pasha



On May 21, 2008, at 9:56 PM, Jeff Squyres wrote:

  

On May 21, 2008, at 5:02 PM, Brian W. Barrett wrote:



If this is true (for some reason I thought it wasn't), then I think
we'd
actually be ok with your proposal, but you're right, you'd need
something
new in the IB btl.  I'm not concerned about the dual rail issue -- if
you're smart enough to configure dual rail IB, you're smart enough to
figure out OMPI mca params.  I'm not sure the same is true for a
simple
delivered from the white box vendor IB setup that barely works on a
good
day (and unfortunately, there seems to be evidence that these exist).
  

I'm not sure I understand what you're saying -- you agree, but what
"new" do you think we need in the openib BTL?  The MCA params saying
which ports you expect to be ACTIVE?

--
Jeff Squyres
Cisco Systems

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




  




Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Pavel Shamis (Pasha)
As far as I know, only the OpenIB kernel drivers are installed by default 
with the distribution.
The user-level pieces - libibverbs and other OpenIB stuff - are not 
installed by default. The user needs to go to the package manager and 
explicitly select libibverbs. So if a user decided to install libibverbs 
he had reasons for it, and I think it is OK to show this warning.


Pasha.

Jeff Squyres wrote:

On May 21, 2008, at 11:14 AM, Brian W. Barrett wrote:

  
I think having a parameter to turn off the warning is a great idea.   
So
great in fact, that it already exists in the trunk and v1.2 :)!   
Setting
the default value for the btl_base_warn_component_unused flag from 0  
to 1

will have the desired effect.



Ah, ok.  I either didn't know about this flag or forgot about it.  :-)

I just tested this myself and see that there are actually *two* error  
messages (on a machine where I installed libibverbs, but with no  
OpenFabrics hardware, with OMPI 1.2.6):


% mpirun -np 1 hello
libibverbs: Fatal: couldn't read uverbs ABI version.
--
[0,1,0]: OpenIB on host eddie.osl.iu.edu was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--

So the MCA param takes care of the OMPI message; I'll contact the  
libibverbs authors about their message.


  
I'm not sure I agree with setting the default to 0, however.  The  
warning
has proven extremely useful for diagnosing that IB (or less often GM  
or
MX) isn't properly configured on a compute node due to some random  
error.

It's trivially easy for any packaging group to have the line

  btl_base_warn_component_unused = 0

added to $prefix/etc/openmpi-mca-params.conf during the install  
phase of

the package build (indeed, our simple build scripts at LANL used to do
this on a regular bases due to our need to tweek the OOB to keep IPoIB
happier at scale).

I think keeping the Debian guys happy is a good thing.  Giving them an
easy way to turn off silly warnings is a good thing.  Removing a known
useful warning to help them doesn't seem like a good thing.



I guess that this is what I am torn about.  Yes, it's a useful message  
-- in some cases.  But now that libibverbs is shipping in Debain and  
other Linuxes, the number of machines out there with verbs-capable  
hardware is far, far smaller than the number of machines without verbs- 
capable hardware.  Specifically:


1. The number of cases where seeing the message by default is *not*  
useful is now potentially [much] larger than the number of cases where  
the default message is useful.


2. An out-of-the-box "mpirun a.out" will print warning messages in  
perfectly valid/good configurations (no verbs-capable hardware, but  
just happen to have libibverbs installed).  This is a Big Deal.


3. Problems with HCA hardware and/or verbs stack are uncommon  
(nowadays).  I'd be ok asking someone to enable a debug flag to get  
more information on configuration problems or hardware faults.


Shouldn't we be optimizing for the common case?

In short: I think it's no longer safe to assume that machines with  
libibverbs installed must also have verbs-capable hardware.


  




Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Pavel Shamis (Pasha)
I agree with Brian. We may add to the warning message a detailed 
description of how to disable it.
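
For example, the warning could end with a hint along these lines (the 
wording is just an illustration; the parameter itself already exists, as 
Brian notes below):

  This warning can be disabled by adding the following line to
  $prefix/etc/openmpi-mca-params.conf (or by passing it via --mca):

      btl_base_warn_component_unused = 0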


Pasha

Brian W. Barrett wrote:
I think having a parameter to turn off the warning is a great idea.  So 
great in fact, that it already exists in the trunk and v1.2 :)!  Setting 
the default value for the btl_base_warn_component_unused flag from 0 to 1 
will have the desired effect.


I'm not sure I agree with setting the default to 0, however.  The warning 
has proven extremely useful for diagnosing that IB (or less often GM or 
MX) isn't properly configured on a compute node due to some random error. 
It's trivially easy for any packaging group to have the line


   btl_base_warn_component_unused = 0

added to $prefix/etc/openmpi-mca-params.conf during the install phase of 
the package build (indeed, our simple build scripts at LANL used to do 
this on a regular bases due to our need to tweek the OOB to keep IPoIB 
happier at scale).


I think keeping the Debian guys happy is a good thing.  Giving them an 
easy way to turn off silly warnings is a good thing.  Removing a known 
useful warning to help them doesn't seem like a good thing.



Brian


On Wed, 21 May 2008, Jeff Squyres wrote:

  

What: Change default in openib BTL to not complain if no OpenFabrics
devices are found

Why: Many linuxes are shipping libibverbs these days, but most users
still don't have OpenFabrics hardware

Where: btl_openib_component.c

When: For v1.3

Timeout: Teleconf, 27 May 2008

Short version
=

Many major linuxes are shipping libibverbs by default these days.
OMPI will therefore build the openib BTL by default, but then
complains at run time when there's no OpenFabrics hardware.

We should change the default in v1.3 to not complain if there is no
OpenFabrics devices found (perhaps have an MCA param to enable the
warning if desired).

Longer version
==

I just got a request from the Debian Open MPI package maintainers to
include the following in the default openmpi-mca-params.conf for the
OMPI v1.2 package:

# Disable the use of InfiniBand
#   btl = ^openib

Having this in the openmpi-mca-params.conf gives Debian an easy
documentation path for users to shut up these warnings when they build
on machines with libibverbs present but no OpenFabrics hardware.

I think that this is fine for the v1.2 series (and will file a CMR for
it).  But for v1.3, I think we should change the default.

The vast majority of users will not have OpenFabrics devices, and we
should therefore not complain if we can't find any at run-time.  We
can/should still complain if we find OpenFabrics devices but no active
ports (i.e., don't change this behavior).

But for optimizing the common case: I think we should (by default) not
print a warning if no OpenFabrics devices are found.  We can also
[easily] have an MCA parameter that *will* display a warning if no
OpenFabrics devices are found.




___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

  




Re: [OMPI devel] Threaded progress for CPCs

2008-05-20 Thread Pavel Shamis (Pasha)


Is it possible to have sane SRQ implementation without HW flow  
control?



It seems pretty unlikely if the only available HW flow control is to  
terminate the connection.  ;-)


  

Even if we can get the iWARP semantics to work, this feels kinda
icky.  Perhaps I'm overreacting and this isn't a problem that needs  
to

be fixed -- after all, this situation is no different than what
happens after the initial connection, but it still feels icky.
  
What is so icky about it? Sender is faster than a receiver so flow  
control

kicks in.



My point is that we have no real flow control for SRQ.

  

2. The CM progress thread posts its own receive buffers when creating
a QP (which is a necessary step in both CMs).  However, this is
problematic in two cases:

  

[skip]

I don't like 1,2 and 3. :(



4. Have a separate mpool for drawing initial receive buffers for the
CM-posted RQs.  We'd probably want this mpool to be always empty (or
close to empty) -- it's ok to be slow to allocate / register more
memory when a new connection request arrives.  The memory obtained
from this mpool should be able to be returned to the "main" mpool
after it is consumed.
  

This is slightly better, but still...



Agreed; my reactions were pretty much the same as yours.

  

5. ...?
  

What about moving posting of receive buffers into main thread. With
SRQ it is easy: don't post anything in CPC thread. Main thread will
prepost buffers automatically after first fragment received on the
endpoint (in btl_openib_handle_incoming()). With PPRQ it's more
complicated. What if we'll prepost dummy buffers (not from free list)
during IBCM connection stage and will run another three way handshake
protocol using those buffers, but from the main thread. We will need  
to
prepost one buffer on the active side and two buffers on the passive  
side.




This is probably the most viable alternative -- it would be easiest if  
we did this for all CPC's, not just for IBCM:


- for PPRQ: CPCs only post a small number of receive buffers, suitable  
for another handshake that will run in the upper-level openib BTL
- for SRQ: CPCs don't post anything (because the SRQ already "belongs"  
to the upper level openib BTL)
  
Currently iWARP does not have SRQ at all, and IMHO SRQ is not 
possible without HW flow control.

So let's resolve the problem only for PPRQ?

Do we have a BSRQ restriction that there *must* be at least one PPRQ?   
  

No, there is no such restriction.
If so, we could always run the upper-level openib BTL really-post-the- 
buffers handshake over the smallest buffer size BSRQ RC PPRQ (i.e.,  
have the CPC post a single receive on this QP -- see below), which  
would make things much easier.  If we don't already have this  
restriction, would we mind adding it?  We have one PPRQ in our default  
receive_queues value, anyway.
  

I don't see a reason to add such a restriction, at least for IB.
We may add it for iWARP only (actually we already have it for iWARP).
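
For reference, a receive_queues value whose first entry is a per-peer RQ, 
as Jeff mentions above, would look roughly like the line below; the P/S 
prefixes follow the btl_openib_receive_queues syntax and the numbers are 
purely illustrative:

mpirun --mca btl_openib_receive_queues P,128,256:S,65536,256 ...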



Re: [OMPI devel] Threaded progress for CPCs

2008-05-19 Thread Pavel Shamis (Pasha)




What about moving posting of receive buffers into main thread. With
SRQ it is easy: don't post anything in CPC thread. Main thread will
prepost buffers automatically after first fragment received on the
endpoint (in btl_openib_handle_incoming()).   
It still doesn't guaranty that we will not see RNR (as I understand 
we trying to resolve this problem  for iwarp?!)




I don't think that iwarp has SRQ at all. And if it has then it should
have HW flow control for it too. I don't see what advantage SRQ without
flow control can provide over PPRQ.   

I agree that without HW flow control there is no reason for SRQ.
 
So this solution will cost 1 buffer on each srq ... sounds 
acceptable for me. But I don't see too much
difference compared to #1, as I understand  we anyway will be need 
the pipe for communication with main thread.

so why don't use #1 ?


What communication? No communication at all. Just don't prepost buffers
to SRQ during connection establishment. Problem solved (only for SRQ of
cause).  
As far as I know, Jeff uses the pipe for some status updates (Jeff, 
please correct me if I am wrong).

If we still need the pipe for communication, I prefer #1.
If we don't have the pipe, I prefer your solution.

Pasha



Re: [OMPI devel] Threaded progress for CPCs

2008-05-19 Thread Pavel Shamis (Pasha)


1. When CM progress thread completes an incoming connection, it sends  
a command down a pipe to the main thread indicating that a new  
endpoint is ready to use.  The pipe message will be noticed by  
opal_progress() in the main thread and will run a function to do all  
necessary housekeeping (sets the endpoint state to CONNECTED, etc.).   
But it is possible that the receiver process won't dip into the MPI  
layer for a long time (and therefore not call opal_progress and the  
housekeeping function).  Therefore, it is possible that with an active  
sender and a slow receiver, the sender can overwhelm an SRQ.  On IB,  
this will just generate RNRs and be ok (we configure SRQs to have  
infinite RNRs), but I don't understand the semantics of what will  
happen on iWARP (it may terminate?  I sent an off-list question to  
Steve Wise to ask for detail -- we may have other issues with SRQ on  
iWARP if this is the case, but let's skip that discussion for now).




Is it possible to have sane SRQ implementation without HW flow control?
Anyway the described problem exists with SRQ right now too. If receiver
doesn't enter progress for a long time sender can overwhelm an SRQ.
I don't see how this can be fixed without progress thread (and I am not
even sure that this is the problem that has to be fixed).
  
It may be partially resolved by srq_limit_event (this event is 
generated when the number of posted receive buffers drops below a 
predefined watermark).
But I'm not sure that we want to move the RNR problem from the sender 
side to the receiver.


The full solution would be a progress thread + srq_limit_event.

  
Even if we can get the iWARP semantics to work, this feels kinda  
icky.  Perhaps I'm overreacting and this isn't a problem that needs to  
be fixed -- after all, this situation is no different than what  
happens after the initial connection, but it still feels icky.


What is so icky about it? Sender is faster than a receiver so flow control
kicks in.

  
2. The CM progress thread posts its own receive buffers when creating  
a QP (which is a necessary step in both CMs).  However, this is  
problematic in two cases:




[skip]
 
I don't like 1,2 and 3. :(
  

If iWARP can handle RNR, #1 sounds OK to me, at least for 1.3.
  

4. Have a separate mpool for drawing initial receive buffers for the
CM-posted RQs.  We'd probably want this mpool to be always empty (or
close to empty) -- it's ok to be slow to allocate / register more
memory when a new connection request arrives.  The memory obtained
from this mpool should be able to be returned to the "main" mpool
after it is consumed.



This is slightly better, but still...

  

5. ...?


What about moving posting of receive buffers into main thread. With
SRQ it is easy: don't post anything in CPC thread. Main thread will
prepost buffers automatically after first fragment received on the
endpoint (in btl_openib_handle_incoming()). 
It still doesn't guarantee that we will not see RNR (as I understand it, 
we are trying to resolve this problem for iWARP?!).


So this solution will cost 1 buffer on each SRQ... sounds acceptable 
to me. But I don't see much difference compared to #1; as I understand 
it, we will need the pipe for communication with the main thread anyway.

So why not use #1?

With PPRQ it's more
complicated. What if we'll prepost dummy buffers (not from free list)
during IBCM connection stage and will run another three way handshake
protocol using those buffers, but from the main thread. We will need to
prepost one buffer on the active side and two buffers on the passive side.

--
Gleb.
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

  




Re: [OMPI devel] heterogeneous OpenFabrics adapters

2008-05-13 Thread Pavel Shamis (Pasha)

Jeff,
Your proposal for 1.3 sounds OK to me.

For 1.4 we need to review this point again. The QP information is split 
across 3 different structs:
mca_btl_openub_module_qp_t (used by the module), mca_btl_openib_qp_t (used 
by the component) and mca_btl_openib_endpoint_qp_t (used by the endpoint).
We need to see how we will resolve the issue for each of them. Let's put 
it on the 1.4 todo list.


Pasha.
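
P.S. For reference, the INI-file idea Jeff describes below would 
presumably end up looking something like the snippet here; the key names 
follow the existing vendor/part-ID entries in the HCA parameter file, 
while the section name, IDs and receive_queues value are made up purely 
for illustration:

[Some iWARP HCA]
vendor_id = 0x1234
vendor_part_id = 5678
receive_queues = P,65536,256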


Jeff Squyres wrote:

Short version:
--

I propose that we should disallow multiple different  
mca_btl_openib_receive_queues values (or receive_queues values from  
the INI file) to be used in a single MPI job for the v1.3 series.


More details:
-

The reason I'm looking into this heterogeneity stuff is to help  
Chelsio support their iWARP NIC in OMPI.  Their NIC needs a specific  
value for mca_btl_openib_receive_queues (specifically: it does not  
support SRQ and it has the wireup race condition that we discussed  
before).


The major problem is that all the BSRQ information is currently stored  
in on the openib component -- it is *not* maintained on a per-HCA (or  
per port) basis.  We *could* move all the BSRQ info to live on the  
hca_t struct (or even the openib module struct), but it has at least 3  
big consequences:


1. It would touch a lot of code.  But touching all this code is  
relatively low risk; it will be easy to check for correctness because  
the changes will either compile or not.


2. There are functions (some of which are static inline) that read the  
BSRQ data.  These functions would have to take an additional (hca_t*)  
(or (btl_openib_module_t*)) parameter.


3. Getting to the BSRQ info will take at least 1 or 2 more  
dereferences (e.g., module->hca->bsrq_info or module->bsrq_info...).


I'm not too concerned about #1 (it's grunt work), but I am a bit  
concerned about #2 and #3 since at least some of these places are in  
the critical performance path.


Given these concerns, I propose the following v1.3:

- Add a "receive_queues" field to the INI file so that the Chelsio  
adapter can run out of the box (i.e., "mpirun -np 4 a.out" with hosts  
containing Chelsio NICs will get a value for btl_openib_receive_queues  
that will work).


- NetEffect NICs will also require overriding  
btl_openib_receive_queues, but will likely have a different value than  
Chelsio NICs (they don't have the wireup race condition that Chelsio  
does).


- Because the BSRQ info is on the component (i.e., global), we should  
detect when multiple different receive_queues values are specified and  
gracefully abort.


I think it'll be quite uncommon to have a need for two different  
receive_queues values, and that this proposal will be fine for v1.3


Comments?



On May 12, 2008, at 6:44 PM, Jeff Squyres wrote:

  

After looking at the code a bit, I realized that I completely forgot
that the INI file was invented to solve at least the heterogeneous-
adapters-in-a-host problem.

So I amended the ticket to reflect that that problem is already
solved.  The other part is not, though -- consider two MPI procs on
different hosts, each with an iWARP NIC, but one NIC supports SRQs and
one does not.


On May 12, 2008, at 5:36 PM, Jeff Squyres wrote:



I think that this issue has come up before, but I filed a ticket
about it because at least one developer (Jon) has a system with both
IB and iWARP adapters:

  https://svn.open-mpi.org/trac/ompi/ticket/1282

My question: do we care about the heterogeneous adapter scenarios?
For v1.3?  For v1.4?  For ...some version in the future?

I think the first issue I identified in the ticket is grunt work to
fix (annoying and tedious, but not difficult), but the second one
will be a little dicey -- it has scalability issues (e.g., sending
around all info in the modex, etc.).

--
Jeff Squyres
Cisco Systems

  

--
Jeff Squyres
Cisco Systems

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




  



--
Pavel Shamis (Pasha)
Mellanox Technologies



Re: [OMPI devel] Merging in the CPC work

2008-04-24 Thread Pavel Shamis (Pasha)

The trivial tests pass and now I'm running full testing.
In the non-XRC tests I got:

libibcm: unable to open /dev/infiniband/ucm0
libibcm: couldn't read ABI version

But the tests PASS successfully. So as I understand it, they use OOB. Can 
we prevent the message somehow?


Jeff Squyres wrote:
Thanks!  That's a result of some [helpful] error messages and handling 
that I added yesterday when ibcm is not configured on the host.


Fixed in r18273.


On Apr 24, 2008, at 8:22 AM, Pavel Shamis (Pasha) wrote:


The patch below resolves the segfault:

--- btl_openib_connect_ibcm.c.orig  2008-04-24 15:14:28.500676000 +0300
+++ btl_openib_connect_ibcm.c   2008-04-24 15:15:08.961168000 +0300
@@ -328,7 +328,7 @@
{
   int rc;
   modex_msg_t *msg;
-ibcm_module_t *m;
+ibcm_module_t *m = NULL;
   opal_list_item_t *item;
   ibcm_listen_cm_id_t *cmh;
   ibcm_module_list_item_t *imli;


Jeff Squyres wrote:
I had a linker error with the rdmacm library yesterday that I fixed 
later, sorry.


Could you try it again?  You'll need to svn up, re-autogen, etc.  It 
should be obvious whether I fixed it -- even trivial apps will work 
or not work.


Thanks.


On Apr 24, 2008, at 6:24 AM, Gleb Natapov wrote:


On Thu, Apr 24, 2008 at 11:50:10AM +0300, Pavel Shamis (Pasha) wrote:

Jeff,
All my tests fail.
XRC disabled tests failed with:
mtt/installs/Zq_9/install/lib/openmpi/mca_btl_openib.so: undefined
symbol: rdma_create_event_channel
XRC enabled failed with segfault , I will take a look later today.

Well it is a little bit better for me. I compiled only OOB connection
manager and ompi passes simple testing.



Pasha

Jeff Squyres wrote:

As we discussed yesterday, I have started the merge from the /tmp-
public/openib-cpc2 branch.  "oob" is currently the default.

Unfortunately, it caused quite a few conflicts when I merged with 
the
trunk, so I created a new temp branch and put all the work there: 
/tmp-

public/openib-cpc3.

Could all the IB and iWARP vendors and any other interested parties
please try this branch before we bring it back to the trunk?  Please
test all functionality that you care about -- XRC, etc.  I'd like to
bring it back to the trunk COB Thursday.  Please let me know if this
is too soon.

You can force the selection of a different CPC with the
btl_openib_cpc_include MCA param:

   mpirun --mca btl_openib_cpc_include oob ...
   mpirun --mca btl_openib_cpc_include xoob ...
   mpirun --mca btl_openib_cpc_include rdma_cm ...
   mpirun --mca btl_openib_cpc_include ibcm ...

You might want to concentrate on testing oob and xoob to ensure that
we didn't cause any regressions.  The ibcm and rdma_cm CPCs probably
still have some rough edges (and the IBCM package in OFED itself may
not be 100% -- that's one of the things we're evaluating.  It's 
known

to not install properly on RHEL4U4, for example -- you have to
manually mknod and chmod a device in /dev/infiniband for every 
HCA in

the host).

Thanks.





--
Pavel Shamis (Pasha)
Mellanox Technologies

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


--
   Gleb.
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel






--
Pavel Shamis (Pasha)
Mellanox Technologies







--
Pavel Shamis (Pasha)
Mellanox Technologies



Re: [OMPI devel] Merging in the CPC work

2008-04-24 Thread Pavel Shamis (Pasha)

The patch below resolves the segfault:

--- btl_openib_connect_ibcm.c.orig  2008-04-24 15:14:28.500676000 +0300
+++ btl_openib_connect_ibcm.c   2008-04-24 15:15:08.961168000 +0300
@@ -328,7 +328,7 @@
{
int rc;
modex_msg_t *msg;
-ibcm_module_t *m;
+ibcm_module_t *m = NULL;
opal_list_item_t *item;
ibcm_listen_cm_id_t *cmh;
ibcm_module_list_item_t *imli;
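
A minimal standalone sketch of why this one-line change matters (hypothetical
names, not the actual ibcm CPC code): if an error path runs before the pointer
is ever assigned, the cleanup code frees whatever garbage happens to be on the
stack unless the pointer starts out as NULL.

#include <stdlib.h>

struct module { int dummy; };

static int create_module(int fail_early, struct module **out)
{
    struct module *m = NULL;    /* without "= NULL" this holds garbage */

    if (fail_early) {
        goto error;             /* taken before m is ever assigned */
    }
    m = malloc(sizeof(*m));
    if (NULL == m) {
        goto error;
    }
    *out = m;
    return 0;

error:
    free(m);                    /* safe only because m was initialized */
    return -1;
}

int main(void)
{
    struct module *mod;
    return (-1 == create_module(1, &mod)) ? 0 : 1;
}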


Jeff Squyres wrote:
I had a linker error with the rdmacm library yesterday that I fixed 
later, sorry.


Could you try it again?  You'll need to svn up, re-autogen, etc.  It 
should be obvious whether I fixed it -- even trivial apps will work or 
not work.


Thanks.


On Apr 24, 2008, at 6:24 AM, Gleb Natapov wrote:


On Thu, Apr 24, 2008 at 11:50:10AM +0300, Pavel Shamis (Pasha) wrote:

Jeff,
All my tests fail.
XRC disabled tests failed with:
mtt/installs/Zq_9/install/lib/openmpi/mca_btl_openib.so: undefined
symbol: rdma_create_event_channel
The XRC-enabled tests failed with a segfault; I will take a look later today.

Well, it is a little bit better for me. I compiled only the OOB connection
manager and OMPI passes simple testing.



Pasha

Jeff Squyres wrote:

As we discussed yesterday, I have started the merge from the /tmp-
public/openib-cpc2 branch.  "oob" is currently the default.

Unfortunately, it caused quite a few conflicts when I merged with the
trunk, so I created a new temp branch and put all the work there:
/tmp-public/openib-cpc3.

Could all the IB and iWARP vendors and any other interested parties
please try this branch before we bring it back to the trunk?  Please
test all functionality that you care about -- XRC, etc.  I'd like to
bring it back to the trunk COB Thursday.  Please let me know if this
is too soon.

You can force the selection of a different CPC with the
btl_openib_cpc_include MCA param:

mpirun --mca btl_openib_cpc_include oob ...
mpirun --mca btl_openib_cpc_include xoob ...
mpirun --mca btl_openib_cpc_include rdma_cm ...
mpirun --mca btl_openib_cpc_include ibcm ...

You might want to concentrate on testing oob and xoob to ensure that
we didn't cause any regressions.  The ibcm and rdma_cm CPCs probably
still have some rough edges (and the IBCM package in OFED itself may
not be 100% -- that's one of the things we're evaluating.  It's known
to not install properly on RHEL4U4, for example -- you have to
manually mknod and chmod a device in /dev/infiniband for every HCA in
the host).

Thanks.





--
Pavel Shamis (Pasha)
Mellanox Technologies

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


--
Gleb.
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel






--
Pavel Shamis (Pasha)
Mellanox Technologies



Re: [OMPI devel] Merging in the CPC work

2008-04-24 Thread Pavel Shamis (Pasha)

Jeff,
All my tests fail.
XRC disabled tests failed with:
mtt/installs/Zq_9/install/lib/openmpi/mca_btl_openib.so: undefined 
symbol: rdma_create_event_channel

The XRC-enabled tests failed with a segfault; I will take a look later today.

Pasha

Jeff Squyres wrote:
As we discussed yesterday, I have started the merge from the /tmp- 
public/openib-cpc2 branch.  "oob" is currently the default.


Unfortunately, it caused quite a few conflicts when I merged with the  
trunk, so I created a new temp branch and put all the work there: /tmp- 
public/openib-cpc3.


Could all the IB and iWARP vendors and any other interested parties  
please try this branch before we bring it back to the trunk?  Please  
test all functionality that you care about -- XRC, etc.  I'd like to  
bring it back to the trunk COB Thursday.  Please let me know if this  
is too soon.


You can force the selection of a different CPC with the  
btl_openib_cpc_include MCA param:


 mpirun --mca btl_openib_cpc_include oob ...
 mpirun --mca btl_openib_cpc_include xoob ...
 mpirun --mca btl_openib_cpc_include rdma_cm ...
 mpirun --mca btl_openib_cpc_include ibcm ...

You might want to concentrate on testing oob and xoob to ensure that  
we didn't cause any regressions.  The ibcm and rdma_cm CPCs probably  
still have some rough edges (and the IBCM package in OFED itself may  
not be 100% -- that's one of the things we're evaluating.  It's known  
to not install properly on RHEL4U4, for example -- you have to  
manually mknod and chmod a device in /dev/infiniband for every HCA in  
the host).


Thanks.

  



--
Pavel Shamis (Pasha)
Mellanox Technologies



Re: [OMPI devel] Trunk launch scaling

2008-04-02 Thread Pavel Shamis (Pasha)

Ralph,
If you plan to compare OMPI to MVAPICH, make sure to take version 1.0.0
(or above). In 1.0.0 OSU introduced a new launcher that works much faster
than the previous one.

Regards,
Pasha

Ralph H Castain wrote:

Per this morning's telecon, I have added the latest scaling test results to
the wiki:

https://svn.open-mpi.org/trac/ompi/wiki/ORTEScalabilityTesting

As you will see upon review, the trunk is scaling about an order of
magnitude better than 1.2.x, both in terms of sheer speed and in the
strength of the non-linear components of the scaling law. Those of us
working on scaling issues expect to make additional improvements over the
next few weeks.

Update results will be posted to the wiki as they become available.

Ralph


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

  



--
Pavel Shamis (Pasha)
Mellanox Technologies



Re: [OMPI devel] Setting CQ depth

2008-02-26 Thread Pavel Shamis (Pasha)

Please apply. It should be cqe.
Pasha.

Jon Mason wrote:

A quick sanity check.

When setting the cq depth in the openib btl, it checks the calculated
depth against the maximum cq depth allowed and sets the minimum of those
two.  However, I think it is checking the wrong variable.  If I
understand correctly, ib_dev_attr.max_cq represents the maximum number
of cqs while ib_dev_attr.max_cqe represents the max depth allowed in
each individual cq.  Is this correct?

If the above is true, then I'll apply the patch below.

Thanks,
Jon

Index: ompi/mca/btl/openib/btl_openib.c
===
--- ompi/mca/btl/openib/btl_openib.c(revision 17472)
+++ ompi/mca/btl/openib/btl_openib.c(working copy)
@@ -140,8 +140,8 @@
  if(cq_size < mca_btl_openib_component.ib_cq_size[cq])
 cq_size = mca_btl_openib_component.ib_cq_size[cq];

-if(cq_size > (uint32_t)hca->ib_dev_attr.max_cq)
-cq_size = hca->ib_dev_attr.max_cq;
+if(cq_size > (uint32_t)hca->ib_dev_attr.max_cqe)
+cq_size = hca->ib_dev_attr.max_cqe;

 if(NULL == hca->ib_cq[cq]) {
 hca->ib_cq[cq] = ibv_create_cq_compat(hca->ib_dev_context, cq_size,
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

  



--
Pavel Shamis (Pasha)
Mellanox Technologies
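
A minimal standalone sketch of the distinction discussed above (illustrative
only; it uses plain libibverbs calls rather than the openib BTL code, and the
starting depth of 65536 is an arbitrary example value): max_cq bounds how many
CQs a device supports in total, while max_cqe bounds the depth of a single CQ,
so the requested depth has to be clamped against max_cqe.

#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num_devices = 0;
    struct ibv_device **devs = ibv_get_device_list(&num_devices);
    if (NULL == devs || 0 == num_devices) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_device_attr attr;
    if (NULL == ctx || ibv_query_device(ctx, &attr)) {
        fprintf(stderr, "failed to open/query device\n");
        return 1;
    }

    int cq_size = 65536;                /* whatever depth was computed */
    if (cq_size > attr.max_cqe) {
        cq_size = attr.max_cqe;         /* clamp to per-CQ limit, not max_cq */
    }

    struct ibv_cq *cq = ibv_create_cq(ctx, cq_size, NULL, NULL, 0);
    printf("device allows %d CQs of up to %d entries; created CQ of %d\n",
           attr.max_cq, attr.max_cqe, cq_size);

    if (NULL != cq) ibv_destroy_cq(cq);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}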



Re: [OMPI devel] 1.3 Release schedule and contents

2008-02-21 Thread Pavel Shamis (Pasha)

Brad,
The APM code was committed to the trunk.
So you may mark it as done.

Thanks,
Pasha.

Brad Benton wrote:

All:

The latest scrub of the 1.3 release schedule and contents is ready for 
review and comment.  Please use the following links:
  1.3 milestones:  
https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3
  1.3.1 milestones: 
https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3.1


In order to try and keep the dates for 1.3 in, I've pushed a bunch of 
stuff (particularly ORTE things) to 1.3.1.  Even though there will be 
new functionality slated for 1.3.1, the goal is to not have any 
interface changes between the phases.


Please look over the list and schedules and let me or my fellow 1.3 
co-release manager George Bosilca (bosi...@eecs.utk.edu) know of any issues, errors, 
suggestions, omissions, heartburn, etc.


Thanks,
--Brad

Brad Benton
IBM


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Pavel Shamis (Pasha)
Mellanox Technologies



Re: [OMPI devel] open ib btl and xrc

2008-01-20 Thread Pavel Shamis (Pasha)


It is easy to see the benefit of fewer QPs (per node instead of per peer)
and less resource consumption, but I am curious about the actual
percentage of memory footprint decrease. I am thinking that the largest
portion of the footprint comes from the fragments.
BTW, here is a link to another paper that talks about more efficient
usage of receive buffers:
http://www.cs.sandia.gov/~rbbrigh/papers/ompi-ib-pvmmpi07.pdf
Do you have any numbers showing the actual memory footprint savings 
when using xrc?

I don't have any.

Pasha.


-DON

Pavel Shamis (Pasha) wrote:
Here is a paper from OpenIB:
http://www.openib.org/archives/nov2007sc/XRC.pdf
and here is an MVAPICH presentation:
http://mvapich.cse.ohio-state.edu/publications/ofa_nov07-mvapich-xrc.pdf

Bottom line: XRC decreases the number of QPs that OMPI opens and as a
result decreases OMPI's memory footprint.
In the OpenIB paper you can find more details about XRC. If you need more
details about the XRC implementation in the openib BTL, please let me know.


Don Kerr wrote:
 

Hi,

After searching, about the only thing I can find on XRC is what it
stands for. Can someone explain the benefits of Open MPI's use of
XRC, maybe point me to a paper, or both?


TIA
-DON

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

  



  





--
Pavel Shamis (Pasha)
Mellanox Technologies



Re: [OMPI devel] open ib btl and xrc

2008-01-17 Thread Pavel Shamis (Pasha)

Here is a paper from OpenIB: http://www.openib.org/archives/nov2007sc/XRC.pdf
and here is an MVAPICH presentation:
http://mvapich.cse.ohio-state.edu/publications/ofa_nov07-mvapich-xrc.pdf

Bottom line: XRC decreases the number of QPs that OMPI opens and as a
result decreases OMPI's memory footprint.
In the OpenIB paper you can find more details about XRC. If you need more
details about the XRC implementation in the openib BTL, please let me know.
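
A simplified back-of-the-envelope model of the QP-count argument (illustrative
numbers only, not measurements; it ignores the multiple QPs per peer that the
openib BTL can be configured to use, and assumes on-node peers go through
shared memory):

#include <stdio.h>

int main(void)
{
    const int nodes = 64;
    const int ppn   = 8;                 /* processes per node */
    const long procs = (long)nodes * ppn;

    /* RC: one QP per remote process; XRC: roughly one send QP per
     * remote node, since receive contexts are shared on each node. */
    long rc_qps_per_proc  = procs - ppn;
    long xrc_qps_per_proc = nodes - 1;

    printf("RC : %ld QPs per process, %ld total\n",
           rc_qps_per_proc, rc_qps_per_proc * procs);
    printf("XRC: %ld QPs per process, %ld total\n",
           xrc_qps_per_proc, xrc_qps_per_proc * procs);
    return 0;
}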


Don Kerr wrote:

Hi,

After searching, about the only thing I can find on XRC is what it
stands for. Can someone explain the benefits of Open MPI's use of XRC,
maybe point me to a paper, or both?


TIA
-DON

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

  



--
Pavel Shamis (Pasha)
Mellanox Technologies



Re: [OMPI devel] Open IB BTL development question

2008-01-17 Thread Pavel Shamis (Pasha)

I plan to add IB APM support  (not something specific to OFED)

Don Kerr wrote:
Looking at the list of new features for OFED 1.3 and seeing that support
for XRC went into the trunk, I am curious whether support for additional
OFED 1.3 features will be, or is planned to be, included in Open MPI.

I am looking at the list of features here: 
http://64.233.167.104/search?q=cache:RXXOrY36QHcJ:www.openib.org/archives/nov2007sc/OFED%25201.3%2520status.ppt+ofed+1.3+feature&hl=en&ct=clnk&cd=3&gl=us&client=firefox-a
but I do not have any specific feature in mind; I just wanted to get an
idea of what others are planning.


Thanks
-DON
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

  



--
Pavel Shamis (Pasha)
Mellanox Technologies



Re: [OMPI devel] [PATCH] openib btl: extensable cpc selection enablement

2008-01-14 Thread Pavel Shamis (Pasha)

Jon Mason wrote:
  

I have a few machines with ConnectX and I will try to run MTT on Sunday.



Awesome!  I appreciate it.

  
After fixing the compilation problem in the XRC part of the code I was
able to run MTT. Most of the tests pass, but one test failed:
mpi2c++_dynamics_test. The test passes without XRC, but I also see the
test failing on the trunk. The last version where it works is 1.3a1r17085.

Strange.
Pasha.


Re: [OMPI devel] [PATCH] openib btl: extensable cpc selection enablement

2008-01-10 Thread Pavel Shamis (Pasha)

Jon Mason wrote:

The new cpc selection framework is now in place.  The patch below allows
for dynamic selection of cpc methods based on what is available.  It
also allows for inclusion/exclusion of methods.  It even further allows
for modifying the priorities of certain cpc methods to better determine
the optimal cpc method.

This patch also contains XRC compile time disablement (per Jeff's
patch).
  
We need to make sure that the CM stuff is disabled at compile time if it
is not installed on the machine.

At a high level, the cpc selections works by walking through each cpc
and allowing it to test to see if it is permissible to run on this
mpirun.  It returns a priority if it is permissible or a -1 if not.  All
of the cpc names and priorities are rolled into a string.  This string
is then encapsulated in a message and passed around all the ompi
processes.  Once received and unpacked, the list received is compared
to a local copy of the list.  The connection method is chosen by
comparing the lists passed around to all nodes via modex with the list
generated locally.  Any non-negative number is a potentially valid
connection method.  The method below of determining the optimal
connection method is to take the cross-section of the two lists.  The
highest single value (and the other side being non-negative) is selected
as the cpc method.

Please test it out.  The tree can be found at
https://svn.open-mpi.org/svn/ompi/tmp-public/openib-cpc/

This patch has been tested with IB and iWARP adapters on a 2 node system
(with it correctly choosing to use oob and happily ignoring iWARP
adapters).  It needs XRC testing and testing of larger node systems.
  

Did you run MTT over all these changes?
I have a few machines with ConnectX and I will try to run MTT on Sunday.


--
Pavel Shamis (Pasha)
Mellanox Technologies
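
A minimal sketch of the cross-section selection described in the quoted text
above (hypothetical names and priorities, not the actual openib BTL code):
entries where either side reports a negative priority are discarded, and among
the remaining common methods the one with the highest single priority wins.

#include <stdio.h>
#include <string.h>

struct cpc { const char *name; int priority; };   /* priority < 0: unusable */

static const char *select_cpc(const struct cpc *local, int nl,
                              const struct cpc *remote, int nr)
{
    const char *best = NULL;
    int best_prio = -1;

    for (int i = 0; i < nl; ++i) {
        if (local[i].priority < 0) continue;
        for (int j = 0; j < nr; ++j) {
            if (0 != strcmp(local[i].name, remote[j].name)) continue;
            if (remote[j].priority < 0) continue;
            /* "highest single value" on either side, per the description */
            int prio = local[i].priority > remote[j].priority ?
                       local[i].priority : remote[j].priority;
            if (prio > best_prio) {
                best_prio = prio;
                best = local[i].name;
            }
        }
    }
    return best;
}

int main(void)
{
    struct cpc local[]  = { {"oob", 1}, {"rdma_cm", 30}, {"ibcm", 40} };
    struct cpc remote[] = { {"oob", 1}, {"rdma_cm", 30}, {"ibcm", -1} };

    const char *chosen = select_cpc(local, 3, remote, 3);
    printf("selected cpc: %s\n", chosen ? chosen : "(none)");  /* rdma_cm */
    return 0;
}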


