Re: [OMPI devel] mca_PROJECT_FRAMEWORK_COMPONENT_symbol vs. mca_FRAMEWORK_COMPONENT_symbol

2014-07-31 Thread Kenneth A. Lloyd
Yeah, I forgot that pure ANSI C doesn't really have namespaces, other than
to fully qualify modules and variables. Bummer.

Makes writing large, maintainable middleware more difficult.

-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Kenneth A.
Lloyd
Sent: Thursday, July 31, 2014 6:04 AM
To: 'Open MPI Developers'
Subject: Re: [OMPI devel] mca_PROJECT_FRAMEWORK_COMPONENT_symbol vs.
mca_FRAMEWORK_COMPONENT_symbol

Doesn't namespacing obviate the need for this convoluted identifier scheme?
See, for example, UML package import and include behaviors.

-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Dave Goodell
(dgoodell)
Sent: Wednesday, July 30, 2014 3:35 PM
To: Open MPI Developers
Subject: [OMPI devel] mca_PROJECT_FRAMEWORK_COMPONENT_symbol vs.
mca_FRAMEWORK_COMPONENT_symbol

Jeff and I were talking about some namespacing issues that have come up in
the recent BTL move from OMPI to OPAL.  AFAIK, the current system for
namespacing external symbols is to name them
"mca_FRAMEWORK_COMPONENT_symbol" (e.g., "mca_btl_tcp_add_procs" in the tcp
BTL).  Similarly, the DSO for the component is named
"mca_FRAMEWORK_COMPONENT.so" (e.g., "mca_btl_tcp.so").

Jeff asserted that the eventual goal is to move to a system where all MCA
frameworks/components are also prefixed by the project name.  So the above
examples become "mca_ompi_btl_tcp_add_procs" and "mca_ompi_btl_tcp.so".
Does anyone actually care about pursuing this goal?
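[Illustration, not from the original mail: the two schemes side by side,
sketched with a hypothetical "foo" component so as not to misstate any real
BTL's API; only the naming pattern matters here.]

/* current convention: mca_FRAMEWORK_COMPONENT_symbol, in mca_btl_foo.so */
int mca_btl_foo_component_init(void) { return 0; }

/* proposed convention: mca_PROJECT_FRAMEWORK_COMPONENT_symbol,
 * in mca_ompi_btl_foo.so */
int mca_ompi_btl_foo_component_init(void) { return 0; }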

I ask because if nobody wants to pursue the goal of adding project names to
namespaces then I already have an easy solution to most of our namespacing
problems.  OTOH, if someone does wish to pursue that goal, then I have a
namespace-related RFC that I would like to propose (in a subsequent email).

-Dave

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/07/15371.php



___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/07/15392.php





Re: [OMPI devel] mca_PROJECT_FRAMEWORK_COMPONENT_symbol vs. mca_FRAMEWORK_COMPONENT_symbol

2014-07-31 Thread Kenneth A. Lloyd
Doesn't namespacing obviate the need for this convoluted identifier scheme?
See, for example, UML package import and include behaviors.

-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Dave Goodell
(dgoodell)
Sent: Wednesday, July 30, 2014 3:35 PM
To: Open MPI Developers
Subject: [OMPI devel] mca_PROJECT_FRAMEWORK_COMPONENT_symbol vs.
mca_FRAMEWORK_COMPONENT_symbol

Jeff and I were talking about some namespacing issues that have come up in
the recent BTL move from OMPI to OPAL.  AFAIK, the current system for
namespacing external symbols is to name them
"mca_FRAMEWORK_COMPONENT_symbol" (e.g., "mca_btl_tcp_add_procs" in the tcp
BTL).  Similarly, the DSO for the component is named
"mca_FRAMEWORK_COMPONENT.so" (e.g., "mca_btl_tcp.so").

Jeff asserted that the eventual goal is to move to a system where all MCA
frameworks/components are also prefixed by the project name.  So the above
examples become "mca_ompi_btl_tcp_add_procs" and "mca_ompi_btl_tcp.so".
Does anyone actually care about pursuing this goal?

I ask because if nobody wants to pursue the goal of adding project names to
namespaces then I already have an easy solution to most of our namespacing
problems.  OTOH, if someone does wish to pursue that goal, then I have a
namespace-related RFC that I would like to propose (in a subsequent email).

-Dave

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/07/15371.php





Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-16 Thread Kenneth A. Lloyd
What about providing garbage collection for both POSIX and MPI threads? This
problem hints at several underlying layers of "programming faults".

-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Wednesday, July 16, 2014 8:59 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function
to opal

I discussed this over IM with Nathan to try and get a better understanding
of the options. Basically, we have two approaches available to us:

1. my solution resolves the segv problem and eliminates leaks so long as the
user calls MPI_Init/Finalize after calling the MPI_T init/finalize
functions. This method will still leak memory if the user doesn't use MPI
after calling the MPI_T functions, but does mean that all memory used by MPI
will be released upon MPI_Finalize. So if the user program continues beyond
MPI, they won't be carrying the MPI memory footprint with them. This
continues our current behavior.

2. the destructor method, which releases the MPI memory footprint upon final
program termination instead of at MPI_Finalize. This also solves the segv
and leak problems, and ensures that someone calling only the MPI_T
init/finalize functions will be valgrind-clean, but means that a user
program that runs beyond MPI will carry the MPI memory footprint with them.
This is a change in our current behavior.

I'm not sure which approach is best, but I think this captures the heart of
the decision.
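
[Illustration, not from the original mail: a minimal, self-contained sketch of
the mechanism option 2 relies on; the function name is made up and this is not
the proposed patch.]

#include <stdio.h>

/* Runs automatically at program termination (after main returns or exit() is
 * called), so cleanup happens even if MPI_Finalize is never reached.
 * Compiler support varies -- see the notes on xlc/pgi below. */
static void opal_like_cleanup(void) __attribute__((destructor));

static void opal_like_cleanup(void)
{
    printf("destructor: releasing leftover state\n");
}

int main(void)
{
    printf("main: exiting without an explicit finalize\n");
    return 0;   /* opal_like_cleanup still runs after this */
}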


On Jul 16, 2014, at 7:36 AM, Nathan Hjelm  wrote:

> On Wed, Jul 16, 2014 at 08:26:44AM -0600, Nathan Hjelm wrote:
>> A number of issues have been raised as part of this discussion. Here 
>> is what I have seen so far:
>> 
>> - constructor/destructor order not guaranteed: From an opal perspective
>>   this should not be a problem. Most components are unloaded by
>>   opal_finalize () not opal_finalize_util (). So opal components
>>   should already be finalized by the time the destructor is called
>>   (or we can finalize them in the destructor if necessary).
>> 
>> - portability: All the compilers most of us care about (gcc, intel,
>>   clang) support it. The exceptions appear to be xlc and pgi. For these
>>   compilers we can fall back on Ralph's solution and just leak if
>>   MPI_Finalize () is not called after MPI_T_Finalize (). Attached is an
>>   implementation that does that (needs some adjustment).
> 
> Correction. xlc does support the destructor function attribute. The 
> odd one out is PGI.
> 
> -Nathan
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15168.php

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/07/15170.php





Re: [OMPI devel] [OMPI svn] svn:open-mpi r31302 - in trunk: opal/mca/base orte/tools/orterun

2014-04-03 Thread Kenneth A. Lloyd
Would you consider a user-defined process language library outside of
OpenMPI? Process functors could be defined by compositions in this external
area, and maintenance of the language simply the user's responsibility?

-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Thursday, April 03, 2014 8:17 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] [OMPI svn] svn:open-mpi r31302 - in trunk:
opal/mca/base orte/tools/orterun

I can see the potential utility, but I do have concerns about how to make it
work without causing a lot of user problems:

* as currently implemented, it only affects procs launched via mpirun. This
seems odd - if the user does a direct launch, they would get totally
different behavior? Shouldn't the registering and parsing of this MCA param
follow our usual procedure and be done by the application itself instead of
by orterun?

* Imagine someone has entered this MCA param into the default MCA param
file, and that it includes "foo=car" in it. Now the user sets "foo=baz" in
their environment. How many hours will the user spend tearing out his/her
hair trying to understand why the application behavior isn't as expected
before they finally realize that the default MCA param file is messing with
their non-OMPI envars? Once they finally do figure it out, how do they
"zero" that MCA param so it isn't processed? We don't have a mechanism for
overriding a value with "NULL" - doesn't this option require one?

* again, someone puts an entry in the default MCA param file that includes
"foo=car". The user executes mpirun with "-x foo=baz", which is perfectly
legitimate. What is the precedence rule we use to determine the value of
foo? If we consolidate the two options (as you suggest), then this would be
alleviated - but one is an MCA param and the other a non-MCA cmd-line
option, so we would have to break backward compatibility to resolve it
(which isn't impossible - just worth a discussion).

* assume an entry in the MCA param file that includes multiple envars, one
of which is "foo=car". If the user then puts "-mca env_list foo=baz" on
their cmd line, do we delete all the other envars in the original entry and
only do the new one? Or would someone expect that only the new one would be
modified or added, but the others would remain?
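
[Illustration, not from the original mail: the collision described in the
bullets above, written out with the names used in this thread; the exact
parameter name and file syntax are whatever the change under review defines.]

# default MCA param file (e.g. etc/openmpi-mca-params.conf):
env_list = foo=car

# user's command line:
mpirun -x foo=baz -np 4 ./app

# which value of "foo" does the application end up seeing, and how would the
# user "zero" the file entry?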

Hence I think this requires some discussion at next week's call. Remember,
by policy, we don't forward non-MCA envars - but now we are forcibly setting
them only in the app procs. Strikes me as a major change in behavior, and
I'm not sure it won't cause more trouble than it solves.


On Apr 3, 2014, at 1:01 AM, Shamis, Pavel  wrote:

> 
>> The mca param file treats any key=val as an mca parameter only.
>> Adding parser support for something that is not an mca param would
require changing the file syntax, and it would look bad, i.e.:
>> 
>> mca btl = sm,self,openib
>> env DISPLAY = console:0
>> 
>> I think the current implementation is less intrusive and re-uses existing
infra in the most elegant way.
>> The param file syntax change is too big an effort to justify for this feature
(IMHO), which can be provided with the existing infra w/o breaking anything.
> 
> 
> IMHO this is a useful parameter option to have. If we can consolidate
> these two parameters (-x and the new one) into a single one, it might be
even more helpful.
> 
> Best,
> Pasha.
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/04/14452.php

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/04/14453.php





Re: [OMPI devel] autoconf warnings: openib BTL

2014-03-24 Thread Kenneth A. Lloyd
Vasily,

The problem you've identified of differing kernel versions is exacerbated by
also computing across hybrid, heterogeneous hardware architectures (e.g.
SMP & NUMA, different streaming processor architectures, or different shared
memory architectures).

======
Kenneth A. Lloyd, Jr.
CEO - Director, Systems Science
Watt Systems Technologies Inc.
Albuquerque, NM USA
www.wattsys.com
kenneth.ll...@wattsys.com

This e-mail is covered by the Electronic Communications Privacy Act, 18
U.S.C. 2510-2521, and is intended only for the addressee named above. It may
contain privileged or confidential information. If you are not the addressee
you must not copy, distribute, disclose or use any of the information in
this transmission. If you received it in error, please delete it and
immediately notify the sender.



-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Vasily Filipov
Sent: Monday, March 24, 2014 7:44 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] autoconf warnings: openib BTL

Actually I think if you build your job with one kernel version and run it on
nodes that have another version, then rdmacm will be the least of your
problems. Anyway, here is the revision that fixes the issue.


r31194 | vasily | 2014-03-24 15:36:04 +0200 (Mon, 24 Mar 2014) | 3 lines

BTL/OPENIB: remove AC_RUN_IFELSE from configure and check AF_IB support by
lib rdmacm during component_init.
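
[Illustration, not the code from r31194: one plausible shape for such a
runtime probe, trying a wildcard AF_IB bind through librdmacm; the function
name is made up and error details are omitted.]

#include <string.h>
#include <rdma/rdma_cma.h>
#include <rdma/ib.h>

static int af_ib_usable(void)
{
    struct rdma_event_channel *ch;
    struct rdma_cm_id *id;
    struct sockaddr_ib sib;
    int usable = 0;

    ch = rdma_create_event_channel();
    if (NULL == ch) {
        return 0;
    }
    if (0 == rdma_create_id(ch, &id, NULL, RDMA_PS_TCP)) {
        memset(&sib, 0, sizeof(sib));
        sib.sib_family = AF_IB;                      /* wildcard AF_IB address */
        if (0 == rdma_bind_addr(id, (struct sockaddr *) &sib)) {
            usable = 1;                              /* kernel/lib accept AF_IB */
        }
        rdma_destroy_id(id);
    }
    rdma_destroy_event_channel(ch);
    return usable;
}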




Thank you,
Vasily.

On 13-Mar-14 15:44, Ralph Castain wrote:
> I think the critical point is this one:
>
>> To be clear: whether AF_IB works or not is a determination to make on the
machines on which you *run* -- NOT on the machine on which you *build*.
> Many of our users compile on the frontend node of their cluster, which
doesn't even have an IB NIC installed (they only have the libraries present
so it can compile). You need to test this at run time to ensure you are on a
machine where someone actually is able to run rdmacm.
>
>
> On Mar 13, 2014, at 5:53 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com>
wrote:
>
>> On Mar 13, 2014, at 4:59 AM, Mike Dubman <mi...@dev.mellanox.co.il>
wrote:
>>
>>>>>> Right?  If so, I don't see why you need the AC_TRY_RUN -- if RDMACM
is easily detectable as to which way it is compiled (because it has, for
example, different fields), then AC_CHECK_DECLS should be enough, right?
>>> RDMACM API has different implementation requirements for its providers:
tcp, af_ib (different structs/fields should be used/passed. different
APIs/hooks should be called for bring-up).
>> Yes, this was said before.  Which is why I don't understand why
AC_CHECK_DECLS isn't enough -- it's a compile-time check, right?
>>
>> Let me get this straight:
>>
>> 1. AF_IB may or may not be present.
>> 2. If AF_IB is present, it may or may not work (i.e., support for AF_IB
may or may not work in the kernel).
>> 3. If AF_IB is present, you can only compile with the AF_IB fields and
methods.
>> 4. If AF_IB is not present, you can only compile with the non-AF_IB
fields and methods.
>>
>> I think #2 is not relevant for configure -- only #1, #3, and #4 are
relevant.  So you should have code something like this:
>>
>> #if HAVE_DECL_AF_IB
>> ret = do_the_stuff_with_af_ib(...);
>> if (OMPI_SUCCESS != ret) {
>> opal_show_help(...AF_IB doesn't work...);
>> return ret;
>> }
>> #else
>> ret = do_the_stuff_without_af_ib(...);
>> if (OMPI_SUCCESS != ret) {
>> opal_show_help(...non-AF_IB doesn't work...);
>> return ret;
>> }
>> #endif
>>
>> To be clear: whether AF_IB works or not is a determination to make on the
machines on which you *run* -- NOT on the machine on which you *build*.
>>
>> This is one of the key reasons that OMPI prefers run-time detection for
run-time characteristics over configure-time detection for run-time
characteristics (because you may run OMPI on different machines than where
you built OMPI).
>>
>>> Currently, the RDMACM provider can be selected at compile time only and
mpirun becomes incompatible to other RDMACM providers.
>> What does mpirun have to do with this?  We're talking about the openib
BTL, right?
>>
>>> AC_TRY_RUN is a protection that selected provider will be able to
run,otherwise no fallback to other provider will be available for user at
runtime.
>> I can't parse this statement...?
>>
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>> ___

Re: [OMPI devel] Open MPI shirts and more

2013-10-23 Thread Kenneth A. Lloyd
+1 - and my family has been notified for the holidays.

-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres
(jsquyres)
Sent: Friday, October 18, 2013 1:42 PM
To: Open MPI Developers List
Subject: [OMPI devel] Open MPI shirts and more

OMPI Developer Community --

Per the upcoming 10th anniversary of OMPI's SVN r1 (and our discussion on
the call this past Tuesday), Ralph and I set up a Cafe Press store with a
pre-selected bundle of Open MPI-branded stuff (mostly shirts, but some cups
and other stuff, too):

http://www.cafepress.com/openmpi

We set the markup to $0.00 on all the items, so hypothetically we won't earn
any money from this.  If we do, we'll pick some charity and donate
everything (maybe the EFF?).

What does everyone think?  If the community likes it, we'll publish this on
the user list and a few other places.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel





Re: [OMPI devel] RFC: Add GPU Direct RDMA support to openib btl

2013-10-08 Thread Kenneth A. Lloyd
+1



From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Rolf vandeVaart
Sent: Tuesday, October 08, 2013 3:05 PM
To: de...@open-mpi.org
Subject: [OMPI devel] RFC: Add GPU Direct RDMA support to openib btl



WHAT: Add GPU Direct RDMA support to openib btl 

WHY: Better latency for small GPU message transfers

WHERE: Several files, see ticket for list

WHEN: Friday,  October 18, 2013 COB 

More detail: 

This RFC looks to make use of GPU Direct RDMA support that is coming in
future Mellanox libraries.  With GPU Direct RDMA, we can register GPU
memory with the ibv_reg_mr() calls.  Therefore, we are simply piggybacking
on the large message RDMA support (RGET) that exists in the PML and openib
BTL.  For best performance, we want to use the RGET protocol for small
messages and then switch to a pipeline protocol for larger messages.



To make use of this, we add some extra code paths that are followed when
moving GPU buffers.  If we have the support compiled in, then when we
detect we have a GPU buffer, we use the RGET protocol even for small
messages.  When the messages get larger, we switch to using the regular
pipeline protocol.  There is some other support code that is added as well.
We add a flag to any GPU memory that is registered so we can check for
cuMemAlloc/cuMemFree/cuMemAlloc issues.  Each GPU buffer has a buffer ID
associated with it, so we can ensure that any registrations in the rcache are
still valid.
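
[Illustration, not the code from the ticket: the two pieces described above --
registering cuMemAlloc()ed memory with ibv_reg_mr(), and reading the CUDA
buffer ID that lets a registration cache catch the cuMemFree/cuMemAlloc reuse
case.  The function name is made up; a current CUDA context is assumed and
error handling is omitted.]

#include <stdint.h>
#include <cuda.h>
#include <infiniband/verbs.h>

struct ibv_mr *register_gpu_buffer(struct ibv_pd *pd, size_t len,
                                   unsigned long long *buf_id_out)
{
    CUdeviceptr dptr;

    cuMemAlloc(&dptr, len);                         /* GPU device memory */
    cuPointerGetAttribute(buf_id_out,               /* unique per allocation */
                          CU_POINTER_ATTRIBUTE_BUFFER_ID, dptr);
    return ibv_reg_mr(pd, (void *) (uintptr_t) dptr, len,
                      IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_READ);
}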



To view the changes, go to https://svn.open-mpi.org/trac/ompi/ticket/3836
and click on gdr.diff.








Re: [OMPI devel] RFC MPI 2.2 Dist_graph addition

2013-06-24 Thread Kenneth A. Lloyd
Thanks for making this patch available.

Ken Lloyd

-Original Message-
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
Behalf Of George Bosilca
Sent: Monday, June 24, 2013 1:39 PM
To: Open MPI Developers
Subject: [OMPI devel] RFC MPI 2.2 Dist_graph addition

WHAT: Support for MPI 2.2 dist_graph

WHY: To become [almost entirely] MPI 2.2 compliant

WHEN: Monday July 1st

As discussed during the last phone call, a missing functionality of the MPI
2.2 standard (the distributed graph topology) is ready for prime time. The
attached patch provides a minimal version (no components supporting
reordering) that will complete the topology support in Open MPI.

It is somewhat of a major change compared with what we had before, and it
reshapes the way we deal with topologies completely. Where our topologies were
mainly storage components (they were not capable of creating the new
communicator, for example), the new version is built around a [possibly]
common representation (in mca/topo/topo.h), but the functions to attach and
retrieve the topological information are specific to each component. As a
result the ompi_create_cart and ompi_create_graph functions become useless
and have been removed.

In addition to adding the internal infrastructure to manage the topology
information, it updates the MPI interface and the debugger support, and
provides all Fortran interfaces. From a correctness point of view it passes
all the tests we have in ompi-tests for the cart and graph topologies, and
some tests/applications for the dist_graph interface.
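
[Illustration, not part of the patch: a minimal use of the new MPI 2.2 call,
building a ring in which each rank declares one outgoing edge.]

#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, indeg, outdeg, weighted;
    MPI_Comm ring;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int sources[1]      = { rank };
    int degrees[1]      = { 1 };
    int destinations[1] = { (rank + 1) % size };

    MPI_Dist_graph_create(MPI_COMM_WORLD, 1, sources, degrees, destinations,
                          MPI_UNWEIGHTED, MPI_INFO_NULL, 0 /* reorder */, &ring);
    MPI_Dist_graph_neighbors_count(ring, &indeg, &outdeg, &weighted);

    MPI_Comm_free(&ring);
    MPI_Finalize();
    return 0;
}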

I don't think there is a need for a long wait on this one so I would like to
propose a short deadline, a week from now on Monday July 1st. A patch based
on Open MPI trunk r28670 is attached below.

Thanks,
  George.







Re: [OMPI devel] RFC: Remove pml/csum

2013-02-28 Thread Kenneth A. Lloyd
Is that because end-to-end checksums don't match?

-Original Message-
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
Behalf Of Nathan Hjelm
Sent: Wednesday, February 27, 2013 10:54 AM
To: Open MPI Developers
Subject: [OMPI devel] RFC: Remove pml/csum

What: svn rm the csum PML component.

Why: We will no longer be maintaining this component.

When: Today @ 5:00 PM. This is a notification only for the component's
removal. If anyone cares about it let me know and I will spare it.

Are there any other ob1 clones which should go away? I am finishing the
update to move from mca_base_param_* to mca_base_var_* and the less I have
to do the better.

-Nathan Hjelm
HPC-3, LANL
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel





Re: [OMPI devel] 1.6.4rc1 has been posted

2013-01-17 Thread Kenneth A. Lloyd
Paul,



Have you tried llvm with clang?



Ken



From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
Behalf Of Paul Hargrove
Sent: Thursday, January 17, 2013 4:58 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] 1.6.4rc1 has been posted



On Thu, Jan 17, 2013 at 2:26 PM, Paul Hargrove  wrote:

[snip]

The BAD news is a new failure (SEGV in orted at exit) on OpenBSD-5.2/amd64,
which I will report in a separate email once I've completed some triage.

[snip]



You can disregard the "BAD news" above.

Everything was fine with gcc, but fails with llvm-gcc.

Looking deeper (details upon request) the SEGV appears to be caused by a bug
in llvm-gcc.



-Paul



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] Compiling OpenMPI 1.7 with LLVM clang or llvm-gcc

2013-01-08 Thread Kenneth A. Lloyd
Thanks.  I did, but got it fixed.

-Original Message-
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
Behalf Of Shamis, Pavel
Sent: Tuesday, January 08, 2013 3:06 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] Compiling OpenMPI 1.7 with LLVM clang or llvm-gcc

Ken,

I have no problem compiling OMPI trunk with llvm-gcc-4.2 (OS X 10.8)

Pavel (Pasha) Shamis
---
Computer Science Research Group
Computer Science and Math Division
Oak Ridge National Laboratory






On Jan 7, 2013, at 3:49 PM, Kenneth A. Lloyd <kenneth.ll...@wattsys.com>
wrote:

> Has anyone experienced any problems compiling OpenMPI 1.7 with the
> llvm compiler and C front ends?
> 
> -- Ken
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



[OMPI devel] Compiling OpenMPI 1.7 with LLVM clang or llvm-gcc

2013-01-07 Thread Kenneth A. Lloyd
Has anyone experienced any problems compiling OpenMPI 1.7 with the llvm
compiler and C front ends?

-- Ken



Re: [OMPI devel] Compile-time MPI_Datatype checking

2012-11-01 Thread Kenneth A. Lloyd
Ralph,



Indeed, some of us are using clang (and other llvm front ends) for JIT on
our hetero HPC clusters for amorphous problem spaces.  Obviously, I don't
work for a National Lab. But I do mod/sim/vis for quantum, nano, and
meso-physics.



Just wanted you to be aware.



Ken



From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
Behalf Of Paul Hargrove
Sent: Wednesday, October 31, 2012 12:48 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] Compile-time MPI_Datatype checking



Note that with Apple's latest versions of Xcode (4.2 and higher, IIRC) Clang
is now the default C compiler.  I am told that Clang is the ONLY bundled
compiler for OSX 10.8 (Mountain Lion) unless you take extra steps to install
gcc (which is actually llvm-gcc and cross-compiles for OSX 10.7).



So, Clang *is* gaining some "market share", though not yet in major HPC
systems.



-Paul



On Wed, Oct 31, 2012 at 11:40 AM, Ralph Castain  wrote:

If it's only on for Clang, I very much doubt anyone will care - I'm unaware
of any of our users that currently utilize that compiler, and certainly not
on the clusters in the national labs (gcc, Intel, etc. - but I've never seen
them use Clang).

Not saying anything negative about Clang - just noting it isn't much used in
our current community that I've heard.


On Oct 31, 2012, at 11:19 AM, Dmitri Gribenko  wrote:

> On Wed, Oct 31, 2012 at 5:04 PM, Jeff Squyres  wrote:
>> On Oct 31, 2012, at 9:38 AM, Dmitri Gribenko wrote:
>>
>>>> The rationale here is that correct MPI applications should not need to
>>>> add any extra compiler flags to compile without warnings.
>>>
>>> I would disagree with this.  Compiler warnings are most useful when
>>> they are on by default.  Only a few developers will turn on a warning
>>> because warnings are hard to discover and enabling a warning requires
>>> an explicit action from the developer.
>>
>> Understood, but:
>>
>> a) MPI explicitly allows this kind of deliberate mismatch.  It does not
make sense to warn for things that are correct in MPI.
>
> I don't think it is MPI.  It is the C standard that allows one to
> store any pointer in void* and char*.  But C standard also considers
> lots of other weird things to be valid, see below.
>
>> b) Warnings are significantly less useful if the user looks at them and
says, "the compiler is wrong; I know that MPI says that this deliberate
mismatch in my code is ok."
>
> Well, one can also argue that since the following is valid C, the
> warning in question should not be implemented at all:
>
> long *b = malloc(sizeof(int));
> MPI_Recv(b, 1, MPI_INT, ...);
>
> But this code is clearly badly written, so we are left with a question
> about where to draw the line.
>
>> c) as such, these warnings are really only useful for the application
where type/MPI_Datatype matching is expected/desired.
>
> Compilers already warn about valid C code.  Silencing many warnings
> relies on conventions that are derived from best practices of being
> explicit about something unusual.  For example:
>
> $ cat /tmp/aaa.c
> void foo(void *a) {
>  for(int i = a; i < 10; i++)
>  {
>if(i = 5)
>  return;
>  }
> }
> $ clang -fsyntax-only -std=c99 /tmp/aaa.c
> /tmp/aaa.c:2:11: warning: incompatible pointer to integer conversion
> initializing 'int' with an expression of type 'void *'
> [-Wint-conversion]
>  for(int i = a; i < 10; i++)
>  ^   ~
> /tmp/aaa.c:4:10: warning: using the result of an assignment as a
> condition without parentheses [-Wparentheses]
>if(i = 5)
>   ~~^~~
> /tmp/aaa.c:4:10: note: place parentheses around the assignment to
> silence this warning
>if(i = 5)
> ^
>   ()
> /tmp/aaa.c:4:10: note: use '==' to turn this assignment into an
> equality comparison
>if(i = 5)
> ^
> ==
> 2 warnings generated.
>
> According to C standard this is valid C code, but clang emits two
> warnings on this.
>
>> Can these warnings be enabled as part of the warnings rollup -Wall
option?  That would be an easy way to find/enable these warnings.
>
> IIRC, -Wall warning set is frozen in clang.  -Wall is misleading in
> that it does not turn on all warnings implemented in the compiler.
> Clang has -Weverything to really turn on all warnings.  But
> -Weverything is very noisy (by design, not to be fixed) unless one
> also turns off all warnings that are not interesting for the project
> with -Wno-foo.
>
> I don't think it is possible to disable this warning by default
> because off-by-default warnings are discouraged in Clang.  There is no
> formal policy, but the rule of thumb is: either make the warning good
> enough for everyone or don't implement it; if some particular app does
> something strange, it can disable this warning.
>
>>> The pattern you described is an important one, but most MPI
>>> applications will have matching buffer types/type tags.
>>
>> I agree that most applications *probably* don't do 
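
[Illustration, not from this thread: the Clang type-safety attributes this kind
of checking builds on, sketched against a toy MPI-like API rather than the real
mpi.h annotations.]

typedef struct my_datatype_s *My_Datatype;

/* tag the sentinel object with the C type it stands for */
extern struct my_datatype_s my_datatype_int
    __attribute__(( type_tag_for_datatype(mpi, int) ));
#define MY_INT ((My_Datatype) &my_datatype_int)

/* argument 1 is the buffer, argument 3 carries its type tag */
int my_send(void *buf, int count, My_Datatype datatype)
    __attribute__(( pointer_with_type_tag(mpi, 1, 3) ));

void caller(void)
{
    long x;
    my_send(&x, 1, MY_INT);   /* clang's -Wtype-safety flags long* vs. the int tag */
}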

Re: [OMPI devel] RFC: hostfile setting of #slots

2012-09-02 Thread Kenneth A. Lloyd
I should note that we only virtualize the private cloud / management nodes
over our HPC.  The HPC is not virtualized as such.

Ken

-Original Message-
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
Behalf Of Kenneth A. Lloyd
Sent: Sunday, September 02, 2012 7:14 AM
To: 'Open MPI Developers'
Subject: Re: [OMPI devel] RFC: hostfile setting of #slots

This is a tricky issue, isn't it?  With the differences between AMD & Intel,
between how base operating systems handle memory "touching" & heaps (between
Linux & Windows), and the various virtual machine schemes, we have opted for
an "outside the main code path" solution to get deterministic results. But
that is as things are now.  Who knows how stuff like AVX2 / memory mapping -
or the next new thing - will affect this?  This is similar to issues we've
found with CPU/GPU memory & affinity mapping over IB.  The basis of the
decision is (as it often is): how much do you trust the user to do the right
thing?  What happens if you are wrong?

Only my opinion.

Ken

-Original Message-
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
Behalf Of Ralph Castain
Sent: Saturday, September 01, 2012 3:54 AM
To: Open MPI Developers
Subject: [OMPI devel] RFC: hostfile setting of #slots

This is not a notice of intended change, but rather a solicitation for
comment.

We currently default the number of slots on a host provided via hostfile or
-host to 1. This is a historical "feature" driven by the fact that our
initial launch system spawned daemons on the remote nodes after we had
already mapped the processes to them - so we had no way of guessing a
reasonable value for the number of slots on any node.

However, the "vm" launch method starts daemons on every node prior to  doing
the mapping, precisely so we can use the topology in the mapping algorithm.
This creates the possibility of setting the number of slots on a node to the
number of cpus (either cores or hardware threads, depending on how that flag
is set) IF it wasn't provided in the hostfile.

This would have an impact on the default "byslot" mapping algorithm. With
only one slot/node, byslot essentially equates to bynode mapping. So a
user-provided hostfile that doesn't give any info on the number of slots on
a node effectively changes the default mapping algorithm to "bynode". This
change would alter that behavior and retain a "byslot" algorithm, with the
number of slots being the number of cpus.
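
[Illustration, not from the original mail: the hostfile forms being discussed,
with made-up host names.]

# explicit slot counts -- unaffected by this proposal
node01 slots=4
node02 slots=4

# bare entry -- today this defaults to 1 slot (so byslot looks like bynode);
# under the proposal it would default to the number of cpus found on the node
node03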

I have a use-case that would benefit from making the change, but can handle
it outside of the main code path. However, if others would also find it of
use, I can add it to the main code path, either as the default or via MCA
param.

Any thoughts?
Ralph


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] RFC: hostfile setting of #slots

2012-09-02 Thread Kenneth A. Lloyd
This is a tricky issue, isn't it?  With the differences between AMD & Intel,
between how base operating systems handle memory "touching" & heaps (between
Linux & Windows), and the various virtual machine schemes, we have opted for
an "outside the main code path" solution to get deterministic results. But
that is as things are now.  Who knows how stuff like AVX2 / memory mapping -
or the next new thing - will affect this?  This is similar to issues we've
found with CPU/GPU memory & affinity mapping over IB.  The basis of the
decision is (as it often is): how much do you trust the user to do the right
thing?  What happens if you are wrong?

Only my opinion.

Ken

-Original Message-
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
Behalf Of Ralph Castain
Sent: Saturday, September 01, 2012 3:54 AM
To: Open MPI Developers
Subject: [OMPI devel] RFC: hostfile setting of #slots

This is not a notice of intended change, but rather a solicitation for
comment.

We currently default the number of slots on a host provided via hostfile or
-host to 1. This is a historical "feature" driven by the fact that our
initial launch system spawned daemons on the remote nodes after we had
already mapped the processes to them - so we had no way of guessing a
reasonable value for the number of slots on any node.

However, the "vm" launch method starts daemons on every node prior to  doing
the mapping, precisely so we can use the topology in the mapping algorithm.
This creates the possibility of setting the number of slots on a node to the
number of cpus (either cores or hardware threads, depending on how that flag
is set) IF it wasn't provided in the hostfile.

This would have an impact on the default "byslot" mapping algorithm. With
only one slot/node, byslot essentially equates to bynode mapping. So a
user-provided hostfile that doesn't give any info on the number of slots on
a node effectively changes the default mapping algorithm to "bynode". This
change would alter that behavior and retain a "byslot" algorithm, with the
number of slots being the number of cpus.

I have a use-case that would benefit from making the change, but can handle
it outside of the main code path. However, if others would also find it of
use, I can add it to the main code path, either as the default or via MCA
param.

Any thoughts?
Ralph


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] OpenMPI and SGE integration made more stable

2012-07-31 Thread Kenneth A. Lloyd
I haven't used SGE or Oracle Grid Engine in ages, but apparently it is now
called Open Grid Engine
http://gridscheduler.sourceforge.net/


-Original Message-
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
Behalf Of Rayson Ho
Sent: Friday, July 27, 2012 8:25 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] OpenMPI and SGE integration made more stable

On Fri, Jul 27, 2012 at 8:53 AM, Daniel Gruber  wrote:
> A while after u5 the open source repository was closed and most of the 
> German engineers from Sun/Oracle moved to Univa, working on Univa Grid 
> Engine. Currently you have the choice between Univa Grid Engine, Son 
> of Grid Engine (free academic project), and OGS.

Oracle Grid Engine is still alive, and in fact updates are still released by
Oracle from time to time.

(But of course it is not free, and since most people are looking for a free
download, it is usually not mentioned in the mailing list
discussions...)

Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/


> Daniel
>
>
>>
>> +-+--+
>> | Prof. Christoph van Wüllen  | Tele-Phone (+49) (0)631 205 2749 |
>> | TU Kaiserslautern, FB Chemie| Tele-Fax   (+49) (0)631 205 2750 |
>> | Erwin-Schrödinger-Str.  |  |
>> | D-67663 Kaiserslautern, Germany | vanwul...@chemie.uni-kl.de   |
>> ||
>> | HomePage:  http://www.chemie.uni-kl.de/vanwullen   |
>> +-+--+
>>
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel






Re: [OMPI devel] GPUDirect v1 issues

2012-01-17 Thread Kenneth A. Lloyd
Also, which version of MVAPICH2 did you use?

I've been poring over Rolf's OpenMPI CUDA RDMA 3 (using CUDA 4.1 r2) vis-a-vis
MVAPICH-GPU on a small 3 node cluster. These are wickedly interesting.

Ken
-Original Message-
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
Behalf Of Rolf vandeVaart
Sent: Tuesday, January 17, 2012 7:54 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] GPUDirect v1 issues

I am not aware of any issues.  Can you send me a test program and I can try
it out?
Which version of CUDA are you using?

Rolf

>-Original Message-
>From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
>On Behalf Of Sebastian Rinke
>Sent: Tuesday, January 17, 2012 8:50 AM
>To: Open MPI Developers
>Subject: [OMPI devel] GPUDirect v1 issues
>
>Dear all,
>
>I'm using GPUDirect v1 with Open MPI 1.4.3 and see blocking
>MPI_SEND/RECV calls block forever.
>
>For two subsequent MPI_RECVs, it hangs if the recv buffer pointer of the
>second recv points somewhere other than the beginning of the recv
>buffer (previously allocated with cudaMallocHost()).
>
>I tried the same with MVAPICH2 and did not see the problem.
>
>Does anybody know about issues with GPUDirect v1 using Open MPI?
>
>Thanks for your help,
>Sebastian
>___
>devel mailing list
>de...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/devel
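
[Illustration, not Sebastian's actual code: the receive-side pattern described
above, with made-up counts, tags and peer; the second receive lands at an
offset inside the cudaMallocHost() buffer.]

#include <mpi.h>
#include <cuda_runtime.h>

void recv_twice(int src, MPI_Comm comm)
{
    char *buf;
    MPI_Status st;

    cudaMallocHost((void **) &buf, 2 * 1024 * 1024);            /* pinned host buffer */
    MPI_Recv(buf,         1024, MPI_BYTE, src, 0, comm, &st);   /* start of buffer: ok */
    MPI_Recv(buf + 65536, 1024, MPI_BYTE, src, 1, comm, &st);   /* offset: reported hang */
}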


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel





Re: [OMPI devel] kernel 2.6.23 vs 2.6.24 - communication/wait times

2010-04-22 Thread Kenneth A. Lloyd
Oliver,

Thank you for this summary insight.  This substantially affects the
structural design of software implementations, which points to a new
analysis "opportunity" in our software.

Ken Lloyd

-Original Message-
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
Behalf Of Oliver Geisler
Sent: Thursday, April 22, 2010 9:38 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] kernel 2.6.23 vs 2.6.24 - communication/wait times

To sum up and give an update:

The extended communication times seen while using shared memory communication
between openmpi processes are caused by the openmpi session directory residing
on the network via NFS.

The problem is resolved by establishing a ramdisk on each diskless node or
mounting a tmpfs. By setting the MCA parameter orte_tmpdir_base to point to
the corresponding mountpoint, shared memory communication and its files are
kept local, decreasing the communication times by orders of magnitude.
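
[Illustration, not from the original mail: with a tmpfs mounted at a made-up
path, pointing the session directory at it would look something like this.]

mpirun --mca orte_tmpdir_base /mnt/tmpfs -np 16 ./benchmark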

The relation of the problem to the kernel version is not really resolved,
but maybe that is not "the problem" in this respect.
My benchmark is now running fine on a single node with 4 CPUs, kernel
2.6.33.1 and openmpi 1.4.1.
Running on multiple nodes I still see higher (TCP) communication times than I
would expect. But that requires some deeper research into the issue (e.g.
collisions on the network) and should probably be posted to a new thread.

Thank you guys for your help.

oli


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] RFC: increase default AC/AM/LT requirements

2010-03-24 Thread Kenneth A. Lloyd
Jeff,

I'm a researcher / developer, but a very small player in the OpenMPI
landscape.  I'd say go ahead with the commit. Some stuff is just too old to
maintain.

Ken Lloyd
Watt Systems Technologies Inc.
WARP - Watt Advanced Research Platforms

-Original Message-
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
Behalf Of Jeff Squyres
Sent: Wednesday, March 24, 2010 3:51 PM
To: Open MPI Developers List
Subject: [OMPI devel] RFC: increase default AC/AM/LT requirements

*** LAST CHANCE ***

I'm asking yet one more time because it ***WILL HAVE A DIRECT IMPACT ON
DEVELOPERS!***

We're past the RFC timeout and no one has objected, and I have a patch ready
to commit, but be aware that when I do, you will need the following Autotools
versions for developer builds on the trunk:

Automake 1.11.1
Autoconf 2.65
Libtool 2.2.6b

See the original RFC here:
http://www.open-mpi.org/community/lists/devel/2010/02/7496.php.

If no one cares by COB tomorrow (Thu, 25 Mar 2010), I'll commit.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel