Re: [OMPI devel] Common symbol warnings in tarballs (was: make install warns about 'common symbols')

2016-04-20 Thread Dave Goodell (dgoodell)
On Apr 20, 2016, at 9:14 AM, Jeff Squyres (jsquyres)  wrote:
> 
> I was under the impression that this warning script only ran for developer 
> builds.  But it looks like it's unconditionally run at the end of "make 
> install" (on master only -- so far).
> 
> Should we make this only run for developer builds?  (e.g., check for 
> $srcdir/.git, or somesuch)  I think it's our goal to have zero common 
> symbols, but that may not always be the case, and we don't want this 
> potentially alarming warning showing up for users, right?

IMO, this is basically just another warning flag.  If you enable most compiler 
warnings for non-developer builds, I don't see why you would go out of your way 
to disable this particular one.  You could always tweak the output to point to 
a wiki page that explains what the warning means, so concerned users can 
hopefully be assuaged.

-Dave



Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-3792-g92290b9

2016-04-07 Thread Dave Goodell (dgoodell)
[inline]

On Apr 7, 2016, at 12:53 PM, git...@crest.iu.edu wrote:
> 
> This is an automated email from the git hooks/post-receive script. It was
> generated because a ref change was pushed to the repository containing
> the project "open-mpi/ompi".
> 
> The branch, master has been updated
>   via  92290b94e0584271d6459a6ab5923a04125e23be (commit)
>  from  7cdf50533cf940258072f70231a4a456fa73d2f8 (commit)
> 
> Those revisions listed above that are new to this repository have
> not appeared on any other notification email; so we list those
> revisions in full, below.
> 
> - Log -
> https://github.com/open-mpi/ompi/commit/92290b94e0584271d6459a6ab5923a04125e23be
> 
> commit 92290b94e0584271d6459a6ab5923a04125e23be
> Author: Thananon Patinyasakdikul 
> Date:   Wed Apr 6 14:26:04 2016 -0400
> 
>Fixed Coverity reports 1358014-1358018 (DEADCODE and CHECK_RETURN)
> 
> diff --git a/ompi/mca/pml/ob1/pml_ob1_recvreq.c 
> b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
> index 9d1d402..a912bc3 100644
> --- a/ompi/mca/pml/ob1/pml_ob1_recvreq.c
> +++ b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
> @@ -106,7 +106,7 @@ static int mca_pml_ob1_recv_request_cancel(struct 
> ompi_request_t* ompi_request,
> /* The rest should be protected behind the match logic lock */
> OB1_MATCHING_LOCK(_comm->matching_lock);
> if( true == request->req_match_received ) { /* way to late to cancel this 
> one */
> -OPAL_THREAD_UNLOCK(_comm->matching_lock);
> +OB1_MATCHING_LOCK(_comm->matching_lock);

I've only taken a cursory look, but this looks very wrong to me.  Shouldn't you 
be using the "OB1_MATCHING_UNLOCK" macro instead?  Doubly locking the lock will 
almost certainly lead to sadness.
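
To illustrate the concern, here is a minimal sketch of the lock/unlock pairing on an early-return path. It uses a plain pthread mutex rather than the ob1 macros, and sketch_request, sketch_cancel, and matching_lock are hypothetical names, not Open MPI code.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the ob1 matching lock. */
static pthread_mutex_t matching_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for the receive request state. */
struct sketch_request {
    bool match_received;
    bool cancelled;
};

/* Returns 0 whether or not the cancel took effect. */
static int sketch_cancel(struct sketch_request *req)
{
    pthread_mutex_lock(&matching_lock);
    if (req->match_received) {
        /* Too late to cancel: the early return must release the
         * lock, not acquire it a second time. */
        pthread_mutex_unlock(&matching_lock);
        return 0;
    }
    req->cancelled = true;
    pthread_mutex_unlock(&matching_lock);
    return 0;
}
```

If the early-return path locked instead of unlocking, a default (non-recursive) mutex would typically deadlock the calling thread right there, and the lock would never be released for anyone else.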

> assert( OMPI_ANY_TAG != ompi_request->req_status.MPI_TAG ); /* not 
> matched isn't it */
> return OMPI_SUCCESS;
> }
> diff --git a/opal/mca/btl/tcp/btl_tcp.h b/opal/mca/btl/tcp/btl_tcp.h
> index f2c8917..7e9d726 100644
> --- a/opal/mca/btl/tcp/btl_tcp.h
> +++ b/opal/mca/btl/tcp/btl_tcp.h
> @@ -96,7 +96,7 @@ extern int mca_btl_tcp_progress_thread_trigger;
> do {\
> if(0 < mca_btl_tcp_progress_thread_trigger) {   \
> opal_event_t* _event = (opal_event_t*)(event);  \
> -opal_fd_write( mca_btl_tcp_pipe_to_progress[1], 
> sizeof(opal_event_t*), \
> +(void) opal_fd_write( mca_btl_tcp_pipe_to_progress[1], 
> sizeof(opal_event_t*), \

Seems better to capture the return value and at least put an assert() on it if 
it fails, though admittedly things are very screwed up if you overrun the pipe.
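
A minimal sketch of that suggestion, assuming a plain POSIX pipe and write(2) in place of opal_fd_write (whose signature is not shown in the quoted diff). sketch_notify is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: push a pointer-sized token into a pipe and
 * check the result instead of casting it to void. */
static int sketch_notify(int write_fd, void *event)
{
    ssize_t rc = write(write_fd, &event, sizeof(event));

    /* Debug builds stop here on a failed or short write; the check
     * compiles away when NDEBUG is defined. */
    assert(rc == (ssize_t) sizeof(event));

    /* Release builds still report the failure to the caller. */
    return (rc == (ssize_t) sizeof(event)) ? 0 : -1;
}
```

The assert documents the expectation that the pipe never overruns, while the return value still gives callers something to act on in builds where asserts are disabled.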

-Dave



Re: [OMPI devel] MTT setup updated to gcc-6.0 (pre)

2016-02-25 Thread Dave Goodell (dgoodell)
On Feb 25, 2016, at 4:05 PM, Jeff Squyres (jsquyres)  wrote:
> 
> On Feb 25, 2016, at 2:59 PM, Paul Hargrove  wrote:
>> 
>> Not an error - a new API in C++11 to get number of dimensions in a 
>> multi-dimensional array.
>> http://en.cppreference.com/w/cpp/types/rank
> 
> So you can't have a local variable named "rank" any more?  That's... terrible!

Or you could avoid "using namespace std".  Or qualify it using "::rank" (I 
think, my C++ is rusty).

-Dave



Re: [OMPI devel] Lots and lots of warnings on master

2015-11-11 Thread Dave Goodell (dgoodell)
On Nov 11, 2015, at 10:09 AM, Ralph Castain  wrote:
> 
> FWIW: I don’t think that’s the issue here. I don’t see these warnings on my 
> CentOS7 box, for example. I think this is driven by the fact that odin has 
> some very old compilers and a very different environment, and so it has 
> historically generated more warnings.
> 
> The warnings often are valid - they just don’t get issued by other compilers, 
> or configure activates other code paths.

Sure, but the automation could be set up to do builds on both Odin and newer 
systems.  I think it's easier to tend to this sort of warning cruft as it 
happens, rather than big stomps later on, but that's just personal preference.  
It would also hopefully reduce the amount of cherry picking that needs to 
happen to release branches.

*shrug* just a suggestion

-Dave



Re: [OMPI devel] Lots and lots of warnings on master

2015-11-11 Thread Dave Goodell (dgoodell)
Once you squash all these warnings, you could set up a little bit of Jenkins or 
Travis CI logic to check for PRs that add new warnings and mark them 
appropriately.  Of course, with people making commits directly to master, 
warnings introduced by those direct commits will be ascribed to those who make 
PRs against master.  But at least you'd catch them quickly.

The MPICH project has a script that could easily be adapted to extract only the 
warnings: http://git.mpich.org/mpich.git/blob/HEAD:/maint/clmake.in

Alternatively/additionally you could have a build bot that watches for new 
commits to master, runs a quick build on each new commit, and sends mail to 
de...@open-mpi.org and the offender with the new warnings that have been 
introduced.

-Dave

> On Nov 10, 2015, at 10:51 PM, Ralph Castain  wrote:
> 
> This is from an older compiler, so take it for what it’s worth - I’ll take 
> care of the orte ones, but the bcol, nbc, and osc ones will need addressing:
> 
> pmix1_server_south.c: In function 'myerr':
> pmix1_server_south.c:58: warning: 'nm' may be used uninitialized in this 
> function
> 
> pmix1_client.c: In function 'myerr':
> pmix1_client.c:44: warning: 'nm' may be used uninitialized in this function
> 
> base/rml_base_channel_handlers.c: In function 
> 'orte_rml_base_close_channel_send_callback':
> base/rml_base_channel_handlers.c:95: warning: unused variable 'peer'
> 
> runtime/orte_data_server.c: In function 'orte_data_server':
> runtime/orte_data_server.c:174: warning: 'uid' may be used uninitialized in 
> this function
> 
> util/dash_host/dash_host.c: In function 'orte_util_add_dash_host_nodes':
> util/dash_host/dash_host.c:57: warning: 'slots' may be used uninitialized in 
> this function
> util/dash_host/dash_host.c:58: warning: 'slots_given' may be used 
> uninitialized in this function
> 
> qos_ack_component.c: In function 'orte_qos_ack_recv_msg_timeout_callback':
> qos_ack_component.c:534: warning: unused variable 'msg'
> qos_ack_component.c: In function 'orte_qos_ack_msg_send_callback':
> qos_ack_component.c:667: warning: unused variable 'channel'
> qos_ack_component.c: In function 'ack_recv':
> qos_ack_component.c:316: warning: 'room_num' may be used uninitialized in 
> this function
> 
> ras_slurm_module.c: In function 'recv_data':
> ras_slurm_module.c:778: warning: 'app' may be used uninitialized in this 
> function
> 
> bcol_ptpcoll_allreduce.c: In function 'bcol_ptpcoll_allreduce_narraying_init':
> bcol_ptpcoll_allreduce.c:236: warning: unused variable 'dtype'
> bcol_ptpcoll_allreduce.c:235: warning: unused variable 'count'
> 
> nbc.c: In function 'NBC_Progress':
> nbc.c:297: warning: 'size' may be used uninitialized in this function
> 
> osc_pt2pt_comm.c: In function 'ompi_osc_pt2pt_accumulate_w_req':
> osc_pt2pt_comm.c:424: warning: 'ptr' may be used uninitialized in this 
> function
> osc_pt2pt_comm.c:420: warning: 'frag' may be used uninitialized in this 
> function
> osc_pt2pt_comm.c: In function 'ompi_osc_pt2pt_put_w_req':
> osc_pt2pt_comm.c:250: warning: 'ptr' may be used uninitialized in this 
> function
> osc_pt2pt_comm.c:242: warning: 'frag' may be used uninitialized in this 
> function
> osc_pt2pt_data_move.c: In function 'ompi_osc_pt2pt_callback':
> osc_pt2pt_data_move.c:1615: warning: unused variable 'incoming_length'
> osc_pt2pt_comm.c: In function 'ompi_osc_pt2pt_rget_accumulate_internal':
> osc_pt2pt_comm.c:951: warning: 'ptr' may be used uninitialized in this 
> function
> osc_pt2pt_comm.c:947: warning: 'frag' may be used uninitialized in this 
> function
> osc_pt2pt_data_move.c: In function 'ompi_osc_pt2pt_control_send':
> osc_pt2pt_data_move.c:213: warning: 'ptr' may be used uninitialized in this 
> function
> osc_pt2pt_data_move.c:212: warning: 'frag' may be used uninitialized in this 
> function
> osc_pt2pt_data_move.c: In function 'ompi_osc_gacc_long_start':
> osc_pt2pt_data_move.c:961: warning: 'acc_data' may be used uninitialized in 
> this function
> osc_pt2pt_data_move.c: In function 'ompi_osc_pt2pt_gacc_start':
> osc_pt2pt_data_move.c:914: warning: 'acc_data' may be used uninitialized in 
> this function
> osc_pt2pt_comm.c: In function 'ompi_osc_pt2pt_rget_internal':
> osc_pt2pt_comm.c:744: warning: 'ptr' may be used uninitialized in this 
> function
> osc_pt2pt_comm.c:740: warning: 'frag' may be used uninitialized in this 
> function
> osc_pt2pt_data_move.c: In function 'ompi_osc_pt2pt_acc_long_start':
> osc_pt2pt_data_move.c:831: warning: 'acc_data' may be used uninitialized in 
> this function
> osc_pt2pt_comm.c: In function 'ompi_osc_pt2pt_compare_and_swap':
> osc_pt2pt_comm.c:601: warning: 'ptr' may be used uninitialized in this 
> function
> osc_pt2pt_comm.c:594: warning: 'frag' may be used uninitialized in this 
> function
> 
> osc_rdma_active_target.c: In function 'ompi_osc_rdma_post_atomic':
> osc_rdma_active_target.c:183: warning: 'frag' may be used uninitialized in 
> this function
> osc_rdma_accumulate.c: In function 

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-2436-g7adb9b7

2015-09-03 Thread Dave Goodell (dgoodell)
On Sep 3, 2015, at 3:40 PM, Burette, Yohann  wrote:

> I see what you are saying. Thank you for pointing it out.
> 
> Would MTL_OFI_RETRY_UNTIL_DONE be better instead?

Yes, I think that would be an improvement.

Thanks,
-Dave


Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-2436-g7adb9b7

2015-09-03 Thread Dave Goodell (dgoodell)
On Sep 3, 2015, at 1:03 PM, git...@crest.iu.edu wrote:

> diff --git a/ompi/mca/mtl/ofi/mtl_ofi.h b/ompi/mca/mtl/ofi/mtl_ofi.h
> index 3584d8a..a035b1c 100644
> --- a/ompi/mca/mtl/ofi/mtl_ofi.h
> +++ b/ompi/mca/mtl/ofi/mtl_ofi.h
> @@ -38,6 +38,14 @@
> #include "mtl_ofi_endpoint.h"
> #include "mtl_ofi_compat.h"
> 
> +#define FI_RETRY_UNTIL_DONE(FUNC) \
> +do { \
> +do { \
> +ret = FUNC; \
> +if(OPAL_LIKELY(0 == ret)) {break;} \
> +} while(-FI_EAGAIN == ret); \
> +} while(0);
> +
> BEGIN_C_DECLS

Minor nit: it would be best to avoid stomping the "FI_" and "fi_" namespaces in 
OMPI code.  I find it unlikely that this particular symbol/macro would ever be 
defined, but it's usually just a good idea to stay away from the entire 
namespace.
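
As a sketch of what a project-prefixed version might look like, using the MTL_OFI_RETRY_UNTIL_DONE name suggested in the follow-up: EAGAIN stands in for FI_EAGAIN so this builds without libfabric, the explicit RET parameter is my own assumption (the quoted macro assigns to a "ret" in the caller's scope), and sketch_try_send is hypothetical.

```c
#include <errno.h>

/* Project-prefixed retry macro: re-issues CALL while it reports
 * -EAGAIN (a stand-in for -FI_EAGAIN in this sketch). */
#define MTL_OFI_RETRY_UNTIL_DONE(RET, CALL) \
    do {                                    \
        (RET) = (CALL);                     \
    } while (-EAGAIN == (RET))

/* Hypothetical operation: reports -EAGAIN until *budget reaches
 * zero, then succeeds. */
static int sketch_try_send(int *budget)
{
    if (*budget > 0) {
        --*budget;
        return -EAGAIN;
    }
    return 0;
}
```

Leaving the semicolon off the end of the macro body lets the caller supply it, so the macro behaves like a single statement inside an unbraced if/else.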

-Dave



Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-2340-gd5763a8

2015-08-20 Thread Dave Goodell (dgoodell)
On Aug 20, 2015, at 3:03 PM, git...@crest.iu.edu wrote:

> This is an automated email from the git hooks/post-receive script. It was
> generated because a ref change was pushed to the repository containing
> the project "open-mpi/ompi".
> 
> The branch, master has been updated
>   via  d5763a8288c994e1d55a333d45f1a85d64341aff (commit)
>  from  305053615779af14ed36e0d94d85c2bbea59d55b (commit)
> 
> Those revisions listed above that are new to this repository have
> not appeared on any other notification email; so we list those
> revisions in full, below.
> 
> - Log -
> https://github.com/open-mpi/ompi/commit/d5763a8288c994e1d55a333d45f1a85d64341aff
> 
> commit d5763a8288c994e1d55a333d45f1a85d64341aff
> Author: --quiet <--quiet>

That's sure a funny author name and email... Jeff, was this you somehow?

-Dave



Re: [OMPI devel] Proposal: update Open MPI's version number and release process

2015-05-19 Thread Dave Goodell (dgoodell)
On May 19, 2015, at 1:22 PM, Ralph Castain  wrote:

> No thx 
> 
> I would rather not create code czars 

Hence my "half version" alternative suggestion.

-Dave



Re: [OMPI devel] Proposal: update Open MPI's version number and release process

2015-05-19 Thread Dave Goodell (dgoodell)
On May 19, 2015, at 12:36 PM, Ralph Castain  wrote:

> Our pr tests aren't good enough for what you propose

I made no claim about whether PRs even needed automated testing in order to 
switch to this scheme.  Right now I could push any old garbage I want into the 
master directly without ever using a PR, without ever having had a code review, 
and without having had any sort of testing.  Automated PR testing is a separate 
issue and would be pure gravy here.

The change would be whether or not it's OK to have some additional delay from 
the time a contributor decides a patch set is acceptable for inclusion in 
OMPI's master branch until the time that someone else with push access merges 
the PR.  It also requires that everyone is OK with having some limited set of 
people who can make those pull decisions.  IMO, both are fine.  Others may 
disagree.

Alternatively, a half version of this would be to collapse to a single 
repository where only the release branch maintainers have direct push access, 
but allow PRs against master to be automatically merged by anyone by using the 
OMPIbot (with a "bot:pullme" comment or whatever).

-Dave



Re: [OMPI devel] Proposal: update Open MPI's version number and release process

2015-05-19 Thread Dave Goodell (dgoodell)
On May 19, 2015, at 5:08 AM, Jeff Squyres (jsquyres)  wrote:

> On May 18, 2015, at 5:03 PM, Mark Santcroos  
> wrote:
> 
>> What I didn't see in the doc, will you continue to work with two repos or 
>> will that change too?
>> (I found that confusing as a newcomer)
> 
> Unfortunately, yes, we will keep 2 repos.  Github doesn't let us have 
> per-branch permissions -- having multiple repos is the only way to have 
> strict control over who can push to release branches.  Sad panda.
> 
> If Github ever does enact per-branch permissions, we will happily squash back 
> down to a single repo.

The other way to solve this issue would be to stop treating the master as a 
general dumping ground for potentially unstable code where anyone can just push 
any time they want.  If we switched to using PRs for (essentially) all code 
that goes into master as well, then we wouldn't need two different sets of 
permissions.

Back in the SVN days it was nice to have a trunk where people could freely 
check in work because there was no other good system for keeping track of your 
own work or sharing it with others.  But with Git we no longer have those 
problems.  I can easily organize multiple concurrent streams of private 
development, avoid losing work, and share work with others, all without 
committing to some centralized master branch.

-Dave



Re: [OMPI devel] Unsolicited code review of new distscript.sh

2015-04-27 Thread Dave Goodell (dgoodell)
On Apr 27, 2015, at 8:54 AM, Jeff Squyres (jsquyres)  wrote:

> Neat trick about perl -pie; I wasn't aware of that.

Make sure to write it as "-pi -e" (as Paul had it) or "-p -i -e".  Written as 
"-pie", perl takes the "e" as the backup-file suffix for -i and never sees a -e 
switch, so it probably won't do what you expect.

>> On Apr 23, 2015, at 10:42 PM, Paul Hargrove  wrote:
>> 
>> Since perl is already required for autogen, you could replace sed+whatever 
>> with perl's in-place operation
>> perl -pi -e 's/from/to/' -- file(s)

-Dave



Re: [OMPI devel] Common symbols warning

2015-04-15 Thread Dave Goodell (dgoodell)
On Apr 14, 2015, at 11:02 PM, Gilles Gouaillardet  wrote:

> Dave,
> 
> my understanding is that the presence of common symbols should be treated as 
> a warning
> (and hence make install should not fail)
> 
> makes sense ?

It should be a warning... are you seeing otherwise in your builds?  Here's the 
tail end of "make install" on my machine:

✂
make[3]: Entering directory `/home/dgoodell/git/ompi-upstream/_build'
WARNING!  Common symbols found:
     rtc_base_frame.o: 0040 C orte_rtc_base
  sstore_base_frame.o: 0008 C orte_sstore_base_global_metadata_filename
  sstore_base_frame.o: 0008 C orte_sstore_base_local_metadata_filename
  sstore_base_frame.o: 0008 C orte_sstore_base_local_snapshot_fmt
  sstore_base_frame.o: 0004 C orte_sstore_context
  sstore_base_frame.o: 0004 C orte_sstore_handle_current
  sstore_base_frame.o: 0004 C orte_sstore_handle_last_stable
  routed_base_frame.o: 0001 C orte_routed_base_wait_sync
  routed_base_frame.o: 0070 C orte_routed_jobfams
     ras_base_frame.o: 0018 C orte_ras_base
[...]
skipping remaining symbols. To see all symbols, run:
  ../config/find_common_syms --top_builddir=. --top_srcdir=.. --objext=o
make[3]: [install-exec-hook] Error 1 (ignored)
make[3]: Leaving directory `/home/dgoodell/git/ompi-upstream/_build'
make[2]: Nothing to be done for `install-data-am'.
make[2]: Leaving directory `/home/dgoodell/git/ompi-upstream/_build'
make[1]: Leaving directory `/home/dgoodell/git/ompi-upstream/_build'
✂

The key bit being: "Error 1 (ignored)".
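
For anyone unsure what the "C" entries in that listing are, here is a small sketch of how a common symbol typically arises in C. The variable and function names are hypothetical, and whether the first variable really ends up as a common symbol depends on the toolchain (GCC 10 and later default to -fno-common, for example).

```c
/* Tentative definition: no initializer and no "extern".  Toolchains
 * that default to -fcommon emit this as a common symbol, which nm
 * lists with a "C". */
int sketch_counter_common;

/* Explicit initializer: an ordinary definition, so it is not a
 * common symbol. */
int sketch_counter_defined = 0;

/* Touch both variables so neither is trivially unused. */
static int sketch_bump(void)
{
    sketch_counter_common += 1;
    sketch_counter_defined += 1;
    return sketch_counter_common + sketch_counter_defined;
}
```

The usual cure is to declare the variable "extern" in the header and give it exactly one initialized definition in a single .c file.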

-Dave



Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch asm_fix created. dev-1370-g26f96c0

2015-03-25 Thread Dave Goodell (dgoodell)
Right, that's why I'm recommending adding a comment so we don't have someone 
flag this a third time :)

-Dave

On Mar 25, 2015, at 4:43 PM, George Bosilca <bosi...@icl.utk.edu> wrote:

> I had the same impression but then I went and read the Intel documentation 
> and xchg is one of these exceptions where the lock is implicit.
> 
>   George.
> 
> 
> On Wed, Mar 25, 2015 at 4:59 PM, Dave Goodell (dgoodell) <dgood...@cisco.com> 
> wrote:
> On Mar 25, 2015, at 3:02 PM, git...@crest.iu.edu wrote:
> 
> > +static inline int32_t opal_atomic_swap_32( volatile int32_t *addr,
> > +int32_t newval)
> > +{
> > +int32_t oldval;
> > +
> > +__asm__ __volatile__("xchg %1, %0" :
> 
> This code *looks* buggy because it lacks the "SMPLOCK" prefix, but the prefix 
> can be safely omitted because "xchg" with a memory operand is always locked.  
> A comment to this effect should be added.
> 
> Also, this should probably be "xchgl" instead of "xchg".
> 
> > +  "=r" (oldval), "=m" (*addr) :
> 
> Shouldn't the modifier on the second constraint above be "+" for the same 
> reasons as the rest of this commit?  In that case I also think you can omit 
> the second constraint below altogether, though I'm less sure about that.
> 
> > +  "0" (newval), "m" (*addr) :
> > +  "memory");
> > +return oldval;
> > +}
> 
> -Dave
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/03/17153.php
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/03/17154.php



Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch asm_fix created. dev-1370-g26f96c0

2015-03-25 Thread Dave Goodell (dgoodell)
On Mar 25, 2015, at 3:02 PM, git...@crest.iu.edu wrote:

> +static inline int32_t opal_atomic_swap_32( volatile int32_t *addr,
> +int32_t newval)
> +{
> +int32_t oldval;
> +
> +__asm__ __volatile__("xchg %1, %0" :

This code *looks* buggy because it lacks the "SMPLOCK" prefix, but the prefix 
can be safely omitted because "xchg" with a memory operand is always locked.  
A comment to this effect should be added.

Also, this should probably be "xchgl" instead of "xchg".

> +  "=r" (oldval), "=m" (*addr) :

Shouldn't the modifier on the second constraint above be "+" for the same 
reasons as the rest of this commit?  In that case I also think you can omit the 
second constraint below altogether, though I'm less sure about that.

> +  "0" (newval), "m" (*addr) :
> +  "memory");
> +return oldval;
> +}

-Dave
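
Putting the three review comments together (explanatory comment, "xchgl", and a "+m" constraint), a sketch of the swap might look like this. sketch_atomic_swap_32 is a hypothetical name, the inline asm path assumes GCC or Clang on x86, and the fallback uses the GCC/Clang __atomic_exchange_n builtin so the sketch still builds elsewhere. It is not the code that was eventually committed.

```c
#include <stdint.h>

static inline int32_t sketch_atomic_swap_32(volatile int32_t *addr,
                                            int32_t newval)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    int32_t oldval;

    /* No "lock" prefix on purpose: xchg with a memory operand is
     * implicitly locked on x86. */
    __asm__ __volatile__("xchgl %1, %0"
                         : "=r" (oldval), "+m" (*addr)
                         : "0" (newval)
                         : "memory");
    return oldval;
#else
    /* Fallback for other targets: compiler builtin, sequentially
     * consistent. */
    return __atomic_exchange_n(addr, newval, __ATOMIC_SEQ_CST);
#endif
}
```

The "+m" modifier marks *addr as both read and written, which should make the separate "m" input operand from the quoted patch redundant.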



Re: [OMPI devel] Unwanted ibv_fork_init() mess(ages) and complaint for non-IB login node

2015-03-04 Thread Dave Goodell (dgoodell)
On Mar 4, 2015, at 3:25 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> On Wed, Mar 4, 2015 at 1:04 PM, Dave Goodell (dgoodell) <dgood...@cisco.com> 
> wrote:
> [...]
> > libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
> > libibverbs: Warning: no userspace device-specific driver found for 
> > /sys/class/infiniband_verbs/uverbs0
> 
> I think that warning is printed by libibverbs itself.  Are you 100% sure 
> there are no IB HCAs sitting in the head node?  If there are IB HCAs but you 
> don't want them to be used, you might want to ensure that the various verbs 
> kernel modules don't get loaded, which is one half of the mismatch which 
> confuses libibverbs.
> [...]
>  
> FWIW, I can confirm that these two lines are from libibverbs itself:
> $ strings /usr/lib64/libibverbs.a | grep -e 'no userspace' -e 'open config 
> directory'
> libibverbs: Warning: no userspace device-specific driver found for %s
> libibverbs: Warning: couldn't open config directory '%s'.

Yes, I think you'd also see the same message if you run "ibv_devices" or 
"ibv_devinfo" on the head node.

> As it happens, the login node *does* have an HCA installed and the kernel 
> modules appears to be loaded.  However, as the "17th node" in the cluster it 
> was never cabled to the 16-port switch and the package(s) that should have 
> created/populated /etc/libibverbs.d are *not* present (specifically the login 
> node has libipathverbs-devel installed but not libipathverbs).
> 
> So, Dave, are you saying that what I describe in the previous paragraph would 
> be considered "misconfiguration"?  I am fine with dropping the discussion of 
> those first two lines if there is agreement that Open MPI shouldn't be 
> responsible for handling this case.

I would consider that to be a lesser misconfiguration, which is only really an 
issue because of libibverbs deficiencies.  Either the hardware could be removed 
from the head node or the kernel modules could be unloaded / prevented from 
loading on the head node.

> Now the ibv_fork_init() warnings are another issue entirely.  Since btl:verbs 
> and mtl:psm both work (at least separately) perfectly fine on the compute 
> nodes, I don't believe that there are any configuration issues there.

Agreed, something needs to be improved there.  I assume that Mike D. or someone 
from his team will take a look.  I don't have any bandwidth to look at this 
myself.

-Dave



Re: [OMPI devel] Unwanted ibv_fork_init() mess(ages) and complaint for non-IB login node

2015-03-04 Thread Dave Goodell (dgoodell)
On Mar 4, 2015, at 11:56 AM, Paul Hargrove  wrote:

> I have a system with InfiniPath HCAs, where I've historically tested mtl:psm.
> For some reason, that appears to have ceased working some time in the past 4 
> months.
> However, this report is about something else.
> I am using the current master tarball: openmpi-dev-1203-g171d674.tar.bz2
> 
> When I ran configure, verbs support was found even though it was not my 
> intent to use it.
> So, I am running with an explicit btl list that omits verbs and am disabling 
> the broken mtl:psm and mtl:ofi as well.
> However, I am getting complaints from some verbs-related code:
> 
> $ mpirun -mca btl sm,self,tcp -mca mtl ^psm,ofi -np 2 -host n15,n16  
> examples/ring_c
> libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
> libibverbs: Warning: no userspace device-specific driver found for 
> /sys/class/infiniband_verbs/uverbs0

I think that warning is printed by libibverbs itself.  Are you 100% sure there 
are no IB HCAs sitting in the head node?  If there are IB HCAs but you don't 
want them to be used, you might want to ensure that the various verbs kernel 
modules don't get loaded, which is one half of the mismatch which confuses 
libibverbs.

> --
> Fork support was requested but the library call ibv_fork_init() failed.
> 
>   Hostname:n16
>   Error (22):  Invalid argument
> --
> --
> Fork support was requested but the library call ibv_fork_init() failed.
> 
>   Hostname:n15
>   Error (22):  Invalid argument
> --
> --
> Fork support was requested but the library call ibv_fork_init() failed.
> 
>   Hostname:n16
>   Error (22):  Invalid argument
> --
> --
> Fork support was requested but the library call ibv_fork_init() failed.
> 
>   Hostname:n15
>   Error (22):  Invalid argument
> --

Hmm... I don't know enough about how show_help works, but I thought that would 
have at least de-duped some of this.  It looks like the fork check is run once 
per device, so show_help may not be able to de-dupe everything.

-Dave



Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi-tests branch master updated. dev-62-gff3dee2

2015-02-20 Thread Dave Goodell (dgoodell)
On Feb 20, 2015, at 6:46 AM, git...@crest.iu.edu wrote:

> This is an automated email from the git hooks/post-receive script. It was
> generated because a ref change was pushed to the repository containing
> the project "open-mpi/ompi-tests".
> 
> The branch, master has been updated
>   via  ff3dee227f572c21fc6d35ed78cb359578a2661e (commit)
>  from  3ca024f4025d0582c5365e960af5b857a2cf8ca4 (commit)
> 
> Those revisions listed above that are new to this repository have
> not appeared on any other notification email; so we list those
> revisions in full, below.
> 
> - Log -
> https://github.com/open-mpi/ompi-tests/commit/ff3dee227f572c21fc6d35ed78cb359578a2661e
> 
> commit ff3dee227f572c21fc6d35ed78cb359578a2661e
> Author: Jeff Squyres 
> Date:   Fri Feb 20 04:46:12 2015 -0800
> 
>Cisco: stop testing the intel compiler
> 
> diff --git a/cisco/mtt/community/cisco-ompi-core-testing-master.ini 
> b/cisco/mtt/community/cisco-ompi-core-testing-master.ini
> index 942c152..465890f 100644
> --- a/cisco/mtt/community/cisco-ompi-core-testing-master.ini
> +++ b/cisco/mtt/community/cisco-ompi-core-testing-master.ini
> @@ -105,7 +105,9 @@ ompi_configure_arguments = CC=clang CXX=clang++ 
> --disable-mpi-fortran "CFLAGS=-g
> 
> #--
> 
> -[MPI install: Intel-12.1]
> +# Intel compiler borked their 2015 update 2, which broke all my other
> +# intel compiler installs. :-( :-( :-(
> +[SKPI MPI install: Intel-12.1]

I think you mean "SKIP" :-)

-Dave



Re: [OMPI devel] Fortran issue

2015-02-19 Thread Dave Goodell (dgoodell)
On Feb 19, 2015, at 10:15 AM, George Bosilca  wrote:

> While looking at the MPI_Testany issue, I came across a very unsettling sentence 
> in the MPI standard (3.0 page 58 line 36).
> 
> > The array is indexed from zero in C, and from one in Fortran.
> 
> This sentence seems to indicate that the index returned by the TestAny and 
> TestSome (as well as the corresponding Wait functions) should be indexed 
> starting from 1 in Fortran, but from 0 in C. Our C code returns all indexes 
> starting from 0 (C), but I failed to find where we handle this case in 
> Fortran? Or maybe I am reading too much into the MPI standard?

Jeff is Mr. Fortran, so I'll let him answer more definitively, but in the 
meantime you could try running this test from the MPICH test suite:

http://git.mpich.org/mpich.git/blob/v3.0:/test/mpi/f77/pt2pt/allpairf.f

Surely there must be a test in ompi-tests that covers this area too.
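
For reference, the conversion George is asking about could look like the following in a Fortran binding wrapper. This is only a sketch: sketch_index_c_to_fortran and SKETCH_UNDEFINED are hypothetical, the constant is a stand-in for MPI_UNDEFINED with an arbitrary value, and nothing here describes where (or whether) Open MPI performs this step.

```c
/* Stand-in for MPI_UNDEFINED so the sketch builds without an MPI
 * header; the real value is implementation-defined. */
#define SKETCH_UNDEFINED (-32766)

/* Translate a 0-based index returned by a C Testany/Waitany call
 * into the 1-based index a Fortran caller expects. */
static int sketch_index_c_to_fortran(int c_index)
{
    /* "No active request" is reported as-is, never shifted. */
    if (SKETCH_UNDEFINED == c_index) {
        return c_index;
    }
    return c_index + 1;
}
```

The Testsome/Waitsome variants would need the same shift applied to each entry of the returned index array.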

-Dave



Re: [OMPI devel] omni-release Github comment bot

2015-02-05 Thread Dave Goodell (dgoodell)
My personal opinion on this is that adding a bot:rebase command is a bit silly. 
 IMO only the author of the PR should be allowed to issue this command, since 
it modifies his/her fork repo, in which case why not just use the git command 
line to do this?  We shouldn't be implementing a full copy of the git CLI via 
GH issue comments.

The bot label/milestone commands are mainly workarounds for GitHub permission 
deficiencies.  We wouldn't need them if we could properly delegate permissions 
at a finer granularity.

-Dave

On Feb 5, 2015, at 3:31 PM, Jeff Squyres (jsquyres)  wrote:

> Ralph and I chatted more about this on the phone.
> 
> Short version: we think we generally agree.  :-)
> 
> A point that was missed in the prior email discussion was that when we click 
> the green "merge" button, it puts effectively those commits at the HEAD -- 
> which, for the purposes of this conversation, is close-enough to rebasing 
> such that rebasing and re-smoke-testing is not a bad thing.
> 
> Is this a bot you guys can write?  I.e., I think it should probably be 
> different than the label/milestone/assignment bot.
> 
> 
> 
> 
>> On Feb 5, 2015, at 2:58 PM, Mike Dubman  wrote:
>> 
>> rebase before merge is a good practice/gate used by other code review tools 
>> (like gerrit).
>> 
>> it helps to keep git history linear (less merge commits) and takes the most 
>> recent patch set from PR and have it rebased onto the tip of the destination 
>> branch. If rebase succeeds (no conflicts) - jenkins will smoke-test it and 
>> RM will feel more confident that rebased PR is up to date with smoke testing 
>> and operational/compilable state.
>> 
>> smoketest/jenkins is not competing with mtt or other forms of testing 
>> anyway, just brutal indication of project health. :)
>> 
>> 
>> 
>> 
>> On Thu, Feb 5, 2015 at 9:17 PM, Jeff Squyres (jsquyres)  
>> wrote:
>> Thinking about this a little bit, there's a wrinkle: you (the individual 
>> developer) will need to give push permissions on your ompi / ompi-release 
>> fork to the OMPIBot Github account.  Otherwise, it won't be able to push 
>> back to your fork.
>> 
>> Thinking about this even more, I'm a little worried about implementing this 
>> feature.  It seems to give a lot of credence to the smoke test -- i.e., if 
>> hello world/ring work, then my patch must work.  I'm not sure that's 
>> "enough" to give me confidence that a patch rebased properly.
>> 
>> Thoughts?
>> 
>> 
>>> On Feb 5, 2015, at 2:08 PM, Jeff Squyres (jsquyres)  
>>> wrote:
>>> 
>>> Mike:
>>> 
>>> This sounds good, but let us get the label/milestone/assign thing going 
>>> first.
>>> 
>>> I'm thinking that the functionality you describe may become a different 
>>> bot...?  I'm not sure.
>>> 
>>> 
>>>> On Feb 5, 2015, at 9:56 AM, Mike Dubman  wrote:
>>>> 
>>>> yep, exactly.
>>>> 
>>>> 
>>>> On Thu, Feb 5, 2015 at 2:35 PM, Jeff Squyres (jsquyres) 
>>>>  wrote:
>>>> On Feb 5, 2015, at 7:20 AM, Mike Dubman  wrote:
>>>>> 
>>>>> sounds cool and useful.
>>>> 
>>>> K, thanks.
>>>> 
>>>>> Also, does it make sense to have "rebase" knob to cause "try rebase if no 
>>>>> conflicts" with upstream?
>>>> 
>>>> Just to be sure what you mean: something like "rebase:" that will cause 
>>>> the patch set to be rebased to head of master (if there are no conflicts)?
>>>> 
>>>> I think you're asking because:
>>>> 
>>>> - it doesn't make the RM/GK's job easier because github would have already 
>>>> detected this and still kept the "merge" button green on the PR
>>>> - but it would (assumedly) trigger a new Jenkins smoke test, which is the 
>>>> desirable thing here (i.e., it may merge, but it may or may not *work*)
>>>> 
>>>> Is that what you're thinking?
>>>> 
>>>> --
>>>> Jeff Squyres
>>>> jsquy...@cisco.com
>>>> For corporate legal information go to: 
>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>> 
>>>> ___
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post: 
>>>> http://www.open-mpi.org/community/lists/devel/2015/02/16929.php
>>>> 
>>>> 
>>>> 
>>>> --
>>>> 
>>>> Kind Regards,
>>>> 
>>>> M.
>>>> ___
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post: 
>>>> http://www.open-mpi.org/community/lists/devel/2015/02/16934.php
>>> 
>>> 
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to: 
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 

Re: [OMPI devel] ompi-master build error : make can require autotools

2015-01-06 Thread Dave Goodell (dgoodell)
On Jan 5, 2015, at 8:40 PM, Gilles Gouaillardet  
wrote:

> Dave,
> 
> what if you do
> 
> touch ompi/include/mpi.h.in && sleep 1 && touch 
> config/opal_config_pthreads.m4 && ./autogen.pl && module unload 
> cisco/autotools/ac269-am1133-lt242 && ./configure --prefix=$PWD/_prefix && 
> make
> 
> 
> neither autogen.pl nor configure touches ompi/include/mpi.h.in, and as a
> consequence,
> config/opal_config_pthreads.m4 is newer than ompi/include/mpi.h when
> make is invoked.
> 
> then from ompi/include/Makefile:
> 
> $(srcdir)/mpi.h.in:  $(am__configure_deps)
>($(am__cd) $(top_srcdir) && $(AUTOHEADER))
>rm -f stamp-h2
>touch $@

I don't see that rule in my ompi/include/Makefile.  I only have a couple of 
mentions of mpi.h.in:

✂
DIST_COMMON = $(top_srcdir)/Makefile.ompi-rules \
$(srcdir)/ompi/Makefile.am $(srcdir)/Makefile.in \
$(srcdir)/Makefile.am $(srcdir)/mpi.h.in $(srcdir)/mpif.h.in \
$(srcdir)/mpif-config.h.in $(am__include_HEADERS_DIST) \
$(am__nobase_dist_ompi_HEADERS_DIST) $(pkginclude_HEADERS)
[...]
am__tagged_files = $(HEADERS) $(SOURCES) $(TAGS_FILES) $(LISP)mpi.h.in
[...]
stamp-h2: $(srcdir)/mpi.h.in $(top_builddir)/config.status
@rm -f stamp-h2
cd $(top_builddir) && $(SHELL) ./config.status ompi/include/mpi.h
✂

Maybe the rule in your version of ompi/include/Makefile comes from an older, 
buggy version of automake?

> this means $(AUTOHEADER) is invoked, and then ompi/include/mpi.h.in is
> touched.

I don't see $(AUTOHEADER) being invoked by make when I run the commands you 
listed (with "sleep 1" changed to "sleep 5" to be certain):

✂
make[1]: Entering directory `/home/dgoodell/git/ompi-upstream/ompi'
Making all in include
make[2]: Entering directory `/home/dgoodell/git/ompi-upstream/ompi/include'
make  all-am
make[3]: Entering directory `/home/dgoodell/git/ompi-upstream/ompi/include'
  GENERATE mpif-sizeof.h
  LN_S mpi_portable_platform.h
make[3]: Leaving directory `/home/dgoodell/git/ompi-upstream/ompi/include'
make[2]: Leaving directory `/home/dgoodell/git/ompi-upstream/ompi/include'
Making all in datatype
✂

Here are the timestamps on the relevant files after the build completes:

✂
-rw-rw-r-- 1 dgoodell dgoodell 33 2015-01-06 08:21:18.414503328 -0800 
ompi/include/stamp-h2
-rw-rw-r-- 1 dgoodell dgoodell 166283 2015-01-06 08:21:18.408502854 -0800 
ompi/include/mpi.h
-rwxrwxr-x 1 dgoodell dgoodell 246260 2015-01-06 08:21:09.283782006 -0800 
config.status*
-rw-rw-r-- 1 dgoodell dgoodell  18853 2015-01-06 08:17:07.212658002 -0800 
config/opal_config_pthreads.m4
-rw-rw-r-- 1 dgoodell dgoodell 165986 2015-01-06 08:17:02.209262644 -0800 
ompi/include/mpi.h.in
✂

-Dave



Re: [OMPI devel] ompi master, libfabric and static libraries

2014-12-22 Thread Dave Goodell (dgoodell)
On Dec 22, 2014, at 5:16 AM, Gilles Gouaillardet 
 wrote:

> Jeff,
> 
> MTT reported some errors when building some test suites :
> http://mtt.open-mpi.org/index.php?do_redir=2219
> 
> the root cause was some missing flags in the wrappers.
> i fixed that in 8976dcf6101412f6bd0080764d19a3e9d4edf570
> 
> there is now a second issue :
> libfabric requires libnl, but the -lnl flag is not passed to the mpi
> wrappers.
> could you please have a look at this ?

Jeff's on vacation and may not be checking email very frequently.  I think 
he'll be back full time on January 5th.  I'll take a look at this issue, but I 
haven't been closely keeping track of his libfabric integration work, so I 
can't guarantee that I'll get it fixed before I disappear for the Cisco/USA 
holiday period on December 24th.

-Dave



Re: [OMPI devel] ompi-master build error : make can require autotools

2014-12-22 Thread Dave Goodell (dgoodell)
On Dec 22, 2014, at 2:42 AM, Gilles Gouaillardet 
 wrote:

> Jeff and all,
> 
> i just found "by accident" that make can require autotools.
> 
> for example:
> 
> from (generated) ompi/include/Makefile :
> $(srcdir)/mpi.h.in:  $(am__configure_deps)
>($(am__cd) $(top_srcdir) && $(AUTOHEADER))
>rm -f stamp-h2
>touch $@
> 
> and $(am__configure_deps) is a bunch (all?) of .m4 files.
> 
> from a pragmatic point of view, it means that if i update an m4 file, run
> autogen.pl and configure,
> then the first invocation of make will run $(AUTOHEADER)

Gilles,

Have you actually experienced this exact behavior?  The sequence you mention 
above shouldn't cause autoheader to be invoked by make.  Running autogen.pl 
will invoke autoheader after the m4 files were touched, so the mpi.h.in file 
will be newer than its m4 dependencies, which should mean that this make rule 
won't be executed.

-Dave



[OMPI devel] Git security vulnerability, please upgrade Windows & OS X Git clients

2014-12-19 Thread Dave Goodell (dgoodell)
Quoting from 
https://github.com/blog/1938-vulnerability-announced-update-your-git-clients

"""
A critical Git security vulnerability has been announced today, affecting all 
versions of the official Git client and all related software that interacts 
with Git repositories, including GitHub for Windows and GitHub for Mac. Because 
this is a client-side only vulnerability, github.com and GitHub Enterprise are 
not directly affected.

The vulnerability concerns Git and Git-compatible clients that access Git 
repositories in a case-insensitive or case-normalizing filesystem. An attacker 
can craft a malicious Git tree that will cause Git to overwrite its own 
.git/config file when cloning or checking out a repository, leading to 
arbitrary command execution in the client machine. Git clients running on OS X 
(HFS+) or any version of Microsoft Windows (NTFS, FAT) are exploitable through 
this vulnerability. Linux clients are not affected if they run in a 
case-sensitive filesystem.

We strongly encourage all users of GitHub and GitHub Enterprise to update their 
Git clients as soon as possible, and to be particularly careful when cloning or 
accessing Git repositories hosted on unsafe or untrusted hosts.
"""

The official Git release post: 
http://article.gmane.org/gmane.linux.kernel/1853266

-Dave



Re: [OMPI devel] simple_spawn test fails using different set of btls.

2014-11-06 Thread Dave Goodell (dgoodell)
On Nov 6, 2014, at 12:44 AM, George Bosilca  wrote:

> PS: Sorry Dave I also pushed a master branch merge ...

It's not the end of the world, just try to keep an eye on it and avoid doing it 
in the future.  If you need any help avoiding it, feel free to ping me or the 
devel@ list in general.

-Dave



Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-206-g87dffac

2014-11-03 Thread Dave Goodell (dgoodell)
On Nov 3, 2014, at 10:50 AM, Alexander Mikheev  wrote:

> It is --amend of my previous commit.  When I tried to push my amended commit, 
> the merge was required. 

Ah, I just spotted the minor difference between the two commits.  The second 
argument to setenv() was changed from integer zero to a string "0".

In the future, it would be better to just create a new single commit fixing the 
mistake directly instead of amending, merging, and pushing, since that 
difference was not obvious from just looking at the email diffs flying by.
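
For illustration, here is a minimal sketch of why that small difference matters: setenv() expects a string as its second argument, so a bare integer zero is treated as a null pointer rather than the text "0". The helper name set_default_env is hypothetical and is not taken from the OSHMEM code; it only assumes the POSIX setenv() interface.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose setenv() under strict -std modes */
#endif
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not from the OSHMEM code: give `name` a default
 * value without clobbering anything the user already exported. */
int set_default_env(const char *name, const char *value)
{
    /* The value must be a string.  Writing a bare 0 at the call site
     * would pass a null pointer, not the text "0". */
    assert(name != NULL && value != NULL);

    /* Final argument 0 means "do not overwrite an existing setting". */
    return setenv(name, value, 0);
}
```

Calling set_default_env("DEMO_LIMIT", "0") (DEMO_LIMIT being a made-up variable name) sets the variable to the one-character string "0" only if it is not already present in the environment.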

-Dave



Re: [OMPI devel] OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-198-g68bec0a

2014-11-03 Thread Dave Goodell (dgoodell)
On Nov 3, 2014, at 10:41 AM, Jed Brown <j...@jedbrown.org> wrote:

> "Dave Goodell (dgoodell)" <dgood...@cisco.com> writes:
>> Most of the time a "pull" won't succeed if you have uncommitted
>> modifications in your tree, so I'm not sure how pull/commit/push would
>> actually work for you.  Do you stash/unstash in the middle there?
> 
> Git will happily do the pull/merge despite your dirty tree as long as
> none of the dirty files are affected.  Linus says that he usually has
> uncommitted changes in his tree when merging.

Hmm... you can see how often I create proper merge commits on a dirty tree.  I 
must have been hit in the past by conflicting dirty files.

-Dave



Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-206-g87dffac

2014-11-03 Thread Dave Goodell (dgoodell)
Hi Alex,

Why did you push this "OSHMEM: spml ikrit..." commit twice?  I see it here 
(together with an undesirable merge-of-master commit) and also as 065dc9b4.

-Dave

On Nov 3, 2014, at 2:03 AM, git...@crest.iu.edu wrote:

> This is an automated email from the git hooks/post-receive script. It was
> generated because a ref change was pushed to the repository containing
> the project "open-mpi/ompi".
> 
> The branch, master has been updated
>   via  87dffacc56b4ebcecaa2e65e19c2f813d2a5d078 (commit)
>   via  e1cf6f37baf2b6240ab3aa3a219b8856cfa2caf4 (commit)
>  from  065dc9b4deec9cd9500f2fdc6bb53bbf58a9c2f6 (commit)
> 
> Those revisions listed above that are new to this repository have
> not appeared on any other notification email; so we list those
> revisions in full, below.
> 
> - Log -
> https://github.com/open-mpi/ompi/commit/87dffacc56b4ebcecaa2e65e19c2f813d2a5d078
> 
> commit 87dffacc56b4ebcecaa2e65e19c2f813d2a5d078
> Merge: e1cf6f3 065dc9b
> Author: Alex Mikheev 
> Date:   Mon Nov 3 10:02:29 2014 +0200
> 
>Merge branch 'master' of github.com:open-mpi/ompi
> 
>Conflicts:
>   oshmem/mca/spml/ikrit/spml_ikrit_component.c
> 
> 
> 
> https://github.com/open-mpi/ompi/commit/e1cf6f37baf2b6240ab3aa3a219b8856cfa2caf4
> 
> commit e1cf6f37baf2b6240ab3aa3a219b8856cfa2caf4
> Author: Alex Mikheev 
> Date:   Sun Nov 2 12:41:20 2014 +0200
> 
>OSHMEM: spml ikrit: disable rdmap op DCI pool
> 
>Instead use single pool for both rdma and send receive ops.
> 
> diff --git a/oshmem/mca/spml/ikrit/spml_ikrit_component.c 
> b/oshmem/mca/spml/ikrit/spml_ikrit_component.c
> index 2079640..e021666 100644
> --- a/oshmem/mca/spml/ikrit/spml_ikrit_component.c
> +++ b/oshmem/mca/spml/ikrit/spml_ikrit_component.c
> @@ -92,6 +92,12 @@ static inline int set_mxm_tls()
> {
> char *tls;
> 
> +/* disable dci pull for rdma ops. Use single pool.
> + * Pool size is controlled by MXM_DC_QP_LIMIT 
> + * variable
> + */
> +setenv("MXM_OSHMEM_DC_RNDV_QP_LIMIT", "0", 0);
> +
> tls = getenv("MXM_OSHMEM_TLS");
> if (NULL != tls) {
> return check_mxm_tls("MXM_OSHMEM_TLS");
> 
> 
> ---
> 
> Summary of changes:
> oshmem/mca/spml/ikrit/spml_ikrit_component.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> 
> hooks/post-receive
> -- 
> open-mpi/ompi
> ___
> ompi-commits mailing list
> ompi-comm...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/ompi-commits



Re: [OMPI devel] OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-198-g68bec0a

2014-11-03 Thread Dave Goodell (dgoodell)
On Nov 1, 2014, at 3:44 AM, Gilles Gouaillardet  
wrote:

> Hi Dave,
> 
> I am sorry about that, the doc is not to be blamed here.
> I usually do pull/commit/push in a row to avoid this kind of things but i 
> screwed up this time ...
> I cannot remember if i did commit/pull/push or if i simply forgot to pull

Most of the time a "pull" won't succeed if you have uncommitted modifications 
in your tree, so I'm not sure how pull/commit/push would actually work for you.  
Do you stash/unstash in the middle there?  Or are you saying you make all of 
your changes between "pull" and "commit"?  If so, there's always a race there 
that you might occasionally need to resolve with "git rebase" or "git pull 
--rebase" anyway.

> btw, is there a push option to abort if that would make github history non 
> linear ?

No, not really.  There are some options to "pull" to prevent you from creating 
a merge commit, but the fix when you encounter that situation would simply be 
to rebase in some fashion, so you might as well just do that every time.

The best thing to do is to just try to use "git pull --rebase" for any topic 
work (i.e., don't use a bare "git pull" unless you know that you need to 
perform a merge).  A few other alternatives if you don't like that for some 
reason:

1. Set your "pull" default to perform a rebase.  I don't recommend it because 
this can lead to confusion if you work on multiple systems and you are not 100% 
consistent about setting this behavior.  But here's how to do it: 
http://stevenharman.net/git-pull-with-automatic-rebase

2. "git pull --rebase" can always be substituted by "git fetch ; git rebase".  
You could change your workflow to avoid the "pull" command altogether until it 
all makes more sense to you.  Similarly, "git pull" (which means "git pull 
--no-rebase" by default) can always be substituted by "git fetch ; git merge".

3. View the commit graph before pushing to make sure you're pushing the history 
you think you should be.  A helpful command for this (which you can alias if 
desired) is:

git log --graph --oneline --decorate HEAD '@{u}'

That will show the commit graph that can be traced back from your current 
branch and its tracked upstream branch.  If you see a merge commit where you 
didn't expect one, fix the history before pushing.  If you don't know how to 
fix it, ask the list or google around a bit.

-Dave



Re: [OMPI devel] Why no release tags in open-mpi/ompi repository?

2014-10-20 Thread Dave Goodell (dgoodell)
On Oct 17, 2014, at 9:51 AM, Jed Brown  wrote:

> "Jeff Squyres (jsquyres)"  writes:
> 
>> Meaning: we deliberately chose not to change the development style of
>> the community to "develop on release branch" when we moved to git.
> 
> Understood.  It's your choice, but workflow is a big feature of Git.

Jed, I initially advocated for a merge-based workflow when we were planning the 
transition to Git, but others in the community felt that it would be too 
painful to simultaneously learn a new VCS and invert the flow of development.  
I'm still not 100% sure that sticking to the cherry-pick workflow was really 
the right call, but I've made peace with it for now.

We can certainly live with this workflow for a while and change again later if 
desired.  Separating the shocks to non-git-comfortable developers is a good 
thing, IMO.

>> If github would implement per-branch push ACLs, then we'd squash down
>> to a single repo, and all this would be easier.
>> 
>> But given the relative inexperience with git in our community (which
>> is noticeable via some mistakes on the ompi repo already!) and our
>> history of only allowing regulated commits to release branches, we
>> chose the (admittedly somewhat awkward) 2-repo model.
> 
> You could push release tags to open-mpi/ompi without pushing the branches.

I'm not sure this is less confusing.  It gives the illusion that "ompi" 
contains all of the release development as well, but in reality you need 
"ompi-release" to get anything beyond the latest tagged release.

>>> and it deprives you of context when you
>>> have no idea whether "dev-BIGNUMBER" is earlier or later than a given
>>> release.  (Does it have those features/bugs or not?)
>> 
>> Even if OMPI was just in one git repo, the number of commits on master
>> since dev is unrelated to a given release.
> 
> If integration branches were merged upward, "git describe" would yield
> names like v1.8.3-84-g51a7c90, which tells you immediately that it's 84
> commits "ahead" of v1.8.3.

You're not wrong about the advantages of a merge-based workflow.  I just don't 
think it changes what the community is choosing to do right now.

>> Put differently: the dev tag is solely for ordering of nightly snapshot 
>> tarballs.
> 
> It affects git describe output as a side-effect and when someone writes
> the mailing list with a bug in a year-old nightly snapshot, you'll need
> to query the repository (or have a better memory than me) to have any
> idea what they're working with.  Perhaps you are blessed with users that
> don't do things like this.

Checking the repo isn't particularly onerous, IMO.  It's a lot easier than 
searching old bug tickets, which you're also likely to need to do when dealing 
with bugs reported against old snapshots.  Also, there's almost no scenario 
where a user should be reporting bugs against a years-old "master" snapshot.  
Snapshots from the release branches have more useful names (e.g., 
v1.8.3-39-gd07d53e).

-Dave



Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-40-g93eba3a

2014-10-08 Thread Dave Goodell (dgoodell)
No worries, just thought I'd mention it during this "training wheels" phase for 
everyone's sake, not just yours.  Besides, there was an obvious gap in the 
docs. :)

-Dave

On Oct 8, 2014, at 9:42 AM, Howard Pritchard <hpprit...@gmail.com> wrote:

> Hi David,
> 
> Yes I know about this.  I realized as soon as I did the push that it was 
> pushing
> the commits that I'd pulled on top of my changes back into the repo.  ugh.
> 
>  Jeff suggested the pull with rebase.  I added that as default in my global
> config file.
> 
> In the past I'd not hit this because the projects I was working on had very
> little concurrent commits going in.
> 
> thanks for pointing this out though,
> 
> Howard
> 
> 
> 2014-10-08 7:29 GMT-06:00 Dave Goodell (dgoodell) <dgood...@cisco.com>:
> On Oct 3, 2014, at 5:10 PM, git...@crest.iu.edu wrote:
> 
> > - Log -
> > https://github.com/open-mpi/ompi/commit/93eba3ac70606db12465319804f2733f13bc9ca4
> >
> > commit 93eba3ac70606db12465319804f2733f13bc9ca4
> > Merge: fd6a044 bd2974f
> > Author: Howard Pritchard <hpprit...@gmail.com>
> > Date:   Fri Oct 3 16:08:11 2014 -0600
> >
> >Merge branch 'master' of https://github.com/open-mpi/ompi
> 
> Hey Howard,
> 
> If possible, please avoid this sort of merge in the future.  It usually makes 
> the history a bit harder to follow.  A rebase of your local work onto the 
> latest "ompi/master" probably would have been better (though I'm not familiar 
> with the details of this branch+merge).  Not a big deal, just a bit 
> friendlier for everyone.
> 
> It looks like this best practice somehow slipped through the cracks when we 
> put together the OMPI Git documentation, so I've tweaked the wiki to reflect 
> this:
> 
> https://github.com/open-mpi/ompi/wiki/GitBestPractices
> 
> FWIW, it causes a commit DAG that looks like this (note the tangle stemming 
> from 93eba3a and bd2974f):
> 
> 8<
>  * 8191741 (HEAD, origin/master, origin/HEAD, master) tools: add flag to
>  *   23cb00d Merge pull request #225 from hjelmn/master
>  |\
>  | * eed7b45 osc/rdma: fix issue identified by Berk Hess
>  |/
>  * 9c027e6 Update the PMI configure logic to handle the oddball case wher
>  * a422d89 memchecker: per RFC, use calloc for OBJ_NEW
>  * 86f1d5a OPAL: drop dead with core on bad flow. rarely happens with hel
>  *   cd48fbe Merge pull request #221 from opoplawski/master
>  |\
>  | * 2d5832c Fix typo in liboshmem name
>  * | 89535a3 OSHMEM: sshmem mmap: use MAP_PRIVATE instead of MAP_SHARED
>  * | 399fc1b configury: remove unneeded assignments
>  * | fd77ebd OSHMEM: sshmem verbs: allocate memory at fixed address
>  * | 4ac5936 OSHMEM: sshmem verbs: improve hca name parsing
>  * | d82dc7f OSHMEM: Add two new mca variables
>  * | 067fa05 OSHMEM: fixes bug in shmem_lock
>  * |   93eba3a Merge branch 'master' of https://github.com/open-mpi/ompi
>  |\ \
>  | |/
>  | *   bd2974f Merge branch 'master' of ssh://github.com/open-mpi/ompi
>  | |\
>  | | * 0997c91 openmpi-release.sh: update for git
>  | * | fb1f487 Cleanup some cruft resulting from the move of the btl's to
>  * | | fd6a044 Cleanup some cruft resulting from the move of the btl's to
>  * | | b44a244 openmpi-release.sh: update for git
>  * | | 5428301 Remove catamount timer support
>  | |/
>  |/|
>  * | d2bb8d8 remove alps ess component
>  * | d033674 openmpi-nightly-tarball.sh: don't even check v1.6 any more
>  * | 534d773 openmpi-nightly-tarball.sh: fix typo in ompi-release URLs
>  * | 0e21c66 openmpi-nightly-tarball.sh: fix typo
>  * | f72bf3b gkcommit.pl: so long gkcommit; you served us well in SVN day
>  * | a12eef6 find-copyrights.pl: updates for git
>  * | 58e6213 make_dist_tarball: remove debug statement
>  * | 72d1359 create_tarball.sh: update the email to remove SVN references
>  |/
>  * 8cd3ee7 create_tarball.sh: adjust for new VERSION file format
>  * 697b18d Making async copy the default
> 8<
> 
> Best,
> -Dave
> 



Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-40-g93eba3a

2014-10-08 Thread Dave Goodell (dgoodell)
On Oct 3, 2014, at 5:10 PM, git...@crest.iu.edu wrote:

> - Log -
> https://github.com/open-mpi/ompi/commit/93eba3ac70606db12465319804f2733f13bc9ca4
> 
> commit 93eba3ac70606db12465319804f2733f13bc9ca4
> Merge: fd6a044 bd2974f
> Author: Howard Pritchard 
> Date:   Fri Oct 3 16:08:11 2014 -0600
> 
>Merge branch 'master' of https://github.com/open-mpi/ompi

Hey Howard,

If possible, please avoid this sort of merge in the future.  It usually makes 
the history a bit harder to follow.  A rebase of your local work onto the 
latest "ompi/master" probably would have been better (though I'm not familiar 
with the details of this branch+merge).  Not a big deal, just a bit friendlier 
for everyone.

It looks like this best practice somehow slipped through the cracks when we put 
together the OMPI Git documentation, so I've tweaked the wiki to reflect this:

https://github.com/open-mpi/ompi/wiki/GitBestPractices

FWIW, it causes a commit DAG that looks like this (note the tangle stemming 
from 93eba3a and bd2974f):

8<
 * 8191741 (HEAD, origin/master, origin/HEAD, master) tools: add flag to
 *   23cb00d Merge pull request #225 from hjelmn/master
 |\ 
 | * eed7b45 osc/rdma: fix issue identified by Berk Hess
 |/ 
 * 9c027e6 Update the PMI configure logic to handle the oddball case wher
 * a422d89 memchecker: per RFC, use calloc for OBJ_NEW
 * 86f1d5a OPAL: drop dead with core on bad flow. rarely happens with hel
 *   cd48fbe Merge pull request #221 from opoplawski/master
 |\ 
 | * 2d5832c Fix typo in liboshmem name
 * | 89535a3 OSHMEM: sshmem mmap: use MAP_PRIVATE instead of MAP_SHARED
 * | 399fc1b configury: remove unneeded assignments
 * | fd77ebd OSHMEM: sshmem verbs: allocate memory at fixed address
 * | 4ac5936 OSHMEM: sshmem verbs: improve hca name parsing
 * | d82dc7f OSHMEM: Add two new mca variables
 * | 067fa05 OSHMEM: fixes bug in shmem_lock
 * |   93eba3a Merge branch 'master' of https://github.com/open-mpi/ompi
 |\ \ 
 | |/ 
 | *   bd2974f Merge branch 'master' of ssh://github.com/open-mpi/ompi
 | |\ 
 | | * 0997c91 openmpi-release.sh: update for git
 | * | fb1f487 Cleanup some cruft resulting from the move of the btl's to
 * | | fd6a044 Cleanup some cruft resulting from the move of the btl's to
 * | | b44a244 openmpi-release.sh: update for git
 * | | 5428301 Remove catamount timer support
 | |/ 
 |/|  
 * | d2bb8d8 remove alps ess component
 * | d033674 openmpi-nightly-tarball.sh: don't even check v1.6 any more
 * | 534d773 openmpi-nightly-tarball.sh: fix typo in ompi-release URLs
 * | 0e21c66 openmpi-nightly-tarball.sh: fix typo
 * | f72bf3b gkcommit.pl: so long gkcommit; you served us well in SVN day
 * | a12eef6 find-copyrights.pl: updates for git
 * | 58e6213 make_dist_tarball: remove debug statement
 * | 72d1359 create_tarball.sh: update the email to remove SVN references
 |/ 
 * 8cd3ee7 create_tarball.sh: adjust for new VERSION file format
 * 697b18d Making async copy the default
8<

Best,
-Dave



Re: [OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-02 Thread Dave Goodell (dgoodell)
On Oct 2, 2014, at 11:47 AM, Ralph Castain  wrote:

> 
> On Oct 2, 2014, at 9:43 AM, Jeff Squyres (jsquyres)  
> wrote:
> 
>> On Oct 2, 2014, at 12:37 PM, Ralph Castain  wrote:
>> 
>>> So... now that I've fought thru and created a pull request, I find that I 
>>> cannot assign it to anyone for review. The only users on the ompi-release 
>>> branch are myself and Jeff (plus the ompiteam admin).
>>> 
>>> So how do we assign someone to review it?
>> 
>> I think the best way is to do an @mention of someone in the comment.  E.g., 
>> in this case, "@osvegis, can you review this?"
> 
> How are they going to review it, given they don't have authority to do 
> anything on that branch? Can they still comment? Can they reassign them when 
> done?

They should be able to comment.  I don't think they'll be able to change the 
assignment.

-Dave



Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-9-g3263f72

2014-10-02 Thread Dave Goodell (dgoodell)
On Oct 2, 2014, at 11:17 AM, Ralph Castain <r...@open-mpi.org>
 wrote:

> 
> On Oct 2, 2014, at 9:13 AM, Dave Goodell (dgoodell) <dgood...@cisco.com> 
> wrote:
> 
>> On Oct 2, 2014, at 10:38 AM, git...@crest.iu.edu wrote:
>> 
>>> This is an automated email from the git hooks/post-receive script. It was
>>> generated because a ref change was pushed to the repository containing
>>> the project "open-mpi/ompi".
>>> 
>>> The branch, master has been updated
>>>  via  3263f721b6a21966a1c1eea0fdac2a558a15db06 (commit)
>>> from  f21c349bcb3f7c322805d505484951642d1c7965 (commit)
>>> 
>>> Those revisions listed above that are new to this repository have
>>> not appeared on any other notification email; so we list those
>>> revisions in full, below.
>>> 
>>> - Log -
>>> https://github.com/open-mpi/ompi/commit/3263f721b6a21966a1c1eea0fdac2a558a15db06
>>> 
>>> commit 3263f721b6a21966a1c1eea0fdac2a558a15db06
>>> Author: Ralph Castain <r...@open-mpi.org>
>>> Date:   Thu Oct 2 08:37:18 2014 -0700
>>> 
>>>   Strip crlf line endings
>>> 
>>> diff --git a/ompi/contrib/vt/vt/extlib/otf/otf_vc08.sln 
>>> b/ompi/contrib/vt/vt/extlib/otf/otf_vc08.sln
>>> index 38498d6..7e887fc 100644
>>> --- a/ompi/contrib/vt/vt/extlib/otf/otf_vc08.sln
>>> +++ b/ompi/contrib/vt/vt/extlib/otf/otf_vc08.sln
>> 
>> Isn't this file a Windows-specific file that probably should still have CRLF 
>> line endings?
> 
> No idea, but we don't support Windows any more, so who cares?

Right, but when the VT folks slurp in a new update from their upstream, won't 
this just get clobbered back to CRLF?  Or worse, cause some sort of merge 
conflict for them?

I don't especially care, just thought it was odd to remove the CRLFs from a 
Windows-only file.

-Dave



Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-9-g3263f72

2014-10-02 Thread Dave Goodell (dgoodell)
On Oct 2, 2014, at 10:38 AM, git...@crest.iu.edu wrote:

> This is an automated email from the git hooks/post-receive script. It was
> generated because a ref change was pushed to the repository containing
> the project "open-mpi/ompi".
> 
> The branch, master has been updated
>   via  3263f721b6a21966a1c1eea0fdac2a558a15db06 (commit)
>  from  f21c349bcb3f7c322805d505484951642d1c7965 (commit)
> 
> Those revisions listed above that are new to this repository have
> not appeared on any other notification email; so we list those
> revisions in full, below.
> 
> - Log -
> https://github.com/open-mpi/ompi/commit/3263f721b6a21966a1c1eea0fdac2a558a15db06
> 
> commit 3263f721b6a21966a1c1eea0fdac2a558a15db06
> Author: Ralph Castain 
> Date:   Thu Oct 2 08:37:18 2014 -0700
> 
>Strip crlf line endings
> 
> diff --git a/ompi/contrib/vt/vt/extlib/otf/otf_vc08.sln 
> b/ompi/contrib/vt/vt/extlib/otf/otf_vc08.sln
> index 38498d6..7e887fc 100644
> --- a/ompi/contrib/vt/vt/extlib/otf/otf_vc08.sln
> +++ b/ompi/contrib/vt/vt/extlib/otf/otf_vc08.sln

Isn't this file a Windows-specific file that probably should still have CRLF 
line endings?

-Dave



[OMPI devel] cisco MTT test results

2014-09-04 Thread Dave Goodell (dgoodell)
I just accidentally killed four nodes worth of MTT tests in our Cisco lab 
cluster, so some of our MTT results may look quite bad in the near future.  The 
affected nodes were mpi005..mpi008, in case it makes the results easier to 
filter.

Apologies to anyone who was planning on using this particular set of MTT 
results as part of their development process.

-Dave



Re: [OMPI devel] [OMPI svn] svn:open-mpi r32556 - trunk/orte/mca/oob/tcp

2014-08-20 Thread Dave Goodell (dgoodell)
On Aug 20, 2014, at 11:55 AM, svn-commit-mai...@open-mpi.org wrote:

> Author: rhc (Ralph Castain)
> Date: 2014-08-20 12:55:36 EDT (Wed, 20 Aug 2014)
> New Revision: 32556
> URL: https://svn.open-mpi.org/trac/ompi/changeset/32556
> 
> Log:
> Track down the last piece of the connection problem. It appears that
> providing a netmask of 0 to opal_net_samenetwork results in everything
> looking like it is on the same network. Hence, we were not retaining any
> of the alternative addresses, so we had no other way to check them.
> 
> Refs #4870
> 
> Text files modified:
>   trunk/orte/mca/oob/tcp/oob_tcp.c            | 8 +++-
>   trunk/orte/mca/oob/tcp/oob_tcp_connection.c | 1 +
>   2 files changed, 8 insertions(+), 1 deletions(-)
> 
> Modified: trunk/orte/mca/oob/tcp/oob_tcp.c
> ==
> --- trunk/orte/mca/oob/tcp/oob_tcp.c  Tue Aug 19 22:48:47 2014(r32555)
> +++ trunk/orte/mca/oob/tcp/oob_tcp.c  2014-08-20 12:55:36 EDT (Wed, 20 Aug 
> 2014)  (r32556)
> @@ -282,6 +282,8 @@
> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME));
> 
> if (AF_INET != pop->af_family) {
> +opal_output_verbose(20, orte_oob_base_framework.framework_output,
> + "%s NOT AF_INET", 
> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME));
> goto cleanup;
> }
> 
> @@ -306,8 +308,12 @@
> 
> /* do we already have this address? */
> OPAL_LIST_FOREACH(maddr, >addrs, mca_oob_tcp_addr_t) {
> -if (opal_net_samenetwork(, (struct sockaddr*)>addr, 
> 0)) {
> +/* require only that the subnet be the same */
> +if (opal_net_samenetwork(, (struct sockaddr*)>addr, 
> 24)) {

So... what if I have my hosts on a 10.123.0.0/16 network or some other network 
with a non-24-bit netmask?
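
To make the concern concrete, here is a minimal sketch of a prefix-length comparison. It is not the opal_net_samenetwork() implementation; same_network_v4 and ipv4 are hypothetical helpers that assume IPv4 addresses held in host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a same-network check; not the OPAL function.
 * Addresses are IPv4 in host byte order, prefix_len is 0..32. */
bool same_network_v4(uint32_t a, uint32_t b, unsigned prefix_len)
{
    assert(prefix_len <= 32);

    /* A prefix of 0 yields an all-zero mask, so every pair "matches". */
    uint32_t mask = (prefix_len == 0)
                        ? 0
                        : (uint32_t)(UINT32_MAX << (32 - prefix_len));
    return (a & mask) == (b & mask);
}

/* Pack dotted-quad octets into a host-order 32-bit address. */
uint32_t ipv4(uint8_t o1, uint8_t o2, uint8_t o3, uint8_t o4)
{
    return ((uint32_t)o1 << 24) | ((uint32_t)o2 << 16) |
           ((uint32_t)o3 << 8) | (uint32_t)o4;
}
```

With these helpers, 10.123.1.5 and 10.123.2.9 compare equal under a 16-bit prefix but not under a hard-coded 24-bit one, which is the case the question above is asking about.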

-Dave



[OMPI devel] RFC: add opal/threads/spinlock.h

2014-08-14 Thread Dave Goodell (dgoodell)
WHAT: Add a new "opal/threads/spinlock.h" header to OPAL that will typically 
use the OS spinlock primitives if present.

WHY: opal_mutex_t is too slow for some use cases and opal_atomic_lock_t is 
inefficiently implemented for most architectures.

WHEN: timeout is after next week's engineering call on Tuesday, 2014-08-19


As discussed at the June developer meeting, I propose this patch to add 
spinlocks to OPAL.  There are at least a half dozen reasonable ways to 
implement spinlocks; which one is best will vary from platform to platform.  In 
general, the OS spinlock implementations are well tested and efficient.  We 
should usually be relying on those implementations instead of rolling our own.
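
As a rough sketch of what "relying on the OS implementation" could look like, the wrapper below forwards to the POSIX pthread_spin_* calls. The demo_spinlock_* names are hypothetical and are not the API in the attached patch; the sketch also assumes a platform that provides POSIX spinlocks at all, which not every OS does.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose pthread_spin_* under strict -std modes */
#endif
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical wrapper type; the real patch's names and layout may differ. */
typedef struct {
    pthread_spinlock_t impl;  /* the OS-provided spinlock does the real work */
} demo_spinlock_t;

/* Each call returns 0 on success or an errno-style code on failure. */
int demo_spinlock_init(demo_spinlock_t *l)
{
    assert(l != NULL);
    return pthread_spin_init(&l->impl, PTHREAD_PROCESS_PRIVATE);
}

int demo_spinlock_lock(demo_spinlock_t *l)
{
    return pthread_spin_lock(&l->impl);
}

/* Non-blocking attempt: nonzero (EBUSY) if the lock is already held. */
int demo_spinlock_trylock(demo_spinlock_t *l)
{
    return pthread_spin_trylock(&l->impl);
}

int demo_spinlock_unlock(demo_spinlock_t *l)
{
    return pthread_spin_unlock(&l->impl);
}

int demo_spinlock_destroy(demo_spinlock_t *l)
{
    return pthread_spin_destroy(&l->impl);
}
```

A thin layer like this keeps the choice of underlying primitive in one header, so a platform without POSIX spinlocks could substitute another implementation behind the same names.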


My primary criticism of this patch is that it muddies the waters a bit with 
opal_atomic_lock_t.  An alternative approach would be to spend some time 
working on improving the opal_atomic_lock_t implementation, but I have two 
concerns with this approach:

1) It's very difficult for me to measure the potential performance impact of 
opal_atomic_lock_t modifications on all of the various platforms that we 
currently run on.  Adding this new implementation allows component maintainers 
to decide if and when to convert to using the new facility.

2) There's a reasonable chance that I'll make a mistake.  Writing tests for 
this stuff helps to catch the really basic errors, but it doesn't help as much 
with the really subtle mistakes.

-Dave



0001-add-opal-threads-spinlock.h.patch
Description: 0001-add-opal-threads-spinlock.h.patch


Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-08-11 Thread Dave Goodell (dgoodell)
On Aug 11, 2014, at 11:54 AM, Paul Hargrove  wrote:

> I am on the same page with George here - if it's on the list then support it 
> until it's been removed.
> 
> I happen to have systems to test, I believe, every supported atomics 
> implementation except for DEC Alpha, and so I did test them all.

My comment was not intended to indicate that I don't value your testing 
contributions, Paul.  I am more concerned that Nathan is wasting time fixing 
support for an effectively useless platform.  It's not like this is a case 
where making the more portable change improves our general correctness on other 
platforms; it's a very (<= ARMv5)-specific situation.

If there's actually an official list of supported platforms somewhere, then 
I'll let Nathan decide whether he wants to submit an RFC to drop ARMv5 support. 
 I know I'd support it, but I don't care enough to write an RFC of my own right 
now.

-Dave




Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-08-11 Thread Dave Goodell (dgoodell)
On Aug 7, 2014, at 11:37 PM, George Bosilca  wrote:

> Paul's tests identified a small issue with the previous patch (a real 
> corner-case for ARM v5). The patch below fixes all known issues.

Wait, why do we care about ARMv5?  It's certainly not a serious HPC platform, 
nor is it even a relevant laptop platform at this point (AFAIK).

-Dave



[OMPI devel] mca_PROJECT_FRAMEWORK_COMPONENT_symbol vs. mca_FRAMEWORK_COMPONENT_symbol

2014-07-30 Thread Dave Goodell (dgoodell)
Jeff and I were talking about some namespacing issues that have come up in the 
recent BTL move from OMPI to OPAL.  AFAIK, the current system for namespacing 
external symbols is to name them "mca_FRAMEWORK_COMPONENT_symbol" (e.g., 
"mca_btl_tcp_add_procs" in the tcp BTL).  Similarly, the DSO for the component 
is named "mca_FRAMEWORK_COMPONENT.so" (e.g., "mca_btl_tcp.so").

Jeff asserted that the eventual goal is to move to a system where all MCA 
frameworks/components are also prefixed by the project name.  So the above 
examples become "mca_ompi_btl_tcp_add_procs" and "mca_ompi_btl_tcp.so".  Does 
anyone actually care about pursuing this goal?

I ask because if nobody wants to pursue the goal of adding project names to 
namespaces then I already have an easy solution to most of our namespacing 
problems.  OTOH, if someone does wish to pursue that goal, then I have a 
namespace-related RFC that I would like to propose (in a subsequent email).
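
For reference, the two naming schemes under discussion can be written down as a small formatting helper. This is only a sketch; demo_mca_name is a hypothetical function, not part of the MCA code base:

```c
#include <stdio.h>

/* Hypothetical helper: format an MCA symbol or DSO name.  Pass project=NULL
 * for the current "mca_FRAMEWORK_COMPONENT" scheme, or a project name for
 * the "mca_PROJECT_FRAMEWORK_COMPONENT" scheme.  suffix is appended
 * verbatim, e.g. "_add_procs" or ".so".  Returns snprintf's result. */
int demo_mca_name(char *buf, size_t len, const char *project,
                  const char *framework, const char *component,
                  const char *suffix)
{
    if (project != NULL) {
        return snprintf(buf, len, "mca_%s_%s_%s%s",
                        project, framework, component, suffix);
    }
    return snprintf(buf, len, "mca_%s_%s%s", framework, component, suffix);
}
```

Called with (NULL, "btl", "tcp", "_add_procs") it yields the current-style symbol name; called with ("ompi", "btl", "tcp", ".so") it yields the project-prefixed DSO name from the example above.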

-Dave



Re: [OMPI devel] Build failed in Jenkins: ompi_upstream_v1.8_build_debug #265

2014-07-22 Thread Dave Goodell (dgoodell)
Thanks, that did the trick.  I didn't think that could be an issue here, but I 
forgot this build comes from SVN and not git, so it didn't have the "git clean" 
step run from my normal build script. I'll see if I can tweak the Jenkins 
settings so this doesn't happen again.

-Dave

On Jul 22, 2014, at 12:20 PM, Ralph Castain <r...@open-mpi.org> wrote:

> You need to rm -rf ompi/contrib/vt and then svn up again - it's a stale .deps 
> directory entry
> 
> On Jul 22, 2014, at 10:15 AM, Dave Goodell (dgoodell) <dgood...@cisco.com> 
> wrote:
> 
>> FYI, this causes build failures in OTF code in our Jenkins installation.  
>> It's probably caused by this commit:
>> 
>> https://svn.open-mpi.org/trac/ompi/changeset/32265
>> 
>> I don't have time to track it down myself, unfortunately.
>> 
>> -Dave
>> 
>> Begin forwarded message:
>> 
>>> From: <dgood...@cisco.com>
>>> Subject: Build failed in Jenkins: ompi_upstream_v1.8_build_debug #265
>>> Date: July 22, 2014 12:08:34 PM CDT
>>> To: <usnic-jenk...@cisco.com>
>>> 
>>> See 
>>> <https://savbu-usnic.cisco.com/jenkins/job/ompi_upstream_v1.8_build_debug/265/changes>
>>> 
>>> Changes:
>>> 
>>> [rhc] Fixes #4799: Move r32261 to v1.8 branch (Improve verbose message for 
>>> which devices are being used)
>>> 
>>> ---svn-pre-commit-ignore-below---
>>> 
>>> r32261 [[BR]]
>>> Improve verbose message which says which device:ports are being used.  Also 
>>> move where message is generated.
>>> 
>>> [rhc] Fixes #4798: Move r32260 to v1.8 branch (common/verbs: fix usnic 
>>> detection)
>>> 
>>> ---svn-pre-commit-ignore-below---
>>> 
>>> r32260 [[BR]]
>>> common/verbs: fix usnic detection
>>> 
>>> The logic was mishandling the case of a newer kernel and an older
>>> libusnic_verbs.  Simplify usnic_transport() to return constants in the
>>> 2 known cases (not a usNIC device and the TRANSPORT_USNIC_UDP case),
>>> and call the magic probe in all other cases.
>>> 
>>> Reviewed-by: Dave Goodell <dgood...@cisco.com>
>>> 
>>> cmr=v1.8.2:reviewer=ompi-rm1.8
>>> 
>>> [rhc] Fixes #4797: Move r32259 to v1.8 branch (usnic: explicitly handle 
>>> case when)
>>> 
>>> ---svn-pre-commit-ignore-below---
>>> 
>>> r32259 [[BR]]
>>> usnic: explicitly handle case when both endpoints are NULL
>>> 
>>> If we don't explicitly declare that (a == NULL && b == NULL) is
>>> equivalent to qsort, we could end up with wonky sorting order.  I.e.,
>>> it's *possible* that some NULLs could end up in the middle of the
>>> array.
>>> 
>>> Regardless of whether it will ever happen in practice, it makes the
>>> code more clear to also handle the "both are NULL" case.
>>> 
>>> Also fix the 2-spacing indents.
>>> 
>>> Reviewed by Dave Goodell.
>>> 
>>> cmr=v1.8.2:reviewer=ompi-rm1.8
>>> 
>>> [rhc] Fixes #4795: Move r32257 to v1.8 branch (OSHMEM: Set 
>>> SMA_SYMMETRIC_SIZE to default)
>>> 
>>> ---svn-pre-commit-ignore-below---
>>> 
>>> r32257 [[BR]]
>>> OSHMEM: Set SMA_SYMMETRIC_SIZE to default value
>>> 
>>> OpenSHMEMspec 1.1 introduces a set of environment variables that allows 
>>> users to configure the Open-SHMEM implementation, and receive information 
>>> about the implementation.
>>> - Add SMA_SYMMETRIC_SIZE - number of bytes to allocate for symmetric heap
>>> - SHMEM_SYMMETRIC_HEAP_SIZE (Mellanox extension) is used by a user to 
>>> provide a size of symmetric area. This change sets this env variable in 
>>> case a user does not set this variable
>>> directly.
>>> 
>>> fixed by Igor, reviewed by Miked
>>> 
>>> cmr=v1.8.2:reviwer=ompi-rm1.8
>>> 
>>> [rhc] Fixes #4794: Move r32256 to v1.8 branch (MXM: use builk connection 
>>> establishment)
>>> 
>>> ---svn-pre-commit-ignore-below---
>>> 
>>> r32256 [[BR]]
>>> MXM: use builk connection establishment API
>>> 
>>> fixed by Vasily, reviewed by Yossi/Miked
>>> 
>>> cmr=v1.8.2:reviwer=ompi-rm1.8
>>> 
>>> [rhc] Fixes #4793: Move r32253 to v1.8 branch (configure.ac: use the 
>>> portable '=')
>>> 
>>> ---svn-pre-commit-ignore-below---
>>> 
>>> r32253 [[BR]]
>>

[OMPI devel] Fwd: Build failed in Jenkins: ompi_upstream_v1.8_build_debug #265

2014-07-22 Thread Dave Goodell (dgoodell)
FYI, this causes build failures in OTF code in our Jenkins installation.  It's 
probably caused by this commit:

https://svn.open-mpi.org/trac/ompi/changeset/32265

I don't have time to track it down myself, unfortunately.

-Dave

Begin forwarded message:

> From: 
> Subject: Build failed in Jenkins: ompi_upstream_v1.8_build_debug #265
> Date: July 22, 2014 12:08:34 PM CDT
> To: 
> 
> See 
> 
> 
> Changes:
> 
> [rhc] Fixes #4799: Move r32261 to v1.8 branch (Improve verbose message for 
> which devices are being used)
> 
> ---svn-pre-commit-ignore-below---
> 
> r32261 [[BR]]
> Improve verbose message which says which device:ports are being used.  Also 
> move where message is generated.
> 
> [rhc] Fixes #4798: Move r32260 to v1.8 branch (common/verbs: fix usnic 
> detection)
> 
> ---svn-pre-commit-ignore-below---
> 
> r32260 [[BR]]
> common/verbs: fix usnic detection
> 
> The logic was mishandling the case of a newer kernel and an older
> libusnic_verbs.  Simplify usnic_transport() to return constants in the
> 2 known cases (not a usNIC device and the TRANSPORT_USNIC_UDP case),
> and call the magic probe in all other cases.
> 
> Reviewed-by: Dave Goodell 
> 
> cmr=v1.8.2:reviewer=ompi-rm1.8
> 
> [rhc] Fixes #4797: Move r32259 to v1.8 branch (usnic: explicitly handle case 
> when)
> 
> ---svn-pre-commit-ignore-below---
> 
> r32259 [[BR]]
> usnic: explicitly handle case when both endpoints are NULL
> 
> If we don't explicitly declare that (a == NULL && b == NULL) is
> equivalent to qsort, we could end up with wonky sorting order.  I.e.,
> it's *possible* that some NULLs could end up in the middle of the
> array.
> 
> Regardless of whether it will ever happen in practice, it makes the
> code more clear to also handle the "both are NULL" case.
> 
> Also fix the 2-spacing indents.
> 
> Reviewed by Dave Goodell.
> 
> cmr=v1.8.2:reviewer=ompi-rm1.8
> 
> [rhc] Fixes #4795: Move r32257 to v1.8 branch (OSHMEM: Set SMA_SYMMETRIC_SIZE 
> to default)
> 
> ---svn-pre-commit-ignore-below---
> 
> r32257 [[BR]]
> OSHMEM: Set SMA_SYMMETRIC_SIZE to default value
> 
> OpenSHMEMspec 1.1 introduces a set of environment variables that allows users 
> to configure the Open-SHMEM implementation, and receive information about the 
> implementation.
> - Add SMA_SYMMETRIC_SIZE - number of bytes to allocate for symmetric heap
> - SHMEM_SYMMETRIC_HEAP_SIZE (Mellanox extension) is used by a user to provide 
> a size of symmetric area. This change sets this env variable in case a user 
> does not set this variable
>  directly.
> 
> fixed by Igor, reviewed by Miked
> 
> cmr=v1.8.2:reviwer=ompi-rm1.8
> 
> [rhc] Fixes #4794: Move r32256 to v1.8 branch (MXM: use builk connection 
> establishment)
> 
> ---svn-pre-commit-ignore-below---
> 
> r32256 [[BR]]
> MXM: use builk connection establishment API
> 
> fixed by Vasily, reviewed by Yossi/Miked
> 
> cmr=v1.8.2:reviwer=ompi-rm1.8
> 
> [rhc] Fixes #4793: Move r32253 to v1.8 branch (configure.ac: use the portable 
> '=')
> 
> ---svn-pre-commit-ignore-below---
> 
> r32253 [[BR]]
> configure.ac: use the portable '=' operator for the 'test' command
> 
> Thanks to Kevin M. Buckley for providing the patch
> 
> cmr=v1.8.2:reviewer=rhc
> 
> [rhc] Fixes #4791: Move r32245 to v1.8 branch (oshmem: remove automatically 
> generated files)
> 
> ---svn-pre-commit-ignore-below---
> 
> r32245 [[BR]]
> oshmem: remove automatically generated files from the tarball
> 
> cmr=v1.8.2:reviewer=miked
> 
> [rhc] Fixes #4790: Move r32244 to v1.8 branch (mpi: remove automatically 
> generated file)
> 
> ---svn-pre-commit-ignore-below---
> 
> r32244 [[BR]]
> mpi: remove automatically generated file from the tarball
> 
> cmr=v1.8.2:reviewer=jsquyres
> 
> [rhc] Fixes #4789: Move r32243 to v1.8 branch (vt: remove automatically 
> generated files)
> 
> ---svn-pre-commit-ignore-below---
> 
> r32243 [[BR]]
> vt: remove automatically generated files from the tarball
> 
> cmr=v1.8.2:reviewer=jurenz
> 
> [rhc] Fixes #4788: Move r32231 to v1.8 branch (Silence warning)
> 
> ---svn-pre-commit-ignore-below---
> 
> r32231 [[BR]]
> Silence warning
> 
> cmr=v1.8.2:reviewer=hjelmn
> 
> [rhc] Fixes #4786: Move r29732 to v1.8 branch (BUILD: support new automake)
> 
> ---svn-pre-commit-ignore-below---
> 
> r29732 [[BR]]
> Changes to VT/OTF:
> Fixed warnings about the need of the 'subdir-objects' option when using 
> Automake v1.14.
> Due to a bug in Automake (see 
> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13928) the 'subdir-objects' 
> option cannot be enabled.
> To get around this problem external sources files are sym linked in the 
> current build directory (as done in ompi/mpi/c/profile) to lead Automake to 
> believe that all source files are in the same directory.
> 
> --
> [...truncated 10830 lines...]
>  CC   prget_f.lo
>  CC   

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun

2014-07-16 Thread Dave Goodell (dgoodell)
On Jul 16, 2014, at 3:08 PM, Joshua Ladd  wrote:

> Ralph warned me that no matter what decision we made, someone would probably 
> violently object. So, with that in mind, let me put my diplomat hat on...

FWIW, I don't think my objections here have been "violent".

> Dave, I'm sorry you view this as a "crapification" of your mpirun user 
> interface. Your lament is duly noted and we are happy to work with you to 
> come to (yet another) happy compromise. It's unfortunate that the issue 
> wasn't raised during the dev meeting when these decisions were made (you were 
> sitting right next to me while I discussed this out loud with Jeff and 
> Ralph.) At the time, it seemed all interested parties had expressed their 
> concern and voiced their opinion on the matter, and the implementation you 
> see in the trunk today, including the decision to deprecate the "-x" option, 
> was the generally agreed upon consensus. 

I think I misunderstood that the "-x" option was going away altogether, so 
sorry for the after-the-fact complaints.  Probably poor listening on my part, 
but I don't think that changes my opinion on the UI impact.

I'm aware that I'm in the minority here.  I don't think that I care enough to 
continue arguing over this, so let's just let it be as-is.  An enhancement to 
permit some form of delimiter escaping would probably still be nice, but is low 
priority.

-Dave



Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun

2014-07-16 Thread Dave Goodell (dgoodell)
On Jul 16, 2014, at 12:15 PM, Mike Dubman  wrote:

> We have a strong use case for a list of env variables passed as MCA params (it 
> was presented and discussed in the past).

I'm not disputing your use case for "mca_base_env_list".  I'm only lamenting 
the crapification of our mpirun user interface.  We had an earlier change along 
these lines based on similar reasoning: the easy-to-use-and-remember "--pernode 
N" was replaced with "--map-by ppr:N:node", which I cannot ever seem to 
remember.

> we can rename opal_base_envlist as "-mca x var=val" for consistency.
> also, "-x" param now is just an alias for "-mca opal_base_envlist var=val" - 
> so, we can keep it (w/o deprecation warning) as it re-uses same infra.

Mike, will you make this change?

Thanks,
-Dave

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun

2014-07-16 Thread Dave Goodell (dgoodell)
On Jul 16, 2014, at 12:27 PM, Joshua Ladd  wrote:

> Dave,
> 
> Your example will error out. If someone tries to set envars with both 
> mechanisms, the job fails. That decision was also made at the dev 
> meeting, so that we don't have to do this kind of checking. 

Hmm, indeed it does.  I had mis-installed my OMPI build when I first tested 
this.  I also didn't expect that setting "-mca mca_base_env_list foo=quux" on 
the command line would cause the environment to be updated for the code I 
previously quoted.

Apologies for the false bug alarm.

-Dave



Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun

2014-07-16 Thread Dave Goodell (dgoodell)
On Jul 16, 2014, at 11:31 AM, Ralph Castain  wrote:

> Nobody was "against" retaining it. The issue is that "-x" isn't an MCA 
> parameter, nor does it get translated to one under the covers. So the problem 
> was one of how to insert it into the typical MCA param precedence chain.

I understand the combination of the two features is clunky and could lead to 
odd corner cases.  Still, the "-x" argument is a feature I actually use on a 
fairly regular basis, and I am unlikely to use mca_base_env_list unless given 
no other choice.  It's just a worse, clunkier interface unless one really needs 
to set that MCA parameter via environment variable.

So can we just strike the deprecation warning that is currently issued when 
"-x" is passed in the absence of "mca_base_env_list"?

-Dave



Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun

2014-07-16 Thread Dave Goodell (dgoodell)
On Jul 15, 2014, at 2:03 PM, Mike Dubman  wrote:

> these are two separate issues:
> 
> 1. -x var=val (or -mca opal_base_envlist var=val) will work in the same way
> opal_base_envlist does the same as "-x" and can be used in the very same 
> fashion as -x
> 
> 2. When list of vars is passed with help of opal_base_envlist, the escaping 
> is possible but escaped char should differ from delimiter char.

That would be my preference (use something like "\" as the escape char).  
Though we could always go with a scheme where a doubled delimiter means 
"literal delimiter", sort of like "$$" in a Makefile.

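A minimal sketch of the backslash-escape variant, assuming an in-place tokenizer. The name split_escaped is hypothetical; this is not opal_argv_split and not the committed implementation:

```c
#include <stddef.h>

/* Hypothetical helper (not opal_argv_split): split `s` in place on `delim`,
 * treating a backslash as an escape for the character that follows it.
 * Stores up to `max` token pointers in `out` and returns the token count.
 * Once `max` tokens exist, later delimiters stay in the last token. */
size_t split_escaped(char *s, char delim, char **out, size_t max)
{
    size_t n = 0;
    char *r = s;   /* read cursor */
    char *w = s;   /* write cursor; never runs ahead of r */

    if (max == 0) {
        return 0;
    }
    out[n++] = w;
    while (*r != '\0') {
        if (*r == '\\' && r[1] != '\0') {
            *w++ = r[1];          /* keep the escaped character literally */
            r += 2;
        } else if (*r == delim && n < max) {
            *w++ = '\0';          /* end the current token */
            out[n++] = w;         /* the next token starts here */
            r++;
        } else {
            *w++ = *r++;
        }
    }
    *w = '\0';
    return n;
}
```

With '+' as the delimiter, the input foo=a\+b+bar+baz=1 splits into three tokens, the first being foo=a+b.  The doubled-delimiter scheme would need only a different first branch (treat two consecutive delimiters as one literal).
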
> I think -x should stay as a short-form alias for -mca opal_base_envlist 
> var=val and -mca opal_base_envlist var.
> At the dev meeting it was decided to deprecate it at some point.

Can we revisit this decision?  Mike and I both seem to be in favor of retaining 
"-x", at least for non-conflicting uses.  Would someone who is against 
retaining "-x" please speak up in favor of that position?

Also, Mike, I just looked again at the code and I don't think it is robustly 
checking for conflict cases.  It's possible to do this and you won't get an 
error with the current code, right?

8<
$ mpirun -mca mca_base_env_list foo=bar -x foo=baz ...
8<

See this code, which only looks at the environment when looking for 
"mca_base_env_list":

> Modified: trunk/orte/tools/orterun/orterun.c
> ==
> --- trunk/orte/tools/orterun/orterun.cTue Jul  8 20:10:04 2014
> (r32162)
> +++ trunk/orte/tools/orterun/orterun.c2014-07-08 20:38:25 EDT (Tue, 
> 08 Jul 2014)  (r32163)
> @@ -1722,6 +1722,13 @@
>  
>  /* Did the user request to export any environment variables on the cmd 
> line? */
>  if (opal_cmd_line_is_taken(&cmd_line, "x")) {
> +char* env_set_flag = getenv("OMPI_MCA_mca_base_env_list");
> +if (NULL != env_set_flag) {
> +orte_show_help("help-orterun.txt", "orterun:conflict-env-set", 
> false);
> +return ORTE_ERR_FATAL;
> +} else {
> +orte_show_help("help-orterun.txt", "orterun:deprecated-env-set", 
> false);
> +}
>  j = opal_cmd_line_get_ninsts(&cmd_line, "x");
>  for (i = 0; i < j; ++i) {
>  param = opal_cmd_line_get_param(&cmd_line, "x", i, 0);


-Dave



Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun

2014-07-15 Thread Dave Goodell (dgoodell)
This commit (and the subsequent amendments to the feature) doesn't appear to 
support escaping the separator.  A later commit allows you to change the 
separator character, which helps, but AFAICS you still can't actually escape 
the separator itself.  That seems like a real deficiency to me...

Furthermore, I really like the "-x" argument and I'm sad to see that it's being 
deprecated in favor of a much clunkier syntax.  Is there a good reason we can't 
keep the "-x" syntax and only complain when there is a conflict with the 
mca_base_env_list variable?

-Dave

On Jul 8, 2014, at 7:38 PM, svn-commit-mai...@open-mpi.org wrote:

> Author: jladd (Joshua Ladd)
> Date: 2014-07-08 20:38:25 EDT (Tue, 08 Jul 2014)
> New Revision: 32163
> URL: https://svn.open-mpi.org/trac/ompi/changeset/32163
> 
> Log:
> Opal: add a new MCA parameter that allows the user to specify a list of 
> environment variables. This parameter will become the standard mechanism by 
> which environment variables are set for OMPI applications replacing the -x 
> option.   
> 
> mpirun ... -x env_foo1=val1 -x env_foo2 -x env_foo3=val3  should now be 
> expressed as
> 
> mpirun ... -mca mca_base_env_list env_foo1=val1+env_foo2+env_foo3=val3. 
> 
> The motivation for doing this is so that a list of environment variables may 
> be set via standard MCA mechanisms such as mca parameter files, amca lists, 
> etc. 
> 
> This feature was developed by Elena Shipunova and was reviewed by Josh Ladd.
> 
> Text files modified: 
>   trunk/opal/mca/base/help-mca-var.txt  |11   
>   
>   trunk/opal/mca/base/mca_base_var.c|52 
> 
>   trunk/orte/tools/orterun/help-orterun.txt |13 + 
>   
>   trunk/orte/tools/orterun/orterun.c| 7 + 
>   
>   4 files changed, 82 insertions(+), 1 deletions(-)
> 
> Modified: trunk/opal/mca/base/help-mca-var.txt
> ==
> --- trunk/opal/mca/base/help-mca-var.txt  Tue Jul  8 20:10:04 2014
> (r32162)
> +++ trunk/opal/mca/base/help-mca-var.txt  2014-07-08 20:38:25 EDT (Tue, 
> 08 Jul 2014)  (r32163)
> @@ -121,3 +121,14 @@
> 
>   Value:  %s
>   Source: %s
> +#
> +[incorrect-env-list-param]
> +WARNING: The format of MCA parameter "mca_base_env_list" is a plus-sign (+) 
> delimited
> +list of VAR=VAL and/or VAR instances, e.g.: -mca mca_base_env_list 
> VAR1=VAL1+VAR2+VAR3=VAL3;... 
> +If a variable, VAR, is listed but not explicitly assigned a value in the 
> command line, VAR will 
> +be assigned the value set in the executing environment.
> +
> +The following environment variable was listed unassigned in 
> "mca_base_env_list", but was
> +not found in your environment:
> +  Variable: %s
> +  MCA variable value:   %s
> 
> Modified: trunk/opal/mca/base/mca_base_var.c
> ==
> --- trunk/opal/mca/base/mca_base_var.cTue Jul  8 20:10:04 2014
> (r32162)
> +++ trunk/opal/mca/base/mca_base_var.c2014-07-08 20:38:25 EDT (Tue, 
> 08 Jul 2014)  (r32163)
> @@ -62,6 +62,7 @@
> static char *mca_base_var_override_file = NULL;
> static char *mca_base_var_file_prefix = NULL;
> static char *mca_base_param_file_path = NULL;
> +static char *mca_base_env_list = NULL;
> static bool mca_base_var_suppress_override_warning = false;
> static opal_list_t mca_base_var_file_values;
> static opal_list_t mca_base_var_override_values;
> @@ -123,6 +124,7 @@
> static int var_set_initial (mca_base_var_t *var);
> static int var_get (int vari, mca_base_var_t **var_out, bool original);
> static int var_value_string (mca_base_var_t *var, char **value_string);
> +static int mca_base_var_process_env_list(void);
> 
> /*
>  * classes
> @@ -255,11 +257,61 @@
> mca_base_var_initialized = true; 
> 
> mca_base_var_cache_files(false);
> +
> +/* set nesessary env variables for external usage */
> +mca_base_var_process_env_list();
> }
> 
> return OPAL_SUCCESS;
> }
> 
> +static int mca_base_var_process_env_list(void)
> +{
> +int i, ret;
> +char** tokens;
> +char* ptr;
> +char* param, *value;
> +ret = mca_base_var_register ("opal", "mca", "base", "env_list",
> + "Set SHELL env variables",
> + MCA_BASE_VAR_TYPE_STRING, NULL, 0, 0, 
> OPAL_INFO_LVL_3,
> + MCA_BASE_VAR_SCOPE_READONLY, 
> &mca_base_env_list);
> +if ((0 > ret) || (NULL == mca_base_env_list)) {
> +return OPAL_SUCCESS;
> +}
> +tokens = opal_argv_split(mca_base_env_list, '+');
> +if (NULL != tokens) {
> +for (i = 0; NULL != tokens[i]; i++) {
> +if (NULL == (ptr = strchr(tokens[i], '='))) {
> +value = getenv(tokens[i]);
> + 

Re: [MTT devel] [MTT svn] GIT: MTT branch master updated. 016088f2a0831b32ab5fd6f60f4cabe67e92e594

2014-06-23 Thread Dave Goodell (dgoodell)
On Jun 23, 2014, at 8:48 AM, Mike Dubman  wrote:

> btw, i think now, when parent process is killed before child, OS makes child 
> as "" which stick around for good.

The grandparent should inherit the child.  If the grandparent then does not 
wait(2) on the child, then the child will remain a zombie / defunct.  So in our 
specific case, this behavior will depend on what the parent process of mpirun 
is and whether it is waiting on child processes appropriately.
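
A minimal sketch of the reaping side, assuming a plain fork()/waitpid() pair; spawn_and_reap is a hypothetical demo function, not MTT or mpirun code:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: fork a child that exits immediately with `code`, then
 * reap it.  Returns the child's exit status, or -1 on error. */
int spawn_and_reap(int code)
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        _exit(code);               /* child: terminate right away */
    }

    /* Until its parent calls waitpid(), the exited child stays in the
     * process table as a zombie (shown as <defunct> by ps). */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Whichever process ends up as the child's parent has to make the equivalent of that waitpid() call for the zombie entry to go away.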

-Dave



Re: [OMPI devel] Patch to fix valgrind warning

2014-04-29 Thread Dave Goodell (dgoodell)
I've filed a ticket for this so that we don't lose track of it: 
https://svn.open-mpi.org/trac/ompi/ticket/4578

-Dave

On Apr 24, 2014, at 2:37 AM, Lisandro Dalcin  wrote:

> Please review the attached patch,
> 
> ==19533== Conditional jump or move depends on uninitialised value(s)
> ==19533==at 0x140DAB78: component_select (osc_sm_component.c:352)
> ==19533==by 0xD9BA0B2: ompi_osc_base_select (osc_base_init.c:73)
> ==19533==by 0xD9314C1: ompi_win_allocate (win.c:182)
> ==19533==by 0xD982C4E: PMPI_Win_allocate (pwin_allocate.c:79)
> ==19533==by 0xD628887: __pyx_pw_6mpi4py_3MPI_3Win_11Allocate
> (mpi4py.MPI.c:109170)
> ==19533==by 0x38442E0BD3: PyEval_EvalFrameEx (in
> /usr/lib64/libpython2.7.so.1.0)
> ==19533==by 0x38442E21EC: PyEval_EvalCodeEx (in
> /usr/lib64/libpython2.7.so.1.0)
> ==19533==by 0x38442E22F1: PyEval_EvalCode (in
> /usr/lib64/libpython2.7.so.1.0)
> ==19533==by 0x38442F20DB: PyImport_ExecCodeModuleEx (in
> /usr/lib64/libpython2.7.so.1.0)
> ==19533==by 0x38442F2357: ??? (in /usr/lib64/libpython2.7.so.1.0)
> ==19533==by 0x38442F2FF0: ??? (in /usr/lib64/libpython2.7.so.1.0)
> ==19533==by 0x38442F323C: ??? (in /usr/lib64/libpython2.7.so.1.0)
> ==19533==
> ==19533== Conditional jump or move depends on uninitialised value(s)
> ==19533==at 0x140DAB78: component_select (osc_sm_component.c:352)
> ==19533==by 0xD9BA0B2: ompi_osc_base_select (osc_base_init.c:73)
> ==19533==by 0xD93174D: ompi_win_allocate_shared (win.c:213)
> ==19533==by 0xD982FD0: PMPI_Win_allocate_shared 
> (pwin_allocate_shared.c:80)
> ==19533==by 0xD62C727:
> __pyx_pw_6mpi4py_3MPI_3Win_13Allocate_shared (mpi4py.MPI.c:109409)
> ==19533==by 0x38442E0BD3: PyEval_EvalFrameEx (in
> /usr/lib64/libpython2.7.so.1.0)
> ==19533==by 0x38442E21EC: PyEval_EvalCodeEx (in
> /usr/lib64/libpython2.7.so.1.0)
> ==19533==by 0x38442E22F1: PyEval_EvalCode (in
> /usr/lib64/libpython2.7.so.1.0)
> ==19533==by 0x38442F20DB: PyImport_ExecCodeModuleEx (in
> /usr/lib64/libpython2.7.so.1.0)
> ==19533==by 0x38442F2357: ??? (in /usr/lib64/libpython2.7.so.1.0)
> ==19533==by 0x38442F2FF0: ??? (in /usr/lib64/libpython2.7.so.1.0)
> ==19533==by 0x38442F323C: ??? (in /usr/lib64/libpython2.7.so.1.0)
> 
> 
> -- 
> Lisandro Dalcin
> ---
> CIMEC (UNL/CONICET)
> Predio CONICET-Santa Fe
> Colectora RN 168 Km 472, Paraje El Pozo
> 3000 Santa Fe, Argentina
> Tel: +54-342-4511594 (ext 1016)
> Tel/Fax: +54-342-4511169
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/04/14591.php



Re: [OMPI devel] MPI_Comm_create_group()

2014-04-29 Thread Dave Goodell (dgoodell)
Lisandro,

Thanks for the bug report.  It seems that nobody has time to work on this at 
the moment, so I've filed a ticket so that we don't lose track of it:

https://svn.open-mpi.org/trac/ompi/ticket/4577

-Dave

On Apr 21, 2014, at 9:55 AM, Lisandro Dalcin  wrote:

> A very basic test for MPI_Comm_create_group() is failing for me. I'm
> pasting the code, the failure, and output from valgrind.
> 
> [dalcinl@kw2060 openmpi]$ cat comm_create_group.c
> #include <mpi.h>
> int main(int argc, char *argv[])
> {
>  MPI_Group group;
>  MPI_Comm comm;
>  MPI_Init(&argc, &argv);
>  MPI_Comm_group(MPI_COMM_WORLD, &group);
>  MPI_Comm_create_group(MPI_COMM_WORLD, group, 0, &comm);
>  MPI_Comm_free(&comm);
>  MPI_Group_free(&group);
>  MPI_Finalize();
>  return 0;
> }
> [dalcinl@kw2060 openmpi]$ mpicc comm_create_group.c
> [dalcinl@kw2060 openmpi]$ ./a.out
> [kw2060:22673] *** An error occurred in MPI_Comm_create_group
> [kw2060:22673] *** reported by process [140737483440129,140733193388032]
> [kw2060:22673] *** on communicator MPI_COMM_WORLD
> [kw2060:22673] *** MPI_ERR_UNKNOWN: unknown error
> [kw2060:22673] *** MPI_ERRORS_ARE_FATAL (processes in this
> communicator will now abort,
> [kw2060:22673] ***and potentially your MPI job)
> 
> 
> [dalcinl@kw2060 openmpi]$ valgrind -q ./a.out
> ==22675== Conditional jump or move depends on uninitialised value(s)
> ==22675==at 0x4C457D6: ompi_comm_nextcid (comm_cid.c:262)
> ==22675==by 0x4C42FA8: ompi_comm_create_group (comm.c:1109)
> ==22675==by 0x4C81E35: PMPI_Comm_create_group (pcomm_create_group.c:77)
> ==22675==by 0x4008FF: main (in /home/dalcinl/Devel/BUGS-MPI/openmpi/a.out)
> ==22675==
> ==22675== Conditional jump or move depends on uninitialised value(s)
> ==22675==at 0x4C42FB0: ompi_comm_create_group (comm.c:1116)
> ==22675==by 0x4C81E35: PMPI_Comm_create_group (pcomm_create_group.c:77)
> ==22675==by 0x4008FF: main (in /home/dalcinl/Devel/BUGS-MPI/openmpi/a.out)
> ==22675==
> ==22675== Conditional jump or move depends on uninitialised value(s)
> ==22675==at 0x4C81E46: PMPI_Comm_create_group (pcomm_create_group.c:79)
> ==22675==by 0x4008FF: main (in /home/dalcinl/Devel/BUGS-MPI/openmpi/a.out)
> ==22675==
> ==22675== Conditional jump or move depends on uninitialised value(s)
> ==22675==at 0x4C81BA0: ompi_errcode_get_mpi_code (errcode-internal.h:64)
> ==22675==by 0x4C81E51: PMPI_Comm_create_group (pcomm_create_group.c:79)
> ==22675==by 0x4008FF: main (in /home/dalcinl/Devel/BUGS-MPI/openmpi/a.out)
> ==22675==
> ==22675== Conditional jump or move depends on uninitialised value(s)
> ==22675==at 0x4C4AA14: opal_pointer_array_get_item
> (opal_pointer_array.h:130)
> ==22675==by 0x4C4AA60: ompi_mpi_errnum_get_string (errcode.h:122)
> ==22675==by 0x4C4B0B4: backend_fatal_aggregate 
> (errhandler_predefined.c:192)
> ==22675==by 0x4C4B657: backend_fatal (errhandler_predefined.c:334)
> ==22675==by 0x4C4AB7C: ompi_mpi_errors_are_fatal_comm_handler
> (errhandler_predefined.c:69)
> ==22675==by 0x4C4A63E: ompi_errhandler_invoke (errhandler_invoke.c:53)
> ==22675==by 0x4C81E81: PMPI_Comm_create_group (pcomm_create_group.c:79)
> ==22675==by 0x4008FF: main (in /home/dalcinl/Devel/BUGS-MPI/openmpi/a.out)
> ==22675==
> ==22675== Use of uninitialised value of size 8
> ==22675==at 0x327BC47B9B: _itoa_word (in /usr/lib64/libc-2.18.so)
> ==22675==by 0x327BC48AD0: vfprintf (in /usr/lib64/libc-2.18.so)
> ==22675==by 0x327BC74D52: vasprintf (in /usr/lib64/libc-2.18.so)
> ==22675==by 0x52E6C4B: opal_show_help_vstring (show_help.c:309)
> ==22675==by 0x4FCFBB4: orte_show_help (show_help.c:591)
> ==22675==by 0x4C4B1B5: backend_fatal_aggregate 
> (errhandler_predefined.c:201)
> ==22675==by 0x4C4B657: backend_fatal (errhandler_predefined.c:334)
> ==22675==by 0x4C4AB7C: ompi_mpi_errors_are_fatal_comm_handler
> (errhandler_predefined.c:69)
> ==22675==by 0x4C4A63E: ompi_errhandler_invoke (errhandler_invoke.c:53)
> ==22675==by 0x4C81E81: PMPI_Comm_create_group (pcomm_create_group.c:79)
> ==22675==by 0x4008FF: main (in /home/dalcinl/Devel/BUGS-MPI/openmpi/a.out)
> ==22675==
> ==22675== Conditional jump or move depends on uninitialised value(s)
> ==22675==at 0x327BC47BA5: _itoa_word (in /usr/lib64/libc-2.18.so)
> ==22675==by 0x327BC48AD0: vfprintf (in /usr/lib64/libc-2.18.so)
> ==22675==by 0x327BC74D52: vasprintf (in /usr/lib64/libc-2.18.so)
> ==22675==by 0x52E6C4B: opal_show_help_vstring (show_help.c:309)
> ==22675==by 0x4FCFBB4: orte_show_help (show_help.c:591)
> ==22675==by 0x4C4B1B5: backend_fatal_aggregate 
> (errhandler_predefined.c:201)
> ==22675==by 0x4C4B657: backend_fatal (errhandler_predefined.c:334)
> ==22675==by 0x4C4AB7C: ompi_mpi_errors_are_fatal_comm_handler
> (errhandler_predefined.c:69)
> ==22675==by 0x4C4A63E: ompi_errhandler_invoke (errhandler_invoke.c:53)
> ==22675==by 0x4C81E81: PMPI_Comm_create_group (pcomm_create_group.c:79)
> ==22675== 

Re: [OMPI devel] 1-question developer poll

2014-04-16 Thread Dave Goodell (dgoodell)
On Apr 16, 2014, at 5:32 AM, Jeff Squyres (jsquyres)  wrote:

> What source code repository technology(ies) do you use for Open MPI 
> development? (indicate all that apply)
> 
> - SVN
> - Mercurial
> - Git

Mostly Git (via the Github mirror and git-svn), and very rarely direct SVN.  
Never Mercurial.

-Dave



Re: [OMPI devel] "--pernode N" bug?

2014-03-31 Thread Dave Goodell (dgoodell)
*sigh*

Thanks Ralph.  I had another MPI implementation's CLI options wedged in my 
brain (where "--perhost" takes an argument).

Sorry for the noise,
-Dave

On Mar 31, 2014, at 12:05 PM, Ralph Castain <r...@open-mpi.org> wrote:

> Well, that's why it is a "deprecated" option! :-)
> 
> Looks like the command line parser then gets confused by the error in your 
> option - if you look more closely at the error, you'll see that it picked up 
> the "2" as the name of your executable. This is because "pernode" doesn't 
> take an argument - it is shorthand for "npernode 1". You probably meant to 
> use the "npernode" option, which would have worked.
> 
> 
> On Mar 31, 2014, at 9:57 AM, Dave Goodell (dgoodell) <dgood...@cisco.com> 
> wrote:
> 
>> Ralph,
>> 
>> When I use the "--pernode" option (instead of "--map-by ppr:1:node") with 
>> v1.8@r31295, I get this:
>> 
>> 8<
>> $ mpiexec --pernode 2 -n 4 --host dg1,dg2 ./ring_c
>> --
>> The following command line options and corresponding MCA parameter have
>> been deprecated and replaced as follows:
>> 
>> Command line options:
>>   Deprecated:  --pernode, -pernode
>>   Replacement: --map-by ppr:1:node
>> 
>> Equivalent MCA parameter:
>>   Deprecated:  rmaps_base_pernode, rmaps_ppr_pernode
>>   Replacement: rmaps_base_mapping_policy=ppr:1:node
>> 
>> The deprecated forms *will* disappear in a future version of Open MPI.
>> Please update to the new syntax.
>> --
>> --
>> mpiexec was unable to find the specified executable file, and therefore
>> did not launch the job.  This error was first reported for process
>> rank 0; it may have occurred for other processes as well.
>> 
>> NOTE: A common cause for this error is misspelling a mpiexec command
>> line parameter option (remember that mpiexec interprets the first
>> unrecognized command line token as the executable).
>> 
>> Node:   savbu-usnic-a
>> Executable: 2
>> --
>> 8<
>> 
>> That's a strange error for two reasons:
>> 
>> * because mpiexec shouldn't be launching on the head node (I passed "--host 
>> dg1,dg2")
>> * because the head node (savbu-usnic-a) actually does have a copy of this 
>> file in the exact same place as dg1/dg2
>> 
>> Everything works as expected if I pass the non-deprecated form of the option 
>> to mpiexec.  I checked quickly at the tip of the v1.7 branch (v1.7@r31182) 
>> and it has the same behavior.  I have not tried any other revisions yet.
>> 
>> -Dave
>> 
> 



[OMPI devel] "--pernode N" bug?

2014-03-31 Thread Dave Goodell (dgoodell)
Ralph,

When I use the "--pernode" option (instead of "--map-by ppr:1:node") with 
v1.8@r31295, I get this:

8<
$ mpiexec --pernode 2 -n 4 --host dg1,dg2 ./ring_c
--
The following command line options and corresponding MCA parameter have
been deprecated and replaced as follows:

  Command line options:
Deprecated:  --pernode, -pernode
Replacement: --map-by ppr:1:node

  Equivalent MCA parameter:
Deprecated:  rmaps_base_pernode, rmaps_ppr_pernode
Replacement: rmaps_base_mapping_policy=ppr:1:node

The deprecated forms *will* disappear in a future version of Open MPI.
Please update to the new syntax.
--
--
mpiexec was unable to find the specified executable file, and therefore
did not launch the job.  This error was first reported for process
rank 0; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a mpiexec command
  line parameter option (remember that mpiexec interprets the first
  unrecognized command line token as the executable).

Node:   savbu-usnic-a
Executable: 2
--
8<

That's a strange error for two reasons:

* because mpiexec shouldn't be launching on the head node (I passed "--host 
dg1,dg2")
* because the head node (savbu-usnic-a) actually does have a copy of this file 
in the exact same place as dg1/dg2

Everything works as expected if I pass the non-deprecated form of the option to 
mpiexec.  I checked quickly at the tip of the v1.7 branch (v1.7@r31182) and it 
has the same behavior.  I have not tried any other revisions yet.

-Dave



[OMPI devel] MPIEXEC_TIMEOUT broken in v1.7 branch @ r31103

2014-03-18 Thread Dave Goodell (dgoodell)
Ralph,

I'm seeing problems with MPIEXEC_TIMEOUT in v1.7 @ r31103 (fairly close to 
HEAD):

8<
MPIEXEC_TIMEOUT=8 mpirun --mca btl usnic,sm,self -np 4 ./sleeper
--
The user-provided time limit for job execution has been
reached:

  MPIEXEC_TIMEOUT: 8 seconds

The job will now be aborted. Please check your code and/or
adjust/remove the job execution time limit (as specified
by MPIEXEC_TIMEOUT in your environment).

--
srun: error: mpi015: task 0: Killed
srun: Terminating job step 689585.2
srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
^C[savbu-usnic-a:26668] [[14634,0],0]->[[14634,0],1] 
mca_oob_tcp_msg_send_bytes: write failed: Connection reset by peer (104) [sd = 
16]
[savbu-usnic-a:26668] [[14634,0],0]-[[14634,0],1] 
mca_oob_tcp_peer_send_handler: unable to send header

^CAbort is in progress...hit ctrl-c again within 5 seconds to forcibly terminate

^C
8<

Each "^C" is a ctrl-c, with an arbitrary amount of time allowed to pass 
beforehand (several minutes for the first two, <5s for the third).

Where "sleeper" is just an MPI program that does:

8<
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);

while (1) {
    sleep(60);
}

MPI_Finalize();
8<

It happens under slurm and SSH.  If I launch on localhost (no --host/--hostfile 
option, no slurm, etc.) then it exits just fine.  The example output I gave 
above used the "usnic" BTL, but "tcp" has identical behavior.

This worked fine in v1.7.4.  I've bisected the change in behavior down to 
r30981: https://svn.open-mpi.org/trac/ompi/changeset/30981

Should I file a ticket?

-Dave



Re: [OMPI devel] [OMPI svn] svn:open-mpi r31005 - trunk/ompi/mca/bcol/basesmuma

2014-03-11 Thread Dave Goodell (dgoodell)
Might want to replace the bzero with memset while you're at it.  I recall 
hitting portability problems on weird systems and Linux systems where 
features.h has been poked the wrong way with "_POSIX_SOURCE" and friends.

-Dave

On Mar 11, 2014, at 4:59 PM, svn-commit-mai...@open-mpi.org wrote:

> Author: rhc (Ralph Castain)
> Date: 2014-03-11 17:59:17 EDT (Tue, 11 Mar 2014)
> New Revision: 31005
> URL: https://svn.open-mpi.org/trac/ompi/changeset/31005
> 
> Log:
> Silence warning
> 
> cmr=v1.8:reviewer=hjelmn
> 
> Text files modified: 
>   trunk/ompi/mca/bcol/basesmuma/bcol_basesmuma_module.c | 2 +-
>   
>   1 files changed, 1 insertions(+), 1 deletions(-)
> 
> Modified: trunk/ompi/mca/bcol/basesmuma/bcol_basesmuma_module.c
> ==
> --- trunk/ompi/mca/bcol/basesmuma/bcol_basesmuma_module.c Tue Mar 11 
> 17:42:42 2014(r31004)
> +++ trunk/ompi/mca/bcol/basesmuma/bcol_basesmuma_module.c 2014-03-11 
> 17:59:17 EDT (Tue, 11 Mar 2014)  (r31005)
> @@ -85,7 +85,7 @@
> mca_bcol_basesmuma_module_construct(mca_bcol_basesmuma_module_t *module)
> {
> /* initialize all values to 0 */
> -bzero ((uintptr_t) module + sizeof (module->super), sizeof (*module) - 
> sizeof (module->super));
> +bzero ((void*)((uintptr_t) module + sizeof (module->super)), sizeof 
> (*module) - sizeof (module->super));
> module->super.bcol_component = (mca_bcol_base_component_t *) 
> _bcol_basesmuma_component;
> module->super.list_n_connected = NULL;
> module->super.hier_scather_offset = 0;
> ___
> svn mailing list
> s...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/svn



[OMPI devel] onesided/test_acc2 failures

2014-03-11 Thread Dave Goodell (dgoodell)
Nathan,

The onesided/test_acc2 test is failing in our Cisco MTT runs on the trunk and 
v1.7.5 branches:

8<
 test_acc2 == Mon Mar 10 15:31:47 2014

Time per int accumulate 0.769040 microsecs
P0, Test No. 0, PASSED: accumulate performance Mon Mar 10 15:31:47 2014

 test_acc2 == Mon Mar 10 15:31:47 2014

P7, Test No. 0, PASSED: multi-offset accumulate Mon Mar 10 15:31:47 2014

P0, Test No. 1 CHECK: accumulate self without permission, nfail=1, Mon Mar 10 
15:31:47 2014

P0, Test No. 2 CHECK: accumulate self, nfail=1, Mon Mar 10 15:31:49 2014

P1, Test No. 0 CHECK: accumulate non-self, nfail=1, Mon Mar 10 15:31:51 2014

---
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
---
--
mpirun detected that one or more processes exited with non-zero status, thus 
causing
the job to be terminated. The first process to do so was:

  Process name: [[30384,1],1]
  Exit code:16
--
8<

I've bisected from v1.7.4 to r30977 and the test begins failing around the 
time of r30894 (the big one-sided CMR from trunk): 
https://svn.open-mpi.org/trac/ompi/changeset/30894

The build on the v1.7 branch was in the (inclusive) range r30893:r30896, so 
there's a slim chance that this is one of r30893, r30895, or r30896 instead, 
but given the test failure that seems unlikely.

Valgrind runs are not clean on this test, but they were insufficient to point 
me to a suspicious part of the code (there are some finalization bugs that 
should probably be fixed, but I don't think they are causing this issue).  
Sample VG messages and my (possibly flawed) interpretations are at the end of 
this mail.

Can you take a look?  This is a regression from v1.7.4 as we're headed into 
v1.7.5.

Thanks,
-Dave


No idea here, but probably unrelated to the one-sided test failures:
==22608== Conditional jump or move depends on uninitialised value(s)
==22608==at 0xA72E6B9: ml_init_k_nomial_trees (coll_ml_module.c:651)
==22608==by 0xA7336C8: mca_coll_ml_tree_hierarchy_discovery 
(coll_ml_module.c:2162)
==22608==by 0xA733D46: mca_coll_ml_fulltree_ptp_only_hierarchy_discovery 
(coll_ml_module.c:2333)
==22608==by 0xA7314C9: ml_discover_hierarchy (coll_ml_module.c:1565)
==22608==by 0xA735DCD: mca_coll_ml_comm_query (coll_ml_module.c:2992)
==22608==by 0x4CD23AE: query_2_0_0 (coll_base_comm_select.c:395)
==22608==by 0x4CD2372: query (coll_base_comm_select.c:378)
==22608==by 0x4CD2285: check_one_component (coll_base_comm_select.c:340)
==22608==by 0x4CD20D7: check_components (coll_base_comm_select.c:304)
==22608==by 0x4CCAE11: mca_coll_base_comm_select 
(coll_base_comm_select.c:131)
==22608==by 0x4C60B49: ompi_mpi_init (ompi_mpi_init.c:888)
==22608==by 0x4C93AE2: PMPI_Init (pinit.c:84)
==22608==  Uninitialised value was created by a heap allocation
==22608==at 0x4A07844: malloc (vg_replace_malloc.c:291)
==22608==by 0x4A079B8: realloc (vg_replace_malloc.c:687)
==22608==by 0xA72F91B: get_new_subgroup_data (coll_ml_module.c:1044)
==22608==by 0xA73292E: mca_coll_ml_tree_hierarchy_discovery 
(coll_ml_module.c:1939)
==22608==by 0xA733D46: mca_coll_ml_fulltree_ptp_only_hierarchy_discovery 
(coll_ml_module.c:2333)
==22608==by 0xA7314C9: ml_discover_hierarchy (coll_ml_module.c:1565)
==22608==by 0xA735DCD: mca_coll_ml_comm_query (coll_ml_module.c:2992)
==22608==by 0x4CD23AE: query_2_0_0 (coll_base_comm_select.c:395)
==22608==by 0x4CD2372: query (coll_base_comm_select.c:378)
==22608==by 0x4CD2285: check_one_component (coll_base_comm_select.c:340)
==22608==by 0x4CD20D7: check_components (coll_base_comm_select.c:304)
==22608==by 0x4CCAE11: mca_coll_base_comm_select 
(coll_base_comm_select.c:131)

Possibly related:
==22608== Syscall param writev(vector[...]) points to uninitialised byte(s)
==22608==at 0x354BAE0C57: writev (in /lib64/libc-2.12.so)
==22608==by 0x95F7CD2: mca_btl_tcp_frag_send (btl_tcp_frag.c:107)
==22608==by 0x95F61CE: mca_btl_tcp_endpoint_send (btl_tcp_endpoint.c:261)
==22608==by 0x95F1C08: mca_btl_tcp_send (btl_tcp.c:387)
==22608==by 0x9CD4098: mca_bml_base_send (bml.h:276)
==22608==by 0x9CD636F: mca_pml_ob1_send_request_start_prepare 
(pml_ob1_sendreq.c:650)
==22608==by 0x9CC9B39: mca_pml_ob1_send_request_start_btl 
(pml_ob1_sendreq.h:388)
==22608==by 0x9CC9DF1: mca_pml_ob1_send_request_start 
(pml_ob1_sendreq.h:461)
==22608==by 0x9CCA4FA: mca_pml_ob1_isend (pml_ob1_isend.c:85)
==22608==by 0x4C699BD: comm_allreduce_pml (allreduce.c:178)
==22608==by 0xA73172C: ml_discover_hierarchy (coll_ml_module.c:1616)
==22608==by 

[OMPI devel] Fwd: [OMPI svn] svn:open-mpi r30894 - in branches/v1.7: . ompi ompi/attribute ompi/debuggers ompi/errhandler ompi/include ompi/mca/btl ompi/mca/btl/openib/connect ompi/mca/op ompi/mca/osc

2014-02-28 Thread Dave Goodell (dgoodell)
Begin forwarded message:

> From: 
> Subject: [OMPI svn] svn:open-mpi r30894 - in branches/v1.7: . ompi 
> ompi/attribute ompi/debuggers ompi/errhandler ompi/include ompi/mca/btl 
> ompi/mca/btl/openib/connect ompi/mca/op ompi/mca/osc ompi/mca/osc/base 
> ompi/mca/osc/portals4 ompi/mca/osc/rdma ompi/mca/osc/sm ompi/mca/pml/cm 
> ompi/mpi/c ompi/mpi/c/profile ompi/mpi/man/man3 ompi/op ompi/win opal 
> opal/util
> Date: February 28, 2014 10:16:08 AM PST
> To: 
> Reply-To: 
> 
> Author: rhc (Ralph Castain)
> Date: 2014-02-28 13:16:08 EST (Fri, 28 Feb 2014)
> New Revision: 30894
> URL: https://svn.open-mpi.org/trac/ompi/changeset/30894
> 
> Log:
> Fixes #4304: Move r30816, 30820, 30821, r30853 to v1.7 branch (Fix a number 
> of issues)

I get one-sided build errors in our Cisco jenkins as of this commit (well, 
actually r30895, but I suspect this commit is the culprit).  Maybe something's 
missing from the CMR?

8<
make[2]: Leaving directory 
`/home/jenkins/slave-roots/master/ompi_upstream_v1.7_build_debug/ompi/mca/osc/sm'
Making all in mca/osc/pt2pt
make[2]: Entering directory 
`/home/jenkins/slave-roots/master/ompi_upstream_v1.7_build_debug/ompi/mca/osc/pt2pt'
  CC   osc_pt2pt.lo
  CC   osc_pt2pt_buffer.lo
  CC   osc_pt2pt_comm.lo
  CC   osc_pt2pt_component.lo
  CC   osc_pt2pt_data_move.lo
  CC   osc_pt2pt_longreq.lo
  CC   osc_pt2pt_replyreq.lo
  CC   osc_pt2pt_sendreq.lo
  CC   osc_pt2pt_sync.lo
osc_pt2pt_replyreq.c: In function 'ompi_osc_pt2pt_replyreq_alloc_init':
osc_pt2pt_replyreq.c:35: error: 'ompi_win_t' has no member named 'w_baseptr'
osc_pt2pt_replyreq.c:36: error: 'ompi_win_t' has no member named 'w_disp_unit'
make[2]: *** [osc_pt2pt_replyreq.lo] Error 1
make[2]: *** Waiting for unfinished jobs
osc_pt2pt_sync.c: In function 'ompi_osc_pt2pt_module_fence':
osc_pt2pt_sync.c:153: error: implicit declaration of function 
'ompi_win_set_mode'
osc_pt2pt_sync.c:153: error: 'OMPI_WIN_FENCE' undeclared (first use in this 
function)
osc_pt2pt_sync.c:153: error: (Each undeclared identifier is reported only once
osc_pt2pt_sync.c:153: error: for each function it appears in.)
osc_pt2pt_comm.c: In function 'ompi_osc_pt2pt_module_accumulate':
osc_pt2pt_comm.c:60: error: 'OMPI_WIN_STARTED' undeclared (first use in this 
function)
osc_pt2pt_comm.c:60: error: (Each undeclared identifier is reported only once
osc_pt2pt_comm.c:60: error: for each function it appears in.)
osc_pt2pt_comm.c:60: error: implicit declaration of function 'ompi_win_get_mode'
osc_pt2pt_comm.c:65: error: 'OMPI_WIN_FENCE' undeclared (first use in this 
function)
osc_pt2pt_comm.c:67: error: implicit declaration of function 'ompi_win_set_mode'
osc_pt2pt_comm.c:67: error: 'OMPI_WIN_ACCESS_EPOCH' undeclared (first use in 
this function)
osc_pt2pt_comm.c:68: error: 'OMPI_WIN_EXPOSE_EPOCH' undeclared (first use in 
this function)
osc_pt2pt_comm.c: In function 'ompi_osc_pt2pt_module_get':
osc_pt2pt_comm.c:115: error: 'OMPI_WIN_STARTED' undeclared (first use in this 
function)
osc_pt2pt_comm.c:120: error: 'OMPI_WIN_FENCE' undeclared (first use in this 
function)
osc_pt2pt_sync.c: In function 'ompi_osc_pt2pt_module_start':
osc_pt2pt_sync.c:213: error: implicit declaration of function 
'ompi_win_remove_mode'
osc_pt2pt_comm.c:122: error: 'OMPI_WIN_ACCESS_EPOCH' undeclared (first use in 
this function)
osc_pt2pt_sync.c:213: error: 'OMPI_WIN_FENCE' undeclared (first use in this 
function)
osc_pt2pt_comm.c:123: error: 'OMPI_WIN_EXPOSE_EPOCH' undeclared (first use in 
this function)
osc_pt2pt_sync.c:214: error: implicit declaration of function 
'ompi_win_append_mode'
osc_pt2pt_sync.c:214: error: 'OMPI_WIN_ACCESS_EPOCH' undeclared (first use in 
this function)
osc_pt2pt_sync.c:214: error: 'OMPI_WIN_STARTED' undeclared (first use in this 
function)
osc_pt2pt_comm.c: In function 'ompi_osc_pt2pt_module_put':
osc_pt2pt_comm.c:165: error: 'OMPI_WIN_STARTED' undeclared (first use in this 
function)
osc_pt2pt_comm.c:170: error: 'OMPI_WIN_FENCE' undeclared (first use in this 
function)
osc_pt2pt_comm.c:172: error: 'OMPI_WIN_ACCESS_EPOCH' undeclared (first use in 
this function)
osc_pt2pt_comm.c:173: error: 'OMPI_WIN_EXPOSE_EPOCH' undeclared (first use in 
this function)
osc_pt2pt_sync.c: In function 'ompi_osc_pt2pt_module_complete':
osc_pt2pt_sync.c:294: error: 'OMPI_WIN_ACCESS_EPOCH' undeclared (first use in 
this function)
osc_pt2pt_sync.c:294: error: 'OMPI_WIN_STARTED' undeclared (first use in this 
function)
osc_pt2pt_sync.c: In function 'ompi_osc_pt2pt_module_post':
osc_pt2pt_sync.c:319: error: 'OMPI_WIN_FENCE' undeclared (first use in this 
function)
osc_pt2pt_sync.c:320: error: 'OMPI_WIN_EXPOSE_EPOCH' undeclared (first use in 
this function)
osc_pt2pt_sync.c:320: error: 'OMPI_WIN_POSTED' undeclared (first use in this 
function)
osc_pt2pt_sync.c: In function 'ompi_osc_pt2pt_module_wait':
osc_pt2pt_sync.c:354: error: 'OMPI_WIN_EXPOSE_EPOCH' 

Re: [OMPI devel] RFC: optimize probe in ob1

2014-02-19 Thread Dave Goodell (dgoodell)
On Feb 19, 2014, at 6:36 AM, George Bosilca  wrote:

> There is one minor thing I would suggest to change. In your patch 
> in_unexpected_list is defined as a bool, which translates to an int on most 
> platforms.

This statement isn't true.  sizeof(bool)==1 on my Mac and on our x86_64 Linux 
cluster at Cisco.  I only mention it because this seems to be a common myth for 
some reason.

> You can change it to an uint8_t and move the in_unexpected_list field in the 
> mca_pml_ob1_comm_proc_t to allow the compiler to pack it with the 
> expected_sequence.

However, this is still a reasonable suggestion to ensure that we retain good 
control of our structure sizes/layouts.

-Dave



Re: [OMPI devel] RFC: Changing 32-bit build behavior/sizes for MPI_Count and MPI_Offset

2014-02-11 Thread Dave Goodell (dgoodell)
On Feb 10, 2014, at 6:14 PM, Jeff Squyres (jsquyres)  wrote:

> As a side effect, this means that -- for 32 bit builds -- we will not support 
> large filesystems well (e.g., filesystems with 64 bit offsets).  BlueGene is 
> an example of such a system (not that OMPI supports BlueGene, but...).

To clarify and head off unnecessary quibbling, I'll point out that by 
"BlueGene", Jeff means "Blue Gene/P" (/Q is 64-bit).  This issue applies to any 
machine with 32-bit addresses that might want to access files larger than 2 GiB.

> Specifically: for 32 bit builds, we'll only allow MPI_Offset to be 32 bits.  
> I don't think that this is a major issue, because 32 bit builds are not a 
> huge issue for the OMPI community, but I raise the point in the spirit of 
> full disclosure.  Fixing it to allow 32 bit MPI_Aint but 64 bit MPI_Offset 
> and MPI_Count would likely mean re-tooling the PML/BML/BTL/convertor 
> infrastructure to use something other than size_t, and I have zero desire to 
> do that!  (please, no OMPI vendor reveal that they're going to seriously 
> build giant 32 bit systems...)

-Dave



Re: [OMPI devel] [OMPI svn] svn:open-mpi r30555 - in trunk: . config contrib/platform/lanl/cray_xe6

2014-02-04 Thread Dave Goodell (dgoodell)
On Feb 4, 2014, at 1:44 PM, svn-commit-mai...@open-mpi.org wrote:

> Author: hjelmn (Nathan Hjelm)
> Date: 2014-02-04 14:44:08 EST (Tue, 04 Feb 2014)
> New Revision: 30555
> URL: https://svn.open-mpi.org/trac/ompi/changeset/30555
> 
> Log:
> Fix wrapper ldflags.
> 
> cmr=v1.7.4:reviewer=jsquyres
> 
> Text files modified: 
>   trunk/config/opal_setup_wrappers.m4  | 9 -  
>  
>   trunk/configure.ac   | 2 ++ 
>  
>   trunk/contrib/platform/lanl/cray_xe6/cray-common | 4    
>  
>   3 files changed, 6 insertions(+), 9 deletions(-)
> 
> Modified: trunk/config/opal_setup_wrappers.m4
> ==
> --- trunk/config/opal_setup_wrappers.m4   Tue Feb  4 09:47:04 2014
> (r30554)
> +++ trunk/config/opal_setup_wrappers.m4   2014-02-04 14:44:08 EST (Tue, 
> 04 Feb 2014)  (r30555)
> @@ -150,7 +150,7 @@
> # (because if script A sources script B, and B calls "exit", then both
> # B and A will exit).  Instead, we have to send the output to a file
> # and then source that.
> -$OMPI_TOP_BUILDDIR/opal/libltdl/libtool --config > $rpath_outfile
> +$OMPI_TOP_BUILDDIR/libtool --config > $rpath_outfile
> 
> chmod +x $rpath_outfile
> . ./$rpath_outfile
> @@ -214,9 +214,8 @@
> # runtime), and the RUNPATH args, if we have them.
> AC_DEFUN([RPATHIFY_LDFLAGS],[
> OPAL_VAR_SCOPE_PUSH([rpath_out rpath_dir rpath_tmp])
> -AS_IF([test "$enable_wrapper_rpath" = "no" -o "$WRAPPER_RPATH_SUPPORT" = 
> "disabled"],
> -  [:],
> -  [rpath_out=
> +AS_IF([test "$enable_wrapper_rpath" = "yes" -a ! 
> "$WRAPPER_RPATH_SUPPORT" = "disabled" -a ! "WRAPPER_RPATH_SUPPORT" = 
> "unnecessary"], [

This "test" looks dangerous to me.  Both non-portable [1] and slightly 
challenging to read at first glance.  It would be better as:

8<
test "$enable_wrapper_rpath" = "yes" &&
test "$WRAPPER_RPATH_SUPPORT" != "disabled" &&
test "$WRAPPER_RPATH_SUPPORT" != "unnecessary"
8<

In fact, if you look carefully at the third test, there is a missing "$" before 
"WRAPPER_RPATH_SUPPORT" in the SVN version...

[1] 
http://www.gnu.org/software/autoconf/manual/autoconf.html#index-g_t_0040command_007btest_007d-1793

-Dave



Re: [OMPI devel] 1.7.4rc2r30031 - FreeBSD mpirun warning

2013-12-20 Thread Dave Goodell (dgoodell)
On Dec 20, 2013, at 4:43 PM, Paul Hargrove  wrote:

> The warning is correct that no such interface exists.
> However 127.0.0.1/24 DOES exist:
> 
> $ ifconfig lo0 inet
> lo0: flags=8049 metric 0 mtu 16384
>options=63
>inet 127.0.0.1 netmask 0xff00

Minor quibble, Paul: that looks like 127.0.0.1/8 to me, not /24...

I have no specific help to offer for this issue though, sorry.

-Dave



Re: [OMPI devel] Change request for include/mpif-config.h

2013-11-22 Thread Dave Goodell (dgoodell)
Jeff Squyres is usually our Fortran expert for this sort of issue, but he's on 
vacation until after the Thanksgiving holiday in the US.  So please expect a 
modest delay in (properly) responding to your question.

-Dave

On Nov 21, 2013, at 9:37 AM, "Gunter, David O"  wrote:

> We have a user complaining about warnings he is getting from his Fortran 95 
> code.
> 
> The Intel compilers throw out this warning:
> 
> warning #7346: The CHARACTER* form of a CHARACTER declaration is an 
> obsolescent feature in Fortran 95.
> 
> The warning stems from the following two lines in  dir>/include/mpif-config.h:
> 
>  character*32 OMPI_GREEK_VERSION
>  character*32 OMPI_SVN_VERSION
> 
> Can we simply change those lines to the following:
> 
>  character(len=32) OMPI_GREEK_VERSION
>  character(len=32) OMPI_SVN_VERSION
> 
> What would break if that happened?
> 
> Thanks,
> david
> --
> David Gunter
> HPC-3
> Los Alamos National Laboratory
> 
> 
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] [OMPI svn] svn:open-mpi r29615 - in trunk: . contrib contrib/dist/linux debian debian/source

2013-11-06 Thread Dave Goodell (dgoodell)
Mike,

Why not put these packaging files in "/contrib/dist/..." in SVN and then 
symlink them to "/debian" as a step in your build script?  Top level names are 
(somewhat) precious and should not be added casually.

-Dave

On Nov 6, 2013, at 4:50 AM, svn-commit-mai...@open-mpi.org wrote:

> Author: miked (Mike Dubman)
> Date: 2013-11-06 07:50:28 EST (Wed, 06 Nov 2013)
> New Revision: 29615
> URL: https://svn.open-mpi.org/trac/ompi/changeset/29615
> 
> Log:
> packaging: add support for debian + example
> unfortunately the debian packaging files should reside in the root folder
> and cannot be placed under contrib/dist/... tree.
> developed by Aleksey, reviewed by miked
> cmr=v1.7.4:reviewer=ompi-gk1.7
> 
> Added:
>   trunk/contrib/dist/linux/compile_debian_mlnx_example.in
>   trunk/debian/
>   trunk/debian/changelog.in
>   trunk/debian/compat
>   trunk/debian/control.in
>   trunk/debian/rules.in
>   trunk/debian/source/
>   trunk/debian/source/format
> Text files modified: 
>   trunk/Makefile.am   | 2 +-  
> 
>   trunk/configure.ac  |11 ++- 
> 
>   trunk/contrib/Makefile.am   | 1 +   
> 
>   trunk/contrib/dist/linux/compile_debian_mlnx_example.in |27 
> +++ 
>   trunk/debian/changelog.in   | 5 +   
> 
>   trunk/debian/compat | 1 +   
> 
>   trunk/debian/control.in |18 
> ++  
>   trunk/debian/rules.in   |17 
> +   
>   trunk/debian/source/format  | 1 +   
> 
>   9 files changed, 81 insertions(+), 2 deletions(-)
> 
> Modified: trunk/Makefile.am
> ==
> --- trunk/Makefile.am Wed Nov  6 01:19:03 2013(r29614)
> +++ trunk/Makefile.am 2013-11-06 07:50:28 EST (Wed, 06 Nov 2013)  (r29615)
> @@ -20,7 +20,7 @@
> 
> SUBDIRS = config contrib $(MCA_PROJECT_SUBDIRS) test
> EXTRA_DIST = README INSTALL VERSION Doxyfile LICENSE autogen.pl autogen.sh \
> - README.JAVA.txt
> + README.JAVA.txt debian
> 
> include examples/Makefile.include
> 
> 
> Modified: trunk/configure.ac
> ==
> --- trunk/configure.acWed Nov  6 01:19:03 2013(r29614)
> +++ trunk/configure.ac2013-11-06 07:50:28 EST (Wed, 06 Nov 2013)  
> (r29615)
> @@ -1341,6 +1341,11 @@
> config/Makefile
> 
> contrib/Makefile
> +contrib/dist/linux/compile_debian_mlnx_example
> +
> +debian/changelog
> +debian/rules
> +debian/control
> 
> test/Makefile
> test/event/Makefile
> @@ -1350,7 +1355,11 @@
> test/support/Makefile
> test/threads/Makefile
> test/util/Makefile
> -])
> +],[
> +chmod +x debian/rules
> +chmod +x contrib/dist/linux/compile_debian_mlnx_example
> +cp LICENSE debian/copyright
> +])
> 
> OPAL_CONFIG_FILES
> m4_ifdef([project_orte], [ORTE_CONFIG_FILES])
> 
> Modified: trunk/contrib/Makefile.am
> ==
> --- trunk/contrib/Makefile.am Wed Nov  6 01:19:03 2013(r29614)
> +++ trunk/contrib/Makefile.am 2013-11-06 07:50:28 EST (Wed, 06 Nov 2013)  
> (r29615)
> @@ -33,6 +33,7 @@
> EXTRA_DIST = \
>   dist/make_dist_tarball \
>   dist/linux/openmpi.spec \
> + dist/linux/compile_debian_mlnx_example.in \
>   dist/macosx-pkg/buildpackage.sh \
>   dist/macosx-pkg/ReadMe.rtf \
>   platform/optimized \
> 
> Added: trunk/contrib/dist/linux/compile_debian_mlnx_example.in
> ==
> --- /dev/null 00:00:00 1970   (empty, because file is newly added)
> +++ trunk/contrib/dist/linux/compile_debian_mlnx_example.in   2013-11-06 
> 07:50:28 EST (Wed, 06 Nov 2013)  (r29615)
> @@ -0,0 +1,27 @@
> +INSTALL_DIR=${INSTALL_DIR:-/usr/mpi/gcc}
> +PREFIX=${INSTALL_DIR}/openmpi-@OMPI_MAJOR_VERSION@.@OMPI_MINOR_VERSION@.@OMPI_RELEASE_VERSION@
> +
> +MAINTEINER=${MAINTEINER:-"Mellanox Ltd. "}
> +UPLOADER=${UPLOADER:-"$MAINTEINER"}
> +
> +MXM_PATH=${MXM_PATH:-/opt/mellanox/mxm}
> +FCA_PATH=${FCA_PATH:-/opt/mellanox/fca}
> +KNEM_PATH=${KNEM_PATH:-/opt/knem-1.0.90mlnx2}
> +
> +[ -d $MXM_PATH ] && WITH_MXM="--with-mxm=$MXM_PATH"
> +[ -d $FCA_PATH ] && WITH_FCA="--with-fca=$FCA_PATH"
> +[ -d $KNEM_PATH ] && WITH_KNEM="--with-knem=$KNEM_PATH"
> +
> +CONFIG_ARGS=${CONFIG_ARGS:-"--prefix=$PREFIX
> +--libdir=$OMPI_PREFIX/lib64 \

Re: [OMPI devel] SHMEM v1.7 merge proposal

2013-10-29 Thread Dave Goodell (dgoodell)
Mike,

I've never personally used git2svn, but my understanding is that it would 
require us to essentially "lock" the repository to all other commits while you 
are using it, which isn't very friendly to the rest of the community.  Also, 
using git2svn probably wouldn't twiddle the SVN merge tracking metadata 
correctly.

I think it would be better to simply handle this with "svn merge" and friends.

-Dave

On Oct 29, 2013, at 6:08 AM, Mike Dubman  wrote:

> will it be ok, that once all is ready, we push git2svn branch for this?
> 
> 
> On Tue, Oct 29, 2013 at 12:35 PM, Jeff Squyres (jsquyres) 
>  wrote:
> I think Brian's point is that it should be a SVN branch.
> 
> 
> On Oct 29, 2013, at 3:27 AM, Mike Dubman  wrote:
> 
> > Hi,
> > This is exactly the way we handle it now. We have internal branch on top of 
> > v1.7 with all SHMEM code in it.
> > It runs mtt and other tests.
> >
> > Once we are done with all changes - we will provide patch (and branch direct 
> > access if needed) for GK.
> >
> > Kind Regards
> > Mike.
> >
> >
> > On Tue, Oct 29, 2013 at 1:02 AM, Barrett, Brian W  
> > wrote:
> > All -
> >
> > Ralph and I talked today about the logistics of bringing the OpenSHMEM
> > code to the 1.7 release branch, as it's now a fairly large set of changes
> > from the trunk.  What we propose is to follow the same procedure we used
> > when merging in the RTE framework change, which is essentially a staging
> > branch.  So, Mellanox (as the one filing the CMR) would branch from 1.7,
> > bring the OpenSHMEM changes into that (and hopefully test), and then file
> > a single CMR for the changes from the branch.  If done properly, the GK
> > then only has to merge with --reintegrate and we're set.
> >
> > Let's talk about it on the call tomorrow, but that's the current proposal.
> >
> > Brian
> >
> > --
> >   Brian W. Barrett
> >   Scalable System Software Group
> >   Sandia National Laboratories
> >
> >
> >
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



[OMPI devel] .mailmap added for github mirror

2013-10-23 Thread Dave Goodell (dgoodell)
Based on discussion with Mellanox around the recent GitHub mirror updates, I've 
added a .mailmap file in r29494.  It helps address two goals:

1) To be able to fix misspelled names without rewinding the Git history.

2) To be able to add email addresses incrementally without rewinding Git 
history.  Mellanox would like email addresses so that they can automatically 
send build failure emails from their Jenkins continuous integration server.

The initial version only contains email addresses for Cisco contributors.  I've 
put commented-out templates for all other contributors listed in the AUTHORS 
file.  If you are a contributor and you would like to opt-in to having your 
email address listed in this format, please follow the editing instructions 
listed in that file on the trunk.

At some point I'll CMR the latest state of the file to the v1.7 branch as well.

For more information about git's usage of the .mailmap file, see the man page 
for git-shortlog: 
https://www.kernel.org/pub/software/scm/git/docs/git-shortlog.html

-Dave