Re: [OMPI devel] Ssh tunnelling broken in trunk?

2008-04-02 Thread Ralph Castain
There is one other thing you can check - check for stale libraries on your backend nodes. The options on the daemons changed. They used to always daemonize unless told otherwise. They now do NOT daemonize unless told to do so. If the orted executables back there are "stale", then you will get the

Re: [OMPI devel] Ssh tunnelling broken in trunk?

2008-04-02 Thread Ralph Castain
Hmmm...something isn't making sense. Can I see the command line you used to generate this? I'll tell you why I'm puzzled. If orte_debug_flag is set, then the "--daemonize" should NOT be there, and you should see "--debug" on that command line. What I see is the reverse, which implies to me that or

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Jeff Squyres
On Apr 2, 2008, at 4:12 PM, Gleb Natapov wrote: I can specify different openib_if_include values for different procs on the same host. I know you *can*, but it is certainly uncommon. The common case is Uncommon - yes, but do you want to make it unsupported? No, there's no need for that. t

[OMPI devel] Mercurial demo OMPI repository

2008-04-02 Thread Jeff Squyres
Thanks to the sysadmins at IU, I put up a sample Mercurial OMPI repository here: http://www.open-mpi.org/hg/hgwebdir.cgi/ I converted the entire SVN ompi repository history (/trunk, /tags, and /branches only) as of r17921. Note that it shows some commits on the 0.9 branch as the most

Re: [OMPI devel] Ssh tunnelling broken in trunk?

2008-04-02 Thread Jon Mason
On Wednesday 02 April 2008 05:04:47 pm Ralph Castain wrote: > Here's a real simple diagnostic you can do: set -mca plm_base_verbose 1 and > look at the cmd line being executed (send it here). It will look like: > > [[xxx,1],0] plm:rsh: executing: jjkljks;jldfsaj; > > If the cmd line has --daemonize

Re: [OMPI devel] Ssh tunnelling broken in trunk?

2008-04-02 Thread Ralph Castain
Here's a real simple diagnostic you can do: set -mca plm_base_verbose 1 and look at the cmd line being executed (send it here). It will look like: [[xxx,1],0] plm:rsh: executing: jjkljks;jldfsaj; If the cmd line has --daemonize on it, then the ssh will close and xterm won't work. Ralph On 4/2
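
A minimal sketch of the check described above, assuming the verbose output has been captured to a string. It pulls out each "plm:rsh: executing:" line and reports whether --daemonize appears on it. The helper names and the sample command lines are made up for illustration; real output will look different.

```python
import shlex

MARKER = "plm:rsh: executing:"  # prefix quoted in the message above


def launch_commands(log_text):
    """Yield the command portion of each 'plm:rsh: executing:' log line."""
    for line in log_text.splitlines():
        _, found, cmd = line.partition(MARKER)
        if found:
            yield cmd.strip()


def has_daemonize(cmd):
    """Return True if --daemonize appears as its own token in cmd."""
    try:
        tokens = shlex.split(cmd)
    except ValueError:  # e.g. unbalanced quotes in the logged command
        tokens = cmd.split()
    return "--daemonize" in tokens


# Hypothetical sample output, not taken from a real run.
sample = (
    "[[xxx,1],0] plm:rsh: executing: ssh node1 orted --daemonize\n"
    "[[xxx,1],0] plm:rsh: executing: ssh node2 orted --debug\n"
)

for cmd in launch_commands(sample):
    print(has_daemonize(cmd), cmd)
```

Any launch line flagged True is one where, per the message above, the ssh session would be expected to close and take the xterm with it.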

Re: [OMPI devel] Ssh tunnelling broken in trunk?

2008-04-02 Thread Jeff Squyres
Can you diagnose a little further: 1. in the case where it works, can you verify that the ssh to launch the orteds is still running? 2. in the case where it doesn't work, can you verify that the ssh to launch the orteds has actually died? On Apr 2, 2008, at 4:58 PM, Jon Mason wrote: On

Re: [OMPI devel] Ssh tunnelling broken in trunk?

2008-04-02 Thread Jon Mason
On Wednesday 02 April 2008 01:21:31 pm Jon Mason wrote: > On Wednesday 02 April 2008 11:54:50 am Ralph H Castain wrote: > > I remember that someone had found a bug that caused orte_debug_flag to not > > get properly set (local var covering over a global one) - could be that > > your tmp-public bran

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Gleb Natapov
On Wed, Apr 02, 2008 at 03:45:20PM -0400, Jeff Squyres wrote: > On Apr 2, 2008, at 1:58 PM, Gleb Natapov wrote: > >> No, I think it would be fine to only send the output after > >> btl_openib_if_in|exclude is applied. Perhaps we need an MCA param to > >> say "always send everything" in the case th

Re: [OMPI devel] [PATCH] Fix compilation error without XRC

2008-04-02 Thread Jeff Squyres
Thanks; applied https://svn.open-mpi.org/trac/ompi/changeset/18076. On Apr 2, 2008, at 8:21 AM, Bernhard Fischer wrote: Hi, * ompi/mca/btl/openib/btl_openib_component.c (init_one_hca): mca_btl_openib_open_xrc_domain and mca_btl_openib_close_xrc_domain depend on XRC Fix

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Jeff Squyres
On Apr 2, 2008, at 1:58 PM, Gleb Natapov wrote: No, I think it would be fine to only send the output after btl_openib_if_in|exclude is applied. Perhaps we need an MCA param to say "always send everything" in the case that someone applies a non-homogeneous if_in|exclude set of values...? When i

Re: [OMPI devel] [PATCH] Fix typo in configure helptext

2008-04-02 Thread Jeff Squyres
Thanks! We have a general rule to not apply autogen-worthy changes during the US workday, so I'll commit this tonight. On Apr 2, 2008, at 8:20 AM, Bernhard Fischer wrote: Hi, * config/ompi_configure_options.m4: Fix typo in helptext Please apply. TIA, Bernhard

Re: [OMPI devel] Ssh tunnelling broken in trunk?

2008-04-02 Thread Jon Mason
On Wednesday 02 April 2008 11:54:50 am Ralph H Castain wrote: > I remember that someone had found a bug that caused orte_debug_flag to not > get properly set (local var covering over a global one) - could be that > your tmp-public branch doesn't have that patch in it. > > You might try updating to

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Gleb Natapov
On Wed, Apr 02, 2008 at 12:08:47PM -0400, Jeff Squyres wrote: > On Apr 2, 2008, at 11:13 AM, Gleb Natapov wrote: > > On Wed, Apr 02, 2008 at 10:35:03AM -0400, Jeff Squyres wrote: > >> If we use carto to limit hcas/ports are used on a given host on a > >> per- > >> proc basis, then we can include

Re: [OMPI devel] Ssh tunnelling broken in trunk?

2008-04-02 Thread Ralph H Castain
I remember that someone had found a bug that caused orte_debug_flag to not get properly set (local var covering over a global one) - could be that your tmp-public branch doesn't have that patch in it. You might try updating to the latest trunk On 4/2/08 10:41 AM, "George Bosilca" wrote: > I'm

Re: [OMPI devel] Ssh tunnelling broken in trunk?

2008-04-02 Thread George Bosilca
I'm using this feature on the trunk with the version from yesterday. It works without problems ... george. On Apr 2, 2008, at 12:14 PM, Jon Mason wrote: On Wednesday 02 April 2008 11:07:18 am Jeff Squyres wrote: Are these r numbers relevant on the /tmp-public branch, or the trunk? I pull

Re: [OMPI devel] Ssh tunnelling broken in trunk?

2008-04-02 Thread Jon Mason
On Wednesday 02 April 2008 11:07:18 am Jeff Squyres wrote: > Are these r numbers relevant on the /tmp-public branch, or the trunk? I pulled it out of the command used to update the branch, which was: svn merge -r 17590:17917 https://svn.open-mpi.org/svn/ompi/trunk . In the cpc tmp branch, it happ

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Jeff Squyres
On Apr 2, 2008, at 11:13 AM, Gleb Natapov wrote: On Wed, Apr 02, 2008 at 10:35:03AM -0400, Jeff Squyres wrote: If we use carto to limit hcas/ports are used on a given host on a per-proc basis, then we can include some proc_send data to say "this proc only uses indexes X,Y,Z from the node data

Re: [OMPI devel] Ssh tunnelling broken in trunk?

2008-04-02 Thread Jeff Squyres
Are these r numbers relevant on the /tmp-public branch, or the trunk? On Apr 2, 2008, at 11:59 AM, Jon Mason wrote: I regressed my tree and it looks like it happened between 17590:17917 On Wednesday 02 April 2008 10:22:52 am Jon Mason wrote: I am noticing that ssh seems to be broken on trunk (

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Jeff Squyres
On Apr 2, 2008, at 11:10 AM, Tim Prins wrote: Is there a reason to rename ompi_modex_{send,recv} to ompi_modex_proc_{send,recv}? It seems simpler (and no more confusing and less work) to leave the names alone and add ompi_modex_node_{send,recv}. If the arguments don't change, I don't have a

Re: [OMPI devel] Ssh tunnelling broken in trunk?

2008-04-02 Thread Jon Mason
I regressed my tree and it looks like it happened between 17590:17917 On Wednesday 02 April 2008 10:22:52 am Jon Mason wrote: > I am noticing that ssh seems to be broken on trunk (and my cpc branch, as > it is based on trunk). When I try to use xterm and gdb to debug, I only > successfully get 1

[OMPI devel] Ssh tunnelling broken in trunk?

2008-04-02 Thread Jon Mason
I am noticing that ssh seems to be broken on trunk (and my cpc branch, as it is based on trunk). When I try to use xterm and gdb to debug, I only successfully get 1 xterm. I have tried this on 2 different setups. I can successfully get the xterms on the 1.2 svn branch. I am running the fo

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Ralph H Castain
On 4/2/08 8:52 AM, "Terry Dontje" wrote: > Jeff Squyres wrote: >> WHAT: Changes to MPI layer modex API >> >> WHY: To be mo' betta scalable >> >> WHERE: ompi/mpi/runtime/ompi_module_exchange.* and everywhere that >> calls ompi_modex_send() and/or ompi_modex_recv() >> >> TIMEOUT: COB Fri 4 Ap

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Gleb Natapov
On Wed, Apr 02, 2008 at 10:35:03AM -0400, Jeff Squyres wrote: > If we use carto to limit hcas/ports are used on a given host on a per- > proc basis, then we can include some proc_send data to say "this proc > only uses indexes X,Y,Z from the node data". The indexes can be > either uint8_ts, o

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Tim Prins
Is there a reason to rename ompi_modex_{send,recv} to ompi_modex_proc_{send,recv}? It seems simpler (and no more confusing and less work) to leave the names alone and add ompi_modex_node_{send,recv}. Another question: Does the receiving process care that the information received applies to a w

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Terry Dontje
Jeff Squyres wrote: WHAT: Changes to MPI layer modex API WHY: To be mo' betta scalable WHERE: ompi/mpi/runtime/ompi_module_exchange.* and everywhere that calls ompi_modex_send() and/or ompi_modex_recv() TIMEOUT: COB Fri 4 Apr 2008 DESCRIPTION: [...snip...] * int ompi_modex_node_send

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Jeff Squyres
On Apr 2, 2008, at 10:27 AM, Gleb Natapov wrote: In the case of openib BTL what part of modex are you going to send using proc_send() and what part using node_send()? In the /tmp-public/openib-cpc2 branch, almost all of it will go to the node_send(). The CPC's will likely now get 2 buffer

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Gleb Natapov
On Wed, Apr 02, 2008 at 10:21:12AM -0400, Jeff Squyres wrote: > * int ompi_modex_proc_send(...): send modex data that is specific to > this process. It is just about exactly the same as the current API > call (ompi_modex_send). > [skip] > > * int ompi_modex_node_send(...): send modex dat

[OMPI devel] RFC: changes to modex

2008-04-02 Thread Jeff Squyres
WHAT: Changes to MPI layer modex API WHY: To be mo' betta scalable WHERE: ompi/mpi/runtime/ompi_module_exchange.* and everywhere that calls ompi_modex_send() and/or ompi_modex_recv() TIMEOUT: COB Fri 4 Apr 2008 DESCRIPTION: Per some of the scalability discussions that have been occurring (

Re: [OMPI devel] FW: [devel-core] [RFC] Add an alias name to MCA parameter

2008-04-02 Thread Josh Hursey
An arbitrary number of aliases is useful in a number of ways. For example, you mention wanting to register an OPAL MCA parameter and later alias it as an OMPI MCA parameter. What if we also wanted to alias it as an ORTE-level parameter? The best example I can think of is the TCP include/excl

[OMPI devel] FW: [devel-core] [RFC] Add an alias name to MCA parameter

2008-04-02 Thread Sharon Melamed
-----Original Message----- From: devel-core-boun...@open-mpi.org [mailto:devel-core-boun...@open-mpi.org] On Behalf Of Jeff Squyres Sent: Wednesday, April 02, 2008 3:44 PM To: Open MPI Core Developers Subject: Re: [devel-core] [RFC] Add an alias name to MCA parameter BTW, these mails can go acro

[OMPI devel] [PATCH] Fix compilation error without XRC

2008-04-02 Thread Bernhard Fischer
Hi, * ompi/mca/btl/openib/btl_openib_component.c (init_one_hca): mca_btl_openib_open_xrc_domain and mca_btl_openib_close_xrc_domain depend on XRC Fixes the compilation failure as in the head of attached patch. TIA, Bernhard CXX -g -finline-functions -o .libs/ompi_info comp

[OMPI devel] [PATCH] Fix typo in configure helptext

2008-04-02 Thread Bernhard Fischer
Hi, * config/ompi_configure_options.m4: Fix typo in helptext Please apply. TIA, Bernhard Index: ompi-trunk/config/ompi_configure_options.m4 === --- ompi-trunk/config/ompi_configure_options.m4 (revision 18069) +++ ompi-trunk/c

Re: [OMPI devel] --disable-ipv6 broken on trunk

2008-04-02 Thread Josh Hursey
Great. Thanks for the fix. On Apr 2, 2008, at 6:54 AM, Adrian Knoth wrote: On Wed, Apr 02, 2008 at 06:36:02AM -0400, Josh Hursey wrote: It seems that builds configured with '--disable-ipv6' are broken on the trunk. I suspect r18055 for this break since the tarball from two

Re: [OMPI devel] --disable-ipv6 broken on trunk

2008-04-02 Thread Adrian Knoth
On Wed, Apr 02, 2008 at 06:36:02AM -0400, Josh Hursey wrote: > It seems that builds configured with '--disable-ipv6' are broken on > the trunk. I suspect r18055 for this break since the tarball from two > --- > oob_tcp.c: In function `mca_oob_tcp_fini': > oob_tcp.c:1364

[OMPI devel] --disable-ipv6 broken on trunk

2008-04-02 Thread Josh Hursey
It seems that builds configured with '--disable-ipv6' are broken on the trunk. I suspect r18055 for this break since the tarball from two nights ago worked fine and it is the only significant change in this code in the past week. The build error is: --- oob_tcp.c: In

Re: [OMPI devel] Trunk launch scaling

2008-04-02 Thread Pavel Shamis (Pasha)
Ralph, If you plan to compare OMPI to Mvapich, make sure to take version 1.0.0 (or above). In 1.0.0, OSU introduced a new launcher that works much faster than the previous one. Regards, Pasha Ralph H Castain wrote: Per this morning's telecon, I have added the latest scaling test results to the w