There is one other thing you can check - check for stale libraries on your
backend nodes. The options on the daemons changed. They used to always
daemonize unless told otherwise. They now do NOT daemonize unless told to do
so.
If the orted executables back there are "stale", then you will get the
Hmmm...something isn't making sense. Can I see the command line you used to
generate this?
I'll tell you why I'm puzzled. If orte_debug_flag is set, then the
"--daemonize" should NOT be there, and you should see "--debug" on that
command line. What I see is the reverse, which implies to me that
or
On Apr 2, 2008, at 4:12 PM, Gleb Natapov wrote:
I can specify
different openib_if_include values for different procs on the same
host.
I know you *can*, but it is certainly uncommon. The common case is
Uncommon - yes, but do you want to make it unsupported?
No, there's no need for that.
t
Thanks to the sysadmins at IU, I put up a sample Mercurial OMPI
repository here:
http://www.open-mpi.org/hg/hgwebdir.cgi/
I converted the entire SVN ompi repository history (/trunk, /tags,
and /branches only) as of r17921. Note that it shows some commits on
the 0.9 branch as the most
On Wednesday 02 April 2008 05:04:47 pm Ralph Castain wrote:
> Here's a real simple diagnostic you can do: set -mca plm_base_verbose 1 and
> look at the cmd line being executed (send it here). It will look like:
>
> [[xxx,1],0] plm:rsh: executing: jjkljks;jldfsaj;
>
> If the cmd line has --daemonize
Here's a real simple diagnostic you can do: set -mca plm_base_verbose 1 and
look at the cmd line being executed (send it here). It will look like:
[[xxx,1],0] plm:rsh: executing: jjkljks;jldfsaj;
If the cmd line has --daemonize on it, then the ssh will close and xterm
won't work.
Ralph
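
The check described above can be scripted once the verbose output has been captured. What follows is only a minimal sketch: the helper name `launch_flags` and the sample launch line are hypothetical, made up for illustration, and are not real Open MPI output or tooling. It simply looks for `--daemonize` and `--debug` among the tokens that follow the `plm:rsh: executing:` marker.

```python
def launch_flags(line, marker="plm:rsh: executing:"):
    """Return the set of '--' options found after the plm:rsh marker.

    Hypothetical helper for illustration; not part of Open MPI.
    """
    _, found, cmd = line.partition(marker)
    if not found:
        # The marker is absent, so this is not a launch line.
        return set()
    return {tok for tok in cmd.split() if tok.startswith("--")}


# Made-up sample line; a real one comes from running with
# "-mca plm_base_verbose 1" as described above.
sample = "[[xxx,1],0] plm:rsh: executing: ssh node1 orted --daemonize"

flags = launch_flags(sample)
if "--daemonize" in flags:
    print("daemonizing: the ssh session will close and xterm won't work")
if "--debug" in flags:
    print("debug flag present on the launch line")
```

If `--daemonize` shows up while the debug flag is supposedly set, that matches the symptom discussed in this thread.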
On 4/2
Can you diagnose a little further:
1. in the case where it works, can you verify that the ssh to launch
the orteds is still running?
2. in the case where it doesn't work, can you verify that the ssh to
launch the orteds has actually died?
On Apr 2, 2008, at 4:58 PM, Jon Mason wrote:
On
On Wednesday 02 April 2008 01:21:31 pm Jon Mason wrote:
> On Wednesday 02 April 2008 11:54:50 am Ralph H Castain wrote:
> > I remember that someone had found a bug that caused orte_debug_flag to not
> > get properly set (local var covering over a global one) - could be that
> > your tmp-public bran
On Wed, Apr 02, 2008 at 03:45:20PM -0400, Jeff Squyres wrote:
> On Apr 2, 2008, at 1:58 PM, Gleb Natapov wrote:
> >> No, I think it would be fine to only send the output after
> >> btl_openib_if_in|exclude is applied. Perhaps we need an MCA param to
> >> say "always send everything" in the case th
Thanks; applied https://svn.open-mpi.org/trac/ompi/changeset/18076.
On Apr 2, 2008, at 8:21 AM, Bernhard Fischer wrote:
Hi,
* ompi/mca/btl/openib/btl_openib_component.c (init_one_hca):
mca_btl_openib_open_xrc_domain and
mca_btl_openib_close_xrc_domain depend on XRC
Fix
On Apr 2, 2008, at 1:58 PM, Gleb Natapov wrote:
No, I think it would be fine to only send the output after
btl_openib_if_in|exclude is applied. Perhaps we need an MCA param to
say "always send everything" in the case that someone applies a non-
homogeneous if_in|exclude set of values...?
When i
Thanks! We have a general rule to not apply autogen-worthy changes
during the US workday, so I'll commit this tonight.
On Apr 2, 2008, at 8:20 AM, Bernhard Fischer wrote:
Hi,
* config/ompi_configure_options.m4: Fix typo in helptext
Please apply.
TIA,
Bernhard
On Wednesday 02 April 2008 11:54:50 am Ralph H Castain wrote:
> I remember that someone had found a bug that caused orte_debug_flag to not
> get properly set (local var covering over a global one) - could be that
> your tmp-public branch doesn't have that patch in it.
>
> You might try updating to
On Wed, Apr 02, 2008 at 12:08:47PM -0400, Jeff Squyres wrote:
> On Apr 2, 2008, at 11:13 AM, Gleb Natapov wrote:
> > On Wed, Apr 02, 2008 at 10:35:03AM -0400, Jeff Squyres wrote:
> >> If we use carto to limit which hcas/ports are used on a given host on a
> >> per-proc basis, then we can include
I remember that someone had found a bug that caused orte_debug_flag to not
get properly set (local var covering over a global one) - could be that your
tmp-public branch doesn't have that patch in it.
You might try updating to the latest trunk
On 4/2/08 10:41 AM, "George Bosilca" wrote:
> I'm
I'm using this feature on the trunk with the version from yesterday.
It works without problems ...
george.
On Apr 2, 2008, at 12:14 PM, Jon Mason wrote:
On Wednesday 02 April 2008 11:07:18 am Jeff Squyres wrote:
Are these r numbers relevant on the /tmp-public branch, or the trunk?
I pull
On Wednesday 02 April 2008 11:07:18 am Jeff Squyres wrote:
> Are these r numbers relevant on the /tmp-public branch, or the trunk?
I pulled it out of the command used to update the branch, which was:
svn merge -r 17590:17917 https://svn.open-mpi.org/svn/ompi/trunk .
In the cpc tmp branch, it happ
On Apr 2, 2008, at 11:13 AM, Gleb Natapov wrote:
On Wed, Apr 02, 2008 at 10:35:03AM -0400, Jeff Squyres wrote:
If we use carto to limit which hcas/ports are used on a given host on a
per-proc basis, then we can include some proc_send data to say "this proc
only uses indexes X,Y,Z from the node data
Are these r numbers relevant on the /tmp-public branch, or the trunk?
On Apr 2, 2008, at 11:59 AM, Jon Mason wrote:
I regressed my tree and it looks like it happened between 17590:17917
On Wednesday 02 April 2008 10:22:52 am Jon Mason wrote:
I am noticing that ssh seems to be broken on trunk (
On Apr 2, 2008, at 11:10 AM, Tim Prins wrote:
Is there a reason to rename ompi_modex_{send,recv} to
ompi_modex_proc_{send,recv}? It seems simpler (and no more confusing and
less work) to leave the names alone and add ompi_modex_node_{send,recv}.
If the arguments don't change, I don't have a
I regressed my tree and it looks like it happened between 17590:17917
On Wednesday 02 April 2008 10:22:52 am Jon Mason wrote:
> I am noticing that ssh seems to be broken on trunk (and my cpc branch, as
> it is based on trunk). When I try to use xterm and gdb to debug, I only
> successfully get 1
I am noticing that ssh seems to be broken on trunk (and my cpc branch, as it
is based on trunk). When I try to use xterm and gdb to debug, I only
successfully get 1 xterm. I have tried this on 2 different setups. I can
successfully get the xterm's on the 1.2 svn branch.
I am running the fo
On 4/2/08 8:52 AM, "Terry Dontje" wrote:
> Jeff Squyres wrote:
>> WHAT: Changes to MPI layer modex API
>>
>> WHY: To be mo' betta scalable
>>
>> WHERE: ompi/mpi/runtime/ompi_module_exchange.* and everywhere that
>> calls ompi_modex_send() and/or ompi_modex_recv()
>>
>> TIMEOUT: COB Fri 4 Ap
On Wed, Apr 02, 2008 at 10:35:03AM -0400, Jeff Squyres wrote:
> If we use carto to limit which hcas/ports are used on a given host on a
> per-proc basis, then we can include some proc_send data to say "this proc
> only uses indexes X,Y,Z from the node data". The indexes can be
> either uint8_ts, o
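
The idea quoted above (node data sent once, with each process sending only the indexes it uses) can be pictured with a toy model. This is only a sketch under stated assumptions: the port names, the rank-to-index mapping and the function `ports_for` are all hypothetical, and nothing here is an Open MPI API or the actual modex wire format.

```python
# Toy model only: none of these names are Open MPI APIs.

# Hypothetical per-node data, published once for the whole node.
node_ports = ["hca0:1", "hca0:2", "hca1:1", "hca1:2"]

# Hypothetical per-process data: each rank sends only the indexes into
# node_ports that it uses. bytes() stores one unsigned 8-bit value per
# index, mirroring the uint8_t option mentioned in the quoted text.
proc_indexes = {0: bytes([0, 1]), 1: bytes([2, 3])}


def ports_for(rank):
    """Resolve a rank's index list against the shared per-node list."""
    return [node_ports[i] for i in proc_indexes[rank]]


for rank in sorted(proc_indexes):
    print(rank, ports_for(rank))
```

In this model the per-process payload is one byte per port used, however large each node entry is.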
Is there a reason to rename ompi_modex_{send,recv} to
ompi_modex_proc_{send,recv}? It seems simpler (and no more confusing and
less work) to leave the names alone and add ompi_modex_node_{send,recv}.
Another question: Does the receiving process care that the information
received applies to a w
Jeff Squyres wrote:
WHAT: Changes to MPI layer modex API
WHY: To be mo' betta scalable
WHERE: ompi/mpi/runtime/ompi_module_exchange.* and everywhere that
calls ompi_modex_send() and/or ompi_modex_recv()
TIMEOUT: COB Fri 4 Apr 2008
DESCRIPTION:
[...snip...]
* int ompi_modex_node_send
On Apr 2, 2008, at 10:27 AM, Gleb Natapov wrote:
In the case of openib BTL what part of modex are you going to send
using proc_send() and what part using node_send()?
In the /tmp-public/openib-cpc2 branch, almost all of it will go to the
node_send(). The CPC's will likely now get 2 buffer
On Wed, Apr 02, 2008 at 10:21:12AM -0400, Jeff Squyres wrote:
> * int ompi_modex_proc_send(...): send modex data that is specific to
> this process. It is just about exactly the same as the current API
> call (ompi_modex_send).
>
[skip]
>
> * int ompi_modex_node_send(...): send modex dat
WHAT: Changes to MPI layer modex API
WHY: To be mo' betta scalable
WHERE: ompi/mpi/runtime/ompi_module_exchange.* and everywhere that
calls ompi_modex_send() and/or ompi_modex_recv()
TIMEOUT: COB Fri 4 Apr 2008
DESCRIPTION:
Per some of the scalability discussions that have been occurring (
An arbitrary number of aliases is useful in a number of ways.
For example you mention wanting to register an OPAL MCA parameter and
later alias it as an OMPI MCA parameter. What if we also wanted to
alias it as an ORTE level parameter. The best example I can think of
is the TCP include/excl
-----Original Message-----
From: devel-core-boun...@open-mpi.org
[mailto:devel-core-boun...@open-mpi.org] On Behalf Of Jeff Squyres
Sent: Wednesday, April 02, 2008 3:44 PM
To: Open MPI Core Developers
Subject: Re: [devel-core] [RFC] Add an alias name to MCA parameter
BTW, these mails can go acro
Hi,
* ompi/mca/btl/openib/btl_openib_component.c (init_one_hca):
mca_btl_openib_open_xrc_domain and
mca_btl_openib_close_xrc_domain depend on XRC
Fixes the compilation failure as in the head of attached patch.
TIA,
Bernhard
CXX -g -finline-functions -o .libs/ompi_info comp
Hi,
* config/ompi_configure_options.m4: Fix typo in helptext
Please apply.
TIA,
Bernhard
Index: ompi-trunk/config/ompi_configure_options.m4
===
--- ompi-trunk/config/ompi_configure_options.m4 (revision 18069)
+++ ompi-trunk/c
Great. Thanks for the fix.
On Apr 2, 2008, at 6:54 AM, Adrian Knoth wrote:
On Wed, Apr 02, 2008 at 06:36:02AM -0400, Josh Hursey wrote:
It seems that builds configured with '--disable-ipv6' are broken on
the trunk. I suspect r18055 for this break since the tarball from two
On Wed, Apr 02, 2008 at 06:36:02AM -0400, Josh Hursey wrote:
> It seems that builds configured with '--disable-ipv6' are broken on
> the trunk. I suspect r18055 for this break since the tarball from two
> ---
> oob_tcp.c: In function `mca_oob_tcp_fini':
> oob_tcp.c:1364
It seems that builds configured with '--disable-ipv6' are broken on
the trunk. I suspect r18055 for this break since the tarball from two
nights ago worked fine and it is the only significant change in this
code in the past week. The build error is:
---
oob_tcp.c: In
Ralph,
If you plan to compare OMPI to Mvapich, make sure to take version
1.0.0 (or above). In 1.0.0 OSU introduced a new launcher that works
much faster than the previous one.
Regards,
Pasha
Ralph H Castain wrote:
Per this morning's telecon, I have added the latest scaling test results to
the w