[hwloc-devel] Create success (hwloc r1.3.1rc3r4071)

2011-12-15 Thread MPI Team
Creating nightly hwloc snapshot SVN tarball was a success. Snapshot: hwloc 1.3.1rc3r4071 Start time: Thu Dec 15 21:04:17 EST 2011 End time: Thu Dec 15 21:07:12 EST 2011 Your friendly daemon, Cyrador

Re: [OMPI devel] MPIR attach from padb broken (1.5.5rc1)

2011-12-15 Thread Nathan Hjelm
I have an idea. How about we set those MPIR variables as weak? Just tested it with STAT. Can you replace orte/tools/orterun/orterun.c with the attached version and see if it fixes the issue? -Nathan On Thu, 15 Dec 2011, Ashley Pittman wrote: padb just calls gdb, you can see the error
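
For readers unfamiliar with the "weak" idea mentioned in this message, here is a minimal sketch of weak definitions in C. It is an illustration only, not the attached orterun.c: the `demo_` names and the struct are hypothetical stand-ins, and `__attribute__((weak))` is a GCC/Clang extension.

```c
#include <stddef.h>

/* Hypothetical stand-in for a process descriptor; not the Open MPI type. */
struct demo_procdesc {
    const char *host_name;
    const char *executable_name;
    int pid;
};

/*
 * Weak definitions (GCC/Clang extension). If another object file in the
 * same link step provides a strong (ordinary) definition of the same name,
 * the linker uses that one instead of reporting a duplicate-symbol error.
 * If nothing else defines the name, these definitions are used.
 */
__attribute__((weak)) struct demo_procdesc *demo_proctable = NULL;
__attribute__((weak)) int demo_proctable_size = 0;

/* Number of usable entries; 0 when no table has been published yet. */
int demo_proctable_entries(void)
{
    return (demo_proctable != NULL) ? demo_proctable_size : 0;
}
```

The point of the design is that a second definition elsewhere no longer conflicts with this one; whether that alone resolves the padb problem discussed in this thread is not something the sketch can show.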

Re: [OMPI devel] MPIR attach from padb broken (1.5.5rc1)

2011-12-15 Thread Ashley Pittman
padb just calls gdb; you can see the error using gdb alone, with just the trace I sent when I started this thread. Perhaps the difference is in the gdb versions. I could give you a login to my test machine if you need one. Ashley. On 15 Dec 2011, at 22:49, Nathan Hjelm wrote: > What's odd is

Re: [OMPI devel] MPIR attach from padb broken (1.5.5rc1)

2011-12-15 Thread Nathan Hjelm
What's odd is that TotalView, STAT, and GDB see the correct values despite them being in the B section. What does padb do differently? This is a dynamic, optimized build of 1.5.5rc1. -Nathan Hjelm HPC-3, LANL On Thu, 15 Dec 2011, Ashley Pittman wrote: If I add a new symbol to

Re: [OMPI devel] MPIR attach from padb broken (1.5.5rc1)

2011-12-15 Thread Ashley Pittman
If I add a new symbol to orte/mca/debugger/base/debugger_base_open.c and declare it in orte/mca/debugger/base/base.h, the same way MPIR_proctable_size is defined, then it appears in the .so but not in the binary. If I then reference this variable in orte/tools/orterun/orterun.c the symbol

Re: [OMPI devel] MPIR attach from padb broken (1.5.5rc1)

2011-12-15 Thread Jeff Squyres
+1 on Ralph's comment -- it's not on the trunk. Perhaps the CMR didn't properly remove it from v1.5, but that explains why it's not in the v1.5 Makefile.am. On Dec 15, 2011, at 5:08 PM, George Bosilca wrote: > This is quite impressive. After digging a little bit more, it appears that > the

Re: [OMPI devel] MPIR attach from padb broken (1.5.5rc1)

2011-12-15 Thread Nathan Hjelm
orte/tools/orterun/debuggers.c does not exist anymore (it's not in the 1.5.5rc1 tarball). I don't know why the symbols are showing up in section B of orterun. Investigating now. -Nathan Hjelm HPC-3, LANL On Thu, 15 Dec 2011, George Bosilca wrote: On Dec 15, 2011, at 16:55 , Ashley Pittman

Re: [OMPI devel] MPIR attach from padb broken (1.5.5rc1)

2011-12-15 Thread George Bosilca
This is quite impressive. After digging a little bit more, it appears that orte/tools/orterun/debuggers.c is in the repository but it is not used for compilation. Thus, I really don't see where the second definition is coming from. george. On Dec 15, 2011, at 17:02 , George Bosilca

Re: [OMPI devel] MPIR attach from padb broken (1.5.5rc1)

2011-12-15 Thread Ralph Castain
This file does not exist in the trunk, and should not exist in 1.5 any more. Perhaps the patch for 1.5 didn't correctly delete it? On Dec 15, 2011, at 3:02 PM, George Bosilca wrote: > ./orte/tools/orterun/debuggers.c:142:struct MPIR_PROCDESC *MPIR_proctable = > NULL; >

Re: [OMPI devel] MPIR attach from padb broken (1.5.5rc1)

2011-12-15 Thread Nathan Hjelm
That appears to be a similar problem to the MPIR_Breakpoint bug. Let me play around and see if I can find a fix. -Nathan Hjelm HPC-3, LANL On Thu, 15 Dec 2011, Ashley Pittman wrote: There is a problem with 1.5.5rc1 that prevents padb from loading the process table start from the orterun

Re: [OMPI devel] MPIR attach from padb broken (1.5.5rc1)

2011-12-15 Thread George Bosilca
On Dec 15, 2011, at 16:55 , Ashley Pittman wrote: > There is a problem with 1.5.5rc1 that prevents padb from loading the process > table start from the orterun process, what appears to be happening is that > MPIR_proctable and MPIR_proctable_size is present in both orterun itself and > also

[OMPI devel] MPIR attach from padb broken (1.5.5rc1)

2011-12-15 Thread Ashley Pittman
There is a problem with 1.5.5rc1 that prevents padb from loading the process table start from the orterun process. What appears to be happening is that MPIR_proctable and MPIR_proctable_size are present in both orterun itself and libopen-rte.so; the code is correctly setting them in

Re: [OMPI devel] 1.5.5rc1 is out

2011-12-15 Thread Ashley Pittman
On 15 Dec 2011, at 20:16, Ashley Pittman wrote: > > On 14 Dec 2011, at 04:36, Jeff Squyres wrote: > >> In the usual place: >> >> http://www.open-mpi.org/software/ompi/v1.5/ >> >> Please test! I would really like to get this out by the end of the week. > > As with 1.4 I've tested it on

Re: [OMPI devel] Totalview broken with 1.5/trunk

2011-12-15 Thread Jeff Squyres
Right -- the symbol isn't declared in orterun. It's in libopen-rte.so. My changes ensure that the .o file that MPIR_Breakpoint is defined in will be pulled in by the linker to be in the mpirun process. On Dec 15, 2011, at 3:30 PM, Nathan Hjelm wrote: > Your changes don't break anything but
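
The "pulled in by the linker" mechanism described here can be sketched as follows. This is a generic illustration under stated assumptions, not the content of the actual changeset: all `demo_` names are hypothetical, and both halves are condensed into one translation unit so the sketch compiles on its own. In a real build the first half would sit in a library object and the second half in the executable.

```c
/* ---- would live in the library object (hypothetical demo_debugger.c) ---- */

/* A debugger sets a breakpoint on a function like this; the body is empty
 * on purpose, since only the symbol's presence matters. */
void demo_breakpoint(void)
{
}

/* Dummy entry point whose only job is to be referenced from the executable. */
int demo_debugger_dummy(void)
{
    return 0;
}

/* ---- would live in the executable (hypothetical demo_launcher.c) ---- */

/*
 * Calling the dummy creates an undefined reference that the linker must
 * resolve. When the library is a static archive, the linker extracts the
 * whole object that defines demo_debugger_dummy, so demo_breakpoint, which
 * is defined in the same object, comes along with it.
 */
int demo_launcher_init(void)
{
    return demo_debugger_dummy();
}
```

Without some reference of this kind, a static link is free to leave out an object that nothing in the executable uses, which is one way a breakpoint symbol can go missing from the final binary.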

Re: [OMPI devel] Totalview broken with 1.5/trunk

2011-12-15 Thread Nathan Hjelm
Your changes don't break anything but they also don't cause MPIR_Breakpoint to appear in orterun:
ct-login1:/scratch2/hjelmn hjelmn$ nm `type -p orterun` | grep MPIR
0060b0e0 B MPIR_attach_fifo
0060b2e0 B MPIR_being_debugged
0060b7b0 B MPIR_debug_state
0060ada0 B

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25661

2011-12-15 Thread Jeff Squyres
On Dec 15, 2011, at 2:51 PM, George Bosilca wrote: > This patch is not correct. All these variables have been moved into the ORTE > layer (they are declared in orte/mca/debugger/base/base.h), so they should be > in fact removed from the MPI level files. > > While I don't think moving them all

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25661

2011-12-15 Thread George Bosilca
This patch is not correct. All these variables have been moved into the ORTE layer (they are declared in orte/mca/debugger/base/base.h), so they should in fact be removed from the MPI level files. While I don't think moving them all into ORTE was a good choice, changing their definition in

Re: [OMPI devel] OMPI 1.4.5rc1 posted

2011-12-15 Thread Ashley Pittman
On 8 Dec 2011, at 22:13, Jeff Squyres wrote: > 1.4.5rc1 is now posted in the usual place: > >http://www.open-mpi.org/software/ompi/v1.4/ > > Gearing up for a pre-Christmas release -- please test! There have only been > a few bug fixes since 1.4.4. See >

Re: [OMPI devel] Totalview broken with 1.5/trunk

2011-12-15 Thread Jeff Squyres
Ok, here's what I did:
https://svn.open-mpi.org/trac/ompi/changeset/25660 --> pulls in symbols like MPIR_Breakpoint via a different dummy function
https://svn.open-mpi.org/trac/ompi/changeset/25661 --> Fixes the ORTE_DECLSPEC typos that George found
LANL: Can you verify that this (still) works

Re: [OMPI devel] Totalview broken with 1.5/trunk

2011-12-15 Thread Jeff Squyres
On Dec 15, 2011, at 10:28 AM, Ralph Castain wrote: >> I have had the chance now to test it with totalview and stat 1.1.0. Looks >> good. I pushed the fix to the trunk and it will need to be CMRed to 1.5. Ralph and I just talked about this on the phone some more -- I don't think

Re: [OMPI devel] Nodes already filled when spawning

2011-12-15 Thread Ralph Castain
mpirun --oversubscribe or OMPI_MCA_rmaps_base_oversubscribe=1 On Dec 15, 2011, at 11:27 AM, TERRY DONTJE wrote: > There's an oversubscribe option I can set in my case, right? > > Thanks, > > --td > > On 12/15/2011 1:22 PM, Ralph Castain wrote: >> >> This is fixed, to a degree, with
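
As a small illustration of the environment-variable form given in this reply, a launcher program could export the setting before starting mpirun. This is a sketch under assumptions: only the variable name and the value `1` come from the message; the `demo_` function name is hypothetical, and `setenv` is the POSIX call.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* request the setenv() declaration */
#endif
#include <stdlib.h>

/* Variable name as given in the message above. */
#define OVERSUBSCRIBE_ENV "OMPI_MCA_rmaps_base_oversubscribe"

/*
 * Export the setting so that an mpirun started later from this process
 * inherits it. Returns 0 on success and -1 on failure, as setenv() does.
 */
int demo_enable_oversubscribe(void)
{
    return setenv(OVERSUBSCRIBE_ENV, "1", 1 /* overwrite any existing value */);
}
```

From a shell, the same effect would be had by setting the variable in the environment of the mpirun command, or by passing `--oversubscribe` on the command line as the message says.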

Re: [OMPI devel] Nodes already filled when spawning

2011-12-15 Thread TERRY DONTJE
There's an oversubscribe option I can set in my case, right? Thanks, --td On 12/15/2011 1:22 PM, Ralph Castain wrote: This is fixed, to a degree, with r25659. However, note that there is one big change that occurred back when we first committed the mapping change. As I noted at that time,

Re: [OMPI devel] Nodes already filled when spawning

2011-12-15 Thread Ralph Castain
This is fixed, to a degree, with r25659. However, note that there is one big change that occurred back when we first committed the mapping change. As I noted at that time, we changed the default for RM-given allocations to be no-oversubscribe. So your MTTs may well fail if they weren't updated

Re: [hwloc-devel] [hwloc-svn] svn:hwloc r4069

2011-12-15 Thread Brice Goglin
On 15/12/2011 16:31, bgog...@osl.iu.edu wrote: > Author: bgoglin > Date: 2011-12-15 10:31:50 EST (Thu, 15 Dec 2011) > New Revision: 4069 > URL: https://svn.open-mpi.org/trac/hwloc/changeset/4069 > > Log: > Fix a long-standing obsolete PREDEFINED in the website doxygen config There are still

Re: [OMPI devel] Nodes already filled when spawning

2011-12-15 Thread Ralph Castain
I'll take a look, Terry - it has to be the change I made yesterday. On Dec 15, 2011, at 8:37 AM, TERRY DONTJE wrote: > Last night MTT test results for 1.7a1r25652 from IU and Oracle is showing > failures during some of the spawn tests see > http://www.open-mpi.org/mtt/index.php?do_redir=2036.

[OMPI devel] Nodes already filled when spawning

2011-12-15 Thread TERRY DONTJE
Last night's MTT test results for 1.7a1r25652 from IU and Oracle are showing failures during some of the spawn tests; see http://www.open-mpi.org/mtt/index.php?do_redir=2036. Essentially, the tests are failing with the message: All nodes which are allocated for this job are already filled. I

Re: [OMPI devel] Totalview broken with 1.5/trunk

2011-12-15 Thread Ralph Castain
On Dec 15, 2011, at 8:21 AM, Nathan Hjelm wrote: > > > On Wed, 14 Dec 2011, Ralph Castain wrote: > >> Yes - we were having problems making symbols in orterun visible for the >> "stat" debugger when built dynamically. The symbols are actually >> instantiated in the debugger base, but they

Re: [OMPI devel] Totalview broken with 1.5/trunk

2011-12-15 Thread Nathan Hjelm
On Wed, 14 Dec 2011, Ralph Castain wrote: Yes - we were having problems making symbols in orterun visible for the "stat" debugger when built dynamically. The symbols are actually instantiated in the debugger base, but they need to be "seen" in orterun prior to us calling orte_init. So, we

Re: [hwloc-devel] pcilib error messages w/ rhl8 and hwloc-1.3.1rc1

2011-12-15 Thread Paul H. Hargrove
On 12/14/2011 10:36 PM, Brice Goglin wrote: I committed the silence-warning patch but I will keep the other part for now. I am a bit afraid of changing that much code in 1.3.1 without being sure whether it's necessary. Sounds good to me. I certainly have no grounds to argue that RHL8 support

Re: [hwloc-devel] pcilib error messages w/ rhl8 and hwloc-1.3.1rc1

2011-12-15 Thread Brice Goglin
On 14/12/2011 22:42, Paul H. Hargrove wrote: > > > On 12/14/2011 1:21 PM, Brice Goglin wrote: >> The attached patch might work. I am not sure all this is actually >> necessary because things have been working fine so far, apart from your >> warnings. > > Yup, the patch silences the warnings.