Re: [OMPI devel] Remote key sizes

2011-11-08 Thread Barrett, Brian W
On 11/8/11 5:25 PM, "George Bosilca"  wrote:

>2. one sided: A quick look in the OSC seems to indicate there is some
>special handling to be done in the RDMA one. Look at
>ompi_osc_rdma_sendreq_t in osc_rdma_sendreq.h, it is using a trick to
>store the remote segments. First, the mca_btl_base_segment_t are stored
>at the end of the structure, in order to allow for dynamic allocation.
>Second, OSC doesn't seem to manipulate pointers to
>mca_btl_base_segment_t, but the content itself. I didn't go too deep
>here, but I think particular attention should be paid to OSC.

I don't entirely remember what I was doing when I wrote that code :).  The
OSC only does puts/gets from the initiator to a single segment on the
target, so the component contains an array of segments, one per peer.  I
only do RDMA when the source is contiguous, so the one in the sendreq is
the segment, not a malloc trick.

I'm planning on rewriting the RDMA one-sided component to implement the
MPI 3 semantics. I think we can make it a whole lot cleaner than the
current implementation.  Which means that if we come up with some rational
semantics for dealing with segments, I can make it work.  If we can get
them implemented before January, even better.

Brian

-- 
  Brian W. Barrett
  Dept. 1423: Scalable System Software
  Sandia National Laboratories








Re: [OMPI devel] Remote key sizes

2011-11-08 Thread George Bosilca
On Nov 8, 2011, at 10:36 , Nathan T. Hjelm wrote:

> On Tue, 8 Nov 2011 06:36:03 -0800, Rolf vandeVaart 
> wrote:
>>> george.
>>> 
>>> PS: Regarding the hand-copy instead of the memcpy, we tried to avoid
>> using
>>> memcpy in performance critical codes, especially when we know the size of
>>> the data and the alignment. This relieves the compiler of adding ugly
>> intrinsics,
>>> allowing it to nicely pipeline to load/stores. Anyway, with both
>> approaches
>>> you will copy more data than needed for all BTLs except uGNI.
>> 
>> I was looking at a case in a BTL I was working on where I actually need
> 64
>> bytes (yes, bytes) as the remote key size as opposed to the current 16
>> bytes (128 bits).
>> Not sure how I can handle that yet.  (I assume configure is my friend,
> but
>> even in that case, all headers will need to carry around the extra data.)
>> 
> 
> I have been thinking about this a little bit. What I think should be done
> (and I am sure George will disagree) is to allow BTLs to define how long a

Well, I'm really sorry to disappoint you … 

> segment is. The PML would then just memcpy the segments into the send
> buffer (instead of copying each member).

The only valid reason I can find now for having the seg_key as it is defined 
today is code simplicity. Read on and you will understand.

Otherwise I completely agree with you, the seg_key is something belonging to 
the BTLs, and all knowledge about it should be limited to the BTLs (i.e. the 
PML should just move it around). The solution you propose makes sense…

However, there are a few things that I think make it more challenging to 
implement than it looks.

1. endianness: Apparently the BTL is already responsible for storing the key in 
network order, as no translation is done on the key in the PMLs. As I don't 
think any of them do, I will assume this is already [somehow] taken care of.

2. one sided: A quick look in the OSC seems to indicate there is some special 
handling to be done in the RDMA one. Look at ompi_osc_rdma_sendreq_t in 
osc_rdma_sendreq.h, it is using a trick to store the remote segments. First, 
the mca_btl_base_segment_t are stored at the end of the structure, in order to 
allow for dynamic allocation. Second, OSC doesn't seem to manipulate pointers 
to mca_btl_base_segment_t, but the content itself. I didn't go too deep here, 
but I think particular attention should be paid to OSC.

3. PML. In addition to seg_len we use the seg_addr field extensively all over 
the code base, so it should be exposed in the mca_btl_base_segment_t as well.

4. How do we keep the capability of dealing with multiple 
mca_btl_base_segment_t? Just imagine what the macro 
MCA_PML_OB1_COMPUTE_SEGMENT_LENGTH will look like…

Everything else should be quite trivial ;)

  george.



> For example mca_btl_base_segment_t would become:
> 
> struct mca_btl_base_segment_t {
>     size_t seg_len;
> };
> 
> since the pml needs the segment size (it does not need anything else).
> 
> and then each btl would define its own segment like:
> struct mca_btl_ugni_segment_t {
>     struct mca_btl_base_segment_t base;
>     gni_mem_handle_t seg_key;
> };
> 
> and we would add:
> size_t btl_segment_len;
> 
> to the mca_btl_base_module_t or the base frag so the pml knows how much it
> needs to copy.
> 
> This design would address George's criticism of the length of the seg_key
> and also allow BTLs to do what they need to. It would require a memcpy but
> I disagree this would slow the critical path. Even if it does it would be
> relatively minor (i think) and the flexibility is worth more in the long
> run.
> 
> -Nathan
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] debugger changes

2011-11-08 Thread Paul H. Hargrove

Now this thread is starting to read like an episode of The Big Bang Theory.

One possible guess as to how/why MPICH has managed w/o "volatile" would 
be that they may pass less aggressive optimization flags to the 
compilers. It is then a question of which MPI implementation is 
supporting a choice of compilers, not a selection of debuggers.


-Paul

On 11/8/2011 3:48 PM, George Bosilca wrote:

I will therefore propose to forever ban all compiler guys from this time-space, 
as now we have the undeniable proof that they concoct an evil plan against us. 
Otherwise, I can't explain how MPICH never had to add volatile to these 
particular variables and still support all these debuggers…

   george.


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] debugger changes

2011-11-08 Thread George Bosilca

On Nov 8, 2011, at 18:32 , Ralph Castain wrote:

> That was the experience - after thrashing for quite some time, we finally 
> found that the volatile qualifiers fixed the problem. Hence my request that 
> people check to see if anything is broken.

I will therefore propose to forever ban all compiler guys from this time-space, 
as now we have the undeniable proof that they concoct an evil plan against us. 
Otherwise, I can't explain how MPICH never had to add volatile to these 
particular variables and still support all these debuggers… 

  george.

> 
> 
>> 
>> -Paul
>> 
>> On 11/8/2011 2:46 PM, George Bosilca wrote:
>>> This value is not even read by the debugger. It only checks for its 
>>> existence in the startup process, so I guess we're safe here as well.
>> 
>> -- 
>> Paul H. Hargrove  phhargr...@lbl.gov
>> Future Technologies Group
>> HPC Research Department   Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>> 




Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25445

2011-11-08 Thread George Bosilca
I do not recall, and from the code there is no obvious reason. However, being 
able to store multiple smaller members might be a good enough reason.

Btw, we don't use key8 at all. I guess we can clean that code up to keep only 
key32 and key64, possibly with a count to match up the right size ;)

  george.

On Nov 8, 2011, at 18:11 , Nathan T. Hjelm wrote:

> Ok, that makes sense. Is there a reason why the members were all set to be
> the same size?
> 
> Maybe seg_key should be:
> 
> union {
>  uint8_t key8;
>  uint16_t key16;
>  uint32_t key32;
>  uint64_t key64;
>  struct { uint64_t value[2]; } key128;
> };
> 
> -Nathan
> 
> On Tue, 8 Nov 2011 17:22:48 -0500, George Bosilca 
> wrote:
>> Elements in an array are always stored in the expected [increasing]
> order,
>> regardless of the endianess of the architecture. Moreover, due to the
>> alignment rules, all members in a union will start at the same address.
>> 
>> It turns out there is no endianess conversion on the keys, so I suppose
>> both peers have to somehow reach a consensus outside the PML.
>> 
>>  george.
>> 
>> On Nov 8, 2011, at 08:57 , Nathan T. Hjelm wrote:
>> 
>>> Sure, I can do that. My only concern is with sending between hosts of
>>> different endianness.
>>> 
>>> For example, if seg_key is 128 bits wide and the key32 is 64 bits then
>> we
>>> might run into this:
>>> 
>>> Host 1: (big endian)
>>> Set seg_key.key32[0] = 0x
>>> 
>>> would result in seg_key: 0x 0x 0x 0x
>>> 
>>> Host 2: (little endian)
>>> Set seg_key.key32[0] = 0x1
>>> 
>>> would result in seg_key: 0x 0x 0x 0x
>>> 
>>> If either host were to send the other one its seg_key and try to use the
>>> key32 they would get garbage. I haven't tested this case yet but I can
>> test
>>> on a PPE of RR later today.
>>> 
>>> -Nathan
>>> 
>>> On Tue, 8 Nov 2011 08:26:04 -0500, Jeff Squyres 
>> wrote:
>>>> On Nov 7, 2011, at 9:48 PM, Nathan T. Hjelm wrote:
>>>> 
>>>>> In retrospect I should have done a RFC for the 3rd change with a short
>>>>> timeout. At the time (operating on little sleep) it seemed like the
>>>>> commits would have minimal impact. Please let me know if the commits
>>>>> have any negative impact.
>>>> 
>>>> FWIW, I think I'd like to see a rollback of the increase of array sizes
>>>> in the seg_key union.  They weren't necessary and might be slightly
>>>> misleading.
>>>> 
>>>> --
>>>> Jeff Squyres
>>>> jsquy...@cisco.com
>>>> For corporate legal information go to:
>>>> http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] debugger changes

2011-11-08 Thread Ralph Castain

On Nov 8, 2011, at 3:56 PM, Paul H. Hargrove wrote:

> In theory, might a sufficiently smart compiler and linker eliminate some 
> MPIR_* variables after optimization?  If that could potentially be true, then 
> perhaps the volatile qualifier would prevent such a removal, which would 
> break the existence check(s) by the debugger?  Just a thought.

That was the experience - after thrashing for quite some time, we finally found 
that the volatile qualifiers fixed the problem. Hence my request that people 
check to see if anything is broken.


> 
> -Paul
> 
> On 11/8/2011 2:46 PM, George Bosilca wrote:
>> This value is not even read by the debugger. It only checks for its 
>> existence in the startup process, so I guess we're safe here as well.
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> HPC Research Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> 




Re: [OMPI devel] Remote key sizes

2011-11-08 Thread Kenneth Lloyd
That makes sense to me.

-Original Message-
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
Behalf Of Nathan T. Hjelm
Sent: Tuesday, November 08, 2011 8:36 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] Remote key sizes



On Tue, 8 Nov 2011 06:36:03 -0800, Rolf vandeVaart 
wrote:
>>  george.
>>
>>PS: Regarding the hand-copy instead of the memcpy, we tried to avoid
> using
>>memcpy in performance critical codes, especially when we know the size of
>>the data and the alignment. This relieves the compiler of adding ugly
> intrinsics,
>>allowing it to nicely pipeline to load/stores. Anyway, with both
> approaches
>>you will copy more data than needed for all BTLs except uGNI.
> 
> I was looking at a case in a BTL I was working on where I actually need
64
> bytes (yes, bytes) as the remote key size as opposed to the current 16
> bytes (128 bits).
> Not sure how I can handle that yet.  (I assume configure is my friend,
but
> even in that case, all headers will need to carry around the extra data.)
> 

I have been thinking about this a little bit. What I think should be done
(and I am sure George will disagree) is to allow BTLs to define how long a
segment is. The PML would then just memcpy the segments into the send
buffer (instead of copying each member).

For example mca_btl_base_segment_t would become:

struct mca_btl_base_segment_t {
    size_t seg_len;
};

since the pml needs the segment size (it does not need anything else).

and then each btl would define its own segment like:
struct mca_btl_ugni_segment_t {
    struct mca_btl_base_segment_t base;
    gni_mem_handle_t seg_key;
};

and we would add:
size_t btl_segment_len;

to the mca_btl_base_module_t or the base frag so the pml knows how much it
needs to copy.

This design would address George's criticism of the length of the seg_key
and also allow BTLs to do what they need to. It would require a memcpy, but
I disagree that this would slow the critical path. Even if it does, it would
be relatively minor (I think) and the flexibility is worth more in the long
run.

-Nathan




Re: [OMPI devel] debugger changes

2011-11-08 Thread George Bosilca
On Nov 8, 2011, at 17:56 , Paul H. Hargrove wrote:

> In theory, might a sufficiently smart compiler and linker eliminate some 
> MPIR_* variables after optimization?

Even if a compiler can optimize symbols out of an application, I doubt it is 
allowed to apply the same optimization to libraries. As our MPIR_ symbols 
are defined as externally visible in libopen-rte.so (and some in libmpi.so), 
I guess we're safe.

However, this might be an issue when we compile statically … It is not an 
absolute proof, but I quickly checked with a static build and the MPIR_* 
symbols are still there with both gcc and icc.

> If that could potentially be true, then perhaps the volatile qualifier would 
> prevent such a removal, which would break the existence check(s) by the 
> debugger?  Just a thought.

If we really want to have a clear answer to this, I guess we should ask a 
hard-core compiler guru about it … 

  george.

> 
> -Paul
> 
> On 11/8/2011 2:46 PM, George Bosilca wrote:
>> This value is not even read by the debugger. It only checks for its 
>> existence in the startup process, so I guess we're safe here as well.
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> HPC Research Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> 




Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25445

2011-11-08 Thread Nathan T. Hjelm
Ok, that makes sense. Is there a reason why the members were all set to be
the same size?

Maybe seg_key should be:

union {
  uint8_t key8;
  uint16_t key16;
  uint32_t key32;
  uint64_t key64;
  struct { uint64_t value[2]; } key128;
};

-Nathan

On Tue, 8 Nov 2011 17:22:48 -0500, George Bosilca 
wrote:
> Elements in an array are always stored in the expected [increasing]
order,
> regardless of the endianess of the architecture. Moreover, due to the
> alignment rules, all members in a union will start at the same address.
> 
> It turns out there is no endianess conversion on the keys, so I suppose
> both peers have to somehow reach a consensus outside the PML.
> 
>   george.
> 
> On Nov 8, 2011, at 08:57 , Nathan T. Hjelm wrote:
> 
>> Sure, I can do that. My only concern is with sending between hosts of
>> different endianness.
>>
>> For example, if seg_key is 128 bits wide and the key32 is 64 bits then
> we
>> might run into this:
>>
>> Host 1: (big endian)
>> Set seg_key.key32[0] = 0x
>>
>> would result in seg_key: 0x 0x 0x 0x
>>
>> Host 2: (little endian)
>> Set seg_key.key32[0] = 0x1
>>
>> would result in seg_key: 0x 0x 0x 0x
>>
>> If either host were to send the other one its seg_key and try to use the
>> key32 they would get garbage. I haven't tested this case yet but I can
> test
>> on a PPE of RR later today.
>>
>> -Nathan
>>
>> On Tue, 8 Nov 2011 08:26:04 -0500, Jeff Squyres 
> wrote:
>>> On Nov 7, 2011, at 9:48 PM, Nathan T. Hjelm wrote:
>>>
>>>> In retrospect I should have done a RFC for the 3rd change with a short
>>>> timeout. At the time (operating on little sleep) it seemed like the
>>>> commits would have minimal impact. Please let me know if the commits
>>>> have any negative impact.
>>>
>>> FWIW, I think I'd like to see a rollback of the increase of array sizes
>> in
>>> the seg_key union.  They weren't necessary and might be slightly
>>> misleading.
>>>
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>>



Re: [OMPI devel] debugger changes

2011-11-08 Thread Paul H. Hargrove
In theory, might a sufficiently smart compiler and linker eliminate some 
MPIR_* variables after optimization?  If that could potentially be true, 
then perhaps the volatile qualifier would prevent such a removal, which 
would break the existence check(s) by the debugger?  Just a thought.


-Paul

On 11/8/2011 2:46 PM, George Bosilca wrote:

This value is not even read by the debugger. It only checks for its existence 
in the startup process, so I guess we're safe here as well.


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] debugger changes

2011-11-08 Thread George Bosilca
I guess people should check the commit before …

No way the volatile will do any good here:
-ORTE_DECLSPEC extern volatile char MPIR_executable_path[MPIR_MAX_PATH_LENGTH];
-ORTE_DECLSPEC extern volatile char MPIR_server_arguments[MPIR_MAX_ARG_LENGTH];
+ORTE_DECLSPEC extern char MPIR_executable_path[MPIR_MAX_PATH_LENGTH];
+ORTE_DECLSPEC extern char MPIR_server_arguments[MPIR_MAX_ARG_LENGTH];

This value is not even read by the debugger. It only checks for its existence 
in the startup process, so I guess we're safe here as well.

-volatile int MPIR_i_am_starter = 0;
+int MPIR_i_am_starter = 0;

  george.

On Nov 8, 2011, at 17:43 , Ashley Pittman wrote:

> 
> I think the volatiles are there to ensure the compiler doesn't optimise away 
> reads or function calls which has been a problem with this interface in the 
> past.
> 
> On 8 Nov 2011, at 22:18, George Bosilca wrote:
> 
>> MPIR_Breakpoint, as the name indicates, it is just a breakpoint used by the 
>> startup process or the MPI application to signal changes to the debugger. No 
>> return value, nothing more than a breakpoint.
>> 
>> I wonder how the volatile got there, there is no such requirement on 
>> variables that cannot be changed during execution.
>> 
>> george.
>> 
>> On Nov 8, 2011, at 08:36 , Jeff Squyres wrote:
>> 
>>> I think the only possible controversial change in this commit is changing 
>>> MPIR_Breakpoint() to return (void) instead of (void*).  Oddly, I see that 
>>> MPICH2 has 2 different prototypes for MPIR_Breakpoint -- one returns 
>>> (void*), another returns (int).  Assuming that MPICH2 works fine with the 
>>> debuggers, this suggests that the return is ignored by the tools -- as it 
>>> should be.
>>> 
>>> I didn't check the volatile removals; I'm assuming that George got them 
>>> right. :-)
>>> 
>>> I'll bet that this change does not cause any problems, but it might be 
>>> worth checking with the big 3+1:
>>> 
>>> - DDT
>>> - Totalview
>>> - padb
>>> - stat
>>> 
>>> 
>>> On Nov 7, 2011, at 8:24 PM, bosi...@osl.iu.edu wrote:
>>> 
>>>> Author: bosilca
>>>> Date: 2011-11-07 20:24:16 EST (Mon, 07 Nov 2011)
>>>> New Revision: 25456
>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/25456
>>>> 
>>>> Log:
>>>> Put the interface of our MPIR support in sync with the document accepted 
>>>> by the MPI
>>>> Forum (http://www.mpi-forum.org/docs/mpir-specification-10-11-2010.pdf).
>>>> 
>>>> Text files modified: 
>>>> trunk/ompi/debuggers/debuggers.h  |28 ++--
>>>> trunk/orte/mca/debugger/base/base.h   |10 +-
>>>> trunk/orte/mca/debugger/base/debugger_base_fns.c  | 6 +++---
>>>> trunk/orte/mca/debugger/base/debugger_base_open.c | 6 +++---
>>>> 4 files changed, 25 insertions(+), 25 deletions(-)
>>>> 
>>>> Modified: trunk/ompi/debuggers/debuggers.h
>>>> ==
>>>> --- trunk/ompi/debuggers/debuggers.h   (original)
>>>> +++ trunk/ompi/debuggers/debuggers.h   2011-11-07 20:24:16 EST (Mon, 07 Nov 2011)
>>>> @@ -31,20 +31,20 @@
>>>> 
>>>> BEGIN_C_DECLS
>>>> 
>>>> -/**
>>>> - * Wait for a debugger if asked.
>>>> - */
>>>> -extern void ompi_wait_for_debugger(void);
>>>> -
>>>> -/**
>>>> - * Notify a debugger that we're about to abort
>>>> - */
>>>> -extern void ompi_debugger_notify_abort(char *string);
>>>> -
>>>> -/**
>>>> - * Breakpoint function for parallel debuggers.
>>>> - */
>>>> -ORTE_DECLSPEC extern void *MPIR_Breakpoint(void);
>>>> +/**
>>>> + * Wait for a debugger if asked.
>>>> + */
>>>> +extern void ompi_wait_for_debugger(void);
>>>> +
>>>> +/**
>>>> + * Notify a debugger that we're about to abort
>>>> + */
>>>> +extern void ompi_debugger_notify_abort(char *string);
>>>> +
>>>> +/**
>>>> + * Breakpoint function for parallel debuggers.
>>>> + */
>>>> +ORTE_DECLSPEC extern void MPIR_Breakpoint(void);
>>>> 
>>>> END_C_DECLS
>>>> 
>>>> 
>>>> Modified: trunk/orte/mca/debugger/base/base.h
>>>> ==
>>>> --- trunk/orte/mca/debugger/base/base.h(original)
>>>> +++ trunk/orte/mca/debugger/base/base.h2011-11-07 20:24:16 EST (Mon, 07 Nov 2011)
>>>> @@ -61,18 +61,18 @@
>>>> ORTE_DECLSPEC extern int MPIR_proctable_size;
>>>> ORTE_DECLSPEC extern volatile int MPIR_being_debugged;
>>>> ORTE_DECLSPEC extern volatile int MPIR_debug_state;
>>>> -ORTE_DECLSPEC extern volatile int MPIR_i_am_starter;
>>>> +ORTE_DECLSPEC extern int MPIR_i_am_starter;
>>>> ORTE_DECLSPEC extern int MPIR_partial_attach_ok;
>>>> -ORTE_DECLSPEC extern volatile char MPIR_executable_path[MPIR_MAX_PATH_LENGTH];
>>>> -ORTE_DECLSPEC extern volatile char MPIR_server_arguments[MPIR_MAX_ARG_LENGTH];
>>>> +ORTE_DECLSPEC extern char 
Re: [OMPI devel] debugger changes

2011-11-08 Thread Ashley Pittman

I think the volatiles are there to ensure the compiler doesn't optimise away 
reads or function calls which has been a problem with this interface in the 
past.

On 8 Nov 2011, at 22:18, George Bosilca wrote:

> MPIR_Breakpoint, as the name indicates, it is just a breakpoint used by the 
> startup process or the MPI application to signal changes to the debugger. No 
> return value, nothing more than a breakpoint.
> 
> I wonder how the volatile got there, there is no such requirement on 
> variables that cannot be changed during execution.
> 
>  george.
> 
> On Nov 8, 2011, at 08:36 , Jeff Squyres wrote:
> 
>> I think the only possible controversial change in this commit is changing 
>> MPIR_Breakpoint() to return (void) instead of (void*).  Oddly, I see that 
>> MPICH2 has 2 different prototypes for MPIR_Breakpoint -- one returns 
>> (void*), another returns (int).  Assuming that MPICH2 works fine with the 
>> debuggers, this suggests that the return is ignored by the tools -- as it 
>> should be.
>> 
>> I didn't check the volatile removals; I'm assuming that George got them 
>> right. :-)
>> 
>> I'll bet that this change does not cause any problems, but it might be worth 
>> checking with the big 3+1:
>> 
>> - DDT
>> - Totalview
>> - padb
>> - stat
>> 
>> 
>> On Nov 7, 2011, at 8:24 PM, bosi...@osl.iu.edu wrote:
>> 
>>> Author: bosilca
>>> Date: 2011-11-07 20:24:16 EST (Mon, 07 Nov 2011)
>>> New Revision: 25456
>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/25456
>>> 
>>> Log:
>>> Put the interface of our MPIR support in sync with the document accepted by 
>>> the MPI
>>> Forum (http://www.mpi-forum.org/docs/mpir-specification-10-11-2010.pdf).
>>> 
>>> Text files modified: 
>>> trunk/ompi/debuggers/debuggers.h  |28 
>>> ++--
>>> trunk/orte/mca/debugger/base/base.h   |10 +-
>>>   
>>> trunk/orte/mca/debugger/base/debugger_base_fns.c  | 6 +++---
>>>   
>>> trunk/orte/mca/debugger/base/debugger_base_open.c | 6 +++---
>>>   
>>> 4 files changed, 25 insertions(+), 25 deletions(-)
>>> 
>>> Modified: trunk/ompi/debuggers/debuggers.h
>>> ==
>>> --- trunk/ompi/debuggers/debuggers.h(original)
>>> +++ trunk/ompi/debuggers/debuggers.h2011-11-07 20:24:16 EST (Mon, 
>>> 07 Nov 2011)
>>> @@ -31,20 +31,20 @@
>>> 
>>> BEGIN_C_DECLS
>>> 
>>> -/**
>>> - * Wait for a debugger if asked.
>>> - */
>>> -extern void ompi_wait_for_debugger(void);
>>> -
>>> -/**
>>> - * Notify a debugger that we're about to abort
>>> - */
>>> -extern void ompi_debugger_notify_abort(char *string);
>>> -
>>> -/**
>>> - * Breakpoint function for parallel debuggers.
>>> - */
>>> -ORTE_DECLSPEC extern void *MPIR_Breakpoint(void);
>>> +/**
>>> + * Wait for a debugger if asked.
>>> + */
>>> +extern void ompi_wait_for_debugger(void);
>>> +
>>> +/**
>>> + * Notify a debugger that we're about to abort
>>> + */
>>> +extern void ompi_debugger_notify_abort(char *string);
>>> +
>>> +/**
>>> + * Breakpoint function for parallel debuggers.
>>> + */
>>> +ORTE_DECLSPEC extern void MPIR_Breakpoint(void);
>>> 
>>> END_C_DECLS
>>> 
>>> 
>>> Modified: trunk/orte/mca/debugger/base/base.h
>>> ==
>>> --- trunk/orte/mca/debugger/base/base.h (original)
>>> +++ trunk/orte/mca/debugger/base/base.h 2011-11-07 20:24:16 EST (Mon, 
>>> 07 Nov 2011)
>>> @@ -61,18 +61,18 @@
>>> ORTE_DECLSPEC extern int MPIR_proctable_size;
>>> ORTE_DECLSPEC extern volatile int MPIR_being_debugged;
>>> ORTE_DECLSPEC extern volatile int MPIR_debug_state;
>>> -ORTE_DECLSPEC extern volatile int MPIR_i_am_starter;
>>> +ORTE_DECLSPEC extern int MPIR_i_am_starter;
>>> ORTE_DECLSPEC extern int MPIR_partial_attach_ok;
>>> -ORTE_DECLSPEC extern volatile char 
>>> MPIR_executable_path[MPIR_MAX_PATH_LENGTH];
>>> -ORTE_DECLSPEC extern volatile char 
>>> MPIR_server_arguments[MPIR_MAX_ARG_LENGTH];
>>> +ORTE_DECLSPEC extern char MPIR_executable_path[MPIR_MAX_PATH_LENGTH];
>>> +ORTE_DECLSPEC extern char MPIR_server_arguments[MPIR_MAX_ARG_LENGTH];
>>> ORTE_DECLSPEC extern volatile int MPIR_forward_output;
>>> ORTE_DECLSPEC extern volatile int MPIR_forward_comm;
>>> ORTE_DECLSPEC extern char MPIR_attach_fifo[MPIR_MAX_PATH_LENGTH];
>>> ORTE_DECLSPEC extern int MPIR_force_to_main;
>>> 
>>> -typedef void* (*orte_debugger_breakpoint_fn_t)(void);
>>> +typedef void (*orte_debugger_breakpoint_fn_t)(void);
>>> 
>>> -ORTE_DECLSPEC void* MPIR_Breakpoint(void);
>>> +ORTE_DECLSPEC void MPIR_Breakpoint(void);
>>> 
>>> /* --- end MPICH/TotalView std debugger interface definitions */
>>> 
>>> 
>>> Modified: trunk/orte/mca/debugger/base/debugger_base_fns.c
>>> 

Re: [OMPI devel] make check fails for Intel 2011.6.233 (OpenMPI 1.4.3)

2011-11-08 Thread George Bosilca
Larry,

Thanks for following up with us on this. I think your patch is cleaner than 
what we currently have in the trunk, so I went ahead and pushed it to the 
trunk (25461). I will request a push to 1.5 and 1.4 as well.

  Regards,
george.

On Nov 8, 2011, at 13:57 , Larry Baker wrote:

> The good news is that the issue reported in R25290 is fixed in the latest 
> Intel compilers release (2011.7.256).  The bad news is that both the 
> 2011.6.233 and 2011.7.256 releases identify themselves as V12.1.0 from the 
> command line.  (I reported this bug to Intel already.)  They can only be 
> reliably distinguished using the predefined __INTEL_COMPILER_BUILD_DATE 
> macro.  I verified that the build dates for all three compilers we have -- 
> Linux, Mac OS X, and Windows -- are the same.
> 
> I developed a more targeted patch (attached) for OpenMPI 1.4.3 
> opal/mca/memory/ptmalloc2/malloc.c which disables vectorization for 
> _int_malloc() only if an Intel compiler with the 2011.6.233 release build 
> date is found (__INTEL_COMPILER_BUILD_DATE == 20110811).  This patch could 
> presumably make its way into all the copies of 
> opal/mca/memory/ptmalloc2/malloc.c in the various versions of OpenMPI that 
> are still being maintained.
> 
> Larry Baker
> US Geological Survey
> 650-329-5608
> ba...@usgs.gov
> 
> On 17 Oct 2011, at 8:18 PM, George Bosilca wrote:
> 
>> Larry,
>> 
>> Sorry for not updating this thread. The issue was identified and fixed by 
>> Rainer in r25290 (https://svn.open-mpi.org/trac/ompi/changeset/25290). 
>> Please read the comments and the linked thread on the Intel forum for more 
>> info about.
>> 
>> I couldn't find a trace of this being fixed in the 1.4 series, so I would 
>> wait upgrading until this issue gets resolved.
>> 
>>   Thanks,
>> george.
>> 
>> On Oct 17, 2011, at 23:00 , Larry Baker wrote:
>> 
>>> George,
>>> 
>>> I have not had time to look over the 1.4.3 make check failure for Intel 
>>> 2011.6.233 compilers.  Have you?
>>> 
>>> I had planned to get 1.4.3 compiled on all six of our compilers using the 
>>> latest compiler releases.  I was putting off upgrading to 1.4.4 or 1.5.x 
>>> until after that to minimize the number of things that could go wrong.  Do 
>>> you recommend otherwise?
>>> 
>>> Larry Baker
>>> US Geological Survey
>>> 650-329-5608
>>> ba...@usgs.gov
>>> 
>>> On 7 Oct 2011, at 6:46 PM, George Bosilca wrote:
>>> 
>>>> The may_alias attribute was part of a forward-looking attribute checking, 
>>>> at a time where few compiler supported them. This explains why they are 
>>>> not widely used in the library itself. Moreover, as they do not affect the 
>>>> compilation itself (as your test highlights this is not the issue with the 
>>>> icc 2011.6.233 compiler), there is no urge to remove the may_alias support.
>>>> 
>>>> I just got that particular version of the compiler installed on one of our 
>>>> machines. I'll give it a try over the weekend.
>>>> 
>>>>   george.
>>>> 
>>>> On Oct 7, 2011, at 20:21 , Larry Baker wrote:
>>>> 
>>>>> The test for the __may_alias_ attribute uses the following short code 
>>>>> snippet:
>>>>> 
>>>>>> int * p_value __attribute__ ((__may_alias__));
>>>>>> int
>>>>>> main ()
>>>>>> {
>>>>>> 
>>>>>>   ;
>>>>>>   return 0;
>>>>>> }
>>>>> 
>>>>> Indeed, for Intel 2011 compilers prior to 2011.6.233, this results in a 
>>>>> warning:
>>>>> 
>>>>>> [root@hydra openmpi-1.4.3]# module load compilers/intel/2011.5.220
>>>>>> [root@hydra openmpi-1.4.3]# icc -c may_alias_test.c 
>>>>>> may_alias_test.c(123): warning #1292: attribute "__may_alias__" ignored
>>>>>>   int * p_value __attribute__ ((__may_alias__));
>>>>>> ^
>>>>>> 
>>>>>> [root@hydra openmpi-1.4.3]# module unload compilers/intel/2011.5.220
>>>>> 
>>>>>> [root@hydra openmpi-1.4.3]# module load compilers/intel/2011.6.233
>>>>>> [root@hydra openmpi-1.4.3]# icc -c may_alias_test.c 
>>>>> 
>>>>> 
>>>>> I modified ./configure to force
>>>>> 
>>>>>> ompi_cv___attribute__may_alias=0
>>>>> 
>>>>> 
>>>>> Then I compiled and tested the library.  Unfortunately, the results were 
>>>>> exactly the same:
>>>>> 
>>>>>> make  check-TESTS
>>>>>> make[3]: Entering directory 
>>>>>> `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
>>>>>> /bin/sh: line 4: 26326 Segmentation fault  ${dir}$tst
>>>>>> FAIL: checksum
>>>>>> /bin/sh: line 4: 26359 Segmentation fault  ${dir}$tst
>>>>>> FAIL: position
>>>>>> 
>>>>>> 2 of 2 tests failed
>>>>>> Please report to http://www.open-mpi.org/community/help/
>>>>>> 
>>>>> 
>>>>> 
>>>>> I could not find any use of the may_alias attribute, other than in a 
>>>>> #define in opal/include/opal_config_bottom.h.  Is 
>>>>> OMPI_HAVE_ATTRIBUTE_MAY_ALIAS just cruft that can be removed?
>>>>> 
>>>>> Larry Baker
>>>>> US Geological Survey
>>>>> 650-329-5608
>>>>> ba...@usgs.gov
>>>>> 
>>>>> On 7 Oct 2011, at 11:08 AM, 

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25445

2011-11-08 Thread George Bosilca
Elements in an array are always stored in the expected [increasing] order, 
regardless of the endianness of the architecture. Moreover, due to the alignment 
rules, all members in a union will start at the same address.

It turns out there is no endianness conversion on the keys, so I suppose both 
peers have to somehow reach a consensus outside the PML.

  george.

On Nov 8, 2011, at 08:57 , Nathan T. Hjelm wrote:

> Sure, I can do that. My only concern is with sending between hosts of
> different endianness.
> 
> For example, if seg_key is 128 bits wide and the key32 is 64 bits then we
> might run into this:
> 
> Host 1: (big endian)
> Set seg_key.key32[0] = 0x
> 
> would result in seg_key: 0x 0x 0x 0x
> 
> Host 2: (little endian)
> Set seg_key.key32[0] = 0x1
> 
> would result in seg_key: 0x 0x 0x 0x
> 
> If either host were to send the other one its seg_key and try to use the
> key32 they would get garbage. I haven't tested this case yet but I can test
> on a PPE of RR later today.
> 
> -Nathan
> 
> On Tue, 8 Nov 2011 08:26:04 -0500, Jeff Squyres  wrote:
>> On Nov 7, 2011, at 9:48 PM, Nathan T. Hjelm wrote:
>> 
>>> In retrospect I should have done a RFC for the 3rd change with a short
>>> timeout. At the time (operating on little sleep) it seemed like the
>>> commits would have minimal impact. Please let me know if the commits
>>> have any negative impact.
>> 
>> FWIW, I think I'd like to see a rollback of the increase of array sizes
>> in the seg_key union.  They weren't necessary and might be slightly
>> misleading.
>> 
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] debugger changes

2011-11-08 Thread George Bosilca
MPIR_Breakpoint, as the name indicates, is just a breakpoint used by the 
startup process or the MPI application to signal changes to the debugger. No 
return value, nothing more than a breakpoint.

I wonder how the volatile got there; there is no such requirement on variables 
that cannot be changed during execution.

  george.

On Nov 8, 2011, at 08:36 , Jeff Squyres wrote:

> I think the only possible controversial change in this commit is changing 
> MPIR_Breakpoint() to return (void) instead of (void*).  Oddly, I see that 
> MPICH2 has 2 different prototypes for MPIR_Breakpoint -- one returns (void*), 
> another returns (int).  Assuming that MPICH2 works fine with the debuggers, 
> this suggests that the return is ignored by the tools -- as it should be.
> 
> I didn't check the volatile removals; I'm assuming that George got them 
> right. :-)
> 
> I'll bet that this change does not cause any problems, but it might be worth 
> checking with the big 3+1:
> 
> - DDT
> - Totalview
> - padb
> - stat
> 
> 
> On Nov 7, 2011, at 8:24 PM, bosi...@osl.iu.edu wrote:
> 
>> Author: bosilca
>> Date: 2011-11-07 20:24:16 EST (Mon, 07 Nov 2011)
>> New Revision: 25456
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/25456
>> 
>> Log:
>> Put the interface of our MPIR support in sync with the document accepted by 
>> the MPI
>> Forum (http://www.mpi-forum.org/docs/mpir-specification-10-11-2010.pdf).
>> 
>> Text files modified: 
>>  trunk/ompi/debuggers/debuggers.h                  |    28 ++--
>>  trunk/orte/mca/debugger/base/base.h               |    10 +-
>>  trunk/orte/mca/debugger/base/debugger_base_fns.c  |     6 +++---
>>  trunk/orte/mca/debugger/base/debugger_base_open.c |     6 +++---
>>  4 files changed, 25 insertions(+), 25 deletions(-)
>> 
>> Modified: trunk/ompi/debuggers/debuggers.h
>> ==
>> --- trunk/ompi/debuggers/debuggers.h (original)
>> +++ trunk/ompi/debuggers/debuggers.h 2011-11-07 20:24:16 EST (Mon, 07 Nov 
>> 2011)
>> @@ -31,20 +31,20 @@
>> 
>> BEGIN_C_DECLS
>> 
>> -/**
>> - * Wait for a debugger if asked.
>> - */
>> -extern void ompi_wait_for_debugger(void);
>> -
>> -/**
>> - * Notify a debugger that we're about to abort
>> - */
>> -extern void ompi_debugger_notify_abort(char *string);
>> -
>> -/**
>> - * Breakpoint function for parallel debuggers.
>> - */
>> -ORTE_DECLSPEC extern void *MPIR_Breakpoint(void);
>> +/**
>> + * Wait for a debugger if asked.
>> + */
>> +extern void ompi_wait_for_debugger(void);
>> +
>> +/**
>> + * Notify a debugger that we're about to abort
>> + */
>> +extern void ompi_debugger_notify_abort(char *string);
>> +
>> +/**
>> + * Breakpoint function for parallel debuggers.
>> + */
>> +ORTE_DECLSPEC extern void MPIR_Breakpoint(void);
>> 
>> END_C_DECLS
>> 
>> 
>> Modified: trunk/orte/mca/debugger/base/base.h
>> ==
>> --- trunk/orte/mca/debugger/base/base.h  (original)
>> +++ trunk/orte/mca/debugger/base/base.h  2011-11-07 20:24:16 EST (Mon, 
>> 07 Nov 2011)
>> @@ -61,18 +61,18 @@
>> ORTE_DECLSPEC extern int MPIR_proctable_size;
>> ORTE_DECLSPEC extern volatile int MPIR_being_debugged;
>> ORTE_DECLSPEC extern volatile int MPIR_debug_state;
>> -ORTE_DECLSPEC extern volatile int MPIR_i_am_starter;
>> +ORTE_DECLSPEC extern int MPIR_i_am_starter;
>> ORTE_DECLSPEC extern int MPIR_partial_attach_ok;
>> -ORTE_DECLSPEC extern volatile char 
>> MPIR_executable_path[MPIR_MAX_PATH_LENGTH];
>> -ORTE_DECLSPEC extern volatile char 
>> MPIR_server_arguments[MPIR_MAX_ARG_LENGTH];
>> +ORTE_DECLSPEC extern char MPIR_executable_path[MPIR_MAX_PATH_LENGTH];
>> +ORTE_DECLSPEC extern char MPIR_server_arguments[MPIR_MAX_ARG_LENGTH];
>> ORTE_DECLSPEC extern volatile int MPIR_forward_output;
>> ORTE_DECLSPEC extern volatile int MPIR_forward_comm;
>> ORTE_DECLSPEC extern char MPIR_attach_fifo[MPIR_MAX_PATH_LENGTH];
>> ORTE_DECLSPEC extern int MPIR_force_to_main;
>> 
>> -typedef void* (*orte_debugger_breakpoint_fn_t)(void);
>> +typedef void (*orte_debugger_breakpoint_fn_t)(void);
>> 
>> -ORTE_DECLSPEC void* MPIR_Breakpoint(void);
>> +ORTE_DECLSPEC void MPIR_Breakpoint(void);
>> 
>> /* --- end MPICH/TotalView std debugger interface definitions */
>> 
>> 
>> Modified: trunk/orte/mca/debugger/base/debugger_base_fns.c
>> ==
>> --- trunk/orte/mca/debugger/base/debugger_base_fns.c (original)
>> +++ trunk/orte/mca/debugger/base/debugger_base_fns.c 2011-11-07 20:24:16 EST 
>> (Mon, 07 Nov 2011)
>> @@ -168,7 +168,7 @@
>> */
>>ORTE_PROGRESSED_WAIT(false, jdata->num_reported, jdata->num_procs);
>> 
>> -(void) MPIR_Breakpoint();
>> +

[OMPI devel] Open MPI BOF

2011-11-08 Thread George Bosilca
Folks,

On Wednesday November 15th at 12:15 PST, we will have an Open MPI BOF. We will 
have two guest speakers: Rolf vandeVaart from NVIDIA and Shinji Sumimoto from 
the K-computer. If you are at SC, you are all invited to participate in this 
annual event. Blend for a moment with our user community, and possibly answer 
particular questions raised during the discussion.

At the same time, if you have any early work, any exciting features or any 
long-awaited fix for Open MPI that you want to make the user community aware 
of, this will be a perfect opportunity. Send me one slide (max two) by 
Monday COB, and I will include it in the presentation.

  Looking forward to meeting you there,
george.




Re: [OMPI devel] make check fails for Intel 2011.6.233 (OpenMPI 1.4.3)

2011-11-08 Thread Larry Baker
The good news is that the issue reported in R25290 is fixed in the latest
Intel compilers release (2011.7.256).  The bad news is that both the
2011.6.233 and 2011.7.256 releases identify themselves as V12.1.0 from the
command line.  (I reported this bug to Intel already.)  They can only be
reliably distinguished using the predefined __INTEL_COMPILER_BUILD_DATE
macro.  I verified that the build dates for all three compilers we have --
Linux, Mac OS X, and Windows -- are the same.

I developed a more targeted patch (attached) for OpenMPI 1.4.3
opal/mca/memory/ptmalloc2/malloc.c which disables vectorization for
_int_malloc() only if an Intel compiler with the 2011.6.233 release build
date is found (__INTEL_COMPILER_BUILD_DATE == 20110811).  This patch could
presumably make its way into all the copies of
opal/mca/memory/ptmalloc2/malloc.c in the various versions of OpenMPI that
are still being maintained.

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 17 Oct 2011, at 8:18 PM, George Bosilca wrote:

Larry,

Sorry for not updating this thread. The issue was identified and fixed by
Rainer in r25290 (https://svn.open-mpi.org/trac/ompi/changeset/25290).
Please read the comments and the linked thread on the Intel forum for more
info about.

I couldn't find a trace of this being fixed in the 1.4 series, so I would
wait upgrading until this issue gets resolved.

  Thanks,
    george.

On Oct 17, 2011, at 23:00 , Larry Baker wrote:

George,

I have not had time to look over the 1.4.3 make check failure for Intel
2011.6.233 compilers.  Have you?

I had planned to get 1.4.3 compiled on all six of our compilers using the
latest compiler releases.  I was putting off upgrading to 1.4.4 or 1.5.x
until after that to minimize the number of things that could go wrong.  Do
you recommend otherwise?

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 7 Oct 2011, at 6:46 PM, George Bosilca wrote:

The may_alias attribute was part of a forward-looking attribute checking,
at a time where few compiler supported them. This explains why they are
not widely used in the library itself. Moreover, as they do not affect the
compilation itself (as your test highlights this is not the issue with the
icc 2011.6.233 compiler), there is no urge to remove the may_alias support.

I just got that particular version of the compiler installed on one of our
machines. I'll give it a try over the weekend.

  george.

On Oct 7, 2011, at 20:21 , Larry Baker wrote:

The test for the __may_alias_ attribute uses the following short code
snippet:

int * p_value __attribute__ ((__may_alias__));
int
main ()
{

  ;
  return 0;
}

Indeed, for Intel 2011 compilers prior to 2011.6.233, this results in a
warning:

root@hydra openmpi-1.4.3]# module load compilers/intel/2011.5.220
[root@hydra openmpi-1.4.3]# icc -c may_alias_test.c
may_alias_test.c(123): warning #1292: attribute "__may_alias__" ignored
  int * p_value __attribute__ ((__may_alias__));
                                ^
[root@hydra openmpi-1.4.3]# module unload compilers/intel/2011.5.220
[root@hydra openmpi-1.4.3]# module load compilers/intel/2011.6.233
[root@hydra openmpi-1.4.3]# icc -c may_alias_test.c

I modified ./configure to force

ompi_cv___attribute__may_alias=0

Then I compiled and tested the library.  Unfortunately, the results were
exactly the same:

make  check-TESTS
make[3]: Entering directory `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
/bin/sh: line 4: 26326 Segmentation fault      ${dir}$tst
FAIL: checksum
/bin/sh: line 4: 26359 Segmentation fault      ${dir}$tst
FAIL: position
2 of 2 tests failed
Please report to http://www.open-mpi.org/community/help/

I could not find any use of the may_alias attribute, other than in a
#define in opal/include/opal_config_bottom.h.  Is
OMPI_HAVE_ATTRIBUTE_MAY_ALIAS just cruft that can be removed?

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 7 Oct 2011, at 11:08 AM, Larry Baker wrote:

I ran into a problem this past week trying to upgrade our OpenMPI 1.4.3 for
the latest Intel 2011 compiler, 2011.6.233.

make check fails with Segmentation Fault errors:

[root@hydra openmpi-1.4.3]# tail -20 ../openmpi-1.4.3-check-intel.6.233.log
/bin/sh ../../libtool --tag=CC   --mode=link icc  -DNDEBUG -g -O3 -finline-functions -fno-strict-aliasing -restrict -pthread -fvisibility=hidden -shared-intel -export-dynamic -shared-intel  -o ddt_pack ddt_pack.o ../../ompi/libmpi.la -lnsl -lutil
libtool: link: icc -DNDEBUG -g -O3 -finline-functions -fno-strict-aliasing -restrict -pthread -fvisibility=hidden -shared-intel -shared-intel -o .libs/ddt_pack ddt_pack.o -Wl,--export-dynamic  ../../ompi/.libs/libmpi.so /usr/local/src/openmpi-1.4.3/orte/.libs/libopen-rte.so /usr/local/src/openmpi-1.4.3/opal/.libs/libopen-pal.so -ldl -lnsl -lutil -pthread -Wl,-rpath -Wl,/usr/local/lib
make[3]: Leaving directory `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
make  check-TESTS
make[3]: 

Re: [OMPI devel] debugger confusion

2011-11-08 Thread Ralph Castain

On Nov 8, 2011, at 8:37 AM, Jeff Squyres wrote:

> On Nov 8, 2011, at 10:25 AM, George Bosilca wrote:
> 
>> However, based on what we have in the trunk today, Open MPI doesn't follow 
>> that document. As Ralph pinpointed it, the current version work with several 
>> tools (tv, stat, padb) as is, so that means the tools do not really follow 
>> that document either. 
> 
> This is not quite accurate.
> 
> What the tools did over the past decade was make it so that they work with 
> the 5-6 MPIR variants that are out there.  So yes, they work with OMPI, but 
> they work with the others who aren't quite "right," either.  Because before 
> this, there was no central definition of "right."

Agreed, though with a slight variation. Not only did the MPIs vary, but so 
do the tools. Some tools support various MPIR extensions and combinations of 
features, and others don't. That was the motivation behind some of us "pushing" 
the tool vendors to create a "standard" MPIR definition - it was to get all 
those extensions defined. The base stuff was always pretty common.

And yes - I was one of those "twisting" their arms because I got tired of 
dealing with all the bloody tool interface variations, providing special code 
to support someone's pet extension, etc.

> 
> The intent of the document was to make that central definition of "right" and 
> gradually have everyone move to it.  AFAIK, all the tools have been updated 
> to work with the "right" definition of MPIR.

While I think people may generally support some of the basic MPIR definitions, 
I haven't seen movement to supporting the full range - but maybe I've missed 
it. I haven't been following it as much over the last year or so.

Even if they have, though, there is no way for us to control what release 
someone is using. So we still have to support both the old and the new 
variations for some time.

> 
> Keep in mind that this is pretty much the same rationale as to why MPI still 
> supports functions like MPI_ATTR_SET: even though it's deprecated, there's 
> apps out there that still use it and will take a long time to adapt.  Hence, 
> the tools will keep supporting the "old" / "not-quite-right" definitions of 
> MPIR for a long time.
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com




Re: [OMPI devel] debugger confusion

2011-11-08 Thread Jeff Squyres
On Nov 8, 2011, at 10:25 AM, George Bosilca wrote:

> However, based on what we have in the trunk today, Open MPI doesn't follow 
> that document. As Ralph pinpointed it, the current version work with several 
> tools (tv, stat, padb) as is, so that means the tools do not really follow 
> that document either. 

This is not quite accurate.

What the tools did over the past decade was make it so that they work with the 
5-6 MPIR variants that are out there.  So yes, they work with OMPI, but they 
work with the others who aren't quite "right," either.  Because before this, 
there was no central definition of "right."

The intent of the document was to make that central definition of "right" and 
gradually have everyone move to it.  AFAIK, all the tools have been updated to 
work with the "right" definition of MPIR.

Keep in mind that this is pretty much the same rationale as to why MPI still 
supports functions like MPI_ATTR_SET: even though it's deprecated, there's apps 
out there that still use it and will take a long time to adapt.  Hence, the 
tools will keep supporting the "old" / "not-quite-right" definitions of MPIR 
for a long time.

-- 
Jeff Squyres
jsquy...@cisco.com




Re: [OMPI devel] Remote key sizes

2011-11-08 Thread Nathan T. Hjelm


On Tue, 8 Nov 2011 06:36:03 -0800, Rolf vandeVaart wrote:
>>  george.
>>
>> PS: Regarding the hand-copy instead of the memcpy, we tried to avoid using
>> memcpy in performance critical codes, especially when we know the size of
>> the data and the alignment. This relieves the compiler of adding ugly
>> intrinsics, allowing it to nicely pipeline to load/stores. Anyway, with
>> both approaches you will copy more data than needed for all BTLs except
>> uGNI.
> 
> I was looking at a case in a BTL I was working on where I actually need 64
> bytes (yes, bytes) as the remote key size as opposed to the current 16
> bytes (128 bits).
> Not sure how I can handle that yet.  (I assume configure is my friend, but
> even in that case, all headers will need to carry around the extra data.)
> 

I have been thinking about this a little bit. What I think should be done
(and I am sure George will disagree) is to allow BTLs to define how long a
segment is. The PML would then just memcpy the segments into the send
buffer (instead of copying each member).

For example mca_btl_base_segment_t would become:

struct mca_btl_base_segment_t {
    size_t seg_len;
};

since the pml needs the segment size (it does not need anything else).

and then each btl would define its own segment like:
struct mca_btl_ugni_segment_t {
    struct mca_btl_base_segment_t base;
    gni_mem_handle_t seg_key;
};

and we would add:
size_t btl_segment_len;

to the mca_btl_base_module_t or the base frag so the pml knows how much it
needs to copy.

This design would address George's criticism of the length of the seg_key
and also allow BTLs to do what they need to. It would require a memcpy but
I disagree that this would slow the critical path. Even if it does, it would
be relatively minor (I think) and the flexibility is worth more in the long
run.

-Nathan



Re: [OMPI devel] debugger confusion

2011-11-08 Thread Ralph Castain

On Nov 8, 2011, at 8:25 AM, George Bosilca wrote:

> 
> On Nov 8, 2011, at 07:52 , Jeff Squyres wrote:
> 
>> To be clear: that document simply standardizes what MPI implementations are 
>> supposed to provide in their MPIR implementation (prior to this, MPI 
>> implementations tended to have subtle differences between their MPIR 
>> implementations, which were a nightmare for the debugger/tool vendors).  
>> This document does *not* fix the scalability and other well-known issues 
>> with MPIR -- it just consolidates and standardizes the slightly-different 
>> versions of MPIR that were floating around out there.
> 
> However, based on what we have in the trunk today, Open MPI doesn't follow 
> that document. As Ralph pinpointed it, the current version work with several 
> tools (tv, stat, padb) as is, so that means the tools do not really follow 
> that document either. What a mess …
> 
> All the time we spent in the MPI Forum talking about the MPIR interface, and 
> look at the result !


Patience, patience - I look at the document as describing where people want to 
go, not a snapshot of where they already are. What will be interesting to see 
is how long it takes them to get there, how many of them will bother to do so, 
etc.

And, of course, how we maintain integration with all of them as the migration 
progresses!

> 
>  george.




Re: [OMPI devel] debugger confusion

2011-11-08 Thread George Bosilca

On Nov 8, 2011, at 07:52 , Jeff Squyres wrote:

> To be clear: that document simply standardizes what MPI implementations are 
> supposed to provide in their MPIR implementation (prior to this, MPI 
> implementations tended to have subtle differences between their MPIR 
> implementations, which were a nightmare for the debugger/tool vendors).  This 
> document does *not* fix the scalability and other well-known issues with MPIR 
> -- it just consolidates and standardizes the slightly-different versions of 
> MPIR that were floating around out there.

However, based on what we have in the trunk today, Open MPI doesn't follow that 
document. As Ralph pointed out, the current version works with several tools 
(tv, stat, padb) as is, so that means the tools do not really follow that 
document either. What a mess …

All the time we spent in the MPI Forum talking about the MPIR interface, and 
look at the result !

  george.




[OMPI devel] Remote key sizes

2011-11-08 Thread Rolf vandeVaart
>  george.
>
>PS: Regarding the hand-copy instead of the memcpy, we tried to avoid using
>memcpy in performance critical codes, especially when we know the size of
>the data and the alignment. This relieves the compiler of adding ugly 
>intrinsics,
>allowing it to nicely pipeline to load/stores. Anyway, with both approaches
>you will copy more data than needed for all BTLs except uGNI.

I was looking at a case in a BTL I was working on where I actually need 64 
bytes (yes, bytes) as the remote key size as opposed to the current 16 bytes 
(128 bits).
Not sure how I can handle that yet.  (I assume configure is my friend, but even 
in that case, all headers will need to carry around the extra data.)

Rolf

>
>On Nov 7, 2011, at 21:48 , Nathan T. Hjelm wrote:
>
>>
>>
>> On Mon, 7 Nov 2011 17:18:42 -0500, George Bosilca
>> 
>> wrote:
>>> A little bit of history:
>>>
>>> 1. r25305: added 2 atomic operations to OPAL. However, they only
>>> exist on amd64 and are only used in the vader BTL, which I assume
>>> only supports amd64.
>>
>> Two things:
>> - The atomic is a new feature that has no impact on existing code. It
>> can also be implemented on Intel but we have not tested it (yet).
>> - The atomic was pushed to support lock-free queues in the Vader BTL.
>> Vader does not need the atomics and can use an atomic lock but I
>> see higher latencies when using locks.
>>
>> Why would this change (that has no impact on any other code) need an
>> RFC?
>>
>>> 2. r25334: The seg_key union got a new member ptr. This member is
>>> solely used in the vader BTL, as all other BTLs use a compiler trick
>>> to convert a pointer to 64 bits.
>>
>> I am actually going to remove that member. I prefer the use of
>> uintptr_t over casting to a uint64_t but it has no real benefit and
>> possibly a pitfall due to its platform dependent size.
>>
>> But the member has, like the atomic, no impact on any existing code. It
>> does not change the size of the seg_key and was only used by Vader.
>> Why would this change have required an RFC?
>>
>>> 3. r25445: All members of the seg_key union got friends, because Cray
>>> dared to set their keys at 128 bits long. However a quick
>>>   find . -name "*.[ch]" -exec grep -Hn seg_key {} \; | grep "\[1\]"
>>> indicates that no BTL is using 128 bits keys. Code has been added to
>>> all PMLs, but I guess they just copy empty data.
>>
>> For now they copy empty data but in the near future (as I have said)
>> we will need the bits for the ugni btl (Cray XE Gemini). I pushed this
>> code to prepare for pushing ugni.
>>
>> Also, you might be a good person to ask: Why do we copy each member of
>> a segment individually in the PMLs? Wouldn't it be faster to do a
>> memcpy? If we were using a memcpy I would not have had to make any
>> change to the pmls.
>>
>>> What I see is a pattern of commits that could have been dealt with
>>> differently. None had an RFC, and most of them are not even used.
>>
>> I think you are reaching a little here. I pushed several changes over
>> a period of a month. The first two are not related to the third which
>> is the only one that could have any impact to existing code and might
>> require an RFC.
>>
>> In retrospect I should have done a RFC for the 3rd change with a short
>> timeout. At the time (operating on little sleep) it seemed like the
>> commits would have minimal impact. Please let me know if the commits
>> have any negative impact.
>>
>> -Nathan



Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25445

2011-11-08 Thread Nathan T. Hjelm
Sure, I can do that. My only concern is with sending between hosts of
different endianness.

For example, if seg_key is 128 bits wide and the key32 is 64 bits then we
might run into this:

Host 1: (big endian)
Set seg_key.key32[0] = 0x

would result in seg_key: 0x 0x 0x 0x

Host 2: (little endian)
Set seg_key.key32[0] = 0x1

would result in seg_key: 0x 0x 0x 0x

If either host were to send the other one its seg_key and try to use the
key32 they would get garbage. I haven't tested this case yet but I can test
on a PPE of RR later today.

-Nathan

On Tue, 8 Nov 2011 08:26:04 -0500, Jeff Squyres  wrote:
> On Nov 7, 2011, at 9:48 PM, Nathan T. Hjelm wrote:
> 
>> In retrospect I should have done a RFC for the 3rd change with a short
>> timeout. At the time (operating on little sleep) it seemed like the
>> commits would have minimal impact. Please let me know if the commits
>> have any negative impact.
> 
> FWIW, I think I'd like to see a rollback of the increase of array sizes
> in the seg_key union.  They weren't necessary and might be slightly
> misleading.
> 
> --
> Jeff Squyres
> jsquy...@cisco.com



[OMPI devel] debugger changes

2011-11-08 Thread Jeff Squyres
I think the only possibly controversial change in this commit is changing 
MPIR_Breakpoint() to return (void) instead of (void*).  Oddly, I see that 
MPICH2 has 2 different prototypes for MPIR_Breakpoint -- one returns (void*), 
another returns (int).  Assuming that MPICH2 works fine with the debuggers, 
this suggests that the return is ignored by the tools -- as it should be.

I didn't check the volatile removals; I'm assuming that George got them right. 
:-)

I'll bet that this change does not cause any problems, but it might be worth 
checking with the big 3+1:

- DDT
- Totalview
- padb
- stat


On Nov 7, 2011, at 8:24 PM, bosi...@osl.iu.edu wrote:

> Author: bosilca
> Date: 2011-11-07 20:24:16 EST (Mon, 07 Nov 2011)
> New Revision: 25456
> URL: https://svn.open-mpi.org/trac/ompi/changeset/25456
> 
> Log:
> Put the interface of our MPIR support in sync with the document accepted by 
> the MPI
> Forum (http://www.mpi-forum.org/docs/mpir-specification-10-11-2010.pdf).
> 
> Text files modified: 
>   trunk/ompi/debuggers/debuggers.h                  |    28 ++--
>   trunk/orte/mca/debugger/base/base.h               |    10 +-
>   trunk/orte/mca/debugger/base/debugger_base_fns.c  |     6 +++---
>   trunk/orte/mca/debugger/base/debugger_base_open.c |     6 +++---
>   4 files changed, 25 insertions(+), 25 deletions(-)
> 
> Modified: trunk/ompi/debuggers/debuggers.h
> ==
> --- trunk/ompi/debuggers/debuggers.h  (original)
> +++ trunk/ompi/debuggers/debuggers.h  2011-11-07 20:24:16 EST (Mon, 07 Nov 
> 2011)
> @@ -31,20 +31,20 @@
> 
> BEGIN_C_DECLS
> 
> -/**
> - * Wait for a debugger if asked.
> - */
> -extern void ompi_wait_for_debugger(void);
> -
> -/**
> - * Notify a debugger that we're about to abort
> - */
> -extern void ompi_debugger_notify_abort(char *string);
> -
> -/**
> - * Breakpoint function for parallel debuggers.
> - */
> -ORTE_DECLSPEC extern void *MPIR_Breakpoint(void);
> +/**
> + * Wait for a debugger if asked.
> + */
> +extern void ompi_wait_for_debugger(void);
> +
> +/**
> + * Notify a debugger that we're about to abort
> + */
> +extern void ompi_debugger_notify_abort(char *string);
> +
> +/**
> + * Breakpoint function for parallel debuggers.
> + */
> +ORTE_DECLSPEC extern void MPIR_Breakpoint(void);
> 
> END_C_DECLS
> 
> 
> Modified: trunk/orte/mca/debugger/base/base.h
> ==
> --- trunk/orte/mca/debugger/base/base.h   (original)
> +++ trunk/orte/mca/debugger/base/base.h   2011-11-07 20:24:16 EST (Mon, 
> 07 Nov 2011)
> @@ -61,18 +61,18 @@
> ORTE_DECLSPEC extern int MPIR_proctable_size;
> ORTE_DECLSPEC extern volatile int MPIR_being_debugged;
> ORTE_DECLSPEC extern volatile int MPIR_debug_state;
> -ORTE_DECLSPEC extern volatile int MPIR_i_am_starter;
> +ORTE_DECLSPEC extern int MPIR_i_am_starter;
> ORTE_DECLSPEC extern int MPIR_partial_attach_ok;
> -ORTE_DECLSPEC extern volatile char 
> MPIR_executable_path[MPIR_MAX_PATH_LENGTH];
> -ORTE_DECLSPEC extern volatile char 
> MPIR_server_arguments[MPIR_MAX_ARG_LENGTH];
> +ORTE_DECLSPEC extern char MPIR_executable_path[MPIR_MAX_PATH_LENGTH];
> +ORTE_DECLSPEC extern char MPIR_server_arguments[MPIR_MAX_ARG_LENGTH];
> ORTE_DECLSPEC extern volatile int MPIR_forward_output;
> ORTE_DECLSPEC extern volatile int MPIR_forward_comm;
> ORTE_DECLSPEC extern char MPIR_attach_fifo[MPIR_MAX_PATH_LENGTH];
> ORTE_DECLSPEC extern int MPIR_force_to_main;
> 
> -typedef void* (*orte_debugger_breakpoint_fn_t)(void);
> +typedef void (*orte_debugger_breakpoint_fn_t)(void);
> 
> -ORTE_DECLSPEC void* MPIR_Breakpoint(void);
> +ORTE_DECLSPEC void MPIR_Breakpoint(void);
> 
> /* --- end MPICH/TotalView std debugger interface definitions */
> 
> 
> Modified: trunk/orte/mca/debugger/base/debugger_base_fns.c
> ==
> --- trunk/orte/mca/debugger/base/debugger_base_fns.c  (original)
> +++ trunk/orte/mca/debugger/base/debugger_base_fns.c  2011-11-07 20:24:16 EST 
> (Mon, 07 Nov 2011)
> @@ -168,7 +168,7 @@
>  */
> ORTE_PROGRESSED_WAIT(false, jdata->num_reported, jdata->num_procs);
> 
> -(void) MPIR_Breakpoint();
> +MPIR_Breakpoint();
> 
> /* send a message to rank=0 to release it */
> OBJ_CONSTRUCT(, opal_buffer_t); /* don't need anything in this */
> @@ -186,7 +186,7 @@
> /*
>  * Breakpoint function for parallel debuggers
>  */
> -void *MPIR_Breakpoint(void)
> +void MPIR_Breakpoint(void)
> {
> -return NULL;
> +return;
> }
> 
> Modified: trunk/orte/mca/debugger/base/debugger_base_open.c
> ==
> --- 

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25445

2011-11-08 Thread Jeff Squyres
On Nov 7, 2011, at 9:48 PM, Nathan T. Hjelm wrote:

> In retrospect I should have done a RFC for the 3rd change with a short
> timeout. At the time (operating on little sleep) it seemed like the commits
> would have minimal impact. Please let me know if the commits have any
> negative impact.

FWIW, I think I'd like to see a rollback of the increase of array sizes in the 
seg_key union.  They weren't necessary and might be slightly misleading.

-- 
Jeff Squyres
jsquy...@cisco.com




Re: [OMPI devel] debugger confusion

2011-11-08 Thread Jeff Squyres
On Nov 7, 2011, at 8:34 PM, Ralph Castain wrote:

> Best guess: from what I've seen, most debuggers don't seem to conform to what 
> the MPI Forum has "accepted". It doesn't appear that the vendors and debugger 
> developers pay too much attention to that document, possibly because it (a) 
> came after the debuggers were developed, and (b) still doesn't seem to be 
> widely adopted.

Keep in mind that the debugger/tool authors essentially wrote the document, 
with some guidance from the Forum.  The Forum saw the wisdom in making it an 
"official" MPI Forum document so that it would carry some weight, and voted to 
do so.  That document is not actually part of any MPI standard document for 
multiple reasons; here are two:

1. MPIR has a bunch of known problems which no one is currently interested in 
fixing (e.g., scalability)
2. No one wanted to *mandate* the MPIR interface in an MPI implementation

It is therefore a standalone document that, since it became an "official" Forum 
document, is available on mpi-forum.org:

http://www.mpi-forum.org/docs/mpir-specification-10-11-2010.pdf

To be clear: that document simply standardizes what MPI implementations are 
supposed to provide in their MPIR implementation (prior to this, MPI 
implementations tended to have subtle differences between their MPIR 
implementations, which were a nightmare for the debugger/tool vendors).  This 
document does *not* fix the scalability and other well-known issues with MPIR 
-- it just consolidates and standardizes the slightly-different versions of 
MPIR that were floating around out there.

> I'd suggest being a little careful about making changes without consulting 
> people who use TV and "stat", at least - those are the ones most recently 
> tested.

Fair enough.

Moving towards what was specified in that document would probably be a good 
thing, though, since that document *is* the currently accepted version of how 
MPIR is supposed to work and was essentially written *by* the tool vendors.  Of 
course, appropriate testing with various debuggers and tools out there should 
be a given -- current versions of DDT, Totalview, and padb are probably the 3 
most obvious ones with which to test; others have mentioned "stat," too.

-- 
Jeff Squyres
jsquy...@cisco.com




Re: [OMPI devel] debugger confusion

2011-11-08 Thread Ralph Castain

On Nov 8, 2011, at 4:48 AM, Ashley Pittman wrote:

> I agree that this is not clear. I don't think this spec is well understood 
> by anyone; indeed, it wasn't originally written with the intention of becoming 
> a specification at all.  I've looked at it a couple of times but never used 
> this aspect of it: padb (and I believe stat is the same) never launches 
> jobs under control of the debugger, it simply attaches to an already existing 
> job, which means I've been able to ignore this part of the spec in padb entirely.
> 

This was the point I was trying to communicate earlier, without apparent 
success. I don't think this document can be treated like a spec at this point, 
nor should we assume that debugger "vendors" already support it. It isn't clear 
to me that any real consensus understanding of the document even exists at this 
time.

Hence, I really suggest caution about making changes to our interface code 
without people with access to the various debuggers having a chance to test the 
idea. It took some degree of pain to get this all working, especially to 
support those debuggers that dynamically attach, and I for one would rather not 
go thru it again just because someone decided to interpret the document a 
particular way.

Nathan/Sam: can you please test stat against the trunk and see if it still 
works?

Ashley: ditto with padb, when you have time, would be most appreciated.

Ralph



Re: [OMPI devel] Segfault in odls_fork_local_procs() for some values of npersocket

2011-11-08 Thread Ralph Castain
Looks fine to me - CMR filed. Thanks!

On Nov 8, 2011, at 1:01 AM, nadia.derbey wrote:

> Hi,
> 
> In v1.5, when mpirun is called with both the "-bind-to-core" and
> "-npersocket" options, and the npersocket value leads to fewer procs than
> sockets allocated on one node, we get a segfault.
> 
> Testing environment:
> openmpi v1.5
> 2 nodes with four 8-core sockets each
> mpirun -n 10 -bind-to-core -npersocket 2
> 
> I was expecting to get:
>   . ranks 0-1 : node 0 - socket 0
>   . ranks 2-3 : node 0 - socket 1
>   . ranks 4-5 : node 0 - socket 2
>   . ranks 6-7 : node 0 - socket 3
>   . ranks 8-9 : node 1 - socket 0
> 
> Instead of that, everything worked fine on node 0, and I got a segfault
> on node 1, with a stack that looks like:
> 
> [derbeyn@berlin18 ~]$ mpirun --host berlin18,berlin26 -n 10
> -bind-to-core -npersocket 2 sleep 900
> [berlin26:21531] *** Process received signal ***
> [berlin26:21531] Signal: Floating point exception (8)
> [berlin26:21531] Signal code: Integer divide-by-zero (1)
> [berlin26:21531] Failing at address: 0x7fed13731d63
> [berlin26:21531] [ 0] /lib64/libpthread.so.0(+0xf490) [0x7fed15327490]
> [berlin26:21531]
> [ 1] 
> /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/openmpi/mca_odls_default.so(+0x2d63) 
> [0x7fed13731d63]
> [berlin26:21531]
> [ 2] 
> /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/libopen-rte.so.3(orte_odls_base_default_launch_local+0xaf3)
>  [0x7fed15e1fe73]
> [berlin26:21531]
> [ 3] 
> /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/openmpi/mca_odls_default.so(+0x1d10) 
> [0x7fed13730d10]
> [berlin26:21531]
> [ 4] /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/libopen-rte.so.3(+0x3804d)
> [0x7fed15e1004d]
> [berlin26:21531]
> [ 5] 
> /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/libopen-rte.so.3(orte_daemon_cmd_processor+0x4aa)
>  [0x7fed15e1209a]
> [berlin26:21531]
> [ 6] /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/libopen-rte.so.3(+0x74ee8)
> [0x7fed15e4cee8]
> [berlin26:21531]
> [ 7] 
> /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/libopen-rte.so.3(orte_daemon+0x8d8) 
> [0x7fed15e0f268]
> [berlin26:21531] [ 8] /home_nfs/derbeyn/DISTS/openmpi-v1.5/bin/orted()
> [0x4008c6]
> [berlin26:21531] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd)
> [0x7fed14fa7c9d]
> [berlin26:21531] [10] /home_nfs/derbeyn/DISTS/openmpi-v1.5/bin/orted()
> [0x400799]
> [berlin26:21531] *** End of error message ***
> 
> The reason for this issue is that the npersocket value is taken into
> account during the very first phase of mpirun (rmaps/load_balance) to
> claim the slots on each node:
> npersocket() (in rmaps/load_balance/rmaps_lb.c) claims
>   . 8 slots on node 0 (4 sockets * 2 persocket)
>   . 2 slots on node 1 (10 total ranks - 8 already claimed)
> 
> But when we come to odls_default_fork_local_proc() (in
> odls/default/odls_default_module.c) npersocket is actually recomputed.
> Everything works fine on node 0. But on node 1, we have:
>   . jobdat->policy has both ORTE_BIND_TO_CORE and ORTE_MAPPING_NPERXXX
>   . npersocket is recomputed the following way:
> npersocket = jobdat->num_local_procs/orte_odls_globals.num_sockets
>= 2 / 4 = 0
>   . later on, when the starting point is computed:
> logical_cpu = (lrank % npersocket) * jobdat->cpus_per_rank;
> we get the divide-by-zero exception.
> 
> The problem, in my view, comes from the fact that we are recomputing
> npersocket on the local nodes instead of storing it in the jobdat
> structure (as is done today for the policy, the cpus_per_rank, the
> stride, ...).
> Recomputing this value leads either to the segfault I got, or to
> wrong mappings: if 4 slots had been claimed on node 1, the result
> would have been 1 rank per socket (since we have 4-socket nodes)
> instead of 2 ranks on each of the first 2 sockets.
> 
> The attached patch is a proposed fix that implements my suggestion of
> storing npersocket in the jobdat.
> 
> This patch applies to v1.5. Waiting for your comments...
> 
> Regards,
> Nadia
> 
> -- 
> Nadia Derbey
> <001_dont_recompute_npersocket_on_local_nodes.patch>




Re: [OMPI devel] debugger confusion

2011-11-08 Thread Ashley Pittman

On 8 Nov 2011, at 00:59, George Bosilca wrote:

> A started process is defined as being our mpirun. In Open MPI 
> MPIR_partial_attach_ok is defined, so the tool will suppose that we provide a 
> means to synchronize the processes not based on MPIR_debug_gate. Therefore 
> only one behavior is acceptable based on the text above: no MPIR_debug_gate=1 
> should be issued by the tool.

Open MPI itself (via ORTE) is not the only possible launch mechanism for Open 
MPI jobs. Slurm is the only other tool I can think of off the top of my head 
that can do it, but I wouldn't be surprised if there are others.  At the time 
the document was written it was assumed that the MPI library and resource 
manager/job launcher were so closely integrated they could be assumed to be 
part of the same software.

> However, in ompi_debuggers.c around line 226, we have an if that switches 
> between the two acceptable behaviors (MPIR_debug_gate or own mechanism) based 
> on the fact that we are a standalone (slurmd or generic) or not. As generic 
> is the ess loaded in most of the cases, I can't figure out how this works if 
> the MPIR specification document has to be trusted.

Unless the library can guarantee that the starter process has 
MPIR_partial_attach_ok, the only safe thing it can do is wait on 
MPIR_debug_gate. The only way the library can make any guarantees about mpirun 
is if it's launched from orted.
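
For illustration only, here is a minimal C sketch of what "wait on 
MPIR_debug_gate" amounts to in an MPI process. It is not the Open MPI code: the 
helper name wait_on_debug_gate and its max_polls bound are hypothetical (a real 
implementation would typically wait without a bound), and the only thing taken 
from the MPIR interface is the gate variable itself, which the tool sets to a 
non-zero value to release the process.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() under strict -std modes */
#endif
#include <time.h>

/* Gate variable from the MPIR interface; the debugger writes 1 to release us. */
volatile int MPIR_debug_gate = 0;

/* Hypothetical helper: poll the gate up to max_polls times, sleeping
 * interval_us microseconds between polls. Returns 1 if the gate was
 * opened, 0 if we gave up. The bound exists only so the sketch terminates. */
int wait_on_debug_gate(int max_polls, unsigned int interval_us)
{
    struct timespec ts;
    ts.tv_sec = interval_us / 1000000u;
    ts.tv_nsec = (long)(interval_us % 1000000u) * 1000L;

    for (int i = 0; i < max_polls; ++i) {
        if (MPIR_debug_gate != 0) {
            return 1;
        }
        nanosleep(&ts, NULL);
    }
    return MPIR_debug_gate != 0;
}
```

The variable is volatile because it is modified from outside the process (by 
the attached tool), so the compiler must re-read it on every poll.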

I agree that this is not clear. I don't think this spec is well understood by 
anyone; indeed, it wasn't originally written with the intention of becoming a 
specification at all.  I've looked at it a couple of times but never used this 
aspect of it: padb (and I believe stat is the same) never launches jobs 
under control of the debugger, it simply attaches to an already existing job, which 
means I've been able to ignore this part of the spec in padb entirely.

Ashley.


[OMPI devel] Segfault in odls_fork_local_procs() for some values of npersocket

2011-11-08 Thread nadia.derbey
Hi,

In v1.5, when mpirun is called with both the "-bind-to-core" and
"-npersocket" options, and the npersocket value leads to fewer procs than
sockets allocated on one node, we get a segfault.

Testing environment:
openmpi v1.5
2 nodes with four 8-core sockets each
mpirun -n 10 -bind-to-core -npersocket 2

I was expecting to get:
   . ranks 0-1 : node 0 - socket 0
   . ranks 2-3 : node 0 - socket 1
   . ranks 4-5 : node 0 - socket 2
   . ranks 6-7 : node 0 - socket 3
   . ranks 8-9 : node 1 - socket 0

Instead of that, everything worked fine on node 0, and I got a segfault
on node 1, with a stack that looks like:

[derbeyn@berlin18 ~]$ mpirun --host berlin18,berlin26 -n 10
-bind-to-core -npersocket 2 sleep 900
[berlin26:21531] *** Process received signal ***
[berlin26:21531] Signal: Floating point exception (8)
[berlin26:21531] Signal code: Integer divide-by-zero (1)
[berlin26:21531] Failing at address: 0x7fed13731d63
[berlin26:21531] [ 0] /lib64/libpthread.so.0(+0xf490) [0x7fed15327490]
[berlin26:21531]
[ 1] 
/home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/openmpi/mca_odls_default.so(+0x2d63) 
[0x7fed13731d63]
[berlin26:21531]
[ 2] 
/home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/libopen-rte.so.3(orte_odls_base_default_launch_local+0xaf3)
 [0x7fed15e1fe73]
[berlin26:21531]
[ 3] 
/home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/openmpi/mca_odls_default.so(+0x1d10) 
[0x7fed13730d10]
[berlin26:21531]
[ 4] /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/libopen-rte.so.3(+0x3804d)
[0x7fed15e1004d]
[berlin26:21531]
[ 5] 
/home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/libopen-rte.so.3(orte_daemon_cmd_processor+0x4aa)
 [0x7fed15e1209a]
[berlin26:21531]
[ 6] /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/libopen-rte.so.3(+0x74ee8)
[0x7fed15e4cee8]
[berlin26:21531]
[ 7] 
/home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/libopen-rte.so.3(orte_daemon+0x8d8) 
[0x7fed15e0f268]
[berlin26:21531] [ 8] /home_nfs/derbeyn/DISTS/openmpi-v1.5/bin/orted()
[0x4008c6]
[berlin26:21531] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd)
[0x7fed14fa7c9d]
[berlin26:21531] [10] /home_nfs/derbeyn/DISTS/openmpi-v1.5/bin/orted()
[0x400799]
[berlin26:21531] *** End of error message ***

The reason for this issue is that the npersocket value is taken into
account during the very first phase of mpirun (rmaps/load_balance) to
claim the slots on each node:
npersocket() (in rmaps/load_balance/rmaps_lb.c) claims
   . 8 slots on node 0 (4 sockets * 2 persocket)
   . 2 slots on node 1 (10 total ranks - 8 already claimed)

But when we come to odls_default_fork_local_proc() (in
odls/default/odls_default_module.c) npersocket is actually recomputed.
Everything works fine on node 0. But on node 1, we have:
   . jobdat->policy has both ORTE_BIND_TO_CORE and ORTE_MAPPING_NPERXXX
   . npersocket is recomputed the following way:
       npersocket = jobdat->num_local_procs / orte_odls_globals.num_sockets
                  = 2 / 4 = 0
   . later on, when the starting point is computed:
 logical_cpu = (lrank % npersocket) * jobdat->cpus_per_rank;
 we get the divide-by-zero exception.
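
The arithmetic above can be reproduced in isolation. The following is a minimal
C sketch, not the ORTE code: the function names are hypothetical, and the
guard in starting_cpu() is only there so the sketch can report the problem
instead of trapping.

```c
#include <assert.h>

/* Hypothetical stand-in for the recomputation described above.
 * Integer division truncates, so 2 local procs / 4 sockets gives 0. */
int recompute_npersocket(int num_local_procs, int num_sockets)
{
    assert(num_sockets > 0);
    return num_local_procs / num_sockets;
}

/* Hypothetical stand-in for the starting-point computation.
 * Returns -1 when npersocket is not positive, which is the case where
 * the unguarded "lrank % npersocket" raises the divide-by-zero exception. */
int starting_cpu(int lrank, int npersocket, int cpus_per_rank)
{
    if (npersocket <= 0) {
        return -1;
    }
    return (lrank % npersocket) * cpus_per_rank;
}
```

With the values from node 1, recompute_npersocket(2, 4) yields 0, and any
later modulo by that value fails; with the values from node 0,
recompute_npersocket(8, 4) yields the expected 2.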

The problem, in my view, comes from the fact that we are recomputing
npersocket on the local nodes instead of storing it in the jobdat
structure (as is done today for the policy, the cpus_per_rank, the
stride, ...).
Recomputing this value leads either to the segfault I got, or to
wrong mappings: if 4 slots had been claimed on node 1, the result
would have been 1 rank per socket (since we have 4-socket nodes)
instead of 2 ranks on each of the first 2 sockets.
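
To make the wrong-mapping case concrete, here is a small C sketch under an
assumed placement rule (consecutive local ranks fill one socket before moving
to the next). The rule and the function name socket_for_rank are assumptions
for illustration and are not taken from the ORTE source.

```c
#include <assert.h>

/* Assumed rule, for illustration: local rank lrank lands on socket
 * lrank / npersocket, i.e. each socket takes npersocket consecutive ranks. */
int socket_for_rank(int lrank, int npersocket)
{
    assert(npersocket > 0);
    return lrank / npersocket;
}
```

With 4 local procs on a 4-socket node, the recomputed value is 4 / 4 = 1, so
local ranks 0..3 land on sockets 0, 1, 2, 3. With the requested value of 2
carried over from the mapping phase, they land on sockets 0, 0, 1, 1.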

The attached patch is a proposed fix that implements my suggestion of
storing npersocket in the jobdat.

This patch applies to v1.5. Waiting for your comments...

Regards,
Nadia

-- 
Nadia Derbey
npersocket should not be recomputed in odls_default_fork_local_procs: segfault might occur in some particular cases

diff -r ce3749a94a9e orte/mca/odls/base/odls_base_default_fns.c
--- a/orte/mca/odls/base/odls_base_default_fns.c	Fri Nov 04 13:31:18 2011 +0100
+++ b/orte/mca/odls/base/odls_base_default_fns.c	Fri Nov 04 13:55:00 2011 +0100
@@ -352,6 +352,12 @@ int orte_odls_base_default_get_add_procs
 return rc;
 }

+/* pack the npersocket for this job */
+if (ORTE_SUCCESS != (rc = opal_dss.pack(data, >npersocket, 1, OPAL_INT32))) {
+ORTE_ERROR_LOG(rc);
+return rc;
+}
+
 /* pack the cpus_per_rank for this job */
 if (ORTE_SUCCESS != (rc = opal_dss.pack(data, >cpus_per_rank, 1, OPAL_INT16))) {
 ORTE_ERROR_LOG(rc);
@@ -809,6 +815,12 @@ int orte_odls_base_default_construct_chi
 ORTE_ERROR_LOG(rc);
 goto REPORT_ERROR;
 }
+/* unpack the npersocket for the job */
+cnt=1;
+if (ORTE_SUCCESS != (rc = opal_dss.unpack(data, >npersocket, , OPAL_INT32))) {
+ORTE_ERROR_LOG(rc);
+goto REPORT_ERROR;
+}
 /* unpack the cpus/rank for the job */
 cnt=1;
 if (ORTE_SUCCESS != (rc = opal_dss.unpack(data, >cpus_per_rank, , OPAL_INT16))) {
diff -r ce3749a94a9e