Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-07 Thread Gilles Gouaillardet
Ralph and George, here are attached two patches : - heterogeneous.v1.patch : a cleanup of the previous patch - heterogeneous.v2.patch : a new patch based on Ralph suggestion. i made the minimal changes to move jobid and vpid into the OPAL layer. Cheers, Gilles On 2014/08/07 11:27, Ralph Castain

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-06 Thread Ralph Castain
Are we maybe approaching this from the wrong direction? I ask because we had to do some gyrations in the pmix framework to work around the difference in naming schemes between OPAL and the rest of the code base, and now we have more gyrations here. Given that the MPI and RTE layers both rely on

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-06 Thread George Bosilca
Gilles, This looks right. It is really unfortunately that we have to change the definition of orte_process_name_t for big endian architectures, but I don't think there is a way around. Regarding your patch I have two comments: 1. There is a flagrant lack of comments ... especially on the ORTE sid

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-06 Thread Gilles Gouaillardet
Ralph and George, here is attached a patch that fixes the heterogeneous support without the abstraction violation. Cheers, Gilles On 2014/08/06 9:40, Gilles Gouaillardet wrote: > hummm > > i intentionally did not swap the two 32 bits (!) > > from the top level, what we have is : > > typedef str

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-05 Thread Gilles Gouaillardet
hummm i intentionally did not swap the two 32 bits (!) from the top level, what we have is : typedef struct { union { uint64_t opal; struct { uint32_t jobid; uint32_t vpid; } orte; } meta_process_name_t; OPAL is agnostic about jobid and vpid. jobid an

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-05 Thread Ralph Castain
Ah yes, so it is - sorry I missed that last test :-/ On Aug 5, 2014, at 10:50 AM, George Bosilca wrote: > The code committed by Gilles is correctly protected for big endian > (https://svn.open-mpi.org/trac/ompi/changeset/32425). I was merely pointing > out that I think he should also swap the

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-05 Thread George Bosilca
The code committed by Gilles is correctly protected for big endian ( https://svn.open-mpi.org/trac/ompi/changeset/32425). I was merely pointing out that I think he should also swap the 2 32 bits in his implementation. George. On Tue, Aug 5, 2014 at 1:32 PM, Ralph Castain wrote: > > On Aug 5

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-05 Thread Ralph Castain
On Aug 5, 2014, at 10:23 AM, George Bosilca wrote: > On Tue, Aug 5, 2014 at 1:15 PM, Ralph Castain wrote: > Hmmm...wouldn't that then require that you know (a) the other side is little > endian, and (b) that you are on a big endian? Otherwise, you wind up with the > same issue in reverse, yes

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-05 Thread George Bosilca
On Tue, Aug 5, 2014 at 1:15 PM, Ralph Castain wrote: > Hmmm...wouldn't that then require that you know (a) the other side is > little endian, and (b) that you are on a big endian? Otherwise, you wind up > with the same issue in reverse, yes? > This is similar to the 32 bits ntohl that we are usi

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-05 Thread Ralph Castain
Hmmm...wouldn't that then require that you know (a) the other side is little endian, and (b) that you are on a big endian? Otherwise, you wind up with the same issue in reverse, yes? In the ORTE methods, we explicitly set the fields (e.g., jobid = ntohl(remote-jobid)) to get around this problem

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-05 Thread George Bosilca
Technically speaking, converting a 64 bits to a big endian representation requires the swap of the 2 32 bits parts. So the correct approach would have been: uint64_t htonll(uint64_t v) { return uint64_t)ntohl(n)) << 32 | (uint64_t)ntohl(n >> 32)); } George. On Tue, Aug 5, 2014 at 5:52

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-05 Thread Ralph Castain
FWIW: that's exactly how we do it in ORTE On Aug 4, 2014, at 10:25 PM, Gilles Gouaillardet wrote: > George, > > i confirm there was a problem when running on an heterogeneous cluster, > this is now fixed in r32425. > > i am not convinced i chose the most elegant way to achieve the desired res

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-05 Thread Gilles Gouaillardet
George, i confirm there was a problem when running on an heterogeneous cluster, this is now fixed in r32425. i am not convinced i chose the most elegant way to achieve the desired result ... could you please double check this commit ? Thanks, Gilles On 2014/08/02 0:14, George Bosilca wrote: >

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-01 Thread George Bosilca
Gilles, The design of the BTL move was to let the opal_process_name_t be agnostic to what is stored inside, and all accesses should be done through the provided accessors. Thus, big endian or little endian doesn’t make a difference, as long as everything goes through the accessors. I’m skeptic

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-01 Thread Ralph Castain
Oh, should point out: I didn't deal with the potential btl/tcp issue you noted - I defer that to George On Aug 1, 2014, at 7:56 AM, Ralph Castain wrote: > Hi Gilles > > I'm not sure if we have a problem or not - we'll have to wait and see, I > guess. So far, I'm not seeing any problems on x8

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-01 Thread Ralph Castain
Hi Gilles I'm not sure if we have a problem or not - we'll have to wait and see, I guess. So far, I'm not seeing any problems on x86 archs, but that's to be expected and I don't have access to anything else. I fixed the issues you noted plus a few others I found. I imagine we'll discover more

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-01 Thread Gilles Gouaillardet
one last point : in orte_process_name_t, jobid and vpid have type orte_jobid_t and orte_vpid_t which really is uint32_t. in orte/util/proc.c, the function pointers opal_process_name_vpid and opal_process_name_jobid return an int32_t should it be an uint32_t instead ? /* and then _process_name_jo

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-01 Thread Gilles Gouaillardet
George and Ralph, i am very confused whether there is an issue or not. anyway, today Paul and i ran basic tests on big endian machines and did not face any issue related to big endianness. so i made my homework, digged into the code, and basically, opal_process_name_t is used as an orte_process

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread George Bosilca
The underlying structure changed, so a little bit of fiddling is normal. Instead of using a field in the ompi_proc_t you are now using a field down in opal_proc_t, a field that simply cannot have the same type as before (orte_process_name_t). George. On Wed, Jul 30, 2014 at 12:19 PM, Ralph Ca

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Ralph Castain
George - my point was that we regularly tested using the method in that routine, and now we have to do something a little different. So it is an "issue" in that we have to make changes across the code base to ensure we do things the "new" way, that's all On Jul 30, 2014, at 9:17 AM, George Bosi

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread George Bosilca
No, this is not going to be an issue if the opal_identifier_t is used correctly (aka only via the exposed accessors). George. On Wed, Jul 30, 2014 at 12:09 PM, Ralph Castain wrote: > Yeah, my fix won't work for big endian machines - this is going to be an > issue across the code base now, s

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Ralph Castain
Yeah, my fix won't work for big endian machines - this is going to be an issue across the code base now, so we'll have to troll and fix it. I was doing the minimal change required to fix the trunk in the meantime. On Jul 30, 2014, at 9:06 AM, George Bosilca wrote: > Yes. opal_process_name_t ha

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread George Bosilca
Yes. opal_process_name_t has basically no meaning by itself, it is a 64 bits storage location used by the upper layer to save some local key that can be later used to extract information. Calling the OPAL level compare function might be a better fit there. George. On Wed, Jul 30, 2014 at 11:5

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Gilles Gouaillardet
Ralph, was it really that simple ? proc_temp->super.proc_name has type opal_process_name_t : typedef opal_identifier_t opal_process_name_t; typedef uint64_t opal_identifier_t; *but* item_ptr->peer has type orte_process_name_t : struct orte_process_name_t { orte_jobid_t jobid; orte_vpid_t

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Rolf vandeVaart
devel] OMPI devel] trunk compilation errors in jenkins I just fixed this one - all that was required was an ampersand as the name was being passed into the function instead of a pointer to the name r32357 On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET mailto:gilles.gouaillar...@gmail.com>>

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Ralph Castain
I just fixed this one - all that was required was an ampersand as the name was being passed into the function instead of a pointer to the name r32357 On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET wrote: > Rolf, > > r32353 can be seen as a suspect... > Even if it is correct, it might have

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Gilles GOUAILLARDET
Rolf, r32353 can be seen as a suspect... Even if it is correct, it might have exposed the bug discussed in #4815 even more (e.g. we hit the bug 100% after the fix) does the attached patch to #4815 fixes the problem ? If yes, and if you see this issue as a showstopper, feel free to commit it and