Re: [OMPI devel] ORTE registry patch

2007-06-06 Thread George Bosilca
This patch finally made its way back into the trunk. I had to modify
it to fit into the current source, but hopefully I managed to do it
right. I did some testing and it seems not to harm anything. I split
it up into several commits, in order to have a clean submission with
one commit related to one particular patch. They span revisions
r14923 to r14928.


  george.

On Jun 6, 2007, at 9:51 AM, Tim Prins wrote:


I hate to go back to this, but...

The original commits also included changes to gpr_replica_dict_fn.c
(r14331 and r14336). This change shows some performance improvement
for me (about 8% on mpi hello, 123 nodes, 4ppn), and cleans up some
ugliness in the gpr. Again, this is an algorithmic change, so as the
job scales the performance improvement would be more noticeable.

I vote that this be put back in.

On a related topic, a small memory leak was fixed in r14328, and then
reverted. This change should be put back in.

Tim

George Bosilca wrote:

Commit r14791 applies this patch to the trunk. Let me know if you
encounter any kind of trouble.

  Thanks,
george.

On May 29, 2007, at 2:28 PM, Ralph Castain wrote:

After some work off-list with Tim, it appears that something has been
broken again on the OMPI trunk with respect to comm_spawn. It was
working two weeks ago, but...sigh.

Anyway, it doesn't appear to have any bearing either way on George's
patch(es), so whomever wants to commit them is welcome to do so.

Thanks
Ralph


On 5/29/07 11:44 AM, "Ralph Castain"  wrote:





On 5/29/07 11:02 AM, "Tim Prins"  wrote:


Well, after fixing many of the tests...


Interesting - they worked fine for me. Perhaps a difference in
environment.


It passes all the tests except the spawn tests. However, the spawn
tests are seriously broken without this patch as well, and the ibm
mpi spawn tests seem to work fine.


Then something is seriously wrong. The spawn tests were working as of
my last commit - that is a test I religiously run. If the spawn test
here doesn't work, then it is hard to understand how the mpi spawn
can work since the call is identical.

Let me see what's wrong first...



As far as I'm concerned, this should assuage any fear of problems
with these changes and they should now go in.

Tim

On May 29, 2007, at 11:34 AM, Ralph Castain wrote:


Well, I'll be the voice of caution again...

Tim: did you run all of the orte tests in the orte/test/system
directory? If so, and they all run correctly, then I have no issue
with doing the commit. If not, then I would ask that we not do the
commit until that has been done.

In running those tests, you need to run them on a multi-node system,
both using mpirun and as singletons (you'll have to look at the tests
to see which ones make sense in the latter case). This will ensure
that we have at least some degree of coverage.

Thanks
Ralph



On 5/29/07 9:23 AM, "George Bosilca"  wrote:


I'd be happy to commit the patch into the trunk. But after what
happened last time, I'm more than cautious. If the community thinks
the patch is worth having, let me know and I'll push it into the
trunk asap.

   Thanks,
 george.

On May 29, 2007, at 10:56 AM, Tim Prins wrote:

I think both patches should be put in immediately. I have done some
simple testing, and with 128 nodes of odin, with 1024 processes
running mpi hello, these decrease our running time from about 14.2
seconds to 10.9 seconds. This is a significant decrease, and as the
scale increases there should be increasing benefit.

I'd be happy to commit these changes if no one objects.

Tim

On May 24, 2007, at 8:39 AM, Ralph H Castain wrote:


Thanks - I'll take a look at this (and the prior ones!) in the next
couple of weeks when time permits and get back to you.

Ralph


On 5/23/07 1:11 PM, "George Bosilca"   
wrote:


Attached is another patch to the ORTE layer, more specifically the
replica. The idea is to decrease the number of strcmp calls by using
a small hash function before doing the strcmp. The hash key for each
registry entry is computed when it is added to the registry. When
we're doing a query, instead of comparing the 2 strings we first
check if the hash keys match, and if they do match we then compare
the 2 strings in order to make sure we eliminate collisions from our
answers.

There is some benefit in terms of performance. It's hardly visible
for a few processes, but it starts showing up as the number of
processes increases. In fact the number of strcmp calls in the trace
file decreases drastically. The main reason it works well is that
most of the keys start with basically the same chars (such as orte-
blahblah), which turns the strcmp into a loop over a few chars.


Ralph, please consider it for inclusion on the ORTE layer.

   Thanks,
 george.


___
devel mailing list
de...@open-mpi.org

Re: [OMPI devel] [OMPI svn] svn:open-mpi r14923

2007-06-06 Thread Ralph Castain
George

I believe that such non-professional comments do not belong in a code base
that will be distributed to the public. I have removed this one - kindly
refrain from them in the future.

I remind you that the decision to use dynamic memory was made in an ORTE
design meeting here at LANL three years ago which you (among many others)
attended. If you feel that decision should be revisited, then fine - let's
revisit it.

Meantime, I appreciate catching the necessary free - let's do it without the
negative wisecracks.

Thanks
Ralph


On 6/6/07 2:17 PM, "bosi...@osl.iu.edu"  wrote:

> Author: bosilca
> Date: 2007-06-06 16:17:27 EDT (Wed, 06 Jun 2007)
> New Revision: 14923
> URL: https://svn.open-mpi.org/trac/ompi/changeset/14923
> 
> Log:
> Don't forget to free the temporary buffer.
> 
> Text files modified:
>    trunk/orte/mca/gpr/replica/functional_layer/gpr_replica_put_get_fn.c | 1 +
>    1 files changed, 1 insertions(+), 0 deletions(-)
> 
> Modified: trunk/orte/mca/gpr/replica/functional_layer/gpr_replica_put_get_fn.c
> ==
> --- trunk/orte/mca/gpr/replica/functional_layer/gpr_replica_put_get_fn.c (original)
> +++ trunk/orte/mca/gpr/replica/functional_layer/gpr_replica_put_get_fn.c 2007-06-06 16:17:27 EDT (Wed, 06 Jun 2007)
> @@ -139,6 +139,7 @@
>  for (i=0; i < num_tokens; i++) {
>  orte_gpr_replica_dict_reverse_lookup(&tmp, seg, token_itags[i]);
>  opal_output(0, "\t%s", tmp);
> +free(tmp); /* We all enjoy allocating and releasing memory all over the code isn't it ? */
>  }
>  }
>  
> ___
> svn mailing list
> s...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/svn




Re: [OMPI devel] ORTE registry patch

2007-06-06 Thread Ralph H Castain
On 6/6/07 9:21 AM, "Tim Prins"  wrote:

> Actually, the tests are quite painful to run, since there are things in
> there that aren't real tests (such as spin, no-op, loop-child, etc) and
> I really don't know what the expected output should be.

Actually, they are tests - you just have to know how to use them. The RTE
needs to test things that are somewhat difficult to automate, and frankly,
nobody has had the time to go back and try to develop more automatic
versions. So that is the best we've got - let's at least use them as best we
can.

After all, people have complained to me more than once about why things in
ORTE keep getting repeatedly broken (you included ;-) ). This is why -
nobody tests a range of RTE functionality before committing things that have
unfortunate side effects...only to have them finally detected when a user
hits a code path after we do a release.

> 
> Anyways, I have made my way through these things, and I could not see
> any failures. This should clear the way for these changesets to be
> brought in.

That's fine - thanks!

> 
> George: Do you want to bring this over? If you do, remember to also
> remove test/class/orte_bitmap.c
> 
> Thanks,
> 
> Tim
> 
> 
> Ralph H Castain wrote:
>> Sigh...is it really so much to ask that we at least run the tests in
>> orte/test/system and orte/test/mpi using both mpirun and singleton (where
>> appropriate) instead of just relying on "well I ran hello_world"?
>> 
>> That is all I have ever asked, yet it seems to be viewed as a huge
>> impediment. Is it really that much to ask for when modifying a core part of
>> the system? :-/
>> 
>> If you have done those tests, then my apology - but your note only indicates
>> that you ran "hello_world" and are basing your recommendation *solely* on
>> that test.
>> 
>> 
>> On 6/6/07 7:51 AM, "Tim Prins"  wrote:
>> 
>>   
>>> I hate to go back to this, but...
>>> 
>>> The original commits also included changes to gpr_replica_dict_fn.c
>>> (r14331 and r14336). This change shows some performance improvement for
>>> me (about 8% on mpi hello, 123 nodes, 4ppn), and cleans up some ugliness
>>> in the gpr. Again, this is an algorithmic change so as the job scales the
>>> performance improvement would be more noticeable.
>>> 
>>> I vote that this be put back in.
>>> 
>>> On a related topic, a small memory leak was fixed in r14328, and then
>>> reverted. This change should be put back in.
>>> 
>>> Tim
>>> 
>>> George Bosilca wrote:
>>> 
Commit r14791 applies this patch to the trunk. Let me know if you
encounter any kind of trouble.
 
   Thanks,
 george.
 
 On May 29, 2007, at 2:28 PM, Ralph Castain wrote:
 
   
> After some work off-list with Tim, it appears that something has been
> broken
> again on the OMPI trunk with respect to comm_spawn. It was working
> two weeks
> ago, but...sigh.
> 
> Anyway, it doesn't appear to have any bearing either way on George's
> patch(es), so whomever wants to commit them is welcome to do so.
> 
> Thanks
> Ralph
> 
> 
> On 5/29/07 11:44 AM, "Ralph Castain"  wrote:
> 
> 
>> 
>> On 5/29/07 11:02 AM, "Tim Prins"  wrote:
>> 
>>   
>>> Well, after fixing many of the tests...
>>>
>> Interesting - they worked fine for me. Perhaps a difference in
>> environment.
>> 
>>   
>>> It passes all the tests
>>> except the spawn tests. However, the spawn tests are seriously broken
>>> without this patch as well, and the ibm mpi spawn tests seem to work
>>> fine.
>>>
>> Then something is seriously wrong. The spawn tests were working as
>> of my
>> last commit - that is a test I religiously run. If the spawn test here
>> doesn't work, then it is hard to understand how the mpi spawn can
>> work since
>> the call is identical.
>> 
>> Let me see what's wrong first...
>> 
>>   
>>> As far as I'm concerned, this should assuage any fear of problems
>>> with these changes and they should now go in.
>>> 
>>> Tim
>>> 
>>> On May 29, 2007, at 11:34 AM, Ralph Castain wrote:
>>> 
>>>
 Well, I'll be the voice of caution again...
 
 Tim: did you run all of the orte tests in the orte/test/system
 directory? If
 so, and they all run correctly, then I have no issue with doing the
 commit.
 If not, then I would ask that we not do the commit until that has
 been done.
 
 In running those tests, you need to run them on a multi-node
 system, both
 using mpirun and as singletons (you'll have to look at the tests to
 see
 which ones make sense in the latter case). This will ensure that we
 have at
 least 

Re: [OMPI devel] ORTE registry patch

2007-06-06 Thread Tim Prins
Actually, the tests are quite painful to run, since there are things in 
there that aren't real tests (such as spin, no-op, loop-child, etc) and 
I really don't know what the expected output should be.


Anyways, I have made my way through these things, and I could not see
any failures. This should clear the way for these changesets to be
brought in.


George: Do you want to bring this over? If you do, remember to also 
remove test/class/orte_bitmap.c


Thanks,

Tim


Ralph H Castain wrote:

Sigh...is it really so much to ask that we at least run the tests in
orte/test/system and orte/test/mpi using both mpirun and singleton (where
appropriate) instead of just relying on "well I ran hello_world"?

That is all I have ever asked, yet it seems to be viewed as a huge
impediment. Is it really that much to ask for when modifying a core part of
the system? :-/

If you have done those tests, then my apology - but your note only indicates
that you ran "hello_world" and are basing your recommendation *solely* on
that test.


On 6/6/07 7:51 AM, "Tim Prins"  wrote:

  

I hate to go back to this, but...

The original commits also included changes to gpr_replica_dict_fn.c
(r14331 and r14336). This change shows some performance improvement for
me (about 8% on mpi hello, 123 nodes, 4ppn), and cleans up some ugliness
in the gpr. Again, this is an algorithmic change so as the job scales the
performance improvement would be more noticeable.

I vote that this be put back in.

On a related topic, a small memory leak was fixed in r14328, and then
reverted. This change should be put back in.

Tim

George Bosilca wrote:


Commit r14791 applies this patch to the trunk. Let me know if you
encounter any kind of trouble.

  Thanks,
george.

On May 29, 2007, at 2:28 PM, Ralph Castain wrote:

  

After some work off-list with Tim, it appears that something has been
broken
again on the OMPI trunk with respect to comm_spawn. It was working
two weeks
ago, but...sigh.

Anyway, it doesn't appear to have any bearing either way on George's
patch(es), so whomever wants to commit them is welcome to do so.

Thanks
Ralph


On 5/29/07 11:44 AM, "Ralph Castain"  wrote:




On 5/29/07 11:02 AM, "Tim Prins"  wrote:

  

Well, after fixing many of the tests...


Interesting - they worked fine for me. Perhaps a difference in
environment.

  

It passes all the tests
except the spawn tests. However, the spawn tests are seriously broken
without this patch as well, and the ibm mpi spawn tests seem to work
fine.


Then something is seriously wrong. The spawn tests were working as
of my
last commit - that is a test I religiously run. If the spawn test here
doesn't work, then it is hard to understand how the mpi spawn can
work since
the call is identical.

Let me see what's wrong first...

  

As far as I'm concerned, this should assuage any fear of problems
with these changes and they should now go in.

Tim

On May 29, 2007, at 11:34 AM, Ralph Castain wrote:



Well, I'll be the voice of caution again...

Tim: did you run all of the orte tests in the orte/test/system
directory? If
so, and they all run correctly, then I have no issue with doing the
commit.
If not, then I would ask that we not do the commit until that has
been done.

In running those tests, you need to run them on a multi-node
system, both
using mpirun and as singletons (you'll have to look at the tests to
see
which ones make sense in the latter case). This will ensure that we
have at
least some degree of coverage.

Thanks
Ralph



On 5/29/07 9:23 AM, "George Bosilca"  wrote:

  

I'd be happy to commit the patch into the trunk. But after what
happened last time, I'm more than cautious. If the community think
the patch is worth having it, let me know and I'll push it in the
trunk asap.

   Thanks,
 george.

On May 29, 2007, at 10:56 AM, Tim Prins wrote:



I think both patches should be put in immediately. I have done some
simple testing, and with 128 nodes of odin, with 1024 processes
running mpi hello, these decrease our running time from about 14.2
seconds to 10.9 seconds. This is a significant decrease, and as the
scale increases there should be increasing benefit.

I'd be happy to commit these changes if no one objects.

Tim

On May 24, 2007, at 8:39 AM, Ralph H Castain wrote:

  

Thanks - I'll take a look at this (and the prior ones!) in the
next
couple
of weeks when time permits and get back to you.

Ralph


On 5/23/07 1:11 PM, "George Bosilca"  wrote:



Attached is another patch to the ORTE layer, more specifically
the
replica. The idea is to decrease the number of strcmp by using a
small hash function before doing the strcmp. The hask key for
each
registry entry is computed when it is added to the registry. When
we're doing a query, 

Re: [OMPI devel] [OMPI svn] svn:open-mpi r14897

2007-06-06 Thread Brian Barrett

Yup, thanks.

Brian

On Jun 6, 2007, at 2:27 AM, Bert Wesarg wrote:




+#ifdef HAVE_REGEXEC
+args_count = opal_argv_count(options_data[i].compiler_args);
+for (j = 0 ; j < args_count ; ++j) {
+if (0 != regcomp(, options_data[i].compiler_args[j], REG_NOSUB)) {
+return -1;
+}
+
+if (0 == regexec(, arg, (size_t) 0, NULL, 0)) {

missing regfree();?


+return i;
+}
+
+regfree();
+}
+#else


regards
Bert
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] ORTE registry patch

2007-06-06 Thread Ralph H Castain
Sigh...is it really so much to ask that we at least run the tests in
orte/test/system and orte/test/mpi using both mpirun and singleton (where
appropriate) instead of just relying on "well I ran hello_world"?

That is all I have ever asked, yet it seems to be viewed as a huge
impediment. Is it really that much to ask for when modifying a core part of
the system? :-/

If you have done those tests, then my apology - but your note only indicates
that you ran "hello_world" and are basing your recommendation *solely* on
that test.


On 6/6/07 7:51 AM, "Tim Prins"  wrote:

> I hate to go back to this, but...
> 
> The original commits also included changes to gpr_replica_dict_fn.c
> (r14331 and r14336). This change shows some performance improvement for
> me (about 8% on mpi hello, 123 nodes, 4ppn), and cleans up some ugliness
> in the gpr. Again, this is an algorithmic change so as the job scales the
> performance improvement would be more noticeable.
> 
> I vote that this be put back in.
> 
> On a related topic, a small memory leak was fixed in r14328, and then
> reverted. This change should be put back in.
> 
> Tim
> 
> George Bosilca wrote:
>> Commit r14791 applies this patch to the trunk. Let me know if you
>> encounter any kind of trouble.
>> 
>>   Thanks,
>> george.
>> 
>> On May 29, 2007, at 2:28 PM, Ralph Castain wrote:
>> 
>>> After some work off-list with Tim, it appears that something has been
>>> broken
>>> again on the OMPI trunk with respect to comm_spawn. It was working
>>> two weeks
>>> ago, but...sigh.
>>> 
>>> Anyway, it doesn't appear to have any bearing either way on George's
>>> patch(es), so whomever wants to commit them is welcome to do so.
>>> 
>>> Thanks
>>> Ralph
>>> 
>>> 
>>> On 5/29/07 11:44 AM, "Ralph Castain"  wrote:
>>> 
 
 
 
 On 5/29/07 11:02 AM, "Tim Prins"  wrote:
 
> Well, after fixing many of the tests...
 
 Interesting - they worked fine for me. Perhaps a difference in
 environment.
 
> It passes all the tests
> except the spawn tests. However, the spawn tests are seriously broken
> without this patch as well, and the ibm mpi spawn tests seem to work
> fine.
 
 Then something is seriously wrong. The spawn tests were working as
 of my
 last commit - that is a test I religiously run. If the spawn test here
 doesn't work, then it is hard to understand how the mpi spawn can
 work since
 the call is identical.
 
 Let me see what's wrong first...
 
> 
> As far as I'm concerned, this should assuage any fear of problems
> with these changes and they should now go in.
> 
> Tim
> 
> On May 29, 2007, at 11:34 AM, Ralph Castain wrote:
> 
>> Well, I'll be the voice of caution again...
>> 
>> Tim: did you run all of the orte tests in the orte/test/system
>> directory? If
>> so, and they all run correctly, then I have no issue with doing the
>> commit.
>> If not, then I would ask that we not do the commit until that has
>> been done.
>> 
>> In running those tests, you need to run them on a multi-node
>> system, both
>> using mpirun and as singletons (you'll have to look at the tests to
>> see
>> which ones make sense in the latter case). This will ensure that we
>> have at
>> least some degree of coverage.
>> 
>> Thanks
>> Ralph
>> 
>> 
>> 
>> On 5/29/07 9:23 AM, "George Bosilca"  wrote:
>> 
>>> I'd be happy to commit the patch into the trunk. But after what
>>> happened last time, I'm more than cautious. If the community think
>>> the patch is worth having it, let me know and I'll push it in the
>>> trunk asap.
>>> 
>>>Thanks,
>>>  george.
>>> 
>>> On May 29, 2007, at 10:56 AM, Tim Prins wrote:
>>> 
 I think both patches should be put in immediately. I have done some
 simple testing, and with 128 nodes of odin, with 1024 processes
 running mpi hello, these decrease our running time from about 14.2
 seconds to 10.9 seconds. This is a significant decrease, and as the
 scale increases there should be increasing benefit.
 
 I'd be happy to commit these changes if no one objects.
 
 Tim
 
 On May 24, 2007, at 8:39 AM, Ralph H Castain wrote:
 
> Thanks - I'll take a look at this (and the prior ones!) in the
> next
> couple
> of weeks when time permits and get back to you.
> 
> Ralph
> 
> 
> On 5/23/07 1:11 PM, "George Bosilca"  wrote:
> 
>> Attached is another patch to the ORTE layer, more specifically
>> the
>> replica. The idea is to decrease the number of strcmp by using a
>> small hash function before doing the strcmp. 

Re: [OMPI devel] ORTE registry patch

2007-06-06 Thread Tim Prins

I hate to go back to this, but...

The original commits also included changes to gpr_replica_dict_fn.c 
(r14331 and r14336). This change shows some performance improvement for 
me (about 8% on mpi hello, 123 nodes, 4ppn), and cleans up some ugliness 
in the gpr. Again, this is an algorithmic change so as the job scales the 
performance improvement would be more noticeable.


I vote that this be put back in.

On a related topic, a small memory leak was fixed in r14328, and then 
reverted. This change should be put back in.


Tim

George Bosilca wrote:
Commit r14791 applies this patch to the trunk. Let me know if you 
encounter any kind of trouble.


  Thanks,
george.

On May 29, 2007, at 2:28 PM, Ralph Castain wrote:

After some work off-list with Tim, it appears that something has been
broken again on the OMPI trunk with respect to comm_spawn. It was
working two weeks ago, but...sigh.

Anyway, it doesn't appear to have any bearing either way on George's
patch(es), so whomever wants to commit them is welcome to do so.

Thanks
Ralph


On 5/29/07 11:44 AM, "Ralph Castain"  wrote:





On 5/29/07 11:02 AM, "Tim Prins"  wrote:


Well, after fixing many of the tests...


Interesting - they worked fine for me. Perhaps a difference in 
environment.



It passes all the tests
except the spawn tests. However, the spawn tests are seriously broken
without this patch as well, and the ibm mpi spawn tests seem to work
fine.


Then something is seriously wrong. The spawn tests were working as of
my last commit - that is a test I religiously run. If the spawn test
here doesn't work, then it is hard to understand how the mpi spawn
can work since the call is identical.

Let me see what's wrong first...



As far as I'm concerned, this should assuage any fear of problems
with these changes and they should now go in.

Tim

On May 29, 2007, at 11:34 AM, Ralph Castain wrote:


Well, I'll be the voice of caution again...

Tim: did you run all of the orte tests in the orte/test/system
directory? If
so, and they all run correctly, then I have no issue with doing the
commit.
If not, then I would ask that we not do the commit until that has
been done.

In running those tests, you need to run them on a multi-node
system, both
using mpirun and as singletons (you'll have to look at the tests to
see
which ones make sense in the latter case). This will ensure that we
have at
least some degree of coverage.

Thanks
Ralph



On 5/29/07 9:23 AM, "George Bosilca"  wrote:


I'd be happy to commit the patch into the trunk. But after what
happened last time, I'm more than cautious. If the community think
the patch is worth having it, let me know and I'll push it in the
trunk asap.

   Thanks,
 george.

On May 29, 2007, at 10:56 AM, Tim Prins wrote:


I think both patches should be put in immediately. I have done some
simple testing, and with 128 nodes of odin, with 1024 processes
running mpi hello, these decrease our running time from about 14.2
seconds to 10.9 seconds. This is a significant decrease, and as the
scale increases there should be increasing benefit.

I'd be happy to commit these changes if no one objects.

Tim

On May 24, 2007, at 8:39 AM, Ralph H Castain wrote:

Thanks - I'll take a look at this (and the prior ones!) in the next
couple of weeks when time permits and get back to you.

Ralph


On 5/23/07 1:11 PM, "George Bosilca"  wrote:

Attached is another patch to the ORTE layer, more specifically the
replica. The idea is to decrease the number of strcmp calls by using
a small hash function before doing the strcmp. The hash key for each
registry entry is computed when it is added to the registry. When
we're doing a query, instead of comparing the 2 strings we first
check if the hash keys match, and if they do match we then compare
the 2 strings in order to make sure we eliminate collisions from our
answers.

There is some benefit in terms of performance. It's hardly visible
for a few processes, but it starts showing up as the number of
processes increases. In fact the number of strcmp calls in the trace
file decreases drastically. The main reason it works well is that
most of the keys start with basically the same chars (such as orte-
blahblah), which turns the strcmp into a loop over a few chars.

Ralph, please consider it for inclusion on the ORTE layer.

   Thanks,
 george.


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel







Re: [OMPI devel] jnysal-openib-wireup branch

2007-06-06 Thread Jeff Squyres
Ok -- so did you want to go ahead and make these changes, or did you  
want me to do it?


Either way, I'd be in favor of all this stuff coming to the trunk in  
the Very Near Future.  :-)




On Jun 6, 2007, at 7:02 AM, Nysal Jan wrote:


Hi Jeff,

1. The logic for if_exclude was not correct.  I committed a fix for
it.  https://svn.open-mpi.org/trac/ompi/changeset/14748

Thanks

2. I'm a bit confused on a) how the new MCA params mca_num_hcas and
map_num_procs_per_hca are supposed to be used and b) what their
default values should be.

Probably these params (and relevant code) should be removed now,
since there is a plan for a generic Socket/Core to HCA mapping
scheme. mca_num_hcas is the maximum number of HCAs a task can use.
E.g. if map_num_procs_per_hca is 3 and max_num_hcas is 2, then on
any node tasks 1/2/3 are mapped to hca1 & hca2 and tasks 4/5/6 are
mapped to hca3 & hca4.
Default values were set to 1 (that's what we needed at that point in
time). It needs to be modified so that ompi's default behaviour
remains unchanged (i.e. use all hcas).


2a. I don't quite understand the logic of is_hca_allowed(); I could
not get it to work properly.  Specifically, I have 2 machines each
with 2 HCAs (mthca0 has 1 port, mthca1 has 2 ports).  If I ran 2
procs (regardless of byslot or bynode), is_hca_allowed() would always
return false for the 2nd proc.  So I put a temporary override in
is_hca_allowed() to simply always return true.  Can you explain how
the logic is supposed to work in that function?

Explained above

2b. The default values of max_num_hcas and map_num_procs_per_hca are
both 1.  Based on my (potentially flawed) understanding of how these
MCA params are meant to be used, this is different than the current
default behavior.  The current default is that all procs use all
ACTIVE ports on all HCAs.  I *think* your new default param values
will set each proc to use the ACTIVE ports on exactly one HCA,
regardless how many there are in the host.  Did you mean to do that?
Also: both values must currently be >=1; should we allow -1 for both
of these values, meaning that they can be "infinite" ( i.e., based on
the number of HCAs in the host)?

Yes,  the defaults need to be changed. I'll also make the selection  
logic more granular (eg. -mca mca_btl_openib_if_include  
mthca0:1,mthca1:1)


--Nysal
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] jnysal-openib-wireup branch

2007-06-06 Thread Nysal Jan

Hi Jeff,


1. The logic for if_exclude was not correct.  I committed a fix for
it.  https://svn.open-mpi.org/trac/ompi/changeset/14748



Thanks

2. I'm a bit confused on a) how the new MCA params mca_num_hcas and

map_num_procs_per_hca are supposed to be used and b) what their
default values should be.



Probably these params (and relevant code) should be removed now, since there
is a plan for a generic Socket/Core to HCA mapping scheme. mca_num_hcas is
the maximum number of HCAs a task can use. E.g. if map_num_procs_per_hca is
3 and max_num_hcas is 2, then on any node tasks 1/2/3 are mapped to hca1 &
hca2 and tasks 4/5/6 are mapped to hca3 & hca4.
Default values were set to 1 (that's what we needed at that point in time).
It needs to be modified so that ompi's default behaviour remains unchanged
(i.e. use all hcas).

2a. I don't quite understand the logic of is_hca_allowed(); I could

not get it to work properly.  Specifically, I have 2 machines each
with 2 HCAs (mthca0 has 1 port, mthca1 has 2 ports).  If I ran 2
procs (regardless of byslot or bynode), is_hca_allowed() would always
return false for the 2nd proc.  So I put a temporary override in
is_hca_allowed() to simply always return true.  Can you explain how
the logic is supposed to work in that function?



Explained above

2b. The default values of max_num_hcas and map_num_procs_per_hca are

both 1.  Based on my (potentially flawed) understanding of how these
MCA params are meant to be used, this is different than the current
default behavior.  The current default is that all procs use all
ACTIVE ports on all HCAs.  I *think* your new default param values
will set each proc to use the ACTIVE ports on exactly one HCA,
regardless how many there are in the host.  Did you mean to do that?
Also: both values must currently be >=1; should we allow -1 for both
of these values, meaning that they can be "infinite" ( i.e., based on
the number of HCAs in the host)?



Yes,  the defaults need to be changed. I'll also make the selection logic
more granular (eg. -mca mca_btl_openib_if_include mthca0:1,mthca1:1)

--Nysal


Re: [OMPI devel] [OMPI svn] svn:open-mpi r14897

2007-06-06 Thread Bert Wesarg

> +#ifdef HAVE_REGEXEC
> +args_count = opal_argv_count(options_data[i].compiler_args);
> +for (j = 0 ; j < args_count ; ++j) {
> +if (0 != regcomp(, options_data[i].compiler_args[j], 
> REG_NOSUB)) {
> +return -1;
> +}
> +
> +if (0 == regexec(, arg, (size_t) 0, NULL, 0)) {
missing regfree();?

> +return i;
> +}
> +
> +regfree();
> +}
> +#else

regards
Bert