[OMPI devel] Which tests for larger cluster testing
I am curious which tests are being used when running tests on larger clusters. By larger clusters, I mean anything with np > 128. (I realize that is not very large, but it is bigger than most of the clusters I assume tests are being run on.) I ask because I had planned on using some of the Intel tests, but they clearly have limitations starting at np=64. To avoid mailing list clutter, feel free to just email me and I will summarize. Rolf
Re: [OMPI devel] [devel-core] [RFC] Exit without finalize
Sounds great to me. Aurelien

On Sep 11, 2007, at 13:03, Jeff Squyres wrote:

If you genericize the concept, I think it's compatible with FT: 1. during MPI_INIT, one of the MPI processes can request a "notify" exit pattern for the job: a process must notify the RTE before it actually exits (i.e., some ORTE notification during MPI_FINALIZE). If a process exits before notifying the RTE, it's an error. 1a. The default action upon error can be to kill the entire job. 1b. If you want plug-in-able error actions (e.g., not killing the entire job), I'm *assuming* that our plugin frameworks can handle that...? 2. for an FT MPI job, I assume that the MPI processes would either not perform step 1 (i.e., the default action upon process exit is nothing -- just like if you had run "mpirun -np 4 hostname"), or you would select a specific error action/plugin for what to do when a process exits without first notifying the RTE. Howzat? -- Jeff Squyres Cisco Systems
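[Editor's note: for concreteness, here is a minimal C sketch of the notify-exit pattern Jeff describes. All orte_* names below are hypothetical stand-ins, not functions that exist in the tree; the sketch only illustrates the proposed control flow.]

/* Hypothetical sketch of the proposed "notify" exit pattern.
 * None of these orte_* names exist in Open MPI; they illustrate
 * only the control flow described above. */
#include <stdbool.h>

typedef enum { EXIT_ACTION_ABORT_JOB, EXIT_ACTION_PLUGIN } exit_action_t;

static bool notify_requested = false;          /* step 1: job opted in */
static bool notified = false;                  /* process said goodbye */
static exit_action_t error_action = EXIT_ACTION_ABORT_JOB;

/* Called during MPI_INIT when a process requests the pattern (step 1).
 * 1a/1b: record whether an un-notified exit kills the job or is
 * handed to an error-handling plugin. */
void orte_request_notify_exit(exit_action_t on_error)
{
    notify_requested = true;
    error_action = on_error;
}

/* Called during MPI_FINALIZE: tell the RTE we are exiting cleanly. */
void orte_notify_exit(void)
{
    notified = true;
}

/* RTE-side reaction when a process disappears. */
void orte_on_process_exit(void)
{
    if (notify_requested && !notified) {
        if (error_action == EXIT_ACTION_ABORT_JOB) {
            /* 1a: default -- kill the entire job */
        } else {
            /* 1b: hand off to an error-handling plugin */
        }
    }
    /* An FT job (step 2) simply never requests the pattern, so a
     * process exit is not treated as an error. */
}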
Re: [OMPI devel] [devel-core] [RFC] Exit without finalize
On Sep 8, 2007, at 2:33 PM, Aurelien Bouteiller wrote:

I agree (b) is not a good idea. However, I am not very pleased by (a) either. It totally prevents any process fault-tolerance mechanism if we go that way. If we plan to add failure detection and failure management to the RTE (to avoid Finalize hanging), we should add the ability to plug in FT-specific error handlers. The default error handler should do exactly what Ralph proposes, but nowhere else (other than in this handler) should the RTE code assume that the application is aborting when a failure occurs. An FT application might just not abort, and recover instead. (b) sounds fine to me.

If you genericize the concept, I think it's compatible with FT: 1. during MPI_INIT, one of the MPI processes can request a "notify" exit pattern for the job: a process must notify the RTE before it actually exits (i.e., some ORTE notification during MPI_FINALIZE). If a process exits before notifying the RTE, it's an error. 1a. The default action upon error can be to kill the entire job. 1b. If you want plug-in-able error actions (e.g., not killing the entire job), I'm *assuming* that our plugin frameworks can handle that...? 2. for an FT MPI job, I assume that the MPI processes would either not perform step 1 (i.e., the default action upon process exit is nothing -- just like if you had run "mpirun -np 4 hostname"), or you would select a specific error action/plugin for what to do when a process exits without first notifying the RTE. Howzat? -- Jeff Squyres Cisco Systems
Re: [OMPI devel] UD BTL alltoall hangs
First off, I've managed to reproduce this with nbcbench using only 16 procs (two per node) and setting btl_ofud_sd_num to 12 -- eases debugging with fewer procs to look at. ompi_coll_tuned_alltoall_intra_basic_linear is the alltoall routine that is being called. What I'm seeing from TotalView is that some random number of procs (1-5 usually, varies from run to run) are sitting with a send and a recv outstanding to every other proc. The other procs, however, have moved on to the next collective. This is hard to see with the default nbcbench code since it calls only alltoall repeatedly -- adding a barrier after the MPI_Alltoall() call makes it easier to see, as the barrier has a different tag number and communication pattern. So what I see is a few procs stuck in alltoall, while the rest are waiting in the following barrier. I've also verified with TotalView that there are no outstanding send WQEs at the UD BTL, and all procs are polling progress. The procs in the alltoall are polling in the opal_condition_wait() called from ompi_request_wait_all(). Not sure what to ask or where to look further, other than: what should I look at to see which requests are outstanding in the PML? Andrew

George Bosilca wrote: The first step will be to figure out which version of the alltoall you're using. I suppose you use the default parameters, and then the decision function in the tuned component says it is using the linear all-to-all. As the name states, this means that every node will post one receive from every other node and then will start sending the respective fragment to every other node. This will lead to a lot of outstanding sends and receives. I doubt that the receives can cause a problem, so I expect the problem is coming from the send side. Do you have TotalView installed on odin? If yes, there is a simple way to see how many sends are pending and where... That might pinpoint [at least] the process where you should look to see what's wrong. george.

On Aug 29, 2007, at 12:37 AM, Andrew Friedley wrote: I'm having a problem with the UD BTL and hoping someone might have some input to help solve it. What I'm seeing is hangs when running alltoall benchmarks with nbcbench or an LLNL program called mpiBench -- both hang exactly the same way. With the code on the trunk, running nbcbench on IU's odin using 32 nodes and a command line like this: mpirun -np 128 -mca btl ofud,self ./nbcbench -t MPI_Alltoall -p 128-128 -s 1-262144 hangs consistently when testing 256-byte messages. There are two things I can do to make the hang go away until running at larger scale. The first is to increase the 'btl_ofud_sd_num' MCA param from its default value of 128. This allows you to run with more procs/nodes before hitting the hang, but AFAICT doesn't fix the actual problem. What this parameter does is control the maximum number of outstanding send WQEs posted at the IB level -- when the limit is reached, frags are queued on an opal_list_t and later sent by progress as IB sends complete. The other way I've found is to play games with calling mca_btl_ud_component_progress() in mca_btl_ud_endpoint_post_send(). In fact I replaced the CHECK_FRAG_QUEUES() macro used around btl_ofud_endpoint.c:77 with a version that loops on progress until a send WQE slot is available (as opposed to queueing). Same result -- I can run at larger scale, but still hit the hang eventually.

It appears that when the job hangs, progress is being polled very quickly, and after spinning for a while there are no outstanding send WQEs or queued sends in the BTL. I'm not sure where further up things are spinning/blocking, as I can't produce the hang at less than 32 nodes / 128 procs and don't have a good way of debugging that (suggestions appreciated). Furthermore, both the ob1 and dr PMLs result in the same behavior, except that DR eventually trips a watchdog timeout, fails the BTL, and terminates the job. Other collectives such as allreduce and allgather do not hang -- only alltoall. I can also reproduce the hang on LLNL's Atlas machine. Can anyone else reproduce this (Torsten might have to make a copy of nbcbench available)? Anyone have any ideas as to what's wrong? Andrew
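[Editor's note: for readers unfamiliar with the btl_ofud internals under discussion, here is a minimal, self-contained sketch of the send-WQE flow control Andrew describes. All names are illustrative stand-ins; the real code lives in btl_ofud_endpoint.c and uses opal_list_t and ibv_post_send(). This only shows the queue-or-send shape, assuming a single-threaded progress loop.]

/* Illustrative stand-in for the btl_ofud send-side flow control:
 * at most SD_NUM send WQEs may be outstanding; excess frags are
 * queued and drained by progress as completions arrive. */
#include <stddef.h>

#define SD_NUM 128                   /* btl_ofud_sd_num default */

typedef struct frag { struct frag *next; } frag_t;

static int     sd_wqe = SD_NUM;      /* free send-WQE slots */
static frag_t *pending_head = NULL;  /* frags waiting for a slot */
static frag_t *pending_tail = NULL;

static void hw_post_send(frag_t *f)  /* stands in for ibv_post_send() */
{
    (void)f;
}

/* Post a send, or queue the frag when no WQE slot is free. */
void ud_post_send(frag_t *frag)
{
    if (sd_wqe > 0) {
        sd_wqe--;
        hw_post_send(frag);          /* hand the frag to the HCA */
        return;
    }
    /* No slot free: queue the frag. Andrew's experiment replaces
     * this branch with a loop on progress until a slot frees up;
     * either way the job still hangs eventually. */
    frag->next = NULL;
    if (pending_tail != NULL) pending_tail->next = frag;
    else                      pending_head = frag;
    pending_tail = frag;
}

/* Called from component progress on each send completion:
 * release the slot and retry one queued frag. */
void ud_send_complete(void)
{
    sd_wqe++;
    if (pending_head != NULL) {
        frag_t *f = pending_head;
        pending_head = f->next;
        if (pending_head == NULL) pending_tail = NULL;
        ud_post_send(f);
    }
}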
Re: [OMPI devel] Adding a new component
Hi Aurelien, Thank you for the pointers. I was able to plug in a component to an existing framework. Thanks again, Sajjad

On 09/08/07 01:34 PM, Aurelien Bouteiller wrote: Hi Sajjad, First, it will depend on whether you are writing a new component in an existing framework (say, a new BTL for a new type of interconnect) or a totally new framework (you want a family of components that manage a totally new functionality in Open MPI). Each framework has a "base" which takes care of the component selection process. If you are just adding a component, you only need to provide a mca_mycomponent_init(bool enable_progress_threads, bool enable_mpi_threads), as described in the mca_component_t structure. The mca_framework_base_select will then take care of everything for you. If you want to add a new framework, you'll have to create a selection function yourself (along with a full bunch of other functions to populate the base of the framework). I'll give you more details on this if it is relevant for you; just ask. Aurelien

On Sep 7, 2007, at 17:21, Sajjad Tabib wrote: Hi, I am a complete newbie to Open MPI internals and just began browsing the code and reading up on slides and papers. From what I have read, I learned that I have to create a new component. What I do not know is how to make MPI aware of it, or should I say, make MPI open and select my component. I found a set of slides that briefly went over adding components. For example, they briefly described that I must add PARAM_INIT_FILE and PARAM_CONFIG_FILES options in configure.params, but I'm not sure what these mean. Does anybody know of any tutorials/documents that could help me with this? Any help is greatly appreciated. S Tabib
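[Editor's note: as a rough illustration of Aurelien's point, here is what the component-side contract looks like, sketched in C with simplified, hypothetical type names. The real declarations live in opal/mca/mca.h and the per-framework headers, and carry version and open/close fields omitted here.]

/* Simplified sketch of what a component hands to its framework's
 * base for selection. Field names are illustrative only. */
#include <stdbool.h>
#include <stddef.h>

typedef struct my_module my_module_t;   /* the component's runtime object */

typedef struct {
    const char *name;                   /* e.g. "mycomponent" */
    /* Called by the framework's base selection logic during startup;
     * returns the module this component can provide, or NULL to bow out. */
    my_module_t *(*init)(bool enable_progress_threads,
                         bool enable_mpi_threads);
} my_component_t;

static my_module_t *mca_mycomponent_init(bool enable_progress_threads,
                                         bool enable_mpi_threads)
{
    (void)enable_progress_threads;
    (void)enable_mpi_threads;
    return NULL;  /* return a module pointer when the hardware is usable */
}

/* The one symbol the MCA loader looks for in the component's DSO. */
my_component_t mca_framework_mycomponent_component = {
    "mycomponent",
    mca_mycomponent_init,
};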
[OMPI devel] Coverity
David fixed a problem this morning: Coverity wasn't running right because the directory where OMPI lived was changing every night. So a few of the old runs were pruned. -- Jeff Squyres Cisco Systems
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088
Gleb Natapov wrote: On Tue, Sep 11, 2007 at 10:00:07AM -0500, Edgar Gabriel wrote: Gleb, in the scenario which you describe in the comment to the patch, what should happen is that the communicator with the cid which already started the allreduce will basically 'hang' until the other processes 'allow' the lower cids to continue. It should basically be blocked in the allreduce. Why? Two threads are allowed to run allreduce simultaneously for different communicators.

Are they? They are, but they might never agree on the cid. This is simply how the algorithm was designed originally -- which does not mean that it has to remain this way; this is just to explain its behavior and the intent. See the design doc for that in ompi-docs in the January 2004 repository. Let's assume that we have n procs with 2 threads each, and both threads do a comm_create at the same time, with input cid 1 and cid 2. N-1 processes let cid 1 start because that's the lower number. However, one process lets cid 2 start because the other thread was late. What would happen in the algorithm is that nobody responds to cid 2, so it would hang. As soon as the other thread with cid 1 enters the comm_create, it would be allowed to run and this operation would finish. The other threads would then allow cid 2 to enter, and the 'hanging' process would be released.

However, here is something where we might have problems with the Sun thread tests (and we discussed this with Terry already): the cid allocation algorithm as implemented in Open MPI assumes (this was/is my/our understanding of the standard) that communicator creation is a collective operation. This means you can not have a comm_create and another allreduce on the same communicator running in different threads, because these allreduces will mix up and produce nonsense results. We fixed the case where all collective operations are comm_creates, but if some of the threads are in a comm_create and some are in an allreduce on the same communicator, it won't work. Correct, but this is not what happens with the mt_coll test. mt_coll calls commdup on the same communicator in different threads concurrently, but we handle this case inside ompi_comm_nextcid().

Gleb Natapov wrote: On Tue, Sep 11, 2007 at 10:14:30AM -0400, George Bosilca wrote: Gleb, This patch is not correct. The code preventing the registration of the same communicator twice is later in the code (same file, in the function ompi_comm_register_cid, line 326). Once the function ompi_comm_register_cid I saw this code and the comment. The problem is not with the same communicator but with different communicators. is called, we know that each communicator only handles one "communicator creation" function at a time. Therefore, we want to give priority to the smallest com_id, which is what happens in the code you removed. The code I removed was doing it wrongly, i.e., the algorithm sometimes is executed for different communicators simultaneously by different threads. Think about the case where the function is running for cid 1 and then another thread runs it for cid 0. cid 0 will proceed although the function is executing on another CPU. And this is not something theoretical; it is happening with Sun's thread test suite mpi_coll test case. Without the condition in the ompi_comm_register_cid (each communicator only gets registered once) your comment makes sense.

However, with the condition, your patch allows a dead-end situation where 2 processes try to create communicators in multiple threads and will never succeed, simply because they will not order the creation based on the com_id. If the algorithm is really prone to deadlock when it is concurrently executed for several different communicators (I haven't checked this), then we may want to fix the original code to really prevent two threads from entering the function, but then I don't see the reason for all those complications with ompi_comm_register_cid()/ompi_comm_unregister_cid(). The algorithm described here: http://209.85.129.104/search?q=cache:5PV5MMRkBWkJ:ftp://info.mcs.anl.gov/pub/tech_reports/reports/P1382.pdf+MPI+communicator+dup+algorithm=en=clnk=2 in section 5.3 works without it, and we can do something similar. george.

On Sep 11, 2007, at 9:23 AM, g...@osl.iu.edu wrote: Author: gleb Date: 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007) New Revision: 16088 URL: https://svn.open-mpi.org/trac/ompi/changeset/16088 Log: The code tries to prevent itself from running for more than one communicator simultaneously, but is doing it incorrectly. If the function is already running for one communicator and it is called from another thread for another communicator with a lower cid, the check comm->c_contextid != ompi_comm_lowest_cid() will fail and the function will be executed for two different communicators by two threads simultaneously. There is nothing in the algorithm that prevents it from running simultaneously for different communicators as far as
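[Editor's note: to make the ordering rule concrete, here is a small self-contained C toy -- not the OMPI source, and all names are made up -- of the "lowest registered cid goes first" gate the patch removed. Because every process gates on the same globally agreed quantity, concurrent creations run in the same order everywhere, which keeps the collective cid allreduces matched up.]

/* Toy of the "lowest cid goes first" rule from this thread. Two
 * threads each try to allocate a context id for their own
 * communicator; only the thread holding the lowest registered cid
 * may run the (collective) allocation step. Busy-waiting is kept
 * for brevity; the real code sleeps on a condition variable. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t cid_lock = PTHREAD_MUTEX_INITIALIZER;
static int registered[2] = {1, 2};   /* cids of creations in flight */

static int lowest_cid(void) {
    int low = 1 << 30;
    for (int i = 0; i < 2; i++)
        if (registered[i] >= 0 && registered[i] < low) low = registered[i];
    return low;
}

static void *create_comm(void *arg) {
    int my_cid = *(int *)arg;
    for (;;) {
        pthread_mutex_lock(&cid_lock);
        if (my_cid != lowest_cid()) {       /* not my turn yet */
            pthread_mutex_unlock(&cid_lock);
            continue;
        }
        /* The "collective" allocation would run here; when it is
         * done we unregister so the next-lowest cid can proceed. */
        printf("cid %d allocated its new context id\n", my_cid);
        for (int i = 0; i < 2; i++)
            if (registered[i] == my_cid) registered[i] = -1;
        pthread_mutex_unlock(&cid_lock);
        return NULL;
    }
}

int main(void) {
    int a = 1, b = 2;
    pthread_t t1, t2;
    pthread_create(&t1, NULL, create_comm, &a);
    pthread_create(&t2, NULL, create_comm, &b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}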
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088
On Tue, Sep 11, 2007 at 11:30:53AM -0400, George Bosilca wrote:
> On Sep 11, 2007, at 11:05 AM, Gleb Natapov wrote:
>> On Tue, Sep 11, 2007 at 10:54:25AM -0400, George Bosilca wrote:
>>> We don't want to prevent two threads from entering the code at the same time. The algorithm you cited supports this case. There is only one moment that is
>> Are you sure it supports this case? There is a global var mask_in_use that prevents multiple access.
> I'm unable to find the mask_in_use global variable. Where is it?

I thought that by "the algorithm you cited" you meant the algorithm described in the link I provided. There is a mask_in_use global var there that IMO ensures the algorithm is executed for only one communicator at a time.

> george.
>>> critical: the local selection of the next available cid. And this is what we try to protect there. If after the first run the collective call does not manage to figure out the correct next_cid, then we will execute the while loop again. And then this condition makes sense, as only the thread running on the smallest communicator cid will continue. This insures that it will pick up the smallest next available cid, and then its reduce operation will succeed. The other threads will wait until the selection of the next available cid is unlocked.
>>>
>>> Without the code you removed we face a deadlock situation. Multiple threads will pick different next_cids on each process and they will never succeed with the reduce operation. And this is what we're trying to avoid with the test.
>> OK. I think now I get the idea behind this test. I'll restore it and leave the ompi_comm_unregister_cid() fix in place. Is this OK?
>>> george.
>>> On Sep 11, 2007, at 10:34 AM, Gleb Natapov wrote:
>>>> On Tue, Sep 11, 2007 at 10:14:30AM -0400, George Bosilca wrote:
>>>>> Gleb, This patch is not correct. The code preventing the registration of the same communicator twice is later in the code (same file, in the function ompi_comm_register_cid, line 326). Once the function ompi_comm_register_cid
>>>> I saw this code and the comment. The problem is not with the same communicator but with different communicators.
>>>>> is called, we know that each communicator only handles one "communicator creation" function at a time. Therefore, we want to give priority to the smallest com_id, which is what happens in the code you removed.
>>>> The code I removed was doing it wrongly, i.e., the algorithm sometimes is executed for different communicators simultaneously by different threads. Think about the case where the function is running for cid 1 and then another thread runs it for cid 0. cid 0 will proceed although the function is executing on another CPU. And this is not something theoretical; it is happening with Sun's thread test suite mpi_coll test case.
>>>>> Without the condition in the ompi_comm_register_cid (each communicator only gets registered once) your comment makes sense. However, with the condition, your patch allows a dead-end situation where 2 processes try to create communicators in multiple threads, and they will never succeed, simply because they will not order the creation based on the com_id.
>>>> If the algorithm is really prone to deadlock when it is concurrently executed for several different communicators (I haven't checked this), then we may want to fix the original code to really prevent two threads from entering the function, but then I don't see the reason for all those complications with ompi_comm_register_cid()/ompi_comm_unregister_cid(). The algorithm described here: http://209.85.129.104/search?q=cache:5PV5MMRkBWkJ:ftp://info.mcs.anl.gov/pub/tech_reports/reports/P1382.pdf+MPI+communicator+dup+algorithm=en=clnk=2 in section 5.3 works without it, and we can do something similar.
>>>>> george.
>>>>> On Sep 11, 2007, at 9:23 AM, g...@osl.iu.edu wrote:
>>>>>> Author: gleb
>>>>>> Date: 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007)
>>>>>> New Revision: 16088
>>>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/16088
>>>>>> Log: The code tries to prevent itself from running for more than one communicator simultaneously, but is doing it incorrectly. If the function is already running for one communicator and it is called from another thread for another communicator with a lower cid, the check comm->c_contextid != ompi_comm_lowest_cid() will fail and the function will be executed for two different communicators by two threads simultaneously. There is nothing in the algorithm that prevents it from running simultaneously for different communicators as far as I can see,
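[Editor's note: for reference, here is the gist of the mask_in_use idea as described in this thread, rendered as a loose, hypothetical C sketch. This is not the paper's full algorithm, which additionally orders waiting threads by lowest existing context id.]

/* Loose sketch of the "mask_in_use" gating Gleb refers to: a shared
 * bitmask of free context ids, plus a flag ensuring only one
 * communicator creation at a time feeds the mask into its collective
 * allreduce. Hypothetical rendering only. */
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define MASK_WORDS 32                 /* 32*32 = 1024 context ids */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned mask[MASK_WORDS];     /* bit set = context id free */
static bool mask_in_use = false;      /* one allocation at a time */

/* Snapshot the mask for this thread's allreduce. Returns false if
 * another creation owns the mask; the caller backs off and retries. */
bool try_acquire_mask(unsigned snapshot[MASK_WORDS])
{
    pthread_mutex_lock(&lock);
    if (mask_in_use) {
        pthread_mutex_unlock(&lock);
        return false;
    }
    mask_in_use = true;
    memcpy(snapshot, mask, sizeof mask);
    pthread_mutex_unlock(&lock);
    return true;
}

/* After the allreduce agrees on a common free bit (or fails), release. */
void release_mask(void)
{
    pthread_mutex_lock(&lock);
    mask_in_use = false;
    pthread_mutex_unlock(&lock);
}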
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088
On Sep 11, 2007, at 11:05 AM, Gleb Natapov wrote: On Tue, Sep 11, 2007 at 10:54:25AM -0400, George Bosilca wrote: We don't want to prevent two threads from entering the code at the same time. The algorithm you cited supports this case. There is only one moment that is Are you sure it supports this case? There is a global var mask_in_use that prevents multiple access.

I'm unable to find the mask_in_use global variable. Where is it? george.

critical: the local selection of the next available cid. And this is what we try to protect there. If after the first run the collective call does not manage to figure out the correct next_cid, then we will execute the while loop again. And then this condition makes sense, as only the thread running on the smallest communicator cid will continue. This insures that it will pick up the smallest next available cid, and then its reduce operation will succeed. The other threads will wait until the selection of the next available cid is unlocked. Without the code you removed we face a deadlock situation. Multiple threads will pick different next_cids on each process and they will never succeed with the reduce operation. And this is what we're trying to avoid with the test. OK. I think now I get the idea behind this test. I'll restore it and leave the ompi_comm_unregister_cid() fix in place. Is this OK? george.

On Sep 11, 2007, at 10:34 AM, Gleb Natapov wrote: On Tue, Sep 11, 2007 at 10:14:30AM -0400, George Bosilca wrote: Gleb, This patch is not correct. The code preventing the registration of the same communicator twice is later in the code (same file, in the function ompi_comm_register_cid, line 326). Once the function ompi_comm_register_cid I saw this code and the comment. The problem is not with the same communicator but with different communicators. is called, we know that each communicator only handles one "communicator creation" function at a time. Therefore, we want to give priority to the smallest com_id, which is what happens in the code you removed. The code I removed was doing it wrongly, i.e., the algorithm sometimes is executed for different communicators simultaneously by different threads. Think about the case where the function is running for cid 1 and then another thread runs it for cid 0. cid 0 will proceed although the function is executing on another CPU. And this is not something theoretical; it is happening with Sun's thread test suite mpi_coll test case. Without the condition in the ompi_comm_register_cid (each communicator only gets registered once) your comment makes sense. However, with the condition, your patch allows a dead-end situation where 2 processes try to create communicators in multiple threads, and they will never succeed, simply because they will not order the creation based on the com_id. If the algorithm is really prone to deadlock when it is concurrently executed for several different communicators (I haven't checked this), then we may want to fix the original code to really prevent two threads from entering the function, but then I don't see the reason for all those complications with ompi_comm_register_cid()/ompi_comm_unregister_cid(). The algorithm described here: http://209.85.129.104/search?q=cache:5PV5MMRkBWkJ:ftp://info.mcs.anl.gov/pub/tech_reports/reports/P1382.pdf+MPI+communicator+dup+algorithm=en=clnk=2 in section 5.3 works without it, and we can do something similar. george.

On Sep 11, 2007, at 9:23 AM, g...@osl.iu.edu wrote: Author: gleb Date: 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007) New Revision: 16088 URL: https://svn.open-mpi.org/trac/ompi/changeset/16088 Log: The code tries to prevent itself from running for more than one communicator simultaneously, but is doing it incorrectly. If the function is already running for one communicator and it is called from another thread for another communicator with a lower cid, the check comm->c_contextid != ompi_comm_lowest_cid() will fail and the function will be executed for two different communicators by two threads simultaneously. There is nothing in the algorithm that prevents it from running simultaneously for different communicators as far as I can see, but ompi_comm_unregister_cid() assumes that it is always called for the communicator with the lowest cid, and this is not always the case. This patch removes the bogus lowest-cid check and fixes ompi_comm_unregister_cid() to properly remove the cid from the list.

Text files modified:
   trunk/ompi/communicator/comm_cid.c | 24 ++++++++++++------------
   1 files changed, 12 insertions(+), 12 deletions(-)

Modified: trunk/ompi/communicator/comm_cid.c
==============================================================================
--- trunk/ompi/communicator/comm_cid.c (original)
+++ trunk/ompi/communicator/comm_cid.c 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007)
@@ -11,6 +11,7 @@
  * All rights reserved.
  *
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088
On Tue, Sep 11, 2007 at 10:00:07AM -0500, Edgar Gabriel wrote:
> Gleb,
> in the scenario which you describe in the comment to the patch, what should happen is that the communicator with the cid which already started the allreduce will basically 'hang' until the other processes 'allow' the lower cids to continue. It should basically be blocked in the allreduce.

Why? Two threads are allowed to run allreduce simultaneously for different communicators. Are they?

> However, here is something where we might have problems with the Sun thread tests (and we discussed this with Terry already): the cid allocation algorithm as implemented in Open MPI assumes (this was/is my/our understanding of the standard) that communicator creation is a collective operation. This means you can not have a comm_create and another allreduce on the same communicator running in different threads, because these allreduces will mix up and produce nonsense results. We fixed the case where all collective operations are comm_creates, but if some of the threads are in a comm_create and some are in an allreduce on the same communicator, it won't work.

Correct, but this is not what happens with the mt_coll test. mt_coll calls commdup on the same communicator in different threads concurrently, but we handle this case inside ompi_comm_nextcid().

> Gleb Natapov wrote:
> > On Tue, Sep 11, 2007 at 10:14:30AM -0400, George Bosilca wrote:
> >> Gleb, This patch is not correct. The code preventing the registration of the same communicator twice is later in the code (same file, in the function ompi_comm_register_cid, line 326). Once the function ompi_comm_register_cid
> > I saw this code and the comment. The problem is not with the same communicator but with different communicators.
> >> is called, we know that each communicator only handles one "communicator creation" function at a time. Therefore, we want to give priority to the smallest com_id, which is what happens in the code you removed.
> > The code I removed was doing it wrongly, i.e., the algorithm sometimes is executed for different communicators simultaneously by different threads. Think about the case where the function is running for cid 1 and then another thread runs it for cid 0. cid 0 will proceed although the function is executing on another CPU. And this is not something theoretical; it is happening with Sun's thread test suite mpi_coll test case.
> >> Without the condition in the ompi_comm_register_cid (each communicator only gets registered once) your comment makes sense. However, with the condition, your patch allows a dead-end situation where 2 processes try to create communicators in multiple threads, and they will never succeed, simply because they will not order the creation based on the com_id.
> > If the algorithm is really prone to deadlock when it is concurrently executed for several different communicators (I haven't checked this), then we may want to fix the original code to really prevent two threads from entering the function, but then I don't see the reason for all those complications with ompi_comm_register_cid()/ompi_comm_unregister_cid(). The algorithm described here: http://209.85.129.104/search?q=cache:5PV5MMRkBWkJ:ftp://info.mcs.anl.gov/pub/tech_reports/reports/P1382.pdf+MPI+communicator+dup+algorithm=en=clnk=2 in section 5.3 works without it, and we can do something similar.
> >> george.
> >> On Sep 11, 2007, at 9:23 AM, g...@osl.iu.edu wrote:
> >>> Author: gleb
> >>> Date: 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007)
> >>> New Revision: 16088
> >>> URL: https://svn.open-mpi.org/trac/ompi/changeset/16088
> >>> Log: The code tries to prevent itself from running for more than one communicator simultaneously, but is doing it incorrectly. If the function is already running for one communicator and it is called from another thread for another communicator with a lower cid, the check comm->c_contextid != ompi_comm_lowest_cid() will fail and the function will be executed for two different communicators by two threads simultaneously. There is nothing in the algorithm that prevents it from running simultaneously for different communicators as far as I can see, but ompi_comm_unregister_cid() assumes that it is always called for the communicator with the lowest cid, and this is not always the case. This patch removes the bogus lowest-cid check and fixes ompi_comm_unregister_cid() to properly remove the cid from the list.
> >>> Text files modified:
> >>>    trunk/ompi/communicator/comm_cid.c | 24 ++++++++++++------------
> >>>    1 files changed, 12 insertions(+), 12 deletions(-)
> >>> Modified:
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088
Gleb, in the scenario which you describe in the comment to the patch, what should happen is that the communicator with the cid which already started the allreduce will basically 'hang' until the other processes 'allow' the lower cids to continue. It should basically be blocked in the allreduce.

However, here is something where we might have problems with the Sun thread tests (and we discussed this with Terry already): the cid allocation algorithm as implemented in Open MPI assumes (this was/is my/our understanding of the standard) that communicator creation is a collective operation. This means you can not have a comm_create and another allreduce on the same communicator running in different threads, because these allreduces will mix up and produce nonsense results. We fixed the case where all collective operations are comm_creates, but if some of the threads are in a comm_create and some are in an allreduce on the same communicator, it won't work.

Gleb Natapov wrote: On Tue, Sep 11, 2007 at 10:14:30AM -0400, George Bosilca wrote: Gleb, This patch is not correct. The code preventing the registration of the same communicator twice is later in the code (same file, in the function ompi_comm_register_cid, line 326). Once the function ompi_comm_register_cid I saw this code and the comment. The problem is not with the same communicator but with different communicators. is called, we know that each communicator only handles one "communicator creation" function at a time. Therefore, we want to give priority to the smallest com_id, which is what happens in the code you removed. The code I removed was doing it wrongly, i.e., the algorithm sometimes is executed for different communicators simultaneously by different threads. Think about the case where the function is running for cid 1 and then another thread runs it for cid 0. cid 0 will proceed although the function is executing on another CPU. And this is not something theoretical; it is happening with Sun's thread test suite mpi_coll test case. Without the condition in the ompi_comm_register_cid (each communicator only gets registered once) your comment makes sense. However, with the condition, your patch allows a dead-end situation where 2 processes try to create communicators in multiple threads, and they will never succeed, simply because they will not order the creation based on the com_id. If the algorithm is really prone to deadlock when it is concurrently executed for several different communicators (I haven't checked this), then we may want to fix the original code to really prevent two threads from entering the function, but then I don't see the reason for all those complications with ompi_comm_register_cid()/ompi_comm_unregister_cid(). The algorithm described here: http://209.85.129.104/search?q=cache:5PV5MMRkBWkJ:ftp://info.mcs.anl.gov/pub/tech_reports/reports/P1382.pdf+MPI+communicator+dup+algorithm=en=clnk=2 in section 5.3 works without it, and we can do something similar. george.

On Sep 11, 2007, at 9:23 AM, g...@osl.iu.edu wrote: Author: gleb Date: 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007) New Revision: 16088 URL: https://svn.open-mpi.org/trac/ompi/changeset/16088 Log: The code tries to prevent itself from running for more than one communicator simultaneously, but is doing it incorrectly. If the function is already running for one communicator and it is called from another thread for another communicator with a lower cid, the check comm->c_contextid != ompi_comm_lowest_cid() will fail and the function will be executed for two different communicators by two threads simultaneously. There is nothing in the algorithm that prevents it from running simultaneously for different communicators as far as I can see, but ompi_comm_unregister_cid() assumes that it is always called for the communicator with the lowest cid, and this is not always the case. This patch removes the bogus lowest-cid check and fixes ompi_comm_unregister_cid() to properly remove the cid from the list.

Text files modified:
   trunk/ompi/communicator/comm_cid.c | 24 ++++++++++++------------
   1 files changed, 12 insertions(+), 12 deletions(-)

Modified: trunk/ompi/communicator/comm_cid.c
==============================================================================
--- trunk/ompi/communicator/comm_cid.c (original)
+++ trunk/ompi/communicator/comm_cid.c 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007)
@@ -11,6 +11,7 @@
  * All rights reserved.
  * Copyright (c) 2006-2007 University of Houston. All rights reserved.
  * Copyright (c) 2007 Cisco, Inc. All rights reserved.
+ * Copyright (c) 2007 Voltaire All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -170,15 +171,6 @@
          * This is the real algorithm described in the doc
          */
-        OPAL_THREAD_LOCK(&ompi_cid_lock);
-        if (comm->c_contextid != ompi_comm_lowest_cid() ) {
-            /* if not
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088
We don't want to prevent two threads from entering the code at the same time. The algorithm you cited supports this case. There is only one moment that is critical: the local selection of the next available cid. And this is what we try to protect there. If after the first run the collective call does not manage to figure out the correct next_cid, then we will execute the while loop again. And then this condition makes sense, as only the thread running on the smallest communicator cid will continue. This insures that it will pick up the smallest next available cid, and then its reduce operation will succeed. The other threads will wait until the selection of the next available cid is unlocked.

Without the code you removed we face a deadlock situation. Multiple threads will pick different next_cids on each process and they will never succeed with the reduce operation. And this is what we're trying to avoid with the test. george.

On Sep 11, 2007, at 10:34 AM, Gleb Natapov wrote: On Tue, Sep 11, 2007 at 10:14:30AM -0400, George Bosilca wrote: Gleb, This patch is not correct. The code preventing the registration of the same communicator twice is later in the code (same file, in the function ompi_comm_register_cid, line 326). Once the function ompi_comm_register_cid I saw this code and the comment. The problem is not with the same communicator but with different communicators. is called, we know that each communicator only handles one "communicator creation" function at a time. Therefore, we want to give priority to the smallest com_id, which is what happens in the code you removed. The code I removed was doing it wrongly, i.e., the algorithm sometimes is executed for different communicators simultaneously by different threads. Think about the case where the function is running for cid 1 and then another thread runs it for cid 0. cid 0 will proceed although the function is executing on another CPU. And this is not something theoretical; it is happening with Sun's thread test suite mpi_coll test case. Without the condition in the ompi_comm_register_cid (each communicator only gets registered once) your comment makes sense. However, with the condition, your patch allows a dead-end situation where 2 processes try to create communicators in multiple threads, and they will never succeed, simply because they will not order the creation based on the com_id. If the algorithm is really prone to deadlock when it is concurrently executed for several different communicators (I haven't checked this), then we may want to fix the original code to really prevent two threads from entering the function, but then I don't see the reason for all those complications with ompi_comm_register_cid()/ompi_comm_unregister_cid(). The algorithm described here: http://209.85.129.104/search?q=cache:5PV5MMRkBWkJ:ftp://info.mcs.anl.gov/pub/tech_reports/reports/P1382.pdf+MPI+communicator+dup+algorithm=en=clnk=2 in section 5.3 works without it, and we can do something similar. george.

On Sep 11, 2007, at 9:23 AM, g...@osl.iu.edu wrote: Author: gleb Date: 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007) New Revision: 16088 URL: https://svn.open-mpi.org/trac/ompi/changeset/16088 Log: The code tries to prevent itself from running for more than one communicator simultaneously, but is doing it incorrectly. If the function is already running for one communicator and it is called from another thread for another communicator with a lower cid, the check comm->c_contextid != ompi_comm_lowest_cid() will fail and the function will be executed for two different communicators by two threads simultaneously. There is nothing in the algorithm that prevents it from running simultaneously for different communicators as far as I can see, but ompi_comm_unregister_cid() assumes that it is always called for the communicator with the lowest cid, and this is not always the case. This patch removes the bogus lowest-cid check and fixes ompi_comm_unregister_cid() to properly remove the cid from the list.

Text files modified:
   trunk/ompi/communicator/comm_cid.c | 24 ++++++++++++------------
   1 files changed, 12 insertions(+), 12 deletions(-)

Modified: trunk/ompi/communicator/comm_cid.c
==============================================================================
--- trunk/ompi/communicator/comm_cid.c (original)
+++ trunk/ompi/communicator/comm_cid.c 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007)
@@ -11,6 +11,7 @@
  * All rights reserved.
  * Copyright (c) 2006-2007 University of Houston. All rights reserved.
  * Copyright (c) 2007 Cisco, Inc. All rights reserved.
+ * Copyright (c) 2007 Voltaire All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -170,15 +171,6 @@
          * This is the real algorithm described in the doc
          */
-        OPAL_THREAD_LOCK(&ompi_cid_lock);
-        if (comm->c_contextid !=
Re: [MTT devel] [MTT svn] svn:mtt-svn r998
Ethan -- Could you show the use case that motivated this change? Thanks.

On Sep 7, 2007, at 11:52 AM, emall...@osl.iu.edu wrote: Author: emallove Date: 2007-09-07 11:52:04 EDT (Fri, 07 Sep 2007) New Revision: 998 URL: https://svn.open-mpi.org/trac/mtt/changeset/998 Log: Escape the Perl regular expression quantifiers in `::OMPI::find_network` (for test names such as `mpic++`).

Text files modified:
   tmp/jms-new-parser/lib/MTT/Values/Functions/MPI/OMPI.pm | 3 +++
   1 files changed, 3 insertions(+), 0 deletions(-)

Modified: tmp/jms-new-parser/lib/MTT/Values/Functions/MPI/OMPI.pm
==============================================================================
--- tmp/jms-new-parser/lib/MTT/Values/Functions/MPI/OMPI.pm (original)
+++ tmp/jms-new-parser/lib/MTT/Values/Functions/MPI/OMPI.pm 2007-09-07 11:52:04 EDT (Fri, 07 Sep 2007)
@@ -98,6 +98,9 @@
     # Ignore argv[0]
     $str =~ s/^\s*\S+\s*(.+)$/\1/;
 
+    # Escape the quantifiers (for test names such as "mpi2c++")
+    $final =~ s/(\?|\*|\+|\{|\})/\\$1/g;
+
     # Ignore everything beyond $final
     $str =~ s/^(.+)\s*$final.+$/\1/;
     Debug("Examining: $str\n");

-- Jeff Squyres Cisco Systems
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088
On Tue, Sep 11, 2007 at 10:14:30AM -0400, George Bosilca wrote:
> Gleb,
> This patch is not correct. The code preventing the registration of the same communicator twice is later in the code (same file, in the function ompi_comm_register_cid, line 326). Once the function ompi_comm_register_cid

I saw this code and the comment. The problem is not with the same communicator but with different communicators.

> is called, we know that each communicator only handles one "communicator creation" function at a time. Therefore, we want to give priority to the smallest com_id, which is what happens in the code you removed.

The code I removed was doing it wrongly, i.e., the algorithm sometimes is executed for different communicators simultaneously by different threads. Think about the case where the function is running for cid 1 and then another thread runs it for cid 0. cid 0 will proceed although the function is executing on another CPU. And this is not something theoretical; it is happening with Sun's thread test suite mpi_coll test case.

> Without the condition in the ompi_comm_register_cid (each communicator only gets registered once) your comment makes sense. However, with the condition, your patch allows a dead-end situation where 2 processes try to create communicators in multiple threads, and they will never succeed, simply because they will not order the creation based on the com_id.

If the algorithm is really prone to deadlock when it is concurrently executed for several different communicators (I haven't checked this), then we may want to fix the original code to really prevent two threads from entering the function, but then I don't see the reason for all those complications with ompi_comm_register_cid()/ompi_comm_unregister_cid(). The algorithm described here: http://209.85.129.104/search?q=cache:5PV5MMRkBWkJ:ftp://info.mcs.anl.gov/pub/tech_reports/reports/P1382.pdf+MPI+communicator+dup+algorithm=en=clnk=2 in section 5.3 works without it, and we can do something similar.

> george.
> On Sep 11, 2007, at 9:23 AM, g...@osl.iu.edu wrote:
>> Author: gleb
>> Date: 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007)
>> New Revision: 16088
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/16088
>> Log: The code tries to prevent itself from running for more than one communicator simultaneously, but is doing it incorrectly. If the function is already running for one communicator and it is called from another thread for another communicator with a lower cid, the check comm->c_contextid != ompi_comm_lowest_cid() will fail and the function will be executed for two different communicators by two threads simultaneously. There is nothing in the algorithm that prevents it from running simultaneously for different communicators as far as I can see, but ompi_comm_unregister_cid() assumes that it is always called for the communicator with the lowest cid, and this is not always the case. This patch removes the bogus lowest-cid check and fixes ompi_comm_unregister_cid() to properly remove the cid from the list.
>>
>> Text files modified:
>>    trunk/ompi/communicator/comm_cid.c | 24 ++++++++++++------------
>>    1 files changed, 12 insertions(+), 12 deletions(-)
>>
>> Modified: trunk/ompi/communicator/comm_cid.c
>> ==============================================================================
>> --- trunk/ompi/communicator/comm_cid.c (original)
>> +++ trunk/ompi/communicator/comm_cid.c 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007)
>> @@ -11,6 +11,7 @@
>>  * All rights reserved.
>>  * Copyright (c) 2006-2007 University of Houston. All rights reserved.
>>  * Copyright (c) 2007 Cisco, Inc. All rights reserved.
>> + * Copyright (c) 2007 Voltaire All rights reserved.
>>  * $COPYRIGHT$
>>  *
>>  * Additional copyrights may follow
>> @@ -170,15 +171,6 @@
>>      * This is the real algorithm described in the doc
>>      */
>>
>> -    OPAL_THREAD_LOCK(&ompi_cid_lock);
>> -    if (comm->c_contextid != ompi_comm_lowest_cid() ) {
>> -        /* if not lowest cid, we do not continue, but sleep and try again */
>> -        OPAL_THREAD_UNLOCK(&ompi_cid_lock);
>> -        continue;
>> -    }
>> -    OPAL_THREAD_UNLOCK(&ompi_cid_lock);
>> -
>> -
>>     for (i=start; i < mca_pml.pml_max_contextid ; i++) {
>>         flag=ompi_pointer_array_test_and_set_item(&ompi_mpi_communicators, i, comm);
>> @@ -365,10 +357,18 @@
>>
>> static int ompi_comm_unregister_cid (uint32_t cid)
>> {
>> -    ompi_comm_reg_t *regcom=NULL;
>> -    opal_list_item_t *item=opal_list_remove_first(&ompi_registered_comms);
>> +    ompi_comm_reg_t *regcom;
>> +    opal_list_item_t *item;
>>
>> -    regcom = (ompi_comm_reg_t *) item;
>> +    for (item = opal_list_get_first(&ompi_registered_comms);
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088
Gleb,

This patch is not correct. The code preventing the registration of the same communicator twice is later in the code (same file, in the function ompi_comm_register_cid, line 326). Once the function ompi_comm_register_cid is called, we know that each communicator only handles one "communicator creation" function at a time. Therefore, we want to give priority to the smallest com_id, which is what happens in the code you removed.

Without the condition in the ompi_comm_register_cid (each communicator only gets registered once) your comment makes sense. However, with the condition, your patch allows a dead-end situation where 2 processes try to create communicators in multiple threads, and they will never succeed, simply because they will not order the creation based on the com_id.

george.

On Sep 11, 2007, at 9:23 AM, g...@osl.iu.edu wrote:

Author: gleb
Date: 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007)
New Revision: 16088
URL: https://svn.open-mpi.org/trac/ompi/changeset/16088

Log: The code tries to prevent itself from running for more than one communicator simultaneously, but is doing it incorrectly. If the function is already running for one communicator and it is called from another thread for another communicator with a lower cid, the check comm->c_contextid != ompi_comm_lowest_cid() will fail and the function will be executed for two different communicators by two threads simultaneously. There is nothing in the algorithm that prevents it from running simultaneously for different communicators as far as I can see, but ompi_comm_unregister_cid() assumes that it is always called for the communicator with the lowest cid, and this is not always the case. This patch removes the bogus lowest-cid check and fixes ompi_comm_unregister_cid() to properly remove the cid from the list.

Text files modified:
   trunk/ompi/communicator/comm_cid.c | 24 ++++++++++++------------
   1 files changed, 12 insertions(+), 12 deletions(-)

Modified: trunk/ompi/communicator/comm_cid.c
==============================================================================
--- trunk/ompi/communicator/comm_cid.c (original)
+++ trunk/ompi/communicator/comm_cid.c 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007)
@@ -11,6 +11,7 @@
  * All rights reserved.
  * Copyright (c) 2006-2007 University of Houston. All rights reserved.
  * Copyright (c) 2007 Cisco, Inc. All rights reserved.
+ * Copyright (c) 2007 Voltaire All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -170,15 +171,6 @@
          * This is the real algorithm described in the doc
          */
 
-        OPAL_THREAD_LOCK(&ompi_cid_lock);
-        if (comm->c_contextid != ompi_comm_lowest_cid() ) {
-            /* if not lowest cid, we do not continue, but sleep and try again */
-            OPAL_THREAD_UNLOCK(&ompi_cid_lock);
-            continue;
-        }
-        OPAL_THREAD_UNLOCK(&ompi_cid_lock);
-
-
         for (i=start; i < mca_pml.pml_max_contextid ; i++) {
             flag=ompi_pointer_array_test_and_set_item(&ompi_mpi_communicators, i, comm);
 
@@ -365,10 +357,18 @@
 
 static int ompi_comm_unregister_cid (uint32_t cid)
 {
-    ompi_comm_reg_t *regcom=NULL;
-    opal_list_item_t *item=opal_list_remove_first(&ompi_registered_comms);
+    ompi_comm_reg_t *regcom;
+    opal_list_item_t *item;
 
-    regcom = (ompi_comm_reg_t *) item;
+    for (item = opal_list_get_first(&ompi_registered_comms);
+         item != opal_list_get_end(&ompi_registered_comms);
+         item = opal_list_get_next(item)) {
+        regcom = (ompi_comm_reg_t *)item;
+        if(regcom->cid == cid) {
+            opal_list_remove_item(&ompi_registered_comms, item);
+            break;
+        }
+    }
     OBJ_RELEASE(regcom);
     return OMPI_SUCCESS;
 }