Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088
Gleb Natapov wrote: On Tue, Sep 11, 2007 at 10:00:07AM -0500, Edgar Gabriel wrote: Gleb, in the scenario which you describe in the comment to the patch, what should happen is, that the communicator with the cid which started already the allreduce will basically 'hang' until the other processes 'allow' the lower cids to continue. It should basically be blocked in the allreduce. Why? Two threads are allowed to run allreduce simultaneously for different communicators. Are they? They are, but they might never agree on the cid. This is simply how the algorithm was designed originally -- which does not mean, that it has to remain this way, just to explain its behavior and the intent. See the design doc for that in ompi-docs in the January 2004 repository. Lets assume, that we have n procs with 2 threads, and both threads do a comm_create at the same time, input cid 1 and cid 2. N-1 processes let cid 1 start because that's the lower number. However, one process let cid 2 start because the other thread was late. What would happen in the algorithm is nobody responds to cid 2, so it would hang. As soon as the other thread with cid 1 enters the comm_create, it would be allowed to run, this operation would finish. The other threads would then allow cid 2 to enter, the 'hanging' process would be released. However, here is something, where we might have problems with the sun thread tests (and we discussed that with Terry already): the cid allocation algorithm as implemented in Open MPI assumes ( -- this was/is my/our understanding of the standard --) that a communicator creation is a collective operation. This means, you can not have a comm_create and another allreduce of the same communicator running in different threads, because these allreduces will mix up and produce non-sense results. We fixed the case, if all collective operations are comm_creates, but if some of the threads are in a comm_create and some are in allreduce on the same communicator, it won't work. Correct, but this is not what happens with mt_coll test. mt_coll calls commdup on the same communicator in different threads concurrently, but we handle this case inside ompi_comm_nextcid(). Gleb Natapov wrote: On Tue, Sep 11, 2007 at 10:14:30AM -0400, George Bosilca wrote: Gleb, This patch is not correct. The code preventing the registration of the same communicator twice is later in the code (same file in the function ompi_comm_register_cid line 326). Once the function ompi_comm_register_cid I saw this code and the comment. The problem is not with the same communicator but with different communicators. is called, we know that each communicator only handle one "communicator creation" function at the same time. Therefore, we want to give priority to the smallest com_id, which is what happens in the code you removed. The code I removed was doing it wrongly. I.e the algorithm sometimes is executed for different communicators simultaneously by different threads. Think about the case where the function is running for cid 1 and then another thread runs it for cid 0. cid 0 will proceed although the function is executed on another CPU. And this is not something theoretical, that is happening with sun's thread test suit mpi_coll test case. Without the condition in the ompi_comm_register_cid (each communicator only get registered once) your comment make sense. However, with the condition your patch allow a dead end situation, while 2 processes try to create communicators in multiple threads, and they will never succeed, simply because they will not order the creation based on the com_id. If the algorithm is really prone to deadlock in case it is concurrently executed for several different communicators (I haven't check this), then we may want to fix original code to really prevent two threads to enter the function, but then I don't see the reason for all those complications with ompi_comm_register_cid()/ompi_comm_unregister_cid() The algorithm described here: http://209.85.129.104/search?q=cache:5PV5MMRkBWkJ:ftp://info.mcs.anl.gov/pub/tech_reports/reports/P1382.pdf+MPI+communicator+dup+algorithm=en=clnk=2 in section 5.3 works without it and we can do something similar. george. On Sep 11, 2007, at 9:23 AM, g...@osl.iu.edu wrote: Author: gleb Date: 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007) New Revision: 16088 URL: https://svn.open-mpi.org/trac/ompi/changeset/16088 Log: The code tries to prevent itself from running for more then one communicator simultaneously, but is doing it incorrectly. If the function is running already for one communicator and it is called from another thread for other communicator with lower cid the check comm->c_contextid != ompi_comm_lowest_cid() will fail and the function will be executed for two different communicators by two threads simultaneously. There is nothing in the algorithm that prevent it from been running simultaneously for different communicators as far as
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088
On Tue, Sep 11, 2007 at 11:30:53AM -0400, George Bosilca wrote: > > On Sep 11, 2007, at 11:05 AM, Gleb Natapov wrote: > >> On Tue, Sep 11, 2007 at 10:54:25AM -0400, George Bosilca wrote: >>> We don't want to prevent two thread from entering the code is same time. >>> The algorithm you cited support this case. There is only one moment that >>> is >> Are you sure it support this case? There is a global var mask_in_use >> that prevent multiple access. > > I'm unable to find the mask_in_use global variable. Where it is ? I thought by "algorithm you cited" you meant the algorithm described in the link I provided. There is mask_in_use global var there that IMO ensure that algorithm is executed for only one communicator simultaneously. > > george. > >> >>> critical. The local selection of the next available cid. And this is what >>> we try to protect there. If after the first run, the collective call do >>> not >>> manage to figure out the correct next_cid then we will execute the while >>> loop again. And then this condition make sense, as only the thread >>> running >>> on the smallest communicator cid will continue. This insure that it will >>> pickup the smallest next available cid, and then it's reduce operation >>> will >>> succeed. The other threads will wait until the selection of the next >>> available cid is unlocked. >>> >>> Without the code you removed we face a deadlock situation. Multiple >>> threads >>> will pick different next_cid on each process and thy will never succeed >>> with the reduce operation. And this is what we're trying to avoid with >>> the >>> test. >> OK. I think now I get the idea behind this test. I'll restore it and >> leave ompi_comm_unregister_cid() fix in place. Is this OK? >> >>> >>> george. >>> >>> On Sep 11, 2007, at 10:34 AM, Gleb Natapov wrote: >>> On Tue, Sep 11, 2007 at 10:14:30AM -0400, George Bosilca wrote: > Gleb, > > This patch is not correct. The code preventing the registration of the > same > communicator twice is later in the code (same file in the function > ompi_comm_register_cid line 326). Once the function > ompi_comm_register_cid I saw this code and the comment. The problem is not with the same communicator but with different communicators. > is called, we know that each communicator only handle one "communicator > creation" function at the same time. Therefore, we want to give > priority > to > the smallest com_id, which is what happens in the code you removed. The code I removed was doing it wrongly. I.e the algorithm sometimes is executed for different communicators simultaneously by different threads. Think about the case where the function is running for cid 1 and then another thread runs it for cid 0. cid 0 will proceed although the function is executed on another CPU. And this is not something theoretical, that is happening with sun's thread test suit mpi_coll test case. > > Without the condition in the ompi_comm_register_cid (each communicator > only > get registered once) your comment make sense. However, with the > condition > your patch allow a dead end situation, while 2 processes try to create > communicators in multiple threads, and they will never succeed, simply > because they will not order the creation based on the com_id. If the algorithm is really prone to deadlock in case it is concurrently executed for several different communicators (I haven't check this), then we may want to fix original code to really prevent two threads to enter the function, but then I don't see the reason for all those complications with ompi_comm_register_cid()/ompi_comm_unregister_cid() The algorithm described here: http://209.85.129.104/search?q=cache:5PV5MMRkBWkJ:ftp://info.mcs.anl.gov/pub/tech_reports/reports/P1382.pdf+MPI+communicator+dup+algorithm=en=clnk=2 in section 5.3 works without it and we can do something similar. > > george. > > > > On Sep 11, 2007, at 9:23 AM, g...@osl.iu.edu wrote: > >> Author: gleb >> Date: 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007) >> New Revision: 16088 >> URL: https://svn.open-mpi.org/trac/ompi/changeset/16088 >> >> Log: >> The code tries to prevent itself from running for more then one >> communicator >> simultaneously, but is doing it incorrectly. If the function is >> running >> already >> for one communicator and it is called from another thread for other >> communicator >> with lower cid the check comm->c_contextid != ompi_comm_lowest_cid() >> will fail and the function will be executed for two different >> communicators by >> two threads simultaneously. There is nothing in the algorithm that >> prevent >> it >> from been running simultaneously for different communicators as far as >> I >> can see, >>
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088
On Sep 11, 2007, at 11:05 AM, Gleb Natapov wrote: On Tue, Sep 11, 2007 at 10:54:25AM -0400, George Bosilca wrote: We don't want to prevent two thread from entering the code is same time. The algorithm you cited support this case. There is only one moment that is Are you sure it support this case? There is a global var mask_in_use that prevent multiple access. I'm unable to find the mask_in_use global variable. Where it is ? george. critical. The local selection of the next available cid. And this is what we try to protect there. If after the first run, the collective call do not manage to figure out the correct next_cid then we will execute the while loop again. And then this condition make sense, as only the thread running on the smallest communicator cid will continue. This insure that it will pickup the smallest next available cid, and then it's reduce operation will succeed. The other threads will wait until the selection of the next available cid is unlocked. Without the code you removed we face a deadlock situation. Multiple threads will pick different next_cid on each process and thy will never succeed with the reduce operation. And this is what we're trying to avoid with the test. OK. I think now I get the idea behind this test. I'll restore it and leave ompi_comm_unregister_cid() fix in place. Is this OK? george. On Sep 11, 2007, at 10:34 AM, Gleb Natapov wrote: On Tue, Sep 11, 2007 at 10:14:30AM -0400, George Bosilca wrote: Gleb, This patch is not correct. The code preventing the registration of the same communicator twice is later in the code (same file in the function ompi_comm_register_cid line 326). Once the function ompi_comm_register_cid I saw this code and the comment. The problem is not with the same communicator but with different communicators. is called, we know that each communicator only handle one "communicator creation" function at the same time. Therefore, we want to give priority to the smallest com_id, which is what happens in the code you removed. The code I removed was doing it wrongly. I.e the algorithm sometimes is executed for different communicators simultaneously by different threads. Think about the case where the function is running for cid 1 and then another thread runs it for cid 0. cid 0 will proceed although the function is executed on another CPU. And this is not something theoretical, that is happening with sun's thread test suit mpi_coll test case. Without the condition in the ompi_comm_register_cid (each communicator only get registered once) your comment make sense. However, with the condition your patch allow a dead end situation, while 2 processes try to create communicators in multiple threads, and they will never succeed, simply because they will not order the creation based on the com_id. If the algorithm is really prone to deadlock in case it is concurrently executed for several different communicators (I haven't check this), then we may want to fix original code to really prevent two threads to enter the function, but then I don't see the reason for all those complications with ompi_comm_register_cid()/ ompi_comm_unregister_cid() The algorithm described here: http://209.85.129.104/search?q=cache:5PV5MMRkBWkJ:ftp:// info.mcs.anl.gov/pub/tech_reports/reports/P1382.pdf+MPI +communicator+dup+algorithm=en=clnk=2 in section 5.3 works without it and we can do something similar. george. On Sep 11, 2007, at 9:23 AM, g...@osl.iu.edu wrote: Author: gleb Date: 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007) New Revision: 16088 URL: https://svn.open-mpi.org/trac/ompi/changeset/16088 Log: The code tries to prevent itself from running for more then one communicator simultaneously, but is doing it incorrectly. If the function is running already for one communicator and it is called from another thread for other communicator with lower cid the check comm->c_contextid != ompi_comm_lowest_cid() will fail and the function will be executed for two different communicators by two threads simultaneously. There is nothing in the algorithm that prevent it from been running simultaneously for different communicators as far as I can see, but ompi_comm_unregister_cid() assumes that it is always called for a communicator with the lowest cid and this is not always the case. This patch removes bogus lowest cid check and fix ompi_comm_register_cid() to properly remove cid from the list. Text files modified: trunk/ompi/communicator/comm_cid.c |24 +++ + 1 files changed, 12 insertions(+), 12 deletions(-) Modified: trunk/ompi/communicator/comm_cid.c == --- trunk/ompi/communicator/comm_cid.c (original) +++ trunk/ompi/communicator/comm_cid.c 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007) @@ -11,6 +11,7 @@ * All rights reserved. *
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088
On Tue, Sep 11, 2007 at 10:00:07AM -0500, Edgar Gabriel wrote: > Gleb, > > in the scenario which you describe in the comment to the patch, what > should happen is, that the communicator with the cid which started > already the allreduce will basically 'hang' until the other processes > 'allow' the lower cids to continue. It should basically be blocked in > the allreduce. Why? Two threads are allowed to run allreduce simultaneously for different communicators. Are they? > > However, here is something, where we might have problems with the sun > thread tests (and we discussed that with Terry already): the cid > allocation algorithm as implemented in Open MPI assumes ( -- this was/is > my/our understanding of the standard --) that a communicator creation is > a collective operation. This means, you can not have a comm_create and > another allreduce of the same communicator running in different threads, > because these allreduces will mix up and produce non-sense results. We > fixed the case, if all collective operations are comm_creates, but if > some of the threads are in a comm_create and some are in allreduce on > the same communicator, it won't work. Correct, but this is not what happens with mt_coll test. mt_coll calls commdup on the same communicator in different threads concurrently, but we handle this case inside ompi_comm_nextcid(). > > > Gleb Natapov wrote: > > On Tue, Sep 11, 2007 at 10:14:30AM -0400, George Bosilca wrote: > >> Gleb, > >> > >> This patch is not correct. The code preventing the registration of the > >> same > >> communicator twice is later in the code (same file in the function > >> ompi_comm_register_cid line 326). Once the function ompi_comm_register_cid > > I saw this code and the comment. The problem is not with the same > > communicator but with different communicators. > > > >> is called, we know that each communicator only handle one "communicator > >> creation" function at the same time. Therefore, we want to give priority > >> to > >> the smallest com_id, which is what happens in the code you removed. > > The code I removed was doing it wrongly. I.e the algorithm sometimes is > > executed > > for different communicators simultaneously by different threads. Think > > about the case where the function is running for cid 1 and then another > > thread runs it for cid 0. cid 0 will proceed although the function is > > executed on another CPU. And this is not something theoretical, that > > is happening with sun's thread test suit mpi_coll test case. > > > >> Without the condition in the ompi_comm_register_cid (each communicator > >> only > >> get registered once) your comment make sense. However, with the condition > >> your patch allow a dead end situation, while 2 processes try to create > >> communicators in multiple threads, and they will never succeed, simply > >> because they will not order the creation based on the com_id. > > If the algorithm is really prone to deadlock in case it is concurrently > > executed for several different communicators (I haven't check this), > > then we may want to fix original code to really prevent two threads to > > enter the function, but then I don't see the reason for all those > > complications with ompi_comm_register_cid()/ompi_comm_unregister_cid() > > The algorithm described here: > > http://209.85.129.104/search?q=cache:5PV5MMRkBWkJ:ftp://info.mcs.anl.gov/pub/tech_reports/reports/P1382.pdf+MPI+communicator+dup+algorithm=en=clnk=2 > > in section 5.3 works without it and we can do something similar. > > > >> george. > >> > >> > >> > >> On Sep 11, 2007, at 9:23 AM, g...@osl.iu.edu wrote: > >> > >>> Author: gleb > >>> Date: 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007) > >>> New Revision: 16088 > >>> URL: https://svn.open-mpi.org/trac/ompi/changeset/16088 > >>> > >>> Log: > >>> The code tries to prevent itself from running for more then one > >>> communicator > >>> simultaneously, but is doing it incorrectly. If the function is running > >>> already > >>> for one communicator and it is called from another thread for other > >>> communicator > >>> with lower cid the check comm->c_contextid != ompi_comm_lowest_cid() > >>> will fail and the function will be executed for two different > >>> communicators by > >>> two threads simultaneously. There is nothing in the algorithm that > >>> prevent > >>> it > >>> from been running simultaneously for different communicators as far as I > >>> can see, > >>> but ompi_comm_unregister_cid() assumes that it is always called for a > >>> communicator > >>> with the lowest cid and this is not always the case. This patch removes > >>> bogus > >>> lowest cid check and fix ompi_comm_register_cid() to properly remove cid > >>> from > >>> the list. > >>> > >>> Text files modified: > >>>trunk/ompi/communicator/comm_cid.c |24 > >>>1 files changed, 12 insertions(+), 12 deletions(-) > >>> > >>> Modified:
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088
Gleb, in the scenario which you describe in the comment to the patch, what should happen is, that the communicator with the cid which started already the allreduce will basically 'hang' until the other processes 'allow' the lower cids to continue. It should basically be blocked in the allreduce. However, here is something, where we might have problems with the sun thread tests (and we discussed that with Terry already): the cid allocation algorithm as implemented in Open MPI assumes ( -- this was/is my/our understanding of the standard --) that a communicator creation is a collective operation. This means, you can not have a comm_create and another allreduce of the same communicator running in different threads, because these allreduces will mix up and produce non-sense results. We fixed the case, if all collective operations are comm_creates, but if some of the threads are in a comm_create and some are in allreduce on the same communicator, it won't work. Gleb Natapov wrote: On Tue, Sep 11, 2007 at 10:14:30AM -0400, George Bosilca wrote: Gleb, This patch is not correct. The code preventing the registration of the same communicator twice is later in the code (same file in the function ompi_comm_register_cid line 326). Once the function ompi_comm_register_cid I saw this code and the comment. The problem is not with the same communicator but with different communicators. is called, we know that each communicator only handle one "communicator creation" function at the same time. Therefore, we want to give priority to the smallest com_id, which is what happens in the code you removed. The code I removed was doing it wrongly. I.e the algorithm sometimes is executed for different communicators simultaneously by different threads. Think about the case where the function is running for cid 1 and then another thread runs it for cid 0. cid 0 will proceed although the function is executed on another CPU. And this is not something theoretical, that is happening with sun's thread test suit mpi_coll test case. Without the condition in the ompi_comm_register_cid (each communicator only get registered once) your comment make sense. However, with the condition your patch allow a dead end situation, while 2 processes try to create communicators in multiple threads, and they will never succeed, simply because they will not order the creation based on the com_id. If the algorithm is really prone to deadlock in case it is concurrently executed for several different communicators (I haven't check this), then we may want to fix original code to really prevent two threads to enter the function, but then I don't see the reason for all those complications with ompi_comm_register_cid()/ompi_comm_unregister_cid() The algorithm described here: http://209.85.129.104/search?q=cache:5PV5MMRkBWkJ:ftp://info.mcs.anl.gov/pub/tech_reports/reports/P1382.pdf+MPI+communicator+dup+algorithm=en=clnk=2 in section 5.3 works without it and we can do something similar. george. On Sep 11, 2007, at 9:23 AM, g...@osl.iu.edu wrote: Author: gleb Date: 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007) New Revision: 16088 URL: https://svn.open-mpi.org/trac/ompi/changeset/16088 Log: The code tries to prevent itself from running for more then one communicator simultaneously, but is doing it incorrectly. If the function is running already for one communicator and it is called from another thread for other communicator with lower cid the check comm->c_contextid != ompi_comm_lowest_cid() will fail and the function will be executed for two different communicators by two threads simultaneously. There is nothing in the algorithm that prevent it from been running simultaneously for different communicators as far as I can see, but ompi_comm_unregister_cid() assumes that it is always called for a communicator with the lowest cid and this is not always the case. This patch removes bogus lowest cid check and fix ompi_comm_register_cid() to properly remove cid from the list. Text files modified: trunk/ompi/communicator/comm_cid.c |24 1 files changed, 12 insertions(+), 12 deletions(-) Modified: trunk/ompi/communicator/comm_cid.c == --- trunk/ompi/communicator/comm_cid.c (original) +++ trunk/ompi/communicator/comm_cid.c 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007) @@ -11,6 +11,7 @@ * All rights reserved. * Copyright (c) 2006-2007 University of Houston. All rights reserved. * Copyright (c) 2007 Cisco, Inc. All rights reserved. + * Copyright (c) 2007 Voltaire All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -170,15 +171,6 @@ * This is the real algorithm described in the doc */ -OPAL_THREAD_LOCK(_cid_lock); -if (comm->c_contextid != ompi_comm_lowest_cid() ) { -/* if not
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088
We don't want to prevent two thread from entering the code is same time. The algorithm you cited support this case. There is only one moment that is critical. The local selection of the next available cid. And this is what we try to protect there. If after the first run, the collective call do not manage to figure out the correct next_cid then we will execute the while loop again. And then this condition make sense, as only the thread running on the smallest communicator cid will continue. This insure that it will pickup the smallest next available cid, and then it's reduce operation will succeed. The other threads will wait until the selection of the next available cid is unlocked. Without the code you removed we face a deadlock situation. Multiple threads will pick different next_cid on each process and thy will never succeed with the reduce operation. And this is what we're trying to avoid with the test. george. On Sep 11, 2007, at 10:34 AM, Gleb Natapov wrote: On Tue, Sep 11, 2007 at 10:14:30AM -0400, George Bosilca wrote: Gleb, This patch is not correct. The code preventing the registration of the same communicator twice is later in the code (same file in the function ompi_comm_register_cid line 326). Once the function ompi_comm_register_cid I saw this code and the comment. The problem is not with the same communicator but with different communicators. is called, we know that each communicator only handle one "communicator creation" function at the same time. Therefore, we want to give priority to the smallest com_id, which is what happens in the code you removed. The code I removed was doing it wrongly. I.e the algorithm sometimes is executed for different communicators simultaneously by different threads. Think about the case where the function is running for cid 1 and then another thread runs it for cid 0. cid 0 will proceed although the function is executed on another CPU. And this is not something theoretical, that is happening with sun's thread test suit mpi_coll test case. Without the condition in the ompi_comm_register_cid (each communicator only get registered once) your comment make sense. However, with the condition your patch allow a dead end situation, while 2 processes try to create communicators in multiple threads, and they will never succeed, simply because they will not order the creation based on the com_id. If the algorithm is really prone to deadlock in case it is concurrently executed for several different communicators (I haven't check this), then we may want to fix original code to really prevent two threads to enter the function, but then I don't see the reason for all those complications with ompi_comm_register_cid()/ompi_comm_unregister_cid() The algorithm described here: http://209.85.129.104/search?q=cache:5PV5MMRkBWkJ:ftp:// info.mcs.anl.gov/pub/tech_reports/reports/P1382.pdf+MPI+communicator +dup+algorithm=en=clnk=2 in section 5.3 works without it and we can do something similar. george. On Sep 11, 2007, at 9:23 AM, g...@osl.iu.edu wrote: Author: gleb Date: 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007) New Revision: 16088 URL: https://svn.open-mpi.org/trac/ompi/changeset/16088 Log: The code tries to prevent itself from running for more then one communicator simultaneously, but is doing it incorrectly. If the function is running already for one communicator and it is called from another thread for other communicator with lower cid the check comm->c_contextid != ompi_comm_lowest_cid() will fail and the function will be executed for two different communicators by two threads simultaneously. There is nothing in the algorithm that prevent it from been running simultaneously for different communicators as far as I can see, but ompi_comm_unregister_cid() assumes that it is always called for a communicator with the lowest cid and this is not always the case. This patch removes bogus lowest cid check and fix ompi_comm_register_cid() to properly remove cid from the list. Text files modified: trunk/ompi/communicator/comm_cid.c |24 +++ + 1 files changed, 12 insertions(+), 12 deletions(-) Modified: trunk/ompi/communicator/comm_cid.c == --- trunk/ompi/communicator/comm_cid.c (original) +++ trunk/ompi/communicator/comm_cid.c 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007) @@ -11,6 +11,7 @@ * All rights reserved. * Copyright (c) 2006-2007 University of Houston. All rights reserved. * Copyright (c) 2007 Cisco, Inc. All rights reserved. + * Copyright (c) 2007 Voltaire All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -170,15 +171,6 @@ * This is the real algorithm described in the doc */ -OPAL_THREAD_LOCK(_cid_lock); -if (comm->c_contextid !=
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088
On Tue, Sep 11, 2007 at 10:14:30AM -0400, George Bosilca wrote: > Gleb, > > This patch is not correct. The code preventing the registration of the same > communicator twice is later in the code (same file in the function > ompi_comm_register_cid line 326). Once the function ompi_comm_register_cid I saw this code and the comment. The problem is not with the same communicator but with different communicators. > is called, we know that each communicator only handle one "communicator > creation" function at the same time. Therefore, we want to give priority to > the smallest com_id, which is what happens in the code you removed. The code I removed was doing it wrongly. I.e the algorithm sometimes is executed for different communicators simultaneously by different threads. Think about the case where the function is running for cid 1 and then another thread runs it for cid 0. cid 0 will proceed although the function is executed on another CPU. And this is not something theoretical, that is happening with sun's thread test suit mpi_coll test case. > > Without the condition in the ompi_comm_register_cid (each communicator only > get registered once) your comment make sense. However, with the condition > your patch allow a dead end situation, while 2 processes try to create > communicators in multiple threads, and they will never succeed, simply > because they will not order the creation based on the com_id. If the algorithm is really prone to deadlock in case it is concurrently executed for several different communicators (I haven't check this), then we may want to fix original code to really prevent two threads to enter the function, but then I don't see the reason for all those complications with ompi_comm_register_cid()/ompi_comm_unregister_cid() The algorithm described here: http://209.85.129.104/search?q=cache:5PV5MMRkBWkJ:ftp://info.mcs.anl.gov/pub/tech_reports/reports/P1382.pdf+MPI+communicator+dup+algorithm=en=clnk=2 in section 5.3 works without it and we can do something similar. > > george. > > > > On Sep 11, 2007, at 9:23 AM, g...@osl.iu.edu wrote: > >> Author: gleb >> Date: 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007) >> New Revision: 16088 >> URL: https://svn.open-mpi.org/trac/ompi/changeset/16088 >> >> Log: >> The code tries to prevent itself from running for more then one >> communicator >> simultaneously, but is doing it incorrectly. If the function is running >> already >> for one communicator and it is called from another thread for other >> communicator >> with lower cid the check comm->c_contextid != ompi_comm_lowest_cid() >> will fail and the function will be executed for two different >> communicators by >> two threads simultaneously. There is nothing in the algorithm that prevent >> it >> from been running simultaneously for different communicators as far as I >> can see, >> but ompi_comm_unregister_cid() assumes that it is always called for a >> communicator >> with the lowest cid and this is not always the case. This patch removes >> bogus >> lowest cid check and fix ompi_comm_register_cid() to properly remove cid >> from >> the list. >> >> Text files modified: >>trunk/ompi/communicator/comm_cid.c |24 >>1 files changed, 12 insertions(+), 12 deletions(-) >> >> Modified: trunk/ompi/communicator/comm_cid.c >> == >> --- trunk/ompi/communicator/comm_cid.c (original) >> +++ trunk/ompi/communicator/comm_cid.c 2007-09-11 09:23:46 EDT (Tue, >> 11 >> Sep 2007) >> @@ -11,6 +11,7 @@ >> * All rights reserved. >> * Copyright (c) 2006-2007 University of Houston. All rights reserved. >> * Copyright (c) 2007 Cisco, Inc. All rights reserved. >> + * Copyright (c) 2007 Voltaire All rights reserved. >> * $COPYRIGHT$ >> * >> * Additional copyrights may follow >> @@ -170,15 +171,6 @@ >> * This is the real algorithm described in the doc >> */ >> >> -OPAL_THREAD_LOCK(_cid_lock); >> -if (comm->c_contextid != ompi_comm_lowest_cid() ) { >> -/* if not lowest cid, we do not continue, but sleep and >> try again */ >> -OPAL_THREAD_UNLOCK(_cid_lock); >> -continue; >> -} >> -OPAL_THREAD_UNLOCK(_cid_lock); >> - >> - >> for (i=start; i < mca_pml.pml_max_contextid ; i++) { >> >> flag=ompi_pointer_array_test_and_set_item(_mpi_communicators, >>i, comm); >> @@ -365,10 +357,18 @@ >> >> static int ompi_comm_unregister_cid (uint32_t cid) >> { >> -ompi_comm_reg_t *regcom=NULL; >> -opal_list_item_t >> *item=opal_list_remove_first(_registered_comms); >> +ompi_comm_reg_t *regcom; >> +opal_list_item_t *item; >> >> -regcom = (ompi_comm_reg_t *) item; >> +for (item = opal_list_get_first(_registered_comms);
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088
Gleb, This patch is not correct. The code preventing the registration of the same communicator twice is later in the code (same file in the function ompi_comm_register_cid line 326). Once the function ompi_comm_register_cid is called, we know that each communicator only handle one "communicator creation" function at the same time. Therefore, we want to give priority to the smallest com_id, which is what happens in the code you removed. Without the condition in the ompi_comm_register_cid (each communicator only get registered once) your comment make sense. However, with the condition your patch allow a dead end situation, while 2 processes try to create communicators in multiple threads, and they will never succeed, simply because they will not order the creation based on the com_id. george. On Sep 11, 2007, at 9:23 AM, g...@osl.iu.edu wrote: Author: gleb Date: 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007) New Revision: 16088 URL: https://svn.open-mpi.org/trac/ompi/changeset/16088 Log: The code tries to prevent itself from running for more then one communicator simultaneously, but is doing it incorrectly. If the function is running already for one communicator and it is called from another thread for other communicator with lower cid the check comm->c_contextid != ompi_comm_lowest_cid() will fail and the function will be executed for two different communicators by two threads simultaneously. There is nothing in the algorithm that prevent it from been running simultaneously for different communicators as far as I can see, but ompi_comm_unregister_cid() assumes that it is always called for a communicator with the lowest cid and this is not always the case. This patch removes bogus lowest cid check and fix ompi_comm_register_cid() to properly remove cid from the list. Text files modified: trunk/ompi/communicator/comm_cid.c |24 1 files changed, 12 insertions(+), 12 deletions(-) Modified: trunk/ompi/communicator/comm_cid.c == --- trunk/ompi/communicator/comm_cid.c (original) +++ trunk/ompi/communicator/comm_cid.c 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007) @@ -11,6 +11,7 @@ * All rights reserved. * Copyright (c) 2006-2007 University of Houston. All rights reserved. * Copyright (c) 2007 Cisco, Inc. All rights reserved. + * Copyright (c) 2007 Voltaire All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -170,15 +171,6 @@ * This is the real algorithm described in the doc */ -OPAL_THREAD_LOCK(_cid_lock); -if (comm->c_contextid != ompi_comm_lowest_cid() ) { -/* if not lowest cid, we do not continue, but sleep and try again */ -OPAL_THREAD_UNLOCK(_cid_lock); -continue; -} -OPAL_THREAD_UNLOCK(_cid_lock); - - for (i=start; i < mca_pml.pml_max_contextid ; i++) { flag=ompi_pointer_array_test_and_set_item (_mpi_communicators, i, comm); @@ -365,10 +357,18 @@ static int ompi_comm_unregister_cid (uint32_t cid) { -ompi_comm_reg_t *regcom=NULL; -opal_list_item_t *item=opal_list_remove_first (_registered_comms); +ompi_comm_reg_t *regcom; +opal_list_item_t *item; -regcom = (ompi_comm_reg_t *) item; +for (item = opal_list_get_first(_registered_comms); + item != opal_list_get_end(_registered_comms); + item = opal_list_get_next(item)) { +regcom = (ompi_comm_reg_t *)item; +if(regcom->cid == cid) { +opal_list_remove_item(_registered_comms, item); +break; +} +} OBJ_RELEASE(regcom); return OMPI_SUCCESS; } ___ svn-full mailing list svn-f...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/svn-full smime.p7s Description: S/MIME cryptographic signature