Ethan --
Could you show the use case that motivated this change?
Thanks.
On Sep 7, 2007, at 11:52 AM, emall...@osl.iu.edu wrote:
Author: emallove
Date: 2007-09-07 11:52:04 EDT (Fri, 07 Sep 2007)
New Revision: 998
URL: https://svn.open-mpi.org/trac/mtt/changeset/998
Log:
Escape the Perl
Gleb,
This patch is not correct. The code preventing the registration of
the same communicator twice is later in the code (same file in the
function ompi_comm_register_cid line 326). Once the function
ompi_comm_register_cid is called, we know that each communicator only
handles one
On Tue, Sep 11, 2007 at 10:14:30AM -0400, George Bosilca wrote:
> Gleb,
>
> This patch is not correct. The code preventing the registration of the same
> communicator twice is later in the code (same file in the function
> ompi_comm_register_cid line 326). Once the function
We don't want to prevent two threads from entering the code at the
same time. The algorithm you cited supports this case. There is only
one moment that is critical: the local selection of the next available
cid, and this is what we try to protect there. If, after the first
run, the collective
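
To make the point concrete, here is a minimal sketch of the one critical
moment Gleb describes: the local selection of the next available cid, done
under a lock so that two threads cannot pick the same value. The names
(cid_lock, cid_in_use, pick_local_cid) are illustrative only and not the
actual Open MPI code.

    /* Illustrative only -- not the Open MPI implementation. */
    #include <pthread.h>
    #include <stdbool.h>

    #define MAX_CID 4096

    static pthread_mutex_t cid_lock = PTHREAD_MUTEX_INITIALIZER;
    static bool cid_in_use[MAX_CID];       /* this process's local view */

    /* Return the smallest locally free cid at or above 'start', or -1. */
    static int pick_local_cid(int start)
    {
        int cid = -1;

        pthread_mutex_lock(&cid_lock);     /* the one critical moment */
        for (int i = start; i < MAX_CID; i++) {
            if (!cid_in_use[i]) {
                cid_in_use[i] = true;      /* reserve before unlocking */
                cid = i;
                break;
            }
        }
        pthread_mutex_unlock(&cid_lock);

        return cid;  /* the collective agreement on this value comes later */
    }
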
Gleb,
in the scenario which you describe in the comment to the patch, what
should happen is that the communicator with the cid which has already
started the allreduce will basically 'hang' until the other processes
'allow' the lower cids to continue. It should basically be blocked in
the
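
For readers following the thread, here is a conceptual sketch of the
agreement step Edgar refers to: each process proposes its lowest locally
free cid, an allreduce takes the maximum, and a process whose proposal was
lower simply loops (the 'hang') until every process can accept the same
value. This illustrates the idea only; it is not the actual
ompi_comm_nextcid code, and the local bookkeeping here is assumed.

    #include <mpi.h>
    #include <stdbool.h>

    #define MAX_CID 4096
    static bool cid_in_use[MAX_CID];       /* per-process local view */

    /* Illustration only -- not the Open MPI implementation. */
    static int agree_on_cid(MPI_Comm comm, int my_proposal)
    {
        int agreed, ok, all_ok;

        for (;;) {
            /* Take the highest proposal so the agreed cid is at least as
             * large as what every process asked for. */
            MPI_Allreduce(&my_proposal, &agreed, 1, MPI_INT, MPI_MAX, comm);

            /* Check that the agreed cid is still free locally (another
             * communicator/thread may have taken it in the meantime). */
            ok = (agreed < MAX_CID && !cid_in_use[agreed]);
            MPI_Allreduce(&ok, &all_ok, 1, MPI_INT, MPI_LAND, comm);

            if (all_ok) {
                cid_in_use[agreed] = true;
                return agreed;
            }

            /* Otherwise stay in the loop -- this is where a communicator
             * effectively "hangs" until the other processes allow the
             * selection to converge. */
            my_proposal = agreed + 1;
        }
    }
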
On Tue, Sep 11, 2007 at 10:00:07AM -0500, Edgar Gabriel wrote:
> Gleb,
>
> in the scenario which you describe in the comment to the patch, what
> should happen is that the communicator with the cid which has already
> started the allreduce will basically 'hang' until the other processes
> 'allow'
On Sep 11, 2007, at 11:05 AM, Gleb Natapov wrote:
On Tue, Sep 11, 2007 at 10:54:25AM -0400, George Bosilca wrote:
We don't want to prevent two threads from entering the code at the
same time.
The algorithm you cited supports this case. There is only one
moment that is
Are you sure it supports
On Tue, Sep 11, 2007 at 11:30:53AM -0400, George Bosilca wrote:
>
> On Sep 11, 2007, at 11:05 AM, Gleb Natapov wrote:
>
>> On Tue, Sep 11, 2007 at 10:54:25AM -0400, George Bosilca wrote:
>>> We don't want to prevent two threads from entering the code at the same time.
>>> The algorithm you cited
Gleb Natapov wrote:
On Tue, Sep 11, 2007 at 10:00:07AM -0500, Edgar Gabriel wrote:
Gleb,
in the scenario which you describe in the comment to the patch, what
should happen is that the communicator with the cid which has already
started the allreduce will basically 'hang' until the other
David fixed a problem this morning where Coverity wasn't quite running
right because the directory where OMPI lived was changing every
night. So a few of the old runs were pruned.
--
Jeff Squyres
Cisco Systems
Hi Aurelien,
Thank you for the pointers. I was able to plug in a component to an
existing framework.
Thanks again,
Sajjad
Aurelien Bouteiller wrote on 09/08/07 at 01:34 PM to the Open MPI Developers list:
First off, I've managed to reproduce this with nbcbench using only 16
procs (two per node), and setting btl_ofud_sd_num to 12 -- eases
debugging with fewer procs to look at.
ompi_coll_tuned_alltoall_intra_basic_linear is the alltoall routine that
is being called. What I'm seeing from
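
For context, the basic linear algorithm just posts a nonblocking receive
and a nonblocking send to every peer and then waits on all of them, so each
process has roughly two outstanding requests per peer at once -- which is
presumably why a small btl_ofud_sd_num is easy to exhaust. A rough sketch
of that shape (illustrative only, not the actual tuned-collective source):

    #include <mpi.h>
    #include <stdlib.h>

    /* Rough sketch of a "basic linear" alltoall: post all irecvs and
     * isends up front, then wait.  Not the ompi_coll_tuned code. */
    static int alltoall_basic_linear(const char *sbuf, char *rbuf,
                                     int bytes_per_peer, MPI_Comm comm)
    {
        int size, nreqs = 0;
        MPI_Comm_size(comm, &size);

        MPI_Request *reqs = malloc(2 * size * sizeof(MPI_Request));

        for (int peer = 0; peer < size; peer++) {
            MPI_Irecv(rbuf + peer * bytes_per_peer, bytes_per_peer, MPI_BYTE,
                      peer, 0, comm, &reqs[nreqs++]);
        }
        for (int peer = 0; peer < size; peer++) {
            /* Every process sends to every peer at once -- with many procs
             * this can easily outrun a small send-descriptor pool. */
            MPI_Isend(sbuf + peer * bytes_per_peer, bytes_per_peer, MPI_BYTE,
                      peer, 0, comm, &reqs[nreqs++]);
        }

        int rc = MPI_Waitall(nreqs, reqs, MPI_STATUSES_IGNORE);
        free(reqs);
        return rc;
    }
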
On Sep 8, 2007, at 2:33 PM, Aurelien Bouteiller wrote:
I agree (b) is not a good idea. However, I am not very pleased by (a)
either. It totally prevents any process fault-tolerance mechanism if we
go that way. If we plan to add some failure detection mechanism to the
RTE and failure management (to
Sounds great to me.
Aurelien
On Sep 11, 2007, at 13:03, Jeff Squyres wrote:
If you genericize the concept, I think it's compatible with FT:
1. during MPI_INIT, one of the MPI processes can request a "notify"
exit pattern for the job: a process must notify the RTE before it
actually exits
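
A bare-bones illustration of what that pattern could look like from the
process side: the process tells the runtime it is about to leave on
purpose, so a disappearance without the notification can still be treated
as a failure. The rte_notify_exit() name is made up for this example; it is
not an existing ORTE call.

    #include <stdlib.h>

    /* Hypothetical illustration of the "notify" exit pattern;
     * rte_notify_exit() is an invented name, not an ORTE API. */
    extern int rte_notify_exit(int exit_code);

    static void orderly_exit(int exit_code)
    {
        rte_notify_exit(exit_code);  /* tell the RTE this exit is intentional */
        exit(exit_code);             /* only then actually go away */
    }
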
I am curious which tests are being used when running tests on larger
clusters. And by larger clusters, I mean anything with np > 128.
(Although I realize that is not very large, it is bigger than most
of the clusters I assume tests are being run on.)
I ask this because I planned on using