Isn't it possible for an MPI spawn on the same node, with the sockets provider 
(or something with a reasonably simple addressing scheme) to give a duplicate 
address? 

-- Jim
 
On 3/20/18, 2:07 PM, "ofiwg on behalf of Blocksome, Michael" 
<[email protected] on behalf of [email protected]> 
wrote:

    ew .. MPI spawn.
    
    Again, how can a rank - even if it is yet to be attached to an MPI job -
    get the same fabric endpoint address from its OFI provider as some other
    rank in the system? Is this spawn test doing something crazy like
    attach-detach-attach-detach-etc and a previous address is not being
    removed properly before the next (same) address is inserted again?
    
    I guess I don't understand the intricacies of this MPI spawn problem, and
    it's difficult for me to believe the statement "It is apparently
    non-trivial for the apps to avoid duplicate insertions" without this
    understanding. But, to me, this seems like applications/middleware just
    shouldn't be inserting a fabric endpoint address twice ... at least for
    HPC/MPI anyway. But maybe this duplicate insert scenario can still happen
    in a data center environment?
    
    -----Original Message-----
    From: Hefty, Sean 
    Sent: Tuesday, March 20, 2018 1:38 PM
    To: Blocksome, Michael <[email protected]>; [email protected]
    Subject: RE: inserting duplicate addresses into an AV
    
    The failures are related to MPI spawn tests.  This happens with Intel MPI,
    but I suspect MPICH or other MPIs may have similar problems with this test.
    
    
    > -----Original Message-----
    > From: Blocksome, Michael
    > Sent: Tuesday, March 20, 2018 11:29 AM
    > To: Hefty, Sean <[email protected]>; [email protected]
    > Subject: RE: inserting duplicate addresses into an AV
    > 
    > Which application, or which MPI, is inserting duplicate addresses? I
    > don't see how MPI could be doing this. At least the MPI
    > implementations I'm familiar with use PMI1, PMI2, or PMIx to exchange
    > addresses at job startup into a distributed key-value store, and then
    > after a barrier each MPI rank initializes its av with all these unique
    > addresses. For a duplicate address to happen multiple MPI ranks would
    > have to get the *same* local address from the OFI provider - how would
    > that happen?
    > 
    > Some providers, like bgq, can stuff all the fabric address information
    > within the 64 bits of fi_addr_t, which basically makes the
    > fi_av_insert() call a noop in FI_AV_MAP mode. So if this duplicate
    > address problem happened on bgq it would still "just work" from the
    > provider's perspective. Now MPI (or whatever is using the provider)
    > might get messed up because of it, but the fabric communication
    > operations would still work.
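
A minimal sketch of the point above, assuming a made-up address layout (a 32-bit node id plus a 16-bit endpoint index; this is not the real bgq format, and the `packed_*` names are hypothetical). When the whole address fits in the 64-bit handle, "insert" is stateless arithmetic, so inserting the same address twice simply yields the same handle:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fabric address; NOT the actual bgq layout. */
struct packed_addr {
    uint32_t node;      /* node identifier */
    uint16_t endpoint;  /* endpoint index on that node */
};

/* Stand-in for fi_addr_t, which libfabric defines as a 64-bit integer. */
typedef uint64_t packed_fi_addr_t;

/* "Insert" just packs the fields: no table, no state, nothing to collide. */
static packed_fi_addr_t packed_av_insert(struct packed_addr a)
{
    return ((uint64_t)a.node << 16) | (uint64_t)a.endpoint;
}

/* The full address can be recovered from the handle alone. */
static struct packed_addr packed_av_lookup(packed_fi_addr_t h)
{
    struct packed_addr a;
    a.node = (uint32_t)(h >> 16);
    a.endpoint = (uint16_t)(h & 0xFFFF);
    return a;
}
```

Calling `packed_av_insert()` twice with the same `struct packed_addr` returns equal handles, which is why a duplicate is harmless at the provider level in this model even if the caller's own bookkeeping is confused by it.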
    > 
    > Mike
    > 
    > -----Original Message-----
    > From: ofiwg [mailto:[email protected]] On Behalf Of
    > Hefty, Sean
    > Sent: Tuesday, March 20, 2018 11:54 AM
    > To: [email protected]
    > Subject: [ofiwg] inserting duplicate addresses into an AV
    > 
    > MPI is running into an issue that is the result of inserting the same
    > address into an AV more than once.  There is no defined behavior for
    > what a provider should do in this case.  At least one provider allows
    > the duplicate insertion, and at least one fails the call... and
    > neither works with MPI when this occurs.  :/
    > 
    > There are a couple of problems trying to define this.  In the case of
    > the provider that fails the call, the failure is detected when
    > attempting to insert the same address into a hash table.  However, not
    > all providers are easily able to detect duplicates.  Forcing them to
    > do so _may_ require the provider to perform a linear search over the
    > AV looking for a duplicate for every address that is inserted.  At
    > scale, this is a significant overhead.
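
To make the cost difference concrete, here is a toy model (not provider code; every `toy_*` name and both capacities are assumptions for illustration). An AV that keeps only a flat array has to scan all existing entries to spot a duplicate, while one that also maintains a hash index can answer with a short probe:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_AV_CAP   64   /* max entries in the toy AV */
#define TOY_HASH_CAP 128  /* open-addressing index; power of two, > TOY_AV_CAP */

struct toy_av {
    uint64_t addr[TOY_AV_CAP];    /* inserted addresses, in insertion order */
    size_t   count;
    int32_t  slot[TOY_HASH_CAP];  /* hash slot -> index into addr[], -1 = empty */
};

static void toy_av_init(struct toy_av *av)
{
    av->count = 0;
    for (size_t i = 0; i < TOY_HASH_CAP; i++)
        av->slot[i] = -1;
}

/* O(n) per insert: compare against every entry already in the AV. */
static bool toy_av_has_linear(const struct toy_av *av, uint64_t a)
{
    for (size_t i = 0; i < av->count; i++)
        if (av->addr[i] == a)
            return true;
    return false;
}

/* Multiplicative hash reduced to a slot number. */
static size_t toy_hash(uint64_t a)
{
    return (size_t)((a * 0x9E3779B97F4A7C15ULL) >> 32) & (TOY_HASH_CAP - 1);
}

/* Expected O(1) per insert: linear probing stops at the first empty slot. */
static bool toy_av_has_hashed(const struct toy_av *av, uint64_t a)
{
    for (size_t h = toy_hash(a); av->slot[h] >= 0;
         h = (h + 1) & (TOY_HASH_CAP - 1))
        if (av->addr[av->slot[h]] == a)
            return true;
    return false;
}

/* Append without any duplicate check; returns the index, or -1 if full. */
static int toy_av_append(struct toy_av *av, uint64_t a)
{
    if (av->count == TOY_AV_CAP)
        return -1;
    size_t h = toy_hash(a);
    while (av->slot[h] >= 0)
        h = (h + 1) & (TOY_HASH_CAP - 1);
    av->addr[av->count] = a;
    av->slot[h] = (int32_t)av->count;
    return (int)av->count++;
}
```

Both checks give the same answer; the difference is that the linear version touches every stored address on each insert, so building an AV of n entries costs on the order of n^2 comparisons, whereas the hashed version pays for an extra index in memory instead.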
    > 
    > Even if the decision is made to force detecting duplicates (maybe even
    > making this an AV option), there's the question of how a provider
    > should respond.  Should it insert the address twice (creating a new
    > fi_addr for it), discard the duplicate (and return the existing
    > fi_addr), or generate an error?  And does it matter if AV_TABLE or MAP
    > is used?
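
The three candidate responses can be sketched as a policy switch on a toy index-based AV. This is only an illustration: `dup_policy`, `dup_av_insert()`, and `DUP_ERR` are hypothetical names, not part of the libfabric API, and the duplicate check here is the simple linear scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum dup_policy {
    DUP_ALLOW,   /* insert again and hand back a new index */
    DUP_REUSE,   /* discard the duplicate, return the existing index */
    DUP_REJECT   /* fail the insertion */
};

#define DUP_AV_CAP 16
#define DUP_ERR    (-1)  /* hypothetical error value, not a libfabric errno */

struct dup_av {
    uint64_t addr[DUP_AV_CAP];
    int      count;
    enum dup_policy policy;
};

/* Returns the table index assigned to the address, or DUP_ERR. */
static int dup_av_insert(struct dup_av *av, uint64_t a)
{
    if (av->policy != DUP_ALLOW) {
        /* Only the policies that care about duplicates pay for the scan. */
        for (int i = 0; i < av->count; i++) {
            if (av->addr[i] != a)
                continue;
            return av->policy == DUP_REUSE ? i : DUP_ERR;
        }
    }
    if (av->count == DUP_AV_CAP)
        return DUP_ERR;
    av->addr[av->count] = a;
    return av->count++;
}
```

In this model, inserting the same address twice returns indices 0 and 1 under `DUP_ALLOW`, 0 and 0 under `DUP_REUSE`, and 0 followed by `DUP_ERR` under `DUP_REJECT`. One reason the AV type may matter: with an index-based table, a caller that assumes the i-th inserted address lands at index i would be surprised by the `DUP_REUSE` result, whereas a caller holding opaque map handles would not make that assumption.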
    > 
    > We need to know what applications need here, and how difficult it will
    > be for providers to detect duplicates.  It is apparently non-trivial
    > for the apps to avoid duplicate insertions.
    > 
    > - Sean
    > 
    > _______________________________________________
    > ofiwg mailing list
    > [email protected]
    > http://lists.openfabrics.org/mailman/listinfo/ofiwg