Sorry, typo: 314, not 313.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985

On May 17, 2011, at 2:02 PM, Brock Palen wrote:

> Thanks, I thought of looking at ompi_info after I sent that note, sigh.
>
> SEND_INPLACE appears to help performance of larger messages in my synthetic
> benchmarks over regular SEND. Also, it appears that SEND_INPLACE still allows
> our code to run.
>
> We are working on getting devs access to our system and code.
>
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> bro...@umich.edu
> (734)936-1985
>
> On May 16, 2011, at 11:49 AM, George Bosilca wrote:
>
>> Here is the output of the "ompi_info --param btl openib":
>>
>> MCA btl: parameter "btl_openib_flags" (current value: <306>, data
>> source: default value)
>> BTL bit flags (general flags: SEND=1, PUT=2, GET=4, SEND_INPLACE=8,
>> RDMA_MATCHED=64, HETEROGENEOUS_RDMA=256; flags only used by the "dr"
>> PML (ignored by others): ACK=16, CHECKSUM=32, RDMA_COMPLETION=128;
>> flags only used by the "bfo" PML (ignored by others):
>> FAILOVER_SUPPORT=512)
>>
>> So the 305 flag value means: HETEROGENEOUS_RDMA | CHECKSUM | ACK | SEND.
>> Most of these flags are totally useless in the current version of Open MPI
>> (DR is not supported), so the only values that really matter are SEND |
>> HETEROGENEOUS_RDMA.
>>
>> If you want to enable the send protocol, try first with SEND | SEND_INPLACE
>> (9); if not, downgrade to SEND (1).
>>
>> george.
>>
>> On May 16, 2011, at 11:33 , Samuel K. Gutierrez wrote:
>>
>>> On May 16, 2011, at 8:53 AM, Brock Palen wrote:
>>>
>>>> On May 16, 2011, at 10:23 AM, Samuel K. Gutierrez wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Just out of curiosity - what happens when you add the following MCA
>>>>> option to your openib runs?
>>>>>
>>>>> -mca btl_openib_flags 305
>>>>
>>>> You, Sir, found the magic combination.
>>>
>>> :-) - cool.
>>>
>>> Developers - does this smell like a registered memory availability hang?
>>>
>>>> I verified this lets IMB and CRASH progress past their lockup points.
>>>> I will have a user test this.
>>>
>>> Please let us know what you find.
>>>
>>>> Is this an ok option to put in our environment? What does 305 mean?
>>>
>>> There may be a performance hit associated with this configuration, but if
>>> it lets your users run, then I don't see a problem with adding it to your
>>> environment.
>>>
>>> If I'm reading things correctly, 305 turns off RDMA PUT/GET and turns on
>>> SEND.
>>>
>>> OpenFabrics gurus - please correct me if I'm wrong :-).
>>>
>>> Samuel Gutierrez
>>> Los Alamos National Laboratory
>>>
>>>> Brock Palen
>>>> www.umich.edu/~brockp
>>>> Center for Advanced Computing
>>>> bro...@umich.edu
>>>> (734)936-1985
>>>>
>>>>> Thanks,
>>>>>
>>>>> Samuel Gutierrez
>>>>> Los Alamos National Laboratory
>>>>>
>>>>> On May 13, 2011, at 2:38 PM, Brock Palen wrote:
>>>>>
>>>>>> On May 13, 2011, at 4:09 PM, Dave Love wrote:
>>>>>>
>>>>>>> Jeff Squyres <jsquy...@cisco.com> writes:
>>>>>>>
>>>>>>>> On May 11, 2011, at 3:21 PM, Dave Love wrote:
>>>>>>>>
>>>>>>>>> We can reproduce it with IMB. We could provide access, but we'd
>>>>>>>>> have to negotiate with the owners of the relevant nodes to give
>>>>>>>>> you interactive access to them. Maybe Brock's would be more
>>>>>>>>> accessible? (If you contact me, I may not be able to respond for
>>>>>>>>> a few days.)
>>>>>>>>
>>>>>>>> Brock has replied off-list that he, too, is able to reliably reproduce
>>>>>>>> the issue with IMB, and is working to get access for us. Many thanks
>>>>>>>> for your offer; let's see where Brock's access takes us.
>>>>>>>
>>>>>>> Good. Let me know if we could be useful.
>>>>>>>
>>>>>>>>>> -- we have not closed this issue,
>>>>>>>>>
>>>>>>>>> Which issue? I couldn't find a relevant-looking one.
>>>>>>>>
>>>>>>>> https://svn.open-mpi.org/trac/ompi/ticket/2714
>>>>>>>
>>>>>>> Thanks.
>>>>>>> In case it's useful info, it hangs for me with 1.5.3 & np=32 on
>>>>>>> connectx with more than one collective I can't recall.
>>>>>>
>>>>>> Extra data point: that ticket said it ran with mpi_preconnect_mpi 1.
>>>>>> Well, that doesn't help here; both my production code (crash) and IMB
>>>>>> still hang.
>>>>>>
>>>>>> Brock Palen
>>>>>> www.umich.edu/~brockp
>>>>>> Center for Advanced Computing
>>>>>> bro...@umich.edu
>>>>>> (734)936-1985
>>>>>>
>>>>>>> --
>>>>>>> Excuse the typping -- I have a broken wrist
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> us...@open-mpi.org
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>> George Bosilca
>> Research Assistant Professor
>> Innovative Computing Laboratory
>> Department of Electrical Engineering and Computer Science
>> University of Tennessee, Knoxville
>> http://web.eecs.utk.edu/~bosilca/
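
For anyone decoding btl_openib_flags values by hand, here is a small sketch that does the bit arithmetic. The bit values are copied from the ompi_info output quoted in this thread; the helper names decode_btl_flags and encode_btl_flags are made up for illustration and are not part of Open MPI. It only splits an integer into named bits and says nothing about what a given Open MPI version actually does with each flag.

```python
# Bit values as quoted from "ompi_info --param btl openib" in this thread.
# Assumption: other Open MPI versions may define different bits.
BTL_FLAGS = {
    "SEND": 1,
    "PUT": 2,
    "GET": 4,
    "SEND_INPLACE": 8,
    "ACK": 16,
    "CHECKSUM": 32,
    "RDMA_MATCHED": 64,
    "RDMA_COMPLETION": 128,
    "HETEROGENEOUS_RDMA": 256,
    "FAILOVER_SUPPORT": 512,
}


def decode_btl_flags(value):
    """Return the names of the bits set in value, lowest bit first.

    Hypothetical helper for illustration only.
    """
    ordered = sorted(BTL_FLAGS.items(), key=lambda item: item[1])
    return [name for name, bit in ordered if value & bit]


def encode_btl_flags(*names):
    """OR together the named bits and return the integer flag value.

    Hypothetical helper for illustration only.
    """
    value = 0
    for name in names:
        value |= BTL_FLAGS[name]
    return value


if __name__ == "__main__":
    # Values that come up in this thread.
    for value in (305, 306, 313, 314, 9, 1):
        print(value, "=", " | ".join(decode_btl_flags(value)))
```

With these bit values, 305 has neither PUT nor GET set, which matches the reading above that it turns off RDMA PUT/GET and turns on SEND, and encode_btl_flags("SEND", "SEND_INPLACE") gives the 9 suggested for trying the send protocol.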