Re: [OMPI devel] [RFC] mca_base_select()
Excellent! Thanks Josh - both for the original work/commit and for the quick fix!

Ralph

On 5/6/08 3:58 PM, "Josh Hursey" wrote:
> Sorry about that. Looking back at the filem logic it seems that I > returned success even if select failed (and just used the 'none' > passthrough component). I committed a patch in r18389 that fixes this > problem. > > This commit now has a warning that prints on the filem verbose stream > so if a user hits something like this in the wild unexpectedly then > we can help them debug it a bit. > > Cheers, > Josh > >
> On May 6, 2008, at 2:56 PM, Ralph H Castain wrote:
> >> Hmmm...well, I hit a problem (of course!). I have mca-no-build on >> the filem >> framework on my Mac. If I just mpirun -n 3 ./hello, I get the >> following >> error: >> >> -- >> >> It looks like orte_init failed for some reason; your parallel >> process is >> likely to abort. There are many reasons that a parallel process can >> fail during orte_init; some of which are due to configuration or >> environment problems. This failure appears to be an internal failure; >> here's some additional information (which may only be relevant to an >> Open MPI developer): >> >> orte_filem_base_select failed >> --> Returned value Error (-1) instead of ORTE_SUCCESS >> >> -- >> >> >> After looking at the source code for filem_select, I can run just >> fine if I >> specify -mca filem none on the cmd line. Otherwise, it looks like your >> select logic insists that at least one component must be built and >> selectable? >> >> Is that generally true, or is your filem framework the exception? I >> think >> this would not be a good general requirement - frankly, I don't >> think it is >> good for any framework to have such a requirement. >> >> Ralph >> >> >>
>> On 5/6/08 12:09 PM, "Josh Hursey" wrote: >> >>> This has been committed in r18381 >>> >>> Please let me know if you have any problems with this commit. >>> >>> Cheers, >>> Josh >>>
>>> On May 5, 2008, at 10:41 AM, Josh Hursey wrote: >>> Awesome. The branch is updated to the latest trunk head. I encourage folks to check out this repository and make sure that it builds on their system. A normal build of the branch should be enough to find out if there are any cut-n-paste problems (though I tried to be careful, mistakes do happen). I haven't heard any problems so this is looking like it will come in tomorrow after the teleconf. I'll ask again there to see if there are any voices of concern. Cheers, Josh
On May 5, 2008, at 9:58 AM, Jeff Squyres wrote: > This all sounds good to me! >
> On Apr 29, 2008, at 6:35 PM, Josh Hursey wrote: > >> What: Add mca_base_select() and adjust frameworks & components to >> use >> it. >> Why: Consolidation of code for general goodness. >> Where: https://svn.open-mpi.org/svn/ompi/tmp-public/jjh-mca-play >> When: Code ready now. Documentation ready soon. >> Timeout: May 6, 2008 (After teleconf) [1 week] >> >> Discussion: >> --- >> For a number of years a few developers have been talking about >> creating an MCA base component selection function. For various >> reasons >> this was never implemented. Recently I decided to give it a try. >> >> A base select function will allow Open MPI to provide completely >> consistent selection behavior for many of its frameworks (18 of 31 >> to >> be exact at the moment). The primary goal of this work is to >> improve >> code maintainability through code reuse. Other benefits also >> result, >> such as a slightly smaller memory footprint.
>> >> The mca_base_select() function represents the most commonly used >> logic for component selection: Select the one component with the >> highest priority and close all of the not selected components. >> This >> function can be found at the path below in the branch: >> opal/mca/base/mca_base_components_select.c >> >> To support this I had to formalize a query() function in the >> mca_base_component_t of the form: >> int mca_base_query_component_fn(mca_base_module_t **module, int >> *priority); >> >> This function is specified after the open and close component >> functions in this structure so as to allow compatibility with >> frameworks >> that do not use the base selection logic. Frameworks that do *not* >> use >> this function are *not* affected by this commit. However, every >> component in the frameworks that use the mca_base_select function >> must >> adjust their component query function to fit the form specified above. >> >> 18 frameworks in
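[Editor's note: for illustration, a minimal sketch of a component-side query function in the form quoted above. The component name, module symbol, priority value, and return codes are hypothetical and are not taken from the actual commit; real components use their framework's types and status codes.]

typedef struct mca_base_module_t mca_base_module_t;   /* opaque stand-in for the sketch */

extern mca_base_module_t example_module;               /* the module this component would offer */

static int example_component_query(mca_base_module_t **module, int *priority)
{
    int usable = 1;     /* in a real component: check hardware, environment, MCA params, ... */

    if (!usable) {
        /* Nothing to offer; the base selection logic skips this component. */
        *module = NULL;
        *priority = 0;
        return -1;      /* a real component would return its framework's error code */
    }

    /* Offer our module; mca_base_select() keeps the highest-priority one
     * and closes all of the components that were not selected. */
    *priority = 50;
    *module = &example_module;
    return 0;           /* success */
}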
Re: [OMPI devel] [RFC] mca_base_select()
Sorry about that. Looking back at the filem logic it seems that I returned success even if select failed (and just used the 'none' passthrough component). I committed a patch in r18389 that fixes this problem. This commit now has a warning that prints on the filem verbose stream, so if a user hits something like this in the wild unexpectedly then we can help them debug it a bit. Cheers, Josh

On May 6, 2008, at 2:56 PM, Ralph H Castain wrote:
Hmmm...well, I hit a problem (of course!). I have mca-no-build on the filem framework on my Mac. If I just mpirun -n 3 ./hello, I get the following error: -- It looks like orte_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during orte_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer): orte_filem_base_select failed --> Returned value Error (-1) instead of ORTE_SUCCESS -- After looking at the source code for filem_select, I can run just fine if I specify -mca filem none on the cmd line. Otherwise, it looks like your select logic insists that at least one component must be built and selectable? Is that generally true, or is your filem framework the exception? I think this would not be a good general requirement - frankly, I don't think it is good for any framework to have such a requirement. Ralph

On 5/6/08 12:09 PM, "Josh Hursey" wrote:
This has been committed in r18381. Please let me know if you have any problems with this commit. Cheers, Josh

On May 5, 2008, at 10:41 AM, Josh Hursey wrote:
Awesome. The branch is updated to the latest trunk head. I encourage folks to check out this repository and make sure that it builds on their system. A normal build of the branch should be enough to find out if there are any cut-n-paste problems (though I tried to be careful, mistakes do happen). I haven't heard any problems so this is looking like it will come in tomorrow after the teleconf. I'll ask again there to see if there are any voices of concern. Cheers, Josh

On May 5, 2008, at 9:58 AM, Jeff Squyres wrote:
This all sounds good to me!

On Apr 29, 2008, at 6:35 PM, Josh Hursey wrote:
What: Add mca_base_select() and adjust frameworks & components to use it.
Why: Consolidation of code for general goodness.
Where: https://svn.open-mpi.org/svn/ompi/tmp-public/jjh-mca-play
When: Code ready now. Documentation ready soon.
Timeout: May 6, 2008 (After teleconf) [1 week]

Discussion:
---
For a number of years a few developers have been talking about creating an MCA base component selection function. For various reasons this was never implemented. Recently I decided to give it a try.

A base select function will allow Open MPI to provide completely consistent selection behavior for many of its frameworks (18 of 31 to be exact at the moment). The primary goal of this work is to improve code maintainability through code reuse. Other benefits also result, such as a slightly smaller memory footprint.

The mca_base_select() function represents the most commonly used logic for component selection: Select the one component with the highest priority and close all of the not selected components.
This function can be found at the path below in the branch: opal/mca/base/mca_base_components_select.c

To support this I had to formalize a query() function in the mca_base_component_t of the form:
int mca_base_query_component_fn(mca_base_module_t **module, int *priority);

This function is specified after the open and close component functions in this structure so as to allow compatibility with frameworks that do not use the base selection logic. Frameworks that do *not* use this function are *not* affected by this commit. However, every component in the frameworks that use the mca_base_select function must adjust their component query function to fit the form specified above.

18 frameworks in Open MPI have been changed. I have updated all of the components in the 18 frameworks available in the trunk on my branch. The affected frameworks are:
- OPAL Carto
- OPAL crs
- OPAL maffinity
- OPAL memchecker
- OPAL paffinity
- ORTE errmgr
- ORTE ess
- ORTE Filem
- ORTE grpcomm
- ORTE odls
- ORTE pml
- ORTE ras
- ORTE rmaps
- ORTE routed
- ORTE snapc
- OMPI crcp
- OMPI dpm
- OMPI pubsub

There was a question of the memory footprint change as a result of this commit. I used 'pmap' to determine the process memory footprint of a hello world MPI program. Static and Shared build numbers are below, along with variations on launching locally and to a single node allocated by SLURM. All of this was on Indiana University's Odin machine. We
Re: [OMPI devel] [RFC] mca_base_select()
Hmmm...well, I hit a problem (of course!). I have mca-no-build on the filem framework on my Mac. If I just mpirun -n 3 ./hello, I get the following error: -- It looks like orte_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during orte_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer): orte_filem_base_select failed --> Returned value Error (-1) instead of ORTE_SUCCESS -- After looking at the source code for filem_select, I can run just fine if I specify -mca filem none on the cmd line. Otherwise, it looks like your select logic insists that at least one component must be built and selectable? Is that generally true, or is your filem framework the exception? I think this would not be a good general requirement - frankly, I don't think it is good for any framework to have such a requirement. Ralph

On 5/6/08 12:09 PM, "Josh Hursey" wrote:
> This has been committed in r18381 > > Please let me know if you have any problems with this commit. > > Cheers, > Josh > >
> On May 5, 2008, at 10:41 AM, Josh Hursey wrote: > >> Awesome. >> >> The branch is updated to the latest trunk head. I encourage folks to >> check out this repository and make sure that it builds on their >> system. A normal build of the branch should be enough to find out if >> there are any cut-n-paste problems (though I tried to be careful, >> mistakes do happen). >> >> I haven't heard any problems so this is looking like it will come in >> tomorrow after the teleconf. I'll ask again there to see if there are >> any voices of concern. >> >> Cheers, >> Josh >>
>> On May 5, 2008, at 9:58 AM, Jeff Squyres wrote: >> >>> This all sounds good to me! >>>
>>> On Apr 29, 2008, at 6:35 PM, Josh Hursey wrote: >>> What: Add mca_base_select() and adjust frameworks & components to use it. Why: Consolidation of code for general goodness. Where: https://svn.open-mpi.org/svn/ompi/tmp-public/jjh-mca-play When: Code ready now. Documentation ready soon. Timeout: May 6, 2008 (After teleconf) [1 week] Discussion: --- For a number of years a few developers have been talking about creating an MCA base component selection function. For various reasons this was never implemented. Recently I decided to give it a try. A base select function will allow Open MPI to provide completely consistent selection behavior for many of its frameworks (18 of 31 to be exact at the moment). The primary goal of this work is to improve code maintainability through code reuse. Other benefits also result, such as a slightly smaller memory footprint. The mca_base_select() function represents the most commonly used logic for component selection: Select the one component with the highest priority and close all of the not selected components. This function can be found at the path below in the branch: opal/mca/base/mca_base_components_select.c To support this I had to formalize a query() function in the mca_base_component_t of the form: int mca_base_query_component_fn(mca_base_module_t **module, int *priority); This function is specified after the open and close component functions in this structure so as to allow compatibility with frameworks that do not use the base selection logic. Frameworks that do *not* use this function are *not* affected by this commit.
However, every component in the frameworks that use the mca_base_select function must adjust their component query function to fit the form specified above.

18 frameworks in Open MPI have been changed. I have updated all of the components in the 18 frameworks available in the trunk on my branch. The affected frameworks are:
- OPAL Carto
- OPAL crs
- OPAL maffinity
- OPAL memchecker
- OPAL paffinity
- ORTE errmgr
- ORTE ess
- ORTE Filem
- ORTE grpcomm
- ORTE odls
- ORTE pml
- ORTE ras
- ORTE rmaps
- ORTE routed
- ORTE snapc
- OMPI crcp
- OMPI dpm
- OMPI pubsub

There was a question of the memory footprint change as a result of this commit. I used 'pmap' to determine the process memory footprint of a hello world MPI program. Static and Shared build numbers are below, along with variations on launching locally and to a single node allocated by SLURM. All of this was on Indiana University's Odin machine. We compare against the trunk (r18276)
Re: [OMPI devel] [RFC] mca_base_select()
This has been committed in r18381. Please let me know if you have any problems with this commit. Cheers, Josh

On May 5, 2008, at 10:41 AM, Josh Hursey wrote:
Awesome. The branch is updated to the latest trunk head. I encourage folks to check out this repository and make sure that it builds on their system. A normal build of the branch should be enough to find out if there are any cut-n-paste problems (though I tried to be careful, mistakes do happen). I haven't heard any problems so this is looking like it will come in tomorrow after the teleconf. I'll ask again there to see if there are any voices of concern. Cheers, Josh

On May 5, 2008, at 9:58 AM, Jeff Squyres wrote:
This all sounds good to me!

On Apr 29, 2008, at 6:35 PM, Josh Hursey wrote:
What: Add mca_base_select() and adjust frameworks & components to use it.
Why: Consolidation of code for general goodness.
Where: https://svn.open-mpi.org/svn/ompi/tmp-public/jjh-mca-play
When: Code ready now. Documentation ready soon.
Timeout: May 6, 2008 (After teleconf) [1 week]

Discussion:
---
For a number of years a few developers have been talking about creating an MCA base component selection function. For various reasons this was never implemented. Recently I decided to give it a try.

A base select function will allow Open MPI to provide completely consistent selection behavior for many of its frameworks (18 of 31 to be exact at the moment). The primary goal of this work is to improve code maintainability through code reuse. Other benefits also result, such as a slightly smaller memory footprint.

The mca_base_select() function represents the most commonly used logic for component selection: Select the one component with the highest priority and close all of the not selected components. This function can be found at the path below in the branch: opal/mca/base/mca_base_components_select.c

To support this I had to formalize a query() function in the mca_base_component_t of the form:
int mca_base_query_component_fn(mca_base_module_t **module, int *priority);

This function is specified after the open and close component functions in this structure so as to allow compatibility with frameworks that do not use the base selection logic. Frameworks that do *not* use this function are *not* affected by this commit. However, every component in the frameworks that use the mca_base_select function must adjust their component query function to fit the form specified above.

18 frameworks in Open MPI have been changed. I have updated all of the components in the 18 frameworks available in the trunk on my branch. The affected frameworks are:
- OPAL Carto
- OPAL crs
- OPAL maffinity
- OPAL memchecker
- OPAL paffinity
- ORTE errmgr
- ORTE ess
- ORTE Filem
- ORTE grpcomm
- ORTE odls
- ORTE pml
- ORTE ras
- ORTE rmaps
- ORTE routed
- ORTE snapc
- OMPI crcp
- OMPI dpm
- OMPI pubsub

There was a question of the memory footprint change as a result of this commit. I used 'pmap' to determine the process memory footprint of a hello world MPI program. Static and Shared build numbers are below, along with variations on launching locally and to a single node allocated by SLURM. All of this was on Indiana University's Odin machine. We compare against the trunk (r18276) representing the last SVN sync point of the branch.
Process(shared)  | Trunk   | Branch  | Diff (Improvement)
-----------------+---------+---------+-------------------
mpirun (orted)   |  39976K |  36828K |  3148K
hello (0)        | 229288K | 229268K |    20K
hello (1)        | 229288K | 229268K |    20K
-----------------+---------+---------+-------------------
mpirun           |  40032K |  37924K |  2108K
orted            |  34720K |  34660K |    60K
hello (0)        | 228404K | 228384K |    20K
hello (1)        | 228404K | 228384K |    20K

Process(static)  | Trunk   | Branch  | Diff (Improvement)
-----------------+---------+---------+-------------------
mpirun (orted)   |  21384K |  21372K |    12K
hello (0)        | 194000K | 193980K |    20K
hello (1)        | 194000K | 193980K |    20K
-----------------+---------+---------+-------------------
mpirun           |  21384K |  21372K |    12K
orted            |  21208K |  21196K |    12K
hello (0)        | 193116K | 193096K |    20K
hello (1)        | 193116K | 193096K |    20K

As you can see there are some small memory footprint improvements on my branch that result from this work. The size of the Open MPI project shrinks a bit as well. This commit cuts between 2,000 and 3,500 lines of code (depending on how you count), so about a ~1% code shrink.

The branch is stable in all of the testing I have done, but there are some platforms on which I cannot test. So please give this branch a try and let me know if you find any problems.

Cheers, Josh

-- Jeff Squyres Cisco Systems
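[Editor's note: for illustration, a schematic sketch of the selection logic this RFC describes (query every opened component, keep the highest-priority module, close the rest). The types and helper names below are invented for the sketch; they are not the real mca_base_select() implementation, which lives in opal/mca/base/mca_base_components_select.c on the branch.]

typedef struct mca_base_module_t mca_base_module_t;    /* opaque stand-in */

typedef struct sketch_component {
    struct sketch_component *next;
    int (*query)(mca_base_module_t **module, int *priority);
    void (*close)(void);
} sketch_component_t;

static mca_base_module_t *sketch_select(sketch_component_t *components)
{
    mca_base_module_t *best_module = NULL, *module = NULL;
    sketch_component_t *best = NULL, *c;
    int best_priority = -1, priority;

    /* Ask every opened component what it can offer. */
    for (c = components; NULL != c; c = c->next) {
        if (0 == c->query(&module, &priority) && NULL != module &&
            priority > best_priority) {
            best_priority = priority;
            best_module = module;
            best = c;
        }
    }

    /* Close all of the components that were not selected. */
    for (c = components; NULL != c; c = c->next) {
        if (c != best) {
            c->close();
        }
    }

    return best_module;     /* NULL means nothing was selectable */
}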
[OMPI devel] [RFC] mca_base_open() NULL
What: Add an MCA-NULL option to open no components in mca_base_open()
Why: Sometimes we do not want to open or select any components of a framework.
Where: patch attached for current trunk.
When: Needs further discussion.
Timeout: Unknown. [May 13, 2008 (After teleconf)?]

Short Version:
--
This RFC is intended to continue discussion on the thread started here: http://www.open-mpi.org/community/lists/devel/2008/05/3793.php
Discussion should occur on list, but maybe try to come to some settlement on this RFC in the next week or two.

Longer Version:
---
Currently there is no way to express to the MCA system that absolutely no components of a framework are needed and therefore nothing should be opened. The addition of a sentinel value is needed to explicitly express this intention. It was suggested that if an 'MCA-NULL' value is passed as an argument for a framework then this should be taken to indicate such an intention.

Attachment: mca-null.diff
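[Editor's note: for illustration only (this is not the attached mca-null.diff), a hedged sketch of how a framework's select path might honor such a sentinel. Every name below except the 'MCA-NULL' string itself is hypothetical.]

#include <string.h>

#define MCA_NULL_SENTINEL "MCA-NULL"

int example_framework_open_and_select(const char *requested);   /* normal path, defined elsewhere */

int example_framework_select(const char *requested)
{
    /* If the user explicitly asked for no components at all
     * (e.g. "-mca filem MCA-NULL"), open and select nothing. */
    if (NULL != requested && 0 == strcmp(requested, MCA_NULL_SENTINEL)) {
        return 0;   /* success: no components opened, none selected */
    }

    /* Otherwise fall through to the usual open/select logic. */
    return example_framework_open_and_select(requested);
}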
Re: [OMPI devel] Flush CQ error on iWARP/Out-of-sync shutdown
In addition to Steve's comments, we discussed this on the call today and decided that the patch is fine. Jon and I will discuss further because this is the first instance where calling some form of "disconnect" on one side causes events to occur on the other side without involvement from the remote OMPI (e.g., the remote side's OMPI layer simply hasn't called its "disconnect" flavor yet, but the kernel level transport/network stack will cause things to happen on the remote side anyway).

On May 6, 2008, at 11:45 AM, Steve Wise wrote:

Jeff Squyres wrote:

On May 5, 2008, at 6:27 PM, Steve Wise wrote:
I am seeing some unusual behavior during the shutdown phase of ompi at the end of my testcase. While running an IMB pingpong test over the rdmacm on openib, I get cq flush errors on my iWARP adapters. This error is happening because the remote node is still polling the endpoint while the other one shut down. This occurs because iWARP puts the qps in error state when the channel is disconnected (IB does not do this). Since the cq is still being polled when the event is received on the remote node, ompi thinks it hit an error and kills the run. Since this is expected behavior on iWARP, this is not really an error case.

The key here, I think, is that when an iWARP QP moves out of RTS, all the RECVs and any pending SQ WRs get flushed. Further, disconnecting the iwarp connection forces the QP out of RTS. This is probably different from the way IB works. I.e., "disconnecting" in IB is an out-of-band exchange done by the IBCM. For iWARP, "disconnecting" is an in-band operation (a TCP close or abort) so the QP cannot remain in RTS during this process.

Let me make sure I understand:
- proc A calls del_procs on proc B
- proc A calls ibv_destroy_qp() on QP to proc B
Actually proc A calls rdma_disconnect() on QP to proc B
- this causes a local (proc A) flush on all pending receives and SQ WRs
- this then causes a FLUSH event to show up *in proc B* --> I'm not clear on this point from Jon's/Steve's text
Yes. Once the connection is torn down the iwarp QPs will be flushed on both ends.
- OMPI [currently] treats the FLUSH in proc B as an error

Is that right? What is the purpose of the FLUSH event?

In general, I think it is to allow the application to recover any resources that are allocated and cannot be touched until the WRs complete. For example, the buffers that were described in all the RECV WRs. If the app is going to exit, this isn't very interesting since everything will get cleaned up in the exit path. But if the process is long lived and setting up/tearing down connections, then these pending RECV buffers need to be reclaimed and put back into the buffer pool, as an example...

There is a larger question regarding why the remote node is still polling the hca and not shutting down, but my immediate question is if it is an acceptable fix to simply disregard this "error" if it is an iWARP adapter.

If proc B is still polling the hca, it is likely because it simply has not yet stopped doing it. I.e., a big problem in MPI implementations is that not all actions are exactly synchronous. MPI disconnects are *effectively* synchronous, but we probably didn't *guarantee* synchronicity in this case because we didn't need it (perhaps until now).

Yes.

Steve.

-- Jeff Squyres Cisco Systems
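[Editor's note: for illustration, a hedged sketch of the kind of special case being discussed: while draining a CQ around an iWARP disconnect, a flushed work request is treated as benign rather than fatal. The helper name and the is_iwarp/shutting_down flags are assumptions for the sketch, not the actual openib BTL error path or the committed patch.]

#include <infiniband/verbs.h>
#include <stdbool.h>
#include <stdio.h>

/* Drain a CQ; during an iWARP shutdown, flushed WRs are expected, not fatal. */
static int drain_cq(struct ibv_cq *cq, bool is_iwarp, bool shutting_down)
{
    struct ibv_wc wc;
    int rc;

    while ((rc = ibv_poll_cq(cq, 1, &wc)) > 0) {
        if (IBV_WC_SUCCESS == wc.status) {
            continue;                               /* normal completion */
        }
        if (is_iwarp && shutting_down && IBV_WC_WR_FLUSH_ERR == wc.status) {
            /* iWARP moves the QP out of RTS on disconnect, which flushes all
             * posted RECVs and pending SQ WRs; treat this as benign. */
            continue;
        }
        fprintf(stderr, "fatal completion error: %s\n",
                ibv_wc_status_str(wc.status));
        return -1;
    }
    return rc;      /* 0 when the CQ is empty, negative if ibv_poll_cq failed */
}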
Re: [OMPI devel] Flush CQ error on iWARP/Out-of-sync shutdown
Jeff Squyres wrote:

On May 5, 2008, at 6:27 PM, Steve Wise wrote:
I am seeing some unusual behavior during the shutdown phase of ompi at the end of my testcase. While running an IMB pingpong test over the rdmacm on openib, I get cq flush errors on my iWARP adapters. This error is happening because the remote node is still polling the endpoint while the other one shut down. This occurs because iWARP puts the qps in error state when the channel is disconnected (IB does not do this). Since the cq is still being polled when the event is received on the remote node, ompi thinks it hit an error and kills the run. Since this is expected behavior on iWARP, this is not really an error case.

The key here, I think, is that when an iWARP QP moves out of RTS, all the RECVs and any pending SQ WRs get flushed. Further, disconnecting the iwarp connection forces the QP out of RTS. This is probably different from the way IB works. I.e., "disconnecting" in IB is an out-of-band exchange done by the IBCM. For iWARP, "disconnecting" is an in-band operation (a TCP close or abort) so the QP cannot remain in RTS during this process.

Let me make sure I understand:
- proc A calls del_procs on proc B
- proc A calls ibv_destroy_qp() on QP to proc B
Actually proc A calls rdma_disconnect() on QP to proc B
- this causes a local (proc A) flush on all pending receives and SQ WRs
- this then causes a FLUSH event to show up *in proc B* --> I'm not clear on this point from Jon's/Steve's text
Yes. Once the connection is torn down the iwarp QPs will be flushed on both ends.
- OMPI [currently] treats the FLUSH in proc B as an error

Is that right? What is the purpose of the FLUSH event?

In general, I think it is to allow the application to recover any resources that are allocated and cannot be touched until the WRs complete. For example, the buffers that were described in all the RECV WRs. If the app is going to exit, this isn't very interesting since everything will get cleaned up in the exit path. But if the process is long lived and setting up/tearing down connections, then these pending RECV buffers need to be reclaimed and put back into the buffer pool, as an example...

There is a larger question regarding why the remote node is still polling the hca and not shutting down, but my immediate question is if it is an acceptable fix to simply disregard this "error" if it is an iWARP adapter.

If proc B is still polling the hca, it is likely because it simply has not yet stopped doing it. I.e., a big problem in MPI implementations is that not all actions are exactly synchronous. MPI disconnects are *effectively* synchronous, but we probably didn't *guarantee* synchronicity in this case because we didn't need it (perhaps until now).

Yes.

Steve.
Re: [OMPI devel] NO IP address found
I think the larger issue, though, is whether rdmacm will work properly for the LMC>0 case over IB, right? The fact that it shouldn't be displaying this error message now because RDMA CM is not the default is one issue, but it's not the *real* issue... On May 6, 2008, at 11:00 AM, Jon Mason wrote: On Tuesday 06 May 2008 09:41:53 am Jeff Squyres wrote: I actually don't know what the RDMA CM requires for the LMC>0 case -- does it require a unique IP address for every LID? It requires a unique IP address for every hca/port in use by rdmacm. I see the bug in rdmacm (since I don't believe you were trying to use rdmacm), and will have a patch out shortly. On May 6, 2008, at 5:09 AM, Lenny Verkhovsky wrote: Hi, running BW benchmark with btl_openib_max_lmc >= 2 couses warning ( MPI from the TRUNK ) #mpirun --bynode -np 40 -hostfile hostfile_ompi_arbel -mca btl_openib_max_lmc 2 ./mpi_p_LMC -t bw -s 40 BW (40) (size min max avg) 40 321.493757 342.972837 329.493715 #mpirun --bynode -np 40 -hostfile hostfile_ompi_arbel -mca btl_openib_max_lmc 3 ./mpi_p_LMC -t bw -s 40 [witch9][[7493,1],7][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch2][[7493,1],0][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch10][[7493,1],9][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch6][[7493,1],4][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch4][[7493,1],2][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch7][[7493,1],5][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch2][[7493,1],10][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch9][[7493,1],17][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch5][[7493,1],3][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch8][[7493,1],6][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch6][[7493,1],14][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch10][[7493,1],19][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch5][[7493,1],13][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch4][[7493,1],12][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch9][[7493,1],27][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch5][[7493,1],23][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch2][[7493,1],20][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch9][[7493,1],37][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch7][[7493,1],35][../../../../../ompi/mca/btl/openib/connect/ 
btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch4][[7493,1],32][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch4][[7493,1],22][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch5][[7493,1],33][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch2][[7493,1],30][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch8][[7493,1],16][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch7][[7493,1],15][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch10][[7493,1],39][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch7][[7493,1],25][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch10][[7493,1],29][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch6][[7493,1],34][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message]
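[Editor's note: for illustration, a hedged standalone sketch of why RDMA CM needs an IP address for every HCA/port in use: connection setup is addressed via sockaddrs, so binding an rdma_cm_id requires an IP (typically an IPoIB address) configured on the interface. The address below is a placeholder; this is not the OMPI rdmacm connect code that prints "No IP address found".]

#include <rdma/rdma_cma.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>

int main(void)
{
    struct rdma_event_channel *channel = rdma_create_event_channel();
    struct rdma_cm_id *id = NULL;
    struct sockaddr_in src = { 0 };

    src.sin_family = AF_INET;
    /* Placeholder: the IPoIB address configured on the HCA port in use. */
    inet_pton(AF_INET, "192.168.1.10", &src.sin_addr);

    if (NULL == channel ||
        0 != rdma_create_id(channel, &id, NULL, RDMA_PS_TCP) ||
        0 != rdma_bind_addr(id, (struct sockaddr *) &src)) {
        /* Without a usable IP on the RDMA interface, this is roughly the
         * situation the "No IP address found" message reports. */
        perror("RDMA CM bind failed");
        return 1;
    }

    printf("bound rdma_cm_id to a local IP address\n");
    rdma_destroy_id(id);
    rdma_destroy_event_channel(channel);
    return 0;
}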
Re: [OMPI devel] NO IP address found
On Tuesday 06 May 2008 09:41:53 am Jeff Squyres wrote: > I actually don't know what the RDMA CM requires for the LMC>0 case -- > does it require a unique IP address for every LID? It requires a unique IP address for every hca/port in use by rdmacm. I see the bug in rdmacm (since I don't believe you were trying to use rdmacm), and will have a patch out shortly. > > On May 6, 2008, at 5:09 AM, Lenny Verkhovsky wrote: > > > Hi, > > > > running BW benchmark with btl_openib_max_lmc >= 2 couses warning > > ( MPI from the TRUNK ) > > > > > > #mpirun --bynode -np 40 -hostfile hostfile_ompi_arbel -mca > > btl_openib_max_lmc 2 ./mpi_p_LMC -t bw -s 40 > > BW (40) (size min max avg) 40 321.493757 > > 342.972837 329.493715 > > > > #mpirun --bynode -np 40 -hostfile hostfile_ompi_arbel -mca > > btl_openib_max_lmc 3 ./mpi_p_LMC -t bw -s 40 > > [witch9][[7493,1],7][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > [witch2][[7493,1],0][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > [witch10][[7493,1],9][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > [witch6][[7493,1],4][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > [witch4][[7493,1],2][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > [witch7][[7493,1],5][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > [witch2][[7493,1],10][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > [witch9][[7493,1],17][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > [witch5][[7493,1],3][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > [witch8][[7493,1],6][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > [witch6][[7493,1],14][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > [witch10][[7493,1],19][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > [witch5][[7493,1],13][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > [witch4][[7493,1],12][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > [witch9][[7493,1],27][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > [witch5][[7493,1],23][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > [witch2][[7493,1],20][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > [witch9][[7493,1],37][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > [witch7][[7493,1],35][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > 
[witch4][[7493,1],32][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > [witch4][[7493,1],22][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > [witch5][[7493,1],33][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > [witch2][[7493,1],30][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > [witch8][[7493,1],16][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > [witch7][[7493,1],15][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > [witch10][[7493,1],39][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > [witch7][[7493,1],25][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > [witch10][[7493,1],29][../../../../../ompi/mca/btl/openib/connect/ > > btl_openib_connect_rdmacm.c:989:create_message] No IP address found > > [witch6][[7493,1],34][../../../../../ompi/mca/btl/openib/connect/ > >
Re: [OMPI devel] Flush CQ error on iWARP/Out-of-sync shutdown
On Tue, 6 May 2008, Jeff Squyres wrote: On May 5, 2008, at 6:27 PM, Steve Wise wrote: There is a larger question regarding why the remote node is still polling the hca and not shutting down, but my immediate question is if it is an acceptable fix to simply disregard this "error" if it is an iWARP adapter. If proc B is still polling the hca, it is likely because it simply has not yet stopped doing it. I.e., a big problem in MPI implementations is that not all actions are exactly synchronous. MPI disconnects are *effectively* synchronous, but we probably didn't *guarantee* synchronicity in this case because we didn't need it (perhaps until now). Not to mention... The BTL has to be able to handle a shutdown from one proc while still running its progression engine, as that's a normal sequence of events when dynamic processes are involved. Because of that, there wasn't too much care taken to ensure that everyone stopped polling, then everyone did del_procs. Brian
Re: [OMPI devel] NO IP address found
I actually don't know what the RDMA CM requires for the LMC>0 case -- does it require a unique IP address for every LID? On May 6, 2008, at 5:09 AM, Lenny Verkhovsky wrote: Hi, running BW benchmark with btl_openib_max_lmc >= 2 couses warning ( MPI from the TRUNK ) #mpirun --bynode -np 40 -hostfile hostfile_ompi_arbel -mca btl_openib_max_lmc 2 ./mpi_p_LMC -t bw -s 40 BW (40) (size min max avg) 40 321.493757 342.972837 329.493715 #mpirun --bynode -np 40 -hostfile hostfile_ompi_arbel -mca btl_openib_max_lmc 3 ./mpi_p_LMC -t bw -s 40 [witch9][[7493,1],7][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch2][[7493,1],0][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch10][[7493,1],9][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch6][[7493,1],4][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch4][[7493,1],2][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch7][[7493,1],5][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch2][[7493,1],10][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch9][[7493,1],17][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch5][[7493,1],3][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch8][[7493,1],6][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch6][[7493,1],14][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch10][[7493,1],19][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch5][[7493,1],13][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch4][[7493,1],12][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch9][[7493,1],27][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch5][[7493,1],23][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch2][[7493,1],20][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch9][[7493,1],37][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch7][[7493,1],35][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch4][[7493,1],32][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch4][[7493,1],22][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch5][[7493,1],33][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch2][[7493,1],30][../../../../../ompi/mca/btl/openib/connect/ 
btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch8][[7493,1],16][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch7][[7493,1],15][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch10][[7493,1],39][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch7][[7493,1],25][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch10][[7493,1],29][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch6][[7493,1],34][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch8][[7493,1],26][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch6][[7493,1],24][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found [witch8][[7493,1],36][../../../../../ompi/mca/btl/openib/connect/ btl_openib_connect_rdmacm.c:989:create_message] No IP address found BW (40) (size min max avg) 40 312.622582 334.037277 324.014814
Re: [OMPI devel] Flush CQ error on iWARP/Out-of-sync shutdown
On May 5, 2008, at 6:27 PM, Steve Wise wrote:
I am seeing some unusual behavior during the shutdown phase of ompi at the end of my testcase. While running an IMB pingpong test over the rdmacm on openib, I get cq flush errors on my iWARP adapters. This error is happening because the remote node is still polling the endpoint while the other one shut down. This occurs because iWARP puts the qps in error state when the channel is disconnected (IB does not do this). Since the cq is still being polled when the event is received on the remote node, ompi thinks it hit an error and kills the run. Since this is expected behavior on iWARP, this is not really an error case.

The key here, I think, is that when an iWARP QP moves out of RTS, all the RECVs and any pending SQ WRs get flushed. Further, disconnecting the iwarp connection forces the QP out of RTS. This is probably different from the way IB works. I.e., "disconnecting" in IB is an out-of-band exchange done by the IBCM. For iWARP, "disconnecting" is an in-band operation (a TCP close or abort) so the QP cannot remain in RTS during this process.

Let me make sure I understand:
- proc A calls del_procs on proc B
- proc A calls ibv_destroy_qp() on QP to proc B
- this causes a local (proc A) flush on all pending receives and SQ WRs
- this then causes a FLUSH event to show up *in proc B* --> I'm not clear on this point from Jon's/Steve's text
- OMPI [currently] treats the FLUSH in proc B as an error

Is that right? What is the purpose of the FLUSH event?

There is a larger question regarding why the remote node is still polling the hca and not shutting down, but my immediate question is if it is an acceptable fix to simply disregard this "error" if it is an iWARP adapter.

If proc B is still polling the hca, it is likely because it simply has not yet stopped doing it. I.e., a big problem in MPI implementations is that not all actions are exactly synchronous. MPI disconnects are *effectively* synchronous, but we probably didn't *guarantee* synchronicity in this case because we didn't need it (perhaps until now).

Opinions?

If the openib btl (or the layers above) assume the "disconnect" will notify the remote rank that the connection should be finalized, then we must deal with FLUSHED WRs for the iwarp case. If some sort of "finalizing" is done by OMPI and then the connections disconnected, then that "finalizing" should include not polling the CQ anymore. But that's not what we observe.

I'd have to check the exact shutdown sequence...

-- Jeff Squyres Cisco Systems
Re: [OMPI devel] Intel MPI Benchmark(IMB) using OpenMPI - Segmentation-fault error message.
On May 1, 2008, at 10:43 AM, Lenny Verkhovsky wrote:

(a) I did modify the make_mpich makefile present in the IMB-3.1/src folder, giving the path for openmpi. Here I am using the same mpirun as built from openmpi (v1.2.5), and I also set it in PATH & LD_LIBRARY_PATH.

That should be fine.

(b) What is the command on the console to run any new additional file with MPI API calls? Do I need to add it in Makefile.base of the IMB-3.1/src folder, or can I just run it from the console as a command along with "$mpirun IMB-MPI1"?

I don't understand this question... What exactly are you trying to do; modify the IMB benchmarks or write your own/new MPI application?

(c) Does IMB-3.1 need IB (InfiniBand) or TCP support to complete its benchmark routine calls; that is, do I need to configure and build Open MPI with the InfiniBand stack too?

IMB is a set of benchmarks that can be run between one or more machines. It calls the MPI API, which does all the communication; MPI decides how to run (IB, TCP, or shared memory) according to priorities and all possible ways to be connected to another host.

Lenny is right; in general Open MPI will decide what is the best network stack to use to communicate with a peer MPI process. So whether you build Open MPI with IB support and/or TCP support is up to you. Generally, you want to build Open MPI with support for your high speed network (e.g., IB) and let Open MPI use it for off-node communication (OMPI will usually use shared memory for communication between processes on the same node).

-- Jeff Squyres Cisco Systems
[OMPI devel] NO IP address found
Hi,
running a BW benchmark with btl_openib_max_lmc >= 2 causes a warning (MPI from the TRUNK):

#mpirun --bynode -np 40 -hostfile hostfile_ompi_arbel -mca btl_openib_max_lmc 2 ./mpi_p_LMC -t bw -s 40
BW (40) (size min max avg) 40 321.493757 342.972837 329.493715

#mpirun --bynode -np 40 -hostfile hostfile_ompi_arbel -mca btl_openib_max_lmc 3 ./mpi_p_LMC -t bw -s 40
[witch9][[7493,1],7][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch2][[7493,1],0][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch10][[7493,1],9][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch6][[7493,1],4][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch4][[7493,1],2][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch7][[7493,1],5][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch2][[7493,1],10][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch9][[7493,1],17][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch5][[7493,1],3][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch8][[7493,1],6][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch6][[7493,1],14][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch10][[7493,1],19][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch5][[7493,1],13][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch4][[7493,1],12][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch9][[7493,1],27][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch5][[7493,1],23][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch2][[7493,1],20][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch9][[7493,1],37][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch7][[7493,1],35][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch4][[7493,1],32][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch4][[7493,1],22][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch5][[7493,1],33][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch2][[7493,1],30][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch8][[7493,1],16][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch7][[7493,1],15][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch10][[7493,1],39][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch7][[7493,1],25][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch10][[7493,1],29][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch6][[7493,1],34][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch8][[7493,1],26][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch6][[7493,1],24][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
[witch8][[7493,1],36][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:989:create_message] No IP address found
BW (40) (size min max avg) 40 312.622582 334.037277 324.014814

Using -mca btl openib,self causes a warning with LMC >= 10.

Best regards,
Lenny.