Re: [OMPI devel] binding with MCA parameters: broken or user error?
Try adding -display-devel-map to your cmd line so you can see what OMPI thinks the binding and mapping policy is set to - that'll tell you if the problem is in the mapping or in the daemon binding. Also, it might help to know something about this node - like how many sockets and how many cores/socket.

On Oct 8, 2009, at 11:17 PM, Eugene Loh wrote:

Here are two problems with openmpi-1.3.4a1r22051.

# Here, I try to run the moral equivalent of -bysocket -bind-to-socket,
# using the MCA parameter form specified on the mpirun command line.
# No binding results. THIS IS PROBLEM 1.
% mpirun -np 5 --mca rmaps_base_schedule_policy socket --mca orte_process_binding socket -report-bindings hostname
saem9
saem9
saem9
saem9
saem9

# Same thing with the "core" form.
% mpirun -np 5 --mca rmaps_base_schedule_policy core --mca orte_process_binding core -report-bindings hostname
saem9
saem9
saem9
saem9
saem9

# Now, I set the MCA parameters as environment variables.
# I then check the spellings and confirm all is set using ompi_info.
% setenv OMPI_MCA_rmaps_base_schedule_policy socket
% setenv OMPI_MCA_orte_process_binding socket
% ompi_info -a | grep rmaps_base_schedule_policy
MCA rmaps: parameter "rmaps_base_schedule_policy" (current value: "socket", data source: environment)
% ompi_info -a | grep orte_process_binding
MCA orte: parameter "orte_process_binding" (current value: "socket", data source: environment)

# So, now I run a simple program.
# I get binding now, but I'm filling up the first socket before going to the second.
# THIS IS PROBLEM 2.
% mpirun -np 5 -report-bindings hostname
[saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],0] to socket 0 cpus 000f
[saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],1] to socket 0 cpus 000f
[saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],2] to socket 0 cpus 000f
[saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],3] to socket 0 cpus 000f
[saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],4] to socket 1 cpus 00f0
saem9
saem9
saem9
saem9
saem9

# Adding -bysocket to the command line fixes things.
% mpirun -np 5 -bysocket -report-bindings hostname
[saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],0] to socket 0 cpus 000f
[saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],1] to socket 1 cpus 00f0
[saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],2] to socket 0 cpus 000f
[saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],3] to socket 1 cpus 00f0
[saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],4] to socket 0 cpus 000f
saem9
saem9
saem9
saem9
saem9

Bug? Or am I doing something wrong?

___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] trac #2034 : single rail openib btl shows better bandwidth than dual rail (12k< x < 128k)
On 10/08/09 17:14, Don Kerr wrote:

George, this is an interesting approach, although I am guessing the changes would be widespread and have many performance implications. Am I wrong in this belief? My point here is that if this is going to have as many performance implications as I think it will, it probably makes sense to investigate the potential bigger dual-rail issue and consider the "never share" approach in the larger context.

-DON

On 10/08/09 11:45, George Bosilca wrote:

Don, I think we can do something slightly different that will satisfy everybody. How about a solution where each BTL defines a limit below which a message will never be shared with another BTL? We can have two such limits, one for the send protocol and one for RMA (it will apply either to PUT or GET operations based on the BTL support and the PML decision).

george.

On Oct 8, 2009, at 11:01, Don Kerr wrote:

On 10/07/09 13:52, George Bosilca wrote:

Don, the problem is that a particular BTL doesn't have knowledge about the other selected BTLs, so allowing the BTLs to set this limit is not as easy as it sounds. However, in the case where two identical BTLs are selected and they are the only ones, this clearly is the better approach. If this parameter is set at the PML level, I can't imagine how we would figure out the correct value depending on the BTLs. I see this as a pretty strong restriction. How do we know we set a value that makes sense?

OK, I now see why setting it at the BTL level is difficult. And for the case of multiple BTLs of different component types, however unlikely that is, a PML setting will not be optimal for both.

-DON

george.

On Oct 7, 2009, at 10:19, Don Kerr wrote:

George, were you suggesting that the proposed new parameter "max_rdma_single_rget" be set by the individual BTLs, similar to "btl_eager_limit"? Seems to me that is the better approach if I am to move forward with this.

-DON

On 10/06/09 11:14, Don Kerr wrote:

I agree there is probably a larger issue here, and yes, this is somewhat specific, but whereas OB1 appears to have multiple protocols depending on the capabilities of the BTLs, I would not characterize it as an IB-centric problem - maybe an OB1 RDMA problem. There is a clear benefit from modifying this specific case. Do you think it's not worth making incremental improvements while also attacking a potentially bigger issue?

-DON

On 10/06/09 10:52, George Bosilca wrote:

Don, this seems a very IB-centric problem (and solution) going up in the PML. Moreover, I noticed that independent of the BTL we have some problems with multi-rail performance. As an example, on a cluster with 3 GB cards we get the same performance whether I enable 2 or 3. I didn't have time to look into the details, but this might be a more general problem.

george.

On Oct 6, 2009, at 09:51, Don Kerr wrote:

I intend to make the change suggested in this ticket to the trunk. The change does not impact the single-rail case (tested with the openib btl) and does improve the dual-rail case. Since it does involve performance and I am adding an OB1 MCA parameter, I just wanted to check if anyone was interested or had an issue with it before I committed the change.

-DON

___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
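[Editor's note] George's "never share below a limit" proposal can be sketched abstractly. The following is a hypothetical illustration, not OB1/BTL source code - all names (schedule_fragments, no_share_limit, the rail tuples) are invented for this sketch. A message below every rail's no-share limit stays whole on one BTL; a larger message is striped across rails in proportion to a bandwidth weight:

```python
# Hypothetical sketch of the "never share below a limit" idea from this
# thread; names and structure are illustrative, not Open MPI internals.

def schedule_fragments(msg_len, btls):
    """btls: list of (name, no_share_limit, bandwidth_weight) tuples."""
    # If the message is below every rail's no-share limit, send it
    # whole over the first rail instead of splitting it.
    if msg_len <= min(limit for _, limit, _ in btls):
        return [(btls[0][0], msg_len)]
    # Otherwise stripe proportionally to each rail's bandwidth weight.
    total = sum(weight for _, _, weight in btls)
    plan, sent = [], 0
    for name, _, weight in btls[:-1]:
        share = msg_len * weight // total
        plan.append((name, share))
        sent += share
    plan.append((btls[-1][0], msg_len - sent))  # remainder goes to the last rail
    return plan

rails = [("openib0", 65536, 1), ("openib1", 65536, 1)]
print(schedule_fragments(32768, rails))   # [('openib0', 32768)] -- small message stays on one rail
print(schedule_fragments(262144, rails))  # [('openib0', 131072), ('openib1', 131072)] -- striped
```

The point of the per-BTL limit is that splitting a mid-sized message costs more in per-fragment overhead than the second rail's bandwidth returns, which is the 12k-128k regression the ticket describes.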
[OMPI devel] binding with MCA parameters: broken or user error?
Here are two problems with openmpi-1.3.4a1r22051.

# Here, I try to run the moral equivalent of -bysocket -bind-to-socket,
# using the MCA parameter form specified on the mpirun command line.
# No binding results. THIS IS PROBLEM 1.
% mpirun -np 5 --mca rmaps_base_schedule_policy socket --mca orte_process_binding socket -report-bindings hostname
saem9
saem9
saem9
saem9
saem9

# Same thing with the "core" form.
% mpirun -np 5 --mca rmaps_base_schedule_policy core --mca orte_process_binding core -report-bindings hostname
saem9
saem9
saem9
saem9
saem9

# Now, I set the MCA parameters as environment variables.
# I then check the spellings and confirm all is set using ompi_info.
% setenv OMPI_MCA_rmaps_base_schedule_policy socket
% setenv OMPI_MCA_orte_process_binding socket
% ompi_info -a | grep rmaps_base_schedule_policy
MCA rmaps: parameter "rmaps_base_schedule_policy" (current value: "socket", data source: environment)
% ompi_info -a | grep orte_process_binding
MCA orte: parameter "orte_process_binding" (current value: "socket", data source: environment)

# So, now I run a simple program.
# I get binding now, but I'm filling up the first socket before going to the second.
# THIS IS PROBLEM 2.
% mpirun -np 5 -report-bindings hostname
[saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],0] to socket 0 cpus 000f
[saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],1] to socket 0 cpus 000f
[saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],2] to socket 0 cpus 000f
[saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],3] to socket 0 cpus 000f
[saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],4] to socket 1 cpus 00f0
saem9
saem9
saem9
saem9
saem9

# Adding -bysocket to the command line fixes things.
% mpirun -np 5 -bysocket -report-bindings hostname
[saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],0] to socket 0 cpus 000f
[saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],1] to socket 1 cpus 00f0
[saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],2] to socket 0 cpus 000f
[saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],3] to socket 1 cpus 00f0
[saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],4] to socket 0 cpus 000f
saem9
saem9
saem9
saem9
saem9

Bug? Or am I doing something wrong?
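[Editor's note] The `cpus 000f` / `cpus 00f0` values in the -report-bindings output are hexadecimal CPU bitmasks (bit i set means the process may run on logical CPU i). A quick decoding sketch, independent of Open MPI, shows which cores each mask covers on this 2-socket, 4-cores-per-socket node:

```python
# Decode the hex CPU masks printed by -report-bindings.
# Bit i set in the mask means the process may run on logical CPU i.

def decode_mask(hex_mask):
    value = int(hex_mask, 16)
    return [cpu for cpu in range(value.bit_length()) if value >> cpu & 1]

print(decode_mask("000f"))  # [0, 1, 2, 3] -- the four cores of socket 0
print(decode_mask("00f0"))  # [4, 5, 6, 7] -- the four cores of socket 1
```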