I am attempting to use coll_tuned_dynamic_rules_filename to tune Open MPI 1.3.2. Based on my testing, the dynamic rules file appears to influence algorithm selection *only* for MPI_COMM_WORLD. Communicators duplicated from it use only the fixed or forced rules, which can perform much worse than the custom-tuned collectives in the dynamic rules file. The following code demonstrates the difference between MPI_COMM_WORLD and a duplicate communicator.
test.c:

    #include <mpi.h>

    int main( int argc, char** argv )
    {
        float u = 0.0, v = 0.0;
        MPI_Comm world_dup;

        MPI_Init( &argc, &argv );
        MPI_Comm_dup( MPI_COMM_WORLD, &world_dup );

        /* Allreduce on the duplicate communicator (cid 4 in the trace) */
        MPI_Allreduce( &u, &v, 1, MPI_FLOAT, MPI_SUM, world_dup );

        /* Barrier only separates the two allreduce calls in the trace */
        MPI_Barrier( MPI_COMM_WORLD );

        /* Allreduce on MPI_COMM_WORLD (cid 0 in the trace) */
        MPI_Allreduce( &u, &v, 1, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD );

        MPI_Finalize();
        return 0;
    }

allreduce.ompi:

    1
    2
    1
    9
    1
    0 1 0 0

invocation:

    orterun -np 9 \
        -mca btl self,sm,openib,tcp \
        -mca coll_tuned_use_dynamic_rules 1 \
        -mca coll_tuned_dynamic_rules_filename allreduce.ompi \
        -mca coll_base_verbose 1000 \
        -- test

The program is run with tracing enabled; the barrier is there only to separate the two allreduce calls in the trace. The trace for one process is at the end of this message, and the relevant part is the choice of algorithm for the two allreduce calls. The allreduce.ompi file specifies that every communicator of size 9 should use the basic linear allreduce algorithm. MPI_COMM_WORLD uses basic_linear, but the world_dup communicator uses the fixed decision (for this message size, the fixed decision selects recursive doubling).

Thank you.
John Jumper

Trace of one process for the above program:

    mca: base: components_open: opening coll components
    mca: base: components_open: found loaded component basic
    mca: base: components_open: component basic register function successful
    mca: base: components_open: component basic has no open function
    mca: base: components_open: found loaded component hierarch
    mca: base: components_open: component hierarch has no register function
    mca: base: components_open: component hierarch open function successful
    mca: base: components_open: found loaded component inter
    mca: base: components_open: component inter has no register function
    mca: base: components_open: component inter open function successful
    mca: base: components_open: found loaded component self
    mca: base: components_open: component self has no register function
    mca: base: components_open: component self open function successful
    mca: base: components_open: found loaded component sm
    mca: base: components_open: component sm has no register function
    mca: base: components_open: component sm open function successful
    mca: base: components_open: found loaded component sync
    mca: base: components_open: component sync register function successful
    mca: base: components_open: component sync has no open function
    mca: base: components_open: found loaded component tuned
    mca: base: components_open: component tuned has no register function
    coll:tuned:component_open: done!
    mca: base: components_open: component tuned open function successful
    coll:find_available: querying coll component basic
    coll:find_available: coll component basic is available
    coll:find_available: querying coll component hierarch
    coll:find_available: coll component hierarch is available
    coll:find_available: querying coll component inter
    coll:find_available: coll component inter is available
    coll:find_available: querying coll component self
    coll:find_available: coll component self is available
    coll:find_available: querying coll component sm
    coll:find_available: coll component sm is available
    coll:find_available: querying coll component sync
    coll:find_available: coll component sync is available
    coll:find_available: querying coll component tuned
    coll:find_available: coll component tuned is available
    coll:base:comm_select: new communicator: MPI_COMM_WORLD (cid 0)
    coll:base:comm_select: Checking all available modules
    coll:base:comm_select: component available: basic, priority: 10
    coll:base:comm_select: component not available: hierarch
    coll:base:comm_select: component not available: inter
    coll:base:comm_select: component not available: self
    coll:base:comm_select: component not available: sm
    coll:base:comm_select: component not available: sync
    coll:tuned:module_tuned query called
    coll:tuned:module_query using intra_dynamic
    coll:base:comm_select: component available: tuned, priority: 30
    coll:tuned:module_init called.
    coll:tuned:module_init MCW & Dynamic
    coll:tuned:module_init Opening [allreduce.ompi]
    Reading dynamic rule for collective ID 2
    Read communicator count 1 for dynamic rule for collective ID 2
    Read message count 1 for dynamic rule for collective ID 2 and comm size 9
    Done reading dynamic rule for collective ID 2
    Collectives with rules : 1
    Communicator sizes with rules : 1
    Message sizes with rules : 1
    Lines in configuration file read : 0
    coll:tuned:module_init Read 1 valid rules
    Selected the following com rule id 0
    alg_id 2 com_id 0 com_size 9 number of message sizes 1
    alg_id 2 com_id 0 com_size 9 msg_id 0 msg_size 0 -> algorithm 1 topo in/out 0 segsize 0 max_requests 0
    coll:tuned:topo_build_tree Building fo 4 rt 0
    coll:tuned:topo_build_tree Building fo 2 rt 0
    coll:tuned:topo:build_bmtree rt 0
    coll:tuned:topo:build_in_order_bmtree rt 0
    coll:tuned:topo:build_chain fo 4 rt 0
    coll:tuned:topo:build_chain fo 1 rt 0
    coll:tuned:topo_build_in_order_tree Building fo 2 rt 8
    coll:tuned:module_init Tuned is in use
    coll:base:comm_select: new communicator: MPI_COMM_SELF (cid 1)
    coll:base:comm_select: Checking all available modules
    coll:base:comm_select: component available: basic, priority: 10
    coll:base:comm_select: component not available: hierarch
    coll:base:comm_select: component not available: inter
    coll:base:comm_select: component available: self, priority: 75
    coll:base:comm_select: component not available: sm
    coll:base:comm_select: component not available: sync
    coll:tuned:module_tuned query called
    coll:base:comm_select: component not available: tuned
    coll:base:comm_select: new communicator: MPI COMMUNICATOR 4 DUP FROM 0 (cid 4)
    coll:base:comm_select: Checking all available modules
    coll:base:comm_select: component available: basic, priority: 10
    coll:base:comm_select: component not available: hierarch
    coll:base:comm_select: component not available: inter
    coll:base:comm_select: component not available: self
    coll:base:comm_select: component not available: sm
    coll:base:comm_select: component not available: sync
    coll:tuned:module_tuned query called
    coll:tuned:module_query using intra_dynamic
    coll:base:comm_select: component available: tuned, priority: 30
    coll:tuned:module_init called.
    coll:tuned:topo_build_tree Building fo 4 rt 0
    coll:tuned:topo_build_tree Building fo 2 rt 0
    coll:tuned:topo:build_bmtree rt 0
    coll:tuned:topo:build_in_order_bmtree rt 0
    coll:tuned:topo:build_chain fo 4 rt 0
    coll:tuned:topo:build_chain fo 1 rt 0
    coll:tuned:topo_build_in_order_tree Building fo 2 rt 8
    coll:tuned:module_init Tuned is in use
    ompi_coll_tuned_allreduce_intra_dec_dynamic
    ompi_coll_tuned_allreduce_intra_dec_fixed
    coll:tuned:allreduce_intra_recursivedoubling rank 8
    ompi_coll_tuned_barrier_intra_dec_dynamic
    ompi_coll_tuned_barrier_intra_dec_fixed com_size 9
    ompi_coll_tuned_barrier_intra_bruck rank 8
    ompi_coll_tuned_allreduce_intra_dec_dynamic
    Selected the following msg rule id 0
    alg_id 2 com_id 0 com_size 9 msg_id 0 msg_size 0 -> algorithm 1 topo in/out 0 segsize 0 max_requests 0
    coll:tuned:allreduce_intra_do_this algorithm 1 topo fan in/out 0 segsize 0
    coll:tuned:allreduce_intra_basic_linear rank 8
    coll:tuned:reduce_intra_basic_linear rank 8
    ompi_coll_tuned_bcast_intra_basic_linear rank 8 root 0
    mca: base: close: unloading component basic
    mca: base: close: unloading component hierarch
    mca: base: close: unloading component inter
    mca: base: close: unloading component self
    mca: base: close: component sm closed
    mca: base: close: unloading component sm
    mca: base: close: unloading component sync
    coll:tuned:component_close: called
    coll:tuned:component_close: done!
    mca: base: close: component tuned closed
    mca: base: close: unloading component tuned