I tweaked your scripts quite a bit so that I could run several different 
variations on my cluster.

I have lots of jobs queued up (I have 29 nodes in my cluster; 3 have died 
over time), and they'll take a while to execute.

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           3204131   jenkins alltoall jsquyres PD       0:00      8 (Resources)
           3204132   jenkins alltoall jsquyres PD       0:00      8 (Resources)
           3204133   jenkins  barrier jsquyres PD       0:00      8 (Resources)
           3204134   jenkins    bcast jsquyres PD       0:00      8 (Resources)
           3204135   jenkins   gather jsquyres PD       0:00      8 (Resources)
           3204136   jenkins   reduce jsquyres PD       0:00      8 (Resources)
           3204137   jenkins reduce_s jsquyres PD       0:00      8 (Resources)
           3204138   jenkins reduce_s jsquyres PD       0:00      8 (Resources)
           3204139   jenkins  scatter jsquyres PD       0:00      8 (Resources)
           3204140   jenkins allgathe jsquyres PD       0:00      8 (Resources)
           3204141   jenkins allgathe jsquyres PD       0:00      8 (Resources)
           3204142   jenkins allreduc jsquyres PD       0:00      8 (Resources)
           3204143   jenkins alltoall jsquyres PD       0:00      8 (Resources)
           3204144   jenkins alltoall jsquyres PD       0:00      8 (Resources)
           3204145   jenkins  barrier jsquyres PD       0:00      8 (Resources)
           3204146   jenkins    bcast jsquyres PD       0:00      8 (Resources)
           3204147   jenkins   gather jsquyres PD       0:00      8 (Resources)
           3204148   jenkins   reduce jsquyres PD       0:00      8 (Resources)
           3204149   jenkins reduce_s jsquyres PD       0:00      8 (Resources)
           3204150   jenkins reduce_s jsquyres PD       0:00      8 (Resources)
           3204151   jenkins  scatter jsquyres PD       0:00      8 (Resources)
           3204152   jenkins allgathe jsquyres PD       0:00     16 (Resources)
           3204153   jenkins allgathe jsquyres PD       0:00     16 (Resources)
           3204154   jenkins allreduc jsquyres PD       0:00     16 (Resources)
           3204155   jenkins alltoall jsquyres PD       0:00     16 (Resources)
           3204156   jenkins alltoall jsquyres PD       0:00     16 (Resources)
           3204157   jenkins  barrier jsquyres PD       0:00     16 (Resources)
           3204158   jenkins    bcast jsquyres PD       0:00     16 (Resources)
           3204159   jenkins   gather jsquyres PD       0:00     16 (Resources)
           3204160   jenkins   reduce jsquyres PD       0:00     16 (Resources)
           3204161   jenkins reduce_s jsquyres PD       0:00     16 (Resources)
           3204162   jenkins reduce_s jsquyres PD       0:00     16 (Resources)
           3204163   jenkins  scatter jsquyres PD       0:00     16 (Resources)
           3204164   jenkins allgathe jsquyres PD       0:00     16 (Resources)
           3204165   jenkins allgathe jsquyres PD       0:00     16 (Resources)
           3204166   jenkins allreduc jsquyres PD       0:00     16 (Resources)
           3204167   jenkins alltoall jsquyres PD       0:00     16 (Resources)
           3204168   jenkins alltoall jsquyres PD       0:00     16 (Resources)
           3204169   jenkins  barrier jsquyres PD       0:00     16 (Resources)
           3204170   jenkins    bcast jsquyres PD       0:00     16 (Resources)
           3204171   jenkins   gather jsquyres PD       0:00     16 (Resources)
           3204172   jenkins   reduce jsquyres PD       0:00     16 (Resources)
           3204173   jenkins reduce_s jsquyres PD       0:00     16 (Resources)
           3204174   jenkins reduce_s jsquyres PD       0:00     16 (Resources)
           3204175   jenkins  scatter jsquyres PD       0:00     16 (Resources)
           3204176   jenkins allgathe jsquyres PD       0:00     16 (Resources)
           3204177   jenkins allgathe jsquyres PD       0:00     16 (Resources)
           3204178   jenkins allreduc jsquyres PD       0:00     16 (Resources)
           3204179   jenkins alltoall jsquyres PD       0:00     16 (Resources)
           3204180   jenkins alltoall jsquyres PD       0:00     16 (Resources)
           3204181   jenkins  barrier jsquyres PD       0:00     16 (Resources)
           3204182   jenkins    bcast jsquyres PD       0:00     16 (Resources)
           3204183   jenkins   gather jsquyres PD       0:00     16 (Resources)
           3204184   jenkins   reduce jsquyres PD       0:00     16 (Resources)
           3204185   jenkins reduce_s jsquyres PD       0:00     16 (Resources)
           3204186   jenkins reduce_s jsquyres PD       0:00     16 (Resources)
           3204187   jenkins  scatter jsquyres PD       0:00     16 (Resources)
           3204188   jenkins allgathe jsquyres PD       0:00     29 (Resources)
           3204189   jenkins allgathe jsquyres PD       0:00     29 (Resources)
           3204190   jenkins allreduc jsquyres PD       0:00     29 (Resources)
           3204191   jenkins alltoall jsquyres PD       0:00     29 (Resources)
           3204192   jenkins alltoall jsquyres PD       0:00     29 (Resources)
           3204193   jenkins  barrier jsquyres PD       0:00     29 (Resources)
           3204194   jenkins    bcast jsquyres PD       0:00     29 (Resources)
           3204195   jenkins   gather jsquyres PD       0:00     29 (Resources)
           3204196   jenkins   reduce jsquyres PD       0:00     29 (Resources)
           3204197   jenkins reduce_s jsquyres PD       0:00     29 (Resources)
           3204198   jenkins reduce_s jsquyres PD       0:00     29 (Resources)
           3204199   jenkins  scatter jsquyres PD       0:00     29 (Resources)
           3204200   jenkins allgathe jsquyres PD       0:00     29 (Resources)
           3204201   jenkins allgathe jsquyres PD       0:00     29 (Resources)
           3204202   jenkins allreduc jsquyres PD       0:00     29 (Resources)
           3204203   jenkins alltoall jsquyres PD       0:00     29 (Resources)
           3204204   jenkins alltoall jsquyres PD       0:00     29 (Resources)
           3204205   jenkins  barrier jsquyres PD       0:00     29 (Resources)
           3204206   jenkins    bcast jsquyres PD       0:00     29 (Resources)
           3204207   jenkins   gather jsquyres PD       0:00     29 (Resources)
           3204208   jenkins   reduce jsquyres PD       0:00     29 (Resources)
           3204209   jenkins reduce_s jsquyres PD       0:00     29 (Resources)
           3204210   jenkins reduce_s jsquyres PD       0:00     29 (Resources)
           3204211   jenkins  scatter jsquyres PD       0:00     29 (Resources)
           3204212   jenkins allgathe jsquyres PD       0:00     29 (Resources)
           3204213   jenkins allgathe jsquyres PD       0:00     29 (Resources)
           3204214   jenkins allreduc jsquyres PD       0:00     29 (Resources)
           3204215   jenkins alltoall jsquyres PD       0:00     29 (Resources)
           3204216   jenkins alltoall jsquyres PD       0:00     29 (Resources)
           3204217   jenkins  barrier jsquyres PD       0:00     29 (Resources)
           3204218   jenkins    bcast jsquyres PD       0:00     29 (Resources)
           3204219   jenkins   gather jsquyres PD       0:00     29 (Resources)
           3204220   jenkins   reduce jsquyres PD       0:00     29 (Resources)
           3204221   jenkins reduce_s jsquyres PD       0:00     29 (Resources)
           3204222   jenkins reduce_s jsquyres PD       0:00     29 (Resources)
           3204223   jenkins  scatter jsquyres PD       0:00     29 (Resources)
           3204128   jenkins allgathe jsquyres  R       5:10      8 mpi[004-011]
           3204129   jenkins allgathe jsquyres  R       5:10      8 mpi[016-023]
           3204130   jenkins allreduc jsquyres  R       5:10      8 mpi[024-031]
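
For a quick tally of a listing like the one above, here is a minimal Python sketch. It assumes the default squeue column layout shown here (state in the fifth column, node count in the seventh) and job names without spaces. The SAMPLE text and the tally_jobs helper are illustrative only; they are not part of the tuning package.

```python
from collections import Counter

# Illustrative sample in the squeue column layout shown above; not the full listing.
SAMPLE = """\
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
3204131   jenkins alltoall jsquyres PD       0:00      8 (Resources)
3204152   jenkins allgathe jsquyres PD       0:00     16 (Resources)
3204188   jenkins allgathe jsquyres PD       0:00     29 (Resources)
3204128   jenkins allgathe jsquyres  R       5:10      8 mpi[004-011]
"""


def tally_jobs(text):
    """Count jobs per (state, node count) in squeue-style output."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the header row (its first field is not numeric).
        if len(fields) < 7 or not fields[0].isdigit():
            continue
        state, nodes = fields[4], int(fields[6])
        counts[(state, nodes)] += 1
    return counts


if __name__ == "__main__":
    for (state, nodes), n in sorted(tally_jobs(SAMPLE).items()):
        print(f"{state:>2} {nodes:>3} nodes: {n} job(s)")
```

Piping real squeue output into tally_jobs instead of SAMPLE should give the pending and running counts per node size, as long as the columns match the assumption above.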


> On Apr 13, 2020, at 6:35 PM, Zhang, William via devel 
> <devel@lists.open-mpi.org> wrote:
> 
> Hello all,
>  
> I have created a --with-slurm option when running (See updated README). In 
> order to set new defaults for collective algorithms, we will need data from 
> those who wish to provide it. We have created the following package that 
> allows for collecting data: 
> https://github.com/open-mpi/ompi-collectives-tuning
>  
> Please run the package as soon as possible. Details on how to run are in the 
> README.md. If data collection fails, the output of the analyze script (either 
> analyze.sh.o* for SGE or the output of ./run_and_analyze if using slurm) will 
> report "Error parsing <filename>. Data format doesn't match. Exiting..". 
> Please make sure data collection succeeds and a decision file is written 
> entirely.
>  
> Please provide me with either the output directory or if it’s inconvenient to 
> share this data, provide me a list of optimal switchover points at different 
> message sizes for each algorithm (This can be in the form of the 
> output/decision.file which only contains switchover points and no specific 
> performance numbers)
>  
> Thanks,
> William Zhang


-- 
Jeff Squyres
jsquy...@cisco.com
