Ralph,

Huh. That isn't in the Open MPI 1.8.8 mpirun man page. It is in Open MPI 1.10, so I'm guessing someone noticed it wasn't there. Explains why I didn't try it out. I'm assuming this option is respected on all nodes?
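If I understand Ralph's suggestion below correctly, the full command would look something like this. This is just a sketch, not a tested invocation: cores 8-15 are my guess at Socket #1's core IDs on these dual-socket Sandy Bridge nodes, and lstopo would need to confirm that:

```shell
# Hypothetical invocation: restrict all 8 ranks to Socket #1's cores
# (assumed here to be IDs 8-15; verify the actual IDs with lstopo).
mpirun -np 8 --cpu-set 8,9,10,11,12,13,14,15 --bind-to core -report-bindings ./helloWorld.exe
```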
Note: a SmarterManThanI™ here at Goddard thought up this:

#!/bin/bash
rank=0
for node in $(srun uname -n | sort); do
   echo "rank $rank=$node slots=1:*"
   let rank+=1
done

It does seem to work in synthetic tests, so I'm trying it now in my real job. I had to hack a few run scripts, so I'll probably spend the next hour debugging something dumb I did.

What I'm wondering about all this is: can this be done with --slot-list? Or, perhaps, does --slot-list even work? I have tried about 20 different variations of it, e.g., --slot-list 1:*, --slot-list '1:*', --slot-list 1:0,1,2,3,4,5,6,7, --slot-list 1:8,9,10,11,12,13,14,15, --slot-list 8-15, &c., and every time I seem to trigger an error via help-rmaps_rank_file.txt. I tried to read through opal_hwloc_base_slot_list_parse in the source, but my C isn't great (see my gmail address name), so that didn't help. Might not even be the right function, but I was just acking the code.

Thanks,
Matt

On Mon, Dec 21, 2015 at 10:51 AM, Ralph Castain <r...@open-mpi.org> wrote:

> Try adding —cpu-set a,b,c,… where the a,b,c,… are the core IDs of your
> second socket. I'm working on a cleaner option, as this has come up before.
>
> On Dec 21, 2015, at 5:29 AM, Matt Thompson <fort...@gmail.com> wrote:
>
> Dear Open MPI Gurus,
>
> I'm currently trying to do something with Open MPI 1.8.8 that I'm pretty
> sure is possible, but I'm just not smart enough to figure out. Namely, I'm
> seeing some odd GPU timings, and I think it's because I was dumb and assumed
> the GPU was on the PCI bus next to Socket #0, as some older GPU nodes I ran
> on were laid out that way.
>
> But a trip through lspci and lstopo has shown me that the GPU is actually
> on Socket #1. These are dual-socket Sandy Bridge nodes, and I'd like to do
> some tests where I run 8 processes per node and those processes all land
> on Socket #1.
>
> So, what I'm trying to figure out is how to have Open MPI bind processes
> like that.
> My first thought, as always, is to run a helloworld job with
> -report-bindings on. I can manage to do this:
>
> (1061) $ mpirun -np 8 -report-bindings -map-by core ./helloWorld.exe
> [borg01z205:16306] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././.][./././././././.]
> [borg01z205:16306] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./.][./././././././.]
> [borg01z205:16306] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/.][./././././././.]
> [borg01z205:16306] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B][./././././././.]
> [borg01z205:16306] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././.][./././././././.]
> [borg01z205:16306] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././.][./././././././.]
> [borg01z205:16306] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././.][./././././././.]
> [borg01z205:16306] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././.][./././././././.]
> Process 7 of 8 is on borg01z205
> Process 5 of 8 is on borg01z205
> Process 2 of 8 is on borg01z205
> Process 3 of 8 is on borg01z205
> Process 4 of 8 is on borg01z205
> Process 6 of 8 is on borg01z205
> Process 0 of 8 is on borg01z205
> Process 1 of 8 is on borg01z205
>
> Great...but wrong socket! Is there a way to tell it to use Socket 1
> instead?
>
> Note I'll be running under SLURM, so I will only have 8 processes per
> node, so it shouldn't need to use Socket 0.
> --
> Matt Thompson
>
> Man Among Men
> Fulcrum of History
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/12/28190.php
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/12/28195.php

--
Matt Thompson

Man Among Men
Fulcrum of History
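For reference, the rankfile generator from the note above can be written out as a self-contained sketch. Here `node01 node02` is a placeholder for the hostnames that `srun uname -n | sort` would return inside a real SLURM allocation; everything else follows the script quoted in the thread:

```shell
#!/bin/bash
# Sketch of the Goddard rankfile generator from the note above.
# "node01 node02" stands in for the real node list, which inside a
# SLURM allocation would come from: $(srun uname -n | sort)
nodes="node01 node02"

rank=0
: > myrankfile                    # start with an empty rankfile
for node in $nodes; do
    # one rank per node, pinned to any slot on socket 1 (the GPU's socket)
    echo "rank $rank=$node slots=1:*" >> myrankfile
    rank=$((rank + 1))
done

cat myrankfile
```

The generated file would then be handed to mpirun with the rankfile option, e.g. `mpirun -np 2 -rf myrankfile ./helloWorld.exe`.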