Ralph,

Huh. That option isn't in the Open MPI 1.8.8 mpirun man page, but it is in
Open MPI 1.10, so I'm guessing someone noticed it was missing. That explains
why I didn't try it out. I'm assuming this option is respected on all nodes?
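
If I'm reading the 1.10 man page right, I assume the invocation you have in
mind is something like the line below; I'm guessing that cores 8-15 are the
second socket on these nodes (I'll confirm with lstopo before I trust it):

  mpirun -np 8 --cpu-set 8,9,10,11,12,13,14,15 -report-bindings ./helloWorld.exe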

Note: a SmarterManThanI™ here at Goddard came up with this:

#!/bin/bash
# Build a rankfile: one entry per hostname that srun prints
# (sorted so ranks are packed by node), each pinned to socket 1.
rank=0
for node in $(srun uname -n | sort); do
        echo "rank $rank=$node slots=1:*"
        let rank+=1
done
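
The plan is to dump that output to a file and hand it to mpirun via its
rankfile option, roughly like this (make_rankfile.sh and my_rankfile are
just placeholder names for the script above and its output):

  ./make_rankfile.sh > my_rankfile
  mpirun -rf my_rankfile -report-bindings ./helloWorld.exe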

It does seem to work in synthetic tests, so I'm trying it now in my real
job. I had to hack a few run scripts, so I'll probably spend the next hour
debugging something dumb I did.

What I'm wondering about in all this is: can the same thing be done with
--slot-list? Or, for that matter, does --slot-list even work?

I have tried about 20 different variations of it, e.g.:

  --slot-list 1:*
  --slot-list '1:*'
  --slot-list 1:0,1,2,3,4,5,6,7
  --slot-list 1:8,9,10,11,12,13,14,15
  --slot-list 8-15

and every time I seem to trigger an error from help-rmaps_rank_file.txt. I
tried to read through opal_hwloc_base_slot_list_parse in the source, but my
C isn't great (see my gmail address name), so that didn't help. It might not
even be the right function; I was just acking through the code.

Thanks,
Matt


On Mon, Dec 21, 2015 at 10:51 AM, Ralph Castain <r...@open-mpi.org> wrote:

> Try adding --cpu-set a,b,c,...  where the a,b,c,... are the core IDs of your
> second socket. I'm working on a cleaner option, as this has come up before.
>
>
> On Dec 21, 2015, at 5:29 AM, Matt Thompson <fort...@gmail.com> wrote:
>
> Dear Open MPI Gurus,
>
> I'm currently trying to do something with Open MPI 1.8.8 that I'm pretty
> sure is possible, but I'm just not smart enough to figure out. Namely, I'm
> seeing some odd GPU timings, and I think it's because I was dumb and assumed
> the GPU was on the PCI bus next to Socket #0, since some older GPU nodes I
> ran on were set up that way.
>
> But a trip through lspci and lstopo has shown me that the GPU is actually
> on Socket #1. These are dual-socket Sandy Bridge nodes, and I'd like to do
> some tests where I run 8 processes per node and those processes all land
> on Socket #1.
>
> So, what I'm trying to figure out is how to have Open MPI bind processes
> like that. My first thought, as always, is to run a helloworld job with
> -report-bindings on. I can manage to do this:
>
> (1061) $ mpirun -np 8 -report-bindings -map-by core ./helloWorld.exe
> [borg01z205:16306] MCW rank 4 bound to socket 0[core 4[hwt 0]]:
> [././././B/././.][./././././././.]
> [borg01z205:16306] MCW rank 5 bound to socket 0[core 5[hwt 0]]:
> [./././././B/./.][./././././././.]
> [borg01z205:16306] MCW rank 6 bound to socket 0[core 6[hwt 0]]:
> [././././././B/.][./././././././.]
> [borg01z205:16306] MCW rank 7 bound to socket 0[core 7[hwt 0]]:
> [./././././././B][./././././././.]
> [borg01z205:16306] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
> [B/././././././.][./././././././.]
> [borg01z205:16306] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
> [./B/./././././.][./././././././.]
> [borg01z205:16306] MCW rank 2 bound to socket 0[core 2[hwt 0]]:
> [././B/././././.][./././././././.]
> [borg01z205:16306] MCW rank 3 bound to socket 0[core 3[hwt 0]]:
> [./././B/./././.][./././././././.]
> Process    7 of    8 is on borg01z205
> Process    5 of    8 is on borg01z205
> Process    2 of    8 is on borg01z205
> Process    3 of    8 is on borg01z205
> Process    4 of    8 is on borg01z205
> Process    6 of    8 is on borg01z205
> Process    0 of    8 is on borg01z205
> Process    1 of    8 is on borg01z205
>
> Great...but wrong socket! Is there a way to tell it to use Socket 1
> instead?
>
> Note I'll be running under SLURM, so I will only have 8 processes per
> node; it shouldn't need to use Socket 0 at all.
> --
> Matt Thompson
>
> Man Among Men
> Fulcrum of History
>



-- 
Matt Thompson

Man Among Men
Fulcrum of History
