Re: [OMPI users] process mapping

2019-06-21 Thread Noam Bernstein via users
> On Jun 21, 2019, at 4:04 PM, Ralph Castain via users <users@lists.open-mpi.org> wrote:
> 
> I’m unaware of any “map-to cartofile” option, nor do I find it in mpirun’s 
> help or man page. Are you seeing it somewhere?

From "mpirun —help”:

tin 1431 : mpirun --help mapping
mpirun (Open MPI) 4.0.1

Usage: mpirun [OPTION]...  [PROGRAM]...
Start the given program using Open RTE

   -cf|--cartofile
                         Provide a cartography file
followed by all the other mapping-related options.

Maybe what I want is best described as not doing round-robin, but I see no way 
to do that either.

Noam

Re: [OMPI users] process mapping

2019-06-21 Thread Ralph Castain via users
On Jun 21, 2019, at 1:52 PM, Noam Bernstein <noam.bernst...@nrl.navy.mil> wrote:



On Jun 21, 2019, at 4:45 PM, Ralph Castain <r...@open-mpi.org> wrote:

Hilarious - I wrote that code and I have no idea who added that option or what 
it is supposed to do. I can assure, however, that it isn’t implemented anywhere.

Not really a big deal, since the documentation doesn’t explain these options anyway, and I was just grasping at straws. Are rankfiles implemented? Maybe I could use those (although binding/mapping command-line arguments would definitely be easier).
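(For reference, a rankfile sketch for the layout described below, i.e. ranks 0-17 on socket 0 and ranks 18-35 on socket 1. The hostname "node01" and the file/binary names are placeholders; the slot=socket:core syntax follows the mpirun(1) rankfile examples, so treat this as an untested illustration:

   rank 0=node01 slot=0:0
   rank 1=node01 slot=0:1
   # ... ranks 2-16 continue on socket 0 in the same pattern ...
   rank 17=node01 slot=0:17
   rank 18=node01 slot=1:0
   # ... ranks 19-34 continue on socket 1 in the same pattern ...
   rank 35=node01 slot=1:17

used as "mpirun -np 36 --rankfile myrankfile ./my_app".)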


Perhaps if you tell us what pattern you are trying to get, we can advise you on 
the proper cmd line to get there?

I thought that was in the original email.  Basically, I have two 18-core CPUs, and I want ranks 0-17 on cores 0-17 of cpu 0 and ranks 18-35 on cores 0-17 of cpu 1.  I’d have thought that would be straightforward, but everything I’ve tried ends up with i_task%2 == i_cpu, i.e. ranks 0,2,4,… on cpu 0 and ranks 1,3,5,… on cpu 1.

Too many emails to track :-(

Should just be “--map-by core --rank-by core” - nothing fancy required. Sounds 
like you are getting --map-by node, or at least --rank-by node, which means 
somebody has set an MCA param either in the system config file or the 
environment.
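(A minimal invocation along those lines; the rank count and binary name are placeholders, and --report-bindings is added only to print the resulting map for verification:

   mpirun -np 36 --map-by core --rank-by core --report-bindings ./my_app

With core-sequential mapping and ranking, ranks 0-17 should fill socket 0 and ranks 18-35 socket 1.)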

Noam


 

 

Noam Bernstein, Ph.D.
Center for Materials Physics and Technology
U.S. Naval Research Laboratory
T +1 202 404 8628  F +1 202 404 7546
https://www.nrl.navy.mil



[OMPI users] process mapping

2019-06-21 Thread Noam Bernstein via users
Hi - are there any examples of the cartofile format?  Or is there some combo of --map, --rank, or --bind to achieve this mapping?
[BB/..][../..]
[../BB][../..]
[../..][BB/..]
[../..][../BB]

I tried everything I could think of for --bind-to, --map-by, and --rank-by, and I 
can’t get it to happen.  I can get
[BB/..][../..]
[../..][BB/..]
[../BB][../..]
[../..][../BB]
but that’s not quite what I want.
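(The bracket maps above are in the style mpirun prints, one line per rank, when run with its --report-bindings option; e.g., a hypothetical four-rank run:

   mpirun -np 4 --report-bindings ./a.out
)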

thanks,
Noam


Re: [OMPI users] process mapping

2019-06-21 Thread Noam Bernstein via users


> On Jun 21, 2019, at 4:45 PM, Ralph Castain <r...@open-mpi.org> wrote:
> 
> Hilarious - I wrote that code and I have no idea who added that option or 
> what it is supposed to do. I can assure, however, that it isn’t implemented 
> anywhere.

Not really a big deal, since the documentation doesn’t explain these options anyway, and I was just grasping at straws. Are rankfiles implemented? Maybe I could use those (although binding/mapping command-line arguments would definitely be easier).

> 
> Perhaps if you tell us what pattern you are trying to get, we can advise you 
> on the proper cmd line to get there?

I thought that was in the original email.  Basically, I have two 18-core CPUs, and I want ranks 0-17 on cores 0-17 of cpu 0 and ranks 18-35 on cores 0-17 of cpu 1.  I’d have thought that would be straightforward, but everything I’ve tried ends up with i_task%2 == i_cpu, i.e. ranks 0,2,4,… on cpu 0 and ranks 1,3,5,… on cpu 1.

Noam


Noam Bernstein, Ph.D.
Center for Materials Physics and Technology
U.S. Naval Research Laboratory
T +1 202 404 8628  F +1 202 404 7546
https://www.nrl.navy.mil

Re: [OMPI users] process mapping

2019-06-21 Thread Ralph Castain via users
Hilarious - I wrote that code and I have no idea who added that option or what 
it is supposed to do. I can assure, however, that it isn’t implemented anywhere.

Perhaps if you tell us what pattern you are trying to get, we can advise you on 
the proper cmd line to get there?


On Jun 21, 2019, at 1:43 PM, Noam Bernstein <noam.bernst...@nrl.navy.mil> wrote:

On Jun 21, 2019, at 4:04 PM, Ralph Castain via users <users@lists.open-mpi.org> wrote:

I’m unaware of any “map-to cartofile” option, nor do I find it in mpirun’s help 
or man page. Are you seeing it somewhere?

From "mpirun --help":

tin 1431 : mpirun --help mapping
mpirun (Open MPI) 4.0.1

Usage: mpirun [OPTION]...  [PROGRAM]...
Start the given program using Open RTE

   -cf|--cartofile
                         Provide a cartography file
followed by all the other mapping-related options.

Maybe what I want is best described as not doing round-robin, but I see no way 
to do that either.

Noam


Re: [OMPI users] process mapping

2019-06-21 Thread Ralph Castain via users
I’m unaware of any “map-to cartofile” option, nor do I find it in mpirun’s help 
or man page. Are you seeing it somewhere?


On Jun 21, 2019, at 12:43 PM, Noam Bernstein via users <users@lists.open-mpi.org> wrote:

Hi - are there any examples of the cartofile format?  Or is there some combo of --map, --rank, or --bind to achieve this mapping?
[BB/..][../..]
[../BB][../..]
[../..][BB/..]
[../..][../BB]

I tried everything I could think of for --bind-to, --map-by, and --rank-by, and I 
can’t get it to happen.  I can get
[BB/..][../..]
[../..][BB/..]
[../BB][../..]
[../..][../BB]
but that’s not quite what I want.

thanks,
Noam


Re: [OMPI users] process mapping

2019-06-21 Thread Noam Bernstein via users
> On Jun 21, 2019, at 5:02 PM, Ralph Castain <r...@open-mpi.org> wrote:
> 
> Too many emails to track :-(
> 
> Should just be “--map-by core --rank-by core” - nothing fancy required. 
> Sounds like you are getting --map-by node, or at least --rank-by node, which 
> means somebody has set an MCA param either in the system config file or the 
> environment.
> 

Yay, that worked.  Apparently I didn’t try every combination.  I feel like the documentation could be better, since I clearly wasn’t able to figure this out, but I’m not sure what particular wording to suggest.  Let me think about it.

thanks,
Noam

Re: [OMPI users] growing memory use from MPI application

2019-06-21 Thread Noam Bernstein via users
Perhaps I spoke too soon.  Now, with the Mellanox OFED stack, we occasionally 
get the following failure on exit:
[compute-4-20:68008:0:68008] Caught signal 11 (Segmentation fault: address not 
mapped to object at address 0x10)
0 0x0002a3c5 opal_free_list_destruct()  opal_free_list.c:0
1 0x1e89 mca_rcache_grdma_finalize()  rcache_grdma_module.c:0
2 0x000cbfdf mca_rcache_base_module_destroy()  ???:0
3 0xdfef device_destruct()  btl_openib_component.c:0
4 0x9c61 mca_btl_openib_finalize()  ???:0
5 0x000796f3 mca_btl_base_close()  btl_base_frame.c:0
6 0x00062c99 mca_base_framework_close()  ???:0
7 0x00062c99 mca_base_framework_close()  ???:0
8 0x00052a2a ompi_mpi_finalize()  ???:0
9 0x00046449 mpi_finalize__()  ???:0
It appears to be non-deterministic, as far as my users can tell.  

I have no idea how to even begin debugging this, but it started when we 
switched from the CentOS OFED stuff to the Mellanox version (which, 
incidentally, seems to be failing to even recognize our oldest FDR IB cards).  
If anyone has any suggestions, I'd appreciate it.
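(Since the backtrace points at the openib BTL's teardown, one way to test that theory, offered as a sketch rather than a verified fix, is to exclude that BTL and run over the UCX PML instead; the rank count and binary name are placeholders:

   mpirun --mca pml ucx --mca btl ^openib -np 36 ./my_app

If the segfault on exit disappears under that setting, that would point at btl_openib finalize specifically.)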

Noam

Re: [OMPI users] growing memory use from MPI application

2019-06-21 Thread Noam Bernstein via users
> On Jun 21, 2019, at 9:57 PM, Carlson, Timothy S wrote:
> 
> Switch back to stock OFED?   

Well, the CentOS-included OFED has a memory leak (at least when using ucx).  I haven’t tried the stock OFED stack yet.

> 
> Make sure all your cards are patched to the latest firmware.   

That's a good idea.  I'll try that.  If only SuperMicro didn't make it so 
difficult to find the correct firmware.

Noam


Noam Bernstein, Ph.D.
Center for Materials Physics and Technology
U.S. Naval Research Laboratory
T +1 202 404 8628  F +1 202 404 7546
https://www.nrl.navy.mil