Sebastian,

The PSM2 shared memory segment name is set by the PSM2 library, and
my understanding is that Open MPI has no control over it.

If you believe the root cause of the crash is a non-unique PSM2 shared
memory segment name, I suggest you report it at https://github.com/intel/opa-psm2

Below is a snippet from ptl_am/am_reqrep_shmem.c

Cheers,

Gilles

psm2_error_t psmi_shm_create(ptl_t *ptl_gen)
{
    // ...

    snprintf(shmbuf,
             sizeof(shmbuf),
             "/psm2_shm.%ld%016lx%d",
             (long int) getuid(),
             ep->epid,
             iterator);
    amsh_keyname = psmi_strdup(NULL, shmbuf);
    // ...

    shmfd =
        shm_open(amsh_keyname, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);


On 7/5/2019 4:13 AM, Kraus, Sebastian via users wrote:
Hi all,
Is there anyone around who could explain to me how the names of the PSM2
and Vader shared memory segments are constructed?
I am curious whether the naming scheme can be influenced via
run-time parameters. I am facing a situation where distinct
SLURM jobs of the same user on the same node randomly segfault. I suspect that
the problem is connected with the non-unique naming
scheme of the PSM2 shared memory segments (as determined by Open MPI/SLURM).
The PSM2 segments follow this naming convention:
/dev/shm/psm2_shm.[user_id][some_mask]
Unfortunately, the values of the mask do not change for distinct SLURM jobs.
In contrast, the names of the Vader segments are unique across
different process IDs:
/dev/shm/vader_segment.[nodename].[some_process-mask].[SLURM_STEPID]

An example:

Vader segments:
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93e00001.5
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93e00001.3
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93e00001.1
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93e00001.7
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93e00001.6
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93e00001.0
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93e00001.2
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93e00001.4
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93650001.7
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93650001.5
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93650001.1
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93650001.4
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93650001.3
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93650001.0
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93650001.6
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93650001.2

PSM2 segments:
-rw------- 1 XXX YYY 4.2M Jul  4 19:09 /dev/shm/psm2_shm.117648500000007ff0000e00
-rw------- 1 XXX YYY 4.2M Jul  4 19:09 /dev/shm/psm2_shm.117648500000006ff0000c00
-rw------- 1 XXX YYY 4.2M Jul  4 19:09 /dev/shm/psm2_shm.117648500000005ff0000a00
-rw------- 1 XXX YYY 4.2M Jul  4 19:09 /dev/shm/psm2_shm.117648500000003ff0000600
-rw------- 1 XXX YYY 4.2M Jul  4 19:09 /dev/shm/psm2_shm.117648500000002ff0000400
-rw------- 1 XXX YYY 4.2M Jul  4 19:09 /dev/shm/psm2_shm.117648500000001ff0000200
-rw------- 1 XXX YYY 4.2M Jul  4 19:09 /dev/shm/psm2_shm.117648500000000ff0000000
-rw------- 1 XXX YYY 4.2M Jul  4 19:09 /dev/shm/psm2_shm.117648500000004ff0000800

Thanks for your time and support
Sebastian


Sebastian Kraus

Technische Universität Berlin
Fakultät II
Institut für Chemie
Sekretariat C3
Straße des 17. Juni 135
10623 Berlin
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
