Re: [OMPI devel] MPI_T SEGV on DSO
Yup, just noticed that. All component variables should be registered with mca_base_component_var_register, but the version variables were registered with the generic register function. The code in question is the oldest part of the MCA rewrite, so it was probably missed when the component variable register function was added. Fixing now.

-Nathan

On Thu, Jul 31, 2014 at 12:40:55AM +0900, KAWASHIMA Takahiro wrote:
> Nathan,
>
> The difference seems to be the flags used at registration.
>
> Normal MCA variables such as shmem_sysv_priority have the flag
> MCA_BASE_VAR_FLAG_DWG, so they are deregistered through
> mca_base_var_group_deregister in mca_base_component_unload.
>
> But shmem_sysv_major_version doesn't have the flag.
>
> Regards,
> KAWASHIMA Takahiro
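The flag difference described above can be illustrated with a toy registry. This is a minimal sketch in plain C, not Open MPI code: toy_register, toy_group_deregister, toy_is_valid and TOY_FLAG_DWG are hypothetical stand-ins for the registration calls, mca_base_var_group_deregister and MCA_BASE_VAR_FLAG_DWG. It only shows why a variable registered without the flag would stay visible after its component is unloaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag standing in for MCA_BASE_VAR_FLAG_DWG:
 * "deregister this variable together with its group". */
#define TOY_FLAG_DWG 0x1u
#define TOY_MAX_VARS 8

typedef struct {
    const char *name;
    const char *group;    /* owning component, e.g. "shmem_sysv" */
    unsigned    flags;
    bool        valid;    /* false once deregistered */
    const int  *storage;  /* points into the component's own memory */
} toy_var_t;

static toy_var_t toy_vars[TOY_MAX_VARS];
static size_t    toy_var_count = 0;

/* Register a variable; returns its index, or -1 if the table is full. */
static int toy_register(const char *name, const char *group,
                        unsigned flags, const int *storage)
{
    assert(name != NULL && group != NULL);
    if (toy_var_count >= TOY_MAX_VARS) {
        return -1;
    }
    toy_vars[toy_var_count] = (toy_var_t){ name, group, flags, true, storage };
    return (int)toy_var_count++;
}

/* Called when a component is unloaded: only flagged variables are
 * invalidated, so an unflagged one keeps pointing at unloaded memory. */
static void toy_group_deregister(const char *group)
{
    for (size_t i = 0; i < toy_var_count; i++) {
        if (strcmp(toy_vars[i].group, group) == 0 &&
            (toy_vars[i].flags & TOY_FLAG_DWG) != 0) {
            toy_vars[i].valid = false;
            toy_vars[i].storage = NULL;
        }
    }
}

/* A lookup should refuse any index that is out of range or deregistered. */
static bool toy_is_valid(int index)
{
    return index >= 0 && (size_t)index < toy_var_count && toy_vars[index].valid;
}
```

If a priority variable is registered with TOY_FLAG_DWG and a version variable without it, calling toy_group_deregister on their group invalidates only the priority variable. The version variable still looks valid, which matches the behavior reported for shmem_sysv_major_version.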
Re: [OMPI devel] MPI_T SEGV on DSO
This is odd. The variable in question is registered by the MCA itself. I will take a look and see if I can determine why it isn't being deregistered correctly when the rest of the component's parameters are.

-Nathan

On Wed, Jul 30, 2014 at 08:17:15AM +0900, KAWASHIMA Takahiro wrote:
> I ran varList under GDB and found that MPI_T_cvar_get_info returns
> MPI_T_ERR_INVALID_INDEX for shmem_sysv_priority (this is sane).
> But it returns MPI_SUCCESS for shmem_sysv_major_version.
> The difference is in the mbv_flags values. mbv_flags is 0x44 for
> shmem_sysv_priority at the MPI_T_cvar_get_info call, so the
> mca_base_var_get function in opal/mca/base/mca_base_var.c
> returns OPAL_ERR_NOT_FOUND. But mbv_flags is 0x10003 for
> shmem_sysv_major_version, so mca_base_var_get returns OPAL_SUCCESS.
>
> Are control variables for unloaded components not deregistered
> completely?
Re: [OMPI devel] MPI_T SEGV on DSO
Nathan,

Thanks for your response.

Yes. My previous mail was already the result of the uncommented code. I have now also pulled the latest varList source, which has the section you mentioned uncommented, but the result was the same.

If MPI_T_cvar_get_info should return MPI_T_ERR_INVALID_INDEX for variables of unloaded components, then not returning MPI_T_ERR_INVALID_INDEX is the problem.

I ran varList under GDB and found that MPI_T_cvar_get_info returns MPI_T_ERR_INVALID_INDEX for shmem_sysv_priority (this is sane), but it returns MPI_SUCCESS for shmem_sysv_major_version. The difference is in the mbv_flags values: mbv_flags is 0x44 for shmem_sysv_priority at the MPI_T_cvar_get_info call, so the mca_base_var_get function in opal/mca/base/mca_base_var.c returns OPAL_ERR_NOT_FOUND. But mbv_flags is 0x10003 for shmem_sysv_major_version, so mca_base_var_get returns OPAL_SUCCESS.

Are control variables for unloaded components not deregistered completely?

I can track it further when I have time.

My environment:
  OS: Debian GNU/Linux wheezy
  CPU: x86_64
  Run: mpiexec -n 1 varList
  Open MPI source: trunk r32338 (almost latest)
  Open MPI configure:
    enable_picky=yes
    enable_debug=yes
    enable_mem_debug=yes
    enable_mem_profile=yes
    enable_memchecker=no

    enable_mca_no_build=btl-elan,btl-gm,btl-mx,btl-ofud,btl-portals,btl-sctp,btl-template,btl-udapl,common-mx,common-portals,ess-alps,ess-cnos,ess-lsf,ess-portals_utcp,ess-singleton,ess-slurm,grpcomm-cnos,mpool-fake,mtl,notifier,plm-alps,plm-ccp,plm-lsf,plm-process,plm-slurm,plm-submit,plm-tm,plm-xgrid,pml-cm,pml-csum,pml-example,pml-v,ras
    enable_contrib_no_build=vt
    enable_mpi_cxx=no
    enable_mpi_f77=no
    enable_mpi_f90=no
    enable_ipv6=no
    enable_mpi_io=no
    with_devel_headers=no
    with_wrapper_cflags=-g
    with_wrapper_cxxflags=-g
    with_wrapper_fflags=-g
    with_wrapper_fcflags=-g

Regards,
KAWASHIMA Takahiro

> The problem is the code in question does not check the return code of
> MPI_T_cvar_handle_alloc. We are returning an error and they still try
> to use the handle (which is stale). Uncomment this section of the code:
>
>     //if (MPI_T_ERR_INVALID_INDEX == err)// { NOTE TZI: This variable is not recognized by Mvapich. It is OpenMPI specific.
>     //    continue;
>
> Note that MPI_T_ERR_INVALID_INDEX is in the MPI-3 standard but mvapich
> must not have implemented it (and thus should not claim to be MPI 3.0).
>
> -Nathan
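The two mbv_flags values observed in GDB can be replayed in a few lines. This is a minimal sketch, assuming that bit 0x10000 marks a variable as still registered. That bit position is an assumption chosen to be consistent with the two observed values, and the TOY_ names are hypothetical; they only mimic the role of mca_base_var_get and the OPAL return codes.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: a "still registered" bit at 0x10000. Name and value are
 * hypothetical, picked to match the two flag values seen in GDB. */
#define TOY_VAR_FLAG_VALID 0x00010000u

enum { TOY_SUCCESS = 0, TOY_ERR_NOT_FOUND = -1 };

/* Mimics a lookup that rejects variables whose valid bit is cleared. */
static int toy_var_get(uint32_t mbv_flags)
{
    return (mbv_flags & TOY_VAR_FLAG_VALID) ? TOY_SUCCESS : TOY_ERR_NOT_FOUND;
}

/* Replays the two values reported in the message above. */
static void toy_replay_observed(void)
{
    /* shmem_sysv_priority: 0x44, valid bit clear, lookup fails. */
    assert(toy_var_get(0x44u) == TOY_ERR_NOT_FOUND);
    /* shmem_sysv_major_version: 0x10003, valid bit set, lookup succeeds. */
    assert(toy_var_get(0x10003u) == TOY_SUCCESS);
}
```

Under that assumption, the version variable is reported as found only because nothing cleared its valid bit when the component was unloaded.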
Re: [OMPI devel] MPI_T SEGV on DSO
FYI: We have pinged the upstream/LLNL authors of varlist about this issue.

On Jul 29, 2014, at 11:38 AM, Nathan Hjelm wrote:

> The problem is the code in question does not check the return code of
> MPI_T_cvar_handle_alloc. We are returning an error and they still try
> to use the handle (which is stale). Uncomment this section of the code:
>
>     //if (MPI_T_ERR_INVALID_INDEX == err)// { NOTE TZI: This variable is not recognized by Mvapich. It is OpenMPI specific.
>     //    continue;
>
> Note that MPI_T_ERR_INVALID_INDEX is in the MPI-3 standard but mvapich
> must not have implemented it (and thus should not claim to be MPI 3.0).
>
> -Nathan
Re: [OMPI devel] MPI_T SEGV on DSO
The problem is the code in question does not check the return code of MPI_T_cvar_handle_alloc. We are returning an error and they still try to use the handle (which is stale). Uncomment this section of the code:

    //if (MPI_T_ERR_INVALID_INDEX == err)// { NOTE TZI: This variable is not recognized by Mvapich. It is OpenMPI specific.
    //    continue;

Note that MPI_T_ERR_INVALID_INDEX is in the MPI-3 standard but mvapich must not have implemented it (and thus should not claim to be MPI 3.0).

-Nathan

On Wed, Jul 30, 2014 at 12:04:55AM +0900, KAWASHIMA Takahiro wrote:
> Hi,
>
> I encountered the same SEGV reported on the users list when
> running varList program.
>
> http://www.open-mpi.org/community/lists/users/2014/07/24792.php
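The check Nathan describes can be sketched without an MPI installation. The code below is a toy stand-in, not the MPI_T API: toy_handle_alloc and toy_read are hypothetical functions that only mimic the roles of MPI_T_cvar_handle_alloc and MPI_T_cvar_read, and the two-entry table is invented for illustration. It shows the pattern of skipping an index when handle allocation fails instead of reading through a handle that was never set.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_SUCCESS = 0, TOY_ERR_INVALID_INDEX = 1 };
#define TOY_NUM_VARS 2

typedef struct { const int *value; } toy_handle_t;

static const int toy_priority = 50;

/* Index 0 is live. Index 1 stands for a variable whose component
 * has been unloaded, so it has no backing storage any more. */
static const int *toy_table[TOY_NUM_VARS] = { &toy_priority, NULL };

/* On failure the handle is left untouched, i.e. stale for the caller. */
static int toy_handle_alloc(int index, toy_handle_t *handle)
{
    if (index < 0 || index >= TOY_NUM_VARS || toy_table[index] == NULL) {
        return TOY_ERR_INVALID_INDEX;
    }
    handle->value = toy_table[index];
    return TOY_SUCCESS;
}

/* Reading is only legal through a successfully allocated handle. */
static int toy_read(const toy_handle_t *handle, int *out)
{
    assert(handle != NULL && handle->value != NULL);
    *out = *handle->value;
    return TOY_SUCCESS;
}

/* Visits every index, skips the ones the library rejects, and returns
 * how many variables were read; *sum receives the sum of their values. */
static int toy_list_vars(int *sum)
{
    int nread = 0;
    *sum = 0;
    for (int i = 0; i < TOY_NUM_VARS; i++) {
        toy_handle_t handle = { NULL };
        int value = 0;
        if (toy_handle_alloc(i, &handle) != TOY_SUCCESS) {
            continue;  /* the check that was commented out in the listing tool */
        }
        toy_read(&handle, &value);
        *sum += value;
        nread++;
    }
    return nread;
}
```

Without the continue, the loop would call toy_read on a handle that toy_handle_alloc never filled in, which is the toy equivalent of using the stale handle.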
[OMPI devel] MPI_T SEGV on DSO
Hi,

I encountered the same SEGV reported on the users list when running the varList program.

http://www.open-mpi.org/community/lists/users/2014/07/24792.php

mpiexec -n 1 ./varList:

... snip ...
  event                               U/D-2  CHAR  n/a  ALL
  event_base_verbose                  D/D-8  INT   n/a  LOCAL    0
  event_libevent2021_event_include    U/A-3  CHAR  n/a  LOCAL    poll
  opal_event_include                  U/A-3  CHAR  n/a  LOCAL    poll
  event_libevent2021_major_version    D/A-9  INT   n/a  UNKNOWN  1
  event_libevent2021_minor_version    D/A-9  INT   n/a  UNKNOWN  9
  event_libevent2021_release_version  D/A-9  INT   n/a  UNKNOWN  0
  shmem                               U/D-2  CHAR  n/a  ALL
  shmem_base_verbose                  D/D-8  INT   n/a  LOCAL    0
  shmem_base_RUNTIME_QUERY_hint       D/A-9  CHAR  n/a  ALL-EQ
  shmem_mmap_priority                 U/A-3  INT   n/a  ALL      50
  shmem_mmap_enable_nfs_warning       D/A-9  INT   n/a  LOCAL    true
  shmem_mmap_relocate_backing_file    D/A-9  INT   n/a  ALL      0
  shmem_mmap_backing_file_base_dir    D/A-9  CHAR  n/a  ALL      /dev/shm
  shmem_mmap_major_version            D/A-9  INT   n/a  UNKNOWN  1
  shmem_mmap_minor_version            D/A-9  INT   n/a  UNKNOWN  9
  shmem_mmap_release_version          D/A-9  INT   n/a  UNKNOWN  0
  shmem_posix_major_version           D/A-9  INT   n/a  UNKNOWN  1201644720
  shmem_posix_minor_version           D/A-9  INT   n/a  UNKNOWN  32756
  shmem_posix_release_version         D/A-9  INT   n/a  UNKNOWN  6
[ppc:12688] *** Process received signal ***
[ppc:12688] Signal: Segmentation fault (11)
[ppc:12688] Signal code: Invalid permissions (2)
[ppc:12688] Failing at address: 0x7ff4479f83d8
[ppc:12688] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x325c0)[0x7ff4493015c0]
[ppc:12688] [ 1] /home/rivis/opt/openmpi-trunk-debug/lib/libmpi.so.0(PMPI_T_cvar_read+0xbc)[0x7ff44970abb7]
[ppc:12688] [ 2] ./varlist(list_cvars+0x56a)[0x4029bc]
[ppc:12688] [ 3] ./varlist(main+0x42b)[0x403598]
[ppc:12688] [ 4] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x7ff4492edeed]
[ppc:12688] [ 5] ./varlist[0x4016c9]
[ppc:12688] *** End of error message ***

I tracked this error and found that it seems to be related to DSO.

The error occurs when accessing value->intval for the control variable shmem_sysv_major_version in MPI_T_cvar_read.

https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/tool/cvar_read.c

The 'value' was obtained by mca_base_var_get_value, and it points to mca_shmem_sysv_component.super.base_version.mca_component_major_version, which was dlclose'd in MPI_INIT for DSO. (The mmap component is selected in my environment.)

The abnormal shmem_posix_{major,minor,release}_version values in my output above have the same cause. The SEGV occurs if the memory has been returned to the kernel, and abnormal values are printed if it has not yet been returned.

So this SEGV doesn't occur if I configure Open MPI with the --disable-dlopen option. I think that is the reason why Nathan doesn't see this error.

Regards,
KAWASHIMA Takahiro