Re: [OMPI devel] MPI_T SEGV on DSO

2014-07-30 Thread Nathan Hjelm

Yup, just noticed that. All component variables should be registered
with mca_base_component_var_register, but the version variables were
registered with the generic register function. The code in question is
the oldest part of the MCA rewrite, so it was probably missed when the
component variable register function was added. Fixing now.
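Below is a minimal C sketch of the registration difference described in this exchange. It assumes a toy registry. Every name in it (sketch_registry_t, sketch_var_register, sketch_component_var_register, sketch_group_deregister, sketch_var_read and the SKETCH_FLAG_* bits) is hypothetical and is not the Open MPI API. Entries registered through the component path carry a "deregister with group" flag and are invalidated when their group is deregistered on unload. Entries registered through the generic path are not invalidated, so they keep pointing at the component's storage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch only; the real values are defined
 * in opal/mca/base/mca_base_var.h and may differ. */
#define SKETCH_FLAG_VALID 0x1u /* entry may still be read */
#define SKETCH_FLAG_DWG   0x2u /* deregister together with the owning group */
#define SKETCH_MAX_VARS   16

typedef struct {
    const char *name;
    const int *storage; /* points into memory owned by the component */
    int group;          /* owning component group, or -1 for none */
    unsigned flags;
} sketch_var_t;

typedef struct {
    sketch_var_t vars[SKETCH_MAX_VARS];
    int count;
} sketch_registry_t;

/* Generic path: the entry is not tied to any component group. */
static int sketch_var_register(sketch_registry_t *reg, const char *name,
                               const int *storage)
{
    assert(reg->count < SKETCH_MAX_VARS);
    sketch_var_t *v = &reg->vars[reg->count];
    v->name = name;
    v->storage = storage;
    v->group = -1;
    v->flags = SKETCH_FLAG_VALID;
    return reg->count++;
}

/* Component path: records the owning group and sets the DWG flag. */
static int sketch_component_var_register(sketch_registry_t *reg, int group,
                                         const char *name, const int *storage)
{
    int idx = sketch_var_register(reg, name, storage);
    reg->vars[idx].group = group;
    reg->vars[idx].flags |= SKETCH_FLAG_DWG;
    return idx;
}

/* Run before the component's DSO is closed: invalidates every DWG entry
 * of the group so its storage pointer can no longer be dereferenced. */
static void sketch_group_deregister(sketch_registry_t *reg, int group)
{
    for (int i = 0; i < reg->count; ++i) {
        sketch_var_t *v = &reg->vars[i];
        if (v->group == group && (v->flags & SKETCH_FLAG_DWG)) {
            v->flags &= ~SKETCH_FLAG_VALID;
            v->storage = NULL;
        }
    }
}

/* Returns 0 and stores the value in *out, or -1 for an unknown or
 * invalidated entry (the analogue of MPI_T_ERR_INVALID_INDEX). */
static int sketch_var_read(const sketch_registry_t *reg, int idx, int *out)
{
    if (idx < 0 || idx >= reg->count) {
        return -1;
    }
    const sketch_var_t *v = &reg->vars[idx];
    if (!(v->flags & SKETCH_FLAG_VALID)) {
        return -1;
    }
    *out = *v->storage;
    return 0;
}
```

In the sketch, an entry registered with sketch_var_register is still reported as readable after its group is deregistered. In the real case, that read goes through a pointer into a DSO that has already been closed. Registering the entry through the component path instead makes the read fail cleanly, which is the shape of the fix described above.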

-Nathan

On Thu, Jul 31, 2014 at 12:40:55AM +0900, KAWASHIMA Takahiro wrote:
> Nathan,
> 
> The difference seems to be in the flags used at registration.
> 
> Ordinary MCA variables such as shmem_sysv_priority have the
> MCA_BASE_VAR_FLAG_DWG flag, so they are deregistered through
> mca_base_var_group_deregister in mca_base_component_unload.
> 
> But shmem_sysv_major_version doesn't have the flag.
> 
> Regards,
> KAWASHIMA Takahiro

Re: [OMPI devel] MPI_T SEGV on DSO

2014-07-30 Thread Nathan Hjelm

This is odd. The variable in question is registered by the MCA itself. I
will take a look and see if I can determine why it isn't being
deregistered correctly when the rest of the component's parameters are.

-Nathan

Re: [OMPI devel] MPI_T SEGV on DSO

2014-07-29 Thread KAWASHIMA Takahiro
Nathan,

Thanks for your response.

Yes. The output in my previous mail was already produced with that
code uncommented. I have also pulled the latest varList source, in
which that section is now uncommented, but the result was the same.

If MPI_T_cvar_get_info is supposed to return MPI_T_ERR_INVALID_INDEX
for variables of unloaded components, then the problem is that it
does not return MPI_T_ERR_INVALID_INDEX for some of them.

I ran varList under GDB and found that MPI_T_cvar_get_info returns
MPI_T_ERR_INVALID_INDEX for shmem_sysv_priority (as expected), but
MPI_SUCCESS for shmem_sysv_major_version. The difference lies in the
mbv_flags values. At the MPI_T_cvar_get_info call, mbv_flags is 0x44
for shmem_sysv_priority, so mca_base_var_get in
opal/mca/base/mca_base_var.c returns OPAL_ERR_NOT_FOUND. For
shmem_sysv_major_version, however, mbv_flags is 0x10003, so
mca_base_var_get returns OPAL_SUCCESS.
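To make the mbv_flags observation concrete, here is a small C sketch that decodes the two values with a bit mask. The bit assignments are an assumption inferred from this message alone (0x40 as the DWG bit, 0x10000 as a "still valid" bit). They are not taken from opal/mca/base/mca_base_var.h, and the helper names flags_valid, flags_dwg and describe are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* ASSUMED bit values, inferred from 0x44 and 0x10003 in this thread; the
 * authoritative definitions are in opal/mca/base/mca_base_var.h. */
#define ASSUMED_FLAG_DWG   0x00040u
#define ASSUMED_FLAG_VALID 0x10000u

/* A lookup that only succeeds while the "valid" bit is set would return
 * not-found for a deregistered entry and success otherwise. */
static bool flags_valid(unsigned flags)
{
    return (flags & ASSUMED_FLAG_VALID) != 0;
}

static bool flags_dwg(unsigned flags)
{
    return (flags & ASSUMED_FLAG_DWG) != 0;
}

/* Prints which result such a lookup would give for the given flags. */
static void describe(const char *name, unsigned flags)
{
    assert(name != NULL);
    printf("%-26s 0x%05x valid=%d dwg=%d -> %s\n", name, flags,
           (int)flags_valid(flags), (int)flags_dwg(flags),
           flags_valid(flags) ? "OPAL_SUCCESS" : "OPAL_ERR_NOT_FOUND");
}
```

Under these assumed bits, describe("shmem_sysv_priority", 0x44) reports valid=0 and dwg=1, and describe("shmem_sysv_major_version", 0x10003) reports valid=1 and dwg=0. In other words, the entry that carried the DWG flag was invalidated on unload, and the one without it was not.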

Could it be that control variables for unloaded components are not
completely deregistered?

I can look into it further when I have time.

My environment:
  OS: Debian GNU/Linux wheezy
  CPU: x86_64
  Run: mpiexec -n 1 varList
  Open MPI source: trunk r32338 (almost latest)
  Open MPI configure:
enable_picky=yes
enable_debug=yes
enable_mem_debug=yes
enable_mem_profile=yes
enable_memchecker=no

enable_mca_no_build=btl-elan,btl-gm,btl-mx,btl-ofud,btl-portals,btl-sctp,btl-template,btl-udapl,common-mx,common-portals,ess-alps,ess-cnos,ess-lsf,ess-portals_utcp,ess-singleton,ess-slurm,grpcomm-cnos,mpool-fake,mtl,notifier,plm-alps,plm-ccp,plm-lsf,plm-process,plm-slurm,plm-submit,plm-tm,plm-xgrid,pml-cm,pml-csum,pml-example,pml-v,ras
enable_contrib_no_build=vt
enable_mpi_cxx=no
enable_mpi_f77=no
enable_mpi_f90=no
enable_ipv6=no
enable_mpi_io=no
with_devel_headers=no
with_wrapper_cflags=-g
with_wrapper_cxxflags=-g
with_wrapper_fflags=-g
with_wrapper_fcflags=-g

Regards,
KAWASHIMA Takahiro

Re: [OMPI devel] MPI_T SEGV on DSO

2014-07-29 Thread Jeff Squyres (jsquyres)
FYI: We have pinged the upstream/LLNL authors of varlist about this issue.


Re: [OMPI devel] MPI_T SEGV on DSO

2014-07-29 Thread Nathan Hjelm

The problem is that the code in question does not check the return code
of MPI_T_cvar_handle_alloc. We return an error, but they still try to
use the handle (which is stale). Uncomment this section of the code:


//if (MPI_T_ERR_INVALID_INDEX == err)// { NOTE TZI: This variable is not recognized by Mvapich. It is OpenMPI specific.
//  continue;


Note that MPI_T_ERR_INVALID_INDEX is in the MPI-3 standard but mvapich
must not have implemented it (and thus should not claim to be MPI 3.0).
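The caller-side check requested here can be sketched in plain C without an MPI installation. The functions and error codes below are hypothetical stand-ins for MPI_T_cvar_handle_alloc, MPI_SUCCESS and MPI_T_ERR_INVALID_INDEX. A real tool would include <mpi.h> and call the MPI_T routines directly. The point is only the control flow: never touch a handle whose allocation failed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for MPI_SUCCESS and MPI_T_ERR_INVALID_INDEX. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_INVALID_INDEX = 1 };

typedef struct {
    int value;
    int valid; /* 0 once the owning component has been unloaded */
} sketch_cvar_t;

typedef struct {
    const int *storage;
} sketch_handle_t;

/* Stand-in for MPI_T_cvar_handle_alloc: fails for invalid variables and
 * leaves *h untouched in that case. */
static int sketch_handle_alloc(const sketch_cvar_t *cv, sketch_handle_t *h)
{
    assert(cv != NULL && h != NULL);
    if (!cv->valid) {
        return SKETCH_ERR_INVALID_INDEX;
    }
    h->storage = &cv->value;
    return SKETCH_SUCCESS;
}

/* Reads every variable into out[], skipping indices reported as invalid
 * instead of using a handle that was never filled in. Returns the number
 * of values read, or -1 on any other error. */
static int sketch_list_cvars(const sketch_cvar_t *cvars, int n, int *out)
{
    int nread = 0;
    for (int i = 0; i < n; ++i) {
        sketch_handle_t h;
        int err = sketch_handle_alloc(&cvars[i], &h);
        if (err == SKETCH_ERR_INVALID_INDEX) {
            continue; /* variable of an unloaded component */
        }
        if (err != SKETCH_SUCCESS) {
            return -1;
        }
        out[nread++] = *h.storage;
    }
    return nread;
}
```

This check only protects the tool when the library actually reports the stale index. As the follow-up in this thread shows, the version variables were still reported as valid, so the tool-side check alone did not avoid the crash.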

-Nathan

[OMPI devel] MPI_T SEGV on DSO

2014-07-29 Thread KAWASHIMA Takahiro
Hi,

I encountered the same SEGV that was reported on the users list when
running the varList program.

  http://www.open-mpi.org/community/lists/users/2014/07/24792.php

mpiexec -n 1 ./varList:

... snip ...
event                               U/D-2  CHAR  n/a  ALL
event_base_verbose                  D/D-8  INT   n/a  LOCAL    0
event_libevent2021_event_include    U/A-3  CHAR  n/a  LOCAL    poll
opal_event_include                  U/A-3  CHAR  n/a  LOCAL    poll
event_libevent2021_major_version    D/A-9  INT   n/a  UNKNOWN  1
event_libevent2021_minor_version    D/A-9  INT   n/a  UNKNOWN  9
event_libevent2021_release_version  D/A-9  INT   n/a  UNKNOWN  0
shmem                               U/D-2  CHAR  n/a  ALL
shmem_base_verbose                  D/D-8  INT   n/a  LOCAL    0
shmem_base_RUNTIME_QUERY_hint       D/A-9  CHAR  n/a  ALL-EQ
shmem_mmap_priority                 U/A-3  INT   n/a  ALL      50
shmem_mmap_enable_nfs_warning       D/A-9  INT   n/a  LOCAL    true
shmem_mmap_relocate_backing_file    D/A-9  INT   n/a  ALL      0
shmem_mmap_backing_file_base_dir    D/A-9  CHAR  n/a  ALL      /dev/shm
shmem_mmap_major_version            D/A-9  INT   n/a  UNKNOWN  1
shmem_mmap_minor_version            D/A-9  INT   n/a  UNKNOWN  9
shmem_mmap_release_version          D/A-9  INT   n/a  UNKNOWN  0
shmem_posix_major_version           D/A-9  INT   n/a  UNKNOWN  1201644720
shmem_posix_minor_version           D/A-9  INT   n/a  UNKNOWN  32756
shmem_posix_release_version         D/A-9  INT   n/a  UNKNOWN  6
[ppc:12688] *** Process received signal ***
[ppc:12688] Signal: Segmentation fault (11)
[ppc:12688] Signal code: Invalid permissions (2)
[ppc:12688] Failing at address: 0x7ff4479f83d8
[ppc:12688] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x325c0)[0x7ff4493015c0]
[ppc:12688] [ 1] /home/rivis/opt/openmpi-trunk-debug/lib/libmpi.so.0(PMPI_T_cvar_read+0xbc)[0x7ff44970abb7]
[ppc:12688] [ 2] ./varlist(list_cvars+0x56a)[0x4029bc]
[ppc:12688] [ 3] ./varlist(main+0x42b)[0x403598]
[ppc:12688] [ 4] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x7ff4492edeed]
[ppc:12688] [ 5] ./varlist[0x4016c9]
[ppc:12688] *** End of error message ***


I tracked this error down, and it seems to be related to DSO builds.

The error occurs when accessing value->intval for the
control variable shmem_sysv_major_version in MPI_T_cvar_read.

  https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/tool/cvar_read.c

The 'value' was obtained by mca_base_var_get_value, and it points to
mca_shmem_sysv_component.super.base_version.mca_component_major_version,
which lives in a DSO that was dlclose'd during MPI_INIT.
(The mmap component is selected in my environment, so the sysv
component is unloaded.)

The abnormal shmem_posix_{major,minor,release}_version values in my
output above have the same cause: a SEGV occurs if the memory has
already been returned to the kernel, and garbage values are printed
if it has not.

So this SEGV does not occur if I configure Open MPI with the
--disable-dlopen option. I think that is why Nathan doesn't see
this error.
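The dangling pointer described above can be avoided in more than one way. The fix adopted in this thread is to deregister the version variables when the component is unloaded. Another design, sketched here only for comparison and not what Open MPI does, is to copy values that never change (such as version numbers) into registry-owned storage at registration time. The component is modeled as heap memory so that "unloading" is just free(). All names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the part of a component structure that lives
 * in the DSO; heap-allocated here so that unloading is simply free(). */
typedef struct {
    int major_version;
    int minor_version;
    int release_version;
} sketch_component_t;

static sketch_component_t *sketch_component_load(int major_v, int minor_v,
                                                 int release_v)
{
    sketch_component_t *c = malloc(sizeof *c);
    if (c != NULL) {
        c->major_version = major_v;
        c->minor_version = minor_v;
        c->release_version = release_v;
    }
    return c;
}

static void sketch_component_unload(sketch_component_t *c)
{
    free(c);
}

/* Registry entry that owns a copy of a read-only value instead of a pointer
 * into the component. The name must not point into the component either. */
typedef struct {
    const char *name;
    int value;
} sketch_const_var_t;

static sketch_const_var_t sketch_register_const(const char *name,
                                                const int *storage)
{
    assert(storage != NULL);
    sketch_const_var_t v = { name, *storage }; /* copy, do not alias */
    return v;
}
```

This only suits values that cannot change after registration. For settable variables, invalidating the entry when the component is unloaded remains necessary.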

Regards,
KAWASHIMA Takahiro