> On Oct 26, 2014, at 11:12 PM, Gilles Gouaillardet 
> <gilles.gouaillar...@iferc.org> wrote:
> 
> Ralph,
> 
> this is also a solution.
> the pro is that it seems more lightweight than PR #249
> the two cons I can see are:
> - opal_process_name_t alignment goes from 64 to 32 bits
> - some functions (opal_hash_table_*) take a uint64_t as an argument, so we
> still need to use memcpy in order to
>  * guarantee 64-bit alignment on some archs (such as SPARC)
>  * avoid ugly casts such as uint64_t id = *(uint64_t *)&process_name;
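
Just to spell out the memcpy dance that second con refers to, it is roughly this (an illustrative sketch, not the actual PR #249 code; "table" and "proc_name" are placeholder variables):

    /* needs <string.h> and opal/class/opal_hash_table.h */
    opal_hash_table_t *table;        /* placeholder: an existing uint64-keyed table */
    opal_process_name_t proc_name;   /* placeholder: the name we want to look up */
    void *value;
    uint64_t key;

    /* memcpy instead of "key = *(uint64_t *)&proc_name", so a name that is
     * only 32-bit aligned is never read with a single 64-bit load */
    memcpy(&key, &proc_name, sizeof(key));
    opal_hash_table_get_value_uint64(table, key, &value);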

Actually, I propose to also remove that issue. Simple enough to use a 
hash_table_32 to handle the jobids, and let that point to a hash_table_32 of 
vpids. Since we rarely have more than one jobid anyway, the memory overhead 
actually decreases with this model, and we get rid of that annoying need to 
memcpy everything.
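
Roughly what I have in mind (an untested sketch, not the actual patch; "jobids", "name", and the table size are placeholders):

    opal_hash_table_t *jobids;   /* outer table: 32-bit jobid -> inner table */
    opal_hash_table_t *vpids;    /* inner table: 32-bit vpid  -> proc object */
    void *proc;

    if (OPAL_SUCCESS != opal_hash_table_get_value_uint32(jobids, name.jobid,
                                                         (void **)&vpids)) {
        vpids = OBJ_NEW(opal_hash_table_t);
        opal_hash_table_init(vpids, 32);
        opal_hash_table_set_value_uint32(jobids, name.jobid, vpids);
    }
    opal_hash_table_get_value_uint32(vpids, name.vpid, &proc);

Both lookups use plain 32-bit keys, so nothing ever has to be packed into a uint64_t or memcpy'd for alignment.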

> 
> As far as I am concerned, I am fine with your suggestion to dump
> opal_identifier_t.
> 
> About the patch: did you mean you have something ready that I can apply
> to my PR, or do you expect me to make the changes myself (I am OK with
> doing that if needed)?

Why don’t I grab your branch, create a separate repo based on it (just to keep 
things clean), push it to my area and give you write access? We can then 
collaborate on the changes and create a PR from there. This way, you don’t need 
to give me write access to your entire repo.

Make sense?
Ralph

> 
> Cheers,
> 
> Gilles
> 
> On 2014/10/27 11:04, Ralph Castain wrote:
>> Just took a glance thru 249 and have a few suggestions on it - will pass 
>> them along tomorrow. I think the right solution is to (a) dump 
>> opal_identifier_t in favor of using opal_process_name_t everywhere in the 
>> opal layer, (b) typedef orte_process_name_t to opal_process_name_t, and (c) 
>> leave ompi_process_name_t as typedef’d to the RTE component in the MPI 
>> layer. This lets other RTEs decide for themselves how they want to handle it.
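
In code terms, (b) and (c) boil down to something like this (a sketch of the idea only, not the actual patch):

    /* (b) in the orte layer */
    typedef opal_process_name_t orte_process_name_t;

    /* (c) in the MPI layer, supplied by whichever rte component is built;
     * for the orte-based component it would simply be */
    typedef orte_process_name_t ompi_process_name_t;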
>> 
>> If you add changes to your branch, I can pass you a patch with my suggested 
>> alterations.
>> 
>>> On Oct 26, 2014, at 5:55 PM, Gilles Gouaillardet 
>>> <gilles.gouaillar...@gmail.com> wrote:
>>> 
>>> No :-(
>>> I need to do some extra work to stop declaring orte_process_name_t and
>>> ompi_process_name_t variables.
>>> #249 will make things much easier.
>>> One option is to use opal_process_name_t everywhere, or to typedef the
>>> orte and ompi types to the opal one.
>>> Another option (lightweight but error-prone imho) is to change the
>>> variable declarations only.
>>> Any thoughts?
>>> 
>>> Ralph Castain <r...@open-mpi.org> wrote:
>>>> Will PR#249 solve it? If so, we should just go with it as I suspect that 
>>>> is the long-term solution.
>>>> 
>>>>> On Oct 26, 2014, at 4:25 PM, Gilles Gouaillardet 
>>>>> <gilles.gouaillar...@gmail.com> wrote:
>>>>> 
>>>>> It looks like we faced a similar issue:
>>>>> opal_process_name_t is 64-bit aligned whereas orte_process_name_t is
>>>>> 32-bit aligned. If you run on an alignment-sensitive CPU such as SPARC
>>>>> and you are not lucky (so to speak), you can run into this issue.
>>>>> I will make a patch for this shortly.
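
For the archives, the failure mode Gilles describes is essentially this (illustrative only, not the real definitions):

    /* a 32-bit-aligned name: two 32-bit fields, 4-byte alignment */
    struct { uint32_t jobid; uint32_t vpid; } name;

    /* reinterpreting it as a 64-bit identifier forces an 8-byte load from an
     * address that may only be 4-byte aligned - a bus error on SPARC */
    uint64_t id = *(uint64_t *)&name;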
>>>>> 
>>>>> Ralph Castain <r...@open-mpi.org> wrote:
>>>>>> Afraid this must be something about SPARC - I just ran on a Solaris 11
>>>>>> x86 box and everything works fine.
>>>>>> 
>>>>>> 
>>>>>>> On Oct 26, 2014, at 8:22 AM, Siegmar Gross 
>>>>>>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>>>>>> 
>>>>>>> Hi Gilles,
>>>>>>> 
>>>>>>> I wanted to explore which function is called when I call MPI_Init
>>>>>>> in a C program, because the same function should be called from a
>>>>>>> Java program as well. Unfortunately, C programs once more break with
>>>>>>> a Bus Error for openmpi-dev-124-g91e9686 on Solaris. I assume that's
>>>>>>> the reason why I get no useful backtrace for my Java program.
>>>>>>> 
>>>>>>> tyr small_prog 117 mpicc -o init_finalize init_finalize.c
>>>>>>> tyr small_prog 118 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
>>>>>>> ...
>>>>>>> (gdb) run -np 1 init_finalize
>>>>>>> Starting program: /usr/local/openmpi-1.9.0_64_gcc/bin/mpiexec -np 1 
>>>>>>> init_finalize
>>>>>>> [Thread debugging using libthread_db enabled]
>>>>>>> [New Thread 1 (LWP 1)]
>>>>>>> [New LWP    2        ]
>>>>>>> [tyr:19240] *** Process received signal ***
>>>>>>> [tyr:19240] Signal: Bus Error (10)
>>>>>>> [tyr:19240] Signal code: Invalid address alignment (1)
>>>>>>> [tyr:19240] Failing at address: ffffffff7bd1c10c
>>>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-pal.so.0.0.0:opal_backtrace_print+0x2c
>>>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-pal.so.0.0.0:0xdcc04
>>>>>>> /lib/sparcv9/libc.so.1:0xd8b98
>>>>>>> /lib/sparcv9/libc.so.1:0xcc70c
>>>>>>> /lib/sparcv9/libc.so.1:0xcc918
>>>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-pal.so.0.0.0:opal_proc_set_name+0x1c
>>>>>>>  [ Signal 10 (BUS)]
>>>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/openmpi/mca_pmix_native.so:0x103e8
>>>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/openmpi/mca_ess_pmi.so:0x33dc
>>>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-rte.so.0.0.0:orte_init+0x67c
>>>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi.so.0.0.0:ompi_mpi_init+0x374
>>>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi.so.0.0.0:PMPI_Init+0x2a8
>>>>>>> /home/fd1026/work/skripte/master/parallel/prog/mpi/small_prog/init_finalize:main+0x20
>>>>>>> /home/fd1026/work/skripte/master/parallel/prog/mpi/small_prog/init_finalize:_start+0x7c
>>>>>>> [tyr:19240] *** End of error message ***
>>>>>>> --------------------------------------------------------------------------
>>>>>>> mpiexec noticed that process rank 0 with PID 0 on node tyr exited on 
>>>>>>> signal 10 (Bus Error).
>>>>>>> --------------------------------------------------------------------------
>>>>>>> [LWP    2         exited]
>>>>>>> [New Thread 2        ]
>>>>>>> [Switching to Thread 1 (LWP 1)]
>>>>>>> sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to 
>>>>>>> satisfy query
>>>>>>> (gdb) bt
>>>>>>> #0  0xffffffff7f6173d0 in rtld_db_dlactivity () from 
>>>>>>> /usr/lib/sparcv9/ld.so.1
>>>>>>> #1  0xffffffff7f6175a8 in rd_event () from /usr/lib/sparcv9/ld.so.1
>>>>>>> #2  0xffffffff7f618950 in lm_delete () from /usr/lib/sparcv9/ld.so.1
>>>>>>> #3  0xffffffff7f6226bc in remove_so () from /usr/lib/sparcv9/ld.so.1
>>>>>>> #4  0xffffffff7f624574 in remove_hdl () from /usr/lib/sparcv9/ld.so.1
>>>>>>> #5  0xffffffff7f61d97c in dlclose_core () from /usr/lib/sparcv9/ld.so.1
>>>>>>> #6  0xffffffff7f61d9d4 in dlclose_intn () from /usr/lib/sparcv9/ld.so.1
>>>>>>> #7  0xffffffff7f61db0c in dlclose () from /usr/lib/sparcv9/ld.so.1
>>>>>>> #8  0xffffffff7ec87f60 in vm_close (loader_data=0x0, 
>>>>>>> module=0xffffffff7c901fe0)
>>>>>>> at ../../../openmpi-dev-124-g91e9686/opal/libltdl/loaders/dlopen.c:212
>>>>>>> #9  0xffffffff7ec85534 in lt_dlclose (handle=0x100189b50)
>>>>>>> at ../../../openmpi-dev-124-g91e9686/opal/libltdl/ltdl.c:1982
>>>>>>> #10 0xffffffff7ecaabd4 in ri_destructor (obj=0x1001893a0)
>>>>>>> at 
>>>>>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_component_repository.c:382
>>>>>>> #11 0xffffffff7eca9504 in opal_obj_run_destructors (object=0x1001893a0)
>>>>>>> at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:446
>>>>>>> #12 0xffffffff7ecaa474 in mca_base_component_repository_release (
>>>>>>> component=0xffffffff7b1236f0 <mca_oob_tcp_component>)
>>>>>>> at 
>>>>>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_component_repository.c:240
>>>>>>> #13 0xffffffff7ecac774 in mca_base_component_unload (
>>>>>>> component=0xffffffff7b1236f0 <mca_oob_tcp_component>, output_id=-1)
>>>>>>> at 
>>>>>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_components_close.c:47
>>>>>>> #14 0xffffffff7ecac808 in mca_base_component_close (
>>>>>>> component=0xffffffff7b1236f0 <mca_oob_tcp_component>, output_id=-1)
>>>>>>> at 
>>>>>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_components_close.c:60
>>>>>>> #15 0xffffffff7ecac8dc in mca_base_components_close (output_id=-1, 
>>>>>>> components=0xffffffff7f14ba58 <orte_oob_base_framework+80>, skip=0x0)
>>>>>>> at 
>>>>>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_components_close.c:86
>>>>>>> #16 0xffffffff7ecac844 in mca_base_framework_components_close (
>>>>>>> framework=0xffffffff7f14ba08 <orte_oob_base_framework>, skip=0x0)
>>>>>>> at 
>>>>>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_components_close.c:66
>>>>>>> #17 0xffffffff7efcaf58 in orte_oob_base_close ()
>>>>>>> at 
>>>>>>> ../../../../openmpi-dev-124-g91e9686/orte/mca/oob/base/oob_base_frame.c:112
>>>>>>> #18 0xffffffff7ecc136c in mca_base_framework_close (
>>>>>>> framework=0xffffffff7f14ba08 <orte_oob_base_framework>)
>>>>>>> at 
>>>>>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_framework.c:187
>>>>>>> #19 0xffffffff7be07858 in rte_finalize ()
>>>>>>> at 
>>>>>>> ../../../../../openmpi-dev-124-g91e9686/orte/mca/ess/hnp/ess_hnp_module.c:857
>>>>>>> #20 0xffffffff7ef338a4 in orte_finalize ()
>>>>>>> at ../../openmpi-dev-124-g91e9686/orte/runtime/orte_finalize.c:66
>>>>>>> #21 0x000000010000723c in orterun (argc=4, argv=0xffffffff7fffe0b8)
>>>>>>> at 
>>>>>>> ../../../../openmpi-dev-124-g91e9686/orte/tools/orterun/orterun.c:1103
>>>>>>> #22 0x0000000100003e80 in main (argc=4, argv=0xffffffff7fffe0b8)
>>>>>>> at ../../../../openmpi-dev-124-g91e9686/orte/tools/orterun/main.c:13
>>>>>>> (gdb) 
>>>>>>> 
>>>>>>> Kind regards
>>>>>>> 
>>>>>>> Siegmar
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> thank you very much for the quick tutorial. Unfortunately I still
>>>>>>>> can't get a backtrace.
>>>>>>>> 
>>>>>>>>> You might need to configure with --enable-debug and add -g -O0
>>>>>>>>> to your CFLAGS and LDFLAGS.
>>>>>>>>> 
>>>>>>>>> Then once you attach with gdb, you have to find the thread that is
>>>>>>>>> polling:
>>>>>>>>> thread 1
>>>>>>>>> bt
>>>>>>>>> thread 2
>>>>>>>>> bt
>>>>>>>>> and so on until you find the right thread.
>>>>>>>>> If _dbg is a local variable, you need to select the right frame
>>>>>>>>> before you can change the value:
>>>>>>>>> get the frame number from bt (generally 1 under Linux)
>>>>>>>>> f <frame number>
>>>>>>>>> set _dbg=0
>>>>>>>>> 
>>>>>>>>> I hope this helps
>>>>>>>> "--enable-debug" is one of my default options. Now I used the
>>>>>>>> following command to configure Open MPI. I always start the
>>>>>>>> build process in an empty directory and I always remove
>>>>>>>> /usr/local/openmpi-1.9.0_64_gcc, before I install a new version.
>>>>>>>> 
>>>>>>>> tyr openmpi-dev-124-g91e9686-SunOS.sparc.64_gcc 112 head config.log \
>>>>>>>> | grep openmpi
>>>>>>>> $ ../openmpi-dev-124-g91e9686/configure
>>>>>>>> --prefix=/usr/local/openmpi-1.9.0_64_gcc
>>>>>>>> --libdir=/usr/local/openmpi-1.9.0_64_gcc/lib64
>>>>>>>> --with-jdk-bindir=/usr/local/jdk1.8.0/bin
>>>>>>>> --with-jdk-headers=/usr/local/jdk1.8.0/include
>>>>>>>> JAVA_HOME=/usr/local/jdk1.8.0
>>>>>>>> LDFLAGS=-m64 -g -O0 CC=gcc CXX=g++ FC=gfortran
>>>>>>>> CFLAGS=-m64 -D_REENTRANT -g -O0
>>>>>>>> CXXFLAGS=-m64 FCFLAGS=-m64 CPP=cpp CXXCPP=cpp
>>>>>>>> CPPFLAGS=-D_REENTRANT CXXCPPFLAGS=
>>>>>>>> --enable-mpi-cxx --enable-cxx-exceptions --enable-mpi-java
>>>>>>>> --enable-heterogeneous --enable-mpi-thread-multiple
>>>>>>>> --with-threads=posix --with-hwloc=internal --without-verbs
>>>>>>>> --with-wrapper-cflags=-std=c11 -m64 --enable-debug
>>>>>>>> tyr openmpi-dev-124-g91e9686-SunOS.sparc.64_gcc 113 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> "gbd" doesn't allow any backtrace for any thread.
>>>>>>>> 
>>>>>>>> tyr java 124 /usr/local/gdb-7.6.1_64_gcc/bin/gdb
>>>>>>>> GNU gdb (GDB) 7.6.1
>>>>>>>> ...
>>>>>>>> (gdb) attach 18876
>>>>>>>> Attaching to process 18876
>>>>>>>> [New process 18876]
>>>>>>>> Retry #1:
>>>>>>>> Retry #2:
>>>>>>>> Retry #3:
>>>>>>>> Retry #4:
>>>>>>>> 0x7eadcb04 in ?? ()
>>>>>>>> (gdb) info threads
>>>>>>>> [New LWP 12]
>>>>>>>> [New LWP 11]
>>>>>>>> [New LWP 10]
>>>>>>>> [New LWP 9]
>>>>>>>> [New LWP 8]
>>>>>>>> [New LWP 7]
>>>>>>>> [New LWP 6]
>>>>>>>> [New LWP 5]
>>>>>>>> [New LWP 4]
>>>>>>>> [New LWP 3]
>>>>>>>> [New LWP 2]
>>>>>>>> Id   Target Id         Frame 
>>>>>>>> 12   LWP 2             0x7eadc6b0 in ?? ()
>>>>>>>> 11   LWP 3             0x7eadcbb8 in ?? ()
>>>>>>>> 10   LWP 4             0x7eadcbb8 in ?? ()
>>>>>>>> 9    LWP 5             0x7eadcbb8 in ?? ()
>>>>>>>> 8    LWP 6             0x7eadcbb8 in ?? ()
>>>>>>>> 7    LWP 7             0x7eadcbb8 in ?? ()
>>>>>>>> 6    LWP 8             0x7ead8b0c in ?? ()
>>>>>>>> 5    LWP 9             0x7eadcbb8 in ?? ()
>>>>>>>> 4    LWP 10            0x7eadcbb8 in ?? ()
>>>>>>>> 3    LWP 11            0x7eadcbb8 in ?? ()
>>>>>>>> 2    LWP 12            0x7eadcbb8 in ?? ()
>>>>>>>> * 1    LWP 1             0x7eadcb04 in ?? ()
>>>>>>>> (gdb) thread 1
>>>>>>>> [Switching to thread 1 (LWP 1)]
>>>>>>>> #0  0x7eadcb04 in ?? ()
>>>>>>>> (gdb) bt
>>>>>>>> #0  0x7eadcb04 in ?? ()
>>>>>>>> #1  0x7eaca12c in ?? ()
>>>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt 
>>>>>>>> stack?)
>>>>>>>> (gdb) thread 2
>>>>>>>> [Switching to thread 2 (LWP 12)]
>>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>>> (gdb) bt
>>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>>> #1  0x7eac2638 in ?? ()
>>>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt 
>>>>>>>> stack?)
>>>>>>>> (gdb) thread 3
>>>>>>>> [Switching to thread 3 (LWP 11)]
>>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>>> (gdb) bt
>>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>>> #1  0x7eac25a8 in ?? ()
>>>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt 
>>>>>>>> stack?)
>>>>>>>> (gdb) thread 4
>>>>>>>> [Switching to thread 4 (LWP 10)]
>>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>>> (gdb) bt
>>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>>> #1  0x7eac2638 in ?? ()
>>>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt 
>>>>>>>> stack?)
>>>>>>>> (gdb) thread 5
>>>>>>>> [Switching to thread 5 (LWP 9)]
>>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>>> (gdb) bt
>>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>>> #1  0x7eac2638 in ?? ()
>>>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt 
>>>>>>>> stack?)
>>>>>>>> (gdb) thread 6
>>>>>>>> [Switching to thread 6 (LWP 8)]
>>>>>>>> #0  0x7ead8b0c in ?? ()
>>>>>>>> (gdb) bt
>>>>>>>> #0  0x7ead8b0c in ?? ()
>>>>>>>> #1  0x7eacbcb0 in ?? ()
>>>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt 
>>>>>>>> stack?)
>>>>>>>> (gdb) thread 7
>>>>>>>> [Switching to thread 7 (LWP 7)]
>>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>>> (gdb) bt
>>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>>> #1  0x7eac25a8 in ?? ()
>>>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt 
>>>>>>>> stack?)
>>>>>>>> (gdb) thread 8
>>>>>>>> [Switching to thread 8 (LWP 6)]
>>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>>> (gdb) bt
>>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>>> #1  0x7eac25a8 in ?? ()
>>>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt 
>>>>>>>> stack?)
>>>>>>>> (gdb) thread 9
>>>>>>>> [Switching to thread 9 (LWP 5)]
>>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>>> (gdb) bt
>>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>>> #1  0x7eac2638 in ?? ()
>>>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt 
>>>>>>>> stack?)
>>>>>>>> (gdb) thread 10
>>>>>>>> [Switching to thread 10 (LWP 4)]
>>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>>> (gdb) bt
>>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>>> #1  0x7eac25a8 in ?? ()
>>>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt 
>>>>>>>> stack?)
>>>>>>>> (gdb) thread 11
>>>>>>>> [Switching to thread 11 (LWP 3)]
>>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>>> (gdb) bt
>>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>>> #1  0x7eac25a8 in ?? ()
>>>>>>>> Backtrace stopped: previous frame identical to this frame (corrupt 
>>>>>>>> stack?)
>>>>>>>> (gdb) thread 12
>>>>>>>> [Switching to thread 12 (LWP 2)]
>>>>>>>> #0  0x7eadc6b0 in ?? ()
>>>>>>>> (gdb) 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I also tried to set _dbg in all available frames without success.
>>>>>>>> 
>>>>>>>> (gdb) f 1
>>>>>>>> #1  0x7eacb46c in ?? ()
>>>>>>>> (gdb) set _dbg=0
>>>>>>>> No symbol table is loaded.  Use the "file" command.
>>>>>>>> (gdb) symbol-file /usr/local/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so
>>>>>>>> Reading symbols from 
>>>>>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so.0.0.0...done.
>>>>>>>> (gdb) f 1
>>>>>>>> #1  0x7eacb46c in ?? ()
>>>>>>>> (gdb) set _dbg=0
>>>>>>>> No symbol "_dbg" in current context.
>>>>>>>> (gdb) f 2
>>>>>>>> #0  0x00000000 in ?? ()
>>>>>>>> (gdb) set _dbg=0
>>>>>>>> No symbol "_dbg" in current context.
>>>>>>>> (gdb) 
>>>>>>>> ...
>>>>>>>> 
>>>>>>>> 
>>>>>>>> With "list" I get source code from mpi_CartComm.c and not from 
>>>>>>>> mpi_MPI.c.
>>>>>>>> If I switch threads, "list" continues in the old file.
>>>>>>>> 
>>>>>>>> (gdb) thread 1
>>>>>>>> [Switching to thread 1 (LWP 1)]
>>>>>>>> #0  0x7eadcb04 in ?? ()
>>>>>>>> (gdb) list 36
>>>>>>>> 31          distributed under the License is distributed on an "AS IS" 
>>>>>>>> BASIS,
>>>>>>>> 32          WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either 
>>>>>>>> express or implied.
>>>>>>>> 33          See the License for the specific language governing 
>>>>>>>> permissions and
>>>>>>>> 34          limitations under the License.
>>>>>>>> 35      */
>>>>>>>> 36      /*
>>>>>>>> 37       * File         : mpi_CartComm.c
>>>>>>>> 38       * Headerfile   : mpi_CartComm.h
>>>>>>>> 39       * Author       : Sung-Hoon Ko, Xinying Li
>>>>>>>> 40       * Created      : Thu Apr  9 12:22:15 1998
>>>>>>>> (gdb) thread 2
>>>>>>>> [Switching to thread 2 (LWP 12)]
>>>>>>>> #0  0x7eadcbb8 in ?? ()
>>>>>>>> (gdb) list
>>>>>>>> 41       * Revision     : $Revision: 1.6 $
>>>>>>>> 42       * Updated      : $Date: 2003/01/16 16:39:34 $
>>>>>>>> 43       * Copyright: Northeast Parallel Architectures Center
>>>>>>>> 44       *            at Syracuse University 1998
>>>>>>>> 45       */
>>>>>>>> 46      #include "ompi_config.h"
>>>>>>>> 47      
>>>>>>>> 48      #include <stdlib.h>
>>>>>>>> 49      #ifdef HAVE_TARGETCONDITIONALS_H
>>>>>>>> 50      #include <TargetConditionals.h>
>>>>>>>> (gdb) 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Do you have any idea what's going wrong, or whether I must use a
>>>>>>>> different symbol table?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Kind regards
>>>>>>>> 
>>>>>>>> Siegmar
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> Gilles
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>>>>>>>>> Hi Gilles,
>>>>>>>>>> 
>>>>>>>>>> I changed _dbg to a static variable, so that it is visible in the
>>>>>>>>>> library, but unfortunately still not in the symbol table.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> tyr java 419 nm /usr/local/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so 
>>>>>>>>>> | grep -i _dbg
>>>>>>>>>> [271]   |  1249644|     4|OBJT |LOCL |0    |18     |_dbg.14258
>>>>>>>>>> tyr java 420 /usr/local/gdb-7.6.1_64_gcc/bin/gdb
>>>>>>>>>> GNU gdb (GDB) 7.6.1
>>>>>>>>>> ...
>>>>>>>>>> (gdb) attach 13019
>>>>>>>>>> Attaching to process 13019
>>>>>>>>>> [New process 13019]
>>>>>>>>>> Retry #1:
>>>>>>>>>> Retry #2:
>>>>>>>>>> Retry #3:
>>>>>>>>>> Retry #4:
>>>>>>>>>> 0x7eadcb04 in ?? ()
>>>>>>>>>> (gdb) symbol-file /usr/local/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so
>>>>>>>>>> Reading symbols from /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so.0.0.0...done.
>>>>>>>>>> (gdb) set var _dbg.14258=0
>>>>>>>>>> No symbol "_dbg" in current context.
>>>>>>>>>> (gdb) 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Kind regards
>>>>>>>>>> 
>>>>>>>>>> Siegmar
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> unfortunately I didn't get anything useful. It's probably my fault,
>>>>>>>>>>> because I'm still not very familiar with gdb or any other debugger.
>>>>>>>>>>> I did the following things.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 1st window:
>>>>>>>>>>> -----------
>>>>>>>>>>> 
>>>>>>>>>>> tyr java 174 setenv OMPI_ATTACH 1
>>>>>>>>>>> tyr java 175 mpijavac InitFinalizeMain.java 
>>>>>>>>>>> warning: [path] bad path element
>>>>>>>>>>> "/usr/local/openmpi-1.9.0_64_gcc/lib64/shmem.jar":
>>>>>>>>>>> no such file or directory
>>>>>>>>>>> 1 warning
>>>>>>>>>>> tyr java 176 mpiexec -np 1 java InitFinalizeMain
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 2nd window:
>>>>>>>>>>> -----------
>>>>>>>>>>> 
>>>>>>>>>>> tyr java 379 ps -aef | grep java
>>>>>>>>>>> noaccess  1345     1   0   May 22 ?         113:23 /usr/java/bin/java -server -Xmx128m -XX:+UseParallelGC -XX:ParallelGCThreads=4
>>>>>>>>>>> fd1026  3661 10753   0 14:09:12 pts/14      0:00 mpiexec -np 1 java InitFinalizeMain
>>>>>>>>>>> fd1026  3677 13371   0 14:16:55 pts/2       0:00 grep java
>>>>>>>>>>> fd1026  3663  3661   0 14:09:12 pts/14      0:01 java -cp /home/fd1026/work/skripte/master/parallel/prog/mpi/java:/usr/local/jun
>>>>>>>>>>> tyr java 380 /usr/local/gdb-7.6.1_64_gcc/bin/gdb
>>>>>>>>>>> GNU gdb (GDB) 7.6.1
>>>>>>>>>>> ...
>>>>>>>>>>> (gdb) attach 3663
>>>>>>>>>>> Attaching to process 3663
>>>>>>>>>>> [New process 3663]
>>>>>>>>>>> Retry #1:
>>>>>>>>>>> Retry #2:
>>>>>>>>>>> Retry #3:
>>>>>>>>>>> Retry #4:
>>>>>>>>>>> 0x7eadcb04 in ?? ()
>>>>>>>>>>> (gdb) symbol-file /usr/local/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so
>>>>>>>>>>> Reading symbols from /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so.0.0.0...done.
>>>>>>>>>>> (gdb) set var _dbg=0
>>>>>>>>>>> No symbol "_dbg" in current context.
>>>>>>>>>>> (gdb) set var JNI_OnLoad::_dbg=0
>>>>>>>>>>> No symbol "_dbg" in specified context.
>>>>>>>>>>> (gdb) set JNI_OnLoad::_dbg=0
>>>>>>>>>>> No symbol "_dbg" in specified context.
>>>>>>>>>>> (gdb) info threads
>>>>>>>>>>> [New LWP 12]
>>>>>>>>>>> [New LWP 11]
>>>>>>>>>>> [New LWP 10]
>>>>>>>>>>> [New LWP 9]
>>>>>>>>>>> [New LWP 8]
>>>>>>>>>>> [New LWP 7]
>>>>>>>>>>> [New LWP 6]
>>>>>>>>>>> [New LWP 5]
>>>>>>>>>>> [New LWP 4]
>>>>>>>>>>> [New LWP 3]
>>>>>>>>>>> [New LWP 2]
>>>>>>>>>>> Id   Target Id         Frame 
>>>>>>>>>>> 12   LWP 2             0x7eadc6b0 in ?? ()
>>>>>>>>>>> 11   LWP 3             0x7eadcbb8 in ?? ()
>>>>>>>>>>> 10   LWP 4             0x7eadcbb8 in ?? ()
>>>>>>>>>>> 9    LWP 5             0x7eadcbb8 in ?? ()
>>>>>>>>>>> 8    LWP 6             0x7eadcbb8 in ?? ()
>>>>>>>>>>> 7    LWP 7             0x7eadcbb8 in ?? ()
>>>>>>>>>>> 6    LWP 8             0x7ead8b0c in ?? ()
>>>>>>>>>>> 5    LWP 9             0x7eadcbb8 in ?? ()
>>>>>>>>>>> 4    LWP 10            0x7eadcbb8 in ?? ()
>>>>>>>>>>> 3    LWP 11            0x7eadcbb8 in ?? ()
>>>>>>>>>>> 2    LWP 12            0x7eadcbb8 in ?? ()
>>>>>>>>>>> * 1    LWP 1             0x7eadcb04 in ?? ()
>>>>>>>>>>> (gdb) 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> It seems that "_dbg" is unknown and unavailable.
>>>>>>>>>>> 
>>>>>>>>>>> tyr java 399 grep _dbg /export2/src/openmpi-1.9/openmpi-dev-124-g91e9686/ompi/mpi/java/c/*
>>>>>>>>>>> /export2/src/openmpi-1.9/openmpi-dev-124-g91e9686/ompi/mpi/java/c/mpi_MPI.c:        volatile int _dbg = 1;
>>>>>>>>>>> /export2/src/openmpi-1.9/openmpi-dev-124-g91e9686/ompi/mpi/java/c/mpi_MPI.c:        while (_dbg) poll(NULL, 0, 1);
>>>>>>>>>>> tyr java 400 nm /usr/local/openmpi-1.9.0_64_gcc/lib64/*.so | grep 
>>>>>>>>>>> -i _dbg
>>>>>>>>>>> tyr java 401 nm /usr/local/openmpi-1.9.0_64_gcc/lib64/*.so | grep 
>>>>>>>>>>> -i JNI_OnLoad
>>>>>>>>>>> [1057]  |              139688|                 444|FUNC |GLOB |0    
>>>>>>>>>>> |11     |JNI_OnLoad
>>>>>>>>>>> tyr java 402 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> How can I set _dbg to zero so that mpiexec continues? I also tried
>>>>>>>>>>> to set a breakpoint on the function JNI_OnLoad, but it seems that
>>>>>>>>>>> the function isn't called before the SIGSEGV.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> tyr java 177 unsetenv OMPI_ATTACH 
>>>>>>>>>>> tyr java 178 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
>>>>>>>>>>> GNU gdb (GDB) 7.6.1
>>>>>>>>>>> ...
>>>>>>>>>>> (gdb) b mpi_MPI.c:JNI_OnLoad
>>>>>>>>>>> No source file named mpi_MPI.c.
>>>>>>>>>>> Make breakpoint pending on future shared library load? (y or [n]) y
>>>>>>>>>>> 
>>>>>>>>>>> Breakpoint 1 (mpi_MPI.c:JNI_OnLoad) pending.
>>>>>>>>>>> (gdb) run -np 1 java InitFinalizeMain 
>>>>>>>>>>> Starting program: /usr/local/openmpi-1.9.0_64_gcc/bin/mpiexec -np 1 
>>>>>>>>>>> java InitFinalizeMain
>>>>>>>>>>> [Thread debugging using libthread_db enabled]
>>>>>>>>>>> [New Thread 1 (LWP 1)]
>>>>>>>>>>> [New LWP    2        ]
>>>>>>>>>>> #
>>>>>>>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>>>>>>>> #
>>>>>>>>>>> #  SIGSEGV (0xb) at pc=0xffffffff7ea3c7f0, pid=3518, tid=2
>>>>>>>>>>> ...
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> tyr java 381 cat InitFinalizeMain.java 
>>>>>>>>>>> import mpi.*;
>>>>>>>>>>> 
>>>>>>>>>>> public class InitFinalizeMain
>>>>>>>>>>> {
>>>>>>>>>>> public static void main (String args[]) throws MPIException
>>>>>>>>>>> {
>>>>>>>>>>> MPI.Init (args);
>>>>>>>>>>> System.out.print ("Hello!\n");
>>>>>>>>>>> MPI.Finalize ();
>>>>>>>>>>> }
>>>>>>>>>>> }
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> SIGSEGV happens in MPI.Init(args), because I can print a message
>>>>>>>>>>> before I call the method.
>>>>>>>>>>> 
>>>>>>>>>>> tyr java 192 unsetenv OMPI_ATTACH
>>>>>>>>>>> tyr java 193 mpijavac InitFinalizeMain.java
>>>>>>>>>>> tyr java 194 mpiexec -np 1 java InitFinalizeMain
>>>>>>>>>>> Before MPI.Init()
>>>>>>>>>>> #
>>>>>>>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>>>>>>>> #
>>>>>>>>>>> #  SIGSEGV (0xb) at pc=0xffffffff7ea3c7f0, pid=3697, tid=2
>>>>>>>>>>> ...
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Any ideas how I can continue? I couldn't find a C function for
>>>>>>>>>>> MPI.Init() in a C file. Do you know which function is called first,
>>>>>>>>>>> so that I can set a breakpoint? By the way, I get the same error
>>>>>>>>>>> for Solaris 10 x86_64.
>>>>>>>>>>> 
>>>>>>>>>>> tyr java 388 ssh sunpc1
>>>>>>>>>>> ...
>>>>>>>>>>> sunpc1 java 106 mpijavac InitFinalizeMain.java
>>>>>>>>>>> sunpc1 java 107 uname -a
>>>>>>>>>>> SunOS sunpc1 5.10 Generic_147441-21 i86pc i386 i86pc Solaris
>>>>>>>>>>> sunpc1 java 108 isainfo -k
>>>>>>>>>>> amd64
>>>>>>>>>>> sunpc1 java 109 mpiexec -np 1 java InitFinalizeMain
>>>>>>>>>>> #
>>>>>>>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>>>>>>>> #
>>>>>>>>>>> #  SIGSEGV (0xb) at pc=0xfffffd7fff1d77f0, pid=20256, tid=2
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Thank you very much for any help in advance.
>>>>>>>>>>> 
>>>>>>>>>>> Kind regards
>>>>>>>>>>> 
>>>>>>>>>>> Siegmar
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> thank you very much for your help.
>>>>>>>>>>>> 
>>>>>>>>>>>>> how did you configure openmpi ? which java version did you use ?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> i just found a regression and you currently have to explicitly add
>>>>>>>>>>>>> CFLAGS=-D_REENTRANT CPPFLAGS=-D_REENTRANT
>>>>>>>>>>>>> to your configure command line
>>>>>>>>>>>> I added "-D_REENTRANT" to my command.
>>>>>>>>>>>> 
>>>>>>>>>>>> ../openmpi-dev-124-g91e9686/configure 
>>>>>>>>>>>> --prefix=/usr/local/openmpi-1.9.0_64_gcc \
>>>>>>>>>>>> --libdir=/usr/local/openmpi-1.9.0_64_gcc/lib64 \
>>>>>>>>>>>> --with-jdk-bindir=/usr/local/jdk1.8.0/bin \
>>>>>>>>>>>> --with-jdk-headers=/usr/local/jdk1.8.0/include \
>>>>>>>>>>>> JAVA_HOME=/usr/local/jdk1.8.0 \
>>>>>>>>>>>> LDFLAGS="-m64" CC="gcc" CXX="g++" FC="gfortran" \
>>>>>>>>>>>> CFLAGS="-m64 -D_REENTRANT" CXXFLAGS="-m64" FCFLAGS="-m64" \
>>>>>>>>>>>> CPP="cpp" CXXCPP="cpp" \
>>>>>>>>>>>> CPPFLAGS="-D_REENTRANT" CXXCPPFLAGS="" \
>>>>>>>>>>>> --enable-mpi-cxx \
>>>>>>>>>>>> --enable-cxx-exceptions \
>>>>>>>>>>>> --enable-mpi-java \
>>>>>>>>>>>> --enable-heterogeneous \
>>>>>>>>>>>> --enable-mpi-thread-multiple \
>>>>>>>>>>>> --with-threads=posix \
>>>>>>>>>>>> --with-hwloc=internal \
>>>>>>>>>>>> --without-verbs \
>>>>>>>>>>>> --with-wrapper-cflags="-std=c11 -m64" \
>>>>>>>>>>>> --enable-debug \
>>>>>>>>>>>> |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_gcc
>>>>>>>>>>>> 
>>>>>>>>>>>> I use Java 8.
>>>>>>>>>>>> 
>>>>>>>>>>>> tyr openmpi-1.9 112 java -version
>>>>>>>>>>>> java version "1.8.0"
>>>>>>>>>>>> Java(TM) SE Runtime Environment (build 1.8.0-b132)
>>>>>>>>>>>> Java HotSpot(TM) 64-Bit Server VM (build 25.0-b70, mixed mode)
>>>>>>>>>>>> tyr openmpi-1.9 113 
>>>>>>>>>>>> 
>>>>>>>>>>>> Unfortunately I still get a SIGSEGV with openmpi-dev-124-g91e9686.
>>>>>>>>>>>> I have applied your patch and will try to debug my small Java
>>>>>>>>>>>> program tomorrow or next week and then let you know the result.