Hello Oscar,
do you have time to look into my problem? Probably Takahiro has a
point and gdb behaves differently on Solaris and Linux, so that
the differing outputs have no meaning. I tried to debug my Java
program, but without success so far, because I wasn't able to get
into the Java program to set a breakpoint or to see the code. Have
you succeeded in debugging an mpiJava program? If so, how must I call
gdb (I normally use "gdb mpiexec" and then "run -np 1 java ...")?
What can I do to get helpful information to track the error down?
I have attached the error log file. Perhaps you can see if something
is going wrong with the Java interface. Thank you very much for your
help and any hints for the usage of gdb with mpiJava in advance.
Please let me know if I can provide anything else.
Kind regards
Siegmar
I think that it must have to do with MPI, because everything
works fine on Linux and my Java program works fine with an older
MPI version (openmpi-1.8.2a1r31804) as well.
Yes, I also think it must have to do with MPI.
But on the java process side, not the mpiexec process side.
When you run a Java MPI program via mpiexec, the mpiexec process
launches a java process. When the java process (your Java program)
calls an MPI method, the native part (written in C/C++) of the MPI
library is called. It runs in the java process, not in the mpiexec
process. I suspect that part.
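In case it helps, one way to reach that native part is to attach gdb to the java child rather than starting gdb on mpiexec. This is only a sketch: the pgrep pattern and the two-second delay are assumptions, and if the crash happens before you can attach, a breakpoint in the Java code before MPI.Init may buy time.

```shell
# Start the job normally, then attach gdb to the java process.
mpiexec -np 1 java InitFinalizeMain &
sleep 2                                # give the JVM time to start
JAVA_PID=$(pgrep -f InitFinalizeMain)  # pid of the java child, not mpiexec
gdb -p "$JAVA_PID"
# Inside gdb: HotSpot uses SIGSEGV internally, so you may see benign
# stops before the fatal one:
#   (gdb) handle SIGSEGV stop print
#   (gdb) continue
```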
On Solaris things are different.
Are you saying the following difference?
After this line,
881 ORTE_ACTIVATE_JOB_STATE(jdata, ORTE_JOB_STATE_INIT);
Linux shows
orte_job_state_to_str (state=1)
at ../../openmpi-dev-124-g91e9686/orte/util/error_strings.c:217
217 switch(state) {
but Solaris shows
orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:122
122 if (NULL == name) {
Each macro is defined as:
#define ORTE_ACTIVATE_JOB_STATE(j, s) \
do { \
orte_job_t *shadow=(j); \
opal_output_verbose(1, orte_state_base_framework.framework_output, \
"%s ACTIVATE JOB %s STATE %s AT %s:%d", \
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), \
(NULL == shadow) ? "NULL" : \
ORTE_JOBID_PRINT(shadow->jobid), \
orte_job_state_to_str((s)), \
__FILE__, __LINE__); \
orte_state.activate_job_state(shadow, (s)); \
} while(0);
#define ORTE_NAME_PRINT(n) \
orte_util_print_name_args(n)
#define ORTE_JOBID_PRINT(n) \
orte_util_print_jobids(n)
I'm not sure, but I think gdb on Solaris steps into
orte_util_print_name_args, while gdb on Linux doesn't step into
orte_util_print_name_args and orte_util_print_jobids for some
reason; or the macro's arguments are simply evaluated in a different
order, which C allows, so orte_job_state_to_str runs before them.
Either way, I think it's not an important difference.
You showed the following lines.
orterun (argc=5, argv=0xffffffff7fffe0d8)
at
../../../../openmpi-dev-124-g91e9686/orte/tools/orterun/orterun.c:1084
1084 while (orte_event_base_active) {
(gdb)
1085 opal_event_loop(orte_event_base, OPAL_EVLOOP_ONCE);
(gdb)
I'm not familiar with this code, but I think this part (in the
mpiexec process) is only waiting for the java process to terminate
(normally or abnormally). So I think the problem is not in the
mpiexec process but in the java process.
Regards,
Takahiro
Hi Takahiro,
mpiexec and java run as distinct processes. Your JRE message says
the java process raises SIGSEGV, so you should trace the java
process, not the mpiexec process. Moreover, your JRE message says
the crash happened outside the Java Virtual Machine, in native code,
so an ordinary Java debugger is useless. You should trace the
native-code part of the java process. Unfortunately I don't know
how to debug such a case.
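A couple of JVM-side options may still help here (a sketch; -Xcheck:jni is a standard HotSpot option, but verify it exists in your JDK build): -Xcheck:jni makes the JVM validate JNI calls coming from the native MPI code, and enabling core dumps lets you inspect the crash afterwards with gdb.

```shell
ulimit -c unlimited          # allow the core dump the JRE message mentions
mpiexec -np 1 java -Xcheck:jni InitFinalizeMain
# After a crash, load the core file into gdb (the java path is an example):
# gdb /usr/java/bin/java core
```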
I think that it must have to do with MPI, because everything
works fine on Linux and my Java program works fine with an older
MPI version (openmpi-1.8.2a1r31804) as well.
linpc1 x 112 mpiexec -np 1 java InitFinalizeMain
Hello!
linpc1 x 113
Therefore I single-stepped through the program on Linux as well
and found a difference while launching the process. On Linux I get
the following sequence.
Breakpoint 1, rsh_launch (jdata=0x614aa0)
at
../../../../../openmpi-dev-124-g91e9686/orte/mca/plm/rsh/plm_rsh_module.c:876
876 if (ORTE_FLAG_TEST(jdata, ORTE_JOB_FLAG_RESTART)) {
(gdb) s
881 ORTE_ACTIVATE_JOB_STATE(jdata, ORTE_JOB_STATE_INIT);
(gdb) s
orte_job_state_to_str (state=1)
at ../../openmpi-dev-124-g91e9686/orte/util/error_strings.c:217
217 switch(state) {
(gdb)
221 return "PENDING INIT";
(gdb)
317 }
(gdb)
orte_util_print_jobids (job=4294967295)
at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:170
170 ptr = get_print_name_buffer();
(gdb)
On Solaris things are different.
Breakpoint 1, rsh_launch (jdata=0x100125250)
at
../../../../../openmpi-dev-124-g91e9686/orte/mca/plm/rsh/plm_rsh_module.c:876
876 if (ORTE_FLAG_TEST(jdata, ORTE_JOB_FLAG_RESTART)) {
(gdb) s
881 ORTE_ACTIVATE_JOB_STATE(jdata, ORTE_JOB_STATE_INIT);
(gdb) s
orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:122
122 if (NULL == name) {
(gdb)
142 job = orte_util_print_jobids(name->jobid);
(gdb)
orte_util_print_jobids (job=2673410048)
at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:170
170 ptr = get_print_name_buffer();
(gdb)
Is this normal or is it the reason for the crash on Solaris?
Kind regards
Siegmar
The log file output by JRE may help you.
# An error report file with more information is saved as:
#
/home/fd1026/work/skripte/master/parallel/prog/mpi/java/hs_err_pid13080.log
Regards,
Takahiro
Hi,
I installed openmpi-dev-124-g91e9686 on Solaris 10 Sparc with
gcc-4.9.1 to track down the error with my small Java program.
I started single stepping in orterun.c at line 1081 and
continued until I got the segmentation fault. I get
"jdata = 0x0" in version openmpi-1.8.2a1r31804, which is the
last one which works with Java in my environment, while I get
"jdata = 0x100125250" in this version. Unfortunately I don't
know which files or variables are important to look at. Perhaps
somebody can look at the following lines of code and tell me,
which information I should provide to solve the problem. I know
that Solaris isn't any longer on your list of supported systems,
but perhaps we can get it working again, if you tell me what
you need and I do the debugging.
/usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
GNU gdb (GDB) 7.6.1
...
(gdb) run -np 1 java InitFinalizeMain
Starting program: /usr/local/openmpi-1.9.0_64_gcc/bin/mpiexec \
-np 1 java InitFinalizeMain
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP 2 ]
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0xffffffff7ea3c7f0, pid=13064, tid=2
...
[LWP 2 exited]
[New Thread 2 ]
[Switching to Thread 1 (LWP 1)]
sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be
found to satisfy query
(gdb) thread 1
[Switching to thread 1 (LWP 1 )]
#0 0xffffffff7f6173d0 in rtld_db_dlactivity () from
/usr/lib/sparcv9/ld.so.1
(gdb) b orterun.c:1081
Breakpoint 1 at 0x1000070dc: file
../../../../openmpi-dev-124-g91e9686/orte/tools/orterun/orterun.c, line
1081.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/local/openmpi-1.9.0_64_gcc/bin/mpiexec -np 1 java
InitFinalizeMain
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP 2 ]
[Switching to Thread 1 (LWP 1)]
Breakpoint 1, orterun (argc=5, argv=0xffffffff7fffe0d8)
at
../../../../openmpi-dev-124-g91e9686/orte/tools/orterun/orterun.c:1081
1081 rc = orte_plm.spawn(jdata);
(gdb) print jdata
$1 = (orte_job_t *) 0x100125250
(gdb) s
rsh_launch (jdata=0x100125250)
at
../../../../../openmpi-dev-124-g91e9686/orte/mca/plm/rsh/plm_rsh_module.c:876
876 if (ORTE_FLAG_TEST(jdata, ORTE_JOB_FLAG_RESTART)) {
(gdb) s
881 ORTE_ACTIVATE_JOB_STATE(jdata, ORTE_JOB_STATE_INIT);
(gdb)
orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:122
122 if (NULL == name) {
(gdb)
142 job = orte_util_print_jobids(name->jobid);
(gdb)
orte_util_print_jobids (job=2502885376) at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:170
170 ptr = get_print_name_buffer();
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92
92 if (!fns_init) {
(gdb)
101 ret = opal_tsd_getspecific(print_args_tsd_key,
(void**)&ptr);
(gdb)
opal_tsd_getspecific (key=1, valuep=0xffffffff7fffd990)
at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163
163 *valuep = pthread_getspecific(key);
(gdb)
164 return OPAL_SUCCESS;
(gdb)
165 }
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102
102 if (OPAL_SUCCESS != ret) return NULL;
(gdb)
104 if (NULL == ptr) {
(gdb)
113 return (orte_print_args_buffers_t*) ptr;
(gdb)
114 }
(gdb)
orte_util_print_jobids (job=2502885376) at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:172
172 if (NULL == ptr) {
(gdb)
178 if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
(gdb)
182 if (ORTE_JOBID_INVALID == job) {
(gdb)
184 } else if (ORTE_JOBID_WILDCARD == job) {
(gdb)
187 tmp1 = ORTE_JOB_FAMILY((unsigned long)job);
(gdb)
188 tmp2 = ORTE_LOCAL_JOBID((unsigned long)job);
(gdb)
189 snprintf(ptr->buffers[ptr->cntr++],
(gdb)
193 return ptr->buffers[ptr->cntr-1];
(gdb)
194 }
(gdb)
orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:143
143 vpid = orte_util_print_vpids(name->vpid);
(gdb)
orte_util_print_vpids (vpid=0) at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:260
260 ptr = get_print_name_buffer();
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92
92 if (!fns_init) {
(gdb)
101 ret = opal_tsd_getspecific(print_args_tsd_key,
(void**)&ptr);
(gdb)
opal_tsd_getspecific (key=1, valuep=0xffffffff7fffd9a0)
at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163
163 *valuep = pthread_getspecific(key);
(gdb)
164 return OPAL_SUCCESS;
(gdb)
165 }
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102
102 if (OPAL_SUCCESS != ret) return NULL;
(gdb)
104 if (NULL == ptr) {
(gdb)
113 return (orte_print_args_buffers_t*) ptr;
(gdb)
114 }
(gdb)
orte_util_print_vpids (vpid=0) at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:262
262 if (NULL == ptr) {
(gdb)
268 if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
(gdb)
272 if (ORTE_VPID_INVALID == vpid) {
(gdb)
274 } else if (ORTE_VPID_WILDCARD == vpid) {
(gdb)
277 snprintf(ptr->buffers[ptr->cntr++],
(gdb)
281 return ptr->buffers[ptr->cntr-1];
(gdb)
282 }
(gdb)
orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:146
146 ptr = get_print_name_buffer();
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92
92 if (!fns_init) {
(gdb)
101 ret = opal_tsd_getspecific(print_args_tsd_key,
(void**)&ptr);
(gdb)
opal_tsd_getspecific (key=1, valuep=0xffffffff7fffda60)
at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163
163 *valuep = pthread_getspecific(key);
(gdb)
164 return OPAL_SUCCESS;
(gdb)
165 }
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102
102 if (OPAL_SUCCESS != ret) return NULL;
(gdb)
104 if (NULL == ptr) {
(gdb)
113 return (orte_print_args_buffers_t*) ptr;
(gdb)
114 }
(gdb)
orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:148
148 if (NULL == ptr) {
(gdb)
154 if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
(gdb)
158 snprintf(ptr->buffers[ptr->cntr++],
(gdb)
162 return ptr->buffers[ptr->cntr-1];
(gdb)
163 }
(gdb)
orte_util_print_jobids (job=4294967295) at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:170
170 ptr = get_print_name_buffer();
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92
92 if (!fns_init) {
(gdb)
101 ret = opal_tsd_getspecific(print_args_tsd_key,
(void**)&ptr);
(gdb)
opal_tsd_getspecific (key=1, valuep=0xffffffff7fffda60)
at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163
163 *valuep = pthread_getspecific(key);
(gdb)
164 return OPAL_SUCCESS;
(gdb)
165 }
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102
102 if (OPAL_SUCCESS != ret) return NULL;
(gdb)
104 if (NULL == ptr) {
(gdb)
113 return (orte_print_args_buffers_t*) ptr;
(gdb)
114 }
(gdb)
orte_util_print_jobids (job=4294967295) at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:172
172 if (NULL == ptr) {
(gdb)
178 if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
(gdb)
182 if (ORTE_JOBID_INVALID == job) {
(gdb)
183 snprintf(ptr->buffers[ptr->cntr++],
ORTE_PRINT_NAME_ARGS_MAX_SIZE, "[INVALID]");
(gdb)
193 return ptr->buffers[ptr->cntr-1];
(gdb)
194 }
(gdb)
orte_job_state_to_str (state=1) at
../../openmpi-dev-124-g91e9686/orte/util/error_strings.c:217
217 switch(state) {
(gdb)
221 return "PENDING INIT";
(gdb)
317 }
(gdb)
opal_output_verbose (level=1, output_id=0,
format=0xffffffff7f14dd98 <orte_job_states>
"\336\257\276\355\336\257\276\355")
at ../../../openmpi-dev-124-g91e9686/opal/util/output.c:373
373 va_start(arglist, format);
(gdb)
369 {
(gdb)
370 if (output_id >= 0 && output_id < OPAL_OUTPUT_MAX_STREAMS &&
(gdb)
377 }
(gdb)
orte_state_base_activate_job_state (jdata=0x100125250, state=1)
at
../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c:33
33 opal_list_item_t *itm, *any=NULL, *error=NULL;
(gdb)
37 for (itm = opal_list_get_first(&orte_job_states);
(gdb)
opal_list_get_first (list=0xffffffff7f14dd98 <orte_job_states>)
at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_list.h:320
320 opal_list_item_t* item =
(opal_list_item_t*)list->opal_list_sentinel.opal_list_next;
(gdb)
324 assert(1 == item->opal_list_item_refcount);
(gdb)
325 assert( list == item->opal_list_item_belong_to );
(gdb)
328 return item;
(gdb)
329 }
(gdb)
orte_state_base_activate_job_state (jdata=0x100125250, state=1)
at
../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c:38
38 itm != opal_list_get_end(&orte_job_states);
(gdb)
opal_list_get_end (list=0xffffffff7f14dd98 <orte_job_states>)
at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_list.h:399
399 return &(list->opal_list_sentinel);
(gdb)
400 }
(gdb)
orte_state_base_activate_job_state (jdata=0x100125250, state=1)
at
../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c:37
37 for (itm = opal_list_get_first(&orte_job_states);
(gdb)
40 s = (orte_state_t*)itm;
(gdb)
41 if (s->job_state == ORTE_JOB_STATE_ANY) {
(gdb)
45 if (s->job_state == ORTE_JOB_STATE_ERROR) {
(gdb)
48 if (s->job_state == state) {
(gdb)
49 OPAL_OUTPUT_VERBOSE((1,
orte_state_base_framework.framework_output,
(gdb)
orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:122
122 if (NULL == name) {
(gdb)
142 job = orte_util_print_jobids(name->jobid);
(gdb)
orte_util_print_jobids (job=2502885376) at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:170
170 ptr = get_print_name_buffer();
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92
92 if (!fns_init) {
(gdb)
101 ret = opal_tsd_getspecific(print_args_tsd_key,
(void**)&ptr);
(gdb)
opal_tsd_getspecific (key=1, valuep=0xffffffff7fffd880)
at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163
163 *valuep = pthread_getspecific(key);
(gdb)
164 return OPAL_SUCCESS;
(gdb)
165 }
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102
102 if (OPAL_SUCCESS != ret) return NULL;
(gdb)
104 if (NULL == ptr) {
(gdb)
113 return (orte_print_args_buffers_t*) ptr;
(gdb)
114 }
(gdb)
orte_util_print_jobids (job=2502885376) at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:172
172 if (NULL == ptr) {
(gdb)
178 if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
(gdb)
182 if (ORTE_JOBID_INVALID == job) {
(gdb)
184 } else if (ORTE_JOBID_WILDCARD == job) {
(gdb)
187 tmp1 = ORTE_JOB_FAMILY((unsigned long)job);
(gdb)
188 tmp2 = ORTE_LOCAL_JOBID((unsigned long)job);
(gdb)
189 snprintf(ptr->buffers[ptr->cntr++],
(gdb)
193 return ptr->buffers[ptr->cntr-1];
(gdb)
194 }
(gdb)
orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:143
143 vpid = orte_util_print_vpids(name->vpid);
(gdb)
orte_util_print_vpids (vpid=0) at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:260
260 ptr = get_print_name_buffer();
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92
92 if (!fns_init) {
(gdb)
101 ret = opal_tsd_getspecific(print_args_tsd_key,
(void**)&ptr);
(gdb)
opal_tsd_getspecific (key=1, valuep=0xffffffff7fffd890)
at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163
163 *valuep = pthread_getspecific(key);
(gdb)
164 return OPAL_SUCCESS;
(gdb)
165 }
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102
102 if (OPAL_SUCCESS != ret) return NULL;
(gdb)
104 if (NULL == ptr) {
(gdb)
113 return (orte_print_args_buffers_t*) ptr;
(gdb)
114 }
(gdb)
orte_util_print_vpids (vpid=0) at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:262
262 if (NULL == ptr) {
(gdb)
268 if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
(gdb)
272 if (ORTE_VPID_INVALID == vpid) {
(gdb)
274 } else if (ORTE_VPID_WILDCARD == vpid) {
(gdb)
277 snprintf(ptr->buffers[ptr->cntr++],
(gdb)
281 return ptr->buffers[ptr->cntr-1];
(gdb)
282 }
(gdb)
orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:146
146 ptr = get_print_name_buffer();
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92
92 if (!fns_init) {
(gdb)
101 ret = opal_tsd_getspecific(print_args_tsd_key,
(void**)&ptr);
(gdb)
opal_tsd_getspecific (key=1, valuep=0xffffffff7fffd950)
at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163
163 *valuep = pthread_getspecific(key);
(gdb)
164 return OPAL_SUCCESS;
(gdb)
165 }
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102
102 if (OPAL_SUCCESS != ret) return NULL;
(gdb)
104 if (NULL == ptr) {
(gdb)
113 return (orte_print_args_buffers_t*) ptr;
(gdb)
114 }
(gdb)
orte_util_print_name_args (name=0x100118380 <orte_process_info+104>)
at ../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:148
148 if (NULL == ptr) {
(gdb)
154 if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
(gdb)
158 snprintf(ptr->buffers[ptr->cntr++],
(gdb)
162 return ptr->buffers[ptr->cntr-1];
(gdb)
163 }
(gdb)
orte_util_print_jobids (job=4294967295) at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:170
170 ptr = get_print_name_buffer();
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:92
92 if (!fns_init) {
(gdb)
101 ret = opal_tsd_getspecific(print_args_tsd_key,
(void**)&ptr);
(gdb)
opal_tsd_getspecific (key=1, valuep=0xffffffff7fffd950)
at ../../openmpi-dev-124-g91e9686/opal/threads/tsd.h:163
163 *valuep = pthread_getspecific(key);
(gdb)
164 return OPAL_SUCCESS;
(gdb)
165 }
(gdb)
get_print_name_buffer () at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:102
102 if (OPAL_SUCCESS != ret) return NULL;
(gdb)
104 if (NULL == ptr) {
(gdb)
113 return (orte_print_args_buffers_t*) ptr;
(gdb)
114 }
(gdb)
orte_util_print_jobids (job=4294967295) at
../../openmpi-dev-124-g91e9686/orte/util/name_fns.c:172
172 if (NULL == ptr) {
(gdb)
178 if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
(gdb)
182 if (ORTE_JOBID_INVALID == job) {
(gdb)
183 snprintf(ptr->buffers[ptr->cntr++],
ORTE_PRINT_NAME_ARGS_MAX_SIZE, "[INVALID]");
(gdb)
193 return ptr->buffers[ptr->cntr-1];
(gdb)
194 }
(gdb)
orte_job_state_to_str (state=1) at
../../openmpi-dev-124-g91e9686/orte/util/error_strings.c:217
217 switch(state) {
(gdb)
221 return "PENDING INIT";
(gdb)
317 }
(gdb)
opal_output_verbose (level=1, output_id=-1,
    format=0x1 <Address 0x1 out of bounds>)
at ../../../openmpi-dev-124-g91e9686/opal/util/output.c:373
373 va_start(arglist, format);
(gdb)
369 {
(gdb)
370 if (output_id >= 0 && output_id < OPAL_OUTPUT_MAX_STREAMS &&
(gdb)
377 }
(gdb)
orte_state_base_activate_job_state (jdata=0x100125250, state=1)
at
../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c:54
54 if (NULL == s->cbfunc) {
(gdb)
62 caddy = OBJ_NEW(orte_state_caddy_t);
(gdb)
opal_obj_new_debug (type=0xffffffff7f14c7d8 <orte_state_caddy_t_class>,
file=0xffffffff7f034c08
"../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c",
line=62) at
../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:249
249 opal_object_t* object = opal_obj_new(type);
(gdb)
opal_obj_new (cls=0xffffffff7f14c7d8 <orte_state_caddy_t_class>)
at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:465
465 assert(cls->cls_sizeof >= sizeof(opal_object_t));
(gdb)
470 object = (opal_object_t *) malloc(cls->cls_sizeof);
(gdb)
472 if (0 == cls->cls_initialized) {
(gdb)
473 opal_class_initialize(cls);
(gdb)
opal_class_initialize (cls=0xffffffff7f14c7d8
<orte_state_caddy_t_class>)
at ../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:79
79 assert(cls);
(gdb)
84 if (1 == cls->cls_initialized) {
(gdb)
87 opal_atomic_lock(&class_lock);
(gdb)
opal_atomic_lock (lock=0xffffffff7ee89bf0 <class_lock>)
at
../../openmpi-dev-124-g91e9686/opal/include/opal/sys/atomic_impl.h:397
397 while( !opal_atomic_cmpset_acq_32( &(lock->u.lock),
(gdb)
opal_atomic_cmpset_acq_32 (addr=0xffffffff7ee89bf0 <class_lock>,
oldval=0,
newval=1)
at
../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:107
107 rc = opal_atomic_cmpset_32(addr, oldval, newval);
(gdb)
opal_atomic_cmpset_32 (addr=0xffffffff7ee89bf0 <class_lock>, oldval=0,
newval=1)
at
../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:93
93 int32_t ret = newval;
(gdb)
95 __asm__ __volatile__("casa [%1] " ASI_P ", %2, %0"
(gdb)
98 return (ret == oldval);
(gdb)
99 }
(gdb)
opal_atomic_cmpset_acq_32 (addr=0xffffffff7ee89bf0 <class_lock>,
oldval=0,
newval=1)
at
../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:108
108 opal_atomic_rmb();
(gdb)
opal_atomic_rmb () at
../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:63
63 MEMBAR("#LoadLoad");
(gdb)
64 }
(gdb)
opal_atomic_cmpset_acq_32 (addr=0xffffffff7ee89bf0 <class_lock>,
oldval=0,
newval=1)
at
../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:110
110 return rc;
(gdb)
111 }
(gdb)
opal_atomic_lock (lock=0xffffffff7ee89bf0 <class_lock>)
at
../../openmpi-dev-124-g91e9686/opal/include/opal/sys/atomic_impl.h:403
403 }
(gdb)
opal_class_initialize (cls=0xffffffff7f14c7d8
<orte_state_caddy_t_class>)
at ../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:93
93 if (1 == cls->cls_initialized) {
(gdb)
103 cls->cls_depth = 0;
(gdb)
104 cls_construct_array_count = 0;
(gdb)
105 cls_destruct_array_count = 0;
(gdb)
106 for (c = cls; c; c = c->cls_parent) {
(gdb)
107 if( NULL != c->cls_construct ) {
(gdb)
108 cls_construct_array_count++;
(gdb)
110 if( NULL != c->cls_destruct ) {
(gdb)
111 cls_destruct_array_count++;
(gdb)
113 cls->cls_depth++;
(gdb)
106 for (c = cls; c; c = c->cls_parent) {
(gdb)
107 if( NULL != c->cls_construct ) {
(gdb)
110 if( NULL != c->cls_destruct ) {
(gdb)
113 cls->cls_depth++;
(gdb)
106 for (c = cls; c; c = c->cls_parent) {
(gdb)
122 (void
(**)(opal_object_t*))malloc((cls_construct_array_count +
(gdb)
123
cls_destruct_array_count + 2)
*
(gdb)
122 (void
(**)(opal_object_t*))malloc((cls_construct_array_count +
(gdb)
121 cls->cls_construct_array =
(gdb)
125 if (NULL == cls->cls_construct_array) {
(gdb)
130 cls->cls_construct_array + cls_construct_array_count +
1;
(gdb)
129 cls->cls_destruct_array =
(gdb)
136 cls_construct_array = cls->cls_construct_array +
cls_construct_array_count;
(gdb)
137 cls_destruct_array = cls->cls_destruct_array;
(gdb)
139 c = cls;
(gdb)
140 *cls_construct_array = NULL; /* end marker for the
constructors */
(gdb)
141 for (i = 0; i < cls->cls_depth; i++) {
(gdb)
142 if( NULL != c->cls_construct ) {
(gdb)
143 --cls_construct_array;
(gdb)
144 *cls_construct_array = c->cls_construct;
(gdb)
146 if( NULL != c->cls_destruct ) {
(gdb)
147 *cls_destruct_array = c->cls_destruct;
(gdb)
148 cls_destruct_array++;
(gdb)
150 c = c->cls_parent;
(gdb)
141 for (i = 0; i < cls->cls_depth; i++) {
(gdb)
142 if( NULL != c->cls_construct ) {
(gdb)
146 if( NULL != c->cls_destruct ) {
(gdb)
150 c = c->cls_parent;
(gdb)
141 for (i = 0; i < cls->cls_depth; i++) {
(gdb)
152 *cls_destruct_array = NULL; /* end marker for the
destructors */
(gdb)
154 cls->cls_initialized = 1;
(gdb)
155 save_class(cls);
(gdb)
save_class (cls=0xffffffff7f14c7d8 <orte_state_caddy_t_class>)
at ../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:188
188 if (num_classes >= max_classes) {
(gdb)
189 expand_array();
(gdb)
expand_array () at
../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:201
201 max_classes += increment;
(gdb)
202 classes = (void**)realloc(classes, sizeof(opal_class_t*) *
max_classes);
(gdb)
203 if (NULL == classes) {
(gdb)
207 for (i = num_classes; i < max_classes; ++i) {
(gdb)
208 classes[i] = NULL;
(gdb)
207 for (i = num_classes; i < max_classes; ++i) {
(gdb)
208 classes[i] = NULL;
(gdb)
207 for (i = num_classes; i < max_classes; ++i) {
(gdb)
208 classes[i] = NULL;
(gdb)
207 for (i = num_classes; i < max_classes; ++i) {
(gdb)
208 classes[i] = NULL;
(gdb)
207 for (i = num_classes; i < max_classes; ++i) {
(gdb)
208 classes[i] = NULL;
(gdb)
207 for (i = num_classes; i < max_classes; ++i) {
(gdb)
208 classes[i] = NULL;
(gdb)
207 for (i = num_classes; i < max_classes; ++i) {
(gdb)
208 classes[i] = NULL;
(gdb)
207 for (i = num_classes; i < max_classes; ++i) {
(gdb)
208 classes[i] = NULL;
(gdb)
207 for (i = num_classes; i < max_classes; ++i) {
(gdb)
208 classes[i] = NULL;
(gdb)
207 for (i = num_classes; i < max_classes; ++i) {
(gdb)
208 classes[i] = NULL;
(gdb)
207 for (i = num_classes; i < max_classes; ++i) {
(gdb)
210 }
(gdb)
save_class (cls=0xffffffff7f14c7d8 <orte_state_caddy_t_class>)
at ../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:192
192 classes[num_classes] = cls->cls_construct_array;
(gdb)
193 ++num_classes;
(gdb)
194 }
(gdb)
opal_class_initialize (cls=0xffffffff7f14c7d8
<orte_state_caddy_t_class>)
at ../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:159
159 opal_atomic_unlock(&class_lock);
(gdb)
opal_atomic_unlock (lock=0xffffffff7ee89bf0 <class_lock>)
at
../../openmpi-dev-124-g91e9686/opal/include/opal/sys/atomic_impl.h:409
409 opal_atomic_wmb();
(gdb)
opal_atomic_wmb () at
../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:69
69 MEMBAR("#StoreStore");
(gdb)
70 }
(gdb)
opal_atomic_unlock (lock=0xffffffff7ee89bf0 <class_lock>)
at
../../openmpi-dev-124-g91e9686/opal/include/opal/sys/atomic_impl.h:410
410 lock->u.lock=OPAL_ATOMIC_UNLOCKED;
(gdb)
411 }
(gdb)
opal_class_initialize (cls=0xffffffff7f14c7d8
<orte_state_caddy_t_class>)
at ../../openmpi-dev-124-g91e9686/opal/class/opal_object.c:160
160 }
(gdb)
opal_obj_new (cls=0xffffffff7f14c7d8 <orte_state_caddy_t_class>)
at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:475
475 if (NULL != object) {
(gdb)
476 object->obj_class = cls;
(gdb)
477 object->obj_reference_count = 1;
(gdb)
478 opal_obj_run_constructors(object);
(gdb)
opal_obj_run_constructors (object=0x1001bfcf0)
at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:420
420 assert(NULL != object->obj_class);
(gdb)
422 cls_construct = object->obj_class->cls_construct_array;
(gdb)
423 while( NULL != *cls_construct ) {
(gdb)
424 (*cls_construct)(object);
(gdb)
orte_state_caddy_construct (caddy=0x1001bfcf0)
at
../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_frame.c:84
84 memset(&caddy->ev, 0, sizeof(opal_event_t));
(gdb)
85 caddy->jdata = NULL;
(gdb)
86 }
(gdb)
opal_obj_run_constructors (object=0x1001bfcf0)
at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:425
425 cls_construct++;
(gdb)
423 while( NULL != *cls_construct ) {
(gdb)
427 }
(gdb)
opal_obj_new (cls=0xffffffff7f14c7d8 <orte_state_caddy_t_class>)
at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:480
480 return object;
(gdb)
481 }
(gdb)
opal_obj_new_debug (type=0xffffffff7f14c7d8 <orte_state_caddy_t_class>,
file=0xffffffff7f034c08
"../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c",
line=62) at
../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:250
250 object->obj_magic_id = OPAL_OBJ_MAGIC_ID;
(gdb)
251 object->cls_init_file_name = file;
(gdb)
252 object->cls_init_lineno = line;
(gdb)
253 return object;
(gdb)
254 }
(gdb)
orte_state_base_activate_job_state (jdata=0x100125250, state=1)
at
../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c:63
63 if (NULL != jdata) {
(gdb)
64 caddy->jdata = jdata;
(gdb)
65 caddy->job_state = state;
(gdb)
66 OBJ_RETAIN(jdata);
(gdb)
opal_obj_update (inc=1, object=0x100125250)
at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:497
497 return opal_atomic_add_32(&(object->obj_reference_count),
inc);
(gdb)
opal_atomic_add_32 (addr=0x100125260, delta=1)
at
../../../../openmpi-dev-124-g91e9686/opal/include/opal/sys/atomic_impl.h:63
63 oldval = *addr;
(gdb)
64 } while (0 == opal_atomic_cmpset_32(addr, oldval, oldval +
delta));
(gdb)
opal_atomic_cmpset_32 (addr=0x100125260, oldval=1, newval=2)
at
../../../../openmpi-dev-124-g91e9686/opal/include/opal/sys/sparcv9/atomic.h:93
93 int32_t ret = newval;
(gdb)
95 __asm__ __volatile__("casa [%1] " ASI_P ", %2, %0"
(gdb)
98 return (ret == oldval);
(gdb)
99 }
(gdb)
opal_atomic_add_32 (addr=0x100125260, delta=1)
at
../../../../openmpi-dev-124-g91e9686/opal/include/opal/sys/atomic_impl.h:65
65 return (oldval + delta);
(gdb)
66 }
(gdb)
orte_state_base_activate_job_state (jdata=0x100125250, state=1)
at
../../../../openmpi-dev-124-g91e9686/orte/mca/state/base/state_base_fns.c:66
66 OBJ_RETAIN(jdata);
(gdb)
68 opal_event_set(orte_event_base, &caddy->ev, -1,
OPAL_EV_WRITE, s->cbfunc, caddy);
(gdb)
69 opal_event_set_priority(&caddy->ev, s->priority);
(gdb)
70 opal_event_active(&caddy->ev, OPAL_EV_WRITE, 1);
(gdb)
71 return;
(gdb)
105 }
(gdb)
rsh_launch (jdata=0x100125250)
at
../../../../../openmpi-dev-124-g91e9686/orte/mca/plm/rsh/plm_rsh_module.c:883
883 return ORTE_SUCCESS;
(gdb)
884 }
(gdb)
orterun (argc=5, argv=0xffffffff7fffe0d8)
at
../../../../openmpi-dev-124-g91e9686/orte/tools/orterun/orterun.c:1084
1084 while (orte_event_base_active) {
(gdb)
1085 opal_event_loop(orte_event_base, OPAL_EVLOOP_ONCE);
(gdb)
1084 while (orte_event_base_active) {
(gdb)
1085 opal_event_loop(orte_event_base, OPAL_EVLOOP_ONCE);
(gdb)
1084 while (orte_event_base_active) {
(gdb)
1085 opal_event_loop(orte_event_base, OPAL_EVLOOP_ONCE);
(gdb)
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0xffffffff7ea3c7f0, pid=13080, tid=2
#
# JRE version: Java(TM) SE Runtime Environment (8.0-b132) (build 1.8.0-b132)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode
# solaris-sparc compressed oops)
# Problematic frame:
# C  [libc.so.1+0x3c7f0]  strlen+0x50
1084            while (orte_event_base_active) {
(gdb)
1085                opal_event_loop(orte_event_base, OPAL_EVLOOP_ONCE);
(gdb)
#
# Failed to write core dump. Core dumps have been disabled. To enable
# core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
#
/home/fd1026/work/skripte/master/parallel/prog/mpi/java/hs_err_pid13080.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.sun.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 0 on node tyr exited on
signal 6 (Abort).
--------------------------------------------------------------------------
1084 while (orte_event_base_active) {
(gdb)
1089 orte_odls.kill_local_procs(NULL);
(gdb)
Thank you very much for any help in advance.
Kind regards
Siegmar
_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2014/10/25559.php