Hi,
Today I installed openmpi-1.7.4a1r29784 on "Solaris 10, Sparc"
with "Sun C 5.12", using the following configure command.
../openmpi-1.7.4a1r29784/configure \
--prefix=/usr/local/openmpi-1.7.4_64_cc \
--libdir=/usr/local/openmpi-1.7.4_64_cc/lib64 \
--with-jdk-bindir=/usr/local/jdk1.7.0_07/bin/sparcv9 \
--with-jdk-headers=/usr/local/jdk1.7.0_07/include \
JAVA_HOME=/usr/local/jdk1.7.0_07 \
LDFLAGS="-m64" \
CC="cc" CXX="CC" FC="f95" \
CFLAGS="-m64" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
CPP="cpp" CXXCPP="cpp" \
CPPFLAGS="" CXXCPPFLAGS="" \
--enable-cxx-exceptions \
--enable-mpi-java \
--enable-heterogeneous \
--enable-opal-multi-threads \
--enable-mpi-thread-multiple \
--with-threads=posix \
--with-hwloc=internal \
--without-verbs \
--with-wrapper-cflags=-m64 \
--enable-debug \
|& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc
1) Bus error with "ompi_info -a"
tyr fd1026 108 ompi_info | grep MPI:
Open MPI: 1.7.4a1r29784
I get a bus error if I use the option "-a".
tyr fd1026 109 ompi_info -a | grep MPI:
[tyr:17668] *** Process received signal ***
[tyr:17668] Signal: Bus Error (10)
[tyr:17668] Signal code: Invalid address alignment (1)
[tyr:17668] Failing at address: ffffffff7d3ca461
/export2/prog/SunOS_sparc/openmpi-1.7.4_64_cc/lib64/libopen-pal.so.6.0.0:opal_backtrace_print+0x14
/export2/prog/SunOS_sparc/openmpi-1.7.4_64_cc/lib64/libopen-pal.so.6.0.0:0x1843d8
/lib/sparcv9/libc.so.1:0xd8c28
/lib/sparcv9/libc.so.1:0xcc79c
/lib/sparcv9/libc.so.1:0xcc9a8
/export2/prog/SunOS_sparc/openmpi-1.7.4_64_cc/lib64/libopen-pal.so.6.0.0:0x13a3dc [ Signal 2099942168 (?)]
/export2/prog/SunOS_sparc/openmpi-1.7.4_64_cc/lib64/libopen-pal.so.6.0.0:mca_base_var_dump+0x190
/export2/prog/SunOS_sparc/openmpi-1.7.4_64_cc/lib64/libopen-pal.so.6.0.0:0x899a8
/export2/prog/SunOS_sparc/openmpi-1.7.4_64_cc/lib64/libopen-pal.so.6.0.0:opal_info_show_mca_params+0xb4
/export2/prog/SunOS_sparc/openmpi-1.7.4_64_cc/lib64/libopen-pal.so.6.0.0:opal_info_do_params+0x364
/export2/prog/SunOS_sparc/openmpi-1.7.4_64_cc/bin/ompi_info:main+0x6e4
/export2/prog/SunOS_sparc/openmpi-1.7.4_64_cc/bin/ompi_info:_start+0x12c
[tyr:17668] *** End of error message ***
Bus error
tyr fd1026 110
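The failing address ffffffff7d3ca461 is odd, and so are ffffffff60cca179 (from the dbx run of ddt_raw) and ffffffff7d3c5ac1 (from openmpi-1.9). On SPARC, 2-, 4- and 8-byte loads must be naturally aligned, so any multi-byte load from such an address raises SIGBUS with "invalid address alignment". The following small C helper (my own hypothetical code, not part of Open MPI) only checks this arithmetic:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: returns nonzero if addr is a multiple of align.
   align must be a power of two (2, 4, 8, ...). */
int is_aligned(uint64_t addr, uint64_t align)
{
    return (addr & (align - 1)) == 0;
}

/* Print whether a faulting address is suitably aligned for
   2-, 4- and 8-byte loads. */
void report_alignment(uint64_t addr)
{
    unsigned sizes[] = { 2, 4, 8 };

    for (int i = 0; i < 3; i++) {
        printf("0x%llx: %u-byte aligned: %s\n",
               (unsigned long long)addr, sizes[i],
               is_aligned(addr, sizes[i]) ? "yes" : "no");
    }
}
```

For example, calling report_alignment(0xffffffff7d3ca461ULL) from main() prints "no" for all three sizes.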
tyr fd1026 112 cd /usr/local/openmpi-1.7.4_64_cc/bin/
tyr bin 113 /opt/solstudio12.3/bin/sparcv9/dbx ompi_info
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.9' in your .dbxrc
Reading ompi_info
Reading ld.so.1
Reading libmpi.so.1.2.0
Reading libopen-rte.so.6.0.0
Reading libopen-pal.so.6.0.0
Reading libsendfile.so.1
Reading libpicl.so.1
Reading libkstat.so.1
Reading liblgrp.so.1
Reading libsocket.so.1
Reading libnsl.so.1
Reading librt.so.1
Reading libm.so.2
Reading libthread.so.1
Reading libc.so.1
Reading libdoor.so.1
Reading libaio.so.1
Reading libmd.so.1
(dbx) run -a
Running: ompi_info -a
(process id 17678)
Reading libc_psr.so.1
...
Reading mca_topo_basic.so
Reading mca_vprotocol_pessimist.so
Prefix: /usr/local/openmpi-1.7.4_64_cc
Exec_prefix: /usr/local/openmpi-1.7.4_64_cc
Bindir: /usr/local/openmpi-1.7.4_64_cc/bin
...
MPI_MAX_PORT_NAME: 1024
MPI_MAX_DATAREP_STRING: 128
MCA mca: parameter "mca_param_files" (current value:
"/home/fd1026/.openmpi/mca-params.conf:
/usr/local/openmpi-1.7.4_64_cc/etc/openmpi-mca-params.conf",
data source: default, level: 2 user/detail, type:
string, deprecated, synonym of:
mca_base_param_files)
Path for MCA configuration files containing
variable values
MCA mca: parameter "mca_component_path" (current value:
"/usr/local/openmpi-1.7.4_64_cc/lib64/openmpi:
/home/fd1026/.openmpi/components",
data source: default, level: 9 dev/all, type:
string, deprecated, synonym of:
mca_base_component_path)
Path where to look for Open MPI and ORTE components
MCA mca: parameter "mca_component_show_load_errors" (current
value: "true", data source: default, level: 9
dev/all, type: bool, deprecated, synonym of:
mca_base_component_show_load_errors)
Whether to show errors for components that failed
to load or not
Valid values: 0: f|false|disabled, 1:
t|true|enabled
t@1 (l@1) signal BUS (invalid address alignment) in var_value_string at line 1685 in file "mca_base_var.c"
1685 ret = var->mbv_enumerator->string_from_value(var->mbv_enumerator,
value->intval, &tmp);
(dbx)
(dbx)
(dbx)
(dbx) check -all
dbx: warning: check -all will be turned on in the next run of the process
access checking - OFF
memuse checking - OFF
(dbx) run -a
Running: ompi_info -a
(process id 17680)
Reading rtcapihook.so
Reading libdl.so.1
Reading rtcaudit.so
Reading libmapmalloc.so.1
Reading rtcboot.so
Reading librtc.so
Reading libmd_psr.so.1
RTC: Enabling Error Checking...
RTC: Using UltraSparc trap mechanism
RTC: See `help rtc showmap' and `help rtc limitations' for details.
RTC: Running program...
Read from uninitialized (rui) on thread 1:
Attempting to read 4 bytes at address 0xffffffff7fffd548
which is 184 bytes above the current stack pointer
Variable is 'index'
t@1 (l@1) stopped in var_find at line 802 in file "mca_base_var.c"
802 return (OPAL_SUCCESS != ret) ? ret : index;
(dbx)
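I have not analysed var_find() in detail. The RTC message says that 'index' is read before it has been written. This can happen if the lookup that should set it returns success without assigning it; it may also be a false positive if the compiler loads 'index' before evaluating the condition. A defensive pattern is to initialize the output variable before the lookup. The sketch below uses hypothetical names (sketch_lookup(), sketch_find(), SKETCH_*); it only mirrors the shape of line 802 and is not the Open MPI code.

```c
#include <string.h>

#define SKETCH_SUCCESS        0
#define SKETCH_ERR_NOT_FOUND  (-1)

/* Hypothetical lookup: writes *index only when the name is found. */
static int sketch_lookup(const char *const names[], int count,
                         const char *name, int *index)
{
    for (int i = 0; i < count; i++) {
        if (strcmp(names[i], name) == 0) {
            *index = i;
            return SKETCH_SUCCESS;
        }
    }
    return SKETCH_ERR_NOT_FOUND;   /* *index is left untouched */
}

/* Same shape as line 802: return the error code or the index.
   'index' is initialized, so no path reads an indeterminate value. */
int sketch_find(const char *const names[], int count, const char *name)
{
    int index = -1;
    int ret = sketch_lookup(names, count, name, &index);

    return (SKETCH_SUCCESS != ret) ? ret : index;
}
```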
2) Bus error with "make check"
tail -15 log.make-check.SunOS.sparc.64_cc
>>--------------------------------------------<<
PASS: ddt_test
/bin/bash: line 5: 4466 Bus Error ${dir}$tst
FAIL: ddt_raw
========================================================
1 of 6 tests failed
Please report to http://www.open-mpi.org/community/help/
========================================================
make[3]: *** [check-TESTS] Error 1
make[3]: Leaving directory `.../test/datatype'
make[2]: *** [check-am] Error 2
make[2]: Leaving directory `.../test/datatype'
make[1]: *** [check-recursive] Error 1
make[1]: Leaving directory `.../test'
make: *** [check-recursive] Error 1
tyr openmpi-1.7.4a1r29784-SunOS.sparc.64_cc 116 cd test/datatype/.libs/
tyr .libs 117 /opt/solstudio12.3/bin/sparcv9/dbx ddt_raw
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.9' in your .dbxrc
Reading ddt_raw
Reading ld.so.1
Reading libmpi.so.1.2.0
Reading libopen-rte.so.6.0.0
Reading libopen-pal.so.6.0.0
Reading libsendfile.so.1
Reading libpicl.so.1
Reading libkstat.so.1
Reading liblgrp.so.1
Reading libsocket.so.1
Reading libnsl.so.1
Reading librt.so.1
Reading libm.so.2
Reading libthread.so.1
Reading libc.so.1
Reading libdoor.so.1
Reading libaio.so.1
Reading libmd.so.1
(dbx) run
Running: ddt_raw
(process id 17689)
Reading libc_psr.so.1
#
* TEST INVERSED VECTOR
#
t@1 (l@1) signal BUS (invalid address alignment) in opal_convertor_raw at line 64 in file "opal_convertor_raw.c"
64 DO_DEBUG( opal_output( 0, "opal_convertor_raw( %p, {%p, %u}, %lu )\n", (void*)pConvertor,
(dbx)
(dbx)
(dbx)
(dbx) check -all
dbx: warning: check -all will be turned on in the next run of the process
access checking - OFF
memuse checking - OFF
(dbx) run
Running: ddt_raw
(process id 17691)
Reading rtcapihook.so
Reading libdl.so.1
Reading rtcaudit.so
Reading libmapmalloc.so.1
Reading libgen.so.1
Reading rtcboot.so
Reading librtc.so
Reading libmd_psr.so.1
RTC: Enabling Error Checking...
RTC: Using UltraSparc trap mechanism
RTC: See `help rtc showmap' and `help rtc limitations' for details.
RTC: Running program...
#
* TEST INVERSED VECTOR
#
Misaligned read (mar) on thread 1:
Attempting to read 4 bytes at address 0xffffffff60cca179
t@1 (l@1) stopped in opal_convertor_raw at line 64 in file "opal_convertor_raw.c"
64 DO_DEBUG( opal_output( 0, "opal_convertor_raw( %p, {%p, %u}, %lu )\n", (void*)pConvertor,
(dbx)
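dbx reports a misaligned 4-byte read at 0xffffffff60cca179 while executing line 64. I do not know which field is misaligned there, but a common portable way to read a 32-bit value from a possibly unaligned position is to copy it with memcpy() instead of dereferencing a cast pointer. Minimal sketch (hypothetical helper load_u32(), not Open MPI code):

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from a position that may not be 4-byte aligned.
   memcpy() lets the compiler use alignment-safe loads, whereas
   *(const uint32_t *)p traps with SIGBUS on SPARC if p is misaligned. */
uint32_t load_u32(const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof v);
    return v;
}
```

The result has the same byte order as a normal load, so it can replace a direct dereference without further changes.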
3) Bus error with my programs
tyr small_prog 122 mpicc init_finalize.c
tyr small_prog 123 /opt/solstudio12.3/bin/sparcv9/dbx /usr/local/openmpi-1.7.4_64_cc/bin/mpiexec
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.9' in your .dbxrc
Reading mpiexec
Reading ld.so.1
Reading libopen-rte.so.6.0.0
Reading libopen-pal.so.6.0.0
Reading libsendfile.so.1
Reading libpicl.so.1
Reading libkstat.so.1
Reading liblgrp.so.1
Reading libsocket.so.1
Reading libnsl.so.1
Reading librt.so.1
Reading libm.so.2
Reading libthread.so.1
Reading libc.so.1
Reading libdoor.so.1
Reading libaio.so.1
Reading libmd.so.1
(dbx) run -np 1 a.out
Running: mpiexec -np 1 a.out
(process id 17791)
Reading libc_psr.so.1
Reading mca_shmem_mmap.so
Reading libmp.so.2
...
Reading mca_dfs_orted.so
Reading mca_dfs_test.so
t@1 (l@1) signal BUS (invalid address alignment) in opal_net_samenetwork at line 272 in file "net.c"
272 (inaddr2->sin_addr.s_addr & netmask)) {
(dbx)
(dbx)
(dbx)
(dbx) check -all
dbx: warning: check -all will be turned on in the next run of the process
access checking - OFF
memuse checking - OFF
(dbx) run -np 1 a.out
Running: mpiexec -np 1 a.out
(process id 17794)
Reading rtcapihook.so
Reading libdl.so.1
Reading rtcaudit.so
Reading libmapmalloc.so.1
Reading rtcboot.so
Reading librtc.so
Reading libmd_psr.so.1
RTC: Enabling Error Checking...
RTC: Using UltraSparc trap mechanism
RTC: See `help rtc showmap' and `help rtc limitations' for details.
RTC: Running program...
Read from uninitialized (rui) on thread 1:
Attempting to read 4 bytes at address 0xffffffff7fffd368
which is 184 bytes above the current stack pointer
Variable is 'index'
t@1 (l@1) stopped in var_find at line 802 in file "mca_base_var.c"
802 return (OPAL_SUCCESS != ret) ? ret : index;
(dbx)
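The crash in opal_net_samenetwork() happens while reading inaddr2->sin_addr.s_addr, which would trap if the sockaddr that pointer refers to is not 4-byte aligned. I have not checked where that structure comes from. As an illustration only, the following hypothetical function same_network_v4() (not the Open MPI implementation; prefixlen is assumed to be between 0 and 32) copies both addresses into aligned local structures before comparing them:

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical "same network" check for IPv4 that never dereferences a
   possibly misaligned sockaddr: both addresses are copied into properly
   aligned local structs first. Assumption: 0 <= prefixlen <= 32. */
int same_network_v4(const void *addr1, const void *addr2, unsigned prefixlen)
{
    struct sockaddr_in a1, a2;
    uint32_t netmask;

    memcpy(&a1, addr1, sizeof a1);
    memcpy(&a2, addr2, sizeof a2);

    /* Build the mask in host order (avoiding a 32-bit shift for /0),
       then convert it to network order to match s_addr. */
    netmask = (prefixlen == 0) ? 0 : htonl(0xffffffffu << (32 - prefixlen));

    return (a1.sin_addr.s_addr & netmask) == (a2.sin_addr.s_addr & netmask);
}
```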
I have the same problems with openmpi-1.9a1r29790 (same files).
tyr fd1026 107 ompi_info |grep MPI:
Open MPI: 1.9a1r29790
tyr fd1026 108 ompi_info -a | grep MPI:
[tyr:17867] *** Process received signal ***
[tyr:17867] Signal: Bus Error (10)
[tyr:17867] Signal code: Invalid address alignment (1)
[tyr:17867] Failing at address: ffffffff7d3c5ac1
/export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:opal_backtrace_print+0x14
/export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:0x17f268
/lib/sparcv9/libc.so.1:0xd8c28
/lib/sparcv9/libc.so.1:0xcc79c
/lib/sparcv9/libc.so.1:0xcc9a8
/export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:0x134b9c [ Signal 2099923552 (?)]
/export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:mca_base_var_dump+0x1b0
/export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:0x89828
/export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:opal_info_show_mca_params+0xb4
/export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:opal_info_do_params+0x364
/export2/prog/SunOS_sparc/openmpi-1.9_64_cc/bin/ompi_info:main+0x628
/export2/prog/SunOS_sparc/openmpi-1.9_64_cc/bin/ompi_info:_start+0x12c
[tyr:17867] *** End of error message ***
Bus error
tyr fd1026 109
I would be grateful if somebody could solve these problems. Do you need
any further information?
Kind regards
Siegmar