Hi,
I am trying to port two OpenDX loadable modules (ImportHDF5Field and
ImportHDF5Species from the DXHDF5 package), both written in C++, to
AIX. The modules build correctly, but they crash when I try to run
them. We have been using the modules successfully on other platforms:
Linux, IRIX64 and Macintosh (Darwin kernel).
Let me start with some information about the setup. The platform is
AIX 5.1 with VisualAge C++ Professional / C for AIX Compiler, Version 6.
I am using OpenDX 4.3.0 and HDF5 1.6.0. As the problems with the two
modules are the same, from now on I will write only about one of the
modules, ImportHDF5Field.
Below are two sections: one gives details about building the module,
the other about running the module.
BUILDING THE MODULE
-------------------
The module builds correctly: no warnings or errors are issued. First,
the source files are compiled and archived into one static library
(libdxh5base.a); the module itself is implemented in a separate object
file, "ImportHDF5Field.o". Second, the link command creates one shared
library file as follows:
xlC -o ImportHDF5Field -qmkshrobj -berok -eDXEntry -bI:dxexec.imp
-bexpfull ImportHDF5Field.o ../dxh5base/libdxh5base.a
-L/u11/ijs/local/hdf5-1.6.0/lib -lhdf5 -lhdf5_cpp
The produced file "ImportHDF5Field" is the loadable module.
The import file "dxexec.imp" provided with the option "-bI:" in the
above command is a modified version of the OpenDX export file
"${dxroot}/dx/lib/dxexec.exp"; the only modification is that the first
line of the file was changed from "#!" to "#!." to let the linker know
that the imported symbols are to be found in the main executable, i.e.
"dxexec". I need to have a modified version of the file, because
"dxexec" is built without the option "-brtl".
The libraries "hdf5" and "hdf5_cpp" in "/u11/ijs/local/hdf5-1.6.0/lib"
are static.
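The change to the import file described above (first line "#!"
replaced by "#!.") can be reproduced mechanically. Below is a minimal
C sketch of that step. The function name make_import_file is made up
for illustration, and the sketch assumes that the first line of the
export file is exactly "#!"; it refuses to write anything otherwise.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy exp_path to imp_path, replacing a first line of "#!" with "#!.".
   Returns 0 on success, -1 on I/O error or if the first line is not "#!".
   (Hypothetical helper, not part of OpenDX.) */
static int make_import_file(const char *exp_path, const char *imp_path)
{
    char line[1024];
    FILE *in, *out;
    int failed;

    assert(exp_path != NULL && imp_path != NULL);

    in = fopen(exp_path, "r");
    if (in == NULL)
        return -1;

    /* The first line must be exactly "#!" (with or without a newline). */
    if (fgets(line, sizeof line, in) == NULL ||
        (strcmp(line, "#!\n") != 0 && strcmp(line, "#!") != 0)) {
        fclose(in);
        return -1;
    }

    out = fopen(imp_path, "w");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    /* "#!." tells the linker the symbols live in the main executable. */
    fputs("#!.\n", out);

    /* Copy the remaining lines unchanged; long lines pass through in pieces. */
    while (fgets(line, sizeof line, in) != NULL)
        fputs(line, out);

    failed = ferror(in) || ferror(out);
    fclose(in);
    if (fclose(out) != 0)
        failed = 1;
    return failed ? -1 : 0;
}
```

A call such as make_import_file("dxexec.exp", "dxexec.imp") would then
produce the file passed to "-bI:" in the link command.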
To play it safe, I do not use the option "-bE:" to tell the linker
which of the module's symbols to export; instead I export all possible
symbols with the option "-bexpfull". I am using "-bexpfull", and NOT
"-bexpall", so that no C++ symbols are left unexported by the linker.
RUNNING THE MODULE
------------------
"dxexec" loads the module with the "dlopen" function and then calls
the module successfully. The module continues to run for some time,
calls a few OpenDX functions defined in "dxexec" (e.g. DXAddModule,
DXExtractString), creates a few C++ objects, calls a few C++ functions
(some of them are C++ functions from the HDF5 library), but at some
point later the module receives the SIGILL signal. The OpenDX message
window reports this:
> -1: cleaning up and exiting
> child process 2 (78804) killed by signal = 4
>
> parent exiting
When I run "dxexec" and the module under a debugger (dbx), I get:
> trace in H5File.cpp: 79 hid_t access_plist_id = create_plist.getId();
> tracei: 0xd98136cc ($b11216) 80610124 lwz r3,0x124(r1)
> tracei: 0xd98136d0 ($b11216+0x4) 80630000 lwz r3,0x0(r3)
> tracei: 0xd98136d4 ($b11216+0x8) 80830014 lwz r4,0x14(r3)
> tracei: 0xd98136d8 ($b11216+0xc) 80610124 lwz r3,0x124(r1)
> tracei: 0xd98136dc ($b11216+0x10) 80a30000 lwz r5,0x0(r3)
> tracei: 0xd98136e0 ($b11216+0x14) 81650010 lwz r11,0x10(r5)
> tracei: 0xd98136e4 ($b11216+0x18) 7c632214 add r3,r3,r4
> tracei: 0xd98136e8 ($b11216+0x1c) 4be4afd9 bl 0xd965e6c0 (_ptrgl)
> tracei: 0xd965e6c0 (_ptrgl) 800b0000 lwz r0,0x0(r11)
> tracei: 0xd965e6c4 (_ptrgl+0x4) 90410014 stw r2,0x14(r1)
> tracei: 0xd965e6c8 (_ptrgl+0x8) 7c0903a6 mtctr r0
> tracei: 0xd965e6cc (_ptrgl+0xc) 804b0004 lwz r2,0x4(r11)
> tracei: 0xd965e6d0 (_ptrgl+0x10) 816b0008 lwz r11,0x8(r11)
> tracei: 0xd965e6d4 (_ptrgl+0x14) 4e800420 bctr
>
> Illegal instruction in . at 0x0 ($t1)
> 0x00000000 00000000 Invalid opcode.
It seems that the problem is in the call to the HDF5 "getId()"
function. This function is, however, defined in the module file,
because when I run:
nm ImportHDF5Field | grep getId
then I get:
> .H5::IdComponent::getId() const T 270414220 84
> .H5::PredType::getId() const T 270855216 3168
> H5::IdComponent::getId() const - 3308
> H5::IdComponent::getId() const D 536951424 12
> H5::PredType::getId() const - 520
> H5::PredType::getId() const D 536956368 12
> Q2_2H511IdComponent:T1=Y12c(cup24__vftQ2_2H511IdComponent:__vfp:7,0,32;u[f:setId__Q2_2H511IdComponentFi:8;u[c:__ct__Q2_2H511IdComponentFi:11;u[c:__ct__Q2_2H511IdComponentFRCQ2_2H511IdComponent:13;vu0[fk:getId__Q2_2H511IdComponentCFv:17;u[f:incRefCount__Q2_2H511IdComponentFv:18;u[f:decRefCount__Q2_2H511IdComponentFv:19;u[f:getCounter__Q2_2H511IdComponentFv:20;u[f:noReference__Q2_2H511IdComponentFv:22;u[f:__as__Q2_2H511IdComponentFRCQ2_2H511IdComponent:24;u[f:reset__Q2_2H511IdComponentFv:25;vu0[d:__dt__Q2_2H511IdComponentFv:26;o:id:10,32,32;o:ref_count:28,64,32;o[c:__ct__Q2_2H511IdComponentFv:26;;
> - 0
>
> grep: 0652-226 Maximum line length of 2048 exceeded.
The output above shows that the "getId()" function is defined in the
module file. The only thing I do not understand is the line that
starts with "Q2"...
AN INTERESTING THING
--------------------
I wrote a small program that tests our module (its source code is at
the bottom of this e-mail). I build the test program in the same way
as "dxexec" is built, or at least I think so.
"dxexec" is built this way:
> cc -g -o dxexec -bE:../../../lib/dxexec.exp main.o ../dxmods/user.o
> ../libdx/mem.o ../libdx/memory.o ../dpexec/.libs/libDPEXEC.a
> ../dxmods/.libs/libDXMODS.a ../dxmods/.libs/libDXMODSN.a
> ../libdx/.libs/libLIBDX.a ../hwrender/.libs/libHW.a
> ../hwrender/opengl/.libs/libOPENGL.a -lnsl -ldl -lXm -lGLU -lGL -lm
> -lXext -lXt -lX11 -lSM -lICE -lpthread
My small test program is built this way:
> cc -g -c -I/u11/ijs/local/dx-4.3.0/dx/include/ test.c
>
> cc -bE:/u11/ijs/local/dx-4.3.0/dx/lib/dxexec.exp
> -L/u11/ijs/local/dx-4.3.0/dx/lib_ibm6000 -lm -lDXlite test.o -o test
Now, the interesting thing is that the module ("ImportHDF5Field")
crashes when it is called from "dxexec", but runs CORRECTLY when it is
called from the test program!
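For comparison, the load-and-look-up step that a host program has to
perform can be sketched in C as below. This is only a sketch under
assumptions: it is not the code of "dxexec" or of my test program, the
name load_entry and the entry_fn signature are made up, and the symbol
name "DXEntry" is simply the one given to the linker with "-eDXEntry".

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Signature assumed for illustration only; the real entry point type
   is defined by OpenDX, not by this sketch. */
typedef int (*entry_fn)(void);

/* Load a shared object and look up one symbol in it.
   Returns NULL (after printing dlerror()) if either step fails. */
static entry_fn load_entry(const char *path, const char *symbol)
{
    void *handle;
    void *sym;

    assert(path != NULL && symbol != NULL);

    /* RTLD_NOW asks the loader to resolve symbols at load time. */
    handle = dlopen(path, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return NULL;
    }

    dlerror();  /* clear any stale error before the lookup */
    sym = dlsym(handle, symbol);
    if (sym == NULL) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(handle);
        return NULL;
    }
    return (entry_fn)sym;
}
```

A call such as load_entry("./ImportHDF5Field", "DXEntry") returns NULL
when the file cannot be loaded or the symbol is missing, so a failure
at this stage is at least reported instead of surfacing later as a
crash.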
CONCLUSION
----------
As far as I can tell, I am following the rules from the IBM book
"Developing and Porting C and C++ Applications on AIX". I have tried
building the module with the "makeC++SharedLib" command, and then with
various options (-qmkshrobj, -G, -brtl, -bnortllib, -bM:SRE, ...), but
the problem remains.
Thank you for your time. I will be grateful for your advice.
Best,
Irek