I installed TJ O'Donnell's openCHORD on my Mac. It took some
doing, and I documented all of the steps at
http://dalkescientific.com/writings/diary/archive/2010/04/12/compiling_openchord.html
I ran into problems loading the NCI data set at 260,000
compounds. I worked with TJ to try and track down the problem.
It's at the Python or OpenBabel level. I have a reproducible,
but it takes a long time to trigger the problem. The data set
for the reproducible is the 200+ MB file at
http://cactus.nci.nih.gov/DownLoad/NCI-Open_09-03.sdf.gz
Does anyone recognize this problem, or can think of some change
which has gone into the code which might fix it? I haven't yet
tried installing OB from version control; I was hoping to identify
the bug first.
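Because the reproducible takes a long time to trigger, one possible way to shorten each attempt is to pull the E_SMILES values out of the SD file once and save them as plain text. The sketch below is only an illustration: the names iter_e_smiles and extract_to_file are hypothetical, the "rt" text mode assumes Python 3 (the attached scripts were written for Python 2.6), and it is untested whether feeding pre-extracted SMILES still triggers the crash.

```python
import gzip

TAG = "> <E_SMILES>"  # data-item header line that the attached scripts look for


def iter_e_smiles(lines):
    """Yield the stripped line that follows each E_SMILES header line."""
    lines = iter(lines)
    for line in lines:
        if line.startswith(TAG):
            # The value sits on the next line; stop quietly at end of input.
            value = next(lines, None)
            if value is None:
                return
            yield value.strip()


def extract_to_file(sdf_gz_path, out_path):
    """Write one SMILES per line to out_path and return how many were written."""
    count = 0
    # "rt" (text mode) assumes Python 3.
    with gzip.open(sdf_gz_path, "rt") as src, open(out_path, "w") as dst:
        for smiles in iter_e_smiles(src):
            dst.write(smiles + "\n")
            count += 1
    return count
```

Usage would be something like extract_to_file("NCI-Open_09-03.sdf.gz", "nci.smi"), after which a test loop can read nci.smi directly instead of re-scanning the SD file on every run.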
I've attached the reproducible. One of the strange things is that "tj2.py",
which is a simple transformation of some of the Python code from
"tj_with_bug.py", does not trigger the segfault. This strongly suggests some
difference caused by memory allocation.
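Both scripts recycle molecule objects the same way: new OBMol instances are created until a limit (maxsmi, 10000) is reached, and after that an arbitrary cached entry is popped and its object is handed back to ReadString for reuse. The sketch below isolates just that reuse pattern so it can be run without Open Babel. FakeMol, parse_with_reuse, state and max_mols are hypothetical stand-ins, not part of openCHORD or Open Babel, and the sketch illustrates the allocation pattern only; it does not reproduce the crash.

```python
class FakeMol(object):
    """Stand-in for openbabel.OBMol so the sketch runs without Open Babel."""
    created = 0  # how many instances have been constructed so far

    def __init__(self):
        FakeMol.created += 1
        self.smiles = None


def parse_with_reuse(cache, state, smiles, new_mol=FakeMol, max_mols=3):
    """Return a molecule for `smiles`, reusing a cached object once the pool is full."""
    if state["nmol"] < max_mols:
        mol = new_mol()              # pool not full yet: allocate a fresh object
        state["nmol"] += 1
    else:
        key, mol = cache.popitem()   # pool full: evict some entry and reuse its object
    mol.smiles = smiles              # stands in for obc.ReadString(mol, smiles)
    cache[smiles] = mol
    return mol


cache, state = {}, {"nmol": 0}
for s in ["C", "CC", "CCC", "CCCC", "CCCCC"]:
    parse_with_reuse(cache, state, s)
```

After the loop only three FakeMol objects have ever been constructed, and the cache still holds three entries. In the real scripts this means the same underlying C++ OBMol objects are cleared and refilled over and over by ReadString, which is where the traces show the failures.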
I've managed to collect a few error messages and stack traces over time.
Once from the postgres extension handler (usually I just got a segfault):
postgres: dalke openchord [local] SELECT(59230) malloc: *** error for object
0x100000016: pointer being freed was not allocated
Here's a stack trace from gdb:
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: 13 at address: 0x0000000000000000
OpenBabel::OBMol::DestroyBond (this=<value temporarily unavailable, due to
optimizations>, bond=0x10541efa0) at mol.cpp:1471
1471 delete bond;
(gdb)
#0 OpenBabel::OBMol::DestroyBond (this=<value temporarily unavailable, due to
optimizations>, bond=0x10541efa0) at mol.cpp:1471
#1 0x0000000101a0fd9f in OpenBabel::OBConversion::GetInStream () at
/Users/dalke/ftps/openbabel-2.2.3/include/openbabel/obconversion.h:256
#2 0x0000000101a0fd9f in OpenBabel::SMIBaseFormat::ReadMolecule
(this=0x10541efa0, pOb=0x102c83990, pConv=0x10027cc10) at base.h:854
#3 0x00000001013dfe2c in OpenBabel::OBConversion::Read (this=0x10027cc10,
pOb=0x102c81140, pin=<value temporarily unavailable, due to optimizations>) at
obconversion.cpp:745
#4 0x00000001013e3f4c in OpenBabel::OBConversion::ReadString
(this=0x10027cc10, pOb=0x102c81140, input=<value temporarily unavailable, due
to optimizations>) at obconversion.cpp:893
warning: .o file
"/Users/dalke/ftps/openbabel-2.2.3/scripts/python/build/temp.macosx-10.6-universal-2.6/openbabel_python.o"
more recent than executable timestamp in
"/Library/Python/2.6/site-packages/_openbabel.so"
warning: Couldn't open object file
'/Users/dalke/ftps/openbabel-2.2.3/scripts/python/build/temp.macosx-10.6-universal-2.6/openbabel_python.o'
#5 0x0000000101135d6c in _wrap_OBConversion_ReadString ()
#6 0x000000010000aff3 in PyObject_Call ()
#7 0x000000010008a51a in PyEval_EvalFrameEx ()
#8 0x00000001000892e1 in PyEval_EvalFrameEx ()
#9 0x000000010008acce in PyEval_EvalCodeEx ()
#10 0x000000010008ad61 in PyEval_EvalCode ()
#11 0x00000001000a265a in Py_CompileString ()
#12 0x00000001000a2723 in PyRun_FileExFlags ()
#13 0x00000001000a423d in PyRun_SimpleFileExFlags ()
#14 0x00000001000b0286 in Py_Main ()
#15 0x0000000100000e6c in ?? ()
Current language: auto; currently c++
Here's part of another gdb trace, where you can see the
actual failure mode is not consistent.
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000001900000010
0x000000010137753b in OpenBabel::OBBase::Clear (this=0x1047cb810) at base.cpp:77
77 delete *m;
(gdb) where
#0 0x000000010137753b in OpenBabel::OBBase::Clear (this=0x1047cb810) at
base.cpp:77
#1 0x00000001013ccc83 in OpenBabel::OBMol::Clear (this=0x1047cb810) at
mol.cpp:1345
#2 0x0000000101a0fd9f in OpenBabel::OBConversion::GetInStream () at
/Users/dalke/ftps/openbabel-2.2.3/include/openbabel/obconversion.h:256
#3 0x0000000101a0fd9f in OpenBabel::SMIBaseFormat::ReadMolecule
(this=0x10000001c, pOb=0x1047cb810, pConv=0x10027ca10) at base.h:854
#4 0x00000001013dfe2c in OpenBabel::OBConversion::Read (this=0x10027ca10,
pOb=0x1047cb810, pin=<value temporarily unavailable, due to optimizations>) at
obconversion.cpp:745
#5 0x00000001013e3f4c in OpenBabel::OBConversion::ReadString
(this=0x10027ca10, pOb=0x1047cb810, input=<value temporarily unavailable, due
to optimizations>) at obconversion.cpp:893
#6 0x0000000101135d6c in _wrap_OBConversion_ReadString ()
And here's yet another stack trace, this one from Apple's built-in crash reporter:
Thread 0 Crashed: Dispatch queue: com.apple.main-thread
0 libstdc++.6.dylib 0x00007fff861e47cd __dynamic_cast + 89
1 smilesformat.so 0x0000000101a0fd8e
OpenBabel::SMIBaseFormat::ReadMolecule(OpenBabel::OBBase*,
OpenBabel::OBConversion*) + 78 (base.h:254)
2 libopenbabel.3.dylib 0x00000001013dfe2c
OpenBabel::OBConversion::Read(OpenBabel::OBBase*, std::istream*) + 220
(obconversion.cpp:745)
3 libopenbabel.3.dylib 0x00000001013e3f4c
OpenBabel::OBConversion::ReadString(OpenBabel::OBBase*, std::string) + 508
(obconversion.cpp:893)
4 _openbabel.so 0x0000000101135d6c
_wrap_OBConversion_ReadString + 881
... Python run-time calls omitted ...
Any ideas about what's going on?
Andrew
da...@dalkescientific.com
import sys
import gzip
#import openchord
import openbabel
f = gzip.open("NCI-Open_09-03.sdf.gz")
f = iter(enumerate(f))
GD = {}
GD['mol'] = dict()
GD['obc'] = openbabel.OBConversion()
GD['obc'].SetInFormat("smi")
GD['nmol'] = 0
GD['maxsmi'] = 10000
n = 0
for lineno, line in f:
    if lineno % 10000 == 0:
        sys.stdout.write("\r %d / %d" % (n, lineno))
        sys.stdout.flush()
    if line.startswith("> <E_SMILES>"):
        lineno, line = next(f)
        smiles = line.strip()
        #mol = openchord.parse_smi(GD, smiles)
        if GD['nmol'] < GD['maxsmi']:
            mol = openbabel.OBMol()
            GD['nmol'] += 1
            #plpy.notice('new mol for %s' % smiles)
        else:
            key,mol = GD['mol'].popitem()
            #plpy.notice('mol reuse %s for %s' % (key,smiles))
        if GD['obc'].ReadString(mol, smiles):
            GD['mol'][smiles] = mol
            # return copy is slower, but safer?
            # return openbabel.OBMol(mol)
        n += 1
import sys
import gzip
#import openchord
import openbabel
f = gzip.open("NCI-Open_09-03.sdf.gz")
f = iter(enumerate(f))
GD = {}
GD['mol'] = dict()
obc = openbabel.OBConversion()
obc.SetInFormat("smi")
nmol = 0
maxsmi = 10000
n = 0
for lineno, line in f:
    if lineno % 10000 == 0:
        sys.stdout.write("\r %d / %d" % (n, lineno))
        sys.stdout.flush()
    if line.startswith("> <E_SMILES>"):
        lineno, line = next(f)
        smiles = line.strip()
        #mol = openchord.parse_smi(GD, smiles)
        if nmol < maxsmi:
            mol = openbabel.OBMol()
            nmol += 1
            #plpy.notice('new mol for %s' % smiles)
        else:
            key,mol = GD['mol'].popitem()
            #plpy.notice('mol reuse %s for %s' % (key,smiles))
        if obc.ReadString(mol, smiles):
            GD['mol'][smiles] = mol
            # return copy is slower, but safer?
            # return openbabel.OBMol(mol)
        n += 1
_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel