I installed TJ O'Donnell's openCHORD on my Mac. It took some
doing, and I documented all of the steps at

http://dalkescientific.com/writings/diary/archive/2010/04/12/compiling_openchord.html

I ran into problems loading the NCI data set of 260,000
compounds. I worked with TJ to try to track down the problem.
It appears to be at the Python or OpenBabel level. I have a
reproducible, but it takes a long time to trigger the problem. The data set
for the reproducible is the 200+ MB file at
   http://cactus.nci.nih.gov/DownLoad/NCI-Open_09-03.sdf.gz


Does anyone recognize this problem, or can anyone think of some change
that has gone into the code which might fix it? I haven't yet
tried installing OB from version control; I was hoping to identify
the bug first.


I've attached the reproducible. One of the strange things is that "tj2.py",
which is a simple transformation of some of the Python code from
"tj_with_bug.py", does not trigger the segfault. This strongly suggests some
difference caused by memory allocation.
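
For readers without OpenBabel at hand, the caching pattern that both attached scripts share can be sketched in plain Python. This is only an illustration under stated assumptions: `FakeMol`, `fake_read_string`, and `parse_with_cache` are hypothetical stand-ins, not OpenBabel or openCHORD names, and the sketch cannot reproduce the crash. It only shows that once `maxsmi` objects exist, every later parse overwrites an object popped from the cache, which is presumably where the `Clear` and `DestroyBond` calls in the traces come from.

```python
class FakeMol(object):
    """Hypothetical stand-in for openbabel.OBMol; records what it last held."""
    def __init__(self):
        self.smiles = None
        self.times_filled = 0


def fake_read_string(mol, smiles):
    """Hypothetical stand-in for OBConversion.ReadString: overwrite mol in place."""
    if not smiles:
        return False
    mol.smiles = smiles
    mol.times_filled += 1
    return True


def parse_with_cache(smiles_iter, maxsmi):
    """Mirror the scripts' loop: create up to maxsmi objects, then reuse them."""
    cache = {}   # SMILES string -> molecule object
    nmol = 0     # number of objects created so far
    for smiles in smiles_iter:
        if nmol < maxsmi:
            mol = FakeMol()
            nmol += 1
        else:
            # Reuse an arbitrary cached object; its old contents get overwritten.
            key, mol = cache.popitem()
        if fake_read_string(mol, smiles):
            cache[smiles] = mol
    return cache, nmol


if __name__ == "__main__":
    cache, nmol = parse_with_cache(["C", "CC", "CCC", "CCCC"], maxsmi=2)
    print("%d objects created, %d cached" % (nmol, len(cache)))
```

Note that, as in the scripts, an object whose parse fails is not put back in the cache, and a repeated SMILES replaces the object already stored under that key, so the cache can shrink over time.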


I've managed to collect a few error messages and stack traces over time.

Once from the postgres extension handler (usually I just got a segfault):

postgres: dalke openchord [local] SELECT(59230) malloc: *** error for object 
0x100000016: pointer being freed was not allocated


Here's a stack trace from gdb:


Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: 13 at address: 0x0000000000000000
OpenBabel::OBMol::DestroyBond (this=<value temporarily unavailable, due to 
optimizations>, bond=0x10541efa0) at mol.cpp:1471
1471            delete bond;
(gdb) 

#0  OpenBabel::OBMol::DestroyBond (this=<value temporarily unavailable, due to 
optimizations>, bond=0x10541efa0) at mol.cpp:1471
#1  0x0000000101a0fd9f in OpenBabel::OBConversion::GetInStream () at 
/Users/dalke/ftps/openbabel-2.2.3/include/openbabel/obconversion.h:256
#2  0x0000000101a0fd9f in OpenBabel::SMIBaseFormat::ReadMolecule 
(this=0x10541efa0, pOb=0x102c83990, pConv=0x10027cc10) at base.h:854
#3  0x00000001013dfe2c in OpenBabel::OBConversion::Read (this=0x10027cc10, 
pOb=0x102c81140, pin=<value temporarily unavailable, due to optimizations>) at 
obconversion.cpp:745
#4  0x00000001013e3f4c in OpenBabel::OBConversion::ReadString 
(this=0x10027cc10, pOb=0x102c81140, input=<value temporarily unavailable, due 
to optimizations>) at obconversion.cpp:893
warning: .o file 
"/Users/dalke/ftps/openbabel-2.2.3/scripts/python/build/temp.macosx-10.6-universal-2.6/openbabel_python.o"
 more recent than executable timestamp in 
"/Library/Python/2.6/site-packages/_openbabel.so"
warning: Couldn't open object file 
'/Users/dalke/ftps/openbabel-2.2.3/scripts/python/build/temp.macosx-10.6-universal-2.6/openbabel_python.o'
#5  0x0000000101135d6c in _wrap_OBConversion_ReadString ()
#6  0x000000010000aff3 in PyObject_Call ()
#7  0x000000010008a51a in PyEval_EvalFrameEx ()
#8  0x00000001000892e1 in PyEval_EvalFrameEx ()
#9  0x000000010008acce in PyEval_EvalCodeEx ()
#10 0x000000010008ad61 in PyEval_EvalCode ()
#11 0x00000001000a265a in Py_CompileString ()
#12 0x00000001000a2723 in PyRun_FileExFlags ()
#13 0x00000001000a423d in PyRun_SimpleFileExFlags ()
#14 0x00000001000b0286 in Py_Main ()
#15 0x0000000100000e6c in ?? ()
Current language:  auto; currently c++

Here's part of another gdb trace, where you can see the
actual failure mode is not consistent.


Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000001900000010
0x000000010137753b in OpenBabel::OBBase::Clear (this=0x1047cb810) at base.cpp:77
77                delete *m;
(gdb) where
#0  0x000000010137753b in OpenBabel::OBBase::Clear (this=0x1047cb810) at 
base.cpp:77
#1  0x00000001013ccc83 in OpenBabel::OBMol::Clear (this=0x1047cb810) at 
mol.cpp:1345
#2  0x0000000101a0fd9f in OpenBabel::OBConversion::GetInStream () at 
/Users/dalke/ftps/openbabel-2.2.3/include/openbabel/obconversion.h:256
#3  0x0000000101a0fd9f in OpenBabel::SMIBaseFormat::ReadMolecule 
(this=0x10000001c, pOb=0x1047cb810, pConv=0x10027ca10) at base.h:854
#4  0x00000001013dfe2c in OpenBabel::OBConversion::Read (this=0x10027ca10, 
pOb=0x1047cb810, pin=<value temporarily unavailable, due to optimizations>) at 
obconversion.cpp:745
#5  0x00000001013e3f4c in OpenBabel::OBConversion::ReadString 
(this=0x10027ca10, pOb=0x1047cb810, input=<value temporarily unavailable, due 
to optimizations>) at obconversion.cpp:893
#6  0x0000000101135d6c in _wrap_OBConversion_ReadString ()





And here's yet another stack trace, this one from Apple's built-in crash reporter:

Thread 0 Crashed:  Dispatch queue: com.apple.main-thread
0   libstdc++.6.dylib                   0x00007fff861e47cd __dynamic_cast + 89
1   smilesformat.so                     0x0000000101a0fd8e 
OpenBabel::SMIBaseFormat::ReadMolecule(OpenBabel::OBBase*, 
OpenBabel::OBConversion*) + 78 (base.h:254)
2   libopenbabel.3.dylib                0x00000001013dfe2c 
OpenBabel::OBConversion::Read(OpenBabel::OBBase*, std::istream*) + 220 
(obconversion.cpp:745)
3   libopenbabel.3.dylib                0x00000001013e3f4c 
OpenBabel::OBConversion::ReadString(OpenBabel::OBBase*, std::string) + 508 
(obconversion.cpp:893)
4   _openbabel.so                       0x0000000101135d6c 
_wrap_OBConversion_ReadString + 881
     ... Python run-time calls omitted ...


Any ideas about what's going on?

                                Andrew
                                da...@dalkescientific.com


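# tj_with_bug.py (state is kept in the GD dictionary)
import sys
import gzip
#import openchord
import openbabel

f = gzip.open("NCI-Open_09-03.sdf.gz")
# Wrap in iter() so that next(f) inside the loop can consume the following line.
f = iter(enumerate(f))
GD = {}
GD['mol'] = dict()    # cache: SMILES string -> OBMol
GD['obc'] = openbabel.OBConversion()
GD['obc'].SetInFormat("smi")
GD['nmol'] = 0        # number of OBMol objects created so far
GD['maxsmi'] = 10000  # create at most this many OBMol objects

n = 0                 # number of SMILES records processed
for lineno, line in f:
   if lineno % 10000 == 0:
       # Progress report: records processed / lines read
       sys.stdout.write("\r %d / %d" % (n, lineno))
       sys.stdout.flush()
   if line.startswith("> <E_SMILES>"):
       # The SMILES is on the line after the tag.
       lineno, line = next(f)
       smiles = line.strip()
       #mol = openchord.parse_smi(GD, smiles)

       if GD['nmol'] < GD['maxsmi']:
           mol = openbabel.OBMol()
           GD['nmol'] += 1
           #plpy.notice('new mol for %s' % smiles)
       else:
           # Object limit reached: reuse an arbitrary cached OBMol.
           key,mol = GD['mol'].popitem()
           #plpy.notice('mol reuse %s for %s' % (key,smiles))

       if GD['obc'].ReadString(mol, smiles):
           GD['mol'][smiles] = mol
           # return copy is slower, but safer?
           # return openbabel.OBMol(mol)

       n += 1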
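
# tj2.py (same loop, with obc, nmol and maxsmi as plain variables)
import sys
import gzip
#import openchord
import openbabel

f = gzip.open("NCI-Open_09-03.sdf.gz")
# Wrap in iter() so that next(f) inside the loop can consume the following line.
f = iter(enumerate(f))
GD = {}
GD['mol'] = dict()    # cache: SMILES string -> OBMol
obc = openbabel.OBConversion()
obc.SetInFormat("smi")
nmol = 0              # number of OBMol objects created so far
maxsmi = 10000        # create at most this many OBMol objects

n = 0                 # number of SMILES records processed
for lineno, line in f:
   if lineno % 10000 == 0:
       # Progress report: records processed / lines read
       sys.stdout.write("\r %d / %d" % (n, lineno))
       sys.stdout.flush()
   if line.startswith("> <E_SMILES>"):
       # The SMILES is on the line after the tag.
       lineno, line = next(f)
       smiles = line.strip()
       #mol = openchord.parse_smi(GD, smiles)

       if nmol < maxsmi:
           mol = openbabel.OBMol()
           nmol += 1
           #plpy.notice('new mol for %s' % smiles)
       else:
           # Object limit reached: reuse an arbitrary cached OBMol.
           key,mol = GD['mol'].popitem()
           #plpy.notice('mol reuse %s for %s' % (key,smiles))

       if obc.ReadString(mol, smiles):
           GD['mol'][smiles] = mol
           # return copy is slower, but safer?
           # return openbabel.OBMol(mol)

       n += 1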
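
Both scripts above take the SMILES from the line that follows a `> <E_SMILES>` tag in the SD file. A minimal sketch of that extraction step on its own, with no OpenBabel dependency, might look like the following. The names `iter_tag_values` and `open_sdf_text` are hypothetical, the code assumes Python 3 (the scripts above were written for Python 2.6), and the `latin-1` encoding is an assumption chosen only so that decoding never fails. Whether feeding pre-extracted SMILES to `ReadString` still triggers the crash is untested, since the failure appears sensitive to memory allocation.

```python
import gzip


def iter_tag_values(lines, tag="> <E_SMILES>"):
    """Yield the stripped line that follows each line starting with `tag`."""
    lines = iter(lines)
    for line in lines:
        if line.startswith(tag):
            value = next(lines, None)
            if value is None:
                # The file ended right after the tag line.
                return
            yield value.strip()


def open_sdf_text(path):
    """Open a gzip-compressed SD file as text (encoding is an assumption)."""
    return gzip.open(path, "rt", encoding="latin-1")


if __name__ == "__main__":
    import sys
    # Usage: python extract_smiles.py NCI-Open_09-03.sdf.gz > smiles.txt
    with open_sdf_text(sys.argv[1]) as infile:
        for smiles in iter_tag_values(infile):
            print(smiles)
```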
_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel
