I'm desalting the ChEMBL data set and generating the corresponding desalted SD 
and SMILES files. The conversion step failed, and it turns out that the 
failure has nothing to do with the desalting.

My code failed with CHEMBL1269997, which is record ~750,200 out of 1,142,974. 
(In other words, it took a while to get to this point.) Here's a reproducible 
test case:

>>> from rdkit import Chem
>>> writer = Chem.SDWriter("/dev/stdout")
>>> for mol in Chem.ForwardSDMolSupplier("CHEMBL1269997.sdf"):
...   writer.write(mol)
... 
[22:11:05] 

****
Invariant Violation

Violation occurred on line 388 in file 
/tmp/homebrew-rdkit-HEAD-Ebdo/Code/GraphMol/FileParsers/MolFileStereochem.cpp
Failed Expression: pick >= 0
****

Traceback (most recent call last):
 File "<stdin>", line 2, in <module>
RuntimeError: Invariant Violation
>>> Chem.MolToSmiles(mol)
'OCC1=CC2OC(CC(C)C)(CC(C)C)C3C4CCCC56C(OC(C)(C)O5)C1(O)C46C23'
>>> Chem.MolToSmiles(mol, isomericSmiles=True)
'OCC1=C[C@@H]2OC(CC(C)C)(CC(C)C)[C@@H]3[C@H]4CCC[C@@]56[C@@H](OC(C)(C)O5)[C@]1(O)[C@]46[C@H]23'
>>> 

You can see that the molecule was read in, is not None, and can be used to 
generate a SMILES.
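
For a long batch run like this one, a possible stopgap is to catch the 
RuntimeError per record and keep going, so that one bad structure does not end 
the whole conversion. This is only a sketch: write_skipping_failures and its 
name_of parameter are names made up for illustration, the file names in the 
comments are placeholders, and I have not verified that the writer is left in a 
usable state after the exception.

```python
def write_skipping_failures(mols, write, name_of=lambda mol: "<unknown>"):
    """Call write(mol) for each molecule and collect failures instead of aborting.

    Returns a list of (record_number, name, message) tuples, with record
    numbers starting at 1.
    """
    failures = []
    for index, mol in enumerate(mols, start=1):
        if mol is None:
            # Suppliers yield None for records that could not be parsed.
            failures.append((index, None, "could not be parsed"))
            continue
        try:
            write(mol)
        except RuntimeError as err:
            # The invariant violation surfaces in Python as a RuntimeError.
            failures.append((index, name_of(mol), str(err)))
    return failures


# Hypothetical use with RDKit (file names are placeholders):
#
#   from rdkit import Chem
#   writer = Chem.SDWriter("desalted.sdf")
#   failed = write_skipping_failures(
#       Chem.ForwardSDMolSupplier("chembl.sdf"),
#       writer.write,
#       name_of=lambda mol: mol.GetProp("_Name"),
#   )
#   writer.close()
```

The returned list records which records failed, so they can be reported or 
retried later without rerunning the whole file.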

The file CHEMBL1269997.sdf is attached.

This error was previously reported in the thread JP started, titled "Invariant 
violation...", dated July 6, 2011. Greg replied:

> Wow that is certainly an error I never expected to see. From the code,
> I guess the molecule has a stereocenter that is surrounded by other
> stereocenters and something extremely unfortunate is happening with
> the way decisions are being made about which bonds to wedge. As Eddie
> requested in an earlier message, it would be helpful to have the input
> that produced the error so that it can be added to the test cases (and
> so that I can be sure the problem is fixed once I figure out how to).

but I see no posting of a failing structure. I hope the attached structure 
helps resolve this problem.



                                Andrew
                                da...@dalkescientific.com

Attachment: CHEMBL1269997.sdf
Description: Binary data

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss