I'm de-salting the ChEMBL data set and generating the corresponding de-salted SD and SMILES files. I ran into a problem in the conversion step, and it turns out to have nothing to do with the de-salting.
My code failed with CHEMBL1269997, which is record ~750,200 out of 1,142,974. (In other words, it took a while to get to this point.) Here's a reproducible:

>>> from rdkit import Chem
>>> writer = Chem.SDWriter("/dev/stdout")
>>> for mol in Chem.ForwardSDMolSupplier("CHEMBL1269997.sdf"):
...     writer.write(mol)
...
[22:11:05]

****
Invariant Violation
Violation occurred on line 388 in file /tmp/homebrew-rdkit-HEAD-Ebdo/Code/GraphMol/FileParsers/MolFileStereochem.cpp
Failed Expression: pick >= 0
****

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
RuntimeError: Invariant Violation
>>> Chem.MolToSmiles(mol)
'OCC1=CC2OC(CC(C)C)(CC(C)C)C3C4CCCC56C(OC(C)(C)O5)C1(O)C46C23'
>>> Chem.MolToSmiles(mol, isomericSmiles=True)
'OCC1=C[C@@H]2OC(CC(C)C)(CC(C)C)[C@@H]3[C@H]4CCC[C@@]56[C@@H](OC(C)(C)O5)[C@]1(O)[C@]46[C@H]23'
>>>

You can see that the molecule was read in, is not None, and can be used to generate a SMILES.

The CHEMBL1269997.sdf is attached.

This error was previously reported in the thread JP started, titled "Invariant violation...", dated July 6, 2011. Greg replied:

> Wow that is certainly an error I never expected to see. From the code,
> I guess the molecule has a stereocenter that is surrounded by other
> stereocenters and something extremely unfortunate is happening with
> the way decisions are being made about which bonds to wedge. As Eddie
> requested in an earlier message, it would be helpful to have the input
> that produced the error so that it can be added to the test cases (and
> so that I can be sure the problem is fixed once I figure out how to).

but I see no posting of a failing structure. I hope the attached structure helps resolve this problem.

                                Andrew
                                da...@dalkescientific.com
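For anyone who hits the same failure partway through a long run, here is a minimal sketch of one way to keep going and collect the offending records rather than abort. It is not a fix for the underlying wedging bug. The names write_skipping_failures and convert_sdf are hypothetical helpers made up for this illustration; the only RDKit calls used are the ones in the session above plus GetProp("_Name") and writer.close(). It assumes the failure surfaces as a RuntimeError, as in the traceback, and that the writer is still usable after a failed write, which I have not verified.

```python
def write_skipping_failures(mols, write_one, get_name):
    """Write each molecule, recording failures instead of aborting.

    mols      -- iterable of molecules (None entries are parse failures)
    write_one -- callable that writes one molecule, e.g. writer.write
    get_name  -- callable returning an identifier for the error report
    Returns a list of (index, name, message) for records that failed.
    """
    failures = []
    for index, mol in enumerate(mols):
        if mol is None:
            failures.append((index, None, "could not parse record"))
            continue
        try:
            write_one(mol)
        except RuntimeError as err:
            # The invariant violation above reaches Python as RuntimeError.
            failures.append((index, get_name(mol), str(err)))
    return failures


def convert_sdf(sdf_in, sdf_out):
    """Hypothetical driver: copy sdf_in to sdf_out, skipping bad records."""
    # RDKit is imported here so the helper above has no dependencies.
    from rdkit import Chem

    writer = Chem.SDWriter(sdf_out)
    try:
        return write_skipping_failures(
            Chem.ForwardSDMolSupplier(sdf_in),
            writer.write,
            lambda mol: mol.GetProp("_Name"),  # title line of the record
        )
    finally:
        writer.close()
```

Calling convert_sdf("CHEMBL1269997.sdf", "/dev/stdout") should then return a one-element list naming the failing record, and the same list from a full ChEMBL run would give the set of structures worth adding to the test cases.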
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss