Many thanks for this, Greg... as usual, your helpful attitude adds so much more value to this already brilliant toolkit.

I have added another three test cases which fail (attached). I am not sure whether these are "new" cases or instances of your supplied failing case (#9 in the attached: O=c1ccnc(c1)-c1cc1). I basically took three cases at random from my "failing" SDF. (I can keep you entertained for hours! :) )

Jean-Paul Ebejer
Early Stage Researcher

On 10 August 2011 04:45, Greg Landrum <greg.land...@gmail.com> wrote:
> On Tue, Aug 9, 2011 at 1:53 PM, JP <jeanpaul.ebe...@inhibox.com> wrote:
> > I have been using sanifix3.py to get around the "Can't kekulize mol"
> > errors... I have a number of molecules which still give me this error
> > (~1000 mols) even after running the sanifix script. I am attaching a
> > sanifix3.py script which fails with two of these molecules as an example. I
> > can supply more if needed.
> > Can someone guide me on how I can fix this to get it to work? (or are all
> > these molecules chemically nonsense)
>
> Oh, that was a fun one.
> The problem came because of the aromatic nitrogens attached to other
> ring atoms. These were not being properly handled in the fragmentation
> process. I've attached a corrected version, sanifix4.py
>
> -greg
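For anyone following the thread, the "Can't kekulize mol" failure can be pictured with a toy model: kekulizing an aromatic ring means pairing every atom that needs a ring double bond with exactly one ring neighbour. The sketch below is plain Python, not RDKit code, and it is not how sanifix works internally. The function `can_kekulize` and the ring encoding are made up for illustration, and it assumes a single isolated ring in which every atom needs one ring double bond unless it is listed in `no_double_bond` (for example an [nH], or a ring carbon carrying an exocyclic C=O).

```python
def can_kekulize(ring_size, no_double_bond=()):
    """Toy check for one ring: can alternating double bonds be placed?

    Atoms are numbered 0..ring_size-1 around the ring. Atoms listed in
    no_double_bond take no ring double bond; every other atom must end
    up in exactly one double bond with a ring neighbour.
    """
    skip = set(no_double_bond)
    needs = [i not in skip for i in range(ring_size)]
    bonds = [(i, (i + 1) % ring_size) for i in range(ring_size)]

    def place(k, used):
        # k: index of the next ring bond to decide; used: atoms already paired
        if k == len(bonds):
            return all(not needs[i] or i in used for i in range(ring_size))
        a, b = bonds[k]
        # option 1: make this bond double if both ends need one and are free
        if needs[a] and needs[b] and a not in used and b not in used:
            if place(k + 1, used | {a, b}):
                return True
        # option 2: leave this bond single
        return place(k + 1, used)

    return place(0, frozenset())


print(can_kekulize(6))          # benzene-like ring: True
print(can_kekulize(5))          # five aromatic atoms, no H anywhere: False
print(can_kekulize(5, [4]))     # same ring with atom 4 as [nH]: True
print(can_kekulize(6, [0]))     # O=c1ccncc1 pattern, N without H: False
print(can_kekulize(6, [0, 3]))  # same, with the ring N (atom 3) as [nH]: True
```

In this picture, putting an explicit H on the right aromatic nitrogen removes one atom from the set that has to be paired, which is why a ring with an odd number of such atoms becomes solvable.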
""" This code belongs to James Davidson and is discussed here: http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg01185.html http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg01162.html http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg01900.html """ mb = [ """MolPort-000-002-029 Marvin 05210809592D 18 20 0 0 0 0 999 V2000 0.0000 2.1304 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 0.0000 1.3054 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 0.7144 0.8929 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 1.4290 1.3054 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 1.4290 2.1304 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 0.7144 2.5429 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 2.2136 1.0505 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 2.6985 1.7179 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 2.2136 2.3854 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 3.5235 1.7179 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 4.2211 1.7179 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 4.7060 1.0505 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 5.4906 1.3054 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 5.4906 2.1304 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 4.7060 2.3854 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 6.1580 0.8205 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 6.9117 1.1561 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 6.0718 0.0000 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 1 2 4 0 0 0 0 2 3 4 0 0 0 0 3 4 4 0 0 0 0 4 5 4 0 0 0 0 5 6 4 0 0 0 0 6 1 4 0 0 0 0 4 7 4 0 0 0 0 7 8 4 0 0 0 0 8 9 4 0 0 0 0 9 5 4 0 0 0 0 8 10 1 0 0 0 0 10 11 1 0 0 0 0 11 12 4 0 0 0 0 12 13 4 0 0 0 0 13 14 4 0 0 0 0 14 15 4 0 0 0 0 15 11 4 0 0 0 0 13 16 1 0 0 0 0 16 17 2 0 0 0 0 16 18 1 0 0 0 0 M STY 2 1 DAT 2 DAT M SAL 1 1 7 M SDT 1 MRV_IMPLICIT_H M SDD 1 0.0000 0.0000 DR ALL 0 0 M SED 1 IMPL_H1 M SAL 2 1 12 M SDT 2 MRV_IMPLICIT_H M SDD 2 0.0000 0.0000 DR ALL 0 0 M SED 2 IMPL_H1 M END > <PUBCHEM_EXT_DATASOURCE_REGID> MolPort-000-002-029 > <PUBCHEM_EXT_SUBSTANCE_URL> http://www.molport.com/buy-chemicals/molecule-link/MolPort-000-002-029 > <PUBCHEM_EXT_DATASOURCE_URL> http://www.molport.com $$$$ """, """MolPort-000-003-259 Marvin 05210810032D 20 21 0 0 0 0 999 V2000 
2.8546 1.2418 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 2.8546 0.4139 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 3.5684 1.6558 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 2.1411 1.6558 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 2.1411 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 3.5684 0.0000 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 4.2821 1.2418 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 1.4274 1.2418 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 1.4274 0.4139 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 4.2821 0.4139 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 4.9958 1.6558 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 0.7136 1.6558 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 0.7136 0.0000 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 4.9958 0.0000 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 5.7095 1.2418 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 0.0000 1.2418 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 0.0000 0.4139 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 6.4231 1.6558 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 7.1340 1.2418 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 6.4231 2.4751 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 1 2 4 0 0 0 0 1 3 4 0 0 0 0 1 4 4 0 0 0 0 2 5 4 0 0 0 0 2 6 4 0 0 0 0 3 7 4 0 0 0 0 4 8 4 0 0 0 0 5 9 4 0 0 0 0 6 10 4 0 0 0 0 7 11 1 0 0 0 0 8 12 1 0 0 0 0 9 13 1 0 0 0 0 10 14 2 0 0 0 0 11 15 1 0 0 0 0 12 16 1 0 0 0 0 13 17 1 0 0 0 0 15 18 1 0 0 0 0 18 19 1 0 0 0 0 18 20 2 0 0 0 0 7 10 4 0 0 0 0 8 9 4 0 0 0 0 M STY 1 1 DAT M SAL 1 1 6 M SDT 1 MRV_IMPLICIT_H M SDD 1 0.0000 0.0000 DR ALL 0 0 M SED 1 IMPL_H1 M END > <PUBCHEM_EXT_DATASOURCE_REGID> MolPort-000-003-259 > <PUBCHEM_EXT_SUBSTANCE_URL> http://www.molport.com/buy-chemicals/molecule-link/MolPort-000-003-259 > <PUBCHEM_EXT_DATASOURCE_URL> http://www.molport.com $$$$""", """MolPort-000-014-293 Marvin 05240818432D 35 39 0 0 0 0 999 V2000 -0.2869 0.0074 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 0.4929 0.1839 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 0.5663 0.9930 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 -0.1765 1.3460 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 -0.7061 0.7502 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 -3.2878 2.1624 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 -2.8244 1.4931 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 -3.2878 0.8164 0.0000 N 
0 0 0 0 0 0 0 0 0 0 0 0 -4.0968 1.9123 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 -4.0968 1.0666 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 3.3540 -0.3456 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 1.1621 -0.3825 0.0000 S 0 0 0 0 0 0 0 0 0 0 0 0 2.5743 -0.6253 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 3.9865 -0.8752 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 -0.7061 -0.6987 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 -1.5520 0.7502 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 4.7662 -0.5958 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 2.4346 -1.4417 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 -1.9785 1.4563 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 1.9785 -0.0957 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 4.9427 0.2574 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 -4.8029 2.3390 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 -4.8029 0.6399 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 3.8100 -1.6844 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 -0.2869 -1.4417 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 -1.5225 -0.6987 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 5.3620 -1.1254 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 4.3028 0.8164 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 -5.5090 1.9492 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 -5.5090 1.0666 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 4.4793 -2.2507 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 -1.9418 -1.4417 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 5.2222 -1.9344 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 -0.6694 -2.1478 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 -1.5225 -2.1478 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 2 1 4 0 0 0 0 3 2 4 0 0 0 0 4 5 4 0 0 0 0 5 1 4 0 0 0 0 6 7 4 0 0 0 0 7 19 1 0 0 0 0 8 7 4 0 0 0 0 9 6 4 0 0 0 0 10 8 4 0 0 0 0 11 13 1 0 0 0 0 12 2 1 0 0 0 0 13 20 1 0 0 0 0 14 11 1 0 0 0 0 15 1 1 0 0 0 0 16 5 1 0 0 0 0 17 14 4 0 0 0 0 18 13 2 0 0 0 0 19 16 1 0 0 0 0 20 12 1 0 0 0 0 21 17 1 0 0 0 0 22 9 4 0 0 0 0 23 10 4 0 0 0 0 24 14 4 0 0 0 0 25 15 4 0 0 0 0 26 15 4 0 0 0 0 27 17 4 0 0 0 0 28 21 1 0 0 0 0 29 22 4 0 0 0 0 30 23 4 0 0 0 0 31 24 4 0 0 0 0 32 26 4 0 0 0 0 33 31 4 0 0 0 0 34 25 4 0 0 0 0 35 32 4 0 0 0 0 3 4 4 0 0 0 0 35 34 4 0 0 0 0 9 10 4 0 0 0 0 29 30 4 0 0 0 0 33 27 4 0 0 0 0 M STY 1 1 DAT M SAL 1 1 8 M SDT 1 MRV_IMPLICIT_H M SDD 1 0.0000 0.0000 DR 
ALL 0 0 M SED 1 IMPL_H1 M END > <PUBCHEM_EXT_DATASOURCE_REGID> MolPort-000-014-293 > <PUBCHEM_EXT_SUBSTANCE_URL> http://www.molport.com/buy-chemicals/molecule-link/MolPort-000-014-293 > <PUBCHEM_EXT_DATASOURCE_URL> http://www.molport.com > <STOCK> 8 > <STOCKMEASURE> 234 $$$$""" ] from rdkit import Chem from rdkit.Chem import AllChem def FragIndicesToMol(oMol,indices): em = Chem.EditableMol(Chem.Mol()) newIndices={} for i,idx in enumerate(indices): em.AddAtom(oMol.GetAtomWithIdx(idx)) newIndices[idx]=i for i,idx in enumerate(indices): at = oMol.GetAtomWithIdx(idx) for bond in at.GetBonds(): if bond.GetBeginAtomIdx()==idx: oidx = bond.GetEndAtomIdx() else: oidx = bond.GetBeginAtomIdx() # make sure every bond only gets added once: if oidx<idx: continue em.AddBond(newIndices[idx],newIndices[oidx],bond.GetBondType()) res = em.GetMol() res.ClearComputedProps() Chem.GetSymmSSSR(res) res.UpdatePropertyCache(False) res._idxMap=newIndices return res def _recursivelyModifyNs(mol,matches,indices=None): if indices is None: indices=[] res=None while len(matches) and res is None: tIndices=indices[:] nextIdx = matches.pop(0) tIndices.append(nextIdx) nm = Chem.Mol(mol.ToBinary()) nm.GetAtomWithIdx(nextIdx).SetNoImplicit(True) nm.GetAtomWithIdx(nextIdx).SetNumExplicitHs(1) cp = Chem.Mol(nm.ToBinary()) try: Chem.SanitizeMol(cp) except ValueError: res,indices = _recursivelyModifyNs(nm,matches,indices=tIndices) else: indices=tIndices res=cp return res,indices def AdjustAromaticNs(m,nitrogenPattern='[n&D2&H0;r5,r6]'): """ default nitrogen pattern matches Ns in 5 rings and 6 rings in order to be able to fix: O=c1ccncc1 """ Chem.GetSymmSSSR(m) m.UpdatePropertyCache(False) # break non-ring bonds linking rings: em = Chem.EditableMol(m) linkers = m.GetSubstructMatches(Chem.MolFromSmarts('[r]!@[r]')) plsFix=set() for a,b in linkers: em.RemoveBond(a,b) plsFix.add(a) plsFix.add(b) nm = em.GetMol() for at in plsFix: at=nm.GetAtomWithIdx(at) if at.GetIsAromatic() and at.GetAtomicNum()==7: 
at.SetNumExplicitHs(1) at.SetNoImplicit(True) # build molecules from the fragments: fragLists = Chem.GetMolFrags(nm) frags = [FragIndicesToMol(nm,x) for x in fragLists] # loop through the fragments in turn and try to aromatize them: ok=True for i,frag in enumerate(frags): cp = Chem.Mol(frag.ToBinary()) try: Chem.SanitizeMol(cp) except ValueError: matches = [x[0] for x in frag.GetSubstructMatches(Chem.MolFromSmarts(nitrogenPattern))] lres,indices=_recursivelyModifyNs(frag,matches) if not lres: #print 'frag %d failed (%s)'%(i,str(fragLists[i])) ok=False break else: revMap={} for k,v in frag._idxMap.iteritems(): revMap[v]=k for idx in indices: oatom = m.GetAtomWithIdx(revMap[idx]) oatom.SetNoImplicit(True) oatom.SetNumExplicitHs(1) if not ok: return None return m if __name__=='__main__': ms= ( Chem.MolFromMolBlock(mb[0],False), Chem.MolFromMolBlock(mb[1],False), Chem.MolFromMolBlock(mb[2],False), Chem.MolFromSmiles('O=c1ccc2ccccc2n1', False), Chem.MolFromSmiles('Cc1nnnn1C', False), Chem.MolFromSmiles('CCc1ccc2nc(=O)c(cc2c1)Cc1nnnn1C1CCCCC1', False), Chem.MolFromSmiles('c1cnc2cc3ccnc3cc12', False), Chem.MolFromSmiles('c1cc2cc3ccnc3cc2n1', False), Chem.MolFromSmiles('O=c1ccnc(c1)-c1cnc2cc3ccnc3cc12', False), Chem.MolFromSmiles('O=c1ccnc(c1)-c1cc1', False), ) fine = fixed = broken = 0 for i,m in enumerate(ms): print '#---------------------' try: m.UpdatePropertyCache(False) cp = Chem.Mol(m.ToBinary()) Chem.SanitizeMol(cp) m = cp print 'fine:',Chem.MolToSmiles(m) fine += 1 except ValueError: nm=AdjustAromaticNs(m) if nm is not None: Chem.SanitizeMol(nm) print 'fixed:',Chem.MolToSmiles(nm) fixed += 1 else: print 'still broken:', i broken += 1 print "%d fine, %d fixed, %d still broken (%d total)" % (fine, fixed, broken, (fine+fixed+broken))
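The search in `_recursivelyModifyNs` can be pictured without RDKit as "pick candidate nitrogens until a validity check passes". The sketch below is a generic, hypothetical version of that idea in plain Python: `find_fix` and the `is_valid` callback are names invented here, with `is_valid` standing in for "the fragment sanitizes once these atoms carry an explicit H". One deliberate difference, as far as I can tell from reading the attached script: the script hands the same `matches` list down the recursion, so it appears to try only growing prefixes of the candidate list, whereas this sketch copies the remaining candidates at each level and so eventually tries every subset. That is an illustration choice, not a claim about what sanifix should do, and the worst case is exponential in the number of candidates.

```python
def find_fix(candidates, is_valid, chosen=()):
    """Return a tuple of candidates for which is_valid(...) is true, or None.

    candidates: items still available to add (e.g. aromatic N indices)
    is_valid:   callback taking the tuple of chosen items
    chosen:     items already committed on this branch of the search
    """
    remaining = list(candidates)  # copy, so sibling branches stay independent
    while remaining:
        nxt = remaining.pop(0)
        trial = chosen + (nxt,)
        if is_valid(trial):
            return trial
        # keep nxt and try adding later candidates on top of it
        found = find_fix(remaining, is_valid, trial)
        if found is not None:
            return found
        # otherwise drop nxt and move on to the next candidate
    return None


# Toy validity rule: the fix works only if atom 4 is chosen and atom 1 is not.
print(find_fix([1, 4, 6], lambda c: 4 in c and 1 not in c))  # (4,)
print(find_fix([1, 2], lambda c: False))                     # None
```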