Dear Colleagues,

I recently came across some notes by Noel and Roger about the Sayle tautomer hash, a molecular hash intended to be insensitive to tautomerism:
https://baoilleach.blogspot.com/2018/01/implementing-sayle-tautomer-hash-with.html
https://nextmovesoftware.com/blog/2013/04/25/finding-all-types-of-every-mer/#footnote-1
https://nextmovesoftware.com/blog/2016/06/22/fishing-for-matched-series-in-a-sea-of-structure-representations/

Noel implemented it for Pybel (first link), and I would like to implement it for RDKit, but I do not get the same results as Noel. My code is below. What am I getting wrong?

Thanks a lot for your help. I will post the solution as a comment on Noel's blog post once we have one.

Cheers,
m

from rdkit import Chem
from rdkit.Chem import AllChem

def tautomerhash(smi):
    rdmol = Chem.MolFromSmiles(smi)
    m = Chem.RemoveHs(rdmol)
    formalcharges = 0
    hcount = 0
    for atom in m.GetAtoms():
        formalcharges += atom.GetFormalCharge()
        atom.SetFormalCharge(0)
        if atom.GetAtomicNum() != 6:  # non-carbon
            hcount += atom.GetNumImplicitHs()
            atom.SetNoImplicit(True)
        atom.SetIsAromatic(False)
    for bond in m.GetBonds():
        bond.SetBondType(Chem.rdchem.BondType.SINGLE)
        bond.SetIsAromatic(False)
    #rmol.SetAromaticPerceived()  # no point triggering perception
    s = Chem.MolToSmiles(m, canonical=True)
    o = "%s_%d" % (s, hcount - formalcharges)
    #print(hcount, formalcharges, hcount - formalcharges, s)
    return o

if __name__ == "__main__":
    smis = ["*c1c(c(C(=N)O)cc2nc([nH]c12)C(=O)[O-])N(=O)=O",
            "*c1c(c(C(=O)N)cc2[nH]c(nc12)C(=O)O)[N+](=O)[O-]"]
    #smis = ['N#[O+]', 'O=[N]']
    for smi in smis:
        print(smi, tautomerhash(smi))
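A minimal sketch of one possible fix follows, assuming the goal is the recipe the code above already encodes (zero all charges, count and strip H on non-carbon atoms, keep H on carbon, make every bond single and non-aromatic). Two details look likely to explain the mismatch. First, RDKit stores the H on a bracket atom such as [nH] as an explicit H count: GetNumImplicitHs() does not include it and SetNoImplicit(True) does not clear it, so it is missed in hcount yet left on the atom. Second, changing bond types does not refresh RDKit's cached valences, so MolToSmiles may decide H counts and brackets from the original bonding. The sketch therefore uses GetTotalNumHs(), zeroes the explicit H count on non-carbons, stores carbon Hs as explicit Hs and calls UpdatePropertyCache(strict=False) before writing the SMILES. The bond-stereo reset is an extra precaution of this sketch, and whether the output matches Noel's hashes exactly is an assumption.

```python
from rdkit import Chem


def tautomerhash(smi):
    """Sketch of a Sayle-style tautomer hash in RDKit.

    Assumed recipe (mirroring the OpenBabel version): zero all formal
    charges, count and strip H on non-carbon atoms, keep carbon H counts,
    make every bond single and non-aromatic, then append
    (hydrogens removed - net charge) to the canonical SMILES.
    """
    mol = Chem.MolFromSmiles(smi)
    if mol is None:
        raise ValueError("could not parse SMILES: %s" % smi)

    charge = 0
    hcount = 0
    for atom in mol.GetAtoms():
        # Read the H count while the cached valences still match the input.
        # GetTotalNumHs() includes explicit Hs such as the one on [nH],
        # which GetNumImplicitHs() does not report.
        nhs = atom.GetTotalNumHs()
        charge += atom.GetFormalCharge()
        atom.SetFormalCharge(0)
        if atom.GetAtomicNum() == 6:
            # Freeze carbon Hs as explicit Hs so they are not recomputed
            # from the new all-single-bond valence.
            atom.SetNumExplicitHs(nhs)
        else:
            hcount += nhs
            atom.SetNumExplicitHs(0)
        atom.SetNoImplicit(True)
        atom.SetIsAromatic(False)

    for bond in mol.GetBonds():
        bond.SetBondType(Chem.BondType.SINGLE)
        bond.SetIsAromatic(False)
        # Single bonds carry no E/Z information; drop any leftover stereo flag.
        bond.SetStereo(Chem.BondStereo.STEREONONE)

    # Recompute cached valences and H counts for the modified graph;
    # strict=False avoids raising on atoms with unusual valences.
    mol.UpdatePropertyCache(strict=False)
    return "%s_%d" % (Chem.MolToSmiles(mol), hcount - charge)


if __name__ == "__main__":
    smis = ["*c1c(c(C(=N)O)cc2nc([nH]c12)C(=O)[O-])N(=O)=O",
            "*c1c(c(C(=O)N)cc2[nH]c(nc12)C(=O)O)[N+](=O)[O-]"]
    for smi in smis:
        print(smi, tautomerhash(smi))
```

Keeping the carbon H counts follows the non-carbon branch of the original code: hydrogens that can move between heteroatoms are pooled into the trailing number, while H on carbon stays part of the skeleton string.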
_______________________________________________ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss