Dear Colleagues,
I recently came across some notes from Noel and Roger about molecular
hashing, in particular the tautomer hash:

https://baoilleach.blogspot.com/2018/01/implementing-sayle-tautomer-hash-with.html
https://nextmovesoftware.com/blog/2013/04/25/finding-all-types-of-every-mer/#footnote-1
https://nextmovesoftware.com/blog/2016/06/22/fishing-for-matched-series-in-a-sea-of-structure-representations/

Noel implemented it for pybel (first link), and I would like to implement it
for RDKit, but I do not get the same results as Noel. My code is below.
What am I getting wrong?

Thanks a lot for your help. I will post the solution as a comment to Noel's
blog post once we have one.

Cheers,
m


from rdkit import Chem
from rdkit.Chem import AllChem

def tautomerhash(smi):
    rdmol = Chem.MolFromSmiles(smi)
    m = Chem.RemoveHs(rdmol)
    formalcharges = 0
    hcount = 0

    for atom in m.GetAtoms():
        formalcharges += atom.GetFormalCharge()
        atom.SetFormalCharge(0)
        if atom.GetAtomicNum() != 6: # non-carbon
            hcount += atom.GetNumImplicitHs()
        atom.SetNoImplicit(True)
        atom.SetIsAromatic(False)

    for bond in m.GetBonds():
        bond.SetBondType(Chem.rdchem.BondType.SINGLE)
        bond.SetIsAromatic(False)

    #rmol.SetAromaticPerceived() # no point triggering perception
    s = Chem.MolToSmiles(m,canonical=True)

    o = "%s_%d" % (s, hcount-formalcharges)
    #print (hcount, formalcharges, hcount-formalcharges, s)
    return o


if __name__ == "__main__":
    smis = ["*c1c(c(C(=N)O)cc2nc([nH]c12)C(=O)[O-])N(=O)=O",
            "*c1c(c(C(=O)N)cc2[nH]c(nc12)C(=O)O)[N+](=O)[O-]"]


    #smis = ['N#[O+]', 'O=[N]']
    for smi in smis:
        print(smi, tautomerhash(smi))
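
As a sanity check on the suffix arithmetic (hydrogens on non-carbon atoms minus the net formal charge), the two test SMILES can be tallied by hand. The sketch below does not use RDKit: the atom tables are hand-transcribed from the SMILES strings and are therefore an assumption, and the helper name hash_suffix is hypothetical. All carbons in these two molecules are neutral, so only the non-carbon atoms are listed.

```python
# Each entry is (element, formal_charge, attached_hydrogens) for one
# non-carbon atom. Tables are transcribed by hand from the SMILES strings
# (an assumption, not RDKit output).

# *c1c(c(C(=N)O)cc2nc([nH]c12)C(=O)[O-])N(=O)=O
TAUTOMER_A = [
    ("*", 0, 0),   # attachment point
    ("N", 0, 1),   # imine =NH
    ("O", 0, 1),   # hydroxyl OH
    ("N", 0, 0),   # ring n
    ("N", 0, 1),   # ring [nH]
    ("O", 0, 0),   # carboxylate C=O
    ("O", -1, 0),  # carboxylate [O-]
    ("N", 0, 0),   # nitro N, as written
    ("O", 0, 0),   # nitro =O
    ("O", 0, 0),   # nitro =O
]

# *c1c(c(C(=O)N)cc2[nH]c(nc12)C(=O)O)[N+](=O)[O-]
TAUTOMER_B = [
    ("*", 0, 0),   # attachment point
    ("O", 0, 0),   # amide C=O
    ("N", 0, 2),   # amide NH2
    ("N", 0, 1),   # ring [nH]
    ("N", 0, 0),   # ring n
    ("O", 0, 0),   # acid C=O
    ("O", 0, 1),   # acid OH
    ("N", 1, 0),   # nitro [N+]
    ("O", 0, 0),   # nitro =O
    ("O", -1, 0),  # nitro [O-]
]


def hash_suffix(hetero_atoms):
    """Return hcount - formalcharges for a hand-made atom table."""
    hcount = sum(h for _, _, h in hetero_atoms)
    formalcharges = sum(q for _, q, _ in hetero_atoms)
    return hcount - formalcharges


if __name__ == "__main__":
    print(hash_suffix(TAUTOMER_A), hash_suffix(TAUTOMER_B))
```

If the hand count is right, both tautomers should end with the same suffix, so it may be worth comparing these numbers against the hcount and formalcharges values printed by the commented-out debug line. Note that the [nH] hydrogen is written inside brackets in the input, which is one place where the tally could plausibly go astray.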
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
