This is my third attempt to get an explanation of how atom invariants work in the ECFP fingerprint, because I can't find one anywhere in the documentation. I tried the generateAtomInvariant() function [see below], and the resulting ECFP bit vectors for the same molecules had drastically fewer varying bits: 2360 variant bits without invariants versus 795 with the invariants. Surprisingly, ECFP with invariants performed better on this dataset in terms of affinity ranking. Can someone please explain what happens when I pass invariants to the AllChem.GetMorganFingerprint() function? I hope that I will get an answer this time.
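To make the question concrete, here is a rough sketch of the picture I would expect, written in plain Python so it runs without RDKit. It assumes the general Morgan/ECFP idea: each atom starts from an integer identifier (the invariant), and at every radius the identifier is re-hashed together with the sorted (bond, neighbor identifier) pairs around it. The names build_neighbors and morgan_like_counts, the bond codes, and the hashing scheme are all made up for illustration. This is not RDKit's implementation, and it skips details such as removing duplicate environments.

```python
from collections import Counter


def build_neighbors(num_atoms, bonds):
    """Turn (atom_a, atom_b, bond_code) triples into an adjacency map."""
    neighbors = {i: [] for i in range(num_atoms)}
    for a, b, code in bonds:
        neighbors[a].append((code, b))
        neighbors[b].append((code, a))
    return neighbors


def morgan_like_counts(neighbors, invariants, radius):
    """Toy Morgan-style fingerprint (hypothetical helper, not RDKit code).

    The supplied invariants are the radius-0 identifiers. Each further
    layer hashes an atom's current identifier with its neighbors' ones.
    Returns a Counter mapping identifier -> number of occurrences.
    """
    ids = list(invariants)
    counts = Counter(ids)
    for layer in range(1, radius + 1):
        new_ids = []
        for atom in range(len(ids)):
            # Sort so the result does not depend on neighbor ordering.
            env = tuple(sorted((code, ids[nbr]) for code, nbr in neighbors[atom]))
            new_ids.append(hash((layer, ids[atom], env)) & 0xFFFFFFFF)
        ids = new_ids
        counts.update(ids)
    return counts


# Methyl group (atom 0) on a six-membered aromatic ring (atoms 1-6).
# Bond codes are arbitrary labels for this sketch: 1 = single, 4 = aromatic.
bonds = [(0, 1, 1)] + [(i, i % 6 + 1, 4) for i in range(1, 7)]
graph = build_neighbors(7, bonds)

toluene_z = [6, 6, 6, 6, 6, 6, 6]    # Cc1ccccc1, atomic numbers as invariants
picoline_z = [6, 6, 7, 6, 6, 6, 6]   # Cc1ncccc1, atomic numbers as invariants

# Element-aware invariants: the nitrogen changes the starting identifiers.
fp_tol = morgan_like_counts(graph, toluene_z, 3)
fp_pic = morgan_like_counts(graph, picoline_z, 3)

# Constant invariants: only the connectivity is left to distinguish atoms.
fp_tol_const = morgan_like_counts(graph, [1] * 7, 3)
fp_pic_const = morgan_like_counts(graph, [1] * 7, 3)

print("element-aware fingerprints equal:", fp_tol == fp_pic)
print("constant-invariant fingerprints equal:", fp_tol_const == fp_pic_const)
print("distinct identifiers:", len(fp_pic), "vs", len(fp_pic_const))
```

If this picture is roughly right, the invariants only decide which atoms count as "the same" at radius 0, and everything at larger radii follows from that. Coarser invariants would then merge more environments into the same identifier, which might be related to the smaller number of variant bits I see, but I would like to have that confirmed.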
>> def generateAtomInvariant(mol):
>>     """
>>     >>> generateAtomInvariant(Chem.MolFromSmiles("Cc1ncccc1"))
>>     [341294046, 3184205312, 522345510, 1545984525, 1545984525, 1545984525,
>>     1545984525]
>>     """
>>     num_atoms = mol.GetNumAtoms()
>>     invariants = [0]*num_atoms
>>     for i,a in enumerate(mol.GetAtoms()):
>>         descriptors=[]
>>         descriptors.append(a.GetAtomicNum())
>>         descriptors.append(a.GetTotalDegree())
>>         descriptors.append(a.GetTotalNumHs())
>>         descriptors.append(a.IsInRing())
>>         descriptors.append(a.GetIsAromatic())
>>         invariants[i]=hash(tuple(descriptors))& 0xffffffff
>>     return invariants
>>
>>
>> And then generate the fingerprint like this:
>>
>>
>> fp = AllChem.GetMorganFingerprint(mol, radius=3,
>>                                   invariants=generateAtomInvariant(mol))
>>
>>
--
======================================================================
Dr. Thomas Evangelidis
Research Scientist
IOCB - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences <https://www.uochb.cz/web/structure/31.html?lang=en>, Prague, Czech Republic
&
CEITEC - Central European Institute of Technology <https://www.ceitec.eu/>, Brno, Czech Republic
email: teva...@gmail.com, Twitter: tevangelidis <https://twitter.com/tevangelidis>, LinkedIn: Thomas Evangelidis <https://www.linkedin.com/in/thomas-evangelidis-495b45125/>
website: https://sites.google.com/site/thomasevangelidishomepage/
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss