Greetings,

Maybe I should have posted this query as a comment on Greg's blog post (https://rdkit.blogspot.com/2018/02/introducing-substructlibrary.html), but I am writing here instead for greater visibility. I have many fragments that are active against a protein target (validated by NMR), and I want to screen a very large database for molecules containing those fragments. I therefore tried SubstructLibrary for greater efficiency. However, the results I get differ from a direct PatternFingerprint comparison and from a substructure search on the Mol objects. Try the simple example below:

from rdkit import Chem, DataStructs
from rdkit.Chem import rdSubstructLibrary

SMILES1 = 'O=C(O)c1cccnc1'
SMILES2 = 'c1nccc(c1C(=O)O)-c2cc(Cl)ccc2'

# Remove hydrogens, otherwise you will have to modify by hand the valence of
# the fragment atoms that can facilitate extension
mol1 = Chem.RemoveHs(Chem.MolFromSmiles(SMILES1, sanitize=False))
mol2 = Chem.RemoveHs(Chem.MolFromSmiles(SMILES2, sanitize=False))

# AVENUE 1: Library
mols2 = rdSubstructLibrary.CachedTrustedSmilesMolHolder()
mols2.AddSmiles(Chem.MolToSmiles(mol2))
fps = rdSubstructLibrary.PatternHolder()
fp2 = Chem.PatternFingerprint(mol2, fpSize=4096)
fps.AddFingerprint(fp2)
library = rdSubstructLibrary.SubstructLibrary(mols2, fps)
print("SubstructLibrary:", library.HasMatch(mol1, useChirality=False))

# AVENUE 2: PatternFingerprint comparison
fp1 = Chem.PatternFingerprint(mol1, fpSize=4096)
print("PatternFingerprint:", DataStructs.AllProbeBitsMatch(fp1, fp2))

# AVENUE 3: HasSubstructMatch
print("HasSubstructMatch:", mol2.HasSubstructMatch(mol1))

I strip the hydrogens from both molecules so that I do not have to manually modify the fragment atoms (SMILES1 in this case) that can serve as linking or extension points. What is wrong here, such that the results do not agree? Am I using SubstructLibrary incorrectly?

Thank you in advance.
Thomas

--
======================================================================
Dr. Thomas Evangelidis
Research Scientist
IOCB - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences <https://www.uochb.cz/web/structure/31.html?lang=en>, Prague, Czech Republic
&
CEITEC - Central European Institute of Technology <https://www.ceitec.eu/>, Brno, Czech Republic

email: teva...@gmail.com, Twitter: tevangelidis <https://twitter.com/tevangelidis>, LinkedIn: Thomas Evangelidis <https://www.linkedin.com/in/thomas-evangelidis-495b45125/>
website: https://sites.google.com/site/thomasevangelidishomepage/
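
For readers unfamiliar with the screen used in Avenue 2 above, here is a minimal sketch in plain Python of what an "all probe bits match" test checks. It is only a toy model: the fingerprints are Python integers used as bitsets, the bit positions are made up, and the helpers `bits` and `all_probe_bits_match` are hypothetical names, not RDKit functions. A passing screen only says the fragment may be a substructure; an atom-by-atom match (as in Avenue 3) is still needed to confirm it.

```python
# Toy model of a fingerprint screen. Fingerprints are Python ints used as
# bitsets; these are NOT RDKit fingerprints and the bit positions are invented.

def bits(*positions: int) -> int:
    """Build a toy fingerprint with the given bit positions set."""
    value = 0
    for position in positions:
        value |= 1 << position
    return value


def all_probe_bits_match(probe: int, reference: int) -> bool:
    """Return True when every bit set in `probe` is also set in `reference`."""
    return (probe & reference) == probe


# Hypothetical fingerprints for a fragment and two candidate molecules.
fragment_fp = bits(3, 17, 42)
molecule_fp = bits(3, 8, 17, 42, 99)   # contains all fragment bits
unrelated_fp = bits(3, 8, 99)          # lacks bits 17 and 42

# The screen passes for the first candidate and rejects the second.
print("molecule passes screen:", all_probe_bits_match(fragment_fp, molecule_fp))
print("unrelated passes screen:", all_probe_bits_match(fragment_fp, unrelated_fp))
```

The comparison is only meaningful when both fingerprints were generated with the same settings, since a bit position then refers to the same feature in the probe and in the reference.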

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss