Brian pointed out to me offline that the easiest way to avoid this problem is to let the PatternHolder calculate the fingerprints for you by calling its MakeFingerprint() method.
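Here is a minimal sketch of that suggestion, applied to the two molecules from Thomas's example below. The stored fingerprint comes from `fps.MakeFingerprint()`, so it has the same length as the fingerprints the library generates for queries. The wrapper name `library_has_match` is hypothetical and used only for illustration. The sketch assumes an RDKit build where `PatternHolder.MakeFingerprint()` is exposed to Python, as Brian describes. It also parses both SMILES with default sanitization instead of the `sanitize=False` plus `RemoveHs` combination in the original. Dropping the `fpSize=4096` argument from `Chem.PatternFingerprint()` should have the same effect, because the fingerprint then has the default length.

```python
from rdkit import Chem
from rdkit.Chem import rdSubstructLibrary


def library_has_match(fragment_smiles, target_smiles):
    """Hypothetical helper: screen one target molecule for a fragment."""
    target = Chem.MolFromSmiles(target_smiles)
    fragment = Chem.MolFromSmiles(fragment_smiles)

    mols = rdSubstructLibrary.CachedTrustedSmilesMolHolder()
    fps = rdSubstructLibrary.PatternHolder()

    # Store the target's SMILES for the final substructure check.
    mols.AddSmiles(Chem.MolToSmiles(target))
    # Let the PatternHolder build the pattern fingerprint itself, so the
    # stored fingerprint has the same length the library uses for queries.
    fps.AddFingerprint(fps.MakeFingerprint(target))

    library = rdSubstructLibrary.SubstructLibrary(mols, fps)
    return library.HasMatch(fragment, useChirality=False)


print(library_has_match('O=C(O)c1cccnc1', 'c1nccc(c1C(=O)O)-c2cc(Cl)ccc2'))
```

With the fingerprint built this way, `HasMatch` should agree with `mol2.HasSubstructMatch(mol1)` for this pair.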
-greg

On Mon, Aug 31, 2020 at 4:34 PM Greg Landrum <greg.land...@gmail.com> wrote:
> Hi Thomas,
>
> I agree that this is a much better place to ask a question than in the
> comments of my blog post. :-)
>
> The problem you're having here is that the PatternHolder() class assumes
> that the fingerprints being used have the default size, so you are storing
> fingerprints with 4096 bits, but when the SubstructLibrary generates a
> fingerprint for a query molecule it only generates a 2048-bit fingerprint.
> This causes the substructure screenout to fail.
> This is certainly a bug in the SubstructLibrary (it should, at the very
> least, generate an error when you try to do this), but it's easy enough to
> fix in your code: just stop specifying the length of the pattern
> fingerprints.
>
> Best,
> -greg
>
> On Mon, Aug 31, 2020 at 3:57 PM Thomas Evangelidis <teva...@gmail.com>
> wrote:
>
>> Greetings,
>>
>> Maybe I should have posted this query as a comment on Greg's blog post
>> (https://rdkit.blogspot.com/2018/02/introducing-substructlibrary.html),
>> but I am writing it here instead for greater visibility. I have many
>> active fragments against a protein target (validated by NMR), and I want
>> to screen a very large database for molecules containing those
>> fragments. Therefore I tried the SubstructLibrary for greater efficiency.
>> However, the results I get differ from those of a direct
>> PatternFingerprint comparison and of a substructure search using the Mol
>> object. Try this simple example:
>>
>> from rdkit import Chem, DataStructs
>> from rdkit.Chem import rdSubstructLibrary
>>
>> SMILES1 = 'O=C(O)c1cccnc1'
>> SMILES2 = 'c1nccc(c1C(=O)O)-c2cc(Cl)ccc2'
>> # Remove hydrogens; otherwise you will have to modify by hand the valence
>> # of the atoms in the fragment that can facilitate extension
>> mol1 = Chem.RemoveHs( Chem.MolFromSmiles(SMILES1, sanitize=False) )
>> mol2 = Chem.RemoveHs( Chem.MolFromSmiles(SMILES2, sanitize=False) )
>>
>> # AVENUE 1: Library
>> mols2 = rdSubstructLibrary.CachedTrustedSmilesMolHolder()
>> mols2.AddSmiles( Chem.MolToSmiles(mol2) )
>> fps = rdSubstructLibrary.PatternHolder()
>> fp2 = Chem.PatternFingerprint(mol2, fpSize=4096)
>> fps.AddFingerprint( fp2 )
>> library = rdSubstructLibrary.SubstructLibrary(mols2, fps)
>> print("SubstructLibrary:", library.HasMatch(mol1, useChirality=False) )
>>
>> # AVENUE 2: PatternFingerprint comparison
>> fp1 = Chem.PatternFingerprint(mol1, fpSize=4096)
>> print("PatternFingerprint:", DataStructs.AllProbeBitsMatch(fp1, fp2))
>>
>> # AVENUE 3: HasSubstructMatch
>> print("HasSubstructMatch:", mol2.HasSubstructMatch(mol1))
>>
>> I strip out the hydrogens from both molecules to avoid manually
>> modifying the atoms in the fragment (SMILES1 in this case) that can
>> facilitate linking or extension. What is wrong in this case, and why do
>> the results not agree? Am I not using SubstructLibrary correctly?
>>
>> Thank you in advance.
>> Thomas
>>
>> --
>>
>> ======================================================================
>>
>> Dr. Thomas Evangelidis
>>
>> Research Scientist
>>
>> IOCB - Institute of Organic Chemistry and Biochemistry of the Czech
>> Academy of Sciences <https://www.uochb.cz/web/structure/31.html?lang=en>,
>> Prague, Czech Republic
>> &
>> CEITEC - Central European Institute of Technology
>> <https://www.ceitec.eu/>, Brno, Czech Republic
>>
>> email: teva...@gmail.com, Twitter: tevangelidis
>> <https://twitter.com/tevangelidis>, LinkedIn: Thomas Evangelidis
>> <https://www.linkedin.com/in/thomas-evangelidis-495b45125/>
>>
>> website: https://sites.google.com/site/thomasevangelidishomepage/
>>
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
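Greg's explanation of why the screen fails (the library builds 2048-bit query fingerprints while the stored ones have 4096 bits) can be illustrated with a toy model. This sketch is not RDKit's actual fingerprint code. It assumes, as is typical of hashed fingerprints, that each substructure feature sets the bit at position `hash % n_bits`, and it uses hypothetical integer feature hashes. A screen that checks "every query bit is set in the target" then passes when both fingerprints have the same length. It can fail for a genuine substructure when the lengths differ, because the same feature lands on different bit positions.

```python
def fold(feature_hashes, n_bits):
    """Toy hashed fingerprint: the set of on-bit positions (hash modulo length)."""
    return {h % n_bits for h in feature_hashes}


def passes_screen(query_bits, target_bits):
    """Substructure screen: every bit set in the query must be set in the target."""
    return query_bits <= target_bits


# Hypothetical feature hashes; the query's features are a subset of the target's.
TARGET_FEATURES = [3000, 5, 7000, 1234]
QUERY_FEATURES = [3000, 5]

# Both fingerprints 2048 bits long: the screen passes.
same = passes_screen(fold(QUERY_FEATURES, 2048), fold(TARGET_FEATURES, 2048))
# 2048-bit query against a 4096-bit target: feature 3000 maps to bit 952 in the
# query but to bit 3000 in the target, so the screen wrongly rejects the target.
mixed = passes_screen(fold(QUERY_FEATURES, 2048), fold(TARGET_FEATURES, 4096))
print(same, mixed)  # prints: True False
```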