Greetings,

Maybe I should have posted this query as a comment on Greg's blog post
(https://rdkit.blogspot.com/2018/02/introducing-substructlibrary.html), but
I am writing it here instead for greater visibility. I have many active
fragments against a protein target (validated by NMR), and I want to screen
a very large database for molecules containing those fragments. For
efficiency, I therefore tried the SubstructLibrary. However, the results I
get differ from those of a direct PatternFingerprint comparison and of a
substructure search on the Mol object. Try this simple example:

from rdkit import Chem, DataStructs
from rdkit.Chem import rdSubstructLibrary

SMILES1 = 'O=C(O)c1cccnc1'
SMILES2 = 'c1nccc(c1C(=O)O)-c2cc(Cl)ccc2'
# Remove hydrogens; otherwise the valence of the fragment atoms that can
# facilitate extension would have to be modified by hand
mol1 = Chem.RemoveHs(Chem.MolFromSmiles(SMILES1, sanitize=False))
mol2 = Chem.RemoveHs(Chem.MolFromSmiles(SMILES2, sanitize=False))

# AVENUE 1: SubstructLibrary with a precomputed 4096-bit pattern fingerprint
mols2 = rdSubstructLibrary.CachedTrustedSmilesMolHolder()
mols2.AddSmiles(Chem.MolToSmiles(mol2))
fps = rdSubstructLibrary.PatternHolder()
fp2 = Chem.PatternFingerprint(mol2, fpSize=4096)
fps.AddFingerprint(fp2)
library = rdSubstructLibrary.SubstructLibrary(mols2, fps)
print("SubstructLibrary:", library.HasMatch(mol1, useChirality=False))

# AVENUE 2: PatternFingerprint comparison
fp1 = Chem.PatternFingerprint(mol1, fpSize=4096)
print("PatternFingerprint:", DataStructs.AllProbeBitsMatch(fp1, fp2))

# AVENUE 3: HasSubstructMatch
print("HasSubstructMatch:", mol2.HasSubstructMatch(mol1))


I strip out the hydrogens from both molecules to avoid having to modify by
hand the fragment atoms (SMILES1 in this case) that can facilitate linking or
extension. What is wrong here, and why do the results not agree? Am I using
SubstructLibrary incorrectly?
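For comparison, here is a minimal sketch of the screening workflow I have in mind, written so that the library generates its own pattern fingerprints through AddMol instead of receiving precomputed 4096-bit fingerprints via AddFingerprint. That this makes the stored and query fingerprints consistent is an assumption on my part, not something I have confirmed. The fragment and database lists are hypothetical placeholders, and the library hits are cross-checked against a brute-force HasSubstructMatch loop.

```python
from rdkit import Chem
from rdkit.Chem import rdSubstructLibrary

# Hypothetical inputs: fragment queries and a tiny stand-in "database"
fragment_smiles = ['O=C(O)c1cccnc1']
database_smiles = ['c1nccc(c1C(=O)O)-c2cc(Cl)ccc2', 'c1ccccc1']

fragments = {smi: Chem.MolFromSmiles(smi) for smi in fragment_smiles}
database = [Chem.MolFromSmiles(smi) for smi in database_smiles]

# Start with empty holders; AddMol fills both the molecule holder and the
# pattern-fingerprint holder, so the library computes the fingerprints itself
library = rdSubstructLibrary.SubstructLibrary(
    rdSubstructLibrary.CachedTrustedSmilesMolHolder(),
    rdSubstructLibrary.PatternHolder())
for mol in database:
    library.AddMol(mol)

# Library search: fingerprint screen followed by substructure matching
results = {smi: sorted(library.GetMatches(frag, useChirality=False))
           for smi, frag in fragments.items()}

# Brute-force reference: plain HasSubstructMatch on every database molecule
brute_force = {smi: [i for i, mol in enumerate(database)
                     if mol.HasSubstructMatch(frag)]
               for smi, frag in fragments.items()}

for smi in fragment_smiles:
    print(smi, results[smi], brute_force[smi])
```

The brute-force dictionary is only a cross-check; on a very large database it defeats the purpose of using the library.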

Thank you in advance.
Thomas

-- 

======================================================================

Dr. Thomas Evangelidis

Research Scientist

IOCB - Institute of Organic Chemistry and Biochemistry of the Czech Academy
of Sciences <https://www.uochb.cz/web/structure/31.html?lang=en>, Prague,
Czech Republic
  &
CEITEC - Central European Institute of Technology
<https://www.ceitec.eu/>, Brno,
Czech Republic

email: teva...@gmail.com, Twitter: tevangelidis
<https://twitter.com/tevangelidis>, LinkedIn: Thomas Evangelidis
<https://www.linkedin.com/in/thomas-evangelidis-495b45125/>

website: https://sites.google.com/site/thomasevangelidishomepage/
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss