Brian pointed out to me offline that the easiest way to avoid this problem
is to let the PatternHolder calculate the fingerprints for you by calling
its MakeFingerprint() method.
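
A minimal sketch of that approach, reusing the two SMILES from Thomas's example. It assumes the molecules are sanitized normally (no sanitize=False), and the variable names are only illustrative:

```python
from rdkit import Chem
from rdkit.Chem import rdSubstructLibrary

# Fragment (query) and database molecule (target) from the original question.
query = Chem.MolFromSmiles('O=C(O)c1cccnc1')
target = Chem.MolFromSmiles('c1nccc(c1C(=O)O)-c2cc(Cl)ccc2')

mols = rdSubstructLibrary.CachedTrustedSmilesMolHolder()
fps = rdSubstructLibrary.PatternHolder()

# Let the holder build the fingerprint, so its length always matches the
# length the library later uses for query fingerprints.
mols.AddSmiles(Chem.MolToSmiles(target))
fps.AddFingerprint(fps.MakeFingerprint(target))

library = rdSubstructLibrary.SubstructLibrary(mols, fps)

has_match = library.HasMatch(query)        # screened search via the library
direct = target.HasSubstructMatch(query)   # plain substructure search
print("SubstructLibrary:", has_match)
print("HasSubstructMatch:", direct)
```

With the fingerprint sizes consistent, the library result should agree with the direct substructure search.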

-greg



On Mon, Aug 31, 2020 at 4:34 PM Greg Landrum <greg.land...@gmail.com> wrote:

> Hi Thomas,
>
> I agree that this is a much better place to ask a question than in the
> comments of my blog post. :-)
>
> The problem you're having here is that the PatternHolder() class assumes
> that the fingerprints being used have the default size, so you are storing
> fingerprints with 4096 bits, but when the SubstructLibrary generates a
> fingerprint for a query molecule it only generates a 2048-bit fingerprint.
> This causes the substructure screenout to fail.
> This is certainly a bug in the SubstructLibrary (it should, at the very
> least, generate an error when you try to do this), but it's easy enough to
> fix in your code: just stop specifying the length of the pattern
> fingerprints.
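
For illustration, a hedged sketch of the mismatch and of the suggested fix. The add_checked helper is hypothetical (it is not part of RDKit); it only compares a stored fingerprint's length with the length the holder itself produces via MakeFingerprint():

```python
from rdkit import Chem
from rdkit.Chem import rdSubstructLibrary

target = Chem.MolFromSmiles('c1nccc(c1C(=O)O)-c2cc(Cl)ccc2')
fps = rdSubstructLibrary.PatternHolder()

# Length of the fingerprints the holder generates itself.
holder_bits = fps.MakeFingerprint(target).GetNumBits()

fp_default = Chem.PatternFingerprint(target)             # no fpSize given
fp_large = Chem.PatternFingerprint(target, fpSize=4096)  # as in the question


def add_checked(holder, fp, expected_bits):
    """Hypothetical helper: refuse fingerprints of an unexpected length."""
    if fp.GetNumBits() != expected_bits:
        raise ValueError(
            f"fingerprint has {fp.GetNumBits()} bits, expected {expected_bits}"
        )
    holder.AddFingerprint(fp)


add_checked(fps, fp_default, holder_bits)  # default size: accepted

try:
    add_checked(fps, fp_large, holder_bits)  # 4096 bits: rejected
    rejected = False
except ValueError as err:
    rejected = True
    print("rejected:", err)
```

A check like this only guards your own code; it does not change how the SubstructLibrary behaves.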
>
> Best,
> -greg
>
>
>
> On Mon, Aug 31, 2020 at 3:57 PM Thomas Evangelidis <teva...@gmail.com>
> wrote:
>
>> Greetings,
>>
>> Maybe I should have posted this query as a comment on Greg's blog post
>> (https://rdkit.blogspot.com/2018/02/introducing-substructlibrary.html),
>> but I am writing here instead for greater visibility. I have many active
>> fragments against a protein target (validated by NMR) and I want to screen
>> a very large database for molecules containing those fragments. I therefore
>> tried the SubstructLibrary for greater efficiency. However, the results
>> I get differ from both a direct PatternFingerprint comparison and a
>> substructure search using the Mol object. Try the simple example below:
>>
>> from rdkit import Chem, DataStructs
>> from rdkit.Chem import rdSubstructLibrary
>>
>> SMILES1 = 'O=C(O)c1cccnc1'
>> SMILES2 = 'c1nccc(c1C(=O)O)-c2cc(Cl)ccc2'
>> # Remove hydrogens, otherwise you will have to modify the valence of the
>> # atoms in the fragment that can facilitate extension by hand
>> mol1 = Chem.RemoveHs( Chem.MolFromSmiles(SMILES1, sanitize=False) )
>> mol2 = Chem.RemoveHs( Chem.MolFromSmiles(SMILES2, sanitize=False) )
>>
>> # AVENUE 1: Library
>> mols2 = rdSubstructLibrary.CachedTrustedSmilesMolHolder()
>> mols2.AddSmiles( Chem.MolToSmiles(mol2) )
>> fps = rdSubstructLibrary.PatternHolder()
>> fp2 = Chem.PatternFingerprint(mol2, fpSize=4096)
>> fps.AddFingerprint( fp2 )
>> library = rdSubstructLibrary.SubstructLibrary(mols2, fps)
>> print("SubstructLibrary:", library.HasMatch(mol1, useChirality=False) )
>>
>> # AVENUE 2: PatternFingerprint comparison
>> fp1 = Chem.PatternFingerprint(mol1, fpSize=4096)
>> print("PatternFingerprint:", DataStructs.AllProbeBitsMatch(fp1, fp2))
>>
>> # AVENUE 3: HasSubstructMatch
>> print("HasSubstructMatch:", mol2.HasSubstructMatch(mol1))
>>
>>
>> I strip the hydrogens from both molecules to avoid manually modifying
>> the atoms in the fragment (SMILES1 in this case) that can facilitate
>> linking or extension. Why do the results disagree in this case? Am I
>> using SubstructLibrary incorrectly?
>>
>> I thank you in advance.
>> Thomas
>>
>> --
>>
>> ======================================================================
>>
>> Dr. Thomas Evangelidis
>>
>> Research Scientist
>>
>> IOCB - Institute of Organic Chemistry and Biochemistry of the Czech
>> Academy of Sciences <https://www.uochb.cz/web/structure/31.html?lang=en>
>> , Prague, Czech Republic
>>   &
>> CEITEC - Central European Institute of Technology
>> <https://www.ceitec.eu/>, Brno, Czech Republic
>>
>> email: teva...@gmail.com, Twitter: tevangelidis
>> <https://twitter.com/tevangelidis>, LinkedIn: Thomas Evangelidis
>> <https://www.linkedin.com/in/thomas-evangelidis-495b45125/>
>>
>> website: https://sites.google.com/site/thomasevangelidishomepage/
>>
>>
>>
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>