Hi Alexis,

If you go down that route and calculate artificial skeletons, you could
also go all the way and use an algorithm like HierS [1] or the scaffold
tree [2] to recursively fragment your queries and molecules into their
individual rings and ring systems. If a query contains a ring system
that is not present in the molecule, the query cannot be a substructure
of that molecule.

This is something you should be able to check with basic string matching
or lookups in dictionaries / hashes instead of doing fingerprint
calculations and comparisons.
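
A minimal sketch of that lookup step, in plain Python. It assumes the ring
and ring-system strings have already been computed elsewhere (for example
with a HierS or scaffold tree implementation) and canonicalised the same way
for queries and molecules. The names `may_be_substructure` and
`candidate_pairs` and the example strings are made up for illustration and
are not part of RDKit or of either paper:

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical precomputed index: key = original SMILES/SMARTS string,
# value = set of canonical ring / ring-system strings from the recursive
# fragmentation. Producing these sets is NOT shown here.
RingIndex = Dict[str, FrozenSet[str]]


def may_be_substructure(query_rings: FrozenSet[str],
                        mol_rings: FrozenSet[str]) -> bool:
    """Return False if the query has a ring system the molecule lacks.

    True only means "cannot be ruled out"; it is a filter, not a proof.
    """
    return query_rings <= mol_rings


def candidate_pairs(queries: RingIndex,
                    molecules: RingIndex) -> List[Tuple[str, str]]:
    """List the (query, molecule) pairs that survive the ring filter."""
    pairs = []
    for query, query_rings in queries.items():
        for mol, mol_rings in molecules.items():
            if may_be_substructure(query_rings, mol_rings):
                pairs.append((query, mol))
    return pairs


# Illustrative, hand-written data (assumed, not computed by a real tool).
queries: RingIndex = {
    "Oc1ccccc1": frozenset({"c1ccccc1"}),
    "c1ccncc1": frozenset({"c1ccncc1"}),
    "CCO": frozenset(),
}
molecules: RingIndex = {
    "Oc1ccc2ccccc2c1": frozenset({"c1ccc2ccccc2c1", "c1ccccc1"}),
    "CCN": frozenset(),
}

survivors = candidate_pairs(queries, molecules)
```

Only the pairs in `survivors` would then go on to the expensive
`HasSubstructMatch` call. Note that a query without any rings passes the
filter for every molecule, so this only helps for ring-containing queries.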

Not sure if that is actually faster, but might be worth a try.

Hope this helps,
Nils

[1] https://pubs.acs.org/doi/abs/10.1021/jm049032d
[2] https://pubs.acs.org/doi/10.1021/ci600338x

On 10.02.2020 at 21:01, Alexis Parenty wrote:
> Hi Maciek, thanks for your response. I did try that function too, but it
> also takes SMILES only (not SMARTS). I think Gregori's solution is
> very interesting: I am going to transform all SMILES and SMARTS into
> their single-bonded, carbon-based skeletons and store the pattern
> fingerprint of each skeleton in a dictionary, using the SMARTS or the
> SMILES as the key. Then I will use your proposed function to match the
> sub-skeletons against the skeletons, and will only run the expensive
> molecular graph substructure search on the keys whose fingerprints
> have been identified as potential substructures of others.
> Thanks Gregori!
> Any other good tips?
> Cheers,
> Alexis
> 
> On Mon, 10 Feb 2020 at 20:33, Maciek Wójcikowski <mac...@wojcikowski.pl
> <mailto:mac...@wojcikowski.pl>> wrote:
> 
>     Alexis,
> 
>     I believe that `DataStructs.AllProbeBitsMatch(query_fp, mol_fp)` is
>     the function you are looking for here. You can find more advanced
>     usage and code snippets in the RDKit blog post that Greg has put
>     together here:
>     https://rdkit.blogspot.com/2013/11/fingerprint-based-substructure.html
> 
>     Best,
>     Maciek
> 
>     ----
>     Best regards,
>     Maciek Wójcikowski
>     mac...@wojcikowski.pl <mailto:mac...@wojcikowski.pl>
> 
> 
>     On Mon, 10 Feb 2020 at 16:10, Alexis Parenty
>     <alexis.parenty.h...@gmail.com
>     <mailto:alexis.parenty.h...@gmail.com>> wrote:
> 
>         Dear Rdkiters,
> 
>         I am interested in doing substructure searches between many
>         thousands of structures and many thousands of fragments, as
>         quickly as possible, with reasonable accuracy (> 0.95)...
> 
>         I did read Greg's excellent post on that subject:
> 
>         
> http://rdkit.blogspot.com/2019/07/a-couple-of-substructure-search-topics.html
> 
>         I was using the rdkit pattern fingerprint approach to filter out
>         any fragments that have no chance of matching the bigger
>         structure through the slow and more accurate molecular graph
>         approach, saving a lot of time.
> 
>         However, I realized that this RDKit pattern fingerprint approach
>         only works well if we compare SMILES with SMILES:
> 
>          
> 
>         from rdkit import Chem
> 
>         def frag_is_a_substructure_of_structure_via_pfp(frag, smiles):
>             # Screen only: True means "may match", False means "cannot match".
>             pfp_frag = Chem.PatternFingerprint(Chem.MolFromSmiles(frag))
>             pfp_structure = Chem.PatternFingerprint(Chem.MolFromSmiles(smiles))
> 
>             frag_bits = set(pfp_frag.GetOnBits())
>             structure_bits = set(pfp_structure.GetOnBits())
> 
>             # Every bit set for the fragment must also be set for the structure.
>             return frag_bits.issubset(structure_bits)
> 
>          
> 
>         Unfortunately, some of my fragments are SMARTS that are not
>         valid SMILES. Using Chem.MolFromSmarts(smarts) gives really poor
>         results (many false positives, leading to poor specificity).
>         Interestingly, there are no false negatives, leading to a
>         sensitivity of 1!
> 
>          
> 
>         def frag_is_a_substructure_of_structure_via_pfp(frag, smiles):
>             # Same screen as above, but the fragment is parsed as SMARTS.
>             pfp_frag = Chem.PatternFingerprint(Chem.MolFromSmarts(frag))
>             pfp_structure = Chem.PatternFingerprint(Chem.MolFromSmiles(smiles))
> 
>             frag_bits = set(pfp_frag.GetOnBits())
>             structure_bits = set(pfp_structure.GetOnBits())
> 
>             # Every bit set for the fragment must also be set for the structure.
>             return frag_bits.issubset(structure_bits)
> 
>          
> 
>         Is there a way to use pattern fingerprint (or other method) for
>         substructure searches independently of the Smiles/Smarts format
>         of the fragments? If not, is
>         mol_struct.HasSubstructMatch(mol_frag) the only way I am left with?
> 
>         Many thanks,
> 
>         Alexis
> 
>         _______________________________________________
>         Rdkit-discuss mailing list
>         Rdkit-discuss@lists.sourceforge.net
>         <mailto:Rdkit-discuss@lists.sourceforge.net>
>         https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
> 
> 
> 
> 


