Thanks to Nils for pointing the list to both algorithms. Interestingly, Greg is
putting together a scaffold tree algorithm in this PR:
https://github.com/rdkit/rdkit/pull/2911
so anyone should be able to try it in the near future, hopefully in the 2020
release.
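
For what it's worth, the ring-system check Nils describes below can be sketched in plain Python, without any fingerprints. This is only an illustration under assumptions: the ring-system keys are hand-written SMILES strings standing in for the output of a HierS or scaffold tree fragmentation (nothing is computed here), and `build_ring_index` and `candidate_molecules` are hypothetical helper names, not RDKit functions.

```python
from collections import defaultdict


def build_ring_index(mol_ring_keys):
    """Map each ring-system key to the set of molecule ids that contain it."""
    index = defaultdict(set)
    for mol_id, keys in mol_ring_keys.items():
        for key in keys:
            index[key].add(mol_id)
    return index


def candidate_molecules(query_keys, index, all_ids):
    """Return ids of molecules that contain every ring system of the query."""
    candidates = set(all_ids)
    for key in query_keys:
        # A missing key means no molecule contains that ring system.
        candidates &= index.get(key, set())
        if not candidates:
            break
    return candidates


# Hypothetical ring-system keys; a real run would derive them by fragmentation.
# mol_c lists benzene as well, because a recursive fragmentation of
# naphthalene also yields the single ring.
mol_ring_keys = {
    "mol_a": {"c1ccccc1"},
    "mol_b": {"c1ccccc1", "C1CCNCC1"},
    "mol_c": {"c1ccc2ccccc2c1", "c1ccccc1"},
}
index = build_ring_index(mol_ring_keys)

hits = candidate_molecules({"c1ccccc1", "C1CCNCC1"}, index, mol_ring_keys)
print(sorted(hits))  # ['mol_b']
```

Molecules that survive this lookup are only candidates and still need the real substructure check. Molecules that drop out can be skipped, provided each molecule's key set contains every ring system the fragmentation produces.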
----
Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl


On Mon, 10 Feb 2020 at 21:40, Nils Weskamp <nils.wesk...@gmail.com> wrote:

> Hi Alexis,
>
> if you go down that route and calculate artificial skeletons, you could
> also go all the way and use an algorithm like HierS [1] or the scaffold
> tree [2] to perform a recursive fragmentation of your queries and
> molecules into their various rings and ring systems. If a query contains
> a ring system that is not present in the molecule, it cannot be a
> substructure.
>
> This is something you should be able to check with basic string matching
> or lookups in dictionaries / hashes instead of doing fingerprint
> calculations and comparisons.
>
> Not sure if that is actually faster, but might be worth a try.
>
> Hope this helps,
> Nils
>
> [1] https://pubs.acs.org/doi/abs/10.1021/jm049032d
> [2] https://pubs.acs.org/doi/10.1021/ci600338x
>
> On 10.02.2020 at 21:01, Alexis Parenty wrote:
> > Hi Maciek, thanks for your response. I did try that function too, but it
> > also takes SMILES only (not SMARTS). I think Gregori's solution is
> > very interesting: I am going to transform all SMILES and SMARTS into
> > their single-bonded, carbon-based skeletons and store the pattern
> > fingerprint of each skeleton in a dictionary, using the SMARTS or the
> > SMILES as the key. Then I will use your proposed function to match the
> > sub-skeletons against the skeletons, and will only run the expensive
> > molecular graph substructure search on the keys whose fingerprints
> > were flagged as potential substructures of others. Thanks Gregori!
> > Any other good tips?
> > Cheers,
> > Alexis
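
The two-stage plan above could look roughly like the sketch below. Assumptions: fingerprints are stored as frozensets of on-bit indices in dictionaries keyed by SMILES or SMARTS, the keys and bit values are made up, and `exact_match` is a hypothetical stand-in for the expensive `HasSubstructMatch` call. `screen_then_verify` is likewise just an illustrative name.

```python
def screen_then_verify(frag_fps, mol_fps, exact_match):
    """Return (frag_key, mol_key) pairs that pass both the cheap bit screen
    and the expensive exact check."""
    hits = []
    for frag_key, frag_bits in frag_fps.items():
        for mol_key, mol_bits in mol_fps.items():
            # Cheap screen first; the exact check only runs if it passes.
            if frag_bits <= mol_bits and exact_match(frag_key, mol_key):
                hits.append((frag_key, mol_key))
    return hits


# Made-up fingerprints, cached once per key.
frag_fps = {"frag_1": frozenset({1, 4}), "frag_2": frozenset({2, 9})}
mol_fps = {
    "mol_a": frozenset({1, 2, 4}),
    "mol_b": frozenset({1, 4, 9}),
    "mol_c": frozenset({2, 9}),
}

# Toy ground truth standing in for the real graph search.
true_pairs = {("frag_1", "mol_a"), ("frag_2", "mol_c")}
calls = []


def exact_match(frag_key, mol_key):
    calls.append((frag_key, mol_key))
    return (frag_key, mol_key) in true_pairs


hits = screen_then_verify(frag_fps, mol_fps, exact_match)
print(hits)        # [('frag_1', 'mol_a'), ('frag_2', 'mol_c')]
print(len(calls))  # 3 exact checks instead of 6
```

Here the pair (frag_1, mol_b) passes the screen but fails the exact check, which is the kind of false positive the second stage is there to remove.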
> >
> > On Mon, 10 Feb 2020 at 20:33, Maciek Wójcikowski <mac...@wojcikowski.pl>
> > wrote:
> >
> >     Alexis,
> >
> >     I believe that `DataStructs.AllProbeBitsMatch(query_fp, mol_fp)` is
> >     the function you are looking for here. You can find more advanced
> >     usage and code snippets in the RDKit blog post that Greg has put
> >     together here:
> >     https://rdkit.blogspot.com/2013/11/fingerprint-based-substructure.html
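
As an illustration of what a check like this tests (a plain-Python sketch of the idea, not RDKit's implementation): a query can only be a substructure if every bit set in its fingerprint is also set in the molecule's fingerprint. The bit positions below are made up, and `bits_to_int` and `all_probe_bits_match` are hypothetical helpers.

```python
def bits_to_int(on_bits):
    """Pack a collection of on-bit indices into a Python int used as a bit vector."""
    value = 0
    for bit in on_bits:
        value |= 1 << bit
    return value


def all_probe_bits_match(probe, target):
    """True if every bit set in the probe is also set in the target."""
    return (probe & target) == probe


query_fp = bits_to_int([3, 17, 42])
mol_fp = bits_to_int([3, 5, 17, 42, 99])    # contains all query bits
other_fp = bits_to_int([3, 42, 99])         # bit 17 is missing

print(all_probe_bits_match(query_fp, mol_fp))    # True
print(all_probe_bits_match(query_fp, other_fp))  # False
```

A `True` here only means the molecule cannot be ruled out; the graph-based substructure search is still needed to confirm the match.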
> >
> >     Best,
> >     Maciek
> >
> >     ----
> >     Pozdrawiam,  |  Best regards,
> >     Maciek Wójcikowski
> >     mac...@wojcikowski.pl
> >
> >
> >     On Mon, 10 Feb 2020 at 16:10, Alexis Parenty
> >     <alexis.parenty.h...@gmail.com> wrote:
> >
> >         Dear Rdkiters,
> >
> >         I am interested in doing substructure searches between many
> >         thousands of structures and many thousands of fragments, as
> >         quickly as possible, with reasonable accuracy (> 0.95)...
> >
> >         I did read Greg's excellent post on that subject:
> >
> >         http://rdkit.blogspot.com/2019/07/a-couple-of-substructure-search-topics.html
> >
> >         I was using the RDKit pattern fingerprint approach to filter out
> >         any fragments that have no chance of matching the bigger
> >         structure before running the slower, more accurate molecular
> >         graph search, which saves a lot of time.
> >
> >         However, I realized that this RDKit pattern fingerprint approach
> >         only works well if we compare SMILES with SMILES:
> >
> >
> >
> >         from rdkit import Chem
> >
> >         def frag_is_a_substructure_of_structure_via_pfp(frag, smiles):
> >             # Pattern fingerprints of the fragment and the full structure
> >             pfp_frag = Chem.PatternFingerprint(Chem.MolFromSmiles(frag))
> >             pfp_structure = Chem.PatternFingerprint(Chem.MolFromSmiles(smiles))
> >
> >             frag_bits = set(pfp_frag.GetOnBits())
> >             structure_bits = set(pfp_structure.GetOnBits())
> >
> >             # A match is only possible if every fragment bit is also set
> >             return frag_bits.issubset(structure_bits)
> >
> >
> >
> >         Unfortunately, some of my fragments are SMARTS that are not
> >         valid SMILES. Using Chem.MolFromSmarts(smarts) gives really poor
> >         results (many false positives, leading to poor specificity).
> >         Interestingly, there are no false negatives, giving a
> >         sensitivity of 1!
> >
> >
> >
> >         from rdkit import Chem
> >
> >         def frag_is_a_substructure_of_structure_via_pfp(frag, smiles):
> >             # Same screen, but the fragment is parsed as SMARTS
> >             pfp_frag = Chem.PatternFingerprint(Chem.MolFromSmarts(frag))
> >             pfp_structure = Chem.PatternFingerprint(Chem.MolFromSmiles(smiles))
> >
> >             frag_bits = set(pfp_frag.GetOnBits())
> >             structure_bits = set(pfp_structure.GetOnBits())
> >
> >             # A match is only possible if every fragment bit is also set
> >             return frag_bits.issubset(structure_bits)
> >
> >
> >
> >         Is there a way to use pattern fingerprints (or another method)
> >         for substructure searches independently of the SMILES/SMARTS
> >         format of the fragments? If not, is
> >         mol_struct.HasSubstructMatch(mol_frag) the only way I am left with?
> >
> >         Many thanks,
> >
> >         Alexis
> >
> >         _______________________________________________
> >         Rdkit-discuss mailing list
> >         Rdkit-discuss@lists.sourceforge.net
> >         <mailto:Rdkit-discuss@lists.sourceforge.net>
> >         https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
> >
> >
> >
> >
>
>
