Thanks, Nils, for pointing both algorithms out to the list. Interestingly, Greg is putting together a scaffold tree algorithm in this PR: https://github.com/rdkit/rdkit/pull/2911, so anyone could try it in the near future, hopefully in the 2020 release.

----
Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl
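Nils's suggestion quoted below (reject a molecule when a query ring is missing from it, using hash lookups rather than fingerprints) can be illustrated with a much simpler stand-in than HierS or the scaffold tree. This is a minimal sketch, not either of those algorithms: it assumes both queries and molecules are valid SMILES, reduces each SSSR ring to a hashable key (the sorted atomic numbers of its atoms), and builds a dictionary from ring key to the molecules that contain it. `ring_keys`, `build_ring_index` and `ring_candidates` are hypothetical helper names. Because it only looks at SSSR rings, a query ring that corresponds to a non-SSSR cycle of the molecule (for example the six-membered cycle in a bridged system such as norbornane) would be wrongly rejected, so treat it as a heuristic.

```python
from collections import defaultdict

from rdkit import Chem


def ring_keys(mol):
    """Hypothetical helper: one key per SSSR ring = sorted atomic numbers of its atoms.

    Bond orders and aromaticity are deliberately ignored, which makes the key
    looser (more candidates survive) but not stricter.
    """
    keys = set()
    for ring in mol.GetRingInfo().AtomRings():
        keys.add(tuple(sorted(mol.GetAtomWithIdx(i).GetAtomicNum() for i in ring)))
    return keys


def build_ring_index(smiles_list):
    """Hypothetical helper: map each ring key to the set of molecules containing it."""
    index = defaultdict(set)
    for smi in smiles_list:
        for key in ring_keys(Chem.MolFromSmiles(smi)):
            index[key].add(smi)
    return index


def ring_candidates(query_smiles, index, all_smiles):
    """Molecules that contain every ring key of the query (pure dictionary lookups)."""
    candidates = set(all_smiles)  # a ring-free query cannot be rejected this way
    for key in ring_keys(Chem.MolFromSmiles(query_smiles)):
        candidates &= index.get(key, set())
    return candidates
```

The index is built once for the whole molecule set, so each query costs one ring perception plus a few set intersections; only the surviving candidates would then go on to a full `HasSubstructMatch` check.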
On Mon, 10 Feb 2020 at 21:40, Nils Weskamp <nils.wesk...@gmail.com> wrote:
> Hi Alexis,
>
> If you go down that route and calculate artificial skeletons, you could
> also go all the way and use an algorithm like HierS [1] or the scaffold
> tree [2] to perform a recursive fragmentation of your queries and
> molecules into their various rings and ring systems. If a query contains
> a ring system that is not present in the molecule, it cannot be a
> substructure.
>
> This is something you should be able to check with basic string matching
> or lookups in dictionaries / hashes instead of doing fingerprint
> calculations and comparisons.
>
> Not sure if that is actually faster, but it might be worth a try.
>
> Hope this helps,
> Nils
>
> [1] https://pubs.acs.org/doi/abs/10.1021/jm049032d
> [2] https://pubs.acs.org/doi/10.1021/ci600338x
>
> On 10.02.2020 at 21:01, Alexis Parenty wrote:
> > Hi Maciek, thanks for your response. I did try that function too, but it
> > also takes SMILES only (not SMARTS). I think Gregori's solution is
> > very interesting: I am going to transform all SMILES and SMARTS into
> > their single-bonded-carbon-based skeletons and will store the pattern
> > fingerprints of those skeletons in a dictionary, using the SMARTS or
> > SMILES strings as keys. Then I will use your proposed function to match
> > the sub-skeletons against the skeletons, and will only run the expensive
> > molecular-graph substructure search on those dictionary keys whose
> > values have been flagged as potential substructures of others.
> > Thanks Gregori!
> > Any other good tips?
> > Cheers,
> > Alexis
> >
> > On Mon, 10 Feb 2020 at 20:33, Maciek Wójcikowski <mac...@wojcikowski.pl> wrote:
> > > Alexis,
> > >
> > > I believe that `DataStructs.AllProbeBitsMatch(query_fp, mol_fp)` is
> > > the function you are looking for here. You can find more advanced
> > > usage and code snippets in the RDKit blog post that Greg put together
> > > here:
> > > https://rdkit.blogspot.com/2013/11/fingerprint-based-substructure.html
> > >
> > > Best,
> > > Maciek
> > >
> > > ----
> > > Best regards,
> > > Maciek Wójcikowski
> > > mac...@wojcikowski.pl
> > >
> > > On Mon, 10 Feb 2020 at 16:10, Alexis Parenty <alexis.parenty.h...@gmail.com> wrote:
> > > > Dear RDKiters,
> > > >
> > > > I am interested in doing substructure searches between many
> > > > thousands of structures and many thousands of fragments, as quickly
> > > > as possible, with reasonable accuracy (> 0.95)...
> > > >
> > > > I did read Greg's excellent post on that subject:
> > > >
> > > > http://rdkit.blogspot.com/2019/07/a-couple-of-substructure-search-topics.html
> > > >
> > > > I was using the RDKit pattern fingerprint approach to filter out
> > > > any fragments that have no chance of matching the bigger
> > > > structure before running the slow but more accurate molecular graph
> > > > approach, saving a lot of time.
> > > >
> > > > However, I realized that this RDKit pattern fingerprint approach
> > > > only works well if we compare SMILES with SMILES:
> > > >
> > > > from rdkit import Chem
> > > >
> > > > def frag_is_a_substructure_of_structure_via_pfp(frag, smiles):
> > > >     # Pattern fingerprints of the fragment and of the full structure
> > > >     pfp_frag = Chem.PatternFingerprint(Chem.MolFromSmiles(frag))
> > > >     pfp_structure = Chem.PatternFingerprint(Chem.MolFromSmiles(smiles))
> > > >
> > > >     # A true substructure can only set bits the structure also sets
> > > >     frag_bits = set(pfp_frag.GetOnBits())
> > > >     structure_bits = set(pfp_structure.GetOnBits())
> > > >     return frag_bits.issubset(structure_bits)
> > > >
> > > > Unfortunately, some of my fragments are SMARTS that are not
> > > > valid SMILES. Using Chem.MolFromSmarts(smarts) gives really poor
> > > > results (many false positives, leading to poor specificity).
> > > > Interestingly, there are no false negatives, giving a
> > > > sensitivity of 1!
> > > >
> > > > def frag_is_a_substructure_of_structure_via_pfp(frag, smiles):
> > > >     # Same check, but the fragment is parsed as SMARTS
> > > >     pfp_frag = Chem.PatternFingerprint(Chem.MolFromSmarts(frag))
> > > >     pfp_structure = Chem.PatternFingerprint(Chem.MolFromSmiles(smiles))
> > > >
> > > >     frag_bits = set(pfp_frag.GetOnBits())
> > > >     structure_bits = set(pfp_structure.GetOnBits())
> > > >     return frag_bits.issubset(structure_bits)
> > > >
> > > > Is there a way to use pattern fingerprints (or another method) for
> > > > substructure searches independently of the SMILES/SMARTS format
> > > > of the fragments? If not, is
> > > > mol_struct.HasSubstructMatch(mol_frag) the only option I am left with?
> > > >
> > > > Many thanks,
> > > >
> > > > Alexis
> > > >
> > > > _______________________________________________
> > > > Rdkit-discuss mailing list
> > > > Rdkit-discuss@lists.sourceforge.net
> > > > https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
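Alexis's plan in the thread above (collapse every SMILES and SMARTS into a single-bonded carbon skeleton, screen the skeletons' pattern fingerprints with `DataStructs.AllProbeBitsMatch`, then run the full graph match only on the survivors) might look roughly like this. It is a minimal sketch under stated assumptions, not code from the thread: every fragment is parsed with `Chem.MolFromSmarts` (SMILES fragments then get SMARTS semantics, e.g. hydrogen counts are not implied), queries contain no explicit hydrogen atoms, and no atom has more than four neighbours, because the all-carbon skeleton would then fail sanitization. `carbon_skeleton`, `skeleton_fp` and `substructure_hits` are hypothetical names. The screen should not lose true hits: a real match maps query atoms and bonds onto the molecule, so the query skeleton is a substructure of the molecule skeleton, and pattern fingerprints are designed so that a substructure's bits are a subset of its superstructure's.

```python
from rdkit import Chem, DataStructs


def carbon_skeleton(mol):
    """Hypothetical helper: same connectivity, every atom carbon, every bond single."""
    skel = Chem.RWMol()
    for _ in range(mol.GetNumAtoms()):
        skel.AddAtom(Chem.Atom(6))
    for bond in mol.GetBonds():
        skel.AddBond(bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), Chem.BondType.SINGLE)
    skel = skel.GetMol()
    # Raises if an atom has more than four neighbours (carbon valence exceeded).
    Chem.SanitizeMol(skel)
    return skel


def skeleton_fp(mol):
    """Pattern fingerprint of the carbon skeleton, so SMILES and SMARTS are treated alike."""
    return Chem.PatternFingerprint(carbon_skeleton(mol))


def substructure_hits(fragments, structures):
    """Return (fragment, structure) pairs that truly match, screening by skeleton first."""
    frag_mols = {f: Chem.MolFromSmarts(f) for f in fragments}
    struct_mols = {s: Chem.MolFromSmiles(s) for s in structures}
    # Fingerprints are computed once per input string and kept in dictionaries.
    frag_fps = {f: skeleton_fp(m) for f, m in frag_mols.items()}
    struct_fps = {s: skeleton_fp(m) for s, m in struct_mols.items()}

    hits = []
    for f, ffp in frag_fps.items():
        for s, sfp in struct_fps.items():
            # Cheap screen: every bit of the fragment skeleton must be set in the structure's.
            if not DataStructs.AllProbeBitsMatch(ffp, sfp):
                continue
            # Expensive, exact graph match only for pairs that survive the screen.
            if struct_mols[s].HasSubstructMatch(frag_mols[f]):
                hits.append((f, s))
    return hits
```

The nested loop still performs one fingerprint comparison per pair; the saving comes from calling `HasSubstructMatch` only for pairs that pass the skeleton screen.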