Alexis, I believe `DataStructs.AllProbeBitsMatch(query_fp, mol_fp)` is the function you are looking for here. You can find more advanced usage and code snippets in the RDKit blog post that Greg put together: https://rdkit.blogspot.com/2013/11/fingerprint-based-substructure.html
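A minimal sketch of how this could look as a two-stage search, assuming the fragments are parsed with `Chem.MolFromSmarts` and the structures with `Chem.MolFromSmiles`. The helper name `is_substructure` is made up for illustration. The fingerprint check only rules out fragments that cannot match; anything that passes is then confirmed with `HasSubstructMatch`, so false positives from the screen cost time but not accuracy:

```python
from rdkit import Chem, DataStructs


def is_substructure(frag_smarts, smiles):
    """Hypothetical helper: fingerprint pre-screen, then exact graph match."""
    frag = Chem.MolFromSmarts(frag_smarts)
    structure = Chem.MolFromSmiles(smiles)
    if frag is None or structure is None:
        raise ValueError("could not parse fragment or structure")

    frag_fp = Chem.PatternFingerprint(frag)
    structure_fp = Chem.PatternFingerprint(structure)

    # Cheap screen: every bit set in the fragment fingerprint must also be
    # set in the structure fingerprint for a match to be possible.
    if not DataStructs.AllProbeBitsMatch(frag_fp, structure_fp):
        return False

    # The screen can give false positives, so confirm on the molecular graph.
    return structure.HasSubstructMatch(frag)
```

With many thousands of structures and fragments you would probably want to compute each fingerprint once and reuse it, rather than recomputing it for every pair as this sketch does.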
Best,
Maciek

----
Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

On Mon, 10 Feb 2020 at 16:10, Alexis Parenty <alexis.parenty.h...@gmail.com> wrote:

> Dear Rdkiters,
>
> I am interested in doing substructure searches between many thousands of
> structures and many thousands of fragments, as quickly as possible, with
> reasonable accuracy (> 0.95)...
>
> I did read Greg's excellent post on that subject:
>
> http://rdkit.blogspot.com/2019/07/a-couple-of-substructure-search-topics.html
>
> I was using the RDKit pattern fingerprint approach to filter out any
> fragments that have no chance of matching the bigger structure through the
> slow and more accurate molecular graph approach, saving a lot of time.
>
> However, I realized that this RDKit pattern fingerprint approach only
> works well if we compare SMILES with SMILES:
>
> def frag_is_a_substructure_of_structure_via_pfp(frag, smiles):
>     pfp_frag = Chem.PatternFingerprint(Chem.MolFromSmiles(frag))
>     pfp_structure = Chem.PatternFingerprint(Chem.MolFromSmiles(smiles))
>
>     frag_bits = set(pfp_frag.GetOnBits())
>     structure_bits = set(pfp_structure.GetOnBits())
>
>     if frag_bits.issubset(structure_bits):
>         return True
>     else:
>         return False
>
> Unfortunately, some of my fragments are SMARTS that are not valid SMILES.
> Using Chem.MolFromSmarts(smarts) gives really poor results (many false
> positives, leading to poor specificity). Interestingly, there are no false
> negatives, leading to a sensitivity of 1!
>
> def frag_is_a_substructure_of_structure_via_pfp(frag, smiles):
>     pfp_frag = Chem.PatternFingerprint(Chem.MolFromSmarts(frag))
>     pfp_structure = Chem.PatternFingerprint(Chem.MolFromSmiles(smiles))
>
>     frag_bits = set(pfp_frag.GetOnBits())
>     structure_bits = set(pfp_structure.GetOnBits())
>
>     if frag_bits.issubset(structure_bits):
>         return True
>     else:
>         return False
>
> Is there a way to use the pattern fingerprint (or another method) for
> substructure searches independently of the SMILES/SMARTS format of the
> fragments? If not, is mol_struct.HasSubstructMatch(mol_frag) the only way I
> am left with?
>
> Many thanks,
>
> Alexis
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss