Thank you so much!
What I ended up doing follows the same basic idea, although nowhere near the level of detail you put into your program. I'm only comparing the structures in pairs, and doing the following. (Sorry for the mess: it's part of a larger system, and I just copied the relevant parts.)

    from rdkit import Chem
    from rdkit.Chem import rdFMCS
    from rdkit.Chem.Scaffolds import MurckoScaffold

    def scaffold_matching(query_smi, scaff_smi):
        """
        Checks if the scaffold from scaff_smi is contained in the query_smi.
        Uses a stringent scaffold test.

        Returns the fraction of scaffold atoms covered by the MCS (0 to 1).
        """
        sca = Chem.MolFromSmiles(scaff_smi)
        que = Chem.MolFromSmiles(query_smi)
        match = 0
        # MolFromSmiles returns None for unparsable SMILES; also skip an
        # empty scaffold to avoid dividing by zero.
        if que is not None and sca is not None and sca.GetNumAtoms() > 0:
            maxMatch = sca.GetNumAtoms()
            match = rdFMCS.FindMCS([sca, que],
                                   atomCompare=rdFMCS.AtomCompare.CompareAny,
                                   bondCompare=rdFMCS.BondCompare.CompareOrder,
                                   ringMatchesRingOnly=True,
                                   completeRingsOnly=True,
                                   ).numAtoms / maxMatch
        return match

    if __name__ == "__main__":
        template_smiles = <SMILES_FOR_SOME_BASE_MOLECULE>
        query_smiles = <SMILES_FOR_SOME_QUERY_MOLECULE>

        # Reduce the template to its Murcko scaffold, then score the query against it.
        template_mol = Chem.MolFromSmiles(template_smiles)
        core = MurckoScaffold.GetScaffoldForMol(template_mol)
        scaffold = Chem.MolToSmiles(core)
        match = scaffold_matching(query_smiles, scaffold)

--
Gustavo Seabra

From: Andrew Dalke <da...@dalkescientific.com>
Sent: Monday, November 23, 2020 7:59 AM
To: Gustavo Seabra <gustavo.sea...@gmail.com>
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] Partial substructure match?

On Nov 19, 2020, at 17:48, Gustavo Seabra <gustavo.sea...@gmail.com> wrote:

    Is it possible to search for *partial* substructure matches using RDKit? ... For example, if the pattern is a naphthalene and the molecule to search has a benzene, that would count as a 60% match.

A number of people pointed out that RDKit's MCS feature might be appropriate. I've attached an example program based around that.

For example, the default is your two structures:

    % python mcs_search.py
    No --query specified, using naphthalene as the default.
    No --target or --targets specified, using phenol as the default.
    Target_ID: phenol
    nAtoms: 7
    nBonds: 7
    match_nAtoms: 6
    match_nBonds: 6
    atom_overlap: 0.600
    bond_overlap: 0.545
    atom_Tanimoto: 0.545
    bond_Tanimoto: 0.500

I'll reverse it by specifying the SMILES on the command-line.

    % python mcs_search.py --query 'c1ccccc1O' --target 'c1ccc2ccccc2c1'
    Target_ID: query
    nAtoms: 10
    nBonds: 11
    match_nAtoms: 6
    match_nBonds: 6
    atom_overlap: 0.857
    bond_overlap: 0.857
    atom_Tanimoto: 0.545
    bond_Tanimoto: 0.500

The program includes options to configure the FindMCS() parameters.

In addition, if chemfp 3.x is installed then some additional features are available, like the following example, which applies the MCS search to all records in ChEBI:

    % python mcs_search.py --query 'COC(=O)C1C(OC(=O)c2ccccc2)CC2CCC1N2C' --targets ~/databases/ChEBI_lite.sdf.gz --id-tag 'ChEBI ID'
    Target_ID   nAtoms  nBonds  match_nAtoms  match_nBonds  atom_overlap  bond_overlap  atom_Tanimoto  bond_Tanimoto
    CHEBI:776   21      24      9             8             0.409         0.333         0.265          0.200
    CHEBI:1148  7       6       6             5             0.273         0.208         0.261          0.200
    CHEBI:1734  19      21      16            15            0.727         0.625         0.640          0.500
    CHEBI:1895  9       9       9             8             0.409         0.333         0.409          0.320
    ...

On Nov 20, 2020, at 15:56, Gustavo Seabra <gustavo.sea...@gmail.com> wrote:

    Is it possible to get a partial match with substructure search?

No.

Andrew
da...@dalkescientific.com
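
A note on the numbers in the output above: the attached mcs_search.py is not reproduced in this message, but the overlap and Tanimoto columns appear consistent with simple ratios of the MCS size to the query and target sizes. The following is only a sketch under that assumption. The function name mcs_scores and its arguments are hypothetical and are not taken from the attached program; it works on plain counts, so it needs no RDKit.

```python
def mcs_scores(query_n, target_n, match_n):
    """Return (overlap, tanimoto) for an MCS of size match_n.

    Assumed definitions (inferred from the printed output, not from the
    attached program):
      overlap  = match / query
      tanimoto = match / (query + target - match)
    The counts can be atoms or bonds.
    """
    if query_n <= 0:
        raise ValueError("query size must be positive")
    union = query_n + target_n - match_n
    overlap = match_n / query_n
    tanimoto = match_n / union if union > 0 else 0.0
    return overlap, tanimoto


if __name__ == "__main__":
    # Counts taken from the output quoted above:
    # naphthalene has 10 atoms / 11 bonds, phenol has 7 atoms / 7 bonds,
    # and the MCS has 6 atoms / 6 bonds.
    for label, q, t, m in [
        ("atoms, naphthalene query", 10, 7, 6),
        ("bonds, naphthalene query", 11, 7, 6),
        ("atoms, phenol query", 7, 10, 6),
    ]:
        overlap, tanimoto = mcs_scores(q, t, m)
        print(f"{label}: overlap={overlap:.3f} tanimoto={tanimoto:.3f}")
```

Under these definitions the overlap depends on which structure is the query (0.600 versus 0.857 for the pair above), while the Tanimoto value is symmetric, which matches the two runs shown.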
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss