Thank you so much!

 

What I ended up doing follows the same basic idea, although not even close
to the level of detail you put in your program. I'm only comparing the
structures in pairs, and doing the following:

(Sorry for the mess - it's part of a larger system, and I just copied the
relevant parts.)

 

 

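from rdkit import Chem
from rdkit.Chem import rdFMCS
from rdkit.Chem.Scaffolds import MurckoScaffold


def scaffold_matching(query_smi, scaff_smi):
    """
    Checks how much of the scaffold from scaff_smi is
    contained in the query_smi.

    Returns the fraction of scaffold atoms found in the
    query, from 0.0 (no match) to 1.0 (full scaffold).

    Uses a stringent scaffold test.
    """
    sca = Chem.MolFromSmiles(scaff_smi)
    que = Chem.MolFromSmiles(query_smi)

    # Unparsable SMILES or an empty scaffold count as no match;
    # this also avoids dividing by zero below.
    if sca is None or que is None or sca.GetNumAtoms() == 0:
        return 0.0

    max_match = sca.GetNumAtoms()
    mcs = rdFMCS.FindMCS([sca, que],
                         # Any atom matches any atom; bond orders must agree.
                         atomCompare=rdFMCS.AtomCompare.CompareAny,
                         bondCompare=rdFMCS.BondCompare.CompareOrder,
                         # Ring bonds match only ring bonds; no partial rings.
                         ringMatchesRingOnly=True,
                         completeRingsOnly=True)
    return mcs.numAtoms / max_match


if __name__ == "__main__":
    # Placeholders: replace both with real SMILES strings before running.
    template_smiles = "<SMILES_FOR_SOME_BASE_MOLECULE>"
    query_smiles = "<SMILES_FOR_SOME_QUERY_MOLECULE>"
    template_mol = Chem.MolFromSmiles(template_smiles)
    core = MurckoScaffold.GetScaffoldForMol(template_mol)
    scaffold = Chem.MolToSmiles(core)
    match = scaffold_matching(query_smiles, scaffold)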

 

--

Gustavo Seabra

 

From: Andrew Dalke <da...@dalkescientific.com> 
Sent: Monday, November 23, 2020 7:59 AM
To: Gustavo Seabra <gustavo.sea...@gmail.com>
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] Partial substructure match?

 

On Nov 19, 2020, at 17:48, Gustavo Seabra <gustavo.sea...@gmail.com> wrote:



Is it possible to search for *partial* substructure matches using RDKit?

  ...



For example, if the pattern is a naphthalene and the molecule to
search has a benzene, that would count as a 60% match.


A number of people pointed out that RDKit's MCS feature might be
appropriate.

I've attached an example program based around that.

For example, the default is your two structures:

% python mcs_search.py
No --query specified, using naphthalene as the default.
No --target or --targets specified, using phenol as the default.
Target_ID: phenol
nAtoms: 7
nBonds: 7
match_nAtoms: 6
match_nBonds: 6
atom_overlap: 0.600
bond_overlap: 0.545
atom_Tanimoto: 0.545
bond_Tanimoto: 0.500

I'll reverse it by specifying the SMILES on the command line.

% python mcs_search.py --query 'c1ccccc1O' --target 'c1ccc2ccccc2c1'
Target_ID: query
nAtoms: 10
nBonds: 11
match_nAtoms: 6
match_nBonds: 6
atom_overlap: 0.857
bond_overlap: 0.857
atom_Tanimoto: 0.545
bond_Tanimoto: 0.500
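
As a sanity check on the two reports above, the figures are consistent with
overlap = match / query size and Tanimoto = match / (query + target - match).
These formulas are inferred from the printed numbers, not taken from the
attached program, and the helper name overlap_metrics is hypothetical. A
minimal sketch:

```python
def overlap_metrics(query_size, target_size, match_size):
    """Return (overlap, tanimoto) for atom counts or bond counts.

    Assumed formulas (inferred from the example output, not from
    the mcs_search.py source):
      overlap  = match / query
      tanimoto = match / (query + target - match)
    """
    overlap = match_size / query_size
    tanimoto = match_size / (query_size + target_size - match_size)
    return overlap, tanimoto


# Naphthalene query (10 atoms, 11 bonds) against phenol (7 atoms, 7 bonds),
# with an MCS of 6 atoms and 6 bonds, as in the first report.
atom_overlap, atom_tanimoto = overlap_metrics(10, 7, 6)
bond_overlap, bond_tanimoto = overlap_metrics(11, 7, 6)
print(f"{atom_overlap:.3f} {bond_overlap:.3f} "
      f"{atom_tanimoto:.3f} {bond_tanimoto:.3f}")
# prints: 0.600 0.545 0.545 0.500
```

Swapping query and target changes the overlap (6/7 = 0.857 for both atoms and
bonds) but leaves the Tanimoto values unchanged, which matches the second
report.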

 

 

The program includes options to configure the FindMCS() parameters.

 

In addition, if chemfp 3.x is installed then some additional features are
available, like the following example, which applies the MCS search to all
records in ChEBI: 


% python mcs_search.py --query 'COC(=O)C1C(OC(=O)c2ccccc2)CC2CCC1N2C' \
      --targets ~/databases/ChEBI_lite.sdf.gz --id-tag 'ChEBI ID'
Target_ID   nAtoms  nBonds  match_nAtoms  match_nBonds  atom_overlap  bond_overlap  atom_Tanimoto  bond_Tanimoto
CHEBI:776   21      24      9             8             0.409         0.333         0.265          0.200
CHEBI:1148  7       6       6             5             0.273         0.208         0.261          0.200
CHEBI:1734  19      21      16            15            0.727         0.625         0.640          0.500
CHEBI:1895  9       9       9             8             0.409         0.333         0.409          0.320
  ...






On Nov 20, 2020, at 15:56, Gustavo Seabra <gustavo.sea...@gmail.com> wrote:

Is it possible to get a partial match with substructure search?

 

No.


                                                            Andrew
 
da...@dalkescientific.com



_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
