Hi all,

I would appreciate some pointers on how to find the maximum common
substructure of two molecules when some atoms in the template structure
may be *any* element while other atoms must be fixed.

Currently, I'm trying to use the rdFMCS module. For example:

from rdkit import Chem
from rdkit.Chem import rdFMCS

template = Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1')
# This should give a sulfone connected to an aromatic ring and
# some other (any) element. Notice that the ring may have
# any atoms (N,C,O), but for me it is important to have the SO2 group.

mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3ccccc3)CCN2)cc1')
# This molecule has the pattern.

# Now, to find the maximum common substructure, I use:
compare = [template, mol1]
res = rdFMCS.FindMCS(compare,
                atomCompare=rdFMCS.AtomCompare.CompareElements,
                bondCompare=rdFMCS.BondCompare.CompareAny,
                ringMatchesRingOnly=False,
                completeRingsOnly=False)
res.smartsString
# gives: '[#16](=[#8])=[#8]'

# Notice that the only match is the SO2; it does not match the ring.

However, if I try that with another structure that has a CF3 in place of
the SO2, I get:
mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1')
compare = [template,mol2]
res = rdFMCS.FindMCS(compare,
                atomCompare=rdFMCS.AtomCompare.CompareElements,
                bondCompare=rdFMCS.BondCompare.CompareAny,
                ringMatchesRingOnly=False,
                completeRingsOnly=False)
res.smartsString
# Returns: '' (empty string)

# If I change to AtomCompare.CompareAny, a CF3 will now also match
# in place of the SO2-X:
mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1')
compare = [template,mol2]
res = rdFMCS.FindMCS(compare,
                atomCompare=rdFMCS.AtomCompare.CompareAny,
                bondCompare=rdFMCS.BondCompare.CompareAny,
                ringMatchesRingOnly=False,
                completeRingsOnly=False)
res.smartsString
# Returns:
'[#16,#6](-[#0,#6])(=,-[#8,#9])(=,-[#8,#9])-[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#7]:1'

But now the CF3 is counted in place of the SO2. The result I'd like to get
here would be just the ring, as in the case:
new_template = Chem.MolFromSmarts('CS(=O)(=O)c1cnccc1')
mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1')
compare = [new_template,mol2]
res = rdFMCS.FindMCS(compare,
                atomCompare=rdFMCS.AtomCompare.CompareElements,
                bondCompare=rdFMCS.BondCompare.CompareAny,
                ringMatchesRingOnly=False,
                completeRingsOnly=False)
res.smartsString
# Returns: '[#6]1:[#6]:[#7]:[#6]:[#6]:[#6]:1' (just the ring)

Notice that if I use CompareElements, there seems to be no way to let a
ring position match either N or C.

Does anyone have a suggestion on how I can specify flexibility (similar to
AtomCompare.CompareAny) only for a portion of the molecule and still
enforce specific atoms in another portion?
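
To make the desired behaviour concrete, here is a minimal sketch that uses
plain substructure matching rather than rdFMCS. It assumes it is acceptable
to split the template by hand into a strict query (ring plus SO2) and a
relaxed fallback (ring only); the names SULFONE_ON_RING, RING_ONLY and
best_match are hypothetical and only for illustration. It is not a general
MCS solution, just a statement of the result I am after.

```python
from rdkit import Chem

# Hypothetical hand-made split of the template (an assumption, not rdFMCS):
# strict query = aromatic six-membered ring (any element) carrying SO2-X,
# relaxed query = the aromatic six-membered ring alone.
SULFONE_ON_RING = Chem.MolFromSmarts(
    '[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1')
RING_ONLY = Chem.MolFromSmarts('[a]1:[a]:[a]:[a]:[a]:[a]:1')


def best_match(mol):
    """Return (label, atom_indices) for the most specific query that matches."""
    for label, query in (('sulfone+ring', SULFONE_ON_RING),
                         ('ring', RING_ONLY)):
        idx = mol.GetSubstructMatch(query)  # empty tuple when nothing matches
        if idx:
            return label, idx
    return 'none', ()


mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3ccccc3)CCN2)cc1')
mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1')

label1, atoms1 = best_match(mol1)
label2, atoms2 = best_match(mol2)
print(label1, atoms1)
print(label2, atoms2)
```

Here the S and O atoms are enforced because they are explicit in the query,
while the ring positions stay free. What I cannot work out is how to get the
same per-atom behaviour from FindMCS without writing the fallbacks by hand.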

Thank you so much!
--
Gustavo Seabra.
