On Aug 7, 2019, at 13:08, Paolo Tosco <paolo.tosco.m...@gmail.com> wrote:
> You can use
> 
> Chem.MolFragmentToSmiles(mol, match)
> 
> where match is a tuple of atom indices returned by GetSubstructMatch().

Note however that if only the atom indices are given then 
Chem.MolFragmentToSmiles() will include all bonds which connect those atoms, 
even if the original SMARTS does not match those bonds. For example:

>>> from rdkit import Chem
>>> pat = Chem.MolFromSmarts("*~*~*~*") # match 4 linear atoms
>>> mol = Chem.MolFromSmiles("C1CCC1") # ring of size 4
>>> atom_indices = mol.GetSubstructMatch(pat)
>>> atom_indices
(0, 1, 2, 3)
>>> Chem.MolFragmentToSmiles(mol, atom_indices)  # returns the ring
'C1CCC1'


If this is important, then you need to pass the correct bond indices to 
MolFragmentToSmiles(). This can be done by using the bonds in the query graph 
to get the bond indices in the molecule graph. I believe the following is 
correct:

def get_match_bond_indices(query, mol, match_atom_indices):
    bond_indices = []
    for query_bond in query.GetBonds():
        atom_index1 = match_atom_indices[query_bond.GetBeginAtomIdx()]
        atom_index2 = match_atom_indices[query_bond.GetEndAtomIdx()]
        bond_indices.append(mol.GetBondBetweenAtoms(
             atom_index1, atom_index2).GetIdx())
    return bond_indices

(Does a function like this already exist in RDKit?)

I'll use it to get the bond indices for the *~*~*~* match:

>>> bond_indices = get_match_bond_indices(pat, mol, atom_indices)
>>> bond_indices
[0, 1, 2]

Passing the atom and bond indices gives the expected match SMILES: 

>>> Chem.MolFragmentToSmiles(mol, atom_indices, bond_indices)
'CCCC'

Cheers,

                                Andrew
                                da...@dalkescientific.com




_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Reply via email to