I'm not sure why you'd want to reimplement something that's already there, but if this works better for you...
The easiest way to get a single function you could call would be to do something like:

In [18]: def MolToGenericScaffold(mol):
    ...:     return MurckoScaffold.MakeScaffoldGeneric(MurckoScaffold.GetScaffoldForMol(mol))
    ...:

In [19]: Chem.MolToSmiles(MolToGenericScaffold(Chem.MolFromSmiles('CCc1ccc(O)cc1C(=O)C1CC1')))
Out[19]: 'CC(C1CCCCC1)C1CC1'

-greg

On Tue, Apr 27, 2021 at 4:32 AM Francois Berenger <mli...@ligand.eu> wrote:

> On 27/04/2021 10:12, Francois Berenger wrote:
> > On 26/04/2021 23:35, Greg Landrum wrote:
> >> Hi Francois,
> >>
> >> The implementation which is there does, I believe, the right thing.
> >> However... first you need to find the Murcko Scaffold, then you can
> >> convert that scaffold to the generic form:
> >>
> >>> In [5]: m = Chem.MolFromSmiles('CCc1ccc(O)cc1C(=O)C1CC1')
> >>> In [6]: scaff = MurckoScaffold.GetScaffoldForMol(m)
> >>> In [7]: Chem.MolToSmiles(scaff)
> >>> Out[7]: 'O=C(c1ccccc1)C1CC1'
> >>> In [8]: framework = MurckoScaffold.MakeScaffoldGeneric(scaff)
> >>> In [9]: print(Chem.MolToSmiles(framework))
> >>> CC(C1CCCCC1)C1CC1
> >
> > Ok, maybe this two steps process is a little bit better, but still
> > not exactly what I would expect in some cases.
> >
> > I'll say if I program something which I prefer.
>
> Hello,
>
> I end up with this:
> ---
> def find_terminal_atoms(mol):
>     res = []
>     for a in mol.GetAtoms():
>         if len(a.GetBonds()) == 1:
>             res.append(a)
>     return res
>
> # Bemis, G. W., & Murcko, M. A. (1996).
> # "The properties of known drugs. 1. Molecular frameworks."
> # Journal of medicinal chemistry, 39(15), 2887-2893.
> def BemisMurckoFramework(mol):
>     # keep only Heavy Atoms (HA)
>     only_HA = rdkit.Chem.rdmolops.RemoveHs(mol)
>     # switch all HA to Carbon
>     rw_mol = Chem.RWMol(only_HA)
>     for i in range(rw_mol.GetNumAtoms()):
>         rw_mol.ReplaceAtom(i, Chem.Atom(6))
>     # switch all non single bonds to single
>     non_single_bonds = []
>     for b in rw_mol.GetBonds():
>         if b.GetBondType() != Chem.BondType.SINGLE:
>             non_single_bonds.append(b)
>     for b in non_single_bonds:
>         j = b.GetBeginAtomIdx()
>         k = b.GetEndAtomIdx()
>         rw_mol.RemoveBond(j, k)
>         rw_mol.AddBond(j, k, Chem.BondType.SINGLE)
>     # as long as there are terminal atoms, remove them
>     terminal_atoms = find_terminal_atoms(rw_mol)
>     while terminal_atoms != []:
>         for a in terminal_atoms:
>             for b in a.GetBonds():
>                 rw_mol.RemoveBond(b.GetBeginAtomIdx(),
>                                   b.GetEndAtomIdx())
>             rw_mol.RemoveAtom(a.GetIdx())
>         terminal_atoms = find_terminal_atoms(rw_mol)
>     return rw_mol.GetMol()
> ---
>
> I don't claim this is very efficient Python code. I am not very good at
> snake charming.
>
> Regards,
> F.
>
> >> Best,
> >> -greg
> >>
> >> On Mon, Apr 26, 2021 at 11:15 AM Francois Berenger <mli...@ligand.eu>
> >> wrote:
> >>
> >>> Hello,
> >>>
> >>> I am trying MurckoScaffold.MakeScaffoldGeneric(mol),
> >>> but this keeps the side chains.
> >>>
> >>> While my understanding of BM scaffolds is that only rings
> >>> and ring linkers should be kept.
> >>>
> >>> The fact that the rdkit implementation keeps the
> >>> side chains makes Murcko scaffolds a much less powerful filter
> >>> to enforce molecular diversity.
> >>>
> >>> And I don't even see any option to force the standard/vanilla
> >>> behavior.
> >>> Or, am I missing something?
> >>>
> >>> Regards,
> >>> F.
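For comparison, here is a minimal sketch of one way to drop the leftover exocyclic atoms (the extra carbon in 'CC(C1CCCCC1)C1CC1' that comes from the carbonyl oxygen) while staying with the existing MurckoScaffold functions: run GetScaffoldForMol a second time on the generic scaffold. The helper name mol_to_pruned_generic_scaffold is hypothetical, and this has only been reasoned through for the example molecule from this thread, so please treat it as an assumption to check on your own data rather than guaranteed behaviour.

```python
from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold


def mol_to_pruned_generic_scaffold(mol):
    """Hypothetical helper: generic Murcko scaffold with exocyclic atoms pruned."""
    # Step 1: keep rings and linkers; atoms double-bonded to them survive here.
    scaff = MurckoScaffold.GetScaffoldForMol(mol)
    # Step 2: turn every atom into carbon and every bond into a single bond.
    generic = MurckoScaffold.MakeScaffoldGeneric(scaff)
    # Step 3: the former exocyclic atoms are now plain side chains,
    # so a second scaffold pass should remove them.
    return MurckoScaffold.GetScaffoldForMol(generic)


mol = Chem.MolFromSmiles('CCc1ccc(O)cc1C(=O)C1CC1')

# The two-step result from the earlier message, for reference.
two_step = Chem.MolToSmiles(
    MurckoScaffold.MakeScaffoldGeneric(MurckoScaffold.GetScaffoldForMol(mol)))
# The three-step variant sketched above.
pruned = Chem.MolToSmiles(mol_to_pruned_generic_scaffold(mol))

print(two_step)
print(pruned)
```

If the second pass behaves as assumed, the pruned result contains only the two rings and the single linker atom, which is closer to what the terminal-atom stripping in BemisMurckoFramework produces.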
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss