To find all compounds that contain the ester substructure, you can use GetSubstructMatches. I would do the following (I'm assuming you have all your structures in a pandas DataFrame and have already converted the SMILES to RDKit Mol objects):

from rdkit import Chem

ester_pattern = Chem.MolFromSmarts("COC(C)=O")

# df is a pandas DataFrame with a column "rdkit_mol" containing your structures as RDKit Mol objects
df["is_ester"] = df["rdkit_mol"].apply(lambda x: bool(x.GetSubstructMatches(ester_pattern)))

This will give you a column of True/False values that you can use as a mask. Of course, there are other ways to do this, like using a for loop.

Now, to replace the esters with the disulfide group: if you can't get reaction SMARTS to work, you could try using Python's str.replace() method on the SMILES strings. I believe an ester can be written in two ways in SMILES (left to right and right to left), so keep that in mind. You can always run GetSubstructMatches afterwards to check whether any ester was left behind.

Regards.

On Sun, May 15, 2022 at 1:33 AM Ming Hao <haom.ni...@gmail.com> wrote:

> Hi All,
>
> I want to replace the ester structure ('COC(C)=O') with disulfide ('CSSC')
>
> [image: image.png]
>
> Here is what I did, but it does not work. It seems to need specified
> methods to replace the original structure with the new one, not just put
> individual SMILES there.
>
> ##############################################################
> from rdkit import Chem
> from rdkit.Chem import AllChem, Draw
> from rdkit.Chem.Draw import IPythonConsole
>
> orgsmi = 'CCOC(=O)CCCCCN(CC)CCCCCCCC(=O)OC(C)CC'
> m = Chem.MolFromSmiles(orgsmi)
> m
>
> pat = Chem.MolFromSmiles('COC(C)=O')
> pat
>
> rep = Chem.MolFromSmiles('CSSC')
> rep
>
> new = AllChem.ReplaceSubstructs(m, pat, rep)
> new[0] # The structure was separated
> new[1] # The structure was separated
> len(new)
> #################################################################
>
> Can you help me with this? By the way, I have 10K structures, and first I
> need to find the compounds with the pattern (ester, COC(C)=O) and replace
> them with disulfide ('CSSC'). What is a good way to do this?
>
> Thanks.
> Ming
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>

--
Rafael da Fonseca Lameiro
PhD Student - Medicinal and Biological Chemistry Group (NEQUIMED)
São Carlos Institute of Chemistry - University of São Paulo - Brazil
https://orcid.org/0000-0003-4466-2682
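
For reference, here is a minimal sketch of the reaction SMARTS route mentioned in the reply, using the example molecule from the quoted question. It is not taken from the thread: the reaction string and the names ESTER_TO_DISULFIDE, replace_esters and max_rounds are illustrative assumptions. The sketch assumes the goal is to turn C-O-C(=O)-C into C-S-S-C while keeping whatever else is attached to the two outer carbons. RunReactants changes only one match per product, so the helper reapplies the reaction until no ester is left.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Assumed transformation: keep the two outer carbons (mapped 1 and 2),
# drop the O-C(=O) unit between them, and insert S-S instead.
ESTER_TO_DISULFIDE = AllChem.ReactionFromSmarts(
    "[C:1]-O-C(=O)-[C:2]>>[C:1]-S-S-[C:2]"
)


def replace_esters(mol, max_rounds=20):
    """Reapply the reaction until no ester matches (hypothetical helper)."""
    current = mol
    for _ in range(max_rounds):  # guard against an endless loop
        products = ESTER_TO_DISULFIDE.RunReactants((current,))
        if not products:
            break  # nothing left to replace
        current = products[0][0]  # take the first product of the first match
        Chem.SanitizeMol(current)  # reaction products come back unsanitized
    return current


orgsmi = "CCOC(=O)CCCCCN(CC)CCCCCCCC(=O)OC(C)CC"
new_mol = replace_esters(Chem.MolFromSmiles(orgsmi))
new_smiles = Chem.MolToSmiles(new_mol)
print(new_smiles)
```

For the 10K structures, the same helper could be applied only to the rows selected by the ester mask, and GetSubstructMatches can then confirm that no ester remains.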