Hi,
I'm still not completely sure I get it, but here's an attempt.
This version of your function collects the modified molecules in a list and
returns them from the function so that you can work with them further in
the calling code:
from rdkit import Chem
from rdkit.Chem import AllChem

def mutatemol(mol):
    res = []
    patterns = ('[C,c;!H0]','[N;!H0]')
    replacements = ('CC','NC')
    if (mol.HasProp('_Name')):
        mName = mol.GetProp('_Name')
    else:
        mName = 'StructX'
    for i in range(len(patterns)):
        pat = Chem.MolFromSmarts(patterns[i])
        repl = Chem.MolFromSmiles(replacements[i])
        # one new molecule per match of the pattern
        mutmol = AllChem.ReplaceSubstructs(mol,pat,repl)
        for j,mm in enumerate(mutmol):
            mm.SetProp('_Name', '%s_%d_%d'%(mName,i,j))
            AllChem.SanitizeMol(mm)
            AllChem.Compute2DCoords(mm)
            res.append(mm)
    return res
This may work, but it will lose information about stereochemistry:
>>> m = Chem.MolFromSmiles('O[C@H](Cl)F')
>>> rs = mutatemol(m)
>>> for r in rs: print(Chem.MolToSmiles(r,True))
CC(O)(F)Cl
O[C@@H](F)Cl
That second result is due to the fact that the code isn't checking whether
or not ReplaceSubstructs will actually do anything. This is fixable:
def mutatemol2(mol):
    res = []
    patterns = ('[C,c;!H0]','[N;!H0]')
    replacements = ('CC','NC')
    if (mol.HasProp('_Name')):
        mName = mol.GetProp('_Name')
    else:
        mName = 'StructX'
    for i in range(len(patterns)):
        pat = Chem.MolFromSmarts(patterns[i])
        # skip patterns that don't match at all
        if not mol.HasSubstructMatch(pat):
            continue
        repl = Chem.MolFromSmiles(replacements[i])
        mutmol = AllChem.ReplaceSubstructs(mol,pat,repl)
        for j,mm in enumerate(mutmol):
            mm.SetProp('_Name', '%s_%d_%d'%(mName,i,j))
            AllChem.SanitizeMol(mm)
            AllChem.Compute2DCoords(mm)
            res.append(mm)
    return res
Now I get:
>>> m = Chem.MolFromSmiles('O[C@H](Cl)F')
>>> rs = mutatemol2(m)
>>> for r in rs: print(Chem.MolToSmiles(r,True))
CC(O)(F)Cl
Now there's only one result, but the stereochemistry is still gone. This is
also fixable by first adding Hs to the molecule and then replacing the ones
connected to Cs or Ns:
def mutatemol3(mol):
    res = []
    # work on a copy with explicit Hs so that the Hs themselves can be replaced
    hmol = Chem.AddHs(mol)
    patterns = ('[$([#1][#6])]','[$([#1][#7])]')
    replacements = ('C','C')
    if (mol.HasProp('_Name')):
        mName = mol.GetProp('_Name')
    else:
        mName = 'StructX'
    for i in range(len(patterns)):
        pat = Chem.MolFromSmarts(patterns[i])
        if not hmol.HasSubstructMatch(pat):
            continue
        repl = Chem.MolFromSmiles(replacements[i])
        mutmol = AllChem.ReplaceSubstructs(hmol,pat,repl)
        for j,mm in enumerate(mutmol):
            mm.SetProp('_Name', '%s_%d_%d'%(mName,i,j))
            mm=Chem.RemoveHs(mm)
            AllChem.Compute2DCoords(mm)
            res.append(mm)
    return res
>>> m = Chem.MolFromSmiles('O[C@H](Cl)F')
>>> rs = mutatemol3(m)
>>> for r in rs: print(Chem.MolToSmiles(r,True))
C[C@](O)(F)Cl
A possible problem with this one is that it returns one result for every
terminal H attached to a C or N.
So if I have a molecule with a terminal -CH3, I get three results:
>>> m = Chem.MolFromSmiles('CO')
>>> rs = mutatemol3(m)
>>> for r in rs: print(Chem.MolToSmiles(r,True))
CCO
CCO
CCO
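To see how much duplication there is before deciding whether it matters, you
could tally the canonical SMILES. A minimal sketch: the SMILES list here is
typed in by hand from the output above, and in real code it would be built
with Chem.MolToSmiles(r, True) for each result.

```python
from collections import Counter

# SMILES copied by hand from the output above; with RDKit available this
# would be [Chem.MolToSmiles(r, True) for r in rs]
smiles = ['CCO', 'CCO', 'CCO']

# count how often each canonical SMILES shows up
counts = Counter(smiles)
for smi, n in counts.items():
    print('%s x%d' % (smi, n))
```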
If that's a problem it can be fixed by checking for uniqueness:
def mutatemol4(mol):
    res = []
    # canonical SMILES of the results collected so far
    seen=set()
    hmol = Chem.AddHs(mol)
    patterns = ('[$([#1][#6])]','[$([#1][#7])]')
    replacements = ('C','C')
    if (mol.HasProp('_Name')):
        mName = mol.GetProp('_Name')
    else:
        mName = 'StructX'
    for i in range(len(patterns)):
        pat = Chem.MolFromSmarts(patterns[i])
        if not hmol.HasSubstructMatch(pat):
            continue
        repl = Chem.MolFromSmiles(replacements[i])
        mutmol = AllChem.ReplaceSubstructs(hmol,pat,repl)
        for j,mm in enumerate(mutmol):
            mm.SetProp('_Name', '%s_%d_%d'%(mName,i,j))
            mm=Chem.RemoveHs(mm)
            AllChem.Compute2DCoords(mm)
            smi = Chem.MolToSmiles(mm,True)
            if smi not in seen:
                seen.add(smi)
                res.append(mm)
    return res
This now returns a single result:
>>> m = Chem.MolFromSmiles('CO')
>>> rs = mutatemol4(m)
>>> for r in rs: print(Chem.MolToSmiles(r,True))
CCO
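Since each of these functions hands back a plain Python list, the calling
code is free to do whatever it likes with the results, including writing them
out the way your original function did. A rough sketch: write_mols is a name
I made up, and the writer is assumed to be something like
Chem.SDWriter('out.sdf'), i.e. any object with write() and close() methods.

```python
def write_mols(mols, writer):
    """Write every molecule in mols to writer, close it, return the count.

    writer is assumed to behave like Chem.SDWriter: it only needs
    write(mol) and close() methods.
    """
    count = 0
    try:
        for mol in mols:
            writer.write(mol)
            count += 1
    finally:
        # close even if a write fails, so whatever was written gets flushed
        writer.close()
    return count

# Intended use (needs RDKit, so left as a comment here):
#   rs = mutatemol4(m)
#   write_mols(rs, Chem.SDWriter('out.sdf'))
```

Keeping the writing separate from the mutation means the same list can also
be filtered, sorted, or pickled before anything touches the disk.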
Does that help?
-greg
On Thu, Nov 7, 2013 at 11:03 AM, Basil Hartzoulakis
<[email protected]> wrote:
> Hi Greg,
>
> Thanks for the answer,
>
> There is a small example that replaces terminal hydrogens with methyl
> groups.
>
> -----------------------
>
> def mutatemol(mol, outputsdf):
>     outf=open(outputsdf, 'w+')
>     writer = Chem.SDWriter(outf)
>     patterns = ('[C,c;!H0]','[N;!H0]')
>     replacements = ('[#6](C)','[N](C)')
>     if (mol.HasProp('_Name')):
>         mName = mol.GetProp('_Name')
>     else:
>         mName = 'StructX'
>     for i in range(0, len(patterns)):
>         pat = Chem.MolFromSmarts(patterns[i])
>         repl = Chem.MolFromSmarts(replacements[i])
>         mutmol = AllChem.ReplaceSubstructs(mol,pat,repl)
>         for j in range(0, len(mutmol)):
>             mname ='%s_%d_%d'%(mName,i,j)
>             mutmol[j].SetProp('_Name', mname)
>             AllChem.SanitizeMol(mutmol[j])
>             AllChem.Compute2DCoords(mutmol[j])
>             writer.write(mutmol[j])
>     writer.flush()
>     outf.close()
>
> -------------------------------------------
>
>
>
> I would like to replace the second “for” loop with a function that puts
> structures from mutmol in a temporary container of sorts. Is this possible?
>
> It's not critical, but I think it would give me more flexibility in how
> to manipulate the output.
>
> Regards
>
> Basil
>
> *From:* Greg Landrum [mailto:[email protected]]
> *Sent:* 07 November 2013 02:57
> *To:* [email protected]
> *Cc:* RDKit Discuss
> *Subject:* Re: [Rdkit-discuss] ReplaceSubstructs Output
>
>
>
> Hi Basil
>
>
>
> On Tue, Nov 5, 2013 at 1:31 PM, Basil Hartzoulakis <[email protected]>
> wrote:
>
> Hello,
>
> I am building a tool with multiple calls of the ReplaceSubstructs on the
> same structure.
>
> Is there any easy way to place the output of each call in a pickle or some
> temporary holder?
>
> At the moment I have to iterate through each output and the code looks a
> bit awkward.
>
> I'm not quite sure what you mean. Can you provide a short code snippet
> that shows the repeated calls?
>
> -greg
>
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss