On Fri, Jul 8, 2011 at 8:21 PM, JP <[email protected]> wrote:
> Any ideas why the attached (not very well written) code gives this error (or
> warning?) when run in the following manner:
> Set_Protonation_State.py Set_Protonation_State_Tests.smi
> [19:17:10]  unsupported radical count: 3 set to 3.
> And what does it mean?  Doesn't make much sense to me.  The unsupported
> value is 3 and so I/rdkit has to set it to 3 (?!)
> Am I being thick?

You mean you don't find the meaning of the warning completely obvious? ;-)

The warning is coming during the SDF writing process when the RAD line
is being written for atoms. The "RAD" line in CTABs, despite it's
name, isn't actually the radical count on atoms; it's their spin
state. So "2" means "doublet", not "two unpaired electrons". The
warning message is telling you that it found an atom with a radical
count of 3 (!) and set the value for the spin state in the SD file to
3. It's a stupid error message combined with chemically unreasonable
behavior. I fixed the CTAB writer to generate "2" (doublet) for odd
radical counts and "3" (triplet) for even radical counts. The RDKit
doesn't have enough information to be able to distinguish doublet from
singlet, so this last bit is totally arbitrary.

There's a more interesting question about how exactly the radical
count is getting that high in your example. Your code is not doing
exactly what I guess you think it's doing. Starting with the first
replacement:

   repl1 = Chem.MolFromSmiles('[O-]')
   patt1 = Chem.MolFromSmarts('[$([OH]C(=O))]')
   m1 = AllChem.ReplaceSubstructs(reagent1,patt1,repl1,replaceAll=True)
   print "Input: " , Chem.MolToSmiles(reagent1), "    Output: ",
Chem.MolToSmiles(m1[0])

Your replacement atom has a radical on it:
>>> repl = Chem.MolFromSmiles('[O-]')
>>> repl.Debug()
Atoms:
        0 8 O chg: -1  deg: 0 exp: 0 imp: 0 hyb: 4 arom?: 0 chi: 0 rad: 1
Bonds:

And that radical is carried over into the product molecule:

>>> patt1 = Chem.MolFromSmarts('[$([OH]C(=O))]')
>>> reagent1 = Chem.MolFromSmiles('CC(=O)O')
>>> m1 = AllChem.ReplaceSubstructs(reagent1,patt1,repl,replaceAll=True)
>>> m1[0].Debug()
Atoms:
        0 6 C chg: 0  deg: 1 exp: 1 imp: 3 hyb: 4 arom?: 0 chi: 0
        1 6 C chg: 0  deg: 3 exp: 4 imp: 0 hyb: 3 arom?: 0 chi: 0
        2 8 O chg: 0  deg: 1 exp: 2 imp: 0 hyb: 3 arom?: 0 chi: 0
        3 8 O chg: -1  deg: 1 exp: 1 imp: 0 hyb: 4 arom?: 0 chi: 0 rad: 1
Bonds:
        0 0->1 order: 1 conj?: 0 aromatic?: 0
        1 1->2 order: 2 conj?: 1 aromatic?: 0
        2 3->1 order: 1 conj?: 0 aromatic?: 0

I guess this is not what you intend.

A simple workaround is to not sanitize your replacement pieces:
>>> repl2 = Chem.MolFromSmiles('[O-]',sanitize=False)
>>> m1 = AllChem.ReplaceSubstructs(reagent1,patt1,repl2,replaceAll=True)
>>> m1[0].Debug()
Atoms:
        0 6 C chg: 0  deg: 1 exp: 1 imp: 3 hyb: 4 arom?: 0 chi: 0
        1 6 C chg: 0  deg: 3 exp: 4 imp: 0 hyb: 3 arom?: 0 chi: 0
        2 8 O chg: 0  deg: 1 exp: 2 imp: 0 hyb: 3 arom?: 0 chi: 0
        3 8 O chg: -1  deg: 1 exp: 1 imp: 0 hyb: 0 arom?: 0 chi: 0
Bonds:
        0 0->1 order: 1 conj?: 0 aromatic?: 0
        1 1->2 order: 2 conj?: 1 aromatic?: 0
        2 3->1 order: 1 conj?: 0 aromatic?: 0
>>> print Chem.MolToSmiles(m1[0])
CC(=O)[O-]

You'll have to be somewhat careful about your tautomer fixes (repl 4
and repl 5), but those should also be manageable this way.

The other problem that's likely to bite you relatively soon is the
fact that you aren't updating chemistry information on molecules
between steps. You should at least be updating the valence counts and
ring information.

I've attached a somewhat refactored version of your code that is more
likely to generate correct results.

-greg
#!/usr/bin/python

from rdkit import Chem
from rdkit.Chem import AllChem 

import sys

reagentlist1 = Chem.SmilesMolSupplier(sys.argv[1],'\t',0,1,0)

smi = Chem.SmilesWriter('out.smi')
sdf = Chem.SDWriter('out.sdf')
rules=(
# Carboxylic Acid ionization
   ('[O-]','[$([OH]C(=O))]'),

# Thiocarboxylic Acid ionization
   ('[S-]','[$([SH]C(=O))]'),

# Tetrazole ionization (Tautomer 1)
   ('c1[n-]nnn1','c1[nH]nnn1'),

# Tetrazole ionization (Tautomer 2)
   ('c1[n-]nnn1','c1n[nH]nn1'),

# Aromatic thiol ionization
   ('[S-]','[$([SH]c)]'),

# Sulphate ionization
   ('[O-]','[$([OH]S(=O)(=O)[O,C,N])]'),

# Activated sulphonamide
   ('[N-]','[$([NH]S(=O)(=O)C(F)(F))]'),

# Now the bases

# Amines 1 
   ('[NH3+]','[$([NX3;H2][CX4])]'),

# Amines 2 
   ('[NH2+]','[$([NX3;H1]([CX4])[CX4])]'),

# Amines 3 
   ('[NH1+]','[$([NX3;H0]([CX4])([CX4])[CX4])]'),

# Amidines 1  
   ('[NH2+]','[$([NH1;+0]=[C;!r]([NH2;+0])[CX4,c]),$([NH1;+0]=[C;!r]([NH1;+0][CX4,c])[CX4,c]),$([NH1;+0]=[C;!r]([NH0;+0]([CX4,c])[CX4,c])[CX4,c])]') ,

# Amidines 2  
   ('[NH1+]','[$([NH0;+0]([CX4,c])=[C;!r]([NH2;+0])[CX4,c]),$([NH0;+0]([CX4,c])=[C;!r]([NH1;+0][CX4,c])[CX4,c]),$([NH0;+0]([CX4,c])=[C;!r]([NH0;+0]([CX4,c])[CX4,c])[CX4,c])]') ,

# Guandidines 1  
   ('[NH2+]','[$([NH1;+0]=[C;!r]([NH2;+0])[NX3;H1][CX4,c,#1]),$([NH1;+0]=[C;!r]([NH1;+0][CX4,c])[NX3;H1][CX4,c]),$([NH1;+0]=[C;!r]([NH0;+0]([CX4,c])[CX4,c])[NX3;H1][CX4,c])]') ,

# Guandidines 2  
   ('[NH2+]','[$([NH1;+0]=[C;!r]([NH2;+0])[NX3;H0]([CX4,c])[CX4,c]),$([NH1;+0]=[C;!r]([NH1;+0][CX4,c])[NX3;H0]([CX4,c])[CX4,c]),$([NH1;+0]=[C;!r]([NH0;+0]([CX4,c])[CX4,c])[NX3;H0]([CX4,c])[CX4,c])]') ,

# Guandidines 3  
   ('[NH1+]','[$([NH0;+0]([CX4,c])=[C;!r]([NH2;+0])[NX3;H1][CX4,c,#1]),$([NH0;+0]([CX4,c])=[C;!r]([NH1;+0][CX4,c])[NX3;H1][CX4,c]),$([NH0;+0]([CX4,c])=[C;!r]([NH0;+0]([CX4,c])[CX4,c])[NX3;H1][CX4,c])]') ,

# Guandidines 4  
   ('[NH1+]','[$([NH0;+0]([CX4,c])=[C;!r]([NH2;+0])[NX3;H0]([CX4,c])[CX4,c]),$([NH0;+0]([CX4,c])=[C;!r]([NH1;+0][CX4,c])[NX3;H0]([CX4,c])[CX4,c]),$([NH0;+0]([CX4,c])=[C;!r]([NH0;+0]([CX4,c])[CX4,c])[NX3;H0]([CX4,c])[CX4,c])]') ,

# Miscellaneous (infrequently occurring acids

# Ascobic acid-like 
   ('[O-]','[$([OH]C1=CC(=O)O[CX4]1)]'),

# Tetramic acid-like 
   ('[C-1]','[$([CX4;H2;r]1C(=O)N[CX4]C(=O)1)]'),
)

for reagent1 in reagentlist1:
   print '# ---------------------'
   Chem.SanitizeMol(reagent1)

   for repl,patt in rules:
      repl = Chem.MolFromSmiles(repl,sanitize=False)
      patt = Chem.MolFromSmarts(patt)
      ps = AllChem.ReplaceSubstructs(reagent1,patt,repl,replaceAll=True)
      ps[0].UpdatePropertyCache(False)
      Chem.GetSymmSSSR(ps[0])
      print "Input: " , Chem.MolToSmiles(reagent1), "    Output: ", Chem.MolToSmiles(ps[0])
      reagent1=ps[0]

# Finally, clean up and write smiles and sdf versions
   Chem.SanitizeMol(reagent1)
   AllChem.Compute2DCoords(reagent1)
   smi.write(reagent1)
   sdf.write(reagent1)


------------------------------------------------------------------------------
All of the data generated in your IT infrastructure is seriously valuable.
Why? It contains a definitive record of application performance, security 
threats, fraudulent activity, and more. Splunk takes this data and makes 
sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-d2d-c2
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Reply via email to