All,

I am working on filtering code in Python that searches my hit list (in SMILES) 
for substructure matches against my filter lists (in SMARTS).  My current 
filter lists were copied from Rajarshi Guha's blog at 
http://blog.rguha.net/?p=850.

While doing this I was working with the following SMARTS string from the 
p_l150 collection, filter pyrrole_A(118):


n2(-[#6]:1:[!#1]:[#6]:[#6]:[#6]:[#6]:1)c(cc(c2-[#6;X4])-[#1])-[#6;X4]

The problem area in the string is the [!#1] atom.  Although this should be 
interpreted as 'not H', the molecule generated by Chem.MolFromSmarts does 
indeed have a hydrogen in this position.  Because that position is in the 
middle of an aromatic ring, this causes a valence error, and as a result I 
can't standardize the mol for filtering purposes.

I confirmed this by making the following edit to the SMARTS string:
n2(-[#6]:1:[!#6]:[#6]:[#6]:[#6]:[#6]:1)c(cc(c2-[#6;X4])-[#1])-[#6;X4]

This results in a carbon in the position of the hydrogen from the original 
SMARTS.  Is this a problem with the SMARTS translator?  Or is there something 
that I am missing?
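
For illustration, here is a minimal sketch of how the parsed pattern could be inspected atom by atom, and how it could be matched against a hit without being standardized first. It only uses Chem.MolFromSmarts, Chem.MolFromSmiles, Chem.AddHs and HasSubstructMatch. The helper names and the test SMILES (2,5-dimethyl-1-phenylpyrrole) are hypothetical and not part of my data set. Adding explicit hydrogens to the hit is an assumption I make here because the pattern contains an explicit [#1] atom, and whether matching the raw query like this suits the filtering workflow is also an assumption.

```python
from rdkit import Chem

# SMARTS for pyrrole_A(118), copied from the message above.
PYRROLE_A = "n2(-[#6]:1:[!#1]:[#6]:[#6]:[#6]:[#6]:1)c(cc(c2-[#6;X4])-[#1])-[#6;X4]"


def describe_query_atoms(smarts):
    """Return (index, atomic number, SMARTS fragment) for each query atom."""
    patt = Chem.MolFromSmarts(smarts)
    if patt is None:
        raise ValueError("could not parse SMARTS: %s" % smarts)
    return [(a.GetIdx(), a.GetAtomicNum(), a.GetSmarts()) for a in patt.GetAtoms()]


def matches(smiles, smarts, add_hs=True):
    """Match a SMILES hit against a SMARTS query without sanitizing the query."""
    mol = Chem.MolFromSmiles(smiles)
    patt = Chem.MolFromSmarts(smarts)
    if mol is None or patt is None:
        raise ValueError("could not parse input")
    if add_hs:
        # Explicit [#1] atoms in the query can only match explicit H atoms.
        mol = Chem.AddHs(mol)
    return mol.HasSubstructMatch(patt)


if __name__ == "__main__":
    for idx, atomic_num, fragment in describe_query_atoms(PYRROLE_A):
        print(idx, atomic_num, fragment)
    # Hypothetical hit: 2,5-dimethyl-1-phenylpyrrole
    print(matches("Cc1ccc(C)n1-c1ccccc1", PYRROLE_A))
```

Printing the SMARTS fragment next to the atomic number shows whether the query on the atom (here the negation) is still attached, even if the atomic number by itself reads as hydrogen.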

I believe this happens quite frequently.  When I run the following 
standardization code on the filter set p_l150 (55 compounds):

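from rdkit import Chem
# Assumption: Standardizer and standardize_smiles come from MolVS
from molvs import Standardizer, standardize_smiles

p_l150['standardized mol'] = ''
imax, jmax = p_l150.shape
s = Standardizer()  # one instance can be reused for every row
for i in range(imax):
    mf = p_l150.loc[i, 'mol file']  # mol object stored for this filter
    try:
        m = Chem.MolToSmiles(mf)
        m2 = standardize_smiles(m)
        m3 = Chem.MolFromSmiles(m2)
        smol = s.standardize(m3)
        p_l150.loc[i, 'standardized mol'] = smol
    except Exception as e:
        # report the filter name together with the error message
        print(p_l150.loc[i, 'filter'], e)
p_l150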

I return 11 errors, 8 of which are valency (7 of those involve hydrogens):


<regId="pyrrole_A(118)"> Sanitization error: Explicit valence for atom # 8 H, 3, is greater than permitted
<regId="imine_one_fives(89)"> Sanitization error: Explicit valence for atom # 3 H, 3, is greater than permitted
<regId="hzone_pipzn(79)"> Sanitization error: Explicit valence for atom # 3 H, 2, is greater than permitted
<regId="hzone_pyrrol(64)"> Sanitization error: Can't kekulize mol
<regId="cyano_pyridone_A(54)"> Sanitization error: Explicit valence for atom # 1 H, 3, is greater than permitted
<regId="het_pyridiniums_A(39)"> Sanitization error: Explicit valence for atom # 5 H, 3, is greater than permitted
<regId="diazox_sulfon_A(36)"> Sanitization error: Explicit valence for atom # 14 C, 5, is greater than permitted
<regId="pyrrole_B(29)"> Sanitization error: Explicit valence for atom # 9 H, 3, is greater than permitted
<regId="thiophene_hydroxy(28)"> Sanitization error: Can't kekulize mol
<regId="imidazole_A(19)"> Sanitization error: Explicit valence for atom # 4 H, 2, is greater than permitted
<regId="het_6_tetrazine(18)"> Sanitization error: Aromatic bonds on non aromatic atom 1
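
The counts quoted above (11 errors, 8 valence errors, 7 of them on hydrogens) can be checked with a small tally over the messages. This is only a sketch using the standard library. The regular expression and the category names are my own assumptions based on the wording of the messages listed here, not an RDKit API.

```python
import re
from collections import Counter

# The eleven messages from the run above, one per line.
ERROR_LOG = """
<regId="pyrrole_A(118)"> Sanitization error: Explicit valence for atom # 8 H, 3, is greater than permitted
<regId="imine_one_fives(89)"> Sanitization error: Explicit valence for atom # 3 H, 3, is greater than permitted
<regId="hzone_pipzn(79)"> Sanitization error: Explicit valence for atom # 3 H, 2, is greater than permitted
<regId="hzone_pyrrol(64)"> Sanitization error: Can't kekulize mol
<regId="cyano_pyridone_A(54)"> Sanitization error: Explicit valence for atom # 1 H, 3, is greater than permitted
<regId="het_pyridiniums_A(39)"> Sanitization error: Explicit valence for atom # 5 H, 3, is greater than permitted
<regId="diazox_sulfon_A(36)"> Sanitization error: Explicit valence for atom # 14 C, 5, is greater than permitted
<regId="pyrrole_B(29)"> Sanitization error: Explicit valence for atom # 9 H, 3, is greater than permitted
<regId="thiophene_hydroxy(28)"> Sanitization error: Can't kekulize mol
<regId="imidazole_A(19)"> Sanitization error: Explicit valence for atom # 4 H, 2, is greater than permitted
<regId="het_6_tetrazine(18)"> Sanitization error: Aromatic bonds on non aromatic atom 1
"""

# Captures atom index, element symbol and reported valence.
VALENCE_RE = re.compile(
    r"Explicit valence for atom # (\d+) (\w+), (\d+), is greater than permitted"
)


def tally_errors(log):
    """Count error messages by kind; valence errors on H are also counted separately."""
    counts = Counter()
    for line in log.splitlines():
        line = line.strip()
        if not line:
            continue
        counts["total"] += 1
        match = VALENCE_RE.search(line)
        if match:
            counts["valence"] += 1
            if match.group(2) == "H":
                counts["valence_on_H"] += 1
        elif "Can't kekulize" in line:
            counts["kekulize"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    for kind, n in sorted(tally_errors(ERROR_LOG).items()):
        print(kind, n)
```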


Any insight would be greatly appreciated.


Thank you


Christopher R. Bodle

PhD Candidate, University of Iowa

College of Pharmacy

Division of Medicinal and Natural Products Chemistry

115 S. Grand Avenue-Rm. S338

Iowa City, Iowa 52242

(319) 335-7845


------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
