Hi Christopher,
Since you're mentioning Rajarshi's SMARTS, I guess that you haven't seen
Greg's latest revision of the PAINS filters (see
http://rdkit.blogspot.com.es/2015/08/curating-pains-filters.html). On the
other hand, during the RDKit UGM I remember Greg saying that some of the
filters would require changes to RDKit's aromaticity model, and this one seems
to be such a case (Greg, could you confirm?).
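For reference, a minimal sketch of screening a hit against RDKit's bundled PAINS filter catalog instead of hand-copied SMARTS. It assumes an RDKit build that ships `rdkit.Chem.FilterCatalog` with a PAINS catalog; whether that catalog is identical to the curated set in the blog post is not checked here. The `pains_hits` helper is hypothetical and the SMILES is only an illustration.

```python
from rdkit import Chem
from rdkit.Chem.FilterCatalog import FilterCatalog, FilterCatalogParams

# Build a catalog holding RDKit's bundled PAINS filter families
params = FilterCatalogParams()
params.AddCatalog(FilterCatalogParams.FilterCatalogs.PAINS)
catalog = FilterCatalog(params)


def pains_hits(smiles):
    """Return descriptions of the PAINS entries matching a SMILES string,
    or None if the SMILES cannot be parsed (hypothetical helper)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return [entry.GetDescription() for entry in catalog.GetMatches(mol)]


# Illustrative hit: 2,5-dimethyl-1-phenylpyrrole
print(pains_hits("Cc1ccc(C)n1-c1ccccc1"))
```

Only the hit is parsed as a real molecule here; the filter patterns live inside the catalog and are never sanitized.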
Best,
Maciej
2015-09-15 18:48 GMT+02:00 Bodle, Christopher R <christopher-bo...@uiowa.edu>:
> All,
>
> I am working on a filtering script in Python that searches for substructure
> matches between my hit list (in SMILES) and my filter lists (in SMARTS).
> My current filter lists were copied from Rajarshi Guha's blog at
> http://blog.rguha.net/?p=850.
>
> While working on this I ran into the following SMARTS string from
> the p_l150 collection, filter pyrrole_A(118):
>
> n2(-[#6]:1:[!#1]:[#6]:[#6]:[#6]:[#6]:1)c(cc(c2-[#6;X4])-[#1])-[#6;X4]
>
>
> The problem area is the [!#1] atom in the six-membered ring. Although it
> should be interpreted as 'not H', the rendering generated from
> Chem.MolFromSmarts does indeed put a hydrogen in this position, which is in
> the middle of an aromatic ring. That causes a valence error, so I can't
> standardize the mol for filtering purposes.
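One way to see what the parser stored for the `[!#1]` atom (index 2 in parse order) is to print each query atom's own SMARTS next to its atomic number. This is a diagnostic sketch only; the atomic number reported for a negated query may differ between RDKit versions, so no output is shown.

```python
from rdkit import Chem

# pyrrole_A(118) SMARTS from the message above
SMARTS = ("n2(-[#6]:1:[!#1]:[#6]:[#6]:[#6]:[#6]:1)"
          "c(cc(c2-[#6;X4])-[#1])-[#6;X4]")
patt = Chem.MolFromSmarts(SMARTS)

# Every atom parsed from SMARTS is a query atom. GetSmarts() shows the
# query it carries; GetAtomicNum() is a plain integer and cannot express
# a negation such as "not hydrogen".
for atom in patt.GetAtoms():
    print(atom.GetIdx(), atom.HasQuery(), atom.GetAtomicNum(), atom.GetSmarts())
```

If the atomic number printed for index 2 is 1, that single integer is what a drawing or a SMILES round trip sees, which would be consistent with the hydrogen described above.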
>
> I confirmed this by making the following edit to the SMARTS string
> ([!#1] changed to [!#6]):
> n2(-[#6]:1:[!#6]:[#6]:[#6]:[#6]:[#6]:1)c(cc(c2-[#6;X4])-[#1])-[#6;X4]
>
> This puts a carbon in the position where the hydrogen appeared with the
> original SMARTS. Is this a problem with the SMARTS parser, or is there
> something that I am missing?
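A minimal sketch of using the filters purely as queries, which sidesteps sanitizing or standardizing the SMARTS altogether: only the hit SMILES become real molecules, and each pattern from `Chem.MolFromSmarts` goes straight to `HasSubstructMatch`. The `FILTERS` dict and `matching_filters` helper are hypothetical, and the example hit (2,5-dimethyl-1-phenylpyrrole) is only an illustration. Because pyrrole_A(118) contains an explicit `[#1]` atom, the hit is given explicit hydrogens with `Chem.AddHs`; implicit hydrogens are not atoms, so without that step the pattern cannot match.

```python
from rdkit import Chem

# Hypothetical filter table: name -> SMARTS (pyrrole_A(118) from above)
FILTERS = {
    "pyrrole_A(118)": ("n2(-[#6]:1:[!#1]:[#6]:[#6]:[#6]:[#6]:1)"
                       "c(cc(c2-[#6;X4])-[#1])-[#6;X4]"),
}


def matching_filters(smiles, filters):
    """Return the names of filters whose SMARTS matches the hit,
    or None if the hit SMILES cannot be parsed."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    # Explicit H atoms are needed for [#1] in a pattern to find a partner
    mol = Chem.AddHs(mol)
    hits = []
    for name, smarts in filters.items():
        patt = Chem.MolFromSmarts(smarts)  # a query, never sanitized
        if patt is not None and mol.HasSubstructMatch(patt):
            hits.append(name)
    return hits


print(matching_filters("Cc1ccc(C)n1-c1ccccc1", FILTERS))  # ['pyrrole_A(118)']
print(matching_filters("c1ccccc1", FILTERS))  # []
```

Calling `Chem.AddHs` is only required for patterns that spell out `[#1]`; it is done for every hit here just to keep the loop simple.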
>
> I believe this happens quite frequently. When I run the following
> standardization code over the p_l150 filter set (55 patterns):
>
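> from rdkit import Chem
> # Assumption: Standardizer and standardize_smiles come from MolVS
> from molvs import Standardizer, standardize_smiles
>
> # p_l150 is a pandas DataFrame built earlier: 'filter' holds the filter
> # name, 'mol file' the RDKit mol parsed from its SMARTS
> p_l150['standardized mol'] = ''
> s = Standardizer()  # one instance is enough for every row
> for i in p_l150.index:
>     mf = p_l150.loc[i, 'mol file']
>     try:
>         m = Chem.MolToSmiles(mf)
>         m2 = standardize_smiles(m)
>         m3 = Chem.MolFromSmiles(m2)
>         p_l150.loc[i, 'standardized mol'] = s.standardize(m3)
>     except Exception as e:
>         # the sanitization errors listed below are reported here
>         print(p_l150.loc[i, 'filter'], e)
> p_l150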
>
> I get 11 errors, 8 of which are valence errors (7 of those involve hydrogens):
>
> <regId="pyrrole_A(118)"> Sanitization error: Explicit valence for atom # 8 H, 3, is greater than permitted
> <regId="imine_one_fives(89)"> Sanitization error: Explicit valence for atom # 3 H, 3, is greater than permitted
> <regId="hzone_pipzn(79)"> Sanitization error: Explicit valence for atom # 3 H, 2, is greater than permitted
> <regId="hzone_pyrrol(64)"> Sanitization error: Can't kekulize mol
> <regId="cyano_pyridone_A(54)"> Sanitization error: Explicit valence for atom # 1 H, 3, is greater than permitted
> <regId="het_pyridiniums_A(39)"> Sanitization error: Explicit valence for atom # 5 H, 3, is greater than permitted
> <regId="diazox_sulfon_A(36)"> Sanitization error: Explicit valence for atom # 14 C, 5, is greater than permitted
> <regId="pyrrole_B(29)"> Sanitization error: Explicit valence for atom # 9 H, 3, is greater than permitted
> <regId="thiophene_hydroxy(28)"> Sanitization error: Can't kekulize mol
> <regId="imidazole_A(19)"> Sanitization error: Explicit valence for atom # 4 H, 2, is greater than permitted
> <regId="het_6_tetrazine(18)"> Sanitization error: Aromatic bonds on non aromatic atom 1
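The counts quoted above (11 errors, 8 valence, 7 on hydrogen) can be reproduced mechanically, which may help when re-running the loop over larger filter sets. A small pure-Python sketch; the `ERRORS` list and `tally` helper are hypothetical names, and the messages are copied from the list above with the common "Sanitization error: " prefix dropped.

```python
import re
from collections import Counter

# (filter, message) pairs copied from the error list above
ERRORS = [
    ("pyrrole_A(118)", "Explicit valence for atom # 8 H, 3, is greater than permitted"),
    ("imine_one_fives(89)", "Explicit valence for atom # 3 H, 3, is greater than permitted"),
    ("hzone_pipzn(79)", "Explicit valence for atom # 3 H, 2, is greater than permitted"),
    ("hzone_pyrrol(64)", "Can't kekulize mol"),
    ("cyano_pyridone_A(54)", "Explicit valence for atom # 1 H, 3, is greater than permitted"),
    ("het_pyridiniums_A(39)", "Explicit valence for atom # 5 H, 3, is greater than permitted"),
    ("diazox_sulfon_A(36)", "Explicit valence for atom # 14 C, 5, is greater than permitted"),
    ("pyrrole_B(29)", "Explicit valence for atom # 9 H, 3, is greater than permitted"),
    ("thiophene_hydroxy(28)", "Can't kekulize mol"),
    ("imidazole_A(19)", "Explicit valence for atom # 4 H, 2, is greater than permitted"),
    ("het_6_tetrazine(18)", "Aromatic bonds on non aromatic atom 1"),
]

# Captures the element symbol that follows the atom index
VALENCE = re.compile(r"Explicit valence for atom # \d+ ([A-Z][a-z]?),")


def tally(errors):
    """Count valence errors overall, valence errors per element, and the rest."""
    counts = Counter()
    for _name, message in errors:
        match = VALENCE.match(message)
        if match:
            counts["valence"] += 1
            counts["valence on " + match.group(1)] += 1
        else:
            counts["other"] += 1
    return counts


counts = tally(ERRORS)
print(len(ERRORS), counts["valence"], counts["valence on H"])  # 11 8 7
```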
>
>
> Any insight would be greatly appreciated.
>
> Thank you
>
> Christopher R. Bodle
> PhD Candidate, University of Iowa
> College of Pharmacy
> Division of Medicinal and Natural Products Chemistry
> 115 S. Grand Avenue-Rm. S338
> Iowa City, Iowa 52242
> (319) 335-7845
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>