All,

I am working on filtering code in Python that searches my hit list (in SMILES) for substructure matches against my filter lists (in SMARTS). My current filter lists were copied from Rajarshi Guha's blog at http://blog.rguha.net/?p=850.
While working on this, I ran into a problem with the following SMARTS string from the p_l150 collection, filter pyrrole_A(118):

    n2(-[#6]:1:[!#1]:[#6]:[#6]:[#6]:[#6]:1)c(cc(c2-[#6;X4])-[#1])-[#6;X4]

The problem area is the [!#1] atom. Although this should be interpreted as 'not H', the rendering generated from Chem.MolFromSmarts does show a hydrogen in this position. Because the position is in the middle of an aromatic ring, this causes a valence problem, and as a result I can't standardize the mol for filtering purposes. I confirmed this by making the following edit to the SMARTS string (replacing [!#1] with [!#6]):

    n2(-[#6]:1:[!#6]:[#6]:[#6]:[#6]:[#6]:1)c(cc(c2-[#6;X4])-[#1])-[#6;X4]

which results in a carbon in the position of the hydrogen from the original SMARTS. Is this a problem with the SMARTS translator? Or is there something that I am missing?

I believe this happens quite frequently. When running standardization code for the filter set p_l150 (55 compounds) using:

    from rdkit import Chem
    # Standardizer and standardize_smiles are assumed to come from MolVS
    from molvs import Standardizer, standardize_smiles

    p_l150['standardized mol'] = ''
    imax, jmax = p_l150.shape
    s = Standardizer()
    for i in range(imax):
        mf = p_l150.loc[i, 'mol file']  # mol object for this filter
        try:
            m = Chem.MolToSmiles(mf)
            m2 = standardize_smiles(m)
            m3 = Chem.MolFromSmiles(m2)
            smol = s.standardize(m3)
            p_l150.loc[i, 'standardized mol'] = smol
        except Exception as e:
            # report the filter name together with the error
            print(p_l150.loc[i, 'filter'], e)
    p_l150

I get 11 errors, 8 of which are valence errors (7 of those involve hydrogens):

    <regId="pyrrole_A(118)"> Sanitization error: Explicit valence for atom # 8 H, 3, is greater than permitted
    <regId="imine_one_fives(89)"> Sanitization error: Explicit valence for atom # 3 H, 3, is greater than permitted
    <regId="hzone_pipzn(79)"> Sanitization error: Explicit valence for atom # 3 H, 2, is greater than permitted
    <regId="hzone_pyrrol(64)"> Sanitization error: Can't kekulize mol
    <regId="cyano_pyridone_A(54)"> Sanitization error: Explicit valence for atom # 1 H, 3, is greater than permitted
    <regId="het_pyridiniums_A(39)"> Sanitization error: Explicit valence for atom # 5 H, 3, is greater than permitted
    <regId="diazox_sulfon_A(36)"> Sanitization error: Explicit valence for atom # 14 C, 5, is greater than permitted
    <regId="pyrrole_B(29)"> Sanitization error: Explicit valence for atom # 9 H, 3, is greater than permitted
    <regId="thiophene_hydroxy(28)"> Sanitization error: Can't kekulize mol
    <regId="imidazole_A(19)"> Sanitization error: Explicit valence for atom # 4 H, 2, is greater than permitted
    <regId="het_6_tetrazine(18)"> Sanitization error: Aromatic bonds on non aromatic atom 1

Any insight would be greatly appreciated.

Thank you

Christopher R. Bodle
PhD Candidate, University of Iowa
College of Pharmacy
Division of Medicinal and Natural Products Chemistry
115 S. Grand Avenue-Rm. S338
Iowa City, Iowa 52242
(319) 335-7845
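
P.S. For reference, the error counts quoted above (11 errors, 8 valence, 7 of those on hydrogens) can be checked with a short script. This is only a minimal sketch using the Python standard library; the names ERROR_LINES, VALENCE_RE and tally are illustrative and not part of the original code, and the regular expression simply assumes the message format shown in the output above.

```python
import re
from collections import Counter

# Error lines copied verbatim from the output above.
ERROR_LINES = [
    '<regId="pyrrole_A(118)"> Sanitization error: Explicit valence for atom # 8 H, 3, is greater than permitted',
    '<regId="imine_one_fives(89)"> Sanitization error: Explicit valence for atom # 3 H, 3, is greater than permitted',
    '<regId="hzone_pipzn(79)"> Sanitization error: Explicit valence for atom # 3 H, 2, is greater than permitted',
    '<regId="hzone_pyrrol(64)"> Sanitization error: Can\'t kekulize mol',
    '<regId="cyano_pyridone_A(54)"> Sanitization error: Explicit valence for atom # 1 H, 3, is greater than permitted',
    '<regId="het_pyridiniums_A(39)"> Sanitization error: Explicit valence for atom # 5 H, 3, is greater than permitted',
    '<regId="diazox_sulfon_A(36)"> Sanitization error: Explicit valence for atom # 14 C, 5, is greater than permitted',
    '<regId="pyrrole_B(29)"> Sanitization error: Explicit valence for atom # 9 H, 3, is greater than permitted',
    '<regId="thiophene_hydroxy(28)"> Sanitization error: Can\'t kekulize mol',
    '<regId="imidazole_A(19)"> Sanitization error: Explicit valence for atom # 4 H, 2, is greater than permitted',
    '<regId="het_6_tetrazine(18)"> Sanitization error: Aromatic bonds on non aromatic atom 1',
]

# Assumed message format: captures the element symbol of the offending atom.
VALENCE_RE = re.compile(
    r"Explicit valence for atom # \d+ (\w+), \d+, is greater than permitted"
)


def tally(lines):
    """Count errors by kind, and valence errors by element symbol."""
    kinds = Counter()
    elements = Counter()
    for line in lines:
        match = VALENCE_RE.search(line)
        if match:
            kinds["valence"] += 1
            elements[match.group(1)] += 1
        elif "Can't kekulize mol" in line:
            kinds["kekulize"] += 1
        else:
            kinds["other"] += 1
    return kinds, elements


kinds, elements = tally(ERROR_LINES)
# Total errors, valence errors, valence errors on hydrogen.
print(sum(kinds.values()), kinds["valence"], elements["H"])
```

Grouping the messages this way makes it easier to see that nearly all of the valence errors land on atoms reported as H.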
------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss