Maciek,
Thank you for the resource. I had actually based my initial troubleshooting
efforts on that blog post. In retrospect, I should have included that
information in my original post. Here is the basic code I use to filter my
hit list against a filter list:
import numpy as np
from rdkit import Chem

# `inhibitors` is a module-level pandas DataFrame; column 21 is expected
# to hold the RDKit mol for each compound.
def get_compound_molfile(Compound_ID):
    imax, jmax = inhibitors.shape
    mol_file = []
    for i in range(imax):
        compound_data = inhibitors.iloc[i, :]
        # Keep the mol from the last row that contains this compound ID.
        if Compound_ID in compound_data.values:
            mol_file = inhibitors.iloc[i, 21]
    return mol_file

def filter_hits(mol_file, filter_list):
    imax, jmax = filter_list.shape
    filter_matches = []
    # Add explicit Hs to the hit compound once, outside the loop.
    mfh = Chem.AddHs(mol_file)
    for i in range(imax):
        fcm = filter_list.iloc[i, 2]    # filter compound mol
        fcmh = Chem.MergeQueryHs(fcm)   # merge explicit query Hs
        if mfh.HasSubstructMatch(fcmh):
            filter_matches.append(filter_list.iloc[i, 1])
    if len(filter_matches) > 0:
        return str(filter_matches)
    else:
        return np.nan

def filter_hit_list(hit_list, filter_list):
    filterd_list = hit_list.copy()
    imax, jmax = hit_list.shape
    for i in range(imax):
        Compound_ID = hit_list.iloc[i, 0]
        m = get_compound_molfile(Compound_ID)
        p = filter_hits(m, filter_list)
        # Store the matching filter names in the last column.
        filterd_list.iloc[i, jmax - 1] = str(p)
    return filterd_list
In the second function (filter_hits) I add Hs to the hit compound mol_file with
Chem.AddHs, and I merge the Hs of the filter_list compound mol_file with
Chem.MergeQueryHs. Since the blog post mentioned in your email showed that the
HasSubstructMatch function works when both inputs have their respective
hydrogens in the structure representation, I decided to cover my bases and make
sure I wasn't missing any hydrogens from either species.
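
The effect of the two hydrogen-handling calls can be seen on a toy case. The sketch below is only an illustration: benzene and the pattern c-[#1] are stand-ins chosen here, not entries from the hit list or the filter list. It compares matching an explicit-H pattern against a molecule with and without Chem.AddHs, and then matching the Chem.MergeQueryHs version of the same pattern.

```python
from rdkit import Chem

# Illustrative molecule and pattern (assumptions, not taken from the lists).
benzene = Chem.MolFromSmiles("c1ccccc1")
query = Chem.MolFromSmarts("c-[#1]")  # aromatic carbon bonded to an explicit H atom

# AddHs turns the implicit hydrogens into real atoms in the molecular graph.
benzene_h = Chem.AddHs(benzene)

# MergeQueryHs folds explicit H atoms in the query into a hydrogen-count
# constraint on the neighbouring query atom.
merged = Chem.MergeQueryHs(query)

no_h_match = benzene.HasSubstructMatch(query)        # explicit-H query, no H atoms in mol
explicit_match = benzene_h.HasSubstructMatch(query)  # explicit-H query, H atoms in mol
merged_match = benzene.HasSubstructMatch(merged)     # merged query, no H atoms in mol

print("explicit-H query vs. mol without Hs:", no_h_match)
print("explicit-H query vs. mol with Hs:   ", explicit_match)
print("merged query vs. mol without Hs:    ", merged_match)
print("atoms in merged query:", merged.GetNumAtoms())
```

If the merged query already matches the molecule without added hydrogens, then applying both Chem.AddHs and Chem.MergeQueryHs is a belt-and-braces choice rather than a requirement.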
Christopher R. Bodle
PhD Candidate, University of Iowa
College of Pharmacy
Division of Medicinal and Natural Products Chemistry
115 S. Grand Avenue-Rm. S338
Iowa City, Iowa 52242
(319) 335-7845
________________________________
From: Maciek Wójcikowski [mac...@wojcikowski.pl]
Sent: Wednesday, September 16, 2015 3:22 AM
To: Bodle, Christopher R
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] possible SMARTS translating mistake?
Hi Christopher,
Since you're mentioning Rajarshi's SMARTS, I guess that you haven't seen Greg's
latest revision of the PAINS filters (see
http://rdkit.blogspot.com.es/2015/08/curating-pains-filters.html). On the other
hand, I remember Greg saying during the RDKit UGM that some of the filters would
require changes to RDKit's aromaticity model, and this one seems to be such a
case (Greg might confirm/check?).
Best,
Maciej
2015-09-15 18:48 GMT+02:00 Bodle, Christopher R
<christopher-bo...@uiowa.edu>:
All,
I am working on filtering code in Python to search for substructure matches
between my hit list (in SMILES) and my filter lists (in SMARTS). My current
filter lists were copied from Rajarshi Guha's blog at
http://blog.rguha.net/?p=850.
While working on this I was using the following SMARTS string from the
p_l150 collection, filter pyrrole_A(118):
n2(-[#6]:1:[!#1]:[#6]:[#6]:[#6]:[#6]:1)c(cc(c2-[#6;X4])-[#1])-[#6;X4]
The problem area in the string is the [!#1] atom. Although this should be
interpreted as 'not H', the rendering generated from Chem.MolFromSmarts does
indeed show a hydrogen at this position. Because that position is in the middle
of an aromatic ring, this causes a valence error, and as such I can't
standardize the mol for filtering purposes.
I confirmed this by making the following edit to the SMARTS string:
n2(-[#6]:1:[!#6]:[#6]:[#6]:[#6]:[#6]:1)c(cc(c2-[#6;X4])-[#1])-[#6;X4]
This results in a carbon at the position of the hydrogen from the original
SMARTS. Is this a problem with the SMARTS translator? Or is there something
that I am missing?
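
One way to check how the pattern is interpreted for matching, independent of how it is drawn, is to match [!#1] against a molecule that has explicit hydrogens. The sketch below is only an illustration under that assumption: methane is a toy molecule chosen here, and the printed atomic number is shown purely for inspection of what the query atom reports.

```python
from rdkit import Chem

# The single query atom under discussion.
patt = Chem.MolFromSmarts("[!#1]")
query_atom = patt.GetAtomWithIdx(0)

# Inspect what the query atom reports about itself.
print("SMARTS:", query_atom.GetSmarts())
print("reported atomic number:", query_atom.GetAtomicNum())

# Toy molecule (an assumption for illustration): methane with explicit Hs,
# so both hydrogen and non-hydrogen atoms are available to match.
methane_h = Chem.AddHs(Chem.MolFromSmiles("C"))
matches = methane_h.GetSubstructMatches(patt)

# Collect the element symbols of every atom the pattern matched.
matched_symbols = [methane_h.GetAtomWithIdx(idx).GetSymbol() for (idx,) in matches]
print("matched atoms:", matched_symbols)
```

If only the carbon is matched, the pattern is behaving as 'not H' in substructure searches, whatever element the rendering displays.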
I believe this happens quite frequently. When running a standardization code
for the filter p_l150 (55 compounds) using:
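# Assumption: Standardizer and standardize_smiles come from MolVS.
from rdkit import Chem
from molvs import Standardizer, standardize_smiles

p_l150['standardized mol'] = ''
imax, jmax = p_l150.shape
s = Standardizer()
for i in range(imax):
    mf = p_l150.loc[i, 'mol file']
    try:
        m = Chem.MolToSmiles(mf)
        m2 = standardize_smiles(m)
        m3 = Chem.MolFromSmiles(m2)
        smol = s.standardize(m3)
        p_l150.loc[i, 'standardized mol'] = smol
    except Exception as e:
        # Report the filter name together with the error.
        print(p_l150.loc[i, 'filter'], e)
p_l150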
I get 11 errors, 8 of which are valence errors (7 of those involve hydrogens):
<regId="pyrrole_A(118)"> Sanitization error: Explicit valence for atom # 8 H, 3, is greater than permitted
<regId="imine_one_fives(89)"> Sanitization error: Explicit valence for atom # 3 H, 3, is greater than permitted
<regId="hzone_pipzn(79)"> Sanitization error: Explicit valence for atom # 3 H, 2, is greater than permitted
<regId="hzone_pyrrol(64)"> Sanitization error: Can't kekulize mol
<regId="cyano_pyridone_A(54)"> Sanitization error: Explicit valence for atom # 1 H, 3, is greater than permitted
<regId="het_pyridiniums_A(39)"> Sanitization error: Explicit valence for atom # 5 H, 3, is greater than permitted
<regId="diazox_sulfon_A(36)"> Sanitization error: Explicit valence for atom # 14 C, 5, is greater than permitted
<regId="pyrrole_B(29)"> Sanitization error: Explicit valence for atom # 9 H, 3, is greater than permitted
<regId="thiophene_hydroxy(28)"> Sanitization error: Can't kekulize mol
<regId="imidazole_A(19)"> Sanitization error: Explicit valence for atom # 4 H, 2, is greater than permitted
<regId="het_6_tetrazine(18)"> Sanitization error: Aromatic bonds on non aromatic atom 1
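
The counts quoted above can be reproduced from the messages themselves. The following is a small sketch using only the standard library; the `classify` helper and the regular expression are hypothetical names introduced here for illustration, and the messages are copied from the list above.

```python
import re
from collections import Counter

# (filter name, sanitization message) pairs copied from the error list.
errors = [
    ("pyrrole_A(118)", "Explicit valence for atom # 8 H, 3, is greater than permitted"),
    ("imine_one_fives(89)", "Explicit valence for atom # 3 H, 3, is greater than permitted"),
    ("hzone_pipzn(79)", "Explicit valence for atom # 3 H, 2, is greater than permitted"),
    ("hzone_pyrrol(64)", "Can't kekulize mol"),
    ("cyano_pyridone_A(54)", "Explicit valence for atom # 1 H, 3, is greater than permitted"),
    ("het_pyridiniums_A(39)", "Explicit valence for atom # 5 H, 3, is greater than permitted"),
    ("diazox_sulfon_A(36)", "Explicit valence for atom # 14 C, 5, is greater than permitted"),
    ("pyrrole_B(29)", "Explicit valence for atom # 9 H, 3, is greater than permitted"),
    ("thiophene_hydroxy(28)", "Can't kekulize mol"),
    ("imidazole_A(19)", "Explicit valence for atom # 4 H, 2, is greater than permitted"),
    ("het_6_tetrazine(18)", "Aromatic bonds on non aromatic atom 1"),
]

# Captures atom index, element symbol and reported valence.
VALENCE_RE = re.compile(r"Explicit valence for atom # (\d+) (\w+), (\d+),")

def classify(message):
    """Return (kind, element); element is None for non-valence errors."""
    m = VALENCE_RE.search(message)
    if m:
        return "valence", m.group(2)
    if "kekulize" in message:
        return "kekulize", None
    return "other", None

kinds = Counter()
valence_elements = Counter()
for name, message in errors:
    kind, element = classify(message)
    kinds[kind] += 1
    if element is not None:
        valence_elements[element] += 1

print(len(errors), "errors in total")
print(kinds["valence"], "valence errors,", valence_elements["H"], "of them on hydrogen")
```

Grouping the failures this way makes it easy to see which filters share the hydrogen valence problem and which fail for other reasons.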
Any insight would be greatly appreciated.
Thank you
Christopher R. Bodle
------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss