Maciek,

Thank you for the resource.  I had actually based my initial troubleshooting 
efforts on that blog post, and in retrospect I should have said so in my 
original post.  Here is the basic code I use to filter my hit list against a 
filter list.

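def get_compound_molfile(Compound_ID):
    # Look up Compound_ID in the global `inhibitors` DataFrame and return the
    # mol object stored in column 21 of the last matching row.
    # Returns an empty list when no row contains the ID.
    imax,jmax = inhibitors.shape
    mol_file = []
    for i in range(imax):
        compound_data = inhibitors.iloc[i,:]
        if Compound_ID in compound_data.values:
            mol_file = inhibitors.iloc[i,21]
    return mol_file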

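def filter_hits(mol_file,filter_list):
    # Return the names (column 1) of all filters whose query mol (column 2)
    # matches mol_file, as a string, or NaN when nothing matches.
    # Needs `from rdkit import Chem` and `import numpy as np`.
    imax,jmax = filter_list.shape
    filter_matches = []
    mfh = Chem.AddHs(mol_file)  # explicit Hs on the hit, added once
    for i in range(imax):
        fcm = filter_list.iloc[i,2]
        # fold explicit [#1] query atoms into their neighbours
        fcmh = Chem.MergeQueryHs(fcm)
        if mfh.HasSubstructMatch(fcmh):
            filter_matches.append(filter_list.iloc[i,1])
    if len(filter_matches)>0:
        return str(filter_matches)
    else:
        return np.nan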

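def filter_hit_list(hit_list, filter_list):
    # Copy the hit list and write each compound's filter matches into
    # the last column of the copy.
    filtered_list = hit_list.copy()
    imax,jmax = hit_list.shape
    for i in range(imax):
        Compound_ID = hit_list.iloc[i,0]
        m = get_compound_molfile(Compound_ID)
        p = filter_hits(m,filter_list)
        # note: str() turns a NaN (no matches) into the string 'nan'
        filtered_list.iloc[i,jmax-1] = str(p)
    return filtered_list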
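
For comparison, the same filtering idea can be sketched without pandas. This is only a rough illustration, not the code I actually run: the function name match_filters and the dict-based inputs (compound ID to SMILES, filter name to SMARTS) are hypothetical. It merges the query Hs on the filter side only and leaves the hit with implicit Hs.

```python
from rdkit import Chem


def match_filters(hits, filters):
    """Map each compound ID to the names of the filters it matches.

    hits:    dict of compound ID -> SMILES  (hypothetical input layout)
    filters: dict of filter name -> SMARTS  (hypothetical input layout)
    """
    # Parse each SMARTS once and fold explicit [#1] query atoms into
    # hydrogen-count constraints on their neighbours.
    queries = {
        name: Chem.MergeQueryHs(Chem.MolFromSmarts(smarts))
        for name, smarts in filters.items()
    }
    results = {}
    for compound_id, smiles in hits.items():
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            # SMILES could not be parsed; record that instead of failing
            results[compound_id] = None
            continue
        results[compound_id] = [
            name for name, query in queries.items()
            if mol.HasSubstructMatch(query)
        ]
    return results


example_hits = {
    "cmpd1": "Cc1ccc(C)n1-c1ccccc1",  # 2,5-dimethyl-1-phenylpyrrole
    "cmpd2": "CCO",                   # ethanol
}
example_filters = {
    "pyrrole_A(118)": "n2(-[#6]:1:[!#1]:[#6]:[#6]:[#6]:[#6]:1)"
                      "c(cc(c2-[#6;X4])-[#1])-[#6;X4]",
}
example_results = match_filters(example_hits, example_filters)
print(example_results)
```

The query mols are used directly for matching here and are never sanitized or converted to SMILES.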

In the second function (filter_hits) I add Hs to the hit compound's mol with 
Chem.AddHs, and I merge the Hs in the filter_list compound's mol with 
Chem.MergeQueryHs.  Since the blog post mentioned in your email showed that 
HasSubstructMatch works when both inputs have their respective hydrogens in 
the structure representation, I decided to cover my bases and make sure I 
wasn't missing any hydrogens from either species.
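
A minimal sketch of the hydrogen handling described above, using a toy query ([#6]-[#1]) and ethane rather than my real data. The first check is expected to fail because the molecule carries only implicit Hs; the other two show the two ways of reconciling the query and the molecule.

```python
from rdkit import Chem

query = Chem.MolFromSmarts("[#6]-[#1]")  # carbon bonded to an explicit H atom
mol = Chem.MolFromSmiles("CC")           # ethane; its hydrogens are implicit

# 1. Raw query against a molecule with implicit Hs: there is no H atom
#    in the molecular graph for [#1] to match.
plain = mol.HasSubstructMatch(query)

# 2. Make the hydrogens explicit on the molecule side.
with_hs = Chem.AddHs(mol).HasSubstructMatch(query)

# 3. Fold the [#1] query atom into an H-count constraint on the carbon,
#    so the query can match a molecule with implicit Hs.
merged = mol.HasSubstructMatch(Chem.MergeQueryHs(query))

print(plain, with_hs, merged)
```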



Christopher R. Bodle

PhD Candidate, University of Iowa

College of Pharmacy

Division of Medicinal and Natural Products Chemistry

115 S. Grand Avenue-Rm. S338

Iowa City, Iowa 52242

(319) 335-7845



________________________________
From: Maciek Wójcikowski [mac...@wojcikowski.pl]
Sent: Wednesday, September 16, 2015 3:22 AM
To: Bodle, Christopher R
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] possible SMARTS translating mistake?

Hi Christopher,

Since you mention Rajarshi's SMARTS, I guess you haven't seen Greg's 
latest revision of the PAINS filters (see 
http://rdkit.blogspot.com.es/2015/08/curating-pains-filters.html). On the other 
hand, I remember Greg saying at the RDKit UGM that some of the filters would 
require changes to RDKit's aromaticity model, and this one seems to be such a 
case (Greg, could you confirm or check?).

Best,
Maciej

2015-09-15 18:48 GMT+02:00 Bodle, Christopher R 
<christopher-bo...@uiowa.edu>:
All,

I am working on filtering code in Python that searches for substructure 
matches between my hit list (in SMILES) and my filter lists (in SMARTS).  My 
current filter lists were copied from Rajarshi Guha's blog at 
http://blog.rguha.net/?p=850.

While working on this I came across the following SMARTS string from the 
p_l150 collection, filter pyrrole_A(118):


n2(-[#6]:1:[!#1]:[#6]:[#6]:[#6]:[#6]:1)c(cc(c2-[#6;X4])-[#1])-[#6;X4]

The problem area in the string is the [!#1] atom.  Although this should be 
interpreted as 'not H', the rendering generated from Chem.MolFromSmarts does 
indeed show a hydrogen in this position.  Because that position is in the 
middle of an aromatic ring, it causes a valency issue, and as a result I can't 
standardize the mol for filtering purposes.

I confirmed this by making the following edit to the SMARTS string:
n2(-[#6]:1:[!#6]:[#6]:[#6]:[#6]:[#6]:1)c(cc(c2-[#6;X4])-[#1])-[#6;X4]

Which results in a carbon in the position of the hydrogen from the original 
SMARTS.  Is this a problem with the SMARTS translator?  Or is there something 
that I am missing?
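
One way to probe this, sketched under the assumption that only the six-membered ring fragment of the SMARTS matters for the question: parse the fragment, inspect what is stored on the [!#1] query atom, and check what the fragment matches when it is used directly as a query, without sanitizing it or converting it to SMILES.

```python
from rdkit import Chem

# Ring fragment taken from the pyrrole_A(118) SMARTS
ring = Chem.MolFromSmarts("[#6]:1:[!#1]:[#6]:[#6]:[#6]:[#6]:1")

# Inspect the second atom: its SMARTS and the atomic number stored on it.
# The stored atomic number is just a label on the query atom; matching is
# governed by the query itself.
not_h = ring.GetAtomWithIdx(1)
print(not_h.GetSmarts(), not_h.GetAtomicNum())


def ring_matches(smiles):
    """Return True if the ring fragment matches the given SMILES."""
    return Chem.MolFromSmiles(smiles).HasSubstructMatch(ring)


benzene_ok = ring_matches("c1ccccc1")      # [!#1] lands on a carbon
pyridine_ok = ring_matches("c1ccncc1")     # [!#1] lands on the nitrogen
cyclohexane_ok = ring_matches("C1CCCCC1")  # no aromatic bonds in the ring
print(benzene_ok, pyridine_ok, cyclohexane_ok)
```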

I believe this happens quite frequently.  When running a standardization code 
for the filter p_l150 (55 compounds) using:

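# Standardizer and standardize_smiles are assumed to come from MolVS;
# Chem is rdkit.Chem.
p_l150['standardized mol']=''
imax,jmax = p_l150.shape
s = Standardizer()  # one instance is enough for the whole loop
for i in range(imax):
    mf = p_l150.loc[i,'mol file']
    try:
        m = Chem.MolToSmiles(mf)
        m2 = standardize_smiles(m)
        m3 = Chem.MolFromSmiles(m2)
        smol = s.standardize(m3)
        p_l150.loc[i,'standardized mol'] = smol
    except Exception as e:
        # report the filter name alongside the error
        print(p_l150.loc[i,'filter'], e)
p_l150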

I get 11 errors, 8 of which are valency errors (7 of those involve hydrogens):


<regId="pyrrole_A(118)"> Sanitization error: Explicit valence for atom # 8 H, 3, is greater than permitted
<regId="imine_one_fives(89)"> Sanitization error: Explicit valence for atom # 3 H, 3, is greater than permitted
<regId="hzone_pipzn(79)"> Sanitization error: Explicit valence for atom # 3 H, 2, is greater than permitted
<regId="hzone_pyrrol(64)"> Sanitization error: Can't kekulize mol
<regId="cyano_pyridone_A(54)"> Sanitization error: Explicit valence for atom # 1 H, 3, is greater than permitted
<regId="het_pyridiniums_A(39)"> Sanitization error: Explicit valence for atom # 5 H, 3, is greater than permitted
<regId="diazox_sulfon_A(36)"> Sanitization error: Explicit valence for atom # 14 C, 5, is greater than permitted
<regId="pyrrole_B(29)"> Sanitization error: Explicit valence for atom # 9 H, 3, is greater than permitted
<regId="thiophene_hydroxy(28)"> Sanitization error: Can't kekulize mol
<regId="imidazole_A(19)"> Sanitization error: Explicit valence for atom # 4 H, 2, is greater than permitted
<regId="het_6_tetrazine(18)"> Sanitization error: Aromatic bonds on non aromatic atom 1


Any insight would be greatly appreciated.


Thank you





------------------------------------------------------------------------------

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

