Hi David,

Thanks for pointing out the distinction between “X<number>” and “D<number>”, 
which I missed.  So, consistent with the Gillet et al. definition, the AZ 
definition also excludes pyrrole. 
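
The difference is easy to check in RDKit. The following is a minimal sketch, assuming RDKit is installed; the helper name `matches` is only for illustration and is not part of RDKit.

```python
from rdkit import Chem


def matches(smarts, smiles):
    """Return True if the SMARTS pattern matches anywhere in the molecule."""
    patt = Chem.MolFromSmarts(smarts)
    mol = Chem.MolFromSmiles(smiles)
    return mol.HasSubstructMatch(patt)


molecules = {
    "pyridine": "n1ccccc1",
    "pyrrole": "[nH]1cccc1",
    "N-methylpyrrole": "Cn1cccc1",
}

# X counts all connections, including implicit hydrogens;
# D counts only explicit connections (here, heavy-atom neighbours).
for name, smiles in molecules.items():
    for smarts in ("[nX2]", "[nD2]", "[nX3]"):
        print(name, smarts, matches(smarts, smiles))
```

[nX2] should match only pyridine, whereas [nD2] should match pyridine and pyrrole but not N-methylpyrrole.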

Sorry for the confusion.

Cheers,

Konrad 


> On 08 May 2016, at 18:19, David Cosgrove <davidacosgrov...@gmail.com> wrote:
> 
> Hi Konrad,
> 
> From the Daylight definition of SMARTS 
> (http://www.daylight.com/dayhtml_tutorials/languages/smarts/index.html): 
> '"X<number>" defines an atom that is connected to <number> other atoms 
> (including all hydrogens)', so [nX2] would not match pyrrole.  To match a 
> pyrrole nitrogen, should you want to, you'd need something like '[nX3;H]'.  
> There is another atom descriptor, D, which means explicit connection count 
> (i.e. not counting implicit hydrogens), such that [nD2] would, indeed, match 
> both pyridine and pyrrole.  It's not mentioned in the Daylight page (not sure 
> why that would be, it's standard syntax; see, for example, 
> https://www.ics.uci.edu/~dock/manuals/oechem/pyprog/node245.html).
> 
> Hope this helps,
> Dave
> 
> 
> 
> Cheers,
> Dave
> 
> 
> On Sat, May 7, 2016 at 10:34 AM, Konrad Koehler <konrad.koeh...@icloud.com> wrote:
> Hi David,
> 
> Thanks for your helpful response.  Your definitions are very thorough, but 
> probably overkill for my current needs.  I will certainly keep them in mind 
> for the future.  The implementation that you mentioned also seems to have 
> overlooked pyrroles:
> 
> https://github.com/OpenEye-Contrib/Triphic/blob/master/test_dir/test.smt
> 
> Ac4     [nX2] allows pyridines (OK) but also pyrroles (not IMHO OK).
> 
> For now, I just needed something that works for my current dataset.  My 
> dataset contains a large number of indoles and I was surprised that the 
> Gobbi_Pharm2D Acceptor SMARTS definition treats indoles as acceptors.  
> Modifying the definition in my local installation of RDKit gave random forest 
> models of equal statistical quality, but the pharmacophores with the highest 
> “feature_importances” were, in my opinion, much more intuitive.
> 
> It was also suggested that the Gobbi SMARTS definitions were based on 
> experimental data.  I have now carefully read the Gobbi paper and there is no 
> indication that the pharmacophore definitions themselves were based on 
> experimental data, only that the pharmacophores were able to enrich active 
> molecules in simulated selection experiments, which of course is a valuable 
> test.  It would be useful to rerun these tests to see how sensitive 
> the enrichments are to the pharmacophore definitions.
> 
> Cheers,
> 
> Konrad
>> On 01 May 2016, at 17:13, David Cosgrove <davidacosgrov...@gmail.com> wrote:
>> 
>> Hi Konrad et al.,
>> 
>> In the process of taking redundancy/early retirement from AstraZeneca this 
>> year, I was allowed to publish various bits of code I had written over my 25 
>> years there.  Embedded within them are several versions of the SMARTS 
>> definitions we used for defining pharmacophore features.  They were written 
>> by Pete Kenny originally, based on experimental studies by Jeff Morris, 
>> Peter Taylor and others, published in the early 80s (I think; I don't 
>> have a reference).  They have been refined over the intervening years.  There 
>> is a file at 
>> https://github.com/OpenEye-Contrib/SMG/blob/9f9af38be266ff7dc46e9a33226981e866f9fdd9/test_dir/test.smt
>> which gives definitions assuming neutral species on input, and attempts to 
>> predict ionisation at physiological pH, and a file at 
>> https://github.com/OpenEye-Contrib/Triphic/blob/master/test_dir/test.smt 
>> which assumes an ionisation model has already been applied to the input 
>> structures.  The latter is therefore somewhat simpler.  The format is a bit 
>> fiddly to use, but is described fairly clearly in the first file.  The main 
>> issue is that they use what Daylight used to call vector bindings, which 
>> these days we might term macros, and these need to be expanded before use.  
>> These files are intended to be used with OpenEye's OEChem toolkit, which has 
>> a function (OESmartsLexReplace) that does the expansion; in RDKit you would 
>> have to write your own.  Some of the SMARTS definitions become very large in 
>> expanded format; they're a lot easier to understand with the macros in place.
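
For RDKit users, the expansion step could be sketched along the following lines. This is only an illustration under stated assumptions: macros are taken to be referenced as `$name` and replaced purely textually, and the function name `expand_macros`, the `max_passes` parameter and the example macro table are all hypothetical. The real file format is described in the first .smt file and is not reproduced here.

```python
import re

# Matches "$name" but not recursive SMARTS, which starts with "$(".
MACRO_REF = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def expand_macros(smarts, macros, max_passes=20):
    """Textually replace every $name in `smarts` with macros[name].

    Definitions may refer to other macros, so substitution is repeated
    until nothing changes. Raises KeyError for an undefined name and
    ValueError if expansion has not settled after `max_passes` passes.
    """
    def substitute(match):
        return macros[match.group(1)]

    for _ in range(max_passes):
        expanded = MACRO_REF.sub(substitute, smarts)
        if expanded == smarts:
            return expanded
        smarts = expanded
    raise ValueError("macro expansion did not settle (circular definition?)")


# Hypothetical macro table, not taken from the .smt files.
macros = {
    "hal": "F,Cl,Br,I",
    "not_acc": "#6,$hal,o,s,nX3",
}
print(expand_macros("[!$([$not_acc])]", macros))
```

With this table the pattern expands to `[!$([#6,F,Cl,Br,I,o,s,nX3])]`, which can then be passed to Chem.MolFromSmarts. Note that the name match is greedy, so a macro reference followed directly by letters or digits would be misread as a longer name.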
>> 
>> I think they address the concerns you mention in your emails, although IIRC 
>> they don't handle aromatic c-H bonds as donors.
>> 
>> If you have an OpenEye OEChem license, you might find the programs I 
>> deposited at https://github.com/OpenEye-Contrib useful.  One of them, for 
>> example, 
>> contains code for applying a tautomer/ionisation model that you could use to 
>> prepare input structures to use with the second SMARTS file above.  There 
>> are also programs for doing pharmacophore searches of large databases of 
>> conformations using these pharmacophore definitions.
>> 
>> Hope this helps,
>> Dave
>> 
>> 
>> On Sun, May 1, 2016 at 11:08 AM, Konrad Koehler <konrad.koeh...@icloud.com> wrote:
>> Hi Greg,
>> 
>> Digging around a bit more, I noticed there are at least two published SMARTS 
>> definitions of hydrogen bond acceptor.  The first by Gillet et al. (1998, 
>> see below) that is also found on the Daylight web site and the second by 
>> Gobbi et al. (1998). It appears that both versions are deficient for 
>> different reasons.  Gillet et al. exclude pyrrole nitrogen atoms but not 
>> amide nitrogen atoms whereas Gobbi et al. do the reverse.  I don’t think 
>> either omission was intentional, but rather an oversight.
>> 
>> Both amide nitrogen atoms and pyrrole nitrogen atoms should be excluded from 
>> the definitions for precisely the same reason: the nitrogen lone pair is 
>> delocalized into the adjacent pi system and is not available for hydrogen 
>> bonding.  I also agree with Gillet that halogens and aromatic oxygens and 
>> sulfurs should be excluded, since these are exceedingly weak hydrogen bond 
>> acceptors.
>> 
>> I also noticed that the HAcceptorSmarts descriptor from Chem.Lipinski 
>> uses the Gobbi definition, but is implemented slightly differently.  In this 
>> version, a pyrrole with a hydrogen attached to the nitrogen atom is excluded 
>> (as it should be), but N-alkyl pyrroles are not (this IMHO is incorrect).
>> 
>> What I think is needed is a hybrid definition that corrects both 
>> deficiencies.  I am not sure what to call it.  Perhaps Gillet/Gobbi?
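
One possible shape for such a hybrid, as a sketch only: take the Daylight/Gillet pattern quoted below and additionally exclude any nitrogen singly bonded to an atom that carries a double bond to O, N, P or S (an amide-type exclusion similar to the one in the Gobbi-style definition, but without its `!@` ring-bond restriction). The SMARTS string and the names `HYBRID_HBA` and `acceptor_atoms` are assumptions for illustration, not a published or validated definition.

```python
from rdkit import Chem

# Hypothetical hybrid: Daylight/Gillet exclusions plus an amide-type
# nitrogen exclusion. Not a published definition.
HYBRID_HBA = Chem.MolFromSmarts(
    "[!$([#6,F,Cl,Br,I,o,s,nX3,#7v5,#15v5,#16v4,#16v6,*+1,*+2,*+3]);"
    "!$(N-*=[O,N,P,S])]"
)


def acceptor_atoms(smiles):
    """Return the indices of atoms matched by the hybrid acceptor pattern."""
    mol = Chem.MolFromSmiles(smiles)
    return tuple(idx for (idx,) in mol.GetSubstructMatches(HYBRID_HBA))


for name, smiles in [
    ("pyridine", "n1ccccc1"),
    ("pyrrole", "[nH]1cccc1"),
    ("N-methylpyrrole", "Cn1cccc1"),
    ("formamide", "C(=O)N"),
]:
    print(name, acceptor_atoms(smiles))
```

Under these assumptions the pyridine nitrogen and the amide oxygen are reported as acceptors, while pyrrole, N-methylpyrrole and the amide nitrogen are not.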
>> 
>> Cheers,
>> 
>> Konrad
>> 
>> -------------------------------------------------------------------------------------------------------------
>> 
>> From: 
>> http://www.daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html#H_BOND
>> 
>> Hydrogen-bond acceptor
>> [!$([#6,F,Cl,Br,I,o,s,nX3,#7v5,#15v5,#16v4,#16v6,*+1,*+2,*+3])]
>> A H-bond acceptor is a heteroatom with no positive charge, note that 
>> negatively charged oxygen or sulphur are included. Excluded are halogens, 
>> including F, heteroaromatic oxygen, sulphur and pyrrole N.
>> 
>> Which in turn is taken from:
>> 
>> Identification of biological activity profiles using substructural analysis 
>> and genetic algorithms.
>> Gillet VJ, Willett P, Bradshaw J.
>> J Chem Inf Comput Sci. 1998 Mar-Apr;38(2):165-79.
>> 
>> Quote: HBA is defined as a heteroatom with no positive charge, excluding the 
>> halogens, aromatic oxygen, sulfur, and pyrrole nitrogen and the higher 
>> oxidation levels of nitrogen, phosphorus, and sulfur.
>> 
>> Table 1. SMARTS Definitions for Substructural Features
>> Feature  SMARTS
>> HBD [!#6;!H0]
>> HBA [$([!#6;+0]);!$([F,Cl,Br,I]);!$([o,s,nX3]);!$([Nv5,Pv5,Sv4,Sv6])]
>> RB [!$([NH]!@C(=O))&!D1&!$(*#*)]-&!@[!$([NH]!@C(=O))&!D1&!$(*#*)]
>> 
>> >>> from rdkit import Chem
>> >>> p = Chem.MolFromSmarts('[!$([#6,F,Cl,Br,I,o,s,nX3,#7v5,#15v5,#16v4,#16v6,*+1,*+2,*+3])]')
>> >>> m = Chem.MolFromSmiles('c1ccccc1') # benzene
>> >>> m.HasSubstructMatch(p)
>> False # correct
>> >>> m = Chem.MolFromSmiles('n1ccccc1') # pyridine
>> >>> m.HasSubstructMatch(p)
>> True # correct
>> >>> m = Chem.MolFromSmiles('[nH]1cccc1') # pyrrole
>> >>> m.HasSubstructMatch(p)
>> False # correct
>> >>> m = Chem.MolFromSmiles('C(=O)N') # amide
>> >>> m.GetSubstructMatches(p)
>> ((1,), (2,)) # correctly matches the oxygen atom but incorrectly matches
>>              # the nitrogen atom
>> 
>> -------------------------------------------------------------------------------------------------------------
>> 
>> From: http://www.rdkit.org/Python_Docs/rdkit.Chem.Lipinski-pysrc.html
>> 
>> HAcceptorSmarts = Chem.MolFromSmarts('[$([O,S;H1;v2]-[!$(*=[O,N,P,S])]),\
>> $([O,S;H0;v2]),$([O,S;-]),\
>> $([N;v3;!$(N-*=!@[O,N,P,S])]),\
>> $([nH0,o,s;+0])\
>> ]')
>> 
>> 
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>> 
>> 
> 
> 
