On Tue, Aug 14, 2012 at 11:43 AM, JP <[email protected]> wrote:
>
> Anyway enough of the blabber.  I am using the feature definition file
> in RDKit and was wondering why the order of the rules in the file
> makes a difference.
>
> So
>
> AtomType NAcceptor C[N;H0]=C
> AtomType NAcceptor [N&v3;H0;$(Nc)]
>
> Gives different results than
>
> AtomType NAcceptor [N&v3;H0;$(Nc)]
> AtomType NAcceptor C[N;H0]=C
>
> These are different rules affecting different chemotypes...  why does
> the above find the CN=C acceptor feature and the below does not?

The short answer is that you're using the wrong SMARTS. An AtomType
definition should match a single atom. What I think you mean here is:

AtomType NAcceptor [N&v3;H0;$(Nc)]
AtomType NAcceptor [$(N(C)=C)]
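
A quick way to catch this kind of definition before it goes into an
fdef file is to check that the SMARTS is one bracketed atom expression.
The following is a minimal plain-Python sketch; is_single_bracket_atom
is a made-up helper name, not part of the RDKit, and it only looks at
bracket nesting without validating the SMARTS itself.

```python
def is_single_bracket_atom(smarts):
    """Return True if smarts is exactly one bracketed atom expression.

    Hypothetical helper for illustration; it only tracks square-bracket
    nesting and does not check that the SMARTS is otherwise valid.
    """
    if not smarts.startswith("["):
        return False
    depth = 0
    for i, ch in enumerate(smarts):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                # The bracket opened at position 0 has just closed. For a
                # single atom this has to be the last character.
                return i == len(smarts) - 1
    # Brackets never balanced.
    return False


for definition in ("C[N;H0]=C", "[N&v3;H0;$(Nc)]", "[$(N(C)=C)]"):
    print(definition, is_single_bracket_atom(definition))
```

The first definition is rejected because it describes three atoms; the
other two pass because everything, including the recursive SMARTS, sits
inside one pair of square brackets.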

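Here's a demonstration that using these definitions makes the order
dependence go away (Chem and AllChem come from "from rdkit import Chem"
and "from rdkit.Chem import AllChem"):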

In [31]: fdf="""AtomType NAcceptor3 [N&v3;H0;$(Nc)]
   ....: AtomType NAcceptor3 [$(N(C)=C)]
   ....: DefineFeature SingleAtomAcceptor3 [{NAcceptor3}]
   ....:   Family Acceptor3
   ....:   Weights 1
   ....: EndFeature
   ....:
   ....: AtomType NAcceptor4 [$(N(C)=C)]
   ....: AtomType NAcceptor4 [N&v3;H0;$(Nc)]
   ....: DefineFeature SingleAtomAcceptor4 [{NAcceptor4}]
   ....:   Family Acceptor4
   ....:   Weights 1
   ....: EndFeature
   ....: """

In [32]: m = Chem.MolFromSmiles('CN=C')

In [33]: ff = AllChem.BuildFeatureFactoryFromString(fdf)

In [34]: feats=ff.GetFeaturesForMol(m)

In [35]: [x.GetFamily() for x in feats]
Out[35]: ['Acceptor3', 'Acceptor4']

Hopefully that gets your code working. You may want to stop reading here. :-)


Here's what happens when I do the same thing with your definitions:

In [36]: fdf="""AtomType NAcceptor1 C[N;H0]=C
   ....: AtomType NAcceptor1 [N&v3;H0;$(Nc)]
   ....: DefineFeature SingleAtomAcceptor1 [{NAcceptor1}]
   ....:   Family Acceptor1
   ....:   Weights 1
   ....: EndFeature
   ....:
   ....: AtomType NAcceptor2 [N&v3;H0;$(Nc)]
   ....: AtomType NAcceptor2 C[N;H0]=C
   ....: DefineFeature SingleAtomAcceptor2 [{NAcceptor2}]
   ....:   Family Acceptor2
   ....:   Weights 1
   ....: EndFeature
   ....: """

In [37]: ff = AllChem.BuildFeatureFactoryFromString(fdf)

In [38]: feats=ff.GetFeaturesForMol(m)

In [39]: [x.GetFamily() for x in feats]
Out[39]: ['Acceptor1']

This is the behavior you were seeing.

To understand why this happens, you need to look at the SMARTS that
ends up being produced for each of your feature definitions:

In [40]: for k, v in ff.GetFeatureDefs().items(): print(k, v)
Acceptor1.SingleAtomAcceptor1 [$(C[N;H0,$([N&v3;H0;$(Nc)])]=C)]
Acceptor2.SingleAtomAcceptor2 [$([N&v3;H0;$(Nc),$(C[N;H0]=C)])]

The fdef parser combines the different atom type definitions with each
other using simple string manipulations, based on the assumption that
each one defines a single atom. It's really expecting your AtomType
definition to start and end with a square bracket.  It should be
testing for that, but it's not.
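
The string manipulation can be sketched in a few lines of plain Python.
This is inferred from the two SMARTS printed above, not copied from the
RDKit source, and the function names are made up for illustration: each
definition is wrapped as recursive SMARTS, and a later definition for
the same name is spliced in just before the last "]" of what is already
there.

```python
def combine_atom_types(definitions):
    """Merge AtomType SMARTS that share a name, in file order.

    Assumption: mimics the behavior implied by the printed feature
    definitions; the real parser may differ in details.
    """
    combined = None
    for smarts in definitions:
        if combined is None:
            # First definition: wrap it as recursive SMARTS.
            combined = "$(" + smarts + ")"
            continue
        pos = combined.rfind("]")
        if pos < 0:
            raise ValueError("no closing bracket to splice into")
        # Later definitions become an extra ",$(...)" alternative placed
        # just before the last closing bracket.
        combined = combined[:pos] + ",$(" + smarts + ")" + combined[pos:]
    return combined


def expand_feature(combined):
    """Expand a feature pattern of the form [{Name}]."""
    return "[" + combined + "]"


first = expand_feature(combine_atom_types(["C[N;H0]=C", "[N&v3;H0;$(Nc)]"]))
second = expand_feature(combine_atom_types(["[N&v3;H0;$(Nc)]", "C[N;H0]=C"]))
print(first)   # [$(C[N;H0,$([N&v3;H0;$(Nc)])]=C)]
print(second)  # [$([N&v3;H0;$(Nc),$(C[N;H0]=C)])]
```

When the first definition is C[N;H0]=C, the last "]" belongs to the
nitrogen in the middle of the pattern, so the second definition ends up
as an alternative on that nitrogen and the whole thing is anchored on
the carbon. In the other order the three-atom pattern becomes a
recursive alternative whose first atom is the carbon, so it has to
match at C rather than at N.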

-greg

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
