Hi George,

On Thu, Sep 29, 2011 at 1:11 PM, George Papadatos <[email protected]> wrote:
> I'd like to calculate the *rooted* Morgan fingerprint for a set of
> molecules. By rooted I mean the subset of the whole-molecule fingerprint
> which contains just the bits which correspond to circular atom layers (up to
> N bond lengths) that include a specific atom.
> So let's say that there is a single Uranium atom in each molecule. What I
> want to calculate is the subset of the Morgan fingerprint (let's say with a
> radius of 3) which contains the bits set on by layers including my U atom.
> This should include not only the bits where U was the root of the layer, but
> also the bits where U was in the layer of neighboring atoms, up to 3 bonds
> away.

A minor point: I wouldn't call this the rooted fingerprint since it
includes bits that are set by layers that are not centered at your U
atom.
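
To make the distinction concrete: an environment of radius r centered at atom c covers a given atom exactly when the topological (bond-count) distance between them is at most r. The sketch below lists every (center, radius) pair that involves a target atom. It is plain Python, and `environments_including` is a hypothetical helper name. It uses the toluene distance matrix from the session further down, with the methyl carbon (atom 0) standing in for your U atom. It ignores RDKit's default removal of redundant environments, so the pairs do not map one-to-one onto bits.

```python
# Toluene (Cc1ccccc1) topological distance matrix, copied from the
# Chem.GetDistanceMatrix() session further down in this message.
TOLUENE_DM = [
    [0, 1, 2, 3, 4, 3, 2],
    [1, 0, 1, 2, 3, 2, 1],
    [2, 1, 0, 1, 2, 3, 2],
    [3, 2, 1, 0, 1, 2, 3],
    [4, 3, 2, 1, 0, 1, 2],
    [3, 2, 3, 2, 1, 0, 1],
    [2, 1, 2, 3, 2, 1, 0],
]


def environments_including(dist_matrix, target, max_radius):
    """Return the (center, radius) pairs whose environment covers `target`.

    An environment of radius r centered at atom c reaches every atom at most
    r bonds away from c, so it covers `target` iff dist <= r.
    """
    pairs = []
    for center, row in enumerate(dist_matrix):
        for radius in range(max_radius + 1):
            if row[target] <= radius:
                pairs.append((center, radius))
    return pairs


pairs = environments_including(TOLUENE_DM, target=0, max_radius=2)
print(pairs)
# [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2), (6, 2)]
print([p for p in pairs if p[0] == 0])  # only those centered on the target
# [(0, 0), (0, 1), (0, 2)]
```

Only three of the seven environments are centered on the target. The other four are the extra bits you are asking for.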

> After checking the super-helpful "Getting Started with the RDKit in Python"
> (Q2 2011) tutorial, section 5.4.1, I can see one way of doing this:
> calculating the Morgan fp and then enumerating all the sub-molecules (or
> layers) that set the corresponding bits on and then checking if U is in any
> one of these submolecules. If it is then the corresponding bit is part of
> the root Morgan fp.
> Is there any other more efficient way???

If you only want the bits that are set by a particular atom (i.e.
those that are centered at that atom), you can use the fromAtoms
argument:
>>> from rdkit import Chem
>>> from rdkit.Chem import rdMolDescriptors
>>> m1 = Chem.MolFromSmiles('Cc1ccccc1')
>>> m2 = Chem.MolFromSmiles('Cc1c(C)cccc1')
>>> rdMolDescriptors.GetMorganFingerprint(m1,1,fromAtoms=[0]).GetNonzeroElements()
{2246728737: 1, 422715066: 1}
>>> rdMolDescriptors.GetMorganFingerprint(m1,2,fromAtoms=[0]).GetNonzeroElements()
{2246728737: 1, 422715066: 1, 2218109011: 1}
>>> rdMolDescriptors.GetMorganFingerprint(m2,1,fromAtoms=[0]).GetNonzeroElements()
{2246728737: 1, 422715066: 1}
>>> rdMolDescriptors.GetMorganFingerprint(m2,2,fromAtoms=[0]).GetNonzeroElements()
{2246728737: 1, 422715066: 1, 2368203427: 1}

Note that I just fixed a bug that caused bits to go missing from
Morgan fingerprints generated with the fromAtoms argument.

If you want all the bits that the atom is involved in, I would suggest
using the fromAtoms argument, but passing it every atom that lies
within the fingerprint radius of your atom (i.e. at most that many
bonds away). You can find these atoms using the molecule's distance
matrix:
>>> m1 = Chem.MolFromSmiles('Cc1ccccc1')
>>> dm=Chem.GetDistanceMatrix(m1)
>>> dm
array([[ 0.,  1.,  2.,  3.,  4.,  3.,  2.],
       [ 1.,  0.,  1.,  2.,  3.,  2.,  1.],
       [ 2.,  1.,  0.,  1.,  2.,  3.,  2.],
       [ 3.,  2.,  1.,  0.,  1.,  2.,  3.],
       [ 4.,  3.,  2.,  1.,  0.,  1.,  2.],
       [ 3.,  2.,  3.,  2.,  1.,  0.,  1.],
       [ 2.,  1.,  2.,  3.,  2.,  1.,  0.]])
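
Here is a sketch of the whole recipe, written in plain Python so that it runs without RDKit. `atoms_within` takes one row of the distance matrix and returns the atoms to pass as fromAtoms. fromAtoms keeps every radius for each chosen center, so that list also yields small-radius environments of the more distant atoms that never reach the target (for example, the radius-0 bit of an atom two bonds away). `bits_involving` removes those. It works on a bitInfo-style dict that maps each bit to a tuple of (center, radius) pairs, the shape RDKit fills in when you pass bitInfo={} to GetMorganFingerprint. The helper names and the bit ids below are hypothetical.

```python
# Toluene distance matrix from the session above (atom 0 = methyl carbon).
TOLUENE_DM = [
    [0, 1, 2, 3, 4, 3, 2],
    [1, 0, 1, 2, 3, 2, 1],
    [2, 1, 0, 1, 2, 3, 2],
    [3, 2, 1, 0, 1, 2, 3],
    [4, 3, 2, 1, 0, 1, 2],
    [3, 2, 3, 2, 1, 0, 1],
    [2, 1, 2, 3, 2, 1, 0],
]


def atoms_within(dist_matrix, target, radius):
    """Indices of atoms at most `radius` bonds from `target` (fromAtoms list)."""
    return [i for i, d in enumerate(dist_matrix[target]) if d <= radius]


def bits_involving(bit_info, dist_matrix, target):
    """Keep bits with at least one (center, radius) environment reaching `target`.

    `bit_info` maps bit id -> sequence of (center_atom, radius) pairs.
    """
    return {
        bit
        for bit, envs in bit_info.items()
        if any(dist_matrix[center][target] <= r for center, r in envs)
    }


# Hypothetical bit ids and groupings; real ones come from RDKit's bitInfo.
example_info = {
    101: ((0, 0),),                                   # the target itself
    202: ((1, 1),),                                   # ipso carbon, radius 1
    303: ((2, 0), (3, 0), (4, 0), (5, 0), (6, 0)),    # ring CH, radius 0: misses atom 0
    404: ((2, 2), (6, 2)),                            # ortho carbons, radius 2
    505: ((3, 1), (5, 1)),                            # meta carbons, radius 1: misses atom 0
}

print(atoms_within(TOLUENE_DM, 0, 2))  # [0, 1, 2, 6]
print(sorted(bits_involving(example_info, TOLUENE_DM, 0)))  # [101, 202, 404]
```

With RDKit, dist_matrix would be the array returned by Chem.GetDistanceMatrix(mol), which indexes the same way. bit_info would be the dict filled by rdMolDescriptors.GetMorganFingerprint(mol, radius, fromAtoms=..., bitInfo=info).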


I hope this helps,
-greg

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
