I found that on the NY Public Library web site, the book is available,
chapter by chapter, as a digital download, if you have a library card. The
host site is at Johns Hopkins, so check your local library system, which
might also supply access.

-P.

On Wed, Oct 28, 2020 at 12:08 PM Cyrus Maher <cma...@vir.bio> wrote:

> Hi Andrew,
>
> Thank you! This is so thorough, and so helpful. We truly appreciate it.
>
> All the best,
>
> -Cyrus
>
> On 10/27/20, 4:28 AM, "Andrew Dalke" <da...@dalkescientific.com> wrote:
>
>     On Oct 26, 2020, at 17:41, Cyrus Maher <cma...@vir.bio> wrote:
>     > I’m wondering if there is an easy way to retrieve the atom numbers
> that the morgan fingerprints algorithm assigns as its first step.
>
>     Many of the fingerprint functions support an optional "bitInfo"
> parameter. If it's a dictionary, then each key is a bit that was set and
> each value is a tuple of (atom index, radius) pairs that set it.
>
>     Here's an example with theobromine using r=0, which lets you see the
> initial invariants:
>
>     >>> from rdkit import Chem
>     >>> from rdkit.Chem import rdMolDescriptors
>     >>> mol = Chem.MolFromSmiles("Cn1cnc2c1c(=O)[nH]c(=O)n2C")
>     >>> bitInfo = {}
>     >>> fp = rdMolDescriptors.GetMorganFingerprintAsBitVect(mol, radius=0,
>     ...     useFeatures=1, bitInfo=bitInfo)
>     >>> for bitno, pairs in sorted(bitInfo.items()):
>     ...   print(f"Bitno: {bitno}")
>     ...   for atom_idx, r in pairs:
>     ...     print(f"  atom {atom_idx} ({mol.GetAtomWithIdx(atom_idx).GetSymbol()}) with radius {r}")
>     ...
>     Bitno: 0
>       atom 0 (C) with radius 0
>       atom 12 (C) with radius 0
>     Bitno: 2
>       atom 7 (O) with radius 0
>       atom 10 (O) with radius 0
>     Bitno: 4
>       atom 2 (C) with radius 0
>       atom 4 (C) with radius 0
>       atom 5 (C) with radius 0
>       atom 6 (C) with radius 0
>       atom 9 (C) with radius 0
>     Bitno: 5
>       atom 8 (N) with radius 0
>     Bitno: 6
>       atom 1 (N) with radius 0
>       atom 3 (N) with radius 0
>       atom 11 (N) with radius 0
>
>     If I follow the code correctly, when useFeatures == 1 then the initial
> invariants are set by getFeatureInvariants() in
> ./Code/GraphMol/Fingerprints/FingerprintUtil.cpp , available at:
>
>
> https://github.com/rdkit/rdkit/blob/d594998dda2803a15aa0066e06f2477b71ed3be6/Code/GraphMol/Fingerprints/FingerprintUtil.cpp#L221
>
>     A few lines up, at
> https://github.com/rdkit/rdkit/blob/d594998dda2803a15aa0066e06f2477b71ed3be6/Code/GraphMol/Fingerprints/FingerprintUtil.cpp#L182
> , you'll see the feature patterns defined in smartsPatterns.
>
>     They appear to be identical to the list you gave.
>
>     I reimplemented the initialization function (copied at the end of this
> email). Running the program shows that it produces the same invariants
> which are used as the bit numbers in the Morgan feature fingerprint:
>
>     Invariant: 0
>       atom 0 (C)
>       atom 12 (C)
>     Invariant: 2
>       atom 7 (O)
>       atom 10 (O)
>     Invariant: 4
>       atom 2 (C)
>       atom 4 (C)
>       atom 5 (C)
>       atom 6 (C)
>       atom 9 (C)
>     Invariant: 5
>       atom 8 (N)
>     Invariant: 6
>       atom 1 (N)
>       atom 3 (N)
>       atom 11 (N)
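
The invariants listed above can be read as bit masks, since the script at the end of this message sets bit pattern_idx for each matching SMARTS pattern. A minimal sketch of decoding them follows. The feature order is taken from the comments in the smartsPatterns list; FEATURE_NAMES and decode_invariant are hypothetical names introduced here for illustration, not RDKit API.

```python
# Feature order copied from the comments in the smartsPatterns list
# in the script at the end of this message (bit 0 = Donor, bit 1 = Acceptor, ...).
FEATURE_NAMES = ["Donor", "Acceptor", "Aromatic", "Halogen", "Basic", "Acidic"]


def decode_invariant(invariant):
    """Return the names of the features whose bits are set in `invariant`.

    Hypothetical helper for illustration; it only inspects the low six bits.
    """
    return [name for bit, name in enumerate(FEATURE_NAMES)
            if invariant & (1 << bit)]


# Decode the invariants reported for theobromine in this thread.
for invariant in (0, 2, 4, 5, 6):
    names = decode_invariant(invariant)
    print(invariant, ", ".join(names) if names else "(no feature matched)")
```

Read this way, invariant 5 is Donor plus Aromatic (the [nH] atom) and invariant 6 is Acceptor plus Aromatic (the other ring nitrogens), which matches the atoms listed above.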
>
>
>     I believe that gives you two ways to get the information you want!
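
A possible third route, not mentioned in the message above: the Python wrappers appear to expose the same C++ initialization directly as rdMolDescriptors.GetFeatureInvariants(). The sketch below assumes an RDKit build that includes that function and reuses the theobromine SMILES from this thread. If the reimplementation quoted here mirrors the C++ code, the values should agree with the invariants listed above, but that is worth checking on your own install.

```python
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

# Same theobromine SMILES used elsewhere in this thread.
mol = Chem.MolFromSmiles("Cn1cnc2c1c(=O)[nH]c(=O)n2C")

# One integer per atom, in atom-index order (assumes this wrapper is available
# in your RDKit version).
invariants = list(rdMolDescriptors.GetFeatureInvariants(mol))

for atom in mol.GetAtoms():
    print(atom.GetIdx(), atom.GetSymbol(), invariants[atom.GetIdx()])
```

For the default, non-feature Morgan fingerprint, rdMolDescriptors.GetConnectivityInvariants(mol) should play the analogous role.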
>
>     Best regards,
>
>                                     Andrew
>                                     da...@dalkescientific.com
>
>
>
>
>     # Python re-implementation of RDKit's getFeatureInvariants() from
>     # ./Code/GraphMol/Fingerprints/FingerprintUtil.cpp
>
>     from rdkit import Chem
>
>     smartsPatterns = [
>         "[$([N;!H0;v3,v4&+1]),\
>     $([O,S;H1;+0]),\
>     n&H1&+0]",                                                  # Donor
>         "[$([O,S;H1;v2;!$(*-*=[O,N,P,S])]),\
>     $([O,S;H0;v2]),\
>     $([O,S;-]),\
>     $([N;v3;!$(N-*=[O,N,P,S])]),\
>     n&H0&+0,\
>     $([o,s;+0;!$([o,s]:n);!$([o,s]:c:n)])]",                    # Acceptor
>         "[a]",                                                  # Aromatic
>         "[F,Cl,Br,I]",                                          # Halogen
>         "[#7;+,\
>     $([N;H2&+0][$([C,a]);!$([C,a](=O))]),\
>     $([N;H1&+0]([$([C,a]);!$([C,a](=O))])[$([C,a]);!$([C,a](=O))]),\
>     $([N;H0&+0]([C;!$(C(=O))])([C;!$(C(=O))])[C;!$(C(=O))])]",  # Basic
>         "[$([C,S](=[O,S,P])-[O;H1,-1])]"                        # Acidic
>         ]
>
>     mol = Chem.MolFromSmiles("Cn1cnc2c1c(=O)[nH]c(=O)n2C")
>     invariants = [0] * mol.GetNumAtoms()
>     for pattern_idx, smartsPattern in enumerate(smartsPatterns):
>         pat = Chem.MolFromSmarts(smartsPattern)
>         for (atom_idx,) in mol.GetSubstructMatches(pat):
>             invariants[atom_idx] |= (1<<pattern_idx)
>
>     # Show all atoms with the same invariant, ordered by invariant
>     from collections import defaultdict
>     by_invariant = defaultdict(list)
>     for atom_idx, invariant in enumerate(invariants):
>         by_invariant[invariant].append(atom_idx)
>     for invariant, atom_indices in sorted(by_invariant.items()):
>         print(f"Invariant: {invariant}")
>         for atom_idx in atom_indices:
>             print(f"  atom {atom_idx} ({mol.GetAtomWithIdx(atom_idx).GetSymbol()})")
>
>
>
>
>
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
-- 
-P.
Sent from a cell phone. Pls forgive brvty and m1$tea@ks.