I found that on the NY Public Library web site, the book is available, chapter by chapter, as a digital download, if you have a library card. The host site is at John’s-Hopkins, so check your local library system, which might also supply access.
-P. On Wed, Oct 28, 2020 at 12:08 PM Cyrus Maher <cma...@vir.bio> wrote: > Hi Andrew, > > Thank you! This is so thorough, and so helpful. We truly appreciate it. > > All the best, > > -Cyrus > > On 10/27/20, 4:28 AM, "Andrew Dalke" <da...@dalkescientific.com> wrote: > > ** EXTERNAL EMAIL ** > > > On Oct 26, 2020, at 17:41, Cyrus Maher <cma...@vir.bio> wrote: > > I’m wondering if there is an easy way to retrieve the atom numbers > that the morgan fingerprints algorithm assigns as its first step. > > Many of the fingerprint function support an optional "bitInfo" > parameter. If it's a dictionary then the keys are the bit that was set, and > the value is at tuple of the (atom index, radius) which set it. > > Here's an example with theobromine using r=0, which lets you see the > initial invariants: > > >>> from rdkit import Chem > >>> from rdkit.Chem import rdMolDescriptors > >>> mol = Chem.MolFromSmiles("Cn1cnc2c1c(=O)[nH]c(=O)n2C") > >>> bitInfo = {} > >>> fp = rdMolDescriptors.GetMorganFingerprintAsBitVect(mol, radius=0, > useFeatures=1, bitInfo=bitInfo) > >>> for bitno, pairs in sorted(bitInfo.items()): > ... print(f"Bitno: {bitno}") > ... for atom_idx, r in pairs: > ... print(f" atom {atom_idx} > ({mol.GetAtomWithIdx(atom_idx).GetSymbol()}) with radius {r}") > ... > Bitno: 0 > atom 0 (C) with radius 0 > atom 12 (C) with radius 0 > Bitno: 2 > atom 7 (O) with radius 0 > atom 10 (O) with radius 0 > Bitno: 4 > atom 2 (C) with radius 0 > atom 4 (C) with radius 0 > atom 5 (C) with radius 0 > atom 6 (C) with radius 0 > atom 9 (C) with radius 0 > Bitno: 5 > atom 8 (N) with radius 0 > Bitno: 6 > atom 1 (N) with radius 0 > atom 3 (N) with radius 0 > atom 11 (N) with radius 0 > > If I follow the code correctly, when useFeatures == 1 then the intial > invariants are set by getFeatureInvariants() in > ./Code/GraphMol/Fingerprints/FingerprintUtil.cpp , available at: > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_rdkit_rdkit_blob_d594998dda2803a15aa0066e06f2477b71ed3be6_Code_GraphMol_Fingerprints_FingerprintUtil.cpp-23L221&d=DwIFaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=b_UdO5RJBZB-KGEyd1F-0g&m=3NVDgW4zQTXqvGgBgN1WpWt1PuJsSWJpydkDuamTkGY&s=tfPrxiPHW2FK-NXmObtRK0ri4Z456d1IlSiKx1tIB9s&e= > > A few lines up, at > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_rdkit_rdkit_blob_d594998dda2803a15aa0066e06f2477b71ed3be6_Code_GraphMol_Fingerprints_FingerprintUtil.cpp-23L182&d=DwIFaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=b_UdO5RJBZB-KGEyd1F-0g&m=3NVDgW4zQTXqvGgBgN1WpWt1PuJsSWJpydkDuamTkGY&s=rd8o6LjWxXd6iezueStsEXFPmvKD2IoPWRz_vCOnPNI&e= > , you'll see the features patterns defined in smartsPatterns > > They appear to be identical to the list you gave. > > I reimplemented the initialization function (copied at the end of this > email). Running the program shows that it produces the same invariants > which are used as the bit numbers in the Morgan feature fingerprint: > > Invariant: 0 > atom 0 (C) > atom 12 (C) > Invariant: 2 > atom 7 (O) > atom 10 (O) > Invariant: 4 > atom 2 (C) > atom 4 (C) > atom 5 (C) > atom 6 (C) > atom 9 (C) > Invariant: 5 > atom 8 (N) > Invariant: 6 > atom 1 (N) > atom 3 (N) > atom 11 (N) > > > I believe that gives you two ways to get the information you want! > > Best regards, > > Andrew > da...@dalkescientific.com > > > > > # Python re-implementation of RDKit's getFeatureInvariants() from > # ./Code/GraphMol/Fingerprints/FingerprintUtil.cpp > > from rdkit import Chem > > smartsPatterns = [ > "[$([N;!H0;v3,v4&+1]),\ > $([O,S;H1;+0]),\ > n&H1&+0]", # Donor > "[$([O,S;H1;v2;!$(*-*=[O,N,P,S])]),\ > $([O,S;H0;v2]),\ > $([O,S;-]),\ > $([N;v3;!$(N-*=[O,N,P,S])]),\ > n&H0&+0,\ > $([o,s;+0;!$([o,s]:n);!$([o,s]:c:n)])]", # Acceptor > "[a]", # Aromatic > "[F,Cl,Br,I]", # Halogen > "[#7;+,\ > $([N;H2&+0][$([C,a]);!$([C,a](=O))]),\ > $([N;H1&+0]([$([C,a]);!$([C,a](=O))])[$([C,a]);!$([C,a](=O))]),\ > $([N;H0&+0]([C;!$(C(=O))])([C;!$(C(=O))])[C;!$(C(=O))])]", # Basic > "[$([C,S](=[O,S,P])-[O;H1,-1])]" # Acidic > ] > > mol = Chem.MolFromSmiles("Cn1cnc2c1c(=O)[nH]c(=O)n2C") > invariants = [0] * mol.GetNumAtoms() > for pattern_idx, smartsPattern in enumerate(smartsPatterns): > pat = Chem.MolFromSmarts(smartsPattern) > for (atom_idx,) in mol.GetSubstructMatches(pat): > invariants[atom_idx] |= (1<<pattern_idx) > > # Show all atoms with the same invariant, ordered by invariant > from collections import defaultdict > by_invariant = defaultdict(list) > for atom_idx, invariant in enumerate(invariants): > by_invariant[invariant].append(atom_idx) > for invariant, atom_indices in sorted(by_invariant.items()): > print(f"Invariant: {invariant}") > for atom_idx in atom_indices: > print(f" atom {atom_idx} ( > https://urldefense.proofpoint.com/v2/url?u=http-3A__-7Bmol.Ge&d=DwIFaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=b_UdO5RJBZB-KGEyd1F-0g&m=3NVDgW4zQTXqvGgBgN1WpWt1PuJsSWJpydkDuamTkGY&s=9p0TLX9AIODjZpvzQA_s-Q3jrUGE6l9exmJiFVhU4ro&e=tAtomWithIdx(atom_idx).GetSymbol()} > )") > > > > > > > _______________________________________________ > Rdkit-discuss mailing list > Rdkit-discuss@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/rdkit-discuss > -- -P. Sent from a cell phone. Pls forgive brvty and m1$tea@ks.
_______________________________________________ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss