Re: [Rdkit-discuss] Conformer generation does not sample well?

2016-06-29 Thread Steven Kearnes
These papers may be relevant:

   - http://pubs.acs.org/doi/full/10.1021/ci2004658
   - https://jcheminf.springeropen.com/articles/10.1186/1758-2946-3-4


On Wed, Jun 29, 2016 at 1:29 AM, Sereina  wrote:

> Hi Tim,
>
> I had a look at the 1DWD example and I detail in the following the
> analysis I did as it may be useful to other users.
>
> The conformer generation function has the option to print the experimental
> torsion preferences that were used in the generation:
> printExpTorsionAngles=True
>
> For 1DWD, it gives the following output:
>
> > ref = Chem.MolFromSmiles('NC(=[NH2+])c1ccc(C[C@
> @H](NC(=O)CNS(=O)(=O)c2ccc3c3c2)C(=O)N2C2)cc1’)
> > mol1 =
> Chem.MolFromPDBFile(RDConfig.RDBaseDir+'/rdkit/Chem/test_data/1DWD_ligand.pdb')
> > mol1 = AllChem.AssignBondOrdersFromTemplate(ref, mol1)
> > mol1 = Chem.AddHs(mol1)
> > numConfs = 100
> > cids = AllChem.EmbedMultipleConfs(mol1, numConfs=numConfs,
> useExpTorsionAnglePrefs=True, useBasicKnowledge=True,
> printExpTorsionAngles=True)
> O=[S:1](=O)[NX3H1:2]!@;-[CX4H2:3][!#1:4]: 0 13 14 15, (17.9, -13.3, 9.2,
> 4.7, -2.3, -0.9)
> O=[C:1][NX3H1:2]!@;-[CX4H1:3][H:4]: 15 17 18 48, (5, 0, 0, 0, 0, 0)
> [O:1]=[CX3:2]!@;-[NX3H0:3][!#1:4]: 20 19 31 32, (0, 8, 0, 0, 0, 0)
> [O:1]=[CX3:2]!@;-[NX3H1:3][!#1:4]: 16 15 17 18, (100, 0, 0, 0, 0, 0)
> [c:1][S:2](=O)(=O)!@;-[NX3H1:3][C:4]: 4 0 13 14, (0, 16, 5, 7, 0, 0)
> [aH1:1][c:2]([aH1])!@;-[SX4:3][!#1:4]: 3 4 0 13, (0, 1.5, 0, 0, 0, 0)
> N[C:2](=[O:1])!@;-[CH2:3][N:4]: 16 15 14 13, (0, 10, 0, 0, 0, 0)
> [!#1:1][CX4:2]!@;-[CX4:3][!#1:4]: 17 18 21 22, (0, 0, 7, 0, 0, 0)
> [NH1:1][CX4:2]!@;-[CX3:3]=[O:4]: 17 18 19 20, (0, 2.1, -0.1, -1, -0.4,
> -0.1)
> [cH1:1][c:2]([cH1])!@;-[CX4H2:3][CX4:4]: 23 22 21 18, (0, 3.6, 0, 0, 0, 0)
> [a:1][c:2]!@;-[C:3](=[N:4]): 25 27 28 30, (0, 5, 0, 0, 0, 0)
> [*:1][X3,X2:2]=[X3,X2:3][*:4]: 27 28 30 57, (0, 100, 0, 0, 0, 0)
>
> The output is structured as follows: First comes the SMARTS pattern, then
> the indices of the four atoms involved and finally the six force constants
> K_1, K_2, K_3, K_4, K_5, K_6 for the torsion potential (see Eq. (2) in
> JCIM, 55, 2562, 2015).
> The position of the non-zero force constant tells you the multiplicity i
> of the cosine. E.g. for the torsion between atoms 18 and 21 the third force
> constant is 7.0 and all others are zero, thus the torsion potential for
> this bond has a multiplicity i = 3 (i.e. three maxima). The information
> about the phase shift is not exposed, but we can change that if people find
> it useful.
> Currently to get the full information about the torsion potential, you
> need to search for the SMARTS pattern in
> Code/GraphMol/ForceFieldHelpers/CrystalFF/TorsionPreferences.cpp
> There you find the line: "[!#1:1][CX4:2]!@;-[CX4:3][!#1:4] 1 0.0 1 0.0 1
> 7.0 1 0.0 1 0.0 1 0.0\n"
> Here we have twelve parameters, always two for a given multiplicity: first
> the phase shift (can be 1 or -1) and second the force constant.
> For this particular torsion pattern we have a multiplicity of 3, a phase
> shift of cos(d) = 1 and a force constant K_3 = 7.0, which corresponds to
> three peaks at 60°, 180° and 300° in the range [0°, 360°] (or -60°, 60° and
> 180° in the range [-180°, 180°]).
>
> The two bonds connecting the benzamidine ring to the rest of the molecule
> are between atoms 22/21 and 21/18.
> I calculated the signed dihedral angle for these two bonds, here is the
> code for the second one:
>
> > angles = []
> > for cid in cids:
> >conf = mol1.GetConformer(id=cid)
> ># torsion of atoms 17 18 21 22
> >p1 = conf.GetAtomPosition(17)
> >p2 = conf.GetAtomPosition(18)
> >p3 = conf.GetAtomPosition(21)
> >p4 = conf.GetAtomPosition(22)
> >a = Geometry.ComputeSignedDihedralAngle(p1, p2, p3, p4)/math.pi *
> 180.0
> >angles.append(a)
>
> For 100 confs, this gives the following distribution:
> So, the torsion potential is sampled as it is supposed to do.
>
>
> For the other bond between atmos 22/21, we have the following line in
> TorsionPreferences.cpp:
> "[cH1:1][c:2]([cH1])!@;-[CX4H2:3][CX4:4] 1 0.0 1 3.6 1 0.0 1 0.0 1 0.0 1
> 0.0\n”
> This means, we have a multiplicity of 2, a phase shift cos(d) = 1 and a
> force constant of 3.6, which corresponds to two peaks at -120° and 120° in
> the range [-180°, 180°].
> The histogram looks like:
> Again, the potential is largely sampled as it’s supposed to be.
>
> Now, let’s have a look at the two bonds connecting atom 18 to the other
> two branches.
> For the bond between atoms 18/17, we should have a single peak at 0°,
> which we get most of the time
>
> The bond between atoms 18/19 has a multi-term potential, which is a bit
> more difficult to interpret from the numbers alone.
> "[NH1:1][CX4:2]!@;-[CX3:3]=[O:4] 1 0.0 -1 2.1 -1 -0.1 -1 -1.0 1 -0.4 1
> -0.1\n"
> It has three peaks at -150°, 0° and 150° in the range [-180°, 180°]. The
> peak at 0° is broad.
> The histogram looks like:
> For the 100 conformers, the peaks at -150° and 150° 

Re: [Rdkit-discuss] Conformer generation does not sample well?

2016-06-29 Thread Sereina
Hi Tim,

I had a look at the 1DWD example and I detail in the following the analysis I 
did as it may be useful to other users.

The conformer generation function has the option to print the experimental 
torsion preferences that were used in the generation:
printExpTorsionAngles=True

For 1DWD, it gives the following output:

> ref = 
> Chem.MolFromSmiles('NC(=[NH2+])c1ccc(C[C@@H](NC(=O)CNS(=O)(=O)c2ccc3c3c2)C(=O)N2C2)cc1’)
> mol1 = 
> Chem.MolFromPDBFile(RDConfig.RDBaseDir+'/rdkit/Chem/test_data/1DWD_ligand.pdb')
> mol1 = AllChem.AssignBondOrdersFromTemplate(ref, mol1)
> mol1 = Chem.AddHs(mol1)
> numConfs = 100
> cids = AllChem.EmbedMultipleConfs(mol1, numConfs=numConfs, 
> useExpTorsionAnglePrefs=True, useBasicKnowledge=True, 
> printExpTorsionAngles=True)
O=[S:1](=O)[NX3H1:2]!@;-[CX4H2:3][!#1:4]: 0 13 14 15, (17.9, -13.3, 9.2, 4.7, 
-2.3, -0.9) 
O=[C:1][NX3H1:2]!@;-[CX4H1:3][H:4]: 15 17 18 48, (5, 0, 0, 0, 0, 0) 
[O:1]=[CX3:2]!@;-[NX3H0:3][!#1:4]: 20 19 31 32, (0, 8, 0, 0, 0, 0) 
[O:1]=[CX3:2]!@;-[NX3H1:3][!#1:4]: 16 15 17 18, (100, 0, 0, 0, 0, 0) 
[c:1][S:2](=O)(=O)!@;-[NX3H1:3][C:4]: 4 0 13 14, (0, 16, 5, 7, 0, 0) 
[aH1:1][c:2]([aH1])!@;-[SX4:3][!#1:4]: 3 4 0 13, (0, 1.5, 0, 0, 0, 0) 
N[C:2](=[O:1])!@;-[CH2:3][N:4]: 16 15 14 13, (0, 10, 0, 0, 0, 0) 
[!#1:1][CX4:2]!@;-[CX4:3][!#1:4]: 17 18 21 22, (0, 0, 7, 0, 0, 0) 
[NH1:1][CX4:2]!@;-[CX3:3]=[O:4]: 17 18 19 20, (0, 2.1, -0.1, -1, -0.4, -0.1) 
[cH1:1][c:2]([cH1])!@;-[CX4H2:3][CX4:4]: 23 22 21 18, (0, 3.6, 0, 0, 0, 0) 
[a:1][c:2]!@;-[C:3](=[N:4]): 25 27 28 30, (0, 5, 0, 0, 0, 0) 
[*:1][X3,X2:2]=[X3,X2:3][*:4]: 27 28 30 57, (0, 100, 0, 0, 0, 0)

The output is structured as follows: First comes the SMARTS pattern, then the 
indices of the four atoms involved and finally the six force constants K_1, 
K_2, K_3, K_4, K_5, K_6 for the torsion potential (see Eq. (2) in JCIM, 55, 
2562, 2015).
The position of the non-zero force constant tells you the multiplicity i of the 
cosine. E.g. for the torsion between atoms 18 and 21 the third force constant 
is 7.0 and all others are zero, thus the torsion potential for this bond has a 
multiplicity i = 3 (i.e. three maxima). The information about the phase shift 
is not exposed, but we can change that if people find it useful. 
Currently to get the full information about the torsion potential, you need to 
search for the SMARTS pattern in 
Code/GraphMol/ForceFieldHelpers/CrystalFF/TorsionPreferences.cpp
There you find the line: "[!#1:1][CX4:2]!@;-[CX4:3][!#1:4] 1 0.0 1 0.0 1 7.0 1 
0.0 1 0.0 1 0.0\n"
Here we have twelve parameters, always two for a given multiplicity: first the 
phase shift (can be 1 or -1) and second the force constant.
For this particular torsion pattern we have a multiplicity of 3, a phase shift 
of cos(d) = 1 and a force constant K_3 = 7.0, which corresponds to three peaks 
at 60°, 180° and 300° in the range [0°, 360°] (or -60°, 60° and 180° in the 
range [-180°, 180°]).

The two bonds connecting the benzamidine ring to the rest of the molecule are 
between atoms 22/21 and 21/18.
I calculated the signed dihedral angle for these two bonds, here is the code 
for the second one:

> angles = []
> for cid in cids:
>conf = mol1.GetConformer(id=cid) 
># torsion of atoms 17 18 21 22
>p1 = conf.GetAtomPosition(17)
>p2 = conf.GetAtomPosition(18)
>p3 = conf.GetAtomPosition(21)
>p4 = conf.GetAtomPosition(22)
>a = Geometry.ComputeSignedDihedralAngle(p1, p2, p3, p4)/math.pi * 180.0
>angles.append(a)

For 100 confs, this gives the following distribution:

So, the torsion potential is sampled as it is supposed to do. 


For the other bond between atmos 22/21, we have the following line in 
TorsionPreferences.cpp:
"[cH1:1][c:2]([cH1])!@;-[CX4H2:3][CX4:4] 1 0.0 1 3.6 1 0.0 1 0.0 1 0.0 1 0.0\n”
This means, we have a multiplicity of 2, a phase shift cos(d) = 1 and a force 
constant of 3.6, which corresponds to two peaks at -120° and 120° in the range 
[-180°, 180°].
The histogram looks like:

Again, the potential is largely sampled as it’s supposed to be.

Now, let’s have a look at the two bonds connecting atom 18 to the other two 
branches.
For the bond between atoms 18/17, we should have a single peak at 0°, which we 
get most of the time


The bond between atoms 18/19 has a multi-term potential, which is a bit more 
difficult to interpret from the numbers alone.
"[NH1:1][CX4:2]!@;-[CX3:3]=[O:4] 1 0.0 -1 2.1 -1 -0.1 -1 -1.0 1 -0.4 1 -0.1\n"
It has three peaks at -150°, 0° and 150° in the range [-180°, 180°]. The peak 
at 0° is broad.

The histogram looks like:

For the 100 conformers, the peaks at -150° and 150° are not sampled. I 
therefore generated 1000 conformers, but the sampling does not really improve:


The algorithm of ETKDG takes a randomly generated conformer from standard 
distance geometry and minimizes it with the ET and K terms. For each bond with 
an exp. torsion term, the torsional angle will be minimized to the closest 
minima. In other words, ETKDG relies 

[Rdkit-discuss] Conformer generation does not sample well?

2016-06-22 Thread Tim Dudgeon
This topic (https://sourceforge.net/p/rdkit/mailman/message/35173301/) 
discussed using conformer generation as input into Open3DAlign.

One thing I noticed is that the conformer generation (using the 
useExpTorsionAnglePrefs=True and
useBasicKnowledge=True options) does not generate conformers that align 
well for this example. The input is based on the 1DWD_ligand.pdb 
structure in the RDKit distro. What I find is the the rotation of the 
benzamidine ring is never in the right place for alignment (the other 
two ring systems align well). This is when generating up to 10,000 
conformers.

Does this suggest that the conformer generation does not sample 
conformational space very effectively? Are there options for improving this?

Thanks

Tim



--
Attend Shape: An AT Tech Expo July 15-16. Meet us at AT Park in San
Francisco, CA to explore cutting-edge tech and listen to tech luminaries
present their vision of the future. This family event has something for
everyone, including kids. Get more information and register today.
http://sdm.link/attshape
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss