Re: [Rdkit-discuss] Strange MMFF94 Optimization Results?

2017-11-03 Thread Patrick Avery
I started using the SD format instead, and it fixed my problems (the
molecules now have correct bond orders and they are MMFF94 optimized
properly).

Thanks for the help!

On Thu, Nov 2, 2017 at 3:21 PM, Paolo Tosco  wrote:

> Hi Patrick,
>
> I don't know Avogadro in detail, but as it is based on OpenBabel I can
> imagine it uses an algorithm to guess bond orders from bond distances,
> angles, etc., whereas the RDKit does not. In fact, if I convert a benzene
> ring from SMILES to SDF through PDB format (which does not contain any bond
> information once CONECT records are removed), I still get correct bond
> orders; this does not happen with RDKit (which is expected):
>
> echo 'c1c1' | babel --gen3d -i smi -o pdb | grep -v CONECT | babel -i
> pdb -o sdf
>
> yields
>
>
>  OpenBabel11021718533D
>
>  12 12  0  0  0  0  0  0  0  0999 V2000
>-0.76001.1690   -0.0010 C   0  0  0  0  0  0  0  0  0  0  0  0
> 0.63301.2450   -0.0010 C   0  0  0  0  0  0  0  0  0  0  0  0
> 1.39500.07700. C   0  0  0  0  0  0  0  0  0  0  0  0
> 0.7640   -1.16800.0030 C   0  0  0  0  0  0  0  0  0  0  0  0
>-0.6290   -1.24300. C   0  0  0  0  0  0  0  0  0  0  0  0
>-1.3910   -0.0750   -0.0020 C   0  0  0  0  0  0  0  0  0  0  0  0
>-1.35402.07900.0010 H   0  0  0  0  0  0  0  0  0  0  0  0
> 1.12402.2140   -0.0030 H   0  0  0  0  0  0  0  0  0  0  0  0
> 2.48000.1350   -0. H   0  0  0  0  0  0  0  0  0  0  0  0
> 1.3580   -2.07800.0060 H   0  0  0  0  0  0  0  0  0  0  0  0
>-1.1200   -2.2130   -0. H   0  0  0  0  0  0  0  0  0  0  0  0
>-2.4760   -0.1340   -0.0030 H   0  0  0  0  0  0  0  0  0  0  0  0
>   1  2  1  0  0  0  0
>   1  7  1  0  0  0  0
>   2  3  2  0  0  0  0
>   3  9  1  0  0  0  0
>   3  4  1  0  0  0  0
>   4 10  1  0  0  0  0
>   5 11  1  0  0  0  0
>   5  4  2  0  0  0  0
>   6  1  2  0  0  0  0
>   6  5  1  0  0  0  0
>   8  2  1  0  0  0  0
>  12  6  1  0  0  0  0
> M  END
>
> whereas:
>
> mol = Chem.AddHs(Chem.MolFromSmiles('c1c1'))
> rdDistGeom.EmbedMolecule(mol)
> molFromPDB = Chem.MolFromPDBBlock(re.sub('CONECT.*\n', '',
> Chem.MolToPDBBlock(mol), flags=re.MULTILINE), removeHs = False)
> print (Chem.MolToMolBlock(molFromPDB))
>
> yields
>
>
>  RDKit  3D
>
>  12 12  0  0  0  0  0  0  0  0999 V2000
>-0.0060   -1.36200.0590 C   0  0  0  0  0  0  0  0  0  0  0  0
>-1.2180   -0.70800.0690 C   0  0  0  0  0  0  0  0  0  0  0  0
>-1.20900.6740   -0.0030 C   0  0  0  0  0  0  0  0  0  0  0  0
>-0.00301.36200.0280 C   0  0  0  0  0  0  0  0  0  0  0  0
> 1.22100.70900.0760 C   0  0  0  0  0  0  0  0  0  0  0  0
> 1.1960   -0.67800. C   0  0  0  0  0  0  0  0  0  0  0  0
>-0.0250   -2.4100   -0.2260 H   0  0  0  0  0  0  0  0  0  0  0  0
>-2.1060   -1.17600.4870 H   0  0  0  0  0  0  0  0  0  0  0  0
>-2.09101.1950   -0.3790 H   0  0  0  0  0  0  0  0  0  0  0  0
> 0.04202.4080   -0.2780 H   0  0  0  0  0  0  0  0  0  0  0  0
> 2.10301.18300.5000 H   0  0  0  0  0  0  0  0  0  0  0  0
> 2.0960   -1.1980   -0.3330 H   0  0  0  0  0  0  0  0  0  0  0  0
>   2  1  1  0
>   3  2  1  0
>   4  3  1  0
>   5  4  1  0
>   6  5  1  0
>   6  1  1  0
>   7  1  1  0
>   8  2  1  0
>   9  3  1  0
>  10  4  1  0
>  11  5  1  0
>  12  6  1  0
> M  END
>
> Note that the RDKit gets the correct bond orders if the PDB file contains
> CONECT records with duplicate (or triplicate) entries for the relevant atom
> indices, which is the widely accepted (though non official) way to encode
> bond orders in PDB files.
>
> Cheers,
> p.
>
> On 11/02/17 18:45, Patrick Avery wrote:
>
> Yes, that is probably correct. I am loading a pdb file for the initial
> conformer, so bond orders are not specified.
>
> But I find it strange, still, that when I use MMFF94 in Avogadro to
> optimize it, it results in a planar shape even though all the bonds are
> still single.
>
> On Thu, Nov 2, 2017 at 2:38 PM, Paolo Tosco  wrote:
>
>> Dear Patrick,
>>
>> my guess is that you loaded the caffeine coordinates from PDB, or anyway
>> from a format where bond orders were not specified. All atoms appear to be
>> sp3-hybridized, which results in the wrong geometry being generated.
>>
>> Hope that helps,
>> Paolo
>>
>> On 11/02/17 18:20, Patrick Avery wrote:
>>
>> Hey there RDKitters,
>>
>> I have been generating conformers in RDKit, optimizing them using MMFF94,
>> and sorting them by their energies. For some tests, I have been using
>> caffeine. But I seem to have some strange results, and I wonder if anyone
>> knows why.
>>
>> I have attached two images of an MMFF94 optimized caffeine conformer. In
>> them, the oxygen in the top right corner is the primary strange thing I see
>> (although it doesn't seem to be very planar either, which may be somewhat
>> strange).
>>
>> Note that the oxygen is 

Re: [Rdkit-discuss] Strange MMFF94 Optimization Results?

2017-11-02 Thread Paolo Tosco

Hi Patrick,

I don't know Avogadro in detail, but as it is based on OpenBabel I can 
imagine it uses an algorithm to guess bond orders from bond distances, 
angles, etc., whereas the RDKit does not. In fact, if I convert a 
benzene ring from SMILES to SDF through PDB format (which does not 
contain any bond information once CONECT records are removed), I still 
get correct bond orders; this does not happen with RDKit (which is 
expected):


echo 'c1c1' | babel --gen3d -i smi -o pdb | grep -v CONECT | babel 
-i pdb -o sdf


yields


 OpenBabel11021718533D

 12 12  0  0  0  0  0  0  0  0999 V2000
   -0.76001.1690   -0.0010 C   0  0  0  0  0  0  0 0  0  0  0  0
0.63301.2450   -0.0010 C   0  0  0  0  0  0  0 0  0  0  0  0
1.39500.07700. C   0  0  0  0  0  0  0 0  0  0  0  0
0.7640   -1.16800.0030 C   0  0  0  0  0  0  0 0  0  0  0  0
   -0.6290   -1.24300. C   0  0  0  0  0  0  0 0  0  0  0  0
   -1.3910   -0.0750   -0.0020 C   0  0  0  0  0  0  0 0  0  0  0  0
   -1.35402.07900.0010 H   0  0  0  0  0  0  0 0  0  0  0  0
1.12402.2140   -0.0030 H   0  0  0  0  0  0  0 0  0  0  0  0
2.48000.1350   -0. H   0  0  0  0  0  0  0 0  0  0  0  0
1.3580   -2.07800.0060 H   0  0  0  0  0  0  0 0  0  0  0  0
   -1.1200   -2.2130   -0. H   0  0  0  0  0  0  0 0  0  0  0  0
   -2.4760   -0.1340   -0.0030 H   0  0  0  0  0  0  0 0  0  0  0  0
  1  2  1  0  0  0  0
  1  7  1  0  0  0  0
  2  3  2  0  0  0  0
  3  9  1  0  0  0  0
  3  4  1  0  0  0  0
  4 10  1  0  0  0  0
  5 11  1  0  0  0  0
  5  4  2  0  0  0  0
  6  1  2  0  0  0  0
  6  5  1  0  0  0  0
  8  2  1  0  0  0  0
 12  6  1  0  0  0  0
M  END

whereas:

mol = Chem.AddHs(Chem.MolFromSmiles('c1c1'))
rdDistGeom.EmbedMolecule(mol)
molFromPDB = Chem.MolFromPDBBlock(re.sub('CONECT.*\n', '', 
Chem.MolToPDBBlock(mol), flags=re.MULTILINE), removeHs = False)

print (Chem.MolToMolBlock(molFromPDB))

yields


 RDKit  3D

 12 12  0  0  0  0  0  0  0  0999 V2000
   -0.0060   -1.36200.0590 C   0  0  0  0  0  0  0 0  0  0  0  0
   -1.2180   -0.70800.0690 C   0  0  0  0  0  0  0 0  0  0  0  0
   -1.20900.6740   -0.0030 C   0  0  0  0  0  0  0 0  0  0  0  0
   -0.00301.36200.0280 C   0  0  0  0  0  0  0 0  0  0  0  0
1.22100.70900.0760 C   0  0  0  0  0  0  0 0  0  0  0  0
1.1960   -0.67800. C   0  0  0  0  0  0  0 0  0  0  0  0
   -0.0250   -2.4100   -0.2260 H   0  0  0  0  0  0  0 0  0  0  0  0
   -2.1060   -1.17600.4870 H   0  0  0  0  0  0  0 0  0  0  0  0
   -2.09101.1950   -0.3790 H   0  0  0  0  0  0  0 0  0  0  0  0
0.04202.4080   -0.2780 H   0  0  0  0  0  0  0 0  0  0  0  0
2.10301.18300.5000 H   0  0  0  0  0  0  0 0  0  0  0  0
2.0960   -1.1980   -0.3330 H   0  0  0  0  0  0  0 0  0  0  0  0
  2  1  1  0
  3  2  1  0
  4  3  1  0
  5  4  1  0
  6  5  1  0
  6  1  1  0
  7  1  1  0
  8  2  1  0
  9  3  1  0
 10  4  1  0
 11  5  1  0
 12  6  1  0
M  END

Note that the RDKit gets the correct bond orders if the PDB file 
contains CONECT records with duplicate (or triplicate) entries for the 
relevant atom indices, which is the widely accepted (though non 
official) way to encode bond orders in PDB files.


Cheers,
p.


On 11/02/17 18:45, Patrick Avery wrote:
Yes, that is probably correct. I am loading a pdb file for the initial 
conformer, so bond orders are not specified.


But I find it strange, still, that when I use MMFF94 in Avogadro to 
optimize it, it results in a planar shape even though all the bonds 
are still single.


On Thu, Nov 2, 2017 at 2:38 PM, Paolo Tosco > wrote:


Dear Patrick,

my guess is that you loaded the caffeine coordinates from PDB, or
anyway from a format where bond orders were not specified. All
atoms appear to be sp3-hybridized, which results in the wrong
geometry being generated.

Hope that helps,
Paolo


On 11/02/17 18:20, Patrick Avery wrote:

Hey there RDKitters,

I have been generating conformers in RDKit, optimizing them using
MMFF94, and sorting them by their energies. For some tests, I
have been using caffeine. But I seem to have some strange
results, and I wonder if anyone knows why.

I have attached two images of an MMFF94 optimized caffeine
conformer. In them, the oxygen in the top right corner is the
primary strange thing I see (although it doesn't seem to be very
planar either, which may be somewhat strange).

Note that the oxygen is sticking out of the plane of the
molecule. This is the lowest energy conformer generated and
optimized by RDKit (and other conformers low in energy are
similar to it - with the oxygen sticking out).

If I take that exact molecule and MMFF94 optimize it in Avogadro,
all the atoms move to be in the same plane (including the oxygen
that was sticking out). So the problem goes 

Re: [Rdkit-discuss] Strange MMFF94 Optimization Results?

2017-11-02 Thread Patrick Avery
Yes, that is probably correct. I am loading a pdb file for the initial
conformer, so bond orders are not specified.

But I find it strange, still, that when I use MMFF94 in Avogadro to
optimize it, it results in a planar shape even though all the bonds are
still single.

On Thu, Nov 2, 2017 at 2:38 PM, Paolo Tosco  wrote:

> Dear Patrick,
>
> my guess is that you loaded the caffeine coordinates from PDB, or anyway
> from a format where bond orders were not specified. All atoms appear to be
> sp3-hybridized, which results in the wrong geometry being generated.
>
> Hope that helps,
> Paolo
>
> On 11/02/17 18:20, Patrick Avery wrote:
>
> Hey there RDKitters,
>
> I have been generating conformers in RDKit, optimizing them using MMFF94,
> and sorting them by their energies. For some tests, I have been using
> caffeine. But I seem to have some strange results, and I wonder if anyone
> knows why.
>
> I have attached two images of an MMFF94 optimized caffeine conformer. In
> them, the oxygen in the top right corner is the primary strange thing I see
> (although it doesn't seem to be very planar either, which may be somewhat
> strange).
>
> Note that the oxygen is sticking out of the plane of the molecule. This is
> the lowest energy conformer generated and optimized by RDKit (and other
> conformers low in energy are similar to it - with the oxygen sticking out).
>
> If I take that exact molecule and MMFF94 optimize it in Avogadro, all the
> atoms move to be in the same plane (including the oxygen that was sticking
> out). So the problem goes away if I MMFF94 optimize it with Avogadro.
>
> So the question is: why are RDKit and Avogadro giving different results
> for the MMFF94 optimization? And why does RDKit's MMFF94 push the oxygen
> out of the plane so much?
>
> In RDKit, I am using the C++ function RDKit::MMFF::
> MMFFOptimizeMoleculeConfs().
>
> My parameters are the mol (with all the conformers in it), an empty result
> vector, 1 thread, 1000 maximum optimization iterations, "MMFF94" for the
> mmffVariant, 100.0 for the nonbonded threshold, and true for
> ignoreInterfragInteractions.
>
> Let me know if anyone knows why.
>
> Thanks,
> Patrick
>
>
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>
>
>
> ___
> Rdkit-discuss mailing 
> listRdkit-discuss@lists.sourceforge.nethttps://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Strange MMFF94 Optimization Results?

2017-11-02 Thread Paolo Tosco

Dear Patrick,

my guess is that you loaded the caffeine coordinates from PDB, or anyway 
from a format where bond orders were not specified. All atoms appear to 
be sp3-hybridized, which results in the wrong geometry being generated.


Hope that helps,
Paolo


On 11/02/17 18:20, Patrick Avery wrote:

Hey there RDKitters,

I have been generating conformers in RDKit, optimizing them using 
MMFF94, and sorting them by their energies. For some tests, I have 
been using caffeine. But I seem to have some strange results, and I 
wonder if anyone knows why.


I have attached two images of an MMFF94 optimized caffeine conformer. 
In them, the oxygen in the top right corner is the primary strange 
thing I see (although it doesn't seem to be very planar either, which 
may be somewhat strange).


Note that the oxygen is sticking out of the plane of the molecule. 
This is the lowest energy conformer generated and optimized by RDKit 
(and other conformers low in energy are similar to it - with the 
oxygen sticking out).


If I take that exact molecule and MMFF94 optimize it in Avogadro, all 
the atoms move to be in the same plane (including the oxygen that was 
sticking out). So the problem goes away if I MMFF94 optimize it with 
Avogadro.


So the question is: why are RDKit and Avogadro giving different 
results for the MMFF94 optimization? And why does RDKit's MMFF94 push 
the oxygen out of the plane so much?


In RDKit, I am using the C++ function 
RDKit::MMFF::MMFFOptimizeMoleculeConfs().


My parameters are the mol (with all the conformers in it), an empty 
result vector, 1 thread, 1000 maximum optimization iterations, 
"MMFF94" for the mmffVariant, 100.0 for the nonbonded threshold, and 
true for ignoreInterfragInteractions.


Let me know if anyone knows why.

Thanks,
Patrick


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss