Re: [Rdkit-discuss] Strange MMFF94 Optimization Results?

2017-11-03 Thread Patrick Avery
I started using the SD format instead, and it fixed my problems (the
molecules now have correct bond orders and they are MMFF94 optimized
properly).

Thanks for the help!

On Thu, Nov 2, 2017 at 3:21 PM, Paolo Tosco  wrote:

> Hi Patrick,
>
> I don't know Avogadro in detail, but as it is based on OpenBabel I can
> imagine it uses an algorithm to guess bond orders from bond distances,
> angles, etc., whereas the RDKit does not. In fact, if I convert a benzene
> ring from SMILES to SDF through PDB format (which does not contain any bond
> information once CONECT records are removed), I still get correct bond
> orders; this does not happen with RDKit (which is expected):
>
> echo 'c1c1' | babel --gen3d -i smi -o pdb | grep -v CONECT | babel -i
> pdb -o sdf
>
> yields
>
>
>  OpenBabel11021718533D
>
>  12 12  0  0  0  0  0  0  0  0999 V2000
>-0.76001.1690   -0.0010 C   0  0  0  0  0  0  0  0  0  0  0  0
> 0.63301.2450   -0.0010 C   0  0  0  0  0  0  0  0  0  0  0  0
> 1.39500.07700. C   0  0  0  0  0  0  0  0  0  0  0  0
> 0.7640   -1.16800.0030 C   0  0  0  0  0  0  0  0  0  0  0  0
>-0.6290   -1.24300. C   0  0  0  0  0  0  0  0  0  0  0  0
>-1.3910   -0.0750   -0.0020 C   0  0  0  0  0  0  0  0  0  0  0  0
>-1.35402.07900.0010 H   0  0  0  0  0  0  0  0  0  0  0  0
> 1.12402.2140   -0.0030 H   0  0  0  0  0  0  0  0  0  0  0  0
> 2.48000.1350   -0. H   0  0  0  0  0  0  0  0  0  0  0  0
> 1.3580   -2.07800.0060 H   0  0  0  0  0  0  0  0  0  0  0  0
>-1.1200   -2.2130   -0. H   0  0  0  0  0  0  0  0  0  0  0  0
>-2.4760   -0.1340   -0.0030 H   0  0  0  0  0  0  0  0  0  0  0  0
>   1  2  1  0  0  0  0
>   1  7  1  0  0  0  0
>   2  3  2  0  0  0  0
>   3  9  1  0  0  0  0
>   3  4  1  0  0  0  0
>   4 10  1  0  0  0  0
>   5 11  1  0  0  0  0
>   5  4  2  0  0  0  0
>   6  1  2  0  0  0  0
>   6  5  1  0  0  0  0
>   8  2  1  0  0  0  0
>  12  6  1  0  0  0  0
> M  END
>
> whereas:
>
> mol = Chem.AddHs(Chem.MolFromSmiles('c1c1'))
> rdDistGeom.EmbedMolecule(mol)
> molFromPDB = Chem.MolFromPDBBlock(re.sub('CONECT.*\n', '',
> Chem.MolToPDBBlock(mol), flags=re.MULTILINE), removeHs = False)
> print (Chem.MolToMolBlock(molFromPDB))
>
> yields
>
>
>  RDKit  3D
>
>  12 12  0  0  0  0  0  0  0  0999 V2000
>-0.0060   -1.36200.0590 C   0  0  0  0  0  0  0  0  0  0  0  0
>-1.2180   -0.70800.0690 C   0  0  0  0  0  0  0  0  0  0  0  0
>-1.20900.6740   -0.0030 C   0  0  0  0  0  0  0  0  0  0  0  0
>-0.00301.36200.0280 C   0  0  0  0  0  0  0  0  0  0  0  0
> 1.22100.70900.0760 C   0  0  0  0  0  0  0  0  0  0  0  0
> 1.1960   -0.67800. C   0  0  0  0  0  0  0  0  0  0  0  0
>-0.0250   -2.4100   -0.2260 H   0  0  0  0  0  0  0  0  0  0  0  0
>-2.1060   -1.17600.4870 H   0  0  0  0  0  0  0  0  0  0  0  0
>-2.09101.1950   -0.3790 H   0  0  0  0  0  0  0  0  0  0  0  0
> 0.04202.4080   -0.2780 H   0  0  0  0  0  0  0  0  0  0  0  0
> 2.10301.18300.5000 H   0  0  0  0  0  0  0  0  0  0  0  0
> 2.0960   -1.1980   -0.3330 H   0  0  0  0  0  0  0  0  0  0  0  0
>   2  1  1  0
>   3  2  1  0
>   4  3  1  0
>   5  4  1  0
>   6  5  1  0
>   6  1  1  0
>   7  1  1  0
>   8  2  1  0
>   9  3  1  0
>  10  4  1  0
>  11  5  1  0
>  12  6  1  0
> M  END
>
> Note that the RDKit gets the correct bond orders if the PDB file contains
> CONECT records with duplicate (or triplicate) entries for the relevant atom
> indices, which is the widely accepted (though non official) way to encode
> bond orders in PDB files.
>
> Cheers,
> p.
>
> On 11/02/17 18:45, Patrick Avery wrote:
>
> Yes, that is probably correct. I am loading a pdb file for the initial
> conformer, so bond orders are not specified.
>
> But I find it strange, still, that when I use MMFF94 in Avogadro to
> optimize it, it results in a planar shape even though all the bonds are
> still single.
>
> On Thu, Nov 2, 2017 at 2:38 PM, Paolo Tosco  wrote:
>
>> Dear Patrick,
>>
>> my guess is that you loaded the caffeine coordinates from PDB, or anyway
>> from a format where bond orders were not specified. All atoms appear to be
>> sp3-hybridized, which results in the wrong geometry being generated.
>>
>> Hope that helps,
>> Paolo
>>
>> On 11/02/17 18:20, Patrick Avery wrote:
>>
>> Hey there RDKitters,
>>
>> I have been generating conformers in RDKit, optimizing them using MMFF94,
>> and sorting them by their energies. For some tests, I have been using
>> caffeine. But I seem to have some strange results, and I wonder if anyone
>> knows why.
>>
>> I have attached two images of an MMFF94 optimized caffeine conformer. In
>> them, the oxygen in the top right corner is the primary strange thing I see
>> (although it doesn't seem to be very planar either, which may be somewhat
>> strange).
>>
>> Note that the oxygen is 

Re: [Rdkit-discuss] Find difference(s) between molecules

2017-11-03 Thread Greg Landrum
Hi Jens,

Ed provided some good pointers to the matched pair literature, which is
certainly one way to approach the problem. There are RDKit implementations
of MMP analysis.

A somewhat more direct (though limited) approach would be to use the MCS
code to find the maximum common substructure of your set of molecules and
then remove that with the DeleteSubstructs() function:

In [14]: smis =
('c1c1F','c1ccc(Cl)cc1F','c1cc(CCO)ccc1','c1cc(C1CC1)ccc1F',
...: )

In [15]: ms = [Chem.MolFromSmiles(x) for x in smis]

In [16]: from rdkit.Chem import rdFMCS

In [17]: mcs = rdFMCS.FindMCS(ms)

In [18]: patt = mcs.smartsString

In [19]: patt
Out[19]: '[#6](:,-[#6]:,-[#6]:,-[#6]):,-[#6]:,-[#6]'

In [20]: commonMol =Chem.MolFromSmarts(patt)

In [21]: diffs = [Chem.DeleteSubstructs(x,commonMol) for x in ms]

In [22]: [Chem.MolToSmiles(x) for x in diffs]
Out[22]: ['F', 'Cl.F', 'O', 'C.CC.F']

This won't solve every problem, but it's a quick start.

-greg



On Fri, Nov 3, 2017 at 8:15 AM, Jens Kristian Munk <
jens.kristian.m...@regionh.dk> wrote:

> Hi list,
>
>
>
> I’ve searched far and wide for an answer to this; I apologize if the
> answer is obvious...
>
>
>
> I can use rdFMCS (http://www.rdkit.org/Python_
> Docs/rdkit.Chem.rdFMCS-module.html) to find the maximum common
> substructure of a set of molecules... But how do I find the difference(s)
> between two (or more) molecules?
>
>
>
> I work with lipids a lot, so for example, the difference between palmitoic
> acid (C16:0) and stearic acid (C18:0) is SMILES ‘CC’. I would like RDkit to
> tell me just that, as well as tell me where on the maximum common
> substructure (which in this example is palmitoic acid) to add the ‘CC’ to
> get stearic acid – i.e. on the terminus of the fatty chain.
>
>
>
> Any ideas?
>
>
>
> The example above is just the first step. After that comes identifying and
> locating double bonds in the fatty chains... And then jump to
> phospholipids, with two fatty chains and a head group... J
>
>
>
> Med venlig hilsen
>
>
>
> *Jens Kristian Munk*
>
> Kemiker, Cand. Scient., Ph.D.
>
>
>
> Telefon: 3862 0398
>
> Mobil: 5142 3483
>
> E-mail: *jens.kristian.m...@regionh.dk *
>
>
>
> Klinisk Biokemisk afdeling
>
> Amager og Hvidovre hospital
>
> Kettegård Allé 30
>
> 2650 Hvidovre
>
>
> Web: www.regionh.dk
>
>
>
> --
>
>
> Denne e-mail indeholder fortrolig information. Hvis du ikke er den rette
> modtager af denne e-mail eller hvis du modtager den ved en fejltagelse,
> beder vi dig venligst informere afsender om fejlen ved at bruge
> svarfunktionen. Samtidig bedes du slette e-mailen med det samme uden at
> videresende eller kopiere den.
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] Find difference(s) between molecules

2017-11-03 Thread Jens Kristian Munk
Hi list,

I've searched far and wide for an answer to this; I apologize if the answer is 
obvious...

I can use rdFMCS 
(http://www.rdkit.org/Python_Docs/rdkit.Chem.rdFMCS-module.html) to find the 
maximum common substructure of a set of molecules... But how do I find the 
difference(s) between two (or more) molecules?

I work with lipids a lot, so for example, the difference between palmitoic acid 
(C16:0) and stearic acid (C18:0) is SMILES 'CC'. I would like RDkit to tell me 
just that, as well as tell me where on the maximum common substructure (which 
in this example is palmitoic acid) to add the 'CC' to get stearic acid - i.e. 
on the terminus of the fatty chain.

Any ideas?

The example above is just the first step. After that comes identifying and 
locating double bonds in the fatty chains... And then jump to phospholipids, 
with two fatty chains and a head group... :)

Med venlig hilsen

Jens Kristian Munk
Kemiker, Cand. Scient., Ph.D.

Telefon: 3862 0398
Mobil: 5142 3483
E-mail: jens.kristian.m...@regionh.dk

Klinisk Biokemisk afdeling
Amager og Hvidovre hospital
Kettegård Allé 30
2650 Hvidovre

Web: www.regionh.dk





Denne e-mail indeholder fortrolig information. Hvis du ikke er den rette 
modtager af denne e-mail eller hvis du modtager den ved en fejltagelse, beder 
vi dig venligst informere afsender om fejlen ved at bruge svarfunktionen. 
Samtidig bedes du slette e-mailen med det samme uden at videresende eller 
kopiere den.
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss