Hi,

Thank you, Ed and Greg, for the good suggestions. Your input is certainly useful.

I come from R, but I had failed to find tools there that could do the job, so I 
turned to RDKit. Your input actually led me back to R and to fmcsR, a companion 
package to ChemmineR.

Thanks again.

Kind regards

Jens Kristian Munk
Chemist, M.Sc., Ph.D.

Phone: 3862 0398
Mobile: 5142 3483
E-mail: jens.kristian.m...@regionh.dk

Department of Clinical Biochemistry
Amager and Hvidovre Hospital
Kettegård Allé 30
2650 Hvidovre

Web: www.regionh.dk

From: Greg Landrum [mailto:greg.land...@gmail.com]
Sent: Friday, November 03, 2017 9:41 PM
To: Jens Kristian Munk
Cc: Rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] Find difference(s) between molecules

Hi Jens,

Ed provided some good pointers to the matched pair literature, which is 
certainly one way to approach the problem. There are RDKit implementations of 
MMP analysis.

A somewhat more direct (though limited) approach would be to use the MCS code 
to find the maximum common substructure of your set of molecules and then 
remove that with the DeleteSubstructs() function:

In [13]: from rdkit import Chem

In [14]: smis = ('c1ccccc1F','c1ccc(Cl)cc1F','c1cc(CCO)ccc1','c1cc(C1CC1)ccc1F')

In [15]: ms = [Chem.MolFromSmiles(x) for x in smis]

In [16]: from rdkit.Chem import rdFMCS

In [17]: mcs = rdFMCS.FindMCS(ms)

In [18]: patt = mcs.smartsString

In [19]: patt
Out[19]: '[#6](:,-[#6]:,-[#6]:,-[#6]):,-[#6]:,-[#6]'

In [20]: commonMol = Chem.MolFromSmarts(patt)

In [21]: diffs = [Chem.DeleteSubstructs(x,commonMol) for x in ms]

In [22]: [Chem.MolToSmiles(x) for x in diffs]
Out[22]: ['F', 'Cl.F', 'O', 'C.CC.F']

This won't solve every problem, but it's a quick start.
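
The session above can be condensed into a small script. This is only a sketch under a few assumptions that are not in the original session: the helper name residual_fragments is made up for illustration, the ringMatchesRingOnly and completeRingsOnly options of FindMCS are switched on so that the common core is a whole ring rather than a path of carbons, and the fourth SMILES is written with ring-closure digit 2 for the cyclopropyl group. In the session above that SMILES reuses digit 1 inside the branch, so it closes the first ring early and does not encode a cyclopropyl-substituted fluorobenzene, which is probably why the reported MCS is an open chain.

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS


def residual_fragments(smiles_list):
    """Hypothetical helper: return the MCS SMARTS and, for each molecule,
    the SMILES of whatever is left once the MCS atoms are deleted."""
    mols = [Chem.MolFromSmiles(smi) for smi in smiles_list]
    # Assumption: only match ring atoms to ring atoms and keep rings complete,
    # so the common core of these molecules is the full benzene ring.
    mcs = rdFMCS.FindMCS(mols, ringMatchesRingOnly=True, completeRingsOnly=True)
    core = Chem.MolFromSmarts(mcs.smartsString)
    # DeleteSubstructs removes every atom that takes part in a match of the core.
    leftovers = [Chem.MolToSmiles(Chem.DeleteSubstructs(m, core)) for m in mols]
    return mcs.smartsString, leftovers


# Same molecules as in the session, with the cyclopropyl ring closure as digit 2.
smis = ('c1ccccc1F', 'c1ccc(Cl)cc1F', 'c1cc(CCO)ccc1', 'c1cc(C2CC2)ccc1F')
smarts, leftovers = residual_fragments(smis)
print(smarts)
for smi, rest in zip(smis, leftovers):
    print(smi, '->', rest)
```

Disconnected leftovers come back as dot-separated fragments in one SMILES string, so splitting on '.' gives the individual substituents. The attachment positions are lost in this approach, which is one of its limits.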

-greg



On Fri, Nov 3, 2017 at 8:15 AM, Jens Kristian Munk 
<jens.kristian.m...@regionh.dk> wrote:
Hi list,

I’ve searched far and wide for an answer to this; I apologize if the answer is 
obvious...

I can use rdFMCS 
(http://www.rdkit.org/Python_Docs/rdkit.Chem.rdFMCS-module.html) to find the 
maximum common substructure of a set of molecules... But how do I find the 
difference(s) between two (or more) molecules?

I work with lipids a lot, so for example, the difference between palmitic acid 
(C16:0) and stearic acid (C18:0) is the SMILES ‘CC’. I would like RDKit to tell me 
just that, as well as tell me where on the maximum common substructure (which 
in this example is palmitic acid) to add the ‘CC’ to get stearic acid, i.e. 
on the terminus of the fatty chain.

Any ideas?

The example above is just the first step. After that comes identifying and 
locating double bonds in the fatty chains, and then the jump to phospholipids, 
with two fatty chains and a head group... ☺
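
For the palmitic/stearic example, one possible way to get both the difference and the place where it attaches is to match the MCS back onto the larger molecule and look at the atoms the match does not cover. The following is a rough sketch, not a general solution: the function name difference_and_attachment and the dictionary keys are hypothetical, default FindMCS settings are assumed, and only the first substructure match is used.

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS


def difference_and_attachment(small_smiles, large_smiles):
    """Hypothetical helper: SMILES of the atoms the larger molecule has
    beyond the MCS, plus the bonds that connect them to the MCS."""
    small = Chem.MolFromSmiles(small_smiles)
    large = Chem.MolFromSmiles(large_smiles)
    mcs = rdFMCS.FindMCS([small, large])
    core = Chem.MolFromSmarts(mcs.smartsString)

    # match[i] is the atom index in `large` that core atom i was mapped to.
    match = large.GetSubstructMatch(core)
    matched = set(match)
    extra = [a.GetIdx() for a in large.GetAtoms() if a.GetIdx() not in matched]

    # A bond between an unmatched atom and a matched atom is an attachment point.
    attachments = []
    for idx in extra:
        for nbr in large.GetAtomWithIdx(idx).GetNeighbors():
            if nbr.GetIdx() in matched:
                attachments.append({
                    'core_atom': match.index(nbr.GetIdx()),  # index in the MCS query
                    'atom_in_large': nbr.GetIdx(),           # index in the larger molecule
                    'first_extra_atom': idx,                 # unmatched atom bonded to it
                })

    diff_smiles = Chem.MolFragmentToSmiles(large, atomsToUse=extra) if extra else ''
    return diff_smiles, attachments


palmitic = 'C' * 15 + 'C(=O)O'  # C16:0, written from the methyl end
stearic = 'C' * 17 + 'C(=O)O'   # C18:0, written from the methyl end
diff_smiles, attachments = difference_and_attachment(palmitic, stearic)
print(diff_smiles)
print(attachments)
```

Atom indices follow the order of atoms in the input SMILES, so here the two unmatched carbons are the first two atoms of the stearic acid string, i.e. the methyl end of the chain. A difference that is a change rather than an addition, such as a double bond in place of a single bond, would not show up as unmatched atoms and needs a different treatment.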


_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

