Re: [Rdkit-discuss] Stereochemistry - Differences between RDKit and Indigo

2015-08-21 Thread James Davidson
Hi Greg (and Markus, Peter, et al.),

Personal opinion - my vote would be to always keep the chiral information at 
3-valent nitrogen centres.
As Peter pointed out, there are bridgehead examples (most of which, I guess, 
will have additional carbon chiral centres and so raise diastereomeric 
considerations).
There are also, I believe, some nice oxaziridine examples where the oxaziridine 
N is the only chiral centre present (my reading of the abstract here: 
http://dx.doi.org/10.1039/C3985998):

3,3-dimethyl (2S)-2-tert-butyloxaziridine-3,3-dicarboxylate
COC(=O)C1(O[N@]1C(C)(C)C)C(=O)OC

There are also many other examples of diastereomeric oxaziridines in which the 
N is a chiral centre, e.g. see http://dx.doi.org/10.1016/j.tetasy.2008.09.016


Kind regards

James

__
PLEASE READ: This email is confidential and may be privileged. It is intended 
for the named addressee(s) only and access to it by anyone else is 
unauthorised. If you are not an addressee, any disclosure or copying of the 
contents of this email or any action taken (or not taken) in reliance on it is 
unauthorised and may be unlawful. If you have received this email in error, 
please notify the sender or postmas...@vernalis.com. Email is not a secure 
method of communication and the Company cannot accept responsibility for the 
accuracy or completeness of this message or any attachment(s). Please check 
this email for virus infection for which the Company accepts no responsibility. 
If verification of this email is sought then please request a hard copy. Unless 
otherwise stated, any views or opinions presented are solely those of the 
author and do not represent those of the Company.

The Vernalis Group of Companies
100 Berkshire Place
Wharfedale Road
Winnersh, Berkshire
RG41 5RD, England
Tel: +44 (0)118 938 

To access trading company registration and address details, please go to the 
Vernalis website at www.vernalis.com and click on the Company address and 
registration details link at the bottom of the page..
__--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Stereochemistry - Differences between RDKit and Indigo

2015-08-21 Thread Markus Sitzmann
Hi James,

I know that my opinion might sound extreme, but I have had this discussion
many times (mostly regarding tautomerism, which is, however, similar in
some ways). The problem is that you can look at a chemical structure in
many different ways - two scenarios are:

1. What can I perceive from a chemical structure if all I have is the
pure connection table and nothing else (and maybe millions of them)?
2. What can I find out about a particular structure if I can run
full-fledged quantum-mechanical calculations, do an extensive literature
search, and/or have carefully measured experimental data and
conditions (rarely in the millions :-))?

So, if I deal with something like implementing RDKit, things are
probably always quite close to scenario 1, hence my suggestion to
disregard stereochemistry on this type of N atom (you need a lot of
information from scenario 2 to even decide whether there is
stereochemistry or not). The ideal solution, of course, would be to
offer three different modes for stereo perception: disregard,
keep, perceive from 3D (I am not sure if Greg likes that :-)). If
these three modes were available, I would still suggest setting the
default to disregard for 3-coordinated N because the other two modes
require that you know what you are doing and/or have full trust in
your data - otherwise you probably do more harm than good.
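
The three modes proposed above can be sketched as a small policy function. This is only an illustration of the design idea, not an RDKit API: the names NStereoMode and resolve_n_tag are hypothetical, and the tags are plain strings standing in for whatever a toolkit stores on an atom.

```python
from enum import Enum
from typing import Optional


class NStereoMode(Enum):
    """Hypothetical policy names for stereo handling on three-coordinate N."""
    DISREGARD = "disregard"
    KEEP = "keep"
    PERCEIVE_FROM_3D = "perceive_from_3d"


def resolve_n_tag(input_tag: Optional[str],
                  tag_from_3d: Optional[str],
                  mode: NStereoMode = NStereoMode.DISREGARD) -> Optional[str]:
    """Pick the chiral tag to store on a three-coordinate N under a given mode.

    input_tag   -- tag supplied with the input (e.g. "@" from a SMILES), or None
    tag_from_3d -- tag derived from coordinates, or None if there are none
    """
    if mode is NStereoMode.DISREGARD:
        return None          # default: drop whatever was supplied
    if mode is NStereoMode.KEEP:
        return input_tag     # trust the input as written
    return tag_from_3d       # PERCEIVE_FROM_3D: trust the coordinates


print(resolve_n_tag("@", "@@"))                                # default mode
print(resolve_n_tag("@", "@@", NStereoMode.KEEP))
print(resolve_n_tag("@", "@@", NStereoMode.PERCEIVE_FROM_3D))
```

Making DISREGARD the default argument mirrors the suggestion that the other two modes should be an explicit choice by someone who trusts their data.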

Best,
Markus



Re: [Rdkit-discuss] Stereochemistry - Differences between RDKit and Indigo

2015-08-20 Thread Markus Sitzmann
Hehe, that is why I keep my computers always really cold when I run RDKit ... 

-
|  Markus Sitzmann
|  markus.sitzm...@gmail.com

 On 20.08.2015, at 04:33, Peter Shenkin shen...@gmail.com wrote:
 
 Maybe when you have a toolkit as blazingly fast as RDKit it captures the 
 chirality of N center before it has time to interconvert
 
 -P.
 
 On Wed, Aug 19, 2015 at 10:17 PM, John M john.wilkinson...@gmail.com wrote:
 More odd is the carbon stereocentre with two methyls...
 
 Generally, trivalent nitrogens are not considered chiral because of 
 lone-pair inversion. The two usual exceptions are when they are a bridgehead 
 or in a tight ring (three-membered, e.g. an aziridine). This is the same in 
 most toolkits; the InChI technical documentation provides useful examples.
 
 InChI actually only sees one stereo centre since it strips the proton off:
 InChI=1S/C13H26N2/c1-4-14-8-5-12(6-9-14)15-10-7-13(15)11(2)3/h11-13H,4-10H2,1-3H3/p+1/t13-/m1/s1
 
 The nitrogen may well be chiral in this case, but since it is not treated as 
 such, strictly you should also remove the other stereocentre in the position 
 para to the nitrogen.
 
 For the record just tested and ChemAxon/CDK/OpenBabel do the same.
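
The point about InChI seeing a single stereocentre can be read straight off the string quoted above. A minimal sketch, assuming a single-component InChI in which each layer prefix occurs only once (true for this string, not in general); inchi_layers is an illustrative helper, not part of any InChI library.

```python
def inchi_layers(inchi: str) -> dict:
    """Map each prefixed InChI layer (c, h, p, t, m, s, ...) to its content."""
    parts = inchi.split("/")
    # parts[0] is the version ("InChI=1S"); parts[1] is the formula, which
    # carries no prefix letter. Everything after that starts with its prefix.
    return {part[0]: part[1:] for part in parts[2:]}


INCHI = ("InChI=1S/C13H26N2/c1-4-14-8-5-12(6-9-14)15-10-7-13(15)11(2)3"
         "/h11-13H,4-10H2,1-3H3/p+1/t13-/m1/s1")

layers = inchi_layers(INCHI)
# The /t layer lists tetrahedral stereocentres as comma-separated entries.
tetrahedral = layers["t"].split(",") if "t" in layers else []

print(layers["p"])       # protonation layer: the extra proton is recorded here
print(tetrahedral)       # one entry only: atom 13
```

The proton shows up in the /p layer rather than on a specific atom, and the /t layer has one entry, which is why the N+ and the carbon para to it do not appear as stereocentres.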
 
 John
 
 Regards,
 John W May 
 john.wilkinson...@gmail.com
 

Re: [Rdkit-discuss] Stereochemistry - Differences between RDKit and Indigo

2015-08-20 Thread Markus Sitzmann
Hmm, well - probably not; you mention the ever-present exception in
chemistry, Peter (sulfoxides are a similar situation: stereochemistry
from lone pairs). But generally I still think it is more dangerous to
keep or even perceive (from 3D) stereochemistry on three-coordinated N
- you will do more harm with this than good.



On Thu, Aug 20, 2015 at 6:40 PM, Peter Shenkin shen...@gmail.com wrote:
 My initial answer, and I would love input on this, is that three-coordinate
 N should always have stereochemistry removed.

 Umm... even if it's a bridgehead?

 -P.


Re: [Rdkit-discuss] Stereochemistry - Differences between RDKit and Indigo

2015-08-20 Thread Greg Landrum
This isn't a simple one, so it may take a bit to get to an answer that's
comprehensible.

There are two things going on here in the RDKit:
1) Ring stereochemistry
2) Stereochemistry about nitrogen centers

Let's start with the second, because it's easier: RDKit does not generally
believe in stereochemistry around three-coordinate nitrogens. Here's a
very simple example:
In [45]: m3 = Chem.MolFromSmiles('Br[N@](F)Cl')

In [46]: Chem.MolToSmiles(m3,isomericSmiles=True)
Out[46]: 'FN(Cl)Br'


The 3D equivalent of that:
In [41]: m = Chem.MolFromSmiles('BrN(F)Cl')

In [42]: AllChem.EmbedMolecule(m)
Out[42]: 0

In [43]: Chem.AssignAtomChiralTagsFromStructure(m)

In [44]: Chem.MolToSmiles(m,isomericSmiles=True)
Out[44]: 'FN(Cl)Br'

Contrast this with what you get for a carbon:

In [34]: m2 = Chem.MolFromSmiles('FC(Br)(Cl)I')

In [35]: AllChem.EmbedMolecule(m2)
Out[35]: 0

In [36]: Chem.AssignAtomChiralTagsFromStructure(m2)

In [37]: Chem.MolToSmiles(m2,isomericSmiles=True)
Out[37]: 'F[C@](Cl)(Br)I'


Back to the first: ring stereochemistry. By this I mean things like
C[C@H]1CC[C@@H](C)CC1 - molecules where the stereochemistry information is
really about whether the substituents of the ring are cis or trans relative
to the ring plane.

The way the RDKit handles this is something of a hack: it doesn't identify
those atoms as chiral centers, but it does preserve the chiral tags when
generating a canonical SMILES:

In [47]: m = Chem.MolFromSmiles('C[C@H]1CC[C@@H](C)CC1')

In [48]: Chem.FindMolChiralCenters(m)
Out[48]: []

In [49]: Chem.MolToSmiles(m,isomericSmiles=True)
Out[49]: 'C[C@H]1CC[C@@H](C)CC1'

Curiously, to me at least, it does the same thing with nitrogens:

In [52]: m2 = Chem.MolFromSmiles('C[N@@]1CC[C@@H](C)CC1')

In [53]: Chem.MolToSmiles(m2,isomericSmiles=True)
Out[53]: 'C[C@H]1CC[N@](C)CC1'

Lest anyone think that this might make sense because being a ring makes
inversion more difficult, that's not what is going on here. If I make the
ring truly chiral, then the stereochemistry of the N is removed:

In [54]: m3 = Chem.MolFromSmiles('C[N@@]1CO[C@@H](C)CC1')

In [55]: Chem.MolToSmiles(m3,isomericSmiles=True)
Out[55]: 'C[C@H]1CCN(C)CO1'

I believe that this inconsistent behavior is a bug: either N should always
have the input stereochemistry preserved (and that should be perceived from
the 3D coordinates) or it should never have the input stereochemistry
preserved. My initial answer, and I would love input on this, is that
three-coordinate N should always have stereochemistry removed.
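
To make the "always remove" option concrete, here is a text-level sketch that drops chiral marks from neutral bracket nitrogens in a SMILES string. It is a plain regular expression, not RDKit's perception code: it does not canonicalize, it ignores isotope or atom-map forms, and the helper name strip_neutral_n_stereo is made up for this example.

```python
import re

# Bracket N with a chiral mark, an optional single H and nothing else
# (no charge), e.g. [N@], [N@@], [N@H]. A charged atom such as [N@@H+]
# does not match and is left untouched.
_NEUTRAL_N_STEREO = re.compile(r"\[N@{1,2}(H?)\]")


def strip_neutral_n_stereo(smiles: str) -> str:
    """Remove @/@@ from neutral bracket nitrogens in a SMILES string."""
    return _NEUTRAL_N_STEREO.sub(
        lambda m: "[NH]" if m.group(1) else "N", smiles)


for smi in ("Br[N@](F)Cl",
            "C[N@@]1CC[C@@H](C)CC1",
            "CC[N@]1CC[C@@H]([N@@H+]2CC[C@@H]2C(C)C)CC1"):
    print(smi, "->", strip_neutral_n_stereo(smi))
```

Note that this only edits the nitrogen. In the 4-methylpiperidine example the carbon tag survives as text, although it no longer has a partner to be cis or trans to; deciding that requires real stereo perception.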

-greg



On Thu, Aug 20, 2015 at 2:22 PM, Rob Smith robtsm...@gmail.com wrote:

 Hi Greg,

 I've attached the SDF that Corina generates. I'm not convinced it is a
 problem, more an observation that I'm trying to understand.

 Looking at the results again today - it seems that from the Corina output
 Indigo is interpreting the conformer (including whether the ethyl
 substituent on the piperidine nitrogen is equatorial or axial) - and
 outputting a canonical smiles string that has the conformer encoded in it
 (using the chiral flags). Whereas RDKit is reading in the Corina output,
 discounting whether the nitrogen is axial or equatorial (which due to
 inversion I can understand) and interpreting it as having only two chiral
 centers (which is correct).

 What is confusing me is that when I supply RDKit with the canonical
 smiles string from Indigo (which has the conformer encoded in it) and
 then ask for the isomeric canonical smiles, it returns a canonical
 smiles with the conformer still encoded within it.

 For example, I read the following canonical smiles string into
 RDKit: CCN1CC[C@@H]([N@@H+]2CC[C@@H]2[C@H](C)C)CC1 (which was generated
 by reading one of the mols in the SD file into RDKit and outputting the
 isomeric canonical smiles). Running FindMolChiralCenters on this
 molecule correctly reports the number of chiral centres to be 2 (6S, 9R),
 and asking it to output the canonical smiles string (with
 isomericSmiles=True) gives CCN1CCC([N@@H+]2CC[C@@H]2C(C)C)CC1 (1).

 If I take the same mol file, read it into Indigo, and ask it to output the
 canonical smiles string, I get: CC(C)[C@H]1CC[N@H+]1[C@@H]1CC[N@@](CC1)CC.
 If I read this smiles string into RDKit and run FindMolChiralCenters on
 it, I get (3R, 6S), which is fine. If I then output the canonical smiles
 (again with isomericSmiles=True) I get
 CC[N@]1CC[C@@H]([N@@H+]2CC[C@@H]2C(C)C)CC1. I expected this isomeric
 canonical smiles to be the same as (1). However, RDKit appears to conserve
 the conformer representation given to it in an isomeric smiles string, but
 when reading a Mol file it doesn't keep all conformer information (axial
 or equatorial substituents on a nitrogen).
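
The difference between string (1) and the Indigo-derived string is easier to see by listing the stereo-tagged atoms in each. A small sketch using a regular expression over the two SMILES quoted above; tagged_atoms is an illustrative helper and does no chemistry.

```python
import re

# Any bracket atom whose element symbol is followed directly by @ or @@.
_TAGGED_ATOM = re.compile(r"\[[A-Za-z]{1,2}@{1,2}[^\]]*\]")


def tagged_atoms(smiles: str) -> list:
    """Return the bracket atoms that carry a chiral mark, in string order."""
    return _TAGGED_ATOM.findall(smiles)


from_mol_file = "CCN1CCC([N@@H+]2CC[C@@H]2C(C)C)CC1"            # string (1)
via_indigo = "CC[N@]1CC[C@@H]([N@@H+]2CC[C@@H]2C(C)C)CC1"      # via Indigo

print(tagged_atoms(from_mol_file))
print(tagged_atoms(via_indigo))
```

The second list has two extra entries, the piperidine N and the ring carbon opposite it, which is the axial/equatorial information carried over from the Indigo string.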

 Thanks to all for your quick (and quick witted) responses

 Rob


 On Thu, Aug 20, 2015 at 3:46 AM, Greg Landrum greg.land...@gmail.com
 wrote:

 Hi Rob,

 The results below are quite strange. As John has already pointed out:
 there really shouldn't be chirality present

Re: [Rdkit-discuss] Stereochemistry - Differences between RDKit and Indigo

2015-08-20 Thread Markus Sitzmann
I agree with remove - the chance that you destroy actual information
by this is low - or, in other words, I would expect the chance that
stereoinformation on three-coordinate N is spurious to be high.

Markus


[Rdkit-discuss] Stereochemistry - Differences between RDKit and Indigo

2015-08-19 Thread Rob Smith
Dear RDKit community,

I'm trying to use RDKit to read in Corina-generated stereoisomers (from a
Mol file), assign chiral tags and stereochemistry to the structure, and
output the canonical smiles string for each isomer of a given molecule (in
Python). When I do this, half the canonical smiles strings are not unique.

When I read in the output from Corina into an Indigo instance, then use the
canonical smiles from Indigo to create an RDKit molecule, canonical smiles
strings generated from the molecule objects are all unique.

I may be missing an option to enable RDKit to 'visualise' the chiral centre
adjacent to the protonated nitrogen, so if someone can spot where I've made
a mistake, I'd really appreciate it. I've included the output and Python
script below. If you require any further information, please let me know.

Many thanks,
Rob

Output:

RDKit Read in of Molecule
RDKit Output -  CCN1CC[C@@H]([N@@H+]2CC[C@@H]2[C@H](C)C)CC1
RDKit Output -  CCN1CC[C@@H]([N@@H+]2CC[C@@H]2[C@H](C)C)CC1
RDKit Output -  CCN1CC[C@@H]([N@H+]2CC[C@@H]2[C@H](C)C)CC1
RDKit Output -  CCN1CC[C@@H]([N@H+]2CC[C@@H]2[C@H](C)C)CC1
RDKit Output -  CCN1CC[C@@H]([N@@H+]2CC[C@H]2[C@H](C)C)CC1
RDKit Output -  CCN1CC[C@@H]([N@@H+]2CC[C@H]2[C@H](C)C)CC1
RDKit Output -  CCN1CC[C@@H]([N@H+]2CC[C@H]2[C@H](C)C)CC1
RDKit Output -  CCN1CC[C@@H]([N@H+]2CC[C@H]2[C@H](C)C)CC1

INDIGO Read in of Molecule
RDKit Output -  CC[N@]1CC[C@@H]([N@@H+]2CC[C@@H]2C(C)C)CC1
RDKit Output -  CC[N@]1CC[C@H]([N@@H+]2CC[C@@H]2C(C)C)CC1
RDKit Output -  CC[N@]1CC[C@@H]([N@H+]2CC[C@@H]2C(C)C)CC1
RDKit Output -  CC[N@]1CC[C@H]([N@H+]2CC[C@@H]2C(C)C)CC1
RDKit Output -  CC[N@]1CC[C@@H]([N@@H+]2CC[C@H]2C(C)C)CC1
RDKit Output -  CC[N@]1CC[C@H]([N@@H+]2CC[C@H]2C(C)C)CC1
RDKit Output -  CC[N@]1CC[C@@H]([N@H+]2CC[C@H]2C(C)C)CC1
RDKit Output -  CC[N@]1CC[C@H]([N@H+]2CC[C@H]2C(C)C)CC1
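
The duplication in the first list can be checked without any toolkit. A minimal sketch that uses only the sixteen strings printed above; the variable and function names are illustrative.

```python
from collections import Counter

rdkit_from_mol = [
    "CCN1CC[C@@H]([N@@H+]2CC[C@@H]2[C@H](C)C)CC1",
    "CCN1CC[C@@H]([N@@H+]2CC[C@@H]2[C@H](C)C)CC1",
    "CCN1CC[C@@H]([N@H+]2CC[C@@H]2[C@H](C)C)CC1",
    "CCN1CC[C@@H]([N@H+]2CC[C@@H]2[C@H](C)C)CC1",
    "CCN1CC[C@@H]([N@@H+]2CC[C@H]2[C@H](C)C)CC1",
    "CCN1CC[C@@H]([N@@H+]2CC[C@H]2[C@H](C)C)CC1",
    "CCN1CC[C@@H]([N@H+]2CC[C@H]2[C@H](C)C)CC1",
    "CCN1CC[C@@H]([N@H+]2CC[C@H]2[C@H](C)C)CC1",
]

rdkit_via_indigo = [
    "CC[N@]1CC[C@@H]([N@@H+]2CC[C@@H]2C(C)C)CC1",
    "CC[N@]1CC[C@H]([N@@H+]2CC[C@@H]2C(C)C)CC1",
    "CC[N@]1CC[C@@H]([N@H+]2CC[C@@H]2C(C)C)CC1",
    "CC[N@]1CC[C@H]([N@H+]2CC[C@@H]2C(C)C)CC1",
    "CC[N@]1CC[C@@H]([N@@H+]2CC[C@H]2C(C)C)CC1",
    "CC[N@]1CC[C@H]([N@@H+]2CC[C@H]2C(C)C)CC1",
    "CC[N@]1CC[C@@H]([N@H+]2CC[C@H]2C(C)C)CC1",
    "CC[N@]1CC[C@H]([N@H+]2CC[C@H]2C(C)C)CC1",
]


def summarize(smiles_list):
    """Return (number of distinct strings, {string: count} for repeats)."""
    counts = Counter(smiles_list)
    repeats = {smi: n for smi, n in counts.items() if n > 1}
    return len(counts), repeats


for label, smiles in (("from Mol file", rdkit_from_mol),
                      ("via Indigo", rdkit_via_indigo)):
    n_unique, repeats = summarize(smiles)
    print(label, "-", n_unique, "unique of", len(smiles))
```

In the first list each string appears exactly twice, and every entry has the same [C@@H] on the piperidine carbon; in the second list that carbon alternates between [C@H] and [C@@H], which is what keeps all eight strings distinct.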

Python script :

from rdkit import Chem
import subprocess  # Used to run Corina
from indigo import *

def runCorinaTest(inputMol):
    indigo = Indigo()

    molFile = Chem.MolToMolBlock(inputMol)

    corinaCommand = "echo '" + molFile + "' | "
    # Then Corina - generate stereoisomers...
    corinaCommand = corinaCommand + ("/apps/corina/corina -t n -d "
        "canon,stergen,preserve,names,wh,flapn,msc=7,msi=128 -i t=sdf")
    # Gives the stereoisomer species as an SDF string
    corinaResult = subprocess.check_output(corinaCommand, shell=True,
                                           universal_newlines=True)

    allMoleculeObjects = []
    # Separate Corina output into individual molecules. The archived copy
    # shows only \n as the separator; the SDF record delimiter "$$$$" is
    # assumed here.
    allMolecules = corinaResult.split("$$$$\n")
    allMolecules = allMolecules[0:len(allMolecules)-1]

    print("RDKit Read in of Molecule")

    for eachMolecule in allMolecules:
        eachMolecule = eachMolecule + "\n"
        mol = Chem.MolFromMolBlock(eachMolecule, sanitize=True,
                                   removeHs=True, strictParsing=False)
        Chem.rdmolops.AssignAtomChiralTagsFromStructure(
            mol, replaceExistingTags=True)
        Chem.rdmolops.AssignStereochemistry(mol)
        print("RDKit Output -  " + Chem.MolToSmiles(mol,
                                                    isomericSmiles=True))

    print("INDIGO Read in of Molecule")
    for eachMolecule in allMolecules:
        eachMolecule = eachMolecule + "\n"
        mol = indigo.loadMolecule(eachMolecule)
        # print("Indigo Output - " + mol.canonicalSmiles())
        # Use Indigo Canonical Smiles to create RDKit molecule
        mol = Chem.MolFromSmiles(mol.canonicalSmiles())
        if mol is not None:
            print("RDKit Output -  " + Chem.MolToSmiles(mol,
                                                        isomericSmiles=True))

    return 0

mol = Chem.MolFromSmiles("CC(C)C1[NH+](C2CCN(CC)CC2)CC1")
z = runCorinaTest(mol)
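
The script has to cut the Corina SDF output into one mol block per stereoisomer. A hedged sketch of one way to do that with the standard library only, relying on the SD-file convention that each record ends with a line containing "$$$$"; split_sdf_records is an illustrative helper, not an RDKit or Corina function.

```python
def split_sdf_records(sdf_text: str) -> list:
    """Split SDF text into records, dropping the "$$$$" terminator lines."""
    records, current = [], []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            # End of a record: store it with a trailing newline.
            records.append("\n".join(current) + "\n")
            current = []
        else:
            current.append(line)
    # Keep a final record that lacks its terminator, if it has any content.
    if any(line.strip() for line in current):
        records.append("\n".join(current) + "\n")
    return records


# Placeholder text standing in for real mol blocks.
sample = "mol1\n  header\nM  END\n$$$$\nmol2\n\nM  END\n$$$$\n"
blocks = split_sdf_records(sample)
print(len(blocks))
```

Iterating line by line avoids depending on the exact newline characters around the delimiter, and there is no empty trailing element to slice off afterwards.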


Re: [Rdkit-discuss] Stereochemistry - Differences between RDKit and Indigo

2015-08-19 Thread Peter Shenkin
Maybe when you have a toolkit as blazingly fast as RDKit it captures the
chirality of N center before it has time to interconvert

-P.

On Wed, Aug 19, 2015 at 10:17 PM, John M john.wilkinson...@gmail.com
wrote:

 More odd is the carbon stereocentre with two methyls...

 Generally trivalent nitrogens are not considered chiral due to inversion
 of the lone-pair. The two usual exceptions are when they are a bridgehead
 or in a tight ring (cyclopropane). This is the same in most toolkits, the
 InChI technical documentation provides useful examples.

 InChI actually only sees one stereo centre since it strips the proton off:

 InChI=1S/C13H26N2/c1-4-14-8-5-12(6-9-14)15-10-7-13(15)11(2)3/h11-13H,4-10H2,1-3H3/p+1/t13-/m1/s1

 It may well be chiral in this case but since it's not you should also
 strictly remove the other stereocentre in the para position to the nitrogen

 For the record just tested and ChemAxon/CDK/OpenBabel do the same.

 John

 Regards,
 John W May
 john.wilkinson...@gmail.com

 On 19 August 2015 at 09:00, Rob Smith robtsm...@gmail.com wrote:

 Dear RDKit community,

 I'm trying to use RDKit to read in Corina generated stereoisomers (from a
 Mol file), assign chiral tags and stereochemistry to the structure and
 output the canonical smiles string for each isomer of a given molecule (in
 Python), when I do this, half the canonical smiles strings are not unique.

 When I read in the output from Corina into an Indigo instance, then use
 the canonical smiles from Indigo to create an RDKit molecule, canonical
 smiles strings generated from the molecule objects are all unique.

 I may be missing an option to enable RDKit to 'visualise' the chiral
 centre adjacent to the protonated nitrogen, so if someone can spot where
 I've made a mistake, I'd really appreciate it. I've included the output and
 Python script below. If you require any further information, please let me
 know.

 Many thanks,
 Rob

 Output:

 RDKit Read in of Molecule
 RDKit Output -  CCN1CC[C@@H]([N@@H+]2CC[C@@H]2[C@H](C)C)CC1
 RDKit Output -  CCN1CC[C@@H]([N@@H+]2CC[C@@H]2[C@H](C)C)CC1
 RDKit Output -  CCN1CC[C@@H]([N@H+]2CC[C@@H]2[C@H](C)C)CC1
 RDKit Output -  CCN1CC[C@@H]([N@H+]2CC[C@@H]2[C@H](C)C)CC1
 RDKit Output -  CCN1CC[C@@H]([N@@H+]2CC[C@H]2[C@H](C)C)CC1
 RDKit Output -  CCN1CC[C@@H]([N@@H+]2CC[C@H]2[C@H](C)C)CC1
 RDKit Output -  CCN1CC[C@@H]([N@H+]2CC[C@H]2[C@H](C)C)CC1
 RDKit Output -  CCN1CC[C@@H]([N@H+]2CC[C@H]2[C@H](C)C)CC1

 INDIGO Read in of Molecule
 RDKit Output -  CC[N@]1CC[C@@H]([N@@H+]2CC[C@@H]2C(C)C)CC1
 RDKit Output -  CC[N@]1CC[C@H]([N@@H+]2CC[C@@H]2C(C)C)CC1
 RDKit Output -  CC[N@]1CC[C@@H]([N@H+]2CC[C@@H]2C(C)C)CC1
 RDKit Output -  CC[N@]1CC[C@H]([N@H+]2CC[C@@H]2C(C)C)CC1
 RDKit Output -  CC[N@]1CC[C@@H]([N@@H+]2CC[C@H]2C(C)C)CC1
 RDKit Output -  CC[N@]1CC[C@H]([N@@H+]2CC[C@H]2C(C)C)CC1
 RDKit Output -  CC[N@]1CC[C@@H]([N@H+]2CC[C@H]2C(C)C)CC1
 RDKit Output -  CC[N@]1CC[C@H]([N@H+]2CC[C@H]2C(C)C)CC1

 Python script:

 from rdkit import Chem
 import subprocess  # Used to run Corina
 from indigo import Indigo

 def runCorinaTest(inputMol):
     indigo = Indigo()

     molFile = Chem.MolToMolBlock(inputMol)

     # Pipe the Mol block into Corina via the shell
     corinaCommand = "echo \'" + molFile + "\' | "
     # Then Corina - generate stereoisomers...
     corinaCommand = corinaCommand + (
         "/apps/corina/corina -t n -d "
         "canon,stergen,preserve,names,wh,flapn,msc=7,msi=128 -i t=sdf")
     # Gives the stereoisomer species as an SDF string
     corinaResult = subprocess.check_output(corinaCommand, shell=True,
                                            universal_newlines=True)

     # Separate Corina output into individual molecules.
     # Assumes records are delimited by "$$$$" lines (standard SDF); the
     # archived post shows only \n here. The last split element is empty.
     allMolecules = corinaResult.split("$$$$\n")
     allMolecules = allMolecules[0:len(allMolecules)-1]

     print("RDKit Read in of Molecule")

     for eachMolecule in allMolecules:
         eachMolecule = eachMolecule + "$$$$\n"
         mol = Chem.MolFromMolBlock(eachMolecule, sanitize=True,
                                    removeHs=True, strictParsing=False)
         # Derive chiral tags from the 3D coordinates, then perceive stereo
         Chem.rdmolops.AssignAtomChiralTagsFromStructure(
             mol, replaceExistingTags=True)
         Chem.rdmolops.AssignStereochemistry(mol)
         print("RDKit Output -  " + Chem.MolToSmiles(mol,
                                                     isomericSmiles=True))

     print("INDIGO Read in of Molecule")
     for eachMolecule in allMolecules:
         eachMolecule = eachMolecule + "$$$$\n"
         mol = indigo.loadMolecule(eachMolecule)
         # print("Indigo Output - " + mol.canonicalSmiles())
         # Use Indigo Canonical Smiles to create RDKit molecule
         mol = Chem.MolFromSmiles(mol.canonicalSmiles())
         if mol is not None:
             print("RDKit Output -  " + Chem.MolToSmiles(mol,
                                                         isomericSmiles=True))

     return 0

 mol = Chem.MolFromSmiles("CC(C)C1[NH+](C2CCN(CC)CC2)CC1")
 z = runCorinaTest(mol)
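
The difference between the two output listings in the post above can be tallied with a few lines of plain Python. This is only a sketch: the SMILES strings are copied verbatim from the post, and the variable and function names (`rdkit_route`, `indigo_route`, `summarise`) are made up for the illustration. No toolkit is needed, since it just counts identical strings.

```python
from collections import Counter

# Copied from the "RDKit Read in of Molecule" listing in the post.
rdkit_route = [
    "CCN1CC[C@@H]([N@@H+]2CC[C@@H]2[C@H](C)C)CC1",
    "CCN1CC[C@@H]([N@@H+]2CC[C@@H]2[C@H](C)C)CC1",
    "CCN1CC[C@@H]([N@H+]2CC[C@@H]2[C@H](C)C)CC1",
    "CCN1CC[C@@H]([N@H+]2CC[C@@H]2[C@H](C)C)CC1",
    "CCN1CC[C@@H]([N@@H+]2CC[C@H]2[C@H](C)C)CC1",
    "CCN1CC[C@@H]([N@@H+]2CC[C@H]2[C@H](C)C)CC1",
    "CCN1CC[C@@H]([N@H+]2CC[C@H]2[C@H](C)C)CC1",
    "CCN1CC[C@@H]([N@H+]2CC[C@H]2[C@H](C)C)CC1",
]

# Copied from the "INDIGO Read in of Molecule" listing in the post.
indigo_route = [
    "CC[N@]1CC[C@@H]([N@@H+]2CC[C@@H]2C(C)C)CC1",
    "CC[N@]1CC[C@H]([N@@H+]2CC[C@@H]2C(C)C)CC1",
    "CC[N@]1CC[C@@H]([N@H+]2CC[C@@H]2C(C)C)CC1",
    "CC[N@]1CC[C@H]([N@H+]2CC[C@@H]2C(C)C)CC1",
    "CC[N@]1CC[C@@H]([N@@H+]2CC[C@H]2C(C)C)CC1",
    "CC[N@]1CC[C@H]([N@@H+]2CC[C@H]2C(C)C)CC1",
    "CC[N@]1CC[C@@H]([N@H+]2CC[C@H]2C(C)C)CC1",
    "CC[N@]1CC[C@H]([N@H+]2CC[C@H]2C(C)C)CC1",
]

def summarise(label, smiles):
    """Print how many strings are distinct and return the per-string counts."""
    counts = Counter(smiles)
    print("%s: %d strings, %d unique" % (label, len(smiles), len(counts)))
    return counts

rdkit_counts = summarise("RDKit route", rdkit_route)
indigo_counts = summarise("Indigo route", indigo_route)
```

In the RDKit listing every string appears twice, and the pairs differ from the Indigo listing only in the tag on the piperidine carbon bearing the azetidinium, which the Indigo route varies together with a tag on the piperidine nitrogen.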


 --

 ___
 Rdkit-discuss mailing list
 Rdkit-discuss@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Re: [Rdkit-discuss] Stereochemistry - Differences between RDKit Indigo

2015-08-19 Thread Greg Landrum
Hi Rob,

The results below are quite strange. As John has already pointed out: there
really shouldn't be chirality present on either the N+ or the C that has
two methyls attached.
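
The point about the carbon with two methyls can be checked with a short sketch like the one below. It assumes a reasonably recent RDKit; the helper name `carbons_with_two_methyls` is invented for the illustration, and which of the other centres survive may differ between RDKit versions, so the sketch only looks at the isopropyl carbon.

```python
from rdkit import Chem

# First SMILES from the "RDKit Read in of Molecule" listing in Rob's post.
smi = "CCN1CC[C@@H]([N@@H+]2CC[C@@H]2[C@H](C)C)CC1"
mol = Chem.MolFromSmiles(smi)  # parsing also runs RDKit's stereo clean-up

def carbons_with_two_methyls(m):
    """Indices of carbons bonded to two terminal (degree-1) carbons."""
    hits = []
    for atom in m.GetAtoms():
        if atom.GetSymbol() != "C":
            continue
        methyls = [n for n in atom.GetNeighbors()
                   if n.GetSymbol() == "C" and n.GetDegree() == 1]
        if len(methyls) == 2:
            hits.append(atom.GetIdx())
    return hits

isopropyl = carbons_with_two_methyls(mol)
# (atom index, label) pairs for every centre RDKit still treats as stereogenic
centres = Chem.FindMolChiralCenters(mol, includeUnassigned=True)
centre_indices = [idx for idx, _ in centres]

print("stereocentres kept:", centres)
print("isopropyl carbon index:", isopropyl)
print("round-tripped SMILES:", Chem.MolToSmiles(mol))
```

A carbon with two identical methyl substituents cannot be a stereocentre, so its tag should not survive a round trip through SMILES; a tag that does appear in output generated from 3D coordinates suggests the perception step was skipped or fed unusual input.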

I tried to reproduce the problem by running corina myself using the same
command-line options you provided (from SMILES instead of SDF, but I don't
think that should make a difference), but I get sensible results:

In [5]: s = Chem.SDMolSupplier('sample.sdf')

In [6]: for m in s:
   ...:     Chem.AssignAtomChiralTagsFromStructure(m)
   ...:     Chem.AssignStereochemistry(m,cleanIt=True,force=True)
   ...:     print Chem.MolToSmiles(m,True)
   ...:
CCN1CCC([N@@H+]2CC[C@@H]2C(C)C)CC1
CCN1CCC([N@@H+]2CC[C@@H]2C(C)C)CC1
CCN1CCC([N@H+]2CC[C@@H]2C(C)C)CC1
CCN1CCC([N@H+]2CC[C@@H]2C(C)C)CC1
CCN1CCC([N@@H+]2CC[C@H]2C(C)C)CC1
CCN1CCC([N@@H+]2CC[C@H]2C(C)C)CC1
CCN1CCC([N@H+]2CC[C@H]2C(C)C)CC1
CCN1CCC([N@H+]2CC[C@H]2C(C)C)CC1

In [7]: s = Chem.SDMolSupplier('sample.sdf')

In [8]: for m in s:
   ...:     Chem.AssignAtomChiralTagsFromStructure(m)
   ...:     print Chem.MolToSmiles(m,True)
   ...:
CCN1CCC([N@@H+]2CC[C@@H]2C(C)C)CC1
CCN1CCC([N@@H+]2CC[C@@H]2C(C)C)CC1
CCN1CCC([N@H+]2CC[C@@H]2C(C)C)CC1
CCN1CCC([N@H+]2CC[C@@H]2C(C)C)CC1
CCN1CCC([N@@H+]2CC[C@H]2C(C)C)CC1
CCN1CCC([N@@H+]2CC[C@H]2C(C)C)CC1
CCN1CCC([N@H+]2CC[C@H]2C(C)C)CC1
CCN1CCC([N@H+]2CC[C@H]2C(C)C)CC1


Could you please send the SDF that corina generates so I can try to
reproduce the problem (or at least try to understand what's going on) from
that?

Thanks,
-greg

On Wed, Aug 19, 2015 at 3:00 PM, Rob Smith robtsm...@gmail.com wrote:

 Dear RDKit community,

 I'm trying to use RDKit to read in Corina-generated stereoisomers (from a
 Mol file), assign chiral tags and stereochemistry to the structure, and
 output the canonical SMILES string for each isomer of a given molecule (in
 Python). When I do this, half of the canonical SMILES strings are not unique.

 When I read the output from Corina into an Indigo instance and then use
 the canonical SMILES from Indigo to create an RDKit molecule, the canonical
 SMILES strings generated from the molecule objects are all unique.

 I may be missing an option to enable RDKit to 'visualise' the chiral
 centre adjacent to the protonated nitrogen, so if someone can spot where
 I've made a mistake, I'd really appreciate it. I've included the output and
 Python script below. If you require any further information, please let me
 know.

 Many thanks,
 Rob

 Output:

 RDKit Read in of Molecule
 RDKit Output -  CCN1CC[C@@H]([N@@H+]2CC[C@@H]2[C@H](C)C)CC1
 RDKit Output -  CCN1CC[C@@H]([N@@H+]2CC[C@@H]2[C@H](C)C)CC1
 RDKit Output -  CCN1CC[C@@H]([N@H+]2CC[C@@H]2[C@H](C)C)CC1
 RDKit Output -  CCN1CC[C@@H]([N@H+]2CC[C@@H]2[C@H](C)C)CC1
 RDKit Output -  CCN1CC[C@@H]([N@@H+]2CC[C@H]2[C@H](C)C)CC1
 RDKit Output -  CCN1CC[C@@H]([N@@H+]2CC[C@H]2[C@H](C)C)CC1
 RDKit Output -  CCN1CC[C@@H]([N@H+]2CC[C@H]2[C@H](C)C)CC1
 RDKit Output -  CCN1CC[C@@H]([N@H+]2CC[C@H]2[C@H](C)C)CC1

 INDIGO Read in of Molecule
 RDKit Output -  CC[N@]1CC[C@@H]([N@@H+]2CC[C@@H]2C(C)C)CC1
 RDKit Output -  CC[N@]1CC[C@H]([N@@H+]2CC[C@@H]2C(C)C)CC1
 RDKit Output -  CC[N@]1CC[C@@H]([N@H+]2CC[C@@H]2C(C)C)CC1
 RDKit Output -  CC[N@]1CC[C@H]([N@H+]2CC[C@@H]2C(C)C)CC1
 RDKit Output -  CC[N@]1CC[C@@H]([N@@H+]2CC[C@H]2C(C)C)CC1
 RDKit Output -  CC[N@]1CC[C@H]([N@@H+]2CC[C@H]2C(C)C)CC1
 RDKit Output -  CC[N@]1CC[C@@H]([N@H+]2CC[C@H]2C(C)C)CC1
 RDKit Output -  CC[N@]1CC[C@H]([N@H+]2CC[C@H]2C(C)C)CC1

 Python script:

 from rdkit import Chem
 import subprocess  # Used to run Corina
 from indigo import Indigo

 def runCorinaTest(inputMol):
     indigo = Indigo()

     molFile = Chem.MolToMolBlock(inputMol)

     # Pipe the Mol block into Corina via the shell
     corinaCommand = "echo \'" + molFile + "\' | "
     # Then Corina - generate stereoisomers...
     corinaCommand = corinaCommand + (
         "/apps/corina/corina -t n -d "
         "canon,stergen,preserve,names,wh,flapn,msc=7,msi=128 -i t=sdf")
     # Gives the stereoisomer species as an SDF string
     corinaResult = subprocess.check_output(corinaCommand, shell=True,
                                            universal_newlines=True)

     # Separate Corina output into individual molecules.
     # Assumes records are delimited by "$$$$" lines (standard SDF); the
     # archived post shows only \n here. The last split element is empty.
     allMolecules = corinaResult.split("$$$$\n")
     allMolecules = allMolecules[0:len(allMolecules)-1]

     print("RDKit Read in of Molecule")

     for eachMolecule in allMolecules:
         eachMolecule = eachMolecule + "$$$$\n"
         mol = Chem.MolFromMolBlock(eachMolecule, sanitize=True,
                                    removeHs=True, strictParsing=False)
         # Derive chiral tags from the 3D coordinates, then perceive stereo
         Chem.rdmolops.AssignAtomChiralTagsFromStructure(
             mol, replaceExistingTags=True)
         Chem.rdmolops.AssignStereochemistry(mol)
         print("RDKit Output -  " + Chem.MolToSmiles(mol,
                                                     isomericSmiles=True))

     print("INDIGO Read in of Molecule")
     for eachMolecule in allMolecules:
         eachMolecule = eachMolecule + "$$$$\n"
         mol = indigo.loadMolecule(eachMolecule)
         # print("Indigo Output - " + mol.canonicalSmiles())
         # Use Indigo Canonical Smiles to create RDKit molecule
         mol = Chem.MolFromSmiles(mol.canonicalSmiles())
 if mol is not None