Hi Greg, Hi Paolo,

@Paolo - thanks for the updated gist!


@Greg - thanks for this detailed explanation. I think it makes sense to equate 
unspecified with unknown stereochem. I can't think of any obvious caveats to 
this convention change for now (but maybe others in the community can?).


When you say "have unspecified double bonds be marked as unknown", do you mean 
that unspecified double bonds would then be represented by crossed bonds too?


If so, would the loop you're suggesting be cheap enough computationally to run 
on thousands of molecules?




Thanks and good morning!


Adelene


Doctoral Researcher
Environmental Cheminformatics
UNIVERSITÉ DU LUXEMBOURG

LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
6, avenue du Swing, L-4367 Belvaux
T +356 46 66 44 67 18
GitHub: adelenelai





________________________________
From: Greg Landrum <greg.land...@gmail.com>
Sent: Wednesday, October 21, 2020 6:15:58 AM
To: Adelene LAI
Cc: rdkit-discuss
Subject: Re: [Rdkit-discuss] How to preserve undefined stereochemistry?

Paolo's gist includes a vocabulary mistake[1] that I think is confusing things 
here.

In the RDKit the stereochemistry of a double bond can be unspecified, unknown, 
or known. Unspecified means that you haven't said anything about what the 
stereo is; unknown means that you've actively provided the information that you 
don't know what the stereochemistry is; known is clear.

The RDKit only draws crossed bonds in molecule drawings when the 
stereochemistry of the double bond is unknown.

The problem here is that in standard SMILES there is no way to actively specify 
that you don't know the stereochemistry of a double bond (the same thing 
applies to stereocenters). You can either provide information about the 
stereochemistry by using "/" and "\" bonds, or you provide no information. So 
the SMILES C/C=C/C produces a double bond with known stereochemistry but CC=CC 
produces a double bond with unspecified stereochemistry.
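
To make the distinction concrete, here is a minimal sketch (not part of the 
original message) that parses both SMILES and labels the stereo flag on each 
double bond. The helper name double_bond_stereo is made up for illustration; 
mapping STEREONONE to "unspecified" and STEREOANY to "unknown" follows the 
vocabulary above.

```python
from rdkit import Chem


def double_bond_stereo(smiles):
    """Return one label per double bond: 'unspecified', 'unknown' or 'known'."""
    mol = Chem.MolFromSmiles(smiles)
    labels = []
    for bond in mol.GetBonds():
        if bond.GetBondType() != Chem.BondType.DOUBLE:
            continue
        stereo = bond.GetStereo()
        if stereo == Chem.BondStereo.STEREONONE:
            labels.append('unspecified')   # nothing was said about the stereo
        elif stereo == Chem.BondStereo.STEREOANY:
            labels.append('unknown')       # actively marked as not known
        else:
            labels.append('known')         # E/Z or cis/trans was assigned
    return labels


print(double_bond_stereo('C/C=C/C'))
print(double_bond_stereo('CC=CC'))
```

Note that a double bond that cannot carry stereo at all (C=O, for example) 
also reports STEREONONE, so "unspecified" here describes the flag, not whether 
the bond could be a stereo bond.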

If, based on what you know about the SMILES that you are parsing, you would 
like to change the convention and have unspecified double bonds be marked as 
unknown, it's straightforward to write a script that loops over the molecule 
and makes that change (watch out for ring bonds).
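
One possible shape for such a script is sketched below (not from the original 
message). It assumes an RDKit release that provides Chem.FindPotentialStereo 
(2020.09 or later) and uses it to pick out double bonds that could be stereo 
bonds but carry no specification; the function name 
mark_unspecified_as_unknown is hypothetical. Ring bonds are skipped, as 
cautioned above.

```python
from rdkit import Chem


def mark_unspecified_as_unknown(mol):
    """Set STEREOANY on non-ring double bonds that could be stereo bonds
    but have no stereo specified. Returns the indices of the changed bonds."""
    changed = []
    for info in Chem.FindPotentialStereo(mol):
        # Only look at potential double-bond stereo, not stereocenters.
        if info.type != Chem.StereoType.Bond_Double:
            continue
        # Leave bonds alone if their stereo is already specified or unknown.
        if info.specified != Chem.StereoSpecified.Unspecified:
            continue
        bond = mol.GetBondWithIdx(info.centeredOn)
        # Watch out for ring bonds: skip them.
        if bond.IsInRing():
            continue
        bond.SetStereo(Chem.BondStereo.STEREOANY)
        changed.append(bond.GetIdx())
    return changed


mol = Chem.MolFromSmiles('CC=CC')
changed = mark_unspecified_as_unknown(mol)
print(changed, [mol.GetBondWithIdx(i).GetStereo() for i in changed])
```

Bonds that already have "/" and "\" information in the SMILES are left 
untouched, as are double bonds with two identical substituents on one end, 
since FindPotentialStereo does not report those.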

-greg
[1] Perhaps "mistake" isn't the right word. It's confusing.

On Tue, Oct 20, 2020 at 1:54 PM Paolo Tosco <paolo.tosco.m...@gmail.com> wrote:
Hi Adelene,

this gist

https://gist.github.com/ptosco/1e1c23ad24c90444993fa1db21ccb48b

shows how to add stereo annotations to RDKit 2D depictions, and also how to 
access the double bond stereochemistry programmatically.

Cheers,
p.


On Tue, Oct 20, 2020 at 12:24 PM Adelene LAI <adelene....@uni.lu> wrote:

Hi RDKit Community,


Is there a way to preserve undefined stereochemistry aka unspecified 
stereochemistry when doing MolFromSmiles?


I'm working with a bunch of molecules, some with stereochemistry defined, some 
without.


If stereochemistry is undefined in the SMILES, I would like it to stay that way 
when converted to a Mol, but this doesn't seem to be the case:


> from rdkit import Chem
> mol = Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
> mol

[image: RDKit depiction of mol]

One would expect that C=C to either be crossed, as in PubChem's depiction:

https://pubchem.ncbi.nlm.nih.gov/compound/139598257#section=2D-Structure

[image: PubChem 2D depiction with a crossed double bond]


or that single bond to be squiggly, as in CDK's depiction:

[image: CDK depiction with a wavy single bond]

But it's not just a matter of depiction, since it seems that internally mol is 
equivalent to its stereochem-specific sibling (the Entgegen form):


CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O
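
One way to check whether the two molecules really are stored identically (a 
sketch, not part of the original message) is to compare their canonical 
SMILES, which include stereo information by default. The variable names are 
illustrative only.

```python
from rdkit import Chem

# The two SMILES quoted above: without and with double-bond stereo.
unspecified_smiles = 'CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O'
specified_smiles = 'CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O'

unspecified_can = Chem.MolToSmiles(Chem.MolFromSmiles(unspecified_smiles))
specified_can = Chem.MolToSmiles(Chem.MolFromSmiles(specified_smiles))

# True only if both inputs end up as the same molecule, stereo included.
same = unspecified_can == specified_can
print(unspecified_can)
print(specified_can)
print(same)
```

If the comparison comes out False, the two Mol objects differ in their stored 
double-bond stereo even when the default 2D depictions look alike.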



I've tried sanitize=False, but it doesn't seem to have any effect. I would 
prefer not having to manually SetStereo(Chem.BondStereo.STEREOANY) for every 
molecule with undefined stereochem (not sure how I would even go about that...).


Possibly related to:

https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/C00BE94F-6F6F-466A-83D4-3045C9006026%40gmail.com/#msg34929570


https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/CAHOi4k3revAu-9qhFt0MpUpr0aADQ9d8bV2XT6FurTEKimCQng%40mail.gmail.com/#msg36365128
o = Chem.MolFromSmiles('C/C=C/C')

https://www.rdkit.org/docs/source/rdkit.Chem.EnumerateStereoisomers.html

https://github.com/openforcefield/openforcefield/issues/146




Any help would be much appreciated.


Thanks,

Adelene












_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss