Hi Dave and Paolo,

Thanks for your helpful replies.


@Dave, issue created: https://github.com/rdkit/rdkit/issues/3514


@Paolo, your gist shows that the internal representation of the mol does indeed 
factor in undefined stereo, contrary to the way it is depicted.


But why then does this happen when I check if the 2 molecules are the same?


from rdkit import Chem

# Same skeleton: once without and once with double-bond stereo in the SMILES
smi = Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
isosmi = Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O')
print(smi == isosmi)                   # True, expect False
print(smi.HasSubstructMatch(isosmi))   # True, expect False
print(isosmi.HasSubstructMatch(smi))   # True, expect False
print(smi.HasSubstructMatch(isosmi) and isosmi.HasSubstructMatch(smi))   # True, expect False


However, converting smi and isosmi to canonical smiles and comparing them gives 
False, as expected:

a = Chem.CanonSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
b = Chem.CanonSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O')
a == b       #False


(If there are better ways to check if 2 molecules are equal, I'd be interested 
to know.)
https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/9DF05ED7-A30E-4742-A568-9B3995689382%40dalkescientific.com/#msg29882815 ?
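For reference, the canonical SMILES comparison above can be wrapped in a small helper. This is only a sketch: same_molecule and its use_stereo flag are names made up here, not RDKit functions. It relies on Chem.MolToSmiles and its isomericSmiles argument, and it assumes that comparing canonical SMILES is an acceptable definition of "equal" for this use case.

```python
from rdkit import Chem


def same_molecule(mol_a, mol_b, use_stereo=True):
    """Hypothetical helper: compare two Mol objects via canonical SMILES.

    With use_stereo=False the stereo markers are left out of the SMILES,
    so stereoisomers (and undefined-stereo forms) compare as equal.
    """
    smi_a = Chem.MolToSmiles(mol_a, isomericSmiles=use_stereo)
    smi_b = Chem.MolToSmiles(mol_b, isomericSmiles=use_stereo)
    return smi_a == smi_b


unspecified = Chem.MolFromSmiles(
    'CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
specified = Chem.MolFromSmiles(
    'CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O')

# Stereo-aware comparison, then a comparison that ignores stereo
print(same_molecule(unspecified, specified))
print(same_molecule(unspecified, specified, use_stereo=False))
```

The first call is the stereo-aware check I actually want; the second shows what happens when stereo is deliberately ignored.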


Adelene





Doctoral Researcher
Environmental Cheminformatics
UNIVERSITÉ DU LUXEMBOURG

LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
6, avenue du Swing, L-4367 Belvaux
T +356 46 66 44 67 18
GitHub: adelenelai





________________________________
From: Paolo Tosco <paolo.tosco.m...@gmail.com>
Sent: Tuesday, October 20, 2020 1:52:12 PM
To: Adelene LAI
Cc: rdkit-discuss
Subject: Re: [Rdkit-discuss] How to preserve undefined stereochemistry?

Hi Adelene,

this gist

https://gist.github.com/ptosco/1e1c23ad24c90444993fa1db21ccb48b

shows how to add stereo annotations to RDKit 2D depictions, and also how to 
access the double bond stereochemistry programmatically.
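As a rough illustration of the programmatic access (not the code from the gist itself), the stereo label of each double bond can be read with Bond.GetStereo(). The helper name double_bond_stereo and the but-2-ene SMILES are just examples chosen for this sketch.

```python
from rdkit import Chem


def double_bond_stereo(smiles):
    """Hypothetical helper: list (begin_idx, end_idx, stereo) per double bond."""
    mol = Chem.MolFromSmiles(smiles)
    return [
        (bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), bond.GetStereo())
        for bond in mol.GetBonds()
        if bond.GetBondType() == Chem.BondType.DOUBLE
    ]


# but-2-ene written without and with double-bond stereo
for smiles in ('CC=CC', 'C/C=C/C'):
    for begin, end, stereo in double_bond_stereo(smiles):
        print(smiles, begin, end, stereo)
```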

Cheers,
p.


On Tue, Oct 20, 2020 at 12:24 PM Adelene LAI <adelene....@uni.lu> wrote:

Hi RDKit Community,


Is there a way to preserve undefined stereochemistry aka unspecified 
stereochemistry when doing MolFromSmiles?


I'm working with a bunch of molecules, some with stereochemistry defined, some 
without.


If stereochemistry is undefined in the SMILES, I would like it to stay that way 
when converted to a Mol, but this doesn't seem to be the case:


> mol = Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
> mol

[image: RDKit depiction of mol]

One would expect that C=C to either be crossed, as in PubChem's depiction:

https://pubchem.ncbi.nlm.nih.gov/compound/139598257#section=2D-Structure



or that single bond to be squiggly, as in CDK's depiction:

[https://www.simolecule.com/cdkdepict/depict/bow/svg?smi=CC(C)(C1%3DCC(%3DC(C(%3DC1)Br)O)Br)C(%3DCC(C(%3DO)O)Br)CC(%3DO)O&w=80&h=50&abbr=on&hdisp=bridgehead&showtitle=false&zoom=1.6&annotate=none]

But it's not just a matter of depiction: internally, mol seems to be 
equivalent to its stereochem-specific sibling (the Entgegen form):


CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O



I've tried sanitize=False, but it doesn't seem to have any effect. I would 
prefer not having to manually SetStereo(Chem.BondStereo.STEREOANY) for every 
molecule with undefined stereochem (not sure how I would even go about that...).
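One possible way to avoid setting STEREOANY by hand, sketched here under assumptions: Chem.FindPotentialStereoBonds marks double bonds that could be cis/trans but carry no stereo label as STEREOANY, modifying the molecule in place. The wrapper name flag_undefined_double_bonds is made up for this example, and I have not checked how the flagged bonds end up being depicted or written back out as SMILES.

```python
from rdkit import Chem


def flag_undefined_double_bonds(smiles):
    """Hypothetical helper: parse SMILES, then mark potential stereo
    double bonds that have no assigned stereo as STEREOANY."""
    mol = Chem.MolFromSmiles(smiles)
    Chem.FindPotentialStereoBonds(mol)  # modifies mol in place
    return mol


def double_bond_labels(mol):
    """Collect the stereo label of every double bond in the molecule."""
    return [
        bond.GetStereo()
        for bond in mol.GetBonds()
        if bond.GetBondType() == Chem.BondType.DOUBLE
    ]


# but-2-ene without and with stereo given in the SMILES
print(double_bond_labels(flag_undefined_double_bonds('CC=CC')))
print(double_bond_labels(flag_undefined_double_bonds('C/C=C/C')))
```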


Possibly related to:

https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/C00BE94F-6F6F-466A-83D4-3045C9006026%40gmail.com/#msg34929570


https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/CAHOi4k3revAu-9qhFt0MpUpr0aADQ9d8bV2XT6FurTEKimCQng%40mail.gmail.com/#msg36365128
o = Chem.MolFromSmiles('C/C=C/C')

https://www.rdkit.org/docs/source/rdkit.Chem.EnumerateStereoisomers.html

https://github.com/openforcefield/openforcefield/issues/146
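For context on the EnumerateStereoisomers link above, a minimal sketch of what that module does with an unspecified double bond. The but-2-ene SMILES is only an example; the default enumeration options are assumed.

```python
from rdkit import Chem
from rdkit.Chem.EnumerateStereoisomers import EnumerateStereoisomers

# but-2-ene with no double-bond stereo given in the SMILES
mol = Chem.MolFromSmiles('CC=CC')

# Enumerate the possible stereoisomers and write each one as canonical SMILES
isomers = sorted(Chem.MolToSmiles(iso) for iso in EnumerateStereoisomers(mol))
for smiles in isomers:
    print(smiles)
```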




Any help would be much appreciated.


Thanks,

Adelene












_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss