Hi Dave and Paolo,

Thanks for your helpful replies.

@Dave, issue created: https://github.com/rdkit/rdkit/issues/3514

@Paolo, your gist shows that the internal representation of the mol does indeed factor in undefined stereo, contrary to the way it is depicted.

But why then does this happen when I check if the 2 molecules are the same?

    from rdkit import Chem

    smi = Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
    isosmi = Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O')
    print(smi == isosmi)  # True, expect False
    print(smi.HasSubstructMatch(isosmi))  # True, expect False
    print(isosmi.HasSubstructMatch(smi))  # True, expect False
    print(smi.HasSubstructMatch(isosmi) and isosmi.HasSubstructMatch(smi))  # True, expect False

However, converting smi and isosmi to canonical SMILES and comparing them gives False, as expected:

    a = Chem.CanonSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
    b = Chem.CanonSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O')
    a == b  # False

(If there are better ways to check if 2 molecules are equal, I'd be interested to know.) https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/9DF05ED7-A30E-4742-A568-9B3995689382%40dalkescientific.com/#msg29882815 ?

Adelene

Doctoral Researcher
Environmental Cheminformatics
UNIVERSITÉ DU LUXEMBOURG
LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
6, avenue du Swing, L-4367 Belvaux
T +356 46 66 44 67 18
GitHub: adelenelai

________________________________
From: Paolo Tosco <paolo.tosco.m...@gmail.com>
Sent: Tuesday, October 20, 2020 1:52:12 PM
To: Adelene LAI
Cc: rdkit-discuss
Subject: Re: [Rdkit-discuss] How to preserve undefined stereochemistry?

Hi Adelene,

this gist https://gist.github.com/ptosco/1e1c23ad24c90444993fa1db21ccb48b shows how to add stereo annotations to RDKit 2D depictions, and also how to access the double bond stereochemistry programmatically.

Cheers,
p.
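
For readers following the thread, here is a minimal sketch of the two checks discussed above: reading the stereo flag on each double bond, and comparing two structures through their canonical isomeric SMILES (the comparison that gave the expected False in the message above). It is not taken from the gist; the helper names `double_bond_stereo` and `same_isomeric_smiles` are hypothetical, and 2-butene stands in for the larger molecule to keep the example short.

```python
from rdkit import Chem


def double_bond_stereo(mol):
    """Return (bond index, stereo flag) for every double bond in mol."""
    return [
        (bond.GetIdx(), bond.GetStereo())
        for bond in mol.GetBonds()
        if bond.GetBondType() == Chem.BondType.DOUBLE
    ]


def same_isomeric_smiles(smiles_a, smiles_b):
    """Compare two SMILES strings via their canonical isomeric SMILES."""
    mol_a = Chem.MolFromSmiles(smiles_a)
    mol_b = Chem.MolFromSmiles(smiles_b)
    if mol_a is None or mol_b is None:
        raise ValueError("could not parse one of the SMILES strings")
    return (Chem.MolToSmiles(mol_a, isomericSmiles=True)
            == Chem.MolToSmiles(mol_b, isomericSmiles=True))


# 2-butene without and with double-bond stereo in the input SMILES.
unspecified = Chem.MolFromSmiles('CC=CC')
trans = Chem.MolFromSmiles('C/C=C/C')

# The stereo flag differs between the two Mol objects.
print(double_bond_stereo(unspecified))
print(double_bond_stereo(trans))

# Unspecified vs. E: different canonical isomeric SMILES.
print(same_isomeric_smiles('CC=CC', 'C/C=C/C'))
# Two spellings of the same E isomer: identical canonical isomeric SMILES.
print(same_isomeric_smiles('C/C=C/C', 'C\\C=C\\C'))
```

Comparing canonical strings treats "unspecified" and "E" as different structures, which matches the behaviour expected in the question above.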
On Tue, Oct 20, 2020 at 12:24 PM Adelene LAI <adelene....@uni.lu> wrote:

Hi RDKit Community,

Is there a way to preserve undefined stereochemistry aka unspecified stereochemistry when doing MolFromSmiles?

I'm working with a bunch of molecules, some with stereochemistry defined, some without. If stereochemistry is undefined in the SMILES, I would like it to stay that way when converted to a Mol, but this doesn't seem to be the case:

    mol = Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
    mol

[image: depiction of mol]

One would expect that C=C to either be crossed, as in PubChem's depiction:
https://pubchem.ncbi.nlm.nih.gov/compound/139598257#section=2D-Structure

or that single bond to be squiggly, as in CDK's depiction:
https://www.simolecule.com/cdkdepict/depict/bow/svg?smi=CC(C)(C1%3DCC(%3DC(C(%3DC1)Br)O)Br)C(%3DCC(C(%3DO)O)Br)CC(%3DO)O&w=80&h=50&abbr=on&hdisp=bridgehead&showtitle=false&zoom=1.6&annotate=none

But it's not just a matter of depiction, as it seems internally, mol is equivalent to its stereochem-specific sibling (Entgegen form):

    CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O

I've tried sanitize=False, but it doesn't seem to have any effect. I would prefer not having to manually SetStereo(Chem.BondStereo.STEREOANY) for every molecule with undefined stereochem (not sure how I would even go about that...).
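
One possible way to go about that, as a minimal sketch rather than a recommended workflow: `Chem.FindPotentialStereoBonds` is documented as finding double bonds that can be cis/trans and marking them as "any", so it can stand in for a manual `SetStereo` loop. The helper name `mark_unspecified_double_bonds` is hypothetical, 2-butene and ethene are stand-ins for the molecule above, and whether a given RDKit version then draws such bonds crossed or wavy is not something this sketch checks.

```python
from rdkit import Chem


def mark_unspecified_double_bonds(mol):
    """Flag potential cis/trans double bonds without assigned stereo as STEREOANY.

    Modifies mol in place and returns the indices of bonds flagged STEREOANY.
    """
    # Leaves bonds that already carry E/Z stereo untouched (cleanIt defaults to False).
    Chem.FindPotentialStereoBonds(mol)
    return [
        bond.GetIdx()
        for bond in mol.GetBonds()
        if bond.GetStereo() == Chem.BondStereo.STEREOANY
    ]


unspecified = Chem.MolFromSmiles('CC=CC')   # stereo left out of the SMILES
trans = Chem.MolFromSmiles('C/C=C/C')       # E isomer given explicitly
ethene = Chem.MolFromSmiles('C=C')          # double bond that cannot be cis/trans

for name, mol in [('unspecified', unspecified), ('trans', trans), ('ethene', ethene)]:
    flagged = mark_unspecified_double_bonds(mol)
    print(name, flagged, [str(bond.GetStereo()) for bond in mol.GetBonds()])
```

The flag lives on the Mol object, so it has to be applied after each `MolFromSmiles` call, for example in a small wrapper around the parser.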

Possibly related to:
https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/C00BE94F-6F6F-466A-83D4-3045C9006026%40gmail.com/#msg34929570
https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/CAHOi4k3revAu-9qhFt0MpUpr0aADQ9d8bV2XT6FurTEKimCQng%40mail.gmail.com/#msg36365128
o = Chem.MolFromSmiles('C/C=C/C')
https://www.rdkit.org/docs/source/rdkit.Chem.EnumerateStereoisomers.html
https://github.com/openforcefield/openforcefield/issues/146

Any help would be much appreciated.

Thanks,
Adelene

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss