Dear Paolo,
Thanks for updating the gist - it's a really important resource for me and probably for future RDKit beginners too.

I like your suggestion to add the unspecifiedBondStereoMeansUnknown flag to SmilesParserParams. I think this would avoid having to do a substructure match + BondStereo replacement loop.

To clarify, will implementing the above effectively mean that unspecified stereo will be depicted as a crossed double bond too? Because then the only way to differentiate between stereo unspecified and stereo unknown would be to run bond.GetStereo(), which would give STEREONONE or STEREOANY respectively. I think this would be OK... unless depiction folks have alternative suggestions.

Adelene

Doctoral Researcher
Environmental Cheminformatics
UNIVERSITÉ DU LUXEMBOURG
LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
6, avenue du Swing, L-4367 Belvaux
T +356 46 66 44 67 18
GitHub: adelenelai

________________________________
From: Paolo Tosco <paolo.tosco.m...@gmail.com>
Sent: Wednesday, October 21, 2020 10:56:24 AM
To: Adelene LAI
Cc: Greg Landrum; rdkit-discuss
Subject: Re: [Rdkit-discuss] How to preserve undefined stereochemistry?

Hi Adelene, Greg,

I have updated my gist, fixing my gross vocabulary mistake ("undefined" to "unspecified"), and I have also added an example of the crossed bond depiction by changing the BondStereo attribute to STEREOANY.

@Adelene: I think you touched on an interesting point here. There are indeed cases where it would be nice to address the SMILES ambiguity (no way to symbolically discriminate "unspecified" from "unknown") more efficiently than by doing a time-consuming (and potentially error-prone) substructure match and BondStereo replacement on all input molecules, particularly if you have a large number of those.

I propose to do that by adding an unspecifiedBondStereoMeansUnknown (suggestions for a better name welcome) flag to SmilesParserParams - I believe that would be useful to many.

Cheers,
p.
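For readers following the thread, a loop that avoids the substructure match might look roughly like the sketch below. It is not taken from Paolo's gist and it is not the proposed unspecifiedBondStereoMeansUnknown flag. It assumes that Chem.FindPotentialStereoBonds called with cleanIt=False only relabels double bonds that could be cis/trans and carry no stereo label yet, which is worth verifying against your RDKit version. The helper names mark_unspecified_as_unknown and double_bond_stereo are made up for illustration.

```python
from rdkit import Chem


def mark_unspecified_as_unknown(mol):
    """Relabel unspecified, potentially stereogenic double bonds as STEREOANY.

    Hypothetical helper, not part of RDKit. Assumption: with cleanIt=False,
    bonds that already carry E/Z information are left untouched.
    The molecule is modified in place and returned for convenience.
    """
    Chem.FindPotentialStereoBonds(mol, cleanIt=False)
    return mol


def double_bond_stereo(mol):
    """Return the BondStereo value of every double bond, in bond order."""
    return [
        bond.GetStereo()
        for bond in mol.GetBonds()
        if bond.GetBondType() == Chem.BondType.DOUBLE
    ]


unspecified = Chem.MolFromSmiles("CC=CC")  # no "/" or "\" in the SMILES
known = Chem.MolFromSmiles("C/C=C/C")      # stereo given explicitly

# Stereo labels as parsed
print(double_bond_stereo(unspecified), double_bond_stereo(known))

# Stereo labels after applying the "unspecified means unknown" convention
mark_unspecified_as_unknown(unspecified)
mark_unspecified_as_unknown(known)
print(double_bond_stereo(unspecified), double_bond_stereo(known))
```

Double bonds that cannot be cis/trans (C=O, terminal C=C) should keep STEREONONE under this approach, so they would not be drawn crossed. Ring double bonds deserve a separate check, as Greg cautions in his message.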
On Wed, Oct 21, 2020 at 8:00 AM Adelene LAI <adelene....@uni.lu> wrote:

Hi Greg, Hi Paolo,

@Paolo - thanks for the updated gist!

@Greg - thanks for this detailed explanation. I think it makes sense to equate unspecified with unknown stereochem. I can't think of any obvious caveats to this convention change for now (but maybe others in the community can?).

When you say "have unspecified double bonds be marked as unknown", you mean have unspecified double bonds be represented by crossed bonds too? If so, would this loop you're suggesting be computationally not-too-expensive when working with 1000s of molecules?

Thanks and good morning!
Adelene

________________________________
From: Greg Landrum <greg.land...@gmail.com>
Sent: Wednesday, October 21, 2020 6:15:58 AM
To: Adelene LAI
Cc: rdkit-discuss
Subject: Re: [Rdkit-discuss] How to preserve undefined stereochemistry?

Paolo's gist includes a vocabulary mistake[1] that I think is confusing things here.

In the RDKit the stereochemistry of a double bond can be unspecified, unknown, or known. Unspecified means that you haven't said anything about what the stereo is; unknown means that you've actively provided the information that you don't know what the stereochemistry is; known is clear. The RDKit only draws crossed bonds in molecule drawings when the stereochemistry of the double bond is unknown.

The problem here is that in standard SMILES there is no way to actively specify that you don't know the stereochemistry of a double bond (the same thing applies to stereocenters). You can either provide information about the stereochemistry by using "/" and "\" bonds, or you provide no information.
So the SMILES C/C=C/C produces a double bond with known stereochemistry but CC=CC produces a double bond with unspecified stereochemistry.

If, based on what you know about the SMILES that you are parsing, you would like to change the convention and have unspecified double bonds be marked as unknown, it's straightforward to write a script that loops over the molecule and makes that change (watch out for ring bonds).

-greg

[1] Perhaps "mistake" isn't the right word. It's confusing

On Tue, Oct 20, 2020 at 1:54 PM Paolo Tosco <paolo.tosco.m...@gmail.com> wrote:

Hi Adelene,

this gist
https://gist.github.com/ptosco/1e1c23ad24c90444993fa1db21ccb48b
shows how to add stereo annotations to RDKit 2D depictions, and also how to access the double bond stereochemistry programmatically.

Cheers,
p.

On Tue, Oct 20, 2020 at 12:24 PM Adelene LAI <adelene....@uni.lu> wrote:

Hi RDKit Community,

Is there a way to preserve undefined stereochemistry aka unspecified stereochemistry when doing MolFromSmiles?

I'm working with a bunch of molecules, some with stereochemistry defined, some without. If stereochemistry is undefined in the SMILES, I would like it to stay that way when converted to a Mol, but this doesn't seem to be the case:

    mol = Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
    mol

One would expect that C=C to either be crossed, as in PubChem's depiction:
https://pubchem.ncbi.nlm.nih.gov/compound/139598257#section=2D-Structure
or that single bond to be squiggly, as in CDK's depiction.

But it's not just a matter of depiction, as it seems internally, mol is equivalent to its stereochem-specific sibling (Entgegen form)

    CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O

I've tried sanitize=False, but it doesn't seem to have any effect.
I would prefer not having to manually SetStereo(Chem.BondStereo.STEREOANY) for every molecule with undefined stereochem (not sure how I would even go about that...).

Possibly related to:
https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/C00BE94F-6F6F-466A-83D4-3045C9006026%40gmail.com/#msg34929570
https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/CAHOi4k3revAu-9qhFt0MpUpr0aADQ9d8bV2XT6FurTEKimCQng%40mail.gmail.com/#msg36365128
o = Chem.MolFromSmiles('C/C=C/C')
https://www.rdkit.org/docs/source/rdkit.Chem.EnumerateStereoisomers.html
https://github.com/openforcefield/openforcefield/issues/146

Any help would be much appreciated.

Thanks,
Adelene

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss