Hi Adelene,

In SMILES, there’s no way of distinguishing between unknown and unspecified stereochemistry. Technically in a SMILES string it’s either specified or unspecified. In an SDF you can also say you have a Rumsfeldian “known unknown”.
Dave

On Thu, 22 Oct 2020 at 10:07, Adelene LAI <adelene....@uni.lu> wrote:

> Dear Paolo,
>
> Thanks for updating the gist - it's a really important resource for me and
> probably future RDKit beginners too. Thanks.
>
> I like your suggestion to add the unspecifiedBondStereoMeansUnknown flag
> to SmilesParserParams. I think this way circumvents having to do a
> SS-match + BondStereo replacement loop.
>
> To clarify, will implementing the above effectively mean unspecified
> stereo will be depicted as a crossed double bond too?
>
> Because then, the only way to differentiate between stereo unspecified and
> stereo unknown would be to run bond.GetStereo(), which would give
> STEREOANY or STEREONONE respectively. I think this would be OK...unless
> depiction-folks have alternative suggestions.
>
> Adelene
>
> Doctoral Researcher
> Environmental Cheminformatics
> UNIVERSITÉ DU LUXEMBOURG
>
> LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
> 6, avenue du Swing, L-4367 Belvaux
> T +356 46 66 44 67 18
> GitHub: adelenelai
>
> ------------------------------
> *From:* Paolo Tosco <paolo.tosco.m...@gmail.com>
> *Sent:* Wednesday, October 21, 2020 10:56:24 AM
> *To:* Adelene LAI
> *Cc:* Greg Landrum; rdkit-discuss
> *Subject:* Re: [Rdkit-discuss] How to preserve undefined stereochemistry?
>
> Hi Adelene, Greg,
>
> I have updated my gist fixing my gross vocabulary mistake ("undefined" to
> "unspecified") and I have also added an example of the crossed bond
> depiction by changing the BondStereo attribute to STEREOANY.
>
> @Adelene: I think you touched an interesting point here.
> There are indeed cases where it would be nice to address the SMILES
> ambiguity (no way to symbolically discriminate "unspecified" from
> "unknown") more efficiently than by doing a time-consuming (and
> potentially error-prone) substructure match and BondStereo replacement on
> all input molecules, particularly if you have a large number of those.
>
> I propose to do that by adding an unspecifiedBondStereoMeansUnknown
> (suggestions for a better name welcome) flag to SmilesParserParams - I
> believe that would be useful to many.
>
> Cheers,
> p.
>
> On Wed, Oct 21, 2020 at 8:00 AM Adelene LAI <adelene....@uni.lu> wrote:
>
>> Hi Greg, Hi Paolo,
>>
>> @Paolo - thanks for the updated gist!
>>
>> @Greg - thanks for this detailed explanation. I think it makes sense to
>> equate unspecified with unknown stereochem. I can't think of any obvious
>> caveats to this convention change for now (but maybe others in the
>> community can?).
>>
>> When you say "have unspecified double bonds be marked as unknown", do you
>> mean have unspecified double bonds be represented by crossed bonds too?
>>
>> If so, would this loop you're suggesting be computationally
>> not-too-expensive when working with 1000s of molecules?
>>
>> Thanks and good morning!
>>
>> Adelene
>>
>> ------------------------------
>> *From:* Greg Landrum <greg.land...@gmail.com>
>> *Sent:* Wednesday, October 21, 2020 6:15:58 AM
>> *To:* Adelene LAI
>> *Cc:* rdkit-discuss
>> *Subject:* Re: [Rdkit-discuss] How to preserve undefined stereochemistry?
>>
>> Paolo's gist includes a vocabulary mistake[1] that I think is confusing
>> things here.
>>
>> In the RDKit the stereochemistry of a double bond can be unspecified,
>> unknown, or known. Unspecified means that you haven't said anything about
>> what the stereo is; unknown means that you've actively provided the
>> information that you don't know what the stereochemistry is; known is clear.
>>
>> The RDKit only draws crossed bonds in molecule drawings when the
>> stereochemistry of the double bond is unknown.
>>
>> The problem here is that in standard SMILES there is no way to actively
>> specify that you don't know the stereochemistry of a double bond (the same
>> thing applies to stereocenters). You can either provide information about
>> the stereochemistry by using "/" and "\" bonds, or you provide no
>> information. So the SMILES C/C=C/C produces a double bond with known
>> stereochemistry but CC=CC produces a double bond with unspecified
>> stereochemistry.
>>
>> If, based on what you know about the SMILES that you are parsing, you
>> would like to change the convention and have unspecified double bonds be
>> marked as unknown, it's straightforward to write a script that loops over
>> the molecule and makes that change (watch out for ring bonds).
>>
>> -greg
>> [1] Perhaps "mistake" isn't the right word. It's confusing
>>
>> On Tue, Oct 20, 2020 at 1:54 PM Paolo Tosco <paolo.tosco.m...@gmail.com>
>> wrote:
>>
>>> Hi Adelene,
>>>
>>> this gist
>>>
>>> https://gist.github.com/ptosco/1e1c23ad24c90444993fa1db21ccb48b
>>>
>>> shows how to add stereo annotations to RDKit 2D depictions, and also how
>>> to access the double bond stereochemistry programmatically.
>>>
>>> Cheers,
>>> p.
>>>
>>> On Tue, Oct 20, 2020 at 12:24 PM Adelene LAI <adelene....@uni.lu> wrote:
>>>
>>>> Hi RDKit Community,
>>>>
>>>> Is there a way to preserve undefined stereochemistry aka unspecified
>>>> stereochemistry when doing MolFromSmiles?
>>>>
>>>> I'm working with a bunch of molecules, some with stereochemistry
>>>> defined, some without.
>>>>
>>>> If stereochemistry is undefined in the SMILES, I would like it to stay
>>>> that way when converted to a Mol, but this doesn't seem to be the case:
>>>>
>>>> > mol = Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
>>>> > mol
>>>>
>>>> One would expect that C=C to either be crossed, as in PubChem's
>>>> depiction:
>>>>
>>>> https://pubchem.ncbi.nlm.nih.gov/compound/139598257#section=2D-Structure
>>>>
>>>> or that single bond to be squiggly, as in CDK's depiction:
>>>>
>>>> But it's not just a matter of depiction, as it seems internally, mol is
>>>> equivalent to its stereochem-specific sibling (Entgegen form)
>>>>
>>>> CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O
>>>>
>>>> I've tried sanitize=False, but it doesn't seem to have any effect. I
>>>> would prefer not having to manually SetStereo(Chem.BondStereo.STEREOANY)
>>>> for every molecule with undefined stereochem (not sure how I would even go
>>>> about that...).
>>>>
>>>> Possibly related to:
>>>>
>>>> https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/C00BE94F-6F6F-466A-83D4-3045C9006026%40gmail.com/#msg34929570
>>>>
>>>> https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/CAHOi4k3revAu-9qhFt0MpUpr0aADQ9d8bV2XT6FurTEKimCQng%40mail.gmail.com/#msg36365128
>>>>
>>>> https://www.rdkit.org/docs/source/rdkit.Chem.EnumerateStereoisomers.html
>>>>
>>>> https://github.com/openforcefield/openforcefield/issues/146
>>>>
>>>> Any help would be much appreciated.
>>>>
>>>> Thanks,
>>>>
>>>> Adelene
>>>>
>>>> _______________________________________________
>>>> Rdkit-discuss mailing list
>>>> Rdkit-discuss@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

--
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk
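
To make the point at the top of this thread concrete, here is a minimal sketch in RDKit Python of the three states discussed (known, unspecified, unknown). It is only an illustration: double_bond_stereo is a hypothetical helper name, not an RDKit function, and but-2-ene is used as the simplest possible test case. The last step shows that the "unknown" flag is lost as soon as the molecule is written back to SMILES, which is the ambiguity being described.

```python
from rdkit import Chem


def double_bond_stereo(mol):
    """Return the BondStereo label of each double bond in mol, as strings.

    Hypothetical helper for this example; not part of RDKit.
    """
    return [
        str(bond.GetStereo())
        for bond in mol.GetBonds()
        if bond.GetBondType() == Chem.BondType.DOUBLE
    ]


# SMILES can only say "specified" (with / and \) or say nothing at all.
specified = Chem.MolFromSmiles("C/C=C/C")
unspecified = Chem.MolFromSmiles("CC=CC")
print(double_bond_stereo(specified))    # ['STEREOE']
print(double_bond_stereo(unspecified))  # ['STEREONONE']

# "Known unknown": flag the double bond explicitly on the Mol object.
unknown = Chem.MolFromSmiles("CC=CC")
for bond in unknown.GetBonds():
    if bond.GetBondType() == Chem.BondType.DOUBLE:
        bond.SetStereo(Chem.BondStereo.STEREOANY)
print(double_bond_stereo(unknown))      # ['STEREOANY']

# Writing SMILES drops the flag, so re-parsing gives an unspecified bond again.
round_trip = Chem.MolFromSmiles(Chem.MolToSmiles(unknown))
print(double_bond_stereo(round_trip))   # ['STEREONONE']
```

A format that can carry the "unknown" state, such as the SDF mentioned above, would be needed to keep the flag when the molecule is stored.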
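
The loop Greg describes in the quoted thread ("loops over the molecule and makes that change (watch out for ring bonds)") could look roughly like the sketch below. This is not code from the thread or from Paolo's gist. It assumes an RDKit version that provides Chem.FindPotentialStereo, and mark_unspecified_as_unknown and double_bond_stereo are hypothetical helper names. Skipping every ring bond is a deliberately conservative reading of the "watch out for ring bonds" warning.

```python
from rdkit import Chem


def mark_unspecified_as_unknown(mol):
    """Set STEREOANY on acyclic double bonds that could be E/Z but carry no label.

    Hypothetical helper (not part of RDKit). Modifies mol in place and
    returns it for convenience.
    """
    for info in Chem.FindPotentialStereo(mol):
        if info.type != Chem.StereoType.Bond_Double:
            continue  # only double bonds are handled here
        if info.specified != Chem.StereoSpecified.Unspecified:
            continue  # leave E/Z (and already-unknown) bonds alone
        bond = mol.GetBondWithIdx(info.centeredOn)
        if bond.IsInRing():
            continue  # conservative choice: never touch ring bonds
        bond.SetStereo(Chem.BondStereo.STEREOANY)
    return mol


def double_bond_stereo(mol):
    """Return the BondStereo label of each double bond in mol, as strings."""
    return [
        str(bond.GetStereo())
        for bond in mol.GetBonds()
        if bond.GetBondType() == Chem.BondType.DOUBLE
    ]


for smiles in ["CC=CC", "C/C=C/C", "C=CC", "C1=CCCCC1"]:
    mol = mark_unspecified_as_unknown(Chem.MolFromSmiles(smiles))
    print(smiles, double_bond_stereo(mol))
# CC=CC ['STEREOANY']
# C/C=C/C ['STEREOE']
# C=CC ['STEREONONE']
# C1=CCCCC1 ['STEREONONE']
```

Because the helper works from the list of potential stereo elements rather than a substructure match, double bonds that cannot have E/Z isomers (terminal alkenes, carbonyls) are left as STEREONONE. Applied to the molecule from the original question, only the C=C bond should be flagged, which would then be drawn crossed.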