Hi Adelene,
In SMILES, there’s no way of distinguishing between unknown and
unspecified stereochemistry. Technically, in a SMILES string it’s either
specified or unspecified. In an SDF you can also say you have a Rumsfeldian
“known unknown”.

Dave

On Thu, 22 Oct 2020 at 10:07, Adelene LAI <adelene....@uni.lu> wrote:

> Dear Paolo,
>
>
>
> Thanks for updating the gist - it's a really important resource for me and
> probably future RDKit beginners too. Thanks.
>
> I like your suggestion to add the unspecifiedBondStereoMeansUnknown flag
> to SmilesParserParams. I think this circumvents having to do a
> substructure-match + BondStereo replacement loop.
>
> To clarify, will implementing the above effectively mean unspecified
> stereo will be depicted as a crossed double bond too?
>
> Because then, the only way to differentiate between stereo unspecified and
> stereo unknown would be to run bond.GetStereo(), which would give
> STEREONONE or STEREOANY respectively. I think this would be OK... unless
> depiction folks have alternative suggestions.
>
> Adelene
>
> Doctoral Researcher
> Environmental Cheminformatics
> UNIVERSITÉ DU LUXEMBOURG
> LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
> 6, avenue du Swing, L-4367 Belvaux
> T +356 46 66 44 67 18
> GitHub: adelenelai
>
> ------------------------------
> *From:* Paolo Tosco <paolo.tosco.m...@gmail.com>
> *Sent:* Wednesday, October 21, 2020 10:56:24 AM
> *To:* Adelene LAI
> *Cc:* Greg Landrum; rdkit-discuss
>
> *Subject:* Re: [Rdkit-discuss] How to preserve undefined stereochemistry?
>
> Hi Adelene, Greg,
>
> I have updated my gist fixing my gross vocabulary mistake ("undefined" to
> "unspecified") and I have also added an example of the crossed bond
> depiction by changing the BondStereo attribute to STEREOANY.
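
For readers without access to the gist, the crossed-bond step described above might look roughly like the sketch below. It is not the gist's code: the 2-butene input, the atom indices and the SVG canvas size are assumptions chosen for illustration.

```python
from rdkit import Chem
from rdkit.Chem.Draw import rdMolDraw2D

# Parse 2-butene without "/" or "\" marks (illustrative input, not from the gist).
mol = Chem.MolFromSmiles("CC=CC")
double_bond = mol.GetBondBetweenAtoms(1, 2)

# Actively flag the double bond stereochemistry as unknown.
double_bond.SetStereo(Chem.BondStereo.STEREOANY)

# Render to SVG text; the drawing code decides how to depict the bond.
drawer = rdMolDraw2D.MolDraw2DSVG(250, 200)
rdMolDraw2D.PrepareAndDrawMolecule(drawer, mol)
drawer.FinishDrawing()
svg = drawer.GetDrawingText()
```

Writing `svg` to a file and opening it in a browser is one way to check whether the bond is drawn crossed in the RDKit version at hand.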
>
> @Adelene: I think you touched on an interesting point here. There are indeed
> cases where it would be nice to address the SMILES ambiguity (no way to
> symbolically discriminate "unspecified" from "unknown") more efficiently
> than by doing a time-consuming (and potentially error-prone)
> substructure match and BondStereo replacement on all input molecules,
> particularly if you have a large number of those.
>
> I propose to do that by adding an unspecifiedBondStereoMeansUnknown
> (suggestions for a better name welcome) flag to SmilesParserParams - I
> believe that would be useful to many.
>
> Cheers,
> p.
>
> On Wed, Oct 21, 2020 at 8:00 AM Adelene LAI <adelene....@uni.lu> wrote:
>
>> Hi Greg, Hi Paolo,
>>
>>
>> @Paolo - thanks for the updated gist!
>>
>>
>> @Greg - thanks for this detailed explanation. I think it makes sense to
>> equate unspecified with unknown stereochem. I can't think of any obvious
>> caveats to this convention change for now (but maybe others in the
>> community can?).
>>
>> When you say "have unspecified double bonds be marked as unknown", do you
>> mean have unspecified double bonds be represented by crossed bonds too?
>>
>> If so, would this loop you're suggesting be computationally
>> not-too-expensive when working with 1000s of molecules?
>>
>> Thanks and good morning!
>>
>> Adelene
>>
>> ------------------------------
>> *From:* Greg Landrum <greg.land...@gmail.com>
>> *Sent:* Wednesday, October 21, 2020 6:15:58 AM
>> *To:* Adelene LAI
>> *Cc:* rdkit-discuss
>> *Subject:* Re: [Rdkit-discuss] How to preserve undefined stereochemistry?
>>
>> Paolo's gist includes a vocabulary mistake[1] that I think is confusing
>> things here.
>>
>> In the RDKit the stereochemistry of a double bond can be unspecified,
>> unknown, or known. Unspecified means that you haven't said anything about
>> what the stereo is; unknown means that you've actively provided the
>> information that you don't know what the stereochemistry is; known is clear.
>>
>> The RDKit only draws crossed bonds in molecule drawings when the
>> stereochemistry of the double bond is unknown.
>>
>> The problem here is that in standard SMILES there is no way to actively
>> specify that you don't know the stereochemistry of a double bond (the same
>> thing applies to stereocenters). You can either provide information about
>> the stereochemistry by using "/" and "\" bonds, or you provide no
>> information. So the SMILES C/C=C/C produces a double bond with known
>> stereochemistry but CC=CC produces a double bond with unspecified
>> stereochemistry.
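
A small sketch of the distinction just described, using the two SMILES from the paragraph above. The helper name `double_bond_stereo` is hypothetical and only exists for this illustration.

```python
from rdkit import Chem


def double_bond_stereo(smiles):
    """Return the BondStereo value of every double bond parsed from a SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    return [
        bond.GetStereo()
        for bond in mol.GetBonds()
        if bond.GetBondType() == Chem.BondType.DOUBLE
    ]


# "/" bonds on both sides: stereo is known (trans, so STEREOE).
known = double_bond_stereo("C/C=C/C")

# No "/" or "\" bonds: nothing was said about stereo (STEREONONE).
unspecified = double_bond_stereo("CC=CC")

print(known, unspecified)
```

Neither input yields STEREOANY, which is the point of the paragraph: plain SMILES gives the parser no way to receive "unknown".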
>>
>> If, based on what you know about the SMILES that you are parsing, you
>> would like to change the convention and have unspecified double bonds be
>> marked as unknown, it's straightforward to write a script that loops over
>> the molecule and makes that change (watch out for ring bonds).
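
One possible shape for such a loop is sketched below. It is an assumption-laden illustration, not code from this thread: the function names are hypothetical, ring bonds are simply skipped, and "could be cis/trans" is approximated by comparing symmetry ranks of the heavy-atom substituents at each end (so it expects molecules without explicit hydrogens).

```python
from rdkit import Chem


def _could_be_stereo_end(atom, partner_idx, ranks):
    """True if this end of the double bond has substituents that can define cis/trans."""
    others = [n.GetIdx() for n in atom.GetNeighbors() if n.GetIdx() != partner_idx]
    if len(others) == 1:
        return True  # one heavy substituent plus an implicit H or lone pair
    if len(others) == 2:
        return ranks[others[0]] != ranks[others[1]]  # two different substituents
    return False  # e.g. =CH2, =O, or a hypervalent centre


def mark_unspecified_as_unknown(mol):
    """Set STEREOANY on acyclic double bonds that carry no stereo; return the count."""
    # Ties are kept so that symmetry-equivalent atoms share a rank.
    ranks = list(Chem.CanonicalRankAtoms(mol, breakTies=False))
    changed = 0
    for bond in mol.GetBonds():
        if bond.GetBondType() != Chem.BondType.DOUBLE or bond.IsInRing():
            continue  # ring bonds are left alone in this sketch
        if bond.GetStereo() != Chem.BondStereo.STEREONONE:
            continue  # already known or already unknown
        begin, end = bond.GetBeginAtom(), bond.GetEndAtom()
        if _could_be_stereo_end(begin, end.GetIdx(), ranks) and _could_be_stereo_end(
            end, begin.GetIdx(), ranks
        ):
            bond.SetStereo(Chem.BondStereo.STEREOANY)
            changed += 1
    return changed


mol = Chem.MolFromSmiles("CC=CC")
n_changed = mark_unspecified_as_unknown(mol)
```

The loop visits each bond once, so the cost grows with molecule size rather than with any substructure search. RDKit's own `Chem.FindPotentialStereoBonds` may be worth checking as an alternative before relying on a hand-written test like this one.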
>>
>> -greg
>> [1] Perhaps "mistake" isn't the right word. It's confusing
>>
>> On Tue, Oct 20, 2020 at 1:54 PM Paolo Tosco <paolo.tosco.m...@gmail.com>
>> wrote:
>>
>>> Hi Adelene,
>>>
>>> this gist
>>>
>>> https://gist.github.com/ptosco/1e1c23ad24c90444993fa1db21ccb48b
>>>
>>> shows how to add stereo annotations to RDKit 2D depictions, and also how
>>> to access the double bond stereochemistry programmatically.
>>>
>>> Cheers,
>>> p.
>>>
>>>
>>> On Tue, Oct 20, 2020 at 12:24 PM Adelene LAI <adelene....@uni.lu> wrote:
>>>
>>>> Hi RDKit Community,
>>>>
>>>>
>>>> Is there a way to preserve undefined stereochemistry aka unspecified
>>>> stereochemistry when doing MolFromSmiles?
>>>>
>>>> I'm working with a bunch of molecules, some with stereochemistry
>>>> defined, some without.
>>>>
>>>>
>>>> If stereochemistry is undefined in the SMILES, I would like it to stay
>>>> that way when converted to a Mol, but this doesn't seem to be the case:
>>>>
>>>>
>>>> > from rdkit import Chem
>>>> > mol = Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
>>>> > mol
>>>>
>>>> One would expect that C=C to either be crossed, as in PubChem's
>>>> depiction:
>>>>
>>>> https://pubchem.ncbi.nlm.nih.gov/compound/139598257#section=2D-Structure
>>>>
>>>> or that single bond to be squiggly, as in CDK's depiction:
>>>>
>>>> But it's not just a matter of depiction, as it seems internally, mol is
>>>> equivalent to its stereochem-specific sibling (Entgegen form)
>>>>
>>>>
>>>> CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O
>>>>
>>>>
>>>>
>>>> I've tried sanitize=False, but it doesn't seem to have any effect. I
>>>> would prefer not having to manually SetStereo(Chem.BondStereo.STEREOANY)
>>>> for every molecule with undefined stereochem (not sure how I would even go
>>>> about that...).
>>>>
>>>>
>>>> Possibly related to:
>>>>
>>>>
>>>> https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/C00BE94F-6F6F-466A-83D4-3045C9006026%40gmail.com/#msg34929570
>>>>
>>>> https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/CAHOi4k3revAu-9qhFt0MpUpr0aADQ9d8bV2XT6FurTEKimCQng%40mail.gmail.com/#msg36365128
>>>>
>>>> https://www.rdkit.org/docs/source/rdkit.Chem.EnumerateStereoisomers.html
>>>>
>>>> https://github.com/openforcefield/openforcefield/issues/146
>>>>
>>>>
>>>>
>>>>
>>>> Any help would be much appreciated.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Adelene
>>>>
>>>> _______________________________________________
>>>> Rdkit-discuss mailing list
>>>> Rdkit-discuss@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk