Re: [Rdkit-discuss] How to preserve undefined stereochemistry?

2020-10-23 Thread Adelene LAI
Hi Dave,


Understood, but I actually meant distinguishing between the mol objects of the 
unspecified vs. unknown stereochem forms, not their SMILES.


Since Paolo is proposing the option of depicting both unspecified and unknown 
stereo as crossed bonds (and since both forms would have the same underlying 
SMILES), the only way the user could distinguish them would be to check 
bond.GetStereo().
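A minimal sketch of that point, assuming a recent RDKit build (2-butene is used purely as an illustration): a double bond left unspecified by the SMILES parser and a copy of the same bond explicitly set to unknown (`STEREOANY`) write out the same SMILES. Only `Bond.GetStereo()` tells the two mol objects apart.

```python
from rdkit import Chem

# Parsing without "/" or "\" leaves the double bond's stereo unspecified.
unspecified = Chem.MolFromSmiles("CC=CC")

# Copy the molecule and explicitly mark the same double bond (bond 1) as unknown.
unknown = Chem.Mol(unspecified)
unknown.GetBondWithIdx(1).SetStereo(Chem.BondStereo.STEREOANY)

# SMILES has no syntax for "unknown", so both molecules write the same string...
smiles_unspecified = Chem.MolToSmiles(unspecified)
smiles_unknown = Chem.MolToSmiles(unknown)

# ...while the bond objects still carry different stereo flags.
stereo_unspecified = unspecified.GetBondWithIdx(1).GetStereo()
stereo_unknown = unknown.GetBondWithIdx(1).GetStereo()

print(smiles_unspecified, smiles_unknown)
print(stereo_unspecified, stereo_unknown)
```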


Adelene

Doctoral Researcher
Environmental Cheminformatics
UNIVERSITÉ DU LUXEMBOURG

LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
6, avenue du Swing, L-4367 Belvaux
T +356 46 66 44 67 18
GitHub: adelenelai

Re: [Rdkit-discuss] How to preserve undefined stereochemistry?

2020-10-22 Thread David Cosgrove
Hi Adelene,
In SMILES, there’s no way of distinguishing between unknown and
unspecified. Technically in a SMILES string it’s either specified or
unspecified. In an SDF you can also say you have a Rumsfeldian “known
unknown”.
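To illustrate the SDF side, here is a sketch assuming RDKit's V2000 molfile reader: a hand-written molfile for 2-butene whose double bond carries bond-stereo value 3 ("either cis or trans", drawn as a crossed bond) should come back with that bond flagged as unknown (`STEREOANY`). Writing it back out as SMILES loses the flag, since SMILES cannot express it. The title line and coordinates are made up for the example.

```python
from rdkit import Chem

# Hand-written V2000 molfile for 2-butene (illustrative coordinates).
# The last field of the double-bond line ("2  3  2  3") is the bond stereo
# value 3, "either cis or trans": the SDF way of recording a known unknown.
MOLBLOCK = """
  sketch

  4  3  0  0  0  0  0  0  0  0999 V2000
   -1.2990   -0.2500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.4330    0.2500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.4330   -0.2500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.2990    0.2500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  2  3
  3  4  1  0
M  END
"""

mol = Chem.MolFromMolBlock(MOLBLOCK)
double_bond = mol.GetBondBetweenAtoms(1, 2)  # atoms 2 and 3 in molfile numbering
print(double_bond.GetStereo())

# The "unknown" flag has no SMILES notation, so no "/" or "\" is written.
smiles = Chem.MolToSmiles(mol)
print(smiles)
```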

Dave


Re: [Rdkit-discuss] How to preserve undefined stereochemistry?

2020-10-22 Thread Adelene LAI
Dear Paolo,



Thanks for updating the gist - it's a really important resource for me and 
probably future RDKit beginners too. Thanks.


I like your suggestion to add the unspecifiedBondStereoMeansUnknown flag to 
SmilesParserParams. I think this would avoid having to run a substructure-match 
+ BondStereo replacement loop.


To clarify, will implementing the above effectively mean unspecified stereo 
will be depicted as a crossed double bond too?


Because then, the only way to differentiate between stereo unspecified and 
stereo unknown would be to run bond.GetStereo(), which would give STEREONONE or 
STEREOANY respectively. I think this would be OK...unless depiction folks have 
alternative suggestions.


Adelene

Re: [Rdkit-discuss] How to preserve undefined stereochemistry?

2020-10-21 Thread Paolo Tosco
Hi Adelene, Greg,

I have updated my gist fixing my gross vocabulary mistake ("undefined" to
"unspecified") and I have also added an example of the crossed bond
depiction by changing the BondStereo attribute to STEREOANY.

@Adelene: I think you touched an interesting point here. There are indeed
cases where it would be nice to address the SMILES ambiguity (no way to
symbolically discriminate "unspecified" from "unknown") more efficiently
than by doing a time-consuming (and potentially error-prone)
substructure match and BondStereo replacement on all input molecules,
particularly if you have a large number of those.

I propose to do that by adding an unspecifiedBondStereoMeansUnknown
(suggestions for a better name welcome) flag to SmilesParserParams - I
believe that would be useful to many.

Cheers,
p.
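For the depiction side, a minimal sketch of the approach described above (changing the BondStereo attribute to STEREOANY so the drawer shows a crossed bond), assuming a recent RDKit with `rdMolDraw2D`. It finds the C=C with a plain `C=C` SMARTS pattern, which is enough for this particular molecule only; a general version would also need to check that each end of the bond carries two different substituents. Rendering to an SVG string and the 400x300 canvas are arbitrary choices for the example.

```python
from rdkit import Chem
from rdkit.Chem.Draw import rdMolDraw2D

mol = Chem.MolFromSmiles("CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O")

# "C=C" matches a double bond between two aliphatic carbons; the aromatic
# ring bonds and the C=O bonds do not match.
pattern = Chem.MolFromSmarts("C=C")
marked = []
for begin_idx, end_idx in mol.GetSubstructMatches(pattern):
    bond = mol.GetBondBetweenAtoms(begin_idx, end_idx)
    # Only change non-ring bonds whose stereo was never specified.
    if not bond.IsInRing() and bond.GetStereo() == Chem.BondStereo.STEREONONE:
        bond.SetStereo(Chem.BondStereo.STEREOANY)  # drawn as a crossed bond
        marked.append(bond.GetIdx())

drawer = rdMolDraw2D.MolDraw2DSVG(400, 300)
drawer.drawOptions().addStereoAnnotation = True  # label any assigned stereo
drawer.DrawMolecule(mol)
drawer.FinishDrawing()
svg = drawer.GetDrawingText()
print(f"marked bonds: {marked}, SVG length: {len(svg)}")
```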


Re: [Rdkit-discuss] How to preserve undefined stereochemistry?

2020-10-20 Thread Adelene LAI
Hi Greg, Hi Paolo,


@Paolo - thanks for the updated gist!


@Greg - thanks for this detailed explanation. I think it makes sense to equate 
unspecified with unknown stereochem. I can't think of any obvious caveats to 
this convention change for now (but maybe others in the community can?).


When you say "have unspecified double bonds be marked as unknown", you mean 
have unspecified double bonds be represented by crossed bonds too?


If so, would the loop you're suggesting be cheap enough computationally 
when working with 1000s of molecules?




Thanks and good morning!


Adelene
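No numbers are claimed here; one hedged way to judge the cost on your own data is to time a single pass over the bonds against the SMILES parsing itself. The SMILES list is made up, and the per-bond test is only the core check such a loop would make before touching a bond (it omits the neighbour/symmetry check a complete version needs).

```python
import time

from rdkit import Chem

# Illustrative input: three SMILES repeated to get a few thousand molecules.
smiles_list = ["CC=CC", "C/C=C/C", "C1CC=CC1"] * 1000

start = time.perf_counter()
mols = [Chem.MolFromSmiles(smi) for smi in smiles_list]
parse_seconds = time.perf_counter() - start

start = time.perf_counter()
candidates = 0
for mol in mols:
    for bond in mol.GetBonds():
        # Non-ring double bonds whose stereo was left unspecified by the parser.
        if (bond.GetBondType() == Chem.BondType.DOUBLE
                and not bond.IsInRing()
                and bond.GetStereo() == Chem.BondStereo.STEREONONE):
            candidates += 1
loop_seconds = time.perf_counter() - start

print(f"parsed {len(mols)} molecules in {parse_seconds:.3f} s")
print(f"bond pass found {candidates} candidate bonds in {loop_seconds:.3f} s")
```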

Re: [Rdkit-discuss] How to preserve undefined stereochemistry?

2020-10-20 Thread Greg Landrum
Paolo's gist includes a vocabulary mistake[1] that I think is confusing
things here.

In the RDKit the stereochemistry of a double bond can be unspecified,
unknown, or known. Unspecified means that you haven't said anything about
what the stereo is; unknown means that you've actively provided the
information that you don't know what the stereochemistry is; known is clear.

The RDKit only draws crossed bonds in molecule drawings when the
stereochemistry of the double bond is unknown.

The problem here is that in standard SMILES there is no way to actively
specify that you don't know the stereochemistry of a double bond (the same
thing applies to stereocenters). You can either provide information about
the stereochemistry by using "/" and "\" bonds, or you provide no
information. So the SMILES C/C=C/C produces a double bond with known
stereochemistry but CC=CC produces a double bond with unspecified
stereochemistry.
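A quick way to see this from Python, assuming current RDKit behaviour: the directional bonds in C/C=C/C give the double bond a known configuration, while CC=CC leaves it at `STEREONONE` (unspecified). The third state, `STEREOANY` (unknown), is never produced from SMILES on its own.

```python
from rdkit import Chem

# "/" and "\" give the double bond a known configuration...
known = Chem.MolFromSmiles("C/C=C/C")
# ...while a plain "=" leaves it unspecified.
unspecified = Chem.MolFromSmiles("CC=CC")

# In both molecules bond 1 is the C=C double bond.
for label, mol in (("known", known), ("unspecified", unspecified)):
    bond = mol.GetBondWithIdx(1)
    print(label, bond.GetBondType(), bond.GetStereo())
```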

If, based on what you know about the SMILES that you are parsing, you would
like to change the convention and have unspecified double bonds be marked
as unknown, it's straightforward to write a script that loops over the
molecule and makes that change (watch out for ring bonds).

-greg
[1] Perhaps "mistake" isn't the right word. It's confusing
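A minimal sketch of such a loop, under a simplified heuristic that is an assumption rather than an RDKit API: a double bond is treated as a potential stereo bond when it is not in a ring (skipping rings entirely, per the caveat above, even though large-ring double bonds can be stereogenic), is not part of a cumulated system, and each end atom has one or two other neighbours that are not symmetry-equivalent. Symmetry is judged with `Chem.CanonicalRankAtoms(..., breakTies=False)`. The function names are hypothetical. Cost-wise it is one ranking plus one pass over the bonds per molecule, so it should scale roughly linearly with the number of molecules.

```python
from rdkit import Chem


def _end_can_be_stereo(atom, bond, ranks):
    """Heuristic check for one end atom of a candidate stereo double bond."""
    # Cumulated double bonds (allenes) are not cis/trans stereo bonds.
    for other_bond in atom.GetBonds():
        if (other_bond.GetIdx() != bond.GetIdx()
                and other_bond.GetBondType() == Chem.BondType.DOUBLE):
            return False
    partner = bond.GetOtherAtomIdx(atom.GetIdx())
    others = [n.GetIdx() for n in atom.GetNeighbors() if n.GetIdx() != partner]
    if len(others) == 1:
        # Assumes the remaining position is an implicit H or a lone pair,
        # which differs from the single explicit substituent.
        return True
    if len(others) == 2:
        # Two substituents must not be symmetry-equivalent.
        return ranks[others[0]] != ranks[others[1]]
    return False


def mark_unspecified_double_bonds_unknown(mol):
    """Set STEREOANY on potential stereo double bonds left unspecified.

    Bonds with known stereo, or already marked STEREOANY, are not touched.
    Returns the number of bonds changed.
    """
    # With breakTies=False, symmetry-equivalent atoms share the same rank.
    ranks = list(Chem.CanonicalRankAtoms(mol, breakTies=False))
    changed = 0
    for bond in mol.GetBonds():
        if bond.GetBondType() != Chem.BondType.DOUBLE:
            continue
        if bond.IsInRing():  # "watch out for ring bonds": skip them here
            continue
        if bond.GetStereo() != Chem.BondStereo.STEREONONE:
            continue
        ends = (bond.GetBeginAtom(), bond.GetEndAtom())
        if all(_end_can_be_stereo(atom, bond, ranks) for atom in ends):
            bond.SetStereo(Chem.BondStereo.STEREOANY)
            changed += 1
    return changed


mol = Chem.MolFromSmiles("CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O")
changed = mark_unspecified_double_bonds_unknown(mol)
print(changed)
```

Only the C=C in the example molecule is changed; the C=O bonds fail the neighbour check and the aromatic ring bonds are not typed as double bonds after sanitization.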

On Tue, Oct 20, 2020 at 1:54 PM Paolo Tosco wrote:

> Hi Adelene,
>
> this gist
>
> https://gist.github.com/ptosco/1e1c23ad24c90444993fa1db21ccb48b
>
> shows how to add stereo annotations to RDKit 2D depictions, and also how
> to access the double bond stereochemistry programmatically.
>
> Cheers,
> p.
>
>
> On Tue, Oct 20, 2020 at 12:24 PM Adelene LAI  wrote:
>
>> Hi RDKit Community,
>>
>>
>> Is there a way to preserve undefined stereochemistry aka unspecified
>> stereochemistry when doing MolFromSmiles?
>>
>> I'm working with a bunch of molecules, some with stereochemistry defined,
>> some without.
>>
>>
>> If stereochemistry is undefined in the SMILES, I would like it to stay
>> that way when converted to a Mol, but this doesn't seem to be the case:
>>
>>
>> > mol = Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
>> > mol
>>
>> One would expect that C=C to either be crossed, as in PubChem's depiction:
>>
>> https://pubchem.ncbi.nlm.nih.gov/compound/139598257#section=2D-Structure
>>
>> 
>>
>>
>> or that single bond to be squiggly, as in CDK's depiction:
>>
>> But it's not just a matter of depiction: internally, mol seems to be
>> equivalent to its stereochem-specific sibling (the E, or Entgegen, form):
>>
>>
>> CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O
>>
>>
>>
>> I've tried sanitize=False, but it doesn't seem to have any effect. I
>> would prefer not having to manually SetStereo(Chem.BondStereo.STEREOANY)
>> for every molecule with undefined stereochem (not sure how I would even go
>> about that...).
>>
>>
>> Possibly related to:
>>
>> https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/C00BE94F-6F6F-466A-83D4-3045C9006026%40gmail.com/#msg34929570
>>
>> https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/CAHOi4k3revAu-9qhFt0MpUpr0aADQ9d8bV2XT6FurTEKimCQng%40mail.gmail.com/#msg36365128
>> o = Chem.MolFromSmiles('C/C=C/C')
>>
>> https://www.rdkit.org/docs/source/rdkit.Chem.EnumerateStereoisomers.html
>>
>> https://github.com/openforcefield/openforcefield/issues/146
>>
>> Any help would be much appreciated.
>>
>>
>> Thanks,
>>
>> Adelene
>>
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] How to preserve undefined stereochemistry?

2020-10-20 Thread Paolo Tosco
Hi Adelene,

I have updated the gist

https://gist.github.com/ptosco/1e1c23ad24c90444993fa1db21ccb48b

to account for your questions.

Cheers,
p.


Re: [Rdkit-discuss] How to preserve undefined stereochemistry?

2020-10-20 Thread Adelene LAI
Hi Dave and Paolo,


Thanks for your helpful replies.


@Dave, issue created: https://github.com/rdkit/rdkit/issues/3514


@Paolo, your gist shows that the internal representation of the mol does indeed 
factor in undefined stereo, contrary to the way it is depicted.


But why, then, does this happen when I check whether the two molecules are the same?


from rdkit import Chem

smi = Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
isosmi = Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O')
print(smi == isosmi)  # True, expect False
print(smi.HasSubstructMatch(isosmi))  # True, expect False
print(isosmi.HasSubstructMatch(smi))  # True, expect False
print(smi.HasSubstructMatch(isosmi) and isosmi.HasSubstructMatch(smi))  # True, expect False


However, converting smi and isosmi to canonical SMILES and comparing them gives 
False, as expected:

a = Chem.CanonSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
b = Chem.CanonSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O')
a == b   #False


(If there are better ways to check whether two molecules are equal, I'd be 
interested to know.)
https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/9DF05ED7-A30E-4742-A568-9B3995689382%40dalkescientific.com/#msg29882815 ?
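A minimal sketch of one common answer, assuming RDKit's Python API: with default parameters, substructure matching does not take stereochemistry into account, so `HasSubstructMatch` succeeds in both directions for the E isomer and the unspecified form. Comparing canonical isomeric SMILES, as the `Chem.CanonSmiles` check above does, is stereo-aware because `Chem.MolToSmiles` writes isomeric SMILES by default. The helper name `same_structure` is hypothetical.

```python
from rdkit import Chem


def same_structure(smiles_a, smiles_b):
    """Hypothetical helper: compare two SMILES via canonical isomeric SMILES.

    Chem.MolToSmiles writes isomeric (stereo-aware) SMILES by default, so an
    E/Z-specified double bond and an unspecified one give different strings.
    """
    mol_a = Chem.MolFromSmiles(smiles_a)
    mol_b = Chem.MolFromSmiles(smiles_b)
    if mol_a is None or mol_b is None:
        raise ValueError("could not parse one of the SMILES")
    return Chem.MolToSmiles(mol_a) == Chem.MolToSmiles(mol_b)


unspecified = 'CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O'
entgegen = 'CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O'

print(same_structure(unspecified, entgegen))   # False: stereo differs
print(same_structure('F/C=C/F', 'F\\C=C\\F'))  # True: both spell the trans isomer
```

This only compares what SMILES can express. As far as I know, a `STEREOANY` flag on a bond is not written to SMILES, so this check would not distinguish "unknown" from "unspecified".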


Adelene





Doctoral Researcher
Environmental Cheminformatics
UNIVERSITÉ DU LUXEMBOURG

LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
6, avenue du Swing, L-4367 Belvaux
T +356 46 66 44 67 18
[github.png] adelenelai






From: Paolo Tosco 
Sent: Tuesday, October 20, 2020 1:52:12 PM
To: Adelene LAI
Cc: rdkit-discuss
Subject: Re: [Rdkit-discuss] How to preserve undefined stereochemistry?

Hi Adelene,

this gist

https://gist.github.com/ptosco/1e1c23ad24c90444993fa1db21ccb48b

shows how to add stereo annotations to RDKit 2D depictions, and also how to 
access the double bond stereochemistry programmatically.

Cheers,
p.


On Tue, Oct 20, 2020 at 12:24 PM Adelene LAI <adelene@uni.lu> wrote:

Hi RDKit Community,


Is there a way to preserve undefined stereochemistry aka unspecified 
stereochemistry when doing MolFromSmiles?


I'm working with a bunch of molecules, some with stereochemistry defined, some 
without.


If stereochemistry is undefined in the SMILES, I would like it to stay that way 
when converted to a Mol, but this doesn't seem to be the case:


> mol = Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
> mol

One would expect that C=C to either be crossed, as in PubChem's depiction:

https://pubchem.ncbi.nlm.nih.gov/compound/139598257#section=2D-Structure


or that single bond to be squiggly, as in CDK's depiction:

[https://www.simolecule.com/cdkdepict/depict/bow/svg?smi=CC(C)(C1%3DCC(%3DC(C(%3DC1)Br)O)Br)C(%3DCC(C(%3DO)O)Br)CC(%3DO)O&w=80&h=50&abbr=on&hdisp=bridgehead&showtitle=false&zoom=1.6&annotate=none]

But it's not just a matter of depiction: it seems that internally, mol is 
equivalent to its stereochem-specific sibling (the Entgegen form):


CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O



I've tried sanitize=False, but it doesn't seem to have any effect. I would 
prefer not having to manually SetStereo(Chem.BondStereo.STEREOANY) for every 
molecule with undefined stereochem (not sure how I would even go about that...).


Possibly related to:

https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/C00BE94F-6F6F-466A-83D4-3045C9006026%40gmail.com/#msg34929570


https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/CAHOi4k3revAu-9qhFt0MpUpr0aADQ9d8bV2XT6FurTEKimCQng%40mail.gmail.com/#msg36365128
o = Chem.MolFromSmiles('C/C=C/C')

https://www.rdkit.org/docs/source/rdkit.Chem.EnumerateStereoisomers.html

https://github.com/openforcefield/openforcefield/issues/146




Any help would be much appreciated.


Thanks,

Adelene








Re: [Rdkit-discuss] How to preserve undefined stereochemistry?

2020-10-20 Thread Paolo Tosco
Hi Adelene,

this gist

https://gist.github.com/ptosco/1e1c23ad24c90444993fa1db21ccb48b

shows how to add stereo annotations to RDKit 2D depictions, and also how to
access the double bond stereochemistry programmatically.
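A short sketch (not the code from the gist) of reading double-bond stereochemistry programmatically, assuming RDKit's Python API: `Bond.GetStereo()` returns a `Chem.BondStereo` value, which stays `STEREONONE` when the SMILES gave no `/` or `\` for that bond. The helper `double_bond_stereo` is hypothetical, and the exact value reported for a specified bond (`STEREOE` vs `STEREOTRANS`) may depend on the RDKit version and stereo-perception settings.

```python
from rdkit import Chem


def double_bond_stereo(smiles):
    """Hypothetical helper: (begin idx, end idx, BondStereo) for each double bond."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES")
    result = []
    for bond in mol.GetBonds():
        # aromatic ring bonds are typed AROMATIC after sanitization, so they are skipped
        if bond.GetBondType() == Chem.BondType.DOUBLE:
            result.append((bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), bond.GetStereo()))
    return result


for smi in ('CC=CC', 'C/C=C/C'):
    print(smi, double_bond_stereo(smi))
```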

Cheers,
p.


On Tue, Oct 20, 2020 at 12:24 PM Adelene LAI  wrote:

> Hi RDKit Community,
>
>
> Is there a way to preserve undefined stereochemistry aka unspecified
> stereochemistry when doing MolFromSmiles?
>
> I'm working with a bunch of molecules, some with stereochemistry defined,
> some without.
>
>
> If stereochemistry is undefined in the SMILES, I would like it to stay
> that way when converted to a Mol, but this doesn't seem to be the case:
>
>
> > mol = Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
> > mol
>
> One would expect that C=C to either be crossed, as in PubChem's depiction:
>
> https://pubchem.ncbi.nlm.nih.gov/compound/139598257#section=2D-Structure
>
> 
>
>
> or that single bond to be squiggly, as in CDK's depiction:
>
> But it's not just a matter of depiction: it seems that internally, mol is
> equivalent to its stereochem-specific sibling (the Entgegen form):
>
>
> CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O
>
>
>
> I've tried sanitize=False, but it doesn't seem to have any effect. I
> would prefer not having to manually SetStereo(Chem.BondStereo.STEREOANY)
> for every molecule with undefined stereochem (not sure how I would even go
> about that...).
>
>
> Possibly related to:
>
>
> https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/C00BE94F-6F6F-466A-83D4-3045C9006026%40gmail.com/#msg34929570
>
>
>
> 
>
> https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/CAHOi4k3revAu-9qhFt0MpUpr0aADQ9d8bV2XT6FurTEKimCQng%40mail.gmail.com/#msg36365128
> o = Chem.MolFromSmiles('C/C=C/C')
>
>
> 
> https://www.rdkit.org/docs/source/rdkit.Chem.EnumerateStereoisomers.html
>
> https://github.com/openforcefield/openforcefield/issues/146
>
>
>
>
> Any help would be much appreciated.
>
>
> Thanks,
>
> Adelene
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>


[Rdkit-discuss] How to preserve undefined stereochemistry?

2020-10-20 Thread Adelene LAI
Hi RDKit Community,


Is there a way to preserve undefined stereochemistry aka unspecified 
stereochemistry when doing MolFromSmiles?


I'm working with a bunch of molecules, some with stereochemistry defined, some 
without.


If stereochemistry is undefined in the SMILES, I would like it to stay that way 
when converted to a Mol, but this doesn't seem to be the case:


> mol = Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
> mol

One would expect that C=C to either be crossed, as in PubChem's depiction:

https://pubchem.ncbi.nlm.nih.gov/compound/139598257#section=2D-Structure


or that single bond to be squiggly, as in CDK's depiction:

[https://www.simolecule.com/cdkdepict/depict/bow/svg?smi=CC(C)(C1%3DCC(%3DC(C(%3DC1)Br)O)Br)C(%3DCC(C(%3DO)O)Br)CC(%3DO)O&w=80&h=50&abbr=on&hdisp=bridgehead&showtitle=false&zoom=1.6&annotate=none]

But it's not just a matter of depiction: it seems that internally, mol is 
equivalent to its stereochem-specific sibling (the Entgegen form):


CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O



I've tried sanitize=False, but it doesn't seem to have any effect. I would 
prefer not having to manually SetStereo(Chem.BondStereo.STEREOANY) for every 
molecule with undefined stereochem (not sure how I would even go about that...).
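One way to avoid doing this by hand, sketched under assumptions: after parsing, walk the acyclic double bonds and set `Chem.BondStereo.STEREOANY` on every one that could carry E/Z stereo but is still `STEREONONE`. The helpers `mark_unspecified_double_bonds` and `_end_can_be_stereo` are hypothetical, and the "could carry stereo" test is a simplified heuristic rather than RDKit's own stereo perception. Each end must have either two heavy-atom substituents with different canonical ranks (from `Chem.CanonicalRankAtoms(mol, breakTies=False)`) or one heavy-atom substituent plus one hydrogen. Ring double bonds are skipped, and cases such as imine nitrogens carrying only a lone pair are not handled.

```python
from rdkit import Chem


def _end_can_be_stereo(atom, partner, ranks):
    """Hypothetical heuristic: can this end of a double bond carry E/Z stereo?"""
    # heavy-atom neighbours other than the double-bond partner
    others = [n.GetIdx() for n in atom.GetNeighbors() if n.GetIdx() != partner.GetIdx()]
    if len(others) == 2:
        # two substituents: they must be distinguishable by canonical rank
        return ranks[others[0]] != ranks[others[1]]
    if len(others) == 1:
        # one substituent plus exactly one hydrogen
        return atom.GetTotalNumHs() == 1
    return False


def mark_unspecified_double_bonds(mol):
    """Hypothetical helper: set STEREOANY on acyclic, stereo-capable STEREONONE double bonds.

    Returns the indices of the bonds that were changed. Bonds that already
    carry E/Z (or cis/trans) information are left untouched.
    """
    # breakTies=False keeps symmetry-equivalent atoms at the same rank
    ranks = list(Chem.CanonicalRankAtoms(mol, breakTies=False))
    changed = []
    for bond in mol.GetBonds():
        if bond.GetBondType() != Chem.BondType.DOUBLE or bond.IsInRing():
            continue
        if bond.GetStereo() != Chem.BondStereo.STEREONONE:
            continue
        begin, end = bond.GetBeginAtom(), bond.GetEndAtom()
        if _end_can_be_stereo(begin, end, ranks) and _end_can_be_stereo(end, begin, ranks):
            bond.SetStereo(Chem.BondStereo.STEREOANY)
            changed.append(bond.GetIdx())
    return changed


unspecified = Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
entgegen = Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O')
print(mark_unspecified_double_bonds(unspecified))  # index of the central C=C bond
print(mark_unspecified_double_bonds(entgegen))     # []: stereo already specified
```

Because bonds with existing E/Z information are skipped, molecules with defined stereochemistry pass through unchanged. SMILES has no way to express "unknown" double-bond geometry, so the `STEREOANY` flag lives only on the `Mol` object and would be lost if the molecule were written back out to SMILES.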


Possibly related to:

https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/C00BE94F-6F6F-466A-83D4-3045C9006026%40gmail.com/#msg34929570





https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/CAHOi4k3revAu-9qhFt0MpUpr0aADQ9d8bV2XT6FurTEKimCQng%40mail.gmail.com/#msg36365128
o = Chem.MolFromSmiles('C/C=C/C')

https://www.rdkit.org/docs/source/rdkit.Chem.EnumerateStereoisomers.html

https://github.com/openforcefield/openforcefield/issues/146




Any help would be much appreciated.


Thanks,

Adelene






