Re: [Rdkit-discuss] Hydrogens involved in "stereochemistry" are not removed by RemoveHs()

2019-11-20 Thread Ivan Tubert-Brohman
Thank you, Greg and Andrew, for your replies, and I'm glad to hear that
this is something that can be fixed within RDKit. I had almost forgotten I
had sent this email... :-)

Best,
Ivan

On Wed, Nov 20, 2019 at 12:17 AM Greg Landrum wrote:

> Hi Ivan,
>
> I agree that there is a bug here, but I think the problem is actually that
> the double bond is being assigned stereochemistry at all in this case.
>
> In [2]: m = Chem.MolFromSmiles('[H]/C=C/F')
>
> In [3]: m.Debug()
> Atoms:
> 0 1 H chg: 0  deg: 1 exp: 1 imp: 0 hyb: 1 arom?: 0 chi: 0
> 1 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 0 chi: 0
> 2 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 0 chi: 0
> 3 9 F chg: 0  deg: 1 exp: 1 imp: 0 hyb: 4 arom?: 0 chi: 0
> Bonds:
> 0 0->1 order: 1 dir: 4 conj?: 0 aromatic?: 0
> 1 1->2 order: 2 stereo: 3 stereoAts: (0 3) conj?: 0 aromatic?: 0
> 2 2->3 order: 1 dir: 4 conj?: 0 aromatic?: 0
>
>
> Given that the two substituents on the first C are the same, the double
> bond shouldn't be marked as STEREOE at all.
>
> I'll get this fixed.
> -greg
>
>
>
> On Wed, Nov 6, 2019 at 4:34 PM Ivan Tubert-Brohman <
> ivan.tubert-broh...@schrodinger.com> wrote:
>
>> Hi,
>>
>> For reasons too complicated to get into here, I ended up with a molecule
>> containing a =CH2 in which one of the hydrogens was explicit and had E/Z
>> stereo info. For example, consider [H]/C=C/F.
>>
>> I was surprised that RemoveHs() refused to remove the hydrogen, although
>> later I found that that's the documented behavior, and generally it makes
>> sense as a way to prevent the loss of stereochemical information.
>>
>> For example, compare these two:
>>
>> In [7]: Chem.MolToSmiles(Chem.RemoveHs(Chem.MolFromSmiles('[H]/C=C/F')))
>> Out[7]: '[H]/C=C/F'
>>
>> In [8]: Chem.MolToSmiles(Chem.RemoveHs(Chem.MolFromSmiles('[H]C=C/F')))
>> Out[8]: 'C=CF'
>>
>> A chemist would say that these two are obviously the same molecule, and
>> arguably the second representation is better, because a double bond ending
>> in =CH2 can't have geometric isomers. Maybe it's unreasonable to expect
>> RDKit to make that kind of inference, but still I wonder, what would be a
>> good automated way to get from [H]/C=C/F to C=CF?
>>
>> One idea is to add a "=CH2 cleanup" step, perhaps implemented by applying
>> this reaction:
>>
>> [H][C:1]=[*:2]>>[CH2:1]=[*:2]
>>
>> but perhaps there's a better way?
>>
>> Best,
>> Ivan
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>


Re: [Rdkit-discuss] Hydrogens involved in "stereochemistry" are not removed by RemoveHs()

2019-11-19 Thread Greg Landrum
Hi Ivan,

I agree that there is a bug here, but I think the problem is actually that
the double bond is being assigned stereochemistry at all in this case.

In [2]: m = Chem.MolFromSmiles('[H]/C=C/F')

In [3]: m.Debug()
Atoms:
0 1 H chg: 0  deg: 1 exp: 1 imp: 0 hyb: 1 arom?: 0 chi: 0
1 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 0 chi: 0
2 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 0 chi: 0
3 9 F chg: 0  deg: 1 exp: 1 imp: 0 hyb: 4 arom?: 0 chi: 0
Bonds:
0 0->1 order: 1 dir: 4 conj?: 0 aromatic?: 0
1 1->2 order: 2 stereo: 3 stereoAts: (0 3) conj?: 0 aromatic?: 0
2 2->3 order: 1 dir: 4 conj?: 0 aromatic?: 0


Given that the two substituents on the first C are the same, the double
bond shouldn't be marked as STEREOE at all.
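
The same information can be read off programmatically instead of from the
Debug() dump, where "stereo: 3" on bond 1 is the STEREOE label. The following
is only a minimal sketch: the helper name double_bond_stereo is made up for
illustration, and what it reports for '[H]/C=C/F' depends on the RDKit version
in use, so no output is shown.

```python
from rdkit import Chem


def double_bond_stereo(smiles):
    """List (begin_idx, end_idx, stereo, stereo_atoms) for each double bond.

    Hypothetical helper for illustration; not part of RDKit.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    report = []
    for bond in mol.GetBonds():
        if bond.GetBondType() != Chem.BondType.DOUBLE:
            continue
        report.append((
            bond.GetBeginAtomIdx(),
            bond.GetEndAtomIdx(),
            bond.GetStereo(),              # a Chem.BondStereo value
            list(bond.GetStereoAtoms()),   # atom indices the label refers to
        ))
    return report


# Compare the problematic input with the plain form of the same molecule.
print(double_bond_stereo('[H]/C=C/F'))
print(double_bond_stereo('C=CF'))
```

A double bond with no stereo label reports Chem.BondStereo.STEREONONE, which
is what one would want to see for a bond ending in =CH2.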

I'll get this fixed.
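
In the meantime, one possible workaround is to drop the E/Z label by hand on
any double bond where one end carries two hydrogens, and only then call
RemoveHs(). This is a sketch under assumptions, not the fix itself: the
function name clear_methylene_stereo is hypothetical, and it assumes that
RemoveHs() keeps the hydrogen only because of the stereo label on the double
bond or the direction on the H-C bond, so that clearing both lets the hydrogen
be removed.

```python
from rdkit import Chem


def clear_methylene_stereo(mol):
    """Return a copy of mol with E/Z info dropped from double bonds ending in =CH2.

    Hypothetical helper for illustration; not part of RDKit.
    """
    mol = Chem.Mol(mol)  # work on a copy, leave the caller's molecule alone
    for bond in mol.GetBonds():
        if bond.GetBondType() != Chem.BondType.DOUBLE:
            continue
        ends = (bond.GetBeginAtom(), bond.GetEndAtom())
        # includeNeighbors=True counts explicit [H] atoms as well as implicit Hs
        if not any(a.GetTotalNumHs(includeNeighbors=True) >= 2 for a in ends):
            continue
        bond.SetStereo(Chem.BondStereo.STEREONONE)
        # Also clear the / and \ directions on the neighbouring single bonds.
        # Caveat: if stereo is later re-perceived from bond directions, this
        # could affect an adjacent stereo double bond sharing one of them.
        for atom in ends:
            for nbr_bond in atom.GetBonds():
                if nbr_bond.GetBondType() == Chem.BondType.SINGLE:
                    nbr_bond.SetBondDir(Chem.BondDir.NONE)
    return Chem.RemoveHs(mol)


mol = Chem.MolFromSmiles('[H]/C=C/F')
print(Chem.MolToSmiles(clear_methylene_stereo(mol)))
```

The intent is that this prints the same SMILES as the plain C=CF input, while
double bonds with one hydrogen or none on each end are left untouched.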
-greg



On Wed, Nov 6, 2019 at 4:34 PM Ivan Tubert-Brohman <
ivan.tubert-broh...@schrodinger.com> wrote:

> Hi,
>
> For reasons too complicated to get into here, I ended up with a molecule
> containing a =CH2 in which one of the hydrogens was explicit and had E/Z
> stereo info. For example, consider [H]/C=C/F.
>
> I was surprised that RemoveHs() refused to remove the hydrogen, although
> later I found that that's the documented behavior, and generally it makes
> sense as a way to prevent the loss of stereochemical information.
>
> For example, compare these two:
>
> In [7]: Chem.MolToSmiles(Chem.RemoveHs(Chem.MolFromSmiles('[H]/C=C/F')))
> Out[7]: '[H]/C=C/F'
>
> In [8]: Chem.MolToSmiles(Chem.RemoveHs(Chem.MolFromSmiles('[H]C=C/F')))
> Out[8]: 'C=CF'
>
> A chemist would say that these two are obviously the same molecule, and
> arguably the second representation is better, because a double bond ending
> in =CH2 can't have geometric isomers. Maybe it's unreasonable to expect
> RDKit to make that kind of inference, but still I wonder, what would be a
> good automated way to get from [H]/C=C/F to C=CF?
>
> One idea is to add a "=CH2 cleanup" step, perhaps implemented by applying
> this reaction:
>
> [H][C:1]=[*:2]>>[CH2:1]=[*:2]
>
> but perhaps there's a better way?
>
> Best,
> Ivan