Re: [Rdkit-discuss] Fwd: What is correct treatment of bond stereochemistry defined by hydrogen

2018-04-06 Thread Greg Landrum
Yeah, it would be great if you could create an issue for the general
problem that removeHs() should not remove H atoms that are contributing to
the definition of a stereo bond.

On Fri, 6 Apr 2018 at 18:49, Dan Nealschneider <
dan.nealschnei...@schrodinger.com> wrote:

> Thanks, Greg-
>
>
>>
>>> What is the correct treatment of bond stereochemistry at centers for
>>> which a hydrogen is required in order to specify the bond stereochemistry?
>>> For example, an imine with a hydrogen substituent (trivial example,
>>> F/C=N/[H]).
>>>
>>
>> In these cases the H cannot be implicit. The double bond stereochemistry
>> is always defined relative to atoms bonded to the double-bonded atoms (more
>> complex to write than it actually is) and there’s just no way to do this if
>> either of those atoms is implicit.
>>
>
> Ok. It sounds like the correct treatment for my schrodinger/rdkit
> translation layer is to leave these hydrogens explicit.
>
>
>>> I notice that when I use the smiles constructor, or if I read from an SDF
>>> file using the SDMolSupplier, the C=N bond in the example shown above is
>>> not recognized as having stereochemistry. However, if I use
>>> removeHs=False in the SDMolSupplier, the bond *is* recognized as
>>> Z.
>>>
>>
>> I need to confirm it (I’m on my phone at the moment), but I think this is
>> a bug: removeHs() should not remove atoms that determine stereochemistry.
>> This might be something I can get fixed before the next release.
>>
>
> Reading from SMILES in RDKit also loses this hydrogen:
>
> Python 3.6.2 (default, Sep 26 2017, 17:33:28)
> [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
> >>> import rdkit.Chem
> >>> rdkit.__version__
> '2017.03.1'
> >>> m = rdkit.Chem.MolFromSmiles('F/C=N/[H]')
> >>> rdkit.Chem.MolToSmiles(m, isomericSmiles=True)
> 'N=CF'
>
> Would it be useful for me to file a bug report?
>
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Fwd: What is correct treatment of bond stereochemistry defined by hydrogen

2018-04-06 Thread Dan Nealschneider
Thanks, Greg-


>
>> What is the correct treatment of bond stereochemistry at centers for
>> which a hydrogen is required in order to specify the bond stereochemistry?
>> For example, an imine with a hydrogen substituent (trivial example,
>> F/C=N/[H]).
>>
>
> In these cases the H cannot be implicit. The double bond stereochemistry
> is always defined relative to atoms bonded to the double-bonded atoms (more
> complex to write than it actually is) and there’s just no way to do this if
> either of those atoms is implicit.
>

Ok. It sounds like the correct treatment for my schrodinger/rdkit
translation layer is to leave these hydrogens explicit.


>> I notice that when I use the smiles constructor, or if I read from an SDF
>> file using the SDMolSupplier, the C=N bond in the example shown above is
>> not recognized as having stereochemistry. However, if I use
>> removeHs=False in the SDMolSupplier, the bond *is* recognized as
>> Z.
>>
>
> I need to confirm it (I’m on my phone at the moment), but I think this is
> a bug: removeHs() should not remove atoms that determine stereochemistry.
> This might be something I can get fixed before the next release.
>

Reading from SMILES in RDKit also loses this hydrogen:

Python 3.6.2 (default, Sep 26 2017, 17:33:28)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
>>> import rdkit.Chem
>>> rdkit.__version__
'2017.03.1'
>>> m = rdkit.Chem.MolFromSmiles('F/C=N/[H]')
>>> rdkit.Chem.MolToSmiles(m, isomericSmiles=True)
'N=CF'

Would it be useful for me to file a bug report?


Re: [Rdkit-discuss] Fwd: What is correct treatment of bond stereochemistry defined by hydrogen

2018-04-06 Thread Greg Landrum
Hi Dan,

On Fri, 6 Apr 2018 at 00:24, Dan Nealschneider <
dan.nealschnei...@schrodinger.com> wrote:

>
> What is the correct treatment of bond stereochemistry at centers for which
> a hydrogen is required in order to specify the bond stereochemistry? For
> example, an imine with a hydrogen substituent (trivial example, F/C=N/[H]).
>

In these cases the H cannot be implicit. The double bond stereochemistry is
always defined relative to atoms bonded to the double-bonded atoms (more
complex to write than it actually is) and there’s just no way to do this if
either of those atoms is implicit.
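To make this concrete, here is a toy sketch in plain Python (not RDKit's API; the adjacency-dict representation and the name reference_neighbors are hypothetical). It looks for one neighbor on each end of the double bond other than the bond partner, using HN=CHF, the molecule in cis_imine.sdf as judged from the SMILES printed in Dan's original message. Once the H on nitrogen is implicit, the nitrogen has no other neighbor, so there is nothing to define cis/trans against.

```python
# Toy molecular graph: atom index -> set of bonded atom indices.
# Hypothetical representation for illustration only; this is not RDKit's API.

def reference_neighbors(adjacency, begin, end):
    """Return one reference neighbor for each end of the begin=end double bond,
    or None if either end has no explicit neighbor besides its partner."""
    refs = []
    for atom, partner in ((begin, end), (end, begin)):
        others = sorted(adjacency[atom] - {partner})
        if not others:
            return None  # nothing to define cis/trans against on this end
        refs.append(others[0])
    return tuple(refs)


# HN=CHF numbered N0 C1 F2, H3 on N, H4 on C, with every H explicit.
explicit = {0: {1, 3}, 1: {0, 2, 4}, 2: {1}, 3: {0}, 4: {1}}
# The same molecule with both hydrogens implicit (atoms 3 and 4 dropped).
implicit = {0: {1}, 1: {0, 2}, 2: {1}}

print(reference_neighbors(explicit, 0, 1))  # (3, 2)
print(reference_neighbors(implicit, 0, 1))  # None
```

The carbon end is fine either way because F remains; only the nitrogen end loses its reference atom.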

> I notice that when I use the smiles constructor, or if I read from an SDF
> file using the SDMolSupplier, the C=N bond in the example shown above is
> not recognized as having stereochemistry. However, if I use
> removeHs=False in the SDMolSupplier, the bond *is* recognized as
> Z.
>

I need to confirm it (I’m on my phone at the moment), but I think this is a
bug: removeHs() should not remove atoms that determine stereochemistry.
This might be something I can get fixed before the next release.
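The rule described here (removeHs() should keep hydrogens that determine stereochemistry) can be sketched on a toy graph that maps each atom index to the set of bonded atom indices. This is a hypothetical helper, not RDKit's implementation: it protects an H only when hydrogens are the sole possible reference atoms on one end of a stereo double bond, and it ignores the other conditions a real hydrogen-removal routine has to check.

```python
def removable_hydrogens(adjacency, elements, stereo_bonds):
    """Toy model: indices of H atoms that can be made implicit without leaving
    an end of a stereo double bond with no reference neighbor."""
    protected = set()
    for begin, end in stereo_bonds:
        for atom, partner in ((begin, end), (end, begin)):
            others = adjacency[atom] - {partner}
            heavy = [n for n in others if elements[n] != "H"]
            if others and not heavy:
                # Only hydrogens are available here, so one must stay explicit.
                protected.add(min(others))
    return {i for i, el in elements.items() if el == "H" and i not in protected}


# HN=CHF: N0 C1 F2, H3 on N, H4 on C; the N=C bond (0, 1) carries stereo.
adjacency = {0: {1, 3}, 1: {0, 2, 4}, 2: {1}, 3: {0}, 4: {1}}
elements = {0: "N", 1: "C", 2: "F", 3: "H", 4: "H"}

print(removable_hydrogens(adjacency, elements, [(0, 1)]))  # {4}
print(removable_hydrogens(adjacency, elements, []))        # {3, 4}
```

With the N=C bond marked as stereo, only the H on carbon may go; the H on nitrogen is the nitrogen's only reference atom.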

> *At core, I have 2 questions:* Is RDKit able to represent stereochemistry
> about this bond if the hydrogen is implicit?
>

Nope. Not at the moment.

-greg


[Rdkit-discuss] Fwd: What is correct treatment of bond stereochemistry defined by hydrogen

2018-04-05 Thread Dan Nealschneider
I'm working on a translation layer between Schrodinger structures and RDKit
mols. Schrodinger structures do not have implicit hydrogens, so I'm
struggling a bit to understand how best to treat potentially implicit
hydrogens!

What is the correct treatment of bond stereochemistry at centers for which
a hydrogen is required in order to specify the bond stereochemistry? For
example, an imine with a hydrogen substituent (trivial example, F/C=N/[H]).
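In the trivial example the geometry is carried by the / and \ bond-direction marks. For the simple linear form X/A=B/Y, the usual SMILES convention is that matching marks put X and Y trans and mismatched marks put them cis (F/C=C/F is trans, F/C=C\F is cis). A toy helper for just that pattern (hypothetical name; branches and ring closures are not handled):

```python
def substituents_trans(left_mark, right_mark):
    """For X<left_mark>A=B<right_mark>Y in SMILES, True if X and Y are trans."""
    if {left_mark, right_mark} - {"/", "\\"}:
        raise ValueError("marks must be '/' or '\\'")
    return left_mark == right_mark


print(substituents_trans("/", "/"))   # True: F/C=N/[H] puts F and H trans
print(substituents_trans("/", "\\"))  # False: F/C=N\[H] puts F and H cis
```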

I notice that when I use the smiles constructor, or if I read from an SDF
file using the SDMolSupplier, the C=N bond in the example shown above is
not recognized as having stereochemistry. However, if I use
removeHs=False in the SDMolSupplier, the bond *is* recognized as Z.
Maybe that can be presented more clearly as code (here's an interactive
Python shell session; I've also attached it as a script, along with an SDF file).

Python 3.6.2 (default, Jul 21 2017, 13:21:26)
[GCC 4.9.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import rdkit
>>> print(rdkit.__version__)
2017.03.1
>>> from rdkit import Chem
>>> from rdkit.Chem import AllChem
>>> from rdkit.Chem import rdmolops
>>> def summarize(mol):
...  bond = mol.GetBondBetweenAtoms(0, 1)
...  atoms = list(bond.GetStereoAtoms())
...  atoms.insert(1, bond.GetEndAtom().GetIdx())
...  atoms.insert(1, bond.GetBeginAtom().GetIdx())
...  print(Chem.MolToSmiles(mol, isomericSmiles=True))
...  print(bond.GetStereo(), atoms)
...
>>> has_h = next(Chem.SDMolSupplier('cis_imine.sdf', removeHs=False))
>>> no_h = rdmolops.RemoveHs(has_h)
>>> has_h_again = rdmolops.AddHs(no_h)
>>> summarize(has_h)
[H]/N=C(/[H])F
STEREOZ [3, 0, 1, 2]
>>> summarize(no_h)
N=CF
STEREOZ [1, 0]
>>> summarize(has_h_again)
[H]N=C([H])F
STEREOZ [1, 0]
>>> AllChem.EmbedMolecule(has_h)
0
>>> AllChem.EmbedMolecule(no_h)
0
>>> AllChem.EmbedMolecule(has_h_again)
Fatal Python error: Segmentation fault

Current thread 0x7faa949d8740 (most recent call first):
  File "<stdin>", line 1 in <module>
Segmentation fault
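One defensive option is to scan for bonds that are flagged with a stereo label but carry no stereo atoms before calling EmbedMolecule. In the summaries above, has_h_again shows STEREOZ with an empty stereo-atom list. Note that no_h shows the same state and embedded without crashing, so this is a conservative check and not a diagnosis of the segfault; that the crash is related to this state is only an assumption based on the backtrace pointing into the embedder's stereo handling. The helper name is hypothetical. It relies only on Mol.GetBonds(), Bond.GetStereo(), Bond.GetStereoAtoms() and Bond.GetIdx(), and it compares the stereo value by its printed name (as in the transcript's STEREOZ output), so it can be exercised without RDKit installed.

```python
def bonds_with_unanchored_stereo(mol):
    """Hypothetical helper (not part of RDKit): indices of bonds that carry a
    stereo label such as STEREOZ but have fewer than two stereo atoms."""
    flagged = []
    for bond in mol.GetBonds():
        # Assumes BondStereo values print by name, e.g. 'STEREOZ' or 'STEREONONE'.
        label = str(bond.GetStereo())
        # STEREONONE and STEREOANY do not need reference atoms.
        if label not in ("STEREONONE", "STEREOANY") and len(bond.GetStereoAtoms()) < 2:
            flagged.append(bond.GetIdx())
    return flagged
```

For example, calling bonds_with_unanchored_stereo(has_h_again) before AllChem.EmbedMolecule(has_h_again) would let the caller decide what to do with a flagged bond, such as keeping the defining hydrogens explicit upstream.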

*At core, I have 2 questions:* Is RDKit able to represent stereochemistry
about this bond if the hydrogen is implicit? It's fine if not; I just want
to know. If RDKit can represent stereochemistry for bonds for which one
substituent is hydrogen, what different information do I need to provide
to RDKit?

- dan nealschneider

(né wandschneider)

Senior Developer
Schrödinger, Inc.
Portland, OR


Attachment: cis_imine.sdf (binary data)
"""
Demonstrate my questions about bonds whose stereochemistry is specified
based on a hydrogen, especially when that hydrogen is made implicit.

"""
import rdkit
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import rdmolops

has_h = next(Chem.SDMolSupplier('cis_imine.sdf', removeHs=False))
def summarize(mol, a0=0, a1=1):
    bond = mol.GetBondBetweenAtoms(a0, a1)
    atoms = list(bond.GetStereoAtoms())
    atoms.insert(1, bond.GetEndAtom().GetIdx())
    atoms.insert(1, bond.GetBeginAtom().GetIdx())
    print(Chem.MolToSmiles(mol, isomericSmiles=True))
    print(bond.GetStereo(), atoms)

no_h = rdmolops.RemoveHs(has_h)
has_h_again = rdmolops.AddHs(no_h)

print(rdkit.__version__)
summarize(has_h)
summarize(no_h)
summarize(has_h_again)
AllChem.EmbedMolecule(has_h)
AllChem.EmbedMolecule(no_h)
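# This generates a SEGV in my hands. Totalview says it happened in
# _ZN5RDKit12DGeomHelpers14_getAtomStereoEPKNS_4BondEjj, which demangles to
# RDKit::DGeomHelpers::_getAtomStereo(RDKit::Bond const*, unsigned int,
# unsigned int); the 12 and 14 are name-length prefixes, so the names to
# search RDKit's github for are DGeomHelpers and _getAtomStereo.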
AllChem.EmbedMolecule(has_h_again)
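The atom lists printed by summarize are easier to read once its two list.insert(1, ...) calls are traced: with two stereo atoms [s0, s1] they produce [s0, begin, end, s1], but with no stereo atoms they produce [end, begin]. So the "STEREOZ [1, 0]" lines for no_h and has_h_again mean the bond from atom 0 to atom 1 is flagged Z while having no stereo atoms at all; they do not list stereo atoms 1 and 0. A pure-Python trace of that logic, plus a hypothetical alternative that keeps begin/end order in both cases:

```python
def order_like_summarize(stereo_atoms, begin, end):
    """Reproduce the list built in summarize(): insert end, then begin, at index 1."""
    atoms = list(stereo_atoms)
    atoms.insert(1, end)    # on an empty list this simply appends
    atoms.insert(1, begin)
    return atoms


def order_explicitly(stereo_atoms, begin, end):
    """Hypothetical alternative: [s0, begin, end, s1], or [begin, end] if no stereo atoms."""
    if len(stereo_atoms) == 2:
        return [stereo_atoms[0], begin, end, stereo_atoms[1]]
    return [begin, end]


print(order_like_summarize([3, 2], 0, 1))  # [3, 0, 1, 2], as for has_h
print(order_like_summarize([], 0, 1))      # [1, 0], as for no_h and has_h_again
print(order_explicitly([], 0, 1))          # [0, 1]
```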
