Re: [Rdkit-discuss] Fwd: What is correct treatment of bond stereochemistry defined by hydrogen
Yeah, it would be great if you could create an issue for the general problem
that removeHs() should not remove H atoms that are contributing to the
definition of a stereo bond.

On Fri, 6 Apr 2018 at 18:49, Dan Nealschneider <dan.nealschnei...@schrodinger.com> wrote:

> Thanks, Greg-
>
>>> What is the correct treatment of bond stereochemistry at centers for
>>> which a hydrogen is required in order to specify the bond stereochemistry?
>>> For example, an imine with a hydrogen substituent (trivial example,
>>> F/C=N/[H]).
>>
>> In these cases the H cannot be implicit. The double bond stereochemistry
>> is always defined relative to atoms bonded to the double-bonded atoms (more
>> complex to write than it actually is) and there’s just no way to do this if
>> either of those atoms is implicit.
>
> Ok. It sounds like the correct treatment for my Schrodinger/RDKit
> translation layer is to leave these hydrogens explicit.
>
>>> I notice that when I use the SMILES constructor, or if I read from an SDF
>>> file using the SDMolSupplier, the C=N bond in the example shown above is
>>> not recognized as having stereochemistry. However, if I use
>>> removeHs=False in the SDMolSupplier, the bond *is* recognized as
>>> Z.
>>
>> I need to confirm it (I’m on my phone at the moment), but I think this is
>> a bug: removeHs() should not remove atoms that determine stereochemistry.
>> This might be something I can get fixed before the next release.
>
> Reading from SMILES in RDKit also loses this hydrogen:
>
> Python 3.6.2 (default, Sep 26 2017, 17:33:28)
> [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
> >>> import rdkit.Chem
> >>> rdkit.__version__
> '2017.03.1'
> >>> m = rdkit.Chem.MolFromSmiles('F/C=N/[H]')
> >>> rdkit.Chem.MolToSmiles(m, isomericSmiles=True)
> 'N=CF'
>
> Would it be useful for me to file a bug report?

--
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org!
http://sdm.link/slashdot___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
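For anyone writing a similar translation layer, the rule proposed above (keep any H that helps define a stereo bond) can be sketched without RDKit at all. The snippet below is only an illustration under assumptions: the `Bond` tuple layout and the function name `stereo_defining_hydrogens` are hypothetical, and the rule it applies (keep an H when it is the only substituent available on one end of a stereo double bond) is one plausible reading of the issue, not RDKit's actual removeHs() logic.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical bond record, not an RDKit type:
# (begin atom index, end atom index, bond order, has defined cis/trans stereo)
Bond = Tuple[int, int, int, bool]


def stereo_defining_hydrogens(symbols: List[str], bonds: List[Bond]) -> Set[int]:
    """Return indices of H atoms that should stay explicit.

    Assumed rule: for each end of a stereo double bond, if that end has no
    heavy-atom substituent to act as the reference, keep its hydrogens.
    """
    neighbors: Dict[int, Set[int]] = {i: set() for i in range(len(symbols))}
    for a, b, _order, _stereo in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    keep: Set[int] = set()
    for a, b, order, has_stereo in bonds:
        if order != 2 or not has_stereo:
            continue
        # Look at each end of the double bond in turn.
        for end, other in ((a, b), (b, a)):
            substituents = neighbors[end] - {other}
            heavy = [i for i in substituents if symbols[i] != "H"]
            if not heavy:
                # Only hydrogens can serve as the stereo reference here.
                keep.update(i for i in substituents if symbols[i] == "H")
    return keep


# F/C=N/[H], with the H on carbon also written out explicitly.
symbols = ["F", "C", "N", "H", "H"]
bonds = [
    (0, 1, 1, False),  # F-C
    (1, 2, 2, True),   # C=N, stereo defined
    (2, 3, 1, False),  # N-H, the only substituent on N
    (1, 4, 1, False),  # C-H, removable because F can act as reference
]
print(stereo_defining_hydrogens(symbols, bonds))  # {3}
```

In this toy example only the H on nitrogen is kept; the H on carbon can still be made implicit because the fluorine is available as a reference atom on that end.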
Re: [Rdkit-discuss] Fwd: What is correct treatment of bond stereochemistry defined by hydrogen
Thanks, Greg-

>> What is the correct treatment of bond stereochemistry at centers for
>> which a hydrogen is required in order to specify the bond stereochemistry?
>> For example, an imine with a hydrogen substituent (trivial example,
>> F/C=N/[H]).
>
> In these cases the H cannot be implicit. The double bond stereochemistry
> is always defined relative to atoms bonded to the double-bonded atoms (more
> complex to write than it actually is) and there’s just no way to do this if
> either of those atoms is implicit.

Ok. It sounds like the correct treatment for my Schrodinger/RDKit
translation layer is to leave these hydrogens explicit.

>> I notice that when I use the SMILES constructor, or if I read from an SDF
>> file using the SDMolSupplier, the C=N bond in the example shown above is
>> not recognized as having stereochemistry. However, if I use
>> removeHs=False in the SDMolSupplier, the bond *is* recognized as
>> Z.
>
> I need to confirm it (I’m on my phone at the moment), but I think this is
> a bug: removeHs() should not remove atoms that determine stereochemistry.
> This might be something I can get fixed before the next release.

Reading from SMILES in RDKit also loses this hydrogen:

    Python 3.6.2 (default, Sep 26 2017, 17:33:28)
    [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
    >>> import rdkit.Chem
    >>> rdkit.__version__
    '2017.03.1'
    >>> m = rdkit.Chem.MolFromSmiles('F/C=N/[H]')
    >>> rdkit.Chem.MolToSmiles(m, isomericSmiles=True)
    'N=CF'

Would it be useful for me to file a bug report?
Re: [Rdkit-discuss] Fwd: What is correct treatment of bond stereochemistry defined by hydrogen
Hi Dan,

On Fri, 6 Apr 2018 at 00:24, Dan Nealschneider <dan.nealschnei...@schrodinger.com> wrote:

> What is the correct treatment of bond stereochemistry at centers for which
> a hydrogen is required in order to specify the bond stereochemistry? For
> example, an imine with a hydrogen substituent (trivial example, F/C=N/[H]).

In these cases the H cannot be implicit. The double bond stereochemistry is
always defined relative to atoms bonded to the double-bonded atoms (more
complex to write than it actually is) and there’s just no way to do this if
either of those atoms is implicit.

> I notice that when I use the SMILES constructor, or if I read from an SDF
> file using the SDMolSupplier, the C=N bond in the example shown above is
> not recognized as having stereochemistry. However, if I use
> removeHs=False in the SDMolSupplier, the bond *is* recognized as
> Z.

I need to confirm it (I’m on my phone at the moment), but I think this is a
bug: removeHs() should not remove atoms that determine stereochemistry. This
might be something I can get fixed before the next release.

> *At core, I have 2 questions:* Is RDKit able to represent stereochemistry
> about this bond if the hydrogen is implicit?

Nope. Not at the moment.

-greg
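To make the "defined relative to atoms bonded to the double-bonded atoms" point concrete, here is a small plain-Python sketch. It is not how RDKit stores or perceives stereo; the function names and the 2D coordinates are made up for illustration. It classifies a double bond as cis or trans from the positions of one reference neighbor on each end, and it has nothing to work with when one of those neighbors is missing, which is the implicit-H situation.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]


def side_of_bond(begin: Point, end: Point, neighbor: Point) -> float:
    """Return the z component of (end - begin) x (neighbor - begin).

    Positive: neighbor lies to the left of the begin->end direction.
    Negative: to the right. Zero: collinear with the bond.
    """
    bx, by = end[0] - begin[0], end[1] - begin[1]
    nx, ny = neighbor[0] - begin[0], neighbor[1] - begin[1]
    return bx * ny - by * nx


def cis_or_trans(ref_on_begin: Optional[Point], begin: Point, end: Point,
                 ref_on_end: Optional[Point]) -> str:
    """Classify a double bond from one reference neighbor on each end."""
    if ref_on_begin is None or ref_on_end is None:
        # No explicit neighbor on one end: nothing to define stereo against.
        return "undefined"
    s1 = side_of_bond(begin, end, ref_on_begin)
    s2 = side_of_bond(begin, end, ref_on_end)
    if s1 == 0 or s2 == 0:
        return "undefined"  # a collinear neighbor carries no cis/trans information
    return "cis" if (s1 > 0) == (s2 > 0) else "trans"


# Hypothetical, idealized 2D coordinates for F-C=N-H.
F, C, N = (-0.6, 1.0), (0.0, 0.0), (1.3, 0.0)
H_same_side = (1.9, 1.0)
H_other_side = (1.9, -1.0)

print(cis_or_trans(F, C, N, H_same_side))   # cis
print(cis_or_trans(F, C, N, H_other_side))  # trans
print(cis_or_trans(F, C, N, None))          # undefined
```

The last call is the interesting one: once the H on nitrogen is dropped there is no second reference atom, so the label cannot be computed from the remaining atoms.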
[Rdkit-discuss] Fwd: What is correct treatment of bond stereochemistry defined by hydrogen
I'm working on a translation layer between Schrodinger structures and RDKit
mols. Schrodinger structures do not have implicit hydrogens, so I'm
struggling a bit to understand how best to treat potentially implicit
hydrogens!

What is the correct treatment of bond stereochemistry at centers for which a
hydrogen is required in order to specify the bond stereochemistry? For
example, an imine with a hydrogen substituent (trivial example, F/C=N/[H]).

I notice that when I use the SMILES constructor, or if I read from an SDF
file using the SDMolSupplier, the C=N bond in the example shown above is not
recognized as having stereochemistry. However, if I use removeHs=False in
the SDMolSupplier, the bond *is* recognized as Z.

Maybe that can be presented more clearly as code (here's an interactive
Python shell; I've also attached this as a script, as well as an SDF file).

    Python 3.6.2 (default, Jul 21 2017, 13:21:26)
    [GCC 4.9.3] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import rdkit
    >>> print(rdkit.__version__)
    2017.03.1
    >>> from rdkit import Chem
    >>> from rdkit.Chem import AllChem
    >>> from rdkit.Chem import rdmolops
    >>> def summarize(mol):
    ...     bond = mol.GetBondBetweenAtoms(0, 1)
    ...     atoms = list(bond.GetStereoAtoms())
    ...     atoms.insert(1, bond.GetEndAtom().GetIdx())
    ...     atoms.insert(1, bond.GetBeginAtom().GetIdx())
    ...     print(Chem.MolToSmiles(mol, isomericSmiles=True))
    ...     print(bond.GetStereo(), atoms)
    ...
    >>> has_h = next(Chem.SDMolSupplier('cis_imine.sdf', removeHs=False))
    >>> no_h = rdmolops.RemoveHs(has_h)
    >>> has_h_again = rdmolops.AddHs(no_h)
    >>> summarize(has_h)
    [H]/N=C(/[H])F
    STEREOZ [3, 0, 1, 2]
    >>> summarize(no_h)
    N=CF
    STEREOZ [1, 0]
    >>> summarize(has_h_again)
    [H]N=C([H])F
    STEREOZ [1, 0]
    >>> AllChem.EmbedMolecule(has_h)
    0
    >>> AllChem.EmbedMolecule(no_h)
    0
    >>> AllChem.EmbedMolecule(has_h_again)
    Fatal Python error: Segmentation fault

    Current thread 0x7faa949d8740 (most recent call first):
      File "<stdin>", line 1 in <module>
    Segmentation fault

*At core, I have 2 questions:*

1. Is RDKit able to represent stereochemistry about this bond if the
   hydrogen is implicit? It's fine if not, I just want to know.
2. If RDKit can represent stereochemistry for bonds for which one
   substituent is hydrogen, what different information do I need to provide
   RDKit?

- dan nealschneider
(né wandschneider)
Senior Developer
Schr*ö*dinger, Inc
Portland, OR

cis_imine.sdf
Description: Binary data

"""
Demonstrate my questions about bonds whose stereochemistry is specified
based on a hydrogen, especially when that hydrogen is made implicit.
"""
import rdkit
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import rdmolops

has_h = next(Chem.SDMolSupplier('cis_imine.sdf', removeHs=False))


def summarize(mol, a0=0, a1=1):
    bond = mol.GetBondBetweenAtoms(a0, a1)
    atoms = list(bond.GetStereoAtoms())
    atoms.insert(1, bond.GetEndAtom().GetIdx())
    atoms.insert(1, bond.GetBeginAtom().GetIdx())
    print(Chem.MolToSmiles(mol, isomericSmiles=True))
    print(bond.GetStereo(), atoms)


no_h = rdmolops.RemoveHs(has_h)
has_h_again = rdmolops.AddHs(no_h)

print(rdkit.__version__)
summarize(has_h)
summarize(no_h)
summarize(has_h_again)

AllChem.EmbedMolecule(has_h)
AllChem.EmbedMolecule(no_h)
# This generates a SEGV in my hands. Totalview says it happened in
# _ZN5RDKit12DGeomHelpers14_getAtomStereoEPKNS_4BondEjj, but I
# can't find a getAtomStereo or 2DGeomHelpers in RDKit's github.
AllChem.EmbedMolecule(has_h_again)