I'm working on a translation layer between Schrodinger structures and RDKit mols. Schrodinger structures do not have implicit hydrogens, so I'm struggling a bit to understand how best to treat potentially implicit hydrogens!
What is the correct treatment of bond stereochemistry at centers where a hydrogen is required to specify that stereochemistry? For example, an imine with a hydrogen substituent (trivial example: F/C=N/[H]). I notice that when I use the SMILES constructor, or read from an SDF file using the SDMolSupplier, the C=N bond in the example above is not recognized as having stereochemistry. However, if I use removeHs=False in the SDMolSupplier, the bond *is* recognized as Z.

Maybe that can be presented more clearly as code (here's an interactive Python shell session; I've also attached this as a script, along with an SDF file).

Python 3.6.2 (default, Jul 21 2017, 13:21:26)
[GCC 4.9.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import rdkit
>>> print(rdkit.__version__)
2017.03.1
>>> from rdkit import Chem
>>> from rdkit.Chem import AllChem
>>> from rdkit.Chem import rdmolops
>>> def summarize(mol):
...     bond = mol.GetBondBetweenAtoms(0, 1)
...     atoms = list(bond.GetStereoAtoms())
...     atoms.insert(1, bond.GetEndAtom().GetIdx())
...     atoms.insert(1, bond.GetBeginAtom().GetIdx())
...     print(Chem.MolToSmiles(mol, isomericSmiles=True))
...     print(bond.GetStereo(), atoms)
...
>>> has_h = next(Chem.SDMolSupplier('cis_imine.sdf', removeHs=False))
>>> no_h = rdmolops.RemoveHs(has_h)
>>> has_h_again = rdmolops.AddHs(no_h)
>>> summarize(has_h)
[H]/N=C(/[H])F
STEREOZ [3, 0, 1, 2]
>>> summarize(no_h)
N=CF
STEREOZ [1, 0]
>>> summarize(has_h_again)
[H]N=C([H])F
STEREOZ [1, 0]
>>> AllChem.EmbedMolecule(has_h)
0
>>> AllChem.EmbedMolecule(no_h)
0
>>> AllChem.EmbedMolecule(has_h_again)
Fatal Python error: Segmentation fault

Current thread 0x00007faa949d8740 (most recent call first):
  File "<stdin>", line 1 in <module>
Segmentation fault

*At core, I have 2 questions:*

1. Is RDKit able to represent stereochemistry about this bond if the hydrogen is implicit? It's fine if not, I just want to know.
2. If RDKit can represent stereochemistry for bonds for which one substituent is hydrogen, what different information do I need to provide RDKit?

- dan nealschneider
(né wandschneider)
Senior Developer
Schr*ö*dinger, Inc
Portland, OR
cis_imine.sdf
Description: Binary data

"""
Demonstrate my questions about bonds whose stereochemistry is specified
based on a hydrogen, especially when that hydrogen is made implicit.
"""

import rdkit
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import rdmolops

has_h = next(Chem.SDMolSupplier('cis_imine.sdf', removeHs=False))


def summarize(mol, a0=0, a1=1):
    bond = mol.GetBondBetweenAtoms(a0, a1)
    atoms = list(bond.GetStereoAtoms())
    atoms.insert(1, bond.GetEndAtom().GetIdx())
    atoms.insert(1, bond.GetBeginAtom().GetIdx())
    print(Chem.MolToSmiles(mol, isomericSmiles=True))
    print(bond.GetStereo(), atoms)


no_h = rdmolops.RemoveHs(has_h)
has_h_again = rdmolops.AddHs(no_h)

print(rdkit.__version__)
summarize(has_h)
summarize(no_h)
summarize(has_h_again)

AllChem.EmbedMolecule(has_h)
AllChem.EmbedMolecule(no_h)
# This generates a SEGV in my hands. Totalview says it happened in
# _ZN5RDKit12DGeomHelpers14_getAtomStereoEPKNS_4BondEjj, but I
# can't find a getAtomStereo or 2DGeomHelpers in RDKit's github.
AllChem.EmbedMolecule(has_h_again)
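A side note on the mangled symbol quoted in the script's final comment: under the Itanium C++ ABI (the scheme GCC uses), the digits inside a mangled name are length prefixes, so `12DGeomHelpers` reads as the 12-character name `DGeomHelpers` rather than `2DGeomHelpers`. The sketch below is only an illustration of that reading. It assumes the simple nested-name form `_ZN<len><name>...E` and ignores everything after the names; `split_nested_name` is a hypothetical helper written for this note, not part of RDKit or any library.

```python
import re


def split_nested_name(mangled):
    """Return the name components of an Itanium-ABI nested name (_ZN...E).

    Only plain <length><identifier> components are handled; parameter
    types, templates and substitutions after the names are ignored.
    """
    if not mangled.startswith("_ZN"):
        raise ValueError("not a nested Itanium-mangled name")
    pos = 3  # skip the "_ZN" prefix
    parts = []
    while pos < len(mangled) and mangled[pos].isdigit():
        # Read the decimal length prefix, then take that many characters.
        match = re.match(r"\d+", mangled[pos:])
        length = int(match.group())
        start = pos + match.end()
        parts.append(mangled[start:start + length])
        pos = start + length
    return parts


symbol = "_ZN5RDKit12DGeomHelpers14_getAtomStereoEPKNS_4BondEjj"
print("::".join(split_nested_name(symbol)))
# prints RDKit::DGeomHelpers::_getAtomStereo
```

Read this way, the names to search for would be the `DGeomHelpers` namespace and a function called `_getAtomStereo` (with the leading underscore). The trailing `PKNS_4BondEjj` encodes the parameter types, a pointer to a const `Bond` and two unsigned ints; a full demangler such as `c++filt` decodes those as well.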
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss