Dear Markus,

On Tue, Jan 12, 2010 at 10:28 AM, markus kossner <[email protected]> wrote:
>
> I recently played around with the rdkit and got an interesting behavior when
> reading sd-files from different vendors:
> As I scrolled through the mailing list I found the thread by Marshall
> Levesque ( [Rdkit-discuss] preservation of stereocenters in 2D->3D
> generation)
> who posted Problems with Chiral Centers after embedding when reading the
> mols from sd.
> I had some problems preserving chirality also, that were solved if I just
> did the 'Chem.AssignAtomChiralTagsFromStructure(mol)'
> that I found in the code snippets of a previous thread. {cf.
> '[Rdkit-discuss] preservation of stereocenters in 2D->3D generation'
> -> Problems with Chiral Centers after embedding when reading the mols from
> sd}
>
> As I regard sdfs as real beasts in terms of conventions/definitions I
> checked the rdkit for reading sdfs from different vendors and found
> differences in the two
> sd files I attached. Ala.sdf is just the two respective enantiomers of
> Alanin drawn in Chemaxons msketch, the zinc.sdf are two respective
> enantiomers of the same molecule as downloaded from Zinc.
> Obviously the mols read from Chemaxon files are directly interpreted as the
> respective enantiomers, whereas we need the
> Chem.AssignAtomChiralTagsFromStructure(mol) for the 'zinc-style' sdfs. As I
> said rdkit reliably yields the right chiral smiles
> if we just do not forget this step, does it?
> Here's what I did:
>
> >>> from rdkit.Chem import AllChem as Chem
> >>> ala=Chem.SDMolSupplier('Ala.sdf')
> >>> for mol in ala:
> ...     print Chem.MolToSmiles(mol,isomericSmiles=True)
> ...
> C[C@H](N)C(=O)O
> C[C@@H](N)C(=O)O
> >>> zinc=Chem.SDMolSupplier('zinc.sdf')
> >>> for mol in zinc:
> ...     print Chem.MolToSmiles(mol,isomericSmiles=True)
> ...
> CCC1C(=O)N=C(SCC(NCc2ccccc2)=O)NC1=O
> CCC1C(=O)N=C(SCC(NCc2ccccc2)=O)NC1=O
> >>> for mol in zinc:
> ...     Chem.AssignAtomChiralTagsFromStructure(mol)
> ...     print Chem.MolToSmiles(mol,isomericSmiles=True)
> ...
> CC[C@@H]1C(=O)N=C(SCC(NCc2ccccc2)=O)NC1=O
> CC[C@H]1C(=O)N=C(SCC(NCc2ccccc2)=O)NC1=O

I'm not completely sure what your question is, but I guess you're
looking for an explanation of the behavior you've observed. I'll try to
provide one.
When the RDKit reads a molecule from a CTAB (the molecule part of an
SD file or mol block), the only pieces of information it has directly
available for assigning chiral tags to atoms are the bond tags in the
CTAB. Here's part of one of the CTABs from your alanine example:
-------------------
Marvin 01121009242D
6 5 0 0 0 0 999 V2000
-1.6021 0.5304 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.8876 0.1179 0.0000 C 0 0 2 0 0 0 0 0 0 0 0 0
-0.8877 -0.7071 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
-0.1730 1.3553 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
-0.1731 0.5303 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.5414 0.1178 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
2 1 1 1 0 0 0
3 2 1 0 0 0 0
2 5 1 0 0 0 0
5 4 2 0 0 0 0
5 6 1 0 0 0 0
M END
-------------------
Take a look at the first line of the bond block:
2 1 1 1 0 0 0
This tells the code that it's a single bond between atoms 2 and 1 and
that it is wedged upwards (that's the last "1").
The RDKit can use that wedging information plus the 2D coordinates of
the other atoms around atom 2 to assign a chiral tag to the atom.
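
To make the idea concrete, here is a minimal sketch in plain Python of how a wedge flag and 2D coordinates can be combined into a handedness. It is not the RDKit's implementation: the helper names (parse_bond_line, handedness_from_wedge) are hypothetical, the bond line is assumed to use the standard fixed-width V2000 layout of 3-character fields, and the returned sign is an arbitrary illustrative convention, not an RDKit chiral tag or an R/S label. The coordinates are the ones from the alanine CTAB above.

```python
# Illustrative sketch only: these helpers are hypothetical, not RDKit code.

# V2000 bond-stereo codes for single bonds: 1 = wedge up, 6 = wedge down.
# The wedged neighbour is lifted out of the plane by an arbitrary +/-1.
WEDGE_Z = {1: +1.0, 6: -1.0}


def parse_bond_line(line):
    """Return (begin_atom, end_atom, bond_type, stereo) from a V2000 bond line.

    Assumes fixed-width 3-character fields; blank fields are read as 0.
    """
    padded = line.ljust(12)
    return tuple(int(padded[i:i + 3].strip() or 0) for i in range(0, 12, 3))


def handedness_from_wedge(center, in_plane_a, in_plane_b, stereo):
    """Return +1 or -1 for the two possible arrangements (0 if degenerate).

    The sign is that of the triple product of the three neighbour vectors
    (wedged, in_plane_a, in_plane_b). Because the two unwedged neighbours
    have z = 0, only the z offset of the wedged neighbour contributes.
    """
    z = WEDGE_Z.get(stereo)
    if z is None:
        raise ValueError("bond is not wedged up or down")
    bx, by = in_plane_a[0] - center[0], in_plane_a[1] - center[1]
    cx, cy = in_plane_b[0] - center[0], in_plane_b[1] - center[1]
    volume = z * (bx * cy - by * cx)
    return (volume > 0) - (volume < 0)


# 2D coordinates of atoms 1, 2, 3 and 5 from the alanine CTAB.
coords = {
    1: (-1.6021, 0.5304),
    2: (-0.8876, 0.1179),
    3: (-0.8877, -0.7071),
    5: (-0.1731, 0.5303),
}

begin, end, bond_type, stereo = parse_bond_line("  2  1  1  1  0  0  0")
sign_up = handedness_from_wedge(coords[begin], coords[3], coords[5], stereo)
# Same drawing, but with the bond to atom 1 wedged down instead of up.
sign_down = handedness_from_wedge(coords[begin], coords[3], coords[5], 6)
print(sign_up, sign_down)
```

Flipping the wedge from up to down flips the sign, which is why the two drawings in Ala.sdf come out as different enantiomers. Without a wedge there is nothing in a flat drawing to break the tie.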
3D SD files are different: there the bond wedging is harder to
interpret and many pieces of software don't write it at all. Luckily,
in 3D SD files you have the coordinates of the atoms themselves. These
let the RDKit assign stereochemistry, which is what
"Chem.AssignAtomChiralTagsFromStructure" does.
The fact that the SD file parser does not automatically do this from the
3D coordinates is a design decision: rather than automatically marking
every possible chiral center with the observed chirality every time a
3D CTAB is read, the RDKit leaves it to the user to decide when that
is appropriate.
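
For the curious, the geometric idea behind reading stereochemistry off 3D coordinates can be sketched in a few lines of plain Python. This is only an illustration, not what the RDKit actually runs: signed_volume is a hypothetical helper and the coordinates are made up. The sign of the volume spanned by three neighbours of a center tells the two mirror images apart, and it also depends on the order in which the neighbours are listed.

```python
# Illustrative sketch only: signed_volume is a hypothetical helper and the
# coordinates below are invented, idealised positions.

def signed_volume(center, a, b, c):
    """Triple product (a - center) . ((b - center) x (c - center))."""
    ax, ay, az = (a[i] - center[i] for i in range(3))
    bx, by, bz = (b[i] - center[i] for i in range(3))
    cx, cy, cz = (c[i] - center[i] for i in range(3))
    return (ax * (by * cz - bz * cy)
            + ay * (bz * cx - bx * cz)
            + az * (bx * cy - by * cx))


def mirror(point):
    """Reflect a point through the xy plane."""
    return (point[0], point[1], -point[2])


center = (0, 0, 0)
n1, n2, n3 = (1, 1, 1), (1, -1, -1), (-1, 1, -1)

original = signed_volume(center, n1, n2, n3)
mirrored = signed_volume(mirror(center), mirror(n1), mirror(n2), mirror(n3))
swapped = signed_volume(center, n2, n1, n3)  # two neighbours exchanged
print(original, mirrored, swapped)
```

The mirror image gives the opposite sign, and so does exchanging two neighbours, so a handedness derived this way only means something relative to a fixed ordering of the atoms around the center.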
Does this clear things up?
> P.S. sorry for not posting to rdkit-discuss directly but I wasn't sure about
> the attachments...
Small attachments like this one (which I've included in my reply so
that others can see it) are fine.
Best Regards,
-greg
ChiralSdfs.tar.gz
Description: GNU Zip compressed data

