Hi Noel,

Thanks for the pointer to your blog post ... it explains the issue well.
I'll address the topic here, but let me know if it would be better to post
on your blog for completeness or on the OB list for wider distribution.

My overall answer to this whole question is that it's always a mistake to
lose information -- particularly in a toolkit like OpenBabel.  The primary
*raison d'etre* of OpenBabel is to communicate between different file
formats with the greatest fidelity possible.  With this change, we have a
situation where a round-trip between two formats loses critical molecular
information where previously it didn't.

I see it as more of a pragmatic question than anything else.  There is a
way to keep the information, so why not do it?

The origin of this problem is the age-old complaint that the SD File Format
has both ambiguity and redundancy.  Each developer interprets the spec
differently and chaos results. My philosophy has always been to err on the
side of too much information rather than just enough or too little. When a
stereo center is present, mark it every way possible.  When a cis/trans
bond is present, use both the 2D coordinates and the bond labels.

From your blog:
> My current understanding is that where 3D coordinates are present, there's
> no need to store stereochemical information in either the atom parity or
> the bond block. I think I'll probably set the atom parity anyway (since
> I've already written the code, and it helps when you look at the file to
> be able to easily identify the chiral centers).

There are three reasons why you should store stereo information everywhere.

First, because there's no reason not to (what's the harm?).

Second, it's often used to designate partially-known stereochemistry.  It's
common for a molecule to have both known and unknown stereo centers.
SMILES handles this because each stereo center is specified independently.
People often will generate 3D coordinates for a molecule even though they
don't know each stereo center -- they just arbitrarily pick a configuration
for the unknown centers.  By marking some centers' parity bits or up/down
bonds and leaving others out, you can make it clear that the
stereochemistry is partially known.  (It would be nice if this were written
into the CTFile specification.)

And third, there are applications out there that rely on the atom parity
and bond blocks to specify chirality.  It's a bit of work to do the
geometry to deduce stereochemistry from 3D coordinates, so many apps just
count on the atom-parity bit or bond block.  My recollection is that
Daylight's SDF-to-SMILES conversion programs used the atom parity and bond
up/down flags if they could, and only used the 3D geometry as a last resort.
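
As a rough illustration of the geometry involved (a sketch only, not the
code any particular toolkit uses): the handedness of a tetrahedral center
can be read off the sign of a triple product of three neighbor vectors.
The function names are made up for this example, and a real implementation
also has to order the neighbors consistently (the CTFile parity is defined
in terms of atom numbering) and cope with implicit hydrogens.

```python
def signed_volume(center, a, b, c):
    """Triple product u . (v x w) of the vectors from the center to a, b, c."""
    u = [a[i] - center[i] for i in range(3)]
    v = [b[i] - center[i] for i in range(3)]
    w = [c[i] - center[i] for i in range(3)]
    # Cross product v x w, written out component by component.
    cross = (
        v[1] * w[2] - v[2] * w[1],
        v[2] * w[0] - v[0] * w[2],
        v[0] * w[1] - v[1] * w[0],
    )
    return u[0] * cross[0] + u[1] * cross[1] + u[2] * cross[2]


def handedness(center, a, b, c, tol=1e-6):
    """Classify a center by the sign of the volume; mirror images differ in sign."""
    vol = signed_volume(center, a, b, c)
    if abs(vol) < tol:
        # Flat arrangement, or 0D coordinates where every atom is at the origin.
        return "undefined"
    return "positive" if vol > 0 else "negative"


origin = (0.0, 0.0, 0.0)
print(handedness(origin, (1, 0, 0), (0, 1, 0), (0, 0, 1)))   # positive
print(handedness(origin, (1, 0, 0), (0, 1, 0), (0, 0, -1)))  # negative (mirror image)
print(handedness(origin, origin, origin, origin))            # undefined (0D)
```

With 0D coordinates every atom sits at the origin, the volume is zero, and
nothing can be deduced, so explicit flags are the only option there.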

> For 2D coordinates, there's no need to store the bond stereochemistry (as
> this can be worked out from the coordinates), but chirality needs to be
> stored explicitly. The normal way to store this is not using atom parity
> (but I'll set this anyway for the same reasons as above), but by setting
> one of the bonds on the tetrahedral center to up or down.

This is true in theory but useless in practice.  The first argument above
("what's the harm?") applies here too.  But more importantly, most molecule
editors and 2D generators (including OpenBabel!) will use 120-degree bonds
on every double bond they draw or lay out.  And in almost all cases, by
default they draw the trans configuration.  In real life, a chemist will
often draw a double bond in the trans configuration without actually
knowing (or caring) whether it's cis or trans.

And like the 3D information, it's often the case that one double-bond's
configuration is known while another's is not.  If you assume that you can
derive the cis/trans configuration from the 2D coordinates, then there's no
way to represent the information in "CC=CC/C=CC/".  On the other hand, by
using the up/down bond flags, you can represent this molecule correctly.
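
To make the limitation concrete, here is a rough sketch of the usual
"which side of the bond" test for a double bond b=c with one substituent a
on b and one substituent d on c.  The function names are invented for the
example and this is not OpenBabel's implementation.  The thing to notice is
that for any laid-out bond it can only ever answer cis or trans; the
coordinates have no way of saying "unknown".

```python
def cross_z(o, p, q):
    """z component of (p - o) x (q - o) for 2D points."""
    return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])


def cis_trans_from_2d(a, b, c, d, tol=1e-6):
    """Classify substituents a (on b) and d (on c) around the double bond b=c."""
    side_a = cross_z(b, c, a)  # which side of the b->c axis a lies on
    side_d = cross_z(b, c, d)  # which side of the b->c axis d lies on
    if abs(side_a) < tol or abs(side_d) < tol:
        # Collinear atoms, or 0D coordinates with everything at the origin.
        return "undefined"
    return "cis" if side_a * side_d > 0 else "trans"


# A typical 120-degree zigzag layout of but-2-ene.
print(cis_trans_from_2d((-0.5, 0.866), (0, 0), (1, 0), (1.5, -0.866)))  # trans
print(cis_trans_from_2d((-0.5, 0.866), (0, 0), (1, 0), (1.5, 0.866)))   # cis
```

A default zigzag layout therefore reads as trans whether or not the chemist
actually knew the configuration.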

> For 0D coordinates, there are no guidelines. I propose to store cis/trans
> stereo using the bond stereo (you know, UP [or DOWN] at both ends of a
> double bond means cis),

But right now OpenBabel isn't even doing this.  It's just discarding the
cis/trans information.
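
The difference is easy to check mechanically.  The sketch below assumes the
V2000 layout of fixed three-character fields (first atom, second atom, bond
type, bond stereo) and uses the bond lines from the 2.3.x and 2.2.x outputs
quoted further down in this thread, re-aligned to those columns.  The
function names are hypothetical, and how other readers interpret these
flags in a 0D file is of course the open question.

```python
def parse_bond_line(line):
    """Split one V2000 bond line into (atom1, atom2, order, stereo)."""
    # Four fixed three-character fields; a blank field counts as 0.
    fields = [line[i:i + 3].strip() for i in range(0, 12, 3)]
    atom1, atom2, order, stereo = (int(f) if f else 0 for f in fields)
    return atom1, atom2, order, stereo


def has_stereo_marks(bond_lines):
    """True if any bond in the block carries a non-zero stereo field."""
    return any(parse_bond_line(line)[3] != 0 for line in bond_lines)


# Bond blocks for C/C=C/C as quoted in this thread.
OB_23X_BONDS = [
    "  1  2  1  0  0  0  0",
    "  2  3  2  0  0  0  0",
    "  3  4  1  0  0  0  0",
]
OB_22X_BONDS = [
    "  1  2  1  1  0  0",
    "  2  3  2  3  0  0",
    "  3  4  1  6  0  0",
]

print(has_stereo_marks(OB_23X_BONDS))  # False
print(has_stereo_marks(OB_22X_BONDS))  # True
```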

> and chirality using the atom parity. The MDL spec states that atom
> parity should be ignored when read, but the alternative is to just forget
> the stereochemistry, or else to store both cis/trans stereo *and*
> chirality in the bond block, which may just about be possible but is
> likely to be a real mess.

Here again, I'd argue for putting the information everywhere possible for
reasons of portability. The CTFile spec, combined with various heroic
attempts to work around its shortcomings, means that for every possible
choice of how to write the chirality there's at least one app that does it
that way.  If OpenBabel can write correct SD Files that put redundant but
consistent chiral specifications (i.e. use 3D, atom parity and bond flags),
then why not?

Here's a more pragmatic argument.  In OB 2.3.1, the only way to get a
correct round-trip SMILES-SDF-SMILES generation is to use --gen2D.  That
requires a very expensive and unnecessary *ab initio* calculation of 2D
coordinates.  For many real molecules, generating 2D coordinates can be 10x
or 100x slower than merely parsing the molecule ... and it was completely
unnecessary in OB 2.2.x.
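
A cheap guard for the round trip discussed above, as a sketch only: compare
the directional bond marks ("/" and "\") in the SMILES going in with those
coming back out.  The helper names are invented for the example, and the
check is deliberately conservative: it flags only a total loss of cis/trans
marks, not a partial one, and it knows nothing about tetrahedral centers.

```python
def directional_bonds(smiles):
    """Count the '/' and '\\' directional bond marks in a SMILES string."""
    return smiles.count("/") + smiles.count("\\")


def loses_cis_trans(smiles_in, smiles_out):
    """True if the input had cis/trans marks and the round-trip output has none."""
    return directional_bonds(smiles_in) > 0 and directional_bonds(smiles_out) == 0


# The two round-trip results reported in this thread.
print(loses_cis_trans("C/C=C/C", "CC=CC"))    # True  (the 2.3.x result)
print(loses_cis_trans("C/C=C/C", "C/C=C/C"))  # False (the 2.2.x result)
```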

And more to the point, this is a showstopper for us.  In our experience,
most pharmaceutical researchers use SMILES for molecular modeling,
diversity analysis, toxicology analysis and so forth. Once they decide what
to buy, they may send us the SMILES, or may send us SD Files. These files
can range from a few compounds to hundreds of thousands of compounds.  It
would be a disaster if the cis/trans information were lost at the end of
this time-consuming analysis just because they (or we) converted their
SMILES to SDF format using OpenBabel before buying the compounds.

Since I know about this problem, eMolecules can exercise diligence and
never do a SMILES-to-SDF conversion.  But customers might not be aware of
this restriction -- they use OpenBabel because it is known to be good at
file-format conversion.  It would be really unpleasant for us to have to
explain to a customer that they'd ordered hundreds of incorrect compounds
because OpenBabel doesn't handle cis/trans the way you'd expect.

Thanks,
Craig


On Sat, May 12, 2012 at 6:37 AM, Noel O'Boyle <baoille...@gmail.com> wrote:

> Sorry - I got things backward. It's storing the cis/trans stereochemistry
> in a 0D format that's the problem. See the post and comments at
> http://baoilleach.blogspot.com/2010/02/how-to-store-stereochemistry-in-mol.html
>
> - Noel
>
> On 12 May 2012 13:52, Noel O'Boyle <baoille...@gmail.com> wrote:
>
>> It's intentional, rather than a bug. I originally had some code in there
>> to support stereo in 0D SDF, but the format really doesn't support this
>> officially - it's supposed to be either 2D or 3D. It's all very well for
>> cis/trans, but it's not possible to store tet stereo without coordinates
>> (which aren't present in 0D) or tet parities (which the spec explicitly
>> says to ignore on reading).
>>
>> In short, we could support this, but Open Babel would be the only
>> software to do so, and these 0D SDF files would not be handled correctly by
>> others...
>>
>> In short, if you use --gen2d or --gen3d it will work fine.
>>
>> - Noel
>>
>> On 11 May 2012 23:48, Craig James <cja...@emolecules.com> wrote:
>>
>>>  This looks bad:
>>>
>>>    echo "C/C=C/C" | babel -i smi -o sdf | babel -i sdf -o can
>>>    CC=CC
>>>
>>> Notice the cis/trans bonds are lost.  In OB 2.2.x, it works correctly:
>>>
>>>    echo "C/C=C/C" | babel -i smi -o sdf | babel -i sdf -o can
>>>     C/C=C/C
>>>
>>> The problem seems to be here in 2.3.x:
>>>
>>>    echo "C/C=C/C" | babel -i smi -o sdf
>>>
>>>     OpenBabel05111215342D
>>>
>>>      4  3  0  0  0  0  0  0  0  0999 V2000
>>>        0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>        0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>        0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>        0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>      1  2  1  0  0  0  0
>>>      2  3  2  0  0  0  0
>>>      3  4  1  0  0  0  0
>>>    M  END
>>>    $$$$
>>>
>>> Notice that the bond block has no stereo (cis/trans) markings.  Do the
>>> same thing in 2.2.x and the cis/trans bonds are properly marked:
>>>
>>>     echo "C/C=C/C" | babel -i smi -o sdf
>>>
>>>      OpenBabel05111215352D
>>>
>>>       4  3  0  0  0  0  0  0  0  0999 V2000
>>>         0.0000    0.0000    0.0000 C   0  0  0  0  0
>>>         0.0000    0.0000    0.0000 C   0  0  0  0  0
>>>         0.0000    0.0000    0.0000 C   0  0  0  0  0
>>>         0.0000    0.0000    0.0000 C   0  0  0  0  0
>>>       1  2  1  1  0  0
>>>       2  3  2  3  0  0
>>>       3  4  1  6  0  0
>>>     M  END
>>>
>>>     $$$$
>>>
>>> The bond block is correct here in this output from 2.2.x.
>>>
>>> Any ideas when this might have happened and if it was intentional?
>>>
>>> Thanks,
>>> Craig
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Live Security Virtual Conference
>>> Exclusive live event will cover all the ways today's security and
>>> threat landscape has changed and how IT managers can respond. Discussions
>>> will include endpoint security, mobile security and the latest in malware
>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>> _______________________________________________
>>> OpenBabel-Devel mailing list
>>> OpenBabel-Devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/openbabel-devel
>>>
>>>
>>
>