Re: [BlueObelisk-discuss] Stereochemistry in MDL files

2010-03-03 Thread Greg Landrum
On Wed, Mar 3, 2010 at 4:00 PM, Noel O'Boyle  wrote:
> On 3 March 2010 14:48, Craig James  wrote:
>> Noel O'Boyle wrote:
>>>
>>> Are some of the wedge/hash bonds in typical MOL files unrelated to
>>> stereochemistry? That is, are some purely for depiction? If I knew
>>> this for sure, I would not retain the wedge/hash bond designations in
>>> the input but just work them out from the perceived stereo.
>>
>> YES.  Lots of them.  We see this all the time - people use wedge/hash to do
>> pseudo-perspective drawings.  This is particularly common with metals.
>>
>> http://www.emolecules.com/image?db=549&id=17252456&width=400&height=400
>> http://www.emolecules.com/image?db=549&id=718320&width=400&height=400
>>
>> But I also see it all the time with organic molecules, particularly
>> structures that are hard to draw in 2D (semi-cage ring systems that won't
>> lay flat).  People use hashes and wedges to try to make them look nice, that
>> have nothing to do with stereochemistry.
>
> So...should we retain them or not? I think what I'll do is add an
> option to allow users to retain them exactly. However, the default
> will be that the wedges/hashes in the output will be solely dependent
> on the perceived stereochemistry. *Sigh* This applies to all 2D output
> formats.

Having the option to retain them exactly sounds sensible, but that
means retaining all of the user-provided markings, right? This almost
sounds to me like it's a read setting, not a write one. But then I'm
not familiar with the internal flow for processing mols in OB.

-greg
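
One way to picture why retention looks like a read-time decision is to sort the input wedges as soon as stereo has been perceived. The sketch below is only an illustration: `split_wedges`, the `(begin, end, stereo)` tuple layout and the `stereocentres` set are hypothetical and are not Open Babel or RDKit API. It assumes the V2000 convention that the narrow end of a wedge sits on the first atom of the bond.

```python
def split_wedges(bonds, stereocentres):
    """Separate wedge/hash bonds into stereo-relevant and depiction-only.

    bonds is an iterable of (begin, end, stereo) tuples; stereocentres is the
    set of atom indices that stereo perception flagged (both hypothetical).
    """
    stereo, depiction = [], []
    for begin, end, flag in bonds:
        if flag not in (1, 6):      # 1 = up (wedge), 6 = down (hash)
            continue
        # The narrow end of a wedge sits on the first atom of the bond.
        target = stereo if begin in stereocentres else depiction
        target.append((begin, end, flag))
    return stereo, depiction


# Atom 1 is a perceived stereocentre; atom 3 is not.
relevant, cosmetic = split_wedges([(1, 2, 1), (3, 4, 6), (5, 6, 0)], {1})
print(relevant, cosmetic)
```

A writer that recomputes wedges from perceived stereo would discard the second list; a "retain exactly" option would need both lists kept from the moment the file is read.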

--
Download Intel® Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
Blueobelisk-discuss mailing list
Blueobelisk-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss


Re: [BlueObelisk-discuss] Stereochemistry in MDL files

2010-03-03 Thread Craig James
Noel O'Boyle wrote:
> Are some of the wedge/hash bonds in typical MOL files unrelated to
> stereochemistry? That is, are some purely for depiction? If I knew
> this for sure, I would not retain the wedge/hash bond designations in
> the input but just work them out from the perceived stereo.

YES.  Lots of them.  We see this all the time - people use wedge/hash to do 
pseudo-perspective drawings.  This is particularly common with metals.

http://www.emolecules.com/image?db=549&id=17252456&width=400&height=400
http://www.emolecules.com/image?db=549&id=718320&width=400&height=400

But I also see it all the time with organic molecules, particularly structures 
that are hard to draw in 2D (semi-cage ring systems that won't lay flat).  
People use hashes and wedges to try to make them look nice, that have nothing 
to do with stereochemistry.

Craig



Re: [BlueObelisk-discuss] Stereochemistry in MDL files

2010-03-03 Thread Noel O'Boyle
On 3 March 2010 14:48, Craig James  wrote:
> Noel O'Boyle wrote:
>>
>> Are some of the wedge/hash bonds in typical MOL files unrelated to
>> stereochemistry? That is, are some purely for depiction? If I knew
>> this for sure, I would not retain the wedge/hash bond designations in
>> the input but just work them out from the perceived stereo.
>
> YES.  Lots of them.  We see this all the time - people use wedge/hash to do
> pseudo-perspective drawings.  This is particularly common with metals.
>
> http://www.emolecules.com/image?db=549&id=17252456&width=400&height=400
> http://www.emolecules.com/image?db=549&id=718320&width=400&height=400
>
> But I also see it all the time with organic molecules, particularly
> structures that are hard to draw in 2D (semi-cage ring systems that won't
> lay flat).  People use hashes and wedges to try to make them look nice, that
> have nothing to do with stereochemistry.

So...should we retain them or not? I think what I'll do is add an
option to allow users to retain them exactly. However, the default
will be that the wedges/hashes in the output will be solely dependent
on the perceived stereochemistry. *Sigh* This applies to all 2D output
formats.

> Craig
>



Re: [BlueObelisk-discuss] Stereochemistry in MDL files

2010-03-03 Thread Peter Murray-Rust
On Wed, Mar 3, 2010 at 9:29 AM, Noel O'Boyle  wrote:

> On 2 March 2010 11:23, Greg Landrum  wrote:
>
>
> Are some of the wedge/hash bonds in typical MOL files unrelated to
> stereochemistry? That is, are some purely for depiction? If I knew
> this for sure, I would not retain the wedge/hash bond designations in
> the input but just work them out from the perceived stereo.
>
It is *possible* to write wedge/hash bonds for any bond in a V2000 file.
The terms used in CTFILE are "bond stereo" and "wedge". So presumably all
such fields are related to "bond stereo". We wrote a CML2SDF writer
(sponsored by MDL and I think meeting with their unofficial approval) and
there are no special checks as to when such a field can be written.

The only other formal questions are (a) can the current version of ISIS Draw
or similar program write out a wedge bond unrelated to stereochemistry and
(b) can the current program read such a field (that does not make
stereochemical sense) without flagging an error. If the answer to both of
these is "yes" then I suspect the answer is "yes, they can be purely for
depiction".

P.
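
To make the "bond stereo" field concrete, here is a minimal sketch of reading it from a V2000 bond line. It assumes the fixed-width layout of the CTfile bond block (3-character fields: first atom, second atom, bond type, bond stereo); `Bond` and `parse_bond_line` are names made up for this illustration. Nothing in the parse checks whether the first atom is a stereocentre, which is consistent with the observation above that the field can be written for any bond.

```python
from typing import NamedTuple


class Bond(NamedTuple):
    begin: int   # 1-based index of the first atom
    end: int     # 1-based index of the second atom
    order: int   # 1 = single, 2 = double, 3 = triple, 4 = aromatic
    stereo: int  # raw value of the bond stereo field


def parse_bond_line(line: str) -> Bond:
    """Split a V2000 bond line into its first four 3-character fields."""
    fields = [line[i:i + 3] for i in range(0, 12, 3)]
    # A blank or missing field is read as 0.
    begin, end, order, stereo = (int(f) if f.strip() else 0 for f in fields)
    return Bond(begin, end, order, stereo)


# Meaning of the stereo field for single bonds in the CTfile documentation.
SINGLE_BOND_STEREO = {0: "not stereo", 1: "up", 4: "either", 6: "down"}

bond = parse_bond_line("  1  2  1  1  0  0  0")
print(bond, SINGLE_BOND_STEREO[bond.stereo])
```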



> - Noel
>



-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069


Re: [BlueObelisk-discuss] Stereochemistry in MDL files

2010-03-03 Thread Noel O'Boyle
On 2 March 2010 11:23, Greg Landrum  wrote:
> Dear Noel,
>
> Thanks for the repost; this helps.
>
> My 2 cents are below.
>
> On Tue, Mar 2, 2010 at 11:34 AM, Noel O'Boyle  wrote:
>> On 2 March 2010 09:40, Peter Murray-Rust  wrote:
>>> Thanks,
>>> This is a useful initiative
>>>
>>> On Tue, Mar 2, 2010 at 9:14 AM, Noel O'Boyle  wrote:

>>>> (Reposted from my blog following Greg's suggestion)
>>>>
>>>> Hello all,
>>>>
>>>> Right now, I'm adding stereo (i.e. double bond stereochemistry, and
>>>> chirality) to the MDL Mol format in OpenBabel. There are three places
>>>> where stereochemical information can be stored in these files: the
>>>> coordinates, the atom parity (in the atom block), the bond stereo (in
>>>> the bond block).
>>>>
>>>> My current understanding is that where 3D coordinates are present,
>>>> there's no need to store stereochemical information in either the atom
>>>> parity or the bond block. I think I'll probably set the atom parity
>>>> anyway (since I've already written the code, and it helps when you
>>>> look at the file to be able to easily identify the chiral centers).
>
> Agreed that setting parity is a useful service to human readers but,
> as is already mentioned below, the spec is quite clear that these
> flags should be ignored on read.
>

>>>
>>> The main problem is lack of information as to whether the geometry (2D or
>>> 3D) is definitive or arbitrary. It is impossible to construct a 3D model of
>>> (say) alanine without a perceived stereochemistry at the Carbon. Similarly
>>> most modern 2D graphic programs will draw a double bond as cis or trans (not
>>> normally linear although this was common in typesetting). If the (arbitrary)
>>> geometry is then transmitted without details of authoring, then the reader
>>> may assume a definitive stereochemistry. Put another way, there is no way of
>>> indicating by coordinates alone that stereochemistry is unknown. I think
>>> it's very important not to use the geometry as definitive unless it is clear
>>> that the author specified it (which normally only comes from crystal
>>> structures or computational chemistry).
>>
>> Sure, but I think this is outside the scope here.
>
> I'm not sure I agree. I think this is one of the critical points when
> doing CTABs: when writing 3D or 2D coordinates how do you indicate
> what you *don't* know as well as indicating what you *do* know.
>
> In 2D (and 3D) the problem is stereochemistry around double bonds: the
> coordinates provided in the output determine the stereochemistry.
> Luckily here the CTAB spec provides a way to indicate what isn't
> known: you use the 4th field in the bond line to indicate that the
> bond is an "either" bond (value 3). Technically this is what should be
> done by any toolkit that builds a molecule from the SMILES CC=CC.
>
> With atomic stereochemistry in 3D structures, the coordinates again
> determine the stereochemistry. As far as I know, the CTAB spec doesn't
> provide specific guidance about what to do when you have a
> stereocenter that's undetermined in your molecule. One possibility is
> to make sure that the bonds from that atom have 0 in field 4. Maybe
> it's "polite" to assign an either bond here as well (value 4 in this
> case) to make explicit to the viewer that the stereochemistry isn't
> known. But either of these raise the question of what to do if you
> *do* know the stereochemistry. My opinion here, and I'm aware it's one
> that many people do not share, is that it's best to treat the 3D case
> the same as the 2D one and use a wedged bond to mark atoms where the
> stereochemistry is known. It's somewhat ugly, but it has the advantage
> of being consistent (yes, yes, I know, when foolish it's the hobgoblin
> of little minds... but I don't think it's foolish here).
>
>>> P.
>>>

>>>> For 2D coordinates, there's no need to store the bond stereochemistry
>>>> (as this can be worked out from the coordinates), but chirality needs
>>>> to be stored explicitly. The normal way to store this is not using
>>>> atom parity (but I'll set this anyway for the same reasons as above),
>>>> but by setting one of the bonds on the tetrahedral center to up or
>>>> down.
>>>>
>>>> For 0D coordinates, there are no guidelines. I propose to store
>>>> cis/trans stereo using the bond stereo (you know, UP [or DOWN] at both
>>>> ends of a double bond means cis), and chirality using the atom parity.
>>>> The MDL spec states that atom parity should be ignored when read,
>>>
>>> I know this is the spec and I don't want to get into more arguments about
>>> whether it should be changed. At this stage I think it is useful if programs
>>> have the capability to read and interpret this field.
>>
>> I think that I may move this to an option. So, if you don't explicitly
>> ask for it, you will just get what the spec says - i.e. no
>> stereochemistry will be stored if there are no coordinates.
>
> This is what I would suggest. Anything else involves introducing
> conventions 

Re: [BlueObelisk-discuss] Stereochemistry in MDL files

2010-03-02 Thread Geoffrey Hutchison

On Mar 2, 2010, at 6:46 AM, Noel O'Boyle wrote:

>> nasty stuff... better to avoid stereochem in 0D files.
> 
> That's great Greg - you've obviously a lot of experience with this.
> Will do as you suggest across the board. At least we will have two
> toolkits behaving the same. I'll write up the specifics on our wiki
> and the BO wiki once I'm done.


What I've said in the past, is that SMILES -> SDF -> SMILES will unfortunately 
drop stereochem, unless 2D or 3D structure layout is performed in the SDF 
conversion. Of course, at the time I said that, Open Babel didn't have either 
3D or 2D generation capabilities.

Now, I think anytime someone posts a "bug" about this process, I'm going to 
suggest they do at least 2D layout. If so, obviously stereo is retained.

-Geoff
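
Whether stereo can survive a round trip through SDF therefore depends on whether the atom block carries real coordinates. A rough sketch of that check follows. It assumes V2000 atom lines, where x, y and z occupy three 10-character columns; `coordinate_dimension` is a hypothetical helper rather than an Open Babel function, and it is only a heuristic (a perfectly planar 3D structure would be reported as 2D).

```python
def coordinate_dimension(atom_lines):
    """Return 0, 2 or 3 depending on which coordinates are non-zero."""
    has_xy = has_z = False
    for line in atom_lines:
        # x, y and z occupy three 10-character columns at the start of the line.
        x, y, z = (float(line[i:i + 10]) for i in (0, 10, 20))
        has_xy = has_xy or x != 0.0 or y != 0.0
        has_z = has_z or z != 0.0
    if has_z:
        return 3
    return 2 if has_xy else 0


# Atom lines as they might look straight after a SMILES -> SDF conversion
# with no layout step: every coordinate is zero.
no_layout = [
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
]
print(coordinate_dimension(no_layout))
```

A result of 0 is the case described above: with no coordinates there is nothing in the file from which a reader that follows the spec can recover the stereochemistry.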


Re: [BlueObelisk-discuss] Stereochemistry in MDL files

2010-03-02 Thread Noel O'Boyle
On 2 March 2010 11:23, Greg Landrum  wrote:
> Dear Noel,
>
> Thanks for the repost; this helps.
>
> My 2 cents are below.
>
> On Tue, Mar 2, 2010 at 11:34 AM, Noel O'Boyle  wrote:
>> On 2 March 2010 09:40, Peter Murray-Rust  wrote:
>>> Thanks,
>>> This is a useful initiative
>>>
>>> On Tue, Mar 2, 2010 at 9:14 AM, Noel O'Boyle  wrote:

>>>> (Reposted from my blog following Greg's suggestion)
>>>>
>>>> Hello all,
>>>>
>>>> Right now, I'm adding stereo (i.e. double bond stereochemistry, and
>>>> chirality) to the MDL Mol format in OpenBabel. There are three places
>>>> where stereochemical information can be stored in these files: the
>>>> coordinates, the atom parity (in the atom block), the bond stereo (in
>>>> the bond block).
>>>>
>>>> My current understanding is that where 3D coordinates are present,
>>>> there's no need to store stereochemical information in either the atom
>>>> parity or the bond block. I think I'll probably set the atom parity
>>>> anyway (since I've already written the code, and it helps when you
>>>> look at the file to be able to easily identify the chiral centers).
>
> Agreed that setting parity is a useful service to human readers but,
> as is already mentioned below, the spec is quite clear that these
> flags should be ignored on read.
>

>>>
>>> The main problem is lack of information as to whether the geometry (2D or
>>> 3D) is definitive or arbitrary. It is impossible to construct a 3D model of
>>> (say) alanine without a perceived stereochemistry at the Carbon. Similarly
>>> most modern 2D graphic programs will draw a double bond as cis or trans (not
>>> normally linear although this was common in typesetting). If the (arbitrary)
>>> geometry is then transmitted without details of authoring, then the reader
>>> may assume a definitive stereochemistry. Put another way, there is no way of
>>> indicating by coordinates alone that stereochemistry is unknown. I think
>>> it's very important not to use the geometry as definitive unless it is clear
>>> that the author specified it (which normally only comes from crystal
>>> structures or computational chemistry).
>>
>> Sure, but I think this is outside the scope here.
>
> I'm not sure I agree. I think this is one of the critical points when
> doing CTABs: when writing 3D or 2D coordinates how do you indicate
> what you *don't* know as well as indicating what you *do* know.
>
> In 2D (and 3D) the problem is stereochemistry around double bonds: the
> coordinates provided in the output determine the stereochemistry.
> Luckily here the CTAB spec provides a way to indicate what isn't
> known: you use the 4th field in the bond line to indicate that the
> bond is an "either" bond (value 3). Technically this is what should be
> done by any toolkit that builds a molecule from the SMILES CC=CC.
>
> With atomic stereochemistry in 3D structures, the coordinates again
> determine the stereochemistry. As far as I know, the CTAB spec doesn't
> provide specific guidance about what to do when you have a
> stereocenter that's undetermined in your molecule. One possibility is
> to make sure that the bonds from that atom have 0 in field 4. Maybe
> it's "polite" to assign an either bond here as well (value 4 in this
> case) to make explicit to the viewer that the stereochemistry isn't
> known. But either of these raise the question of what to do if you
> *do* know the stereochemistry. My opinion here, and I'm aware it's one
> that many people do not share, is that it's best to treat the 3D case
> the same as the 2D one and use a wedged bond to mark atoms where the
> stereochemistry is known. It's somewhat ugly, but it has the advantage
> of being consistent (yes, yes, I know, when foolish it's the hobgoblin
> of little minds... but I don't think it's foolish here).
>
>>> P.
>>>

>>>> For 2D coordinates, there's no need to store the bond stereochemistry
>>>> (as this can be worked out from the coordinates), but chirality needs
>>>> to be stored explicitly. The normal way to store this is not using
>>>> atom parity (but I'll set this anyway for the same reasons as above),
>>>> but by setting one of the bonds on the tetrahedral center to up or
>>>> down.
>>>>
>>>> For 0D coordinates, there are no guidelines. I propose to store
>>>> cis/trans stereo using the bond stereo (you know, UP [or DOWN] at both
>>>> ends of a double bond means cis), and chirality using the atom parity.
>>>> The MDL spec states that atom parity should be ignored when read,
>>>
>>> I know this is the spec and I don't want to get into more arguments about
>>> whether it should be changed. At this stage I think it is useful if programs
>>> have the capability to read and interpret this field.
>>
>> I think that I may move this to an option. So, if you don't explicitly
>> ask for it, you will just get what the spec says - i.e. no
>> stereochemistry will be stored if there are no coordinates.
>
> This is what I would suggest. Anything else involves introducing
> conventions 

Re: [BlueObelisk-discuss] Stereochemistry in MDL files

2010-03-02 Thread Greg Landrum
Dear Noel,

Thanks for the repost; this helps.

My 2 cents are below.

On Tue, Mar 2, 2010 at 11:34 AM, Noel O'Boyle  wrote:
> On 2 March 2010 09:40, Peter Murray-Rust  wrote:
>> Thanks,
>> This is a useful initiative
>>
>> On Tue, Mar 2, 2010 at 9:14 AM, Noel O'Boyle  wrote:
>>>
>>> (Reposted from my blog following Greg's suggestion )
>>>
>>> Hello all,
>>>
>>> Right now, I'm adding stereo (i.e. double bond stereochemistry, and
>>> chirality) to the MDL Mol format in OpenBabel. There are three places
>>> where stereochemical information can be stored in these files: the
>>> coordinates, the atom parity (in the atom block), the bond stereo (in
>>> the bond block).
>>>
>>> My current understanding is that where 3D coordinates are present,
>>> there's no need to store stereochemical information in either the atom
>>> parity or the bond block. I think I'll probably set the atom parity
>>> anyway (since I've already written the code, and it helps when you
>>> look at the file to be able to easily identify the chiral centers).

Agreed that setting parity is a useful service to human readers but,
as is already mentioned below, the spec is quite clear that these
flags should be ignored on read.

>>>
>>
>> The main problem is lack of information as to whether the geometry (2D or
>> 3D) is definitive or arbitrary. It is impossible to construct a 3D model of
>> (say) alanine without a perceived stereochemistry at the Carbon. Similarly
>> most modern 2D graphic programs will draw a double bond as cis or trans (not
>> normally linear although this was common in typesetting). If the (arbitrary)
>> geometry is then transmitted without details of authoring, then the reader
>> may assume a definitive stereochemistry. Put another way, there is no way of
>> indicating by coordinates alone that stereochemistry is unknown. I think
>> it's very important not to use the geometry as definitive unless it is clear
>> that the author specified it (which normally only comes from crystal
>> structures or computational chemistry).
>
> Sure, but I think this is outside the scope here.

I'm not sure I agree. I think this is one of the critical points when
doing CTABs: when writing 3D or 2D coordinates how do you indicate
what you *don't* know as well as indicating what you *do* know.

In 2D (and 3D) the problem is stereochemistry around double bonds: the
coordinates provided in the output determine the stereochemistry.
Luckily here the CTAB spec provides a way to indicate what isn't
known: you use the 4th field in the bond line to indicate that the
bond is an "either" bond (value 3). Technically this is what should be
done by any toolkit that builds a molecule from the SMILES CC=CC.

With atomic stereochemistry in 3D structures, the coordinates again
determine the stereochemistry. As far as I know, the CTAB spec doesn't
provide specific guidance about what to do when you have a
stereocenter that's undetermined in your molecule. One possibility is
to make sure that the bonds from that atom have 0 in field 4. Maybe
it's "polite" to assign an either bond here as well (value 4 in this
case) to make explicit to the viewer that the stereochemistry isn't
known. But either of these raise the question of what to do if you
*do* know the stereochemistry. My opinion here, and I'm aware it's one
that many people do not share, is that it's best to treat the 3D case
the same as the 2D one and use a wedged bond to mark atoms where the
stereochemistry is known. It's somewhat ugly, but it has the advantage
of being consistent (yes, yes, I know, when foolish it's the hobgoblin
of little minds... but I don't think it's foolish here).
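
The "either" markings described above amount to writing a different value into field 4 of the bond line. The sketch below shows that edit on a raw V2000 bond line, assuming 3-character fixed-width fields and a well-formed bond type field; `mark_stereo_unknown` is a hypothetical helper, not part of any toolkit.

```python
def mark_stereo_unknown(bond_line: str) -> str:
    """Return the bond line with field 4 set to the 'either' value."""
    padded = bond_line.ljust(12)
    order = int(padded[6:9])
    # 'Either' is 3 for a double bond and 4 for a single bond.
    either = {1: 4, 2: 3}.get(order)
    if either is None:
        return bond_line  # no 'either' value applies to other bond types
    return f"{padded[:9]}{either:>3}{padded[12:]}"


# The central bond of CC=CC (atoms 2 and 3), written with no stereo flag.
print(mark_stereo_unknown("  2  3  2  0  0  0  0"))
```

For CC=CC this is the marking that tells a reader not to infer cis or trans from whatever coordinates the layout happened to produce.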

>> P.
>>
>>>
>>> For 2D coordinates, there's no need to store the bond stereochemistry
>>> (as this can be worked out from the coordinates), but chirality needs
>>> to be stored explicitly. The normal way to store this is not using
>>> atom parity (but I'll set this anyway for the same reasons as above),
>>> but by setting one of the bonds on the tetrahedral center to up or
>>> down.
>>>
>>> For 0D coordinates, there are no guidelines. I propose to store
>>> cis/trans stereo using the bond stereo (you know, UP [or DOWN] at both
>>> ends of a double bond means cis), and chirality using the atom parity.
>>> The MDL spec states that atom parity should be ignored when read,
>>
>> I know this is the spec and I don't want to get into more arguments about
>> whether it should be changed. At this stage I think it is useful if programs
>> have the capability to read and interpret this field.
>
> I think that I may move this to an option. So, if you don't explicitly
> ask for it, you will just get what the spec says - i.e. no
> stereochemistry will be stored if there are no coordinates.

This is what I would suggest. Anything else involves introducing
conventions that will work with OB, but that may or may not work with
other toolkits. Since there's no clear answer, or anything that even
really makes much sense, it's probably best to no

Re: [BlueObelisk-discuss] Stereochemistry in MDL files

2010-03-02 Thread Noel O'Boyle
On 2 March 2010 09:40, Peter Murray-Rust  wrote:
> Thanks,
> This is a useful initiative
>
> On Tue, Mar 2, 2010 at 9:14 AM, Noel O'Boyle  wrote:
>>
>> (Reposted from my blog following Greg's suggestion )
>>
>> Hello all,
>>
>> Right now, I'm adding stereo (i.e. double bond stereochemistry, and
>> chirality) to the MDL Mol format in OpenBabel. There are three places
>> where stereochemical information can be stored in these files: the
>> coordinates, the atom parity (in the atom block), the bond stereo (in
>> the bond block).
>>
>> My current understanding is that where 3D coordinates are present,
>> there's no need to store stereochemical information in either the atom
>> parity or the bond block. I think I'll probably set the atom parity
>> anyway (since I've already written the code, and it helps when you
>> look at the file to be able to easily identify the chiral centers).
>>
>
> The main problem is lack of information as to whether the geometry (2D or
> 3D) is definitive or arbitrary. It is impossible to construct a 3D model of
> (say) alanine without a perceived stereochemistry at the Carbon. Similarly
> most modern 2D graphic programs will draw a double bond as cis or trans (not
> normally linear although this was common in typesetting). If the (arbitrary)
> geometry is then transmitted without details of authoring, then the reader
> may assume a definitive stereochemistry. Put another way, there is no way of
> indicating by coordinates alone that stereochemistry is unknown. I think
> it's very important not to use the geometry as definitive unless it is clear
> that the author specified it (which normally only comes from crystal
> structures or computational chemistry).

Sure, but I think this is outside the scope here.

> P.
>
>>
>> For 2D coordinates, there's no need to store the bond stereochemistry
>> (as this can be worked out from the coordinates), but chirality needs
>> to be stored explicitly. The normal way to store this is not using
>> atom parity (but I'll set this anyway for the same reasons as above),
>> but by setting one of the bonds on the tetrahedral center to up or
>> down.
>>
>> For 0D coordinates, there are no guidelines. I propose to store
>> cis/trans stereo using the bond stereo (you know, UP [or DOWN] at both
>> ends of a double bond means cis), and chirality using the atom parity.
>> The MDL spec states that atom parity should be ignored when read,
>
> I know this is the spec and I don't want to get into more arguments about
> whether it should be changed. At this stage I think it is useful if programs
> have the capability to read and interpret this field.

I think that I may move this to an option. So, if you don't explicitly
ask for it, you will just get what the spec says - i.e. no
stereochemistry will be stored if there are no coordinates.

>>
>> but
>> the alternative is to just forget the stereochemistry, or else to
>> store both cis/trans stereo *and* chirality in the bond block, which
>> may just about be possible but is likely to be a real mess.
>>
> Is it ambiguous or merely complicated? If the latter then we should use it
> to remove ambiguity.

As it is (for 2D), it's already ambiguous. The interpretation of a
hash or wedge bond between two stereocentres is ambiguous (as in one
toolkit may interpret as describing the stereo only at the start,
while another might interpret it as describing the stereo at the
beginning and end). In the case of 0D, if you cram all of the
stereochemical information into the bond block it will only get worse;
you will have situations like a stereochemical center attached to a
double bond. Can the same single bond be used to indicate both
cis/trans across the double bond, and the chirality of the center? All
of these problems can be avoided using conventions, but the spec
doesn't go that far.
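
The 2D ambiguity can be stated in a few lines. The sketch below is purely illustrative: the two convention names and the `atoms_described` function are invented here and do not correspond to the documented behaviour of any particular toolkit. Bonds are `(begin, end, stereo)` tuples using the V2000 stereo values 1 (up) and 6 (down).

```python
def atoms_described(bond, convention):
    """Return the atoms whose stereo a wedge/hash bond is taken to describe."""
    begin, end, stereo = bond
    if stereo not in (1, 6):        # 1 = up (wedge), 6 = down (hash)
        return set()
    if convention == "begin-only":
        return {begin}
    if convention == "both-ends":
        return {begin, end}
    raise ValueError(f"unknown convention: {convention}")


# Atoms 2 and 3 are both stereocentres; the bond between them is wedged.
bond = (2, 3, 1)
for convention in ("begin-only", "both-ends"):
    print(convention, sorted(atoms_described(bond, convention)))
```

Two readers that differ only in this choice will disagree about whether atom 3 has defined stereo, even though they parsed the same file.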

>
>
>
>
> --
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069
>



Re: [BlueObelisk-discuss] Stereochemistry in MDL files

2010-03-02 Thread Peter Murray-Rust
Thanks,
This is a useful initiative

On Tue, Mar 2, 2010 at 9:14 AM, Noel O'Boyle  wrote:

> (Reposted from my blog following Greg's suggestion )
>
> Hello all,
>
> Right now, I'm adding stereo (i.e. double bond stereochemistry, and
> chirality) to the MDL Mol format in OpenBabel. There are three places
> where stereochemical information can be stored in these files: the
> coordinates, the atom parity (in the atom block), the bond stereo (in
> the bond block).
>
> My current understanding is that where 3D coordinates are present,
> there's no need to store stereochemical information in either the atom
> parity or the bond block. I think I'll probably set the atom parity
> anyway (since I've already written the code, and it helps when you
> look at the file to be able to easily identify the chiral centers).
>
>
The main problem is lack of information as to whether the geometry (2D or
3D) is definitive or arbitrary. It is impossible to construct a 3D model of
(say) alanine without a perceived stereochemistry at the Carbon. Similarly
most modern 2D graphic programs will draw a double bond as cis or trans (not
normally linear although this was common in typesetting). If the (arbitrary)
geometry is then transmitted without details of authoring, then the reader
may assume a definitive stereochemistry. Put another way, there is no way of
indicating by coordinates alone that stereochemistry is unknown. I think
it's very important not to use the geometry as definitive unless it is clear
that the author specified it (which normally only comes from crystal
structures or computational chemistry).

P.


> For 2D coordinates, there's no need to store the bond stereochemistry
> (as this can be worked out from the coordinates), but chirality needs
> to be stored explicitly. The normal way to store this is not using
> atom parity (but I'll set this anyway for the same reasons as above),
> but by setting one of the bonds on the tetrahedral center to up or
> down.
>
> For 0D coordinates, there are no guidelines. I propose to store
> cis/trans stereo using the bond stereo (you know, UP [or DOWN] at both
> ends of a double bond means cis), and chirality using the atom parity.
> The MDL spec states that atom parity should be ignored when read,


I know this is the spec and I don't want to get into more arguments about
whether it should be changed. At this stage I think it is useful if programs
have the capability to read and interpret this field.



> but
> the alternative is to just forget the stereochemistry, or else to
> store both cis/trans stereo *and* chirality in the bond block, which
> may just about be possible but is likely to be a real mess.
>
Is it ambiguous or merely complicated? If the latter then we should use it
to remove ambiguity.





-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069


[BlueObelisk-discuss] Stereochemistry in MDL files

2010-03-02 Thread Noel O'Boyle
(Reposted from my blog following Greg's suggestion )

Hello all,

Right now, I'm adding stereo (i.e. double bond stereochemistry, and
chirality) to the MDL Mol format in OpenBabel. There are three places
where stereochemical information can be stored in these files: the
coordinates, the atom parity (in the atom block), the bond stereo (in
the bond block).

My current understanding is that where 3D coordinates are present,
there's no need to store stereochemical information in either the atom
parity or the bond block. I think I'll probably set the atom parity
anyway (since I've already written the code, and it helps when you
look at the file to be able to easily identify the chiral centers).

For 2D coordinates, there's no need to store the bond stereochemistry
(as this can be worked out from the coordinates), but chirality needs
to be stored explicitly. The normal way to store this is not using
atom parity (but I'll set this anyway for the same reasons as above),
but by setting one of the bonds on the tetrahedral center to up or
down.
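
As an illustration of "worked out from the coordinates", here is a minimal sketch of classifying a double bond from 2D positions. It is a generic geometric test, not Open Babel's actual perception code, and `cis_or_trans` is a hypothetical name. For substituents a and d on the double bond b=c, it checks whether a and d lie on the same side of the line through b and c.

```python
def cis_or_trans(a, b, c, d):
    """Classify substituents a and d around the double bond b=c (2D points)."""
    def side(p, origin):
        # Sign of the z-component of (c - b) x (p - origin): which side of
        # the double-bond axis the point p lies on.
        cross = ((c[0] - b[0]) * (p[1] - origin[1])
                 - (c[1] - b[1]) * (p[0] - origin[0]))
        return (cross > 0) - (cross < 0)

    side_a, side_d = side(a, b), side(d, c)
    if side_a == 0 or side_d == 0:
        return "undefined"   # a substituent is collinear with the double bond
    return "cis" if side_a == side_d else "trans"


# A but-2-ene style layout: the double bond lies along the x axis.
print(cis_or_trans((-0.5, 0.87), (0.0, 0.0), (1.0, 0.0), (1.5, -0.87)))
```

The "undefined" branch corresponds to a double bond drawn with a collinear substituent, where the coordinates carry no cis/trans information.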

For 0D coordinates, there are no guidelines. I propose to store
cis/trans stereo using the bond stereo (you know, UP [or DOWN] at both
ends of a double bond means cis), and chirality using the atom parity.
The MDL spec states that atom parity should be ignored when read, but
the alternative is to just forget the stereochemistry, or else to
store both cis/trans stereo *and* chirality in the bond block, which
may just about be possible but is likely to be a real mess.

Any thoughts?

- Noel
