On 9/27/10 4:24 AM, Noel O'Boyle wrote:
> To summarise, I need to make a decision on how to handle cis/trans
> stereochem at ring closures in SMILES and I think that Daylight are
> wrong.
>
> Daylight has the following: (a) C/C=C/1NC1 is the same as (b)
> C/C=C1NC\1 and (c) C/C=C/1NC\1
>
> The consequences are that (1) conjugated double bonds in a ring cannot
> be represented by SMILES, and (2) only expert users would know that
> SMILES a and b are the same.

For those of you who weren't in on the Blue Obelisk discussion, Noel's point (1) is documented here:

  http://opensmiles.org/spec/open-smiles-6-extensions.html#6.3

In that Blue Obelisk discussion, I ultimately argued for this syntax:

  CC/=C1CN1   trans
  CC\=C1CN1   cis

This proposal is totally clear, and it doesn't suffer from all of the weird problems that we've all complained about for years.

> If I were in charge of SMILES, I would require that only the ring
> closure symbol on the double bond indicated the stereochemistry.
> Stereo at the other ring closures would be ignored. This would
> increase the functionality and reduce ambiguity.

The most general rule is "Mark either single bond or both, but if you mark both, the marks must agree."

> Specifically (1) conjugated double bonds in a ring would be
> representable by SMILES, and (2) SMILES b, with its inherent
> ambiguity, would be regarded as unspecified stereo.

With the proposed syntax above (CC/=C1CN1), the issue never even comes up.

> So what to do? Some sort of compromise.
>
> (1) Well, for starters Open Babel will only write out form (a). (I
> think this is already implemented, but I will be checking)
> (2) Where two stereos are specified, Open Babel will ignore the one
> that is not on the double bond. This will allow the representation of
> conjugated double bonds in a ring.
> (3) Where a single stereo is specified, it will be interpreted in
> accordance with Daylight's ambiguous system. (I would much prefer just
> to ignore the stereo symbol at a ring closure away from the double
> bond

I'm not sure "ambiguous" is quite the right word. It's very confusing for sure, but strictly speaking there is no ambiguity. Just ambiguous documentation of a very tricky specification.

> ... it's almost certain that the user will not have understood
> Daylight's arbitary rule and it will be wrong.)

I agree 100%!

> I discussed this to a certain extent on the blueobelisk-smiles list
> with Bob Hanson and Craig, but I didn't have time/energy to flesh out
> my disagreement. The core argument presented against what I am
> suggesting is that this is not what Daylight does. I no longer think
> this is a valid reason for crippling the SMILES spec in this way.

I agree. It's time to move forward on this problem. SMILES was an amazing and elegant solution to the problem of representing chemical structure, but this is a huge flaw. When you explain it with a trivial example like dimethyl ethene, it seems like a cool typographical solution. But even a simple problem like

  C/C=C/C     trans
  C(/C)=C/C   cis

has fooled lots of people, and several of the important SMILES parsers (including OpenBabel) got this wrong. By contrast, there's only one way to write

  CC/=CC   trans
  CC\=CC   cis

and it's never ambiguous.

The key to why this fixes the problem is that it puts the stereo symbol at the stereo center. It's just like the '@' symbol: You put it on the atom, not the surrounding bonds. Imagine if the original SMILES spec had specified tetrahedral centers on the neighbor atoms, something like this: [...@1]c([...@2])([...@3])[...@4]. It looks silly, right? The problems are immediately obvious: What if the first carbon was part of another stereo center?

But that's exactly what the current SMILES specification does for double-bond stereochemistry. It moves the stereo specification away from the double bond onto the single bonds nearby, and therein lies the problem. By moving the stereo spec back onto the double bond itself, all the problems go away.

Back to the present... the immediate problem of what to do for OpenBabel really has two parts: parsing SMILES and printing SMILES. I think they are two different problems. For printing SMILES, I totally agree with your plan: Only print out example (a). But for parsing SMILES, I think we should use the "either bond, or both, but if both they must agree" rule.
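As a rough illustration of that parsing rule, here is a minimal Python sketch. It is not Open Babel's code; the function name and the use of None for "no mark" are made up for this example. It assumes that a mark written at the ring-closing digit is read from the closing atom, so it has to be flipped before it can be compared with the mark at the ring-opening digit. That assumption is what makes Daylight's forms (a), (b) and (c) above come out the same.

```python
# Flipping a directional bond mark gives the same bond seen from its other end.
FLIP = {"/": "\\", "\\": "/"}


def resolve_ring_closure_mark(open_mark, close_mark):
    """Hypothetical helper: combine the marks found at both ends of a ring closure.

    open_mark  -- '/', '\\' or None, as written at the ring-opening digit
    close_mark -- '/', '\\' or None, as written at the ring-closing digit
    Returns the mark as seen from the ring-opening atom (None if unmarked).
    Raises ValueError if both ends are marked and they disagree.
    """
    for mark in (open_mark, close_mark):
        if mark is not None and mark not in FLIP:
            raise ValueError("not a directional bond mark: %r" % (mark,))

    # Assumption: the closing mark is relative to the closing atom, so flip it.
    from_close = FLIP[close_mark] if close_mark is not None else None

    if open_mark is None:
        return from_close          # only the closing end (or neither) is marked
    if from_close is None or from_close == open_mark:
        return open_mark           # only the opening end, or both and they agree
    raise ValueError("conflicting stereo marks at ring closure")
```

With this, form (a) is resolve_ring_closure_mark("/", None), form (b) is resolve_ring_closure_mark(None, "\\") and form (c) is resolve_ring_closure_mark("/", "\\"); all three give "/". Something like C/C=C/1NC/1 raises an error instead of silently picking one of the two marks.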
It just doesn't seem like a good idea to ignore errors. If the stereochemistry symbols disagree then there's a 50/50 chance you're picking the wrong one.

Craig

_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel