I've been converting a load of mol files to SMILES. The mol files are
all without hydrogens, and convert to SMILES. However, I then converted
these SMILES back into SMILES (simply to aggregate a number of files
into one large one) and 2 of the SMILES come out wrong.

The culprits are:
Before:
c12c3c(C(=O)c1cc(cc2)C=C)cccc3 2-Vinyl-9H-fluoren-9-one
[s]1n[s]n1 1H,3H-1,3,2,4-Dithiadiazete

After:
C12=C3C(=CC=CC3)C(=O)C1C=C(C=C2)C=C 2-Vinyl-9H-fluoren-9-one
S1NSN1 1H,3H-1,3,2,4-Dithiadiazete

This is using version 2.2.99.

As you can see, the first SMILES has had 2 H added to it. The second
SMILES isn't even valid according to depict. Once it is loaded back
into babel and written out again as SMILES, it seems to have swapped the
hydrogens around from what was intended in the original mol file (or
possibly the mol file is wrong; I'm not sure about the numbering of
dithiadiazete compounds).
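
The hydrogen counts behind that observation can be checked without any
toolkit. What follows is only a minimal sketch in plain Python, not what
babel does internally. The valence table and the rule for aromatic atoms
(one valence reserved for the ring system) are simplifying assumptions of
the sketch, and it only understands unbracketed C, N, O and S with single-
digit ring closures.

```python
import re

# Assumed, simplified valence model: an atom takes the smallest allowed
# valence that is >= the sum of its bond orders.
VALENCES = {"C": [4], "N": [3, 5], "O": [2], "S": [2, 4, 6]}
TOKEN = re.compile(r"[CNOScnos]|[-=#()]|\d")
ORDER = {"-": 1, "=": 2, "#": 3}


def implicit_hydrogens(smiles):
    """Return a list of (element, implicit H count), one entry per atom."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("unsupported SMILES syntax: " + smiles)

    atoms = []      # each entry: [element, is_aromatic, bond_order_sum]
    prev = None     # index of the atom the next atom will bond to
    pending = None  # explicit bond symbol waiting for its second atom
    stack = []      # branch points opened by "("
    ring = {}       # open ring-closure digit -> (atom index, bond order)

    def add_bond(i, j, order):
        atoms[i][2] += order
        atoms[j][2] += order

    for tok in tokens:
        if tok == "(":
            stack.append(prev)
        elif tok == ")":
            prev = stack.pop()
        elif tok in ORDER:
            pending = ORDER[tok]
        elif tok.isdigit():
            if tok in ring:
                other, opened = ring.pop(tok)
                add_bond(prev, other, pending or opened or 1)
            else:
                ring[tok] = (prev, pending)
            pending = None
        else:
            atoms.append([tok.upper(), tok.islower(), 0])
            if prev is not None:
                add_bond(prev, len(atoms) - 1, pending or 1)
            prev = len(atoms) - 1
            pending = None

    result = []
    for element, aromatic, bond_sum in atoms:
        if aromatic:
            # Aromatic bonds were counted as 1; reserve one more valence.
            h = max(0, VALENCES[element][0] - 1 - bond_sum)
        else:
            valence = next((v for v in VALENCES[element] if v >= bond_sum),
                           bond_sum)
            h = valence - bond_sum
        result.append((element, h))
    return result


before = "c12c3c(C(=O)c1cc(cc2)C=C)cccc3"
after = "C12=C3C(=CC=CC3)C(=O)C1C=C(C=C2)C=C"
print(sum(h for _, h in implicit_hydrogens(before)))  # 10
print(sum(h for _, h in implicit_hydrogens(after)))   # 12
print(implicit_hydrogens("S1NSN1"))
```

Under these assumptions the "after" fluorenone carries two more hydrogens
than the "before" one, and S1NSN1 puts its hydrogens on the nitrogens.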

The mol files these came from are given below. The first problem
molecule is probably related to the aromatic SMILES bug; the second
is written out as a SMILES that Daylight depict considers invalid.

I'll have a quick root around and see if I can find the bug.

Nick England

00798725K
csCF900/09280922592D

 16 18  0  0  0  0  0  0  0  0999 V2000
    4.4563   -0.0666    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.6473    0.5212    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.2653    0.5212    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.5608   -1.0611    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.9563    1.4723    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.6690    0.3133    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.9563    1.4723    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.1788    0.1145    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.4744   -1.4678    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.2872    2.2153    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.0000    1.0564    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.5441    2.2813    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    6.2834   -0.8800    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.3090    2.0075    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    7.1969   -1.2868    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    7.3015   -2.2813    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  3  2  0  0  0  0
  1  4  1  0  0  0  0
  2  5  2  0  0  0  0
  2  6  1  0  0  0  0
  3  7  1  0  0  0  0
  3  8  1  0  0  0  0
  4  9  2  0  0  0  0
  5  7  1  0  0  0  0
  5 10  1  0  0  0  0
  6 11  2  0  0  0  0
  7 12  2  0  0  0  0
  8 13  2  0  0  0  0
  9 13  1  0  0  0  0
 10 14  2  0  0  0  0
 11 14  1  0  0  0  0
 13 15  1  0  0  0  0
 15 16  2  0  0  0  0
M  END

01513195K
csCF900/09290901162D

  4  4  0  0  0  0  0  0  0  0999 V2000
    2.7071   -0.7070    0.0000 S   0  0  0  0  0  4  0  0  0  0  0  0
    2.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    3.4142    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    2.7071    0.7070    0.0000 S   0  0  0  0  0  4  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  3  2  0  0  0  0
  2  4  2  0  0  0  0
  3  4  1  0  0  0  0
M  END
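
To see where the second mol file itself places the hydrogens, here is a
rough sketch that reads a V2000 block and derives implicit hydrogen counts
from the bond orders and the atom-line valence field (the "4" on the S
lines). It is plain Python, not Open Babel code. The valence table is an
assumed simplification, and the whitespace splitting only works for small
files like these; real V2000 files are fixed-column.

```python
# Assumed, simplified valence model used when the valence field is 0.
VALENCES = {"C": [4], "N": [3, 5], "O": [2], "S": [2, 4, 6]}

DITHIADIAZETE = """\
01513195K
csCF900/09290901162D

  4  4  0  0  0  0  0  0  0  0999 V2000
    2.7071   -0.7070    0.0000 S   0  0  0  0  0  4  0  0  0  0  0  0
    2.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    3.4142    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    2.7071    0.7070    0.0000 S   0  0  0  0  0  4  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  3  2  0  0  0  0
  2  4  2  0  0  0  0
  3  4  1  0  0  0  0
M  END
"""


def implicit_hydrogens(molblock):
    """Return a list of (element, implicit H count) for a V2000 mol block."""
    # Blank lines are skipped, so the counts line is located by its tag.
    lines = [line for line in molblock.splitlines() if line.strip()]
    start = next(i for i, line in enumerate(lines)
                 if line.rstrip().endswith("V2000"))
    n_atoms, n_bonds = (int(f) for f in lines[start].split()[:2])
    atom_lines = lines[start + 1:start + 1 + n_atoms]
    bond_lines = lines[start + 1 + n_atoms:start + 1 + n_atoms + n_bonds]

    atoms = []
    for line in atom_lines:
        # x y z symbol massdiff charge stereo hcount stereocare valence ...
        fields = line.split()
        atoms.append({"element": fields[3], "valence": int(fields[9]),
                      "bond_sum": 0})

    for line in bond_lines:
        a, b, order = (int(f) for f in line.split()[:3])
        atoms[a - 1]["bond_sum"] += order
        atoms[b - 1]["bond_sum"] += order

    result = []
    for atom in atoms:
        bond_sum = atom["bond_sum"]
        if atom["valence"]:
            # An explicit valence counts bonds to implied H; 15 means zero.
            valence = 0 if atom["valence"] == 15 else atom["valence"]
        else:
            valence = next((v for v in VALENCES[atom["element"]]
                            if v >= bond_sum), bond_sum)
        result.append((atom["element"], max(0, valence - bond_sum)))
    return result


print(implicit_hydrogens(DITHIADIAZETE))
# [('S', 1), ('N', 0), ('N', 0), ('S', 1)]
```

Read this way, the mol file puts one hydrogen on each sulfur and none on
the nitrogens, which agrees with the 1H,3H name.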




2009/7/21 Yongjin Xu <[email protected]>:
> One more example that may be related:
> C1=S=NOC1=O   -->  c1snoc1=O
> The output SMILES is wrong according to the Daylight parser. Actually, the
> output is ambiguous, since there may be several ways of placing the double
> bonds and several oxidation states for S. This information just gets
> removed during the conversion.
> By the way, does anyone know why I get a segmentation fault when I
> try to use Separate() in OBMol, when the molecule I pass in is something
> like:  c1ccccc1.Cc1cccnc1
> Thanks
> Yongjin
>
> On Tue, Jul 21, 2009 at 7:32 AM, Craig A. James <[email protected]>
> wrote:
>>
>> Noel O'Boyle wrote:
>> > I would say that the evidence could just as well point to a
>> > kekulization bug. It should be easy though to rule in/out a smiles
>> > parser error, right?
>>
>> Well, it's not parentheses. Another beautiful theory shot down by ugly
>> data:
>>
>> c1c2nonc2ccc1  ==>  c1ccc2nonc2c1
>> c1cc2nonc2cc1  ==>  C1CCC2NONC2C1
>>
>> This is the simplest example yet.  I'll keep digging.
>>
>> I'm now going on the theory that ring-closure parsing sets the internal
>> state variables (_order, _aromNH, and so forth) slightly differently than
>> normal bond parsing, resulting in missing information that the Kekule code
>> needs.  I don't believe it's the Kekule code itself since these molecules
>> are identical in every respect.  It's just about got to be in the SMILES
>> parser.
>>
>> I have to say, now that I'm looking at the SMILES parser in some detail,
>> the aromaticity detection is very confusing.  It's trying to give tentative
>> aromaticity assignment to bonds as it parses; for example, at line 850, it
>> decides that if this atom is aromatic and the previous one was aromatic,
>> then the bond is assigned order 5 ("potential aromatic"), which is bogus.
>>
>> This may just be a matter of misleading comments.  If bond order 5 is
>> considered "unspecified bond type" rather than the misleading "potential
>> aromatic," the semantics of the code would be more correct.
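
To make the "unspecified bond type" reading concrete, here is a small
sketch. It is not the Open Babel parser; the names (UNSPECIFIED, add_bond,
bond_signature) are made up for illustration, and only unbracketed
C/N/O/S with single-digit ring closures are handled. Chain bonds and
ring-closure bonds go through the same helper, so a bond between two
aromatic atoms gets the placeholder order 5 regardless of how it was
written.

```python
import re
from collections import Counter

UNSPECIFIED = 5  # placeholder order, to be resolved later by Kekule code
TOKEN = re.compile(r"[CNOScnos]|[-=#()]|\d")
ORDER = {"-": 1, "=": 2, "#": 3}


def parse_bonds(smiles):
    """Return (symbols, bonds); bonds are (i, j, order) tuples."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("unsupported SMILES syntax: " + smiles)

    symbols, bonds = [], []
    prev, pending, stack, ring = None, None, [], {}

    def add_bond(i, j, explicit):
        # Single code path for chain bonds and ring closures alike.
        if explicit is not None:
            order = explicit
        elif symbols[i].islower() and symbols[j].islower():
            order = UNSPECIFIED
        else:
            order = 1
        bonds.append((i, j, order))

    for tok in tokens:
        if tok == "(":
            stack.append(prev)
        elif tok == ")":
            prev = stack.pop()
        elif tok in ORDER:
            pending = ORDER[tok]
        elif tok.isdigit():
            if tok in ring:
                other, opened = ring.pop(tok)
                add_bond(other, prev,
                         pending if pending is not None else opened)
            else:
                ring[tok] = (prev, pending)
            pending = None
        else:
            symbols.append(tok)
            if prev is not None:
                add_bond(prev, len(symbols) - 1, pending)
            prev = len(symbols) - 1
            pending = None
    return symbols, bonds


def bond_signature(smiles):
    """Count bonds by (sorted element pair, order)."""
    symbols, bonds = parse_bonds(smiles)
    return Counter(("".join(sorted(symbols[i] + symbols[j])), order)
                   for i, j, order in bonds)


a = bond_signature("c1c2nonc2ccc1")
b = bond_signature("c1cc2nonc2cc1")
print(a == b)  # True
```

Equal signatures do not prove the two graphs are identical, but a parser
that treats ring closures differently from chain bonds would be expected
to show a difference here.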
>>
>> Slogging onward...
>>
>> Craig
>>
>>
>>
>>
>> _______________________________________________
>> OpenBabel-Devel mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/openbabel-devel