Well - Corina does make use of C.ar and N.ar - just not in this combination. 
The problem with typing everything as ar (bonds and atoms) is that it can be non-specific.


Here is indole as retrieved from Corina:

@<TRIPOS>MOLECULE
NoName
  16   17    0    0    0
SMALL
NO_CHARGES


@<TRIPOS>ATOM
   1 C1            -0.0170     1.4025     0.0098 C.ar
   2 C2            -1.2389     2.0675     0.0301 C.ar
   3 C3            -2.4112     1.3443     0.0423 C.ar
   4 C4            -2.3870    -0.0438     0.0346 C.ar
   5 C5            -1.1984    -0.7171     0.0152 C.ar
   6 C6             0.0021    -0.0041     0.0020 C.ar
   7 C7             1.4152    -0.3895    -0.0184 C.2
   8 C8             2.1316     0.7457    -0.0225 C.2
   9 N9             1.2929     1.8266    -0.0005 N.pl3
  10 H10           -1.2681     3.1471     0.0365 H
  11 H11           -3.3587     1.8624     0.0584 H
  12 H12           -3.3153    -0.5957     0.0445 H
  13 H13           -1.1873    -1.7970     0.0097 H
  14 H14            1.8064    -1.3961    -0.0279 H
  15 H15            3.2102     0.7981    -0.0362 H
  16 H16            1.5785     2.7536     0.0013 H
@<TRIPOS>BOND
   1    1    6 ar
   2    1    9 1
   3    1    2 ar
   4    2    3 ar
   5    2   10 1
   6    3    4 ar
   7    3   11 1
   8    4    5 ar
   9    4   12 1
  10    5    6 ar
  11    5   13 1
  12    6    7 1
  13    7    8 2
  14    7   14 1
  15    8    9 1
  16    8   15 1
  17    9   16 1

#       End of record
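
To see at a glance which SYBYL atom and bond types a given file uses (for instance, whether the five-membered ring is typed C.2/N.pl3 as in the Corina record above, or C.ar/N.ar throughout), a small standard-library sketch along these lines may help. It is only a sketch: `mol2_type_counts` is a hypothetical helper name, not part of RDKit, and it assumes whitespace-separated fields with the atom type in the sixth column of each ATOM line and the bond type in the fourth column of each BOND line.

```python
from collections import Counter


def mol2_type_counts(mol2_text):
    """Count atom types and bond types in a single-molecule mol2 string.

    Assumption: fields are whitespace-separated, the atom type is the
    6th field of an ATOM line and the bond type is the 4th field of a
    BOND line.
    """
    atom_types, bond_types = Counter(), Counter()
    section = None
    for line in mol2_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comment lines
        if stripped.startswith("@<TRIPOS>"):
            # Remember which record type the following lines belong to.
            section = stripped[len("@<TRIPOS>"):]
            continue
        fields = stripped.split()
        if section == "ATOM" and len(fields) >= 6:
            atom_types[fields[5]] += 1
        elif section == "BOND" and len(fields) >= 4:
            bond_types[fields[3]] += 1
    return atom_types, bond_types
```

Calling `mol2_type_counts` on the Corina record above should report six C.ar, two C.2 and one N.pl3 among the heavy atoms, which makes the difference to an all-ar file easy to spot.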

And yes - I would say that this is a "Won't fix". Unfortunately, the mol2 
file format (documentation) is such a pain in general, and the multiple 
different implementations don't make it any better.

Sorry
Nik



From: JP [mailto:[email protected]]
Sent: Thursday, January 12, 2012 3:33 PM
To: Stiefl, Nikolaus
Cc: [email protected]
Subject: Re: [Rdkit-discuss] Mol2 Format problem ? Can't kekulize mol -- but 
with a twist.

Thanks for the explanation Stiefl.  File formats - what a pain.
So Corina does not make use of C.ar or N.ar?

This is a "Won't Fix" then ... right?  Maybe a note in the documentation 
listing the atom types from the spec (pg 53 in 
http://tripos.com/data/support/mol2.pdf) which are not supported would be 
useful then (as people like me have never used corina)?

Many thanks,

-
Jean-Paul Ebejer
Early Stage Researcher

On 12 January 2012 14:13, Stiefl, Nikolaus <[email protected]> wrote:
Dear JP,

When the Mol2 parser was implemented we had to take a decision at some point 
about which format to use. Given the "unspecific" Tripos specs this was 
actually quite tricky. If you write the same molecule using Sybyl, Tripos' db 
tools or other software like Corina you will get a different result from each 
(note that Tripos does not even give the same results across their own tools).

Hence, we decided on Corina since this is one of the most widely used tools 
and also seems to give the most consistent results when evaluating a largish 
set I converted and reviewed. As you can see, there is a Note when checking the 
Mol2 parser (eg MolFromMol2File) that will tell you that it is optimized for 
the atom-typing scheme by Corina.

Sorry I can't be of more help

Nik

From: JP [mailto:[email protected]]
Sent: Thursday, January 12, 2012 2:57 PM
To: [email protected]
Subject: [Rdkit-discuss] Mol2 Format problem ? Can't kekulize mol -- but with a 
twist.

Hi there RDkitters,

Using RDKit 2011.09.1 on Ubuntu Linux 11.10 64 bit with a noisy fan.

I am trying to read a MOL2 file (which I think is in line with the Tripos spec 
http://tripos.com/data/support/mol2.pdf -- your favorite molecular format, I 
know).

The structure is a simple indole.  If the atom types in the mol atom block are 
C.ar or N.ar the sanitization fails (but I think this should be allowed - 
especially since the bonds are also defined as aromatic).  If I change the atom 
types to C.2 and N.2 respectively then everything works fine and the aromatic 
parts of the molecules are still correct (because of the aromatic bond 
definitions).

An example of this so you can just copy and paste it:

#!/usr/bin/env python

from rdkit import Chem

# the following is a valid molecule - why does it break?
indole_broken = """@<TRIPOS>MOLECULE
MVSketch_Indole
10 11 1
SMALL
NO_CHARGES
@<TRIPOS>ATOM
1          C1    38.6029   -19.6265     0.0000    C.ar     1          noname
2          C2    38.6029   -21.1665     0.0000    C.ar     1          noname
3          C3    37.2692   -21.9365     0.0000    C.ar     1          noname
4          C4    35.9356   -21.1665     0.0000    C.ar     1          noname
5          C5    35.9356   -19.6265     0.0000    C.ar     1          noname
6          C6    37.2692   -18.8565     0.0000    C.ar     1          noname
7          C7    34.4709   -21.6424     0.0000    C.ar     1          noname
8          C8    33.5657   -20.3965     0.0000    C.ar     1          noname
9          N1    34.4709   -19.1506     0.0000    N.ar     1          noname
10        H1    33.9950   -17.6860     0.0000    H         1          noname
@<TRIPOS>BOND
1          1          2          ar
2          2          3          ar
3          3          4          ar
4          5          6          ar
5          1          6          ar
6          4          7          ar
7          5          4          ar
8          5          9          ar
9          7          8          ar
10        8          9          ar
11        9          10        1
@<TRIPOS>SUBSTRUCTURE
1          noname           1"""

indole_fixed = """@<TRIPOS>MOLECULE
MVSketch_Indole
10 11 1
SMALL
NO_CHARGES
@<TRIPOS>ATOM
1          C1    38.6029   -19.6265     0.0000    C.2      1          noname
2          C2    38.6029   -21.1665     0.0000    C.2      1          noname
3          C3    37.2692   -21.9365     0.0000    C.2      1          noname
4          C4    35.9356   -21.1665     0.0000    C.2      1          noname
5          C5    35.9356   -19.6265     0.0000    C.2      1          noname
6          C6    37.2692   -18.8565     0.0000    C.2      1          noname
7          C7    34.4709   -21.6424     0.0000    C.2      1          noname
8          C8    33.5657   -20.3965     0.0000    C.2      1          noname
9          N1    34.4709   -19.1506     0.0000    N.2      1          noname
10        H1    33.9950   -17.6860     0.0000    H         1          noname
@<TRIPOS>BOND
1          1          2          ar
2          2          3          ar
3          3          4          ar
4          5          6          ar
5          1          6          ar
6          4          7          ar
7          5          4          ar
8          5          9          ar
9          7          8          ar
10        8          9          ar
11        9          10        1
@<TRIPOS>SUBSTRUCTURE
1          noname           1"""

print(Chem.MolFromMol2Block(indole_broken))
print(Chem.MolFromMol2Block(indole_fixed))
# [c]1[nH]c2c([c]1)[c][c][c][c]2
print(Chem.MolToSmiles(Chem.MolFromMol2Block(indole_fixed)))
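
The hand edit that turns `indole_broken` into `indole_fixed` above (C.ar to C.2, N.ar to N.2) can be sketched as a text-level rewrite using only the standard library. This is a rough sketch under assumptions: `retype_aromatic_atoms` and its default mapping are hypothetical names, not an RDKit function, the atom type is assumed to be the sixth whitespace-separated field of each ATOM line, and the code does not check whether the retyping is chemically appropriate for a given file.

```python
import re

# Hypothetical default mapping, mirroring the manual edit in the example above.
DEFAULT_RETYPE = {"C.ar": "C.2", "N.ar": "N.2"}

# Leading whitespace plus five fields, then the sixth field (the atom type),
# then whatever follows. Capturing the pieces keeps the original spacing.
_ATOM_LINE = re.compile(r"^(\s*(?:\S+\s+){5})(\S+)(.*)$")


def retype_aromatic_atoms(mol2_text, mapping=None):
    """Return mol2_text with ATOM-section atom types replaced per mapping."""
    mapping = DEFAULT_RETYPE if mapping is None else mapping
    out, in_atoms = [], False
    for line in mol2_text.splitlines():
        if line.lstrip().startswith("@<TRIPOS>"):
            # Only lines that follow the ATOM record header are rewritten.
            in_atoms = line.strip() == "@<TRIPOS>ATOM"
        elif in_atoms:
            match = _ATOM_LINE.match(line)
            if match and match.group(2) in mapping:
                line = match.group(1) + mapping[match.group(2)] + match.group(3)
        out.append(line)
    return "\n".join(out)
```

Bond lines are left untouched, so the ar bond definitions survive the rewrite. A different mapping (for example N.ar to N.pl3, following the typing in the Corina record earlier in the thread) can be passed via the `mapping` argument.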


Any comments?
# Please

Many thanks,


-
Jean-Paul Ebejer
Early Stage Researcher

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
