Dear JP,

When the Mol2 parser was implemented, we had to decide at some point which 
variant of the format to support. Given the "unspecific" Tripos specs this was 
actually quite tricky. If you write the same molecule using Sybyl, Tripos' db 
tools, or other software like Corina, you will get a different result from each 
(note that Tripos does not even give the same results across their own tools).

Hence, we decided on Corina, since it is one of the most widely used tools 
and also seemed to give the most consistent results on a largish set I 
converted and reviewed. As you can see, the documentation for the Mol2 parser 
(e.g. MolFromMol2File) carries a Note telling you that it is optimized for 
the atom-typing scheme used by Corina.

Sorry I can't be of more help

Nik

From: JP [mailto:[email protected]]
Sent: Thursday, January 12, 2012 2:57 PM
To: [email protected]
Subject: [Rdkit-discuss] Mol2 Format problem ? Can't kekulize mol -- but with a 
twist.

Hi there RDkitters,

Using RDKit 2011.09.1 on Ubuntu Linux 11.10 64 bit with a noisy fan.

I am trying to read a MOL2 file (which I think is in line with the Tripos spec 
http://tripos.com/data/support/mol2.pdf -- your favorite molecular format, I 
know).

The structure is a simple indole.  If the atom types in the Mol2 atom block are 
C.ar or N.ar the sanitization fails (but I think this should be allowed, 
especially since the bonds are also defined as aromatic).  If I change the atom 
types to C.2 and N.2 respectively then everything works fine and the aromatic 
parts of the molecule are still correct (because of the aromatic bond 
definitions).

An example of this so you can just copy and paste it:

#!/usr/bin/env python

from rdkit import Chem

# the following is a valid molecule - why does it break?
indole_broken = """@<TRIPOS>MOLECULE
MVSketch_Indole
10 11 1
SMALL
NO_CHARGES
@<TRIPOS>ATOM
1          C1    38.6029   -19.6265     0.0000    C.ar     1          noname
2          C2    38.6029   -21.1665     0.0000    C.ar     1          noname
3          C3    37.2692   -21.9365     0.0000    C.ar     1          noname
4          C4    35.9356   -21.1665     0.0000    C.ar     1          noname
5          C5    35.9356   -19.6265     0.0000    C.ar     1          noname
6          C6    37.2692   -18.8565     0.0000    C.ar     1          noname
7          C7    34.4709   -21.6424     0.0000    C.ar     1          noname
8          C8    33.5657   -20.3965     0.0000    C.ar     1          noname
9          N1    34.4709   -19.1506     0.0000    N.ar     1          noname
10        H1    33.9950   -17.6860     0.0000    H         1          noname
@<TRIPOS>BOND
1          1          2          ar
2          2          3          ar
3          3          4          ar
4          5          6          ar
5          1          6          ar
6          4          7          ar
7          5          4          ar
8          5          9          ar
9          7          8          ar
10        8          9          ar
11        9          10        1
@<TRIPOS>SUBSTRUCTURE
1          noname           1"""

indole_fixed = """@<TRIPOS>MOLECULE
MVSketch_Indole
10 11 1
SMALL
NO_CHARGES
@<TRIPOS>ATOM
1          C1    38.6029   -19.6265     0.0000    C.2      1          noname
2          C2    38.6029   -21.1665     0.0000    C.2      1          noname
3          C3    37.2692   -21.9365     0.0000    C.2      1          noname
4          C4    35.9356   -21.1665     0.0000    C.2      1          noname
5          C5    35.9356   -19.6265     0.0000    C.2      1          noname
6          C6    37.2692   -18.8565     0.0000    C.2      1          noname
7          C7    34.4709   -21.6424     0.0000    C.2      1          noname
8          C8    33.5657   -20.3965     0.0000    C.2      1          noname
9          N1    34.4709   -19.1506     0.0000    N.2      1          noname
10        H1    33.9950   -17.6860     0.0000    H         1          noname
@<TRIPOS>BOND
1          1          2          ar
2          2          3          ar
3          3          4          ar
4          5          6          ar
5          1          6          ar
6          4          7          ar
7          5          4          ar
8          5          9          ar
9          7          8          ar
10        8          9          ar
11        9          10        1
@<TRIPOS>SUBSTRUCTURE
1          noname           1"""

print(Chem.MolFromMol2Block(indole_broken))
print(Chem.MolFromMol2Block(indole_fixed))
# the next line prints [c]1[nH]c2c([c]1)[c][c][c][c]2
print(Chem.MolToSmiles(Chem.MolFromMol2Block(indole_fixed)))
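
The hand edit from C.ar/N.ar to C.2/N.2 could also be scripted. What follows is only a minimal sketch: retype_mol2_atoms and DEFAULT_RETYPE are hypothetical names, not part of RDKit, and the mapping is simply the substitution described above, not a general atom-typing rule. It assumes whitespace-separated ATOM records with the atom type in the sixth field.

```python
# Hypothetical helper (not part of RDKit): rewrite atom types in the
# @<TRIPOS>ATOM section of a Mol2 block before passing it to a parser.
# The default mapping is just the manual substitution described above.
DEFAULT_RETYPE = {"C.ar": "C.2", "N.ar": "N.2"}


def retype_mol2_atoms(mol2_block, mapping=None):
    """Return a copy of mol2_block with ATOM-record types replaced."""
    mapping = DEFAULT_RETYPE if mapping is None else mapping
    out = []
    in_atoms = False
    for line in mol2_block.splitlines():
        if line.startswith("@<TRIPOS>"):
            # Section header: only the ATOM section is rewritten.
            in_atoms = line.strip() == "@<TRIPOS>ATOM"
            out.append(line)
            continue
        fields = line.split()
        # Assumed ATOM layout: id, name, x, y, z, type, then optional fields.
        if in_atoms and len(fields) >= 6 and fields[5] in mapping:
            fields[5] = mapping[fields[5]]
            # Column alignment is dropped; fields stay whitespace-separated.
            line = " ".join(fields)
        out.append(line)
    return "\n".join(out)


snippet = """@<TRIPOS>ATOM
1 C1 38.6029 -19.6265 0.0000 C.ar 1 noname
9 N1 34.4709 -19.1506 0.0000 N.ar 1 noname
10 H1 33.9950 -17.6860 0.0000 H 1 noname
@<TRIPOS>BOND
1 1 2 ar"""

print(retype_mol2_atoms(snippet))
```

In principle the output of retype_mol2_atoms(indole_broken) could be passed to Chem.MolFromMol2Block in place of the hand-edited indole_fixed. Bond records are left untouched, so the aromatic bond definitions are preserved.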


Any comments?
# Please

Many thanks,


-
Jean-Paul Ebejer
Early Stage Researcher
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
