Dear JP,
When the Mol2 parser was implemented, we had to decide at some point which
flavor of the format to support. Given the "unspecific" Tripos specs, this was
actually quite tricky. If you write the same molecule using Sybyl, Tripos' db
tools or other software like Corina, you will get different results each time
(note that Tripos does not even give the same results across its own tools).
Hence, we decided on Corina, since it is one of the most widely used tools
and also seemed to give the most consistent results on a largish set of
molecules I converted and reviewed. As you can see, the documentation of the
Mol2 parser (e.g. MolFromMol2File) contains a note saying that it is optimized
for the atom-typing scheme used by Corina.
Sorry I can't be of more help.
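The "Can't kekulize mol" error in the subject is easier to picture with a toy model of why atom typing matters. The sketch below is a minimal illustration, not RDKit's implementation: it assumes kekulization means choosing aromatic bonds to become double bonds so that every atom expected to carry a double bond gets exactly one (a perfect-matching search). The atom types decide which atoms are in that set. For the indole ring system in the message, forcing the nitrogen to take a double bond leaves nine atoms, an odd number, so no assignment exists; treating it as a pyrrole-type nitrogen that keeps its lone pair leaves eight carbons that pair up. `kekulize` and `INDOLE_AR_BONDS` are hypothetical names.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Bond = Tuple[int, int]

# Aromatic ("ar") bonds of the indole ring system, taken from the
# @<TRIPOS>BOND block in the message; the N-H bond (record 11) is a
# single bond and is not part of the aromatic system.
INDOLE_AR_BONDS: List[Bond] = [
    (1, 2), (2, 3), (3, 4), (5, 6), (1, 6),
    (4, 7), (5, 4), (5, 9), (7, 8), (8, 9),
]


def kekulize(bonds: List[Bond], needs_double: Set[int]) -> Optional[List[Bond]]:
    """Toy kekulization (hypothetical helper, not RDKit's algorithm).

    Choose aromatic bonds to become double bonds so that every atom in
    needs_double gets exactly one of them and no other atom gets any.
    Returns the chosen bonds, or None when no valid assignment exists.
    """
    # Keep only bonds whose two ends both need a double bond.
    adj: Dict[int, List[int]] = {atom: [] for atom in needs_double}
    for a, b in bonds:
        if a in needs_double and b in needs_double:
            adj[a].append(b)
            adj[b].append(a)

    def search(unmatched: FrozenSet[int]) -> Optional[List[Bond]]:
        if not unmatched:
            return []  # every atom has its double bond
        atom = min(unmatched)
        for nbr in adj[atom]:
            if nbr in unmatched:
                rest = search(unmatched - {atom, nbr})
                if rest is not None:
                    return [(atom, nbr)] + rest
        return None  # this atom cannot be paired: backtrack

    return search(frozenset(needs_double))


if __name__ == "__main__":
    ring_atoms = set(range(1, 10))  # C1..C8 plus N1 (atom 9)
    # N1 forced to take a double bond: nine atoms cannot be paired up.
    print(kekulize(INDOLE_AR_BONDS, ring_atoms))  # None
    # N1 treated as a pyrrole-type nitrogen that keeps its lone pair.
    print(kekulize(INDOLE_AR_BONDS, ring_atoms - {9}))
    # [(1, 2), (3, 4), (5, 6), (7, 8)]
```

How the real parser maps C.ar/N.ar onto such a set is not spelled out in this thread, so read this only as an illustration of the parity problem, not as a diagnosis of RDKit's behavior.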
Nik
From: JP [mailto:[email protected]]
Sent: Thursday, January 12, 2012 2:57 PM
To: [email protected]
Subject: [Rdkit-discuss] Mol2 Format problem ? Can't kekulize mol -- but with a twist.
Hi there RDkitters,
I'm using RDKit 2011.09.1 on 64-bit Ubuntu Linux 11.10 (with a noisy fan).
I am trying to read a MOL2 file (which I think is in line with the Tripos spec
http://tripos.com/data/support/mol2.pdf -- your favorite molecular format, I
know).
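One quick way to check that a file is at least structurally consistent with the spec is to compare the counts line of the MOLECULE record with the number of ATOM and BOND records actually present. A minimal sketch, assuming whitespace-delimited records, `#` comment lines, and a non-blank molecule name (blank lines are skipped, so a blank name line would shift the indexing); `read_mol2_sections` and `count_mismatches` are hypothetical helpers, not part of RDKit.

```python
from typing import Dict, List, Optional


def read_mol2_sections(block: str) -> Dict[str, List[str]]:
    """Group the non-blank, non-comment lines of a Mol2 block by the
    @<TRIPOS> record type that precedes them."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in block.splitlines():
        text = line.strip()
        if text.startswith("@<TRIPOS>"):
            current = text[len("@<TRIPOS>"):]
            sections[current] = []
        elif current is not None and text and not text.startswith("#"):
            sections[current].append(text)
    return sections


def count_mismatches(block: str) -> List[str]:
    """Compare the atom/bond counts declared in the MOLECULE record with
    the number of ATOM and BOND records; an empty list means consistent."""
    sections = read_mol2_sections(block)
    # Second MOLECULE line: num_atoms [num_bonds [num_subst ...]]
    declared = [int(tok) for tok in sections["MOLECULE"][1].split()]
    problems: List[str] = []
    for name, expected in zip(("ATOM", "BOND"), declared):
        found = len(sections.get(name, []))
        if found != expected:
            problems.append(f"{name}: declared {expected}, found {found}")
    return problems
```

For either indole block in this message it returns an empty list (10 ATOM and 11 BOND records, matching the counts line `10 11 1`), which is consistent with the trouble lying in the atom typing rather than in the file layout.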
The structure is a simple indole. If the atom types in the ATOM block are
C.ar or N.ar, sanitization fails (but I think this should be allowed,
especially since the bonds are also defined as aromatic). If I change the atom
types to C.2 and N.2 respectively, everything works fine and the aromatic
parts of the molecule are still correct (because of the aromatic bond
definitions).
Here is an example you can just copy and paste:
#!/usr/bin/env python
from rdkit import Chem
# the following is a valid molecule - why does it break?
indole_broken = """@<TRIPOS>MOLECULE
MVSketch_Indole
10 11 1
SMALL
NO_CHARGES
@<TRIPOS>ATOM
1 C1 38.6029 -19.6265 0.0000 C.ar 1 noname
2 C2 38.6029 -21.1665 0.0000 C.ar 1 noname
3 C3 37.2692 -21.9365 0.0000 C.ar 1 noname
4 C4 35.9356 -21.1665 0.0000 C.ar 1 noname
5 C5 35.9356 -19.6265 0.0000 C.ar 1 noname
6 C6 37.2692 -18.8565 0.0000 C.ar 1 noname
7 C7 34.4709 -21.6424 0.0000 C.ar 1 noname
8 C8 33.5657 -20.3965 0.0000 C.ar 1 noname
9 N1 34.4709 -19.1506 0.0000 N.ar 1 noname
10 H1 33.9950 -17.6860 0.0000 H 1 noname
@<TRIPOS>BOND
1 1 2 ar
2 2 3 ar
3 3 4 ar
4 5 6 ar
5 1 6 ar
6 4 7 ar
7 5 4 ar
8 5 9 ar
9 7 8 ar
10 8 9 ar
11 9 10 1
@<TRIPOS>SUBSTRUCTURE
1 noname 1"""
indole_fixed = """@<TRIPOS>MOLECULE
MVSketch_Indole
10 11 1
SMALL
NO_CHARGES
@<TRIPOS>ATOM
1 C1 38.6029 -19.6265 0.0000 C.2 1 noname
2 C2 38.6029 -21.1665 0.0000 C.2 1 noname
3 C3 37.2692 -21.9365 0.0000 C.2 1 noname
4 C4 35.9356 -21.1665 0.0000 C.2 1 noname
5 C5 35.9356 -19.6265 0.0000 C.2 1 noname
6 C6 37.2692 -18.8565 0.0000 C.2 1 noname
7 C7 34.4709 -21.6424 0.0000 C.2 1 noname
8 C8 33.5657 -20.3965 0.0000 C.2 1 noname
9 N1 34.4709 -19.1506 0.0000 N.2 1 noname
10 H1 33.9950 -17.6860 0.0000 H 1 noname
@<TRIPOS>BOND
1 1 2 ar
2 2 3 ar
3 3 4 ar
4 5 6 ar
5 1 6 ar
6 4 7 ar
7 5 4 ar
8 5 9 ar
9 7 8 ar
10 8 9 ar
11 9 10 1
@<TRIPOS>SUBSTRUCTURE
1 noname 1"""
print(Chem.MolFromMol2Block(indole_broken))
print(Chem.MolFromMol2Block(indole_fixed))
print(Chem.MolToSmiles(Chem.MolFromMol2Block(indole_fixed)))
# output: [c]1[nH]c2c([c]1)[c][c][c][c]2
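The manual edit between indole_broken and indole_fixed can be scripted. A minimal sketch, assuming whitespace-delimited ATOM records with the SYBYL atom type in the sixth column (as laid out in the Tripos spec); `dearomatize_atom_types` and the `AROMATIC_TO_SP2` table are hypothetical and cover only the two types used in this example.

```python
from typing import Dict, List

# Hypothetical mapping for illustration: the Tripos aromatic types used in
# indole_broken, mapped to the sp2 types used in indole_fixed.
AROMATIC_TO_SP2: Dict[str, str] = {"C.ar": "C.2", "N.ar": "N.2"}


def dearomatize_atom_types(block: str) -> str:
    """Rewrite the atom-type column of each @<TRIPOS>ATOM record.

    Assumes whitespace-delimited records with the SYBYL atom type as the
    sixth field; every other line is passed through unchanged.
    """
    out: List[str] = []
    in_atom_section = False
    for line in block.splitlines():
        text = line.strip()
        if text.startswith("@<TRIPOS>"):
            # Track whether we are inside the ATOM record.
            in_atom_section = text == "@<TRIPOS>ATOM"
        elif in_atom_section and text:
            fields = text.split()
            if len(fields) >= 6 and fields[5] in AROMATIC_TO_SP2:
                fields[5] = AROMATIC_TO_SP2[fields[5]]
                line = " ".join(fields)  # Mol2 fields are whitespace-separated
        out.append(line)
    return "\n".join(out)
```

The result can then be handed to the parser as before, e.g. `Chem.MolFromMol2Block(dearomatize_atom_types(indole_broken))`. This only automates the C.ar to C.2 and N.ar to N.2 substitution that the message reports as working for this indole; the thread does not establish that it gives chemically correct results for other ring systems, so treat it as a workaround rather than a fix.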
Any comments, please?
Many thanks,
-
Jean-Paul Ebejer
Early Stage Researcher
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss