Well - Corina does make use of C.ar and N.ar - just not in this combination.
The problem with having everything typed as ar (bonds and atoms) is that it can be non-specific.
Here is indole as retrieved from Corina:
@<TRIPOS>MOLECULE
NoName
16 17 0 0 0
SMALL
NO_CHARGES
@<TRIPOS>ATOM
1 C1 -0.0170 1.4025 0.0098 C.ar
2 C2 -1.2389 2.0675 0.0301 C.ar
3 C3 -2.4112 1.3443 0.0423 C.ar
4 C4 -2.3870 -0.0438 0.0346 C.ar
5 C5 -1.1984 -0.7171 0.0152 C.ar
6 C6 0.0021 -0.0041 0.0020 C.ar
7 C7 1.4152 -0.3895 -0.0184 C.2
8 C8 2.1316 0.7457 -0.0225 C.2
9 N9 1.2929 1.8266 -0.0005 N.pl3
10 H10 -1.2681 3.1471 0.0365 H
11 H11 -3.3587 1.8624 0.0584 H
12 H12 -3.3153 -0.5957 0.0445 H
13 H13 -1.1873 -1.7970 0.0097 H
14 H14 1.8064 -1.3961 -0.0279 H
15 H15 3.2102 0.7981 -0.0362 H
16 H16 1.5785 2.7536 0.0013 H
@<TRIPOS>BOND
1 1 6 ar
2 1 9 1
3 1 2 ar
4 2 3 ar
5 2 10 1
6 3 4 ar
7 3 11 1
8 4 5 ar
9 4 12 1
10 5 6 ar
11 5 13 1
12 6 7 1
13 7 8 2
14 7 14 1
15 8 9 1
16 8 15 1
17 9 16 1
# End of record
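To see at a glance which typing scheme a given file uses, the atom and bond types can be tallied with a few lines of plain Python. This is only a sketch: tally_mol2_types is a hypothetical helper (not part of RDKit or Corina), and it assumes the simple whitespace-separated layout shown above, with the atom type in the sixth column of each ATOM line and the bond type in the fourth column of each BOND line.

```python
from collections import Counter


def tally_mol2_types(mol2_block):
    """Count atom types and bond types in a mol2 block.

    Hypothetical helper for illustration only. Assumes whitespace-separated
    columns as in the record above: atom type = 6th field of an ATOM line,
    bond type = 4th field of a BOND line.
    """
    atom_types = Counter()
    bond_types = Counter()
    section = None
    for line in mol2_block.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comment lines
        if stripped.startswith("@<TRIPOS>"):
            # Remember which record section we are in (ATOM, BOND, ...)
            section = stripped[len("@<TRIPOS>"):]
            continue
        fields = stripped.split()
        if section == "ATOM" and len(fields) >= 6:
            atom_types[fields[5]] += 1
        elif section == "BOND" and len(fields) >= 4:
            bond_types[fields[3]] += 1
    return atom_types, bond_types


# Three atoms and two bonds copied from the Corina indole record above
example = """@<TRIPOS>ATOM
6 C6 0.0021 -0.0041 0.0020 C.ar
7 C7 1.4152 -0.3895 -0.0184 C.2
9 N9 1.2929 1.8266 -0.0005 N.pl3
@<TRIPOS>BOND
12 6 7 1
13 7 8 2
"""
atoms, bonds = tally_mol2_types(example)
print(dict(atoms))
print(dict(bonds))
```

Run on the full record above, such a tally should show six C.ar, two C.2 and one N.pl3 (plus seven H), with ar bonds only in the six-membered ring. That is the combination Corina writes for indole.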
And yes - I would say that this is a "Won't fix". Unfortunately, the mol2
file format (documentation) is such a pain in general, and the multiple
different implementations don't make it any better.
Sorry
Nik
From: JP [mailto:[email protected]]
Sent: Thursday, January 12, 2012 3:33 PM
To: Stiefl, Nikolaus
Cc: [email protected]
Subject: Re: [Rdkit-discuss] Mol2 Format problem ? Can't kekulize mol -- but
with a twist.
Thanks for the explanation Stiefl. File formats - what a pain.
So Corina does not make use of C.ar or N.ar?
This is a "Won't Fix" then ... right? Maybe a note in the documentation listing
the atom types from the spec (pg 53 in
http://tripos.com/data/support/mol2.pdf) which are not supported would be
useful then (as people like me have never used corina) ?
Many thanks,
-
Jean-Paul Ebejer
Early Stage Researcher
On 12 January 2012 14:13, Stiefl, Nikolaus <[email protected]> wrote:
Dear JP,
When the Mol2 parser was implemented we had to take a decision at some point
about which format to use. Given the "unspecific" Tripos specs this was
actually quite tricky. If you write the same molecule using Sybyl, Tripos' db
tools or other software like Corina you will get a different result from each
(note that Tripos does not even give the same results across their own tools).
Hence, we decided on Corina since this is one of the most widely used tools
and also seems to give the most consistent results when evaluating a largish
set I converted and reviewed. As you can see, there is a Note when you check
the Mol2 parser (e.g. MolFromMol2File) that will tell you that it is optimized
for the atom-typing scheme used by Corina.
Sorry I can't be of more help
Nik
From: JP [mailto:[email protected]]
Sent: Thursday, January 12, 2012 2:57 PM
To: [email protected]
Subject: [Rdkit-discuss] Mol2 Format problem ? Can't kekulize mol -- but with a
twist.
Hi there RDkitters,
Using RDKit 2011.09.1 on Ubuntu Linux 11.10 64 bit with a noisy fan.
I am trying to read a MOL2 file (which I think is in line with the Tripos spec
http://tripos.com/data/support/mol2.pdf -- your favorite molecular format, I
know).
The structure is a simple indole. If the atom types in the mol atom block are
C.ar or N.ar the sanitization fails (but I think this should be allowed -
especially since the bonds are also defined as aromatic). If I change the atom
types to C.2 and N.2 respectively then everything works fine and the aromatic
parts of the molecules are still correct (because of the aromatic bond
definitions).
An example of this so you can just copy and paste it:
#!/usr/bin/env python
from rdkit import Chem
# the following is a valid molecule - why does it break?
indole_broken = """@<TRIPOS>MOLECULE
MVSketch_Indole
10 11 1
SMALL
NO_CHARGES
@<TRIPOS>ATOM
1 C1 38.6029 -19.6265 0.0000 C.ar 1 noname
2 C2 38.6029 -21.1665 0.0000 C.ar 1 noname
3 C3 37.2692 -21.9365 0.0000 C.ar 1 noname
4 C4 35.9356 -21.1665 0.0000 C.ar 1 noname
5 C5 35.9356 -19.6265 0.0000 C.ar 1 noname
6 C6 37.2692 -18.8565 0.0000 C.ar 1 noname
7 C7 34.4709 -21.6424 0.0000 C.ar 1 noname
8 C8 33.5657 -20.3965 0.0000 C.ar 1 noname
9 N1 34.4709 -19.1506 0.0000 N.ar 1 noname
10 H1 33.9950 -17.6860 0.0000 H 1 noname
@<TRIPOS>BOND
1 1 2 ar
2 2 3 ar
3 3 4 ar
4 5 6 ar
5 1 6 ar
6 4 7 ar
7 5 4 ar
8 5 9 ar
9 7 8 ar
10 8 9 ar
11 9 10 1
@<TRIPOS>SUBSTRUCTURE
1 noname 1"""
indole_fixed = """@<TRIPOS>MOLECULE
MVSketch_Indole
10 11 1
SMALL
NO_CHARGES
@<TRIPOS>ATOM
1 C1 38.6029 -19.6265 0.0000 C.2 1 noname
2 C2 38.6029 -21.1665 0.0000 C.2 1 noname
3 C3 37.2692 -21.9365 0.0000 C.2 1 noname
4 C4 35.9356 -21.1665 0.0000 C.2 1 noname
5 C5 35.9356 -19.6265 0.0000 C.2 1 noname
6 C6 37.2692 -18.8565 0.0000 C.2 1 noname
7 C7 34.4709 -21.6424 0.0000 C.2 1 noname
8 C8 33.5657 -20.3965 0.0000 C.2 1 noname
9 N1 34.4709 -19.1506 0.0000 N.2 1 noname
10 H1 33.9950 -17.6860 0.0000 H 1 noname
@<TRIPOS>BOND
1 1 2 ar
2 2 3 ar
3 3 4 ar
4 5 6 ar
5 1 6 ar
6 4 7 ar
7 5 4 ar
8 5 9 ar
9 7 8 ar
10 8 9 ar
11 9 10 1
@<TRIPOS>SUBSTRUCTURE
1 noname 1"""
# print(...) with a single argument works on both Python 2 and Python 3
print(Chem.MolFromMol2Block(indole_broken))
print(Chem.MolFromMol2Block(indole_fixed))
print(Chem.MolToSmiles(Chem.MolFromMol2Block(indole_fixed)))  # [c]1[nH]c2c([c]1)[c][c][c][c]2
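The hand edit that turns indole_broken into indole_fixed can also be scripted. The following is a minimal sketch, not something RDKit provides: retype_aromatic_atoms is a hypothetical helper that rewrites C.ar and N.ar to C.2 and N.2 inside the ATOM section only and leaves the ar bond types alone. It assumes whitespace-separated columns with the atom type in the sixth field, and it does not preserve the original column spacing of the lines it changes.

```python
# Mapping taken from the manual edit above (C.ar -> C.2, N.ar -> N.2)
RETYPE = {"C.ar": "C.2", "N.ar": "N.2"}


def retype_aromatic_atoms(mol2_block, mapping=RETYPE):
    """Return a copy of mol2_block with aromatic atom types rewritten.

    Hypothetical helper for illustration only. Only lines inside the
    @<TRIPOS>ATOM section are touched; bond types stay as they are.
    """
    out = []
    in_atoms = False
    for line in mol2_block.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            # Track whether we are inside the ATOM section
            in_atoms = stripped == "@<TRIPOS>ATOM"
            out.append(line)
            continue
        fields = line.split()
        if in_atoms and len(fields) >= 6 and fields[5] in mapping:
            fields[5] = mapping[fields[5]]
            line = " ".join(fields)  # column spacing is normalised here
        out.append(line)
    return "\n".join(out)


# A few lines copied from indole_broken above
sample = """@<TRIPOS>ATOM
8 C8 33.5657 -20.3965 0.0000 C.ar 1 noname
9 N1 34.4709 -19.1506 0.0000 N.ar 1 noname
10 H1 33.9950 -17.6860 0.0000 H 1 noname
@<TRIPOS>BOND
10 8 9 ar
11 9 10 1"""
print(retype_aromatic_atoms(sample))
```

The rewritten block could then be handed to Chem.MolFromMol2Block in place of the original. Whether a given molecule sanitizes afterwards still depends on the structure, so this is a workaround to try rather than a general fix.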
Any comments?
# Please
Many thanks,
-
Jean-Paul Ebejer
Early Stage Researcher