Thanks for the explanation Stiefl. File formats - what a pain.
So Corina does not make use of C.ar or N.ar?
This is a "Won't Fix" then ... right? Maybe a note in the documentation
listing the atom types from the spec (pg 53 in
http://tripos.com/data/support/mol2.pdf) that are not supported would be
useful (people like me have never used Corina)?
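
In the meantime, the manual workaround from my original message (quoted below) could be scripted. This is only a rough sketch: the function name and the mapping table are my own invention, not part of RDKit, and it assumes Mol2 atom records are whitespace-delimited with the atom type in the sixth column. It only reproduces the C.ar -> C.2 and N.ar -> N.2 swap that happened to work for this indole; whether that swap is appropriate for other molecules is not something this thread establishes.

```python
# Hypothetical helper (not an RDKit function): rewrite C.ar/N.ar atom types
# to C.2/N.2 in the ATOM section of a Mol2 block, leaving bonds untouched.

# Only the two swaps tried by hand in this thread; not an official mapping.
AR_TO_SP2 = {"C.ar": "C.2", "N.ar": "N.2"}


def relax_aromatic_atom_types(mol2_block):
    """Return a copy of mol2_block with C.ar/N.ar atom types replaced.

    Only lines inside @<TRIPOS>ATOM are changed, so the aromatic ("ar")
    bond types in @<TRIPOS>BOND stay as they are.
    """
    out = []
    in_atoms = False
    for line in mol2_block.splitlines():
        if line.startswith("@<TRIPOS>"):
            # Track whether we are inside the ATOM record section.
            in_atoms = line.strip() == "@<TRIPOS>ATOM"
        elif in_atoms:
            fields = line.split()
            # Assumed record layout: id, name, x, y, z, type, ...
            if len(fields) >= 6 and fields[5] in AR_TO_SP2:
                fields[5] = AR_TO_SP2[fields[5]]
                # Re-joining collapses the original column spacing.
                line = " ".join(fields)
        out.append(line)
    return "\n".join(out)
```

The result could then be passed to Chem.MolFromMol2Block in place of the original block, e.g. Chem.MolFromMol2Block(relax_aromatic_atom_types(indole_broken)).
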
Many thanks,
-
Jean-Paul Ebejer
Early Stage Researcher
On 12 January 2012 14:13, Stiefl, Nikolaus <[email protected]> wrote:
> Dear JP,
>
> When the Mol2 parser was implemented we had to take a decision at some
> point about which format to use. Given the “unspecific” Tripos specs this
> was actually quite tricky. If you write the same molecule using Sybyl,
> Tripos’ db tools or other software like Corina you will get different
> results from each (note that Tripos does not even give the same results
> when using their own tools).
>
> Hence, we decided on Corina since this is one of the most widely used
> tools and also seems to give the most consistent results when evaluating a
> largish set I converted and reviewed. As you can see, there is a Note when
> checking the Mol2 parser (e.g. MolFromMol2File) that will tell you that it is
> optimized for the atom-typing scheme by Corina.
>
> Sorry I can’t be of more help
>
> Nik
>
> *From:* JP [mailto:[email protected]]
> *Sent:* Thursday, January 12, 2012 2:57 PM
> *To:* [email protected]
> *Subject:* [Rdkit-discuss] Mol2 Format problem ? Can't kekulize mol --
> but with a twist.
>
> Hi there RDkitters,
>
> Using RDKit 2011.09.1 on Ubuntu Linux 11.10 64 bit with a noisy fan.
>
> I am trying to read a MOL2 file (which I think is in line with the Tripos
> spec http://tripos.com/data/support/mol2.pdf -- your favorite molecular
> format, I know).
>
> The structure is a simple indole. If the atom types in the mol atom block
> are C.ar or N.ar the sanitization fails (but I think this should be allowed
> - especially since the bonds are also defined as aromatic). If I change
> the atom types to C.2 and N.2 respectively then everything works fine and
> the aromatic parts of the molecules are still correct (because of the
> aromatic bond definitions).
>
> An example of this so you can just copy and paste it:
>
> #!/usr/bin/env python
>
> from rdkit import Chem
>
> # the following is a valid molecule - why does it break?
> indole_broken = """@<TRIPOS>MOLECULE
> MVSketch_Indole
> 10 11 1
> SMALL
> NO_CHARGES
> @<TRIPOS>ATOM
> 1 C1 38.6029 -19.6265 0.0000 C.ar 1 noname
> 2 C2 38.6029 -21.1665 0.0000 C.ar 1 noname
> 3 C3 37.2692 -21.9365 0.0000 C.ar 1 noname
> 4 C4 35.9356 -21.1665 0.0000 C.ar 1 noname
> 5 C5 35.9356 -19.6265 0.0000 C.ar 1 noname
> 6 C6 37.2692 -18.8565 0.0000 C.ar 1 noname
> 7 C7 34.4709 -21.6424 0.0000 C.ar 1 noname
> 8 C8 33.5657 -20.3965 0.0000 C.ar 1 noname
> 9 N1 34.4709 -19.1506 0.0000 N.ar 1 noname
> 10 H1 33.9950 -17.6860 0.0000 H 1 noname
> @<TRIPOS>BOND
> 1 1 2 ar
> 2 2 3 ar
> 3 3 4 ar
> 4 5 6 ar
> 5 1 6 ar
> 6 4 7 ar
> 7 5 4 ar
> 8 5 9 ar
> 9 7 8 ar
> 10 8 9 ar
> 11 9 10 1
> @<TRIPOS>SUBSTRUCTURE
> 1 noname 1"""
>
> indole_fixed = """@<TRIPOS>MOLECULE
> MVSketch_Indole
> 10 11 1
> SMALL
> NO_CHARGES
> @<TRIPOS>ATOM
> 1 C1 38.6029 -19.6265 0.0000 C.2 1 noname
> 2 C2 38.6029 -21.1665 0.0000 C.2 1 noname
> 3 C3 37.2692 -21.9365 0.0000 C.2 1 noname
> 4 C4 35.9356 -21.1665 0.0000 C.2 1 noname
> 5 C5 35.9356 -19.6265 0.0000 C.2 1 noname
> 6 C6 37.2692 -18.8565 0.0000 C.2 1 noname
> 7 C7 34.4709 -21.6424 0.0000 C.2 1 noname
> 8 C8 33.5657 -20.3965 0.0000 C.2 1 noname
> 9 N1 34.4709 -19.1506 0.0000 N.2 1 noname
> 10 H1 33.9950 -17.6860 0.0000 H 1 noname
> @<TRIPOS>BOND
> 1 1 2 ar
> 2 2 3 ar
> 3 3 4 ar
> 4 5 6 ar
> 5 1 6 ar
> 6 4 7 ar
> 7 5 4 ar
> 8 5 9 ar
> 9 7 8 ar
> 10 8 9 ar
> 11 9 10 1
> @<TRIPOS>SUBSTRUCTURE
> 1 noname 1"""
>
> print(Chem.MolFromMol2Block(indole_broken))
> print(Chem.MolFromMol2Block(indole_fixed))
> print(Chem.MolToSmiles(Chem.MolFromMol2Block(indole_fixed)))
> # [c]1[nH]c2c([c]1)[c][c][c][c]2
>
> Any comments?
>
> # Please
>
> Many thanks,
>
> -
> Jean-Paul Ebejer
> Early Stage Researcher
>
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss