Following up on this, I did some tests that indicate to me that changes to
the mol2 reader have led to a general inability to read neutral aromatic
heterocycles. These problems were not present in openbabel 2.3.1.
To look into mol2 files and aromatic heterocycles, I decided I should use
VEHICLe
http://chembl.blogspot.com/2010/04/vehicle-virtual-exploratory.html
Using the file downloaded from there with 24,867 smiles, I decided to look
into the preservation of canonical smiles when going through mol2 format.
$ tail -n +2 VEHICLe.csv | cut -f 2 -d ',' \
    | ./bin/obabel -ismi -ocan | awk '{print $1}' | tee VEHICLe.can \
    | ./bin/obabel -ican -omol2 | tee VEHICLe.mol2 \
    | ./bin/obabel -imol2 -ocan | awk '{print $1}' > VEHICLe_roundtripped.can
Then, count the number of times the canonical smiles was changed by going
through the mol2 format:
$ paste VEHICLe.can VEHICLe_roundtripped.can | awk '$1 != $2' | wc -l
Results for three versions of openbabel:
2.3.1 : 1,468
2.3.2 : 21,786
master: 21,933 (88.2% failure rate)
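For anyone repeating the comparison, here is a minimal sketch that reports the changed count together with the percentage in one step. The function name roundtrip_stats is hypothetical, and the sketch assumes both files hold one smiles per line in the same order (a molecule that fails to be read would shift the lines and inflate the count).

```shell
#!/bin/sh
# Hypothetical helper, not part of openbabel.
# roundtrip_stats ORIGINAL ROUNDTRIPPED
# Assumes one entry per line and identical ordering in both files.
roundtrip_stats() {
    # paste joins line N of each file with a tab; awk compares the halves.
    paste "$1" "$2" | awk -F '\t' '
        $1 != $2 { changed++ }
        END {
            if (NR == 0) { print "0 of 0 changed"; exit }
            printf "%d of %d changed (%.1f%%)\n", changed, NR, 100 * changed / NR
        }'
}
```

It would be called as: roundtrip_stats VEHICLe.can VEHICLe_roundtripped.can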
The intermediate mol2 files in all three cases are identical, so this major
regression is isolated to converting mol2 -> canonical smiles.
Just to make sure the problem was not in writing canonical smiles, I did
another test: skip the format conversion and instead run -ican -ocan, so
the smiles only pass through openbabel's internal representation.
For all three versions, this test resulted in 1,522 cases of the canonical
smiles being changed. In 2.3.1, all 1,468 cases with changes going through
mol2 are among the 1,522 that change when you just read and write canonical
smiles. This suggests to me that these 1,468 errors should be blamed on
other parts of openbabel. Briefly looking at some of these cases, it
appears they have lost their aromaticity in the can->mol2 transition.
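The subset claim above can be checked mechanically. This is a sketch under assumptions: the function names are hypothetical, and it presumes the can -> can output was saved to its own file, one smiles per line, in the same order as VEHICLe.can.

```shell
#!/bin/sh
# Hypothetical helpers, not part of openbabel.

# changed_inputs ORIGINAL ROUNDTRIPPED
# Print, sorted and de-duplicated, the original entries whose
# round-tripped form differs.
changed_inputs() {
    paste "$1" "$2" | awk -F '\t' '$1 != $2 { print $1 }' | LC_ALL=C sort -u
}

# not_explained MOL2_CHANGED CAN_CHANGED
# Print entries changed by the mol2 round trip that are NOT changed by
# the can -> can control. Both inputs must come from changed_inputs,
# so that they are sorted in the same (C locale) order.
not_explained() {
    LC_ALL=C comm -23 "$1" "$2"
}
```

If not_explained prints nothing, every mol2 failure is also a can -> can failure, which is the situation described for 2.3.1.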
Since the behavior of directly going can->can did not change in between
versions, nor did the intermediate mol2 files during can->mol2->can, the
increase in failures by 20,000 (out of a set of ~25,000 aromatic
heterocycles) is likely due to changes in the mol2 reader.
-David
On Wed, Oct 9, 2013 at 8:34 AM, David Hall <li...@cowsandmilk.net> wrote:
> Hi all,
>
> I just wanted to mention a bug I filed this morning:
>
> https://sourceforge.net/p/openbabel/bugs/897/
>
> Basically this patch to the mol2 reader:
>
> https://github.com/openbabel/openbabel/commit/097636fd7cdba1c842d27a80ccab809c558d0b98#diff-d6e9941b72192e2ba1a2d244948450ae
>
> breaks aromaticity for 1,5,6,7-Tetrahydro-4-indolone. That said, it does
> fix aromaticity for pyridinium, which was its purpose.
>
> Any help or insight into fixing would be appreciated.
>
> Thanks,
> David
>
_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss