Hello.

Sorry it has taken me so long (almost a month!) to continue with this
thread and to put together a list of files with representative examples
illustrating OpenBabel's performance on inorganic compounds; until now I
had not found the time to do it. As you will see (I warn you!), the
message is very long even though it deals with only nine examples.

I have selected CIF files from the Crystallography Open Database (COD),
trying to focus each example on a single, specific problem, and avoiding
both examples with several simultaneous problems and files with
"crystallographic" problems (symmetry, disorder, poor or incomplete
data, ill-formatted files, ...). The CIF files used for the tests can be
downloaded from COD at http://www.crystallography.net/xxxxxxx.cif, where
xxxxxxx is the numeric identifier of each structure in COD.

For each file, I run the command "babel -aB xxxxxxx.cif -osmi" with
both openbabel 2.2.3 and openbabel 2.3.2.

The "-aB" flag is used to include in the molecular model all the bonds
listed by the authors in the CIF file, since without it openbabel
sometimes leaves some bonds out (I think the maximum number of bonds
allowed without the "-aB" flag is too low in many cases). The drawback
of this approach is that, in some cases, the authors list distances in
the CIF that are interesting to them but are not "bonds", which
introduces spurious bonds into the result; in most cases, however, the
"-aB" option gives more satisfactory results. This applies only to
version 2.2.3: version 2.3.2 seems to ignore the "-aB" flag.

After quoting the openbabel output, I indicate what I think the correct
result should be, bearing in mind that the definitions of "bond" and
"bond order" are not as well established in inorganic chemistry as in
organic chemistry, so some of "my" representations may be debatable
(sub judice). I also add a short comment on the result of each example.

2008819.cif (a pyridine complex).
Babel 2.2.3 output: [Os](Cl)(F)(F)([n]1ccccc1)([n]1ccccc1)[n]1ccccc1
Babel 2.3.2 output: [Os](Cl)(F)(F)(N1C=CCC=C1)(N1C=CCC=C1)N1C=CCC=C1
Expected: [Os](Cl)(F)(F)([n]1ccccc1)([n]1ccccc1)[n]1ccccc1
* Version 2.2.3 gets it right, whereas version 2.3.2 "dearomatizes" the
pyridine and adds a spurious H atom to one carbon atom of each ring,
trying to keep valence 3 for nitrogen at all costs.

2227419.cif (a bipyridine complex).
Babel 2.2.3 output: [Pt]1(Br)(Br)(Br)(Br)[n]2ccccc2c2[n]1cccc2
Babel 2.3.2 output: [Pt]1(Br)(Br)(Br)(Br)N2C=CCC=C2[C@@H]2N1C=CC=C2
Expected: [Pt]1(Br)(Br)(Br)(Br)[n]2ccccc2c2[n]1cccc2
* Similar to the previous example. In addition, the spurious H atom that
2.3.2 introduces in one of the rings creates an imaginary stereocentre.

2223192.cif (a phenanthroline complex).
Babel 2.2.3 output: [Mo]1(F)(F)([O])([O])[n]2cccc3c2c2[n]1cccc2cc3
Babel 2.3.2 output: [Mo]1(F)(F)([O])([O])n2cccc3c2c2n1cccc2cc3
Expected: [Mo]1(F)(F)(=O)(=O)[n]2cccc3c2c2[n]1cccc2cc3
* Version 2.2.3 gets the phenanthroline right, while 2.3.2 drops the
brackets on the nitrogen atoms. I think the brackets are necessary,
since nitrogen is not using its standard valence. Still, no spurious H
atoms are added in this case. Neither version regards the
molybdenum-oxygen bonds as double; this is probably quite difficult to
detect.

8100257.cif (a phosphane complex).
Babel output (both versions):
[Ru](Cl)(Cl)(P(c1ccccc1)(c1ccccc1)c1ccccc1)(P(c1ccccc1)(c1ccccc1)c1ccccc1)P(c1ccccc1)(c1ccccc1)c1ccccc1
Expected:
[Ru](Cl)(Cl)([P](c1ccccc1)(c1ccccc1)c1ccccc1)([P](c1ccccc1)(c1ccccc1)c1ccccc1)[P](c1ccccc1)(c1ccccc1)c1ccccc1
* The brackets on phosphorus are required. In the SMILES specification,
the standard valences of phosphorus are 3 and 5, so omitting the
brackets on a phosphorus with four bonds means adding a spurious
implicit H atom to each phosphorus.
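The rule behind this can be made concrete. The Python sketch below encodes the default valences of the organic subset as given in the OpenSMILES specification and computes the implicit hydrogen count of an unbracketed, non-aromatic atom from the sum of its bond orders; aromatic atoms, charges and isotopes are deliberately left out. The name `implicit_hydrogens` is hypothetical.

```python
# Default ("normal") valences of the SMILES organic subset, per OpenSMILES.
# Atoms written without brackets get implicit H up to the smallest of these
# that is >= the sum of their explicit bond orders; brackets switch this off.
DEFAULT_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(element, bond_order_sum):
    """Implicit H count for an unbracketed, non-aromatic organic-subset atom."""
    for valence in DEFAULT_VALENCES[element]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0  # bond orders exceed every default valence: no implicit H
```

For the phosphane complex, each P has one bond to Ru and three to phenyl carbons, so `implicit_hydrogens("P", 4)` returns 1: an unbracketed P is read as P-H, whereas `[P]` carries no implicit hydrogens. The same rule explains the carborane round trip further down: an unbracketed B with five cage bonds gets `implicit_hydrogens("B", 5) == 0`, so its hydrogen silently disappears.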

7007515.cif (an acetylacetonato complex).
Babel 2.2.3 output: [Pb]12(O[C@H](C)C=C(C)O1)O[C@@H](C)C=C(C)O2
Babel 2.3.2 output: [Pb@]12(O[C](C)C=C(C)O1)O[C](C)C=C(C)O2
Expected: [Pb]12([O]=C(C)C=C(C)O1)[O]=C(C)C=C(C)O2
* Both versions try to keep valence two for all oxygen atoms: 2.2.3 adds
spurious H atoms (with invented chirality) to one of the C atoms
attached to oxygen in each ligand, whereas 2.3.2 treats these C atoms as
trivalent radical centres. I think the best representation is to write
one of the two possible resonance forms of acetylacetonate, with C=O on
one side and C-O on the other.

2228718.cif (an imino complex).
Babel 2.2.3 output: [Cu@]12(Cl)Oc3ccccc3[C@H](N1CC[N]12CCOCC1)C
Babel 2.3.2 output: [Cu@]12(Cl)Oc3ccccc3[C](N1CC[N]12CCOCC1)C
Expected: [Cu]12(Cl)Oc3ccccc3C(=[N]1CC[N]12CCOCC1)C
* Similar to the previous example. Trying to keep valence 3 for the
imino N atom, 2.2.3 adds a spurious H atom and 2.3.2 turns the imino
carbon into a radical. The chirality mark at Cu is correct for an
individual molecule, but the crystal is racemic, so it is more correct
to remove it.
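Removing chirality marks from a SMILES string is a purely textual step once the chemical decision (here, the racemic crystal) has been made. A minimal Python sketch, assuming the chirality classes allowed inside brackets by OpenSMILES (`@`, `@@`, `@TH`, `@AL`, `@SP`, `@TB`, `@OH`); the name `strip_chirality` is hypothetical. It only drops the marks: it does nothing about the spurious H atoms themselves.

```python
import re

# Chirality marks that may appear inside a bracket atom: @, @@ and the
# explicit @TH1/2, @AL1/2, @SP1-3, @TB1-20 and @OH1-30 forms.
_CHIRAL = re.compile(r"@(?:@|TH[12]|AL[12]|SP[1-3]|TB\d{1,2}|OH\d{1,2})?")
_BRACKET_ATOM = re.compile(r"\[[^\]]+\]")


def strip_chirality(smiles):
    """Remove every chirality mark, leaving the rest of each bracket atom intact."""
    return _BRACKET_ATOM.sub(lambda m: _CHIRAL.sub("", m.group()), smiles)
```

Applied to the 2.2.3 output for 2228718, `[Cu@]` becomes `[Cu]` and `[C@H]` becomes `[CH]`; unbracketed atoms are left untouched.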

7105215.cif (an ether complex).
Babel 2.2.3 output:
[Th]12([Cl-])(Cl)([Cl-])(Cl)([O@H](C)CC[O@H]1C)[O@H](C)CC[O@H]2C
Babel 2.3.2 output: [Th](Cl)Cl.[Cl-].[Cl-].O(C)CCOC.O(C)CCOC
Expected: [Th]12(Cl)(Cl)(Cl)(Cl)([O](C)CC[O]1C)[O](C)CC[O]2C
* Version 2.2.3 binds spurious H atoms to oxygen (why is an oxygen with
valence 4 preferred to one with valence 3??). Also, two chlorides
appear as "Cl" and the other two as "[Cl-]". Version 2.3.2 ignores some
Th-Cl and Th-O bonds, leaving disconnected moieties: probably the Th-Cl
and Th-O distances are too long for Openbabel to consider them "bonds"
(Th is a large atom!!), and it clearly ignores the "-aB" flag. Running
2.2.3 without the "-aB" flag gives the same result as 2.3.2, whether or
not "-aB" is given to the latter.
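The guess above, that the Th-O and Th-Cl contacts are too long to be perceived, can be tested against a distance criterion of the form d <= r_a + r_b + tolerance. The Python sketch below assumes that form and a 0.45 Å tolerance; it does not claim to reproduce openbabel's exact rule or its radii table, and the radii are supplied by the caller. The names `distance_bonded` and `missed_bonds` are hypothetical.

```python
def distance_bonded(distance, radius_a, radius_b, tolerance=0.45):
    """Assumed rule: bonded if d <= r_a + r_b + tolerance (all in angstrom)."""
    return distance <= radius_a + radius_b + tolerance


def missed_bonds(listed, radii, tolerance=0.45):
    """Listed (label_a, label_b, distance) bonds rejected by the distance rule.

    `radii` maps each atom label to a covalent radius chosen by the caller;
    no real element radii are built in.
    """
    return [(a, b, d) for a, b, d in listed
            if not distance_bonded(d, radii[a], radii[b], tolerance)]
```

If the radius used for a large metal is too small, or the tolerance too tight, long but genuine M-O and M-Cl contacts end up in the `missed_bonds` list; feeding it the bonds listed in the CIF shows exactly which ones.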

2004668.cif (a closo carborane).
Babel 2.2.3 output:
[C]1234([CH]567B891B1%102B2%113B345B45%11B%11%102B291B168B734B5%1121)c1ccccc1
Babel 2.3.2 output:
[C@]12([C@H]3[B@H]4BB[B@@H]1[B@H]1[B@@H]2BB[B@@H]3[B@@H]41)c1ccccc1
Expected:
[C]1234([CH]567[BH]891[BH]1%102[BH]2%113[BH]345[BH]45%11[BH]%11%102[BH]291[BH]168[BH]734[BH]5%1121)c1ccccc1
* Boron and carbon are not using their standard valence (both form 6
bonds!), so brackets and an explicit hydrogen count are compulsory. In
version 2.3.2 many edges of the icosahedron are missing, even though all
30 bonds are listed in the CIF file: apparently, babel 2.3.2 limits C
and B to four bonds and, once again, ignores the "-aB" flag.

1504361.cif (dimethylferrocene).
Babel 2.2.3 output:
[Fe]12345678([CH]9=[CH]1[C]2(=[CH]4[C@@H]59)C)[C]1(=[CH]6[C@H]7[CH]8=[CH]31)C
Babel 2.3.2 output: [Fe]([C@H]1C=CC(=C1)C)[C@H]1C=C(C=C1)C
Expected:
[Fe]12345678([cH]9[cH]1[c]2([cH]4[cH]59)C)[c]1([cH]6[cH]7[cH]8[cH]31)C
* 2.2.3 gives a sort of "kekulized" version of ferrocene that is not too
bad, except perhaps that it treats one carbon of each ring as sp3 and
marks its chirality (strictly speaking, the carbon atoms not bearing the
methyl could be regarded as "asymmetric"). 2.3.2 again ignores the "-aB"
flag and links iron to each ring through only one C atom, which is
certainly wrong.

Another test I have made is to use supposedly "correct" SMILES strings
(namely those listed as "expected" in the preceding paragraphs) as input
and check whether the string remains unchanged after passing through
openbabel (command babel xxxxxxx.smi -osmi, where the input file
contains the SMILES string). The expected result is input = output.
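This round-trip test is easy to script. A minimal Python sketch follows, assuming the `babel` executable is on PATH, that it is called as above (`babel file.smi -osmi`), and that the first whitespace-separated field of its standard output is the SMILES string. The converter is passed in as a function, so the comparison can be exercised without openbabel installed. The names `babel_smiles` and `roundtrip_report` are hypothetical.

```python
import os
import subprocess
import tempfile


def babel_smiles(smiles, exe="babel"):
    """Run `<exe> input.smi -osmi` on one SMILES and return the output SMILES.

    Assumptions: the executable is on PATH and writes 'SMILES<tab>title'
    lines to stdout, with status messages on stderr.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "input.smi")
        with open(path, "w") as handle:
            handle.write(smiles + "\n")
        result = subprocess.run([exe, path, "-osmi"],
                                capture_output=True, text=True, check=True)
    fields = result.stdout.split()
    return fields[0] if fields else ""


def roundtrip_report(cases, convert=babel_smiles):
    """Map each case id to (input, output, unchanged) for input = output tests."""
    report = {}
    for case_id, smiles in cases.items():
        output = convert(smiles)
        report[case_id] = (smiles, output, output == smiles)
    return report
```

A call such as `roundtrip_report({"7007515": "[Pb]12([O]=C(C)C=C(C)O1)[O]=C(C)C=C(C)O2"})` would run the real converter. Exact string equality is stricter than needed: an output that only reorders atoms (as 2.2.3 does with the molybdenum complex) is also reported as changed, so flagged cases still need a look by eye.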

2008819 (a pyridine complex).
Input: [Os](Cl)(F)(F)([n]1ccccc1)([n]1ccccc1)[n]1ccccc1
Babel 2.2.3 output: [Os](Cl)(F)(F)([n]1ccccc1)([n]1ccccc1)[n]1ccccc1
Babel 2.3.2 output: [Os](Cl)(F)(F)(N1CCCCC1)(N1CCCCC1)N1CCCCC1
* 2.3.2 fully dearomatizes the pyridine and converts every CH into CH2.
2.2.3 is OK.

2227419 (a bipyridine complex).
Input: [Pt]1(Br)(Br)(Br)(Br)[n]2ccccc2c2[n]1cccc2
Babel 2.2.3 output: [Pt]1(Br)(Br)(Br)(Br)[n]2ccccc2c2[n]1cccc2
Babel 2.3.2 output: [Pt]1(Br)(Br)(Br)(Br)N2CCCCC2C2N1CCCC2
* Same comment as for the previous example.

2223192 (a phenanthroline complex).
Input: [Mo]1(F)(F)(=O)(=O)[n]2cccc3c2c2[n]1cccc2cc3
Babel 2.2.3 output: [Mo]1(=O)(=O)(F)(F)[n]2cccc3c2c2[n]1cccc2cc3
Babel 2.3.2 output: [Mo]1(=O)(=O)(F)(F)n2cccc3c2c2n1cccc2cc3
* 2.2.3 is OK. 2.3.2 insists on removing the brackets.
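Since the recurring failure in these round trips is lost brackets, a check focused on that alone may be handier than plain string comparison. The Python sketch below (hypothetical name `lost_brackets`) counts bracket atoms in input and output as multisets, so atom reordering, as in the 2.2.3 output above, does not matter. Any change inside a bracket (a different chirality mark or H count, for instance) would also show up as "lost".

```python
import re
from collections import Counter

_BRACKET_ATOM = re.compile(r"\[[^\]]+\]")


def lost_brackets(smiles_in, smiles_out):
    """Bracket atoms present in the input but missing from the output.

    Multiset difference: order is ignored, counts are kept.
    """
    before = Counter(_BRACKET_ATOM.findall(smiles_in))
    after = Counter(_BRACKET_ATOM.findall(smiles_out))
    return before - after
```

For the phenanthroline complex, the 2.3.2 output loses both `[n]` atoms, while the 2.2.3 output loses none.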

8100257 (a phosphane complex).
Input:
[Ru](Cl)(Cl)([P](c1ccccc1)(c1ccccc1)c1ccccc1)([P](c1ccccc1)(c1ccccc1)c1ccccc1)[P](c1ccccc1)(c1ccccc1)c1ccccc1
Babel output (both versions):
[Ru](Cl)(Cl)(P(c1ccccc1)(c1ccccc1)c1ccccc1)(P(c1ccccc1)(c1ccccc1)c1ccccc1)P(c1ccccc1)(c1ccccc1)c1ccccc1
* Babel insists on removing the brackets and hence adds spurious
hydrogens.

7007515 (an acetylacetonato complex).
Input: [Pb]12([O]=C(C)C=C(C)O1)[O]=C(C)C=C(C)O2
* Both versions leave this string unchanged, so this example is OK.

2228718 (an imino complex).
Input: [Cu]12(Cl)Oc3ccccc3C(=[N]1CC[N]12CCOCC1)C
* Both versions leave this string unchanged, so this example is OK.

7105215 (an ether complex).
Input: [Th]12(Cl)(Cl)(Cl)(Cl)([O](C)CC[O]1C)[O](C)CC[O]2C
* Both versions leave this string unchanged, so this example is OK.

2004668 (a closo carborane).
Input:
[C]1234([CH]567[BH]891[BH]1%102[BH]2%113[BH]345[BH]45%11[BH]%11%102[BH]291[BH]168[BH]734[BH]5%1121)c1ccccc1
Babel 2.2.3 output:
[C]1234([CH]567B891B1%102B2%113B345B452B21%11B18%10B869B734B5218)c1ccccc1
Babel 2.3.2 output:
[C]1234([CH]567[BH]891[BH]1%102[BH]2%113[BH]345[BH]452[BH]21%11[BH]18%10[BH]869[BH]734[BH]5218)c1ccccc1
* 2.2.3 removes the brackets and the H atoms from boron, which is not
correct. 2.3.2 gets it right; this is the only test in which 2.3.2
performed better than 2.2.3.

1504361 (dimethylferrocene).
Input:
[Fe]12345678([cH]9[cH]1[c]2([cH]4[cH]59)C)[c]1([cH]6[cH]7[cH]8[cH]31)C
Babel output (both versions): [Fe]12345678(C9C1C2(C3C49)C)C1(C5C6C7C81)C
* Babel simply treats the carbons as non-aromatic; since, counting
hydrogens, each carbon forms four bonds, the output at least keeps the
hydrogen count correct. Looking at the SMILES specification (can
ferrocene be considered "computationally non-aromatic" even though it is
certainly aromatic chemically??), I cannot tell whether the output is
correct or not.

I think these examples are a fairly representative sample for studying
and trying to improve the performance of openbabel with inorganic
compounds. Thanks a lot for your interest in this subject.

Best wishes,
Miguel Quirós


On Thu, 19-12-2013 at 10:17 +0100, Miguel Quirós Olozábal wrote:
> Thanks a lot for your message.
> 
> It will probably take some time to prepare such a set. I would like to
> include files that each contain a single, different problem, with
> compounds as simple as possible and without purely crystallographic
> problems (molecules on symmetry elements, disorder, ...). The aim is
> to avoid several problems of different nature occurring in the same
> file.
> The set should be representative of the most frequently found problems
> and also not too large.
> 
> When I am satisfied with the selection, I will forward it to you.
> 
> Best wishes,
> Miguel Quirós
> 
> 
> On Wed, 18-12-2013 at 10:06 -0500, Geoffrey Hutchison wrote:
> > That's a pretty bad regression, and I will investigate the two examples you 
> > sent.
> > 
> > Certainly if you can prepare a test set (in whatever format) that would be 
> > extremely helpful, since it could be added as a unit test. Not only would 
> > this ensure all such examples will be fixed, but future versions will need 
> > to ensure they pass.
> > 
> > I'd actually be very interested in such a set for other reasons, since the 
> > gen3d builder and other parts of the code (UFF) need similar testing on 
> > inorganic compounds.
> > 
> > You can either send the CIFs to me personally, or provide entries into the 
> > COD, since I can script the downloads.
> > 
> > As I said before, I really want to know of these types of bugs (inorganics, 
> > but also any type of changes from one release to another).
> > 
> > Thanks,
> > Geoff
> > 
> > > On Dec 14, 2013, at 1:26 AM, mquiros <mqui...@ugr.es> wrote:
> > > On 13/12/2013 22:16, Geoffrey Hutchison wrote:
> > >>> I need to review and, in most cases, fix the SMILES chains coming out
> > >>> from OpenBabel for inorganic compounds (either manually or
> > >>> semiautomatically). I am also stuck with version 2.2.3 because versions
> > >>> newer than this perform worse for inorganic compounds.
> > >> 
> > >> If you can give some bug reports or somewhat more detailed
> > >> descriptions of the problems we'd obviously really appreciate it. I
> > >> suspect many of these issues can be resolved, but if we're operating
> > >> in a vacuum, it's hard to know what bugs exist. For example, we're
> > >> firming up plans for v2.4, and obviously, I'd prefer to have improved
> > >> inorganic / organometallic support.
> > >> 
> > >> As you say, some issues are inevitable, since there is a mismatch
> > >> between inorganic / organometallic bonding and the valence bond model,
> > >> but that doesn't mean we can't aim to represent things well. For
> > >> example, the latest development code has improved support for "zero
> > >> order" bonds, including an extension to the SD file format.
> > >> 
> > >> All of these discussions are quite productive, thanks!
> > >> -Geoff
> > > 
> > > Hello.
> > > 
> > > Thanks a lot for your interest.
> > > 
> > > I think I have already provided examples in previous posts, but if you
> > > want just a couple of quick examples, here are ferrocene and a pyridine
> > > complex (tetrakispyridine copper(II) chloride) using just a SMILES -> SMILES
> > > conversion (the "inorganic problems" are not a CIF format problem but a
> > > more general one). I have prepared the following files, each with a single line:
> > > 
> > > cupyrcl.smi:
> > > [Cu]([n]1ccccc1)([n]1ccccc1)([n]1ccccc1)[n]1ccccc1.[Cl-].[Cl-]    Cupyr4Cl2
> > > 
> > > ferrocene.smi:
> > > [Fe]12345678([cH]9[cH]1[cH]2[cH]3[cH]49)[cH]1[cH]5[cH]6[cH]7[cH]18   ferrocene
> > > 
> > > If I perform "babel cupyrcl.smi -osmi" and "babel ferrocene.smi -osmi", I 
> > > expect the output to be equal to the input (or perhaps just changing the 
> > > order of atoms).
> > > 
> > > In the first example, I got it right with babel 2.2.3 but with babel 
> > > 2.3.2, the output is very wrong:
> > > [Cu](N1CCCCC1)(N1CCCCC1)(N1CCCCC1)N1CCCCC1.[Cl-].[Cl-]    Cupyr4Cl2
> > > Full conversion of pyridine into piperidinato, all CH changed to CH2. 
> > > Babel 2.3.2 wants to keep valence 3 for nitrogen even at the cost of 
> > > completely corrupting the whole heterocycle.
> > > 
> > > For ferrocene, with either version, the output is:
> > > [Fe]12345678(C9C1C2C3C49)C1C5C6C7C81   ferrocene
> > > Again dearomatization, although the hydrogen count is correct in this
> > > case. But I want the conversions for substructure searching, and any
> > > inorganic chemist looking for ferrocene derivatives will regard ferrocene
> > > as an aromatic compound, so the search will fail.
> > > 
> > > I can provide thousands of examples: metallocenes, metal carbonyls,
> > > phosphane complexes, imino complexes, boranes and carboranes,
> > > acetylacetonato complexes and many others, covering the vast
> > > majority of metal-organic and organometallic compounds. Perhaps I can
> > > prepare a set of CIF files including at least one from each of
> > > the most outstanding inorganic families to use as a test bench.
> > > 
> > > Thanks again. Best wishes,
> > > Miguel Quirós
> 

-- 
Miguel Quirós Olozábal
Departamento de Química Inorgánica. Facultad de Ciencias.
Universidad de Granada. 18071 Granada. SPAIN.
email: mquiros<at>ugr<dot>es
       mquiros<arroba>ugr<punto>es


_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
