Hello. I posted the attached long message a couple of weeks ago and I have had no reply at all, not even a comment. Is the subject completely uninteresting?
On Fri, 17-06-2011 at 09:58 +0200, Miguel Quirós Olozábal wrote:
> Hello.
>
> I am working on the Crystallography Open Database
> (www.crystallography.net), a collection that now contains more than
> 140000 CIF files of small-molecule crystal structures. We are using
> openbabel to perform substructure searches on the database; at present
> such a search is already available for a fraction (~26000) of the
> database.
>
> Openbabel is used in two stages: first to extract the chemical
> connectivity from the CIFs and store it in SMILES format, and then to
> perform the actual search of a SMARTS pattern in the SMILES
> collection.
>
> Openbabel works very well in both stages when working with purely
> organic compounds but poses a lot of problems when dealing with
> metal-organic compounds. The main reason is that atoms like N,
> O, P, S and C bind metals in many cases through non-bonding pairs and
> thus form more bonds (in the most frequent case, one more bond)
> than expected from their "organic" valence. Openbabel tries to
> keep the "organic" valence for these atoms at all costs, and this
> produces results with chemically unreasonable bond orders, unrealistic
> losses of aromaticity, artificial radicals, spurious
> hydrogen atoms, etc. Moreover, the situation is worse in version 2.3.0
> than it was in 2.2.3. In fact, I keep using 2.2.3 for my work.
>
> To illustrate the problem, I will focus on a frequent example:
> metal complexes with ligands bound through an aromatic nitrogen
> (pyridine and related compounds). Openbabel does not like to consider
> more links to this nitrogen than the two adjacent aromatic carbons and
> yields unreasonable results. I think that the correct SMILES for a
> pyridine complex would be, say for copper, [Cu][n]1ccccc1. The brackets
> around the "n" are necessary to indicate that the nitrogen has a valence
> different from its SMILES standard valences of 3 or 5.
> I am attaching three examples,
> one containing pyridine (Cupyr4.cif), another containing bipyridine
> (Cubpy.cif) and a third containing phenanthroline (Cuphen.cif).
>
> When getting the SMILES with version 2.3.0 I get:
>
> babel --title "" Cupyr4.cif -osmi
> [Cu](OC(=O)c1c(cc(cc1c1ccccc1)C(=O)O)c1ccccc1)(N1C=CCC=C1)(N1C=CCC=C1)(N1C=CCC=C1)N1C=CCC=C1.c1(c(cc(cc1c1ccccc1)C(=O)O)c1ccccc1)C(=O)[O].O.O
>
> babel --title "" Cubpy.cif -osmi
> [Cu]12(OC(=O)CC(=O)O1)N1C=CCC=C1C1=CCC=CN21.O.O
>
> babel --title "" Cuphen.cif -osmi
> [Cu]12(OC(=O)CC(=O)O1)n1cccc3c(c(c4cccn2c4c13)C)C.O.O
>
> As can be seen, the results are wrong: the aromaticity is broken for
> pyridine and bipyridine, and one of the CH groups of the ring is
> transformed into a CH2. For phenanthroline aromaticity is kept but openbabel
> "forgets" the brackets around the "n", probably because it is possible
> to write a Kekulé form of phenanthroline keeping single bonds from the
> nitrogens to the adjacent carbons. With version 2.2.3, the situation was
> less bad:
>
> babel --title "" Cupyr4.cif -osmi
> [Cu](OC(=O)c1c(cc(cc1c1ccccc1)C(=O)O)c1ccccc1)([n]1ccccc1)([n]1ccccc1)([n]1ccccc1)[n]1ccccc1.c1(c(cc(cc1c1ccccc1)C(=O)O)c1ccccc1)C(=O)[O].O.O
>
> babel --title "" Cubpy.cif -osmi
> [Cu]12(=[n]3c(=c4n2cccc4)cccc3)OC(=O)CC(=O)O1.O.O
>
> babel --title "" Cuphen.cif -osmi
> [Cu]12(OC(=O)CC(=O)O1)[n]1cccc3c(c(c4ccc[n]2c4c13)C)C.O.O
>
> Thus, correct results for pyridine and phenanthroline, but for bipyridine
> an exotic double bond is included for one of the N-Cu links (thus giving
> valence 5 for this nitrogen) and a spurious double bond is also
> introduced between the two aromatic rings (that bond is single for sure)
> to keep valence 3 for the other nitrogen.
>
> Pyridine-like ligands are only one of many examples.
> Many compounds with a
> metal-N bond where N belongs to an R-CH=N-R group usually end up with
> altered bond orders in the ligand (or added H atoms, or
> radicals) to keep valence 3 for the N. Carbonyl oxygen
> bound to metals is another example ...
>
> On the other hand, when the metal is bound to an anionic heteroatom,
> the results are usually correct, since in this case the atom satisfies the
> "organic" valence. For example, see the attached Zncarboxylate.cif file
> with Zn bound to anionic oxygen:
>
> babel --title "" Zncarboxylate.cif -osmi
> [Zn](Cl)(Cl)(OC(=O)c1cc(N(=O)=O)cc(N(=O)=O)c1)OC(=O)c1cc(N(=O)=O)cc(N(=O)=O)c1.[NH2+]1CCCCC1.[NH2+]1CCCCC1
>
>
> In delocalized situations, where there is one bond with an anionic
> heteroatom and another bond with a neutral one, the results are not
> correct either. A typical example is acetylacetonate complexes, for
> example the attached file Mnacac.cif
>
> With 2.3.0:
> babel --title "" Mnacac.cif -osmi
> [Mn]12([N](CCO)(C)C)(O[C](C)C=C(C)O1)O[C](C)C=C(C)O2
>
> With 2.2.3:
> babel --title "" Mnacac.cif -osmi
> [Mn]12([N](CCO)(C)C)(O[C@@H](C)C=C(C)O1)O[C@H](C)C=C(C)O2
>
> To keep valence 2 at oxygen, version 2.3.0 creates a radical on one of
> the C atoms and version 2.2.3 adds a spurious H atom, putting chirality
> on a planar C atom. The correct result should be, in my opinion,
> [Mn]1[O]=C(C)C=C(C)O1 (of course, an equivalent resonance form may be
> written interchanging the "neutral" and "anionic" roles of the oxygens).
>
> I have also attached an example with phosphorus, PdPPh3.cif
>
> With 2.3.0:
> babel --title "" PdPPh3.cif -osmi
> [Pt](Cl)(Cl)([P@](c1ccccc1)(c1ccccc1)c1ccccc1)[P@@](c1ccccc1)(c1ccccc1)c1ccccc1.[Pt](Cl)(Cl)([P@@](c1ccccc1)(c1ccccc1)c1ccccc1)[P@](c1ccccc1)(c1ccccc1)c1ccccc1
>
> With 2.2.3:
> babel22 --title "" PdPPh3.cif -osmi
> [Pt](Cl)(Cl)(P(c1ccccc1)(c1ccccc1)c1ccccc1)P(c1ccccc1)(c1ccccc1)c1ccccc1.[Pt](Cl)(Cl)(P(c1ccccc1)(c1ccccc1)c1ccccc1)P(c1ccccc1)(c1ccccc1)c1ccccc1
>
> As you see, 2.3.0 adds chiral marks on phosphorus (obviously spurious
> since it is linked to three identical moieties) whereas 2.2.3 "forgets"
> the brackets; "P" should be replaced by "[P]". Brackets are necessary
> because the SMILES specification establishes that a P without brackets has
> valence 3 or 5; in this case the absence of brackets would mean that an
> H atom is bound to the phosphorus, which is not the case.
>
> I could also give examples with bonds to carbon (organometallics) but the
> variability of situations is very large in these cases.
>
> I am fixing (mostly by hand) most of these inaccuracies when building
> the parallel SMILES database, but the problem appears again when
> performing the actual search: apparently openbabel "fixes" my
> manipulated SMILES and recreates the wrong connectivities before
> searching.
>
> Well, in my opinion, when defining bond orders and aromaticity openbabel
> should not treat metal-(N,O,P,S,C, ...) bonds in the same way as
> usual organic bonds. I suggest the following strategy.
>
> - First, ignore bonds to metals when establishing bond orders and
> aromaticity. The atoms bound in anionic form may appear as radicals at
> this stage.
>
> - Then add all bonds to metals, but without changing the bond orders or
> aromaticity found in the previous stage and without adding/deleting
> hydrogen atoms.
>
> - Finally, put brackets around atoms that do not show their standard
> valence.
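
An inline note on the strategy above: the first and last steps can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not openbabel code. The atom and bond lists, the short METALS set and the helper names split_metal_bonds and needs_brackets are all made up for the example. The bond orders of the organic part are assumed to be already known (the perception itself is not implemented), the Cu-N bond is assumed to be a single bond, and the valence table is the "organic subset" of the SMILES specification.

```python
# Normal valences of the SMILES "organic subset" (atoms that may be written
# without brackets).
ORGANIC_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}

# Hypothetical, deliberately short metal list; a real tool would cover all metals.
METALS = {"Cu", "Zn", "Mn", "Pt", "Pd"}


def split_metal_bonds(elements, bonds):
    """Step 1: separate the bonds that touch a metal from the organic bonds."""
    organic, metal = [], []
    for i, j, order in bonds:
        if elements[i] in METALS or elements[j] in METALS:
            metal.append((i, j, order))
        else:
            organic.append((i, j, order))
    return organic, metal


def needs_brackets(elements, bonds, hydrogens):
    """Step 3: flag atoms whose total valence is not a standard SMILES valence."""
    total = list(hydrogens)  # start from the attached hydrogens
    for i, j, order in bonds:
        total[i] += order
        total[j] += order
    return [
        el not in ORGANIC_VALENCES or total[k] not in ORGANIC_VALENCES[el]
        for k, el in enumerate(elements)
    ]


# Pyridine bound to copper, with the ring written in one Kekulé form.
elements = ["Cu", "N", "C", "C", "C", "C", "C"]
hydrogens = [0, 0, 1, 1, 1, 1, 1]
bonds = [
    (0, 1, 1),  # Cu-N, assumed single
    (1, 2, 2), (2, 3, 1), (3, 4, 2), (4, 5, 1), (5, 6, 2), (6, 1, 1),
]

organic, metal = split_metal_bonds(elements, bonds)
# Step 2 is just "organic + metal": the metal bonds are added back without
# touching the bond orders or hydrogens of the organic part.
flags = needs_brackets(elements, organic + metal, hydrogens)
for element, flag in zip(elements, flags):
    print(element, "needs brackets" if flag else "plain")
```

With the Cu-N bond counted, the nitrogen has a total valence of 4, which is neither 3 nor 5, so it is flagged and would be written as [n], as in [Cu][n]1ccccc1. The ring carbons keep valence 4 and stay unbracketed.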
>
> I am just a chemist, not a software developer, so I cannot help with
> developing code for this. The task is probably quite difficult, but I
> think that the performance of openbabel in the "inorganic realm" would
> improve a lot, and at least this should be included in the "to do" list
> (rather long, I imagine).
>
> Thanks for your attention (or even just for reading this long message).
>
> Best wishes,
> Miguel Quirós

-- 
Miguel Quirós Olozábal
Departamento de Química Inorgánica. Facultad de Ciencias.
Universidad de Granada. 18071 Granada. SPAIN.
email: mquiros<at>ugr<dot>es
       mquiros<arroba>ugr<punto>es

_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss