Hello. I posted the long message below a couple of weeks ago and have
had no reply at all, not even a comment. Is the subject completely
uninteresting?


On Fri, 17-06-2011 at 09:58 +0200, Miguel Quirós Olozábal wrote:
> Hello.
> 
> I work on the Crystallography Open Database
> (www.crystallography.net), a collection that now contains more than
> 140000 CIF files of small-molecule crystal structures. We are using
> openbabel to perform substructure searches on the database; at present
> such a search is already available for a fraction (~26000) of the
> database.
> 
> Openbabel is used in two stages: first to extract the chemical
> connectivity from the CIFs and store it in SMILES format, and then to
> perform the actual search for a SMARTS pattern in the SMILES
> collection.
> 
> Openbabel works very well in both stages for purely organic compounds
> but causes many problems with metal-organic compounds. The main reason
> is that atoms like N, O, P, S and C often bind to metals through
> non-bonding pairs and thus form more bonds (in the most frequent case,
> one more bond) than expected from their "organic" valence. Openbabel
> tries to keep the "organic" valence for these atoms at all costs, and
> this produces chemically unreasonable bond orders, unrealistic losses
> of aromaticity, artificial radicals, spurious hydrogen atoms, etc.
> Moreover, the situation is worse in version 2.3.0 than it was in
> 2.2.3. In fact, I keep using 2.2.3 for my work.
> 
> To illustrate the problem, I will focus on a frequent example: metal
> complexes with ligands bound through an aromatic nitrogen (pyridine
> and related compounds). Openbabel does not like this nitrogen to have
> any links beyond the two adjacent aromatic carbons and yields
> unreasonable results. I think that the correct SMILES for a pyridine
> complex, say of copper, would be [Cu][n]1ccccc1. The brackets around
> the "n" are necessary to indicate that the nitrogen has a valence
> different from its SMILES standard values of 3 or 5. I am attaching
> three examples, one containing pyridine (Cupyr4.cif), another
> containing bipyridine (Cubpy.cif) and a third containing
> phenanthroline (Cuphen.cif).
> 
> With version 2.3.0 I get the following SMILES:
> 
> babel --title "" Cupyr4.cif -osmi
> [Cu](OC(=O)c1c(cc(cc1c1ccccc1)C(=O)O)c1ccccc1)(N1C=CCC=C1)(N1C=CCC=C1)(N1C=CCC=C1)N1C=CCC=C1.c1(c(cc(cc1c1ccccc1)C(=O)O)c1ccccc1)C(=O)[O].O.O
> 
> babel --title "" Cubpy.cif -osmi
> [Cu]12(OC(=O)CC(=O)O1)N1C=CCC=C1C1=CCC=CN21.O.O       
> 
> babel --title "" Cuphen.cif -osmi
> [Cu]12(OC(=O)CC(=O)O1)n1cccc3c(c(c4cccn2c4c13)C)C.O.O
> 
> As can be seen, the results are wrong: the aromaticity is broken for
> pyridine and bipyridine, and one CH of each ring is turned into a
> CH2. For phenanthroline the aromaticity is kept but openbabel
> "forgets" the brackets around the "n", probably because it is possible
> to write a Kekulé form of phenanthroline that keeps single bonds from
> the nitrogens to the adjacent carbons. With version 2.2.3, the
> situation was less bad:
> 
> babel --title "" Cupyr4.cif -osmi
> [Cu](OC(=O)c1c(cc(cc1c1ccccc1)C(=O)O)c1ccccc1)([n]1ccccc1)([n]1ccccc1)([n]1ccccc1)[n]1ccccc1.c1(c(cc(cc1c1ccccc1)C(=O)O)c1ccccc1)C(=O)[O].O.O
> 
> babel --title "" Cubpy.cif -osmi 
> [Cu]12(=[n]3c(=c4n2cccc4)cccc3)OC(=O)CC(=O)O1.O.O
> 
> babel --title "" Cuphen.cif -osmi
> [Cu]12(OC(=O)CC(=O)O1)[n]1cccc3c(c(c4ccc[n]2c4c13)C)C.O.O
> 
> Thus, the results are correct for pyridine and phenanthroline, but for
> bipyridine an exotic double bond is included for one of the N-Cu links
> (giving valence 5 to this nitrogen) and a spurious double bond is
> introduced between the two aromatic rings (that bond is certainly
> single) to keep valence 3 for the other nitrogen.
> 
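
A quick way to spot the loss of aromaticity described above is to count
the atoms written in lowercase (aromatic) form in each output. The
snippet below is only a minimal sketch in plain Python: it does not use
openbabel, the function name aromatic_atoms is made up for illustration,
and the simple tokenizer assumes that lowercase atoms outside brackets
come from the usual b, c, n, o, p, s set. The two strings are the
Cubpy.cif outputs quoted above.

```python
import re

# Bracket atoms such as [Cu], [n], [NH2+]; otherwise organic-subset symbols.
# Two-letter halogens come first so that "Cl" is not read as C plus "l".
ATOM = re.compile(r"\[[^\]]*\]|Cl|Br|[BCNOPSFI]|[bcnops]")

def aromatic_atoms(smiles: str) -> int:
    """Count atoms written in lowercase (aromatic) form in a SMILES string."""
    count = 0
    for token in ATOM.findall(smiles):
        symbol = token.lstrip("[0123456789")  # drop '[' and any isotope number
        if symbol[0].islower():
            count += 1
    return count

# Cubpy.cif outputs copied from the message above.
cubpy_230 = "[Cu]12(OC(=O)CC(=O)O1)N1C=CCC=C1C1=CCC=CN21.O.O"
cubpy_223 = "[Cu]12(=[n]3c(=c4n2cccc4)cccc3)OC(=O)CC(=O)O1.O.O"

print(aromatic_atoms(cubpy_230))  # 0: both rings have lost aromaticity
print(aromatic_atoms(cubpy_223))  # 12: all ring atoms of bipyridine
```

A count of 0 for a ligand known to be aromatic flags an entry that needs
checking by hand. The count says nothing about wrong bond orders such as
the exotic Cu=N bond in the 2.2.3 output.
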
> Pyridine-like ligands are only one of many examples. Compounds with a
> metal-N bond in which the N belongs to an R-CH=N-R group usually end
> up with altered bond orders in the ligand (or added H atoms, or
> radicals) to keep valence 3 for the N. Carbonyl oxygen bound to a
> metal is another example ...
> 
> On the other hand, when the metal is linked to an anionic heteroatom,
> the results are usually correct, since in this case the atom satisfies
> its "organic" valence. For example, see the attached Zncarboxylate.cif
> file, with Zn linked to anionic oxygen:
> 
> babel --title "" Zncarboxylate.cif -osmi
> [Zn](Cl)(Cl)(OC(=O)c1cc(N(=O)=O)cc(N(=O)=O)c1)OC(=O)c1cc(N(=O)=O)cc(N(=O)=O)c1.[NH2+]1CCCCC1.[NH2+]1CCCCC1
>     
> 
> In delocalized situations, where there is one bond to an anionic
> heteroatom and another bond to a neutral one, the results are not
> correct either. A typical example is acetylacetonate complexes, for
> example the attached file Mnacac.cif:
> 
> With 2.3.0:
> babel --title "" Mnacac.cif -osmi
> [Mn]12([N](CCO)(C)C)(O[C](C)C=C(C)O1)O[C](C)C=C(C)O2
> 
> With 2.2.3:
> babel --title "" Mnacac.cif -osmi
> [Mn]12([N](CCO)(C)C)(O[C@@H](C)C=C(C)O1)O[C@H](C)C=C(C)O2
> 
> To keep valence 2 at oxygen, version 2.3.0 creates a radical on one of
> the C atoms, and version 2.2.3 adds a spurious H atom, putting
> chirality on a planar C atom. The correct result should be, in my
> opinion, [Mn]1[O]=C(C)C=C(C)O1 (of course, an equivalent resonance
> form may be written by interchanging the "neutral" and "anionic" roles
> of the oxygens).
> 
> I have also attached an example with phosphorus, PdPPh3.cif:
> 
> With 2.3.0:
> babel --title "" PdPPh3.cif -osmi
> [Pt](Cl)(Cl)([P@](c1ccccc1)(c1ccccc1)c1ccccc1)[P@@](c1ccccc1)(c1ccccc1)c1ccccc1.[Pt](Cl)(Cl)([P@@](c1ccccc1)(c1ccccc1)c1ccccc1)[P@](c1ccccc1)(c1ccccc1)c1ccccc1
> 
> With 2.2.3:
> babel22 --title "" PdPPh3.cif -osmi
> [Pt](Cl)(Cl)(P(c1ccccc1)(c1ccccc1)c1ccccc1)P(c1ccccc1)(c1ccccc1)c1ccccc1.[Pt](Cl)(Cl)(P(c1ccccc1)(c1ccccc1)c1ccccc1)P(c1ccccc1)(c1ccccc1)c1ccccc1
> 
> As you can see, 2.3.0 adds chiral marks on phosphorus (obviously
> spurious, since it is linked to three identical moieties) whereas
> 2.2.3 "forgets" the brackets: "P" should be replaced by "[P]".
> Brackets are necessary because the SMILES specification establishes
> that a P without brackets has valence 3 or 5; in this case the absence
> of brackets would mean that an H atom is linked to the phosphorus,
> which is not the case.
> 
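
The bracket rule just mentioned can be written down in a few lines. This
is a sketch of the implicit-hydrogen rule for unbracketed aliphatic
atoms as I understand it from the SMILES specification (the atom takes
the lowest standard valence that accommodates its explicit bonds); the
names NORMAL_VALENCES and implicit_hydrogens are made up for
illustration, and aromatic atoms are not covered.

```python
# Standard valences of the SMILES "organic subset" (aliphatic atoms only).
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}

def implicit_hydrogens(symbol: str, bond_order_sum: int) -> int:
    """Hydrogens implied by an unbracketed aliphatic atom."""
    for valence in NORMAL_VALENCES[symbol]:
        if valence >= bond_order_sum:
            # Fill up to the lowest standard valence that fits.
            return valence - bond_order_sum
    return 0  # already above the highest standard valence

# Free triphenylphosphine: three P-C bonds, nothing implied.
print(implicit_hydrogens("P", 3))  # 0
# Coordinated PPh3: three P-C bonds plus one P-metal bond.
print(implicit_hydrogens("P", 4))  # 1, the unwanted H unless "[P]" is written
```

Writing the atom in brackets switches this rule off, because a bracket
atom carries only the hydrogens stated explicitly inside the brackets.
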
> I could also give examples with bonds to carbon (organometallics) but
> the variety of situations is very large in these cases.
> 
> I am fixing most of these inaccuracies (mostly by hand) when building
> the parallel SMILES database, but the problem appears again when
> performing the actual search: apparently openbabel "fixes" my
> manipulated SMILES and recreates the wrong connectivities before
> searching.
> 
> Well, in my opinion, when defining bond orders and aromaticity
> openbabel should not treat metal-(N,O,P,S,C, ...) bonds in the same
> way as usual organic bonds. I suggest the following strategy.
> 
> - First, ignore bonds to metals when establishing bond orders and
> aromaticity. The atoms linked in anionic form may appear as radicals at
> this stage.
> 
> - Then add all bonds to metals, without changing the bond orders or
> aromaticity found in the previous stage and without adding or deleting
> hydrogen atoms.
> 
> - Finally, put brackets around atoms that do not show their standard
> valence.
> 
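
The three steps above can be sketched on a toy connection table. This is
only an illustration in plain Python, not openbabel code: the names
METALS, split_bonds and atoms_needing_brackets are made up, the element
tables are tiny subsets, and the bond orders are given as input instead
of being perceived, since perception on the metal-free part is assumed
to be done by the existing routines.

```python
# Illustrative subsets only; a real tool would need complete tables.
METALS = {"Cu", "Zn", "Mn", "Pd", "Pt"}
NORMAL_VALENCES = {"C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
                   "S": (2, 4, 6), "Cl": (1,)}

def split_bonds(atoms, bonds):
    """Step 1: set bonds to metals aside; only the rest goes to perception."""
    organic, to_metal = [], []
    for i, j, order in bonds:
        if atoms[i] in METALS or atoms[j] in METALS:
            to_metal.append((i, j, order))
        else:
            organic.append((i, j, order))
    return organic, to_metal

def atoms_needing_brackets(atoms, hydrogens, bonds):
    """Step 3: non-metal atoms whose total valence is not a standard one."""
    total = list(hydrogens)
    for i, j, order in bonds:
        total[i] += order
        total[j] += order
    return [k for k, symbol in enumerate(atoms)
            if symbol not in METALS
            and total[k] not in NORMAL_VALENCES.get(symbol, ())]

# Toy input: a Cu-pyridine fragment with one Kekule form of the ring.
atoms = ["Cu", "N", "C", "C", "C", "C", "C"]
hydrogens = [0, 0, 1, 1, 1, 1, 1]
bonds = [(0, 1, 1), (1, 2, 2), (2, 3, 1), (3, 4, 2),
         (4, 5, 1), (5, 6, 2), (6, 1, 1)]

organic, to_metal = split_bonds(atoms, bonds)
# Step 2: add the metal bonds back without touching orders or hydrogens.
full = organic + to_metal

print(atoms_needing_brackets(atoms, hydrogens, organic))  # []
print(atoms_needing_brackets(atoms, hydrogens, full))     # [1]
```

Without the Cu-N bond every ligand atom has a standard valence, so
nothing forces a change in the ring. Once the bond is added back, only
the nitrogen (index 1) is flagged, which corresponds to writing it as
"[n]" in [Cu][n]1ccccc1.
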
> I am just a chemist, not a software developer, so I cannot help with
> developing code for this. The task is probably quite difficult, but I
> think that it would greatly improve the performance of openbabel in
> the "inorganic realm", and it should at least be included in the "to
> do" list (rather long, I imagine).
> 
> Thanks for your attention (or even just for reading this long message).
> 
> Best wishes,
> Miguel Quirós

-- 
Miguel Quirós Olozábal
Departamento de Química Inorgánica. Facultad de Ciencias.
Universidad de Granada. 18071 Granada. SPAIN.
email: mquiros<at>ugr<dot>es
       mquiros<arroba>ugr<punto>es


_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
