Hello. A very interesting initiative, and closely related to what I am working on.
I work on the Crystallography Open Database (COD, www.crystallography.net), a large collection of openly accessible CIF files, and I have posted to this list a few times before. The COD currently contains 245336 files, so a noticeable proportion of your 44000 are probably already included.

My current task is extracting the chemical connectivity from the CIF files and storing it in SMILES format, so that chemical substructure searches can be run on it. At present the SMILES collection has around 70000 entries. The conversion is done through OpenBabel. In doing this, I have found that the results are rather satisfactory for organic compounds but generally not for inorganic ones. In most cases the result is perhaps not a "bug", but simply a representation of the molecule that does not coincide with the one an inorganic chemist would usually have in mind (or should I say "in my mind"?). In other cases the results are simply wrong, especially through the appearance of spurious H atoms. Because of this, I could add a lot of material to your list of unsatisfactory conversions from CIF files.

I need to review and, in most cases, fix the SMILES strings that OpenBabel produces for inorganic compounds (either manually or semiautomatically). I am also stuck on version 2.2.3, because newer versions perform worse for inorganic compounds.

These facts are understandable, since the bond and valence concepts behind cheminformatics formats, and cheminformatics in general, come mostly from valence bond theory and thus from the formalism of organic chemistry. Many of these concepts (such as the definition of "double" and "aromatic" bonds) become more dubious when metal atoms are present. Also, the behaviour of common "organic" atoms is different when they interact with metals.
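A minimal sketch of how the semiautomatic part of such a review might begin: sorting SMILES strings into those that contain a metal atom (and so probably need a human look) and those that do not. This is an illustration only, not the COD's actual tooling. The names `NON_METALS`, `metals_in_smiles` and `triage` are hypothetical, and the list of elements treated as non-metals is an arbitrary assumption. It relies on the fact that, in SMILES, only B, C, N, O, P, S and the halogens may be written outside square brackets, so any metal must appear as a bracket atom.

```python
import re

# Elements treated as non-metals in this sketch. Where to put the boundary
# (Si, As, Te, ...) is an arbitrary assumption made for illustration.
NON_METALS = {
    "H", "He", "B", "C", "N", "O", "F", "Ne", "Si", "P", "S", "Cl", "Ar",
    "As", "Se", "Br", "Kr", "Te", "I", "Xe", "At", "Rn",
}

# Start of a SMILES bracket atom: "[", an optional isotope number, then the
# element symbol. Aromatic lowercase symbols such as "n" or "se" are matched
# by the second alternative and capitalized later.
BRACKET_ATOM = re.compile(r"\[(\d*)([A-Z][a-z]?|[a-z]{1,2})")


def metals_in_smiles(smiles):
    """Return the set of bracket-atom symbols that are not in NON_METALS."""
    found = set()
    for _isotope, symbol in BRACKET_ATOM.findall(smiles):
        symbol = symbol.capitalize()
        if symbol not in NON_METALS:
            found.add(symbol)
    return found


def triage(smiles_list):
    """Split SMILES strings into (no metal found, needs manual review)."""
    organic, needs_review = [], []
    for smiles in smiles_list:
        if metals_in_smiles(smiles):
            needs_review.append(smiles)
        else:
            organic.append(smiles)
    return organic, needs_review
```

Calling `triage` on a list of strings returns two lists. Entries in the second one are only candidates for review: the check says nothing about whether the bonding around the metal is chemically sensible.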
Nitrogen is a clear example of this changed behaviour: OpenBabel does not take into account that this element usually binds to metal atoms through its lone pair (thus forming four bonds), and this introduces a lot of mistakes when OpenBabel tries to keep it trivalent at all costs.

By the way, you are welcome to use the COD SMILES collection, much of which has been manually revised, for your task if you think it may help in any way. I do not know whether a SMILES string carries enough information to build an acceptable MOL file, though.

Links from ChemSpider to COD CIFs are of course welcome.

Best wishes,

Miguel Quirós

On Mon, 09-12-2013 at 08:44 -0800, daya wrote:
> I’ve just supervised a student project in which we used OpenBabel to convert
> over 44,000 Royal Society of Chemistry CIF structures to mol files, then a
> student checked over 4,000 of these conversions so that we could upload the
> successfully processed CIFs to ChemSpider for the corresponding ChemSpider
> compounds. A summary of the results of that project are detailed here:
> http://www.chemspider.com/blog/adding-rsc-cifs-to-chemspider.html
> It seemed like a valuable opportunity to identify the most frequent
> OpenBabel bugs when doing a CIF to Mol conversion so these are documented in
> there, along with test cases to identify the problems and with a view to
> fixing them and making OpenBabel more bulletproof.
> We’re taking a bit of a break from this project for now, but in the next
> phase of the project will see if we can fix at least some of the bugs
> identified if they haven’t already been.
> But we’re sharing these results here for now though since we thought you
> would be interested in the project and the performance of OpenBabel when run
> over such a large and varied test set, possibly even enough to look into
> some of them yourselves…
> Looking forward to working with you on some of them in the future…
> Aileen Day (Informatics Analyst, RSC ChemSpider)
>
> --
> View this message in context:
> http://forums.openbabel.org/Converting-CIFs-to-mols-using-OpenBabel-tp4657031.html
> Sent from the General discussion mailing list archive at Nabble.com.
>
> _______________________________________________
> OpenBabel-discuss mailing list
> OpenBabel-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/openbabel-discuss

--
Miguel Quirós Olozábal
Departamento de Química Inorgánica. Facultad de Ciencias.
Universidad de Granada. 18071 Granada. SPAIN.
email: mquiros<at>ugr<dot>es