On Fri, May 1, 2009 at 8:51 PM, Evgueni Kolossov <ekolos...@gmail.com> wrote:
> Ok Greg,
>
> What if we will try to define the format and start with the record separator
> - may be use the same as SDF?
> Index file can be created during the writing.

Minor point: if the index file is created on writing, a record separator isn't needed.

I'm still trying to decide how I feel about this suggestion. It would be nice to have a random-access reader for binary molecule files. There's some value in having that be self-contained in the RDKit (i.e. invent the format) instead of using an external library to handle the reading/writing. It would also fit the current situation, which is that the RDKit binary formats are undocumented (except by the code) and unique to the RDKit.

On the other hand, there's also something to be said for striving for interoperability, which leads me to prefer a better-documented and more portable binary format. There are some nice libraries out there for doing this.

Another interesting, and relatively easy to implement, idea would be to use sqlite to store things, so that multi-molecule binary files are sqlite databases. One can think of all kinds of interesting things to do with this.

Thoughts?

-greg
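
To make the sqlite idea a bit more concrete, here is a minimal sketch in Python using only the standard-library `sqlite3` module. It is an illustration under assumptions, not a proposed RDKit format: the table name `molecules`, the column names, and the helper functions `create_store`, `add_molecules`, and `get_molecule` are all hypothetical, and the molecules are represented by opaque byte strings (in practice these would presumably be the RDKit binary pickles). The integer primary key plays the role of the index, so no record separator is needed and any record can be fetched directly.

```python
import sqlite3


def create_store(conn):
    """Create the (hypothetical) single-table layout: one row per molecule."""
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS molecules ("
            "id INTEGER PRIMARY KEY, "  # acts as the random-access index
            "data BLOB NOT NULL)"       # opaque serialized molecule
        )


def add_molecules(conn, blobs):
    """Append serialized molecules and return the ids assigned to them."""
    ids = []
    with conn:  # commit once after all rows are inserted
        for blob in blobs:
            cur = conn.execute(
                "INSERT INTO molecules (data) VALUES (?)", (blob,)
            )
            ids.append(cur.lastrowid)
    return ids


def get_molecule(conn, mol_id):
    """Fetch one serialized molecule by id without scanning the others."""
    row = conn.execute(
        "SELECT data FROM molecules WHERE id = ?", (mol_id,)
    ).fetchone()
    if row is None:
        raise KeyError(mol_id)
    return bytes(row[0])
```

Usage would look something like `conn = sqlite3.connect("mols.db")`, then `create_store(conn)`, `add_molecules(conn, blobs)`, and later `get_molecule(conn, 42)`. Because the payload is stored as a BLOB, it may contain any bytes, including sequences that would otherwise collide with a separator such as the SDF `$$$$` line.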