[moving a general-interest question to the mailing list]

Hi Kirk,
On Thu, Nov 20, 2008 at 6:03 PM, <[email protected]> wrote:
>
> I have another question on DbCLI. After getting rid of problematic
> structures, I was able to get DbCLI to the pairwise comparison step, but my

I'm not sure what the "pairwise comparison step" is with the DbCLI stuff. Step one is loading the database with CreateDb.py; step two is doing searches with SearchDb.py. What are you asking about?

> dataset has on the order of 100,000 structures. After about 50,000
> structures Python issued an "Unexpected error" response and stopped. Is this
> likely due to the enormous size of a pairwise distance table for this
> dataset? Have you had problems with very large datasets in the past, or has
> this typically worked smoothly?

I must admit that I've never queried with that number of structures. My typical use case is to have a large database (10^5-10^6 compounds) and query it with a few (~10) structures. The code hasn't really been written to deal with giant query sets. That is doable, but it would require some reworking. Probably the best bet would be to support loading the queries from a database as well; that way you wouldn't have to reprocess the queries every time, and could pretty easily handle the "only loading a few at a time" problem.

It's an interesting thing to think about.

-greg
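For scale: if a full pairwise table were built for 100,000 structures, it would hold 100,000 × 99,999 / 2 ≈ 5 × 10^9 entries, or roughly 40 GB as 8-byte floats. The "only loading a few at a time" idea avoids that. Below is a minimal sketch, not DbCLI code, assuming queries and database fingerprints are stored as BLOBs in a SQLite file with hypothetical tables `queries` and `molecules`. The plain-Python `tanimoto` helper and the names `iter_query_batches`, `batched_search`, and `demo` are illustrative stand-ins, not RDKit API.

```python
import heapq
import sqlite3
from typing import Iterator, List, Tuple

# A hit is (similarity, molecule id).
Hit = Tuple[float, str]


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints packed into Python ints (illustrative)."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0
    return bin(a & b).count("1") / union


def iter_query_batches(conn: sqlite3.Connection,
                       batch_size: int) -> Iterator[List[Tuple[str, int]]]:
    """Yield queries a few at a time instead of reading them all up front."""
    # Hypothetical schema: queries(id TEXT, fp BLOB).
    cur = conn.execute("SELECT id, fp FROM queries ORDER BY rowid")
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            return
        yield [(qid, int.from_bytes(fp, "big")) for qid, fp in rows]


def batched_search(conn: sqlite3.Connection, batch_size: int = 1000,
                   top_k: int = 10) -> Iterator[Tuple[str, List[Hit]]]:
    """For each query keep only its top_k hits; no pairwise table is stored."""
    # The database side is held in memory here for simplicity.
    # Hypothetical schema: molecules(id TEXT, fp BLOB).
    mols = [(mid, int.from_bytes(fp, "big"))
            for mid, fp in conn.execute("SELECT id, fp FROM molecules ORDER BY rowid")]
    for batch in iter_query_batches(conn, batch_size):
        for qid, qfp in batch:
            scores = ((tanimoto(qfp, mfp), mid) for mid, mfp in mols)
            yield qid, heapq.nlargest(top_k, scores)


def demo() -> List[Tuple[str, List[Hit]]]:
    """Build a tiny in-memory database and run a batched search over it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE molecules (id TEXT, fp BLOB)")
    conn.execute("CREATE TABLE queries (id TEXT, fp BLOB)")
    conn.executemany("INSERT INTO molecules VALUES (?, ?)",
                     [(n, v.to_bytes(4, "big")) for n, v in
                      [("m1", 0b1111), ("m2", 0b1100), ("m3", 0b0001)]])
    conn.executemany("INSERT INTO queries VALUES (?, ?)",
                     [(n, v.to_bytes(4, "big")) for n, v in
                      [("q1", 0b1100), ("q2", 0b0011), ("q3", 0b0001)]])
    return list(batched_search(conn, batch_size=2, top_k=2))
```

With `batch_size=2` and `top_k=2`, `demo()` reports the two best hits for each of the three toy queries (for example, `q1` matches `m2` exactly). Memory stays proportional to the database plus one batch of queries, rather than to the number of query/database pairs.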

