Jim Fulton wrote at 2007-3-25 09:53 -0400: > >On Mar 25, 2007, at 3:01 AM, Adam Groszer wrote: >> MF> I think one of the main limitations of the current catalog (and >> MF> hurry.query) is efficient support for sorting and batching the >> query >> MF> results. The Zope 3 catalog returns all matching results, which >> can then >> MF> be sorted and batched. This will stop being scalable for large >> MF> collections. A relational database is able to do this >> internally, and is >> MF> potentially able to use optimizations there. > >What evidence to you have to support this assertion? We did some >literature search on this a few years ago and found no special trick >to avoid sorting costs. > >I know of 2 approaches to reducing sort cost: > >1. Sort your results based on the "primary key" and therefore, pick >your primary key to match your sort results. In terms of the Zope >catalog framework, the primary keys are the document IDs, which are >traditionally chosen randomly. You can pick your primary keys based >on a desired sort order instead. A variation on this theme is to use >multiple sets of document ids, storing multiple sets of ids in each >index. Of course, this approach doesn't help with something like >relevance ranks. > >2. Use an N-best algorithm. If N is the size of the batch and M is >the corpus size, then this is O(M*ln(N)) rather than O(M*ln(M)) which >is a significant improvement if N << M, but still quite expensive.
The major costs in sorting are usually not the "log(n)" but the very high linear costs fetching the sort keys (although for huge n, we will reach the asymptotic limits). Under normal conditions, a relational database can be far more efficient to fetch values either from index structures or the data records than Zope -- as * its data representation is much more compact * it often supports direct access * the server itself can access and process all data. With the ZODB, the data is hidden in pickles (less compact), there is no direct access (instead the complete pickle need to be decoded) and all operations are done in the client (rather than in the server). That said, I fear we can nothing do in ZODB land to significantly improve on this. We would instead need something like relational based catalogs. And of course, relational databases, too, cannot beat the asymptotic complexity of sorting -- they can only improve the linear factor. -- Dieter _______________________________________________ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev