On Mar 25, 2007, at 5:27 PM, Martijn Faassen wrote:

Hey Jim,

Jim Fulton wrote:
On Mar 25, 2007, at 12:33 PM, Martijn Faassen wrote:
[snip]
I have the strong suspicion that modern relational databases are currently better able to scale at queries using LIMIT and ORDER BY than the Zope 3 catalog.
I had a similar suspicion.  I assigned the Python Labs team the task
of finding out through literature search the approaches used.  They
found that there were none other than the sorts of things I've
mentioned.

What about caching strategies? (as I sketched out in my last mail)

Obviously, it depends a lot on access patterns. I expect that this is an area where picking the right strategy and succeeding is highly application-specific.

Take batching. Caching would potentially make getting multiple batches go faster, but to benefit, you'd have to increase the internal batch size. For example, if the user-visible batch size is 20 and you wanted them to be able to get the second batch without searching and sorting, you'd have to make your internal batch size 40. That would increase the cost for the first batch by something on the order of log(2). I suspect that most people don't look at multiple batches, so caching to support multiple batches could be a significant loss, even leaving memory impact aside.
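
To make the trade-off concrete, here is a minimal Python sketch of that kind of batch cache. It is an illustration only, not how the Zope 3 catalog works: the class name SortedBatchCache, its parameters, and the use of heapq.nsmallest for the "search and sort" step are all assumptions made for the example.

```python
import heapq


class SortedBatchCache:
    """Hypothetical per-query cache of the first few sorted results."""

    def __init__(self, visible_size=20, internal_size=40):
        if internal_size < visible_size:
            raise ValueError("internal_size must be >= visible_size")
        self.visible_size = visible_size    # what the user sees per page
        self.internal_size = internal_size  # what we sort and keep per search
        self._cached = None                 # sorted prefix of the result set
        self._complete = False              # True once the prefix is everything
        self.searches = 0                   # how often we really searched

    def batch(self, results, start, key=None):
        """Return the visible batch beginning at offset `start`."""
        end = start + self.visible_size
        miss = self._cached is None or (
            end > len(self._cached) and not self._complete
        )
        if miss:
            # Select the smallest `wanted` items in sorted order. Asking for
            # more than one visible batch is the extra up-front cost.
            wanted = max(end, self.internal_size)
            self._cached = heapq.nsmallest(wanted, results, key=key)
            self._complete = len(self._cached) < wanted
            self.searches += 1
        return self._cached[start:end]
```

With the defaults, the first two batches cost one search and the third triggers another, so the larger internal batch only pays off if users actually ask for the second batch.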

OTOH, we've used some highly application-specific caching strategies in some of our commercial applications with great success. These caches were implemented as specialized indexes, and I would argue that indexes are really a form of caching.

This article about MySQL claims that MySQL is the only database that does query result set caching. Surprising for such an obvious thought:

Sounds like BS to me. :)


http://dev.mysql.com/tech-resources/articles/mysql-query-cache.html

Perhaps it doesn't work as well as one would think and that's why other database engines rejected it. :)


I suspect it is a hard general strategy to get right.

Note that SQL methods support query caching and Zope's caching framework is often used to cache various kinds of computations, including searches.
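
As a rough illustration of what query result caching involves, here is a small Python sketch of a time-bounded result cache. It is not the SQL method cache or Zope's caching framework; the class QueryResultCache, its max_age and max_entries parameters, and the compute callback are hypothetical names chosen for the example.

```python
import time


class QueryResultCache:
    """Hypothetical cache mapping (query, params) to a recent result."""

    def __init__(self, max_age=60.0, max_entries=100, clock=time.monotonic):
        self.max_age = max_age          # seconds a result stays usable
        self.max_entries = max_entries  # bound on memory use
        self._clock = clock
        self._entries = {}              # (query, params) -> (timestamp, result)

    def get(self, query, params, compute):
        """Return a cached result, or call compute(query, params) and keep it."""
        key = (query, tuple(params))
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] <= self.max_age:
            return hit[1]
        result = compute(query, params)
        full = len(self._entries) >= self.max_entries
        if key not in self._entries and full and self._entries:
            # Evict the oldest entry to stay within max_entries.
            oldest = min(self._entries, key=lambda k: self._entries[k][0])
            del self._entries[oldest]
        self._entries[key] = (now, result)
        return result

    def invalidate(self):
        """Drop everything, e.g. after any write to the underlying data."""
        self._entries.clear()
```

The hard part is the one the sketch dodges: knowing when a write makes a cached result stale. Here the only options are waiting for max_age to pass or throwing the whole cache away.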


I cannot back this up as I haven't done measurements. Perhaps you
have done so?
We did a literature search.

That's useful, but doesn't tell us very much about how they compare in
practice.

Actually, it does.  But feel free to do some performance tests.

Perhaps someone should do measurements and see how the two compare in a
sort/batch use case. It shouldn't be too hard to set up a relational
database-based sorted batch along with a ZODB/catalog based sorted batch
and see how they both hold up.

Yup, although, to be meaningful, you need to look at large data sets. This raises the amount of effort required.
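
For anyone who wants to try, the shape of such a test might look like the Python sketch below. Note the assumptions: it uses the standard library's sqlite3 as the relational side, and a plain in-memory list with heapq.nsmallest as a stand-in for the catalog side, so it says nothing about ZODB or the Zope 3 catalog themselves. The table name docs, the column names, and all function names are made up for the example.

```python
import heapq
import random
import sqlite3
import time


def build_rows(n, seed=0):
    """Generate n (id, score) rows with reproducible random scores."""
    rng = random.Random(seed)
    return [(i, rng.random()) for i in range(n)]


def sql_sorted_batch(conn, start, size):
    """Sorted batch via ORDER BY / LIMIT / OFFSET."""
    cur = conn.execute(
        "SELECT id FROM docs ORDER BY score, id LIMIT ? OFFSET ?",
        (size, start),
    )
    return [row[0] for row in cur]


def heap_sorted_batch(rows, start, size):
    """Sorted batch via partial selection over an in-memory list."""
    top = heapq.nsmallest(start + size, rows, key=lambda r: (r[1], r[0]))
    return [r[0] for r in top[start:]]


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    t0 = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - t0


def compare(n=100_000, start=0, size=20):
    """Build both data sets and time one sorted batch on each."""
    rows = build_rows(n)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, score REAL)")
    conn.executemany("INSERT INTO docs VALUES (?, ?)", rows)
    conn.execute("CREATE INDEX docs_score ON docs (score)")
    sql_ids, sql_t = timed(sql_sorted_batch, conn, start, size)
    heap_ids, heap_t = timed(heap_sorted_batch, rows, start, size)
    conn.close()
    return sql_ids, heap_ids, sql_t, heap_t
```

Calling compare() with increasing n and start gives a first impression, but a real comparison would need the actual catalog, on-disk data, and data sets large enough not to fit in memory.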


* Do you estimate the performance of the Zope 3 catalog to be equivalent to the performance of a modern relational database system for queries that need to sort and batch their results?
I estimate that the same issues apply to both.

Theoretical algorithm scalability is one thing, and the same issues
apply to both. Practical scalability might vary widely.

OK, I give up. This argument just isn't worth my time any more. I'm sorry I objected to the original point.

Jim

--
Jim Fulton              mailto:[EMAIL PROTECTED]        Python Powered!
CTO                     (540) 361-1714                  http://www.python.org
Zope Corporation        http://www.zope.com             http://www.zope.org


