On Mar 25, 2007, at 5:27 PM, Martijn Faassen wrote:
Hey Jim,
Jim Fulton wrote:
On Mar 25, 2007, at 12:33 PM, Martijn Faassen wrote:
[snip]
I have the strong suspicion that modern relational databases are
currently better able to scale at queries using LIMIT and ORDER BY
than the Zope 3 catalog.
I had a similar suspicion. I assigned the Python Labs team the task
of finding out through literature search the approaches used. They
found that there were none other than the sorts of things I've
mentioned.
What about caching strategies? (as I sketched out in my last mail)
Obviously, it depends a lot on access patterns. I expect that this
is an area where picking the right strategy and succeeding is highly
application-specific.
Take batching. Caching would potentially make getting multiple
batches go faster, but to benefit, you'd have to increase the
internal batch size. For example, if the user visible batch size is
20 and you wanted them to be able to get the second batch without
searching and sorting, you'd have to make your internal batch size
40. That would increase the cost for the first batch by on the order
of log(2). I suspect that most people don't look at multiple
batches, so caching to support multiple batches could be a
significant loss, even leaving memory impact aside.
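
To make the batching trade-off above concrete, here is a minimal Python
sketch. It is not the Zope 3 catalog's code. It assumes the sorted batch is
produced with heapq.nsmallest, whose cost grows roughly as N log k for the k
results kept, so moving from k=20 to k=40 adds a log(2) term to log k. The
names BatchCache and internal_size and the sample data are hypothetical.

```python
import heapq
import random


class BatchCache:
    """Hypothetical cache: sort `internal_size` results once, slice batches from them."""

    def __init__(self, doc_ids, sort_key, internal_size):
        self.doc_ids = doc_ids
        self.sort_key = sort_key
        self.internal_size = internal_size
        self._cached = None   # sorted prefix of the result set
        self._fetched = 0     # how many results the prefix covers
        self.searches = 0     # number of search-and-sort passes performed

    def batch(self, start, size):
        needed = start + size
        if self._cached is None or needed > self._fetched:
            # Partial sort: keep only the smallest `_fetched` items.
            self._fetched = max(needed, self.internal_size)
            self._cached = heapq.nsmallest(
                self._fetched, self.doc_ids, key=self.sort_key
            )
            self.searches += 1
        return self._cached[start:needed]


# Sample data: 1000 documents with distinct, random sort values.
rng = random.Random(0)
sort_values = dict(enumerate(rng.sample(range(10**6), 1000)))
doc_ids = list(sort_values)

# User-visible batch size 20, internal batch size 40.
cache = BatchCache(doc_ids, sort_values.__getitem__, internal_size=40)
first = cache.batch(0, 20)    # pays for sorting 40 results
second = cache.batch(20, 20)  # served from the cached prefix
```

With internal_size=40 the second batch costs no extra search, but the first
batch pays for 40 results whether or not anyone asks for the second.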
OTOH, we've used some highly application specific caching strategies
in some of our commercial applications to great success. These caches
were implemented as specialized indexes, and I would argue that
indexes are really a form of caching.
This article about MySQL claims that MySQL is the only database
that does query result set caching. Surprising for such an obvious
thought:
Sounds like BS to me. :)
http://dev.mysql.com/tech-resources/articles/mysql-query-cache.html
Perhaps it doesn't work as well as one would think and that's why
other database engines rejected it. :)
I suspect it is a hard general strategy to get right.
Note that SQL methods support query caching and Zope's caching
framework is often used to cache various kinds of computations,
including searches.
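
As a rough illustration of result-set caching in general, here is a minimal
Python sketch. It does not show the API of SQL methods, Zope's caching
framework, or MySQL's query cache. QueryResultCache, fetch, invalidate and
run_query are hypothetical names, and the coarse "drop everything on any
write" invalidation is an assumption chosen for simplicity.

```python
class QueryResultCache:
    """Hypothetical result-set cache keyed by (query text, parameters)."""

    def __init__(self):
        self._results = {}  # (query, params) -> list of rows
        self.hits = 0
        self.misses = 0

    def fetch(self, query, params, run_query):
        """Return cached rows, calling run_query(query, params) only on a miss."""
        key = (query, tuple(params))
        if key in self._results:
            self.hits += 1
        else:
            self.misses += 1
            self._results[key] = list(run_query(query, params))
        # Hand out a copy so callers cannot mutate the cached rows.
        return list(self._results[key])

    def invalidate(self):
        """Drop every cached result; call after any write to the underlying data."""
        self._results.clear()
```

The hard part is the invalidation policy: dropping everything on each write
is safe but gives little benefit when data changes often, and anything finer
needs application knowledge about which writes affect which queries.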
I cannot back this up as I haven't done measurements. Perhaps you
have done so?
We did a literature search.
That's useful, but doesn't tell us very much about how they compare in
practice.
Actually, it does. But feel free to do some performance tests.
Perhaps someone should do measurements and see how the two compare in a
sort/batch use case. It shouldn't be too hard to set up a relational
database-based sorted batch along with a ZODB/catalog based sorted batch
and see how they both hold up.
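
The relational half of such a measurement could be sketched as follows. This
is only a starting point under stated assumptions: it uses Python's sqlite3
module with an in-memory database as a stand-in for a "modern relational
database", and the table docs, the column modified, and the helper names are
hypothetical. The ZODB/catalog half is left out because it depends on the
application's catalog setup.

```python
import random
import sqlite3
import time


def build_db(n_rows, seed=0):
    """Create an in-memory table of n_rows documents with a random sort column."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, modified INTEGER)")
    conn.executemany(
        "INSERT INTO docs (id, modified) VALUES (?, ?)",
        ((i, rng.randrange(10**9)) for i in range(n_rows)),
    )
    # Index on the sort column so ORDER BY ... LIMIT can avoid a full sort.
    conn.execute("CREATE INDEX docs_modified ON docs (modified)")
    conn.commit()
    return conn


def sql_sorted_batch(conn, start, size):
    """Return the ids of one batch, sorted by (modified, id)."""
    rows = conn.execute(
        "SELECT id FROM docs ORDER BY modified, id LIMIT ? OFFSET ?",
        (size, start),
    )
    return [row[0] for row in rows]


def time_call(fn, *args, repeat=5):
    """Return the best wall-clock time in seconds over `repeat` calls."""
    best = float("inf")
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best


if __name__ == "__main__":
    conn = build_db(100_000)
    print("first batch of 20:", time_call(sql_sorted_batch, conn, 0, 20), "s")
    print("batch at offset 50000:", time_call(sql_sorted_batch, conn, 50_000, 20), "s")
```

A fair comparison would run the same batches against a catalog holding the
same data, at data set sizes large enough to be meaningful.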
Yup, although, to be meaningful, you need to look at large data
sets. This raises the amount of effort required.
* Do you estimate the performance of the Zope 3 catalog to be
equivalent to the performance of a modern relational database
system for queries that need to sort and batch their results?
I estimate that the same issues apply to both.
Theoretical algorithm scalability is one thing, and the same issues
apply to both. Practical scalability might vary widely.
OK, I give up. This argument just isn't worth my time any more. I'm
sorry I objected to the original point.
Jim
--
Jim Fulton          mailto:[EMAIL PROTECTED]    Python Powered!
CTO                 (540) 361-1714              http://www.python.org
Zope Corporation    http://www.zope.com         http://www.zope.org
_______________________________________________
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/
ZODB-Dev mailing list - ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev