Re: [ZODB-Dev] Indexing and dates/times
On Tue, Jul 13, 2010 at 4:35 AM, Pedro Ferreira jose.pedro.ferre...@cern.ch wrote: Hello, I am currently trying to devise a way to index and retrieve some millions of objects according to their modification date/time. One of the problems I'm facing is that of index granularity: I'd like to provide to-the-second granularity, will there ever be more than one item with the same key? Exactly, that's the problem. Typically, to model something like this, you'd have a BTree whose values are sets. If single items are common and you were willing to work a bit harder, you could have BTrees whose values could be either a set or a scalar. but for that I need some structure that lets me do that. So, the options I see are: - A timestamp-based What do you mean by timestamp? Well, it could be a UNIX timestamp. It could be lots of things. I was asking what you meant. If you used a Unix timestamp, you could use one of the Ix flavors of BTree. BTree index - looks highly inefficient, as there will be many entries with only one element (probably almost all of them), I have no idea what you mean by this. That's the problem you've already mentioned above. So, the issue is that you have multiple items with the same key. This is simply handled by using sets as values in a BTree. There are existing index implementations that do this. So, in a relational DB I would do something like: SELECT * FROM table WHERE timestamp >= X AND timestamp <= Y Since I cannot do this with ZODB, I don't know what this is. Range searches? SQL? BTrees and various index implementations based on them support range searches. Of course, ZODB doesn't support SQL. I'd have to have a BTree, indexed by timestamp... however, as you said, if I want to-the-second granularity, I will rarely have two items with the same key (which makes it pretty useless). I don't know why it is useless, but it is easily handled. So, I was wondering if there is some data structure I can use for this, as this seems to be a pretty common use case.
That's why the various indexing(/catalog) schemes already support it. Jim -- Jim Fulton ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Indexing and dates/times
On Tue, Jul 13, 2010 at 4:35 AM, Pedro Ferreira jose.pedro.ferre...@cern.ch wrote: Hello, I am currently trying to devise a way to index and retrieve some millions of objects according to their modification date/time. So, in a relational DB I would do something like: SELECT * FROM table WHERE timestamp >= X AND timestamp <= Y Since I cannot do this with ZODB, I'd have to have a BTree, indexed by timestamp... however, as you said, if I want to-the-second granularity, I will rarely have two items with the same key (which makes it pretty useless). If you use the timestamp as the key and you want to retrieve all values between two timestamps (inclusive), you can do my_btree.values(min=start, max=end) -- Benji York
Re: [ZODB-Dev] Indexing and dates/times
So, the issue is that you have multiple items with the same key. This is simply handled by using sets as values in a BTree. There are existing index implementations that do this. Hmm... no, in fact the problem is that most of the time I will have only one value per index entry. So, in a relational DB I would do something like: SELECT * FROM table WHERE timestamp >= X AND timestamp <= Y Since I cannot do this with ZODB, I don't know what this is. Range searches? SQL? BTrees and various index implementations based on them support range searches. Of course, ZODB doesn't support SQL. Yes, I know ZODB doesn't support SQL; I was just trying to demonstrate my use case. What I meant was that in relational DBs a query like the one above can be performed over an arbitrary table, without the need for an extra data structure for indexing. I'd have to have a BTree, indexed by timestamp... however, as you said, if I want to-the-second granularity, I will rarely have two items with the same key (which makes it pretty useless). I don't know why it is useless, but it is easily handled. It's not useless. I'm sorry, I used the wrong word. I meant that a range query will normally involve the union of a larger number of sets as the granularity gets finer and finer. If there is only one item per index entry, the union operation will take longer... I assumed that the more BTree entries we have, the more buckets we will have to fetch from the DB for a given range query. But I am probably wrong... So, I was wondering if there is some data structure I can use for this, as this seems to be a pretty common use case. That's why the various indexing(/catalog) schemes already support it. So, if I need an index where items can be queried by date, and range queries can be performed efficiently, will an IOBTree do the job? As I mentioned above, my only concern is the number of sets that will have to be joined.
Thanks, once again, Pedro -- José Pedro Ferreira Indico Team IT-UDS-AVC 513-R-0042 CERN, Geneva, Switzerland
Re: [ZODB-Dev] Indexing and dates/times
If you use the timestamp as the key and you want to retrieve all values between two timestamps (inclusive), you can do my_btree.values(min=start, max=end) Yes, but, as I mentioned in my answer to Jim's mail, my concern is the performance of this range operation for a very large range. If you tell me it will be OK, I won't hesitate :). Regards, Pedro -- José Pedro Ferreira Indico Team IT-UDS-AVC 513-R-0042 CERN, Geneva, Switzerland
Re: [ZODB-Dev] Indexing and dates/times
On Tue, Jul 13, 2010 at 7:51 AM, Pedro Ferreira jose.pedro.ferre...@cern.ch wrote: ... Hmm... no, in fact the problem is that most of the time I will have only one value per index entry. So, in a relational DB I would do something like: As I mentioned in some text you didn't quote, you could use a strategy of storing a scalar unless there are dups. If dups are very rare, this *might* be a win. ... I'm sorry, I used the wrong word. I meant that a range query will normally involve the union of a larger number of sets as the granularity gets finer and finer. If there is only one item per index entry, the union operation will take longer... I assumed that the more BTree entries we have, the more buckets we will have to fetch from the DB for a given range query. But I am probably wrong... The BTrees package has a pretty efficient strategy for merging multiple sets. So, I was wondering if there is some data structure I can use for this, as this seems to be a pretty common use case. That's why the various indexing(/catalog) schemes already support it. So, if I need an index where items can be queried by date, and range queries can be performed efficiently, an IOBTree will do the job? An IOBTree could be the *basis* of a solution. As I mention above, my only concern is the number of sets that will have to be joined. Depending on how rare dups are, I'd either use a BTree whose values are scalars or sets, or one whose values are always sets. I might start with just using sets as the values. Typically the strategy is to assign each object you want to index an integer id. Then your index is of the form: {key -> {docids}} Then you can do a lookup with something like: result = BTrees.IOBTree.multiunion(index.values(min,max)) For more code examples, you should look at the zope.index and zope.catalog packages. Jim -- Jim Fulton
[ZODB-Dev] Indexing and dates/times
Hello all, I am currently trying to devise a way to index and retrieve some millions of objects according to their modification date/time. One of the problems I'm facing is that of index granularity: I'd like to provide to-the-second granularity, but for that I need some structure that lets me do that. So, the options I see are: - A timestamp-based BTree index - looks highly inefficient, as there will be many entries with only one element (probably almost all of them), and querying by range will be a nightmare (correct me if I'm wrong, please); - Checking the timestamp for every object and comparing it with the provided range (even worse); Maybe there is some other way to do it? Any suggestions? Thanks in advance, Regards, Pedro -- José Pedro Ferreira Indico Team IT-UDS-AVC 513-R-0042 CERN, Geneva, Switzerland
Re: [ZODB-Dev] Indexing and dates/times
On Mon, Jul 12, 2010 at 11:47 AM, Pedro Ferreira jose.pedro.ferre...@cern.ch wrote: Hello all, I am currently trying to devise a way to index and retrieve some millions of objects according to their modification date/time. One of the problems I'm facing is that of index granularity: I'd like to provide to-the-second granularity, will there ever be more than one item with the same key? but for that I need some structure that lets me do that. So, the options I see are: - A timestamp-based What do you mean by timestamp? BTree index - looks highly inefficient, as there will be many entries with only one element (probably almost all of them), I have no idea what you mean by this. Jim -- Jim Fulton
Re: [ZODB-Dev] indexing and querying objects in the ZODB
On Feb 13, 2009, at 1:54 AM, Sebastian Wehrmann wrote: Hi, I finished my diploma thesis about indexing and querying objects [1] (only available in German, sorry) last year. With that thesis I developed a piece of software which can be used to query for objects within the ZODB with an XPath-like syntax (it's called regular path expressions). I released a very pre-alpha version some days ago on PyPI [2]. If you are interested, feel free to play with it and report your experiences to me. Please also report bugs to [3]. If you have questions, please send me an email. Kind regards, Sebastian [1] http://archiv.tu-chemnitz.de/pub/2008/0081/data/diplomarbeit.pdf [2] http://pypi.python.org/pypi/gocept.objectquery/ [3] https://bugs.launchpad.net/gocept.objectquery This looks very cool. Have you done any scale experiments? Jim -- Jim Fulton Zope Corporation
Re: [ZODB-Dev] indexing and querying objects in the ZODB
Hi, On Friday, Feb 13, at 9:30am +0100, Jim Fulton wrote: Have you done any scale experiments? Indeed, I've done tests with binary trees with heights from 2 to 20. The results showed that filling the index structures required much more time than the queries themselves (which was expected, of course :) ). For a tree with a height of e.g. 14 (roughly 32,000 objects) it took 20 seconds to fill the ZODB (including the index structures in gocept.objectquery) and less than 2 seconds for some test queries. Of course, the times rise exponentially with the height of the tree. However, the main development focus until now has been on stability, not performance. Performance can probably be improved with simple code refactorings. Basti -- Sebastian Wehrmann · s...@gocept.com gocept gmbh & co. kg · forsterstraße 29 · 06112 halle (saale) · germany http://gocept.com · tel +49 345 1229889 12 · fax +49 345 1229889 1 Zope and Plone consulting and development
[ZODB-Dev] indexing and querying objects in the ZODB
Hi, I finished my diploma thesis about indexing and querying objects [1] (only available in German, sorry) last year. With that thesis I developed a piece of software which can be used to query for objects within the ZODB with an XPath-like syntax (it's called regular path expressions). I released a very pre-alpha version some days ago on PyPI [2]. If you are interested, feel free to play with it and report your experiences to me. Please also report bugs to [3]. If you have questions, please send me an email. Kind regards, Sebastian [1] http://archiv.tu-chemnitz.de/pub/2008/0081/data/diplomarbeit.pdf [2] http://pypi.python.org/pypi/gocept.objectquery/ [3] https://bugs.launchpad.net/gocept.objectquery
Re: [ZODB-Dev] indexing
On Wed, 2008-07-23 at 11:41 -0400, [EMAIL PROTECTED] wrote: I've found the ZODB website to be very disorganized and not nearly as helpful as repeated googlings. I have been collecting links and writing some new text for the new ZODB website. But alas, there are only so many hours in the day. :-) I am currently running a workshop http://www.oshipworkshop.if.uff.br/ and I will get back onto the ZODB website project after August 1. My apologies. Cheers, Tim -- ** Join the OSHIP project. It is the standards-based, open source healthcare application platform in Python. Home page: https://launchpad.net/oship/ Wiki: http://www.openehr.org/wiki/display/dev/Python+developer%27s+page **
Re: [ZODB-Dev] indexing
Sean Allen [EMAIL PROTECTED] wrote on 07/22/2008 11:44:11 PM: Googling and the main ZODB website seem to take me to a bunch of what appear to be severely dated websites, most not updated in at least 2 years (some upwards of 6). This is probably because I simply don't know what to google for, so could someone be so kind as to shoot some links to different indexing options for use with ZODB/ZEO? I've just started playing around with ZODB myself. We have a database of ~650,000 items, with dates and multiple tags. For indexing, I'm using the zc.catalog package from PyPI, with a value index for the dates and a set index for the tags. It seems good enough. There were other indexing classes, but I don't remember any of them. I've found the ZODB website to be very disorganized and not nearly as helpful as repeated googlings. -- Anthony Foglia Princeton Consultants (609) 987-8787 x233
[ZODB-Dev] indexing
Googling and the main ZODB website seem to take me to a bunch of what appear to be severely dated websites, most not updated in at least 2 years (some upwards of 6). This is probably because I simply don't know what to google for, so could someone be so kind as to shoot some links to different indexing options for use with ZODB/ZEO? Much obliged. -Sean-
[ZODB-Dev] indexing object in ZODB
Hi all, What is the algorithm for indexing objects in ZODB? Is there any place where I can see the structure of ZODB? Thanks a lot
RE: [ZODB-Dev] Indexing: Query Optimization
[Thomas Guettler] I developed a simple index using ZODB. Searching for single values is very fast. Searching big ranges is slow. Example: Search: customer_id=0815 date_start=2001-01-01 date_end=2004-12-31 The index holds a BTree which maps values to docids. The search for customer_id is very fast (maybe 500 results). But the range search from date_start to date_end is slow (maybe 100,000 results). Up to now I use the intersection method of the BTrees package to join the results with a logical AND. It would be better if the index did the search for customer_id first, and then filtered the result by deleting entries which are not in the given time period. Has anyone done such a query optimization? (Zope, Zope3, IndexCatalog, ...) Since BTrees and BTree ranges don't know their size, you need to do something like this: - do range searches at the end - if index foo is used, use it first, ... Dieter Maurer wrote some packages I suspect you'd find interesting in this respect: http://mail.zope.org/pipermail/zope/2004-August/152627.html