Re: [ZODB-Dev] Indexing and dates/times
On Tue, Jul 13, 2010 at 4:35 AM, Pedro Ferreira jose.pedro.ferre...@cern.ch wrote: Hello, I am currently trying to devise a way to index and retrieve some millions of objects according to their modification date/time. One of the problems I'm facing is that of index granularity: I'd like to provide to the second granularity, will there ever be more than item with the same key? Exactly, that's the problem. Typically, to model something like this, you's have a BTree who's values are sets. If single items are common and you were willing to work a bit harder, you could have BTrees whos values could be either a set or a scalar. but for that I need some structure that lets me do that. So, the options I see are: - A timestamp-based What do you mean by timestamp Well, it could be a UNIX timestamp. It could be lots of things. I was asking what you meant. If you used a unix time stamp, you could ise one of the Ix flavors of BTree. BTree index - looks highly inefficient, as there will be many entries with only one element (probably almost all of them), I have no idea what you mean by this. That's the problem you've already mentioned above. So, the issue is that you have multiple items with the same key. This is simply handled by using sets as values ion a BTree. There are existing index implementations that do this. So, in a relational DB i would do something like: SELECT * FROM table WHERE timestamp = X AND timestamp = Y Since I cannot do this with ZODB, I don't know what this is. Range seaches? SQL? BTrees and various index implementations based on the,m support range searches. of course, ZODB doesn't support SQL. I'd have to have a BTree, indexed by timestamp... however, as you said, if I want to the second granularity, I will rarely have two items with the same key (which makes it pretty useless). I don't know why it is useless, but it is easily handled. So, I was wondering if there is some data structure I can use for this, as this seems to be a pretty common use case. That's why the various indexing(/catalog) schemes already support it. Jim -- Jim Fulton ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Indexing and dates/times
On Tue, Jul 13, 2010 at 4:35 AM, Pedro Ferreira jose.pedro.ferre...@cern.ch wrote: Hello, I am currently trying to devise a way to index and retrieve some millions of objects according to their modification date/time. So, in a relational DB i would do something like: SELECT * FROM table WHERE timestamp = X AND timestamp = Y Since I cannot do this with ZODB, I'd have to have a BTree, indexed by timestamp... however, as you said, if I want to the second granularity, I will rarely have two items with the same key (which makes it pretty useless). If you use the timestamp as the key and you want to retrieve all values between two timestamps (inclusive), you can do my_btree.values(min=start, max=end) -- Benji York ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Indexing and dates/times
So, the issue is that you have multiple items with the same key. This is simply handled by using sets as values ion a BTree. There are existing index implementations that do this. Hmm... no, in fact the problem is that most of the time I will have only one value per index entry.So, in a relational DB i would do something like: SELECT * FROM table WHERE timestamp= X AND timestamp= Y Since I cannot do this with ZODB, I don't know what this is. Range seaches? SQL? BTrees and various index implementations based on the,m support range searches. of course, ZODB doesn't support SQL. Yes, I know ZODB doesn't support SQL, I was just trying to demonstrate my use case. What I meant was that in relational DBs a query like the one above can be performed over an arbitrary table, without the need for having an extra data structure for indexing. I'd have to have a BTree, indexed by timestamp... however, as you said, if I want to the second granularity, I will rarely have two items with the same key (which makes it pretty useless). I don't know why it is useless, but it is easily handled. It's not useless. I'm sorry, I have used the wrong word. I meant that a range query will normally involve the union of a higher number of sets as the granularity gets smaller and smaller. If there is only one item per index entry, the union operation will take longer... I assumed that the more BTree entries we have, the more buckets we will have to fetch from the DB, for a given range query. But I am probably wrong... So, I was wondering if there is some data structure I can use for this, as this seems to be a pretty common use case. That's why the various indexing(/catalog) schemes already support it. So, if I need an index where items can be queried by date, and range queries can be performed efficiently, an IOBTree will do the job? As I mention above, my only concern is the number of sets that will have to be joined. Thanks, once again, Pedro -- José Pedro Ferreira Indico Team IT-UDS-AVC 513-R-0042 CERN, Geneva, Switzerland ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Indexing and dates/times
If you use the timestamp as the key and you want to retrieve all values between two timestamps (inclusive), you can do my_btree.values(min=start, max=end) Yes, but, as I mentioned in my answer to Jim's mail, my concern is the performance of this range operation for a very large range. If you tell me it will be OK, I won't hesitate :). Regards, Pedro -- José Pedro Ferreira Indico Team IT-UDS-AVC 513-R-0042 CERN, Geneva, Switzerland ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Indexing and dates/times
On Tue, Jul 13, 2010 at 7:51 AM, Pedro Ferreira jose.pedro.ferre...@cern.ch wrote: ... Hmm... no, in fact the problem is that most of the time I will have only one value per index entry.So, in a relational DB i would do something like: As I mentioned in some text you didn't quote, you could use a strategy of storing a scalar unless there are dups. If dups are very rare, this *might* be a win. ... I'm sorry, I have used the wrong word. I meant that a range query will normally involve the union of a higher number of sets as the granularity gets smaller and smaller. If there is only one item per index entry, the union operation will take longer... I assumed that the more BTree entries we have, the more buckets we will have to fetch from the DB, for a given range query. But I am probably wrong... The BTrees package has a pretty efficient strategy for merging multiple sets. So, I was wondering if there is some data structure I can use for this, as this seems to be a pretty common use case. That's why the various indexing(/catalog) schemes already support it. So, if I need an index where items can be queried by date, and range queries can be performed efficiently, an IOBTree will do the job? An IOBTree could be the *basis* of a solution. As I mention above, my only concern is the number of sets that will have to be joined. Depending on how rare dups are, I'd either use a BTree who's values are scalers or sets. I might start with just using sets as the values. Typically the strategy is to assign each object you want to index an integer id. Then your index is of the form: {key - {docids}} Then you can do a lookup with something like: result = BTrees.IOBTree.multiunion(index.values(min,max)) For more code examples, you should look at the zope.index and zope.catalog packages. Jim -- Jim Fulton ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Indexing and dates/times
Hello all, I am currently trying to devise a way to index and retrieve some millions of objects according to their modification date/time. One of the problems I'm facing is that of index granularity: I'd like to provide to the second granularity, but for that I need some structure that lets me do that. So, the options I see are: - A timestamp-based BTree index - looks highly inefficient, as there will be many entries with only one element (probably almost all of them), and querying by range will be a nightmare (correct me if I'm wrong, please); - Checking the timestamp for every object and comparing it with the provided range (even worse); Maybe there is some other way to do it? Any suggestions? Thanks in advance, Regards, Pedro -- José Pedro Ferreira Indico Team IT-UDS-AVC 513-R-0042 CERN, Geneva, Switzerland ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Indexing and dates/times
On Mon, Jul 12, 2010 at 11:47 AM, Pedro Ferreira jose.pedro.ferre...@cern.ch wrote: Hello all, I am currently trying to devise a way to index and retrieve some millions of objects according to their modification date/time. One of the problems I'm facing is that of index granularity: I'd like to provide to the second granularity, will there ever be more than item with the same key? but for that I need some structure that lets me do that. So, the options I see are: - A timestamp-based What do you mean by timestamp BTree index - looks highly inefficient, as there will be many entries with only one element (probably almost all of them), I have no idea what you mean by this. Jim -- Jim Fulton ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev