Re: [ZODB-Dev] Indexing and dates/times

2010-07-13 Thread Jim Fulton
On Tue, Jul 13, 2010 at 4:35 AM, Pedro Ferreira
jose.pedro.ferre...@cern.ch wrote:
 Hello,

 I am currently trying to devise a way to index and retrieve some
 millions of objects according to their modification date/time. One of
 the problems I'm facing is that of index granularity: I'd like to
 provide to-the-second granularity,


 will there ever be more than one item with the same key?


 Exactly, that's the problem.

Typically, to model something like this, you'd have a BTree whose
values are sets.  If single items are common and you were willing to
work a bit harder, you could have BTrees whose values could be either a
set or a scalar.

 but for that I need some structure
 that lets me do that. So, the options I see are:
  - A timestamp-based


 What do you mean by timestamp?


 Well, it could be a UNIX timestamp.

It could be lots of things. I was asking what you meant.

If you used a Unix timestamp, you could use one of the Ix flavors of BTree.
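As a rough illustration of the Unix-timestamp idea: a modification time can be reduced to a whole number of seconds, which is the kind of key an integer-keyed BTree expects. The sketch below uses only the standard library; the helper name `to_key` is made up for this example, and it assumes the application stores timezone-aware datetimes.

```python
from datetime import datetime, timezone


def to_key(modified):
    """Return whole seconds since the Unix epoch for an aware datetime."""
    if modified.tzinfo is None:
        # Naive datetimes would be interpreted in local time; refuse them.
        raise ValueError("expected a timezone-aware datetime")
    # timestamp() returns a float; int() drops the sub-second part.
    return int(modified.timestamp())


key = to_key(datetime(2010, 7, 13, 8, 35, 0, tzinfo=timezone.utc))
print(key)
```

Keys in the I flavors (IOBTree, IIBTree) are 32-bit signed integers, so second-resolution timestamps fit there only until 2038; the L flavors (for example LOBTree) take 64-bit keys.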



 BTree index - looks highly inefficient, as there
 will be many entries with only one element (probably almost all of
 them),


 I have no idea what you mean by this.


 That's the problem you've already mentioned above.

So, the issue is that you have multiple items with the same
key. This is simply handled by using sets as values in a BTree.
There are existing index implementations that do this.
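A minimal sketch of the sets-as-values idea, with a plain `dict` and `set` standing in for a BTree and a tree set so that it runs without ZODB. The function names `index_doc` and `unindex_doc` are illustrative only and are not taken from any particular index implementation.

```python
def index_doc(index, key, docid):
    """Record docid under key, creating the set on first use."""
    index.setdefault(key, set()).add(docid)


def unindex_doc(index, key, docid):
    """Forget docid under key and drop the key once its set is empty."""
    docids = index.get(key)
    if docids is None:
        return
    docids.discard(docid)
    if not docids:
        del index[key]


index = {}
index_doc(index, 1279010100, 1)
index_doc(index, 1279010100, 2)   # a second object modified in the same second
index_doc(index, 1279010160, 3)
unindex_doc(index, 1279010160, 3)  # the last docid is gone, so the key goes too
print(index)
```

Dropping empty sets keeps the number of keys equal to the number of distinct timestamps that still have objects.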


 So, in a relational DB I would do something like:

 SELECT * FROM table WHERE timestamp >= X AND timestamp <= Y

 Since I cannot do this with ZODB,

I don't know what this is. Range searches? SQL? BTrees and various
index implementations based on them support range searches.  Of
course, ZODB doesn't support SQL.

 I'd have to have a BTree, indexed by
 timestamp... however, as you said, if I want to-the-second granularity, I
 will rarely have two items with the same key (which makes it pretty
 useless).

I don't know why it is useless, but it is easily handled.

 So, I was wondering if there is some data structure I can use for this, as
 this seems to be a pretty common use case.

That's why the various indexing(/catalog) schemes already support it.

Jim

--
Jim Fulton
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Indexing and dates/times

2010-07-13 Thread Benji York
On Tue, Jul 13, 2010 at 4:35 AM, Pedro Ferreira
jose.pedro.ferre...@cern.ch wrote:
 Hello,
 I am currently trying to devise a way to index and retrieve some
 millions of objects according to their modification date/time.

 So, in a relational DB I would do something like:

 SELECT * FROM table WHERE timestamp >= X AND timestamp <= Y

 Since I cannot do this with ZODB, I'd have to have a BTree, indexed by
 timestamp... however, as you said, if I want to-the-second
 granularity, I will rarely have two items with the same key (which makes
 it pretty useless).

If you use the timestamp as the key and you want to retrieve all values
between two timestamps (inclusive), you can do

my_btree.values(min=start, max=end)
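To show why an inclusive range lookup over sorted keys need not touch the whole index, here is a standard-library sketch in which a sorted list stands in for the tree. It is not how BTrees is implemented internally, and `values_between` is a hypothetical helper: two binary searches find the ends of the range, and only the entries in between are read.

```python
from bisect import bisect_left, bisect_right


def values_between(keys, values, lo=None, hi=None):
    """Return values whose keys satisfy lo <= key <= hi (None = unbounded).

    keys must be sorted ascending; values[i] belongs to keys[i].
    """
    start = 0 if lo is None else bisect_left(keys, lo)
    stop = len(keys) if hi is None else bisect_right(keys, hi)
    return values[start:stop]


keys = [100, 105, 110, 120]
values = ["a", "b", "c", "d"]
hits = values_between(keys, values, lo=105, hi=110)
print(hits)
```

In this sketch the cost is two searches plus one step per matching entry, so it depends on how many entries fall inside the range rather than on the total size of the index.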
-- 
Benji York


Re: [ZODB-Dev] Indexing and dates/times

2010-07-13 Thread Pedro Ferreira

 So, the issue is that you have multiple items with the same
 key. This is simply handled by using sets as values in a BTree.
 There are existing index implementations that do this.


Hmm... no, in fact the problem is that most of the time I will have only
one value per index entry.

 So, in a relational DB I would do something like:

 SELECT * FROM table WHERE timestamp >= X AND timestamp <= Y

 Since I cannot do this with ZODB,
  
 I don't know what this is. Range searches? SQL? BTrees and various
 index implementations based on them support range searches.  Of
 course, ZODB doesn't support SQL.


Yes, I know ZODB doesn't support SQL, I was just trying to demonstrate 
my use case.
What I meant was that in relational DBs a query like the one above can 
be performed over an arbitrary table, without the need for having an 
extra data structure for indexing.

 I'd have to have a BTree, indexed by
 timestamp... however, as you said, if I want to-the-second granularity, I
 will rarely have two items with the same key (which makes it pretty
 useless).
  
 I don't know why it is useless, but it is easily handled.



It's not useless. I'm sorry, I have used the wrong word. I meant that a 
range query will normally involve the union of a higher number of sets 
as the granularity gets smaller and smaller. If there is only one item 
per index entry, the union operation will take longer... I assumed that 
the more BTree entries we have, the more buckets we will have to fetch 
from the DB, for a given range query. But I am probably wrong...

 So, I was wondering if there is some data structure I can use for this, as
 this seems to be a pretty common use case.
  
 That's why the various indexing(/catalog) schemes already support it.


So, if I need an index where items can be queried by date, and range 
queries can be performed efficiently, an IOBTree will do the job? As I 
mention above, my only concern is the number of sets that will have to 
be joined.

Thanks, once again,

Pedro

-- 
José Pedro Ferreira

Indico Team

IT-UDS-AVC

513-R-0042
CERN, Geneva, Switzerland



Re: [ZODB-Dev] Indexing and dates/times

2010-07-13 Thread Pedro Ferreira

 If you use the timestamp as the key and you want to retrieve all values
 between two timestamps (inclusive), you can do

  my_btree.values(min=start, max=end)


Yes, but, as I mentioned in my answer to Jim's mail, my concern is the 
performance of this range operation for a very large range. If you 
tell me it will be OK, I won't hesitate :).

Regards,

Pedro

-- 
José Pedro Ferreira

Indico Team

IT-UDS-AVC

513-R-0042
CERN, Geneva, Switzerland



Re: [ZODB-Dev] Indexing and dates/times

2010-07-13 Thread Jim Fulton
On Tue, Jul 13, 2010 at 7:51 AM, Pedro Ferreira
jose.pedro.ferre...@cern.ch wrote:
...
 Hmm... no, in fact the problem is that most of the time I will have only one
 value per index entry. So, in a relational DB I would do something like:

As I mentioned in some text you didn't quote, you could use a strategy
of storing a scalar unless there are dups.  If dups are very rare,
this *might* be a win.
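The scalar-unless-duplicates strategy could look roughly like the sketch below. A plain `dict` stands in for the BTree, and `add` and `docids_for` are hypothetical names; the point is only that every reader and writer has to cope with both stored forms, which is the extra work the strategy costs.

```python
def add(index, key, docid):
    """Store a bare docid for unique keys; promote to a set on a duplicate."""
    current = index.get(key)
    if current is None:
        index[key] = docid
    elif isinstance(current, set):
        current.add(docid)
    elif current != docid:
        index[key] = {current, docid}


def docids_for(index, key):
    """Always return a set, whatever form is stored under key."""
    current = index.get(key)
    if current is None:
        return set()
    return set(current) if isinstance(current, set) else {current}


index = {}
add(index, 10, 1)
add(index, 20, 2)
add(index, 20, 3)   # duplicate key: the scalar 2 becomes the set {2, 3}
print(index)
```

Whether this saves anything depends on how rare duplicates really are; with frequent duplicates, plain sets are simpler.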

...

 I'm sorry, I have used the wrong word. I meant that a
 range query will normally involve the union of a higher number of sets as
 the granularity gets smaller and smaller. If there is only one item per
 index entry, the union operation will take longer... I assumed that the more
 BTree entries we have, the more buckets we will have to fetch from the DB,
 for a given range query. But I am probably wrong...

The BTrees package has a pretty efficient strategy for merging
multiple sets.


 So, I was wondering if there is some data structure I can use for this,
 as
 this seems to be a pretty common use case.


 That's why the various indexing(/catalog) schemes already support it.


 So, if I need an index where items can be queried by date, and range queries
 can be performed efficiently, an IOBTree will do the job?

An IOBTree could be the *basis* of a solution.

 As I mention
 above, my only concern is the number of sets that will have to be joined.

Depending on how rare dups are, I'd either use a BTree whose values
are scalars or sets. I might start with just using sets as the values.

Typically the strategy is to assign each object you want to index an
integer id.  Then your index is of the form:

  {key -> {docids}}

Then you can do a lookup with something like:

  result = BTrees.IOBTree.multiunion(index.values(min,max))

For more code examples, you should look at the zope.index and
zope.catalog packages.
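To make the shape of that lookup concrete, here is a standard-library stand-in: a `dict` of sets plays the role of the `{key -> {docids}}` index, and the union is computed by collecting every docid in the range, de-duplicating, and sorting. This is only an assumption-laden sketch of the idea, not a description of how `multiunion` is implemented, and `union_all` is a made-up name.

```python
from itertools import chain


def union_all(docid_sets):
    """Union any number of docid sets in one pass; returns a sorted list."""
    return sorted(set(chain.from_iterable(docid_sets)))


# {key -> {docids}}: timestamps 100..103, mostly one object per second
index = {100: {7}, 101: {3}, 102: {9, 4}, 103: {5}}
lo, hi = 101, 102

# A linear filter keeps the example short; a BTree would locate the
# range by key search instead of scanning every entry.
in_range = (docids for key, docids in sorted(index.items()) if lo <= key <= hi)
result = union_all(in_range)
print(result)
```

In this sketch the work grows with the total number of docids inside the range, so many one-element sets cost about the same to combine as a few large sets holding the same docids.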

Jim

--
Jim Fulton


[ZODB-Dev] Indexing and dates/times

2010-07-12 Thread Pedro Ferreira
Hello all,

I am currently trying to devise a way to index and retrieve some 
millions of objects according to their modification date/time. One of 
the problems I'm facing is that of index granularity: I'd like to
provide to-the-second granularity, but for that I need some structure
that lets me do that. So, the options I see are:
  - A timestamp-based BTree index - looks highly inefficient, as there
    will be many entries with only one element (probably almost all of
    them), and querying by range will be a nightmare (correct me if I'm
    wrong, please);
  - Checking the timestamp for every object and comparing it with the
    provided range (even worse);

Maybe there is some other way to do it? Any suggestions?

Thanks in advance,

Regards,

Pedro

-- 
José Pedro Ferreira

Indico Team

IT-UDS-AVC

513-R-0042
CERN, Geneva, Switzerland



Re: [ZODB-Dev] Indexing and dates/times

2010-07-12 Thread Jim Fulton
On Mon, Jul 12, 2010 at 11:47 AM, Pedro Ferreira
jose.pedro.ferre...@cern.ch wrote:
 Hello all,

 I am currently trying to devise a way to index and retrieve some
 millions of objects according to their modification date/time. One of
 the problems I'm facing is that of index granularity: I'd like to
 provide to-the-second granularity,

will there ever be more than one item with the same key?

 but for that I need some structure
 that lets me do that. So, the options I see are:
  - A timestamp-based

What do you mean by timestamp?

 BTree index - looks highly inefficient, as there
 will be many entries with only one element (probably almost all of
 them),

I have no idea what you mean by this.

Jim


-- 
Jim Fulton


Re: [ZODB-Dev] indexing and querying objects in the ZODB

2009-02-13 Thread Jim Fulton

On Feb 13, 2009, at 1:54 AM, Sebastian Wehrmann wrote:

 Hi,

 I finished my diploma thesis about indexing and querying objects [1]
 (only available in German - sorry) last year.

 With that thesis I developed a piece of software that can be
 used to query for objects within the ZODB with an XPath-like syntax
 (it's called regular path expressions). I released a very pre-alpha
 version some days ago on PyPI [2]. If you are interested, feel free
 to play with it and report your experiences to me. Please also report
 bugs to [3].

 If you have questions, please send me an email.

 Kind regards,
  Sebastian

 [1] http://archiv.tu-chemnitz.de/pub/2008/0081/data/diplomarbeit.pdf
 [2] http://pypi.python.org/pypi/gocept.objectquery/
 [3] https://bugs.launchpad.net/gocept.objectquery

This looks very cool.

Have you done any scale experiments?

Jim

--
Jim Fulton
Zope Corporation




Re: [ZODB-Dev] indexing and querying objects in the ZODB

2009-02-13 Thread Sebastian Wehrmann

Hi,

On Friday, Feb 13, at 9:30am +0100, Jim Fulton wrote:

Have you done any scale experiments?



Indeed, I've done tests with some binary trees with heights from 2
to 20. The results showed that filling the index structures
required much more time than the query itself (which was
expected, of course :) ).


For a tree with a height of e.g. 14 (roughly 32000 objects) it
took 20 seconds to fill the ZODB (including the index structures in
gocept.objectquery) and less than 2 seconds for some test queries.


Of course, the times rise exponentially with the height of the tree.
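A quick check of the numbers, assuming the test trees were full binary trees and that height is counted in edges from the root (neither is stated in the message):

```python
def full_binary_tree_nodes(height):
    """Number of nodes in a full binary tree whose height is counted in edges."""
    return 2 ** (height + 1) - 1


nodes_14 = full_binary_tree_nodes(14)
nodes_15 = full_binary_tree_nodes(15)
print(nodes_14, nodes_15)
```

Under those assumptions a height of 14 gives 32767 nodes, which matches the figure of roughly 32000 objects. Each extra level roughly doubles the object count, so times that grow exponentially with the height would be consistent with times that grow about linearly with the number of objects.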

However, the main development focus until now has been on stability, not
on performance. Performance can probably be improved with simple
code refactorings.


Basti
--
Sebastian Wehrmann · s...@gocept.com
gocept gmbh & co. kg · forsterstraße 29 · 06112 halle (saale) · germany
http://gocept.com · tel +49 345 1229889 12 · fax +49 345 1229889 1
Zope and Plone consulting and development





[ZODB-Dev] indexing and querying objects in the ZODB

2009-02-12 Thread Sebastian Wehrmann

Hi,

I finished my diploma thesis about indexing and querying objects [1]
(only available in German - sorry) last year.


With that thesis I developed a piece of software that can be used
to query for objects within the ZODB with an XPath-like syntax (it's
called regular path expressions). I released a very pre-alpha
version some days ago on PyPI [2]. If you are interested, feel free to
play with it and report your experiences to me. Please also report bugs
to [3].


If you have questions, please send me an email.

Kind regards,
  Sebastian

[1] http://archiv.tu-chemnitz.de/pub/2008/0081/data/diplomarbeit.pdf
[2] http://pypi.python.org/pypi/gocept.objectquery/
[3] https://bugs.launchpad.net/gocept.objectquery
--
Sebastian Wehrmann · s...@gocept.com
gocept gmbh & co. kg · forsterstraße 29 · 06112 halle (saale) · germany
http://gocept.com · tel +49 345 1229889 12 · fax +49 345 1229889 1
Zope and Plone consulting and development





Re: [ZODB-Dev] indexing

2008-07-24 Thread Tim Cook

On Wed, 2008-07-23 at 11:41 -0400, [EMAIL PROTECTED] wrote:

 I've found the ZODB website to be very disorganized and not nearly as
 helpful as repeated googlings. 

I have been collecting links and writing some new text for the new ZODB
website.  But alas there are only so many hours in the day.  :-)

I am currently running a workshop http://www.oshipworkshop.if.uff.br/
and I will get back onto the ZODB website project after August 1.

My apologies.

Cheers,
Tim



-- 
**
Join the OSHIP project.  It is the standards based, open source
healthcare application platform in Python.
Home page: https://launchpad.net/oship/ 
Wiki: http://www.openehr.org/wiki/display/dev/Python+developer%27s+page 
**




Re: [ZODB-Dev] indexing

2008-07-23 Thread AFoglia
Sean Allen [EMAIL PROTECTED] wrote on 07/22/2008 11:44:11 PM:

 googling and the main zodb website seem to take me to a bunch of what 
 appears to be severely dated websites,
 most not updated in at least 2 years ( some upwards of 6 ). this is 
 probably because i simply don't know what to google
 for, so, could someone be so kind as to shoot some links to different 
 indexing options for use with zodb/zeo?

I've just started playing around with ZODB myself.  We have a database of 
~650,000 items, with dates and multiple tags.  For indexing, I'm using the 
zc.catalog package from PyPI, with a value index for the dates and a set
index for the tags.  It seems good enough.  There were other indexing 
classes, but I don't remember any of them.

I've found the ZODB website to be very disorganized and not nearly as
helpful as repeated googlings.

-- 
Anthony Foglia
Princeton Consultants
(609) 987-8787 x233


[ZODB-Dev] indexing

2008-07-22 Thread Sean Allen
googling and the main zodb website seem to take me to a bunch of what  
appears to be severely dated websites,
most not updated in at least 2 years ( some upwards of 6 ). this is  
probably because i simply don't know what to google
for, so, could someone be so kind as to shoot some links to different  
indexing options for use with zodb/zeo?


much obliged.

-Sean-



[ZODB-Dev] indexing object in ZODB

2007-03-01 Thread shahrzad khorrami

Hi all,
What algorithm does ZODB use for indexing objects?

Is there any place where I can see the structure of ZODB?

Thanks a lot


RE: [ZODB-Dev] Indexing: Query Optimization

2005-05-03 Thread Tim Peters
[Thomas Guettler]
 I developed a simple index using ZODB. Searching for single values is
 very fast. Searching big ranges is slow.

 Example: Search:

 customer_id=0815
 date_start=2001-01-01
 date_end=2004-12-31

 The index holds a btree which maps values to docids. The search for
 customer_id is very fast (maybe 500 results). But the range search from
 date_start to date_end is slow (maybe 100,000 results).

 Up to now I use the intersection method of the BTree package to join the
 result with a logical AND.

 It would be better if the index would do the search for customer_id
 first, and then filter the result by deleting entries which are not in
 the given time period.

 Has anyone done such a Query Optimization? (Zope, Zope3, IndexCatalog,
 ...)

 Since btrees and btree ranges don't know their size, you need to do
 something like this:
 - do range searches at the end
 - if index foo is used, use it first, ...

Dieter Maurer wrote some packages I suspect you'd find interesting in this
respect:

http://mail.zope.org/pipermail/zope/2004-August/152627.html
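The ordering described in the quoted message (selective lookup first, range condition last) can be sketched as follows. The sketch assumes that a mapping from docid back to its date is available, which the original message does not state; `search`, `customer_index` and `doc_dates` are hypothetical names, and plain dicts stand in for BTrees.

```python
def search(customer_index, doc_dates, customer_id, date_start, date_end):
    """Selective lookup first, then filter that small result by date."""
    candidates = customer_index.get(customer_id, set())
    return {
        docid
        for docid in candidates
        if date_start <= doc_dates[docid] <= date_end
    }


customer_index = {"0815": {1, 2, 3}, "4711": {4}}
# ISO-formatted date strings compare correctly as plain strings.
doc_dates = {1: "2000-06-30", 2: "2001-01-01", 3: "2003-05-17", 4: "2002-02-02"}
hits = search(customer_index, doc_dates, "0815", "2001-01-01", "2004-12-31")
print(hits)
```

With about 500 candidates this performs about 500 date comparisons, instead of first materializing the roughly 100,000 docids in the date range and intersecting afterwards.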

