Re: [ZODB-Dev] Indexing: Query Optimization

2005-05-09 Thread Christian Robottom Reis
On Mon, May 09, 2005 at 05:28:05PM +0200, Thomas Guettler wrote:
> > > Since btrees and btree ranges don't know their size, you need to do
> > > something like this:
> > > - do range searches at the end
> > > - if index "foo" is used, use it first, ...
> >
> > Hmmm. Can you elaborate on this? I don't quite grasp what you mean.
> 
> If you want the intersection (AND) of some sets, you can increase the speed
> if you take the the smallest set and check for each entry if it is a member 
> of 
> the other sets. If you do take the large set first, you need to do more 
> checks for membership.
> 
> The applhe large set first, you need to do more 
> checks for membership.
> 
> The application developer might know that a search in index "foo" results in 
> small results. That's what I meant with "if index 'foo' is used, use it 
> first".

Ah, yes, this makes lots of sense. Hardcoding these heuristics isn't
very nice, though -- at least I don't know how to do it beyond using a
learning query optimizer, or telling the user to reorder his query: for
instance, in IC, it is faster to query through a small catalog and AND
that with a large catalog than the reverse. Having the user alter the
order of the query is a hack but it allows for situation-specific
optimization, which in many contexts is useful.

We do a little trick that avoids even touching the larger catalog (and
the largish sets that are stored in their keys) when doing ANDed joins:
if the number of items in the smaller catalog is below a certain magic
number, we synthesize a temporary index with just those values, querying
through there. This surprisingly gives great results.

However, then I question the general validity of the phrase:

> > > - do range searches at the end

Because in the case of our [business-oriented] applications,
double-bounded range searches (A < x < B) are among the fastest, thanks
to BTree index operations.

--
Christian Robottom Reis | http://async.com.br/~kiko/ | [+55 16] 3376 0125
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Indexing: Query Optimization

2005-05-09 Thread Thomas Guettler
Am Donnerstag, 5. Mai 2005 19:18 schrieb Christian Robottom Reis:
> On Tue, May 03, 2005 at 02:44:58PM +0200, Thomas Guettler wrote:
> > It would be better if the index would do the search for customer_id
> > first, and then filter the result by deleting entries which are not in
> > the given time period.
> >
> > Has anyone done such a "Query Optimization"? (Zope, Zope3, IndexCatalog,
> > ...)
>
> I have done it, but the way we do it in IC is synthesize an index
> on-the-fly and then do the range query through it. The code is in
> Indexes.py:build_temp_index; it's a nasty little hack.
>
> > Since btrees and btree ranges don't know their size, you need to do
> > something like this:
> > - do range searches at the end
> > - if index "foo" is used, use it first, ...
>
> Hmmm. Can you elaborate on this? I don't quite grasp what you mean.

If you want the intersection (AND) of some sets, you can increase the speed
if you take the the smallest set and check for each entry if it is a member of 
the other sets. If you do take the large set first, you need to do more 
checks for membership.

The application developer might know that a search in index "foo" results in 
small results. That's what I meant with "if index 'foo' is used, use it 
first".

 Thomas




sets
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Indexing: Query Optimization

2005-05-05 Thread Christian Robottom Reis
On Tue, May 03, 2005 at 02:44:58PM +0200, Thomas Guettler wrote:
> It would be better if the index would do the search for customer_id
> first, and then filter the result by deleting entries which are not in
> the given time period.
> 
> Has anyone done such a "Query Optimization"? (Zope, Zope3, IndexCatalog, ...)

I have done it, but the way we do it in IC is synthesize an index
on-the-fly and then do the range query through it. The code is in
Indexes.py:build_temp_index; it's a nasty little hack.

> Since btrees and btree ranges don't know their size, you need to do something 
> like this:
> - do range searches at the end
> - if index "foo" is used, use it first, ...

Hmmm. Can you elaborate on this? I don't quite grasp what you mean.

I think the only way to implement this correctly is using a different
data structure -- one that allows you to slice and dice your data in
different axis. I may be short-sighted, though.

Take care,
--
Christian Robottom Reis | http://async.com.br/~kiko/ | [+55 16] 3361 2331
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Indexing: Query Optimization

2005-05-04 Thread Dieter Maurer
Thomas Guettler wrote at 2005-5-3 14:44 +0200:
> ...
>I developed a simple index using ZODB. Searching for single values is
>very fast. Searching big ranges is slow.
>
>Example: 
>Search: 
>
>customer_id=0815
>date_start=2001-01-01
>date_end=2004-12-31
>
>The index holds a btree which maps values to docids. The search
>for customer_id is very fast (Maybe 500 results) . But the range search from 
>date_start to date_end is slow (Maybe 100.000 results).
>
>Up to now I use the intersection method of the BTree package to
>join the result with a logical AND.
>
>It would be better if the index would do the search for customer_id
>first, and then filter the result by deleting entries which are not in
>the given time period.

"IncrementalSearch" does it quite intelligently (although not
in your suggested way).

Unfortunately, it is not yet implemented in "C"
and therefore cannot yet beat "multiunion" (which you should use for unions
of more than a few sets). However, for highly specific "and"
queries, it is already faster than traditional search
even when a large "or" query is involved (which uses "multiunion";
it is several orders of magnitude faster, if the "or" uses
iterated "union"s rather than "multiunion").

You find "IncrementalSearch" at

  

I work on a C implementation. But, this is a lot of work
and therefore will take time...

-- 
Dieter
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


RE: [ZODB-Dev] Indexing: Query Optimization

2005-05-03 Thread Tim Peters
[Thomas Guettler]
> I developed a simple index using ZODB. Searching for single values is
> very fast. Searching big ranges is slow.
>
> Example: Search:
>
> customer_id=0815
> date_start=2001-01-01
> date_end=2004-12-31
>
> The index holds a btree which maps values to docids. The search for
> customer_id is very fast (Maybe 500 results) . But the range search from
> date_start to date_end is slow (Maybe 100.000 results).
>
> Up to now I use the intersection method of the BTree package to join the
> result with a logical AND.
>
> It would be better if the index would do the search for customer_id
> first, and then filter the result by deleting entries which are not in
> the given time period.
>
> Has anyone done such a "Query Optimization"? (Zope, Zope3, IndexCatalog,
> ...)
>
> Since btrees and btree ranges don't know their size, you need to do
> something like this:
> - do range searches at the end
> - if index "foo" is used, use it first, ...

Dieter Maurer wrote some packages I suspect you'd find interesting in this
respect:

http://mail.zope.org/pipermail/zope/2004-August/152627.html


___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


[ZODB-Dev] Indexing: Query Optimization

2005-05-03 Thread Thomas Guettler
Hi,

I developed a simple index using ZODB. Searching for single values is
very fast. Searching big ranges is slow.

Example: 
Search: 

customer_id=0815
date_start=2001-01-01
date_end=2004-12-31

The index holds a btree which maps values to docids. The search
for customer_id is very fast (Maybe 500 results) . But the range search from 
date_start to date_end is slow (Maybe 100.000 results).

Up to now I use the intersection method of the BTree package to
join the result with a logical AND.

It would be better if the index would do the search for customer_id
first, and then filter the result by deleting entries which are not in
the given time period.

Has anyone done such a "Query Optimization"? (Zope, Zope3, IndexCatalog, ...)

Since btrees and btree ranges don't know their size, you need to do something 
like this:
- do range searches at the end
- if index "foo" is used, use it first, ...

 Thomas

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev