subject:"\[HACKERS\] Best way to scan on\-disk bitmaps"

Re: [HACKERS] Best way to scan on-disk bitmaps

2005-06-14 Thread Oleg Bartunov


On Mon, 13 Jun 2005, Tom Lane wrote:


Teodor Sigaev [EMAIL PROTECTED] wrote:

... So, if index is defined as 'using gist (a,b,c)' then, in
principle, GiST index can speed up queries like 'a=V1 and c=V2'.  But
it will not usable for queries ( b=V3 and c=V2 ). By the way, instead
of '=' operation may be used other operations. Number of supported
operations by GiST is indefinite unlike, for example, btree which
supported only five: , =, =, =, .


I have committed changes to the planner to arrange that a GiST indexscan
must supply at least one restriction clause for the first index column,
and can supply restriction clauses for any, all, or none of the
remaining columns; the old left-to-right heuristic is gone.

As far as I can tell, this doesn't require any changes to the GiST code,
but please take another look if you aren't too sure about it.


I did quick test and found no problem with gist(a,b,c) and index does used for 
(a,*,c) case




regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly

Re: [HACKERS] Best way to scan on-disk bitmaps

2005-06-13 Thread Tom Lane

Teodor Sigaev [EMAIL PROTECTED] wrote:
 ... So, if index is defined as 'using gist (a,b,c)' then, in
 principle, GiST index can speed up queries like 'a=V1 and c=V2'.  But
 it will not usable for queries ( b=V3 and c=V2 ). By the way, instead
 of '=' operation may be used other operations. Number of supported
 operations by GiST is indefinite unlike, for example, btree which
 supported only five: , =, =, =, .

I have committed changes to the planner to arrange that a GiST indexscan
must supply at least one restriction clause for the first index column,
and can supply restriction clauses for any, all, or none of the
remaining columns; the old left-to-right heuristic is gone.

As far as I can tell, this doesn't require any changes to the GiST code,
but please take another look if you aren't too sure about it.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] Best way to scan on-disk bitmaps

2005-05-16 Thread Greg Stark



Tom Lane [EMAIL PROTECTED] writes:

 I think it would be easy to change the planner and btree to handle
 this (where easy means I remember where all the skeletons are
 buried).  But I don't know the gist code hardly at all.  Can anyone
 offer an informed opinion on whether gist can handle this now, and
 if not what it would take to handle it?

Currently gist indexes only use the first column for page splits, making
multi-key gist indexes basically useless. The problem is it's hard to imagine
an API for a pickSplit function that could handle multi-key indexes with
disparate data types and operator classes. I had an idea of a new way to deal
with gist indexes that simplified the API and side-stepped the whole issue but
you raised concerns that it might be too limiting.

Unfortunately the mailing list archive seems to have missed this discussion.
I've attached three messages from the discussion at the time.

---BeginMessage---
Oleg  Teodor,
If I understand the code correctly, GiST will only pass the first 
attribute of each index tuple to the user-defined PickSplit method when 
it wants to split a node. (see circa line 1269 of gist.c)

Is this a wise design decision? Granted, in many situations the first 
attribute in the index is sufficient to make a reasonable decision about 
how to divide the node into two halves, but I don't think that is 
universally true. For example, consider a two column index whose first 
attribute has a small number of distinct values. It could well be that 
all the first attribute values in a node-to-be-split would be the same. 
Only passing the first attribute to PickSplit would result in an 
essentially random distribution of tuples among the split nodes, rather 
than allowing the GiST extension to make use of the second attribution 
to partition the nodes. That's an extreme example, but it is easy to 
construct more realistic scenarios (basically, any situation in which 
the cardinality of the first index attribute is low -- a reasonably 
common occurrence with a multi-column index, I believe).

I'm not sure whether this would be a problem in practice. Speculation: 
repeated invocations of PickSplit are one of the main factors in 
deriving the ultimate shape of the GiST tree. Poor distribution of keys 
resulting from PickSplit would eventually result in unnecessarily loose 
bounding predicates in internal nodes, which would hurt performance.

Comments welcome,
Neil
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

---End Message---
---BeginMessage---
Greg Stark [EMAIL PROTECTED] writes:

 I'm not sure that GiST indexes behave the same way as btree indexes for the
 multi-column case. 
 
 In a btree index the second column is entirely subordinate to the first
 column. In a GiST index the data is multi-dimensional, and all dimensions are
 equally important.

In fact on further consideration I do have a proposal.

If you look at the GiST implementations for various index types you'll see
that many (all?) take the same approach for PickSplit. In fact they pretty
much all have the same code copy/pasted to handle it.

The approach they take is to have a function which calculates an abstract
distance between any two entries. There's an algorithm that they use to pick
the split based on this distance function.

If you abandoned PickSplit and instead exposed this distance function as the
external API then the behaviour for multi-column indexes is clear. You
calculate the distance along all the axes and calculate the diagonal distance.

I think abandoning PickSplit for the distance function might also mean you
don't need a separate function for Penalty either, but I'm not sure on that.

-- 
greg



---End Message---
---BeginMessage---
Greg Stark [EMAIL PROTECTED] writes:
 The approach they take is to have a function which calculates an
 abstract distance between any two entries. There's an algorithm that
 they use to pick the split based on this distance function.

 If you abandoned PickSplit and instead exposed this distance
 function as the external API then the behaviour for multi-column
 indexes is clear. You calculate the distance along all the axes and
 calculate the diagonal distance.

Hmm ... the problem with that is the assumption that different opclasses
will compute similarly-scaled distances.  If opclass A generates
distances in the range (0,1e6) while B generates in the range (0,1),
combining them with Euclidean distance won't work well at all.  OTOH you
can't blindly normalize, because in some cases maybe the data is such
that a massive difference in distances is truly appropriate.

I'm also a bit leery of the assumption that every GiST application can
reduce its PickSplit logic to Euclidean distances.

regards, tom lane


---End Message---



 (hash and rtree are not at issue since they don't support multi-key
 indexes.)

It seems like it would be trivial to make

Re: [HACKERS] Best way to scan on-disk bitmaps

2005-05-16 Thread Teodor Sigaev

About page splitting algorithm in GiST in multikey case. For the beginning, page 
is splitted by calling pickSplit method of key of first column (pickSplit method 
is defined for opclass and it is a user function), then it try to find equal 
values  of first column in left and right pages ( gist.c lines 1264-1901 ). If 
it is, then GiST core will try to resort tuples with first equal keys between 
left and right pages using penalty method for second and higher column's key. If 
it's not, it leave pages untouched. But unions for parent page of second and 
higher column's keys will be formed.

So, if index is defined as 'using gist (a,b,c)' then, in principle, GiST index 
can speed up queries like 'a=V1 and c=V2'.  But it will not usable for queries 
( b=V3 and c=V2 ). By the way, instead of '=' operation may be used other 
operations. Number of supported operations by GiST is indefinite unlike, for 
example, btree which supported only five: , =, =, =, .


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match

Re: [HACKERS] Best way to scan on-disk bitmaps

2005-05-16 Thread Bruce Momjian


If people have GIST TODOs, please post them.

---

Teodor Sigaev wrote:
 About page splitting algorithm in GiST in multikey case. For the beginning, 
 page 
 is splitted by calling pickSplit method of key of first column (pickSplit 
 method 
 is defined for opclass and it is a user function), then it try to find equal 
 values  of first column in left and right pages ( gist.c lines 1264-1901 ). 
 If 
 it is, then GiST core will try to resort tuples with first equal keys between 
 left and right pages using penalty method for second and higher column's key. 
 If 
 it's not, it leave pages untouched. But unions for parent page of second and 
 higher column's keys will be formed.
 
 So, if index is defined as 'using gist (a,b,c)' then, in principle, GiST 
 index 
 can speed up queries like 'a=V1 and c=V2'.  But it will not usable for 
 queries 
 ( b=V3 and c=V2 ). By the way, instead of '=' operation may be used other 
 operations. Number of supported operations by GiST is indefinite unlike, for 
 example, btree which supported only five: , =, =, =, .
 
 
 
 -- 
 Teodor Sigaev   E-mail: [EMAIL PROTECTED]
 WWW: http://www.sigaev.ru/
 
 ---(end of broadcast)---
 TIP 9: the planner will ignore your desire to choose an index scan if your
   joining column's datatypes do not match
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match

Re: [HACKERS] Best way to scan on-disk bitmaps

2005-05-16 Thread Christopher Kings-Lynne


Bruce Momjian wrote:
If people have GIST TODOs, please post them.
Concurrency :)
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faq

Re: [HACKERS] Best way to scan on-disk bitmaps

2005-05-16 Thread Alvaro Herrera

On Tue, May 17, 2005 at 09:39:20AM +0800, Christopher Kings-Lynne wrote:
 
 
 Bruce Momjian wrote:
 If people have GIST TODOs, please post them.
 
 Concurrency :)

And WAL support.

-- 
Alvaro Herrera (alvherre[a]surnet.cl)
No necesitamos banderas
 No reconocemos fronteras  (Jorge González)

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match

Re: [HACKERS] Best way to scan on-disk bitmaps

2005-05-16 Thread Bruce Momjian

Alvaro Herrera wrote:
 On Tue, May 17, 2005 at 09:39:20AM +0800, Christopher Kings-Lynne wrote:
  
  
  Bruce Momjian wrote:
  If people have GIST TODOs, please post them.
  
  Concurrency :)
 
 And WAL support.

Already there:

* Add WAL index reliability improvement to non-btree indexes

and this too:

* Add concurrency to GIST

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 8: explain analyze is your friend

Re: [HACKERS] Best way to scan on-disk bitmaps

2005-05-15 Thread Greg Stark

Bruce Momjian pgman@candle.pha.pa.us writes:

  Hmm.  That particular case will work, but the planner believes that only
  consecutive columns in the index are usable --- that is, if you have
  quals for a and c but not for b, it will think that the condition for c
  isn't usable with the index.  This is true for btree and gist indexes,
  so I suppose we'd need to introduce a pg_am column that tells what to
  do.
 
 We do have a TODO for this:
 
 * Use index to restrict rows returned by multi-key index when used with
   non-consecutive keys to reduce heap accesses
 
   For an index on col1,col2,col3, and a WHERE clause of col1 = 5 and
   col3 = 9, spin though the index checking for col1 and col3 matches,
   rather than just col1; also called skip-scanning.

That TODO is something else. 

Though it is related in that it is another example of why the existing code is
too simplistic and will eventually need to be enhanced. Not only would bitmap
indexes and (possibly) gist indexes, but even btree indexes would need to do
so if this TODO were implemented.

-- 
greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Best way to scan on-disk bitmaps

2005-05-15 Thread Manfred Koizar

On Thu, 12 May 2005 17:40:06 -0400, Tom Lane [EMAIL PROTECTED]
wrote:
the planner believes that only
consecutive columns in the index are usable --- that is, if you have
quals for a and c but not for b, it will think that the condition for c
isn't usable with the index.  This is true for btree [...]

It's not difficult to setup a test case where an index is used, but
only with a=42 as an index condition, and c='foo' is applied as a
filter condition.  Now add a redundant condition on b
... AND b BETWEEN minb AND maxb ...
and watch how c='foo' moves into the index condition.

I did this test some time ago and I believe that adding the condition
on b did not change the estimated cost, only the actual execution
time.

Servus
 Manfred


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] Best way to scan on-disk bitmaps

2005-05-15 Thread Tom Lane

Greg Stark [EMAIL PROTECTED] writes:
 Bruce Momjian pgman@candle.pha.pa.us writes:
 Hmm.  That particular case will work, but the planner believes that only
 consecutive columns in the index are usable --- that is, if you have
 quals for a and c but not for b, it will think that the condition for c
 isn't usable with the index.

 We do have a TODO for this:
 
 * Use index to restrict rows returned by multi-key index when used with
 non-consecutive keys to reduce heap accesses

 That TODO is something else. 

No, I think Bruce is right --- it's essentially the same thing.  It
certainly would be a good idea to change btrees to work like that,
if we are going to modify the planner to relax the restriction for
other index types.

I think it would be easy to change the planner and btree to handle
this (where easy means I remember where all the skeletons are
buried).  But I don't know the gist code hardly at all.  Can anyone
offer an informed opinion on whether gist can handle this now, and
if not what it would take to handle it?

(hash and rtree are not at issue since they don't support multi-key
indexes.)

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] Best way to scan on-disk bitmaps

2005-05-15 Thread Oleg Bartunov

On Sun, 15 May 2005, Tom Lane wrote:
Greg Stark [EMAIL PROTECTED] writes:
Bruce Momjian pgman@candle.pha.pa.us writes:
Hmm.  That particular case will work, but the planner believes that only
consecutive columns in the index are usable --- that is, if you have
quals for a and c but not for b, it will think that the condition for c
isn't usable with the index.
We do have a TODO for this:
* Use index to restrict rows returned by multi-key index when used with
non-consecutive keys to reduce heap accesses

That TODO is something else.
No, I think Bruce is right --- it's essentially the same thing.  It
certainly would be a good idea to change btrees to work like that,
if we are going to modify the planner to relax the restriction for
other index types.
I think it would be easy to change the planner and btree to handle
this (where easy means I remember where all the skeletons are
buried).  But I don't know the gist code hardly at all.  Can anyone
offer an informed opinion on whether gist can handle this now, and
if not what it would take to handle it?
I think that handling this in GiST is depends solely on how users consistent 
function designed to handle NULLs in keys. Other words, it should works as 
soon as users consistent function know what to do with NULLs in internal keys.

Teodor will comment multi-key GiST tomorrow.
We used Paul Aoki paper Generalizing ''Search'' in Generalized Search Trees,
(available from 
http://www.sai.msu.su/~megera/postgres/gist/papers/csd-97-950.pdf )
for implementation of multi-key GiST index support. It's true, that first
key is used for splitting, but elements with duplicated first key could
be shuffled to get better clustering on second key.
(hash and rtree are not at issue since they don't support multi-key
indexes.)
regards, tom lane
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org

Re: [HACKERS] Best way to scan on-disk bitmaps

2005-05-14 Thread Bruce Momjian

Tom Lane wrote:
 Victor Y. Yegorov [EMAIL PROTECTED] writes:
  If I have on-disk bitmap
  ON (a, b, c)
  will the planner pick an index scan for
  WHERE a = 42 AND b = 'foo'
  (i.e. only part of the index attributes are involved)? Any modifications
  needed to achieve this functionality?
 
 Hmm.  That particular case will work, but the planner believes that only
 consecutive columns in the index are usable --- that is, if you have
 quals for a and c but not for b, it will think that the condition for c
 isn't usable with the index.  This is true for btree and gist indexes,
 so I suppose we'd need to introduce a pg_am column that tells what to
 do.

We do have a TODO for this:

* Use index to restrict rows returned by multi-key index when used with
  non-consecutive keys to reduce heap accesses

  For an index on col1,col2,col3, and a WHERE clause of col1 = 5 and
  col3 = 9, spin though the index checking for col1 and col3 matches,
  rather than just col1; also called skip-scanning.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match

Re: [HACKERS] Best way to scan on-disk bitmaps

2005-05-13 Thread Tom Lane

Greg Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 Plan B would be to remove that restriction and teach btree and gist to
 cope.  While a btree couldn't use a nonconsecutive restriction as part
 of its where-to-scan logic, I don't see any good reason why it couldn't
 still perform the test before returning the TID, thus possibly saving a
 trip to the heap.

 [ snip ]

 In this model the columns listed in the gist index are unordered. Any subset
 of columns can be used to perform an index lookup. Making it more like the
 bitmap index behaviour you're looking at than the btree index behaviour.

I thought some more about this since sending my earlier message.  As far
as I can recall at the moment, there really isn't anything fundamental
that depends on the consecutive-columns rule.  The one place where the
rubber meets the road is in the index cost estimation functions: if we
were to relax that rule, then btcostestimate would have to be taught to
include only the consecutive columns when estimating how much of a btree
index is going to be touched.

And more than that: if you've studied the btree code at all, you realize
that that's only an incomplete heuristic anyway.  For instance, if the
leading key is a  xxx, second keys like b  yyy and b  yyy act
completely differently in terms of indexscan cost, but btcostestimate
doesn't presently know that.

I wonder if we shouldn't migrate the amcostestimate functions into the
individual index AMs (which would mean adding a column to pg_am, but so
what).  btcostestimate could be much less phony about this if it had
access to the same infrastructure that _bt_first uses to examine the
index clauses.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend

[HACKERS] Best way to scan on-disk bitmaps

2005-05-12 Thread Victor Y. Yegorov

Greetings.

I have questions on how to implement on-disk bitmap scan.


I've been working on on-disk bitmaps as an ordinary Index Access Method, but
now it's clear to me, that it'll lose all it's strengths this way. One on-disk
bitmap has exactly one list of indexed table's TIDs and potentially unlimited
number of bitmaps (number of index attributes multiplied by attribute's
cardinality, to be precise).

So, for better performance, one should first retrieve all the needed bitmaps
from the index, then join all bitmaps according to the query clauses and, as
the last phase, retrieve TIDs from index, that matches final bitmap.

According to the docs the index access method is responsible for
regurgitating the TIDs ..., but for on-disk bitmaps index scan is devided
into 3 phases. So, to perform the scan in phases, to my mind, executor
should be involved. (I'd like to mention again, that this is the first time
I got so deep inside postgres code).

I wanted to use Tom's nodeBitmap* stuff, but it's problematic a bit. Bitmaps
in nodeBitmap* are built upon list of TIDs retrieved during relation scan.
For on-disk bitmap indexes, there's no need for that, as all bitmaps are
already inside the index.

The question is: Is it possible to extend nodeBitmap functionality in such a
way, that Executor can either build bitmap after list of TIDs, obtained from
RelationScan, or ask index access method to give bitmaps it contain (and TIDs
at given position in the map later)?
This will, probably, require more functions in the pg_am catalog.

Or should I create a completely new node for on-disk bitmaps?


-- 

Victor Y. Yegorov

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] Best way to scan on-disk bitmaps

2005-05-12 Thread Tom Lane

Victor Y. Yegorov [EMAIL PROTECTED] writes:
 I have questions on how to implement on-disk bitmap scan.

I think your best plan might be

1. Be sure that all the indexable WHERE conditions are passed to the
   indexscan as indexquals.  This might be, say,
WHERE a = 42 and b = 'foo'

2. Within the index AM, form the AND of the relevant bitmaps (here the
   ones for a = 42 and b = 'foo').

3. Within the index AM, pick up the TIDs for the remaining one-bits,
   and pass them back.

4. Let the existing machinery handle the OR-ing problem as well as
   actual fetching of the heap rows.

This can be done without any restructuring of the index AM API.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] Best way to scan on-disk bitmaps

2005-05-12 Thread Victor Y. Yegorov

* Tom Lane [EMAIL PROTECTED] [12.05.2005 23:09]:
 1. Be sure that all the indexable WHERE conditions are passed to the
indexscan as indexquals.  This might be, say,
   WHERE a = 42 and b = 'foo'

If I have on-disk bitmap
ON (a, b, c)
will the planner pick an index scan for
WHERE a = 42 AND b = 'foo'
(i.e. only part of the index attributes are involved)? Any modifications
needed to achieve this functionality?

To my mind, bitmap scan even for 1 attribute of a multi-column index would be
a win, though I haven't tested this yet.


-- 

Victor Y. Yegorov

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match

Re: [HACKERS] Best way to scan on-disk bitmaps

2005-05-12 Thread Tom Lane

Victor Y. Yegorov [EMAIL PROTECTED] writes:
 If I have on-disk bitmap
   ON (a, b, c)
 will the planner pick an index scan for
   WHERE a = 42 AND b = 'foo'
 (i.e. only part of the index attributes are involved)? Any modifications
 needed to achieve this functionality?

Hmm.  That particular case will work, but the planner believes that only
consecutive columns in the index are usable --- that is, if you have
quals for a and c but not for b, it will think that the condition for c
isn't usable with the index.  This is true for btree and gist indexes,
so I suppose we'd need to introduce a pg_am column that tells what to
do.

[ thinks some more ... ]

Plan B would be to remove that restriction and teach btree and gist to
cope.  While a btree couldn't use a nonconsecutive restriction as part
of its where-to-scan logic, I don't see any good reason why it couldn't
still perform the test before returning the TID, thus possibly saving a
trip to the heap.  Offhand it seems this should be true of gist as well,
but I don't know that code well enough to be sure.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

Re: [HACKERS] Best way to scan on-disk bitmaps

2005-05-12 Thread Greg Stark


Tom Lane [EMAIL PROTECTED] writes:

 Plan B would be to remove that restriction and teach btree and gist to
 cope.  While a btree couldn't use a nonconsecutive restriction as part
 of its where-to-scan logic, I don't see any good reason why it couldn't
 still perform the test before returning the TID, thus possibly saving a
 trip to the heap.  Offhand it seems this should be true of gist as well,
 but I don't know that code well enough to be sure.

Not long ago there was some discussion about how gist indexes don't really
handle multicolumn indexes usefully currently. They use only the first column
to determine page splits. The discussion wandered and it became clear that it
wasn't even clear what a multicolumn gist index should mean.

I suggested treating each column as independent axes. Independently ask each
column's datatype for the distance value and then calculate the inner
product as a kind of geometric n-dimensional distance. There was some
resistance since this limits gist indexes to always basing page splits on a
single distance based algorithm. (Though all current gist index methods in
the postgres source tree do work this way, mostly with copy-pasted code in
fact.)

In this model the columns listed in the gist index are unordered. Any subset
of columns can be used to perform an index lookup. Making it more like the
bitmap index behaviour you're looking at than the btree index behaviour.

-- 
greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Best way to scan on-disk bitmaps

Re: [HACKERS] Best way to scan on-disk bitmaps

Re: [HACKERS] Best way to scan on-disk bitmaps

Re: [HACKERS] Best way to scan on-disk bitmaps

Re: [HACKERS] Best way to scan on-disk bitmaps

Re: [HACKERS] Best way to scan on-disk bitmaps

Re: [HACKERS] Best way to scan on-disk bitmaps

Re: [HACKERS] Best way to scan on-disk bitmaps

Re: [HACKERS] Best way to scan on-disk bitmaps

Re: [HACKERS] Best way to scan on-disk bitmaps

Re: [HACKERS] Best way to scan on-disk bitmaps

Re: [HACKERS] Best way to scan on-disk bitmaps

Re: [HACKERS] Best way to scan on-disk bitmaps

Re: [HACKERS] Best way to scan on-disk bitmaps

[HACKERS] Best way to scan on-disk bitmaps

Re: [HACKERS] Best way to scan on-disk bitmaps

Re: [HACKERS] Best way to scan on-disk bitmaps

Re: [HACKERS] Best way to scan on-disk bitmaps

Re: [HACKERS] Best way to scan on-disk bitmaps

19 matches

Site Navigation

Mail list logo

Footer information