subject:"\[HACKERS\] CLUSTER and indisclustered"

Re: [HACKERS] CLUSTER and indisclustered

2002-08-13 Thread Hannu Krosing


On Tue, 2002-08-13 at 09:25, Bruce Momjian wrote:
 
 There is a web page about star joins used a lot in data warehousing,
 where you don't know what queries are going to be required and what
 indexes to create:
 
   http://www.dbdomain.com/a100397.htm
 
 They show some sample queries, which is good.  Here is some
 interesting text:
 
   Star Transformation
 
   If there are bitmap indexes on SALES_REP_ID, PRODUCT_ID, and
   DEPARTMENT_ID in the SALES table, then Oracle can resolve the query
   using merges of the bitmap indexes.
   
   Because Oracle can efficiently merge multiple bitmap indexes, you can 
   create a single bitmap index on each of the foreign-key columns in the
   fact table rather than on every possible combination of columns.

Another way to achive the similar result would be using segmented hash
indexes, where each column maps directly to some part of hash value.

 This
   lets you support all possible combinations of dimensions without
   creating an unreasonable number of indexes.

---
Hannu


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] CLUSTER and indisclustered

2002-08-12 Thread Bruce Momjian



I wanted to comment on this bitmapped index discussion because I am
hearing a lot about star joins, data warehousing, and bitmapped indexes
recently.

It seems we have several uses for bitmapped indexes:

Do index lookups in sequential heap order
Allow joining of bitmapped indexes to construct arbitrary indexes

There is a web page about star joins used a lot in data warehousing,
where you don't know what queries are going to be required and what
indexes to create:

http://www.dbdomain.com/a100397.htm

They show some sample queries, which is good.  Here is some
interesting text:

Star Transformation

If there are bitmap indexes on SALES_REP_ID, PRODUCT_ID, and
DEPARTMENT_ID in the SALES table, then Oracle can resolve the query
using merges of the bitmap indexes.

Because Oracle can efficiently merge multiple bitmap indexes, you can 
create a single bitmap index on each of the foreign-key columns in the
fact table rather than on every possible combination of columns. This
lets you support all possible combinations of dimensions without
creating an unreasonable number of indexes.

Added to TODO:

* Use bitmaps to fetch heap pages in sequential order [performance] 
* Use bitmaps to combine existing indexes [performance]

and I will add some of these emails to TODO.detail/performance.

---

Tom Lane wrote:
 Curt Sampson [EMAIL PROTECTED] writes:
  But after doing some benchmarking of various sorts of random reads
  and writes, it occurred to me that there might be optimizations
  that could help a lot with this sort of thing. What if, when we've
  got an index block with a bunch of entries, instead of doing the
  reads in the order of the entries, we do them in the order of the
  blocks the entries point to?
 
 I thought to myself didn't I just post something about that?
 and then realized it was on a different mailing list.  Here ya go
 (and no, this is not the first time around on this list either...)
 
 
 I am currently thinking that bitmap indexes per se are not all that
 interesting.  What does interest me is bitmapped index lookup, which
 came back into mind after hearing Ann Harrison describe how FireBird/
 InterBase does it.
 
 The idea is that you don't scan the index and base table concurrently
 as we presently do it.  Instead, you scan the index and make a list
 of the TIDs of the table tuples you need to visit.  This list can
 be conveniently represented as a sparse bitmap.  After you've finished
 looking at the index, you visit all the required table tuples *in
 physical order* using the bitmap.  This eliminates multiple fetches
 of the same heap page, and can possibly let you get some win from
 sequential access.
 
 Once you have built this mechanism, you can then move on to using
 multiple indexes in interesting ways: you can do several indexscans
 in one query and then AND or OR their bitmaps before doing the heap
 scan.  This would allow, for example, WHERE a = foo and b = bar
 to be handled by ANDing results from separate indexes on the a and b
 columns, rather than having to choose only one index to use as we do
 now.
 
 Some thoughts about implementation: FireBird's implementation seems
 to depend on an assumption about a fixed number of tuple pointers
 per page.  We don't have that, but we could probably get away with
 just allocating BLCKSZ/sizeof(HeapTupleHeaderData) bits per page.
 Also, the main downside of this approach is that the bitmap could
 get large --- but you could have some logic that causes you to fall
 back to plain sequential scan if you get too many index hits.  (It's
 interesting to think of this as lossy compression of the bitmap...
 which leads to the idea of only being fuzzy in limited areas of the
 bitmap, rather than losing all the information you have.)
 
 A possibly nasty issue is that lazy VACUUM has some assumptions in it
 about indexscans holding pins on index pages --- that's what prevents
 it from removing heap tuples that a concurrent indexscan is just about
 to visit.  It might be that there is no problem: even if lazy VACUUM
 removes a heap tuple and someone else then installs a new tuple in that
 same TID slot, you should be okay because the new tuple is too new to
 pass your visibility test.  But I'm not convinced this is safe.
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 6: Have you searched our list archives?
 
 http://archives.postgresql.org
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---

Re: [HACKERS] CLUSTER and indisclustered

2002-08-08 Thread Alvaro Herrera


Tom Lane dijo: 

 Bruce Momjian [EMAIL PROTECTED] writes:
  Tom, should we be updating that flag after we CLUSTER instead of
  requiring an ANALYZE after the CLUSTER?
 
 Could do that I suppose, but I'm not super-excited about it.  ANALYZE is
 quite cheap these days (especially in comparison to CLUSTER ;-)).  I'd
 settle for a note in the CLUSTER docs that recommends a subsequent
 ANALYZE --- this seems no different from recommending ANALYZE after bulk
 data load or other major update of a table.

What if I [try to] extend the grammar to support an additional ANALYZE
in CLUSTER, so that it analyzes the table automatically? Say

CLUSTER index ON table [ANALYZE];

Or maybe just do an analyze of the table automatically after the
CLUSTERing.

What does everybody think?

-- 
Alvaro Herrera (alvherre[a]atentus.com)
Para tener mas hay que desear menos


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] CLUSTER and indisclustered

2002-08-08 Thread Neil Conway


Alvaro Herrera [EMAIL PROTECTED] writes:
 What if I [try to] extend the grammar to support an additional ANALYZE
 in CLUSTER, so that it analyzes the table automatically?

I don't like this -- it seems like bloat. What's the advantage of

CLUSTER foo ON bar ANALYZE;

over

CLUSTER foo ON bar;
ANALYZE;

 Or maybe just do an analyze of the table automatically after the
 CLUSTERing.

Hmmm... I don't really see the problem with adding a note in the docs
suggesting that users following a CLUSTER with an ANALYZE (of course,
that assumes that the CLUSTER will significantly change the ordering
of the data in the table, which isn't always the case -- which is
another reason why make this automatic seems unwarranted, IMHO). It
seems like you're looking for a solution to a non-existent problem.

Cheers,

Neil

-- 
Neil Conway [EMAIL PROTECTED]
PGP Key ID: DB3C29FC


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Re: [HACKERS] CLUSTER and indisclustered

2002-08-08 Thread Alvaro Herrera


Neil Conway dijo: 

 Alvaro Herrera [EMAIL PROTECTED] writes:
  What if I [try to] extend the grammar to support an additional ANALYZE
  in CLUSTER, so that it analyzes the table automatically?
 
 I don't like this -- it seems like bloat.

Maybe you are right.


  Or maybe just do an analyze of the table automatically after the
  CLUSTERing.
 
 Hmmm... I don't really see the problem with adding a note in the docs
 suggesting that users following a CLUSTER with an ANALYZE (...).

ANALYZE is an inexpensive operation (compared to CLUSTER, anyway), so it
can't hurt to have it done automatically.

-- 
Alvaro Herrera (alvherre[a]atentus.com)
Linux transformó mi computadora, de una `máquina para hacer cosas',
en un aparato realmente entretenido, sobre el cual cada día aprendo
algo nuevo (Jaime Salinas)



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] CLUSTER and indisclustered

2002-08-08 Thread Christopher Kings-Lynne


   Or maybe just do an analyze of the table automatically after the
   CLUSTERing.
 
  Hmmm... I don't really see the problem with adding a note in the docs
  suggesting that users following a CLUSTER with an ANALYZE (...).

 ANALYZE is an inexpensive operation (compared to CLUSTER, anyway), so it
 can't hurt to have it done automatically.

Well we have previously had discussions on the topic of adding analyze to
the end of dumps, etc. and the result has always been in favour of keeping
the command set orthogonal and not doing an automatic analyze...

Chris


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] CLUSTER and indisclustered

2002-08-08 Thread Alvaro Herrera


Christopher Kings-Lynne dijo: 

Or maybe just do an analyze of the table automatically after the
CLUSTERing.
 
 Well we have previously had discussions on the topic of adding analyze to
 the end of dumps, etc. and the result has always been in favour of keeping
 the command set orthogonal and not doing an automatic analyze...

Oh.  Sorry for the noise.

I'm trying to look at other things in the TODO so I stop pestering about
CLUSTER.

-- 
Alvaro Herrera (alvherre[a]atentus.com)
Pensar que el espectro que vemos es ilusorio no lo despoja de espanto,
sólo le suma el nuevo terror de la locura (Perelandra, CSLewis)


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] CLUSTER and indisclustered

2002-08-08 Thread Christopher Kings-Lynne


  Well we have previously had discussions on the topic of adding
 analyze to
  the end of dumps, etc. and the result has always been in favour
 of keeping
  the command set orthogonal and not doing an automatic analyze...

 Oh.  Sorry for the noise.

 I'm trying to look at other things in the TODO so I stop pestering about
 CLUSTER.

All I can say is - thanks for fixing CLUSTER.  As soon as we upgrade to 7.3
I'm going on a CLUSTERing spree :)

Chris


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] CLUSTER and indisclustered

2002-08-08 Thread Christopher Kings-Lynne


If you're looking for something very useful to work on, see if Gavin
Sherry(?) can post his old CREATE OR REPLACE VIEW code.  I'm pretty sure he
(or someone) said that he had an old patch, that needed to be synced with
HEAD...  This functionality is pretty essential for 7.3...

Chris

 -Original Message-
 From: Alvaro Herrera [mailto:[EMAIL PROTECTED]]
 Sent: Friday, 9 August 2002 10:21 AM
 To: Christopher Kings-Lynne
 Cc: Neil Conway; Tom Lane; Bruce Momjian; [EMAIL PROTECTED]
 Subject: Re: [HACKERS] CLUSTER and indisclustered


 Christopher Kings-Lynne dijo:

 Or maybe just do an analyze of the table automatically after the
 CLUSTERing.
 
  Well we have previously had discussions on the topic of adding
 analyze to
  the end of dumps, etc. and the result has always been in favour
 of keeping
  the command set orthogonal and not doing an automatic analyze...

 Oh.  Sorry for the noise.

 I'm trying to look at other things in the TODO so I stop pestering about
 CLUSTER.

 --
 Alvaro Herrera (alvherre[a]atentus.com)
 Pensar que el espectro que vemos es ilusorio no lo despoja de espanto,
 sólo le suma el nuevo terror de la locura (Perelandra, CSLewis)



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] CLUSTER and indisclustered

2002-08-08 Thread Tom Lane


Neil Conway [EMAIL PROTECTED] writes:
 Alvaro Herrera [EMAIL PROTECTED] writes:
 What if I [try to] extend the grammar to support an additional ANALYZE
 in CLUSTER, so that it analyzes the table automatically?

 I don't like this -- it seems like bloat.

My reaction exactly.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

[HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Gavin Sherry


Hi all,

It occured to me on the plane home that now that CLUSTER is fixed we may
be able to put pg_index.indisclustered to use. If CLUSTER was to set
indisclustered to true when it clusters a heap according to the given
index, we could speed up sequantial scans. There are two possible ways.

1) Planner determines that a seqscan is appropriate *and* the retrieval is
qualified by the key(s) of one of the relation's indexes
2) Planner determines that the relation is clustered on disk according to
the index over the key(s) used to qualify the retrieval
3) Planner sets an appropriate nodeTag for the retrieval (SeqScanCluster?)
4) ExecProcNode() calls some new scan routine, ExecSeqScanCluster() ?
5) ExecSeqScanCluster() calls ExecScan() with a new ExecScanAccessMtd (ie,
different from SeqNext) called SeqClusterNext
6) SeqClusterNext() has all the heapgettup() logic with two
exceptions: a) we find the first tuple more intelligently (instead of
scanning from the first page) b) if we have found tuple(s) matching the
ScanKey when we encounter an non-matching tuple (via
HeapTupleSatisfies() ?) we return a NULL'ed out tuple, terminating the
scan

Any reason this isn't possible? Any reason it couldn't dramatically speed
up the performance of the type of query i've mentioned?

Gavin


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Re: [HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Tom Lane


Gavin Sherry [EMAIL PROTECTED] writes:
 It occured to me on the plane home that now that CLUSTER is fixed we may
 be able to put pg_index.indisclustered to use. If CLUSTER was to set
 indisclustered to true when it clusters a heap according to the given
 index, we could speed up sequantial scans.

AFAICT you're assuming that the table is *exactly* ordered by the
clustered attribute.  While this is true at the instant CLUSTER
completes, the exact ordering will be destroyed by the first insert or
update :-(.  I can't see much value in creating a whole new scan type
that's only usable on a perfectly-clustered table.

The existing approach to making the planner smart about clustered tables
is to compute a physical-vs-logical-order-correlation statistic and use
that to adjust the estimated cost of indexscans.  I believe this is a
more robust approach than considering a table to be clustered or not
clustered, since it can deal with the gradual degradation of clustered
order over time.  However, I will not make any great claims for the
specific equations currently used for this purpose --- they're surely in
need of improvement.  Feel free to take a look and see if you have any
ideas.  The collection of the statistic is in commands/analyze.c and the
use of it is in optimizer/path/costsize.c.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Gavin Sherry


On Sat, 3 Aug 2002, Tom Lane wrote:

 Gavin Sherry [EMAIL PROTECTED] writes:
  It occured to me on the plane home that now that CLUSTER is fixed we may
  be able to put pg_index.indisclustered to use. If CLUSTER was to set
  indisclustered to true when it clusters a heap according to the given
  index, we could speed up sequantial scans.
 
 AFAICT you're assuming that the table is *exactly* ordered by the
 clustered attribute.  While this is true at the instant CLUSTER
 completes, the exact ordering will be destroyed by the first insert or
 update :-(.  I can't see much value in creating a whole new scan type

Sorry, I meant to say that heap_insert() etc would need to set
indisclustered to false.

I do see some worth in this however. Naturally, in a situation where a
database is being modified very often this is of little value. However,
for applications focussed on analysing large amounts of static data this
could increase performance significantly. Once I get some time I will
attempt to explore this further in `diff -c` format :-).

Gavin


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Bruce Momjian


Tom Lane wrote:
 Gavin Sherry [EMAIL PROTECTED] writes:
  It occured to me on the plane home that now that CLUSTER is fixed we may
  be able to put pg_index.indisclustered to use. If CLUSTER was to set
  indisclustered to true when it clusters a heap according to the given
  index, we could speed up sequantial scans.
 
 AFAICT you're assuming that the table is *exactly* ordered by the
 clustered attribute.  While this is true at the instant CLUSTER
 completes, the exact ordering will be destroyed by the first insert or
 update :-(.  I can't see much value in creating a whole new scan type
 that's only usable on a perfectly-clustered table.
 
 The existing approach to making the planner smart about clustered tables
 is to compute a physical-vs-logical-order-correlation statistic and use
 that to adjust the estimated cost of indexscans.  I believe this is a
 more robust approach than considering a table to be clustered or not
 clustered, since it can deal with the gradual degradation of clustered
 order over time.  However, I will not make any great claims for the
 specific equations currently used for this purpose --- they're surely in
 need of improvement.  Feel free to take a look and see if you have any
 ideas.  The collection of the statistic is in commands/analyze.c and the
 use of it is in optimizer/path/costsize.c.

Tom, should we be updating that flag after we CLUSTER instead of
requiring an ANALYZE after the CLUSTER?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Bruce Momjian


Gavin Sherry wrote:
 Hi all,
 
 It occured to me on the plane home that now that CLUSTER is fixed we may
 be able to put pg_index.indisclustered to use. If CLUSTER was to set
 indisclustered to true when it clusters a heap according to the given
 index, we could speed up sequantial scans. There are two possible ways.
 
 1) Planner determines that a seqscan is appropriate *and* the retrieval is
 qualified by the key(s) of one of the relation's indexes
 2) Planner determines that the relation is clustered on disk according to
 the index over the key(s) used to qualify the retrieval
 3) Planner sets an appropriate nodeTag for the retrieval (SeqScanCluster?)
 4) ExecProcNode() calls some new scan routine, ExecSeqScanCluster() ?
 5) ExecSeqScanCluster() calls ExecScan() with a new ExecScanAccessMtd (ie,
 different from SeqNext) called SeqClusterNext
 6) SeqClusterNext() has all the heapgettup() logic with two
 exceptions: a) we find the first tuple more intelligently (instead of
 scanning from the first page) b) if we have found tuple(s) matching the
 ScanKey when we encounter an non-matching tuple (via
 HeapTupleSatisfies() ?) we return a NULL'ed out tuple, terminating the
 scan

Gavin, is that a big win compared to just using the index and looping
through the entries, knowing that the index matches are on the same
page, and the heap matches are on the same page.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Gavin Sherry


On Sat, 3 Aug 2002, Bruce Momjian wrote:

 Gavin Sherry wrote:
  Hi all,
  
  It occured to me on the plane home that now that CLUSTER is fixed we may
  be able to put pg_index.indisclustered to use. If CLUSTER was to set
  indisclustered to true when it clusters a heap according to the given
  index, we could speed up sequantial scans. There are two possible ways.
  
  1) Planner determines that a seqscan is appropriate *and* the retrieval is
  qualified by the key(s) of one of the relation's indexes
  2) Planner determines that the relation is clustered on disk according to
  the index over the key(s) used to qualify the retrieval
  3) Planner sets an appropriate nodeTag for the retrieval (SeqScanCluster?)
  4) ExecProcNode() calls some new scan routine, ExecSeqScanCluster() ?
  5) ExecSeqScanCluster() calls ExecScan() with a new ExecScanAccessMtd (ie,
  different from SeqNext) called SeqClusterNext
  6) SeqClusterNext() has all the heapgettup() logic with two
  exceptions: a) we find the first tuple more intelligently (instead of
  scanning from the first page) b) if we have found tuple(s) matching the
  ScanKey when we encounter an non-matching tuple (via
  HeapTupleSatisfies() ?) we return a NULL'ed out tuple, terminating the
  scan
 
 Gavin, is that a big win compared to just using the index and looping
 through the entries, knowing that the index matches are on the same
 page, and the heap matches are on the same page.

Bruce,

It would cut out the index over head. Besides at (1) (above) we would have
determined that an index scan was too expensive and we would be using a
SeqScan instead. This would just be faster, since a) we would locate the
tuples more intelligently b) we wouldn't need to scan the whole heap once
we'd found all tuples matching the scan key.

Gavin


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Bruce Momjian


Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Tom, should we be updating that flag after we CLUSTER instead of
  requiring an ANALYZE after the CLUSTER?
 
 Could do that I suppose, but I'm not super-excited about it.  ANALYZE is
 quite cheap these days (especially in comparison to CLUSTER ;-)).  I'd
 settle for a note in the CLUSTER docs that recommends a subsequent
 ANALYZE --- this seems no different from recommending ANALYZE after bulk
 data load or other major update of a table.

OK.  I am sure it is not obvious to people to ANALYZE because the data
in their table hasn't changed, just the ordering.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Bruce Momjian


Gavin Sherry wrote:
  Gavin, is that a big win compared to just using the index and looping
  through the entries, knowing that the index matches are on the same
  page, and the heap matches are on the same page.
 
 Bruce,
 
 It would cut out the index over head. Besides at (1) (above) we would have
 determined that an index scan was too expensive and we would be using a
 SeqScan instead. This would just be faster, since a) we would locate the
 tuples more intelligently b) we wouldn't need to scan the whole heap once
 we'd found all tuples matching the scan key.

Yes, but in a clustered table, an index scan is _never_ (?) more
expensive than a sequential scan, at least if the optimizer is working
correctly.  Index scans are slower only because they assume random heap
access, but with a clustered table, there is no random heap access.  The
index takes to right to the spot to start.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Re: [HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Tom Lane


Gavin Sherry [EMAIL PROTECTED] writes:
 On Sat, 3 Aug 2002, Tom Lane wrote:
 AFAICT you're assuming that the table is *exactly* ordered by the
 clustered attribute.  While this is true at the instant CLUSTER
 completes, the exact ordering will be destroyed by the first insert or
 update :-(.  I can't see much value in creating a whole new scan type

 Sorry, I meant to say that heap_insert() etc would need to set
 indisclustered to false.

itch  You could do that, but only if you are prepared to invent
a mechanism that will instantly invalidate any existing query plans
that assume the clustered ordering is good.

Up to now we've only allowed the planner to make decisions that impact
performace, not correctness of the result.  I'm uncomfortable with the
idea that a clusterscan plan could silently return wrong answers after
someone else updates the table and doesn't tell us they did.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Tom Lane


Bruce Momjian [EMAIL PROTECTED] writes:
 Tom, should we be updating that flag after we CLUSTER instead of
 requiring an ANALYZE after the CLUSTER?

Could do that I suppose, but I'm not super-excited about it.  ANALYZE is
quite cheap these days (especially in comparison to CLUSTER ;-)).  I'd
settle for a note in the CLUSTER docs that recommends a subsequent
ANALYZE --- this seems no different from recommending ANALYZE after bulk
data load or other major update of a table.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

[HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

20 matches

Site Navigation

Mail list logo

Footer information