Re: [PATCHES] [HACKERS] Index creation takes for ever

2004-03-17 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Where was it posted anyway?

 Found it:

   
 http://groups.google.com/groups?hl=enlr=ie=UTF-8selm=200312010450.hB14ovH16330%40candle.pha.pa.usrnum=8

Thanks.  The original patch is much older than I thought --- I was
looking in the November/December part of the archives.

 Personally, because frequently accessed duplicates appear more forward
 in the duplicate index, I think the sorting is only valuable when
 creating a new index.

Yes, and that's what this does.  Looking back, the original discussion
got a little confused because the TODO item about order duplicate index
entries by tid got brought into the mix.  Actually this patch has
nothing to do with that, because it only acts during btree creation not
during index updates.

On inspection I have no problem with the patch, only with the comments ;-)
If you like I'll revise the comments and apply.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PATCHES] [HACKERS] Index creation takes for ever

2004-03-16 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Where are we on this?   It seems like a win to me.

I thought it was a bad idea, although I no longer remember the details.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PATCHES] [HACKERS] Index creation takes for ever

2004-03-16 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 If I remember correctly, you didn't like the index routines reading the
 tuple information, or something like that, but there was a performance
 benefit for duplicate keys, so I think we should re-investigate this.

I don't see the actual patch either in the hackers or patches archives,
nor on your to-apply pages, making it a bit difficult to re-investigate.
Where was it posted anyway?

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PATCHES] [HACKERS] Index creation takes for ever

2004-03-16 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  If I remember correctly, you didn't like the index routines reading the
  tuple information, or something like that, but there was a performance
  benefit for duplicate keys, so I think we should re-investigate this.
 
 I don't see the actual patch either in the hackers or patches archives,
 nor on your to-apply pages, making it a bit difficult to re-investigate.
 Where was it posted anyway?

Found it:


http://groups.google.com/groups?hl=enlr=ie=UTF-8selm=200312010450.hB14ovH16330%40candle.pha.pa.usrnum=8

Personally, because frequently accessed duplicates appear more forward
in the duplicate index, I think the sorting is only valuable when
creating a new index.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PATCHES] [HACKERS] Index creation takes for ever

2003-12-02 Thread Manfred Koizar
On Mon, 01 Dec 2003 13:32:10 -0500, Tom Lane [EMAIL PROTECTED] wrote:
Manfred Koizar [EMAIL PROTECTED] writes:
 comparetup_index() compares two IndexTuples.  The structure
 IndexTupleData consists basically of not much more than an ItemPointer,
 and the patch is not much more than adding a comparison of two
 ItemPointers.  So how does the patch introduce a new low level
 implementation dependency?

Because it sorts on tuple position, which is certainly about as low
level as you can get.

The patch affects only the sort during index creation.  Mapping key
values to tuple positions is the sole purpose of an index.  The notion
that an index should not care for tuple positions looks a bit strange to
me.

  More to the point, though, no evidence has been
provided that this is a good idea.

The test script I posted with the patch shows that the patch produces
more efficient b-tree indices when there are lots of duplicates.

Servus
 Manfred

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PATCHES] [HACKERS] Index creation takes for ever

2003-12-01 Thread Tom Lane
Manfred Koizar [EMAIL PROTECTED] writes:
 comparetup_index() compares two IndexTuples.  The structure
 IndexTupleData consists basically of not much more than an ItemPointer,
 and the patch is not much more than adding a comparison of two
 ItemPointers.  So how does the patch introduce a new low level
 implementation dependency?

Because it sorts on tuple position, which is certainly about as low
level as you can get.  More to the point, though, no evidence has been
provided that this is a good idea.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PATCHES] [HACKERS] Index creation takes for ever

2003-11-30 Thread Bruce Momjian

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

I will try to apply it within the next 48 hours.

---


Manfred Koizar wrote:
 On Mon, 01 Sep 2003 08:46:09 -0400, Tom Lane [EMAIL PROTECTED]
 wrote:
 [EMAIL PROTECTED] writes:
  it took 69 minutes to finish, 75% of this time was devoted to create 2
  indexes on varchar(2) with value being 'O', 'N' or null;
 
 I still say it's either strcoll or qsort's fault.
 
 If qsort is to blame, then maybe this patch could help.  It sorts
 equal key values on item pointer.  And if it doesn't help index
 creation speed, at least the resulting index has better correlation.
 
 Test script:
 CREATE TABLE t (i int NOT NULL, t text NOT NULL);
 INSERT INTO t VALUES (1, 'lajshdflasjhdflajhsdfljhasdlfjhasdf');
 INSERT INTO t SELECT * FROM t;
 INSERT INTO t SELECT * FROM t;
 INSERT INTO t VALUES (100, 's,dmfa.,smdn.famsndfamdnsbfmansdbf');
 INSERT INTO t SELECT * FROM t;
 INSERT INTO t SELECT * FROM t;
 INSERT INTO t SELECT * FROM t;
 INSERT INTO t SELECT * FROM t;
 INSERT INTO t SELECT * FROM t;
 INSERT INTO t SELECT * FROM t;
 INSERT INTO t SELECT * FROM t;
 INSERT INTO t SELECT * FROM t;
 INSERT INTO t SELECT * FROM t;
 INSERT INTO t SELECT * FROM t;
 INSERT INTO t SELECT * FROM t;
 INSERT INTO t SELECT * FROM t;
 INSERT INTO t SELECT * FROM t;
 INSERT INTO t SELECT * FROM t;
 ANALYZE t;
 CREATE INDEX t_i ON t(i);
 SET enable_seqscan = 0;
 SELECT ctid FROM t WHERE i=100 LIMIT 10;
 
 Result without patch:
ctid
 --
  (153,14)
  (306,23)
  (305,80)
  (152,91)
   (76,68)
   (38,34)
  (153,34)
  (305,50)
(9,62)
  (305,40)
 (10 rows)
 
 Result with patch:
   ctid
 
   (0,5)
  (0,10)
  (0,15)
  (0,20)
  (0,25)
  (0,30)
  (0,35)
  (0,40)
  (0,45)
  (0,50)
 (10 rows)
 
 For testing purposes I have made a second patch that provides a
 boolean GUC variable sort_index.  It is available here:
 http://www.pivot.at/pg/23.test-IdxTupleSort.diff
 
 Servus
  Manfred

 diff -ruN ../base/src/backend/utils/sort/tuplesort.c 
 src/backend/utils/sort/tuplesort.c
 --- ../base/src/backend/utils/sort/tuplesort.c2003-08-17 21:58:06.0 
 +0200
 +++ src/backend/utils/sort/tuplesort.c2003-09-05 10:04:22.0 +0200
 @@ -2071,6 +2071,33 @@
   (errcode(ERRCODE_UNIQUE_VIOLATION),
errmsg(could not create unique index),
errdetail(Table contains duplicated values.)));
 + else
 + {
 + /*
 +  * If key values are equal, we sort on ItemPointer.  This might help
 +  * for some bad qsort implementation having performance problems
 +  * with many equal items.  OTOH I wouldn't trust such a weak qsort
 +  * to handle pre-sorted sequences very well ...
 +  *
 +  * Anyway, this code doesn't hurt much, and it helps produce indices
 +  * with better index correlation which is a good thing per se.
 +  */
 + ItemPointer tid1 = tuple1-t_tid;
 + ItemPointer tid2 = tuple2-t_tid;
 + BlockNumber blk1 = ItemPointerGetBlockNumber(tid1);
 + BlockNumber blk2 = ItemPointerGetBlockNumber(tid2);
 + 
 + if (blk1 != blk2)
 + return (blk1  blk2) ? -1 : 1;
 + else
 + {
 + OffsetNumber pos1 = ItemPointerGetOffsetNumber(tid1);
 + OffsetNumber pos2 = ItemPointerGetOffsetNumber(tid2);
 + 
 + if (pos1 != pos2)
 + return (pos1  pos2) ? -1 : 1;
 + }
 + }
  
   return 0;
  }

 
 ---(end of broadcast)---
 TIP 6: Have you searched our list archives?
 
http://archives.postgresql.org

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PATCHES] [HACKERS] Index creation takes for ever

2003-11-30 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 If qsort is to blame, then maybe this patch could help.  It sorts
 equal key values on item pointer.  And if it doesn't help index
 creation speed, at least the resulting index has better correlation.

 I will try to apply it within the next 48 hours.

I think this is a *very* dubious idea.  It introduces a low-level
implementation dependency into our sort behavior on the strength of no
more than an unfounded speculation that some platform's broken qsort
might run faster.  Even if the speculation were proven true, I'd be
hesistant to apply it.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PATCHES] [HACKERS] Index creation takes for ever

2003-09-08 Thread Zeugswetter Andreas SB SD

  I don't think so, because the patch does nothing to keep the sort
  order once the index is initially created.
 
 As Tom mentioned, we might not want to keep the tid's in order after the
 index is created because he wants the most recent tid's first, so the
 expired ones migrate to the end.

But on average this argument only holds true for unique indexes, no ?
Is there any code that stops the heap lookup after the visible tuple is found ?
At least in an index with more rows per key you will fetch all heaps after the 
first one anyway to get at the next row. This is better done in heap order, no ?

And the bitmap approach will not work for large result sets.

Summa summarum I would leave the TODO item (maybe add a comment 
(only for non-unique, evaluate performance))

Andreas

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PATCHES] [HACKERS] Index creation takes for ever

2003-09-08 Thread Manfred Koizar
On Mon, 8 Sep 2003 11:31:05 +0200, Zeugswetter Andreas SB SD
[EMAIL PROTECTED] wrote:
 As Tom mentioned, we might not want to keep the tid's in order after the
 index is created because he wants the most recent tid's first, so the
 expired ones migrate to the end.

But on average this argument only holds true for unique indexes, no ?
Is there any code that stops the heap lookup after the visible tuple is found ?
At least in an index with more rows per key you will fetch all heaps after the 
first one anyway to get at the next row. This is better done in heap order, no ?

Yes, yes, yes, and yes.  Seems we all agree on that; the patch has
been queued for 7.5.

Servus
 Manfred

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PATCHES] [HACKERS] Index creation takes for ever

2003-09-07 Thread Bruce Momjian
Manfred Koizar wrote:
 On Sun, 7 Sep 2003 11:43:42 -0400 (EDT), Bruce Momjian
 [EMAIL PROTECTED] wrote:
 I assume this completes this TODO:
 
  * Order duplicate index entries by tid for faster heap lookups
 
 I don't think so, because the patch does nothing to keep the sort
 order once the index is initially created.

As Tom mentioned, we might not want to keep the tid's in order after the
index is created because he wants the most recent tid's first, so the
expired ones migrate to the end.

Seems this patch does what we want currently because it only affects the
initial index structure.

  If you want to post it now, [...]
 
 I did already post it.  It's only the last page or so of the original
 message.  The link in that message points to a testing aid which is
 not part of what I would like to see committed.

OK, in 7.5 queue:

http:/momjian.postgresql.org/cgi-bin/pgpatches2

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PATCHES] [HACKERS] Index creation takes for ever

2003-09-07 Thread Manfred Koizar
On Sun, 07 Sep 2003 12:23:28 -0400, Tom Lane [EMAIL PROTECTED]
wrote:
Maybe so, but it would degrade the performance in the unique-index case
if we do it as the TODO is worded.

The patch would only hurt with a unique index, if there are lots of
duplicate tuples at CREATE INDEX time.

My own opinion is that the bitmap-index-lookup approach will be superior

So is mine, but I was not able to do this in 30 lines.  Sorry ;-)

to trying to keep the index entries in TID order.
  
... which the patch does not.  I see its main advantage in creating
better b-tree indices when you restore a large database with many
duplicate index entries.

It is intended to be only effective at CREATE INDEX or REINDEX time.
I don't believe it is activated when you insert a single new entry,
otherwise it wouldn't pass regression tests ...

Servus
 Manfred

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] [HACKERS] Index creation takes for ever

2003-09-07 Thread Manfred Koizar
On Sun, 7 Sep 2003 11:43:42 -0400 (EDT), Bruce Momjian
[EMAIL PROTECTED] wrote:
I assume this completes this TODO:

   * Order duplicate index entries by tid for faster heap lookups

I don't think so, because the patch does nothing to keep the sort
order once the index is initially created.

 If you want to post it now, [...]

I did already post it.  It's only the last page or so of the original
message.  The link in that message points to a testing aid which is
not part of what I would like to see committed.

Servus
 Manfred

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org