Re: [PATCHES] CREATE TABLE LIKE INCLUDING INDEXES support

2007-07-10 Thread NikhilS

Hi,

This version updates the patch to CVS HEAD and has various fixes and

refactoring, including proper docs. I refactored get_opclass_name() into
lsyscache.c, but then realized that this means that lsyscache.c will
depend on commands/indexcmds.c (for GetDefaultOpClass()), which is
arguably improper, so I'm tempted to revert and just duplicate the
syscache lookups in both ruleutils.c and parse_utilcmd.c



Yes, I too would vote for duplicating the  lookups for the sake of
simplicity.

Nikhil: why are both options and inhreloptions necessary in

IndexStmt? Won't at least one be NULL?



Yes, in the CREATE..LIKE case, options will be NULL and in the normal
CREATE..INDEX case inhreloptions will be NULL. Note that options is a List
of DefElem entries, whereas inhreloptions is a char pointer.

The challenge was with converting the stored reloptions belonging to the
parent index into some form which could be consumed by
transformRelOptions(). It was difficult to convert it into a list of DefElem
entries and hence I had to introduce inhreloptions.

Regards,
Nikhils
--
EnterpriseDB   http://www.enterprisedb.com


Re: [PATCHES] Maintaining cluster order on insert

2007-07-10 Thread Heikki Linnakangas

Tom Lane wrote:

What you would hope to see as the benefit of the patch is that the time
for the range SELECT degrades more slowly as more of the table is
replaced.  Ignoring the first SELECT as being a startup transient, it
looks like HEAD degrades from about 3 msec to 6 msec over 10 iterations
(20% replacement of the table), whereas with the patch it's about 3 msec
to about 4 and a half.  However, the INSERT steps went from around 20
sec each to about twice that.


Clustering in general isn't very important on a table that fits in cache.

Also note that the overhead of descending the b-tree twice on INSERTs 
hurts the most in a CPU bound test like that. The overhead of the 
two-phase method I proposed should be lower.


Test 1:

executed CREATE in   0.066563 sec
executed INSERT in  40.465653 sec
executed CREATE in   9.152698 sec
executed CLUSTE in  20.036375 sec
executed VACUUM in   1.440232 sec

Test 2:

executed CREATE in   0.086862 sec
executed INSERT in  50.746362 sec
executed CREATE in  12.115655 sec
executed CLUSTE in  33.656341 sec
executed VACUUM in   4.306563 sec


Why is there such a big difference in all these initialization steps as 
well? Is it just random noise?


I played a bit with that test program, results from my laptop attached. 
I used a version patched with the latest patch I submitted, because 
that's what I had readily available. I used the same patched binaries in 
both test runs, I just didn't CLUSTER the table in the other run. The 
main difference to your results is that the DELETE, VACUUM and INSERT 
operations are much faster both with and without the patch. Most INSERTs 
for example took  1 s, and in your results they took  15 s. Any idea why?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
executed CREATE in   0.098872 sec
executed INSERT in  36.958824 sec
executed CREATE in   7.320412 sec
executed CLUSTE in  22.579578 sec
executed VACUUM in   0.809287 sec
executed DELETE in   8.567730 sec
executed VACUUM in   5.194736 sec
executed INSERT in   0.882852 sec
executing select: SELECT count(*) FROM testtab WHERE k BETWEEN 839347 AND 840347
retrieved  993 tuples in   0.239210 sec
executing select: SELECT count(*) FROM testtab
retrieved 999898 tuples in   0.417749 sec
executed DELETE in   5.398222 sec
executed VACUUM in   0.949003 sec
executed INSERT in   0.771617 sec
executing select: SELECT count(*) FROM testtab WHERE k BETWEEN 393988 AND 394988
retrieved 1015 tuples in   0.001261 sec
executing select: SELECT count(*) FROM testtab
retrieved 138 tuples in   0.373199 sec
executed DELETE in   5.613083 sec
executed VACUUM in   1.289501 sec
executed INSERT in   1.434903 sec
executing select: SELECT count(*) FROM testtab WHERE k BETWEEN 782316 AND 783316
retrieved 1011 tuples in   0.001723 sec
executing select: SELECT count(*) FROM testtab
retrieved 999826 tuples in   0.353616 sec
executed DELETE in   3.729799 sec
executed VACUUM in   1.406940 sec
executed INSERT in   1.169133 sec
executing select: SELECT count(*) FROM testtab WHERE k BETWEEN 797641 AND 798641
retrieved  991 tuples in   0.001118 sec
executing select: SELECT count(*) FROM testtab
retrieved 999731 tuples in   0.340721 sec
executed DELETE in   1.448537 sec
executed VACUUM in   0.918642 sec
executed INSERT in   0.471951 sec
executing select: SELECT count(*) FROM testtab WHERE k BETWEEN 910735 AND 911735
retrieved 1029 tuples in   0.001174 sec
executing select: SELECT count(*) FROM testtab
retrieved 999773 tuples in   0.334880 sec
executed DELETE in   0.972467 sec
executed VACUUM in   0.925557 sec
executed INSERT in   0.645214 sec
executing select: SELECT count(*) FROM testtab WHERE k BETWEEN 197353 AND 198353
retrieved  975 tuples in   0.001076 sec
executing select: SELECT count(*) FROM testtab
retrieved 999751 tuples in   0.334343 sec
executed DELETE in   0.966780 sec
executed VACUUM in   0.923394 sec
executed INSERT in   0.468693 sec
executing select: SELECT count(*) FROM testtab WHERE k BETWEEN 334887 AND 335887
retrieved  983 tuples in   0.001381 sec
executing select: SELECT count(*) FROM testtab
retrieved 999801 tuples in   0.335184 sec
executed DELETE in   0.792456 sec
executed VACUUM in   0.934451 sec
executed INSERT in   0.435028 sec
executing select: SELECT count(*) FROM testtab WHERE k BETWEEN 767461 AND 768461
retrieved  998 tuples in   0.001279 sec
executing select: SELECT count(*) FROM testtab
retrieved 999854 tuples in   0.338010 sec
executed DELETE in   0.553884 sec
executed VACUUM in   0.933886 sec
executed INSERT in   0.435980 sec
executing select: SELECT count(*) FROM testtab WHERE k BETWEEN 277496 AND 278496
retrieved 1017 tuples in   0.001257 sec
executing select: SELECT count(*) FROM testtab
retrieved 23 tuples in   0.333156 sec
executed DELETE in   0.518371 sec
executed VACUUM in   0.932207 sec
executed INSERT in  18.031924 sec
executing select: SELECT count(*) FROM testtab WHERE k BETWEEN 553415 AND 554415
retrieved 1012 tuples in   0.001280 sec
executing select: 

Re: [PATCHES] Maintaining cluster order on insert

2007-07-10 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

   * delete a random 2% of the table
   * vacuum to recover space
   * insert a random 2% of the table
   * select (about) 1000 consecutively-numbered rows
   * select all the rows (this is just a cross check that the
 number of rows isn't changing too much)

 What you would hope to see as the benefit of the patch is that the time
 for the range SELECT degrades more slowly as more of the table is
 replaced.  Ignoring the first SELECT as being a startup transient, it
 looks like HEAD degrades from about 3 msec to 6 msec over 10 iterations
 (20% replacement of the table), whereas with the patch it's about 3 msec
 to about 4 and a half.  However, the INSERT steps went from around 20
 sec each to about twice that.

I would suggest:

a) continuing the test until you have 100% replacement.

b) defining the tables with a fill-factor large enough to hold at least one
   extra tuple

c) considering this patch alongside GII where it is especially important.

It's pretty serious what you're suggesting since it means that we'll basically
never have a real cluster feature. I would sure hope we're missing something
and there's a way to make this work usefully.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] Maintaining cluster order on insert

2007-07-10 Thread Heikki Linnakangas

Gregory Stark wrote:

It's pretty serious what you're suggesting since it means that we'll basically
never have a real cluster feature. I would sure hope we're missing something
and there's a way to make this work usefully.


Another approach would be an online CLUSTER command. That means there'll 
be a lot of churn when tuples need to be moved back and forth, along 
with updating indexes, but it'd take the overhead out of the critical path.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PATCHES] Maintaining cluster order on insert

2007-07-10 Thread Greg Smith

On Mon, 9 Jul 2007, Tom Lane wrote:

The hardware is just a Dell x86_64 workstation with crappy IDE disk, so 
maybe things would look better elsewhere, but it's all I have to work 
with.


Do you have write-caching turned off on the drive so INSERTs are being 
rate-limited by WAL syncs?  Trying to characterize the class of setup 
you're using.


The part that seemed curious to me about your results in the unpatched 
version is why the first INSERT takes 2.4 seconds, while the second takes 
12.2 then later ones settle from 17-23 s.  I could understand the 12-23 
variation, but that the first one fires off in 2.4 seems kind of fishy; 
why so fast?  Is there something that's just fitting in memory in that 
case that's just missing in the patched version?


results-head:
...
executed DELETE in  14.770937 sec
executed VACUUM in  10.663301 sec
executed INSERT in   2.449248 sec (1st)
...
executed INSERT in  12.212027 sec (2nd)


results-patch:
...
executed DELETE in  18.062664 sec
executed VACUUM in  28.487570 sec
executed INSERT in  25.638022 sec (1st)
...
executed INSERT in  40.759404 sec (2nd)

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PATCHES] Maintaining cluster order on insert

2007-07-10 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 It's pretty serious what you're suggesting since it means that we'll
 basically never have a real cluster feature.

It seems entirely likely that this is not the way to go about real
clustering.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] Maintaining cluster order on insert

2007-07-10 Thread Tom Lane
Greg Smith [EMAIL PROTECTED] writes:
 On Mon, 9 Jul 2007, Tom Lane wrote:
 The hardware is just a Dell x86_64 workstation with crappy IDE disk, so 
 maybe things would look better elsewhere, but it's all I have to work 
 with.

 Do you have write-caching turned off on the drive so INSERTs are being 
 rate-limited by WAL syncs?  Trying to characterize the class of setup 
 you're using.

It's just desktop-grade junk :-(.  Feel free to repeat the test on
something more serious.

 The part that seemed curious to me about your results in the unpatched 
 version is why the first INSERT takes 2.4 seconds, while the second takes 
 12.2 then later ones settle from 17-23 s.  I could understand the 12-23 
 variation, but that the first one fires off in 2.4 seems kind of fishy; 

The test seemed to be mostly I/O bound according to vmstat.  I supposed
that it was fast up until the kernel's available space was full of dirty
pages, and then subsequent writes would be constrained by actual disk
write bandwidth.  Anyway the numbers seemed relatively consistent after
the setup and first test cycle, so I think the part I was actually
trying to draw conclusions from was probably real enough.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] CREATE TABLE LIKE INCLUDING INDEXES support

2007-07-10 Thread Bruce Momjian
Neil Conway wrote:
 On Tue, 2007-05-06 at 14:50 +0530, NikhilS wrote:
  PFA, a patch which provides the CREATE .. INCLUDING INDEXES
  functionality. This patch uses the same functionality introduced by
  the earlier patches in this thread albeit for the INCLUDING INDEXES
  case. 
 
 Attached is a revised version of this patch. Sorry for taking so long to
 make progress on this: my new job been keeping my busy, and I've
 recently been ill.

Illness only counts as an excuse if you _don't_ recover.  If you
recover, you weren't sick enough.  ;-)  LOL

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] CREATE TABLE LIKE INCLUDING INDEXES support

2007-07-10 Thread Jaime Casanova

On 7/10/07, Bruce Momjian [EMAIL PROTECTED] wrote:

Neil Conway wrote:
 Attached is a revised version of this patch. Sorry for taking so long to
 make progress on this: my new job been keeping my busy, and I've
 recently been ill.

Illness only counts as an excuse if you _don't_ recover.  If you
recover, you weren't sick enough.  ;-)  LOL



uh! that sounds like my boss talking!

--
regards,
Jaime Casanova

Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs and the universe trying
to produce bigger and better idiots.
So far, the universe is winning.
  Richard Cook

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster