Re: qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-03-21 Thread Tom Lane
Last month I wrote: It seems clear that our qsort.c is doing a pretty awful job of picking qsort pivots, while glibc is mostly managing not to make that mistake. I re-ran Gary's test script using the just-committed improvements to qsort.c, and got pretty nice numbers (attached --- compare to

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-03-02 Thread Bruce Momjian
] Strange Create Index behaviour) Ron [EMAIL PROTECTED] writes: How are we choosing our pivots? See qsort.c: it looks like median of nine equally spaced inputs (ie, the 1/8th points of the initial input array, plus the end points), implemented as two rounds of median-of-three

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-16 Thread Tom Lane
Gary Doades [EMAIL PROTECTED] writes: I think the reason I wasn't seeing performance issues with normal sort operations is because they use work_mem not maintenance_work_mem which was only set to 2048 anyway. Does that sound right? Very probable. Do you want to test the theory by jacking that

Re: [PERFORM] Strange Create Index behaviour

2006-02-15 Thread Gary Doades
Tom Lane wrote: Interesting. I tried your test script and got fairly close times for all the cases on two different machines: old HPUX machine: shortest 5800 msec, longest 7960 msec new Fedora 4 machine: shortest 461 msec, longest 608 msec (the HPUX machine was doing other stuff

Re: [PERFORM] Strange Create Index behaviour

2006-02-15 Thread Gary Doades
Tom Lane wrote: shared_buffers is unlikely to impact index build time noticeably in recent PG releases. maintenance_work_mem would affect it a lot, though. What setting were you using for that? Also, i tried upping maintenance_work_mem to 65536 and it didn't make much difference (maybe 10%

Re: [PERFORM] Strange Create Index behaviour

2006-02-15 Thread Simon Riggs
On Wed, 2006-02-15 at 20:00 +, Gary Doades wrote: I have put together a test case Please enable trace_sort=on and then repeat tests and post the accompanying log file. I think this is simply the sort taking longer depending upon the data distribution, but I'd like to know for sure.

Re: [PERFORM] Strange Create Index behaviour

2006-02-15 Thread Tom Lane
I wrote: Interesting. I tried your test script and got fairly close times for all the cases on two different machines: old HPUX machine: shortest 5800 msec, longest 7960 msec new Fedora 4 machine: shortest 461 msec, longest 608 msec So what this looks like to me is a corner case

Re: [PERFORM] Strange Create Index behaviour

2006-02-15 Thread Gary Doades
Tom Lane wrote: I tried forcing PG to use src/port/qsort.c on the Fedora machine, and lo and behold: new Fedora 4 machine: shortest 434 msec, longest 8530 msec So it sure looks like this script does expose a problem on BSD-derived qsorts. Curiously, the case that's much the worst for

Re: [PERFORM] Strange Create Index behaviour

2006-02-15 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: Please enable trace_sort=on and then repeat tests and post the accompanying log file. I did this on my Fedora machine with port/qsort.c, and got the results attached. Curiously, this run has the spikes in completely different places than the prior one did.

Re: [PERFORM] Strange Create Index behaviour

2006-02-15 Thread Tom Lane
Gary Doades [EMAIL PROTECTED] writes: Interestingly, if I don't delete the table after a run, but just drop and re-create the index repeatedly it stays a pretty consistent time, either repeatedly good or repeatedly bad! This is consistent with the theory of a data-dependent performance

Re: [PERFORM] Strange Create Index behaviour

2006-02-15 Thread Gary Doades
Tom Lane wrote: So it sure looks like this script does expose a problem on BSD-derived qsorts. Curiously, the case that's much the worst for me is the third in the script, while the shortest time is the first case, which was slow for Gary. So I'd venture that the *BSD code has been tweaked

qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-15 Thread Tom Lane
Gary Doades [EMAIL PROTECTED] writes: If I run the script again, it is not always the first case that is slow, it varies from run to run, which is why I repeated it quite a few times for the test. For some reason I hadn't immediately twigged to the fact that your test script is just N

Re: qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-15 Thread Tom Lane
Gary Doades [EMAIL PROTECTED] writes: Is this likely to hit me in a random fashion during normal operation, joins, sorts, order by for example? Yup, anytime you're passing data with that kind of distribution through a sort. So the options are: 1) Fix the included qsort.c code and use that

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-15 Thread Tom Lane
I wrote: Gary Doades [EMAIL PROTECTED] writes: Ouch! That confirms my problem. I generated the random test case because it was easier than including the dump of my tables, but you can appreciate that tables 20 times the size are basically crippled when it comes to creating an index on

Re: qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-15 Thread Tom Lane
Ron [EMAIL PROTECTED] writes: How are we choosing our pivots? See qsort.c: it looks like median of nine equally spaced inputs (ie, the 1/8th points of the initial input array, plus the end points), implemented as two rounds of median-of-three choices. With half of the data inputs zero, it's not

Re: [PERFORM] Strange Create Index behaviour

2006-02-15 Thread Simon Riggs
On Wed, 2006-02-15 at 20:00 +, Gary Doades wrote: I have put together a test case that demonstrates the problem (see below). I create a simple table, as close in structure to one of my problem tables and populate an integer column with 100,000 zeros follow by 100,000 random integers

Re: qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-15 Thread Christopher Kings-Lynne
Ouch! That confirms my problem. I generated the random test case because it was easier than including the dump of my tables, but you can appreciate that tables 20 times the size are basically crippled when it comes to creating an index on them. I have to say that I restored a few gigabyte