Re: [PERFORM] GiST index performance

2010-03-22 Thread Matthew Wakeling
On Mon, 22 Mar 2010, Yeb Havinga wrote: Yes, that is certainly a factor. For example, the page size for bioseg which we use here is 130 entries, which is very excessive, and doesn't allow very deep trees. On the other hand, it means that a single disc seek performs quite a lot of work. Yeah,

Re: [PERFORM] GiST index performance

2010-03-22 Thread Yeb Havinga
Matthew Wakeling wrote: On Sat, 20 Mar 2010, Yeb Havinga wrote: The gist virtual pages would then match more the original blocksizes that were used in Guttman's R-tree paper (first google result, then figure 4.5). Since the nature/characteristics of the underlying datatypes and keys is not cha

Re: [PERFORM] GiST index performance

2010-03-22 Thread Matthew Wakeling
On Sat, 20 Mar 2010, Yeb Havinga wrote: The gist virtual pages would then match more the original blocksizes that were used in Guttman's R-tree paper (first google result, then figure 4.5). Since the nature/characteristics of the underlying datatypes and keys is not changed, it might be that with

Re: [PERFORM] GiST index performance

2010-03-20 Thread Yeb Havinga
On Fri, Mar 19, 2010 at 10:16 PM, Kenneth Marshall wrote: > Hi Yeb, > > I have not looked at the gist code, but would it be possible to > make virtual pages that have a size that is 1/power-of-2 * blocksize. > Then the leaf node could be 1/8 or even 1/16 the size of the full > pagesize. > Hello

Re: [PERFORM] GiST index performance

2010-03-19 Thread Kenneth Marshall
Hi Yeb, I have not looked at the gist code, but would it be possible to make virtual pages that have a size that is 1/power-of-2 * blocksize. Then the leaf node could be 1/8 or even 1/16 the size of the full pagesize. Regards, Ken On Fri, Mar 19, 2010 at 09:49:30PM +0100, Yeb Havinga wrote: > Ye

Re: [PERFORM] GiST index performance

2010-03-19 Thread Yeb Havinga
Yeb Havinga wrote: Since the gistpagesize is derived from the database blocksize, it might be wise to set the blocksize low for this case, I'm going to play with this a bit more. Ok, one last mail before it turns into spam: with a 1KB database blocksize, the query now runs in 30 seconds (orig

Re: [PERFORM] GiST index performance

2010-03-19 Thread Yeb Havinga
Yeb Havinga wrote: Matthew Wakeling wrote: A second quite distinct issue is the general performance of GiST indexes which is also mentioned in the old thread linked from Open Items. For that, we have a test case at http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php for btre

Re: [PERFORM] GiST index performance

2010-03-19 Thread Yeb Havinga
Yeb Havinga wrote: Yeb Havinga wrote: Matthew Wakeling wrote: Matthew Wakeling wrote: A second quite distinct issue is the general performance of GiST indexes which is also mentioned in the old thread linked from Open Items. For that, we have a test case at http://archives.postgresql.org/pgsq

Re: [PERFORM] GiST index performance

2010-03-17 Thread Yeb Havinga
Yeb Havinga wrote: Matthew Wakeling wrote: Matthew Wakeling wrote: A second quite distinct issue is the general performance of GiST indexes which is also mentioned in the old thread linked from Open Items. For that, we have a test case at http://archives.postgresql.org/pgsql-performance/2009-0

Re: [PERFORM] GiST index performance

2010-03-16 Thread Yeb Havinga
Matthew Wakeling wrote: Matthew Wakeling wrote: A second quite distinct issue is the general performance of GiST indexes which is also mentioned in the old thread linked from Open Items. For that, we have a test case at http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php for

Re: [PERFORM] GiST index performance

2010-03-15 Thread Robert Haas
On Mon, Mar 15, 2010 at 11:58 AM, Matthew Wakeling wrote: > On Thu, 25 Feb 2010, Bruce Momjian wrote: >> >> Was there every any conclusion on this issue? > > Not really. Comments inline: > >> Matthew Wakeling wrote: >>> >>> Revisiting the thread a month back or so, I'm still investigating >>> perf

Re: [PERFORM] GiST index performance

2010-03-15 Thread Matthew Wakeling
On Thu, 25 Feb 2010, Bruce Momjian wrote: Was there every any conclusion on this issue? Not really. Comments inline: Matthew Wakeling wrote: Revisiting the thread a month back or so, I'm still investigating performance problems with GiST indexes in Postgres. Looking at http://wiki.postgresq

Re: [PERFORM] GiST index performance

2010-03-02 Thread Bruce Momjian
Robert Haas wrote: > On Thu, Feb 25, 2010 at 6:44 PM, Bruce Momjian wrote: > > Was this corrected? ?I don't see any commits to seg.c. > > I don't think this was ever reviewed. > > It seems like a good patch but I'd be skeptical of committing it now > unless someone has the time to review it care

Re: [PERFORM] GiST index performance

2010-03-02 Thread Robert Haas
On Thu, Feb 25, 2010 at 6:44 PM, Bruce Momjian wrote: > Was this corrected?  I don't see any commits to seg.c. I don't think this was ever reviewed. It seems like a good patch but I'd be skeptical of committing it now unless someone has the time to review it carefully. If not, let's add it to t

Re: [PERFORM] GiST index performance

2010-03-02 Thread Robert Haas
On Thu, Feb 25, 2010 at 6:42 PM, Bruce Momjian wrote: > Was there every any conclusion on this issue? I don't think so. ...Robert > --- > > Matthew Wakeling wrote: >> >> Revisiting the thread a month back or so, I'm still i

Re: [PERFORM] GiST index performance

2010-02-25 Thread Bruce Momjian
Was this corrected? I don't see any commits to seg.c. --- Matthew Wakeling wrote: > On Thu, 7 May 2009, Oleg Bartunov wrote: > > Did you try Guttman quadratic split algorithm ? We also found linear > > split algorithm for R

Re: [PERFORM] GiST index performance

2010-02-25 Thread Bruce Momjian
Was there every any conclusion on this issue? --- Matthew Wakeling wrote: > > Revisiting the thread a month back or so, I'm still investigating > performance problems with GiST indexes in Postgres. > > Looking at http://w

Re: [PERFORM] GiST index performance

2009-06-26 Thread Robert Haas
On Fri, Jun 26, 2009 at 10:33 AM, Greg Smith wrote: > On Thu, 11 Jun 2009, Tom Lane wrote: > >> Matthew Wakeling writes: >>> >>> Oprofile scares me with the sheer number of options. >> >> You can ignore practically all of them; the defaults are pretty sane. >> The recipe I usually follow is: > > A

Re: [PERFORM] GiST index performance

2009-06-26 Thread Greg Smith
On Thu, 11 Jun 2009, Tom Lane wrote: Matthew Wakeling writes: Oprofile scares me with the sheer number of options. You can ignore practically all of them; the defaults are pretty sane. The recipe I usually follow is: An excellent brain dump from Tom and lots of other good stuff in this th

Re: [PERFORM] GiST index performance

2009-06-17 Thread Heikki Linnakangas
Tom Lane wrote: Matthew Wakeling writes: I'm guessing my next step is to install a version of libc with debugging symbols? Yeah, if you want to find out what's happening in libc, that's what you need. Getting callgraph information from oprofile would also help. Although it won't directly t

Re: [PERFORM] GiST index performance

2009-06-12 Thread Adam Gundy
Matthew Wakeling wrote: Okay, I don't know quite what's happening here. Tom, perhaps you could advise. Running opannotate --source, I get this sort of stuff: /* * Total samples for file : ".../postgresql-8.4beta2/src/backend/access/gist/gistget.c" * * 6880 0.2680 */ and then:

Re: [PERFORM] GiST index performance

2009-06-11 Thread Tom Lane
Matthew Wakeling writes: > The gistnext total doesn't seem to correspond to the amount I get by > adding up all the individual lines in gistnest. Hmm, hadn't you determined that some other stuff was being inlined into gistnext? I'm not really sure how opannotate displays such cases, but this mi

Re: [PERFORM] GiST index performance

2009-06-11 Thread Matthew Wakeling
Okay, I don't know quite what's happening here. Tom, perhaps you could advise. Running opannotate --source, I get this sort of stuff: /* * Total samples for file : ".../postgresql-8.4beta2/src/backend/access/gist/gistget.c" * * 6880 0.2680 */ and then: :static int64

Re: [PERFORM] GiST index performance

2009-06-11 Thread Matthew Wakeling
On Thu, 11 Jun 2009, Matthew Wakeling wrote: A quick grep in the postgres source for mcount reveals no hits. No idea what it does - there is no man page for it. Ah - that's part of gprof. I'll recompile without --enable-profiling and try again. Duh. Matthew -- What goes up must come down. A

Re: [PERFORM] GiST index performance

2009-06-11 Thread Matthew Wakeling
On Thu, 11 Jun 2009, Tom Lane wrote: So it'd be worth converting your functions to V1 style. Does that produce a significant reduction in overhead? (You'll probably say "yes, that's the whole point"). hmm ... memcpy or qsort maybe? Surprise: CPU: Core 2, speed 1998 MHz (estimated) Counte

Re: [PERFORM] GiST index performance

2009-06-11 Thread Tom Lane
Matthew Wakeling writes: > So it seems that btree_gist and bioseg are not using that much CPU at all, > compared to core postgres code. In fact, the majority of time seems to be > spent in libc. Unfortunately my libc doesn't have any debugging symbols. hmm ... memcpy or qsort maybe? > Anyway,

Re: [PERFORM] GiST index performance

2009-06-11 Thread Matthew Wakeling
On Thu, 11 Jun 2009, Tom Lane wrote: Oprofile scares me with the sheer number of options. You can ignore practically all of them; the defaults are pretty sane. Thanks, that was helpful. Here is the top of opreport --long-filenames: CPU: Core 2, speed 1998 MHz (estimated) Counted CPU_CLK_UNHA

Re: [PERFORM] GiST index performance

2009-06-11 Thread Tom Lane
Matthew Wakeling writes: > That sucks. However, as another observation, no calls to "gistfindnext" > are recorded in the profile either, and that's in the same source file as > "gistnext" which is recorded. Could it have been inlined? Probably. > Shouldn't inlining be switched off on a profili

Re: [PERFORM] GiST index performance

2009-06-11 Thread Matthew Wakeling
On Wed, 10 Jun 2009, Tom Lane wrote: Also, no calls to anything including "bioseg" in the name are recorded, although they are definitely called as the GiST support functions for that data type. I have never had any success in getting gprof to profile functions that are in loadable libraries, w

Re: [PERFORM] GiST index performance

2009-06-10 Thread Tom Lane
Matthew Wakeling writes: > The final cumulative time is 5894.06 seconds, which doesn't seem to match > the query run time at all. Depending on the platform you're using, gprof might have the wrong idea about the kernel's tick rate, leading to its numbers being some multiple or fraction of the tr

Re: [PERFORM] GiST index performance

2009-06-10 Thread Matthew Wakeling
On Fri, 5 Jun 2009, Robert Haas wrote: On Thu, Jun 4, 2009 at 12:33 PM, Matthew Wakeling wrote: Do you have a recommendation for how to go about profiling Postgres, what profiler to use, etc? I'm running on Debian Linux x86_64. I mostly compile with --enable-profiling and use gprof. I know To

Re: [PERFORM] GiST index performance

2009-06-05 Thread Robert Haas
On Thu, Jun 4, 2009 at 12:33 PM, Matthew Wakeling wrote: > Do you have a recommendation for how to go about profiling Postgres, what > profiler to use, etc? I'm running on Debian Linux x86_64. I mostly compile with --enable-profiling and use gprof. I know Tom Lane has had success with oprofile fo

[PERFORM] GiST index performance

2009-06-04 Thread Matthew Wakeling
Revisiting the thread a month back or so, I'm still investigating performance problems with GiST indexes in Postgres. Looking at http://wiki.postgresql.org/wiki/PostgreSQL_8.4_Open_Items I'd like to clarify the contrib/seg issue. Contrib/seg is vulnerable to pathological behaviour which is f

Re: [PERFORM] GiST index performance

2009-05-07 Thread Matthew Wakeling
On Thu, 7 May 2009, Oleg Bartunov wrote: Did you try Guttman quadratic split algorithm ? We also found linear split algorithm for Rtree. The existing (bugfixed) seg split algorithm is the Guttman quadratic split algorithm. Guttman did all his work on two-dimensional and above data, dismissing

Re: [PERFORM] GiST index performance

2009-05-07 Thread Oleg Bartunov
On Wed, 6 May 2009, Tom Lane wrote: Matthew Wakeling writes: Here is my patch ported over to the seg contrib package, attached. Apply it to seg.c and all should be well. A similar thing needs to be done to cube, but I haven't looked at that. Teodor, Oleg, do you intend to review/apply this p

Re: [PERFORM] GiST index performance

2009-05-06 Thread Tom Lane
Matthew Wakeling writes: > Here is my patch ported over to the seg contrib package, attached. Apply > it to seg.c and all should be well. A similar thing needs to be done to > cube, but I haven't looked at that. Teodor, Oleg, do you intend to review/apply this patch? re

Re: [PERFORM] GiST index performance

2009-04-22 Thread Matthew Wakeling
On Wed, 22 Apr 2009, Matthew Wakeling wrote: I will post a patch when I have ported my bioseg code over to the seg data type. Here is my patch ported over to the seg contrib package, attached. Apply it to seg.c and all should be well. A similar thing needs to be done to cube, but I haven't lo

Re: [PERFORM] GiST index performance

2009-04-22 Thread Matthew Wakeling
On Tue, 21 Apr 2009, Matthew Wakeling wrote: Unfortunately, it seems there is another bug in the picksplit function. My patch fixes a bug that reveals this new bug. The whole picksplit algorithm is fundamentally broken, and needs to be rewritten completely, which is what I am doing. I have no

Re: [PERFORM] GiST index performance

2009-04-21 Thread Matthew Wakeling
On Mon, 20 Apr 2009, Teodor Sigaev wrote: Looks like contrib/cube has the same error. I don't see a similar code pattern elsewhere though. Oleg, Teodor, do you concur that this is a correct patch? Is it safe to back-patch (I think it should be)? Yeah, good catch, and it doesn't touch any alrea

Re: [PERFORM] GiST index performance

2009-04-20 Thread Tom Lane
Matthew Wakeling writes: > I have found a bug in the contrib package seg, which has been copied into > the bioseg data type as well. It causes the index to be created with > horribly bad unselective trees, so that when a search is performed many of > the branches of the tree need to be followed

Re: [PERFORM] GiST index performance

2009-04-20 Thread Matthew Wakeling
On Fri, 17 Apr 2009, Matthew Wakeling wrote: I have done a bit of investigation, and I think I might have found the smoking gun I was looking for. I have found a bug in the contrib package seg, which has been copied into the bioseg data type as well. It causes the index to be created with hor

Re: [PERFORM] GiST index performance

2009-04-17 Thread Matthew Wakeling
On Thu, 16 Apr 2009, Tom Lane wrote: Matthew, can you put together a self-contained test case with a similar slowdown? I have done a bit of investigation, and I think I might have found the smoking gun I was looking for. I just added a load of debug to the gist consistent function on the bios

Re: [PERFORM] GiST index performance

2009-04-16 Thread Craig Ringer
dforum wrote: > hello, > > there is other performance problem on this request. > > If you analyse query plan, you see that most of the time are lost during > sequencial scan, and you have 2 seq scan. > > You have to create other indexes to match the request. > > Postgresq is totally dependant o

Re: [PERFORM] GiST index performance

2009-04-16 Thread Matthew Wakeling
On Thu, 16 Apr 2009, Tom Lane wrote: Hmm, and what is shared_buffers set to? How big are the tables and other indexes used in the query? We still have to explain why the inner nestloop got slower, and it's hard to see that unless something stopped fitting in cache. I just noticed that someone

Re: [PERFORM] GiST index performance

2009-04-16 Thread Tom Lane
Matthew Wakeling writes: > On Thu, 16 Apr 2009, Tom Lane wrote: >> Also, what are the physical sizes of the two indexes? > location_object_start_gist | 193 MB > location_object_start | 75 MB >> I notice that the inner nestloop join gets slower too, when it's not >> changed at all --- th

Re: [PERFORM] GiST index performance

2009-04-16 Thread Matthew Wakeling
On Thu, 16 Apr 2009, Tom Lane wrote: Matthew, can you put together a self-contained test case with a similar slowdown? It isn't the smoking gun I thought it would be, but: CREATE TABLE a AS SELECT a FROM generate_series(1,100) AS a(a); CREATE TABLE b AS SELECT b FROM generate_series(1,1000

Re: [PERFORM] GiST index performance

2009-04-16 Thread Tom Lane
dforum writes: > If you analyse query plan, you see that most of the time are lost during > sequencial scan, and you have 2 seq scan. I think you missed the loops count. >> -> Index Scan using location_object_start_gist on location l1 >>(cost=0.00..4.16 rows=150 width=

Re: [PERFORM] GiST index performance

2009-04-16 Thread dforum
hello, there is other performance problem on this request. If you analyse query plan, you see that most of the time are lost during sequencial scan, and you have 2 seq scan. You have to create other indexes to match the request. Postgresq is totally dependant on index to reach is performance

Re: [PERFORM] GiST index performance

2009-04-16 Thread Matthew Wakeling
On Thu, 16 Apr 2009, dforum wrote: there is other performance problem on this request. If you analyse query plan, you see that most of the time are lost during sequencial scan, and you have 2 seq scan. Nonsense. Sequential scans account for all of one or two seconds of processing in these qu

Re: [PERFORM] GiST index performance

2009-04-16 Thread Tom Lane
"Kevin Grittner" writes: > Matthew Wakeling wrote: >> I have been doing some queries that are best answered with GiST >> indexes > For what definition of "best answered"? > Since an index is only a performance tuning feature (unless declared > UNIQUE), and should never alter the results (beyo

Re: [PERFORM] GiST index performance

2009-04-16 Thread Matthew Wakeling
On Thu, 16 Apr 2009, Kevin Grittner wrote: Matthew Wakeling wrote: I have been doing some queries that are best answered with GiST indexes For what definition of "best answered"? Since an index is only a performance tuning feature (unless declared UNIQUE), and should never alter the results

Re: [PERFORM] GiST index performance

2009-04-16 Thread Kevin Grittner
Matthew Wakeling wrote: > I have been doing some queries that are best answered with GiST > indexes For what definition of "best answered"? Since an index is only a performance tuning feature (unless declared UNIQUE), and should never alter the results (beyond possibly affecting row order if t

[PERFORM] GiST index performance

2009-04-16 Thread Matthew Wakeling
I have been doing some queries that are best answered with GiST indexes, however I have found that their performance is a little lacking. I thought I would do a direct comparison on a level playing field. Here are two EXPLAIN ANALYSE results for the same query, with two different indexes. The