Re: [PERFORM] bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).

2009-10-29 Thread Tom Lane
Jesper Krogh jes...@krogh.cc writes: I'm currently trying to figure out why the tsearch performance seems to vary a lot between different queryplans. I have created a sample dataset that sort of resembles the data I have to work on. The script that builds the dataset is at:

Re: [PERFORM] bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).

2009-10-27 Thread Jesper Krogh
Craig Ringer wrote: On 8.4 on a different system Pg uses the seq scan by preference, with a runtime of 1148ms. It doesn't seem to want to do a bitmap heap scan when searching for `commonterm' even when enable_seqscan is set to `off'. A search for `commonterm80' also uses a seq scan (1067ms),

Re: [PERFORM] bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).

2009-10-27 Thread Robert Haas
On Mon, Oct 26, 2009 at 4:02 PM, Jesper Krogh jes...@krogh.cc wrote: Hi. I'm currently trying to figure out why the tsearch performance seems to vary a lot between different queryplans. I have created a sample dataset that sort of resembles the data I have to work on. The script that builds

Re: [PERFORM] bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).

2009-10-27 Thread jesper
On Mon, Oct 26, 2009 at 4:02 PM, Jesper Krogh jes...@krogh.cc wrote: Given that the seq-scan has to visit 50K rows to create the result and the bitmap heap scan only has to visit 40K (but search the index) we would expect the seq-scan to be at most 25% more expensive than the bitmap-heap
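
The 25% figure in the snippet above follows from the two row counts alone, if one assumes every visited row costs the same. A minimal Python sketch of that arithmetic is below; the function name `relative_extra_cost` is made up for illustration, and the equal-cost-per-row assumption is exactly the premise the thread goes on to question.

```python
def relative_extra_cost(rows_a: int, rows_b: int) -> float:
    """Percent by which visiting rows_a rows exceeds visiting rows_b rows.

    Assumption (hypothetical): every visited row costs the same, so total
    cost is proportional to the number of rows visited.
    """
    return (rows_a / rows_b - 1.0) * 100.0


seq_scan_rows = 50_000   # rows the seq scan has to visit (from the post)
bitmap_rows = 40_000     # rows the bitmap heap scan has to visit (from the post)

extra = relative_extra_cost(seq_scan_rows, bitmap_rows)
print(f"expected extra cost of the seq scan: {extra:.0f}%")
```

Under that assumption the index lookup is not counted at all, which is why the post says "at most".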

Re: [PERFORM] bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).

2009-10-27 Thread Robert Haas
On Tue, Oct 27, 2009 at 11:08 AM, jes...@krogh.cc wrote: In my example the seq-scan evaluates 50K tuples and the heap-scan 40K. The question is why does the per-tuple evaluation become that much more expensive (x7.5)[1] on the seq-scan than on the index-scan, when the complete dataset indeed

[PERFORM] bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).

2009-10-26 Thread Jesper Krogh
Hi. I'm currently trying to figure out why the tsearch performance seems to vary a lot between different queryplans. I have created a sample dataset that sort of resembles the data I have to work on. The script that builds the dataset is at: http://krogh.cc/~jesper/build-test.pl and

Re: [PERFORM] bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).

2009-10-26 Thread Craig Ringer
On Mon, 2009-10-26 at 21:02 +0100, Jesper Krogh wrote: Test system.. average desktop, 1 SATA drive and 1.5GB memory with pg 8.4.1. The dataset consists of randomized words, but .. all records contain commonterm, around 80% contain commonterm80 and so on.. my $rand = rand();

Re: [PERFORM] bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).

2009-10-26 Thread Jesper Krogh
Craig Ringer wrote: On Mon, 2009-10-26 at 21:02 +0100, Jesper Krogh wrote: Test system.. average desktop, 1 SATA drive and 1.5GB memory with pg 8.4.1. The dataset consists of randomized words, but .. all records contain commonterm, around 80% contain commonterm80 and so on.. my

Re: [PERFORM] bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).

2009-10-26 Thread Craig Ringer
On Tue, 2009-10-27 at 06:08 +0100, Jesper Krogh wrote: You should probably re-generate your random value for each call rather than store it. Currently, every document with commonterm20 is guaranteed to also have commonterm40, commonterm60, etc, which probably isn't very realistic, and
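
The correlation described in the snippet above can be reproduced with a small sketch. It is written in Python rather than the Perl of the original build script, and it is only an illustration: the threshold list, the function names, and the term-selection rule (`r < p / 100`) are assumptions, since the script itself is not shown here. One function draws a single random value per document and reuses it for every term; the other draws a fresh value per term.

```python
import random

# Assumed term frequencies in percent; the actual script may use others.
THRESHOLDS = (20, 40, 60, 80)


def terms_shared_draw(rng: random.Random) -> set[str]:
    """One random value per document, reused for every threshold test."""
    r = rng.random()
    return {f"commonterm{p}" for p in THRESHOLDS if r < p / 100}


def terms_independent_draws(rng: random.Random) -> set[str]:
    """A fresh random value for each threshold test."""
    return {f"commonterm{p}" for p in THRESHOLDS if rng.random() < p / 100}


def count_20_without_40(docs: list[set[str]]) -> int:
    """Documents that contain commonterm20 but not commonterm40."""
    return sum(1 for d in docs if "commonterm20" in d and "commonterm40" not in d)


rng = random.Random(0)  # fixed seed so the run is repeatable
shared = [terms_shared_draw(rng) for _ in range(2000)]
independent = [terms_independent_draws(rng) for _ in range(2000)]

print("shared draw, commonterm20 without commonterm40:", count_20_without_40(shared))
print("independent draws, commonterm20 without commonterm40:", count_20_without_40(independent))
```

With a shared draw, any value below 0.2 is also below 0.4, so a document with commonterm20 always carries commonterm40 as well. With independent draws the terms are uncorrelated, and roughly 12% of documents (0.2 × 0.6) would be expected to have commonterm20 without commonterm40.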

Re: [PERFORM] bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).

2009-10-26 Thread Jesper Krogh
Craig Ringer wrote: On Tue, 2009-10-27 at 06:08 +0100, Jesper Krogh wrote: You should probably re-generate your random value for each call rather than store it. Currently, every document with commonterm20 is guaranteed to also have commonterm40, commonterm60, etc, which probably isn't very