> > Maybe there's some hybrid type possible where you can scan the index to
> > find large table regions that are known /not/ to contain tuples of interest
> > and seek over them in your scan. I wouldn't know, really, but it sounds like
> > it'd probably be more I/O than a pure seq scan (given the reading of the
> > index too) unless the table had the values of interest rather neatly
> > clustered. It'd also surely use more memory and CPU time processing the
> > whole index to find table regions without values of interest.
> > Is that what you meant, though?

Not exactly. I mean the following: there are cases where an index scan, even
over non-clustered values, is a complete win (basically, it is a win whenever
the number of returned rows is relatively small, whether due to selectivity
or due to a LIMIT clause).

The test case that I provided creates a 667-page table and a 30-page index,
so a complete scan of the index is 22 times cheaper in terms of I/O. Suppose
you want to find all the values that match '%123%'. Currently PostgreSQL will
do a seq scan, while the better option might be (and it is) to loop through
all the items in the index (costing 30 I/O), find the entries that truly
contain '123' (it will find 20 of them), and do 20 I/O to check tuple
visibility. That is 50 I/O versus 667 for the seq scan.

> A b-tree index cannot be used on a LIKE query with a leading wildcard. See
> the FAQ.

Unfortunately, it is true. I would love to improve that particular case.

> In addition, if your database is not in the C locale you can't use an
> ordinary index for LIKE queries. See the FAQ. You need to create a
> text_pattern_ops index instead:
>
> create index i_ix_txt on seq_test(i text_pattern_ops);

Good catch. However, that does not change the results: PostgreSQL does the
same 2529 I/O for the index scan on '%123%', for some unknown reason.

> > set enable_seqscan=off
> > -- Index Scan reads 2529 pages for some reason. I would expect *30* (index
> > size) + *20* (number of matching entries) = 50 pages maximum, that is 10
> > times better than with seq scan.
> > Index Scan using i_ix on seq_test (cost=0.00..1643.74 rows=356 width=508)
> > (actual time=0.334..16.746 rows=20 loops=1 read_shared=2529(2529)
> > read_local=0(0) flush=0 local_flush=0 file_read=0 file_write=0)
> > Filter: (i ~~ '%123%'::text)
> > Total runtime: 16.863 ms
>
> I think it's reading the whole index, because it can't do a prefix search
> if there's a leading wildcard. I'm a bit confused, though, since I thought
> in this case it couldn't actually execute the query w/o a sequential scan,
> and would just use one irrespective of the enable_seqscan param. That's what
> happens here.

Please follow the case carefully: the index is only 30 pages long. Why is
PostgreSQL doing 2529 I/O? It drives me crazy.

Regards,
Vladimir Sitnikov
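
P.S. In case anyone wants to poke at this, below is a rough sketch of the
kind of test case I mean. The object names (seq_test, i, i_ix) match the
plan above; the data-generating expression and row count are placeholders
of my own and will not reproduce the exact page counts:

  create table seq_test (i text);
  -- ~2500 rows of fixed-width strings as a stand-in for the real data
  insert into seq_test
      select lpad(n::text, 500, '0') from generate_series(1, 2500) as n;
  create index i_ix on seq_test (i);
  vacuum analyze seq_test;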
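
The table and index sizes I keep quoting can be checked straight from the
catalog (relpages is kept up to date by VACUUM and ANALYZE):

  select relname, relpages from pg_class where relname in ('seq_test', 'i_ix');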