Tom Lane wrote:
Scott Cain <[EMAIL PROTECTED]> writes:

A few days ago, I asked for advice on speeding up substring queries on
the GENERAL mailing list.  Joe Conway helpfully pointed out the ALTER
TABLE STORAGE EXTERNAL documentation.  After doing the alter,
the queries got slower!  Here is the background:

Ah-hah, I've sussed it ... you didn't actually change the storage representation. You wrote:

Yeah, I came to the same conclusion this morning (update longdna set dna = dna || '';), but it still seems that the chunked table is very slightly faster than the substring on the externally stored column:


dna=# explain analyze select pdna from dna where foffset > 6000000 and foffset < 6024000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Index Scan using foffset_idx on dna (cost=0.00..4.22 rows=14 width=32) (actual time=0.06..0.16 rows=11 loops=1)
Index Cond: ((foffset > 6000000) AND (foffset < 6024000))
Total runtime: 0.27 msec
(3 rows)


dna=# explain analyze select pdna from dna where foffset > 6000000 and foffset < 6024000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Index Scan using foffset_idx on dna (cost=0.00..4.22 rows=14 width=32) (actual time=0.07..0.16 rows=11 loops=1)
Index Cond: ((foffset > 6000000) AND (foffset < 6024000))
Total runtime: 0.25 msec
(3 rows)


dna=# explain analyze select substr(dna,6002000,20000) from longdna;
QUERY PLAN
------------------------------------------------------------------------------------------------
Seq Scan on longdna (cost=0.00..1.01 rows=1 width=32) (actual time=0.46..0.47 rows=1 loops=1)
Total runtime: 0.58 msec
(2 rows)


dna=# explain analyze select substr(dna,6002000,20000) from longdna;
QUERY PLAN
------------------------------------------------------------------------------------------------
Seq Scan on longdna (cost=0.00..1.01 rows=1 width=32) (actual time=0.23..0.24 rows=1 loops=1)
Total runtime: 0.29 msec
(2 rows)


I ran each command twice after starting psql to observe the effects of caching.

However with the provided sample data, longdna has only one row, and dna has 122,540 rows, all of which are chunks of the one longdna row. I would tend to think that if you had 1000 or so longdna records indexed on some id column, versus 122,540,000 dna chunks indexed on both an id and segment column, the substring from longdna would win.

Joe


---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Reply via email to