Heikki Linnakangas wrote:
Jan Urbański wrote:
So right now the idea is to:
(1) pre-sort STATISTIC_KIND_MCELEM values
(2) build an array of pointers to detoasted values in tssel()
(3) use binary search when looking for MCELEMs during tsquery analysis
Sounds like a plan. In (2), it's even better to detoast the values
lazily. For a typical one-word tsquery, the binary search will only look
at a small portion of the elements.
Hm, how can I do that? Toast is still a bit black magic to me... Do you
mean I should stick to having Datums in TextFreq? And use DatumGetTextP
in bsearch() (assuming I'll get rid of qsort())? I wanted to avoid that,
so I won't detoast the same value multiple times, but it's true: a
binary search won't touch most elements.
Another thing is, how significant is the time spent in tssel() anyway,
compared to actually running the query? You ran pgbench on EXPLAIN,
which is good to see where in tssel() the time is spent, but if the time
spent in tssel() is say 1% of the total execution time, there's no point
optimizing it further.
Changed to the pgbench script to
select * from manual where tsvector @@ to_tsquery('foo');
and the parameters to
pgbench -n -f tssel-bench.sql -t 1000 postgres
and got
number of clients: 1
number of transactions per client: 1000
number of transactions actually processed: 1000/1000
tps = 12.238282 (including connections establishing)
tps = 12.238606 (excluding connections establishing)
samples % symbol name
174731 31.6200 pglz_decompress
88105 15.9438 tsvectorout
17280 3.1271 pg_mblen
13623 2.4653 AllocSetAlloc
13059 2.3632 hash_search_with_hash_value
10845 1.9626 pg_utf_mblen
10335 1.8703 internal_text_pattern_compare
9196 1.6641 index_getnext
9102 1.6471 bttext_pattern_cmp
8075 1.4613 pg_detoast_datum_packed
7437 1.3458 LWLockAcquire
7066 1.2787 hash_any
6811 1.2325 AllocSetFree
6623 1.1985 pg_qsort
6439 1.1652 LWLockRelease
5793 1.0483 DirectFunctionCall2
5322 0.9631 _bt_compare
4664 0.8440 tsCompareString
4636 0.8389 .plt
4539 0.8214 compare_two_textfreqs
But I think I'll go with pre-sorting anyway, it feels cleaner and neater.
--
Jan Urbanski
GPG key ID: E583D7D2
ouden estin
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers