Heikki Linnakangas wrote:
Jan Urbański wrote:
So right now the idea is to:
 (1) pre-sort STATISTIC_KIND_MCELEM values
 (2) build an array of pointers to detoasted values in tssel()
 (3) use binary search when looking for MCELEMs during tsquery analysis
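A minimal standalone sketch of steps (1) and (3). It assumes plain byte strings instead of real text Datums, and LexemeFreq, compare_lexemes, presort_mcelems and lookup_mcelem_freq are hypothetical names, not existing backend code. The ordering used here (memcmp on the common prefix, then shorter first) is an assumption. Whatever the real code does, the pre-sort and the binary search have to use the same comparison, for example the one tsCompareString() implements.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one MCELEM entry: lexeme bytes plus frequency. */
typedef struct
{
    const char *lexeme;   /* not NUL-terminated in general */
    size_t      len;
    float       freq;
} LexemeFreq;

/* Byte-wise order: memcmp on the common prefix, shorter string first on a tie. */
static int
compare_lexemes(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t minlen = alen < blen ? alen : blen;
    int    cmp = memcmp(a, b, minlen);

    if (cmp != 0)
        return cmp;
    if (alen == blen)
        return 0;
    return alen < blen ? -1 : 1;
}

/* qsort()/bsearch() comparator wrapping compare_lexemes(). */
static int
compare_lexeme_freqs(const void *a, const void *b)
{
    const LexemeFreq *x = a;
    const LexemeFreq *y = b;

    return compare_lexemes(x->lexeme, x->len, y->lexeme, y->len);
}

/* Step (1): sort once, e.g. when the statistics are gathered. */
static void
presort_mcelems(LexemeFreq *mcelems, size_t n)
{
    qsort(mcelems, n, sizeof(LexemeFreq), compare_lexeme_freqs);
}

/* Step (3): O(log n) lookup per query lexeme; returns -1 if not an MCELEM. */
static float
lookup_mcelem_freq(const LexemeFreq *mcelems, size_t n,
                   const char *lexeme, size_t len)
{
    LexemeFreq        key = {lexeme, len, 0.0f};
    const LexemeFreq *hit;

    assert(mcelems != NULL || n == 0);
    hit = bsearch(&key, mcelems, n, sizeof(LexemeFreq), compare_lexeme_freqs);
    return hit ? hit->freq : -1.0f;
}
```

Sorting once at statistics-gathering time means each tssel() call pays only the O(log n) search per query lexeme, instead of sorting the MCELEM array on every call.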

Sounds like a plan. In (2), it's even better to detoast the values lazily. For a typical one-word tsquery, the binary search will only look at a small portion of the elements.

Hm, how can I do that? TOAST is still a bit of black magic to me... Do you mean I should stick to having Datums in TextFreq, and call DatumGetTextP() inside bsearch() (assuming I get rid of qsort())? I wanted to avoid that so that I don't detoast the same value multiple times, but it's true that a binary search won't touch most elements.
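One way to read the lazy-detoasting suggestion, sketched standalone: keep the raw values, detoast a slot only the first time the binary search probes it, and cache the result so no value is detoasted twice. The search is hand-written rather than using bsearch() because the cache lives in each array slot. LazyElem, get_detoasted and lazy_lookup are hypothetical names. The malloc()/memcpy() merely stands in for DatumGetTextP(); in the backend the copy would be palloc'd and would go away with the memory context.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for an MCELEM slot: 'raw' plays the role of the
 * (possibly compressed) Datum, 'value' caches the detoasted copy.
 */
typedef struct
{
    const char *raw;
    size_t      len;
    char       *value;      /* NULL until the slot is first probed */
    float       freq;
} LazyElem;

static int detoast_calls = 0;   /* counts simulated detoast operations */

/* Stand-in for DatumGetTextP(): detoast on first use, then reuse the copy. */
static const char *
get_detoasted(LazyElem *e)
{
    if (e->value == NULL)
    {
        e->value = malloc(e->len ? e->len : 1);
        assert(e->value != NULL);
        memcpy(e->value, e->raw, e->len);
        detoast_calls++;
    }
    return e->value;
}

/* Same ordering as the pre-sort: memcmp on common prefix, shorter first. */
static int
cmp_bytes(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t minlen = alen < blen ? alen : blen;
    int    cmp = memcmp(a, b, minlen);

    if (cmp != 0)
        return cmp;
    if (alen == blen)
        return 0;
    return alen < blen ? -1 : 1;
}

/* Binary search over the pre-sorted array, detoasting only probed slots. */
static float
lazy_lookup(LazyElem *elems, size_t n, const char *key, size_t keylen)
{
    size_t lo = 0;
    size_t hi = n;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;
        int    cmp = cmp_bytes(key, keylen,
                               get_detoasted(&elems[mid]), elems[mid].len);

        if (cmp == 0)
            return elems[mid].freq;
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return -1.0f;
}
```

With 8 sorted elements a single lookup probes at most 4 slots, so at most 4 values are detoasted, and repeating a lookup that touches the same slots detoasts nothing new. For a typical one-word tsquery this touches only about log2(n) of the MCELEM entries.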

Another thing is, how significant is the time spent in tssel() anyway, compared to actually running the query? You ran pgbench on EXPLAIN, which is good for seeing where in tssel() the time is spent, but if the time spent in tssel() is, say, 1% of the total execution time, there's no point in optimizing it further.
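The 1% argument is Amdahl's law. A tiny sketch of the bound follows; overall_speedup is an illustrative helper, not anything in the tree.

```c
#include <assert.h>

/*
 * Amdahl's law: if a fraction p of the runtime is made s times faster,
 * the whole query becomes 1 / ((1 - p) + p / s) times faster.
 */
static double
overall_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

With p = 0.01, even an essentially infinitely fast tssel() (s = 1e9) gives an overall speedup of about 1.0101, i.e. roughly a 1% gain.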

I changed the pgbench script to
select * from manual where tsvector @@ to_tsquery('foo');
and the parameters to
pgbench -n -f tssel-bench.sql -t 1000 postgres

and got

number of clients: 1
number of transactions per client: 1000
number of transactions actually processed: 1000/1000
tps = 12.238282 (including connections establishing)
tps = 12.238606 (excluding connections establishing)

samples  %        symbol name
174731   31.6200  pglz_decompress
88105    15.9438  tsvectorout
17280     3.1271  pg_mblen
13623     2.4653  AllocSetAlloc
13059     2.3632  hash_search_with_hash_value
10845     1.9626  pg_utf_mblen
10335     1.8703  internal_text_pattern_compare
9196      1.6641  index_getnext
9102      1.6471  bttext_pattern_cmp
8075      1.4613  pg_detoast_datum_packed
7437      1.3458  LWLockAcquire
7066      1.2787  hash_any
6811      1.2325  AllocSetFree
6623      1.1985  pg_qsort
6439      1.1652  LWLockRelease
5793      1.0483  DirectFunctionCall2
5322      0.9631  _bt_compare
4664      0.8440  tsCompareString
4636      0.8389  .plt
4539      0.8214  compare_two_textfreqs

But I think I'll go with pre-sorting anyway; it feels cleaner and neater.
--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)