Re: [HACKERS] gsoc, oprrest function for text search take 2

Jan Urbański Thu, 14 Aug 2008 02:54:09 -0700

Heikki Linnakangas wrote:

> Jan Urbański wrote:
>> Not good... Shall I try sorting pg_statistics arrays on text values instead of frequencies?
> Yeah, I'd go with that. If you only do it for the new STATISTIC_KIND_MCV_ELEMENT statistics, you shouldn't need to change any other code.


OK, will do.

>> BTW: I just noticed some text_to_cstring calls, they came from elog(DEBUG1)s that I have in my code. But they couldn't have skewed the results much, could they?
> Well, text_to_cstring was consuming 1.1% of the CPU time on its own, and presumably some of the AllocSetAlloc overhead is attributable to that as well. And perhaps some of the detoasting as well.
> Speaking of which, a lot of time seems to be spent on detoasting. I'd like to understand that a bit better. Where is the detoasting coming from?

Hmm, maybe bttext_pattern_cmp does some detoasting? It calls PG_GETARG_TEXT_PP(), which in turn calls pg_detoast_datum_packed(). Oh, and also I think that compare_lexeme_textfreq() uses DatumGetTextP() and that also does detoasting. The root of all evil could be keeping a Datum in the TextFreq array, and not a "text *", which is something you pointed out earlier and I apparently didn't understand.


So right now the idea is to:
 (1) pre-sort STATISTIC_KIND_MCELEM values
 (2) build an array of pointers to detoasted values in tssel()
 (3) use binary search when looking for MCELEMs during tsquery analysis
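
A minimal sketch of the three steps above, written as standalone C rather than against the PostgreSQL headers. Everything named here (LexemeFreq, compare_lexemes, compare_entries, sort_by_lexeme, find_lexeme) is hypothetical and only for illustration; plain byte pointers stand in for values that have already been detoasted once, and the byte-wise ordering is an assumption, not necessarily the comparison the real code would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for one most-common-element entry. It holds a
 * direct pointer to already-detoasted bytes (step 2), so comparisons
 * never have to detoast anything.
 */
typedef struct
{
    const char *lexeme;     /* lexeme bytes, not necessarily NUL-terminated */
    size_t      length;     /* number of bytes in lexeme */
    float       frequency;  /* fraction of rows containing the lexeme */
} LexemeFreq;

/* Byte-wise comparison; on a common prefix the shorter string sorts first. */
int
compare_lexemes(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t      minlen = alen < blen ? alen : blen;
    int         cmp = memcmp(a, b, minlen);

    if (cmp != 0)
        return cmp;
    if (alen == blen)
        return 0;
    return alen < blen ? -1 : 1;
}

/* Comparator shared by qsort() and bsearch(), so both use the same order. */
int
compare_entries(const void *a, const void *b)
{
    const LexemeFreq *ea = a;
    const LexemeFreq *eb = b;

    return compare_lexemes(ea->lexeme, ea->length, eb->lexeme, eb->length);
}

/* Step (1): order the entries by lexeme text instead of by frequency. */
void
sort_by_lexeme(LexemeFreq *entries, size_t n)
{
    qsort(entries, n, sizeof(LexemeFreq), compare_entries);
}

/* Step (3): binary search in the sorted array; returns NULL when absent. */
const LexemeFreq *
find_lexeme(const LexemeFreq *entries, size_t n,
            const char *key, size_t keylen)
{
    LexemeFreq  probe = {key, keylen, 0.0f};

    assert(entries != NULL || n == 0);
    return bsearch(&probe, entries, n, sizeof(LexemeFreq), compare_entries);
}
```

With the sort done once up front, each lookup costs O(log n) comparisons instead of a linear scan, and none of those comparisons touches a toasted value.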

Jan

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
