At Wed, 13 Feb 2019 16:38:50 +0100, "Daniel Verite" <dan...@manitou-mail.org> wrote in <bc319ec6-60d0-4878-a800-bcc12a190...@manitou-mail.org> > Hi, > > replace_text() in varlena.c builds the result in a StringInfo buffer, > and finishes by copying it into a freshly allocated varlena structure > with cstring_to_text_with_len(), in the same memory context. > > It looks like that copy step could be avoided by preprending the > varlena header to the StringInfo to begin with, and return the buffer > as a text*, as in the attached patch. > > On large strings, the time saved can be significant. For instance > I'm seeing a ~20% decrease in total execution time on a test with > lengths in the 2-3 MB range, like this: > > select sum(length( > replace(repeat('abcdefghijklmnopqrstuvwxyz', i*10), 'abc', 'ABC') > )) > from generate_series(10000,12000) as i; > > Also, at a glance, there are a few other functions with similar > StringInfo-to-varlena copies that seem avoidable: > concat_internal(), text_format(), replace_text_regexp(). > > Are there reasons not to do this? Otherwise, should it be considered > in in a more principled way, such as adding to the StringInfo API > functions like void InitStringInfoForVarlena() and > text *StringInfoAsVarlena()?
First, I agree that the waste of cycles should be eliminated. Grepping with 'cstring_to_text_with_len\(.*[\.>]data,.*\)' shows many instances of the use. Though StringInfo seems very open-minded, the number of instances would be a good reason to have new API functions. That is, I vote for providing a set of API for the use in StringInfo. But it seems to be difficult to name the latter function. The name convention for the object is basically <verb>StringInfo. getVarlenaStringInfo/getTextStringInfo apparently fits the convention but seems to me a bit strange. regards. -- Kyotaro Horiguchi NTT Open Source Software Center