On Tue, Mar 17, 2026 at 7:49 PM Nathan Bossart <[email protected]>
wrote:

> On Sat, Mar 14, 2026 at 11:43:38PM +0100, KAZAR Ayoub wrote:
> > Just a small concern about where some varlenas have a larger binary size
> > than its text representation ex:
> > SELECT pg_column_size(to_tsvector('SIMD is GOOD'));
> >  pg_column_size
> > ----------------
> >              32
> >
> > its text representation is less than sizeof(Vector8) so currently v3
> would
> > enter SIMD path and exit out just from the beginning (two extra branches)
> > because it does this:
> > + if (TupleDescAttr(tup_desc, attnum - 1)->attlen == -1 &&
> > + VARSIZE_ANY_EXHDR(DatumGetPointer(value)) > sizeof(Vector8))
> >
> > I thought maybe we could do * 2 or * 4 its binary size, depends on the
> type
> > really but this is just a proposition if this case is something
> concerning.
>
> Can we measure the impact of this?  How likely is this case?
>
I'll respond to this separately in a different email.

>
> > +static pg_attribute_always_inline void CopyAttributeOutText(CopyToState
> cstate, const char *string,
> > +
>                                              bool use_simd, size_t len);
> > +static pg_attribute_always_inline void CopyAttributeOutCSV(CopyToState
> cstate, const char *string,
> > +
>                                         bool use_quote, bool use_simd,
> size_t len);
>
> Can you test this on its own, too?  We might be able to separate this and
> the change below into a prerequisite patch, assuming they show benefits.
>
I tested the inlining alone and saw an improvement of about 1% to 4% across
all configurations.
The inlining is only really meaningful in combination with the SIMD work,
for the reason described below.

>
> >                       if (is_csv)
> > -                             CopyAttributeOutCSV(cstate, string,
> > -
>  cstate->opts.force_quote_flags[attnum - 1]);
> > +                     {
> > +                             if (use_simd)
> > +                                     CopyAttributeOutCSV(cstate, string,
> > +
>      cstate->opts.force_quote_flags[attnum - 1],
> > +
>      true, len);
> > +                             else
> > +                                     CopyAttributeOutCSV(cstate, string,
> > +
>      cstate->opts.force_quote_flags[attnum - 1],
> > +
>      false, len);
>
> There isn't a terrible amount of branching on use_simd in these functions,
> so I'm a little skeptical this makes much difference.  As above, it would
> be good to measure it

I compiled three variants:

v3: use_simd passed as a compile-time constant, CopyAttribute functions
inlined.
v3_variable: use_simd passed as a runtime variable, CopyAttribute functions
inlined.
v3_variable_noinline: use_simd passed as a runtime variable, CopyAttribute
functions not inlined.

We do not explicitly mark any of the helpers as inline.

The assembly reveals two things:
1) The CSV SIMD helpers (CopyCheckCSVQuoteNeedSIMD, CopySkipCSVEscapeSIMD)
are inlined by the compiler on its own in all three variants, while
CopySkipTextSIMD is never inlined by the compiler in any variant.

2) The constant-passing approach (v3) does matter (just a little,
apparently), specifically for CopySkipTextSIMD.
It's the same story as the first commit of the COPY FROM patch: the compiler
emits code without the use_simd branch:
     jbe  ...   ; len > sizeof(Vector8)
     je   ...   ; need_transcoding
     call CopySkipTextSIMD

Whether the extra branch in the caller needed for constant passing is worth
it or not is shown by the benchmark.
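
To make the constant-passing idea concrete, here is a minimal standalone
sketch. It is not the patch code: count_escapes, skip_clean_prefix, and CHUNK
are hypothetical stand-ins (for CopyAttributeOutText, CopySkipTextSIMD, and
sizeof(Vector8) respectively), and the always_inline attribute is written out
in its GCC/Clang form. The point is only that each call site passes a literal
true/false, so each inlined copy can be compiled without a use_simd test.

```c
#include <stdbool.h>
#include <stddef.h>

#define CHUNK 16				/* assumed stand-in for sizeof(Vector8) */

/* Hypothetical fast path: length of the leading run that needs no escaping. */
static size_t
skip_clean_prefix(const char *s, size_t len)
{
	size_t		i = 0;

	while (i < len && s[i] != '\\' && s[i] != '\n' && s[i] != '\t')
		i++;
	return i;
}

/* Counts bytes that would need escaping; use_simd selects the fast path. */
static inline __attribute__((always_inline)) size_t
count_escapes(const char *s, size_t len, bool use_simd)
{
	size_t		i = 0;
	size_t		n = 0;

	if (use_simd)				/* folded away when the argument is a literal */
		i = skip_clean_prefix(s, len);

	for (; i < len; i++)
	{
		if (s[i] == '\\' || s[i] == '\n' || s[i] == '\t')
			n++;
	}
	return n;
}

/* Caller branches once, then passes a compile-time constant to each copy. */
size_t
count_escapes_attr(const char *s, size_t len)
{
	if (len > CHUNK)
		return count_escapes(s, len, true);
	else
		return count_escapes(s, len, false);
}
```

The cost of this pattern is the duplicated call in the caller and two inlined
copies of the function body, which is the trade-off the numbers below are
meant to evaluate.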


  Test                   Master    v3       v3_var   v3_var_noinl
  TEXT clean             1504ms   -24.1%   -23.0%   -21.5%
  CSV clean              1760ms   -34.9%   -32.7%   -33.0%
  TEXT 1/3 backslashes   3763ms    +4.6%    +6.9%    +4.1%
  CSV 1/3 quotes         3885ms    +3.1%    +2.7%    -0.8%

Wide table TEXT (integer columns):

  Cols    Master    v3       v3_var   v3_var_noinl
  50      2083ms   -0.7%    -0.6%    +3.5%
  100     4094ms   -0.1%    -0.5%    +4.5%
  200     1560ms   +0.6%    -2.3%    +3.2%
  500     1905ms   -1.0%    -1.3%    +4.7%
  1000    1455ms   +1.8%    +0.4%    +4.3%

Wide table CSV:

  Cols    Master    v3       v3_var   v3_var_noinl
  50      2421ms   +4.0%    +6.7%    +5.8%
  100     4980ms   +0.1%    +2.0%    +0.1%
  200     1901ms   +1.4%    +3.5%    +1.4%
  500     2328ms   +1.8%    +2.7%    +2.2%
  1000    1815ms   +2.0%    +2.8%    +2.5%

I'm not sure there's a practical difference between v3 and v3_var.
What do you think?


Regards,
Ayoub
