On Wed, Dec 16, 2020 at 8:18 AM Heikki Linnakangas <hlinn...@iki.fi> wrote:
>
> Currently, COPY FROM parses the input one line at a time. Each line is
> converted to the database encoding separately, or if the file encoding
> matches the database encoding, we just check that the input is valid for
> the encoding. It would be more efficient to do the encoding
> conversion/verification in larger chunks. At least potentially; the
> current conversion/verification implementations work one byte a time so
> it doesn't matter too much, but there are faster algorithms out there
> that use SIMD instructions or lookup tables that benefit from larger inputs.
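
For readers following along, here is a rough illustration of the difference between per-line and chunked verification described in the quoted text. It is only a sketch in Python, not the patch's C code: the function names and the chunk size are hypothetical, and UTF-8 stands in for whatever the file encoding is. The main point is that a chunk boundary can fall in the middle of a multibyte character, so the verifier has to carry state across chunks.

```python
import codecs


def verify_per_line(data: bytes) -> bool:
    """Validate each line separately (hypothetical stand-in for per-line checks).

    Splitting on b"\n" is safe for UTF-8 because the byte 0x0A never
    occurs inside a multibyte sequence.
    """
    try:
        for line in data.split(b"\n"):
            line.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def verify_in_chunks(data: bytes, chunk_size: int = 65536) -> bool:
    """Validate fixed-size chunks, regardless of where line breaks fall.

    The incremental decoder buffers an incomplete trailing sequence, so a
    character split across two chunks is still accepted.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        for start in range(0, len(data), chunk_size):
            decoder.decode(data[start:start + chunk_size])
        # final=True raises if the input ends mid-character.
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return False
    return True
```

Both functions should agree on any input; the chunked version simply hands the verifier larger inputs, which is the property that table-driven or SIMD-based verifiers can take advantage of.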

Hi Heikki,

This is great news. I've seen examples of such algorithms and that'd be nice to have. I haven't studied the patch in detail, but it looks fine on the whole.

In 0004, it seems you have some doubts about upgrade compatibility. Is that because user-defined conversions would no longer have the right signature?

--
John Naylor
EDB: http://www.enterprisedb.com