Hello,

Following Nazir's recommendation, I am moving this to a separate thread so it can be reviewed on its own.

On Thu, Jan 8, 2026 at 2:49 PM Manni Wood <[email protected]> wrote:
> On Wed, 24 Dec 2025 at 18:08, KAZAR Ayoub <[email protected]> wrote:
>>>> >
>>>> > Hello,
>>>> > Following the same path of optimizing COPY FROM using SIMD, i found that COPY TO can also benefit from this.
>>>> >
>>>> > I attached a small patch that uses SIMD to skip data and advance as far as the first special character is found, then fallback to scalar processing for that character and re-enter the SIMD path again...
>>>> > There's two ways to do this:
>>>> > 1) Essentially we do SIMD until we find a special character, then continue scalar path without re-entering SIMD again.
>>>> > - This gives from 10% to 30% speedups depending on the weight of special characters in the attribute, we don't lose anything here since it advances with SIMD until it can't (using the previous scripts: 1/3, 2/3 specials chars).
>>>> >
>>>> > 2) Do SIMD path, then use scalar path when we hit a special character, keep re-entering the SIMD path each time.
>>>> > - This is equivalent to the COPY FROM story, we'll need to find the same heuristic to use for both COPY FROM/TO to reduce the regressions (same regressions: around from 20% to 30% with 1/3, 2/3 specials chars).
>>>> >
>>>> > Something else to note is that the scalar path for COPY TO isn't as heavy as the state machine in COPY FROM.
>>>> >
>>>> > So if we find the sweet spot for the heuristic, doing the same for COPY TO will be trivial and always beneficial.
>>>> > Attached is 0004 which is option 1 (SIMD without re-entering), 0005 is the second one.
>>>
>>>
>
> Ayoub Kazar, I tested your v4 "copy to" patch, doing everything in RAM,
> and using the cpupower tips from above. (I wanted to test your v5, but `git
> apply --check` gave me an error, so I can look at that another day.)
>
> The results look great:
>
> master: (forgot to get commit hash)
>
> text, no special: 8165
> text, 1/3 special: 22662
> csv, no special: 9619
> csv, 1/3 special: 23213
>
> v4 (copy to)
>
> text, no special: 4577 (43.9% speedup)
> text, 1/3 special: 22847 (0.8% regression)
> csv, no special: 4720 (50.9% speedup)
> csv, 1/3 special: 23195 (0.07% regression)
>
> Seems like a very clear win to me!
> -- Manni Wood EDB: https://www.enterprisedb.com
>

The SIMD optimization for COPY FROM is still under review. For COPY TO, applying the same ideas turned out to be trivial, and the attached patch gives very nice speedups, as confirmed by Manni's benchmarks above.

Regards,
Ayoub

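For anyone who wants the gist without reading the whole diff, here is a minimal standalone sketch of option 1 for the text format. It is not the patch itself: skip_plain_prefix is a hypothetical name, it uses raw SSE2 intrinsics instead of the port/simd.h wrappers, it compiles to the plain scalar loop when SSE2 is not available, and __builtin_ctz assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Hypothetical helper (not in the patch): return the number of leading
 * bytes of s that need no escaping in text format, i.e. bytes that are
 * not control characters (< 0x20), not a backslash, and not the delimiter.
 */
static size_t
skip_plain_prefix(const char *s, char delim)
{
	const char *p = s;

	assert(s != NULL);

#if defined(__SSE2__)
	{
		const char *end = s + strlen(s);
		const __m128i bslash = _mm_set1_epi8('\\');
		const __m128i delimv = _mm_set1_epi8(delim);
		const __m128i limit = _mm_set1_epi8(0x1f);

		/* Only load full 16-byte chunks that lie before the terminator. */
		while ((size_t) (end - p) >= sizeof(__m128i))
		{
			__m128i chunk = _mm_loadu_si128((const __m128i *) p);
			/* Unsigned "byte <= 0x1f": min(byte, 0x1f) == byte. */
			__m128i ctrl = _mm_cmpeq_epi8(_mm_min_epu8(chunk, limit), chunk);
			__m128i special = _mm_or_si128(ctrl,
										   _mm_or_si128(_mm_cmpeq_epi8(chunk, bslash),
														_mm_cmpeq_epi8(chunk, delimv)));
			int mask = _mm_movemask_epi8(special);

			/* The lowest set bit is the first special byte in the chunk. */
			if (mask != 0)
				return (size_t) (p - s) + (size_t) __builtin_ctz((unsigned) mask);

			p += sizeof(__m128i);
		}
	}
#endif

	/* Scalar tail: fewer than 16 bytes left, or no SSE2 at all. */
	while (*p != '\0')
	{
		if ((unsigned char) *p < 0x20 || *p == '\\' || *p == delim)
			break;
		p++;
	}
	return (size_t) (p - s);
}
```

In the real code the scalar loop that follows does the escaping for the rest of the attribute. The sketch only shows how far the vector loop can advance before handing over.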
From bfc580b17ad5e6d981adc146c24690afe4634ce1 Mon Sep 17 00:00:00 2001
From: AyoubKAZ <[email protected]>
Date: Wed, 24 Dec 2025 12:55:15 +0100
Subject: [PATCH] Speed up COPY TO text CSV using SIMD

---
 src/backend/commands/copyto.c | 126 ++++++++++++++++++++++++++++++++++
 1 file changed, 126 insertions(+)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index dae91630ac3..e1306728509 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -31,6 +31,8 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "port/pg_bitutils.h"
+#include "port/simd.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
 #include "utils/lsyscache.h"
@@ -1266,6 +1268,36 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 	if (cstate->encoding_embeds_ascii)
 	{
 		start = ptr;
+		#ifndef USE_NO_SIMD
+		{
+			const char* end = ptr + strlen(ptr);
+			while (ptr + sizeof(Vector8) <= end) {
+				Vector8 chunk;
+				Vector8 control_mask;
+				Vector8 backslash_mask;
+				Vector8 delim_mask;
+				Vector8 special_mask;
+				uint32 mask;
+
+				vector8_load(&chunk, (const uint8 *) ptr);
+				control_mask = vector8_gt(vector8_broadcast(0x20), chunk);
+				backslash_mask = vector8_eq(vector8_broadcast('\\'), chunk);
+				delim_mask = vector8_eq(vector8_broadcast(delimc), chunk);
+
+				special_mask = vector8_or(control_mask, vector8_or(backslash_mask, delim_mask));
+
+				mask = vector8_highbit_mask(special_mask);
+				if (mask != 0) {
+					int advance = pg_rightmost_one_pos32(mask);
+					ptr += advance;
+					break;
+				}
+
+				ptr += sizeof(Vector8);
+			}
+		}
+		#endif
+
 		while ((c = *ptr) != '\0')
 		{
 			if ((unsigned char) c < (unsigned char) 0x20)
@@ -1326,6 +1358,36 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 	else
 	{
 		start = ptr;
+		#ifndef USE_NO_SIMD
+		{
+			const char* end = ptr + strlen(ptr);
+			while (ptr + sizeof(Vector8) <= end) {
+				Vector8 chunk;
+				Vector8 control_mask;
+				Vector8 backslash_mask;
+				Vector8 delim_mask;
+				Vector8 special_mask;
+				uint32 mask;
+
+				vector8_load(&chunk, (const uint8 *) ptr);
+				control_mask = vector8_gt(vector8_broadcast(0x20), chunk);
+				backslash_mask = vector8_eq(vector8_broadcast('\\'), chunk);
+				delim_mask = vector8_eq(vector8_broadcast(delimc), chunk);
+
+				special_mask = vector8_or(control_mask, vector8_or(backslash_mask, delim_mask));
+
+				mask = vector8_highbit_mask(special_mask);
+				if (mask != 0) {
+					int advance = pg_rightmost_one_pos32(mask);
+					ptr += advance;
+					break;
+				}
+
+				ptr += sizeof(Vector8);
+			}
+		}
+		#endif
+
 		while ((c = *ptr) != '\0')
 		{
 			if ((unsigned char) c < (unsigned char) 0x20)
@@ -1428,6 +1490,40 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 		{
 			const char *tptr = ptr;
 
+			#ifndef USE_NO_SIMD
+			{
+				const char* end = tptr + strlen(tptr);
+
+				Vector8 delim_mask = vector8_broadcast(delimc);
+				Vector8 quote_mask = vector8_broadcast(quotec);
+				Vector8 newline_mask = vector8_broadcast('\n');
+				Vector8 carriage_return_mask = vector8_broadcast('\r');
+
+				while (tptr + sizeof(Vector8) <= end) {
+					Vector8 chunk;
+					Vector8 special_mask;
+					uint32 mask;
+
+					vector8_load(&chunk, (const uint8 *) tptr);
+					special_mask = vector8_or(
+						vector8_or(vector8_eq(chunk, delim_mask),
+								   vector8_eq(chunk, quote_mask)),
+						vector8_or(vector8_eq(chunk, newline_mask),
+								   vector8_eq(chunk, carriage_return_mask))
+					);
+
+					mask = vector8_highbit_mask(special_mask);
+					if (mask != 0) {
+						tptr += pg_rightmost_one_pos32(mask);
+						use_quote = true;
+						break;
+					}
+
+					tptr += sizeof(Vector8);
+				}
+			}
+			#endif
+
 			while ((c = *tptr) != '\0')
 			{
 				if (c == delimc || c == quotec || c == '\n' || c == '\r')
@@ -1451,6 +1547,36 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 		 * We adopt the same optimization strategy as in CopyAttributeOutText
 		 */
 		start = ptr;
+
+		#ifndef USE_NO_SIMD
+		{
+			const char* end = ptr + strlen(ptr);
+
+			Vector8 escape_mask = vector8_broadcast(escapec);
+			Vector8 quote_mask = vector8_broadcast(quotec);
+
+			while (ptr + sizeof(Vector8) <= end) {
+				Vector8 chunk;
+				Vector8 special_mask;
+				uint32 mask;
+
+				vector8_load(&chunk, (const uint8 *) ptr);
+				special_mask = vector8_or(
+					vector8_eq(chunk, escape_mask),
+					vector8_eq(chunk, quote_mask));
+
+				mask = vector8_highbit_mask(special_mask);
+				if (mask != 0) {
+					ptr += pg_rightmost_one_pos32(mask);
+					use_quote = true;
+					break;
+				}
+
+				ptr += sizeof(Vector8);
+			}
+		}
+		#endif
+
 		while ((c = *ptr) != '\0')
 		{
 			if (c == quotec || c == escapec)
-- 
2.34.1
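
The patch above is option 1. For contrast, the control flow of option 2 from the quoted message (go back to the vector path after every special character) could look roughly like the sketch below. The names is_special, find_special and count_specials_reentering are hypothetical, the code only counts special bytes instead of emitting escapes, and it again uses SSE2 intrinsics plus __builtin_ctz (GCC or Clang) in place of port/simd.h. No heuristic for when to stop re-entering is shown, since that is still an open question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical: does this byte need escaping in text format? */
static int
is_special(char c, char delim)
{
	return (unsigned char) c < 0x20 || c == '\\' || c == delim;
}

/* Hypothetical: offset of the first special byte in [p, end), or end - p. */
static size_t
find_special(const char *p, const char *end, char delim)
{
	const char *q = p;

#if defined(__SSE2__)
	const __m128i bslash = _mm_set1_epi8('\\');
	const __m128i delimv = _mm_set1_epi8(delim);
	const __m128i limit = _mm_set1_epi8(0x1f);

	while ((size_t) (end - q) >= sizeof(__m128i))
	{
		__m128i chunk = _mm_loadu_si128((const __m128i *) q);
		/* Unsigned "byte <= 0x1f": min(byte, 0x1f) == byte. */
		__m128i ctrl = _mm_cmpeq_epi8(_mm_min_epu8(chunk, limit), chunk);
		__m128i special = _mm_or_si128(ctrl,
									   _mm_or_si128(_mm_cmpeq_epi8(chunk, bslash),
													_mm_cmpeq_epi8(chunk, delimv)));
		int mask = _mm_movemask_epi8(special);

		if (mask != 0)
			return (size_t) (q - p) + (size_t) __builtin_ctz((unsigned) mask);
		q += sizeof(__m128i);
	}
#endif

	/* Scalar scan for the remaining bytes (or everything without SSE2). */
	while (q < end && !is_special(*q, delim))
		q++;
	return (size_t) (q - p);
}

/* Option 2 control flow: re-enter the vector scan after each special byte. */
static size_t
count_specials_reentering(const char *s, char delim)
{
	const char *p;
	const char *end;
	size_t count = 0;

	assert(s != NULL);
	p = s;
	end = s + strlen(s);

	while (p < end)
	{
		p += find_special(p, end, delim);	/* vector path, entered every time */
		if (p < end)
		{
			count++;	/* real code would emit the escape sequence here */
			p++;
		}
	}
	return count;
}
```

With many special characters, each re-entry pays for a chunk load that stops after a few bytes, which is where the regressions mentioned for option 2 would come from.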
