Hi,

Thanks for the review!
On Thu, 12 Feb 2026 at 01:39, Nathan Bossart <[email protected]> wrote:
>
> On Wed, Feb 11, 2026 at 04:27:50PM +0300, Nazir Bilal Yavuz wrote:
> > I am sharing a v6 which implements (1). My benchmark results show
> > almost no difference for the special-character cases and a nice
> > improvement for the no-special-character cases.
>
> Thanks!
>
> > +	/* Initialize SIMD variables */
> > +	cstate->simd_enabled = false;
> > +	cstate->simd_initialized = false;
>
> > +	/* Initialize SIMD on the first read */
> > +	if (unlikely(!cstate->simd_initialized))
> > +	{
> > +		cstate->simd_initialized = true;
> > +		cstate->simd_enabled = true;
> > +	}
>
> Why do we do this initialization in CopyReadLine() as opposed to setting
> simd_enabled to true when initializing cstate in BeginCopyFrom()? If we
> can initialize it in BeginCopyFrom, we could probably remove
> simd_initialized.

Correct, I guess this is left over from the earlier versions.

> > +	if (cstate->simd_enabled)
> > +		result = CopyReadLineText(cstate, is_csv, true);
> > +	else
> > +		result = CopyReadLineText(cstate, is_csv, false);
>
> I know we discussed this upthread, but I'd like to take a closer look at
> this to see whether/why it makes such a big difference. It's a bit awkward
> that CopyReadLineText() needs to manage both its local simd_enabled and
> cstate->simd_enabled.

I benchmarked this extensively with the new v6 version. If I change this
to either of:

    CopyReadLineText(cstate, is_csv);

or

    CopyReadLineText(cstate, is_csv, cstate->simd_enabled);

then there is a 5%-10% regression for the scalar path. I ran my benchmarks
with both "meson --buildtype=debugoptimized" and "meson
--buildtype=release", and the result is the same.
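To make the inlining argument concrete, here is a minimal standalone sketch, not code from the patch. It assumes GCC or Clang, where __attribute__((always_inline)) stands in for PostgreSQL's pg_attribute_always_inline, and count_special() and count_special_dispatch() are hypothetical names. When a force-inlined function receives a literal flag at each call site, the compiler can fold the flag tests and emit one specialized loop per call. Passing a runtime value such as cstate->simd_enabled gives it no such guarantee.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for pg_attribute_always_inline on GCC/Clang. */
#if defined(__GNUC__)
#define ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE inline
#endif

/*
 * Hypothetical stand-in for CopyReadLineText(): count the bytes that are
 * special in text mode ('\n', '\r', '\\') or in CSV mode ('\n', '\r', '"').
 */
static ALWAYS_INLINE size_t
count_special(const char *buf, size_t len, bool is_csv)
{
	size_t		n = 0;

	for (size_t i = 0; i < len; i++)
	{
		char		c = buf[i];

		/* With a constant is_csv, this choice is folded after inlining. */
		if (c == '\n' || c == '\r' || c == (is_csv ? '"' : '\\'))
			n++;
	}
	return n;
}

/*
 * Dispatcher in the style of CopyReadLine(): each call site passes a
 * literal, so the compiler can generate one specialized loop per mode
 * instead of re-testing is_csv for every byte.
 */
size_t
count_special_dispatch(const char *buf, size_t len, bool is_csv)
{
	if (is_csv)
		return count_special(buf, len, true);
	else
		return count_special(buf, len, false);
}
```

Whether a compiler would unswitch the loop on its own without the constant arguments depends on the compiler, the optimization level, and the loop size, so the sketch only shows why the explicit dispatch can help, not that it always will.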
Also, if I change this code to:

    if (cstate->simd_enabled)
    {
        if (is_csv)
            result = CopyReadLineText(cstate, true, true);
        else
            result = CopyReadLineText(cstate, false, true);
    }
    else
    {
        if (is_csv)
            result = CopyReadLineText(cstate, true, false);
        else
            result = CopyReadLineText(cstate, false, false);
    }

then I see a ~5% performance improvement in the scalar path compared to
master.

> +			/* Load a chunk of data into a vector register */
> +			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
>
> As mentioned upthread [0], I think it's worth testing whether processing
> multiple vectors worth of data in each loop iteration is worthwhile.
>
> [0] https://postgr.es/m/aSTVOe6BIe5f1l3i%40nathan

There are multiple keys in CopyReadLineText() compared to pg_lfind32().
I am not sure whether I used multiple vectors correctly, but I attached
what I did as 0002; could you please take a look at it? I didn't see any
performance benefit in my benchmarks, though.

--
Regards,
Nazir Bilal Yavuz
Microsoft
From c4b29849ad9f87f51022b947a9a0ab695dd1cde2 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Fri, 13 Feb 2026 13:28:55 +0300
Subject: [PATCH v7 1/2] Speed up COPY FROM text/CSV parsing using SIMD

This patch disables SIMD when SIMD encounters a special character which
is neither EOF nor EOL.

Author: Shinya Kato <[email protected]>
Author: Nazir Bilal Yavuz <[email protected]>
Reviewed-by: Kazar Ayoub <[email protected]>
Reviewed-by: Nathan Bossart <[email protected]>
Reviewed-by: Neil Conway <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Manni Wood <[email protected]>
Reviewed-by: Mark Wong <[email protected]>
Discussion: https://postgr.es/m/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig%40mail.gmail.com
---
 src/backend/commands/copyfrom.c          |   3 +
 src/backend/commands/copyfromparse.c     | 125 ++++++++++++++++++++++-
 src/include/commands/copyfrom_internal.h |   3 +
 3 files changed, 126 insertions(+), 5 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 25ee20b23db..40dae0bdacc 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1721,6 +1721,9 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->cur_attval = NULL;
 	cstate->relname_only = false;
 
+	/* Initialize SIMD */
+	cstate->simd_enabled = true;
+
 	/*
 	 * Allocate buffers for the input pipeline.
 	 *
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 94d6f415a06..4a127d1af90 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -72,6 +72,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/pg_bswap.h"
+#include "port/simd.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 
@@ -141,12 +142,14 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 /* non-export function prototypes */
 static bool CopyReadLine(CopyFromState cstate, bool is_csv);
-static bool CopyReadLineText(CopyFromState cstate, bool is_csv);
 static int	CopyReadAttributesText(CopyFromState cstate);
 static int	CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
 									 Oid typioparam, int32 typmod,
 									 bool *isnull);
+static pg_attribute_always_inline bool CopyReadLineText(CopyFromState cstate,
+														bool is_csv,
+														bool simd_enabled);
 static pg_attribute_always_inline bool CopyFromTextLikeOneRow(CopyFromState cstate,
 															  ExprContext *econtext,
 															  Datum *values,
@@ -1173,8 +1176,14 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 	resetStringInfo(&cstate->line_buf);
 	cstate->line_buf_valid = false;
 
-	/* Parse data and transfer into line_buf */
-	result = CopyReadLineText(cstate, is_csv);
+	/*
+	 * Parse data and transfer into line_buf. To benefit from inlining, call
+	 * CopyReadLineText() with constant boolean arguments.
+	 */
+	if (cstate->simd_enabled)
+		result = CopyReadLineText(cstate, is_csv, true);
+	else
+		result = CopyReadLineText(cstate, is_csv, false);
 
 	if (result)
 	{
@@ -1241,8 +1250,8 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
-static bool
-CopyReadLineText(CopyFromState cstate, bool is_csv)
+static pg_attribute_always_inline bool
+CopyReadLineText(CopyFromState cstate, bool is_csv, bool simd_enabled)
 {
 	char	   *copy_input_buf;
 	int			input_buf_ptr;
@@ -1257,6 +1266,14 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	char		quotec = '\0';
 	char		escapec = '\0';
 
+#ifndef USE_NO_SIMD
+	Vector8		nl = vector8_broadcast('\n');
+	Vector8		cr = vector8_broadcast('\r');
+	Vector8		bs = vector8_broadcast('\\');
+	Vector8		quote = vector8_broadcast(0);
+	Vector8		escape = vector8_broadcast(0);
+#endif
+
 	if (is_csv)
 	{
 		quotec = cstate->opts.quote[0];
@@ -1264,6 +1281,12 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		/* ignore special escape processing if it's the same as quotec */
 		if (quotec == escapec)
 			escapec = '\0';
+
+#ifndef USE_NO_SIMD
+		quote = vector8_broadcast(quotec);
+		if (quotec != escapec)
+			escape = vector8_broadcast(escapec);
+#endif
 	}
 
 	/*
@@ -1330,6 +1353,98 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			need_data = false;
 		}
 
+#ifndef USE_NO_SIMD
+
+		/*
+		 * Use SIMD instructions to efficiently scan the input buffer for
+		 * special characters (e.g., newline, carriage return, quote, and
+		 * escape). This is faster than byte-by-byte iteration, especially on
+		 * large buffers.
+		 *
+		 * We do not apply the SIMD fast path in either of the following
+		 * cases: - When the previously processed character was an escape
+		 * character (last_was_esc), since the next byte must be examined
+		 * sequentially. - When the remaining buffer is smaller than one
+		 * vector width (sizeof(Vector8)), since SIMD operates on fixed-size
+		 * chunks.
+		 *
+		 * Note that, SIMD may become slower when the input contains many
+		 * special characters. To avoid this regression, we disable SIMD for
+		 * the rest of the input once we encounter a special character which
+		 * is neither EOF nor EOL.
+		 */
+		if (simd_enabled && !last_was_esc && copy_buf_len - input_buf_ptr > sizeof(Vector8))
+		{
+			Vector8		chunk;
+			Vector8		match = vector8_broadcast(0);
+			uint32		mask;
+
+			/* Load a chunk of data into a vector register */
+			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+
+			if (is_csv)
+			{
+				/* \n and \r are not special inside quotes */
+				if (!in_quote)
+					match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+
+				match = vector8_or(match, vector8_eq(chunk, quote));
+				if (escapec != '\0')
+					match = vector8_or(match, vector8_eq(chunk, escape));
+			}
+			else
+			{
+				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+				match = vector8_or(match, vector8_eq(chunk, bs));
+			}
+
+			/* Check if we found any special characters */
+			mask = vector8_highbit_mask(match);
+			if (mask != 0)
+			{
+				/*
+				 * Found a special character. Advance up to that point and let
+				 * the scalar code handle it.
+				 */
+				int			advance = pg_rightmost_one_pos32(mask);
+				char		c1,
+							c2;
+				bool		simd_hit_eol,
+							simd_hit_eof;
+
+				input_buf_ptr += advance;
+				c1 = copy_input_buf[input_buf_ptr];
+
+				/*
+				 * Since we stopped within the chunk and ((copy_buf_len -
+				 * input_buf_ptr) > sizeof(Vector8)) is true,
+				 * copy_input_buf[input_buf_ptr + 1] is guaranteed to be
+				 * readable.
+				 */
+				c2 = copy_input_buf[input_buf_ptr + 1];
+				simd_hit_eol = (c1 == '\r' || c1 == '\n') && (!is_csv || !in_quote);
+				simd_hit_eof = c1 == '\\' && c2 == '.' && !is_csv;
+
+				/*
+				 * Do not disable SIMD when we hit EOL or EOF characters. In
+				 * practice, it does not matter for EOF because parsing ends
+				 * there, but we keep the behavior consistent.
+				 */
+				if (!(simd_hit_eof || simd_hit_eol))
+				{
+					simd_enabled = false;
+					cstate->simd_enabled = false;
+				}
+			}
+			else
+			{
+				/* No special characters found, so skip the entire chunk */
+				input_buf_ptr += sizeof(Vector8);
+				continue;
+			}
+		}
+#endif
+
 		/* OK to fetch a character */
 		prev_raw_ptr = input_buf_ptr;
 		c = copy_input_buf[input_buf_ptr++];
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 822ef33cf69..73ce777c52b 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -89,6 +89,9 @@ typedef struct CopyFromStateData
 	const char *cur_attval;		/* current att value for error messages */
 	bool		relname_only;	/* don't output line number, att, etc. */
 
+	/* SIMD variables */
+	bool		simd_enabled;
+
 	/*
 	 * Working state
 	 */
-- 
2.47.3
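As a rough illustration of the single-vector fast path in 0001, here is a standalone sketch that uses raw SSE2 intrinsics in place of the Vector8/vector8_* wrappers from port/simd.h, with a scalar fallback that builds the same bitmask. The 16-byte chunk width is an assumption matching SSE2, and special_mask(), skip_plain_chunks(), and keep_simd() are hypothetical helpers, not functions from the patch. The strict ">" in the loop mirrors the patch's copy_buf_len - input_buf_ptr > sizeof(Vector8) test, which is what keeps the byte after a hit in bounds, and keep_simd() mirrors the simd_hit_eol/simd_hit_eof decision.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define CHUNK 16				/* assumed vector width (SSE2: 16 bytes) */

/* Bit i is set iff chunk[i] is special for the given mode and quote state. */
static uint32_t
special_mask(const uint8_t *chunk, bool is_csv, bool in_quote,
			 char quotec, char escapec)
{
#if defined(__SSE2__)
	__m128i		v = _mm_loadu_si128((const __m128i *) chunk);
	__m128i		m = _mm_setzero_si128();

	/* \n and \r are not special inside CSV quotes */
	if (!is_csv || !in_quote)
		m = _mm_or_si128(_mm_cmpeq_epi8(v, _mm_set1_epi8('\n')),
						 _mm_cmpeq_epi8(v, _mm_set1_epi8('\r')));
	if (is_csv)
	{
		m = _mm_or_si128(m, _mm_cmpeq_epi8(v, _mm_set1_epi8(quotec)));
		if (escapec != '\0')
			m = _mm_or_si128(m, _mm_cmpeq_epi8(v, _mm_set1_epi8(escapec)));
	}
	else
		m = _mm_or_si128(m, _mm_cmpeq_epi8(v, _mm_set1_epi8('\\')));
	return (uint32_t) _mm_movemask_epi8(m);
#else
	/* Portable fallback that builds the same bitmask one byte at a time. */
	uint32_t	mask = 0;

	for (int i = 0; i < CHUNK; i++)
	{
		char		c = (char) chunk[i];
		bool		hit;

		if ((c == '\n' || c == '\r') && (!is_csv || !in_quote))
			hit = true;
		else if (is_csv)
			hit = c == quotec || (escapec != '\0' && c == escapec);
		else
			hit = c == '\\';
		if (hit)
			mask |= (uint32_t) 1 << i;
	}
	return mask;
#endif
}

/* Index of the lowest set bit; x must be nonzero. */
static int
rightmost_one_pos(uint32_t x)
{
	int			pos = 0;

	while ((x & 1) == 0)
	{
		x >>= 1;
		pos++;
	}
	return pos;
}

/*
 * Skip whole chunks that contain no special byte.  Returns the offset of
 * the first special byte, or the offset at which the scalar loop must take
 * over.  The strict '>' keeps buf[result + 1] in bounds after a hit.
 */
static size_t
skip_plain_chunks(const char *buf, size_t len, size_t pos, bool is_csv,
				  bool in_quote, char quotec, char escapec)
{
	while (len - pos > CHUNK)
	{
		uint32_t	mask = special_mask((const uint8_t *) buf + pos, is_csv,
										in_quote, quotec, escapec);

		if (mask != 0)
			return pos + rightmost_one_pos(mask);
		pos += CHUNK;
	}
	return pos;
}

/* Keep the fast path only for EOL, or for the text-mode "\." end marker. */
static bool
keep_simd(char c1, char c2, bool is_csv, bool in_quote)
{
	bool		hit_eol = (c1 == '\r' || c1 == '\n') && (!is_csv || !in_quote);
	bool		hit_eof = c1 == '\\' && c2 == '.' && !is_csv;

	return hit_eol || hit_eof;
}
```

For a 64-byte buffer with no special bytes, the loop stops at offset 48 and leaves the last 16 bytes to the scalar code, because the strict comparison needs at least one byte beyond the chunk.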
From 2de9b5bc18bfa169b3ba3507b6bdf79d277c0ad4 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Fri, 13 Feb 2026 13:36:34 +0300
Subject: [PATCH v7 2/2] Use 4 vectors in CopyReadLineText() SIMD

---
 src/backend/commands/copyfromparse.c | 116 +++++++++++++++++++++------
 1 file changed, 92 insertions(+), 24 deletions(-)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 4a127d1af90..caadc40cc8b 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -1361,6 +1361,9 @@ CopyReadLineText(CopyFromState cstate, bool is_csv, bool simd_enabled)
 		 * escape). This is faster than byte-by-byte iteration, especially on
 		 * large buffers.
 		 *
+		 * For better instruction-level parallelism, we try to process four
+		 * vectors at a time.
+		 *
 		 * We do not apply the SIMD fast path in either of the following
 		 * cases: - When the previously processed character was an escape
 		 * character (last_was_esc), since the next byte must be examined
@@ -1373,53 +1376,118 @@ CopyReadLineText(CopyFromState cstate, bool is_csv, bool simd_enabled)
 		 * the rest of the input once we encounter a special character which
 		 * is neither EOF nor EOL.
 		 */
-		if (simd_enabled && !last_was_esc && copy_buf_len - input_buf_ptr > sizeof(Vector8))
+		if (simd_enabled && !last_was_esc && copy_buf_len - input_buf_ptr >= 4 * sizeof(Vector8))
 		{
-			Vector8		chunk;
-			Vector8		match = vector8_broadcast(0);
-			uint32		mask;
-
-			/* Load a chunk of data into a vector register */
-			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+			Vector8		chunk1,
+						chunk2,
+						chunk3,
+						chunk4;
+			Vector8		match1,
+						match2,
+						match3,
+						match4;
+			Vector8		tmp1,
+						tmp2,
+						result;
+
+			/* Load four chunks of data into vector registers */
+			vector8_load(&chunk1, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+			vector8_load(&chunk2, (const uint8 *) &copy_input_buf[input_buf_ptr + sizeof(Vector8)]);
+			vector8_load(&chunk3, (const uint8 *) &copy_input_buf[input_buf_ptr + 2 * sizeof(Vector8)]);
+			vector8_load(&chunk4, (const uint8 *) &copy_input_buf[input_buf_ptr + 3 * sizeof(Vector8)]);
 
 			if (is_csv)
 			{
 				/* \n and \r are not special inside quotes */
 				if (!in_quote)
-					match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+				{
+					match1 = vector8_or(vector8_eq(chunk1, nl), vector8_eq(chunk1, cr));
+					match2 = vector8_or(vector8_eq(chunk2, nl), vector8_eq(chunk2, cr));
+					match3 = vector8_or(vector8_eq(chunk3, nl), vector8_eq(chunk3, cr));
+					match4 = vector8_or(vector8_eq(chunk4, nl), vector8_eq(chunk4, cr));
+				}
+				else
+				{
+					match1 = vector8_broadcast(0);
+					match2 = vector8_broadcast(0);
+					match3 = vector8_broadcast(0);
+					match4 = vector8_broadcast(0);
+				}
 
-				match = vector8_or(match, vector8_eq(chunk, quote));
+				match1 = vector8_or(match1, vector8_eq(chunk1, quote));
+				match2 = vector8_or(match2, vector8_eq(chunk2, quote));
+				match3 = vector8_or(match3, vector8_eq(chunk3, quote));
+				match4 = vector8_or(match4, vector8_eq(chunk4, quote));
 				if (escapec != '\0')
-					match = vector8_or(match, vector8_eq(chunk, escape));
+				{
+					match1 = vector8_or(match1, vector8_eq(chunk1, escape));
+					match2 = vector8_or(match2, vector8_eq(chunk2, escape));
+					match3 = vector8_or(match3, vector8_eq(chunk3, escape));
+					match4 = vector8_or(match4, vector8_eq(chunk4, escape));
+				}
 			}
 			else
 			{
-				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
-				match = vector8_or(match, vector8_eq(chunk, bs));
+				match1 = vector8_or(vector8_eq(chunk1, nl), vector8_eq(chunk1, cr));
+				match2 = vector8_or(vector8_eq(chunk2, nl), vector8_eq(chunk2, cr));
+				match3 = vector8_or(vector8_eq(chunk3, nl), vector8_eq(chunk3, cr));
+				match4 = vector8_or(vector8_eq(chunk4, nl), vector8_eq(chunk4, cr));
+
+				match1 = vector8_or(match1, vector8_eq(chunk1, bs));
+				match2 = vector8_or(match2, vector8_eq(chunk2, bs));
+				match3 = vector8_or(match3, vector8_eq(chunk3, bs));
+				match4 = vector8_or(match4, vector8_eq(chunk4, bs));
 			}
 
-			/* Check if we found any special characters */
-			mask = vector8_highbit_mask(match);
-			if (mask != 0)
+			/* Combine results to check if any chunk has special characters */
+			tmp1 = vector8_or(match1, match2);
+			tmp2 = vector8_or(match3, match4);
+			result = vector8_or(tmp1, tmp2);
+
+			if (vector8_is_highbit_set(result))
 			{
 				/*
-				 * Found a special character. Advance up to that point and let
-				 * the scalar code handle it.
+				 * Found a special character somewhere in the four chunks.
+				 * Identify the first chunk containing it.
 				 */
-				int			advance = pg_rightmost_one_pos32(mask);
+				uint32		mask;
+				int			advance;
 				char		c1,
 							c2;
 				bool		simd_hit_eol,
 							simd_hit_eof;
 
+				mask = vector8_highbit_mask(match1);
+				if (mask == 0)
+				{
+					input_buf_ptr += sizeof(Vector8);
+					mask = vector8_highbit_mask(match2);
+				}
+				if (mask == 0)
+				{
+					input_buf_ptr += sizeof(Vector8);
+					mask = vector8_highbit_mask(match3);
+				}
+				if (mask == 0)
+				{
+					input_buf_ptr += sizeof(Vector8);
+					mask = vector8_highbit_mask(match4);
+				}
+				Assert(mask != 0);
+
+				/*
+				 * Found a special character. Advance up to that point and let
+				 * the scalar code handle it.
+				 */
+				advance = pg_rightmost_one_pos32(mask);
 				input_buf_ptr += advance;
 				c1 = copy_input_buf[input_buf_ptr];
 
 				/*
-				 * Since we stopped within the chunk and ((copy_buf_len -
-				 * input_buf_ptr) > sizeof(Vector8)) is true,
-				 * copy_input_buf[input_buf_ptr + 1] is guaranteed to be
-				 * readable.
+				 * Since we stopped within the block and ((copy_buf_len -
+				 * input_buf_ptr) >= 4 * sizeof(Vector8)) was true at the
+				 * start, copy_input_buf[input_buf_ptr + 1] is guaranteed to
+				 * be readable.
 				 */
 				c2 = copy_input_buf[input_buf_ptr + 1];
 				simd_hit_eol = (c1 == '\r' || c1 == '\n') && (!is_csv || !in_quote);
@@ -1438,8 +1506,8 @@ CopyReadLineText(CopyFromState cstate, bool is_csv, bool simd_enabled)
 			}
 			else
 			{
-				/* No special characters found, so skip the entire chunk */
-				input_buf_ptr += sizeof(Vector8);
+				/* No special characters found, so skip the entire block */
+				input_buf_ptr += 4 * sizeof(Vector8);
 				continue;
 			}
 		}
-- 
2.47.3
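For comparison with the four-vector loop in 0002, here is a standalone sketch of the same idea with raw SSE2 intrinsics: four 16-byte compares are OR-ed together so that the common no-match case costs a single mask extraction, and only a hit pays for locating the first chunk. It assumes text-mode keys only ('\n', '\r', '\\') and a 64-byte block, and leaves out CSV quote handling and the SIMD-disable heuristic. first_special_in_block() and scan_text() are hypothetical helpers, not functions from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define VEC 16					/* assumed vector width (SSE2) */
#define BLOCK (4 * VEC)			/* four vectors per iteration */

#if defined(__SSE2__)
/* 0xFF in each lane holding '\n', '\r' or '\\' (text-mode keys only). */
static __m128i
match_keys(const uint8_t *p)
{
	__m128i		v = _mm_loadu_si128((const __m128i *) p);
	__m128i		m = _mm_or_si128(_mm_cmpeq_epi8(v, _mm_set1_epi8('\n')),
								 _mm_cmpeq_epi8(v, _mm_set1_epi8('\r')));

	return _mm_or_si128(m, _mm_cmpeq_epi8(v, _mm_set1_epi8('\\')));
}

/* Index of the lowest set bit; x must be nonzero. */
static int
rightmost_one_pos(uint32_t x)
{
	int			pos = 0;

	while ((x & 1) == 0)
	{
		x >>= 1;
		pos++;
	}
	return pos;
}
#endif

/*
 * Offset (0..BLOCK-1) of the first special byte in the BLOCK bytes at p,
 * or -1 if there is none.  At least BLOCK bytes must be readable at p.
 */
static int
first_special_in_block(const uint8_t *p)
{
#if defined(__SSE2__)
	__m128i		m[4];
	__m128i		any;

	for (int i = 0; i < 4; i++)
		m[i] = match_keys(p + i * VEC);

	/* A single test covers the common case of no special byte at all. */
	any = _mm_or_si128(_mm_or_si128(m[0], m[1]), _mm_or_si128(m[2], m[3]));
	if (_mm_movemask_epi8(any) == 0)
		return -1;

	/* Otherwise find the first chunk with a hit, then the byte within it. */
	for (int i = 0; i < 4; i++)
	{
		uint32_t	mask = (uint32_t) _mm_movemask_epi8(m[i]);

		if (mask != 0)
			return i * VEC + rightmost_one_pos(mask);
	}
	return -1;					/* not reached */
#else
	for (int i = 0; i < BLOCK; i++)
		if (p[i] == '\n' || p[i] == '\r' || p[i] == '\\')
			return i;
	return -1;
#endif
}

/* Offset of the first special byte in buf[0..len), or len if none. */
static size_t
scan_text(const char *buf, size_t len)
{
	size_t		pos = 0;

	while (len - pos >= BLOCK)
	{
		int			r = first_special_in_block((const uint8_t *) buf + pos);

		if (r >= 0)
			return pos + (size_t) r;
		pos += BLOCK;
	}
	for (; pos < len; pos++)	/* scalar tail */
		if (buf[pos] == '\n' || buf[pos] == '\r' || buf[pos] == '\\')
			break;
	return pos;
}
```

The trade-off is visible in the sketch: the combined test saves work only when a whole 64-byte block is free of special characters; when hits are frequent, the extra per-chunk mask extraction is pure overhead.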
