Hi,

Looking at the profiles in [1], and at similar profiles locally, made me wonder why a basic COPY TO shows pg_server_to_any(), and the strlen() that computes the length of the to-be-converted string, so heavily. Example profile, for [2]:

-   88.11%    12.02%  postgres  postgres  [.] CopyOneRowTo
   - 76.09% CopyOneRowTo
      - 37.24% CopyAttributeOutText
         + 14.25% __strlen_evex
         + 2.76% pg_server_to_any
         + 0.03% 0xffffffff82a00c86
      + 31.82% OutputFunctionCall
      + 2.98% CopySendEndOfRow
      + 2.75% appendBinaryStringInfo
      + 0.58% MemoryContextReset
      + 0.02% 0xffffffff82a00c86
   + 12.01% standard_ExecutorRun
   + 0.02% PostgresMain

In the basic cases the client and server encoding should be the same after all, so why do we need to do any conversion?

The code has a comment about this:

	/*
	 * Set up encoding conversion info.  Even if the file and server encodings
	 * are the same, we must apply pg_any_to_server() to validate data in
	 * multibyte encodings.
	 */
	cstate->need_transcoding =
		(cstate->file_encoding != GetDatabaseEncoding() ||
		 pg_database_encoding_max_length() > 1);

I don't really understand why we would need to validate anything during COPY TO. Which is good, because it turns out that we don't actually validate anything: pg_server_to_any() returns without doing anything if the encoding matches:

	if (encoding == DatabaseEncoding->encoding ||
		encoding == PG_SQL_ASCII)
		return unconstify(char *, s);	/* assume data is valid */

This means that the strlen() we do for the call to pg_server_to_any(), which on its own takes 14.25% of the cycles, computes something that will never be used.

Unsurprisingly, only doing transcoding when the encodings differ yields a sizable improvement, about 18% for [2].

I haven't yet dug into the code history. One guess is that this should only have been set this way for COPY FROM.

Greetings,

Andres Freund

[1] https://www.postgresql.org/message-id/ZcGE8LrjGW8pmtOf%40paquier.xyz
[2] COPY (SELECT 1::int2,2::int2,3::int2,4::int2,5::int2,6::int2,7::int2,8::int2,9::int2,10::int2,11::int2,12::int2,13::int2,14::int2,15::int2,16::int2,17::int2,18::int2,19::int2,20::int2, generate_series(1, 1000000::int4)) TO '/dev/null';
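
PS: To make the difference between the two conditions concrete, here is a minimal standalone sketch. It is not PostgreSQL source: the function names, the plain int encoding IDs and the wasted_strlen_bytes() helper are all made up for illustration, and it only models the control flow described above (a strlen() followed by a conversion call that returns its input unchanged when the encodings match).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the current condition: transcode if the encodings
 * differ OR the database encoding is multibyte. */
static bool
need_transcoding_current(int file_enc, int db_enc, int db_max_len)
{
	return file_enc != db_enc || db_max_len > 1;
}

/* Hypothetical model of the alternative: transcode only if encodings differ. */
static bool
need_transcoding_only_if_differs(int file_enc, int db_enc)
{
	return file_enc != db_enc;
}

/*
 * Model of the per-attribute output path: returns how many bytes strlen()
 * scanned for a length that is never used, i.e. when the conversion step
 * would hand back its input unchanged because the encodings match.
 */
static size_t
wasted_strlen_bytes(const char *attr, bool need_transcoding,
					int file_enc, int db_enc)
{
	size_t		len;

	if (!need_transcoding)
		return 0;				/* no strlen(), no conversion call */

	len = strlen(attr);			/* computed before the conversion call */

	if (file_enc == db_enc)
		return len;				/* conversion is a no-op, len was unused */

	return 0;					/* a real conversion would consume len */
}
```

With matching encodings and a multibyte database encoding, the first predicate returns true and every attribute pays for a strlen() whose result is discarded; the second predicate returns false and the scan is skipped.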